
Kyle Phillips

Creative Technologist, Google NYC

Gemini Robotics

Using conversational AI for embodied reasoning demonstrations

March 2025

In December of 2024, I launched the Projects/Gemini Live API Web Console. That project lowered the barrier to prototyping with conversational AI, and it became an ideal foundation for DeepMind researchers to build upon. I worked with People/Fei Xia and others at DeepMind and Creative Lab to produce a series of robotics demonstrations of embodied reasoning.

*An animation of nine robotics demonstrations showcasing traits of embodiment*

Working with robotics and AI researchers in New York, London, and Mountain View, I helped build servers, natural-language tools, and custom demonstrations for three different types of robots: a humanoid (Atari), a bi-arm robot (Aloha), and a 9-DOF industrial robot (Omega).

Spatial Understanding: Vectors

In this demonstration, the camera feed from the robot is sent to Gemini's embodied reasoning model, which responds with vectors spanning the objects it sees. This capability appears repeatedly across the demonstrations; for example, when Aloha folds an origami fox, the model suggests where the fox's eyes should be. A sketch of handling this kind of output follows the video below.

*Video of 2D bounding box output*
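
To draw overlays like these, the client has to map the model's normalized coordinates back into the camera frame. The sketch below is a minimal illustration, assuming the model is prompted to reply with JSON `box_2d` entries in the `[ymin, xmin, ymax, xmax]` format normalized to 0-1000, as in Gemini's public spatial-understanding examples; the `parseBoxes` helper and its types are illustrative, not the project's actual code.

```ts
// Minimal sketch: convert Gemini's normalized 2D box output into pixel-space
// rectangles for drawing overlays on the camera feed. Assumes the model is
// prompted to answer with JSON like:
//   [{"box_2d": [ymin, xmin, ymax, xmax], "label": "fox eye"}]
// with coordinates normalized to 0-1000, as in Gemini's public
// spatial-understanding examples. Names here are illustrative.

interface Box2D {
  box_2d: [number, number, number, number]; // [ymin, xmin, ymax, xmax], 0-1000
  label: string;
}

interface PixelBox {
  x: number; // top-left x in pixels
  y: number; // top-left y in pixels
  width: number;
  height: number;
  label: string;
}

function parseBoxes(
  modelText: string,
  frameWidth: number,
  frameHeight: number
): PixelBox[] {
  // The model may wrap its JSON in prose or a markdown fence, so slice out
  // the outermost array before parsing.
  const start = modelText.indexOf("[");
  const end = modelText.lastIndexOf("]");
  const boxes = JSON.parse(modelText.slice(start, end + 1)) as Box2D[];

  return boxes.map(({ box_2d: [ymin, xmin, ymax, xmax], label }) => ({
    x: (xmin / 1000) * frameWidth,
    y: (ymin / 1000) * frameHeight,
    width: ((xmax - xmin) / 1000) * frameWidth,
    height: ((ymax - ymin) / 1000) * frameHeight,
    label,
  }));
}
```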

*Screenshot of the tool with the Settings panel open*

New tools were developed for the project, some of which have since been added to the GitHub repository; for example, this settings panel lets you view the registered tools and modify the system instructions.
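
For context, the configuration that panel surfaces looks roughly like the sketch below: a system instruction plus function declarations registered as tools with the Live API. The `move_arm` declaration is a hypothetical example rather than one of the project's actual tools, and the exact config shape may differ from the console's.

```ts
// Hedged sketch of the kind of configuration the settings panel exposes:
// a system instruction plus tools registered as function declarations for
// the Live API. The "move_arm" tool is hypothetical, for illustration only.

const liveConfig = {
  model: "models/gemini-2.0-flash-exp",
  systemInstruction: {
    parts: [
      {
        text:
          "You control a bi-arm robot. Describe each move aloud before " +
          "calling a tool.",
      },
    ],
  },
  tools: [
    {
      functionDeclarations: [
        {
          name: "move_arm", // hypothetical tool name
          description:
            "Move the gripper to a target point in the camera frame.",
          parameters: {
            type: "OBJECT",
            properties: {
              x: { type: "NUMBER", description: "Normalized x, 0-1000" },
              y: { type: "NUMBER", description: "Normalized y, 0-1000" },
            },
            required: ["x", "y"],
          },
        },
      ],
    },
  ],
};
```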

Reference

  1. Projects/Gemini Live API Web Console