
Kyle Phillips

Creative Technologist, Google NYC

Gemini Robotics

Using conversational AI for embodied reasoning demonstrations

March 2025

In December of 2024, I launched the Projects/Gemini Live API Web Console. That project lowered the barrier to prototyping with conversational AI, and it became an ideal foundation for DeepMind researchers to build upon. I worked with People/Fei Xia and others at DeepMind and Creative Lab to produce a series of robotics demonstrations of embodied reasoning.

*An animation of nine robotics demonstrations showcasing traits of embodiment*

Working with robotics and AI researchers in New York, London, and Mountain View, I helped build servers, natural-language tools, and custom demonstrations for three different types of robots: a humanoid (Atari), a bi-arm robot (Aloha), and a 9-DOF industrial robot (Omega).

Spatial Understanding: Vectors

In this demonstration, the camera feed from the robot is sent to Gemini's embodied reasoning model, which responds with vectors spanning the objects it sees. This capability appears repeatedly across the demonstrations; for example, when Aloha folds an origami fox, the model suggests where the fox's eyes should be. A sketch of handling this kind of output follows the video below.

*Video of 2D bounding box output*
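
To draw overlays like these, the client has to map the model's normalized coordinates back into the camera frame. The sketch below is a minimal illustration, assuming the model is prompted to reply with JSON `box_2d` entries in the `[ymin, xmin, ymax, xmax]` format normalized to 0-1000, as in Gemini's public spatial-understanding examples; the `parseBoxes` helper and its types are illustrative, not the project's actual code.

```ts
// Minimal sketch: convert Gemini's normalized 2D box output into pixel-space
// rectangles for drawing overlays on the camera feed. Assumes the model is
// prompted to answer with JSON like:
//   [{"box_2d": [ymin, xmin, ymax, xmax], "label": "fox eye"}]
// with coordinates normalized to 0-1000, as in Gemini's public
// spatial-understanding examples. Names here are illustrative.

interface Box2D {
  box_2d: [number, number, number, number]; // [ymin, xmin, ymax, xmax], 0-1000
  label: string;
}

interface PixelBox {
  x: number; // top-left x in pixels
  y: number; // top-left y in pixels
  width: number;
  height: number;
  label: string;
}

function parseBoxes(
  modelText: string,
  frameWidth: number,
  frameHeight: number
): PixelBox[] {
  // The model may wrap its JSON in prose or a markdown fence, so slice out
  // the outermost array before parsing.
  const start = modelText.indexOf("[");
  const end = modelText.lastIndexOf("]");
  const boxes = JSON.parse(modelText.slice(start, end + 1)) as Box2D[];

  return boxes.map(({ box_2d: [ymin, xmin, ymax, xmax], label }) => ({
    x: (xmin / 1000) * frameWidth,
    y: (ymin / 1000) * frameHeight,
    width: ((xmax - xmin) / 1000) * frameWidth,
    height: ((ymax - ymin) / 1000) * frameHeight,
    label,
  }));
}
```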

*Screenshot of the tool with the Settings panel open*

New tools were developed for the project, some of which have since been added to the GitHub repository; for example, this settings panel lets you view the registered tools and modify the system instructions.
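
For context, the configuration that panel surfaces looks roughly like the sketch below: a system instruction plus function declarations registered as tools with the Live API. The `move_arm` declaration is a hypothetical example rather than one of the project's actual tools, and the exact config shape may differ from the console's.

```ts
// Hedged sketch of the kind of configuration the settings panel exposes:
// a system instruction plus tools registered as function declarations for
// the Live API. The "move_arm" tool is hypothetical, for illustration only.

const liveConfig = {
  model: "models/gemini-2.0-flash-exp",
  systemInstruction: {
    parts: [
      {
        text:
          "You control a bi-arm robot. Describe each move aloud before " +
          "calling a tool.",
      },
    ],
  },
  tools: [
    {
      functionDeclarations: [
        {
          name: "move_arm", // hypothetical tool name
          description:
            "Move the gripper to a target point in the camera frame.",
          parameters: {
            type: "OBJECT",
            properties: {
              x: { type: "NUMBER", description: "Normalized x, 0-1000" },
              y: { type: "NUMBER", description: "Normalized y, 0-1000" },
            },
            required: ["x", "y"],
          },
        },
      ],
    },
  ],
};
```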

Reference

  1. Projects/Gemini Live API Web Console