Google DeepMind Robotics

I was asked to make an animation for a blog…

and I accidentally created the mascot for Google DeepMind Robotics. The plan was simple: design a robot and show it evolving into a new and improved robot. Since then, I’ve produced several variations of the robot, including a dog. One of my animations was recently played at ICRA 2024 in Yokohama, Japan, and the researchers went home with an award.


SARA: Self-Adaptive Robust Attention

Google DeepMind introduces SARA-RT, a method that improves the efficiency of Robotics Transformers (RT). SARA-RT makes complex RT models smaller and faster while maintaining their accuracy, which could make robots easier to deploy in real-world situations.

SARA was awarded Best Paper in Robotic Manipulation at ICRA 2024
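
To give a sense of why attention efficiency matters here: standard softmax attention scales quadratically with the length of the input sequence, while the linear-attention style that methods like SARA-RT up-train models towards scales linearly. The sketch below is a minimal, generic NumPy illustration of that contrast, not DeepMind’s implementation; the feature map and shapes are my own assumptions.

```python
import numpy as np

def softmax_attention(Q, K, V):
    # Standard attention: the (n x n) score matrix makes cost grow
    # quadratically with the sequence length n.
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

def linear_attention(Q, K, V):
    # Linear attention: apply a positive feature map to queries and keys,
    # then reorder the matrix products so cost grows linearly with n.
    phi = lambda x: np.maximum(x, 0.0) + 1e-6   # illustrative feature map
    Qf, Kf = phi(Q), phi(K)
    kv = Kf.T @ V                    # (d, d_v) summary of keys and values
    norm = Qf @ Kf.sum(axis=0)       # per-query normalisation
    return (Qf @ kv) / norm[:, None]
```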


DAY/NIGHT: Learning to Learn Faster from Human Feedback with Language Model Predictive Control

Our robots can be taught new tasks using pre-trained foundation models (LLMs and VLMs) with in-context learning (ICL), which enables both high-level and low-level teaching.


RT-Sketch: Goal-Conditioned Imitation Learning from Hand-Drawn Sketches

RT-Sketch introduces hand-drawn sketches as a new way to specify goals for robots. A sketch is a clear yet flexible goal representation, helping the robot understand the task and handle ambiguity better than goal images or language instructions alone.
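
For readers curious what “goal-conditioned imitation learning” means in practice, here is a minimal sketch of the core idea: the policy sees both the current camera image and the goal sketch, and is trained to reproduce the demonstrator’s action. All names and shapes are illustrative, not RT-Sketch’s actual interface.

```python
import numpy as np

def behavioural_cloning_loss(policy, batch):
    """Goal-conditioned imitation learning, reduced to its core idea.

    batch["image"]  : what the robot currently sees
    batch["sketch"] : hand-drawn sketch of the desired final scene
    batch["action"] : the action the human demonstrator took
    """
    predicted = policy(batch["image"], batch["sketch"])
    # Train the policy to reproduce the demonstrated action,
    # conditioned on the sketched goal.
    return np.mean((predicted - batch["action"]) ** 2)
```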


Scaling up learning across many different robot types

Together with partners from 33 academic labs, we have pooled data from 22 different robot types to create the Open X-Embodiment dataset and the RT-X model.


Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions

In this work, we present a scalable reinforcement learning method for training multi-task policies from large offline datasets that can leverage both human demonstrations and autonomously collected data. Our method uses a Transformer to provide a scalable representation for Q-functions trained via offline temporal difference backups. We therefore refer to the method as Q-Transformer. 
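
As a rough picture of the “offline temporal difference backup” part, here is a one-step TD target for a batch of transitions over discretised action bins. This is a generic, hedged illustration under assumed array shapes, not the paper’s per-dimension autoregressive scheme.

```python
import numpy as np

def td_targets(rewards, next_q, dones, gamma=0.98):
    """One-step temporal-difference targets.

    rewards : (B,)  scalar rewards
    next_q  : (B, num_action_bins)  Q-values over discretised action bins
              at the next state, from a target network
    dones   : (B,)  1.0 where the episode ended, else 0.0
    """
    # Bootstrap from the best action bin at the next state; terminal
    # transitions keep only the reward.
    return rewards + gamma * (1.0 - dones) * next_q.max(axis=-1)
```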


RT-2: Vision-Language-Action Models

Robotic Transformer 2 (RT-2) is a novel vision-language-action (VLA) model that learns from both web and robotics data, and translates this knowledge into generalised instructions for robotic control.
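
One way to picture how a vision-language model can drive a robot is to represent each action as a short string of integer tokens, which the model emits like ordinary text and a controller decodes back into continuous commands. The snippet below is a hypothetical sketch of that decoding step; the bin count and value range are assumptions, not RT-2’s exact scheme.

```python
def decode_action_tokens(action_string: str, num_bins: int = 256) -> list[float]:
    """Map a token string such as "132 114 128 5 25 156 255" back to
    continuous action values in [-1, 1]. Purely illustrative."""
    bins = [int(tok) for tok in action_string.split()]
    return [2.0 * b / (num_bins - 1) - 1.0 for b in bins]
```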