https://www.deepmind.com/publications/a-generalist-agent

Paper
Publication
A Generalist Agent

Abstract

Inspired by progress in large-scale language modelling, we apply a
similar approach towards building a single generalist agent beyond
the realm of text outputs. The agent, which we refer to as Gato,
works as a multi-modal, multi-task, multi-embodiment generalist
policy. The same network with the same weights can play Atari,
caption images, chat, stack blocks with a real robot arm and much
more, deciding based on its context whether to output text, joint
torques, button presses, or other tokens. In this report we describe
the model and the data, and document the current capabilities of
Gato.

Authors' notes

During the training phase of Gato, data from different tasks and
modalities are serialised into a flat sequence of tokens, batched,
and processed by a transformer neural network similar to a large
language model. The loss is masked so that Gato only predicts action
and text targets.
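
The serialisation and loss masking described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the function names, the integer token values, and the `(observation_tokens, action_tokens)` episode layout are all assumptions made for the example, and only the agent-experience case (action targets) is shown.

```python
import math

def serialise_episode(timesteps):
    """Flatten (observation_tokens, action_tokens) pairs into one sequence.

    Returns the token list and a parallel 0/1 mask that is 1 only on
    action tokens, i.e. the positions that contribute to the loss.
    (Hypothetical layout, chosen for illustration.)
    """
    tokens, loss_mask = [], []
    for obs_tokens, act_tokens in timesteps:
        tokens.extend(obs_tokens)
        loss_mask.extend([0] * len(obs_tokens))  # observations are not predicted
        tokens.extend(act_tokens)
        loss_mask.extend([1] * len(act_tokens))  # actions are prediction targets
    return tokens, loss_mask

def masked_cross_entropy(target_probs, loss_mask):
    """Mean negative log-likelihood over the masked-in positions only.

    target_probs[i] is the probability the model assigned to the correct
    token at position i.
    """
    total = sum(-math.log(p) for p, m in zip(target_probs, loss_mask) if m)
    count = sum(loss_mask)
    return total / count if count else 0.0

# Two timesteps, each with three observation tokens and one action token.
episode = [([11, 12, 13], [901]), ([14, 15, 16], [902])]
tokens, mask = serialise_episode(episode)
# tokens == [11, 12, 13, 901, 14, 15, 16, 902]
# mask   == [0, 0, 0, 1, 0, 0, 0, 1]

# A model that gives probability 0.5 to every correct token.
loss = masked_cross_entropy([0.5] * len(tokens), mask)
```

Only the two action positions enter the average, so the observation tokens act purely as context.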

When deploying Gato, a prompt, such as a demonstration, is tokenised,
forming the initial sequence. Next, the environment yields the first
observation, which is also tokenised and appended to the sequence.
Gato samples the action vector autoregressively, one token at a time.

Once all tokens comprising the action vector have been sampled
(determined by the action specification of the environment), the
action is decoded and sent to the environment, which steps and yields
a new observation. Then the procedure repeats. The model always sees
all previous observations and actions within its context window of
1024 tokens.
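
The deployment procedure above amounts to a control loop, which might be sketched as follows. This is a hedged illustration rather than Gato's actual code: `rollout`, `sample_action`, the callable-based environment interface, and the toy counter environment and uniform policy are all hypothetical stand-ins. Only the 1024-token context window comes from the text.

```python
import random

CONTEXT_WINDOW = 1024  # context length stated in the text

def sample_action(next_token_probs, context, action_len, rng):
    """Sample action_len tokens one at a time, feeding each back as context."""
    action = []
    for _ in range(action_len):
        window = (context + action)[-CONTEXT_WINDOW:]  # most recent tokens only
        probs = next_token_probs(window)               # dict: token -> probability
        tokens, weights = zip(*sorted(probs.items()))
        action.append(rng.choices(tokens, weights=weights, k=1)[0])
    return action

def rollout(prompt_tokens, env_reset, env_step, tokenise, decode,
            next_token_probs, action_len, max_steps, seed=0):
    """Run the prompt -> observe -> sample action -> step loop."""
    rng = random.Random(seed)
    sequence = list(prompt_tokens)        # the tokenised prompt starts the sequence
    sequence += tokenise(env_reset())     # first observation is appended
    for _ in range(max_steps):
        action_tokens = sample_action(next_token_probs, sequence, action_len, rng)
        sequence += action_tokens
        observation, done = env_step(decode(action_tokens))
        if done:
            break
        sequence += tokenise(observation)
    return sequence

# Toy stand-ins so the sketch runs end to end (all hypothetical).
def make_counter_env(limit):
    state = {"t": 0}
    def reset():
        state["t"] = 0
        return 0
    def step(action):
        state["t"] += 1
        return state["t"], state["t"] >= limit
    return reset, step

def uniform_policy(window):
    return {100: 0.5, 101: 0.5}  # two possible action tokens, equally likely

reset, step = make_counter_env(limit=3)
sequence = rollout(
    prompt_tokens=[1, 2],
    env_reset=reset, env_step=step,
    tokenise=lambda obs: [10 + obs],
    decode=lambda toks: toks,
    next_token_probs=uniform_policy,
    action_len=2, max_steps=10,
)
```

Here `action_len` plays the role of the environment's action specification: it fixes how many tokens are sampled before the action is decoded and sent to the environment.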

Gato is trained on a large number of datasets comprising agent
experience in both simulated and real-world environments, in addition
to a variety of natural language and image datasets. The number of
tasks on which the pre-trained Gato model performs above a given
percentage of expert score, grouped by domain, is shown here.

[Figure: number of tasks above a given percentage of expert score, grouped by domain]
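
As a rough illustration of the quantity summarised above, the sketch below counts, per domain, the tasks whose score reaches a chosen percentage of the expert score. It assumes a plain ratio of agent score to a positive expert score, which may differ from the normalisation used in the paper. The function name, the domain labels, and all numbers are made up for the example and are not Gato results.

```python
from collections import Counter

def tasks_above_threshold(results, threshold_percent):
    """Count tasks per domain that reach threshold_percent of expert score.

    results: iterable of (domain, agent_score, expert_score) tuples,
    with expert_score assumed to be positive.
    """
    counts = Counter()
    for domain, agent_score, expert_score in results:
        if 100.0 * agent_score / expert_score >= threshold_percent:
            counts[domain] += 1
    return dict(counts)

# Made-up scores for three tasks in two placeholder domains.
results = [
    ("domain_a", 80.0, 100.0),
    ("domain_a", 30.0, 100.0),
    ("domain_b", 55.0, 100.0),
]
counts = tasks_above_threshold(results, threshold_percent=50)
# counts == {"domain_a": 1, "domain_b": 1}
```

Sweeping `threshold_percent` over a range of values gives one count per domain at each threshold.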

The following images also show how the pre-trained Gato model with
the same weights can do image captioning, engage in an interactive
dialogue, and control a robot arm, among many other tasks.

[Figures: image captioning, interactive dialogue, and robot arm control]
Authors
Scott Reed, Konrad Zolna, Emilio Parisotto, Sergio Gomez Colmenarejo,
Alexander Novikov, Gabriel Barth-Maron, Mai Gimenez, Yury Sulsky,
Jackie Kay, Jost Tobias Springenberg, Tom Eccles, Jake Bruce, Ali
Razavi, Ashley Edwards, Nicolas Heess, Yutian Chen, Raia Hadsell,
Oriol Vinyals, Mahyar Bordbar, and Nando de Freitas
Venue
arXiv
Published
May 12, 2022
Tags
Games, Language, Control & robotics