[HN Gopher] Voyager: An Open-Ended Embodied Agent with LLMs
       ___________________________________________________________________
        
       Voyager: An Open-Ended Embodied Agent with LLMs
        
       Author : jedixit
       Score  : 76 points
       Date   : 2023-05-26 16:02 UTC (6 hours ago)
        
 (HTM) web link (voyager.minedojo.org)
 (TXT) w3m dump (voyager.minedojo.org)
        
       | lsy wrote:
       | I'm not sure how the authors arrive at the idea that this agent
       | is embodied or open-ended. It is sending API calls to Minecraft,
       | there's no "body" involved except as a symbolic concept in a game
       | engine, and the fact that Minecraft is a video game with a
       | limited variety of behaviors (and the authors give the GPT an
       | "overarching goal" of novelty) precludes open-endedness. To me
       | this feels like an example of the ludic fallacy. Spitting out
       | "bot.equip('sword')" requires a lot of non-LLM work to be done on
       | the back end of that call to actually translate to game
       | mechanics, and it doesn't indicate that the LLM understands
       | anything about what it "really" means to equip a sword, or that
       | it would be able to navigate a real-world environment with swords
       | etc.
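       | 
       | For what it's worth, the real Mineflayer call is a bit more
       | involved than that one-liner. A rough sketch of what a generated
       | "skill" has to look like (assuming the standard Mineflayer API;
       | illustrative only, not code from the paper):
       | 
       |     // connect a bot and equip a sword if one is in the inventory
       |     const mineflayer = require('mineflayer')
       |     const bot = mineflayer.createBot({
       |       host: 'localhost',
       |       username: 'voyager'
       |     })
       | 
       |     async function equipSword() {
       |       // the LLM only emits code like this; Mineflayer does the
       |       // inventory lookup and the packet-level work underneath
       |       const sword = bot.inventory.items()
       |         .find(item => item.name.includes('sword'))
       |       if (!sword) {
       |         bot.chat('No sword in inventory.')
       |         return
       |       }
       |       await bot.equip(sword, 'hand')
       |     }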
        
       | jamilton wrote:
       | I wonder how many prompts this uses in a minute.
       | 
       | Interestingly, they mod the server so that the game pauses while
       | waiting for a response from GPT-4. That's a nice way to get
       | around the delays.
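       | 
       | Presumably the control flow is just a pause/resume wrapper
       | around each model call. A hypothetical sketch, where pauseWorld,
       | resumeWorld, and callGPT4 stand in for whatever the mod and
       | their client actually expose:
       | 
       |     // freeze the game while the slow GPT-4 round trip happens,
       |     // so the agent's world state doesn't drift mid-request
       |     async function queryLLM(prompt) {
       |       await pauseWorld()               // hypothetical mod hook
       |       try {
       |         return await callGPT4(prompt)  // hypothetical API call
       |       } finally {
       |         await resumeWorld()            // hypothetical mod hook
       |       }
       |     }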
        
       | tehsauce wrote:
       | This is very cool despite the most important caveat:
       | 
       | "Note that we do not directly compare with prior methods that
       | take Minecraft screen pixels as input and output low-level
       | controls [54-56]. It would not be an apple-to-apple comparison,
       | because we rely on the high-level Mineflayer [53] API to control
       | the agent. Our work's focus is on pushing the limits of GPT-4 for
       | lifelong embodied agent learning, rather than solving the 3D
       | perception or sensorimotor control problems. VOYAGER is
       | orthogonal and can be combined with gradient-based approaches
       | like VPT [8] as long as the controller provides a code API."
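       | 
       | To make the caveat concrete: "high-level Mineflayer API" means
       | the agent writes small functions against calls like
       | bot.findBlock and bot.dig, rather than emitting keyboard and
       | mouse actions from raw pixels the way VPT does. A rough
       | illustration (assuming the standard Mineflayer API, not taken
       | from the paper):
       | 
       |     // dig the nearest log within reach: a single "skill", with
       |     // the perception and motor control hidden behind the API
       |     async function mineNearbyLog(bot) {
       |       const log = bot.findBlock({
       |         matching: (block) => block.name.includes('log'),
       |         maxDistance: 4   // within reach, so no pathfinding needed
       |       })
       |       if (!log) {
       |         bot.chat('No log nearby.')
       |         return
       |       }
       |       // Mineflayer handles aiming, swing timing, and the packets
       |       await bot.dig(log)
       |     }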
        
       | startupsfail wrote:
       | TLDR: An AI system with an IQ of 110-130 can, with some careful
       | prompting, generate code to play Minecraft through an API.
        
         | andai wrote:
         | >130 IQ
         | 
         | >mines straight up and down
        
           | startupsfail wrote:
           | I'm not sure what point you are making. GPT-4 does tend to
           | pass various IQ tests with scores in the range of 110 to
           | 130, with outliers between 90 and 150.
        
             | notamy wrote:
             | It's a joke about playing the game "right." Mining straight
             | up/down is a rather suboptimal strategy, as:
             | 
             | - mining straight up means you either seal your path behind
             | you, or are limited in how high up you can go
             | 
             | - mining straight down likely traps you in a pit
             | 
             | - mining straight down far enough can drop you straight
             | into lava, as many Minecraft players learn early on
        
       | bartwr wrote:
       | This is kind of amazing given that, obviously, GPT-4 never
       | contained such tasks and data. I think it puts an end to the
       | claim that "language models are only stochastic parrots and
       | cannot do any reasoning". No, this is 100% a form of reasoning
       | and furthermore, learning that is more similar to how humans
       | learn (gradient-less).
       | 
       | I still don't understand it and it blows my mind - how such
       | properties emerge just from compressing the task of next word
       | prediction. (Yes, I know this is an oversimplification, but not
       | a misleading one.)
        
         | asperous wrote:
         | While I do believe LLMs can perform some reasoning, I'm not
         | sure this is the best example as all the reasoning you would
         | ever need for Minecraft is well contained in the data set used
         | to train it. A lot has been written about Minecraft.
         | 
         | To me, it would be more convincing if they developed an entirely
         | new game with somewhat novel and arbitrary rules and saw if the
         | embodied agent could learn this game.
        
         | notamy wrote:
         | Looking at the paper, as I understand it they're using
         | Mineflayer https://github.com/PrismarineJS/mineflayer and
         | passing parts of the game state as JSON to the LLM, which then
         | generates code to complete tasks.
         | 
         | > I still don't understand it and it blows my mind - how such
         | properties emerge just from compressing the task of next word
         | prediction.
         | 
         | The Mineflayer library is very popular, so all the relevant
         | tasks are likely already extant in the training data.
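         | 
         | Roughly the shape of that loop as I read the paper. Names like
         | voyagerStep and callGPT4 are mine, not theirs, and the prompt
         | is heavily abbreviated:
         | 
         |     // observation -> prompt -> generated JS -> run on the bot
         |     async function voyagerStep(bot, task) {
         |       const observation = {
         |         health: bot.health,
         |         inventory: bot.inventory.items()
         |           .map(item => ({ name: item.name, count: item.count }))
         |       }
         |       const prompt =
         |         'You control a Mineflayer bot. Current state:\n' +
         |         JSON.stringify(observation) +
         |         '\nTask: ' + task +
         |         '\nReply with the body of an async JS function.'
         |       const generatedCode = await callGPT4(prompt)  // hypothetical
         |       const skill = new Function(
         |         'bot', `return (async () => { ${generatedCode} })()`)
         |       await skill(bot)  // run the generated code on the live bot
         |     }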
        
         | emptysongglass wrote:
         | You declare:
         | 
         | > I think it puts an end to the claim that "language models are
         | only stochastic parrots and cannot do any reasoning".
         | 
         | But then two sentences later:
         | 
         | > I still don't understand it and it blows my mind
         | 
         | I've said this before to others and it bears repeating because
         | your line of thinking is dangerous (not in the sense of a
         | sudden AI cataclysm): to feel so totally qualified to make such
         | a statement armed
         | with ignorance, not knowledge, is the cause of mass hysteria
         | around LLMs.
         | 
         | What is happening can be understood without resorting to the
         | sort of magical thinking that ascribes agency to these models.
        
           | godelski wrote:
           | > What is happening can be understood without resorting to
           | the sort of magical thinking that ascribes agency to these
           | models.
           | 
           | This is what has (as an ML researcher) made me hate
           | conversations around ML/AI recently. Honestly getting me
           | burned out on an area of research I truly love and am
           | passionate about. A lot of technical people openly and
           | confidently are talking about magic. Talking as if the model
           | didn't have access to relevant information (the "zero-shot
           | myth") and other such nonesense. It is one thing for a layman
           | to say these things, but another to see them on the top
           | comment on a website aimed at people with high tech literacy.
           | And even worse to see it coming from my research peers. These
           | models are impressive, and I don't want to diminish that (I
           | shouldn't have to say this sentence but here we are), but we
           | have to be clear that the models aren't magic either. We know
           | a lot about how they work too. They aren't black boxes, they
           | are opaque, and every day we reduce the opacity.
           | 
           | For clarity: here's an alternative explanation of the results
           | that's even weaker than the paper's settings (explains
           | autogpt better). LLM has a good memory. LLM is told (or can
           | infer through relevant information like keywords: "diamond
           | axe") that it is in a minecraft setting. It then looks up a
           | compressed version of a player's guide that was part of its
           | training data. It then uses that data to execute goals. This
           | is still an impressive feat! But it is still in line with the
           | stochastic parrot paradigm. I'm not sure why people think
           | stochastic parrots aren't impressive. They are.
           | 
           | But right now ML/AI culture feels like Anime or weed culture.
           | The people it attracts make you feel embarrassed to be
           | associated with it.
        
             | og_kalu wrote:
             | >LLM has a good memory. LLM is told (or can infer through
             | relevant information like keywords: "diamond axe") that it
             | is in a minecraft setting. It then looks up a compressed
             | version of a player's guide that was part of its training
             | data. It then uses that data to execute goals.
             | 
             | What about any of what you've just said screams parrot to
             | you?
             | 
             | I mean, here is how the person who coined the term
             | describes it.
             | 
             | A "stochastic parrot", according to Bender, is an entity
             | "for haphazardly stitching together sequences of linguistic
             | forms ... according to probabilistic information about how
             | they combine, but without any reference to meaning."
             | 
             | So... what exactly from what you've just stated implies the
             | above meaning?
        
               | [deleted]
        
         | godelski wrote:
         | > GPT-4 never contained such tasks and data
         | 
         | No task, but we need to be clear that it did have the data.
         | Remember that GPT4 was trained on a significant portion of the
         | internet, which likely includes sites like Reddit and game fact
         | websites. So there's a good chance GPT4 learned the tech tree
         | and was trained on data about how to progress up that tree,
         | including speed runner discussions. (also remember that as of
         | March GPT4 is also trained on images, not just text)
         | 
         | What data it was trained on is very important and I'm not sure
         | why we keep coming back to this issue. "GPT4 has no zero-shot
         | data" should be as drilled into everyone's head as sayings like
         | "correlation does not equate to causation" and "garbage in,
         | garbage out". Maybe people do not know this data is on the
         | internet? But I'd be surprised if the average HN user thought
         | that way.
         | 
         | This doesn't make the paper less valuable or meaningful. But it
         | is more like watching a 10-year-old who's read every chess book
         | and played against computers beat (or do really well against) a
         | skilled player vs. a 10-year-old who's never heard of chess
         | beating a skilled player. Both are still impressive; one just
         | seems like magic though and should raise suspicion.
        
       ___________________________________________________________________
       (page generated 2023-05-26 23:01 UTC)