[HN Gopher] Voyager: An Open-Ended Embodied Agent with LLMs
___________________________________________________________________
Voyager: An Open-Ended Embodied Agent with LLMs
Author : jedixit
Score : 76 points
Date : 2023-05-26 16:02 UTC (6 hours ago)
(HTM) web link (voyager.minedojo.org)
(TXT) w3m dump (voyager.minedojo.org)
| lsy wrote:
| I'm not sure how the authors arrive at the idea that this agent
| is embodied or open-ended. It is sending API calls to Minecraft;
| there's no "body" involved except as a symbolic concept in a game
| engine, and the fact that Minecraft is a video game with a
| limited variety of behaviors (and the authors give the GPT an
| "overarching goal" of novelty) precludes open-endedness. To me
| this feels like an example of the ludic fallacy. Spitting out
| "bot.equip('sword')" requires a lot of non-LLM work to be done on
| the back end of that call to actually translate to game
| mechanics, and it doesn't indicate that the LLM understands
| anything about what it "really" means to equip a sword, or that
| it would be able to navigate a real-world environment with swords
| etc.
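|
| To illustrate: even that one-liner is really several steps in
| Mineflayer, since bot.equip takes an Item instance rather than
| a string. A rough sketch (the helper name is made up):
|
|     // The "sword" must first be located in the inventory.
|     async function equipSword(bot) {
|       const sword = bot.inventory
|         .items()
|         .find((i) => i.name.includes('sword'));
|       if (!sword) throw new Error('no sword in inventory');
|       // Mineflayer handles the slot and window clicks underneath.
|       await bot.equip(sword, 'hand');
|     }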
| jamilton wrote:
| I wonder how many prompts this uses in a minute.
|
| Interestingly, they mod the server so that the game pauses while
| waiting for a response from GPT-4. That's a nice way to get
| around the delays.
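|
| Presumably the client-side flow is something like the sketch
| below, where /pause and /resume are made-up stand-ins for
| whatever commands the server mod actually exposes:
|
|     // Freeze the game while the GPT-4 request is in flight.
|     async function withPausedGame(bot, llmCall) {
|       bot.chat('/pause'); // hypothetical mod command
|       try {
|         return await llmCall(); // e.g. a GPT-4 completion call
|       } finally {
|         bot.chat('/resume'); // hypothetical mod command
|       }
|     }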
| tehsauce wrote:
| This is very cool, though note the most important caveat:
|
| "Note that we do not directly compare with prior methods that
| take Minecraft screen pixels as input and output low-level
| controls [54-56]. It would not be an apple-to-apple comparison,
| because we rely on the high-level Mineflayer [53] API to control
| the agent. Our work's focus is on pushing the limits of GPT-4 for
| lifelong embodied agent learning, rather than solving the 3D
| perception or sensorimotor control problems. VOYAGER is
| orthogonal and can be combined with gradient-based approaches
| like VPT [8] as long as the controller provides a code API."
| startupsfail wrote:
| TL;DR: an AI system with an IQ of 110-130 can, with some careful
| prompting, generate code to play Minecraft through an API.
| andai wrote:
| >130 IQ
|
| >mines straight up and down
| startupsfail wrote:
| I'm not sure what point you are making. GPT-4 does tend to
| pass various IQ tests with scores in the range of 110 to 130,
| and outliers between 90 and 150.
| notamy wrote:
| It's a joke about playing the game "right." Mining straight
| up/down is a rather suboptimal strategy, as:
|
| - mining straight up means you either seal your path behind
| you, or are limited how high up you can go
|
| - mining straight down likely traps you in a pit
|
| - mining straight down far enough can drop you straight
| into lava, as many Minecraft players learn early on
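|
| For what it's worth, a generated Mineflayer skill can guard
| against the lava case. A rough sketch, assuming a connected
| bot:
|
|     async function digDownOneBlock(bot) {
|       const feet = bot.entity.position;
|       const below = bot.blockAt(feet.offset(0, -1, 0));
|       const twoBelow = bot.blockAt(feet.offset(0, -2, 0));
|       if (!below || !twoBelow) return; // chunk not loaded
|       if (twoBelow.name.includes('lava')) {
|         bot.chat('Lava below, stopping.'); // the classic death
|         return;
|       }
|       await bot.dig(below); // dig the block we stand on
|     }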
| bartwr wrote:
| This is kind of amazing given that, obviously, GPT-4 never
| contained such tasks and data. I think it puts an end to the
| claim that "language models are only stochastic parrots and
| cannot do any reasoning". No, this is 100% a form of reasoning
| and, furthermore, a form of learning that is more similar to
| how humans learn (gradient-free).
|
| I still don't understand it and it blows my mind - how such
| properties emerge just from compressing the task of next-word
| prediction. (Yes, I know this is an oversimplification, but not
| a misleading one.)
| asperous wrote:
| While I do believe LLMs can perform some reasoning, I'm not
| sure this is the best example, as all the reasoning you would
| ever need for Minecraft is well contained in the data set used
| to train it. A lot has been written about Minecraft.
|
| To me, it would be more convincing if they developed an entirely
| new game with somewhat novel and arbitrary rules and saw whether
| the embodied agent could learn it.
| notamy wrote:
| Looking at the paper, as I understand it they're using
| Mineflayer https://github.com/PrismarineJS/mineflayer and
| passing parts of the game state as JSON to the LLM, which then
| generates code to complete tasks.
|
| > I still don't understand it and it blows my mind - how such
| properties emerge just from compressing the task of next word
| prediction.
|
| The Mineflayer library is very popular, so all the relevant
| tasks are likely already extant in the training data.
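|
| For a rough idea of the shape of that observation (a sketch,
| not the paper's actual prompt format):
|
|     const mineflayer = require('mineflayer');
|     const bot = mineflayer.createBot({
|       host: 'localhost',
|       username: 'voyager',
|     });
|
|     bot.once('spawn', () => {
|       // Snapshot the state Mineflayer exposes...
|       const observation = {
|         position: bot.entity.position,
|         health: bot.health,
|         timeOfDay: bot.time.timeOfDay,
|         inventory: bot.inventory
|           .items()
|           .map((i) => ({ name: i.name, count: i.count })),
|       };
|       // ...and serialize it for interpolation into the prompt.
|       console.log(JSON.stringify(observation, null, 2));
|     });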
| emptysongglass wrote:
| You declare:
|
| > I think it puts an end to the claim that "language models are
| only stochastic parrots and cannot do any reasoning".
|
| But then two sentences later:
|
| > I still don't understand it and it blows my mind
|
| I've said this before to others and it bears repeating, because
| your line of thinking is dangerous (not in the sudden-AI-
| cataclysm sense): feeling so totally qualified to make such a
| statement armed with ignorance, not knowledge, is the cause of
| the mass hysteria around LLMs.
|
| What is happening can be understood without resorting to the
| sort of magical thinking that ascribes agency to these models.
| godelski wrote:
| > What is happening can be understood without resorting to
| the sort of magical thinking that ascribes agency to these
| models.
|
| This is what has made me (as an ML researcher) hate
| conversations around ML/AI recently. It's honestly getting me
| burned out on an area of research I truly love and am
| passionate about. A lot of technical people are openly and
| confidently talking about magic - talking as if the model
| didn't have access to relevant information (the "zero-shot
| myth") - and other such nonsense. It is one thing for a layman
| to say these things, but another to see them in the top
| comment on a website aimed at people with high tech literacy.
| And even worse to see it coming from my research peers. These
| models are impressive, and I don't want to diminish that (I
| shouldn't have to say this sentence but here we are), but we
| have to be clear that the models aren't magic either. We know
| a lot about how they work too. They aren't black boxes so much
| as opaque, and every day we reduce that opacity.
|
| For clarity, here's an alternative explanation of the results
| that's even weaker than the paper's setting (it explains
| AutoGPT better): the LLM has a good memory. It is told (or can
| infer through relevant keywords like "diamond axe") that it is
| in a Minecraft setting. It then looks up a compressed version
| of a player's guide that was part of its training data and
| uses that data to execute goals. This is still an impressive
| feat! But it is still in line with the stochastic parrot
| paradigm. I'm not sure why people think stochastic parrots
| aren't impressive. They are.
|
| But right now ML/AI culture feels like anime or weed culture:
| the people it attracts make you feel embarrassed to be
| associated with it.
| og_kalu wrote:
| > The LLM has a good memory. It is told (or can infer through
| relevant keywords like "diamond axe") that it is in a Minecraft
| setting. It then looks up a compressed version of a player's
| guide that was part of its training data and uses that data to
| execute goals.
|
| What about any of what you've just said screams "parrot" to
| you?
|
| I mean, here is how Emily Bender, who coined the term,
| describes it:
|
| A "stochastic parrot", according to Bender, is an entity
| "for haphazardly stitching together sequences of linguistic
| forms ... according to probabilistic information about how
| they combine, but without any reference to meaning."
|
| So... what exactly from what you've just stated implies the
| above meaning?
| [deleted]
| godelski wrote:
| > GPT-4 never contained such tasks and data
|
| No task, but we need to be clear that it did have the data.
| Remember that GPT-4 was trained on a significant portion of the
| internet, which likely includes sites like Reddit and game fact
| websites. So there's a good chance GPT-4 learned the tech tree
| and was trained on data about how to progress up that tree,
| including speed-runner discussions. (Also remember that, as of
| March, GPT-4 is trained on images as well, not just text.)
|
| What data it was trained on is very important, and I'm not sure
| why we keep coming back to this issue. "GPT-4 is not doing this
| zero-shot" should be as drilled into everyone's head as sayings
| like "correlation does not imply causation" and "garbage in,
| garbage out". Maybe people do not know this data is on the
| internet? But I'd be surprised if the average HN user thought
| that way.
|
| This doesn't make the paper less valuable or meaningful. But it
| is more like watching a 10-year-old who's read every chess book
| and played against computers beat (or do really well against) a
| skilled player, vs a 10-year-old who's never heard of chess
| beating one. Both are still impressive; one just seems like
| magic and should raise suspicion.
___________________________________________________________________
(page generated 2023-05-26 23:01 UTC)