[HN Gopher] Can language models serve as text-based world simula...
___________________________________________________________________
Can language models serve as text-based world simulators?
Author : mpweiher
Score : 68 points
Date : 2024-06-15 12:25 UTC (10 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| Bluestein wrote:
| Nethack FTW :)
| ziggy_star wrote:
| Seriously, ha, time is a flat circle.
| anthk wrote:
| I still love SLASH'EM.
| xrd wrote:
| I made a D&D game for my kids using llama. It went surprisingly
| well. Lots of potential here.
|
| https://blog.katarismo.com/2023-05-26-i-m-a-dad-i-replaced-m...
| klaussilveira wrote:
| This is neat. In theory, you could hook llama.cpp into a
| GOAP-based planner (https://www.gamedevs.org/uploads/three-
| states-plan-ai-of-fea...) and have much better default bots
| navigating your nav mesh. Even better, if you record player
| actions as GOAP actions within the nav mesh, you can use that
| to fine-tune the model. Or even feed it back in real time so
| they learn the modus operandi of the player.
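|
| (For reference, a GOAP action is just preconditions plus
| effects over world state; a minimal sketch, with field names
| of my own invention:)
|
|     from dataclasses import dataclass
|
|     @dataclass
|     class GoapAction:
|         # One planner action; the fields are illustrative.
|         name: str
|         preconditions: dict  # e.g. {"has_ammo": False}
|         effects: dict        # e.g. {"has_ammo": True}
|         cost: float = 1.0
|
|     # A recorded player behavior can be expressed the same
|     # way and used later as fine-tuning data:
|     reload = GoapAction(
|         name="reload",
|         preconditions={"has_ammo": False, "has_clip": True},
|         effects={"has_ammo": True},
|     )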
| skybrian wrote:
| Maybe they can, but they're trained on conversation and stories
| (among other things) rather than simulations. These are more
| general than the log of a simulation run in how they use time.
| Stories can be chronological, but they can also fill in the
| details in any order using things like flashbacks. Or they can
| get really complicated with time travel stories.
|
| So it seems like to understand a story well, an LLM would need a
| more sophisticated notion of time, where there is a history and
| events are slotted in as they're learned from the narrative, and
| sometimes the whole history gets reinterpreted based on new
| information. (Plot twist!)
|
| It would be fascinating if some mechanistic interpretability
| researcher figured out how they actually work with story time.
| Are there the rudiments of that kind of understanding yet?
| lyu07282 wrote:
| TL;DR: unfortunately, the conclusion is no, they can't.
|
| > the best recorded performance is 59.9% on accurately simulating
| state transitions [...] after 10 steps, average simulation
| accuracy would reduce to less than 1%. Our results indicate that
| LLMs are not yet able to reliably act as text world simulators
| guestbest wrote:
| I think there is a MUD effort called LlamaTale which allows
| creating this kind of thing as telepresence. I'm trying to
| put a MUD together to try it out.
| firtoz wrote:
| Would be interesting to see a version of this but incorporating
| tool usage features of the latest models.
| gfosco wrote:
| Not by themselves... but with a good program built around it,
| managing the actual state, and very careful prompting, yes.
| I've been thinking about this for a while; I've always had
| the desire to make a game.
| bongodongobob wrote:
| Exactly. I've been working on something like this for a while
| using finite state machines to control the prompts. The biggest
| struggle I've had is creating memory. It's one thing to save a
| list of events and feed it into the prompts, but there are
| always issues interpreting it and making sure the memories are
| detailed enough but not an entire page.
|
| For example, you "conquered the dungeon in level 1". If this
| gets saved as "conquered the dungeon" the next time you get to
| a new dungeon, it may think you already beat it and won't
| generate NPC monsters, that kind of thing.
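|
| One thing that might help (an untested sketch, all names
| mine): store memories as structured records with an explicit
| scope rather than free prose, and only render the ones
| scoped to the current location into the prompt:
|
|     from dataclasses import dataclass
|
|     @dataclass
|     class Memory:
|         event: str   # "conquered_dungeon"
|         where: str   # "level_1" - the scope to preserve
|         detail: str  # one-line summary for the prompt
|
|     memories = [Memory("conquered_dungeon", "level_1",
|                        "Cleared the level 1 dungeon.")]
|
|     def memories_for(location, memories):
|         # A level 2 dungeon won't look already-beaten,
|         # because level 1 memories stay out of its prompt.
|         return [m.detail for m in memories
|                 if m.where == location]
|
|     print(memories_for("level_2", memories))  # []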
| chumpro wrote:
| I am also working on something similar. I have written a
| grammar system for controlling the LLM that is not context
| free.
|
| One of the main challenges is to have a database of what is
| reality (id -> object), and then have the LLM return ids for
| objects referenced by the user. I am doing a system where
| something that has no id has not been created yet.
|
| I use story templates with labeled segments to create
| characters, items, and locations as well as events. Example:
| "Once upon a time there was a <role>cowardly knight</role>
| named <name>Billy Bonkers</name>." The named qualities become
| the data structure associated with the id of the new
| character. It can be hilarious to prompt the LLM to do nerdy
| space humor!
|
| It seems to be very possible to create a "narrator driven"
| game happening in a "real" world.
|
| I plan on using a multimodal LLM, and everything that is
| created will have an image. Combined with id-less objects
| that exist but don't yet have a detailed description, that
| will make the qualities and objects visible in the created
| images part of the world as well.
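|
| The extraction step is roughly this (a simplified sketch;
| the names are made up):
|
|     import itertools
|     import re
|
|     _ids = itertools.count(1)
|
|     def parse_template(filled):
|         # Pull <tag>value</tag> segments out of a filled-in
|         # story template and attach them to a fresh id.
|         pairs = re.findall(r"<(\w+)>(.*?)</\1>", filled)
|         return {"id": next(_ids), **dict(pairs)}
|
|     story = ("Once upon a time there was a <role>cowardly "
|              "knight</role> named <name>Billy Bonkers"
|              "</name>.")
|     print(parse_template(story))
|     # {'id': 1, 'role': 'cowardly knight',
|     #  'name': 'Billy Bonkers'}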
| anthk wrote:
| Inform 7 is a beast, and the resulting game can run on a
| 486. No AI required.
| d13 wrote:
| Try it!
|
| All you need to do is type this into Llama 3 8B or 70B fp16
| instruct:
|
| "You are a text adventure game."
|
| Done.
| sbotzek wrote:
| I tried for a few months getting ChatGPT4 to work with a MUD. In
| my experience it's not very good at that particular task.
|
| One problem I ran into was its ability to logically connect
| rooms. In a MUD you navigate by going north, south, east,
| west, up, or down. Not every room lets you go in every
| direction. And usually, if you go east, you can go west from
| your new room. Rarely, a level creator will make this not be
| true. ChatGPT4 was pretty bad at it though.
|
| Another problem was descriptions. It might mention a mountain in
| the distance once. But then never again. So this giant landmark
| was described in a single room.
|
| It was also difficult to get it to create a fair quantity of
| secrets in logical places. Lots of times it would just chain
| together multiple secrets in a single place. If you have
| more than one, you want to spread them around.
|
| And finally, room layout. It tended to not be very good at
| this. Lots of linear layouts. It didn't have an eye for when
| details should become complex rooms and when they can just be
| a line in a description.
|
| So it could do it, but it created levels that weren't very fun or
| particularly creative, even when it came to room descriptions.
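|
| For the room-connection problem specifically, one obvious
| mitigation is to keep the map outside the model entirely
| and enforce the reciprocity rule in code; a minimal sketch:
|
|     OPPOSITE = {"north": "south", "south": "north",
|                 "east": "west", "west": "east",
|                 "up": "down", "down": "up"}
|
|     rooms = {}  # room id -> {direction: room id}
|
|     def link(a, direction, b, one_way=False):
|         # Reciprocal by default; a level creator has to
|         # opt in to the rare one-way exit.
|         rooms.setdefault(a, {})[direction] = b
|         if not one_way:
|             rooms.setdefault(b, {})[OPPOSITE[direction]] = a
|
|     link("meadow", "east", "cave")
|     assert rooms["cave"]["west"] == "meadow"
|
| The model then only ever writes prose about rooms the graph
| says exist.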
| danjc wrote:
| Somewhat related, Yann LeCun posted a few months back about how
| many concepts aren't understood through language and therefore
| can't be modeled through it (which is why LLMs are terrible
| with things like position and direction).
|
| https://x.com/ylecun/status/1768353714794901530?s=46
| Folcon wrote:
| To people who claim that "thinking and reasoning require
| language", here is a problem: Imagine standing at the North
| Pole of the Earth. Walk in any direction, in a straight
| line, for 1 km. Now turn 90 degrees to the left. Walk for
| as long as it takes to pass your starting point. Have you
| walked:
|
|   1. More than 2xPi km
|   2. Exactly 2xPi km
|   3. Less than 2xPi km
|   4. I never came close to my starting point.
|
| Think about how you tried to answer this question and tell
| us whether it was based on language.
|
| Just quoting this here in case anything happens to the
| tweet...
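|
| (If you'd rather check the geometry numerically than
| verbally, here's a quick sketch of the "circle round the
| pole" reading, assuming a spherical Earth:)
|
|     import math
|
|     R = 6371.0  # mean Earth radius in km (assumed)
|     d = 1.0     # distance walked from the pole, km
|
|     # The circle of latitude d km from the pole has radius
|     # R*sin(d/R), slightly smaller than d itself.
|     walked = 2 * math.pi * R * math.sin(d / R)
|     print(walked < 2 * math.pi * d)  # True, by a hair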
|
| I agree with this, but I have one tiny nitpick (feel free to
| tell me if you think I'm wrong or being overly nitpicky):
| the knowledge of the situation in which the phenomenon he's
| describing occurs is something I learned about entirely from
| language. So my ability to answer the question is based on
| that knowledge.
|
| I'm aware that the reasoning problem itself doesn't utilise
| it, but a position and direction system in and of itself is
| arguably also insufficient.
|
| I suppose I'm wondering if the "setup" counts as needing
| language?
| dragonwriter wrote:
| I don't think the example is a good rebuttal of "thinking
| and reasoning require language".
|
| It _may be_ a decent challenge, probably still not an
| actual rebuttal, of "language is sufficient for all
| thinking and reasoning", but "X is required for Y" and "X
| is sufficient for everything encompassed by Y" are very
| different claims.
| Folcon wrote:
| That's fair, it's not supposed to be a rebuttal of that
| :)...
|
| Though now that you said it, I'm really thinking about
| the original statement.
|
| Honest question, can you give me an example of thinking
| or reasoning that happens fully independently of
| reasoning via symbols or their manipulation?
|
| I feel like I'm missing something obvious, but nothing is
| coming to mind right now :)...
|
| Just thinking about the underlying statement and
| simplifying language down to symbolic expression. (I was
| originally going to say manipulation, but it doesn't feel
| like it quite fits...)
| dragonwriter wrote:
| > Honest question, can you give me an example of thinking
| or reasoning that happens fully independently of
| reasoning via symbols or their manipulation?
|
| I don't think we have anything but subjective, indirect
| understanding of _how_ thinking happens, but I think
| that, at a minimum, what we describe as "reasoning"
| specifically is tightly conceptually related to, if not a
| subset of, manipulation of abstract symbols to which
| concrete experiences may be approximately mapped.
|
| I'm not sure I'd say this is the same thing as language,
| but there's at a minimum a shared common symbol-
| manipulation underlying both. Do the capacities always come
| together? I'm not sure how we would know that: I think our
| ability to recognize reasoning is tied to it being mapped to
| language, and our ability to distinguish something as
| language, rather than nonlinguistic signaling or mimicry of
| something else that is using language, is tied to
| independent expression of reasoning through it.
| erik_seaberg wrote:
| If the brain actually used language to represent ideas in
| use, we should have succeeded in finding its
| https://en.wikipedia.org/wiki/Universal_grammar. Instead it
| seems more like language is a kind of lossy compression we
| keep reinventing for ideas, to export them from a brain in
| some tangible way and get them into another (even in the
| future).
| Folcon wrote:
| Hmm, I'm not sure if that holds?
|
| Just because you use a mechanism for expressing your ideas
| and reasoning doesn't mean that the underlying reality has
| to conform to it in any way?
|
| We invent new symbols and terms all the time as we
| experience new phenomena. A universal grammar may still be
| possible (barring incompleteness anyway); we may just be
| lacking a whole bunch of ideas still...
| bottom999mottob wrote:
| @erik_seaberg inferring from the lack of success in finding
| a universal grammar that none exists is a logical leap. The
| failure to find something does not necessarily mean it
| doesn't exist. Language is powerful for expressing abstract
| ideas without explicitly saying them, which suggests
| language is more than "lossy" compression; it's closer to
| Shannon's lossless compression with prefix codes. I see
| where you're coming from though.
|
| @Folcon If language is a mechanism for expressing ideas and
| reasoning, it should reflect the cognitive processes that
| generate those ideas, so "[...] Just because you use a
| mechanism for expressing your ideas and reasoning doesn't
| mean that the underlying reality has to conform to it in any
| way" is a bit contradictory. Are our cognitive processes not
| included in reality?
|
| The existence of a universal grammar is a specific
| hypothesis that requires empirical evidence. It's tiring
| to hear Chomsky's ideas parroted despite no empirical
| framework to stand on. What ideas could we be lacking?
| This argument is similar to String Theory proponents who
| kept pulling ideas out of the ether to support an
| unsubstantiated theory.
| skywhopper wrote:
| Language is useful to transmit the concepts but is not
| sufficient to actually solve problems with those concepts.
| maxglute wrote:
| This interaction down the chain is interesting:
|
| Q: yann do you have an internal monologue?
|
| A: Not that I know of.
| TeMPOraL wrote:
| My bet is that this is wrong, and at the same time, language
| isn't _required_ - just sufficient. I see concepts as defined
| _only_ through associations with other concepts (which can be
| modeled as proximity in high-dimensional space, and that's
| precisely what LLMs are doing) and, sometimes, through
| memorized sensory data - the latter isn't the typical case,
| but it's needed to make the recursive definition (concepts
| defined in terms of concepts) stay anchored to reality.
|
| From that it follows that written language is enough to
| build that structure of concepts (the latent space, in ML
| terms). So is spoken language. So is vision in general, or
| hearing in general[0]. The brain will build one concept
| space out of all available inputs; it is necessary and
| sufficient to have at least one, but none of them alone is
| itself necessary.
|
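| (Concretely, "proximity in high-dimensional space" is
| something like this; toy vectors, obviously:)
|
|     import math
|
|     # Toy 4-d "concept" vectors; real models use
|     # thousands of dimensions.
|     king  = [0.9, 0.1, 0.8, 0.1]
|     queen = [0.9, 0.1, 0.2, 0.8]
|     toast = [0.0, 0.9, 0.1, 0.1]
|
|     def cosine(a, b):
|         dot = sum(x * y for x, y in zip(a, b))
|         na = math.sqrt(sum(x * x for x in a))
|         nb = math.sqrt(sum(x * x for x in b))
|         return dot / (na * nb)
|
|     print(cosine(king, queen))  # ~0.71: related
|     print(cosine(king, toast))  # ~0.16: unrelated
|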
| --
|
| [0] - Languages are higher-level regularities used for
| communication, growing on top of those senses, but not
| strictly necessary for understanding the real world. I'd use
| people who have no perceptible inner voice and high
| visualization skills as a counter to the idea that concepts
| need to be thought of symbolically in something resembling a
| written or spoken language.
| TillE wrote:
| I think the best thing to do in a scenario like that is
| standard procgen where you randomly, logically generate the
| rooms with a bunch of descriptive tags, and then LLM-ify the
| room descriptions, with some context for what the world/area is
| supposed to be like.
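|
| Illustratively, something like this (the tags and prompt
| are made up):
|
|     import random
|
|     TAGS = ["damp", "collapsed", "torchlit", "ancient"]
|
|     def gen_room(world_context):
|         # Structure and tags come from ordinary procgen...
|         room = {
|             "exits": random.sample(
|                 ["north", "south", "east", "west"], 2),
|             "tags": random.sample(TAGS, 2),
|         }
|         # ...and only the prose is delegated to the model.
|         room["prompt"] = (
|             f"Setting: {world_context}. Write a two-"
|             f"sentence room description. The room is "
|             f"{' and '.join(room['tags'])}; exits lead "
|             f"{' and '.join(room['exits'])}. Mention "
|             f"nothing else.")
|         return room  # room['prompt'] goes to the LLM
|
|     print(gen_room("a flooded dwarven mine")["prompt"])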
| adastra22 wrote:
| It can even dynamically extend the world by using prompts to
| generate new rooms as the user enters them, while keeping
| track of the generated world outside of the LLM.
|
| Which is the same way nearly all procedurally generated games
| work.
| Waterluvian wrote:
| I think the issue might be that people like to throw the tool
| at the whole problem when it's not the right tool for a lot of
| it.
|
| Don't use LLMs for logic. Use them for colour and flavour.
| Generate your own layout, populate the rooms in a manner
| befitting the context, difficulty, and story arc, and then
| whenever a user takes an action, update the state, pass the
| state to the LLM, and ask it to describe the room and make a
| few other, smaller decisions.
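|
| The loop is something like this (a sketch; fake_llm stands
| in for whatever completion call you actually use):
|
|     rooms = {
|         "cave":   {"name": "a dark cave",
|                    "exits": {"east": "meadow"}},
|         "meadow": {"name": "a sunny meadow",
|                    "exits": {"west": "cave"}},
|     }
|
|     def fake_llm(prompt):
|         return "(model prose for: " + prompt + ")"
|
|     def game_turn(state, action, llm):
|         # Logic stays in ordinary code: validate, mutate.
|         exits = state["room"]["exits"]
|         if action not in exits:
|             return "You can't go that way."
|         state["room"] = rooms[exits[action]]
|         # Only colour and flavour go to the model.
|         return llm("Describe " + state["room"]["name"] +
|                    " in two sentences, second person.")
|
|     state = {"room": rooms["cave"]}
|     print(game_turn(state, "east", fake_llm))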
| enterexit wrote:
| Curious, can this objective be reworded as "How big a
| universe can an LLM serve to simulate?"
| sk11001 wrote:
| Recently I've been getting some strange behavior from
| ChatGPT (with GPT-4) where it outputs a code snippet in
| which it declares a variable, and then on the very next line
| it forgets the name of the variable and refers to it as
| something else. Or it refers to the variable but makes a
| typo in the name.
|
| If that's the behavior by one of the best models on a few lines
| of code, I'm not hopeful for a world simulator any time soon
| unless we see another big leap in model quality.
| therobots927 wrote:
| I feel like it's an open secret at this point that ChatGPT
| is not useful for anything requiring consistent, logical
| thought, which is a requirement for most human jobs.
| nodja wrote:
| GPT-4 has been, in general, both better and worse than 3.5.
| I sometimes start a conversation with GPT-4 and then have
| 3.5 fix it in another window (by pasting the code and
| pretending it's mine). It feels like things have degraded
| for coding tasks specifically, but the more recent knowledge
| of GPT-4 is helpful if I'm asking how to use a newer
| library, since it won't just pretend the library exists.
| GPT-4 is also much slower.
|
| I wish there was a ChatGPT 3.5 updated with more recent
| knowledge.
| catlifeonmars wrote:
| Think about what the model is doing:
|
| It's sampling from a statistical distribution of likely tokens,
| one of which is the correctly spelled variable name.
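|
| A toy illustration of that sampling step (the numbers are
| made up):
|
|     import math
|     import random
|
|     # Next-token logits at the point where the model
|     # should emit the variable name:
|     logits = {"user_count": 5.2, "user_cnt": 2.1,
|               "usercount": 1.3}
|
|     def sample(logits, temperature=1.0):
|         # Softmax over the logits, then draw one token.
|         z = sum(math.exp(l / temperature)
|                 for l in logits.values())
|         r, acc = random.random(), 0.0
|         for tok, l in logits.items():
|             acc += math.exp(l / temperature) / z
|             if r < acc:
|                 return tok
|         return tok
|
|     # The misspellings keep a small but nonzero chance of
|     # being sampled even when the right name dominates.
|     print(sample(logits, temperature=0.8))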
| sk11001 wrote:
| My point is that in previous versions it was sampling more
| correctly more often than it has been recently.
| rany_ wrote:
| From what I've heard, the reason it's doing that is to add
| an AI fingerprint: it has to pick less likely tokens to
| encode this information, to the detriment of output quality.
| Unfortunately it's only hearsay, but it made sense to me so
| I thought I'd share.
| TeMPOraL wrote:
| Yes. The presence of the correctly spelled name earlier in
| the context should've dominated the distribution so much that
| it should've been extremely unlikely to select an incorrectly
| spelled variable name in the place where the correct name
| should go.
| bbor wrote:
| Technically "text-based world simulator" pretty much _is_ an
| explanation of LLMs already. That's why we all suddenly care
| about dumb chatbots --- they accidentally cracked the frame
| problem.
| P_I_Staker wrote:
| The answer is no
| carabiner wrote:
| Too much data are locked up in non-public sources that still
| affect the world. You will not find complete engineering analysis
| reports for the Ford F-150 engine on the internet or full email
| exchanges for Trump's presidential campaign planning in Ohio. Yet
| these all influence us.
| binary132 wrote:
| Can headlines written in the interrogative case ever be answered
| in the affirmative?
| echelon wrote:
| There exists at least one headline written in the interrogative
| case that can be answered in the affirmative.
| moritzwarhier wrote:
| https://web.archive.org/web/20160410022155/http://ccdb5fs.ke.
| ..
| moritzwarhier wrote:
| For academic studies, apparently yes.
|
| https://en.m.wikipedia.org/wiki/Betteridge's_law_of_headline...
|
| Since this is an AI paper published on arxiv, assuming a
| similar distribution is surely not justified.
|
| But I'd still put it in a different category than clickbait
| news headlines.
| ninetyninenine wrote:
| The answer is yes.
|
| https://aidungeon.com/
|
| This actually came out before chatGPT and it floored me.
| leonardspeiser wrote:
| I was able to get this to work with an LLM, but I had to
| build some short-term memory to keep awareness as you
| explored. The current site allows you to interact with an
| Oregon Trail- or Zork-like world, but you can specify any
| time in history and it will create the stories and stay
| within that world. I also have it generate some images to go
| along with you for fun. https://grue.is/ (PS: I don't know
| how to code, so this is also proof that you can use an LLM
| to write all the software; here is my github if you are
| interested in learning more about that:
| https://github.com/lrspeiser/Grue.is)
| w4ffl35 wrote:
| I've already built things like this. Cool paper but I build
| things, put them to the side, never write a paper. Sometimes
| tweet about them. Then I see a paper later about some similar
| thing and the crowd goes wild. Not a complaint. Idk what else to
| say.
___________________________________________________________________
(page generated 2024-06-15 23:00 UTC)