[HN Gopher] Can language models serve as text-based world simula...
       ___________________________________________________________________
        
       Can language models serve as text-based world simulators?
        
       Author : mpweiher
       Score  : 68 points
       Date   : 2024-06-15 12:25 UTC (10 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | Bluestein wrote:
       | Nethack FTW :)
        
         | ziggy_star wrote:
         | Seriously, ha, time is a flat circle.
        
           | anthk wrote:
           | I still love slashem.
        
       | xrd wrote:
       | I made a D&D game for my kids using llama. It went surprisingly
       | well. Lots of potential here.
       | 
       | https://blog.katarismo.com/2023-05-26-i-m-a-dad-i-replaced-m...
        
       | klaussilveira wrote:
        | This is neat. In theory, you could hook llama.cpp into a
        | GOAP-style planner (https://www.gamedevs.org/uploads/three-
        | states-plan-ai-of-fea...) and have much better default bots
        | navigating your nav mesh. Even better, if you record player
        | actions as GOAP actions within the nav mesh, you can use that to
        | fine-tune the model. Or even feed it back in real time so the
        | bots learn the modus operandi of the player.
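        | 
        | A rough sketch of the logging half (all class, field, and
        | function names here are hypothetical, not a real llama.cpp or
        | engine API): record each player action in a GOAP-style
        | vocabulary so it can later be dumped as fine-tuning data.
        | 
        |     import json, time
        |     from dataclasses import dataclass, asdict
        | 
        |     # Hypothetical GOAP-style action record; names are
        |     # illustrative only.
        |     @dataclass
        |     class GoapAction:
        |         name: str            # e.g. "take_cover", "flank_left"
        |         preconditions: dict  # world-state facts required to act
        |         effects: dict        # facts expected to hold afterwards
        | 
        |     def log_player_action(log_path, world_state, action):
        |         """Append one (state, action) pair as a fine-tuning example."""
        |         example = {
        |             "prompt": json.dumps(world_state, sort_keys=True),
        |             "completion": json.dumps(asdict(action)),
        |             "ts": time.time(),
        |         }
        |         with open(log_path, "a") as f:
        |             f.write(json.dumps(example) + "\n")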
        
       | skybrian wrote:
       | Maybe they can, but they're trained on conversation and stories
       | (among other things) rather than simulations. These are more
       | general than the log of a simulation run in how they use time.
       | Stories can be chronological, but they can also fill in the
       | details in any order using things like flashbacks. Or they can
       | get really complicated with time travel stories.
       | 
        | So it seems that, to understand a story well, an LLM would need a
       | more sophisticated notion of time, where there is a history and
       | events are slotted in as they're learned from the narrative, and
       | sometimes the whole history gets reinterpreted based on new
       | information. (Plot twist!)
       | 
       | It would be fascinating if some mechanistic interpretability
       | researcher figured out how they actually work with story time.
       | Are there the rudiments of that kind of understanding yet?
        
       | lyu07282 wrote:
       | TL;DR: unfortunately the conclusion is, no they can't
       | 
       | > the best recorded performance is 59.9% on accurately simulating
       | state transitions [...] after 10 steps, average simulation
       | accuracy would reduce to less than 1%. Our results indicate that
       | LLMs are not yet able to reliably act as text world simulators
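        | 
        | (Presumably the compounding behind that figure: 0.599^10 ≈
        | 0.006, i.e. roughly 0.6% after ten steps.)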
        
       | guestbest wrote:
        | I think there is a MUD effort called LlamaTale which allows
        | creating something like this as telepresence. I'm trying to put a
        | MUD together to try it out.
        
       | firtoz wrote:
        | Would be interesting to see a version of this that incorporates
        | the tool-usage features of the latest models.
        
       | gfosco wrote:
       | Not by themselves... but with a good program built around it,
       | managing the actual state, and very careful prompting, yes. I've
        | been thinking about this for a while; I've always had the desire
        | to make a game.
        
         | bongodongobob wrote:
         | Exactly. I've been working on something like this for a while
         | using finite state machines to control the prompts. The biggest
         | struggle I've had is creating memory. It's one thing to save a
         | list of events and feed it into the prompts, but there are
         | always issues interpreting it and making sure the memories are
         | detailed enough but not an entire page.
         | 
          | For example, say you "conquered the dungeon in level 1". If
          | this gets saved as just "conquered the dungeon", the next time
          | you get to a new dungeon it may think you already beat it and
          | won't generate NPC monsters, that kind of thing.
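          | 
          | One way to attack that (just a sketch; the field and function
          | names are made up for illustration) is to save memories with
          | the ids that disambiguate them, and only pull the relevant ones
          | into the prompt:
          | 
          |     from dataclasses import dataclass
          | 
          |     @dataclass
          |     class Memory:
          |         summary: str   # short but specific:
          |                        # "conquered the dungeon on level 1"
          |         entities: dict # ids that disambiguate it,
          |                        # e.g. {"dungeon": "dungeon_L1"}
          |         turn: int
          | 
          |     def memories_for_prompt(memories, current_entities, budget=8):
          |         """Prefer memories that mention entities in the current
          |         scene, then the rest, and keep the list short."""
          |         here = set(current_entities.values())
          |         relevant = [m for m in memories
          |                     if set(m.entities.values()) & here]
          |         rest = [m for m in memories if m not in relevant]
          |         picked = (relevant + rest)[:budget]
          |         return "\n".join(f"- (turn {m.turn}) {m.summary}"
          |                          for m in picked)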
        
           | chumpro wrote:
           | I am also working on something similar. I have written a
           | grammar system for controlling the llm that is not context
           | free.
           | 
           | One of the main challenges is to have a database of what is
           | reality, id -> object, and then have the llm return ids for
           | objects referenced by the user. I am doing a system where
           | something that has no id has not been created yet.
           | 
            | I use story templates with labeled segments to create
            | characters, items, and locations as well as events. Example:
            | "Once upon a time there was a <role>cowardly knight</role>
            | named <name>Billy Bonkers</name>." The named qualities become
            | the data structure associated with the id of the new
            | character. It can be hilarious to prompt the llm to do nerdy
            | space humor!
           | 
            | It seems to be very possible to create a "narrator driven"
           | game happening in a "real" world.
           | 
            | I plan on using a multimodal llm, and everything that is
            | created will have an image. That, combined with id-less
            | objects that exist but don't yet have a detailed description,
            | will make the qualities and objects visible in the generated
            | images part of the world as well.
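            | 
            | Roughly like this, if it helps (a minimal sketch; the tag and
            | variable names are just for illustration):
            | 
            |     import re, uuid
            | 
            |     # Pull <tag>value</tag> pairs out of a filled-in template.
            |     TAGGED = re.compile(r"<(?P<tag>\w+)>(?P<value>.*?)</(?P=tag)>",
            |                         re.S)
            | 
            |     def realize_template(filled_template, world):
            |         """Extract labeled qualities, mint an id, and store the
            |         data structure in the world database (id -> object)."""
            |         qualities = {m["tag"]: m["value"].strip()
            |                      for m in TAGGED.finditer(filled_template)}
            |         obj_id = uuid.uuid4().hex[:8]
            |         world[obj_id] = qualities
            |         return obj_id
            | 
            |     world = {}
            |     text = ("Once upon a time there was a <role>cowardly "
            |             "knight</role> named <name>Billy Bonkers</name>.")
            |     new_id = realize_template(text, world)
            |     # world[new_id] == {"role": "cowardly knight",
            |     #                   "name": "Billy Bonkers"}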
        
       | anthk wrote:
        | Inform 7 is a beast, and the resulting game can run on a 486. No
        | AI required.
        
       | d13 wrote:
       | Try it!
       | 
        | All you need to do is type this into llama 3 8B or 70B fp16
       | instruct:
       | 
       | "You are a text adventure game."
       | 
       | Done.
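        | 
        | A minimal sketch of wiring that up, assuming a local
        | OpenAI-compatible endpoint (e.g. llama.cpp's server); the URL and
        | model name are whatever your setup uses:
        | 
        |     import requests
        | 
        |     history = [{"role": "system",
        |                 "content": "You are a text adventure game."}]
        | 
        |     def turn(player_input):
        |         history.append({"role": "user", "content": player_input})
        |         r = requests.post(
        |             "http://localhost:8080/v1/chat/completions",
        |             json={"model": "llama-3-8b-instruct",
        |                   "messages": history})
        |         reply = r.json()["choices"][0]["message"]["content"]
        |         history.append({"role": "assistant", "content": reply})
        |         return reply
        | 
        |     print(turn("look around"))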
        
       | sbotzek wrote:
       | I tried for a few months getting ChatGPT4 to work with a MUD. In
       | my experience it's not very good at that particular task.
       | 
       | One problem I ran into was its ability to logically connect
       | rooms. In a MUD you navigate by going north, south, east, west,
        | up, or down. Not every room lets you go in every direction. And
        | usually, if you go east, then in your new room you can go west.
        | Rarely will a level creator make this not be true. ChatGPT4 was
        | pretty bad at it though.
       | 
        | Another problem was descriptions. It might mention a mountain in
        | the distance once, but then never again, so this giant landmark
        | was only ever described in a single room.
       | 
       | It was also difficult to get it to create a fair quantity of
       | secrets in logical places. Lots of times it would just chain
       | together multiple secrets in a single place. If you have more
        | than one, you want to spread them around.
       | 
        | And finally, room layout. It tended to not be very good at this:
        | lots of linear layouts. It didn't have an eye for when a detail
        | should be a set of complex rooms and when it could just be a line
        | in a description.
       | 
       | So it could do it, but it created levels that weren't very fun or
       | particularly creative, even when it came to room descriptions.
        
         | danjc wrote:
         | Somewhat related, Yann LeCun posted a few months back about how
         | many concepts aren't understood through language and therefore
          | can't be modeled through it (which is why LLMs are terrible
         | with things like position and direction).
         | 
         | https://x.com/ylecun/status/1768353714794901530?s=46
        
           | Folcon wrote:
            | To people who claim that "thinking and reasoning require
            | language", here is a problem:
            | 
            |     Imagine standing at the North Pole of the Earth.
            |     Walk in any direction, in a straight line, for 1 km.
            |     Now turn 90 degrees to the left.
            |     Walk for as long as it takes to pass your starting point.
            |     Have you walked:
            |     1. More than 2xPi km
            |     2. Exactly 2xPi km
            |     3. Less than 2xPi km
            |     4. I never came close to my starting point.
            | 
            | Think about how you tried to answer this question and tell us
            | whether it was based on language.
           | 
           | Just quoting this here in case anything happens to the
           | tweet...
           | 
            | I agree with this, but I have one tiny nitpick (feel free to
            | tell me if you think I'm wrong or being overly nitpicky): the
            | knowledge of the situation in which the phenomenon he's
            | describing occurs is something I learned about entirely from
            | language. So my ability to answer the question is based on
            | that knowledge.
           | 
           | I'm aware that the reasoning problem itself doesn't utilise
           | it, but a position and direction system in and of itself
           | arguably also suffers from being insufficient.
           | 
            | I suppose I'm wondering if the "setup" counts as needing a
            | language model?
        
             | dragonwriter wrote:
             | I don't think the example is a good rebuttal of "thinking
             | and reasoning require language".
             | 
             | It _may be_ a decent challenge, probably still not an
             | actual rebuttal, of  "language is sufficient for all
             | thinking and reasoning", but "X is required for Y" and "X
             | is sufficient for everything encompassed by Y" are very
             | different claims.
        
               | Folcon wrote:
               | That's fair, it's not supposed to be a rebuttal of that
               | :)...
               | 
               | Though now that you said it, I'm really thinking about
               | the original statement.
               | 
               | Honest question, can you give me an example of thinking
               | or reasoning that happens fully independently of
               | reasoning via symbols or their manipulation?
               | 
               | I feel like I'm missing something obvious, but nothing is
               | coming to mind right now :)...
               | 
               | Just thinking about the underlying statement and
               | simplifying language down to symbolic expression. (I was
               | originally going to say manipulation, but it doesn't feel
               | like it quite fits...)
        
               | dragonwriter wrote:
               | > Honest question, can you give me an example of thinking
               | or reasoning that happens fully independently of
               | reasoning via symbols or their manipulation?
               | 
               | I don't think we have anything but subjective, indirect
               | understanding of _how_ thinking happens, but I think
               | that, at a minimum, what we describe as  "reasoning"
               | specifically is tightly conceptually related to, if not a
               | subset of, manipulation of abstract symbols to which
               | concrete experiences may be approximately mapped.
               | 
               | I'm not sure I'd say this is the same thing as language,
               | but there's at a minimum a shared common symbol-
               | manipulation underlying both. Do the capacities always
               | come together? I'm not sure how we would know that, I
               | think our ability to recognize reasoning is tied to it
                | being mapped to language, and our ability to distinguish
               | something as language rather than nonlinguistic signaling
               | or mimicry of something else that is using language is
               | tied to independent expression of reasoning through it.
        
             | erik_seaberg wrote:
             | If the brain actually used language to represent ideas in
             | use, we should have succeeded in finding its
             | https://en.wikipedia.org/wiki/Universal_grammar. Instead it
             | seems more like language is a kind of lossy compression we
             | keep reinventing for ideas, to export them from a brain in
             | some tangible way and get them into another (even in the
             | future).
        
               | Folcon wrote:
               | Hmm, I'm not sure if that holds?
               | 
                | Just because you use a mechanism for expressing your
                | ideas and reasoning doesn't mean that the underlying
                | reality has to conform to it in any way?
               | 
               | We invent new symbols and terms all the time as we
               | experience new phenomena. A universal grammar may still
               | be possible, barring incompleteness anyway, we may just
               | be lacking a whole bunch of ideas still...
        
               | bottom999mottob wrote:
                | @erik_seaberg inferring non-existence from the lack of
                | success in finding a universal grammar is a logical leap.
                | The failure to find something does not necessarily mean
                | it doesn't exist. Language is powerful for expressing
                | abstract ideas without explicitly saying them, which
                | suggests language is more than "lossy" compression; it's
                | more similar to Shannon's lossless compression with
                | prefix codes. I see where you're coming from though.
               | 
                | @Folcon If language is a mechanism for expressing ideas
                | and reasoning, it should reflect the cognitive processes
                | that generate those ideas, so "[...] Just because you use
                | a mechanism for expressing your ideas and reasoning
                | doesn't mean that the underlying reality has to conform
                | to it in any way" is a bit contradictory. Are our
                | cognitive processes not included in reality?
               | 
                | The existence of a universal grammar is a specific
               | hypothesis that requires empirical evidence. It's tiring
               | to hear Chomsky's ideas parroted despite no empirical
               | framework to stand on. What ideas could we be lacking?
               | This argument is similar to String Theory proponents who
               | kept pulling ideas out of the ether to support an
               | unsubstantiated theory.
        
             | skywhopper wrote:
             | Language is useful to transmit the concepts but is not
             | sufficient to actually solve problems with those concepts.
        
           | maxglute wrote:
           | This interaction down the chain is interesting:
           | 
           | Q: yann do you have an internal monologue?
           | 
           | A: Not that I know of.
        
           | TeMPOraL wrote:
           | My bet is that this is wrong, and at the same time, language
           | isn't _required_ - just sufficient. I see concepts as defined
           | _only_ through associations with other concepts (which can be
            | modeled as proximity in high-dimensional space, and that's
           | precisely what LLMs are doing) and, sometimes, through
           | memorized sensory data - the latter isn't the typical case,
           | but it's needed to make the recursive definition (concepts
           | defined in terms of concepts) stay anchored to reality.
           | 
           | From that follows that written language is enough to build
           | that structure of concepts (latent space in ML terms). So is
           | spoken language. So is vision in general, or hearing in
           | general[0]. The brain will build one concept space out of all
           | inputs available; it is necessary and sufficient to have at
           | least one, but none of them alone is itself necessary.
           | 
           | --
           | 
           | [0] - Languages are higher-level regularities used for
           | communication, growing on top of those senses, but not
           | strictly necessary for understanding the real world. I'd use
           | people who have no perceptible inner voice and high
           | visualization skills as a counter to the idea that concepts
           | need to be thought of symbolically in something resembling a
           | written or spoken language.
        
         | TillE wrote:
         | I think the best thing to do in a scenario like that is
         | standard procgen where you randomly, logically generate the
         | rooms with a bunch of descriptive tags, and then LLM-ify the
         | room descriptions, with some context for what the world/area is
         | supposed to be like.
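          | 
          | Something like this split, as a rough sketch (the tags and
          | prompt wording are just placeholders):
          | 
          |     import random
          | 
          |     # Procgen stays deterministic; the LLM only writes prose.
          |     TAGS = ["damp", "collapsed", "torch-lit",
          |             "abandoned shrine", "rat nest"]
          | 
          |     def generate_room(rng):
          |         return {"exits": rng.sample(["north", "south", "east",
          |                                      "west"], rng.randint(1, 3)),
          |                 "tags": rng.sample(TAGS, 2)}
          | 
          |     def description_prompt(room, setting):
          |         return (f"Setting: {setting}\n"
          |                 "Write a two-sentence room description. "
          |                 f"Details to include: {', '.join(room['tags'])}. "
          |                 f"Exits to mention: {', '.join(room['exits'])}.")
          | 
          |     room = generate_room(random.Random(42))
          |     # description_prompt(room, "a flooded dwarven mine") is what
          |     # gets sent to the LLM, with some world/area context.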
        
           | adastra22 wrote:
            | It can even be dynamically extended by using prompts to
            | generate new rooms as the user enters them, while keeping
            | track of the generated world outside of the LLM.
           | 
           | Which is the same way nearly all procedurally generated games
           | work.
        
         | Waterluvian wrote:
         | I think the issue might be that people like to throw the tool
         | at the whole problem when it's not the right tool for a lot of
         | it.
         | 
          | Don't use LLMs for logic; use them for colour and flavour.
          | Generate your own layout, populate the rooms in a manner
          | befitting the context, difficulty, and story arc, and then
          | whenever a user takes an action, update the state, pass the
          | state to the LLM, and ask it to describe the room and make a
          | few other smaller decisions.
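          | 
          | In other words, something like this loop (a sketch only; the
          | state shape and prompt are made up):
          | 
          |     # The engine owns all state and rules; the LLM only gets a
          |     # snapshot plus the resolved action and supplies flavour.
          |     state = {"room": "cellar", "inventory": ["lantern"], "hp": 10}
          | 
          |     def apply_action(state, action):
          |         # deterministic game logic lives here, not in the model
          |         if action == "take rope" and state["room"] == "cellar":
          |             state["inventory"].append("rope")
          |         return state
          | 
          |     def narration_prompt(state, action):
          |         return (f"The player did: {action}\n"
          |                 f"Current state: {state}\n"
          |                 "Describe the result in two sentences. "
          |                 "Do not change any facts.")
          | 
          |     state = apply_action(state, "take rope")
          |     # narration_prompt(state, "take rope") is what the LLM sees.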
        
       | enterexit wrote:
       | Curious, can this objective be reworded as "How big a universe an
       | LLM can serve to simulate?"
        
       | sk11001 wrote:
        | Recently I've been getting strange behavior from ChatGPT (with
       | GPT-4) where it outputs a code snippet in which it declares a
       | variable, and then on the very next line it forgets the name of
       | the variable and refers to it as something else. Or it refers to
       | the variable but it makes a typo in the name.
       | 
       | If that's the behavior by one of the best models on a few lines
       | of code, I'm not hopeful for a world simulator any time soon
       | unless we see another big leap in model quality.
        
         | therobots927 wrote:
          | I feel like it's an open secret at this point that ChatGPT is
          | not useful for anything requiring consistent, logical thought,
          | which is a requirement for most human jobs.
        
         | nodja wrote:
          | ChatGPT 4 has in general been both better and worse than 3.5. I
          | sometimes start a conversation with GPT-4 and then have 3.5 fix
          | it in another window (by pasting the code and pretending it's
          | mine). It feels like things have degraded for coding tasks
          | specifically, but the more recent knowledge of GPT-4 is helpful
          | if I'm asking how to use a newer library, as it won't just
          | pretend the library exists. GPT-4 is also much slower.
         | 
         | I wish there was a ChatGPT 3.5 updated with more recent
         | knowledge.
        
         | catlifeonmars wrote:
         | Think about what the model is doing:
         | 
         | It's sampling from a statistical distribution of likely tokens,
         | one of which is the correctly spelled variable name.
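          | 
          | A toy illustration of that point (made-up logits, not from any
          | real model):
          | 
          |     import math, random
          | 
          |     # Even a near-miss token ("reslut") keeps a small but
          |     # nonzero probability, so occasionally it gets picked.
          |     logits = {"result": 9.0, "reslut": 4.0, "res": 3.0}
          | 
          |     def sample(logits, temperature=1.0):
          |         scaled = {t: l / temperature for t, l in logits.items()}
          |         z = sum(math.exp(v) for v in scaled.values())
          |         probs = {t: math.exp(v) / z for t, v in scaled.items()}
          |         return random.choices(list(probs),
          |                               weights=list(probs.values()))[0]
          | 
          |     # At temperature 1.0, "reslut" comes up roughly 0.7% of the
          |     # time with these numbers.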
        
           | sk11001 wrote:
           | My point is that in previous versions it was sampling more
           | correctly more often than it has been recently.
        
             | rany_ wrote:
             | From what I've heard, the reason it's doing that is to add
             | an AI fingerprint. It has to pick less likely tokens to
             | encode this information to the detriment of the output
             | quality. Unfortunately it's only hearsay but it made sense
             | to me so I thought I'd share.
        
           | TeMPOraL wrote:
            | Yes. The presence of the correctly spelled name earlier in
            | the context should have dominated the distribution so much
            | that it would be extremely unlikely to select an incorrectly
            | spelled variable name in the place where the correct name
            | should go.
        
       | bbor wrote:
       | Technically "text-based world simulator" pretty much _is_ an
       | explanation of LLMs already. That's why we all suddenly care
       | about dumb chatbots --- they accidentally cracked the frame
       | problem.
        
       | P_I_Staker wrote:
       | The answer is no
        
       | carabiner wrote:
       | Too much data are locked up in non-public sources that still
       | affect the world. You will not find complete engineering analysis
       | reports for the Ford F-150 engine on the internet or full email
       | exchanges for Trump's presidential campaign planning in Ohio. Yet
       | these all influence us.
        
       | binary132 wrote:
       | Can headlines written in the interrogative case ever be answered
       | in the affirmative?
        
         | echelon wrote:
         | There exists at least one headline written in the interrogative
         | case that can be answered in the affirmative.
        
           | moritzwarhier wrote:
           | https://web.archive.org/web/20160410022155/http://ccdb5fs.ke.
           | ..
        
         | moritzwarhier wrote:
         | For academic studies, apparently yes.
         | 
         | https://en.m.wikipedia.org/wiki/Betteridge's_law_of_headline...
         | 
         | Since this is an AI paper published on arxiv, assuming a
         | similar distribution is surely not justified.
         | 
         | But I'd still put it in a different category than clickbait
         | news headlines.
        
       | ninetyninenine wrote:
       | The answer is yes.
       | 
       | https://aidungeon.com/
       | 
       | This actually came out before chatGPT and it floored me.
        
       | leonardspeiser wrote:
        | I was able to get this to work with an LLM, but I had to build
        | some short-term memory to keep awareness as you explore. The
        | current site lets you interact with an Oregon Trail- or Zork-like
        | world, but you can specify any time in history and it will create
        | the stories and stay within that world. I also have it generate
        | some images to go along with you, for fun. https://grue.is/ (PS:
        | I don't know how to code, so this is also proof that you can use
        | an LLM to write all the software. Here is my GitHub if you are
        | interested in learning more about that:
        | https://github.com/lrspeiser/Grue.is)
        
       | w4ffl35 wrote:
       | I've already built things like this. Cool paper but I build
       | things, put them to the side, never write a paper. Sometimes
       | tweet about them. Then I see a paper later about some similar
       | thing and the crowd goes wild. Not a complaint. Idk what else to
       | say.
        
       ___________________________________________________________________
       (page generated 2024-06-15 23:00 UTC)