hngopher.com

       [HN Gopher] Show HN: LLM plays Pokemon (open sourced)
       ___________________________________________________________________
        
       Show HN: LLM plays Pokemon (open sourced)
        
       I built a bot that plays Pokemon FireRed. It can explore, battle,
       and respond to game events. Farthest I made it was Viridian Forest.
       I paused development a couple months ago, but given the launch of
       ClaudePlaysPokemon, decided to open source!
        
       Author : adenta
       Score  : 88 points
       Date   : 2025-02-26 19:31 UTC (3 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | minimaxir wrote:
       | See also: the AI Plays Pokemon project that went megaviral a year
       | or so ago, using CNNs and RL instead of LLMs:
       | https://github.com/PWhiddy/PokemonRedExperiments
       | 
       | > I believe that Claude Plays Pokemon isn't doing any of the
       | memory parsing I spent a ton of time, they are just streaming the
       | memory directly to Claude 3.7 and it is figuring it out
       | 
       | It is implied they are using structured Pokemon data from the LLM
       | and saving it as a knowledge base. That is the only way they can
       | get live Pokemon party data to display in the UI:
       | https://www.twitch.tv/claudeplayspokemon
       | 
       | The AI Plays Pokemon project above does note some of the memory
       | addresses where that data is contained, since it used that data
       | to calculate the reward for the PPO.
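
A minimal sketch of the idea in the comment above: pull a few values out of a RAM snapshot and turn them into a reward signal. The offsets, the `GameState` fields, and the reward weights are hypothetical placeholders chosen for illustration; they are not the addresses or the reward function used by the AI Plays Pokemon project.

```python
from dataclasses import dataclass

# Hypothetical offsets into a RAM snapshot. Real addresses depend on the game
# and would have to come from a disassembly or a published memory map.
PARTY_COUNT_OFFSET = 0x00
PARTY_LEVEL_OFFSETS = [0x10, 0x20, 0x30, 0x40, 0x50, 0x60]
BADGE_FLAGS_OFFSET = 0x70


@dataclass
class GameState:
    party_levels: list
    badge_count: int


def parse_state(ram: bytes) -> GameState:
    """Extract party levels and badge count from a raw RAM snapshot."""
    party_count = min(ram[PARTY_COUNT_OFFSET], 6)
    levels = [ram[off] for off in PARTY_LEVEL_OFFSETS[:party_count]]
    # Assumption: badges are stored as a bitfield, one bit per badge.
    badges = bin(ram[BADGE_FLAGS_OFFSET]).count("1")
    return GameState(party_levels=levels, badge_count=badges)


def reward(prev: GameState, curr: GameState) -> float:
    """Illustrative reward: total level gain plus a bonus per new badge."""
    level_gain = sum(curr.party_levels) - sum(prev.party_levels)
    badge_gain = curr.badge_count - prev.badge_count
    return float(level_gain) + 10.0 * badge_gain
```

The same parsed state could also feed a UI overlay or an LLM prompt; only the `reward` step is specific to reinforcement learning.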
        
         | adenta wrote:
         | On this page (https://excalidraw.com/#json=WrM9ViixPu2je5cVJZGC
         | e,no_UoONhF...) linked from their Twitch, it says: "This info
         | is all parsed directly from the RAM of the game, Claude Code is
         | very good at this task". I'm reading that as "we are pumping
         | the RAM directly into the LLM", but I could be mistaken.
        
           | None4U wrote:
           | I suspect that means they wrote the memory parser using
           | Claude (the Twitch description also mentions the LLM getting
           | specific info)
        
           | minimaxir wrote:
           | I agree that's ambiguously worded. For example, I'm not sure
           | if Claude could identify "MT MOON B1F" from the RAM data
           | alone since internally world map areas are only known by IDs,
           | while AI Plays Pokemon did annotate the corresponding area
           | with a human-readable name. https://github.com/PWhiddy/Pokemo
           | nRedExperiments/blob/master...
           | 
           | Though this RAM data _could_ be in Claude's training data.
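
The annotation step discussed above, turning an internal map ID into a human-readable name before it reaches the model, can be sketched as a plain lookup table. The numeric IDs, the placeholder names, and the `describe_location` helper are all hypothetical; they are not verified values from the game or from either project.

```python
# Hypothetical ID-to-name table. The IDs are placeholders for illustration,
# not verified map IDs from the game.
MAP_NAMES = {
    1: "EXAMPLE TOWN",
    2: "EXAMPLE ROUTE",
    3: "MT MOON B1F",
}


def describe_location(map_id: int, x: int, y: int) -> str:
    """Render a raw map ID and tile position as text a model can read."""
    # Unknown IDs are reported explicitly rather than silently dropped.
    name = MAP_NAMES.get(map_id, f"UNKNOWN MAP (id={map_id})")
    return f"Location: {name} at tile ({x}, {y})"
```

For example, `describe_location(3, 4, 7)` yields `Location: MT MOON B1F at tile (4, 7)` under this placeholder table.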
        
       | kanzure wrote:
       | You can also directly pull in the emulation state and map back to
       | game source code, and then make a script for tool use (not shown
       | here): https://github.com/pret/pokemon-reverse-engineering-
       | tools/bl... Well I see on your page that you already saw the pret
       | advice about memory extraction, hopefully the link is useful
       | anyway.
        
         | adenta wrote:
         | Yeah, took a similar approach at
         | https://github.com/adenta/fire_red_agent/blob/main/app/servi...
        
       | ArlenBales wrote:
       | > To me, this is the future of TV.
       | 
       | The future of television is watching bots play video games? What
       | a sad future.
        
         | farts_mckensy wrote:
         | Watching AI robots fight each other gladiator style would be
         | pretty cool.
        
           | adenta wrote:
           | Idk what the parent comment said but let's make AI robots
           | fight each other gladiator style.
        
         | adenta wrote:
         | Yeah I think _a_ future of television might've been more apt.
         | They should've made a season 5 to Snowpiercer.
        
       | deadbabe wrote:
       | I want to note that if you really wanted an AI to play Pokemon
       | you can do it with a far simpler and cheaper AI than an LLM and
       | it would play the game far better, making this mostly an exercise
       | in overcomplicating something trivial. But sometimes when you
       | have a hammer everything will look like a nail.
        
         | adenta wrote:
         | I disagree. Getting a computer to play a game like a human has
         | an incredibly broad range of applications. Imagine a system
         | like this that is on autopilot, but can get suggestions from a
         | twitch chat, nudging its behavior in a specific direction. Two
         | such systems could be run by two teams, and they could do a
         | weekly battle.
         | 
         | This isn't an exercise in AI, it's an exercise in TV production
         | IMO.
        
         | minimaxir wrote:
         | The AI Plays Pokemon project only made it to Mt. Moon (where
         | coincidentally ClaudePlaysPokemon is stuck now) with many
         | months of iteration and many many hours of compute.
         | 
         | The reason Claude 3.7's performance is interesting is that the
         | LLM approach defeated Lt. Surge, far past Mt. Moon. (I wonder
         | how Claude solved the infamous puzzle in Surge's gym)
         | 
         | https://www.anthropic.com/research/visible-extended-thinking
        
           | deadbabe wrote:
           | Not talking about reinforcement-learning-type AI, I'm talking
           | about classically programmed AI with standard pathfinders,
           | GOAP, behavior trees, etc...
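
As a rough illustration of the "standard pathfinder" piece of such a classically programmed agent, here is a breadth-first search over a tile grid that returns the button presses needed to reach a goal. The grid encoding (`#` for blocked tiles), the `find_path` name, and the direction labels are assumptions made for this sketch; a full agent would still need goal selection, battle logic, and menu handling on top.

```python
from collections import deque

# Direction name -> (row delta, column delta); rows grow downward.
MOVES = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}


def find_path(grid, start, goal):
    """Breadth-first search on a tile grid.

    grid  : list of equal-length strings, '#' marks a blocked tile
    start : (row, col) of the player
    goal  : (row, col) of the target tile
    Returns a shortest list of direction names, or None if unreachable.
    """
    queue = deque([start])
    came_from = {start: None}  # tile -> (previous tile, move taken)
    while queue:
        cur = queue.popleft()
        if cur == goal:
            break
        for name, (dr, dc) in MOVES.items():
            r, c = cur[0] + dr, cur[1] + dc
            if not (0 <= r < len(grid) and 0 <= c < len(grid[0])):
                continue  # off the map
            if grid[r][c] == "#" or (r, c) in came_from:
                continue  # blocked or already visited
            came_from[(r, c)] = (cur, name)
            queue.append((r, c))
    if goal not in came_from:
        return None
    # Walk back from the goal to the start, collecting moves in reverse.
    path = []
    node = goal
    while came_from[node] is not None:
        node, name = came_from[node]
        path.append(name)
    path.reverse()
    return path
```

Because every step costs the same, BFS already gives a shortest path here; A* would only matter on larger maps or with weighted tiles.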
        
             | adenta wrote:
             | Got a link handy?
        
             | Philpax wrote:
             | But how much effort do you have to put in to build an agent
             | that can play a specific game? Can you retarget that agent
             | easily? How well will your agent deal with circumstances
             | that it wasn't designed for?
        
         | drusepth wrote:
         | I don't think this project is meant to "solve" a task (hammer,
         | nail) so much as it's just an interesting "what if" experiment
         | to observe and play around with new technology.
        
         | futureshock wrote:
         | I know what you are saying, but I very much disagree. There are
         | also better chess engines. That's not the point.
         | 
         | It's all about the "G" in AGI. This is a nice demonstration of
         | how LLMs are a generalizable intelligence. It was not designed
         | to play Pokemon, Pokemon was no special part of its training
         | set, Pokemon was not part of its evaluation criteria. And yet,
         | it plays Pokemon, and rather well!
         | 
         | And to see each iteration of Claude be able to progress further
         | and faster in Pokemon helps demonstrate that each generation of
         | the LLM is getting smarter in general, not just better fitted
         | to standard benchmarks.
         | 
         | The point is to build the universal hammer that can hammer
         | every nail, just as the human mind is the universal hammer.
        
           | wordpad25 wrote:
           | Pokemon guides were definitely part of every LLM training
           | set. Game is so old, there are thousands of guides and videos
           | on the topic.
           | 
           | LLMs will readily offer high quality Pokemon gameplay advice
           | without needing to search online.
        
             | minimaxir wrote:
             | The operative phrase of that comment being "no special
             | part."
             | 
             | If you watch the Twitch stream it is obvious Claude has
             | general knowledge of what to do to win in Pokemon but
             | cannot recall specifics.
        
               | northern-lights wrote:
               | For example, a Bug-type attack is super effective against
               | Poison types in Gen 1 but not very effective in Gen 2 and
               | onwards. But Claude keeps bringing Nidoran into
               | Weedle/Caterpie.
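
The generation-dependent matchup described above can be made explicit with a per-generation lookup. This sketch encodes only the Bug-versus-Poison case from the comment and treats every other pairing as neutral, so it is not a full type chart; the table names and the `effectiveness` helper are illustrative assumptions.

```python
# Only the matchup mentioned in the comment is listed. Every other pairing
# falls back to neutral (1.0), so this is NOT a complete type chart.
GEN1_CHART = {("BUG", "POISON"): 2.0}    # super effective in Gen 1
LATER_CHART = {("BUG", "POISON"): 0.5}   # not very effective from Gen 2 on


def effectiveness(attack_type: str, defender_type: str, generation: int) -> float:
    """Return the damage multiplier for one attack type against one defender type."""
    chart = GEN1_CHART if generation == 1 else LATER_CHART
    return chart.get((attack_type, defender_type), 1.0)
```

An agent that recalls the later-generation value while playing a Gen 1 ruleset would misjudge exactly the kind of matchup the comment describes.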
        
         | imtringued wrote:
         | It's a publicity stunt by Anthropic (Claude plays Pokemon).
         | 
         | Obviously they are going to show off their LLM.
        
       | dang wrote:
       | Related ongoing thread:
       | 
       |  _Claude Plays Pokemon_ -
       | https://news.ycombinator.com/item?id=43173825
        
       | tgtweak wrote:
       | You can use Claude computer functions to actually play it on an
       | emulator with no programming at all - but that kind of feels like
       | cheating :D
        
         | adenta wrote:
         | I tried! It didn't work super well
        
       | mclau156 wrote:
       | Honestly Claude 3.7 can make a Pokemon game in pygame fairly
       | easily, at that point it would have a lot more control over it
        
       | evanextreme wrote:
       | Was working on a similar thing last year! Might as well open
       | source at this point too.
        
         | adenta wrote:
         | Email me when it launches! (In profile)
        
       | podoman wrote:
       | Have you considered calling this bot "intern bot"? - Jay
        
         | adenta wrote:
         | https://static.wikia.nocookie.net/loveinterest/images/5/5c/3...
        
       | montebicyclelo wrote:
       | Super cool to see this idea working. I had a go at getting an LLM
       | to play Pokemon in 2023, with OpenAI vision. With only 100
       | expensive API calls a day, I shelved the project after putting
       | together a quick POC and finding that the model struggled to see
       | things or work out where the player was. I guess models are now
       | better, but also looks like people are providing the model with
       | information in addition to the game screen.
       | 
       | https://x.com/sidradcliffe/status/1722355983643525427?t=dYMk...
        
         | adenta wrote:
         | The vision models still struggle in my experience. I got around
         | that by reading the RAM and describing all the objects'
         | positions on screen.
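
A minimal sketch of that workaround: once object positions have been read from RAM, they can be rendered as plain text relative to the player instead of relying on a vision model. The `ScreenObject` type, the `describe_scene` helper, and the convention that y grows downward are assumptions for illustration; this is not the code from the linked repository.

```python
from dataclasses import dataclass


@dataclass
class ScreenObject:
    kind: str  # e.g. "NPC", "SIGN", "DOOR"
    x: int     # tile column
    y: int     # tile row (assumed to grow downward)


def describe_scene(player, objects):
    """Turn tile coordinates into a short text description for an LLM prompt."""
    px, py = player
    lines = [f"Player is at tile ({px}, {py})."]
    # List the nearest objects first, using Manhattan distance in tiles.
    for obj in sorted(objects, key=lambda o: abs(o.x - px) + abs(o.y - py)):
        dx, dy = obj.x - px, obj.y - py
        parts = []
        if dy < 0:
            parts.append(f"{-dy} up")
        elif dy > 0:
            parts.append(f"{dy} down")
        if dx < 0:
            parts.append(f"{-dx} left")
        elif dx > 0:
            parts.append(f"{dx} right")
        where = ", ".join(parts) or "same tile"
        lines.append(f"- {obj.kind}: {where}")
    return "\n".join(lines)
```

Relative offsets ("2 up, 1 right") tend to map more directly onto button presses than absolute coordinates do, which is the reason for that choice here.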
        
       ___________________________________________________________________
       (page generated 2025-02-26 23:00 UTC)