[HN Gopher] Show HN: LLM plays Pokemon (open sourced)
___________________________________________________________________
Show HN: LLM plays Pokemon (open sourced)
I built a bot that plays Pokemon FireRed. It can explore, battle,
and respond to game events. Farthest I made it was Viridian Forest.
I paused development a couple months ago, but given the launch of
ClaudePlaysPokemon, decided to open source!
Author : adenta
Score : 88 points
Date : 2025-02-26 19:31 UTC (3 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| minimaxir wrote:
| See also: the AI Plays Pokemon project that went megaviral a year
| or so ago, using CNNs and RL instead of LLMs:
| https://github.com/PWhiddy/PokemonRedExperiments
|
| > I believe that Claude Plays Pokemon isn't doing any of the
| memory parsing I spent a ton of time on; they are just streaming the
| memory directly to Claude 3.7 and it is figuring it out
|
| It is implied they are using structured Pokemon data from the LLM
| and saving it as a knowledge base. That is the only way they can
| get live Pokemon party data to display in the UI:
| https://www.twitch.tv/claudeplayspokemon
|
| The AI Plays Pokemon project above does note some of the memory
| addresses where that data is contained, since it used that data
| to calculate the reward for the PPO.
| adenta wrote:
| On this page
| (https://excalidraw.com/#json=WrM9ViixPu2je5cVJZGCe,no_UoONhF...)
| linked from their twitch, it says: "This info is all parsed
| directly from the RAM of the game, Claude Code is very good at
| this task". I'm reading that as "we are pumping the RAM directly
| into the LLM", but I could be mistaken.
| None4U wrote:
| I suspect that means they wrote the memory parser using
| Claude (the Twitch description also mentions the LLM getting
| specific info)
| minimaxir wrote:
| I agree that's ambiguously worded. For example, I'm not sure
| if Claude could identify "MT MOON B1F" from the RAM data
| alone since internally world map areas are only known by IDs,
| while AI Plays Pokemon did annotate the corresponding area
| with a human-readable name.
| https://github.com/PWhiddy/PokemonRedExperiments/blob/master...
|
| Though this RAM data _could_ be in Claude's training data.
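The ID-to-name annotation discussed above can be sketched as a simple lookup table. This is a minimal illustration, not code from either project: `AREA_NAMES`, `describe_area`, and the numeric IDs are hypothetical placeholders, not the game's real map identifiers.

```python
# Hypothetical map IDs: the numeric keys are placeholders for illustration,
# not the identifiers the game actually stores in RAM.
AREA_NAMES = {
    0: "PALLET TOWN",
    1: "VIRIDIAN CITY",
    2: "MT MOON B1F",
}


def describe_area(map_id: int) -> str:
    """Return a human-readable label for a raw map ID read from RAM."""
    name = AREA_NAMES.get(map_id)
    if name is None:
        # Keep the raw ID visible so an unknown area can still be traced.
        return f"UNKNOWN AREA (id={map_id})"
    return f"{name} (id={map_id})"
```

Passing the label rather than the bare ID to the model avoids relying on whatever the model may or may not remember about the game's internals.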
| kanzure wrote:
| You can also directly pull in the emulation state and map back to
| game source code, and then make a script for tool use (not shown
| here):
| https://github.com/pret/pokemon-reverse-engineering-tools/bl...
| Well, I see on your page that you already saw the pret advice
| about memory extraction; hopefully the link is useful anyway.
| adenta wrote:
| Yeah, took a similar approach at
| https://github.com/adenta/fire_red_agent/blob/main/app/servi...
| ArlenBales wrote:
| > To me, this is the future of TV.
|
| The future of television is watching bots play video games? What
| a sad future.
| farts_mckensy wrote:
| Watching AI robots fight each other gladiator style would be
| pretty cool.
| adenta wrote:
| Idk what the parent comment said but let's make AI robots
| fight each other gladiator style.
| adenta wrote:
| Yeah I think _a_ future of television might've been more apt.
| They should've made a season 5 to Snowpiercer.
| deadbabe wrote:
| I want to note that if you really wanted an AI to play Pokemon
| you can do it with a far simpler and cheaper AI than an LLM and
| it would play the game far better, making this mostly an exercise
| in overcomplicating something trivial. But sometimes when you
| have a hammer everything will look like a nail.
| adenta wrote:
| I disagree. Getting a computer to play a game like a human has
| an incredibly broad range of applications. Imagine a system
| like this that is on autopilot, but can get suggestions from a
| twitch chat, nudging its behavior in a specific direction. Two
| such systems could be run by two teams, and they could do a
| weekly battle.
|
| This isn't an exercise in AI, it's an exercise in TV production
| IMO.
| minimaxir wrote:
| The AI Plays Pokemon project only made it to Mt. Moon (where
| coincidentally ClaudePlaysPokemon is stuck now) with many
| months of iteration and many many hours of compute.
|
| The reason Claude 3.7's performance is interesting is that the
| LLM approach defeated Lt. Surge, far past Mt. Moon. (I wonder
| how Claude solved the infamous puzzle in Surge's gym)
|
| https://www.anthropic.com/research/visible-extended-thinking
| deadbabe wrote:
| Not talking about Reinforcement learning type AI, I'm talking
| about classically programmed AI with standard pathfinders,
| GOAP, behavior trees, etc...
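As one small piece of the classical approach mentioned above, a standard pathfinder can be sketched as a breadth-first search over a tile grid. This is a minimal sketch under stated assumptions: the grid encoding (`#` for walls), the `find_path` name, and the button labels are all hypothetical, and a real agent would still need to extract the walkable map from the game.

```python
from collections import deque


def find_path(grid, start, goal):
    """Breadth-first search on a tile grid.

    grid: list of strings; '#' is a wall, anything else is walkable.
    start, goal: (row, col) tuples.
    Returns a list of button names, or None if the goal is unreachable.
    """
    moves = {"UP": (-1, 0), "DOWN": (1, 0), "LEFT": (0, -1), "RIGHT": (0, 1)}
    queue = deque([start])
    came_from = {start: None}  # tile -> (previous tile, button pressed)
    while queue:
        current = queue.popleft()
        if current == goal:
            break
        for button, (dr, dc) in moves.items():
            r, c = current[0] + dr, current[1] + dc
            if not (0 <= r < len(grid) and 0 <= c < len(grid[r])):
                continue  # off the map
            if grid[r][c] == "#" or (r, c) in came_from:
                continue  # wall, or already visited
            came_from[(r, c)] = (current, button)
            queue.append((r, c))
    if goal not in came_from:
        return None
    # Walk back from the goal to the start, collecting button presses.
    path = []
    node = goal
    while came_from[node] is not None:
        node, button = came_from[node]
        path.append(button)
    path.reverse()
    return path
```

Because every step costs the same, breadth-first search already returns a shortest route; A* would only matter on larger maps or with weighted terrain.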
| adenta wrote:
| Got a link handy?
| Philpax wrote:
| But how much effort do you have to put in to build an agent
| that can play a specific game? Can you retarget that agent
| easily? How well will your agent deal with circumstances
| that it wasn't designed for?
| drusepth wrote:
| I don't think this project is meant to "solve" a task (hammer,
| nail) so much as it's just an interesting "what if" experiment
| to observe and play around with new technology.
| futureshock wrote:
| I know what you are saying, but I very much disagree. There are
| also better chess engines. That's not the point.
|
| It's all about the "G" in AGI. This is a nice demonstration of
| how LLMs are a generalizable intelligence. It was not designed
| to play Pokemon, Pokemon was no special part of its training
| set, Pokemon was not part of its evaluation criteria. And yet,
| it plays Pokemon, and rather well!
|
| And to see each iteration of Claude be able to progress further
| and faster in Pokemon helps demonstrate that each generation of
| the LLM is getting smarter in general, not just better fitted
| to standard benchmarks.
|
| The point is to build the universal hammer that can hammer
| every nail, just as the human mind is the universal hammer.
| wordpad25 wrote:
| Pokemon guides were definitely part of every LLM training
| set. Game is so old, there are thousands of guides and videos
| on the topic.
|
| LLMs will readily offer high quality Pokemon gameplay advice
| without needing to search online.
| minimaxir wrote:
| The operative phrase of that comment being "no special
| part."
|
| If you watch the Twitch stream it is obvious Claude has
| general knowledge of what to do to win in Pokemon but
| cannot recall specifics.
| northern-lights wrote:
| E.g., Bug-type attacks are super effective against Poison types
| in Gen 1 but not very effective in Gen 2 and onwards. But Claude
| keeps bringing Nidoran into Weedle/Caterpie.
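The generation-dependent matchup described above can be sketched as a per-generation lookup. This is an illustrative fragment only: it encodes just the Bug-versus-Poison multipliers named in the comment, and `effectiveness` and the chart names are hypothetical.

```python
# Multipliers for the single matchup discussed above. A full chart would
# list every attacking/defending type pair for each generation.
GEN1_CHART = {("BUG", "POISON"): 2.0}       # super effective in Gen 1
GEN2_PLUS_CHART = {("BUG", "POISON"): 0.5}  # not very effective from Gen 2 on


def effectiveness(attack_type, defend_type, generation):
    """Return the damage multiplier, or None if the pair is not listed."""
    chart = GEN1_CHART if generation == 1 else GEN2_PLUS_CHART
    return chart.get((attack_type.upper(), defend_type.upper()))
```

Returning None for unlisted pairs, rather than defaulting to 1.0, keeps a partial table from silently passing off a guess as a fact.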
| imtringued wrote:
| It's a publicity stunt by Anthropic (Claude plays Pokemon).
|
| Obviously they are going to show off their LLM.
| dang wrote:
| Related ongoing thread:
|
| _Claude Plays Pokemon_ -
| https://news.ycombinator.com/item?id=43173825
| tgtweak wrote:
| You can use Claude computer functions to actually play it on an
| emulator with no programming at all - but that kind of feels like
| cheating :D
| adenta wrote:
| I tried! It didn't work super well
| mclau156 wrote:
| Honestly, Claude 3.7 can make a Pokemon game in pygame fairly
| easily; at that point it would have a lot more control over it.
| evanextreme wrote:
| Was working on a similar thing last year! Might as well open
| source at this point too.
| adenta wrote:
| Email me when it launches! (In profile)
| podoman wrote:
| Have you considered calling this bot "intern bot"? - Jay
| adenta wrote:
| https://static.wikia.nocookie.net/loveinterest/images/5/5c/3...
| montebicyclelo wrote:
| Super cool to see this idea working. I had a go at getting an LLM
| to play Pokemon in 2023, with OpenAI vision. With only 100
| expensive API calls a day, I shelved the project after putting
| together a quick POC and finding that the model struggled to see
| things or work out where the player was. I guess models are now
| better, but also looks like people are providing the model with
| information in addition to the game screen.
|
| https://x.com/sidradcliffe/status/1722355983643525427?t=dYMk...
| adenta wrote:
| The vision models still struggle in my experience. I got around
| that by reading the RAM and describing all the objects'
| positions on screen.
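The workaround described above, turning object positions read from RAM into text for the model, might look roughly like the following. This is a sketch under assumptions, not the author's implementation: `ScreenObject`, `describe_screen`, and the coordinate convention are hypothetical, and the step that actually reads RAM is left out.

```python
from dataclasses import dataclass


@dataclass
class ScreenObject:
    name: str
    x: int  # tile column, increasing to the right
    y: int  # tile row, increasing downward


def describe_screen(player, objects):
    """Build a plain-text description of objects relative to the player."""
    def distance(obj):
        return abs(obj.x - player.x) + abs(obj.y - player.y)

    lines = [f"Player is at tile ({player.x}, {player.y})."]
    for obj in sorted(objects, key=distance):  # nearest objects first
        dx, dy = obj.x - player.x, obj.y - player.y
        parts = []
        if dy:
            parts.append(f"{abs(dy)} {'down' if dy > 0 else 'up'}")
        if dx:
            parts.append(f"{abs(dx)} {'right' if dx > 0 else 'left'}")
        offset = ", ".join(parts) if parts else "same tile"
        lines.append(f"{obj.name} at ({obj.x}, {obj.y}): {offset}.")
    return "\n".join(lines)
```

The resulting text can be placed in the prompt alongside, or instead of, a screenshot, so the model does not have to infer positions from pixels.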
___________________________________________________________________
(page generated 2025-02-26 23:00 UTC)