hngopher.com

       [HN Gopher] Show HN: FLE v0.3 - Claude Code Plays Factorio
       ___________________________________________________________________
        
       We're excited to release v0.3.0 of the Factorio Learning
       Environment (FLE), an open-source environment for evaluating AI
       agents on long-horizon planning, spatial reasoning, and automation
       tasks.

       == What is FLE? ==

       FLE uses the game Factorio to test whether AI can handle complex,
       open-ended engineering challenges. Agents write Python code to
       build automated factories, progressing from simple resource
       extraction (~30 units/min) to sophisticated production chains
       (millions of units/sec).

       == What's new in 0.3.0 ==

       - Headless scaling: FLE no longer needs the game client, enabling
         massive parallelization!
       - OpenAI Gym compatibility: a standard interface for RL research
       - Claude Code integration: we're livestreaming Claude playing
         Factorio [on Twitch](http://twitch.tv/playsfactorio)
       - Better tooling and SDK: one-line CLI commands to run evaluations
         (with W&B logging)

       == Key findings ==

       We evaluated frontier models (Claude Opus 4.1, GPT-5, Gemini 2.5
       Pro, Grok 4) on 24 production automation tasks of increasing
       complexity. Even the best models struggle:

       - Most models still rely on semi-manual strategies rather than
         true automation.
       - Agents rarely define helper functions or abstractions, which
         limits their ability to scale.
       - Error recovery remains difficult: agents often get stuck in
         repetitive failure loops.

       The performance gap between models on FLE correlates more closely
       with real-world task benchmarks (such as GDPVal) than with
       traditional coding and reasoning evals.

       == Why this matters ==

       Unlike exam-style benchmarks, which saturate quickly, Factorio's
       exponential complexity scaling means there's effectively no
       performance ceiling. The skills it demands (system debugging,
       constraint satisfaction, logistics optimization) transfer directly
       to real-world challenges.

       == Try it yourself ==

         $ uv add factorio-learning-environment
         $ uv add "factorio-learning-environment[eval]"
         $ fle cluster start
         $ fle eval --config configs/gym_run_config.json

       We're looking for researchers, engineers, and modders interested
       in pushing the boundaries of agent capabilities. Join our Discord
       if you want to contribute. We look forward to meeting you and
       seeing what you can build!

       -- FLE Team
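To put the post's throughput range in perspective, here is a back-of-the-envelope calculation. It assumes "millions of units/sec" means roughly 10^6 units/s; the post gives no exact upper figure.

```latex
\[
30\ \text{units/min} = 0.5\ \text{units/s}, \qquad
\frac{10^{6}\ \text{units/s}}{0.5\ \text{units/s}} = 2\times10^{6}
\]
\[
\log_{10}\!\left(2\times10^{6}\right) \approx 6.3\ \text{orders of magnitude}, \qquad
\log_{2}\!\left(2\times10^{6}\right) \approx 20.9\ \text{doublings}
\]
```

Even at the low end of "millions", an agent would need about 21 successive doublings of throughput to span the range. That is one way to read the claim that the benchmark has no practical ceiling.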
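The post mentions OpenAI Gym compatibility without showing the interface. The sketch below is a hypothetical toy environment, not FLE's actual class. It only mimics the Gymnasium-style `reset()`/`step()` signatures, so the shape of an evaluation loop is visible. Treating an action as a string of Python code is an assumption based on the post's note that agents write Python. The reward rule and the names `ToyFactoryEnv` and `run_episode` are invented for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

Obs = Dict[str, float]


@dataclass
class ToyFactoryEnv:
    """Hypothetical stand-in for a Gym-style FLE task (not the real FLE class).

    Mirrors the Gymnasium conventions:
      reset() -> (obs, info)
      step(action) -> (obs, reward, terminated, truncated, info)
    """

    max_steps: int = 5
    steps: int = 0
    throughput: float = 0.0
    history: List[str] = field(default_factory=list)

    def reset(
        self, seed: Optional[int] = None, options: Optional[dict] = None
    ) -> Tuple[Obs, Dict[str, Any]]:
        self.steps = 0
        self.throughput = 0.0
        self.history = []
        return {"throughput": self.throughput}, {}

    def step(self, action: str) -> Tuple[Obs, float, bool, bool, Dict[str, Any]]:
        # The action is a program string, as an agent would write; this toy never runs it.
        # Invented reward rule: any program mentioning "drill" adds 0.5 units/s.
        self.steps += 1
        self.history.append(action)
        reward = 0.5 if "drill" in action else 0.0
        self.throughput += reward
        terminated = False                        # toy task has no success condition
        truncated = self.steps >= self.max_steps  # episode ends on a step budget
        return {"throughput": self.throughput}, reward, terminated, truncated, {}


def run_episode(env: ToyFactoryEnv, policy: Callable[[Obs], str]) -> float:
    """Gym-style loop: reset once, then step until terminated or truncated."""
    obs, _info = env.reset(seed=0)
    total = 0.0
    while True:
        obs, reward, terminated, truncated, _info = env.step(policy(obs))
        total += reward
        if terminated or truncated:
            return total


print(run_episode(ToyFactoryEnv(), lambda obs: "place_drill()"))  # 2.5
```

A real FLE environment would execute the agent's program against a running game server and score actual production. The FLE documentation is the place to find its real environment IDs and observation format.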
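One of the findings is that agents rarely define helper functions. As a hypothetical illustration, consider wrapping a repeated placement pattern in a function. The `place` callable stands in for whatever game call places an entity and is not FLE's API. The `build_drill_row` name is also invented. The default spacing of 2 assumes the 2x2 footprint of Factorio's burner mining drill.

```python
from typing import Callable, List, Tuple

Position = Tuple[int, int]


def build_drill_row(
    place: Callable[[str, Position], None],
    origin: Position,
    count: int,
    spacing: int = 2,
) -> List[Position]:
    """Place `count` drills in a horizontal row, `spacing` tiles apart.

    `place` is a hypothetical stand-in for the game's entity-placement call.
    Scaling the build means changing `count`, not copying more placement calls.
    """
    x0, y0 = origin
    positions = [(x0 + i * spacing, y0) for i in range(count)]
    for pos in positions:
        place("burner-mining-drill", pos)
    return positions


# Usage with a recording stub instead of a real game connection.
placed: List[Tuple[str, Position]] = []
row = build_drill_row(lambda name, pos: placed.append((name, pos)), origin=(10, 4), count=3)
print(row)  # [(10, 4), (12, 4), (14, 4)]
```

Without the helper, an agent writes one placement call per drill. Growing from 3 drills to 30 then means 27 more hand-written lines rather than one changed argument.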
        
       Author : noddybear
       Score  : 38 points
       Date   : 2025-10-03 19:32 UTC (3 hours ago)
        
 (HTM) web link (jackhopkins.github.io)
        
       | bottydim wrote:
       | haha, I am sure somewhere, some PhD student told their
       | supervisor: "No, seriously, I have to play 600 hours of
       | Factorio... for science."
        
       | georgeh4cks wrote:
       | Loving the 'Claude plays' integration. Great work
        
       | dang wrote:
       | Related. Others?
       | 
       |  _Multi-Agent Coordination in Factorio: FLE v0.2.0_ -
       | https://news.ycombinator.com/item?id=43926829 - May 2025 (5
       | comments)
       | 
       |  _Show HN: Factorio Learning Environment - Agents Build
       | Factories_ - https://news.ycombinator.com/item?id=43331582 -
       | March 2025 (209 comments)
        
         | noddybear wrote:
          | This is our earlier work. Since May we've made it really easy
          | for the community to build their own agents to play the game:
          | you can now hook up Claude Code from your terminal to play it.
        
           | dang wrote:
           | That's great!
           | 
           | (just for clarity: links to past threads in no way imply that
           | the new post isn't welcome! They're just because some readers
           | enjoy poking back through past related discussions as well)
        
       | yeasku wrote:
       | Are biters and cliffs disabled?
        
         | noddybear wrote:
         | Biters are disabled, but cliffs are not
        
       | kyars wrote:
       | Live-stream is epic
        
       ___________________________________________________________________
       (page generated 2025-10-03 23:00 UTC)