[HN Gopher] Open SWE: An open-source asynchronous coding agent
       ___________________________________________________________________
        
       Open SWE: An open-source asynchronous coding agent
        
       https://www.youtube.com/watch?v=TaYVvXbOs8c
       https://github.com/langchain-ai/open-swe
        
       Author : palashshah
       Score  : 55 points
       Date   : 2025-08-08 16:16 UTC (6 hours ago)
        
 (HTM) web link (blog.langchain.com)
 (TXT) w3m dump (blog.langchain.com)
        
       | dabockster wrote:
        | > We believe that all agents will look more like this in the
        | future - long running, asynchronous, more autonomous.
       | Specifically, we think that they will:
       | 
       | > Run asynchronously in the cloud
       | 
       | > cloud
       | 
       | Reality check:
       | 
       | https://huggingface.co/Menlo/Jan-nano-128k-gguf
       | 
       | That model will run, with decent conversation quality, at roughly
       | the same memory footprint as a few Chrome tabs. It's only a
       | matter of time until we get coding models that can do that, and
       | then only a further matter of time until we see agentic
       | capabilities at that memory footprint. I mean, I can already get
       | agentic coding with one of the new Qwen3 models - super slowly,
        | but it does work. And the quality matches or even beats some of
        | the cloud models and vibe coding apps.
       | 
       | And that model is just one example. Researchers all over the
       | world are making new models almost daily that can run on an off-
       | the-shelf gaming computer. If you have a modern Nvidia graphics
       | card, you can run AI on your own computer totally offline. That's
       | the reality.
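        | 
        | As a rough illustration (not tied to any particular runtime):
        | once a local GGUF model is served behind an OpenAI-compatible
        | endpoint (llama.cpp, Ollama and LM Studio all do this), driving
        | it takes a few lines of Python. The port and model name below
        | are placeholders for whatever your local server exposes:
        | 
        |     # Minimal sketch: talk to a locally served model through an
        |     # OpenAI-compatible endpoint; no cloud involved.
        |     from openai import OpenAI
        | 
        |     client = OpenAI(base_url="http://localhost:8080/v1",
        |                     api_key="not-needed")
        |     resp = client.chat.completions.create(
        |         model="jan-nano-128k",  # placeholder local model name
        |         messages=[{"role": "user",
        |                    "content": "Summarize this repo's README."}],
        |     )
        |     print(resp.choices[0].message.content)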
        
         | koakuma-chan wrote:
         | Do you know what "MCP-based methodology" is? I am skeptical of
         | a 4B model scoring twice as high as Gemini 2.5 Pro
        
           | dabockster wrote:
           | Yeah I know about Model Context Protocol. But it's still only
           | a small part of the AI puzzle. I'm saying that we're at a
           | point now where a whole AI stack can run, in some form, 100%
           | on-device with okayish accuracy. When you think about that,
           | and where we're headed, it makes the whole idea of cloud AI
           | look like a dinosaur.
        
             | koakuma-chan wrote:
             | I mean, I am asking what "MCP-based methodology" is,
             | because it doesn't make sense for a 4B model to outperform
             | Gemini 2.5 Pro et al by that much.
        
           | toshinoriyagi wrote:
           | I'm not too sure what "MCP-based methodology" is, but Jan-
           | nano-128k is a small model specifically designed to be able
           | to answer in-depth questions accurately via tool-use
           | (researching in a provided document or searching the web).
           | 
           | It outperforms those other models, which are not using tools,
           | thanks to the tool use and specificity.
           | 
            | Because it is only 4B parameters, it is, I believe, naturally
            | terrible at other things - it's not designed for them and
            | doesn't have enough parameters.
           | 
           | In hindsight, "MCP-based methodology" likely refers to its
           | tool-use.
        
           | cbcoutinho wrote:
           | From the paper:
           | 
           | > Most language models face a fundamental tradeoff where
           | powerful capabilities require substantial computational
           | resources. We shatter this constraint with Jan-nano, a 4B
           | parameter language model that redefines efficiency through
           | radical specialization: instead of trying to know everything,
           | it masters the art of finding anything instantly. Fine-tuned
           | from Qwen3-4B using our novel multi-stage Reinforcement
           | Learning with Verifiable Rewards (RLVR) system that
           | completely eliminates reliance on next token prediction
           | training (SFT), Jan-nano achieves 83.2% on SimpleQA benchmark
           | with MCP integration while running on consumer hardware. With
           | 128K context length, Jan-nano proves that intelligence isn't
           | about scale, it's about strategy.
           | 
           | > For our MCP evaluation, we used mcp-server-serper which
           | provides google search and scrape tools
           | 
           | https://arxiv.org/abs/2506.22760
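            | 
            | Concretely, the "MCP-based methodology" seems to mean the
            | model is scored with search/scrape tools in the loop rather
            | than answering from memory. A stripped-down sketch of such a
            | loop (the web_search stand-in and the local endpoint are
            | illustrative assumptions, not the paper's actual harness):
            | 
            |     # Illustrative tool-use loop: the model may call a search
            |     # tool before answering. web_search is a stand-in for an
            |     # MCP server like mcp-server-serper.
            |     import json
            |     from openai import OpenAI
            | 
            |     client = OpenAI(base_url="http://localhost:8080/v1",
            |                     api_key="none")
            |     tools = [{"type": "function", "function": {
            |         "name": "web_search",
            |         "description": "Search the web, return snippets",
            |         "parameters": {
            |             "type": "object",
            |             "properties": {"query": {"type": "string"}},
            |             "required": ["query"]}}}]
            | 
            |     def web_search(query: str) -> str:
            |         # Placeholder for a real search/scrape backend.
            |         return "...search results for: " + query
            | 
            |     messages = [{"role": "user",
            |                  "content": "Who won the 2006 Fields Medal?"}]
            |     while True:
            |         resp = client.chat.completions.create(
            |             model="jan-nano", messages=messages, tools=tools)
            |         msg = resp.choices[0].message
            |         if not msg.tool_calls:
            |             print(msg.content)
            |             break
            |         messages.append(msg)
            |         for call in msg.tool_calls:
            |             args = json.loads(call.function.arguments)
            |             messages.append({"role": "tool",
            |                              "tool_call_id": call.id,
            |                              "content": web_search(**args)})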
        
         | Martinussen wrote:
         | Data storage has gotten cheaper and more efficient/manageable
         | every year for decades, yet people seem content with having
         | less storage than a mid-range desktop from a decade and a half
         | ago, split between their phone and laptop, and leaving
         | everything else to the "> cloud" - I wouldn't be so sure we're
         | going to see people reach for technological independence this
         | time either.
        
           | merelysounds wrote:
           | One factor here is people preferring portable devices. Note
           | that portable SSDs are also popular.
           | 
           | Also, usage patterns can be different; with storage, if I use
           | 90% of my local content only occasionally, I can archive that
           | to the cloud and continue using the remaining local 10%.
        
         | prophesi wrote:
         | I'm also excited for local LLM's to be capable of assisting
         | with nontrivial coding tasks, but we're far from reaching that
         | point. VRAM remains a huge bottleneck for even a top-of-the-
         | line gaming PC to run them. The best these days for agentic
         | coding that get close to the vibe-check of frontier models seem
         | to be Qwen3-Coder-480B-A35B-Instruct, DeepSeek-Coder-V2-236B,
         | GLM 4.5, and GPT-OSS-120B. The latter being the only one
         | capable of fitting on a 64 to 96GB VRAM machine with
         | quantization.
         | 
         | Of course, the line will always be pushed back as frontier
         | models incrementally improve, but the quality is night and day
         | between these open models consumers can feasibly run versus
         | even the cheaper frontier models.
         | 
          | That said, I too have no interest in this if local models
          | aren't supported, and I hope that support is in the pipeline
          | just so I can try tinkering with it. It looks like it uses
          | multiple models for different tasks (planner, programmer,
          | reviewer, router, and summarizer), which only compounds the
          | VRAM bottleneck if you want to load a different model per
          | task. So I think it makes sense for them to focus on just
          | Claude for now to prove the concept.
         | 
         | edit: I personally use Qwen3 Coder 30B 4bit for both
         | autocomplete and talking to an agent, and switch to a frontier
         | model for the agent when Qwen3 starts running in circles.
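          | 
          | For a rough sense of why quantization is the deciding factor
          | (weights only - KV cache and runtime overhead come on top, so
          | treat these as lower bounds):
          | 
          |     # Back-of-the-envelope weight memory: params * bytes/weight
          |     def weight_gb(params_b: float, bits: float) -> float:
          |         return params_b * 1e9 * bits / 8 / 1e9
          | 
          |     for name, p in [("GPT-OSS-120B", 120),
          |                     ("Qwen3-Coder-480B", 480),
          |                     ("Qwen3 Coder 30B", 30)]:
          |         print(f"{name}: {weight_gb(p, 16):.0f} GB fp16, "
          |               f"{weight_gb(p, 4):.0f} GB 4-bit")
          |     # GPT-OSS-120B: ~240 GB fp16 vs ~60 GB 4-bit, which is why
          |     # it's the only one of the four near a 64-96GB machine.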
        
       | cowpig wrote:
       | I was excited by the announcement but then
       | 
        | > Runs in an isolated sandbox: Every task runs in a secure,
        | isolated Daytona sandbox.
       | 
       | Oh, so fake open source? Daytona is an AGPL-licensed codebase
       | that doesn't actually open-source the control plane, and the
       | first instruction in the README is to sign up for their service.
       | 
       | > From the "open-swe" README:
       | 
       | Open SWE can be used in multiple ways:
       | 
       | * From the UI. You can create, manage and execute Open SWE tasks
       | from the web application. See the 'From the UI' page in the docs
       | for more information.
       | 
       | * From GitHub. You can start Open SWE tasks directly from GitHub
       | issues simply by adding a label open-swe, or open-swe-auto
       | (adding -auto will cause Open SWE to automatically accept the
       | plan, requiring no intervention from you). For enhanced
       | performance on complex tasks, use open-swe-max or open-swe-max-
       | auto labels which utilize Claude Opus 4.1 for both planning and
       | programming. See the 'From GitHub' page in the docs for more
       | information.
       | 
       | * * *
       | 
       | The "from the UI" links to their hosted web interface. If I
       | cannot run it myself it's fake open-source
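        | 
        | (Aside: the label-triggered flow in the README is just the
        | ordinary GitHub issues API - a sketch, with OWNER/REPO and the
        | issue number as placeholders:)
        | 
        |     # Add the "open-swe" label to an issue to kick off a task.
        |     # Requires a token with repo scope in GITHUB_TOKEN.
        |     import os, requests
        | 
        |     resp = requests.post(
        |         "https://api.github.com/repos/OWNER/REPO/issues/123/labels",
        |         headers={
        |             "Authorization": f"Bearer {os.environ['GITHUB_TOKEN']}",
        |             "Accept": "application/vnd.github+json",
        |         },
        |         json={"labels": ["open-swe"]},
        |     )
        |     resp.raise_for_status()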
        
         | mitchitized wrote:
         | Hol up
         | 
         | How can it be AGPL and not provide full source? AGPL is like
         | the most aggressive of the GPL license variants. If they
          | somehow circumvented the intent behind this license, that is a
          | problem.
        
            | Multicomp wrote:
            | Spitballing here, but if it's their code and they hold the
            | copyright, they can license it to us as AGPL without binding
            | themselves to those same terms. As copyright holders they
            | retain all rights regardless of any license they grant.
        
         | esafak wrote:
         | It's a hosted service with an open source client?
        
       | tevon wrote:
       | Very cool! Am using it now and really like the sidebar chat that
       | allows you to add context during a run.
       | 
       | I hit an error that was not recoverable. I'd love to see
       | functionality to bring all that context over to a new thread, or
       | otherwise force it to attempt to recover.
        
       | lta wrote:
       | Nice, but I want exactly the opposite. I want my agents to run
       | locally without any sort of black box and I certainly don't want
       | to be stuck with whatever UI you've designed to interact with the
       | git provider you've selected.
       | 
        | It's not super surprising coming from this pile of over-
        | engineering, so thick I'm surprised it wasn't developed by
        | Microsoft in the 90s or 00s.
        
         | kristianp wrote:
         | Yes, where's the open source agent that runs on the command
         | line?
        
           | ryuuseijin wrote:
           | It's called opencode: https://opencode.ai/
        
             | numpad0 wrote:
             | TIL opencode-opencode name conflict was resolved by
             | opencode keeping opencode name and opencode renaming to
             | Crush
             | 
             | 1: https://github.com/sst/opencode
             | 
             | 2: https://github.com/opencode-ai/opencode
             | 
             | 3: https://github.com/charmbracelet/crush
        
       ___________________________________________________________________
       (page generated 2025-08-08 23:00 UTC)