[HN Gopher] Open SWE: An open-source asynchronous coding agent
___________________________________________________________________
Open SWE: An open-source asynchronous coding agent
https://www.youtube.com/watch?v=TaYVvXbOs8c
https://github.com/langchain-ai/open-swe
Author : palashshah
Score : 55 points
Date : 2025-08-08 16:16 UTC (6 hours ago)
(HTM) web link (blog.langchain.com)
(TXT) w3m dump (blog.langchain.com)
| dabockster wrote:
| > We believe that all agents will look more like this in the
| future - long running, asynchronous, more autonomous.
| Specifically, we think that they will:
|
| > Run asynchronously in the cloud
|
| > cloud
|
| Reality check:
|
| https://huggingface.co/Menlo/Jan-nano-128k-gguf
|
| That model will run, with decent conversation quality, at roughly
| the same memory footprint as a few Chrome tabs. It's only a
| matter of time until we get coding models that can do that, and
| then only a further matter of time until we see agentic
| capabilities at that memory footprint. I mean, I can already get
| agentic coding with one of the new Qwen3 models - super slowly,
| but it does work. And the quality matches or even beats some of
| the cloud models and vibe coding apps.
|
| And that model is just one example. Researchers all over the
| world are making new models almost daily that can run on an off-
| the-shelf gaming computer. If you have a modern Nvidia graphics
| card, you can run AI on your own computer totally offline. That's
| the reality.
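|
| For illustration, here's a minimal sketch of running that GGUF
| locally with llama-cpp-python (the quant filename pattern is an
| assumption - use whichever file you actually download):
|
|     from llama_cpp import Llama
|
|     # Pull a quantized build from Hugging Face and offload all
|     # layers to the GPU (n_gpu_layers=-1). The filename pattern
|     # is an assumption; match it to the quant you want.
|     llm = Llama.from_pretrained(
|         repo_id="Menlo/Jan-nano-128k-gguf",
|         filename="*Q4_K_M.gguf",
|         n_ctx=32768,      # big context, still below the 128K max
|         n_gpu_layers=-1,
|     )
|
|     out = llm.create_chat_completion(
|         messages=[{"role": "user", "content": "What is MCP?"}]
|     )
|     print(out["choices"][0]["message"]["content"])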
| koakuma-chan wrote:
| Do you know what "MCP-based methodology" is? I am skeptical of
| a 4B model scoring twice as high as Gemini 2.5 Pro
| dabockster wrote:
| Yeah I know about Model Context Protocol. But it's still only
| a small part of the AI puzzle. I'm saying that we're at a
| point now where a whole AI stack can run, in some form, 100%
| on-device with okayish accuracy. When you think about that,
| and where we're headed, it makes the whole idea of cloud AI
| look like a dinosaur.
| koakuma-chan wrote:
| I mean, I am asking what "MCP-based methodology" is,
| because it doesn't make sense for a 4B model to outperform
| Gemini 2.5 Pro et al by that much.
| toshinoriyagi wrote:
| I'm not too sure what "MCP-based methodology" is, but Jan-
| nano-128k is a small model specifically designed to be able
| to answer in-depth questions accurately via tool-use
| (researching in a provided document or searching the web).
|
| It outperforms those other models, which are not using tools,
| thanks to the tool use and specificity.
|
| Because it is only 4B parameters, I believe it is naturally
| terrible at other things; it's not designed for them and doesn't
| have enough parameters.
|
| In hindsight, "MCP-based methodology" likely refers to its
| tool-use.
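|
| As a rough illustration (hypothetical helpers, not the paper's
| actual harness), that tool-use pattern is essentially a loop
| where the model may call a search tool and the result is fed
| back in before it answers:
|
|     # Sketch of a tool-use loop: `chat` is any local model
|     # endpoint (e.g. served by llama.cpp) and `web_search`
|     # stands in for an MCP search/scrape server. Both are
|     # assumptions for illustration only.
|     def answer_with_tools(question, chat, web_search):
|         messages = [{"role": "user", "content": question}]
|         while True:
|             reply = chat(messages)
|             messages.append(reply)
|             if reply.get("tool_call"):
|                 result = web_search(reply["tool_call"]["query"])
|                 messages.append({"role": "tool", "content": result})
|             else:
|                 return reply["content"]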
| cbcoutinho wrote:
| From the paper:
|
| > Most language models face a fundamental tradeoff where
| powerful capabilities require substantial computational
| resources. We shatter this constraint with Jan-nano, a 4B
| parameter language model that redefines efficiency through
| radical specialization: instead of trying to know everything,
| it masters the art of finding anything instantly. Fine-tuned
| from Qwen3-4B using our novel multi-stage Reinforcement
| Learning with Verifiable Rewards (RLVR) system that
| completely eliminates reliance on next token prediction
| training (SFT), Jan-nano achieves 83.2% on SimpleQA benchmark
| with MCP integration while running on consumer hardware. With
| 128K context length, Jan-nano proves that intelligence isn't
| about scale, it's about strategy.
|
| > For our MCP evaluation, we used mcp-server-serper which
| provides google search and scrape tools
|
| https://arxiv.org/abs/2506.22760
| Martinussen wrote:
| Data storage has gotten cheaper and more efficient/manageable
| every year for decades, yet people seem content with having
| less storage than a mid-range desktop from a decade and a half
| ago, split between their phone and laptop, and leaving
| everything else to the "> cloud". I wouldn't be so sure we're
| going to see people reach for technological independence this
| time either.
| merelysounds wrote:
| One factor here is people preferring portable devices. Note
| that portable SSDs are also popular.
|
| Also, usage patterns can be different; with storage, if I use
| 90% of my local content only occasionally, I can archive that
| to the cloud and continue using the remaining local 10%.
| prophesi wrote:
| I'm also excited for local LLM's to be capable of assisting
| with nontrivial coding tasks, but we're far from reaching that
| point. VRAM remains a huge bottleneck for even a top-of-the-
| line gaming PC to run them. The best these days for agentic
| coding that get close to the vibe-check of frontier models seem
| to be Qwen3-Coder-480B-A35B-Instruct, DeepSeek-Coder-V2-236B,
| GLM 4.5, and GPT-OSS-120B. The latter is the only one capable of
| fitting on a 64 to 96GB VRAM machine with quantization.
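|
| For context, a back-of-envelope sketch of why those numbers land
| where they do (weights-only estimate plus an assumed ~20%
| overhead; real usage varies with context length and
| architecture):
|
|     # Rough VRAM estimate: quantized weights plus ~20% overhead
|     # for KV cache and activations (an assumption, not measured).
|     def vram_gb(params_billion, bits_per_weight, overhead=0.2):
|         weights_gb = params_billion * bits_per_weight / 8
|         return weights_gb * (1 + overhead)
|
|     print(vram_gb(120, 4))   # ~72 GB: borderline on a 96GB box
|     print(vram_gb(480, 4))   # ~288 GB: far beyond consumer GPUs
|     print(vram_gb(30, 4))    # ~18 GB: fits a single 24GB card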
|
| Of course, the line will always be pushed back as frontier
| models incrementally improve, but the quality is night and day
| between these open models consumers can feasibly run versus
| even the cheaper frontier models.
|
| That said, I too have no interest in this if local models
| aren't supported, and hope that's in the pipeline just so I
| can try tinkering with it. Though it looks like it utilizes
| multiple models for various tasks (planner, programmer,
| reviewer, router, and summarizer) so that only adds to the
| difficulty of the VRAM bottleneck if you'd like to load
| different models per task. So I think it makes sense for them
| to focus on just Claude for now to prove the concept.
|
| edit: I personally use Qwen3 Coder 30B 4bit for both
| autocomplete and talking to an agent, and switch to a frontier
| model for the agent when Qwen3 starts running in circles.
| cowpig wrote:
| I was excited by the announcement but then
|
| > Runs in an isolated sandbox Every task runs in a secure,
| isolated Daytona sandbox.
|
| Oh, so fake open source? Daytona is an AGPL-licensed codebase
| that doesn't actually open-source the control plane, and the
| first instruction in the README is to sign up for their service.
|
| > From the "open-swe" README:
|
| Open SWE can be used in multiple ways:
|
| * From the UI. You can create, manage and execute Open SWE tasks
| from the web application. See the 'From the UI' page in the docs
| for more information.
|
| * From GitHub. You can start Open SWE tasks directly from GitHub
| issues simply by adding a label open-swe, or open-swe-auto
| (adding -auto will cause Open SWE to automatically accept the
| plan, requiring no intervention from you). For enhanced
| performance on complex tasks, use open-swe-max or open-swe-max-
| auto labels which utilize Claude Opus 4.1 for both planning and
| programming. See the 'From GitHub' page in the docs for more
| information.
|
| * * *
|
| The "from the UI" links to their hosted web interface. If I
| cannot run it myself it's fake open-source
| mitchitized wrote:
| Hol up
|
| How can it be AGPL and not provide full source? AGPL is like
| the most aggressive of the GPL license variants. If they
| somehow circumvented the intent behind this license that is a
| problem.
| Multicomp wrote:
| Spitballing here, but if it's their code that they have
| copyright on, they can license it to us as AGPL, without
| binding themselves to those same terms. They have all rights
| as copyright holders regardless of a given license.
| esafak wrote:
| It's a hosted service with an open source client?
| tevon wrote:
| Very cool! Am using it now and really like the sidebar chat that
| allows you to add context during a run.
|
| I hit an error that was not recoverable. I'd love to see
| functionality to bring all that context over to a new thread, or
| otherwise force it to attempt to recover.
| lta wrote:
| Nice, but I want exactly the opposite. I want my agents to run
| locally without any sort of black box and I certainly don't want
| to be stuck with whatever UI you've designed to interact with the
| git provider you've selected.
|
| It's not super surprising coming from this pile of over-
| engineering so thick I'm surprised it wasn't developed by
| Microsoft in the 90s or 00s.
| kristianp wrote:
| Yes, where's the open source agent that runs on the command
| line?
| ryuuseijin wrote:
| It's called opencode: https://opencode.ai/
| numpad0 wrote:
| TIL the opencode-opencode name conflict was resolved by sst's
| opencode [1] keeping the opencode name and the other opencode
| [2] renaming to Crush [3]
|
| 1: https://github.com/sst/opencode
|
| 2: https://github.com/opencode-ai/opencode
|
| 3: https://github.com/charmbracelet/crush
___________________________________________________________________
(page generated 2025-08-08 23:00 UTC)