[HN Gopher] Qwen3-Coder: Agentic Coding in the World
___________________________________________________________________
Qwen3-Coder: Agentic Coding in the World
Author : danielhanchen
Score : 99 points
Date : 2025-07-22 21:12 UTC (1 hour ago)
(HTM) web link (qwenlm.github.io)
(TXT) w3m dump (qwenlm.github.io)
| jddj wrote:
| Odd to see this languishing at the bottom of /new. Looks very
| interesting.
|
| Open, small, if the benchmarks are to be believed sonnet 4~ish,
| tool use?
| danielhanchen wrote:
| Ye, the model looks extremely powerful! I think they may also
| be making a smaller variant, but unsure yet!
| sourcecodeplz wrote:
| Yes they are:
|
| "Today, we're announcing Qwen3-Coder, our most agentic code
| model to date. Qwen3-Coder is available in multiple sizes,
| but we're excited to introduce its most powerful variant
| first: Qwen3-Coder-480B-A35B-Instruct."
|
| https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct
| danielhanchen wrote:
| Oh yes fantastic! Excited for them!
| fotcorn wrote:
| It says that there are multiple sizes in the second sentence
| of the huggingface page:
| https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct
|
| You won't be out of work creating ggufs anytime soon :)
| danielhanchen wrote:
| :)
| stuartjohnson12 wrote:
| Qwen has previously engaged in deceptive benchmark hacking.
| They claimed SOTA coding performance back in January, and
| there's a good reason that no software engineer you know was
| writing code with Qwen 2.5.
|
| https://winbuzzer.com/2025/01/29/alibabas-new-qwen-2-5-max-m...
|
| Alibaba is not a company whose culture is conducive to earnest
| acknowledgement that they are behind SOTA.
| mohsen1 wrote:
| Open weight models matching Claude 4 is exciting! It's really
| possible to run this locally since it's MoE.
| danielhanchen wrote:
| Ye! Super excited for Coder!!
| ilaksh wrote:
| Where do you put the 480 GB to run it at any kind of speed? You
| have that much RAM?
| Cheer2171 wrote:
| You can get a used 5 year old Xeon Dell or Lenovo Workstation
| and 8x64GB of ECC DDR4 RAM for about $1500-$2000.
|
| Or you can rent a newer one for $300/mo on the cloud
| sourcecodeplz wrote:
| Everyone keeps saying this but it is not really useful.
| Without a dedicated GPU & VRAM, you are waiting overnight
| for a response... The MoE models are great but they need
| dedicated GPU & VRAM to work fast.
| danielhanchen wrote:
| You don't actually need 480GB of RAM, but if you want at
| least 3 tokens / s, it's a must.
|
| If you have 500GB of SSD, llama.cpp can offload to disk ->
| it'll be slow though, less than 1 token / s.
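|
| Rough napkin math (ballpark assumptions, not measurements) on
| why memory bandwidth is the limit here:
|
|     # decode-speed sketch: assume each generated token reads
|     # roughly the active parameters (~35B for this MoE) once
|     active_params = 35e9
|     bytes_per_param = 0.5              # ~4-bit quant
|     gb_per_token = active_params * bytes_per_param / 1e9
|
|     ram_bw, ssd_bw = 100, 3            # GB/s, assumed DDR4 box / NVMe
|     print(gb_per_token)                # ~17.5 GB read per token
|     print(ram_bw / gb_per_token)       # ~5-6 tokens/s upper bound
|     print(ssd_bw / gb_per_token)       # ~0.2 tokens/s when on disk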
| UncleOxidant wrote:
| > but if you want at least 3 tokens / s
|
| 3 t/s isn't going to be a lot of fun to use.
| teaearlgraycold wrote:
| As far as inference costs go 480GB of RAM is cheap.
| generalizations wrote:
| > Additionally, we are actively exploring whether the Coding
| Agent can achieve self-improvement
|
| How casually we enter the sci-fi era.
| jasonthorsness wrote:
| What sort of hardware will run Qwen3-Coder-480B-A35B-Instruct?
|
| With the performance apparently comparable to Sonnet, some of
| the heavy Claude Code users could be interested in running it
| locally. They have instructions for configuring it for use by
| Claude Code. Huge bills for usage are regularly shared on X, so
| maybe it could even be economical (like for a team of 6 or
| something sharing a local instance).
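|
| Presumably (hypothetical sketch, not their documented setup)
| pointing Claude Code at it is just the usual env-var override
| onto an Anthropic-compatible endpoint serving the model:
|
|     # hypothetical: launch Claude Code against a local/remote
|     # Anthropic-compatible proxy that serves Qwen3-Coder
|     import os, subprocess
|     env = dict(os.environ,
|         ANTHROPIC_BASE_URL="http://localhost:8000",  # assumed proxy URL
|         ANTHROPIC_AUTH_TOKEN="anything-local")       # placeholder key
|     subprocess.run(["claude"], env=env)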
| sourcecodeplz wrote:
| With RAM you would need at least 500GB to load it, plus some
| 100-200GB more for context. Pair it with a 24GB GPU and the
| speed will be at least 10 t/s, I estimate.
| danielhanchen wrote:
| Oh yes for the FP8, you will need 500GB ish. 4bit around
| 250GB - offloading MoE experts / layers to RAM will
| definitely help - as you mentioned a 24GB card should be
| enough!
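|
| The sizing is basically just total params x bytes per weight
| (rough numbers, ignoring KV cache and runtime overhead):
|
|     total_params = 480e9
|     for name, bits in [("FP8", 8), ("4-bit", 4), ("2-bit", 2)]:
|         print(name, round(total_params * bits / 8 / 1e9), "GB")
|     # FP8 480 GB, 4-bit 240 GB, 2-bit 120 GB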
| danielhanchen wrote:
| I'm currently trying to make dynamic GGUF quants for them! It
| should use 24GB of VRAM + 128GB of RAM for dynamic 2bit or so -
| they should be up in an hour or so:
| https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruc...
| On running them locally,
| I do have docs as well:
| https://docs.unsloth.ai/basics/qwen3-coder
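|
| If anyone just wants to poke at a GGUF from Python, a minimal
| sketch via llama-cpp-python (file name is a placeholder; the
| exact quant names and offload flags are in the docs above):
|
|     from llama_cpp import Llama
|
|     llm = Llama(
|         model_path="Qwen3-Coder-480B-A35B-Q2_K.gguf",  # placeholder
|         n_gpu_layers=20,   # whatever fits in the 24GB card
|         n_ctx=32768,
|     )
|     out = llm.create_chat_completion(
|         messages=[{"role": "user",
|                    "content": "Write a binary search in Python."}])
|     print(out["choices"][0]["message"]["content"])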
| gardnr wrote:
| Legend
| danielhanchen wrote:
| :)
| zettabomb wrote:
| Any significant benefits at 3 or 4 bit? I have access to
| twice that much VRAM and system RAM but of course that could
| potentially be better used for KV cache.
| sourcecodeplz wrote:
| For coding you want more precision so the higher the quant
| the better. But there is discussion if a smaller model in
| higher quant is better than a larger one in lower quant.
| Need to test for yourself with your use cases I'm afraid.
|
| e: They did announce smaller variants will be released.
| danielhanchen wrote:
| Yes the higher the quant, the better! The other approach
| is dynamically choosing to upcast some layers!
| fzzzy wrote:
| I would say that three or four bit are likely to be
| significantly better. But that's just from my previous
| experience with quants. Personally, I try not to use
| anything smaller than a Q4.
| danielhanchen wrote:
| So dynamic quants like what I upload are not actually 4bit!
| It's a mixture of 4bit to 8bit with important layers being
| in higher precision! I wrote about our method here:
| https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs
| ilaksh wrote:
| To run the real version with the benchmarks they give, you'd
| need the non-quantized, non-distilled version. So I am guessing
| that means a cluster of 8 H200s if you want to be more or less
| up to date. They have B200s now, which are much faster but also
| much more expensive: $300,000+.
|
| You will see people making quantized, distilled versions, but
| they never give benchmark results.
| danielhanchen wrote:
| Oh you can run the Q8_0 / Q8_K_XL which is nearly equivalent
| to FP8 (maybe off by 0.01% or less) -> you will need 500GB of
| VRAM + RAM + Disk space. Via MoE layer offloading, it should
| function ok
| summarity wrote:
| This should work well for MLX Distributed. The low
| activation MoE is great for multi node inference.
| btian wrote:
| Don't need to be super fancy. Just an RTX Pro 6000 and 256GB
| of RAM.
| rbren wrote:
| Glad to see everyone centering on using OpenHands [1] as the
| scaffold! Nothing more frustrating than seeing "private scaffold"
| on a public benchmark report.
|
| [1] https://github.com/All-Hands-AI/OpenHands
| flakiness wrote:
| The "qwen-code" app seems to be a gemini-cli fork.
|
| https://github.com/QwenLM/qwen-code
| https://github.com/QwenLM/qwen-code/blob/main/LICENSE
|
| I hope these OSS CC clones converge at some point.
|
| Actually it is mentioned in the page: "we're also
| open-sourcing a command-line tool for agentic coding: Qwen
| Code. Forked from Gemini Code"
| rapind wrote:
| I currently use claude-code as the director basically, but
| outsource heavy thinking to openai and gemini pro via zen mcp.
| I could instead use gemini-cli as it's also supported by zen. I
| would imagine it's trivial to add qwen-coder support if it's
| based on gemini-cli.
| mrbonner wrote:
| They also support Claude Code. But my understanding is Claude
| Code is closed source and only supports the Claude API
| endpoint. How do they make it work?
| rapind wrote:
| I just checked and it's up on OpenRouter. (not affiliated)
| https://openrouter.ai/qwen/qwen3-coder
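|
| For anyone who wants to try it without local hardware, it's the
| usual OpenAI-compatible call (sketch; model id from the page
| above, env-var name assumed):
|
|     import os
|     from openai import OpenAI
|
|     client = OpenAI(
|         base_url="https://openrouter.ai/api/v1",
|         api_key=os.environ["OPENROUTER_API_KEY"],  # assumed env var
|     )
|     resp = client.chat.completions.create(
|         model="qwen/qwen3-coder",
|         messages=[{"role": "user",
|                    "content": "Explain this stack trace: ..."}])
|     print(resp.choices[0].message.content)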
| danielhanchen wrote:
| I'm currently making 2bit to 8bit GGUFs for local deployment!
| Will be up in an hour or so at
| https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruc...
|
| Also docs on running it in a 24GB GPU + 128 to 256GB of RAM here:
| https://docs.unsloth.ai/basics/qwen3-coder
| mathrawka wrote:
| Looks like the docs have a typo: "Recommended context: 65,536
| tokens (can be increased)"
|
| That should be recommended token output, as shown in the
| official docs: "Adequate Output Length: We recommend using an
| output length of 65,536 tokens for most queries, which is
| adequate for instruct models."
___________________________________________________________________
(page generated 2025-07-22 23:00 UTC)