[HN Gopher] Qwen3-Coder: Agentic Coding in the World
       ___________________________________________________________________
        
       Qwen3-Coder: Agentic Coding in the World
        
       Author : danielhanchen
       Score  : 99 points
       Date   : 2025-07-22 21:12 UTC (1 hour ago)
        
 (HTM) web link (qwenlm.github.io)
 (TXT) w3m dump (qwenlm.github.io)
        
       | jddj wrote:
       | Odd to see this languishing at the bottom of /new. Looks very
       | interesting.
       | 
       | Open, small, roughly Sonnet 4-level if the benchmarks are to be
       | believed, tool use?
        
         | danielhanchen wrote:
         | Ye the model looks extremely powerful! I think they may also be
         | making a smaller variant, but unsure yet!
        
           | sourcecodeplz wrote:
           | Yes they are:
           | 
           | "Today, we're announcing Qwen3-Coder, our most agentic code
           | model to date. Qwen3-Coder is available in multiple sizes,
           | but we're excited to introduce its most powerful variant
           | first: Qwen3-Coder-480B-A35B-Instruct."
           | 
           | https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct
        
             | danielhanchen wrote:
             | Oh yes fantastic! Excited for them!
        
           | fotcorn wrote:
           | It says that there are multiple sizes in the second sentence
           | of the huggingface page:
           | https://huggingface.co/Qwen/Qwen3-Coder-480B-A35B-Instruct
           | 
           | You won't be out of work creating ggufs anytime soon :)
        
             | danielhanchen wrote:
             | :)
        
         | stuartjohnson12 wrote:
         | Qwen has previously engaged in deceptive benchmark hacking. They
         | claimed SOTA coding performance back in January, and there's a
         | good reason that no software engineer you know was writing code
         | with Qwen 2.5.
         | 
         | https://winbuzzer.com/2025/01/29/alibabas-new-qwen-2-5-max-m...
         | 
         | Alibaba is not a company whose culture is conducive to earnest
         | acknowledgement that they are behind SOTA.
        
       | mohsen1 wrote:
       | Open-weight models matching Claude 4 are exciting! It's really
       | possible to run this locally since it's MoE.
        
         | danielhanchen wrote:
         | Ye! Super excited for Coder!!
        
         | ilaksh wrote:
         | Where do you put the 480 GB to run it at any kind of speed? You
         | have that much RAM?
        
           | Cheer2171 wrote:
           | You can get a used 5-year-old Xeon Dell or Lenovo workstation
           | and 8x64GB of ECC DDR4 RAM for about $1500-$2000.
           | 
           | Or you can rent a newer one for $300/mo in the cloud.
        
             | sourcecodeplz wrote:
             | Everyone keeps saying this but it is not really useful.
             | Without a dedicated GPU & VRAM, you are waiting overnight
             | for a response... The MoE models are great but they need
             | dedicated GPU & VRAM to work fast.
        
           | danielhanchen wrote:
           | You don't actually need 480GB of RAM, but if you want at least
           | 3 tokens/s, it's a must.
           | 
           | If you have 500GB of SSD, llama.cpp can offload to disk -> it'll
           | be slow though, less than 1 token/s.
        
             | UncleOxidant wrote:
             | > but if you want at least 3 tokens / s
             | 
             | 3 t/s isn't going to be a lot of fun to use.
        
           | teaearlgraycold wrote:
           | As far as inference costs go 480GB of RAM is cheap.
        
       | generalizations wrote:
       | > Additionally, we are actively exploring whether the Coding
       | Agent can achieve self-improvement
       | 
       | How casually we enter the sci-fi era.
        
       | jasonthorsness wrote:
       | What sort of hardware will run Qwen3-Coder-480B-A35B-Instruct?
       | 
       | With the performance apparently comparable to Sonnet, some of the
       | heavy Claude Code users could be interested in running it locally.
       | They have instructions for configuring it for use with Claude
       | Code. Huge bills for usage are regularly shared on X, so maybe it
       | could even be economical (like for a team of 6 or something
       | sharing a local instance).
        
         | sourcecodeplz wrote:
         | In RAM you would need at least 500GB to load it, plus another
         | 100-200GB for context. Pair it with a 24GB GPU and the speed
         | will be at least 10 t/s, I estimate.
        
           | danielhanchen wrote:
           | Oh yes for the FP8, you will need 500GB ish. 4bit around
           | 250GB - offloading MoE experts / layers to RAM will
           | definitely help - as you mentioned a 24GB card should be
           | enough!
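           | 
           | Rough back-of-the-envelope for the weights alone (an
           | illustrative sketch; it ignores KV cache, activations and
           | other overhead, which is roughly where the extra 100-200GB
           | comes from):
           | 
           |     params = 480e9
           |     for name, bytes_per_param in [("FP8", 1.0), ("4-bit", 0.5)]:
           |         print(f"{name}: ~{params * bytes_per_param / 1e9:.0f} GB of weights")
           |     # prints: FP8: ~480 GB of weights, then 4-bit: ~240 GB of weights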
        
         | danielhanchen wrote:
         | I'm currently trying to make dynamic GGUF quants for them! It
         | should use 24GB of VRAM + 128GB of RAM for dynamic 2bit or so -
         | they should be up in an hour or so:
         | https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruc...
         | 
         | On running them locally, I do have docs as well:
         | https://docs.unsloth.ai/basics/qwen3-coder
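         | 
         | For a rough feel before reading the docs, a minimal
         | llama-cpp-python sketch (the file name, layer count and context
         | size below are illustrative, not the exact settings from the
         | docs):
         | 
         |     from llama_cpp import Llama
         | 
         |     llm = Llama(
         |         model_path="Qwen3-Coder-480B-A35B-Instruct-Q2_K.gguf",  # hypothetical local file
         |         n_gpu_layers=20,   # offload what fits in 24GB of VRAM; the rest stays in RAM
         |         n_ctx=32768,
         |     )
         |     out = llm.create_chat_completion(
         |         messages=[{"role": "user", "content": "Write a quicksort in Python."}],
         |         max_tokens=1024,
         |     )
         |     print(out["choices"][0]["message"]["content"])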
        
           | gardnr wrote:
           | Legend
        
             | danielhanchen wrote:
             | :)
        
           | zettabomb wrote:
           | Any significant benefits at 3 or 4 bit? I have access to
           | twice that much VRAM and system RAM but of course that could
           | potentially be better used for KV cache.
        
             | sourcecodeplz wrote:
             | For coding you want more precision, so the higher the quant
             | the better. But there is debate over whether a smaller model
             | at a higher quant is better than a larger one at a lower
             | quant. You need to test for yourself with your use cases,
             | I'm afraid.
             | 
             | e: They did announce that smaller variants will be released.
        
               | danielhanchen wrote:
               | Yes the higher the quant, the better! The other approach
               | is dynamically choosing to upcast some layers!
        
             | fzzzy wrote:
             | I would say that three or four bit are likely to be
             | significantly better. But that's just from my previous
             | experience with quants. Personally, I try not to use
             | anything smaller than a Q4.
        
             | danielhanchen wrote:
             | So dynamic quants like what I upload are not actually 4bit!
             | It's a mixture of 4bit to 8bit with important layers being
             | in higher precision! I wrote about our method here:
             | https://docs.unsloth.ai/basics/unsloth-dynamic-2.0-ggufs
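             | 
             | A toy illustration of the idea (not Unsloth's actual
             | pipeline): pick a precision per tensor, keeping the
             | sensitive ones in higher precision.
             | 
             |     def pick_bits(tensor_name: str) -> int:
             |         # embeddings / output head stay near full quality
             |         if "embd" in tensor_name or "output" in tensor_name:
             |             return 8
             |         # attention tensors get a middle ground
             |         if "attn" in tensor_name:
             |             return 6
             |         # the bulk of the MoE expert weights can go lower
             |         return 4
             | 
             |     for name in ["token_embd", "blk.0.attn_q", "blk.0.ffn_gate_exps"]:
             |         print(name, pick_bits(name))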
        
         | ilaksh wrote:
         | To run the real version with the benchmarks they give, you would
         | need the non-quantized, non-distilled version. So I am guessing
         | that is a cluster of 8 H200s if you want to be more or less up
         | to date. They have B200s now, which are much faster but also
         | much more expensive - $300,000+.
         | 
         | You will see people making quantized, distilled versions, but
         | they never give benchmark results.
        
           | danielhanchen wrote:
           | Oh you can run the Q8_0 / Q8_K_XL, which is nearly equivalent
           | to FP8 (maybe off by 0.01% or less) -> you will need 500GB of
           | combined VRAM + RAM + disk space. Via MoE layer offloading, it
           | should function OK.
        
             | summarity wrote:
             | This should work well for MLX Distributed. The low-activation
             | MoE is great for multi-node inference.
        
         | btian wrote:
         | Don't need to be super fancy. Just an RTX Pro 6000 and 256GB of
         | RAM.
        
       | rbren wrote:
       | Glad to see everyone centering on using OpenHands [1] as the
       | scaffold! Nothing more frustrating than seeing "private scaffold"
       | on a public benchmark report.
       | 
       | [1] https://github.com/All-Hands-AI/OpenHands
        
       | flakiness wrote:
       | The "qwen-code" app seems to be a gemini-cli fork.
       | 
       | https://github.com/QwenLM/qwen-code
       | https://github.com/QwenLM/qwen-code/blob/main/LICENSE
       | 
       | I hope these OSS CC clones converge at some point.
       | 
       | Actually it is mentioned in the page:
       | 
       |     we're also open-sourcing a command-line tool for agentic
       |     coding: Qwen Code. Forked from Gemini Code
        
         | rapind wrote:
         | I currently use claude-code as the director basically, but
         | outsource heavy thinking to openai and gemini pro via zen mcp.
         | I could instead use gemini-cli as it's also supported by zen. I
         | would imagine it's trivial to add qwen-coder support if it's
         | based on gemini-cli.
        
         | mrbonner wrote:
         | They also support Claude Code. But my understanding is that
         | Claude Code is closed source and only supports the Claude API
         | endpoint. How do they make it work?
        
       | rapind wrote:
       | I just checked and it's up on OpenRouter. (not affiliated)
       | https://openrouter.ai/qwen/qwen3-coder
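       | 
       | Since OpenRouter exposes an OpenAI-compatible endpoint, a minimal
       | sketch of calling it (assumes the openai Python package and an
       | OPENROUTER_API_KEY environment variable):
       | 
       |     import os
       |     from openai import OpenAI
       | 
       |     client = OpenAI(
       |         base_url="https://openrouter.ai/api/v1",
       |         api_key=os.environ["OPENROUTER_API_KEY"],
       |     )
       |     resp = client.chat.completions.create(
       |         model="qwen/qwen3-coder",
       |         messages=[{"role": "user", "content": "Explain this diff ..."}],
       |     )
       |     print(resp.choices[0].message.content)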
        
       | danielhanchen wrote:
       | I'm currently making 2bit to 8bit GGUFs for local deployment!
       | Will be up in an hour or so at
       | https://huggingface.co/unsloth/Qwen3-Coder-480B-A35B-Instruc...
       | 
       | Also docs on running it in a 24GB GPU + 128 to 256GB of RAM here:
       | https://docs.unsloth.ai/basics/qwen3-coder
        
         | mathrawka wrote:
         | Looks like the docs have a typo:
         | 
         |     Recommended context: 65,536 tokens (can be increased)
         | 
         | That should be the recommended token output, as shown in the
         | official docs:
         | 
         |     Adequate Output Length: We recommend using an output length
         |     of 65,536 tokens for most queries, which is adequate for
         |     instruct models.
        
       ___________________________________________________________________
       (page generated 2025-07-22 23:00 UTC)