[HN Gopher] Qwen3 Coder 480B is Live on Cerebras
       ___________________________________________________________________
        
       Qwen3 Coder 480B is Live on Cerebras
        
       Author : retreatguru
       Score  : 19 points
       Date   : 2025-08-01 17:50 UTC (5 hours ago)
        
 (HTM) web link (www.cerebras.ai)
 (TXT) w3m dump (www.cerebras.ai)
        
       | retreatguru wrote:
       | I'm looking forward to trying this out.
       | 
        | My plan: use Claude Code as the interface, set up
        | claude-code-router to connect to Cerebras's Qwen3 Coder, and
        | see a 20x speedup. The speed difference might make up for it
        | being somewhat less intelligent than Sonnet or Opus.
       | 
       | I don't see Qwen3 Coder available yet on Open Router
       | https://openrouter.ai/provider/cerebras
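        | 
        | In the meantime, hitting Cerebras's OpenAI-compatible
        | endpoint directly should work with the standard openai
        | client. A minimal sketch (the base URL and model id below
        | are my assumptions, check the Cerebras docs):
        | 
        |   # Sketch: query Qwen3 Coder on Cerebras via an
        |   # OpenAI-compatible API. Base URL and model id are
        |   # assumptions -- verify against the Cerebras docs.
        |   import os
        |   from openai import OpenAI
        | 
        |   client = OpenAI(
        |       base_url="https://api.cerebras.ai/v1",  # assumed
        |       api_key=os.environ["CEREBRAS_API_KEY"],
        |   )
        |   resp = client.chat.completions.create(
        |       model="qwen-3-coder-480b",  # assumed model id
        |       messages=[{"role": "user",
        |                  "content": "Write a binary search in Python."}],
        |   )
        |   print(resp.choices[0].message.content)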
        
         | retreatguru wrote:
         | It's up there now.
        
         | gnulinux wrote:
          | It's averaging $0.30/1M input tok and $1.20/1M output tok.
          | That's mind-blowingly cheap for a model of its caliber.
          | Gemini 2.5 Pro charges several times that for both input
          | and output.
        
       | alcasa wrote:
       | Really cool, especially once 256k context size becomes available.
       | 
        | I think higher performance will be a key differentiator in
        | AI tool quality from a user's perspective, especially in use
        | cases where model quality is already good enough for
        | human-in-the-loop usage.
        
       | gnulinux wrote:
        | At $2/1Mt it's cheaper than e.g. Gemini 2.5 Pro ($1.25/1Mt
        | input and $10/1Mt output). When I code with Aider my
        | requests average something like 5000 tokens input and 800
        | tokens output. At those rates, Gemini 2.5 Pro comes to about
        | $0.01425 per Aider request and Cerebras Qwen3 Coder to
        | $0.0116. Not a huge difference, but sufficiently cheaper to
        | be competitive, especially given Qwen3-Coder is on par with
        | Gemini/Claude/o3 and even surpasses them in some tests.
        | 
        | NOTE: Currently on OpenRouter, Qwen3-Coder requests are
        | averaging $0.30/1M input tok and $1.20/1M output tok. That's
        | so much cheaper that I wouldn't be surprised if open-weight
        | models start eating Google/Anthropic/OpenAI's lunch.
       | https://openrouter.ai/qwen/qwen3-coder
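        | 
        | The arithmetic, for anyone who wants to check (a quick
        | sketch; the prices are the per-1M-token rates quoted above):
        | 
        |   # Per-request cost at the quoted per-1M-token prices, for
        |   # a typical Aider request (~5000 tokens in, ~800 out).
        |   def cost(in_tok, out_tok, in_price, out_price):
        |       return (in_tok * in_price + out_tok * out_price) / 1e6
        | 
        |   print(cost(5000, 800, 1.25, 10.0))  # Gemini 2.5 Pro -> 0.01425
        |   print(cost(5000, 800, 2.0, 2.0))    # Cerebras flat $2 -> 0.0116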
        
       | M4v3R wrote:
        | 2000 tokens per second is absolutely insane for a model
        | that's on par with GPT 4.1. However, throughput is only one
        | part of the equation; the other is latency. Right now the
        | latency of every API call looks quite high: it takes a few
        | seconds to receive the first token. That makes it less
        | exciting for agentic use, where many API calls are made in
        | quick succession. I wish providers focused more on this
        | part.
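        | 
        | A simple way to measure this yourself is the openai client's
        | streaming mode (a sketch; the endpoint and model id are the
        | same assumptions as above):
        | 
        |   # Measure time-to-first-token (latency) vs. total time
        |   # (throughput) on a single streaming request.
        |   import os, time
        |   from openai import OpenAI
        | 
        |   client = OpenAI(base_url="https://api.cerebras.ai/v1",  # assumed
        |                   api_key=os.environ["CEREBRAS_API_KEY"])
        |   start, first = time.perf_counter(), None
        |   stream = client.chat.completions.create(
        |       model="qwen-3-coder-480b",  # assumed model id
        |       messages=[{"role": "user", "content": "Say hi."}],
        |       stream=True,
        |   )
        |   for chunk in stream:
        |       delta = chunk.choices[0].delta if chunk.choices else None
        |       if first is None and delta and delta.content:
        |           first = time.perf_counter() - start  # TTFT
        |   total = time.perf_counter() - start
        |   print(f"TTFT: {first:.2f}s, total: {total:.2f}s")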
        
       | pxc wrote:
       | This feels way less annoying to use than ChatGPT. But I wonder
       | how much the effect is lost when the tool does many of the things
       | that make models like o3 useful (repeated web searches, running
       | code in a sandbox, etc.).
       | 
       | For code generation, this does seem pretty useful with something
       | like Qwen3-Coder-480B, if that generates good enough code for
       | your purposes.
       | 
        | But for chat, I wonder: does this kind of speed call for
        | models that behave pretty differently from current ones?
        | With virtually instant responses, I find myself wanting
        | _much_ shorter answers sometimes. Maybe a model whose design
        | and training focus on concision, and on contexts with lots
        | and lots of turns, would be a uniquely useful option with
        | this kind of hardware.
       | 
       | But I guess the hardware is really for training, right, and the
       | inference-as-a-service stuff is basically a powerful form of
       | marketing?
        
       ___________________________________________________________________
       (page generated 2025-08-01 23:01 UTC)