[HN Gopher] Qwen3 Coder 480B is Live on Cerebras
___________________________________________________________________
Qwen3 Coder 480B is Live on Cerebras
Author : retreatguru
Score : 19 points
Date : 2025-08-01 17:50 UTC (5 hours ago)
(HTM) web link (www.cerebras.ai)
(TXT) w3m dump (www.cerebras.ai)
| retreatguru wrote:
| I'm looking forward to trying this out.
|
| Here's the experiment I want to run: use Claude Code as the
| interface, set up claude-code-router to connect to Cerebras Qwen3
| Coder, and see a 20x speedup. The speed difference might make up
| for the slightly lower intelligence compared to Sonnet or Opus.
|
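| Roughly, the wiring would look like this (untested sketch; the
| JSON keys follow my reading of the claude-code-router README, and
| the Cerebras base URL and model id are assumptions to verify):
|
|     # write_router_config.py: point claude-code-router at Cerebras
|     import json
|     import os
|
|     config = {
|         # One provider entry per backend; the router proxies Claude
|         # Code's requests to whichever provider the Router selects.
|         "Providers": [{
|             "name": "cerebras",
|             "api_base_url": "https://api.cerebras.ai/v1/chat/completions",
|             "api_key": os.environ["CEREBRAS_API_KEY"],
|             "models": ["qwen-3-coder-480b"],
|         }],
|         # Send default traffic to the Cerebras-hosted model.
|         "Router": {"default": "cerebras,qwen-3-coder-480b"},
|     }
|
|     path = os.path.expanduser("~/.claude-code-router/config.json")
|     os.makedirs(os.path.dirname(path), exist_ok=True)
|     with open(path, "w") as f:
|         json.dump(config, f, indent=2)
|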
| I don't see Qwen3 Coder available on OpenRouter yet:
| https://openrouter.ai/provider/cerebras
| retreatguru wrote:
| It's up there now.
| gnulinux wrote:
| It's averaging $0.30/1M input tokens and $1.20/1M output tokens.
| That's mind-blowingly cheap for a model of its caliber. Gemini
| 2.5 Pro is more than 10x that price.
| alcasa wrote:
| Really cool, especially once the 256k context size becomes
| available.
|
| I think higher performance will be a key differentiator in AI
| tool quality from a user perspective, especially in use cases
| where model quality is already good enough for human-in-the-loop
| usage.
| gnulinux wrote:
| At $2/1Mt it's cheaper than e.g. Gemini 2.5 Pro ($1.25/1Mt for
| input and $10/1Mt for output). When I code with Aider my requests
| average something like 5000 tokens of input and 800 tokens of
| output. At that rate, Gemini 2.5 Pro is about $0.01425 per Aider
| request and Cerebras Qwen3 Coder is $0.0116 per request. Not a
| significant difference, but I think sufficiently cheaper to be
| competitive, especially given Qwen3-Coder is on par with
| Gemini/Claude/o3 and even surpasses them in some tests.
|
| NOTE: Currently on OpenRouter, Qwen3-Coder requests are averaging
| $0.30/1M input tokens and $1.20/1M output tokens. That's so
| significantly cheaper that I wouldn't be surprised if open-weight
| models start eating Google/Anthropic/OpenAI's lunch.
| https://openrouter.ai/qwen/qwen3-coder
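|
| Spelled out, using the prices quoted in this thread and my token
| averages (a minimal worked example, not measurements):
|
|     # Per-request cost comparison from the figures above.
|     def request_cost(in_tok, out_tok, in_price, out_price):
|         """Prices are USD per 1M tokens."""
|         return in_tok / 1e6 * in_price + out_tok / 1e6 * out_price
|
|     in_tok, out_tok = 5000, 800  # typical Aider request
|
|     gemini = request_cost(in_tok, out_tok, 1.25, 10.0)     # 0.01425
|     qwen_flat = request_cost(in_tok, out_tok, 2.00, 2.00)  # 0.01160
|     qwen_or = request_cost(in_tok, out_tok, 0.30, 1.20)    # 0.00246
|
|     print(f"Gemini 2.5 Pro:           ${gemini:.5f}/request")
|     print(f"Qwen3 Coder ($2 flat):    ${qwen_flat:.5f}/request")
|     print(f"Qwen3 Coder (OpenRouter): ${qwen_or:.5f}/request")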
| M4v3R wrote:
| 2000 tokens per second is absolutely insane for a model that's on
| par with GPT-4.1. However, throughput is only one part of the
| equation, the other being latency. As of right now the latency
| for every API call is quite high: it takes a few seconds to
| receive the first token on every API call. This means it's not as
| exciting for agentic use, where many API calls are made in quick
| succession. I wish providers focused more on this part.
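|
| A quick way to see the split between latency and throughput
| (sketch; assumes an OpenAI-compatible streaming endpoint and the
| openai Python package, with the base URL and model id as
| placeholders, not confirmed values):
|
|     # Measure time-to-first-token (latency) vs. decode speed.
|     import time
|     from openai import OpenAI
|
|     client = OpenAI(base_url="https://api.cerebras.ai/v1",  # assumed
|                     api_key="YOUR_KEY")
|
|     t0 = time.monotonic()
|     ttft, n_chunks = None, 0
|     stream = client.chat.completions.create(
|         model="qwen-3-coder-480b",  # placeholder model id
|         messages=[{"role": "user", "content": "Write a quicksort."}],
|         stream=True,
|     )
|     for chunk in stream:
|         if chunk.choices and chunk.choices[0].delta.content:
|             if ttft is None:
|                 ttft = time.monotonic() - t0  # latency component
|             n_chunks += 1  # roughly one token per chunk
|     total = time.monotonic() - t0
|     print(f"TTFT {ttft:.2f}s, ~{n_chunks/(total-ttft):.0f} tok/s")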
| pxc wrote:
| This feels way less annoying to use than ChatGPT. But I wonder
| how much of the effect is lost when the tool does many of the
| things that make models like o3 useful (repeated web searches,
| running code in a sandbox, etc.).
|
| For code generation, this does seem pretty useful with something
| like Qwen3-Coder-480B, if that generates good enough code for
| your purposes.
|
| But for chat, I wonder: does this kind of speed call for models
| that behave pretty differently from current ones? With virtually
| instant responses, I find myself wanting _much_ shorter answers
| sometimes. Maybe a model whose design and training focus on
| concision, and a context with lots and lots of turns, would be a
| uniquely useful option with this kind of hardware.
|
| But I guess the hardware is really for training, right, and the
| inference-as-a-service stuff is basically a powerful form of
| marketing?
___________________________________________________________________
(page generated 2025-08-01 23:01 UTC)