[HN Gopher] Cerebras Systems raises $1.1B Series G
___________________________________________________________________
Cerebras Systems raises $1.1B Series G
Author : fcpguru
Score : 102 points
Date : 2025-09-30 15:54 UTC (7 hours ago)
(HTM) web link (www.cerebras.ai)
(TXT) w3m dump (www.cerebras.ai)
| fcpguru wrote:
| Their core product is the Wafer Scale Engine (WSE-3) -- the
| largest single chip ever made for AI, designed to train and run
| models much faster and more efficiently than traditional GPUs.
|
| Just tried https://cloud.cerebras.ai wow is it fast!
| OGEnthusiast wrote:
| I'm surprised how under-the-radar Cerebras is. Being able to get
| near-instantaneous responses from Qwen3 and gpt-oss is pretty
| incredible.
| data-ottawa wrote:
| I wish I could invest in them. Agree they're under the radar.
| redwood wrote:
| Would be interesting if IBM were to acquire them. Seems like the
| big iron approach to GPUs.
| maz1b wrote:
| Cerebras has been a true revelation when it comes to inference. I
| have a lot of respect for their founder, team, innovation, and
| technology. The colossal size of the WSE-3 chip, utilizing DRAM to
| mind-boggling scale -- it's definitely ultra cool stuff.
|
| I also wonder why they have not been acquired yet. Or is it
| intentional?
|
| I will say, their pricing and deployment strategy is a bit murky
| and unclear. Paying $1500-$10,000 per month plus usage costs? I'm
| assuming that it has to do with chasing and optimizing for higher
| value contracts and deeper-pocketed customers, hence the minimum
| monthly spend that they require.
|
| I'm not claiming to be an expert, but as a CEO/CTO, there were
| other providers in the market with relatively comparable inference
| speed (obviously Cerebras is #1), easier onboarding, and better
| responsiveness from the people who work there (all of my
| interactions with Cerebras have been days/weeks late or simply
| ignored). IMHO, if Cerebras wants to gain more mindshare, they'll
| have to look into this aspect.
| oceanplexian wrote:
| I've been using them as a customer and have been fairly
| impressed. The thing is, a lot of inference providers might
| seem better on paper but it turns out they're not.
|
| Recently there was a fiasco I saw posted on r/localllama where
| many of the OpenRouter providers were degraded on benchmarks
| compared to the base models, implying they are serving up
| quantized models to save costs but lying to customers about it.
| Unless you're actually auditing the tokens you're purchasing, you
| may not be getting what you're paying for, even if the T/s and
| $/token seem better.
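|
| A minimal sketch of the kind of spot-check being described: send
| the same deterministic prompts to a reference endpoint and to the
| provider under test, then diff the answers. The endpoints, keys,
| and model id below are hypothetical placeholders, not a real audit
| harness:
|
|     # spot_check.py -- naive provider spot-check (endpoints/models are placeholders)
|     from openai import OpenAI
|
|     PROMPTS = ["What is 17 * 23?", "Name the capital of Australia."]
|
|     def sample(base_url: str, api_key: str, model: str) -> list[str]:
|         client = OpenAI(base_url=base_url, api_key=api_key)
|         outs = []
|         for p in PROMPTS:
|             r = client.chat.completions.create(
|                 model=model,
|                 messages=[{"role": "user", "content": p}],
|                 temperature=0,   # keep runs as repeatable as possible
|                 max_tokens=64,
|             )
|             outs.append(r.choices[0].message.content.strip())
|         return outs
|
|     ref = sample("https://reference.example/v1", "KEY_A", "some-model")
|     can = sample("https://provider.example/v1", "KEY_B", "some-model")
|     for prompt, a, b in zip(PROMPTS, ref, can):
|         print(("OK  " if a == b else "DIFF") + f" {prompt!r}")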
| dlojudice wrote:
| OpenRouter should be responsible for this quality control,
| right? It seems to me to be the right player in the chain, with
| the duty and the scale to do so.
| liuliu wrote:
| They have been an acquisition target since 2017 (per the OpenAI
| internal emails). So the lack of an acquisition is not for lack
| of interest. It makes you wonder what happened during due
| diligence.
| OkayPhysicist wrote:
| The UAE has sunk a lot of money into them, and I suspect it's
| not purely a financial move. If that's the case, an acquisition
| might be more complicated than it would seem at first glance.
| throw123890423 wrote:
| > I will say, their pricing and deployment strategy is a bit
| murky and unclear. Paying $1500-$10,000 per month plus usage
| costs? I'm assuming that it has to do with chasing and
| optimizing for higher value contracts and deeper-pocketed
| customers, hence the minimum monthly spend that they require.
|
| Yeah wait, why rent chips instead of selling them? Why wouldn't
| customers want to invest money in competition for cheaper
| inference hardware? It's not like Nvidia has a blacklist of
| companies that have bought chips from competitors, or anything.
| Now that would be crazy! That sure would make this market tough
| to compete in, wouldn't it? I'm so glad Nvidia is definitely
| not pressuring companies not to buy from competitors or
| anything.
| aurareturn wrote:
| Their chips weren't selling because:
|
| 1. They're useless for training in 2025. They were designed
| for training prior to the LLM explosion. They're not practical
| for training anymore because they rely on SRAM, which is not
| scalable.
|
| 2. No one is going to spend the resources to optimize models
| to run on their SDK and hardware. Open source inference
| engines don't optimize for Cerebras hardware.
|
| Given the above two reasons, it makes a lot of sense that no
| one is investing in their hardware and they have switched to
| a cloud model selling speed as the differentiator.
|
| It's not always "Nvidia bad".
| nsteel wrote:
| > utilizing DRAM to mind-boggling scale
|
| I thought it was the SRAM scaling that was impressive, no?
| maz1b wrote:
| oops, typo! S and D are next to each other on the keyboard.
| thanks for pointing this out
| aurareturn wrote:
| > I also wonder why they have not been acquired yet. Or is it
| > intentional?
|
| A few issues:
|
| 1. To achieve high speeds, they put everything on SRAM. I
| estimated that they needed over $100m of chips just to do Qwen
| 3 at max context size (rough arithmetic sketched after this
| list). You can run the same model with max context size on $1m
| of Blackwell chips, but at a slower speed. AnandTech had an
| article saying that Cerebras was selling a single chip for
| around $2-3m.
| https://news.ycombinator.com/item?id=44658198
|
| 2. SRAM has virtually stopped scaling in new nodes. Therefore,
| new generations of wafer scale chips won't gain as much as
| traditional GPUs.
|
| 3. Cerebras was designed in the pre-ChatGPT era where much
| smaller models were being trained. It is practically useless
| for training in 2025 because of how big LLMs have gotten. It
| can only do inference but see above 2 problems.
|
| 4. To inference very large LLMs economically, Cerebras would
| need to use external HBM. If it has to reach outside for
| memory, the benefits of a wafer scale chip greatly diminish.
| Remember that the whole idea was to put the entire AI model
| inside the wafer so memory bandwidth is ultra fast.
|
| 5. Chip interconnect technology might make wafer scale chips
| redundant. TSMC has a roadmap for gluing more than 2 GPU dies
| together. Nvidia's Feynman GPUs might have 4 dies glued
| together. I.e., the sweet spot for large chips might not be
| wafer scale but perhaps 2, 4, or 8 GPUs together.
|
| 6. Nvidia seems to be moving much faster in terms of
| development and responding to market needs. For example,
| Blackwell is focused on FP4 inferencing now. I suppose the
| nature of designing and building a wafer scale chip is more
| complex than a GPU. Cerebras also needs to wait for new nodes
| to fully mature so that yields can be higher.
|
| There exists a niche where some applications might need super
| fast token generation regardless of price. Hedge funds and
| Wall Street might be good use cases. But it won't challenge
| Nvidia in training or large scale inference.
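|
| Rough arithmetic behind the point-1 estimate, as a minimal
| sketch. Every figure is an assumption for illustration (a
| ~480B-parameter model held at FP16, roughly 44 GB of on-chip
| SRAM per WSE-3, and the $2-3M per-chip price quoted above), not
| a verified bill of materials:
|
|     # sram_estimate.py -- illustrative sizing only; all figures are assumptions
|     PARAMS          = 480e9   # a Qwen3-Coder-480B class model
|     BYTES_PER_PARAM = 2       # FP16/BF16 weights
|     WEIGHT_BYTES    = PARAMS * BYTES_PER_PARAM   # ~0.96 TB of weights
|
|     SRAM_PER_WAFER  = 44e9    # WSE-3 on-chip SRAM, roughly 44 GB
|     COST_PER_WAFER  = 2.5e6   # midpoint of the quoted $2-3M per chip
|
|     wafers = WEIGHT_BYTES / SRAM_PER_WAFER       # ~22 wafers for weights alone
|     print(f"wafers for weights only: {wafers:.0f}")
|     print(f"implied hardware cost:   ${wafers * COST_PER_WAFER / 1e6:.0f}M")
|     # A max-context KV cache would add tens of GB more per sequence,
|     # which is how the ">$100m" figure above gets there; the reply
|     # below disputes that everything must be SRAM-resident at once.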
| addaon wrote:
| > SRAM has virtually stopped scaling in new nodes.
|
| But there are several 1T memories that are still scaling,
| more or less -- eDRAM, MRAM, etc. Is there anything
| preventing their general architecture from moving to a 1T
| technology once the density advantages outweigh the need for
| pipelining to hide access time?
| aurareturn wrote:
| I'm pretty sure that HBM4 can be 20-30x faster in terms of
| bandwidth than eDRAM. That makes eDRAM not an option for AI
| workloads since bandwidth is the main bottleneck.
| addaon wrote:
| HBM4 is limited to a few thousand bits of width per
| stack. eDRAM bandwidth scales with chip area. A full-
| wafer chip could have astonishing bandwidth.
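|
| Rough numbers behind this exchange, as a sketch; both the HBM4
| figures and the eDRAM macro figures below are assumptions for
| illustration, not datasheet values:
|
|     # bandwidth_estimate.py -- illustrative only; every figure is an assumption
|     HBM4_WIDTH_BITS = 2048        # "a few thousand bits of width per stack"
|     HBM4_GBPS_PIN   = 8e9         # assumed per-pin data rate
|     per_stack = HBM4_WIDTH_BITS / 8 * HBM4_GBPS_PIN
|     print(f"HBM4 per stack:   ~{per_stack / 1e12:.1f} TB/s")
|
|     # eDRAM bandwidth scales with how many macros fit on the wafer
|     MACRO_BW    = 50e9            # assumed ~50 GB/s per eDRAM macro
|     MACRO_COUNT = 10_000          # assumed macros across a full wafer
|     print(f"full-wafer eDRAM: ~{MACRO_BW * MACRO_COUNT / 1e12:.0f} TB/s aggregate")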
| Voloskaya wrote:
| > I estimated that they needed over $100m of chips just to do
| Qwen 3 at max context size
|
| I will point out (again :)) that this math is completely wrong.
| There is no need (nor performance gain) to store the entire
| weights of the model in SRAM. You simply keep n transformer
| blocks on-chip and stream block l+n in from external memory as
| you start computing block l. This completely masks the
| communication time behind the compute time, and specifically
| does not require you to buy $100M worth of SRAM. This is
| standard stuff that is done routinely in many scenarios, e.g.
| FSDP.
|
| https://www.cerebras.ai/blog/cerebras-software-release-2.0-5...
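|
| A toy sketch of the double-buffering pattern being described:
| prefetch block l+n from external memory while block l is being
| computed, so the copy hides behind the compute. The function
| names and sizes are illustrative stand-ins, not anyone's actual
| API:
|
|     # stream_blocks.py -- toy double-buffered layer streaming (illustrative)
|     from concurrent.futures import ThreadPoolExecutor
|
|     N_BLOCKS = 48   # transformer blocks in the model
|     ON_CHIP  = 4    # blocks kept resident at once (the "n" above)
|
|     def fetch_block(i):           # stand-in for a copy from external memory
|         return f"weights[{i}]"
|
|     def compute_block(i, w, x):   # stand-in for running block i on activations x
|         return x
|
|     pool = ThreadPoolExecutor(max_workers=1)
|     resident = {i: fetch_block(i) for i in range(ON_CHIP)}  # warm the pipeline
|     x = "activations"
|     for l in range(N_BLOCKS):
|         nxt = l + ON_CHIP
|         fut = pool.submit(fetch_block, nxt) if nxt < N_BLOCKS else None
|         x = compute_block(l, resident.pop(l), x)   # compute overlaps the fetch
|         if fut is not None:
|             resident[nxt] = fut.result()           # ideally already finished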
| MichaelZuo wrote:
| So then what explains such a low implied valuation at Series
| G?
|
| There's no way that could be the case if the technology were
| competitive.
| Voloskaya wrote:
| I'm not saying it's particularly competitive; I'm saying that
| claiming it costs $100M to run Qwen is complete lunacy. There
| is a gulf between those two things.
|
| And beyond pure performance competitiveness, there are many
| things that make it hard for Cerebras to actually be
| competitive: can they ship enough chips to meet the needs of
| large clusters? What about the software stack and the lack of
| great support compared to Nvidia? The lack of ML engineers who
| know how to use them, when everyone knows how to use CUDA and
| there are many things developed on top of it by the community
| (e.g. Triton)?
|
| Just look at the valuation difference between AMD and Nvidia,
| when AMD is already very competitive. Being 99% of the way
| there is still not enough for customers that are going to pay
| $5B for their clusters.
| vlovich123 wrote:
| I did experiments with this on a traditional consumer GPU, and
| the larger the discrepancy between model size and VRAM, the
| faster performance dropped off (exponentially), to the point
| where it was as if you didn't have any VRAM in the first place
| (everything bottlenecked on PCIe). The technique is well known
| and works when you have more than enough bandwidth.
|
| However, the whole point is that even HBM bandwidth is
| insufficient, so if you're marrying SRAM and HBM I would expect
| the performance gains to be modest overall for models that
| exceed the available SRAM in a meaningful way.
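|
| The bandwidth math behind that drop-off, with made-up numbers:
| once part of the weights has to cross a slow link for every
| token, per-token time is bounded below by bytes_streamed /
| link_bandwidth, no matter how fast the on-package memory is.
| All figures below are assumptions:
|
|     # offload_math.py -- illustrative only; every figure is an assumption
|     MODEL_BYTES  = 60e9    # 60 GB of weights (a ~30B-parameter FP16 model)
|     VRAM_BYTES   = 24e9    # 24 GB card: ~36 GB must stream in every token
|     LINK_BW      = 25e9    # ~25 GB/s effective PCIe bandwidth
|     COMPUTE_TIME = 0.02    # 20 ms/token if everything were resident (assumed)
|
|     streamed    = max(0.0, MODEL_BYTES - VRAM_BYTES)
|     stream_time = streamed / LINK_BW              # ~1.44 s/token just moving weights
|     per_token   = max(COMPUTE_TIME, stream_time)  # overlap hides only the smaller term
|     print(f"tokens/s ~ {1 / per_token:.2f} (vs {1 / COMPUTE_TIME:.0f} if fully resident)")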
| ramshanker wrote:
| I can't guess what is preventing Cerebras from replacing a few
| of the cores in the wafer-scale package with HBM memory. It
| seems the only constraint with their WSE-3 is memory capacity.
| Considering the size of NVDA chips, only a small subset of the
| wafer area should easily exceed the memory size of contemporary
| models.
| reliabilityguy wrote:
| DRAMs (the core of HBM memories) use different technology nodes
| than logic and SRAM. Also, stacking that many DRAMs on the
| wafer will complicate the packaging quite a bit, I think.
| xadhominemx wrote:
| I don't think so. The reason why Cerebras is so fast for
| inference is that the KV cache sits in the SRAM.
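|
| For a sense of scale, the standard KV-cache sizing formula with
| made-up model dimensions (these are not any particular model's
| published numbers):
|
|     # kv_cache_size.py -- illustrative sizing; the dimensions are assumptions
|     LAYERS, KV_HEADS, HEAD_DIM = 60, 8, 128     # hypothetical GQA config
|     BYTES, CONTEXT, BATCH      = 2, 131072, 1   # FP16, 128K tokens, one sequence
|
|     # 2x for keys and values
|     kv_bytes = 2 * LAYERS * KV_HEADS * HEAD_DIM * BYTES * CONTEXT * BATCH
|     print(f"KV cache: {kv_bytes / 1e9:.1f} GB per 128K-token sequence")
|     # ~32 GB for one long sequence: small enough to keep on-chip,
|     # but batching many such sequences gets expensive quickly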
| aurareturn wrote:
| If you replace some cores with HBM on package, you basically
| get the traditional GPU + HBM model.
| Shakahs wrote:
| Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder
| on Cerebras is often more productive for me because it's just so
| incredibly _fast_. Even if it takes more LLM calls to complete a
| task, those calls are all happening in a fraction of the time.
| nerpderp82 wrote:
| We must have very different workflows, I am curious about
| yours. What tools are you using and how are you guiding
| Qwen3-Coder? When I am using Claude Code, it often works for
| 10+ minutes at a time, so I am not aware of inference speed.
| CaptainOfCoit wrote:
| > When I am using Claude Code, it often works for 10+ minutes
| at a time, so I am not aware of inference speed.
|
| Indirectly, it sounds like you are aware of the inference
| speed? Imagine if it took 2 minutes instead of 10 minutes;
| that's what the parent means.
| yodon wrote:
| 2 minutes is the worst delay. With 10 minutes, I can and do
| context switch to something else and use the time
| productively. With 2 min, I wait and get frustrated and
| bored.
| solarkraft wrote:
| You must write very elaborate prompts for 10 minutes to be
| worth the wait. What permissions are you giving it and how
| much do you care about the generated code? How much time did
| you spend on initial setup?
|
| I've found that the best way for myself to do LLM assisted
| coding at this point in time is in a somewhat tight feedback
| loop. I find myself wanting to refine the code and
| architectural approaches a fair amount as I see them coming
| in and latency matters a lot to me here.
| ripped_britches wrote:
| Do you use Cursor or what? Interested in how you set this up.
| Shakahs wrote:
| I use it via the Kilo Code extension for VSCode, which is
| invoking Qwen3-Coder via a Cerebras Code subscription.
|
| https://github.com/Kilo-Org/kilocode
| https://www.cerebras.ai/blog/introducing-cerebras-code
| mythz wrote:
| Running Qwen3 coder at speed is great, but would also prefer to
| have access to other leading OSS models like GLM 4.6, Kimi K2 and
| DeepSeek v3.2 before considering switching subs.
|
| Groq also runs OSS models at speed which is my preferred way to
| access Kimi K2 on their free quotas.
| JLO64 wrote:
| My experience with Cerebras is pretty mixed. On the one hand,
| for simple and basic requests, it truly is mind-blowing how fast
| it is. That said, I've had nothing but issues and empty
| responses whenever I try to use them for coding tasks (Opencode
| via Openrouter, GPT-OSS). It's gotten to the point where I've
| disabled them as a provider on Openrouter.
| divmain wrote:
| I experienced the same, but I think it is a limitation of
| OpenRouter. When I hit Cerebras's OpenAI endpoint directly, it
| works flawlessly.
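|
| For anyone wanting to try the same, hitting an OpenAI-compatible
| endpoint directly looks roughly like this. The base URL and
| model id are assumptions to check against Cerebras's current
| docs, not verified values:
|
|     # direct_endpoint.py -- minimal OpenAI-compatible client call
|     import os
|     from openai import OpenAI
|
|     client = OpenAI(
|         base_url="https://api.cerebras.ai/v1",  # assumed OpenAI-compatible endpoint
|         api_key=os.environ["CEREBRAS_API_KEY"],
|     )
|     resp = client.chat.completions.create(
|         model="gpt-oss-120b",                   # assumed model id; see provider docs
|         messages=[{"role": "user", "content": "Write a haiku about fast inference."}],
|         max_tokens=128,
|     )
|     print(resp.choices[0].message.content)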
| allisdust wrote:
| If the idiots at AMZN have any brains left, they would acquire
| this and make it the center of their inference offerings. But
| considering how lackluster their performance and strategy as a
| company have been of late, I doubt that.
|
| Disappointed quite a bit with this fundraise. They were expected
| to IPO this year and give us poor retail investors a chance at
| investing in them.
| reliabilityguy wrote:
| Amazon has their own chips for inference and training:
| Trainium1/2.
| allisdust wrote:
| Nothing (except maybe Groq?) comes even close to Cerebras in
| inference speed. I seriously don't get why these guys aren't
| more popular. The difference in using them as an inference
| provider vs anything else, for any use case, is like night and
| day. I hope more inference providers focus on speed. And this
| is where AMZN would benefit a lot, since their entire cloud
| model is to have something people would want anyway and mark it
| up by 3x. God forbid AVGO acquires this.
| xadhominemx wrote:
| Cerebras hasn't made any technical breakthroughs, they are
| just putting everything in SRAM. It's a brute force
| approach to get very high inference throughput but comes at
| extremely high cost per token per second and is not useful
| for batched inferencing. Groq uses the same approach.
|
| Memory hierarchy management across HBM/DDR/Flash is much
| more difficult but necessary to achieve practical inference
| economics.
| twothreeone wrote:
| I don't think you realize the history of wafer-scale
| integration and what it means for the chip industry [1]. The
| approach was famously taken by Gene Amdahl's Trilogy Systems in
| the '80s but failed dramatically, leading (among other things)
| to the deployment of "accelerator cards" in the form of... the
| NVIDIA GeForce 256, the first GPU, in 1999. It's not like
| NVIDIA hasn't been trying to integrate multiple dies in the
| same package, but doing that successfully has been a huge
| technological hurdle so far.
|
| [1] https://ieeexplore.ieee.org/abstract/document/9623424
| averne_ wrote:
| The main reason a wafer scale chip works here is that their
| cores are extremely tiny, so the silicon area that gets fused
| off in the event of a defect is much lower than on NVIDIA
| chips, where a whole SM can get disabled. AFAIU this approach
| is not easily applicable to complex core designs.
| xadhominemx wrote:
| I understand that topic well. They stitched top metal
| layers across the reticle - not that challenging, and the
| foundational IP is not their own.
|
| Everyone else went the CoWoS direction, which enables
| heterogeneous integration and much more cost effective
| inference.
| onlyrealcuzzo wrote:
| It would be hard to beat designing their own in-house offering
| that is 50% as good, at 20% the cost.
|
| That's the problem.
|
| Unless the majority of the value is on the other end of the
| curve, it's a tough sell.
| rvz wrote:
| Sooner or later, lots of competitors, including Cerebras, are
| going to chip away at Nvidia's data center market share, and it
| will cause many AI model firms to question the unnecessary
| spend and hoarding of GPUs.
|
| OpenAI is _still_ developing their own chips with Broadcom, but
| they are not operational yet. So for now, they're buying GPUs
| from Nvidia to build up their own revenue (to later spend it on
| their own chips).
|
| By 2030, eventually many companies will be looking for
| alternatives to Nvidia like Cerebras or Lightmatter for both
| training and inference use-cases.
|
| For example [0], Meta just acquired a chip startup for this
| _exact_ reason -- _"an alternative to training AI systems"_ and
| _"to cut infrastructure costs linked to its spending on advanced
| AI tools."_
|
| [0] https://www.reuters.com/business/meta-buy-chip-startup-rivos...
| onlyrealcuzzo wrote:
| There's so much optimization to be made when co-developing the
| model and the hardware it runs on that most of the big players
| are likely to run a non-trivial percentage of their workloads
| on proprietary chips _eventually_.
|
| If that's 5 years in the future, that looks bad for Nvidia; if
| it's >10 years in the future, it doesn't affect Nvidia's
| current stock price very much.
| arjie wrote:
| I just tried out Qwen-3-480B-Coder on them yesterday and to be
| honest it's not good enough. It's very fast but has trouble on
| lots of tasks that Claude Code just solves. Perhaps part of it is
| that I'm using Charm's Crush instead of Claude Code.
| tibbydudeza wrote:
| Damn, they are fast.
| dgfitz wrote:
| Valued at 8.1 billion dollars.
|
| https://www.cerebras.ai/pricing
|
| $50/month for one person for code (daily token limit), or pay per
| token, or $1500/month for small teams, or an enterprise agreement
| (contact for pricing).
|
| Seems high.
| lvl155 wrote:
| Last I tried, their service was spotty and unreliable. I would
| wait maybe a year or so to retry.
| fcpguru wrote:
| Does Guillaume Verdon from https://www.extropic.ai/ have
| thoughts on Cerebras?
|
| (or other people that read the litepaper
| https://www.extropic.ai/future)
| landl0rd wrote:
| Beff has shipped zero chips and shitposted a lot. It is a cool
| idea, but he has made tons of promises and it's starting to
| seem more like vaporware. Don't get me wrong, I hope it works,
| but I doubt it will. Fewer podcasts, more building, please.
|
| He reads to me like someone who markets better than he does
| things. I am disinclined to take him as an authority in this
| space.
|
| How do you believe this is related to Cerebras?
___________________________________________________________________
(page generated 2025-09-30 23:01 UTC)