[HN Gopher] Cerebras Systems raises $1.1B Series G
       ___________________________________________________________________
        
        Cerebras Systems raises $1.1B Series G
        
       Author : fcpguru
       Score  : 102 points
       Date   : 2025-09-30 15:54 UTC (7 hours ago)
        
 (HTM) web link (www.cerebras.ai)
 (TXT) w3m dump (www.cerebras.ai)
        
       | fcpguru wrote:
       | Their core product is the Wafer Scale Engine (WSE-3) -- the
       | largest single chip ever made for AI, designed to train and run
       | models much faster and more efficiently than traditional GPUs.
       | 
       | Just tried https://cloud.cerebras.ai wow is it fast!
        
       | OGEnthusiast wrote:
       | I'm surprised how under-the-radar Cerebras is. Being able to get
       | near-instantaneous responses from Qwen3 and gpt-oss is pretty
       | incredible.
        
         | data-ottawa wrote:
         | I wish I could invest in them. Agree they're under the radar.
        
       | redwood wrote:
        | Would be interesting if IBM were to acquire them. Seems like the
        | big-iron approach to GPUs.
        
       | maz1b wrote:
       | Cerebras has been a true revelation when it comes to inference. I
       | have a lot of respect for their founder, team, innovation, and
        | technology. The colossal size of the WSE-3 chip and its use of
        | DRAM at a mind-boggling scale make it definitely ultra cool
        | stuff.
       | 
       | I also wonder why they have not been acquired yet. Or is it
       | intentional?
       | 
       | I will say, their pricing and deployment strategy is a bit murky
       | and unclear. Paying $1500-$10,000 per month plus usage costs? I'm
       | assuming that it has to do with chasing and optimizing for higher
       | value contracts and deeper-pocketed customers, hence the minimum
       | monthly spend that they require.
       | 
        | I'm not claiming to be an expert, but as a CEO/CTO I found other
        | providers in the market with relatively comparable inference
        | speed (obviously Cerebras is #1), easier onboarding, and better
        | responsiveness from the people who work there (all of my
        | interactions with Cerebras have been days or weeks late, or
        | simply ignored). IMHO, if Cerebras wants to gain more mindshare,
        | they'll have to look into this aspect.
        
         | oceanplexian wrote:
         | I've been using them as a customer and have been fairly
         | impressed. The thing is, a lot of inference providers might
         | seem better on paper but it turns out they're not.
         | 
          | Recently there was a fiasco I saw posted on r/localllama where
          | many of the OpenRouter providers scored worse on benchmarks
          | than the base models, implying they are serving up quantized
          | models to save costs while lying to customers about it. Unless
          | you're actually auditing the tokens you're purchasing, you may
          | not be getting what you're paying for, even if the T/s and
          | $/token seem better.
        
           | dlojudice wrote:
            | OpenRouter should be responsible for this quality control,
            | right? It seems to me to be the right player in the chain,
            | with both the mandate and the scale to do so.
        
         | liuliu wrote:
          | They have been an acquisition target since 2017 (per the
          | OpenAI internal emails), so the lack of an acquisition is not
          | due to a lack of interest. It makes you wonder what happened
          | during due diligence.
        
         | OkayPhysicist wrote:
         | The UAE has sunk a lot of money into them, and I suspect it's
         | not purely a financial move. If that's the case, an acquisition
         | might be more complicated than it would seem at first glance.
        
         | throw123890423 wrote:
         | > I will say, their pricing and deployment strategy is a bit
         | murky and unclear. Paying $1500-$10,000 per month plus usage
         | costs? I'm assuming that it has to do with chasing and
         | optimizing for higher value contracts and deeper-pocketed
         | customers, hence the minimum monthly spend that they require.
         | 
          | Yeah, wait, why rent chips instead of selling them? Why wouldn't
         | customers want to invest money in competition for cheaper
         | inference hardware? It's not like Nvidia has a blacklist of
         | companies that have bought chips from competitors, or anything.
         | Now that would be crazy! That sure would make this market tough
         | to compete in, wouldn't it. I'm so glad Nvidia is definitely
         | not pressuring companies to not buy from competitors or
         | anything.
        
           | aurareturn wrote:
           | Their chips weren't selling because:
           | 
            | 1. They're useless for training in 2025. They were designed
            | for training prior to the LLM explosion, and they're no
            | longer practical for training because they rely on SRAM,
            | which is not scalable.
           | 
           | 2. No one is going to spend the resources to optimize models
           | to run on their SDK and hardware. Open source inference
           | engines don't optimize for Cerebras hardware.
           | 
           | Given the above two reasons, it makes a lot of sense that no
           | one is investing in their hardware and they have switched to
           | a cloud model selling speed as the differentiator.
           | 
           | It's not always "Nvidia bad".
        
         | nsteel wrote:
         | > utilizing DRAM to mind-boggling scale
         | 
         | I thought it was the SRAM scaling that was impressive, no?
        
           | maz1b wrote:
           | oops, typo! S and D are next to each other on the keyboard.
           | thanks for pointing this out
        
         | aurareturn wrote:
          | > I also wonder why they have not been acquired yet. Or is it
          | intentional?
         | 
         | A few issues:
         | 
          | 1. To achieve high speeds, they put everything in SRAM. I
          | estimated that they would need over $100m of chips just to run
          | Qwen 3 at max context size (a rough back-of-envelope version is
          | sketched at the end of this comment). You can run the same
          | model at max context size on $1m of Blackwell chips, just at a
          | slower speed. AnandTech had an article saying that Cerebras was
          | selling a single chip for around $2-3m.
          | https://news.ycombinator.com/item?id=44658198
         | 
         | 2. SRAM has virtually stopped scaling in new nodes. Therefore,
         | new generations of wafer scale chips won't gain as much as
         | traditional GPUs.
         | 
          | 3. Cerebras was designed in the pre-ChatGPT era, when much
          | smaller models were being trained. It is practically useless
          | for training in 2025 because of how big LLMs have gotten. It
          | can only do inference, but see the two problems above.
         | 
          | 4. To inference very large LLMs economically, Cerebras would
          | need to use external HBM. If it has to reach outside the wafer
          | for memory, the benefits of a wafer-scale chip greatly
          | diminish. Remember that the whole idea was to put the entire AI
          | model inside the wafer so memory bandwidth is ultra fast.
         | 
          | 5. Chip interconnect technology might make wafer-scale chips
          | redundant. TSMC has a roadmap for gluing more than 2 GPU dies
          | together, and Nvidia's Feynman GPUs might have 4 dies glued
          | together. I.e., the sweet spot for large chips might not be a
          | full wafer but perhaps 2, 4, or 8 dies together.
         | 
         | 6. Nvidia seems to be moving much faster in terms of
         | development and responding to market needs. For example,
         | Blackwell is focused on FP4 inferencing now. I suppose the
         | nature of designing and building a wafer scale chip is more
         | complex than a GPU. Cerebras also needs to wait for new nodes
         | to fully mature so that yields can be higher.
         | 
          | There exists a niche where some applications need super fast
          | token generation regardless of price; hedge funds and Wall
          | Street might be good use cases. But it won't challenge Nvidia
          | in training or large-scale inference.
        
           | addaon wrote:
           | > SRAM has virtually stopped scaling in new nodes.
           | 
           | But there are several 1T memories that are still scaling,
           | more or less -- eDRAM, MRAM, etc. Is there anything
           | preventing their general architecture from moving to a 1T
           | technology once the density advantages outweigh the need for
           | pipelining to hide access time?
        
             | aurareturn wrote:
             | I'm pretty sure that HBM4 can be 20-30x faster in terms of
             | bandwidth than eDRAM. That makes eDRAM not an option for AI
             | workloads since bandwidth is the main bottleneck.
        
               | addaon wrote:
               | HBM4 is limited to a few thousand bits of width per
               | stack. eDRAM bandwidth scales with chip area. A full-
               | wafer chip could have astonishing bandwidth.
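                | 
                | Rough numbers in Python (2,048 bits per stack per the
                | HBM4 spec; the per-pin rate and stack count are
                | assumptions):
                | 
                |     def hbm4_stack_tbs(bits=2048, gbps_per_pin=8.0):
                |         # bits/s across the interface -> terabytes/s
                |         return bits * gbps_per_pin / 8 / 1000
                | 
                |     print(hbm4_stack_tbs())      # ~2 TB/s per stack
                |     print(8 * hbm4_stack_tbs())  # ~16 TB/s for 8 stacks
                | 
                | So a package tops out at however many stacks you can
                | physically attach, while on-wafer memory bandwidth grows
                | with the area you dedicate to it.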
        
           | Voloskaya wrote:
           | > I estimated that they needed over $100m of chips just to do
           | Qwen 3 at max context size
           | 
            | I will point out (again :) ) that this math is completely
            | wrong. There is no need (nor any performance gain) to store
            | the entire weights of the model in SRAM. You simply keep n
            | transformer blocks on-chip and stream block l+n in from
            | external memory when you start computing block l. This
            | completely masks the communication time behind the compute
            | time, and specifically does not require you to buy $100M
            | worth of SRAM. This is standard stuff that is done routinely
            | in many scenarios, e.g. FSDP.
           | 
           | https://www.cerebras.ai/blog/cerebras-software-
           | release-2.0-5...
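            | 
            | A minimal sketch of that overlap in Python (load_block and
            | compute_block are hypothetical stand-ins for the DMA-in and
            | the on-chip compute, not any real Cerebras SDK call):
            | 
            |     # Keep `resident` blocks on-chip; prefetch block
            |     # l+resident while block l computes, so the copy hides
            |     # behind the matmuls as long as copy time <= compute time.
            |     from concurrent.futures import ThreadPoolExecutor
            | 
            |     def forward(x, num_blocks, resident, load_block, compute_block):
            |         io = ThreadPoolExecutor(max_workers=1)
            |         on_chip = {i: load_block(i)
            |                    for i in range(min(resident, num_blocks))}
            |         inflight = {}
            |         for l in range(num_blocks):
            |             nxt = l + resident
            |             if nxt < num_blocks:
            |                 inflight[nxt] = io.submit(load_block, nxt)  # async copy-in
            |             x = compute_block(x, on_chip.pop(l))  # overlaps the copy
            |             if nxt < num_blocks:
            |                 on_chip[nxt] = inflight.pop(nxt).result()
            |         return x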
        
             | MichaelZuo wrote:
              | So then what explains such a low implied valuation at
              | Series G?
              | 
              | There's no way that could be the case if the technology
              | were competitive.
        
               | Voloskaya wrote:
                | I'm not saying it's particularly competitive; I'm saying
                | that claiming it costs $100M to run Qwen is complete
                | lunacy. There is a gulf between those two things.
                | 
                | And beyond raw performance, there are many things that
                | make it hard for Cerebras to actually be competitive: can
                | they ship enough chips to meet the needs of large
                | clusters? What about the software stack and the lack of
                | great support compared to Nvidia? The lack of ML
                | engineers who know how to use them, when everyone knows
                | how to use CUDA and there are many things developed on
                | top of it by the community (e.g. Triton)?
               | 
                | Just look at the valuation difference between AMD and
                | Nvidia, even though AMD is already very competitive.
                | Being 99% of the way there is still not enough for
                | customers that are going to pay $5B for their clusters.
        
             | vlovich123 wrote:
              | I did experiments with this on a traditional consumer GPU,
              | and the larger the discrepancy between model size and VRAM,
              | the faster throughput dropped off (roughly exponentially),
              | approaching what you'd get with no VRAM at all (everything
              | streamed over PCIe). The technique is well known and works
              | when you have more than enough bandwidth.
              | 
              | However, the whole point is that even HBM is a problem
              | because its available bandwidth is insufficient. So if
              | you're marrying SRAM and HBM, I would expect the overall
              | performance gains to be modest for models that exceed the
              | available SRAM in a meaningful way.
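              | 
              | A crude roofline-style sanity check (Python, made-up
              | numbers) shows why the slow link dominates as soon as the
              | weights spill out of fast memory:
              | 
              |     # Per generated token, a dense model reads every weight
              |     # once, split across fast memory and the offload link.
              |     def tokens_per_sec(weights_gb, fast_gb, fast_gbps, link_gbps):
              |         resident = min(weights_gb, fast_gb)
              |         spilled = max(0.0, weights_gb - fast_gb)
              |         t = resident / fast_gbps + spilled / link_gbps
              |         return 1.0 / t
              | 
              |     # 140 GB of weights, 24 GB VRAM at ~1000 GB/s, PCIe ~25 GB/s:
              |     print(tokens_per_sec(140, 24, 1000, 25))   # ~0.2 tok/s
              |     print(tokens_per_sec(140, 140, 1000, 25))  # ~7 tok/s resident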
        
       | ramshanker wrote:
        | I can't figure out what is preventing Cerebras from replacing a
        | few of the cores in the wafer-scale package with HBM memory. It
        | seems the only constraint with their WSE-3 is memory capacity,
        | and considering the size of NVDA chips, converting only a small
        | subset of the wafer area to memory should easily exceed the size
        | of contemporary models.
        
         | reliabilityguy wrote:
          | DRAMs (the core of HBM memories) use different technology
          | nodes than logic and SRAM. Also, stacking that many DRAM dies
          | on a wafer will complicate the packaging quite a bit, I think.
        
         | xadhominemx wrote:
         | I don't think so. The reason why Cerebras is so fast for
         | inference is that the KV cache sits in the SRAM.
        
         | aurareturn wrote:
         | If you replace some cores with HBM on package, you basically
         | get the traditional GPU + HBM model.
        
       | Shakahs wrote:
       | Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder
       | on Cerebras is often more productive for me because it's just so
       | incredibly _fast_. Even if it takes more LLM calls to complete a
       | task, those calls are all happening in a fraction of the time.
        
         | nerpderp82 wrote:
         | We must have very different workflows, I am curious about
         | yours. What tools are you using and how are you guiding
         | Qwen3-Coder? When I am using Claude Code, it often works for
         | 10+ minutes at a time, so I am not aware of inference speed.
        
           | CaptainOfCoit wrote:
           | > When I am using Claude Code, it often works for 10+ minutes
           | at a time, so I am not aware of inference speed.
           | 
            | Indirectly, it sounds like you are aware of the inference
            | speed? Imagine if it took 2 minutes instead of 10 minutes;
            | that's what the parent means.
        
             | yodon wrote:
             | 2 minutes is the worst delay. With 10 minutes, I can and do
             | context switch to something else and use the time
             | productively. With 2 min, I wait and get frustrated and
             | bored.
        
           | solarkraft wrote:
           | You must write very elaborate prompts for 10 minutes to be
           | worth the wait. What permissions are you giving it and how
           | much do you care about the generated code? How much time did
           | you spend on initial setup?
           | 
           | I've found that the best way for myself to do LLM assisted
           | coding at this point in time is in a somewhat tight feedback
           | loop. I find myself wanting to refine the code and
           | architectural approaches a fair amount as I see them coming
           | in and latency matters a lot to me here.
        
         | ripped_britches wrote:
         | Do you use cursor or what? Interested in how you set this up
        
           | Shakahs wrote:
           | I use it via the Kilo Code extension for VSCode, which is
           | invoking Qwen3-Coder via a Cerebras Code subscription.
           | 
           | https://github.com/Kilo-Org/kilocode
           | https://www.cerebras.ai/blog/introducing-cerebras-code
        
       | mythz wrote:
        | Running Qwen3 Coder at speed is great, but I would also prefer
        | to have access to other leading OSS models like GLM 4.6, Kimi K2,
        | and DeepSeek v3.2 before considering switching subs.
        | 
        | Groq also runs OSS models at speed, which is my preferred way to
        | access Kimi K2 via their free quotas.
        
       | JLO64 wrote:
        | My experience with Cerebras is pretty mixed. On the one hand,
        | for simple and basic requests, it truly is mind-blowing how fast
        | it is. That said, I've had nothing but issues and empty responses
        | whenever I try to use them for coding tasks (Opencode via
        | OpenRouter, GPT-OSS). It's gotten to the point where I've
        | disabled them as a provider on OpenRouter.
        
         | divmain wrote:
          | I experienced the same, but I think it is a limitation of
          | OpenRouter. When I hit Cerebras' OpenAI endpoint directly, it
          | works flawlessly.
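          | 
          | For anyone wanting to try the direct route, a minimal sketch
          | with the stock openai Python client (the base URL and model id
          | are from memory, so verify them against Cerebras' docs):
          | 
          |     import os
          |     from openai import OpenAI
          | 
          |     # Point the standard client at Cerebras' OpenAI-compatible API.
          |     client = OpenAI(
          |         base_url="https://api.cerebras.ai/v1",  # assumed endpoint
          |         api_key=os.environ["CEREBRAS_API_KEY"],
          |     )
          | 
          |     resp = client.chat.completions.create(
          |         model="gpt-oss-120b",  # assumed model id
          |         messages=[{"role": "user",
          |                    "content": "Write a binary search in Python."}],
          |     )
          |     print(resp.choices[0].message.content)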
        
       | allisdust wrote:
        | If the idiots at AMZN have any brains left, they would acquire
        | this and make it the center of their inference offerings. But
        | considering how lackluster their performance and strategy as a
        | company have been of late, I doubt that.
        | 
        | I'm quite disappointed with this fundraise. They were expected to
        | IPO this year and give us poor retail investors a chance at
        | investing in them.
        
         | reliabilityguy wrote:
         | Amazon has their own chips for inference and training:
         | Trainium1/2.
        
           | allisdust wrote:
            | Nothing (maybe except Groq?) comes even close to Cerebras in
            | inference speed. I seriously don't get why these guys aren't
            | more popular. The difference between using them as an
            | inference provider and using anything else, for any use case,
            | is like night and day. I hope more inference providers focus
            | on speed. And this is where AMZN would benefit a lot, since
            | their entire cloud model is to take something people would
            | want anyway and mark it up 3x. God forbid AVGO acquires this.
        
             | xadhominemx wrote:
             | Cerebras hasn't made any technical breakthroughs, they are
             | just putting everything in SRAM. It's a brute force
             | approach to get very high inference throughput but comes at
             | extremely high cost per token per second and is not useful
             | for batched inferencing. Groq uses the same approach.
             | 
             | Memory hierarchy management across HBM/DDR/Flash is much
             | more difficult but necessary to achieve practical inference
             | economics.
        
               | twothreeone wrote:
                | I don't think you realize the history of wafer-scale
                | integration and what it means for the chip industry [1].
                | The approach was famously taken by Gene Amdahl's Trilogy
                | Systems in the '80s, but it failed dramatically, leading
                | (among other things) to the deployment of "accelerator
                | cards" in the form of the NVIDIA GeForce 256, the first
                | GPU, in 1999. It's not like NVIDIA hasn't been trying to
                | integrate multiple dies in the same package, but doing
                | that successfully has been a huge technological hurdle so
                | far.
               | 
               | [1] https://ieeexplore.ieee.org/abstract/document/9623424
        
               | averne_ wrote:
                | The main reason a wafer-scale chip works there is that
                | their cores are extremely tiny, so the silicon area that
                | gets fused off in the event of a defect is much smaller
                | than on NVIDIA chips, where a whole SM can get disabled.
                | AFAIU this approach is not easily applicable to complex
                | core designs.
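                | 
                | A toy yield calculation (Python; the defect density and
                | unit sizes are assumptions) makes the point concrete:
                | 
                |     # Fraction of wafer area lost if each random defect
                |     # kills one redundancy unit.
                |     def area_lost(defects_per_cm2, wafer_mm2, unit_mm2):
                |         defects = defects_per_cm2 * wafer_mm2 / 100.0
                |         return defects * unit_mm2 / wafer_mm2
                | 
                |     WAFER_MM2 = 46_225  # roughly the WSE die area
                |     for unit_mm2 in (0.05, 3.0):  # tiny core vs SM-sized
                |         print(unit_mm2, area_lost(0.1, WAFER_MM2, unit_mm2))
                |     # -> ~0.005% of the wafer fused off vs ~0.3%, ~60x more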
        
               | xadhominemx wrote:
               | I understand that topic well. They stitched top metal
               | layers across the reticle - not that challenging, and the
               | foundational IP is not their own.
               | 
               | Everyone else went the CoWoS direction, which enables
               | heterogeneous integration and much more cost effective
               | inference.
        
         | onlyrealcuzzo wrote:
         | It would be hard to beat designing their own in-house offering
         | that is 50% as good, at 20% the cost.
         | 
         | That's the problem.
         | 
         | Unless the majority of the value is on the other end of the
         | curve, it's a tough sell.
        
       | rvz wrote:
        | Sooner or later, lots of competitors, including Cerebras, are
        | going to eat into Nvidia's data center market share, and that
        | will cause many AI model firms to question the unnecessary spend
        | on and hoarding of GPUs.
       | 
        | OpenAI is _still_ developing their own chips with Broadcom, but
        | they are not operational yet. So for now, they're buying GPUs
        | from Nvidia to build up their own revenue (to later spend on
        | their own chips).
       | 
        | By 2030, many companies will be looking for alternatives to
        | Nvidia, like Cerebras or Lightmatter, for both training and
        | inference use cases.
       | 
        | For example [0], Meta just acquired a chip startup for this
        | _exact_ reason: _"an alternative to training AI systems"_ and
        | _"to cut infrastructure costs linked to its spending on advanced
        | AI tools."_
       | 
       | [0] https://www.reuters.com/business/meta-buy-chip-startup-
       | rivos...
        
         | onlyrealcuzzo wrote:
          | There's so much optimization to be made when developing the
          | model and the hardware it runs on that most of the big players
          | are likely to run a non-trivial percentage of their workloads
          | on proprietary chips _eventually_.
          | 
          | If that's 5 years in the future, that looks bad for Nvidia; if
          | it's >10 years in the future, it doesn't affect Nvidia's
          | current stock price very much.
        
       | arjie wrote:
       | I just tried out Qwen-3-480B-Coder on them yesterday and to be
       | honest it's not good enough. It's very fast but has trouble on
       | lots of tasks that Claude Code just solves. Perhaps part of it is
       | that I'm using Charm's Crush instead of Claude Code.
        
       | tibbydudeza wrote:
        | Damn, they are fast.
        
       | dgfitz wrote:
       | Valued at 8.1 billion dollars.
       | 
       | https://www.cerebras.ai/pricing
       | 
       | $50/month for one person for code (daily token limit), or pay per
       | token, or $1500/month for small teams, or an enterprise agreement
       | (contact for pricing).
       | 
       | Seems high.
        
       | lvl155 wrote:
       | Last I tried, their service was spotty and unreliable. I would
       | wait maybe a year or so to retry.
        
       | fcpguru wrote:
        | Does Guillaume Verdon from https://www.extropic.ai/ have
        | thoughts on Cerebras?
       | 
       | (or other people that read the litepaper
       | https://www.extropic.ai/future)
        
         | landl0rd wrote:
          | Beff has shipped zero chips and shitposted a lot. It is a cool
          | idea, but he has made tons of promises and it's starting to
          | seem more like vaporware. Don't get me wrong, I hope it works,
          | but I doubt it will. Fewer podcasts, more building, please.
         | 
         | He reads to me like someone who markets better than he does
         | things. I am disinclined to take him as an authority in this
         | space.
         | 
         | How do you believe this is related to Cerebras?
        
       ___________________________________________________________________
       (page generated 2025-09-30 23:01 UTC)