[HN Gopher] Top researchers leave Intel to build startup with 't...
       ___________________________________________________________________
        
       Top researchers leave Intel to build startup with 'the biggest,
       baddest CPU'
        
       Author : dangle1
       Score  : 133 points
       Date   : 2025-06-06 14:07 UTC (8 hours ago)
        
 (HTM) web link (www.oregonlive.com)
 (TXT) w3m dump (www.oregonlive.com)
        
       | Ocha wrote:
       | https://archive.ph/BSKSq
        
       | esafak wrote:
       | Can't they make a GPU instead? Please save us!
        
         | AlotOfReading wrote:
         | A GPU is a very different beast that relies much more heavily
         | on having a gigantic team of software developers supporting it.
         | A CPU is (comparatively) straightforward. You fab and validate
         | a world class design, make sure compiler support is good
         | enough, upstream some drivers and kernel support, and make sure
         | the standard documentation/debugging/optimization tools are all
         | functional. This is incredibly difficult, but achievable
         | because these are all standardized and well understood
         | interface points.
         | 
         | With GPUs you have all these challenges while also building a
         | massively complicated set of custom compilers and interfaces on
         | the software side, while at the same time trying to keep broken
         | user software written against some other company's interface
         | not only functional, but performant.
        
           | esafak wrote:
           | It's not the GPU I want per se but its ability to run ML
            | tasks. If you can do that with your CPU, fine!
        
             | mort96 wrote:
             | Well that's even more difficult because not only do you
             | need drivers for the widespread graphics libraries Vulkan,
             | OpenGL and Direct3D, but you also need to deal with the
             | GPGPU mess. Most software won't ever support your compute-
             | focused GPU because you won't support CUDA.
        
             | AlotOfReading wrote:
             | Echoing the other comment, this isn't easier. I was on a
          | team that did it. The ML team was overheard by the media
             | complaining that we were preventing them from achieving
             | their goals because we had taken 2 years to build something
             | that didn't beat the latest hardware from Nvidia, let alone
             | keep pace with how fast their demands had grown.
        
               | mdaniel wrote:
                | I don't need it to beat the latest from Nvidia, just be
                | affordable, available, and have user-serviceable RAM
                | slots so "48 GB" isn't such an ooo-ahh amount of memory.
                | 
                | I couldn't find any buy-it-now links, but 512 GB sticks
                | don't seem to be fantasies, either:
               | https://news.samsung.com/global/samsung-develops-
               | industrys-f...
        
               | kvemkon wrote:
                | And now, 4 years later, I can still choose only between
                | Micron and Hynix for consumer DDR5 DIMMs. No Samsung or
                | Nanya that I could order right now.
                | 
                | Meanwhile, Micron (Crucial) 64 GB DDR5 (SO-)DIMMs have
                | been available for a few months.
        
             | Bolwin wrote:
              | I mean, you most certainly can. Pretty much every ML
              | library has CPU support.
        
               | esafak wrote:
               | Not theoretically, but practically, viably.
        
           | Asraelite wrote:
           | > make sure compiler support is good enough
           | 
           | Do compilers optimize for specific RISC-V CPUs, not just
           | profiles/extensions? Same for drivers and kernel support.
           | 
           | My understanding was that if it's RISC-V compliant, no extra
           | work is needed for existing software to run on it.
        
             | AlotOfReading wrote:
              | The major compilers optimize for microarchitecture, yes.
              | Here's the TableGen scheduling definition behind LLVM's
              | -mtune=sifive-p670 flag as an example:
              | https://github.com/llvm/llvm-
              | project/blob/main/llvm/lib/Targ...
              | 
              | It's not that things won't _run_, but this is necessary
              | for compilers to generate well-optimized code.
        
             | Arnavion wrote:
             | You want to optimize for specific chips because different
             | chips have different capabilities that are not captured by
             | just what extensions they support.
             | 
             | A simple example is that the CPU might support running two
             | specific instructions better if they were adjacent than if
             | they were separated by other instructions (
             | https://en.wikichip.org/wiki/macro-operation_fusion ). So
             | the optimizer can try to put those instructions next to
             | each other. LLVM has target features for this, like "lui-
             | addi-fusion" for CPUs that will fuse a `lui; addi` sequence
             | into a single immediate load.
             | 
             | A more complex example is keeping track of the CPU's
             | internal state. The optimizer models the state of the CPU's
             | functional units (integer, address generation, etc) so that
             | it has an idea of which units will be in use at what time.
             | If the optimizer has to allocate multiple instructions that
             | will use some combination of those units, it can try to lay
             | them out in an order that will minimize stalling on busy
             | units while leaving other units unused.
             | 
             | That information also tells the optimizer about the latency
             | of each instruction, so when it has a choice between
             | multiple ways to compute the same operation it can choose
             | the one that works better on this CPU.
             | 
             | See also: https://myhsu.xyz/llvm-sched-model-1/
             | https://myhsu.xyz/llvm-sched-model-1.5/
             | 
             | If you don't do this your code will still run on your CPU.
             | It just won't necessarily be as optimal as it could be.
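              | 
              | A concrete sketch of the fusion case (the codegen shown
              | is illustrative; exact output depends on the compiler
              | version and -mtune):
              | 
              |     /* Loading a 32-bit constant on RISC-V splits into
              |        a `lui; addi` pair. On a core with the
              |        lui-addi-fusion feature, a tuned compiler keeps
              |        the pair adjacent so the frontend can fuse it
              |        into one macro-op. */
              |     long get_constant(void) {
              |         return 0x12345;  /* needs upper + lower bits */
              |     }
              |     /* Plausible output of clang -O2 for riscv64:
              |          lui  a0, 0x12       # upper 20 bits
              |          addi a0, a0, 0x345  # lower 12 bits, adjacent
              |          ret
              |        An untuned schedule might separate the pair with
              |        unrelated instructions and lose the fusion. */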
        
               | Bolwin wrote:
                | Wonder if we could generalize this so you can just give
                | the optimizer a file containing all this info, without
                | needing to explicitly add support for each CPU.
        
               | frankchn wrote:
               | These configuration files exist
               | (https://llvm.org/docs/TableGen/,
               | https://github.com/llvm/llvm-
               | project/blob/main/llvm/lib/Targ...) but it is very
               | complicated because the processors themselves are very
               | complicated.
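                | 
                | A toy sketch of what such a per-CPU "info file"
                | boils down to (real models also describe
                | functional units, buffer sizes, and fusion
                | rules, which is where it gets complicated):
                | 
                |     /* Per-instruction timing table the
                |        optimizer would consult; numbers are
                |        made up for illustration. */
                |     struct sched_info {
                |         const char *mnemonic;
                |         int latency;     /* result ready   */
                |         int throughput;  /* issue interval */
                |     };
                |     static const struct sched_info my_core[] = {
                |         { "add", 1, 1 },
                |         { "mul", 3, 1 },
                |         { "div", 20, 20 },  /* unpipelined */
                |     };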
        
         | speedgoose wrote:
         | I hope to see dedicated GPU coprocessors disappear sooner
         | rather than later, just like arithmetic coprocessors did.
        
           | wtallis wrote:
           | Arithmetic co-processors didn't disappear so much as they
           | moved onto the main CPU die. There were performance
           | advantages to having the FPU on the CPU, and there were no
           | longer significant cost advantages to having the FPU be
           | separate and optional.
           | 
           | For GPUs today and in the foreseeable future, there are still
           | good reasons for them to remain discrete, in some market
           | segments. Low-power laptops have already moved entirely to
           | integrated GPUs, and entry-level gaming laptops are moving in
           | that direction. Desktops have widely varying GPU needs
           | ranging from the minimal iGPUs that all desktop CPUs now
           | already have, up to GPUs that dwarf the CPU in die and
           | package size and power budget. Servers have needs ranging
           | from one to several GPUs per CPU. There's no one right answer
           | for _how much_ GPU to integrate with the CPU.
        
             | otabdeveloper4 wrote:
             | By "GPU" they probably mean "matrix multiplication
             | coprocessor for AI tasks", not _actually_ a graphics
             | processor.
        
               | wtallis wrote:
               | That doesn't really change anything. The use cases for a
               | GPU in any given market segment don't change depending on
               | whether you _call_ it a GPU.
               | 
               | And for low-power consumer devices like laptops, "matrix
               | multiplication coprocessor for AI tasks" is at least as
               | likely to mean NPU as GPU, and NPUs are always integrated
               | rather than discrete.
        
               | touisteur wrote:
               | Wondering how you'd classify Gaudi, tenstorrent-stuff,
               | groq, or lightmatter's photonic thing.
               | 
                | Calling something a GPU tends to make people ask for
                | (good, performant) support for OpenGL, Vulkan,
                | Direct3D... which seems like a huge waste of effort if
                | you want to be an "AI-coprocessor".
        
               | wtallis wrote:
               | > Wondering how you'd classify Gaudi, tenstorrent-stuff,
               | groq, or lightmatter's photonic thing.
               | 
               | Completely irrelevant to consumer hardware, in basically
               | the same way as NVIDIA's Hopper (a data center GPU that
               | doesn't do graphics). They're ML accelerators that for
               | the foreseeable future will mostly remain discrete
               | components and not be integrated onto Xeon/EPYC server
               | CPUs. We've seen a handful of products where a small
               | amount of CPU gets grafted onto a large GPU/accelerator
               | to remove the need for a separate host CPU, but that's
               | definitely not on track to kill off discrete accelerators
               | in the datacenter space.
               | 
                | > Calling something a GPU tends to make people ask for
                | (good, performant) support for OpenGL, Vulkan,
                | Direct3D... which seems like a huge waste of effort if
                | you want to be an "AI-coprocessor".
               | 
               | This is not a problem outside the consumer hardware
               | market.
        
           | rjsw wrote:
           | That was one of the ideas behind Larrabee [1]. You can run
           | Mesa on the CPU today using the llvmpipe backend.
           | 
           | https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)
        
           | saltcured wrote:
            | Aspects of this have been happening for a long time, as
            | SIMD extensions and as multi-core packaging.
           | 
            | But there is much more to discrete GPUs than vector
            | instructions or parallel cores. They have very different
            | memory and cache systems with very different
            | synchronization tradeoffs. A GPU is like an embedded
            | computer hanging off your PCI bus, and this computer does
            | not have the same stable architecture as your general-
            | purpose CPU running the host OS.
           | 
           | In some ways, the whole modern graphics stack is a sort of
           | integration and commoditization of the supercomputers of
           | decades ago. What used to be special vector machines and
           | clusters full of regular CPUs and RAM has moved into massive
           | chips.
           | 
           | But as other posters said, there is still a lot more
           | abstraction in the graphics/numeric programming models and a
           | lot more compiler and runtime tools to hide the platform.
           | Unless one of these hidden platforms "wins" in the market,
           | it's hard for me to imagine general purpose OS and apps being
           | able to handle the massive differences between particular GPU
           | systems.
           | 
            | It would easily be like prior decades, where multicore
            | wasn't taking off because most apps couldn't really use
            | it, or where special things like the Cell processor in
            | the PlayStation required very dedicated development to
            | use effectively. The heterogeneity of system
            | architectures makes general-purpose reuse hard, and makes
            | it hard to "port" software that wasn't written with the
            | platform in mind.
        
       | jmclnx wrote:
       | >AheadComputing is betting on an open architecture called RISC-V
       | 
       | I wish them success, plus I hope they do not do what Intel did
       | with its add-ons.
       | 
       | Hoping for an open system (which I think RISC-V is) and nothing
       | even close to Intel ME or AMT.
       | 
       | https://en.wikipedia.org/wiki/Intel_Management_Engine
       | 
       | https://en.wikipedia.org/wiki/Intel_Active_Management_Techno...
        
         | constantcrying wrote:
         | >Hoping for an open system (which I think RISC-V is) and
         | nothing even close to Intel ME or AMT.
         | 
          | The architecture is independent of additional silicon with
          | separate functions. The "only" thing which makes RISC-V open
          | is that the specifications are freely available and freely
          | usable.
         | 
         | Intel ME is, by design, separate from the actual CPU. Whether
         | the CPU uses x86 or RISC-V is essentially irrelevant.
        
       | ahartmetz wrote:
       | I don't know, RISC-V doesn't seem to be very disruptive at this
       | point? And what's the deal with specialized chips that the
       | article mentions? Today, the "biggest, baddest" CPUs - or at
       | least CPU cores - are the general-purpose (PC and, somehow, Apple
       | mobile / tablet) ones. The opposite of specialized.
       | 
       | Are they going to make one with 16384 cores for AI / graphics or
       | are they going to make one with 8 / 16 / 32 cores that can each
       | execute like 20 instructions per cycle?
        
         | jasoneckert wrote:
         | Most of the work that goes into chip design isn't related to
         | the ISA per se. So, it's entirely plausible that some talented
         | chip engineers could design something that implements RISC-V in
         | a way that is quite powerful, much like how Apple did with ARM.
         | 
         | The biggest roadblock would be lack of support on the software
         | side.
        
           | ahartmetz wrote:
           | Yeah sure, but the question remains if it's going to be a
           | huge amount of small cores or a moderate amount of huge
           | cores.
           | 
           | What it can't be is something like the Mill if they implement
           | the RISC-V ISA.
        
             | mixmastamyk wrote:
              | The article implies a CPU focus at first, though it is a
              | bit vague. The title is clear, however.
        
             | leetrout wrote:
             | For those that don't know about the Mill see
             | https://millcomputing.com/
             | 
             | I came to this thread looking for a comment about this.
             | I've been patiently following along for over a decade now
             | and I'm not optimistic anything will come from the project
             | :(
        
               | ahartmetz wrote:
               | Yeah, I guess not at this point, but the presentations
               | were very interesting to watch. According to the
               | yearly(!) updates on their website, they are still going
               | but not really close to finishing a product. Hm.
        
           | lugu wrote:
            | On the software side, I think the biggest blocker is an
            | affordable UEFI laptop for developers. A RISC-V startup
            | aiming to disrupt the cloud should include this in its
            | master plan.
        
           | trollbridge wrote:
           | Doesn't RISC-V have a fairly reasonable ecosystem by now?
        
         | glookler wrote:
          | If RISC-V started with supercomputing and worked down, that
          | would be a change from how an architecture disruption
          | usually works in the industry.
        
       | mixmastamyk wrote:
        | I was hoping they'd work with existing RV folks rather than
        | starting another one of a dozen smaller attempts. The article
        | says, however, that Keller from Tenstorrent will be on their
        | board. Good, I suppose, but it's hard to know the
        | ramifications. Why not merge their companies and investments
        | in one direction?
        
       | constantcrying wrote:
        | The article is so bad. Why do they refuse to say anything
        | about what these companies are _actually_ trying to make?
        | RISC-V chips exist; does the journalist just not know? Does
        | the company refuse to say what they are doing?
        
         | pragma_x wrote:
         | It reads like they're trying to drum up investment. This is why
         | the focus is on the pedigree of the founders, since they don't
         | have a product to speak of yet.
        
         | muricula wrote:
         | The article is written for a different audience than you might
         | be used to. oregonlive is the website for the newspaper The
         | Oregonian, which is the largest newspaper in the state of
         | Oregon. Intel has many of its largest fabs in Oregon and is a
         | big employer there. The local news is writing about a hip new
         | startup for a non-technical audience who know what Intel is and
         | why it's important, but need to be reminded what a CPU actually
         | is.
        
           | Ericson2314 wrote:
           | TBH this is a bad sign about job sprawl.
           | 
            | The fact that California housing pushed Intel to Oregon
            | probably helped lead to its failures. Every time a company
            | gets cost-of-living (and thus payroll) costs down by
            | relocating to a place with fewer potential employees and
            | fewer competing employers, modernity slams on the brakes.
        
             | Ericson2314 wrote:
             | https://www.aheadcomputing.com/post/everyone-deserves-a-
             | bett... sheesh, even the company's own writing is kinda
             | folksy too.
        
             | muricula wrote:
              | That might have been true in the early 2000s when they
              | were growing the Hillsboro, Oregon campus, but most new
              | fabs are opening in Arizona for taxation and political
              | reasons. I don't have the numbers to back it up, but
              | based on articles about Intel layoffs I believe that
              | Intel has been shedding jobs in Oregon for a while now.
              | 
              | This wiki page has a list of Intel fab starts; you can
              | see them being constructed in Oregon until 2013, and
              | after that all new construction moved elsewhere.
              | https://en.wikipedia.org/wiki/List_of_Intel_manufacturin
              | g_si...
              | 
              | I can imagine this slow disinvestment in Oregon would
              | only encourage some architects to quit and found a
              | RISC-V startup.
        
               | Ericson2314 wrote:
               | I am saying that all this stuff should have never left
               | the bay area, and the bay area should have millions more
               | people than it does today.
               | 
                | Arizona is also a mistake --- a far worse place for
                | high tech than Oregon! It is a desert real-estate
                | Ponzi scheme with no top-tier schools and no history
                | of top-tier high-skill intellectual job markets. In
                | general the sun belt (including LA) is the land of
                | stupid.
               | 
               | The electoral college is always winning out over the best
               | economic geography, and it sucks.
        
       | 1970-01-01 wrote:
        | Staring at current AI chip demand levels and choosing to go
        | with RISC-V chips is the boldest move you could make. Good
        | luck. The competition with the big boys will be relentless. I
        | expect them to be bought if they actually make a dent in the
        | market.
        
       | saulpw wrote:
       | The traitorous four.
        
         | asplake wrote:
         | > The traitorous eight was a group of eight employees who left
         | Shockley Semiconductor Laboratory in 1957 to found Fairchild
         | Semiconductor.
         | 
         | https://en.wikipedia.org/wiki/Traitorous_eight
        
           | saulpw wrote:
           | Thanks, I guess that particular history and analogy would not
           | be known universally :)
        
       | Foobar8568 wrote:
       | Bring back the architecture madness era of the 80s/90s.
        
       | aesbetic wrote:
       | This is more a bad look for Intel than anything truly exciting
       | since they refuse to produce any details lol
        
       | guywithahat wrote:
        | As someone who knows almost nothing about CPU architecture,
        | I've always wondered if there could be a new instruction set,
        | better suited to today's needs. I realize it would require a
        | monumental software effort, but most of these instruction sets
        | are decades old. RISC-V is newer, but my understanding is it's
        | still based around ARM, just without royalties (and thus isn't
        | bringing many new ideas to the table per se).
        
         | ItCouldBeWorse wrote:
          | I think the ideal would be something like a Xilinx offering:
          | tailoring the CPU regarding cache, parallelism, and in-
          | hardware execution of hot-loop components, depending on the
          | task.
          | 
          | Your CPU changes with every app, tab, and program you open,
          | changing from one core to n cores plus an AI GPU and back.
          | This idea that you have to write it all in stone always
          | seemed wild to me.
        
           | dehrmann wrote:
           | I'm fuzzy on how FPGAs actually work, but they're heavier
           | weight than you think, so I don't think you'd necessarily get
           | the wins you're imagining.
        
           | FuriouslyAdrift wrote:
            | You should definitely look into AMD's Instinct, Zynq, and
            | Versal lines, then.
        
         | jcranmer wrote:
         | > RISC-V is newer but my understanding is it's still based
         | around ARM, just without royalties (and thus isn't bringing
         | many new ideas to the table per say)
         | 
          | RISC-V is the fifth version of a series of academic chip
          | designs at Berkeley (hence its name).
          | 
          | In terms of design philosophy, it's probably closest to MIPS
          | of the major architectures; I'll point out that some of its
          | early whitepapers explicitly call out ARM and x86 as the
          | kind of architectural weirdos to avoid emulating.
        
           | dehrmann wrote:
            | > I'll point out that some of its early whitepapers
            | explicitly call out ARM and x86 as the kind of
            | architectural weirdos to avoid emulating.
           | 
           | Says every new system without legacy concerns.
        
           | guywithahat wrote:
           | Theoretically wouldn't MIPS be worse, since it was designed
           | to help students understand CPU architectures (and not to be
           | performant)?
           | 
            | Also, I don't mean to come off as confrontational; I
            | genuinely don't know.
        
             | BitwiseFool wrote:
             | MIPS was a genuine attempt at creating a new commercially
             | viable architecture. Some of the design goals of MIPS made
             | it conducive towards teaching, namely its relative
             | simplicity and lack of legacy cruft. It was never intended
             | to be an academic only ISA. Although I'm certain the owners
             | hoped that learning MIPS in college would lead to wider
             | industry adoption. That did not happen.
             | 
             | Interestingly, I recently completed a masters-level
             | computer architecture course and we used MIPS. However,
             | starting next semester the class will use RISC-V instead.
        
             | jcranmer wrote:
              | The reason why I say RISC-V is probably most influenced
              | by MIPS is that RISC-V places a rather heavy emphasis on
              | being a "pure" RISC design. (Also, RISC-V _was_ designed
              | by a university team, not industry!) Some of the core
              | criticisms of the RISC-V ISA are that it carries on some
              | of these trends even when experience has suggested that
              | doing otherwise would be better (e.g., RISC-V uses load-
              | linked/store-conditional instead of compare-and-swap).
             | 
             | Given that the core motivation of RISC was to be a
             | maximally performant design for architectures, the authors
             | of RISC-V would disagree with you that their approach is
             | compromising performance.
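              | 
              | A hedged sketch of what that means for software: the
              | portable C below requests a compare-and-swap, and on
              | base RISC-V (no Zacas extension) the compiler has to
              | synthesize it from an LL/SC retry loop.
              | 
              |     #include <stdatomic.h>
              | 
              |     /* On x86 this is a single `lock cmpxchg`; on base
              |        RISC-V it becomes roughly:
              |            loop: lr.w  t0, (a0)      # load-reserved
              |                  bne   t0, a1, done
              |                  sc.w  t1, a2, (a0)  # store-cond.
              |                  bnez  t1, loop      # retry on fail
              |            done: ...                            */
              |     _Bool try_swap(atomic_int *p, int expected,
              |                    int desired) {
              |         return atomic_compare_exchange_strong(
              |             p, &expected, desired);
              |     }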
        
             | zozbot234 wrote:
              | MIPS has a few weird features, such as delay slots, that
              | RISC-V sensibly dispenses with. There's also been quite
              | a bit of convergent evolution in the meantime, such that
              | AArch64 is significantly closer to MIPS and RISC-V than
              | to ARM32. Though it still uses condition codes where
              | MIPS and RISC-V just have conditional branch
              | instructions.
        
         | ksec wrote:
         | >I've always wondered if there could be a new instruction set,
         | better suited to today's needs.
         | 
          | AArch64 is pretty much a completely new ISA built from the
          | ground up.
         | 
         | https://www.realworldtech.com/arm64/5/
        
         | dehrmann wrote:
         | I'm far from an expert here, but these days, it's better to
         | view the instruction set as a frontend to the actual
         | implementation. You can see this with Intel's E/P cores; the
         | instructions are the same, but the chips are optimized
         | differently.
         | 
          | There actually have been changes for "today's needs," and
          | they're usually things like AES acceleration. ARM tried to
          | run Java natively with Jazelle, but it's still best to think
          | of the ISA as a frontend; the fact that Android is mostly
          | Java on ARM, yet this feature got dropped, says a lot.
         | 
         | The fact that there haven't been that many changes shows they
         | got the fundamental operations and architecture styles right.
         | What's lacking today is where GPUs step in: massively wide
         | SIMD.
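          | 
          | As a sketch of what such a "today's needs" extension looks
          | like from software (x86 AES-NI via the standard intrinsic;
          | compile with -maes):
          | 
          |     #include <wmmintrin.h>
          | 
          |     /* One AES encryption round in a single instruction.
          |        Before AES-NI this took a table-driven software
          |        loop; the ISA grew to fit the workload without
          |        changing the overall architecture. */
          |     __m128i aes_round(__m128i state, __m128i round_key) {
          |         return _mm_aesenc_si128(state, round_key);
          |     }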
        
       | logicchains wrote:
       | I wonder if it'll be ready before the Mill CPU?
        
       | pstuart wrote:
       | If Intel were smart (cough), they'd fund lots of skunkworks
       | startups like this that could move quickly and freely, but then
       | be "guided home" into intel once mature enough.
        
         | cjbgkagh wrote:
          | That creates a split between those who get to work on
          | skunkworks and those stuck on legacy. It's very possible to
          | end up with a Google-like situation where no one wants to
          | keep the lights on for old projects, as doing so would be
          | career suicide. There have been some attempts at other
          | companies at requiring people to have a stake in multiple
          | projects in different stages of the lifecycle, but I've
          | never seen a stable version of this, as individuals benefit
          | from bending the rules.
        
           | pstuart wrote:
            | Those are valid problems; however, they are not
            | insurmountable.
            | 
            | There are plenty of people who would be fine doing
            | unexciting dead-end work if they were compensated well
            | enough (pay, work-life balance, acknowledgement of value,
            | etc.).
            | 
            | This is ye olde Creative Destruction dilemma. There's too
            | much inertia and politics internally to make these
            | projects succeed in house. But if a startup were owned by
            | the org, and they mapped out a path for how to absorb it
            | after it takes off, they would then reap the rewards
            | rather than watch yet another competitor eat their lunch.
        
             | cjbgkagh wrote:
              | A spin-out to reacquire. I've seen a lot of outsourcing
              | of innovation via startups, with much the same effects
              | as skunkworks. People at the main company become
              | demoralized that the only way to get anything done is
              | to leave the company: why solve a problem internally
              | when you can do it externally for a whole lot more
              | money and recognition? This causes brain drain, to the
              | point that the execs at the main company become
              | suspicious of anyone who chooses to remain long term.
              | It even gets to the point that after you're acquired
              | it's better to leave and do it over again, because the
              | execs will forget you were acquired and start confusing
              | you with their lifers.
              | 
              | The only way I've seen anyone deal with this issue
              | successfully is with rather small companies, which
              | don't have nearly as much of the whole agency cost of
              | management to deal with.
        
         | mbreese wrote:
         | Or fund a lot of CPU startups that were tied back to Intel for
         | manufacturing/foundry work. Sure, they could be funding their
         | next big CPU competitor, but they'd still be able to capture
         | the revenue from actually producing the chips.
        
       | kleiba wrote:
       | Good luck not infringing on any patents!
       | 
       | And that's not sarcasm, I'm serious.
        
         | neuroelectron wrote:
         | Intel restructures into patent troll, hiring reverse engineers
         | and investing in chip sanding and epoxy acids.
        
       | energy123 wrote:
       | I like the retro-ish and out of trend name they've chosen:
       | AheadComputing.
        
         | laughingcurve wrote:
         | Together Compute
         | 
         | SFCompute
         | 
         | And so on ... definitely not out of trend
        
       | johnklos wrote:
       | One of the biggest problems with CPUs is legacy. Tie yourself to
       | any legacy, and now you're spending millions of transistors to
       | make sure some way that made sense ages ago still works.
       | 
       | Just as a thought experiment, consider the fact that the i80486
       | has 1.2 million transistors. An eight core Ryzen 9700X has around
       | 12 billion. The difference in clock speed is roughly 80 times,
       | and the difference in number of transistors is 1,250 times.
       | 
       | These are wild generalizations, but let's ask ourselves: If a
        | Ryzen takes 1,250 times the transistors for one core, does one
        | core run 1,250 times (even taking hyperthreading into account)
       | faster than an i80486 at the same clock? 500 times? 100 times?
       | 
       | It doesn't, because massive amounts of those transistors go to
       | keeping things in sync, dealing with changes in execution,
       | folding instructions, decoding a horrible instruction set, et
       | cetera.
       | 
       | So what might we be able to do if we didn't need to worry about
       | figuring out how long our instructions are? Didn't need to deal
       | with Spectre and Meltdown issues? If we made out-of-order work in
       | ways where much more could be in flight and the compilers /
       | assemblers would know how to avoid stalls based on dependencies,
       | or how to schedule dependencies? What if we took expensive
        | operations, like semaphores / locks, and built solutions into
        | the chip?
       | 
       | Would we get to 1,250 times faster for 1,250 times the number of
       | transistors? No. Would we get a lot more performance than we get
       | out of a contemporary x86 CPU? Absolutely.
        
         | johnklos wrote:
         | > and the difference in number of transistors is 1,250 times
         | 
         | I should've written _per core_.
        
         | colechristensen wrote:
          | GPUs scaled wide, with a similar number of transistors to a
          | 486 and just lots more cores: thousands to tens of
          | thousands of cores, averaging out to maybe 5 million
          | transistors per core.
          | 
          | CPUs scaled tall, with specialized instructions to make the
          | single thread go faster. No, the amount done per transistor
          | does not scale anywhere near linearly; very many of the
          | transistors are dark on any given cycle, compared to a much
          | simpler core that will have much higher utilization.
        
         | zozbot234 wrote:
         | > Didn't need to deal with Spectre and Meltdown issues? If we
         | made out-of-order work in ways where much more could be in
         | flight and the compilers / assemblers would know how to avoid
         | stalls based on dependencies, or how to schedule dependencies?
         | What if we took expensive operations, like semaphores / locks,
         | and built solutions in to the chip?
         | 
         | I'm pretty sure that these goals will conflict with one another
         | at some point. For example, the way one solves Spectre/Meltdown
         | issues in a principled way is by changing the hardware and
         | system architecture to have some notion of "privacy-sensitive"
         | data that shouldn't be speculated on. But this will unavoidably
         | limit the scope of OOO and the amount of instructions that can
         | be "in-flight" at any given time.
         | 
         | For that matter, with modern chips, semaphores/locks are
         | already implemented with hardware builtin operations, so you
         | can't do that much better. Transactional memory is an
         | interesting possibility but requires changes on the software
         | side to work properly.
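          | 
          | For reference, a minimal sketch of the hardware-builtin
          | point: a spinlock over the C11 atomic test-and-set, which
          | maps to a single atomic instruction on most ISAs.
          | 
          |     #include <stdatomic.h>
          | 
          |     /* test-and-set maps to one hardware primitive (xchg
          |        on x86, amoswap or LR/SC on RISC-V), so there is
          |        little left to "build into the chip" beyond what
          |        ISAs already provide. */
          |     static atomic_flag lock = ATOMIC_FLAG_INIT;
          | 
          |     void spin_lock(void) {
          |         while (atomic_flag_test_and_set_explicit(
          |                    &lock, memory_order_acquire))
          |             ;  /* spin until the flag was clear */
          |     }
          | 
          |     void spin_unlock(void) {
          |         atomic_flag_clear_explicit(
          |             &lock, memory_order_release);
          |     }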
        
         | dist-epoch wrote:
          | If you look at a Zen 5 die shot, half of the space is taken
          | by L3 cache.
         | 
         | And from each individual core:
         | 
         | - 25% per core L1/L2 cache
         | 
         | - 25% vector stuff (SSE, AVX, ...)
         | 
         | - from the remaining 50% only about 20% is doing instruction
         | decoding
         | 
         | https://www.techpowerup.com/img/AFnVIoGFWSCE6YXO.jpg
        
           | zozbot234 wrote:
           | The real issue with complex insn decoding is that it's hard
           | to make the decode stage wider and at some point this will
           | limit the usefulness of a bigger chip. For instance, AArch64
           | chips tend to have wider decode than their close x86_64
           | equivalents.
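            | 
            | A toy sketch of the serial dependency (names and
            | shapes are illustrative, not any real decoder):
            | 
            |     #include <stddef.h>
            | 
            |     /* With variable-length instructions you can't
            |        locate instruction N+1 until instruction N
            |        has been length-decoded... */
            |     size_t next_var(const unsigned char *code,
            |                     size_t i,
            |                     size_t (*len_of)(const unsigned char *))
            |     {
            |         return i + len_of(code + i);  /* serial */
            |     }
            | 
            |     /* ...while fixed 4-byte instructions sit at
            |        known offsets, so a wide frontend can decode
            |        many of them in parallel. */
            |     size_t next_fixed(size_t i) { return i + 4; }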
        
         | Sohcahtoa82 wrote:
         | > If a Ryzen takes 1,250 times the transistor for one core,
         | does one core run 1,250 times (even taking hyperthreading in to
         | account) faster than an i80486 at the same clock? 500 times?
         | 100 times?
         | 
         | Would be interesting to see a benchmark on this.
         | 
          | If we restricted it to 486 instructions only, I'd expect the
          | Ryzen to be 10-15x faster. The modern CPU will perform out-
          | of-order execution, even running some instructions in
          | parallel within single-core, single-threaded execution, not
          | to mention superior branch prediction and more cache.
         | 
         | If you allowed modern instructions like AVX-512, then the
         | speedup could easily be 30x or more.
         | 
         | > Would we get to 1,250 times faster for 1,250 times the number
         | of transistors? No. Would we get a lot more performance than we
         | get out of a contemporary x86 CPU? Absolutely.
         | 
         | I doubt you'd get significantly more performance, though you'd
         | likely gain power efficiency.
         | 
          | Half of what you described in your hypothetical instruction
          | set is already implemented in ARM.
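          | 
          | A rough sketch of the kind of loop where the AVX-512
          | speedup shows up (flags and targets are illustrative):
          | 
          |     /* The same scalar source; the compiler picks the
          |        width. Built as, e.g.:
          |          cc -O2 -march=x86-64    axpy.c  # 128-bit SSE2
          |          cc -O2 -march=x86-64-v4 axpy.c  # 512-bit AVX-512
          |        The second variant can process 8 doubles per vector
          |        instruction, which is where a 30x-or-more gap over
          |        a 486-era scalar FPU becomes plausible. */
          |     void axpy(double *y, const double *x, double a, int n)
          |     {
          |         for (int i = 0; i < n; i++)
          |             y[i] += a * x[i];   /* auto-vectorizable */
          |     }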
        
         | kvemkon wrote:
          | Would be interesting to compare transistor count without L3
          | (and perhaps L2) cache.
          | 
          | A 16-core Zen 5 CPU achieves more than 2 TFLOPS FP64. So
          | number-crunching performance scaled very well.
          | 
          | It is weird that the best consumer GPU can do only 4 TFLOPS.
          | Some years ago GPUs were an order of magnitude or more
          | faster than CPUs. Today GPUs are likely to be artificially
          | limited.
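          | 
          | Back-of-the-envelope arithmetic behind such figures (every
          | parameter here is an assumption, not a vendor spec):
          | 
          |     #include <stdio.h>
          | 
          |     /* Rough peak-FP64 estimate for a hypothetical 16-core
          |        AVX-512 part. */
          |     int main(void) {
          |         double cores = 16, ghz = 5.0;
          |         double lanes = 512.0 / 64;  /* 8 FP64 per vector */
          |         double fma_units = 2;       /* pipes per core    */
          |         double ops_per_fma = 2;     /* multiply + add    */
          |         double tflops = cores * ghz * lanes * fma_units
          |                       * ops_per_fma / 1000.0;
          |         printf("peak ~%.1f TFLOPS FP64\n", tflops);
          |         return 0;  /* ~2.6, matching "more than 2" */
          |     }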
        
           | kvemkon wrote:
            | E.g., the AMD Radeon PRO VII, with 13.23 billion
            | transistors, achieved 6.5 TFLOPS FP64 in 2020 [1].
           | 
           | [1] https://www.techpowerup.com/gpu-specs/radeon-pro-
           | vii.c3575
        
           | zozbot234 wrote:
            | > A 16-core Zen 5 CPU achieves more than 2 TFLOPS FP64.
            | So number-crunching performance scaled very well.
            | 
            | These aren't realistic numbers in most cases because
            | you're almost always limited by memory bandwidth, and
            | even if memory bandwidth is not an issue you'll have to
            | worry about thermals. The theoretical CPU compute ceiling
            | is almost never the real bottleneck. GPUs have a very
            | different architecture, with higher memory bandwidth, and
            | they run their chips a lot slower and cooler (lower clock
            | frequency), so they can reach much higher numbers in
            | practical scenarios.
        
             | kvemkon wrote:
             | Sure, not for BLAS Level 1 and 2 operations. But not even
             | for Level 3?
        
           | layla5alive wrote:
            | Huh, consumer GPUs are doing petaflops of floating point.
           | FP64 isn't a useful comparison because FP64 is nerfed on
           | consumer GPUs.
        
             | kvemkon wrote:
              | Even the recent Nvidia 5090 has 104.75 TFLOPS FP32.
              | 
              | It's a useful comparison in terms of achievable
              | performance per transistor count.
        
         | amelius wrote:
         | > and now you're spending millions of transistors
         | 
         | and spending millions on patent lawsuits ...
        
         | epx wrote:
          | Aren't 99.99999% of these transistors used in cache?
        
           | PopePompus wrote:
           | Gosh no. Often a majority of the transistors are used in
           | cache, but not 99%.
        
           | nomel wrote:
           | Look up "CPU die diagram". You'll see the physical layout of
           | the CPU with annotated blocks.
           | 
           | Zen 3 example: https://www.reddit.com/r/Amd/comments/jqjg8e/q
           | uick_zen3_die_...
           | 
           | So, more like 85%, or around 6 orders of magnitude difference
           | from your guess. ;)
        
         | AnthonyMouse wrote:
         | Modern CPUs don't actually execute the legacy instructions,
         | they execute core-native instructions and have a piece of
         | silicon dedicated to translating the legacy instructions into
         | them. That piece of silicon isn't that big. Modern CPUs use
         | more transistors because transistors are a lot cheaper now,
         | e.g. the i486 had 8KiB of cache, the Ryzen 9700X has >40MiB.
         | The extra transistors don't make it linearly faster but they
         | make it faster enough to be worth it when transistors are
         | cheap.
         | 
         | Modern CPUs also have a lot of things integrated into the "CPU"
         | that used to be separate chips. The i486 didn't have on-die
         | memory or PCI controllers etc., and those things were
         | themselves less complicated then (e.g. a single memory channel
         | and a shared peripheral bus for all devices). The i486SX didn't
         | even have a floating point unit. The Ryzen 9000 series die
         | contains an entire GPU.
        
         | Szpadel wrote:
          | That's exactly why Intel proposed x86S.
          | 
          | That's basically x86 without 16- and 32-bit support, no
          | real mode, etc.
          | 
          | The CPU starts initialized in 64-bit mode without all that
          | legacy crap.
          | 
          | That's IMO a great idea. I think every few decades we need
          | to stop and think again about what works best, and take a
          | fresh start or drop some legacy unused features.
          | 
          | RISC-V has only a mandatory basic set of instructions, as
          | little as possible to be Turing complete, and everything
          | else is an extension that can (theoretically) be removed in
          | the future.
          | 
          | This also could be used to remove legacy parts without
          | disrupting the architecture.
        
         | saati wrote:
          | But history showed exactly the opposite: if you don't have
          | an already existing software ecosystem you are dead. The
          | transistors for implementing x86 peculiarities are very
          | much worth it if people in the market want x86.
        
         | layla5alive wrote:
         | In terms of FLOPS, Ryzen is ~1,000,000 times faster than a 486.
         | 
          | For serial, branchy code it isn't a million times faster,
          | but that has almost nothing to do with legacy and
          | everything to do with the nature of serial code: you can't
          | linearly improve serial execution with architecture and
          | transistor counts (you can sublinearly improve it), but
          | rather with Dennard scaling.
          | 
          | It is worth noting, though, that purely via Dennard
          | scaling, Ryzen is already >100x faster! And via
          | architecture (those transistors) it is several multiples
          | beyond that.
         | 
          | In general compute, if you could clock it down to 33 or 66
          | MHz, a Ryzen would still be much faster than a 486, due to
          | using those transistors for ILP (instruction-level
          | parallelism) and TLP (thread-level parallelism). But you
          | won't see any TLP in a single serial program that a 486
          | would have been running, and you won't get any of the SIMD
          | benefits either, so you won't get anywhere near that in
          | practice on 486 code.
         | 
         | The key to contemporary high performance computing is having
         | more independent work to do, and organizing the data/work to
         | expose the independence to the software/hardware.
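          | 
          | A small sketch of "organizing the work to expose
          | independence": both functions below do the same total work,
          | but the first is one long dependency chain while the second
          | gives the hardware four independent streams.
          | 
          |     /* Serial: each add depends on the previous one, so
          |        ILP and SIMD can't help much. */
          |     double sum_serial(const double *a, int n) {
          |         double s = 0.0;
          |         for (int i = 0; i < n; i++) s += a[i];
          |         return s;
          |     }
          | 
          |     /* Four independent partial sums: the chains overlap
          |        across execution units and the body vectorizes.
          |        (Changes FP rounding order slightly.) */
          |     double sum_wide(const double *a, int n) {
          |         double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
          |         int i = 0;
          |         for (; i + 4 <= n; i += 4) {
          |             s0 += a[i];     s1 += a[i + 1];
          |             s2 += a[i + 2]; s3 += a[i + 3];
          |         }
          |         for (; i < n; i++) s0 += a[i];
          |         return (s0 + s1) + (s2 + s3);
          |     }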
        
       | neuroelectron wrote:
       | Tldr: RISC-V ASICs
        
       | zackmorris wrote:
       | I hope they design, build and sell a true 256-1024+ multicore CPU
       | with local memories that appears as an ordinary desktop computer
       | with a unified memory space for under $1000.
       | 
       | I've written about it at length and I'm sure that anyone who's
       | seen my comments is sick of me sounding like a broken record. But
       | there's truly a vast realm of uncharted territory there. I
       | believe that transputers and reprogrammable logic chips like
       | FPGAs failed because we didn't have languages like Erlang/Go and
       | GNU Octave/MATLAB to orchestrate a large number of processes or
       | handle SIMD/MIMD simultaneously. Modern techniques like passing
       | by value via copy-on-write (used by UNIX forking, PHP arrays and
       | Clojure state) were suppressed when mainstream imperative
       | languages using pointers and references captured the market. And
       | it's really hard to beat Amdahl's law when we're worried about
       | side effects. I think that anxiety is what inspired Rust, but
       | there are so many easier ways of avoiding those problems in the
       | first place.
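        | 
        | A small sketch of the copy-on-write point, using plain POSIX
        | fork(): the child "copies" the whole array for free, and a
        | page is only duplicated when one side writes to it.
        | 
        |     #include <stdio.h>
        |     #include <stdlib.h>
        |     #include <sys/wait.h>
        |     #include <unistd.h>
        | 
        |     int main(void) {
        |         size_t n = 100u << 20;        /* ~100 MB */
        |         char *big = calloc(n, 1);
        |         pid_t pid = fork();           /* COW: no copy yet */
        |         if (pid == 0) {               /* child */
        |             big[0] = 1;   /* only this page is duplicated */
        |             _exit(0);
        |         }
        |         waitpid(pid, NULL, 0);
        |         printf("parent still sees %d\n", big[0]);  /* 0 */
        |         free(big);
        |         return 0;
        |     }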
        
         | zozbot234 wrote:
         | If you have 256-1024+ multicore CPUs they will probably have a
         | fake unified memory space that's really a lot more like NUMA
         | underneath. Not too different from how GPU compute works under
         | the hood. And it would let you write seamless parallel code by
         | just using Rust.
        
         | jiggawatts wrote:
         | Check out the Azure HBv5 servers.
         | 
         | High bandwidth memory on-package with 352 AMD Zen 4 cores!
         | 
         | With 7 TB/s memory bandwidth, it's basically an x86 GPU.
         | 
         | This is the future of high performance computing. It used to be
         | available only for supercomputers but it's trickling down to
         | cloud VMs you can rent for reasonable money. Eventually it'll
         | be standard for workstations under your desk.
        
       | ngneer wrote:
       | I wonder if there is any relation to the cancelled Royal and
       | Beast Lake projects.
       | 
       | https://www.notebookcheck.net/Intel-CEO-abruptly-trashed-Roy...
        
       | phendrenad2 wrote:
       | Previous discussion 9 months ago:
       | https://news.ycombinator.com/item?id=41353155
        
       | bluesounddirect wrote:
       | https://archive.ph/BSKSq
        
       ___________________________________________________________________
       (page generated 2025-06-06 23:01 UTC)