[HN Gopher] Top researchers leave Intel to build startup with 't...
___________________________________________________________________
Top researchers leave Intel to build startup with 'the biggest,
baddest CPU'
Author : dangle1
Score : 133 points
Date : 2025-06-06 14:07 UTC (8 hours ago)
(HTM) web link (www.oregonlive.com)
(TXT) w3m dump (www.oregonlive.com)
| Ocha wrote:
| https://archive.ph/BSKSq
| esafak wrote:
| Can't they make a GPU instead? Please save us!
| AlotOfReading wrote:
| A GPU is a very different beast that relies much more heavily
| on having a gigantic team of software developers supporting it.
| A CPU is (comparatively) straightforward. You fab and validate
| a world class design, make sure compiler support is good
| enough, upstream some drivers and kernel support, and make sure
| the standard documentation/debugging/optimization tools are all
| functional. This is incredibly difficult, but achievable
| because these are all standardized and well understood
| interface points.
|
| With GPUs you have all these challenges while also building a
| massively complicated set of custom compilers and interfaces on
| the software side, while at the same time trying to keep broken
| user software written against some other company's interface
| not only functional, but performant.
| esafak wrote:
| It's not the GPU I want per se but its ability to run ML
| tasks. If you can do that with your CPU fine!
| mort96 wrote:
| Well that's even more difficult because not only do you
| need drivers for the widespread graphics libraries Vulkan,
| OpenGL and Direct3D, but you also need to deal with the
| GPGPU mess. Most software won't ever support your compute-
| focused GPU because you won't support CUDA.
| AlotOfReading wrote:
| Echoing the other comment, this isn't easier. I was on a
| team that did it. The ML team was overheard by media
| complaining that we were preventing them from achieving
| their goals because we had taken 2 years to build something
| that didn't beat the latest hardware from Nvidia, let alone
| keep pace with how fast their demands had grown.
| mdaniel wrote:
| I don't need it to beat the latest from Nvidia, just be
| affordable, available, and have user-serviceable RAM slots
| so "48GB" isn't such an ooo-ahh amount of memory.
|
| I couldn't find any buy-it-now links, but 512GB sticks
| don't seem to be fantasies either:
| https://news.samsung.com/global/samsung-develops-
| industrys-f...
| kvemkon wrote:
| And now, 4 years later, I can still only choose between
| Micron and Hynix for consumer DDR5 DIMMs. No Samsung or
| Nanya that I could order right now.
|
| Meanwhile, Micron (Crucial) 64GB DDR5 (SO-)DIMMs have only
| been available for a few months.
| Bolwin wrote:
| I mean, you most certainly can. Pretty much every ML library
| has CPU support.
| esafak wrote:
| Not theoretically, but practically, viably.
| Asraelite wrote:
| > make sure compiler support is good enough
|
| Do compilers optimize for specific RISC-V CPUs, not just
| profiles/extensions? Same for drivers and kernel support.
|
| My understanding was that if it's RISC-V compliant, no extra
| work is needed for existing software to run on it.
| AlotOfReading wrote:
| The major compilers optimize for microarchitecture, yes.
| Here's the TableGen scheduling definition behind LLVM's
| -mtune=sifive-p670 flag, as an example:
| https://github.com/llvm/llvm-
| project/blob/main/llvm/lib/Targ...
|
| It's not that things won't _run_ , but this is necessary
| for compilers to generate well optimized code.
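|
| As a rough illustration (a sketch assuming a clang build with the
| RISC-V backend; the flags are standard but exact output depends on
| the compiler version), the same ISA-compliant source can be tuned
| for a particular core without changing which extensions it targets:
|
|       /* saxpy.c: illustrative only */
|       void saxpy(int n, float a, const float *x, float *y) {
|           for (int i = 0; i < n; i++)
|               y[i] = a * x[i] + y[i];
|       }
|       /* Generic code generation for the base ISA:
|        *   clang --target=riscv64 -march=rv64gc -O2 -c saxpy.c
|        * Same ISA, but scheduled for one particular core:
|        *   clang --target=riscv64 -march=rv64gc -O2 \
|        *         -mtune=sifive-p670 -c saxpy.c
|        * Both objects run on any rv64gc CPU; only instruction
|        * selection and ordering differ. */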
| Arnavion wrote:
| You want to optimize for specific chips because different
| chips have different capabilities that are not captured by
| just what extensions they support.
|
| A simple example is that the CPU might support running two
| specific instructions better if they were adjacent than if
| they were separated by other instructions (
| https://en.wikichip.org/wiki/macro-operation_fusion ). So
| the optimizer can try to put those instructions next to
| each other. LLVM has target features for this, like "lui-
| addi-fusion" for CPUs that will fuse a `lui; addi` sequence
| into a single immediate load.
|
| A more complex example is keeping track of the CPU's
| internal state. The optimizer models the state of the CPU's
| functional units (integer, address generation, etc) so that
| it has an idea of which units will be in use at what time.
| If the optimizer has to allocate multiple instructions that
| will use some combination of those units, it can try to lay
| them out in an order that will minimize stalling on busy
| units while leaving other units unused.
|
| That information also tells the optimizer about the latency
| of each instruction, so when it has a choice between
| multiple ways to compute the same operation it can choose
| the one that works better on this CPU.
|
| See also: https://myhsu.xyz/llvm-sched-model-1/
| https://myhsu.xyz/llvm-sched-model-1.5/
|
| If you don't do this your code will still run on your CPU.
| It just won't necessarily be as optimal as it could be.
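|
| As a small, simplified illustration of the fusion case (assuming
| RV64 and a compiler that materializes the constant this way):
|
|       /* const32.c: illustrative only */
|       unsigned int magic(void) {
|           return 0x12345678u;  /* needs a 32-bit immediate */
|       }
|       /* This typically lowers to a two-instruction pair:
|        *     lui  a0, 0x12345      # upper 20 bits
|        *     addi a0, a0, 0x678    # lower 12 bits
|        * A core advertising lui+addi fusion can treat the pair as
|        * one macro-op, but only if the scheduler keeps the two
|        * instructions adjacent; that is exactly the kind of thing
|        * a per-CPU -mtune model tells the optimizer to do. */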
| Bolwin wrote:
| Wonder if we could generalize this so you can just give
| the optimizer a file containing all this info, without
| needing to explicitly add support for each CPU.
| frankchn wrote:
| These configuration files exist
| (https://llvm.org/docs/TableGen/,
| https://github.com/llvm/llvm-
| project/blob/main/llvm/lib/Targ...) but it is very
| complicated because the processors themselves are very
| complicated.
| speedgoose wrote:
| I hope to see dedicated GPU coprocessors disappear sooner
| rather than later, just like arithmetic coprocessors did.
| wtallis wrote:
| Arithmetic co-processors didn't disappear so much as they
| moved onto the main CPU die. There were performance
| advantages to having the FPU on the CPU, and there were no
| longer significant cost advantages to having the FPU be
| separate and optional.
|
| For GPUs today and in the foreseeable future, there are still
| good reasons for them to remain discrete, in some market
| segments. Low-power laptops have already moved entirely to
| integrated GPUs, and entry-level gaming laptops are moving in
| that direction. Desktops have widely varying GPU needs
| ranging from the minimal iGPUs that all desktop CPUs now
| already have, up to GPUs that dwarf the CPU in die and
| package size and power budget. Servers have needs ranging
| from one to several GPUs per CPU. There's no one right answer
| for _how much_ GPU to integrate with the CPU.
| otabdeveloper4 wrote:
| By "GPU" they probably mean "matrix multiplication
| coprocessor for AI tasks", not _actually_ a graphics
| processor.
| wtallis wrote:
| That doesn't really change anything. The use cases for a
| GPU in any given market segment don't change depending on
| whether you _call_ it a GPU.
|
| And for low-power consumer devices like laptops, "matrix
| multiplication coprocessor for AI tasks" is at least as
| likely to mean NPU as GPU, and NPUs are always integrated
| rather than discrete.
| touisteur wrote:
| Wondering how you'd classify Gaudi, tenstorrent-stuff,
| groq, or lightmatter's photonic thing.
|
| Calling something a GPU tends to make people ask for
| (good, performant) support for opengl, Vulkan,
| direct3d... which seem like a huge waste of effort if you
| want to be an "AI-coprocessor".
| wtallis wrote:
| > Wondering how you'd classify Gaudi, tenstorrent-stuff,
| groq, or lightmatter's photonic thing.
|
| Completely irrelevant to consumer hardware, in basically
| the same way as NVIDIA's Hopper (a data center GPU that
| doesn't do graphics). They're ML accelerators that for
| the foreseeable future will mostly remain discrete
| components and not be integrated onto Xeon/EPYC server
| CPUs. We've seen a handful of products where a small
| amount of CPU gets grafted onto a large GPU/accelerator
| to remove the need for a separate host CPU, but that's
| definitely not on track to kill off discrete accelerators
| in the datacenter space.
|
| > Calling something a GPU tends to make people ask for
| (good, performant) support for opengl, Vulkan,
| direct3d... which seem like a huge waste of effort if you
| want to be an "AI-coprocessor".
|
| This is not a problem outside the consumer hardware
| market.
| rjsw wrote:
| That was one of the ideas behind Larrabee [1]. You can run
| Mesa on the CPU today using the llvmpipe backend.
|
| https://en.wikipedia.org/wiki/Larrabee_(microarchitecture)
| saltcured wrote:
| Aspects of this have been happening for a long time, as SIMD
| extensions and as multi-core packaging.
|
| But, there is much more to discrete GPUs than vector
| instructions or parallel cores. It's very different memory
| and cache systems with very different synchronization
| tradeoffs. It's like an embedded computer hanging off your
| PCI bus, and this computer does not have the same stable
| architecture as your general purpose CPU running the host OS.
|
| In some ways, the whole modern graphics stack is a sort of
| integration and commoditization of the supercomputers of
| decades ago. What used to be special vector machines and
| clusters full of regular CPUs and RAM has moved into massive
| chips.
|
| But as other posters said, there is still a lot more
| abstraction in the graphics/numeric programming models and a
| lot more compiler and runtime tools to hide the platform.
| Unless one of these hidden platforms "wins" in the market,
| it's hard for me to imagine general purpose OS and apps being
| able to handle the massive differences between particular GPU
| systems.
|
| It would easily be like prior decades where multicore wasn't
| taking off because most apps couldn't really use it. Or where
| special things like the Cell processor in the PlayStation
| required very dedicated development to use effectively. The
| heterogeneity of system architectures makes it hard for
| general purpose reuse and hard to "port" software that wasn't
| written with the platform in mind.
| jmclnx wrote:
| >AheadComputing is betting on an open architecture called RISC-V
|
| I wish them success, plus I hope they do not do what Intel did
| with its add-ons.
|
| Hoping for an open system (which I think RISC-V is) and nothing
| even close to Intel ME or AMT.
|
| https://en.wikipedia.org/wiki/Intel_Management_Engine
|
| https://en.wikipedia.org/wiki/Intel_Active_Management_Techno...
| constantcrying wrote:
| >Hoping for an open system (which I think RISC-V is) and
| nothing even close to Intel ME or AMT.
|
| The architecture is independent of additional silicon with
| separate functions. The "only" thing which makes RISC-V open
| is that the specifications are freely available and freely
| usable.
|
| Intel ME is, by design, separate from the actual CPU. Whether
| the CPU uses x86 or RISC-V is essentially irrelevant.
| ahartmetz wrote:
| I don't know, RISC-V doesn't seem to be very disruptive at this
| point? And what's the deal with specialized chips that the
| article mentions? Today, the "biggest, baddest" CPUs - or at
| least CPU cores - are the general-purpose (PC and, somehow, Apple
| mobile / tablet) ones. The opposite of specialized.
|
| Are they going to make one with 16384 cores for AI / graphics or
| are they going to make one with 8 / 16 / 32 cores that can each
| execute like 20 instructions per cycle?
| jasoneckert wrote:
| Most of the work that goes into chip design isn't related to
| the ISA per se. So, it's entirely plausible that some talented
| chip engineers could design something that implements RISC-V in
| a way that is quite powerful, much like how Apple did with ARM.
|
| The biggest roadblock would be lack of support on the software
| side.
| ahartmetz wrote:
| Yeah sure, but the question remains whether it's going to be a
| huge number of small cores or a moderate number of huge
| cores.
|
| What it can't be is something like the Mill if they implement
| the RISC-V ISA.
| mixmastamyk wrote:
| Article implies a CPU focus at first, though is a bit
| vague. Title is clear however.
| leetrout wrote:
| For those that don't know about the Mill see
| https://millcomputing.com/
|
| I came to this thread looking for a comment about this.
| I've been patiently following along for over a decade now
| and I'm not optimistic anything will come from the project
| :(
| ahartmetz wrote:
| Yeah, I guess not at this point, but the presentations
| were very interesting to watch. According to the
| yearly(!) updates on their website, they are still going
| but not really close to finishing a product. Hm.
| lugu wrote:
| On the software side, I think the biggest blocker is the
| lack of an affordable UEFI RISC-V laptop for developers. A
| RISC-V startup aiming to disrupt the cloud should include
| this in its master plan.
| trollbridge wrote:
| Doesn't RISC-V have a fairly reasonable ecosystem by now?
| glookler wrote:
| If RISC-V started with supercomputing and worked down that
| would be a change from how an architecture disruption usually
| works in the industry.
| mixmastamyk wrote:
| I was hoping they'd work with existing RV folks rather than
| starting another one of a dozen smaller attempts. Article says
| however that Keller from Tenstorrent will be on their board. Good
| I suppose, but hard to know the ramifications. Why not merge
| their companies and investments in one direction?
| constantcrying wrote:
| The article is so bad. Why do they refuse to say anything about
| what these companies are _actually_ trying to make? RISC-V chips
| exist; does the journalist just not know? Does the company refuse
| to say what they are doing?
| pragma_x wrote:
| It reads like they're trying to drum up investment. This is why
| the focus is on the pedigree of the founders, since they don't
| have a product to speak of yet.
| muricula wrote:
| The article is written for a different audience than you might
| be used to. oregonlive is the website for the newspaper The
| Oregonian, which is the largest newspaper in the state of
| Oregon. Intel has many of its largest fabs in Oregon and is a
| big employer there. The local news is writing about a hip new
| startup for a non-technical audience who know what Intel is and
| why it's important, but need to be reminded what a CPU actually
| is.
| Ericson2314 wrote:
| TBH this is a bad sign about job sprawl.
|
| The fact that California housing pushed Intel to Oregon
| probably helped lead to its failures. Every time a company
| moves to get cost-of-living (and thus payroll) costs down by
| relocating to a place with fewer potential employees and
| fewer competing employers, modernity slams on the brakes.
| Ericson2314 wrote:
| https://www.aheadcomputing.com/post/everyone-deserves-a-
| bett... sheesh, even the company's own writing is kinda
| folksy too.
| muricula wrote:
| That might have been true in the early 2000s when they were
| growing the Hillsboro, Oregon campus, but most new fabs
| are opening in Arizona for taxation and political reasons.
| I don't have the numbers to back it up, but based on
| articles about Intel layoffs I believe that Intel has been
| shedding jobs in Oregon for a while now.
|
| This wiki page has a list of Intel fab starts; you can see
| them being constructed in Oregon until 2013, and after that
| all new construction moved elsewhere. https://en.wikipedia.
| org/wiki/List_of_Intel_manufacturing_si...
|
| I can imagine this slow disinvestment in Oregon would only
| encourage some architects to quit and found a RISC-V
| startup.
| Ericson2314 wrote:
| I am saying that all this stuff should have never left
| the bay area, and the bay area should have millions more
| people than it does today.
|
| Arizona is also a mistake, and a far worse place for high
| tech than Oregon! It is a desert real estate Ponzi
| scheme with no top-tier schools and no history of top-tier,
| high-skill intellectual job markets. In general the sun
| belt (including LA) is the land of stupid.
|
| The electoral college is always winning out over the best
| economic geography, and it sucks.
| 1970-01-01 wrote:
| Staring at current AI chip demand levels and choosing to go with
| RISC chips is the boldest move you could make. Good luck. The
| competition with the big boys will be relentless. I expect them
| to be bought if they actually make a dent in the market.
| saulpw wrote:
| The traitorous four.
| asplake wrote:
| > The traitorous eight was a group of eight employees who left
| Shockley Semiconductor Laboratory in 1957 to found Fairchild
| Semiconductor.
|
| https://en.wikipedia.org/wiki/Traitorous_eight
| saulpw wrote:
| Thanks, I guess that particular history and analogy would not
| be known universally :)
| Foobar8568 wrote:
| Bring back the architecture madness era of the 80s/90s.
| aesbetic wrote:
| This is more a bad look for Intel than anything truly exciting
| since they refuse to produce any details lol
| guywithahat wrote:
| As someone who knows almost nothing about CPU architecture, I've
| always wondered if there could be a new instruction set, better
| suited to today's needs. I realize it would require a monumental
| software effort but most of these instruction sets are decades
| old. RISC-V is newer but my understanding is it's still based
| around ARM, just without royalties (and thus isn't bringing many
| new ideas to the table per se)
| ItCouldBeWorse wrote:
| I think the ideal would be something like a Xilinx offering:
| tailoring the CPU's cache, parallelism, and in-hardware
| execution of hot-loop components depending on the task.
|
| Your CPU would change with every app, tab, and program you
| open, switching from one core to n cores plus an AI GPU and
| back. The idea that you have to set it all in stone always
| seemed wild to me.
| dehrmann wrote:
| I'm fuzzy on how FPGAs actually work, but they're heavier
| weight than you think, so I don't think you'd necessarily get
| the wins you're imagining.
| FuriouslyAdrift wrote:
| You should definitely look into AMD's Instinct, Zynq, and
| Versal lines, then.
| jcranmer wrote:
| > RISC-V is newer but my understanding is it's still based
| around ARM, just without royalties (and thus isn't bringing
| many new ideas to the table per say)
|
| RISC-V is the fifth in a series of academic chip
| designs at Berkeley (hence its name).
|
| In terms of design philosophy, it's probably closest to MIPS of
| the major architectures; I'll point out that some of its early
| whitepapers are explicitly calling out ARM and x86 as the kind
| of architectural weirdos to avoid emulating.
| dehrmann wrote:
| > I'll point out that some of its early whitepapers are
| explicitly calling out ARM and x86 as the kind of
| architectural weirdos to avoid emulating.
|
| Says every new system without legacy concerns.
| guywithahat wrote:
| Theoretically wouldn't MIPS be worse, since it was designed
| to help students understand CPU architectures (and not to be
| performant)?
|
| Also, I don't mean to come off as confrontational; I
| genuinely don't know.
| BitwiseFool wrote:
| MIPS was a genuine attempt at creating a new commercially
| viable architecture. Some of the design goals of MIPS made
| it conducive towards teaching, namely its relative
| simplicity and lack of legacy cruft. It was never intended
| to be an academic-only ISA, although I'm certain the owners
| hoped that learning MIPS in college would lead to wider
| industry adoption. That did not happen.
|
| Interestingly, I recently completed a masters-level
| computer architecture course and we used MIPS. However,
| starting next semester the class will use RISC-V instead.
| jcranmer wrote:
| The reason why I say RISC-V is probably most influenced by
| MIPS is because RISC-V places a rather heavy emphasis on
| being a "pure" RISC design. (Also, RISC-V _was_ designed by
| a university team, not industry!) Some of the core
| criticisms of the RISC-V ISA are that it carries on some of
| these trends even where experience has suggested that doing
| otherwise would be better (e.g., RISC-V uses load-linked/
| store-conditional instead of compare-and-swap).
|
| Given that the core motivation of RISC was to be a
| maximally performant design for architectures, the authors
| of RISC-V would disagree with you that their approach is
| compromising performance.
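|
| For reference, a minimal sketch (using the GCC/Clang __atomic
| builtins) of the portable compare-and-swap loop in question. On
| x86 the builtin maps to a single lock cmpxchg; on RISC-V's base A
| extension it itself expands to an lr.w/sc.w retry loop:
|
|       #include <stdint.h>
|
|       /* Atomically add delta to *p, returning the old value. */
|       uint32_t fetch_add_cas(uint32_t *p, uint32_t delta) {
|           uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);
|           uint32_t desired;
|           do {
|               desired = old + delta;
|               /* on failure, old is refreshed from *p */
|           } while (!__atomic_compare_exchange_n(
|                        p, &old, desired, /*weak=*/1,
|                        __ATOMIC_ACQ_REL, __ATOMIC_RELAXED));
|           return old;
|       }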
| zozbot234 wrote:
| MIPS has a few weird features such as delay slots, that
| RISC-V sensibly dispenses with. There's been also quite a
| bit of convergent evolution in the meantime, such that
| AArch64 is significantly closer to MIPS and RISC-V compared
| to ARM32. Though it's still using condition codes where
| MIPS and RISC-V just have conditional branch instructions.
| ksec wrote:
| >I've always wondered if there could be a new instruction set,
| better suited to today's needs.
|
| AArch64 is pretty much a completely new ISA built from the
| ground up.
|
| https://www.realworldtech.com/arm64/5/
| dehrmann wrote:
| I'm far from an expert here, but these days, it's better to
| view the instruction set as a frontend to the actual
| implementation. You can see this with Intel's E/P cores; the
| instructions are the same, but the chips are optimized
| differently.
|
| There actually have been changes for "today's needs," and
| they're usually things like AES acceleration. ARM tried to run
| Java natively with Jazelle, but it's still best to think of the
| ISA as a frontend; the fact that Android is mostly Java on ARM,
| yet this feature got dropped, says a lot.
|
| The fact that there haven't been that many changes shows they
| got the fundamental operations and architecture styles right.
| What's lacking today is where GPUs step in: massively wide
| SIMD.
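|
| For a sense of the width gap, a minimal sketch (x86-64 AVX
| intrinsics; compile with -mavx): one CPU SIMD instruction here
| covers 8 floats, while a GPU keeps thousands of wider lane groups
| (warps/wavefronts) in flight at once.
|
|       #include <immintrin.h>
|
|       void vadd(float *dst, const float *a, const float *b, int n) {
|           int i = 0;
|           for (; i + 8 <= n; i += 8) {      /* 8 floats per op */
|               __m256 va = _mm256_loadu_ps(a + i);
|               __m256 vb = _mm256_loadu_ps(b + i);
|               _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
|           }
|           for (; i < n; i++)                /* scalar remainder */
|               dst[i] = a[i] + b[i];
|       }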
| logicchains wrote:
| I wonder if it'll be ready before the Mill CPU?
| pstuart wrote:
| If Intel were smart (cough), they'd fund lots of skunkworks
| startups like this that could move quickly and freely, but then
| be "guided home" into Intel once mature enough.
| cjbgkagh wrote:
| That creates a split between those who get to work on skunkworks
| and those stuck on legacy. It's very possible to end up
| with a Google-like situation where no one wants to keep the
| lights on for old projects, as doing so would be career suicide.
| Some other companies have tried requiring people to have a
| stake in multiple projects at different stages of the lifecycle,
| but I've never seen a stable version of this, as individuals
| benefit from bending the rules.
| pstuart wrote:
| Those are valid problems, however, they are not
| insurmountable.
|
| There's plenty of people who would be fine doing unexciting
| dead end work if they were compensated well enough (pay,
| work-life balance, acknowledgement of value, etc).
|
| This is ye olde Creative Destruction dilemma. There's too
| much inertia and politics internally to make these projects
| succeed in house. But if a startup were owned by the org, and
| they mapped out a path for how to absorb it after it takes off,
| they would then reap the rewards rather than watch yet another
| competitor eat their lunch.
| cjbgkagh wrote:
| A spin-out to reacquire. I've seen a lot of outsourcing of
| innovation via startups, with much the same effects as skunkworks.
| People at the main company become demoralized when the only
| way to get anything done is to leave the company: why solve a
| problem internally when you can do it externally for a whole
| lot more money and recognition? This causes brain drain to the
| point that the execs at the main company become suspicious of
| anyone who chooses to remain long term. It even gets to the
| point that even after you're acquired it's better to leave and
| do it over again, because the execs will forget you were
| acquired and start confusing you with their lifers.
|
| The only way I've seen anyone deal with this issue
| successfully is with rather small companies which don't
| have nearly as much of the whole agency cost of management
| to deal with.
| mbreese wrote:
| Or fund a lot of CPU startups that were tied back to Intel for
| manufacturing/foundry work. Sure, they could be funding their
| next big CPU competitor, but they'd still be able to capture
| the revenue from actually producing the chips.
| kleiba wrote:
| Good luck not infringing on any patents!
|
| And that's not sarcasm, I'm serious.
| neuroelectron wrote:
| Intel restructures into patent troll, hiring reverse engineers
| and investing in chip sanding and epoxy acids.
| energy123 wrote:
| I like the retro-ish and out of trend name they've chosen:
| AheadComputing.
| laughingcurve wrote:
| Together Compute
|
| SFCompute
|
| And so on ... definitely not out of trend
| johnklos wrote:
| One of the biggest problems with CPUs is legacy. Tie yourself to
| any legacy, and now you're spending millions of transistors to
| make sure some way that made sense ages ago still works.
|
| Just as a thought experiment, consider the fact that the i80486
| has 1.2 million transistors. An eight core Ryzen 9700X has around
| 12 billion. The difference in clock speed is roughly 80 times,
| and the difference in number of transistors is 1,250 times.
|
| These are wild generalizations, but let's ask ourselves: if a
| Ryzen takes 1,250 times the transistors for one core, does one
| core run 1,250 times faster than an i80486 at the same clock
| (even taking hyperthreading into account)? 500 times? 100 times?
|
| It doesn't, because massive amounts of those transistors go to
| keeping things in sync, dealing with changes in execution,
| folding instructions, decoding a horrible instruction set, et
| cetera.
|
| So what might we be able to do if we didn't need to worry about
| figuring out how long our instructions are? Didn't need to deal
| with Spectre and Meltdown issues? If we made out-of-order work in
| ways where much more could be in flight and the compilers /
| assemblers would know how to avoid stalls based on dependencies,
| or how to schedule dependencies? What if we took expensive
| operations, like semaphores / locks, and built solutions into
| the chip?
|
| Would we get to 1,250 times faster for 1,250 times the number of
| transistors? No. Would we get a lot more performance than we get
| out of a contemporary x86 CPU? Absolutely.
| johnklos wrote:
| > and the difference in number of transistors is 1,250 times
|
| I should've written _per core_.
| colechristensen wrote:
| GPUs scaled wide, with a per-core transistor count similar to a
| 486 and just lots more cores: thousands to tens of thousands of
| cores, averaging out to maybe 5 million transistors per core.
|
| CPUs scaled tall, with specialized instructions to make the
| single thread go faster. No, the amount done per transistor does
| not scale anywhere near linearly; very many of the transistors
| are dark on any given cycle compared to a much simpler core,
| which will have much higher utilization.
| zozbot234 wrote:
| > Didn't need to deal with Spectre and Meltdown issues? If we
| made out-of-order work in ways where much more could be in
| flight and the compilers / assemblers would know how to avoid
| stalls based on dependencies, or how to schedule dependencies?
| What if we took expensive operations, like semaphores / locks,
| and built solutions in to the chip?
|
| I'm pretty sure that these goals will conflict with one another
| at some point. For example, the way one solves Spectre/Meltdown
| issues in a principled way is by changing the hardware and
| system architecture to have some notion of "privacy-sensitive"
| data that shouldn't be speculated on. But this will unavoidably
| limit the scope of OOO and the amount of instructions that can
| be "in-flight" at any given time.
|
| For that matter, with modern chips, semaphores/locks are
| already implemented with hardware builtin operations, so you
| can't do that much better. Transactional memory is an
| interesting possibility but requires changes on the software
| side to work properly.
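|
| For concreteness, a minimal sketch (C11 <stdatomic.h>; a real lock
| would add backoff and an OS-assisted slow path) of how a lock
| already bottoms out in a single hardware atomic operation:
|
|       #include <stdatomic.h>
|
|       typedef struct {
|           atomic_flag held;   /* initialize with ATOMIC_FLAG_INIT */
|       } spinlock_t;
|
|       static void spin_lock(spinlock_t *l) {
|           /* test-and-set is one atomic RMW instruction, e.g. xchg
|            * on x86 or an amoswap / lr+sc sequence on RISC-V. */
|           while (atomic_flag_test_and_set_explicit(
|                      &l->held, memory_order_acquire))
|               ;  /* spin */
|       }
|
|       static void spin_unlock(spinlock_t *l) {
|           atomic_flag_clear_explicit(&l->held, memory_order_release);
|       }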
| dist-epoch wrote:
| If you look at a Zen 5 die shot, half of the space is taken by
| L3 cache.
|
| And from each individual core:
|
| - 25% per core L1/L2 cache
|
| - 25% vector stuff (SSE, AVX, ...)
|
| - from the remaining 50% only about 20% is doing instruction
| decoding
|
| https://www.techpowerup.com/img/AFnVIoGFWSCE6YXO.jpg
| zozbot234 wrote:
| The real issue with complex insn decoding is that it's hard
| to make the decode stage wider and at some point this will
| limit the usefulness of a bigger chip. For instance, AArch64
| chips tend to have wider decode than their close x86_64
| equivalents.
| Sohcahtoa82 wrote:
| > If a Ryzen takes 1,250 times the transistor for one core,
| does one core run 1,250 times (even taking hyperthreading in to
| account) faster than an i80486 at the same clock? 500 times?
| 100 times?
|
| Would be interesting to see a benchmark on this.
|
| If we restricted it to 486 instructions only, I'd expect the
| Ryzen to be 10-15x faster. The modern CPU will perform out-of-
| order execution, with some instructions even running in parallel
| even in single-core, single-threaded execution, not to
| mention superior branch prediction and more cache.
|
| If you allowed modern instructions like AVX-512, then the
| speedup could easily be 30x or more.
|
| > Would we get to 1,250 times faster for 1,250 times the number
| of transistors? No. Would we get a lot more performance than we
| get out of a contemporary x86 CPU? Absolutely.
|
| I doubt you'd get significantly more performance, though you'd
| likely gain power efficiency.
|
| Half of what you described in your hypothetical instruction set
| is already implemented in ARM.
| kvemkon wrote:
| Would be interesting to compare transistor count without L3
| (and perhaps L2) cache.
|
| A 16-core Zen 5 CPU achieves more than 2 TFLOPS FP64, so
| number-crunching performance scaled very well.
|
| It is weird that the best consumer GPU can only do about 4
| TFLOPS FP64. Some years ago GPUs were an order of magnitude or
| more faster than CPUs; today GPUs are likely to be artificially
| limited.
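|
| Back-of-the-envelope for the CPU figure (assuming roughly 5 GHz
| and two 512-bit FMA pipes per Zen 5 core, 8 FP64 lanes per pipe,
| 2 flops per FMA): 16 cores x 5e9 Hz x 2 x 8 x 2 = 2.56 TFLOPS
| theoretical peak FP64, which matches the "more than 2 TFLOPS"
| figure.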
| kvemkon wrote:
| E.g. AMD Radeon PRO VII with 13.23 billion transistors
| achieves 6.5 TFLOPS FP64 in 2020 [1].
|
| [1] https://www.techpowerup.com/gpu-specs/radeon-pro-
| vii.c3575
| zozbot234 wrote:
| > 16-core Zen 5 CPU achieves more than 2 TFLOPS FP64. So
| number crunching performance scaled very well.
|
| These aren't realistic numbers in most cases because you're
| almost always limited by memory bandwidth, and even if memory
| bandwidth is not an issue you'll have to worry about
| thermals. The theoretical CPU compute ceiling is almost never
| the real bottleneck. GPUs have a very different architecture,
| with higher memory bandwidth, and run their chips a lot slower
| and cooler (lower clock frequency), so they can reach much
| higher numbers in practical scenarios.
| kvemkon wrote:
| Sure, not for BLAS Level 1 and 2 operations. But not even
| for Level 3?
| layla5alive wrote:
| Huh, consumer GPUs are doing Petaflops of floating point.
| FP64 isn't a useful comparison because FP64 is nerfed on
| consumer GPUs.
| kvemkon wrote:
| Even the recent Nvidia 5090 has 104.75 TFLOPS FP32.
|
| It's a useful comparison in terms of achievable performance
| per transistor count.
| amelius wrote:
| > and now you're spending millions of transistors
|
| and spending millions on patent lawsuits ...
| epx wrote:
| Aren't 99,99999% of these transistors used in cache?
| PopePompus wrote:
| Gosh no. Often a majority of the transistors are used in
| cache, but not 99%.
| nomel wrote:
| Look up "CPU die diagram". You'll see the physical layout of
| the CPU with annotated blocks.
|
| Zen 3 example: https://www.reddit.com/r/Amd/comments/jqjg8e/q
| uick_zen3_die_...
|
| So, more like 85%, or around 6 orders of magnitude difference
| from your guess. ;)
| AnthonyMouse wrote:
| Modern CPUs don't actually execute the legacy instructions,
| they execute core-native instructions and have a piece of
| silicon dedicated to translating the legacy instructions into
| them. That piece of silicon isn't that big. Modern CPUs use
| more transistors because transistors are a lot cheaper now,
| e.g. the i486 had 8KiB of cache, the Ryzen 9700X has >40MiB.
| The extra transistors don't make it linearly faster but they
| make it faster enough to be worth it when transistors are
| cheap.
|
| Modern CPUs also have a lot of things integrated into the "CPU"
| that used to be separate chips. The i486 didn't have on-die
| memory or PCI controllers etc., and those things were
| themselves less complicated then (e.g. a single memory channel
| and a shared peripheral bus for all devices). The i486SX didn't
| even have a floating point unit. The Ryzen 9000 series die
| contains an entire GPU.
| Szpadel wrote:
| That's exactly why Intel proposed x86S.
|
| That's basically x86 without 16- and 32-bit support, no real
| mode, etc.
|
| The CPU starts initialized in 64-bit mode without all that
| legacy crap.
|
| That's IMO a great idea. I think every few decades we need to
| stop, think again about what works best, and take a fresh start
| or drop some legacy unused features.
|
| RISC-V has only a mandatory basic set of instructions, as little
| as possible to be Turing complete, and everything else is an
| extension that can be (theoretically) removed in the future.
|
| This could also be used to remove legacy parts without
| disrupting the architecture.
| saati wrote:
| But history showed exactly the opposite: if you don't have an
| already existing software ecosystem you are dead. The
| transistors for implementing x86 peculiarities are very much
| worth it if people in the market want x86.
| layla5alive wrote:
| In terms of FLOPS, Ryzen is ~1,000,000 times faster than a 486.
|
| For serial branchy code, it isn't a million times faster, but
| that has almost nothing to do with legacy and everything to do
| with the nature of serial code: you can't linearly improve
| serial execution with architecture and transistor counts (you
| can improve it sublinearly), but rather through Dennard scaling.
|
| It is worth noting, though, that purely via Dennard scaling,
| Ryzen is already >100x faster! And via architecture (those
| transistors) it is several multiples beyond that.
|
| In general compute, if you could clock it down to 33 or 66 MHz,
| a Ryzen would be much faster than a 486, due to using those
| transistors for ILP (instruction-level parallelism) and TLP
| (thread-level parallelism). But you won't see any TLP in a
| single serial program that a 486 would have been running, and
| you won't get any of the SIMD benefits either, so you won't get
| anywhere near that in practice on 486 code.
|
| The key to contemporary high performance computing is having
| more independent work to do, and organizing the data/work to
| expose the independence to the software/hardware.
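|
| A tiny sketch of what exposing that independence looks like in
| plain C (illustrative only):
|
|       /* Every add depends on the previous one, so the core's
|        * extra execution units sit mostly idle. */
|       float dot_serial(const float *a, const float *b, int n) {
|           float s = 0.0f;
|           for (int i = 0; i < n; i++)
|               s += a[i] * b[i];
|           return s;
|       }
|
|       /* Four independent accumulator chains let an out-of-order
|        * core (or the auto-vectorizer) overlap the work. */
|       float dot_ilp(const float *a, const float *b, int n) {
|           float s0 = 0, s1 = 0, s2 = 0, s3 = 0;
|           int i = 0;
|           for (; i + 4 <= n; i += 4) {
|               s0 += a[i]     * b[i];
|               s1 += a[i + 1] * b[i + 1];
|               s2 += a[i + 2] * b[i + 2];
|               s3 += a[i + 3] * b[i + 3];
|           }
|           for (; i < n; i++)
|               s0 += a[i] * b[i];
|           return (s0 + s1) + (s2 + s3);
|       }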
| neuroelectron wrote:
| Tldr: RISC-V ASICs
| zackmorris wrote:
| I hope they design, build and sell a true 256-1024+ multicore CPU
| with local memories that appears as an ordinary desktop computer
| with a unified memory space for under $1000.
|
| I've written about it at length and I'm sure that anyone who's
| seen my comments is sick of me sounding like a broken record. But
| there's truly a vast realm of uncharted territory there. I
| believe that transputers and reprogrammable logic chips like
| FPGAs failed because we didn't have languages like Erlang/Go and
| GNU Octave/MATLAB to orchestrate a large number of processes or
| handle SIMD/MIMD simultaneously. Modern techniques like passing
| by value via copy-on-write (used by UNIX forking, PHP arrays and
| Clojure state) were suppressed when mainstream imperative
| languages using pointers and references captured the market. And
| it's really hard to beat Amdahl's law when we're worried about
| side effects. I think that anxiety is what inspired Rust, but
| there are so many easier ways of avoiding those problems in the
| first place.
| zozbot234 wrote:
| If you have 256-1024+ multicore CPUs they will probably have a
| fake unified memory space that's really a lot more like NUMA
| underneath. Not too different from how GPU compute works under
| the hood. And it would let you write seamless parallel code by
| just using Rust.
| jiggawatts wrote:
| Check out the Azure HBv5 servers.
|
| High bandwidth memory on-package with 352 AMD Zen 4 cores!
|
| With 7 TB/s memory bandwidth, it's basically an x86 GPU.
|
| This is the future of high performance computing. It used to be
| available only for supercomputers but it's trickling down to
| cloud VMs you can rent for reasonable money. Eventually it'll
| be standard for workstations under your desk.
| ngneer wrote:
| I wonder if there is any relation to the cancelled Royal and
| Beast Lake projects.
|
| https://www.notebookcheck.net/Intel-CEO-abruptly-trashed-Roy...
| phendrenad2 wrote:
| Previous discussion 9 months ago:
| https://news.ycombinator.com/item?id=41353155
| bluesounddirect wrote:
| https://archive.ph/BSKSq
___________________________________________________________________
(page generated 2025-06-06 23:01 UTC)