[HN Gopher] Testing AMD's Bergamo: Zen 4c
___________________________________________________________________
Testing AMD's Bergamo: Zen 4c
Author : latchkey
Score : 77 points
Date : 2024-06-22 17:36 UTC (5 hours ago)
(HTM) web link (chipsandcheese.com)
(TXT) w3m dump (chipsandcheese.com)
| tedunangst wrote:
| I remember as part of the TBB library release, Intel remarked
| that 100 Pentium cores had the same transistor count as a Core 2.
| Took a while, but starting to turn the corner on slower and wider
| becoming more common.
| imtringued wrote:
| The problem with adding more cores is that SRAM is the real
| bottleneck. Adding more cores means more cache.
|
| Until someone figures out how to do 3D stackable SRAM similar
| to how SSDs work, SRAM will always consume most of the area on
| your chip.
| wmf wrote:
| It's called V-Cache.
| JonChesterfield wrote:
| This isn't fill the reticle with CPU, it's make a dozen
| separate chips and package them on a network fabric. Amount
| of cache can increase linearly with number of cores without a
| problem.
|
| There is some outstanding uncertainty about cache coherency
| vs performance as N goes up which shows up in NUMA cliff
| fashion. My pet theory is that'll be what ultimately kills
| x64 - the concurrency semantics are skewed really hard
| towards convenience and thus away from scalability.
| doctor_eval wrote:
| I know next to nothing about CPU architecture so please
| forgive a stupid question.
|
| Are you saying that the x86 memory model means RAM latency
| is more impactful than on some other architectures?
|
| Is this (tangentially) related to the memory mode that
| Apple reportedly added to the M1 to emulate x86 memory
| model to make emulation faster? - presumably to account for
| assumptions that compilers make about the state of the CPU
| after certain operations?
| nullc wrote:
| Is the 4c core slower in any other way than L3 cache reductions?
|
| Would be interesting to see a compute bound perfectly scaling
| workload and compare it in terms of absolute performance and
| performance per watt between Bergamo and Genoa.
| mlyle wrote:
| Same performance per cycle; less L3 and lower maximum
| frequency.
|
| Zen4C is said to be a little better performance per watt than
| Zen4, but it's not clear how much.
| toast0 wrote:
| > Is the 4c core slower in any other way than L3 cache
| reductions?
|
| It's got a lower clock ceiling. Not much lower than achievable
| clocks in really dense Zen 4 Epyc, but a lot less than an 8 core
| Ryzen.
| wmf wrote:
| https://www.phoronix.com/review/amd-epyc-9754-bergamo/2
| Pet_Ant wrote:
| The actual title seems to be "Testing AMD's Bergamo: Zen 4c Spam"
| which I really like because from the perspective of 20 years ago
| this would feel a bit like "spam" or a CPU-core "Zergling rush".
|
| As I said before, I do believe that this is the future of CPU
| cores. [1] With RAM latency not really having kept pace with CPUs,
| having more performant cores really seems like a waste. In a cloud
| setting where you always have some work to do, it seems like
| simpler cores but more of them is really the answer. It's in that
| environment that the weight of x86's legacy will catch up with us
| and we'll need to get rid of all the wasted transistors decoding
| cruft.
|
| [1]: https://news.ycombinator.com/item?id=40535915
| ComputerGuru wrote:
| I agreed with you up until the x86 comment. Legacy x86 support
| is largely a red herring. The constraints are architectural (as
| you noted, per-core memory bandwidth, plus other things) more
| than they are due to being tied down to legacy instruction
| sets.
| mlyle wrote:
| If the goal ends up being many-many-core, x86's complexity
| tax may start to matter. The cost of x86 compatibility
| relative to all the complexities required for performance has
| been small, but if we end up deciding that memory latency is
| going to kill us and we can't keep complex cores fed, then
| that is a vote for simpler architectures.
|
| I suspect the future will be something between these extremes
| (tons of dumb cores or ever-more-complicated cores to try and
| squeeze out IPC), though.
| Pet_Ant wrote:
| Each core needs to handle the full complexity of x86. Now, as
| super-scalar OoO x86 cores have evolved, the percentage of die
| allocated to decoding the cruft has gone down.
|
| ...but when we start swarming simple cores, that cost starts
| to rise. Each core needs to be able to decode everything. Now
| when you have 100 cores, even if the cruft is just 4%, that
| means you can have 4 more cores. This is for free if you are
| willing to recompile your code.
|
| Now, it may turn out that we need more decoding complexity
| than something like RISC-V currently has (Qualcomm has been
| working on it), but any additions will be deliberate,
| intentionally chosen instead of accrued, meeting the needs of
| today and current trade-offs, not those of the early '80s.
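A rough sketch of the "4% cruft means 4 more cores" arithmetic from the comment above. It assumes every core takes one unit of die area and that removing legacy decode logic shrinks each core by exactly the stated fraction. Both are simplifications for illustration, not measured figures, and the function name `extra_cores` is hypothetical.

```python
import math


def extra_cores(cores: int, cruft_fraction: float) -> int:
    """Whole extra cores that fit in the same area once every core
    sheds `cruft_fraction` of its area (assumes linear area scaling)."""
    if not 0.0 <= cruft_fraction < 1.0:
        raise ValueError("cruft_fraction must be in [0, 1)")
    total_area = float(cores)              # original cores, 1.0 area unit each
    lean_core_area = 1.0 - cruft_fraction  # area of one slimmed-down core
    return math.floor(total_area / lean_core_area) - cores


print(extra_cores(100, 0.04))  # 4, matching the comment's estimate
```

Under these assumptions the gain grows with core count, which is the comment's point: a fixed per-core overhead matters more when there are many small cores.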
| magicalhippo wrote:
| As a developer of fairly standard software, there's very
| little I can say I rely on from the x86/x64 ISA.
|
| One big one is probably the consistency model [1] and such,
| which affects atomic operations and synchronizing
| multi-threaded code. Usually not directly though, I
| typically use libraries or OS primitives.
|
| Are there any non-obvious (to me anyway) ways us "typical
| devs" rely on x86/x64?
|
| I get the sense that a lot of software is one recompile
| away from running on some other ISA, but perhaps I'm overly
| naive.
|
| [1]: https://en.wikipedia.org/wiki/Consistency_model
| adrian_b wrote:
| The article says "AMD's server platform also leaves potential for
| expansion. Top end Genoa SKUs use 12 compute chiplets while
| Bergamo is limited to just eight. 12 Zen 4c compute dies would
| let AMD fit 192 cores in a single socket".
|
| It should be noted that the successor of Bergamo, Turin dense,
| which is expected by the end of the year, will have 12 compute
| chiplets, for a total of 192 Zen 5c cores, thus bringing both
| more cores and faster cores.
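The core counts in the comment above follow from simple multiplication. This sketch assumes 16 cores per dense compute chiplet, a figure derived from the quoted "12 compute dies ... 192 cores" rather than stated directly in the thread. The names `CORES_PER_CCD` and `socket_cores` are illustrative only.

```python
# Assumption: 16 cores per dense compute chiplet (192 cores / 12 dies, per the quote).
CORES_PER_CCD = 16


def socket_cores(ccds: int, cores_per_ccd: int = CORES_PER_CCD) -> int:
    """Total cores in one socket for a given number of compute chiplets."""
    return ccds * cores_per_ccd


print(socket_cores(8))   # 128: Bergamo with eight chiplets
print(socket_cores(12))  # 192: the twelve-chiplet configuration discussed
```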
| metadat wrote:
| 192 cores per socket? If so, that's pretty wild.
___________________________________________________________________
(page generated 2024-06-22 23:00 UTC)