[HN Gopher] AMD Disables Zen 4's Loop Buffer
       ___________________________________________________________________
        
       AMD Disables Zen 4's Loop Buffer
        
       Author : luyu_wu
       Score  : 292 points
       Date   : 2024-11-30 20:47 UTC (1 days ago)
        
 (HTM) web link (chipsandcheese.com)
 (TXT) w3m dump (chipsandcheese.com)
        
       | syntaxing wrote:
        | Interesting read. One thing I don't understand is how much space
        | the loop buffer takes on the die. I'm curious whether, with it
        | removed, future chips could use the space for something more
        | useful like a bigger L2 cache?
        
         | progbits wrote:
         | It says 144 micro-op entries per core. Not sure how many bytes
         | that is, but L2 caches these days are around 1MB per core, so
         | assuming the loop buffer die space is mostly storage (sounds
         | like it) then it wouldn't make a notable difference.
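          | 
          | Rough back-of-envelope, with the per-entry width below being
          | purely a guess:
          | 
          |   /* Sizing sketch. ~16 bytes per micro-op entry is an
          |      assumption, not a published figure. */
          |   #include <stdio.h>
          | 
          |   int main(void) {
          |       int    entries       = 144;
          |       int    bytes_per_uop = 16;              /* guess */
          |       double buf_bytes     = entries * bytes_per_uop;
          |       double l2_bytes      = 1024.0 * 1024.0; /* 1MB L2 */
          |       printf("loop buffer ~%.1f KB, %.2f%% of a 1MB L2\n",
          |              buf_bytes / 1024.0,
          |              100.0 * buf_bytes / l2_bytes);
          |       return 0;
          |   }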
        
         | Remnant44 wrote:
         | My understanding is that it's a pretty small optimization on
         | the front end. It doesn't have a lot of entries to begin with
         | (144) so the amount of space saved is probably negligible.
         | Theoretically, the loop buffer would let you save power or
         | improve performance in a tight loop. In practice, it doesn't
         | seem to do either, and AMD removed it completely for Zen 5.
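          | 
          | For a sense of scale, a hypothetical example (not from the
          | article): the kind of loop that fits entirely in a 144-entry
          | micro-op queue is a short reduction whose body is only a
          | handful of micro-ops per iteration, e.g.:
          | 
          |   #include <stdint.h>
          |   #include <stddef.h>
          | 
          |   /* Tiny loop body: load, add, increment, compare, branch.
          |      Small enough that a loop buffer could replay it without
          |      re-fetching and re-decoding every iteration. */
          |   uint64_t sum(const uint64_t *a, size_t n) {
          |       uint64_t s = 0;
          |       for (size_t i = 0; i < n; i++)
          |           s += a[i];
          |       return s;
          |   }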
        
         | akira2501 wrote:
         | I think most modern chips are routing constrained and not
          | floorspace constrained. You can build tons of features, but
          | getting power and normalized signals to all of them is an
          | absolute chore.
        
         | atq2119 wrote:
         | Judging from the diagrams, the loop buffer is using the same
         | storage as the micro-op queue that's there anyway. If that is
         | accurate (and it does seem plausible), then the area cost is
         | just some additional control logic. I suspect the most
         | expensive part is detecting a loop in the first place, but
         | that's probably quite small compared to the size of the queue.
        
       | eqvinox wrote:
       | > Strangely, the game sees a 5% performance loss with the loop
       | buffer disabled when pinned to the non-VCache die. I have no
       | explanation for this, [...]
       | 
       | With more detailed power measurements, it could be possible to
       | determine if this is thermal/power budget related? It does sound
       | like the feature was intended to conserve power...
        
         | eek2121 wrote:
          | He didn't provide enough detail here. The second CCD on a Ryzen
          | chip is not as well binned as the first one, even on non-X3D
          | chips. Also, EVERY chip is different.
         | 
          | Most of the cores on CCD0 of my non-X3D chip hit 5.6-5.75 GHz.
          | CCD1 has cores topping out at 5.4-5.5 GHz.
          | 
          | V-Cache chips for Zen 4 have a huge clock penalty; however, the
          | cache more than makes up for it.
         | 
         | Did he test CCD1 on the same chip with both the feature
         | disabled and enabled? Did he attempt to isolate other changes
         | like security fixes as well? He admitted "no" in his article.
         | 
         | The only proper way to test would be to find a way to disable
          | the feature on a BIOS that has it enabled and test both
         | scenarios across the same chip, and even then the result may
         | still not be accurate due to other possible branch conditions.
         | A full performance profile could bring accuracy, but I suspect
         | only an AMD engineer could do that...
        
           | clamchowder wrote:
           | Yes, I tested on CCD1 (the non-vcache CCD) on both BIOS
           | versions.
        
         | ryao wrote:
         | He mentioned that it was disabled somewhere between the two
         | UEFI versions he tested. Presumably there are other changes
         | included, so his measurements are not strict A/B testing.
        
       | Pannoniae wrote:
       | From another article:
       | 
       | "Both the fetch+decode and op cache pipelines can be active at
       | the same time, and both feed into the in-order micro-op queue.
       | Zen 4 could use its micro-op queue as a loop buffer, but Zen 5
       | does not. I asked why the loop buffer was gone in Zen 5 in side
       | conversations. They quickly pointed out that the loop buffer
       | wasn't deleted. Rather, Zen 5's frontend was a new design and the
       | loop buffer never got added back. As to why, they said the loop
       | buffer was primarily a power optimization. It could help IPC in
       | some cases, but the primary goal was to let Zen 4 shut off much
       | of the frontend in small loops. Adding any feature has an
       | engineering cost, which has to be balanced against potential
       | benefits. Just as with having dual decode clusters service a
       | single thread, whether the loop buffer was worth engineer time
       | was apparently "no"."
        
       | londons_explore wrote:
       | The article seems to suggest that the loop buffer provides no
       | performance benefit and no power benefit.
       | 
       | If so, it might be a classic case of "Team of engineers spent
       | months working on new shiny feature which turned out to not
       | actually have any benefit, but was shipped anyway, possibly so
       | someone could save face".
       | 
       | I see this in software teams when someone suggests it's time to
       | rewrite the codebase to get rid of legacy bloat and increase
       | performance. Yet, when the project is done, there are more lines
       | of code and performance is worse.
       | 
       | In both cases, the project shouldn't have shipped.
        
         | adgjlsfhk1 wrote:
         | > but was shipped anyway, possibly so someone could save face
         | 
         | no. once the core has it and you realize it doesn't help much,
         | it absolutely is a risk to remove it.
        
           | glzone1 wrote:
            | No kidding. I was adjacent to a tape-out with some last
            | minute tweaks - ugh. The problem is that the current cycle
            | time is very slow and costly, and you spend as much time
            | validating things as you do designing. It's not programming.
        
             | hajile wrote:
             | If you work on a critical piece of software (especially one
             | you can't update later), you absolutely can spend way more
             | time validating than you do writing code.
             | 
             | The ease of pushing updates encourages lazy coding.
        
               | chefandy wrote:
               | > The ease of pushing updates encourages lazy coding.
               | 
               | Certainly in some cases, but in others, it just shifts
               | the economics: Obviously, fault tolerance can be
                | laborious and time-consuming, and that time and labor are
               | taken from something else. When the natures of your dev
               | and distribution pipelines render faults less disruptive,
               | and you have a good foundational codebase and code review
               | process that pay attention to security and core
               | stability, quickly creating 3 working features can be
               | much, much more valuable than making sure 1 working
               | feature will never ever generate a support ticket.
        
             | magicalhippo wrote:
              | Once interviewed at a place which made sensors that were
              | used a lot in the oil industry. Once you put a sensor on
              | the bottom of the ocean 100+ meters (300+ feet) down, it's
              | not getting serviced any time soon.
             | 
             | They showed me the facilities, and the vast majority was
             | taken up by testing and validation rigs. The sensors would
             | go through many stages, taking several weeks.
             | 
             | The final stage had an adjacent room with a viewing window
             | and a nice couch, so a representative for the client could
             | watch the final tests before bringing the sensors back.
             | 
             | Quite the opposite to the "just publish a patch" mentality
             | that's so prevalent these days.
        
             | oefrha wrote:
             | > It's not programming.
             | 
             | Even for software it's often risky to remove code once it's
             | in there. Lots of software products are shipped with tons
             | of unused code and assets because no one's got time to
             | validate nothing's gonna go wrong when you remove them.
              | Check out some game teardowns; they often have dead assets
             | from years ago, sometimes even completely unrelated things
             | from the studio's past projects.
             | 
             | Of course it's 100x worse for hardware projects.
        
               | gtirloni wrote:
               | And that's another reason for tackling technical debt
               | early on because once it compounds, no one is ever
               | touching that thing.
        
         | akira2501 wrote:
         | > but was shipped anyway, possibly so someone could save face
         | 
         | Was shipped anyway because it can be disabled with a firmware
         | update and because drastically altering physical hardware
         | layouts mid design was likely to have worse impacts.
        
           | readyplayernull wrote:
           | That bathroom with a door to the kitchen.
        
           | eek2121 wrote:
            | Well, that, and changing a chip can take years due to
            | redesigning, putting it through validation, RTM, and the
            | time to manufacture it.
           | 
           | Building chips is a multiyear process and most folks don't
           | understand this.
        
           | usrusr wrote:
           | What you describe would be shipped physically but disabled,
           | and that certainly happens a lot. For exactly those reasons.
           | What GP described was shipped not only physically present but
           | also not even disabled, because politics. That would be a
           | very different thing.
        
         | ksaj wrote:
         | "the project shouldn't have shipped."
         | 
          | Tell that to the shareholders. As a public company, they can
         | very quickly lose enormous amounts of money by being behind or
         | below on just about anything.
        
         | sweetjuly wrote:
         | The article also mentions they had trouble measuring power
         | usage in general so we can't necessarily (and, really,
         | shouldn't) conclude that it has no impact whatsoever. I highly
         | doubt that AMD's engineering teams are so unprincipled as to
         | allow people to add HW features with no value (why would you
         | dedicate area and power to a feature which doesn't do
         | anything?), and so I'm inclined to give them the benefit of the
         | doubt here and assume that Chips 'n Cheese simply couldn't
         | measure the impact.
        
           | clamchowder wrote:
           | Note - I saw the article through from start to finish. For
           | power measurements I modified my memory bandwidth test to
           | read AMD's core energy status MSR, and modified the
           | instruction bandwidth testing part to create a loop within
           | the test array. (https://github.com/clamchowder/Microbenchmar
           | ks/commit/6942ab...)
           | 
           | Remember most of the technical analysis on Chips and Cheese
           | is a one person effort, and I simply don't have infinite free
           | time or equipment to dig deeper into power. That's why I
           | wrote "Perhaps some more mainstream tech outlets will figure
           | out AMD disabled the loop buffer at some point, and do
           | testing that I personally lack the time and resources to
           | carry out."
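            | 
            | For anyone curious, a minimal standalone sketch of that kind
            | of measurement (not the linked code; it assumes the
            | documented RAPL-style MSRs 0xC0010299 for the energy unit,
            | 0xC001029A for per-core energy, the msr kernel module, and
            | root):
            | 
            |   #include <stdio.h>
            |   #include <stdint.h>
            |   #include <fcntl.h>
            |   #include <unistd.h>
            | 
            |   /* msr driver: the file offset selects the MSR index */
            |   static uint64_t rdmsr(int fd, uint32_t reg) {
            |       uint64_t v = 0;
            |       pread(fd, &v, sizeof v, reg);
            |       return v;
            |   }
            | 
            |   int main(void) {
            |       int fd = open("/dev/cpu/0/msr", O_RDONLY);
            |       if (fd < 0) { perror("msr"); return 1; }
            |       unsigned esu = (rdmsr(fd, 0xC0010299) >> 8) & 0x1F;
            |       double unit  = 1.0 / (double)(1u << esu); /* J/tick */
            |       uint32_t t0  = (uint32_t)rdmsr(fd, 0xC001029A);
            |       sleep(1);  /* run the loop under test here instead */
            |       uint32_t t1  = (uint32_t)rdmsr(fd, 0xC001029A);
            |       printf("core energy: %.3f J\n", (t1 - t0) * unit);
            |       close(fd);
            |       return 0;
            |   }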
        
           | kimixa wrote:
           | > engineering teams are so unprincipled as to allow people to
           | add HW features with no value
           | 
            | This is pretty common, as the performance
           | characteristics are often unknown until late in the hardware
           | design cycle - it would be "easy" if each cycle was just
           | changing that single unit with everything else static, but
           | that isn't the case as everything is changing around it. And
           | then by the time you've got everything together complete
           | enough to actually test end-to-end pipeline performance,
           | removing things is often the riskier choice.
           | 
           | And that's before you even get to the point of low-level
           | implementation/layout/node specific optimizations, which can
           | then again have somewhat unexpected results on frequency and
           | power metrics.
        
         | saagarjha wrote:
         | Only on Hacker News will you get CPU validation fanfiction.
        
         | EVa5I7bHFq9mnYK wrote:
         | >> when the project is done, there are more lines of code and
         | performance is worse
         | 
         | There is an added benefit though - that the new programmers now
         | are fluent in the code base. That benefit might be worth more
         | than LOCs or performance.
        
         | iforgotpassword wrote:
         | Well the other possibility is that the power benchmarks are
         | accurate: the buffer did save power, but then they figured out
          | an even better optimization at the microcode level that would
         | make the regular path save even more power, so the buffer
         | actually became a power hog.
        
         | 01100011 wrote:
          | Working at.. a very popular HW company.. I'll say that we (the
         | SW folks) are currently obsessed with 'doing something' even if
         | the thing we're doing hasn't fully been proven to have benefits
         | outside of some narrow use cases or targeted benchmarks. It's
         | very frustrating, but no one wants to put the time in to do the
         | research up front. It's easier to just move forward with a new
         | project because upper management stays happy and doesn't ask
         | questions.
        
           | usrusr wrote:
           | Is it that expectation of major updates coming in at a fixed
           | cycle? Not only expected by upper management but also by end
           | users? That's a difficult trap to get out of.
           | 
           | I wonder if that will be the key benefit of Google's switch
           | to two "major" Android releases each year: it will get people
           | used to nothing newsworthy happening within a version
           | increment. And I also wonder if that's intentional, and my
           | guess is not the tiniest bit.
        
           | markus_zhang wrote:
           | Do you have new software managers/directors who are
           | encouraging such behavior? From my experience new leaders
            | tend to lean on this tactic to grab power.
        
         | weinzierl wrote:
         | _" The article seems to suggest that the loop buffer provides
         | no performance benefit and no power benefit."_
         | 
         | It tests the performance benefit hypothesis in different
         | scenarios and does not find evidence that supports it. It makes
         | _one_ best effort attempt to test the power benefit hypothesis
         | and concludes it with: _" Results make no sense."_
         | 
         | I think the real take-away is that performance measurements
          | without considering power tell only half the story. We have
          | come a long way on the performance measurement half, but power
          | measurement is still hard. We should work on that.
        
         | firebot wrote:
         | The article clearly articulates that there's no performance
          | benefit. However, there's an efficiency angle: it reduces
          | power consumption.
        
         | hinkley wrote:
          | Someone elsewhere quotes a game-specific benchmark of about
         | 15%. Which will mostly matter when your FPS starts to make game
         | play difficult.
         | 
         | There will be a certain number of people who will delay an
         | upgrade a bit more because the new machines don't have enough
         | extra oomph to warrant it. Little's Law can apply to finance
          | when it's the interval between purchases.
        
       | londons_explore wrote:
       | In the "power" section, it seems the analysis doesn't divide by
       | the number of instructions executed per second.
       | 
       | Energy used per instruction is almost certainly the metric that
       | should be considered to see the benefits of this loop buffer, not
       | energy used per second (power, watts).
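        | 
        | In other words, something like this (made-up numbers, just to
        | show the normalization; instructions retired would come from a
        | perf counter and joules from the energy MSR):
        | 
        |   #include <stdio.h>
        | 
        |   int main(void) {
        |       double joules       = 12.5;   /* energy over the run */
        |       double seconds      = 1.0;
        |       double instructions = 30e9;   /* retired instructions */
        |       printf("power : %.2f W\n", joules / seconds);
        |       printf("energy: %.3f nJ/insn\n",
        |              1e9 * joules / instructions);
        |       return 0;
        |   }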
        
         | eek2121 wrote:
          | Every instruction takes a different number of clock cycles (and
          | this varies between architectures or iterations of an
          | architecture, such as Zen 4 vs. Zen 5), so that is not feasible
          | unless running the workload produced the exact same
          | instructions per cycle, which is impossible due to
          | multithreading/multitasking. Even the order and contents of RAM
          | matter, since both can change everything.
         | 
         | While you can somewhat isolate for this by doing hundreds of
         | runs for both on and off, that takes tons of time and still
         | won't be 100% accurate.
         | 
         | Even disabling the feature can cause the code to use a
         | different branch which may shift everything around.
         | 
         | I am not specifically familiar with this issue, but I have seen
         | cases where disabling a feature shifted the load from integer
         | units to the FPU or the GPU as an example, or added 2
         | additional instructions while taking away 5.
        
       | rasz wrote:
        | Anecdotally, one of the very few differences between the 1979
        | 68000 and the 1982 68010 was the addition of "loop mode", a
        | 6-byte loop buffer :)
        
         | crest wrote:
          | Much more importantly, they fixed the MMU support. The original
          | 68000 lost some state required to recover from a page fault;
          | the workaround was ugly and expensive: run two CPUs "time
          | shifted" by one cycle and inject a recoverable interrupt on the
          | second CPU. Apparently it was still cheaper than the
          | alternatives at the time if you wanted a CPU with an MMU, a
          | 32-bit ISA and a 24-bit address bus. Must have been a wild
          | time.
        
           | phire wrote:
           | _> run two CPUs  "time shifted" by one cycle and inject a
           | recoverable interrupt on the second CPU._
           | 
           | That's not quite how it was implemented.
           | 
            | Instead, the second 68000 was halted and disconnected from
            | the bus until the first 68000 (the executor) triggered a
            | fault.
           | Then the first 68000 would be held in halt, disconnected from
           | the bus and the second 68000 (the fixer) would take over the
           | bus to run the fault handler code.
           | 
           | After the fault had been handled, the first 68000 could be
           | released from halt and it would resume execution of the
           | instruction, with all state intact.
           | 
           | As for the cost of a second 68000, extra logic and larger
            | PCBs? Well, the cost of the Motorola 68451 MMU (or
            | equivalent) absolutely dwarfed the cost of everything else,
            | so adding a second CPU really wasn't a big deal.
           | 
           | Technically it didn't need to be another 68000, any CPU would
           | do. But it's simpler to use a single ISA.
           | 
            | For more details, see Motorola's application note here:
            | http://marc.retronik.fr/motorola/68K/68000/Application%20Not...
        
         | Dylan16807 wrote:
         | That's neat. For small loop buffers, I quite like the
         | GreenArrays forth core. It has 18 bit words that hold 4
         | instructions each, and one of the opcodes decrements a loop
         | counter and goes back to the start of the word. And it can run
         | appreciably faster while it's doing that.
        
         | ack_complete wrote:
         | The loop buffer on the 68010 was almost useless, because not
         | only was it only 6 bytes, it only held two instructions. One
         | had to be the loop instruction (DBcc), so the loop body had to
         | be a single instruction. Pretty much the only thing it could
         | speed up in practice was an unoptimized memcpy.
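          | 
          | For illustration (hand-waving the codegen a bit): about the
          | only loop shape that qualifies is a one-instruction body plus
          | DBcc, which is roughly what a naive byte copy compiles to:
          | 
          |   /* On a 68010 this inner loop reduces to roughly
          |      "move.b (a0)+,(a1)+" / "dbra d0,loop" - the single
          |      two-instruction pattern the 6-byte loop mode can hold. */
          |   void copy_bytes(char *dst, const char *src, unsigned n) {
          |       while (n--)
          |           *dst++ = *src++;
          |   }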
        
       | eek2121 wrote:
       | It sounds to me like it was too small to make any real difference
       | except in very specific scenarios and a larger one would have
       | been too expensive to implement compared to the benefit.
       | 
       | That being said, some workloads will see a small regression,
       | however AMD has made some small performance improvements since
       | launch.
       | 
       | They should have just made it a BIOS option for Zen 4. The fact
       | they do not appear to have done so does indicate the possibility
       | of a bug or security issue.
        
         | crest wrote:
          | Them *quietly* disabling a feature that few users will notice,
          | yet which complicates the frontend, suggests they pulled this
          | chicken bit because they wanted to avoid or delay disclosing a
          | hardware bug to the general public while already pushing the
          | mitigation. Fucking vendors! Will they ever learn? _sigh_
        
           | whaleofatw2022 wrote:
            | Devil's advocate... if this is being actively exploited or is
           | easily exploitable, the delay in announcement can prevent
           | other actions.
        
           | dannyw wrote:
           | Every modern CPU has dozens of hardware bugs that aren't
           | disclosed and quietly patched away or not mentioned.
        
           | BartjeD wrote:
            | Quietly disabling it is also a big risk, because you're
            | signalling that in all probability you were aware of the
            | severity of the issue; enough so that you took steps to patch
            | it.
            | 
            | If you don't disclose the vulnerability, then affected
            | parties cannot start taking countermeasures, except out of
            | sheer paranoia.
            | 
            | Disclosing a vulnerability is a way to shift liability onto
            | the end user. You didn't update? Then don't complain. Only
            | rarely do disclosures lead to product liability. I don't
            | remember this (liability) happening with Meltdown and Spectre
            | either. So I wouldn't assume this is AMD being secretive.
        
       | shantara wrote:
       | This is a wild guess, but could this feature be disabled in an
       | attempt at preventing some publicly undisclosed hardware
       | vulnerability?
        
         | throw_away_x1y2 wrote:
         | Bingo.
         | 
         | I can't say more. :(
        
           | pdimitar wrote:
           | Have we learned nothing from Spectre and Meltdown?... :(
        
             | gpderetta wrote:
             | Complex systems are complex?
        
               | pdimitar wrote:
               | Sadly you're right. And obviously we're not about to give
               | up on high IPC. I get it and I'm not judging -- it's just
               | a bit saddening.
        
             | StressedDev wrote:
             | A lot has been learned. Unfortunately, people still make
             | mistakes and hardware will continue to have security
             | vulnerabilities.
        
             | sweetjuly wrote:
             | I imagine this is more of a functional issue. i.e., the
             | loop buffer caused corruption of the instruction stream
             | under some weird specific circumstances. Spectre and
             | Meltdown are not functional issues but rather just side
             | channel issues.
             | 
             | This should be fun, however, for someone with enough time
             | to chase down and try and find the bug. Depending on the
             | consequences of the bug and the conditions under which it
             | hits, maybe you could even write an exploit (either going
             | from JavaScript to the browser or from user mode to the
             | kernel) with it :) Though, I strongly suspect that reverse
             | engineering and weaponizing the bug without any insider
             | knowledge will be exceedingly difficult. And, anyways,
             | there's also a decent chance this issue just leads to a
             | hang/livelock/MCE which would make it pointless to exploit.
        
             | rincebrain wrote:
             | The problem is that we're more or less stuck with this
             | class of problem unless we end up with something that looks
             | like a Xeon Phi without shared resources and run
              | calculations on many, many truly independent cores, or we
              | accept that worst-case and best-case performance are
              | identical (which I don't foresee anyone really agreeing
             | to).
             | 
             | Or, framed differently, if Intel or AMD announced a new
             | gamer CPU tomorrow that was 3x faster in most games but
             | utterly unsafe against all Meltdown/Spectre-class vulns,
             | how fast do you think they'd sell out?
        
               | thechao wrote:
                | Larrabee was fun to program, but I think it'd have an even
               | worse time hardening memory sideband effects: the barrel
               | processor (which was necessary to have anything like
               | reasonable performance) was humorously easy to use for
               | cross-process exfiltration. Like... it was so easy, we
               | actually used it as an IPC mechanism.
        
               | wheybags wrote:
               | > it was so easy, we actually used it as an IPC
               | mechanism.
               | 
               | Can you elaborate on that? It sounds interesting
        
               | thechao wrote:
               | Now you're asking me technical details from more than a
               | decade ago. My recollection is that you could map one of
               | the caches between cores -- there were uncached-write-
               | through instructions. By reverse engineering the cache's
               | hash, you could write to a specific cache-line; the uc-
               | write would push it up into the correct line and the
               | "other core" could snoop that line from its side with a
               | lazy read-and-clear. The whole thing was janky-AF, but
               | way the hell faster than sending a message around the
               | ring. (My recollection was that the three interlocking
               | rings could make the longest-range message take hundreds
               | of cycles.)
        
               | rincebrain wrote:
               | Sure, absolutely, there's large numbers of additional
               | classes of side effects you would need to harden against
               | if you wanted to eliminate everything, I was mostly
               | thinking specifically of something with an enormous
               | number of cores without the 4-way SMT as a high-level
               | description.
               | 
               | I was always morbidly curious about programming those,
               | but never to the point of actually buying one, and I
                | always had more things to do in the day than time, back
                | in a past life when we had a few of the cards in my
                | office.
        
               | magicalhippo wrote:
               | We already have heterogeneous cores these days, with E
               | and P, and we have a ton of them as they take little
               | space on the die relative to cache. The solution, it
               | seems to me, is to have most cores go brrrrrr and a few
               | that are secure.
               | 
               | Given that we have effectively two browser platforms
               | (Chromium and Firefox) and two operating systems to
               | contend with (Linux and Windows), it seems entirely
               | tractable to get the security sensitive threads scheduled
               | to the "S cores".
        
               | astrange wrote:
               | That's a secure enclave aka secure element aka TPM. Once
               | you start wanting security you usually think up enough
               | other features (voltage glitching prevention, memory
               | encryption) that it's worth moving it off the CPU.
        
               | Dylan16807 wrote:
               | That's a wildly different type of security. I just want
               | to sandbox some code, not treat the entire world as
               | hostile.
        
               | ggu7hgfk8j wrote:
               | The main security boundary a modern computer upholds is
               | web vs everything else, including protecting one webpage
               | from another.
               | 
                | So I think it's the JavaScript that should run on these
                | hypothetical cores.
               | 
               | Though perhaps a few other operations might choose to use
               | them as well.
        
               | nine_k wrote:
               | Also all the TLS, SSH, Wireguard and other encryption,
               | anything with long-persisted secret information.
               | Everything else, even secret (like displayed OTP codes)
               | is likely too fleeting for a snooping attack to be able
               | to find and exfiltrate it, even if an exfiltration
               | channel remains. Until a better exfiltration method is
               | found, of course :-(
               | 
               | I think we're headed towards the future of many highly
               | insulated computing nodes that share little if anything.
               | Maybe they'd have a faster way to communicate, e.g. by
               | remapping fast cache-like memory between cores, but that
               | memory would never be uncontrollably shared the way cache
               | lines are now.
        
               | lukan wrote:
               | "if Intel or AMD announced a new gamer CPU tomorrow that
               | was 3x faster in most games but utterly unsafe against
               | all Meltdown/Spectre-class vulns, how fast do you think
               | they'd sell out"
               | 
                | Well, many people have gaming computers they won't use
               | for anything serious. So I would also buy it. And in
               | restricted gaming consoles, I suppose the risk is not too
               | high?
        
               | sim7c00 wrote:
               | you mean those consoles that can attack the rest of your
                | devices and your neighbours via their wireless chips?
        
               | ggu7hgfk8j wrote:
                | Speculation attacks enable code running on the machine
               | to access data it shouldn't. I don't see how that relates
               | to your scenario.
        
               | formerly_proven wrote:
               | Consoles are hardened very well to prevent homebrew,
               | cracking, cheating etc.
        
               | dcow wrote:
               | But this class of vuln is about data leaking between
               | users in a multi-user system.
        
               | wongarsu wrote:
               | Isn't it rather about data leaks between any two
               | processes? Whether those two processes belong to
               | different users is a detail of the threat model and the
               | OS's security model. In a console it could well be about
               | data leaks between a game with code-injection
               | vulnerability and the OS or DRM system.
        
               | alexvitkov wrote:
               | They're a pain in the ass all around. Spectre allowed you
               | to read everything paged in (including kernel memory)
               | from JS in the browser.
               | 
                | To mitigate it, browsers did a bunch of hacks, including
                | nerfing precision on all timer APIs and disabling shared
                | memory, because you need an accurate timer for the
                | exploit - to this day performance.now() rounds to 1 ms
                | on Firefox and 0.1 ms on Chrome.
                | 
                | Funnily enough, this 1 ms rounding is a headache for me
                | right as we speak. On, say, a 240 Hz monitor, video games
                | need to render a frame every ~4.16 ms -- 1 ms precision
                | is not enough for an accurate ticker -- so even if you
                | render your frames on time, the result can't be perfectly
                | smooth, as the browser doesn't give an accurate enough
                | timer by which to advance your physics every frame.
        
               | izacus wrote:
               | Also, many games today outright install rootkits to
               | monitor your memory (see [1]) - some heartbleed is so far
               | down the line of credible threats on a gaming machine
               | that its outright ludicrous to trade off performance for
               | it.
               | 
               | [1]:https://www.club386.com/assassins-creed-shadows-drm-
               | wants-to...
        
               | nottorp wrote:
               | > how fast do you think they'd sell out?
               | 
               | 10-20 min, depending on how many they make :)
        
               | jorvi wrote:
               | Also, a good chunk of these vulnerabilities (Retbleed,
               | Downfall, Rowhammer, there's probably a few I'm
               | forgetting) are either theoretical, lab-only or spear
               | exploits that require a lot of setup. And then the
               | leaking info from something like Retbleed mostly applies
               | to shared machines like in cloud infrastructure.
               | 
               | Which makes it kind of terrible that the kernel has these
               | mitigations turned on by default, stealing somewhere in
               | the neighborhood of 20-60% of performance on older gen
               | hardware, just because the kernel has to roll with "one
               | size fits all" defaults.
        
               | nine_k wrote:
               | If you know what you're doing, you do something like
               | this: https://gist.github.com/jfeilbach/f06bb8408626383a0
               | 83f68276f... and make Linux fast again (c).
               | 
               | If you don't know what kernel parameters are and what do
               | they affect, it's likely safer to go with all the
               | mitigations enabled by default :-|
        
               | jorvi wrote:
               | Yeah, I personally have Retbleed and Downfall mitigations
               | disabled, the rest thankfully doesn't severely affect my
               | CPU performance.
               | 
               | Appreciate sharing the gist though!
        
               | DSingularity wrote:
               | I don't think you are thinking of this right. One bit of
               | leakage makes it half as hard to break encryption via
               | brute force. It's a serious problem. The defaults are
               | justified.
               | 
                | I think things will only shift once we have systems that
                | ship with full sandboxes that are minimally optimized
               | and fully isolated. Until then we are forced to assume
               | the worst.
        
               | jorvi wrote:
               | > I don't think you are thinking of this right. One bit
               | of leakage makes it half as hard to break encryption via
               | brute force.
               | 
               | The problem is that you need to execute on the system,
               | then need to know which application you're targeting,
               | then figure out the timings, and even then you're not
               | certain you are getting the bits you want.
               | 
               | Enabling mitigations For servers? Sure. Cloud servers?
               | Definitely. High profile targets? Go for it.
               | 
                | The current defaults are like foisting iOS's "Lockdown
               | Mode" on all users by default and then expecting them to
               | figure out how to turn it off, except you have to do it
               | by connecting it to your Mac/PC and punching in a bunch
               | of terminal commands.
               | 
               | Then again, almost all kernel settings are server-optimal
                | (and even then, 90s server optimal). There honestly
                | should be some serious effort to modernize the
               | defaults for reasonably modern servers, and then also
               | have a separate kernel for desktops (akin to CachyOS,
               | just more upstream).
        
               | int0x29 wrote:
               | Itanium allegedly was free from branch prediction issues
               | but I suspect cache behavior still might have been an
               | issue. Unfortunately it's also dead as a doornail.
        
             | Am4TIfIsER0ppos wrote:
             | We learned that processor manufacturers love "bugs" that
              | get solved by making them or your code slower, giving you
             | incentive to buy a newer one to restore the performance.
        
               | shepherdjerred wrote:
               | This seems unnecessarily cynical. Are you saying
               | Intel/AMD are intentionally crippling CPUs?
        
               | bobmcnamara wrote:
               | I'm not saying Intel intentionally limited CPUs, just
               | that they have intentionally limited a lot of things and
               | lied about it in the past.
               | 
               | https://www.ftc.gov/news-events/news/press-
               | releases/2010/08/...
        
             | tedunangst wrote:
             | I was told the lesson is to avoid Intel and only buy AMD
             | because they don't make mistakes.
        
               | UberFly wrote:
               | No one said to buy AMD because they don't make mistakes.
               | AMD just currently makes a better product overall.
        
               | Dylan16807 wrote:
               | I do not think you are accurately recounting what people
               | said.
        
             | PittleyDunkin wrote:
             | I'm still not convinced most of the computers in my home
             | need to care about leaking data this way. I'm open to being
             | persuaded, though.
        
               | pdimitar wrote:
               | I am not convinced either but I am willing to bet some
               | software is adversarial and will try to exfiltrate data.
               | F.ex. many people look suspiciously at Zoom and Chrome.
               | 
               | So as long as stuff is not perfectly isolated from each
                | other, there's always room for a bad actor to snoop
               | on stuff.
        
               | api wrote:
               | For most of these vulnerabilities the risk is low, but
               | keep in mind that your web browser runs random untrusted
               | code from all over the Internet in a VM with a JIT
               | compiler. This means you can't rule out the possibility
               | that someone will figure out a way to exploit this over
               | the web reliably, which would be catastrophic.
               | 
               | "Attacks only get better."
        
             | RobotToaster wrote:
              | We should have learnt from the FDIV bug[0] that processor
             | manufacturers need to be mandated to recall faulty
             | hardware.
             | 
             | [0] https://en.wikipedia.org/wiki/Pentium_FDIV_bug
        
               | nine_k wrote:
               | It depends on the severity of the problem, and the impact
               | on the _customers_ already using these systems. It may be
               | more economical for the customer to apply a patch and
               | lose a few percent of peak performance than to put
               | thousands of boxes offline and schedule personnel to swap
               | CPUs. This is to say nothing of the hassle of bringing
               | your new laptop to a service center, and taking a
               | replacement, or waiting if your exact configuration is
               | unavailable at the moment.
        
             | aseipp wrote:
             | This might come as a shock, but I can assure you that the
              | people designing high-end microprocessors have probably
              | forgotten more about these topics than most of the people
              | here have ever known.
        
               | pdimitar wrote:
               | Huh?
        
         | bell-cot wrote:
         | The Article more-or-less speculates that:
         | 
         | > Zen 4 is AMD's first attempt at putting a loop buffer into a
         | high performance CPU. Validation is always difficult,
         | especially when implementing a feature for the first time. It's
         | not crazy to imagine that AMD internally discovered a bug that
         | no one else hit, and decided to turn off the loop buffer out of
         | an abundance of caution. I can't think of any other reason AMD
         | would mess with Zen 4's frontend this far into the core's
         | lifecycle.
        
         | bhouston wrote:
         | Yeah, my first thoughts too.
        
         | BartjeD wrote:
          | Quietly disabling it is also a big risk, because you're
          | signalling that in all probability you were aware of the
          | severity of the issue; enough so that you took steps to patch
          | it.
          | 
          | If you don't disclose the vulnerability, then affected parties
          | cannot start taking countermeasures, except out of sheer
          | paranoia.
          | 
          | Disclosing a vulnerability is a way to shift liability onto the
          | end user. You didn't update? Then don't complain. Only rarely
          | do disclosures lead to product liability. I don't remember this
          | (liability) happening with Meltdown and Spectre either. So I
          | wouldn't assume this is AMD being secretive.
        
           | wtallis wrote:
           | Please don't post duplicate comments like this. Your first
           | comment (https://news.ycombinator.com/item?id=42287118) was
           | fine but spamming a thread with copy-and-pasted comments just
           | hurts the signal to noise ratio.
        
             | alexnewman wrote:
                | I'm confused - would you be ok if he addressed the same
                | point in the forum with a slightly different sentence?
        
               | wtallis wrote:
               | Any threaded discussion carries the risk of different
               | subthreads ending up in the same place. The simplest
                | solution is to just _not post twice_, and trust that the
               | reader can read the rest of the thread; HN threads
               | usually don't get long enough for good comments to get
               | too buried, and not duplicating comments helps avoid that
               | problem. If there's something slightly different, it may
               | be worth linking to another comment in a different
               | subthread and adding a few sentences to cover the
               | differences. Copying a whole comment is never a good
               | answer, and re-wording it to obscure the fact that it's
               | not saying anything new is also bad. New comments should
               | have something new to say.
        
               | lukan wrote:
                | Posting a link to the other reply is a solution as well.
        
               | hinkley wrote:
               | I would get confused handling follow-ups to both copies.
               | 
               | I have enough trouble if someone responds to my responses
               | in a tone similar to GP and I end up treating them like
                | the same person (eg, GP makes a jab and now I'm snarky or
               | call out the wrong person). Especially if I have to step
               | away to deal with life.
        
             | hobobaggins wrote:
             | And, just like that, you turned the rest of this thread
             | into a meta discussion about HN rather than about the
             | topic. It's ironic, because that really hurt the SNR more
             | than a duplicated, but on-topic, comment.
        
           | immibis wrote:
           | The countermeasure is to disable the loop buffer. Everyone
           | who wants to protect themselves from the unknown
           | vulnerability should disable the loop buffer. Once everyone's
           | done that or had a reasonable opportunity to do that, it can
           | be safely published.
        
             | jdiff wrote:
             | There's no real impetus except paranoia if the change is
             | unannounced. You don't have to detail the vulnerability,
             | just inform people that somewhere, one exists, and that
             | this _is_ in fact a countermeasure. Without doing that, you
              | don't shift liability, you don't actually get people out
             | of harm's way, you don't really benefit at all.
        
         | baq wrote:
         | Indeed, it might be the case that there's more than that
          | disabled, since the numbers are somewhat surprising:
         | 
         | > Still, the Cyberpunk 2077 data bothers me. Performance
         | counters also indicate higher average IPC with the loop buffer
         | enabled when the game is running on the VCache die.
         | Specifically, it averages 1.25 IPC with the loop buffer on, and
         | 1.07 IPC with the loop buffer disabled. And, there is a tiny
         | performance dip on the new BIOS.
         | 
         | Smells of microcode mitigations if you ask me, but naturally
         | let's wait for the CVE.
        
           | hinkley wrote:
           | Or another logic bug. We haven't had a really juicy one in a
           | while.
        
       | ksec wrote:
        | Wondering if the loop buffer is still there in Zen 5?
        | 
        | (Idly waiting for x86 to try and compete with ARM on efficiency.
        | Unfortunately I don't see Zen 6 or Panther Lake getting close.)
        
         | monocasa wrote:
         | It is not.
        
       | CalChris wrote:
        | If it saved power, wouldn't that lead to less thermal throttling
        | and thus improved performance? That power had to matter, or the
        | feature wouldn't have been worth adding in the first place.
        
         | kllrnohj wrote:
          | Not necessarily. Let's say this optimization can save 0.1 W in
          | certain situations. If one of those situations is common when
          | the chip is idle, just keeping WiFi alive, well hey, that's
          | 0.1 W in a ~1 W total-draw scenario - that's 10%, that's huge!
          | 
          | But when the CPU is pulling 100 W under load? Well, now we're
          | talking an amount so small it's irrelevant. Maybe with a well-
          | calibrated scope you could figure out if it was on or not.
          | 
          | Since this is in the micro-op queue in the front end, it's
          | going to be more about that very low total power draw side of
          | things where this comes into play. So this would have been
          | something they were doing to see if it helped for the laptop
          | SKUs, not for the desktop ones.
        
           | Out_of_Characte wrote:
            | You're probably right on the mark with this. Though even
            | desktops and servers can benefit from lower idle power draw.
            | So there is a chance that it might have been moved to a
            | different C-state.
        
       | mleonhard wrote:
       | It looks like they disabled a feature flag. I didn't expect to
       | see such things in CPUs.
        
         | astrange wrote:
         | They have lots of them (called "chicken bits"). Some of them
         | have BIOS flags, some don't.
         | 
         | It's very very expensive to fix a bug in a CPU, so it's easier
         | to expose control flags or microcode so you can patch it out.
        
       | fulafel wrote:
       | Interesting that in the Cortex-A15 this is a "key design
       | feature". Are there any numbers about its effect other chips?
       | 
       | I guess this could also be used as an optimization target at
       | least on devices that are more long lived designs (eg consoles).
        
         | nwallin wrote:
         | I'm curious about this too. I would expect any RISC
         | architecture to gain relatively little from a loop buffer. The
         | point of RISC is that instruction fetch/decode is substantially
         | easier, if not trivial.
        
       | Loic wrote:
       | For me the most interesting paragraph in the article is:
       | 
       | > Perhaps the best way of looking at Zen 4's loop buffer is that
       | it signals the company has engineering bandwidth to go try
       | things. Maybe it didn't go anywhere this time. But letting
       | engineers experiment with a low risk, low impact feature is a
       | great way to build confidence. I look forward to seeing more of
       | that confidence in the future.
        
       | Neywiny wrote:
       | I have a 7950x3d. It's my upgrade from.... Skylake's 6700k. I
       | guess I'm subconsciously drawn to chips with hardware loop
       | buffers disabled by software.
        
       ___________________________________________________________________
       (page generated 2024-12-01 23:00 UTC)