[HN Gopher] AMD Disables Zen 4's Loop Buffer
___________________________________________________________________
AMD Disables Zen 4's Loop Buffer
Author : luyu_wu
Score : 292 points
Date : 2024-11-30 20:47 UTC (1 day ago)
(HTM) web link (chipsandcheese.com)
(TXT) w3m dump (chipsandcheese.com)
| syntaxing wrote:
| Interesting read. One thing I don't understand is how much
| space the loop buffer takes on the die. I'm curious whether,
| with it removed, future chips could use the space for
| something more useful, like a bigger L2 cache.
| progbits wrote:
| It says 144 micro-op entries per core. Not sure how many bytes
| that is, but L2 caches these days are around 1MB per core, so
| assuming the loop buffer die space is mostly storage (sounds
| like it) then it wouldn't make a notable difference.
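A rough back-of-the-envelope check of this point (the bytes-per-entry figure is an assumption, since AMD doesn't publish the micro-op entry width; the 144-entry count is from the article):

```python
# Rough sizing of Zen 4's loop buffer storage vs. a per-core L2 cache.
UOP_ENTRIES = 144        # loop buffer entries, per the article
BYTES_PER_UOP = 16       # assumption: a generous guess at entry width
L2_BYTES = 1024 * 1024   # ~1 MB of L2 per Zen 4 core

buffer_bytes = UOP_ENTRIES * BYTES_PER_UOP
fraction_of_l2 = buffer_bytes / L2_BYTES

print(buffer_bytes)              # 2304 bytes
print(f"{fraction_of_l2:.2%}")   # 0.22% of the L2's data storage
```

Even with a generous per-entry estimate, the buffer's storage is a fraction of a percent of the L2, consistent with the point above.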
| Remnant44 wrote:
| My understanding is that it's a pretty small optimization on
| the front end. It doesn't have a lot of entries to begin with
| (144) so the amount of space saved is probably negligible.
| Theoretically, the loop buffer would let you save power or
| improve performance in a tight loop. In practice, it doesn't
| seem to do either, and AMD removed it completely for Zen 5.
| akira2501 wrote:
| I think most modern chips are routing-constrained rather than
| floorspace-constrained. You can build tons of features, but
| getting power and clean, normalized signals to all of them is
| an absolute chore.
| atq2119 wrote:
| Judging from the diagrams, the loop buffer is using the same
| storage as the micro-op queue that's there anyway. If that is
| accurate (and it does seem plausible), then the area cost is
| just some additional control logic. I suspect the most
| expensive part is detecting a loop in the first place, but
| that's probably quite small compared to the size of the queue.
| eqvinox wrote:
| > Strangely, the game sees a 5% performance loss with the loop
| buffer disabled when pinned to the non-VCache die. I have no
| explanation for this, [...]
|
| With more detailed power measurements, it might be possible
| to determine whether this is thermal/power-budget related. It
| does sound like the feature was intended to conserve power...
| eek2121 wrote:
| He didn't provide enough detail here. The second CCD on a
| Ryzen chip is not as well binned as the first one, even on
| non-X3D chips. Also, EVERY chip is different.
|
| Most of the cores on CCD0 of my non-X3D chip hit 5.6-5.75
| GHz. CCD1 has cores topping out at 5.4-5.5 GHz.
|
| V-Cache chips for Zen 4 have a huge clock penalty; however,
| the cache more than makes up for it.
|
| Did he test CCD1 on the same chip with both the feature
| disabled and enabled? Did he attempt to isolate other changes
| like security fixes as well? He admitted "no" in his article.
|
| The only proper way to test would be to find a way to disable
| the feature on a bios that has it enabled and test both
| scenarios across the same chip, and even then the result may
| still not be accurate due to other possible branch conditions.
| A full performance profile could bring accuracy, but I suspect
| only an AMD engineer could do that...
| clamchowder wrote:
| Yes, I tested on CCD1 (the non-vcache CCD) on both BIOS
| versions.
| ryao wrote:
| He mentioned that it was disabled somewhere between the two
| UEFI versions he tested. Presumably there are other changes
| included, so his measurements are not strict A/B testing.
| Pannoniae wrote:
| From another article:
|
| "Both the fetch+decode and op cache pipelines can be active at
| the same time, and both feed into the in-order micro-op queue.
| Zen 4 could use its micro-op queue as a loop buffer, but Zen 5
| does not. I asked why the loop buffer was gone in Zen 5 in side
| conversations. They quickly pointed out that the loop buffer
| wasn't deleted. Rather, Zen 5's frontend was a new design and the
| loop buffer never got added back. As to why, they said the loop
| buffer was primarily a power optimization. It could help IPC in
| some cases, but the primary goal was to let Zen 4 shut off much
| of the frontend in small loops. Adding any feature has an
| engineering cost, which has to be balanced against potential
| benefits. Just as with having dual decode clusters service a
| single thread, whether the loop buffer was worth engineer time
| was apparently "no"."
| londons_explore wrote:
| The article seems to suggest that the loop buffer provides no
| performance benefit and no power benefit.
|
| If so, it might be a classic case of "Team of engineers spent
| months working on new shiny feature which turned out to not
| actually have any benefit, but was shipped anyway, possibly so
| someone could save face".
|
| I see this in software teams when someone suggests it's time to
| rewrite the codebase to get rid of legacy bloat and increase
| performance. Yet, when the project is done, there are more lines
| of code and performance is worse.
|
| In both cases, the project shouldn't have shipped.
| adgjlsfhk1 wrote:
| > but was shipped anyway, possibly so someone could save face
|
| no. once the core has it and you realize it doesn't help much,
| it absolutely is a risk to remove it.
| glzone1 wrote:
| No kidding. I was adjacent to a tape-out with some last-minute
| tweaks - ugh. The problem is that the current cycle time is
| very slow and costly, and you spend as much time validating
| things as you do designing. It's not programming.
| hajile wrote:
| If you work on a critical piece of software (especially one
| you can't update later), you absolutely can spend way more
| time validating than you do writing code.
|
| The ease of pushing updates encourages lazy coding.
| chefandy wrote:
| > The ease of pushing updates encourages lazy coding.
|
| Certainly in some cases, but in others, it just shifts
| the economics: Obviously, fault tolerance can be
| laborious and time consuming, and that time and labor is
| taken from something else. When the natures of your dev
| and distribution pipelines render faults less disruptive,
| and you have a good foundational codebase and code review
| process that pay attention to security and core
| stability, quickly creating 3 working features can be
| much, much more valuable than making sure 1 working
| feature will never ever generate a support ticket.
| magicalhippo wrote:
| Once interviewed at a place which made sensors that were used
| a lot in the oil industry. Once you put a sensor on the
| bottom of the ocean 100+ meters (300+ feet) down, they're not
| getting serviced any time soon.
|
| They showed me the facilities, and the vast majority was
| taken up by testing and validation rigs. The sensors would
| go through many stages, taking several weeks.
|
| The final stage had an adjacent room with a viewing window
| and a nice couch, so a representative for the client could
| watch the final tests before bringing the sensors back.
|
| Quite the opposite to the "just publish a patch" mentality
| that's so prevalent these days.
| oefrha wrote:
| > It's not programming.
|
| Even for software it's often risky to remove code once it's
| in there. Lots of software products are shipped with tons
| of unused code and assets because no one's got time to
| validate nothing's gonna go wrong when you remove them.
| Check out some game teardowns, they often have dead assets
| from years ago, sometimes even completely unrelated things
| from the studio's past projects.
|
| Of course it's 100x worse for hardware projects.
| gtirloni wrote:
| And that's another reason for tackling technical debt
| early on because once it compounds, no one is ever
| touching that thing.
| akira2501 wrote:
| > but was shipped anyway, possibly so someone could save face
|
| Was shipped anyway because it can be disabled with a firmware
| update and because drastically altering physical hardware
| layouts mid design was likely to have worse impacts.
| readyplayernull wrote:
| That bathroom with a door to the kitchen.
| eek2121 wrote:
| Well that and changing a chip can take years due to
| redesigning, putting through validation, RTM, and time to
| create.
|
| Building chips is a multiyear process and most folks don't
| understand this.
| usrusr wrote:
| What you describe would be shipped physically but disabled,
| and that certainly happens a lot. For exactly those reasons.
| What GP described was shipped not only physically present but
| also not even disabled, because politics. That would be a
| very different thing.
| ksaj wrote:
| "the project shouldn't have shipped."
|
| Tell that to the shareholders. As a public company, they can
| very quickly lose enormous amounts of money by being behind or
| below on just about anything.
| sweetjuly wrote:
| The article also mentions they had trouble measuring power
| usage in general so we can't necessarily (and, really,
| shouldn't) conclude that it has no impact whatsoever. I highly
| doubt that AMD's engineering teams are so unprincipled as to
| allow people to add HW features with no value (why would you
| dedicate area and power to a feature which doesn't do
| anything?), and so I'm inclined to give them the benefit of the
| doubt here and assume that Chips 'n Cheese simply couldn't
| measure the impact.
| clamchowder wrote:
| Note - I saw the article through from start to finish. For
| power measurements I modified my memory bandwidth test to
| read AMD's core energy status MSR, and modified the
| instruction bandwidth testing part to create a loop within
| the test array. (https://github.com/clamchowder/Microbenchmarks/commit/6942ab...)
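A minimal sketch of that MSR-based approach on Linux (this is not the author's actual code, which is in the linked commit; the MSR addresses are AMD's documented RAPL registers for Zen-family parts, and reading them requires root plus the msr kernel module):

```python
import struct

# AMD RAPL MSRs (Zen family); addresses per AMD's documentation.
MSR_RAPL_PWR_UNIT = 0xC0010299     # energy status unit in bits 12:8
MSR_CORE_ENERGY_STAT = 0xC001029A  # per-core accumulated energy counter

def read_msr(cpu: int, reg: int) -> int:
    """Read a 64-bit MSR via the Linux msr driver (`modprobe msr`, root)."""
    with open(f"/dev/cpu/{cpu}/msr", "rb") as f:
        f.seek(reg)
        return struct.unpack("<Q", f.read(8))[0]

def energy_joules(raw_counter: int, unit_msr: int) -> float:
    """Convert a raw energy counter to joules using the ESU field.

    The energy status unit (ESU) is bits 12:8 of the unit MSR; each
    counter tick is 1/2^ESU joules (ESU is typically 16 on Zen,
    i.e. ~15.3 microjoules per tick)."""
    esu = (unit_msr >> 8) & 0x1F
    return raw_counter * (0.5 ** esu)
```

Sampling `energy_joules(read_msr(cpu, MSR_CORE_ENERGY_STAT), read_msr(cpu, MSR_RAPL_PWR_UNIT))` before and after a benchmark run gives the energy consumed by that core over the interval.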
|
| Remember most of the technical analysis on Chips and Cheese
| is a one person effort, and I simply don't have infinite free
| time or equipment to dig deeper into power. That's why I
| wrote "Perhaps some more mainstream tech outlets will figure
| out AMD disabled the loop buffer at some point, and do
| testing that I personally lack the time and resources to
| carry out."
| kimixa wrote:
| > engineering teams are so unprincipled as to allow people to
| add HW features with no value
|
| This is often pretty common, as the performance
| characteristics are often unknown until late in the hardware
| design cycle - it would be "easy" if each cycle was just
| changing that single unit with everything else static, but
| that isn't the case as everything is changing around it. And
| then by the time you've got everything together complete
| enough to actually test end-to-end pipeline performance,
| removing things is often the riskier choice.
|
| And that's before you even get to the point of low-level
| implementation/layout/node specific optimizations, which can
| then again have somewhat unexpected results on frequency and
| power metrics.
| saagarjha wrote:
| Only on Hacker News will you get CPU validation fanfiction.
| EVa5I7bHFq9mnYK wrote:
| >> when the project is done, there are more lines of code and
| performance is worse
|
| There is an added benefit though - that the new programmers now
| are fluent in the code base. That benefit might be worth more
| than LOCs or performance.
| iforgotpassword wrote:
| Well, the other possibility is that the power benchmarks are
| accurate: the buffer did save power, but then they figured
| out an even better optimization at the microcode level that
| made the regular path save even more power, so the buffer
| actually became a power hog.
| 01100011 wrote:
| Working at... a very popular HW company... I'll say that we
| (the SW folks) are currently obsessed with 'doing something'
| even
| the thing we're doing hasn't fully been proven to have benefits
| outside of some narrow use cases or targeted benchmarks. It's
| very frustrating, but no one wants to put the time in to do the
| research up front. It's easier to just move forward with a new
| project because upper management stays happy and doesn't ask
| questions.
| usrusr wrote:
| Is it that expectation of major updates coming in at a fixed
| cycle? Not only expected by upper management but also by end
| users? That's a difficult trap to get out of.
|
| I wonder if that will be the key benefit of Google's switch
| to two "major" Android releases each year: it will get people
| used to nothing newsworthy happening within a version
| increment. And I also wonder if that's intentional, and my
| guess is not the tiniest bit.
| markus_zhang wrote:
| Do you have new software managers/directors who are
| encouraging such behavior? From my experience, new leaders
| tend to lean on these tactics to grab power.
| weinzierl wrote:
| _" The article seems to suggest that the loop buffer provides
| no performance benefit and no power benefit."_
|
| It tests the performance benefit hypothesis in different
| scenarios and does not find evidence that supports it. It makes
| _one_ best effort attempt to test the power benefit hypothesis
| and concludes it with: _" Results make no sense."_
|
| I think the real take-away is that performance measurements
| without considering power tell only half the story. We came a
| long way when it comes to the performance measurement half but
| power measurement is still hard. We should work on that.
| firebot wrote:
| The article clearly articulates that there's no performance
| benefit. However, there is an efficiency benefit: it reduces
| power consumption.
| hinkley wrote:
| Someone elsewhere quotes a game-specific benchmark of about
| 15%, which will mostly matter when your FPS starts to make
| gameplay difficult.
|
| There will be a certain number of people who will delay an
| upgrade a bit more because the new machines don't have enough
| extra oomph to warrant it. Little's Law can apply to finance
| when it's the interval between purchases.
| londons_explore wrote:
| In the "power" section, it seems the analysis doesn't divide by
| the number of instructions executed per second.
|
| Energy used per instruction is almost certainly the metric that
| should be considered to see the benefits of this loop buffer, not
| energy used per second (power, watts).
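The metric proposed above is simple to state; a sketch (the parameter names are illustrative, not a real API - the inputs would come from the core energy MSR and a retired-instructions performance counter sampled over the same interval):

```python
def energy_per_instruction(joules_before: float, joules_after: float,
                           instructions_retired: int) -> float:
    """Joules per retired instruction over a measurement interval."""
    if instructions_retired <= 0:
        raise ValueError("no instructions retired in interval")
    return (joules_after - joules_before) / instructions_retired

# e.g. 0.5 J over 1e9 instructions -> 0.5 nJ per instruction
print(energy_per_instruction(0.0, 0.5, 10**9))  # 5e-10
```

Normalizing by work done this way would let a power-saving feature show up even when raw wattage is dominated by how fast the loop happens to run.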
| eek2121 wrote:
| Every instruction takes a different number of clock cycles
| (and this varies between architectures or iterations of an
| architecture, such as Zen 4 vs. Zen 5), so that is not
| feasible unless running the workload produced the exact same
| instruction stream, which is impossible due to
| multithreading/multitasking. Even the order and contents of
| RAM matter, since both can change everything.
|
| While you can somewhat isolate for this by doing hundreds of
| runs for both on and off, that takes tons of time and still
| won't be 100% accurate.
|
| Even disabling the feature can cause the code to use a
| different branch which may shift everything around.
|
| I am not specifically familiar with this issue, but I have seen
| cases where disabling a feature shifted the load from integer
| units to the FPU or the GPU as an example, or added 2
| additional instructions while taking away 5.
| rasz wrote:
| Anecdotally one of very few differences between 1979 68000 and
| 1982 68010 was addition of "loop mode", a 6 byte Loop Buffer :)
| crest wrote:
| Much more importantly, they fixed the MMU support. The
| original 68000 lost some state required to recover from a
| page fault. The workaround was ugly and expensive: run two
| CPUs "time shifted" by one cycle and inject a recoverable
| interrupt on the second CPU. Apparently it was still cheaper
| than the alternatives at the time if you wanted a CPU with an
| MMU, a 32-bit ISA and a 24-bit address bus. Must have been a
| wild time.
| phire wrote:
| _> run two CPUs "time shifted" by one cycle and inject a
| recoverable interrupt on the second CPU._
|
| That's not quite how it was implemented.
|
| Instead, the second 68000 was halted and disconnected from
| the bus until the first 68000 (the executor) trigged a fault.
| Then the first 68000 would be held in halt, disconnected from
| the bus and the second 68000 (the fixer) would take over the
| bus to run the fault handler code.
|
| After the fault had been handled, the first 68000 could be
| released from halt and it would resume execution of the
| instruction, with all state intact.
|
| As for the cost of a second 68000, extra logic and larger
| PCBs? Well, the cost of the Motorola 68451 MMU (or
| equivalent) absolutely dwarfed the cost of everything else,
| so adding a second CPU really wasn't a big deal.
|
| Technically it didn't need to be another 68000, any CPU would
| do. But it's simpler to use a single ISA.
|
| For more details, see Motorola's application note here:
| http://marc.retronik.fr/motorola/68K/68000/Application%20Not...
| Dylan16807 wrote:
| That's neat. For small loop buffers, I quite like the
| GreenArrays forth core. It has 18 bit words that hold 4
| instructions each, and one of the opcodes decrements a loop
| counter and goes back to the start of the word. And it can run
| appreciably faster while it's doing that.
| ack_complete wrote:
| The loop buffer on the 68010 was almost useless: not only was
| it just 6 bytes, it held only two instructions. One had to be
| the loop instruction (DBcc), so the loop body had to be a
| single instruction. Pretty much the only thing it could speed
| up in practice was an unoptimized memcpy.
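That constraint can be sketched as a toy check (sizes in bytes; DBcc is a 2-byte opcode plus a 2-byte displacement, and 68000-family instructions are multiples of 2 bytes):

```python
def fits_68010_loop_mode(body_bytes: int) -> bool:
    """Toy model of the 68010 'loop mode' constraint described above:
    the 6-byte buffer must hold one body instruction plus the
    4-byte DBcc (2-byte opcode + 2-byte displacement)."""
    DBCC_BYTES = 4
    return body_bytes + DBCC_BYTES <= 6

# The classic byte-copy body `move.b (a0)+,(a1)+` is one 2-byte word:
print(fits_68010_loop_mode(2))  # True
# Anything longer than one word won't fit:
print(fits_68010_loop_mode(4))  # False
```

So only single-word body instructions qualify, which is why the unoptimized memcpy loop is essentially the canonical beneficiary.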
| eek2121 wrote:
| It sounds to me like it was too small to make any real difference
| except in very specific scenarios and a larger one would have
| been too expensive to implement compared to the benefit.
|
| That being said, some workloads will see a small regression,
| however AMD has made some small performance improvements since
| launch.
|
| They should have just made it a BIOS option for Zen 4. The fact
| they do not appear to have done so does indicate the possibility
| of a bug or security issue.
| crest wrote:
| Them *quietly* disabling a feature that few users will
| notice, yet which complicates the frontend, suggests they
| pulled this chicken bit because they wanted to avoid or delay
| disclosing a hardware bug to the general public while already
| pushing the mitigation. Fucking vendors! Will they ever
| learn? _sigh_
| whaleofatw2022 wrote:
| Devils advocate... if this is being actively exploited or is
| easily exploitable, the delay in announcement can prevent
| other actions.
| dannyw wrote:
| Every modern CPU has dozens of hardware bugs that aren't
| disclosed and quietly patched away or not mentioned.
| BartjeD wrote:
| Quietly disabling it is also a big risk, because you're
| signalling that in all probability you were aware of the
| severity of the issue, enough so that you took steps to patch
| it.
|
| If you don't disclose the vulnerability then affected parties
| cannot start taking countermeasures, except out of sheer
| paranoia.
|
| Disclosing a vulnerability is a way to shift liability onto
| the end user. You didn't update? Then don't complain. Only
| rarely do disclosures lead to product liability. I don't
| remember this (liability) happening with Meltdown and Spectre
| either, so I wouldn't assume this is AMD being secretive.
| shantara wrote:
| This is a wild guess, but could this feature be disabled in an
| attempt at preventing some publicly undisclosed hardware
| vulnerability?
| throw_away_x1y2 wrote:
| Bingo.
|
| I can't say more. :(
| pdimitar wrote:
| Have we learned nothing from Spectre and Meltdown?... :(
| gpderetta wrote:
| Complex systems are complex?
| pdimitar wrote:
| Sadly you're right. And obviously we're not about to give
| up on high IPC. I get it and I'm not judging -- it's just
| a bit saddening.
| StressedDev wrote:
| A lot has been learned. Unfortunately, people still make
| mistakes and hardware will continue to have security
| vulnerabilities.
| sweetjuly wrote:
| I imagine this is more of a functional issue. i.e., the
| loop buffer caused corruption of the instruction stream
| under some weird specific circumstances. Spectre and
| Meltdown are not functional issues but rather just side
| channel issues.
|
| This should be fun, however, for someone with enough time
| to chase down and try and find the bug. Depending on the
| consequences of the bug and the conditions under which it
| hits, maybe you could even write an exploit (either going
| from JavaScript to the browser or from user mode to the
| kernel) with it :) Though, I strongly suspect that reverse
| engineering and weaponizing the bug without any insider
| knowledge will be exceedingly difficult. And, anyways,
| there's also a decent chance this issue just leads to a
| hang/livelock/MCE which would make it pointless to exploit.
| rincebrain wrote:
| The problem is that we're more or less stuck with this
| class of problem unless we end up with something that looks
| like a Xeon Phi without shared resources and run
| calculations on many, many truly independent cores, or we
| accept that the worst and best case performance cases are
| identical (which I don't foresee anyone really agreeing
| to).
|
| Or, framed differently, if Intel or AMD announced a new
| gamer CPU tomorrow that was 3x faster in most games but
| utterly unsafe against all Meltdown/Spectre-class vulns,
| how fast do you think they'd sell out?
| thechao wrote:
| Larrabee was fun to program, but I think it'd have an even
| worse time hardening memory sideband effects: the barrel
| processor (which was necessary to have anything like
| reasonable performance) was humorously easy to use for
| cross-process exfiltration. Like... it was so easy, we
| actually used it as an IPC mechanism.
| wheybags wrote:
| > it was so easy, we actually used it as an IPC
| mechanism.
|
| Can you elaborate on that? It sounds interesting
| thechao wrote:
| Now you're asking me technical details from more than a
| decade ago. My recollection is that you could map one of
| the caches between cores -- there were uncached-write-
| through instructions. By reverse engineering the cache's
| hash, you could write to a specific cache-line; the uc-
| write would push it up into the correct line and the
| "other core" could snoop that line from its side with a
| lazy read-and-clear. The whole thing was janky-AF, but
| way the hell faster than sending a message around the
| ring. (My recollection was that the three interlocking
| rings could make the longest-range message take hundreds
| of cycles.)
| rincebrain wrote:
| Sure, absolutely, there's large numbers of additional
| classes of side effects you would need to harden against
| if you wanted to eliminate everything, I was mostly
| thinking specifically of something with an enormous
| number of cores without the 4-way SMT as a high-level
| description.
|
| I was always morbidly curious about programming those,
| but never to the point of actually buying one, and I
| always had more things to do in the day than time in past
| life when we had a few of the cards in my office.
| magicalhippo wrote:
| We already have heterogeneous cores these days, with E
| and P, and we have a ton of them as they take little
| space on the die relative to cache. The solution, it
| seems to me, is to have most cores go brrrrrr and a few
| that are secure.
|
| Given that we have effectively two browser platforms
| (Chromium and Firefox) and two operating systems to
| contend with (Linux and Windows), it seems entirely
| tractable to get the security sensitive threads scheduled
| to the "S cores".
| astrange wrote:
| That's a secure enclave aka secure element aka TPM. Once
| you start wanting security you usually think up enough
| other features (voltage glitching prevention, memory
| encryption) that it's worth moving it off the CPU.
| Dylan16807 wrote:
| That's a wildly different type of security. I just want
| to sandbox some code, not treat the entire world as
| hostile.
| ggu7hgfk8j wrote:
| The main security boundary a modern computer upholds is
| web vs everything else, including protecting one webpage
| from another.
|
| So I think it's the JavaScript that should run on these
| hypothetical cores.
|
| Though perhaps a few other operations might choose to use
| them as well.
| nine_k wrote:
| Also all the TLS, SSH, Wireguard and other encryption,
| anything with long-persisted secret information.
| Everything else, even secret (like displayed OTP codes)
| is likely too fleeting for a snooping attack to be able
| to find and exfiltrate it, even if an exfiltration
| channel remains. Until a better exfiltration method is
| found, of course :-(
|
| I think we're headed towards the future of many highly
| insulated computing nodes that share little if anything.
| Maybe they'd have a faster way to communicate, e.g. by
| remapping fast cache-like memory between cores, but that
| memory would never be uncontrollably shared the way cache
| lines are now.
| lukan wrote:
| "if Intel or AMD announced a new gamer CPU tomorrow that
| was 3x faster in most games but utterly unsafe against
| all Meltdown/Spectre-class vulns, how fast do you think
| they'd sell out"
|
| Well, many people have gaming computers they won't use for
| anything serious, so I would also buy it. And on restricted
| gaming consoles, I suppose the risk is not too high?
| sim7c00 wrote:
| you mean those consoles that can attack the rest of your
| devices and your neighbours via its wireless chips?
| ggu7hgfk8j wrote:
| Speculation attacks enables code running on the machine
| to access data it shouldn't. I don't see how that relates
| to your scenario.
| formerly_proven wrote:
| Consoles are hardened very well to prevent homebrew,
| cracking, cheating etc.
| dcow wrote:
| But this class of vuln is about data leaking between
| users in a multi-user system.
| wongarsu wrote:
| Isn't it rather about data leaks between any two
| processes? Whether those two processes belong to
| different users is a detail of the threat model and the
| OS's security model. In a console it could well be about
| data leaks between a game with code-injection
| vulnerability and the OS or DRM system.
| alexvitkov wrote:
| They're a pain in the ass all around. Spectre allowed you
| to read everything paged in (including kernel memory)
| from JS in the browser.
|
| To mitigate it browsers did a bunch of hacks, including
| nerfing precision on all timer APIs and disabling shared
| memory, because you need an accurate timer for the exploit.
| To this day performance.now() rounds to 1 ms on Firefox and
| 0.1 ms on Chrome.
|
| This 1 ms rounding is, funnily enough, a headache for me
| right now. On, say, a 240 Hz monitor, a game needs to render
| a frame every ~4.16 ms, and 1 ms precision is not enough for
| an accurate ticker. Even if you render your frames on time,
| the result can't be perfectly smooth, because the browser
| doesn't give you a timer accurate enough to advance your
| physics each frame.
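A quick worked number for that mismatch between timer granularity and frame budget:

```python
# How coarse a 1 ms clock is relative to a 240 Hz frame budget.
FRAME_MS = 1000 / 240   # ~4.167 ms per frame at 240 Hz
TIMER_RES_MS = 1.0      # Firefox's performance.now() granularity

# A single reading can be off by up to a full timer tick, so this
# fraction of a frame budget can be misjudged when deciding whether
# to advance the simulation another step.
worst_case_fraction = TIMER_RES_MS / FRAME_MS
print(f"{worst_case_fraction:.0%}")  # 24%
```

Nearly a quarter of the frame interval is inside the timer's rounding error, which is why frame pacing from such a clock can't be perfectly smooth.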
| izacus wrote:
| Also, many games today outright install rootkits to
| monitor your memory (see [1]) - some Heartbleed-style leak is
| so far down the list of credible threats on a gaming machine
| that it's outright ludicrous to trade off performance for it.
|
| [1]: https://www.club386.com/assassins-creed-shadows-drm-wants-to...
| nottorp wrote:
| > how fast do you think they'd sell out?
|
| 10-20 min, depending on how many they make :)
| jorvi wrote:
| Also, a good chunk of these vulnerabilities (Retbleed,
| Downfall, Rowhammer, there's probably a few I'm
| forgetting) are either theoretical, lab-only or spear
| exploits that require a lot of setup. And then the
| leaking info from something like Retbleed mostly applies
| to shared machines like in cloud infrastructure.
|
| Which makes it kind of terrible that the kernel has these
| mitigations turned on by default, stealing somewhere in
| the neighborhood of 20-60% of performance on older gen
| hardware, just because the kernel has to roll with "one
| size fits all" defaults.
| nine_k wrote:
| If you know what you're doing, you do something like this:
| https://gist.github.com/jfeilbach/f06bb8408626383a083f68276f...
| and make Linux fast again (c).
|
| If you don't know what kernel parameters are and what do
| they affect, it's likely safer to go with all the
| mitigations enabled by default :-|
| jorvi wrote:
| Yeah, I personally have Retbleed and Downfall mitigations
| disabled, the rest thankfully doesn't severely affect my
| CPU performance.
|
| Appreciate sharing the gist though!
| DSingularity wrote:
| I don't think you are thinking of this right. One bit of
| leakage makes it half as hard to break encryption via
| brute force. It's a serious problem. The defaults are
| justified.
|
| I think things will only shift once we have systems that
| ship with full sandboxes that are minimally optimized and
| fully isolated. Until then we are forced to assume the worst.
| jorvi wrote:
| > I don't think you are thinking of this right. One bit
| of leakage makes it half as hard to break encryption via
| brute force.
|
| The problem is that you need to execute on the system,
| then need to know which application you're targeting,
| then figure out the timings, and even then you're not
| certain you are getting the bits you want.
|
| Enabling mitigations For servers? Sure. Cloud servers?
| Definitely. High profile targets? Go for it.
|
| The current defaults are like foisting iOS's "Lockdown Mode"
| on all users by default and then expecting them to figure out
| how to turn it off, except you have to do it by connecting
| the phone to your Mac/PC and punching in a bunch of terminal
| commands.
|
| Then again, almost all kernel settings are server-optimal
| (and even then, 90s-server optimal). There honestly should be
| some serious effort to modernize the defaults for reasonably
| modern servers, and then also a separate kernel for desktops
| (akin to CachyOS, just more upstream).
| int0x29 wrote:
| Itanium allegedly was free from branch prediction issues
| but I suspect cache behavior still might have been an
| issue. Unfortunately it's also dead as a doornail.
| Am4TIfIsER0ppos wrote:
| We learned that processor manufacturers love "bugs" that
| get solved by making them or your code slower giving you
| incentive to buy a newer one to restore the performance.
| shepherdjerred wrote:
| This seems unnecessarily cynical. Are you saying
| Intel/AMD are intentionally crippling CPUs?
| bobmcnamara wrote:
| I'm not saying Intel intentionally limited CPUs, just
| that they have intentionally limited a lot of things and
| lied about it in the past.
|
| https://www.ftc.gov/news-events/news/press-releases/2010/08/...
| tedunangst wrote:
| I was told the lesson is to avoid Intel and only buy AMD
| because they don't make mistakes.
| UberFly wrote:
| No one said to buy AMD because they don't make mistakes.
| AMD just currently makes a better product overall.
| Dylan16807 wrote:
| I do not think you are accurately recounting what people
| said.
| PittleyDunkin wrote:
| I'm still not convinced most of the computers in my home
| need to care about leaking data this way. I'm open to being
| persuaded, though.
| pdimitar wrote:
| I am not convinced either but I am willing to bet some
| software is adversarial and will try to exfiltrate data.
| F.ex. many people look suspiciously at Zoom and Chrome.
|
| So as long as stuff is not perfectly isolated from each
| other then there's always a room for a bad actor to snoop
| on stuff.
| api wrote:
| For most of these vulnerabilities the risk is low, but
| keep in mind that your web browser runs random untrusted
| code from all over the Internet in a VM with a JIT
| compiler. This means you can't rule out the possibility
| that someone will figure out a way to exploit this over
| the web reliably, which would be catastrophic.
|
| "Attacks only get better."
| RobotToaster wrote:
| We should have learnt from the fdiv bug[0] that processor
| manufacturers need to be mandated to recall faulty
| hardware.
|
| [0] https://en.wikipedia.org/wiki/Pentium_FDIV_bug
| nine_k wrote:
| It depends on the severity of the problem, and the impact
| on the _customers_ already using these systems. It may be
| more economical for the customer to apply a patch and
| lose a few percent of peak performance than to put
| thousands of boxes offline and schedule personnel to swap
| CPUs. This is to say nothing of the hassle of bringing
| your new laptop to a service center, and taking a
| replacement, or waiting if your exact configuration is
| unavailable at the moment.
| aseipp wrote:
| This might come as a shock, but I can assure you that the
| people designing high end microprocessors have probably
| forgotten more about these topics than most of the people
| here have ever known.
| pdimitar wrote:
| Huh?
| bell-cot wrote:
| The Article more-or-less speculates that:
|
| > Zen 4 is AMD's first attempt at putting a loop buffer into a
| high performance CPU. Validation is always difficult,
| especially when implementing a feature for the first time. It's
| not crazy to imagine that AMD internally discovered a bug that
| no one else hit, and decided to turn off the loop buffer out of
| an abundance of caution. I can't think of any other reason AMD
| would mess with Zen 4's frontend this far into the core's
| lifecycle.
| bhouston wrote:
| Yeah, my first thoughts too.
| BartjeD wrote:
| Quietly disabling it is also a big risk, because you're
| signalling that in all probability you were aware of the
| severity of the issue; enough so that you took steps to
| patch it.
|
| If you don't disclose the vulnerability then affected parties
| cannot start taking countermeasures, except out of sheer
| paranoia.
|
| Disclosing a vulnerability is a way to shift liability onto
| the end user. You didn't update? Then don't complain. Only
| rarely do disclosures lead to product liability. I don't
| remember this (liability) happening with Meltdown and
| Spectre either. So I wouldn't assume this is AMD being
| secretive.
| wtallis wrote:
| Please don't post duplicate comments like this. Your first
| comment (https://news.ycombinator.com/item?id=42287118) was
| fine but spamming a thread with copy-and-pasted comments just
| hurts the signal to noise ratio.
| alexnewman wrote:
| I'm confused. Would you be OK if he addressed the same
| point in the forum with a slightly different sentence?
| wtallis wrote:
| Any threaded discussion carries the risk of different
| subthreads ending up in the same place. The simplest
| solution is to just _not post twice_, and trust that the
| reader can read the rest of the thread; HN threads
| usually don't get long enough for good comments to get
| too buried, and not duplicating comments helps avoid that
| problem. If there's something slightly different, it may
| be worth linking to another comment in a different
| subthread and adding a few sentences to cover the
| differences. Copying a whole comment is never a good
| answer, and re-wording it to obscure the fact that it's
| not saying anything new is also bad. New comments should
| have something new to say.
| lukan wrote:
| Posting a link to the other reply is a solution as well.
| hinkley wrote:
| I would get confused handling follow-ups to both copies.
|
| I have enough trouble if someone responds to my responses
| in a tone similar to GP and I end up treating them like
| the same person (eg, GP makes a jab and now I'm snarky 9r
| call out the wrong person). Especially if I have to step
| away to deal with life.
| hobobaggins wrote:
| And, just like that, you turned the rest of this thread
| into a meta discussion about HN rather than about the
| topic. It's ironic, because that really hurt the SNR more
| than a duplicated, but on-topic, comment.
| immibis wrote:
| The countermeasure is to disable the loop buffer. Everyone
| who wants to protect themselves from the unknown
| vulnerability should disable the loop buffer. Once everyone's
| done that or had a reasonable opportunity to do that, it can
| be safely published.
| jdiff wrote:
| There's no real impetus except paranoia if the change is
| unannounced. You don't have to detail the vulnerability,
| just inform people that somewhere, one exists, and that
| this _is_ in fact a countermeasure. Without doing that, you
| don't shift liability, you don't actually get people out
| of harm's way, you don't really benefit at all.
| baq wrote:
| Indeed, it might be the case that there's more than that
| disabled, since the numbers are somewhat surprising:
|
| > Still, the Cyberpunk 2077 data bothers me. Performance
| counters also indicate higher average IPC with the loop buffer
| enabled when the game is running on the VCache die.
| Specifically, it averages 1.25 IPC with the loop buffer on, and
| 1.07 IPC with the loop buffer disabled. And, there is a tiny
| performance dip on the new BIOS.
|
| Smells of microcode mitigations if you ask me, but naturally
| let's wait for the CVE.
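[A back-of-the-envelope check on those counter readings. Assuming a fixed clock and a roughly constant dynamic instruction count across BIOS versions (an assumption, not something the article verifies), runtime scales inversely with IPC, so the quoted figures would predict a much larger gap than the "tiny performance dip" actually observed:]

```python
ipc_on = 1.25   # loop buffer enabled (VCache die, Cyberpunk 2077)
ipc_off = 1.07  # loop buffer disabled

# At a fixed clock and fixed instruction count, runtime ~ 1/IPC,
# so this IPC drop alone would predict a slowdown of:
slowdown = 1 - ipc_off / ipc_on
print(f"predicted slowdown: {slowdown:.1%}")  # predicted slowdown: 14.4%
```

[That mismatch between a ~14% predicted slowdown and a tiny measured one is what makes the IPC counters look suspect, as baq notes.]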
| hinkley wrote:
| Or another logic bug. We haven't had a really juicy one in a
| while.
| ksec wrote:
| Wondering if Loop Buffer is still there with Zen 5?
|
| ( Idly waiting for x86 to try and compete with ARM on efficiency.
| Unfortunately I don't see Zen 6 or Panther Lake getting close. )
| monocasa wrote:
| It is not.
| CalChris wrote:
| If it saved power, wouldn't that lead to less thermal
| throttling and thus improved performance? That power saving
| had to matter, or the feature wouldn't have been worth
| implementing in the first place.
| kllrnohj wrote:
| Not necessarily. Let's say this optimization can save 0.1w in
| certain situations. If one of those situations is common when
| the chip is idle, just keeping wifi alive, well hey, that's
| 0.1w in a ~1w total draw scenario: 10%, which is huge!
|
| But when the CPU is pulling 100w under load? Well now we're
| talking an amount so small it's irrelevant. Maybe with a well
| calibrated scope you could figure out if it was on or not.
|
| Since this is in the micro-op queue in the front end, it's
| going to be more about that very low total power draw side of
| things where this comes into play. So this would have been
| something they were doing to see if it helped for the laptop
| skus, not for the desktop ones.
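[The proportionality argument above can be sketched numerically. The wattage figures are the commenter's illustrative numbers, not measurements: a fixed absolute saving matters at near-idle draw and vanishes under full load.]

```python
def relative_saving(saving_w: float, total_draw_w: float) -> float:
    """Fraction of total package power saved by a fixed-size optimization."""
    return saving_w / total_draw_w

# Hypothetical 0.1 W saved by the loop buffer in both scenarios.
SAVING_W = 0.1

idle = relative_saving(SAVING_W, 1.0)    # near-idle laptop: ~1 W draw
load = relative_saving(SAVING_W, 100.0)  # desktop under load: ~100 W draw

print(f"idle: {idle:.1%}, load: {load:.2%}")  # idle: 10.0%, load: 0.10%
```

[The same absolute saving is two orders of magnitude less significant under load, which is why a front-end power feature would target laptop SKUs rather than desktop ones.]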
| Out_of_Characte wrote:
| You're probably right on the mark with this. Though even
| desktops and servers can benefit from lower idle power draw.
| So there is a chance that it might have been moved to a
| different c-state.
| mleonhard wrote:
| It looks like they disabled a feature flag. I didn't expect to
| see such things in CPUs.
| astrange wrote:
| They have lots of them (called "chicken bits"). Some of them
| have BIOS flags, some don't.
|
| It's very very expensive to fix a bug in a CPU, so it's easier
| to expose control flags or microcode so you can patch it out.
| fulafel wrote:
| Interesting that in the Cortex-A15 this is a "key design
| feature". Are there any numbers about its effect on other chips?
|
| I guess this could also be used as an optimization target at
| least on devices that are more long lived designs (eg consoles).
| nwallin wrote:
| I'm curious about this too. I would expect any RISC
| architecture to gain relatively little from a loop buffer. The
| point of RISC is that instruction fetch/decode is substantially
| easier, if not trivial.
| Loic wrote:
| For me the most interesting paragraph in the article is:
|
| > Perhaps the best way of looking at Zen 4's loop buffer is that
| it signals the company has engineering bandwidth to go try
| things. Maybe it didn't go anywhere this time. But letting
| engineers experiment with a low risk, low impact feature is a
| great way to build confidence. I look forward to seeing more of
| that confidence in the future.
| Neywiny wrote:
| I have a 7950x3d. It's my upgrade from... Skylake's 6700k. I
| guess I'm subconsciously drawn to chips with hardware loop
| buffers disabled by software.
___________________________________________________________________
(page generated 2024-12-01 23:00 UTC)