[HN Gopher] AMD Disables Zen 4's Loop Buffer
___________________________________________________________________
AMD Disables Zen 4's Loop Buffer
Author : luyu_wu
Score : 65 points
Date : 2024-11-30 20:47 UTC (2 hours ago)
(HTM) web link (chipsandcheese.com)
(TXT) w3m dump (chipsandcheese.com)
| syntaxing wrote:
| Interesting read. One thing I don't understand is how much space
| the loop buffer takes on the die. With it removed, could future
| chips use that space for something more useful, like a bigger L2
| cache?
| progbits wrote:
| It says 144 micro-op entries per core. Not sure how many bytes
| that is, but L2 caches these days are around 1MB per core, so
| assuming the loop buffer die space is mostly storage (sounds
| like it) then it wouldn't make a notable difference.
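A rough sketch of the size comparison above. The thread gives 144 entries and roughly 1 MB of L2 per core, but not the width of a micro-op entry, so the bytes-per-entry values here are purely hypothetical assumptions chosen to bracket the estimate, not figures from AMD or the article.

```python
# Back-of-the-envelope comparison of loop buffer storage vs. L2 capacity.
# ENTRIES and L2_BYTES come from the thread; bytes_per_entry is an assumption.
ENTRIES = 144                  # micro-op entries per core
L2_BYTES = 1 * 1024 * 1024     # "around 1MB per core"


def loop_buffer_fraction(bytes_per_entry, entries=ENTRIES, l2_bytes=L2_BYTES):
    """Return loop buffer storage as a fraction of L2 capacity."""
    return entries * bytes_per_entry / l2_bytes


if __name__ == "__main__":
    # Hypothetical entry widths; the real micro-op encoding size is not stated.
    for bytes_per_entry in (8, 16, 32):
        fraction = loop_buffer_fraction(bytes_per_entry)
        print(f"{bytes_per_entry} B/entry: {fraction:.2%} of L2")
```

Under any of these assumed widths the buffer stays below 1% of the L2 capacity, which is consistent with the "no notable difference" conclusion. It only counts storage bits, not control logic or routing.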
| Remnant44 wrote:
| My understanding is that it's a pretty small optimization on
| the front end. It doesn't have a lot of entries to begin with
| (144) so the amount of space saved is probably negligible.
| Theoretically, the loop buffer would let you save power or
| improve performance in a tight loop. In practice, it doesn't
| seem to do either, and AMD removed it completely for Zen 5.
| akira2501 wrote:
| I think most modern chips are routing constrained and not
| floorspace constrained. You can build tons of features but
| getting them all power and normalized signals is an absolute
| chore.
| eqvinox wrote:
| > Strangely, the game sees a 5% performance loss with the loop
| buffer disabled when pinned to the non-VCache die. I have no
| explanation for this, [...]
|
| With more detailed power measurements, it might be possible to
| determine whether this is thermal/power budget related. It does
| sound like the feature was intended to conserve power...
| Pannoniae wrote:
| From another article:
|
| "Both the fetch+decode and op cache pipelines can be active at
| the same time, and both feed into the in-order micro-op queue.
| Zen 4 could use its micro-op queue as a loop buffer, but Zen 5
| does not. I asked why the loop buffer was gone in Zen 5 in side
| conversations. They quickly pointed out that the loop buffer
| wasn't deleted. Rather, Zen 5's frontend was a new design and the
| loop buffer never got added back. As to why, they said the loop
| buffer was primarily a power optimization. It could help IPC in
| some cases, but the primary goal was to let Zen 4 shut off much
| of the frontend in small loops. Adding any feature has an
| engineering cost, which has to be balanced against potential
| benefits. Just as with having dual decode clusters service a
| single thread, whether the loop buffer was worth engineer time
| was apparently "no"."
| londons_explore wrote:
| The article seems to suggest that the loop buffer provides no
| performance benefit and no power benefit.
|
| If so, it might be a classic case of "Team of engineers spent
| months working on new shiny feature which turned out to not
| actually have any benefit, but was shipped anyway, possibly so
| someone could save face".
|
| I see this in software teams when someone suggests it's time to
| rewrite the codebase to get rid of legacy bloat and increase
| performance. Yet, when the project is done, there are more lines
| of code and performance is worse.
|
| In both cases, the project shouldn't have shipped.
| adgjlsfhk1 wrote:
| > but was shipped anyway, possibly so someone could save face
|
| No. Once the core has it and you realize it doesn't help much,
| removing it is absolutely a risk.
| akira2501 wrote:
| > but was shipped anyway, possibly so someone could save face
|
| It was shipped anyway because it can be disabled with a firmware
| update and because drastically altering physical hardware
| layouts mid-design was likely to have worse impacts.
| londons_explore wrote:
| In the "power" section, it seems the analysis doesn't divide by
| the number of instructions executed per second.
|
| Energy used per instruction is almost certainly the metric that
| should be considered to see the benefits of this loop buffer, not
| energy used per second (power, watts).
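The metric suggested above can be sketched as power divided by instruction throughput. This is a minimal illustration only: the function name and every number below are hypothetical placeholders, not measurements from the article.

```python
# Energy per instruction = power / (instructions per second).
# All inputs below are made-up placeholders, not data from the article.


def energy_per_instruction(power_watts, ipc, clock_hz):
    """Return joules per instruction for a given power draw and throughput."""
    instructions_per_second = ipc * clock_hz
    return power_watts / instructions_per_second


if __name__ == "__main__":
    # Hypothetical case: identical package power, slightly different IPC.
    with_buffer = energy_per_instruction(power_watts=10.0, ipc=4.0, clock_hz=5.0e9)
    without_buffer = energy_per_instruction(power_watts=10.0, ipc=3.8, clock_hz=5.0e9)
    print(f"with loop buffer:    {with_buffer * 1e9:.3f} nJ/instruction")
    print(f"without loop buffer: {without_buffer * 1e9:.3f} nJ/instruction")
```

With these placeholder inputs the two configurations draw the same power, yet the lower-IPC one spends more energy per instruction. That is the kind of difference a watts-only comparison would hide.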
| rasz wrote:
| Anecdotally, one of the very few differences between the 1979
| 68000 and the 1982 68010 was the addition of "loop mode", a
| 6-byte loop buffer :)
___________________________________________________________________
(page generated 2024-11-30 23:00 UTC)