[HN Gopher] Load-Store Conflicts
___________________________________________________________________
Load-Store Conflicts
Author : ashvardanian
Score : 70 points
Date : 2025-05-04 17:18 UTC
(HTM) web link (zeux.io)
| haberman wrote:
| A very interesting article that goes deeper into store-to-load
| forwarding than anything I've read before.
| Sesse__ wrote:
| I find Clang generally a bit too eager to combine loads. This is
| especially bad when returning structs through the stack: the callee
| typically writes the struct piecemeal with narrow stores, returns,
| and then the caller often wants to copy it from the stack somewhere
| else, which it does with wide SIMD loads/stores that each span
| several of those narrow stores.
|
| This is a significant problem on AMD; Intel and Apple seem to be
| better.
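A minimal C sketch of the pattern described in the comment above, assuming the System V x86-64 ABI, under which a 32-byte struct of integer fields is returned through memory via a hidden pointer. The `Record` type and both functions are hypothetical, and `__attribute__((noinline))` is a GCC/Clang extension used only to keep the call from being inlined away. The callee fills the struct with several narrow stores; the caller then copies it, and a compiler may emit that copy as 16- or 32-byte SIMD loads, each overlapping more than one of the preceding stores, so it cannot be satisfied by forwarding from a single store. Whether the copy actually survives (rather than the callee writing straight into `table[i]`) depends on the compiler and optimization level, so check the generated assembly before drawing conclusions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 32-byte record: 4 + 4 + 8 + 8 + 8 bytes, no padding.
   Under the System V x86-64 ABI it is returned in memory, not registers. */
typedef struct {
    uint32_t id;
    uint32_t flags;
    uint64_t offset;
    uint64_t length;
    uint64_t checksum;
} Record;

/* Callee writes the struct piecemeal: several narrow (4- and 8-byte)
   stores into the caller-provided return slot. */
__attribute__((noinline))
static Record make_record(uint32_t id, uint64_t offset, uint64_t length) {
    Record r;
    r.id = id;
    r.flags = 0;
    r.offset = offset;
    r.length = length;
    r.checksum = offset ^ length;
    return r;
}

/* Caller copies the returned struct into a table. If the compiler returns
   into a stack temporary and then copies it with 16- or 32-byte SIMD
   loads, each wide load overlaps several of the narrow stores above. */
void append_record(Record *table, size_t i, uint32_t id,
                   uint64_t offset, uint64_t length) {
    table[i] = make_record(id, offset, length);
}
```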
| cwzwarich wrote:
| > This is a significant problem on AMD; Intel and Apple seem
| to be better.
|
| When did this change? In my testing years ago (while I was
| writing Rosetta 2, so Ice Lake-era Intel), Intel only allowed a
| load to forward from a single store, and did not allow partial
| forwarding (i.e., a load combining bytes from a pending store with
| bytes from the cache) without a huge penalty, whereas AMD at least
| allowed partial forwarding (or had a considerably lower penalty
| than Intel).
| Sesse__ wrote:
| I don't know if AMD allows more or fewer _situations_, but
| empirically, I'm seeing a lot of total cycles lost to this on
| Zen 2 and 3, and much less on the Intel CPUs I've been
| testing (mostly Skylake derivatives and Alder Lake).
|
| I haven't tested Zen 4 or 5, but I haven't heard anything
| that indicates they should be a lot better.
| cwzwarich wrote:
| Interesting! IIRC, the LLVM passes dedicated to dodging
| this issue were contributed by Intel engineers, so maybe
| there's some bias.
___________________________________________________________________
(page generated 2025-05-04 23:00 UTC)