
       [HN Gopher] Load-Store Conflicts
       ___________________________________________________________________
        
       Load-Store Conflicts
        
       Author : ashvardanian
       Score  : 70 points
       Date   : 2025-05-04 17:18 UTC (5 hours ago)
        
 (HTM) web link (zeux.io)
        
       | haberman wrote:
       | A very interesting article that goes deeper into store-to-load
       | forwarding than anything I've read before.
        
       | Sesse__ wrote:
       | I find Clang generally a bit too eager to combine loads. This is
       | especially bad when returning structs through the stack; you
       | typically write them piecemeal in some function, return, and then
       | the caller often wants to copy it from the stack into somewhere
       | else, which it does with SIMD loads/stores.
       | 
       | This is a significant problem on AMD; Intel and Apple seem to be
       | better.
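
The pattern described above can be sketched in C. This is a minimal illustration, not code from the article or from Clang: the `Quad` struct and the function names are hypothetical. It assumes the compiler lowers the fixed-size 16-byte `memcpy` to a single vector load and store, which is common at higher optimization levels but not guaranteed. When that happens, the one wide load has to be satisfied from four narrower stores that may still be in flight, which is the case store-to-load forwarding often cannot handle.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 16-byte struct, small enough to fit one SIMD register. */
typedef struct {
    uint32_t a, b, c, d;
} Quad;

/* The callee fills the result piecemeal: four separate 4-byte stores. */
void make_quad(Quad *out, uint32_t seed) {
    out->a = seed;
    out->b = seed + 1;
    out->c = seed + 2;
    out->d = seed + 3;
}

/* The caller copies the whole struct at once. Assumption: the compiler
 * may turn this into one 16-byte load followed by one 16-byte store. */
void copy_quad(Quad *dst, const Quad *src) {
    memcpy(dst, src, sizeof *dst);
}

/* Write piecemeal, then copy wholesale. The values are always correct;
 * only the timing is affected if forwarding fails. */
void roundtrip_demo(void) {
    Quad tmp, result;
    make_quad(&tmp, 10);
    copy_quad(&result, &tmp);
    assert(result.a == 10 && result.b == 11);
    assert(result.c == 12 && result.d == 13);
}
```

Calling `roundtrip_demo()` from `main` exercises the pattern. Whether a stall actually occurs depends on the generated code and the CPU, so inspect the disassembly before drawing conclusions from a benchmark.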
        
         | cwzwarich wrote:
         | > This is a significant problem on AMD; Intel and Apple seem
         | to be better.
         | 
         | When did this change? In my testing years ago (while I was
         | writing Rosetta 2, so Ice Lake-era Intel), Intel only allowed a
         | load to forward from a single store, and no partial forwarding
         | (i.e. mixed cache/register) without a huge penalty, whereas AMD
         | at least allowed partial forwarding (or had a considerably
         | lower penalty than Intel).
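
The "partial forwarding" case mentioned above can be sketched as follows. This is an illustrative example with hypothetical names, assuming the compiler emits the narrow store and the wide load as written (an optimizer is free to merge or reorder them). The 8-byte load overlaps a recent 4-byte store only in its second half, so half of the loaded bytes would come from the pending store and the other half from cache.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Store 4 bytes into bytes 4..7 of an 8-byte slot, then immediately load
 * all 8 bytes. Only part of the load is covered by the pending store. */
uint64_t patch_then_load(uint64_t *slot, uint32_t value) {
    uint64_t whole;
    memcpy((unsigned char *)slot + 4, &value, sizeof value); /* narrow store */
    memcpy(&whole, slot, sizeof whole);                      /* wide load */
    return whole;
}

/* Check both halves with byte copies so the result does not depend on
 * the endianness of the machine. */
void partial_overlap_demo(void) {
    uint64_t slot = 0;
    uint32_t value = 0xAABBCCDDu;
    uint64_t whole = patch_then_load(&slot, value);
    uint32_t first, second;
    memcpy(&first, (unsigned char *)&whole, sizeof first);
    memcpy(&second, (unsigned char *)&whole + 4, sizeof second);
    assert(first == 0);      /* untouched bytes, served from memory */
    assert(second == value); /* bytes written by the narrow store */
}
```

The result is the same on any CPU; what differs, per the comment above, is how large a penalty each microarchitecture pays for assembling the load from two sources.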
        
           | Sesse__ wrote:
           | I don't know if AMD allows more or fewer _situations_, but
           | empirically, I'm seeing a lot of total cycles lost to this on
           | Zen 2 and 3, and much less on the Intel CPUs I've been
           | testing (mostly Skylake derivatives and Alder Lake).
           | 
           | I haven't tested Zen 4 or 5, but I haven't heard anything
           | that indicates they should be a lot better.
        
             | cwzwarich wrote:
             | Interesting! IIRC, the LLVM passes dedicated to dodging
             | this issue were contributed by Intel engineers, so maybe
             | there's some bias.
        