[HN Gopher] Lessons learned from profiling an algorithm in Rust
       ___________________________________________________________________
        
       Lessons learned from profiling an algorithm in Rust
        
       Author : urcyanide
       Score  : 84 points
       Date   : 2024-10-13 15:03 UTC (6 hours ago)
        
 (HTM) web link (blog.mapotofu.org)
 (TXT) w3m dump (blog.mapotofu.org)
        
       | andrewaylett wrote:
       | That's really interesting -- I do enjoy a good optimisation.
       | 
       | I was looking at one of the diffs and thinking that a
       | sufficiently advanced compiler should be able to generate the
       | same efficient code for both -- and indeed it does, if you turn
       | the optimiser on: https://godbolt.org/z/hjP5qjabz
       | 
       |     - let shift = if (i / 32) % 2 == 0 { 32 } else { 0 };
       |     + let shift = ((i >> 5) & 1) << 5;
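Turning the conditional into bit operations relies on two standard strength reductions for unsigned integers: division by 32 becomes a right shift by 5 (since 32 = 2^5), and `% 2` becomes `& 1`. A minimal sketch checking those identities, assuming `i` is a `u32` (the thread doesn't show its actual type; `reductions_hold` is a hypothetical name). For signed integers the rewrite is not valid, because `/` truncates toward zero while `>>` is an arithmetic shift.

```rust
/// True if `i / 32 == i >> 5` and `(i / 32) % 2 == (i >> 5) & 1`.
/// Both identities hold for unsigned integers because 32 = 2^5.
fn reductions_hold(i: u32) -> bool {
    i / 32 == i >> 5 && (i / 32) % 2 == ((i >> 5) & 1)
}

fn main() {
    // Exhaustively check a range, plus a few values with the high bits set.
    assert!((0..1u32 << 16).all(reductions_hold));
    assert!([u32::MAX, u32::MAX - 31, 1 << 31].iter().all(|&i| reductions_hold(i)));

    // Signed counterexample: -1 / 32 == 0, but -1 >> 5 == -1.
    let j: i32 = -1;
    assert_ne!(j / 32, j >> 5);
    println!("strength reductions hold for u32");
}
```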
        
         | NovaX wrote:
         | I'm confused: isn't the bitwise version's logic inverted? If
         | the low bit of (i >> 5) is 1, then i / 32 is odd and the shift
         | should be 0, yet that bit gets shifted up to become 32. The
         | original modulus version gives 32 when i / 32 is even.
         | Shouldn't the bitwise version invert the bit first? I'd expect
         | 
         |     let shift = (!(i >> 5) & 1) << 5;
         | 
         | EDIT: The compiler emits "vpandn" for the conditional version
         | and "vpand" for the bitwise version; vpandn additionally
         | applies a bitwise NOT to its first source operand. So it looks
         | like the compiler and I are right: the author's bitwise
         | version is inverted, and the incorrect code was merged in the
         | author's commit. I also think this could be reduced to just
         | (!i & 32).
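The inversion described above can be checked directly. A minimal sketch, assuming `i` is a `usize` index (the thread doesn't show its type): `shift_original` is the conditional from the diff, `shift_author` is the merged bitwise rewrite, and `shift_fixed` is the suggested `(!i & 32)` form (Rust spells bitwise NOT as `!`, not `~`). All three function names are hypothetical.

```rust
/// Original: 32 when bit 5 of `i` is clear, 0 when it is set.
fn shift_original(i: usize) -> usize {
    if (i / 32) % 2 == 0 { 32 } else { 0 }
}

/// Bitwise rewrite from the diff: 32 when bit 5 of `i` is *set*.
fn shift_author(i: usize) -> usize {
    ((i >> 5) & 1) << 5
}

/// Inverted form: keep bit 5 of `!i`, i.e. 32 when bit 5 of `i` is clear.
fn shift_fixed(i: usize) -> usize {
    !i & 32
}

fn main() {
    for i in 0..256 {
        // The inverted form always agrees with the original...
        assert_eq!(shift_original(i), shift_fixed(i));
        // ...while the merged rewrite always produces the opposite value.
        assert_eq!(shift_author(i), 32 - shift_original(i));
    }
    println!("i=0: original={}, author={}", shift_original(0), shift_author(0));
}
```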
        
       | carlmr wrote:
       | Great write-up with easy-to-understand steps. One thing it's
       | lacking, though, is in the conclusion: I'd like to see a
       | comparison to the C++ implementation.
        
         | efnx wrote:
         | Yes, exactly. How close does it come after all those
         | optimisations?
        
       | wrs wrote:
       | I'm a Rust newbie, wondering how f32::clone could show up in a
       | profile. Wouldn't that be an inline no-op under any kind of
       | optimization? I mean, cloning a float is, at worst, a MOV
       | instruction, no?
        
         | MiguelX413 wrote:
         | Floats aren't stored in the same kinds of registers as
         | integers (on x86-64 they live in the SSE/XMM registers), so
         | it isn't a general-purpose MOV.
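Where `f32::clone` comes from in the source isn't shown in the thread; one plausible origin (an assumption) is generic numeric code bounded on `T: Clone` rather than `T: Copy`, which has to call `.clone()` explicitly. Because `f32` is `Copy`, its `clone` simply returns `*self`, so after monomorphization and inlining it should reduce to an ordinary move. A minimal sketch; `sum_of_squares` and `sum_of_squares_f32` are hypothetical names:

```rust
use std::ops::{Add, Mul};

/// Generic over any cloneable numeric type. The explicit `.clone()` calls are
/// required because `T: Clone` does not allow implicit copies.
fn sum_of_squares<T>(xs: &[T], zero: T) -> T
where
    T: Clone + Add<Output = T> + Mul<Output = T>,
{
    xs.iter().fold(zero, |acc, x| acc + x.clone() * x.clone())
}

/// Hand-specialized for `f32`: a plain copy (`*x`) does the same job.
fn sum_of_squares_f32(xs: &[f32]) -> f32 {
    xs.iter().fold(0.0, |acc, x| acc + *x * *x)
}

fn main() {
    let xs: &[f32] = &[1.0, 2.0, 3.0];
    // For `T = f32`, `x.clone()` and `*x` produce the same value.
    assert_eq!(sum_of_squares(xs, 0.0), sum_of_squares_f32(xs));

    // The same generic function also works for non-float types.
    let ints: &[i32] = &[1, 2, 3];
    assert_eq!(sum_of_squares(ints, 0), 14);
    println!("{}", sum_of_squares(xs, 0.0));
}
```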
        
       | mwkaufma wrote:
       | I don't understand why half of these aren't optimized
       | automatically by the compiler. (x - y).norm_squared()? Why is
       | f32::clone() not just an inlined mov? This raises a lot of
       | questions.
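For reference, `(x - y).norm_squared()` computes the squared Euclidean distance, the sum over k of `(x[k] - y[k])^2`. A minimal sketch over plain `f32` slices of equal length, written as a single pass with no intermediate difference vector; the article's actual vector type isn't shown in the thread, and `squared_distance` is a hypothetical name.

```rust
/// Squared Euclidean distance between two equal-length slices:
/// sum over k of (x[k] - y[k])^2, computed in one pass with no temporary vector.
fn squared_distance(x: &[f32], y: &[f32]) -> f32 {
    assert_eq!(x.len(), y.len(), "slices must have equal length");
    x.iter()
        .zip(y)
        .map(|(a, b)| {
            let d = a - b;
            d * d
        })
        .sum()
}

fn main() {
    let x = [1.0f32, 2.0, 3.0];
    let y = [4.0f32, 6.0, 3.0];
    // (1-4)^2 + (2-6)^2 + (3-3)^2 = 9 + 16 + 0 = 25
    println!("{}", squared_distance(&x, &y));
}
```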
        
         | JackYoustra wrote:
         | I've previously had problems with the compiler not inlining /
         | eliding instructions solely due to profiling code (see a blog
         | post: https://www.jackyoustra.com/blog/llama-ios#-bug-bug-
         | slowdown...). I wonder if it's that?
         | 
         | (I've also always had a sneaking suspicion that I did
         | something wrong in my example, so if anyone knows, let me
         | know.)
        
           | vlovich123 wrote:
           | pcwalton's explanation is much more likely to be correct:
           | https://news.ycombinator.com/context?id=41830704
           | 
           | Profiling native code with optimizations on is very, very
           | tricky.
        
       | pcwalton wrote:
       | I'm guessing that f32::clone showing up in the profile isn't
       | actually a call to f32::clone, because you have optimizations on
       | (if it really is a call to a "movd xmm0,dword ptr [rdi]; ret"
       | instruction pair, that's a bug in the compiler). Rather, it's the
       | result of the compiler attributing seemingly random lines to
       | f32::clone: when lines from multiple functions are fused into one
       | instruction, the compiler just picks one of them to write into
       | the debug info, and here it happened to pick f32::clone. When
       | you're profiling at that level, you really want instruction-level
       | profiling rather than per-function profiling, since the debug
       | info is going to be very unreliable.
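A small illustration of the kind of fusion described above, offered as a hedged sketch rather than a reproduction of the article's code: in `scale` (a hypothetical function), the load performed by `x.clone()` and the multiply from `* k` are separate source operations, but on x86-64 an optimizing build may emit them as a single multiply instruction with a memory operand (for example `mulss`, or a vectorized form). If the debug info records only one source location for that instruction, a line- or function-level profile may charge the multiply's cost to `f32::clone`. Inspecting the release-mode assembly, e.g. on Compiler Explorer, shows what actually runs.

```rust
/// Multiplies every element by `k`. The `.clone()` is redundant for `f32`
/// (a `Copy` type) but mirrors code where such calls appear.
#[allow(clippy::clone_on_copy)]
fn scale(v: &[f32], k: f32) -> Vec<f32> {
    // `x.clone()` is the load and `* k` is the multiply; after inlining,
    // an optimizing build may fold both into one instruction.
    v.iter().map(|x| x.clone() * k).collect()
}

fn main() {
    let v = [1.0f32, 2.5, -4.0];
    let out = scale(&v, 2.0);
    assert_eq!(out, vec![2.0f32, 5.0, -8.0]);
    println!("{:?}", out);
}
```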
        
         | eftychis wrote:
         | Seconded. This could be essentially anything and everything
         | else bunched together. Otherwise, we have a compiler or
         | debug-symbol bug on our hands.
        
       ___________________________________________________________________
       (page generated 2024-10-13 22:00 UTC)