[HN Gopher] Kernel optimization with BOLT (binary optimization a...
       ___________________________________________________________________
        
       Kernel optimization with BOLT (binary optimization and layout tool)
        
       Author : chmaynard
       Score  : 124 points
       Date   : 2024-10-31 10:45 UTC (6 days ago)
        
 (HTM) web link (lwn.net)
 (TXT) w3m dump (lwn.net)
        
       | BSDobelix wrote:
       | One can try it out with CachyOS/Arch:
       | 
       | https://cachyos.org/blog/2411-kernel-autofdo/
        
         | knowitnone wrote:
         | I wanted to see what CachyOS is about. In this Phoronix
         | comparison it came in second place to Clear Linux, which is
         | not bad: https://www.phoronix.com/review/cachyos-linux-perf/5
        
         | ndesaulniers wrote:
         | Note: that's AutoFDO + Propeller. This article is about BOLT.
        
       | kardos wrote:
       | Does it work with Intel Fortran-compiled code?
        
         | kijiki wrote:
         | As long as you relink with relocations preserved in the final
         | ELF binary, it should.
        
       | JoelJacobson wrote:
       | Here is another interesting BOLT article, this one on PostgreSQL
       | optimization:
       | 
       | https://vondra.me/posts/playing-with-bolt-and-postgres/
       | 
       | "results are unexpectedly good, in some cases up to 40%"
        
         | pfdietz wrote:
         | That's amazing.
        
       | vsskanth wrote:
       | Does anyone know of a Windows equivalent to BOLT?
        
         | Cieric wrote:
         | Some Google searching brought this up:
         | https://learn.microsoft.com/en-us/cpp/build/profile-guided-o...
         | I'm only reading over it now, but I'm going to test it out a
         | bit when I can.
        
       | OnlyMortal wrote:
       | Back in the day on the Mac, the order of source files in your
       | project would determine locality in the binary.
       | 
       | If memory serves, this was with MPW C or maybe CodeWarrior.
       | 
       | You could see the jump (jmp) instructions use short jumps rather
       | than long ones.
        
         | rurban wrote:
         | This is still relevant. I had great success writing an order
         | optimizer for perl5.
        
         | fsflyer wrote:
         | The Metrowerks profiler and linker worked together to optimize
         | locality in the binary; the focus was on PowerPC code. The
         | linker could generate the static call tree, but the profiler
         | could generate a dynamic call tree of what was actually called.
         | Separating out the cold portions of the call tree into portions
         | of the executable that didn't get paged in was the goal.
         | 
         | I worked on the Profiler and I seem to remember that Microsoft
         | was one of the developers that put a bunch of effort into using
         | this to optimize the Office suite on Mac. I remember the
         | release of Word that used it was snappier.
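
The hot/cold separation described in the comment above can be sketched in a few lines. This is only an illustration of the idea, not the Metrowerks implementation: the function names, the input shapes (a static call graph as the linker might see it, and per-function call counts as a profiler might report them) and the "never called means cold" rule are all assumptions made for the example.

```python
def split_hot_cold(static_calls, dynamic_counts):
    """Partition functions into hot and cold lists.

    static_calls:   dict mapping caller -> list of callees (hypothetical
                    stand-in for the linker's static call tree).
    dynamic_counts: dict mapping function -> observed call count
                    (hypothetical stand-in for the profiler's output).
    """
    # Collect every function that appears anywhere in the static call tree.
    all_funcs = set(static_calls)
    for callees in static_calls.values():
        all_funcs.update(callees)

    # Hot: seen at least once in the profile, most frequently called first
    # (ties broken by name so the result is deterministic).
    hot = sorted(
        (f for f in all_funcs if dynamic_counts.get(f, 0) > 0),
        key=lambda f: (-dynamic_counts[f], f),
    )
    # Cold: reachable statically but never observed at run time.
    cold = sorted(f for f in all_funcs if dynamic_counts.get(f, 0) == 0)
    return hot, cold


def layout(static_calls, dynamic_counts):
    """Return one function order: hot code first, cold code grouped last."""
    hot, cold = split_hot_cold(static_calls, dynamic_counts)
    return hot + cold


# Made-up example data.
static_calls = {
    "main": ["parse", "render", "report_error"],
    "parse": ["report_error"],
    "render": ["draw_glyph"],
    "report_error": [],
}
dynamic_counts = {"main": 1, "parse": 40, "render": 40, "draw_glyph": 9000}

hot, cold = split_hot_cold(static_calls, dynamic_counts)
order = layout(static_calls, dynamic_counts)
```

Grouping the never-executed functions at the end means the pages that hold them may never need to be loaded, which is the effect the comment describes. Real tools use finer-grained heuristics than a simple sort by call count.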
        
       | stephc_int13 wrote:
       | Instruction cache and TLB thrashing is an often overlooked
       | consequence of code bloat and sometimes of overly aggressive
       | micro-benchmark-driven optimization.
       | 
       | Reorganizing the binary is an interesting approach to minimize
       | the cost, but I think that any performance-oriented developer
       | should keep in mind that most projects depend not on a single
       | hot loop but on many systems working together and competing for
       | space in the cache(s).
       | 
       | I generally use -Os instead of -O2 and -O3 in my projects, while
       | trying to reduce code bloat to a minimum for that reason.
        
       ___________________________________________________________________
       (page generated 2024-11-06 23:01 UTC)