[HN Gopher] Kernel optimization with BOLT (binary optimization a...
___________________________________________________________________
Kernel optimization with BOLT (binary optimization and layout tool)
Author : chmaynard
Score : 124 points
Date : 2024-10-31 10:45 UTC (6 days ago)
(HTM) web link (lwn.net)
(TXT) w3m dump (lwn.net)
| BSDobelix wrote:
| One can try it out with CachyOS/Arch:
|
| https://cachyos.org/blog/2411-kernel-autofdo/
| knowitnone wrote:
| I wanted to see what CachyOS is about. In
| https://www.phoronix.com/review/cachyos-linux-perf/5 it came
| second to Clear Linux, which is not bad.
| ndesaulniers wrote:
| Note: that's AutoFDO + Propeller. This article is about BOLT.
| kardos wrote:
| Does it work with code compiled by the Intel Fortran compiler?
| kijiki wrote:
| As long as you relink with relocations preserved in the final
| ELF binary, it should.
| JoelJacobson wrote:
| Here is another interesting BOLT article, this one on PostgreSQL
| optimization:
|
| https://vondra.me/posts/playing-with-bolt-and-postgres/
|
| "results are unexpectedly good, in some cases up to 40%"
| pfdietz wrote:
| That's amazing.
| vsskanth wrote:
| Does anyone know of a Windows equivalent to BOLT?
| Cieric wrote:
| Some Google searching brought this up:
| https://learn.microsoft.com/en-us/cpp/build/profile-guided-o...
| I'm only reading over it now, but I'm going to test it out a
| bit when I can.
| OnlyMortal wrote:
| Back in the day on the Mac, the order of source files in your
| project would determine locality in the binary.
|
| If memory serves, this was with MPW C or maybe CodeWarrior.
|
| You could see the jump (jmp) instructions use short jumps rather
| than long ones.
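To illustrate the short-versus-long jump point above, here is a minimal sketch of why placing code closer together can shrink branch encodings. The sizes are an assumption borrowed from x86 (2 bytes for a jump with a signed 8-bit displacement, 5 bytes for one with a 32-bit displacement) and are used purely for illustration; the classic Mac toolchains targeted other architectures with different encodings. The function name `jump_size` is hypothetical.

```python
def jump_size(displacement: int) -> int:
    """Encoded size in bytes of a relative jump (assumed x86-style sizes)."""
    # Assumption: the short form carries a signed 8-bit displacement.
    short_min, short_max = -128, 127
    if short_min <= displacement <= short_max:
        return 2  # opcode + 8-bit displacement
    return 5      # opcode + 32-bit displacement


# A target placed nearby fits the short form; a distant one does not.
near = jump_size(40)
far = jump_size(40_000)
print(near, far)
```

The sketch only models encoding size; it says nothing about how a particular linker decided which form to emit.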
| rurban wrote:
| This is still relevant. I had great success writing an order
| optimizer for perl5.
| fsflyer wrote:
| The Metrowerks profiler and linker worked together to optimize
| locality in the binary; the focus was on PowerPC code. The
| linker could generate the static call tree, but the profiler
| could generate a dynamic call tree of what was actually called.
| Separating out the cold portions of the call tree into portions
| of the executable that didn't get paged in was the goal.
|
| I worked on the Profiler and I seem to remember that Microsoft
| was one of the developers that put a bunch of effort into using
| this to optimize the Office suite on Mac. I remember the
| release of Word that used it was snappier.
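The idea described above (use a dynamic profile to group hot code and push cold code out of the way so it is not paged in) can be sketched as a toy model. This is not the Metrowerks or BOLT algorithm; the function names, sizes, call counts, threshold, and 4096-byte page size are all hypothetical values chosen for illustration.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def layout(sizes, order):
    """Place functions back to back in the given order; return start offsets."""
    offsets, cursor = {}, 0
    for name in order:
        offsets[name] = cursor
        cursor += sizes[name]
    return offsets


def pages_touched(sizes, offsets, called):
    """Count distinct pages holding any byte of the called functions."""
    pages = set()
    for name in called:
        first = offsets[name] // PAGE_SIZE
        last = (offsets[name] + sizes[name] - 1) // PAGE_SIZE
        pages.update(range(first, last + 1))
    return len(pages)


# Hypothetical function sizes in bytes, listed in source order.
sizes = {
    "init": 3000,
    "parse": 1500,
    "error_report": 3500,
    "render": 1200,
    "debug_dump": 3800,
    "event_loop": 1000,
}
# Hypothetical dynamic profile: call counts observed during one run.
profile = {"event_loop": 12000, "parse": 9000, "render": 7000, "init": 1}

HOT_THRESHOLD = 100  # assumed cutoff between hot and cold
hot = sorted(
    (f for f in sizes if profile.get(f, 0) >= HOT_THRESHOLD),
    key=lambda f: -profile[f],
)
cold = [f for f in sizes if f not in hot]

source_order = list(sizes)
profiled_order = hot + cold  # hot code first, cold code at the end

before = pages_touched(sizes, layout(sizes, source_order), hot)
after = pages_touched(sizes, layout(sizes, profiled_order), hot)
print(f"pages holding hot code: {before} (source order) vs {after} (profiled)")
```

In this made-up example the hot functions are spread over four pages in source order and fit in one page once grouped, which is the effect a profile-driven layout aims for. Real tools also weigh call-graph edges and basic-block order, which this sketch ignores.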
| stephc_int13 wrote:
| Instruction cache and TLB thrashing is an often overlooked
| consequence of code bloat and sometimes of overly aggressive
| micro-benchmark-driven optimization.
|
| Reorganizing the binary is an interesting approach to minimize
| the cost, but I think that any performance-oriented developer
| should keep in mind that most projects depend not on a single
| hot loop but on many systems working together and competing for
| space in the cache(s).
|
| I generally use -Os instead of -O2 and -O3 in my projects, while
| trying to reduce code bloat to a minimum for that reason.
___________________________________________________________________
(page generated 2024-11-06 23:01 UTC)