[HN Gopher] Link Time Optimizations: New Way to Do Compiler Opti...
___________________________________________________________________
Link Time Optimizations: New Way to Do Compiler Optimizations
Author : signa11
Score : 35 points
Date : 2025-05-19 07:48 UTC (2 days ago)
(HTM) web link (johnnysswlab.com)
(TXT) w3m dump (johnnysswlab.com)
| sakex wrote:
| Maybe add the date to the title, because it's hardly new at this
| point
| vsl wrote:
| ...or in 2020 (the year of the article).
| Deukhoofd wrote:
| What do you mean, new? LTO has been in GCC since 2011. It's old
| enough to have a social media account in most jurisdictions.
| jeffbee wrote:
| Pretty sure MSVC ".NET" was doing link-time whole-program
| optimization in 2001.
| andyayers wrote:
| HPUX compilers were doing this back in 1993.
| jeffbee wrote:
| Oh yeah, well ... actually I got nothin'. You win.
|
| I will just throw in some nostalgia for how good that
| compiler was. My college roommate brought an HP pizza box
| that his dad secured from HP, and the way the C compiler
| quoted chapter and verse from ISO C in its error messages
| was impressive.
| abainbridge wrote:
| Or academics in 1986:
| https://dl.acm.org/doi/abs/10.1145/13310.13338
|
| The idea of optimizations running at different stages in
| the build, with different visibility of the whole program,
| was discussed in 1979, but the world was so different back
| then that the discussion seems foreign.
| https://dl.acm.org/doi/pdf/10.1145/872732.806974
| srean wrote:
| Yes and if I remember correctly there used to be Linux distros
| that had all the distro binaries LTO'ed.
| phkahler wrote:
| I tried LTO with Solvespace 4 years ago and got about 15 percent
| better performance:
|
| https://github.com/solvespace/solvespace/issues/972
|
| Build time was terrible, though: a few minutes with LTO vs 30-40
| seconds for a normal full build. Have they done anything to use
| multiple cores for LTO? It only used one core for that.
|
| Also tested OpenMP which was obviously a bigger win. More
| recently I ran the same test after upgrading from an AMD 2400G to
| a 5700G which has double the cores and about 1.5x the IPC. The
| result was a solid 3x improvement so we scale well with cores
| going from 4 to 8.
| wahern wrote:
| Both clang and GCC support multi-core LTO, as does Rust.
| However, you have to partition the code, so the more cores you
| use the less benefit to LTO. Rust partitions by crate by
| default, but it can increase parallelism by partitioning each
| crate. I think "fat LTO" is the term typically used for whole-
| program, or at least in the case of Rust, whole-crate LTO,
| whereas "thin LTO" is what you get when you LTO partitions and
| then link those together normally. For clang and GCC, you can
| either have them automatically partition the code for thin LTO,
| or do it explicitly via your Makefile rules[1].
|
| [1] Interestingly, GCC actually invokes Make internally to
| implement thin LTO, which lets it play nice with GNU Make's job
| control and obey the -j switch.
| WalterBright wrote:
| Link time optimizations were done in the 1980s if I recall
| correctly.
|
| I never tried to implement them, finding it easier and more
| effective for the compiler to simply compile all the source files
| at the same time.
|
| The D compiler is designed to be able to build one object file
| per source file at a time, or one object file which combines all
| of the source files. Most people choose the one object file.
| srean wrote:
| I think MLton does it this way.
|
| http://mlton.org/WholeProgramOptimization
|
| Dynamically linked and dynamically loaded libraries are useful
| though (paid for with their own problems, of course).
| tester756 wrote:
| Yea, generating many object files seems like a weird thing. Maybe
| it was a good thing decades ago, but now?
|
| Because then you need to link them, thus you need some kind of
| linker.
|
| Just generate one output file and skip the linker
| WalterBright wrote:
| I've considered many times doing just that.
| tester756 wrote:
| And what was the result/conclusion of such considerations?
| yencabulator wrote:
| Not maybe. Sufficient RAM for compilation was a serious issue
| back in the day.
| senkora wrote:
| In C++, there is a trick to get this behavior called "unity
| builds", where you include all of your source files into a
| single file and then invoke the compiler on that file.
|
| Of course, being C++, this subtly changes behavior and must be
| done carefully. I like this article that explains the ins and
| outs of using unity builds:
| https://austinmorlan.com/posts/unity_jumbo_build/
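|
| A rough sketch of one such pitfall, assuming two hypothetical
| source files, a.cpp and b.cpp, that each define a file-local
| helper called scale. Compiled separately, the two helpers never
| see each other; pasted into one translation unit, two identical
| file-scope static definitions would be a redefinition error, so
| the sketch wraps each in its own (made-up) namespace:

```cpp
// Sketch only: a.cpp, b.cpp and every name below are hypothetical.
// In a unity build both files end up in ONE translation unit, so
// file-local helpers with the same name have to be kept apart.

// ---- contents of a.cpp ----
namespace a_detail {
// Was a plain file-scope `static int scale(int)` in the separate build.
static int scale(int x) { return x * 2; }
}  // namespace a_detail

int double_it(int x) { return a_detail::scale(x); }

// ---- contents of b.cpp ----
namespace b_detail {
// Same helper name, different body: fine in separate builds,
// a redefinition error in a unity build without the namespaces.
static int scale(int x) { return x * 3; }
}  // namespace b_detail

int triple_it(int x) { return b_detail::scale(x); }
```

| Other differences, such as a macro or using-directive leaking
| from one source file into the next, are quieter because the code
| may still compile.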
| WalterBright wrote:
| > this subtly changes behavior
|
| The D module design ensures that module imports are
| independent of each other and are independent of the
| importer.
| Remnant44 wrote:
| Link time optimization is definitely not new, but it is
| incredibly powerful. I have personally had cases where functions
| from a static library could not be inlined without LTO, and that
| cut performance in half.
|
| It's easy to dismiss a basic article like this, but it's
| basically a discovery that every junior engineer will make, and
| it's useful to talk about those too!
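|
| A minimal sketch of that situation, assuming a hypothetical
| three-component dot product that lives in a static library
| (vec.cpp) and a hot loop in another file (caller.cpp). The file
| and function names are made up, and both parts are shown in one
| block so it compiles as is:

```cpp
#include <cstddef>

// ---- vec.cpp (hypothetical; would be archived into a static library) ----
// Tiny function: the call overhead can rival the cost of the body.
float dot3(const float* a, const float* b) {
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

// ---- caller.cpp (hypothetical) ----
// In a separate build the compiler sees only this declaration, so it
// has to emit a real call on every loop iteration.
float dot3(const float* a, const float* b);

// Sums the dot products of n consecutive 3-vectors from a and b.
float sum_dots(const float* a, const float* b, std::size_t n) {
    float total = 0.0f;
    for (std::size_t i = 0; i < n; ++i) {
        total += dot3(a + 3 * i, b + 3 * i);
    }
    return total;
}

// Possible compile lines for the split version (GCC shown):
//   g++ -O2 -c vec.cpp caller.cpp          per-file optimization only
//   g++ -O2 -flto -c vec.cpp caller.cpp    LTO; pass -flto again when linking
```

| With -flto the optimizer gets a chance to inline dot3 into the
| loop at link time; whether it does, and how much that helps,
| depends on the compiler and the code.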
| srean wrote:
| The inline keyword should really have been intended for call
| sites rather than definitions.
|
| Perhaps language designers thought that if a function needs to
| be inlined everywhere, it would lead to verbose code. In any
| case, it's a weak hint that compilers generally treat with much
| disdain.
| lilyball wrote:
| ffmpeg has a lot of assembly code in it, so it's a very odd
| choice of program to use for this kind of test as LTO is
| presumably not going to do anything to the assembly.
___________________________________________________________________
(page generated 2025-05-21 23:01 UTC)