hngopher.com

       Link Time Optimizations: New Way to Do Compiler Optimizations
        
       Author : signa11
       Score  : 35 points
       Date   : 2025-05-19 07:48 UTC (2 days ago)
        
 (HTM) web link (johnnysswlab.com)
        
       | sakex wrote:
       | Maybe add the date to the title, because it's hardly new at this
       | point
        
         | vsl wrote:
         | ...or in 2020 (the year of the article).
        
       | Deukhoofd wrote:
       | What do you mean, new? LTO has been in GCC since 2011. It's old
       | enough to have a social media account in most jurisdictions.
        
         | jeffbee wrote:
         | Pretty sure MSVC ".NET" was doing link-time whole-program
         | optimization in 2001.
        
           | andyayers wrote:
           | HPUX compilers were doing this back in 1993.
        
             | jeffbee wrote:
             | Oh yeah, well ... actually I got nothin'. You win.
             | 
             | I will just throw in some nostalgia for how good that
             | compiler was. My college roommate brought an HP pizza box
             | that his dad secured from HP, and the way the C compiler
             | quoted chapter and verse from ISO C in its error messages
             | was impressive.
        
             | abainbridge wrote:
             | Or academics in 1986:
             | https://dl.acm.org/doi/abs/10.1145/13310.13338
             | 
             | The idea of optimizations running at different stages in
             | the build, with different visibility of the whole program,
             | was discussed in 1979, but the world was so different back
             | then that the discussion seems foreign.
             | https://dl.acm.org/doi/pdf/10.1145/872732.806974
        
         | srean wrote:
         | Yes and if I remember correctly there used to be Linux distros
         | that had all the distro binaries LTO'ed.
        
       | phkahler wrote:
       | I tried LTO with Solvespace 4 years ago and got about 15 percent
       | better performance:
       | 
       | https://github.com/solvespace/solvespace/issues/972
       | 
       | Build time was terrible, taking a few minutes vs 30-40 seconds for
       | a full build. Have they done anything to use multi-core for LTO?
       | It only used one core for that.
       | 
       | Also tested OpenMP which was obviously a bigger win. More
       | recently I ran the same test after upgrading from an AMD 2400G to
       | a 5700G which has double the cores and about 1.5x the IPC. The
       | result was a solid 3x improvement so we scale well with cores
       | going from 4 to 8.
        
         | wahern wrote:
         | Both clang and GCC support multi-core LTO, as does Rust.
         | However, you have to partition the code, so the more cores you
         | use the less benefit to LTO. Rust partitions by crate by
         | default, but it can increase parallelism by partitioning each
         | crate. I think "fat LTO" is the term typically used for whole-
         | program, or at least in the case of Rust, whole-crate LTO,
         | whereas "thin LTO" is what you get when you LTO partitions and
         | then link those together normally. For clang and GCC, you can
         | either have them automatically partition the code for thin LTO,
         | or do it explicitly via your Makefile rules[1].
         | 
         | [1] Interestingly, GCC actually invokes Make internally to
         | implement thin LTO, which lets it play nice with GNU Make's job
         | control and obey the -j switch.
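
The fat/thin distinction above maps onto compiler flags roughly as in the sketch below. The helper `lto_flags` is hypothetical, written only for illustration; the flags it returns are the commonly documented ones for GCC, clang and rustc, but which of them a given toolchain version accepts is an assumption to check against its documentation.

```python
def lto_flags(compiler, mode="thin", jobs=None):
    """Return command-line flags that enable LTO (hypothetical helper).

    mode is "thin" (partitioned, parallel) or "fat" (whole program).
    jobs is an optional parallel job count, used only for GCC here.
    """
    if mode not in ("thin", "fat"):
        raise ValueError(f"unknown LTO mode: {mode}")
    if compiler == "gcc":
        if mode == "fat":
            # Disable partitioning so the whole program is optimized as one unit.
            return ["-flto", "-flto-partition=none"]
        # GCC partitions the program and optimizes the partitions in parallel.
        return [f"-flto={jobs}" if jobs else "-flto=auto"]
    if compiler == "clang":
        return ["-flto=thin"] if mode == "thin" else ["-flto=full"]
    if compiler == "rustc":
        return ["-C", f"lto={mode}"]
    raise ValueError(f"unknown compiler: {compiler}")


if __name__ == "__main__":
    for compiler in ("gcc", "clang", "rustc"):
        print(compiler, lto_flags(compiler, "thin"), lto_flags(compiler, "fat"))
```

The same flags generally have to be passed at both the compile and the link step, since the link step is where the cross-module optimization actually runs.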
        
       | WalterBright wrote:
       | Link time optimizations were done in the 1980s if I recall
       | correctly.
       | 
       | I never tried to implement them, finding it easier and more
       | effective for the compiler to simply compile all the source files
       | at the same time.
       | 
       | The D compiler is designed to be able to build one object file
       | per source file at a time, or one object file which combines all
       | of the source files. Most people choose the one object file.
        
         | srean wrote:
         | I think MLton does it this way.
         | 
         | http://mlton.org/WholeProgramOptimization
         | 
         | Dynamically linked and dynamically loaded libraries are useful
         | though (paid for with their own problems, of course).
        
         | tester756 wrote:
         | Yea, generating many object files seems like a weird thing. Maybe
         | it was a good thing decades ago, but now?
         | 
         | Because then you need to link them, thus you need some kind of
         | linker.
         | 
         | Just generate one output file and skip the linker.
        
           | WalterBright wrote:
           | I've considered many times doing just that.
        
             | tester756 wrote:
             | And what was the result/conclusion of such considerations?
        
           | yencabulator wrote:
           | Not maybe. Sufficient RAM for compilation was a serious issue
           | back in the day.
        
         | senkora wrote:
         | In C++, there is a trick to get this behavior called "unity
         | builds", where you include all of your source files into a
         | single file and then invoke the compiler on that file.
         | 
         | Of course, being C++, this subtly changes behavior and must be
         | done carefully. I like this article that explains the ins and
         | outs of using unity builds:
         | https://austinmorlan.com/posts/unity_jumbo_build/
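
A minimal sketch of the unity-build trick described above: generate a single file that `#include`s every source file, then hand only that file to the compiler. The function name and the file names are hypothetical, and a real project would likely let its build system do this instead.

```python
def make_unity_source(sources):
    """Return the text of a unity translation unit that includes each source.

    Order is preserved and duplicates are dropped, because include order can
    change behavior once all sources share one translation unit.
    """
    lines = ["// Generated unity build file; do not edit by hand."]
    for src in dict.fromkeys(sources):  # de-duplicate, keep first-seen order
        lines.append(f'#include "{src}"')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    unity = make_unity_source(["src/main.cpp", "src/solver.cpp", "src/render.cpp"])
    print(unity, end="")
```

The "subtly changes behavior" caveat comes from the fact that file-local names (for example `static` functions or anonymous-namespace symbols with the same name in two sources) now live in one translation unit and can collide.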
        
           | WalterBright wrote:
           | > this subtly changes behavior
           | 
           | The D module design ensures that module imports are
           | independent of each other and are independent of the
           | importer.
        
       | Remnant44 wrote:
       | Link time optimization is definitely not new, but it is
       | incredibly powerful - I have personally had situations where
       | being unable to inline functions from a static library without
       | LTO cut performance in half.
       | 
       | It's easy to dismiss a basic article like this, but it's
       | basically a discovery that every junior engineer will make, and
       | it's useful to talk about those too!
        
         | srean wrote:
         | The inline keyword should really have been intended for call
         | sites rather than definitions.
         | 
         | Perhaps language designers thought that if a function needs to
         | be inlined everywhere, it would lead to verbose code. In any
         | case, it's a weak hint that compilers generally treat with much
         | disdain.
        
       | lilyball wrote:
       | ffmpeg has a lot of assembly code in it, so it's a very odd
       | choice of program to use for this kind of test as LTO is
       | presumably not going to do anything to the assembly.
        
       ___________________________________________________________________
       (page generated 2025-05-21 23:01 UTC)