[HN Gopher] Removing global state from LLD, the LLVM linker
       ___________________________________________________________________
        
       Removing global state from LLD, the LLVM linker
        
       Author : ingve
       Score  : 71 points
       Date   : 2024-11-18 06:38 UTC (3 days ago)
        
 (HTM) web link (maskray.me)
 (TXT) w3m dump (maskray.me)
        
       | beeforpork wrote:
       | Why not use thread_local instead of passing a param everywhere?
       | What's the drawback there?
        
         | mrkeen wrote:
         | Thread-local is way too magical for me. I wouldn't want to
         | debug a system that made use of it.
         | 
         | Also, if you pass a param, then it can be shared.
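
To make the sharing point concrete, here is a minimal C++ sketch that contrasts a thread_local counter with state passed as a parameter. The names Ctx, recordSymbol and countWithSharedCtx are invented for this illustration and are not taken from LLD. The only thing it tries to show is that a context passed by reference can be shared by several worker threads, while each thread only ever sees its own copy of a thread_local.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical stand-in for a linker's per-invocation state (not LLD's type).
struct Ctx {
    std::atomic<int> symbolsSeen{0};
};

// Thread-local alternative: every thread gets its own, separate counter.
thread_local int tlsSymbolsSeen = 0;

// The context is an explicit parameter, so the caller chooses which one to use.
void recordSymbol(Ctx &ctx) {
    ctx.symbolsSeen.fetch_add(1, std::memory_order_relaxed);
    ++tlsSymbolsSeen;  // only visible to the calling thread
}

// Starts `workers` threads that all record into one shared Ctx and
// returns the total that the shared context ends up with.
int countWithSharedCtx(int workers, int perWorker) {
    assert(workers >= 0 && perWorker >= 0);
    Ctx ctx;
    std::vector<std::thread> pool;
    for (int w = 0; w < workers; ++w) {
        pool.emplace_back([&ctx, perWorker] {
            for (int i = 0; i < perWorker; ++i)
                recordSymbol(ctx);
        });
    }
    for (std::thread &t : pool)
        t.join();
    return ctx.symbolsSeen.load();
}
```

Called as countWithSharedCtx(4, 100) from a thread that never calls recordSymbol itself, the shared context collects all 400 updates, while the caller's own tlsSymbolsSeen stays at 0.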
        
           | geocar wrote:
           | > Thread-local is way too magical for me. I wouldn't want to
           | debug a system that made use of it.
           | 
           | There's a perfectly cromulent register just begging to be
           | used; the circuitry has already been paid for, generating
           | heat whether you like it or not, what magic are you afraid of
           | here?
           | 
           | > Also, if you pass a param, then it can be shared.
           | 
           | Maybe, but if you design for sharing you'll never use, your
           | program might be bigger and slower as a result. Sometimes
           | that matters.
        
             | cesarb wrote:
             | > > Thread-local is way too magical for me.
             | 
             | > There's a perfectly cromulent register just begging to be
             | used; [...] what magic are you afraid of here?
             | 
             | Most of the magic is not when using the thread-local
             | variable, but when allocating it. When you declare a
             | "static __thread char *p", how do you know that for
             | instance this is located at the 123rd word of the per-
             | thread area? What if that declaration is in a dynamic
             | library, which was loaded late (dlopen) into the process?
             | What about threads which were started before that dynamic
             | library was loaded, and therefore did not have enough space
             | in their per-thread area for that thread-local variable,
             | when they call into code which references it? What happens
             | if the thread-local variable has an initializer?
             | 
             | The documentation at
             | https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html links
             | to an 81-page document describing four TLS access models,
             | and that's just for Unix-style ELF; Windows platforms have
             | their own complexities (which IIRC includes a per-process
             | maximum of 64 or 1088 TLS slots, with slots above the first
             | 64 being handled in a slightly different way).
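
The user-visible half of those questions can be sketched in a few lines of C++. This is only an illustration under simple assumptions (one executable, no dlopen, a constant initializer), and the names counter, Observed and bumpInNewThread are made up for it. It shows that a new thread starts from the initializer rather than from the caller's current value, and that its copy lives at a different address.

```cpp
#include <cassert>
#include <thread>

// Each thread gets its own `counter`, initialized to 42 for that thread.
thread_local int counter = 42;

struct Observed {
    int value;                 // what the new thread saw after its increment
    bool sameAddressAsCaller;  // whether its copy aliases the caller's copy
};

// Increments `counter` on a fresh thread and reports what that thread saw.
Observed bumpInNewThread() {
    const int *callerCopy = &counter;  // address of the caller's copy
    Observed seen{};
    std::thread worker([&seen, callerCopy] {
        counter += 1;  // refers to the worker's copy, not the caller's
        seen = Observed{counter, &counter == callerCopy};
    });
    worker.join();
    assert(seen.value == 43);  // 42 from the initializer, plus one
    return seen;
}
```

If the caller sets its own counter to 7 before calling bumpInNewThread, the worker still reports 43 and the caller's value is still 7 afterwards. The harder cases raised above (late-loaded libraries, threads that already exist) are handled by the runtime and are not modelled here.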
        
               | maccard wrote:
               | The initialisation model in c++ is totally and utterly
               | broken and indecipherable. That doesn't stop me from
               | doing vector<int> foo = {1,2, 3};
        
               | intelVISA wrote:
               | Avoiding thread locals due to dynamic libraries being bad
               | is justified but still doesn't feel like the right
               | tradeoff.
        
               | AshamedCaptain wrote:
               | When you declare a `static char *p;', how do you even
               | know in which address of memory it is going to end up ??
               | How do you know what will happen if another compilation
               | unit declares another variable of the same name? Another
               | static library? Another dynamic library? What about
               | initialization, what about other constructors that may
               | read memory before main() runs? What about injected
               | threads that are started before that? Madness, I tell
               | you, absolute and utter madness.
        
         | rwmj wrote:
         | Certain linker operations can be multi-threaded (not sure if
         | this is specifically true for LLD). Particularly LTO in the GNU
         | toolchain, but also there's been a lot of effort recently to
         | make linking faster by actually having it use all available
         | cores.
        
         | ComputerGuru wrote:
         | thread_local is usually considered the hack to make unthreaded
         | code littered with static variables useable from multiple
         | thread contexts. It has overhead and reduces the compiler's
         | ability to optimize the code as compared to when parameters are
         | used.
         | 
         | Also, until very recently, a lot of compilers/platforms were
         | unable to handle thread_local variables larger than a pointer,
         | making it difficult to retrofit a lot of old code.
        
           | o11c wrote:
           | It's worth noting that `thread_local` does reduce register
           | pressure. Unfortunately, almost no languages actually
           | natively support the scoping that sane use of this requires.
        
         | malkia wrote:
         | I use thread_local a lot, but until recently a delay-loaded
         | DLL with thread_local would not have worked on Windows, and
         | the fix that is in place today is costly. That may not be the
         | typical case, but it shows that supporting such a feature can
         | create a lot of cost elsewhere.
         | 
         | Another pitfall with these is with thread-stealing concurrent
         | schedulers - e.g. your worker thread now waits on something,
         | and the scheduler decides to reuse the current thread for
         | another worker - what is the meaning of thread_local there?
         | 
         | Another one would be coroutines (though I haven't used them
         | much in C/C++).
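
The scheduler pitfall can be imitated with plain std::thread, without a real work-stealing scheduler. In the sketch below, currentTask, beginTask, observeTask and the two driver functions are hypothetical names chosen for the example. The "scheduling decisions" are simply hard-coded, so this is a model of the problem, not of any particular task library.

```cpp
#include <cassert>
#include <string>
#include <thread>

// Hypothetical per-thread "current task" marker, as a task system might keep.
thread_local std::string currentTask = "none";

void beginTask(const std::string &name) { currentTask = name; }
std::string observeTask() { return currentTask; }

// Case 1: task A parks and the scheduler reuses the same OS thread for
// task B. B observes the value that A left behind.
std::string otherTaskOnSameThread() {
    std::string seenByB;
    std::thread worker([&seenByB] {
        beginTask("A");           // task A starts, then waits on something
        seenByB = observeTask();  // task B runs on the same thread meanwhile
    });
    worker.join();
    return seenByB;
}

// Case 2: task A starts on one thread and is resumed on another.
// The second thread has its own, freshly initialized copy.
std::string sameTaskOnOtherThread() {
    std::string seenAfterResume;
    std::thread first([] { beginTask("A"); });
    first.join();
    std::thread second([&seenAfterResume] { seenAfterResume = observeTask(); });
    second.join();
    return seenAfterResume;
}
```

In the first case the unrelated task sees "A". In the second case the resumed task sees "none". Neither result is what code written with one-task-per-thread in mind would expect.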
        
       | high_na_euv wrote:
       | I've always struggled to understand the need for a linker.
       | 
       | Like, you could easily write your compiler so that it does not
       | have to rely on such machinery.
       | 
       | Meanwhile, linkers add complexity and decrease the quality of
       | error messages (in C++).
        
         | mschuster91 wrote:
         | > Like, you could easily write your compiler to do not have to
         | rely on such machinery
         | 
         | You need a linker as soon as you are dealing with either
         | multiple languages in one project (say, C++ and ASM) or
         | other libraries that you include.
        
         | Joker_vD wrote:
         | Separate compilation. Of course, if your compiler is fast
         | enough to rebuild the whole universe in 6 seconds and then rest
         | on the seventh -- an approach Wirth advocated in one of his
         | papers about an implementation of a Pascal system -- you won't
         | need a linker. But most compilers are not that fast.
         | 
         | Besides, there is more than one programming language, so that's
         | something we have to deal with somehow.
         | 
         | And to be fair, merging modules in the compiler, as you go by,
         | while not that difficult, is just annoying. If you link them
         | properly together, into big amalgamated text/rodata/data
         | sections, then you need to apply relocations (and have them in
         | the first place). If you just place them next to each other,
         | then you have to organize the inter-module calls via some moral
         | equivalent of GOT/PLT. In any case, all this logic really
         | doesn't have much to do with code generation proper, it's
         | administrivia -- and the logic for dealing with it has already
         | been written for you and packed in the so-called "link editor".
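
For readers who have not seen relocations up close, the "amalgamate, then apply relocations" step can be sketched as a toy in C++. Everything here is an assumption made for illustration: Module, Reloc, linkModules and readU32 are invented names, the only relocation kind is "write a 4-byte absolute address in host byte order", and no real object format is modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Toy model only: not ELF, COFF, or any real object format.
struct Reloc {
    std::size_t offset;  // where in the module's text a 4-byte address goes
    std::string symbol;  // which symbol's address to write there
};

struct Module {
    std::vector<std::uint8_t> text;           // code bytes
    std::map<std::string, std::size_t> defs;  // symbol -> offset in text
    std::vector<Reloc> relocs;                // slots to patch
};

// Concatenates all text sections at `base`, then patches every slot.
std::vector<std::uint8_t> linkModules(const std::vector<Module> &mods,
                                      std::uint32_t base) {
    std::vector<std::uint8_t> image;
    std::map<std::string, std::uint32_t> address;
    std::vector<Reloc> pending;

    // Pass 1: lay modules out back to back and record final addresses.
    for (const Module &m : mods) {
        const std::size_t start = image.size();
        image.insert(image.end(), m.text.begin(), m.text.end());
        for (const auto &def : m.defs)
            address[def.first] =
                base + static_cast<std::uint32_t>(start + def.second);
        for (const Reloc &r : m.relocs)
            pending.push_back(Reloc{start + r.offset, r.symbol});
    }

    // Pass 2: every symbol now has an address, so fill in the slots.
    for (const Reloc &r : pending) {
        assert(r.offset + sizeof(std::uint32_t) <= image.size());
        const std::uint32_t value = address.at(r.symbol);  // throws if undefined
        std::memcpy(&image[r.offset], &value, sizeof value);
    }
    return image;
}

// Reads back a 4-byte value in host byte order, matching linkModules.
std::uint32_t readU32(const std::vector<std::uint8_t> &image,
                      std::size_t offset) {
    assert(offset + sizeof(std::uint32_t) <= image.size());
    std::uint32_t value = 0;
    std::memcpy(&value, &image[offset], sizeof value);
    return value;
}
```

With an 8-byte module that references "helper" at offset 4, followed by a 4-byte module that defines "helper" at offset 2, linking at base 0x1000 writes 0x100A into the slot. The two passes are the point of the sketch: addresses are only known once everything has been laid out.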
        
           | uptownfunk wrote:
           | What are the bottlenecks that make this so slow?
        
         | ChadNauseam wrote:
         | When I first came to C++ from Rust I was surprised by the
         | regularity of linker errors. Rust programs go through a linker
         | as well, but I don't think I've ever seen a linker error,
         | except when doing exotic things far outside of my typical day-
         | to-day.
         | 
         | I guess rustc detects the situations in which the linker would
         | throw an error and then throws its own error preemptively. It
         | leads to a much better user experience than the C++ one, since
         | the error messages produced by the linker are always
         | unnecessarily terrible.
        
           | Joker_vD wrote:
           | > I guess rustc detects the situations in which the linker
           | would throw an error and then throws its own error
           | preemptively.
           | 
           | Pretty much. The crucial difference between C and Rust which
           | enables Rust to do this sort of detection is that in Rust,
           | the extern things are anchored in modules (crates? whatever),
           | and so when you import things, you have to say _where_ you
           | are importing them from.
           | 
           |     extern void *magic_init(int);
           |     extern void *magic_stuff(void*, const char*, int);
           |     extern void magic_fini(void*);
           | 
           | versus
           | 
           |     use crate_of_magic::{init, stuff, fini};
           | 
           | It even enables one to actually type-check against the
           | imported crate during the compilation (IIRC if Cargo can't
           | locate the imported crate to look into it, it will refuse to
           | build the project), as opposed to hoping that the headers
           | you've included are correctly describing the object file
           | you'll be linking against.
        
           | 0x457 wrote:
           | The only time I get linker errors in Rust is when it's linking
           | some dynamic library written in C.
        
         | wyldfire wrote:
         | All but the most trivial programs require a linker to resolve
         | references among object files. And while "int main() {}" might
         | seem like a trivial C program, it's not (by that definition, at
         | least).
         | 
         | Your favorite toolchain will often include archives and objects
         | that you might take for granted like crt0.o, init.o, fini.o,
         | libgcc/clang_rt.builtins and more.
         | 
         | The compiler's design is simplified by not having to resolve
         | references among symbols. The assembler can do this for
         | references within a section and linkers can do it among
         | sections. Linkers might have to add trampolines/thunks for
         | relocations that span a distance longer than the opcode could
         | reach. Loaders do this symbol resolution and relocation at
         | runtime.
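
The trampoline/thunk decision comes down to a range check, which can be sketched in C++ as follows. The functions fitsSigned and needsThunk are hypothetical helpers written for this example, and the branch encoding is deliberately generic: a signed immediate of immBits bits that stores the byte displacement divided by scale. Real linkers also have to place the thunks and re-check ranges afterwards, which this sketch ignores.

```cpp
#include <cassert>
#include <cstdint>

// True when `value` fits in a signed two's-complement field of `bits` bits.
bool fitsSigned(std::int64_t value, unsigned bits) {
    assert(bits >= 1 && bits <= 63);
    const std::int64_t limit = std::int64_t{1} << (bits - 1);
    return value >= -limit && value < limit;
}

// Hypothetical branch encoding: the instruction stores
// (target - source) / scale in a signed field of `immBits` bits.
// If that does not fit, the branch has to go through a thunk.
bool needsThunk(std::uint64_t from, std::uint64_t to,
                unsigned immBits, std::int64_t scale) {
    assert(scale > 0);
    const std::int64_t disp = static_cast<std::int64_t>(to - from);
    assert(disp % scale == 0);  // this sketch assumes aligned targets
    return !fitsSigned(disp / scale, immBits);
}
```

With immBits = 26 and scale = 4, which corresponds to the +/-128 MiB reach of an AArch64 B/BL instruction, a target 128 MiB minus 4 bytes ahead is still reachable directly, while one exactly 128 MiB ahead is not.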
        
       | mingodad wrote:
       | I did the same for tinycc here https://github.com/mingodad/tinycc
       | and used the NetBeans IDE, which has great refactoring options
       | for C/C++/Java.
       | 
       | Benchmarking the reentrant result showed it to be around 5%
       | slower.
       | 
       | Now I'm trying to redo it, but this time scripting the
       | refactoring using sparse https://github.com/lucvoo/sparse to
       | parse and using its error messages with line/column to guide
       | the refactoring. I already have an initial script that performs
       | some initial transformations and is repeatable, but more work
       | needs to be done, mainly to enhance/extend the info that sparse
       | provides while parsing the code.
        
         | mingodad wrote:
         | Also, for C/C++ binaries with debug info, gdb is one of the
         | ingredients used to show where globals exist and how many
         | there are:
         | 
         | gdb -batch -ex "info variables" -ex quit --args binary-to-examine
        
       ___________________________________________________________________
       (page generated 2024-11-21 23:01 UTC)