[HN Gopher] Removing global state from LLD, the LLVM linker
___________________________________________________________________
Removing global state from LLD, the LLVM linker
Author : ingve
Score : 71 points
Date : 2024-11-18 06:38 UTC (3 days ago)
(HTM) web link (maskray.me)
(TXT) w3m dump (maskray.me)
| beeforpork wrote:
| Why not use thread_local instead of passing a param everywhere?
| What's the drawback there?
| mrkeen wrote:
| Thread-local is way too magical for me. I wouldn't want to
| debug a system that made use of it.
|
| Also, if you pass a param, then it can be shared.
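|
| A minimal sketch of the sharing point, assuming a hypothetical
| Ctx type and helper names that are not taken from LLD: a
| thread_local object is private to each thread, while an object
| passed as a parameter can be the same one on every thread.

```cpp
#include <thread>

// Hypothetical context type, for illustration only.
struct Ctx {
    int errors = 0;
};

// Option A: each thread implicitly gets its own copy.
thread_local Ctx tls_ctx;

void report_tls() { ++tls_ctx.errors; }

// Option B: the caller decides which context is used.
void report_param(Ctx &ctx) { ++ctx.errors; }

// What the calling thread can observe after a helper thread ran.
struct Observed {
    int via_tls;
    int via_param;
};

Observed run_on_helper_thread() {
    Ctx shared;
    tls_ctx.errors = 0;  // reset the calling thread's own copy
    std::thread worker([&shared] {
        report_tls();          // touches the worker's private copy
        report_param(shared);  // touches the caller's object
    });
    worker.join();
    return {tls_ctx.errors, shared.errors};
}
```

| After run_on_helper_thread() the caller's tls_ctx is untouched,
| while the explicitly passed object carries the worker's update.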
| geocar wrote:
| > Thread-local is way too magical for me. I wouldn't want to
| debug a system that made use of it.
|
| There's a perfectly cromulent register just begging to be
| used; the circuitry has already been paid for, generating
| heat whether you like it or not, what magic are you afraid of
| here?
|
| > Also, if you pass a param, then it can be shared.
|
| Maybe, but if you design for sharing you'll never use, your
| program might be bigger and slower as a result. Sometimes
| that matters.
| cesarb wrote:
| > > Thread-local is way too magical for me.
|
| > There's a perfectly cromulent register just begging to be
| used; [...] what magic are you afraid of here?
|
| Most of the magic is not when using the thread-local
| variable, but when allocating it. When you declare a
| "static __thread char *p", how do you know that, for
| instance, it is located at the 123rd word of the
| per-thread area? What if that declaration is in a dynamic
| library which was loaded late (dlopen) into the process?
| What about threads which were started before that dynamic
| library was loaded, and therefore did not have enough space
| in their per-thread area for that thread-local variable,
| when they call into code which references it? What happens
| if the thread-local variable has an initializer?
|
| The documentation at
| https://gcc.gnu.org/onlinedocs/gcc/Thread-Local.html links
| to an 81-page document describing four TLS access models,
| and that's just for Unix-style ELF; Windows platforms have
| their own complexities (which IIRC include a per-process
| maximum of 64 or 1088 TLS slots, with slots above the first
| 64 being handled in a slightly different way).
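|
| The initializer question can be made concrete with a small
| sketch. The names make_value, value and count_inits are
| hypothetical. The C++ rule assumed here is that a thread_local
| variable with a dynamic initializer is initialized separately in
| every thread that uses it; exactly when that happens is up to the
| implementation.

```cpp
#include <atomic>
#include <thread>

std::atomic<int> init_calls{0};

// Hypothetical initializer with a visible side effect.
int make_value() {
    init_calls.fetch_add(1);
    return 42;
}

// Dynamic initializer: runs once in each thread that uses `value`.
thread_local int value = make_value();

int read_value() { return value; }

// Reads `value` from `n` fresh threads, one after another, and
// returns how many additional times the initializer ran.
int count_inits(int n) {
    int before = init_calls.load();
    for (int i = 0; i < n; ++i) {
        std::thread t([] { (void)read_value(); });
        t.join();
    }
    return init_calls.load() - before;
}
```

| The runtime has to arrange for that per-thread initialization
| somewhere, which is part of the machinery described above.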
| maccard wrote:
| The initialisation model in C++ is totally and utterly
| broken and indecipherable. That doesn't stop me from
| doing vector<int> foo = {1, 2, 3};
| intelVISA wrote:
| Avoiding thread locals due to dynamic libraries being bad
| is justified but still doesn't feel like the right
| tradeoff.
| AshamedCaptain wrote:
| When you declare a `static char *p;`, how do you even
| know in which address of memory it is going to end up?
| How do you know what will happen if another compilation
| unit declares another variable of the same name? Another
| static library? Another dynamic library? What about
| initialization, what about other constructors that may
| read memory before main() runs? What about injected
| threads that are started before that? Madness, I tell
| you, absolute and utter madness.
| rwmj wrote:
| Certain linker operations can be multi-threaded (not sure if
| this is specifically true for LLD). Particularly LTO in the GNU
| toolchain, but also there's been a lot of effort recently to
| make linking faster by actually having it use all available
| cores.
| ComputerGuru wrote:
| thread_local is usually considered the hack to make unthreaded
| code littered with static variables useable from multiple
| thread contexts. It has overhead and reduces the compiler's
| ability to optimize the code as compared to when parameters are
| used.
|
| Also, until very recently, a lot of compilers/platforms were
| unable to handle thread_local variables larger than a pointer,
| making it difficult to retrofit a lot of old code.
| o11c wrote:
| It's worth noting that `thread_local` does reduce register
| pressure. Unfortunately, almost no languages actually
| natively support the scoping that sane use of this requires.
| malkia wrote:
| I use thread_local a lot, but until recently a delay-loaded DLL
| with thread_local would not have worked on Windows, and the fix
| that is in place today is costly. That may not be the typical
| case, but it shows that support for such a feature can create a
| lot of cost elsewhere.
|
| Another pitfall with these is with thread-stealing concurrent
| schedulers - e.g. your worker thread now waits on something,
| and the scheduler decides to reuse the current thread for
| another worker - what is the meaning of thread_local there?
|
| Another one would be coroutines (though haven't used them a lot
| in C/C++).
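|
| A single-threaded sketch of the scheduler pitfall, assuming a
| hypothetical wait_for() that stands in for a blocking point where
| the scheduler runs another task on the same thread. No real
| scheduler API is modelled here.

```cpp
#include <functional>

// Hypothetical per-thread slot that tasks use to remember who they are.
thread_local int current_task_id = 0;

// Stand-in for a blocking call. A thread-reusing scheduler may run
// other work on this very thread before the caller continues.
void wait_for(const std::function<void()> &stolen_work) {
    stolen_work();
}

// Task A records its id, blocks, and then trusts the slot again.
int task_a() {
    current_task_id = 1;
    wait_for([] {
        current_task_id = 2;  // task B, run on task A's thread
    });
    return current_task_id;   // task A now sees task B's id
}
```

| Here task_a() returns 2 rather than 1: the slot belongs to the
| thread, not to the task that happened to be running on it.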
| high_na_euv wrote:
| I've always struggled to understand the need for a linker.
|
| Like, you could easily write your compiler so that it does not
| have to rely on such machinery.
|
| Meanwhile, linkers add complexity and decrease the quality of
| error messages (in C++).
| mschuster91 wrote:
| > Like, you could easily write your compiler so that it does not
| have to rely on such machinery
|
| You need a linker as soon as you are dealing with either
| multiple languages in one project (say, C++ and ASM) or if you
| include other libraries.
| Joker_vD wrote:
| Separate compilation. Of course, if your compiler is fast
| enough to rebuild the whole universe in 6 seconds and then rest
| on the seventh -- an approach Wirth advocated in one of his
| papers about an implementation of a Pascal system -- you won't
| need a linker. But most compilers are not that fast.
|
| Besides, there is more than one programming language, so that's
| something we have to deal with somehow.
|
| And to be fair, merging modules in the compiler, as you go by,
| while not that difficult, is just annoying. If you link them
| properly together, into big amalgamated text/rodata/data
| sections, then you need to apply relocations (and have them in
| the first place). If you just place them next to each other,
| then you have to organize the inter-module calls via some moral
| equivalent of GOT/PLT. In any case, all this logic really
| doesn't have much to do with code generation proper, it's
| administrativia -- and the logic for dealing with it has already
| been written for you and packed in the so-called "link editor".
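|
| A toy sketch of the "amalgamate, then apply relocations" path
| described above. The Module and Reloc types and link_modules()
| are hypothetical and far simpler than any real object format:
| one text section per module, and every relocation is an absolute
| 32-bit address written in host byte order.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <map>
#include <string>
#include <vector>

// Toy relocation: "write the final address of `symbol` at `offset`".
struct Reloc {
    std::size_t offset;  // byte offset inside this module's text
    std::string symbol;  // name whose address is not known yet
};

// Toy module: raw bytes, the symbols it defines, and its relocations.
struct Module {
    std::vector<std::uint8_t> text;
    std::map<std::string, std::size_t> defs;  // symbol -> offset in text
    std::vector<Reloc> relocs;
};

// Assumes every relocation slot lies fully inside its module's text.
// An undefined symbol makes addr.at() throw std::out_of_range.
std::vector<std::uint8_t> link_modules(const std::vector<Module> &mods,
                                       std::uint32_t base) {
    std::vector<std::uint8_t> out;
    std::vector<std::size_t> starts;
    std::map<std::string, std::uint32_t> addr;

    // Pass 1: lay the modules out back to back and assign addresses.
    for (const Module &m : mods) {
        starts.push_back(out.size());
        for (const auto &def : m.defs)
            addr[def.first] =
                base + static_cast<std::uint32_t>(out.size() + def.second);
        out.insert(out.end(), m.text.begin(), m.text.end());
    }

    // Pass 2: patch each recorded slot now that addresses are final.
    for (std::size_t i = 0; i < mods.size(); ++i) {
        for (const Reloc &r : mods[i].relocs) {
            std::uint32_t value = addr.at(r.symbol);
            std::memcpy(&out[starts[i] + r.offset], &value, sizeof value);
        }
    }
    return out;
}
```

| The two passes are the point: a reference can only be patched
| after every module has been placed, which is bookkeeping rather
| than code generation.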
| uptownfunk wrote:
| What are the bottlenecks that make this so slow?
| ChadNauseam wrote:
| When I first came to C++ from Rust I was surprised by the
| regularity of linker errors. Rust must be compiled with a
| linker as well but I don't think I've ever seen a linker error,
| except when doing exotic things far outside of my typical day-
| to-day.
|
| I guess rustc detects the situations in which the linker would
| throw an error and then throws its own error preemptively. It
| leads to a much better user experience than the C++ one, since
| the error messages produced by the linker are always
| unnecessarily terrible.
| Joker_vD wrote:
| > I guess rustc detects the situations in which the linker
| would throw an error and then throws its own error
| preemptively.
|
| Pretty much. The crucial difference between C and Rust which
| enables Rust to do this sort of detection is that in Rust,
| the extern things are anchored in modules (crates? whatever),
| and so when you import things, you have to say _where_ you
| are importing them from.
|
|     extern void *magic_init(int);
|     extern void *magic_stuff(void*, const char*, int);
|     extern void magic_fini(void*);
|
| versus
|
|     use crate_of_magic::{init, stuff, fini};
|
| It even enables one to actually type-check against the
| imported crate during the compilation (IIRC if Cargo can't
| locate the imported crate to look into it, it will refuse to
| build the project), as opposed to hoping that the headers
| you've included are correctly describing the object file
| you'll be linking against.
| 0x457 wrote:
| Only time I get linker errors in rust is when it's linking
| some dynamic library written in C.
| wyldfire wrote:
| All but the most trivial programs require a linker to resolve
| references among object files. And while "int main() {}" might
| seem like a trivial C program, it's not (by that definition, at
| least).
|
| Your favorite toolchain will often include archives and objects
| that you might take for granted like crt0.o, init.o, fini.o,
| libgcc/clang_rt.builtins and more.
|
| The compiler's design is simplified by not having to resolve
| references among symbols. The assembler can do this for
| references within a section and linkers can do it among
| sections. Linkers might have to add trampolines/thunks for
| relocations that span a distance longer than the opcode could
| reach. Loaders do this symbol resolution and relocation at
| runtime.
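|
| A small sketch of the range check behind the trampoline/thunk
| remark, assuming a hypothetical branch instruction whose signed
| byte displacement is `bits` wide. Real encodings differ per
| architecture and often scale the displacement.

```cpp
#include <cstdint>

// True if `value` fits in a signed field of `bits` bits (1 <= bits <= 63).
bool fits_signed(std::int64_t value, unsigned bits) {
    std::int64_t limit = std::int64_t{1} << (bits - 1);
    return value >= -limit && value < limit;
}

// True if a direct branch from `from` to `to` cannot be encoded, so
// the linker would have to route it through a nearby thunk.
bool needs_thunk(std::uint64_t from, std::uint64_t to, unsigned bits) {
    // Unsigned subtraction wraps, so backward branches come out negative.
    std::int64_t disp = static_cast<std::int64_t>(to - from);
    return !fits_signed(disp, bits);
}
```

| With a 28-bit field, a target exactly 2^27 bytes ahead is just
| out of reach, while one 4 bytes closer can be encoded directly.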
| mingodad wrote:
| I did the same for tinycc here https://github.com/mingodad/tinycc
| and used Netbeans IDE that has great refactoring options for
| C/C++/Java.
|
| Benchmarking the reentrant result showed it to be around 5%
| slower.
|
| Now I'm trying to redo it, but this time scripting the
| refactoring, using sparse https://github.com/lucvoo/sparse to
| parse and using its error messages with line/column to
| guide the refactoring. I already have an initial script that
| performs some initial transformations and is repeatable, but more
| work needs to be done, mainly to enhance/extend the info that
| sparse provides while parsing the code.
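|
| The shape of such a refactoring, as a minimal sketch. The names
| g_diagnostics, CompileState, warn_global and warn are hypothetical
| and are not taken from tinycc or LLD; the sketch only shows
| file-scope state being moved into an object the caller owns.

```cpp
#include <string>
#include <vector>

// Before: file-scope state. Two compilations in one process would
// append to the same list.
static std::vector<std::string> g_diagnostics;

void warn_global(const std::string &msg) {
    g_diagnostics.push_back(msg);
}

// After: the same state lives in a caller-owned object, and every
// function that needs it takes it as an explicit parameter.
struct CompileState {
    std::vector<std::string> diagnostics;
};

void warn(CompileState &st, const std::string &msg) {
    st.diagnostics.push_back(msg);
}
```

| Two CompileState objects keep their diagnostics apart, at the
| price of threading one more argument through every call.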
| mingodad wrote:
| Also, for C/C++ binaries with debug info, gdb is one of the
| ingredients used to show where and how many globals exist:
|
|     gdb -batch -ex "info variables" -ex quit --args binary-to-examine
___________________________________________________________________
(page generated 2024-11-21 23:01 UTC)