[HN Gopher] Driving Compilers (2023)
       ___________________________________________________________________
        
       Driving Compilers (2023)
        
       Author : misonic
       Score  : 81 points
       Date   : 2025-05-05 02:17 UTC (20 hours ago)
        
 (HTM) web link (fabiensanglard.net)
 (TXT) w3m dump (fabiensanglard.net)
        
       | lynx97 wrote:
       | Nitpick: Almost all Hello World C examples are wrong. printf is
       | for when you need to use a format string. Hello World doesn't.
       | Besides:
       | 
       | > puts() writes the string s and a trailing newline to stdout.
       | 
       | #include <stdio.h>
       | int main() { puts("Hello World!"); }
        
         | PhilipRoman wrote:
         | Eh, it compiles down to the same thing with optimizations
         | enabled:
         | 
         | https://godbolt.org/z/zcqa4Txen
         | 
         | But I agree, using printf for constant strings is one step away
         | from doing printf(x) with a non-constant x, which is a big no-no:
         | any '%' in x gets interpreted as a conversion specifier.
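         | A minimal C sketch of the safe alternative. The function names
         | are hypothetical, and snprintf into a buffer stands in for
         | printf only so the result can be checked. Passing the untrusted
         | string as the argument to a constant "%s" format keeps any '%'
         | it contains from being interpreted.
         |
```c
#include <stdio.h>

/* Formats an untrusted string into buf without interpreting it.
   The format is the constant "%s", so any '%' characters inside
   `untrusted` are copied literally. Returns the length snprintf
   would have produced (not counting the terminating NUL). */
int format_untrusted(char *buf, size_t n, const char *untrusted)
{
    /* UNSAFE alternative: snprintf(buf, n, untrusted);
       a "%s" or "%n" inside `untrusted` would make snprintf read (or
       write) through arguments that were never passed: undefined
       behavior, and the root of format-string vulnerabilities. */
    return snprintf(buf, n, "%s", untrusted);
}

/* When printing straight to a stream, fputs never looks for '%'. */
void print_untrusted(const char *untrusted)
{
    fputs(untrusted, stdout);
}
```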
        
           | Joker_vD wrote:
           | Useless bit of compiler optimization trivia: the "this
           | printf() is equivalent to puts()" optimization seems to work
           | by looking for the '%' in the format string, not by counting
           | whether there is only one argument to printf(), e.g. if you
           | add 42 as a second argument to the printf() -- which is
           | absolutely legal and required by the standard to Work as
           | Intended(tm) -- the resulting binary still uses puts().
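           | The legality half of this is easy to show in a sketch. The
           | name hello_with_extra_arg is hypothetical, and snprintf stands
           | in for printf so the output can be checked. C says that when
           | the format is exhausted while arguments remain, the excess
           | arguments are evaluated but otherwise ignored. Compilers will
           | typically warn (GCC and Clang via -Wformat-extra-args). Whether
           | a given compiler still turns the printf() version into puts()
           | is best checked by compiling with -O2 -S and reading the call.
           |
```c
#include <stdio.h>

/* The trailing 42 has no matching conversion in the format string.
   That is legal: it is evaluated and then ignored. Expect a
   -Wformat-extra-args warning, not an error. */
int hello_with_extra_arg(char *buf, size_t n)
{
    return snprintf(buf, n, "Hello World!\n", 42);
}
```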
        
         | unwind wrote:
         | I agree, but I have to point out that if you're gonna be like
         | that, then you should be explicit about your final
         | return 0;
        
           | tavianator wrote:
           | The C standard (since C99) says that `main()` has an implicit
           | `return 0`, you don't need to write it explicitly.
        
             | 01HNNWZ0MV43FF wrote:
             | Sure but are we teaching good habits to students, or are we
             | golfing?
        
         | indigoabstract wrote:
         | The example is kind of pedantic, but I think a linter might be
         | able to catch it.
        
         | Mbwagava wrote:
         | Eh, not a fan of puts. It doesn't add any value over write or
         | printf and it should be named "printLine".
         | 
         | But if you're still using raw libc in 2025 that's a problem you
         | willingly opted into. I have zero sympathy.
        
         | tom_ wrote:
         | But "Hello world\n" is a format string. The format strings with
         | no % chars in them are the best type of format string! They're
         | nearly impossible to get wrong!
        
         | david2ndaccount wrote:
         | That's not the point of hello world. It's not to be as small a
         | valid program as possible. It's to be a small program that also
         | exercises the needed functionality for using the tool usefully.
         | All of the exercises following that hello world need formatted
         | text, so introducing puts would just add confusion and wouldn't
         | verify that you have a working printf.
        
       | Timwi wrote:
       | I share the frustration the author describes. When I started out
       | programming as a child, I used Turbo Pascal, but I was aware of
       | Turbo C and that more people used that than Pascal. Nevertheless,
       | I couldn't really wrap my head around C at the time, and it was
       | partly due to linker errors that I couldn't understand; and it
       | seemed that Turbo Pascal just didn't use a linker, so it was
       | easier to understand and tinker with at age 9.
       | 
       | It's intriguing to think how different my experience could have
       | been if educational material at the time had focused as much on
       | full explanations of the compiler+linker process, including
       | example error conditions, as it did on teaching the language.
       | 
       | 30 years later, I like to claim that I have a reasonably workable
       | understanding of how compilers work, but I'm still nebulous on
       | how linkers do what they do. I'm much more comfortable with
       | higher-level compilers such as C# that compile to a VM bytecode
       | (IL) and don't worry about linkers.
        
         | virgilp wrote:
         | Linkers pretty much map code and data sections to memory, and
         | in doing so are able to replace symbolic names (like global
         | variables or function names) with numbers. They may also
         | completely drop some things that are not needed (e.g.
         | code/files in a library that is never referenced).
         | 
         | I'm over-simplifying and also it's a bit incorrect, because
         | there's also the loader that does a lot of the same work that
         | linkers do, when loading the program into memory. So linkers
         | don't actually produce the final image - but really, they're
         | rather "simple" things (for some definition of "simple").
         | 
         | The hard-to-understand linker errors are typically caused by
         | the compiler, not the linker (it's the compiler that
         | speculatively chooses to use a symbol with a long and funny
         | name, thinking that it'll later be provided by <somebody>, when
         | in fact the linker later finds out that no library or object
         | file actually provided said symbol; and then for the linker to
         | give you a decent error message, it needs to have a pretty good
         | understanding of what the compiler was actually trying to do -
         | i.e. to know implementation details of the compiler that
         | otherwise would not concern it at all).
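         | A toy C model of where such an error comes from. The struct
         | toy_object and check_undefined are hypothetical; real linkers
         | work on the symbol tables inside object files. Every object
         | file defines some symbols and references others, and any
         | reference that no input defines gets reported.
         |
```c
#include <stdio.h>
#include <string.h>

/* Toy model, not a real object-file format: each object file lists
   the symbols it defines and the symbols it references. Both lists
   are NULL-terminated. */
struct toy_object {
    const char *name;
    const char *const *defines;
    const char *const *references;
};

/* Returns 1 if any of the objects defines `sym`, else 0. */
static int is_defined(const struct toy_object *objs, size_t count,
                      const char *sym)
{
    for (size_t i = 0; i < count; i++)
        for (const char *const *d = objs[i].defines; *d != NULL; d++)
            if (strcmp(*d, sym) == 0)
                return 1;
    return 0;
}

/* Prints a message for every reference that no object defines and
   returns how many such undefined references were found. */
int check_undefined(const struct toy_object *objs, size_t count)
{
    int missing = 0;
    for (size_t i = 0; i < count; i++)
        for (const char *const *r = objs[i].references; *r != NULL; r++)
            if (!is_defined(objs, count, *r)) {
                fprintf(stderr, "%s: undefined reference to `%s'\n",
                        objs[i].name, *r);
                missing++;
            }
    return missing;
}
```
         |
         | For example, if main.o references _Z6helperi (the Itanium C++
         | ABI mangled name for helper(int)) and puts, and only puts is
         | defined by another input, check_undefined reports _Z6helperi
         | and returns 1.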
        
           | tester756 wrote:
           | >The hard-to-understand linker errors are typically caused by
           | the compiler, not the linker (it's the compiler that
           | speculatively chooses to use a symbol with a long and funny
           | name, thinking that it'll later be provided by <somebody>,
           | when in fact the linker later finds out that no library or
           | object file actually provided said symbol; and then for the
           | linker to give you a decent error message, it needs to have a
           | pretty good understanding of what the compiler was actually
           | trying to do - i.e. to know implementation details of the
           | compiler that otherwise would not concern it at all).
           | 
           | So... maybe let's avoid having the linker as a separate,
           | external tool and just let the compiler perform linking?
        
             | boricj wrote:
             | The linker stitches object files together, regardless of
             | their origin. If a compiler directly outputs a finalized
             | artifact, then it would be impossible to add code written
             | in other programming languages into the mix unless the
             | compiler also doubles as a linker.
        
               | tester756 wrote:
               | > then it would be impossible to add code written in
               | other programming languages into
               | 
               | Is this really so important that we can't drop this
               | requirement?
        
               | raddan wrote:
               | You might be surprised how often multi-language programs
               | appear. Basically all of modern-day Python, for starters.
               | But also a number of important numerical libraries for C
               | are actually written in Fortran.
        
         | antonvs wrote:
         | C# and Java still do linking; it just happens dynamically at
         | runtime. That's part of why startup time is slower in those
         | languages, and why performance can be less predictable.
         | 
         | The main difference between linkers for native binaries and
         | linking in IL-based languages is that native binary linking
         | involves resolving memory addresses at build time. In the
         | object files that are being linked, memory addresses are
         | typically 0-relative to whatever section they're in within that
         | file. When you combine a bunch of object files together, you
         | have to adjust the addresses so they can live together in the
         | same address space. Object files A and B might both use
         | addresses 0-10, but when they're linked together, the linker
         | will arrange it so that e.g. A uses 0-10 and B uses 11-21.
         | That's just a bit of simple offset arithmetic. And if both
         | reference the same non-local symbol, it will be arranged so
         | that both refer to the same memory address.
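         | The offset arithmetic described above fits in a few lines of C.
         | This is a toy model with hypothetical names (struct
         | toy_section, layout, relocate). Each object's section starts
         | at 0, the linker hands out base addresses back to back, and a
         | symbol's final address is its section's base plus its local
         | offset.
         |
```c
#include <stddef.h>

/* Toy model: one section per object file, laid out back to back. */
struct toy_section {
    size_t size;  /* bytes in this object's section */
    size_t base;  /* absolute start address, filled in by layout() */
};

/* Assigns base addresses starting at `start` and returns the first
   address past the last section. */
size_t layout(struct toy_section *secs, size_t count, size_t start)
{
    size_t next = start;
    for (size_t i = 0; i < count; i++) {
        secs[i].base = next;  /* 0-relative offsets now map to base + offset */
        next += secs[i].size;
    }
    return next;
}

/* Converts an offset that was 0-relative in its object file into the
   final address. */
size_t relocate(const struct toy_section *sec, size_t local_offset)
{
    return sec->base + local_offset;
}
```
         |
         | With two 11-byte sections laid out from address 0, A gets 0-10
         | and B gets 11-21, so offset 10 in B relocates to 21.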
         | 
         | The IL-based languages retain all the relevant symbol
         | information at runtime, which allows for a lot of flexibility
         | at the cost of some performance - i.e. runtime lookups. This is
         | typically optimized by caching the address after the first
         | lookup, or if JIT compilation is occurring, embedding the
         | relocated addresses in generated code.
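         | The cache-after-first-lookup idea can be sketched in C with a
         | function pointer that starts out pointing at a resolver and is
         | patched on first use. This is roughly how lazy binding works
         | for native shared libraries (via the PLT/GOT on ELF systems);
         | how a particular IL runtime does it is not shown here. All
         | names are hypothetical, and the string table stands in for
         | runtime metadata.
         |
```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef int (*unary_fn)(int);

/* The "real" function, plus a name -> address table standing in for
   the metadata a runtime keeps around. */
static int add_one(int x) { return x + 1; }

static const struct { const char *name; unary_fn fn; } symbol_table[] = {
    { "add_one", add_one },
};

int slow_lookups = 0;  /* counts trips through the slow path */

/* Slow path: linear search by name. */
static unary_fn lookup(const char *name)
{
    slow_lookups++;
    for (size_t i = 0; i < sizeof symbol_table / sizeof symbol_table[0]; i++)
        if (strcmp(symbol_table[i].name, name) == 0)
            return symbol_table[i].fn;
    return NULL;
}

static int resolve_add_one(int x);

/* The call slot: starts at the resolver, later holds add_one. */
unary_fn add_one_slot = resolve_add_one;

/* First call: resolve by name, cache the address in the slot, then
   forward the call. Later calls through the slot skip the lookup. */
static int resolve_add_one(int x)
{
    unary_fn fn = lookup("add_one");
    if (fn == NULL)
        abort();  /* a real runtime would report a linking error here */
    add_one_slot = fn;
    return fn(x);
}
```
         |
         | Calling add_one_slot(1) and then add_one_slot(41) returns 2 and
         | 42, with only one trip through lookup().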
         | 
         | The linker UX issues you ran into were mostly a function of the
         | state of the art at the time, though. Languages like Go and
         | Rust do native linking nowadays in a way that users barely
         | notice. IL-based languages had a better linking UX partly
         | because they were forced to - linking problems at runtime do
         | still occur, e.g. "class not found", but if linking in general
         | had been a common problem for _users_ at runtime instead of
         | developers at build time, those languages would have struggled
         | to get adoption.
        
           | neonsunset wrote:
           | Go and Rust are subject to linking too; Rust just happens to
           | have a saner system that deals with it under the hood. It
           | also goes through the same tooling that C and C++ do, and the
           | resulting object files may also need to be linked before
           | producing a binary. Java's and .NET's loading systems are
           | different: the JVM loads at class granularity based on the
           | classpath, whilst .NET uses assemblies. Java, to my knowledge,
           | is moving towards modules, which are similar, a couple of
           | decades later (partly to improve its startup latency). .NET's
           | assembly system was made to directly address the pains of
           | header/source file compilation and linking issues that were
           | well understood even back in the late 90s.
        
             | pjmlp wrote:
             | Java modules have nothing to do with that. Rather, not all
             | packages are supposed to be public; sub-packages are a way to
             | have clean implementations, but given the granularity, many
             | developers end up relying on internals that were designed
             | only to be consumed by the public APIs.
             | 
             | .NET assemblies suffer from the same problem, unless you use
             | tricks like the _InternalsVisibleTo_ attribute.
             | 
             | During the .NET 1.0 days there was the idea of having
             | components, filling a role similar to the one Java modules
             | have come to fulfill, but it never took off. The idea was
             | also confusing, as many developers assumed it related to COM
             | when they heard "components" alongside .NET.
             | 
             | https://learn.microsoft.com/en-us/dotnet/framework/app-domai...
        
           | pjmlp wrote:
           | It also happens at compile time if AOT is used.
           | 
           | Go doesn't add much to the way Turbo
           | Pascal/Delphi/Ada/Modula-2,... linkers already work.
           | 
           | The main problem with languages like C and C++ is the
           | prevalence of the UNIX linker model.
        
         | Narishma wrote:
         | > It's intriguing to think how different my experience could
         | have been if educational material at the time had focused as
         | much on full explanations of the compiler+linker process,
         | including example error conditions, as it did on teaching the
         | language.
         | 
         | Did you not read the manuals that came with Turbo C or Pascal?
         | They explain all those things. They taught both the language
         | and the tools. For example:
         | https://archive.org/details/bitsavers_borlandturVersion5.0Us...
         | 
         | Microsoft tools back then also came with extensive high quality
         | manuals.
        
         | pjmlp wrote:
         | The main difference is that languages that aren't C or C++
         | usually have the freedom to live outside the UNIX linker model,
         | so they have much richer linker tooling.
         | 
         | You will easily find the C# model you praise in Object Pascal,
         | Delphi, Modula-2, Eiffel, Oberon, and many other compiled
         | languages that have their own compiler toolchain, without
         | needing object files that look like they were generated by a C
         | compiler.
        
         | dragontamer wrote:
         | Linkers become abundantly clear when you write an OS from
         | scratch.
         | 
         | Hardware has very peculiar rules for how it loads. The old
         | floppy boot process would only load the first sector (512
         | bytes); after that, it's that 512-byte code block's job to load
         | the rest of the code (often called the 2nd-stage bootloader)
         | and run it.
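         | A sketch of the constraint that first sector imposes. The names
         | are hypothetical, and it ignores the BIOS parameter block a FAT
         | floppy also carries. A boot sector is 512 bytes whose last two
         | bytes are the 0x55 0xAA signature most BIOSes check, so stage-1
         | code gets at most 510 bytes; anything bigger has to move into
         | a second stage.
         |
```c
#include <stddef.h>
#include <string.h>

#define SECTOR_SIZE 512
#define STAGE1_MAX (SECTOR_SIZE - 2)  /* last 2 bytes: boot signature */

/* Builds a boot sector image from stage-1 code. The BIOS copies this
   sector to address 0x7C00 and jumps to it. Returns 0 on success, or
   -1 if the code does not fit and must move into a second stage. */
int build_boot_sector(unsigned char out[SECTOR_SIZE],
                      const unsigned char *code, size_t len)
{
    if (len > STAGE1_MAX)
        return -1;
    memset(out, 0, SECTOR_SIZE);  /* pad unused bytes with zeros */
    memcpy(out, code, len);
    out[SECTOR_SIZE - 2] = 0x55;  /* signature checked by most BIOSes */
    out[SECTOR_SIZE - 1] = 0xAA;
    return 0;
}
```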
         | 
         | So writing this makes it super obvious what linkers do. At
         | first you hardcode everything at fixed addresses. But then a
         | function grows and no longer fits.
         | 
         | So now you keep track of functions and their lengths, as well
         | as a few holes where your global variables go.
         | 
         | And then different .c files may want different global (or
         | static) variables. So now you need to somehow add the lengths
         | of all data segments across all your .c files together.
         | 
         | And then suddenly you understand linkers, and just use ld and
         | ELF files.
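         | That bookkeeping can be sketched in C. The struct unit,
         | align_up and link_units are hypothetical; a real linker takes
         | the sizes from the object files and the placement rules from a
         | linker script. All .text from every file goes first, then all
         | .data, each piece rounded up to an alignment boundary, and
         | every file learns where its code and globals ended up.
         |
```c
#include <stddef.h>

/* Hypothetical per-file summary: sizes of code and initialized data
   produced by compiling one .c file. */
struct unit {
    size_t text_size, data_size;
    size_t text_base, data_base;  /* filled in by link_units() */
};

/* Rounds x up to a multiple of a (a must be nonzero). */
static size_t align_up(size_t x, size_t a)
{
    return (x + a - 1) / a * a;
}

/* Places every unit's .text back to back starting at `origin`, then
   every unit's .data after that, aligning each piece to `align`
   bytes. Returns the first address past the last piece. */
size_t link_units(struct unit *u, size_t n, size_t origin, size_t align)
{
    size_t at = origin;
    for (size_t i = 0; i < n; i++) {
        at = align_up(at, align);
        u[i].text_base = at;
        at += u[i].text_size;
    }
    for (size_t i = 0; i < n; i++) {
        at = align_up(at, align);
        u[i].data_base = at;
        at += u[i].data_size;
    }
    return at;
}
```
         |
         | For two files with 10 and 6 bytes of code and 3 and 5 bytes of
         | data, 4-byte alignment from address 0 puts the code at 0 and 12
         | and the data at 20 and 24, ending at 29.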
         | 
         | --------
         | 
         | It's a bit of a trial by fire. But not really??? There are
         | super simple computers out there called microcontrollers with
         | just 200-page manuals describing everything.
         | 
         | Writing a bootloader for some simple Atmel AVR chip is perfect
         | for this learning experience. The ATmega328P is the classic,
         | but there are better, more modern chips.
         |
         | But the ATmega328P was popular 15 years ago and is still
         | manufactured in large numbers today.
        
           | raddan wrote:
           | I wrote a bootloader for an iPod Mini when I was an
           | undergrad, and honestly, I don't think that would have helped
           | me understand linking the first time around. With 20 years of
           | hindsight and lots more hacking experience I can see the
           | connection, but it's not an obvious one.
        
             | dragontamer wrote:
             | Writing a bootloader in one file is easy enough and will
             | avoid the need for a linker.
             | 
             | The issue is when you have two, three or four .c files that
             | are compiled as separate units that then need to be
             | combined together.
             | 
             | Today, AVR chips and assembly work perfectly fine with
             | .elf objects. But you will likely need to mess with linker
             | scripts to get your bootloader working across different
             | setups.
             |
             | Especially if you have an element of dynamic boot loading
             | (e.g. a bootloader program that later loads more
             | application code off a microSD card, or over UART or I2C).
             | 
             | I'm really not sure how far you can get with this toy
             | project without running into immediate linker issues (or
             | linker scripts).
        
       | stef-13013 wrote:
       | Really nice, thanks !!
        
       ___________________________________________________________________
       (page generated 2025-05-05 23:02 UTC)