[HN Gopher] Driving Compilers (2023)
___________________________________________________________________
Driving Compilers (2023)
Author : misonic
Score : 81 points
Date : 2025-05-05 02:17 UTC (20 hours ago)
(HTM) web link (fabiensanglard.net)
(TXT) w3m dump (fabiensanglard.net)
| lynx97 wrote:
| Nitpick: Almost all Hello World C examples are wrong. printf is
| for when you need to use a format string. Hello World doesn't.
| Besides:
|
| > puts() writes the string s and a trailing newline to stdout.
|
| int main() { puts("Hello World!"); }
| PhilipRoman wrote:
| Eh, it compiles down to the same thing with optimizations
| enabled:
|
| https://godbolt.org/z/zcqa4Txen
|
| But I agree, using printf for constant strings is one step away
| from doing printf(x) which is a big no-no.
| Joker_vD wrote:
| Useless bit of compiler optimizations trivia: the "this
| printf() is equivalent to puts()" optimization seems to work
| by looking for the '%' in the format string, not by counting
| whether there is only one argument to printf(), e.g. if you
| add 42 as a second argument to the printf() -- which is
| absolutely legal and required by the standard to Work as
| Intended(tm) -- the resulting binary still uses puts().
| unwind wrote:
| I agree, but I have to point out that if you're gonna be like
| that, then you should be explicit about your final
| return 0;
| tavianator wrote:
| The C standard (since C99) says that `main()` has an implicit
| `return 0`, so you don't need to write it explicitly.
| 01HNNWZ0MV43FF wrote:
| Sure but are we teaching good habits to students, or are we
| golfing?
| indigoabstract wrote:
| The example is kind of pedantic, but I think a linter might be
| able to catch it.
| Mbwagava wrote:
| Eh, not a fan of puts. It doesn't add any value over write or
| printf and it should be named "printLine".
|
| But if you're still using raw libc in 2025 that's a problem you
| willingly opted into. I have zero sympathy.
| tom_ wrote:
| But "Hello world\n" is a format string. The format strings with
| no % chars in them are the best type of format string! They're
| nearly impossible to get wrong!
| david2ndaccount wrote:
| That's not the point of hello world. It's not to be as small a
| valid program as possible. It's to be a small program that also
| exercises the needed functionality for using the tool usefully.
| All of the exercises following that hello world need formatted
| text, so introducing puts would just add confusion and wouldn't
| verify that you have a working printf.
| Timwi wrote:
| I share the frustration the author describes. When I started out
| programming as a child, I used Turbo Pascal, but I was aware of
| Turbo C and that more people used that than Pascal. Nevertheless,
| I couldn't really wrap my head around C at the time, and it was
| partly due to linker errors that I couldn't understand; and it
| seemed that Turbo Pascal just didn't use a linker, so it was
| easier to understand and tinker with at age 9.
|
| It's intriguing to think how different my experience could have
| been if educational material at the time had focused as much on
| full explanations of the compiler+linker process, including
| example error conditions, as it did on teaching the language.
|
| 30 years later, I like to claim that I have a reasonably workable
| understanding of how compilers work, but I'm still nebulous on
| how linkers do what they do. I'm much more comfortable with
| higher-level compilers such as C# that compile to a VM bytecode
| (IL) and don't worry about linkers.
| virgilp wrote:
| Linkers pretty much map data sections to memory, and in doing
| so are able to replace symbolic names (like global variables,
| or goto targets) with numbers. They may also completely drop
| some things that are not needed (e.g. code/files in a library
| that is never referenced).
|
| I'm over-simplifying and also it's a bit incorrect, because
| there's also the loader that does a lot of the same work that
| linkers do, when loading the program in memory. So linkers
| don't actually produce the final image - but really, they're
| rather "simple" things (for some definition of "simple").
|
| The hard-to-understand linker errors are typically caused by
| the compiler, not the linker (it's the compiler that
| speculatively chooses to use a symbol with a long and funny
| name, thinking that it'll later be provided by <somebody>, when
| in fact the linker later finds out that no library or object
| file actually provided said symbol; and then for the linker to
| give you a decent error message, it needs to have a pretty good
| understanding of what the compiler was actually trying to do -
| i.e. to know implementation details of the compiler that
| otherwise would not concern it at all).
| tester756 wrote:
| >The hard-to-understand linker errors are typically caused by
| the compiler, not the linker (it's the compiler that
| speculatively chooses to use a symbol with a long and funny
| name, thinking that it'll later be provided by <somebody>,
| when in fact the linker later finds out that no library or
| object file actually provided said symbol; and then for the
| linker to give you a decent error message, it needs to have a
| pretty good understanding of what the compiler was actually
| trying to do - i.e. to know implementation details of the
| compiler that otherwise would not concern it at all).
|
| So... maybe let's avoid having linker as another/external
| tool and just let compiler perform linking
| boricj wrote:
| The linker stitches object files together, regardless of
| their origin. If a compiler directly outputs a finalized
| artifact, then it would be impossible to add code written
| in other programming languages into the mix unless the
| compiler also doubles as a linker.
| tester756 wrote:
| > then it would be impossible to add code written in
| other programming languages into
|
| Is this really that important that we cannot skip this
| requirement?
| raddan wrote:
| You might be surprised how often multi-language programs
| appear. Basically all of modern day Python for starters.
| But also a number of important numerical libraries for C
| are actually written in Fortran.
| antonvs wrote:
| C# and Java still do linking, it just happens dynamically at
| runtime. That's part of why startup time is slower in those
| languages, and why performance can be less predictable.
|
| The main difference between linkers for native binaries and
| linking in IL-based languages is that native binary linking
| involves resolving memory addresses at build time. In the
| object files that are being linked, memory addresses are
| typically 0-relative to whatever section they're in within that
| file. When you combine a bunch of object files together, you
| have to adjust the addresses so they can live together in the
| same address space. Object file A and B both might use
| addresses 0-10, but when they're linked together, the linker
| will arrange it so that e.g. A uses 0-10 and B uses 11-21.
| That's just a bit of simple offset arithmetic. And if both
| reference the same non-local symbol, it will be arranged so
| that both refer to the same memory address.
|
| The IL-based languages retain all the relevant symbol
| information at runtime, which allows for a lot of flexibility
| at the cost of some performance - i.e. runtime lookups. This is
| typically optimized by caching the address after the first
| lookup, or if JIT compilation is occurring, embedding the
| relocated addresses in generated code.
|
| The linker UX issues you ran into were mostly a function of the
| state of the art at the time, though. Languages like Go and
| Rust do native linking nowadays in a way that users barely
| notice. IL-based languages had a better linking UX partly
| because they were forced to - linking problems at runtime do
| still occur, e.g. "class not found", but if linking in general
| had been a common problem for _users_ at runtime instead of
| developers at build time, those languages would have struggled
| to get adoption.
| neonsunset wrote:
| Go and Rust are subject to linking too, Rust just happens to
| have a saner system which deals with it under the hood. It
| also goes through the same tooling C and C++ do and the
| subsequent object files may also need to be linked before
| producing a binary. Java's and .NET's loading systems differ:
| the JVM loads at class granularity based on the classpath,
| whilst .NET uses assemblies. Java, to my knowledge, has been
| moving towards modules, which are similar, a couple of decades
| later (also to improve its startup latency). .NET's assembly
| system was made to directly address the pains of header/source
| file compilation and linking, issues that were well understood
| even back in the late 90s.
| pjmlp wrote:
| Java modules have nothing to do with that. Rather, not all
| packages are supposed to be public; some sub-packages exist
| only as a way to keep implementations clean, but given the
| granularity, many developers end up relying on internals
| that were meant to be used only through public APIs.
|
| .NET assemblies suffer from the same, unless you make use
| of some tricks like the _InternalsVisibleTo_ attribute.
|
| During the .NET 1.0 days there was the idea to have
| components, for a role similar to the one Java modules have
| come to fulfill, but it never took off, and the idea was
| confusing, as many developers thought it related to COM
| when they heard "components" alongside .NET.
|
| https://learn.microsoft.com/en-us/dotnet/framework/app-
| domai...
| pjmlp wrote:
| It also happens at compile time if AOT is used.
|
| Go doesn't add much to the way Turbo
| Pascal/Delphi/Ada/Modula-2,... linkers already work.
|
| The main problem with languages like C and C++ is the
| prevalence of the UNIX linker model.
| Narishma wrote:
| > It's intriguing to think how different my experience could
| have been if educational material at the time had focused as
| much on full explanations of the compiler+linker process,
| including example error conditions, as it did on teaching the
| language.
|
| Did you not read the manuals that came with Turbo C or Pascal?
| They explain all those things. They taught both the language
| and the tools. For example:
| https://archive.org/details/bitsavers_borlandturVersion5.0Us...
|
| Microsoft tools back then also came with extensive high quality
| manuals.
| pjmlp wrote:
| The main difference is that languages that aren't C or C++
| usually have the freedom to live outside the UNIX linker model,
| thus they have much richer linker tooling.
|
| That C# model you praise, you will find it easily in Object
| Pascal, Delphi, Modula-2, Eiffel, Oberon, and many other
| compiled languages that have their own compiler toolchain,
| without depending on object files that look like they were
| generated by a C compiler.
| dragontamer wrote:
| Linkers become abundantly clear when you write an OS from
| scratch.
|
| Hardware has very peculiar rules for how it loads. The old
| floppy bootloader would only load the first sector (512 bytes),
| and after that it's that 512-byte code block's job to finish
| loading the code and running it (often called the 2nd stage
| bootloader).
|
| So writing this makes it super obvious what linkers do. At
| first you hardcode everything to set addresses. But then a
| function grows and no longer fits.
|
| So now you have functions + their lengths, as well as a few
| holes for where your global variables go.
|
| And then different .c files may want different global (or
| static) variables. So now you need to somehow add the lengths
| of all data segments across all your .c files together.
|
| And then suddenly you understand Linkers, and just use LD / Elf
| files.
|
| --------
|
| It's a bit of a trial by fire. But not really??? There are
| super simple computers out there called MicroControllers with
| just 200 page manuals describing everything.
|
| Writing a bootloader for some simple Atmel AVR chip is perfect
| for this learning experience. ATMega328p is the classic but
| there are better more modern chips.
|
| But ATMega328p was popular 15 years ago and still is
| manufactured in large numbers today
| raddan wrote:
| I wrote a bootloader for an iPod Mini when I was an
| undergrad, and honestly, I don't think that would have helped
| me understand linking the first time around. With 20 years of
| hindsight and lots more hacking experience I can see the
| connection, but it's not an obvious one.
| dragontamer wrote:
| Writing a bootloader in one file is easy enough and will
| avoid the need of a linker.
|
| The issue is when you have two, three or four .c files that
| are compiled as separate units that then need to be
| combined together.
|
| Today, AVR chips and assembly work perfectly fine with
| .elf objects. But you will likely need to mess with linker
| scripts to get your bootloader working across different
| setups.
|
| Especially if you have an element of dynamic boot loading
| (ex: bootloader program that later continues to load more
| Application code off of a MicroSD card or UART or over I2C
| comms).
|
| I'm really not sure how far you can get with this toy
| project without running into immediate linker issues (or
| linker scripts).
| stef-13013 wrote:
| Really nice, thanks !!
___________________________________________________________________
(page generated 2025-05-05 23:02 UTC)