[HN Gopher] The Journey Before main()
       ___________________________________________________________________
        
       The Journey Before main()
        
       Author : amitprasad
       Score  : 108 points
       Date   : 2025-10-25 19:33 UTC (3 hours ago)
        
 (HTM) web link (amit.prasad.me)
 (TXT) w3m dump (amit.prasad.me)
        
       | hagbard_c wrote:
       | On the subject of symbols:
       | 
       | > Yeah, that's it. Now, 2308 may be _slightly_ bloated because we
       | link against musl instead of glibc, but the point still stands:
       | There's a lot of stuff going on behind the scenes here.
       | 
       | "Slightly bloated" is a slight understatement. The same program
       | linked against glibc tops out at _36_ symbols in _.symtab_:
       | $ readelf -a hello|grep "'.symtab'"
       | Symbol table '.symtab' contains 36 entries:
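       | 
       | For anyone without readelf handy, the same count can be sketched
       | in Python by reading the section headers directly. This is a
       | minimal sketch, assuming a little-endian ELF64 file; the function
       | name symtab_entry_count is made up for illustration.

```python
import struct

SHT_SYMTAB = 2  # section type of the static symbol table (.symtab)


def symtab_entry_count(elf: bytes) -> int:
    """Return the number of entries in .symtab, or 0 if there is none.

    Handles only little-endian ELF64; anything else raises ValueError.
    """
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # e_shoff is at offset 0x28; e_shentsize and e_shnum are at 0x3A and 0x3C.
    (e_shoff,) = struct.unpack_from("<Q", elf, 0x28)
    e_shentsize, e_shnum = struct.unpack_from("<HH", elf, 0x3A)
    for i in range(e_shnum):
        base = e_shoff + i * e_shentsize
        (sh_type,) = struct.unpack_from("<I", elf, base + 4)
        if sh_type == SHT_SYMTAB:
            # sh_size is at +0x20 and sh_entsize at +0x38 within Elf64_Shdr.
            (sh_size,) = struct.unpack_from("<Q", elf, base + 0x20)
            (sh_entsize,) = struct.unpack_from("<Q", elf, base + 0x38)
            return sh_size // sh_entsize
    return 0
```

       | Calling symtab_entry_count(open("hello", "rb").read()) on the
       | binary above should agree with the readelf output; a stripped
       | binary has no .symtab and yields 0.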
        
         | amitprasad wrote:
         | Ah, I should have taken the time to verify; it might also have
         | something to do with the way I was compiling / cross-compiling
         | for RISC-V!
         | 
         | More generally, I'm not surprised at the symtab bloat from
         | statically-linking given the absolute size increase of the
         | binary.
        
       | vbezhenar wrote:
       | I wonder how many C projects prefer to avoid standard library,
       | just invoking Linux syscalls directly. Much more fun to write
       | software this way, IMO.
        
         | forrestthewoods wrote:
         | You had me with "avoid C standard library" but lost me at
         | "invoking Linux syscalls directly".
         | 
         | Windows support is a requirement, and no, WSL2 doesn't count.
         | 
         | The C standard library is pretty bad, and it'd be great if not
         | using it were a little easier and more common.
        
           | pmc00 wrote:
           | You can do this in Windows too, useful if you want tiny
           | executables that use minimum resources.
           | 
           | I wrote this little systemwide mute utility for Windows that
           | way, annoying to be missing some parts of the CRT but not
           | bad, code here: https://github.com/pablocastro/minimute
        
             | gpm wrote:
             | I thought Windows had an unstable syscall interface?
        
               | LegionMammal978 wrote:
               | It looks like that project does link against the usual
               | Windows DLLs, it just doesn't use a static or dynamic C
               | runtime.
        
               | pmc00 wrote:
               | Windows isn't quite like Linux in that typically apps
               | don't make syscalls directly. Maybe you could say what's
               | in ntdll is the system call contract, but in practice you
               | call the subsystem specific API, typically the Win32 API,
               | which is huge compared to the Linux syscall list because
               | it includes all sorts of things like UI, COM (!), etc.
               | 
               | The project has some of the properties discussed above
               | such as not having a typical main() (or winmain), because
               | there's no CRT to call it.
        
               | Dwedit wrote:
               | Pretty much yeah.
               | 
               | You have your usual Win32 API functions found in
               | libraries like Kernel32, User32, and GDI32, but in
               | versions after Windows XP, those don't actually make
               | system calls. The actual system calls are found in NTDLL
               | and Win32U. There are lots of functions you can import,
               | and they're basically one instruction long: just SYSENTER
               | for the native version, or a switch back to 64-bit mode
               | for a WOW64 DLL. The function names always begin with Nt,
               | like NtCreateFile. There's a corresponding kernel-mode
               | call that starts with Zw instead, so in kernel mode you
               | have ZwCreateFile.
               | 
               | But the system call numbers used with SYSENTER are indeed
               | reordered every time there's a major version change to
               | Windows, so you just call into NTDLL or Win32U instead if
               | you want to directly make a system call.
        
           | antihero wrote:
           | > Windows support is a requirement
           | 
           | Why, exactly?
        
           | AnimalMuppet wrote:
           | > Windows support is a requirement...
           | 
           | For what?
           | 
           | There is _some_ software for which Windows support is
           | required. There are others for which it is not, and never
           | will be. (And for an article about running ELF files on RISC-V
           | with a Linux OS, the "Windows support" complaint seems a bit
           | odd...)
        
           | throwawaysoxjje wrote:
           | A requirement from whom? To do what?
        
           | rfl890 wrote:
           | You can make CRT-free Win32 programs, read this guide[1] and
           | you're all set. I've written a couple CLI utilities which are
           | completely CRT-free and weigh just under a few kilobytes.
           | 
           | [1]: https://nullprogram.com/blog/2023/02/15/
        
             | forrestthewoods wrote:
             | Great post!
        
           | WJW wrote:
           | Obviously only a requirement if you intend your software to
           | run under windows. But if you don't, why bother. Not all
           | software is intended to be distributed to users far and wide.
           | Some of it is just for yourself, and some of it will only
           | ever run on linux servers.
        
             | forrestthewoods wrote:
             | > some of it will only ever run on linux servers.
             | 
             | I've spent quite a lot of time dealing with code that "will
             | only ever run on Linux" which did not, in fact, only ever
             | run on Linux!
             | 
             | Obviously for hobby projects anyone can do what they want.
             | But adult projects should support Windows imho and consider
             | Windows support from the start. Cross-platform is super
             | easy unless you choose to make it hard.
        
         | jjmarr wrote:
         | Tons of driver code does this.
        
         | 1718627440 wrote:
         | I generally try to stay portable, but file descriptors are just
         | too nice not to use.
        
           | Retr0id wrote:
           | File descriptors are part of the Linux syscall API, not libc.
           | Are you thinking of FILE?
        
         | electroly wrote:
         | Not exactly the same, but on Windows if you use entirely Win32
         | calls you can avoid linking any C runtime library. Win32 is
         | below the C standard library on Windows and the C runtime is
         | optional.
        
       | mmsc wrote:
       | It's also possible to pack a whole codebase into "before main()"
       | - or with no main() at all. I was recently experimenting with
       | this, as well as with a whole codebase that only uses main() and
       | calls itself over and over. Good fun:
       | https://joshua.hu/packing-codebase-into-single-function-disr...
        
       | khaledh wrote:
       | > A note on interpreters: If the executable file starts with a
       | shebang (#!), the kernel will use the shebang-specified
       | interpreter to run the program. For example, #!/usr/bin/python3
       | will run the program using the Python interpreter, #!/bin/bash
       | will run the program using the Bash shell, etc.
       | 
       | This caused me a lot of pain while trying to debug a 3rd party
       | Java application that was trying to launch an executable script,
       | and throwing an IO error "java.io.IOException: error=2, No such
       | file or directory." I was puzzled because I _know_ the script is
       | right there (using its full path) and it had the executable bit
       | set. It turns out that the shebang in the script was wrong, so
       | the OS was complaining (actual error from a shell would be "The
       | file specified the interpreter '/foo/bar', which is not an
       | executable command."), but the Java error was completely
       | misleading :|
       | 
       | Note: If you wonder why I didn't see this error by running the
       | script myself: I did, and it ran fine locally. But the
       | application was running on a remote host that had a different
       | path for the interpreter.
        
         | mscdex wrote:
         | Also be aware that kernel support for shebangs depends on
         | CONFIG_BINFMT_SCRIPT=y being in the kernel config.
        
         | 1718627440 wrote:
         | Note that this is not a Java-specific problem; it can occur
         | with other programs as well. "No such file or directory" is
         | just the nice description for ENOENT, which can occur in a lot
         | of syscalls. I typically just run the program through strace,
         | then you will quickly see what the program did.
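         | 
         | A minimal sketch that reproduces the misleading error, assuming
         | a Linux host where the temporary directory allows execution.
         | The interpreter path /nonexistent/interpreter and the helper
         | name run_with_missing_interpreter are placeholders.

```python
import errno
import os
import stat
import subprocess
import tempfile


def run_with_missing_interpreter() -> int:
    """Execute a script whose shebang names a missing interpreter.

    Returns the errno reported by the failed exec, or 0 if it did not fail.
    """
    with tempfile.TemporaryDirectory() as tmp:
        script = os.path.join(tmp, "hello.sh")
        with open(script, "w") as f:
            f.write("#!/nonexistent/interpreter\necho hello\n")
        os.chmod(script, stat.S_IRWXU)  # the script exists and is executable
        try:
            subprocess.run([script], check=False)
        except OSError as exc:
            # The exec fails with ENOENT even though the script itself exists.
            return exc.errno
    return 0


if __name__ == "__main__":
    err = run_with_missing_interpreter()
    print(errno.errorcode.get(err), os.strerror(err))
```

         | The caller only sees the errno, which is why the message
         | appears to blame the script rather than its interpreter.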
        
         | gjf wrote:
         | For those interested, I did a breakdown of the hashbang:
         | https://blog.foletta.net/post/2021-04-19-what-the/
        
       | itopaloglu83 wrote:
       | I like doing this with old microcontrollers like the PIC16
       | series. You can see how the stack pointer, timers, variables,
       | etc. are all configured.
        
       | fweimer wrote:
       | > The ELF file contains a dynamic section which tells the kernel
       | which shared libraries to load, and another section which tells
       | the kernel to dynamically "relocate" pointers to those functions,
       | so everything checks out.
       | 
       | This is _not_ how dynamic linking works on GNU/Linux. The kernel
       | processes the program headers for the main program (mapping the
       | PT_LOAD segments, without relocating them) and notices the
       | PT_INTERP program interpreter (the path to the dynamic linker)
       | among the program headers. The kernel then loads the dynamic
       | linker in much the same way as the main program (again without
       | relocation) and transfers control to its entry point. It's up to
       | the dynamic linker to self-relocate, load the referenced shared
       | objects (this time using plain mmap and mprotect, the kernel ELF
       | loader is not used for that), relocate them and the main program,
       | and then transfer control to the main program.
       | 
       | The scheme is not that dissimilar to the #! shebang lines, with
       | the dynamic linker taking the role of the script interpreter,
       | except that ELF is a binary format.
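       | 
       | The first step, locating the PT_INTERP path among the program
       | headers, can be sketched in Python. This is a minimal sketch
       | assuming a little-endian ELF64 image; elf_interpreter is a
       | hypothetical helper name, not part of any library.

```python
import struct

PT_INTERP = 3  # program header type holding the dynamic linker's path


def elf_interpreter(elf: bytes):
    """Return the PT_INTERP path of a little-endian ELF64 image, or None.

    None means no PT_INTERP header is present, as in a static binary.
    """
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 file")
    # e_phoff is at offset 0x20; e_phentsize and e_phnum are at 0x36 and 0x38.
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        base = e_phoff + i * e_phentsize
        (p_type,) = struct.unpack_from("<I", elf, base)
        if p_type == PT_INTERP:
            # p_offset is at +0x08 and p_filesz at +0x20 within Elf64_Phdr.
            (p_offset,) = struct.unpack_from("<Q", elf, base + 0x08)
            (p_filesz,) = struct.unpack_from("<Q", elf, base + 0x20)
            raw = elf[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\x00").decode()
    return None
```

       | On a dynamically linked binary this returns the dynamic linker's
       | path; on a statically linked one it returns None.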
        
         | amitprasad wrote:
         | You're right, and I knew this back in February when I wrote
         | most of this post. I must have revised it down incorrectly
         | before posting; will correct. Bit of a facepalm from my side.
        
       | turbert wrote:
       | It's been a while since I've touched this stuff, but my
       | recollection is the ELF interpreter (ld.so, not the kernel) is
       | responsible for everything after mapping the initial ELF's
       | segments.
       | 
       | iirc execve maps the PT_LOAD segments from the program header,
       | populates the aux vector on the stack, and jumps straight to the
       | ELF interpreter's entry point. Any linked objects are loaded in
       | userspace by the elf interpreter. The kernel has no knowledge of
       | the PLT/GOT.
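       | 
       | The aux vector mentioned above is a list of (type, value) pairs
       | terminated by AT_NULL. A minimal Python sketch of decoding it,
       | assuming pairs of native unsigned longs as Linux exposes them in
       | /proc/self/auxv; parse_auxv is a made-up name.

```python
import struct

# A few auxiliary vector types from <elf.h>.
AT_NULL, AT_PHDR, AT_PAGESZ, AT_BASE, AT_ENTRY = 0, 3, 6, 7, 9


def parse_auxv(data: bytes) -> dict:
    """Decode raw auxv bytes into {a_type: a_val}, stopping at AT_NULL."""
    entry = struct.Struct("@LL")  # each entry is two native unsigned longs
    usable = len(data) - len(data) % entry.size  # ignore a trailing partial entry
    auxv = {}
    for a_type, a_val in entry.iter_unpack(data[:usable]):
        if a_type == AT_NULL:
            break
        auxv[a_type] = a_val
    return auxv
```

       | On Linux, parse_auxv(open("/proc/self/auxv", "rb").read()) shows,
       | among others, AT_ENTRY (the main program's entry point) and
       | AT_BASE (where the ELF interpreter was mapped).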
        
       | archmaster wrote:
       | This is awesome! To anyone interested in learning more about
       | this, I wrote https://cpu.land/ a couple years ago. It doesn't go
       | as in-depth into e.g. memory layout as OP does but does cover
       | multitasking and how the code is loaded in the first place.
        
       ___________________________________________________________________
       (page generated 2025-10-25 23:00 UTC)