[HN Gopher] Linux x86 program start up - How the heck do we get ...
       ___________________________________________________________________
        
       Linux x86 program start up - How the heck do we get to main()?
       (2011)
        
       Author : rwmj
       Score  : 162 points
       Date   : 2021-11-04 09:04 UTC (13 hours ago)
        
 (HTM) web link (dbp-consulting.com)
 (TXT) w3m dump (dbp-consulting.com)
        
       | matheusmoreira wrote:
       | Great article, it was immensely useful in my quest to understand
       | Linux processes. It says "Linux x86 program" but it assumes it's
       | a hosted C program whose entry point is provided by GCC
       | startfiles and libc. Everything after _start is specific to C.
       | Note that _start is an arbitrary symbol and merely the linker's
       | default value of the entry point. It can be changed to anything.
       | 
       | The important Linux facts are:
       | 
       | 1. _start is not a function
       | 
       | It can't be returned from, the exit system call must be issued
       | before execution terminates.
       | 
       | 2. Arguments and environment are on the stack
       | 
       | Argument count and vector can be simply popped off the stack into
       | appropriate registers, in that order.
       | 
       | The environment vector is located after the NULL terminator of
       | the argument vector, i.e. it starts at argv[argc + 1].
       | 
       | The auxiliary vector is located after the NULL terminator of the
       | environment vector. No count is provided for that, so code must
       | loop through the environment looking for the sentinel in order to
       | find it.
       | 
       | The auxiliary vector is really interesting. I don't usually see
       | software making direct use of it. It contains interesting
       | information such as CPU identifier and capabilities, page size,
       | the location of the Linux vDSO, some random bytes, program file
       | name, user and group IDs, among other things.
       | 
       | https://github.com/torvalds/linux/blob/master/include/uapi/l...
       | 
       | https://github.com/torvalds/linux/blob/master/arch/x86/inclu...
       | 
       | https://github.com/torvalds/linux/blob/master/Documentation/...
       | 
       | This is the data the Linux kernel passes to programs. After
       | organizing these parameters, the program is free to do whatever
       | it wants. The libc entry point will naturally start setting up
       | libc. In particular, it seems to spend a lot of time setting up
       | the init and fini insanity that's probably better off forgotten.
       | 
       | https://blogs.oracle.com/solaris/post/init-and-fini-processi...
       | 
       | None of that setup is necessary. After this, you can just run
       | your program directly. I used to develop liblinux, which
       | illustrates all this with much simpler code:
       | 
       | https://github.com/matheusmoreira/liblinux/blob/master/start...
       | 
       | https://github.com/matheusmoreira/liblinux/blob/master/start...
       | 
       | The entry point code passes the stack pointer to a C function
       | which gathers all kernel parameters and starts the program with
       | no further setup. I made several example programs, including one
       | which outputs all these variables.
       | 
       | https://github.com/matheusmoreira/liblinux/blob/master/examp...
       | 
       | I stopped developing this because I discovered that Linux itself
       | has a better solution that they use for their own tools:
       | 
       | https://github.com/torvalds/linux/blob/master/tools/include/...
       | 
       | The entry point code for all supported architectures is present
       | as inline assembly code!
        
         | p4bl0 wrote:
         | > _The environment vector is located after the NULL terminator
         | of the environment vector, or argument count + 1._
         | 
         | For those who read that and don't know, it is the second
         | occurrence of "environment" that should be "arguments".
        
           | matheusmoreira wrote:
           | Fixed it, thanks!
        
         | titzer wrote:
         | Hey, good to see you and liblinux again.
         | 
         | I'll just add that yes, things are actually way simpler from
         | the ELF point of view. If you generate an ELF file by hand (or
         | from a compiler you write), you can simply point the entry
         | address at the first instruction, and the argc, argv, and
         | environment pointers arrive as described above.
         | 
         | Virgil startup code is ~15 assembly instructions (even fewer
         | for test binaries), and then it calls into the Virgil runtime
         | source to get the heap set up and start allocating the first
         | objects (array of strings for arguments).
         | 
         | I love low level posts like these. It's important we don't
         | forget that C is just _one_ alternative.
        
           | matheusmoreira wrote:
           | Good to see you and Virgil again!
           | 
           | > I love low level posts like these. It's important we don't
           | forget that C is just _one_ alternative.
           | 
           | Yes!! It's good to know where Linux ends and all the other
           | stuff begins. Existing documentation makes things really
           | confusing because it assumes people want the C stuff.
           | Sometimes it even tells readers they aren't supposed to touch
           | these "internals". It takes a lot of work to unravel this mess
           | and get to the essential stuff.
        
         | lathiat wrote:
         | I wonder how you get those 16 random bytes. I had no idea so
         | much info was in the auxv: UID, program name, etc. TIL.
        
           | rwmj wrote:
           | The AT_RANDOM bytes are designed to provide randomness to the
           | loader when the program is loaded. I looked at glibc and it
           | uses them for the random stack check canary (which is needed
           | very early during dynamic loading), and doesn't actually use
           | them for anything else.
           | 
           | glibc nulls out its internal pointer (_dl_random) after use
           | so you can't easily get the pointer later, but of course it'd
           | be a bad idea to try and use it.
        
         | ebingdom wrote:
         | > It can't be returned from, the exit system call must be
         | issued before execution terminates.
         | 
         | So what happens if exit isn't called?
        
           | JCWasmx86 wrote:
           | 1. If you add a "ret", you just jump to an invalid address.
           | 
           | 2. If you add nothing, the CPU will continue to execute the
           | bytes that follow.
           | 
           | In both cases you almost certainly end up with a segmentation
           | fault (or, in case 2, an illegal instruction).
        
           | chrisseaton wrote:
           | You keep running whatever 'instructions' appear in the data
           | after your last program instruction, so you would most likely
           | get a segmentation fault, or worse, run some random data as
           | instructions and crash, or worse still, run some random data
           | that happens to be valid instructions that do something
           | harmful.
        
           | matheusmoreira wrote:
           | Segmentation violation.
           | 
           | I've seen code with a hlt instruction after main and the exit
           | system call. Not sure what the intention is; it should be
           | unreachable.
        
       | dang wrote:
       | One past thread:
       | 
       |  _Linux x86 Program Start Up_ -
       | https://news.ycombinator.com/item?id=8739661 - Dec 2014 (30
       | comments)
        
         | HahaReally wrote:
         | https://news.ycombinator.com/from?site=dbp-consulting.com
         | 
         | There are more than that lol. Gets reposted once a year almost.
        
       | commandlinefan wrote:
       | > part of the problem is that there is a prevalent unconscious
       | gender bias in STEM that makes it unwelcoming for women
       | 
       | Just three paragraphs in.
        
         | chrsig wrote:
         | oh no! anything but trying to make STEM more welcoming to the
         | women!! How can we possibly read any of this!?
        
         | avsbst wrote:
         | Would refer to the guidelines [1] for this site:
         | 
         | > Be kind. Don't be snarky.
         | 
         | > Comments should get more thoughtful and substantive, not
         | less, as a topic gets more divisive.
         | 
         | > Eschew flamebait. Avoid unrelated controversies and generic
         | tangents.
         | 
         | [1] https://news.ycombinator.com/newsguidelines.html#comments
        
       ___________________________________________________________________
       (page generated 2021-11-04 23:01 UTC)