[HN Gopher] Linux x86 program start up - How the heck do we get ...
___________________________________________________________________
Linux x86 program start up - How the heck do we get to main()?
(2011)
Author : rwmj
Score : 162 points
Date : 2021-11-04 09:04 UTC (13 hours ago)
(HTM) web link (dbp-consulting.com)
(TXT) w3m dump (dbp-consulting.com)
| matheusmoreira wrote:
| Great article, it was immensely useful in my quest to understand
| Linux processes. It says "Linux x86 program" but it assumes it's
| a hosted C program whose entry point is provided by GCC
| startfiles and libc. Everything after _start is specific to C.
| Note that _start is an arbitrary symbol and merely the linker's
| default value of the entry point. It can be changed to anything.
|
| The important Linux facts are:
|
| 1. _start is not a function
|
| It can't be returned from; the program must issue the exit system
| call itself in order to terminate.
|
| 2. Arguments and environment are on the stack
|
| Argument count and vector can be simply popped off the stack into
| appropriate registers, in that order.
|
| The environment vector is located after the NULL terminator of
| the argument vector, that is, argument count + 1 slots past the
| start of the argument vector.
|
| The auxiliary vector is located after the NULL terminator of the
| environment vector. No count is provided for the environment, so
| code must loop through it looking for the NULL sentinel in order
| to find the auxiliary vector.
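
The layout described above can be checked from an ordinary hosted C
program. The following is a minimal sketch, assuming that libc's
`environ` still points at the pointer array the kernel placed on the
stack (true until something like setenv() reallocates it) and that each
auxiliary vector entry is a pair of native words. The names
`aux_entry`, `auxv_after` and `auxv_lookup` are made up for this
illustration.

```c
#include <assert.h>
#include <elf.h>     /* AT_NULL, AT_PAGESZ and the other AT_* constants */
#include <stddef.h>

/* One auxiliary vector entry: a (type, value) pair of native words.
   Assumption: unsigned long is the native word size, as on Linux x86. */
struct aux_entry {
    unsigned long type;
    unsigned long value;
};

/* The environment pointer array, as exported by libc. */
extern char **environ;

/* Skip past the NULL that ends the environment vector. The auxiliary
   vector starts in the very next slot. */
const struct aux_entry *auxv_after(char **envp)
{
    assert(envp != NULL);
    while (*envp != NULL)
        envp++;
    return (const struct aux_entry *)(envp + 1);
}

/* Scan the auxiliary vector for one entry type.
   Returns 1 and stores the value if found, 0 otherwise. */
int auxv_lookup(char **envp, unsigned long type, unsigned long *value)
{
    const struct aux_entry *entry;

    for (entry = auxv_after(envp); entry->type != AT_NULL; entry++) {
        if (entry->type == type) {
            *value = entry->value;
            return 1;
        }
    }
    return 0;
}
```

Calling `auxv_lookup(environ, AT_PAGESZ, &value)` early in main should
store the system page size in `value`.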
|
| The auxiliary vector is really interesting. I don't usually see
| software making direct use of it. It contains interesting
| information such as CPU identifier and capabilities, page size,
| the location of the Linux vDSO, some random bytes, program file
| name, user and group IDs, among other things.
|
| https://github.com/torvalds/linux/blob/master/include/uapi/l...
|
| https://github.com/torvalds/linux/blob/master/arch/x86/inclu...
|
| https://github.com/torvalds/linux/blob/master/Documentation/...
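
For hosted programs, glibc (2.16 and later) and musl expose the same
data through getauxval(), so there is no need to walk the stack by
hand. A small sketch that collects a few of the entries listed above;
the struct and function names are hypothetical, and absent entries
simply read as 0 or NULL.

```c
#include <assert.h>
#include <elf.h>        /* AT_* constants */
#include <stddef.h>
#include <sys/auxv.h>   /* getauxval() */
#include <unistd.h>     /* getuid(), getgid() */

/* A few auxiliary vector entries, gathered in one place. */
struct auxv_summary {
    unsigned long page_size;  /* AT_PAGESZ */
    unsigned long uid;        /* AT_UID: real user ID at exec time */
    unsigned long gid;        /* AT_GID: real group ID at exec time */
    const char *exec_path;    /* AT_EXECFN: path used to exec, or NULL */
    const void *vdso_base;    /* AT_SYSINFO_EHDR: vDSO ELF header, or NULL */
};

struct auxv_summary read_auxv_summary(void)
{
    struct auxv_summary s;

    s.page_size = getauxval(AT_PAGESZ);
    s.uid = getauxval(AT_UID);
    s.gid = getauxval(AT_GID);
    s.exec_path = (const char *)getauxval(AT_EXECFN);
    s.vdso_base = (const void *)getauxval(AT_SYSINFO_EHDR);
    return s;
}

/* Cross-check: the IDs in the auxiliary vector should agree with the
   usual system calls as long as the process has not changed its
   credentials since exec. Returns 1 on agreement, 0 otherwise. */
int auxv_ids_match_syscalls(const struct auxv_summary *s)
{
    assert(s != NULL);
    return s->uid == (unsigned long)getuid()
        && s->gid == (unsigned long)getgid();
}
```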
|
| This is the data the Linux kernel passes to programs. After
| organizing these parameters, the program is free to do whatever
| it wants. The libc entry point will naturally start setting up
| libc. In particular, it seems to spend a lot of time setting up
| the init and fini insanity that's probably better off forgotten.
|
| https://blogs.oracle.com/solaris/post/init-and-fini-processi...
|
| It's not necessary. After this, you can just run your program
| directly. I used to develop a liblinux that illustrates all this
| with much simpler code:
|
| https://github.com/matheusmoreira/liblinux/blob/master/start...
|
| https://github.com/matheusmoreira/liblinux/blob/master/start...
|
| The entry point code passes the stack pointer to a C function
| which gathers all kernel parameters and starts the program with
| no further setup. I made several example programs, including one
| which outputs all these variables.
|
| https://github.com/matheusmoreira/liblinux/blob/master/examp...
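
To illustrate the idea of handing the stack pointer to a C function,
here is a rough sketch of such a function. It is not the liblinux code:
`kernel_params`, `aux_entry` and `parse_initial_stack` are names
invented for this example, and it assumes the layout described earlier
in the thread (argc, argv pointers, NULL, environment pointers, NULL,
auxiliary vector).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One auxiliary vector entry: a (type, value) pair of native words. */
struct aux_entry {
    unsigned long type;
    unsigned long value;
};

/* Everything the kernel hands to a new process on its stack. */
struct kernel_params {
    long argc;
    char **argv;
    char **envp;
    const struct aux_entry *auxv;
};

/* sp is the value of the stack pointer at the entry point. */
struct kernel_params parse_initial_stack(void *sp)
{
    struct kernel_params p;
    char **slot = sp;
    char **cursor;

    assert(sp != NULL);

    /* Slot 0 holds argc as one native word. */
    memcpy(&p.argc, slot, sizeof p.argc);

    /* The argument pointers follow, ended by a NULL slot. */
    p.argv = slot + 1;

    /* The environment starts argc + 1 slots after argv. */
    p.envp = p.argv + p.argc + 1;

    /* No count is given for the environment: walk to its NULL. */
    cursor = p.envp;
    while (*cursor != NULL)
        cursor++;

    /* The auxiliary vector begins right after that NULL. */
    p.auxv = (const struct aux_entry *)(cursor + 1);
    return p;
}
```

In a freestanding program, the architecture-specific entry code would
copy the stack pointer into the first argument register and call this
function before anything else touches the stack.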
|
| I stopped developing this because I discovered that Linux itself
| has a better solution that they use for their own tools:
|
| https://github.com/torvalds/linux/blob/master/tools/include/...
|
| The entry point code for all supported architectures is present
| as inline assembly code!
| p4bl0 wrote:
| > _The environment vector is located after the NULL terminator
| of the environment vector, or argument count + 1._
|
| For those who read that and don't know, it is the second
| occurrence of "environment" that should be "arguments".
| matheusmoreira wrote:
| Fixed it, thanks!
| titzer wrote:
| Hey, good to see you and liblinux again.
|
| I'll just add that yes, things are actually way simpler from the
| ELF point of view. If you generate an ELF file by hand (or from a
| compiler you write), you can simply point its entry address at
| the first instruction, and argc, argv, and the environment
| pointers arrive as described above.
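
The entry address lives in the `e_entry` field of the ELF header. As a
small sketch, the function below reads that field from a file using the
`ElfW()` macro from `<link.h>`, assuming the file has the same ELF class
as the running program. The name `read_elf_entry` is hypothetical.

```c
#include <assert.h>
#include <elf.h>     /* ELFMAG, SELFMAG */
#include <link.h>    /* ElfW() picks the 32- or 64-bit ELF types */
#include <stdio.h>
#include <string.h>

/* Read the entry point address from the ELF header of the file at path.
   Returns 0 on success, -1 if the file can't be read or isn't ELF. */
int read_elf_entry(const char *path, unsigned long *entry)
{
    ElfW(Ehdr) header;
    size_t got;
    FILE *file;

    assert(path != NULL && entry != NULL);

    file = fopen(path, "rb");
    if (file == NULL)
        return -1;
    got = fread(&header, 1, sizeof header, file);
    fclose(file);
    if (got != sizeof header)
        return -1;

    /* Every ELF file starts with the magic bytes 0x7f 'E' 'L' 'F'. */
    if (memcmp(header.e_ident, ELFMAG, SELFMAG) != 0)
        return -1;

    *entry = (unsigned long)header.e_entry;
    return 0;
}
```

Passing "/proc/self/exe" reads the header of the running program. For a
position-independent executable the stored value is an offset from the
load address rather than the final address.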
|
| Virgil startup code is ~15 assembly instructions (even less for
| test binaries), and then it calls into the Virgil runtime
| source to get the heap setup and start allocating the first
| objects (array of strings for arguments).
|
| I love low level posts like these. It's important we don't
| forget that C is just _one_ alternative.
| matheusmoreira wrote:
| Good to see you and Virgil again!
|
| > I love low level posts like these. It's important we don't
| forget that C is just _one_ alternative.
|
| Yes!! It's good to know where Linux ends and all the other
| stuff begins. Existing documentation makes things really
| confusing, it assumes people want the C stuff. Sometimes it
| even tells readers they aren't supposed to touch these
| "internals". It takes a lot of work to unravel this mess and
| get to the essential stuff.
| lathiat wrote:
| I wonder how you get those 16 random bytes. I had no idea so
| much info was in the auxv: uid, program name, etc. TIL.
| rwmj wrote:
| The AT_RANDOM bytes are designed to provide randomness to the
| loader when the program is loaded. I looked at glibc and it's
| using this for a random stack check canary (which is needed
| very early during dynamic loading), and doesn't actually use it
| for anything else.
|
| glibc nulls out its internal pointer (_dl_random) after use
| so you can't easily get the pointer later, but of course it'd
| be a bad idea to try and use it.
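
As for how to get at the 16 bytes: one route that doesn't rely on
glibc's internal pointer is getauxval(AT_RANDOM), which returns the
address the kernel stored in the auxiliary vector. A minimal sketch,
with `copy_at_random` being a made-up helper name; since the bytes may
already have been used for the stack canary, it is safer to treat them
as a curiosity than as a source of fresh randomness.

```c
#include <assert.h>
#include <elf.h>        /* AT_RANDOM */
#include <stddef.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval() */

/* Copy the 16 random bytes supplied by the kernel at exec time.
   Returns 0 on success, -1 if the AT_RANDOM entry is absent. */
int copy_at_random(unsigned char out[16])
{
    const void *bytes;

    assert(out != NULL);

    /* The entry's value is the address of the 16 bytes. */
    bytes = (const void *)getauxval(AT_RANDOM);
    if (bytes == NULL)
        return -1;

    memcpy(out, bytes, 16);
    return 0;
}
```

The bytes are fixed for the lifetime of the process, so two calls yield
the same data.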
| ebingdom wrote:
| > It can't be returned from, the exit system call must be
| issued before execution terminates.
|
| So what happens if exit isn't called?
| JCWasmx86 wrote:
| 1. If you add a "ret", you just jump to an invalid address.
|
| 2. If you add nothing, the CPU will continue to execute the
| bytes that follow.
|
| In both cases it is quite certain you end with a segmentation
| fault (or, in case 2, an illegal instruction).
| chrisseaton wrote:
| You keep running whatever 'instructions' appear in the data
| after your last program instruction. Most likely you get a
| segmentation fault; worse, you run some random data as
| instructions and crash; worse still, you run random data that
| happens to be real instructions that do something harmful.
| matheusmoreira wrote:
| Segmentation violation.
|
| I've seen code with a hlt instruction after main and the exit
| system call. Not sure what the intention is; it should be
| unreachable.
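
The point that a process ends through a system call rather than a
return can be tried out from hosted C. The sketch below forks a child
that terminates through a raw exit_group system call, bypassing libc's
exit() and its handlers, while the parent collects the status. The
function name is hypothetical. The endless loop after the call plays
the same role as the hlt mentioned above: it guards code that should
never run.

```c
#define _GNU_SOURCE       /* for the syscall() declaration */
#include <assert.h>
#include <sys/syscall.h>  /* SYS_exit_group */
#include <sys/types.h>
#include <sys/wait.h>     /* waitpid(), WIFEXITED, WEXITSTATUS */
#include <unistd.h>       /* fork(), syscall() */

/* Fork a child that terminates via a raw system call.
   Returns the child's exit status, or -1 on failure. */
int exit_via_raw_syscall(int code)
{
    int status = 0;
    pid_t pid;

    /* Only the low 8 bits of the status reach the parent. */
    assert(code >= 0 && code <= 255);

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: ask the kernel directly to end the process. */
        syscall(SYS_exit_group, code);
        for (;;) {
            /* Guard: the system call above does not return. */
        }
    }

    /* Parent: wait for the child and report how it ended. */
    if (waitpid(pid, &status, 0) != pid)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```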
| dang wrote:
| One past thread:
|
| _Linux x86 Program Start Up_ -
| https://news.ycombinator.com/item?id=8739661 - Dec 2014 (30
| comments)
| HahaReally wrote:
| https://news.ycombinator.com/from?site=dbp-consulting.com
|
| There are more than that lol. Gets reposted once a year almost.
| commandlinefan wrote:
| > part of the problem is that there is a prevalent unconscious
| gender bias in STEM that makes it unwelcoming for women
|
| Just three paragraphs in.
| chrsig wrote:
| oh no! anything but trying to make STEM more welcoming to the
| women!! How can we possibly read any of this!?
| avsbst wrote:
| Would refer to the guidelines [1] for this site:
|
| > Be kind. Don't be snarky.
|
| > Comments should get more thoughtful and substantive, not
| less, as a topic gets more divisive.
|
| > Eschew flamebait. Avoid unrelated controversies and generic
| tangents.
|
| [1] https://news.ycombinator.com/newsguidelines.html#comments
___________________________________________________________________
(page generated 2021-11-04 23:01 UTC)