[HN Gopher] A Simple ELF
___________________________________________________________________
A Simple ELF
Author : signa11
Score : 122 points
Date : 2024-12-26 18:10 UTC (4 hours ago)
(HTM) web link (4zm.org)
(TXT) w3m dump (4zm.org)
| jart wrote:
| I love articles like this. If you want to see a tutorial on how
| you can take this a step further, by creating a tiny ELF file
| that runs on Linux, FreeBSD, NetBSD, and OpenBSD 7.3, then check
| out https://justine.lol/sizetricks/#elf
| matheusmoreira wrote:
| I would also recommend the legendary Teensy Files:
|
| https://www.muppetlabs.com/~breadbox/software/tiny/
|
| They sparked my interest in ELF and freestanding programs.
| cylinder714 wrote:
| And Chris Wellons' "A Magnetized Needle and a Steady Hand,"
| detailing how to build an ELF implementation of 'true' using
| nothing more than 'echo' or 'printf':
| https://nullprogram.com/blog/2016/11/17/
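Wellons' post emits the bytes with printf. For readers who would rather generate them from C, here is a minimal sketch of the same single-segment layout using the structs from glibc's `<elf.h>`. It assumes an x86-64, little-endian host (the structs are copied out verbatim), a fixed load address, no section headers, and one read+execute PT_LOAD that maps the whole file. `build_minimal_elf` is a hypothetical helper, not code from that post.

```c
#include <elf.h>
#include <stddef.h>
#include <string.h>

/* Lays out ELF header + one PT_LOAD program header + code into out.
   Assumes x86-64 and a little-endian host. Returns the total size. */
size_t build_minimal_elf(unsigned char *out, const unsigned char *code,
                         size_t code_len, Elf64_Addr base)
{
    Elf64_Ehdr eh;
    Elf64_Phdr ph;
    size_t hdrs = sizeof eh + sizeof ph;      /* 64 + 56 = 120 bytes */

    memset(&eh, 0, sizeof eh);
    memcpy(eh.e_ident, ELFMAG, SELFMAG);      /* "\177ELF" */
    eh.e_ident[EI_CLASS]   = ELFCLASS64;
    eh.e_ident[EI_DATA]    = ELFDATA2LSB;
    eh.e_ident[EI_VERSION] = EV_CURRENT;
    eh.e_ident[EI_OSABI]   = ELFOSABI_SYSV;
    eh.e_type      = ET_EXEC;
    eh.e_machine   = EM_X86_64;
    eh.e_version   = EV_CURRENT;
    eh.e_entry     = base + hdrs;             /* code follows the headers */
    eh.e_phoff     = sizeof eh;
    eh.e_shoff     = 0;                       /* no section headers at all */
    eh.e_ehsize    = sizeof eh;
    eh.e_phentsize = sizeof ph;
    eh.e_phnum     = 1;

    memset(&ph, 0, sizeof ph);
    ph.p_type   = PT_LOAD;
    ph.p_flags  = PF_R | PF_X;
    ph.p_offset = 0;                          /* map the whole file... */
    ph.p_vaddr  = base;                       /* ...at the base address */
    ph.p_paddr  = base;
    ph.p_filesz = hdrs + code_len;
    ph.p_memsz  = hdrs + code_len;
    ph.p_align  = 0x1000;

    memcpy(out, &eh, sizeof eh);
    memcpy(out + sizeof eh, &ph, sizeof ph);
    memcpy(out + hdrs, code, code_len);
    return hdrs + code_len;
}
```

Fed the nine bytes `31 ff b8 3c 00 00 00 0f 05` (`xor edi, edi; mov eax, 60; syscall`, i.e. exit(0)), the 129-byte result, written to a file and marked executable, should behave like `true` on x86-64 Linux.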
| matheusmoreira wrote:
| Huge fan of that blog and its author!
| LegionMammal978 wrote:
| If anyone's interested, last year I replicated this exercise
| for an x86-64 Linux executable [0], and also golfed a Hello
| World as small as I could. I ended up using a little-known
| pattern (an ET_DYN executable with no interpreter, normally
| only used for the ld.so binary) to shave off more bytes than
| anyone else who had tried it, to the best of my knowledge.
|
| [0] https://tmpout.sh/3/22.html
| matheusmoreira wrote:
| I would like to note that Linux is the _only_ kernel which will
| allow you to do this! The Linux system call interface is stable
| and defined at the instruction set level. Linking against some
| system library is _absolutely_ required on every other system.
|
| I've written an article about this idea:
|
| https://www.matheusmoreira.com/articles/linux-system-calls
|
| You can get incredibly far with just this. I wrote a freestanding
| lisp interpreter with nothing but Linux system calls. It turned
| into a little framework for freestanding Linux programs. It's
| been incredibly fun.
|
| Freestanding C is a much better language. A lot of legacy
| nonsense is in the standard library. The Linux system call
| interface is really nice to work with. Calling write is not that
| hard. It's the printf style string building and formatting that I
| sometimes miss.
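To illustrate how small that is, here is a sketch of a raw `write` plus the tiny slice of printf one tends to miss, assuming x86-64 Linux and GCC/Clang extended asm. `raw_write`, `format_u64` and `print_u64` are hypothetical names. The return value is the kernel's, so errors come back as negative errno values rather than through `errno`.

```c
#include <stddef.h>

/* x86-64 Linux only: write(2) is syscall 1. Arguments go in rdi, rsi and
   rdx; the syscall instruction clobbers rcx and r11. Returns the byte
   count, or a negative errno value on failure. */
long raw_write(int fd, const void *buf, size_t len)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(1L), "D"((long)fd), "S"(buf), "d"(len)
                      : "rcx", "r11", "memory");
    return ret;
}

/* A stand-in for printf("%llu"): writes the decimal digits of v into out,
   which must hold at least 20 bytes, and returns how many were written. */
size_t format_u64(char *out, unsigned long long v)
{
    char tmp[20];                    /* 2^64 - 1 has 20 digits */
    size_t n = 0;
    do {
        tmp[n++] = (char)('0' + v % 10);
        v /= 10;
    } while (v != 0);
    for (size_t i = 0; i < n; i++)   /* digits were produced last-first */
        out[i] = tmp[n - 1 - i];
    return n;
}

/* Prints v and a newline to stdout with a single raw write. */
long print_u64(unsigned long long v)
{
    char buf[21];
    size_t n = format_u64(buf, v);
    buf[n] = '\n';
    return raw_write(1, buf, n + 1);
}
```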
| LegionMammal978 wrote:
| "_Absolutely_ required" is some strong language. It's
| perfectly possible to, e.g., perform direct syscalls on
| Windows, and you'll occasionally see malware that does it to
| avoid certain forms of detection. You just have to switch on
| the OS version, and update your binary if you want it to be
| compatible with a newer version.
| oguz-ismail wrote:
| > Linking against some system library is absolutely required on
| every other system.
|
| Not on FreeBSD, NetBSD, OpenBSD or Solaris.
|
| The article you linked says this but it's not true:
|
| > Sometimes it's not even possible to use system calls at all.
| OpenBSD has implemented system call origin verification, a
| security mechanism that only allows system calls originating
| from the system's libc. So not only is the kernel ABI unstable,
| normal programs are not even allowed to interface with the
| kernel at all.
|
| You can still make system calls from normal programs, you just
| need to list the addresses of system call instructions in an
| ELF section named openbsd.syscalls.
| matheusmoreira wrote:
| > Not on FreeBSD, NetBSD, OpenBSD or Solaris.
|
| Can you cite any sources? I wasn't able to find any
| documentation that corroborates what you said when I wrote
| the article. The few texts I found actually suggested
| otherwise. Maybe things have changed since then?
|
| > You can still make system calls from normal programs, you
| just need to list the addresses of system call instructions
| in an ELF section named openbsd.syscalls.
|
| I see. So they have added a mechanism to list the sections
| allowed to perform system calls. That's news to me. Do they
| guarantee the system call numbers will remain stable though?
| That older system calls will remain available?
| oguz-ismail wrote:
| > Can you cite any sources?
|
| Personal experience.
|
| > Do they guarantee the system call numbers will remain
| stable though?
|
| No. Doesn't mean you can't make system calls from outside
| the libc though.
| LegionMammal978 wrote:
| > Can you cite any sources?
|
| For one, the FreeBSD kernel specifically has a
| compatibility layer for Linux binaries to use their
| familiar syscalls [0]. For its ordinary syscalls, it also
| has a policy not to break binary compatibility without good
| reason [1]. Most other OSes just don't maintain quite the
| level of 'indefinite stability' that the Linux kernel does
| across different versions. And even Linux doesn't implement
| older versions of syscalls when the kernel is ported to new
| architectures, so eventually you have to rotate your
| implementation regardless, if you want people to run your
| code on new systems.
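A concrete instance of that rotation: arm64 never got `open(2)` at all, only `openat(2)`, and the syscall numbers differ per architecture. The sketch below assumes GCC/Clang inline asm on x86-64 or aarch64 Linux. The wrappers and the `HYP_` macros are hypothetical local stand-ins for the `<fcntl.h>` constants, and the numbers (openat 257/56, close 3/57) come from each architecture's syscall table.

```c
#include <stddef.h>

#define HYP_AT_FDCWD (-100)   /* same value as AT_FDCWD in <fcntl.h> */
#define HYP_O_RDONLY 0        /* same value as O_RDONLY */

#if defined(__x86_64__)
#  define NR_OPENAT 257
#  define NR_CLOSE  3
static long raw_syscall3(long nr, long a, long b, long c)
{
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "0"(nr), "D"(a), "S"(b), "d"(c)
                      : "rcx", "r11", "memory");
    return ret;
}
#elif defined(__aarch64__)
#  define NR_OPENAT 56
#  define NR_CLOSE  57
static long raw_syscall3(long nr, long a, long b, long c)
{
    register long x8 __asm__("x8") = nr;   /* syscall number */
    register long x0 __asm__("x0") = a;    /* arg 1, also the result */
    register long x1 __asm__("x1") = b;
    register long x2 __asm__("x2") = c;
    __asm__ volatile ("svc #0"
                      : "+r"(x0)
                      : "r"(x8), "r"(x1), "r"(x2)
                      : "memory");
    return x0;
}
#else
#  error "only x86-64 and aarch64 are sketched here"
#endif

/* openat(AT_FDCWD, path, O_RDONLY) exists on both architectures. The mode
   argument is left out because O_CREAT is not used. */
long raw_open_readonly(const char *path)
{
    return raw_syscall3(NR_OPENAT, HYP_AT_FDCWD, (long)path, HYP_O_RDONLY);
}

long raw_close(long fd)
{
    return raw_syscall3(NR_CLOSE, fd, 0, 0);
}
```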
|
| > The few texts I found actually suggested otherwise.
|
| People often say "_X_ is impossible" when the truth is
| "_X_ is tricky and full of caveats, and I don't want to
| think about it, so stop asking". (Or if the devs themselves
| are saying it, it might be "I want to look like I'm 'tough
| on crime' toward users of undocumented behavior", as if
| that could stop Hyrum's law from running its course.) In
| this case, it's generally "If you do it on an OS other than
| Linux, you can run into big compatibility issues," not
| "It's impossible on OSes other than Linux."
|
| As for compatibility issues, you're running into that the
| moment you do undocumented fun stuff like omitting ELF
| sections or overlapping headers, which future Linux
| versions could start rejecting on the basis of "no one
| needs to do that legitimately". So I wouldn't start drawing
| the line on syscall number compatibility.
|
| [0] https://docs.freebsd.org/en/books/handbook/linuxemu/
|
| [1] https://wiki.freebsd.org/AddingSyscalls#Backward_compatibily
| matheusmoreira wrote:
| > For one, the FreeBSD kernel specifically has a
| compatibility layer for Linux binaries to use their
| familiar syscalls [0].
|
| I believe this strengthens my argument. Linux kernel-
| userspace interface is so stable other projects are
| implementing it. I remember Justine Tunney mentioning
| this before, the idea that the x86_64 Linux system call
| ABI is turning into some kind of lingua franca of systems
| programming.
|
| https://justine.lol/ape.html
|
| > x86-64 Linux ABI Makes a Pretty Good Lingua Franca
|
| Would be interesting if people started targeting Linux
| because of this, banking on the fact that other systems
| will just implement Linux. Even Windows has Linux built
| into it these days.
|
| > For its ordinary syscalls, it also has a policy not to
| break binary compatibility without good reason.
|
| Thank you for the source. I don't think that's a
| particularly strong guarantee. It's certainly stronger
| than OpenBSD's at least.
|
| > Most other OSes just don't maintain quite the level of
| 'indefinite stability' that the Linux kernel does across
| different versions
|
| Yeah. I think this is something that makes Linux unique.
|
| > And even Linux doesn't implement older versions of
| syscalls when the kernel is ported to new architectures,
| so eventually you have to rotate your implementation
| regardless, if you want people to run your code on new
| systems.
|
| That's true. Only new architectures are affected though.
| The old ones have all the old system calls, many with
| multiple versions, all supported. Porting to a new
| architecture doesn't invalidate the stability of existing
| ones.
|
| > People often say "X is impossible" when the truth is "X
| is tricky and full of caveats, and I don't want to think
| about it, so stop asking".
|
| > Or if the devs themselves are saying it, it might be "I
| want to look like I'm 'tough on crime' toward users of
| undocumented behavior"
|
| I get what you're saying. I truly apologize if I came
| across that way. I did _not_ mean to say that.
|
| I got interested in this low level direct system call
| stuff because I literally got sick of reading "but you,
| mere mortal, are not meant to access these raw system
| interfaces, that's for us, you are meant to call the
| little library function we made for you" in the Linux and
| libc manuals. Last thing I want is to end up doing the
| same to others.
|
| By "can't do this" I meant to say the developers
| maintaining the system don't want you bypassing their
| system libraries and won't take responsibility for it if
| you do so. If the program breaks because the kernel
| interfaces changed, they'll tell us it's our own fault
| and refuse to fix it.
|
| Linux takes the opposite approach: breaking user space
| makes Linus Torvalds yell at people until the
| breakage is reverted. I'm enthusiastic about it because
| it's the only system where this is supported.
|
| > As for compatibility issues, you're running into that
| the moment you start doing undocumented fun stuff like
| omitting ELF sections or overlapping headers
|
| I agree. Should be fine as long as the ELF specification
| is respected. It's okay though, ELF is flexible enough
| that even in 2024 it's possible to invent some new fun
| stuff.
|
| https://www.matheusmoreira.com/articles/self-contained-lone-...
|
| Embedding arbitrary files into an existing ELF and
| patching it so that Linux automatically maps it in before
| the program even runs. Since Linux gives processes a
| pointer to the program headers, the file is in memory and
| reachable without issuing a single system call.
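For readers wondering where that pointer comes from: the kernel passes it as `AT_PHDR` (with `AT_PHNUM` entries) in the auxiliary vector. A freestanding program would read the auxv off the initial stack after `envp`; the sketch below uses glibc's `getauxval` for brevity and assumes a 64-bit process. The helper names are hypothetical, and this is not the linked article's code.

```c
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/auxv.h>

/* First program header of the given type in the running executable, or
   NULL. Assumes a 64-bit process. */
const Elf64_Phdr *find_own_phdr(Elf64_Word type)
{
    const Elf64_Phdr *ph = (const Elf64_Phdr *)getauxval(AT_PHDR);
    size_t n = (size_t)getauxval(AT_PHNUM);
    for (size_t i = 0; ph != NULL && i < n; i++)
        if (ph[i].p_type == type)
            return &ph[i];
    return NULL;
}

/* Runtime address of a segment's first byte. In a PIE, p_vaddr is a
   link-time address, so add the load bias: the runtime address of the
   header table (AT_PHDR) minus the p_vaddr recorded in PT_PHDR. Assumes
   the executable has a PT_PHDR entry, as dynamically linked ones do. */
const void *segment_address(const Elf64_Phdr *seg)
{
    const Elf64_Phdr *self = find_own_phdr(PT_PHDR);
    if (self == NULL || seg == NULL)
        return NULL;
    uintptr_t bias = (uintptr_t)getauxval(AT_PHDR) - (uintptr_t)self->p_vaddr;
    return (const void *)(bias + (uintptr_t)seg->p_vaddr);
}
```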
| EGreg wrote:
| An ELF, and almost in time for Christmas!
| compiler-guy wrote:
| If one properly specifies the input, output, and clobber
| constraints to the asm statement, there is no need for the
| volatile keyword in any of this.
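A sketch of the constraint-based style on x86-64 Linux, using `getpid` (syscall 39) because its only effect is its result. One caveat from GCC's extended-asm documentation: an asm with no outputs is implicitly volatile, but one with outputs may be deleted when those outputs are unused, so a `write` wrapper whose return value is ignored still wants `volatile`. `raw_getpid` is a hypothetical name.

```c
/* x86-64 Linux: getpid is syscall 39. No "volatile": the asm has an output
   and no other effect, so letting GCC merge or drop calls is harmless.
   getpid touches no user memory, so no "memory" clobber either; rcx and
   r11 are still clobbered by the syscall instruction. */
long raw_getpid(void)
{
    long ret;
    __asm__ ("syscall"
             : "=a"(ret)
             : "0"(39L)
             : "rcx", "r11");
    return ret;
}
```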
| boricj wrote:
| The Linux kernel source tree has nolibc [1], a header-only C
| standard library implementation that is about as barebones and
| paper-thin as it gets and is the next step up from a pure
| freestanding environment as shown in this article. I've used it
| to create a tiny but working program that prints out the ASCII
| table [2] as part of my Ghidra extension test suite.
|
| [1]
| https://github.com/torvalds/linux/tree/master/tools/include/...
|
| [2] https://github.com/boricj/ghidra-delinker-extension/tree/mas...
| Retr0id wrote:
| I haven't done a proper write-up yet but this is my current
| technique for emitting minimal ELF files written in freestanding
| C:
|
| 1. hand-written minimal ELF headers, with enough asm to do
| `_exit(main(argc, argv))`:
| https://github.com/DavidBuchanan314/kurl/blob/main/golfed/el...
| (currently only implemented for aarch64)
|
| 2. "Linux Syscall Support" library for conveniently making raw
| syscalls from C: https://chromium.googlesource.com/linux-syscall-support/
|
| 3. To avoid custom linker scripts (which I hate with a passion),
| I embed my hand-crafted ELF within a regular ELF, and slice it
| out at the end (using a python script). The "container" ELF is a
| regular full-fat ELF, potentially including working debug
| symbols, but the inner ELF has none of the cruft.
|
| Using this technique, I wrote a barely-functional TLS1.3 client
| that fits in ~3.5KB (see the rest of repo from the first link)
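The slicing step is done with a Python script that isn't shown here. Purely as an illustration in C, this sketch assumes the inner image was placed in its own named section of the container ELF (the name is whatever you chose), reads the file, and returns a pointer to that section's bytes, which can then be written out with `fwrite`. Both helpers are hypothetical; only 64-bit ELF is handled.

```c
#include <elf.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Reads a whole file into a malloc'd buffer; returns NULL on failure. */
unsigned char *load_file(const char *path, size_t *len)
{
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return NULL;
    size_t cap = 1 << 16, n = 0;
    unsigned char *buf = malloc(cap);
    while (buf != NULL) {
        n += fread(buf + n, 1, cap - n, f);
        if (n < cap)
            break;                          /* EOF or read error */
        unsigned char *bigger = realloc(buf, cap *= 2);
        if (bigger == NULL) { free(buf); buf = NULL; break; }
        buf = bigger;
    }
    fclose(f);
    *len = n;
    return buf;
}

/* Finds a section by name in an in-memory ELF64 image and returns a
   pointer to its bytes (size in *size), or NULL. */
const unsigned char *find_section(const unsigned char *img, size_t len,
                                  const char *name, size_t *size)
{
    Elf64_Ehdr eh;
    if (len < sizeof eh)
        return NULL;
    memcpy(&eh, img, sizeof eh);
    if (memcmp(eh.e_ident, ELFMAG, SELFMAG) != 0 ||
        eh.e_ident[EI_CLASS] != ELFCLASS64 ||
        eh.e_shoff == 0 || eh.e_shstrndx >= eh.e_shnum ||
        eh.e_shoff > len ||
        (size_t)eh.e_shnum * sizeof(Elf64_Shdr) > len - eh.e_shoff)
        return NULL;

    Elf64_Shdr names;                       /* section-name string table */
    memcpy(&names, img + eh.e_shoff + eh.e_shstrndx * sizeof names,
           sizeof names);
    size_t want = strlen(name) + 1;         /* compare the NUL as well */

    for (size_t i = 0; i < eh.e_shnum; i++) {
        Elf64_Shdr sh;
        memcpy(&sh, img + eh.e_shoff + i * sizeof sh, sizeof sh);
        size_t at = names.sh_offset + sh.sh_name;
        if (at > len || want > len - at || memcmp(img + at, name, want) != 0)
            continue;
        if (sh.sh_type == SHT_NOBITS || sh.sh_offset > len ||
            sh.sh_size > len - sh.sh_offset)
            return NULL;                    /* no file bytes to slice */
        *size = sh.sh_size;
        return img + sh.sh_offset;
    }
    return NULL;
}
```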
| einpoklum wrote:
| 1. x86-64 assumed...
|
| 2. Why is it that exiting at the end of main() requires a system
| call? Wouldn't a `ret` instruction go "back" to someplace where
| the OS itself will do cleanup work?
| compiler-guy wrote:
| Not without libc doing the glue work.
|
| A return instruction from main hands things back to libc which
| does some cleanup and then makes this same syscall.
| boricj wrote:
| > Why is it that exiting at the end of main() requires a system
| call? Wouldn't a `ret` instruction go "back" to someplace where
| the OS itself will do cleanup work?
|
| Usually that's done by the C runtime library, but there isn't
| one there since this is a freestanding environment. Had the
| program not exited through a syscall (or entered an infinite
| loop), it would most likely crash after veering off the main()
| function.
| cesarb wrote:
| > Why is it that exiting at the end of main() requires a system
| call? Wouldn't a `ret` instruction go "back" to someplace where
| the OS itself will do cleanup work?
|
| The only way for execution to cross the barrier between "user
| space" and "kernel space" is through a system call or an
| interrupt (we won't speak of call gates). Even if the OS had
| put an address on the stack, so that the "ret" would go there
| after returning from main(), the code there would still need to
| do a system call to go back to the OS.
|
| While nowadays Linux has a shared page of code mapped on every
| process (the vDSO), that wasn't the case in the past; all code
| on the "user space" side had to come from either the executable
| itself, or a library it loaded. Given that, it's natural that
| it was left to the executable to call the "exit" system call at
| the end.
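A sketch of the glue libc normally supplies, assuming x86-64 Linux and GNU `as` syntax: the kernel leaves `argc` at `(%rsp)` with `argv` right above it and no return address, so after calling into C the stub must issue `exit` (syscall 60) itself. The symbol names are hypothetical; such a stub would be linked with something like `gcc -nostdlib -static -e hypothetical_entry`. The `and` is belt-and-braces, since the ABI already starts the process with a 16-byte-aligned stack, and the `call` then leaves the callee with the `rsp % 16 == 8` it expects.

```c
/* Hypothetical freestanding entry point for x86-64 Linux. */
__asm__(
    ".pushsection .text\n"
    ".globl hypothetical_entry\n"
    ".type hypothetical_entry, @function\n"
    "hypothetical_entry:\n"
    "    xor  %ebp, %ebp\n"          /* mark the outermost frame */
    "    mov  (%rsp), %rdi\n"        /* argc */
    "    lea  8(%rsp), %rsi\n"       /* argv */
    "    and  $-16, %rsp\n"          /* 16-byte aligned before the call */
    "    call hypothetical_c_main\n"
    "    mov  %eax, %edi\n"          /* exit status = C return value */
    "    mov  $60, %eax\n"           /* SYS_exit on x86-64 */
    "    syscall\n"
    "    hlt\n"                      /* not reached */
    ".popsection\n");

void hypothetical_entry(void);

/* Plays the role of main(); there is no libc for it to return into. */
int hypothetical_c_main(int argc, char **argv)
{
    (void)argv;
    return argc > 0 ? 0 : 1;
}
```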
| CaesarA wrote:
| I still don't understand how people were able to write software
| in the days when assembly was the only option for speedy
| execution.
| 6SixTy wrote:
| Keeping things pretty simple in project scope and hardware
| helps quite a lot
| akdas wrote:
| A while ago, I created an interactive explanation of the
| different parts of a minimal ELF file:
| https://scratchpad.avikdas.com/elf-explanation/elf-explanati...
|
| I wrote this page for my own compiler that I'm working on, but I
| think it would be a good complement to this article. Note that
| the page is not that great on mobile, the extra real estate on
| desktop really helps.
| josephcsible wrote:
| The custom entry points look wrong to me. Aren't they breaking
| the rules for stack alignment when calling functions?
| Specifically, rsp is supposed to be congruent to 8 mod 16 at
| the beginning of a function, and divisible by 16 right before
| a call instruction. The problem is that when execution starts
| at the entry point, rsp is divisible by 16, but because the
| entry point is written as a C function, the compiler will
| assume it's off by 8 from what it actually is.
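If the entry point has to stay in C, one way to stop GCC from trusting the incoming alignment is its x86 `force_align_arg_pointer` function attribute, which makes the prologue realign the stack. A minimal sketch; the function name is hypothetical, and the aligned local only stands in for the SSE spills that could otherwise fault on a misaligned stack.

```c
#include <stdint.h>

/* Hypothetical C-level entry routine. force_align_arg_pointer (GCC, x86)
   makes the prologue realign rsp instead of assuming rsp % 16 == 8. */
__attribute__((force_align_arg_pointer))
int hypothetical_c_entry(void)
{
    _Alignas(16) unsigned char scratch[16];  /* needs a 16-aligned frame */
    scratch[0] = 0;
    return (uintptr_t)scratch % 16 == 0 && scratch[0] == 0;
}
```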
| oguz-ismail wrote:
| Does it matter unless you're reading a float from varargs? What
| else can it break?
| josephcsible wrote:
| I don't know exactly what, but I know there is more than just
| that, because calling printf breaks with a misaligned stack
| even when you're not passing it any floating-point arguments.
| And even if it doesn't break anything for you today, you're
| basically committing UB by violating the compiler's
| assumptions.
| ptspts wrote:
| Aren't there GCC command-line flags to specify alignment
| assumptions?
| josephcsible wrote:
| Yes (see https://gcc.gnu.org/onlinedocs/gcc-14.2.0/gcc/x86-Options.ht...),
| but this article doesn't use them.
| fsmv wrote:
| This is from the SysV calling convention, not x86 itself. The
| CPU can do unaligned just fine. You don't have to use the
| calling convention when not calling out to a library.
| josephcsible wrote:
| You're right that it's not inherent to the architecture, but
| even if you're only calling your own code, if your own code
| is written in C, then GCC will assume it too, unless you use
| command-line arguments or attributes to tell it otherwise,
| neither of which is being done here.
| quotemstr wrote:
| Christ, why couldn't PE have won?
| boricj wrote:
| As in the Portable Executable file format? There are no tricks
| used in this article that rely on the specifics of ELF, unlike
| some more extreme examples [1] that abuse every trick in the
| book to shave off more bytes from executables.
|
| If anything, PE piggybacks on top of COFF which is a complete
| mess of a file format. I'm currently writing a standalone
| library for reading and writing toolchain file formats [2] (to
| replace some messy bespoke code in my Ghidra extension) and
| this under-specified, fragmented into multiple dialects,
| weirdly contorted relic is a pain to deal with.
|
| COFF was a stepping stone from a.out to ELF that should've
| lasted only a couple of years on Unix systems and somehow it
| managed to metastasize at a crucial point in time inside
| multiple software ecosystems, most notably Windows and
| indirectly .NET and UEFI through PE. Frankly, I'd ask instead
| why couldn't PE and COFF have lost.
|
| [1]
| https://nathanotterness.com/2021/10/tiny_elf_modernized.html
|
| [2] https://github.com/boricj/binary-file-toolkit
| ptspts wrote:
| For 32-bit x86 (i386 and i686), I've written a libc and a
| toolchain to automate this: https://github.com/pts/minilibc686 .
| It can use mainstream free C compilers (GCC, Clang, OpenWatcom
| cc386, TinyCC and PCC) and assemblers (GNU as and NASM) out of
| the box.
|
| A printf-hello-world is about 1 KiB. A write-hello-world
| (syscalls only) is less than 200 bytes. Assembly programming
| skills not needed to use it.
___________________________________________________________________
(page generated 2024-12-26 23:00 UTC)