[HN Gopher] How is a binary executable organized? Let's explore ...
___________________________________________________________________
How is a binary executable organized? Let's explore it (2014)
Author : tripdout
Score : 284 points
Date : 2024-02-02 17:43 UTC (5 hours ago)
(HTM) web link (jvns.ca)
(TXT) w3m dump (jvns.ca)
| jstrieb wrote:
| As I've said in other threads
| https://news.ycombinator.com/item?id=38847750#38862450, I highly
| recommend writing an ELF by hand at least once. It's a great
| exercise to understand the basic parts of an executable. It's
| also helpful if you want to go the opposite direction of this
| article - bottom-up instead of top-down.
|
| Lots of other great discussion in various threads on that other
| HN post.
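For anyone who wants to try the "write an ELF by hand" exercise, here is a minimal sketch in Python. It assumes x86-64 Linux, a single PT_LOAD segment covering the whole file, and a load address of 0x400000; the function name build_minimal_elf is made up for illustration, and none of this comes from the linked thread.

```python
import struct

def build_minimal_elf(exit_code: int = 42) -> bytes:
    """Assemble a tiny static ELF64 image: ELF header + one program header + code."""
    # Machine code (x86-64 Linux assumed): exit(exit_code) via the raw syscall.
    code = (
        b"\xbf" + struct.pack("<I", exit_code)  # mov edi, exit_code
        + b"\xb8" + struct.pack("<I", 60)       # mov eax, 60 (SYS_exit on x86-64)
        + b"\x0f\x05"                           # syscall
    )
    base = 0x400000                   # assumed load address
    ehsize, phentsize = 64, 56        # sizes of Elf64_Ehdr and Elf64_Phdr
    entry = base + ehsize + phentsize # code sits right after the two headers
    total = ehsize + phentsize + len(code)

    # e_ident: magic, 64-bit class, little-endian, version 1, System V ABI, padding.
    ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        ident,
        2,          # e_type: ET_EXEC
        0x3E,       # e_machine: x86-64
        1,          # e_version
        entry,      # e_entry
        ehsize,     # e_phoff: program headers follow the ELF header
        0,          # e_shoff: no section headers
        0,          # e_flags
        ehsize,     # e_ehsize
        phentsize,  # e_phentsize
        1,          # e_phnum
        0, 0, 0,    # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,       # p_type: PT_LOAD
        5,       # p_flags: read + execute
        0,       # p_offset: map the file from its first byte
        base,    # p_vaddr
        base,    # p_paddr
        total,   # p_filesz
        total,   # p_memsz
        0x1000,  # p_align
    )
    return ehdr + phdr + code
```

Writing the returned bytes to a file, marking it executable, and running it on x86-64 Linux is intended to give exit status 42. Other platforms would need different machine code and header fields.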
| Retr0id wrote:
| Similarly, I'd recommend writing a simple ELF loader. There's a
| fair bit of implementation complexity in dynamic linking, but
| if you only support static ELFs then it's straightforward.
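The first half of a static-only loader can be sketched as follows: walk the program headers and work out which bytes belong at which virtual address. This assumes a 64-bit little-endian ELF. The name load_segments is hypothetical, and the sketch stops short of actually mapping memory or jumping to the entry point.

```python
import struct

PT_LOAD = 1  # program header type for loadable segments

def load_segments(image: bytes):
    """Return (vaddr, flags, data) for each PT_LOAD segment of an ELF64 LE image."""
    if image[:4] != b"\x7fELF" or image[4] != 2 or image[5] != 1:
        raise ValueError("expected a 64-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<Q", image, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)

    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = struct.unpack_from(
            "<IIQQQQQQ", image, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        if p_memsz < p_filesz:
            raise ValueError("segment has p_memsz < p_filesz")
        data = image[p_offset:p_offset + p_filesz]
        # Memory beyond the file-backed part (e.g. .bss) is zero-filled.
        data += bytes(p_memsz - p_filesz)
        segments.append((p_vaddr, p_flags, data))
    return segments
```

A real loader would then map each segment at its p_vaddr with permissions derived from p_flags, set up a stack, and jump to e_entry.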
| mkoubaa wrote:
| I've seriously considered writing an ELF loader that uses a
| special symbol (like _resolve) where dynamic library
| resolving is done imperatively. The flexibility from libdl
| always feels underwhelming.
| Isamu wrote:
| Yes, likewise I wrote a reader that simply tried to parse
| every bit of a complex ELF binary to report its structure and
| quickly found myself in poorly documented territory. It's an
| education if you want it.
| akdas wrote:
| Writing an ELF file by hand is something I did recently:
| https://github.com/avik-das/garlic/blob/master/recursive/elf...
|
| To explain the format to myself and others, I also created an
| interactive visualization for the bytes in the file. It helps
| me to click on a byte, see an explanation for it and see
| related bytes elsewhere in the file highlighted.
| https://scratchpad.avikdas.com/elf-explanation/elf-explanati...
| sva_ wrote:
| > When the program starts running, you might think it starts at
| main. It doesn't! It actually goes to _start. This does a bunch
| of Very Important Things that I don't understand very well,
| including calling main. So I won't explain them.
|
| The way I understand it, the symbol main is a C-specific thing.
| The symbol _start is a language-agnostic entry point for the
| binary that will in this case call main.
|
| A convention of, e.g., calling the entry point _start with main's
| argc/argv would make the format a lot less flexible.
| a-priori wrote:
| Technically the name _start is not special either. The binary
| lists its entry point address in a header and that's where the
| OS starts execution from. That symbol is just called _start by
| convention by C and other languages, which is what the linker
| uses to set the entry point when writing the ELF headers, but
| if you're writing your own linker scripts you could call the
| entry point whatever you want.
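To make the "entry point is just an address in the header" point concrete, here is a small sketch that reads e_entry straight from the first bytes of a file. The helper name entry_point is made up; the field offsets follow the standard ELF header layout.

```python
import struct

def entry_point(header: bytes) -> int:
    """Read e_entry from the leading bytes of an ELF file (32- or 64-bit)."""
    if header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = header[4] == 2                   # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if header[5] == 1 else ">"  # EI_DATA: 1 = little, 2 = big
    # e_entry sits at offset 24 in both classes; only its width differs.
    fmt = endian + ("Q" if is_64 else "I")
    (e_entry,) = struct.unpack_from(fmt, header, 24)
    return e_entry
```

Something like entry_point(open(path, "rb").read(64)) should agree with the "Entry point address" field that readelf --file-header reports, whatever symbol name the linker used to pick that address.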
| Keyframe wrote:
| to extend on it, _start is where .text begins and address of
| that is set by linker
| a-priori wrote:
| The entry point can be anywhere in the .text section, and
| often won't be at the beginning of the section.
| Retr0id wrote:
| Technically it doesn't even need to be in the .text
| section, it could be anywhere in the address space.
| You'll get a segfault if it's not somewhere executable
| though (assuming you're on a system with an appropriately
| configured MMU)
| Keyframe wrote:
| yes and then you'll have a bad time, but at the same time
| per convention _start is where .text begins. You can see
| where it starts with readelf --file-header <executable>
| and look at Entry point address field. You can change it,
| yes.
| logdahl wrote:
| A common hack to reduce ELF size is actually to start the
| first section (possibly the .text) right on the ELF
| header, as this circumvents the alignment requirements.
| Keyframe wrote:
| probably not even mandatory.. lots of /usr/bin stuff on
| my ubuntu machine have __libc_start_main only
| Retr0id wrote:
| No, it's not even a convention, _start is most commonly
| _not_ where .text begins.
|
| Compiling a static hello world binary on my system
| (aarch64 fedora 39, gcc -static hello.c -o hello), .text
| starts at 0x410080, e_entry is at 0x4103c0, and the
| _start symbol is also at 0x4103c0. This is not unusual at
| all.
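Plugging in the addresses quoted above shows how far into .text the entry point sits in that particular build. The variable names are just labels for the numbers in the comment.

```python
text_start = 0x410080  # start of .text reported above
e_entry = 0x4103c0     # entry point (and _start symbol) reported above

offset = e_entry - text_start
print(hex(offset), offset)  # 0x340 832
```

So in that binary, _start is 832 bytes past the beginning of .text.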
| vlovich123 wrote:
| > Things that I don't understand very well, including calling
| main. So I won't explain them.
|
| It depends on the language runtime, but a common task will be
| initializing global non-0 statics. For languages like
| Rust/C/C++ you can also inject variables to be initialized via
| linker flags. Before _start, if the program is dynamically linked,
| I believe the dynamic linker runs first to resolve the links
| and then transfers control to _start.
|
| Basically hacks on hacks on hacks added organically to offer
| extensibility and the hacks have enough social adoption and are
| good enough that we stick with them.
| seanw444 wrote:
| > Basically hacks on hacks on hacks added organically to
| offer extensibility and the hacks have enough social adoption
| and are good enough that we stick with them.
|
| The more I learn about the deep depths of modern computing,
| the more I realize that they're actually _full_ of inelegant
| legacy cruft.
| kccqzy wrote:
| The style guide at both my previous and current employers
| explicitly forbids having global non-0 statics for this exact
| reason: code that runs before main() is very unusual. Many
| assumptions do not hold.
|
| A far better way is to use function-local statics. A static
| variable inside a function is initialized when execution
| reaches that point when the function is being called.
| Furthermore, such initialization is thread safe so that one
| initialization happens despite multiple concurrent calls of
| the function.
|
| The only exception to that style guide rule is the new
| constinit in C++20. It is sometimes called linker-initialized
| to make it even clearer that the program didn't do anything
| to initialize it, the linker did.
| cesarb wrote:
| > Furthermore, such initialization is thread safe so that
| one initialization happens despite multiple concurrent
| calls of the function.
|
| IIRC, there are some popular compilers in which
| initialization of static variables inside functions is not
| thread safe (even though AFAIK the C++ standard said they
| should be).
| vlovich123 wrote:
| I'm not aware of this problem in MSVC, clang and gcc and
| those are the most popular afaik.
| secondcoming wrote:
| The issue with Meyers Singletons is that every time they're
| accessed a flag must be checked first. This is bad in hot
| loops.
| zerotolerance wrote:
| Julia's articles are always excellent. I've always had great
| results teaching people that compiled code doesn't keep secrets
| by demoing `strings`.
| actionfromafar wrote:
| Can you elaborate?
| shzhdbi09gv8ioi wrote:
| man strings
| dilyevsky wrote:
| If you put something like
|
|     if mySecretPassword == "Qwerty123" { ...
|
| then "Qwerty123" will be easily seen by strings utility.
| Which is pretty obvious but I'm guessing some junior folks
| will be surprised.
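A rough approximation of what the strings utility does fits in a few lines. This sketch only looks for runs of printable ASCII of at least four characters, which is a simplification of the real tool; the function name and the sample bytes are made up for illustration.

```python
import re

def strings(data: bytes, min_len: int = 4):
    """Return runs of printable ASCII at least min_len bytes long."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]

# A fake "binary": non-printable bytes around an embedded string constant.
blob = b"\x00\x01\x7fELF\x00if\x00Qwerty123\x00\xff\xfeabc\x00"
print(strings(blob))  # ['Qwerty123']
```

The compiler stores the literal byte for byte in the output file, so nothing more than a scan like this is needed to find it.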
| xutopia wrote:
| You can run the `strings` command on most executables (or
| PDFs) and get an output of the strings represented in the
| file. Of course you can obfuscate some of those strings if
| you do things right but a lot of people who don't know about
| `strings` could write a password protected feature in a
| compiled bit of code and be embarrassed to see how easy it is
| to find out what the password is.
| zerotolerance wrote:
| The other replies are pretty good. You can find all sorts of
| goodies in string data inside a binary: hostnames, URL
| fragments, error messages or templates, credentials. Pretty
| much any string constants that a program might use.
| sva_ wrote:
| Explain that to the German judges that fined some poor fella
| for finding passwords in a binary by [doing the equivalent of]
| running strings on it. They claim he 'circumvented' the
| software's security measures.
|
| https://www.theregister.com/2024/01/19/germany_fine_security...
| actionfromafar wrote:
| Not a criticism, not even a nit-pick, but a reflection
|
| _"(binaries are kind of the definition of platform-specific, so
| this is all platform-specific)"_ (this is true!)
|
| When "Actually Portable Executable" proved that
| the same binary could run on a bunch of platforms, that was a
| surreal moment I still haven't mentally recovered from.
|
| Here we spent decades trying to solve the cross-platform problem,
| in so many fractals of ways (Java, cross-platform libraries, etc
| etc) and the solution was right under our noses all this time.
| norir wrote:
| I personally am not convinced that portable binaries are a net
| positive. I believe in the era of fast computers that source
| distribution and local compilation is superior to binary
| distribution. Unfortunately, much of the software we rely on is
| so large, and compilers so relatively slow, that binary
| distribution is something of a necessary evil. I'd rather see
| more effort towards simpler software components (that naturally
| compile fast) and faster compilers than portable binaries.
| csdvrx wrote:
| You can have both. APE are generally faster and smaller.
|
| Fat APEs (aarch64 + x86_64) are larger, but interesting in
| their own way.
| lisper wrote:
| > Executables aren't magic.
|
| Nothing in a computer is magic. It was all designed by humans,
| every single one of whom was once a clueless noob. No one is
| born understanding this stuff.
| JoshuaRogers wrote:
| > This does a bunch of Very Important Things that I don't
| understand very well, including calling main. So I won't
| explain them.
|
| Honestly, this line was the best in the whole article. It felt
| like at that moment I knew the person talking to me wasn't
| trying to prove that they were some sage (personally guilty
| here) but instead was someone who wanted to show me
| something cool that we could both enjoy.
|
| Wonderful write up.
| infinite8s wrote:
| "It is no exaggeration to regard this as the most fundamental
| idea in programming: The evaluator, which determines the
| meaning of expressions on a piece of paper, is just another
| piece of paper." --SICP
| latexr wrote:
| > Nothing in a computer is magic.
|
| I think that's covered by the text, in the sentence right after
| that one (emphasis mine):
|
| > ELF is a file format _like any other_!
| olsher wrote:
| The actual /behavior/ of computers, though, tends to emerge
| from the confluence of complex processes that humans /can't/
| understand...our AGI leverages this emergence to enable problem
| solving in domains where complexity exceeds human capabilities.
| fragmede wrote:
| cat-ing a binary to the terminal is a recipe for sadness. I like
| piping it to hd (`| hd`), which is hexdump -C, though that's just as
| impenetrable to the naked eye.
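For reference, the layout that hexdump -C prints (offset, sixteen hex bytes in two groups, then an ASCII column) can be sketched like this. It is an approximation: the real tool also collapses repeated lines into a "*", which this version does not do, and the name hexdump_c is made up.

```python
def hexdump_c(data: bytes) -> str:
    """Format bytes roughly the way `hexdump -C` does, without '*' squeezing."""
    lines = []
    for off in range(0, len(data), 16):
        chunk = data[off:off + 16]
        left = " ".join(f"{b:02x}" for b in chunk[:8])
        right = " ".join(f"{b:02x}" for b in chunk[8:])
        # Each half is padded to 23 columns so a short last line stays aligned.
        hex_part = f"{left:<23}  {right:<23}"
        text = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        lines.append(f"{off:08x}  {hex_part}  |{text}|")
    lines.append(f"{len(data):08x}")  # final line: total length as an offset
    return "\n".join(lines)

# The 16-byte e_ident of a 64-bit little-endian ELF.
print(hexdump_c(b"\x7fELF\x02\x01\x01\x00" + bytes(8)))
```

The ASCII column on the right is what makes the "\x7fELF" magic easy to spot at the top of a dump.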
| heinrichhartman wrote:
| I started my blog in 2012, when I shifted my academic career from
| Mathematics to Computer Science. This topic was literally the
| first thing that I studied:
|
| https://heinrichhartmann.com/archive/Dissecting-Hello-World....
|
| Never regretted going down this deep rabbit hole. IIRC, Julia
| also has a math background. Maybe it's the desire for bottom-up
| reasoning that leads math folks towards experiments like this.
| Great to see her making this approachable for a large audience.
| sergejf wrote:
| The format of executable files fascinated me back in the early
| 90s, to the point that I spent weeks writing (in Modula 2) a DOS
| and Windows executable file viewer that I named VEXE, releasing
| it as shareware in 1991.
|
| It found a niche following among crackers, even deserving a
| mention in a +ORC tutorial,
| https://gist.github.com/callowaysutton/48bdf0245e17e72d41a15...,
| probably because it could detect various encryption and
| compression methods used to prevent the reverse engineering of
| those programs.
| randall wrote:
| Amazingly helpful!
| adolph wrote:
| For folks interested in this topic who have not seen Cosmopolitan
| and RedBean, Actually Portable Executable (2020) is a great read
| too: https://justine.lol/ape.html
|
| https://redbean.dev/
| as1009 wrote:
| Great thread
| as1009 wrote:
| great thread, thank you!
___________________________________________________________________
(page generated 2024-02-02 23:00 UTC)