[HN Gopher] How is a binary executable organized? Let's explore ...
       ___________________________________________________________________
        
       How is a binary executable organized? Let's explore it (2014)
        
       Author : tripdout
       Score  : 284 points
       Date   : 2024-02-02 17:43 UTC (5 hours ago)
        
 (HTM) web link (jvns.ca)
 (TXT) w3m dump (jvns.ca)
        
       | jstrieb wrote:
       | As I've said in other threads
       | https://news.ycombinator.com/item?id=38847750#38862450, I highly
       | recommend writing an ELF by hand at least once. It's a great
       | exercise to understand the basic parts of an executable. It's
       | also helpful if you want to go the opposite direction of this
       | article - bottom-up instead of top-down.
       | 
       | Lots of other great discussion in various threads on that other
       | HN post.
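For anyone who wants to try the exercise, here is a minimal sketch of a hand-assembled ELF, written as a Python script that packs the bytes one field at a time. It assumes x86-64 Linux, a load address of 0x400000, and a 12-byte program that only calls exit(42). The names build_elf and write_executable are made up for illustration.

```python
import os
import struct

BASE = 0x400000                 # assumed load address (a common choice, not a requirement)
EHDR_SIZE, PHDR_SIZE = 64, 56   # sizes of the ELF64 file header and one program header

# Machine code for exit(42) using the Linux x86-64 syscall convention.
CODE = bytes([
    0xB8, 0x3C, 0x00, 0x00, 0x00,   # mov eax, 60  (SYS_exit)
    0xBF, 0x2A, 0x00, 0x00, 0x00,   # mov edi, 42  (exit status)
    0x0F, 0x05,                     # syscall
])

def build_elf() -> bytes:
    """Return the bytes of a tiny static ELF64 executable."""
    entry = BASE + EHDR_SIZE + PHDR_SIZE          # code sits right after the headers
    total = EHDR_SIZE + PHDR_SIZE + len(CODE)
    # Magic, 64-bit class, little-endian, ident version 1, SysV ABI, then padding.
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)
    ehdr = struct.pack(
        "<16sHHIQQQIHHHHHH",
        e_ident,
        2,           # e_type: ET_EXEC
        62,          # e_machine: EM_X86_64
        1,           # e_version
        entry,       # e_entry: where execution starts
        EHDR_SIZE,   # e_phoff: program headers follow the file header
        0,           # e_shoff: no section headers at all
        0,           # e_flags
        EHDR_SIZE,   # e_ehsize
        PHDR_SIZE,   # e_phentsize
        1,           # e_phnum
        0, 0, 0,     # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        1,           # p_type: PT_LOAD
        5,           # p_flags: PF_R | PF_X
        0,           # p_offset: map from the start of the file
        BASE,        # p_vaddr
        BASE,        # p_paddr
        total,       # p_filesz
        total,       # p_memsz
        0x1000,      # p_align
    )
    return ehdr + phdr + CODE

def write_executable(path: str) -> None:
    """Write the image to disk and mark it executable."""
    with open(path, "wb") as f:
        f.write(build_elf())
    os.chmod(path, 0o755)
```

Calling write_executable("tiny") and then running ./tiny followed by echo $? is intended to print 42 on an x86-64 Linux machine; readelf -h tiny shows the header fields that were packed above.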
        
         | Retr0id wrote:
         | Similarly, I'd recommend writing a simple ELF loader. There's a
         | fair bit of implementation complexity in dynamic linking, but
         | if you only support static ELFs then it's straightforward.
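As a rough illustration of the first half of a static loader, the sketch below walks the program headers and lists the PT_LOAD segments. It assumes a little-endian ELF64 file; loadable_segments is a hypothetical helper name. A real loader would then copy filesz bytes from offset to vaddr, zero-fill up to memsz, apply the permissions, and jump to the entry point.

```python
import struct

PT_LOAD = 1   # program header type for segments that must be mapped

def loadable_segments(data: bytes) -> list:
    """List the PT_LOAD program headers of a little-endian ELF64 image."""
    if data[:4] != b"\x7fELF" or data[4] != 2 or data[5] != 1:
        raise ValueError("not a little-endian ELF64 file")
    (e_phoff,) = struct.unpack_from("<Q", data, 32)           # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    segments = []
    for i in range(e_phnum):
        (p_type, p_flags, p_offset, p_vaddr, _p_paddr,
         p_filesz, p_memsz, _p_align) = struct.unpack_from(
            "<IIQQQQQQ", data, e_phoff + i * e_phentsize)
        if p_type != PT_LOAD:
            continue
        # PF_R = 4, PF_W = 2, PF_X = 1
        flags = "".join(ch if p_flags & bit else "-"
                        for ch, bit in (("r", 4), ("w", 2), ("x", 1)))
        segments.append({
            "offset": p_offset,   # where the bytes live in the file
            "vaddr": p_vaddr,     # where they should appear in memory
            "filesz": p_filesz,   # bytes to copy from the file
            "memsz": p_memsz,     # total size in memory (the rest is zero-filled)
            "flags": flags,
        })
    return segments
```

Something like loadable_segments(open("hello", "rb").read()) on a statically linked binary should roughly match the LOAD lines printed by readelf -l.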
        
           | mkoubaa wrote:
           | I've seriously considered writing an ELF loader that uses a
           | special symbol (like _resolve) where dynamic library
           | resolving is done imperatively. The flexibility from libdl
           | always feels underwhelming.
        
           | Isamu wrote:
           | Yes, likewise I wrote a reader that simply tried to parse
           | every bit of a complex ELF binary to report its structure and
           | quickly found myself in poorly documented territory. It's an
           | education if you want it.
        
         | akdas wrote:
         | Writing an ELF file by hand is something I did recently:
         | https://github.com/avik-das/garlic/blob/master/recursive/elf...
         | 
         | To explain the format to myself and others, I also created an
         | interactive visualization for the bytes in the file. It helps
         | me to click on a byte, see an explanation for it and see
         | related bytes elsewhere in the file highlighted.
         | https://scratchpad.avikdas.com/elf-explanation/elf-explanati...
        
       | sva_ wrote:
       | > When the program starts running, you might think it starts at
       | main. It doesn't! It actually goes to _start. This does a bunch
       | of Very Important Things that I don't understand very well,
       | including calling main. So I won't explain them.
       | 
       | The way I understand it, the symbol main is a C-specific thing.
       | The symbol _start is a language-agnostic entry point for the
       | binary that will in this case call main.
       | 
       | A convention of e.g. calling the entry point _start with main's
       | argc/argv would make the format a lot less flexible.
        
         | a-priori wrote:
         | Technically the name _start is not special either. The binary
         | lists its entry point address in a header and that's where the
         | OS starts execution from. That symbol is just called _start by
         | convention by C and other languages, which is what the linker
         | uses to set the entry point when writing the ELF headers, but
         | if you're writing your own linker scripts you could call the
         | entry point whatever you want.
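To make the point about the header concrete, here is a small sketch that reads the entry point address (the e_entry field) straight out of the ELF header, with no symbol table involved. The function name entry_point is made up for this example; the field offset of 24 bytes is the same for 32-bit and 64-bit ELF files.

```python
import struct

def entry_point(data: bytes) -> int:
    """Return e_entry from the first bytes of an ELF file."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    is_64 = data[4] == 2                     # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"    # EI_DATA: 1 = little-endian, 2 = big-endian
    # e_entry follows e_ident (16 bytes), e_type (2), e_machine (2), e_version (4).
    fmt = endian + ("Q" if is_64 else "I")
    return struct.unpack_from(fmt, data, 24)[0]
```

For example, hex(entry_point(open("/bin/ls", "rb").read(64))) should agree with the "Entry point address" line of readelf -h /bin/ls, whatever the symbol at that address happens to be called.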
        
           | Keyframe wrote:
           | to extend on it, _start is where .text begins and address of
           | that is set by linker
        
             | a-priori wrote:
             | The entry point can be anywhere in the .text section, and
             | often won't be at the beginning of the section.
        
               | Retr0id wrote:
               | Technically it doesn't even need to be in the .text
               | section, it could be anywhere in the address space.
               | You'll get a segfault if it's not somewhere executable
               | though (assuming you're on a system with an appropriately
               | configured MMU)
        
               | Keyframe wrote:
               | yes and then you'll have a bad time, but at the same time
               | per convention _start is where .text begins. You can see
               | where it starts with readelf --file-header <executable>
               | and look at Entry point address field. You can change it,
               | yes.
        
               | logdahl wrote:
               | A common hack to reduce ELF size is actually to start the
               | first section (possibly the .text) right on the elf
               | header, as this circumvents the alignment requirements.
        
               | Keyframe wrote:
               | probably not even mandatory.. lots of /usr/bin stuff on
               | my ubuntu machine have __libc_start_main only
        
               | Retr0id wrote:
               | No, it's not even a convention, _start is most commonly
               | _not_ where .text begins.
               | 
               | Compiling a static hello world binary on my system
               | (aarch64 fedora 39, gcc -static hello.c -o hello), .text
               | starts at 0x410080, e_entry is at 0x4103c0, and the
               | _start symbol is also at 0x4103c0. This is not unusual at
               | all.
        
         | vlovich123 wrote:
         | > Things that I don't understand very well, including calling
         | main. So I won't explain them.
         | 
         | It depends on the language runtime, but a common task will be
         | initializing global non-0 statics. For languages like
         | Rust/C/C++ you can also inject variables to be initialized via
         | linker flags. Before _start, if the program is dynamically
         | linked, I believe the dynamic linker runs first to resolve the
         | links and then transfers control to _start.
         | 
         | Basically hacks on hacks on hacks added organically to offer
         | extensibility and the hacks have enough social adoption and are
         | good enough that we stick with them.
        
           | seanw444 wrote:
           | > Basically hacks on hacks on hacks added organically to
           | offer extensibility and the hacks have enough social adoption
           | and are good enough that we stick with them.
           | 
           | The more I learn about the deep depths of modern computing,
           | the more I realize that they're actually _full_ of inelegant
           | legacy cruft.
        
           | kccqzy wrote:
           | The style guide at both my previous and current employers
           | explicitly forbids having global non-0 statics for this exact
           | reason: code that runs before main() is very unusual. Many
           | assumptions do not hold.
           | 
           | A far better way is to use function-local statics. A static
           | variable inside a function is initialized when execution
           | reaches that point when the function is being called.
           | Furthermore, such initialization is thread safe so that one
           | initialization happens despite multiple concurrent calls of
           | the function.
           | 
           | The only exception to that style guide rule is the new
           | constinit in C++20. It is sometimes called linker-initialized
           | to make it even clearer that the program didn't do anything
           | to initialize it, the linker did.
        
             | cesarb wrote:
             | > Furthermore, such initialization is thread safe so that
             | one initialization happens despite multiple concurrent
             | calls of the function.
             | 
             | IIRC, there are some popular compilers in which
             | initialization of static variables inside functions is not
             | thread safe (even though AFAIK the C++ standard said they
             | should be).
        
               | vlovich123 wrote:
               | I'm not aware of this problem in MSVC, clang and gcc and
               | those are the most popular afaik.
        
             | secondcoming wrote:
             | The issue with Meyers Singletons is that every time they're
             | accessed a flag must be checked first. This is bad in hot
             | loops.
        
       | zerotolerance wrote:
       | Julia's articles are always excellent. I've always had great
       | results teaching people that compiled code doesn't keep secrets
       | by demoing `strings`.
        
         | actionfromafar wrote:
         | Can you elaborate?
        
           | shzhdbi09gv8ioi wrote:
           | man strings
        
           | dilyevsky wrote:
           | If you put something like
           | 
           |     if mySecretPassword == "Qwerty123" {
           |         ...
           | 
           | then "Qwerty123" will be easily seen by strings utility.
           | Which is pretty obvious but I'm guessing some junior folks
           | will be surprised.
        
           | xutopia wrote:
           | You can run the `strings` command on most executables (or
           | PDFs) and get an output of the strings represented in the
           | file. Of course you can obfuscate some of those strings if
           | you do things right but a lot of people who don't know about
           | `strings` could write a password protected feature in a
           | compiled bit of code and be embarrassed to see how easy it is
           | to find out what the password is.
        
           | zerotolerance wrote:
           | The other replies are pretty good. You can find all sorts of
           | goodies in string data inside a binary: hostnames, URL
           | fragments, error messages or templates, credentials. Pretty
           | much any string constants that a program might use.
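For anyone without the tool at hand, the idea behind `strings` fits in a few lines. This is only a sketch that approximates the default behavior (runs of at least four printable ASCII characters), not a reimplementation of the real utility; printable_strings is a made-up name.

```python
import re

def printable_strings(data: bytes, min_len: int = 4) -> list:
    """Return runs of printable ASCII characters at least min_len long."""
    # 0x20..0x7e covers space through tilde, i.e. printable ASCII.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [m.group().decode("ascii") for m in pattern.finditer(data)]
```

Running printable_strings(open("a.out", "rb").read()) on a compiled program will typically surface any literal such as "Qwerty123" alongside error messages and library names.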
        
         | sva_ wrote:
         | Explain that to the German judges that fined some poor fella
         | for finding passwords in a binary by [doing the equivalent of]
         | running strings on it. They claim he 'circumvented' the
         | software's security measures.
         | 
         | https://www.theregister.com/2024/01/19/germany_fine_security...
        
       | actionfromafar wrote:
       | Not a criticism, not even a nit-pick, but a reflection
       | 
       |  _"(binaries are kind of the definition of platform-specific, so
       | this is all platform-specific)"_ (this is true!)
       | 
       | When "Actually Portable Executable" proved to the (geek) world that
       | the same binary could run on a bunch of platforms, that was a
       | surreal moment I still haven't mentally recovered from.
       | 
       | Here we spent decades trying to solve the cross-platform problem,
       | in so many fractals of ways (Java, cross-platform libraries, etc
       | etc) and the solution was right under our noses all this time.
        
         | norir wrote:
         | I personally am not convinced that portable binaries are a net
         | positive. I believe in the era of fast computers that source
         | distribution and local compilation is superior to binary
         | distribution. Unfortunately, much of the software we rely on is
         | so large, and compilers so relatively slow, that binary
         | distribution is something of a necessary evil. I'd rather see
         | more effort towards simpler software components (that naturally
         | compile fast) and faster compilers than portable binaries.
        
           | csdvrx wrote:
           | You can have both. APE are generally faster and smaller.
           | 
           | Fat APEs (aarch64 + x86_64) are larger, but interesting in
           | their own way.
        
       | lisper wrote:
       | > Executables aren't magic.
       | 
       | Nothing in a computer is magic. It was all designed by humans,
       | every single one of whom was once a clueless noob. No one is
       | born understanding this stuff.
        
         | JoshuaRogers wrote:
         | > This does a bunch of Very Important Things that I don't
         | understand very well, including calling main. So I won't
         | explain them.
         | 
         | Honestly, this line was the best in the whole article. It felt
         | like at that moment I knew the person talking to me wasn't
         | trying to prove that they were some sage (personally guilty
         | here) but instead was someone who wanted to show me
         | something cool that we could both enjoy.
         | 
         | Wonderful write up.
        
         | infinite8s wrote:
         | "It is no exaggeration to regard this as the most fundamental
         | idea in programming: The evaluator, which determines the
         | meaning of expressions on a piece of paper, is just another
         | piece of paper." --SICP
        
         | latexr wrote:
         | > Nothing in a computer is magic.
         | 
         | I think that's covered by the text, in the sentence right after
         | that one (emphasis mine):
         | 
         | > ELF is a file format _like any other_!
        
         | olsher wrote:
         | The actual /behavior/ of computers, though, tends to emerge
         | from the confluence of complex processes that humans /can't/
         | understand...our AGI leverages this emergence to enable problem
         | solving in domains where complexity exceeds human capabilities.
        
       | fragmede wrote:
       | cat-ing a binary to the terminal is a recipe for sadness. I
       | like `| hd`, which is hexdump -C, though that's just as
       | impenetrable to the naked eye.
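For reference, the kind of view that hexdump -C gives can be sketched as below: an offset, the bytes in hex, and a printable-ASCII column. This is in the spirit of that output rather than a byte-for-byte match (the real tool groups the hex column differently), and the function name hexdump here is just a label for the example.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    """Format bytes as offset, hex column, and ASCII column."""
    pad = width * 3 - 1          # width of a full hex column, used to align short rows
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hex_part = " ".join(f"{b:02x}" for b in chunk)
        # Non-printable bytes are shown as dots, as hexdump -C does.
        text = "".join(chr(b) if 0x20 <= b <= 0x7e else "." for b in chunk)
        lines.append(f"{off:08x}  {hex_part:<{pad}}  |{text}|")
    return "\n".join(lines)
```

Printing hexdump(open("a.out", "rb").read(64)) shows the ELF header, starting with the 7f 45 4c 46 magic bytes.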
        
       | heinrichhartman wrote:
       | I started my blog in 2012, when I shifted my academic career from
       | Mathematics to Computer Science. This topic was literally the
       | first thing that I studied:
       | 
       | https://heinrichhartmann.com/archive/Dissecting-Hello-World....
       | 
       | Never regretted going down this deep rabbit hole. IIRC, Julia
       | also has a math background. Maybe it's the desire for bottom-up
       | reasoning that leads math folks towards experiments like this.
       | Great to see her making this approachable for a large audience.
        
       | sergejf wrote:
       | The format of executable files fascinated me back in the early
       | 90s, to the point that I spent weeks writing (in Modula 2) a DOS
       | and Windows executable file viewer that I named VEXE, releasing
       | it as shareware in 1991.
       | 
       | It found a niche following among crackers, even deserving a
       | mention in a +ORC tutorial,
       | https://gist.github.com/callowaysutton/48bdf0245e17e72d41a15...,
       | probably because it could detect various encryption and
       | compression methods used to prevent the reverse engineering of
       | those programs.
        
       | randall wrote:
       | Amazingly helpful!
        
       | adolph wrote:
       | For folks interested in this topic who have not seen Cosmopolitan
       | and RedBean, Actually Portable Executable (2020) is a great read
       | too: https://justine.lol/ape.html
       | 
       | https://redbean.dev/
        
       | as1009 wrote:
       | Great thread
        
       | as1009 wrote:
       | great thread, thank you!
        
       ___________________________________________________________________
       (page generated 2024-02-02 23:00 UTC)