[HN Gopher] Exploring Object File Formats
       ___________________________________________________________________
        
       Exploring Object File Formats
        
       Author : ingve
       Score  : 74 points
       Date   : 2024-01-15 09:35 UTC (1 day ago)
        
 (HTM) web link (maskray.me)
 (TXT) w3m dump (maskray.me)
        
       | eterps wrote:
       | I have a soft spot for the brutal simplicity of the .COM format.
        
         | 082349872349872 wrote:
         | Both ELF and Mach-O (and presumably other recent formats) are
         | amenable to a pseudo-COM approach:
         | https://news.ycombinator.com/item?id=38593896
        
         | actionfromafar wrote:
         | Also see the pure brutality that is the early Microsoft Office
         | DOC file. :) Just dumping the memory IIRC.
        
           | fullspectrumdev wrote:
           | I vaguely recall reading that on some versions of Office way
           | back when, memory from other programs would sometimes leak
           | into saved document files.
           | 
           | (Pauses commenting to go check) Yes, it's referenced in
           | Chapter 3 of "Silence on the Wire": apparently Office on
           | Windows 95/98 would dump memory from "other programs" into
           | Word docs, with some anecdotal sightings in later versions.
        
             | actionfromafar wrote:
             | Maybe a case of reused allocated memory and just writing
             | chunks of memory to disk. Simplistic example:
             | 
             | struct blah_block { int sz; char buf[510]; };
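
A minimal sketch of how that kind of leak could happen, assuming the file writer dumps the whole fixed-size struct rather than only the `sz` valid bytes. The helper names (`fill_block`, `stale_bytes`, `demo_leak`) are hypothetical and the reuse is simulated with a static block; nothing here is taken from Office itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Record layout from the comment above: a length plus a fixed 510-byte buffer. */
struct blah_block {
    int sz;
    char buf[510];
};

/* Hypothetical helper: fill a block in place WITHOUT clearing it first,
 * the way a careless reuse path might. Only the first `len` bytes of
 * buf are overwritten; everything after them keeps its old contents. */
void fill_block(struct blah_block *b, const char *text, size_t len)
{
    assert(len <= sizeof b->buf);
    b->sz = (int)len;
    memcpy(b->buf, text, len);
}

/* Count non-zero bytes past the valid payload, i.e. leftovers that
 * would reach the disk if the whole struct were written out with
 * something like fwrite(b, sizeof *b, 1, f). */
size_t stale_bytes(const struct blah_block *b)
{
    size_t n = 0;
    for (size_t i = (size_t)b->sz; i < sizeof b->buf; i++)
        if (b->buf[i] != 0)
            n++;
    return n;
}

/* Simulate reuse: the block previously held unrelated data. */
size_t demo_leak(int clear_first)
{
    static struct blah_block block;
    memset(block.buf, 'S', sizeof block.buf);  /* earlier "secret" contents */
    if (clear_first)
        memset(&block, 0, sizeof block);       /* the fix: zero before reuse */
    fill_block(&block, "hello", 5);
    return stale_bytes(&block);
}
```

Writing only `sz` bytes, or zeroing the block before reuse, would avoid the leak in this sketch.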
        
         | WalterBright wrote:
         | What's interesting about that is every tool I've seen to
         | generate .com files would set SS=DS=ES=CS to the same value.
         | 
         | But, it turns out, this is not necessary. The only thing about
         | COM is that the file size has to be less than 64K. This enables
         | another memory model, where the code seg and data seg are
         | different, enabling substantially larger programs that were
         | still COM programs. This was the Zortech "small" memory model.
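
To make the "no header at all" point concrete, here is a rough sketch that assembles a tiny .COM image by hand. It assumes the conventional setup where DOS loads the image at offset 0x100 with DS equal to CS, so it does not demonstrate the separate code/data segment trick described above. `build_com` and the size limit constant are illustrative names, and the 0xFF00 limit is an assumption chosen so that the PSP plus the image fit in one 64K segment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum {
    COM_LOAD_OFFSET = 0x100,           /* image starts right after the 256-byte PSP */
    COM_MAX_IMAGE   = 0x10000 - 0x100  /* assumed limit: PSP + image in one segment */
};

/* Build a .COM image that prints `msg` via DOS int 21h / AH=09h and exits.
 * Returns the image size in bytes, or 0 if it does not fit or if msg
 * contains '$' (which that DOS call treats as the string terminator). */
size_t build_com(unsigned char *out, size_t cap, const char *msg)
{
    static const unsigned char code[] = {
        0xB4, 0x09,        /* mov ah, 09h   ; DOS "print string" */
        0xBA, 0x00, 0x00,  /* mov dx, imm16 ; patched below with the message address */
        0xCD, 0x21,        /* int 21h */
        0xCD, 0x20         /* int 20h       ; terminate program */
    };
    size_t msg_len = strlen(msg);
    size_t total = sizeof code + msg_len + 1;   /* +1 for the '$' terminator */

    if (strchr(msg, '$') != NULL || total > cap || total > COM_MAX_IMAGE)
        return 0;

    memcpy(out, code, sizeof code);

    /* The message sits right after the code; its address is relative to
     * the segment start, so the 0x100 load offset must be added. */
    unsigned msg_addr = COM_LOAD_OFFSET + (unsigned)sizeof code;
    out[3] = (unsigned char)(msg_addr & 0xFF);
    out[4] = (unsigned char)(msg_addr >> 8);

    memcpy(out + sizeof code, msg, msg_len);
    out[sizeof code + msg_len] = '$';
    return total;
}
```

Writing the returned bytes to a file with a .COM extension should be all that is needed; there is no magic number, section table, or entry-point field to fill in.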
        
       | pjmlp wrote:
       | Nice to see an article that remembers AIX isn't about ELF.
       | 
       | Symbian also used a COFF variant.
        
       | boricj wrote:
       | I've written an ELF object file exporter as part of a Ghidra
       | extension [1]. It's a bit finicky to get it right (toolchains
       | assume that object files are valid and don't have much in the way
       | of diagnostics), but these are fairly simple under the hood.
       | Section bytes, symbols and relocations, with some headers and
       | metadata to wrap these up...
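
As a rough illustration of the "section bytes, symbols and relocations" core, the sketch below patches a section using two x86-64 style relocation formulas: S + A for an absolute 64-bit value and S + A - P for a 32-bit PC-relative one. The `struct reloc` layout and `apply_reloc` are simplified, hypothetical stand-ins; real ELF relocation records index into a symbol table instead of carrying the symbol address directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified relocation kinds (not the real ELF type numbers). */
enum reloc_kind { RELOC_ABS64, RELOC_PC32 };

struct reloc {
    size_t offset;         /* where in the section to patch */
    enum reloc_kind kind;
    uint64_t sym_addr;     /* S: final address of the referenced symbol */
    int64_t addend;        /* A: constant added to the symbol address */
};

/* Patch `sec`, which will live at address `sec_addr`, according to `r`.
 * Returns 0 on success, -1 if the patch is out of bounds or the
 * PC-relative value does not fit in 32 bits. */
int apply_reloc(uint8_t *sec, size_t sec_len, uint64_t sec_addr,
                const struct reloc *r)
{
    uint64_t value = r->sym_addr + (uint64_t)r->addend;     /* S + A */
    size_t width = (r->kind == RELOC_ABS64) ? 8 : 4;

    if (r->offset > sec_len || sec_len - r->offset < width)
        return -1;

    if (r->kind == RELOC_PC32) {
        value -= sec_addr + r->offset;                      /* S + A - P */
        int64_t sv = (int64_t)value;
        if (sv < INT32_MIN || sv > INT32_MAX)
            return -1;
    }

    for (size_t i = 0; i < width; i++)                      /* little-endian store */
        sec[r->offset + i] = (uint8_t)(value >> (8 * i));
    return 0;
}
```

For example, a `call` at address 0x1000 targeting a symbol at 0x2000 uses an addend of -4, because the displacement is measured from the end of the 4-byte field.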
       | 
       | It's a bit of a shame that object files aren't more of a _lingua
       | franca_ of toolchains in practice. Embedding binary blobs inside
       | a program in a portable way is still a mess today.
       | 
       | [1] https://github.com/boricj/ghidra-delinker-extension/tree/mas...
        
       | khaledh wrote:
       | Low-level programming is one of my favourite subjects. I've
       | written a simple ELF parser in Nim (with the help of an amazing
       | binary parsing library) among other things:
       | https://github.com/khaledh/elfdump
        
         | eddd-ddde wrote:
         | That reminds me, a couple of years ago when I was learning to
         | program, one of my first projects was an ELF parser so I could
         | understand what binaries were doing and how they were built.
        
         | a2code wrote:
         | How did you learn to do this?
        
           | hnthrowaway0328 wrote:
           | Just read the ELF specification. There is a header and
           | everything else follows. There are a lot of explanations
           | online, here is one:
           | https://www.cs.cmu.edu/afs/cs/academic/class/15213-f00/docs/...
        
         | WalterBright wrote:
         | In order to learn an object file format, the first thing I'd do
         | is write a dumper for it.
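
A first step toward such a dumper might look like the sketch below, which reads only the start of the ELF header: the magic bytes, the class and data-encoding bytes of `e_ident`, and the `e_type` and `e_machine` fields at offsets 16 and 18. The `elf_summary` struct and function names are made up for illustration; a real dumper would go on to walk the section and program header tables.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical summary of the first 20 bytes of an ELF file. */
struct elf_summary {
    int bits;           /* 32 or 64, from e_ident[EI_CLASS] */
    int little_endian;  /* 1 or 0, from e_ident[EI_DATA] */
    uint16_t type;      /* e_type: 1 = relocatable, 2 = executable, 3 = shared object */
    uint16_t machine;   /* e_machine: e.g. 62 = x86-64 */
};

/* Read a 16-bit field in the byte order the file declares. */
uint16_t read_u16(const unsigned char *p, int little_endian)
{
    return little_endian ? (uint16_t)(p[0] | (p[1] << 8))
                         : (uint16_t)(p[1] | (p[0] << 8));
}

/* Returns 0 and fills *out on success, -1 if buf is not a plausible ELF header. */
int elf_summarize(const unsigned char *buf, size_t len, struct elf_summary *out)
{
    /* "\x7f" and "ELF" are split so the hex escape does not swallow the 'E'. */
    if (len < 20 || memcmp(buf, "\x7f" "ELF", 4) != 0)
        return -1;
    if (buf[4] != 1 && buf[4] != 2)   /* EI_CLASS: 1 = 32-bit, 2 = 64-bit */
        return -1;
    if (buf[5] != 1 && buf[5] != 2)   /* EI_DATA: 1 = little-endian, 2 = big-endian */
        return -1;

    out->bits = (buf[4] == 1) ? 32 : 64;
    out->little_endian = (buf[5] == 1);
    out->type = read_u16(buf + 16, out->little_endian);
    out->machine = read_u16(buf + 18, out->little_endian);
    return 0;
}
```

Feeding it the first 20 bytes of any object file and printing the four fields is already enough to tell a relocatable `.o` from an executable.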
        
       | xvilka wrote:
       | Always quality content in that blog. We used MaskRay's article[1]
       | on stack unwinding to improve our debuginfo (DWARF) support[2] in
       | the past. If someone wants to have a more hands-on approach to
       | executable file formats, e.g., XCOFF or GOFF, they can check
       | Rizin's ideas for new formats[3] to support.
       | 
       | [1] https://maskray.me/blog/2020-11-08-stack-unwinding
       | 
       | [2] https://rizin.re/posts/gsoc-2023-dwarf/
       | 
       | [3] https://github.com/rizinorg/ideas/issues?q=is%3Aissue+is%3Ao...
        
       | norir wrote:
       | Nice overview. They didn't get to the hideous self-referential
       | (and largely undocumented) trie that is used for the symbol name
       | mappings in Mach-O. Not fun to implement. And frustrating because
       | there is so much wasted space in typical Mach-O binaries that it
       | seems very much not worth the compression effort, at least by, I
       | don't know, 2005?
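
For readers who have not met that trie, here is a hedged sketch of a lookup, based on my understanding of the export trie layout rather than on any official specification. Each node is assumed to hold a ULEB128 terminal-info size, the terminal info itself (flags then address for a regular export), a one-byte child count, and then per child a NUL-terminated edge string followed by a ULEB128 offset of the child node from the start of the trie. Re-exports and stub/resolver entries use a different terminal payload and are not handled; `trie_lookup` and `read_uleb` are illustrative names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode one ULEB128 value at *pos, advancing *pos. Returns 0 on success. */
int read_uleb(const uint8_t *t, size_t len, size_t *pos, uint64_t *out)
{
    uint64_t v = 0;
    for (unsigned shift = 0; shift < 64; shift += 7) {
        if (*pos >= len)
            return -1;
        uint8_t b = t[(*pos)++];
        v |= (uint64_t)(b & 0x7F) << shift;
        if (!(b & 0x80)) {
            *out = v;
            return 0;
        }
    }
    return -1;  /* too many continuation bytes */
}

/* Returns 1 and sets *addr if `name` is exported, 0 if it is not,
 * -1 if the trie looks malformed. Assumes "regular" export entries. */
int trie_lookup(const uint8_t *t, size_t len, const char *name, uint64_t *addr)
{
    size_t node = 0;
    for (int depth = 0; depth < 4096; depth++) {   /* guard against offset cycles */
        size_t p = node;
        uint64_t term_size, flags;
        if (read_uleb(t, len, &p, &term_size))
            return -1;
        if (*name == '\0') {                       /* whole name consumed */
            if (term_size == 0)
                return 0;                          /* interior node, not an export */
            if (read_uleb(t, len, &p, &flags))
                return -1;
            return read_uleb(t, len, &p, addr) ? -1 : 1;
        }
        if (term_size > len - p)
            return -1;
        p += (size_t)term_size;                    /* skip terminal info */
        if (p >= len)
            return -1;
        unsigned nchildren = t[p++];
        int matched = 0;
        for (unsigned i = 0; i < nchildren && !matched; i++) {
            const uint8_t *nul = memchr(t + p, 0, len - p);
            if (!nul)
                return -1;
            const char *edge = (const char *)(t + p);
            size_t edge_len = (size_t)(nul - (t + p));
            uint64_t child;
            p += edge_len + 1;
            if (read_uleb(t, len, &p, &child) || child >= len)
                return -1;
            if (strncmp(name, edge, edge_len) == 0) {
                name += edge_len;                  /* edge label is a prefix: descend */
                node = (size_t)child;
                matched = 1;
            }
        }
        if (!matched)
            return 0;
    }
    return -1;
}
```

The "self-referential" part is the child offsets: they are encoded as variable-length integers inside the very buffer whose layout they describe, which is what makes writing the trie harder than reading it.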
        
       | snvzz wrote:
       | Notably missing is HUNK, AmigaOS's object file format.
        
       ___________________________________________________________________
       (page generated 2024-01-16 23:01 UTC)