[HN Gopher] Show HN: A Ghidra extension for exporting parts of a...
       ___________________________________________________________________
        
       Show HN: A Ghidra extension for exporting parts of a program as
       object files
        
       This Ghidra extension unrelocates machine code through analysis and
       then synthesizes a working object file from a listing selection. It
       effectively turns computer programs into Lego bricks, to be torn
       down into pieces and reused in something new.

       It supports the COFF and ELF object file formats, for the x86 and
       MIPS architectures. It has been successfully used on Linux, Windows
       and PlayStation executables. One user report is on a commercial
       video game from 2009 with a ~7 MiB Windows executable written in
       C++: it was delinked without its C runtime library and then
       relinked into a new executable at a different base address, with no
       visible change in functionality, as a prelude to a decompilation
       project.

       Use cases I've demonstrated on my blog include modding, making
       software ports, converting executable file formats, creating
       libraries...

       I originally built this as part of a video game decompilation
       project; I've been working on it over the past 2.5 years and it
       has recently started gaining some users besides me.
        
       Author : boricj
       Score  : 129 points
       Date   : 2024-08-22 08:54 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | bigdict wrote:
       | See also: objcopy.
       | 
       | https://sourceware.org/binutils/docs/binutils/objcopy.html
        
         | boricj wrote:
         | While objcopy can do many things, it can't undo the work of the
         | linker. If relocations aren't unapplied and a new relocation
         | table generated, these spots inside the new object file will
         | reference the original program's address space, leading to some
         | exotic undefined behavior.
         | 
         | Delinking is a subject with very few resources online, but
         | there are a couple of other tools for it out there:
         | - https://github.com/endrazine/wcc
         | - https://github.com/jonwil/unlinkerida
         | - https://github.com/jnider/delinker
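         | 
         | To make "unapplying" a relocation concrete, here is a minimal
         | sketch, assuming a 32-bit little-endian absolute relocation
         | with the addend stored in place. The helper names and the
         | addresses are hypothetical and this is not the extension's
         | actual code.

```python
import struct

def unapply_abs32(section: bytearray, offset: int, symbol_address: int) -> int:
    """Undo the linker: replace the linked address at `offset` by its addend."""
    (linked_value,) = struct.unpack_from("<I", section, offset)
    addend = (linked_value - symbol_address) & 0xFFFFFFFF
    struct.pack_into("<I", section, offset, addend)
    return addend

def apply_abs32(section: bytearray, offset: int, symbol_address: int) -> None:
    """Redo the linker's work against a (possibly different) symbol address."""
    (addend,) = struct.unpack_from("<I", section, offset)
    struct.pack_into("<I", section, offset, (symbol_address + addend) & 0xFFFFFFFF)

# Linked x86 code: mov eax, [0x00403008]. We assume analysis has shown that
# the operand refers to a symbol `table` at 0x00403000, i.e. table+8.
code = bytearray(b"\xa1\x08\x30\x40\x00")

addend = unapply_abs32(code, 1, 0x00403000)
unrelocated = bytes(code)   # operand now holds only the addend
# A relocation table entry (offset=1, symbol="table", type=abs32) would be
# emitted alongside these bytes in the object file.

# Relinking with `table` placed at a new, hypothetical address:
apply_abs32(code, 1, 0x10007000)
relocated = bytes(code)

print(hex(addend), unrelocated.hex(), relocated.hex())
```

         | Copying the bytes without the unapply step would leave
         | 0x00403008 baked into the operand, which is the stale reference
         | to the original address space described above.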
        
       | jchw wrote:
       | Oh, great to see this here. I think this is an extremely cool
       | project, and I helped to add MS COFF support. (P.S.: I will note
       | that my initial PR was notably worse than the ELF support that
       | was already present, so if you run into problems with it...
       | probably my fault :P I can see it is being improved, though.)
       | That said, I haven't done anything big with it yet. The most fun
       | I had was delinking a Hello World executable compiled with Visual
       | Studio 2003, relinking it to Linux x86 with GCC+glibc, and then
       | relinking _that_ to MinGW+msvcrt again. Doing anything larger
       | than hello world is a bit beyond me yet, though, in part because
       | I'm actually a pretty big n00b when it comes to Ghidra and
       | haven't even really figured out a good way to select the ranges
       | for delinking from a large binary. I should've probably asked
       | someone by now, but oh well. :)
       | 
       | Coincidentally, a derivation for this just got merged into
       | Nixpkgs earlier today, so if you're using NixOS unstable it's
       | possible to install it using ghidra.withExtensions; it is under
       | ghidra-extensions.ghidra-delinker-extension. Only one problem:
       | There was a new version released a few days ago and I didn't
       | rebase my PR, so it is out of date. I will try to push an update
       | soon.
        
         | boricj wrote:
         | > I'm actually a pretty big n00b when it comes to Ghidra and
         | haven't even really figured out a good way to select the ranges
         | for delinking from a large binary.
         | 
         | One way to keep track of things to delink is to use folders and
         | fragments inside a program tree. For example, I have a Ghidra
         | program where I've figured out the name and ranges of the
         | various object files that originally made up the executable.
         | These folders or fragments can then be selected as a whole with
         | right-click > Select Addresses.
         | 
         | The relocation synthesizer analyzer and the exporter can also
         | be scripted, either independently or using the program's tree
         | manager. This removes the need to select the ranges you want by
         | hand and to invoke the analyzer and the exporter manually.
        
       | toomuchtodo wrote:
       | Previous:
       | 
       |  _Show HN: A Ghidra extension that turns programs back into
       | object files_ - https://news.ycombinator.com/item?id=38852362 -
       | Jan 2024 (4 comments)
        
       | mhh__ wrote:
       | It might be interesting to tie this into something I had a
       | daydream about once and then never bothered to actually do:
       | generate header files from debug info (and then possibly have
       | some LLM tidy it up)
        
         | jchw wrote:
         | Actually there are a few attempts at this! Here's one for
         | Microsoft Program Database:
         | 
         | https://github.com/wbenny/pdbex
         | 
         | As for using an LLM to tidy it up... It doesn't seem like there
         | has been a ton of success applying LLMs to reverse
         | engineering yet... A part of me is wondering if this will wind
         | up being a place where the LLM architecture proves
         | insufficient. I'm not an expert but if I had to place a bet I'd
         | bet on diffusion models being more interesting for a lot of
         | reverse engineering use cases. That said, it's not really the
         | same thing, but with Binary Ninja they have a feature called
         | Sidekick that uses an LLM to try to clean up the disassembly;
         | I'm kind of unimpressed but maybe it is useful to somebody.
        
           | dvdkon wrote:
           | I'll add my attempt here: https://gitlab.com/dvdkon/pdb2hpp
           | 
           | Its output is kind of ugly, constrained by limitations of either
           | the PDB format or Microsoft's terrible parser library, but
           | I've successfully used it for calling functions from a
           | proprietary DLL.
        
         | boricj wrote:
         | Tangentially, I've considered generating debugging symbols for
         | the exported object files, based on the contents of the Ghidra
         | database, in order to improve the debugging experience when
         | using them.
         | 
         | I haven't implemented that feature yet because so far I've
         | managed to get by without it. Also, it sounds like a rather
         | deep rabbit hole to fall into and the one I'm currently inside
         | of is big enough as it is.
        
         | chc4 wrote:
         | pahole gives you compilable C header files from ELF DWARF
         | information. LLMs seem irrelevant here: either your header
         | files have all the types exported from the executable correctly
         | so they are usable with the original values, or they aren't
         | correct/complete and having an LLM make up some more doesn't
         | help.
         | 
         | Ghidra also has native functionality to export its data
         | structures, which it can create from DWARF structures (Right
         | click -> Export to C header).
        
       | almostgotcaught wrote:
       | So is this a completely fool-proof process? I.e., I'm asking if
       | it's guaranteed to succeed or if the analysis is conservative:
       | if some piece/datum/feature is missing in the ELF, will the
       | delinking fail?
        
         | boricj wrote:
         | > So is this a completely fool-proof process?
         | 
         | That's... complicated to answer.
         | 
         | My analyzers rely on an accurate Ghidra database, at least for
         | the parts you want to export. While I've put a fair amount of
         | effort into logging the various issues that can crop up which
         | require fixing, they can't see what isn't there. In particular,
         | missing references and truncation of variables won't be
         | detected and will result in exotic undefined behavior.
         | 
         | There are ways to track down some of these issues. The best
         | I've found so far is to relink the executable at a different
         | base address and make sure that the original program's address
         | ranges are unmapped; missed absolute relocation spots should
         | then lead to segmentation faults that can be debugged (but
         | that only works if your target has an MMU). Truncated
         | variables are very tricky to troubleshoot (especially if you
         | don't suspect it) since it's the memory following the
         | truncated variable that gets corrupted. An integer that is
         | mistaken for a pointer can also be very tricky to track down,
         | as the integer's value will vary depending on the address the
         | target symbol gets, leading to erratic program behavior (that's
         | especially an issue for programs loaded very low in the address
         | space).
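         | 
         | A rough sketch of these two failure modes, assuming a
         | simplified model where relinking just adds a base delta to
         | every 32-bit word flagged as a relocation spot. The function
         | name, addresses and data layout are made up for illustration.

```python
import struct

OLD_BASE = 0x00400000   # hypothetical original image base
NEW_BASE = 0x10000000   # hypothetical base used when relinking

def relink(data: bytes, reloc_offsets, old_base: int, new_base: int) -> bytes:
    """Shift every 32-bit word listed in `reloc_offsets` by the base delta."""
    out = bytearray(data)
    delta = new_base - old_base
    for off in reloc_offsets:
        (value,) = struct.unpack_from("<I", out, off)
        struct.pack_into("<I", out, off, (value + delta) & 0xFFFFFFFF)
    return bytes(out)

# Word 0 is a genuine pointer into the program; word 1 is a plain integer
# (a 4 MiB buffer size) that happens to look like an address.
data = struct.pack("<II", 0x00403010, 0x00400000)

correct = struct.unpack("<II", relink(data, [0], OLD_BASE, NEW_BASE))
missed = struct.unpack("<II", relink(data, [], OLD_BASE, NEW_BASE))
mistaken = struct.unpack("<II", relink(data, [0, 4], OLD_BASE, NEW_BASE))

for label, words in (("correct", correct), ("missed", missed), ("mistaken", mistaken)):
    print(label, [hex(w) for w in words])
```

         | In the "missed" case the pointer still targets the old address
         | range, which faults cleanly if that range is unmapped. In the
         | "mistaken" case the buffer size silently becomes 0x10000000
         | and nothing faults at the point of corruption.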
         | 
         | That being said, if the Ghidra database is accurate enough
         | _and_ you export back to the same object file format used
         | originally _and_ you subsequently use it on the same platform
         | with the same toolchain, you _can_ delink megabytes of program
         | code and data successfully. I consider that if the linker did
         | it, then it should be possible to undo it.
         | 
         | Now, if you start cross-delinking to something that doesn't
         | match the original program's platform and toolchain (like
         | delinking from a Linux i386 ELF executable into a COFF object
         | file and using it with an i386 Windows toolchain) then it's
         | another story. If the exporter can express the relocations then
         | you might end up with a working relocatable object file, but
         | you'll still have potentially mismatched ABIs to contend with.
         | It can be done, but that's not something I would recommend as a
         | first project.
         | 
         | TL;DR Depending on what you do and the accuracy of the Ghidra
         | database, it can range from "it just works" all the way to
         | praying to Cthulhu for mercy.
        
       | jxjx wrote:
       | This sounds very interesting. And is tempting me to delve back
       | into a game reverse engineering project I abandoned a few years
       | back.
       | 
       | Do you have a fully worked example of how to use this and then
       | how to make use of its output? Would love to see an end-to-end
       | walkthrough.
        
       | sweeter wrote:
       | That sounds like magic, I'm not going to lie. I have to
       | understand how this is possible.
        
       | hmfrh wrote:
       | How much work is it to figure out which sections of the
       | executable to export?
       | 
       | Would it be realistic to be able to export a modern-ish
       | (2008-2015) Win32 game into objects and then compile/link it into
       | a full executable again with less than a few hours' work?
        
       ___________________________________________________________________
       (page generated 2024-08-22 17:00 UTC)