[HN Gopher] Reverse engineering programs with unknown instructio...
       ___________________________________________________________________
        
       Reverse engineering programs with unknown instruction sets (2012)
       [pdf]
        
       Author : lauriewired
       Score  : 116 points
       Date   : 2023-01-27 11:45 UTC (11 hours ago)
        
 (HTM) web link (www.recon.cx)
 (TXT) w3m dump (www.recon.cx)
        
       | olivierduval wrote:
       | Amazing! Looks a lot like breaking a cipher, with the added
       | specifics of processor knowledge!
        
       | stuckkeys wrote:
       | Is the site decompilation.info down? Cannot access it.
        
         | crecker wrote:
         | It seems so
        
       | kijiki wrote:
       | Also enjoyable, reverse engineering the Transmeta Crusoe's
       | internal VLIW instruction set:
       | https://www.realworldtech.com/crusoe-intro/
       | 
       | I suspect the Anonymous author might have gotten a tip or two
       | from a friendly Transmeta hardware or software engineer.
        
       | egberts1 wrote:
       | I once wrote a detector of 38 known machine languages.
       | 
       | Akin to an expansion of the UNIX file command.
       | 
       | It would list known machine code(s) encountered at least within
       | 4 bytes (in probability order).
       | 
       | Good times, good times.
       | 
       | (oh, sadly, not open source, but proprietary; I still do wish I
       | could release this gem.)
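A minimal sketch of the general idea, not egberts1's proprietary tool: count a few well-known instruction byte patterns per architecture and rank the candidates by their share of hits. The `SIGNATURES` table and the names `count_overlapping` and `rank_architectures` are hypothetical and chosen for illustration; a real detector would need far more patterns and a proper statistical model.

```python
from collections import Counter

# Hypothetical signature table: one common byte pattern per architecture.
SIGNATURES = {
    "x86 (32-bit)": [bytes.fromhex("5589e5")],            # push ebp; mov ebp, esp
    "x86-64": [bytes.fromhex("554889e5")],                # push rbp; mov rbp, rsp
    "ARM (little-endian)": [bytes.fromhex("1eff2fe1")],   # bx lr
    "MIPS (big-endian)": [bytes.fromhex("03e00008")],     # jr $ra
    "PowerPC (big-endian)": [bytes.fromhex("4e800020")],  # blr
}


def count_overlapping(data, pattern):
    """Count occurrences of pattern in data, including overlapping ones."""
    count = 0
    start = data.find(pattern)
    while start != -1:
        count += 1
        start = data.find(pattern, start + 1)
    return count


def rank_architectures(data):
    """Return (architecture, share of all hits) pairs, most likely first."""
    hits = Counter()
    for arch, patterns in SIGNATURES.items():
        for pattern in patterns:
            hits[arch] += count_overlapping(data, pattern)
    total = sum(hits.values())
    if total == 0:
        return []  # nothing recognized
    return [(arch, n / total) for arch, n in hits.most_common() if n > 0]


if __name__ == "__main__":
    sample = bytes.fromhex("03e00008") * 3 + bytes.fromhex("4e800020")
    for arch, share in rank_architectures(sample):
        print(f"{arch}: {share:.0%}")
```

The share of hits is only a crude stand-in for a probability; it says which of the listed patterns dominate, not how likely the guess is to be right.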
        
         | unwind wrote:
         | In what context was that used, if you can elaborate?
        
           | egberts1 wrote:
           | Like the UNIX file command, it lists out what the file
           | content probably is/are.
           | 
           | It can also break down the file in question by regions and
           | group such data content into most probable types ... for each
           | region.
           | 
           | As to its final application, that is not in my contract/task
           | description.
        
             | unwind wrote:
             | Okay, thanks.
             | 
             | Yeah I'm very familiar with 'file', I just wondered in what
             | context one needs the ability to identify 38 machine
             | languages, i.e. why does an organization deal with files
             | containing unknown machine code, and have the need to
             | identify them?
             | 
             | Sounds like maybe reverse engineering/security
             | "research"-oriented work, perhaps.
        
               | egberts1 wrote:
               | I was basically leveraging my eidetic memory of opcodes
               | and operands and their bitfields.
               | 
               | It all got started with writing pure assembly for the
               | MOS 6502 (for arcades) and PDP-11, then eventually
               | ended with ARM/RISC/MIPS. The most esoteric one is the
               | Transmeta VLIW (TMS3200-02).
               | 
               | And someone asked for one (internally).
        
       | tom_ wrote:
       | Previously on HN, possibly not unrelated:
       | https://news.ycombinator.com/item?id=25115916
        
       | amelius wrote:
       | But what if the CPU assumes the instruction stream is compressed?
        
         | gus_massa wrote:
         | In slide 9, they show the frequency of each 16-bit value.
         | In compressed code, the frequency of each value should be
         | almost equal.
         | 
         | 10 or 20 years ago, when reverse engineering any unknown file,
         | it was good to assume it was not compressed, and you could get
         | some insight by looking in a hex editor and hoping for the best.
         | Now many are compressed, so a good first step is to change the
         | extension to .zip and try WinRAR (or look for a header if you
         | are not lazy).
         | 
         | I assume that with compressed code you can use the same
         | strategy: assume it's using a well-known compression
         | algorithm, and cross your fingers.
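The frequency argument above can be sketched as follows: build a histogram of fixed-width words and compute its Shannon entropy. This is only an illustration under assumptions; the function names `word_histogram` and `entropy_bits` are hypothetical, and the 2-byte word width simply mirrors the 16-bit values mentioned for slide 9.

```python
import math
from collections import Counter


def word_histogram(data, width=2):
    """Count each fixed-width word; a trailing partial word is ignored."""
    usable = len(data) - len(data) % width
    return Counter(data[i:i + width] for i in range(0, usable, width))


def entropy_bits(hist):
    """Shannon entropy of the histogram, in bits per word."""
    total = sum(hist.values())
    return sum((n / total) * math.log2(total / n) for n in hist.values())


if __name__ == "__main__":
    # Hypothetical input path; replace with the file under study.
    with open("firmware.bin", "rb") as f:
        blob = f.read()
    hist = word_histogram(blob)
    print(f"{len(hist)} distinct words, {entropy_bits(hist):.2f} bits/word")
```

For a large input, an entropy close to the 16-bit maximum suggests compressed or encrypted data, while plain machine code usually sits well below it because a few opcodes dominate. Short inputs cap the achievable entropy at log2 of the number of words, so the comparison only means something on reasonably large files.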
        
           | anthk wrote:
           | 7zip, unar, innoextract...
           | 
           | And, of course, upx.
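Looking for a header, as suggested above, can be sketched as a check for a handful of well-known magic numbers before reaching for any of these tools. The `MAGIC` table and the `sniff` function are hypothetical names for illustration, and the list is far from complete. The `UPX!` marker is searched anywhere in the data, since it sits inside the packed executable rather than at offset 0.

```python
# A few well-known signatures found at the start of compressed or archive files.
MAGIC = [
    (b"\x1f\x8b", "gzip"),
    (b"PK\x03\x04", "zip"),
    (b"7z\xbc\xaf\x27\x1c", "7z"),
    (b"\xfd7zXZ\x00", "xz"),
    (b"BZh", "bzip2"),
    (b"\x28\xb5\x2f\xfd", "zstd"),
    (b"Rar!\x1a\x07", "rar"),
]


def sniff(data):
    """Return the names of all formats whose signature matches the data."""
    found = [name for magic, name in MAGIC if data.startswith(magic)]
    if b"UPX!" in data:  # marker embedded somewhere in UPX-packed executables
        found.append("upx")
    return found


if __name__ == "__main__":
    print(sniff(b"\x1f\x8b\x08\x00"))
```

A match is only a hint: short signatures such as `BZh` can occur by chance, so the next step is still to try the matching unpacker and see whether it succeeds.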
        
       | msm_ wrote:
       | Shout-out to the CPUAdventure challenge from DragonCTF 2019, which
       | was basically this. If you like the slides, you should find this
       | writeup entertaining:
       | https://www.robertxiao.ca/hacking/dsctf-2019-cpu-adventure-u...
        
         | thrdbndndn wrote:
         | Thanks, this is much easier to understand than slides (without
         | a presenter).
        
       | skissane wrote:
       | I wonder what the mystery instruction set in the slides actually
       | is? (Assuming it is a real instruction set and not just something
       | made up to demo the idea.)
        
         | gwern wrote:
         | It's a reverse-engineering conference presentation by 2 Russian
         | authors who highlight that they aren't providing any details
         | about the context despite the obvious extreme relevance, and
         | where their solution does not handle any obfuscation at all. So
         | they are probably not decompiling APT malware running in nested
         | VMs, but I'm going to guess reverse-engineering old highly-
         | secret Russian military hardware where the only docs are high-
         | level ones about the usage and repair, not what the chips are
         | _doing_, and where the contractor wants to bugfix or develop
         | new versions but needs to understand all the inner logic and
         | what empirical ad hoc corrections it might be incorporating
         | through the wisdom of long-dead Russian mega-brain engineers.
        
       | tempodox wrote:
       | Stuff like that is definitely fun. In the 1990s I bought a Sharp
       | PC-E500S pocket computer and hacked the CPU's instruction set.
       | With no internet and no documentation about the processor, I
       | invented my own assembler syntax for the instructions. The
       | assembler, disassembler, and hex monitor (written in Basic) are
       | all still working to this day.
        
         | lloydatkinson wrote:
         | You should post that online. I'm sure people, including me,
         | would love to read it.
        
           | tempodox wrote:
           | All my notes at the time were made with pencil on paper. Even
           | if I could find them, I'm not sure they would still be
           | readable. The Basic programs could only be copied by re-
           | typing them manually on a contemporary computer. Presenting
           | this pre-internet stuff on a website would just be too much
           | work, sorry.
        
             | hasmanean wrote:
             | That just makes it a meta challenge...for some unknown
             | engineer who wants to reverse-engineer an engineer's
             | program that reverse-engineered a program with an unknown
             | instruction set.
        
             | codetrotter wrote:
             | I understand and sympathise with that.
             | 
             | If you do find the documents though, please consider just
             | scanning them and uploading them to Internet Archive and
             | posting the links to HN. That way someone else in the
             | future can find it and decide if they want to do the manual
             | re-typing etc themselves :)
        
         | fallat wrote:
         | Please _please_ write about the whole process :) I'd love to
         | read it!
        
         | intelVISA wrote:
         | Lovely, you should document your stories; that sounds
         | impressive!
        
       ___________________________________________________________________
       (page generated 2023-01-27 23:01 UTC)