[HN Gopher] Cicoparser: Full game reverse engineering [video]
       ___________________________________________________________________
        
       Cicoparser: Full game reverse engineering [video]
        
       Author : gabonator
       Score  : 48 points
       Date   : 2021-04-20 04:51 UTC (18 hours ago)
        
 (HTM) web link (www.youtube.com)
 (TXT) w3m dump (www.youtube.com)
        
       | gitowiec wrote:
       | This was great to watch. I wish it could work with Linux. The
       | first game I would like to recompile is Dune.
        
         | wts42 wrote:
         | Excellent pick. Mentat approved.
        
         | gabonator wrote:
         | Should work with Linux without any problem. Cicoparser was
         | initially developed on Windows and does not use any libraries
         | besides std, so you can build it with the gcc compiler. The
         | host application is based on SDL2, so it should work really
         | anywhere without any extra work.
        
       | Cloudef wrote:
       | There's a similar set of tools by notaz[1] that was used to
       | statically recompile StarCraft, Diablo, Diablo 2, and Jazz
       | Jackrabbit to ARM Linux. You can read more about the
       | recompilation here[2].
       | 
       | 1: https://github.com/notaz/ia32rtools
       | 2: https://www.giantpockets.com/starcraft-pandora-port-came/
        
       | tibbydudeza wrote:
       | It gave me flashbacks to using DOS with Norton Commander :).
        
       | tralarpa wrote:
       | I got very excited when I saw the description of the video
       | "Conversion of game into C++ with cicoparser and IDA
       | disassembler". I thought "neat, a new decompiler".
       | 
       | But then I understood what CicoParser is doing: it translates
       | machine instructions into C statements, i.e. when your binary
       | contains an instruction like "mov 123,sp", the output will be a C
       | source file with a statement "memory16(_ds,123)=_sp;". On the
       | GitHub page, they say it is not a CPU emulator, but I would
       | rather say it is a CPU emulator with AOT compilation of the
       | binary.
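
A minimal sketch of the kind of statement described above, assuming a flat 1 MiB buffer for real-mode memory. The names `memory16`, `_ds`, and `_sp` are taken from the comment's example, but every definition here (the `sketch` namespace, the `Word16` proxy, `linearAddress`, `translatedStoreSp`) is a hypothetical illustration and not cicoparser's actual code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace sketch {

// Hypothetical flat buffer standing in for the 1 MiB real-mode address space.
static std::vector<uint8_t> memory(1u << 20);

// Hypothetical register variables, named after the comment's example.
static uint16_t _ds = 0;
static uint16_t _sp = 0;

// Real-mode linear address: segment * 16 + offset.
inline uint32_t linearAddress(uint16_t segment, uint16_t offset) {
    return (static_cast<uint32_t>(segment) << 4) + offset;
}

// Small proxy so that "memory16(seg, ofs) = value" reads like an assignment
// while storing the word in little-endian byte order.
class Word16 {
public:
    explicit Word16(uint8_t* bytes) : bytes_(bytes) {}
    Word16& operator=(uint16_t value) {
        bytes_[0] = static_cast<uint8_t>(value & 0xFF);
        bytes_[1] = static_cast<uint8_t>(value >> 8);
        return *this;
    }
    operator uint16_t() const {
        return static_cast<uint16_t>(bytes_[0] | (bytes_[1] << 8));
    }
private:
    uint8_t* bytes_;
};

inline Word16 memory16(uint16_t segment, uint16_t offset) {
    const uint32_t address = linearAddress(segment, offset);
    // Range check: both bytes of the word must lie inside the buffer.
    assert(static_cast<std::size_t>(address) + 1 < memory.size());
    return Word16(&memory[address]);
}

// What a translated "store SP at DS:123" might look like.
inline void translatedStoreSp() {
    memory16(_ds, 123) = _sp;
}

}  // namespace sketch
```

With `_ds = 0x1000` and `_sp = 0xBEEF`, calling `sketch::translatedStoreSp()` stores the two bytes at linear address 0x1007B, which is the segment shifted left by four bits plus the offset 123.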
        
         | gabonator wrote:
         | If it was a CPU emulator, it would update all the flags every
         | time it performed any ALU operation (I have seen this approach
         | in one source-to-source compiler). Actually, there is not much
         | you can do: if the instruction stores SP into DS:123, it
         | converts the instruction into a simple assignment,
         | *((WORD*)&memory[ds*16+123]) = sp. All the ALU operations are
         | calculated directly using the target instruction set, and the
         | flags register is updated only when necessary. Nor is the
         | memory emulated: it directly accesses the memory buffer (in the
         | video there are just extra range checks, and even the *16
         | operation can be optimized away by replacing ds/es with memory
         | pointers). The only thing that is emulated is the EGA adapter.
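
To illustrate the distinction drawn in this comment, here is a hedged sketch contrasting an emulator-style SUB, which recomputes every flag after each operation, with a translated loop in which only the one flag that the following jump consumes survives, as a plain comparison. The `Regs` and `Flags` types and both function names are hypothetical and chosen only for this example.

```cpp
#include <cstdint>

// Hypothetical register and flag state; names are illustrative only.
struct Regs {
    uint16_t ax = 0;
    uint16_t cx = 0;
};

struct Flags {
    bool carry = false;
    bool zero = false;
    bool sign = false;
    bool overflow = false;
};

// Emulator style: every 16-bit SUB recomputes all of these flags,
// whether or not any later instruction reads them.
inline uint16_t emulatedSub(uint16_t a, uint16_t b, Flags& f) {
    const uint16_t r = static_cast<uint16_t>(a - b);
    f.carry = a < b;                                    // unsigned borrow
    f.zero = (r == 0);
    f.sign = (r & 0x8000) != 0;
    f.overflow = (((a ^ b) & (a ^ r)) & 0x8000) != 0;   // signed overflow
    return r;
}

// Translated style for a loop such as:
//   again: add ax, cx
//          dec cx
//          jnz again
// Only the zero flag produced by DEC is consumed (by JNZ), so it turns
// into an ordinary loop condition and no flags register is maintained.
inline void translatedLoop(Regs& r) {
    do {
        r.ax = static_cast<uint16_t>(r.ax + r.cx);
        r.cx = static_cast<uint16_t>(r.cx - 1);
    } while (r.cx != 0);
}
```

The second form leaves the host compiler free to optimize the arithmetic, which is presumably part of why the generated code can run without per-instruction bookkeeping.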
        
           | habibur wrote:
           | Right. From a bird's-eye view it might look more like
           | assembly. But look closer and you see it summarizes a bunch
           | of idiomatic assembly sequences into C code.
           | 
           | And it will improve over time if the developers continue to
           | give it effort.
        
           | tralarpa wrote:
           | Thanks for the explanation. Very nice project. I guess self-
           | modifying code does not work with this technique, does it? (I
           | don't know much about DOS games and how common self-modifying
           | code was on PCs).
           | 
           | Concerning access to video memory: I saw that you treat it
           | "manually" in some cases. I am wondering whether you could
           | avoid that by using virtual memory. You could mark the pages
           | as invalid and when an instruction tries to access them, you
           | catch it and replace the memory[...] access instruction by a
           | call to memoryVideoGet. The JIT-Compiler of the Amiga
           | emulator uses a similar technique for indirect accesses to
           | hardware registers.
        
             | gabonator wrote:
             | Good point. In the set of games I was porting (10 in total,
             | release dates up to 1991) I found only one that used this
             | nasty technique, and it rewrote only a single byte of code
             | (something like rewriting a nop instruction into a return).
             | So a very simple case so far. Of course, using cicoparser
             | doesn't mean that you get working code without any manual
             | work. You will always need to fix some issues by hand.
             | Virtual memory does not solve anything in this case.
             | Writing to EGA video RAM means that you want to display
             | some pixels. But the write operation goes through some
             | extra logic which decides what to do with the byte being
             | written (extra rotation, masking...), and by reading the
             | same address you are not guaranteed to get the same value
             | back. The EGA control registers handle this process and you
             | simply need to emulate this behaviour somehow.
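
The write path described above can be sketched roughly as follows. This is a deliberately simplified, hypothetical model of EGA planar memory covering only write mode 0 and read mode 0: it handles the rotate count, bit mask, map mask, and read map select, and omits set/reset, the logical functions, and the other modes. The class and member names are assumptions for illustration and are not taken from cicoparser.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Simplified, hypothetical model of EGA planar video memory.
class EgaSketch {
public:
    uint8_t mapMask = 0x0F;     // which of the four planes accept writes
    uint8_t bitMask = 0xFF;     // 1 = bit comes from the CPU, 0 = from the latch
    uint8_t rotateCount = 0;    // rotate-right count applied to CPU data (0..7)
    uint8_t readMapSelect = 0;  // plane returned to the CPU on a read

    EgaSketch() {
        for (auto& plane : planes_) plane.assign(0x10000, 0);
    }

    // A read loads the latches from all four planes but returns one plane only.
    uint8_t read(uint16_t offset) {
        for (int p = 0; p < 4; ++p) latch_[p] = planes_[p][offset];
        return latch_[readMapSelect & 3];
    }

    // A write rotates the CPU byte, merges it with the latches under the
    // bit mask, and stores the result only into the enabled planes.
    void write(uint16_t offset, uint8_t value) {
        const unsigned n = rotateCount & 7u;
        const uint8_t rotated =
            static_cast<uint8_t>((value >> n) | (value << (8u - n)));
        for (int p = 0; p < 4; ++p) {
            if ((mapMask & (1u << p)) == 0) continue;
            planes_[p][offset] = static_cast<uint8_t>(
                (rotated & bitMask) | (latch_[p] & ~bitMask));
        }
    }

private:
    std::array<std::vector<uint8_t>, 4> planes_;
    std::array<uint8_t, 4> latch_{};
};
```

Even in this reduced form, writing 0xFF with `bitMask = 0x0F` and `mapMask = 0x01` stores 0x0F in plane 0 and nothing in plane 1, so a later read of the same offset returns 0x0F or 0x00 depending on `readMapSelect`, never the byte that was written.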
        
         | albertzeyer wrote:
         | But what exactly is the difference to a decompiler then?
        
           | tralarpa wrote:
           | Probably depends on your definition of a decompiler. For me,
           | a decompiler reverses to some extent the operation of a
           | compiler. Variables instead of registers, function call
           | arguments instead of stack pushes, etc.
           | 
           | Of course, you could also say that a decompiler is any tool
           | that produces something from a binary that you can compile
           | again. But in that case, I could claim that this here is also
           | a decompiled program:
           | 
           |     byte[] programbinary = {
           |         /* put binary of the program here */
           |     };
           |     runEmulator(programbinary);
        
             | teawrecks wrote:
             | A compiler has optimization steps. Rather than going
             | straight from human readable C to binary, it compiles to an
             | IR and then uses some heuristics to create binary that is
             | more efficient for the machine to execute.
             | 
             | I feel like you're effectively asking for an optimization
             | step. Decompile to an IR, and then use some heuristics to
             | get back to C that is more efficient for humans to read.
             | 
             | And if a compiler without an optimizer is still a compiler,
             | then a decompiler without an "optimizer" should still be
             | called a decompiler.
        
           | CodeArtisan wrote:
           | A decompiler translates bytecode into a structured program.
           | 
           | https://en.wikipedia.org/wiki/Structured_programming
        
       ___________________________________________________________________
       (page generated 2021-04-20 23:01 UTC)