hngopher.com

       [HN Gopher] Show HN: I wrote a Java decompiler in pure C language
       ___________________________________________________________________
        
       Show HN: I wrote a Java decompiler in pure C language
        
       Author : neocanable
       Score  : 135 points
       Date   : 2025-06-03 12:14 UTC (10 hours ago)
        
        
       | appendixv3 wrote:
       | Very cool project! Love the idea of a Java decompiler written in
       | C -- the speed must be great.
       | 
       | Any plan to support `.dex` in the future? Also curious how you
       | handle inner classes inside JARs.
        
         | tslater2006 wrote:
         | The readme shows support for dumping dex files. Edit: missed
         | that it has a comment that says "unsupport for now", but at
         | least it looks like something planned.
        
         | mdaniel wrote:
         | The "jikes" compiler from IBM
         | <https://github.com/daveshields/jikespg> was written in C++ and
         | was for the longest time _screaming_ fast. It also had its own
         | parser generator lpg which was fun to play with, if you're
         | into those things <https://github.com/daveshields/jikespg>
         | 
         | It seems someone liked it and made a "v2" along with LSP
         | support https://github.com/A-LPG/LPG2#lpg2
        
           | amiga386 wrote:
           | Jikes also gave massively better error messages than the
           | official Java compiler, from what I remember, and it
           | certainly ran a lot faster on the Amiga
           | (https://aminet.net/package/dev/lang/jikes) than trying to
           | run javac via Kaffe (https://en.wikipedia.org/wiki/Kaffe)
           | did.
        
           | pjmlp wrote:
           | Certainly not everything on Jikes, given that it was one of
           | the first bootstrapped Java toolchains.
           | 
           | https://www.jikesrvm.org/
        
         | neocanable wrote:
         | I am working on the part that decompiles dex and apk. The
         | current speed is about 10 times that of Java, and it uses
         | fewer resources than Java. The compiled binary is also
         | smaller, only about 300k. Thank you for your attention.
        
           | mdaniel wrote:
           | This has been my life experience with things written in
           | C/C++, so speed doesn't matter. Or, I guess from an
           | alternative perspective, it _ran_ very fast, but _exited_
           | very fast, too :-D
           |
           |     $ ./objdir/garlic $the_jar_file -o out-dir -t $(nproc)
           |     Progress : 85 (1024)Segmentation fault: 11
        
             | neocanable wrote:
             | Sorry for giving you a bad experience. Please provide the
             | jar file or class file. I hope I can fix it as soon as
             | possible.
        
             | uecker wrote:
             | Is it? This is my experience with Python. The C/C++
             | programs I use daily never seem to crash (Linux, bash,
             | terminals, X, firefox, vim, etc.). It must be years since
             | one of those programs crashed while I was using it.
        
               | 1718627440 wrote:
               | Also, a segfault IS the protection layer intervening; it
               | is equivalent to an exception in other languages. The
               | real problem is when there is no segfault.
        
               | uecker wrote:
               | This is absolutely true. But even this does not happen in
               | the software I use every day. Software written in C is
               | definitely the most stable I use - by far. That there are
               | people running around claiming that it is impossible to
               | write stable software in C and it crashes all the time
               | due to bugs is rather unfortunate, as it is far from the
               | truth.
        
         | neocanable wrote:
         | It processes inner classes recursively. First it reads all
         | entries from the jar and analyzes the relationships between
         | classes. Then it does the decompiling.
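
One way to picture the "analyze the relationships between classes" step is through the `Outer$Inner.class` naming that javac gives nested classes. The following is a minimal sketch, assuming the relationship is inferred from jar entry names alone. `top_level_len` and `same_top_level` are hypothetical helpers, not functions from garlic, and because `$` is also legal in ordinary Java identifiers, a real decompiler would more likely rely on the InnerClasses attribute inside the class file.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the top-level class part of a jar
 * entry name, e.g. "com/foo/Bar$Baz$1.class" -> strlen("com/foo/Bar").
 * Assumes javac's Outer$Inner naming convention. */
size_t top_level_len(const char *entry)
{
    const char *slash = strrchr(entry, '/');
    const char *name = slash ? slash + 1 : entry;

    /* The first '$' after the package path ends the outer class name. */
    const char *dollar = strchr(name, '$');
    if (dollar && dollar != name)
        return (size_t)(dollar - entry);

    /* No '$': strip the ".class" style extension if there is one. */
    const char *dot = strrchr(name, '.');
    return dot ? (size_t)(dot - entry) : strlen(entry);
}

/* Hypothetical helper: nonzero when both entries belong to the same
 * top-level class and could be decompiled together. */
int same_top_level(const char *a, const char *b)
{
    size_t la = top_level_len(a);
    size_t lb = top_level_len(b);
    return la == lb && strncmp(a, b, la) == 0;
}
```

With this, `com/foo/Bar.class` and `com/foo/Bar$1.class` group together, while `com/foo/Barn.class` stays separate.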
        
       | keepamovin wrote:
       | By hand or with AI? Fascinating. So much work! What was your
       | motivation for this?
        
         | xandrius wrote:
         | Irrelevant to me. People would never ask whether someone has
         | created something looking at SO or not. If the thing works as
         | advertised, good for them!
        
           | lyxell wrote:
           | To some people the process leading to a finished project is
           | the most interesting thing about posts like these.
        
             | johnisgood wrote:
             | LLMs can explain the process, and you can build projects
             | with LLMs explaining the process.
        
               | nticompass wrote:
               | LLMs can attempt to explain the code, but they can't
               | explain people's thought process and that's the
               | interesting part.
               | 
               | I want to hear about the reverse engineering, how you
               | thought the code through. LLMs are boring.
        
               | johnisgood wrote:
               | They can, if you write down your thought process, which
               | is probably what you should do when you are using an LLM
               | to create a product, but what do I know.
        
               | Ghoelian wrote:
               | > They can, if you write down your thought process
               | 
               | Just write a blogpost at that point.
        
               | johnisgood wrote:
               | You do not have to be as accurate or that specific, you
               | do not have to worry about the way you word or organize
               | things, the LLM can figure it out, as opposed to a blog
               | post.
               | 
               | So "To some people the process leading to a finished
               | project is the most interesting thing about posts like
               | these." is bullshit, that is said by someone who has
               | never used an LLM _properly_. You can achieve it with LLMs.
               | You definitely can, I know, I did, accurately (I double
               | checked).
               | 
               | I will throw it out here, too:
               | https://news.ycombinator.com/item?id=44163063 (My AI
               | skeptic friends are all nuts)
        
               | ryan93 wrote:
               | That is not true
        
               | johnisgood wrote:
               | How come? You had different experiences? Which LLMs, what
               | prompts? Give me all the details that support your claim
               | that it is not true. My experiences completely differ
               | from yours, so the way I use it, it is very much true.
               | 
               | That said, it is probably pointless to argue with full-
               | blown AI-skeptics.
               | 
               | People had lots of great and productivity-enhancing
               | experiences with LLMs, you did not, great, that does not
               | reflect the tool, it reflects your way of using the tool.
               | 
               | I will just throw it out here:
               | https://news.ycombinator.com/item?id=44163063 (My AI
               | skeptic friends are all nuts)
        
               | shortrounddev2 wrote:
               | LLM output is simply not interesting
        
               | johnisgood wrote:
               | I did not say that you should copy paste its output
               | verbatim. I thought this was obvious.
               | 
               | Additionally, "interesting" is highly subjective. It
               | could be technically correct, yet uninteresting.
        
               | zerr wrote:
               | It's not about explaining the process but experiencing
               | it.
        
               | johnisgood wrote:
               | Well, they can experience it if they wish to. Sadly most
               | vibe-coders do not.
        
         | neocanable wrote:
         | 90% by hand, 10% AI. I do this for fun and to learn about jvm.
        
           | jebarker wrote:
           | I think that sort of ratio is the sweet spot for learning.
           | I've been writing an 8086 simulator in C++ and using an LLM
           | for answering specific technical questions I come up with has
           | drastically sped up my progress without it actually doing the
           | work for me.
        
           | keepamovin wrote:
           | Wow, impressive. A project of this scale and depth.
        
         | Bjartr wrote:
         | A great question to ask. We're in the middle of learning where
         | AI can and can't be effective. Knowing where and how it's being
         | used is quite useful.
        
       | stefanos82 wrote:
       | Nice job! I don't know whether you know
       | https://github.com/java-decompiler/jd-gui or not, but in case
       | you haven't seen it before, maybe you could use it as a
       | reference, since it's written in Java, for extra fun with your
       | adventure?
        
         | rafram wrote:
         | Things may have changed, but my impression as of several years
         | ago was that JD-GUI was far, far behind the state of the art
         | (Fernflower, aka the built-in IntelliJ decompiler) in terms of
         | correctness, re-sugaring, support for modern Java features, and
         | so on. Fernflower is open source as part of IntelliJ:
         | https://github.com/fesh0r/fernflower
        
           | GranPC wrote:
           | Is there a good GUI for this a la jadx-gui that isn't an
           | entire IDE?
        
             | rafram wrote:
             | Not that I know of. The features I'd want in order to
             | consider a decompiler GUI "good" (e.g. a good text editing
             | control, go-to-definition, find usages, manual renaming of
             | obfuscated symbol names) quickly approach the scope of an
             | entire IDE, though.
        
               | DefineOutside wrote:
               | The most feature-rich decompiler I know of is Recaf.
               | It supports a mix of decompilers and even bytecode
               | editing.
        
       | gibibit wrote:
       | I am always curious how different C programs decide how to manage
       | memory.
       | 
       | In this case there is a custom string library. Functions
       | return owned heap-allocated strings.
       | 
       | However, I think there's a problem where static strings are used
       | interchangeably with heap-allocated strings, such as in the
       | function `string class_simple_name(string full)` (
       | https://github.com/neocanable/garlic/blob/72357ddbcffdb75641... )
       | 
       | Sometimes it returns a static string like `g_str_int` and
       | sometimes a newly heap-allocated string, such as returned by
       | `class_type_array_name(g_str_int, depth)`.
       | 
       | Callers have no way to properly release the memory allocated by
       | this function.
        
         | neocanable wrote:
         | In multi-threaded mode, each thread will create a separate
         | memory pool. If in single-threaded mode, a global memory pool
         | is used. You can refer to
         | https://github.com/neocanable/garlic/blob/72357ddbcffdb75641....
         | indicate where the memory is allocated. When each task ends,
         | the memory allocated in the memory pool is released, and the
         | cycle repeats.
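
To make the pool idea concrete, here is a minimal sketch of a bump-pointer pool that is reset when a task ends. It is an illustration under stated assumptions, not garlic's implementation: `pool_t`, `pool_alloc`, `pool_reset`, and `array_type_name` are hypothetical names (the project's own entry points are the `x_alloc` and `x_alloc_in` mentioned above). It also shows why mixing static and pool-allocated strings can be harmless in this scheme: callers never free individual strings.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define POOL_ALIGN 16u  /* assumed alignment, generous for most types */

/* Hypothetical bump-pointer pool: one per thread (or one global). */
typedef struct {
    unsigned char *base;  /* start of the backing buffer */
    size_t         cap;   /* total bytes available */
    size_t         used;  /* bytes handed out so far */
} pool_t;

/* Returns 0 on success, -1 if the backing buffer cannot be allocated. */
int pool_init(pool_t *p, size_t cap)
{
    p->base = malloc(cap);
    p->cap  = p->base ? cap : 0;
    p->used = 0;
    return p->base ? 0 : -1;
}

/* Hands out n bytes from the pool, or NULL when the pool is full. */
void *pool_alloc(pool_t *p, size_t n)
{
    size_t start = (p->used + POOL_ALIGN - 1) / POOL_ALIGN * POOL_ALIGN;
    if (start > p->cap || n > p->cap - start)
        return NULL;
    p->used = start + n;
    return p->base + start;
}

/* End of a task: everything allocated from the pool is released at once. */
void pool_reset(pool_t *p) { p->used = 0; }

void pool_destroy(pool_t *p)
{
    free(p->base);
    p->base = NULL;
    p->cap = p->used = 0;
}

static const char g_name_int[] = "int";

/* Returns either a static string or a pool string. The caller frees
 * neither; the pool string dies at the next pool_reset. */
const char *array_type_name(pool_t *p, int depth)
{
    if (depth <= 0)
        return g_name_int;                 /* static storage */
    size_t len = 3 + 2 * (size_t)depth;    /* "int" plus "[]" per level */
    char *s = pool_alloc(p, len + 1);
    if (!s)
        return NULL;
    memcpy(s, "int", 3);
    for (int i = 0; i < depth; i++)
        memcpy(s + 3 + 2 * i, "[]", 2);
    s[len] = '\0';
    return s;
}
```

The trade-off is that no pointer obtained from the pool may outlive the task that allocated it.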
        
         | IshKebab wrote:
         | Interesting. Someone should come up with a language that
         | prevents these sorts of mistakes!
        
           | brabel wrote:
           | That's impossible. Just be more careful and everything should
           | work, the author's C was just a bit rusty!
        
             | neocanable wrote:
             | This is my first project written in C. Before this, my C
             | level was only at printf("hello world"). I am very happy
             | because this project made me dare to use double pointers
             | (pointers to pointers).
        
               | sim7c00 wrote:
               | u did really well ppl like to pick on C. :) thanks for
               | making it in C, fun to read ur code and see how others go
               | about this language!
        
           | kookamamie wrote:
           | Yes, perhaps it could have a marketing slogan like "Write
           | once, crash everywhere!"
        
           | cenamus wrote:
           | Thank god Lisp is older than C, don't have to deal with such
           | nonsense :-)
        
           | pjmlp wrote:
           | If only there were a couple of OSes implemented during the
           | 1960's with such programming languages....
        
           | uecker wrote:
           | I think he is using memory pools, so this is ok.
        
         | SunlitCat wrote:
         | Strings! The bane of C programming, and a big reason I prefer
         | C++. :D
        
         | norir wrote:
         | Many command line tools do not need memory management at all,
         | at least to first approximation. Free nothing and let the os
         | cleanup on process exit. Most libraries can either use an arena
         | internally and copy any values that get returned to the user to
         | the heap at boundaries or require the user to externally create
         | and destroy the arena. This can be made ergonomic with one
         | macro that injects an arena argument into function defs and
         | another that replaces malloc by bumping the local arena data
         | pointer that the prior macro injected.
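
The two-macro arrangement norir describes might look roughly like the sketch below. All names here (`arena_t`, `ARENA_FN`, `ARENA_NEW`, `join_path`) are made up for illustration, and this is only one of several ways to wire it up: `ARENA_FN` injects a hidden `arena_` parameter into a function definition, and `ARENA_NEW` bumps that injected arena instead of calling malloc.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical arena: a caller-provided buffer consumed front to back. */
typedef struct {
    char *ptr;  /* next free byte */
    char *end;  /* one past the last usable byte */
} arena_t;

/* Bump allocation: round the size up to 16 bytes and advance ptr. */
void *arena_bump(arena_t *a, size_t n)
{
    size_t avail = (size_t)(a->end - a->ptr);
    if (n > avail)
        return NULL;
    size_t rounded = (n + 15u) & ~(size_t)15u;
    if (rounded > avail)
        return NULL;
    void *p = a->ptr;
    a->ptr += rounded;
    return p;
}

/* Macro 1: define a function with an injected arena parameter. */
#define ARENA_FN(ret, name, ...) ret name(arena_t *arena_, __VA_ARGS__)

/* Macro 2: stands in for malloc, using the injected parameter. */
#define ARENA_NEW(n) arena_bump(arena_, (n))

/* Example function: joins two path components with '/'. */
ARENA_FN(char *, join_path, const char *dir, const char *file)
{
    size_t dl = strlen(dir), fl = strlen(file);
    char *out = ARENA_NEW(dl + 1 + fl + 1);
    if (!out)
        return NULL;
    memcpy(out, dir, dl);
    out[dl] = '/';
    memcpy(out + dl + 1, file, fl + 1);  /* copies the terminating NUL */
    return out;
}
```

A caller creates the arena, passes it as the first argument (`join_path(&arena, "out", "A.java")`), and discards the whole buffer when done. Note that the 16-byte rounding only yields aligned pointers if the backing buffer itself starts aligned.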
        
           | 1718627440 wrote:
           | That might be true, but leaking is neither the critical nor
           | the hardest-to-find memory management issue, and good luck
           | trying to adapt or even run valgrind with a codebase that
           | mindlessly allocates and leaks everywhere.
        
             | guerrilla wrote:
             | Pretty sure you can just disable leak checking.
        
               | 1718627440 wrote:
               | But, for example, verifying that memory is not touched
               | after the point where it should have been released is
               | much harder when you can't rely on it being freed.
               | 
               | Of course literally running valgrind is still possible,
               | but it is difficult to get useful information.
        
               | nick__m wrote:
               | You cannot have use-after-free if you never call free, so
               | there are no points at which memory should not be
               | touched.
               | 
               | That's the beauty of the never free memory management
               | strategy.
        
               | dajtxx wrote:
               | It can still be a bug if you use something after you
               | would have freed it because your code isn't meant to be
               | using that object any more. It points to errors in the
               | logic.
        
       | cosmolev wrote:
       | How does the output compare to https://www.decompiler.com/ in
       | terms of correctness?
        
       | kamma4434 wrote:
       | I cannot help but wonder why anyone would start a new project in
       | C in 2025. It's like driving a car with no seat belts. You sure
       | you want to do that?
        
         | zzo38computer wrote:
         | In my experience, although many of the other programming
         | languages do improve some things compared with C, they also
         | make many things worse and avoid some of the benefits of C
         | programming.
        
           | pjmlp wrote:
           | I can't recall anything in that sense regarding Modula-2 and
           | Object Pascal, other than not bringing UNIX to the party.
        
         | ronsor wrote:
         | Yes, yes I'm sure. I like using C sometimes.
        
         | sim7c00 wrote:
         | i only write in C. if id build a car it wouldnt have seatbelts.
         | boring, put in ejector seats! not safe? no problem for C :).
        
           | SunlitCat wrote:
           | ejector seats in C car?
           | 
           | goto eject; ...more code we are going to ignore, it could be
           | important but nah, ignore it, what could happen?...
           | 
           | eject: up_through_the_roof();
           | 
           | :D
        
         | uecker wrote:
         | I moved from C++ to C and I am more productive. I also think
         | this "no seat belts" meme is exaggerated, as there are plenty
         | of tools and strategies to make C fairly safe to use. (it is
         | true though that many people do not put the seat belts on).
        
         | Sophira wrote:
         | We _need_ people who can (and do) write in C, assembly, and all
         | these low-level languages. Otherwise, software will just get
         | slower and slower.
        
           | AgentME wrote:
           | Rust has the same low-level memory model as C without the
           | footguns.
        
             | dardeaup wrote:
             | Rust certainly does have some improvements, but I'm not
             | 100% certain that it's the best tool for all low-level
             | software. For example, I'm experimenting with Rust for some
             | filesystem type code and I can't figure out how to
             | write/read a struct to/from disk all at once. I'm brand new
             | to Rust, so it's quite possible that it can be done and I
             | just don't know the technique. Basically, I'm looking for
             | something in Rust analogous to C's fread/fwrite. I know I
             | can write out each field of the struct individually, but
             | when the struct has many fields it means having to write a
             | huge amount of nasty boilerplate code when in C it's a
             | single function call (fread/fwrite).
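
For readers unfamiliar with the C idiom dardeaup refers to, here is a minimal sketch of writing and reading a whole struct with one `fwrite`/`fread` call each. The `superblock_t` layout and the function names are invented for illustration. The raw bytes on disk depend on the compiler's struct padding and the machine's byte order, so a file written this way is only reliably readable by a program built for the same ABI.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk header; fields chosen so there is no padding. */
typedef struct {
    uint32_t magic;
    uint32_t block_size;
    uint64_t block_count;
} superblock_t;

/* Writes the struct in one call. Returns 0 on success, -1 on error. */
int superblock_write(const char *path, const superblock_t *sb)
{
    FILE *f = fopen(path, "wb");
    if (!f)
        return -1;
    size_t n = fwrite(sb, sizeof *sb, 1, f);
    int rc = fclose(f);  /* fclose can report a deferred write error */
    return (n == 1 && rc == 0) ? 0 : -1;
}

/* Reads the struct back in one call. Returns 0 on success, -1 on error. */
int superblock_read(const char *path, superblock_t *sb)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;
    size_t n = fread(sb, sizeof *sb, 1, f);
    fclose(f);
    return n == 1 ? 0 : -1;
}
```

This convenience is also the weak point: nothing in the file records the layout, so changing the struct or the target platform silently changes the format.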
        
       | ConanRus wrote:
       | Can you also write a C decompiler in pure Java language?
        
         | dardeaup wrote:
         | Of course it can be done! It wouldn't be as general purpose as
         | the Java decompiler in C because the C decompiler would have to
         | know about the CPU architecture of the executable code (just as
         | the Java decompiler has to know about JVM opcodes).
        
       | kazinator wrote:
       | You've used GPL2 code taken from git (hashmap.c) in your Apache
       | 2.0 project.
       | 
       | https://opensource.stackexchange.com/questions/10737/inclusi...
        
       | jbellis wrote:
       | IntelliJ's FernFlower decompiler is the gold standard. I don't
       | think it's available in a standalone repo, but it IS available
       | as a standalone library:
       | https://github.com/JetBrains/intellij-community/blob/master/...
       | https://www.jetbrains.com/intellij-repository/releases
       | 
       | I guess there's some history there that I'm not familiar with
       | because JBoss also has a FernFlower decompiler library
       | https://mvnrepository.com/artifact/org.jboss.windup.decompil...
        
       ___________________________________________________________________
       (page generated 2025-06-03 23:00 UTC)