[HN Gopher] Show HN: I wrote a Java decompiler in pure C language
___________________________________________________________________
Show HN: I wrote a Java decompiler in pure C language
Author : neocanable
Score : 135 points
Date : 2025-06-03 12:14 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| appendixv3 wrote:
| Very cool project! Love the idea of a Java decompiler written in
| C -- the speed must be great.
|
| Any plan to support `.dex` in the future? Also curious how you
| handle inner classes inside JARs.
| tslater2006 wrote:
| The readme shows support for dumping dex files. Edit: missed
| that it has a comment that says "unsupport for now" but at
| least it looks like something planned
| mdaniel wrote:
| The "jikes" compiler from IBM
| <https://github.com/daveshields/jikespg> was written in C++ and
| was for the longest time _screaming_ fast. It also had its own
| parser generator lpg which was fun to play with, if you're
| into those things <https://github.com/daveshields/jikespg>
|
| It seems someone liked it and made a "v2" along with LSP
| support https://github.com/A-LPG/LPG2#lpg2
| amiga386 wrote:
| Jikes also gave massively better error messages than the
| official Java compiler, from what I remember, and it
| certainly ran a lot faster on the Amiga
| (https://aminet.net/package/dev/lang/jikes) than trying to
| run javac via Kaffe (https://en.wikipedia.org/wiki/Kaffe)
| did.
| pjmlp wrote:
| Certainly not everything on Jikes, given that it was one of
| the first bootstrapped Java toolchains.
|
| https://www.jikesrvm.org/
| neocanable wrote:
| I am writing the part that decompiles dex and apk. The current
| speed is about 10 times faster than that of Java, and it takes
| up fewer resources than Java. And the compiled binary is
| smaller, only about 300k. Thank you for your attention.
| mdaniel wrote:
| This has been my life experience with things written in
| C/C++, so speed doesn't matter. Or, I guess from an
| alternative perspective, it _ran_ very fast, but _exited_
| very fast, too :-D
|
| $ ./objdir/garlic $the_jar_file -o out-dir -t $(nproc)
| Progress : 85 (1024)Segmentation fault: 11
| neocanable wrote:
| Sorry for giving you a bad experience. Please provide the
| jar file or class file. I hope I can fix it as soon as
| possible.
| uecker wrote:
| Is it? This is my experience with Python. The C/C++
| programs I use daily never seem to crash (Linux, bash,
| terminals, X, firefox, vim, etc.). It must be years ago one
| of those programs crashed while I used it.
| 1718627440 wrote:
| Also a segfault IS the protection layer intervening, it
| is equivalent to an exception in other languages. The real
| problem is, when there is no segfault.
| uecker wrote:
| This is absolutely true. But even this does not happen in
| the software I use every day. Software written in C is
| definitely the most stable I use - by far. That there are
| people running around claiming that it is impossible to
| write stable software in C and it crashes all the time
| due to bugs is rather unfortunate, as it is far from the
| truth.
| neocanable wrote:
| It processes inner classes recursively. First it reads all entries
| from the jar and analyzes the relationships between classes. Then
| it does the decompiling.
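|
| A minimal sketch of one way the "relationships between classes"
| step could start, assuming the usual javac naming convention in
| which a nested class is stored as Outer$Inner.class. The function
| name outer_class_of is hypothetical and is not taken from garlic.
| Note that '$' is also legal in ordinary class names, so this is
| only a heuristic; the InnerClasses attribute inside the class file
| is the authoritative source.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not garlic's API.
   Copies the top-level class name of a jar entry into out:
   "a/B$C$1.class" -> "a/B", "a/B.class" -> "a/B".
   Returns 1 on success, 0 if the entry is not a .class file
   or the result does not fit in out. */
int outer_class_of(const char *entry, char *out, size_t out_len) {
    assert(entry != NULL && out != NULL);
    const char *suffix = ".class";
    size_t n = strlen(entry), s = strlen(suffix);
    if (n <= s || strcmp(entry + n - s, suffix) != 0) return 0;
    n -= s; /* drop the ".class" suffix */

    /* Find the simple name: the part after the last '/'. */
    const char *name = entry;
    for (size_t i = 0; i < n; i++)
        if (entry[i] == '/') name = entry + i + 1;

    /* Cut at the first '$' in the simple name, unless the name starts
       with '$' (then it is treated as a top-level class). */
    const char *dollar = memchr(name, '$', n - (size_t)(name - entry));
    if (dollar != NULL && dollar > name) n = (size_t)(dollar - entry);

    if (n + 1 > out_len) return 0;
    memcpy(out, entry, n);
    out[n] = '\0';
    return 1;
}
```

| Entries that map to the same outer name can then be grouped and
| decompiled together, with nested classes handled recursively.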
| keepamovin wrote:
| By hand or with AI? Fascinating. So much work! What was your
| motivation for this?
| xandrius wrote:
| Irrelevant to me. People would never ask whether someone has
| created something looking at SO or not. If the thing works as
| advertised, good for them!
| lyxell wrote:
| To some people the process leading to a finished project is
| the most interesting thing about posts like these.
| johnisgood wrote:
| LLMs can explain the process, and you can build projects
| with LLMs explaining the process.
| nticompass wrote:
| LLMs can attempt to explain the code, but it can't
| explain people's thought process and that's the
| interesting part.
|
| I want to hear about the reverse engineering, how you
| thought the code through. LLMs are boring.
| johnisgood wrote:
| They can, if you write down your thought process, which
| is probably what you should do when you are using an LLM
| to create a product, but what do I know.
| Ghoelian wrote:
| > They can, if you write down your thought process
|
| Just write a blogpost at that point.
| johnisgood wrote:
| You do not have to be as accurate or that specific, you
| do not have to worry about the way you word or organize
| things, the LLM can figure it out, as opposed to a blog
| post.
|
| So "To some people the process leading to a finished
| project is the most interesting thing about posts like
| these." is bullshit, that is said by someone who has
| never used LLM _properly_. You can achieve it with LLMs.
| You definitely can, I know, I did, accurately (I double
| checked).
|
| I will throw it out here, too:
| https://news.ycombinator.com/item?id=44163063 (My AI
| skeptic friends are all nuts)
| ryan93 wrote:
| That is not true
| johnisgood wrote:
| How come? You had different experiences? Which LLMs, what
| prompts? Give me all the details that support your claim
| that it is not true. My experiences completely differ
| from yours, so the way I use it, it is very much true.
|
| That said, it is probably pointless to argue with full-
| blown AI-skeptics.
|
| People had lots of great and productivity-enhancing
| experiences with LLMs, you did not, great, that does not
| reflect the tool, it reflects your way of using the tool.
|
| I will just throw it out here:
| https://news.ycombinator.com/item?id=44163063 (My AI
| skeptic friends are all nuts)
| shortrounddev2 wrote:
| LLM output is simply not interesting
| johnisgood wrote:
| I did not say that you should copy paste its output
| verbatim. I thought this was obvious.
|
| Additionally, "interesting" is highly subjective. It
| could be technically correct, yet uninteresting.
| zerr wrote:
| It's not about explaining the process but experiencing
| it.
| johnisgood wrote:
| Well, they can experience it if they wish to. Sadly most
| vibe-coders do not.
| neocanable wrote:
| 90% by hand, 10% AI. I do this for fun and to learn about jvm.
| jebarker wrote:
| I think that sort of ratio is the sweet spot for learning.
| I've been writing an 8086 simulator in C++ and using an LLM
| for answering specific technical questions I come up with has
| drastically sped up my progress without it actually doing the
| work for me.
| keepamovin wrote:
| Wow, impressive. A project of this scale and depth.
| Bjartr wrote:
| A great question to ask. We're in the middle of learning where
| AI can and can't be effective. Knowing where and how it's being
| used is quite useful.
| stefanos82 wrote:
| Nice job! I don't know whether you know
| https://github.com/java-decompiler/jd-gui or not, but in case you
| haven't seen it before, maybe you could use it as a reference, since
| it's written in Java, for extra fun with your adventure?
| rafram wrote:
| Things may have changed, but my impression as of several years
| ago was that JD-GUI was far, far behind the state of the art
| (Fernflower, aka the built-in IntelliJ decompiler) in terms of
| correctness, re-sugaring, support for modern Java features, and
| so on. Fernflower is open source as part of IntelliJ:
| https://github.com/fesh0r/fernflower
| GranPC wrote:
| Is there a good GUI for this a la jadx-gui that isn't an
| entire IDE?
| rafram wrote:
| Not that I know of. The features I'd want in order to
| consider a decompiler GUI "good" (e.g. a good text editing
| control, go-to-definition, find usages, manual renaming of
| obfuscated symbol names) quickly approach the scope of an
| entire IDE, though.
| DefineOutside wrote:
| The most feature advanced decompiler I know of is Recaf.
| It supports a mix of decompilers and even bytecode
| editing.
| gibibit wrote:
| I am always curious how different C programs decide how to manage
| memory.
|
| In this case there is a custom string library. Functions
| return owned heap-allocated strings.
|
| However, I think there's a problem where static strings are used
| interchangeably with heap-allocated strings, such as in the
| function `string class_simple_name(string full)` (
| https://github.com/neocanable/garlic/blob/72357ddbcffdb75641... )
|
| Sometimes it returns a static string like `g_str_int` and
| sometimes a newly heap-allocated string, such as returned by
| `class_type_array_name(g_str_int, depth)`.
|
| Callers have no way to properly release the memory allocated by
| this function.
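|
| The ambiguity can be sketched as follows. This is an illustration
| only: type_name, array_name and the must_free flag are hypothetical
| stand-ins and not garlic's actual code. Without some signal such as
| the flag, a caller cannot tell whether free() is required.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a static string such as g_str_int. */
char g_int_name[] = "int";

/* Returns base followed by depth "[]" pairs in a fresh heap buffer. */
char *array_name(const char *base, int depth) {
    size_t len = strlen(base) + 2 * (size_t)depth + 1;
    char *out = malloc(len);
    if (out == NULL) return NULL;
    strcpy(out, base);
    for (int i = 0; i < depth; i++) strcat(out, "[]");
    return out;
}

/* Mixed ownership: static storage when depth == 0, heap otherwise.
   The out-parameter makes the ownership explicit for the caller. */
char *type_name(int depth, int *must_free) {
    assert(depth >= 0 && must_free != NULL);
    if (depth == 0) {
        *must_free = 0;   /* caller must NOT free static storage */
        return g_int_name;
    }
    *must_free = 1;       /* caller owns the heap buffer */
    return array_name(g_int_name, depth);
}
```

| Other common ways out are to always return a heap copy, or to
| allocate everything from a pool that is released in one go.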
| neocanable wrote:
| In multi-threaded mode, each thread will create a separate
| memory pool. If in single-threaded mode, a global memory pool
| is used. You can refer to
| https://github.com/neocanable/garlic/blob/72357ddbcffdb75641....
| The x_alloc and x_alloc_in in it indicate where the memory is
| allocated. When each task ends,
| the memory allocated in the memory pool is released, and the
| cycle repeats.
| IshKebab wrote:
| Interesting. Someone should come up with a language that
| prevents these sorts of mistakes!
| brabel wrote:
| That's impossible. Just be more careful and everything should
| work, the author's C was just a bit rusty!
| neocanable wrote:
| This project is my first project written in C language.
| Before this, my C language level was only at printf("hello
| world"). I am very happy because this project made me dare
| to use double pointers (pointers to pointers).
| sim7c00 wrote:
| u did really well ppl like to pick on C. :) thanks for
| making it in C, fun to read ur code and see how others go
| about this language!
| kookamamie wrote:
| Yes, perhaps it could have a marketing slogan like "Write
| once, crash everywhere!"
| cenamus wrote:
| Thank god Lisp is older than C, don't have to deal with such
| nonsense :-)
| pjmlp wrote:
| If only there were a couple of OSes implemented during the
| 1960's with such programming languages....
| uecker wrote:
| I think he is using memory pools, so this is ok.
| SunlitCat wrote:
| Strings! The bane of C programming, and a big reason I prefer
| C++. :D
| norir wrote:
| Many command line tools do not need memory management at all,
| at least to first approximation. Free nothing and let the os
| cleanup on process exit. Most libraries can either use an arena
| internally and copy any values that get returned to the user to
| the heap at boundaries or require the user to externally create
| and destroy the arena. This can be made ergonomic with one
| macro that injects an arena argument into function defs and
| another that replaces malloc by bumping the local arena data
| pointer that the prior macro injected.
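|
| The two-macro idea might look roughly like this. It is a sketch
| under assumptions: ARENA_FN, ARENA_MALLOC, arena_t and arena_bump
| are made-up names, and the backing buffer is assumed to be 16-byte
| aligned. ARENA_FN as written also needs at least one ordinary
| parameter.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical arena: a cursor and an end pointer into a buffer
   owned by the caller. */
typedef struct { char *ptr; char *end; } arena_t;

/* Macro 1: defines a function with an injected leading arena parameter. */
#define ARENA_FN(ret, name, ...) ret name(arena_t *arena_, __VA_ARGS__)

/* Macro 2: malloc replacement that bumps the injected arena pointer. */
#define ARENA_MALLOC(n) arena_bump(arena_, (n))

void *arena_bump(arena_t *a, size_t n) {
    assert(a != NULL);
    n = (n + 15) & ~(size_t)15;          /* keep 16-byte alignment */
    if ((size_t)(a->end - a->ptr) < n) return NULL;
    void *p = a->ptr;
    a->ptr += n;
    return p;
}

/* Example function written against the macros: duplicates a string
   into the arena instead of the heap. */
ARENA_FN(char *, dup_str, const char *s) {
    size_t n = strlen(s) + 1;
    char *out = ARENA_MALLOC(n);
    if (out != NULL) memcpy(out, s, n);
    return out;
}
```

| Callers pass the arena explicitly, e.g. dup_str(&arena, "text"),
| and destroying the arena releases everything the call produced.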
| 1718627440 wrote:
| That might be true, but leaking is neither the most critical nor
| the hardest-to-find memory management issue, and good luck
| trying to adapt or even run valgrind with a codebase that
| mindlessly allocates and leaks everywhere.
| guerrilla wrote:
| Pretty sure you can just disable leak checking.
| 1718627440 wrote:
| But for example verifying that memory is not touched
| after it is supposed to, is much harder when you can't
| rely on it being freed.
|
| Of course literally running valgrind is still possible,
| but it is difficult to get useful information.
| nick__m wrote:
| You cannot have use-after-free if you never call free, so
| there are no points at which memory should not be
| touched.
|
| That's the beauty of the never free memory management
| strategy.
| dajtxx wrote:
| It can still be a bug if you use something after you
| would have freed it because your code isn't meant to be
| using that object any more. It points to errors in the
| logic.
| cosmolev wrote:
| How does the output compare to https://www.decompiler.com/ in
| terms of correctness?
| kamma4434 wrote:
| I cannot help but wonder why you would start a new project in C in 2025.
| It's like driving a car with no seat belts. You sure you want to
| do that?
| zzo38computer wrote:
| In my experience, although many of the other programming
| languages do improve some things compared with C, they also
| make many things worse and avoid some of the benefits of C
| programming.
| pjmlp wrote:
| I can't recall anything in that sense regarding Modula-2 and
| Object Pascal, other than not bringing UNIX to the party.
| ronsor wrote:
| Yes, yes I'm sure. I like using C sometimes.
| sim7c00 wrote:
| i only write in C. if id build a car it wouldnt have seatbelts.
| boring, put in ejector seats! not safe? no problem for C :).
| SunlitCat wrote:
| ejector seats in C car?
|
| goto eject; ...more code we are going to ignore, it could be
| important but nah, ignore it, what could happen?...
|
| eject: up_through_the_roof();
|
| :D
| uecker wrote:
| I moved from C++ to C and I am more productive. I also think
| this "no seat belts" meme is exaggerated, as there are plenty
| of tools and strategies to make C fairly safe to use. (it is
| true though that many people do not put the seat belts on).
| Sophira wrote:
| We _need_ people who can (and do) write in C, assembly, and all
| these low-level languages. Otherwise, software will just get
| slower and slower.
| AgentME wrote:
| Rust has the same low-level memory model as C without the
| footguns.
| dardeaup wrote:
| Rust certainly does have some improvements, but I'm not
| 100% certain that it's the best tool for all low-level
| software. For example, I'm experimenting with Rust for some
| filesystem type code and I can't figure out how to
| write/read a struct to/from disk all at once. I'm brand new
| to Rust, so it's quite possible that it can be done and I
| just don't know the technique. Basically, I'm looking for
| something in Rust analogous to C's fread/fwrite. I know I
| can write out each field of the struct individually, but
| when the struct has many fields it means having to write a
| huge amount of nasty boilerplate code when in C it's a
| single function call (fread/fwrite).
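|
| For reference, the C pattern being described looks roughly like
| the sketch below. The superblock_t record and the sb_write/sb_read
| helpers are hypothetical. The single-call convenience comes with a
| caveat: the bytes on disk depend on the compiler's struct padding
| and the machine's endianness, so the file is not portable as-is.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk record with fixed-width fields. */
typedef struct {
    uint32_t magic;
    uint32_t block_count;
    uint64_t root_offset;
} superblock_t;

/* Guard against unexpected padding changing the on-disk size. */
static_assert(sizeof(superblock_t) == 16, "unexpected struct padding");

/* Writes the whole struct with one fwrite call; returns 1 on success. */
int sb_write(FILE *f, const superblock_t *sb) {
    return fwrite(sb, sizeof *sb, 1, f) == 1;
}

/* Reads the whole struct with one fread call; returns 1 on success. */
int sb_read(FILE *f, superblock_t *sb) {
    return fread(sb, sizeof *sb, 1, f) == 1;
}
```

| Writing each field individually is more verbose, but it is also
| what makes a file format independent of one compiler and one CPU.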
| ConanRus wrote:
| Can you also write a C decompiler in pure Java language?
| dardeaup wrote:
| Of course it can be done! It wouldn't be as general purpose as
| the Java decompiler in C because the C decompiler would have to
| know about the CPU architecture of the executable code (just as
| the Java decompiler has to know about JVM opcodes).
| kazinator wrote:
| You've used GPL2 code taken from git (hashmap.c) in your Apache
| 2.0 project.
|
| https://opensource.stackexchange.com/questions/10737/inclusi...
| jbellis wrote:
| I don't think it's available in a standalone repo but it IS
| available as a standalone library, IntelliJ's FernFlower
| decompiler is the gold standard
| https://github.com/JetBrains/intellij-community/blob/master/...
| https://www.jetbrains.com/intellij-repository/releases
|
| I guess there's some history there that I'm not familiar with
| because JBoss also has a FernFlower decompiler library
| https://mvnrepository.com/artifact/org.jboss.windup.decompil...
___________________________________________________________________
(page generated 2025-06-03 23:00 UTC)