[HN Gopher] JOPA: Java compiler in C++, Jikes modernized to Java...
___________________________________________________________________
JOPA: Java compiler in C++, Jikes modernized to Java 6 with Claude
Author : pshirshov
Score : 49 points
Date : 2025-11-23 17:17 UTC (3 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| pshirshov wrote:
| I tried to throw a task at Claude that I thought it wouldn't
| handle. It did, with minimal supervision. Some things had to
| be done in "adversarial" mode where Claude coded and Codex
| criticized/reviewed, but it is what it is. An LLM was able to
| implement generics and many other language features with very
| little supervision in less than a day o_O.
|
| I've been thrilled to see it using GDB with inhuman speed and
| efficiency.
| yosefk wrote:
| I am very impressed with the kind of things people pull out of
| Claude's zhopa but can't see such opportunities in my own work.
| Is success mostly the result of it being able to test its
| output reliably, and of how easy it is to set up the
| environment for this testing?
| pshirshov wrote:
| > Is success mostly the result of it being able to test its
| output reliably, and of how easy it is to set up the
| environment for this testing?
|
| I wouldn't say so. In my experience the key to success is the
| ability to split big tasks into smaller ones and help the
| model with solutions when it's stuck.
|
| Reproducible environments (Nix) help a lot, yes, same for
| sound testing strategies. But the ability to plan is the key.
| orbifold wrote:
| One other thing I've observed is that Claude fares much
| better in a well engineered pre-existing codebase. It
| adapts to most of the style and has plenty of "positive"
| examples to follow. It also benefits from the existing test
| infrastructure. It will still tend to go in infinite loops
| or introduce bugs and then oscillate between them, but I've
| found it to be scarily efficient at implementing medium-sized
| features in complicated codebases.
| pshirshov wrote:
| Yes, that too, but this particular project was an ancient
| C++ codebase with extremely tight coupling, manual memory
| management and very little abstraction.
| UncleEntity wrote:
| Claude will also tend to go for the "test-passing"
| development style where it gets super fixated on making the
| tests pass with no regard for how the features will work with
| whatever is intended to be built later.
|
| I had to throw away a couple of days' worth of work because the
| code it built to pass the tests wasn't able to do the actual
| thing it was designed for and the only workaround was to go
| back and build it correctly while, ironically, still keeping
| the same tests.
|
| You kind of have to keep it on a short leash but it'll get
| there in the end... hopefully.
| tekacs wrote:
| zhopa -> jopa (zhopa) for those who don't spot the joke
| 1024bees wrote:
| How did you get gdb working with Claude? There are a few MCP
| servers that look fine, curious what you used.
| pshirshov wrote:
| Well, just told it to use gdb when necessary, MCP wasn't
| required at all! Also it helps to tell it to integrate
| cpptrace and always look at the stacks.
| formerly_proven wrote:
| MCP is more or less obsolete for code generation since agents
| can just run CLI tools directly.
| UncleOxidant wrote:
| > Some things had to be done in "adversarial" mode where Claude
| coded and Codex criticized/reviewed
|
| How does one set up this kind of adversarial mode? What tools
| would you need to use? I generally use Cline or KiloCode - is
| this possible with those?
| pshirshov wrote:
| My own (very dirty) tool, there are some public ones,
| probably I'll try to migrate to one of the more mature tools
| later. Example: https://github.com/ruvnet/claude-flow
|
| > is this possible with those?
|
| You can always write to stdin/read from stdout even if there
| is no SDK available I guess. Or create your own agent on top
| of an LLM provider.
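
A minimal sketch of the coder/reviewer loop described above, assuming each agent is reachable as a CLI that takes a prompt on stdin and prints its answer on stdout. The function names, the prompt wording, and the "reviewer returns None to approve" convention are illustrative assumptions, not the author's actual tool.

```python
import subprocess
from typing import Callable, Optional, Sequence, Tuple


def run_cli(cmd: Sequence[str], prompt: str) -> str:
    """Send a prompt to a command's stdin and return whatever it prints."""
    result = subprocess.run(
        list(cmd), input=prompt, capture_output=True, text=True, check=True
    )
    return result.stdout


def adversarial_loop(
    coder: Callable[[str, Optional[str]], str],
    reviewer: Callable[[str, str], Optional[str]],
    task: str,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Alternate coder and reviewer until the reviewer has no objections.

    coder(task, feedback) returns a draft; feedback is None on the first round.
    reviewer(task, draft) returns criticism, or None to approve the draft.
    Returns the last draft and the number of rounds that were used.
    """
    draft = ""
    feedback: Optional[str] = None
    for round_no in range(1, max_rounds + 1):
        draft = coder(task, feedback)
        feedback = reviewer(task, draft)
        if feedback is None:
            return draft, round_no
    # Round budget exhausted: hand back the last draft for a human to look at.
    return draft, max_rounds
```

To wire this to real agents, `coder` and `reviewer` can be thin wrappers around `run_cli` with whatever command line each agent actually accepts; those command lines are deliberately left out here because they differ per tool.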
| proxysna wrote:
| Jopa means ass in Russian, this reminded me of Pidora.
| koakuma-chan wrote:
| There's JEPA too
| dimaaan wrote:
| Don't forget NPM packages Mocha and Chai (Pee and Tea)
| photios wrote:
| I came here for this comment! TIL about Pidora :D
| mike386 wrote:
| There is also "mudyla" repo in the org, so
| pshirshov wrote:
| That's Multimodal Dynamic Launcher. A very nice thing
| actually, a scripting orchestrator.
| pshirshov wrote:
| Btw, working on Java 7 support. At the moment I sorta have a
| working Java 7 compiler targeting Java 6 bytecode (Java 7 has
| StackMapTable which is sort of annoying).
|
| Also, I've tried to replace the parser with a modern one. Claude
| succeeds in generating Java 8 parsers with various parser
| generators/parser combinators but fails to resolve extremely
| tight coupling.
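
For context on the "Java 6 bytecode" remark: a class file starts with the magic number 0xCAFEBABE followed by a minor and a major version, where major 50 is Java 6 and major 51 is Java 7. From major 51 onward the JVM verifies methods with the type-checking verifier, which relies on StackMapTable frames, while version 50 files may still fall back to the older inference verifier. The sketch below only reads that header; the function names are illustrative and not part of JOPA.

```python
import struct

CLASS_MAGIC = 0xCAFEBABE
JAVA_6_MAJOR = 50
JAVA_7_MAJOR = 51


def class_file_version(data: bytes) -> tuple:
    """Return (major, minor) from the first 8 bytes of a .class file."""
    if len(data) < 8:
        raise ValueError("too short to be a class file")
    # Header layout: u4 magic, u2 minor_version, u2 major_version (big-endian).
    magic, minor, major = struct.unpack(">IHH", data[:8])
    if magic != CLASS_MAGIC:
        raise ValueError("bad magic number, not a class file")
    return major, minor


def requires_typechecking_verifier(major: int) -> bool:
    """True when the JVM will insist on StackMapTable-based verification."""
    return major >= JAVA_7_MAJOR
```

Emitting major version 50 is what lets a compiler with Java 7 syntax support postpone generating stack map frames.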
| algo_trader wrote:
| What is the feasibility/craziness level of "LLM porting" the
| javac source code to C++?
|
| Setting copyright issues aside, javac is a pretty clean
| textual-input-output program, and it can probably be reduced to
| a single-threaded variant.
| pshirshov wrote:
| Claude won't handle a project of that scale. Even with the Java 7
| modernization project, which is much simpler than a full javac
| translation, I constantly hit context limits and Claude throws
| things like this at me:
|
| API Error: 400 {"type":"error","error":{"type":"invalid_request_error",
| "message":"messages.3.content.76: `thinking` or `redacted_thinking`
| blocks in the latest assistant message cannot be modified. These
| blocks must remain as they were in the original response."},
| "request_id":"req_011CVWwBJpf3ZrmYGkYZLQVf"}
| Snuggly73 wrote:
| looking at the "att" branches (excuse my unhealthy curiosity) I
| can only say - "jesus fucking christ".
|
| from the old parser ast -> to json -> to new ast representation
| (that is basically again copy of the old one) -> to some new
| incomplete bytecode generation
|
| im sure there is some good explanation, but....why?! :)
| pshirshov wrote:
| I've been looking for a way to decouple the legacy parser from
| the rest of the compiler, plus create a way to dump parser
| output in a readable form. Unfortunately, the coupling is too
| tight there. In my own compilers all the outputs of all the
| phases are serializable.
|
| In the end I've just reanimated the original parser generator
| and progressed to full Java 7 syntactically (-att5 branch),
| but there are some major obstacles with bytecode.
| Snuggly73 wrote:
| i thought it might be something like this (still a weird
| overkill), but if you are effectively replacing the parser
| with new peg and replacing the backend with something new -
| then there is nothing left - just start from scratch :)
| exabrial wrote:
| Tangential: isn't there a Java compiler written in Java from
| the same time period?
| pshirshov wrote:
| It's much older, but even now this is THE ONLY viable pathway
| to bootstrap a modern JDK from scratch. I'm trying to modernize
| it so the bootstrap path might be shortened.
|
| See https://bootstrappable.org/projects/java.html
| cyberax wrote:
| The Java-to-bytecode compiler (javac) has always been written in
| Java. There was a JVM written in Java: Jikes RVM.
| anthk wrote:
| Jikes didn't.
| pshirshov wrote:
| Ah, by the way. I've tried to do the same with Codex
| (gpt-5.1-codex-max) and Gemini (2.5 pro), both failed
| spectacularly. This job was done mostly by Sonnet 4.5. Java 6 did
| not require intensive supervision. Java 7 parts are done with
| Opus 4.5 and it constantly hits its limits, I have to regularly
| intervene.
| goranmoomin wrote:
| I'm genuinely curious about how well this is working. Is there an
| independent Java test suite that covers major Java 5/6 features
| that can verify that the JOPA compiler works per the spec? I.e. I
| see that Claude has written a few tests in its commits, but it
| would be wonderful if there's a non-Clauded independent test
| suite (probably from other Java implementations?) that tracks
| progress.
|
| I do feel that that is pretty much needed to claim that Claude is
| adding features to match the Java spec.
| pshirshov wrote:
| Well, it's complicated. The original JDK compliance tests are
| notoriously hard to deal with. Currently I parse nearly 100% of
| positive testcases from the JDK 7 test suite (in one of the Java 7
| branches) but I only have several dozen true end-to-end
| tests (build .java with jopa, validate classfile with javap,
| run classfile with java).
|
| So, I can't tell how good it actually is but it definitely
| handles reasonably complex source files with generics
| (something the original compiler was unable to do).
|
| The actual goal of the project is to be able to build at least
| ANT to simplify clean bootstrap of OpenJDK.
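
The three-step end-to-end check described above (compile, inspect with javap, run on a real JVM) can be sketched as follows. This is an illustration, not the project's actual harness: the `jopa` binary name and its javac-style `-d` output flag are assumptions, and the test class is assumed to live in the default package with a name matching its file.

```python
import subprocess
from pathlib import Path
from typing import List


def e2e_commands(source: str, out_dir: str, compiler: str = "jopa") -> List[List[str]]:
    """Build the command lines for one compile / validate / run cycle."""
    src = Path(source)
    main_class = src.stem  # assumes Hello.java declares class Hello
    return [
        # 1. Compile. The javac-style "-d" flag is an assumption about the CLI.
        [compiler, "-d", out_dir, str(src)],
        # 2. Have javap parse and print the class file.
        ["javap", "-v", "-cp", out_dir, main_class],
        # 3. Run the class on a real JVM.
        ["java", "-cp", out_dir, main_class],
    ]


def run_e2e(source: str, out_dir: str, compiler: str = "jopa") -> str:
    """Run all three steps, raising on the first failure; return the program output."""
    output = ""
    for cmd in e2e_commands(source, out_dir, compiler):
        result = subprocess.run(cmd, capture_output=True, text=True, check=True)
        output = result.stdout
    return output
```

Comparing the output of `run_e2e` against the same source compiled with a reference javac would give a stronger signal than checking that the program merely runs.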
| AtlasBarfed wrote:
| That is perilously close to the usual:
|
| "AI DID EVERYTHING IN A DAY"
|
| "How do you know it works?"
|
| "... it just looks like it does"
|
| Like when I ask AIs to port sed to java, and it writes test
| cases ... running sed on a CLI and doesn't implement the full
| lang spec no matter how much prompting I give it.
| pshirshov wrote:
| Well, at least the emitted bytecode validates with javap
| and a lot of stuff definitely runs on real jvm.
| th0ma5 wrote:
| I think the criticisms are too often dismissed as moving
| the goalposts or ignorant of potential, but short of
| recreating the active open bugs in Java, you've created a
| different thing whose differences have to be managed and
| it is unclear how helpful that may be despite the working
| implementations of subsets.
| pshirshov wrote:
| If I (or someone else) can use it as a starting point in the
| bootstrap process, that's fine with me. This is not
| supposed to be a top-tier compiler. Essentially, it needs
| to be able to build ANT.
| sgammon wrote:
| [j|y]ikes
| atgreen wrote:
| Related: I recently got javac working with OpenLDK, my JVM
| bytecode to Common Lisp transpiler. The `javacl` binary is a
| dumped SBCL image that behaves just like the OpenJDK javac program,
| but with CL under the hood (e.g. Java objects/methods are all
| CLOS).
| pshirshov wrote:
| Please post the link here. If it's more than just a demo, it
| might be a valuable tool.
| atgreen wrote:
| https://github.com/atgreen/openldk
| pshirshov wrote:
| I think it's an extremely valuable tool. If it can compile
| from sources - it would be priceless.
| atgreen wrote:
| openldk itself builds from source. It reads jar/class
| files and JIT-transpiles them to common lisp code, which
| is in turn compiled to native instructions. It does not
| read java source code at all. But you can run OpenJDK's
| javac with OpenLDK. You can write "native" methods in
| Common Lisp, extend Java classes with CLOS classes, use
| conditions/restarts, :before/:after/:around methods, dump
| images, etc. There's some ways to go still, but -- like I
| said -- javac just started working as a native lisp image
| executable, which was an important milestone.
| pshirshov wrote:
| I cannot bootstrap openjdk from ground zero, from pure
| source files without binaries. From what I know there is
| just one pathway to that, the Guix one, which starts from
| Jikes.
| shawn_w wrote:
| I remember discovering and using jikes in the 90's. It was /so/
| much faster than javac back then.
|
| "Modernizing" to Java 6 is amusing.
| pshirshov wrote:
| Almost got to Java 7. And there is a huge gap between Jikes'
| original Java 4 and Java 6.
|
| Even Java 6 support should make ground zero bootstrap of modern
| JDKs much easier.
___________________________________________________________________
(page generated 2025-11-26 23:00 UTC)