[HN Gopher] Jazelle DBX: Allow ARM processors to execute Java by...
       ___________________________________________________________________
        
       Jazelle DBX: Allow ARM processors to execute Java bytecode in
       hardware
        
       Author : vincent_s
       Score  : 96 points
       Date   : 2024-01-22 13:27 UTC (9 hours ago)
        
 (HTM) web link (en.wikipedia.org)
 (TXT) w3m dump (en.wikipedia.org)
        
       | nanolith wrote:
       | Jazelle and its replacement, ThumbEE, have been deprecated and
       | removed in later architectures.
       | 
       | On modern Cortex-A systems, there are enough resources to make
       | JIT feasible. On smaller systems, AOT is a reasonable
       | alternative.
        
       | fch42 wrote:
        | My brainfog claims some blurry memories of this ... for one,
        | documentation was lacking so much that an open-source JVM
        | using Jazelle never happened; if you wanted to develop a JVM
        | on top of it, you'd pay ARM for docs, professional services,
        | and unit licenses. And second, once things got to the ARM11
        | series cores, software JITs beat the cr* out of Jazelle. I
        | don't remember any early Android device ever using it.
       | 
        | ARM is quite capable of vapourware generation. 64-bit ARM was
       | press-released (https://www.zdnet.com/article/arm-to-
       | unleash-64-bit-jaguar-f...) a decade before ARMv8 / aarch64
       | became a thing.
       | 
       | (I'd love to learn more)
        
         | RicoElectrico wrote:
         | > I don't remember any early Android device ever used it.
         | 
          | It couldn't have, as the Dalvik VM is distinct from the JVM.
        
           | fch42 wrote:
            | It executes Java bytecode. Whether the Dalvik VM was/is a
            | "Java" VM is hardly relevant there (not least because
            | "Java" is so much more than Java bytecode, and Jazelle
            | does nothing to help with anything on top of the latter).
        
             | RicoElectrico wrote:
              | Apparently it's not even Java bytecode. Would make
              | sense; after all, Dalvik is register-based.
             | https://stackoverflow.com/a/36335740
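              | 
              | For a feel of the difference, here's the same trivial
              | add method in both forms (mnemonics paraphrased from
              | javap -c and dexdump output; register names are
              | illustrative):
              | 
              |     // int add(int a, int b) { return a + b; }
              |     //
              |     // JVM (stack machine):   Dalvik (register machine):
              |     //   iload_1                add-int v0, p1, p2
              |     //   iload_2                return v0
              |     //   iadd
              |     //   ireturn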
        
             | pjmlp wrote:
             | It certainly doesn't.
             | 
             | https://source.android.com/docs/core/runtime/dalvik-
             | bytecode
        
             | bitwize wrote:
              | Java bytecode is transpiled to Dalvik's own bytecode as
              | a build step. Dalvik itself doesn't run Java bytecode.
              | This is one of the reasons why Oracle sued Google:
              | clearly Google was trying to appropriate Java via some
              | clever IP-law dodges with this Dalvik business.
        
         | Vogtinator wrote:
         | There is a bit of info including example code on
         | https://hackspire.org/index.php/Jazelle
        
         | jaywee wrote:
          | Sun wanted to do the same thing in the late '90s - picoJava
          | (embedded), microJava and UltraJava (VLIW workstations).
         | 
         | Relegated to the dustbin of history.
        
           | miki123211 wrote:
           | Java Card still survives, though.
           | 
            | I find Java Card pretty puzzling. You go from high-level
            | interpreted languages on powerful servers, to Java and C++
            | on less powerful devices (like old phones, for example),
            | to almost exclusively C on microcontrollers, and then back
            | to Java again on cards. If it makes sense to write Java
            | code for a device small enough to draw power from radio
            | waves, why aren't we doing that on microcontrollers?
        
             | cpgxiii wrote:
             | The Java Card environment is quite limited, though, due to
             | resource limitations.
             | 
              | There have been several more-or-less successful attempts
              | at running higher-level languages on microcontrollers,
              | e.g. .NET Micro Framework and CircuitPython. In all of
              | these cases, though, you tend to struggle because all
              | the native device behavior is described/intended by the
              | vendor for use with C or C++, and the BSP for the
              | higher-level environment is an afterthought.
        
           | sillywalk wrote:
            | FYI, UltraJava was renamed to MAJC [0], which IIRC was
            | only used in Sun's XVR graphics cards.
           | 
           | More from Ars (1999)
           | https://archive.arstechnica.com/cpu/4q99/majc/majc-1.html
           | 
           | [0] https://en.wikipedia.org/wiki/MAJC
        
         | fidotron wrote:
         | The performance of Dalvik was far below J2ME on Nokia and Sony
         | Ericsson feature phones for a very long time, and Android
         | relied on pushing a lot to C libraries to compensate.
        
           | pjmlp wrote:
            | As a Nokia alumnus, it was incredible how much of the
            | Google fanbase believed in Dalvik's performance fairy
            | tale.
           | 
           | ART is another matter, though.
        
           | toast0 wrote:
           | Sure, but J2ME can't seek backwards in open files. (That was
           | added in Java 1.4, and J2ME is 1.3)
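            | 
            | (A minimal Java SE sketch of what's being missed -
            | FileChannel and its seekable position() arrived with NIO
            | in 1.4; the filename is illustrative:)
            | 
            |     import java.io.IOException;
            |     import java.io.RandomAccessFile;
            |     import java.nio.channels.FileChannel;
            |     
            |     class SeekDemo {
            |         public static void main(String[] args)
            |                 throws IOException {
            |             // Seek backwards in an open file: trivial in
            |             // Java SE, unavailable in CLDC-era J2ME.
            |             // Assumes log.bin exists and is >= 16 bytes.
            |             try (RandomAccessFile f =
            |                     new RandomAccessFile("log.bin", "r")) {
            |                 FileChannel ch = f.getChannel();
            |                 ch.position(ch.size() - 16); // near the end
            |                 ch.position(0);              // back to start
            |             }
            |         }
            |     }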
        
         | zozbot234 wrote:
          | There was a successor, ThumbEE ("Execution Environment"),
          | that _was_ comprehensively documented. But it didn't get
          | much attention either, and later chips removed it.
        
         | happosai wrote:
          | IIRC Jazelle left it to the implementers which bytecodes to
          | handle in HW and which to trap to SW. Since SW JITs beat the
          | Jazelle implementations by ARM11 times, the CPU implementers
          | would just leave everything to the SW traps... So while the
          | original Raspberry Pi was an ARM1176J, and the J meant
          | Jazelle support, it was all already hollowed out. (A sketch
          | of the idea below.)
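          | 
          | A conceptual model only - not ARM's actual mechanism, and
          | all names here are made up for illustration:
          | 
          |     class JazelleModel {
          |         interface Handler { void run(); }
          |     
          |         // Per opcode: hardware fast path, or trap to the
          |         // VM's software handler. An all-false table is the
          |         // "hollowed out" implementation.
          |         static boolean[] inHardware = new boolean[256];
          |         static Handler[] softHandler = new Handler[256];
          |     
          |         static void execute(int opcode) {
          |             if (inHardware[opcode]) {
          |                 /* simple loads/stores/arithmetic in HW */
          |             } else {
          |                 softHandler[opcode].run(); // SW trap
          |             }
          |         }
          |     }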
        
       | michaelt wrote:
       | I remember reading about Jazelle many years ago - before the
       | release of the iPhone and suchlike. This was the age when people
       | were coming up with things like 'Java Card' - smartcards
       | programmed directly in Java.
       | 
       | I never heard of anyone actually using Jazelle, though - I assume
       | JIT ended up working better.
        
         | fch42 wrote:
         | I'm a little in the realm of speculation here. Part of the
         | issue with Java for embedded devices was "a bad fit". What made
         | Java thrive in the server or even applet spaces wasn't the
          | instruction set but the rich ecosystem around Java. Yet
          | threading - as "inherent" to Java as it is - is provided by
          | the OS and "only" used/wrapped by the JVM. All the libraries
          | ... megabytes of (useful) software, yet not implemented (nor
          | even helped) by hardware acceleration. The "equivalent" of
          | static linking to minimize the footprint never quite
          | existed.
         | 
          | So on a smartcard ... write software in an uncommon form of
          | low-level instruction set (especially when compared with
          | ARM, a very "rich" assembly language), and pay both Sun and
          | ARM top $ for the privilege - never mind the likely
          | "runtime" footprint far exceeding the 256kB of RAM you
          | planned for that $5 card - why? Writing small
          | single-threaded software in anything
         | that compiles down to a static ARM binary has been easy and
         | quick enough that going off the ARM instruction set looked
         | pointless for most. And learning which parts of "Java" actually
         | worked in such an environment was hard, even (or especially?)
         | for developers who knew (the strengths of) Java well. Because
         | developers and specifiers expected "rich Java", and couldn't
         | care less about the Bytecode. JITs later only hoovered up the
         | ashes.
        
           | pjmlp wrote:
           | Java is doing just fine in embedded devices.
           | 
           | https://www.ptc.com/en/products/developer-tools/perc
           | 
           | https://www.aicas.com/wp/
           | 
           | https://www.microej.com/
           | 
           | https://en.wikipedia.org/wiki/BD-J
           | 
           | https://www.thalesgroup.com/en/markets/digital-identity-
           | and-...
        
         | bombcar wrote:
         | IIRC people didn't "really believe" that Java could _actually
         | be performant_ because they assumed that since it has a JIT
         | layer, it would _never even get close_ to native code.
         | 
         | But the reality was that JIT allows code to get _faster_ over
         | time, as the JIT improves.
         | 
         | Things like Jazelle let chip manufacturers paper over a paper
         | objection.
        
           | PaulHoule wrote:
            | Specialized hardware has been losing out against general-
            | purpose hardware for years.
           | 
           | There were those "LISP machines" in the early 1980s but when
           | Common Lisp was designed they made sure it could be
           | implemented efficiently on emerging 32-bit machines.
        
             | bombcar wrote:
              | Part of the reason is that any time specialized hardware
              | is found that _works_ , the generalized hardware steals
              | the feature that makes it faster - basically the story
              | of all the x86 extensions like SSE, etc.
        
           | o11c wrote:
           | > But the reality was that JIT allows code to get faster over
           | time, as the JIT improves.
           | 
            | Ehh .. PGO is only somewhat better for JIT than AOT. More
            | often, for purely-numerical code, the win is because the
            | AOT build doesn't do per-machine `-march=native`. It's the
            | memory model that kills JVM performance for any nontrivial
            | app, though.
        
             | tenaf0 wrote:
              | Well, code size is another interesting aspect here. A
              | JIT compiler can effectively create any number of
              | versions of a hot method, based on even very aggressive
              | assumptions (an easy one would be that a given object is
              | non-null, or that an interface only has a single
              | implementation loaded). The checks for these are cheap
              | (e.g. a null check can be encoded as a trap on an
              | invalid page address), and invalidation's cost is
              | amortized.
              | 
              | Contrast this with the problem of specialization in AOT
              | languages, which can easily result in bloated binaries
              | (PGO does help here quite a lot, that much is true). For
              | example, generics might emit a completely new function
              | for every type they get instantiated with - if the
              | function is not that hot, it actually makes sense to
              | handle more cases with the same code.
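              | 
              | A sketch of the single-implementation case (the types
              | here are made up, not JDK API; the exact guard HotSpot
              | emits is an implementation detail):
              | 
              |     interface Shape { double area(); }
              |     
              |     final class Circle implements Shape {
              |         final double r;
              |         Circle(double r) { this.r = r; }
              |         public double area() { return Math.PI * r * r; }
              |     }
              |     
              |     class Hot {
              |         // While Circle is the only Shape ever loaded,
              |         // the JIT can compile s.area() as an inlined
              |         // Circle.area() behind a cheap class check, and
              |         // throw that version away (deoptimize) if a
              |         // second implementation is ever loaded.
              |         static double total(Shape[] shapes) {
              |             double sum = 0;
              |             for (Shape s : shapes) sum += s.area();
              |             return sum;
              |         }
              |     }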
        
       | BenoitP wrote:
        | The gains seem not to have been high enough to sustain the
        | project. Nowadays CPUs plan, fuse and reorder so much at the
        | microcode level that lower-level languages can sort of be
        | considered virtual as well.
       | 
        | But Java and similar languages transfer more freedom of
        | operation from the programmer to the runtime: no memory-
        | address shenanigans, richer types, and to some extent
        | immutability and sealed chunks of code. All of these could be
        | picked up and turned into more performance by the hardware,
        | with some help from the compiler. Sort of like SQL being a
        | 4th-gen language, letting the runtime collect statistics and
        | choose the best course of execution (if you squint at it in
        | the dark with colored glasses).
       | 
        | More recent work along these lines can be found in the RISC-V
        | J extension [1], still to be formalized and picked up by the
        | industry. Three features could help dynamic languages:
       | 
        | * Pointer masking: you can fit a lot in the unused higher bits
        | of an address. Some GCs use them to annotate memory (referred-
        | to/visited/unvisited/etc.), but you have to mask them. A
        | hardware-assisted mask could help a lot (see the sketch after
        | this list).
       | 
       | * Memory tagging: Helps with security, helps with bounds-checking
       | 
       | * More control over instruction caches
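        | 
        | A minimal sketch of the masking a GC must otherwise do in
        | software, with a plain long standing in for an address; the
        | layout (8 tag bits at the top of a 64-bit pointer) is an
        | assumption for illustration:
        | 
        |     class TaggedPointer {
        |         // Illustrative layout: tag in bits 56-63,
        |         // address in bits 0-55.
        |         static final long ADDR_MASK = (1L << 56) - 1;
        |     
        |         static long tag(long addr, int tag) {
        |             return (addr & ADDR_MASK) | ((long) tag << 56);
        |         }
        |     
        |         static int tagOf(long tagged) {
        |             return (int) (tagged >>> 56);
        |         }
        |     
        |         // The mask a GC pays for on every dereference today;
        |         // hardware pointer masking would do this implicitly.
        |         static long address(long tagged) {
        |             return tagged & ADDR_MASK;
        |         }
        |     }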
       | 
       | It is sort of stale at the moment, and if you track down the
       | people working on it they've been reassigned to the AI-
       | accelerator craze. But it's going to come back, as Moore's law
       | continues to end and Java's TCO will again be at the top of the
       | bean-counter's stack.
       | 
       | [1] https://github.com/riscv/riscv-j-extension
        
         | lionkor wrote:
         | > as Moore's law continues to end
         | 
          | more like Wirth's law still proving itself
        
         | pjmlp wrote:
        | As free-beer AOT compilers for Java are commonly available,
        | and as shown on Android since version 5, I doubt special
        | opcodes will matter again.
        | 
        | Ironically, when one dives into computer archeology, old
        | assembly languages are occasionally referred to as bytecodes,
        | the reason being that in CISC designs with microcoded CPUs
        | they were already seen that way by hardware teams.
        
           | BenoitP wrote:
           | I'm still not decided on AOT vs JIT being the endgame.
           | 
           | In theory JIT should be higher performance, because it
           | benefits from statistics taken at actual runtime. Given a
           | smart enough compiler. But as a piece of code matures and
           | gets more stable, the envelope of executions is better known
           | and programmers can encode that at compile-time. That's the
           | tradeoff taken by Rust: ask for more proofs from the
           | programmers, and Rust is continuing to pick up speed.
           | 
            | That's also what the Leyden project / condensers [1] are
            | about, if I understand correctly. Pick up proofs and
            | guarantees as early as possible and transform the program.
            | For example, by constant-propagating a configuration file
            | read at build time.
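            | 
            | A toy illustration of that condensation idea (nothing here
            | is Leyden API; the flag and method names are made up):
            | 
            |     class FeatureFlags {
            |         // If a condenser can prove this value was fixed
            |         // at build time (e.g. folded from a config file),
            |         // it becomes an ordinary compile-time constant...
            |         static final boolean FAST_PATH = true;
            |     
            |         static int process(int x) {
            |             // ...and this branch can be eliminated
            |             // entirely, like any static final constant.
            |             return FAST_PATH ? x << 1 : slowPath(x);
            |         }
            |     
            |         static int slowPath(int x) { return x * 2; }
            |     }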
           | 
           | Something I've pondered over the years: a programmer's job is
           | not to produce code. It is to produce proofs and guarantees
           | (yet another digression/rant: generating code was never a
           | problem. Before LLMs we could copy-paste code from
           | StackOverflow just fine)
           | 
           | In the end it's only about marginal improvements though.
           | These could be superseded by changes of paradigm like RAM
           | getting some compute capabilities; or programs being split
           | into a myriad of specialized instructions. For example
           | filters, rules and parsing going inside the network card; SQL
           | projections and filters going into the SSD controller; or
           | matrix-multiplication going into integrated GPU/TPU/etc just
           | like now.
           | 
           | [1] https://openjdk.org/projects/leyden/notes/03-toward-
           | condense...
        
             | pjmlp wrote:
              | The best solution isn't AOT vs JIT, but rather JIT and
              | AOT together, having both available as a standard part
              | of the tooling.
             | 
              | Android has learnt to have both, and thanks to PGO data
              | being shared across devices via the Play Store, the
              | AOT/JIT outcome approaches the optimum for a specific
              | application.
             | 
              | Azul and IBM have similar approaches on their JVMs, with
              | a cluster-based JIT and JIT caches as an AOT alternative.
             | 
             | Also stuff like GPGPU is a mix of AOT and JIT, and is doing
             | quite alright.
             | 
              | I am not so confident about LLMs: when they get good
              | enough, programmers will be left out of the loop, and
              | will have to content themselves with roles similar to
              | doing no-code SaaS configs, or some form of architect.
             | 
              | A few programmers will remain as the LLMs' high priests.
        
               | BenoitP wrote:
                | > A few programmers will remain as the LLMs' high
                | priests.
               | 
               | That's interesting.
               | 
                | It's controversial to say this in 2024, but not all
                | opinions have the same value. Some are great, but some
                | are plain dumb. The current corporate right-think is
                | to praise LLMs as the end-all be-all. I've been asked
                | to advise a private-banking family office wanting to
                | get into LLMs. For advising their clients' financial
                | decisions. I politely declined. Can there be a worse
                | use case? LLMs are parrots with the brain size of the
                | internet, with thoughts of random origin mixed
                | together randomly. They produce wonderful form, but
                | abysmal analysis.
               | 
                | IMHO, as LLMs begin to be indistinguishable from real
                | users (and internet dogs), there's going to be a
                | resurgent need to trace origin to a human, and maybe
                | to rank their opinions as well. My money is on some
                | form of distributed social proof designating the high
                | priests.
        
               | pjmlp wrote:
                | I see the current state of LLMs as like when we read
                | about assembly programmers being suspicious that
                | something like high-level languages would ever take
                | off.
               | 
                | When we read about the history of Fortran, there are
                | several remarks on the amount of work put in to win
                | over those developers, as otherwise Fortran would have
                | been yet another failed attempt.
               | 
                | LLMs seem to be at a similar stage; maybe their
                | Fortran moment isn't here yet - parrots, as you say -
                | but it will come.
        
             | tenaf0 wrote:
              | I do think that, in the general case, a JIT compiler is
              | required: you can't make _every_ program fast without
              | the ability to synthesize new code based on information
              | available only at runtime. There are many programs for
              | which AOT is more than enough, but not all are such.
              | Note this doesn't preclude AOT/hybrid models, as pjmlp
              | correctly says.
              | 
              | One stereotypical (but not the best) example would be
              | regexes: you basically want to compile some AST into a
              | mini-program. This can also be done with a tiny
              | interpreter without a JIT, which will be quite
              | competitive in speed (I believe that's what Rust has,
              | and it's indeed one of the fastest - the advantage of
              | the problem/domain here is that you really can have tiny
              | interpreters that efficiently use the caches, with very
              | little overhead on today's CPUs). But I am quite sure
              | that a "JITted Rust" with all the other optimizations/
              | memory layouts _could_ potentially fare better - of
              | course it's not a trivial additional complexity.
        
         | vextea wrote:
          | Remember when, for a while, Azul tried to sell custom CPUs
          | to support features in their JVM (e.g. some garbage-
          | collector features that required hardware interrupts and
          | some other extra instructions)? They dropped it pretty
          | quickly in favor of just working on software.
         | 
         | https://www.cpushack.com/2016/05/21/azul-systems-vega-3-54-c...
        
           | sillywalk wrote:
            | IBM's z14 (and later, I assume) supported the Guarded
            | Storage Facility for "pauseless Java garbage collection."
        
         | toast0 wrote:
          | > Pointer masking: you can fit a lot in the unused higher
          | bits of an address. Some GCs use them to annotate memory
          | (referred-to/visited/unvisited/etc.), but you have to mask
          | them. A hardware-assisted mask could help a lot.
         | 
          | If you're building hardware masking, it should be viable for
          | low bits too. If you define all your objects to be n-byte
          | aligned, it frees up low bits for things as well, and that
          | might not be an imposition; things like to be aligned. (A
          | sketch below.)
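          | 
          | A minimal sketch, assuming 8-byte-aligned objects (so the
          | low 3 bits of any real address are always zero); names are
          | illustrative:
          | 
          |     class LowBitTag {
          |         // With 8-byte alignment the low 3 bits of a real
          |         // address are always clear, so they're free for
          |         // tags.
          |         static final long TAG_MASK = 0x7L;
          |     
          |         static long tag(long addr, int tag) {
          |             assert (addr & TAG_MASK) == 0 : "must be aligned";
          |             return addr | (tag & TAG_MASK);
          |         }
          |     
          |         static long address(long tagged) {
          |             return tagged & ~TAG_MASK;
          |         }
          |     
          |         static int tagOf(long tagged) {
          |             return (int) (tagged & TAG_MASK);
          |         }
          |     }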
        
           | ithkuil wrote:
            | The SPARC ISA had tagged arithmetic instructions so that
            | you could tag integers using the LSBs and ignore them.
        
         | pjc50 wrote:
          | One of the few elements left like this is the ARM JavaScript
          | instruction: https://news.ycombinator.com/item?id=24808207
        
         | funcDropShadow wrote:
          | The Java ecosystem initially started with optimizing Java
          | compilers. That setup could benefit from direct hardware
          | support for Java bytecode. Later, it was discovered that it
          | is more beneficial to remove the optimization from javac in
          | order to provide more context to the JIT compiler, which
          | enables better optimizations from JIT compilers. By directly
          | running Java bytecode, you would lose so many optimizations
          | done by HotSpot that it is hard to get on par just by
          | interpreting bytecode in hardware. The story may be
          | different for restricted JVMs that don't have a
          | sophisticated JIT.
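          | 
          | A quick illustration of how literally javac translates
          | source (bytecode paraphrased from javap -c output; exact
          | offsets omitted):
          | 
          |     // Source:
          |     //   static int twice(int x) { return x + x; }
          |     //
          |     // javac emits essentially unoptimized stack code:
          |     //   iload_0
          |     //   iload_0
          |     //   iadd
          |     //   ireturn
          |     //
          |     // Strength reduction, inlining, etc. are all left to
          |     // the JIT, which sees the real runtime context.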
        
           | miohtama wrote:
            | The current (largest) end-user Java ecosystem is in
            | practice Android and its ahead-of-time-compiling ART.
            | 
            | Java itself got very good, though Oracle was blocked from
            | leeching money, or from having a return on their
            | investment, depending on the viewpoint.
        
             | kaba0 wrote:
              | I don't really get your last point - Java's improvements
              | are _due_ to Oracle, not despite it. They have a
              | terrible name, but they have been excellent stewards of
              | the platform.
        
               | miohtama wrote:
                | Android's ART would be unlikely to exist if Oracle had
                | been able to enforce licensing requirements as they
                | wished.
                | 
                | ART runs on devices for 1B+ users and is more relevant
                | to the world's population than Oracle. Although we can
                | speculate that Android would likely have switched to
                | something else if Oracle had won in court.
        
       | bobsmith432 wrote:
        | Fun fact: both the Wii's secondary ARM chip used for security
        | tasks and the iPhone 2G's processor had Jazelle but never used
        | it.
        
         | khangaroo wrote:
         | The 3DS has it too: https://github.com/SonoSooS/libjz
        
         | repiret wrote:
          | It was on every ARM926, ARM1136 and ARM1176. Lots of devices
          | of a certain era had it but didn't use it.
        
       | willvarfar wrote:
       | In a similar spirit, Apple seems to have made sure some critical
       | OSX idioms were fast on the M1, perhaps even influencing their
       | instruction set.
       | 
        | Retaining and releasing an NSObject took ~6.5 nanoseconds on
        | the M1 when it came out, compared with ~30 nanoseconds on the
        | equivalent-generation Intel.
        | 
        | In fact, the M1 _emulated_ an Intel retaining and releasing an
        | NSObject faster than an Intel could!
       | 
       | One source: https://daringfireball.net/2020/11/the_m1_macs
        
         | ithkuil wrote:
          | The M1 emulation with Rosetta is actually dynamic
          | recompilation, so if you're measuring only that specific
          | small section, it's not surprising that Rosetta could have
          | emitted optimal code for that instruction sequence.
        
       | nadavwr wrote:
        | Running bytecode instructions in hardware essentially means a
        | hardware-based interpreter. It likely would have been the
        | best-performing interpreter for the hardware, but JIT
        | compilation to native code would still run circles around it.
        | 
        | During the years when this instruction set was relevant
        | (though apparently unutilized), Oracle still had very limited
        | ARM support for Java SE, so having a fast interpreter could
        | have been desirable -- but it makes no sense on beefier ARM
        | systems, which can handle the decent JIT or AOT compilation
        | available nowadays.
        
       | nickpsecurity wrote:
        | If you want a Java processor, there's a cheap one for FPGAs
        | here:
       | 
       | https://www.jopdesign.com/
       | 
       | It's more for embedded use. Might give someone ideas, though.
        
       | depr wrote:
        | The trend of posting a page here based on some detail from a
        | comment posted in another thread ("What's that touchscreen in
        | my room?" in this case) a few days ago has become quite
        | frequent and a bit annoying.
        | 
        | To everyone who wants to write "but I didn't read that thread
        | and I find this quite interesting": you are free to find it
        | interesting, but I did read about it 2 days ago, and to me it
        | looks like karma farming.
        
         | donio wrote:
         | Regarding the quality of posts in general the problem is not
         | what gets posted, there is a ton of junk in the "new" queue and
         | most of it never makes it to the front page. It's what gets
         | upvoted.
        
         | bitwize wrote:
         | Why you gotta yuck other people's yum, man?
         | 
          | To me it seems like Hacker News, as a whole, goes off on the
          | same kinds of thought-tangents as I do, and that makes the
          | site more interesting. And I _was_ one of the commenters
          | about Jazelle on the thread you mentioned.
        
           | depr wrote:
           | Because this is a community and I care about what goes on in
           | it. I think this is precisely not a thought-tangent, it is
           | taking the tangent that occurred elsewhere and posting an
           | article about it to score internet points. If we keep doing
           | that, this becomes even more of an echo chamber and a very
           | boring place.
        
         | czscout wrote:
         | Not everyone spends so much time on this site that they can
         | easily spot a post as an extension of a related discussion
         | elsewhere on the site. Someone posted this page, it got upvoted
         | by others who found it interesting, and now it's on the front
         | page. What's wrong with that?
        
           | depr wrote:
           | Popular content is not necessarily good content (very boring
           | to say this, but just look at reddit). And posting articles
           | to get upvotes, which I'm not saying this post is necessarily
           | doing but at least _some_ are doing, leads to lower quality.
           | HN barely has any methods for maintaining overall quality of
           | the website and it will automatically degrade as it gets
           | larger.
           | 
            | To simply allow these posts and have them hit the front
            | page when they get upvotes is a valid position. But I
            | think it contributes to a website that is less
            | interesting.
           | 
           | I don't think these posts should be removed, but they should
           | at least be frowned upon, and/or linked to the original
           | comment thread.
        
         | icegreentea2 wrote:
          | I think attribution would be nice - both from an honesty
          | standpoint and because it's generally useful.
        
       | LeFantome wrote:
       | I was going to say that history repeats itself. This is
       | incorrect. This is actually just history.
       | 
       | This is so old that its replacement, ThumbEE, had already been
       | deprecated as well.
        
       | Twirrim wrote:
        | A little related: back in the day, Sun Microsystems came up
        | with picoJava, https://en.wikipedia.org/wiki/PicoJava, a full
        | microprocessor specification dedicated to native execution of
        | Java bytecode. It never really went anywhere, other than a few
        | engineering experiments, as far as I remember.
       | 
        | For a while Linus Torvalds, of Linux kernel fame, worked for
       | a company called Transmeta,
       | https://en.wikipedia.org/wiki/Transmeta, who were doing some
       | really interesting things. They were aiming to make a highly
       | efficient processor, that could handle x86 through a special
        | software translation layer. One of the instruction sets they
        | could support was picoJava. IIRC, the processor was never
        | designed to
       | run operating systems etc. natively. The intent was always to
       | have it work through the translation layer, something that could
       | easily be patched and updated to add support for any x86
       | extensions that Intel or AMD might introduce.
        
       | specialist wrote:
        | I was never clear on how Java bytecodes would be implemented
        | on a register-based CPU. Efficiently, that is.
        | 
        | The JVM is stack-based, right? So it'd be an interpreter (in
        | microcode)? Unless there's some kind of "virtual" stack, as
        | spec'd for picoJava.
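        | 
        | (A minimal sketch of what a software interpreter does with a
        | stack machine on a register CPU - made-up opcodes, not real
        | JVM ones; a JIT's main job is turning these stack slots into
        | registers:)
        | 
        |     class MiniStackVM {
        |         static final int PUSH = 0, ADD = 1, MUL = 2, HALT = 3;
        |     
        |         // The operand stack is explicit; sp and pc live in
        |         // CPU registers, and each bytecode is one switch arm.
        |         static int run(int[] code) {
        |             int[] stack = new int[64];
        |             int sp = 0, pc = 0;
        |             while (true) {
        |                 switch (code[pc++]) {
        |                     case PUSH: stack[sp++] = code[pc++]; break;
        |                     case ADD:
        |                         stack[sp - 2] += stack[sp - 1]; sp--;
        |                         break;
        |                     case MUL:
        |                         stack[sp - 2] *= stack[sp - 1]; sp--;
        |                         break;
        |                     case HALT: return stack[--sp];
        |                 }
        |             }
        |         }
        |     
        |         public static void main(String[] args) {
        |             // (2 + 3) * 4 = 20
        |             System.out.println(run(new int[] {
        |                 PUSH, 2, PUSH, 3, ADD, PUSH, 4, MUL, HALT }));
        |         }
        |     }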
       | 
       | I'm less clear on how Jazelle would implement only a subset of
       | the bytecodes.
       | 
       | Am noob. And a quick scholar search says the relevant papers are
       | paywalled. Oh well; now it's just a curiosity.
       | 
       | Stack-based CPUs are cool, right? For embedded. Super efficient
       | and cheap, enough power for IoT or secure enclaves or whatever.
       | 
       | But it seems that window of opportunity closed. Indeed, if it was
       | ever open.
        
       | penguin_booze wrote:
       | I hope people get it: an arm, and then thumb.
        
       ___________________________________________________________________
       (page generated 2024-01-22 23:01 UTC)