[HN Gopher] JEP 483: Ahead-of-Time Class Loading and Linking
       ___________________________________________________________________
        
       JEP 483: Ahead-of-Time Class Loading and Linking
        
       Author : ptx
       Score  : 128 points
       Date   : 2024-12-21 19:53 UTC (1 days ago)
        
 (HTM) web link (openjdk.org)
 (TXT) w3m dump (openjdk.org)
        
       | petesoper wrote:
       | Sweet!
        
       | foolfoolz wrote:
        | i'm curious if any of this was inspired by aws lambda snapstart
        
         | layer8 wrote:
         | Maybe read the _History_ section.
        
       | layer8 wrote:
        | > [example hello world] program runs in 0.031 seconds on JDK 23.
        | After doing the small amount of additional work required to
        | create an AOT cache it runs in 0.018 seconds on JDK NN -- an
        | improvement of 42%. The AOT cache occupies 11.4 megabytes.
       | 
       | That's not immediately convincing that it will be worth it. It is
       | a start I guess.
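
        For context, the workflow the JEP describes has three steps: a
        training run that records an AOT configuration, a cache-creation
        step, and a production run that uses the cache. A rough sketch
        using the JEP's flags (the jar and main-class names here are
        illustrative):

        ```shell
        # 1. Training run: record which classes the app loads and links
        java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf \
             -cp app.jar com.example.App

        # 2. Create the AOT cache from the recorded configuration
        java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
             -XX:AOTCache=app.aot -cp app.jar

        # 3. Production run: start the app with the cache
        java -XX:AOTCache=app.aot -cp app.jar com.example.App
        ```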
        
         | dgfitz wrote:
         | How so?
         | 
          | RAM is almost free if you're not on embedded, and embedded
          | could run Java, sure, but that isn't common.
        
           | pdpi wrote:
            | That's not an in-memory cache, though. AIUI it's storing
            | those artefacts to disk.
        
             | lmz wrote:
             | Container sizes may be affected though.
        
               | bobnamob wrote:
               | So you're now weighing the increased container pull time
               | (due to size) vs the class load time you're saving
               | through the cache.
               | 
               | It's nice to at least have the option of making that
               | tradeoff
               | 
                | (And I suspect that for plenty of applications, the
                | class cache will save more time than an (also probably
                | cached) image pull costs)
        
               | pdpi wrote:
               | If you're deploying Java applications, container size
               | isn't exactly your first priority anyhow, and this is
               | O(n) additional space.
               | 
               | If image size is a concern, I imagine a native binary
               | using GraalVM would've been a better way out anyhow, and
               | you'll bypass this cache entirely.
        
           | imtringued wrote:
           | RAM might be inexpensive, but this hasn't stopped cloud
           | providers from being stingy with RAM and price gouging.
           | 
           | At current RAM prices you'd expect the smallest instances to
           | have 2GB, yet they still charge $4/month for 512MB, which
           | isn't enough to run the average JVM web server.
        
             | zokier wrote:
              | That is a pretty ridiculous complaint. Your problem is
              | that they allow configuring instances smaller than your
              | arbitrary baseline? Especially since AWS lets you pick
              | 2/4/8 GB per vCPU for general-purpose instances, and the
              | smallest of these (c7g.medium) is 2 GB/1 vCPU. The .5 GB
              | t4g.nano actually has a more generous ratio because it
              | also has only .1 vCPU, putting it at 5 GB/vCPU.
             | 
             | I'd assume they are very aware of demand levels for
             | different types and would be adjusting the configurations
             | if needed.
        
       | o11c wrote:
       | The concern that jumps out at me is: what about flags that affect
       | code generation? Some are tied to the subarch (e.g. "does this
       | amd64 have avx2?" - relevant if the cache is backed up and
       | restored to a slightly different machine, or sometimes even if it
        | reboots with a different kernel config), others to java's own
        | flags (do compressed pointers affect codegen? does disabling
        | intrinsics?).
        
         | lxgr wrote:
         | I don't see any mention that code is actually going to be
         | stored in a JITted form, so possibly it's just architecture-
         | independent loading and linking data being cached?
        
           | MBCook wrote:
            | My impression from reading this was that it's about knowing
            | which classes reference which other classes, when, and
            | which jars everything is in.
            | 
            | So I think you're right.
            | 
            | So a bit more linker-style optimization than
            | compiler-related caching.
        
             | brabel wrote:
             | The JEP explains what this does:
             | 
             | "The AOT cache builds upon CDS by not only reading and
             | parsing class files ahead-of-time but also loading and
             | linking them."
             | 
             | While CDS (which has been available for years now) only
             | caches a parsed form of the class files that got loaded by
             | the application, the AOT cache will also "load and link"
             | the classes.
             | 
              | The ClassLoader.loadClass method docs explain what loading
             | means: https://docs.oracle.com/en/java/javase/21/docs/api/j
             | ava.base...
             | 
             | 1. find the class (usually by looking at the file-index of
             | the jar, which is just a zip archive, but ClassLoaders can
             | implement this in many ways).
             | 
             | 2. link the class, which is done by the resolveClass
             | method: https://docs.oracle.com/en/java/javase/21/docs/api/
             | java.base... and explained in the Java Language
             | Specification: https://docs.oracle.com/javase/specs/jls/se2
             | 1/html/jls-12.ht...
             | 
             | "Three different activities are involved in linking:
             | verification, preparation, and resolution of symbolic
             | references."
             | 
              | Hence, I assume the AOT cache will somehow keep even the
              | resolved symbolic references between classes, which is
              | quite interesting.
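
            A minimal sketch of the load-vs-initialize distinction those
            docs describe (the class names here are made up for
            illustration): loading and linking a class does not run its
            static initializer; initialization happens on first active
            use.

            ```java
            import java.util.ArrayList;
            import java.util.List;

            public class LoadVsInit {
                // Side channel: the nested class logs here when its
                // static initializer runs.
                static final List<String> log = new ArrayList<>();

                static class Target {
                    static { log.add("Target initialized"); }
                }

                public static void main(String[] args) throws Exception {
                    // initialize = false: load (and link) only, no
                    // static initialization yet.
                    Class<?> c = Class.forName("LoadVsInit$Target", false,
                            LoadVsInit.class.getClassLoader());
                    System.out.println("after load: " + log);
                    // First active use triggers initialization.
                    new Target();
                    System.out.println("after use:  " + log);
                }
            }
            ```

            The first line prints an empty log even though the class is
            already loaded; only the constructor call makes the static
            initializer run.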
        
           | ignoramous wrote:
            | From a related JEP (on AOT): https://openjdk.org/jeps/8335368
            | 
            | > As another possible mismatch, suppose an AOT code asset is
            | > compiled to use a specific level of ISA, such as Intel's
            | > AVX-512, but the production run takes place on a machine
            | > that does not support that ISA level. In that case the AOT
            | > code asset must not be adopted. Just as with the previous
            | > case of a devirtualized method, the presence of AVX-512 is
            | > a dependency attached to the AOT asset which prevents it
            | > from being adopted into the running VM.
            | 
            | > Compare this with the parallel case with static compilers:
            | > A miscompiled method would probably lead to a crash. But
            | > with Java, there is absolutely no change to program
            | > execution as a result of the mismatch in ISA level in the
            | > CDS archive. Future improvements are possible, where the
            | > training run may generate more than one AOT code asset
            | > for a method that is vectorized, so as to cover various
            | > possibilities of ISA level support in production.
            | 
            | Also: https://openjdk.org/projects/leyden/
        
       | fulafel wrote:
        | What does this mean for Clojure? At least loading the Clojure
        | runtime should benefit, but what about app code loading?
        
         | funcDropShadow wrote:
         | It should benefit, if namespaces are AOT-compiled by Clojure.
        
         | diggan wrote:
          | I feel like for the Clojure applications where you need
          | really fast startup, like tiny CLI utilities that don't do a
          | lot of work, the improvements would be too marginal to matter
          | much. The example they use in the JEP seems to have gone from
          | a ~4 second startup to ~2 seconds, which for a tiny CLI would
          | still seem pretty slow. You're better off trying to use
          | Babashka, ClojureScript or any of the other solutions that
          | give a fast startup.
         | 
         | And for the bigger applications (like web services and alike),
         | you don't really care that it takes 5 seconds or 10 seconds to
         | start it, you only restart the server during deployment
         | anyways, so why would startup time matter so much?
        
           | dtech wrote:
           | The 4 second application is a web server. They also give a
           | basic example starting in 0.031s, fine for a CLI.
           | 
           | One of the use cases for startup time is AWS lambda and
           | similar.
        
             | bobnamob wrote:
             | Prebuilding a cache through a training run will be
             | difficult between lambda invocations though and
             | snapstart[1] already "solves" a lot of the issues a class
             | cache might address.
             | 
             | [1]
             | https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html
             | 
             | Of course, I wouldn't be surprised if the boffins at lambda
             | add some integration between snapstart and class caching
             | once their leadership can get it funded
        
             | diggan wrote:
             | > The 4 second application is a web server. They also give
             | a basic example starting in 0.031s, fine for a CLI.
             | 
             | Sure, my comment was more about the relative improvement.
             | In the case of the 0.031s example (which is the number
             | _without_ the improvement), it gets down to 0.018s with
             | this new AOT class loading. What value do you get from
             | something starting in 0.018s instead of 0.031s? The
             | difference is so marginal for that particular use case.
             | 
             | > One of the use cases for startup time is AWS lambda and
             | similar.
             | 
                | I suppose that's one use case where it does make sense
                | to really focus on startup times. But again, I'd rather
                | use something where fast startup already exists
                | (Babashka, ClojureScript) instead of having to add yet
                | another build step into the process.
        
               | kuschku wrote:
                | If you're building e.g. a PS1 prompt replacement,
                | you'll want to start, gather data, output the prompt,
                | and exit in less than 0.016s. Any slower and the user
                | will see a visible delay.
                | 
                | On higher-FPS monitors, the budget shrinks accordingly:
                | at 60fps you have 16ms, at 480fps only 2ms.
               | 
               | The same applies for any app that should feel like it
               | starts instantly.
        
               | Tostino wrote:
               | There are plenty of CLI applications that need to be low
               | overhead. E.g. postgres can call a wal archive command
               | for backup purposes, and I specifically remember work
               | being done to reduce the startup overhead for backup
               | tools like pgbackrest / wal-e.
        
           | dig1 wrote:
           | Big apps where startup time matters are desktop/mobile GUI
           | apps. These aren't heavily emphasized in the Clojure
           | community (excluding ClojureScript), but they are feasible to
           | build - and I do build some of them. If startup time is
           | reduced by 40%, the end user will definitely notice it.
           | 
           | IMHO, while optimizations in the JVM are always welcome, they
           | primarily address surface-level issues and don't tackle
           | Clojure's core limitation: the lack of a proper tree shaker
           | that understands Clojure semantics. Graalvm offers help here
           | by doing whole-program optimization at the bytecode level,
           | but a Clojure-specific tree shaker could take things further:
           | it could eliminate unused vars before/during Clojure AOT,
           | thereby reducing both program size and startup time. These
           | improvements would happen before the JVM optimizations kick
           | in, making everything that follows a nice extra bonus.
        
             | fulafel wrote:
             | Interesting thought, I wonder if there's a way to reason
             | about the magnitude of effect this would have.
        
             | dannyfreeman wrote:
              | Clojure and the JVM are so dynamic it's hard to infer
              | what namespaces/vars/classes might be needed at runtime.
              | That makes static analysis like tree-shaking difficult.
              | Who's to say some strings aren't concatenated together at
              | runtime and used to load a namespace that might have been
              | tree-shaken out? The only way to really know is to run
              | the program.
        
           | misja111 wrote:
            | Load-balanced web services on e.g. K8s may need to start
            | and stop quite a lot if load varies. Any speed-up will be
            | welcome.
           | 
           | Also, I guess Java-based desktop applications like IntelliJ
           | and DBeaver will benefit.
        
         | pixelmonkey wrote:
         | I don't know how this JEP affects Clojure, but if you want to
         | use Clojure for fast-loading CLI apps, a good thing to look at
         | is babashka (bb). I wrote about it here:
         | 
         | "Learning about babashka (bb), a minimalist Clojure for
         | building CLI tools"
         | 
         | https://amontalenti.com/2020/07/11/babashka
        
       | s6af7ygt wrote:
       | I'm a dunce
        
         | dtech wrote:
         | Read the article, this doesn't reduce JIT capabilities at all.
        
       ___________________________________________________________________
       (page generated 2024-12-22 23:01 UTC)