[HN Gopher] JEP 483: Ahead-of-Time Class Loading and Linking
___________________________________________________________________
JEP 483: Ahead-of-Time Class Loading and Linking
Author : ptx
Score : 128 points
Date   : 2024-12-21 19:53 UTC (1 day ago)
(HTM) web link (openjdk.org)
(TXT) w3m dump (openjdk.org)
| petesoper wrote:
| Sweet!
| foolfoolz wrote:
i'm curious if any of this was inspired by aws lambda snapstart
| layer8 wrote:
| Maybe read the _History_ section.
| layer8 wrote:
| > [example hello world] program runs in 0.031 seconds on JDK 23.
| After doing the small amount of additional work required to
| create an AOT cache it runs in 0.018 seconds on JDK NN -- an
| improvement of 42%. The AOT cache occupies 11.4 megabytes.
|
| That's not immediately convincing that it will be worth it. It is
| a start I guess.
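For context, the JEP describes a three-step workflow for producing that cache: a training run that records which classes are loaded and linked, a cache-creation step, and the production run that consumes the cache. A sketch of the commands (flags per JEP 483; the jar and main class names are placeholders):

```shell
# 1. Training run: record the classes the application loads and links.
java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf \
     -cp app.jar com.example.Main

# 2. Create the AOT cache from the recorded configuration.
java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf \
     -XX:AOTCache=app.aot -cp app.jar

# 3. Production run: start with the cache for faster startup.
java -XX:AOTCache=app.aot -cp app.jar com.example.Main
```

The training run should exercise representative code paths, since only classes observed during training end up in the cache.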
| dgfitz wrote:
| How so?
|
| RAM is almost free if you're not on embedded, and embedded
| could run Java sure, but it isn't common.
| pdpi wrote:
| That's not an in-memory cache either. AIUI it's storing those
| artefacts to disk.
| lmz wrote:
| Container sizes may be affected though.
| bobnamob wrote:
| So you're now weighing the increased container pull time
| (due to size) vs the class load time you're saving
| through the cache.
|
| It's nice to at least have the option of making that
| tradeoff
|
| (And I suspect that for plenty of applications, the
| class cache will save more time than the (also
| probably cached) image pull costs)
| pdpi wrote:
| If you're deploying Java applications, container size
| isn't exactly your first priority anyhow, and this is
| O(n) additional space.
|
| If image size is a concern, I imagine a native binary
| using GraalVM would've been a better way out anyhow, and
| you'll bypass this cache entirely.
| imtringued wrote:
| RAM might be inexpensive, but this hasn't stopped cloud
| providers from being stingy with RAM and price gouging.
|
| At current RAM prices you'd expect the smallest instances to
| have 2GB, yet they still charge $4/month for 512MB, which
| isn't enough to run the average JVM web server.
| zokier wrote:
| That is a pretty ridiculous complaint. Your problem is
| they allow configuring instances smaller than your
| arbitrary baseline? Especially as AWS allows you to pick
| 2/4/8 GB per vCPU for general purpose instances. And the
| smallest of these (c7g.medium) is 2GB/1vCPU. The .5 GB
| t4g.nano actually has a more generous ratio because it
| has only .1 vCPU, putting it at 5GB/vCPU.
|
| I'd assume they are very aware of demand levels for
| different types and would be adjusting the configurations
| if needed.
| o11c wrote:
| The concern that jumps out at me is: what about flags that affect
| code generation? Some are tied to the subarch (e.g. "does this
| amd64 have avx2?" - relevant if the cache is backed up and
| restored to a slightly different machine, or sometimes even if it
| reboots with a different kernel config), others to java's own
| flags (does compressed pointers affect codegen? disabling
| intrinsics?).
| lxgr wrote:
| I don't see any mention that code is actually going to be
| stored in a JITted form, so possibly it's just architecture-
| independent loading and linking data being cached?
| MBCook wrote:
| My impression from reading this was it was about knowing
| which classes reference which other classes when and which
| jars everything is in.
|
| So I think you're right.
|
| So a bit more linker style optimization than compiler related
| caching stuff.
| brabel wrote:
| The JEP explains what this does:
|
| "The AOT cache builds upon CDS by not only reading and
| parsing class files ahead-of-time but also loading and
| linking them."
|
| While CDS (which has been available for years now) only
| caches a parsed form of the class files that got loaded by
| the application, the AOT cache will also "load and link"
| the classes.
|
| The ClassLoader.loadClass method docs explain what loading
| means: https://docs.oracle.com/en/java/javase/21/docs/api/j
| ava.base...
|
| 1. find the class (usually by looking at the file-index of
| the jar, which is just a zip archive, but ClassLoaders can
| implement this in many ways).
|
| 2. link the class, which is done by the resolveClass
| method: https://docs.oracle.com/en/java/javase/21/docs/api/
| java.base... and explained in the Java Language
| Specification: https://docs.oracle.com/javase/specs/jls/se2
| 1/html/jls-12.ht...
|
| "Three different activities are involved in linking:
| verification, preparation, and resolution of symbolic
| references."
|
| Hence, I assume the AOT cache will somehow keep even
| symbolic references between classes, which is quite
| interesting.
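The load-vs-initialize distinction described above can be seen directly from the reflection API. A minimal sketch (not code from the JEP; class names are made up): `Class.forName` with `initialize=false` loads and links a class without running its static initializer, and the initializer only fires on first real use.

```java
// Sketch: loading a class does not run its static initializer;
// first use (e.g. a static method call) triggers initialization.
public class LoadVsInit {
    static class Demo {
        static { System.out.println("Demo initialized"); }
        static int answer() { return 42; }
    }

    public static void main(String[] args) throws Exception {
        // Load (and link) Demo without initializing it: the static
        // block does NOT run here because initialize=false.
        Class<?> c = Class.forName(Demo.class.getName(), false,
                                   LoadVsInit.class.getClassLoader());
        System.out.println("Loaded: " + c.getSimpleName());

        // First static-method call triggers initialization now.
        System.out.println("answer = " + Demo.answer());
    }
}
```

JEP 483 moves the load-and-link portion of this work ahead of time into the cache; initialization semantics are unchanged, which is why observable program behavior stays the same.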
| ignoramous wrote:
| From a related JEP (on AOT): https://openjdk.org/jeps/8335368
| As another possible mismatch, suppose an AOT code asset is
| compiled to use a specific level of ISA, such as Intel's
| AVX-512, but the production run takes place on a machine that
| does not support that ISA level. In that case the AOT code
| asset must not be adopted. Just as with the previous case of
| a devirtualized method, the presence of AVX-512 is a
| dependency attached to the AOT asset which prevents it from
| being adopted into the running VM. Compare this
| with the parallel case with static compilers: A miscompiled
| method would probably lead to a crash. But with Java, there
| is absolutely no change to program execution as a result of
| the mismatch in ISA level in the CDS archive. Future
| improvements are possible, where the training run may
| generate more than one AOT code asset, for a method that is
| vectorized, so as to cover various possibilities of ISA level
| support in production.
|
| Also: https://openjdk.org/projects/leyden/
| fulafel wrote:
| What does this mean for Clojure? At least loading the Clojure
| runtime should benefit, but what about app code loading?
| funcDropShadow wrote:
| It should benefit, if namespaces are AOT-compiled by Clojure.
| diggan wrote:
| I feel like for the Clojure applications where you need it to
| start really fast, like tiny CLI utilities that don't do a lot
| of work, the improvements would be so marginal to not matter
| much. The example they use in the JEP seems to have gone from a
| ~4 second startup to ~2 seconds, which for a tiny CLI, still
| would make it seem pretty slow. You're better off trying to use
| Babashka, ClojureScript or any of the other solutions that give
| a fast startup.
|
| And for the bigger applications (like web services and alike),
| you don't really care that it takes 5 seconds or 10 seconds to
| start it, you only restart the server during deployment
| anyways, so why would startup time matter so much?
| dtech wrote:
| The 4 second application is a web server. They also give a
| basic example starting in 0.031s, fine for a CLI.
|
| One of the use cases for startup time is AWS lambda and
| similar.
| bobnamob wrote:
| Prebuilding a cache through a training run will be
| difficult between lambda invocations though and
| snapstart[1] already "solves" a lot of the issues a class
| cache might address.
|
| [1]
| https://docs.aws.amazon.com/lambda/latest/dg/snapstart.html
|
| Of course, I wouldn't be surprised if the boffins at lambda
| add some integration between snapstart and class caching
| once their leadership can get it funded
| diggan wrote:
| > The 4 second application is a web server. They also give
| a basic example starting in 0.031s, fine for a CLI.
|
| Sure, my comment was more about the relative improvement.
| In the case of the 0.031s example (which is the number
| _without_ the improvement), it gets down to 0.018s with
| this new AOT class loading. What value do you get from
| something starting in 0.018s instead of 0.031s? The
| difference is so marginal for that particular use case.
|
| > One of the use cases for startup time is AWS lambda and
| similar.
|
| I suppose that's one use case where it does make sense to
| really focus on startup times. But again, I'd rather use
| something where fast startup already exists (Babashka,
| ClojureScript) instead of adding yet another build step
| to the process.
| kuschku wrote:
| If you're building e.g. a PS1 prompt replacement, you'll
| want to start, gather data, output the PS1 prompt and
| exit in under 0.016s. Any slower and the user
| will see a visible delay.
|
| If you're on higher FPS monitors, the budget shrinks
| accordingly. At 60fps you'll have 16ms, at 480fps you'll
| have 2ms.
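The frame-budget arithmetic in that comment is just the reciprocal of the refresh rate; a quick sketch:

```java
// Frame budget: milliseconds available per frame at a given refresh rate.
public class FrameBudget {
    static double budgetMillis(double fps) {
        return 1000.0 / fps;
    }

    public static void main(String[] args) {
        System.out.printf("60 fps  -> %.1f ms%n", budgetMillis(60));
        System.out.printf("480 fps -> %.1f ms%n", budgetMillis(480));
    }
}
```

At 60 fps the budget is about 16.7 ms, at 480 fps about 2.1 ms, matching the figures above.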
|
| The same applies for any app that should feel like it
| starts instantly.
| Tostino wrote:
| There are plenty of CLI applications that need to be low
| overhead. E.g. postgres can call a wal archive command
| for backup purposes, and I specifically remember work
| being done to reduce the startup overhead for backup
| tools like pgbackrest / wal-e.
| dig1 wrote:
| Big apps where startup time matters are desktop/mobile GUI
| apps. These aren't heavily emphasized in the Clojure
| community (excluding ClojureScript), but they are feasible to
| build - and I do build some of them. If startup time is
| reduced by 40%, the end user will definitely notice it.
|
| IMHO, while optimizations in the JVM are always welcome, they
| primarily address surface-level issues and don't tackle
| Clojure's core limitation: the lack of a proper tree shaker
| that understands Clojure semantics. Graalvm offers help here
| by doing whole-program optimization at the bytecode level,
| but a Clojure-specific tree shaker could take things further:
| it could eliminate unused vars before/during Clojure AOT,
| thereby reducing both program size and startup time. These
| improvements would happen before the JVM optimizations kick
| in, making everything that follows a nice extra bonus.
| fulafel wrote:
| Interesting thought, I wonder if there's a way to reason
| about the magnitude of effect this would have.
| dannyfreeman wrote:
| Clojure and the JVM are so dynamic it's hard to infer what
| namespaces/vars/classes might be needed during runtime.
| That makes static analysis like tree-shaking difficult.
| Who's to say some strings are concatenated together at
| runtime and used to load a namespace that might have been
| tree-shaken out? The only way to really know is to run the
| program.
| misja111 wrote:
| Load balanced web services on e.g. K8S could need to start
| and stop quite a lot if load varies. Any speed up will be
| welcome.
|
| Also, I guess Java-based desktop applications like IntelliJ
| and DBeaver will benefit.
| pixelmonkey wrote:
| I don't know how this JEP affects Clojure, but if you want to
| use Clojure for fast-loading CLI apps, a good thing to look at
| is babashka (bb). I wrote about it here:
|
| "Learning about babashka (bb), a minimalist Clojure for
| building CLI tools"
|
| https://amontalenti.com/2020/07/11/babashka
| s6af7ygt wrote:
| I'm a dunce
| dtech wrote:
| Read the article, this doesn't reduce JIT capabilities at all.
___________________________________________________________________
(page generated 2024-12-22 23:01 UTC)