[HN Gopher] Azul introduces remote compilation for Java
       ___________________________________________________________________
        
       Azul introduces remote compilation for Java
        
       Author : ithkuil
       Score  : 38 points
       Date   : 2021-12-17 10:01 UTC (12 hours ago)
        
 (HTM) web link (www.theregister.com)
 (TXT) w3m dump (www.theregister.com)
        
       | mgaudet wrote:
       | Interesting. OpenJ9 has something similar in preview:
       | https://blog.openj9.org/tag/jitserver/
        
         | gavinray wrote:
          | So I thought this whole thing was bullshit, then I read the
          | blog post from OpenJ9.
         | 
          | I changed my mind; this image is really what they ought to
          | be showing. It gets the point across:
         | 
         | https://i0.wp.com/blog.openj9.org/wp-content/uploads/2021/09...
         | 
          | What you're looking at is the allocation of multiple apps to
          | nodes in a cluster. With a JITServer in each node, the
          | memory required for each instance of the app is reduced, so
          | more instances fit in the same node size than before.
         | 
         | It reminds me of "bonuses" on equipment in RPGs. You lose 1
         | equipment slot to have it taken up by the item, but in return
         | the rest of your equipment gets a bonus that more than makes up
         | for the slot you can't use now.
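          | 
          | If you want to try it, the OpenJ9 docs describe roughly this
          | setup (the hostname here is made up; the flag names are from
          | their documentation, so double-check against your build):
          | 
          |       # one JITServer per node, via the launcher that ships
          |       # with OpenJ9 JDK builds
          |       jitserver
          |       
          |       # each app JVM delegates JIT compilation to it
          |       java -XX:+UseJITServer \
          |            -XX:JITServerAddress=jitserver.local \
          |            -XX:JITServerPort=38400 \
          |            MyApp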
        
       | coldcode wrote:
       | Remote inside your environment might make sense, but sending your
       | source to an external entity to compile sounds like a security
       | nightmare.
        
         | cogman10 wrote:
          | Reading the article, they are only talking about "remote in
          | your environment" because this is somewhat
          | latency-sensitive. Sending something like this over the
          | internet would have far too many issues.
        
       | stickfigure wrote:
       | Consider me extremely skeptical. One more network service to
       | potentially fail or have security issues. And a whole new
       | thundering herd problem on system recovery. I can only imagine
       | this is tailored for some peculiar workload that I've never seen.
        
         | cogman10 wrote:
          | The two use cases I see this working well for:
          | 
          | * Anything involving autoscaling
          | 
          | * My use case of 1000s of servers doing ETL-type work, with
          | Java processes getting created and destroyed often
          | 
          | I don't think this is a great boon for long-lived services.
        
       | heisenbit wrote:
        | Somehow the OP's link does not work. This one does:
       | 
       | https://www.theregister.com/2021/12/15/azul_introduces_remot...
        
         | Zababa wrote:
         | Thank you!
        
         | dang wrote:
         | Changed from
         | https://www.theregister.com/2021/12/15/azul_introduces_remot...
         | above. Thanks!
        
       | rightbyte wrote:
        | Seems like an unnecessarily complex dependency just to do
        | remote JIT compiles.
       | 
       | "it can reduce compute resources by up to 50 per cent"
       | 
       | How ... Do they not include their cloud service in the
       | computations required?
        
         | chrisseaton wrote:
         | > How ...
         | 
         | Not sure why you're so confused? Instead of compiling one
         | method a thousand times, you compile it once and share the
          | result. You still count the computation needed for the one
          | compilation, but it strictly reduces compute resources, as
          | less work is being done overall.
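          | 
          | A minimal sketch of the idea (names are made up, not Azul's
          | API): a cache keyed by method identity, so the expensive
          | compilation runs once per method no matter how many clients
          | ask for it:
          | 
          |       import java.util.concurrent.ConcurrentHashMap;
          |       import java.util.concurrent.ConcurrentMap;
          |       
          |       // Hypothetical shared compile service: a thousand
          |       // clients can request the same method, but the
          |       // compilation happens once.
          |       class CompileService {
          |           private final ConcurrentMap<String, byte[]> cache =
          |               new ConcurrentHashMap<>();
          |       
          |           byte[] codeFor(String methodId) {
          |               // computeIfAbsent invokes compile() at most
          |               // once per key; later callers get the cached
          |               // result
          |               return cache.computeIfAbsent(methodId,
          |                                            this::compile);
          |           }
          |       
          |           private byte[] compile(String methodId) {
          |               // stand-in for the expensive JIT work
          |               return new byte[0];
          |           }
          |       }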
        
           | usrusr wrote:
           | A remake of those old p2p file-sharing networks but instead
           | of advertising file availability nodes would advertise
           | availability of particularly well-optimized versions of the
           | basic code unit every node already has?
           | 
            | Sounds ridiculous and even a bit pointless, because you'd
            | still need the capacity to run lots of compilations
            | alongside not-yet-optimized code during startup (e.g.
            | after a redeploy). But with cluster size being highly
            | variable, and usage cost more and more replacing capacity
            | cost as the primary concern, this might actually work
            | out. For very large clusters and certain workloads. And
            | the more you modulate cluster size depending on demand,
            | the bigger the benefit from reusing across nodes. Sure,
            | it would be more straightforward to just clone a fully
            | baked node, but there's a nice peace of mind in not
            | having to decide between cold start and hot clone; just
            | leave it to the self-organization mechanism.
        
             | chrisseaton wrote:
             | I think you're a bit confused about the basic idea.
             | 
             | > every node already has?
             | 
              | No, every node doesn't already have it. That's the point.
             | One node computes on behalf of everyone.
             | 
             | > you'd still need the capacity to run lots of compilations
             | along not-yet-optimized code during startup
             | 
              | Well, no - nodes also have interpreters, so they don't
              | wait. And it's not any slower for someone else to run
              | the compilation on your behalf - the result arrives in
              | the same amount of time. And most code doesn't change
              | between redeploys, so it doesn't need to be recompiled.
        
               | usrusr wrote:
                | Every node already has the bytecode and the
                | interpreter, or more likely a fast, superficial JIT
                | without expensive optimization. But that's slow, so
                | when you need to serve a given load you'll need more
                | nodes before they have settled into a hot optimized
                | state.
               | 
                | Regarding unchanged code after a redeploy, usage
                | patterns might change a lot even for code that hasn't
                | changed at all. With profile-guided optimizations
                | (which I'd expect to be more the norm than the
                | exception for something like Azul), this would mean
                | that nominally unchanged code could suffer badly from
                | running in a version optimized for a different calling
                | environment (the previous version of the application).
               | 
               | In the grand scheme of things, I'd assume that this would
               | be the main benefit and goal of a JIT outcome sharing
               | mechanism like this: allowing nodes to cooperate by each
               | profiling a different part of their shared code, kind of
               | like becoming experts in a tiny specialization, and then
               | sharing that expertise.
        
         | hyperman1 wrote:
          | Presumably because it becomes a shared cost. 100 JVMs in 100
          | containers have to pay that cost individually. Now 1 or 2
          | central instances have the resources for compiling, and the
          | 100 worker bees don't need them. They might also be able to
          | optimize faster, as they have runtime statistics from other
          | instances, skipping warmup.
        
           | dragontamer wrote:
           | 100 JVMs in 100 containers (on one machine) also have to pay
           | the costs of 100x individual garbage collections, instead of
           | centralizing those garbage collections and merging the
           | memories of everyone together.
           | 
           | Hmmm... maybe just having a monolith at that point is better?
           | If you can run 100 containers on one box, why not just run
           | one environment on one box?
        
             | vlovich123 wrote:
              | I know you're joking, but there is an important
              | distinction that enables this. The optimization choices
              | for how to lower bytecode to native are going to be
              | largely the same across all your machines - the minute
              | differences in profiles aren't going to make significant
              | differences unless you have the same codebase doing
              | drastically different things. Thus you can easily share
              | the cost: accumulate execution profiles across your
              | cluster, then lower to native code efficiently.
             | 
             | The garbage collection choices are going to be unique to
             | the workload running on that server and have little
             | information worth sharing that will speed up operations.
        
               | hinkley wrote:
                | We could in theory get a lot of this benefit by
                | supporting profile-guided optimization and building a
                | simple distribution system. You could even instrument
                | a fraction of your servers and distribute the profiles
                | to the rest.
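                | 
                | A rough sketch of what that could look like (all
                | names here are hypothetical): sampled servers report
                | per-method hotness counters, a coordinator merges
                | them, and the merged profile seeds the rest of the
                | fleet:
                | 
                |       import java.util.HashMap;
                |       import java.util.Map;
                |       
                |       class MergedProfile {
                |           private final Map<String, Long> hotness =
                |               new HashMap<>();
                |       
                |           // fold in one sampled server's counters
                |           void merge(Map<String, Long> sample) {
                |               sample.forEach((method, count) ->
                |                   hotness.merge(method, count,
                |                                 Long::sum));
                |           }
                |       
                |           // methods hot enough to compile eagerly on
                |           // freshly started nodes
                |           boolean isHot(String method, long threshold) {
                |               return hotness.getOrDefault(method, 0L)
                |                   >= threshold;
                |           }
                |       }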
        
               | dragontamer wrote:
               | > The garbage collection choices are going to be unique
               | to the workload running on that server and have little
               | information worth sharing that will speed up operations.
               | 
                | Java's GC is generational, though, which means that
                | Java-style GC benefits from excess memory (i.e. the
                | more free memory you have, the less often the GC runs,
                | and the more likely any given object is already "dead"
                | by the time the GC runs).
               | 
               | With more "dead objects" left in the heap, a "bigger
               | heap" benefits all processes.
               | 
               | --------
               | 
                | Consider this simple case: 1MB of live data in a 6MB
                | heap. The garbage collector could run 10 times in some
                | given time period (the heap drops down to 1MB each
                | time the GC is run, and 10 runs means ~51MB was
                | allocated in this timeframe: 6MB before the first
                | garbage collection, then 5MB before each of the
                | remaining 9 collections).
               | 
                | Now what happens if you had an 11MB heap instead?
                | Well, the garbage collector won't run until 11MB has
                | been allocated (1st run), and with 1MB of live data
                | surviving each collection, each remaining run would
                | only happen after another 10MB of allocations. That is
                | to say: only 5 collections would happen.
               | 
                | So with the same algorithm, the 11MB heap with 1MB of
                | live data is twice as efficient (i.e. half the
                | collections) as the 6MB heap (with 1MB of live data).
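                | 
                | (A toy model to check the arithmetic: the first
                | collection triggers once heapSize MB is allocated,
                | and each later one after heapSize - live MB more.)
                | 
                |       class GcModel {
                |           static int gcRuns(int heapSize, int live,
                |                             int totalAlloc) {
                |               int runs = 0, allocated = 0;
                |               int headroom = heapSize;
                |               while (allocated + headroom <= totalAlloc) {
                |                   allocated += headroom; // heap fills
                |                   runs++;                // GC runs
                |                   headroom = heapSize - live; // freed
                |               }
                |               return runs;
                |           }
                |       }
                |       // GcModel.gcRuns(6, 1, 51)  -> 10 collections
                |       // GcModel.gcRuns(11, 1, 51) -> 5 collections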
               | 
               | -------
               | 
                | Java and its garbage collector are multithreaded these
                | days, and the analysis is more difficult. But that
                | doesn't change the fact that generational collectors
                | have this property, where the more RAM they use
                | between collections, the better they perform.
                | 
                | So just de facto sharing more RAM means all of the
                | individual Java threads will be more efficient at
                | garbage collection. Well-written heaps get more
                | efficient the more memory you throw at them.
        
               | vlovich123 wrote:
               | Oh. I misread your suggestion. Yes, if you can somehow
               | have all the services living within the same address
               | space, a shared heap could probably be better. In the
               | model described here we're talking about a multi-computer
               | distributed system though, not a bunch of containers on
               | one machine. Also, by merging everything into one address
               | space you're now potentially making the blast radius of a
               | security vulnerability much larger.
               | 
                | Also, it's not immediately clear things actually get
                | better: while your heap space is larger, so is your
                | object graph, and the rate at which objects are
                | created is higher. You'll get some compression out of
                | sharing class information across services, but not
                | much more than that.
        
       | anonymousDan wrote:
        | Interesting. I wonder whether it would permit more advanced
        | optimization passes that would otherwise have too much impact
        | on performance?
        
         | nradov wrote:
         | Any additional gains at the compilation stage are going to be
         | marginal. The real potential is with further dynamic run time
         | optimization within the JVM.
        
           | anonymousDan wrote:
           | Not sure I understand what your disagreement is... as I said
           | you can do more advanced compilation, which by implication
           | should give you better performance. Is that what you meant?
        
       | ksec wrote:
       | Sorry, this page doesn't exist!
       | 
       | Edit: Could @dang change the link to
       | https://www.theregister.com/2021/12/15/azul_introduces_remot...
        
         | dang wrote:
         | Fixed now. Thanks!
        
       | ggfgg wrote:
        | I can see the benefit of this. We have a similar problem with
        | the JIT executing on a whole bunch of containers during a
        | rollout.
        | 
        | However, at this point I'd rather scrap the software and
        | rewrite it in something with a full AOT compiler and rapid
        | startup. And without more dependencies that will fuck up
        | mid-deployment whenever cloud provider X falls over.
        
       ___________________________________________________________________
       (page generated 2021-12-17 23:01 UTC)