[HN Gopher] ZGC - What's new in JDK 16
       ___________________________________________________________________
        
       ZGC - What's new in JDK 16
        
       Author : harporoeder
       Score  : 154 points
       Date   : 2021-03-23 14:33 UTC (8 hours ago)
        
 (HTM) web link (malloc.se)
 (TXT) w3m dump (malloc.se)
        
       | olodus wrote:
       | Really impressive results.
       | 
       | Sorry for my ignorance on the topic, but will this have any
       | impact on other JVM languages or will this mostly only benefit
       | Java itself?
       | 
        | I realize that even though I use JVM languages now and then, I
        | don't really know whether they use their own GC implementations
        | or Java's. Does this differ between the languages, maybe?
        
         | buryat wrote:
          | This will work for any language that runs on top of the JVM.
          | That's the beauty of the JVM: improvements benefit all its
          | languages.
        
         | jfengel wrote:
         | The JVM has its own garbage collector. Every language uses it.
         | 
         | There may be tiny differences in the way code generators and
         | optimizers work, which mean they may not get exactly the same
         | properties out of equivalent code. For example, if they're
         | generating a lot of objects behind the scenes, the GC
         | improvements might help more, or less, or even do worse.
         | 
          | But that's the kind of thing that's really dependent on the
          | algorithm you've implemented. So most likely you get some
          | benefit for free. If you don't, you'll need to benchmark to
         | find out. The optimizers do a lot of work for you (and the JVM
         | does a ton of language-independent optimization), but some
         | things are up to experiment.
        
       | bitmapbrother wrote:
       | >After reaching that initial 10ms goal, we re-aimed and set our
       | target on something more ambitious. Namely that a GC pause should
       | never be longer than 1ms. Starting with JDK 16, I'm happy to
       | report that we've reached that goal too. ZGC now has O(1) pause
       | times. In other words, they execute in constant time and do not
       | increase with the heap, live-set, or root-set size (or anything
       | else for that matter). Of course, we're still at the mercy of the
       | operating system scheduler to give GC threads CPU time. But as
       | long as your system isn't heavily over-provisioned, you can
       | expect to see average GC pause times of around 0.05ms (50 us) and
       | max pause times of around 0.5ms (500 us).
       | 
       | Very impressive and well done. Should Azul be worried?
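        | 
        | For anyone who wants to try it, a minimal sketch (these are
        | standard HotSpot flags; the class name is just a placeholder):
        | 
        |     # enable ZGC and log GC events, including pause times
        |     java -XX:+UseZGC -Xlog:gc -Xmx8g MyApp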
        
         | novium wrote:
         | Since ZGC is in OpenJDK it should already be available in Zulu
         | as well
         | 
         | https://github.com/openjdk/jdk/tree/master/src/hotspot/share...
        
       | modeless wrote:
       | 1 ms pause times are pretty good. That's finally getting close to
       | the point where it may no longer be the biggest factor preventing
       | adoption in applications like core game engine code. Although at
       | 144 Hz it's still 14% of your frame time, so it's hardly
       | negligible.
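        | 
        | (The frame budget at 144 Hz is 1000 ms / 144 ≈ 6.9 ms, and
        | 1 ms / 6.9 ms ≈ 14%.)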
       | 
       | Even if the GC is running on an otherwise idle core there are
       | still other costs like power consumption and memory bandwidth. So
       | you still want to minimize allocation to keep the GC workload
       | down.
       | 
        | For too long GC people were touting 10 ms pause times as "low"
        | and not bothering to go further, but truly low pause times _are_
        | possible. I'd love to see a new systems language that _starts_
        | by designing for extremely low-pause GC, not manual allocation or
        | a borrow checker. I think it would be possible to make something
        | that you could use for real time work without having to
        | compromise on memory safety and without having to pay the
        | complexity tax Rust takes on for the borrow checker.
        
         | moonchild wrote:
         | > at 144 Hz it's still 14% of your frame time, so it's hardly
         | negligible
         | 
         | A single skipped frame is not a big deal and will probably not
         | be noticed. It will probably happen anyway due to scheduling
         | quirks, resource contention with other processes, existing
         | variation in frametime...
         | 
          | True realtime work requires no dynamic allocations whatsoever
          | (which, notably, is a separate question from whether you have
          | a GC!), so I think 'low' pause times are an acceptable
          | compromise. Where performance is a concern, you need to
          | manually manage a lot of factors, among them GC/dynamic
          | memory use. There's no runtime that can obviate that.
         | 
         | Granted, 1ms pause times are probably still not low enough for
         | realtime audio, and there may be room for some carelessness
         | there (audio being soft realtime, not hard realtime). But I
         | think just being careful to avoid dynamic allocation on the
         | audio thread is probably a worthwhile tradeoff.
        
           | devit wrote:
           | Dynamic allocations don't cause any issues with hard
           | realtime, as long as you don't run out of memory.
        
             | moonchild wrote:
              | Most allocators are not constant time, and are fairly slow
              | anyway. (Actually GCs tend to have faster allocators, but
              | obviously unpredictable pauses.)
             | 
             | (Though there was an allocator I saw recently that promised
             | O(1) allocations. Pretty neat idea.)
        
               | monocasa wrote:
               | Core game code commonly uses custom allocators that do
               | provide those semantics though.
               | 
               | A bump allocator that you reset every frame is O(1) and a
               | dozen or so cycles per allocation for example.
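                | 
                | A minimal Java sketch of that idea (illustrative only;
                | the names are made up):
                | 
                |     // preallocated buffer + bump pointer
                |     final class FrameArena {
                |         private final byte[] buf;
                |         private int top; // bump pointer
                | 
                |         FrameArena(int cap) { buf = new byte[cap]; }
                | 
                |         // O(1): a bounds check plus a pointer bump
                |         int alloc(int size) {
                |             if (top + size > buf.length)
                |                 throw new OutOfMemoryError("arena full");
                |             int off = top;
                |             top += size;
                |             return off; // caller indexes buf at off
                |         }
                | 
                |         // O(1): "free" everything at end of frame
                |         void reset() { top = 0; }
                |     }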
        
               | moonchild wrote:
               | Sure, yes. I was referring more to 'general-purpose
               | dynamic allocator' (malloc or so). I agree custom memory
               | management/reclamation techniques can be fine for RT;
               | just semantics.
        
           | modeless wrote:
           | > A single skipped frame is not a big deal and will probably
           | not be noticed.
           | 
           | Attitudes like this are why my phone sucks to use and I get
           | nauseous in VR and GC devs spent so long in denial saying 10
           | ms pause times should be good enough. Yes, single dropped
           | frames matter. If you don't think so then I don't want to use
           | your software.
        
             | kaba0 wrote:
              | A single skipped frame usually means that we are talking
              | about soft real time. And there it is absolutely
              | acceptable: not in the average case, but e.g. on a heavily
              | used computer a slight drop in audio is "appropriate";
              | it's not an anti-missile device.
             | 
              | It won't make the normal case jittery, nauseous or
              | anything like that. Also, in regards to your GC devs
              | comment, I would say that attitudes like this are the
              | problem. The great majority of programs can do with much
              | more than 10 ms pause times.
        
               | akx wrote:
                | A slight drop in audio would be perfectly unacceptable
                | for, e.g., computers running concerts.
        
           | AaronFriel wrote:
           | > A single skipped frame is not a big deal and will probably
           | not be noticed.
           | 
           | Some folks definitely notice this phenomenon, called a
           | "microstutter" by that group. You can see it here:
           | 
           | https://testufo.com/stutter#demo=microstuttering&foreground=.
           | ..
        
             | kaba0 wrote:
              | No one has mentioned how frequent the frame-skips we are
              | talking about are.
              | 
              | Is a single frameskip in an hour a problem?
        
           | barrkel wrote:
           | A constant frame interval is better than occasional skipped
           | frames. You don't need a super high frame rate for perceived
           | smooth motion, but dropped frames look like stutter.
        
             | brokencode wrote:
             | Stutter is becoming much less of a factor with variable
             | refresh rate display technology. Modern consoles, TVs, and
             | many monitors are being built with VRR these days, and in a
             | few years it will probably be ubiquitous.
             | 
             | Unless you have a highly optimized game, you are probably
             | not able to consistently run at a 144 Hz monitor's native
             | refresh rate anyway, so even without skipping frames you
             | will see stuttering. VRR solves this problem as well.
        
               | syockit wrote:
               | I'm not sure if you and GP share the same notion of
               | stutter or not. I never saw stutters when limiting the
               | game at 24 or 30 fps while playing on a 60 Hz LCD monitor
               | in the past. It stutters only when the fps is not
               | constant.
        
         | Thaxll wrote:
          | GC will most likely never be used in demanding games. You want
          | total control over memory. 1ms sounds OK, but you still don't
          | know when and for how long the GC is going to kick in.
        
           | BenoitP wrote:
            | > for how long the GC is going to kick in.
           | 
           | 1ms (max, average at 50us)
           | 
           | And for the 'when' I'll add that the very concept of having a
           | concurrent GC means you don't need to do a (potentially
           | pausing) malloc right in the middle of what you're trying to
           | do.
        
             | monocasa wrote:
              | The kind of people who care about GC pause times have
              | their own allocators that are as cheap as JVM allocations
              | and cheaper at deallocation. They aren't pooh-poohing GCs
              | and then just calling regular malloc and free.
        
             | Thaxll wrote:
              | Engines rely on smart allocators and memory pools; they
              | usually allocate everything beforehand. You're not running
              | malloc between two frames. Imagine a game like Battlefield
              | if you were to allocate memory for each fired bullet.
        
           | CJefferson wrote:
            | The biggest engine in gaming, Unity, uses C#, which is GCed.
        
             | terramex wrote:
             | And the amount of man-hours collectively spent on working
             | around this terrible, terrible GC is immense.
             | 
              | It was the worst GC implementation I've seen in my life;
              | it could cause 0.5s GC spikes every 10 seconds on Xbox One
              | even though we were allocating no or very little memory
              | during gameplay. The amount of pre-allocated and pooled
              | objects was bringing it to its knees, because Unity's
              | GC is non-generational and checks every single object every
              | time. In the end we moved a lot of the data into native
              | plugins written in C++. Nothing super hard, but you choose
              | a high-level engine to avoid such issues.
             | 
              | I've read that in 2019 they finally added an incremental
              | GC mode, which solves some of the issues but is still a
              | far cry from modern GCs.
        
             | bitmapbrother wrote:
              | I would say Unreal is the biggest game engine in terms of
              | pervasiveness. Also, isn't C# just used as a scripting
              | language in Unity? All of the heavy lifting is done by the
              | C/C++ backend.
        
             | Thaxll wrote:
                | It's definitely not the biggest; it's also almost never
                | used in "AAA" games.
        
               | liamkf wrote:
               | Unreal also has a GC to deal with. I've spent more time
               | on AAA games than I'd like to admit trying to
               | hide/mitigate/optimize the hitch.
        
         | The_rationalist wrote:
          | _Although at 144 Hz it's still 14% of your frame time_ Well, if
          | we believe their numbers, the _worst case_ is 0.5ms, so about
          | 7% of frame time at 144Hz. Assuming their stated average pause
          | time of 0.05ms, average pauses (and the GC isn't constantly
          | pausing) take about 0.7% of frame time, which _is_ negligible.
          | Though your concerns about throughput and resource usage
          | stand. Newer programming languages could also leverage ZGC
          | (and improve upon it) by targeting GraalVM, which additionally
          | enables cross-language interop.
        
           | modeless wrote:
           | In my experience GC developers wildly underestimate their
           | worst case, so I don't really believe that 0.5ms number. But
           | more importantly, you should not use average pause time at
           | all. At 144 Hz the 99th percentile frame time occurs more
           | than once per second. If you want to avoid dropping frames
           | you need to design for the worst case.
        
             | doikor wrote:
              | There is a worst case that is much worse than the 1ms
              | mentioned here: a really big change in allocation rate, in
              | which case the GC cycle does not finish before the heap
              | runs out of memory. In that case ZGC stalls allocations
              | ("Allocation Stall" in the GC logs). But this is basically
              | a failure mode and should not happen during normal
              | operation at all.
             | 
              | Though if you run into this issue, you can configure
              | around it in a couple of ways (a combined example is
              | sketched at the end of this comment):
             | 
              | 1. Telling the GC to treat some amount of heap below the
              | actual maximum as a "soft" maximum when calculating when
              | to start the next GC cycle (-XX:SoftMaxHeapSize).
              | https://malloc.se/blog/zgc-softmaxheapsize
              | 
              | 2. Increasing the number of concurrent GC threads so they
              | finish their work faster (-XX:ConcGCThreads).
              | 
              | 3. Just running a really large heap, so the periodic "run
              | a GC cycle even if the allocation rate doesn't call for
              | it" cycle keeps the heap in check.
             | 
             | Though after JDK 15 we have not had to mess with any of
             | these. Prior to that we had to adjust soft max heap size a
             | bit. With JDK 16 it should be even better I guess (should
             | be upgrading sometime next week)
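              | 
              | For reference, a sketch of how those options look on the
              | command line (the sizes and thread count are made-up
              | examples):
              | 
              |     java -XX:+UseZGC -Xmx32g \
              |          -XX:SoftMaxHeapSize=24g \
              |          -XX:ConcGCThreads=4 \
              |          MyApp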
        
             | kaba0 wrote:
             | ZGC used to target 10ms as worst-case latency, and they
             | target 3-4ms now I believe.
        
               | jacques_chester wrote:
               | The article above says that the target is 1ms.
        
               | kaba0 wrote:
                | Thanks for the correction; I remembered the 3-4 ms from
                | the Inside Java podcast on ZGC.
        
               | jacques_chester wrote:
               | No worries.
        
       | vlovich123 wrote:
        | One of the observations I've been making is that strategies like
        | this, spreading the work across multiple threads, almost seem to
        | play with measurements more than necessarily improving the cost.
        | So yes, the "stop the world" phase is shorter & cheaper. It's
        | unclear whether the rest of the threads take on more implicit
        | overhead to support this concurrency (more book-keeping,
        | participating implicitly in GC, etc). Supporting benchmarks of
        | various workloads would be helpful to understand what tradeoffs
        | were made.
        
         | cogman10 wrote:
         | Good observation.
         | 
         | This is a fundamental principle of garbage collection. You can
         | either have low latency or high throughput. You can't get both.
         | 
         | Why is that?
         | 
          | All optimizations that improve latency come at a cost.
          | Generally: more bookkeeping, more checks, more frequent
          | garbage collections. ZGC is one of those algorithms. It adds a
          | new check every time you access memory to see if the object
          | needs to be relocated. That increases the size of objects but
          | also the general runtime of the application.
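          | 
          | Conceptually, that per-access check (ZGC's load barrier)
          | looks something like this (pseudocode in Java syntax; the
          | helper names are invented for illustration):
          | 
          |     Object loadRef(Object[] field, int i) {
          |         Object ref = field[i];
          |         if (isBadColor(ref)) {   // pointer metadata: "stale"
          |             ref = remapOrRelocate(ref); // slow path
          |             field[i] = ref;  // self-heal: next load is fast
          |         }
          |         return ref;
          |     }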
         | 
          | A similar thing happens with reference counting (which is on
          | the extreme end of the latency/throughput tradeoff). Every
          | time you hand out or release a shared pointer, a check is
          | performed to see if a final release needs to happen.
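          | 
          | A minimal sketch of that per-operation cost (a hypothetical
          | class, not a real library):
          | 
          |     import java.util.concurrent.atomic.AtomicInteger;
          | 
          |     final class Rc<T> {
          |         private final T value;
          |         private final AtomicInteger count =
          |             new AtomicInteger(1);
          | 
          |         Rc(T value) { this.value = value; }
          | 
          |         Rc<T> retain() {       // paid on every new share
          |             count.incrementAndGet();
          |             return this;
          |         }
          | 
          |         void release() {       // paid on every drop
          |             if (count.decrementAndGet() == 0) {
          |                 // last owner: free/close the value here
          |             }
          |         }
          | 
          |         T get() { return value; }
          |     }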
         | 
         | On the flip side, a naive mark and sweep algorithm is trivially
         | parallelizable. The number of times you check if memory is
         | still in use is bound by when a collection happens. In an ideal
         | state you increase heap size until you get the desired
         | throughput.
         | 
         | We get "violations" of some of these principles if we can take
         | shortcuts or have assumptions about how memory is used and
         | allocated. For example, the assumption that "most allocations
         | are short lived" or the generational hypotheses leads to
         | shorter pause times even when optimizing for throughput without
         | a lot of extra cost. It's only costly when you've got an
         | application that doesn't fit into that hypotheses (which is
         | rare).
         | 
         | Haskell has a somewhat unique garbage collector based on the
         | fact that all data is immutable. They can take shortcuts
         | because older references can't refer to newer references.
        
           | The_rationalist wrote:
            | _Haskell has a somewhat unique garbage collector based on the
            | fact that all data is immutable. They can take shortcuts
            | because older references can't refer to newer references._ I
            | wonder, can this be achieved for immutable data structures
            | on the JVM, e.g. records, lists?
        
             | [deleted]
        
           | whateveracct wrote:
           | > Haskell has a somewhat unique garbage collector based on
           | the fact that all data is immutable. They can take shortcuts
           | because older references can't refer to newer references.
           | 
            | I don't think this is true? Laziness involves a lot of
            | mutation under the hood. Not to mention that Haskell has
            | mutable references. But maybe there are some tricks in the GC
            | I'm not aware of.
        
             | cogman10 wrote:
             | If you're interested in a fun read, they've published a
             | paper on how they do garbage collection.
             | 
             | http://simonmar.github.io/bib/papers/parallel-gc.pdf
        
           | pron wrote:
           | > Haskell has a somewhat unique garbage collector based on
           | the fact that all data is immutable. They can take shortcuts
           | because older references can't refer to newer references.
           | 
            | When you don't mutate in OpenJDK you get essentially the
            | same. Much of the cost of a modern GC (OpenJDK's G1, and soon
            | probably ZGC, too) is write barriers, which need to inform
            | the GC about reference mutations. If you don't mutate, you
            | don't pay that cost. This is partly why applications that go
            | to extremes to avoid allocating, and end up mutating more,
            | might actually do worse than if they'd allocated more with
            | OpenJDK's newer GCs.
           | 
           | In fact, OpenJDK's GCs rely heavily on the assumption that
           | old objects can't reference newer ones unless explicitly
           | mutated, and so require those barriers only in old regions.
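            | 
            | Conceptually, such a generational write barrier is something
            | like this (pseudocode; the helper names are invented):
            | 
            |     void storeRef(Object holder, Object[] field, int i,
            |                   Object value) {
            |         field[i] = value;
            |         if (isOld(holder) && isYoung(value)) {
            |             // record the old->young pointer so the next
            |             // young collection can find it
            |             rememberedSet.add(holder);
            |         }
            |     }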
        
         | carry_bit wrote:
          | In general, if you want the highest throughput you'll also get
          | long pause times, since the techniques that reduce max pause
          | times depend on inserting barriers into the application code.
         | 
         | Ignoring pause times is fine for batch processing, but not
         | ideal for interactive systems.
        
         | BenoitP wrote:
          | There is an overhead: they use higher bits in the address space
          | to indicate various stages in the object's collection
          | (Shenandoah has a forwarding pointer, IIRC).
          | 
          | This means you can't activate the compressed-pointers
          | optimization.
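          | 
          | Illustratively (the bit positions below are made up; the real
          | layout lives in the HotSpot source):
          | 
          |     final class ColoredPtr {
          |         // low bits address the object; a few high bits
          |         // carry GC metadata
          |         static final long ADDRESS_MASK = (1L << 42) - 1;
          |         static final long MARKED_BIT   = 1L << 42;
          |         static final long REMAPPED_BIT = 1L << 43;
          | 
          |         static long address(long ptr) {
          |             return ptr & ADDRESS_MASK;
          |         }
          |         static boolean remapped(long ptr) {
          |             return (ptr & REMAPPED_BIT) != 0;
          |         }
          |     }
          | 
          | Since the metadata needs those high bits, references have to
          | stay full 64-bit words, which is why compressed 32-bit
          | pointers are off the table.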
        
       | BenoitP wrote:
       | > you can expect to see average GC pause times of around 0.05ms
       | (50 us)
       | 
        | This is nuts (and well below OS scheduling jitter)
        
       | coldtea wrote:
        | Question: is this GC suitable for use with something like IDEA,
        | or is it more for server workloads? Would it reduce UI lag from
        | GC pauses accordingly?
        
         | perennus wrote:
         | I tried it last week actually with OpenJDK15+Windows. With
         | JDK16, IntelliJ didn't boot.
         | 
         | -XX:+UseZGC: Memory usage for my project dropped to a constant
         | 600 megs. Using the IDE felt just as fast as the normal
         | experience.
         | 
         | -XX:+UseShenandoahGC, -Xmx4g. Shenandoah GC used a constant 4
         | gigs of ram. It was a slower user experience for me.
         | 
         | In the end, I went back to the default settings, because the
         | custom JDK changes the look and feel and I don't like it.
        
           | coldtea wrote:
           | > _-XX:+UseZGC: Memory usage for my project dropped to a
           | constant 600 megs. Using the IDE felt just as fast as the
           | normal experience._
           | 
           | Shame, I hoped it would feel faster than the normal
           | experience, with (even infrequent) user-felt GC pauses
           | completely eliminated.
        
         | AnthonBerg wrote:
         | I confirm that ZGC works with IntelliJ IDEA, and it seems to me
         | that it makes IDEA respond quite a bit faster. It's not hard to
         | get IntelliJ IDEs to use ZGC by editing the VM properties file.
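          | 
          | For example, something like this in the custom .vmoptions file
          | (the heap size is just an example):
          | 
          |     -Xmx2g
          |     -XX:+UseZGC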
        
       | bestinterest wrote:
        | This might be an odd question, but how often does garbage
        | collection run, and what's the usual time taken over a period of
        | time?
       | 
       | Say I'm doing a drawing/game app and creating a few hundred heap
       | objects a second that need to get garbage collected.
       | 
        | I have no idea how often GC runs in a typical app, or how much
        | real time it takes on average over, say, an hour of a semi-
        | complex app running. It obviously depends on the app, but I do
        | not even have an average number for the cost of a GC language in
        | some typical web app.
        | 
        | I only know "GCs are bad" because of the hundreds of HackerNews
        | comments dismissing languages because they have a GC for some
        | reason, rather than hard examples of them eating up time.
        
         | brokencode wrote:
         | GC can be very efficient when considering the average cost over
         | time, and is faster than reference counting for instance. It
         | also can have nice features such as heap compaction which you
         | can't easily do with manual memory management.
         | 
         | But the main thing most folks have problems with is the random
         | latency spikes you get with GC. The GC can start at any time in
         | most languages, and might stop all threads in your program for
         | maybe dozens or hundreds of ms. This would be visible to users
         | if you are rendering frames at a constant rate in a game, since
         | each frame takes only around 16 ms in a 60 FPS game.
         | 
         | That's what's exciting about changes like what they are doing
         | with ZGC. They are saying the max garbage collection time is
         | 0.5 ms in normal situations, and the average time is even
         | lower. Most games can accommodate that without a problem.
         | 
          | FYI, this is important for web servers as well. Some web
          | servers have a huge amount stored in memory, and the GC could
          | take hundreds of ms or even multiple seconds to collect at
          | random times in extreme cases. This can make a web request take
          | perceptibly longer.
         | 
         | Also, if you have multiple machines communicating with one
         | another and randomly spiking in latency due to GC, then worst
         | case latency can add up to pretty terrible numbers if you are
         | not careful.
        
         | adamdusty wrote:
         | After some research I couldn't really find much of an answer.
         | 
          | The thing about GC is you either don't care at all, or you
          | don't want it at all. There's rarely a case where you know how
          | many GC cycles you can handle in a certain period. Web dev, GC
          | all you want. Games can handle GC, but it's likely you'll need
          | to be cognizant of memory use. Embedded stuff doesn't have
          | enough memory to utilize a GC.
         | 
          | I'm not sure why GC languages get so much hate. I do a lot
          | with C#, and the runtime gives a few options for controlling
          | allocations and accessing memory, so I can usually get it to
          | be fast enough.
        
         | gopalv wrote:
         | > Say I'm doing a drawing/game app and creating a few hundred
         | heap objects a second that need to get garbage collected.
         | 
          | It was literally my job ten years ago to optimize this, and I
          | was struggling with a GC'd language with a proprietary
          | implementation (flash+actionscript).
          | 
          | The problem is not the hundreds of heap objects per frame; the
          | problem is that they would accumulate into the tens of
          | thousands before the first GC trigger happens.
         | 
          | And the GC trigger might happen in the middle of drawing a
          | frame or, even worse, at the end of drawing a frame (which
          | means even a 10ms pause means you miss the 16ms frame window
          | at 60fps).
         | 
          | The problem that most people had was that this was unevenly
          | distributed, "janky" to put it in the lingo. So you'd get 900
          | frames with no issues and a single frame that freezes.
         | 
         | So most of the problem people have with GC pauses is the
         | unpredictability of it and the massive variations in the 99th
         | percentile latency in the system, making it look slower than it
         | actually is.
         | 
          | Most of the original GC implementations scaled poorly as
          | memory sizes went up and the amount of possible garbage went
          | up, until GC designs started switching over to garbage-first
          | optimizations, thread-local allocation buffers, survivor
          | generations + heap reserves, etc. (i.e. we have lots of
          | memory; our problem is the object-walking overheads, so small
          | objects with lots of references are bad).
         | 
          | The GC model is actually pretty okay, but it is still
          | unpredictable enough that tuning the GC, or building an
          | application with strict latency requirements on top of a GC'd
          | language, is hard.
         | 
         | However, as a counterpoint - OpenHFT.
         | 
          | Clearly it is possible, but it takes a lot of alignment across
          | all the system layers; at that point you might as well write
          | C++, because the result is not portable enough to run just
          | anywhere.
        
         | jankotek wrote:
          | It really depends on the application and the complexity of the
          | object graph. Short-lived objects usually have low overhead.
          | Long-lived objects with a huge heap may cause a problem.
          | 
          | In the past, GC had a bad reputation for increased and
          | unpredictable latencies. In old JVMs the GC would pause
          | execution to traverse the object graph.
          | 
          | In general, do not worry about GC unless you run into
          | performance issues. If performance is a problem, run a
          | continuous profiler such as Flight Recorder. It has very
          | little overhead.
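          | 
          | For example (a standard HotSpot flag; the file name is just
          | an example):
          | 
          |     java -XX:StartFlightRecording=filename=rec.jfr MyApp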
        
           | _ph_ wrote:
           | And in most cases it isn't GC which is the problem, but the
           | program doing too many heap allocations. Cutting heap
           | allocations down improves the speed of most programs, with or
           | without GC.
        
         | dignan wrote:
         | GC is a memory management technique with tradeoffs like all the
         | others.
         | 
          | GC has many different implementations, with widely ranging
          | properties. For example, the JVM itself currently supports at
          | least 3 different GC implementations. There are also different
          | types of collections: in a generational garbage collection
          | system, for example, you'll typically see two or three kinds
          | of collections, depending on the generation (how many GC
          | cycles they have survived) of the objects being collected. The
          | shortest collections in those systems usually take a couple of
          | milliseconds, while the longest can take many seconds.
         | 
          | GC isn't always a problem. If your application isn't latency
          | sensitive, it's not a big deal. Though if you tune your network
          | timeouts to be too low, even something that is not really
          | latency sensitive can have trouble, because GC can cause
          | network connections to time out. Even in a latency-sensitive
          | application, it can be OK if the GC's "stop the world" pauses
          | (pauses that stop program execution) are short.
         | 
         | One reason you'll see people say GCs are bad is for those
         | latency sensitive applications. For example, I previously
         | worked on distributed datastores where low latency responses
         | were critical. If our 99th percentile response times jumped
         | over say 250ms, that would result in customers calling our
         | support line in massive numbers. These datastores ran on the
         | JVM, where at the time G1GC was the state of the art low-
         | latency GC. If the systems were overloaded or had badly tuned
         | GC parameters, GC times could easily spike into the seconds
         | range.
         | 
          | Other considerations are GC throughput and CPU usage. GC
          | systems can use a lot of CPU. That's often the tradeoff you'll
          | see for these low-latency GC implementations. GCs can also put
          | a cap on memory throughput. How much memory the GC
          | implementation can examine, with how much CPU usage, and with
          | what amount of stop-the-world time tends to be the nature of
          | the question.
        
       | geodel wrote:
        | Sub-millisecond GC pauses are very impressive. Though one thing
        | that is not clear to me is whether this is true only for very
        | large heaps, or whether it will also be great for typical
        | service/microservice heaps in the 4-32 GB range.
        
         | chrisseaton wrote:
         | I think the whole point is the pause time doesn't vary with the
         | heap size.
        
         | pradeepchhetri wrote:
          | It works great even for large heap sizes. I moved my ES cluster
          | (running with around a 92G heap size) from G1GC to ZGC and saw
          | huge improvements in GC behavior. The best part about ZGC is
          | that you don't need to touch any GC parameters; it autotunes
          | everything.
        
           | JD557 wrote:
           | >running with around 92G heap size
           | 
           | I'm curious about this choice. The elasticsearch
           | documentation recommends a maximum heap slightly below 32GB
           | [1].
           | 
           | Is this not a problem anymore with G1GC/ZGC, or are you
           | simply "biting the bullet" and using 92G of heap because you
           | can't afford to scale horizontally?
           | 
           | 1: https://www.elastic.co/guide/en/elasticsearch/reference/7.
           | 11...
        
             | capableweb wrote:
             | > because you can't afford to scale horizontally?
             | 
              | It doesn't have to be about affordability; rather, it's
              | often more efficient and cheaper to scale vertically
              | first, both in monetary costs and in time/maintenance
              | costs.
        
               | vosper wrote:
               | On hardware, but not on a cloud setup? We run several
               | hundred big ES nodes on AWS, and I believe we stick to
               | the heap sizing guidelines (though I've long wondered if
               | fewer instances with giant heaps might actually work ok,
               | too)
        
               | toast0 wrote:
               | Cloud is trickier to price than real hardware. On real
               | hardware, filling the ram slots is clearly cheaper than
               | buying a second machine, if ram is the only issue. If you
               | need to replace with higher density ram, sometimes it's
               | more cost effective to buy a second machine. Adding more
               | processor sockets to get more ram slots is also sometimes
               | more, sometimes less cost effective than adding more
               | machines. Often, you might need more processing to go
               | with the ram, which can change the balance.
               | 
               | In cloud, with defined instance types, usually more ram
               | comes with more everything else, and from pricing listed
               | at https://www.awsprices.com/ in US East, it looks like
               | within an instance type, $ / ram is usually consistent.
                | The least expensive (per unit RAM) class of instances is
                | x1/x1e, which range from 122 GB to 3904 GB, so that does
                | lean towards bigger instances being cost effective.
               | 
               | Exceptions I saw are c1.xlarge is less expensive than
               | c1.medium, c4.xlarge is less than other c4 types and c4
               | is more expensive than others, m1.medium < m1.large ==
               | m1.xlarge < m1.small, m3.medium is more expensive than
               | other m3, p2.16xlarge is more expensive than other p2,
               | t2.small is less expensive than other t2. Many of these
               | differences are a tenth of a penny per hour though.
        
             | legerdemain wrote:
             | Heaps "slightly below 32GB" are usually because of the
             | -XX:+UseCompressedOops option, which allows Java to address
             | up to 32GB of memory with a smaller pointer. Between
             | 32-35GB of heap, you're just paying off the savings you
             | would have gotten with compressed object pointers, but if
             | you keep cranking your heap further after that, you'll
             | start getting benefits again.
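              | 
              | You can check what the JVM decided with standard flags,
              | e.g.:
              | 
              |     java -Xmx31g -XX:+PrintFlagsFinal -version \
              |         | grep UseCompressedOops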
        
               | JanecekPetr wrote:
               | This, exactly. One added issue is that ZGC does NOT
               | support compressed oops at all.
        
           | manasvi_gupta wrote:
           | Please specify Elasticsearch & JDK version. Also, index size
           | and heap size per node.
           | 
           | From my experience, high heap sizes are unnecessary since
           | Lucene (used by ES) has greatly reduced heap usage by moving
           | things off-heap[1].
           | 
           | [1] - https://www.elastic.co/blog/significantly-decrease-
           | your-elas...
        
           | pron wrote:
           | Whether G1 or ZGC are the best choice depends on the workload
           | and requirements, but G1 in recent JDK versions also requires
           | virtually no tuning (if your G1 usage had flags _other_ than
           | maximum heap size, maybe minimum heap size, and maybe pause
           | target, try again without them).
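            | 
            | I.e., a minimal G1 command line would look something like
            | this (the sizes and pause target are just example values):
            | 
            |     java -XX:+UseG1GC -Xms4g -Xmx4g \
            |          -XX:MaxGCPauseMillis=100 MyApp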
        
           | gher-shyu3i wrote:
           | Did you notice a change in the peak memory usage?
        
           | vosper wrote:
           | How (and how much) did these improvements manifest? For
           | example, did you measure consistently faster response times
           | when running ZGC rather than G1GC? If so, by how much? I'm
           | always looking for a way to improve ES response times for our
           | users.
        
         | perliden wrote:
         | ZGC pause times will be the same regardless of heap size. ZGC
         | currently supports heaps from 8MB to 16TB. So if you have
         | 4-32GB heaps and want low latency, then ZGC is definitely
         | something to try.
        
           | eklavya wrote:
            | Hey, is there any benchmark comparing the throughput
            | performance of ZGC vs G1, etc.? How much of a hit
            | (performance-wise) would one take for getting this awesome
            | pause time limit?
        
             | kaba0 wrote:
             | Here is a quite elaborate one, though it is not totally up-
             | to-date:
             | 
              | https://jet-start.sh/blog/2020/06/09/jdk-gc-benchmarks-part1
        
           | geodel wrote:
            | Ah, you are the author of the article :). Thanks for
            | replying! Does ZGC compromise on throughput compared to G1
            | to achieve low pause times?
        
             | perliden wrote:
             | ZGC in its current form trades a bit of throughput
             | performance for better latency. This presentation provides
             | some more details and some performance numbers (a link to
             | the slides is also available there).
             | https://malloc.se/blog/zgc-oracle-developer-live-2020
        
       ___________________________________________________________________
       (page generated 2021-03-23 23:01 UTC)