[HN Gopher] ZGC - What's new in JDK 16
___________________________________________________________________
ZGC - What's new in JDK 16
Author : harporoeder
Score : 154 points
Date : 2021-03-23 14:33 UTC (8 hours ago)
(HTM) web link (malloc.se)
| olodus wrote:
| Really impressive results.
|
| Sorry for my ignorance on the topic, but will this have any
| impact on other JVM languages or will this mostly only benefit
| Java itself?
|
| I realize even though I use JVM languages now and then I do not
| really know if they use their own GC implementation or make use
| of Java's. Does this differ between the languages maybe?
| buryat wrote:
| this will work for any language that runs on top of JVM, that's
| the beauty of the JVM, improvements benefit all its languages
| jfengel wrote:
| The JVM has its own garbage collector. Every language uses it.
|
| There may be tiny differences in the way code generators and
| optimizers work, which mean they may not get exactly the same
| properties out of equivalent code. For example, if they're
| generating a lot of objects behind the scenes, the GC
| improvements might help more, or less, or even do worse.
|
| But that's the kind of thing that's really dependent on the
| algorithm you've implemented. So most likely you get some
| benefit for free. If you don't, you'll need to benchmark to
| find out. The optimizers do a lot of work for you (and the JVM
| does a ton of language-independent optimization), but some
| things are up to experiment.
| bitmapbrother wrote:
| >After reaching that initial 10ms goal, we re-aimed and set our
| target on something more ambitious. Namely that a GC pause should
| never be longer than 1ms. Starting with JDK 16, I'm happy to
| report that we've reached that goal too. ZGC now has O(1) pause
| times. In other words, they execute in constant time and do not
| increase with the heap, live-set, or root-set size (or anything
| else for that matter). Of course, we're still at the mercy of the
| operating system scheduler to give GC threads CPU time. But as
| long as your system isn't heavily over-provisioned, you can
| expect to see average GC pause times of around 0.05ms (50 us) and
| max pause times of around 0.5ms (500 us).
|
| Very impressive and well done. Should Azul be worried?
| novium wrote:
| Since ZGC is in OpenJDK it should already be available in Zulu
| as well
|
| https://github.com/openjdk/jdk/tree/master/src/hotspot/share...
| modeless wrote:
| 1 ms pause times are pretty good. That's finally getting close to
| the point where it may no longer be the biggest factor preventing
| adoption in applications like core game engine code. Although at
| 144 Hz it's still 14% of your frame time, so it's hardly
| negligible.
|
| Even if the GC is running on an otherwise idle core there are
| still other costs like power consumption and memory bandwidth. So
| you still want to minimize allocation to keep the GC workload
| down.
|
| For too long GC people were touting 10 ms pause times as "low"
| and not bothering to go further, but truly low pause times _are_
| possible. I'd love to see a new systems language that _starts_
| by designing for extremely low-pause GC, not manual allocation or
| a borrow checker. I think it would be possible to make something
| that you could use for real time work without having to
| compromise on memory safety and without having to pay the
| complexity tax Rust takes on for the borrow checker.
| moonchild wrote:
| > at 144 Hz it's still 14% of your frame time, so it's hardly
| negligible
|
| A single skipped frame is not a big deal and will probably not
| be noticed. It will probably happen anyway due to scheduling
| quirks, resource contention with other processes, existing
| variation in frametime...
|
| True realtime work requires no dynamic allocations whatsoever
| (which, notably, is not covariant with gc!), so I think 'low'
| pause times are an acceptable compromise. Where performance is
| a concern, you need to manually manage a lot of factors, among
| them GC/dynamic memory use. There's no runtime that can obviate
| that.
|
| Granted, 1ms pause times are probably still not low enough for
| realtime audio, and there may be room for some carelessness
| there (audio being soft realtime, not hard realtime). But I
| think just being careful to avoid dynamic allocation on the
| audio thread is probably a worthwhile tradeoff.
| devit wrote:
| Dynamic allocations don't cause any issues with hard
| realtime, as long as you don't run out of memory.
| moonchild wrote:
| Most allocators are not constant time, and are fairly slow
| anyway. (Actually GC tend to have faster allocators, but
| obviously unpredictable pauses.)
|
| (Though there was an allocator I saw recently that promised
| O(1) allocations. Pretty neat idea.)
| monocasa wrote:
| Core game code commonly uses custom allocators that do
| provide those semantics though.
|
| A bump allocator that you reset every frame is O(1) and a
| dozen or so cycles per allocation for example.
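A minimal Java sketch of the per-frame bump allocator described above, assuming a fixed-size arena backed by a direct ByteBuffer. The FrameArena class and its method names are hypothetical and chosen only for illustration; real engines typically do this in native code.

```java
import java.nio.ByteBuffer;

/** Hypothetical per-frame arena: allocating is a bounds check plus an addition. */
final class FrameArena {
    private final ByteBuffer backing;
    private int top = 0; // offset of the next free byte

    FrameArena(int capacityBytes) {
        this.backing = ByteBuffer.allocateDirect(capacityBytes);
    }

    /**
     * Reserves sizeBytes at a power-of-two alignment.
     * Returns the block's offset into the arena, or -1 if it does not fit.
     */
    int allocate(int sizeBytes, int alignment) {
        // Round the current top up to the next multiple of alignment.
        int aligned = (top + alignment - 1) & -alignment;
        if (sizeBytes < 0 || sizeBytes > backing.capacity() - aligned) {
            return -1;
        }
        top = aligned + sizeBytes;
        return aligned;
    }

    /** Frees every block at once; call this at the end of each frame. */
    void reset() {
        top = 0;
    }

    int used() {
        return top;
    }

    ByteBuffer memory() {
        return backing;
    }
}
```

Both allocate() and reset() do a constant amount of work regardless of how many blocks were handed out, which is the O(1) property the comment refers to.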
| moonchild wrote:
| Sure, yes. I was referring more to 'general-purpose
| dynamic allocator' (malloc or so). I agree custom memory
| management/reclamation techniques can be fine for RT;
| just semantics.
| modeless wrote:
| > A single skipped frame is not a big deal and will probably
| not be noticed.
|
| Attitudes like this are why my phone sucks to use and I get
| nauseous in VR and GC devs spent so long in denial saying 10
| ms pause times should be good enough. Yes, single dropped
| frames matter. If you don't think so then I don't want to use
| your software.
| kaba0 wrote:
| A single skipped frame usually means that we are talking
| about soft real time. And there it is absolutely
| acceptable, not in the average case, but eg. on a heavily
| used computer a slight drop in audio is "appropriate", it's
| not an anti-missile device.
|
| It won't make the normal case jittery, nauseous or anything
| like that. Also, in regards to your GC devs comment, I
| would say that attitudes like this are the problem. The
| great majority of programs can do with much more than 10 ms
| pause times.
| akx wrote:
| A slight drop in audio would be perfectly unacceptable
| for eg computers running concerts.
| AaronFriel wrote:
| > A single skipped frame is not a big deal and will probably
| not be noticed.
|
| Some folks definitely notice this phenomenon, called a
| "microstutter" by that group. You can see it here:
|
| https://testufo.com/stutter#demo=microstuttering&foreground=...
| kaba0 wrote:
| No one mentioned how frequent the frame skips we are talking
| about are.
|
| Is a single frameskip in an hour a problem?
| barrkel wrote:
| A constant frame interval is better than occasional skipped
| frames. You don't need a super high frame rate for perceived
| smooth motion, but dropped frames look like stutter.
| brokencode wrote:
| Stutter is becoming much less of a factor with variable
| refresh rate display technology. Modern consoles, TVs, and
| many monitors are being built with VRR these days, and in a
| few years it will probably be ubiquitous.
|
| Unless you have a highly optimized game, you are probably
| not able to consistently run at a 144 Hz monitor's native
| refresh rate anyway, so even without skipping frames you
| will see stuttering. VRR solves this problem as well.
| syockit wrote:
| I'm not sure if you and GP share the same notion of
| stutter or not. I never saw stutters when limiting the
| game at 24 or 30 fps while playing on a 60 Hz LCD monitor
| in the past. It stutters only when the fps is not
| constant.
| Thaxll wrote:
| GC will most likely never be used in demanding games. You want
| total control over memory. 1ms sounds ok but still you don't
| know when and for how long the GC is going to kick in.
| BenoitP wrote:
| > for how long the GC is going to kick in.
|
| 1ms (max, average at 50us)
|
| And for the 'when' I'll add that the very concept of having a
| concurrent GC means you don't need to do a (potentially
| pausing) malloc right in the middle of what you're trying to
| do.
| monocasa wrote:
| The kind of people that care about GC pause times have
| their own allocators that are as cheap as jvm allocations
| and cheaper deallocation. They aren't pooh-poohing GCs and
| then just calling regular malloc and free.
| Thaxll wrote:
| Engines rely on smart allocators and memory pools; they
| usually allocate everything beforehand. You're not running
| malloc between two frames. Imagine a game like Battlefield
| if you were to allocate memory for each fired bullet.
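A rough Java sketch of the "allocate everything beforehand" idea for the bullet example, assuming a fixed upper bound on live bullets. BulletPool, Bullet, and their fields are hypothetical names invented for this illustration, not taken from any engine.

```java
import java.util.ArrayDeque;

/** Hypothetical fixed-size pool: every Bullet is created once, up front. */
final class BulletPool {
    static final class Bullet {
        float x, y, vx, vy;
        boolean live;
    }

    private final ArrayDeque<Bullet> free;

    BulletPool(int capacity) {
        free = new ArrayDeque<>(capacity);
        for (int i = 0; i < capacity; i++) {
            free.push(new Bullet());
        }
    }

    /** Hands out a recycled bullet, or null when the pool is exhausted. */
    Bullet acquire() {
        Bullet b = free.poll();
        if (b != null) {
            b.live = true;
        }
        return b;
    }

    /** Returns a bullet to the pool; releasing the same bullet twice is ignored. */
    void release(Bullet b) {
        if (!b.live) {
            return;
        }
        b.live = false;
        free.push(b);
    }

    int available() {
        return free.size();
    }
}
```

Firing a shot then becomes acquire() plus a few field writes. The trade-off is that the caller has to decide what happens when acquire() returns null.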
| CJefferson wrote:
| The biggest engine in gaming, Unity, uses C#, which is GCed.
| terramex wrote:
| And the amount of man-hours collectively spent on working
| around this terrible, terrible GC is immense.
|
| It was the worst GC implementation I've seen in my life,
| could cause 0.5s GC spikes every 10 seconds on Xbox One
| even though we were allocating none or very little memory
| during gameplay. The amount of pre-allocated and pooled
| objects was bringing it to its knees, because Unity's
| GC is non-generational and checks every single object every
| time. In the end we moved a lot of the data into native
| plugins written in C++. Nothing super hard, but you choose
| a high-level engine to avoid such issues.
|
| I've read that in 2019 they finally added an incremental GC
| mode that solves some of the issues but is still a far cry
| from modern GCs.
| bitmapbrother wrote:
| I would say Unreal is the biggest game engine in terms of
| pervasiveness. Also, isn't C# just used as a scripting
| language in Unity? All of the heavy lifting is done by the
| C/C++ backend.
| Thaxll wrote:
| It's def not the biggest, it's almost not used in "AAA"
| games also.
| liamkf wrote:
| Unreal also has a GC to deal with. I've spent more time
| on AAA games than I'd like to admit trying to
| hide/mitigate/optimize the hitch.
| The_rationalist wrote:
| _Although at 144 Hz it's still 14% of your frame time_ well if
| we believe their numbers, the _worst case_ is 0.5ms, so about 7%
| of frame time for 144hz. Assuming their stated average pause
| time of 0.05ms, average pauses (and the GC isn't constantly
| pausing) take about 0.7% of frametime, which _is_ negligible.
| Though your concerns on throughput and resource usage stand.
| Well, newer programming languages could leverage ZGC (and
| improve upon it) by targeting GraalVM, which would also enable
| cross-language interop.
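The frame-time percentages in this exchange come from dividing the pause length by the frame interval (1000 ms divided by the refresh rate). A small Java sketch of that arithmetic follows; FrameBudget and pausePercent are names made up for this example.

```java
/** Hypothetical helper: how much of one frame does a pause of a given length eat? */
final class FrameBudget {
    /** Share of a single frame, in percent, consumed by a pause of pauseMs. */
    static double pausePercent(double pauseMs, double refreshHz) {
        double frameMs = 1000.0 / refreshHz; // about 6.94 ms at 144 Hz
        return 100.0 * pauseMs / frameMs;
    }

    public static void main(String[] args) {
        // The three pause lengths discussed in the thread: target, stated max, stated average.
        for (double pauseMs : new double[] {1.0, 0.5, 0.05}) {
            System.out.printf("%.2f ms pause at 144 Hz = %.2f%% of a frame%n",
                    pauseMs, pausePercent(pauseMs, 144.0));
        }
    }
}
```

This only measures the pause itself. It says nothing about the throughput and bandwidth costs raised earlier in the thread.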
| modeless wrote:
| In my experience GC developers wildly underestimate their
| worst case, so I don't really believe that 0.5ms number. But
| more importantly, you should not use average pause time at
| all. At 144 Hz the 99th percentile frame time occurs more
| than once per second. If you want to avoid dropping frames
| you need to design for the worst case.
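The "more than once per second" figure follows directly from the frame rate: at 144 frames per second, the 1% of frames beyond the 99th percentile amount to roughly 1.44 frames every second. A tiny Java sketch, with TailFrames and eventsPerSecond as hypothetical names:

```java
/** Hypothetical helper: how often do frames beyond a given percentile occur? */
final class TailFrames {
    /** Expected number of frames per second that fall beyond the given percentile. */
    static double eventsPerSecond(double refreshHz, double percentile) {
        return refreshHz * (100.0 - percentile) / 100.0;
    }

    public static void main(String[] args) {
        System.out.println(eventsPerSecond(144.0, 99.0));  // 99th percentile
        System.out.println(eventsPerSecond(144.0, 99.99)); // 99.99th percentile
    }
}
```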
| doikor wrote:
| There is a worst case that is much worse than the now
| mentioned 1ms. Namely a really big change in allocation rate,
| in which case the gc cycle does not finish before running
| out of memory. In that case ZGC stops allocations
| ("Allocation Stall" in gc logs). But this is basically a
| failure mode and should not happen during normal operations
| at all.
|
| Though you can configure for this in a couple of ways if you
| run into this issue:
|
| 1. Setting a soft heap limit below the actual max, which ZGC
| uses when calculating when it should start the next gc cycle
| (-XX:SoftMaxHeapSize)
| https://malloc.se/blog/zgc-softmaxheapsize
|
| 2. Increasing the number of concurrent gc threads so they
| will finish their work faster (-XX:ConcGCThreads)
|
| 3. Just running a really large heap so the "run a gc cycle even
| if we don't need to based on allocation rate" gc cycle
| keeps the heap in check.
|
| Though after JDK 15 we have not had to mess with any of
| these. Prior to that we had to adjust soft max heap size a
| bit. With JDK 16 it should be even better I guess (should
| be upgrading sometime next week)
| kaba0 wrote:
| ZGC used to target 10ms as worst-case latency, and they
| target 3-4ms now I believe.
| jacques_chester wrote:
| The article above says that the target is 1ms.
| kaba0 wrote:
| Thanks for the correction, I remembered the 3-4 ms from
| the inside java podcast on the ZGC.
| jacques_chester wrote:
| No worries.
| vlovich123 wrote:
| One of the observations I've been making is that strategies like
| this of spreading the work around multiple threads almost seem to
| play with measurements more than necessarily improving the cost.
| So yes, the "stop the world phase" is shorter & cheaper. It's
| unclear whether the rest of the threads have more implicit
| overhead to support this concurrency (more book-keeping,
| participating implicitly in GC, etc). Supporting benchmarks of
| various workloads would be helpful to understand what tradeoffs
| were made.
| cogman10 wrote:
| Good observation.
|
| This is a fundamental principle of garbage collection. You can
| either have low latency or high throughput. You can't get both.
|
| Why is that?
|
| All optimizations that improve latency come at a cost.
| Generally, more book keeping, more checks, more frequent
| garbage collections. ZGC is one of those algorithms. It adds a
| new check every time you access memory to see if it needs to be
| relocated. That increases the size of objects but also the
| general runtime of the application.
|
| A similar thing happens with reference counting (which is on
| the extreme end of the latency/throughput tradeoff). Every time
| you give a shared pointer or release a shared pointer a check
| is performed to see if a final release needs to happen.
|
| On the flip side, a naive mark and sweep algorithm is trivially
| parallelizable. The number of times you check if memory is
| still in use is bound by when a collection happens. In an ideal
| state you increase heap size until you get the desired
| throughput.
|
| We get "violations" of some of these principles if we can take
| shortcuts or have assumptions about how memory is used and
| allocated. For example, the assumption that "most allocations
| are short lived" or the generational hypothesis leads to
| shorter pause times even when optimizing for throughput without
| a lot of extra cost. It's only costly when you've got an
| application that doesn't fit into that hypothesis (which is
| rare).
|
| Haskell has a somewhat unique garbage collector based on the
| fact that all data is immutable. They can take shortcuts
| because older references can't refer to newer references.
| The_rationalist wrote:
| _Haskell has a somewhat unique garbage collector based on the
| fact that all data is immutable. They can take shortcuts
| because older references can't refer to newer references._ I
| wonder, can this be achieved for immutable data structures in
| the JVM, e.g. records, lists?
| [deleted]
| whateveracct wrote:
| > Haskell has a somewhat unique garbage collector based on
| the fact that all data is immutable. They can take shortcuts
| because older references can't refer to newer references.
|
| I don't think this is true? Because laziness is really
| heavily mutable under the hood. Not to mention that it has
| mutable references. But maybe there are some tricks in the GC
| I'm not aware of.
| cogman10 wrote:
| If you're interested in a fun read, they've published a
| paper on how they do garbage collection.
|
| http://simonmar.github.io/bib/papers/parallel-gc.pdf
| pron wrote:
| > Haskell has a somewhat unique garbage collector based on
| the fact that all data is immutable. They can take shortcuts
| because older references can't refer to newer references.
|
| When you don't mutate in OpenJDK you get essentially the
| same. Much of the cost of a modern GC (OpenJDK's G1, and soon
| probably ZGC, too) is write barriers, that need to inform the
| GC about reference mutations. If you don't mutate, you don't
| pay that cost. This is partly why applications that go to the
| extreme in the effort not to allocate and end up mutating
| more, might actually do worse than if they'd allocated more
| with OpenJDK's newer GCs.
|
| In fact, OpenJDK's GCs rely heavily on the assumption that
| old objects can't reference newer ones unless explicitly
| mutated, and so require those barriers only in old regions.
| carry_bit wrote:
| In general if you want the highest throughput you'll also get
| long pause times, since the techniques to reduce the max pause
| times depend on inserting barriers into the application code.
|
| Ignoring pause times is fine for batch processing, but not
| ideal for interactive systems.
| BenoitP wrote:
| There is an overhead: They use higher bits in the address space
| to indicate various stages in the object's collection
| (Shenandoah has a forwarding pointer IIRC).
|
| This means you may not activate the compressed pointers
| optimization.
| BenoitP wrote:
| > you can expect to see average GC pause times of around 0.05ms
| (50 us)
|
| This is nuts (and well below OS jitter)
| coldtea wrote:
| Question: is the GC suitable for use with something like Idea or
| is it more for server workloads? Would it reduce UI GC-pauses lag
| accordingly?
| perennus wrote:
| I tried it last week actually with OpenJDK15+Windows. With
| JDK16, IntelliJ didn't boot.
|
| -XX:+UseZGC: Memory usage for my project dropped to a constant
| 600 megs. Using the IDE felt just as fast as the normal
| experience.
|
| -XX:+UseShenandoahGC, -Xmx4g. Shenandoah GC used a constant 4
| gigs of ram. It was a slower user experience for me.
|
| In the end, I went back to the default settings, because the
| custom JDK changes the look and feel and I don't like it.
| coldtea wrote:
| > _-XX:+UseZGC: Memory usage for my project dropped to a
| constant 600 megs. Using the IDE felt just as fast as the
| normal experience._
|
| Shame, I hoped it would feel faster than the normal
| experience, with (even infrequent) user-felt GC pauses
| completely eliminated.
| AnthonBerg wrote:
| I confirm that ZGC works with IntelliJ IDEA, and it seems to me
| that it makes IDEA respond quite a bit faster. It's not hard to
| get IntelliJ IDEs to use ZGC by editing the VM properties file.
| bestinterest wrote:
| This might be an odd question but how often does garbage
| collection run and what's the usual time taken over a period of
| time?
|
| Say I'm doing a drawing/game app and creating a few hundred heap
| objects a second that need to get garbage collected.
|
| I have no idea how often GC is run on a typical app and how
| much real time it takes over, say, an hour of a semi-complex app
| running on average. It obviously depends on the app but I do not
| even have a number for the average cost of a GC language for
| some typical web app.
|
| I only know 'GCs are bad' because of the 100s of Hacker News
| comments dismissing languages for having a GC, rather than
| from hard examples of them eating up time.
| brokencode wrote:
| GC can be very efficient when considering the average cost over
| time, and is faster than reference counting for instance. It
| also can have nice features such as heap compaction which you
| can't easily do with manual memory management.
|
| But the main thing most folks have problems with is the random
| latency spikes you get with GC. The GC can start at any time in
| most languages, and might stop all threads in your program for
| maybe dozens or hundreds of ms. This would be visible to users
| if you are rendering frames at a constant rate in a game, since
| each frame takes only around 16 ms in a 60 FPS game.
|
| That's what's exciting about changes like what they are doing
| with ZGC. They are saying the max garbage collection time is
| 0.5 ms in normal situations, and the average time is even
| lower. Most games can accommodate that without a problem.
|
| FYI, this is important for web servers as well. Some web
| servers have a huge amount stored in memory, and the GC could
| take hundreds of ms or even multiple seconds to collect at
| random times in extreme cases. This can make a web request take
| perceptibly longer.
|
| Also, if you have multiple machines communicating with one
| another and randomly spiking in latency due to GC, then worst
| case latency can add up to pretty terrible numbers if you are
| not careful.
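A back-of-the-envelope Java sketch of how pauses compound across machines. It assumes each hop pauses independently with the same probability and that hops are visited one after another; both are simplifying assumptions, and FanOut and its methods are hypothetical names.

```java
/** Hypothetical model of a request that passes through several GC'd services. */
final class FanOut {
    /** Chance that at least one of `hops` services is in a GC pause during the request. */
    static double chanceOfAtLeastOnePause(double perHopPauseProbability, int hops) {
        return 1.0 - Math.pow(1.0 - perHopPauseProbability, hops);
    }

    /** Worst-case end-to-end latency if every hop hits its maximum pause. */
    static double worstCaseMillis(double baseMillisPerHop, double maxPauseMillis, int hops) {
        return hops * (baseMillisPerHop + maxPauseMillis);
    }

    public static void main(String[] args) {
        // Illustrative inputs only: 1% pause chance per hop, 2 ms base, 100 ms max pause.
        System.out.println(chanceOfAtLeastOnePause(0.01, 10));
        System.out.println(worstCaseMillis(2.0, 100.0, 10));
    }
}
```

The input numbers are illustrative, not measurements. The point is only that both the odds of hitting a pause and the worst case grow with the number of hops.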
| adamdusty wrote:
| After some research I couldn't really find much of an answer.
|
| The thing about GC is you either don't care at all, or you
| don't want it at all. There's rarely a case where you know how
| many GC cycles you can handle in a certain period. Web dev, GC
| all you want. Games can handle GC but it's likely you'll need to
| be cognizant of memory use. Embedded stuff doesn't have enough
| memory to utilize a GC.
|
| I'm not sure why GC languages get so much hate. I do a lot with C#
| and the runtime gives a few options for controlling allocations
| and accessing memory, so I can usually get it to be fast
| enough.
| gopalv wrote:
| > Say I'm doing a drawing/game app and creating a few hundred
| heap objects a second that need to get garbage collected.
|
| Was literally my job ten years ago to optimize this and I was
| struggling with a GC'd language with a proprietary
| implementation (flash+actionscript).
|
| The problem is not with hundreds of heap objects per-frame, the
| problem is that they would accumulate to the tens of thousands
| before the first GC trigger happens.
|
| And the GC trigger might happen in the middle of drawing a
| frame, even worse, at the end of drawing a frame (which means
| even a 10ms pause means you miss the 16ms frame window at
| 60fps).
|
| The problem that most people had was that this was unevenly
| distributed and janky to put it in the lingo. So you'd get 900
| frames with no issues and a single frame that freezes.
|
| So most of the problem people have with GC pauses is the
| unpredictability of it and the massive variations in the 99th
| percentile latency in the system, making it look slower than it
| actually is.
|
| Most of the original GC implementations scaled poorly as the
| memory sizes went up and the amount of possible garbage went
| up, until the GC models started switching over to garbage-first
| optimizations, thread-local alloc buffers and survivor
| generation + heap reserves etc (i.e. we have lots of memory, our
| problem is with the object walking overheads - so small objects
| with lots of references are bad).
|
| The GC model is actually pretty okay, but it is still
| unpredictable enough that tuning the GC or building an
| application on top of a GC'd language which has strict latency
| requirements is hard.
|
| However, as a counterpoint - OpenHFT.
|
| Clearly it is possible, but it takes a lot of alignment across
| all the system layers, but at that point you might as well
| write C++ because it is not portable enough to run anywhere.
| jankotek wrote:
| It really depends on the application and the complexity of the
| object graph. Short-lived objects usually have low overhead.
| Long-lived objects with a huge heap may cause a problem.
|
| In the past GC had a bad reputation for increased and
| unpredictable latencies. In old JVMs GC would pause execution to
| traverse the object graph.
|
| In general do not worry about GC, unless you run into
| performance issues. If performance is a problem, run a continuous
| profiler such as Flight Recorder. It has very little overhead.
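Before reaching for a profiler, a rough answer to "how often does GC run and how long does it take" can be read from the JVM's standard management beans. A minimal Java sketch follows; the GcStats class and the allocation loop are illustrative only. Note that for concurrent collectors the reported time is not necessarily time the application spent paused.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Hypothetical helper that sums the counters exposed by the running collectors. */
final class GcStats {
    /** Collections so far across all collectors; "undefined" (-1) values are skipped. */
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionCount());
        }
        return total;
    }

    /** Approximate accumulated collection time in milliseconds across all collectors. */
    static long totalCollectionMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            total += Math.max(0L, gc.getCollectionTime());
        }
        return total;
    }

    public static void main(String[] args) {
        // Churn through short-lived objects so the collector has something to do.
        long checksum = 0;
        for (int i = 0; i < 5_000_000; i++) {
            byte[] garbage = new byte[256];
            garbage[0] = (byte) i;
            checksum += garbage[0];
        }
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.println("checksum " + checksum);
    }
}
```

Running the same program under different collectors (for example with -XX:+UseZGC, mentioned elsewhere in this thread) gives a first impression of how the counts and times differ for a given workload.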
| _ph_ wrote:
| And in most cases it isn't GC which is the problem, but the
| program doing too many heap allocations. Cutting heap
| allocations down improves the speed of most programs, with or
| without GC.
| dignan wrote:
| GC is a memory management technique with tradeoffs like all the
| others.
|
| GC has many different implementations, with widely ranging
| properties. For example, the JVM itself currently supports at
| least 3 different GC implementations. There are also different
| types of GC's, so for example in a generational garbage
| collection system you'll typically see two or three generations
| of GCs, depending on the generation (how many GC cycles it has
| survived) of the objects it collects. The shortest GC's in
| those systems are usually a couple milliseconds, while the
| longest ones can be many seconds.
|
| GC isn't always a problem. If your application isn't latency
| sensitive, it's not a big deal. Though if you tune your network
| timeouts to be too low, even something that is not really
| latency sensitive can have trouble because of GC causing
| network connections to timeout. Even if it is a latency
| sensitive application, if GC "stop the world" pauses (pauses
| that stop program execution) are short, it can be OK.
|
| One reason you'll see people say GCs are bad is for those
| latency sensitive applications. For example, I previously
| worked on distributed datastores where low latency responses
| were critical. If our 99th percentile response times jumped
| over say 250ms, that would result in customers calling our
| support line in massive numbers. These datastores ran on the
| JVM, where at the time G1GC was the state of the art low-
| latency GC. If the systems were overloaded or had badly tuned
| GC parameters, GC times could easily spike into the seconds
| range.
|
| Other considerations are GC throughput and CPU usage. GC
| systems can use a lot of CPU. That's often the tradeoff you'll
| see for these low-latency GC implementations. GC's also can put
| a cap on memory throughput. How much memory can the GC
| implementation examine with how much CPU usage with what amount
| of stop-the-world time tends to be the nature of the question.
| geodel wrote:
| Sub-millisecond GC pause is very impressive. Though one thing
| that is not clear to me is whether it is true only for very
| large heaps or whether it will also be great for typical
| service/microservice heaps in the range of 4-32 GB.
| chrisseaton wrote:
| I think the whole point is the pause time doesn't vary with the
| heap size.
| pradeepchhetri wrote:
| It works great even for large heap sizes. I moved my ES cluster
| (running with around 92G heap size) from G1GC to ZGC and saw
| huge improvements in GC. Best part about ZGC is you don't need
| to touch any GC parameter and it autotunes everything.
| JD557 wrote:
| >running with around 92G heap size
|
| I'm curious about this choice. The elasticsearch
| documentation recommends a maximum heap slightly below 32GB
| [1].
|
| Is this not a problem anymore with G1GC/ZGC, or are you
| simply "biting the bullet" and using 92G of heap because you
| can't afford to scale horizontally?
|
| 1: https://www.elastic.co/guide/en/elasticsearch/reference/7.11...
| capableweb wrote:
| > because you can't afford to scale horizontally?
|
| Doesn't have to be because of affordability but rather it's
| more efficient and cheaper to scale vertically first, both
| in monetary costs and in time/maintenance costs.
| vosper wrote:
| On hardware, but not on a cloud setup? We run several
| hundred big ES nodes on AWS, and I believe we stick to
| the heap sizing guidelines (though I've long wondered if
| fewer instances with giant heaps might actually work ok,
| too)
| toast0 wrote:
| Cloud is trickier to price than real hardware. On real
| hardware, filling the ram slots is clearly cheaper than
| buying a second machine, if ram is the only issue. If you
| need to replace with higher density ram, sometimes it's
| more cost effective to buy a second machine. Adding more
| processor sockets to get more ram slots is also sometimes
| more, sometimes less cost effective than adding more
| machines. Often, you might need more processing to go
| with the ram, which can change the balance.
|
| In cloud, with defined instance types, usually more ram
| comes with more everything else, and from pricing listed
| at https://www.awsprices.com/ in US East, it looks like
| within an instance type, $ / ram is usually consistent.
| The least expensive (per unit ram) class of instances is
| x1/x1e which are 122 GB to 3904 GB, so that does lean
| towards bigger instances being cost effective.
|
| Exceptions I saw are c1.xlarge is less expensive than
| c1.medium, c4.xlarge is less than other c4 types and c4
| is more expensive than others, m1.medium < m1.large ==
| m1.xlarge < m1.small, m3.medium is more expensive than
| other m3, p2.16xlarge is more expensive than other p2,
| t2.small is less expensive than other t2. Many of these
| differences are a tenth of a penny per hour though.
| legerdemain wrote:
| Heaps "slightly below 32GB" are usually because of the
| -XX:+UseCompressedOops option, which allows Java to address
| up to 32GB of memory with a smaller pointer. Between
| 32-35GB of heap, you're just paying off the savings you
| would have gotten with compressed object pointers, but if
| you keep cranking your heap further after that, you'll
| start getting benefits again.
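The 32 GB boundary falls out of simple arithmetic, assuming 32-bit compressed references and an 8-byte object alignment (both are assumptions of this sketch; the alignment is configurable in HotSpot). CompressedOops and maxAddressableBytes are hypothetical names.

```java
/** Hypothetical helper: how much heap can a compressed reference reach? */
final class CompressedOops {
    /**
     * A compressed reference stores an object index rather than a byte address,
     * so the reachable heap is 2^refBits slots of objectAlignmentBytes each.
     */
    static long maxAddressableBytes(int refBits, int objectAlignmentBytes) {
        return (1L << refBits) * objectAlignmentBytes;
    }

    public static void main(String[] args) {
        long bytes = maxAddressableBytes(32, 8);
        System.out.println(bytes / (1L << 30) + " GiB");
    }
}
```

Above that limit, references go back to 8 bytes each, which is the extra footprint described in the comment for heaps just past the threshold.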
| JanecekPetr wrote:
| This, exactly. One added issue is that ZGC does NOT
| support compressed oops at all.
| manasvi_gupta wrote:
| Please specify Elasticsearch & JDK version. Also, index size
| and heap size per node.
|
| From my experience, high heap sizes are unnecessary since
| Lucene (used by ES) has greatly reduced heap usage by moving
| things off-heap[1].
|
| [1] - https://www.elastic.co/blog/significantly-decrease-your-elas...
| pron wrote:
| Whether G1 or ZGC are the best choice depends on the workload
| and requirements, but G1 in recent JDK versions also requires
| virtually no tuning (if your G1 usage had flags _other_ than
| maximum heap size, maybe minimum heap size, and maybe pause
| target, try again without them).
| gher-shyu3i wrote:
| Did you notice a change in the peak memory usage?
| vosper wrote:
| How (and how much) did these improvements manifest? For
| example, did you measure consistently faster response times
| when running ZGC rather than G1GC? If so, by how much? I'm
| always looking for a way to improve ES response times for our
| users.
| perliden wrote:
| ZGC pause times will be the same regardless of heap size. ZGC
| currently supports heaps from 8MB to 16TB. So if you have
| 4-32GB heaps and want low latency, then ZGC is definitely
| something to try.
| eklavya wrote:
| Hey, is there any benchmark comparing throughput performance
| of ZGC vs G1 etc.? How much of a hit (performance wise) would
| one take for getting this awesome pause time limit?
| kaba0 wrote:
| Here is a quite elaborate one, though it is not totally up-
| to-date:
|
| https://jet-start.sh/blog/2020/06/09/jdk-gc-benchmarks-part1
| geodel wrote:
| Ah, you are the author of article :). Thanks for replying!
| Does ZGC compromise on throughput compared to G1 to achieve
| low pause times?
| perliden wrote:
| ZGC in its current form trades a bit of throughput
| performance for better latency. This presentation provides
| some more details and some performance numbers (a link to
| the slides is also available there).
| https://malloc.se/blog/zgc-oracle-developer-live-2020
___________________________________________________________________
(page generated 2021-03-23 23:01 UTC)