[HN Gopher] JVM statistics cause garbage collection pauses (2015)
___________________________________________________________________
JVM statistics cause garbage collection pauses (2015)
Author : tosh
Score : 65 points
Date : 2024-09-19 14:05 UTC (8 hours ago)
(HTM) web link (www.evanjones.ca)
| ta988 wrote:
| Is it still the case?
| lbalazscs wrote:
| In 2015 there was no ZGC. Today ZGC (an optional garbage
| collector optimized for latency) guarantees that there will be
| no GC pauses longer than a millisecond.
| survivedurcode wrote:
| I would check your answer. These are pauses due to time spent
| writing to diagnostic outputs. These are not traditional
| collection pauses. This affects both jstat as well as writes
| of GC logs. (I.e. GC log writes will block the app just the
| same way)
| pjmlp wrote:
| Which is why for anything serious one should be using
| Flight Recorder instead.
| hawk_ wrote:
| ZGC doesn't remove safepoint requests on threads which is the
| root cause. "Guarantees" here are with very heavy quotes.
| kanzenryu2 wrote:
| Sadly in many cases no; it's not magic. This nirvana is
| restricted to cases where there is CPU bandwidth available
| (e.g. some cores idle) and plenty of free RAM. When either
| CPU or RAM is less plentiful... hello pauses my old friend.
| sunshowers wrote:
| This is why memory-bound services generally use languages
| without mandatory GC. Tail latency is a killer.
|
| Rust's memory management does have some issues in practice
| (large synchronous drops) but they're relatively minor and
| easily addressed compared to mandatory GC.
| esaym wrote:
| These modern garbage collectors are not simply free though. I
| got bored last year and went on a deep dive with GC params
| for Minecraft. For my needs I ended up with:
| -XX:+UseParallelGC -XX:MaxGCPauseMillis=300 -Xmx2G -Xms768M
|
| When flying around in spectator mode, you'd see 3 to 4
| processes using 100%. Changing to more modern collectors just
| added more load to the system. ZGC was the worst, with 16+
| processes all using 100% cpu. With the ParallelGC, yes you'll
| get the occasional pause but at least my laptop is not
| burning hot fire.
| namibj wrote:
| You'll need more spare heap for ZGC.
| ackfoobar wrote:
| And using generational ZGC will probably lower CPU usage
| a lot.
| plandis wrote:
| Yes, no GC is free (well, perhaps Epsilon comes close :)
|
| It's a low pause GC so latencies, particularly tail
| latencies, can be more predictable and bounded. The
| tradeoff you make is that it uses more CPU time and memory
| in order to operate.
| hinkley wrote:
| A GC implementation that avoids ineffective GC activity is less
| affected by the cost of statistics gathering (no news is good
| news), but it is still affected.
| ackfoobar wrote:
| Probably yes.
|
| https://bugs.openjdk.org/browse/JDK-8076103
|
| Closed with "Won't Fix".
| flykespice wrote:
| ...With no reasoning at all?
| ackfoobar wrote:
| A bit more context in the mailing list:
|
| > It's a non-issue with a pure ram-based file system. Or
| tmpfs with no swap.
|
| https://mail.openjdk.org/pipermail/hotspot-runtime-dev/2015-...
| smrtinsert wrote:
| Is this account a submission bot of some sort?
| throwaway04324 wrote:
| The account seems to be connected to a real person, but it has
| a high number of submissions (over 350 submissions the past 30
| days)
| geodel wrote:
| Also spending a lot cause higher credit card bills.
| cogman10 wrote:
| > in /tmp
|
| Why is `/tmp` on disk and not a tmpfs mount?
| sltkr wrote:
| There is no law that says /tmp must be on tmpfs, and
| historically this wasn't done, because tmpfs is limited in size
| to some fraction of the kernel's memory, while /tmp may be used
| to store much larger files.
|
| For example, GNU sort can sort arbitrarily large input files,
| which is implemented by splitting the input into sorted chunks
| that are written to a temporary directory, /tmp by default. But
| this is based on the assumption that /tmp can store
| significantly larger files than fit in memory, otherwise the
| point is moot. So using tmpfs makes /tmp useless for this type
| of operation.
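
A minimal sketch of the chunk-and-merge approach described above. It is not GNU sort's actual implementation; the function name `external_sort` and the `chunk_size` and `tmp_dir` parameters are made up for illustration. It assumes every input line ends with a newline.

```python
import heapq
import os
import tempfile


def external_sort(lines, chunk_size, tmp_dir=None):
    """Yield newline-terminated strings in sorted order using bounded memory.

    At most `chunk_size` lines are held in memory at once. Each sorted chunk
    is spilled to a file under `tmp_dir` (the system default temporary
    directory when None), and the chunk files are then merged lazily.
    """
    chunk_paths = []
    chunk = []

    def spill():
        # Sort the in-memory chunk and write it to its own temporary file.
        chunk.sort()
        fd, path = tempfile.mkstemp(dir=tmp_dir, suffix=".chunk")
        with os.fdopen(fd, "w") as f:
            f.writelines(chunk)
        chunk_paths.append(path)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= chunk_size:
            spill()
    if chunk:
        spill()

    files = [open(p) for p in chunk_paths]
    try:
        # heapq.merge pulls one line at a time from each sorted chunk file.
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()
        for p in chunk_paths:
            os.remove(p)
```

The total size of the chunk files equals the size of the input, which is why the temporary directory has to hold more than fits in memory for this to be useful.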
|
| In the end, it's a trade-off between performance and disk
| space. I also prefer to mount /tmp on tmpfs for performance
| reasons, but you should not assume that this is the case on all
| systems.
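
Rather than assuming, a Linux system can be checked. The sketch below is one hedged way to do it: the helper name `filesystem_type` is hypothetical, and it expects text in the `/proc/mounts` layout (device, mount point, filesystem type, options, ...).

```python
def filesystem_type(path, mounts_text):
    """Return the filesystem type of the mount that contains `path`.

    `mounts_text` is text in /proc/mounts format. The longest matching
    mount point wins; on a tie, the later entry wins.
    Returns None if no mount point matches.
    """
    best_mount, best_type = "", None
    wanted = path.rstrip("/") + "/"
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        # Compare with trailing slashes so "/tmp" does not match "/tmpfoo".
        prefix = mount_point.rstrip("/") + "/"
        if wanted.startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type
```

On Linux, `filesystem_type("/tmp", open("/proc/mounts").read())` returns `"tmpfs"` only when /tmp is a tmpfs mount.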
| aidenn0 wrote:
| While I run /tmp on disk, I should point out that tmpfs is
| not limited to the size of RAM; contents of tmpfs can be
| swapped out just like any other memory allocation.
| aidenn0 wrote:
| Why would I want it on tmpfs? Only advantage I see is slightly
| improved boot times (/tmp is typically cleared on boot, which
| is obviously not necessary for tmpfs).
| hinkley wrote:
| Slightly simpler handling for docker containers. Particularly
| if you run multiple copies of the same image on one box
| (blue-green deploys, process-per-cpu programming languages,
| etc)
| sltkr wrote:
| > The pauses occur even [..] if you call mlock
|
| I wonder how this is even possible. The only scenario I can think
| of involves a page fault on the page table itself (i.e., the page
| is locked into memory, but a page fault occurs during virtual-to-
| physical address translation). Does anyone know the real reason?
| survivedurcode wrote:
| Probably because mapped pages, even if they are locked into
| memory, are not allowed to stay dirty forever. Does this help?
| https://stackoverflow.com/a/11024388 (In contrast, if you
| mlocked but never wrote to the pages, you probably would not
| encounter read pauses)
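
A rough way to look for this effect is to time individual stores into a file-backed shared mapping. The sketch below is not the article's test program; `worst_mmap_write_latency` is a hypothetical helper. Interpreter overhead dominates the typical measurement, so only large outliers (for example under heavy disk write load) would be meaningful.

```python
import mmap
import os
import struct
import tempfile
import time


def worst_mmap_write_latency(directory, iterations=100_000):
    """Return the slowest single store (in seconds) into a file-backed mapping.

    A one-page file is created in `directory` and mapped shared, so every
    store dirties a page that the kernel must eventually write back.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.ftruncate(fd, mmap.PAGESIZE)
        worst = 0.0
        with mmap.mmap(fd, mmap.PAGESIZE) as mapping:
            for i in range(iterations):
                start = time.perf_counter()
                # Overwrite an 8-byte counter at the start of the mapped page.
                struct.pack_into("<Q", mapping, 0, i)
                worst = max(worst, time.perf_counter() - start)
        return worst
    finally:
        os.close(fd)
        os.remove(path)
```

Running it once against a disk-backed directory and once against a tmpfs directory gives a crude comparison of the two cases discussed in this thread.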
| pjmlp wrote:
| For proper statistics use Visual VM or Flight Recorder, if using
| an OpenJDK derived JVM implementation.
|
| Also note that not all JVMs are made alike, and there are plenty
| to choose from.
| hashmash wrote:
| When using the `-XX:+PerfDisableSharedMem` workaround, VisualVM
| cannot attach to the running process anymore.
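
HotSpot normally publishes its counters in a file named after the process ID, inside an `hsperfdata_<user>` directory under the temporary directory; with the flag above that file is not created. A small sketch for listing which JVMs currently expose one (the function name `jvms_exposing_perf_data` is hypothetical, and `/tmp` is assumed as the default location):

```python
import glob
import os


def jvms_exposing_perf_data(tmp_dir="/tmp"):
    """Return sorted PIDs that have a perf-data file under tmp_dir/hsperfdata_*/."""
    pids = []
    for path in glob.glob(os.path.join(tmp_dir, "hsperfdata_*", "*")):
        name = os.path.basename(path)
        # Perf-data files are named after the owning process ID.
        if name.isdigit():
            pids.append(int(name))
    return sorted(pids)
```

A JVM whose PID is missing from this list is likely to be invisible to tools that discover processes through these files.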
| jakewins wrote:
| Man I remember being bitten by this in migrating to AWS - this had
| like snuck through on fast on-prem disks, but as soon as that
| /tmp was on RDS oh boy, it was a doozy.
| hinkley wrote:
| Stuff like this is why back when I still wrote Java we only
| wanted to turn on JVM telemetry on production boxes if they were
| canaries. Slower you can work around by deploying more copies.
| But jitter is not something you can do much about.
| opentokix wrote:
| Using ebpf, perf and flamegraphs would have let him find this in
| a couple of hours. That was not available to him in 2015 tho.
___________________________________________________________________
(page generated 2024-09-19 23:01 UTC)