https://www.evanjones.ca/jvm-mmap-pause.html
The Four Month Bug: JVM statistics cause garbage collection pauses
[ 2015-March-24 16:56 ]
TL;DR: The JVM by default exports statistics by mmap-ing a file in /tmp (hsperfdata). On Linux, modifying a memory-mapped file can block until disk I/O completes, which can take hundreds of milliseconds. Since the JVM modifies these statistics during garbage collection and safepoints, this causes pauses that are hundreds of milliseconds long. To reduce worst-case pause latencies, add the -XX:+PerfDisableSharedMem JVM flag to disable this feature. This will break tools that read this file, like jstat.
Update: how I found this problem
Long version: There are few things I find more satisfying than solving a challenging bug, and this is the hardest bug I can remember. I've spent four months figuring this one out. The root cause is that on rare occasions, writing to memory that is backed by a memory-mapped file on Linux will block, waiting for disk writes to complete. This is surprising, since nothing in the code suggests that modifying a variable has anything to do with disk I/O. I wrote a test program to demonstrate this behaviour, and I was able to cause pauses on multiple file systems: ext2, ext3, ext4, btrfs (extremely common), and xfs (less common). The pauses occur even if the I/O is to a separate disk, or if you call mlock. The only workaround I've found is to place the mmap-ed file in tmpfs (a RAM disk), or to disable it completely.
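The test program boils down to timing individual writes to a memory-mapped file. Here is a minimal sketch of that idea; it is not the exact mmapwritepause program from my repository, and the file path, 10 ms threshold, and 1 ms write interval are arbitrary choices:

/* mmap_write_pause.c: minimal sketch of timing writes to a memory-mapped
 * file (not the actual mmapwritepause program).
 * Build: cc -O2 -o mmap_write_pause mmap_write_pause.c */
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define FILE_SIZE 4096                         /* one page, like a small stats file */
#define STALL_THRESHOLD_NS (10 * 1000 * 1000)  /* report writes slower than 10 ms */

static int64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (int64_t)ts.tv_sec * 1000000000 + ts.tv_nsec;
}

int main(void) {
    /* The backing file must live on a disk-backed file system (not tmpfs)
     * to reproduce the stalls. */
    int fd = open("/tmp/mmap_pause_test", O_RDWR | O_CREAT, 0644);
    if (fd < 0 || ftruncate(fd, FILE_SIZE) != 0) {
        perror("open/ftruncate");
        return 1;
    }
    void *mapping = mmap(NULL, FILE_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mapping == MAP_FAILED) {
        perror("mmap");
        return 1;
    }
    volatile int32_t *counters = mapping;

    for (int64_t i = 0;; i++) {
        int64_t start = now_ns();
        /* This is the "increment a statistics counter" step: an ordinary
         * store to mapped memory, with no explicit I/O anywhere. */
        counters[i % (FILE_SIZE / sizeof(int32_t))] = (int32_t)i;
        int64_t elapsed = now_ns() - start;
        if (elapsed > STALL_THRESHOLD_NS) {
            printf("write %lld stalled for %.1f ms\n", (long long)i, elapsed / 1e6);
        }
        struct timespec pause = {0, 1000000};  /* ~1 ms between writes */
        nanosleep(&pause, NULL);
    }
}

Run it on a file backed by a real disk, start a heavy disk writer on the same machine, and you should eventually see individual stores stall for tens or hundreds of milliseconds.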
I don't know exactly why the Linux kernel blocks these writes, but the pauses do not seem to occur for reads, which suggests the kernel is marking these pages read-only. A friend suggested this is the kernel's way of reducing the write I/O rate under overloaded conditions. If you know exactly why the kernel does this, I would love to hear it.
Since I discovered this, a number of services at Twitter have been experimenting with adding the -XX:+PerfDisableSharedMem JVM flag and are seeing a significant reduction in worst-case JVM pause latency. I highly recommend testing this out if you run a latency-sensitive service.
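Trying it is a one-flag change to the JVM command line. For example (only -XX:+PerfDisableSharedMem matters here; the GC logging flags and the jar name are just illustrative):

java -XX:+PerfDisableSharedMem \
     -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:gc.log \
     -jar myservice.jar

With GC logging enabled you can compare the real versus user+sys times of collections before and after the change.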
Symptoms
I first noticed some JVM garbage collection pauses that were taking
more real time than user plus system time. Here is an example garbage
collection log entry:
2014-12-10T12:38:44.419+0000: 58758.830: [GC (Allocation Failure)[ParNew: 11868438K->103534K(13212096K), 0.7651580 secs]
12506389K->741669K(17406400K), 0.7652510 secs] [Times: user=0.36 sys=0.01, real=0.77 secs]
The last section of this line shows that this garbage collection took
770 milliseconds of real time (real=0.77). However, the garbage
collection threads only consumed 370 milliseconds of CPU time (user=
0.36 sys=0.01). The JVM uses multiple threads for GC, so normally the
user plus system time is multiple times higher than the real time.
This suggests the kernel has blocked the GC threads for some reason.
The cause is that the JVM creates an "hsperfdata" file in /tmp (under a directory named hsperfdata_(username), with one file per JVM process), which is a memory-mapped file that records statistics about safepoints and garbage collections. When the JVM threads are reaching a safepoint, they increment some counters in this memory. Occasionally the Linux kernel blocks those threads until it finishes some amount of disk I/O.
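You can see the file for yourself, and confirm that tools like jstat read it (the PID is a placeholder):

$ ls /tmp/hsperfdata_$USER/     # one file per running JVM, named by PID
$ jstat -gcutil <pid> 1000      # reads GC statistics from that file every second

With -XX:+PerfDisableSharedMem set, jstat can no longer attach to the process this way.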
Reproducing the pauses
You just need to cause a lot of writes to a single disk while measuring the time to write to an mmap-ed file:
1. Start a machine with at least 4 CPUs to make sure there is plenty of CPU time available (I tested this on AWS and Google Cloud VMs with at least 8 CPUs and 7 GB of RAM). Use whatever Linux kernel and distribution you want; I used Ubuntu 14.10 with kernel 3.16.0-31-generic.
2. Copy the files from my reproduction GitHub repository; see the README there for additional instructions.
3. Compile and run mmapwritepause or MakeGarbage (with verbose safepoint times).
4. While it is running, cause a lot of writes with diskload.sh (sketched below).
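The actual diskload.sh is in the repository; all it needs to do is keep the disk busy with writes, roughly something like this (the path and sizes are arbitrary):

while true; do
    dd if=/dev/zero of=/tmp/diskload.tmp bs=1M count=1024 conv=fdatasync
done

Any other way of generating sustained write traffic to the same disk should work just as well.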
Related Issues
* Buffered writes can stall
* Causes of random access memory pauses
* Garbage collection page scan attacks
* Optimizing Linux Memory Management (zone reclaim)