Abstract: Apparently Full GC is done in parallel in the Java 10 G1 collector. Or is it? In this newsletter we set out to discover the truth by dumping the GC CPU usage with the new Unified JVM Logging.
Welcome to the 256th edition of The Java(tm) Specialists' Newsletter, sent to you from a sunny Crete. Today is a special day, for at least two reasons. First off, we have a lunar phenomenon that occurs once every 19 years. My teenage daughter noticed that there was no full moon in February. This means we had two full moons in January and also two full moons in March. You have heard the expression once in a blue moon? The moon does not really become blue. Neither is it made of cheese. And yes, it is always round, Silke. When two full moons happen in one calendar month, we call this a "blue moon". Today is the first full moon after the spring equinox, called Paschal Moon (not Pascal or Delphi or Java Moon!). It is also a blue moon. Thus we have a blue Paschal Moon. Next time will be in 2037. Then in 2056. It is rare. Enjoy it - go for a walk in the night, or even better, run 5km. Watch your step though and take along some silver bullets. You never know. I will be wearing my trusty NAO+ Petzl headlamp [ISBN B01GFPMEXY] for the rough patches.
The second reason today is special is that this is newsletter number 1<<8. I doubt I will get to 1<<9 in my lifetime and 1<<10 is certainly out of my reach.
Oh and of course, Catholic Easter is tomorrow, happy Easter to my non-Orthodox Christian friends :-) In Orthodox Christianity, we commemorate Easter one week later this year. Tomorrow is "Palm Sunday" for us, followed by "Holy Week" when we all fast from meat. Sobering time, yet my favourite religious holiday.
javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.
The G1 garbage collector was infamous for doing a single-threaded full GC cycle. At the time when you need all the hardware that you can muster to scrounge for unused objects, we bottlenecked on a single thread. In Java 10 they fixed this. The full GC now runs with all the resources that we throw at it.
To demonstrate this, I wrote ObjectChurner, which creates a bunch of different sized byte arrays. It holds onto the objects for some time. The sizes are randomized, but in a controlled, repeatable way.
import java.util.*; import java.util.stream.*; public class ObjectChurner { // 4GB heap, 2048 regions, 2mb each // humongous objects: 10*1mb, 5*2mb, 3*10mb, 1*50mb // largish objects: 10_000*100kb, 10_000*10kb, 100_000*1kb // smaller objects: 100_000*100, 100_000*10, 100_000*1 private final int[] sizes; private int nextPos = 0; private final byte[][] data = new byte[1 * 1024 * 1024][]; public ObjectChurner() { List<Integer> sizes = new ArrayList<>(); // humongous objects add(sizes, 10, 1024 * 1024); add(sizes, 5, 2 * 1024 * 1024); add(sizes, 3, 10 * 1024 * 1024); add(sizes, 1, 50 * 1024 * 1024); // largish objects: add(sizes, 10_000, 100 * 1024); add(sizes, 10_000, 10 * 1024); add(sizes, 100_000, 1 * 1024); // largish objects: add(sizes, 100_000, 100); add(sizes, 100_000, 10); add(sizes, 100_000, 1); Collections.shuffle(sizes, new Random(0)); this.sizes = sizes.stream() .mapToInt(Integer::intValue) .toArray(); } private void add(List<Integer> sizes, int number, int size) { for (int i = 0; i < number; i++) { sizes.add(size); } } private void churn() { for (int i = 0; i < 10; i++) { for (int size : sizes) { data[nextPos++ & (data.length-1)] = new byte[size]; } System.out.println("Next cycle"); } } public static void main(String... args) { ObjectChurner churner = new ObjectChurner(); churner.churn(); } }
Java 9 introduced Unified VM Logging, an absolutely amazing change that allows us to create logs on just about anything that happens in the JVM. Would you like to know when threads are started? Yep, you can create logs for that. You can turn the logs on and off at runtime with jcmd. I am running a webinar on Tuesday 3rd April 2018 at 08:00 UTC on how we can use jcmd to get information about our string constant pool. The webinar is free for anyone enrolled in Heinz's Happy Hour Season 02.
To demonstrate the single-threadedness of the full GC in
Java 9, run the code above with the following JVM parameters:
-Xlog:gc,gc+cpu::uptime -Xmx4g -Xms4g -Xlog:gc*:details.vgc
This will output each GC event and its CPU usage to stdout, showing only the uptime
as a tag.
The setting -Xlog:gc*
is like the
-XX:+PrintGCDetails of previous Java versions. We will write
this to the file details.vgc.
When I run this code with Java 9.0.4, I see lines that look like this:
[5.585s] GC(54) Pause Young 3860M->3852M(4096M) 25.170ms [5.585s] GC(54) User=0.13s Sys=0.00s Real=0.03s
The first line tells us it was a pause in the young generation, it shows how much memory was collected and how long it took. This GC was a G1 Evacuation Pause. The next line shows user time, system time and real time. Real time is how long the pause took if we measured it with a stopwatch. We also call this "wall clock time". User time is how much cpu was burned up across all the cores. In our example, we know that the real time was precisely 25.170ms. User time was 130 something milliseconds. My machine is 1-4-2, thus a single socket, four cores, two hyperthreads per core. We can see from the ratio of user/real time that we were collecting in parallel.
A while ago, I was tuning an application that had a 17 second GC pause in young space. Real time was thus 17 seconds. However, the user time was 29 milliseconds. Even though it showed up in the GC log as a pause, it was not the garbage collector that was taking this long. It might have been a virus scanner or a backup system or a network fault or a counted loop that caused the threads in the JVM to take a long time to reach a safepoint. Keeping an eye on the user/real ratio is a good idea.
Scanning down the output some more, we see these lines:
[6.275s] GC(56) Pause Full 4032M->3286M(4096M) 667.419ms [6.275s] GC(56) User=0.77s Sys=0.01s Real=0.67s
The full GC thus took 667.419 milliseconds. However, if we look at the CPU usage, we discover that the user time is very close to the real time. Hardly any parallelism was happening.
We now run my ObjectChurner in Java 10, with the exact same settings. The Pause Young looks much the same:
[4.839s] GC(55) Pause Young 3658M->3653M(4096M) 29.201ms [4.839s] GC(55) User=0.14s Sys=0.01s Real=0.03s
But the Pause Full now has a user/real ratio of 5.45, demonstrating that indeed, Java 10 uses all my cores to take out the trash:
[5.423s] GC(59) Pause Full 4028M->3286M(4096M) 513.026ms [5.423s] GC(59) User=2.80s Sys=0.04s Real=0.52s
In our toy example ObjectChurner, we deliberately put
unrealistic stress on the garbage collector in order to elicit
Pause Full without resorting to System.gc()
calls. We violated the "weak generational hypothesis" by
holding onto objects for too long. This would flatten any
garbage collector. The purpose of the experiment was to see
whether Full GC was really done in parallel in Java 10. It is.
The transition from Java 4 to 5 was huge. Java 5 to 6 was minor. Java 6 to 7 was medium. Java 7 to 8 was huge, especially if we wanted to take advantage of Java 8 Streams. Java 8 to 9 is gigantic for framework providers that want to be backwards compatible. Fortunately the migration is not too severe for typical application developers. Java 9 to 10 is small. It might be worthwhile to skip Java 9 and migrate directly to Java 10 or 11.
Kind regards from Crete
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)
We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.