Abstract: Java 8 magically makes all our code run in parallel. If you believe that, I've got a tower in Paris that I'm trying to sell. Really good price. In this newsletter we look at how the parallelism is determined and how we can influence it.
Welcome to the 220th issue of The Java(tm) Specialists' Newsletter, sent to you from the island of the Minotaur. Before Crete, I lived in South Africa for 35 years. I grew up in the surreal system of "apartheid". Everybody was grouped according to the pigmentation of their skin (paper bag test) and the curliness of their hair (pencil test). Low melanin levels made you "European". Otherwise you were "Non-European". Visitors from the USA would get confused and join the much longer "Non-European" queue. Since my family have lived in Africa since 1870, I never qualified for European citizenship. I am African, even though I look and sound German. Even German passport officials find it hard to believe that I am not German. And now I am Greek, thanks to the amazing generosity of the Hellenic State. Similarly, people have a hard time believing that I am, even when I show them my Greek ID. The look on their faces is priceless when I start speaking their language :-) If you are thinking of emigrating to Greece - I also did a PhD in Computer Science and that was easier and took less time.
Java 8 has been with us for two months and I'm getting requests from companies to deliver training on how to program with the new constructs. When I ask how soon they will be writing production code in Java 8, the answer is universally "not yet, we're just looking at it for now". Whenever a new major version is released, it takes a while for good coding idioms to be established. It was the same with Java 5 generics. Initially, programmers tried all sorts of very complicated things. I am guilty myself of doing things that were beyond the original design (see Strategy Pattern with Generics). But over the years we have started using them more sparingly and they are now most commonly used only to make collections a bit safer.
I believe we will have the same experience with Java 8, especially some of the cooler features, such as lambdas and parallel streams. The promise is that our code will magically run faster. The stream will automatically flow through the Fork/Join pool and be executed in parallel. I have heard a few talks already on the subject of Java 8 and they all contained mistakes on this very important topic. I will address parallel streams in more detail in a later newsletter, once I've had a chance to do some proper analysis of it. In this issue, I would like to ask a very simple question, but one that is really important because so much hinges on it. And the question is this: Where do the threads for this parallel magic come from?
In Java 8, we have a common Fork/Join pool, which we can access with ForkJoinPool.commonPool(). This is used for parallel streams, parallel sorting, CompletableFuture, etc. When you construct a Fork/Join pool, you do not specify the maximum number of threads. You instead specify a desired parallelism, which says how many active threads you would like to run at the same time. When a thread blocks on a phaser, another thread is created to keep the pool at the required active thread count. The phaser is the only synchronizer that will cause this behaviour. A Fork/Join pool has a maximum of 32767 threads, but on most operating systems the JVM will fail with an OutOfMemoryError long before this number is reached. In this sample code, I fork new RecursiveActions until we reach the first phase (after 200 threads have arrived). If we increase the number of parties to a much larger value, say for example to 100_000, then this code will fail.
```java
import java.util.concurrent.*;

public class PhaserForkJoin {
  public static void main(String... args) {
    ForkJoinPool common = ForkJoinPool.commonPool();
    Phaser phaser = new Phaser(200);
    common.invoke(new PhaserWaiter(phaser));
  }

  private static class PhaserWaiter extends RecursiveAction {
    private final Phaser phaser;

    private PhaserWaiter(Phaser phaser) {
      this.phaser = phaser;
      // print the current number of worker threads in the common pool
      System.out.println(ForkJoinPool.commonPool().getPoolSize());
    }

    protected void compute() {
      if (phaser.getPhase() > 0) return; // we've passed first phase
      PhaserWaiter p1 = new PhaserWaiter(phaser);
      p1.fork();
      phaser.arriveAndAwaitAdvance(); // blocks until all parties have arrived
      p1.join();
    }
  }
}
```
The Fork/Join pool thus does not have a practical maximum number of threads, only the desired parallelism, which shows us how many concurrently active threads we should allow.
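To make the distinction between desired parallelism and thread count a bit more concrete, here is a minimal sketch. The class name DesiredParallelism and the helper parallelismOf() are made up for illustration. It builds a private pool and asks it for the parallelism it was constructed with. As far as I can tell, worker threads are only started once work arrives, so the pool size printed in main() is not determined by the constructor argument.

```java
import java.util.concurrent.ForkJoinPool;

class DesiredParallelism {
  // Hypothetical helper: builds a private pool with the given desired
  // parallelism and returns the value the pool reports back.
  static int parallelismOf(int desired) {
    ForkJoinPool pool = new ForkJoinPool(desired);
    try {
      return pool.getParallelism();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String... args) {
    ForkJoinPool pool = new ForkJoinPool(4);
    // the desired parallelism we asked for
    System.out.println("parallelism = " + pool.getParallelism());
    // the number of worker threads that currently exist in the pool
    System.out.println("pool size = " + pool.getPoolSize());
    pool.shutdown();
  }
}
```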
Having a common pool is great, because it means that we can share the same pool for different types of jobs, without exceeding the total desired parallelism of the machine that the code is running on. Of course, if one of the threads blocks in any way besides with a Phaser, then this common pool will not perform nearly as well as was hoped.
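A quick way to see this sharing in action is to record which threads execute a parallel stream. The sketch below is only an illustration (WhoRunsMyStream and threadNames() are hypothetical names). On the OpenJDK builds I have looked at, I would expect the names to be the calling thread plus workers whose names start with "ForkJoinPool.commonPool-worker-", but treat that naming as an assumption, not a guarantee.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.IntStream;

class WhoRunsMyStream {
  // Hypothetical helper: collects the names of all threads that
  // processed at least one element of a parallel stream.
  static Set<String> threadNames() {
    Set<String> names = ConcurrentHashMap.newKeySet();
    IntStream.range(0, 10_000)
        .parallel()
        .forEach(i -> names.add(Thread.currentThread().getName()));
    return names;
  }

  public static void main(String... args) {
    threadNames().forEach(System.out::println);
  }
}
```

Note that the calling thread usually takes part in the work as well, so it tends to show up in the output next to the pool workers.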
The default value for the desired parallelism of the common FJ pool is Runtime.getRuntime().availableProcessors() - 1. Thus if you take a dual-core machine, the parallelism is 1, and if you try to run parallel sort with Arrays.parallelSort(), it will fall back to the ordinary Arrays.sort() method. Despite what you might have been promised during Oracle presentations, you would not see any speedup at all on a dual-core machine. (The promised speedup was even mentioned on Developer.com by someone who evidently didn't try it out themselves.)
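You can check these numbers on your own machine with a few lines of code. This is a sketch only: DefaultParallelism, expectedDefault() and sorted() are names I made up, and the lower bound of 1 in expectedDefault() is my reading of the OpenJDK implementation rather than something spelled out above.

```java
import java.util.Arrays;
import java.util.concurrent.ForkJoinPool;

class DefaultParallelism {
  // Hypothetical helper mirroring the rule "processors - 1".
  // Assumption: the result never drops below 1.
  static int expectedDefault(int processors) {
    return Math.max(1, processors - 1);
  }

  // Sorts a copy of the values with Arrays.parallelSort(); the result is
  // the same whether or not the sort actually ran in parallel.
  static int[] sorted(int... values) {
    int[] copy = values.clone();
    Arrays.parallelSort(copy);
    return copy;
  }

  public static void main(String... args) {
    int processors = Runtime.getRuntime().availableProcessors();
    System.out.println("availableProcessors = " + processors);
    System.out.println("expected default = " + expectedDefault(processors));
    System.out.println("actual parallelism = "
        + ForkJoinPool.getCommonPoolParallelism());
    System.out.println(Arrays.toString(sorted(5, 3, 1, 4, 2)));
  }
}
```

If the expected and actual values differ, the most likely explanation is that the common pool parallelism was overridden when the JVM was started.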
However, a bigger issue is that Runtime.getRuntime().availableProcessors() does not always return the value that you expect. For example, on my dual-core 1-2-1 machine, it returns the value 2, which is what I would expect. But on my 1-4-2 machine, meaning one socket, four cores and two hyperthreads per core, this method returns the value 8. However, I only have 4 cores and if the code is bottlenecked on CPU, I will have 7 threads competing for the CPU cycles instead of a more reasonable 4. If my bottleneck is on memory, then I might get a 7x speedup on the test.
But that's not all! One of our fellow Java Champions found a case where he had a 16-4-2 machine (thus with 16 sockets, each with four cores and 2 hyperthreads per core) return the value 16! Based on the results on my i7 MacBook Pro, I would have expected the value to be 16 * 4 * 2 = 128. Run Java 8 on this machine, and it will configure the common Fork/Join pool to have a parallelism of only 15. As Brian Goetz pointed out on the list, "The VM doesn't really have an opinion about what a processor is; it just asks the OS for a number. Similarly, the OS usually doesn't care either, it asks the hardware. The hardware responds with a number, usually the number of 'hardware threads'. The OS believes the hardware. The VM believes the OS."
Fortunately there is a workaround. On startup, you can specify the common pool parallelism with the system property java.util.concurrent.ForkJoinPool.common.parallelism. Thus we could start this code with -Djava.util.concurrent.ForkJoinPool.common.parallelism=128 and it would show us that our parallelism is now 128:
```java
import java.util.concurrent.*;

public class ForkJoinPoolCommon {
  public static void main(String... args) {
    System.out.println(ForkJoinPool.commonPool());
  }
}
```
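If you cannot change the command line, setting the property from code might also work, with a big caveat. My assumption is that the property is read only once, when the common pool is first initialized, so the call has to happen before anything in the JVM has touched ForkJoinPool. The sketch below (CommonPoolProperty and configure() are hypothetical names) sets the property and then reports the parallelism that the common pool actually ended up with.

```java
import java.util.concurrent.ForkJoinPool;

class CommonPoolProperty {
  static final String KEY =
      "java.util.concurrent.ForkJoinPool.common.parallelism";

  // Hypothetical helper. Assumption: this only influences the common pool
  // if it runs before the pool has been initialized; afterwards the
  // property is set but the pool keeps its earlier parallelism.
  static int configure(int parallelism) {
    System.setProperty(KEY, Integer.toString(parallelism));
    return ForkJoinPool.getCommonPoolParallelism();
  }

  public static void main(String... args) {
    System.out.println("requested 6, got " + configure(6));
    System.out.println(ForkJoinPool.commonPool());
  }
}
```

Since it is hard to guarantee that no library has used the common pool yet, the -D flag on the command line remains the more dependable option.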
We have two additional system properties for controlling the common pool. If you would like to handle uncaught exceptions, you could specify the handler class with java.util.concurrent.ForkJoinPool.common.exceptionHandler. And if you would prefer to have your own thread factory, that is configured with java.util.concurrent.ForkJoinPool.common.threadFactory. The default thread factory for the Fork/Join pool uses daemon threads, which you might want to avoid in your application. Be careful if you do that - you cannot shut down the common pool!
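Both properties take class names, so it helps to see what such classes might look like. In this sketch, NamedFactory and LoggingHandler are hypothetical classes. I am assuming that the JVM needs them to be public, to have a public no-argument constructor and to be loadable by the system class loader. To keep the example runnable without any JVM flags, I pass them to the constructor of a private pool instead of the common pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.ForkJoinWorkerThread;
import java.util.concurrent.atomic.AtomicInteger;

class CustomFactoryDemo {
  // Hypothetical factory: delegates to the default factory and only
  // changes the name of the worker thread.
  public static class NamedFactory
      implements ForkJoinPool.ForkJoinWorkerThreadFactory {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public ForkJoinWorkerThread newThread(ForkJoinPool pool) {
      ForkJoinWorkerThread thread =
          ForkJoinPool.defaultForkJoinWorkerThreadFactory.newThread(pool);
      thread.setName("custom-fj-" + counter.incrementAndGet());
      return thread;
    }
  }

  // Hypothetical handler: logs the failure of a worker thread.
  public static class LoggingHandler
      implements Thread.UncaughtExceptionHandler {
    @Override
    public void uncaughtException(Thread t, Throwable e) {
      System.err.println(t.getName() + " died with " + e);
    }
  }

  // Runs one task in a private pool and returns the worker's name.
  static String workerName() {
    ForkJoinPool pool = new ForkJoinPool(
        2, new NamedFactory(), new LoggingHandler(), false);
    try {
      Callable<String> task = () -> Thread.currentThread().getName();
      return pool.submit(task).join();
    } finally {
      pool.shutdown();
    }
  }

  public static void main(String... args) {
    System.out.println(workerName());
  }
}
```

Because the factory delegates to the default one, the workers stay daemon threads. A factory that creates non-daemon threads is where the warning about not being able to shut down the common pool starts to matter.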
Whilst I was writing this newsletter, the article "What's Wrong in Java 8, Part III: Streams and Parallel Streams" by Pierre-Yves Saumont arrived in my inbox. In it, he writes: "... by default, all streams will use the same ForkJoinPool, configured to use as many threads as there are cores in the computer on which the program is running." However, as we saw in this newsletter, the default number is typically the number of hardware threads minus one, not cores! In addition, it might sometimes instead be the number of sockets minus one. He does point out an issue, in that once the pool has reached the desired parallelism level, it will no longer create new threads, even if there are lots of tasks waiting. Imagine for example a Java 8 application server that extensively uses parallel streams. They would all share the same common pool, potentially causing a bottleneck in the application server!
It is still early days for Java 8 adoption and I'm in no great hurry to rush a course out the door. You might find this hard to believe, but some of my customers still use Java 1.4.2. I even have someone coding Java 1.1. We are seeing the more adventurous customers slowly moving over to Java 7 now. It is a pity, as I really like some of the syntactic sugar of Java 8 and will be writing some newsletters soon about my findings.
Kind regards from Crete
Heinz