Abstract: Whenever we have thread pools in our system, we need to make their size configurable. In this follow-up, we continue looking at some issues with the common fork/join pool that is used in Java 8.
A quick follow-up to the previous newsletter on Runtime.getRuntime().availableProcessors().
The parallelism of the common pool is
typically calculated as availableProcessors() - 1, unless we
specify it differently as a system property. However, if you run Java
on a single-core machine, it sets the common pool parallelism to 1,
whereas the more correct value would be zero. You might wonder whether
anybody on this planet still has single-core machines? Actually, it has
become vogue to give developers shared virtual development environments,
where they are allocated a single core but oodles of memory. With
virtualized boxes, it is quite likely that you will have just one core.
The problem comes in when code such as the parallelSort algorithm tries
to make a decision whether to run in parallel or to default to the old
sort mechanism. In the Arrays.parallelSort() method, they look at the
value of ForkJoinPool.getCommonPoolParallelism()
. Since
the value for a single-core and dual-core system is the same (1), it calls
the default Arrays.sort() method. I thus stand by my assertion that
the 30% performance boost claim on a dual-core machine is at best thumb
sucked.
However, for streams it works a little bit better. Since for a dual-core system, we have a Fork/Join pool of size 1, we actually will have two threads working: 1. The thread in the pool and 2. the submitting thread. I made the typical mistake where you are asked to count all the people in the room and you forget to include yourself in the total :-) So parallel() does work - sort of. It has the correct number of workers based on the value that is returned by availableProcessors(). Except of course if you have a single-core machine, in which case parallel() will use two threads instead of just the one that is mapped to a physical core.
Pierre-yves Saumont kindly sent me a piece of code that proves that parallel() defaults to the availableProcessors() value (with some slight modifications by me - any mistakes are my fault :-))
import java.util.*; import java.util.concurrent.*; import java.util.stream.*; public class CountThreads { private static Map<String, Integer> map = new ConcurrentHashMap<>(); public static void main(String... args) { System.out.println(IntStream.range(1, 1_000_000). parallel().filter(CountThreads::isPrime).count()); map.forEach((k, v) -> System.out.println(k + ": " + v)); } public static boolean isPrime(int n) { map.compute(Thread.currentThread().getName(), (k, v) -> v == null ? 1 : v + 1); if (n % 2 == 0) return false; for (int i = 3; i * i <= n; i += 2) { if (n % i == 0) { return false; } } return true; } }
On a dual-core and single-core systems, you'd see the following output:
78498 ForkJoinPool.commonPool-worker-1: 499999 main: 500000
On my 1-4-2 machine, with 8 hyperthreads, we see that 8 threads were part of the work:
78498 ForkJoinPool.commonPool-worker-7: 125000 ForkJoinPool.commonPool-worker-1: 125000 ForkJoinPool.commonPool-worker-2: 125000 main: 125000 ForkJoinPool.commonPool-worker-5: 125000 ForkJoinPool.commonPool-worker-6: 156249 ForkJoinPool.commonPool-worker-3: 125000 ForkJoinPool.commonPool-worker-4: 93750
If you measure the performance of something like parallel sort, which requires lots of memory access, you might find that your hyperthreads will give you close to linear speedup. This is because the biggest bottleneck is the speed at which the cache lines can be filled. Hyperthreading allows some of that to happen in parallel. Finding prime numbers is less dependent on memory and thus you would not see as much of a speedup beyond your physical cores.
In addition, the sorting algorithm for parallelSort() is different. For some of the data sets that I tried, a single-threaded Arrays.sort() was much faster than an 8-threaded Arrays.parallelSort(), especially if the data set was already sorted.
Lastly, I tried writing the CountThreads code with the Fork/Join framework. It came out looking like this:
import java.util.concurrent.*; public class CountThreadsForkJoin { public static void main(String... args) { System.out.println(ForkJoinPool.commonPool().invoke( new PrimeSearcher(1, 1_000_000))); } private static class PrimeSearcher extends RecursiveTask<Integer> { private static final int THRESHOLD = 10000; private final int from; private final int to; public PrimeSearcher(int from, int to) { this.from = from; this.to = to; } protected Integer compute() { if (to - from < THRESHOLD) { return serialCount(); } int from0 = from; int to0 = from + (to - from) / 2; int from1 = to0 + 1; int to1 = to; PrimeSearcher ps0 = new PrimeSearcher(from0, to0); PrimeSearcher ps1 = new PrimeSearcher(from1, to1); ps0.fork(); return ps1.invoke() + ps0.join(); } private Integer serialCount() { int count = 0; for (int i = from; i < to; i++) { if (CountThreads.isPrime(i)) count++; } return count; } } }
I would like to meet the person who thinks that Java 8 streams are not an improvement over the pain of manually coding Fork/Join:
import java.util.stream.*; public class CountThreadsStream { public static void main(String... args) { System.out.println(IntStream.range(1, 1_000_000). parallel().filter(CountThreads::isPrime).count()); } }
But time will tell how many use cases we can find for parallel() in the real world.
Kind regards from Crete
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)
We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.