Abstract: One of the most significant changes in JDK 1.4 is how the HashMap works internally. Instead of the remainder function, it now uses bit masking to assign keys to buckets. This change has implications for how we write our hashCode() methods.
Welcome to the 54th edition of The Java(tm) Specialists' Newsletter sent to 4289 Java Specialists in 85 countries. I seem to have to explain with each newsletter why you have not heard from me for soooo long. Last night I was speaking to a Java guru at Dimension Data in Cape Town, and he mentioned that somehow he had gotten unsubscribed from my newsletter. "No", said I, "you are still subscribed. It has just been rather quiet lately."
We have now finally completed moving into our new home, which includes an office with a view across False Bay all the way to Cape Point (when the weather is nice, which is most of the time :-), so perhaps the newsletters will appear more frequently. Last week I had to go on an emergency mission to Europe to try to help solve a Java deadlock in a very large piece of code. If you get stuck with a deadlock in Java, or even worse, a livelock, you are welcome to contact me and I will see if I can be of help.
Calling all Mac fans & haters! - I am considering buying a Mac notebook for Java development and to develop my courseware. If you have any horror stories (or success stories) regarding Java and Mac OS X, please let me know.
javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.
My explorations into the JDK 1.4 source code did not go unrewarded. Within a day of starting my expedition, I stumbled across something that ended up baffling several Java experts, including yours truly: java.util.HashMap now always has a power-of-two number of buckets, irrespective of what size and load factor you specify!!!
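To see what that means in practice, here is a minimal sketch (my own illustration, not the actual JDK source) of how a requested capacity gets rounded up to the next power of two; asking for 203 buckets, for example, really gives you 256:

// A sketch of rounding a requested capacity up to the next power of two,
// in the spirit of what the JDK 1.4 HashMap constructor does.
public class CapacityRounding {
  static int roundUpToPowerOfTwo(int requestedCapacity) {
    int capacity = 1;
    while (capacity < requestedCapacity) {
      capacity <<= 1; // double until we reach or exceed the request
    }
    return capacity;
  }

  public static void main(String[] args) {
    System.out.println(roundUpToPowerOfTwo(203));  // prints 256
    System.out.println(roundUpToPowerOfTwo(1000)); // prints 1024
  }
}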
Conventional wisdom says that the number of buckets in a hash table should be a prime number. That's what Craig Larman and Rhett Guthrie argued in their excellent (albeit a bit dated) book on Java Performance. The reason why the number of buckets should be prime is so that the hash values are well distributed, even when the hash function is not very well distributed. I'm sure you will find the theory behind it in some dusty CompSci 101 textbook somewhere. The worst possible number of buckets is a power-of-two! So, why on earth did Sun Microsystems deliberately change java.util.HashMap to force it to have a power-of-two number of buckets? (No, "Sun wants to sell more hardware" is not the correct answer ;-)
I posed this question to Bruce Eckel, who also did not know the reason for this. However, he knew Joshua Bloch, the author of java.util.HashMap, so we asked him. Before reading the answer, spend a few minutes thinking about why this could be.
But before I expound the reason, and the problems that came with the change, I want to mention a problem that someone on JavaDesk (an advanced yahoogroups Java User Group with excellent content) mentioned: since JDK 1.3, integer division and remainder calculation has become really slow. I've written a basic microbenchmarking framework based on some code originally posted in the JavaDesk group, and also shown in the bug report where the slowdown was described. One has to be careful with microbenchmarking, as you can read in this article from the San Francisco conference.
We start with a simple interface:
/**
 * Benchmark contains some calculation that is executed inside
 * the doCalculation() method a certain number of times. The
 * number of iterations is returned from the method.
 */
public interface Benchmark {
  int doCalculation();
}
Add to it a little performance testing routine:
public class PerformanceTest {
  private static final int REPEATS = 10;

  public static void main(String[] args) {
    if (args.length != 1) {
      usage();
    }
    System.out.println(
      "JVM version:" + System.getProperty("java.version"));
    try {
      evaluate((Benchmark) Class.forName(args[0]).newInstance());
      System.out.println();
    } catch (Exception ex) {
      ex.printStackTrace();
      usage();
    }
  }

  private static void usage() {
    System.err.println(
      "usage: java PerformanceTest BenchmarkClass");
    System.err.println(
      "\tBenchmarkClass is a class implementing Benchmark");
    System.exit(1);
  }

  private static void evaluate(Benchmark benchmark) {
    // do the function once to "warm up" HotSpot compiler
    benchmark.doCalculation();
    long average = 0;
    for (int i = 0; i < REPEATS; i++) {
      long time = -System.currentTimeMillis();
      int iterations = benchmark.doCalculation();
      time += System.currentTimeMillis();
      System.out.print(iterations / time);
      System.out.print(" ");
      System.out.flush();
      average += iterations / time;
    }
    System.out.println();
    System.out.println(
      "Average " + (average / REPEATS) + " iterations per millisecond");
  }
}
The most basic test that I want to show is a simple empty loop. This "benchmark" is really quite useless - any half-decent compiler should optimize it away anyway. In addition, if you unroll the loop by 16 steps, you end up doing nothing 16 times and slice the loop into a 16th of the length. The reason for testing this is simply to have some indication of whether my other tests are actually going to measure anything besides the for loop.
Please also note that in general each version of the JDK makes programs run faster, but you will always find some benchmark which seems to demonstrate that the old JDK was faster.
public class EmptyLoopBenchmark implements Benchmark {
  private static final int ITERATIONS = 10 * 1000 * 1000;

  public int doCalculation() {
    for (int i = 0; i < ITERATIONS; i++) {
      ; // NOP
    }
    return ITERATIONS;
  }
}
The results are interesting, although they don't really tell us very much:
JVM version:1.2
333333 333333 333333 200000 333333 ...
Average 311056 iterations per millisecond

JVM version:1.3.1_03
250000 200000 250000 166666 250000 ...
Average 217941 iterations per millisecond

JVM version:1.4.0
142857 166666 166666 125000 166666 ...
Average 151785 iterations per millisecond

JVM version:1.4.1-beta
82644 76923 76923 76923 83333 ...
Average 78926 iterations per millisecond
Is JDK 1.4.1 then slower than JDK 1.2?!? Nope, not in general, but as I said, with benchmarks you can find strange things. However, let's look at a benchmark on addition. We have to use the val variable after the loop, otherwise a clever compiler could optimize the whole loop away.
public class AdditionBenchmark implements Benchmark {
  private static final int ITERATIONS = 10 * 1000 * 1000;
  private int memory;

  public int doCalculation() {
    int val = 0;
    for (int i = 0; i < ITERATIONS; i++) {
      val = val + i;
    }
    memory = val;
    return ITERATIONS;
  }
}
The results on my little Asus Notebook are as follows:
JVM version:1.2
333333 333333 200000 333333 500000 ...
Average 344999 iterations per millisecond

JVM version:1.3.1_03
200000 200000 200000 243902 200000 ...
Average 209390 iterations per millisecond

JVM version:1.4.0
125000 125000 125000 123456 125000 ...
Average 123853 iterations per millisecond

JVM version:1.4.1-beta
83333 76923 76335 90909 76923 ...
Average 79486 iterations per millisecond
These results at best demonstrate that addition is blindingly fast, as we would expect. Note that we have to compare the addition benchmark to the empty loop benchmark to realise that the actual addition operation is negligible. However, what about remainder, the % operator?
public class RemainderBenchmark implements Benchmark {
  private static final int ITERATIONS = 10 * 1000 * 1000;
  private int memory;

  public int doCalculation() {
    int val = 0;
    for (int i = 0; i < ITERATIONS; i++) {
      val = i % 11;
    }
    memory = val;
    return ITERATIONS;
  }
}
Here we see that performance has dropped significantly since JDK 1.2, as described in the bug report mentioned earlier:
JVM version:1.2
62500 62500 62111 58823 66666 ...
Average 62520 iterations per millisecond

JVM version:1.3.1_03
17513 17513 17513 17211 17513 ...
Average 17457 iterations per millisecond

JVM version:1.4.0
14903 16920 16638 15360 16366 ...
Average 16051 iterations per millisecond

JVM version:1.4.1-beta
17211 17513 17211 16920 16920 ...
Average 17217 iterations per millisecond
These results are somewhat disturbing! Isn't that one of the important operations executed in the traditional java.util.HashMap class? The engineers at Sun (Joshua Bloch and Doug Lea) decided to use a bit masking approach instead of the remainder, because that is much faster:
public class MaskingBenchmark implements Benchmark {
  private static final int ITERATIONS = 10 * 1000 * 1000;
  private int memory;

  public int doCalculation() {
    int val = 0;
    for (int i = 0; i < ITERATIONS; i++) {
      val = i & 0x000000ff;
    }
    memory = val;
    return ITERATIONS;
  }
}
You can see that the performance values are far more encouraging:
JVM version:1.2
166666 142857 166666 166666 125000 ...
Average 158416 iterations per millisecond

JVM version:1.3.1_03
142857 142857 100000 140845 111111 ...
Average 131624 iterations per millisecond

JVM version:1.4.0
111111 142857 125000 125000 123456 ...
Average 128813 iterations per millisecond

JVM version:1.4.1-beta
76923 76923 83333 76335 83333 ...
Average 80849 iterations per millisecond
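Why is masking even a valid substitute for remainder? When the table length is a power of two, hash & (length - 1) selects exactly the same bucket as hash % length would for non-negative hash values, only without the expensive division. Here is a tiny illustration of my own (not JDK code):

// Demonstrates that masking and remainder pick the same bucket when the
// table length is a power of two (for non-negative hash values).
public class BucketIndexDemo {
  public static void main(String[] args) {
    int length = 256; // power-of-two table size, so length - 1 == 0xff
    int[] hashes = {42, 1023, 123456, 987654321};
    for (int i = 0; i < hashes.length; i++) {
      int h = hashes[i];
      int byRemainder = h % length;     // traditional bucket index
      int byMasking = h & (length - 1); // keeps only the low 8 bits
      System.out.println(h + ": " + byRemainder + " == " + byMasking);
    }
  }
}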
In order to protect Sun against claims by the intellectual proletariat ;-), who have up to now managed to get away with writing bad hash functions, Josh wrote a super-fast rehash() function inside java.util.HashMap that attempted to redistribute the bits a little bit. Good hash code writers were rewarded with a nice improvement in performance for java.util.HashMap.get() in JDK 1.4.0 thanks to the new masking trick:
import java.util.*;

/**
 * The lower-order bits are used as part of this hashcode
 * which is good if you are using JDK 1.4.0
 */
public class GoodHashcodeBenchmark implements Benchmark {
  private static final int ITERATIONS = 1000 * 1000;
  private Object memory;
  private HashMap map = new HashMap(203);
  private Integer[] values;

  public GoodHashcodeBenchmark() {
    for (int i = 0; i < 1000; i++) {
      map.put(new Integer(i), "Value");
    }
    TreeSet keys = new TreeSet(map.keySet());
    values = (Integer[]) keys.toArray(new Integer[1000]);
  }

  public int doCalculation() {
    for (int i = 0; i < ITERATIONS; i++) {
      memory = map.get(values[i % 1000]);
    }
    return ITERATIONS;
  }
}
When we run this benchmark, we get the following values:
JVM version:1.2
5555 5882 5882 5524 5555 5555 5555 5847 5555 6250
Average 5716 iterations per millisecond

JVM version:1.3.1_03
3846 3690 3846 3436 3846 3571 3831 3703 3690 3571
Average 3703 iterations per millisecond

JVM version:1.4.0
6250 6250 5847 6250 5882 5882 6622 5882 5882 5882
Average 6062 iterations per millisecond

JVM version:1.4.1-beta
4149 4347 4347 4329 4000 4329 4347 4545 4149 4347
Average 4288 iterations per millisecond
Despite the slowdown of JDK 1.[34].x with regards to remainder calculations, Joshua Bloch and Doug Lea still managed to make the HashMap.get() method in JDK 1.4.0 run faster than in JDK 1.2! The observant reader will have noticed that JDK 1.4.1 runs slower than JDK 1.4.0 but faster than JDK 1.3.1. Before I explain why, let's have a look at what happens when you write a bad hash function:
import java.util.*;

/**
 * The lower-order bits are NOT used as part of this hashcode
 * which is bad if you are using JDK 1.4.0
 */
public class BadHashcodeBenchmark implements Benchmark {
  private static final int ITERATIONS = 1000 * 1000;
  private Object memory;
  private HashMap map = new HashMap(203);
  private Integer[] values;

  public BadHashcodeBenchmark() {
    for (int i = 0; i < 1000; i++) {
      map.put(new Integer(i * 1024), "Value");
    }
    TreeSet keys = new TreeSet(map.keySet());
    values = (Integer[]) keys.toArray(new Integer[1000]);
  }

  public int doCalculation() {
    for (int i = 0; i < ITERATIONS; i++) {
      memory = map.get(values[i % 1000]);
    }
    return ITERATIONS;
  }
}
The result is obvious. Since the new JDK 1.4.0 HashMap is ultra-sensitive to the lower-order bits being the same, because it uses masking instead of remainder, the HashMap basically degenerates into a linked list and the lookup of a single entry becomes O(n) instead of O(1). Oops:
JVM version:1.2
5524 5555 5882 5882 5524 5882 5555 5555 5524 5555
Average 5643 iterations per millisecond

JVM version:1.3.1_03
3571 3690 3571 3703 3558 3703 3558 3703 3571 3690
Average 3631 iterations per millisecond

JVM version:1.4.0
173 172 154 171 173 171 168 167 172 172
Average 169 iterations per millisecond

JVM version:1.4.1-beta
4149 4347 4347 4149 4166 4347 4329 4347 4347 4329
Average 4285 iterations per millisecond
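To see why this happens, consider what the mask does to hash codes that are all multiples of 1024: their low-order bits are all zero, so they all select the same bucket. Here is a quick sketch of my own (the table length of 1024 is just illustrative):

// Shows why keys whose hash codes are all multiples of 1024 collapse into
// a single bucket under a power-of-two mask.
public class CollisionDemo {
  public static void main(String[] args) {
    int tableLength = 1024;     // a power-of-two table
    int mask = tableLength - 1; // 0x3ff: only the low 10 bits matter
    for (int i = 0; i < 5; i++) {
      int hash = i * 1024; // Integer.hashCode() is just the int value itself
      System.out.println(hash + " -> bucket " + (hash & mask)); // always 0
    }
  }
}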
Bug reports started surfacing on the bug parade saying that the HashMap in JDK 1.4.0 had a bug that caused it to be very slow. The following is an excerpt from a three-way conversation between Joshua Bloch, author of java.util.HashMap, Bruce Eckel and myself (quoted with permission):
Joshua Bloch: The downside of using a power-of-two is that the resulting hash table is very sensitive to the quality of the hash function (hashCode). It is imperative that any change in the input must affect the low order bits of the hash value. (Ideally, it should affect all bits of the hash value with equal likelihood.) Because we have no assurance that this is true, we put in a secondary (or "defensive") hash function when we switched to the power-of-two hash table. This hash function is applied to the results of hashCode before masking off the low order bits. Its job is to scatter the information over all the bits, and in particular, into the low order bits. Of course it has to run *very* fast, or you lose the benefit of switching to the power-of-two-sized table.

The original secondary hash function in 1.4 turned out to be insufficient. We knew that this was a theoretical possibility, but we thought that it didn't affect any practical data sets. We were wrong. The replacement secondary hash function (which I developed with the aid of a computer) has strong statistical properties that pretty much guarantee good bucket distribution.
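To make the idea concrete, here is a sketch of the kind of bit-scattering function Josh is describing. It is written in the spirit of the supplemental hash that shipped with JDK 1.4.1, but it is my own illustration, not a copy of the code inside java.util.HashMap:

// A "defensive" bit-scattering function: it mixes the high-order bits of a
// hashCode() into the low-order bits before the bucket index is masked off.
// Illustrative only - not the verbatim JDK implementation.
public class DefensiveHash {
  static int scatter(int h) {
    h += ~(h << 9);  // fold higher bits downwards
    h ^= (h >>> 14);
    h += (h << 4);
    h ^= (h >>> 10);
    return h;
  }

  public static void main(String[] args) {
    int mask = 1024 - 1; // power-of-two table
    for (int i = 0; i < 5; i++) {
      int raw = i * 1024; // all low-order bits are zero
      System.out.println(raw + " -> raw bucket " + (raw & mask)
          + ", scattered bucket " + (scatter(raw) & mask));
    }
  }
}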
The rehash() method in JDK 1.4.1 is quite interesting, but let's hope that it does a good enough job. From my tests, it seems to distribute the bits beautifully, but you will have to test (I don't know how!) that your hash function is compatible with the rehash method of HashMap. Obviously, having a more complicated rehash() method causes a drop in performance (when the hash function was good anyway), but it gives us a better average performance than JDK 1.4.0 and JDK 1.3.1.
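One rough way to get a feel for it (purely my own sketch, not a definitive compatibility test) is to run your own keys through a supplemental hash and a power-of-two mask and count how long the longest bucket chain gets. The scatter() function below is the illustrative one from the previous sketch, not the method inside java.util.HashMap:

// Counts how unevenly a set of keys spreads over a power-of-two table
// after a supplemental hash has been applied.
public class DistributionTest {
  static int scatter(int h) {
    h += ~(h << 9);
    h ^= (h >>> 14);
    h += (h << 4);
    h ^= (h >>> 10);
    return h;
  }

  public static void main(String[] args) {
    int tableLength = 1024;
    int[] bucketCounts = new int[tableLength];
    for (int i = 0; i < 1000; i++) {
      Integer key = new Integer(i * 1024); // the "bad" keys from above
      int bucket = scatter(key.hashCode()) & (tableLength - 1);
      bucketCounts[bucket]++;
    }
    int max = 0;
    for (int i = 0; i < tableLength; i++) {
      if (bucketCounts[i] > max) max = bucketCounts[i];
    }
    // For a good spread of 1000 keys over 1024 buckets the longest chain
    // should be very short; a large maximum means many collisions.
    System.out.println("Longest bucket chain: " + max);
  }
}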
If you need to squeeze the last ounce of performance out of the JDK and you need to do that with hash tables, you should probably write your own stripped-down version of HashMap that works specifically for your key-domain.
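As a starting point, here is a purely illustrative sketch of what such a stripped-down map could look like for one narrow key-domain: non-negative int keys. It uses a power-of-two table with masking and linear probing, and leaves out resizing, removal and null handling to stay short:

// A stripped-down map for int keys only. The table must be created larger
// than the number of entries you will put, since there is no resizing.
public class IntKeyMap {
  private final int[] keys;
  private final Object[] values;
  private final boolean[] used;
  private final int mask;

  public IntKeyMap(int capacityPowerOfTwo) {
    keys = new int[capacityPowerOfTwo];
    values = new Object[capacityPowerOfTwo];
    used = new boolean[capacityPowerOfTwo];
    mask = capacityPowerOfTwo - 1; // assumes capacity is a power of two
  }

  public void put(int key, Object value) {
    int index = key & mask;
    while (used[index] && keys[index] != key) {
      index = (index + 1) & mask; // linear probing on collision
    }
    used[index] = true;
    keys[index] = key;
    values[index] = value;
  }

  public Object get(int key) {
    int index = key & mask;
    while (used[index]) {
      if (keys[index] == key) return values[index];
      index = (index + 1) & mask;
    }
    return null; // hit an empty slot, so the key is not in the map
  }
}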
I now understand the reason for using power-of-two for HashMap buckets, but, I am also disappointed. The old mechanism was so much more predictable in terms of distribution of keys into buckets. How am I going to explain this new mechanism to Java beginners?
On the other hand, this new knowledge allows me (and you) to amble up to people who claim to know Java and say: "Did you know that HashMap in JDK 1.4 forces the number of buckets to be a power-of-two?" If the innocent victim responds: "Yes", you can follow up your question with "Why?" Try it out - and let me know the response :-)))
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)
We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.