Abstract: Some Map implementations allow null keys and values. This leads to funky behaviour when calling putIfAbsent() and the compute functions. In this newsletter we look a bit more closely at the issues at hand when allowing nulls in maps.
Welcome to the 303rd edition of The Java(tm) Specialists' Newsletter, sent from the wonderful Island of Crete. Last month we had a small version of JCrete, with some of the folks staying on for a few days. We socialized like we haven't for two years, and as a result, I missed the July edition of this newsletter. But we are back!
javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.
A few days ago, my friend Dmitry Vyazelenko mentioned the strange behavior of nulls in maps. That tweet inspired this newsletter. Thanks Dmitry :-)
When I started coding in Java, the only collections we had
available were Vector
and Hashtable
. Anything else, we had to
write ourselves. Well, there was also Stack
, a subclass of
Vector
, and Properties
, a subclass of Hashtable
. But we did
not use them much. At least in those days job interviews were
quick: "If you needed to store key/value pairs, which data
structure would you use?" "Um, Vector?" "Sorry, next
candidate please."
Soon a collection framework was created to fill the gap. It was based on the standard template library for C++ and was named JGL by a company called ObjectSpace. It had some cool features like one-to-many mappings in maps, similar to Eclipse Collection's Multimap. We used JGL for a while, but when Java 1.2 was released, switched over to the Java Collections framework.
Old Hashtable
had a quirk that we could insert neither a null
key nor a null
value. If we desperately wanted to add null
,
we would create a dummy object representing null
and add
that instead, and then convert that back to null
if we
found it. It was messy, and we wished for the ability to add
null
. Be careful what you wish for, it may be granted!
The Map interface specified that an implementation might
accept null
keys and values, but it may also reject them.
Thus we might see NullPointerException
s on put()
or get()
.
When the Map
interface was introduced in Java 1.2,
we had four Map
implementations with five
different behaviours. What fun:
Hashtable
would always throw
a NullPointerException
for either null
keys or values.
HashMap
would be happy with both null
keys and
values.
TreeMap
would accept null
values, but would
throw a NullPointerException
with null
keys. However, if
the very first pair had a null
key, then it accepted it
happily. And if we created the TreeMap
with a Comparator
that worked with null
, then the tree would accept a null
key.
WeakHashMap
would accept null
keys, but would
never remove that entry again.
In Java 1.4, LinkedHashMap
and IdentityHashMap
were added,
and these behaved the same as HashMap
.
In Java 5, several things happened. First off, generics were
added to the language. These were sneaked in without breaking
backwards compatibility. Also a ConcurrentHashMap
was added,
which had much faster read performance than the Hashtable
.
The ConcurrentHashMap
used an internal lock, thus we could no
longer participate in the locking from outside. Hashtable
synchronized on "this". With ConcurrentHashMap
, any compound actions would
need to be supported by the map itself. However, in those
days we did not have default methods in interfaces, and so a
new interface was born, the ConcurrentMap
. The most useful of
these compound methods was putIfAbsent()
. The only public
implementations of the ConcurrentMap
in the JDK are the
ConcurrentHashMap
from Java 5, and later the
ConcurrentSkipListMap
from Java 6. They both throw a
NullPointerException
if we attempt to add a null
key or
value. However, the Javadoc for ConcurrentMap
does not
prevent an implementation from accepting null
, but when the
map does allow null
, the behavior becomes a bit funky. The
meaning of "absent" is that the key either does not exist at
all or that the value is null
. Thus the following two
snippets of code are not equivalent if we can have null
values:
if (!map.containsKey(-1)) map.put(-1, 0L);
if (map.get(-1) == null) map.put(-1, 0L);
However, if we want to be pedantic, only the second one can
be converted to putIfAbsent()
without change in semantics.
For example, if we have put (-1, null)
into the map, then the
first code snippet would not overwrite it, but the second
would.
We see here that if the map contains a mapping to (-1,
,
then it is not overwritten in the first if statement, but it
is in the second:
null
)
public class PutIfAbsentFun { public static void main(String... args) { Map<Integer, Long> map = HashMap.newHashMap(1); // Java 19 map.put(-1, null); // start with a null value System.out.println("map = " + map); // {-1=null} if (!map.containsKey(-1)) map.put(-1, 0L); System.out.println("map = " + map); // {-1=null} if (map.get(-1) == null) map.put(-1, 0L); System.out.println("map = " + map); // {-1=0} } }
At first glance, it seems that computeIfAbsent()
works like
putIfAbsent()
. If the value is null
, then the computation is
done. However, there is a subtlety. If the computation
function returns null
, then the entry is not inserted into
the map. And if we use computeIfPresent()
or compute()
and
their function return null
, then the entry is removed
from the map. This behaviour makes sense if maps do not allow
null
keys and values. But it can get confusing when they do.
public class ComputeIfFun { public static void main(String... args) { Map<Integer, Long> map = HashMap.newHashMap(1); System.out.println("map = " + map); // {} map.computeIfAbsent(-1, key -> { System.out.println("computing the value"); return null; // this will not add an entry into the map }); System.out.println("map = " + map); // {} map.computeIfAbsent(-1, key -> { System.out.println("computing the value"); return 0L; // the map now contains -1=0L }); System.out.println("map = " + map); // {-1=0L} map.computeIfPresent(-1, (key, value) -> { System.out.println("computing the value again"); return null; // this removes the entry from the map }); System.out.println("map = " + map); // {} map.put(-1, null); System.out.println("map = " + map); // {-1=null} map.compute(-1, (key, value) -> null); // removes the entry System.out.println("map = " + map); } }
The merge() function is subtly different, yet still mostly
the same. The value parameter cannot be null
, otherwise we
get a NullPointerException
. However, if the value in the map
is null
, then the merge()
function is not called and instead,
the value is simply added. Similarly to the compute()
methods, if our function returns null
, then the entry is
removed. Here is an example:
public class MergeFun { public static void main(String... args) { Map<Integer, Long> map = HashMap.newHashMap(1); System.out.println("map = " + map); // {} map.put(-1, null); System.out.println("map = " + map); // {-1=null} map.merge(-1, 1L, (value1, value2) -> 2L); System.out.println("map = " + map); // {-1=1} map.merge(-1, 1L, (value1, value2) -> 2L); System.out.println("map = " + map); // {-1=2} map.merge(-1, 1L, (value1, value2) -> null); System.out.println("map = " + map); // {} } }
The compound methods remove(key, value)
,
replace(key, value)
, and replace(key, oldValue, newValue)
treat a null
value as being present:
public class ReplaceRemoveFun { public static void main(String... args) { Map<Integer, Long> map = HashMap.newHashMap(1); System.out.println(map.replace(-1, null)); // null map.put(-1, 0L); System.out.println(map.replace(-1, null)); // 0 System.out.println(map.replace(-1, null, null)); // true System.out.println(map.remove(-1, null)); // true System.out.println(map.remove(-1, null)); // false } }
Allowing null
keys in maps has some interesting implications.
I sent out some Twitter polls to see how many programmers
would know the correct answers:
What
does new TreeMap().put(null, null)
produce? with
possible answers of "No output" (15%), NullPointerException
(61%) and "Either, depending on ..." (24%). The correct
answer is the last, because it depends on the Java version.
Versions 1.2 through 1.6 they accepted the first key to be
null
, and would only throw a NullPointerException
on
subsequent entries. This was made more consistent in Java 1.7
to always throw a NullPointerException
unless the
TreeMap
was created with a Comparator
that could cope with
null
keys. See Bug
5045147.
In
a WeakHashMap.put(null, "Hello world")
, when will the entry be removed in #Java?
with possible answers of "Won't be added" (35%),
"At the next GC" (14%), "A while after the next GC" (14%),
and "Never" (37%). The correct answer is the last one,
since when the key is null
, a private static final object is
used instead as a key. Since this object is always reachable,
the WeakReference
will never be cleared, and thus the entry
will remain in the map forever.
What
is the computational time complexity of HashMap.get(null)
in #Java, where n is the number of keys with a hash code of
0? Possible answers were "linear - O(n)" (29%),
"quadratic - O(n²)" (3%), "constant - O(1)" (47%), and
"logarithmic - O(log n)" (21%). The correct answer was the
first - linear. Since Java 8, when a lot of keys have the
same hash code, the HashMap
builds up a red-black tree of
entries. Thus ordinarily the complexity would be logarithmic.
However, this is only true if the key is Comparable
. Since
the key is null
, it does not implement Comparable
, and thus
the map needs to search for it linearly.
A little example for the last quiz, where we create a lot
of Strings with hashCode=0 and then also add a null
key. This
time we are using a HashSet
instead, which internally
delegates to a HashMap
:
public class ZeroHashFun { private static final String[] zeroHashCodes = { "ARbyguv", "ARbygvW", "ARbyhVv", "ARbyhWW", "ARbzHuv", "ARbzHvW", "ARbzIVv", "ARbzIWW", "ARcZguv", "ARcZgvW", "ARcZhVv", "ARcZhWW", "ASCyguv", "ASCygvW", "ASCyhVv", "ASCyhWW", "ASCzHuv", "ASCzHvW", "ASCzIVv", "ASCzIWW", "ASDZguv", "ASDZgvW", "ASDZhVv", "ASDZhWW",}; public static void main(String... args) { for (int i = 16; i <= 24; i++) { System.out.println("Testing with size " + (1 << i)); test(1 << i); } } private static void test(int size) { HashSet<String> set = HashSet.newHashSet(size); for (int i = 0; i < size - 1; i++) { StringBuilder sb = new StringBuilder(); for (int j = 1, index = 0; j <= i; j <<= 1, index++) { if ((i & j) != 0) sb.append(zeroHashCodes[index % zeroHashCodes.length]); } set.add(sb.toString()); } set.add(null); // also adding a null key System.out.println("set.size() = " + set.size()); System.out.println("Searching for key not in the set"); String notInMap = zeroHashCodes[1] + zeroHashCodes[0]; for (int repeats = 0; repeats < 10; repeats++) { testLookup(set, notInMap); } System.out.println("Searching for null key"); for (int repeats = 0; repeats < 10; repeats++) { testLookup(set, null); } } private static void testLookup(Set<String> set, String key) { long time = System.nanoTime(); try { set.contains(key); } finally { time = System.nanoTime() - time; System.out.printf("time = %dms%n", (time / 1000000)); } } }
Summary of the output. For each of the times, we only show the median. It is clear though from the timing that the contains() method has linear compexity to the number of keys with hashCode==0, since as we double that size, we also take twice as long to call contains():
Testing with size 65536 set.size() = 65536 Searching for key not in the set time = 0ms Searching for null key time = 0ms Testing with size 131072 set.size() = 131072 Searching for key not in the set time = 0ms Searching for null key time = 3ms Testing with size 262144 set.size() = 262144 Searching for key not in the set time = 0ms Searching for null key time = 7ms Testing with size 524288 set.size() = 524288 Searching for key not in the set time = 0ms Searching for null key time = 13ms Testing with size 1048576 set.size() = 1048576 Searching for key not in the set time = 0ms Searching for null key time = 25ms Testing with size 2097152 set.size() = 2097152 Searching for key not in the set time = 0ms Searching for null key time = 55ms Testing with size 4194304 set.size() = 4194304 Searching for key not in the set time = 0ms Searching for null key time = 104ms Testing with size 8388608 set.size() = 8388608 Searching for key not in the set time = 0ms Searching for null key time = 230ms Testing with size 16777216 set.size() = 16777216 Searching for key not in the set time = 0ms Searching for null key time = 454ms
(Even though this could be classified as an extremely mild
threat, I don't think I'll bother submitting a bug report.
It takes longer to create the set than to call contains()
.)
None of this weirdness with null
keys and values would happen
if we used maps that did not allow them in the first place.
My concluding recommendation thus is to use ConcurrentHashMap
instead of HashMap
and ConcurrentSkipListMap
instead of
TreeMap
. No more null
weirdness, and race conditions are also
taken care of.
I hope you enjoyed reading this as much as I enjoyed writing it :-)
Kind regards from Chorafakia
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)
We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.