Abstract: When I first started using Java in 1997, I needed very large hash tables for matching records as quickly as possible. We ran into trouble when some of the keys were mutable and ended up disappearing from the table, and then reappearing again later.
Welcome to the 31st issue of The Java(tm) Specialists' Newsletter, where we look at how Hash Sets can become really confused. I love hearing from my readers, so please send your comments, praises, criticisms, accolades. The best compliment to me is if you forward this newsletter to other Java fans, for example to your local Java User Group (JUG).
I remember hearing last year that Visual Basic (shudder) was the fastest growing computer language. A report this year found that Java is now growing faster than Visual Basic, which frankly shudders me even more. There are apparently about 2 million people on this globe hacking away at our poor, dear language. Rather discouraging is that my newsletter is only reaching a meager 0.065% of Java programmers. Then again, this is supposed to be The Java Specialists' Newsletter, so maybe there aren't that many of us out there who should be considered specialists ;-)
javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.
A few weeks ago, I received an email from Charles N. May
who pointed out that it is quite easy to change an Integer using
reflection in the same way that I showed in my
article on Insane Strings. This is only possible
because for some strange reason, or no reason at all, the data
member value
within the Integer class was not
declared final
.
Charles demonstrated the possible repercussions quite nicely by
inserting the same Integer object several times into a
java.util.HashSet
, each time with different values.
The contract of the java.util.Set
says:
java.util.Set: A collection that contains no duplicate elements.
More formally, sets contain no pair of elements e1
and
e2
such that e1.equals(e2)
, and at most
one null element.
It then goes on to say:
Note: Great care must be exercised if mutable objects are used as set elements. The behavior of a set is not specified if the value of an object is changed in a manner that affects equals comparisons while the object is an element in the set.
Interesting, but how do we define mutable, or immutable for that matter? I searched various Java texts but could not find a definition of what is meant by that. Let's try and define it for our use:
Immutable Class: A class where the state cannot be changed.
Simple enough, but what does state refer to? Is it a state descriptor of all the data members in the class? Or is it just the identity of the class? For example, let's consider the class Employee, which is part of the middle class ...
public class Employee { private final String name; private double salary; public Employee(String name, double salary) { this.name = name; this.salary = salary; } public String getName() { return name; } public double getSalary() { return salary; } public void setSalary(double salary) { this.salary = salary; } }
Is the Employee class mutable? If we say that the state
consists of a conglomeration of all the data members then the
answer is "Yes". If we say the state consists of the identity,
the answer is "No", as we cannot change the state. However, it
will get even more confusing as we also have to define
identity. I would say that the identity is the part of
the class that determines the value of e1.equals(e2)
and e1.hashCode()
. So, in Employee above, what is
the identity? Is it name
? Nope, it is actually the
location of the object in memory. If we want the identity to be
according to the name, we will have to write equals()
and hashCode()
methods, the details of which are
below the scope of TJSN.
Since mutable is so hard to pin down, we should actually
rewrite the javadocs for the java.util.Set
interface:
Note: Great care must be exercised if objects with a mutable identity are used as set elements. The behaviour of a set is not specified if the value of an object is changed in a manner that affects equals comparisons or its hashcode while the object is an element in the set.
By the description above, we should be careful with objects like
String, Integer, etc. where the identity can be changed at runtime
using reflection. Alright, it is bad coding practice to change
private data members using reflection, but Integer could easily
have fitted the above description if the value
had
been final
. Then again, as discussed in
my "final" newsletter,
it is also bad coding practice to not mark data members
final when they should be. Incidentally, java.lang.String
is a lost cause - we can never make that properly immutable since
its identity is contained in an array and arrays are mutable.
Once the 1'998'700 monkeys who don't read this newsletter discover
you can change Strings, we are all lost!
So, what happens when you change the identity of an object while it is in the HashSet? Let's have a look:
import java.lang.reflect.Field; // I can't resist using that package import java.util.*; public class HashHashGone { // Did you know that you can make a static final data member // without a value (i.e. blank), as long as you set the value // in the static initializer? I love making every possible // field final as it smokes out a lot of bugs. private static final Field intValue; static { try { intValue = Integer.class.getDeclaredField("value"); intValue.setAccessible(true); } catch(NoSuchFieldException ex) { // we throw a RuntimeException from the static initializer // this will actually generate an Error and kill the thread // that first accessed GhostSet. throw new RuntimeException( "Serious error - no \"value\" field found in Integer"); } } // This method changes the integer passed into the method to // the new value. private static void setInteger(Integer i, int newInt) { try { intValue.set(i, new Integer(newInt)); } catch (IllegalAccessException ex) { throw new RuntimeException( "Serious error - field should have been accessible"); } } // This method will be used later to print a detailed view of // the set, including the value, the class and the identity // hash code of the object. private static void printDetailed(Set set) { System.out.print("["); Iterator it = set.iterator(); while(it.hasNext()) { Object val = it.next(); System.out.print("\t(toString()=" + val + ",class=" + val.getClass().getName() + ",identityHashCode=" + System.identityHashCode(val) + ")"); if (it.hasNext()) System.out.print(", "); System.out.println(); } System.out.println("]"); } public static void main(String[] args) { if (args.length != 2) { System.out.println( "arguments: <initial-intvalue> <num-copies-to-insert-in-Set>"); return; } int initialValue = Integer.parseInt(args[0]); int numberOfCopiesToInsert = Integer.parseInt(args[1]); Integer x = new Integer(initialValue); int currValue = initialValue; Set set = new HashSet(); for (int i = 0; i < numberOfCopiesToInsert; i++) { setInteger(x, ++currValue); set.add(x); } setInteger(x, initialValue); System.out.println("here's a set containing " + numberOfCopiesToInsert + " copies of Integer(" + x + "): "); System.out.println(set); System.out.println("detailed view of set:"); printDetailed(set); System.out.println("and does it contain that Integer?: " + set.contains(x)); System.out.println("can the Integer be removed from the Set?"); System.out.println(set.remove(x)); System.out.println("the Set contents after attempted remove:"); System.out.println(set); setInteger(x, -initialValue); System.out.println( "altering the Integer to its opposite makes the Set contain:"); System.out.println(set); setInteger(x, initialValue); currValue = initialValue; for (int i = 0; i < numberOfCopiesToInsert; i++) { setInteger(x, ++currValue); set.remove(x); } System.out.println("now all the elements have been removed " + "from the Set as it contains:"); System.out.println(set); System.out.println(); } }
When I run this with java HashHashGone 42 5
, I get
the following output:
here's a set containing 5 copies of Integer(42): [42, 42, 42, 42, 42] detailed view of set: [ (toString()=42,class=java.lang.Integer,identityHashCode=6483656), (toString()=42,class=java.lang.Integer,identityHashCode=6483656), (toString()=42,class=java.lang.Integer,identityHashCode=6483656), (toString()=42,class=java.lang.Integer,identityHashCode=6483656), (toString()=42,class=java.lang.Integer,identityHashCode=6483656) ] and does it contain that Integer?: false can the Integer be removed from the Set? false the Set contents after attempted remove: [42, 42, 42, 42, 42] altering the Integer to its opposite makes the Set contain: [-42, -42, -42, -42, -42] now all the elements have been removed from the Set as it contains: []
The HashSet actually contains a HashMap that keeps the entries as
keys and values. The HashMap contains a rehash()
method, which is called when the table grows past the threshold
determined by the number of entries and fill factor. However,
contrary to what I believed until I tried it out today, the rehash
method does not consider the possibility that there may be two
objects in different hash positions, but equal, in the table.
Re-hashing will therefore not solve the problem described above.
Moral of the story, assuming there is still morality left in this world? (btw, have you ever considered how similar the words mortal, moral and mutable are?) OK, the moral is? Never create a class where the identity can be changed once the object has been created.
That's all for this week. Please take a moment to think of how I could improve this newsletter and pop me an email.
Regards
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)
We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.