Abstract: The compareTo() function has three rules. Break any one of them and you might get an exception when you sort. In this newsletter we explore what these rules are and how they can affect sorting.
Welcome to the 269th edition of The Java(tm) Specialists' Newsletter, sent to you from a dry and warm Crete. A month ago, Helene asked me whether we should invite our friends for our Easter BBQ. The plan was that we would make the meat and everyone else would bring salads. We expected a big crowd. So we placed an order for a whole lamb which I would make on the spit. Since lambs are usually small on Crete, we asked for a largish one, about 12kg. Michalis phoned me on Saturday, saying that the lamb turned out to be 20kg. My heart sank. I'm a programmer, not a chef. 20kg is quite difficult to cook. Plus our initial plans did not materialize, so we had less than half the guests. I got up at 5:45 to light the fire and then started grilling at about 6:30. The coals were much too hot and the lamb immediately began turning brown. Not a good sign when you've got 5 hours of cooking to go! After much shoveling embers to and fro, we managed to produce a delicious lunch, albeit about 3x as much meat as we needed. It was a truly wonderful day though, with interesting conversation, good food and loyal friends.
javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.
In our previous newsletter Puzzle: Is a Programmer
a Person? we had to figure out why
Collections.sort()
worked in some versions of Java and not in others. The first
complete answer came from Vassilis Bekiaris. Well done :-)
Before we delve into the explanation of the puzzle, we should point out that the modeling is wrong. Programmer is a characteristic of a physical entity. Favour composition over inheritance. If we wanted to now have another role, such as Manager, would we subclass Person again? And what if someone is a Manager and a Programmer (yes, that exists). In fact, I was chatting to a product manager yesterday about the value of staying "in the game" as a programmer, even if that isn't our primary job description.
Here are the Person and Programmer classes from our previous newsletter:
public class Person implements Comparable<Person> { private final String name; public Person(String name) { this.name = name; } public int compareTo(Person that) { return name.compareTo(that.name); } public String toString() { return name; } }
public class Programmer extends Person { private final short iq; public Programmer(String name, short iq) { super(name); if (iq < 120) throw new IllegalArgumentException( "Programmer IQ cannot be less than 120"); this.iq = iq; } public int compareTo(Person that) { if (that instanceof Programmer) { Programmer p = (Programmer) that; int result = iq - p.iq; // works because it's a short if (result != 0) return -result; // biggest IQ first } return super.compareTo(that); } public String toString() { return super.toString() + " (" + iq + ")"; } }
The rules for compareTo()
are:
sgn(x.compareTo(y)) == -sgn(y.compareTo(x))
for all x
and y
. (This implies that
x.compareTo(y)
must throw an exception iff
y.compareTo(x)
throws an exception.)
(x.compareTo(y)<0 && y.compareTo(z)<0)
implies
x.compareTo(z)<0
.
x.compareTo(y)==0
implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z))
, for
all z.
(x.compareTo(y)==0) == (x.equals(y))
. Generally speaking,
any class that implements the Comparable
interface and
violates this condition should clearly indicate this fact. The
recommended language is "Note: this class has a natural ordering that is
inconsistent with equals."
In our example of Person and Programmer, rules 2 and 3 are violated and suggestion 1 is not implemented. Here is a CompareTester that can verify the rules. The computational time complexity of the test methods range from quadratic to cubic, so please do not use this in production!
import java.util.*; public class CompareTester { public static <E extends Comparable<E>> void test( E[] comparables) { test(comparables, Comparator.naturalOrder()); } public static <E> void test(E[] comparables, Comparator<E> comparator) { rule1(comparator, comparables); rule2(comparator, comparables); rule3(comparator, comparables); suggestion1(comparator, comparables); } // Quadratic private static <E> void rule1(Comparator<E> comparator, E[] comparables) { for (E x : comparables) { for (E y : comparables) { int xy = comparator.compare(x, y); int yx = comparator.compare(y, x); if (sgn(xy) != -sgn(yx)) { System.out.printf("Rule 1 violated with" + " x=%s, y=%s%n", x, y); } } } } // Cubic private static <E> void rule2(Comparator<E> comparator, E[] comparables) { for (E x : comparables) { for (E y : comparables) { for (E z : comparables) { int xy = comparator.compare(x, y); int yz = comparator.compare(y, z); if (xy < 0 && yz < 0) { int xz = comparator.compare(x, z); if (!(xz < 0)) { System.out.printf("Rule 2 violated with " + "x=%s, y=%s, z=%s%n", x, y, z); } } } } } } // Cubic private static <E> void rule3(Comparator<E> comparator, E[] comparables) { for (E x : comparables) { for (E y : comparables) { for (E z : comparables) { if (comparator.compare(x, y) == 0) { int xz = comparator.compare(x, z); int yz = comparator.compare(y, z); if (sgn(xz) != sgn(yz)) { System.out.printf("Rule 3 violated with " + "x=%s, y=%s, z=%s%n", x, y, z); } } } } } } // Quadratic private static <E> void suggestion1(Comparator<E> comparator, E[] comparables) { for (E x : comparables) { for (E y : comparables) { int xy = comparator.compare(x, y); if ((xy == 0) != x.equals(y)) { System.out.printf("Suggestion 1 violated with" + " x=%s, y=%s%n", x, y); } } } } private static int sgn(int compareResult) { return Integer.compare(compareResult, 0); } }
In our PersonTester, we use the CompareTester to see how well the compareTo() method stacks up. Turns out, not too great:
public class PersonTester { public static void main(String... args) { Person[] people = { new Programmer("Aaron", (short) 130), new Person("Adolf"), new Programmer("Brian", (short) 180), new Person("Brian"), new Programmer("Cedric", (short) 120), new Programmer("Cedric", (short) 120), new Programmer("Zoran", (short) 200), }; CompareTester.test(people); } }
The result from running this is
Rule 2 violated with x=Aaron (130), y=Adolf, z=Brian (180) Rule 2 violated with x=Aaron (130), y=Adolf, z=Zoran (200) Rule 2 violated with x=Aaron (130), y=Brian, z=Zoran (200) Rule 2 violated with x=Adolf, y=Brian (180), z=Aaron (130) Rule 2 violated with x=Adolf, y=Zoran (200), z=Aaron (130) Rule 2 violated with x=Brian (180), y=Aaron (130), z=Adolf Rule 2 violated with x=Brian (180), y=Aaron (130), z=Brian Rule 2 violated with x=Brian, y=Zoran (200), z=Aaron (130) Rule 2 violated with x=Brian, y=Zoran (200), z=Brian (180) Rule 2 violated with x=Zoran (200), y=Aaron (130), z=Adolf Rule 2 violated with x=Zoran (200), y=Aaron (130), z=Brian Rule 3 violated with x=Brian (180), y=Brian, z=Aaron (130) Rule 3 violated with x=Brian (180), y=Brian, z=Zoran (200) Rule 3 violated with x=Brian, y=Brian (180), z=Aaron (130) Rule 3 violated with x=Brian, y=Brian (180), z=Zoran (200) Suggestion 1 violated with x=Brian (180), y=Brian Suggestion 1 violated with x=Brian, y=Brian (180) Suggestion 1 violated with x=Cedric (120), y=Cedric (120) Suggestion 1 violated with x=Cedric (120), y=Cedric (120)
For example, Programmer Aaron (130) comes before Person Adolf, as we are comparing only on name and "Aaron" comes before "Adolf". Person Adolf comes before Programmer Brian (180), as we are again only looking at the name. However, Programmer Brian (180) comes before Programmer Aaron (130). Since they are both programmers, the one with the highest IQ is on the left. We have thus violated rules 2 and 3.
We cannot correctly sort heterogenous lists, either in Java 6 or in later versions.
In Java 7, we changed from MergeSort to TimSort. This does not check the rules as such, but it throws an IllegalArgumentException when it detects that there is residue left behind from an incorrect compareTo() function. It does not say what rule was violated, because it does not know. It also won't always detect the violation, just depends how the elements are ordered. Checking rule 1 is quadratic. Rules 2 and 3 are cubic. We should not ignore these IllegalArgumentExceptions.
We could in theory enable the old MergeSort with
-Djava.util.Arrays.useLegacyMergeSort=true
However, the resultant list is not sorted properly, since we
have not defined a correct sort order.
Since Java 8, Collections.sort(list) delegates to List.sort(Comparator). In the past, Collections.sort(list) would copy all elements to an array, sort that with Arrays.sort(), then copy them back. Since the method was moved to the List interface, we can have different specializations of the method. For example, ArrayList sorts the elements in place. Thus when we sort it for the first time, TimSort might discover that things went awry and throw the IllegalArgumentException. As a side effect, the ArrayList order would have changed. If we call it often enough, we reach a state where we do not detect any residue.
It is possible to define a correct Comparator that will work with mixed Person and Programmer objects. We could simply define that Person comes after Programmer. If we have two Programmers, then we compare them by IQ and name. Otherwise we compare using the normal compareTo() of Person. Like this:
Comparator<Person> comparator = (p1, p2) -> { if (p1 instanceof Programmer) { if (p2 instanceof Programmer) return p1.compareTo(p2); else return -1; } if (p2 instanceof Programmer) return 1; return p1.compareTo(p2); };
The CompareTester confirms that we do not violate any rules. However, we should also implement the equals() and hashCode() methods. The equals() is also a bit tricky, since it needs to be symmetric and transitive. Here are my equals() methods in Person and Programmer:
public class Person implements Comparable<Person> { /* snip */ public boolean equals(Object o) { if (this == o) return true; if (o == null || o.getClass() != Person.class) return false; return equalsName((Person) o); } protected boolean equalsName(Person that) { return Objects.equals(name, that.name); } } public class Programmer extends Person { /* snip */ public boolean equals(Object o) { if (this == o) return true; if (o == null || o.getClass() != Programmer.class) return false; Programmer that = (Programmer) o; return iq == that.iq && super.equalsName(that); } }
Our test class now becomes PersonTesterWithCorrectComparator:
import java.util.*; import java.util.stream.*; public class PersonTesterWithCorrectComparator { public static void main(String... args) { Person[] people = { new Programmer("Aaron", (short) 130), new Person("Adolf"), new Programmer("Brian", (short) 180), new Person("Brian"), new Programmer("Cedric", (short) 120), new Programmer("Cedric", (short) 120), new Programmer("Zoran", (short) 200), }; Comparator<Person> comparator = (p1, p2) -> { if (p1 instanceof Programmer) { if (p2 instanceof Programmer) return p1.compareTo(p2); else return -1; } if (p2 instanceof Programmer) return 1; return p1.compareTo(p2); }; CompareTester.test(people, comparator); Arrays.sort(people, comparator); Stream.of(people).forEach(System.out::println); } }
Output is as follows:
Zoran (200) Brian (180) Aaron (130) Cedric (120) Cedric (120) Adolf Brian
In the comments section of my previous newsletter, you will find another solution where the compareTo functions on Person and Programmer returns a consistent natural order. To view the commments, you will need to explicitly enable the Disqus comments. GDPR and all that jazz, you know.
Kind regards from Crete
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)
We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.