Running on Java 26-ea+20-2057 (Preview)
Home of The JavaSpecialists' Newsletter

269Heterogeneous Comparisons

Author: Dr Heinz M. KabutzDate: 2019-04-29Java Version: 8Sources on GitHubCategory: Tips and Tricks
 

Abstract: The compareTo() function has three rules. Break any one of them and you might get an exception when you sort. In this newsletter we explore what these rules are and how they can affect sorting.

 

Welcome to the 269th edition of The Java(tm) Specialists' Newsletter, sent to you from a dry and warm Crete. A month ago, Helene asked me whether we should invite our friends for our Easter BBQ. The plan was that we would make the meat and everyone else would bring salads. We expected a big crowd. So we placed an order for a whole lamb which I would make on the spit. Since lambs are usually small on Crete, we asked for a largish one, about 12kg. Michalis phoned me on Saturday, saying that the lamb turned out to be 20kg. My heart sank. I'm a programmer, not a chef. 20kg is quite difficult to cook. Plus our initial plans did not materialize, so we had less than half the guests. I got up at 5:45 to light the fire and then started grilling at about 6:30. The coals were much too hot and the lamb immediately began turning brown. Not a good sign when you've got 5 hours of cooking to go! After much shoveling embers to and fro, we managed to produce a delicious lunch, albeit about 3x as much meat as we needed. It was a truly wonderful day though, with interesting conversation, good food and loyal friends.

javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.

Heterogeneous Comparisons

In our previous newsletter Puzzle: Is a Programmer a Person? we had to figure out why Collections.sort() worked in some versions of Java and not in others. The first complete answer came from Vassilis Bekiaris. Well done :-)

Before we delve into the explanation of the puzzle, we should point out that the modeling is wrong. Programmer is a characteristic of a physical entity. Favour composition over inheritance. If we wanted to now have another role, such as Manager, would we subclass Person again? And what if someone is a Manager and a Programmer (yes, that exists). In fact, I was chatting to a product manager yesterday about the value of staying "in the game" as a programmer, even if that isn't our primary job description.

Here are the Person and Programmer classes from our previous newsletter:

public class Person implements Comparable<Person> {
  private final String name;

  public Person(String name) {
    this.name = name;
  }

  public int compareTo(Person that) {
    return name.compareTo(that.name);
  }

  public String toString() {
    return name;
  }
}
public class Programmer extends Person {
  private final short iq;

  public Programmer(String name, short iq) {
    super(name);
    if (iq < 120)
      throw new IllegalArgumentException(
          "Programmer IQ cannot be less than 120");
    this.iq = iq;
  }

  public int compareTo(Person that) {
    if (that instanceof Programmer) {
      Programmer p = (Programmer) that;
      int result = iq - p.iq;  // works because it's a short
      if (result != 0) return -result; // biggest IQ first
    }
    return super.compareTo(that);
  }

  public String toString() {
    return super.toString() + " (" + iq + ")";
  }
}

The rules for compareTo() are:

  • Rule 1: The implementor must ensure sgn(x.compareTo(y)) == -sgn(y.compareTo(x)) for all x and y. (This implies that x.compareTo(y) must throw an exception iff y.compareTo(x) throws an exception.)
  • Rule 2: The implementor must also ensure that the relation is transitive: (x.compareTo(y)<0 && y.compareTo(z)<0) implies x.compareTo(z)<0.
  • Rule 3: Finally, the implementor must ensure that x.compareTo(y)==0 implies that sgn(x.compareTo(z)) == sgn(y.compareTo(z)), for all z.
  • Suggestion 1: It is strongly recommended, but not strictly required that (x.compareTo(y)==0) == (x.equals(y)). Generally speaking, any class that implements the Comparable interface and violates this condition should clearly indicate this fact. The recommended language is "Note: this class has a natural ordering that is inconsistent with equals."

In our example of Person and Programmer, rules 2 and 3 are violated and suggestion 1 is not implemented. Here is a CompareTester that can verify the rules. The computational time complexity of the test methods range from quadratic to cubic, so please do not use this in production!

import java.util.*;

public class CompareTester {
  public static <E extends Comparable<E>> void test(
      E[] comparables) {
    test(comparables, Comparator.naturalOrder());
  }

  public static <E> void test(E[] comparables,
                              Comparator<E> comparator) {
    rule1(comparator, comparables);
    rule2(comparator, comparables);
    rule3(comparator, comparables);
    suggestion1(comparator, comparables);
  }

  // Quadratic
  private static <E> void rule1(Comparator<E> comparator,
                                E[] comparables) {
    for (E x : comparables) {
      for (E y : comparables) {
        int xy = comparator.compare(x, y);
        int yx = comparator.compare(y, x);
        if (sgn(xy) != -sgn(yx)) {
          System.out.printf("Rule 1 violated with" +
              " x=%s, y=%s%n", x, y);
        }
      }
    }
  }

  // Cubic
  private static <E> void rule2(Comparator<E> comparator,
                                E[] comparables) {
    for (E x : comparables) {
      for (E y : comparables) {
        for (E z : comparables) {
          int xy = comparator.compare(x, y);
          int yz = comparator.compare(y, z);
          if (xy < 0 && yz < 0) {
            int xz = comparator.compare(x, z);
            if (!(xz < 0)) {
              System.out.printf("Rule 2 violated with " +
                  "x=%s, y=%s, z=%s%n", x, y, z);
            }
          }
        }
      }
    }
  }

  // Cubic
  private static <E> void rule3(Comparator<E> comparator,
                                E[] comparables) {
    for (E x : comparables) {
      for (E y : comparables) {
        for (E z : comparables) {
          if (comparator.compare(x, y) == 0) {
            int xz = comparator.compare(x, z);
            int yz = comparator.compare(y, z);
            if (sgn(xz) != sgn(yz)) {
              System.out.printf("Rule 3 violated with " +
                  "x=%s, y=%s, z=%s%n", x, y, z);
            }
          }
        }
      }
    }
  }

  // Quadratic
  private static <E> void suggestion1(Comparator<E> comparator,
                                      E[] comparables) {
    for (E x : comparables) {
      for (E y : comparables) {
        int xy = comparator.compare(x, y);
        if ((xy == 0) != x.equals(y)) {
          System.out.printf("Suggestion 1 violated with" +
              " x=%s, y=%s%n", x, y);
        }
      }
    }
  }

  private static int sgn(int compareResult) {
    return Integer.compare(compareResult, 0);
  }
}    

In our PersonTester, we use the CompareTester to see how well the compareTo() method stacks up. Turns out, not too great:

public class PersonTester {
  public static void main(String... args) {
    Person[] people = {
        new Programmer("Aaron", (short) 130),
        new Person("Adolf"),
        new Programmer("Brian", (short) 180),
        new Person("Brian"),
        new Programmer("Cedric", (short) 120),
        new Programmer("Cedric", (short) 120),
        new Programmer("Zoran", (short) 200),
    };

    CompareTester.test(people);
  }
}

The result from running this is

Rule 2 violated with x=Aaron (130), y=Adolf, z=Brian (180)
Rule 2 violated with x=Aaron (130), y=Adolf, z=Zoran (200)
Rule 2 violated with x=Aaron (130), y=Brian, z=Zoran (200)
Rule 2 violated with x=Adolf, y=Brian (180), z=Aaron (130)
Rule 2 violated with x=Adolf, y=Zoran (200), z=Aaron (130)
Rule 2 violated with x=Brian (180), y=Aaron (130), z=Adolf
Rule 2 violated with x=Brian (180), y=Aaron (130), z=Brian
Rule 2 violated with x=Brian, y=Zoran (200), z=Aaron (130)
Rule 2 violated with x=Brian, y=Zoran (200), z=Brian (180)
Rule 2 violated with x=Zoran (200), y=Aaron (130), z=Adolf
Rule 2 violated with x=Zoran (200), y=Aaron (130), z=Brian
Rule 3 violated with x=Brian (180), y=Brian, z=Aaron (130)
Rule 3 violated with x=Brian (180), y=Brian, z=Zoran (200)
Rule 3 violated with x=Brian, y=Brian (180), z=Aaron (130)
Rule 3 violated with x=Brian, y=Brian (180), z=Zoran (200)
Suggestion 1 violated with x=Brian (180), y=Brian
Suggestion 1 violated with x=Brian, y=Brian (180)
Suggestion 1 violated with x=Cedric (120), y=Cedric (120)
Suggestion 1 violated with x=Cedric (120), y=Cedric (120)

For example, Programmer Aaron (130) comes before Person Adolf, as we are comparing only on name and "Aaron" comes before "Adolf". Person Adolf comes before Programmer Brian (180), as we are again only looking at the name. However, Programmer Brian (180) comes before Programmer Aaron (130). Since they are both programmers, the one with the highest IQ is on the left. We have thus violated rules 2 and 3.

We cannot correctly sort heterogenous lists, either in Java 6 or in later versions.

In Java 7, we changed from MergeSort to TimSort. This does not check the rules as such, but it throws an IllegalArgumentException when it detects that there is residue left behind from an incorrect compareTo() function. It does not say what rule was violated, because it does not know. It also won't always detect the violation, just depends how the elements are ordered. Checking rule 1 is quadratic. Rules 2 and 3 are cubic. We should not ignore these IllegalArgumentExceptions.

We could in theory enable the old MergeSort with -Djava.util.Arrays.useLegacyMergeSort=true However, the resultant list is not sorted properly, since we have not defined a correct sort order.

Since Java 8, Collections.sort(list) delegates to List.sort(Comparator). In the past, Collections.sort(list) would copy all elements to an array, sort that with Arrays.sort(), then copy them back. Since the method was moved to the List interface, we can have different specializations of the method. For example, ArrayList sorts the elements in place. Thus when we sort it for the first time, TimSort might discover that things went awry and throw the IllegalArgumentException. As a side effect, the ArrayList order would have changed. If we call it often enough, we reach a state where we do not detect any residue.

It is possible to define a correct Comparator that will work with mixed Person and Programmer objects. We could simply define that Person comes after Programmer. If we have two Programmers, then we compare them by IQ and name. Otherwise we compare using the normal compareTo() of Person. Like this:

Comparator<Person> comparator = (p1, p2) -> {
  if (p1 instanceof Programmer) {
    if (p2 instanceof Programmer)
      return p1.compareTo(p2);
    else
      return -1;
  }

  if (p2 instanceof Programmer)
    return 1;

  return p1.compareTo(p2);
};

The CompareTester confirms that we do not violate any rules. However, we should also implement the equals() and hashCode() methods. The equals() is also a bit tricky, since it needs to be symmetric and transitive. Here are my equals() methods in Person and Programmer:

public class Person implements Comparable<Person> {
  /* snip */
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || o.getClass() != Person.class) return false;
    return equalsName((Person) o);
  }
  protected boolean equalsName(Person that) {
    return Objects.equals(name, that.name);
  }
}
public class Programmer extends Person {
  /* snip */
  public boolean equals(Object o) {
    if (this == o) return true;
    if (o == null || o.getClass() != Programmer.class)
      return false;
    Programmer that = (Programmer) o;
    return iq == that.iq && super.equalsName(that);
  }
}

Our test class now becomes PersonTesterWithCorrectComparator:

import java.util.*;
import java.util.stream.*;

public class PersonTesterWithCorrectComparator {
  public static void main(String... args) {
    Person[] people = {
        new Programmer("Aaron", (short) 130),
        new Person("Adolf"),
        new Programmer("Brian", (short) 180),
        new Person("Brian"),
        new Programmer("Cedric", (short) 120),
        new Programmer("Cedric", (short) 120),
        new Programmer("Zoran", (short) 200),
    };

    Comparator<Person> comparator = (p1, p2) -> {
      if (p1 instanceof Programmer) {
        if (p2 instanceof Programmer)
          return p1.compareTo(p2);
        else
          return -1;
      }

      if (p2 instanceof Programmer)
        return 1;

      return p1.compareTo(p2);
    };
    CompareTester.test(people, comparator);

    Arrays.sort(people, comparator);
    Stream.of(people).forEach(System.out::println);
  }
}

Output is as follows:

Zoran (200)
Brian (180)
Aaron (130)
Cedric (120)
Cedric (120)
Adolf
Brian

In the comments section of my previous newsletter, you will find another solution where the compareTo functions on Person and Programmer returns a consistent natural order. To view the commments, you will need to explicitly enable the Disqus comments. GDPR and all that jazz, you know.

Kind regards from Crete

Heinz

 

Comments

We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)

When you load these comments, you'll be connected to Disqus. Privacy Statement.

Related Articles

Browse the Newsletter Archive

About the Author

Heinz Kabutz Java Conference Speaker

Java Champion, author of the Javaspecialists Newsletter, conference speaking regular... About Heinz

Superpack

Java Specialists Superpack Our entire Java Specialists Training in one huge bundle more...

Free Java Book

Dynamic Proxies in Java Book
Java Training

We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.

Java Consulting

We can help make your Java application run faster and trouble-shoot concurrency and performance bugs...