Abstract: In this newsletter we look at the difference in performance between cloning and copying an array of bytes. Beware of the microbenchmark! We show how misleading such a test can be, and explain why cloning is so much slower for small arrays.
Welcome to the 124th edition of The Java(tm) Specialists' Newsletter, which I started writing for you in Las Vegas, where I had the fortune of attending TheServerSide Java Symposium. One of the highlights of these conferences is the networking during coffee breaks, shaking hands with the likes of Ted Neward, Kirk Pepperdine, Bruce Snyder, Bruce Tate, Stephan Janssen, Rod Johnson, Adrian Colyer, Gregor Hohpe and Eugene Ciurana. At the TSSJS conferences you get easy access to the speakers. It was also great linking up with my old friends from Estonia and with subscribers from around the world, such as David Jones (who wrote J2EE Singleton for us many years ago) and Robert Lentz, a Java architect from Germany.
On Friday night Kirk Pepperdine and I led a Birds-of-a-Feather (BoF) on whether performance testing was still a relevant activity. Some people think that instead of tuning the performance, one could simply buy more hardware. This is not always the case. For example, if you are hitting the limits of memory, the garbage collector may degrade the performance of the entire system to such an extent that the only option is to fix the problem.
The difficulty with looking at performance, as we will see in this newsletter, is eliminating the noise from the problem. Beware of the microbenchmark!
A month ago, my friend Paul van Spronsen sent me this puzzle (Paul wrote an excellent newsletter on multicasting in Java):
Look at the attached class. Guess which test will be faster and then run X.main. Startling result.
import java.util.Random;

public class X {
  private final byte[] byteValue = new byte[16];

  X() {
    new Random(0).nextBytes(byteValue);
  }

  public byte[] testClone() {
    return byteValue.clone();
  }

  public byte[] testNewAndCopy() {
    byte[] b = new byte[byteValue.length];
    System.arraycopy(byteValue, 0, b, 0, byteValue.length);
    return b;
  }

  public static void main(String[] args) {
    doTest();
    doTest();
  }

  private static void doTest() {
    X x = new X();
    int m = 50000000;
    long t0 = System.currentTimeMillis();
    for (int i = 0; i < m; i++) {
      x.testClone();
    }
    long t1 = System.currentTimeMillis();
    System.out.println("clone(): " + (t1 - t0));
    t0 = System.currentTimeMillis();
    for (int i = 0; i < m; i++) {
      x.testNewAndCopy();
    }
    t1 = System.currentTimeMillis();
    System.out.println("arraycopy(): " + (t1 - t0));
  }
}
I guessed, based on previous experience, that testNewAndCopy() would be faster. The System.arraycopy() method uses JNI to copy the memory, and I knew that was fast. (I also knew my friend Paul would only send me a puzzle if the result was surprising.)

Here we see that clone() takes about five times longer than copying the array:
clone():     26598
arraycopy():  5618
clone():     26639
arraycopy():  5648
We could run off now and change all our code to use the more verbose approach instead of clone(), and expect to see an improvement in performance.
Beware of the Microbenchmark! I cannot emphasize this enough. When we encounter surprises, we need to find out why they are there. Changing code before knowing why it is slower can result in ugly, slow code. When I showed this to Kirk Pepperdine at TheServerSide Java Symposium, he suggested that I would need to look at the source code of clone() to see why there was such a large difference.
But before we do that, let's look at some more robust testing classes. First off, a class that runs a task repeatedly within a set time period. Here, higher numbers are better. This time I added some JavaDocs to explain how it works. Please let me know if you like seeing JavaDocs in the code, or whether I should strip them out.
import java.util.*;

/**
 * The PerformanceChecker tries to run the task as often as possible
 * in the allotted time. It then returns the number of times that
 * the task was called. To make sure that the timer stops the test
 * timeously, we check that the difference between start and end
 * time is less than EPSILON. After it has tried unsuccessfully for
 * MAXIMUM_ATTEMPTS times, it throws an exception.
 *
 * @author Heinz Kabutz
 * @since 2006/03/27
 */
public class PerformanceChecker {
  /**
   * Whether the test should continue running. Will expire after
   * some time specified in testTime. Needs to be volatile for
   * visibility.
   */
  private volatile boolean expired = false;
  /** The number of milliseconds that each test should run */
  private final long testTime;
  /** The task to execute for the duration of the test run. */
  private final Runnable task;
  /**
   * Accuracy of test. It must finish within 20ms of the testTime
   * otherwise we retry the test. This could be configurable.
   */
  public static final int EPSILON = 20;
  /**
   * Number of repeats before giving up with this test.
   */
  private static final int MAXIMUM_ATTEMPTS = 3;

  /**
   * Set up the number of milliseconds that the test should run, and
   * the task that should be executed during that time. The task
   * should ideally run for less than 10ms at most, otherwise you
   * will get too many retry attempts.
   *
   * @param testTime the number of milliseconds that the test should
   *                 execute.
   * @param task     the task that should be executed repeatedly
   *                 until the time is used up.
   */
  public PerformanceChecker(long testTime, Runnable task) {
    this.testTime = testTime;
    this.task = task;
  }

  /**
   * Start the test, and after the set time interrupt the test and
   * return the number of times that we were able to execute the
   * run() method of the task.
   */
  public long start() {
    long numberOfLoops;
    long start;
    int runs = 0;
    do {
      if (++runs > MAXIMUM_ATTEMPTS) {
        throw new IllegalStateException("Test not accurate");
      }
      expired = false;
      start = System.currentTimeMillis();
      numberOfLoops = 0;
      Timer timer = new Timer();
      timer.schedule(new TimerTask() {
        public void run() {
          expired = true;
        }
      }, testTime);
      while (!expired) {
        task.run();
        numberOfLoops++;
      }
      start = System.currentTimeMillis() - start;
      timer.cancel();
    } while (Math.abs(start - testTime) > EPSILON);
    collectGarbage();
    return numberOfLoops;
  }

  /**
   * After every test run, we collect garbage by calling System.gc()
   * and sleeping for a short while to make sure that the garbage
   * collector has had a chance to collect objects.
   */
  private void collectGarbage() {
    for (int i = 0; i < 3; i++) {
      System.gc();
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
  }
}
I do not like running performance tests without calculating the standard deviation of the average, otherwise we cannot detect how noisy the environment is. Here is a class to calculate mean and standard deviation:
import java.util.*;

/**
 * Calculates the mean and standard deviation of a series of
 * numbers. Not the most efficient algorithm, but fast enough.
 *
 * @author Heinz Kabutz
 */
public class Average {
  /** The set of values stored as doubles. Autoboxed. */
  private Collection<Double> values = new ArrayList<Double>();

  /**
   * Add a new value to the series. Changes the values returned by
   * mean() and stddev().
   *
   * @param value the new value to add to the series.
   */
  public void add(double value) {
    values.add(value);
  }

  /**
   * Calculate and return the mean of the series of numbers.
   * Throws an exception if this is called before the add() method.
   *
   * @return the mean of all the numbers added to the series.
   * @throws IllegalStateException if no values have been added yet.
   *         Otherwise we could cause a NullPointerException.
   */
  public double mean() {
    int elements = values.size();
    if (elements == 0) throw new IllegalStateException("No values");
    double sum = 0;
    for (double value : values) {
      sum += value;
    }
    return sum / elements;
  }

  /**
   * Calculate and return the standard deviation of the series of
   * numbers. See Stats 101 for more information...
   * Throws an exception if this is called before the add() method.
   *
   * @return the standard deviation of numbers added to the series.
   * @throws IllegalStateException if no values have been added yet.
   *         Otherwise we could cause a NullPointerException.
   */
  public double stddev() {
    double mean = mean();
    double stddevtotal = 0;
    for (double value : values) {
      double dev = value - mean;
      stddevtotal += dev * dev;
    }
    return Math.sqrt(stddevtotal / values.size());
  }
}
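As a sanity check of the mean and standard deviation formulas above, here is a minimal standalone sketch of the same calculation. AverageDemo and its helper methods are illustrative names of my own, not part of the newsletter's code:

```java
import java.util.*;

public class AverageDemo {
    /** Population mean of the values, as in Average.mean(). */
    static double mean(List<Double> values) {
        double sum = 0;
        for (double v : values) sum += v;
        return sum / values.size();
    }

    /** Population standard deviation, as in Average.stddev(). */
    static double stddev(List<Double> values) {
        double mean = mean(values);
        double total = 0;
        for (double v : values) {
            double dev = v - mean;
            total += dev * dev;
        }
        return Math.sqrt(total / values.size());
    }

    public static void main(String[] args) {
        List<Double> values = Arrays.asList(2.0, 4.0, 6.0);
        // mean of 2, 4, 6 is 4.0
        System.out.println("mean   = " + mean(values));
        // deviations are -2, 0, +2, so stddev is sqrt(8/3), about 1.633
        System.out.println("stddev = " + stddev(values));
    }
}
```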
I know we will cause noise in the system, but what I definitely want to prevent is objects ending up in the Old Space. An object is put straight into Old Space when it is larger than 512kb. Since a byte[] takes up 8 bytes for the object header and 4 bytes for the array length, the largest byte array that will fit into Young Space has 512 * 1024 - 12 elements. Try it out! We experiment with this in our Java Performance Tuning Course, an essential course if you are coding in Java.
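Assuming the 512kb threshold and the 12 bytes of overhead mentioned above (both numbers are JVM and flag dependent, so treat this as an illustration of the arithmetic rather than a universal constant), the calculation looks like this:

```java
public class LargestYoungSpaceArray {
    // 8 bytes of object header plus 4 bytes for the array length,
    // per the discussion above; actual header sizes vary by JVM.
    static final int HEADER_BYTES = 8 + 4;
    static final int THRESHOLD_BYTES = 512 * 1024;

    /** Largest byte[] length that stays under the threshold. */
    static int largestLength() {
        return THRESHOLD_BYTES - HEADER_BYTES;
    }

    public static void main(String[] args) {
        // 512 * 1024 - 12 = 524276
        System.out.println("largest byte[] length = " + largestLength());
        // allocating one of exactly that size still succeeds
        byte[] biggest = new byte[largestLength()];
        System.out.println("allocated length " + biggest.length);
    }
}
```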
Here we calculate the performance of a PerformanceChecker instance, based on the given number of runs. The result that comes back is an instance of Average. The standard deviation should be as small as possible. If it is large, then we know that there was background noise and that the values might be invalid.
/**
 * This class calculates the performance of a PerformanceChecker
 * instance, based on the given number of runs.
 *
 * @author Heinz Kabutz
 */
public class PerformanceHarness {
  /**
   * We calculate the average number of times that the check
   * executed, together with the standard deviation.
   *
   * @param check The test that we want to evaluate
   * @param runs  How many times it should be executed
   * @return an average number of times that test could run
   */
  public Average calculatePerf(PerformanceChecker check, int runs) {
    Average avg = new Average();
    // first we warm up the hotspot compiler
    check.start();
    check.start();
    for (int i = 0; i < runs; i++) {
      long count = check.start();
      avg.add(count);
    }
    return avg;
  }
}
We now need to run this with the clone and array copy tests. First, I define some code for each of these test cases.
import java.util.Random;

public class ArrayCloneTest implements Runnable {
  private final byte[] byteValue;

  public ArrayCloneTest(int length) {
    byteValue = new byte[length];
    // always the same set of bytes...
    new Random(0).nextBytes(byteValue);
  }

  public void run() {
    byte[] result = byteValue.clone();
  }
}

import java.util.Random;

public class ArrayNewAndCopyTest implements Runnable {
  private final byte[] byteValue;

  public ArrayNewAndCopyTest(int length) {
    byteValue = new byte[length];
    // always the same set of bytes...
    new Random(0).nextBytes(byteValue);
  }

  public void run() {
    byte[] b = new byte[byteValue.length];
    System.arraycopy(byteValue, 0, b, 0, byteValue.length);
  }
}
We can now run the complete benchmark by writing a CompleteTest class that tests everything thoroughly. To make this interesting, I test the difference between clone() and the new-and-copy approach for various sizes of byte[]. As mentioned before, we have to be careful not to exceed the maximum size of byte[] that can exist in the Young Space, otherwise the performance will degrade to such an extent that the whole test will fail.
public class CompleteTest {
  private static final int RUNS = 10;
  private static final long TEST_TIME = 100;

  public static void main(String[] args) throws Exception {
    test(1);
    test(10);
    test(100);
    test(1000);
    test(10000);
    test(100000);
  }

  private static void test(int length) {
    PerformanceHarness harness = new PerformanceHarness();
    Average arrayClone = harness.calculatePerf(
        new PerformanceChecker(
            TEST_TIME, new ArrayCloneTest(length)), RUNS);
    Average arrayNewAndCopy = harness.calculatePerf(
        new PerformanceChecker(
            TEST_TIME, new ArrayNewAndCopyTest(length)), RUNS);
    System.out.println("Length=" + length);
    System.out.printf("Clone %.0f\t%.0f%n",
        arrayClone.mean(), arrayClone.stddev());
    System.out.printf("Copy  %.0f\t%.0f%n",
        arrayNewAndCopy.mean(), arrayNewAndCopy.stddev());
    System.out.printf("Diff  %.2fx%n",
        arrayNewAndCopy.mean() / arrayClone.mean());
    System.out.println();
  }
}
When we now run this more comprehensive test, we see an interesting phenomenon. As the byte[] increases in size, the difference between the two techniques disappears. Why is that, you might wonder? Let's first look at the result, before we try to find out why...
Length=1
Clone 253606    19767
Copy  1282556   139950
Diff  5.06x

Length=10
Clone 240096    10105
Copy  1159128   59049
Diff  4.83x

Length=100
Clone 167628    13144
Copy  464809    43279
Diff  2.77x

Length=1000
Clone 53575     3535
Copy  68080     7455
Diff  1.27x

Length=10000
Clone 8842      162
Copy  7547      713
Diff  0.85x

Length=100000
Clone 807       19
Copy  763       90
Diff  0.95x
Oops, it seems that once the array becomes longer, the performance of the two approaches is almost equal! In fact, cloning seems marginally faster. Aren't you glad that you have not changed all your code yet?
I followed Kirk's advice and looked at the source code of the clone() method, which you will find in the JVM_Clone method in jvm.c. (You will need to download the complete JVM source code from Sun's website.) After getting my head around the old C code, I realised that it does two things in addition to the plain copying of the memory:
First, it checks that the object being cloned really is an array. Second, it checks whether the array contains object pointers; if it does, it has to be extra careful not to publish those pointers without telling the garbage collector, or you would get nasty memory leaks.
These tests are in themselves not significant, but when the rest of our work is little (such as when the byte[] is small), then this causes our experiment to skew against cloning. Let's try out what would happen if we were to add those two checks to our test:
import java.util.Random;

public class ArrayNewAndCopyTest implements Runnable {
  private final byte[] byteValue;

  public ArrayNewAndCopyTest(int length) {
    byteValue = new byte[length];
    // always the same set of bytes...
    new Random(0).nextBytes(byteValue);
  }

  public void run() {
    Class cls = byteValue.getClass();
    if (cls.isArray()) {
      if (!cls.getComponentType().isAssignableFrom(Object.class)) {
        byte[] b = new byte[byteValue.length];
        System.arraycopy(byteValue, 0, b, 0, byteValue.length);
        return;
      }
    }
    throw new RuntimeException();
  }
}
The results are now almost identical:
Length=1
Clone 237416    18007
Copy  235780    13080
Diff  0.99x

Length=10
Clone 226804    9614
Copy  231363    12176
Diff  1.02x

Length=100
Clone 153981    6809
Copy  169900    9851
Diff  1.10x

Length=1000
Clone 50406     2835
Copy  52498     2579
Diff  1.04x

Length=10000
Clone 7769      281
Copy  6807      271
Diff  0.88x

Length=100000
Clone 724       24
Copy  680       49
Diff  0.94x
This leads me to conclude that the only reason why the copying in Paul's test seemed so much faster was that he picked a byte[] size small enough for the actual copying to be dwarfed by the two if statements.
Using clone() for copying arrays is less code, and the performance difference is, as we saw, only significant for tiny arrays. I think that in future I will rather use clone() than System.arraycopy().
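One place where clone() on small arrays comes up all the time is defensive copying in accessors. A sketch of that pattern (the Token class is invented for illustration):

```java
import java.util.Arrays;

public class Token {
    private final byte[] value;

    public Token(byte[] value) {
        // defensive copy on the way in...
        this.value = value.clone();
    }

    public byte[] getValue() {
        // ...and on the way out, so callers cannot mutate our state
        return value.clone();
    }

    public static void main(String[] args) {
        byte[] original = {1, 2, 3};
        Token token = new Token(original);
        original[0] = 99; // does not affect the token
        byte[] copy = token.getValue();
        copy[1] = 99;     // does not affect the token either
        System.out.println(Arrays.toString(token.getValue()));
        // prints [1, 2, 3]
    }
}
```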
It would have been great to eliminate more noise from the experiment, but since we are testing the speed of copying of arrays, we need to create new objects all the time, which then need to be garbage collected.
An interesting method that I saw in Brian Goetz's new book, Java Concurrency in Practice (more details soon), is java.util.Arrays.copyOf(byte[] original, int newLength), which was added in JDK 6. The copyOf method uses System.arraycopy() to make a copy of the array, but is more flexible than clone() since you can make copies of parts of an array.
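On JDK 6 and later, the copyOf family can be used like this (copyOfRange is the sibling method for copying a slice of an array):

```java
import java.util.Arrays;

public class CopyOfDemo {
    public static void main(String[] args) {
        byte[] original = {10, 20, 30, 40, 50};

        // same length: equivalent to original.clone()
        byte[] full = Arrays.copyOf(original, original.length);

        // shorter copy truncates; longer copy pads with zeros
        byte[] truncated = Arrays.copyOf(original, 3);
        byte[] padded = Arrays.copyOf(original, 7);

        // copy part of an array: index 1 (inclusive) to 4 (exclusive)
        byte[] middle = Arrays.copyOfRange(original, 1, 4);

        System.out.println(Arrays.toString(full));      // [10, 20, 30, 40, 50]
        System.out.println(Arrays.toString(truncated)); // [10, 20, 30]
        System.out.println(Arrays.toString(padded));    // [10, 20, 30, 40, 50, 0, 0]
        System.out.println(Arrays.toString(middle));    // [20, 30, 40]
    }
}
```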
Our new website is now running on a dedicated server, which has stayed up beautifully. I want to make it as easy as possible for you to browse through my past newsletters. Please let me know if you think of ways to make this part of the website more navigable.
Kind regards
Heinz
P.S. Sometimes it helps to cache byte[]s, especially if they are all the same size and you need to create lots of them.
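One simple way to cache same-sized byte arrays, in the spirit of the P.S., is a per-thread scratch buffer. This is only my sketch of the idea (using the later ThreadLocal.withInitial for brevity), and it assumes callers never hold on to the buffer after use:

```java
public class BufferCache {
    private static final int BUFFER_SIZE = 16;

    // one scratch buffer per thread, reused across calls,
    // so repeated work creates no extra garbage
    private static final ThreadLocal<byte[]> BUFFER =
        ThreadLocal.withInitial(() -> new byte[BUFFER_SIZE]);

    /** Returns the calling thread's reusable buffer. */
    public static byte[] scratch() {
        return BUFFER.get();
    }

    public static void main(String[] args) {
        byte[] first = scratch();
        byte[] second = scratch();
        // the same thread gets the same array back
        System.out.println(first == second); // true
    }
}
```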
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)