Running on Java 24-ea+28-3562 (Preview)
Home of The JavaSpecialists' Newsletter

124Copying Arrays Fast

Author: Dr. Heinz M. KabutzDate: 2006-03-28Java Version: 5Category: Performance
 

Abstract: In this newsletter we look at the difference in performance between cloning and copying an array of bytes. Beware of the Microbenchmark! We also show how misleading such a test can be, but explain why the cloning is so much slower for small arrays.

 

Welcome to the 124th edition of The Java(tm) Specialists' Newsletter, which I started writing for you in Las Vegas, where I had the fortune of attending TheServerSide Java Symposium. One of the highlights of these conferences is the networking during coffee breaks. Shaking hands with the likes of Ted Neward, Kirk Pepperdine, Bruce Snyder, Bruce Tate, Stephan Janssen, Rod Johnson, Adrian Coyler, Gregor Hohpe and Eugene Ciurana. At the TSSJS conferences you can get easy access to the speakers. It was also great linking up with my old friends from Estonia and with suscribers from around the world, such as David Jones (who wrote J2EE Singleton for us many years ago) and Robert Lentz, a Java Architect from Germany.

On Friday night Kirk Pepperdine and I led a Birds-of-a-Feather (BoF) on whether performance testing was still a relevant activity. Some people think that instead of tuning the performance, one could simply buy more hardware. This is not always the case. For example, if you are hitting the limits of memory, the garbage collector may degrade the performance of the entire system to such an extent that the only option is to fix the problem.

The difficulty with looking at performance, as we will see in this newsletter, is eliminating the noise from the problem. Beware of the microbenchmark!

javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.

Copying Arrays Fast

A month ago, my friend Paul van Spronsen sent me this puzzle (Paul wrote an excellent newsletter on multicasting in Java):

Look at the attached class. Guess which test will be faster and then run X.main. Startling result.

import java.util.Random;

public class X {
  private final byte[] byteValue = new byte[16];

  X() {
    new Random(0).nextBytes(byteValue);
  }

  public byte[] testClone() {
    return byteValue.clone();
  }

  public byte[] testNewAndCopy() {
    byte[] b = new byte[byteValue.length];
    System.arraycopy(byteValue, 0, b, 0, byteValue.length);
    return b;
  }

  public static void main(String[] args) {
    doTest();
    doTest();
  }

  private static void doTest() {
    X x = new X();
    int m = 50000000;

    long t0 = System.currentTimeMillis();
    for (int i = 0; i < m; i++) {
      x.testClone();
    }
    long t1 = System.currentTimeMillis();

    System.out.println("clone(): " + (t1 - t0));

    t0 = System.currentTimeMillis();
    for (int i = 0; i < m; i++) {
      x.testNewAndCopy();
    }
    t1 = System.currentTimeMillis();

    System.out.println("arraycopy(): " + (t1 - t0));
  }
}

I guessed, based on previous experience, that the testNewAndCopy() would be faster. The System.arrayCopy() method uses JNI to copy the memory and I knew that was fast. (I also knew my friend Paul would only send me a puzzle if the result was surprising) Here we see that clone takes about 5 times longer than copying the array:

    clone(): 26598
    arraycopy(): 5618
    clone(): 26639
    arraycopy(): 5648

We could run off now and change all our code to use the more verbose approach instead of clone(), and expect to see an improvement in performance.

Beware of the Microbenchmark! I cannot emphasize this enough. When we encounter surprises, we need to find out why they are there. Changing code before knowing why it is slower can result in ugly, slow code. When I showed this to Kirk Pepperdine at TheServerSide Java Symposium, he suggested that I would need to look at the source code of clone() to see why there was such a large difference.

But before we do that, let's have a look at some more robust testing classes. First off, a class that runs a method repeatedly within some time period. Here, higher numbers are better. This time I added some JavaDocs to explain how it works. Please let me know if you like seeing JavaDocs in the code, or if I can strip them out?

import java.util.*;

/**
 * The PerformanceChecker tries to run the task as often as possible
 * in the allotted time.  It then returns the number of times that
 * the task was called.  To make sure that the timer stops the test
 * timeously, we check that the difference between start and end
 * time is less than EPSILON.  After it has tried unsuccessfully for
 * MAXIMUM_ATTEMPTS times, it throws an exception.
 *
 * @author Heinz Kabutz
 * @since 2006/03/27
 */
public class PerformanceChecker {
  /**
   * Whether the test should continue running.  Will expire after
   * some time specified in testTime.  Needs to be volatile for
   * visibility.
   */
  private volatile boolean expired = false;
  /** The number of milliseconds that each test should run */
  private final long testTime;
  /** The task to execute for the duration of the test run. */
  private final Runnable task;
  /**
   * Accuracy of test.  It must finish within 20ms of the testTime
   * otherwise we retry the test.  This could be configurable.
   */
  public static final int EPSILON = 20;
  /**
   * Number of repeats before giving up with this test.
   */
  private static final int MAXIMUM_ATTEMPTS = 3;

  /**
   * Set up the number of milliseconds that the test should run, and
   * the task that should be executed during that time.  The task
   * should ideally run for less than 10ms at most, otherwise you
   * will get too many retry attempts.
   *
   * @param testTime the number of milliseconds that the test should
   *                 execute.
   * @param task     the task that should be executed repeatedly
   *                 until the time is used up.
   */
  public PerformanceChecker(long testTime, Runnable task) {
    this.testTime = testTime;
    this.task = task;
  }

  /**
   * Start the test, and after the set time interrupt the test and
   * return the number of times that we were able to execute the
   * run() method of the task.
   */
  public long start() {
    long numberOfLoops;
    long start;
    int runs = 0;
    do {
      if (++runs > MAXIMUM_ATTEMPTS) {
        throw new IllegalStateException("Test not accurate");
      }
      expired = false;
      start = System.currentTimeMillis();
      numberOfLoops = 0;
      Timer timer = new Timer();
      timer.schedule(new TimerTask() {
        public void run() {
          expired = true;
        }
      }, testTime);
      while (!expired) {
        task.run();
        numberOfLoops++;
      }
      start = System.currentTimeMillis() - start;
      timer.cancel();
    } while (Math.abs(start - testTime) > EPSILON);
    collectGarbage();
    return numberOfLoops;
  }

  /**
   * After every test run, we collect garbage by calling System.gc()
   * and sleeping for a short while to make sure that the garbage
   * collector has had a chance to collect objects.
   */
  private void collectGarbage() {
    for (int i = 0; i < 3; i++) {
      System.gc();
      try {
        Thread.sleep(10);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        break;
      }
    }
  }
}

I do not like running performance tests without calculating the standard deviation of the average, otherwise we cannot detect how noisy the environment is. Here is a class to calculate mean and standard deviation:

import java.util.*;

/**
 * Calculates the mean and average of a series of numbers.  Not the
 * most efficient algorithm, but fast enough.
 *
 * @author Heinz Kabutz
 */
public class Average {
  /** The set of values stored as doubles.  Autoboxed. */
  private Collection<Double> values = new ArrayList<Double>();

  /**
   * Add a new value to the series.  Changes the values returned by
   * mean() and stddev().
   * @param value the new value to add to the series.
   */
  public void add(double value) {
    values.add(value);
  }

  /**
   * Calculate and return the mean of the series of numbers.
   * Throws an exception if this is called before the add() method.
   * @return the mean of all the numbers added to the series.
   * @throws IllegalStateException if no values have been added yet.
   * Otherwise we could cause a NullPointerException.
   */
  public double mean() {
    int elements = values.size();
    if (elements == 0) throw new IllegalStateException("No values");
    double sum = 0;
    for (double value : values) {
      sum += value;
    }
    return sum / elements;
  }

  /**
   * Calculate and return the standard deviation of the series of
   * numbers.  See Stats 101 for more information...
   * Throws an exception if this is called before the add() method.
   * @return the standard deviation of numbers added to the series.
   * @throws IllegalStateException if no values have been added yet.
   * Otherwise we could cause a NullPointerException.
   */
  public double stddev() {
    double mean = mean();
    double stddevtotal = 0;
    for (double value : values) {
      double dev = value - mean;
      stddevtotal += dev * dev;
    }
    return Math.sqrt(stddevtotal / values.size());
  }
}

I know we will cause noise in the system, but what I definitely want to prevent is objects ending up in the Old Space. The point at which an object is put into Old Space is when it is larger than 512kb. Since byte[] takes up 8 bytes for the object pointer and 4 bytes for the array length, the largest byte array that will fit into Young Space is 512 * 1024 - 12. Try it out! We experiment with this in our Java Performance Tuning Course, and essential course if you are coding in Java.

Here we calculate the performance of a PerformanceChecker instance, based on the given number of runs. The result that comes back is an instance of Average. The standard deviation should be as small as possible. If it is large, then we know that there was background noise and that the values might or might not be invalid.

/**
 * This class calculates the performance of a PerformanceChecker
 * instance, based on the given number of runs.
 *
 * @author Heinz Kabutz
 */
public class PerformanceHarness {
  /**
   * We calculate the average number of times that the check
   * executed, together with the standard deviation.
   * @param check The test that we want to evaluate
   * @param runs How many times it should be executed
   * @return an average number of times that test could run
   */
  public Average calculatePerf(PerformanceChecker check, int runs) {
    Average avg = new Average();
    // first we warm up the hotspot compiler
    check.start(); check.start();
    for(int i=0; i < runs; i++) {
      long count = check.start();
      avg.add(count);
    }
    return avg;
  }
}

We now need to run this with the clone and array copy tests. First, I define some code for each of these test cases.

import java.util.Random;

public class ArrayCloneTest implements Runnable {
  private final byte[] byteValue;

  public ArrayCloneTest(int length) {
    byteValue = new byte[length];
    // always the same set of bytes...
    new Random(0).nextBytes(byteValue);
  }

  public void run() {
    byte[] result = byteValue.clone();
  }
}


import java.util.Random;

public class ArrayNewAndCopyTest implements Runnable {
  private final byte[] byteValue;

  public ArrayNewAndCopyTest(int length) {
    byteValue = new byte[length];
    // always the same set of bytes...
    new Random(0).nextBytes(byteValue);
  }

  public void run() {
    byte[] b = new byte[byteValue.length];
    System.arraycopy(byteValue, 0, b, 0, byteValue.length);
  }
}

We can now run the complete benchmark, by writing a CompleteTest class that tests everything thoroughly. In order to make this interesting, I test the difference between clone() and copy() for various sizes of byte[]. As mentioned before, we have to be careful not to exceed the maximum size of byte[] that can exist in the Young Space, otherwise the performance will degrade to such an extent that the whole test will fail.

public class CompleteTest {
  private static final int RUNS = 10;
  private static final long TEST_TIME = 100;

  public static void main(String[] args) throws Exception {
    test(1);
    test(10);
    test(100);
    test(1000);
    test(10000);
    test(100000);
  }

  private static void test(int length) {
    PerformanceHarness harness = new PerformanceHarness();
    Average arrayClone = harness.calculatePerf(
        new PerformanceChecker(TEST_TIME,
            new ArrayCloneTest(length)), RUNS);
    Average arrayNewAndCopy = harness.calculatePerf(
        new PerformanceChecker(TEST_TIME,
            new ArrayNewAndCopyTest(length)), RUNS);

    System.out.println("Length=" + length);
    System.out.printf("Clone %.0f\t%.0f%n",
        arrayClone.mean(), arrayClone.stddev());
    System.out.printf("Copy  %.0f\t%.0f%n",
        arrayNewAndCopy.mean(), arrayNewAndCopy.stddev());
    System.out.printf("Diff  %.2fx%n",
        arrayNewAndCopy.mean() / arrayClone.mean());
    System.out.println();
  }
}

When we now run this more comprehensive test, we see an interesting phenomenon. As the byte[] increases in size, the difference between the two techniques disappears. Why is that, you might wonder? Let's first look at the result, before we try to find out why...

    Length=1
    Clone 253606    19767
    Copy  1282556   139950
    Diff  5.06x

    Length=10
    Clone 240096    10105
    Copy  1159128   59049
    Diff  4.83x

    Length=100
    Clone 167628    13144
    Copy  464809    43279
    Diff  2.77x

    Length=1000
    Clone 53575     3535
    Copy  68080     7455
    Diff  1.27x

    Length=10000
    Clone 8842      162
    Copy  7547      713
    Diff  0.85x

    Length=100000
    Clone 807       19
    Copy  763       90
    Diff  0.95x

Oops, it seems that once the array becomes longer, the performance of the two is almost equal! Infact, the cloning seems marginally faster. Aren't you glad that you have not changed all your code yet?

I followed Kirk's advice and looked at the source code of the clone() method, which you will find in the JVM_Clone method in jvm.c (You will need to download the complete JVM source code from Sun's website). After getting my head around the old C code, I realised that it does two things in addition to plain copying of the memory:

  1. Check whether instance is normal object or array.
  2. Check whether array is of primitives or objects.

It has to be extra careful when copying an object array to not publish pointers without telling the garbage collector, or you would get nasty memory leaks.

These tests are in themselves not significant, but when the rest of our work is little (such as when the byte[] is small), then this causes our experiment to skew against cloning. Let's try out what would happen if we were to add those two checks to our test:

import java.util.Random;

public class ArrayNewAndCopyTest implements Runnable {
  private final byte[] byteValue;

  public ArrayNewAndCopyTest(int length) {
    byteValue = new byte[length];
    // always the same set of bytes...
    new Random(0).nextBytes(byteValue);
  }

  public void run() {
    Class cls = byteValue.getClass();
    if (cls.isArray()) {
      if (!cls.getComponentType().isAssignableFrom(Object.class)) {
        byte[] b = new byte[byteValue.length];
        System.arraycopy(byteValue, 0, b, 0, byteValue.length);
        return;
      }
    }
    throw new RuntimeException();
  }
}

The results are now almost identical:

    Length=1
    Clone 237416	18007
    Copy  235780	13080
    Diff  0.99x

    Length=10
    Clone 226804	9614
    Copy  231363	12176
    Diff  1.02x

    Length=100
    Clone 153981	6809
    Copy  169900	9851
    Diff  1.10x

    Length=1000
    Clone 50406	2835
    Copy  52498	2579
    Diff  1.04x

    Length=10000
    Clone 7769	281
    Copy  6807	271
    Diff  0.88x

    Length=100000
    Clone 724	24
    Copy  680	49
    Diff  0.94x

This leads me to conclude that the only reason why Paul's test seemed so much faster was because he picked a byte[] size that was small enough that the actual copying was dwarfed by the two if statements. Using clone() for copying arrays is less code and the performance difference is, as we saw, only significant for tiny arrays. I think that in future I will rather use clone() than System.arrayCopy().

It would have been great to eliminate more noise from the experiment, but since we are testing the speed of copying of arrays, we need to create new objects all the time, which then need to be garbage collected.

An interesting method that I saw in Brian Goetz's new book on Java Concurrency in Practice (more details soon) is java.util.Arrays.copyOf(byte[] original, int newLength) which was added in JDK 6. The copyOf method uses System.arrayCopy() to make a copy of the array, but is more flexible than clone since you can make copies of parts of an array.

Our new website is now running on a dedicated server, which has stayed up beautifully. I want to make it as easy as possible for you to browse through my past newsletters. Please let me know if you think of ways to make this part of the website more navigable.

Kind regards

Heinz

P.S. Sometimes it helps to cache byte[]'s especially if they are all of the same size and you need to create lots of them.

 

Comments

We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)

When you load these comments, you'll be connected to Disqus. Privacy Statement.

Related Articles

Browse the Newsletter Archive

About the Author

Heinz Kabutz Java Conference Speaker

Java Champion, author of the Javaspecialists Newsletter, conference speaking regular... About Heinz

Superpack '23

Superpack '24 Our entire Java Specialists Training in one huge bundle more...

Free Java Book

Dynamic Proxies in Java Book
Java Training

We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.

Java Consulting

We can help make your Java application run faster and trouble-shoot concurrency and performance bugs...