Abstract: Java compilers convert String concatenation with + into StringBuffer operations. The generated bytecode is compiler specific, with javac and the Eclipse compiler producing slightly different results.
Welcome to the 68th edition of The Java(tm) Specialists' Newsletter, sent to 6400 Java Specialists in 95 countries.
Since our last newsletter, two famous Java authors have joined the ranks of subscribers. It gives me great pleasure to welcome Mark Grand and Bill Venners to our list. Mark is famous for his three volumes of Java Design Patterns books. You will notice that I quote Mark in the brochure of my Design Patterns course. Bill is famous for his book Inside The Java Virtual Machine [ISBN 0071350934]. Bill also does a lot of training work with Bruce Eckel.
Our last newsletter on BASIC Java produced gasps of disbelief. Some readers told me that they now wanted to unsubscribe, which of course I supported 100%. Others enjoyed it with me. It was meant in humour, as the warnings at the beginning of the newsletter clearly indicated.
The first thing I look for when I am asked to find out why some code is slow is String concatenation. When we concatenate Strings with +=, a whole lot of objects are constructed: every += creates a new StringBuffer, copies the old contents into it and then creates a new String.
Before we can look at an example, we need to define a Timer class that we will use for measuring performance:
/**
 * Class used to measure the time that a task takes to execute.
 * The method "time" prints out how long it took and returns
 * the time.
 */
public class Timer {
  /**
   * This method runs the Runnable and measures how long it takes
   * @param r is the Runnable for the task that we want to measure
   * @return the time it took to execute this task
   */
  public static long time(Runnable r) {
    long time = -System.currentTimeMillis();
    r.run();
    time += System.currentTimeMillis();
    System.out.println("Took " + time + "ms");
    return time;
  }
}
In the test case, we have three tasks that we want to measure. The first is a simple += String append of 10000 numbers, which turns out to be extremely slow. The second creates a StringBuffer and calls its append method 300 times as often (3,000,000 appends). The third creates the StringBuffer with the correct initial size (19888890 characters) and then performs the same appends. After I have presented the code, I will explain what happens and why.
public class StringAppendDiff {
  public static void main(String[] args) {
    System.out.println("String += 10000 additions");
    Timer.time(new Runnable() {
      public void run() {
        String s = "";
        for(int i = 0; i < 10000; i++) {
          s += i;
        }
        // we have to use "s" in some way, otherwise a clever
        // compiler would optimise it away. Not that I have
        // any such compiler, but just in case ;-)
        System.out.println("Length = " + s.length());
      }
    });
    System.out.println(
      "StringBuffer 300 * 10000 additions initial size wrong");
    Timer.time(new Runnable() {
      public void run() {
        StringBuffer sb = new StringBuffer();
        for(int i = 0; i < (300 * 10000); i++) {
          sb.append(i);
        }
        String s = sb.toString();
        System.out.println("Length = " + s.length());
      }
    });
    System.out.println(
      "StringBuffer 300 * 10000 additions initial size right");
    Timer.time(new Runnable() {
      public void run() {
        StringBuffer sb = new StringBuffer(19888890);
        for(int i = 0; i < (300 * 10000); i++) {
          sb.append(i);
        }
        String s = sb.toString();
        System.out.println("Length = " + s.length());
      }
    });
  }
}
This program uses quite a bit of memory, so you should set the maximum heap size quite large, for example 256 MB. You can do that with the -Xmx256m flag. When we run this program, we get the following output:
String += 10000 additions
Length = 38890
Took 2203ms
StringBuffer 300 * 10000 additions initial size wrong
Length = 19888890
Took 2254ms
StringBuffer 300 * 10000 additions initial size right
Length = 19888890
Took 1562ms
Since the StringBuffer tests perform 300 times as many appends in roughly the same time, using StringBuffer directly is about 300 times faster than using +=. We can also see that setting the correct initial size brings the time down from 2254ms to 1562ms. This is because of the way that java.lang.StringBuffer works. When you create a new StringBuffer with the no-argument constructor, it creates a char[] of size 16. When you append and there is no space left in the char[], a new array of roughly double the size is allocated and the existing characters are copied into it. If you size the buffer correctly up front, you avoid constructing and copying all those intermediate char[]s.
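To put a number on that growth, here is a rough simulation of the second and third tests. It does not use the real StringBuffer; it assumes a JDK 1.4-style growth rule (new capacity = (old capacity + 1) * 2, or the required size if that is larger), and the class name GrowthSimulation is made up for this sketch.

```java
/**
 * Rough simulation of how often a StringBuffer's char[] is replaced while
 * appending the numbers 0 .. n-1. The growth rule is an ASSUMPTION modelled
 * on JDK 1.4: new capacity = (old + 1) * 2, or the required size if larger.
 */
public class GrowthSimulation {
    static int countReallocations(int initialCapacity, int n) {
        int capacity = initialCapacity;
        int length = 0;
        int reallocations = 0;
        for (int i = 0; i < n; i++) {
            int needed = length + Integer.toString(i).length();
            if (needed > capacity) {
                // a new, larger char[] is allocated and the old chars are copied
                capacity = Math.max((capacity + 1) * 2, needed);
                reallocations++;
            }
            length = needed;
        }
        return reallocations;
    }

    public static void main(String[] args) {
        int n = 300 * 10000;
        System.out.println("initial size 16:       "
            + countReallocations(16, n) + " reallocations");
        System.out.println("initial size 19888890: "
            + countReallocations(19888890, n) + " reallocations");
    }
}
```

Under that assumed rule, the default buffer of 16 chars is replaced 21 times before it can hold all 19888890 characters (ending with a capacity of 37748734), while the correctly sized buffer is never replaced. Each replacement also copies everything appended so far.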
The time that the += String append takes depends on the compiler that you use to compile the code. I discovered this accidentally during my Java course last week and, much to my embarrassment, I did not know why. If you compile the code from within Eclipse, you get the result above; if you compile it with Sun's javac, you get the output below. I thought that Eclipse might use jikes to compile the code, but Eclipse actually has its own internal compiler.
String += 10000 additions
Length = 38890
Took 7912ms
StringBuffer 300 * 10000 additions initial size wrong
Length = 19888890
Took 2634ms
StringBuffer 300 * 10000 additions initial size right
Length = 19888890
Took 1822ms
This took some head-scratching, resulting in my fingers being full of wood splinters. I started by writing a class that did the basic String append with +=.
public class BasicStringAppend {
  public BasicStringAppend() {
    String s = "";
    for(int i = 0; i < 100; i++) {
      s += i;
    }
  }
}
When in doubt about what the compiler does, disassemble the classes. Even after I had disassembled them, it took a while before I figured out what the difference was and why it was important. The instructions that differ are at offsets 16 to 20. You can disassemble a class with the javap tool, which is in the bin directory of your Java installation. Use the -c option to print the bytecode:
javap -c BasicStringAppend
Compiled with Eclipse:

Compiled from BasicStringAppend.java
public class BasicStringAppend extends java.lang.Object {
    public BasicStringAppend();
}

Method BasicStringAppend()
   0 aload_0
   1 invokespecial #9 <Method java.lang.Object()>
   4 ldc #11 <String "">
   6 astore_1
   7 iconst_0
   8 istore_2
   9 goto 34
  12 new #13 <Class java.lang.StringBuffer>
  15 dup
  16 aload_1
  17 invokestatic #19 <Method java.lang.String valueOf(java.lang.Object)>
  20 invokespecial #22 <Method java.lang.StringBuffer(java.lang.String)>
  23 iload_2
  24 invokevirtual #26 <Method java.lang.StringBuffer append(int)>
  27 invokevirtual #30 <Method java.lang.String toString()>
  30 astore_1
  31 iinc 2 1
  34 iload_2
  35 bipush 100
  37 if_icmplt 12
  40 return
Compiled with Sun's javac:

Compiled from BasicStringAppend.java
public class BasicStringAppend extends java.lang.Object {
    public BasicStringAppend();
}

Method BasicStringAppend()
   0 aload_0
   1 invokespecial #1 <Method java.lang.Object()>
   4 ldc #2 <String "">
   6 astore_1
   7 iconst_0
   8 istore_2
   9 goto 34
  12 new #3 <Class java.lang.StringBuffer>
  15 dup
  16 invokespecial #4 <Method java.lang.StringBuffer()>
  19 aload_1
  20 invokevirtual #5 <Method java.lang.StringBuffer append(java.lang.String)>
  23 iload_2
  24 invokevirtual #6 <Method java.lang.StringBuffer append(int)>
  27 invokevirtual #7 <Method java.lang.String toString()>
  30 astore_1
  31 iinc 2 1
  34 iload_2
  35 bipush 100
  37 if_icmplt 12
  40 return
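Instead of explaining what every line does (which I hope should not be necessary in a Java Specialists' Newsletter), I present the equivalent Java code for both IBM's Eclipse and Sun's javac. The difference, which corresponds to the difference in the disassembly, is in how the StringBuffer is constructed: Eclipse passes String.valueOf(s) to the StringBuffer(String) constructor, whereas javac calls the no-argument constructor and then append(s).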
public class IbmBasicStringAppend {
  public IbmBasicStringAppend() {
    String s = "";
    for(int i = 0; i < 100; i++) {
      s = new StringBuffer(String.valueOf(s)).append(i).toString();
    }
  }
}
public class SunBasicStringAppend {
  public SunBasicStringAppend() {
    String s = "";
    for(int i = 0; i < 100; i++) {
      s = new StringBuffer().append(s).append(i).toString();
    }
  }
}
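Why would the javac version take so much longer? One plausible explanation, sketched here as a rough cost model rather than a measurement, is how many chars each translation allocates and copies per iteration. The model assumes a JDK 1.4-style growth rule ((old capacity + 1) * 2, or the required size if larger) and that StringBuffer(String) starts with a capacity of s.length() + 16; AppendCostModel and its methods are hypothetical names.

```java
/**
 * Rough cost model (an ASSUMPTION, not a measurement) for one iteration of
 * s += i under each compiler's translation, using a JDK 1.4-style growth
 * rule: new capacity = (old + 1) * 2, or the required size if larger.
 */
public class AppendCostModel {
    /** Tracks the char[]s a StringBuffer allocates and the chars it copies. */
    static final class BufferModel {
        int capacity;
        int count;
        long allocated; // total chars in all char[]s created
        long copied;    // total chars copied (old contents + appended text)

        BufferModel(int initialCapacity) {
            capacity = initialCapacity;
            allocated = initialCapacity;
        }

        void append(int len) {
            int needed = count + len;
            if (needed > capacity) {
                capacity = Math.max((capacity + 1) * 2, needed);
                allocated += capacity;
                copied += count; // old contents moved into the new array
            }
            copied += len;       // the appended characters themselves
            count = needed;
        }
    }

    /** javac: new StringBuffer().append(s).append(i) */
    static BufferModel javac(int sLength, int digits) {
        BufferModel sb = new BufferModel(16);
        sb.append(sLength);
        sb.append(digits);
        return sb;
    }

    /** Eclipse: new StringBuffer(String.valueOf(s)).append(i), sized s.length() + 16 */
    static BufferModel eclipse(int sLength, int digits) {
        BufferModel sb = new BufferModel(sLength + 16);
        sb.append(sLength);
        sb.append(digits);
        return sb;
    }

    public static void main(String[] args) {
        // hypothetical point in the loop: s is 10000 chars long, i has 4 digits
        BufferModel j = javac(10000, 4);
        BufferModel e = eclipse(10000, 4);
        System.out.println("javac:   allocated " + j.allocated + ", copied " + j.copied);
        System.out.println("Eclipse: allocated " + e.allocated + ", copied " + e.copied);
    }
}
```

For a 10000-char s and a 4-digit i, the model gives 30018 chars allocated and 20004 copied for the javac translation, against 10016 and 10004 for Eclipse's: the javac version grows its buffer twice, once to fit s and again to fit i. If toString() shares the char[] with the new String, as JDK 1.4 does, each intermediate String from the javac version also keeps an array about twice its length alive. That would fit the slower timing, but it is a model, not a proof.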
It does not actually matter which compiler is better; both are terrible. The answer is to avoid += with Strings wherever possible.
You should never reuse a StringBuffer object. Construct it, fill it, convert it to a String, and then throw it away.
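A minimal sketch of that advice, applied to the loop from BasicStringAppend: one StringBuffer is created per call, filled, converted with toString(), and then simply dropped. The class FreshBufferAppend and its numbersUpTo method are illustrative names, not part of any library.

```java
/**
 * Builds the same String as the += loop in BasicStringAppend, but with a
 * single StringBuffer that is created, filled, converted and then discarded.
 * Class and method names are hypothetical.
 */
public class FreshBufferAppend {
    /** Returns "0123...(n-1)" without using += on a String. */
    static String numbersUpTo(int n) {
        StringBuffer sb = new StringBuffer(); // a new buffer on every call
        for (int i = 0; i < n; i++) {
            sb.append(i);
        }
        return sb.toString(); // sb goes out of scope and is never reused
    }

    public static void main(String[] args) {
        String s = numbersUpTo(100);
        System.out.println("Length = " + s.length());
    }
}
```

Calling numbersUpTo(100) returns the same 190-character String that the += loop builds, but with one buffer instead of a new StringBuffer and String on every iteration. If the final length is known, passing it to the StringBuffer(int) constructor avoids the regrowth as well.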
Why is this? StringBuffer contains a char[] which holds the characters to be used for the String. When you call toString() on the StringBuffer in JDK 1.4, does it make a copy of the char[]? No, it assumes that you will throw the StringBuffer away and constructs a String with a pointer to the same char[] that is contained inside the StringBuffer! If you do change the StringBuffer after creating a String, it first makes a copy of the char[] and uses that internally. Do yourself a favour and read the source code of StringBuffer: it is enlightening.
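The sharing trick can be illustrated with a small copy-on-write model. This is a hypothetical, simplified stand-in, not the JDK source: ModelBuffer and ModelString are invented names, and only append, setCharAt and the toString-style conversion are modelled.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of copy-on-write sharing between a buffer
 * and the Strings it creates. NOT the JDK source; names are invented.
 */
public class SharingModel {
    /** Stand-in for String: keeps a reference to a char[] and a length. */
    static final class ModelString {
        final char[] value;
        final int count;
        ModelString(char[] value, int count) { this.value = value; this.count = count; }
        @Override public String toString() { return new String(value, 0, count); }
    }

    static final class ModelBuffer {
        char[] value = new char[16];
        int count;
        boolean shared; // true while a ModelString points at value

        ModelBuffer append(String s) {
            int needed = count + s.length();
            if (needed > value.length) {
                // growing allocates a new array, so sharing ends automatically
                value = Arrays.copyOf(value, Math.max((value.length + 1) * 2, needed));
                shared = false;
            }
            // writing past count is safe even when shared: the string only sees 0..count-1
            s.getChars(0, s.length(), value, count);
            count = needed;
            return this;
        }

        void setCharAt(int index, char c) {
            if (shared) { // copy before overwriting chars a ModelString can see
                value = value.clone();
                shared = false;
            }
            value[index] = c;
        }

        ModelString toModelString() {
            shared = true;
            return new ModelString(value, count); // no copy of the char[]
        }
    }

    public static void main(String[] args) {
        ModelBuffer sb = new ModelBuffer().append("hello");
        ModelString s = sb.toModelString();
        System.out.println(s.value == sb.value); // same array after conversion
        sb.setCharAt(0, 'j');
        System.out.println(s.value == sb.value); // buffer copied on write
        System.out.println(s + " " + new String(sb.value, 0, sb.count));
    }
}
```

Running main prints true, then false, then "hello jello": right after the conversion both objects point at the same array, and the first write to an existing character gives the buffer its own copy. Appending past count needs no copy, because the String never looks beyond its own length.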
But it gets worse than this. In JDK 1.4.1, Sun changed the way that setLength() works. Before 1.4.1, it was safe to do the following:
...
// StringBuffer sb defined somewhere else
sb.append(...);
sb.append(...);
sb.append(...);
String s = sb.toString();
sb.setLength(0);
Before 1.4.1, setLength contained the following snippet:
if (count < newLength) {
  // *snip*
} else {
  count = newLength;
  if (shared) {
    if (newLength > 0) {
      copy();
    } else {
      // If newLength is zero, assume the StringBuffer is being
      // stripped for reuse; Make new buffer of default size
      value = new char[16];
      shared = false;
    }
  }
}
It was replaced in the 1.4.1 version with:
if (count < newLength) {
  // *snip*
} else {
  count = newLength;
  if (shared) copy();
}
Therefore, if you reuse a StringBuffer in JDK 1.4.1 and any one of the Strings created with it is big, all future Strings will have a char[] of that same large size: setLength(0) now calls copy(), which keeps the full capacity, and each toString() shares that large array with the new String. This is not very kind of Sun, since it causes bugs in many libraries. However, my argument is that you should not have reused StringBuffers anyway, since simply creating a new one has less overhead than setting the length back to zero.
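The effect on memory can be sketched with a capacity-only model of the two setLength versions shown above. It is hypothetical (SetLengthModel is an invented name), ignores the actual characters, and assumes that the buffer grows to (old capacity + 1) * 2 or the required size, that toString() shares the char[], and that copy() allocates an array of the same capacity.

```java
/**
 * Capacity-only model of reusing a StringBuffer via setLength(0).
 * Hypothetical sketch under stated ASSUMPTIONS, not the JDK source.
 */
public class SetLengthModel {
    static final class BufferModel {
        final boolean jdk141; // true: 1.4.1 setLength, false: pre-1.4.1
        int capacity = 16;    // length of the internal char[]
        int count;
        boolean shared;       // a String currently shares the char[]

        BufferModel(boolean jdk141) { this.jdk141 = jdk141; }

        void append(int chars) {
            int needed = count + chars;
            if (needed > capacity) {
                capacity = Math.max((capacity + 1) * 2, needed);
                shared = false; // fresh array, nobody else sees it
            }
            count = needed;
        }

        /** Models toString(): returns the size of the char[] the new String keeps alive. */
        int toStringRetained() {
            shared = true;
            return capacity;
        }

        void setLengthZero() {
            count = 0;
            if (shared) {
                if (!jdk141) {
                    capacity = 16; // pre-1.4.1: fresh buffer of default size
                }
                // 1.4.1: copy() keeps an array of the same (large) capacity
                shared = false;
            }
        }
    }

    /** Builds one big String, "reuses" the buffer, then builds a tiny one. */
    static int retainedBySmallString(boolean jdk141) {
        BufferModel sb = new BufferModel(jdk141);
        sb.append(1_000_000);
        sb.toStringRetained();
        sb.setLengthZero();
        sb.append(10);
        return sb.toStringRetained();
    }

    public static void main(String[] args) {
        System.out.println("pre-1.4.1: " + retainedBySmallString(false) + " chars retained");
        System.out.println("1.4.1:     " + retainedBySmallString(true) + " chars retained");
    }
}
```

Under these assumptions, a 10-character String built after a 1,000,000-character one keeps a 16-char array alive with the pre-1.4.1 code, but a 1,000,000-char array with the 1.4.1 code.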
This memory leak was pointed out to me by Andrew Shearman during one of my courses. Thank you very much! For more information, you can visit Sun's website.
When you read those posts, it becomes apparent that JDOM reuses StringBuffers extensively. It was probably a bit mean to change StringBuffer's setLength() method, although I do not think that it is a bug; it simply highlights bugs in many libraries.
For those of you that use JDOM, I hope that JDOM will be fixed soon to cater for this change in the JDK. For the rest of us, let us remember to throw away used StringBuffers.
So long...
Heinz