Abstract: How much memory was wasted when an additional boolean field was added to java.lang.String in Java 13? None at all. This article explains why.
Welcome to the 278th edition of The Java(tm) Specialists' Newsletter, sent to you from the stunning Island of Crete. During the lockdown period, we are fortunately still allowed to go out for exercise. Thus my daily runs are continuing. I regularly share the lovely views on @heinzkabutz.
My book "Dynamic Proxies in Java" has now been published and you can get your free copy of the e-book from InfoQ.
Last month, in newsletter 277, I wrote about a change in Java 13 that prevented having to recalculate the hash code of a String in the unlikely case that it was 0. I saw several objections to the change, asking why Oracle had added another field to String, thus increasing its memory consumption.
Object size in Java is somewhat hard to determine. We do not have a sizeof operator, and the size varies by system. For example, in a 64-bit JVM with compressed OOPs, we use 4 bytes for a reference and 12 bytes for the object header. If our JVM is configured with a maximum heap of 32 GB or more, compressed OOPs are switched off, so a reference is 8 bytes and the object header is 16 bytes.
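If you want to check which of these two cases applies to a running JVM, one option on HotSpot is to read the UseCompressedOops flag through the HotSpotDiagnosticMXBean. The following is a minimal sketch, assuming a HotSpot-based JVM. The class name CompressedOopsCheck is made up for this illustration, and the method falls back to "unknown" on JVMs that do not offer the bean or the flag. It only reports the reference-compression setting, not the header size itself.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper for illustration only, assuming a HotSpot-based JVM.
final class CompressedOopsCheck {

    /** Returns "true", "false", or "unknown" if the flag cannot be read. */
    static String compressedOopsSetting() {
        try {
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            if (bean == null) {
                return "unknown"; // bean not available on this JVM
            }
            return bean.getVMOption("UseCompressedOops").getValue();
        } catch (RuntimeException e) {
            return "unknown"; // e.g. the flag does not exist on this JVM
        }
    }

    public static void main(String[] args) {
        System.out.println("UseCompressedOops = " + compressedOopsSetting());
    }
}
```

Running it once normally and once with -Xmx32g should show the setting flip on a typical 64-bit HotSpot JVM.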
One thing that is consistent across all the JVMs I have looked at is that objects are aligned on 8-byte boundaries. This means that the actual memory usage of an object will always be a multiple of 8.
For example, a java.lang.Boolean instance needs 12 bytes for the object header and one byte for the boolean, totalling 13 bytes. However, it will use 16 bytes, wasting 3 bytes due to object alignment.
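The rounding can be written down as a few lines of arithmetic. This is only a sketch of the calculation, not something the JVM exposes; the names ObjectAlignment and alignUp are made up for this illustration, and the 12-byte header assumes a 64-bit JVM with compressed OOPs.

```java
// Hypothetical helper for illustration; it models the arithmetic only.
final class ObjectAlignment {

    /** Rounds size up to the next multiple of alignment. */
    static int alignUp(int size, int alignment) {
        return (size + alignment - 1) / alignment * alignment;
    }

    public static void main(String[] args) {
        int header = 12;      // assumed: 64-bit JVM with compressed OOPs
        int booleanField = 1; // Boolean.value
        int used = header + booleanField;
        int allocated = alignUp(used, 8);
        // prints "13 bytes used, 16 bytes allocated, 3 bytes padding"
        System.out.println(used + " bytes used, " + allocated
            + " bytes allocated, " + (allocated - used) + " bytes padding");
    }
}
```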
In the past, I used all sorts of trickery for guessing the object size. Nowadays I use JOL (Java Object Layout). For example, here is the output when we look at the internals of java.lang.Boolean:
java.lang.Boolean object internals:
 OFFSET  SIZE      TYPE DESCRIPTION
      0     4           (object header)
      4     4           (object header)
      8     4           (object header)
     12     1   boolean Boolean.value
     13     3           (loss due to the next object alignment)
Instance size: 16 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
As we see, the instance size is 16 bytes and we have three bytes that are unused space.
If we create a JVM with a 32GB heap (-Xmx32g), then the object header uses 16 bytes and thus the size is 17 bytes. However, the actual size is 24 bytes, due to object alignment:
java.lang.Boolean object internals:
 OFFSET  SIZE      TYPE DESCRIPTION
      0     4           (object header)
      4     4           (object header)
      8     4           (object header)
     12     4           (object header)
     16     1   boolean Boolean.value
     17     7           (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 7 bytes external = 7 bytes total
Let's get back to String and consider the object sizes over the versions of Java. We are ignoring the size of the char[] or byte[] that contains the actual text.
Java 6 used 32 bytes, since they were storing the offset and count:
# java version "1.6.0_65"
 OFFSET  SIZE      TYPE DESCRIPTION
      0     4           (object header)
      4     4           (object header)
      8     4           (object header)
     12     4    char[] String.value
     16     4       int String.offset
     20     4       int String.count
     24     4       int String.hash
     28     4           (loss due to the next object alignment)
Instance size: 32 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
(Incidentally, when the cached hash was added to String in Java 1.3, most JVMs were 32-bit and the object header was just 8 bytes. In those days, the extra hash field fitted into the wasted space. Another interesting factoid from 2001: in those days every field took at least 4 bytes, even boolean and byte. That changed in Java 1.4. Enough ancient history!)
Java 7 decreased this to 24 bytes by dropping the offset and count fields. The hash32 field was an optimization to mitigate denial-of-service (DoS) attacks on hash maps. It was "free" in terms of memory usage, since without it those 4 bytes would have been lost to alignment anyway.
# openjdk version "1.7.0_252"
java.lang.String object internals:
 OFFSET  SIZE      TYPE DESCRIPTION
      0     4           (object header)
      4     4           (object header)
      8     4           (object header)
     12     4    char[] String.value
     16     4       int String.hash
     20     4       int String.hash32
Instance size: 24 bytes
Space losses: 0 bytes internal + 0 bytes external = 0 bytes total
Java 8 got rid of the hash32 field, which was replaced with a more general solution inside java.util.HashMap. This did not save any memory in String, since those 4 bytes are now "wasted" due to the next object alignment.
# openjdk version "1.8.0_242"
java.lang.String object internals:
 OFFSET  SIZE      TYPE DESCRIPTION
      0     4           (object header)
      4     4           (object header)
      8     4           (object header)
     12     4    char[] String.value
     16     4       int String.hash
     20     4           (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 4 bytes external = 4 bytes total
Java 9 changed the array type to byte[] and added a coder field. However, the String object still uses 24 bytes, with 3 lost due to object alignment.
# java version "9.0.4" build 9.0.4+11
java.lang.String object internals:
 OFFSET  SIZE      TYPE DESCRIPTION
      0     4           (object header)
      4     4           (object header)
      8     4           (object header)
     12     4    byte[] String.value
     16     4       int String.hash
     20     1      byte String.coder
     21     3           (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 3 bytes external = 3 bytes total
Java 13 added the hashIsZero boolean field, which in Java uses 1 byte. It occupies one of the 3 bytes that were previously lost to alignment, so the instance size stays at 24 bytes. Thus, as stated in the abstract, adding this new field did not cost any additional memory.
# openjdk version "13.0.2" 2020-01-14 build 13.0.2+8
java.lang.String object internals:
 OFFSET  SIZE      TYPE DESCRIPTION
      0     4           (object header)
      4     4           (object header)
      8     4           (object header)
     12     4    byte[] String.value
     16     4       int String.hash
     20     1      byte String.coder
     21     1   boolean String.hashIsZero
     22     2           (loss due to the next object alignment)
Instance size: 24 bytes
Space losses: 0 bytes internal + 2 bytes external = 2 bytes total
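The whole history can be summarised with a small calculation. This is a sketch under stated assumptions: a 12-byte header (64-bit JVM with compressed OOPs), 8-byte alignment, and fields packed without internal gaps, which matches the JOL listings from Java 6 to Java 13 above. The class name StringSizeHistory and its method are made up for this illustration.

```java
// Hypothetical model for illustration; field sizes are taken from the JOL listings.
final class StringSizeHistory {
    static final int HEADER = 12;   // assumed: 64-bit JVM with compressed OOPs
    static final int ALIGNMENT = 8;

    /** Header plus fields, rounded up to the next multiple of 8. */
    static int instanceSize(int... fieldSizes) {
        int used = HEADER;
        for (int size : fieldSizes) {
            used += size;
        }
        return (used + ALIGNMENT - 1) / ALIGNMENT * ALIGNMENT;
    }

    public static void main(String[] args) {
        String[] versions = {"Java 6", "Java 7", "Java 8", "Java 9", "Java 13"};
        int[][] fields = {
            {4, 4, 4, 4}, // value, offset, count, hash
            {4, 4, 4},    // value, hash, hash32
            {4, 4},       // value, hash
            {4, 4, 1},    // value, hash, coder
            {4, 4, 1, 1}, // value, hash, coder, hashIsZero
        };
        for (int i = 0; i < versions.length; i++) {
            System.out.println(versions[i] + ": " + instanceSize(fields[i]) + " bytes");
        }
    }
}
```

Under these assumptions it reports 32 bytes for Java 6 and 24 bytes for every later version listed, including Java 13.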
When I ran the test in Java 15, I noticed a slight change in the object layout:
# openjdk version "15-ea" 2020-09-15 - build 15-ea+20-899
java.lang.String object internals:
 OFFSET  SIZE      TYPE DESCRIPTION
      0     4           (object header)
      4     4           (object header)
      8     4           (object header)
     12     4       int String.hash
     16     1      byte String.coder
     17     1   boolean String.hashIsZero
     18     2           (alignment/padding gap)
     20     4    byte[] String.value
Instance size: 24 bytes
Space losses: 0 bytes internal + 2 bytes external = 2 bytes total
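The summary line now reports the 2 lost bytes as internal rather than external. As I read the JOL output, internal losses are gaps between fields and external losses are padding after the last field. Here is a minimal sketch of that accounting, with the offsets and sizes copied from the Java 13 and Java 15 listings; the class LayoutLosses and its method are made up for this illustration and are not part of JOL.

```java
// Hypothetical helper for illustration; this is not JOL's implementation.
final class LayoutLosses {
    // Each row is {offset, size}; the 12-byte header is treated as one slot.
    static final int[][] JAVA_13 = {{0, 12}, {12, 4}, {16, 4}, {20, 1}, {21, 1}};
    static final int[][] JAVA_15 = {{0, 12}, {12, 4}, {16, 1}, {17, 1}, {20, 4}};

    /** Returns {internal loss, external loss, instance size}; slots must be sorted by offset. */
    static int[] losses(int[][] slots, int alignment) {
        int internal = 0;
        int end = 0;
        for (int[] slot : slots) {
            internal += slot[0] - end; // gap between previous slot and this one
            end = slot[0] + slot[1];
        }
        int instanceSize = (end + alignment - 1) / alignment * alignment;
        return new int[] {internal, instanceSize - end, instanceSize};
    }

    public static void main(String[] args) {
        for (int[][] layout : new int[][][] {JAVA_13, JAVA_15}) {
            int[] r = losses(layout, 8);
            System.out.println(r[0] + " internal + " + r[1]
                + " external, instance size " + r[2]);
        }
    }
}
```

Both layouts come out at 24 bytes with 2 bytes lost; only the position of the gap differs.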
After some searching, I found Shipilev's "Java Objects Inside Out" article that includes a link to an enhancement added to Java 15. Since Java 15, the field layout is a bit different and they can pack fields across class hierarchies. This has a whole bunch of implications for high performance Java. I would encourage you to read Shipilev's article.
Kind regards from Crete
Heinz