Abstract: In his latest book, Maurice Naftalin takes us on a journey of discovery as we learn with him how Lambdas and Streams work in Java 8.
Welcome to the 224th issue of The Java(tm) Specialists' Newsletter, sent from the stunning Island of Crete. Last week I was in Paris with my partners Zenika, teaching their engineers how to present my Design Patterns Course. Thus if you would like it in French or even Italian, Zenika will be able to help you. On the Monday before the course, I got in slightly too late for lunch and too early for supper. In typical Parisian fashion, all the restaurants in my area were closed. I thus had a small snack and then went out for a run. If you've ever seen Forrest Gump, you can imagine what happened next. It took me a while to get my bearings, but eventually I was crisscrossing the Seine river, completely mesmerized by the beauty of the place. After a while, I thought the Eiffel Tower didn't look too far away (hint: it is tall and it looks close from many places in Paris), so I kept on running. "Run, Forrest, run!" Eventually I did a total of 15.6km, which for someone of my size felt like quite an achievement. By the time I got back and met up with my friend Kirk Pepperdine, yeah, the restaurants had stopped serving food. Gotta love the French!
Maurice Naftalin is a dear friend of mine. Even though my name does not appear anywhere in the text, he wrote "It's all your fault!" in the autographed copy that he gifted me over breakfast at Canary Wharf.
And he's not entirely wrong. A couple of years prior, Maurice told me that he was thinking of writing a 2nd edition of his Java Generics and Collections [ISBN 0596527756] book to include lambdas and streams. Or perhaps a book on lambdas by themselves. He wasn't sure. I suggested that he should start by writing a Lambda FAQ in a similar vein to Angelika Langer's excellent Generics FAQ. This would have two purposes. First off, he could contribute something useful to society at large. This he certainly did. Due to his efforts, Maurice is now one of the newest Oracle Java Champions. Secondly, he could scope out the volume of material to see if there was enough in there to write a book.
Since I'm rather impetuous, I immediately registered the domain name www.lambdafaq.org and put up a simple Wordpress site with the title "Maurice Naftalin's Lambda FAQ". I then pestered Maurice until he had a few questions and their answers on the website. Before too long, Oracle had linked to it from their main Lambda website. It has now evolved into a very useful tutorial to answer some of the tough questions that we encounter with lambdas.
When the time came to review his book, I was too busy with other things to even look at it. However, this past month I've given it a very detailed read and would like to share my findings with you. Besides a few small typos, real errors were annoyingly elusive. In other words, my contribution, besides kick-starting the website, was minuscule.
Before I continue with the review, I'd like to give a shout-out to two other friends who beat Maurice to the printing press: Dr Richard Warburton and almost-Dr Raoul-Gabriel Urma.
Dr Richard Warburton recently published Java 8 Lambdas: Pragmatic Functional Programming [ISBN 1449370772] . Whenever I meet "Sir" Richard (as I affectionately call him) at conferences, he is only too happy to try to answer some of the questions I'm battling with at the time. Usually he also does not know, but the good doctor always makes a point of finding the answer and getting back to me. I appreciate that :-)
Raoul-Gabriel Urma gave an excellent talk about Java 8 at the JDK IO conference in Denmark in 2014. Whilst enjoying an excellent dinner, I was telling him that I had bought my son size 52 shoes. He said to me: "Sorry, I don't know that system. What is that in European?" Um - that WAS European! Raoul wrote Java 8 in Action: Lambdas, Streams, and functional-style programming [ISBN 1617291994] . Raoul's book is very good indeed! He covers topics that are hard to find elsewhere, such as the CompletableFuture. Just that one chapter makes it worthwhile buying his book. He does not cover ManagedBlocker, but then neither do the others and you can always read my newsletter to learn more.
Dr Richard Warburton, Raoul-Gabriel Urma and James Gough have put together what looks like a nice little training course on Java 8. The outline looks promising and they even cover the new Date and Time API, largely produced by members of their London Java Community.
Maurice Naftalin is also busy putting together a kickass Java 8 course. It is based on his book and ETA is Q2 2015. Initially it will be available only as an in-house course. Please have a look at Mastering Lambdas Course for more information.
Lastly you could join me on my Extreme Java - Concurrency Performance course. Whilst this is not a pure Java 8 Lambda course, you will pick up the essentials of lambdas and streams whilst broadening your mind with threading and performance. That particular course is also available as self-study on our Teachable Platform.
I am told that writing any book is a lot of work. But two years? Why so long, Maurice? After all, your book is rather short. I know authors that can churn that out in two weeks! I personally prefer short books. Even this book took me about a month to work through, next to all my other obligations. My bookshelf contains lots of books that I never finished because they were simply too voluminous. Maurice has done the thinking and refining for me. There is nothing in the book that I would say is superfluous. I thus save time studying his shorter book.
In addition, the book is focused. This is not a book about functional programming like the other two mentioned above. It is a book about Java's place in a new world where we need to utilize lots of cores. This is not surprising, as he had input from Brian Goetz, author of Java Concurrency in Practice [ISBN 0321349601]. Maurice's chapter on performance is one of the best I've read in any book. I care about performance and so should you. Maurice teaches Kirk Pepperdine's Java Performance Tuning Course and all my courses focused on concurrency and performance. Besides knowing what he is talking about, he also knows who to ask for input into his writing. We thus see Aleksey Shipilev's Java Microbenchmark Harness (JMH) being employed, rather than some ad-hoc mechanism.
Some highlights from the book. To explain lambdas, Naftalin starts by showing a normal anonymous inner class.
pointList.forEach(new Consumer<Point>() {
  public void accept(Point p) {
    p.translate(1, 1);
  }
});
He then starts removing elements that are superfluous by greying them out (marked here with /* ... */). For example, let's hide the "new Consumer" constructor call:
pointList.forEach(/* new Consumer<Point>() */ {
  public void accept(Point p) {
    p.translate(1, 1);
  }
});
He then greys out even more, specifically the single method inside the Consumer. Since there is only one, it should be obvious what code is meant. The name of the method is no longer important:
pointList.forEach(/* new Consumer<Point>() */ {
  /* public void accept */ (Point p) {
    p.translate(1, 1);
  }
});
Even the fact that the parameter type is a "Point" can be deduced, so let's grey that out too:
pointList.forEach(/* new Consumer<Point>() */ {
  /* public void accept */ (/* Point */ p) {
    p.translate(1, 1);
  }
});
This can then be represented simply as a Java 8 lambda, using the -> syntax:
pointList.forEach(p -> p.translate(1, 1));
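To check that the two forms really do the same thing, here is a minimal sketch that runs both against a small list. The class name TranslateDemo and its two helper methods are my own illustration and do not come from the book; Point is assumed to be java.awt.Point, which has a translate(int, int) method.

```java
import java.awt.Point;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical demo class, not taken from the book.
class TranslateDemo {
  /** Moves every point by (1, 1) using an anonymous inner class. */
  static void translateWithAnonymousClass(List<Point> points) {
    points.forEach(new Consumer<Point>() {
      @Override
      public void accept(Point p) {
        p.translate(1, 1);
      }
    });
  }

  /** Moves every point by (1, 1) using the equivalent lambda. */
  static void translateWithLambda(List<Point> points) {
    points.forEach(p -> p.translate(1, 1));
  }

  public static void main(String... args) {
    List<Point> pointList = new ArrayList<>();
    pointList.add(new Point(0, 0));
    pointList.add(new Point(3, 4));
    translateWithAnonymousClass(pointList); // each point moves by (1, 1)
    translateWithLambda(pointList);         // and by (1, 1) again
    System.out.println(pointList);
  }
}
```

After both calls every point has moved by (2, 2), which suggests that the lambda is simply a more compact way of supplying the same Consumer behaviour.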
Great explanation - just the way I would've done it!
In the second chapter, we compare the old anonymous inner classes with lambdas. They are not the same. Anonymous classes always create a new object. Lambdas do not necessarily. In both cases, the object collection cost can be eliminated with escape analysis. Here is a small class that shows object identity:
public class IdentityAnonymousLambda {
  public static void main(String... args) {
    for (int i = 0; i < 2; i++) {
      showIdentity(() ->
          System.out.println("Lambda - no fields"));
      showIdentity(() ->
          System.out.println("Lambda - parameters - " + args));
      showIdentity(new Runnable() {
        public void run() {
          System.out.println("anon - no fields");
        }
      });
      showIdentity(new Runnable() {
        public void run() {
          System.out.println("anon - parameters - " + args);
        }
      });
      System.out.println();
    }
  }

  private static void showIdentity(Runnable runnable) {
    System.out.printf("%x ", System.identityHashCode(runnable));
    runnable.run();
  }
}
And here is the output. Note how the identity hash codes of the simple lambda are the same, since they are the same object both times we use the lambda. For the more complicated lambda, the hash codes are different.
404b9385 Lambda - no fields
58372a00 Lambda - parameters - [Ljava.lang.String;@4dd8dc3
6d03e736 anon - no fields
568db2f2 anon - parameters - [Ljava.lang.String;@4dd8dc3

404b9385 Lambda - no fields
378bf509 Lambda - parameters - [Ljava.lang.String;@4dd8dc3
5fd0d5ae anon - no fields
2d98a335 anon - parameters - [Ljava.lang.String;@4dd8dc3
The second big difference is in the scope of "this". Within the lambda it refers to the enclosing object, but within the anonymous class, it refers to the object instance of the anonymous inner class. To refer to the outer object, we would have to write OuterClass.this. For example, in the "Hello" class below, both lambdas will print "Hello, world!". Try it out yourself before reading any further:
/** @author Maurice Naftalin, from Mastering Lambdas */
public class Hello {
  Runnable r1 = () -> { System.out.println(this); };
  Runnable r2 = () -> { System.out.println(toString()); };

  public String toString() { return "Hello, world!"; }

  public static void main(String... args) {
    new Hello().r1.run();
    new Hello().r2.run();
  }
}
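For contrast, here is a small sketch of the same idea with anonymous inner classes added, so that the difference in the meaning of "this" becomes visible. The class ThisScopeDemo and its field names are hypothetical and only serve this illustration.

```java
import java.util.function.Supplier;

// Hypothetical demo class contrasting "this" in lambdas and anonymous classes.
class ThisScopeDemo {
  // In a lambda, "this" is the enclosing ThisScopeDemo instance.
  final Supplier<String> fromLambda = () -> this.toString();

  // In an anonymous class, "this" is the anonymous instance itself.
  final Supplier<String> fromAnonymous = new Supplier<String>() {
    @Override public String get() { return this.toString(); }
    @Override public String toString() { return "anonymous Supplier"; }
  };

  // To reach the outer object we have to qualify: ThisScopeDemo.this.
  final Supplier<String> outerFromAnonymous = new Supplier<String>() {
    @Override public String get() { return ThisScopeDemo.this.toString(); }
    @Override public String toString() { return "anonymous Supplier"; }
  };

  @Override public String toString() { return "Hello, world!"; }

  public static void main(String... args) {
    ThisScopeDemo demo = new ThisScopeDemo();
    System.out.println(demo.fromLambda.get());         // Hello, world!
    System.out.println(demo.fromAnonymous.get());      // anonymous Supplier
    System.out.println(demo.outerFromAnonymous.get()); // Hello, world!
  }
}
```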
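Heinz's approach to learning new language features: Delegate the grunt work to your IDE, in my case IntelliJ IDEA. Here's what I do (don't laugh):
1. Write the code using old-school anonymous inner classes.
2. Let the IDE "Replace with lambda".
3. Let the IDE "Replace with Method Reference".
Here are three small code snippets that get progressively improved with my technique: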
import java.util.*;
import java.util.function.*;

public class HeinzLambdaTrainingWheels {
  public static void main(String... args) {
    // step 1. write using old-school anonymous inner classes
    Arrays.stream(args).map(new Function<String, String>() {
      public String apply(String s) {
        return s.toUpperCase();
      }
    }).forEach(new Consumer<String>() {
      public void accept(String s) {
        System.out.println(s);
      }
    });

    // step 2. "Replace with lambda"
    Arrays.stream(args).map(s -> s.toUpperCase()).
        forEach(s -> System.out.println(s));

    // step 3. "Replace with Method Reference"
    Arrays.stream(args).map(String::toUpperCase).
        forEach(System.out::println);
  }
}
To you this might seem like a stupid thing to do, but it works for me. I shared "Heinz's approach" with Dr Warburton, from Java 8 Lambdas [ISBN 1449370772] . He didn't say anything, but his deprecating smile could have written a book. Over time, these training wheels will fall off. But in the meantime, I do find it useful to help me get the syntax right.
Still in chapter 2, Maurice writes about the challenge of overloaded methods. Just to remind you, an overloaded method is when you have multiple methods with the same name, but with different parameters. For example Object.wait() is overloaded. This can go awry quite easily. For example, in my previous newsletter, I had gone wild with refactoring and produced the following (it compiles):
public class Interruptions {
  public static void saveForLater(InterruptibleAction action) {
    saveForLater(action::run);
  }

  public static <E> E saveForLater(
      InterruptibleTask<E> task) {
    boolean interrupted = Thread.interrupted(); // clears flag
    try {
      while (true) {
        try {
          return task.run();
        } catch (InterruptedException e) {
          // flag would be cleared at this point too
          interrupted = true;
        }
      }
    } finally {
      if (interrupted) Thread.currentThread().interrupt();
    }
  }

  @FunctionalInterface
  public interface InterruptibleAction {
    public void run() throws InterruptedException;
  }

  @FunctionalInterface
  public interface InterruptibleTask<E> {
    public E run() throws InterruptedException;
  }
}
I was surprised when this resulted in a stack overflow error. Even though the code compiled, the resultant byte code was not what I had expected. Expanding the first saveForLater() method to an anonymous inner class demonstrates the issue:
public class Interruptions {
  public static void saveForLater(InterruptibleAction action) {
    saveForLater(new InterruptibleAction() {
      public void run() throws InterruptedException {
        action.run();
      }
    });
  }
  // ...
}
We can now see that we are calling ourselves recursively!
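One possible way out, sketched below, is to pass a block lambda that explicitly returns a value. Such a lambda cannot match the void run() of InterruptibleAction, so the compiler can only pick the InterruptibleTask overload. This is my own illustration of the idea and not necessarily the fix used in the previous newsletter; the class name InterruptionsFixed is hypothetical.

```java
// Hypothetical variant of the Interruptions class without the recursion.
class InterruptionsFixed {
  @FunctionalInterface
  interface InterruptibleAction {
    void run() throws InterruptedException;
  }

  @FunctionalInterface
  interface InterruptibleTask<E> {
    E run() throws InterruptedException;
  }

  static void saveForLater(InterruptibleAction action) {
    // "return null" makes the lambda value-compatible, so only the
    // InterruptibleTask overload is applicable and we no longer recurse.
    saveForLater(() -> {
      action.run();
      return null;
    });
  }

  static <E> E saveForLater(InterruptibleTask<E> task) {
    boolean interrupted = Thread.interrupted(); // clears flag
    try {
      while (true) {
        try {
          return task.run();
        } catch (InterruptedException e) {
          interrupted = true; // remember it and retry
        }
      }
    } finally {
      if (interrupted) Thread.currentThread().interrupt();
    }
  }

  public static void main(String... args) {
    int[] attempts = {0};
    // The action is "interrupted" on the first attempt only.
    saveForLater(() -> {
      if (attempts[0]++ == 0) throw new InterruptedException();
    });
    System.out.println("attempts = " + attempts[0]);            // attempts = 2
    System.out.println("interrupted = " + Thread.interrupted()); // interrupted = true
  }
}
```

The action is retried once and the interrupted flag is restored afterwards, which is what the original code was presumably meant to do.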
Maurice has an excellent command of the English language. His book contains some of the most beautiful prose I have read in any technical book to date (and I have read many). When I was showing him how to teach my Java Specialist Master Course, he drove me quite nutty with his perfectionism. For example, he pointed out that the default locking mechanism wasn't "unfair". Instead it was "non-fair", which is the opposite of "fair". I have four beautiful children. I am not a "fair" father, rather I try to be "non-fair". If I was "fair", then when I bought my daughters a dress, I would also have to buy my son a dress. And when I bought my son size 52 basketball shoes, I would have to buy the same for my daughters. Instead, I buy things for the various children as and when they need them. I noticed that Maurice also mentioned the "non-fair" locking mechanism in his book :-)
This is the first technical book where I found myself re-reading a sentence a few times, simply because the words were so beautifully assembled. For example: "A 40-year trend of exponentially increasing processor speed had been halted by inescapable physical facts: signal leakage, inadequate heat dissipation, and the hard truth that, even at the speed of light, data cannot cross a chip quickly enough for further processor speed increases."
Pure art.
We now get into the more difficult parts of the book, where Maurice writes about streams and the different types of pipelines they can contain. Even though I studied those pages very carefully, I know I will have to go over that again if I want to apply them to real code. In chapter 4 he continues with streams, collection and reduction and just as everything started falling apart in my brain and the little neurons were about to go on hunger strike, he whacks me in the face with a "Worked Example". He makes us stop reading and start thinking. Those annoying horizontal lines tell us that we should not carry on reading until we have at least tried to come up with a solution. I must say that I was completely lost at the first worked example. However, I was amazed at the power and elegance of the solution presented in the book. He carried on with this approach of giving us some practical puzzles to solve. It worked for me. I found myself stopping what I was reading, and trying to solve it without his help. Usually I failed, but always I learned far more than if I had simply read his solution.
In chapter 5, Maurice presents an idea of a recursive grep using MappedByteBuffer. This LineSpliterator from the book is slated to be included in the JDK one of these days. We don't often have ideas from a book flowing into the JDK. However, there are some issues with the LineSpliterator in the book. First off, it only works with ByteBuffer. We all know that Java has an artificial limitation of Integer.MAX_VALUE for number of bytes. Thus we cannot create a MappedByteBuffer larger than that, even on a 64-bit machine.
Maurice used a clever trick to calculate the mid-point:
int mid = (lo + hi) >>> 1;
When I typed the code into my IDE, I changed that to the simpler int mid = (lo + hi) / 2; and was surprised when that overflowed with files close to 2 GB. The triple-shift >>> means an unsigned bit shift. (lo + hi) might very well be a negative number, but we'll shift the left-most bit one to the right, making it positive. Another way to have written that would be as
int mid = lo + (hi - lo)/2;
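The three variants can be compared side by side with values whose sum exceeds Integer.MAX_VALUE. The class MidpointDemo and the sample values below are mine, chosen only to trigger the overflow.

```java
// Hypothetical demo comparing three ways of computing a midpoint.
class MidpointDemo {
  /** Overflows to a negative result when lo + hi exceeds Integer.MAX_VALUE. */
  static int naiveMid(int lo, int hi) {
    return (lo + hi) / 2;
  }

  /** The sum may wrap negative, but the unsigned shift reads it as unsigned. */
  static int unsignedShiftMid(int lo, int hi) {
    return (lo + hi) >>> 1;
  }

  /** Never forms the large sum, so it cannot overflow for 0 <= lo <= hi. */
  static int differenceMid(int lo, int hi) {
    return lo + (hi - lo) / 2;
  }

  public static void main(String... args) {
    int lo = 1_500_000_000;
    int hi = 2_000_000_000;
    System.out.println(naiveMid(lo, hi));         // -397483648
    System.out.println(unsignedShiftMid(lo, hi)); // 1750000000
    System.out.println(differenceMid(lo, hi));    // 1750000000
  }
}
```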
Besides the artificial 2GB file size limit (do we really want to go back there?) there were also a few bugs in the code, which I found through unit testing. For example, the output would cut off the first character from each line.
A stream should be at least as fast when you run it in parallel as when you run it in serial. In the Fork/Join framework, we code that by having a threshold below which we do not fork further. We should do the same in the trySplit() method. In my code, I use the magic number 10000, based on observation in our small whitepaper on When to use parallel streams. If the chunk of file is less than that, I do not split further.
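To show the threshold idea in isolation, here is a minimal sketch of a spliterator over an int array that refuses to split below 10000 elements. It is deliberately much simpler than a file-based spliterator; the class ThresholdSpliterator and the parallelSum helper are hypothetical names used only for this illustration.

```java
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.StreamSupport;

// Hypothetical array-backed spliterator that stops splitting small ranges.
class ThresholdSpliterator implements Spliterator<Integer> {
  static final int SEQUENTIAL_THRESHOLD = 10_000;

  private final int[] values;
  private int lo;       // next index to hand out
  private final int hi; // exclusive upper bound

  ThresholdSpliterator(int[] values, int lo, int hi) {
    this.values = values;
    this.lo = lo;
    this.hi = hi;
  }

  @Override
  public boolean tryAdvance(Consumer<? super Integer> action) {
    if (lo >= hi) return false;
    action.accept(values[lo++]);
    return true;
  }

  @Override
  public Spliterator<Integer> trySplit() {
    // Below the threshold, splitting costs more than it gains.
    if (hi - lo < SEQUENTIAL_THRESHOLD) return null;
    int mid = lo + (hi - lo) / 2;
    Spliterator<Integer> prefix = new ThresholdSpliterator(values, lo, mid);
    lo = mid; // this spliterator keeps the second half
    return prefix;
  }

  @Override
  public long estimateSize() {
    return hi - lo;
  }

  @Override
  public int characteristics() {
    return ORDERED | SIZED | SUBSIZED | IMMUTABLE | NONNULL;
  }

  /** Sums the array through a parallel stream built on this spliterator. */
  static long parallelSum(int[] values) {
    return StreamSupport.stream(
            new ThresholdSpliterator(values, 0, values.length), true)
        .mapToLong(Integer::longValue)
        .sum();
  }
}
```

Returning null from trySplit() tells the framework to process the remaining range sequentially, which plays the same role as the threshold check in a Fork/Join task.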
Lastly, the LineSpliterator would fill the characters into a StringBuilder and would then convert that to a String before passing it on to a regular expression matcher. However, the pattern.matcher() method takes as parameter a CharSequence, thus there is no need to create the Strings. We could simply keep pointers to the original StringBuilders. This is a trick that not many programmers know about, but it made a measurable difference in my tests.
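As a small illustration of that point, Pattern.matcher() accepts any CharSequence, so a StringBuilder can be handed over without calling toString() first. The class CharSequenceMatchDemo and the sample lines are assumptions of mine; the pattern 12345* is the one used with grep later in this newsletter.

```java
import java.util.regex.Pattern;

// Hypothetical demo: matching directly against a StringBuilder.
class CharSequenceMatchDemo {
  private static final Pattern PATTERN = Pattern.compile("12345*");

  /** Returns true if the pattern occurs anywhere in the given line. */
  static boolean matches(CharSequence line) {
    // No intermediate String is created here.
    return PATTERN.matcher(line).find();
  }

  public static void main(String... args) {
    StringBuilder hit = new StringBuilder("order 1234 shipped");
    StringBuilder miss = new StringBuilder("order 123 shipped");
    System.out.println(matches(hit));  // true
    System.out.println(matches(miss)); // false
  }
}
```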
I spent far more time on this than I should have, but after removing a few bugs and improving the performance a bit, we now have something that can work with virtually any size file. Much to my surprise, even the non-parallel version was faster than the standard grep utility on Mac OS X. And the parallel version was about 4x the speed of the sequential, since I have four cores on my machine.
Here is the DispLine class, which records the byte offset at which this particular CharSequence starts (similar to grep -b):
public class DispLine {
  private final long disp;
  private final CharSequence line;

  public DispLine(long disp, CharSequence line) {
    this.disp = disp;
    this.line = line;
  }

  public CharSequence getLine() {
    return line;
  }

  public String toString() {
    return disp + ":" + line;
  }
}
And now my BigLineSpliterator. To understand how it works, you need to first understand Spliterators (Maurice's book is a good place to start [ISBN 0071829628] ). Secondly you would need to understand a bit about MappedByteBuffer and FileChannel. For that I can recommend my Extreme Java - Advanced Topics Course and then the musings of Peter Lawrey.
I am willing to bet a fair wager that my code still contains several bugs. If you find one, please send me your unit test and I'll be happy to look at it. For example, this will not work properly with files that use the Windows line ending \r\n. There are bugs, so please test extensively. This is more a learning exercise than production-ready code. You've been warned ... Now, without further delay, here is my BigLineSpliterator:
import java.io.*;
import java.nio.*;
import java.nio.channels.*;
import java.util.*;
import java.util.function.*;

import static java.nio.channels.FileChannel.MapMode.READ_ONLY;

/**
 * @author Dr Heinz M. Kabutz, Maurice Naftalin, based on the
 * LineSpliterator from the Mastering Lambdas book.
 */
public class BigLineSpliterator implements Spliterator<DispLine> {
  private static final int AVG_LINE_LENGTH = 40;
  private static final int CHUNK_SIZE = Integer.MAX_VALUE;
  private final ByteBuffer[] bbs;
  private long lo;
  private final long hi;
  private final long offset;

  public BigLineSpliterator(ByteBuffer bb) {
    this(bb, 0, bb.limit());
  }

  public BigLineSpliterator(ByteBuffer bb, int lo, int hi) {
    this(new ByteBuffer[]{bb}, lo, hi, 0);
  }

  public BigLineSpliterator(FileChannel fc) throws IOException {
    this(fc, 0, fc.size());
  }

  public BigLineSpliterator(FileChannel fc, long lo, long hi)
      throws IOException {
    this(split(fc, lo, hi), lo, hi, lo);
  }

  private BigLineSpliterator(ByteBuffer[] bbs, long lo, long hi,
                             long offset) {
    this.bbs = bbs;
    this.lo = lo;
    this.hi = hi;
    this.offset = offset;
  }

  private static ByteBuffer[] split(FileChannel fc, long lo, long hi)
      throws IOException {
    int numberOfChunks = (int) Math.ceil(
        ((double) (hi - lo)) / CHUNK_SIZE);
    long remainingBytes = (hi - lo);
    ByteBuffer[] bbs = new ByteBuffer[numberOfChunks];
    for (int i = 0; i < bbs.length; i++) {
      long position = i * (long)CHUNK_SIZE + lo;
      long size = i < bbs.length - 1 ? CHUNK_SIZE : remainingBytes;
      remainingBytes -= CHUNK_SIZE;
      bbs[i] = fc.map(READ_ONLY, position, size);
    }
    long totalSize = 0;
    for (ByteBuffer bb : bbs) {
      totalSize += bb.limit();
    }
    if (totalSize != (hi - lo))
      throw new AssertionError("Split size does not match");
    return bbs;
  }

  public boolean tryAdvance(Consumer<? super DispLine> action) {
    long index = lo;
    StringBuilder sb = new StringBuilder();
    char next;
    while ((next = get(index++)) != '\n') {
      sb.append(next);
    }
    action.accept(new DispLine(lo, sb));
    lo = lo + sb.length() + 1;
    return lo < hi;
  }

  private char get(long pos) {
    long truePos = pos - offset;
    int chunk = (int) (truePos / CHUNK_SIZE);
    return (char) bbs[chunk].get((int) (truePos % CHUNK_SIZE));
  }

  private static final int SEQUENTIAL_THRESHOLD = 10000;

  @Override
  public Spliterator<DispLine> trySplit() {
    if (hi - lo < SEQUENTIAL_THRESHOLD) return null;
    long index = (lo + hi) >>> 1;
    while (get(index) != '\n') index++;
    BigLineSpliterator newSpliterator = null;
    if (index != hi) {
      newSpliterator = new BigLineSpliterator(
          bbs, lo, index, offset);
      lo = index + 1;
    }
    return newSpliterator;
  }

  @Override
  public long estimateSize() {
    return (hi - lo) / AVG_LINE_LENGTH;
  }

  @Override
  public int characteristics() {
    return ORDERED | IMMUTABLE | NONNULL;
  }
}
I tested this on rather large files up to 60 GB in size. The results were the same as with the grep -b 12345* file.txt Mac OS X command, albeit significantly faster. However, I did not test whether it works with files when we start at anything besides 0.
That's it for this month and this year. Thanks for all the feedback you send. If you read the book, Maurice has asked if you could please leave a review on Amazon, whether you liked it or not :-)
Kind regards
Heinz