Abstract: We review Java Concurrency in Practice by Brian Goetz. Brian's book is the most readable on the topic of concurrency in Java, and deals with this difficult subject with a wonderful hands-on approach. It is interesting, useful, and relevant to the problems facing Java developers today.
Welcome to the 125th edition of The Java(tm) Specialists' Newsletter, sent to you from the "Dark Continent" (a term for Africa, coined about 130 years ago). A few weeks ago we suffered recurrent power failures, which made the name "Dark Continent" rather apt.
In my course on the new features in Java 5, we examine the "new" concurrency constructs of Java. Most of these are based on classes that have been freely available on Doug Lea's website for at least six years, and were well described in his excellent book Concurrent Programming in Java [ISBN 0201310090]. However, I have yet to meet anyone, either on a course or during my contracting / consulting, who has read Doug Lea's book.
Java Concurrency in Practice [ISBN 0321349601] is split into four distinct sections:

- Fundamentals
- Structuring Concurrent Applications
- Liveness, Performance, and Testing
- Advanced Topics
Brian does an excellent job of explaining, and his examples are closer to the real world than those you find in most other books.
Something else I like about the book is that it mentions all sorts of new features that are available in Java 6. Whilst I cannot use that yet in production, it is good to know what is coming.
At the end of the first section on Fundamentals, Brian goes through the steps of building an efficient, scalable result cache that you could use in a web server. The idea is to remember the result from a previous calculation in order to reduce latency and increase throughput, at the cost of a bit more memory usage.
The problem with building this cache is that if we are not careful, we could easily turn it into a scalability bottleneck. Brian starts with a basic HashMap, then looks at ways that we can make it more scalable.
We first define interface Computable, which performs some calculation:
public interface Computable<A, V> {
  V compute(A arg) throws InterruptedException;
}
The next class is ExpensiveFunction, which takes a long time to compute the result:
import java.math.BigInteger;

public class ExpensiveFunction
    implements Computable<String, BigInteger> {
  public BigInteger compute(String arg) {
    // after deep thought...
    try {
      Thread.sleep(1000);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    return new BigInteger(arg);
  }
}
We now build a Computable wrapper that remembers the result from the previous calculations and returns these transparently. This process is called memoization.
import java.util.*;

public class Memoizer1<A, V> implements Computable<A, V> {
  @GuardedBy("this")
  private final Map<A, V> cache = new HashMap<A, V>();
  private final Computable<A, V> c;

  public Memoizer1(Computable<A, V> c) {
    this.c = c;
  }

  public synchronized V compute(A arg) throws InterruptedException {
    V result = cache.get(arg);
    if (result == null) {
      result = c.compute(arg);
      cache.put(arg, result);
    }
    return result;
  }
}
The @GuardedBy annotation is one of several described in the book that help us to document our assumptions regarding thread safety:
import java.lang.annotation.*;

/**
 * The field or method to which this annotation is applied can only
 * be accessed when holding a particular lock, which may be a
 * built-in (synchronization) lock, or may be an explicit
 * java.util.concurrent.Lock.
 */
@Target({ElementType.FIELD, ElementType.METHOD})
@Retention(RetentionPolicy.RUNTIME)
public @interface GuardedBy {
  String value();
}
Since HashMap is not thread-safe, we need to take the conservative approach and lock on all access. If you have several threads queued up to execute compute, then it might actually take longer than without caching.
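To see the cost of that coarse locking, here is a small sketch (my own, not from the book) in which two threads compute *different* keys against a fully synchronized cache. Because the entire computation happens while holding the lock, the two 100 ms computations run back to back instead of in parallel:

```java
import java.util.HashMap;
import java.util.Map;

public class SerializedCacheDemo {
  private final Map<String, Long> cache = new HashMap<String, Long>();

  // the whole computation, including the sleep, holds the lock
  public synchronized long compute(String key) {
    Long cached = cache.get(key);
    if (cached != null) return cached;
    try {
      Thread.sleep(100); // simulate an expensive calculation
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
    }
    long result = key.length();
    cache.put(key, result);
    return result;
  }

  public static void main(String[] args) throws InterruptedException {
    final SerializedCacheDemo demo = new SerializedCacheDemo();
    Runnable task = new Runnable() {
      public void run() {
        demo.compute(Thread.currentThread().getName());
      }
    };
    long start = System.nanoTime();
    Thread t1 = new Thread(task, "alpha");
    Thread t2 = new Thread(task, "beta");
    t1.start(); t2.start();
    t1.join(); t2.join();
    long elapsedMs = (System.nanoTime() - start) / 1000000;
    // the two independent computations were serialized by the lock,
    // so elapsed time is roughly 200 ms, not 100 ms
    System.out.println("elapsed ms: " + elapsedMs);
  }
}
```

A second request for the same key returns instantly from the cache, but unrelated keys still queue up behind one another.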
A slight improvement occurs when you change the HashMap to be a ConcurrentHashMap instance:
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class Memoizer2<A, V> implements Computable<A, V> {
  private final Map<A, V> cache = new ConcurrentHashMap<A, V>();
  private final Computable<A, V> c;

  public Memoizer2(Computable<A, V> c) {
    this.c = c;
  }

  public V compute(A arg) throws InterruptedException {
    V result = cache.get(arg);
    if (result == null) {
      result = c.compute(arg);
      cache.put(arg, result);
    }
    return result;
  }
}
The problem with Memoizer2 is that even if one thread starts an expensive computation, other threads may still start the same computation. Instead, we should have some way of representing the notion that "thread X is currently computing f(27)", so that if another thread arrives looking for f(27), it knows that it should wait until thread X has finished its work.
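That window is easy to demonstrate. The following driver (my own, not from the book) mimics the Memoizer2 check-then-act sequence, using a CountDownLatch to hold both threads past the null check before either stores a result. The "expensive" computation then deterministically runs twice for the same argument:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

public class DuplicateComputationDemo {
  static final Map<String, Integer> cache =
      new ConcurrentHashMap<String, Integer>();
  static final AtomicInteger computations = new AtomicInteger();
  static final CountDownLatch bothChecked = new CountDownLatch(2);

  static int compute(String arg) throws InterruptedException {
    Integer result = cache.get(arg);
    if (result == null) {
      bothChecked.countDown();
      bothChecked.await(); // both threads are now past the null check
      computations.incrementAndGet();
      result = arg.length();
      cache.put(arg, result);
    }
    return result;
  }

  public static void main(String[] args) throws InterruptedException {
    Runnable task = new Runnable() {
      public void run() {
        try {
          compute("f(27)");
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      }
    };
    Thread t1 = new Thread(task);
    Thread t2 = new Thread(task);
    t1.start(); t2.start();
    t1.join(); t2.join();
    System.out.println("computations: " + computations.get()); // prints 2
  }
}
```

In real code the race depends on unlucky timing rather than a latch, but the failure mode is the same: duplicated work for the same key.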
Yet another option is to use the FutureTask class:
import java.util.Map;
import java.util.concurrent.*;

public class Memoizer3<A, V> implements Computable<A, V> {
  private final Map<A, Future<V>> cache =
      new ConcurrentHashMap<A, Future<V>>();
  private final Computable<A, V> c;

  public Memoizer3(Computable<A, V> c) {
    this.c = c;
  }

  public V compute(final A arg) throws InterruptedException {
    Future<V> f = cache.get(arg);
    if (f == null) {
      Callable<V> eval = new Callable<V>() {
        public V call() throws InterruptedException {
          return c.compute(arg);
        }
      };
      FutureTask<V> ft = new FutureTask<V>(eval);
      f = ft;
      cache.put(arg, ft);
      ft.run(); // call to c.compute happens here
    }
    try {
      return f.get();
    } catch (ExecutionException e) {
      // Kabutz: this is my addition to the code...
      try {
        throw e.getCause();
      } catch (RuntimeException ex) {
        throw ex;
      } catch (Error ex) {
        throw ex;
      } catch (Throwable t) {
        throw new IllegalStateException("Not unchecked", t);
      }
    }
  }
}
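Memoizer3 relies on a property of FutureTask that is worth spelling out: run() executes the wrapped Callable at most once, and later calls to run() are no-ops, so even if two threads end up holding the same task, only one computation happens. A minimal sketch (my own class and method names):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

public class FutureTaskOnceDemo {
  // runs the same FutureTask twice and reports how often the
  // Callable actually executed
  static int runTwiceAndCount() {
    final AtomicInteger calls = new AtomicInteger();
    FutureTask<Integer> task = new FutureTask<Integer>(new Callable<Integer>() {
      public Integer call() {
        return calls.incrementAndGet();
      }
    });
    task.run();
    task.run(); // no-op: the task has already completed
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println("call count after two run()s: " + runTwiceAndCount());
  }
}
```

Every get() on such a task returns that single result, which is exactly what lets waiting threads share the work of the thread that ran the computation.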
The Memoizer3 is almost perfect, but there still is a window of vulnerability when two threads might compute the same value. The window is smaller than in Memoizer2, but it still is there. Memoizer3 is vulnerable to this problem because a compound action (put-if-absent) is performed on the backing map that cannot be made atomic using locking.
The next approach in the book [ISBN 0321349601] is to use the atomic putIfAbsent() method from ConcurrentMap.
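Before looking at the final version, here is a minimal sketch of the putIfAbsent() contract the solution relies on: the mapping is stored atomically only if the key is absent, and the return value tells you whether you won the race (null) or another caller's mapping was already there:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class PutIfAbsentDemo {
  public static void main(String[] args) {
    ConcurrentMap<String, String> map =
        new ConcurrentHashMap<String, String>();
    String first = map.putIfAbsent("f(27)", "result-A");
    String second = map.putIfAbsent("f(27)", "result-B");
    System.out.println(first);            // null: the key was absent
    System.out.println(second);           // result-A: the existing mapping wins
    System.out.println(map.get("f(27)")); // result-A
  }
}
```

The check and the insert happen as one atomic step inside the map, which is precisely what the separate get()/put() pair in Memoizer3 cannot guarantee.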
import java.util.concurrent.*;

public class Memoizer<A, V> implements Computable<A, V> {
  private final ConcurrentMap<A, Future<V>> cache =
      new ConcurrentHashMap<A, Future<V>>();
  private final Computable<A, V> c;

  public Memoizer(Computable<A, V> c) {
    this.c = c;
  }

  public V compute(final A arg) throws InterruptedException {
    while (true) {
      Future<V> f = cache.get(arg);
      if (f == null) {
        Callable<V> eval = new Callable<V>() {
          public V call() throws InterruptedException {
            return c.compute(arg);
          }
        };
        FutureTask<V> ft = new FutureTask<V>(eval);
        f = cache.putIfAbsent(arg, ft);
        if (f == null) {
          f = ft;
          ft.run();
        }
      }
      try {
        return f.get();
      } catch (CancellationException e) {
        cache.remove(arg, f);
      } catch (ExecutionException e) {
        // Kabutz: this is my addition to the code...
        try {
          throw e.getCause();
        } catch (RuntimeException ex) {
          throw ex;
        } catch (Error ex) {
          throw ex;
        } catch (Throwable t) {
          throw new IllegalStateException("Not unchecked", t);
        }
      }
    }
  }
}
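To convince myself that this closes the window, here is a condensed, self-contained variant of the final Memoizer (my own names, String keys, simplified error handling) with a driver that fires eight threads at the same key. Only the thread whose putIfAbsent() returns null runs the task, and FutureTask runs it at most once, so the computation counter always ends at one:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

public class MemoizerDemo {
  static final AtomicInteger computations = new AtomicInteger();
  static final ConcurrentMap<String, Future<Integer>> cache =
      new ConcurrentHashMap<String, Future<Integer>>();

  static int compute(final String arg)
      throws InterruptedException, ExecutionException {
    Future<Integer> f = cache.get(arg);
    if (f == null) {
      FutureTask<Integer> ft = new FutureTask<Integer>(new Callable<Integer>() {
        public Integer call() {
          computations.incrementAndGet();
          return arg.length(); // stand-in for the expensive function
        }
      });
      f = cache.putIfAbsent(arg, ft);
      if (f == null) { // we won the race: run the task ourselves
        f = ft;
        ft.run();
      }
    }
    return f.get(); // losers block here until the winner finishes
  }

  public static void main(String[] args) throws InterruptedException {
    Thread[] threads = new Thread[8];
    for (int i = 0; i < threads.length; i++) {
      threads[i] = new Thread(new Runnable() {
        public void run() {
          try {
            compute("f(27)");
          } catch (Exception e) {
            throw new RuntimeException(e);
          }
        }
      });
      threads[i].start();
    }
    for (Thread t : threads) t.join();
    System.out.println("computations: " + computations.get()); // always 1
  }
}
```

The only caching policy this sketch omits is the one Brian also discusses: removing a Future from the cache if its computation is cancelled or fails, so that later callers can retry.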
When I read the Java samples, the problem and the solution both appeared quite straightforward to me. Brian has picked plausible real-world examples that clearly demonstrate the techniques that he presents.
This book should be available in your bookshops in the next few weeks, so keep an eye out for it! Alternatively, you can also pre-order it on Amazon.com [ISBN 0321349601].
Kind regards
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)
We deliver relevant courses, taught by top Java developers, to produce more resourceful and efficient programmers within their organisations.