Abstract: Java 11 added the HttpClient to give us a better way to send HTTP requests. It supports both synchronous and asynchronous modes, and HTTP/2 comes out of the box. The threading is a bit funky though, and Professor Cay Horstmann explores how things work under the covers.
Welcome to the 271st edition of The Java(tm) Specialists' Newsletter. We have a guest author this month, Professor Cay Horstmann of Core Java fame. His article is based on some experiments that we did at JCrete, but the code has been almost completely rewritten. Kind regards - Heinz.
At JCrete 2019, Heinz Kabutz led a session that showed a mystery about configuring the thread pool for the HttpClient class. Setting a new executor didn't have the desired effect. It turns out that the implementation has changed (and perhaps not for the better), and the documentation is lagging. If you plan to use HttpClient asynchronously, you really want to pay attention to this. As a bonus, there are a few more useful tidbits about using HttpClient effectively.
The HttpClient was an incubator feature in Java 9 and has been, in its final form, a part of the Java API as of Java 11. It provides a more pleasant API than the classic HttpURLConnection class, has a nice asynchronous interface, and works with HTTP/2. This article deals with the asynchronous interface.
Suppose you want to read a web page and then process the body once it arrives. First make an HttpClient object:
HttpClient client = HttpClient.newBuilder()
    // Redirect except https to http
    .followRedirects(HttpClient.Redirect.NORMAL)
    .build();
Then make a request:
HttpRequest request = HttpRequest.newBuilder()
    .uri(new URI("https://horstmann.com"))
    .GET()
    .build();
Now get the response and process it, by adding an action to the completable future that the sendAsync method returns:
client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
    .thenAccept(response -> ...);
The sendAsync method uses non-blocking I/O to get the data. When the data is available, it is passed on to a callback for processing. The HttpClient makes use of the standard CompletableFuture class. The function that was passed to thenAccept is called when the data is ready.
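The three steps above can be put together in one small program. This is only a minimal sketch, not the code from the JCrete session: to stay runnable without network access, it fetches from a throwaway local server built with the JDK's com.sun.net.httpserver.HttpServer. The class name SendAsyncSketch, the method fetchBody, and the response text "hello" are made up for illustration.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

class SendAsyncSketch {
    /** Starts a local server, fetches "/" asynchronously, and returns the body. */
    static String fetchBody() {
        HttpServer server;
        try {
            // Hypothetical stand-in for a real web site: a server on a free local port.
            server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        server.createContext("/", exchange -> {
            byte[] bytes = "hello".getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(200, bytes.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(bytes);
            }
        });
        server.start();
        try {
            HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
            int port = server.getAddress().getPort();
            HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://127.0.0.1:" + port + "/"))
                .GET()
                .build();
            CompletableFuture<String> body = client
                .sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(response -> {
                    // Shows which thread runs the callback; the answer can vary.
                    System.out.println("Callback ran in " + Thread.currentThread());
                    return response.body();
                });
            // Blocking with join() is only for the demo; real code would keep chaining.
            return body.join();
        } finally {
            server.stop(0);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetchBody());
    }
}
```

The thread name printed by the callback is worth a look, because it is the subject of the rest of this article.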
In which thread? Of course not in the thread that has called client.sendAsync. That thread has moved on to do other things.
The HttpClient.Builder interface has a method executor:
ExecutorService executor1 = Executors.newCachedThreadPool();
HttpClient client = HttpClient.newBuilder()
    .executor(executor1)
    .followRedirects(HttpClient.Redirect.NORMAL)
    .build();
According to the JDK 11 docs, this "sets the executor to be used for asynchronous and dependent tasks".
At JCrete 2019, Heinz Kabutz demonstrated a program that grabbed Dilbert comics of the day, going to URLs of the form https://dilbert.com/strip/2019-08-21, finding the image URLs inside, and then loading the images.
Here is a slight simplification of the code. The file ImageInfo.java has a class ImageInfo that holds the image URL and binary data. A subclass DilbertImageInfo has the Dilbert-specific details for getting the URL of the web page and for extracting the image URL from it. A class WikimediaImageInfo does the same for the Wikimedia image of the day.
Because two requests are needed for fetching each image, it is convenient to make a helper method:
public <T> CompletableFuture<T> getAsync(
        String url,
        HttpResponse.BodyHandler<T> responseBodyHandler) {
    HttpRequest request = HttpRequest.newBuilder()
        .GET()
        .uri(URI.create(url))
        .build();
    return client.sendAsync(request, responseBodyHandler)
        .thenApply(HttpResponse::body);
}
This helper method is called in two methods for getting the image URL and data:
private CompletableFuture<ImageInfo> findImageInfo(
        LocalDate date, ImageInfo info) {
    return getAsync(info.getUrlForDate(date),
            HttpResponse.BodyHandlers.ofString())
        .thenApply(info::findImage);
}

private CompletableFuture<ImageInfo> findImageData(
        ImageInfo info) {
    return getAsync(info.getImagePath(),
            HttpResponse.BodyHandlers.ofByteArray())
        .thenApply(info::setImageData);
}
Now we are ready for our processing pipeline:
public void load(LocalDate date, ImageInfo info) {
    findImageInfo(date, info)
        .thenCompose(this::findImageData)
        .thenAccept(this::process);
}
The process method shows the image in a frame. See dailyImages/ImageProcessor.java for the complete code.
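The pipeline uses thenCompose rather than thenApply because the second step is itself asynchronous and returns a CompletableFuture. Here is a small sketch of that shape with hypothetical stand-ins (findPath and loadData are not the methods from ImageProcessor.java, and no HTTP traffic is involved):

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;

class ComposeSketch {
    // Hypothetical stand-in for findImageInfo: asynchronously "finds" an image path.
    static CompletableFuture<String> findPath(String page) {
        return CompletableFuture.supplyAsync(() -> page + "/image.png");
    }

    // Hypothetical stand-in for findImageData: asynchronously "loads" the image bytes.
    static CompletableFuture<byte[]> loadData(String path) {
        return CompletableFuture.supplyAsync(
            () -> path.getBytes(StandardCharsets.UTF_8));
    }

    /** Chains the two asynchronous steps and returns the number of bytes loaded. */
    static int loadedLength(String page) {
        return findPath(page)
            // thenCompose yields CompletableFuture<byte[]>; thenApply would have
            // produced a nested CompletableFuture<CompletableFuture<byte[]>>.
            .thenCompose(ComposeSketch::loadData)
            .thenApply(data -> data.length)
            .join(); // blocking only for the demo
    }

    public static void main(String[] args) {
        System.out.println(loadedLength("page"));
    }
}
```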
But it didn't work. By printing a message in process that included Thread.currentThread(), it was clear that the thread was from the global fork-join pool, not the provided executor. On my Linux laptop, the program just hung, and on Heinz's Mac, it crashed with an out of memory error when trying to fetch 10,000 images.
Heinz wasn't the first to notice that setting the executor doesn't work as expected - see this StackOverflow query.
The bug database gives some clues. This is a change in behavior since JDK 11. Nowadays, the "dependent" tasks are not executed by the provided executor, but by the common fork-join pool. However, the documentation hasn't been updated to track the change, and that's another bug.
The upshot of the change notice is that the executor method sets the executor only for the internal workings of the HttpClient. You control the dependent tasks yourself by specifying an executor:
return client.sendAsync(request, responseBodyHandler)
    .thenApplyAsync(HttpResponse::body, executor2);
In this example, we want to load a potentially large number of images. We don't want a thread per image, so let's use a fixed thread pool.
private ExecutorService executor2 = Executors.newFixedThreadPool(100);
And we do not want to set the executor for the HTTP client internals. We have no idea what it does with it, and there is no documentation on what kind of executor might be adequate.
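To see that thenApplyAsync with an explicit executor really moves the dependent task onto that executor, you can name the pool's threads and report which one ran the task. The following is a sketch under assumptions: CompletableFuture.supplyAsync stands in for client.sendAsync, and the class name, pool size, and "dependent-" thread names are illustrative.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class DependentExecutorSketch {
    /** Returns the name of the thread that ran the dependent task. */
    static String dependentThreadName() {
        AtomicInteger counter = new AtomicInteger();
        // Fixed pool whose threads carry recognizable (illustrative) names.
        ExecutorService executor2 = Executors.newFixedThreadPool(4, runnable -> {
            Thread thread = new Thread(
                runnable, "dependent-" + counter.incrementAndGet());
            thread.setDaemon(true);
            return thread;
        });
        try {
            // Stand-in for client.sendAsync(...); it completes in the common pool.
            CompletableFuture<String> response =
                CompletableFuture.supplyAsync(() -> "body");
            return response
                // The function runs in executor2, whichever thread completed response.
                .thenApplyAsync(body -> Thread.currentThread().getName(), executor2)
                .join();
        } finally {
            executor2.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(dependentThreadName());
    }
}
```

With plain thenApply instead of thenApplyAsync, the function would run in whichever thread completes the future, or in the calling thread if the future is already complete.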
Here is the takeaway for you:
- Specify an executor for the tasks that depend on sendAsync, unless you know that the common fork-join pool is the right executor for that task.
- Do not call executor on an HttpClient builder unless you know that your executor is better (presumably after having studied and understood the source code of the HttpClient implementation).
The HttpClient implementation uses a cached thread pool for its tasks. On Linux, when fetching 10,000 images, there were never more than a few hundred concurrent tasks in the HttpClient executor (presumably all short-duration responses to selector events). On the Mac, the virtual machine ran out of memory after creating just over 2,000 threads - your mileage might vary. When supplying a fixed thread pool, the program hung on the Mac as it did on Linux.
The program simply calls loadAll to load all images and process them:
public void loadAll() {
    long time = System.nanoTime();
    try {
        LocalDate date = LocalDate.now();
        for (int i = 0; i < NUMBER_TO_SHOW; i++) {
            ImageInfo info = new DilbertInfo();
            info.setDate(date.toString());
            System.out.println("Loading " + date);
            load(date, info);
            date = date.minusDays(1);
        }
        latch.await();
    } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        System.err.println("Interrupted");
    } finally {
        time = System.nanoTime() - time;
        System.out.printf("time = %dms%n", (time / 1_000_000));
    }
}
The latch is initialized as
private final CountDownLatch latch = new CountDownLatch(NUMBER_TO_SHOW);
The process method, which is called as the last part of the pipeline in the load method, calls:
latch.countDown()
That way, the loadAll method doesn't terminate until all of the images are loaded.
This is a "happy day" design that won't hold up in real life. If anything goes wrong in the pipeline, then process may never be called.
We need to put the equivalent of a finally clause into the processing pipeline to make sure that the latch is counted down after each image has either been processed, or a failure has occurred. Here is how you do that:
public void load(LocalDate date, ImageInfo info) {
    findImageInfo(date, info)
        .thenCompose(this::findImageData)
        .thenAccept(this::process)
        .whenComplete((x, t) -> latch.countDown());
}
The whenComplete action is invoked with the result or exceptional outcome of the completable future.
If you want to see if an exception occurred, you can check that t is not null, and then print the stack trace. Or you can sandwich in an exception handler:
public void load(LocalDate date, ImageInfo info) {
    findImageInfo(date, info)
        .thenCompose(this::findImageData)
        .thenAccept(this::process)
        .exceptionally(t -> {
            t.printStackTrace();
            return null;
        })
        .thenAccept(t -> latch.countDown());
}
With this change, the program will terminate.
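The effect of whenComplete can be checked without any HTTP traffic. In this sketch, which assumes a simulated failure in place of a real network error, one of several pipelines throws, and the latch still reaches zero. The class name LatchPipelineSketch, the method run, and the five-second timeout are made up for illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class LatchPipelineSketch {
    /**
     * Runs count pipelines; the one with index failingIndex throws.
     * Returns the number of failures seen, or -1 if the latch never reached zero.
     */
    static int run(int count, int failingIndex) {
        CountDownLatch latch = new CountDownLatch(count);
        AtomicInteger failures = new AtomicInteger();
        for (int i = 0; i < count; i++) {
            int index = i;
            CompletableFuture
                .supplyAsync(() -> {
                    if (index == failingIndex) {
                        throw new IllegalStateException("simulated failure " + index);
                    }
                    return "image " + index;
                })
                .thenAccept(image -> { /* processing would go here */ })
                // Invoked for both outcomes: t is null on success, non-null on failure.
                .whenComplete((x, t) -> {
                    if (t != null) {
                        failures.incrementAndGet();
                    }
                    latch.countDown();
                });
        }
        try {
            boolean done = latch.await(5, TimeUnit.SECONDS);
            return done ? failures.get() : -1;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return -1;
        }
    }

    public static void main(String[] args) {
        System.out.println("failures = " + run(5, 2));
    }
}
```

If the countDown call were moved into the thenAccept stage, the failing pipeline would skip it and the latch would never reach zero.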
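Here is the takeaway for you: end every completable future pipeline with a stage, such as whenComplete or exceptionally, that runs even when an earlier stage has failed. Otherwise, anything that waits for the pipeline may wait forever.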
Now when you run the program against the Dilbert site and try getting a thousand images, you can see that the site simply refuses to serve up that many. You get exceptions that are caused by:
javax.net.ssl.SSLHandshakeException: Remote host terminated the handshake
The site hates people who hammer it, and turns them away. It actually remembers your IP address, and it takes a while for you to get back into its good graces.
That's why the Wikimedia image of the day site is more useful for testing. Its images might not be as funny, but the site earnestly tries to serve them. Still, it can't keep up. After all, a thousand requests are issued in an instant, and then the HttpClient instances await the responses. Some of them throw an exception that is caused by:
java.io.IOException: too many concurrent streams
This is how the HttpClient reacts to an HTTP/2 server that has sent it a "go away" response (after first having informed it about the maximum number of concurrent streams). Unfortunately, with the current implementation of the HttpClient, it is impossible to find out what the nature of the failure was. People are unhappy about that, as evidenced by two bug reports and a StackOverflow question.
The easiest remedy is to space out the requests by some amount. In our testing, 100 ms worked fine.
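One way to space out requests is CompletableFuture.delayedExecutor (available since Java 9), which starts each task after an increasing delay. This is a sketch of one possible approach, not the code used in our testing. The class name, method names, and the generic task suppliers are assumptions. In the real program, each supplier would start one image pipeline and return its future.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

class SpacedRequestsSketch {
    /** Starts task i after i * spacingMillis and returns one future per task. */
    static <T> List<CompletableFuture<T>> startSpaced(
            List<Supplier<CompletableFuture<T>>> tasks, long spacingMillis) {
        List<CompletableFuture<T>> results = new ArrayList<>();
        for (int i = 0; i < tasks.size(); i++) {
            // Runs the supplier in the default executor once the delay has elapsed.
            Executor delayed = CompletableFuture.delayedExecutor(
                i * spacingMillis, TimeUnit.MILLISECONDS);
            CompletableFuture<CompletableFuture<T>> started =
                CompletableFuture.supplyAsync(tasks.get(i), delayed);
            // Flatten the nested future so that callers see the task's own result.
            CompletableFuture<T> result = started.thenCompose(future -> future);
            results.add(result);
        }
        return results;
    }

    /** Demo with trivial tasks; returns the elapsed time in milliseconds. */
    static long demoElapsedMillis(int count, long spacingMillis) {
        List<Supplier<CompletableFuture<Integer>>> tasks = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            int index = i;
            // Hypothetical stand-in for a supplier that calls client.sendAsync(...).
            tasks.add(() -> CompletableFuture.completedFuture(index));
        }
        long start = System.nanoTime();
        List<CompletableFuture<Integer>> results = startSpaced(tasks, spacingMillis);
        CompletableFuture.allOf(results.toArray(new CompletableFuture[0])).join();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        System.out.println("elapsed ms: " + demoElapsedMillis(5, 100));
    }
}
```

This only delays the start of each request. It does not limit how many requests are in flight at once, so a slow server can still see many concurrent streams.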
Should the HttpClient take this issue on? By retrying some number of times? Or spacing out requests? Or should there be better error reporting so that the users of the class can make those decisions? As it is, the HttpClient isn't quite ready for the messiness of the real world.
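Here is the takeaway for you: when you issue many requests at once, expect servers to push back. Space out your requests, and be prepared for failures whose cause the HttpClient does not report clearly.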