Abstract: BigInteger has new algorithms for multiplying and dividing large numbers, with a better computational complexity than in previous versions of Java. A further improvement would be to parallelize multiply() with Fork/Join.
Welcome to the 236th edition of The Java(tm) Specialists' Newsletter, written in the Aegean Airlines Lounge in Athens on my way home after renewing my South African passport. Whilst speaking to the staff at the embassy, I discovered that Marthinus van Schalkwyk has recently been appointed as the South African Ambassador to Greece. Not sure what to say about that. Van Schalkwyk became leader of the National Party in South Africa after Frederik Willem de Klerk retired. The National Party is the one that banned the ANC long before I was born. He then changed the name to "New National Party", a bit like java.nio (New IO). They then merged with their previous opposition the Democratic Party and formed yet another party, the Democratic Alliance. Shortly after some disastrous local elections, the New National Party guys left the party (so to speak) and joined the ANC, which they had previously banned. It was at this point that I gave up trying to understand politics. Good thing too, seeing how we have a coalition government in Greece consisting of an extreme left and an extreme right party. How that marriage has lasted over a year is anyone's guess. At least I get to exercise my right to vote in Greece with regularity.
A special welcome to a new subscriber country - Georgia! We have several Georgians living in Chorafakia and it's great to now also be sending The Java(tm) Specialists' Newsletter to their home country :)
javaspecialists.teachable.com: Please visit our new self-study course catalog to see how you can upskill your Java knowledge.
Whilst randomly looking through Java code a few months ago, I noticed that BigInteger had been improved. In previous versions, multiply worked the way your elementary school teacher used to torture you: O(n^2). In an earlier newsletter, I mentioned how Karatsuba was a better algorithm to use for larger numbers. It weighs in at O(n^(lg 3)), or approximately O(n^1.585). Another algorithm is 3-way Toom-Cook, which is O(n^1.465). Whilst these look very similar, the difference is huge as n grows. Here is a small program that demonstrates it:
public class BigOComparison {
    public static void main(String... args) {
        for (int n = 1; n <= 1_000_000_000; n *= 10) {
            double n_2 = Math.pow(n, 2);
            double n_1_585 = Math.pow(n, 1.585);
            double n_1_465 = Math.pow(n, 1.465);
            double karatsuba_faster_than_quadratic = n_2 / n_1_585;
            double toom_cook_faster_than_karatsuba = n_1_585 / n_1_465;
            System.out.printf("%d\t%.2fx\t%.2fx%n", n,
                karatsuba_faster_than_quadratic,
                toom_cook_faster_than_karatsuba);
        }
    }
}
We can see that as n increases in size, the difference between the computational complexities becomes more apparent. At n=1_000_000_000, n^1.585 would be over 5000 times faster than n^2, and n^1.465 would be another 12 times faster than that:
1           1.00x     1.00x
10          2.60x     1.32x
100         6.76x     1.74x
1000        17.58x    2.29x
10000       45.71x    3.02x
100000      118.85x   3.98x
1000000     309.03x   5.25x
10000000    803.53x   6.92x
100000000   2089.30x  9.12x
1000000000  5432.50x  12.02x
Of course there are setup costs that Big O notation ignores. Because of these, we would only want to use Karatsuba for large numbers and Toom-Cook when they are even bigger.
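In the OpenJDK 8 sources, these crossover points are the constants KARATSUBA_THRESHOLD (80) and TOOM_COOK_THRESHOLD (240), measured in 32-bit words of the magnitude array, so roughly 2560 and 7680 bits. Here is a sketch of the dispatch logic; the constants match OpenJDK 8, but they could of course change in later versions:

```java
public class MultiplyDispatch {
    // Thresholds in 32-bit "words" of magnitude, as in OpenJDK 8's BigInteger
    static final int KARATSUBA_THRESHOLD = 80;   // about 2560 bits
    static final int TOOM_COOK_THRESHOLD = 240;  // about 7680 bits

    // Mirrors the algorithm selection in BigInteger.multiply()
    static String algorithmFor(int xlen, int ylen) {
        if (xlen < KARATSUBA_THRESHOLD || ylen < KARATSUBA_THRESHOLD)
            return "quadratic";     // schoolbook multiply
        if (xlen < TOOM_COOK_THRESHOLD && ylen < TOOM_COOK_THRESHOLD)
            return "karatsuba";
        return "toom-cook-3";
    }

    public static void main(String... args) {
        System.out.println(algorithmFor(10, 10));
        System.out.println(algorithmFor(100, 100));
        System.out.println(algorithmFor(500, 500));
    }
}
```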
Java 8 now has these two algorithms built into BigInteger. To see the improvement in performance, here is Fibonacci using Dijkstra's sum of squares algorithm:
import java.math.*;

public class Fibonacci {
    public BigInteger f(int n) {
        if (n == 0) return BigInteger.ZERO;
        if (n == 1) return BigInteger.ONE;
        if (n % 2 == 1) { // F(2n-1) = F(n-1)^2 + F(n)^2
            n = (n + 1) / 2;
            BigInteger fn_1 = f(n - 1);
            BigInteger fn = f(n);
            return fn_1.multiply(fn_1).add(fn.multiply(fn));
        } else { // F(2n) = ( 2 F(n-1) + F(n) ) F(n)
            n = n / 2;
            BigInteger fn_1 = f(n - 1);
            BigInteger fn = f(n);
            return fn_1.shiftLeft(1).add(fn).multiply(fn);
        }
    }

    public BigInteger f_slow(int n) {
        if (n == 0) return BigInteger.ZERO;
        if (n == 1) return BigInteger.ONE;
        return f_slow(n - 1).add(f_slow(n - 2));
    }
}
The f_slow() method is only there to help us test our fast algorithm, but would not be useful beyond about n=30.
Here is a test class that we can run in Java 7 and 8 to see how the reduced computational complexity in the multiply() algorithm speeds things up in Java 8:
public class FibonacciTest {
    public static void main(String... args) {
        Fibonacci fib = new Fibonacci();
        for (int i = 0; i < 10; i++) {
            if (!fib.f(i).equals(fib.f_slow(i))) {
                throw new AssertionError("Mismatch at i=" + i);
            }
        }
        for (int n = 10000; n < 50_000_000; n *= 2) {
            timeTest(fib, n);
        }
    }

    private static void timeTest(Fibonacci fib, int n) {
        System.out.printf("fib(%,d)%n", n);
        long time = System.currentTimeMillis();
        System.out.println(fib.f(n).bitLength());
        time = System.currentTimeMillis() - time;
        System.out.println("time = " + time);
    }
}
Here is the output for Java 7:
heinz$ java -showversion FibonacciTest
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

fib(10,000)
6942
time = 14
fib(20,000)
13884
time = 10
fib(40,000)
27769
time = 11
fib(80,000)
55539
time = 23
fib(160,000)
111078
time = 51
fib(320,000)
222157
time = 108
fib(640,000)
444314
time = 181
fib(1,280,000)
888629
time = 590
fib(2,560,000)
1777259
time = 2530
fib(5,120,000)
3554518
time = 8785
fib(10,240,000)
7109037
time = 34603
fib(20,480,000)
14218074
time = 142635
fib(40,960,000)
28436148
time = 586950
You can see that as the value of n doubles, the time it takes roughly quadruples. You can also see by the number of bits that the results get rather large.
And now Java 8. For smaller numbers, we don't see much difference to Java 7, but we start to diverge at roughly fib(1,280,000). Java 8 calculates fib(40,960,000) about 50x faster. I wasn't patient enough to calculate larger numbers, since I have a flight to catch this afternoon :-) However, I would expect the next data point to be roughly 75x faster.
heinz$ java -showversion FibonacciTest
java version "1.8.0_74"
Java(TM) SE Runtime Environment (build 1.8.0_74-b02)
Java HotSpot(TM) 64-Bit Server VM (build 25.74-b02, mixed mode)

fib(10,000)
6942
time = 6
fib(20,000)
13884
time = 3
fib(40,000)
27769
time = 7
fib(80,000)
55539
time = 16
fib(160,000)
111078
time = 27
fib(320,000)
222157
time = 40
fib(640,000)
444314
time = 58
fib(1,280,000)
888629
time = 155
fib(2,560,000)
1777259
time = 324
fib(5,120,000)
3554518
time = 734
fib(10,240,000)
7109037
time = 1661
fib(20,480,000)
14218074
time = 4412
fib(40,960,000)
28436148
time = 11870
So, now you have seen that Java 8 has improved in at least one area. However, they did not go far enough in my opinion. Both Karatsuba and Toom-Cook can easily be parallelized with recursive decomposition. If you really want to work with such large numbers then you probably also want to throw hardware at your problem. I tried it out by modifying BigInteger and adding a little MultiplyTask:
private static final class MultiplyTask
        extends RecursiveTask<BigInteger> {
    private final BigInteger b1, b2;

    public MultiplyTask(BigInteger b1, BigInteger b2) {
        this.b1 = b1;
        this.b2 = b2;
    }

    protected BigInteger compute() {
        return b1.multiply(b2);
    }
}
And then inside the multiplyKaratsuba() method I changed
BigInteger p1 = xh.multiply(yh); // p1 = xh*yh
BigInteger p2 = xl.multiply(yl); // p2 = xl*yl
To instead do this:
MultiplyTask mt1 = new MultiplyTask(xh, yh);
mt1.fork();
BigInteger p2 = xl.multiply(yl); // p2 = xl*yl
BigInteger p1 = mt1.join();      // p1 = xh*yh
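If you would rather not patch the JDK, you can get a similar effect on top of BigInteger's public API. Here is a minimal sketch of a parallel Karatsuba that splits the operands with shiftRight() and hands anything below a cutoff to the ordinary sequential multiply(), which then picks the best algorithm itself. The class name and the cutoff value are my own choices, not anything you will find in Java:

```java
import java.math.BigInteger;
import java.util.concurrent.RecursiveTask;

public class ParallelKaratsuba {
    // Below this size, forking costs more than it saves; value is a guess
    private static final int CUTOFF_BITS = 4096;

    public static BigInteger multiply(BigInteger x, BigInteger y) {
        return new KaratsubaTask(x, y).invoke();
    }

    private static final class KaratsubaTask
            extends RecursiveTask<BigInteger> {
        private final BigInteger x, y;

        KaratsubaTask(BigInteger x, BigInteger y) {
            this.x = x;
            this.y = y;
        }

        protected BigInteger compute() {
            if (x.bitLength() < CUTOFF_BITS || y.bitLength() < CUTOFF_BITS)
                return x.multiply(y); // sequential BigInteger takes over
            int half = Math.max(x.bitLength(), y.bitLength()) / 2;
            // x = xh * 2^half + xl, y = yh * 2^half + yl
            BigInteger xh = x.shiftRight(half);
            BigInteger xl = x.subtract(xh.shiftLeft(half));
            BigInteger yh = y.shiftRight(half);
            BigInteger yl = y.subtract(yh.shiftLeft(half));
            // Karatsuba: three recursive multiplies, two run asynchronously
            KaratsubaTask p1 = new KaratsubaTask(xh, yh);
            p1.fork();
            KaratsubaTask p3 = new KaratsubaTask(xh.add(xl), yh.add(yl));
            p3.fork();
            BigInteger p2 = new KaratsubaTask(xl, yl).compute();
            BigInteger high = p1.join();
            BigInteger mid = p3.join().subtract(high).subtract(p2);
            return high.shiftLeft(2 * half)
                       .add(mid.shiftLeft(half))
                       .add(p2);
        }
    }
}
```

Since shiftRight() rounds towards negative infinity, the identity x = xh * 2^half + xl holds for negative numbers as well, so no special sign handling is needed.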
By default my code uses the common Fork/Join Pool, but it would be marvelous to add a new method to BigInteger that allows us to multiply in parallel, for example:

BigInteger.multiply(BigInteger, ForkJoinPool)

or more explicitly

BigInteger.multiplyParallel(BigInteger, ForkJoinPool)
My modifications to BigInteger worked fairly well. I also used ManagedBlocker to implement a "reserved caching scheme" in Fibonacci to avoid calculating the same number twice. ManagedBlocker worked very nicely and kept my cores busier.
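A minimal sketch of such a reserved caching scheme could look like this (simplified, and not my exact code): the first caller reserves a key and computes its value; later callers for the same key block on the reservation through ForkJoinPool.managedBlock(), so that a ForkJoinPool can add a compensation thread instead of having a worker stall:

```java
import java.util.concurrent.*;
import java.util.function.Function;

public class ReservedCache<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> cache =
            new ConcurrentHashMap<>();

    public V computeIfAbsent(K key, Function<K, V> fn) {
        CompletableFuture<V> fresh = new CompletableFuture<>();
        CompletableFuture<V> existing = cache.putIfAbsent(key, fresh);
        if (existing == null) {
            // We hold the reservation: compute and publish the value.
            // (Simplification: if fn throws, waiters would hang forever.)
            fresh.complete(fn.apply(key));
            return fresh.join();
        }
        // Another thread holds the reservation: wait cooperatively so the
        // ForkJoinPool can compensate with an extra worker thread.
        try {
            ForkJoinPool.managedBlock(new ForkJoinPool.ManagedBlocker() {
                public boolean block() {
                    existing.join();
                    return true;
                }
                public boolean isReleasable() {
                    return existing.isDone();
                }
            });
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return existing.join();
    }
}
```

Inside the Fibonacci calculation, f() would first ask such a cache for n before recursing, so that the two overlapping recursive branches never compute the same value twice.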
Here is a tweet where I show my parallel solution without using the ManagedBlocker. Notice how idle my cores are, especially at the beginning of the run when the numbers I am multiplying are small. And another tweet with the same code, but with my "reserved caching scheme" using the ManagedBlocker to keep the ForkJoinPool more lively. fib(1_000_000_000) finished 8% faster and as you can see from my CPU graph, utilized the available hardware much better. I talk about these types of concepts in my Extreme Java - Concurrency & Performance Course in case you'd like to learn how to do it yourself.
I hope you enjoyed this newsletter and that it was useful to you.
Kind regards from sunny and warm Greece
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)
We deliver relevant courses, by top Java developers to produce more resourceful and efficient programmers within their organisations.