Abstract: We humans are rather good at figuring out what is meant from context. Computers are terrible at it. They do exactly what we tell them to. Fun ensues when we feed them the wrong number formatting information.
Welcome to the 240th edition of The Java(tm) Specialists' Newsletter, sent to you from Ljubljana, Slovenia. I taught my Concurrency course to Epilog, a small engineering company. As I always do, I asked the class how much experience they had with Java. Some of them had been programming the language longer than me, so I was thoroughly intimidated. How could I possibly teach them anything new? As it turned out, we had some excellent discussions about some in-depth Java topics and I left satisfied that I had earned my pay :-)
Last night, their CEO took some of us out sightseeing and told us about the history of Ljubljana. Yes, that's a fringe benefit of working with small companies :-) Private tour by the CEO. Did you know that the oldest wheel in Europe was discovered near Ljubljana? It has been dated at about 5150 years old! It is also the oldest wooden wheel ever discovered. Look it up.
It is hard to program computers to think like us. Just like aspies, they take things too literally. We need to weigh our words carefully. They might believe you really did want to "sudo rm -rf /".
In 1984, my dad bought a ZX Spectrum. He wanted to do calculations on it with a spreadsheet. After a long wait, the spreadsheet was loaded via audio tape and he eagerly started entering his figures. Nothing worked. Every number had a flashing question mark next to it. He called my older brothers in to help. They didn't know what could possibly be wrong with his numbers. Eventually I was asked to have a look. Having already done a bit of programming in BASIC, I suggested that we should maybe try using the period as a decimal point, instead of a comma. That solved the problem. I was 13 years old, and it was the first time I encountered this issue. But not the last.
Officially, South Africa uses the following format for one hundred and twenty-three thousand four hundred and fifty-six Rand and seventy-eight cents: "R123 456,78". The decimal point is a comma and the thousand separator is a space. Not just any space, a NO-BREAK SPACE (U+00A0). Hardly anyone uses the official format. We write "R123,456.78", just like the US and UK. This is true for internet banking, online shopping, government documents, contracts, etc. But you could find all of these formats being used: "R123456.78", "R123456,78", "R123 456.78", "R123 456,78" and "R123,456.78". Our human fuzzy logic brains do not even see that there are differences. We parse what makes most sense to us.
Since we have so many options, it is up to the individual person to choose which one they would like to see. I personally like "R123 456.78", but sometimes write it as "R123'456.78" and other times "R123,456.78". I never use comma as a decimal point separator. And neither do any of my internet banking sites. My auditors send me financial statements with a space as a thousand separator, pleasant to the eye. Their invoice uses a period as a decimal point. Even though the comma decimal point is the official format, hardly anyone uses it.
Continental Europe is different. Here everyone uses comma as the decimal point. A few years ago, I tried to pay £502.46 to my printers in England using a local internet banking site. After the payment had been submitted, I remembered that I had forgotten to send them proof of payment. When I looked at my transaction, I saw that this banking website had ignored the dot! It had initiated a £50246 payment. Oh my! Luckily I was able to quickly cancel the transfer.
Even though I've lived in Europe for 10 years, I still use en_ZA as a locale, as it has a great date format yyyy/MM/dd. However, I noticed a few Mac OS X upgrades back that all my decimal points were displayed as commas. In the region settings under "Advanced", I customized my en_ZA setting to the more familiar #,###.## format. Now the numbers in my applications looked good again. The only danger was those European banking websites.
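For numbers that a program prints itself, one way to avoid depending on the operating system's region settings is to spell out the separators explicitly. The following is only a sketch: the class RandFormat and its format() helper are hypothetical names chosen for illustration, and the "R" prefix is simply concatenated rather than taken from any currency data.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical helper, not part of any library: formats an amount with
// a period as decimal point and a caller-chosen thousand separator.
class RandFormat {
    static String format(double amount, char groupingSeparator) {
        // Start from the locale-neutral symbols, then set the two
        // separators explicitly so no locale provider can change them.
        DecimalFormatSymbols symbols = new DecimalFormatSymbols(Locale.ROOT);
        symbols.setDecimalSeparator('.');
        symbols.setGroupingSeparator(groupingSeparator);
        DecimalFormat format = new DecimalFormat("#,##0.00", symbols);
        return "R" + format.format(amount);
    }

    public static void main(String... args) {
        System.out.println(format(123456.78, ' '));  // R123 456.78
        System.out.println(format(123456.78, ','));  // R123,456.78
        System.out.println(format(123456.78, '\'')); // R123'456.78
    }
}
```

Because the symbols are set in code, the result should not depend on the default locale of the JVM or on which locale data it was started with.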
One of the cool toys I've been playing with recently is a Carbide3D Nomad CNC router. Of course I control it from Java. I wrote a little program that would let the router follow my cursor around the screen and start drilling the moment I clicked the mouse button. It is a lot of fun watching the router track my movements, like an obedient dog. The way we interact with the CNC router is via GCode, an ancient text based control language for telling it what to cut. Not quite as old as that Ljubljana wheel, but almost. I connected everything and launched my program. The machine stuttered and started spinning out of control, trying to drill into the table, flying right and left, up and down at great speed. I yanked the power cable out of the wall. Close shave, but what was wrong?
I had carefully generated all the GCode with String.format("%.3f") to make sure I had the ideal number of decimal places in the commands. However, I had accidentally launched my code with JDK 9 Early Access! Yeah, using an EA version of Java with a potentially seriously harmful piece of woodcarving equipment is not smart. But, between playing with CNC routers, I was reviewing the JDK code for Java 9, particularly how VarHandles were being applied in the atomic and concurrent classes. I had forgotten to set my environment back to Java 8.
With the help of Stuart Marks and Ben Evans, we discovered that Java 9 now uses the "official" Unicode formats for decimal points. For South Africa, that means that all your floating point numbers will in future automatically be converted into "123 456,78", whether you like that format or not. Note that I had already customized the format on my machine to the more common "123,456.78". My mistake with String.format() was that I assumed it would disregard locale information. The better way is to always specify the locale that you need, as in String.format(Locale.US, "%.3f", num).
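As a minimal sketch of that advice, the snippet below formats the same coordinates twice: once with the default locale and once with an explicit Locale.US. The class GCodeFormat, the moveTo() helper and the "G1 X... Y..." command layout are assumptions made up for this illustration, not the code that drove the router. Locale.GERMANY is set as the default only to simulate a machine where the decimal point is a comma.

```java
import java.util.Locale;

// Hypothetical helper for illustration: builds a linear-move command
// with exactly three decimal places and a period as decimal point.
class GCodeFormat {
    static String moveTo(double x, double y) {
        // The explicit locale keeps the output independent of the
        // default locale and of the locale data the JVM was started with.
        return String.format(Locale.US, "G1 X%.3f Y%.3f", x, y);
    }

    public static void main(String... args) {
        // Simulate a default locale that uses a comma as decimal point.
        Locale.setDefault(Locale.GERMANY);

        // Implicit default locale: prints G1 X12,346 Y7,500
        System.out.println(String.format("G1 X%.3f Y%.3f", 12.3456, 7.5));

        // Explicit locale: prints G1 X12.346 Y7.500
        System.out.println(moveTo(12.3456, 7.5));
    }
}
```

For machine-readable output such as GCode, Locale.ROOT would be a reasonable alternative to Locale.US, since it is meant to be locale-neutral.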
It is definitely a bug to disregard user customizations. I submitted a Java bug report and in less than 24 hours it was closed as "not an issue". It is. It will affect a country with 50.000.000 or 50,000,000 or 50 000 000 inhabitants. I know it's "only Africa" and besides, "they don't have computers there" <sarcasm/>. Microsoft did something similar with forcing these official settings (that no one uses) down our throats, but they also accepted user customizations. If you live in South Africa and this affects you, maybe you would also like to submit a bug report?
To see the Java 9 regression in action, have a look at this code:
import java.text.*;
import java.util.*;

public class PayPrinters {
  public static void main(String... args) throws Exception {
    Locale en_ZA = new Locale("en", "ZA");
    NumberFormat format = NumberFormat.getInstance(en_ZA);
    System.out.printf(en_ZA, "%,.2f%n", 123456.78);
    parse(format, "123 456,78"); // normal space
    parse(format, "123 456.78"); // normal space
    parse(format, "123\u00a0456,78"); // no-break space
    parse(format, "123\u00a0456.78"); // no-break space
    parse(format, "123456,78");
    parse(format, "123456.78");
    parse(format, "123.456,78");
    parse(format, "123,456.78");
  }

  private static void parse(NumberFormat format, String number)
      throws ParseException {
    System.out.println("parse(\"" + number + "\") = " +
        format.parse(number));
  }
}
Here are the various Java versions that I have on my Mac OS X machine. The number is formatted as 123,456.78 on Java 6, 7 and 8, but as 123 456,78 on Java 9, completely disregarding my settings. Using a no-break space is risky. You can't see it. And most people would just use a normal space. In which case the number cannot be parsed either.
heinz$ java -showversion PayPrinters
java version "1.6.0_65"
Java(TM) SE Runtime Environment (build 1.6.0_65-b14-468-11M4833)
Java HotSpot(TM) 64-Bit Server VM (build 20.65-b04-468, mixed mode)

123,456.78
parse("123 456,78") = 123
parse("123 456.78") = 123
parse("123 456,78") = 123
parse("123 456.78") = 123
parse("123456,78") = 12345678
parse("123456.78") = 123456.78
parse("123.456,78") = 123.456
parse("123,456.78") = 123456.78
heinz$ java -showversion PayPrinters
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)

123,456.78
parse("123 456,78") = 123
parse("123 456.78") = 123
parse("123 456,78") = 123
parse("123 456.78") = 123
parse("123456,78") = 12345678
parse("123456.78") = 123456.78
parse("123.456,78") = 123.456
parse("123,456.78") = 123456.78
heinz$ java -showversion PayPrinters
java version "1.8.0_101"
Java(TM) SE Runtime Environment (build 1.8.0_101-b13)
Java HotSpot(TM) 64-Bit Server VM (build 25.101-b13, mixed mode)

123,456.78
parse("123 456,78") = 123
parse("123 456.78") = 123
parse("123 456,78") = 123
parse("123 456.78") = 123
parse("123456,78") = 12345678
parse("123456.78") = 123456.78
parse("123.456,78") = 123.456
parse("123,456.78") = 123456.78
heinz$ java -showversion PayPrinters
java version "9-ea"
Java(TM) SE Runtime Environment (build 9-ea+134)
Java HotSpot(TM) 64-Bit Server VM (build 9-ea+134, mixed mode)

123 456,78
parse("123 456,78") = 123
parse("123 456.78") = 123
parse("123 456,78") = 123456.78
parse("123 456.78") = 123456
parse("123456,78") = 123456.78
parse("123456.78") = 123456
parse("123.456,78") = 123
parse("123,456.78") = 123.456
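In every run above, NumberFormat.parse(String) quietly stops at the first character it cannot interpret and returns whatever it has parsed so far, which is how "123 456.78" turns into 123. One way to guard against that is to parse with a ParsePosition and reject input that was not consumed completely. The class StrictParse and its parseStrictly() method below are hypothetical names for this sketch, and Locale.US is used only so that the example does not depend on the en_ZA data discussed here.

```java
import java.text.NumberFormat;
import java.text.ParseException;
import java.text.ParsePosition;
import java.util.Locale;

// Hypothetical helper for illustration: accepts a number only if the
// whole input string was consumed by the parser.
class StrictParse {
    static Number parseStrictly(NumberFormat format, String text)
            throws ParseException {
        ParsePosition position = new ParsePosition(0);
        Number result = format.parse(text, position);
        if (result == null) {
            // Nothing could be parsed at all.
            throw new ParseException(
                "Unparseable number: \"" + text + "\"",
                position.getErrorIndex());
        }
        if (position.getIndex() != text.length()) {
            // Only a prefix was parsed; the rest would be silently dropped.
            throw new ParseException(
                "Trailing characters in: \"" + text + "\"",
                position.getIndex());
        }
        return result;
    }

    public static void main(String... args) throws ParseException {
        NumberFormat us = NumberFormat.getInstance(Locale.US);
        System.out.println(parseStrictly(us, "123,456.78")); // 123456.78
        try {
            parseStrictly(us, "123 456.78");
        } catch (ParseException e) {
            // The space is not a US grouping separator, so this is rejected.
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

A check like this does not decide which format is "right", but it at least turns a silently wrong amount into an error that the user can see.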
There is a workaround that Stuart Marks discovered. If you launch your JVM with -Djava.locale.providers=HOST,CLDR,JRE it first uses your settings, then the official Unicode (CLDR) for your locale and lastly the default JRE settings. In my opinion, this should be the default for Java VMs. Here is the output in Java 9:
heinz$ java -Djava.locale.providers=HOST,CLDR,JRE \
    -showversion PayPrinters
java version "9-ea"
Java(TM) SE Runtime Environment (build 9-ea+134)
Java HotSpot(TM) 64-Bit Server VM (build 9-ea+134, mixed mode)

123,456.78
parse("123 456,78") = 123
parse("123 456.78") = 123
parse("123 456,78") = 123
parse("123 456.78") = 123
parse("123456,78") = 12345678
parse("123456.78") = 123456.78
parse("123.456,78") = 123.456
parse("123,456.78") = 123456.78
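To check which separators a particular JVM launch actually ends up with, a small diagnostic along these lines might help. It is only a sketch: the class LocaleSymbols and its describe() helper are invented for this example, and what it prints for en-ZA depends on the JDK version, the operating system settings and the java.locale.providers value, so no expected output is shown for that locale.

```java
import java.text.DecimalFormatSymbols;
import java.util.Locale;

// Hypothetical diagnostic for illustration: shows the decimal and
// grouping separators as code points, so that a no-break space
// (U+00A0) can be told apart from a normal space (U+0020).
class LocaleSymbols {
    static String describe(Locale locale) {
        DecimalFormatSymbols symbols = DecimalFormatSymbols.getInstance(locale);
        return String.format(Locale.ROOT,
            "%s: decimal=U+%04X grouping=U+%04X",
            locale.toLanguageTag(),
            (int) symbols.getDecimalSeparator(),
            (int) symbols.getGroupingSeparator());
    }

    public static void main(String... args) {
        // Prints null when the property was not set on the command line.
        System.out.println("java.locale.providers = "
            + System.getProperty("java.locale.providers"));
        System.out.println(describe(Locale.forLanguageTag("en-ZA")));
        System.out.println(describe(Locale.getDefault(Locale.Category.FORMAT)));
    }
}
```

Running it once with and once without the -Djava.locale.providers flag should make it easy to compare the separators side by side.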
Thanks for reading this newsletter :-) Short and sweet, as my flight back home is about to depart.
Kind regards
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel.