Abstract: Java supports Unicode in variable and method names, although we generally recommend sticking to ASCII characters. Here we look at how to do it.
Welcome to the 36th edition of The Java(tm) Specialists' Newsletter. This week, we will look at the strange things that happen when we try to use Unicode characters in our code.
I am sitting outside in my garden, in beautiful sunshine, with a pitbull terrier at my command ;-) About a month ago, the biggest software vendor in South Africa went bankrupt, severely affecting the availability of software in this country. Fortunately for me, I have friends in convenient places: I bought the software I needed (Dragon NaturallySpeaking) from Amazon in Germany and had it shipped to infor AG, whom I have mentioned in other newsletters, and they very kindly shipped it down to the end of the earth.
As a result of using Dragon NaturallySpeaking, you will probably notice that my newsletters will have an even more conversational style than before. I am always looking at ways in which I can improve my newsletters and serve you better. Please remember to forward this newsletter to friends and colleagues who are interested in Java.
A special welcome to country No. 56, Malta! My wife's previous boss at a hotel was the Maltese ambassador for Cape Town, which was really cool, as he had diplomatic immunity from parking and speeding fines. Mind you, traffic laws are rather lax in this country: I have only had one speeding fine in my life, and I drive an Alfa Romeo!
South Africa has just become the cheapest country in the world! We are the first country where a Big Mac costs less than US$1; it is even cheaper here than in the Philippines and China. I had a good response to the advert for my Java course (thank you for your patience in this regard), so I definitely want to develop the idea of running courses in South Africa combined with a holiday :-)
1707 members are currently subscribed from 56 countries
A few months ago, I was reading a book written by the authors of Java, when I stumbled across a piece of code that was using Unicode characters as variable names. Being the curious type, I immediately tried writing a piece of code that used funny characters. Easier said than done! I don't know of any Java IDE that supports Unicode. The common e-mail systems in this world would also choke like a dog on a chicken bone if I sent you a newsletter containing Unicode characters ;-)
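Before going further, it helps to know which characters the compiler actually accepts in a name. The Java Language Specification defines a "Java letter" as any character for which Character.isJavaIdentifierStart returns true, and a "Java letter-or-digit" as one for which Character.isJavaIdentifierPart returns true. The sketch below simply asks those two methods about a few sample code points. The class and method names are illustrative only. According to these checks, the Greek letters pi and epsilon and the Japanese ideograph 日 (U+65E5) are all legal identifier characters.

```java
public class IdentifierCheck {
    // True if the code point may begin a Java identifier (a "Java letter").
    static boolean canStart(int codePoint) {
        return Character.isJavaIdentifierStart(codePoint);
    }

    // True if the code point may appear after the first character of an identifier.
    static boolean canContinue(int codePoint) {
        return Character.isJavaIdentifierPart(codePoint);
    }

    public static void main(String[] args) {
        // 'a', '$', Greek pi, Greek epsilon, a CJK ideograph, a digit, a plus sign
        int[] samples = {'a', '$', 0x03C0, 0x03B5, 0x65E5, '1', '+'};
        for (int cp : samples) {
            System.out.printf("U+%04X start=%b part=%b%n",
                    cp, canStart(cp), canContinue(cp));
        }
    }
}
```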
Before I get into how we could use Unicode characters in our variables, let's take a step back and think about it. Imagine being called in by a Japanese company that has a memory leak in its program which it wants you to fix (one of the most common tasks I am asked to perform), and imagine that the company used Japanese characters for its variable names. Yes, it would compile if you followed the ideas in this newsletter, but what would the result be for me? I would probably pack my bags and head back home! It's bad enough having to read code where the variable names are in German or Afrikaans; I cannot imagine trying to understand code where I don't even know the characters used in the variable names!
Since I could not find an IDE that supported Unicode, my first job was to write a Unicode editor. That was also easier said than done. I had learned many years ago that Writers and Readers are used for Unicode characters, but I had never actually used Unicode. My first approach to reading and writing Unicode files looked something like this:
public void load() throws IOException {
  BufferedReader in = new BufferedReader(new FileReader(filename));
  String s;
  while((s = in.readLine()) != null) {
    // ...
  }
}
Did you know that FileReader extends InputStreamReader? Its constructor creates a FileInputStream and passes it to the superclass. InputStreamReader has a constructor that takes the character encoding as an argument, but FileReader unfortunately does not expose it. Instead, it always uses the platform's default encoding, which depends on the operating system and locale. One cannot but wonder what the author of FileReader had been smoking the day he or she wrote that code ...
(Actually, when I wrote the Sun Microsystems Java programmer examination a few years ago, the only non-GUI question I got wrong was about reading ISO-8859-1 data. Perhaps there has always been a hole in my knowledge of this topic.)
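To see why the encoding matters, here is a small sketch with hypothetical class and method names. It encodes the Greek letter pi as UTF-16BE bytes, which is what a file written in that encoding would contain. It then decodes those same bytes once with ISO-8859-1 and once with UTF-16BE. Reading with the wrong encoding does not fail; it silently produces two wrong characters. For what it's worth, later Java releases addressed the complaint above: Java 11 added FileReader constructors that take a Charset, and since JDK 18 the default charset is UTF-8 on all platforms (JEP 400).

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatch {
    // Encodes the text with one charset and decodes the bytes with another,
    // mimicking a file written in one encoding and read in another.
    static String roundTrip(String text, Charset written, Charset read) {
        byte[] bytes = text.getBytes(written);
        return new String(bytes, read);
    }

    public static void main(String[] args) {
        String pi = "\u03C0";
        System.out.println("Platform default charset: " + Charset.defaultCharset());

        String wrong = roundTrip(pi, StandardCharsets.UTF_16BE, StandardCharsets.ISO_8859_1);
        String right = roundTrip(pi, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16BE);

        // The bytes 0x03 0xC0 become two unrelated Latin-1 characters.
        System.out.println("Read as ISO-8859-1: " + wrong.length() + " chars");
        System.out.println("Read as UTF-16BE: " + right.length()
                + " char, matches original: " + right.equals(pi));
    }
}
```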
Should you want to read a file in an encoding other than the platform default, you have to bypass FileReader and construct the InputStreamReader yourself:
public void load() throws IOException {
  BufferedReader in = new BufferedReader(
    new InputStreamReader(
      new FileInputStream(filename), "UTF-16BE"));
  String s;
  while((s = in.readLine()) != null) {
    // ...
  }
}
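On Java 7 or later, the same loop can be written with java.nio.file.Files and the constants in StandardCharsets. This avoids misspelled encoding names, and try-with-resources closes the reader automatically. Treat it as a sketch rather than a drop-in replacement: the Utf16Load class and the temporary file its main method writes exist only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

public class Utf16Load {
    // Reads every line of the file as UTF-16BE; the reader is closed
    // automatically, even if an exception is thrown.
    static List<String> load(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader in = Files.newBufferedReader(file, StandardCharsets.UTF_16BE)) {
            String s;
            while ((s = in.readLine()) != null) {
                lines.add(s);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Write a small UTF-16BE sample file so the example is self-contained.
        String line = "double \u03C0 = 3.14;";
        Path tmp = Files.createTempFile("unicode", ".txt");
        Files.write(tmp, List.of(line), StandardCharsets.UTF_16BE);
        System.out.println("Read back intact: " + load(tmp).get(0).equals(line));
        Files.delete(tmp);
    }
}
```

There is one behavioural difference to be aware of. InputStreamReader quietly substitutes replacement characters for malformed input, whereas the reader returned by Files.newBufferedReader throws an IOException when it meets bytes that are invalid in the chosen encoding.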
Without further ado, here is the code for a Unicode text editor. It lets you insert Unicode characters by typing their decimal values and pressing the "Insert Unicode:" button. For the design, I followed an approach I saw a few years ago on jGuru, where all the GUI elements are created lazily. This makes the GUI code very maintainable, as you never have to worry about the order in which elements are constructed.
import java.awt.*;
import java.awt.event.*;
import javax.swing.*;
import java.io.*;

public class UnicodeEditor extends JFrame {
  private JPanel buttonPanel;
  private JScrollPane editorPanel;
  private JTextArea editor;
  private final String filename;
  private final String encoding;

  public UnicodeEditor(String filename, String encoding)
      throws IOException {
    this.filename = filename;
    this.encoding = encoding;
    getContentPane().add(getButtonPanel(), BorderLayout.NORTH);
    getContentPane().add(getEditorPanel(), BorderLayout.CENTER);
    load();
  }

  // Each component is created lazily by its getter, so the getters
  // can be called in any order.
  protected JPanel getButtonPanel() {
    if (buttonPanel == null) {
      buttonPanel = new JPanel();
      JButton unicodeInsert = new JButton("Insert Unicode:");
      final JTextField unicodeField = new JTextField(8);
      JButton saveExit = new JButton("Save & Exit");
      unicodeInsert.addActionListener(new ActionListener() {
        public void actionPerformed(ActionEvent e) {
          // The field holds a decimal character code, e.g. 960 for PI
          getEditor().insert(
            "" + (char)Integer.parseInt(unicodeField.getText()),
            getEditor().getCaretPosition());
        }
      });
      saveExit.addActionListener(new ActionListener() {
        public void actionPerformed(ActionEvent e) {
          try {
            save();
            System.exit(0);
          } catch(IOException ex) {
            ex.printStackTrace();
          }
        }
      });
      buttonPanel.add(unicodeInsert);
      buttonPanel.add(unicodeField);
      buttonPanel.add(saveExit);
    }
    return buttonPanel;
  }

  protected JTextArea getEditor() {
    if (editor == null) {
      editor = new JTextArea();
    }
    return editor;
  }

  protected JScrollPane getEditorPanel() {
    if (editorPanel == null) {
      editorPanel = new JScrollPane(getEditor());
    }
    return editorPanel;
  }

  // Reads the whole file, decoding the bytes with the chosen encoding
  protected void load() throws IOException {
    BufferedReader in = new BufferedReader(new InputStreamReader(
      new FileInputStream(filename), encoding));
    StringBuffer buf = new StringBuffer();
    int i;
    while((i = in.read()) != -1) buf.append((char)i);
    in.close();
    getEditor().setText(buf.toString());
  }

  // Writes the editor contents back, encoding them with the same encoding
  protected void save() throws IOException {
    BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
      new FileOutputStream(filename), encoding));
    char[] text = getEditor().getText().toCharArray();
    for (int i=0; i<text.length; i++)
      out.write(text[i]);
    out.close();
  }

  public static void main(String[] args) throws IOException {
    if (args.length < 1)
      throw new IllegalArgumentException(
        "usage: UnicodeEditor filename [encoding]");
    String encoding = (args.length == 2)?args[1]:"UTF-16BE";
    UnicodeEditor editor = new UnicodeEditor(args[0], encoding);
    editor.setSize(500,500);
    editor.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
    editor.setVisible(true);
  }
}
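One limitation of the insert button is that it converts the decimal value with a plain (char) cast. That was adequate when this was written. Since Java 5, however, a char holds one UTF-16 code unit, and characters above U+FFFF (many emoji, for example) need a surrogate pair of two chars. The sketch below contrasts the cast with Character.toChars. The class and method names are hypothetical, and swapping viaCodePoint into the ActionListener is only a suggestion for anyone who needs the full code point range.

```java
public class CodePointInsert {
    // The editor's approach: a single cast, which silently keeps only the
    // low 16 bits of any value above 0xFFFF.
    static String viaCast(int value) {
        return "" + (char) value;
    }

    // Converts a full code point to a String, producing a surrogate pair
    // for characters outside the Basic Multilingual Plane.
    static String viaCodePoint(int value) {
        if (!Character.isValidCodePoint(value)) {
            throw new IllegalArgumentException("Not a Unicode code point: " + value);
        }
        return new String(Character.toChars(value));
    }

    public static void main(String[] args) {
        int value = Integer.parseInt("128512"); // U+1F600, typed as a decimal value
        String cast = viaCast(value);
        System.out.println("cast: " + cast.length() + " char, U+"
                + Integer.toHexString(cast.charAt(0)).toUpperCase());
        System.out.println("toChars: " + viaCodePoint(value).length() + " chars");
    }
}
```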
By default the editor uses UTF-16BE, the 16-bit Unicode Transformation Format with big-endian byte order. When you start the editor, you can specify any other encoding your JVM supports, such as UTF-8 or ISO-8859-1. Before we use this editor, however, we need a file containing Unicode characters, so I've written a code generator that produces two files, MathsSymbols.java and MathsSymbolsTest.java:
import java.io.*;

public class UnicodeVariableGenerator {
  public static void generateMathsSymbols() throws IOException {
    PrintWriter out = new PrintWriter(new OutputStreamWriter(
      new FileOutputStream("MathsSymbols.java"), "UTF-16BE"));
    out.println("public interface MathsSymbols {");
    out.print("  public static final double ");
    out.print((char)960); // Greek small letter pi
    out.println(" = 3.14159265358979323846;");
    out.print("  public static final double ");
    out.print((char)949); // Greek small letter epsilon
    out.println(" = 2.7182818284590452354;");
    out.println("}");
    out.close();
  }

  public static void generateMathsSymbolsTest() throws IOException {
    PrintWriter out = new PrintWriter(new OutputStreamWriter(
      new FileOutputStream("MathsSymbolsTest.java"), "UTF-16BE"));
    out.println("public class MathsSymbolsTest implements MathsSymbols {");
    out.println("  public static void main(String args[]) {");
    // javac turns each Unicode escape below into the real Greek letter
    out.println("    System.out.println(\"The value of PI is: \" + \u03C0);");
    out.println("    System.out.println(\"The value of E is: \" + \u03B5);");
    out.println("  }");
    out.println("}");
    out.close();
  }

  public static void main(String[] args) throws IOException {
    generateMathsSymbols();
    generateMathsSymbolsTest();
  }
}
I won't include the code for MathsSymbols.java and MathsSymbolsTest.java; please run the UnicodeVariableGenerator class to generate it. I already bomb out enough mailing systems by sending my newsletters in HTML (*evil grin*), so there is no point in causing more trouble by using Unicode. Once you've run UnicodeVariableGenerator, load MathsSymbols.java into the UnicodeEditor using UTF-16BE and have a look at it: you should see the Greek symbol for PI.
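Here is what "two bytes per character" and "big-endian" mean in practice. This sketch (the EncodingBytes class is illustrative) prints the bytes that the string "aπ" turns into under three encodings. In UTF-16BE every character here takes two bytes, with the high-order byte first. That is why a plain ASCII 'a' appears with a zero byte in front of it when the file is opened in an ordinary text editor.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingBytes {
    // Returns the encoded bytes of the text as space-separated hex pairs.
    static String hex(String text, Charset charset) {
        StringBuilder sb = new StringBuilder();
        for (byte b : text.getBytes(charset)) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String s = "a\u03C0"; // 'a' followed by Greek small letter pi
        System.out.println("UTF-16BE: " + hex(s, StandardCharsets.UTF_16BE)); // 00 61 03 C0
        System.out.println("UTF-16LE: " + hex(s, StandardCharsets.UTF_16LE)); // 61 00 C0 03
        System.out.println("UTF-8:    " + hex(s, StandardCharsets.UTF_8));    // 61 CF 80
    }
}
```

Note that Java's plain "UTF-16" charset writes a byte-order mark (FE FF) in front of the data when encoding, while UTF-16BE writes none.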
The last "trick" you need to know is how to compile MathsSymbols.java and MathsSymbolsTest.java. If you open the files with Notepad or vi, you will probably see a rather strangely formatted file, with two bytes used per character. When you compile these files, you therefore have to tell javac which character encoding they use:
javac -encoding UTF-16BE MathsSymbols*.java
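There is also a way to sidestep the -encoding flag entirely. The compiler translates Unicode escapes (a backslash, a 'u' and four hex digits) before it splits the source into tokens (JLS §3.3). That is exactly why the \u03C0 inside a string literal in UnicodeVariableGenerator ends up as a real π in the generated file. The same escapes work in identifiers, so a pure-ASCII file like the hypothetical EscapedSymbols class below compiles with a plain javac. The identifier \u03C0 is the very same identifier as a literal π.

```java
// Pure ASCII source: compiles without any -encoding flag.
public class EscapedSymbols {
    // Escapes are translated before tokenizing, so this field is
    // named with the Greek letter pi ...
    public static final double \u03C0 = 3.14159265358979323846;
    // ... and this one with the Greek letter epsilon, as in MathsSymbols.
    public static final double \u03B5 = 2.7182818284590452354;

    public static void main(String[] args) {
        System.out.println("The value of PI is: " + \u03C0); // prints The value of PI is: 3.141592653589793
        System.out.println("The value of E is: " + \u03B5);  // prints The value of E is: 2.718281828459045
    }
}
```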
That's it! Getting this right kept me busy longer than almost any other newsletter. An interesting variation came from David Treves, whom I met through a really cool advanced Java chat list (JavaDesk on YahooGroups, where you get shouted at if you ask beginner questions). He tried to write Hebrew text to a database and read it back again. He doggedly kept at it until he eventually succeeded, long after I had given up hope of ever figuring it out. Stay tuned over the next few weeks to see how he did it.
Until next week, when we celebrate our first anniversary as the most interesting Java newsletter on the Internet ;-)
Kind regards
Heinz
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel.