Abstract: A few years ago, the second oldest man in our village Chorafakia wrote a book about the history of our area. Only catch - it was in Cretan Greek. I tried to read it, but couldn't. Google Translate shrugged at the strange Cretan dialect. Then ChatGPT 4.0 came along, and we can interact with it directly from within Java. In this newsletter we have a look at how I was able to translate this masterpiece into four languages: English, German, Dutch and Slovak.
Welcome to the 314th edition of The Java(tm) Specialists' Newsletter. For me, 2023 was rather rough. I took on far too many "opportunities", with the result that I didn't even take a break during summer. Fortunately I managed to spend part of December and January chilling out, solving Project Euler puzzles (157 solved so far - top 0.478% - whoopee). Yes, I code Java for fun, as well as professionally. Unfortunately, solving these Euler problems is physically demanding for me, since my mind does not shut down at night and I find myself awake at 3am, thinking of better and faster algorithms.
After I had exhausted myself with the maths problems, I moved onto a translation project that I had put aside for a while: a book written in Cretan dialect about our village Chorafakia. Let's read how that went ...
Every time we had visitors to JCrete or one of our Java courses on Crete, and we hiked on the Akrotiri peninsula, we would come across curious large round structures that we could not explain. Shelters for animals? Leftover bunkers from World War II? What were these structures of stone? At long last, I found the answer.
A few years ago, Mr Manolis Koutrakis, the lead church singer in our village of Chorafakia, asked me whether I knew someone who could help him type in a book he had written. This was a historical book he had spent the previous twenty years putting together. He had written it by hand, and it had to be entered into a computer and then published. At the time, I did not know anyone who could help. A few years later, I was watching his sons Giorgos and Giannis distill their annual batch of excellent tsikoudia, when Mr Manolis told me that he had found someone and that his book was now for sale at the local supermarket. I rushed off to buy it, and he kindly autographed it for me.
Only catch was, the book was written in Greek. And not just any Greek, a Western Cretan dialect of Greek. My Greek was, and still is, at a very basic level. I can buy groceries, but Greek has tenfold the vocabulary of English, thus I quickly get lost. Cretan is even more challenging.
After looking at the book for a bit, and realizing that I would never be able to read it, I asked Mr Manolis if I could perhaps try to translate it into English. My thinking was to pop it into Google Translate, and a few minutes later I would know what he had written. He kindly agreed, as long as it would not incur him any more expenses. He had already invested thousands in creating the book and printing hard copies. His son Giannis handed me a memory stick jam-packed with all the PDFs and images that were in the book.
My first step was to put up a WordPress website of the book (note that the website only contains a part of the book), with a plugin that would send individual paragraphs of text to Google Translate automatically. The results were cached in a database, so that we could manually improve the translations. I spent several days on this, but using WordPress and a web interface always felt clumsy. Eventually I paused the project when I became too busy with my regular Java work.
Last December, I was standing in church watching Mr Manolis at 91 years old still leading the church singing with great energy and enthusiasm, and it occurred to me that perhaps ChatGPT would be able to do a better job of converting a strange dialect into English. On arriving home, I immediately began copying and pasting some of the text into the OpenAI chat version 3.5. The result was good, better than Google Translate, but not brilliant. Having read rave reviews about version 4.0, I applied to upgrade, and as soon as that was approved, tried again. The result was like night and day. The English prose was punchy and powerful. It made sense. ChatGPT had even translated a poem from Cretan into something in English that sounded a bit like poetry.
However, we still had to create a book. My first approach was to copy everything into LibreOffice. However, as a software geek, I don't like repeating myself. Each item such as an image or a location link should be a separate file, which we can then include in one large document. After much searching, I circled back to AsciiDoc, which I had used to write Dynamic Proxies in Java. The beauty with AsciiDoc is that I could break the book into hundreds of small chunks, all under Git source control, and then generate whatever format I wanted to: PDF, EPUB, HTML5, etc.
The printed copy of the book had all the photos at the back, in colour. This was obviously done to save printing costs, but made the book more difficult to read. Most sections referenced several photographs, thus you would find yourself paging back and forth all the time. In our revised electronic version, the photographs are inlined to where they are first referenced. This gives a much better flow through the book. The photos that I have are unfortunately not of the best quality, and perhaps in a future version, we will find time to rescan them. For now, they are "good enough" and adequately show what is written about.
Once I had split the book into hundreds of small chunks, it was time to send it to ChatGPT 4.0. I would copy a section and then paste it into the chat, with the prefix of "Translate into English:". This worked fine, unless there were some AsciiDoc tags in the middle. In that case, it would often also translate the tags. I thus changed the prefix to state: "Translate into English, keeping the AsciiDoc tags intact:". ChatGPT was surprisingly good at this, and it worked most of the time, unless the chunk of text was a bit too long.
After spending a few days painstakingly copying and pasting the text chunks into ChatGPT, I decided to try to automate this process. How? Well, I was sitting down at the beach in my trusty old Suzuki Jimny, wondering how hard it could be to use the ChatGPT API. I just asked ChatGPT: "Show me how to send prompts to the ChatGPT API using Java." A few seconds later, I had a detailed explanation of how to do it. I wanted to use the Java 11 HttpClient, so I requested: "Show me how to do this using the standard Java HttpClient" and seconds later, I had that solution. Then I wanted to use Gson for the JSON encoding, so "Use Gson" and voila! we had the answer. I ran the code on my server, whilst I headed to the beach for my daily push-ups, run and swim. When I came back, it had translated dozens of text chunks.
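To give a flavour of what such a request looks like, here is a minimal sketch rather than the actual program. It uses the standard Java 11 HttpClient as described above, but builds the JSON by hand instead of with Gson so that it has no dependencies. The endpoint URL, the model name, the OPENAI_API_KEY environment variable and all class and method names are assumptions for illustration; please check the current OpenAI API reference before relying on them.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

class TranslationRequestSketch {
    // Assumed endpoint and model name, not taken from the original program.
    static final String ENDPOINT = "https://api.openai.com/v1/chat/completions";
    static final String MODEL = "gpt-4";

    // Escapes a Java string as a JSON string literal (the job Gson would normally do).
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c == '"') sb.append("\\\"");
            else if (c == '\\') sb.append("\\\\");
            else if (c == '\n') sb.append("\\n");
            else if (c == '\r') sb.append("\\r");
            else if (c == '\t') sb.append("\\t");
            else if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
            else sb.append(c);
        }
        return sb.append('"').toString();
    }

    // Builds the JSON body: one user message holding the prompt prefix plus the chunk.
    static String requestBody(String targetLanguage, String chunk) {
        String prompt = "Translate into " + targetLanguage
                + ", keeping the AsciiDoc tags intact:\n\n" + chunk;
        return "{\"model\":" + jsonString(MODEL)
                + ",\"messages\":[{\"role\":\"user\",\"content\":"
                + jsonString(prompt) + "}]}";
    }

    // Wraps the body in a POST request with the bearer token header.
    static HttpRequest buildRequest(String apiKey, String targetLanguage, String chunk) {
        return HttpRequest.newBuilder(URI.create(ENDPOINT))
                .timeout(Duration.ofMinutes(2))
                .header("Content-Type", "application/json")
                .header("Authorization", "Bearer " + apiKey)
                .POST(HttpRequest.BodyPublishers.ofString(requestBody(targetLanguage, chunk)))
                .build();
    }

    public static void main(String[] args) throws Exception {
        String chunk = "== Chapter 1\n\nimage::kiln.jpg[]\n\nSome Cretan text here";
        String apiKey = System.getenv("OPENAI_API_KEY");
        if (apiKey == null) {
            // Without a key we only show what would be sent.
            System.out.println(requestBody("English", chunk));
            return;
        }
        HttpResponse<String> response = HttpClient.newHttpClient().send(
                buildRequest(apiKey, "English", chunk),
                HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode());
        System.out.println(response.body());
    }
}
```

The response comes back as JSON too, so a real program would still need a JSON library such as Gson to pull the translated text out of it.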
LLMs are a game changer, as you have no doubt witnessed yourself. Less than a year ago, ChatGPT 4.0 was not even available yet. Now, it has transformed our work, assuming that we are allowed to use it for our business code. Obviously we do not want to send any business secrets to ChatGPT, thus we do have to be careful. But for general code, it can be exceedingly useful.
For example, I use a bunch of symbolic links of the images and locations directories in order to build the book. These directories are common to all of the translations. I had never copied a symbolic link before, so a quick ChatGPT: "Please show me how to copy symbolic links between two directories in Java", and again, seconds later, I had a demo of how to do it:
private static void copySymbolicLinks(Path source, Path target) throws IOException {
    Files.walkFileTree(source, new SimpleFileVisitor<>() {
        @Override
        public FileVisitResult visitFile(Path file, BasicFileAttributes attrs) throws IOException {
            if (Files.isSymbolicLink(file)) {
                Path targetLink = target.resolve(source.relativize(file));
                Files.createDirectories(targetLink.getParent());
                Files.copy(file, targetLink, StandardCopyOption.REPLACE_EXISTING, LinkOption.NOFOLLOW_LINKS);
            }
            return FileVisitResult.CONTINUE;
        }
    });
}
I could have figured this out myself, but that would have taken more than a few seconds. I'm also not sure if this is the best way to do it, but it works. Right now, that is all I care about. I just wanted to get the job done.
I then wrote some code to send each chunk of text to the ChatGPT API, again, by asking ChatGPT for a demo of how to do it. As long as the text wasn't too large, it seemed to work very well. The limit currently seems to be about 10k characters per prompt. With a minimum of code, I was able to have about 200 pages of text translated from a cryptic Cretan dialect into understandable English overnight whilst I dreamed of flying unicorns. Incredible. To give you a taste of what it produced, here is one of my favourite village stories:
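Keeping each prompt under a size limit means splitting the text before sending it. The following is a minimal sketch of one way to do that, not the code I actually ran: it packs whole paragraphs (separated by blank lines) into chunks of at most maxChars characters. The class and method names are made up for illustration, and the 10,000 figure is only my rough observation from above, not a documented limit.

```java
import java.util.ArrayList;
import java.util.List;

class ChunkSplitterSketch {
    // Packs blank-line-separated paragraphs into chunks of at most maxChars characters.
    // A single paragraph longer than maxChars is emitted as its own oversized chunk.
    static List<String> split(String text, int maxChars) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (String paragraph : text.split("\n\\s*\n")) {
            if (paragraph.isBlank()) continue;
            // The +2 accounts for the blank line that rejoins the paragraphs.
            if (current.length() > 0
                    && current.length() + 2 + paragraph.length() > maxChars) {
                chunks.add(current.toString());
                current.setLength(0);
            }
            if (current.length() > 0) current.append("\n\n");
            current.append(paragraph);
        }
        if (current.length() > 0) chunks.add(current.toString());
        return chunks;
    }

    public static void main(String[] args) {
        String text = "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.";
        List<String> chunks = split(text, 40);
        for (int i = 0; i < chunks.size(); i++) {
            System.out.println("--- chunk " + (i + 1) + " ---");
            System.out.println(chunks.get(i));
        }
    }
}
```

Splitting on blank lines is a simplification. An AsciiDoc block that contains blank lines of its own, such as a listing, could be cut in half, so a real splitter would have to be aware of block delimiters.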
"Primarily in these ways, they combated diseases, injuries, and accidents that occurred to them. They had great trust in these traditional remedies for their effective results from herbs and old wives' tales. That’s why they viewed the doctors' words with great suspicion when they arrived at their clinics. As vividly illustrated in the following story of Uncle Yiannis Pontikakis, a fisherman by trade. They nicknamed him so due to his ailments, which eventually forced him to knock on a doctor’s door. The doctor listened to him attentively and after examining him, he made his own diagnosis of the illness. He then explained how to take the medication he prescribed.
"He wished him a speedy recovery, and Uncle Pontikakis Yiannis left. He listened to the doctor’s words attentively. He paid for the visit, said goodbye, and went straight to the well-known pharmacy to inquire about the price of the medicine, because he also had his own remedy in mind. He greeted the pharmacist and asked about the cost of the medicines. Upon hearing the price, he told the pharmacist, "Relative, I don’t carry enough money and if I can, I will come back soon," and left. However, he headed straight to the butcher’s shop. There he found a goat’s head, exactly what he needed for his ailment, and it also matched the cost of the medicine. Without much thought, he bought it, put it in his bag, mounted his donkey, twirled his mustache, and set off for the village, smiling and eager.
"As soon as he arrived and dismounted, he got busy with preparing the goat’s head. He was skilled in this art, being a shepherd. It wasn’t long before Uncle Yiannis began to nibble on pieces of the boiled head, drink his wine, and blush like a partridge when it crows and looks for its mate. Thus, he recovered well and rejuvenated, not waiting for the doctor’s powders, as he used to say thereafter every time his adventures were recounted in his discussions with his companions.
"With this tale, I also conclude my report on what pertained to the health of people at that time. The diseases that tormented them and the remedies and traditional cures they used to combat and heal them."
Isn't that a delightful story? I can picture Uncle Giannis Pontikakis - The Little Mouse, sitting there munching on meat and drinking wine, thinking that this was much better than the doctor's powder. The book is filled with stories like that, and would have been wholly inaccessible to me without a translation.
I used my little Java program to translate the entire book into English. Since the Greek characters are a bit different to Latin, I also went through the book and fixed spellings and other obvious mistakes. There was not that much that I had to change.
Once I had finished the first complete translation, I was wondering how easy it would be to translate it from English into German. I had to adapt my program a bit, but within a day, I had a PDF of the book - in German. I read and write English comfortably. I can read German, but I find it difficult to write in it. However, the translated German is better than I could ever produce. I can certainly read it. Here is how one of the above paragraphs is translated:
German: "Kaum war er angekommen und abgestiegen, machte er sich daran, den Ziegenkopf zuzubereiten. Er war in dieser Kunst geübt, da er Schäfer war. Es dauerte nicht lange, da begann Onkel Yiannis, an Stücken des gekochten Kopfes zu knabbern, seinen Wein zu trinken und rot zu werden wie ein Rebhuhn, wenn es kräht und nach seinem Partner sucht. So erholte er sich gut und verjüngte sich, ohne auf die Pulver des Arztes zu warten, wie er fortan immer sagte, wenn seine Abenteuer in den Gesprächen mit seinen Gefährten wiedererzählt wurden."
Not perfect, but definitely understandable and funny if you know German. Yesterday I decided to go one step further and translate the book from English to Dutch. This is more tricky for me, since I can definitely not write Dutch, and can only understand some of it:
Dutch: "Zodra hij aankwam en van zijn ezel afsteeg, ging hij druk aan de slag met het bereiden van de geitenkop. Hij was bedreven in deze kunst, als herder zijnde. Het duurde niet lang voordat Ome Yiannis begon te knabbelen aan stukjes van de gekookte kop, zijn wijn dronk, en bloosde als een patrijs wanneer het kraait en op zoek is naar zijn partner. Zo herstelde hij goed en werd hij verjongd, niet wachtend op de poeders van de dokter, zoals hij nadien altijd placht te zeggen wanneer zijn avonturen werden besproken in zijn gesprekken met zijn kameraden."
But why stop with languages that I can sort-of understand? Yesterday, I also translated the book into Slovak, since one of our beach running friends is from there. This time, I have absolutely no idea what it says:
Slovak: "Hneď ako prišiel a zosadol z osla, pustil sa do prípravy kozliej hlavy. Bol v tomto umení zručný, keďže bol pastier. Nebolo dlho, kým strýko Yiannis nezačal obžierať kúsky varenej hlavy, piť svoje víno a červenať sa ako jarabica, keď kričí a hľadá svoju družku. Takto sa dobre zotavil a omladol, nečakajúc na doktorove prášky, ako potom zvykol hovoriť vždy, keď jeho dobrodružstvá rozprával vo svojich diskusiách s tovarišmi."
Since we are sending the prompts to the ChatGPT API, we have to be careful not to send them too quickly, otherwise we risk getting throttled or banned. In my final solution, I sent each request using a virtual thread, but used a Semaphore to throttle the requests on my side. This made the code trivial to write, thanks to StructuredTaskScope, whilst still giving excellent throughput. Our program can translate this 500-page PDF into another language in less than an hour. BTW, if you would like to learn more about virtual threads, we are now offering a course on this topic.
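Here is a minimal sketch of that throttling idea, not my actual code. Since StructuredTaskScope has been a preview API, the sketch swaps in Executors.newVirtualThreadPerTaskExecutor(), which is final in Java 21. Every chunk gets its own virtual thread, and a Semaphore limits how many calls are in flight at once. The class name, the method name and the translator function (a stand-in for the real HTTP call) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

class ThrottledTranslatorSketch {
    // Runs translator on every chunk in its own virtual thread, with at most
    // maxConcurrent calls in flight. Results come back in the input order.
    static List<String> translateAll(List<String> chunks,
                                     Function<String, String> translator,
                                     int maxConcurrent) throws Exception {
        Semaphore permits = new Semaphore(maxConcurrent);
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            List<Future<String>> futures = new ArrayList<>();
            for (String chunk : chunks) {
                futures.add(executor.submit(() -> {
                    permits.acquire(); // parks the virtual thread until a permit is free
                    try {
                        return translator.apply(chunk);
                    } finally {
                        permits.release();
                    }
                }));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                results.add(future.get()); // rethrows a failure as ExecutionException
            }
            return results;
        }
    }

    public static void main(String[] args) throws Exception {
        // Fake translator that just upper-cases, standing in for the API call.
        List<String> out = translateAll(
                List.of("one", "two", "three"), String::toUpperCase, 2);
        System.out.println(out);
    }
}
```

Note that a semaphore only caps the number of simultaneous requests. It does not limit requests per minute, so a service with a strict rate limit would need some pacing on top of this.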
In order to send some love to the author of this book, I have put up a page where we can buy the PDF and EPUB versions in Cretan Greek, English, German, Dutch, and Slovak. Plus, if you would like the book in another language, let me know and I'll run the translator for you. As a Java geek, I found it truly fascinating that something like this was possible. The book itself is fascinating, as well as the technology behind it. All the profit from book sales goes to the author Mr Emmanouil Koutrakis, son of Giorgos. Let's surprise him :-)
As you've no doubt experienced yourself, ChatGPT does not always give us perfect answers, and we need to guide it a bit in order to make it work. It can be a good partner in our software development. IntelliJ even has AI built right into the IDE. It is a game changer and should make us far more productive over time.
Heinz
P.S. The strange round structures that we saw on our hikes were lime kilns, which is how the people of our area used to scrape together a living. It was horrible and dangerous work, but the soil of our peninsula is dry and not very fertile. In the past, there was no piped water here, and thus options were rather limited. There is a reproduction of such a kiln on the hiking path down from the Gouverneto Monastery. Imagine a huge rock oven that burns for a week and whose purpose is to produce quicklime. The father of the book author once single-handedly collected 1000 bundles of firewood for his kiln. He used the proceeds of the work to buy himself a new suit.
We are always happy to receive comments from our readers. Feel free to send me a comment via email or discuss the newsletter in our JavaSpecialists Slack Channel (Get an invite here)