Diacritics restoration: can we do better using neural networks and deep learning? Perspectives from a 10-year-old open source project

People who need to write correctly in languages whose alphabets include letters with diacritics, such as ‘ğ‘, ‘ş‘, ‘ö‘, ‘ı‘, can struggle with standard US or UK QWERTY keyboards, because those layouts lack such letters. If you also need to switch between languages such as English and Turkish, you know what I mean.

Possible forms of diacritic restoration in Turkish for “aci”. Source: “Diacritic Restoration Using Recurrent Neural Network” by Ayşenur Genç Uzun

The process of taking a piece of writing that is not spelled correctly (it uses standard ASCII characters, without proper diacritics) and replacing the relevant letters with the correct ones is known as “diacritics restoration” or “diacritics reconstruction” (or, colloquially, “deASCIIfication”). About 10 years ago, I wrote a Python program to help people with this: Turkish Deasciifier, a port of the Emacs Lisp code developed by Prof. Deniz Yüret. There’s also a web interface at
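To make the ambiguity concrete, here is a minimal sketch that enumerates every candidate spelling for an ASCII word such as “aci”. It is not the algorithm used by Turkish Deasciifier; the mapping table and the function name `candidate_restorations` are assumptions made for this illustration, and only lowercase letters are handled. Picking the right candidate from this list is the actual restoration problem.

```python
from itertools import product

# Hypothetical mapping for illustration: each ASCII letter on the left may
# stand for any of the Turkish letters on the right (lowercase only).
TURKISH_VARIANTS = {
    "c": "cç",
    "g": "gğ",
    "i": "iı",
    "o": "oö",
    "s": "sş",
    "u": "uü",
}


def candidate_restorations(ascii_word):
    """Return every spelling obtained by optionally restoring each letter."""
    # Letters without a diacritic variant contribute a single option.
    options = [TURKISH_VARIANTS.get(ch, ch) for ch in ascii_word]
    return ["".join(combo) for combo in product(*options)]


print(candidate_restorations("aci"))
# → ['aci', 'acı', 'açi', 'açı']
```

The number of candidates doubles with every ambiguous letter, which is why a restoration tool needs context (neighboring letters or words) rather than brute-force enumeration.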

Posted on October 22, 2020 in Linguistics, Programlama, python, Science

What is Engineering? Perspectives from “The Sciences of the Artificial”

If you are an engineer, or an engineering manager responsible for designing software-intensive complex systems, you will find a lot of food for thought in the following quotes from “The Sciences of the Artificial” by Nobel laureate and Turing Award recipient Herbert A. Simon. You might notice that the term ‘software‘ never appears in the following quotations, and the word ‘program‘ is mentioned only twice. Yet the issues, concerns, methods, and line of reasoning proposed by Simon can be used to attack the core challenges facing software engineers who work on different systems in diverse domains. I believe these quotes, as well as most of the rest of the book, deserve a critical and deep reading by generations of engineers.

“There is nothing special that needs to be said here about resource conservation—cost minimization, for example, as a design criterion. Cost minimization has always been an implicit consideration in the design of engineering structures, but until a few years ago it generally was only implicit, rather than explicit. More and more cost calculations have been brought explicitly into the design procedure, and a strong case can be made today for training design engineers in that body of technique and theory that economists know as “cost-benefit analysis.””

Posted on October 6, 2020 in business, Management, Programlama, Science

The Mozilla Common Voice Data Set and Turkish: Isn’t Something Odd Here?

Today, AI (Artificial Intelligence) applications continue to permeate every area of our lives: voice interfaces and smart assistants show up in many places. As in other areas of machine-learning-based AI, access to high-quality, validated, and labeled open data sets is important for boosting scientific work and innovation, and for energizing start-ups, in speech and voice technologies. It is hard to expect small start-ups to compete right away with the Internet and technology giants on this front. That is why projects such as Mozilla Common Voice, which collect data in this field and share it under open licenses, play an important role. Data sets that are both high in quality and large in volume are a very important starting point for speech recognition (Speech to Text), speech synthesis (TTS – Text to Speech), and so on.

Mozilla recently announced a new release of the Common Voice speech data set (July 2020). It caught my attention because Turkish is among the languages for which voice data is collected. As someone who contributed to a similar project years ago, I took a closer look and found something that surprised me! Turkish, spoken by almost 80 million people worldwide, is represented in this data set with less data than, never mind English, even a language such as Catalan, which has 10 times fewer speakers: the data set contains 24 times more Catalan data and 86 times more English data than Turkish data!


Generative Deep Learning and Bach, a Good Fit?

If you’re like me, you know that there’s never “enough Bach” in one’s life and you can always tap into infinite musical curiosities based on Bach. Using Artificial Intelligence methods such as deep learning to “train” computers for music composition is one of the fascinating recent trends in this area, and applying these automated, statistical methods to Bach chorales is an active topic of research with interesting results. The book by David Foster, “Generative Deep Learning – Teaching Machines to Paint, Write, Compose, and Play“, has a chapter dedicated to using generative deep learning methods such as MuseGAN for music composition, and explains how such “generative” models can be trained on Bach’s real polyphonic compositions to output new musical pieces in the style of Bach.

Below is an original piece created by a Generative Adversarial Network (GAN), in particular the well-known MuseGAN architecture. The MuseGAN network was able to create this after training for only 1000 epochs on a moderate laptop for 2 hours (without using GPUs), based on the data set at (a set of 229 Bach chorales). In other words, this is definitely not representative of the best that deep learning can achieve, because such a system can easily be trained for longer on much more powerful hardware (see further examples below). The point of these examples is that you, too, can start experimenting with deep learning systems that begin to model musical aspects without explicit musical teaching, hard-coded rules in software, etc.

You can click on the image below to visit SoundCloud and listen to the MP3 file generated by MuseScore.

Example created by MuseGAN by randomly applying normally distributed noise vectors – Click to listen on SoundCloud

Among the actual Bach chorales in the data set, the “closest” one to the artificially generated example (“close” in the sense of Euclidean distance) can be seen below.
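As a rough illustration of “closest in the sense of Euclidean distance”, the sketch below finds the nearest item in a small collection of equal-length numeric vectors. How the scores were actually flattened into vectors is not shown here, so the flat lists of pitch numbers and the name `closest_piece` are assumptions made only for this example.

```python
import math


def closest_piece(generated, dataset):
    """Return (index, distance) of the dataset vector nearest to `generated`."""
    # math.dist computes the Euclidean distance between two equal-length points.
    distances = [math.dist(generated, piece) for piece in dataset]
    best = min(range(len(dataset)), key=distances.__getitem__)
    return best, distances[best]


# Toy stand-ins for real chorales: four pitch numbers per "piece".
chorales = [
    [60, 62, 64, 65],
    [60, 64, 67, 72],
    [55, 59, 62, 67],
]
generated = [60, 63, 66, 71]

index, distance = closest_piece(generated, chorales)
print(index, round(distance, 3))
# → 1 1.732
```

A real comparison would use much longer vectors (every note of every voice over time), but the nearest-neighbor search has the same shape.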

Posted on December 17, 2019 in Math, Music, Programlama

The Level of High School Mathematics Education in France 220 Years Ago

Whenever new PISA (Programme for International Student Assessment) results are announced, or some journalist writes a piece on the latest state of the French baccalauréat exams, many people take a critical look at educational matters and make comparisons. I think a little example from the dusty pages of the history of mathematics can shed some light on the level of high school education in France back in the 1800s, that is, almost 220 years ago. Who knows, it might even inspire people who want to check their standards.

The example is about the famous German mathematician Gauss: he wrote a remarkable book in 1798, humbly titled “Disquisitiones Arithmeticae” (“Arithmetical Investigations”). The book was first published in 1801, and only 6 years later, in 1807, it was translated into French and published as “Recherches arithmétiques“.

The translator of this important book was Antoine Charles Marcelin Poullet-Delisle, a math teacher at a high school: Lycée d’Orléans. Another French high school teacher, Louis Poinsot, wrote a long review of the translation in a daily newspaper on Saturday, 21 March 1807. Like the translator of Gauss’s book, Poinsot taught mathematics at a high school: Lycée Bonaparte in Paris.

The archives of the daily newspaper where Poinsot published his review of “Recherches arithmétiques” are available online at DigiNole Home » FSU Digital Library » Napoleonic Collections » Le Moniteur universel » Moniteur universel

And you can read the review on the second page of the newspaper.

Posted on December 11, 2019 in Math, Tarih

How to confuse Google Translate by simply adding a newline?

When you have the most popular and successful computer-based translation service in the world, used by millions of people every day, it’s inevitable that very interesting cases will be discovered. Let’s take the following question:

  • Can simply adding a “newline” character change the translation of a word?

This sounds weird, because for a human being, the obvious reaction would be:

  • What does that even mean? Probably you’ve accidentally hit ENTER or something, and that can’t possibly affect the meaning of a word, why do you even ask that?

Well, if the translation system in question is based on statistical natural language processing and neural network techniques such as deep learning, then things get a little more complex. Let’s first look at a sentence without any superfluous newline inserted:

and now, let’s hit ENTER right after the Dutch word “afzetzone”, to see the translation change magically:

The point here is not whether the word “afzetzone” is translated correctly, but rather why its translation changes when you simply add one more “white space” character after the word.

If you’re a lay person, you’ll probably be baffled by this example, and if you’re an NLP expert, specializing in deep learning techniques, you’ll probably scratch your head and then smile, and if you’re one of the scientists or engineers actually working on debugging the Google Translate software, well, then you might react differently. 😉

All in all, keep in mind that in today’s technological landscape, there are super complex systems behind simple interfaces, and such “glitches” barely scratch the surface, providing a small, opaque glimpse into a popular Artificial Intelligence product.

Posted on November 8, 2019 in Linguistics, Programlama, Science

Normality Testing: is it normal?

It is largely because of lack of knowledge of what statistics is that the person untrained in it trusts himself with a tool quite as dangerous as any he may pick out from the whole armamentarium of scientific methodology. –Edwin B. Wilson (1927), quoted in Stephen M. Stigler, The Seven Pillars of Statistical Wisdom.

Imagine you’re responsible for testing some aspects of a complex software product, and one of your colleagues comes up with the following request:

  • Hey, can you write a self-contained function that tests the results of software component X, and returns TRUE if the data set generated by X is normally distributed, and FALSE otherwise?

What’s a poor software developer to do?

Well, you cherish the fond memories of your first statistics class that you took more than 20 years ago, and say: “I’ll plot a histogram of the data, and see if it’s normal!”

But of course, in less than a second you realize that manual visual inspection of a plot will not make an automated test, not at all! So as a brilliant software developer with a math background, you say, “easy, I’ll just grab my secret weapon, that is, Python and its SciPy library to smash through this little statistical challenge!” You’re happy that you can stand on the shoulders of giants, and use a well-documented, simple function such as scipy.stats.normaltest.
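The naive function the colleague asks for might look like the sketch below. The name `looks_normal` and the 0.05 threshold are assumptions made for this illustration. Note the built-in caveat: a p-value above the threshold only means the test failed to reject normality; it does not prove that the data is normally distributed, and the outcome depends on sample size and on the chosen threshold.

```python
import numpy as np
from scipy import stats


def looks_normal(data, alpha=0.05):
    """Return True when normaltest does NOT reject normality at level alpha."""
    # normaltest combines skewness and kurtosis into one statistic; a small
    # p-value is evidence against the data coming from a normal distribution.
    _statistic, p_value = stats.normaltest(data)
    return bool(p_value >= alpha)


rng = np.random.default_rng(seed=42)
skewed = rng.exponential(scale=1.0, size=1000)  # clearly non-normal sample
print(looks_normal(skewed))
# → False
```

For data that really is drawn from a normal distribution, this function still returns FALSE in roughly a fraction `alpha` of runs, which is already a hint that a TRUE/FALSE answer is a shaky basis for an automated test.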

Posted on September 11, 2019 in Math, Programlama, python, Science

What was the state of AI in Europe almost 70 years ago?

When it comes to the history of Artificial Intelligence (AI), even a simple Internet search will tell you that the defining event was “The Dartmouth Summer Research Project on Artificial Intelligence“, a summer workshop held in 1956 at Dartmouth College, United States. What is less known is that 5 years before Dartmouth, back in 1951, there was a conference in Europe. The conference, held in Paris, was “Les machines à calculer et la pensée humaine” (Calculating machines and human thinking). It can easily be considered the earliest major conference on Artificial Intelligence. Supported by the Rockefeller Foundation, its participant list included intellectual giants of the field, such as Warren Sturgis McCulloch, Norbert Wiener, Maurice Vincent Wilkes, and others.

The organizer of the conference, Louis Couffignal, was also a mathematician and cybernetics pioneer, who had already published a book titled “Les machines à calculer. Leurs principes. Leur évolution.” (Calculating machines. Their principles. Their evolution.) in 1933. Another highlight of the conference was El Ajedrecista (The Chess Player), designed by Spanish civil engineer and mathematician Leonardo Torres y Quevedo. There was also a presentation based on practical experiences with the Z4 computer, designed by Konrad Zuse and operated at ETH Zurich. The presenter was none other than Eduard Stiefel, inventor of the conjugate gradient method, among other things.

The field of AI has come a long way since 1951, and it is safe to say it’s going to penetrate into more aspects of our lives and technologies. It’s also safe to say that, like many technological and scientific endeavors, progress in AI is the result of many bright minds in many different countries, with the USA and UK generally regarded as the places that contributed a lot. But it’s also important to recognize lesser-known facts such as this Paris conference in 1951, and to appreciate the strong tradition in Europe: not only the academic, research and development track, but also the strong industrial and business tracks. Historical artifacts in languages other than English inevitably receive less recognition, but they should be a reason to cherish diversity and variety. I believe all of these aspects combined should guide Europe in its quest to advance the state of the art in AI, in terms of software, hardware, and combined systems.

This article is heavily based on and inspired by the following article by Herbert Bruderer, a retired lecturer in didactics of computer science at ETH Zürich: “The Birthplace of Artificial Intelligence?”

Posted on July 11, 2019 in Math, Programlama, Science

Zen of GitHub and Python

For some readers it’s old news, but I’ve just discovered the Zen of GitHub API. It immediately reminded me of The Zen of Python, and of course I wanted to compile a list of GitHub’s version of Zen koans. Therefore I wrote a short Python program to do the job.
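A program along these lines could be sketched as follows. This is an illustration, not the program from the post: it assumes the public `https://api.github.com/zen` endpoint, which returns one saying per request as plain text, and the function names `fetch_zen` and `collect_zen` are made up for this example.

```python
import urllib.request

ZEN_URL = "https://api.github.com/zen"


def fetch_zen(url=ZEN_URL):
    """Fetch one saying from the Zen endpoint as plain text."""
    with urllib.request.urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8").strip()


def collect_zen(fetch=fetch_zen, attempts=20):
    """Call `fetch` repeatedly and return the distinct sayings, sorted."""
    # A set drops the repeats, since the same saying can come back many times.
    return sorted({fetch() for _ in range(attempts)})
```

Calling `collect_zen()` performs real network requests, and unauthenticated GitHub API calls are rate-limited, so keep `attempts` modest. Passing a different `fetch` callable makes it possible to try the collection logic offline.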


Posted on June 4, 2019 in Programlama, python

How to preview fixed width (mono spaced) fonts in an editable Emacs buffer?

When using Emacs, I don’t spend time thinking about fonts most of the time. Like the majority, I pick my favorite fixed width, mono space font and get on with it. Every now and then I hear about some cool new font for reading lots of software source code and technical writing, and I might give it a try, but that’s the end of it.

But sometimes, you just want to have an overview and see everything summed up in a single place, preferably an Emacs buffer so you can also play with it and hack it. Of course, your GNU/Linux, macOS, or MS Windows system will happily show you all the available fonts, and let you filter out fixed width ones suitable for programming. Emacs itself can also do something very similar. But as I said, why not have something according to your taste?

With a bit of Emacs Lisp, it seems not that difficult, at least on GNU/Linux:

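;; See the following for more details
;; and also see the following on a recent GNU/Linux or similar system:
;; /usr/share/doc/fontconfig/fontconfig-user.html
;; for the explanation of spacing=100
;; also see the following UNIX StackExchange answer:
(require 'seq)     ; seq-filter
(require 'subr-x)  ; when-let on older Emacs versions

(defun compare-monospace-font-families ()
  "Display a list of all monospace font faces. Tested on GNU/Linux."
  (interactive)
  (pop-to-buffer "*Monospace Fonts*")
  (erase-buffer)
  ;; Keep only the families whose font info reports spacing=100 (monospace).
  ;; Assumption: the candidate families come from `font-family-list'.
  (dolist (font-name (seq-filter (lambda (font)
                                   (when-let ((info (font-info font)))
                                     (string-match-p "spacing=100" (aref info 1))))
                                 (font-family-list)))
    ;; Assumption: each sample line is added with `insert' and `propertize',
    ;; so that it is displayed in the font family it names.
    (insert
     (propertize
      (concat "1 l; 0 O o [ < = > ] " font-name ")\n")
      'font-lock-face `((:family
                         ,(format "%s" (font-get (font-spec :name font-name) :family))))))))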

The result of running compare-monospace-font-families can be seen in the following screenshot:

Posted on May 9, 2019 in Emacs, General