Author Archives: Emre Sevinç

About Emre Sevinç

AACTAAAGGAACTTT… some stochastic processes so far… I’m a software developer, e-learning designer, cognitive science researcher and former university lecturer. In my 30s, I keep on learning, discovering and studying things about computing, project management, the human mind and hacking.

The Mozilla Common Voice Data Set and Turkish: Isn’t Something Odd Here?

Today, AI (Artificial Intelligence) applications continue to permeate every area of our lives: voice interfaces and smart assistants show up in many places. As in other areas of machine-learning-based AI, access to high-quality, validated, labeled open data sets matters in speech and voice technologies too, both for increasing scientific research and innovation and for energizing start-ups. It is hard to expect small start-ups to compete right away with the Internet and technology giants on this front. That is why projects such as Mozilla Common Voice, which collect data in this field and share it under open licenses, play an important role. Data sets that are high in quality and large in volume are a crucial starting point for speech recognition (Speech to Text), speech synthesis (TTS, Text to Speech), and so on.

Mozilla Common Voice recently announced a new release of its voice data set (July 2020). It caught my attention because Turkish is among the languages for which voice data is collected. As someone who contributed to a similar project years ago, I took a closer look and ran into something that surprised me! Turkish, spoken by almost 80 million people worldwide, is represented in this data set with less data than, never mind English, even a language like Catalan, which has 10 times fewer speakers: the data set contains 24 times more Catalan data than Turkish data, and 86 times more English data!


Generative Deep Learning and Bach, a Good Fit?

If you’re like me, you know that there’s never “enough Bach” in one’s life and that you can always tap into endless musical curiosities based on Bach. Using Artificial Intelligence methods such as deep learning to “train” computers for music composition is one of the fascinating recent trends in this area, and applying these automated, statistical methods to Bach chorales is an active topic of research with interesting results. The book by David Foster, “Generative Deep Learning – Teaching Machines to Paint, Write, Compose, and Play”, has a chapter dedicated to using generative deep learning methods such as MuseGAN for music composition, and explains how such “generative” models can be trained on Bach’s real polyphonic compositions to output new musical pieces in the style of Bach.

Below is an original piece created by a Generative Adversarial Network (GAN), in particular the famous MuseGAN architecture. The MuseGAN network was able to create this after training for only 1000 epochs on a moderate laptop for 2 hours (without using GPUs), based on a data set of 229 Bach chorales. In other words, this is definitely not representative of the best that deep learning can achieve, because such a system can easily be trained for longer on much more powerful hardware (see further examples below). The point of these examples is that you can start experimenting with deep learning systems that begin to model musical aspects without explicit musical teaching, hard-coded rules in software, etc.

You can click on the image below to visit SoundCloud and listen to the MP3 file generated by MuseScore.

Example created by MuseGAN by randomly applying normally distributed noise vectors – Click to listen on SoundCloud

Among the actual Bach chorales in the data set, the “closest” one to the artificially generated example (“close” in the sense of Euclidean distance) is shown in the full entry.
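
To make “closest in the sense of Euclidean distance” concrete, here is a minimal Python sketch. It assumes each piece has already been flattened into a numeric vector of equal length (for example, a sequence of MIDI pitch numbers). The names `closest_piece`, `generated`, and `dataset`, as well as the toy vectors, are hypothetical and are not taken from the book or from MuseGAN.

```python
import math


def closest_piece(generated, dataset):
    """Return (name, distance) of the dataset entry nearest to `generated`.

    `dataset` maps a piece name to a vector with the same length as `generated`.
    """
    best_name, best_distance = None, math.inf
    for name, vector in dataset.items():
        # Euclidean distance: square root of the summed squared differences.
        distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(generated, vector)))
        if distance < best_distance:
            best_name, best_distance = name, distance
    return best_name, best_distance


# Toy data (hypothetical): four "notes" per piece, encoded as MIDI pitch numbers.
dataset = {
    "chorale_a": [60, 62, 64, 65],
    "chorale_b": [67, 65, 64, 62],
    "chorale_c": [60, 60, 67, 67],
}
generated = [61, 62, 64, 66]

name, distance = closest_piece(generated, dataset)
print(name, round(distance, 3))
```

A real comparison would work on much longer vectors (every voice and time step of a chorale), but the search itself stays the same: compute one distance per piece and keep the minimum.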

Posted by Emre Sevinç on December 17, 2019 in Math, Music, Programlama


The Level of High School Mathematics Education in France 220 Years Ago

Whenever new PISA (Programme for International Student Assessment) results are announced, or some journalist writes a piece on the latest state of the French baccalauréat exams, many people take a critical look at educational matters and make comparisons. I think a little example from the dusty pages of the history of mathematics can shed some light on the level of high school education in France back in the 1800s, that is, almost 220 years ago. Who knows, it might even give some inspiration to people who want to check their standards.

The example is about the famous German mathematician Gauss. He wrote a remarkable book in 1798, humbly titled “Disquisitiones Arithmeticae” (“Arithmetical Investigations”). The book was first published in 1801, and only 6 years later, in 1807, it was published in French translation as “Recherches arithmétiques”.

The translator of this important book was Antoine Charles Marcelin Poullet-Delisle, a math teacher at a high school, the Lycée d’Orléans. Another French high school teacher, Louis Poinsot, wrote a long review of the translation in a daily newspaper on Saturday, 21 March 1807. Poinsot taught mathematics at the Lycée Bonaparte in Paris, so both the translator and the reviewer of Gauss’s book were high school teachers.

The archives of the daily newspaper where Poinsot published his review of “Recherches arithmétiques” are available online in the FSU Digital Library (DigiNole), under Napoleonic Collections » Le Moniteur universel.

The review appears on the second page of the newspaper.

Posted by Emre Sevinç on December 11, 2019 in Math, Tarih


How to confuse Google Translate by simply adding a newline?

When you have the most popular and successful computer-based translation service in the world, used by millions of people every day, it’s inevitable that very interesting cases will be discovered. Let’s take the following question:

  • Can simply adding a “newline” character change the translation of a word?

This sounds weird, because for a human being, the obvious reaction would be:

  • What does that even mean? Probably you’ve accidentally hit ENTER or something, and that can’t possibly affect the meaning of a word, why do you even ask that?

Well, if the translation system in question is based on statistical natural language processing and neural network techniques such as deep learning, then things get a little more complex. Let’s first look at a sentence without any superfluous newline inserted:

and now, let’s hit ENTER right after the Dutch word “afzetzone”, to see the translation change magically:

The point here is not whether the word “afzetzone” is translated correctly, but rather why its translation changes when a single extra whitespace character is added after the word.

If you’re a lay person, you’ll probably be baffled by this example. If you’re an NLP expert specializing in deep learning techniques, you’ll probably scratch your head and then smile. And if you’re one of the scientists or engineers actually working on debugging Google Translate, well, then you might have a different reaction. 😉

All in all, keep in mind that in today’s technological landscape there are super complex systems behind simple interfaces, and such “glitches” barely scratch the surface, providing a small, opaque glimpse into a popular Artificial Intelligence product.

Posted by Emre Sevinç on November 8, 2019 in Linguistics, Programlama, Science


Normality Testing: is it normal?

It is largely because of lack of knowledge of what statistics is that the person untrained in it trusts himself with a tool quite as dangerous as any he may pick out from the whole armamentarium of scientific methodology. –Edwin B. Wilson (1927), quoted in Stephen M. Stigler, The Seven Pillars of Statistical Wisdom.

Imagine you’re responsible for testing some aspects of a complex software product, and one of your colleagues comes up with the following request:

  • Hey, can you write a self-contained function that tests the results of software component X and returns TRUE if the data set generated by X is normally distributed, and FALSE otherwise?

What’s a poor software developer to do?

Well, you cherish the fond memories of your first statistics class that you took more than 20 years ago, and say: “I’ll plot a histogram of the data, and see if it’s normal!”

But of course, in less than a second you realize that manual visual inspection of a plot will not make an automated test, not at all! So as a brilliant software developer with a math background, you say, “easy, I’ll just grab my secret weapon, that is, Python and its SciPy library, to smash through this little statistical challenge!” You’re happy that you can stand on the shoulders of giants and use a well-documented, simple function such as scipy.stats.normaltest.
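
A naive first attempt at such a function might look like the sketch below. It assumes that “normally distributed” is taken to mean “`scipy.stats.normaltest` (the D’Agostino-Pearson test) does not reject normality at significance level `alpha`”. The function name `looks_normal`, the 0.05 threshold, and the two synthetic samples are hypothetical choices for illustration and have nothing to do with the real component X.

```python
import numpy as np
from scipy import stats


def looks_normal(data, alpha=0.05):
    """Return True if normaltest does NOT reject normality at level alpha."""
    # Null hypothesis of normaltest: the sample comes from a normal distribution.
    _statistic, p_value = stats.normaltest(data)
    return bool(p_value >= alpha)


n = 1000
# Deterministic, bell-shaped sample: evenly spaced quantiles of N(0, 1).
normal_like = stats.norm.ppf((np.arange(1, n + 1) - 0.5) / n)
# Deterministic, flat sample: evenly spaced points on [0, 1].
uniform_like = np.linspace(0.0, 1.0, n)

print(looks_normal(normal_like))
print(looks_normal(uniform_like))
```

The flat sample is rejected while the bell-shaped one is not. Keep in mind that failing to reject the null hypothesis is not proof of normality, and that the verdict depends on the sample size and on the chosen `alpha`, so a bare TRUE/FALSE answer hides quite a lot.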

Posted by Emre Sevinç on September 11, 2019 in Math, Programlama, python, Science


What was the state of AI in Europe almost 70 years ago?

When it comes to the history of Artificial Intelligence (AI), even a simple Internet search will tell you that the defining event was “The Dartmouth Summer Research Project on Artificial Intelligence”, a summer workshop held in 1956 at Dartmouth College, United States. What is less known is that 5 years before Dartmouth, back in 1951, there was a conference in Europe. The conference in Paris was “Les machines à calculer et la pensée humaine” (Calculating machines and human thinking). It can easily be considered the earliest major conference on Artificial Intelligence. Supported by the Rockefeller Foundation, its participant list included intellectual giants of the field such as Warren Sturgis McCulloch, Norbert Wiener, Maurice Vincent Wilkes, and others.

The organizer of the conference, Louis Couffignal, was also a mathematician and cybernetics pioneer who had already published a book titled “Les machines à calculer. Leurs principes. Leur évolution.” (Calculating machines. Their principles. Their evolution.) in 1933. Another highlight of the conference was El Ajedrecista (The Chess Player), designed by Spanish civil engineer and mathematician Leonardo Torres y Quevedo. There was also a presentation based on practical experiences with the Z4 computer, designed by Konrad Zuse and operated at ETH Zurich. The presenter was none other than Eduard Stiefel, inventor of the conjugate gradient method, among other things.

The field of AI has come a long way since 1951, and it is safe to say it’s going to penetrate into more aspects of our lives and technologies. It’s also safe to say that, like many technological and scientific endeavors, progress in AI is the result of many bright minds in many different countries, with the USA and the UK generally regarded as the places that contributed a lot. But it’s also important to recognize lesser-known facts such as this Paris conference in 1951, and to realize the strong tradition in Europe: not only the academic, research and development track, but also the strong industrial and business tracks. Historical artifacts in languages other than English inevitably receive less recognition, but they should be a reason to cherish diversity and variety. I believe all of these aspects combined should guide Europe in its quest for advancing the state of the art in AI in terms of software, hardware, and combined systems.

This article is heavily based on and inspired by the following article by Herbert Bruderer, a retired lecturer in didactics of computer science at ETH Zürich: “The Birthplace of Artificial Intelligence?”

Posted by Emre Sevinç on July 11, 2019 in Math, Programlama, Science


Zen of GitHub and Python

For some readers this is old news, but I’ve just discovered the Zen of GitHub API. It immediately reminded me of The Zen of Python, and of course I wanted to collect a list of GitHub’s version of Zen koans. Therefore I wrote a short Python program to do the job.
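
The author’s program is not part of this excerpt. As a rough sketch of the idea, the code below assumes the public endpoint `https://api.github.com/zen`, which returns one randomly chosen Zen phrase as plain text per request; collecting the list then means calling it repeatedly and keeping the distinct answers. The function names `fetch_zen` and `collect_koans`, the number of attempts, and the pause between requests are hypothetical choices, and unauthenticated requests are subject to GitHub’s rate limits.

```python
import time
import urllib.request

ZEN_URL = "https://api.github.com/zen"


def fetch_zen(url=ZEN_URL):
    """Fetch one Zen phrase from the GitHub API as a string."""
    request = urllib.request.Request(url, headers={"User-Agent": "zen-koan-collector"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8").strip()


def collect_koans(fetch, attempts=30, pause=0.0):
    """Call `fetch` repeatedly and return the distinct phrases, sorted."""
    koans = set()
    for _ in range(attempts):
        koans.add(fetch())
        time.sleep(pause)  # be polite to the API when `fetch` hits the network
    return sorted(koans)


# Offline demonstration with canned example strings standing in for the network call.
canned = iter(["Keep it logically awesome.", "Design for failure.", "Keep it logically awesome."])
print(collect_koans(lambda: next(canned), attempts=3))

# Against the real API (needs network access):
# print(collect_koans(fetch_zen, attempts=30, pause=1.0))
```

Because the endpoint answers at random, there is no guarantee that a fixed number of attempts sees every phrase; passing the fetch function in as a parameter keeps the collection logic testable without network access.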


Posted by Emre Sevinç on June 4, 2019 in Programlama, python


How to preview fixed width (mono spaced) fonts in an editable Emacs buffer?

When using Emacs, I don’t spend time thinking about fonts most of the time. Like the majority, I pick my favorite fixed width, mono space font and get on with it. Every now and then I hear about some cool new font for reading lots of software source code and technical writing, and I might give it a try, but that’s the end of it.

But sometimes, you just want to have an overview and see everything summed up in a single place, preferably an Emacs buffer so you can also play with it and hack it. Of course, your GNU/Linux, macOS, or MS Windows will happily show you all the available fonts, and let you filter out fixed width ones suitable for programming. Emacs itself can also do something very similar. But as I said, why not have something according to your taste?

With a bit of Emacs Lisp, it seems not that difficult, at least on GNU/Linux:

The result of running compare-monospace-font-families is shown in a screenshot in the full entry.

Posted by Emre Sevinç on May 9, 2019 in Emacs, General


Lost in Google Translate: How Unreasonable Effectiveness of Data can Sometimes Lead Us Astray

I’ve recently received an e-mail in Dutch from the Belgian teacher of my 7.5-year-old son, and even though my Dutch is more than enough to understand what his teacher wrote, I also wanted to check it with Google Translate out of habit and because of my professional/academic background. This led to an interesting discovery and made me think once again about artificial intelligence, deep learning, automatic translation, statistical natural language processing, knowledge representation, commonsense reasoning and linguistics.

But first things first, let’s see how Google Translate translated a very ordinary Dutch sentence into English:

Interesting! It is obvious that my son’s teacher didn’t have anything to do with a grinding table (!), and even if he did, I don’t think he’d involve his class with such interesting hobbies. 🙂 Of course, he meant the “multiplication table for 3”.

Then I wanted to see what the giant search engine, Google Search itself, knows about the Dutch word “maaltafel”. I immediately saw that Google Search knows very well that “maaltafel” in Dutch means “Multiplication table” in English. Not only that, but on the first page of search results you can see the expected Dutch expression occurring 47 times. Nothing surprising here.


Posted by Emre Sevinç on February 8, 2019 in CogSci, Linguistics, philosophy, Science


Two Laws for Systems

The first is known as Gall’s law for systems design:

“A simple system may or may not work. A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.” — John Gall

This law is essentially an argument in favour of underspecification: it can be used to explain the success of systems like the World Wide Web and Blogosphere, which grew from simple to complex systems incrementally, and the failure of systems like CORBA, which began with complex specifications. Gall’s Law has strong affinities to the practice of agile software development.

Posted by Emre Sevinç on November 22, 2018 in Management, philosophy, Programlama