
A new data structure in town: Maple Tree


Thanks to a recent post on lwn.net, I learned about a new data structure: Maple Tree. Apparently, it’s been in development for the last 1.5 years: “The Maple Tree is a new data structure for Linux that provides an efficient way to store index ranges which map to a single pointer. It is RCU-safe and optimised for modern CPUs. For this application, it outperforms both the existing rbtree and radix tree data structures. The API is inspired by the XArray, and is significantly easier to use than the rbtree. This talk will cover the details of the implementation and show examples of users.”
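
To get a feel for what "index ranges which map to a single pointer" means in practice, here is a minimal Python sketch of such a mapping. It is only an illustration of the idea: it is not the kernel's implementation, it is not RCU-safe, and the class and method names (`RangeMap`, `store_range`, `load`) are hypothetical names chosen for this example. It also assumes ranges never overlap and simply rejects any that do.

```python
import bisect


class RangeMap:
    """Toy map from non-overlapping inclusive index ranges to a single value."""

    def __init__(self):
        self._starts = []   # sorted start index of every stored range
        self._entries = []  # (first, last, value), kept parallel to _starts

    def store_range(self, first, last, value):
        if first > last:
            raise ValueError("first must not exceed last")
        pos = bisect.bisect_left(self._starts, first)
        # Reject overlap with a neighbouring range instead of splitting it.
        if pos > 0 and self._entries[pos - 1][1] >= first:
            raise ValueError("range overlaps an existing entry")
        if pos < len(self._starts) and self._starts[pos] <= last:
            raise ValueError("range overlaps an existing entry")
        self._starts.insert(pos, first)
        self._entries.insert(pos, (first, last, value))

    def load(self, index):
        # Find the last range starting at or before index, then check its end.
        pos = bisect.bisect_right(self._starts, index) - 1
        if pos >= 0:
            first, last, value = self._entries[pos]
            if index <= last:
                return value
        return None


ranges = RangeMap()
ranges.store_range(0x1000, 0x1FFF, "mapping A")
ranges.store_range(0x3000, 0x4FFF, "mapping B")
print(ranges.load(0x1800))  # inside the first range
print(ranges.load(0x2000))  # in the gap between the two ranges
```

Every index inside a stored range resolves to the same value, and an index in a gap resolves to nothing, which is the behaviour the quoted description refers to.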

This is what I could find about the up-and-coming “Maple Tree” data structure for enhancing Linux performance:

The Linux Maple Tree – Matthew Wilcox, Oracle
Read the rest of this entry »
 
Posted on February 15, 2021 in Linux, Programlama

Unix and Women


I’ve recently come across the names of two women who were active during the birth and early days of Unix, back in the 1970s and 1980s. For future reference, I wanted to note down information about these pioneering women.

“For many people, writing is painful and editing one’s own prose is difficult, tedious, and error-prone. It is often hard to see which parts of a document are difficult to read or how to transform a wordy sentence into a more concise one. It is even harder to discover that one overuses a particular linguistic construct. The system of programs described here helps writers to evaluate documents and to produce better written and more readable prose. The system consists of programs to measure surface features of text that are important to good writing style as well as programs to do some of the tedious jobs of a copy editor. Some of the surface features measured are readability, sentence and word length, sentence type, word usage, and sentence openers. The copy editing programs find spelling errors, wordy phrases, bad diction, some punctuation errors, double words, and split infinitives.”

“Computer aids for writers”, Lorinda Cherry, ACM SIGPLAN Notices, April 1981

Lorinda Cherry and Nina McDonald worked on Writer’s Workbench, among other things, in the 1970s at Bell Labs. I wish the utilities that made up Writer’s Workbench were still available and actively developed as free and open source software, maybe via GitHub (all I could find was this discussion on Hacker News).

According to M. Douglas McIlroy, Lorinda Cherry also contributed to another operating system: Plan 9.

Curious readers of the history of computing can learn more about these women in the following online resources:

Read the rest of this entry »
 
Posted on February 2, 2021 in Programlama, Tarih

Truth, correctness and utility: an example from Information Theory


I’ve come across the following when doing research on the “data processing inequality”:

From page 19 of “Elements of Information Theory”, Second Edition, 2006, Thomas M. Cover and Joy A. Thomas

As Scholarpedia’s “Mutual information” article also states, “Kullback-Leibler divergence is not a true distance: it is not symmetric, and it does not obey the triangle inequality (Cover and Thomas, 1991). It is not hard to show that D_KL(P(z)||Q(z)) is non-negative, and zero if and only if P(z)=Q(z).”

I found this a striking example of an expression that is not true, and is mathematically wrong, while the concept remains “useful”, as Cover and Thomas put it, as long as you are experienced and well aware of what you’re doing.
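
The two failures mentioned in the quote, asymmetry and the triangle inequality, are easy to check numerically. The following is a small Python sketch; the three two-outcome distributions `P`, `Q` and `R` are arbitrary values picked for this illustration, not taken from the book or from Scholarpedia.

```python
import math


def kl_divergence(p, q):
    """D_KL(P||Q) in nats for two discrete distributions over the same outcomes."""
    # Terms with p_i == 0 contribute 0 by convention.
    # q_i must be positive wherever p_i is positive.
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Three illustrative distributions over two outcomes (assumed values).
P = [0.5, 0.5]
Q = [0.25, 0.75]
R = [0.1, 0.9]

# Not symmetric: the two directions give different numbers.
print("D(P||Q) =", kl_divergence(P, Q))
print("D(Q||P) =", kl_divergence(Q, P))

# No triangle inequality: the direct "path" is longer than the detour via Q.
print("D(P||R)           =", kl_divergence(P, R))
print("D(P||Q) + D(Q||R) =", kl_divergence(P, Q) + kl_divergence(Q, R))

# Still non-negative, and zero when both arguments are the same distribution.
print("D(P||P) =", kl_divergence(P, P))
```

With these values the direct divergence from `P` to `R` is larger than the sum of the two intermediate divergences, which a true distance would never allow.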


 
Posted on February 2, 2021 in Books, Math

Diacritics restoration: can we do better using neural networks and deep learning? Perspectives from a 10-year-old open source project


People who need to write correctly in languages that have letters with various diacritics, such as ‘ğ’, ‘ş’, ‘ö’, ‘ı’, etc., can struggle with US or UK standard QWERTY keyboards because those keyboard layouts lack such letters. If you also need to switch between languages such as English and Turkish, you know what I mean.

Possible forms of diacritic restoration in Turkish for “aci”. Source: “Diacritic Restoration Using Recurrent Neural Network” by Ayşenur Genç Uzun

The process of taking a piece of writing without correct spelling (one that uses only standard ASCII characters, without proper diacritics) and replacing the relevant letters with the correct ones is known as “diacritics restoration” or “diacritics reconstruction” (or, colloquially, “deASCIIfication”). About 10 years ago, I wrote a Python program to help people with this: Turkish Deasciifier, a port of the Emacs Lisp code developed by Prof. Deniz Yüret. There’s also a web interface at http://turkceyap.appspot.com.
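
To see why the problem is not trivial, it helps to enumerate the candidate spellings of a single ASCII word. The sketch below does this for “aci”, the word used in the figure above. It only generates candidates; it does not choose among them, and it is not how Turkish Deasciifier works. The `CANDIDATES` table and the function name are assumptions made for this example, and only lowercase letters are handled.

```python
from itertools import product

# ASCII letter -> the Turkish letters it may stand for (lowercase only).
CANDIDATES = {
    "c": "cç",
    "g": "gğ",
    "i": "iı",
    "o": "oö",
    "s": "sş",
    "u": "uü",
}


def candidate_forms(ascii_word):
    """Return every spelling obtainable by optionally restoring diacritics."""
    # Letters without a diacritic variant map only to themselves.
    options = [CANDIDATES.get(ch, ch) for ch in ascii_word]
    return ["".join(combo) for combo in product(*options)]


forms = candidate_forms("aci")
print(forms)
```

The number of candidates doubles with every ambiguous letter, so a restoration system needs context (rules, statistics, or a learned model) to pick the intended word.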

Read the rest of this entry »
 
Posted on October 22, 2020 in Linguistics, Programlama, python, Science

What is Engineering? Perspectives from “The Sciences of the Artificial”


If you are an engineer, or an engineering manager responsible for designing software-intensive complex systems, you will find a lot of food for thought in the following quotes from “The Sciences of the Artificial” by Nobel laureate and Turing Award recipient Herbert A. Simon. You might notice that the term ‘software’ never appears in the following quotations, and the word ‘program’ is mentioned only twice. Yet the issues, concerns, methods, and line of reasoning proposed by Simon can be used to attack the core challenges facing software engineers working on different systems in diverse domains. I believe these quotes, as well as most of the rest of the book, deserve a deep and critical reading by generations of engineers.

“There is nothing special that needs to be said here about resource conservation—cost minimization, for example, as a design criterion. Cost minimization has always been an implicit consideration in the design of engineering structures, but until a few years ago it generally was only implicit, rather than explicit. More and more cost calculations have been brought explicitly into the design procedure, and a strong case can be made today for training design engineers in that body of technique and theory that economists know as “cost-benefit analysis.””

Read the rest of this entry »
 
Posted on October 6, 2020 in business, Management, Programlama, Science

The Mozilla Common Voice Data Set and Turkish: Isn’t Something Odd Here?


Today, AI (Artificial Intelligence) applications continue to permeate every area of our lives: voice interfaces and smart assistants show up in many places. As in other areas of machine-learning-based AI, access to high-quality, validated, labeled open data sets is important in speech and voice technologies too, both to increase scientific work and innovation and to energize start-ups. It is hard to expect small start-ups to compete right away with the Internet and technology giants on this front. That is why projects such as Mozilla Common Voice, which collect data in this field and share it under open licenses, play an important role. High-quality data sets containing large amounts of data are a very important starting point for speech recognition (Speech to Text), converting text to speech (TTS – Text to Speech), and so on.

Mozilla recently announced the new release of the Common Voice speech data set (July 2020). It caught my attention because Turkish is among the languages for which voice data is collected. As someone who contributed to a similar project years ago, I looked at it in more detail and ran into something that surprised me! Turkish, spoken by almost 80 million people worldwide, is represented in this data set with less data than, never mind English, even a language like Catalan, which has 10 times fewer speakers: the data set contains 24 times more Catalan data than Turkish data, and 86 times more English data!

Read the rest of this entry »

 
 


Generative Deep Learning and Bach, a Good Fit?


If you’re like me, you know that there’s never “enough Bach” in one’s life and you can always tap into infinite musical curiosities based on Bach. Using Artificial Intelligence methods such as deep learning to “train” computers for music composition is one of the fascinating recent trends in this area, and applying these automated, statistical methods to Bach chorales is an active topic of research with interesting results. The book by David Foster, “Generative Deep Learning – Teaching Machines to Paint, Write, Compose, and Play”, has a chapter dedicated to using generative deep learning methods such as MuseGAN for music composition, and explains how such “generative” models can be trained on Bach’s real polyphonic compositions to output new musical pieces in the style of Bach.

Below is an original piece created by a Generative Adversarial Network (GAN), specifically the well-known MuseGAN architecture. The MuseGAN network was able to create this after training for only 1000 epochs, which took 2 hours on a modest laptop without a GPU, based on the data set at https://github.com/czhuang/JSB-Chorales-dataset (a set of 229 Bach chorales). In other words, this is definitely not representative of the best that deep learning can achieve, because such a system can easily be trained for longer on much more powerful hardware (see further examples below). The point of these examples is that you can start experimenting with deep learning systems that model musical aspects without explicit musical teaching, hard-coded rules in software, etc.

You can click on the image below to visit SoundCloud and listen to the MP3 file generated by MuseScore.

Example created by MuseGAN by randomly applying normally distributed noise vectors – Click to listen on SoundCloud

Among the actual Bach chorales in the data set, the “closest” one to the artificially generated example (“close” in the sense of Euclidean distance) can be seen below. Read the rest of this entry »

 
Posted on December 17, 2019 in Math, Music, Programlama

The Level of High School Mathematics Education in France 220 Years Ago


Whenever new PISA (Programme for International Student Assessment) results are announced, or some journalist writes a piece on the latest state of the French baccalauréat exams, many people take a critical look at educational matters and make comparisons. I think a little example from the dusty pages of the history of mathematics can shed some light on the level of high school education in France back in the early 1800s, that is, almost 220 years ago. Who knows, it might even inspire people who want to check their standards.

The example is about the famous German mathematician Gauss: he wrote a remarkable book in 1798, humbly titled “Disquisitiones Arithmeticae” (“Arithmetical Investigations”). The book was first published in 1801, and only 6 years later, in 1807, it was published in French translation as “Recherches arithmétiques”.

The translator of this important book was Antoine Charles Marcelin Poullet-Delisle, a math teacher at a high school: Lycée d’Orléans. Another French high school teacher, Louis Poinsot, wrote a long review of the translation in a daily newspaper on Saturday, 21 March 1807. Like the French translator of Gauss’s book, Poinsot was a high school mathematics teacher; he taught at Lycée Bonaparte in Paris.

The archives of the daily newspaper where Poinsot published his review of “Recherches arithmétiques” are available online at DigiNole Home » FSU Digital Library » Napoleonic Collections » Le Moniteur universel » Moniteur universel

And you can read the review on the second page of the newspaper: Read the rest of this entry »

 
Posted on December 11, 2019 in Math, Tarih

How to confuse Google Translate by simply adding a newline?


When you have the most popular and successful computer-based translation service in the world, used by millions of people every day, it’s inevitable that very interesting cases will be discovered. Let’s take the following question:

  • Can simply adding a “newline” character change the translation of a word?

This sounds weird, because for a human being, the obvious reaction would be:

  • What does that even mean? Probably you’ve accidentally hit ENTER or something, and that can’t possibly affect the meaning of a word, why do you even ask that?

Well, if the translation system in question is based on statistical natural language processing and neural network techniques such as deep learning, then things get a little more complex. Let’s first look at a sentence without any superfluous newline inserted:

and now, let’s hit ENTER right after the Dutch word “afzetzone”, to see the translation change magically:

The point here is not whether the word “afzetzone” is translated correctly, but rather why its translation changes when you simply add one more “white space” character after the word.

If you’re a lay person, you’ll probably be baffled by this example; if you’re an NLP expert specializing in deep learning techniques, you’ll probably scratch your head and then smile; and if you’re one of the scientists or engineers actually debugging the Google Translate software, well, then you might have a different reaction. 😉

All in all, keep in mind that in today’s technological landscape there are super complex systems behind simple interfaces, and such “glitches” barely scratch the surface, providing only a small, opaque glimpse into a popular Artificial Intelligence product.

 
Posted on November 8, 2019 in Linguistics, Programlama, Science

Normality Testing: is it normal?


It is largely because of lack of knowledge of what statistics is that the person untrained in it trusts himself with a tool quite as dangerous as any he may pick out from the whole armamentarium of scientific methodology. –Edwin B. Wilson (1927), quoted in Stephen M. Stigler, The Seven Pillars of Statistical Wisdom.

Imagine you’re responsible for testing some aspects of a complex software product, and one of your colleagues comes up with the following request:

  • Hey, can you write a self-contained function that tests the results of software component X and returns TRUE if the data set generated by X is normally distributed, and FALSE otherwise?

What’s a poor software developer to do?

Well, you cherish the fond memories of the first statistics class you took more than 20 years ago, and say: “I’ll plot a histogram of the data, and see if it’s normal!”

But of course, in less than a second you realize that manual visual inspection of a plot will not make an automated test, not at all! So, as a brilliant software developer with a math background, you say, “Easy, I’ll just grab my secret weapon, that is, Python and its SciPy library, to smash through this little statistical challenge!” You’re happy that you can stand on the shoulders of giants and use a well-documented, simple function such as scipy.stats.normaltest.
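
A first attempt at the requested function might look like the sketch below. It assumes SciPy is installed; the function name `looks_normal`, the 0.05 threshold, and the two synthetic samples are choices made for this illustration, not part of the original request.

```python
import random

from scipy import stats


def looks_normal(samples, alpha=0.05):
    """Return True if normaltest does NOT reject normality at level alpha."""
    # normaltest combines skewness and kurtosis into one statistic; its null
    # hypothesis is that the sample comes from a normal distribution.
    _statistic, p_value = stats.normaltest(samples)
    return bool(p_value >= alpha)


rng = random.Random(42)  # fixed seed so the run is repeatable
gaussian = [rng.gauss(0.0, 1.0) for _ in range(2000)]
skewed = [rng.expovariate(1.0) for _ in range(2000)]

print("gaussian sample:   ", looks_normal(gaussian))
print("exponential sample:", looks_normal(skewed))
```

Note what the return value does and does not say: TRUE only means the test failed to reject normality, and with a 0.05 threshold roughly one in twenty genuinely normal data sets will still come back FALSE.
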
Read the rest of this entry »

 
Posted on September 11, 2019 in Math, Programlama, python, Science