For the Love of Books: Turkey versus Belgium


The 80th edition of the Antwerp Book Fair (Boekenbeurs) finished recently. Shortly afterwards, another book fair, the 35th Istanbul Book Fair, took place. I was curious to compare the number of visitors and, especially, the percentage of each city's population that visited its book fair.

Istanbul is home to about 15 million people. Of these 15 million people, 558 thousand visited the fair in 9 days. The population of Antwerp, on the other hand, is about 500 thousand, and on average 170 thousand people visited its 10-day fair in the years 2010 to 2012. In terms of percentages, the picture looks like this:

[Chart: percentage of each city's population that visited its book fair, Istanbul vs. Antwerp]

In other words, 3.7% of Istanbul's population visited its book fair, whereas 34% of Antwerp's population visited theirs. If 34% of Istanbul's population visited the book fair, that would make about 5.1 million people in 9 days.
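The percentages above come from simple division. The short sketch below reproduces them from the rounded figures quoted in the post (15 million and 500 thousand inhabitants; 558 thousand and 170 thousand visitors); the constant and function names are only for illustration.

```python
# Rounded figures as quoted in the post.
ISTANBUL_POPULATION = 15_000_000
ISTANBUL_VISITORS = 558_000    # 35th Istanbul Book Fair, 9 days
ANTWERP_POPULATION = 500_000
ANTWERP_VISITORS = 170_000     # average per edition, 2010-2012, 10 days


def visitor_share(visitors: int, population: int) -> float:
    """Return the visitors as a percentage of the city's population."""
    return 100.0 * visitors / population


istanbul_pct = visitor_share(ISTANBUL_VISITORS, ISTANBUL_POPULATION)
antwerp_pct = visitor_share(ANTWERP_VISITORS, ANTWERP_POPULATION)

# Visitors Istanbul would need to match Antwerp's share of its population.
hypothetical_istanbul = ISTANBUL_POPULATION * antwerp_pct / 100.0

if __name__ == "__main__":
    print(f"Istanbul: {istanbul_pct:.1f}%")
    print(f"Antwerp:  {antwerp_pct:.1f}%")
    print(f"Istanbul at Antwerp's rate: {hypothetical_istanbul / 1e6:.1f} million")
```

Note that this compares visitors per edition, not per day; Istanbul's fair ran 9 days and Antwerp's 10.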

This picture tells us something about literacy and interest in books, but of course we should always ask: how representative is such a crude statistic? Another interesting fact is that Istanbul makes up about 20% of Turkey's population, whereas Antwerp makes up about 4.5% of Belgium's population. (If we want to compare Turkey with a country of similar population and its most populous city: Germany, with its 80 million people, is close to Turkey, and its most populous city, Berlin, is home to 3.4 million people. In other words, Germany's most populous city makes up only 4.25% of Germany's population.)

 

Posted on November 20, 2016 in Books

 


Faster, RegEx! Match! Match! (Which Regular Expression Utility is the Fastest?)


When it comes to dealing with text data, regular expressions are the bread and butter of data processing, as well as of programming, most of the time. Hardly a day or two passes without using grep or a similar tool. Until recently, I thought that regular expressions and related tools were very useful but boring, and that the field didn't present any innovations. It turns out that I was wrong!

There are two relatively new players in town: ICgrep and ripgrep.

ICgrep uses a new, parallel bitstream technology developed by Dr. Robert D. Cameron at Simon Fraser University. It claims to be super fast for many text search and processing tasks. ICgrep is available for download from http://www.icgrep.com/downloads.htm as a binary executable for OS X / macOS. Its source code is also available if you want to build it for your operating system.

ripgrep is developed mainly by Andrew Gallant and other open source contributors, and its source code is available at https://github.com/BurntSushi/ripgrep. It is written in the Rust programming language and claims to be very fast, Unicode-ready, and smart, ready to replace the Silver Searcher (ag) and “ack”.

Let’s see how they compare to the venerable regular expression utilities that we all know and love.
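If you want to run a similar comparison yourself, a minimal timing harness might look like the sketch below. It is not the setup used for the results in the full post; the pattern and the corpus file name are hypothetical placeholders, and only `grep -c -E` and `rg -c` (count matching lines) are used because their options are well known. ICgrep's command-line options are not shown here.

```python
import os
import shutil
import statistics
import subprocess
import time
from typing import Sequence


def time_command(cmd: Sequence[str], runs: int = 5) -> float:
    """Run `cmd` `runs` times and return the median wall-clock time in seconds.

    Output is discarded. A non-zero exit status is not treated as an error,
    because grep exits with status 1 when nothing matches.
    """
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


if __name__ == "__main__":
    # Hypothetical pattern and input file; substitute your own corpus.
    pattern, corpus = r"[A-Z][a-z]+ing", "corpus.txt"
    if not os.path.isfile(corpus):
        raise SystemExit(f"{corpus} not found; point `corpus` at a large text file")
    candidates = {
        "grep": ["grep", "-c", "-E", pattern, corpus],
        "ripgrep": ["rg", "-c", pattern, corpus],
    }
    for name, cmd in candidates.items():
        if shutil.which(cmd[0]) is None:
            print(f"{name}: not installed, skipped")
            continue
        print(f"{name}: {time_command(cmd):.3f} s (median of 5 runs)")
```

Taking the median of several runs reduces the influence of a cold file-system cache on the first run; for a fair comparison, all tools should also search the same file with an equivalent pattern.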

 

Posted on November 3, 2016 in Linux, Programlama, sysadmin

 


How to decrease the Maven build time of your Java projects


There are good resources on the web that show how you can decrease the Maven build times of Java projects, but since I couldn’t find the following information in most of them, I wanted to note it down for future reference. One of the simplest things you can do to decrease the Maven build time is to add the following to your command line:

-Dmaven.javadoc.skip=true

But is it worth it? Let’s check. Take, for example, a project such as Hadoop, which has about 2 million lines of source code. Without skipping the generation of Javadoc, …
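To measure the difference on your own project, you could time the same build with and without the flag. The sketch below is one hypothetical way to do that; the checkout directory `hadoop`, the `clean install` goals, and the extra `-DskipTests` option (which keeps test execution out of the comparison) are assumptions, not the exact commands behind the numbers in the full post.

```python
import subprocess
import time
from typing import List, Sequence


def maven_command(goals: Sequence[str], skip_javadoc: bool,
                  extra: Sequence[str] = ()) -> List[str]:
    """Build an `mvn` command line, optionally skipping Javadoc generation."""
    cmd = ["mvn", *goals]
    if skip_javadoc:
        # User property read by the maven-javadoc-plugin's `skip` parameter.
        cmd.append("-Dmaven.javadoc.skip=true")
    cmd.extend(extra)
    return cmd


def timed_build(cmd: Sequence[str], project_dir: str) -> float:
    """Run one build in `project_dir` and return its wall-clock time in seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, cwd=project_dir, check=True)
    return time.perf_counter() - start


if __name__ == "__main__":
    project = "hadoop"            # hypothetical checkout location
    goals = ["clean", "install"]
    extra = ["-DskipTests"]       # keep test time out of the comparison
    for skip in (False, True):
        cmd = maven_command(goals, skip, extra)
        print(" ".join(cmd))
        print(f"  took {timed_build(cmd, project):.0f} s")
```

Run one build beforehand so that dependency downloads into the local repository do not inflate the first measurement.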

 

Posted on May 23, 2016 in java, Programlama

 


Turkish Mode for Emacs is now available as a package via MELPA


Turkish Mode for Emacs, developed by Deniz Yüret, is now available as a package via MELPA. This is for people trying to type Turkish documents on a U.S. keyboard using Emacs. The program provides a `turkish-mode` in which the correct Turkish accents are added to the ASCII version of the last word typed each time the user hits space. If you are using a recent stable version of Emacs that lets you use the Emacs package manager, and you’ve added MELPA as a repository, installing it is as easy as running:

M-x package-install RET turkish RET

and then putting the following line in your init file:

(require 'turkish)

Once you have done that, you can toggle Turkish mode in any Emacs session with:

M-x turkish-mode

The same program has been ported to many different languages and is available on many platforms: as a Python package, a Java package, a Perl CPAN package, an Ubuntu PPA package, a web application, a Chrome plug-in, a Firefox add-on, and a Safari add-on.
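To make the problem turkish-mode solves more concrete: each ASCII letter c, g, i, o, s, and u typed on a U.S. keyboard may stand for ç, ğ, ı, ö, ş, or ü. The sketch below is a deliberately simple stand-in based on dictionary lookup, not how turkish-mode works internally; it enumerates the possible accented spellings of a lowercase word and returns the first one found in a tiny, hypothetical word list.

```python
from itertools import product

# ASCII letters that may stand for a Turkish letter when typed on a U.S. keyboard.
ACCENT_OPTIONS = {
    "c": "cç", "g": "gğ", "i": "iı", "o": "oö", "s": "sş", "u": "uü",
}

# Hypothetical tiny lexicon; a usable tool needs far more linguistic knowledge.
LEXICON = {"çiçek", "güzel", "ağaç", "şöyle", "ışık", "kitap", "okul"}


def candidates(word: str):
    """Yield every spelling obtained by optionally accenting each ambiguous letter."""
    pools = [ACCENT_OPTIONS.get(ch, ch) for ch in word]
    for combo in product(*pools):
        yield "".join(combo)


def deasciify_word(word: str, lexicon=LEXICON) -> str:
    """Return the first candidate found in `lexicon`, or `word` unchanged.

    Assumes lowercase input; capitalization is not handled.
    """
    for cand in candidates(word):
        if cand in lexicon:
            return cand
    return word


if __name__ == "__main__":
    for typed in ["cicek", "guzel", "agac", "isik", "kitap"]:
        print(typed, "->", deasciify_word(typed))
```

The number of candidates doubles with every ambiguous letter, and many ASCII words have more than one valid accented form, which is why a lookup table alone is not enough for real text.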

 

Posted on March 29, 2016 in Emacs, Programlama

 


How to comment your code: an example from Hadoop


How should you comment your source code? This topic comes up every once in a while, and it sometimes leads to heated discussions. The consensus is something like “comment why, not how”. Useful as that advice is, I think it is important to illustrate it with real-world examples. So, let’s look at such a case.

Recently I’ve been working on the integration between Hadoop and the HGST Active Archive S3 Object Storage product, and while dealing with the internals of the S3A File System, which we are improving at the company, as well as its interaction with YARN, I came across an interesting piece of code in the Hadoop code base. Before going into its details, look at it without any comments:


 

Posted on March 11, 2016 in java, Programlama

 


Old Computers: A Trip Down Memory Lane and the History of Computing


A few weeks ago I went to the computer science building of KU Leuven for a Haskell meetup. I was surprised to see a lot of very old computers beautifully displayed in an exhibition. It felt like time travel through the history of computing. I captured a few of them with my smartphone’s camera, trying to imagine what the pioneers of computing would have thought if they had seen this smartphone in action (full-resolution photos of these and many others are available in my Flickr album).

Some of the computers were happily churning and crunching data long before I was born, such as this one:

[Photo: one of the old computers on display]


 

Posted on December 24, 2015 in Programlama

 


Is there a high quality and free Text to Speech system for Dutch that runs on GNU/Linux?


Dear Text to Speech and open source experts:

For a toy / hobby project (non-commercial), I’m trying to find a suitable Text to Speech system for Dutch that I can run on GNU/Linux. So far, the situation does not look very promising. I’ve tried eSpeak, but its Dutch output is not as good as I expected. I ran my experiment using a file “computer.txt” with the following contents:

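Een computer is een apparaat waarmee gegevens volgens formele procedures zoals algoritmen kunnen worden verwerkt. Meestal wordt met het woord computer een elektronisch, digitaal apparaat bedoeld, maar er bestaan ook mechanische en analoge computers.

(In English: A computer is a device with which data can be processed according to formal procedures such as algorithms. Usually the word computer refers to an electronic, digital device, but mechanical and analog computers also exist.)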

$ espeak -vnl+7 -s 170 -f computer.txt


 

Posted on December 3, 2015 in Linguistics, Linux

 
