Tag Archives: python

Normality Testing: is it normal?

It is largely because of lack of knowledge of what statistics is that the person untrained in it trusts himself with a tool quite as dangerous as any he may pick out from the whole armamentarium of scientific methodology. –Edwin B. Wilson (1927), quoted in Stephen M. Stigler, The Seven Pillars of Statistical Wisdom.

Imagine you’re responsible for testing some aspects of a complex software product, and one of your colleagues comes up with the following request:

  • Hey, can you write a self-contained function that tests the results of software component X and returns TRUE if the data set generated by X is normally distributed, and FALSE otherwise?

What’s a poor software developer to do?

Well, you cherish the fond memories of your first statistics class that you took more than 20 years ago, and say: “I’ll plot a histogram of the data, and see if it’s normal!”

But of course, in less than a second you realize that manual visual inspection of a plot does not make an automated test, not at all! So, as a brilliant software developer with a math background, you say, “Easy, I’ll just grab my secret weapon, Python and its SciPy library, to smash through this little statistical challenge!” You’re happy that you can stand on the shoulders of giants and use a well-documented, simple function such as scipy.stats.normaltest.
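To make the colleague’s request concrete, here is a minimal sketch of such a TRUE/FALSE function. It is not the code from the full post, and it deliberately does not call SciPy: as a standard-library stand-in it uses the Jarque-Bera test (a different test from the one behind scipy.stats.normaltest, though both turn sample skewness and kurtosis into a p-value). The names jarque_bera_pvalue and looks_normal, and the 0.05 threshold, are assumptions made for illustration.

```python
import math


def jarque_bera_pvalue(data):
    """Return the asymptotic p-value of the Jarque-Bera normality test."""
    n = len(data)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(data) / n
    # Central moments, using the simple divide-by-n estimators.
    m2 = sum((x - mean) ** 2 for x in data) / n
    if m2 == 0:
        raise ValueError("data has zero variance")
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)
    # Under normality, JB is approximately chi-square distributed with
    # 2 degrees of freedom, whose survival function is exp(-x / 2).
    return math.exp(-jb / 2)


def looks_normal(data, alpha=0.05):
    """True if the test does NOT reject normality at level alpha (assumed 0.05)."""
    return jarque_bera_pvalue(data) >= alpha
```

Note what the boolean really means: TRUE only says the test found no evidence against normality at the chosen level, not that the data is normally distributed, and the chi-square approximation is only asymptotic.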

Posted on September 11, 2019 in Math, Programlama, python, Science


Zen of GitHub and Python

For some readers it’s old news, but I’ve just discovered the Zen of GitHub API. It immediately reminded me of The Zen of Python, and of course I wanted to get a list of GitHub’s version of Zen koans. Therefore I wrote a short Python program to do the job.
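The original program is not reproduced in this excerpt. As a rough sketch of the idea, assuming that the endpoint https://api.github.com/zen returns one koan as plain text per request, the following collects the distinct koans seen over a number of requests. The function names, the attempts count, and the timeout are illustrative choices, not taken from the post.

```python
import urllib.request

# Assumption: this endpoint answers with a single koan as plain text.
ZEN_URL = "https://api.github.com/zen"


def fetch_koan(url=ZEN_URL, timeout=10):
    """Fetch one koan from the (assumed) GitHub Zen endpoint."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8").strip()


def collect_koans(fetch=fetch_koan, attempts=20):
    """Call fetch() repeatedly and return the distinct koans, in first-seen order."""
    seen = []
    for _ in range(attempts):
        koan = fetch()
        if koan not in seen:
            seen.append(koan)
    return seen
```

Calling collect_koans() with the defaults performs real network requests, and unauthenticated API calls are rate limited, so keep attempts small. Passing a custom fetch callable makes the function easy to try out offline.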


Posted on June 4, 2019 in Programlama, python


GODISNOWHERE: A look at a famous question using Python, Google and natural language processing

Are there any commonalities among human intelligence, Bayesian probability models, corpus linguistics, and religion? This blog entry presents a piece of light reading for people interested in a combination of those topics.
You have probably heard the famous question:

       “What do you see below?”


The stream of letters can be broken down into English words in two different ways, either as “God is nowhere” or as “God is now here.” You can find an endless set of variations on this theme on the Internet, but I will deal with this example in the context of computational linguistics and big data processing.


When I first read the beautiful book chapter titled “Natural Language Corpus Data”, written by Peter Norvig in the book “Beautiful Data”, I decided to run an experiment using Norvig’s code. In that chapter, Norvig showed a very concise Python program that ‘learned’ how to break down a stream of letters into English words; in other words, a program capable of ‘word segmentation’.

Norvig’s code, coupled with Google’s language corpus, is powerful and impressive; it is able to take a character string such as


and return a correct segmentation:

‘when’, ‘in’, ‘the’, ‘course’, ‘of’, ‘human’, ‘events’, ‘it’, ‘becomes’, ‘necessary’

But how would it deal with “GODISNOWHERE”? Let’s try it out in a GNU/Linux environment.
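The experiment itself is in the full post. To illustrate what probabilistic word segmentation involves, here is a toy sketch in the spirit of that approach, not Norvig’s actual code: it tries every split point, scores each candidate by the product of unigram word probabilities, and memoizes the recursion. The COUNTS table is entirely made up for illustration (the real experiment relies on Google’s corpus), and the penalty for unknown words is likewise an assumption.

```python
import math
from functools import lru_cache

# Hypothetical unigram counts, invented for this sketch only.
COUNTS = {"god": 50, "is": 500, "now": 200, "here": 150, "nowhere": 20}
TOTAL = sum(COUNTS.values())


def word_logprob(word):
    """Log-probability of a word; unknown words are penalized more the longer they are."""
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(10 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def segment(text):
    """Return the most probable segmentation of text as a tuple of words."""
    if not text:
        return ()
    # Try every possible first word, and segment the remainder recursively.
    candidates = (
        (text[:i],) + segment(text[i:]) for i in range(1, len(text) + 1)
    )
    return max(candidates, key=lambda words: sum(word_logprob(w) for w in words))


print(segment("godisnowhere"))
```

Whether “now here” or “nowhere” wins depends entirely on the counts, which is exactly why the choice of corpus matters for this question.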


Posted on March 1, 2014 in Linguistics, Programlama, python


Scala versus Python and R: software archaeology in bioinformatics

When one of the scala-user members recently mentioned a bioinformatics package called GATK (Genome Analysis Toolkit) and its use of Scala, I decided to take a closer look. Thanks to the valuable Ohloh service, amateur software archaeology has never been easier! After a brief visit to its Ohloh page, I learned that the GATK software has had 12,871 commits made by 77 contributors within the last 5 years, representing 99,078 lines of code.

I wanted to learn more about its source code languages, and decided to drill down. What I discovered was surprising. Let me share the facts I’ve found so far: the project did not have any Scala code until recently. For example, in July 2009 it had no Scala, whereas it contained 4410 lines of Python and 56 lines of R code.



Posted on February 16, 2014 in Programlama


GNU/Linux command line tip of the day: sum of numbers in a column

More often than not, I need to quickly see the sum of a column of numbers when I’m doing some processing on the GNU/Linux command line. For the sake of simplicity, let’s assume that you have the following output from some command line pipe:


Posted on May 28, 2013 in awk, Linux


Is Semantic Web and Linked Data Good Enough? SPARQL & DBPedia vs. Python & IMDbPY

Semantic Web & Linked Data: Technology of the future? Hopefully.

The inspiration for this short article is a simple question my wife asked while we were enjoying a recent episode of Continuum, a Canadian science-fiction series:

Q1. Hey, Emre, isn’t the girl who is playing Kiera’s grandmother the same girl who played Rosie Larsen in The Killing?

I said that I believed so and that it would be very easy to get the definitive answer via IMDb. Before I finished my sentence, though, it occurred to me that this would be a nice test to evaluate the current state of the Semantic Web and Linked Data. After all, how difficult would it be to query the wonderful world of linked data with a couple of SPARQL queries, and even go further by asking the following question:

Q2. Who are the actors that performed both in The Killing and Continuum?

What will be the state of the Semantic Web and Linked Data 65 years from now?

After all, the Semantic Web and linked data, coupled with DBpedia, can easily tell us the actors who starred in both Hoffa and The Shining, right? Simply running the following SPARQL query using


Posted on July 11, 2012 in Programlama, python


How To: Natural Sorting / Human Sorting in Python and other languages

There are some subtle and crucial concepts that I meet again and again. The problem is that once I’m done with them, I tend to forget some of them. Sorting strings in a human-friendly (alphabetical, natural) order instead of ASCIIbetical order is just one of those concepts. I had to rediscover this issue as I was trying to write a file renaming program in Python today. My problem can be described as

>>> "u11-Phrase 099.wav" < "u11-Phrase 100.wav" < "u11-Phrase 1000.wav" < "u11-Phrase 101.wav"
True

So according to Python (or your favorite language’s default sort functionality!) “u11-Phrase 100.wav” comes before “u11-Phrase 1000.wav” but “u11-Phrase 101.wav” comes after “u11-Phrase 1000.wav”!
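One common way around this, sketched here as an illustration rather than as the solution the post links to, is to sort with a key that splits each string into digit and non-digit runs and compares the digit runs as integers. The helper name natural_key and the case-insensitive comparison are assumptions made for this example.

```python
import re


def natural_key(text):
    """Split text into digit and non-digit runs so numbers compare by value."""
    # With a capturing group, re.split keeps the digit runs at odd indices.
    parts = re.split(r"(\d+)", text)
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


files = [
    "u11-Phrase 1000.wav",
    "u11-Phrase 101.wav",
    "u11-Phrase 099.wav",
    "u11-Phrase 100.wav",
]

# Default (ASCIIbetical) order: 099, 100, 1000, 101
print(sorted(files))
# Natural order: 099, 100, 101, 1000
print(sorted(files, key=natural_key))
```

Because digit runs always land at the odd positions of the split, every position in the key compares like with like (string with string, integer with integer), which avoids type errors when Python compares the lists.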

For solutions, please see one of these:


Posted on December 21, 2009 in General, Programlama