
Normality Testing: is it normal?


It is largely because of lack of knowledge of what statistics is that the person untrained in it trusts himself with a tool quite as dangerous as any he may pick out from the whole armamentarium of scientific methodology. –Edwin B. Wilson (1927), quoted in Stephen M. Stigler, The Seven Pillars of Statistical Wisdom.

Imagine you’re responsible for testing some aspects of a complex software product, and one of your colleagues comes up with the following request:

  • Hey, can you write a self-contained function that tests the results of software component X, and returns TRUE if the data set generated by X is normally distributed, and FALSE otherwise?

What’s a poor software developer to do?

Well, you cherish the fond memories of your first statistics class that you took more than 20 years ago, and say: “I’ll plot a histogram of the data, and see if it’s normal!”

But of course, in less than a second you realize that manual visual inspection of a plot will not make an automated test, not at all! So as a brilliant software developer with a math background, you say, “easy, I’ll just grab my secret weapon, that is, Python and its SciPy library, to smash through this little statistical challenge!” You’re happy that you can stand on the shoulders of giants, and use a well-documented, simple function such as scipy.stats.normaltest.
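A first attempt along those lines might look like the sketch below. It wraps scipy.stats.normaltest (D’Agostino–Pearson’s test) in a boolean function; the 0.05 threshold and the function name are my own arbitrary choices, and note the logical trap lurking here: a hypothesis test can only say “we could not reject normality at this significance level”, which is not the same thing as “the data is normal”.

```python
import numpy as np
from scipy import stats

def is_normally_distributed(data, alpha=0.05):
    """Return True when D'Agostino-Pearson's test fails to reject normality.

    Caveat: failing to reject the null hypothesis ("the sample comes from
    a normal distribution") is NOT a proof of normality.  The alpha=0.05
    threshold is a conventional, arbitrary choice.
    """
    statistic, p_value = stats.normaltest(data)
    return p_value > alpha

rng = np.random.default_rng(42)
# A genuinely normal sample will pass for most seeds; a heavily skewed
# exponential sample of this size will essentially always fail.
print(is_normally_distributed(rng.normal(size=1000)))
print(is_normally_distributed(rng.exponential(size=1000)))
```

Whether this naive wrapper actually satisfies the colleague’s request is, of course, exactly the question the rest of the post digs into.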

 

Posted by on September 11, 2019 in Math, Programlama, python, Science

 


What was the state of AI in Europe almost 70 years ago?


When it comes to the history of Artificial Intelligence (AI), even a simple Internet search will tell you that the defining event was “The Dartmouth Summer Research Project on Artificial Intelligence”, a summer workshop held at Dartmouth College, United States, in 1956. What is less known is that, five years before Dartmouth, there was a conference in Europe, back in 1951. The conference, held in Paris, was “Les machines à calculer et la pensée humaine” (Calculating machines and human thinking). It can easily be considered the earliest major conference on Artificial Intelligence. Supported by the Rockefeller Foundation, its participant list included intellectual giants of the field such as Warren Sturgis McCulloch, Norbert Wiener, Maurice Vincent Wilkes, and others.

The organizer of the conference, Louis Couffignal, was also a mathematician and cybernetics pioneer, who had already published a book in 1933 titled “Les machines à calculer. Leurs principes. Leur évolution.” (Calculating machines. Their principles. Their evolution.) Another highlight of the conference was El Ajedrecista (The Chess Player), designed by the Spanish civil engineer and mathematician Leonardo Torres y Quevedo. There was also a presentation based on practical experiences with the Z4 computer, designed by Konrad Zuse and operated at ETH Zurich. The presenter was none other than Eduard Stiefel, inventor of the conjugate gradient method, among other things.

The field of AI has come a long way since 1951, and it is safe to say it’s going to penetrate even more aspects of our lives and technologies. It’s also safe to say that, like many technological and scientific endeavors, progress in AI is the result of many bright minds in many different countries, even though the USA and UK are generally regarded as the biggest contributors. But it’s also important to recognize lesser-known facts such as this 1951 Paris conference, and to realize the strong tradition in Europe: not only the academic research and development track, but also the strong industrial and business tracks. Historical artifacts in languages other than English inevitably receive less recognition, but they should be a reason to cherish diversity and variety. I believe all of these aspects combined should guide Europe in its quest for advancing the state of the art in AI, in software, hardware, and combined systems alike.

This article is heavily based on and inspired by the following article by Herbert Bruderer, a retired lecturer in didactics of computer science at ETH Zürich: “The Birthplace of Artificial Intelligence?”

 

Posted by on July 11, 2019 in Math, Programlama, Science

 


Zen of GitHub and Python


For some readers it’s old news, but I’ve just discovered the Zen of GitHub API. It immediately reminded me of The Zen of Python, and of course I wanted to compile the full list of GitHub’s version of Zen koans. Therefore I wrote a short Python program to do the job:
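My original program is behind the cut; here is a minimal reconstruction of the idea, not the actual code from the post. GitHub’s Zen endpoint (https://api.github.com/zen) returns one random koan as plain text per request, so collecting the list amounts to sampling it repeatedly and deduplicating; the `attempts` and `pause` values are arbitrary choices.

```python
import time
import urllib.request

def fetch_zen():
    """Fetch one random koan from GitHub's Zen endpoint (plain text)."""
    req = urllib.request.Request(
        "https://api.github.com/zen",
        headers={"User-Agent": "zen-koan-collector"},  # GitHub's API requires a User-Agent
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8")

def collect_koans(fetch=fetch_zen, attempts=50, pause=0.5):
    """Sample the endpoint `attempts` times and return the distinct koans, sorted.

    The fetch function is injectable so the sampling logic can be tested
    without hitting the network.  A polite pause between requests keeps
    us clear of rate limits.
    """
    koans = set()
    for _ in range(attempts):
        koans.add(fetch())
        if pause:
            time.sleep(pause)
    return sorted(koans)
```

Running `collect_koans()` and printing the result gives a snapshot of the koan list, with no guarantee that enough random samples were drawn to see every one of them.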

 

Posted by on June 4, 2019 in Programlama, python

 


How to preview fixed-width (monospaced) fonts in an editable Emacs buffer?


When using Emacs, I don’t spend much time thinking about fonts. Like the majority, I pick my favorite fixed-width, monospaced font and get on with it. Every now and then I hear about some cool new font for reading lots of source code and technical writing, and I might give it a try, but that’s the end of it.

But sometimes, you just want to have an overview and see everything summed up in a single place, preferably an Emacs buffer, so you can also play with it and hack it. Of course, your GNU/Linux, macOS, or MS Windows system will happily show you all the available fonts, and let you filter out the fixed-width ones suitable for programming. Emacs itself can also do something very similar. But as I said, why not have something according to your own taste?

With a bit of Emacs Lisp, it seems not that difficult, at least on GNU/Linux:
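The actual code is behind the cut; the following is only a rough reconstruction of what such a command might look like, with a hypothetical sample string and buffer name. It inserts one line per installed font family, each rendered in that family, into a plain editable buffer (filtering the list down to monospaced families is left to the reader, or to the original post):

```elisp
(defun compare-monospace-font-families ()
  "Insert a sample line for every installed font family into a buffer.
This is a reconstruction sketch, not the post's actual implementation."
  (interactive)
  (let ((sample "0O 1lI| {}[]() ;; The quick brown fox jumps over the lazy dog")
        (buf (get-buffer-create "*font-preview*")))
    (with-current-buffer buf
      (erase-buffer)
      (dolist (family (sort (delete-dups (font-family-list)) #'string<))
        ;; Each line carries a face with its own :family, so the buffer
        ;; shows all fonts side by side and remains fully editable.
        (insert (propertize (format "%-30s %s\n" family sample)
                            'face (list :family family)))))
    (pop-to-buffer buf)))
```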

The result of running compare-monospace-font-families can be seen in the following screenshot:

 

Posted by on May 9, 2019 in Emacs, General

 


Lost in Google Translate: How Unreasonable Effectiveness of Data can Sometimes Lead Us Astray


I recently received an e-mail in Dutch from the Belgian teacher of my 7.5-year-old son. Even though my Dutch is more than good enough to understand what his teacher wrote, I also wanted to check it with Google Translate, out of habit and because of my professional/academic background. This led to an interesting discovery and made me think once again about artificial intelligence, deep learning, automatic translation, statistical natural language processing, knowledge representation, commonsense reasoning, and linguistics.

But first things first, let’s see how Google Translate translated a very ordinary Dutch sentence into English:

Interesting! It is obvious that my son’s teacher didn’t have anything to do with a grinding table (!), and even if he did, I don’t think he’d involve his class with such interesting hobbies. 🙂 Of course, he meant the “multiplication table for 3”.

Then I wanted to see what the giant search engine, Google Search itself, knows about the Dutch word “maaltafel”. And I immediately saw that Google Search knows very well that “maaltafel” in Dutch means “multiplication table” in English. Not only that: on the first page of search results, the expected Dutch expression occurs 47 times. Nothing surprising here:

 

Posted by on February 8, 2019 in CogSci, Linguistics, philosophy, Science

 


Two Laws for Systems


The first is known as Gall’s law for systems design:

“A simple system may or may not work. A complex system that works is invariably found to have evolved from a simple system that worked. A complex system designed from scratch never works and cannot be patched up to make it work. You have to start over with a working simple system.” — John Gall

This law is essentially an argument in favour of underspecification: it can be used to explain the success of systems like the World Wide Web and Blogosphere, which grew from simple to complex systems incrementally, and the failure of systems like CORBA, which began with complex specifications. Gall’s Law has strong affinities to the practice of agile software development.


 

Posted by on November 22, 2018 in Management, philosophy, Programlama

 

How fast can we count the number of lines in GNU/Linux, macOS, and MS Windows?


At first glance, the question of counting lines in a text file is super straightforward. You simply run `wc` (word count) with the -l or --lines option. And that’s exactly what I’ve been doing for more than 20 years. But what I read recently made me question whether there are faster and more efficient ways to do it. Nowadays, with very large and fast storage, you can easily have text files of 1 GB, 10 GB, or even 100 GB. On top of that, your laptop has at least 2 or maybe 4 cores, which with hyper-threading means up to 8 logical cores, and on a powerful server it’s not surprising at all to have 16 or more CPU cores. So can this simple text processing task be made more efficient, using as many cores as available and utilizing them to their maximum, to return the line count in a fraction of the time?

Here’s what I found:

In the first link above, I came across an interesting utility:

Apparently, the author of turbo-linecount decided to implement his solution in C++ (for Linux, macOS, and MS Windows). He uses memory mapping to map the text file into memory, and multi-threading to start threads that count the number of newlines (`\n`) in different chunks of the mapped region, finally returning the sum total of newlines as the line count. Even though there are some issues with that approach, I think it’s still very interesting. Actually, my initial reaction was: “how come this nice utility is still not a standard package in most GNU/Linux software distributions such as Red Hat, Debian, Ubuntu, etc.?”
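The chunked-counting scheme can be sketched in a few lines of Python. To be clear, this is my own simplified reconstruction, not turbo-linecount’s code: in CPython the GIL means the thread pool mostly illustrates the structure rather than delivering a real speed-up, whereas the C++ original gets genuinely parallel threads.

```python
import mmap
import os
from concurrent.futures import ThreadPoolExecutor

def count_lines(path, workers=4, chunk_size=1 << 20):
    """Count lines like `wc -l` does: the number of newline bytes in the file.

    The file is memory-mapped and each worker counts b"\\n" in its own
    chunk of the mapping; the per-chunk counts are then summed.
    """
    size = os.path.getsize(path)
    if size == 0:
        return 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            def count_chunk(start):
                # Slicing past the end of the mapping is safely clamped.
                return mm[start:start + chunk_size].count(b"\n")
            with ThreadPoolExecutor(max_workers=workers) as pool:
                return sum(pool.map(count_chunk, range(0, size, chunk_size)))
```

Note that, like `wc -l`, this counts newline characters, so a final line without a trailing newline is not counted.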

Maybe we’ll have better options soon. Or maybe we already do? Let me know if there are better ways to perform this simple, yet frequently used operation.

 

Posted by on November 12, 2018 in Linux, Programlama

 
