Are there any commonalities among human intelligence, Bayesian probability models, corpus linguistics, and religion? This blog entry presents a piece of light reading for people interested in a combination of those topics.
You have probably heard the famous question:
“What do you see below?”
GODISNOWHERE
The stream of letters can be broken down into English words in two different ways, either as “God is nowhere” or as “God is now here.” You can find an endless set of variations on this theme on the Internet, but I will deal with this example in the context of computational linguistics and big data processing.
When I first read the beautiful chapter “Natural Language Corpus Data,” written by Peter Norvig for the book “Beautiful Data,” I decided to run an experiment using Norvig’s code. In that chapter, Norvig presents a very concise Python program that ‘learns’ how to break a stream of letters into English words; in other words, a program capable of ‘word segmentation’.
Norvig’s code, coupled with Google’s language corpus, is powerful and impressive; it is able to take a character string such as
‘wheninthecourseofhumaneventsitbecomesnecessary’
and return a correct segmentation:
‘when’, ‘in’, ‘the’, ‘course’, ‘of’, ‘human’, ‘events’, ‘it’, ‘becomes’, ‘necessary’
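To make the idea of word segmentation concrete, here is a minimal sketch in Python. It is not Norvig’s program and it does not use Google’s corpus: the word counts in `COUNTS` are made-up toy numbers, the names `log_prob` and `segment` are my own, and the length-based penalty for unknown words is an assumption chosen for illustration. The sketch scores each candidate split by the sum of the log probabilities of its words (a simple unigram model) and keeps the best one.

```python
from functools import lru_cache
from math import log10

# Toy unigram counts: invented for illustration, NOT taken from any real corpus.
COUNTS = {
    "when": 150, "in": 500, "the": 1000, "course": 30, "of": 800,
    "human": 40, "events": 25, "it": 450, "becomes": 20, "necessary": 15,
}
TOTAL = sum(COUNTS.values())
MAX_WORD_LEN = max(len(word) for word in COUNTS)


def log_prob(word):
    """Log10 probability of a single word under the toy unigram model."""
    if word in COUNTS:
        return log10(COUNTS[word] / TOTAL)
    # Assumed penalty for unknown words: each extra letter costs a factor of 10.
    return log10(10.0 / TOTAL) - len(word)


@lru_cache(maxsize=None)
def segment(text):
    """Return (score, words) for the most probable segmentation of text."""
    if not text:
        return 0.0, ()
    candidates = []
    # Try every possible first word, then segment the remainder recursively.
    for i in range(1, min(len(text), MAX_WORD_LEN) + 1):
        first, rest = text[:i], text[i:]
        rest_score, rest_words = segment(rest)
        candidates.append((log_prob(first) + rest_score, (first,) + rest_words))
    return max(candidates)


score, words = segment("wheninthecourseofhumaneventsitbecomesnecessary")
print(list(words))
```

With these toy counts the sketch recovers the same ten words listed above. The memoization via `lru_cache` matters: without it, the recursion would re-segment the same suffixes over and over. For an ambiguous string, which split wins depends entirely on the counts, so real corpus data is needed before drawing any conclusions.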
But how would it deal with “GODISNOWHERE”? Let’s try it out in a GNU/Linux environment: