
GODISNOWHERE: A look at a famous question using Python, Google and natural language processing


Are there any commonalities among human intelligence, Bayesian probability models, corpus linguistics, and religion? This blog entry presents a piece of light reading for people interested in a combination of those topics.
You have probably heard the famous question:

       “What do you see below?”

            GODISNOWHERE

The stream of letters can be broken down into English words in two different ways: either as “God is nowhere” or as “God is now here.” You can find an endless set of variations on this theme on the Internet, but here I will deal with this example in the context of computational linguistics and big data processing.


When I first read the beautiful chapter titled “Natural Language Corpus Data”, written by Peter Norvig for the book “Beautiful Data”, I decided to run an experiment using Norvig’s code. In that chapter, Norvig showed a very concise Python program that ‘learned’ how to break a stream of letters down into English words; in other words, a program capable of ‘word segmentation’.

Coupled with Google’s language corpus, Norvig’s code is powerful and impressive; it can take a character string such as

“wheninthecourseofhumaneventsitbecomesnecessary”

and return a correct segmentation:


‘when’, ‘in’, ‘the’, ‘course’, ‘of’, ‘human’, ‘events’, ‘it’, ‘becomes’, ‘necessary’
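To make the idea concrete, here is a minimal sketch of unigram word segmentation in Python, written in the spirit of the approach the chapter describes rather than copied from it: try every possible first word, recursively segment the remainder, and keep the split with the highest total log probability. The `COUNTS` table is a tiny hypothetical stand-in for Google's corpus, and the length-based penalty for unknown words is an assumption of this sketch.

```python
import math
from functools import lru_cache

# Hypothetical toy unigram counts standing in for a real corpus such as
# Google's n-gram data; the numbers are illustrative only.
COUNTS = {
    "when": 500, "in": 900, "the": 1000, "course": 80, "of": 950,
    "human": 60, "events": 40, "it": 700, "becomes": 30, "necessary": 25,
}
TOTAL = sum(COUNTS.values())


def log_pw(word: str) -> float:
    """Log probability of a single word under the unigram model.

    Unknown words get a probability that shrinks tenfold with each extra
    character (an assumed smoothing heuristic), so long unknown strings
    lose to splits into known words.
    """
    if word in COUNTS:
        return math.log(COUNTS[word] / TOTAL)
    return math.log(10.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def segment(text: str) -> tuple:
    """Return the most probable split of `text` into words, as a tuple."""
    if not text:
        return ()
    candidates = []
    # Try every first word up to 20 characters, then segment the rest.
    for i in range(1, min(len(text), 20) + 1):
        first, rest = text[:i], text[i:]
        candidates.append((first,) + segment(rest))
    # Words are treated as independent, so log probabilities simply add up.
    return max(candidates, key=lambda ws: sum(log_pw(w) for w in ws))


print(segment("wheninthecourseofhumaneventsitbecomesnecessary"))
```

Memoizing `segment` with `functools.lru_cache` keeps the recursion from re-solving the same suffix over and over; with a real corpus, the quality of the result depends almost entirely on the counts.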

But how would it deal with “GODISNOWHERE”? Let’s try it out in a GNU/Linux environment.
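Under a unigram model, deciding between the two readings comes down to comparing their probabilities. The sketch below scores both candidates explicitly; the counts and corpus size are hypothetical placeholders, not figures from Google's corpus, and they do not reproduce the experiment in this post.

```python
import math

# Hypothetical unigram counts and corpus size, for illustration only;
# these are NOT figures from Google's corpus or from the post's experiment.
COUNTS = {"god": 300, "is": 5000, "nowhere": 40, "now": 900, "here": 1200}
TOTAL = 1_000_000


def log_prob(words):
    """Unigram log probability: words are treated as independent, so
    log P(w1 ... wn) = log P(w1) + ... + log P(wn)."""
    return sum(math.log(COUNTS[w] / TOTAL) for w in words)


readings = {
    "God is nowhere": ["god", "is", "nowhere"],
    "God is now here": ["god", "is", "now", "here"],
}
for label, words in readings.items():
    print(f"{label:<16} log P = {log_prob(words):.2f}")
```

Because “god” and “is” appear in both readings, the comparison reduces to P(nowhere) versus P(now)·P(here). Each extra word multiplies in another probability below 1, so a unigram model tends to favour the reading with fewer words unless “now” and “here” are frequent enough to compensate. Which reading wins depends entirely on the counts.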

 

Posted on March 1, 2014 in Linguistics, Programming, python

 


Scala versus Python and R: software archaeology in bioinformatics


When a member of the scala-user mailing list recently mentioned a bioinformatics package called GATK (Genome Analysis Toolkit) and its use of Scala, I decided to take a closer look. Thanks to the valuable Ohloh service, amateur software archaeology has never been easier! After a brief visit to https://www.ohloh.net/p/gatk I learned that GATK had 12,871 commits made by 77 contributors within the last 5 years, representing 99,078 lines of code.

I wanted to learn more about the languages in its source code, so I drilled down by visiting https://www.ohloh.net/p/gatk/analyses/latest/languages_summary. What I discovered was surprising. Here are the facts I have found so far: the project did not contain any Scala code until recently. For example, in July 2009 it had no Scala at all, whereas it contained 4410 lines of Python and 56 lines of R code.



 

Posted on February 16, 2014 in Programming

 


Can LinkedIn endorsements be motivating? A case for Coursera’s Machine Learning class


Any self-respecting, social-media-savvy professional knows that LinkedIn endorsements are the result of a hideous gamification experiment gone wrong (on many levels), unless they think it is a straightforward abuse of human psychology. Some programmers have even pushed back by writing automated scripts that endorse profiles for totally absurd ‘skill sets’.

On the other hand, in some unexpected cases these endorsements can be very motivating, which is what happened to me a few months ago, back in October 2013. To cut a long story short, when I came across endorsements for Machine Learning from some of my LinkedIn contacts, my reaction surprised even me.


It went like this: “Machine Learning! Should I accept that endorsement? I mean, I did small projects related to machine learning, such as Poor Man’s TV program Recommender, which utilized Support Vector Machines, and a cross-cultural, cross-domain recommendation system using a semantic graph database such as AllegroGraph; but apart from an AI course I took while studying for my cognitive science degree, I haven’t taken any Machine Learning course. On the other hand, Andrew Ng’s famous Machine Learning course at Coursera is about to start, so maybe that’s a nice opportunity! Why not? If I can finish the course successfully, then accepting such an endorsement will be somewhat meaningful, at least from a practical or academic point of view.”

 

Posted on January 19, 2014 in e-Learning, Programming

 


How to use extractors in Scala for powerful pattern matching – Devoxx style


A few months ago, Joshua D. Suereth gave a very nice technical talk at Devoxx 2013 titled “How to wield Scala in the trenches”, which was full of functional programming and Scala gems. The last slides of his presentation were dedicated to a very powerful style of combining extractors with pattern matching. I liked his example so much that I wanted to understand it better and note it down in my blog so that I can apply it in many similar cases.

The basic motivation comes from the following question. Assume you have a List of objects, e.g. People, each of whom can have more than one residence, and you want to select them based on a condition, e.g. the ones living in Istanbul. There are of course different ways to traverse the list and select the ones you are looking for.

 

Posted on January 4, 2014 in Programming

 


After the course: Tales from the Genome, Introduction to Genetics & A Few Resources


Now that I’ve finished the Tales from the Genome, Introduction to Genetics course, I’d like to note some of the related resources. Some of the links are related to 23andMe.com, the company that sponsored the course; it is also the company whose genetic analysis kit I used to learn more about my genome and the mutations I have. Unfortunately, in the meantime I have also learned that they were forced to stop selling their kits; luckily, I already had my results before that happened.


 

Posted on December 29, 2013 in Science

 


Book review: Java Performance


Drinking from the Firehose

Even though the Java platform (along with the JVM) is one of the most ubiquitous software development platforms, it was surprisingly difficult to find a self-contained book dedicated to its performance aspects. “Java Performance” by Charlie Hunt and Binu John can be considered the only solid and contemporary reference in the domain of performance analysis and tuning of Java-based systems. No matter which programming language you use on the JVM, this book is the essential reference until something better comes along.

On the other hand, make no mistake: this is not a lightweight book, nor a cookbook you can consult for a few performance tuning recipes. Reading Java Performance is more like drinking from the firehose. Whether you are a beginner or a seasoned developer, it will take time to digest the gory details presented. Luckily, the book’s logical organization is close to perfect and some of the chapters are fairly self-contained.

From start to finish, it is not difficult to see that if you want to consider yourself a serious Java performance engineer, you need to master the majority of the book. Chapter 2 starts with an overview of the basics of operating system performance and monitoring, setting the stage for the upcoming chapters and acting as a refresher. The authors are very careful to explain concepts concretely by giving examples from Linux, Solaris and MS Windows systems, which makes sense given the portability of the JVM. Taken together, Chapters 3 and 4 provide the most comprehensive technical explanation of the Java Virtual Machine and Just-In-Time compilation from a performance perspective. Even if you are not facing performance problems (yet), these two chapters make a very solid and clear reference for understanding JVM and JIT technology. The terminology, concepts and details in these chapters are very important: without a solid understanding of them, it is not easy to follow the discussions in the later chapters.


 

Posted on December 8, 2013 in Books, java, Programming

 


What Should Be the Attributes of an IT Project Manager?


Recently I came across a job advert in which a Belgian company was looking for a project manager. They listed what they were expecting, and I think it makes a good list of the attributes of a project manager (the emphasis on some words and sentences is mine):

The IT Project Manager:

- Identifies, describes, and assigns “S.M.A.R.T” objectives to stakeholders for execution of project and team work where team members are in a functional line rather than a hierarchical line

- Applies Best-in-Class industry Project Management guidelines, methodologies (i.e. WBS, SWOT analysis, …)

 

Posted on December 6, 2013 in business

 

Tags: , , , , , , ,

 