
Is this the state of the art for grammar checking on Linux in the 21st century?


Recently, I shared an article with a colleague of mine. The article had been published in a peer-reviewed journal, and its contents were original and interesting. My colleague, however, being a meticulous reader of scientific texts, immediately spotted a few simple grammar errors. It was very easy to blame the authors and editors for not correcting such errors before publication, but this triggered another question:

Why don’t we have open source and very high quality grammar checking software that is already integrated into major text editors such as VIM, Emacs, etc.?

Any user of a recent version of MS Word is well aware of on-the-fly grammar checking, at least for English. But many academics use LaTeX to typeset their articles and rely either on well-known text editors such as VIM and Emacs, or on specialized software for handling LaTeX easily. Telling these people “go and check your article using MS Word, or copy and paste your article text into an online grammar checking service” therefore does not make a lot of sense. Those methods are not convenient, and thus not very usable for the hundreds of thousands of scientists writing articles every day. But what would be the ideal way? The answer is simple in theory: we have high-quality open source spell checkers, at least for English, and they are already integrated into major text editors. Scientists who write in LaTeX therefore have no excuse for spelling errors; it is simply a matter of activating the spell checker. If only they had similar software for grammar checking, it would be very straightforward and convenient to eliminate the easiest grammar errors, at least for English.

A quick search on the Internet revealed the following for grammar checking on GNU/Linux:

Baoqiu Cui has implemented a grammar checker integration for Emacs using link-grammar, but unfortunately it is far from easily usable.


Posted on June 10, 2014 in Emacs, Linguistics, Linux

 


GODISNOWHERE: A look at a famous question using Python, Google and natural language processing


Are there any commonalities among human intelligence, Bayesian probability models, corpus linguistics, and religion? This blog entry presents a piece of light reading for people interested in a combination of those topics.
You have probably heard the famous question:

       “What do you see below?”

            GODISNOWHERE

The stream of letters can be broken down into English words in two different ways, either as “God is nowhere” or as “God is now here.” You can find an endless set of variations on this theme on the Internet, but I will deal with this example in the context of computational linguistics and big data processing.


When I first read the beautiful book chapter titled “Natural Language Corpus Data” by Peter Norvig, in the book “Beautiful Data”, I decided to run an experiment using Norvig’s code. In that chapter, Norvig shows a very concise Python program that ‘learns’ how to break down a stream of letters into English words; in other words, a program capable of ‘word segmentation’.

Norvig’s code, coupled with Google’s language corpus, is powerful and impressive; it is able to take a character string such as

“wheninthecourseofhumaneventsitbecomesnecessary”

and return a correct segmentation:


‘when’, ‘in’, ‘the’, ‘course’, ‘of’, ‘human’, ‘events’, ‘it’, ‘becomes’, ‘necessary’
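To give a feeling for how such a segmenter can work, here is a minimal sketch in Python. It is not Norvig's code and it does not use Google's corpus: the word counts in `COUNTS` are invented purely for illustration, and the names `log_prob` and `segment` are my own. The idea is to try every possible first word, segment the remainder recursively, and keep the split whose words have the highest combined probability under a simple unigram model.

```python
from functools import lru_cache
from math import log10

# Toy unigram counts, invented for illustration only (NOT Google's corpus data).
COUNTS = {"god": 50, "is": 500, "now": 200, "here": 150, "nowhere": 20}
TOTAL = sum(COUNTS.values())


def log_prob(word):
    """Log10 probability of a single word under the toy unigram model."""
    if word in COUNTS:
        return log10(COUNTS[word] / TOTAL)
    # Unknown words get a small probability that shrinks with their length,
    # so long runs of unrecognized letters are heavily penalized.
    return log10(10.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def segment(text):
    """Return the most probable tuple of words whose concatenation is text."""
    if not text:
        return ()
    # Try every split into a first word and a recursively segmented remainder.
    candidates = [
        (text[:i],) + segment(text[i:]) for i in range(1, len(text) + 1)
    ]
    return max(candidates, key=lambda words: sum(log_prob(w) for w in words))


print(segment("godisnowhere"))  # ('god', 'is', 'now', 'here') with these toy counts
```

With these made-up counts the sketch prefers “now here”, simply because “now” and “here” were given high counts relative to “nowhere”. Different counts would give a different answer, which is exactly why the result with a real corpus is the interesting part.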

But how would it deal with “GODISNOWHERE”? Let’s try it out in a GNU/Linux environment:

 

Posted on March 1, 2014 in Linguistics, Programming, python

 


Scala versus Python and R: software archaeology in bioinformatics


When one of the scala-user members recently mentioned a bioinformatics package called GATK (Genome Analysis Toolkit) and its use of Scala, I decided to take a closer look. Thanks to the valuable Ohloh service, amateur software archaeology has never been easier! After a brief visit to https://www.ohloh.net/p/gatk I learned that GATK has had 12,871 commits made by 77 contributors within the last 5 years, representing 99,078 lines of code.

I wanted to learn more about its source code languages, and decided to drill down by visiting https://www.ohloh.net/p/gatk/analyses/latest/languages_summary. What I discovered was surprising. Let me share the facts I’ve found so far: the project did not have any Scala code until recently. For example, in July 2009 it had no Scala, whereas it contained 4410 lines of Python and 56 lines of R code:

[Image: Ohloh language summary for GATK before Scala was introduced]


Posted on February 16, 2014 in Programming

 


Can LinkedIn endorsements be motivating? A case for Coursera’s Machine Learning class


Any self-respecting, social-media-savvy professional knows that LinkedIn endorsements are the result of a hideous gamification experiment gone wrong (on many levels), except when they think it is a straightforward abuse of human psychology. Some computer programmers even push back by writing automated scripts that endorse profiles with totally absurd ‘skill sets’.

On the other hand, in some unexpected cases, these endorsements can be very motivating, which is what happened to me a few months ago, back in October 2013. To cut a long story short, when I came across the following endorsements by some of my LinkedIn contacts, my reaction surprised even me:

[Image: LinkedIn endorsements for Machine Learning]

It went like this: “Machine Learning! Should I accept that endorsement? I mean, I did small projects related to machine learning, such as Poor Man’s TV program Recommender that utilized Support Vector Machines, and a cross-cultural and cross-domain recommendation system using a semantic graph database such as AllegroGraph; but apart from an AI course that I took while studying for my cognitive science degree, I haven’t taken any Machine Learning course. On the other hand, Andrew Ng’s famous Machine Learning course at Coursera is about to start, so maybe that’s a nice opportunity! Why not? If I can finish the course successfully, then accepting such an endorsement will be somewhat meaningful, at least from a practical or academic point of view.”

 

Posted on January 19, 2014 in e-Learning, Programming

 


How to use extractors in Scala for powerful pattern matching – Devoxx style


A few months ago, Joshua D. Suereth gave a very nice technical talk at Devoxx 2013, titled “How to wield Scala in the trenches”, which was full of functional programming and Scala gems. The last slides of his presentation were dedicated to a very powerful style of using extractors and pattern matching. I liked his example so much that I wanted to understand it better and note it down in my blog so that I can apply it in many similar cases.

The basic motivation comes from the following question. Assume you have a List of objects, e.g. People, who can have more than one residence, and you want to select some of them based on a condition, e.g. the ones living in Istanbul. There are of course different ways to traverse the list and filter the ones you are looking for:

 

Posted on January 4, 2014 in Programming

 


After the course: Tales from the Genome, Introduction to Genetics & A Few Resources


Now that I’ve finished the Tales from the Genome, Introduction to Genetics course, I’d like to note some related resources. Some of the links are related to 23andMe.com, a company that sponsored the course; it is the same company whose genetic analysis kit I used to learn more about my genome and the mutations I have. Unfortunately, in the meantime I have also learned that they were forced to stop selling their kits; luckily, I already had my results before that happened.


 

Posted on December 29, 2013 in Science

 


Book review: Java Performance


Drinking from the Firehose

Even though the Java platform (along with the JVM) is one of the most ubiquitous software development platforms, it was surprisingly difficult to find a self-contained book dedicated to the performance aspects of the Java platform. “Java Performance” by Charlie Hunt and Binu John can be considered the only solid and contemporary reference in the domain of performance analysis and tuning of Java-based systems. No matter which programming language you run on the JVM, this book is the essential reference until something better comes along.

On the other hand, make no mistake, this is not a lightweight book, or a cookbook which you can consult for a few performance tuning recipes. Reading Java Performance is more like drinking from the firehose. Whether you are a beginner, or a seasoned developer, it will take time to digest the gory details presented. Luckily, the book’s logical organization is close to perfect and some of the chapters are pretty self-contained.

From start to finish, it is not difficult to see that if you want to consider yourself a serious Java performance engineer, you need to master the majority of the book. Chapter 2 starts with an overview of the basics of operating system performance and monitoring, setting the stage for the upcoming chapters and acting as a refresher. The authors are very careful to explain concepts concretely by giving examples from Linux, Solaris and MS Windows systems, which makes sense given the portability of the JVM. Chapters 3 and 4, taken together, provide the most comprehensive technical explanation of the Java Virtual Machine and Just-in-Time compilation from a performance perspective. Even if you are not facing performance problems (yet), these two chapters make a very solid and clear reference for understanding JVM and JIT technology. The terminology, concepts and details in these chapters are very important: without a solid understanding of them, it is not easy to follow the discussions in the following chapters.


Posted on December 8, 2013 in Books, java, Programming

 


 