
Category Archives: Linux

Faster, RegEx! Match! Match! (Which Regular Expression Utility is the Fastest?)


When it comes to dealing with text data, regular expressions are the bread and butter of data processing, and of programming in general. Hardly a day or two passes without reaching for grep or a similar tool. Until recently, I thought that regular expressions and their tools were very useful but boring, a field with no innovation left in it. It turns out that I was wrong!

There are two relatively new players in town: ICGrep and ripgrep.

ICGrep uses a new, parallel bitstream technology developed by Dr. Robert D. Cameron at Simon Fraser University. It claims to be super fast for many text search and processing tasks. ICGrep is available for download from http://www.icgrep.com/downloads.htm as a binary executable for OS X / macOS. Its source code is also available if you want to build it for your operating system.

ripgrep is developed mainly by Andrew Gallant and other open source contributors, and its source code is available at https://github.com/BurntSushi/ripgrep. It is written in the Rust programming language and claims to be very fast, Unicode-ready, and smart enough to replace the Silver Searcher (ag) and “ack”.

Let’s see how they compare to the venerable regular expression utilities that we all know and love. Read the rest of this entry »

 
Posted on November 3, 2016 in Linux, Programlama, sysadmin

Is there a high quality and free Text to Speech system for Dutch that runs on GNU/Linux?


Dear Text to Speech and open source experts:

For a toy / hobby project (non-commercial), I’m trying to find a suitable Text to Speech system for Dutch that I can run on GNU/Linux. So far, the situation does not look very promising. I’ve tried eSpeak, but its Dutch output is not as good as I expected. I ran my experiment on a file, “computer.txt”, with the following contents:

Een computer is een apparaat waarmee gegevens volgens formele procedures zoals algoritmen kunnen worden verwerkt. Meestal wordt met het woord computer een elektronisch, digitaal apparaat bedoeld, maar er bestaan ook mechanische en analoge computers.

$ espeak -vnl+7 -s 170 -f computer.txt

Read the rest of this entry »

 
Posted on December 3, 2015 in Linguistics, Linux

PostgreSQL 9 High Availability Cookbook


PostgreSQL 9 High Availability Cookbook is a very well written book whose primary audience is experienced DBAs and system engineers who want to take their PostgreSQL skills to the next level by diving into the details of building highly available PostgreSQL based systems. Reading this book is like drinking from a fire hose; the signal-to-noise ratio is very high. In other words, every single page is packed with important, critical, and very practical information. As a consequence, the book is not for newbies: not only do you have to know the fundamental aspects of PostgreSQL from a database administrator’s point of view, but you also need a solid GNU/Linux system administration background.

One of the strongest aspects of the book is the author’s principled and well-structured engineering approach to building a highly available PostgreSQL system. Instead of jumping to some recipes to be memorized, the book teaches you basic but very important principles of capacity planning. More importantly, this planning of servers and networking is not only given as a good template: the author also explains the logic behind it, the reasoning behind the heuristics he uses, and why some magic numbers are good estimates when more case-specific information is lacking. This style is applied very consistently throughout the book; each recipe is explained so that you know why you do something in addition to how you do it. Read the rest of this entry »

 
Posted on August 21, 2014 in Books, Linux, sysadmin

Is this the State of the Art for grammar checking on Linux in the 21st century?


I recently shared an article with a colleague of mine. The article had been published in a peer-reviewed journal, and its contents were original and interesting. My colleague, being a meticulous reader of scientific texts, nevertheless immediately spotted a few simple grammar errors. It would have been very easy to blame the authors and editors for not correcting such errors before publication, but this triggered another question:

Why don’t we have open source and very high quality grammar checking software that is already integrated into major text editors such as VIM, Emacs, etc.?

Any user of a recent version of MS Word is well aware of on-the-fly grammar checking, at least for English. But many academics use LaTeX to typeset their articles and rely on either well-known text editors such as VIM and Emacs, or specialized software for handling LaTeX easily. Therefore, telling these people “go and check your article using MS Word, or copy and paste your article text into an online grammar checking service” does not make a lot of sense. Those methods are not convenient and thus not very usable by the hundreds of thousands of scientists writing articles every day. But what would be the ideal way? The answer is simple in theory: we have high quality open source spell checkers, at least for English, and they have already been integrated into major text editors, so scientists who write in LaTeX have no excuse for spelling errors; it is simply a matter of activating the spell checker. If only they had similar software for grammar checking, it would be very straightforward and convenient to eliminate the easiest grammar errors, at least for English.

A quick search on the Internet revealed the following for grammar checking on GNU/Linux:

– Baoqiu Cui has implemented a grammar checker integration for Emacs using link-grammar, but unfortunately it is far from easily usable.


Read the rest of this entry »

 
Posted on June 10, 2014 in Emacs, Linguistics, Linux

GNU/Linux command line tip of the day: sum of numbers in a column


More often than not, I need to quickly see the sum of a column of numbers when I’m doing some processing on the GNU/Linux command line. For the sake of simplicity, let’s assume that you have the following output from some command line pipe:
Read the rest of this entry »
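
As a minimal sketch of the idea, and not necessarily the command used in the full post: assuming the numbers sit in a whitespace-separated column, awk can add them up. The function name sum_column and the sample numbers below are made up for illustration.

```shell
# sum_column [N]: read lines from stdin and print the sum of the N-th
# whitespace-separated field (defaults to the first field).
# The name "sum_column" is hypothetical, chosen only for this example.
sum_column() {
    # "sum + 0" makes awk print 0 instead of an empty line on empty input
    awk -v col="${1:-1}" '{ sum += $col } END { print sum + 0 }'
}

# Made-up sample data standing in for the output of some pipeline
printf '10\n20\n12.5\n' | sum_column        # prints 42.5

# Summing the second column instead
printf 'a 1\nb 2\nc 3\n' | sum_column 2     # prints 6
```

Wrapping the one-liner in a function keeps the column number as a parameter, so the same helper can be dropped at the end of different pipes.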

 
Posted on May 28, 2013 in awk, Linux

How to solve the ugly font problem of Java applications in Ubuntu 12.10


Upgrading from an Ubuntu GNU/Linux version that is a few years old to the latest Ubuntu 12.10 might hurt your eyes… that is, if you happen to code in Java, develop Swing applications, or sometimes prefer IDEs such as NetBeans to Emacs. Somehow, upgrading to the latest version of Ubuntu creates a problem with fonts, and in many Java applications you see very ugly, bold fonts in menus, tree labels, etc.

This has been confirmed as a bug, and you can read more details at https://bugs.launchpad.net/ubuntu/+source/openjdk-7/+bug/937200 or https://netbeans.org/bugzilla/show_bug.cgi?id=221778

This bug appears to be related to Wine and a font package. The solution that worked for me was simply to issue the following command:

    sudo apt-get remove fonts-unfonts-core

Well, I did not need Korean TrueType fonts anyway.

 

 
Posted on February 27, 2013 in java, Linux

How to solve the ‘suspend does not work when laptop lid is closed’ problem in Ubuntu 12.10


I recently upgraded my ThinkPad T500’s operating system to Ubuntu 12.10 and everything went very smoothly except for one annoying issue: the laptop did not suspend when I closed the lid. Running pm-suspend from the command line, clicking Suspend in the GUI, or using the Fn+F4 key combination all worked and put the laptop to sleep, but somehow closing the lid did not have the same effect.

A quick Google search turned up this bug report: https://bugs.launchpad.net/ubuntu/+source/gnome-power-manager/+bug/863834 and a simple workaround: https://bugs.launchpad.net/ubuntu/+source/gnome-power-manager/+bug/863834/comments/30

So I decided to implement a similar workaround, but instead of using pm-suspend, I preferred to use dbus-send to invoke the sleep mode (see http://ubuntuforums.org/showpost.php?p=11331634&postcount=6 for more details):
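
As a rough sketch only, and not necessarily the exact script from the links above: on systems of that era, a dbus-send based suspend usually meant calling UPower's Suspend method on the system bus. The function name suspend_via_upower and the DRY_RUN switch are hypothetical additions for illustration, and the method is assumed to be available (to my knowledge it exists in the UPower 0.9.x series but was dropped in later releases).

```shell
# Hypothetical helper: ask UPower, over the system D-Bus, to suspend the machine.
# Assumption: the org.freedesktop.UPower.Suspend method is available, as in the
# UPower 0.9.x series; newer systems handle suspend through other services.
suspend_via_upower() {
    # Build the command as this function's positional parameters
    set -- dbus-send --system --print-reply \
        --dest=org.freedesktop.UPower \
        /org/freedesktop/UPower \
        org.freedesktop.UPower.Suspend
    if [ -n "${DRY_RUN:-}" ]; then
        # Only show the command, without suspending anything
        echo "$*"
    else
        "$@"
    fi
}
```

Running `DRY_RUN=1 suspend_via_upower` prints the dbus-send command without executing it, which is a safe way to check it before hooking it up to the lid-close event.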

 
Posted on February 26, 2013 in Linux