
Category Archives: sysadmin

Faster, RegEx! Match! Match! (Which Regular Expression Utility is the Fastest?)


When it comes to dealing with text data, regular expressions are the bread and butter of data processing and of much day-to-day programming. Hardly a day or two passes without reaching for grep or a similar tool. Until recently, I thought the field of regular expressions and related tools was very useful but boring, with no innovation in sight. It turns out that I was wrong!

There are two relatively new players in town: ICGrep and ripgrep.

ICGrep uses a new, parallel bitstream technology developed by Dr. Robert D. Cameron at Simon Fraser University. It claims to be super fast for many text search and processing tasks. ICGrep is available for download from http://www.icgrep.com/downloads.htm as a binary executable for OS X / macOS. Its source code is also available if you want to build it for your operating system.

ripgrep is developed mainly by Andrew Gallant and other open source contributors, and its source code is available at https://github.com/BurntSushi/ripgrep. It is written in the Rust programming language and claims to be very fast, Unicode-ready, and smart enough to replace the Silver Searcher (ag) and ack.

Let’s see how they compare to the venerable regular expression utilities that we all know and love.
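
A comparison of this kind can be sketched as follows. This is not the benchmark from the full post: the corpus, the pattern, and the bench helper are made up for illustration, and ICGrep is left out because its command-line options are not described here. The timing runs through Bash's time keyword, and ripgrep is skipped when it is not installed.

```shell
# Throwaway corpus: 200000 lines of the form "request id=N" (hypothetical data).
corpus=$(mktemp)
seq 1 200000 | sed 's/^/request id=/' > "$corpus"

# Lines whose id ends in the digit 7.
pattern='id=[0-9]*7$'

# Hypothetical helper: run a command under Bash's time keyword,
# discarding the command's own output. Timings go to stderr.
bench() {
    bash -c 'time "$@" > /dev/null' _ "$@"
}

# Check that the tools agree on the result before comparing speed.
grep_count=$(grep -c -E "$pattern" "$corpus")
echo "grep: $grep_count matching lines"
bench grep -c -E "$pattern" "$corpus"

# ripgrep is optional; skip it quietly when rg is not on PATH.
if command -v rg > /dev/null 2>&1; then
    rg_count=$(rg -c "$pattern" "$corpus")
    echo "rg: $rg_count matching lines"
    bench rg -c "$pattern" "$corpus"
fi

rm -f "$corpus"
```

A single run on a small file mostly measures process startup, so any serious comparison should use a much larger corpus and repeat each run several times.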


Posted on November 3, 2016 in Linux, Programlama, sysadmin

 


PostgreSQL 9 High Availability Cookbook


PostgreSQL 9 High Availability Cookbook is a very well written book whose primary audience is experienced DBAs and system engineers who want to take their PostgreSQL skills to the next level by diving into the details of building highly available PostgreSQL-based systems. Reading this book is like drinking from a fire hose: the signal-to-noise ratio is very high. In other words, every single page is packed with important, critical, and very practical information. As a consequence, the book is not for newbies: not only do you have to know the fundamental aspects of PostgreSQL from a database administrator’s point of view, but you also need a solid GNU/Linux system administration background.

One of the strongest aspects of the book is the author’s principled and well-structured engineering approach to building a highly available PostgreSQL system. Instead of jumping to recipes to be memorized, the book teaches you basic but very important principles of capacity planning. More importantly, this planning of servers and networking is not just given as a good template: the author also explains the logic behind it, drawing attention to the reasoning behind the heuristics he uses and to why some magic numbers are taken as good estimates when more case-specific information is lacking. This style is applied very consistently throughout the book. Each recipe is explained so that you know why you do something in addition to how you do it.

 

Posted on August 21, 2014 in Books, Linux, sysadmin

 


A Bash quirk on `time’ and thoughts of a programmer on its semantics


Here’s a short puzzler for GNU/Linux command line nerds:

Why, indeed? To clarify things a little bit: time is a reserved keyword in Bash and unless you explicitly call the /usr/bin/time program, Bash will execute its internal timing command (see http://www.gnu.org/s/bash/manual/html_node/Pipelines.html#Pipelines, http://linux.die.net/man/1/time, and http://www.gnu.org/s/bash/manual/html_node/Shell-Builtin-Commands.html#Shell-Builtin-Commands). Well, at least that was what I thought until I encountered the example above some time ago, when I was trying to accomplish something on the command line.

Apparently, if time is not the first token on the command line, then the built-in timing function of Bash is not executed. So if you want to change the collation as well as use the built-in timing function, you have to do the following:
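
A minimal sketch of that ordering follows. The sort /dev/null command is only a stand-in of mine, since the original example is not reproduced here, and each case runs through bash -c so that the result does not depend on your login shell.

```shell
# Ask Bash how it classifies the two words discussed in this post.
kind_of_time=$(bash -c 'type -t time')   # reserved word
kind_of_help=$(bash -c 'type -t help')   # ordinary builtin
echo "time: $kind_of_time, help: $kind_of_help"

# Keyword first: Bash times the command itself, and LC_COLLATE=C
# is applied to sort only.
bash -c 'time LC_COLLATE=C sort /dev/null'

# Assignment first: time is no longer in keyword position, so Bash looks
# it up in PATH like any other command. Run this only if an external
# time program exists (type -P searches PATH and ignores the keyword).
external_time=$(bash -c 'type -P time' || true)
if [ -n "$external_time" ]; then
    bash -c 'LC_COLLATE=C time sort /dev/null'
    echo "the second run was timed by $external_time"
fi
```

The two timed runs print their reports in different formats, which is the quickest way to tell which time did the work.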

Another question that comes to mind: Does changing LC_COLLATE lead to some special execution environment? Does it change how Bash interprets its reserved keywords and built-ins? Well, to examine another example, take this: help is also a Bash built-in, but if you go and create a dummy /usr/bin/help and then try to run (e.g. in your home directory) LC_ALL=C help alias, you are going to see that Bash executes the built-in help and does not try to run the program in /usr/bin/*.

It seems that the time built-in is in a very special situation, and this creates a quirk in Bash semantics. This is not something that will bite you regularly, but if you think of Bash as a programming language and of the command line as its REPL (Read-Eval-Print Loop) environment (such as the REPLs for Lisp, Ruby, Python, Scala, etc.), then such inconsistencies in the semantics of a programming environment can be pretty surprising and sometimes even annoying.

*: Thanks to Debian developer Recai Oktaş for this example test.

 

Posted on December 18, 2011 in Linux, Programlama, sysadmin

 

On entropy and GNU/Linux


A. Kadir Altan’s blog entry on erasing disks by supplying random data (in Turkish, here’s an automatic translation into English) refreshed my curiosity about the hardware random number generators on PCs, especially the ones on ThinkPad laptops, and a short search led to the following links. I just wanted to note them down so that I can refer to them in the future:

 

Posted on July 26, 2011 in Linux, security, sysadmin

 

My first impressions of drush (DRUpal SHell): Wow!


I just learned about drush (Drupal Shell) and decided to give it a try. It seems somebody finally thought about busy, command-line-loving system administrators. I tried to update the Google Analytics module for one of the Drupal-powered sites I was managing. Below are the steps I took with drush (do not forget to install the CLI version of PHP if you’re missing it; in my case aptitude install php5-cli was required):

 

Posted on October 7, 2010 in sysadmin

 


Adventures in the world of multi boot USB sticks


I’m in the process of creating the ultimate multi-boot USB stick for personal and professional use. So far I have experimented a little with Parted Magic, Clonezilla and grml. I want to note down the sites I have found useful so far:

Create your own save-your-ass multi-boot USB stick

Boot Multiple ISO from USB (MultiBoot USB)

 

Posted on August 31, 2010 in sysadmin

 

fail2ban: Defending Apache against brute force attacks to digest authentication protected pages


I’ve just realized that the default filters installed with fail2ban on Ubuntu GNU/Linux do not help you when you use Digest Authentication with Apache. To have the most basic protection against brute force attacks on a web service that uses digest authentication, you need to modify /etc/fail2ban/filter.d/apache-auth.conf. I have tried the suggestion given on the fail2ban wiki (http://www.fail2ban.org/wiki/index.php/Talk:Apache) and it seems to work:

Once you add the line above to the apache-auth.conf file, try entering a few wrong username / password combinations when you are presented with the authentication window, and then check whether fail2ban detects them (I’m assuming your log files are at their usual locations):


$ fail2ban-regex /var/log/apache2/error.log /etc/fail2ban/filter.d/apache-auth.conf

If it reports success and you can see that the relevant IP addresses are matched, then you can restart your fail2ban server and have one more level of protection.

 

Posted on August 21, 2010 in security, sysadmin