
Category Archives: Programlama

Programming category

Data Processing Resources: Command-line Interface (CLI) for CSV, TSV, JSON, and XML


Sometimes you don’t want pandas, tidyverse, Excel, or PostgreSQL. You know they are powerful and flexible, and if you already use them daily, they will do the job. But sometimes you just want to be left alone with your CSV, TSV, JSON, and XML files, process them quickly on the command line, and be done with it. And you want something a little more specialized than awk, cut, and sed.

This list is by no means complete or authoritative. I compiled it as a reference that I can come back to later. If you have other suggestions in the spirit of this article, feel free to share them in a comment at the end. Without further ado, here’s my list:

  • xsv: A fast CSV command line toolkit written by the author of ripgrep. It’s useful for indexing, slicing, analyzing, splitting and joining CSV files.
  • q: run SQL directly on CSV or TSV files.
  • csvkit: a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.
  • textql: execute SQL against structured text like CSV or TSV.
  • miller: like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON. You get to work with your data using named fields, without needing to count positional column indices.
  • agate: a Python data analysis library that is optimized for humans instead of machines. It is an alternative to numpy and pandas that solves real-world problems with readable code.

Honorable mentions:

  • SQLite: import a CSV file into an SQLite table, and use plain SQL to query it.
  • csv-mode for Emacs: sort, align, transpose, and manage rows and fields of CSV files.
  • lnav: the Log Navigator. See the tutorial in Linux Magazine.
  • jq: this one is THE tool for processing JSON on the command-line. It’s like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.
  • JMESPath (see its tutorial): a query language for JSON. You can extract and transform elements from a JSON document. Implementations for many languages are listed at http://jmespath.org/libraries.html, and the CLI implementation is jp.
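To make the SQLite entry above a bit more concrete, here is a minimal sketch of the same idea using only Python's standard library: load CSV text into a table, then query it with plain SQL. The helper name `load_csv_into_sqlite`, the table name, and the sample data are all made up for illustration; the command-line tools in the list do this with far less ceremony.

```python
import csv
import io
import sqlite3


def load_csv_into_sqlite(conn, table, csv_text):
    """Create `table` from CSV text whose first row holds the column names."""
    rows = csv.reader(io.StringIO(csv_text))
    header = next(rows)
    # Quote identifiers so that header names containing spaces still work.
    columns = ", ".join('"{}"'.format(name.replace('"', '""')) for name in header)
    placeholders = ", ".join("?" for _ in header)
    quoted_table = '"{}"'.format(table.replace('"', '""'))
    conn.execute("CREATE TABLE {} ({})".format(quoted_table, columns))
    # The remaining rows of the reader are inserted as text values.
    conn.executemany(
        "INSERT INTO {} VALUES ({})".format(quoted_table, placeholders), rows
    )


# Hypothetical sample data, standing in for a real CSV file.
SAMPLE_CSV = "name,team,score\nada,red,10\nbob,blue,7\ncy,red,5\n"

conn = sqlite3.connect(":memory:")
load_csv_into_sqlite(conn, "results", SAMPLE_CSV)

# Values arrive as text, so cast before doing arithmetic on them.
query = """
    SELECT team, COUNT(*) AS members, SUM(CAST(score AS INTEGER)) AS total
    FROM results
    GROUP BY team
    ORDER BY team
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

For a real file, the same helper could be fed the contents of the file instead of `SAMPLE_CSV`.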

Finally the CLI for XML:

Posted on July 10, 2018 in Linux, Programlama

How to start your calendar analytics using Microsoft Graph API


As a Data Officer who started to work in a big manufacturing company a few months ago, I attended a lot of meetings in many locations. After a while, wearing my analytical hat, I asked myself: how can I easily see

  • how many meetings I’ve attended
  • how many people I’ve met
  • how many locations I’ve been to

Since my company had already switched to Microsoft Office 365, I came across a very nice unified system: the Microsoft Graph API. According to its official overview:

You can use the Microsoft Graph API to interact with the data of millions of users in the Microsoft cloud. Use Microsoft Graph to build apps for organizations and consumers that connect to a wealth of resources, relationships, and intelligence, all through a single endpoint https://graph.microsoft.com. Microsoft Graph is made up of resources connected by relationships. For example, a user can be connected to a group through a memberOf relationship, and to another user through a manager relationship. Your app can traverse these relationships to access these connected resources and perform actions on them through the API. You can also get valuable insights and intelligence about the data from Microsoft Graph. For example, you can get the popular files trending around a particular user, or get the most relevant people around a user.
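As a rough sketch of the three questions above, the function below summarizes a list of calendar events that have already been fetched. It assumes each event is a dictionary shaped like the Microsoft Graph event resource, with an `attendees` list whose entries carry `emailAddress.address`, and a `location` object with a `displayName`. Authentication and the actual HTTP call are left out entirely, and the function name, the `me` parameter, and the sample events are hypothetical.

```python
def summarize_events(events, me):
    """Count meetings, distinct other attendees, and distinct locations.

    `events` is a list of dicts shaped like Microsoft Graph event resources
    (an assumption of this sketch); `me` is your own e-mail address, which
    is excluded from the people count.
    """
    people = set()
    locations = set()
    for event in events:
        for attendee in event.get("attendees", []):
            address = attendee.get("emailAddress", {}).get("address", "").lower()
            if address and address != me.lower():
                people.add(address)
        place = (event.get("location") or {}).get("displayName", "").strip()
        if place:
            locations.add(place)
    return {
        "meetings": len(events),
        "people": len(people),
        "locations": len(locations),
    }


# Hypothetical sample events; real ones would come from the calendar API.
sample_events = [
    {
        "subject": "Kickoff",
        "attendees": [
            {"emailAddress": {"address": "me@example.com"}},
            {"emailAddress": {"address": "ann@example.com"}},
        ],
        "location": {"displayName": "Plant A"},
    },
    {
        "subject": "Review",
        "attendees": [
            {"emailAddress": {"address": "Ann@example.com"}},
            {"emailAddress": {"address": "bo@example.com"}},
        ],
        "location": {"displayName": "Plant B"},
    },
    {
        "subject": "One-on-one",
        "attendees": [{"emailAddress": {"address": "bo@example.com"}}],
        "location": {"displayName": ""},
    },
]

summary = summarize_events(sample_events, me="me@example.com")
print(summary)
```

Addresses are compared case-insensitively so that the same person is not counted twice, and events without a location name do not add to the location count.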

Read the rest of this entry »

 
Posted on March 23, 2018 in business, Programlama

HOW-TO: Poor Man’s Quiz Scorer in Emacs Lisp


Sometimes it’s about small data.

Recently I’ve been studying a topic using a book, and at the end of each chapter there are quizzes of 20-25 questions. My method was to open a text buffer in Emacs and note down, one per line, each question number with my answer. After finishing, I would go to the answer key and mark correct answers with Y and wrong ones with N:

Y 1- A,B,C
Y 2- C
N 3- D
N 4- C
Y 5- A
Y 6- B,C,D
Y 7- B

That was all fine, but I found myself counting the correct answers and calculating my score as a percentage by hand. I could of course quickly run `M-x count-matches` (or `how-many`) to see how many correct and wrong answers I had, but doing this for more than a few tests became tedious. Therefore, Emacs Lisp to the rescue!

With a simple Emacs Lisp function assigned to the F12 function key, I can now simply hit F12 and see my score percentage in Emacs:

“Your test score percentage: 71.42857142857143”
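The Emacs Lisp function itself is not reproduced here. Purely to illustrate the arithmetic behind that message, here is a minimal sketch in Python rather than Emacs Lisp. It is not the function described above, and the name `score_percentage` is made up; it only assumes the buffer layout shown earlier, where each graded line starts with Y or N.

```python
def score_percentage(text):
    """Return the percentage of graded lines that are marked Y.

    Each non-empty line is expected to start with a Y or N marker,
    followed by the question number and the answer.
    """
    marks = [line.split()[0] for line in text.splitlines() if line.strip()]
    correct = sum(1 for mark in marks if mark == "Y")
    graded = sum(1 for mark in marks if mark in ("Y", "N"))
    if graded == 0:
        raise ValueError("no lines marked Y or N")
    return 100.0 * correct / graded


# The example buffer from above: five correct answers out of seven.
BUFFER = """\
Y 1- A,B,C
Y 2- C
N 3- D
N 4- C
Y 5- A
Y 6- B,C,D
Y 7- B
"""

score = score_percentage(BUFFER)
print("Your test score percentage: {}".format(score))
```

Lines that start with anything other than Y or N are ignored, so stray notes in the buffer do not distort the score.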

There are of course other ways to solve this straightforward problem: for example, you can run a shell script on your Emacs (or Vim) buffer, or on a plain text file. But I like that this solution is self-contained in Emacs, and that it will probably keep running as intended 30 years from now in a newer Emacs version. I also like that my preferred tool for dealing with textual data gives me the flexibility to program it any way I want. It might have its drawbacks, its architecture might be showing its age, and there are newer, shinier tools and specialized IDEs for the various programming languages I use. All of that notwithstanding, I still find it beautiful that an editor released when I was born, one that has helped me with many tasks for decades, will still be with me for decades to come.

 
Posted on February 26, 2017 in Emacs, Programlama

One year with “Haskell Programming from First Principles”


My relationship with the Haskell programming language, and my efforts to learn it, have had their ups and downs over the years. According to my memory and the archives of my blog, my first attempts were around 2005 – 2006, more than 12 years ago. Back then, apart from a few books written by university professors and some Wiki-based books, I couldn’t find much high-quality material for beginners, so my efforts didn’t last very long. A few years later, I heard that a new book, “Real World Haskell”, was being written. I was excited once again, and I even made a few comments here and there as the book was being written. Unfortunately, life happened, and I couldn’t spend much time on that nice book either.

Fast forward to the end of 2015: I was working at a company in Ghent, Belgium, where some Haskell experts were trying things out in an industrial storage system development environment. The teams I was part of had nothing to do with Haskell, though; my daily job was almost always about Python, Bash, ActionScript, Java, and some Scala. Nevertheless, being in such an environment rekindled my curiosity, and I decided to look around to see if there were any new Haskell books aimed at people who hadn’t used the language before.

Luckily, I heard about the book “Haskell Programming from First Principles”, and I decided to give it a try. I bought the book and started to read and study it at the beginning of 2016. Since Haskell was not used at all in my daily job, I could study the book only in my spare time, so it took me about one year to finish it, doing most of the exercises. Read the rest of this entry »

 
 


Faster, RegEx! Match! Match! (Which Regular Expression Utility is the Fastest?)


When it comes to dealing with text data, regular expressions are the bread and butter of data processing, as well as programming, most of the time. Hardly a day or two passes without using grep or a similar tool. Until recently, I thought the field of regular expressions and related tools was very useful but boring, with no real innovation. It turns out that I was wrong!

There are two relatively new players in town: ICGrep and ripgrep.

ICGrep uses a new, parallel bitstream technology, developed by Dr. Robert D. Cameron at Simon Fraser University. It claims to be super fast for many text search and processing tasks. ICGrep is available for download from http://www.icgrep.com/downloads.htm as a binary executable for OS X / macOS. Its source code is also available if you want to build it for your operating system.

ripgrep is developed mainly by Andrew Gallant and other open source contributors, and its source code is available at https://github.com/BurntSushi/ripgrep. It is written in the Rust programming language, and claims to be very fast, Unicode-ready, and smart enough to replace the Silver Searcher (ag) and ack.

Let’s see how they compare to the venerable regular expression utilities that we all know and love. Read the rest of this entry »

 
Posted on November 3, 2016 in Linux, Programlama, sysadmin

How to decrease the Maven build time of your Java projects


There are good resources on the web that show how you can decrease the Maven build times of Java projects, but since I couldn’t find the following information in most of them, I wanted to note it down for future reference. One of the simplest things you can do to decrease the Maven build time is to add the following to your command line:

-Dmaven.javadoc.skip=true

But is it worth it? Let’s check. Take an example project such as Hadoop that is about 2 million lines of source code. Without skipping the generation of Javadoc, Read the rest of this entry »

 
Posted on May 23, 2016 in java, Programlama

Turkish Mode for Emacs is now available as a package via MELPA


Turkish Mode for Emacs, developed by Deniz Yüret, is now available as a package via MELPA. This is for people trying to type Turkish documents on a U.S. keyboard using Emacs. The program provides a `turkish-mode` in which the correct Turkish accents are added to the ASCII version of the last word typed each time the user hits space. If you are using a recent stable version of Emacs that lets you use the Emacs package manager, and you’ve added MELPA as a repository, installing it is as easy as running:

M-x package-install turkish

and then putting the following line in your init file:

(require 'turkish)

Once you have done that, you can toggle Turkish mode in any Emacs session with:

M-x turkish-mode

The same program has been ported to many different languages and is available on many platforms, such as a Python package, a Java package, a Perl CPAN package, an Ubuntu PPA package, a web application, a Chrome plug-in, a Firefox add-on, and a Safari add-on.

 
Posted on March 29, 2016 in Emacs, Programlama