
Data Processing Resources: Command-line Interface (CLI) for CSV, TSV, JSON, and XML


  • 2020-05-27: Added rows for tabular data processing
  • 2020-05-21: Added huniq
  • 2020-01-23: Added VisiData CLI data viewer and plotter
  • 2020-01-10: Added fx interactive JSON viewer and processor.
  • 2019-02-08: Added BigBash It!
  • 2019-01-03: Added GNU datamash for CSV processing
  • 2018-10-24: Added gron for JSON processing.

Sometimes you don’t want pandas, tidyverse, Excel, or PostgreSQL. They are very powerful and flexible, and if you’re already using them daily, by all means use them. But sometimes you just want to be left alone with your CSV, TSV, JSON, and XML files, process them quickly on the command line, and be done with it. And you want something a little more specialized than awk, cut, and sed.

This list is by no means complete or authoritative. I compiled it as a reference I can come back to later. If you have other suggestions in the spirit of this article, feel free to share them in a comment at the end. Without further ado, here’s my list:

  • xsv: A fast CSV command line toolkit written by the author of ripgrep. It’s useful for indexing, slicing, analyzing, splitting and joining CSV files.
  • q: run SQL directly on CSV or TSV files.
  • csvkit: a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.
  • textql: execute SQL against structured text like CSV or TSV.
  • miller: like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON. You get to work with your data using named fields, without needing to count positional column indices.
  • agate: a Python data analysis library that is optimized for humans instead of machines. It is an alternative to numpy and pandas that solves real-world problems with readable code.
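Tools like q and textql boil down to one idea: load the CSV into an in-memory SQL engine, then query it. A minimal sketch of that idea with Python’s standard library — the table and data below are made up for illustration:

```python
import csv
import io
import sqlite3

# Made-up sample CSV, standing in for a file on disk.
data = io.StringIO("name,dept,salary\nada,eng,120\ngrace,eng,130\nalan,math,110\n")

rows = list(csv.DictReader(data))  # each row becomes a dict keyed by header
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE people (name TEXT, dept TEXT, salary INTEGER)")
con.executemany("INSERT INTO people VALUES (:name, :dept, :salary)", rows)

# The kind of query q or textql would run directly against the file:
result = con.execute(
    "SELECT dept, AVG(salary) FROM people GROUP BY dept ORDER BY dept"
).fetchall()
print(result)  # [('eng', 125.0), ('math', 110.0)]
```

The dedicated tools spare you the boilerplate: they infer the schema from the header and types from the values, so the whole sketch collapses to a single command.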

Honorable mentions:

  • GNU datamash: a command-line program which performs basic numeric, textual and statistical operations on input textual data files. See examples & one-liners.
  • SQLite: import a CSV file into an SQLite table, and use plain SQL to query it.
  • csv-mode for Emacs: sort, align, transpose, and manage rows and fields of CSV files.
  • lnav: the Log Navigator. See the tutorial in Linux Magazine.
  • jq: this one is THE tool for processing JSON on the command-line. It’s like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.
  • gron: transforms JSON into discrete assignments to make it easier to grep for what you want and see the absolute path to it. (Why shouldn’t you just use jq?)
  • jid: JSON Incremental Digger, drill down JSON interactively by using filtering queries like jq.
  • jiq: jid with jq.
  • fx: interactive JSON viewer and processor.
  • JMESPath tutorial: a query language for JSON that lets you extract and transform elements from a JSON document. There are implementations in many languages, and the CLI implementation is jp.
  • BigBash It!: converts your SQL SELECT queries into a self-contained Bash one-liner that can be executed on almost any *nix device, to run quick analyses or crunch gigabytes of log files in CSV format. Perfectly suited for big-data tasks on your local machine.
  • VisiData: an interactive multi-tool for tabular data. In addition to CSV it supports many other formats, and it can plot inside the terminal. You can use it to convert between formats, as an interactive replacement for grep, awk, sed, cut, sort, and uniq, to create ad-hoc data pipelines on the command line, and as a generic utility with many automation capabilities.
  • huniq: a replacement for sort | uniq, optimized for speed (about 10x faster) when sorting is not needed. huniq replaces sort | uniq (or sort -u with GNU sort), and huniq -c replaces sort | uniq -c. The output order is stable in normal mode, but not in -c/count mode.
  • rows: No matter in which format your tabular data is: rows will import it, automatically detect types and give you high-level Python objects so you can start working with the data instead of trying to parse it. It is also locale-and-unicode aware. More information is in the documentation. The author of miller likes rows, too!
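To make the huniq entry concrete, here is a small Python sketch (with made-up input lines) of the two behaviors it replaces: order-preserving deduplication for sort | uniq, and occurrence counting for sort | uniq -c:

```python
from collections import Counter

# Made-up input lines, standing in for a log stream on stdin.
lines = ["a", "b", "a", "c", "b", "a"]

# Equivalent of `huniq`: keep the first occurrence of each line,
# preserving input order -- no sort pass needed.
seen = set()
deduped = [x for x in lines if not (x in seen or seen.add(x))]
print(deduped)  # ['a', 'b', 'c']

# Equivalent of `huniq -c` / `sort | uniq -c`: count occurrences.
counts = Counter(lines)
print(counts["a"])  # 3
```

Skipping the sort is exactly where the speedup comes from: a hash set does the dedup in one pass instead of O(n log n) sorting.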

Finally, the CLI for XML:


Posted on July 10, 2018 in Linux, Programming



How to start your calendar analytics using Microsoft Graph API

As a Data Officer who started working at a big manufacturing company a few months ago, I attended a lot of meetings in many locations. After a while, wearing my analytical hat, I asked myself: how can I easily see

  • how many meetings I’ve attended
  • how many people I’ve met
  • how many locations I’ve been to

Since my company had already switched to Microsoft Office 365, I came across a very nice unified system: the Microsoft Graph API. According to the description in its official overview:

You can use the Microsoft Graph API to interact with the data of millions of users in the Microsoft cloud. Use Microsoft Graph to build apps for organizations and consumers that connect to a wealth of resources, relationships, and intelligence, all through a single endpoint. Microsoft Graph is made up of resources connected by relationships. For example, a user can be connected to a group through a memberOf relationship, and to another user through a manager relationship. Your app can traverse these relationships to access these connected resources and perform actions on them through the API. You can also get valuable insights and intelligence about the data from Microsoft Graph. For example, you can get the popular files trending around a particular user, or get the most relevant people around a user.
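Once you have an access token, the three questions above reduce to a few lines of counting over the calendar events the API returns. A sketch in Python: the payload below is a made-up sample shaped like a Graph events response (the real data would come from GET https://graph.microsoft.com/v1.0/me/events with an OAuth bearer token), using the standard event fields subject, location.displayName, and attendees[].emailAddress.address:

```python
import json

# Made-up sample shaped like a Microsoft Graph /me/events response.
payload = json.loads("""
{"value": [
  {"subject": "Standup",
   "location": {"displayName": "Room A"},
   "attendees": [{"emailAddress": {"address": "bob@example.com"}}]},
  {"subject": "Review",
   "location": {"displayName": "Room B"},
   "attendees": [{"emailAddress": {"address": "bob@example.com"}},
                 {"emailAddress": {"address": "eve@example.com"}}]}
]}
""")

events = payload["value"]
meetings = len(events)  # how many meetings I've attended
people = {a["emailAddress"]["address"]
          for e in events for a in e["attendees"]}   # distinct attendees
locations = {e["location"]["displayName"] for e in events}  # distinct rooms
print(meetings, len(people), len(locations))  # 2 2 2
```

In practice you would page through the response (Graph returns an @odata.nextLink for the next page) and widen the date range, but the counting logic stays the same.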



Posted on March 23, 2018 in business, Programming

