RSS

Tag Archives: AWK

Data Processing Resources: Command-line Interface (CLI) for CSV, TSV, JSON, and XML


UPDATE on 24-Oct-2018: Added gron for JSON processing.

Sometimes you don’t want pandas, tidyverse, Excel, or PostgreSQL. You know they are very powerful and flexible, you know if you’re already using them daily you can utilize them. But sometimes you just want to be left alone with your CVS, TSV, JSON and XML files, process them quickly on the command line, and get done with it. And you want something a little more specialized than awk , cut, and sed.

This list is by no means complete and authoritative. I compiled this as a reference that I can come back later. If you have other suggestions that are according to the spirit of this article, feel free to share them by writing a comment at the end. Without further ado, here’s my list:

  • xsv: A fast CSV command line toolkit written by the author of ripgrep. It’s useful for indexing, slicing, analyzing, splitting and joining CSV files.
  • q: run SQL directly on CSV or TSV files.
  • csvkit: a suite of command-line tools for converting to and working with CSV, the king of tabular file formats.
  • textql: execute SQL against structured text like CSV or TSV.
  • miller: like awk, sed, cut, join, and sort for name-indexed data such as CSV, TSV, and tabular JSON. You get to work with your data using named fields, without needing to count positional column indices.
  • agate: a Python data analysis library that is optimized for humans instead of machines. It is an alternative to numpy and pandas that solves real-world problems with readable code.

Honorable mentions:

  • SQLite: import a CSV File Into an SQLite Table, and use plain SQL to query it.
  • csv-mode for Emacs: sort, align, transpose, and manage rows and fields of CSV files.
  • lnav: the Log Navigator. See the tutorial in Linux Magazine.
  • jq: this one is THE tool for processing JSON on the command-line. It’s like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.
  • gron: transforms JSON into discrete assignments to make it easier to grep for what you want and see the absolute path to it. (Why shouldn’t you just use jq?)
  • jid: JSON Incremental Digger, drill down JSON interactively by using filtering queries like jq.
  • jiq: jid with jq.
  • JMESPath tutorial: a query language for JSON. You can extract and transform elements from a JSON document. There are a lot of implementations at http://jmespath.org/libraries.html and the CLI implementation is jp.

Finally the CLI for XML:

Advertisements
 
1 Comment

Posted by on July 10, 2018 in Linux, Programlama

 

Tags: , , , , , , , , , ,

GNU/Linux command line tip of the day: sum of numbers in a column


More often than not, I need to quickly need to see the sum of a column of numbers when I’m doing some processing on the GNU/Linux command line. For the sake of simplicity, let’s assume that you have the following output from some command line pipe:
Read the rest of this entry »

 
3 Comments

Posted by on May 28, 2013 in awk, Linux

 

Tags: , , , , ,