# json2tsv: a JSON to TSV converter

Last modification on 2021-09-25

Convert JSON to TSV or separated output.

json2tsv reads JSON data from stdin.  By default it outputs one line per JSON
node in a TAB-Separated Value format.


## TAB-Separated Value format

The output format per line is:

        nodename<TAB>type<TAB>value<LF>

Newline, TAB and backslash characters are escaped as \n, \t and \\ in the
nodename and value fields.  Other control characters are removed.

The type field is a single byte and can be:

* a for array
* b for bool
* n for number
* o for object
* s for string
* ? for null

Filtering on the first field, nodename, is easy, for example using awk.


## Features

* Accepts all **valid** JSON.
* Designed to work well with existing UNIX programs like awk and grep.
* Straightforward and not many lines of code: about 475 lines of C.
* Few dependencies: C compiler (C99), libc.
* No need to learn a new (meta-)language for processing data.
* The parser supports decoding code points and UTF-16 surrogate pairs to UTF-8.
* For security reasons it does not output control characters to the terminal
  by default (but it has a -r option if needed).
* On OpenBSD it supports [pledge(2)](https://man.openbsd.org/pledge) for syscall
  restriction: pledge("stdio", NULL).
* Supports setting a different field separator and record separator with the -F
  and -R options.


## Cons

* The tool adds overhead, because the data is processed and filtered again from
  stdin after parsing.
* The parser does not do complete validation on numbers.
* The parser accepts some bad input such as invalid UTF-8
  (see [RFC8259 - 8.1. Character Encoding](https://tools.ietf.org/html/rfc8259#section-8.1)).
  json2tsv reads from stdin and does not make assumptions about a "closed
  ecosystem" as described in the RFC.
* The parser accepts some bad JSON input and "extensions"
  (see [RFC8259 - 9. Parsers](https://tools.ietf.org/html/rfc8259#section-9)).
* Encoded NUL bytes (\u0000) in strings are ignored
  (see [RFC8259 - 9. Parsers](https://tools.ietf.org/html/rfc8259#section-9)):
  "An implementation may set limits on the length and character contents of
  strings."
* The parser is not the fastest possible JSON parser (but also not the
  slowest).  For example, for ease of use all strings are decoded at the cost
  of performance, even though they may be unused.


## Why Yet Another JSON parser?

I wanted a tool that makes parsing JSON easier and works well from the shell,
similar to [jq](https://stedolan.github.io/jq/).

sed and grep often work well enough for matching some value using some regex
pattern, but they are not good enough to parse JSON correctly or to extract all
information: just like parsing HTML/XML using some regex is not good (enough)
or a good idea :P.

I didn't want to learn the specific
[meta-language](https://stedolan.github.io/jq/manual/#Builtinoperatorsandfunctions)
that jq has, and wanted something simpler.

Embedding such a query language for data aggregation is more efficient, but it
is also less simple.  In my opinion it is simpler to keep this separate and use
pattern-processing by awk or another filtering/aggregating program.

For the parser, there are many JSON parsers out there, like the efficient
[jsmn parser](https://github.com/zserge/jsmn), however a few of its behaviours
differ from what I wanted:

* jsmn buffers data as tokens, which is very efficient, but also a bit
  annoying as an API, because it requires another layer of code to interpret
  the tokens.
* jsmn does not decode strings by default, which is very efficient if you
  don't need all parts of the data.
* jsmn does not keep context of nested structures by default, so it may
  require writing custom utility functions for nested data.

This is why I went for a parser design that uses a single callback per "node",
keeps track of the current nested structure in a single array and passes that
array to the callback.


## Clone

        git clone git://git.codemadness.org/json2tsv


## Browse

You can browse the source-code at:

* https://git.codemadness.org/json2tsv/
* gopher://codemadness.org/1/git/json2tsv


## Download releases

Releases are available at:

* https://codemadness.org/releases/json2tsv/
* gopher://codemadness.org/1/releases/json2tsv


## Build and install

        $ make
        # make install


## Examples

A usage example that parses posts from the JSON API of
[reddit.com](https://www.reddit.com/) and formats them as a plain-text list
using awk:

        #!/bin/sh
        curl -s -H 'User-Agent:' 'https://old.reddit.com/.json?raw_json=1&limit=100' | \
        json2tsv | \
        awk -F '\t' '
        # print the previously collected post, if any.
        function show() {
                if (length(o["title"]) == 0)
                        return;
                print n ". " o["title"] " by " o["author"] " in r/" o["subreddit"];
                print o["url"];
                print "";
        }
        # start of a new post: print the previous one and reset the fields.
        $1 == ".data.children[].data" {
                show();
                n++;
                delete o;
        }
        # store a field of the current post by name: substr() strips
        # the 22-character ".data.children[].data." prefix.
        $1 ~ /^\.data\.children\[\]\.data\.[a-zA-Z0-9_]*$/ {
                o[substr($1, 23)] = $3;
        }
        # print the last post.
        END {
                show();
        }'


## References

* Sites:
  * [seriot.ch - Parsing JSON is a Minefield](http://seriot.ch/parsing_json.php)
  * [A comprehensive test suite for RFC 8259 compliant JSON parsers](https://github.com/nst/JSONTestSuite)
  * [json.org](https://json.org/)
* Current standards:
  * [RFC8259 - The JavaScript Object Notation (JSON) Data Interchange Format](https://tools.ietf.org/html/rfc8259)
  * [Standard ECMA-404 - The JSON Data Interchange Syntax (2nd edition, December 2017)](https://www.ecma-international.org/publications/standards/Ecma-404.htm)
* Historic standards:
  * [RFC7159 - The JavaScript Object Notation (JSON) Data Interchange Format (obsolete)](https://tools.ietf.org/html/rfc7159)
  * [RFC7158 - The JavaScript Object Notation (JSON) Data Interchange Format (obsolete)](https://tools.ietf.org/html/rfc7158)
  * [RFC4627 - The JavaScript Object Notation (JSON) Data Interchange Format (obsolete, original)](https://tools.ietf.org/html/rfc4627)