# json2tsv: a JSON to TSV converter

Last modification on 2021-09-25

Convert JSON to TSV or separated output.

json2tsv reads JSON data from stdin. By default it outputs each JSON node on
its own line in a TAB-Separated Value format.


## TAB-Separated Value format

The output format per line is:

    nodename<TAB>type<TAB>value<LF>

A newline, TAB and backslash are escaped as \n, \t and \\ in the nodename and
value fields. Other control-characters are removed.
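
The escaping rule above can be reversed when a consumer needs the original text. A minimal Python sketch, assuming only the three escapes named above (\n, \t and \\); `unescape_field` is a hypothetical helper name, not part of json2tsv:

```python
def unescape_field(field: str) -> str:
    """Undo the \\n, \\t and \\\\ escapes of a nodename or value field."""
    # Assumption: these three escapes are the only ones that occur.
    mapping = {"n": "\n", "t": "\t", "\\": "\\"}
    out = []
    i = 0
    while i < len(field):
        ch = field[i]
        if ch == "\\" and i + 1 < len(field) and field[i + 1] in mapping:
            out.append(mapping[field[i + 1]])
            i += 2  # skip the backslash and the escaped character
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

A single left-to-right pass is used instead of chained replacements, so an escaped backslash followed by the letter n is not mistaken for a newline.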

The type field is a single byte and can be:

* a for array
* b for bool
* n for number
* o for object
* s for string
* ? for null

Filtering on the first field, "nodename", is easy using awk, for example.
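
For readers who prefer something other than awk, the same kind of filtering can be sketched in Python. The sample lines below are hand-written for illustration and are not captured json2tsv output; `parse_record` and `select` are hypothetical names. Values are left in their escaped form.

```python
# Type codes as listed above.
TYPE_NAMES = {
    "a": "array", "b": "bool", "n": "number",
    "o": "object", "s": "string", "?": "null",
}

def parse_record(line: str):
    """Split one output line into (nodename, type name, value)."""
    # TABs inside fields are escaped, so a literal TAB is always a separator.
    nodename, typ, value = line.rstrip("\n").split("\t", 2)
    return nodename, TYPE_NAMES[typ], value

def select(lines, wanted: str):
    """Yield (type name, value) for every record whose nodename matches."""
    for line in lines:
        nodename, kind, value = parse_record(line)
        if nodename == wanted:
            yield kind, value

# Hand-written sample records (assumed shape, for illustration only).
sample = [
    ".name\ts\tjson2tsv\n",
    ".tags\ta\t\n",
    ".tags[]\ts\tawk\n",
    ".tags[]\ts\tgrep\n",
]
matches = list(select(sample, ".tags[]"))
```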


## Features

* Accepts all **valid** JSON.
* Designed to work well with existing UNIX programs like awk and grep.
* Straightforward and not many lines of code: about 475 lines of C.
* Few dependencies: C compiler (C99), libc.
* No need to learn a new (meta-)language for processing data.
* The parser supports code point decoding and UTF-16 surrogates to UTF-8.
* It does not output control-characters to the terminal for security reasons by
  default (but it has a -r option if needed).
* On OpenBSD it supports [pledge(2)](https://man.openbsd.org/pledge) for syscall restriction:
  pledge("stdio", NULL).
* Supports setting a different field separator and record separator with the -F
  and -R options.
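
As an illustration of what decoding UTF-16 surrogates to UTF-8 involves, here is a Python sketch of the standard arithmetic. It is not taken from the json2tsv source, and both function names are hypothetical.

```python
def decode_surrogate_pair(hi: int, lo: int) -> int:
    """Combine a UTF-16 high and low surrogate into one code point."""
    if not (0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF):
        raise ValueError("not a valid UTF-16 surrogate pair")
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)

def codepoint_to_utf8(cp: int) -> bytes:
    """Encode a code point (0..0x10FFFF) as 1 to 4 UTF-8 bytes."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])

# The JSON escape sequence \ud83d\ude00 is one code point, U+1F600.
cp = decode_surrogate_pair(0xD83D, 0xDE00)
encoded = codepoint_to_utf8(cp)
```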


## Cons

* Using the tool adds overhead: its output is processed and filtered from
  stdin by another program after parsing.
* The parser does not do complete validation on numbers.
* The parser accepts some bad input such as invalid UTF-8
  (see [RFC8259 - 8.1. Character Encoding](https://tools.ietf.org/html/rfc8259#section-8.1)).
  json2tsv reads from stdin and does not make assumptions about a "closed
  ecosystem" as described in the RFC.
* The parser accepts some bad JSON input and "extensions"
  (see [RFC8259 - 9. Parsers](https://tools.ietf.org/html/rfc8259#section-9)).
* Encoded NUL bytes (\u0000) in strings are ignored
  (see [RFC8259 - 9. Parsers](https://tools.ietf.org/html/rfc8259#section-9)):
  "An implementation may set limits on the length and character contents of
  strings."
* The parser is not the fastest possible JSON parser (but also not the
  slowest). For example: for ease of use, at the cost of performance, all
  strings are decoded, even though they may be unused.


## Why Yet Another JSON parser?

I wanted a tool that makes parsing JSON easier and works well from the shell,
similar to [jq](https://stedolan.github.io/jq/).

sed and grep often work well enough for matching some value using some regex
pattern, but they are not good enough to parse JSON correctly or to extract all
information: just like parsing HTML/XML using some regex is not good (enough)
or a good idea :P.

I didn't want to learn a new specific [meta-language](https://stedolan.github.io/jq/manual/#Builtinoperatorsandfunctions), which jq has, and wanted
something simpler.

While it is more efficient to embed this query language for data aggregation,
it is also less simple. In my opinion it is simpler to separate this and use
pattern-processing by awk or another filtering/aggregating program.

For the parser, there are many JSON parsers out there, like the efficient
[jsmn parser](https://github.com/zserge/jsmn), however it lacks a few parser behaviours I want to have:

* jsmn buffers data as tokens, which is very efficient, but also a bit
  annoying as an API as it requires another layer of code to interpret the
  tokens.
* jsmn does not handle decoding strings by default, which is very efficient
  if you don't need parts of the data.
* jsmn does not keep context of nested structures by default, so it may require
  writing custom utility functions for nested data.

This is why I went for a parser design that uses a single callback per "node"
type and keeps track of the current nested structure in a single array and
emits that.
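
The callback-plus-path-array idea can be sketched in Python. This is not the json2tsv parser: it walks a value already parsed by Python's `json` module instead of streaming, and `walk` and `emit` are hypothetical names. The `.key` and `[]` nodename parts follow the nodenames used elsewhere on this page; the empty root nodename, the empty value for containers and null, and the number formatting are assumptions.

```python
import json

def walk(value, emit, path=None):
    """Call emit(nodename, type, value) once per node, depth-first."""
    if path is None:
        path = []  # one list tracks the current nesting
    name = "".join(path)
    if isinstance(value, dict):
        emit(name, "o", "")
        for key, child in value.items():
            path.append("." + key)
            walk(child, emit, path)
            path.pop()
    elif isinstance(value, list):
        emit(name, "a", "")
        for child in value:
            path.append("[]")
            walk(child, emit, path)
            path.pop()
    elif isinstance(value, bool):  # test bool first: bool is an int subclass
        emit(name, "b", "true" if value else "false")
    elif isinstance(value, (int, float)):
        emit(name, "n", str(value))
    elif value is None:
        emit(name, "?", "")
    else:
        emit(name, "s", value)

rows = []
doc = json.loads('{"a": [1, "x"], "b": null, "c": true}')
walk(doc, lambda nodename, typ, val: rows.append((nodename, typ, val)))
```

Because the path list is pushed and popped around each child, the callback always sees the full nodename without the walker building a separate context object per level.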


## Clone

    git clone git://git.codemadness.org/json2tsv


## Browse

You can browse the source-code at:

* https://git.codemadness.org/json2tsv/
* gopher://codemadness.org/1/git/json2tsv


## Download releases

Releases are available at:

* https://codemadness.org/releases/json2tsv/
* gopher://codemadness.org/1/releases/json2tsv


## Build and install

    $ make
    # make install


## Examples

A usage example to parse posts of the JSON API of [reddit.com](https://www.reddit.com/) and format them
to a plain-text list using awk:

    #!/bin/sh
    curl -s -H 'User-Agent:' 'https://old.reddit.com/.json?raw_json=1&limit=100' | \
    json2tsv | \
    awk -F '\t' '
    function show() {
        if (length(o["title"]) == 0)
            return;
        print n ". " o["title"] " by " o["author"] " in r/" o["subreddit"];
        print o["url"];
        print "";
    }
    $1 == ".data.children[].data" {
        show();
        n++;
        delete o;
    }
    $1 ~ /^\.data\.children\[\]\.data\.[a-zA-Z0-9_]*$/ {
        o[substr($1, 23)] = $3;
    }
    END {
        show();
    }'


## References

* Sites:
  * [seriot.ch - Parsing JSON is a Minefield](http://seriot.ch/parsing_json.php)
  * [A comprehensive test suite for RFC 8259 compliant JSON parsers](https://github.com/nst/JSONTestSuite)
  * [json.org](https://json.org/)
* Current standard:
  * [RFC8259 - The JavaScript Object Notation (JSON) Data Interchange Format](https://tools.ietf.org/html/rfc8259)
  * [Standard ECMA-404 - The JSON Data Interchange Syntax (2nd edition, December 2017)](https://www.ecma-international.org/publications/standards/Ecma-404.htm)
* Historic standards:
  * [RFC7159 - The JavaScript Object Notation (JSON) Data Interchange Format (obsolete)](https://tools.ietf.org/html/rfc7159)
  * [RFC7158 - The JavaScript Object Notation (JSON) Data Interchange Format (obsolete)](https://tools.ietf.org/html/rfc7158)
  * [RFC4627 - The JavaScript Object Notation (JSON) Data Interchange Format (obsolete, original)](https://tools.ietf.org/html/rfc4627)