2024-08-01 - XML2TSV
====================
I have had fun using json2tsv. It simplifies the task of using JSON
data in AWK scripts.
I wanted something like json2tsv but for XML. Conventional wisdom
says that parsing XML with regex causes madness, and results in N+1
problems. You have been warned.
I wrote separate scripts to be used in a pipeline. Breaking it down
into multiple steps greatly simplifies the task of parsing XML in
AWK.
* cdata.awk converts the weird <![CDATA[ ... ]]> sections to a more
"regular" format that xml2tsv.awk can parse.
* xmlrem.awk removes the <!-- ... --> comment sections.
* xml2tsv.awk reads XML on stdin and prints TSV on stdout, replacing
tab with \t, carriage return with \r, and newline with \n.
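The escaping rule in the last item can be sketched in a few lines of AWK. This is only an illustration of the rule, not the code from xml2tsv.awk. The function name tsv_escape is made up for this example, and it assumes the whole of stdin is one value.

```shell
# Hypothetical helper, not taken from xml2tsv.awk:
# escape one value so that it fits in a single TSV field.
tsv_escape() {
    awk '
    # Collect all of stdin as one value, joining lines with real newlines.
    # The newline that ends the last line is a record terminator, not data.
    { buf = (NR == 1) ? $0 : buf "\n" $0 }
    END {
        gsub(/\t/, "\\t", buf)   # tab             -> backslash, t
        gsub(/\r/, "\\r", buf)   # carriage return -> backslash, r
        gsub(/\n/, "\\n", buf)   # newline         -> backslash, n
        print buf
    }'
}
```

For example, printf 'a\tb\nc' | tsv_escape prints the single line a\tb\nc with literal backslashes.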
Usage:
cdata.awk file.xml | xmlrem.awk | xml2tsv.awk >file.tsv
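A downstream AWK script can split the result on tabs and undo the escapes when it needs the original text. The sketch below makes no assumption about what each column of file.tsv means; it only numbers the fields. The name tsv_show is hypothetical. It also assumes a literal backslash in the data is not escaped, so a value that really contains backslash followed by t would be misread.

```shell
# Hypothetical consumer: print every field of every TSV row,
# with \t, \r, and \n escapes turned back into real characters.
tsv_show() {
    awk -F '\t' '
    function unescape(s) {
        gsub(/\\t/, "\t", s)   # backslash, t -> tab
        gsub(/\\r/, "\r", s)   # backslash, r -> carriage return
        gsub(/\\n/, "\n", s)   # backslash, n -> newline
        return s
    }
    {
        # Label each field as row.column
        for (i = 1; i <= NF; i++)
            printf "%d.%d: %s\n", NR, i, unescape($i)
    }'
}
```

Usage might look like: tsv_show <file.tsv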
I tested this script in gawk, mawk, and nawk, including my 16-bit DOS
build of nawk. This is just a toy, and I would not recommend using it
on large data sets.
Source code:
tags: bencollver,retrocomputing,technical