2024-08-01 - XML2TSV
====================
I have had fun using json2tsv.  It simplifies the task of using JSON
data in AWK scripts.  I wanted something like json2tsv but for XML.

Conventional wisdom says that parsing XML with regex causes madness,
and results in N+1 problems.  You have been warned.

I wrote separate scripts to be used in a pipeline.  Breaking the work
down into multiple steps greatly simplifies the task of parsing XML
in AWK.

* cdata.awk converts the weird CDATA sections into a more "regular"
  format that xml2tsv.awk can parse.
* xmlrem.awk removes the comment sections.
* xml2tsv.awk reads XML on stdin and prints TSV on stdout, replacing
  tab with \t, carriage return with \r, and newline with \n.

Usage:

    cdata.awk file.xml | xmlrem.awk | xml2tsv.awk >file.tsv

I tested these scripts in gawk, mawk, and nawk, including my 16-bit
DOS build of nawk.  This is just a toy and I would not recommend
using it on large data sets.

Source code:

tags: bencollver,retrocomputing,technical
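
The escaping rule mentioned above (tab becomes \t, carriage return
becomes \r, newline becomes \n) can be sketched in a few lines of AWK.
This is only an illustration and is not taken from xml2tsv.awk: the
names tsv_escape and escape are hypothetical, and the sketch assumes
that all of stdin is one value to be squeezed into a single TSV field.
It also leaves literal backslashes alone, which may or may not match
what the real script does.

```shell
#!/bin/sh
# Hypothetical helper, not part of xml2tsv.awk.
# Reads all of stdin as one value and prints it as one escaped TSV field.
tsv_escape() {
    awk '
    # Replace the three characters that would break a TSV row.
    # "\\t" in AWK source is a backslash followed by the letter t.
    function escape(s) {
        gsub(/\t/, "\\t", s)
        gsub(/\r/, "\\r", s)
        gsub(/\n/, "\\n", s)
        return s
    }
    # AWK splits input on newlines, so put them back between records.
    NR > 1 { value = value "\n" }
    { value = value $0 }
    END { if (NR > 0) print escape(value) }
    '
}

# Example: a value containing a tab, a carriage return, and a newline.
printf 'a\tb\r\nc\n' | tsv_escape
# prints: a\tb\r\nc
```

Doing the replacement in one small function keeps the rule in a single
place, so every field that gets printed goes through the same escaping.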