       # 2024-11-02 - No YAML, No Recutils
       
       Back around 2011 i wrote a private database, and at some point
       migrated it to a file-based format.  Each record is a YAML file.
       The data is presented in a vertical format with one line per field,
       plus some multi-line blocks defined by indentation.  Trivial to
       edit in any text editor.  I used Tcl and the yaml module from tcllib
       to process the data.
       
       Recently i found the No YAML web site and i decided it was time
       for a change.
       
 (HTM) No YAML
       
       I briefly considered CSV, TSV, and JSON.  They are all mature,
       standardized formats, but in my opinion they fall short when it
       comes to editing in a vertical format in a plain text editor.
       
       I looked at GNU Recutils, since several folks wrote about it in their
       phlogs.  Like my YAML files, recutils presents the data in a vertical
       format with one line per field, plus it can do multi-line blocks via
       line continuations.  The format is fine for my purposes.
       
       One of my requirements is that i want this to work on FreeDOS too.
       Recutils requires filesystem support for ACL, which is too fancy for
       DOS.  The format is simple enough, but the source code is
       surprisingly complex.  It does a fraction of what sqlite3 does, and
       in a less portable, less robust way.
       
       I tried making my own format based on ASCII control codes.  I could
       use Control-^ (the RS character) as the record separator, Control-_
       (the US character) as the unit separator AKA the field separator, and
       Control-X (the CAN or Cancel character) to discard all text since the
       beginning of the field.  This format works in ed(1) and the Calvin vi
       clone on DOS.  However, the control characters are a little ridiculous
       to look at and type in.  I did not want to foist such an eyesore on
       my future self.
       
 (DIR) csvtofsv also uses ASCII control codes as delimiters
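
       To make the control-code idea concrete, here is a minimal Python
       sketch of a reader for it.  It assumes that RS separates (rather
       than terminates) records, that US separates positional, unnamed
       fields, and that every other character, newlines included, is
       ordinary field text.  The name parse_control_coded is made up for
       illustration.

```python
# Hypothetical reader for the control-code format described above.
# Assumptions: RS separates (does not terminate) records, US separates
# positional fields, and CAN discards everything earlier in the field.
RS = "\x1e"   # Control-^, record separator
US = "\x1f"   # Control-_, unit (field) separator
CAN = "\x18"  # Control-X, cancel


def parse_control_coded(text):
    """Return a list of records, each a list of field strings."""
    records = []
    for raw_record in text.split(RS):
        fields = []
        for raw_field in raw_record.split(US):
            # Keep only the text after the last CAN in this field.
            fields.append(raw_field.rsplit(CAN, 1)[-1])
        records.append(fields)
    return records


if __name__ == "__main__":
    # A typo ("Alcie") is cancelled with CAN and retyped.
    data = "Alcie" + CAN + "Alice" + US + "42" + RS + "Bob" + US + "37"
    print(parse_control_coded(data))
```

       Splitting on RS first and US second mirrors the hierarchy of the
       ASCII separators, and rsplit on CAN means several cancels in one
       field still leave only the final retyped text.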
       
       I tried another format based on something i found online.  Like my
       YAML files, this format has one file per record and one line per
       field, presented in a vertical format.  The record separator is an
       empty line.  The field separator is the EOL (end of the line).  Each
       field has a name, a colon character, a space, and optionally a value.
       Any line that begins with whitespace is a line continuation from the
       previous field.
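
       In AWK, setting RS to the empty string already gives
       blank-line-separated records.  The sketch below does the same
       bookkeeping by hand in Python rather than the author's AWK, and
       its details are assumptions: a line that is empty or only
       whitespace ends a record, the name is everything before the first
       colon, and a continuation line is stripped and joined to the
       previous value with one space.  parse_records is a hypothetical
       name.

```python
def parse_records(text):
    """Parse 'name: value' lines into records, each a list of
    (name, value) pairs.

    Hypothetical sketch; assumes blank lines separate records and that a
    line starting with a space or tab continues the previous field.
    """
    records, fields = [], []
    for line in text.splitlines():
        if not line.strip():
            # Empty (or whitespace-only) line: end of the current record.
            if fields:
                records.append(fields)
                fields = []
        elif line[0] in " \t":
            # Continuation line: fold it into the previous field's value.
            if not fields:
                raise ValueError("continuation line before any field")
            name, value = fields[-1]
            piece = line.strip()
            fields[-1] = (name, f"{value} {piece}" if value else piece)
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError(f"missing colon: {line!r}")
            fields.append((name, value.strip()))
    if fields:
        records.append(fields)
    return records


if __name__ == "__main__":
    sample = ("name: Alice\nnote: First sentence.\n  Second sentence.\n"
              "\nname: Bob\nnote: \n")
    for record in parse_records(sample):
        print(record)
```

       This version treats every continuation the same way, so it does
       not distinguish the backslash blocks described later in this post.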
       
       This format is trivial to process in AWK.  No special parser
       required.  I converted my private database to this format.  Exporting
       the whole AWK database to CSV took 0.3 seconds, compared to 3 seconds
       in the Tcl & YAML version.
       
       I added one feature: inline blocks of multi-line text.  The block
       format is the same as the line continuation format, except the initial
       value is a backslash character.
       
       For example, here is a line continuation.
       
           fieldname: First sentence.
             Second sentence.
             Third sentence.
       
       When this value is read, each EOL and the indentation after it are
       replaced by a single space.  That's why this can also be represented
       without continuation.
       
           fieldname: First sentence. Second sentence. Third sentence.
       
       Here is an inline block of multi-line text.
       
           fieldname: \
             Line 1 of 3.
             Line 2 of 3.
             Line 3 of 3.
       
       When this value is read, the indentation is removed, but the
       EOL is preserved.  The value contains multiple lines.
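
       The difference between the two value forms comes down to how the
       continuation lines are joined.  Below is a minimal Python sketch of
       that rule, assuming the first-line value and its raw continuation
       lines have already been collected.  decode_value is a hypothetical
       name, not taken from the author's scripts, and stripping trailing
       whitespace along with the indentation is an assumption.

```python
def decode_value(first, continuation):
    """Decode a field value from its first-line text and the raw
    continuation lines that follow it (hypothetical helper).

    A lone backslash marks an inline block: leading indentation (and
    trailing whitespace) is removed but line breaks are kept.  Otherwise
    the value is folded: each line break plus indentation becomes a
    single space.
    """
    pieces = [line.strip() for line in continuation]
    if first == "\\":
        # Inline block: keep one output line per continuation line.
        return "\n".join(pieces)
    # Folded value: join everything with single spaces.
    return " ".join([first] + pieces) if first else " ".join(pieces)


if __name__ == "__main__":
    print(decode_value("First sentence.",
                       ["  Second sentence.", "  Third sentence."]))
    print(decode_value("\\",
                       ["  Line 1 of 3.", "  Line 2 of 3.", "  Line 3 of 3."]))
```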
       
       Time for me to shut up and show them the code.  Below are two
       small AWK scripts to convert from gopher lawn format to TSV
       and back.
       
 (TXT) lawn2tsv.awk
       
 (TXT) tsv2lawn.awk
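
       The scripts themselves are linked rather than shown here, so the
       following is only a Python sketch of what such a round trip might
       look like, not a transcription of lawn2tsv.awk or tsv2lawn.awk.
       It assumes records have already been parsed into dicts, puts the
       field names in a header row, and uses an invented escaping scheme
       (backslash, tab, and newline become \\, \t, and \n) because raw
       TSV cannot hold tabs or line breaks inside a value.  On the way
       back, multi-line values are written as inline blocks.  All
       function names are hypothetical.

```python
def tsv_escape(value):
    # Assumed escaping; backslash goes first so later escapes aren't doubled.
    return value.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")


def tsv_unescape(value):
    # Inverse of tsv_escape; an unknown escape keeps the escaped character.
    out, i = [], 0
    while i < len(value):
        if value[i] == "\\" and i + 1 < len(value):
            nxt = value[i + 1]
            out.append({"t": "\t", "n": "\n"}.get(nxt, nxt))
            i += 2
        else:
            out.append(value[i])
            i += 1
    return "".join(out)


def records_to_tsv(records):
    """records: list of dicts mapping field name -> value."""
    header = []
    for rec in records:
        for name in rec:
            if name not in header:
                header.append(name)  # first-seen order
    lines = ["\t".join(header)]
    for rec in records:
        lines.append("\t".join(tsv_escape(rec.get(name, "")) for name in header))
    return "\n".join(lines) + "\n"


def tsv_to_lawn(tsv):
    rows = tsv.rstrip("\n").split("\n")
    header = rows[0].split("\t")
    out = []
    for row in rows[1:]:
        values = [tsv_unescape(v) for v in row.split("\t")]
        for name, value in zip(header, values):
            if "\n" in value:
                # Multi-line value: write it as an inline block.
                out.append(f"{name}: \\")
                out.extend("  " + line for line in value.split("\n"))
            else:
                out.append(f"{name}: {value}")
        out.append("")  # an empty line separates records
    return "\n".join(out)


if __name__ == "__main__":
    recs = [{"name": "Alice", "note": "Line 1\nLine 2"}, {"name": "Bob"}]
    print(tsv_to_lawn(records_to_tsv(recs)))
```

       Missing fields come back as empty "name: " lines, so the round
       trip is not exact for sparse records.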
       
       p.s.
       
       In theory, if i wanted to migrate the data, i could use uncsv to
       convert between TSV and CSV.  GNU recutils can import and export
       CSV.
       
       I was told that the gopher lawn format resembles the Header Fields
       format in email standards.  See section 2.2 of RFC 5322.
       
 (TXT) gopher://gopher.32kb.net/0/rfc/rfc5322.txt
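
       One difference is worth noting.  RFC 5322 (section 2.2.3) unfolds a
       header field by deleting each CRLF that is immediately followed by
       whitespace, and it keeps that whitespace.  The lawn continuation
       examples above instead collapse the EOL and the indentation.  A
       minimal Python sketch of the RFC rule, with a hypothetical function
       name:

```python
import re


def unfold_rfc5322(header):
    """Unfold a header per RFC 5322 section 2.2.3: delete every CRLF that
    is immediately followed by a space or tab, keeping the whitespace."""
    return re.sub(r"\r\n(?=[ \t])", "", header)


if __name__ == "__main__":
    print(repr(unfold_rfc5322("Subject: This\r\n is a test")))
    # A two-space lawn-style indent survives as two spaces here.
    print(repr(unfold_rfc5322("fieldname: First sentence.\r\n  Second sentence.")))
```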
       
       The vCard format is also similar.  See section 6.10 of RFC 6350 for
       Extended Properties and Parameters.  I could have abused this format,
       but i think it is too complex for my purposes.
       
 (TXT) gopher://gopher.32kb.net/0/rfc/rfc6350.txt
       
       tags: bencollver,technical,unix
       
       # Tags
       
 (DIR) bencollver
 (DIR) technical
 (DIR) unix