       Show HN: Paradict - Streamable multi-format serialization with
       schema
        
       Hi HN! I'm Alex, a tech enthusiast. I'm excited to show you
       Paradict (https://github.com/pyrustic/paradict), my solution for
       streamable multi-format serialization.

       Although JSON, YAML, and TOML are all human-readable, they serve
       different purposes. For example, TOML is specifically designed for
       configuration files while JSON is used as a data interchange
       format.

       Sometimes an initiative to create a binary version of JSON arises
       and, as far as I know, it ends with a unidirectional mapping of
       datatypes.

       There is no silver bullet, yet one coherent solution built from
       scratch that addresses multi-format (binary and textual)
       serialization and configuration files would be a step forward.

       Earlier this year, I
       accidentally designed a textual data format to represent complex
       data structures inside a document divided into sections. The
       project, namely Jesth (Just Extract Sections Then Hack'em),
       generated an interesting discussion on HN
       (https://news.ycombinator.com/item?id=35991018).

       Out of curiosity, I ran some benchmarks using Jesth, JSON and
       MessagePack, with and without Gzip compression against a large
       JSON file downloaded from the web. The benchmarking gave me
       insights that led to the decision to evolve Jesth's ideas into a
       new multi-format serialization solution.

       I designed and built Paradict from scratch to serialize and
       deserialize a dictionary data structure. Although Paradict's root
       data structure is a dictionary, lists, sets, and dictionaries can
       be nested within it at arbitrary depth.

       A Paradict dictionary can be populated with strings, binary data,
       integers, floats, complex numbers, booleans, dates, times,
       datetimes, comments, extension objects, and grids (matrices).
       There is also a schema-based validation mechanism that can contain
       programmatic checkers.
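
       To illustrate the general idea of schema validation with
       programmatic checkers, here is a minimal Python sketch using only
       the standard library. It is not Paradict's schema syntax or API:
       the names USER_SCHEMA, is_positive_int, and validate are
       hypothetical, and the rule format (a type or a callable per key)
       is an assumption made for this example.

```python
from datetime import datetime


def is_positive_int(value):
    """Programmatic checker: accept strictly positive integers only."""
    # bool is a subclass of int, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool) and value > 0


# Hypothetical schema: each key maps to a type or to a checker callable.
USER_SCHEMA = {
    "id": is_positive_int,
    "name": str,
    "birthday": datetime,
}


def validate(data, schema):
    """Return a list of error messages; an empty list means the data is valid."""
    errors = []
    for key, rule in schema.items():
        if key not in data:
            errors.append(f"missing key: {key}")
            continue
        value = data[key]
        if isinstance(rule, type):
            ok = isinstance(value, rule)   # plain type check
        else:
            ok = bool(rule(value))         # programmatic checker
        if not ok:
            errors.append(f"invalid value for {key}: {value!r}")
    return errors
```

       Calling validate(data, USER_SCHEMA) on a dictionary returns an
       empty list when every key is present and passes its rule, and a
       list of messages otherwise.
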

       The binary serialization format is designed with compactness in
       mind: Pi to two decimal places, the golden ratio to two decimal
       places, and the date of the funeral of Pope Benedict XVI would
       each be encoded in two bytes (not counting the 1-byte tag that
       starts each Paradict binary datum).

       This binary format has two levels of granularity for continuous
       data stream processing: the datum at the low level, which is in
       some cases a 2-tuple composed of a tag and its payload, and the
       message at the high level, which is a dictionary data structure.
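
       The tag-plus-payload idea can be sketched in a few lines of
       Python. This is only an illustration, not Paradict's actual wire
       format: the tag values, the fixed 2-byte payload size, and the
       "hundredths in an unsigned 16-bit integer" encoding are all
       assumptions invented for this example. It shows how a
       non-negative number with two decimal places can fit in two bytes
       after a 1-byte tag, and how a reader can consume a stream one
       datum at a time.

```python
import io
import struct

# Hypothetical tags, invented for this sketch (not Paradict's real values).
TAG_FIXED2 = 0x01   # non-negative decimal, two places, stored as hundredths
TAG_UINT16 = 0x02   # unsigned integer stored in two bytes


def pack_fixed2(text):
    """Encode a decimal string such as '3.14' as 1-byte tag + 2-byte payload."""
    whole, _, frac = text.partition(".")
    # Keep exactly two decimal places: '3.1' -> 310, '3.14' -> 314.
    hundredths = int(whole) * 100 + int(frac.ljust(2, "0")[:2])
    return struct.pack(">BH", TAG_FIXED2, hundredths)


def iter_datums(stream):
    """Yield (tag, value) pairs, reading one datum at a time from a stream."""
    while True:
        head = stream.read(1)
        if not head:
            return                      # clean end of stream
        tag = head[0]
        payload = stream.read(2)
        if len(payload) != 2:
            raise ValueError("truncated datum")
        (raw,) = struct.unpack(">H", payload)
        if tag == TAG_FIXED2:
            yield tag, f"{raw // 100}.{raw % 100:02d}"
        elif tag == TAG_UINT16:
            yield tag, raw
        else:
            raise ValueError(f"unknown tag: {tag:#x}")


# Usage: two datums written back to back, then read incrementally.
stream = io.BytesIO(pack_fixed2("3.14") + pack_fixed2("1.61"))
decoded = list(iter_datums(stream))
```

       Because each datum is self-delimiting in this sketch, a consumer
       can process values as they arrive instead of buffering a whole
       message.
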

       The textual serialization format has two modes: data and config.
       Config mode implicitly treats dictionary keys as strings, removing
       the need to surround them with quotes, and it separates a key from
       its value with the equals sign (=), whereas data mode uses the
       colon (:).

       This textual format has two levels of granularity for continuous
       data stream processing: a single line of text at the low level and
       the message at the high level, which is a dictionary data
       structure.

       Here is a valid Paradict configuration
       document that contains a "user" section:

           [user]
           # no comment
           id = 42
           name = 'alex'
           birthday = 2042-12-25T16:20:59Z
           photo = (bin)
               54 68 69 73 20 69 73 20 6E 6F 74 20 61 20 70 68
               6F 74 6F 67 72 61 70 68
           weight_matrix = (grid)
               1 0 1 0
               0 1 0 1
               1 0 1 0
           books = (dict)
               romance = (list)
                   'Happy Place'
                   'Romantic Comedy'
               sci_fi = (list)
                   'Dune'
                   'Neuromancer'
           epitaph = (text)
               According to the law of conservation of energy,
               not a bit of you is gone;
               you are just less orderly.
               ---

       Under the hood, Paradict uses
       Braq (https://github.com/pyrustic/braq), the most obvious way to
       section a document (as shown just above), and Ustrid
       (https://github.com/pyrustic/ustrid), to generate unique string
       identifiers.

       Paradict is available on PyPI and you can learn more by reading
       its README, browsing the source code or playing with its tests.

       Let me know what you think about all this!
        
       Author : alexrustic
       Score  : 10 points
       Date   : 2023-12-18 16:30 UTC (6 hours ago)
        