[HN Gopher] Show HN: Paradict - Streamable multi-format serializ...
___________________________________________________________________
Show HN: Paradict - Streamable multi-format serialization with
schema
Hi HN! I'm Alex, a tech enthusiast. I'm excited to show you
Paradict (https://github.com/pyrustic/paradict), my solution for
streamable multi-format serialization. Although JSON, YAML, and
TOML are all human-readable, they serve different purposes. For
example, TOML is specifically designed for configuration files
while JSON is used as a data interchange format. Occasionally, an
initiative to create a binary version of JSON arises, and as far as
I know, it ends with a unidirectional mapping of datatypes. There
is no silver bullet, yet one coherent solution built from scratch
that addresses multi-format (binary and textual) serialization and
configuration files would be a step forward. Earlier this year, I
accidentally designed a textual data format to represent complex
data structures inside a document divided into sections. The
project, namely Jesth (Just Extract Sections Then Hack'em),
generated an interesting discussion on HN
(https://news.ycombinator.com/item?id=35991018). Out of curiosity,
I ran some benchmarks using Jesth, JSON, and MessagePack, with and
without Gzip compression, against a large JSON file downloaded from
the web. The benchmarking gave me insights that led to the decision
to evolve Jesth's ideas into a new multi-format serialization
solution. I designed and built Paradict from scratch to serialize
and deserialize a dictionary data structure. Although Paradict's
root data structure is a dictionary, lists, sets, and dictionaries
can be nested within it at arbitrary depth. A Paradict dictionary
can be populated with strings, binary data, integers, floats,
complex numbers, booleans, dates, times, datetimes, comments,
extension objects, and grids (matrices). There is also a schema-
based validation mechanism that can contain programmatic checkers.
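The post doesn't show Paradict's schema syntax, so here is a minimal, hypothetical sketch in plain Python of what "a schema that can contain programmatic checkers" can mean: each key maps to a type, a nested schema, or a callable predicate. The `validate` function and this schema layout are assumptions for illustration, not Paradict's API; the sample data mirrors the "user" section shown further down.

```python
import datetime


def validate(data: dict, schema: dict) -> bool:
    """Hypothetical validator (not Paradict's API).

    `data` must have exactly the keys in `schema`. Each schema rule is either
    a nested dict (sub-schema), a Python type, or a callable checker.
    """
    if set(data) != set(schema):
        return False
    for key, rule in schema.items():
        value = data[key]
        if isinstance(rule, dict):
            # Nested dictionary: recurse with the sub-schema.
            if not isinstance(value, dict) or not validate(value, rule):
                return False
        elif isinstance(rule, type):
            # Plain type check.
            if not isinstance(value, rule):
                return False
        elif callable(rule):
            # Programmatic checker: any predicate returning True/False.
            if not rule(value):
                return False
        else:
            raise TypeError(f"unsupported rule for key {key!r}: {rule!r}")
    return True


user_schema = {
    "id": lambda v: isinstance(v, int) and v > 0,  # programmatic checker
    "name": str,
    "birthday": datetime.datetime,
    "photo": bytes,
    "books": {"romance": list, "sci_fi": list},    # nested schema
}

user = {
    "id": 42,
    "name": "alex",
    "birthday": datetime.datetime(2042, 12, 25, 16, 20, 59,
                                  tzinfo=datetime.timezone.utc),
    "photo": bytes.fromhex("54 68 69 73 20 69 73 20 6E 6F 74 20 61 20"
                           "70 68 6F 74 6F 67 72 61 70 68"),
    "books": {"romance": ["Happy Place", "Romantic Comedy"],
              "sci_fi": ["Dune", "Neuromancer"]},
}

print(validate(user, user_schema))                # True
print(validate({**user, "id": -1}, user_schema))  # False: the checker rejects -1
```

Treating checkers as ordinary callables keeps the schema itself plain data while still allowing arbitrary rules (ranges, regexes, cross-field logic) where a type check is not enough.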
The binary serialization format is designed with compactness in
mind: for example, Pi to two decimal places (3.14), the golden ratio
to two decimal places (1.62), and the date of Pope Benedict XVI's
funeral would each be encoded on two bytes (not counting the 1-byte
tag that starts every Paradict binary datum). This binary format has
two levels of granularity for continuous data stream processing: at
the low level, a datum, which in some cases is a 2-tuple composed of
a tag and its payload; at the high level, the message, which is a
dictionary data structure.
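The post doesn't spell out Paradict's binary layout, so the following is only a hypothetical sketch of how a 1-byte tag plus a 2-byte payload can hold such values, and how a reader can consume a stream one datum at a time. The tag values, the scale-by-100 trick for two-decimal numbers, and the 2000-01-01 epoch for dates are all assumptions for illustration, not Paradict's actual wire format.

```python
import datetime
import io
import struct
from typing import BinaryIO, Iterator

# Hypothetical tags and layout, NOT Paradict's actual wire format.
TAG_DEC2 = 0x01  # decimal with 2 fractional digits, stored as int16 (value * 100)
TAG_DATE = 0x02  # date stored as uint16 days since EPOCH
EPOCH = datetime.date(2000, 1, 1)  # assumed epoch, chosen for illustration

PAYLOAD_SIZE = {TAG_DEC2: 2, TAG_DATE: 2}  # payload length per tag, in bytes


def encode_dec2(value: float) -> bytes:
    """1-byte tag + 2-byte big-endian signed int holding round(value * 100)."""
    return struct.pack(">Bh", TAG_DEC2, round(value * 100))


def encode_date(day: datetime.date) -> bytes:
    """1-byte tag + 2-byte big-endian unsigned day offset from EPOCH."""
    return struct.pack(">BH", TAG_DATE, (day - EPOCH).days)


def read_datums(stream: BinaryIO) -> Iterator[tuple[int, bytes]]:
    """Yield (tag, payload) pairs one at a time instead of loading the whole stream."""
    while tag_byte := stream.read(1):
        tag = tag_byte[0]
        # An unknown tag raises KeyError here; a real decoder would handle it.
        yield tag, stream.read(PAYLOAD_SIZE[tag])


def decode(tag: int, payload: bytes):
    """Turn one (tag, payload) datum back into a Python value."""
    if tag == TAG_DEC2:
        return struct.unpack(">h", payload)[0] / 100
    if tag == TAG_DATE:
        return EPOCH + datetime.timedelta(days=struct.unpack(">H", payload)[0])
    raise ValueError(f"unknown tag {tag:#04x}")


stream = io.BytesIO(
    encode_dec2(3.14159)                       # Pi, two decimal places
    + encode_dec2(1.6180339)                   # golden ratio, two decimal places
    + encode_date(datetime.date(2023, 1, 5))   # funeral of Pope Benedict XVI
)
values = [decode(tag, payload) for tag, payload in read_datums(stream)]
print(values)  # [3.14, 1.62, datetime.date(2023, 1, 5)]
```

In this sketch, a signed 2-byte payload covers two-decimal values from -327.68 to 327.67, and an unsigned 2-byte day count spans roughly 179 years from the chosen epoch, which is why each of these values fits in two bytes after its tag.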
The textual serialization format has two modes: data mode and
config mode. Config mode implicitly treats dictionary keys as
strings, so they need no surrounding quotes, and it separates each
key from its value with an equal sign (=) instead of the colon (:)
used in data mode. This textual format also has two levels of
granularity for continuous data stream processing: a single line of
text at the low level, and the message, a dictionary data structure,
at the high level. Here is a valid Paradict configuration
document that contains a "user" section:

    [user]
    # no comment
    id = 42
    name = 'alex'
    birthday = 2042-12-25T16:20:59Z
    photo = (bin)
        54 68 69 73 20 69 73 20 6E 6F 74 20 61 20 70 68 6F 74 6F 67 72 61 70 68
    weight_matrix = (grid)
        1 0 1 0
        0 1 0 1
        1 0 1 0
    books = (dict)
        romance = (list)
            'Happy Place'
            'Romantic Comedy'
        sci_fi = (list)
            'Dune'
            'Neuromancer'
    epitaph = (text)
        According to the law of conservation of energy,
        not a bit of you is gone;
        you are just less orderly.
        ---

Under the hood, Paradict uses
Braq (https://github.com/pyrustic/braq), the most obvious way to
section a document (as shown just above), and Ustrid
(https://github.com/pyrustic/ustrid), to generate unique string
identifiers. Paradict is available on PyPI, and you can learn more
by reading its README, browsing the source code or playing with its
tests. Let me know what you think about all this!
Author : alexrustic
Score : 10 points
Date : 2023-12-18 16:30 UTC (6 hours ago)
___________________________________________________________________
(page generated 2023-12-18 23:01 UTC)