[HN Gopher] Fast Lua Serialization (2023)
       ___________________________________________________________________
        
       Fast Lua Serialization (2023)
        
       Author : synergy20
       Score  : 49 points
       Date   : 2024-08-02 16:04 UTC (6 hours ago)
        
 (HTM) web link (artemis.sh)
 (TXT) w3m dump (artemis.sh)
        
       | CapsAdmin wrote:
       | These days, LuaJIT 2.1 also comes with a highly specialized
       | serializer.
       | 
       | https://luajit.org/ext_buffer.html#serialize
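
For readers who have not used it, here is a minimal sketch of a round trip through the LuaJIT serializer linked above. It assumes a LuaJIT 2.1 build that ships the `string.buffer` extension; the `roundtrip` helper and the sample table are made up for illustration, and the `pcall` guard only exists so the snippet degrades gracefully on plain Lua.

```lua
-- Needs LuaJIT 2.1 with the string.buffer extension; plain Lua does not have it.
local ok, buffer = pcall(require, "string.buffer")

-- Hypothetical helper: encode a value to a binary string and decode it again.
local function roundtrip(value)
  if not ok then
    return nil, "string.buffer is not available in this interpreter"
  end
  local bytes = buffer.encode(value)   -- compact binary string
  return buffer.decode(bytes)          -- fresh copy rebuilt from those bytes
end

local copy, err = roundtrip({ name = "Color Codes", "red", "green" })
if copy then
  print(copy.name, copy[1], copy[2])
else
  print(err)
end
```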
        
       | sitkack wrote:
       | Kinda ironic that the article doesn't discuss it, but Lua was
       | originally a data loading and simulation driving language for
       | Fortran. Lua _was_ the serialization format.
       | 
       | Lua itself can load millions of records per second from disk.
        
         | Something1234 wrote:
         | I'm going to need to hear more about it because this is
         | absolutely wild.
        
           | fanf2 wrote:
           | https://www.lua.org/doc/hopl.pdf
        
         | riidom wrote:
         | What would be the source for that first sentence of yours?
        
         | plorkyeran wrote:
         | Lua was not originally a serialization format. It was a _data
         | entry_ format which was written by hand which then generated
         | the files fed into simulations. Load speed was very important,
         | but they weren't generating Lua. It has been used as a
         | serialization format (for example, World of Warcraft saves UI
         | state by generating Lua files), but that came along much later.
         | 
         | https://www.lua.org/history.html covers the early history.
        
       | orf wrote:
       | > It correctly handles the strange array/map duality of lua
       | tables in the most efficient way it can do safely (serializing
       | both variants concurrently and only writing out the correct one
       | at the end)
       | 
       | The article doesn't expand on this - what are they referring to?
       | Are lua tables arrays?
       | 
       | Serializing something common like tables 2x seems like a massive
       | overhead.
        
         | copx wrote:
         | A Lua table is a data structure which contains both an array
         | and a hash map part, i.e. it accepts both indexed and keyed
         | entries. E.g.:
         |
         |     myTable[1] = "red"
         |     myTable.name = "Color Codes"
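
To make the duality concrete, the sketch below puts indexed and keyed entries in one table and counts what each iterator sees. The variable names are illustrative only.

```lua
local myTable = { "red", "green" }   -- stored under integer keys 1 and 2
myTable.name = "Color Codes"         -- stored under the string key "name"

local indexed, total = 0, 0
for _ in ipairs(myTable) do indexed = indexed + 1 end  -- walks 1, 2, ... until nil
for _ in pairs(myTable) do total = total + 1 end       -- walks every key

print(#myTable, indexed, total)  --> 2  2  3
```

A serializer that has to emit either an array or a map needs to decide which of those views is the intended one.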
        
         | MattJ100 wrote:
         | Lua doesn't have a separate "array" type, as arrays are just a
         | special case of key->value where the key is always a positive
         | integer. While this is nice (it simplifies the language yet Lua
         | tables are incredibly versatile), it does indeed make it a
         | little more work to translate structures to
         | languages/serializations that distinguish between the two
         | types, if the serializer needs to guess at runtime (e.g. if you
         | don't have a predefined schema).
         | 
         | I can't comment on the efficiency of whatever this
         | implementation is doing, I haven't read the code. It does sound
         | a little expensive. Also it's not always 100% clear which the
         | correct mapping is, unless the developer explicitly picks one.
         | For example, a serializer has no way to determine whether '{}'
         | is intended to be an empty array or an empty hash.
         | 
         | Another common gotcha, e.g. when translating to JSON, is that
         | Lua does not restrict the types of keys (while JSON objects
         | only support string keys), so boolean true/false are possible
         | keys, for example, which cannot be translated to JSON directly.
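
The runtime guess described here can be sketched as follows. This is one possible heuristic, not what the article's serializer does; `is_sequence` is a hypothetical name.

```lua
-- Returns true when every key is a positive integer and the keys are exactly 1..n.
local function is_sequence(t)
  local n = 0
  for k in pairs(t) do
    if type(k) ~= "number" or k < 1 or k % 1 ~= 0 then
      return false                 -- string, boolean, fractional or non-positive key
    end
    n = n + 1
  end
  -- n distinct positive integer keys with 1..n all present means no holes.
  for i = 1, n do
    if t[i] == nil then return false end
  end
  return true
end

print(is_sequence({ "a", "b", "c" }))   --> true
print(is_sequence({ "a", nil, "c" }))   --> false (hole at index 2)
print(is_sequence({ [true] = 1 }))      --> false (boolean key, no JSON equivalent)
print(is_sequence({}))                  --> true, although the intent is ambiguous
```

The last case is the ambiguity mentioned above: the function has to pick an answer for `{}`, and nothing in the table says which one is right.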
        
           | matheusmoreira wrote:
           | I ended up drawing the same conclusion while implementing my
           | language. An insertion ordered hash table turned out to be
           | just a normal array of values where the keys map to array
           | indexes instead. Other than speed, I'm struggling to think of
           | a good reason to keep the dedicated array type...
        
             | cmovq wrote:
             | Note that the Lua VM still optimizes tables that look like
             | arrays by storing their values in an array part, separate
             | from the hash part [1].
             | 
             | [1]: https://www.lua.org/gems/sample.pdf
        
             | VWWHFSfQ wrote:
             | Interesting side-effect of newer Python dictionaries
             | preserving insertion order too.
             | 
             | Explained by Hettinger here:
             | 
             | https://www.youtube.com/watch?v=npw4s1QTmPg
        
           | 0cf8612b2e1e wrote:
           | The biggest Lua gotcha is that accessing a missing element is
           | not an error. Requires a hack to distinguish "really null".
        
             | VWWHFSfQ wrote:
             | most often seen with something like JSON serialization
             | where JSON null will result in the key being completely
             | absent from the Lua table, thus re-encoding into JSON will
             | be missing that key entirely instead of existing but with a
             | null value.
             | 
             | It's why almost every serialization library will have
             | userdata values to represent "null" and things like empty
             | arrays instead of empty objects.
             | 
             | Still love Lua though.
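
A small sketch of the problem and the usual workaround. The `NULL` sentinel here is hypothetical and is a plain table for simplicity; real libraries expose their own value, often a userdata as noted above.

```lua
-- Hypothetical sentinel standing in for JSON null.
local NULL = setmetatable({}, { __tostring = function() return "null" end })

local decoded = {}
decoded.a = nil                          -- assigning nil stores nothing
local absent = (next(decoded) == nil)    -- the table is still empty

decoded.a = NULL                         -- the sentinel is a real value, so the key survives
local kept = (decoded.a == NULL)

local missing = decoded.no_such_key      -- reading an unset key is not an error

print(absent, kept, missing)  --> true  true  nil
```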
        
       | turtledragonfly wrote:
       | Lua has its own built-in low-level serialization, in the form of
       | string.pack() and string.unpack()[1]. This is akin to Perl's
       | pack()[2].
       | 
       | But note that this is a tool used for laying out bytes in a
       | certain order (like a C struct), not a general "serialization
       | framework" for traversing data structures and whatnot. But it can
       | be the building block for one.
       | 
       | [1] https://www.lua.org/manual/5.4/manual.html#6.4.2
       | 
       | [2] https://perldoc.perl.org/functions/pack
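
As a rough illustration of using these functions as a building block, the sketch below lays out one record as bytes. It assumes Lua 5.3 or later, where `string.pack` was introduced; the record layout and the helper names are invented for the example.

```lua
-- Layout, little-endian: 4-byte unsigned id, double score,
-- then a name prefixed with its 2-byte length.
local RECORD_FMT = "<I4ds2"

local function pack_record(id, score, name)
  return string.pack(RECORD_FMT, id, score, name)
end

local function unpack_record(bytes)
  -- string.unpack also returns the position of the first unread byte.
  local id, score, name, next_pos = string.unpack(RECORD_FMT, bytes)
  return { id = id, score = score, name = name }, next_pos
end
```

Calling `unpack_record(pack_record(7, 2.5, "hi"))` should give back a table with the same three fields. Walking nested tables, and choosing a format per field, is the part a real serialization framework would have to add on top.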
        
       | dottrap wrote:
       | As another extremely simple idea, in my personal experience, just
       | using regular Lua table syntax for serialization and then pre-
       | compiling it with luac so it could be loaded quickly via
       | dofile(), produced results just as fast for loading it in as
       | using lua-protobuf. (I don't remember the difference between
       | writing out Lua tables vs. lua-protobuf because my use case
       | needed to read the data more often than generate it, but the
       | difference, if any, must not have been significant, otherwise I
       | would probably remember it.) I was loading gigabytes of data for
       | large batch processing.
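
The approach described here might look roughly like the sketch below. It is not the commenter's code: `serialize` is a deliberately minimal, hypothetical writer, and `string.dump` stands in for running `luac` on a file so that the example stays in one process.

```lua
local load_chunk = loadstring or load   -- loadstring on 5.1/LuaJIT, load on 5.2+

-- Minimal writer: numbers, strings, booleans and acyclic tables only.
local function serialize(v)
  local t = type(v)
  if t == "number" then
    return string.format("%.17g", v)    -- does not handle inf or NaN
  elseif t == "string" then
    return string.format("%q", v)       -- quoted so Lua can read it back
  elseif t == "boolean" then
    return tostring(v)
  elseif t == "table" then
    local parts = {}
    for k, val in pairs(v) do
      parts[#parts + 1] = "[" .. serialize(k) .. "]=" .. serialize(val)
    end
    return "{" .. table.concat(parts, ",") .. "}"
  end
  error("cannot serialize a " .. t)
end

local source = "return " .. serialize({ 10, 20, name = "batch", ok = true })
local bytecode = string.dump(assert(load_chunk(source)))  -- in-process stand-in for luac
local data = assert(load_chunk(bytecode))()               -- what dofile() would return
print(data.name, data[2], data.ok)  --> batch  20  true
```

With files on disk, the equivalent would be writing `source` out, compiling it with `luac`, and calling `dofile()` on the result.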
        
         | csears wrote:
         | Would that be safe even with untrusted data? (Assuming you were
         | the one who serialized and compiled the table)
        
           | Dylan16807 wrote:
           | Are you asking if there are special numbers or strings or
           | some other kind of plain data that would glitch out the
           | interpreter when they're read back in? That would be an
           | impressive failure of a programming language.
           | 
           | Yes it's safe as long as you're serializing correctly (which
           | isn't very hard).
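
If the file itself could come from someone else, one common precaution is to load it as text only and with an empty environment. The sketch below is an assumption about how that might be done, not something proposed in the thread; `load_data` is a hypothetical name. The Lua manual warns that binary chunks are not checked for consistency, so the sketch refuses them.

```lua
-- Load a data chunk so that it cannot reach any globals.
local function load_data(src)
  if src:byte(1) == 27 then               -- "\27" starts a precompiled chunk
    return nil, "binary chunks are refused"
  end
  local chunk, err
  if setfenv then                          -- Lua 5.1 and LuaJIT
    chunk, err = loadstring(src, "=data")
    if chunk then setfenv(chunk, {}) end
  else                                     -- Lua 5.2 and later
    chunk, err = load(src, "=data", "t", {})
  end
  if not chunk then return nil, err end
  local ok, result = pcall(chunk)
  if not ok then return nil, result end
  return result
end

print(load_data('return { x = 1 }').x)        --> 1
print(load_data('return os.getenv("HOME")'))  -- nil plus an error message
```

This does not stop a chunk from looping forever or allocating huge strings through string methods, so it is a partial measure at best.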
        
       ___________________________________________________________________
       (page generated 2024-08-02 23:00 UTC)