[HN Gopher] Fast Lua Serialization (2023)
___________________________________________________________________
Fast Lua Serialization (2023)
Author : synergy20
Score : 49 points
Date : 2024-08-02 16:04 UTC (6 hours ago)
(HTM) web link (artemis.sh)
| CapsAdmin wrote:
| These days, LuaJIT 2.1 also comes with a highly specialized
| serializer.
|
| https://luajit.org/ext_buffer.html#serialize
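For readers who have not used it, a minimal sketch of the LuaJIT 2.1 string.buffer serializer mentioned above. The sample table is made up for illustration, and the pcall guard is only there so the snippet degrades gracefully on interpreters that are not LuaJIT 2.1.

```lua
-- string.buffer ships with LuaJIT 2.1 only; guard the require so the
-- snippet also runs (and just reports) on stock Lua.
local ok, buffer = pcall(require, "string.buffer")

if ok then
  -- encode() returns a compact binary string, decode() rebuilds the value.
  local blob = buffer.encode({ name = "Color Codes", "red", "green" })
  local copy = buffer.decode(blob)
  print(#blob, copy.name, copy[2])
else
  print("string.buffer is not available; this interpreter is not LuaJIT 2.1")
end
```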
| sitkack wrote:
| Kinda ironic that the article doesn't discuss it, but Lua was
| originally a data loading and simulation driving language for
| Fortran. Lua _was_ the serialization format.
|
| Lua itself can load millions of records per second from disk.
| Something1234 wrote:
| I'm going to need to hear more about it because this is
| absolutely wild.
| fanf2 wrote:
| https://www.lua.org/doc/hopl.pdf
| riidom wrote:
| What would be the source for that first sentence of yours?
| plorkyeran wrote:
| Lua was not originally a serialization format. It was a _data
| entry_ format, written by hand, which then generated the files
| fed into simulations. Load speed was very important, but they
| weren't generating Lua. It has been used as a
| serialization format (for example, World of Warcraft saves UI
| state by generating Lua files), but that came along much later.
|
| https://www.lua.org/history.html covers the early history.
| orf wrote:
| > It correctly handles the strange array/map duality of lua
| tables in the most efficient way it can do safely (serializing
| both variants concurrently and only writing out the correct one
| at the end)
|
| The article doesn't expand on this - what are they referring to?
| Are Lua tables arrays?
|
| Serializing something common like tables 2x seems like a massive
| overhead.
| copx wrote:
| A Lua table is a data structure which contains both an array and
| a hash map part, i.e. it accepts both indexed and keyed entries.
| E.g.:
|
|     myTable[1] = "red"
|     myTable.name = "Color Codes"
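A small sketch of how that duality shows up in practice, assuming stock Lua 5.3 or 5.4. It reuses the myTable example from the comment above; the second color and the helper variables are added for illustration.

```lua
-- One table holding both sequence-style and keyed entries.
local myTable = {}
myTable[1] = "red"
myTable[2] = "green"
myTable.name = "Color Codes"

-- The length operator only considers the sequence part (keys 1..n).
print(#myTable)  -- 2

-- ipairs walks the sequence part in order and stops at the first nil.
local seq = {}
for i, v in ipairs(myTable) do
  seq[#seq + 1] = i .. "=" .. v
end

-- pairs visits every key, in an unspecified order.
local total = 0
for _ in pairs(myTable) do
  total = total + 1
end

print(table.concat(seq, ","), total)
```

A serializer sees only the pairs view, so it has to work out for itself whether the keys it found form a clean 1..n sequence.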
| MattJ100 wrote:
| Lua doesn't have a separate "array" type, as arrays are just a
| special case of key->value where the key is always a positive
| integer. While this is nice (it simplifies the language yet Lua
| tables are incredibly versatile), it does indeed make it a
| little more work to translate structures to
| languages/serializations that distinguish between the two
| types, if the serializer needs to guess at runtime (e.g. if you
| don't have a predefined schema).
|
| I can't comment on the efficiency of whatever this
| implementation is doing; I haven't read the code. It does sound
| a little expensive. Also, it's not always 100% clear which
| mapping is correct unless the developer explicitly picks one.
| For example, a serializer has no way to determine whether '{}'
| is intended to be an empty array or an empty hash.
|
| Another common gotcha, e.g. when translating to JSON, is that
| Lua does not restrict the types of keys (while JSON objects
| only support string keys), so boolean true/false are possible
| keys, for example, which cannot be translated to JSON directly.
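The guessing described above can be sketched roughly as follows. This is one possible heuristic, not what the article's library or any particular JSON library does; classify and its return values are hypothetical names, and math.type assumes Lua 5.3 or later.

```lua
-- Hypothetical helper: guess how a table should map onto JSON.
-- Returns "array", "object" or "ambiguous", or nil plus a reason.
local function classify(t)
  local n, max_index = 0, 0
  local all_index, all_string = true, true
  for k in pairs(t) do
    n = n + 1
    if math.type(k) == "integer" and k >= 1 then
      all_string = false
      if k > max_index then max_index = k end
    elseif type(k) == "string" then
      all_index = false
    else
      -- Booleans, floats, tables, etc. are legal Lua keys but not JSON keys.
      return nil, "key of type " .. type(k) .. " has no JSON equivalent"
    end
  end
  if n == 0 then return "ambiguous" end  -- {} could be [] or {}
  if all_index and max_index == n then return "array" end
  if all_string then return "object" end
  return nil, "mixed or sparse keys"
end

print(classify({ "a", "b" }))        -- array
print(classify({ x = 1 }))           -- object
print(classify({}))                  -- ambiguous
print(classify({ [true] = 1 }))
print(classify({ 1, name = "x" }))
```

The check needs a full pass over the keys before anything can be written, which is the runtime cost a predefined schema avoids.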
| matheusmoreira wrote:
| I ended up drawing the same conclusion while implementing my
| language. An insertion ordered hash table turned out to be
| just a normal array of values where the keys map to array
| indexes instead. Other than speed, I'm struggling to think of
| a good reason to keep the dedicated array type...
| cmovq wrote:
| Note that the Lua VM still optimizes tables that look like
| arrays by storing their values in an array part, separate from
| the hash part [1].
|
| [1]: https://www.lua.org/gems/sample.pdf
| VWWHFSfQ wrote:
| Interesting side-effect of newer Python dictionaries
| preserving insertion order too.
|
| Explained by Hettinger here:
|
| https://www.youtube.com/watch?v=npw4s1QTmPg
| 0cf8612b2e1e wrote:
| The biggest Lua gotcha is that accessing a missing element is
| not an error. Requires a hack to distinguish "really null".
| VWWHFSfQ wrote:
| This is most often seen with something like JSON serialization,
| where a JSON null results in the key being completely absent
| from the Lua table, so re-encoding into JSON drops that key
| entirely instead of emitting it with a null value.
|
| It's why almost every serialization library will have
| userdata values to represent "null" and things like empty
| arrays instead of empty objects.
|
| Still love Lua though.
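A sketch of the sentinel idea, for illustration only. Real libraries typically expose their own null value (often a light userdata); the NULL table and the keys helper here are hypothetical stand-ins.

```lua
-- Hypothetical sentinel standing in for JSON null. Any unique value works
-- for the purpose of this sketch.
local NULL = setmetatable({}, { __tostring = function() return "null" end })

-- Imagine both tables were decoded from {"id": 7, "nickname": null}.
local plain = { id = 7, nickname = nil }    -- assigning nil stores nothing
local tagged = { id = 7, nickname = NULL }  -- the key survives

-- Collect a table's keys in sorted order so the result is deterministic.
local function keys(t)
  local out = {}
  for k in pairs(t) do out[#out + 1] = k end
  table.sort(out)
  return table.concat(out, ",")
end

print(keys(plain))   -- id
print(keys(tagged))  -- id,nickname

-- A key that was null and a key that never existed look identical.
print(plain.nickname == nil, plain.never_set == nil)  -- true true
```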
| turtledragonfly wrote:
| Lua has its own built-in low-level serialization, in the form of
| string.pack() and string.unpack()[1]. This is akin to Perl's
| pack()[2].
|
| But note that this is a tool used for laying out bytes in a
| certain order (like a C struct), not a general "serialization
| framework" for traversing data structures and whatnot. But it can
| be the building block for one.
|
| [1] https://www.lua.org/manual/5.4/manual.html#6.4.2
|
| [2] https://perldoc.perl.org/functions/pack
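A minimal sketch of string.pack as a byte-layout tool, assuming Lua 5.3 or later. The record layout (an id, two coordinates and a short name) is invented for illustration.

```lua
-- Layout: little-endian, unsigned 16-bit id, two 32-bit floats, then a
-- string prefixed with a 1-byte length.
local fmt = "<I2ffs1"

local bytes = string.pack(fmt, 513, 1.5, -2.25, "hi")
print(#bytes)  -- 13 (2 + 4 + 4 + 1 + 2)

-- string.unpack returns the values plus the position of the next unread byte.
local id, x, y, name, next_pos = string.unpack(fmt, bytes)
print(id, x, y, name, next_pos)
```

Walking a nested table and deciding what to pack for each value is left to the caller, which is the "building block" point made above.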
| dottrap wrote:
| As another extremely simple idea: in my personal experience, just
| using regular Lua table syntax for serialization and then
| precompiling it with luac, so it could be loaded quickly via
| dofile(), produced results just as fast for loading as using
| lua-protobuf. (I don't remember the difference between writing out
| Lua tables vs. lua-protobuf because my use case needed to read the
| data more often than generate it, but it must not have been big,
| if there was any, otherwise I would probably remember it.) I was
| loading gigabytes of data for large batch processing.
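A rough sketch of that approach, assuming Lua 5.3 or later. The serialize and dump helpers are hypothetical, and string.dump stands in for luac here so the example stays in one process; with files you would write the source out, run luac on it, and dofile() the result.

```lua
-- Hypothetical serializer: handles strings, numbers, booleans and nested
-- tables without cycles. Infinities and NaN are not handled, and a float
-- with an integral value reads back as an integer.
local function serialize(v, out)
  local t = type(v)
  if t == "table" then
    out[#out + 1] = "{"
    for k, val in pairs(v) do
      out[#out + 1] = "["
      serialize(k, out)
      out[#out + 1] = "]="
      serialize(val, out)
      out[#out + 1] = ","
    end
    out[#out + 1] = "}"
  elseif t == "string" then
    out[#out + 1] = string.format("%q", v)
  elseif t == "number" then
    out[#out + 1] = string.format(math.type(v) == "integer" and "%d" or "%.17g", v)
  elseif t == "boolean" then
    out[#out + 1] = tostring(v)
  else
    error("cannot serialize a " .. t)
  end
end

local function dump(v)
  local out = { "return " }
  serialize(v, out)
  return table.concat(out)
end

local source = dump({ name = "Color Codes", "red", "green", nested = { ok = true } })

-- Precompile the text chunk to bytecode, roughly what luac would produce.
local bytecode = string.dump(assert(load(source, "=data", "t")))

-- Mode "b" accepts only binary chunks. Lua does not verify bytecode, so
-- only load binary chunks you produced yourself.
local restored = assert(load(bytecode, "=data", "b", {}))()
print(restored.name, restored[2], restored.nested.ok)
```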
| csears wrote:
| Would that be safe even with untrusted data? (Assuming you were
| the one who serialized and compiled the table)
| Dylan16807 wrote:
| Are you asking if there are special numbers or strings or
| some other kind of plain data that would glitch out the
| interpreter when they're read back in? That would be an
| impressive failure of a programming language.
|
| Yes it's safe as long as you're serializing correctly (which
| isn't very hard).
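To make "serializing correctly" concrete, a small sketch assuming Lua 5.3 or later. The payload string and the run helper are invented for illustration; the point is the difference between pasting a string between quotes and letting string.format("%q", ...) escape it.

```lua
-- Data that would break out of a naively quoted string literal.
local payload = '" .. (1 + 1) .. "'

local naive = 'return "' .. payload .. '"'
local quoted = "return " .. string.format("%q", payload)

-- Load text chunks only ("t"), with an empty environment so the chunk
-- cannot reach any globals.
local function run(source)
  return assert(load(source, "=data", "t", {}))()
end

print(run(naive))              -- 2: part of the data ran as code
print(run(quoted) == payload)  -- true: the data came back unchanged

-- %q also copes with newlines, embedded zeros and backslashes.
local tricky = "line1\nline2\0tail\\"
print(run("return " .. string.format("%q", tricky)) == tricky)  -- true
```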
___________________________________________________________________
(page generated 2024-08-02 23:00 UTC)