[HN Gopher] Bebop: An Efficient, Schema-Based Binary Serializati...
___________________________________________________________________
Bebop: An Efficient, Schema-Based Binary Serialization Format
Author : kompas_msv
Score : 99 points
Date : 2021-03-30 03:38 UTC (19 hours ago)
(HTM) web link (rainway.com)
(TXT) w3m dump (rainway.com)
| The_rationalist wrote:
| Noob question: How does the performance of state-of-the-art binary
| formats compare with state-of-the-art JSON serializers such as
| simdjson? (both in throughput and file size)
| andy_ppp wrote:
| As I evolve my schema (say, add a new field and rename an old one),
| how is that handled? In my experience with gRPC+Protobuf, with
| schema changes over time and various service versions all deployed
| at once, this can become a real problem when one schema version
| starts talking to another.
|
| I wish there was a halfway house between something like this and
| JSON, as I really find it useful to be able to debug over the wire
| with Postman or Charles, for example.
| junon wrote:
| > Our evaluation of other solutions found poor client-side
| serialization performance, large run-time overhead, poor browser
| support, and different trade-offs that drove us to create Bebop.
|
| Soo... Cap'n Proto?
| zoltrain wrote:
| Was going to say the exact same thing...
| https://capnproto.org/otherlang.html seems like there's pretty
| broad support.
| dikei wrote:
| There's also FlatBuffers, which Google made specifically for
| gaming applications.
| vanderZwan wrote:
| Originally created by Wouter van Oortmerssen
|
| http://strlen.com/
| tyingq wrote:
| His Treesheets "hierarchical spreadsheet" is interesting,
| may give that a try. Seems a bit like emacs org mode.
| http://strlen.com/treesheets/
| SnazzZG wrote:
| Wow, that's a name I haven't heard in 30-odd years, I loved
| programming in AmigaE in the 90's!
| vanderZwan wrote:
| In that case I think you're going to enjoy checking out
| his website! He's done quite a few cool things in the
| meantime
| w23j wrote:
| Wow, that CV is ... impressive.
| vanderZwan wrote:
| I guess he's a bit like the programmer equivalent of the
| underground musician that's not that well known by the
| mainstream but who inspired a lot of other musicians. For
| example, he wrote the language that _inspired_
| Brainfuck[0].
|
| [0] http://strlen.com/false-language/
| emmanueloga_ wrote:
| Came up with a list of IDL alternatives for a question
| recently, perhaps relevant/interesting [1].
|
| 1: https://cs.stackexchange.com/questions/129904/does-there-
| exi...
| quilombodigital wrote:
| Congrats! And finally CORBA is being reinvented!
| derriz wrote:
| Some feedback - maybe I'm being picky but I feel you've chosen
| poorly with regard to the "readonly" keyword in your IDL. Or at
| least I'd prefer if you used the normal type/interface naming
| conventions. "readonly" and "immutable" are different concepts,
| although often confused. I'm pretty sure "immutable" better
| describes the behavior in your case.
| throwaway210222 wrote:
| This year's ASN.1 BER. Again
| brabel wrote:
| I wrote an ASN.1 parser once (the platform we were using didn't
| have one)... and I've been wondering exactly why there are so many
| new binary formats when it all comes down to something just like
| ASN.1 :D, mostly (which is what, 30 years old now?).
| [deleted]
| athrowaway3z wrote:
| I've never done a proper read-up on ASN.1 to know when and if
| to apply it. But I think its biggest obstacle, apart from the
| lack of a flashy website, is its Wikipedia page
| (https://en.wikipedia.org/wiki/ASN.1). It is just overloaded
| with concepts that don't seem to be solving my problem, and it's
| not clear what is a must-know and what is not.
| jnwatson wrote:
| Speaking of which, I've never seen a marshaling protocol with
| as sophisticated a message versioning scheme as ASN.1. The
| amount of thought that went into specifying how to allow
| "implementations from the future" to interoperate with those of
| the past is impressive.
| vlovich123 wrote:
| And yet outclassed on any meaningful metric by generic
| serialization languages? ASN.1 is a programming language in
| addition to a data transport format opening up many
| opportunities for security vulnerabilities. Performance is
| shit because it's using text to encode everything so that
| it's human readable. Similarly parsing is extra complicated
| and slow for that reason. Serialization/deserialization
| requires a dedicated library rather than one that can be
| reused across all tasks. Finally the data representation
| itself is extremely bloated vs the binary data. It's
| impressive in the same way as cassette technology and the
| neat things people did with it. I really wish some
| reasonable IDL would just get adopted for standards purposes
| rather than each standard developing their own.
|
| All other serialization formats using an IDL seem strictly
| better.
| formerly_proven wrote:
| > Performance is shit because it's using text to encode
| everything so that it's human readable. Similarly parsing
| is extra complicated and slow for that reason. [...]
| Finally the data representation itself is extremely bloated
| vs the binary data.
|
| A weird criticism considering the encoding rules usually
| used for ASN.1 are all binary and some of them are bit-
| packed (like PER), which is very uncommon in newer
| protocols (for good reason).
|
| Oh and there is OER now, which is actually a very
| reasonable binary encoding.
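|
| To make the "binary, not text" point concrete, here is a minimal
| sketch of how a non-negative INTEGER looks under DER (the canonical
| subset of BER): one tag byte, one length byte, then the value in
| big-endian two's complement. The function name is made up for
| illustration and only the short length form is handled; this is
| not a full ASN.1 encoder.

```python
def der_encode_uint(n: int) -> bytes:
    """Encode a non-negative int as a DER INTEGER (short length form only)."""
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    # bit_length() // 8 + 1 always leaves the top bit clear, so the
    # two's-complement value is read back as non-negative.
    body = n.to_bytes(n.bit_length() // 8 + 1, "big")
    if len(body) > 127:
        raise ValueError("long-form lengths are not handled in this sketch")
    # 0x02 is the universal tag for INTEGER, followed by the length byte.
    return bytes([0x02, len(body)]) + body


print(der_encode_uint(5).hex())      # tag, length, value
print(der_encode_uint(65537).hex())  # the common RSA public exponent
```

| Nothing here is human-readable text on the wire; the value 5 takes
| three bytes in total.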
| tonyg wrote:
| Isn't it amazing? ASN.1 seems like such a low bar to pass,
| and yet.
| dm3 wrote:
| Surprised there's no mention of SBE. It's my go-to schema-based
| serialization format.
| the_duke wrote:
| The article doesn't mention at all how those speedups have been
| achieved, what tradeoffs the format makes and what it is optimized
| for - which would be the actually interesting part.
|
| A link to the benchmark code and description of the data would be
| nice.
|
| It also doesn't show data for FlatBuffers, which is often a lot
| faster and leaner than ProtoBuf, or for Cap'n Proto.
|
| ProtoBuf is not exactly known for amazing performance or very
| optimal client implementations across the various languages.
|
| By all means, create a new serialization format, why not. But
| with so many options to choose from, I would require really strong
| justification internally.
| willtim wrote:
| Designing a serialisation format has a lot of similarities with
| designing a programming language: data types, declarative/ease-
| of-use versus full control, abstractions, static versus dynamic
| memory areas, optimising representations etc. There is
| definitely room for a variety of formats depending on the use
| cases; and many are unhappy with the current incumbent:
| https://reasonablypolymorphic.com/blog/protos-are-wrong/
|
| But yes, it's a lot of work and they probably should document
| what the trade-offs are that they have made.
| michaelcampbell wrote:
| And security. SO MANY CVEs related to serialization issues.
| schoetbi wrote:
| Totally agree. I use FlatBuffers and it is a lot faster than, e.g.,
| JSON:
| https://google.github.io/flatbuffers/flatbuffers_benchmarks....
| tyingq wrote:
| It does seem more similar to Cap'n Proto than the things it
| does compare itself to.
|
| I would guess the high speed is the triad of:
|
| - length encoding with a header vs searching for delimiters
|
| - using what they call structs for benchmarks (no repeatedly
| sending the field name)
|
| - how much you trade off safety/sanity checks for performance
|
| Oh, and keeping ints little endian.
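|
| A minimal sketch of those ideas, assuming a made-up two-field
| record (an id and a name). This is not Bebop's actual wire format;
| the function names and layout are hypothetical. Fields are written
| in a fixed order with no names, the string carries a length prefix
| instead of a delimiter, and integers are little endian.

```python
import struct
from typing import Tuple


def encode_record(user_id: int, name: str) -> bytes:
    """Pack (user_id, name) in fixed field order, with no field names."""
    name_bytes = name.encode("utf-8")
    # "<" = little endian; two uint32s: the id, then the string's byte length.
    return struct.pack("<II", user_id, len(name_bytes)) + name_bytes


def decode_record(buf: bytes) -> Tuple[int, str]:
    """Read the record back; the length prefix says where the string ends."""
    user_id, name_len = struct.unpack_from("<II", buf, 0)
    start = struct.calcsize("<II")  # 8 bytes of header
    end = start + name_len
    if end > len(buf):
        # The kind of sanity check a format can keep or drop for speed.
        raise ValueError("buffer is shorter than the declared string length")
    return user_id, buf[start:end].decode("utf-8")


payload = encode_record(7, "bebop")
print(len(payload), decode_record(payload))
```

| The decoder never scans for a terminator: it reads the length and
| slices, which is the gist of the first point in the list.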
| andrewmd5 wrote:
| One of the authors of Bebop here:
|
| - The benchmark code is present in the laboratory directory of
| the repository.
|
| - We don't compare to Cap'n Proto because it does not have a
| stable web-based implementation, at least not one that has the
| features that make it so fast natively, so there is nothing to
| compare.
|
| - Flatbuffers are fast but have a notoriously awful API to work
| with while also creating their own non-standard data structures
| in languages like C++. Bebop generates standard type-safe code.
|
| - Bebop doesn't try to compress data other than strings. This
| is because we don't want to be responsible for compressing
| trailing zeroes when faster compression algorithms exist that
| can be applied after encoding. Also most data is tiny.
|
| - Bebop supports discriminated unions and has a much more
| robust type system than Flatbuffers.
|
| - We're not convincing anyone to use our stuff. It was made for
| us and open sourced because it was useful; we don't need people
| ripping out their current serializers if there's no pressure to
| do so.
| pdimitar wrote:
| I only opened this article to see a comparison with FlatBuffers
| and got disappointed. :(
|
| To be fair though, I am open to the idea of having a separate
| schema definition language, at least. (And please don't say DDL,
| it doesn't even come close.)
| schoetbi wrote:
| Agreed, the ecosystem and performance of FlatBuffers are
| great. Here is the language support:
| https://google.github.io/flatbuffers/flatbuffers_support.htm...
| otabdeveloper4 wrote:
| FlatBuffers creates its own demented and completely
| incompatible data structures when deserializing into C++. Which
| means you then need to copy the FlatBuffers structures into
| normal ones, defeating the entire point of "zero-copy" in the
| first place.
|
| This thing looks like it uses normal C++ structures under the
| hood, and if so that's a huge plus.
| pdimitar wrote:
| I don't see why C++ considerations get priority over
| everything else, or are you making another point?
| otabdeveloper4 wrote:
| They do get priority when you're primarily coding in C++,
| obviously. Not everything is Javascript-first. Different
| requirements for different folks.
| pdimitar wrote:
| I am not a JS dev, don't project your frustration on me.
| ;)
|
| So okay, FlatBuffers doesn't map its zero-copy philosophy
| perfectly everywhere -- fact of life. What would you
| offer then? Which other format and/or library?
| otabdeveloper4 wrote:
| Never used this Bebop thing, but I'd have definitely
| preferred it to FlatBuffers back when I was shopping for
| serialization libraries.
| [deleted]
| pantalaimon wrote:
| Is anyone using CBOR at all?
| jedisct1 wrote:
| People writing standards.
| octopoc wrote:
| We are using it for sending data from edge devices to the cloud
| jedisct1 wrote:
| Surprisingly no comparison against Cap'n Proto.
| jarym wrote:
| There have got to be way too many serialisation formats by now,
| each one claiming similar things to the others.
|
| What would be helpful is a concrete example showing what was
| tried with an existing approach that fell short. I mean code,
| benchmarks, theory.
| sebastialonso wrote:
| We need an independent reviewer of serialization formats!
| nerdponx wrote:
| This would be a fun blog post series: pick a couple of basic
| serialization/marshalling tasks, and a couple of common
| formats, and compare their performance characteristics,
| storage size and bandwidth requirements, implementation
| availability, etc.
| ForHackernews wrote:
| Any comparison against Avro?
| cies wrote:
| So there are structs (all values present), messages (some values
| may be omitted) and enums (pick one, but has no value).
|
| I miss "tagged unions" or enums with values, a.k.a. sum types.
| andrewmd5 wrote:
| Bebop supports tagged unions now.
| lsb wrote:
| How does this compare performance-wise with Arrow, designed for
| zero-copy usage?
| dikei wrote:
| They have different use-cases:
|
| * Arrow is columnar, batch-oriented, geared toward high
| throughput.
|
| * Bebop is record-oriented, similar to Avro, Protobuf or JSON,
| geared toward low latency.
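|
| A rough illustration of the two layouts in plain Python, with
| made-up field names. It shows only the shape of the data, not how
| Arrow or Bebop actually store anything.

```python
# Hypothetical records; the field names are invented for illustration.
records = [
    {"id": 1, "score": 0.5},
    {"id": 2, "score": 1.5},
    {"id": 3, "score": 2.0},
]


def to_columns(rows):
    """Pivot a list of same-shaped records into one list per field."""
    return {key: [row[key] for row in rows] for key in rows[0]}


columns = to_columns(records)

# Record-oriented: everything about one record sits together, which
# suits sending or handling a single message quickly.
second = records[1]

# Column-oriented: one field across all records sits together, which
# suits scanning or aggregating a whole batch.
total_score = sum(columns["score"])

print(second, total_score)
```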
| dtf wrote:
| Also interesting to note that Arrow itself uses FlatBuffers
| for its schema data.
|
| https://arrow.apache.org/docs/format/Columnar.html
| genericguy wrote:
| Looks great, I would use it over gRPC just for being able to use
| objects instead of the Java-like setX nonsense.
| genericguy wrote:
| Though it would need writing all the network code!
| jeffbee wrote:
| It would not, because gRPC is agnostic to the payload format.
| You simply pass it pre-formatted payloads instead of passing
| pointers to proto messages.
___________________________________________________________________
(page generated 2021-03-30 23:02 UTC)