[HN Gopher] Bebop: An Efficient, Schema-Based Binary Serializati...
       ___________________________________________________________________
        
       Bebop: An Efficient, Schema-Based Binary Serialization Format
        
       Author : kompas_msv
       Score  : 99 points
       Date   : 2021-03-30 03:38 UTC (19 hours ago)
        
 (HTM) web link (rainway.com)
 (TXT) w3m dump (rainway.com)
        
       | The_rationalist wrote:
       | Noob question: How does the performance of state-of-the-art binary
       | formats compare with state-of-the-art JSON serializers such as
       | simdjson? (both in throughput and file size)
        
       | andy_ppp wrote:
       | As I evolve my schema (say add a new field and rename an old one)
       | how is that handled? In my experience with gRPC+Protobuf, schema
       | changes over time and various service versions all being deployed
       | this can become a real problem when one schema version starts
       | talking to another.
       | 
       | I wish there was a halfway house between something like this and
       | JSON, as I really find it useful to be able to debug over the wire
       | with Postman or Charles, for example.
        
       | junon wrote:
       | > Our evaluation of other solutions found poor client-side
       | serialization performance, large run-time overhead, poor browser
       | support, and different trade-offs that drove us to create Bebop.
       | 
       | Soo... Cap'n Proto?
        
         | zoltrain wrote:
         | Was going to say the exact same thing...
         | https://capnproto.org/otherlang.html seems like there's pretty
         | broad support.
        
         | dikei wrote:
         | There's also FlatBuffers, which Google made specifically for
         | game applications.
        
           | vanderZwan wrote:
           | Originally created by Wouter van Oortmerssen
           | 
           | http://strlen.com/
        
             | tyingq wrote:
             | His Treesheets "hierarchical spreadsheet" is interesting,
             | may give that a try. Seems a bit like emacs org mode.
             | http://strlen.com/treesheets/
        
             | SnazzZG wrote:
             | Wow, that's a name I haven't heard in 30-odd years, I loved
             | programming in AmigaE in the 90's!
        
               | vanderZwan wrote:
               | In that case I think you're going to enjoy checking out
               | his website! He's done quite a few cool things in the
               | meantime
        
             | w23j wrote:
             | Wow, that CV is ... impressive.
        
               | vanderZwan wrote:
               | I guess he's a bit like the programmer equivalent of the
               | underground musician that's not that well known by the
               | mainstream but who inspired a lot of other musicians. For
               | example, he wrote the language that _inspired_
               | Brainfuck[0].
               | 
               | [0] http://strlen.com/false-language/
        
           | emmanueloga_ wrote:
           | Came up with a list of IDL alternatives for a question
           | recently, perhaps relevant/interesting [1].
           | 
           | 1: https://cs.stackexchange.com/questions/129904/does-there-
           | exi...
        
       | quilombodigital wrote:
       | congrats! And finally Corba is being reinvented!
        
       | derriz wrote:
       | Some feedback - maybe I'm being picky but I feel you've chosen
       | poorly with regard to the "readonly" keyword in your IDL. Or at
       | least I'd prefer if you used the normal type/interface naming
       | conventions. "readonly" and "immutable" are different concepts
       | although often confused. I'm pretty sure "immutable" better
       | describes the behavior in your case.
        
       | throwaway210222 wrote:
       | This year's ASN.1 BER. Again
        
         | brabel wrote:
         | I wrote an ASN.1 parser once (the platform we were using didn't
         | have one)... and have been wondering exactly why there are so
         | many new binary formats when it all comes down to something just
         | like ASN.1 :D, mostly (which is what, 30 years old now?).
        
           | [deleted]
        
         | athrowaway3z wrote:
         | I've never done a proper read-up on ASN.1 to know when and if
         | to apply it. But I think its biggest obstacle, apart from the
         | lack of a flashy website, is its Wikipedia page
         | (https://en.wikipedia.org/wiki/ASN.1). It is just overloaded
         | with concepts that don't seem to be solving my problem, and it's
         | not clear what is a must-know and what is not.
        
         | jnwatson wrote:
         | Speaking of which, I've never seen a marshaling protocol with
         | as sophisticated a message versioning scheme as ASN.1. The
         | amount of thought that went into specifying how to allow
         | "implementations from the future" to interoperate with those of
         | the past is impressive.
        
           | vlovich123 wrote:
           | And yet outclassed on any meaningful metric by generic
           | serialization languages? ASN.1 is a programming language in
           | addition to a data transport format opening up many
           | opportunities for security vulnerabilities. Performance is
           | shit because it's using text to encode everything so that
           | it's human readable. Similarly parsing is extra complicated
           | and slow for that reason. Serialization/deserialization
           | requires a dedicated library rather than one that can be
           | reused across all tasks. Finally the data representation
           | itself is extremely bloated vs the binary data. It's
           | impressive in the same way that cassette technology and the
           | neat things people did with them. I really wish some
           | reasonable IDL would just get adopted for standards purposes
           | rather than each standard developing their own.
           | 
           | All other serialization formats using an IDL seem strictly
           | better.
        
             | formerly_proven wrote:
             | > Performance is shit because it's using text to encode
             | everything so that it's human readable. Similarly parsing
             | is extra complicated and slow for that reason. [...]
             | Finally the data representation itself is extremely bloated
             | vs the binary data.
             | 
             | A weird criticism considering the encoding rules usually
             | used for ASN.1 are all binary and some of them are bit-
             | packed (like PER), which is very uncommon in newer
             | protocols (for good reason).
             | 
             | Oh and there is OER now, which is actually a very
             | reasonable binary encoding.
        
           | tonyg wrote:
           | Isn't it amazing? ASN.1 seems like such a low bar to pass,
           | and yet.
        
       | dm3 wrote:
       | Surprised there's no mention of SBE. It's my go to schema-based
       | serialization format.
        
       | the_duke wrote:
       | The article doesn't mention at all how those speedups have been
       | achieved, what tradeoff the format makes and what it is optimized
       | for - which would be the actually interesting part.
       | 
       | A link to the benchmark code and description of the data would be
       | nice.
       | 
       | It also doesn't show data for FlatBuffers, which is often a lot
       | faster and leaner than ProtoBuf, or for Cap'n Proto.
       | 
       | ProtoBuf is not exactly known for amazing performance or very
       | optimal client implementations across the various languages.
       | 
       | By all means, create a new serialization format, why not. But
       | with so many options to choose from, I would require really strong
       | justification internally.
        
         | willtim wrote:
         | Designing a serialisation format has a lot of similarities with
         | designing a programming language: data types, declarative/ease-
         | of-use versus full control, abstractions, static versus dynamic
         | memory areas, optimising representations etc. There is
         | definitely room for a variety of formats depending on the use
         | cases; and many are unhappy with the current incumbent:
         | https://reasonablypolymorphic.com/blog/protos-are-wrong/
         | 
         | But yes, it's a lot of work and they probably should document
         | what the trade-offs are that they have made.
        
           | michaelcampbell wrote:
           | And security. SO MANY CVEs related to serialization issues.
        
         | schoetbi wrote:
         | Totally agree. I use FlatBuffers and it is a lot faster than,
         | e.g., JSON:
         | https://google.github.io/flatbuffers/flatbuffers_benchmarks....
        
         | tyingq wrote:
         | It does seem more similar to Cap'n Proto than the things it
         | does compare itself to.
         | 
         | I would guess the high speed is the triad of
         | 
         | - length encoding with a header vs searching for delimiters
         | 
         | - using what they call structs for benchmarks (no repeatedly
         | sending the field name)
         | 
         | - how much you trade off safety/sanity checks for performance
         | 
         | Oh, and keeping ints little endian.
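
The three guesses above (length prefixes instead of delimiters, fixed-layout structs without field names, little-endian integers) can be illustrated with a minimal Python sketch. It is not Bebop's actual wire format: the function names, the 4-byte length header, and the (u32 id, f32 score) record layout are assumptions chosen only for illustration.

```python
import struct


def encode_length_prefixed(text: str) -> bytes:
    """Encode a string as a 4-byte little-endian length followed by UTF-8 bytes."""
    payload = text.encode("utf-8")
    return struct.pack("<I", len(payload)) + payload


def decode_length_prefixed(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Read one length-prefixed string; return it and the offset just past it."""
    (length,) = struct.unpack_from("<I", buf, offset)
    start = offset + 4
    end = start + length
    # A sanity check like this is the kind of cost a format can keep or drop.
    if end > len(buf):
        raise ValueError("declared length runs past end of buffer")
    return buf[start:end].decode("utf-8"), end


def decode_delimited(buf: bytes, offset: int = 0) -> tuple[str, int]:
    """Read one NUL-terminated string by scanning for the delimiter."""
    end = buf.index(b"\x00", offset)  # must look at every byte up to the delimiter
    return buf[offset:end].decode("utf-8"), end + 1


# Hypothetical fixed-layout record: field names never appear on the wire,
# only the values in schema order (u32 id, f32 score), little-endian.
POINT = struct.Struct("<If")

wire = encode_length_prefixed("bebop") + POINT.pack(7, 1.5)
name, pos = decode_length_prefixed(wire)
ident, score = POINT.unpack_from(wire, pos)
print(name, ident, score)
```

With a length header the decoder jumps straight to the end of the string, whereas the delimited variant has to search for the terminator; the fixed-layout record is read with a single unpack call.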
        
         | andrewmd5 wrote:
         | One of the authors of Bebop here:
         | 
         | - The benchmark code is present in the laboratory directory of
         | the repository.
         | 
         | - We don't compare to Cap'n Proto because it does not have a
         | stable web-based implementation, at least not one that has the
         | features that make it so fast natively, so there is nothing to
         | compare.
         | 
         | - FlatBuffers are fast but have a notoriously awful API to work
         | with while also creating their own non-standard data structures
         | in languages like C++. Bebop generates standard type-safe code.
         | 
         | - Bebop doesn't try to compress data other than strings. This
         | is because we don't want to be responsible for compressing
         | trailing zeroes when faster compression algorithms exist that
         | can be done after encoding. Also most data is tiny.
         | 
         | - Bebop supports discriminated unions and has a much more
         | robust type system than FlatBuffers.
         | 
         | - We're not convincing anyone to use our stuff. It was made for
         | us and open sourced because it was useful; we don't need people
         | ripping out their current serializers if there's no pressure to
         | do so.
        
       | pdimitar wrote:
       | I only opened this article to see a comparison with FlatBuffers
       | and got disappointed. :(
       | 
       | To be fair though, I am open to the idea of having a separate
       | schema definition language, at least. (And please don't say DDL,
       | it doesn't even come close.)
        
         | schoetbi wrote:
         | Agree, the ecosystem for FlatBuffers and the performance are
         | great. Here is the language support:
         | https://google.github.io/flatbuffers/flatbuffers_support.htm...
        
         | otabdeveloper4 wrote:
         | FlatBuffers creates its own demented and completely
         | incompatible data structures when deserializing into C++. Which
         | means you then need to copy the FlatBuffers structures into
         | normal ones, defeating the entire point of "zero-copy" in the
         | first place.
         | 
         | This thing looks like it uses normal C++ structures under the
         | hood, and if so that's a huge plus.
        
           | pdimitar wrote:
           | I don't see why C++ considerations get priority over
           | everything else, or are you making another point?
        
             | otabdeveloper4 wrote:
             | They do get priority when you're primarily coding in C++,
             | obviously. Not everything is JavaScript-first. Different
             | requirements for different folks.
        
               | pdimitar wrote:
               | I am not a JS dev, don't project your frustration on me.
               | ;)
               | 
               | So okay, FlatBuffers doesn't map its zero-copy philosophy
               | perfectly everywhere -- fact of life. What would you
               | offer then? Which other format and/or library?
        
               | otabdeveloper4 wrote:
               | Never used this Bebop thing, but I'd have definitely
               | preferred it to FlatBuffers back when I was shopping for
               | serialization libraries.
        
       | [deleted]
        
       | pantalaimon wrote:
       | Is anyone using CBOR at all?
        
         | jedisct1 wrote:
         | People writing standards.
        
         | octopoc wrote:
         | We are using it for sending data from edge devices to the cloud
        
       | jedisct1 wrote:
       | Surprisingly no comparison against Cap'n Proto.
        
       | jarym wrote:
       | There's gotta now be way too many serialisation formats. Each one
       | claiming similar things to others.
       | 
       | What would be helpful is a concrete example showing what was
       | tried with an existing approach that fell short. I mean code,
       | benchmarks, theory.
        
         | sebastialonso wrote:
         | We need an independent reviewer of serialization formats!
        
           | nerdponx wrote:
           | This would be a fun blog post series: pick a couple of basic
           | serialization/marshalling tasks, and a couple of common
           | formats, and compare their performance characteristics,
           | storage size and bandwidth requirements, implementation
           | availability, etc.
        
       | ForHackernews wrote:
       | Any comparison against Avro?
        
       | cies wrote:
       | So there are structs (all values present), messages (some values
       | may be omitted) and enums (pick one, but it has no value).
       | 
       | I miss "tagged unions" or enums with values a.k.a. sumtypes.
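
For readers unfamiliar with the term, a tagged union ("enum with values") is commonly serialized as a discriminator followed by the chosen branch's fields. The Python sketch below shows that general idea only; the `Circle` and `Rect` types, the tag values, and the one-byte discriminator are hypothetical and are not taken from Bebop's schema language or wire format.

```python
import struct
from dataclasses import dataclass
from typing import Union


@dataclass
class Circle:
    radius: float


@dataclass
class Rect:
    width: float
    height: float


Shape = Union[Circle, Rect]

# Hypothetical discriminator values, one per union branch.
CIRCLE_TAG, RECT_TAG = 1, 2


def encode_shape(shape: Shape) -> bytes:
    """Write one discriminator byte, then the branch's fields in fixed order."""
    if isinstance(shape, Circle):
        return struct.pack("<Bf", CIRCLE_TAG, shape.radius)
    if isinstance(shape, Rect):
        return struct.pack("<Bff", RECT_TAG, shape.width, shape.height)
    raise TypeError(f"not a Shape: {shape!r}")


def decode_shape(buf: bytes) -> Shape:
    """Read the discriminator, then decode the fields of the branch it names."""
    tag = buf[0]
    if tag == CIRCLE_TAG:
        (radius,) = struct.unpack_from("<f", buf, 1)
        return Circle(radius)
    if tag == RECT_TAG:
        width, height = struct.unpack_from("<ff", buf, 1)
        return Rect(width, height)
    raise ValueError(f"unknown discriminator {tag}")


print(decode_shape(encode_shape(Rect(2.0, 3.5))))
```

Unlike a plain enum, the value on the wire carries both which branch was picked and that branch's data, so the receiver gets a fully typed object back.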
        
         | andrewmd5 wrote:
         | Bebop supports tagged unions now.
        
       | lsb wrote:
       | How does this compare performance-wise with Arrow, designed for
       | zero-copy usage?
        
         | dikei wrote:
         | They have different use-cases:
         | 
         | * Arrow is columnar, batch-oriented, geared toward high
         | throughput.
         | 
           | * Bebop is record-oriented, similar to Avro, Protobuf or JSON,
         | geared toward low latency.
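
The columnar versus record-oriented distinction can be sketched in a few lines of Python. This is a toy layout for illustration, not Arrow's or Bebop's real format; the `(id, value)` rows and all function names are assumptions.

```python
import struct

records = [(1, 0.5), (2, 1.5), (3, 2.5)]  # hypothetical (id, value) rows


def encode_rows(rows):
    """Record-oriented: each record's fields sit next to each other."""
    return b"".join(struct.pack("<If", ident, value) for ident, value in rows)


def encode_columns(rows):
    """Columnar: all ids first, then all values, each as one contiguous array."""
    n = len(rows)
    ids = [row[0] for row in rows]
    values = [row[1] for row in rows]
    return struct.pack(f"<{n}I", *ids) + struct.pack(f"<{n}f", *values)


def read_one_record(buf, index):
    """Cheap in the row layout: one record is one contiguous 8-byte slice."""
    return struct.unpack_from("<If", buf, index * 8)


def sum_values_columnar(buf, n):
    """Cheap in the column layout: the value column is one contiguous array."""
    values = struct.unpack_from(f"<{n}f", buf, n * 4)
    return sum(values)


row_buf = encode_rows(records)
col_buf = encode_columns(records)
print(read_one_record(row_buf, 1))
print(sum_values_columnar(col_buf, len(records)))
```

Fetching a single record touches one small slice in the row layout, which suits per-message latency, while scanning one field across many records touches one contiguous array in the column layout, which suits batch throughput.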
        
           | dtf wrote:
           | Also interesting to note that Arrow itself uses FlatBuffers
           | for its schema data.
           | 
           | https://arrow.apache.org/docs/format/Columnar.html
        
       | genericguy wrote:
       | Looks great, I would use it over gRPC just for being able to use
       | objects instead of the Java-like setX nonsense.
        
         | genericguy wrote:
         | Though it would need writing all the network code!
        
           | jeffbee wrote:
           | It would not, because gRPC is agnostic to the payload format.
           | You simply pass it pre-formatted payloads instead of passing
           | pointers to proto messages.
        
       ___________________________________________________________________
       (page generated 2021-03-30 23:02 UTC)