[HN Gopher] Parsing Protobuf like never before
       ___________________________________________________________________
        
       Parsing Protobuf like never before
        
       Author : ibobev
       Score  : 108 points
       Date   : 2025-07-17 10:09 UTC (6 days ago)
        
 (HTM) web link (mcyoung.xyz)
 (TXT) w3m dump (mcyoung.xyz)
        
        | UncleEntity wrote:
        | > In other words, a UPB parser is actually configuration for an
        | interpreter VM, which executes Protobuf messages as its bytecode.
        | 
        | This is kind of confusing: is the VM crafted at runtime to parse
        | a single protobuf message type, and only that message type? The
        | Second Futamura Projection, I suppose...
        | 
        | Or is the VM designed specifically around generic protobuf
        | messages, so that it can parse any random message, as long as
        | it's a protobuf message?
        | 
        | I've been working on the design of a similar system, but for
        | general binary parsing (think bison/yacc for binary data), and
        | hadn't even considered _data over specialized VM_ vs.
        | _bytecode+data over general VM_. Honestly, since it's designed
        | around "maximum laziness" (it just parses/verifies and creates
        | metadata over the input, so you only pay for decoding the bytes
        | you actually use) and I/O overhead is way greater than VM
        | dispatch overhead, trying this out is probably one of those
        | "premature optimization is the root of all evil" cases, but it's
        | intriguing nonetheless.
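        | 
        | (An illustrative sketch of that "maximum laziness" idea in Go,
        | with made-up names rather than the actual design: one
        | verification pass records where each field's bytes live, and
        | decoding is deferred until a field is actually read.)
        | 
        |     package lazy
        | 
        |     import (
        |         "encoding/binary"
        |         "fmt"
        |     )
        | 
        |     // span records where one field's undecoded value lives.
        |     type span struct {
        |         tag        uint64
        |         start, end int
        |     }
        | 
        |     type lazyMsg struct {
        |         input []byte
        |         spans []span
        |     }
        | 
        |     // index walks the input once, validating structure and
        |     // recording offsets, but decodes nothing. To keep the
        |     // sketch short, every value is assumed to be a varint.
        |     func index(input []byte) (*lazyMsg, error) {
        |         m := &lazyMsg{input: input}
        |         for i := 0; i < len(input); {
        |             tag, n := binary.Uvarint(input[i:])
        |             if n <= 0 {
        |                 return nil, fmt.Errorf("bad tag at %d", i)
        |             }
        |             i += n
        |             _, vn := binary.Uvarint(input[i:])
        |             if vn <= 0 {
        |                 return nil, fmt.Errorf("bad value at %d", i)
        |             }
        |             m.spans = append(m.spans, span{tag, i, i + vn})
        |             i += vn
        |         }
        |         return m, nil
        |     }
        | 
        |     // Int64 pays the decoding cost only when a field is read.
        |     func (m *lazyMsg) Int64(tag uint64) (int64, bool) {
        |         for _, s := range m.spans {
        |             if s.tag == tag {
        |                 v, n := binary.Uvarint(m.input[s.start:s.end])
        |                 return int64(v), n > 0
        |             }
        |         }
        |         return 0, false
        |     }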
        
         | tonyarkles wrote:
         | Based on what I know about the structure of protobufs
         | internally and without having looked deep into what UPB is
         | doing... I'd guess it could probably be a stack machine that
         | treats (byte)+ as opcodes. Most of the time I'd think of it as
         | parser -> AST -> bytecode, but I think the "grammar" of
         | protobufs would allow your parser to essentially emit terminals
         | as they're parsed straight to the VM as instructions to
         | execute.
        
           | UncleEntity wrote:
            | In the couple of days since I posted my confusion (threads
            | got merged or something), I consulted the daffy robots and
            | figured out how it all works. I also had them come up with a
            | design document for "a specialized compiler and virtual
            | machine architecture for parsing Protocol Buffer messages
            | that achieves significant performance improvements through a
            | novel compilation pipeline combining protobuf-specific AST
            | optimization, continuation-passing style transformations,
            | and tail call interpreter execution."
           | 
           | Interesting times we live in...
        
         | haberman wrote:
         | I think I can shed some light on this, as the creator and lead
         | of upb.
         | 
          | Calling a Protobuf parser an "interpreter VM" is a little bit
         | of rhetorical flourish. It comes from the observation that
         | there are some deep structural similarities between the two,
         | which I first observed in an article a few years back:
         | https://blog.reverberate.org/2021/04/21/musttail-efficient-i...
         | 
         | > It may seem odd to compare interpreter loops to protobuf
         | parsers, but the nature of the protobuf wire format makes them
         | more similar than you might expect. The protobuf wire format is
         | a series of tag/value pairs, where the tag contains a field
         | number and wire type. This tag acts similarly to an interpreter
         | opcode: it tells us what operation we need to perform to parse
         | this field's data. Like interpreter opcodes, protobuf field
         | numbers can come in any order, so we have to be prepared to
         | dispatch to any part of the code at any time.
         | 
         | This means that the overall structure of a protobuf parser is
         | conceptually a while() loop surrounding a switch() statement,
         | just like a VM interpreter.
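          | 
          | (To make the analogy concrete, here is a minimal sketch in Go
          | of that shape. The message and field numbers are hypothetical;
          | this shows the general structure, not upb's actual code.)
          | 
          |     package wire
          | 
          |     import (
          |         "encoding/binary"
          |         "fmt"
          |     )
          | 
          |     // Sketch: a protobuf parser shaped like a VM interpreter.
          |     // Hypothetical schema:
          |     //   message M { int64 id = 1; string name = 2; }
          |     // A tag is (field_number << 3) | wire_type, and it plays
          |     // the role of an opcode.
          |     func parseM(buf []byte) (id int64, name string, err error) {
          |         for len(buf) > 0 {
          |             tag, n := binary.Uvarint(buf) // "fetch" the opcode
          |             if n <= 0 {
          |                 return 0, "", fmt.Errorf("bad tag")
          |             }
          |             buf = buf[n:]
          |             switch tag { // "dispatch", like an interpreter
          |             case 1<<3 | 0: // field 1, wire type 0 (varint)
          |                 v, vn := binary.Uvarint(buf)
          |                 if vn <= 0 {
          |                     return 0, "", fmt.Errorf("bad varint")
          |                 }
          |                 id, buf = int64(v), buf[vn:]
          |             case 2<<3 | 2: // field 2, wire type 2 (bytes)
          |                 l, ln := binary.Uvarint(buf)
          |                 if ln <= 0 || uint64(len(buf)-ln) < l {
          |                     return 0, "", fmt.Errorf("bad length")
          |                 }
          |                 name = string(buf[ln : ln+int(l)])
          |                 buf = buf[ln+int(l):]
          |             default:
          |                 // A real parser would skip unknown fields.
          |                 return 0, "", fmt.Errorf("unknown tag %d", tag)
          |             }
          |         }
          |         return id, name, nil
          |     }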
         | 
         | The tricky part is that the set of "case" labels for a Protobuf
         | parser is message-specific and defined by the fields in the
         | schema. How do we accommodate that?
         | 
         | The traditional answer was to generate a function per message
         | and use the schema's field numbers as the case labels. You can
         | see an example of that here (in C++):
         | https://github.com/protocolbuffers/protobuf/blob/f763a2a8608...
         | 
         | More recently, we've moved towards making Protobuf parsing more
         | data-driven, where each field's schema is compiled into _data_
         | that is passed as an argument to a generic Protobuf parser
          | function. We call this "table-driven parsing", and from my
         | read of the blog article, I believe this is what Miguel is
         | doing with hyperpb.
         | 
         | The trick then becomes how to make this table-driven dispatch
         | as fast as possible, to simulate what the switch() statement
         | would have done. That question is what I cover at length in the
         | article mentioned above.
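          | 
          | (As a hedged sketch of the table-driven version in Go, with
          | made-up types rather than upb's or hyperpb's actual layout:
          | the switch() is replaced by a lookup in a table compiled from
          | the schema, and one generic function parses every message
          | type. A map stands in for the dispatch structure here; real
          | implementations use flat arrays and much cleverer dispatch,
          | which is what the article linked above is about.)
          | 
          |     package table
          | 
          |     import "encoding/binary"
          | 
          |     // fieldEntry is the compiled "data" for one field.
          |     type fieldEntry struct {
          |         decode func(buf []byte) (rest []byte, ok bool)
          |     }
          | 
          |     // messageTable is compiled from a message's schema at
          |     // runtime; it is configuration, not code.
          |     type messageTable struct {
          |         byTag map[uint64]fieldEntry
          |     }
          | 
          |     // parseGeneric is shared by every message type; all the
          |     // per-type behavior lives in t.
          |     func parseGeneric(t *messageTable, buf []byte) bool {
          |         for len(buf) > 0 {
          |             tag, n := binary.Uvarint(buf)
          |             if n <= 0 {
          |                 return false
          |             }
          |             e, ok := t.byTag[tag] // data replaces switch()
          |             if !ok {
          |                 return false // real parsers skip unknown fields
          |             }
          |             if buf, ok = e.decode(buf[n:]); !ok {
          |                 return false
          |             }
          |         }
          |         return true
          |     }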
        
           | anonymoushn wrote:
           | Really great. I wonder, for the "types encoded as code"
           | approach, is there any benefit to fast paths for data with
            | fields in ascending order? For some JSON parsers with types
            | encoded as code, I have observed some speedup from either
            | hard-coding a known key order or assuming keys arrive in a
            | particular order and providing a fallback in case an
            | unexpected key is encountered. For users who are stuck with
            | protobuf forever
           | because of various services using it and various data being
           | encoded this way, the historical data could plausibly be
           | canonicalized and written back in large chunks when it is
           | accessed, so that one need not pay the entire cost of
           | canonicalizing it all at once. But of course the icache
           | concerns are still just as bad.
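            | 
            | (A sketch of that fast path in Go, with made-up names:
            | predict that the next field is the next one in schema
            | order, confirm the guess with one cheap comparison, and
            | fall back to generic dispatch on a miss.)
            | 
            |     package ordered
            | 
            |     import "encoding/binary"
            | 
            |     type entry struct {
            |         tag    uint64
            |         decode func(buf []byte) (rest []byte, ok bool)
            |     }
            | 
            |     // parseAscending bets that fields arrive in schema
            |     // order; the hot path is a single predictable compare,
            |     // and the map is consulted only on out-of-order data.
            |     func parseAscending(
            |         inOrder []entry, byTag map[uint64]entry, buf []byte,
            |     ) bool {
            |         next := 0
            |         for len(buf) > 0 {
            |             tag, n := binary.Uvarint(buf)
            |             if n <= 0 {
            |                 return false
            |             }
            |             var e entry
            |             if next < len(inOrder) && inOrder[next].tag == tag {
            |                 e = inOrder[next] // prediction hit
            |                 next++
            |             } else {
            |                 var ok bool
            |                 if e, ok = byTag[tag]; !ok {
            |                     return false // unknown field
            |                 }
            |             }
            |             var ok bool
            |             if buf, ok = e.decode(buf[n:]); !ok {
            |                 return false
            |             }
            |         }
            |         return true
            |     }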
        
       | mdhb wrote:
       | I'd really love to see more work bringing the best parts of
       | protobuf to a standardised serialization format like CBOR.
       | 
        | I'd make the same argument for moving gRPC-Web to something
        | like WHATWG Streams and/or WebTransport.
        | 
        | There are a lot of really cool and important learnings in both,
        | but it's all so tied up in weird tooling and assumptions. Let's
        | rebase on IETF and W3C standards.
        
         | youngtaff wrote:
         | Would be good to see support for encoding / decoding CBOR
          | exposed as a browser API - browsers already use CBOR
          | internally for WebAuthn, so I'd hope it's not too hard.
        
         | cyberax wrote:
          | You can easily do this. Protobuf supports pluggable writers,
          | and iterating over a schema is pretty easy. We do it for
          | JSONB.
          | 
          | I'm not sure of the purpose, though. Protobuf is great for its
          | inflexible schema, and CBOR is great for its flexible data
          | representation.
          | 
          | A separate CBOR schema would be a better fit; there's CDDL,
          | but it has no traction.
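          | 
          | (For the "iterating over a schema" part, a minimal Go sketch
          | using the protoreflect API; the CBOR handoff at the end is
          | the speculative part, and a real bridge would also need to
          | handle nested messages, lists, and maps.)
          | 
          |     package bridge
          | 
          |     import (
          |         "google.golang.org/protobuf/proto"
          |         "google.golang.org/protobuf/reflect/protoreflect"
          |     )
          | 
          |     // toMap walks a message's set fields via reflection and
          |     // copies them into a generic map that another encoder
          |     // can serialize. Sketch only: scalars, no nesting.
          |     func toMap(m proto.Message) map[string]any {
          |         out := map[string]any{}
          |         m.ProtoReflect().Range(func(
          |             fd protoreflect.FieldDescriptor,
          |             v protoreflect.Value,
          |         ) bool {
          |             out[string(fd.Name())] = v.Interface()
          |             return true
          |         })
          |         return out
          |     }
          | 
          |     // A CBOR library (e.g. github.com/fxamacker/cbor/v2)
          |     // could then do: data, err := cbor.Marshal(toMap(msg))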
        
       | irq-1 wrote:
       | https://github.com/bufbuild/hyperpb-go
        
       | skybrian wrote:
       | This is excellent: an in-depth description showing how the Go
       | internals make writing fast interpreters difficult, by someone
       | who is far more determined than I ever was to make it fast
       | anyway.
       | 
       | I've assumed that writing fast interpreters wasn't a use case the
       | Go team cared much about, but if it makes protobuf parsing
       | faster, maybe it will get some attention, and some of these low-
       | level tricks will no longer be necessary?
        
       | alexozer wrote:
       | So am I identifying the bottlenecks that motivate this design
       | correctly?
       | 
       | 1. Go FFI is slow
       | 
       | 2. Per-proto generated code specialization is slow, because of
       | icache pressure
       | 
       | I know there's more to the optimization story here, but I guess
       | these are the primary motivations for the VM over just better
       | code generation or implementing a parser in non-Go?
        
       | jeffrallen wrote:
       | > hyperpb is a brand new library, written in the most cursed Go
       | imaginable
       | 
       | This made me LOL.
        
       | dumah wrote:
       | Fantastic post.
       | 
       | Please do one on your analysis and optimization workflow and
       | tooling!
        
       | dang wrote:
       | Related ongoing thread:
       | 
       |  _Hyperpb: Faster dynamic Protobuf parsing_ -
       | https://news.ycombinator.com/item?id=44661785
        
       | ryukoposting wrote:
       | > Every type contributes to a cost on the instruction cache,
       | meaning that if your program parses a lot of different types, it
       | will essentially flush your instruction cache any time you enter
       | a parser. Worse still, if a parse involves enough types, the
       | parser itself will hit instruction decoding throughput issues.
       | 
       | Interesting. This makes me wonder how nanopb would benchmark
       | against the parsers shown in the graphs. nanopb's whole schtick
       | is that it's pure C99 and it doesn't generate separate parsing
       | functions for each message. nanopb is what a lot of embedded code
       | uses, due to the small footprint.
        
       | tschellenbach wrote:
        | Last time I benchmarked msgpack and protobuf against each other,
        | the results were nearly flat for my use case: JSON was 2-3x
        | slower, but msgpack and protobuf were near equal. Might be
        | different now after this release. Exciting :)
        
         | Analemma_ wrote:
         | Keep in mind that some of the performance bottlenecks the
         | author is talking about and optimizing for show up in large-
         | scale uses and possibly not in benchmarks. In particular, the
         | "parsing too many different types blows away your instruction
         | cache" issue will only show up if you are actually parsing lots
          | of types; otherwise, UPB is not necessary.
        
       | ptspts wrote:
       | Where are the benchmarks comparing hyperpb to other proto
       | parsers?
        
       | cyberax wrote:
       | I would love to benchmark it against the old gogo protobuf. One
       | thing that really slows down PB is its insistence on using
       | pointers everywhere. Gogo protobuf used value types where
       | possible, significantly reducing the GC pressure.
       | 
       | It feels like arenas are just an attempt to bring that back.
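        | 
        | (An illustration of the difference with hypothetical generated
        | types; gogoproto's nullable option is real, but these structs
        | are made up:)
        | 
        |     package layout
        | 
        |     // Standard generated code tends to make nested messages
        |     // pointers, so each one is a separate GC-tracked
        |     // allocation.
        |     type Inner struct{ X int64 }
        | 
        |     type OuterPtr struct {
        |         Inner *Inner // extra allocation; pointer for the GC
        |     }
        | 
        |     // gogoproto with (gogoproto.nullable) = false embedded the
        |     // value instead: one allocation for the whole tree.
        |     type OuterVal struct {
        |         Inner Inner // stored inline; less GC pressure
        |     }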
        
       | dgan wrote:
        | Maybe if performance is really an issue, protobufs shouldn't be
        | used?
        | 
        | There are also flatbuffers, capnp, and purely-C++ serializers:
        | zpp, yas... but protobufs are definitely convenient!
        
       | nemo1618 wrote:
       | There are two ways to look at this.
       | 
       | First is that, if the parsing library for your codec includes a
       | compiler, VM, and PGO, your codec must be extremely cursed and
       | you should take a step back and think about your life.
       | 
       | Second is that, if the parsing library for your codec includes a
       | compiler, VM, and PGO, your codec must be wildly popular and adds
       | enormous value.
        
       ___________________________________________________________________
       (page generated 2025-07-23 23:00 UTC)