[HN Gopher] Parsing Protobuf like never before
___________________________________________________________________
Parsing Protobuf like never before
Author : ibobev
Score : 108 points
Date : 2025-07-17 10:09 UTC (6 days ago)
(HTM) web link (mcyoung.xyz)
(TXT) w3m dump (mcyoung.xyz)
| UncleEntity wrote:
| > In other words, a UPB parser is actually configuration for an
| interpreter VM, which executes Protobuf messages as its bytecode.
|
| This is kind of confusing: is the VM crafted at runtime to
| parse a single protobuf message type, and only this message
| type? The Second Futamura Projection, I suppose...
|
| Or is the VM designed specifically around generic protobuf
| messages, so it can parse any random message, but only if it's
| a protobuf message?
|
| I've been working on the design of a similar system but for
| general binary parsing (think bison/yacc for binary data) and
| hadn't even considered doing _data over specialized VM_ vs.
| _bytecode+data over general VM_. Honestly, since it's designed
| around 'maximum laziness' (it just parses/verifies and creates
| metadata over the input, so you only pay for decoding the bytes
| you actually use) and I/O overhead is way greater than the VM
| dispatch, trying this out is probably one of those "premature
| optimization is the root of all evil" cases, but it's
| intriguing nonetheless.
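|
| A minimal Go sketch of that "maximum laziness" idea (names
| are hypothetical; protowire is Go's real wire-format
| package): scan once to verify structure and record where each
| field's bytes live, then decode a value only when it's
| accessed.
|
|     import "google.golang.org/protobuf/encoding/protowire"
|
|     // fieldRef records where one field's bytes live in the
|     // input buffer.
|     type fieldRef struct {
|         num      protowire.Number
|         typ      protowire.Type
|         off, len int // location of the value in the buffer
|     }
|
|     // index scans a message, verifying its framing and
|     // recording offsets, without decoding any values.
|     func index(b []byte) ([]fieldRef, error) {
|         var refs []fieldRef
|         for off := 0; off < len(b); {
|             num, typ, n := protowire.ConsumeTag(b[off:])
|             if n < 0 {
|                 return nil, protowire.ParseError(n)
|             }
|             off += n
|             m := protowire.ConsumeFieldValue(num, typ, b[off:])
|             if m < 0 {
|                 return nil, protowire.ParseError(m)
|             }
|             refs = append(refs, fieldRef{num, typ, off, m})
|             off += m
|         }
|         return refs, nil
|     }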
| tonyarkles wrote:
| Based on what I know about the structure of protobufs
| internally and without having looked deep into what UPB is
| doing... I'd guess it could probably be a stack machine that
| treats (byte)+ as opcodes. Most of the time I'd think of it as
| parser -> AST -> bytecode, but I think the "grammar" of
| protobufs would allow your parser to essentially emit terminals
| as they're parsed straight to the VM as instructions to
| execute.
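|
| A tiny Go sketch of that "tag as opcode" reading: a protobuf
| tag varint packs (field_number << 3) | wire_type, so fetching
| the next "instruction" is just
|
|     tag, n := protowire.ConsumeVarint(b) // n < 0 on error
|     fieldNum := protowire.Number(tag >> 3) // the "opcode"
|     wireType := protowire.Type(tag & 7)    // operand encoding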
| UncleEntity wrote:
| In the couple of days since I posted my confusion (threads got
| merged or something), I consulted the daffy robots and figured
| out how it all works. I also had them come up with a design
| document for "a specialized compiler and virtual machine
| architecture for parsing Protocol Buffer messages that
| achieves significant performance improvements through a novel
| compilation pipeline combining protobuf-specific AST
| optimization, continuation-passing style transformations, and
| tail call interpreter execution."
|
| Interesting times we live in...
| haberman wrote:
| I think I can shed some light on this, as the creator and lead
| of upb.
|
| Calling a Protobuf parser an "interpreter VM" is a little bit
| of rhetorical flourish. It comes from the observation that
| there are some deep structural similarities between the two,
| which I first wrote about in an article a few years back:
| https://blog.reverberate.org/2021/04/21/musttail-efficient-i...
|
| > It may seem odd to compare interpreter loops to protobuf
| parsers, but the nature of the protobuf wire format makes them
| more similar than you might expect. The protobuf wire format is
| a series of tag/value pairs, where the tag contains a field
| number and wire type. This tag acts similarly to an interpreter
| opcode: it tells us what operation we need to perform to parse
| this field's data. Like interpreter opcodes, protobuf field
| numbers can come in any order, so we have to be prepared to
| dispatch to any part of the code at any time.
|
| This means that the overall structure of a protobuf parser is
| conceptually a while() loop surrounding a switch() statement,
| just like a VM interpreter.
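|
| In code, that shape looks roughly like this (a Go sketch with
| a hypothetical two-field message, using the protowire
| package; not upb's actual C implementation):
|
|     type msg struct {
|         id   int32
|         name string
|     }
|
|     func parse(b []byte, m *msg) error {
|         for len(b) > 0 { // the "interpreter loop"
|             num, typ, n := protowire.ConsumeTag(b)
|             if n < 0 {
|                 return protowire.ParseError(n)
|             }
|             b = b[n:]
|             switch num { // field number as opcode
|             case 1: // id: varint (a real parser also
|                 // validates typ here)
|                 v, n := protowire.ConsumeVarint(b)
|                 if n < 0 {
|                     return protowire.ParseError(n)
|                 }
|                 m.id, b = int32(v), b[n:]
|             case 2: // name: length-delimited
|                 v, n := protowire.ConsumeBytes(b)
|                 if n < 0 {
|                     return protowire.ParseError(n)
|                 }
|                 m.name, b = string(v), b[n:]
|             default: // unknown field: skip by wire type
|                 n := protowire.ConsumeFieldValue(num, typ, b)
|                 if n < 0 {
|                     return protowire.ParseError(n)
|                 }
|                 b = b[n:]
|             }
|         }
|         return nil
|     }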
|
| The tricky part is that the set of "case" labels for a Protobuf
| parser is message-specific and defined by the fields in the
| schema. How do we accommodate that?
|
| The traditional answer was to generate a function per message
| and use the schema's field numbers as the case labels. You can
| see an example of that here (in C++):
| https://github.com/protocolbuffers/protobuf/blob/f763a2a8608...
|
| More recently, we've moved towards making Protobuf parsing more
| data-driven, where each field's schema is compiled into _data_
| that is passed as an argument to a generic Protobuf parser
| function. We call this "table-driven parsing", and from my
| read of the blog article, I believe this is what Miguel is
| doing with hyperpb.
|
| The trick then becomes how to make this table-driven dispatch
| as fast as possible, to simulate what the switch() statement
| would have done. That question is what I cover at length in the
| article mentioned above.
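|
| A rough Go sketch of that table-driven shape (the layout is
| hypothetical; upb's real tables are far more compact, built
| from offsets and function pointers rather than closures):
|
|     // dynamicMsg stands in for whatever in-memory
|     // representation the parser fills in.
|     type dynamicMsg struct {
|         fields map[protowire.Number]uint64
|     }
|
|     // fieldEntry is one field's schema, compiled to data.
|     type fieldEntry struct {
|         typ   protowire.Type
|         parse func(m *dynamicMsg, b []byte) (int, error)
|     }
|
|     // msgTable is one table per message type, indexed by
|     // field number.
|     type msgTable map[protowire.Number]fieldEntry
|
|     // parseGeneric is the single generic parser; the schema
|     // arrives as data, not as generated code.
|     func parseGeneric(t msgTable, m *dynamicMsg, b []byte) error {
|         for len(b) > 0 {
|             num, typ, n := protowire.ConsumeTag(b)
|             if n < 0 {
|                 return protowire.ParseError(n)
|             }
|             b = b[n:]
|             if e, ok := t[num]; ok && e.typ == typ {
|                 used, err := e.parse(m, b) // table dispatch
|                 if err != nil {
|                     return err
|                 }
|                 b = b[used:]
|                 continue
|             }
|             n = protowire.ConsumeFieldValue(num, typ, b)
|             if n < 0 {
|                 return protowire.ParseError(n)
|             }
|             b = b[n:] // skipped unknown/mismatched field
|         }
|         return nil
|     }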
| anonymoushn wrote:
| Really great. I wonder, for the "types encoded as code"
| approach, is there any benefit to fast paths for data with
| fields in ascending order? For some JSON parsers with types
| encoded as code, I have observed some speedup from either
| hard-coding a known key order or assuming keys in some order
| and providing a fallback in case an unexpected key is
| encountered. For users who are stuck with protobuf forever
| because of various services using it and various data being
| encoded this way, the historical data could plausibly be
| canonicalized and written back in large chunks when it is
| accessed, so that one need not pay the entire cost of
| canonicalizing it all at once. But of course the icache
| concerns are still just as bad.
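|
| A Go sketch of that ascending-order fast path (hypothetical:
| parseGenericFallback stands in for normal table dispatch, and
| the message is assumed to be varint fields 1..len(out)):
|
|     // parseOrdered bets that fields arrive in ascending
|     // order; on the first surprise it falls back.
|     func parseOrdered(b []byte, out []uint64) error {
|         for i := range out {
|             want := protowire.AppendTag(nil,
|                 protowire.Number(i+1), protowire.VarintType)
|             if !bytes.HasPrefix(b, want) {
|                 return parseGenericFallback(b, out)
|             }
|             b = b[len(want):]
|             v, n := protowire.ConsumeVarint(b)
|             if n < 0 {
|                 return protowire.ParseError(n)
|             }
|             out[i], b = v, b[n:]
|         }
|         return nil
|     }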
| mdhb wrote:
| I'd really love to see more work bringing the best parts of
| protobuf to a standardised serialization format like CBOR.
|
| I'd make the same argument for moving gRPC-web to something
| like WHATWG streams and/or WebTransport.
|
| There are a lot of really cool and important learnings in
| both, but they're also so tied up in weird tooling and
| assumptions. Let's rebase on IETF and W3C standards.
| youngtaff wrote:
| Would be good to see support for encoding/decoding CBOR
| exposed as a browser API - they currently use CBOR internally
| for WebAuthn, so I'd hope it's not too hard.
| cyberax wrote:
| You can easily do this. Protobuf supports pluggable writers,
| and iterating over a schema is pretty easy. We do it for
| JSONB.
|
| I'm not sure of the purpose, though. Protobuf is great for
| its inflexible schema, and CBOR is great for its flexible
| data representation.
|
| A separate CBOR schema would be a better fit; there's CDDL,
| but it has no traction.
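|
| For the curious, the schema iteration looks roughly like this
| in Go via protoreflect (the emit callback is hypothetical; a
| CBOR encoder would sit behind it):
|
|     import (
|         "google.golang.org/protobuf/proto"
|         "google.golang.org/protobuf/reflect/protoreflect"
|     )
|
|     // walk visits every populated field of m (in unspecified
|     // order), handing each one to emit.
|     func walk(m proto.Message, emit func(
|         fd protoreflect.FieldDescriptor, v protoreflect.Value)) {
|         m.ProtoReflect().Range(func(fd protoreflect.FieldDescriptor,
|             v protoreflect.Value) bool {
|             emit(fd, v)
|             return true // keep iterating
|         })
|     }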
| irq-1 wrote:
| https://github.com/bufbuild/hyperpb-go
| skybrian wrote:
| This is excellent: an in-depth description showing how the Go
| internals make writing fast interpreters difficult, by someone
| who is far more determined than I ever was to make it fast
| anyway.
|
| I've assumed that writing fast interpreters wasn't a use case the
| Go team cared much about, but if it makes protobuf parsing
| faster, maybe it will get some attention, and some of these low-
| level tricks will no longer be necessary?
| alexozer wrote:
| So am I identifying the bottlenecks that motivate this design
| correctly?
|
| 1. Go FFI is slow
|
| 2. Per-proto generated code specialization is slow, because of
| icache pressure
|
| I know there's more to the optimization story here, but I guess
| these are the primary motivations for the VM over just better
| code generation or implementing a parser in non-Go?
| jeffrallen wrote:
| > hyperpb is a brand new library, written in the most cursed Go
| imaginable
|
| This made me LOL.
| dumah wrote:
| Fantastic post.
|
| Please do one on your analysis and optimization workflow and
| tooling!
| dang wrote:
| Related ongoing thread:
|
| _Hyperpb: Faster dynamic Protobuf parsing_ -
| https://news.ycombinator.com/item?id=44661785
| ryukoposting wrote:
| > Every type contributes to a cost on the instruction cache,
| meaning that if your program parses a lot of different types, it
| will essentially flush your instruction cache any time you enter
| a parser. Worse still, if a parse involves enough types, the
| parser itself will hit instruction decoding throughput issues.
|
| Interesting. This makes me wonder how nanopb would benchmark
| against the parsers shown in the graphs. nanopb's whole schtick
| is that it's pure C99 and it doesn't generate separate parsing
| functions for each message. nanopb is what a lot of embedded code
| uses, due to the small footprint.
| tschellenbach wrote:
| Last time I benchmarked msgpack and protobuf against each
| other, the difference was nearly flat for my use case. JSON
| was 2-3x slower, but msgpack and protobuf were near equal.
| Might be different now after this release, exciting :)
| Analemma_ wrote:
| Keep in mind that some of the performance bottlenecks the
| author is talking about and optimizing for show up in large-
| scale uses and possibly not in benchmarks. In particular, the
| "parsing too many different types blows away your instruction
| cache" issue will only show up if you are actually parsing lots
| of types; otherwise, UPB is not necessary.
| ptspts wrote:
| Where are the benchmarks comparing hyperpb to other proto
| parsers?
| cyberax wrote:
| I would love to benchmark it against the old gogo protobuf. One
| thing that really slows down PB is its insistence on using
| pointers everywhere. Gogo protobuf used value types where
| possible, significantly reducing the GC pressure.
|
| It feels like arenas are just an attempt to bring that back.
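|
| For reference, the layout difference is roughly this (field
| names are hypothetical; gogo's nullable=false option is what
| produced the value-type variant):
|
|     // Standard generated style: optional scalars live behind
|     // pointers, each one a separate GC-tracked allocation.
|     type PointerStyle struct {
|         ID   *int64
|         Name *string
|     }
|
|     // gogo-style value types: a single allocation and less
|     // GC pressure, at the cost of not distinguishing "unset"
|     // from the zero value.
|     type ValueStyle struct {
|         ID   int64
|         Name string
|     }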
| dgan wrote:
| Maybe if performance is really an issue, protobufs shouldn't
| be used?
|
| There are also flatbuffers, capnp, and pure C++ serializers
| (zpp, yas)... but protobufs are definitely convenient!
| nemo1618 wrote:
| There are two ways to look at this.
|
| First is that, if the parsing library for your codec includes a
| compiler, VM, and PGO, your codec must be extremely cursed and
| you should take a step back and think about your life.
|
| Second is that, if the parsing library for your codec includes a
| compiler, VM, and PGO, your codec must be wildly popular and
| add enormous value.
___________________________________________________________________
(page generated 2025-07-23 23:00 UTC)