[HN Gopher] Go Protobuf: The New Opaque API
___________________________________________________________________
Go Protobuf: The New Opaque API
Author : secure
Score : 111 points
Date : 2024-12-16 20:18 UTC (2 hours ago)
(HTM) web link (go.dev)
(TXT) w3m dump (go.dev)
| kubb wrote:
| I hate this API and Go's handling of protocol buffers in general.
| Preparing test data for it in particular makes for some of the
| most cumbersome and unwieldy files you will ever come across.
| Combined with table-driven testing, you have thousands upon
| thousands of lines of data with unbelievably long identifiers
| that can't be inferred (e.g. in array literals) and that are
| usually copy-pasted around and slightly changed. Updating and
| understanding all of that is a nightmare, and if you miss a comma
| or a brace somewhere, the compiler isn't smart enough to point
| you to where, so you get lines upon lines of syntax errors. But
| being opaque has some advantages, for sure.
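|
| (A minimal sketch of the kind of test table this describes,
| assuming the standard testing package; the examplepb types and
| fields are hypothetical:)
|
|     func TestHandler(t *testing.T) {
|         tests := []struct {
|             name string
|             req  *examplepb.CreateWidgetRequest
|         }{
|             {
|                 name: "basic",
|                 req: &examplepb.CreateWidgetRequest{
|                     // Nested messages spell out their full
|                     // generated type names every time; they
|                     // can't be inferred here.
|                     Widget: &examplepb.Widget{
|                         Spec: &examplepb.WidgetSpec{Name: "w"},
|                     },
|                 },
|             },
|             // ...hundreds more entries like this one...
|         }
|         for _, tc := range tests {
|             t.Run(tc.name, func(t *testing.T) { _ = tc.req })
|         }
|     }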
| throwaway894345 wrote:
| I haven't used protocol buffers, but in general any kind of
| code generation produces awful code. I much prefer generating
| the machine spec (protocol buffers, in this case) from Go code
| rather than the other way around. It's not a perfect solution,
| but it's much better than dealing with generated code in my
| experience.
| alakra wrote:
| Is this like the FlatBuffers "zero-copy" deserialization?
| mort96 wrote:
| I'm not done reading the article yet, but nothing so far
| indicates that this is zero-copy, just a more efficient
| internal representation
| kyrra wrote:
| Nope. This is just a different implementation that greatly
| improves the speed in various ways.
| strawhatguy wrote:
| Great, now there's an API per struct/message to learn and
| communicate throughout the codebase, with all the getters and
| setters.
|
| A given struct is probably faster for protobuf parsing in the new
| layout, but the complexity of the code probably increases, and I
| can see this complexity easily negating these gains.
| hellcow wrote:
| I'd recommend transforming protobuf types to domain types at
| your API boundary. Then you have domain types through the whole
| application.
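|
| (For illustration, a sketch of such a boundary conversion; the
| generated pb.User and the domain User type are made up:)
|
|     // Domain type owned by the application; no protobuf
|     // imports anywhere past this point.
|     type User struct {
|         ID   string
|         Name string
|     }
|
|     // Convert exactly once, at the API boundary.
|     func userFromProto(p *pb.User) User {
|         return User{
|             ID:   p.GetId(),
|             Name: p.GetName(),
|         }
|     }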
| mxey wrote:
| At which point I lose all the benefits of lazy decoding that
| the accessor methods can provide, so I could just decode
| directly into a sensible struct, except you can't with
| Protobuf.
| mort96 wrote:
| Accessor methods aren't for lazy decoding but for more
| efficient memory layouts.
| mxey wrote:
| But that will also not transfer over to the domain struct
| secure wrote:
| > Great, now there's an API per struct/message to learn and
| communicate throughout the codebase, with all the getters and
| setters.
|
| No, the general idea (and practical experience, at least for
| projects within Google) is that a codebase migrates completely
| from one API level to another. Only larger code bases will have
| to deal with different API levels. Even in such cases, your
| policy can remain "always use the Open API" unless you are
| interested in picking up the performance gains of the Opaque
| API.
| mort96 wrote:
| I mean calling it "a new API per message" is a bit of an
| exaggeration... the "API" per message is still the same:
| something with some set of attributes. It's just that those
| attributes are now set and accessed with getters and setters
| (with predictable names) rather than as struct fields. Once you
| know how to access fields on protobuf types in general, all
| message-specific info you need is which fields exist and what
| their types are, which was the case before too.
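|
| (Concretely, given m, a pointer to a message with a string
| field "query"; a sketch, not the exact generated code:)
|
|     // Open struct API: direct field access.
|     m.Query = "foo"
|     q := m.Query
|
|     // Opaque API: the same field, through accessors with
|     // predictable names.
|     m.SetQuery("foo")
|     q = m.GetQuery()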
| jrockway wrote:
| I always used the getters anyway. Given:
|     message M { string foo = 1; }
|     message N { M bar = 2; }
|
| I find (new(N)).Bar.Foo panicking pretty annoying. So I just
| made it a habit to call m.GetBar().GetFoo() anyway. If
| m.GetBar().SetFoo() works with the new API, that would be an
| improvement.
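|
| (Sketched with the M and N messages above; in the open struct
| API the generated getters are nil-safe, plain field access is
| not:)
|
|     n := new(N)
|     _ = n.GetBar().GetFoo() // fine: getters tolerate nil
|     _ = n.Bar.Foo           // panics: n.Bar is a nil *M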
|
| There are some options like nilaway if you want static analysis
| to prevent you from writing this sort of code, but it's
| difficult to retrofit into an existing codebase that plays a
| little too fast and loose with nil values. Having code authors
| and code reviewers do the work is simpler, though probably less
| accurate.
|
| The generated code's API has never really bothered me. It is
| flexible enough to be clever. I especially liked using proto3
| for data types and then storing them in a kv store with an API
| like:
|
|     type WithID interface { GetId() []byte }
|     func Put(tx *Tx, x WithID) error { ... }
|     func Get(tx *Tx, id []byte) (WithID, error) { ... }
|
| The autogenerated API is flexible enough for this sort of
| shenanigan, though it's not something I would recommend except
| to have fun.
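|
| (A sketch of how Put might look, assuming proto.Marshal from
| google.golang.org/protobuf/proto; Tx and its Set method are
| hypothetical:)
|
|     func Put(tx *Tx, x WithID) error {
|         m, ok := x.(proto.Message)
|         if !ok {
|             return fmt.Errorf("not a proto.Message: %T", x)
|         }
|         b, err := proto.Marshal(m)
|         if err != nil {
|             return err
|         }
|         return tx.Set(x.GetId(), b)
|     }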
| jeffbee wrote:
| Protobuf 3 was bending over backwards to try to make the Go API
| make sense, but in the process it screwed up the API for C++,
| with many compromises. Then they changed course and made presence
| explicit again in protobuf 3.15. Now they are saying Go gets a
| C++-like API.
|
| What I'd like is to rewind the time machine and undo all the
| path-dependent brain damage.
| sa46 wrote:
| When I was at Google around 2016, there was a significant push
| to convince folks that the proto3 implicit presence was
| superior to explicit presence.
|
| Is there a design doc with the rationale for switching back to
| explicit presence for Edition 2023?
|
| The closest docs I've found are
| https://buf.build/blog/protobuf-editions-are-here and
| https://github.com/protocolbuffers/protobuf/tree/main/docs/d....
| jeffbee wrote:
| I was only there for the debate you mentioned and not there
| for the reversal, so I dunno.
| jcdavis wrote:
| > it screwed up the API for C++, with many compromises
|
| The implicit presence garbage screwed up the API for many
| languages, not just C++
|
| What is wild is how obviously silly it was at the time, too -
| no hindsight was needed.
| dekhn wrote:
| I work mainly in Python, and it has always seemed really bad
| that there are 3 main implementations of Protobufs, instead of
| C++ being the real implementation and other platforms just
| dlopen'ing and using it (there are a million software
| engineering arguments around this; I've heard them all before,
| have my own opinions, and have listened to the opinions of
| people I disagree with). It seems like the velocity of a
| project is the reciprocal of the number of independent
| implementations of a spec, because any one of the
| implementations can slow down all the others (like what
| happened with proto3 around required and optional).
|
| From what I can tell, a major source of the problem was that
| protobuf field semantics were absolutely critical to the
| scaling of Google in the early days (as an inter-server
| protocol for rapidly evolving things like the search stack),
| but it's also being used as a data modelling toolkit (as a way
| of representing data with a high level of fidelity). And those
| two groups - along with the multiple language developers who
| don't want to deal with native code - do not see eye to eye, and
| want to drive the spec in their preferred direction.
|
| (FWIW nowadays I use pydantic for type descriptions and JSON
| for transport, but I really prefer having an external IDL
| unrelated to any specific programming language)
| sbrother wrote:
| I still use proto2 if possible. The syntactic sugar around
| `oneof` wasn't nice enough to merit dealing with proto3's
| implicit presence -- maybe it is just because I learned proto2
| with C++ and don't use Go, but proto3 just seemed like a big
| step back and introduced footguns that weren't there before.
| Happy to hear they are reverting some of those finally.
| kyrra wrote:
| The opaque API brings some niceties that other languages have,
| specifically about initialization. The Java impl for protobuf
| will never throw a NullPointerException, as calling `get` on a
| field would just return the default instance of that field.
|
| The Go Open API did not do this. For many primitive types, it was
| fine. But for protobuf maps, you had to check whether the map had
| been initialized in Go code before accessing it. Meaning, with
| the Opaque API you can just start adding items to a proto map in
| Go code without thinking about initialization (as the Opaque
| implementation will initialize the map for you).
|
| This is honestly something I wish Go itself would do. Allowing
| for nil maps in Go is such a footgun.
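|
| (The footgun, for anyone who hasn't hit it; plain Go, no
| protobuf involved:)
|
|     var counts map[string]int
|     _ = counts["a"] // reading a nil map is fine (zero value)
|     counts["a"] = 1 // panics: assignment to entry in nil map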
| ynniv wrote:
| _The Java impl for protobuf will never generate a
| NullPointerException, as calling `get` on a field would just
| return the default instance of that field._
|
| This was a mistake. You still want to check whether it was
| initialized most of the time, and when you do the wrong thing
| it's even more difficult to see the error.
| kyrra wrote:
| Depends on your use. If you are parsing a message you just
| received, I agree that you want to do a "has" check before
| accessing a field. But when constructing a message, having to
| manually create all the options is really annoying. (I do
| love the Java builder pattern for protos.)
|
| But I do know the footgun of calling "get" on a Java Proto
| Builder without setting it, as that actually initializes the
| field to empty, and could cause it to be emitted as such.
|
| Such are the tradeoffs. I'd prefer null-safety to accidental
| field setting (or thinking a field was set, when it really
| wasn't).
| tantalor wrote:
| > you want to do a "has" check before accessing a field
|
| You should only do that if the semantics of the not-set
| field are different than the default value, which should be
| rare and documented on the field.
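|
| (For example, with an explicit-presence field where absence
| carries meaning; HasEndTime/GetEndTime stand in for whatever
| accessors your message generates:)
|
|     // "not set" means the task is still running, which no
|     // default value could express.
|     if m.HasEndTime() {
|         markFinished(m.GetEndTime())
|     }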
| dpeckett wrote:
| To be honest, I find myself drifting away from gRPC/protobuf in
| my recent projects. I love the idea of an IDL for describing
| APIs and a great compiler/codegen (protoc), but there are just
| so many idiosyncrasies baked into gRPC at this point that it
| often doesn't feel worth it, IMO.
|
| I've been increasingly using LSP-style JSON-RPC 2.0. Sure, it
| has its quirks and is far from the most wire/marshaling
| efficient approach, but JSON codecs are ubiquitous and JSON-RPC
| is trivial to implement. In fact, I recently wrote a
| stack-allocated server implementation for microcontrollers in
| Rust: https://github.com/OpenPSG/embedded-jsonrpc.
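|
| (The whole JSON-RPC 2.0 request envelope fits in a few lines,
| which is what makes it trivial to implement; sketched here as a
| Go type, assuming encoding/json:)
|
|     type Request struct {
|         JSONRPC string          `json:"jsonrpc"` // always "2.0"
|         Method  string          `json:"method"`
|         Params  json.RawMessage `json:"params,omitempty"`
|         // A missing ID marks a notification: no response
|         // is expected.
|         ID *json.RawMessage `json:"id,omitempty"`
|     }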
|
| Varlink (https://varlink.org/) is another interesting approach;
| there are reasons why they didn't implement the full JSON-RPC
| spec, but their IDL is pretty interesting.
| malkia wrote:
| Apart from being a text format, I'm not sure how well JSON-RPC
| handles doubles vs. long integers and other types, where
| protobuf can be directed to handle them appropriately. That is
| a problem in JSON itself, so you may need to encode some
| numbers using... "string".
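|
| (That is indeed the standard workaround: protobuf's canonical
| JSON mapping encodes int64/uint64 as strings, and Go's
| encoding/json can do the same via a struct tag:)
|
|     type Event struct {
|         // Marshals as {"id":"9007199254740993"}, so the value
|         // survives JavaScript's float64-based number type.
|         ID int64 `json:"id,string"`
|     }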
| dpeckett wrote:
| I'd say the success of REST kind of proves that this is
| something that can, for the most part, be worked around. It
| often comes down to the JSON codec itself; many codecs will
| unmarshal/marshal fields straight into long integer types.
|
| Also, JS now has a BigInt type, and the JSON decoder can be
| told to use it. So I'd argue it's kind of a moot point at this
| stage.
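|
| (On the Go side, for instance, the stock decoder can be told
| not to force every number into float64; resp is a hypothetical
| *http.Response:)
|
|     dec := json.NewDecoder(resp.Body)
|     dec.UseNumber() // numbers decode as json.Number
|     var v map[string]any
|     if err := dec.Decode(&v); err != nil {
|         return err
|     }
|     id, _ := v["id"].(json.Number).Int64() // full 64-bit range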
| tonymet wrote:
| Why is code generation under-utilized? Protobufs and other Go
| tooling are great for code generation, yet in practice I see few
| teams using it at scale.
|
| Lots of teams create REST/JSON APIs, but very few use code
| generation to provide compile-time protection.
| kevmo314 wrote:
| Code generation leaves a layer of abstraction between the API
| and the actual implementation, which works great if that code
| generation is bug-free, but if it's not, you're like... totally
| fucked. Most commonly, people say you can read the generated
| code and step backwards, but that's like saying compiled
| JavaScript is basically open source because you can read it.
| That layer of abstraction is an underrated mental barrier.
|
| Of course, code generation is still practical, and I'm a lot
| more likely to trust a third party writing a code generator
| like protobufs, OpenAPI specs, etc., but I would not trust an
| internal team to do so without a very good reason. I've worked
| on a few projects that lost hundreds of dev hours trying to
| maintain their code generator to avoid a tiny bit of
| copy/paste.
| kccqzy wrote:
| Code generation is underutilized because most people don't
| have a build system good enough for it. Traditional make is
| fine: you just define dependencies and rules. But a lot of
| people want to use language-specific build systems, and these
| often don't have good support for code generation and
| dependency tracking for generated code.
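|
| (Go itself illustrates this: the conventional go:generate hook
| is run by hand and is not dependency-tracked, so regenerating
| stale code is left to whoever remembers to run
| "go generate ./...":)
|
|     //go:generate protoc --go_out=. api.proto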
|
| Yet another subtlety is that when cross-compiling, you always
| need to build the code generation tool for the host, even
| though the main target could be a foreign architecture. And
| because the code generation tool and the main code could share
| dependencies, those dependencies need to be built twice, for
| different targets. That again is something many build tools
| don't support.
| lakomen wrote:
| GraphQL won the race for me. gRPC is no longer relevant: too many
| hurdles, no proper to-and-from-Web support. You have to use some
| third-party, non-free service.
| nicce wrote:
| Aren't their use cases completely different?
| asmor wrote:
| They intersect quite heavily if you're defining a schema for
| your API.
| g0ld3nrati0 wrote:
| Just curious: why do you use protobuf instead of FlatBuffers?
___________________________________________________________________
(page generated 2024-12-16 23:00 UTC)