[HN Gopher] Go Protobuf: The New Opaque API
       ___________________________________________________________________
        
       Go Protobuf: The New Opaque API
        
       Author : secure
       Score  : 111 points
       Date   : 2024-12-16 20:18 UTC (2 hours ago)
        
 (HTM) web link (go.dev)
 (TXT) w3m dump (go.dev)
        
       | kubb wrote:
       | I hate this API and Go's handling of protocol buffers in general.
       | Especially preparing test data for it makes for some of the most
       | cumbersome and unwieldy files that you will ever come across.
        | Combined with table-driven testing, you have thousands upon
        | thousands of lines of data, full of unbelievably long
        | identifiers that can't be inferred (e.g. in array literals),
        | usually copy-pasted around and slightly changed. Updating and
        | understanding all of that is a nightmare, and if you miss a
        | comma or a brace somewhere, the compiler isn't smart enough to
        | point you to where, so you get lines upon lines of syntax
        | errors. But being opaque has some advantages for sure.
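        | 
        | For illustration, a rough sketch of what that test data ends up
        | looking like (the examplepb package and its fields here are
        | made up):
        | 
        |     tests := []struct {
        |         name string
        |         req  *examplepb.CreateWidgetRequest
        |         want *examplepb.CreateWidgetResponse
        |     }{
        |         {
        |             name: "widget with default settings",
        |             req: &examplepb.CreateWidgetRequest{
        |                 Widget: &examplepb.Widget{
        |                     DisplayName: "test-widget",
        |                     Labels:      map[string]string{"env": "test"},
        |                 },
        |             },
        |             want: &examplepb.CreateWidgetResponse{ /* ... */ },
        |         },
        |         // ...hundreds of similar cases, mostly copy-pasted
        |         // and slightly tweaked.
        |     }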
        
         | throwaway894345 wrote:
         | I haven't used protocol buffers, but in general any kind of
         | code generation produces awful code. I much prefer generating
         | the machine spec (protocol buffers, in this case) from Go code
         | rather than the other way around. It's not a perfect solution,
         | but it's much better than dealing with generated code in my
         | experience.
        
       | alakra wrote:
       | Is this like the FlatBuffers "zero-copy" deserialization?
        
         | mort96 wrote:
         | I'm not done reading the article yet, but nothing so far
         | indicates that this is zero-copy, just a more efficient
          | internal representation.
        
         | kyrra wrote:
         | Nope. This is just a different implementation that greatly
         | improves the speed in various ways.
        
       | strawhatguy wrote:
       | Great, now there's an API per struct/message to learn and
       | communicate throughout the codebase, with all the getters and
       | setters.
       | 
       | A given struct is probably faster for protobuf parsing in the new
       | layout, but the complexity of the code probably increases, and I
       | can see this complexity easily negating these gains.
        
         | hellcow wrote:
         | I'd recommend transforming protobuf types to domain types at
         | your API boundary. Then you have domain types through the whole
         | application.
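          | 
          | A minimal sketch of what that boundary can look like (pb.User
          | and its fields are hypothetical; the setters assume the Opaque
          | API, the getters work with either API):
          | 
          |     // Domain type used everywhere inside the application.
          |     type User struct {
          |         ID    string
          |         Email string
          |     }
          | 
          |     // Conversions live only at the RPC boundary.
          |     func userFromProto(m *pb.User) User {
          |         return User{ID: m.GetId(), Email: m.GetEmail()}
          |     }
          | 
          |     func userToProto(u User) *pb.User {
          |         m := &pb.User{}
          |         m.SetId(u.ID)
          |         m.SetEmail(u.Email)
          |         return m
          |     }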
        
           | mxey wrote:
            | At which point I lose all the benefits of lazy decoding that
           | the accessor methods can provide, so I could just decode
           | directly into a sensible struct, except you can't with
           | Protobuf.
        
             | mort96 wrote:
             | Accessor methods aren't for lazy decoding but for more
             | efficient memory layouts.
        
               | mxey wrote:
               | But that will also not transfer over to the domain struct
        
         | secure wrote:
         | > Great, now there's an API per struct/message to learn and
         | communicate throughout the codebase, with all the getters and
         | setters.
         | 
         | No, the general idea (and practical experience, at least for
         | projects within Google) is that a codebase migrates completely
         | from one API level to another. Only larger code bases will have
         | to deal with different API levels. Even in such cases, your
         | policy can remain "always use the Open API" unless you are
         | interested in picking up the performance gains of the Opaque
         | API.
        
         | mort96 wrote:
         | I mean calling it "a new API per message" is a bit of an
         | exaggeration... the "API" per message is still the same:
         | something with some set of attributes. It's just that those
         | attributes are now set and accessed with getters and setters
         | (with predictable names) rather than as struct fields. Once you
         | know how to access fields on protobuf types in general, all
         | message-specific info you need is which fields exist and what
         | their types are, which was the case before too.
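          | 
          | Roughly, with a made-up message (treat the exact generated
          | names as a sketch):
          | 
          |     // Open Struct API: exported struct fields.
          |     e := &pb.LogEntry{Hostname: proto.String("example.net")}
          |     log.Println(e.GetHostname())
          | 
          |     // Opaque API: the same field, reached through generated
          |     // accessors with predictable names.
          |     e2 := &pb.LogEntry{}
          |     e2.SetHostname("example.net")
          |     log.Println(e2.GetHostname(), e2.HasHostname())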
        
         | jrockway wrote:
         | I always used the getters anyway. Given:
          | 
          |     message M {
          |       string foo = 1;
          |     }
          | 
          |     message N {
          |       M bar = 2;
          |     }
         | 
          | I find (new(N)).Bar.Foo panicking pretty annoying. So I just
          | made it a habit to n.GetBar().GetFoo() anyway. If
          | n.GetBar().SetFoo() works with the new API, that would be an
          | improvement.
         | 
         | There are some options like nilaway if you want static analysis
         | to prevent you from writing this sort of code, but it's
         | difficult to retrofit into an existing codebase that plays a
         | little too fast and loose with nil values. Having code authors
         | and code reviewers do the work is simpler, though probably less
         | accurate.
         | 
         | The generated code's API has never really bothered me. It is
         | flexible enough to be clever. I especially liked using proto3
         | for data types and then storing them in a kv store with an API
          | like:
          | 
          |     type WithID interface { GetId() []byte }
          | 
          |     func Put(tx *Tx, x WithID) error { ... }
          |     func Get(tx *Tx, id []byte) (WithID, error) { ... }
         | 
         | The autogenerated API is flexible enough for this sort of
         | shenanigan, though it's not something I would recommend except
         | to have fun.
        
       | jeffbee wrote:
       | Protobuf 3 was bending over backwards to try to make the Go API
       | make sense, but in the process it screwed up the API for C++,
       | with many compromises. Then they changed course and made presence
       | explicit again in proto 3.1. Now they are saying Go gets a
       | C++-like API.
       | 
       | What I'd like is to rewind the time machine and undo all the
       | path-dependent brain damage.
        
         | sa46 wrote:
         | When I was at Google around 2016, there was a significant push
         | to convince folks that the proto3 implicit presence was
         | superior to explicit presence.
         | 
         | Is there a design doc with the rationale for switching back to
         | explicit presence for Edition 2023?
         | 
         | The closest docs I've found are
          | https://buf.build/blog/protobuf-editions-are-here and
          | https://github.com/protocolbuffers/protobuf/tree/main/docs/d....
        
           | jeffbee wrote:
           | I was only there for the debate you mentioned and not there
           | for the reversal, so I dunno.
        
         | jcdavis wrote:
         | > it screwed up the API for C++, with many compromises
         | 
         | The implicit presence garbage screwed up the API for many
         | languages, not just C++
         | 
         | What is wild is how obviously silly it was at the time, too -
         | no hindsight was needed.
        
         | dekhn wrote:
          | I work mainly in Python, and it's always seemed really bad
          | that there are 3 main implementations of Protobufs, instead
          | of the C++ one being the real implementation and other
          | platforms just
         | dlopen'ing and using it (there are a million software
         | engineering arguments around this; I've heard them all before,
         | have my own opinions, and have listened to the opinions of
         | people I disagree with). It seems like the velocity of a
         | project is the reciprocal of the number of independent
         | implementations of a spec because any one of the
         | implementations can slow down all the implementations (like
         | what happened with proto3 around required and optional).
         | 
         | From what I can tell, a major source of the problem was that
         | protobuf field semantics were absolutely critical to the
         | scaling of google in the early days (as an inter-server
         | protocol for rapidly evolving things like the search stack),
         | but it's also being used as a data modelling toolkit (as a way
         | of representing data with a high level of fidelity). And those
         | two groups- along with the multiple language developers who
         | don't want to deal with native code- do not see eye to eye, and
         | want to drive the spec in their preferred direction.
         | 
         | (FWIW nowadays I use pydantic for type descriptions and JSON
         | for transport, but I really prefer having an external IDL
         | unrelated to any specific programming language)
        
         | sbrother wrote:
         | I still use proto2 if possible. The syntactic sugar around
         | `oneof` wasn't nice enough to merit dealing with proto3's
         | implicit presence -- maybe it is just because I learned proto2
         | with C++ and don't use Go, but proto3 just seemed like a big
         | step back and introduced footguns that weren't there before.
         | Happy to hear they are reverting some of those finally.
        
       | kyrra wrote:
       | The opaque API brings some niceties that other languages have,
       | specifically about initialization. The Java impl for protobuf
       | will never generate a NullPointerException, as calling `get` on a
       | field would just return the default instance of that field.
       | 
        | The Go Open API did not do this. For many primitive types, it
        | was fine. But for protobuf maps, you had to check whether the
        | map had been initialized in Go code before writing to it.
        | Meaning, with the Opaque API, you can just start adding items to
        | a proto map in Go code without thinking about initialization (as
        | the Opaque impl will init the map for you).
       | 
       | This is honestly something I wish Go itself would do. Allowing
       | for nil maps in Go is such a footgun.
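        | 
        | The underlying Go footgun, for reference:
        | 
        |     var m map[string]int      // nil map
        |     _ = m["missing"]          // reads are fine: zero value
        |     m["hits"]++               // panics: write to nil map
        | 
        |     m = make(map[string]int)  // you have to remember this first
        |     m["hits"]++               // now fine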
        
         | ynniv wrote:
         | _The Java impl for protobuf will never generate a
         | NullPointerException, as calling `get` on a field would just
         | return the default instance of that field._
         | 
         | This was a mistake. You still want to check whether it was
         | initialized most of the time, and when you do the wrong thing
         | it's even more difficult to see the error.
        
           | kyrra wrote:
           | Depends on your use. If you are parsing a message you just
           | received, I agree that you want to do a "has" check before
           | accessing a field. But when constructing a message, having to
           | manually create all the options is really annoying. (I do
            | love the Java builder pattern for protos).
           | 
           | But I do know the footgun of calling "get" on a Java Proto
           | Builder without setting it, as that actually initializes the
            | field to empty, and could cause it to be emitted as such.
           | 
           | Such are the tradeoffs. I'd prefer null-safety to accidental
           | field setting (or thinking a field was set, when it really
           | wasn't).
        
             | tantalor wrote:
             | > you want to do a "has" check before accessing a field
             | 
             | You should only do that if the semantics of the not-set
             | field are different than the default value, which should be
             | rare and documented on the field.
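              | 
              | In other words, something like this (hypothetical fields,
              | Opaque-API-style accessors):
              | 
              |     // Typical field: the default is a fine answer, so
              |     // no "has" check is needed.
              |     timeout := req.GetTimeoutSeconds() // 0 == no timeout
              | 
              |     // Rare, documented exception: unset and 0 differ.
              |     pageSize := int32(50) // default when absent
              |     if req.HasPageSize() {
              |         pageSize = req.GetPageSize()
              |     }
              |     search(timeout, pageSize) // hypothetical helper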
        
       | dpeckett wrote:
       | To be honest I kind of find myself drifting away from
       | gRPC/protobuf in my recent projects. I love the idea of an IDL
       | for describing APIs and a great compiler/codegen (protoc) but
        | there's just so many idiosyncrasies baked into gRPC at this
       | point that it often doesn't feel worth it IMO.
       | 
        | Been increasingly using LSP-style JSON-RPC 2.0. Sure, it's got
        | its quirks and is far from the most wire/marshaling-efficient
        | approach, but JSON codecs are ubiquitous and JSON-RPC is trivial
        | to implement. In fact, I recently even wrote a stack-allocated
        | server implementation for microcontrollers in Rust:
       | https://github.com/OpenPSG/embedded-jsonrpc.
       | 
        | Varlink (https://varlink.org/) is another interesting approach;
        | there are reasons why they didn't implement the full JSON-RPC
        | spec, but their IDL is pretty interesting.
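        | 
        | For a sense of how small the JSON-RPC 2.0 surface is, the whole
        | envelope fits in a few structs (Go sketch, field names straight
        | from the spec):
        | 
        |     type Request struct {
        |         JSONRPC string          `json:"jsonrpc"` // always "2.0"
        |         Method  string          `json:"method"`
        |         Params  json.RawMessage `json:"params,omitempty"`
        |         // ID is absent for notifications.
        |         ID *json.RawMessage `json:"id,omitempty"`
        |     }
        | 
        |     type Response struct {
        |         JSONRPC string           `json:"jsonrpc"`
        |         Result  json.RawMessage  `json:"result,omitempty"`
        |         Error   *RPCError        `json:"error,omitempty"`
        |         ID      *json.RawMessage `json:"id"`
        |     }
        | 
        |     type RPCError struct {
        |         Code    int             `json:"code"`
        |         Message string          `json:"message"`
        |         Data    json.RawMessage `json:"data,omitempty"`
        |     }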
        
         | malkia wrote:
          | Apart from being a text format, I'm not sure how well JSON-RPC
          | handles doubles vs. long integers and other types, where
          | protobuf can be directed to handle them appropriately. That is
          | a problem in JSON itself, so you may need to encode some
          | numbers using... "string".
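          | 
          | FWIW, protobuf's own JSON mapping does exactly that: 64-bit
          | integers are encoded as decimal strings. In Go the same trick
          | is one struct tag away (sketch using encoding/json):
          | 
          |     type Account struct {
          |         // ",string" makes encoding/json read/write this
          |         // int64 as a JSON string, so JavaScript clients
          |         // don't silently lose precision.
          |         Balance int64 `json:"balance,string"`
          |     }
          | 
          |     b, _ := json.Marshal(Account{Balance: 9007199254740993})
          |     fmt.Println(string(b)) // {"balance":"9007199254740993"}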
        
           | dpeckett wrote:
           | I'd say the success of REST kind of proves that's something
            | that for the most part can be worked around. It often comes
            | down to the JSON codec itself; many codecs will allow
           | unmarshalling/marshalling fields straight into long int
           | types.
           | 
           | Also JS now has BigInt types and the JSON decoder can be told
           | to use them. So I'd argue it's kind of a moot point at this
           | stage.
        
       | tonymet wrote:
        | Why is code generation under-utilized? Protobufs and other Go
        | tooling are great for code generation. Yet in practice I see few
        | teams using it at scale.
       | 
       | Lots of teams creating rest / json APIs, but very few who use
       | code generation to provide compile-time protection.
        
         | kevmo314 wrote:
         | Code generation leaves a layer of abstraction between the API
         | and the actual implementation which works great if that code
         | generation is bug-free but if it's not, you're like... totally
         | fucked. Most commonly people say you can read the generated
         | code and step backwards but that's like saying you can read the
         | compiled JavaScript and it's basically open source. That layer
         | of abstraction is an underrated mental barrier.
         | 
         | Of course, code generation is still practical and I'm a lot
          | more likely to trust a third party writing a code generator
         | like protobufs, OpenAPI specs, etc, but I would not trust an
         | internal team to do so without a very good reason. I've worked
         | on a few projects that lost hundreds of dev hours trying to
         | maintain their code generator to avoid a tiny bit of
         | copy/paste.
        
         | kccqzy wrote:
          | Code generation is under-utilized because most people don't
         | have a build system good enough for it. Traditional make is
         | fine: you just define dependencies and rules. But a lot of
         | people want to use language-specific build systems and these
         | often don't have good support for code generation and
         | dependency tracking for generated code.
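          | 
          | In Go projects the usual low-effort answer is a go:generate
          | directive, which at least keeps the command next to the code,
          | even though it does nothing for dependency tracking (sketch):
          | 
          |     // gen.go
          |     package api
          | 
          |     //go:generate protoc --go_out=. api.proto
          | 
          | Then it's `go generate ./...` before building.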
         | 
          | Yet another subtlety is that when cross-compiling, you always
          | need to build the code generation tool for the local (host)
          | architecture, even though the main target could be a foreign
          | architecture. And
         | because the code generation tool and the main code could share
         | dependencies, these dependencies need to be built twice for
         | different targets. That again is something many build tools
         | don't support.
        
       | lakomen wrote:
        | GraphQL won the race for me. gRPC is no longer relevant. Too many
        | hurdles, and no proper support for calling to and from the web.
        | You have to use some third-party non-free service.
        
         | nicce wrote:
          | Aren't their use cases completely different?
        
           | asmor wrote:
           | Intersects quite heavily if you're defining a schema for your
           | API
        
       | g0ld3nrati0 wrote:
        | Just curious, why do you use protobuf instead of FlatBuffers?
        
       ___________________________________________________________________
       (page generated 2024-12-16 23:00 UTC)