[HN Gopher] A new ProtoBuf generator for Go
___________________________________________________________________
A new ProtoBuf generator for Go
Author : tanoku
Score : 206 points
Date : 2021-06-03 17:50 UTC (5 hours ago)
(HTM) web link (vitess.io)
(TXT) w3m dump (vitess.io)
| jzelinskie wrote:
| I hadn't realized that Gogo was in such a bad spot with the
| upstream Go protobuf changes. There was lots of drama when the
| changes were made and I guess that overshadowed any visibility I had
| into Gogo.
|
| Making vtprotobuf an additional protoc plugin seems like the
| Right Thing(tm), although it's a shame how complicated protoc
| commands end up becoming for mature projects. I'm pretty tempted
| to port Authzed over to this and run some benchmarks -- our
| entire service requires e2e latency under 20ms, so every little
| bit counts. The biggest performance win is likely just having an
| unintrusive interface for pooling allocated protos.
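The "unintrusive interface for pooling allocated protos" mentioned above can be approximated with the standard library's `sync.Pool`. This is a minimal sketch that assumes a hypothetical `Request` message with a hand-written `Reset`; `getRequest` and `putRequest` are illustrative helpers, not vtprotobuf's generated pooling API.

```go
package main

import (
	"fmt"
	"sync"
)

// Request is a hypothetical stand-in for a generated protobuf message type.
type Request struct {
	ID     uint64
	Tags   []string
	Labels map[string]string
}

// Reset clears the message but keeps the slice capacity and the map,
// so the next user of a pooled instance can reuse that memory.
func (r *Request) Reset() {
	r.ID = 0
	r.Tags = r.Tags[:0]
	for k := range r.Labels {
		delete(r.Labels, k)
	}
}

var requestPool = sync.Pool{
	New: func() interface{} { return &Request{Labels: make(map[string]string)} },
}

// getRequest and putRequest are hypothetical helpers, not a real library API.
func getRequest() *Request { return requestPool.Get().(*Request) }

func putRequest(r *Request) {
	r.Reset()
	requestPool.Put(r)
}

func main() {
	r := getRequest()
	r.ID = 42
	r.Tags = append(r.Tags, "a", "b")
	r.Labels["env"] = "prod"
	fmt.Println(r.ID, len(r.Tags), len(r.Labels))
	putRequest(r) // r must not be used after this point
}
```

The design choice is that `Reset` keeps allocated capacity, which is what makes reuse cheaper than allocating a fresh message per request.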
| jeffbee wrote:
| Proto message unmarshal in Go for a small message should be 5
| orders of magnitude below 20ms, shouldn't even begin to matter
| until you are sweating individual microseconds.
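To get a feel for the scale jeffbee describes, here is a hand-written, reflection-free decoder for a hypothetical two-field message (`uint64 id = 1; string name = 2;`), timed with a plain loop. It is a sketch of the protobuf wire format (each field starts with a varint key `field<<3 | wiretype`), not the output of any generator, and the printed per-message time depends entirely on the machine.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

// Msg mirrors a hypothetical schema: message Msg { uint64 id = 1; string name = 2; }
type Msg struct {
	ID   uint64
	Name string
}

var errMalformed = errors.New("malformed message")

// unmarshalMsg walks the wire format directly: read a varint key, split it
// into field number and wire type, then decode the value that follows.
func unmarshalMsg(b []byte, m *Msg) error {
	for len(b) > 0 {
		key, n := binary.Uvarint(b)
		if n <= 0 {
			return errMalformed
		}
		b = b[n:]
		field, wire := key>>3, key&7
		switch {
		case field == 1 && wire == 0: // varint
			v, n := binary.Uvarint(b)
			if n <= 0 {
				return errMalformed
			}
			m.ID, b = v, b[n:]
		case field == 2 && wire == 2: // length-delimited
			l, n := binary.Uvarint(b)
			if n <= 0 || l > uint64(len(b)-n) {
				return errMalformed
			}
			m.Name = string(b[n : n+int(l)])
			b = b[n+int(l):]
		case wire == 0: // unknown varint field: skip it
			_, n := binary.Uvarint(b)
			if n <= 0 {
				return errMalformed
			}
			b = b[n:]
		default: // other wire types are out of scope for this sketch
			return errMalformed
		}
	}
	return nil
}

func main() {
	// id=150, name="hi": 0x08 0x96 0x01 | 0x12 0x02 'h' 'i'
	buf := []byte{0x08, 0x96, 0x01, 0x12, 0x02, 'h', 'i'}
	const iters = 1_000_000
	var m Msg
	start := time.Now()
	for i := 0; i < iters; i++ {
		if err := unmarshalMsg(buf, &m); err != nil {
			panic(err)
		}
	}
	elapsed := time.Since(start)
	fmt.Printf("%+v, %v per message\n", m, elapsed/iters)
}
```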
| lttlrck wrote:
| The significance of 20ms isn't clear so this is hard to
| judge.
|
| Perhaps they have significant external (network) latency
| leaving only a few ms budget for the application stack - so
| they could easily be up against a wall.
| morelisp wrote:
| Until the GC kicks in and steals a full 200usec + a bunch of
| your throughput...
|
| (Holy shit, who is downvoting this? It's literally the whole
| article!)
| harikb wrote:
| Properly written Go code (or even Java for that matter)
| will try to minimize allocations. For Java, unless I am
| mistaken, pauseless GC is only offered by Azul - $$
| morelisp wrote:
| Yeah, the whole point of the article is that gRPC v2 (and
| frankly v1 for that matter) are not "properly written" to
| do this.
| RhodesianHunter wrote:
| >or even Java
|
| Just in case you may be unaware, the latest GCs for Java
| (Shenandoah, ZGC) are miles ahead of anything available
| for Go due to sheer age and manpower. Parallel and
| Pauseless are easily achievable in most cases.
| morelisp wrote:
| Java's GC is better but Go's GC is also parallel and
| "pauseless" - iirc ZGC is 50-500usec which is comparable
| to Go's target 200usec.
|
| The point is, neither is "five orders of magnitude" below
| 20ms. And neither needs zero CPU even if it doesn't block
| other threads.
| geodel wrote:
| > Latest GCs for Java (Shenandoah, ZGC) are miles ahead
| of anything available.
|
| Beyond hyperbole, do you have any actual comparison of Go
| vs Java GC performance?
| throwaway894345 wrote:
| If your path is sensitive to 200us of latency you should
| probably optimize your application and tune your GC.
| Typically 200us for freeing all unreachable memory is not a
| big deal.
| jcelerier wrote:
| > If your path is sensitive to 200us of latency you
| should probably optimize your application and tune your
| GC.
|
| Okay, you've done this. Three years later it's the
| same thing again, since you need to accommodate the new
| features, and your users haven't upgraded their computers.
| What do you do?
| brandmeyer wrote:
| 3% regression in QPS, 20% regression in CPU, and 5%
| regression in memory usage according to the article. Those
| are considerably worse than "5 orders of magnitude below".
| harikb wrote:
| GP meant 5 orders of magnitude below "20 ms". 20 ms is a
| lot of time.
|
| There is nothing one can do to, say, a 1 kilobyte buffer
| that will cross 1 ms in _any_ language. My own Go code
| doesn't take more than a few micros per message.
| brandmeyer wrote:
| GP's root claim is that protobuf
| serialization/deserialization performance shouldn't
| matter, on an article where a user is _specifically
| demonstrating that it does matter_.
| joshuamorton wrote:
| The use case described in the article and the use case
| described in the top post of this thread aren't the same.
| If you aren't throughput-bound, a 5% regression
| in parse speed doesn't matter if your goal is to stay
| under 20ms and parsing takes 17 us. Sure it now takes 19
| us, which is a regression of 2 us out of 20ms, or
| 1/10000th of your time.
| rapsey wrote:
| > our entire service requires e2e latency under 20ms
|
| Why are you using Go then?
| kodah wrote:
| 20ms is a pretty considerable amount of time WRT E2E
| transaction time in today's world. Can you expand on your
| concerns with Go?
| mcronce wrote:
| It's not really suitable for latency-critical applications.
|
| EDIT: Fixed unfortunate typo
| fcantournet wrote:
| You can 100% write services with P999 < 20ms in Go. Not
| even trying that hard. Go is entirely suitable for these
| kinds of constraints; I dare say that's Go's main target.
|
| P99 < 1ms, that's when you're going to want to switch it
| up.
| somethingwitty1 wrote:
| Was the double negative intentional? I've used Go for
| sub-millisecond needs. So 20ms seems like it would be a
| reasonable choice from where I'm sitting.
| mcronce wrote:
| It was not intentional, thanks for asking...very
| unfortunate typo ;)
|
| Go doesn't give you control over inline vs indirect
| allocation, instead relying on escape analysis, which is
| notoriously finicky. Seemingly unrelated changes, along
| with compiler upgrades, can ruin your carefully optimized
| code.
|
| This is especially heinous because it uses a GC;
| unnecessary allocations have a disproportionately large
| impact on your application performance. One or the other
| wouldn't be nearly as bad.
|
| Time and time again we see reports from
| organizations/projects with perfectly fine average
| latency, but horrendous p95+ times, when written in Go -
| some going as far as to do straight-up insane
| optimizations (see Dgraph) or rewrite in other languages.
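As an illustration of the escape-analysis point: the two hypothetical functions below do the same work, but returning a pointer to a local forces a heap allocation, while returning by value does not. Building with `go build -gcflags=-m` prints the compiler's escape decisions (the exact wording varies by Go version), and `testing.AllocsPerRun` counts allocations. The `//go:noinline` directives are there because inlining can change the outcome, which is exactly the fragility mcronce describes.

```go
package main

import (
	"fmt"
	"testing"
)

type point struct{ x, y int }

// newPointPtr returns a pointer to a local, so the value must outlive the
// call and escape analysis moves it to the heap.
//
//go:noinline
func newPointPtr(x, y int) *point { return &point{x, y} }

// newPointVal returns a copy, so nothing escapes and no heap allocation occurs.
//
//go:noinline
func newPointVal(x, y int) point { return point{x, y} }

var total int // package-level sink so the calls are not optimized out

func main() {
	ptrAllocs := testing.AllocsPerRun(1000, func() { total += newPointPtr(1, 2).x })
	valAllocs := testing.AllocsPerRun(1000, func() { total += newPointVal(1, 2).x })
	fmt.Println("pointer:", ptrAllocs, "value:", valAllocs)
}
```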
| jen20 wrote:
| I'm not sure that the phrasing in the article is particularly
| fair:
|
| > The maintainers of Gogo, understandably, were not up to the
| gigantic task.
|
| I'm 99% sure they are "up to" (as in "capable of") doing so, they
| are just not "up for" it (as in, "will not do it").
| jahewson wrote:
| Yes I assume the author meant "not up for"
| Zababa wrote:
| They could be "not up to" because of lack of resources,
| probably time and/or money. I think that's what is implied,
| rather than lack of technical knowledge.
| lux wrote:
| I got the sense that they meant "not willing" but I agree
| that's one of those English phrases that can easily be
| misconstrued towards the more negative interpretation.
|
| That said, I love the detailed post and the interesting
| solution, and the commitment to performance!
| n0x1m wrote:
| The biggest current problem with Go and ProtoBuf is Swagger
| support when using it for API returns. Enums are not supported,
| for example. The leniency of protojson can't be used in other
| languages that build on top of the Swagger docs.
| PostThisTooFast wrote:
| Is there one for Kotlin yet? It's pretty pathetic that Google's
| own protocol lacks native support for its most popular operating
| system.
| HNLogInsSuckAss wrote:
| Yes, I was surprised by this. We ended up using the Java ones
| only two years ago because of the lack of a Kotlin generator.
| Google had a blog post talking about what a struggle it was for
| some reason, but meanwhile someone had already created a decent
| Swift generator.
| hn_go_brrrrr wrote:
| Yes: https://developers.google.com/protocol-
| buffers/docs/kotlintu...
| HNLogInsSuckAss wrote:
| Those must've been released only in the last couple of years.
| In 2019 there was still no Kotlin generator. The OP shouldn't
| have been modded down, because that is indeed pathetic.
|
| https://medium.com/digitalfrontiers/a-dance-with-
| protocols-k...
| jupp0r wrote:
| Using CPU utilization as a performance metric can be extremely
| misleading. My favorite article on the subject is from Brendan
| Gregg:
|
| http://www.brendangregg.com/blog/2017-05-09/cpu-utilization-...
|
| A much better way to test the influence of the new compiler would
| be to test the actual throughput at which saturation is achieved
| (which is what the benchmarks in the C++ gRPC library measure to
| assess their performance).
| dkhenry wrote:
| There is a fairly robust set of benchmarks that are run to test
| out performance improvements[1] and macro benchmarks are the
| ultimate test of holistic improvement. CPU isn't a great proxy,
| but one of the biggest problems in real world performance on
| this specific system (databases in general) is latency. CPU
| time is a really good proxy for latency so by taking a look at
| CPU time we can get an idea of how the system will respond
| under "normal" conditions.
|
| [1]: https://benchmark.vitess.io/macrobench
| et1337 wrote:
| In this case the regression also caused a 3% decrease in
| throughput.
| gilgad13 wrote:
| Maybe I'm missing something, but my read of
| golang/protobuf#364[1] was that part of the motivation for the
| re-organization in protobuf-go v2 was to allow for optimizations
| like gogoprotobuf to be developed without requiring a complete
| fork. I totally understand that the authors of gogoprotobuf do
| not have the time to re-architect their library to use these
| hooks, but best I can figure this generator does not use these
| hooks either. Instead it defines additional member functions, and
| wrappers that look for those specialized functions and fall back
| to the generic ones if not found.
|
| For example, it looks like pooled decoders could be implemented
| by setting a custom unmarshaller through the ProtoMethods[2] API.
|
| I wonder why not? Did the authors of the vtprotobuf extension not
| want to bite off that much work? Is the new API not sufficient to
| do what they want (thus failing some of the goals expressed in
| golang/protobuf#364)?
|
| [1]: https://github.com/golang/protobuf/issues/364
|
| [2]:
| https://pkg.go.dev/google.golang.org/protobuf@v1.26.0/reflec...
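The pattern gilgad13 describes (extra methods on generated types, plus wrappers that use them when present and fall back otherwise) maps onto a Go interface type assertion. A minimal sketch: the `fastUnmarshaler` interface, the `UnmarshalFast` method name and the `genericUnmarshal` fallback are hypothetical placeholders, not the real vtprotobuf or protobuf-go API.

```go
package main

import (
	"errors"
	"fmt"
)

// fastUnmarshaler is implemented by types that carry a specialized,
// generated decoder (hypothetical method name).
type fastUnmarshaler interface {
	UnmarshalFast(b []byte) error
}

// genericUnmarshal stands in for a slower, reflection-based path.
func genericUnmarshal(b []byte, msg interface{}) error {
	return errors.New("generic path used")
}

// Unmarshal prefers the specialized method and falls back otherwise.
func Unmarshal(b []byte, msg interface{}) error {
	if fu, ok := msg.(fastUnmarshaler); ok {
		return fu.UnmarshalFast(b)
	}
	return genericUnmarshal(b, msg)
}

// fastMsg has a specialized decoder; plainMsg does not.
type fastMsg struct{ n int }

func (m *fastMsg) UnmarshalFast(b []byte) error { m.n = len(b); return nil }

type plainMsg struct{}

func main() {
	var f fastMsg
	err := Unmarshal([]byte("abc"), &f)
	fmt.Println(err, f.n)
	fmt.Println(Unmarshal([]byte("abc"), &plainMsg{}))
}
```

The cost of this approach is one dynamic type check per call, which is cheap next to a reflection-based decode.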
| alecthomas wrote:
| I haven't looked in more detail, but one blocker is that
| `ProtoMethods() *methods` returns a private type, making it
| effectively unimplementable outside this package.
| zeeboo wrote:
| So, I thought this at one point, too. But it turns out that
| methods is a type alias to an unnamed type, so there's no
| package level privacy issues:
| https://github.com/protocolbuffers/protobuf-
| go/blob/v1.26.0/...
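To illustrate zeeboo's point: because the unexported name is an alias (`=`) for an unnamed struct type, any type can satisfy an interface mentioning it by spelling out the identical struct type. This is a single-file sketch with a hypothetical one-field `methods` struct (the real one in protobuf-go has different and more fields). Across packages the same rule applies only if every field name is exported, since the Go spec treats unexported field names from different packages as distinct.

```go
package main

import "fmt"

// methods is an alias for an unnamed struct type, not a new defined type.
type methods = struct {
	Flags uint64
}

// ProtoMethoder mirrors the shape of an interface returning *methods.
type ProtoMethoder interface {
	ProtoMethods() *methods
}

// myMsg never mentions the name `methods`; it spells out the identical
// unnamed struct type, which is enough to satisfy the interface.
type myMsg struct{}

func (myMsg) ProtoMethods() *struct{ Flags uint64 } {
	return &struct{ Flags uint64 }{Flags: 1}
}

func main() {
	var pm ProtoMethoder = myMsg{}
	fmt.Println(pm.ProtoMethods().Flags)
}
```

Had it been declared `type methods struct { ... }` (no `=`), `*methods` and `*struct{ Flags uint64 }` would be different types and `myMsg` would not satisfy the interface.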
| shoefindortz wrote:
| > Arenas are, however, unfeasible to implement in Go because it
| is a garbage collected language.
|
| If you are willing to use cgo, Google already implemented one for
| gapid.
|
| https://github.com/google/gapid/tree/master/core/memory/aren...
| pjmlp wrote:
| Not only that, there are other garbage collected languages like
| D, Nim and C# that offer the language features to do arenas
| without having to touch any C code.
|
| There is still so much education to do.
| throwaway894345 wrote:
| Do I misunderstand what arenas are? I thought it was just
| "allocate this big array as a single allocation rather than N
| little allocations"? If so, how is that not supported in Go?
| (e.g., `arena := make([]Foo, 1000000000)`)
| slimsag wrote:
| An arena allocator allows you to store many allocations _of
| different types_ in the same single chunk of memory, and
| then free all of them at one point in time.
| throwaway894345 wrote:
| Why can't you do this in Go? I'm 99% sure we can allocate
| a massive array of bytes using safe Go and use unsafe to
| cast a chunk of bytes to an instance of a type. This
| isn't type safe, but neither would the equivalent C code.
| slimsag wrote:
| That's what this whole thread is about: you can literally
| do just that.
| throwaway894345 wrote:
| > That's what this whole thread is about: you can
| literally do just that
|
| I don't know how you get that from the thread:
|
| > Arenas are, however, unfeasible to implement in Go
| because it is a garbage collected language.
|
| > If you are willing to use cgo, google already
| implemented one for gapid.
|
| > there are other garbage collected languages like D, Nim
| and C# that offer the language features to do arenas
| without having to touch any C code.
|
| It seems like the above statements implicitly or
| explicitly claim that this isn't feasible in Go without
| C.
| pjmlp wrote:
| You are misunderstanding the thread, I just mentioned
| some of the languages I like (still waiting for Go's
| generics), and the comment I was replying to made an
| assertion about an implementation that uses cgo.
|
| Both of us are dismissing the assertion that "Arenas are,
| however, unfeasible to implement in Go because it is a
| garbage collected language."
|
| You can do manual memory allocation via a syscall into
| the host OS, use unsafe to cast memory blocks to the
| types that you want and then clean it all up with defer,
| assuming the arena is only usable inside a lexical
| region, otherwise extra care is needed to avoid leaks.
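pjmlp's recipe (allocate outside the Go heap via a syscall, cast with `unsafe`, release with `defer`) can be sketched with `syscall.Mmap` and `syscall.Munmap`. This assumes a Unix-like OS with anonymous mappings (`MAP_ANON`), Go 1.17+ for `unsafe.Slice`, and pointer-free element types only, because the GC neither scans nor manages this memory. The callback confines the region to a lexical scope, matching the "only usable inside a lexical region" caveat.

```go
package main

import (
	"fmt"
	"syscall"
	"unsafe"
)

// withRegion maps n int64s of anonymous memory outside the Go heap, passes
// them to fn, and unmaps them when fn returns. The slice must not be
// retained after withRegion returns.
func withRegion(n int, fn func(xs []int64)) error {
	mem, err := syscall.Mmap(-1, 0, n*8,
		syscall.PROT_READ|syscall.PROT_WRITE,
		syscall.MAP_ANON|syscall.MAP_PRIVATE)
	if err != nil {
		return err
	}
	defer syscall.Munmap(mem) // the whole region is freed in one call

	// Reinterpret the raw bytes as int64s; mmap returns page-aligned memory.
	xs := unsafe.Slice((*int64)(unsafe.Pointer(&mem[0])), n)
	fn(xs)
	return nil
}

func main() {
	err := withRegion(1024, func(xs []int64) {
		for i := range xs {
			xs[i] = int64(i) * 2
		}
		fmt.Println(xs[10], len(xs))
	})
	if err != nil {
		panic(err)
	}
}
```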
| throwaway894345 wrote:
| Fair enough.
| shoefindortz wrote:
| I was proposing the cgo option because it's already
| implemented.
|
| I _think_ allocating a slice of contiguous bytes and
| using unsafe pointers should work fine as long as you are
| very cautious about structs/vars with pointers into the
| buffer getting freed by the GC.
| throwaway894345 wrote:
| > I _think_ allocating a slice of contiguous bytes and
| using unsafe pointers should work fine as long as you are
| very cautious about structs/vars with pointers into the
| buffer getting freed by the GC
|
| Go's GC is conservative, so I don't think you need to
| take any special caution in that regard. I would expect
| that you just need to take care that your casts are
| correct (e.g., that you aren't casting overlapping
| regions of memory as distinct objects).
| acrispino wrote:
| Go went to a precise GC with version 1.3
| throwaway894345 wrote:
| Oh wow, I didn't realize.
| p_l wrote:
| Aren't arenas old news in GC languages in general?
|
| Most of the time, their absence is due to general-purpose
| pools being just as good, or to people simply not needing
| them that much with a modern GC.
| pjmlp wrote:
| Yes, so I really don't get how such an assertion was
| made.
|
| Probably lack of experience with machine friendly code.
| dimitrios1 wrote:
| I can't believe we've managed to have this lengthy of a
| discussion about GC languages and speed without anyone
| mentioning rust. Has HN turned a corner?
| shoefindortz wrote:
| Rust has an arena allocator too[1], but it is implemented
| with 165(!!!) usages of unsafe. :)
|
| [1] https://github.com/fitzgen/bumpalo
| coder543 wrote:
| This is far from the only arena allocator written in
| Rust.
|
| From the same author, a zero-unsafe arena allocator:
| https://github.com/fitzgen/generational-arena
|
| There are many, _many_ arena implementations available
| with varying characteristics. It's disingenuous to act
| like Rust requires the author of an arena library to
| write "unsafe" everywhere.
| pjmlp wrote:
| Maybe, don't know.
|
| As far as I'm concerned, although I like Rust, I only see it
| for scenarios where any kind of memory allocation is very
| precious, Ada/SPARK and MISRA-C style.
|
| I have been using GC languages with C++-like features, or
| polyglot codebases, for too long (almost 20 years) to think otherwise.
|
| Most of the time developers learn about _new_ and miss out
| on the low level language features.
|
| It is a matter of balance, either trying to do everything
| in a single language, or eventually write a couple of
| functions in a lower level language that are then used as
| building blocks for the rest of the application.
|
| No need to throw away the ecosystem and developer tooling
| just to rewrite a data structure.
| azth wrote:
| Would you consider codecs or heavy numerical simulations
| to fall under those memory allocation scenarios that
| you'd use Rust for as well?
| flakiness wrote:
| I wonder what Google is thinking about the v2 performance. It's
| well known that protobuf processing is a heavy tax on their data
| centers [1]. It's hard to imagine they just leave it slow. Or do
| they?
|
| [1] https://research.google/pubs/pub44271/
| justicezyx wrote:
| There was a project to develop an ASIC (probably bundled inside
| a NIC) to do protobuf parsing. At some point Sanjay made a change
| to the proto API that rendered that project less appealing.
|
| Disclaimer: Google had a lot of internal stuff they considered
| important to their core tech competencies. For example, nothing
| about Google's Paxos APIs and infrastructure, networking, etc.
| has been open-sourced.
___________________________________________________________________
(page generated 2021-06-03 23:00 UTC)