[HN Gopher] Gobs of data (2011)
___________________________________________________________________
Gobs of data (2011)
Author : ash
Score : 82 points
Date : 2023-12-04 12:15 UTC (10 hours ago)
(HTM) web link (go.dev)
(TXT) w3m dump (go.dev)
| buro9 wrote:
| this is quite old, so I'm curious about what triggered it being
| posted again, has something happened / changed?
| jstanley wrote:
| Just because you already knew it all doesn't mean everyone else
| did. I hadn't seen it before.
|
| Sometimes even when something was posted a few years ago some
| people just haven't seen it yet.
| sudhirj wrote:
| Ten thousand people, to be exact https://xkcd.com/1053/
| blowski wrote:
| It's an entirely reasonable question to ask "is there any
| specific reason why this is being posted today?". If the
| answer is no, that's fine, but there may be extra context
| that is interesting and not obvious.
| ash wrote:
| I've posted it because I'm always on the lookout for simple
| solutions for complex problems, and especially for how these
| solutions are designed. The post describes the design process
| well.
|
| Also Rob Pike is a great technical writer. Another example of
| his style is "Effective Go":
|
| https://go.dev/doc/effective_go
| buro9 wrote:
| yup, and if people are looking for usage I just found a gist
| that shows how gob handling can be useful (writing to cache
| that allows the reading back to be castable into the correct
| structs) https://gist.github.com/pioz/ca5b7a11200f54afbd76dee
| 7acbcc06...
| azaras wrote:
| I did not know it, but I think it changes so little from protocol
| buffers that it is a waste of time.
| bheadmaster wrote:
| Note that this was written in 2011, while the first mention of
| "proto3" in protobuf repository was in 2014. So this blogpost
| probably influenced the development of proto3, which fixed many
| issues of proto2 (which is referred to as just "protocol
| buffers" in the blogpost).
| icholy wrote:
| Eh, I like Go and respect Rob Pike, but I seriously doubt gob
| had any impact on the proto3 design
| losvedir wrote:
| Interesting. I wonder to what extent it's found use at Google
| over this past decade.
|
| There are advantages to being language-specific, but a lot of
| disadvantages, as well (speaking as someone who recently had to
| write some Elixir code to unmarshal a Ruby object...). It seems
| hard to introduce this since you're forcing all communicating
| services to be Go-based, which is kind of contrary to the
| independence that microservices usually affords you.
|
| Some of the benefits are simply design goals (e.g., top level
| arrays) which could also be done in a language-independent
| protocol. And even performance questions _probably_ could. Like,
| Cap'n Proto I think is designed so that users of the protocol
| don't have to serialize/deserialize the data, right? They just
| pass it around and work with it directly.
|
| I can see Rob Pike being frustrated with Protocol Buffers at
| Google, and I don't begrudge anyone for taking a big shot like
| this, but I wonder if he's found any success with it.
| lifthrasiir wrote:
| Yeah, after years of dealing with language-specific
| serialization formats---and inadvertently learning internals of
| them (including Go gob, Python pickle and PHP serialize), I'm
| over. And gob is not even a schematic serialization format
| (i.e. not only you don't need to define a schema beforehand,
| you _can't_). There are some interesting ideas, but that's all.
| Use a well-known schemaless serialization format with some
| extensibility [1] if you really need.
|
| [1] Maybe there was no suitable one when Go was first created.
| Nowadays I believe CBOR is the best format for this job.
| packetlost wrote:
| I'm in the same boat. Not to mention security concerns that
| often crop up in (interpreted) language specific
| deserialization (I'm looking at you, pickle, thinly veiled
| `eval()`). I agree that CBOR should generally be the
| serialization tool of choice for self-describing data (ie. in
| places where you might otherwise choose JSON).
|
| And if your language of choice doesn't have a CBOR lib, CBOR
| is fairly easy to implement and writing an encoder/decoder is
| very fun! I recently completed my implementation for the
| Gerbil Scheme language last week [0].
|
| [0]: https://github.com/chiefnoah/gerbil-cbor
| twotwotwo wrote:
| Yep, I very much agree with this. It's probably inevitable
| that languages with reflection grow some kind of language-
| specific serialization format because a) they can and b)
| there's often _some_ use case where it looks handy to have a
| format adapted to the quirks of the language. Plus, when some
| of these bespoke formats were created, the world hadn't
| converged as much as it has now around a few common text and
| binary formats.
|
| But now the interoperable serialization formats have a lot
| more energy spent on tools and such, and the formats are
| better-specified, to the point that you probably want to use
| them even where interoperability with other stuff doesn't
| force it.
| danpalmer wrote:
| As an engineer at Google, my opinion on Protocol Buffers
| changed massively. Pre-Google I found them awkward, the
| language bindings in Python sucked, and I didn't really see the
| point. I knew a schema for services was a good idea, but
| protobufs didn't seem like the best option.
|
| The thing is that at Google protobufs are used everywhere.
| Like, absolutely everywhere. Think of all the places they could
| be, and it's way more. All the tooling understands them, code
| search and go-to-reference works on them everywhere, they are
| truly transformational on how many different services with many
| different implementations interact.
|
| Are they perfect? Far from it. But a Go-only implementation
| misses almost all the value of protobufs. If I was inventing
| them for Python I could do better (pickle? maybe not) but the
| whole point is that they aren't language/ecosystem/use-case
| specific. If this was Rob Pike's frustration then I can't help
| but feel he missed the point, or this post is a little
| disingenuous as to the benefits.
|
| I've not seen Gobs used in Google, but I'd imagine an engineer
| would need to make a very strong case against using protos
| between services, regardless of if both services are in Go.
| knorker wrote:
| In my opinion this is pretty on-brand for Pike.
|
| Back when he did his best work, it was possible for one
| person to "just write the new thing", without making it fit
| with anything else. There was nothing else to fit it with.
| You could invent everything from scratch, and not only was it
| not a waste of time, if you were good enough it had a chance
| of being the best fit for purpose.
|
| You could take shortcuts. You could have every part of your
| system be "odd", because nothing was "even".
|
| That's not true anymore. And the way I see it Pike has not
| moved on.
|
| Science in general had this switch at one point, too. There
| was a point where one could know all of science. But it's
| long gone.
| danpalmer wrote:
| Interesting hypothesis. I can't comment on Pike's history
| here, but at Google there's certainly a noticeable
| difference between old special cases that were built
| pre-2015 ish, and the modern world where everything is
| _very_ cohesive. I get the impression that there was a big
| push to achieve that, led from around that time by various
| products. It's hit some areas more than others, but there
| do seem to be almost no new products building in the way
| you've described now.
| hellozomo wrote:
| This is an unnecessary ad hominem attack. He wrote this
| over a decade ago, in the midst of doing his best work
| creating Go itself.
|
| gob was his opinionated way of doing a Go-specific
| encoding, while also supporting any number of other
| encodings in the language. Go has incredibly good support
| for almost every popular encoding there is.
|
| gob has also been used successfully by a number of
| projects. In many cases, it's a perfectly good way to
| encode a piece of data that is completely local to a Go
| program.
| knorker wrote:
| > This is an unnecessary ad hominem attack
|
| I didn't mean it as such.
|
| > in the midst of doing his best work creating Go itself.
|
| Not to go too far off topic, but Go is another example.
| It famously ignores decades of language theory, and they
| wrote their own assembler, linker, etc.
|
| Now, much of that has been undone and rewritten, as Go
| became more adopted, requiring playing well with the rest
| of the ecosystem.
|
| (but much of it we're unfortunately stuck with, because
| it's part of the language)
|
| 30 years ago there was no ecosystem to play well with,
| and compared to now we were just banging rocks together.
| Back then you could be a CS polymath as one person. Well,
| _I_ couldn't, but Pike could.
|
| It was the old days of John Carmack starting every game
| engine with an empty directory.
|
| I'm saying that today nobody can. Even John Carmack could
| not on his own write a AAA game. (I know ID had other
| coders, but my point stands)
| hellozomo wrote:
| Well, to go off-topic with you, the idea that Go "ignores
| decades of language theory" is just an opinion that you
| hold. A factual statement would be to say that the Go
| designers _omitted_ a great many features other languages
| have included.
|
| The idea that they did so out of ignorance is ridiculous,
| given the background of the Go design team. They made
| considered decisions about what to include in Go.
|
| A fixed-gear bicycle is not "ignoring" decades of bicycle
| design theory.
|
| Creating a programming language is an _engineering_ task
| not a _theoretical_ task. Which means there are major
| trade-offs to be made. And they chose their trade-offs.
| The wild success of Go should at least make you consider
| whether or not they're better at making these trade-offs
| than you, and most language designers, are.
|
| Maybe they were wrong about some choices, which is why Go
| is still evolving, but they were self-evidently mostly
| correct.
| knorker wrote:
| Well, that's your opinion.
|
| You seem to have gotten a bit emotional about this, so I
| don't think this'll go anywhere.
| biorach wrote:
| The comment you're replying to is calm, reasoned and
| courteous.
|
| Claiming the poster is being emotional is... poor form,
| to put it mildly
| knorker wrote:
| Oh. I found it very defensive and aggressive.
|
| Putting scare quotes around what I said, calling what I
| said "ridiculous", and seems to have taken critique of
| the Go language personally by saying that the Go language
| designers are better than most language designers.
|
| So this is why I found the reply neither calm nor
| courteous. As for reasoned, I don't see any reasoning in
| it, just conclusions.
|
| I don't even see any sign of a refutation of my point,
| which isn't so much about if Go is good or not, but about
| the needless reinventing and ignoring other work, and as
| a result running into problems that would have been
| predictable, had they been taken into account.
|
| To me it summarizes as "they're smart, you're dumb".
|
| If this is the type of replies this person would write,
| then that's not productive.
| dsff3f3f3f wrote:
| Most of the Reddit/HN/Twitter crowd that are parroting
| the "Go ignores decades of language theory" line are just
| parroting something that originated in an incredibly
| toxic part of the Scala community. The vast majority of
| them don't actually understand the tradeoffs and
| implementation complexity that are associated with the
| specific subset of language features they desire. The
| debate about a particular feature is often not even
| concluded in languages that include said feature. For
| example the Swift team still has serious disagreements
| about generics and their respective implementation.
|
| I've seen Pike have conversations about language design
| with SPJ, Hejlsberg, Lattner, Stroustrup, Odersky and
| other highly respected PL designers and they would never
| make such a shallow and trite comment.
| randomdata wrote:
| _> and they wrote their own assembler, linker, etc._
|
| True, but over 30 years ago. In fact, said compiler was
| the very first program written for Plan 9. Those decades
| you speak of came _afterwards_.
| geodel wrote:
| Well, it is a 2011 post. Now Pike is retired, I don't think
| he cares or matters anyway.
| knorker wrote:
| I think he matters, for computer history. I never aimed
| to have him care what I say, though.
| randomdata wrote:
| _> I can't help but feel he missed the point_
|
| Did he miss the point, or was the point to light a fire under
| the protobuf team to improve their product? Which they
| eventually did (e.g. required fields were removed after this
| post was made).
| danpalmer wrote:
| It's possible that was an aim, and I'm not familiar with
| the timelines here, however this does again focus more on
| the mechanics of protos which I still don't think matter as
| much as them being fully integrated into a company like
| they are at Google. The mechanics of them are still not
| great. Maybe this was a blocker to that happening though?
| _ZeD_ wrote:
| FYI: required fields were re-introduced in protobuf. why?
| because despite what the gobs team and others think, that
| is a necessary feature to work with data structures. If
| "everything is optional", what is the value of any
| structure, over sending an untyped associative array of
| stuff?
| morelisp wrote:
| "Optional" fields, as in fields you can tell if they were
| explicitly set or not, were re-introduced. Depending on
| your definition either required fields were not
| reintroduced, or required fields with a mandatory default
| value were until recently the only kind of field in
| proto3.
| Timon3 wrote:
| As awkward as Protobufs might be - is there any similar
| format with so many client bindings? I tried comparing the
| message formats I could find (e.g. Cap'n Proto, Protobuf,
| msgpack and others), and still Protobuf seems to have the
| most supported languages. I'd be happy for any other
| suggestions!
| yegle wrote:
| It's called "Larry & Sergey Protobuf Moving Co." for a
| reason. See the t-shirt design in the video screenshot here:
| https://isocpp.org/blog/2020/07/cppcon-2019-there-are-no-
| zer...
|
| Disclaimer: an employee at the said moving company.
| kunley wrote:
| Nice recap.
|
| By the way, just curious what is your opinion on MessagePack?
| dmi wrote:
| > If all you want to send is an array of integers, why should you
| have to put it into a struct first?
|
| If you're sure that's all you'll ever have to do, then sure. But
| unless you're 100% certain that the protocol will never evolve
| further, having a more complex structure allows it to change in a
| gradual way.
| lsaferite wrote:
| It was clear, from the post, that they were saying, "If all I
| need is a simple array, why should I be required to wrap it in
| a struct?" The whole point (from the post) being that protobuf
| required structs but gob allowed simpler types _in addition_ to
| structs.
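
The point about top-level values can be sketched in Go. This is a minimal illustration, not code from the post: the helper name roundTripInts is hypothetical, and it only shows that encoding/gob accepts a bare []int without a wrapping struct.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// roundTripInts (hypothetical helper) encodes a bare []int with gob,
// with no wrapping struct, and decodes it back into a new slice.
func roundTripInts(in []int) ([]int, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(in); err != nil {
		return nil, err
	}
	var out []int
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	out, err := roundTripInts([]int{1, 2, 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // prints "[1 2 3]"
}
```

Whether sending a bare slice is wise for a protocol that may grow is a separate question, which the replies below take up.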
| Thorrez wrote:
| dmi knows that. dmi was saying that even if the encoding
| scheme allows encoding simpler types, it's often not smart to
| use that functionality, because you won't be able to evolve
| the format in the future. If you encode a message instead of
| a simple type, you'll be able to evolve it later as you add
| more features to your program.
|
| Note that even protobufs, which doesn't allow encoding simple
| types at the top level, still has this debate when deciding
| whether to encode an array of simple types (inside a struct)
| or an array of structs (inside a struct). And Google's
| guidance is to use an array of structs if more data might be
| needed in the future:
|
| >However, if additional data is likely to be needed in the
| future, repeated fields should use a message instead of a
| scalar proactively, to avoid parallel repeated fields.
|
| https://google.aip.dev/144
|
| >// Good: A separate message that can grow to include more
| fields
|
| https://protobuf.dev/programming-guides/api/#order-
| independe...
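
A rough Go sketch of the "array of structs can grow" argument, using gob rather than protobuf. The types PointV1 and PointV2 and the helper upgrade are hypothetical names made up for this illustration. It relies on gob matching struct fields by name, so a field the sender never wrote is simply left at its zero value on the receiving side.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// PointV1 is a hypothetical element type as first shipped.
type PointV1 struct {
	X, Y int
}

// PointV2 is the same element after a field was added later.
type PointV2 struct {
	X, Y  int
	Label string
}

// upgrade encodes old-style elements and decodes them with the newer
// type. Fields are matched by name; Label is absent from the stream
// and therefore stays at its zero value.
func upgrade(old []PointV1) ([]PointV2, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(old); err != nil {
		return nil, err
	}
	var out []PointV2
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	out, err := upgrade([]PointV1{{1, 2}, {3, 4}})
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", out)
}
```

Had the payload been a bare []int, there would be no natural place to put Label later, which is the trade-off being described.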
| orf wrote:
| _all I need _right now_ is a simple array_
|
| Nobody knows the future, and preparing for the future is a
| huge part of software engineering. Sending top-level arrays
| instead of sending them inside a struct is never the right
| way.
| assbuttbuttass wrote:
| Gob is a great serialization format! It's super easy to use, and
| supports go native types (kind of like Python's pickle).
|
| For a recent project, I needed a simple key-value store. I was
| evaluating using a full RDBMS, but I ended up just putting gob
| files in a directory.
| jeffrallen wrote:
| I used gob for my first client/server Go program, which was a
| "make one of something you know about to throw away" new language
| experiment. It worked, but I quickly turned away from it, because
| it would never be cross platform.
|
| I saw gob more as an experiment that the Go team used to check
| the reflect package's usability. (Which sucks anyway, by the
| way.)
|
| I'm surprised it's still in the stdlib. I would have guessed it
| would have been removed for Go 1.0, because it was already clear
| then that it was not suitable for anything more than experiments.
| sebstefan wrote:
| >[Required fields are] also a maintenance problem. Over time, one
| may want to modify the data definition to remove a required
| field, but that may cause existing clients of the data to crash.
|
| Okay, but would you rather have it crash or allow for a program
| to run on the wrong data? Especially if you do that and then say
| that everything has zero as a default value.
|
| The question remains whether the serialization format should be
| taking care of that, or a round of parsing later on with a schema
| on the side; but if you do the former without the latter you're
| setting yourself up for deployment nightmares
| tomohawk wrote:
| If you want to deal with the crash and justify why the system
| went down because you were more correct than the other guys,
| then sure.
|
| Protocols often represent an interface between organizations.
| Especially when that is the case, you want to be as charitable
| as possible when accepting input, because getting any issues
| resolved may very well require getting the two organizations to
| agree.
|
| Also, as things change over time, an overly strict
| interpretation when receiving packets will require unnecessary
| rework in the future, and possibly down time or lost business.
|
| When dealing with protocols, it's generally best to be strict
| when emitting packets and as tolerant as possible when
| accepting them.
| sebstefan wrote:
| That's the motto for browsers and I agree with it in context,
| but if it's something you control (like services of a
| distributed application) then not really. You can just make
| sure the versions match during deployment and save yourself
| some debugging headaches
|
| Not if it's something sensitive either, where maybe crashing
| is preferable to running the wrong way
| mst wrote:
| Largely agree, with the addendum that it's a really good idea
| to collect metrics as to how much tolerance your code has
| been required to show. Whether you need to present those
| metrics to the sender and ask them to tweak their emissions
| or simply keep an eye on them is situation dependent, but
| having them at all is definitely in the "future you will
| thank current you later" ... and I will absolutely confess
| that current me has cursed past me for not doing so on more
| than one occasion, and I can only hope I remember more often
| in the future ;)
| cosmic_quanta wrote:
| Indeed, I'd rather have the program crash rather than repeat
| this nightmare: https://specbranch.com/posts/knight-capital/
| art_vandalay wrote:
| I forgot Go was still around. Thanks for reminding me.
| broken_broken_ wrote:
| Just finished removing this encoding in our production services.
|
| It panics on malformed input which is a no go for us since high
| availability is really important for us, and it showed quite a
| lot in the performance and memory profiles (roughly 5 times the
| time and memory as doing the same with JSON).
|
| The code was converting some data to gob, and storing it in the
| database for later.
|
| We now just do the same but in json, it's human readable and
| Postgres validates that the data is valid JSON.
|
| And unmarshaling it does not panic.
| jjtheblunt wrote:
| Have you tried the superset of json from AWS?
|
| https://amazon-ion.github.io/ion-docs/
| abtinf wrote:
| I've been considering adopting the gob package. I haven't used
| it before, so I only know what's in the docs -- and all of your
| claims are surprising to me. Could you share more information?
|
| How is it possible that they were getting malformed input? This
| was happening in go-to-go communication, or was there some kind
| of cross-language interop?
|
| Any idea why the performance was so much slower than JSON in
| your case? The technique described in the OP would seem to make
| that impossible.
|
| Do you think it's possible the database column type or
| collation was somehow affecting the gob?
| broken_broken_ wrote:
| The column type was bytea (basically blob) so it should be
| stored as is by the database. The profiling showed the
| hotspots in the gob package directly.
|
| The docs explicitly mention that invalid input will make it
| panic and that can be confirmed by reading the code or
| fuzzing the input.
|
| From my understanding, there is no compile time schema so
| everything is done with runtime reflection and that is bound
| to not be super fast. Granted, JSON is the same on paper, I
| would guess that the JSON package had more eyes on it and
| optimizations.
|
| In our case, everything was using JSON except this one
| component due to some historical oddity so it was also a win
| in terms of simplifying.
| scottlawson wrote:
| If the issue is panic why not create a wrapper func with
| recover and present the same interface that you want?
| grose wrote:
| That's interesting because I've had basically the opposite
| experience. I used encoding/json with BadgerDB and saw that
| json.Unmarshal in a hot loop was using about 68% of total CPU
| time in a profile taken from production. By switching to gob it
| significantly decreased to around 28% (for gob's decode
| function). I've read that decoding interfaces in gob is
| slow[1], maybe that accounts for my difference as I don't have
| any in this particular struct. Also this was a very read-heavy
| service, so that could be a major difference as well.
|
| [1]: https://groups.google.com/g/golang-nuts/c/12qhqiG1J70
| jerf wrote:
| FWIW, this isn't used much by the community. Being a standard
| library package it still gets some use of course, but for
| comparison, encoding/gob shows about 22.5K imports [1] to
| encoding/json's nearly 800K [2], and whereas you can see in the JSON
| search an ecosystem of JSON libraries, gob is basically just gob.
|
| Calling it "dead" just invites a tedious thread about what the
| definition of "dead" is, so I won't, I'll just sort of imply it
| in this sentence without actually coming out and saying it in a
| clear manner. I would generally both A: recommend against this,
| not necessarily as a dire warning, just, you know, a
| recommendation and B: for anyone who is perturbed by the idea of
| this existing, just be aware that it's not like this package has
| embedded itself into the Go ecosystem or anything.
|
| [1]: https://pkg.go.dev/search?q=gob
|
| [2]: https://pkg.go.dev/search?q=json
| emmanueloga_ wrote:
| Someone made a benchmark of serialization libraries in go [1],
| and I was surprised to see gobs is one of the slowest ones,
| especially for decoding. I suspect part of the reason is that the
| API does not allow reusing decoders [2]. From my explorations
| it seems like JSON [3], message-pack [4] and CBOR [5] are
| better alternatives.
|
| By the way, in Go there are like a million JSON encoders
| because a lot of things in the std library are not really coded
| for maximum performance but more for ease of use, it seems.
| Perhaps this is the right balance for certain things (ex: the
| http library, see [6]).
|
| There are also a bunch of libraries that allow you to modify a
| JSON file "in place", without having to fully deserialize into
| structs (ex: GJSON/SJSON [7] [8]). This sounds very convenient
| and more efficient than fully de/serializing if we just need to
| change the data a little.
|
| --
|
| 1: https://github.com/alecthomas/go_serialization_benchmarks
|
| 2:
| https://github.com/golang/go/issues/29766#issuecomment-45492...
|
| --
|
| 3: https://github.com/goccy/go-json
|
| 4: https://github.com/vmihailenco/msgpack
|
| 5: https://github.com/fxamacker/cbor
|
| --
|
| 6: https://github.com/valyala/fasthttp#faq
|
| --
|
| 7: https://github.com/tidwall/gjson
|
| 8: https://github.com/tidwall/sjson
| dang wrote:
| Not gobs of comments but discussed at the time:
|
| _Gobs of data_ - https://news.ycombinator.com/item?id=2365430 -
| March 2011 (2 comments)
| zgiber wrote:
| It may not be a good tool for communicating between services
| implemented in different languages. But I'd happily use it to
| save stuff to disk where database is overkill.
___________________________________________________________________
(page generated 2023-12-04 23:01 UTC)