hngopher.com

       [HN Gopher] Gobs of data (2011)
       ___________________________________________________________________
        
       Gobs of data (2011)
        
       Author : ash
       Score  : 82 points
       Date   : 2023-12-04 12:15 UTC (10 hours ago)
        
 (HTM) web link (go.dev)
 (TXT) w3m dump (go.dev)
        
       | buro9 wrote:
       | this is quite old, so I'm curious about what triggered it being
       | posted again, has something happened / changed?
        
         | jstanley wrote:
         | Just because you already knew it all doesn't mean everyone else
         | did. I hadn't seen it before.
         | 
         | Sometimes even when something was posted a few years ago some
         | people just haven't seen it yet.
        
           | sudhirj wrote:
           | Ten thousand people, to be exact https://xkcd.com/1053/
        
           | blowski wrote:
           | It's an entirely reasonable question to ask "is there any
           | specific reason why this is being posted today?". If the
           | answer is no, that's fine, but there may be extra context
           | that is interesting and not obvious.
        
         | ash wrote:
         | I've posted it because I'm always on the lookout for simple
         | solutions for complex problems, and especially for how these
         | solutions are designed. The post describes the design process
         | well.
         | 
         | Also Rob Pike is a great technical writer. Another example of
         | his style is "Effective Go":
         | 
         | https://go.dev/doc/effective_go
        
           | buro9 wrote:
           | yup, and if people are looking for usage I just found a gist
           | that shows how gob handling can be useful (writing to cache
           | that allows the reading back to be castable into the correct
           | structs) https://gist.github.com/pioz/ca5b7a11200f54afbd76dee
           | 7acbcc06...
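
For readers who want to see the idea without opening the gist, here is a minimal sketch (not the gist's code) of writing a value to a cache as gob and reading it back so that it can be type-asserted to the right struct. The `User` and `entry` types and the `roundTrip` helper are hypothetical names made up for illustration; `gob.Register`, `gob.NewEncoder` and `gob.NewDecoder` are the standard library calls.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// User is a hypothetical cached type, not taken from the gist.
type User struct {
	Name string
	Age  int
}

// entry wraps the cached value in an interface field so that gob
// records the concrete type name alongside the data.
type entry struct {
	Value interface{}
}

func init() {
	// Concrete types carried inside interface values must be registered
	// before they are encoded or decoded.
	gob.Register(User{})
}

// roundTrip encodes v the way it might be written to a cache and
// decodes it again, returning the value as an interface.
func roundTrip(v interface{}) (interface{}, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(entry{Value: v}); err != nil {
		return nil, err
	}
	var out entry
	if err := gob.NewDecoder(&buf).Decode(&out); err != nil {
		return nil, err
	}
	return out.Value, nil
}

func main() {
	got, err := roundTrip(User{Name: "ada", Age: 36})
	if err != nil {
		fmt.Println("round trip failed:", err)
		return
	}
	// The decoded value comes back as the registered concrete type.
	if u, ok := got.(User); ok {
		fmt.Println(u.Name, u.Age)
	}
}
```

The registration step is the part that is easy to forget: without it, encoding an interface value that holds an unregistered type returns an error.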
        
       | azaras wrote:
       | I did not know about it, but I think it changes so little from
       | protocol buffers that it is a waste of time.
        
         | bheadmaster wrote:
         | Note that this was written in 2011, while the first mention of
         | "proto3" in protobuf repository was in 2014. So this blogpost
         | probably influenced the development of proto3, which fixed many
         | issues of proto2 (which is referred to as just "protocol
         | buffers" in the blogpost).
        
           | icholy wrote:
           | Eh, I like Go and respect Rob Pike, but I seriously doubt gob
           | had any impact on the proto3 design
        
       | losvedir wrote:
       | Interesting. I wonder to what extent it's found use at Google
       | over this past decade.
       | 
       | There are advantages to being language-specific, but a lot of
       | disadvantages, as well (speaking as someone who recently had to
       | write some Elixir code to unmarshal a Ruby object...). It seems
       | hard to introduce this since you're forcing all communicating
       | services to be Go-based, which is kind of contrary to the
       | independence that microservices usually affords you.
       | 
       | Some of the benefits are simply design goals (e.g., top level
       | arrays) which could also be done in a language-independent
       | protocol. And even performance questions _probably_ could. Like,
       | Cap'n Proto I think is designed so that users of the protocol
       | don't have to serialize/deserialize the data, right? They just
       | pass it around and work with it directly.
       | 
       | I can see Rob Pike being frustrated with Protocol Buffers at
       | Google, and I don't begrudge anyone for taking a big shot like
       | this, but I wonder if he's found any success with it.
        
         | lifthrasiir wrote:
         | Yeah, after years of dealing with language-specific
         | serialization formats---and inadvertently learning internals of
         | them (including Go gob, Python pickle and PHP serialize), I'm
         | over it. And gob is not even a schematic serialization format
         | (i.e. not only do you not need to define a schema beforehand,
         | you _can't_). There are some interesting ideas, but that's all.
         | Use a well-known schemaless serialization format with some
         | extensibility [1] if you really need.
         | 
         | [1] Maybe there was no suitable one when Go was first created.
         | Nowadays I believe CBOR is the best format for this job.
        
           | packetlost wrote:
           | I'm in the same boat. Not to mention security concerns that
           | often crop up in (interpreted) language specific
           | deserialization (I'm looking at you, pickle, thinly veiled
           | `eval()`). I agree that CBOR should generally be the
           | serialization tool of choice for self-describing data (ie. in
           | places where you might otherwise choose JSON).
           | 
           | And if your language of choice doesn't have a CBOR lib, CBOR
           | is fairly easy to implement and writing an encoder/decoder is
           | very fun! I completed my implementation for the Gerbil
           | Scheme language last week [0].
           | 
           | [0]: https://github.com/chiefnoah/gerbil-cbor
        
           | twotwotwo wrote:
           | Yep, I very much agree with this. It's probably inevitable
           | that languages with reflection grow some kind of language-
           | specific serialization format because a) they can and b)
           | there's often _some_ use case where it looks handy to have a
           | format adapted to the quirks of the language. Plus, when some
           | of these bespoke formats were created, the world hadn't
           | converged as much as it has now around a few common text and
           | binary formats.
           | 
           | But now the interoperable serialization formats have a lot
           | more energy spent on tools and such, and the formats are
           | better-specified, to the point that you probably want to use
           | them even where interoperability with other stuff doesn't
           | force it.
        
         | danpalmer wrote:
         | As an engineer at Google, my opinion on Protocol Buffers
         | changed massively. Pre-Google I found them awkward, the
         | language bindings in Python sucked, and I didn't really see the
         | point. I knew a schema for services was a good idea, but
         | protobufs didn't seem like the best option.
         | 
         | The thing is that at Google protobufs are used everywhere.
         | Like, absolutely everywhere. Think of all the places they could
         | be, and it's way more. All the tooling understands them, code
         | search and go-to-reference works on them everywhere, they are
         | truly transformational on how many different services with many
         | different implementations interact.
         | 
         | Are they perfect? Far from it. But a Go-only implementation
         | misses almost all the value of protobufs. If I was inventing
         | them for Python I could do better (pickle? maybe not) but the
         | whole point is that they aren't language/ecosystem/use-case
         | specific. If this was Rob Pike's frustration then I can't help
         | but feel he missed the point, or this post is a little
         | disingenuous as to the benefits.
         | 
         | I've not seen Gobs used in Google, but I'd imagine an engineer
         | would need to make a very strong case against using protos
         | between services, regardless of if both services are in Go.
        
           | knorker wrote:
           | In my opinion this is pretty on-brand for Pike.
           | 
           | Back when he did his best work, it was possible for one
           | person to "just write the new thing", without making it fit
           | with anything else. There was nothing else to fit it with.
           | You could invent everything from scratch, and not only was it
           | not a waste of time, if you were good enough it had a chance
           | of being the best fit for purpose.
           | 
           | You could take shortcuts. You could have every part of your
           | system be "odd", because nothing was "even".
           | 
           | That's not true anymore. And the way I see it Pike has not
           | moved on.
           | 
           | Science in general had this switch at one point, too. There
           | was a point where one could know all of science. But it's
           | long gone.
        
             | danpalmer wrote:
             | Interesting hypothesis. I can't comment on Pike's history
             | here, but at Google there's certainly a noticeable
             | difference between old special cases that were built
             | pre-2015 ish, and the modern world where everything is
             | _very_ cohesive. I get the impression that there was a big
             | push to achieve that, led from around that time by various
             | products. It's hit some areas more than others, but there
             | do seem to be almost no new products building in the way
             | you've described now.
        
             | hellozomo wrote:
             | This is an unnecessary ad hominem attack. He wrote this
             | over a decade ago, in the midst of doing his best work
             | creating Go itself.
             | 
             | gob was his opinionated way of doing a Go-specific
             | encoding, while also supporting any number of other
             | encodings in the language. Go has incredibly good support
             | for almost every popular encoding there is.
             | 
             | gob has also been used successfully by a number of
             | projects. In many cases, it's a perfectly good way to
             | encode a piece of data that is completely local to a Go
             | program.
        
               | knorker wrote:
               | > This is an unnecessary ad hominem attack
               | 
               | I didn't mean it as such.
               | 
               | > in the midst of doing his best work creating Go itself.
               | 
               | Not to go too far off topic, but Go is another example.
               | It famously ignores decades of language theory, and they
               | wrote their own assembler, linker, etc.
               | 
               | Now, much of that has been undone and rewritten, as Go
               | became more adopted, requiring playing well with the rest
               | of the ecosystem.
               | 
               | (but much of it we're unfortunately stuck with, because
               | it's part of the language)
               | 
               | 30 years ago there was no ecosystem to play well with,
               | and compared to now we were just banging rocks together.
               | Back then you could be a CS polymath as one person. Well,
               | _I_ couldn't, but Pike could.
               | 
               | It was the old days of John Carmack starting every game
               | engine with an empty directory.
               | 
               | I'm saying that today nobody can. Even John Carmack could
               | not on his own write a AAA game. (I know ID had other
               | coders, but my point stands)
        
               | hellozomo wrote:
               | Well, to go off-topic with you, the idea that Go "ignores
               | decades of language theory" is just an opinion that you
               | hold. A factual statement would be to say that the Go
               | designers _omitted_ a great many features other languages
               | have included.
               | 
               | The idea that they did so out of ignorance is ridiculous,
               | given the background of the Go design team. They made
               | considered decisions about what to include in Go.
               | 
               | A fixed-gear bicycle is not "ignoring" decades of bicycle
               | design theory.
               | 
               | Creating a programming language is an _engineering_ task
               | not a _theoretical_ task. Which means there are major
               | trade-offs to be made. And they chose their trade-offs.
               | The wild success of Go should at least make you consider
               | whether or not they're better at making these trade-offs
               | than you, and most language designers, are.
               | 
               | Maybe they were wrong about some choices, which is why Go
               | is still evolving, but they were self-evidently mostly
               | correct.
        
               | knorker wrote:
               | Well, that's your opinion.
               | 
               | You seem to have gotten a bit emotional about this, so I
               | don't think this'll go anywhere.
        
               | biorach wrote:
               | The comment you're replying to is calm, reasoned and
               | courteous.
               | 
               | Claiming the poster is being emotional is... poor form,
               | to put it mildly
        
               | knorker wrote:
               | Oh. I found it very defensive and aggressive.
               | 
               | Putting scare quotes around what I said, calling what I
               | said "ridiculous", and seeming to have taken critique of
               | the Go language personally by saying that the Go language
               | designers are better than most language designers.
               | 
               | So this is why I found the reply neither calm nor
               | courteous. As for reasoned, I don't see any reasoning in
               | it, just conclusions.
               | 
               | I don't even see any sign of a refutation of my point,
               | which isn't so much about if Go is good or not, but about
               | the needless reinventing and ignoring other work, and as
               | a result running into problems that would have been
               | predictable, had they been taken into account.
               | 
               | To me it summarizes as "they're smart, you're dumb".
               | 
               | If this is the type of reply this person would write,
               | then that's not productive.
        
               | dsff3f3f3f wrote:
               | Most of the Reddit/HN/Twitter crowd that are parroting
               | the "Go ignores decades of language theory" line are just
               | parroting something that originated in an incredibly
               | toxic part of the Scala community. The vast majority of
               | them don't actually understand the tradeoffs and
               | implementation complexity that are associated with the
               | specific subset of language features they desire. The
               | debate about a particular feature is often not even
               | concluded in languages that include said feature. For
               | example the Swift team still has serious disagreements
               | about generics and their respective implementation.
               | 
               | I've seen Pike have conversations about language design
               | with SPJ, Hejlsberg, Lattner, Stroustrup, Odersky and
               | other highly respected PL designers and they would never
               | make such a shallow and trite comment.
        
               | randomdata wrote:
               | _> and they wrote their own assembler, linker, etc._
               | 
               | True, but over 30 years ago. In fact, said compiler was
               | the very first program written for Plan 9. Those decades
               | you speak of came _afterwards_.
        
             | geodel wrote:
             | Well, it is a 2011 post. Now Pike is retired, I don't think
             | he cares or matters anyway.
        
               | knorker wrote:
               | I think he matters, for computer history. I never aimed
               | to have him care what I say, though.
        
           | randomdata wrote:
           | _> I can't help but feel he missed the point_
           | 
           | Did he miss the point, or was the point to light a fire under
           | the protobuf team to improve their product? Which they
           | eventually did (e.g. required fields were removed after this
           | post was made).
        
             | danpalmer wrote:
             | It's possible that was an aim, and I'm not familiar with
             | the timelines here, however this does again focus more on
             | the mechanics of protos which I still don't think matter as
             | much as them being fully integrated into a company like
             | they are at Google. The mechanics of them are still not
             | great. Maybe this was a blocker to that happening though?
        
             | _ZeD_ wrote:
             | FYI: required fields were re-introduced in protobuf. Why?
             | Because despite what the gobs team and others think, that
             | is a necessary feature to work with data structures. If
             | "everything is optional", what is the value of any
             | structure, over sending an untyped associative array of
             | stuff?
        
               | morelisp wrote:
               | "Optional" fields, as in fields you can tell if they were
               | explicitly set or not, were re-introduced. Depending on
               | your definition either required fields were not
               | reintroduced, or required fields with a mandatory default
               | value were until recently the only kind of field in
               | proto3.
        
           | Timon3 wrote:
           | As awkward as Protobufs might be - is there any similar
           | format with so many client bindings? I tried comparing the
           | message formats I could find (e.g. Cap'n Proto, Protobuf,
           | msgpack and others), and still Protobuf seems to have the
           | most supported languages. I'd be happy for any other
           | suggestions!
        
           | yegle wrote:
           | It's called "Larry & Sergey Protobuf Moving Co." for a
           | reason. See the t-shirt design in the video screenshot here:
           | https://isocpp.org/blog/2020/07/cppcon-2019-there-are-no-
           | zer...
           | 
           | Disclaimer: an employee at the said moving company.
        
         | kunley wrote:
         | Nice recap.
         | 
         | By the way, just curious what is your opinion on MessagePack?
        
       | dmi wrote:
       | > If all you want to send is an array of integers, why should you
       | have to put it into a struct first?
       | 
       | If you're sure that's all you'll ever have to do, then sure. But
       | unless you're 100% certain that the protocol will never evolve
       | further, having a more complex structure allows it to change in a
       | gradual way.
        
         | lsaferite wrote:
         | It was clear, from the post, that they were saying, "If all I
         | need is a simple array, why should I be required to wrap it in
         | a struct?" The whole point (from the post) being that protobuf
         | required structs but gob allowed simpler types _in addition_ to
         | structs.
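
As an illustration of the point under discussion, a minimal sketch of gob carrying a bare slice of integers as the top-level value, with no wrapping struct. The helper names `encodeInts` and `decodeInts` are hypothetical; only the `encoding/gob` calls are from the standard library.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// encodeInts writes a bare []int as the top-level gob value.
func encodeInts(xs []int) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(xs); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// decodeInts reads a top-level []int back out of gob data.
func decodeInts(data []byte) ([]int, error) {
	var xs []int
	err := gob.NewDecoder(bytes.NewReader(data)).Decode(&xs)
	return xs, err
}

func main() {
	data, err := encodeInts([]int{1, 2, 3})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}
	xs, err := decodeInts(data)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(xs)
}
```

Whether this is wise is a separate question from whether it is possible: a struct with a single slice field costs little and leaves room to add fields later, which is the concern raised in the parent comment.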
        
           | Thorrez wrote:
           | dmi knows that. dmi was saying that even if the encoding
           | scheme allows encoding simpler types, it's often not smart to
           | use that functionality, because you won't be able to evolve
           | the format in the future. If you encode a message instead of
           | a simple type, you'll be able to evolve it later as you add
           | more features to your program.
           | 
           | Note that even protobufs, which doesn't allow encoding simple
           | types at the top level, still has this debate when deciding
           | whether to encode an array of simple types (inside a struct)
           | or an array of structs (inside a struct). And Google's
           | guidance is to use an array of structs if more data might be
           | needed in the future:
           | 
           | >However, if additional data is likely to be needed in the
           | future, repeated fields should use a message instead of a
           | scalar proactively, to avoid parallel repeated fields.
           | 
           | https://google.aip.dev/144
           | 
           | >// Good: A separate message that can grow to include more
           | fields
           | 
           | https://protobuf.dev/programming-guides/api/#order-
           | independe...
        
           | orf wrote:
           | _all I need _right now_ is a simple array_
           | 
           | Nobody knows the future, and preparing for the future is a
           | huge part of software engineering. Sending top-level arrays
           | instead of sending them inside a struct is never the right
           | way.
        
       | assbuttbuttass wrote:
       | Gob is a great serialization format! It's super easy to use, and
       | supports Go native types (kind of like Python's pickle).
       | 
       | For a recent project, I needed a simple key-value store. I was
       | evaluating using a full RDBMS, but I ended up just putting gob
       | files in a directory.
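
A rough sketch of what "gob files in a directory" could look like, assuming one file per key. This is not the commenter's code: the `Record` type, the `put`, `get` and `demo` helpers, and the `<key>.gob` naming scheme are all made up for illustration, and the sketch does no key sanitizing, locking, or atomic replacement.

```go
package main

import (
	"encoding/gob"
	"fmt"
	"os"
	"path/filepath"
)

// Record is a hypothetical value type stored under each key.
type Record struct {
	Title string
	Count int
}

// put writes rec to <dir>/<key>.gob, replacing any existing file.
func put(dir, key string, rec Record) error {
	f, err := os.Create(filepath.Join(dir, key+".gob"))
	if err != nil {
		return err
	}
	if err := gob.NewEncoder(f).Encode(rec); err != nil {
		f.Close()
		return err
	}
	return f.Close()
}

// get reads the record stored under key.
func get(dir, key string) (Record, error) {
	var rec Record
	f, err := os.Open(filepath.Join(dir, key+".gob"))
	if err != nil {
		return rec, err
	}
	defer f.Close()
	err = gob.NewDecoder(f).Decode(&rec)
	return rec, err
}

// demo stores one record in a temporary directory and reads it back.
func demo() (Record, error) {
	dir, err := os.MkdirTemp("", "gobstore")
	if err != nil {
		return Record{}, err
	}
	defer os.RemoveAll(dir)
	if err := put(dir, "first", Record{Title: "hello", Count: 2}); err != nil {
		return Record{}, err
	}
	return get(dir, "first")
}

func main() {
	rec, err := demo()
	if err != nil {
		fmt.Println("store failed:", err)
		return
	}
	fmt.Println(rec.Title, rec.Count)
}
```

For anything beyond a small local tool, writing to a temporary file and renaming it into place would avoid leaving a half-written record behind after a crash.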
        
       | jeffrallen wrote:
       | I used gob for my first client/server Go program, which was a
       | "make one of something you know about to throw away" new language
       | experiment. It worked, but I quickly turned away from it, because
       | it would never be cross platform.
       | 
       | I saw gob more as an experiment that the Go team used to check
       | the reflect package's usability. (Which sucks anyway, by the
       | way.)
       | 
       | I'm surprised it's still in the stdlib. I would have guessed it
       | would have been removed for Go 1.0, because it was already clear
       | then that it was not suitable for anything more than experiments.
        
       | sebstefan wrote:
       | >[Required fields are] also a maintenance problem. Over time, one
       | may want to modify the data definition to remove a required
       | field, but that may cause existing clients of the data to crash.
       | 
       | Okay, but would you rather have it crash or allow for a program
       | to run on the wrong data? Especially if you do that and then say
       | that everything has zero as a default value.
       | 
       | The question remains whether the serialization format should be
       | taking care of that, or a round of parsing later on with a schema
       | on the side; but if you do the former without the latter you're
       | setting yourself up for deployment nightmares
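
To make the concern concrete, a small sketch of gob decoding data from an older sender into a newer struct: the missing field is silently left at its zero value, so any "required" check has to happen in a separate validation step. The `OrderV1` and `OrderV2` types and the `decodeOld` and `validate` helpers are hypothetical and exist only for this illustration.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// OrderV1 is what a hypothetical older sender transmits.
type OrderV1 struct {
	Symbol string
}

// OrderV2 is what a newer receiver expects; Quantity is absent
// from data produced by V1 senders.
type OrderV2 struct {
	Symbol   string
	Quantity int
}

// decodeOld encodes a V1 value and decodes it into a V2 struct.
// gob matches fields by name and leaves unmatched fields untouched.
func decodeOld() (OrderV2, error) {
	var buf bytes.Buffer
	var out OrderV2
	if err := gob.NewEncoder(&buf).Encode(OrderV1{Symbol: "ABC"}); err != nil {
		return out, err
	}
	err := gob.NewDecoder(&buf).Decode(&out)
	return out, err
}

// validate is the separate "round of parsing later on": it rejects
// values that decoded fine but are not usable.
func validate(o OrderV2) error {
	if o.Quantity <= 0 {
		return fmt.Errorf("order %q: quantity missing or non-positive", o.Symbol)
	}
	return nil
}

func main() {
	o, err := decodeOld()
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println("decoded quantity:", o.Quantity)
	if err := validate(o); err != nil {
		fmt.Println("rejected:", err)
	}
}
```

Note that a zero value cannot distinguish "field was never sent" from "field was sent as 0", which is why the check above has to treat both the same way.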
        
         | tomohawk wrote:
         | If you want to deal with the crash and justify why the system
         | went down because you were more correct than the other guys,
         | then sure.
         | 
         | Protocols often represent an interface between organizations.
         | Especially when that is the case, you want to be as charitable
         | as possible when accepting input, because getting any issues
         | resolved may very well require getting the two organizations to
         | agree.
         | 
         | Also, as things change over time, an overly strict
         | interpretation when receiving packets will require unnecessary
         | rework in the future, and possibly down time or lost business.
         | 
         | When dealing with protocols, it's generally best to be strict
         | when emitting packets and as tolerant as possible when
         | accepting them.
        
           | sebstefan wrote:
           | That's the motto for browsers and I agree with it in context,
           | but if it's something you control (like services of a
           | distributed application) then not really. You can just make
           | sure the versions match during deployment and save yourself
           | some debugging headaches
           | 
           | Not if it's something sensitive either, where maybe crashing
           | is preferable to running the wrong way
        
           | mst wrote:
           | Largely agree, with the addendum that it's a really good idea
           | to collect metrics as to how much tolerance your code has
           | been required to show. Whether you need to present those
           | metrics to the sender and ask them to tweak their emissions
           | or simply keep an eye on them is situation dependent, but
           | having them at all is definitely in the "future you will
           | thank current you later" ... and I will absolutely confess
           | that current me has cursed past me for not doing so on more
           | than one occasion, and I can only hope I remember more often
           | in the future ;)
        
         | cosmic_quanta wrote:
         | Indeed, I'd rather have the program crash rather than repeat
         | this nightmare: https://specbranch.com/posts/knight-capital/
        
       | art_vandalay wrote:
       | I forgot Go was still around. Thanks for reminding me.
        
       | broken_broken_ wrote:
       | Just finished removing this encoding in our production services.
       | 
       | It panics on malformed input which is a no go for us since high
       | availability is really important for us, and it showed quite a
       | lot in the performance and memory profiles (roughly 5 times the
       | time and memory as doing the same with JSON).
       | 
       | The code was converting some data to gob, and storing it in the
       | database for later.
       | 
       | We now just do the same but in json, it's human readable and
       | Postgres validates that the data is valid JSON.
       | 
       | And unmarshaling it does not panic.
        
         | jjtheblunt wrote:
         | Have you tried the superset of json from AWS?
         | 
         | https://amazon-ion.github.io/ion-docs/
        
         | abtinf wrote:
         | I've been considering adopting the gob package. I haven't used
         | it before, so I only know what's in the docs -- and all of your
         | claims are surprising to me. Could you share more information?
         | 
         | How is it possible that they were getting malformed input? This
         | was happening in go-to-go communication, or was there some kind
         | of cross-language interop?
         | 
         | Any idea why the performance was so much slower than JSON in
         | your case? The technique described in the OP would seem to make
         | that impossible.
         | 
         | Do you think it's possible the database column type or
         | collation was somehow affecting the gob?
        
           | broken_broken_ wrote:
           | The column type was bytea (basically blob) so it should be
           | stored as is by the database. The profiling showed the
           | hotspots in the gob package directly.
           | 
           | The docs explicitly mention that invalid input will make it
           | panic and that can be confirmed by reading the code or
           | fuzzing the input.
           | 
           | From my understanding, there is no compile time schema so
           | everything is done with runtime reflection and that is bound
           | to not be super fast. Granted, JSON is the same on paper, I
           | would guess that the JSON package had more eyes on it and
           | optimizations.
           | 
           | In our case, everything was using JSON except this one
           | component due to some historical oddity so it was also a win
           | in terms of simplifying.
        
         | scottlawson wrote:
         | If the issue is panic why not create a wrapper func with
         | recover and present the same interface that you want?
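
A minimal sketch of the wrapper suggested here, assuming the goal is to turn any panic raised during decoding into an ordinary error. `Payload`, `encode` and `safeDecode` are hypothetical names; whether and when gob actually panics on a given input is the parent commenter's claim and is not verified by this sketch.

```go
package main

import (
	"bytes"
	"encoding/gob"
	"fmt"
)

// Payload is a hypothetical type stored as gob.
type Payload struct {
	ID   int
	Body string
}

// encode returns the gob encoding of p.
func encode(p Payload) ([]byte, error) {
	var buf bytes.Buffer
	if err := gob.NewEncoder(&buf).Encode(p); err != nil {
		return nil, err
	}
	return buf.Bytes(), nil
}

// safeDecode decodes gob data into out. Any panic raised while
// decoding is recovered and reported through the returned error.
func safeDecode(data []byte, out interface{}) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("gob decode panicked: %v", r)
		}
	}()
	return gob.NewDecoder(bytes.NewReader(data)).Decode(out)
}

func main() {
	data, err := encode(Payload{ID: 7, Body: "hi"})
	if err != nil {
		fmt.Println("encode failed:", err)
		return
	}

	var ok Payload
	fmt.Println(safeDecode(data, &ok), ok.ID, ok.Body)

	// Truncated input is reported as an error rather than crashing.
	var bad Payload
	fmt.Println(safeDecode(data[:len(data)-1], &bad) != nil)
}
```

A wrapper like this only catches panics on the calling goroutine; it does nothing about the time and memory cost mentioned in the parent comment, and it cannot recover from fatal runtime errors such as running out of memory.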
        
         | grose wrote:
         | That's interesting because I've had basically the opposite
         | experience. I used encoding/json with BadgerDB and saw that
         | json.Unmarshal in a hot loop was using about 68% of total CPU
         | time in a profile taken from production. By switching to gob it
         | significantly decreased to around 28% (for gob's decode
         | function). I've read that decoding interfaces in gob is
         | slow[1], maybe that accounts for my difference as I don't have
         | any in this particular struct. Also this was a very read-heavy
         | service, so that could be a major difference as well.
         | 
         | [1]: https://groups.google.com/g/golang-nuts/c/12qhqiG1J70
        
       | jerf wrote:
       | FWIW, this isn't used much by the community. Being a standard
       | library package it still gets some use of course, but for
       | comparison, encoding/gob shows about 22.5K imports [1] to
       | encoding/json's nearly 800K [2], and whereas you can see in the
       | JSON search an ecosystem of JSON libraries, gob is basically just
       | gob.
       | 
       | Calling it "dead" just invites a tedious thread about what the
       | definition of "dead" is, so I won't, I'll just sort of imply it
       | in this sentence without actually coming out and saying it in a
       | clear manner. I would generally both A: recommend against this,
       | not necessarily as a dire warning, just, you know, a
       | recommendation and B: for anyone who is perturbed by the idea of
       | this existing, just be aware that it's not like this package has
       | embedded itself into the Go ecosystem or anything.
       | 
       | [1]: https://pkg.go.dev/search?q=gob
       | 
       | [2]: https://pkg.go.dev/search?q=json
        
       | emmanueloga_ wrote:
       | Someone made a benchmark of serialization libraries in Go [1],
       | and I was surprised to see gob is one of the slowest ones,
       | especially for decoding. I suspect part of the reason is that the
       | API doesn't allow reusing decoders [2]. From my explorations
       | it seems like JSON [3], MessagePack [4] and CBOR [5] are all
       | better alternatives.
       | 
       | By the way, in Go there are like a million JSON encoders
       | because a lot of things in the std library are not really coded
       | for maximum performance but more for ease of use, it seems.
       | Perhaps this is the right balance for certain things (ex: the
       | http library, see [6]).
       | 
       | There are also a bunch of libraries that allow you to modify a
       | JSON file "in place", without having to fully deserialize into
       | structs (ex: GJSON/SJSON [7] [8]). This sounds very convenient
       | and more efficient than fully de/serializing if we just need to
       | change the data a little.
       | 
       | --
       | 
       | 1: https://github.com/alecthomas/go_serialization_benchmarks
       | 
       | 2:
       | https://github.com/golang/go/issues/29766#issuecomment-45492...
       | 
       | --
       | 
       | 3: https://github.com/goccy/go-json
       | 
       | 4: https://github.com/vmihailenco/msgpack
       | 
       | 5: https://github.com/fxamacker/cbor
       | 
       | --
       | 
       | 6: https://github.com/valyala/fasthttp#faq
       | 
       | --
       | 
       | 7: https://github.com/tidwall/gjson
       | 
       | 8: https://github.com/tidwall/sjson
        
       | dang wrote:
       | Not gobs of comments but discussed at the time:
       | 
       |  _Gobs of data_ - https://news.ycombinator.com/item?id=2365430 -
       | March 2011 (2 comments)
        
       | zgiber wrote:
       | It may not be a good tool for communicating between services
       | implemented in different languages. But I'd happily use it to
       | save stuff to disk where database is overkill.
        
       ___________________________________________________________________
       (page generated 2023-12-04 23:01 UTC)