hngopher.com

       [HN Gopher] Amazon Ion Specification
       ___________________________________________________________________
        
       Amazon Ion Specification
        
       Author : rewmie
       Score  : 58 points
       Date   : 2023-09-13 17:11 UTC (5 hours ago)
        
 (HTM) web link (amazon-ion.github.io)
 (TXT) w3m dump (amazon-ion.github.io)
        
       | Zenul_Abidin wrote:
       | Am I right in assuming that this is like Protobuf but just for
       | JSON objects?
        
         | jonhohle wrote:
         | It's a superset of JSON with an isomorphic binary encoding,
         | additional data types (blobs, s-exps, timestamps, symbols,
         | etc.), better number handling, annotations, and the ability to
         | pre-share symbol metadata for more efficient binary encoding
         | (similar to how protobufs encodes fields, but optional).
         | 
         | You can write Ion by hand (like JSON) and share it without a
         | schema (unlike protobufs). There are fewer ways to express values
         | than YAML, but more data types.
         | 
         | Having S-exps is convenient for writing DSLs in a data language
         | that's easily readable from other languages.
        
       | satvikpendem wrote:
       | Can this be used similarly to GraphQL?
        
         | kevan wrote:
         | This is just the data serialization format, you have to build
         | any other functionality yourself. We do have a pattern on a few
         | of our APIs where there's a big fixed schema (i.e. it's just a
         | struct and you can't do GraphQL things like following
         | references and hydrating them into objects) and clients select
         | the subset of attributes they want and we only return that.
         | It's useful for reducing response sizes but the main benefit is
         | we can pretty easily track which attributes are actually used
         | over time. That helps us deprecate attributes with a lot less
         | pain.
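
The attribute-selection pattern kevan describes can be sketched in a few lines. This is a minimal illustration, not Amazon's implementation: the record, the attribute names, and the `select_attributes` helper are all hypothetical, and a plain dict stands in for an Ion struct.

```python
from collections import Counter

# Hypothetical fixed "schema": one flat record per item.
ITEM = {"id": 42, "title": "Widget", "price": "9.99", "weight_kg": 1.2}

# Counts how often each attribute is requested, across all calls.
attribute_usage = Counter()


def select_attributes(record, wanted):
    """Return only the requested attributes and record which were asked for."""
    unknown = [name for name in wanted if name not in record]
    if unknown:
        raise KeyError(f"unknown attributes: {unknown}")
    attribute_usage.update(wanted)
    return {name: record[name] for name in wanted}


# Two simulated client requests, each naming the subset it wants.
response = select_attributes(ITEM, ["id", "price"])
select_attributes(ITEM, ["id"])

# Attributes that were never requested are candidates for deprecation.
unused = sorted(set(ITEM) - set(attribute_usage))
```

Counting requests at the point of selection is what makes the deprecation argument work: the server learns which attributes are read without inspecting client code.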
        
       | nathas wrote:
       | lol, the internal docs on this at Amazon were something very
       | close to "we invented this before Avro and we think that's
       | probably a better choice if you need binary serialization."
       | 
       | My 2 cents: don't use it.
        
         | oh_come_on wrote:
         | [dead]
        
         | postalrat wrote:
         | My 2 cents: don't use avro or anything like it unless you can
         | prove it's going to save you money
        
           | tombert wrote:
           | What would you suggest? Just JSON everywhere?
        
             | postalrat wrote:
             | json, csv, text, html, binary blobs you don't create,
             | whatever is easiest
        
               | Waterluvian wrote:
               | cbor even.
               | 
               | Honestly, if you're in a case where you absolutely know
               | none of these work for you and you can absolutely prove
               | you need another, you're probably just going to write
               | your own. And that's a fleetingly rare case.
        
               | tombert wrote:
               | I have never used Ion so I cannot speak to its use in
               | practice, but I haven't really had too much of an issue
               | with msgpack. It's faster than JSON, more compressed than
               | JSON, without being any more difficult than any JSON
               | library I've used. It's an almost-universal good for me;
               | the only thing you lose is the ability to easily
               | introspect the messages if there's an issue.
        
             | rahkiin wrote:
             | I'm interested in the answer as well. Also interested
             | what's wrong with Ion
        
       | ChrisArchitect wrote:
       | (2016)
        
       | jsnell wrote:
       | Previous discussions on Ion:
       | 
       | https://news.ycombinator.com/item?id=29284428 (2 years ago, 229
       | comments)
       | 
       | https://news.ycombinator.com/item?id=23921610 (3 years ago, 110
       | comments)
       | 
       | https://news.ycombinator.com/item?id=11546098 (7 years ago, 163
       | comments)
        
       | mikece wrote:
       | Am I reading this right that it's a binary format for real-time
       | streaming data, similar to Avro, but can include arbitrarily deep
       | nested structures unlike Avro?
        
         | leef wrote:
         | Ion has a binary format but is not specifically about real-time
         | streaming. It is a JSON replacement.
         | 
         | Ion originated 10+ years ago from the Amazon catalog team - the
         | team that kept data about the hundreds of millions of items
         | available on Amazon. Nearly every team in the company called
         | the catalog to get information about items all the time -
         | scanning the entire catalog, parts of the catalog, millions of
         | individual item lookups every second, etc.
         | 
         | They did the math and some very large percentage of network
         | traffic in Amazon Retail's data centers was catalog data. If
         | that data, currently in XML or JSON format, was sent in a more
         | compact format it would save some ridiculous millions of
         | dollars every year. So Ion was born and eventually open
         | sourced.
        
         | jauntywundrkind wrote:
         | Why do you single out avro & not any of the hundreds of other
         | ser/de systems? Is that what you know best? Is there something
         | specific about avro that makes it feel particularly similar?
         | https://github.com/maximveksler/awesome-serialization
        
           | mikece wrote:
           | I work in a shop where Kafka and Avro are everywhere. If I
           | worked somewhere else I might make reference to something else
           | if it was front-of-mind all the time.
        
           | PaulHoule wrote:
           | People keep inventing new ones because the old ones suck or
           | they think the old ones suck. Look at all the discontents
           | around JSON (no comments!), people react violently when
           | people try to apply a little extra like JSON-LD. Then there
           | are all the things like YAML, TOML and such that try to be a
           | little better but are widely thought to be a little worse.
           | (And that's just the human readable data formats) Then there
           | is always
           | 
           | https://en.wikipedia.org/wiki/ASN.1
           | 
           | which is forgotten but not gone.
        
           | abeppu wrote:
           | In fairness, in the list you link to,
           | 
           | - Avro and Ion are the only two that are labeled
           | Textual/Binary
           | 
           | - They are in the same Big Data grouping
           | 
           | - They both are schema-embedded, and support some rich nested
           | datastructures, though they deviate on many of the specifics
           | 
           | So I think it's reasonable to pick out Avro as an especially
           | similar point of comparison.
        
       | mdaniel wrote:
       | On the one hand, this seems more real than a lot of
       | promotion-ware I've seen:
       | https://github.com/amazon-ion/ion-intellij-plugin#readme
       | 
       | On the other hand, they're not using this for the boto schemas,
       | which seems like a natural place to show that it's able to
       | capture real-world schemas so that makes it hard for me to think
       | this has any traction
        
         | jesterpm wrote:
         | Ion is heavily used on the retail side of Amazon, but it's only
         | recently started to appear in AWS products.
         | 
         | AWS is starting to support PartiQL (https://partiql.org/) queries
         | in some places and PartiQL uses Ion's type system internally.
        
         | kevan wrote:
         | The SDKs use Smithy[1] which is tailored for
         | defining+generating services and SDKs, Ion is more of a pure
         | data serialization format. It's definitely niche but my org
         | uses it in a few places and it has some nice properties that
         | fit our case pretty well (rapidly evolving schema, most clients
         | only care about a small subset of attributes, ability to apply
         | multiple and different schemas based on regions or businesses).
         | 
         | It's the sort of thing where I'd advise exploring other options
         | first and only using it if the whys[2] really resonate with you
         | because it definitely comes with some overhead.
         | 
         | [1] https://smithy.io/2.0/index.html
         | [2] https://amazon-ion.github.io/ion-docs/guides/why.html
        
       | glonq wrote:
       | What are the pros and cons of this versus CBOR, which we had great
       | success with in our system?
       | 
       | https://cbor.io
        
         | leonardspeiser wrote:
         | Pros of Ion vs CBOR:
         | 
         | - Wider range of data types: Ion supports decimals, symbols,
         |   blobs, and clobs which don't exist in CBOR.
         | - Optional schemas and annotations: Ion allows attaching
         |   type/schema information to data for validation purposes.
         |   CBOR has no schema support.
         | - Text format: Ion provides a human-readable text format for
         |   data interchange, CBOR is binary only.
         | - Maturity: Ion has been used in production at Amazon since
         |   2009, CBOR is a newer standard (RFC 7049 in 2014).
         | - Language support: More mature library ecosystem around Ion
         |   vs CBOR which is still gaining adoption.
         | 
         | Pros of CBOR vs Ion:
         | 
         | - Standardized: CBOR is an IETF standard, Ion is an
         |   Amazon-proprietary format.
         | - Simplicity: CBOR has a smaller set of basic data types
         |   making it simpler to implement.
         | - Used in other standards: CBOR is used in data formats like
         |   COSE for crypto operations and CWT for web tokens.
         | - Efficiency: The CBOR binary format can have a smaller
         |   encoding size than Ion's.
         | - JSON interoperability: CBOR is designed to be a
         |   JSON-compatible binary format. Ion is JSON-like but not
         |   fully compatible.
         | 
         | In summary, Ion has richer data typing and schema capabilities
         | and a long production history. But CBOR is simpler,
         | standardized, and gaining momentum - especially in crypto and
         | web standards using it as a binary encoding basis.
         | 
         | So Ion may be better for applications dealing with complex,
         | annotated data. But CBOR has advantages for an efficient binary
         | interchange format, particularly when standards compatibility
         | is important.
        
       | tombert wrote:
       | I would be interested to see how this compares to something like
       | msgpack [1] in performance and final size of the binary. Msgpack
       | has been my go-to for binary serialization for years due to how
       | simple and fast it is, and how easy it is to make it work with
       | native Clojure data structures.
       | 
       | [1] https://msgpack.org/index.html
        
         | jesterpm wrote:
         | That comparison would depend heavily on what you're storing.
         | 
         | Ion has the option of using symbol tables to replace strings
         | (e.g. in struct/map keys or in values). So, if your benchmark
         | had a large number of records with similar structures, I would
         | expect Ion to pull ahead. On the other hand, if each record had
         | nothing in common, I'd expect them to perform similarly.
         | 
         | One feature of the Ion libraries that I've liked is the parser
         | will take any of the formats and figure out what to do with it
         | (text, binary, compressed binary). It's one less thing to worry
         | about. You can switch encodings later without breaking
         | consumers, you can write plain text Ion when you're testing,
         | etc.
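
One way a reader could tell the encodings apart is by looking at the leading bytes. The sketch below is an assumption about how such detection might work, not the Ion libraries' actual code: it relies on the Ion 1.0 binary version marker (`E0 01 00 EA`) and the gzip magic number, and `sniff_encoding` is a hypothetical helper name.

```python
import gzip

ION_BVM = b"\xe0\x01\x00\xea"  # Ion 1.0 binary version marker
GZIP_MAGIC = b"\x1f\x8b"       # first two bytes of any gzip stream


def sniff_encoding(data: bytes) -> str:
    """Guess how a payload is encoded from its leading bytes."""
    if data.startswith(GZIP_MAGIC):
        # Decompress, then classify whatever was inside.
        return "gzip+" + sniff_encoding(gzip.decompress(data))
    if data.startswith(ION_BVM):
        return "binary"
    # Anything else is treated as text Ion in this sketch.
    return "text"


text_kind = sniff_encoding(b'{name: "widget"}')
binary_kind = sniff_encoding(ION_BVM + b"\x21\x01")
zipped_kind = sniff_encoding(gzip.compress(ION_BVM + b"\x21\x01"))
```

Because the producer's choice is visible in the payload itself, a consumer written this way keeps working when the producer switches from text to binary.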
        
           | plq wrote:
           | Symbol tables, compression, etc seem one level of abstraction
           | above what msgpack provides. Such features could be
           | implemented on top of vanilla msgpack as long as all parties
           | agree on the msgpack schema.
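
The symbol-table idea discussed above can be shown independently of any particular format. The sketch below is a toy illustration layered on JSON from the standard library, not Ion's or msgpack's encoding; the function names and record fields are hypothetical.

```python
import json


def encode_with_symbols(records):
    """Replace repeated field names with small integer IDs."""
    symbols = []  # ID -> field name
    ids = {}      # field name -> ID
    rows = []
    for record in records:
        row = []
        for name, value in record.items():
            if name not in ids:
                ids[name] = len(symbols)
                symbols.append(name)
            row.append([ids[name], value])
        rows.append(row)
    return {"symbols": symbols, "rows": rows}


def decode_with_symbols(payload):
    """Rebuild the original records from the symbol table and rows."""
    symbols = payload["symbols"]
    return [{symbols[i]: value for i, value in row} for row in payload["rows"]]


# Many records sharing the same long field names: the case where
# a symbol table is expected to help most.
records = [
    {"customerIdentifier": n, "shippingAddressLine": "x"} for n in range(100)
]
plain = json.dumps(records, separators=(",", ":"))
packed = json.dumps(encode_with_symbols(records), separators=(",", ":"))
round_trip = decode_with_symbols(json.loads(packed))
```

Each field name is written once in `symbols` instead of once per record, so `packed` comes out shorter than `plain` for this input. With records that share no field names the table would give no saving, which matches jesterpm's expectation.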
        
       | news_to_me wrote:
       | Saw this the other day, but the multiple types of null kind of
       | turned me off - e.g. `null.int`, `null.float`, `null.null`. Is
       | there a good justification for this? Seems like a kluge in any
       | case.
        
         | steveBK123 wrote:
         | typed nulls good
        
           | throwbadubadu wrote:
           | Sounds like the grug brained developer speaking... Hi! ;)
        
         | spaceywilly wrote:
         | Seems like the justification would be to keep the type
         | information when going back and forth to Ion. More like
         | "multiple nullable types" instead of "multiple types of null"
         | 
         | userBirthDay: null <-- ok, but what type is it? String? Int?
         | Timestamp?
         | 
         | userBirthDay: null.timestamp <-- ok, it's a timestamp typed
         | variable, but we don't know the value. Yay, happy programmer.
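
The "nullable types" reading can be modelled as a null value that carries a type name. This is a rough sketch of the concept only: `TypedNull` and `parse_null` are hypothetical names, and the parser handles nothing beyond the `null` and `null.<type>` spellings (it does not check that the type name is a real Ion type).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TypedNull:
    """A missing value that still remembers which type it would have had."""

    type_name: str  # e.g. "timestamp", "int", "string"

    def to_text(self) -> str:
        # A bare `null` is the short spelling of `null.null`.
        return "null" if self.type_name == "null" else f"null.{self.type_name}"


def parse_null(text: str) -> Optional[TypedNull]:
    """Parse `null` or `null.<type>`; return None for anything else."""
    if text == "null":
        return TypedNull("null")
    if text.startswith("null."):
        return TypedNull(text[len("null."):])
    return None


birthday = parse_null("null.timestamp")
```

Here `birthday.type_name` is `"timestamp"`, so a consumer can still pick the right column type or deserializer even though the value is absent.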
        
         | zyang wrote:
         | My guess is buffer size calculations.
        
       ___________________________________________________________________
       (page generated 2023-09-13 23:01 UTC)