[HN Gopher] Amazon Ion Specification
___________________________________________________________________
Amazon Ion Specification
Author : rewmie
Score : 58 points
Date : 2023-09-13 17:11 UTC (5 hours ago)
(HTM) web link (amazon-ion.github.io)
(TXT) w3m dump (amazon-ion.github.io)
| Zenul_Abidin wrote:
| Am I right in assuming that this is like Protobuf but just for
| JSON objects?
| jonhohle wrote:
| It's a superset of JSON with an isomorphic binary encoding,
| additional data types (blobs, s-exps, timestamps, symbols,
| etc.), better number handling, annotations, and the ability to
| pre-share symbol metadata for more efficient binary encoding
| (similar to how protobuf encodes fields, but optional).
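The following Python snippet illustrates why a dedicated decimal type counts as "better number handling". It uses only the standard library's `json` and `decimal` modules, not an Ion library. JSON leaves number precision to the parser, and Python's `json` maps `0.1` to a binary float by default. In Ion text, by contrast, `0.1` is a decimal; a float needs an exponent, such as `0.1e0`.

```python
import json
from decimal import Decimal

doc = '{"price": 0.1, "tax": 0.2}'

# Default parsing: JSON numbers with a fractional part become binary floats.
as_float = json.loads(doc)
float_total = as_float["price"] + as_float["tax"]

# Opting in to exact decimals, which is what Ion text gives you for 0.1 by default.
as_decimal = json.loads(doc, parse_float=Decimal)
decimal_total = as_decimal["price"] + as_decimal["tax"]

print(float_total)    # 0.30000000000000004
print(decimal_total)  # 0.3
```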
|
| You can write Ion by hand (like JSON) and share it without a
| schema (unlike protobufs). There are fewer ways to express values
| than YAML, but more data types.
|
| Having S-exps is convenient for writing DSLs in a data language
| that's easily readable from other languages.
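The following is a sketch of the DSL idea. A rule written as an Ion s-expression, such as `(and (= region "EU") (> qty 10))`, is modeled here as nested Python tuples, a hand-written stand-in for what an Ion reader would return. The `Sym` class, the operator set, and `evaluate` are hypothetical names used for illustration only. They are not part of any Ion library.

```python
import operator
from dataclasses import dataclass


@dataclass(frozen=True)
class Sym:
    """Stands in for an Ion symbol, so operators and field names differ from string literals."""
    name: str


# Comparison operators understood by this toy rule language (hypothetical).
OPS = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def evaluate(expr, env: dict):
    """Evaluate an s-expression modeled as a tuple whose first element is a Sym."""
    if isinstance(expr, tuple):
        head, *args = expr
        values = [evaluate(arg, env) for arg in args]
        if head.name == "and":
            return all(values)
        if head.name == "or":
            return any(values)
        return OPS[head.name](*values)
    if isinstance(expr, Sym):
        return env[expr.name]  # symbols are looked up as field names
    return expr  # strings and numbers are literals


# (and (= region "EU") (> qty 10))
rule = (Sym("and"),
        (Sym("="), Sym("region"), "EU"),
        (Sym(">"), Sym("qty"), 10))

print(evaluate(rule, {"region": "EU", "qty": 12}))  # True
print(evaluate(rule, {"region": "US", "qty": 12}))  # False
```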
| satvikpendem wrote:
| Can this be used similarly to GraphQL?
| kevan wrote:
| This is just the data serialization format, you have to build
| any other functionality yourself. We do have a pattern on a few
| of our APIs where there's a big fixed schema (i.e. it's just a
| struct and you can't do GraphQL things like following
| references and hydrating them into objects) and clients select
| the subset of attributes they want and we only return that.
| It's useful for reducing response sizes but the main benefit is
| we can pretty easily track which attributes are actually used
| over time. That helps us deprecate attributes with a lot less
| pain.
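The attribute-selection pattern described above can be sketched in a few lines of Python. Everything here (`CATALOG_ITEM`, `select`, `unused_attributes`) is hypothetical and independent of Ion. The sketch projects a record from a big fixed schema onto the fields a client requests and counts which fields are asked for, so that never-requested attributes stand out as deprecation candidates.

```python
from collections import Counter

# Hypothetical record following a big fixed schema.
CATALOG_ITEM = {
    "id": "B000123",
    "title": "Example Widget",
    "price": "19.99",
    "weight_grams": 250,
    "legacy_color_code": "RB-7",
}

attribute_usage: Counter = Counter()


def select(record: dict, requested: list[str]) -> dict:
    """Return only the requested attributes and count which ones were asked for."""
    present = [name for name in requested if name in record]
    attribute_usage.update(present)
    return {name: record[name] for name in present}


def unused_attributes(record: dict) -> set[str]:
    """Attributes no client has requested so far: candidates for deprecation."""
    return set(record) - set(attribute_usage)


print(select(CATALOG_ITEM, ["id", "title"]))    # {'id': 'B000123', 'title': 'Example Widget'}
print(select(CATALOG_ITEM, ["id", "price"]))    # {'id': 'B000123', 'price': '19.99'}
print(sorted(unused_attributes(CATALOG_ITEM)))  # ['legacy_color_code', 'weight_grams']
```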
| nathas wrote:
| lol, the internal docs on this at Amazon were something very
| close to "we invented this before Avro and we think that's
| probably a better choice if you need binary serialization."
|
| My 2 cents: don't use it.
| postalrat wrote:
| My 2 cents: don't use avro or anything like it unless you can
| prove it's going to save you money
| tombert wrote:
| What would you suggest? Just JSON everywhere?
| postalrat wrote:
| json, csv, text, html, binary blobs you don't create,
| whatever is easiest
| Waterluvian wrote:
| cbor even.
|
| Honestly, if you're in a case where you absolutely know
| none of these work for you and you can absolutely prove
| you need another, you're probably just going to write
| your own. And that's a fleetingly rare case.
| tombert wrote:
| I have never used Ion so I cannot speak to its use in
| practice, but I haven't really had too much of an issue
| with msgpack. It's faster than JSON, more compact than
| JSON, without being any more difficult than any JSON
| library I've used. It's an almost-universal good for me;
| the only thing you lose is the ability to easily
| introspect the messages if there's an issue.
| rahkiin wrote:
| I'm interested in the answer as well. I'm also interested in
| what's wrong with Ion.
| ChrisArchitect wrote:
| (2016)
| jsnell wrote:
| Previous discussions on Ion:
|
| https://news.ycombinator.com/item?id=29284428 (2 years ago, 229
| comments)
|
| https://news.ycombinator.com/item?id=23921610 (3 years ago, 110
| comments)
|
| https://news.ycombinator.com/item?id=11546098 (7 years ago, 163
| comments)
| mikece wrote:
| Am I reading this right that it's a binary format for real-time
| streaming data, similar to Avro, but can include arbitrarily deep
| nested structures unlike Avro?
| leef wrote:
| Ion has a binary format but is not specifically about real-time
| streaming. It is a JSON replacement.
|
| Ion originated 10+ years ago from the Amazon catalog team - the
| team that kept data about the hundreds of millions of items
| available on Amazon. Nearly every team in the company called
| the catalog to get information about items all the time -
| scanning the entire catalog, parts of the catalog, millions of
| individual item lookups every second, etc.
|
| They did the math and some very large percentage of network
| traffic in Amazon Retail's data centers was catalog data. If
| that data, currently in XML or JSON format, was sent in a more
| compact format it would save some ridiculous millions of
| dollars every year. So Ion was born and eventually open
| sourced.
| jauntywundrkind wrote:
| Why do you single out avro & not any of the hundreds of other
| ser/de systems? Is that what you know best? Is there something
| specific about avro that makes it feel particularly similar?
| https://github.com/maximveksler/awesome-serialization
| mikece wrote:
| I work in a shop where Kafka and Avro are everywhere. If I
| worked somewhere else I might reference something else if it
| were front of mind all the time.
| PaulHoule wrote:
| People keep inventing new ones because the old ones suck or
| they think the old ones suck. Look at all the discontent
| around JSON (no comments!); people react violently when
| someone tries to add a little extra, like JSON-LD. Then there
| are all the things like YAML, TOML and such that try to be a
| little better but are widely thought to be a little worse.
| (And that's just the human-readable data formats.) Then there
| is always
|
| https://en.wikipedia.org/wiki/ASN.1
|
| which is forgotten but not gone.
| abeppu wrote:
| In fairness, in the list you link to,
|
| - Avro and Ion are the only two that are labeled
| Textual/Binary
|
| - They are in the same Big Data grouping
|
| - They both are schema-embedded, and support some rich nested
| data structures, though they deviate on many of the specifics
|
| So I think it's reasonable to pick out Avro as an especially
| similar point of comparison.
| mdaniel wrote:
| On the one hand, this seems more real than a lot of
| promotion-ware I've seen:
| https://github.com/amazon-ion/ion-intellij-plugin#readme
|
| On the other hand, they're not using this for the boto schemas,
| which seems like a natural place to show that it's able to
| capture real-world schemas, so that makes it hard for me to
| think this has any traction.
| jesterpm wrote:
| Ion is heavily used on the retail side of Amazon, but it's only
| recently started to appear in AWS products.
|
| AWS is starting to support PartiQL (https://partiql.org/) queries
| in some places and PartiQL uses Ion's type system internally.
| kevan wrote:
| The SDKs use Smithy[1], which is tailored for
| defining and generating services and SDKs; Ion is more of a pure
| data serialization format. It's definitely niche but my org
| uses it in a few places and it has some nice properties that
| fit our case pretty well (rapidly evolving schema, most clients
| only care about a small subset of attributes, ability to apply
| multiple and different schemas based on regions or businesses).
|
| It's the sort of thing where I'd advise exploring other options
| first and only using it if the whys[2] really resonate with you
| because it definitely comes with some overhead.
|
| [1] https://smithy.io/2.0/index.html
| [2] https://amazon-ion.github.io/ion-docs/guides/why.html
| glonq wrote:
| What are the pros and cons of this versus CBOR, which we've had
| great success with in our system?
|
| https://cbor.io
| leonardspeiser wrote:
| Pros of Ion vs CBOR:
|
| - Wider range of data types: Ion has native decimals, symbols,
| blobs, and clobs. CBOR has byte strings (the equivalent of
| blobs) but no symbol or clob type, and it expresses decimals
| only through a tag.
| - Optional schemas and annotations: Ion allows attaching
| type/schema information to data for validation purposes. CBOR
| itself has no schema mechanism; CDDL (RFC 8610) is a separate
| specification.
| - Text format: Ion provides a human-readable text format for
| data interchange; CBOR is binary only.
| - Maturity: Ion has been used in production at Amazon since
| 2009; CBOR is a newer standard (RFC 7049, published in 2013).
| - Language support: Ion has a more mature library ecosystem
| than CBOR, which is still gaining adoption.
|
| Pros of CBOR vs Ion:
|
| - Standardized: CBOR is an IETF standard; Ion is an open-source
| format defined by Amazon rather than by a standards body.
| - Simplicity: CBOR has a smaller set of basic data types, making
| it simpler to implement.
| - Used in other standards: CBOR is used in data formats like
| COSE for crypto operations and CWT for web tokens.
| - Efficiency: the CBOR binary format can have a smaller encoding
| size than Ion's.
| - JSON interoperability: CBOR is designed to be a JSON-compatible
| binary format. Ion is JSON-like but not fully compatible.
|
| In summary, Ion has richer data typing and schema capabilities
| and a long production history. But CBOR is simpler,
| standardized, and gaining momentum - especially in crypto and
| web standards using it as a binary encoding basis.
|
| So Ion may be better for applications dealing with complex,
| annotated data. But CBOR has advantages for an efficient binary
| interchange format, particularly when standards compatibility
| is important.
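To show why CBOR encodings stay small, here is a minimal Python sketch of CBOR's head encoding as described in RFC 8949, section 3. Each item starts with one byte holding a 3-bit major type and a 5-bit length or value, and extra length bytes follow only when needed. The sketch covers a JSON-like subset only: there are no tags, no indefinite lengths, and floats are always written as doubles. A conforming encoder's preferred serialization could shorten the floats. The sketch does not measure Ion's binary size, which would require an Ion library.

```python
import json
import struct


def _head(major: int, n: int) -> bytes:
    """Initial byte: 3-bit major type plus 5-bit additional info (RFC 8949, section 3)."""
    if n < 24:
        return bytes([(major << 5) | n])
    if n < 0x100:
        return bytes([(major << 5) | 24, n])
    if n < 0x10000:
        return bytes([(major << 5) | 25]) + struct.pack(">H", n)
    if n < 0x100000000:
        return bytes([(major << 5) | 26]) + struct.pack(">I", n)
    if n < 0x10000000000000000:
        return bytes([(major << 5) | 27]) + struct.pack(">Q", n)
    raise OverflowError("bignums need tags, which this sketch omits")


def encode(value) -> bytes:
    """Encode a JSON-like value as CBOR (subset: no tags, no indefinite lengths)."""
    if value is None:
        return b"\xf6"
    if value is True:  # checked before int, since bool is a subclass of int
        return b"\xf5"
    if value is False:
        return b"\xf4"
    if isinstance(value, int):
        # Major type 0 for n >= 0; major type 1 stores -1 - n for negatives.
        return _head(0, value) if value >= 0 else _head(1, -1 - value)
    if isinstance(value, float):
        return b"\xfb" + struct.pack(">d", value)  # always double precision here
    if isinstance(value, bytes):
        return _head(2, len(value)) + value
    if isinstance(value, str):
        data = value.encode("utf-8")
        return _head(3, len(data)) + data
    if isinstance(value, list):
        return _head(4, len(value)) + b"".join(encode(v) for v in value)
    if isinstance(value, dict):
        return _head(5, len(value)) + b"".join(encode(k) + encode(v) for k, v in value.items())
    raise TypeError(f"unsupported type: {type(value).__name__}")


doc = {"a": 1, "b": [2, 3]}
print(encode(doc).hex())  # a26161016162820203
print(len(encode(doc)), len(json.dumps(doc, separators=(",", ":"))))  # 9 17
```

For this small document, the compact JSON text is 17 bytes and the CBOR encoding is 9.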
| tombert wrote:
| I would be interested to see how this compares to something like
| msgpack [1] in performance and final size of the binary. Msgpack
| has been my go-to for binary serialization for years due to how
| simple and fast it is, and how easy it is to make it work with
| native Clojure data structures.
|
| [1] https://msgpack.org/index.html
| jesterpm wrote:
| That comparison would depend heavily on what you're storing.
|
| Ion has the option of using symbol tables to replace strings
| (e.g. in struct/map keys or in values). So, if your benchmark
| had a large number of records with similar structures, I would
| expect Ion to pull ahead. On the other hand, if each record had
| nothing in common, I'd expect them to perform similarly.
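The following is a toy model of the symbol-table idea. Keys repeated across many records are sent once in a shared table and then referenced by small integer IDs. It uses JSON so the sizes are easy to compare, and it is not how Ion actually lays out bytes. The function names and the sample records are hypothetical.

```python
import json


def intern_keys(records: list[dict]) -> tuple[list[str], list[dict]]:
    """Replace struct keys with integer IDs from one shared symbol table.

    A simplified stand-in for Ion's symbol tables, not Ion's binary layout.
    """
    ids: dict[str, int] = {}
    interned = []
    for record in records:
        interned.append({ids.setdefault(key, len(ids)): value for key, value in record.items()})
    table = list(ids)  # IDs are assigned in insertion order, so index == ID
    return table, interned


def restore(table: list[str], interned: list[dict]) -> list[dict]:
    """Map integer IDs back to the original key strings."""
    return [{table[sid]: value for sid, value in record.items()} for record in interned]


records = [{"productTitle": f"item {i}", "productCategory": "books"} for i in range(1000)]
table, interned = intern_keys(records)

plain_size = len(json.dumps(records))
interned_size = len(json.dumps(table)) + len(json.dumps(interned))
print(table)                       # ['productTitle', 'productCategory']
print(interned_size < plain_size)  # True
```

If each record had different keys, the table would grow with every record and the savings would disappear, which matches the point above.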
|
| One feature of the Ion libraries that I've liked is the parser
| will take any of the formats and figure out what to do with it
| (text, binary, compressed binary). It's one less thing to worry
| about. You can switch encodings later without breaking
| consumers, you can write plain text Ion when you're testing,
| etc.
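The thread does not say how the Ion libraries detect their input, so the following is only a hypothetical sketch of format sniffing. It relies on two facts: Ion 1.0 binary data starts with the binary version marker `E0 01 00 EA`, and gzip streams start with `1F 8B`. The `sniff` function and its return labels are made up for illustration.

```python
import gzip

ION_BVM = b"\xe0\x01\x00\xea"  # Ion 1.0 binary version marker
GZIP_MAGIC = b"\x1f\x8b"       # first two bytes of any gzip stream


def sniff(data: bytes) -> str:
    """Guess the encoding of an Ion payload from its leading bytes."""
    if data.startswith(GZIP_MAGIC):
        # Decompress and classify whatever is inside.
        return "gzip+" + sniff(gzip.decompress(data))
    if data.startswith(ION_BVM):
        return "binary"
    return "text"  # anything else is treated as Ion text


print(sniff(b'{name: "widget", tags: [a, b]}'))  # text
print(sniff(ION_BVM + b"\x0f"))                  # binary (0x0F encodes null.null)
print(sniff(gzip.compress(ION_BVM + b"\x0f")))   # gzip+binary
```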
| plq wrote:
| Symbol tables, compression, etc seem one level of abstraction
| above what msgpack provides. Such features could be
| implemented on top of vanilla msgpack as long as all parties
| agree on the msgpack schema.
| news_to_me wrote:
| Saw this the other day, but the multiple types of null kind of
| turned me off - e.g. `null.int`, `null.float`, `null.null`. Is
| there a good justification for this? Seems like a kluge in any
| case.
| steveBK123 wrote:
| typed nulls good
| throwbadubadu wrote:
| Sounds like the grug brained developer speaking... Hi! ;)
| spaceywilly wrote:
| Seems like the justification would be to keep the type
| information when going back and forth to Ion. More like
| "multiple nullable types" instead of "multiple types of null"
|
| userBirthDay: null <-- ok, but what type is it? String? Int?
| Timestamp?
|
| userBirthDay: null.timestamp <-- ok, it's a timestamp typed
| variable, but we don't know the value. Yay, happy programmer.
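The following sketch shows a null that keeps its type even though it has no value. The set of type names follows the typed nulls in the Ion spec, from `null.null` through `null.sexp`. `TypedNull` and `parse_null` are hypothetical names, not an Ion library API.

```python
from dataclasses import dataclass
from typing import Optional

# Ion types that can appear after "null." (null.null is the same as plain null).
ION_TYPES = {
    "null", "bool", "int", "float", "decimal", "timestamp", "string",
    "symbol", "blob", "clob", "struct", "list", "sexp",
}


@dataclass(frozen=True)
class TypedNull:
    ion_type: str


def parse_null(token: str) -> Optional[TypedNull]:
    """Parse an Ion text null token; return None if the token is not a null."""
    if token == "null":
        return TypedNull("null")  # bare null is equivalent to null.null
    prefix, dot, type_name = token.partition(".")
    if prefix == "null" and dot and type_name in ION_TYPES:
        return TypedNull(type_name)
    return None


birthday = parse_null("null.timestamp")
print(birthday)  # TypedNull(ion_type='timestamp')
```

With this representation, a consumer reading `userBirthDay` can still treat the field as a timestamp column even when the value is missing.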
| zyang wrote:
| My guess is buffer size calculations.
___________________________________________________________________
(page generated 2023-09-13 23:01 UTC)