[HN Gopher] Amazon Ion - A richly-typed, self-describing, hierar...
___________________________________________________________________
Amazon Ion - A richly-typed, self-describing, hierarchical
serialization format
Author : gjvc
Score : 367 points
Date : 2021-11-20 00:28 UTC (22 hours ago)
(HTM) web link (amzn.github.io)
(TXT) w3m dump (amzn.github.io)
| kats wrote:
| It's just another format, no better than any of the others.
| 95th wrote:
| Reminds me of the bencoding used in torrents.
| silvestrov wrote:
| This is what JSON should have been extended to.
|
| But Douglas Crockford just doesn't want to innovate anything,
| just like Gruber didn't want to make a proper specification of
| the Markdown format.
|
| Sometimes people hold innovation back. Fortunately this
| did not happen with HTML.
|
| The main thing missing from the text format is a magic number
| and a version number. At least the binary format has them.
| kwertyoowiyop wrote:
| The dominance of JSON shows that Crockford made some good
| decisions, even though we may not agree with them on any given
| day.
| seanclayton wrote:
| The dominance of JSON just shows that JS is dominant.
| usrusr wrote:
| Even JS stopped parsing JSON as a subset of JS a long time
| ago. JSON's lineage has been irrelevant in terms of
| popularity transmission ever since people stopped doing
| var jsonobj = eval(jsonstring);
| indymike wrote:
| > The dominance of JSON just shows that JS is dominant.
|
| I don't know that that's the case... I've used JSON in lots of
| non-JS languages because it just works, and errors are rarely
| caused by mismatches in how JSON behaves in language X
| and language Y. A lot of that is because it is simple and
| rigid.
| chromatin wrote:
| Check out Ilya Yaroshenko's Ion library for D, part of the larger
| 'mir' library:
|
| http://mir-ion.libmir.org/
|
| https://github.com/libmir/mir-ion
| hatf0 wrote:
| Weird to see the library I work on show up on HN --- Mir Ion is
| a pretty complicated library (and admittedly our documentation
| needs work -- I'm working on that!), but I'm very proud of our
| work.
|
| Some fun things about Mir Ion:
|
| - We can fully deserialize Ion at compile-time (via D's CTFE
| functionality)
|
| - We're one of the fastest JSON parsing libraries (and one of
| the most memory efficient too -- we actually store all JSON
| data in memory as Ion data, which is vastly more efficient)
|
| - We're nearly 100% compliant with all of the upstream test
| cases (our main issue is that we're often _too_ lax on the
| spec, and allow invalid files through)
|
| - The entire library is (nearly) all `@nogc`, thanks to the Mir
| standard library
|
| If anyone has any questions on Mir Ion, feel free to shoot me a
| line at harrison (at) 0xcc.pw
| CyanLite4 wrote:
| How does this compare with MsgPack?
| tootie wrote:
| No schema validation?
| landonxjames wrote:
| I believe that is provided by the Ion Schema Language
| https://amzn.github.io/ion-schema/docs/spec.html
| programd wrote:
| "Zero and negative dates are not valid, so the earliest instant
| in time that can be represented as a timestamp is Jan 01, 0001"
|
| That seems to be...a problem? How do you deal with archeological
| dates, of which there are many, in Ion?
| pdpi wrote:
| That's an interesting question. On the one hand, it feels weird
| that you can't represent those dates at all.
|
| On the other hand, representability of a given date becomes
| progressively less useful the further back in time you go, and
| stuff becomes really gnarly once you go back past the
| introduction of the Julian calendar in 45 BC.
|
| Also, simplifying to "no dates before Jan 1 0001" has very
| little impact on applications dealing with the modern-ish world
| (with "modern" generously defined as "anything after the
| collapse of the Roman Empire"), and I can only assume
| applications dealing with earlier times could do with a more
| specialised representation for dates anyway.
| biztos wrote:
| Just to give one example, in Thailand right now it's the year
| 2564.
|
| 1 BC for some is not "-1" for everyone.
| elteto wrote:
| What modern tech service (of the kind that would have use for
| Ion) is dealing with archaeological dates _at scale_? Honest
| question.
| rjzzleep wrote:
| I feel like a lot of file formats came out of companies, but even
| protocol buffers isn't calling itself google protocol buffers.
| What is it with modern companies putting their name everywhere
| they can?
| rp1 wrote:
| Grpc?
| tjpnz wrote:
| The g is (allegedly) not for Google.
| rp1 wrote:
| What's it for?
| [deleted]
| moltenguardian wrote:
| https://grpc.github.io/grpc/core/md_doc_g_stands_for.html
| Rebelgecko wrote:
| gRPC
| travisd wrote:
| > What does gRPC stand for?
|
| > gRPC Remote Procedure Calls, of course!
|
| https://grpc.io/docs/what-is-grpc/faq/
| SavantIdiot wrote:
| It's funny, I didn't realize protobuf was a Google thing for a
| long time because of that. At least `protobuf` is a reasonably-
| specific search term. `ion` returns too much noise. Almost a
| good reason to name things weirder, like `iyon`. But then
| they'd get laughed at. EDIT: oh, it's a Tagalog name too, and a
| light company.
| jsnell wrote:
| Disambiguation. There is one thing called protobufs. There are
| hundreds called "ion", a lot of which are more notable than an
| internal file format.
|
| Edit: I was going to paste in a relevant quote from Zarf (i.e.
| Andrew Plotkin) on naming. Some of his most important programs
| have total nonsense names like "glulx", and the reasoning was
| that at least it would be easy to search for when the name is
| unique. But ironically, "Zarf" is so common a term that I can't
| find the quote.
| ix101 wrote:
| Natural file extension will be .ai despite having no relation
| to AI
| seniorsassycat wrote:
| I've seen .ion and .10n for text and binary Ion files. I
| think Amazon Ion is like Golang - the name is used to clarify
| meaning, not for branding.
| syspec wrote:
| Surprised I have not heard of this before. I'd love something
| to come along and give JSON a kick in the pants.
|
| I do think JSON is the de facto standard, and it really does
| get the job done, but for some more advanced uses something
| like this could really shine.
| petilon wrote:
| I like the fact that you can annotate objects as well, not just
| literals. So this is valid:
|
|     animal: Tiger:: { gender: 'F', weight: 450 }
|
| This solves the inheritance problem, i.e., if you have multiple
| subclasses how do you know which type to deserialize as?
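|
| A minimal sketch of dispatching on that annotation from Python,
| using the ion-python simpleion module (the ion_annotations
| attribute is from memory, so treat the details as approximate;
| Tiger here is a hypothetical application class):
|
|     import amazon.ion.simpleion as ion
|
|     value = ion.loads("Tiger:: {gender: 'F', weight: 450}")
|     # annotations may come back as symbol tokens, so
|     # normalize them to plain text before comparing
|     tags = [getattr(a, 'text', a) for a in value.ion_annotations]
|     if 'Tiger' in tags:
|         animal = Tiger(**value)  # hypothetical subclass dispatch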
| plandis wrote:
| I believe that this is exactly how the Jackson Ion serializer
| handles subtype polymorphism.
| quiffledwerg wrote:
| I just feel deeply disinclined towards supporting anything Amazon
| because they've developed a reputation as such a poor community
| member.
| setheron wrote:
| Wow I remember using Ion back at Amazon in 2012. I can't remember
| but I think the order data warehouse was using it ...
|
| I also remember using something that was akin to FaaS but
| wasn't called that. I could give them a JAR of some code that
| would execute on the Ion order data when it changed. Basically
| FaaS for an ETL pipeline...
|
| Crazy how ahead of the times some companies were.
| vineyardmike wrote:
| I wonder why it took 10+ years to share then?
| timdorr wrote:
| Actually, it only took them 4-ish years:
| https://amzn.github.io/ion-docs/news/2016/04/21/amazon-
| open-...
| User23 wrote:
| That was a golden age for Amazon engineering. I assume they're
| still great, but that stretch from 2004 to 2014 was some
| incredible advancement.
| dorianmariefr wrote:
| Pretty neat, but isn't it like *two* formats: one binary and one
| textual?
| OJFord wrote:
| Consider that binary, binary coded decimal, Gray code,
| hexadecimal, octal, etc. are all 'formats' expressing the same
| (numerical) idea.
|
| You can't say the same of, for example, YAML & JSON, since the
| former (if not the latter?) has constructs unrepresentable in
| the other.
|
| It's slightly confused because an application might 'serialise
| to' JSON or YAML or Ion equivalently - but really that's saying
| the application's data being serialised fits a model that's a
| subset of the intersection between those formats.
|
| You could call Ion two, but it's more than that in that it's
| also a promise that they're 1:1 (err, and onto if you like) -
| their intersection is their union.
| echelon wrote:
| One data model, two serializations of it.
| seniorsassycat wrote:
| Two representations of the same data structures.
|
| Ion text is like JSON; in fact, all JSON is valid Ion text. Ion
| text has comments, trailing commas, dates, and unquoted keys.
| It's a really good alternative to JSON, YAML, or TOML.
|
| Ion binary is compact and fast to parse. Values are length
| prefixed so the parser can skip over unneeded fields or
| structs, saving time parsing and memory allocated. Common
| string values, like struct keys and enum values, are given
| numeric ids and stored once in a header table.
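|
| You can poke at both encodings from Python with the ion-python
| simpleion module (a quick sketch; I haven't double-checked the
| exact dumps defaults, so treat the details as approximate):
|
|     import amazon.ion.simpleion as ion
|
|     value = ion.loads('{name: "Ion", tags: [a, b,], when: 2021-11-20T}')
|     text = ion.dumps(value, binary=False)  # Ion text, starts with $ion_1_0
|     blob = ion.dumps(value, binary=True)   # binary Ion, starts with the
|                                            # version marker E0 01 00 EA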
| ComputerGuru wrote:
| Do comments persist in binary serialization or is that a
| lossy one-way operation?
| the_girabbit wrote:
| Comments don't persist in binary. Like white space, they
| are explicitly not part of the data model.
| seniorsassycat wrote:
| I think the ion-java library includes an AST parser that
| preserves comments, but the Ion data model doesn't. The
| binary format cannot include comments.
|
| I think many text formats are missing libraries that edit
| documents in place, preserving formatting and comments.
| jscholes wrote:
| The latest format for Kindle eBooks, KFX, is based on this.
| loeg wrote:
| Yep:
| https://github.com/apprenticeharper/DeDRM_tools/blob/master/...
| yayitswei wrote:
| Reminds me of Clojure's transit.
| cmancini wrote:
| That was the first thing I thought of! Big fan of transit.
| Seems very similar.
| Zamicol wrote:
| Am I the only one that doesn't like base 64?
|
| Hex for when efficiency isn't paramount.
|
| Base 85 or BasE91 for when efficiency is more of a concern.
| http://base91.sourceforge.net/
| hackcasual wrote:
| You want to use hex whenever byte-aligned data is going to be
| compressed. Base64 breaks byte alignment (3 bytes become 4
| symbols), so the same bytes can encode to different symbol
| sequences depending on offset, which hurts compression.
| ralusek wrote:
| I understand the case for Base91, but why hex over Base64?
| Base64 for readability while sticking to a power-of-two
| alphabet, Base91 for maximum efficiency with printable ASCII.
| Zamicol wrote:
| Base 64 is good at nothing and bad at some things.
|
| - Hex is human readable, case insensitive, not that
| "inefficient", and always aligns to bytes.
|
| - Base 85 and basE91 are efficient.
|
| - Bitcoin uses Base58 because they thought base 64 was too
| unreadable for humans. Ethereum uses hex.
|
| - Base 256 (bytes) is efficient and the native language of
| computers.
|
| Base 64 is not efficient, not human readable, and not easy to
| encode.
|
| The biggest problem with base 64 is that base 64 is not base
| 64. Are you doing base 64 with padding? Are you doing base 64
| with URL safe characters or URL unsafe characters? Are you
| following the standard RFC 4648 bucket encoding, or are you
| using iterative divide by radix? I think a great place where
| the cracks show is JOSE, where for things like thumbprints
| there's a ton of conversion steps (UTF-8 key -> base 64 ->
| ASCII bytes -> digest (bytes) -> base 64 thumbprint).
|
| My personal advice: the 90% of projects considering base 64
| should just use hex or raw bytes. If you need human
| readability, use hex. Otherwise use binary.
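|
| To make the variant problem concrete - the standard and
| URL-safe alphabets from Python's standard library disagree on
| the very same bytes:
|
|     >>> import base64
|     >>> base64.b64encode(b'\xfb\xff')
|     b'+/8='
|     >>> base64.urlsafe_b64encode(b'\xfb\xff')
|     b'-_8='
|
| Same input, two incompatible encodings, and that's before you
| get to the padding question.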
| Aeolun wrote:
| I like base64 because it's the de-facto standard, and data size
| (in places where I'd use base64) isn't a main concern for me.
| stjohnswarts wrote:
| Yeah I get tired of reinvention of everything for tiny gains
| in size/performance.
| transfire wrote:
| `years::4`? I don't know. Why not `4::years`?
|
| Also, symbols converted to integers means the receiving end has
| to already know exactly what they are.
| re wrote:
| Putting annotations before values is likely to be more useful
| for streaming parsers than putting them after. Imagine the case
| where the annotation represents a class that you want to
| deserialize a large object into.
| sokoloff wrote:
| There is provision for encoding a local symbol table:
| https://amzn.github.io/ion-docs/docs/symbols.html#processing...
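|
| Roughly, per that page (a sketch; the system symbol table
| occupies $1 through $9, so the first local symbol is $10):
|
|     $ion_symbol_table::{
|       symbols: ["name", "weight"]
|     }
|     // from here on, $10 means "name" and $11 means "weight"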
| stevefan1999 wrote:
| How does that differ from the likes of MessagePack and CBOR?
| Eelongate wrote:
| Did anything ever become of the lispy language that was being
| built using Ion as its homoiconic syntax? I'm afraid I can't
| recall what it was called. Fusion maybe?
| kayamon wrote:
| Dunno about that one but if you like that sort of thing, check
| out Rebol.
| throwaway_sZntK wrote:
| Yeah, Fusion was the name. Last I heard, they discontinued it,
| saying essentially "If you really want a full Lisp, there's
| already Clojure." S-exps continued to be used in Ion for
| embedded 1-liners but they only supported a handful of
| operators, not a full language.
| garmaine wrote:
| I hope you get an answer, because this sounds very intriguing
| but google is failing me in finding any references to it.
| seniorsassycat wrote:
| Ion's text format is a nice JSON alternative, while its binary
| format is very compact and allows for efficient sparse parsing.
| Fields are prefixed with their length, so you can skip over
| unneeded fields or structs while only creating objects for the
| values you'll use.
| grouphugs wrote:
| fuck amazon, and fuck everyone that won't stop promoting them.
| nprateem wrote:
| Shame there's no PHP lib :(
| mpfundstein wrote:
| your chance
| jonwilsdon wrote:
| Disclosure: I manage the Ion and PartiQL teams at Amazon.
|
| If you want to create an issue for it (the best repo is
| probably the ion-docs one: https://github.com/amzn/ion-
| docs/issues) that will help to show us there is demand for it.
| Providing information on your use case helps us prioritize.
| clhodapp wrote:
| It's staggering to me that people keep making these "rich" data
| formats without sum types. At least to me, the "ors" are just as
| important as the "ands" in domain modeling. Apart from that,
| while you can always sort of fake it with a bunch of optional
| fields, I believe you need a native encoding for tagged unions
| if you want to avoid bloating your messages.
| spenczar5 wrote:
| Others have mentioned Protobuf and Capnproto's support. Avro
| has them too, they're called Union.
|
| It seems that sum types are the norm, actually.
| clhodapp wrote:
| Those do now but I _believe_ that all of them added support
| years after their initial versions
| spenczar5 wrote:
| I think you're incorrect:
|
| Avro had unions in version 1.0 [0], which is from 2012.
|
| Capnproto had unions back in 2013 [1]. That's from the v0.1
| days, or maybe even earlier.
|
| Protobuf has had oneof support for about 7 years. They were
| added in version 2.6.0, from 2014-08-15 [2]. That's still 6
| years after the initial public release in 2008, though, so
| this is maybe what you were thinking of? I don't know too
| many people who were using protobuf in those days outside
| of Google, though.
|
| ---
|
| [0] https://avro.apache.org/docs/1.0.0/spec.html#Unions
|
| [1] https://github.com/capnproto/capnproto/commit/eb8404a15
| 7e074...
|
| [2] https://github.com/protocolbuffers/protobuf/blob/master
| /CHAN...
| clhodapp wrote:
| Thanks for the references, friend!
|
| And yes, I definitely am primarily thinking of protobuf,
| as I struggled with this back in version 2.5. I had the
| (apparently mistaken) impression that Avro and Cap'n
| Proto (which I think actually first came out in this
| timeframe) were about on par.
| the_girabbit wrote:
| Genuine question--why would you need a sum type in a self-
| describing data format?
| valenterry wrote:
| Well, there are already sumtypes, just only specific builtin
| ones, not custom ones. E.g. booleans are sumtypes (true |
| false). Everything else that is nullable is also a sumtype
| (e.g. number | null).
|
| I think it should be pretty obvious how these are helpful and
| why they are needed, no?
| the_girabbit wrote:
| Yeah, but it's a schema-less, self-describing data format.
| It's not like a specific position in a data stream has a
| requirement to be a specific type.
|
| I can see why sum types would be useful in a schema or for
| the elements of a collection that is required to be
| homogeneous (ie. List<Foo|Bar>).
|
| For what use case would one use custom sum types in a
| schema-less data format?
| seniorsassycat wrote:
| Ion Schema is a type system that can validate Ion values, and
| it supports sum types.
|
| https://amzn.github.io/ion-schema/docs/spec.html#union
|
| The Ion data model doesn't describe a schema or type system.
| It's a data structure where values are of a known type. In the
| binary format values are preceded by a type id; in the text
| format the syntax declares the type - "" for string, {} for
| struct. The data model doesn't declare what types a value could
| have, only the type it does have.
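|
| For example, a union in ISL looks something like this (a sketch
| adapted from the spec linked above):
|
|     type::{
|       name: int_or_string,
|       one_of: [int, string],
|     }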
| dastbe wrote:
| data interchange formats try to encode as little backwards
| incompatible information as possible. in this case, it would be
| the restriction that something is a sum type when it could have
| multiple fields set in the future. another example is protobuf
| moving to all fields being optional by default.
|
| as for the wire format, a variant struct where you've only
| instantiated a single field will encode down to just about the
| minimum amount of information required.
| nly wrote:
| Avro went the opposite way to most and just makes the concept
| of an optional field implementable via a union with null
|
| Non union fields can even be upgraded to unions later
|
| Personally I find the protobuf "everything is optional!"
| behaviour fucking insane and awful to deal with, but it is
| true to the semantics of its underlying wire format.
| valenterry wrote:
| That's not a contradiction, though.
|
| One can always choose not to use (native) sumtypes if they
| are interested in extreme performance or compatibility.
|
| But logically speaking, it is _good_ that it's a restriction
| that a sumtype can't just turn into a multiple-fields type.
| Because while my software (as the consumer) might still be
| able to deserialize it, the assumption that only one field is
| set would be broken, and my logic would now potentially be
| broken too. Much better if that happens at deserialization
| time than later on, when I find out that my data is
| incorrect/corrupt.
| vlovich123 wrote:
| Have you looked at cap'n'proto. It does sum types in a very
| sane way.
| joshlemer wrote:
| Doesn't it trivially have "sum types" since it's just arbitrary
| self-describing data? i.e. nobody is stopping you from passing
| around objects in such a way:
|
| {a:1} {a:{b:2}} {a:4} {a:{b:4}}
|
| There's no static type layer over top of this, so it's
| inherently up to interpretation and whatever type system you
| want to use to describe this data, to be able to express that
| the values of `a` can be of type `number | {b: number}`
| valenterry wrote:
| > There's no static type layer over top of this
|
| Yeah, that's the problem. I mean, hey, why JSON? We could
| just use unstructured plaintext for everything, and then we'd
| be free to do anything. But obviously that has its own
| drawbacks.
|
| Having built-in support for sumtypes means better and more
| ergonomic support from libraries, it means there is one
| standard and not different ways to encode things and it also
| means better performance and tooling.
| joshlemer wrote:
| The point is that there's no reason to single out sum types
| here. Insofar as Ion/JSON has support for
| arrays/objects/strings/numbers, it has exactly the same
| support for sum types, as in the example I showed above.
| Here is a list of a "sum type" `string | number | object`:
|
|     [{}, "hi", 1, 2, 3, "yo", {a: "bc"}]
| valenterry wrote:
| No, that is not a sumtype, that's an array.
|
| In the same sense "1e-12" is not a number, it's a string.
| Yes, it's a string that encodes a number in a certain
| notation, but for all the tooling, the IDE, the libraries,
| etc. it will stay a string.
| joshlemer wrote:
| What I mean is, it is an array of a sumtype `number |
| string | object`. So precisely, you could call it a
| `list<number | string | object>`
| dunefox wrote:
| That's a list union[number, string, object] or list[Any],
| not a sum type, no? This:
|
|     data X = A | B
|     [A, B, ...]
|
| is a list containing a sum type: list[X]
| joshlemer wrote:
| There is no such thing in JSON or Ions as defining this
| "X" schema somewhere. So I may as well say that your
| [A,B,...] is a list[Any].
|
| Now, I wouldn't actually call it a list of Any; I would
| say you proved my point for me. Your example is
| functionally the same as mine. I would give this example:
|
|     [A, B, ...]
|
| and say that that is a list of sum types. You may say "no
| no no! Only now is it a list of sum types!":
|
|     data X = A | B
|     [A, B, ...]
|
| But my point is that there is no JSON/Ion equivalent of
| your `data X = A | B`. Everyone in this comment tree is
| confusing the data itself with out-of-band schema over
| that data. "Sum type" is nothing more than a fiction, or a
| schema. Saying that JSON/Ion doesn't support sum types is
| like saying JSON doesn't support a "NonNegativeInteger"
| type. Sure it does! Here are some: 1, 2, 3, 10. What
| tooling or type system you use outside of the data itself
| to enforce constraints on the data types is orthogonal to
| the data format itself.
| ImprobableTruth wrote:
| Sum types =/= union types. Sum types are also called
| 'tagged' or 'discriminated' unions because they have some
| way to discriminate between them. That is, if you have an
| element a of type A, a is _not_ part of the sum type A + B
| because it's missing a tag.
|
| [5, "hello", 3] has the type list (int ∪ string), not
| list (int + string). You _can_ emulate the latter by
| manually adding a tag, but native support is much
| preferable.
| joshlemer wrote:
| I know the differences between untagged and tagged
| unions, I'm trying to provide a minimal example without
| distracting details but sure we can talk about tagged
| unions. Here is a list of tagged unions, so I once again
| point out that sum types are "supported" in JSON/ions
| just as much as any other data type: [
| {tag: "a", foo: 1}, {tag: "b", bar: "hi", baz:
| 2}, {tag: "a", foo: 3}, {tag: "a",
| foo: 4}, {tag: "a", foo: 5}, {tag:
| "b", bar: "yo", baz: 6} ]
| quantumspandex wrote:
| His point was type support and a standard way of doing
| things. By your argument we would just need a string type
| to represent everything.
| yakkityyak wrote:
| You should look into https://cuelang.org
| vlovich123 wrote:
| Cap'n'proto has native sum types.
| jsolson wrote:
| Protobuf supports sum types in the higher-level generated
| descriptors and languages -- on the wire they're just encoded
| as, well... oneof a number of possible options.
| ricardobeat wrote:
| Which results in very painful inconsistencies when you're
| dealing with the same schema on different platforms.
| xyzzy_plugh wrote:
| Are you referring to different language
| implementations/runtimes? I don't follow your point about
| inconsistencies.
| [deleted]
| tlocke wrote:
| One problem with Ion is that it doesn't have a map type, but
| instead a struct type that allows duplicate keys. I created Zish
| https://github.com/tlocke/zish as a serialization format that
| addresses the shortcomings of JSON and Ion. Any comments /
| criticisms welcome.
| n8ta wrote:
| I recently implemented a similar (simpler) format
| https://baremessages.org/ in ruby.
|
| First thoughts are:
|
| ION pros:
|
| - easy to skip around while reading a file
| - no need to write a schema
| - backed by Amazon, so major langs will have impls
| - good date support
| - better concatenation, probably better suited to logging
|   than BARE
|
| ION cons:
|
| - what's the text format even for?
|
| BARE pros:
|
| - schemas keep things tightly versioned
| - smaller binaries (not self-describing like ION)
| - simpler to implement, so tons of devs have impl'ed it for
|   their favorite lang
| - better suited to small messages (think REST JSON API)
|
| BARE cons:
|
| - no skip read
| - no date support
|
| I might do an ion ruby implementation too, to really feel out the
| difference.
| ozzythecat wrote:
| Ion text is helpful so you can convert Ion binary to text for
| debugging.
| seniorsassycat wrote:
| Ion text is a good contender for JSON, YAML, and TOML use
| cases. It's also a good way to present the binary to humans.
| imiric wrote:
| > what's the text format even for?
|
| Configuration files?
|
| Not sure if that's an intended use case, but being more
| flexible than JSON and stricter than YAML seems ideal for
| configuration.
| the_girabbit wrote:
| Ion will be even better for (structured) logging if this
| proposal for templates ever happens.
| https://github.com/amzn/ion-docs/pull/104
|
| Looks like no one's even so much as commented on it in the last
| year, so it might have been abandoned.
| n8ta wrote:
| Ion is already a little too complex for my taste. It'd be a
| shame to see it go the same way as YAML, where it's so complex
| that most major implementations are not safely interoperable.
| jonwilsdon wrote:
| Disclosure: I manage the Ion and PartiQL teams at Amazon.
|
| This proposal hasn't been abandoned. We hope to post an
| update soon!
| sirk390 wrote:
| Timestamps and decimals are the two most useful additions
| compared to JSON. They would be nice to add to JSON if that is
| somehow possible.
| nly wrote:
| JSON numbers, just like all human readable formats, _are_
| decimal... it's not like binary double values are printed out
| into JSON in hex or base64.
|
| Sure, 99% of decoders convert them to and from binary doubles,
| but that's purely an implementation choice.
| indymike wrote:
| > JSON numbers, just like all human readable formats, are
| decimal...
|
| All JSON numbers are implemented as integers or floating
| point, and as a result have to be cast to a decimal (a
| decimal type is generally something that meets this
| specification: http://speleotrove.com/decimal/) when you
| import them.
|
| Decimal types differ from floating point types in three ways:
| they are exact, they take rounding rules into account, and
| they track precision. Decimal math is slower, can have greater
| precision, and is better suited to domains where exact
| precision is needed. Floating point is faster but less
| precise, so it's good for some scientific uses... or where
| perfect precision isn't important but speed is - say, 3D
| graphics.
|
| I've billed lots of hours over the years fixing code where a
| developer used floats where they should have used decimals.
| For example, if you are dealing with money, you probably want
| decimal. It's one of those problems like trying to parse
| email addresses with a regex or rolling your own crypto... it
| will kinda work until someone finds out it really doesn't
| (think accounting going "our numbers are off by random
| amounts, WTF?").
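|
| The classic demonstration, straight from a Python REPL:
|
|     >>> 0.1 + 0.2
|     0.30000000000000004
|     >>> from decimal import Decimal
|     >>> Decimal('0.1') + Decimal('0.2')
|     Decimal('0.3')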
| nly wrote:
| A binary double can hold any decimal value to 15 digits of
| precision, so as a _serialisation format_ it's a bit of a
| non-issue... you just need to convert to decimal and round
| appropriately before doing any arithmetic where it matters.
|
| And you're confusing JSON the format with typical
| implementations. Open a JSON file and you see _decimal_
| digits. There is no limit to the number of digits in the
| grammar. Parsing these digits and converting them to
| binary doubles, for example, is actually _slower_ than
| parsing them as decimals, because you have to do the latter
| anyway to accomplish the former. Almost all JSON libraries
| convert to binary (e.g. doubles) because of their
| ubiquitous hardware and software support... but some
| libraries like RapidJSON expose raw numeric strings out of
| the parser if you want to plug in a decimal library
| hirundo wrote:
| It seems like an odd choice to make the type "metadata" a prefix
| to the value, rather than a separate field. It feels like
| overloading. What's the advantage?
| re wrote:
| Not sure I understand exactly what "a separate field" would
| look like, but:
|
| 1. Considering that a goal of Ion is to be a strict superset of
| JSON, separate syntax ensures that any JSON value can be parsed
| without misinterpreting some field as an annotation--there are
| no reserved/"magic" field names.
|
| 2. Annotations can be applied to any type of value, not just
| objects, which are the only type that have fields.
| [deleted]
| indymike wrote:
| It tells you how to load the value and can be human readable
| for audit purposes. Example:
|
|     degrees::'celsius'::100
| wisty wrote:
| I scanned the docs, and can't see what happens if you alter your
| data schema. Anyone know?
| travisd wrote:
| Seems like you have to handle that yourself. The serialized
| data includes the type, so your app code might have to have
| logic a la "if type1: ... else: ..." after parsing it.
| wisty wrote:
| OK, so it's one of the more flexible ones (like those binary
| jsons) rather than something like protobuf. I guess that
| should have been obvious from "self-describing".
| otabdeveloper4 wrote:
| Nice! This thing is actually sane and thought through. A first
| for serialization formats. They're usually a shitshow.
|
| (Should have gone with 'rational' instead of 'decimal', though.
| Decimal will be too painful to implement across languages and
| implementations. Java bias?)
| sirk390 wrote:
| But decimals are way more useful, as they can represent
| currency amounts. It would be strange to show a currency
| amount like "3/4" or "11/12". Personally, the two datatypes I
| have always had to add manually to JSON are datetimes and
| decimals (from Python).
| otabdeveloper4 wrote:
| A currency amount is just a rational number with "1000000" as
| a denominator.
|
| This is the correct representation, and how Google or the
| blockchain do it.
| mgamache wrote:
| msgpack is near the top for speed and size... Readability is
| nice. Are there other advantages?
|
| https://msgpack.org/index.html
| AtlasBarfed wrote:
| This is JSON with relaxed Jackson parsing: quote-optional keys,
| comments - all doable with Jackson OOTB for years now.
| quda wrote:
| Another useless transfer data format. It will be forgotten
| within a year or two.
| hliyan wrote:
| This reminded me of a tight-packed binary format we used in the
| trading systems domain almost 20 years ago. Instead of including
| metadata/field names in each message, it had a central message
| dictionary that every client/server would first download a copy
| of. Messages had only type IDs, followed by binary-packed data
| in the correct field order.
| in the correct field order. Because of microsecond latency
| requirements, we even avoided the serialization/deserialization
| process by making the memory format of the message and the wire
| format one and the same. The message class contained the same
| buffer that you would send/store. The GetInt(fieldID) method of
| the class simply points to the right place in the buffer and does
| a cast to int. Application logs contained these messages, rather
| than plain text. There was a special reader to read logs.
| Messages were exchanged over raw TCP. They contained their own
| application layer sequence number so that streams could resume
| after disconnection.
|
| In that world, latencies were so low that the response to your
| order submission would land in your front-end before you'd had
| time to lift your finger off the enter key. I now work with web
| based systems. On days like this, I miss the old ways.
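|
| A minimal sketch of that zero-copy access pattern in Python
| (the layout and names here are hypothetical; the real system
| was C++):
|
|     import struct
|
|     class Message:
|         # fieldID -> (byte offset, struct format), as given by
|         # the downloaded message dictionary
|         FIELDS = {1: (0, '<i'), 2: (4, '<q')}
|
|         def __init__(self, buf: bytes):
|             self.buf = buf  # the wire bytes are the in-memory format
|
|         def get_int(self, field_id: int) -> int:
|             off, fmt = self.FIELDS[field_id]
|             # no deserialization step: read straight out of the buffer
|             return struct.unpack_from(fmt, self.buf, off)[0]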
| atlgator wrote:
| We did the same in high fidelity flight simulators for a lot
| less money I'm sure.
| lordnacho wrote:
| Same here, I wrote an exchange core that did this using SBE.
| Basically you don't serialize in the classical sense, because
| you're simply taking whatever bytes are at your pointer and
| using them as some natural type. The internals of the exchange
| also simply used the same layout, so there was minimal copying
| and interpreting. On the way out it was the same, all you had
| to do was mask a few fields that you didn't want everyone to
| see and ship it onto the network.
|
| Even an unoptimized version of this managed to get throughput
| in the 300K/s range.
|
| Somehow it's the endpoint of my journey into serialization.
| Basically, avoid it if you need to be super fast. For most
| things though, it's useful to have something that you can read
| by eye, so if you're not in that HFT bracket it might be nicer
| to just use JSON or whatever.
| secondcoming wrote:
| I assume you were using C++? I'm not sure what you describe is
| possible these days due to UB. At the very least just casting
| bytes received over the wire to a type is UB, so you
| technically need a memcpy() and hope that the compiler
| optimises it out.
| hliyan wrote:
| Yes, it was C++. I was unfamiliar with the acronym "UB" so
| did a Google search. Does it mean "Undefined Behavior"? If I
| remember correctly, primitive types other than strings are
| memcpy'd. GetStr basically returned a char* to the right
| place in the buffer.
| secondcoming wrote:
| Apologies, yes Undefined Behaviour
| sattoshi wrote:
| Apache Thrift works on the same principle of separating
| structure from data.
| mrlemke wrote:
| Very neat and similar to a project I am starting for packet
| radio. I went further with the dictionary concept so that it
| contains common data. This way, your message contains only a
| few dictionary "pointers" (integers in base 64). This makes it
| easier to fit messages in ASCII for 300 baud links.
| oandrew wrote:
| Interesting. Confluent Avro + Schema Registry + Kafka uses
| exactly the same approach - binary-serialized Avro datums are
| prefixed with a schema id which can be resolved via the Schema
| Registry.
| angstrom wrote:
| And to top it off you could fit the entire message into
| whatever the MTU of your network supported. Cap it at 1500
| bytes and subtract the overhead for the frame headers and you
| get an extremely tight TCP/IP sequence stream that buffers
| through 16MB without needing to boil the ocean for a compound
| command sequence.
|
| Having been in industry only 2 decades it amuses me how many
| times this gets rediscovered.
| kwertyoowiyop wrote:
| Every multiplayer game programmer from the 1990s agrees with
| you!
| hliyan wrote:
| That just reminded me of the most mysterious scaling issue I
| ever faced. We had a message to disseminate market data for
| multiple markets (e.g. IBM: 100/100.12 @ NYSE, 101/102 @
| NASDAQ etc.). The system performed admirably under load
| testing (think 50,000 messages per second). One day we
| onboarded a single new regional exchange and the whole market
| data load test collapsed. We searched high and low for days
| without success, until someone figured out that the new
| market addition had caused the market data message to exceed
| the Ethernet frame size for the first time. The problem was
| not at the application layer or the transport; it was
| data-link-layer fragmentation! Figuring that out felt like
| solving a murder mystery (I wasn't the one who figured it out,
| though).
| jkhdigital wrote:
| _Classic_ example of a leaky abstraction, and the principle
| that implementation details inevitably become undocumented
| API behavior.
| kabdib wrote:
| A lot of "transparent RPC" systems are like this. "It's
| just like a normal function call, it's _sooo_ convenient"
| . . . until it isn't, because it involves the network
| hardware and configuration, routing environment,
| firewalls, equipment failure . . .
| andylynch wrote:
| I've worked on systems like this too - the max packet
| size is very well documented. Then post trade it all gets
| turned into FIXML which somehow manages to be both more
| verbose and less readable.
| angstrom wrote:
| Yeah, that's part of the trick for large listing responses
| to be spread across frames. Usually with some indicator
| like a "more" flag, so the client can say "get me the next
| sequence" by requesting the next index in the listing with
| the prior btree index. People do this all the time with
| large databases and it's a very similar use case.
| vendiddy wrote:
| This was a fun back and forth to read!
| elcritch wrote:
| Ouch, that's rough. One nice bit of IPv6 is that it doesn't
| allow fragmentation. It's often much nicer to get no message
| or an error than subtly missing data.
| depereo wrote:
| IPv6 does allow fragmentation.
| elcritch wrote:
| Ah yeah, that's right. I'm just learning more of IPv6 and
| got it mixed up. It appears what I had in mind was about
| intermediate routers: "Unlike in IPv4, IPv6 routers
| (intermediate nodes) never fragment IPv6 packets."
| (Wikipedia). To the previous point, it looks like IPv6
| does require networks to deliver 1280-byte or smaller
| packets unfragmented.
| agumonkey wrote:
| Smells like engineering
| porker wrote:
| Fab story, thank you! I understood up to "Messages were
| exchanged over raw TCP. They contained their own application
| layer sequence number so that streams could resume after
| disconnection." Can you go into more details about how the
| sequence number and resuming after disconnection worked?
| mtrovo wrote:
| The server used a global sequence number for all messages it
| transmits. Clients are stateful, so they know exactly what the
| latest message they processed was, and they send that id when
| creating a new connection. This was very important, as a lot
| of the message types used delta values, one of the most
| important ones being the order book. So in order to apply a
| new message you had to make sure that your internal state was
| at the correct sequence id; failing to do so would make your
| state go bonkers, especially when you're talking about
| hundreds of messages being received per second. It sounds
| scary, but you had a special message type that would send you
| a snapshot of the expected state with the sequence id it
| corresponds to. So your error handling code would fetch one of
| these and then ask for all the messages newer than that.
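|
| In pseudo-Python the client loop looks something like this
| (all names hypothetical):
|
|     def on_message(state, msg):
|         if msg.seq == state.seq + 1:
|             state.apply_delta(msg)     # deltas only apply in exact order
|             state.seq = msg.seq
|         elif msg.seq > state.seq + 1:  # gap detected
|             snap = request_snapshot()  # snapshot carries its own seq id
|             state.load(snap)
|             for m in fetch_after(snap.seq):
|                 on_message(state, m)   # replay everything newer
|         # msg.seq <= state.seq: duplicate, ignore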
| hliyan wrote:
| This is exactly right. It was almost always deltas rather
| than snapshots. One of the downsides was that sometimes
| debugging an issue required replaying the entire market up
| to the point of the crash/bug.
| hliyan wrote:
| Pretty basic. The receiving process usually has an input
| thread that just puts the messages into a queue. Then a
| processing thread processes (maybe logic, maybe disk writes,
| maybe send) the messages and queues up periodic batch acks to
| the sender. The sender uses these acks to clear its own
| queue. The receiver persists the last acked sequence number,
| so that in case of a restart, it can tell upstream senders to
| restart sending messages from that point.
| mianos wrote:
| The number of times people have "invented" ASN.1 now is
| ridiculous.
| erenon wrote:
| We do something very similar in binlog:
| https://github.com/morganstanley/binlog
|
| Serialization is platform-dependent (to make it a simple memcpy
| most of the time), and the schema is sent up front (but can be
| updated later, with in-bound messages at will). See the User
| Guide (http://binlog.org/UserGuide.html) and the Internals
| (http://binlog.org/Internals.html) for more.
| ktzar wrote:
| Is it FIX messages?
| https://en.wikipedia.org/wiki/Financial_Information_eXchange
| It's a good idea, extensible (ranges available for banks to
| implement their own codes), and fast.
| nly wrote:
| Old school texty FIX is incredibly slow. FAST FIX is faster
| but not fun to use. Largely SBE has won adoption on the
| market data side, with huge platforms like Euronext (biggest
| in Europe) using it.
| mtrovo wrote:
| I stopped working in the area in the age of FAST FIX, which
| was extremely good for the time. Do you know how it differs
| from SBE?
| nly wrote:
| I guess I'm biased based on experience at the companies
| I've worked at but FAST never seemed to have good
| libraries or tooling
| hliyan wrote:
| It was a proprietary messaging middleware library. We
| actually found even FAST FIX slow.
| o_bender wrote:
| The FAST FIX protocol is terrible performance-wise; its
| format requires multiple branches when parsing every field.
| Even "high-performance" libraries like mFAST are slow: I
| recently helped a client optimize parsing for several
| messages and got an 8x speed improvement over mFAST (which is
| a big deal in the HFT space).
| makotobestgirl wrote:
| Sounds like Google's flatbuffers [0], which indexes directly
| into a byte buffer using the field size prefix.
|
| [0] https://google.github.io/flatbuffers/
| armchairhacker wrote:
| I don't understand why serialization formats that separate
| structure and content aren't more popular.
|
| Imagine a system where every message is a UID or DID
| (https://www.w3.org/TR/did-core/) followed by raw binary data.
| The UID completely describes the shape of the rest of the
| message. You can also transmit messages to define new UIDs:
| these messages' UID is a shared global UID that everyone knows
| about.
|
| Once a client learns a UID, messages are about as compact as
| possible. And the data defining UIDs can be much more
| descriptive than e.g. property names in JSON. You can send
| documentation and other excess data when defining the UID,
| because you don't have to worry about size, because you're only
| sending the UID once. And UIDs can reference other UIDs to
| reduce duplication.
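|
| A sketch of the framing (everything here is hypothetical, just
| to make the idea concrete):
|
|     import os
|
|     DEFINE_UID = bytes(16)  # well-known UID: "payload defines a new UID"
|
|     def frame(uid: bytes, payload: bytes) -> bytes:
|         return uid + payload  # a message is just UID || raw binary data
|
|     def define_schema(schema_doc: bytes) -> tuple[bytes, bytes]:
|         uid = os.urandom(16)  # the new schema's UID
|         # broadcast this once; afterwards messages carry only the
|         # 16-byte UID plus their raw payload
|         return uid, frame(DEFINE_UID, uid + schema_doc)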
| dboreham wrote:
| This is protocol buffers + a global type registry. I worked
| on such a system.
| pcarolan wrote:
| Is it public? Id love to learn more about it.
| jenny91 wrote:
| If you read the protobuf source, you can see a bunch of
| places where you can hook in custom type-fetching code,
| e.g. in the google.protobuf.Any type.
|
| After studying it a bit, I'm certain this is how it's
| used inside Google (might also be mentioned elsewhere).
|
| All you'd really need to do is compile all protos into
| a repository (you can spit out the binary descriptors
| from protoc), then fetch those and decode in the client.
|
| It'd actually be quite straightforward to set up.
| mtrovo wrote:
| I think the system OP is describing is a little bit more
| complex. You're not just describing message types; you also
| have message templates. A template declares a message type
| and a set of prefilled fields. You save data by sending just
| the subset of fields that are actually changing, which is a
| very good abstraction for market data. The template is
| hydrated in the protocol parsing layer, so your code only has
| to deal with the message types themselves.
| NavinF wrote:
| You just described protobufs and all its successors.
|
| See the "@0xdbb9ad1f14bf0b36" at the top of this capnproto
| file for example: https://capnproto.org/language.html
|
| It's a 64bit random number so it'll never have unintentional
| collisions.
|
| Also note that a capnp schema is natively represented as a
| capnp message. Pretty convenient for the "You can also
| transmit messages to define new UIDs" part of your scheme :)
| infogulch wrote:
| Interesting. Ids in particular are described here:
| https://capnproto.org/language.html#unique-ids
|
| I wonder if giving it a name based on the hash of the
| definition has been explored; like Unison [0] where all
| code is content addressable, but for just capnproto
| definitions. Is there a reason not to?
|
| [0]: https://www.unisonweb.org
| NavinF wrote:
| Capnp uses the name of your message, but not its full
| definition because that would make it impossible to
| extend protocols in a backwards compatible way. Without
| the ability to add new fields, making changes to your
| protocol would be impossible in large orgs.
| boxfire wrote:
| MD5 is a 128-bit random number no one ever thought would
| collide. 64 bits is peanuts, especially when message types
| are being defined dynamically.
| remram wrote:
| MD5 is safe against unintentional collisions.
| NavinF wrote:
| Dude that's why I said "unintentional collisions".
|
| Of course you can get intentional collisions. The
| security model here assumes that anyone that wants to
| know your message's ID can just ask.
|
| Did you know that the Internet Protocol uses a 4-bit
| header to specify the format (v4 or v6) of the rest of
| the message? They should have used 128 bits. What a bunch
| of fools.
| [deleted]
| garmaine wrote:
| > It's a 64bit random number so it'll never have
| unintentional collisions.
|
| It'll have unintentional collisions if you ever generate
| more than 4 billion of these random numbers. That's not
| inconceivable.
| logicchains wrote:
| >It'll have unintentional collisions if you ever generate
| more than 4 billion of these random numbers.
|
| If it's 64-bit, doesn't that mean you'd need to generate
| ~2^64 of those numbers to have a collision, not 2^32?
| tomerv wrote:
| If you generate randomly then, due to the birthday
| paradox, after generating sqrt(N) values you have a
| reasonable chance of collision.
|
| The birthday paradox is named after the non-intuitive fact
| that with just 32 people in a room you have a >50% chance
| of two people having a birthday on the same day of the year.
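|
| The usual approximation is 1 - e^(-n(n-1)/2N), which makes the
| 64-bit case easy to check:
|
|     import math
|
|     def p_collision(n, bits=64):
|         # probability that n random IDs from a space of
|         # 2**bits contain at least one collision
|         return 1 - math.exp(-n * (n - 1) / 2 / 2**bits)
|
|     print(p_collision(2**32))  # ~0.39 at about 4 billion IDs
|     print(p_collision(10**6))  # ~3e-8 at a million IDs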
| ratorx wrote:
| Does birthday paradox apply here? It's about any pair of
| people having the same birthday, whereas in this case you
| need someone else with a specific birthday.
|
| For example, if you generate 2 numbers and they are the
| same, but are different to the capnproto number, that's a
| collision but doesn't actually matter.
|
| EDIT: It does apply, I misunderstood what the number was
| being used for.
| elcritch wrote:
| It does apply, according to
| https://www.johndcook.com/blog/2017/01/10/probability-of-
| sec...
| ratorx wrote:
| You're right, I misunderstood what the magic number was
| being used for.
| lozenge wrote:
| But if my application only uses 100 schemas, I only care
| about a collision if it's with one of those 100.
| gpderetta wrote:
| You have a collision if any two schemas share the id, not
| if a specific schema collides with any of the others. So
| it is exactly like the birthday paradox.
| heavenlyblue wrote:
| Yeah, but that collision probably doesn't matter, because
| there are a bunch of other variables that need to come
| together for it to be an issue at all.
| gpderetta wrote:
| If the schema id is the message id, in principle it could
| be an issue, as the protocol on the wire would be
| ambiguous. Then again, you should be able to detect any
| collisions when you register a schema with the schema
| repo and deal with them at that time.
| [deleted]
| adwn wrote:
| > _32 people_
|
| Slight correction: only 23 people, actually. So in every
| second football ("soccer") game, you have two people on
| the field with the same birthday.
| doo_daa wrote:
| I think it's 23 people in a room. The canonical example
| is people on a football (soccer) pitch. With 11 per side
| plus the referee there's a 50% chance that two will share
| the same birthday.
| [deleted]
| remram wrote:
| When you reach the 4 billionth version of your protocol?
| kentonv wrote:
| All versions of the same protocol have the same ID. That
| is the point of IDs -- to link together different
| versions of the protocol.
| remram wrote:
| You're right! That makes collisions even less likely
| then.
| heavenlyblue wrote:
| I don't understand your maths here: how is generating
| 4 billion of them any different from generating 3
| billion, except for a slight rise in the probability?
| NavinF wrote:
| Yes it is. Message schemas are made by humans. Most of
| these messages will be extended in a backwards compatible
| manner over the life of a project rather than replaced
| entirely so their IDs don't change. That's kinda the
| point of protobufs and its successors.
|
| I've probably generated 100 IDs over my lifetime.
| garmaine wrote:
| Which puts it on the same order of magnitude as the
| number of people on the planet. If every person alive
| generated a schema (or if 1/100th of all people generate
| 100 IDs each like you) then we'd have a small number of
| collisions. More likely you'd get large numbers of schema
| like that if there's a widespread application of a
| protocol compiler that generates new schema
| programmatically, e.g. to achieve domain separation, and
| then is applied at scale. I'm not saying that's likely,
| just that it is not, as is claimed, _inconceivable_.
| kentonv wrote:
| It's only really a problem if you use the IDs in the same
| system. It's highly unlikely that you'd link 4B schemas
| into a single binary. And anyway, if you do have a
| conflict, you'll get a linker error.
|
| Cap'n Proto type IDs are not really intended to be used
| in any sort of global database where you look up types by
| ID. Luckily no one really wants to do that anyway. In
| practice you always have a more restricted set of schemas
| you're interested in for your particular project.
|
| (Plus if you actually created a global database, then
| you'd find out if there were any collisions...)
| heavenlyblue wrote:
| If you have 4 billion of them generated, there's roughly a
| 1-in-4-billion chance that the next one you generate is a
| duplicate.
|
| On top of that, you would not only need to generate the
| same ID, you would need to USE it in the same system,
| where it could have some semantics, for it to actually
| cause an error.
| nly wrote:
| Protobuf is a boring old tag-length-value format. It's
| kind of the worst of both worlds, because it has no type
| information encoded into it, meaning it's useless without
| the schema, while still having quite a bit of overhead.
|
| Cap'n Proto is more like a formalization of C structs, in
| that new fields are only added to the end. If memory
| serves, on the wire there is no tag, type or length info
| (for fixed-size field types), and everything is rooted at
| fixed offsets.
| kentonv wrote:
| Mostly right. Allow me to provide some wonky details.
|
| Protobuf uses tag-type-values, i.e. each field is encoded
| with a tag specifying the field number _and_ some basic
| type info before the value. The type info is only just
| enough information to be able to skip the field if you
| don't recognize it, e.g. it specifies "integer" vs.
| "byte blob". Some types (such as byte blob) also have a
| length, some (integer) do not. Nested messages are
| usually encoded as byte blobs with a length, but there's
| an alternate encoding where they have a start tag and an
| end tag instead ("start group" and "end group" are two of
| the basic types). On one hand, having a length for nested
| messages seems better because it means you can skip the
| message during deserialization if you aren't interested
| in it. On the other hand, it means that during
| serialization, you have to compute the length of the sub-
| message before actually serializing it, meaning the whole
| tree has to be traversed twice, which kind of sucks,
| especially when the message tree is larger than the L1/L2
| cache. Ironically, most Protobuf decoders don't actually
| support skipping parsing of nested messages so the length
| that was so expensive to compute ends up being largely
| unused. Yet, most decoders only support length-delimited
| nested messages and therefore that's what everyone has to
| produce. Whoops.
|
| Now on to Cap'n Proto. In a given Cap'n Proto "struct",
| there is a data section and a pointer section. Primitive
| types (integers, booleans, etc.) go into the data
| section. This is the part that looks like a C struct --
| fields are identified solely by their offset from the
| start of the data section. Since new fields can be added
| over time, if you're reading old data, you may find the
| data section is too small. So, any fields that are out-
| of-bounds must be assumed to have default values. Fields
| that have complex variable-width types, like strings or
| nested structs, go into the pointer section. Each pointer
| is 64 bits, but does not work like a native pointer. Half
| of the pointer specifies an _offset_ of the pointed-to
| object, relative to the location of the pointer. The
| other half contains... type information! The pointer
| encodes enough information for you to know the basic size
| and shape of the destination object -- just enough
| information to make a copy of it even if you don't know
| the schema. This turns out to be super-important in
| practice for proxy servers and such that need to pass
| messages through without necessarily knowing the details
| of the application schema.
|
| In short, both formats actually contain type information
| on the wire! But, not a full schema -- only the minimal
| information needed to deal with version skew and make
| copying possible without data loss.
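|
| For the protobuf half, the tag encoding is simple enough to
| sketch in Python (standard wire format; wire type 0 = varint,
| 2 = length-delimited):
|
|     def encode_varint(n: int) -> bytes:
|         out = bytearray()
|         while True:
|             b = n & 0x7F
|             n >>= 7
|             out.append(b | (0x80 if n else 0))
|             if not n:
|                 return bytes(out)
|
|     def encode_tag(field_number: int, wire_type: int) -> bytes:
|         # the "basic type info" lives in the low 3 bits
|         return encode_varint((field_number << 3) | wire_type)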
| nly wrote:
| I wouldn't call what protobuf encodes type information.
| If I recall all the group stuff is deprecated, so what's
| left basically boils down to 3 types: 32 bit values, 64
| bit values and length prefixed values, which covers
| strings and sub-messages. Without the schema you can't
| even distinguish strings from sub-objects, as they are
| both length prefixed as you described.
|
| Can you even distinguish floats and ints without a schema
| in protobufs? I don't remember.
|
| I really enjoy capnproto, flatbuffers and Avro and bounce
| between them depending on the task at hand.
| mtrovo wrote:
| Don't know if you're describing the original FIX itself with
| the TCP connection. In FAST FIX they got rid of the TCP
| connection: market data was sent over UDP using several
| parallel connections, data was reordered on the client side at
| consumption time, and a TCP connection was only used to recover
| data when a sequence gap was found.
| hliyan wrote:
| Actually, even FAST was too slow for us. This was a
| proprietary messaging middleware library. And this particular
| market data feed was the direct one into the matching engine
| itself. For the rest of the system, we used a sort of
| reliable multicast using UDP for the first transmission and
| TCP for missed messages. We initially tried out a
| Gossip/Epidemic protocol but that didn't work out too well.
| Aeolun wrote:
| > In that world, lantencies were so low that the response to
| your order submission would land in your front-end before
| you've had time to lift your finger off the enter key.
|
| If the order submission process depends on a manual press of
| the enter key (+/- 50ms), is there any point to that though?
| sodality2 wrote:
| It was probably used with high frequency trading so fully
| automated unless you happened to be testing it manually.
| hliyan wrote:
| Despite all the algorithms we employed, the concept of a
| manual trade never went away. Also, when the front-end was
| taken out of the equation, the latencies were in the
| microsecond range. 50ms would be excruciatingly slow for an
| algorithm.
| danachow wrote:
| OT, but keyboard latency can be and often is far below 50ms,
| more like 1ms. It seems to be a common misconception that
| debouncing mandates increased lag.
| formerly_proven wrote:
| That's because a lot of input hardware uses moronic
| debouncing.
| dan-robertson wrote:
| Is this a number that came from an actual benchmark or from
| some marketing material from a keyboard maker? I ask this
| because [1] finds latency (measured from touching the key
| to the usb packet arriving) of 15ms with the fastest
| keyboard and around 50ms with others, though apparently
| some manufacturers have since improved. Or are you talking
| about midi keyboards where I guess latency is more
| noticeable to users?
|
| [1] https://danluu.com/keyboard-latency/
| danachow wrote:
| From the countless review sites and small-time YouTube
| channels that test these things regularly.
|
| I think that post must be a few years out of date - and
| moreover, by its own admission, it hardly tests any
| "gaming" keyboards. There is a tremendous amount of
| competition in keyboards that has been building for the
| past 10 years.
|
| Input latency is now a marketing thing like horsepower,
| and there are reasonably reputable [1] places and
| countless small time YouTube reviewers that test these
| things. It's not like it is difficult to improve latency,
| and now that it is something that is competitively
| marketed it is delivered on.
|
| [1] https://www.rtings.com/keyboard/tests/latency
|
| Personally I think it's a bit ridiculous. This
| fetishization with minimizing latency to now sub-ms
| levels doesn't necessarily lead to better performance, as
| many top-level gamers do not use the lowest-latency
| keyboards. But that doesn't change the fact that modern
| mainstream gaming keyboards can hit a latency far below
| 50ms.
| dan-robertson wrote:
| The link I posted was 2017. The site you link gives quite
| different ratings. I assume partly it is different
| methodology (the site you link tries to account for key
| travel somehow and they do something with a display and
| try to account for display latency rather than using a
| logic analyzer), but I'm not really sure. For some
| keyboards in common:
|
| - apple magic keyboard (? vs 2017) 15ms vs 27ms
|
| - das keyboard (3 vs S professional/4 Professional) 25 vs
| 11/10ms
|
| - razer ornata (chroma vs chroma/chroma 2) 35 vs
| 11.4/10.1ms
|
| Interestingly it is not some simple uniform difference:
| the Apple keyboard does much worse in the rtings test,
| perhaps getting not much of a bonus from key travel
| compensation. But the das keyboard vs the razer that are
| 10ms apart on my link perform equally on rtings (but
| maybe I found the wrong model). I don't have a good
| explanation for that discrepancy.
| Aeolun wrote:
| I was thinking more of the time a human finger needs to
| push the button down.
| mendigou wrote:
| This is exactly how it's done for spacecraft telemetry and
| telecommand too, but in this case it's to save bytes rather
| than processing time.
|
| I also miss working on those systems.
| nly wrote:
| What you're describing is exactly what still takes place in
| trading platforms, although a few I've seen now use SBE for
| consistency's sake (it's very common on the market data side).
| FpUser wrote:
| I had exactly the same implementation, except that the type /
| version belonged to the whole message and would map to an
| appropriate binary buffer in memory. No real de/serialization
| was needed.
|
| I still use it in my UDP game servers, with an added packet id
| in case a message exceeds the max datagram length and has to
| be split.
| ericbarrett wrote:
| The one concern I'd have with this format is a length field
| getting corrupted in transit and causing an out-of-bounds
| memory access. The network protocols' checksums won't save
| you 100% of the time, especially if there's bad hardware in
| the loop. If every field is fixed length this is less of a
| concern, of course; you might get bad data but you won't get
| e.g. a string with length 64M.
| hliyan wrote:
| In our system, if the message didn't unpack properly, the
| application would send a retransmit request with that
| message's sequence number. But in practice, this scenario
| never occurred because TCP already did this for us.
| FpUser wrote:
| I do not remember it ever happening, but being semi-paranoid
| I had the length in 2 places - the beginning and the end of
| the message.
| amitport wrote:
| Well that's just like using C structs. The best serialization
| protocol :).
| nly wrote:
| Some finance software systems do that too. It tends to be a
| nightmare because people end up adding new message types just
| to add a single field
| cma wrote:
| > Ion supports comments.
|
| Thank god.. JSON for config files without comments is so awful.
| michalkrupa wrote:
| Yes, we are all still very excited about JSON. (Edit: and BSON)
| oandrew wrote:
| So basically it's Amazon's version of Apache Avro. Avro supports
| binary/JSON serialization, schema evolution, logical types (e.g.
| timestamp) and other cool stuff.
|
| https://avro.apache.org/docs/current/spec.html
| fnord77 wrote:
| I wanted to see what the differences are between Ion and Avro.
|
| Unlike Avro, Ion doesn't require a schema.
| whimsicalism wrote:
| ... or thrift ... or protobuf
|
| https://xkcd.com/927/
| joshka wrote:
| Avro didn't exist when Ion started development.
| jsnell wrote:
| Previous discussions:
|
| https://news.ycombinator.com/item?id=11546098
|
| https://news.ycombinator.com/item?id=23921610
| dang wrote:
| Thanks! Macroexpanded:
|
| _Amazon Ion_ - https://news.ycombinator.com/item?id=23921610 -
| July 2020 (110 comments)
|
| _Amazon open-sources Ion - a binary and text interchangable,
| typed JSON-superset_ -
| https://news.ycombinator.com/item?id=11546098 - April 2016 (163
| comments)
| throwoutway wrote:
| What do you use for the macroexpansion? There are a hundred
| odd tasks like this that I need to create macros for!
| dang wrote:
| I mean that metaphorically but I do have a bunch of
| keyboard shortcuts (in a browser extension) that make
| finding these, and formatting the comments, much faster.
| trinovantes wrote:
| I wonder what's the performance relative to native JSON parsers?
| jonwilsdon wrote:
| Disclosure: I manage the Ion and PartiQL teams at Amazon.
|
| We have done some work on performance comparisons with the ion-
| java-benchmark-cli tool (https://github.com/amzn/ion-java-
| benchmark-cli). Right now you can compare JSON serialized with
| Jackson and there is a pull request
| (https://github.com/amzn/ion-java-benchmark-cli/pull/27) for
| comparing against CBOR that should be merged soon.
|
| We are always happy to hear suggestions for what is useful in
| this area.
| seniorsassycat wrote:
| Parsing ion text should be similar to json, it has the same
| characteristics. All JSON is valid ion text so you can even
| parse JSON with an ION parser.
|
| The binary parser is much faster. All fields are length-
| prefixed so a parser doesn't have to scan forward for the next
| syntax element.
|
| The Ion parsers (lexers? not sure of the right vocab) I've
| worked with have a `JSON.parse` equivalent that returns a
| fully realized object - a Map, Array, Int, etc. - but they
| also have a streaming parser that yields value by value. You
| can skip over values you don't need, and step over or into
| structs without creating a Map or Array. That can be much
| faster.
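|
| The streaming shape is roughly this (names invented for
| illustration; check each library's docs for the real API):
|
|     reader = ion_reader(blob)  # hypothetical skip-capable reader
|     while (event := reader.next()) is not None:
|         if event.field_name != 'orders':
|             reader.skip()      # O(1) thanks to the length prefix
|         else:
|             process(reader.read_value())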
___________________________________________________________________
(page generated 2021-11-20 23:01 UTC)