[HN Gopher] What even is a JSON number?
___________________________________________________________________
What even is a JSON number?
Author : bterlson
Score : 50 points
Date : 2024-04-01 19:54 UTC (3 hours ago)
(HTM) web link (blog.trl.sn)
(TXT) w3m dump (blog.trl.sn)
| paulddraper wrote:
| > I-JSON messages SHOULD NOT include numbers that express greater
| magnitude or precision than an IEEE 754 double precision number
| provides
|
| I'm confused by this.
|
| What is the precision of 0.1, relative to IEEE 754?
|
| If I read it correctly, that statement is saying:
|
|     json_number_precision(json_number) <= ieee_754_precision
|
| ^ How do I calculate these values?
| bterlson wrote:
| I think the spec just means, assume IEEE 754. In the case of
| 0.1, which cannot be represented exactly, software should
| assume that `0.1` will be represented as
| `0.100000000000000005551115123126`. Depending on `0.1` being
| parsed as the exact value `0.1` is not widely interoperable.
| ebolyen wrote:
| Relatedly, what about integers like 9007199254740995? Is that a
| legal integer, since it rounds to 9007199254740996?
|
| It does seem unclear what it means to exceed precision (given
| rounding is such an expected part of the way we use these
| numbers). Magnitude feels easier as at least you definitely run
| out of bits in the exponent.
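|
| One way to make the question concrete is to ask whether a literal
| survives a trip through a double unchanged. The sketch below is
| only one possible reading of "exceeds precision", not something
| the I-JSON text defines; the helper name `survives_double` is
| made up for illustration.

```python
from decimal import Decimal


def survives_double(literal: str) -> bool:
    """Return True if the decimal literal is exactly representable as a float64."""
    as_double = float(literal)  # Python rounds to the nearest double
    # Decimal(float) expands the double to its exact decimal value,
    # so this comparison is exact on both sides.
    return Decimal(literal) == Decimal(as_double)


for literal in ["0.5", "0.1", "9007199254740995", "1e400"]:
    print(literal, survives_double(literal), repr(float(literal)))

# The exact value a double stores for 0.1:
print(Decimal(0.1))
```

| Under this reading, 0.1 and 9007199254740995 both "exceed
| precision" even though every parser accepts them, which is
| probably stricter than the spec intends.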
| EdSchouten wrote:
| I think the description for Go is inaccurate/incomplete. You can
| call this function to instruct the decoder to leave numbers in
| unparsed string form:
|
| https://pkg.go.dev/encoding/json#Decoder.UseNumber
|
| That allows you to capture/forward numbers without any loss of
| precision.
| bterlson wrote:
| I have added this note, thanks! In the blog I am mostly trying
| to show the behavior you get using the (maybe de facto) stdlib
| with its default configuration, but this is useful data to call
| out.
| timvdalen wrote:
| A little off topic, but fun to see that someone else has adopted
| that magical CSS theme! (https://css.winterveil.net/)
| ape4 wrote:
| Since JSON is so widely used it should be modified to support
| more types - MongoDB's Extended JSON supports all the BSON
| (Binary) types:
|
|     Array        Binary               Date
|     Decimal128   Document             Double
|     Int32        Int64                MaxKey
|     MinKey       ObjectId             Regular Expression
|     Timestamp
|
| https://www.mongodb.com/docs/manual/reference/mongodb-extend...
| bterlson wrote:
| JS is likely to get a hook to be able to handle
| serialization/deserialization of such values without swapping
| out the entire implementation[1]. Native support for these
| types, without additional code or configuration, would likely
| break the Internet badly, so is unlikely to happen
| unfortunately.
|
| 1: https://github.com/tc39/proposal-json-parse-with-source
| apantel wrote:
| Much more valuable than any such extension would be a way to
| annotate types and byte lengths of keys and values so that
| parsers could work more efficiently. I've spent a lot of time
| making a fast JSON parser in Java and the thing that makes it
| so hard is you don't know how many bytes anything is, or what
| type. It's hard to do better than naive byte-by-byte parsing.
| zilti wrote:
| Or maaaybe use XML for such cases
| frizlab wrote:
| It's missing Swift tests, but otherwise it's a great post.
| bterlson wrote:
| If you would like to contribute Swift tests, I would be happy
| to take it! You can send a PR into this document, updating the
| data tables and adding a code sample at the end:
| https://github.com/bterlson/blog/blob/main/content/blog/what....
| No need to test openapi-tools swift codegen unless you really
| want to!
| frizlab wrote:
| I'm having a lot on my plate currently, but I'm adding this
| to my TODO list!
| whalesalad wrote:
| I tend to end up encoding everything as an integer (multiply by
| 1000, 10000 etc) and then turn it back into a float/decimal on
| decode. For instance if I am building a system dealing with
| dollar amounts I will store cent amounts everywhere, communicate
| cent amounts over the wire, etc. then treat it as a presentation
| concern to render it as a dollar amount.
| eddd-ddde wrote:
| This is great as long as you always make clear which value is
| pre- or post-encoding. I remember one of my first production bugs
| was giving users 100 times the credit they actually bought.
| Oops.
| Supermancho wrote:
| I have tried to encode all non-trivial numbers as strings. If
| it's too big (or small), or if it's a float, I'll have to
| change my JSON schema. Bake the need to decode numbers into the
| transforms for consistency.
| jerf wrote:
| It's worth bearing in mind when you do that that the largest
| integer that is "generally safe" in JSON is 2^53-1, so if you
| scale by a factor of 10000 you're taking 13-14 more bits off
| that maximum. That leaves you about 2^40, or about a trillion,
| before you may start losing precision or seeing systems
| disagree about the decoded values. Whether that's a problem
| depends on your domain.
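|
| The arithmetic above can be checked directly. This is a small
| sketch assuming a scale factor of 10000; the variable names are
| only illustrative.

```python
import math

MAX_SAFE_INTEGER = 2**53 - 1  # largest integer that is "generally safe" in JSON
SCALE = 10_000                # fixed-point scale factor from the comment above

bits_for_scale = math.log2(SCALE)        # bits consumed by the scale factor
headroom = MAX_SAFE_INTEGER // SCALE     # largest unscaled whole value that still fits
headroom_bits = math.log2(headroom)

print(f"scale uses {bits_for_scale:.2f} bits")
print(f"headroom is {headroom:,} (about 2^{headroom_bits:.1f})")

# Past 2^53, adjacent integers start to collapse onto the same double.
print(float(2**53) == float(2**53 + 1))  # True
```

| So the limit is a bit under a trillion whole units at four
| decimal places of scale.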
| nedt wrote:
| So basically you use fixed-point numbers. Especially for currency
| that's a very good idea anyway, because of rounding errors,
| even more so in IEEE 754
| bterlson wrote:
| Pedantically, IEEE 754 defines decimal floating point formats
| (like decimal128) which are appropriate for representing
| currency. Representing currency in non-integer values in any
| of the binary floating point formats is indeed a recipe for
| disaster though.
| hobs wrote:
| I often store it as smaller than cents, because anything with
| division or a basket of summed parts with taxes can start to
| get funky if you round down (and some places have laws about
| that.)
| 01HNNWZ0MV43FF wrote:
| Makes sense for dollars, but for anything like graphics or
| physics I'd consider a power of two like 1,024 as the fixed-
| point factor instead.
|
| My intuition tells me that "x * 1000 / 1000 == x" might not be
| true for all numbers if you're using floats.
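|
| That intuition can be tested empirically. The sketch below counts
| how often `x * factor / factor != x` for random doubles in
| [0, 1); the function name, sample count, and seed are arbitrary
| choices for illustration.

```python
import random


def count_roundtrip_failures(factor: float, samples: int = 100_000, seed: int = 0) -> int:
    """Count random doubles x in [0, 1) for which x * factor / factor != x."""
    rng = random.Random(seed)
    failures = 0
    for _ in range(samples):
        x = rng.random()
        if x * factor / factor != x:
            failures += 1
    return failures


failures_1000 = count_roundtrip_failures(1000.0)
failures_1024 = count_roundtrip_failures(1024.0)
print("factor 1000:", failures_1000, "failures")
print("factor 1024:", failures_1024, "failures")
```

| Scaling by a power of two only changes the exponent, so it is
| exact as long as nothing overflows or underflows. Scaling by 1000
| rounds on the multiply, and a small fraction of values do not
| come back.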
| hn_throwaway_99 wrote:
| The problem with that (which I have seen in practice) is that
| you are essentially hard coding the maximum precision you will
| accept for every client that needs to interpret your JSON.
|
| For example, you say you store monetary amounts as cents. What
| if you needed to store US gas prices, which are normally priced
| in amounts ending in 9/10ths of a cent? If you want to keep
| your values as integers you need to change your precision,
| which will likely mess up a lot of your code.
| erik_seaberg wrote:
| It's weird that any parser that loses digits is tolerated. A
| parser that forces strings into uppercase US-ASCII never would
| be.
| msm_ wrote:
| That's true for every floating point number in every
| programming language you have ever used, though.
|
|     $ python3
|     Python 3.10.13 (main, Aug 24 2023, 12:59:26) [GCC 12.2.0] on linux
|     Type "help", "copyright", "credits" or "license" for more information.
|     >>> 100000.000000000017
|     100000.00000000001
| Izkata wrote:
| This is why Decimal exists:
|
|     Python 3.8.10 (default, Nov 22 2023, 10:22:35)
|     [GCC 9.4.0] on linux
|     Type "help", "copyright", "credits" or "license" for more information.
|     >>> from decimal import Decimal
|     >>> Decimal('100000.000000000017')
|     Decimal('100000.000000000017')
|
| For example:
|
|     >>> import json
|     >>> json.loads('{"a": 100000.000000000017}')
|     {'a': 100000.00000000001}
|     >>> json.loads('{"a": 100000.000000000017}', parse_float=Decimal)
|     {'a': Decimal('100000.000000000017')}
| cerved wrote:
| but serializing/deserializing decimal using the json module
| is futile
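|
| A short sketch of where the standard `json` module gets in the
| way: parsing into Decimal works, but `json.dumps` does not accept
| Decimal values, and the common `default=str` workaround emits a
| JSON string rather than a JSON number. The payload is the one
| from the example above.

```python
import json
from decimal import Decimal

payload = '{"a": 100000.000000000017}'
parsed = json.loads(payload, parse_float=Decimal)

# Serializing the parsed value back fails out of the box.
try:
    json.dumps(parsed)
    dumps_error = None
except TypeError as exc:
    dumps_error = exc
print(dumps_error)

# default=str keeps every digit, but the value is now a string on the wire.
as_string = json.dumps(parsed, default=str)
print(as_string)  # {"a": "100000.000000000017"}
```

| Whether a quoted value is acceptable depends on what the
| consumer of the JSON expects.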
| dhosek wrote:
| And not every programming language offers a Decimal type,
| and in most of those that do, there's usually a performance
| penalty associated with it, not to mention issues of
| interoperability and developer knowledge of its existence.
| For financial calculations, the usual approach is integers with
| an implicit decimal offset (e.g., US currency amounts being
| expressed in cents rather than dollars), while in other
| contexts the inherent inaccuracy of IEEE floating types will
| often turn out to be a non-issue. The biggest
| potential problem lies in treating values that act kind of
| like numbers and look like numbers as numbers, e.g., Dewey
| Decimal classification numbers or the topic in a Library of
| Congress classification.[1]
|
| 1. This is a bit on my mind lately as I discovered that
| LibraryThing's sort by LoC classification seems to be
| broken so I exported my library (discovering that they
| export as ISO8859-1 with no option for UTF-8) and wrote a
| custom sorter for LOC classification codes for use in
| finally arranging the books on my shelves after my move
| last year.
| enriquto wrote:
| > That's true for every floating point number in every
| programming language you have ever used, though.
|
| Alright, if "you" have only ever used python. In C, for
| example, we have hexadecimal floating point literals that
| represent all floats and doubles exactly (including
| infinities and nans that make the json parser fail
| miserably).
| legobmw99 wrote:
| If you use the same syntax as OP, C's parser will also
| round that literal. The existence of a hex literal for
| floats is something orthogonal
| codetrotter wrote:
| > we have hexadecimal floating point literals that
| represent all floats and doubles exactly
|
| How do you do that?
|
| A couple of resources I found, though I'm not sure they are
| about exactly what you mean:
|
| https://stackoverflow.com/questions/65480947/is-ieee-754-rep...
|
| https://gcc.gnu.org/onlinedocs/gcc/Hex-Floats.html
|
| Furthermore, what exactly do you mean by "all floats and
| doubles exactly"?
| enriquto wrote:
| Yes, I was talking about what is described in your
| resources. You can do this:
|
|     #include <stdio.h>
|
|     int main(void)
|     {
|         // define a floating-point literal in hex and print it in decimal
|         float x = 0x1p-8;          // x = 1.0/256
|         printf("x = %g\n", x);     // prints 0.00390625
|
|         // define a floating-point literal in decimal and print it in various ways
|         float y = 0.3;             // non-representable, rounded to closest float
|         printf("y = %g\n", y);     // 0.3 (the %g format does some heuristics)
|         printf("y = %.10f\n", y);  // 0.3000000119
|         printf("y = %.20f\n", y);  // 0.30000001192092895508
|         printf("y = %a\n", y);     // 0x1.333334p-2
|         return 0;
|     }
| codetrotter wrote:
| So for example if you make a variable that has the value
| parent commenter used
|
| 100000.000000000017
|
| And then you print it.
|
| Does it preserve the exact value?
| enriquto wrote:
| Your question is ambiguous for two different reasons.
| First, this value is not representable as a floating-
| point number, so there's no way that you can even store
| it in a float. Second, once you have a float variable,
| you can print it in many different ways. So, the answer
| to your question is, irremediably, "it depends what you
| mean by _exact value_ ".
|
| If you print your variable with the %a format, then YES,
| the exact value is preserved and there is no loss of
| information. The problem is that the literal that you
| wrote cannot be represented exactly. But this is hardly a
| fault of the floats. Ints have exactly the same problem:
|
|     int x = 2.5;  // x gets the value 2
|     int y = 7/3;  // same thing
| codetrotter wrote:
| So in other words, is it fair to say that this situation
| is not much different from what you get with Python?
| tpm wrote:
| https://0.30000000000000004.com/
|
| Although it would be good to move in the direction of using a
| BigDecimal equivalent by default when ingesting unknown data.
| kccqzy wrote:
| I'll add that for Haskell, the library everyone uses for JSON
| parses numbers into Scientific types with almost unlimited size
| and precision. I say almost unlimited because they use a decimal
| coefficient-and-exponent representation where the exponent is a
| 64-bit integer.
|
| The documentation is quite paranoid: if you are dealing with
| untrusted inputs, you could parse two JSON numbers from the
| untrusted source fine, and then performing an addition on them
| could cause your memory to fill up. Exciting new DoS vector.
|
| Of course in practice people end up parsing them into custom
| types with 64-bit integers, so this is only a problem if you are
| manipulating JSON directly which is very rare in Haskell.
| egwor wrote:
| I think the thing folk miss is when there's an error like divide
| by zero, or the calculation would return NaN. I feel like this is
| the main gap/concern with using JSON and it seems to be rarely
| discussed.
| olejorgenb wrote:
| Agreed, this can be a pain. Python by default serializes and
| deserializes the `NaN` literal, making you pay some cleanup cost
| once you need to interoperate with other systems. (Same for `Inf`.)
|
| Say what you want about NaN, but IEEE 754 is the de facto way of
| dealing with floating points in computers and even if NaNs and
| Infs are a bit "fringe" it's unfortunate that the most popular
| serialization format cannot represent these.
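|
| For reference, a minimal sketch of the Python behaviour described
| above, using only the standard `json` module. Python spells the
| infinity literal `Infinity`. The `reject_constant` helper is a
| made-up name showing one way to opt into strict parsing.

```python
import json
import math

# By default, dumps emits non-standard literals for NaN and infinity.
lenient_text = json.dumps({"x": float("nan"), "y": float("inf")})
print(lenient_text)  # {"x": NaN, "y": Infinity}

# ...and loads accepts them again.
decoded = json.loads(lenient_text)

# allow_nan=False makes serialization fail instead of emitting invalid JSON.
try:
    json.dumps({"x": float("nan")}, allow_nan=False)
    strict_dump_error = None
except ValueError as exc:
    strict_dump_error = exc


def reject_constant(name: str) -> float:
    """Hypothetical hook: refuse NaN / Infinity / -Infinity while parsing."""
    raise ValueError(f"non-standard JSON constant: {name}")


try:
    json.loads(lenient_text, parse_constant=reject_constant)
    strict_load_error = None
except ValueError as exc:
    strict_load_error = exc

print(strict_dump_error)
print(strict_load_error)
```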
| nigeltao wrote:
| When I wrote my jsonptr tool a few years ago, I noticed that some
| JSON libraries (in both C++ and Rust) don't even do "parse a
| string of decimal digits as a float64" properly. I don't mean
| that in the "0.3 isn't exactly representable; have
| 0.30000000000000004 instead" sense.
|
| I mean that rapidjson (C++) parsed the string
| "0.99999999999999999" as the number 1.0000000000000002. Apart
| from just looking weird, it's a different float64 bit-pattern:
| 0x3FF0000000000000 vs 0x3FF0000000000001.
|
| Similarly, serde-json (Rust) parsed "122.416294033786585" as
| 122.4162940337866. This isn't as obvious a difference, but the
| bit-patterns differ by one: 0x405E9AA48FBB2888 vs
| 0x405E9AA48FBB2889. Serde-json does have a "float_roundtrip"
| feature flag, but it's opt-in, not enabled by default.
|
| For details, look for "rapidjson issue #1773" and "serde_json
| issue #707" at https://nigeltao.github.io/blog/2020/jsonptr.html
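|
| To compare bit patterns like these yourself, here is a small
| sketch using Python's `struct` module. The helper names are
| invented for illustration, and this only shows what CPython's
| own parser does; it says nothing about rapidjson or serde_json.

```python
import struct


def float64_bits(x: float) -> str:
    """Return the IEEE 754 binary64 bit pattern of x as a hex string."""
    (as_int,) = struct.unpack(">Q", struct.pack(">d", x))
    return f"0x{as_int:016X}"


def float64_from_bits(bits: int) -> float:
    """Build a float from a 64-bit IEEE 754 bit pattern."""
    (x,) = struct.unpack(">d", struct.pack(">Q", bits))
    return x


parsed = float("0.99999999999999999")
print(parsed, float64_bits(parsed))           # 1.0 0x3FF0000000000000

# The neighbouring bit pattern is the next double above 1.0.
print(float64_from_bits(0x3FF0000000000001))  # 1.0000000000000002

# The second literal from the comment, for comparison with the patterns quoted there.
print(float64_bits(float("122.416294033786585")))
```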
| 01HNNWZ0MV43FF wrote:
| Oh wow. So serde_json doesn't roundtrip floats by default, it
| uses some imprecise faster algorithm:
| https://github.com/serde-rs/json/issues/707
|
| Good thing there's msgpack I guess.
| ctrw wrote:
| I still get a laugh out of ECMA-404. The first time I looked it up I
| refreshed the page a large number of times before I realized it
| wasn't an error.
| hinkley wrote:
| One of the first Ajax projects I worked on was multi tenant, and
| someone decided to solve the industrial espionage problem by
| using random 64 bit identifiers for all records in the system.
| You have about a .1% chance of generating an ID that gets
| truncated in JavaScript, which is just enough that you might make
| it past MVP before anyone figures out it's broken, and that's
| exactly what happened to us.
|
| So we had to go through all the code adding quotes to all the ID
| fields. That was a giant pain in my ass.
| bevekspldnw wrote:
| Try to get a DECIMAL value out of a Postgres database into a JSON
| API response and you'll learn all this and more in the most
| painful way possible!
___________________________________________________________________
(page generated 2024-04-01 23:00 UTC)