[HN Gopher] What even is a JSON number?
       ___________________________________________________________________
        
       What even is a JSON number?
        
       Author : bterlson
       Score  : 50 points
       Date   : 2024-04-01 19:54 UTC (3 hours ago)
        
 (HTM) web link (blog.trl.sn)
 (TXT) w3m dump (blog.trl.sn)
        
       | paulddraper wrote:
       | > I-JSON messages SHOULD NOT include numbers that express greater
       | magnitude or precision than an IEEE 754 double precision number
       | provides
       | 
       | I'm confused by this.
       | 
       | What is the precision of 0.1, relative to IEEE 754?
       | 
       | If I read it correctly, that statement is saying:
       | json_number_precision(json_number) <= ieee_754_precision
       | 
       | ^ How do I calculate these values?
        
         | bterlson wrote:
         | I think the spec just means, assume IEEE 754. In the case of
         | 0.1, which cannot be represented exactly, software should
         | assume that `0.1` will be represented as
         | `0.100000000000000005551115123126`. Depending on `0.1` being
         | parsed as the exact value `0.1` is not widely interoperable.
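To make that concrete, Python's `decimal` module can display the exact binary value that the literal `0.1` becomes after parsing (a sketch; any IEEE 754 double behaves identically):

```python
from decimal import Decimal

# Decimal(float) converts the double's *exact* binary value to decimal,
# revealing what a parser really stores for the literal 0.1:
print(Decimal(0.1))
# 0.1000000000000000055511151231257827021181583404541015625

# Any literal close enough to that value parses to the very same double:
assert 0.1 == 0.1000000000000000055511151231
```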
        
          | ebolyen wrote:
          | Relatedly, what about integers like 9007199254740995? Is
          | that a legal integer, since it rounds to 9007199254740996?
          | 
          | It does seem unclear what it means to exceed precision
          | (given that rounding is such an expected part of the way we
          | use these numbers). Magnitude feels easier, as at least you
          | definitely run out of bits in the exponent.
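The rounding described here can be checked directly (Python shown; the same holds for any IEEE 754 float64):

```python
import json

# 2**53 = 9007199254740992 is the last point below which every integer
# is exactly representable as a double. Above it, representable doubles
# are spaced 2 apart, so odd integers must round:
assert float(9007199254740993) == 9007199254740992.0  # ties-to-even, down
assert float(9007199254740995) == 9007199254740996.0  # ties-to-even, up

# Python's json parses integer literals as arbitrary-precision int, so it
# round-trips exactly; a JavaScript JSON.parse would yield the rounded double.
assert json.loads("9007199254740995") == 9007199254740995
```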
        
       | EdSchouten wrote:
       | I think the description for Go is inaccurate/incomplete. You can
       | call this function to instruct the decoder to leave numbers in
       | unparsed string form:
       | 
       | https://pkg.go.dev/encoding/json#Decoder.UseNumber
       | 
       | That allows you to capture/forward numbers without any loss of
       | precision.
        
          | bterlson wrote:
          | I have added this note, thanks! In the blog I am mostly
          | trying to show the behavior you get using the (maybe de
          | facto) stdlib with its default configuration, but this is
          | useful data to call out.
        
       | timvdalen wrote:
       | A little off topic, but fun to see that someone else has adopted
       | that magical CSS theme! (https://css.winterveil.net/)
        
        | ape4 wrote:
        | Since JSON is so widely used, it should be modified to support
        | more types. MongoDB's Extended JSON supports all the BSON
        | (Binary JSON) types:
        | 
        |     Array, Binary, Date, Decimal128, Document, Double, Int32,
        |     Int64, MaxKey, MinKey, ObjectId, Regular Expression,
        |     Timestamp
        | 
        | https://www.mongodb.com/docs/manual/reference/mongodb-extend...
        
          | bterlson wrote:
          | JS is likely to get a hook for handling serialization and
          | deserialization of such values without swapping out the
          | entire implementation.[1] Native support for these types,
          | without additional code or configuration, would likely break
          | the Internet badly, so it is unlikely to happen,
          | unfortunately.
         | 
         | 1: https://github.com/tc39/proposal-json-parse-with-source
        
          | apantel wrote:
          | Much more valuable than any such extension would be a way to
          | annotate the types and byte lengths of keys and values so
          | that parsers could work more efficiently. I've spent a lot
          | of time making a fast JSON parser in Java, and the thing
          | that makes it so hard is that you don't know how many bytes
          | anything is, or what type it is. It's hard to do better than
          | naive byte-by-byte parsing.
        
         | zilti wrote:
         | Or maaaybe use XML for such cases
        
       | frizlab wrote:
       | It's missing Swift tests, but otherwise it's a great post.
        
         | bterlson wrote:
         | If you would like to contribute Swift tests, I would be happy
         | to take it! You can send a PR into this document, updating the
         | data tables and adding a code sample at the end: https://github
         | .com/bterlson/blog/blob/main/content/blog/what.... No need to
         | test openapi-tools swift codegen unless you really want to!
        
           | frizlab wrote:
           | I'm having a lot on my plate currently, but I'm adding this
           | to my TODO list!
        
       | whalesalad wrote:
       | I tend to end up encoding everything as an integer (multiply by
       | 1000, 10000 etc) and then turn it back into a float/decimal on
       | decode. For instance if I am building a system dealing with
       | dollar amounts I will store cent amounts everywhere, communicate
       | cent amounts over the wire, etc. then treat it as a presentation
       | concern to render it as a dollar amount.
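A minimal sketch of that pattern in Python (the helper names `to_cents`/`from_cents` are mine, and amounts are assumed non-negative):

```python
from decimal import Decimal

def to_cents(amount: str) -> int:
    """Scale a decimal dollar string to integer cents, exactly."""
    # Going through Decimal keeps "19.99" away from binary floating point.
    return int(Decimal(amount) * 100)

def from_cents(cents: int) -> str:
    """Presentation-layer conversion back to a dollar string."""
    return f"{cents // 100}.{cents % 100:02d}"

assert to_cents("19.99") == 1999
assert from_cents(1999) == "19.99"
```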
        
          | eddd-ddde wrote:
          | This is great as long as you always make clear which values
          | are pre-encoding and which are post-encoding. I remember one
          | of my first production bugs was giving users 100 times the
          | credit they actually bought. Oops.
        
          | Supermancho wrote:
          | I have tried to encode all non-trivial numbers as strings.
          | If a value is too big (or too small), or if it's a float,
          | I'd otherwise have to change my JSON schema. Baking the need
          | to decode numbers into the transforms keeps things
          | consistent.
        
          | jerf wrote:
          | It's worth bearing in mind, when you do that, that the
          | largest integer that is "generally safe" in JSON is 2^53-1,
          | so if you scale by a factor of 10000 you're taking 13-14
          | more bits off that maximum. That leaves you about 2^40, or
          | about a trillion, before you may start losing precision or
          | seeing systems disagree about the decoded values. Whether
          | that's a problem depends on your domain.
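The headroom estimate above checks out (quick arithmetic in Python):

```python
MAX_SAFE = 2**53 - 1   # 9007199254740991, the largest "generally safe" JSON integer
SCALE = 10_000         # four decimal places of fixed-point precision

headroom = MAX_SAFE // SCALE
assert headroom == 900_719_925_474   # ~0.9 trillion whole units remain
assert 2**39 < headroom < 2**40      # i.e., scaling cost the ~13-14 bits mentioned
print(f"safe range after scaling: +/- {headroom:,}")
```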
        
          | nedt wrote:
          | So basically you use fixed-point numbers. Especially for
          | currency that's a very good idea anyway, because of rounding
          | errors, even more so in IEEE 754 binary floating point.
        
           | bterlson wrote:
           | Pedantically, IEEE 754 defines decimal floating point formats
           | (like decimal128) which are appropriate for representing
           | currency. Representing currency in non-integer values in any
           | of the binary floating point formats is indeed a recipe for
           | disaster though.
        
         | hobs wrote:
         | I often store it as smaller than cents, because anything with
         | division or a basket of summed parts with taxes can start to
         | get funky if you round down (and some places have laws about
         | that.)
        
         | 01HNNWZ0MV43FF wrote:
         | Makes sense for dollars, but for anything like graphics or
         | physics I'd consider a power of two like 1,024 as the fixed-
         | point factor instead.
         | 
         | My intuition tells me that "x * 1000 / 1000 == x" might not be
         | true for all numbers if you're using floats.
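That intuition is sound on both counts: scaling by a power of two only adjusts the double's exponent, so (barring overflow or underflow) it round-trips exactly, while a factor of 1000 involves two rounded operations with no such guarantee. A quick Python check of the power-of-two half:

```python
import random

random.seed(0)
for _ in range(10_000):
    x = random.uniform(-1e9, 1e9)
    # Multiplying/dividing by 2**10 only shifts the exponent field; no
    # significand bits are lost, so the round trip is exact:
    assert x * 1024 / 1024 == x
```

Scaling by 1000 instead means both the multiply and the divide can each round by half an ulp, so equality is merely likely, not guaranteed.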
        
         | hn_throwaway_99 wrote:
         | The problem with that (which I have seen in practice) is that
         | you are essentially hard coding the maximum precision you will
         | accept for every client that needs to interpret your JSON.
         | 
         | For example, you say you store monetary amounts as cents. What
         | if you needed to store US gas prices, which are normally priced
         | in amounts ending in 9/10ths of a cent? If you want to keep
         | your values as integers you need to change your precision,
         | which will likely mess up a lot of your code.
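One escape hatch is to pick the smallest unit up front: mills (tenths of a cent) instead of cents cover the gas-price case (a sketch; the unit choice and helper name are mine):

```python
from decimal import Decimal

MILLS_PER_DOLLAR = 1000  # 1 mill = $0.001, enough for 9/10-cent pricing

def dollars_to_mills(amount: str) -> int:
    # Exact for up to three decimal places; extra digits would truncate.
    return int(Decimal(amount) * MILLS_PER_DOLLAR)

assert dollars_to_mills("3.499") == 3499    # $3.49 9/10 per gallon
assert dollars_to_mills("19.99") == 19990   # plain cent amounts still fit
```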
        
       | erik_seaberg wrote:
       | It's weird that any parser that loses digits is tolerated. A
       | parser that forces strings into uppercase US-ASCII never would
       | be.
        
         | msm_ wrote:
         | That's true for every floating point number in every
         | programming language you have ever used, though.
          | 
          |     $ python3
          |     Python 3.10.13 (main, Aug 24 2023, 12:59:26) [GCC 12.2.0] on linux
          |     Type "help", "copyright", "credits" or "license" for more information.
          |     >>> 100000.000000000017
          |     100000.00000000001
        
            | Izkata wrote:
            | This is why Decimal exists:
            | 
            |     Python 3.8.10 (default, Nov 22 2023, 10:22:35)
            |     [GCC 9.4.0] on linux
            |     Type "help", "copyright", "credits" or "license" for more information.
            |     >>> from decimal import Decimal
            |     >>> Decimal('100000.000000000017')
            |     Decimal('100000.000000000017')
            | 
            | For example:
            | 
            |     >>> import json
            |     >>> json.loads('{"a": 100000.000000000017}')
            |     {'a': 100000.00000000001}
            |     >>> json.loads('{"a": 100000.000000000017}', parse_float=Decimal)
            |     {'a': Decimal('100000.000000000017')}
        
             | cerved wrote:
             | but serializing/deserializing decimal using the json module
             | is futile
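Right: the stdlib encoder has no Decimal support, so the output side needs its own hook (a sketch; `default=str` is a common workaround, at the price of emitting a JSON string rather than a number):

```python
import json
from decimal import Decimal

value = {"a": Decimal("100000.000000000017")}

# json.dumps rejects Decimal out of the box:
try:
    json.dumps(value)
except TypeError as exc:
    print(exc)  # Object of type Decimal is not JSON serializable

# A default= hook works, but the number arrives as a string on the wire:
assert json.dumps(value, default=str) == '{"a": "100000.000000000017"}'
```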
        
              | dhosek wrote:
              | And not every programming language offers a Decimal
              | type, and on most of those that do, there's usually a
              | performance penalty associated with it, not to mention
              | issues of interoperability and developer knowledge of
              | its existence. For financial calculations, using
              | integers with an implicit decimal offset (e.g., US
              | currency amounts being expressed in cents rather than
              | dollars) is usually sufficient, while other contexts
              | will often determine that the inherent inaccuracy of
              | IEEE floating types is a non-issue. The biggest
              | potential problem lies in treating values that act kind
              | of like numbers and look like numbers as numbers, e.g.,
              | Dewey Decimal classification numbers or the topic in a
              | Library of Congress classification.[1]
              | 
              | 1. This is a bit on my mind lately as I discovered that
              | LibraryThing's sort by LoC classification seems to be
              | broken, so I exported my library (discovering that they
              | export as ISO 8859-1 with no option for UTF-8) and wrote
              | a custom sorter for LoC classification codes for use in
              | finally arranging the books on my shelves after my move
              | last year.
        
           | enriquto wrote:
           | > That's true for every floating point number in every
           | programming language you have ever used, though.
           | 
            | Alright, if "you" have only ever used Python. In C, for
            | example, we have hexadecimal floating-point literals that
            | represent all floats and doubles exactly (including the
            | infinities and NaNs that make the JSON parser fail
            | miserably).
        
              | legobmw99 wrote:
              | If you use the same syntax as OP, C's parser will also
              | round that literal. The existence of a hex literal for
              | floats is orthogonal.
        
             | codetrotter wrote:
             | > we have hexadecimal floating point literals that
             | represent all floats and doubles exactly
             | 
             | How do you do that?
             | 
              | A couple of resources I found, though I'm not sure
              | whether they are about exactly what you speak of:
             | 
             | https://stackoverflow.com/questions/65480947/is-
             | ieee-754-rep...
             | 
             | https://gcc.gnu.org/onlinedocs/gcc/Hex-Floats.html
             | 
             | Furthermore, what exactly do you mean by "all floats and
             | doubles exactly"?
        
                | enriquto wrote:
                | Yes, I was talking about what is described in your
                | resources. You can do this:
                | 
                |     // define a floating-point literal in hex and print it in decimal
                |     float x = 0x1p-8;          // x = 1.0/256
                |     printf("x = %g\n", x);     // prints 0.00390625
                | 
                |     // define a floating-point literal in decimal and print it in various ways
                |     float y = 0.3;             // non-representable, rounded to closest float
                |     printf("y = %g\n", y);     // 0.3 (the %g format does some heuristics)
                |     printf("y = %.10f\n", y);  // 0.3000000119
                |     printf("y = %.20f\n", y);  // 0.30000001192092895508
                |     printf("y = %a\n", y);     // 0x1.333334p-2
        
               | codetrotter wrote:
               | So for example if you make a variable that has the value
               | parent commenter used
               | 
               | 100000.000000000017
               | 
               | And then you print it.
               | 
               | Does it preserve the exact value?
        
               | enriquto wrote:
               | Your question is ambiguous for two different reasons.
               | First, this value is not representable as a floating-
               | point number, so there's no way that you can even store
               | it in a float. Second, once you have a float variable,
               | you can print it in many different ways. So, the answer
                | to your question is, irremediably, "it depends what
                | you mean by _exact value_".
               | 
               | If you print your variable with the %a format, then YES,
               | the exact value is preserved and there is no loss of
               | information. The problem is that the literal that you
               | wrote cannot be represented exactly. But this is hardly a
                | fault of the floats. Ints have exactly the same
                | problem:
                | 
                |     int x = 2.5;   // x gets the value 2
                |     int y = 7/3;   // same thing
        
               | codetrotter wrote:
               | So in other words, is it fair to say that this situation
               | is not much different from what you get with Python?
        
         | tpm wrote:
         | https://0.30000000000000004.com/
         | 
         | Although it would be good to move in the direction of using a
         | BigDecimal equivalent by default when ingesting unknown data.
        
       | kccqzy wrote:
       | I'll add that for Haskell, the library everyone uses for JSON
       | parses numbers into Scientific types with almost unlimited size
       | and precision. I say almost unlimited because they use a decimal
       | coefficient-and-exponent representation where the exponent is a
       | 64-bit integer.
       | 
        | The documentation is quite insistent that if you are dealing
        | with untrusted inputs, you could parse two JSON numbers from
        | the untrusted source just fine, and then performing an
        | addition on them could fill up your memory. Exciting new DoS
        | vector.
       | 
       | Of course in practice people end up parsing them into custom
       | types with 64-bit integers, so this is only a problem if you are
       | manipulating JSON directly which is very rare in Haskell.
        
        | egwor wrote:
        | I think the thing folks miss is what happens when there's an
        | error like divide by zero, or when a calculation returns NaN.
        | I feel like this is the main gap/concern with using JSON, and
        | it seems to be rarely discussed.
        
          | olejorgenb wrote:
          | Agreed, this can be a pain. Python by default serializes and
          | deserializes the `NaN` literal, making you pay some cleanup
          | cost once you need to interoperate with other systems (same
          | for `Inf`).
          | 
          | Say what you want about NaN, but IEEE 754 is the de facto
          | way of dealing with floating point in computers, and even if
          | NaNs and Infs are a bit "fringe" it's unfortunate that the
          | most popular serialization format cannot represent them.
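Concretely, CPython's `json` module emits and accepts these JavaScript-style literals by default, and only refuses when asked (a quick check):

```python
import json
import math

# Default output is NOT valid JSON per the grammar:
assert json.dumps([float("nan"), float("inf")]) == "[NaN, Infinity]"

# ...and the same literals are accepted on input:
assert math.isnan(json.loads("NaN"))
assert json.loads("-Infinity") == float("-inf")

# allow_nan=False opts into strict, interoperable behavior:
try:
    json.dumps(float("nan"), allow_nan=False)
except ValueError:
    print("rejected, as strict JSON requires")
```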
        
       | nigeltao wrote:
       | When I wrote my jsonptr tool a few years ago, I noticed that some
       | JSON libraries (in both C++ and Rust) don't even do "parse a
       | string of decimal digits as a float64" properly. I don't mean
       | that in the "0.3 isn't exactly representable; have
       | 0.30000000000000004 instead" sense.
       | 
       | I mean that rapidjson (C++) parsed the string
       | "0.99999999999999999" as the number 1.0000000000000003. Apart
       | from just looking weird, it's a different float64 bit-pattern:
       | 0x3FF0000000000000 vs 0x3FF0000000000001.
       | 
        | Similarly, serde-json (Rust) parsed "122.416294033786585" as
        | 122.4162940337866. This isn't as obvious a difference, but the
        | bit-patterns differ by one: 0x405E9AA48FBB2888 vs
        | 0x405E9AA48FBB2889. Serde-json does have a "float_roundtrip"
        | feature flag, but it's opt-in, not enabled by default.
       | 
       | For details, look for "rapidjson issue #1773" and "serde_json
       | issue #707" at https://nigeltao.github.io/blog/2020/jsonptr.html
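For contrast, a correctly-rounding parser maps that first string to exactly 1.0; CPython's float parsing is correctly rounded, so it can show the bit patterns mentioned above (the helper name `bits` is mine):

```python
import json
import struct

def bits(x: float) -> int:
    """Raw IEEE 754 bit pattern of a double, as an integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]

# The nearest double to 0.99999999999999999 is exactly 1.0 -- the string
# is only 1e-17 below 1.0, well inside half an ulp:
x = json.loads("0.99999999999999999")
assert x == 1.0
assert bits(x) == 0x3FF0000000000000   # not the off-by-one 0x3FF0000000000001
```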
        
         | 01HNNWZ0MV43FF wrote:
          | Oh wow. So serde_json doesn't round-trip floats by default;
          | it uses a faster, imprecise algorithm: https://github.com/serde-
          | rs/json/issues/707
         | 
         | Good thing there's msgpack I guess.
        
        | ctrw wrote:
        | I still get a laugh out of ECMA-404. The first time I looked
        | it up I refreshed the page a large number of times before I
        | realized it wasn't an error.
        
       | hinkley wrote:
       | One of the first Ajax projects I worked on was multi tenant, and
       | someone decided to solve the industrial espionage problem by
       | using random 64 bit identifiers for all records in the system.
        | Only a tiny fraction of random 64-bit IDs survive JavaScript's
        | float64 representation unscathed, but the truncation is subtle
        | enough that you might make it past MVP before anyone figures
        | out it's broken, and that's exactly what happened to us.
       | 
       | So we had to go through all the code adding quotes to all the ID
       | fields. That was a giant pain in my ass.
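The failure is easy to reproduce: push an ID through a double, as a JavaScript `JSON.parse` would, and check whether it survives (Python sketch; the helper name is mine):

```python
def survives_float64(n: int) -> bool:
    """True if the integer round-trips through an IEEE 754 double."""
    return int(float(n)) == n

# Every integer up to 2**53 survives:
assert survives_float64(2**53 - 1)

# Near 2**63, representable doubles are spaced 2**11 = 2048 apart, so a
# random 64-bit ID almost never lands exactly on one:
assert not survives_float64(2**63 + 1)   # silently becomes 2**63
```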
        
       | bevekspldnw wrote:
       | Try to get a DECIMAL value out of a Postgres database into a JSON
       | API response and you'll learn all this and more in the most
       | painful way possible!
        
       ___________________________________________________________________
       (page generated 2024-04-01 23:00 UTC)