[HN Gopher] Stuffed-Na(a)N: stuff your NaNs
       ___________________________________________________________________
        
       Stuffed-Na(a)N: stuff your NaNs
        
       Author : dgroshev
       Score  : 93 points
       Date   : 2025-04-26 14:04 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | axblount wrote:
       | This is usually called NaN-boxing and is often used to implement
       | dynamic languages.
       | 
       | https://piotrduperas.com/posts/nan-boxing
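
A minimal sketch of the NaN-boxing idea named above, in Python rather than in a language runtime. The helper names `box` and `unbox` are hypothetical, and the sketch assumes a platform where `struct` copies IEEE 754 doubles bit-for-bit (true on mainstream CPython builds). It stores an unsigned integer in the 51 mantissa bits below the quiet bit of a NaN.

```python
import math
import struct

QNAN_BITS = 0x7FF8_0000_0000_0000     # exponent all ones + quiet bit: a quiet NaN
PAYLOAD_MASK = 0x0007_FFFF_FFFF_FFFF  # the 51 mantissa bits below the quiet bit


def box(payload: int) -> float:
    """Hide an unsigned integer of up to 51 bits inside a quiet NaN."""
    if not 0 <= payload <= PAYLOAD_MASK:
        raise ValueError("payload does not fit in 51 bits")
    bits = QNAN_BITS | payload
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def unbox(value: float) -> int:
    """Recover the payload from a float produced by box()."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    if bits & QNAN_BITS != QNAN_BITS:
        raise ValueError("not a quiet NaN, so not a boxed value")
    return bits & PAYLOAD_MASK


boxed = box(12345)
print(math.isnan(boxed), unbox(boxed))
```

A real dynamic-language runtime would typically also reserve a few payload bits as a type tag (pointer, integer, boolean); that part is left out here.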
        
         | addoo wrote:
         | I appreciate this article a lot more because it contains an
         | iota of benchmarking with an explanation about why this might
         | be more performant. Especially since my first thought was
         | 'wouldn't this require more instructions to be executed?'
         | 
         | The original post seems really weird to me. I would have
         | dismissed it as someone's hobby project, but... that doesn't
         | seem like what it's trying to be.
        
           | wging wrote:
           | It's a joke project.
        
           | Tuna-Fish wrote:
           | "More instructions to execute" is not synonymous with
           | "slower".
           | 
           | NaN-boxing lets you use less memory in certain use cases.
           | Because memory access is slow and caches are fixed size, this
           | is usually a performance win, even if you have to do a few
           | extra ops on every access.
        
         | jasonthorsness wrote:
         | I wonder if IEEE-754 designers anticipated this use case during
         | development. Great article - this kind of "outside-the-normal-
         | use-case" requires a very careful reading of the
         | specifications/guarantees of the language.
        
           | purplesyringa wrote:
           | IEEE-754 seems to say that NaN payloads are designed to
           | contain "retrospective diagnostic information inherited from
           | invalid or unavailable data and results". While NaN boxing in
           | particular probably wasn't the intention, untouched payloads
           | in general absolutely were.
        
       | dunham wrote:
       | I learned about this when trying to decode data from Firefox
       | IndexedDB. (I was extracting Tana data.) Their structured clone
       | data format uses nan-boxing for serialization.
        
       | AaronAPU wrote:
       | Reminds me of using the highest bit(s) of 64-bit ints to stuff
       | auxiliary data into lockfree algorithms. So long as you're aware
       | of your OS environment you can enable some efficiencies you
       | couldn't otherwise.
        
         | eru wrote:
         | That's why OCaml has 63 bit integers by default: the language
         | itself uses the last bit to help with GC.
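
The tag-bit scheme mentioned above can be sketched as follows. This is an illustration in Python, not OCaml's runtime code: the function names are hypothetical, and it assumes a 64-bit word whose lowest bit is 1 for immediate integers and 0 for pointers, which leaves 63 bits for the integer itself.

```python
WORD_BITS = 64
WORD_MASK = (1 << WORD_BITS) - 1


def tag_int(n: int) -> int:
    """Encode a 63-bit signed integer as a 64-bit word with the low bit set."""
    if not -(1 << 62) <= n < (1 << 62):
        raise OverflowError("value does not fit in 63 bits")
    return ((n << 1) | 1) & WORD_MASK


def is_immediate(word: int) -> bool:
    """A set low bit marks an integer; a clear low bit would mark a pointer."""
    return word & 1 == 1


def untag_int(word: int) -> int:
    """Decode a tagged word back into the signed integer it carries."""
    if word >= 1 << (WORD_BITS - 1):  # reinterpret the word as signed
        word -= 1 << WORD_BITS
    return word >> 1                  # arithmetic shift drops the tag bit


print(hex(tag_int(5)), untag_int(tag_int(-1)))
```

Because aligned pointers always end in a zero bit, a garbage collector scanning memory can tell integers from pointers with a single bit test.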
        
       | amelius wrote:
       | I'm curious why we have not-a-number, but not not-a-string, not-
       | a-boolean, not-an-enum, and not-a-mycustomtype.
        
         | ok_computer wrote:
         | Because representing infinity is not possible outside of
         | symbolic logic and isn't encodable in floats. I think it is a
         | simple numerical reason and not a deeper computer reason.
        
           | Sharlin wrote:
           | Well, infinity is totally representable with IEEE 754 floats.
           | For example 1.0/0.0 == +inf, -1.0/0.0 == -inf, but 0.0/0.0 ==
           | NaN.
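
A small sketch of the division results listed above. Python itself raises `ZeroDivisionError` on float division by zero instead of returning the IEEE 754 result, so the hypothetical helper `ieee_divide` below fills in the values the standard specifies for that case.

```python
import math


def ieee_divide(a: float, b: float) -> float:
    """Divide like IEEE 754 hardware does, without raising on a zero divisor."""
    try:
        return a / b
    except ZeroDivisionError:
        if a == 0.0 or math.isnan(a):
            return math.nan  # 0/0 (and NaN/0) is NaN
        # nonzero / zero is an infinity whose sign is the product of the signs
        sign = math.copysign(1.0, a) * math.copysign(1.0, b)
        return math.copysign(math.inf, sign)


print(ieee_divide(1.0, 0.0), ieee_divide(-1.0, 0.0), ieee_divide(0.0, 0.0))
```

Note that the sign of the zero matters too: dividing a positive number by `-0.0` gives negative infinity.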
        
             | amelius wrote:
             | A smart compiler should be able to figure out a better
             | value for 0/0, depending on context.
             | 
             | For example:
             | 
             |     for i in range(0, 10):
             |         print(i/0.0)
             | 
             | In this case it should probably print +inf when i == 0.
             | 
             | But:
             | 
             |     for i in range(-10, 10):
             |         print(i/0.0)
             | 
             | Now it is not clear, but at least we know it's an infinity
             | so perhaps we need a special value +-inf.
             | 
             | And:
             | 
             |     for i in range(-10, 10):
             |         print(i/i)
             | 
             | In this case, the value for 0/0 can be 1.
        
               | Sharlin wrote:
               | Well, it could, but that would be against the spec. The
               | hardware implements IEEE 754, most languages guarantee
               | IEEE 754, and transforming code so that 0.0/0.0 doesn't
               | result in NaN would be invalid.
        
         | IncreasePosts wrote:
         | Some common numeric operations can result in non-numbers (e.g.
         | division by zero: NaN or infinity).
         | 
         | Are there any common string operations with similar behavior?
        
           | masfuerte wrote:
           | Out of range substring? Some languages throw an error, others
           | return an empty string. You could return a propagating NaS
           | instead. I don't know what you'd use it for.
        
           | cluckindan wrote:
           | Charset translation.
           | 
           | Unicode's U+FFFD (the replacement character) is basically a
           | symbol for not-a-char.
        
         | Sharlin wrote:
         | Because IEEE 754 creators wanted to signal non-trapping error
         | conditions for mathematically undefined operations, and they
         | had plenty of bit patterns to spare. Apparently back in the
         | 70s and 80s in many cases it was preferable for a computation
         | to go through and produce NaNs rather than trapping instantly
         | when executing an undefined operation. I'm not quite sure what
         | the reasoning was exactly.
        
           | wbl wrote:
           | In early FP machines the floating point processor could not
           | take a trap at a faulting instruction precisely: it could
           | only do bad things. Furthermore, trapping can be very
           | expensive for both programmers and hardware: rather than
           | going through a loop and filtering NaNs out of the results,
           | you trap and resume every time, which is a pain.
        
           | IshKebab wrote:
           | It avoids traps which are really inconvenient.
        
         | dzaima wrote:
         | Because NaNs come from a standardized hardware-supported type,
         | whereas the rest of those are largely language-specific (and
         | you could consider null/nil as a "not-a-*" type for those in
         | applicable languages; and there are languages which disallow
         | NaN floats too, which completes all combinations).
         | 
         | Itanium had a bit for "not a thing" for integers (and perhaps
         | some older hardware from around the time floats started being a
         | thing had similar things), so the idea of hardware support for
         | not-a-* isn't exclusive to floats, but evidently this hasn't
         | caught on; generally it's messy because it needs a bit pattern
         | to yoink, but many types already use all possible ones (whereas
         | floats already needed to chop out some for infinities).
        
         | Sharlin wrote:
         | You can encode not-a-bool, not-a-(utf-8)-string and not-an-enum
         | using one of the invalid bit patterns - that's exactly what the
         | Rust compiler can do with its "niche optimization":
         | https://www.0xatticus.com/posts/understanding_rust_niche/
        
         | usefulcat wrote:
         | C++ has std::optional for exactly that purpose.
        
         | mathgradthrow wrote:
         | NaN basically means that floating point arithmetic is
         | predicting that your mathematical expression is an
         | "indeterminate form", as in the thing you learn in calculus.
        
         | sgerenser wrote:
         | Not-a-boolean would be something like the much maligned tri-
         | state bool pattern:
         | https://thedailywtf.com/articles/What_Is_Truth_0x3f_
        
       | dzaima wrote:
       | This doesn't work on Firefox, as it normalizes NaNs as they're
       | extracted from ArrayBuffers. Presumably because SpiderMonkey uses
       | NaN-boxing itself, and thus just doesn't have any way to
       | represent actual non-canonical NaN floats.
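
What NaN normalization does to a stuffed payload can be sketched like this. It is only an illustration of the effect described above, not SpiderMonkey's code; the helper names and the choice of `0x7FF8000000000000` as the canonical NaN are assumptions.

```python
import math
import struct

CANONICAL_NAN_BITS = 0x7FF8_0000_0000_0000  # assumed canonical quiet NaN


def float_from_bits(bits: int) -> float:
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def bits_from_float(value: float) -> int:
    return struct.unpack("<Q", struct.pack("<d", value))[0]


def canonicalize(value: float) -> float:
    """Replace any NaN with the single canonical NaN; pass other floats through."""
    if math.isnan(value):
        return float_from_bits(CANONICAL_NAN_BITS)
    return value


stuffed = float_from_bits(0x7FF8_0000_0000_BEEF)  # NaN carrying payload 0xBEEF
print(hex(bits_from_float(canonicalize(stuffed))))
```

Once a value passes through such a step the payload is gone, which is why data stuffed into NaNs cannot survive it.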
        
         | moffkalast wrote:
         | Dammit Mozilla, first no WebGPU, now this?! /s
        
         | sjrd wrote:
         | The spec mandates normalization of NaNs in ArrayBuffers. If
         | other engines do not normalize, I believe it's a bug in those
         | engines!
        
       | haxiomic wrote:
       | That's curious! Does anyone know why the spec was designed to
       | allow so many possible NaN bit patterns? Seems like just one
       | would do!
        
         | dzaima wrote:
         | As long as you want all bit patterns to be some float, there's
         | not really one bit pattern you can chop out of somewhere (or,
         | three, rather - both infinities, and NaN).
         | 
         | Taking, say, the 3 smallest subnormal numbers, or the three
         | largest numbers, or whatever, would be extremely bad for
         | allowing optimizations/algorithms/correctness checkers to
         | reason about operations in the abstract.
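
To get a feel for how many bit patterns end up as NaNs, here is a sketch that enumerates every 16-bit half-precision pattern (small enough to brute-force) and classifies it. The function name is hypothetical; the `"e"` format of Python's `struct` module decodes IEEE 754 binary16.

```python
import math
import struct


def classify_half_patterns() -> dict:
    """Count NaN, infinite, and finite values among all 65536 binary16 patterns."""
    counts = {"nan": 0, "inf": 0, "finite": 0}
    for bits in range(1 << 16):
        value = struct.unpack("<e", struct.pack("<H", bits))[0]
        if math.isnan(value):
            counts["nan"] += 1
        elif math.isinf(value):
            counts["inf"] += 1
        else:
            counts["finite"] += 1
    return counts


print(classify_half_patterns())
```

Every pattern with an all-ones exponent and a nonzero mantissa is a NaN, so binary16 has 2 * (2^10 - 1) of them; the same reasoning gives 2 * (2^52 - 1) for 64-bit doubles.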
        
         | brewmarche wrote:
         | You can put diagnostic information about the cause of the NaN
         | in the bits. IEEE754 doesn't mandate any format though, leaving
         | it up to the implementation. You can check section 6 of
         | IEEE754:2019 (which also talks about the possibility to use it
         | to extend the standard to wider ranges, more infinities or
         | other stuff)
        
       | vitaut wrote:
       | I made a garlic nan: https://www.godbolt.org/z/enjv1c7Tf
        
         | dgroshev wrote:
         | I love it, how did I not think about this!
        
       | scorchingjello wrote:
       | Since the first time I had Indian food sometime in 1987 I have
       | always called naan "not a number bread" and no one has ever
       | laughed. I feel like I may have found my people.
        
         | dspillett wrote:
         | This may appeal:
         | https://doubleyoudoubledoubleewe.www.dash.deeohhtee.dash-das...
        
       | carterschonwald wrote:
       | The nan bits were intended to help provide more informative float
       | error diagnostics at the language implementation level for
       | non-crashing code. I wonder if I'll ever get to exploring it myself.
        
       | johnklos wrote:
       | It's a shame this won't work on VAX.
        
       | o11c wrote:
       | Note that there are a lot of exciting new float-related functions
       | in C23, many related to NaN handling, signed zeros, or precision
       | in edge cases. Some have been in glibc for years; others not yet.
       | 
       | The API for `getpayload`/`setpayload`/`setpayloadsig` looks a
       | little funny but it's easy enough to wrap (just consider the edge
       | cases; in particular remember that whether 0 is valid for quiet
       | or signaling NaNs is platform-dependent).
       | 
       | Finally we have a reliable `roundeven`, and the convenience of
       | directly calling `nextup`/`nextdown` (but note that you'll still
       | only visit one of the zeros).
       | 
       | The new `fmaximum` family is confusing, but I think I have it
       | clear:
       | 
       |     without 'imum' (legacy): prefer non-NaN
       |     with 'imum': zeros distinguished (-0 < +0)
       |     with '_num': prefer a non-NaN, also signal if applicable
       |     with '_mag': compare as if fabs first
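
The differences in that summary can be sketched as a Python emulation. These are not bindings to the C23 functions: the names merely mirror the C ones, the behavior reflects one reading of the summary above (NaN propagation for `fmaximum`, NaN avoidance for `fmaximum_num`, magnitude comparison for `fmaximum_mag`), and signaling-NaN exceptions are not modeled.

```python
import math


def _greater(x: float, y: float) -> bool:
    """x > y for non-NaN inputs, treating -0.0 as smaller than +0.0."""
    if x == 0.0 and y == 0.0:
        return math.copysign(1.0, x) > math.copysign(1.0, y)
    return x > y


def fmaximum(x: float, y: float) -> float:
    """Maximum that propagates NaN and distinguishes the two zeros."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    return x if _greater(x, y) else y


def fmaximum_num(x: float, y: float) -> float:
    """Maximum that prefers a number over a NaN."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if _greater(x, y) else y


def fmaximum_mag(x: float, y: float) -> float:
    """Argument with the larger magnitude; ties fall back to fmaximum."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if abs(x) > abs(y):
        return x
    if abs(y) > abs(x):
        return y
    return fmaximum(x, y)


print(fmaximum(-0.0, 0.0), fmaximum_num(math.nan, 1.0), fmaximum_mag(-3.0, 2.0))
```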
       | 
       | `totalorder` has been often-desired; the other new comparisons
       | not so much.
       | 
       | `rootn` and `compoundn` are surprisingly tricky to implement
       | yourself. I still have one testcase I'm not sure why I have to
       | hand-patch, and I'm not sure which implementation choice keeps
       | the best precision (to say nothing of correct rounding mode).
        
         | Etheryte wrote:
         | > but note that you'll still only visit one of the zeros
         | 
         | Out of curiosity, isn't this what you'd pretty much universally
         | want? Or maybe I misunderstand the intent here?
        
           | o11c wrote:
           | I find `nextup`/`nextdown` useful for when you want to
           | calculate `{f(nextdown(x)), f(x), f(nextup(x))}` and see if
           | some property holds (or to what degree it holds) for both `x`
           | and its surrounding values. It's annoying that one such
           | surrounding value gets skipped.
           | 
           | That said, quite often you need to handle signed zero
           | specially already, and if you don't the property you're
           | testing might sufficiently differ in the interesting way for
           | the smallest-magnitude subnormals.
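
The neighbourhood check described above can be sketched in Python, where `math.nextafter` (available from Python 3.9) stands in for the C23 `nextup`/`nextdown`. The wrapper names and the `holds_nearby` helper are hypothetical.

```python
import math


def nextup(x: float) -> float:
    """Smallest float greater than x."""
    return math.nextafter(x, math.inf)


def nextdown(x: float) -> float:
    """Largest float smaller than x."""
    return math.nextafter(x, -math.inf)


def neighbourhood(x: float) -> tuple:
    """x together with its immediate neighbours on either side."""
    return (nextdown(x), x, nextup(x))


def holds_nearby(prop, x: float) -> bool:
    """Check that a property holds for x and for both adjacent floats."""
    return all(prop(v) for v in neighbourhood(x))


print(neighbourhood(0.0))
print(holds_nearby(lambda v: math.sqrt(v * v) == abs(v), 1.0))
```

Starting from either zero, stepping up lands on the smallest positive subnormal, so the other zero is never visited, which is the skipped value mentioned above.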
        
       | clhodapp wrote:
       | I guess I just don't get it because the before & after cases
       | don't seem to be showing remotely comparable use cases.
       | 
       | In the "before" use-cases, the programmer has accidentally
       | written a bug and gets back a single NaN from a logical operation
       | they performed as a result.
       | 
       | In the "after" cases, the programmer already has some data and
       | explicitly decides to convert it to a NaN encoding. But instead
       | of actually getting back a NaN that is secretly carrying their
       | data, they actually get a whole array of NaNs, which duck type
       | differently, and thus are likely to disappear into another NaN or
       | an undefined if they are propagated through an API.
       | 
       | Like.. I get that the whole thing is supposed to be humorously
       | absurd but... It just doesn't land unless there's something
       | technical I'm missing to connect up the pieces.
        
         | brap wrote:
         | It sounds like you might be interested in the Enterprise
         | Edition
        
       | roywiggins wrote:
       | Four NaNs? Four? That's insane.
       | 
       | https://m.youtube.com/watch?v=feJlRDLX0iQ
        
       ___________________________________________________________________
       (page generated 2025-04-26 23:00 UTC)