[HN Gopher] Stuffed-Na(a)N: stuff your NaNs
___________________________________________________________________
Stuffed-Na(a)N: stuff your NaNs
Author : dgroshev
Score : 93 points
Date : 2025-04-26 14:04 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| axblount wrote:
| This is usually called NaN-boxing and is often used to implement
| dynamic languages.
|
| https://piotrduperas.com/posts/nan-boxing
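A minimal Python sketch of NaN-boxing under an assumed layout, which is not the one from the linked post or from any particular engine. Ordinary doubles are stored as their own 64-bit pattern. Other values live inside a quiet NaN: a hypothetical 3-bit type tag sits in bits 48-50 and a 48-bit payload in the bits below it.

```python
import math
import struct

# Assumed layout (illustrative only):
#   bits 52-62 all ones + bit 51 set -> quiet NaN, i.e. "not a plain double"
#   bits 48-50                       -> small type tag (0 = genuine NaN)
#   bits 0-47                        -> payload (here: an unsigned integer)
QNAN = 0x7FF8_0000_0000_0000
TAG_SHIFT = 48
TAG_MASK = 0x7 << TAG_SHIFT
PAYLOAD_MASK = (1 << TAG_SHIFT) - 1
TAG_INT = 1  # hypothetical tag value for boxed integers


def bits_of(x: float) -> int:
    """Reinterpret a double's 8 bytes as an unsigned 64-bit integer."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def float_of(word: int) -> float:
    """Reinterpret an unsigned 64-bit integer as a double."""
    return struct.unpack("<d", struct.pack("<Q", word))[0]


def box_float(x: float) -> int:
    # Ordinary doubles are stored as-is. Any NaN is canonicalised to QNAN
    # (tag 0) so a genuine NaN result can never look like a boxed value.
    return QNAN if math.isnan(x) else bits_of(x)


def box_int(n: int) -> int:
    if not 0 <= n <= PAYLOAD_MASK:
        raise ValueError("integer does not fit in the 48-bit payload")
    return QNAN | (TAG_INT << TAG_SHIFT) | n


def is_boxed(word: int) -> bool:
    # Boxed values are quiet NaNs (either sign) with a non-zero tag.
    return (word & QNAN) == QNAN and (word & TAG_MASK) != 0


def unbox(word: int):
    if not is_boxed(word):
        return float_of(word)
    tag = (word & TAG_MASK) >> TAG_SHIFT
    if tag == TAG_INT:
        return word & PAYLOAD_MASK
    raise ValueError(f"unknown tag {tag}")
```

The canonicalisation in `box_float` is the important design choice. Without it, a NaN produced by ordinary arithmetic could carry arbitrary tag bits and would then be misread as a boxed value.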
| addoo wrote:
| I appreciate this article a lot more because it contains an
| iota of benchmarking with an explanation about why this might
| be more performant. Especially since my first thought was
| 'wouldn't this require more instructions to be executed?'
|
| The original post seems really weird to me. I would have
| dismissed it as someone's hobby project, but... that doesn't
| seem like what it's trying to be.
| wging wrote:
| It's a joke project.
| Tuna-Fish wrote:
| "More instructions to execute" is not synonymous with
| "slower".
|
| NaN-boxing lets you use less memory in certain use cases.
| Because memory access is slow and caches are fixed size, this
| is usually a performance win, even if you have to do a few
| extra ops on every access.
| jasonthorsness wrote:
| I wonder if IEEE-754 designers anticipated this use case during
| development. Great article - this kind of "outside-the-normal-
| use-case" requires a very careful reading of the
| specifications/guarantees of the language.
| purplesyringa wrote:
| IEEE-754 seems to say that NaN payloads are designed to
| contain "retrospective diagnostic information inherited from
| invalid or unavailable data and results". While NaN boxing in
| particular probably wasn't the intention, untouched payloads
| in general absolutely were.
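To make "payload" concrete, here is a small Python sketch. It splits a double into its fields and reports whether a NaN is quiet and what its payload is. It assumes the IEEE 754-2008 convention, followed by x86-64 and ARM, that a set leading fraction bit marks a quiet NaN. It reads the bits through `struct`, which copies them without doing any arithmetic that could alter them.

```python
import struct

def decode(x: float) -> tuple[int, int, int]:
    """Split a double into (sign, biased exponent, 52-bit fraction field)."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)

def nan_info(x: float):
    """Return (is_quiet, payload) for a NaN, or None for any other value."""
    _, exponent, fraction = decode(x)
    if exponent != 0x7FF or fraction == 0:
        return None                            # finite number or infinity
    is_quiet = bool(fraction >> 51)            # leading fraction bit set => quiet NaN
    payload = fraction & ((1 << 51) - 1)       # the other 51 bits are the payload
    return is_quiet, payload
```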
| dunham wrote:
| I learned about this when trying to decode data from Firefox
| IndexedDB. (I was extracting Tana data.) Their structured clone
| data format uses nan-boxing for serialization.
| AaronAPU wrote:
| Reminds me of using the highest bit(s) of 64-bit ints to stuff
| auxiliary data into lock-free algorithms. So long as you're aware
| of your OS environment you can enable some efficiencies you
| couldn't otherwise.
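A Python sketch of the bit layout only. Python has no 64-bit compare-and-swap, so nothing here is actually lock-free. It assumes user-space addresses fit in the low 48 bits, as on typical x86-64 setups without 5-level paging. The top 16 bits hold a version counter, a common way to make ABA reuse of an address detectable.

```python
ADDR_BITS = 48                          # assumption: addresses fit in 48 bits
ADDR_MASK = (1 << ADDR_BITS) - 1
TAG_MASK = (1 << (64 - ADDR_BITS)) - 1  # 16-bit version counter

def pack(addr: int, tag: int) -> int:
    """Put a 16-bit version counter in the top bits of a 64-bit word."""
    if addr & ~ADDR_MASK:
        raise ValueError("address uses more than 48 bits")
    return ((tag & TAG_MASK) << ADDR_BITS) | addr

def unpack(word: int) -> tuple[int, int]:
    """Return (address, counter)."""
    return word & ADDR_MASK, word >> ADDR_BITS

def bump(word: int, new_addr: int) -> int:
    """Swing to new_addr and increment the counter (wrapping at 16 bits),
    as a CAS loop on a lock-free stack head would do."""
    _, tag = unpack(word)
    return pack(new_addr, tag + 1)
```

Even if `new_addr` equals the old address, the word changes, so a CAS that expected the old word fails instead of silently succeeding.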
| eru wrote:
| That's why OCaml has 63-bit integers by default: the runtime
| uses the lowest bit as a tag to help with GC.
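A Python sketch of the tagging scheme. OCaml represents an int n as the machine word 2n + 1. The low bit is therefore 1 for integers and 0 for word-aligned heap pointers, and that bit is what the GC checks. Python integers are unbounded, so this does not model the 63-bit wraparound.

```python
def tag_int(n: int) -> int:
    """Encode n as the word (n << 1) | 1; the low bit marks 'not a pointer'."""
    return (n << 1) | 1

def untag_int(word: int) -> int:
    """An arithmetic shift right drops the tag bit and keeps the sign."""
    return word >> 1

def is_immediate(word: int) -> bool:
    """Heap pointers are word-aligned, so their low bit is always 0."""
    return word & 1 == 1

def tagged_add(a: int, b: int) -> int:
    """Add without untagging: (2x + 1) + (2y + 1) - 1 == 2(x + y) + 1."""
    return a + b - 1
```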
| amelius wrote:
| I'm curious why we have not-a-number, but not not-a-string, not-
| a-boolean, not-an-enum, and not-a-mycustomtype.
| ok_computer wrote:
| Because representing infinity is not possible outside of
| symbolic logic and isn't encodable in floats. I think it is a
| simple numerical reason and not a deeper computer reason.
| Sharlin wrote:
| Well, infinity is totally representable with IEEE 754 floats.
| For example 1.0/0.0 == +inf, -1.0/0.0 == -inf, but 0.0/0.0 ==
| NaN.
| amelius wrote:
| A smart compiler should be able to figure out a better
| value for 0/0, depending on context.
|
| For example:
|
|     for i in range(0, 10):
|         print(i/0.0)
|
| In this case it should probably print +inf when i == 0.
|
| But:
|
|     for i in range(-10, 10):
|         print(i/0.0)
|
| Now it is not clear, but at least we know it's an infinity,
| so perhaps we need a special value +-inf.
|
| And:
|
|     for i in range(-10, 10):
|         print(i/i)
|
| In this case, the value for 0/0 can be 1.
| Sharlin wrote:
| Well, it could, but that would be against the spec. The
| hardware implements IEEE 754, most languages guarantee
| IEEE 754, and transforming code so that 0.0/0.0 doesn't
| result in NaN would be invalid.
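Python itself raises ZeroDivisionError for 1.0 / 0.0. A quick way to see the IEEE 754 default results Sharlin lists is a small helper that fills them in. `ieee_div` is a hypothetical helper written for illustration. The sign of the infinity follows the usual rule of XOR-ing the operand signs, which also takes the sign of a zero divisor into account.

```python
import math

def ieee_div(a: float, b: float) -> float:
    """Division returning IEEE 754 default (non-trapping) results for b == 0."""
    if b != 0.0:
        return a / b                     # also covers a NaN divisor
    if a == 0.0 or math.isnan(a):
        return math.nan                  # 0/0 (and NaN/0) is NaN
    # Sign of the infinity = XOR of operand signs; copysign sees -0.0 too.
    negative = (math.copysign(1.0, a) < 0) != (math.copysign(1.0, b) < 0)
    return -math.inf if negative else math.inf
```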
| IncreasePosts wrote:
| Some common numeric operations can result in non-numbers (e.g.
| division by zero gives NaN or infinity).
|
| Are there any common string operations with similar behavior?
| masfuerte wrote:
| Out of range substring? Some languages throw an error, others
| return an empty string. You could return a propagating NaS
| instead. I don't know what you'd use it for.
| cluckindan wrote:
| Charset translation.
|
| Unicode's replacement character (U+FFFD) is basically a symbol
| for not-a-char.
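In Python, this "not-a-char" behaviour of charset translation is what `bytes.decode` does with `errors="replace"`. Bytes that are invalid in the source encoding become U+FFFD REPLACEMENT CHARACTER instead of raising an error. The byte string is just an example.

```python
# "café" encoded as Latin-1; 0xE9 on its own is not valid UTF-8.
raw = b"caf\xe9"

decoded = raw.decode("latin-1")                   # right charset: 'café'
replaced = raw.decode("utf-8", errors="replace")  # wrong charset: 'caf\ufffd'

# Unlike a NaN payload, U+FFFD keeps no record of the byte it replaced.
lost = replaced.encode("utf-8")                   # b'caf\xef\xbf\xbd'
```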
| Sharlin wrote:
| Because IEEE 754 creators wanted to signal non-trapping error
| conditions for mathematically undefined operations, and they
| had plenty of bit patterns to spare. Apparently back in the
| 70s and 80s in many cases it was preferable for a computation
| to go through and produce NaNs rather than trapping instantly
| when executing an undefined operation. I'm not quite sure what
| the reasoning was exactly.
| wbl wrote:
| In early FP machines the floating-point processor could not
| take a trap precisely at the faulting instruction: it could
| only do bad things. Traps can also be very expensive, for
| programmers and hardware alike. Rather than going through a
| loop and filtering NaNs out of the results afterwards, you'd
| have to trap every time and resume, which is a pain.
| IshKebab wrote:
| It avoids traps which are really inconvenient.
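Python's `decimal` module exposes this trade-off directly, because it implements the General Decimal Arithmetic model with per-signal traps and sticky flags. The sketch below uses `divide_all`, a helper made up for illustration. With traps off, 0/0 yields a NaN, the loop runs to completion, and a sticky flag records afterwards that something went wrong. With traps on, the first bad division raises.

```python
from decimal import Decimal, DivisionByZero, InvalidOperation, localcontext

def divide_all(pairs, trap: bool):
    """Divide each (a, b) pair; return the results and whether any 0/0 occurred."""
    with localcontext() as ctx:
        ctx.clear_flags()                       # start with no sticky flags set
        ctx.traps[InvalidOperation] = trap      # 0/0 signals InvalidOperation
        ctx.traps[DivisionByZero] = trap        # x/0 with x != 0 signals DivisionByZero
        results = [Decimal(a) / Decimal(b) for a, b in pairs]
        return results, bool(ctx.flags[InvalidOperation])
```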
| dzaima wrote:
| Because NaNs come from a standardized hardware-supported type,
| whereas the rest of those are largely language-specific (and
| you could consider null/nil as a "not-a-*" type for those in
| applicable languages; and there are languages which disallow
| NaN floats too, which completes all combinations).
|
| Itanium had a bit for "not a thing" for integers (and perhaps
| some older hardware from around the time floats started being a
| thing had similar things), so the idea of hardware support for
| not-a-* isn't exclusive to floats, but evidently this hasn't
| caught on; generally it's messy because it needs a bit pattern
| to yoink, but many types already use all possible ones (whereas
| floats already needed to chop out some for infinities).
| Sharlin wrote:
| You can encode not-a-bool, not-a-(utf-8)-string and not-an-enum
| using one of the invalid bit patterns - that's exactly what the
| Rust compiler can do with its "niche optimization":
| https://www.0xatticus.com/posts/understanding_rust_niche/
| usefulcat wrote:
| C++ has std::optional for exactly that purpose.
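A Python sketch of the niche idea for `Option<bool>`. A bool only ever uses the byte values 0 and 1, so one of the other 254 patterns can stand for "no value" without a separate flag byte. Using 2 for None is an assumption here, since the exact niche rustc picks is a compiler detail. By contrast, `std::optional<bool>` in the major C++ standard libraries stores a separate engaged flag and takes two bytes.

```python
from typing import Optional

NONE_BYTE = 2  # assumption: an unused bool bit pattern repurposed as "None"

def encode_opt_bool(value: Optional[bool]) -> int:
    """Store Optional[bool] in a single byte, reusing an invalid bool pattern."""
    if value is None:
        return NONE_BYTE
    return 1 if value else 0

def decode_opt_bool(byte: int) -> Optional[bool]:
    if byte == NONE_BYTE:
        return None
    if byte in (0, 1):
        return bool(byte)
    raise ValueError(f"invalid byte {byte}")
```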
| mathgradthrow wrote:
| NaN basically means that floating-point arithmetic is
| predicting that your mathematical expression is an
| "indeterminate form", as in the thing you learn in calculus.
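In Python, most of the calculus indeterminate forms can be checked directly. 0/0 is the exception, because Python raises ZeroDivisionError for it. Not every indeterminate form maps to NaN, though. The C99/IEEE 754 `pow` rules define x**0 and 1**y as 1 even for infinite operands, and Python's float power follows them.

```python
import math

inf = math.inf

# Calculus indeterminate forms that IEEE 754 arithmetic turns into NaN.
nan_forms = {
    "inf - inf": inf - inf,
    "0 * inf": 0.0 * inf,
    "inf / inf": inf / inf,
}

# Also indeterminate in calculus, but pow() defines them as 1.
pow_forms = {
    "inf ** 0": inf ** 0,
    "1 ** inf": 1.0 ** inf,
    "0 ** 0": 0.0 ** 0,
}
```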
| sgerenser wrote:
| Not-a-boolean would be something like the much maligned tri-
| state bool pattern:
| https://thedailywtf.com/articles/What_Is_Truth_0x3f_
| dzaima wrote:
| This doesn't work on Firefox, as it normalizes NaNs as they're
| extracted from ArrayBuffers. Presumably because SpiderMonkey uses
| NaN-boxing itself, and thus just doesn't have any way to
| represent actual non-canonical NaN floats.
| moffkalast wrote:
| Dammit Mozilla, first no WebGPU, now this?! /s
| sjrd wrote:
| The spec mandates normalization of NaNs in ArrayBuffers. If
| other engines do not normalize, I believe it's a bug in those
| engines!
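A Python sketch of what such normalization amounts to: every NaN read from a raw float64 buffer is rewritten to a single canonical bit pattern. Using 0x7FF8000000000000 as the canonical NaN is an assumption (it is the common default quiet NaN), and it is not necessarily the pattern SpiderMonkey writes.

```python
import math
import struct

CANONICAL_NAN_BITS = 0x7FF8_0000_0000_0000  # assumed canonical quiet NaN

def canonicalize(buf: bytearray) -> None:
    """Rewrite every NaN in a little-endian float64 buffer to one bit pattern.

    An engine that NaN-boxes internally needs this, because any other NaN
    pattern coming out of raw memory could collide with a boxed value."""
    for offset in range(0, len(buf) - len(buf) % 8, 8):
        (value,) = struct.unpack_from("<d", buf, offset)
        if math.isnan(value):
            struct.pack_into("<Q", buf, offset, CANONICAL_NAN_BITS)
```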
| haxiomic wrote:
| That's curious! Does anyone know why the spec was designed to
| allow so many possible NaN bit patterns? Seems like just one
| would do!
| dzaima wrote:
| As long as you want all bit patterns to be some float, there's
| not really one bit pattern you can chop out of somewhere (or
| three, rather: both infinities and NaN).
|
| Taking, say, the 3 smallest subnormal numbers, or the three
| largest numbers, or whatever, would be extremely bad for
| allowing optimizations/algorithms/correctness checkers to
| reason about operations in the abstract.
| brewmarche wrote:
| You can put diagnostic information about the cause of the NaN
| in the bits. IEEE 754 doesn't mandate any format, though,
| leaving it up to the implementation. See section 6 of
| IEEE 754-2019 (which also talks about the possibility of using
| it to extend the standard to wider ranges, more infinities or
| other stuff).
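The number of NaN patterns follows from the encoding. A NaN is any value with an all-ones exponent and a non-zero fraction, with either sign. A format with f fraction bits therefore has 2(2^f - 1) NaN patterns. For binary64 that is 2^53 - 2, about 1 in 2048 of all bit patterns. The sketch below checks the formula by brute force on binary16 (10 fraction bits), using `struct`'s half-precision `e` format.

```python
import math
import struct

def count_half_nans() -> int:
    """Brute-force count of NaN bit patterns in IEEE 754 binary16."""
    return sum(
        math.isnan(struct.unpack("<e", n.to_bytes(2, "little"))[0])
        for n in range(1 << 16)
    )

def nan_patterns(fraction_bits: int) -> int:
    """All-ones exponent, any non-zero fraction, either sign."""
    return 2 * ((1 << fraction_bits) - 1)
```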
| vitaut wrote:
| I made a garlic nan: https://www.godbolt.org/z/enjv1c7Tf
| dgroshev wrote:
| I love it, how did I not think about this!
| scorchingjello wrote:
| Since the first time I had Indian food sometime in 1987 I have
| always called naan "not a number bread" and no one has ever
| laughed. I feel like I may have found my people.
| dspillett wrote:
| This may appeal:
| https://doubleyoudoubledoubleewe.www.dash.deeohhtee.dash-das...
| carterschonwald wrote:
| The NaN bits were intended to help provide more informative float
| error diagnostics at the language implementation level for
| non-crashing code. I wonder if I'll ever get to exploring it myself.
| johnklos wrote:
| It's a shame this won't work on VAX.
| o11c wrote:
| Note that there are a lot of exciting new float-related functions
| in C23, many related to NaN handling, signed zeros, or precision
| in edge cases. Some have been in glibc for years; others not yet.
|
| The API for `getpayload`/`setpayload`/`setpayloadsig` looks a
| little funny but it's easy enough to wrap (just consider the edge
| cases; in particular remember that whether 0 is valid for quiet
| or signaling NaNs is platform-dependent).
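A rough Python analogue of the C23 pair, for illustration only. `set_payload` and `get_payload` are hypothetical names. They handle quiet NaNs only and assume the x86-64/ARM layout, where the top fraction bit is the quiet bit. They follow the C23/IEEE 754-2019 conventions as I understand them: an invalid payload yields +0.0, and asking for the payload of a non-NaN yields -1.0. Payload 0 is accepted here for quiet NaNs, which is exactly the platform-dependent edge case mentioned above.

```python
import math
import struct

QUIET_BIT = 1 << 51
PAYLOAD_BITS = 51  # assumption: quiet bit is the top fraction bit

def set_payload(pl: float) -> float:
    """Quiet NaN carrying integer payload pl, or +0.0 if pl is invalid."""
    pl = float(pl)
    if not (pl.is_integer() and 0.0 <= pl < 2.0 ** PAYLOAD_BITS):
        return 0.0                       # invalid payload is reported as +0
    bits = 0x7FF0_0000_0000_0000 | QUIET_BIT | int(pl)
    return struct.unpack("<d", struct.pack("<Q", bits))[0]

def get_payload(x: float) -> float:
    """Payload of a NaN as a float, or -1.0 if x is not a NaN."""
    if not math.isnan(x):
        return -1.0
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return float(bits & ((1 << PAYLOAD_BITS) - 1))
```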
|
| Finally we have a reliable `roundeven`, and the convenience of
| directly calling `nextup`/`nextdown` (but note that you'll still
| only visit one of the zeros).
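In Python, `round()` on a float already rounds half to even, and `math.nextafter` (Python 3.9+) toward plus or minus infinity gives nextup and nextdown. The wrappers below are hypothetical names that mirror the C23 ones. `roundeven` keeps the sign of x so that, as in C, rounding -0.4 gives -0.0. The tests show the zero-skipping. Stepping up from the negative subnormal closest to zero lands on -0.0, and the next step up jumps straight to the smallest positive subnormal, so +0.0 is never visited.

```python
import math

def roundeven(x: float) -> float:
    """Round to nearest integer, ties to even; infinities and NaN pass through."""
    if not math.isfinite(x):
        return x
    # round() on a float uses round-half-even; copysign restores -0.0 (e.g. -0.4).
    return math.copysign(float(round(x)), x)

def nextup(x: float) -> float:
    return math.nextafter(x, math.inf)

def nextdown(x: float) -> float:
    return math.nextafter(x, -math.inf)
```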
|
| The new `fmaximum` family is confusing, but I think I have it
| clear:
|
|     without 'imum' (legacy): prefer non-NaN
|     with 'imum': zeros distinguished (-0 < +0)
|     with '_num': prefer a non-NaN, also signal if applicable
|     with '_mag': compare as if fabs first
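To pin down the distinctions, here are small Python re-implementations that follow the C23 descriptions as I understand them. The names mirror C23's, but these are sketches. They ignore signaling NaNs and the floating-point exceptions the '_num' variants may raise.

```python
import math

def _is_neg(x: float) -> bool:
    return math.copysign(1.0, x) < 0

def fmax_legacy(x: float, y: float) -> float:
    """fmax: a number beats a NaN; which zero wins for (-0, +0) is unspecified."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return x if x >= y else y

def fmaximum(x: float, y: float) -> float:
    """fmaximum: NaN propagates, and -0 is treated as less than +0."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    if x == y == 0.0:
        return y if _is_neg(x) else x
    return x if x > y else y

def fmaximum_num(x: float, y: float) -> float:
    """fmaximum_num: like fmaximum, but a number beats a NaN."""
    if math.isnan(x):
        return y
    if math.isnan(y):
        return x
    return fmaximum(x, y)

def fmaximum_mag(x: float, y: float) -> float:
    """fmaximum_mag: compare |x| and |y|; on a tie, fall back to fmaximum."""
    if math.isnan(x) or math.isnan(y):
        return math.nan
    ax, ay = abs(x), abs(y)
    if ax != ay:
        return x if ax > ay else y
    return fmaximum(x, y)
```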
|
| `totalorder` has been often-desired; the other new comparisons
| not so much.
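A common way to get IEEE 754 totalOrder without special cases is to map each double's bits to an unsigned key. This is a Python sketch; `total_order_key` and `totalorder` are hypothetical helpers, not the C23 function itself. Negative values have all their bits flipped, so larger magnitudes sort first. Positive values get the sign bit set, so they sort above all negatives. The result orders -NaN < -inf < ... < -0.0 < +0.0 < ... < +inf < +NaN, with NaNs further ordered by payload.

```python
import struct

SIGN = 1 << 63
MASK64 = (1 << 64) - 1

def total_order_key(x: float) -> int:
    """Unsigned key whose integer order matches IEEE 754 totalOrder."""
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    if bits & SIGN:
        return ~bits & MASK64   # negative: flip all bits
    return bits | SIGN          # positive: move above every negative

def totalorder(x: float, y: float) -> bool:
    """True if x comes no later than y in the total order."""
    return total_order_key(x) <= total_order_key(y)
```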
|
| `rootn` and `compoundn` are surprisingly tricky to implement
| yourself. I still have one testcase I'm not sure why I have to
| hand-patch, and I'm not sure which implementation choice keeps
| the best precision (to say nothing of correct rounding mode).
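For reference, here are naive Python versions of the two functions. The edge-case semantics follow my reading of C23 and should be treated as assumptions: odd roots of negative numbers, NaN for domain errors, infinities at poles. They also show why precision is hard. In `rootn`, `1.0 / n` is already rounded before the power is taken. In `compoundn`, any error in `n * log1p(x)` is magnified by the exponential for large n. Neither function is correctly rounded.

```python
import math

def rootn(x: float, n: int) -> float:
    """Naive principal n-th root; not correctly rounded."""
    if n == 0 or (x < 0 and n % 2 == 0):
        return math.nan                        # domain error
    if x == 0.0 and n < 0:
        # Pole: +inf, or an infinity with the sign of x for odd n.
        return math.copysign(math.inf, x) if n % 2 else math.inf
    if x < 0:
        return -((-x) ** (1.0 / n))            # odd root of a negative number
    return x ** (1.0 / n)                      # 1.0 / n is already rounded here

def compoundn(x: float, n: int) -> float:
    """Naive (1 + x) ** n via exp(n * log1p(x)); not correctly rounded."""
    if x < -1:
        return math.nan                        # domain error
    if n == 0:
        return 1.0
    if x == -1:
        return 0.0 if n > 0 else math.inf      # pole for negative n
    try:
        return math.exp(n * math.log1p(x))     # error in the product grows with n
    except OverflowError:
        return math.inf
```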
| Etheryte wrote:
| > but note that you'll still only visit one of the zeros
|
| Out of curiosity, isn't this what you'd pretty much universally
| want? Or maybe I misunderstand the intent here?
| o11c wrote:
| I find `nextup`/`nextdown` useful for when you want to
| calculate `{f(nextdown(x)), f(x), f(nextup(x))}` and see if
| some property holds (or to what degree it holds) for both `x`
| and its surrounding values. It's annoying that one such
| surrounding value gets skipped.
|
| That said, quite often you need to handle signed zero
| specially already, and if you don't the property you're
| testing might sufficiently differ in the interesting way for
| the smallest-magnitude subnormals.
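A minimal Python sketch of that probe, using `math.nextafter` (Python 3.9+). `neighbors` and `holds_nearby` are hypothetical helpers. Around 0.0 the triple is (-5e-324, 0.0, 5e-324), so -0.0 is never among the values tested. That is the skipped zero.

```python
import math
from typing import Callable, Tuple

def neighbors(x: float) -> Tuple[float, float, float]:
    """Return (nextdown(x), x, nextup(x))."""
    return math.nextafter(x, -math.inf), x, math.nextafter(x, math.inf)

def holds_nearby(f: Callable[[float], float],
                 prop: Callable[[float], bool], x: float) -> bool:
    """Check that prop(f(v)) holds for x and for both adjacent doubles."""
    return all(prop(f(v)) for v in neighbors(x))
```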
| clhodapp wrote:
| I guess I just don't get it because the before & after cases
| don't seem to be showing remotely comparable use cases.
|
| In the "before" use-cases, the programmer has accidentally
| written a bug and gets back a single NaN from a logical operation
| they performed as a result.
|
| In the "after" cases, the programmer already has some data and
| explicitly decides to convert it to a NaN encoding. But instead
| of actually getting back a NaN that is secretly carrying their
| data, they actually get a whole array of NaNs, which duck type
| differently, and thus are likely to disappear into another NaN or
| an undefined if they are propagated through an API.
|
| Like.. I get that the whole thing is supposed to be humorously
| absurd but... It just doesn't land unless there's something
| technical I'm missing to connect up the pieces.
| brap wrote:
| It sounds like you might be interested in the Enterprise
| Edition
| roywiggins wrote:
| Four NaNs? Four? That's insane.
|
| https://m.youtube.com/watch?v=feJlRDLX0iQ
___________________________________________________________________
(page generated 2025-04-26 23:00 UTC)