[HN Gopher] NaN Boxing (2020)
       ___________________________________________________________________
        
       NaN Boxing (2020)
        
       Author : jstanley
       Score  : 45 points
       Date   : 2022-09-16 18:51 UTC (4 hours ago)
        
 (HTM) web link (piotrduperas.com)
 (TXT) w3m dump (piotrduperas.com)
        
       | Waterluvian wrote:
       | I'm curious for feedback about the very beginning of this post.
       | 
       | How many people see the "let a" example as "the variable 'a'
       | keeps changing shape/type" and how many see it as, "three
       | different things were created and the variable 'a' kept getting
       | reassigned to point at the newest one."
       | 
       | Or is this not really a distinction people see?
        
         | skybrian wrote:
         | It seems like the beginning of the post avoids the issue.
         | "First _a_ was this, now it's this" doesn't take a position on
         | whether it's really the same object, just how it appears in the
         | debugger.
        
       | throwawaymaths wrote:
       | I wonder if you could do better: instead of stuffing that bit at
       | the bottom as an int marker, do your best to represent the int
       | faithfully (with higher-order NaN bits doing their thing), so you
       | can optimistically issue the add at the same time as you issue
       | the conditional. By the time the conditional resolves, both your
       | float addition and integer addition have come back, and you pick
       | the winner with no branching.
        
         | ridiculous_fish wrote:
         | It's an interesting idea. In practice one must also consider
         | int + double and double + int, so there's four possible paths.
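
A rough C sketch of the speculative add discussed above, assuming a hypothetical `num_t` that carries an explicit tag rather than a real NaN box. Both sums are computed unconditionally and one is selected at the end; widening every operand to double folds the int + double, double + int, and double + double paths into one. Whether a compiler turns the final selection into a branch-free select is not guaranteed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tagged number: `i` is meaningful when is_int, `d` otherwise.
   The unused field is kept at zero so both sums can always be computed. */
typedef struct {
    bool    is_int;
    int32_t i;
    double  d;
} num_t;

static num_t make_int(int32_t i)   { num_t n = { true,  i, 0.0 }; return n; }
static num_t make_double(double d) { num_t n = { false, 0, d   }; return n; }

static num_t add(num_t a, num_t b) {
    /* The constructors keep the unused int field at zero; check that here. */
    assert(a.is_int || a.i == 0);
    assert(b.is_int || b.i == 0);

    /* Float path: covers double + double and both mixed cases. */
    double da = a.is_int ? (double)a.i : a.d;
    double db = b.is_int ? (double)b.i : b.d;
    double fsum = da + db;

    /* Integer path, computed in 64 bits so the sum itself cannot overflow. */
    int64_t isum = (int64_t)a.i + (int64_t)b.i;
    bool use_int = a.is_int && b.is_int &&
                   isum >= INT32_MIN && isum <= INT32_MAX;

    /* Pick the winner once both results exist. */
    num_t r;
    r.is_int = use_int;
    r.i = use_int ? (int32_t)isum : 0;
    r.d = use_int ? 0.0 : fsum;
    return r;
}
```

In this sketch an int + int sum that does not fit in 32 bits falls back to the double result, which is one possible policy and not something the thread specifies.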
        
       | sosodev wrote:
       | I know we're supposed to stay on topic, but I seriously thought
       | this was going to be an article about elderly women punching each
       | other.
        
         | proto_lambda wrote:
         | HN title mangling strikes again. "NaN Boxing", which is very
         | likely what the OP submitted it as, would've been obvious.
        
           | dang wrote:
           | Twont happen again.
        
           | jstanley wrote:
           | Thanks, I hadn't noticed that HN had mangled it. Fixed now.
        
           | [deleted]
        
         | blowski wrote:
         | For a while, this was a popular Christmas novelty gift.
         | https://www.gigglegadgets.co.uk/home/307-boxing-grannies.htm...
        
       | beagle3 wrote:
       | You can also double the boxed space by using subnormals (née
       | denormals) if you don't need them for precision (and if you do
       | need them... my condolences).
        
         | an1sotropy wrote:
         | Can you explain this more? I mean why is the space doubled,
         | exactly?
        
           | [deleted]
        
           | sjrd wrote:
           | Because there are as many subnormal values as NaN values. At
           | least if you don't count the two 0's, which are "paired" with
           | the two infinities.
        
             | an1sotropy wrote:
             | oh right - sorry stupid question.
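
The count sjrd gives can be checked from the bit layout of an IEEE 754 binary64 value. The sketch below assumes `double` is binary64; the helper names are made up for illustration. A NaN has an all-ones exponent and a nonzero fraction, a subnormal has an all-zeros exponent and a nonzero fraction, so each class has 2 signs times (2^52 - 1) fractions. The excluded zero-fraction patterns are the two infinities and the two zeros.

```c
#include <assert.h>
#include <float.h>
#include <stdint.h>
#include <string.h>

#define EXP_MASK  UINT64_C(0x7FF0000000000000)  /* 11 exponent bits */
#define FRAC_MASK UINT64_C(0x000FFFFFFFFFFFFF)  /* 52 fraction bits */

/* Copy out the raw bits of a double (assumes IEEE 754 binary64). */
static uint64_t bits_of(double d) {
    uint64_t b;
    assert(sizeof d == sizeof b);
    memcpy(&b, &d, sizeof b);
    return b;
}

/* Exponent all ones, fraction nonzero. */
static int is_nan_bits(uint64_t b) {
    return (b & EXP_MASK) == EXP_MASK && (b & FRAC_MASK) != 0;
}

/* Exponent all zeros, fraction nonzero. */
static int is_subnormal_bits(uint64_t b) {
    return (b & EXP_MASK) == 0 && (b & FRAC_MASK) != 0;
}

/* Bit patterns in either class: 2 signs * (2^52 - 1) nonzero fractions. */
static uint64_t patterns_per_class(void) {
    return 2 * FRAC_MASK;
}
```

Both classes therefore contain the same number of bit patterns, which is where the "double" in beagle3's comment comes from.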
        
       | maglite77 wrote:
       | It's been ages, but this use of unions reminds me of the VARIANT
       | structure definition from Windows OA/COM:
       | 
       | [1]: https://docs.microsoft.com/en-us/windows/win32/api/oaidl/ns-...
        
       | an1sotropy wrote:
       | Some other things on the same topic:
       | 
       | https://leonardschuetz.ch/blog/nan-boxing/
       | 
       | https://anniecherkaev.com/the-secret-life-of-nan
       | 
       | Is NaN-boxing actually used in the major browser JS engines?
        
         | moonchild wrote:
         | Afaik, it is not used by Chrome, but it is used by Firefox.
         | LuaJIT also uses it.
        
         | ridiculous_fish wrote:
         | v8 does not use NaN-boxes; instead they use the low bit to
         | distinguish between a 31-bit small integer ("smi") and a
         | pointer. Doubles are additionally sometimes stored inline
         | ("double field unboxing"); I'm not sure how this works exactly.
         | Other times they are heap-allocated. I am not sure if there is
         | a specialized double-allocator; I'd like to know.
         | 
         | JavaScriptCore uses a tweaked NaN-box [1]: values are stored
         | via a NaN box minus a constant, which avoids requiring a mask
         | when chasing pointers. This makes pointers cheaper but floating
         | point operations more expensive.
         | 
         | SpiderMonkey and Hermes both use straight NaN-boxing to my
         | knowledge.
         | 
         | [1] https://github.com/WebKit/WebKit/blob/ec6b5337e777f9b460ec6b...
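
The low-bit scheme described for v8 can be sketched in C as follows. This is an illustration of the general technique, not v8's code: the names are hypothetical, and the convention that a clear low bit means "small integer" while a set low bit means "pointer" is an assumption of the sketch. It relies on pointers being at least 2-byte aligned so their low bit is free.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t word_t;

/* Low bit clear: the word holds a small integer in its upper bits. */
static int is_smi(word_t w) { return (w & 1u) == 0; }

static word_t from_smi(int32_t i) {
    /* Only 31 bits of integer fit next to the tag bit. */
    assert(i >= -(INT32_C(1) << 30) && i < (INT32_C(1) << 30));
    return (word_t)(intptr_t)i * 2u;   /* shift left by one, tag bit = 0 */
}

static int32_t to_smi(word_t w) {
    assert(is_smi(w));
    /* The word is even, so dividing by 2 undoes the encoding exactly.
       The unsigned-to-signed conversion assumes two's complement. */
    return (int32_t)((intptr_t)w / 2);
}

/* Low bit set: the word holds an aligned pointer with the tag added. */
static word_t from_ptr(void *p) {
    word_t w = (word_t)p;
    assert((w & 1u) == 0);             /* alignment leaves the low bit free */
    return w | 1u;
}

static void *to_ptr(word_t w) {
    assert(!is_smi(w));
    return (void *)(w - 1u);           /* strip the tag */
}
```

Doubles do not fit in this representation, which is why, as the comment notes, they end up either stored inline in special cases or heap-allocated behind a pointer.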
        
       | kmeisthax wrote:
       | NaN boxing sounds like it would have very significant pointer
       | provenance issues. CHERI would also stick its nose up at it.
       | 
       | The tagged pointer type has the advantage of holding _real
       | pointers_, meaning that any machine[0] that sticks extra
       | information onto its pointers transparently just works. And
       | reading the int half of the union to check if the pointer is
       | valid should always be sound, but I'm not 100% sure on that.
       | Architectures that don't enforce alignment are also probably not
       | the sort of thing we care about running on, and I doubt there's a
       | way for an optimizer to legally screw over programs that depend
       | on pointer alignment being there on architectures where it has to
       | be there.
       | 
       | But, then again, that's what we said about int-to-pointer
       | casts...
       | 
       | [0] Real or virtual. Remember that when you compile your program
       | with an optimizing compiler or JIT, it is executing on two
       | architectures:
       | 
       | 1. The compiler's expression/AST interpreter, which enforces ISO
       | C undefined behavior rules
       | 
       | 2. The target architecture, where those undefined behaviors
       | become merely unspecified.
        
         | armchairhacker wrote:
         | Any incompatibility can be resolved by having an explicit "NaN
         | box" type. For OSs where pointers always end in b000 and NaN
         | boxing just works, it compiles to 8 bytes. For weird OSs like
         | CHERI or those which store metadata in pointers, it compiles to
         | an ordinary tagged union.
        
           | Dylan16807 wrote:
           | NaN boxing doesn't need pointers to end in b000. You need
           | more than ten bits that don't matter. So you need something
           | like 48 bit pointers, and once you have that you're already
           | done. That's enough to tag 30 different types of pointer and
           | every smaller type.
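
A minimal C sketch of the point above: the box needs spare high bits, not aligned low bits. The layout here is hypothetical and simpler than what Dylan16807 counts (it spends the sign bit and quiet bit on marking a box and keeps only 3 tag bits), and it assumes IEEE 754 binary64 doubles and pointers that fit in 48 bits. Real NaNs are canonicalized on the way in so they never collide with a boxed value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: sign + exponent + quiet bit all set marks a box,
   bits 48..50 hold a tag, bits 0..47 hold the payload. */
#define BOX_MASK     UINT64_C(0xFFF8000000000000)
#define CANON_NAN    UINT64_C(0x7FF8000000000000)
#define PAYLOAD_MASK UINT64_C(0x0000FFFFFFFFFFFF)
#define TAG_SHIFT    48

typedef uint64_t value_t;

static int is_boxed(value_t v) { return (v & BOX_MASK) == BOX_MASK; }

static value_t box_double(double d) {
    value_t v;
    assert(sizeof d == sizeof v);
    if (d != d) return CANON_NAN;   /* keep real NaNs out of the box space */
    memcpy(&v, &d, sizeof v);
    return v;
}

static double unbox_double(value_t v) {
    double d;
    assert(!is_boxed(v));
    memcpy(&d, &v, sizeof d);
    return d;
}

static value_t box_ptr(void *p, unsigned tag) {
    uintptr_t addr = (uintptr_t)p;
    assert(tag < 8);
    assert(((value_t)addr & ~PAYLOAD_MASK) == 0);  /* fits in 48 bits */
    return BOX_MASK | ((value_t)tag << TAG_SHIFT) | (value_t)addr;
}

static unsigned box_tag(value_t v) {
    assert(is_boxed(v));
    return (unsigned)((v >> TAG_SHIFT) & 7u);
}

static void *unbox_ptr(value_t v) {
    assert(is_boxed(v));
    return (void *)(uintptr_t)(v & PAYLOAD_MASK);
}
```

Note that `unbox_ptr` has to mask the tag off before the pointer can be used, which is the cost the JavaScriptCore variant mentioned elsewhere in the thread is designed to avoid.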
        
         | klodolph wrote:
         | I don't think that pointer provenance rears its ugly head here.
         | You are generally free to convert a pointer to an integer and
         | then back, which is what's happening here, and you get pointer
         | provenance problems because different pointers may be supposed
         | to point to different objects. Basically, casting to an integer
         | and back is supposed to be "safe", but doing math and getting
         | pointers to different objects is not.
         | 
         | Note that there are some implementation-dependent factors here
         | which I'm not getting into.
         | 
         | > And reading the int half of the union to check if the pointer
         | is valid should always be sound, but I'm not 100% sure on that.
         | 
         | After various alias problems a while back (Linux kernel folks
         | had some words), everyone got together and agreed that you can
         | do type punning through a union... you just get the byte
         | representation of one type, reinterpreted as the byte
         | representation of another type. This was codified a few C
         | standards ago and it's fairly explicit in the spec now.
         | 
         | I'll also just mention that you don't need an architecture that
         | enforces alignment--you just need an allocator that returns
         | aligned pointers.
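
For reference, the union-based type punning described above looks like this in C. It is a sketch that assumes `double` is IEEE 754 binary64 and the same size as `uint64_t`; the type and function names are made up for illustration. This applies to C only; C++ has different rules for reading an inactive union member.

```c
#include <assert.h>
#include <stdint.h>

/* Two views of the same 8 bytes. */
typedef union {
    double   d;
    uint64_t bits;
} pun_t;

/* Write as double, read back the byte representation as an integer. */
static uint64_t double_bits(double d) {
    pun_t p;
    assert(sizeof p.d == sizeof p.bits);
    p.d = d;
    return p.bits;
}

/* Write as integer, reinterpret the same bytes as a double. */
static double bits_double(uint64_t b) {
    pun_t p;
    assert(sizeof p.d == sizeof p.bits);
    p.bits = b;
    return p.d;
}
```

Copying the bytes with `memcpy` is an equivalent alternative that is also valid in C++.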
        
           | MaulingMonkey wrote:
           | > I don't think that pointer provenance rears its ugly head
           | here. You are generally free to convert a pointer to an
           | integer and then back
           | 
           | It's worth noting that Rust is experimenting with making this
           | _not_ so free for one to do:
           | 
           | https://doc.rust-lang.org/std/ptr/index.html#pointer-usize-p...
        
           | wahern wrote:
           | > I don't think that pointer provenance rears its ugly head
           | here. You are generally free to convert a pointer to an
           | integer and then back, which is what's happening here
           | 
           | C [optionally] supports converting pointers to intptr_t (or
           | uintptr_t), not to any integer type (even with nominally
           | sufficient width), and certainly not to a floating point
           | type, which is how NaN boxing works.
           | 
           | In CHERI intptr_t requires special treatment by the compiler.
           | Plus, the nominal width of both pointers and intptr_t doubles
           | in size--128 bits on 64-bit architectures. Most environments
           | don't even have a 128-bit floating point type, even when the
           | hardware might support it. (According to Wikipedia, Fortran
           | is an exception, but it's still uncommon in C and other
           | popular language environments.)
        
       ___________________________________________________________________
       (page generated 2022-09-16 23:00 UTC)