[HN Gopher] NaN Boxing (2020)
___________________________________________________________________
NaN Boxing (2020)
Author : jstanley
Score : 45 points
Date : 2022-09-16 18:51 UTC (4 hours ago)
(HTM) web link (piotrduperas.com)
(TXT) w3m dump (piotrduperas.com)
| Waterluvian wrote:
| I'm curious for feedback about the very beginning of this post.
|
| How many people see the "let a" example as "the variable 'a'
| keeps changing shape/type" and how many see it as, "three
| different things were created and the variable 'a' kept getting
| reassigned to point at the newest one."
|
| Or is this not really a distinction people see?
| skybrian wrote:
| It seems like the beginning of the post avoids the issue.
| "First _a_ was this, now it's this" doesn't take a position on
| whether it's really the same object, just how it appears in the
| debugger.
| throwawaymaths wrote:
| I wonder if you could do better: instead of stuffing that bit at
| the bottom as an int marker, do your best to represent the int
| faithfully (with the higher-order NaN bits doing their thing) so
| you can optimistically issue both adds at the same time as the
| conditional. By the time the conditional resolves, both your float
| addition and your integer addition have come back, and you pick
| the winner with no branching.
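A minimal C sketch of the "compute both, pick one with a mask" idea. The value layout (raw doubles, int32 values under a hypothetical INT_TAG in the NaN space) and all function names are assumptions for illustration, not the article's scheme. Only int+int and double+double are handled, and int overflow simply wraps; whether the compiler actually emits branch-free code is up to it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: doubles are stored as raw bits, int32 values sit
   in the NaN space under INT_TAG (a quiet NaN pattern). */
#define TAG_MASK  0xFFFF000000000000ull
#define INT_TAG   0xFFF9000000000000ull
#define CANON_NAN 0x7FF8000000000000ull

typedef uint64_t value;

static value box_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    /* Canonicalize NaNs so no real double can look like a boxed int. */
    return d != d ? CANON_NAN : bits;
}

static double unbox_double(value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

static value box_int(int32_t i) { return INT_TAG | (uint32_t)i; }
static int32_t unbox_int(value v) { return (int32_t)(uint32_t)v; }
static int is_int(value v) { return (v & TAG_MASK) == INT_TAG; }

/* Compute both candidate sums unconditionally, then select with a mask. */
static value add_speculative(value a, value b) {
    /* 32-bit wrap-around; a real engine would detect overflow instead. */
    value int_sum = box_int((int32_t)((uint32_t)a + (uint32_t)b));
    /* For two boxed ints this is NaN + NaN: harmless, and discarded. */
    value dbl_sum = box_double(unbox_double(a) + unbox_double(b));
    /* All ones if both operands are ints, all zeros otherwise. */
    uint64_t mask = 0 - (uint64_t)(is_int(a) & is_int(b));
    return (int_sum & mask) | (dbl_sum & ~mask);
}
```

Mixed int/double operands would need their own paths, so this only shows the shape of the trick.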
| ridiculous_fish wrote:
| It's an interesting idea. In practice one must also consider
| int + double and double + int, so there's four possible paths.
| sosodev wrote:
| I know we're supposed to stay on topic, but I seriously thought
| this was going to be an article about elderly women punching each
| other.
| proto_lambda wrote:
| HN title mangling strikes again. "NaN Boxing", which is very
| likely what the OP submitted it as, would've been obvious.
| dang wrote:
| Twont happen again.
| jstanley wrote:
| Thanks, I hadn't noticed that HN had mangled it. Fixed now.
| [deleted]
| blowski wrote:
| For a while, this was a popular Christmas novelty gift.
| https://www.gigglegadgets.co.uk/home/307-boxing-grannies.htm...
| beagle3 wrote:
| You can also double the boxed space by using subnormals (née
| denormals) if you don't need them for precision (and if you do
| need them... my condolences).
| an1sotropy wrote:
| Can you explain this more? I mean why is the space doubled,
| exactly?
| [deleted]
| sjrd wrote:
| Because there are as many subnormal values as NaN values. At
| least if you don't count the two 0's, which are "paired" with
| the two infinities.
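A small C sketch of the counting argument, assuming IEEE 754 binary64: it builds doubles from their raw fields and classifies the boundary patterns. Per sign, an all-ones exponent with a non-zero mantissa is a NaN and an all-zeros exponent with a non-zero mantissa is a subnormal, so both ranges hold 2^52 - 1 patterns; a zero mantissa gives infinity and zero respectively. The helper name is hypothetical.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Build a double from its binary64 fields: 1 sign bit, 11 exponent bits,
   52 mantissa bits. */
static double from_fields(unsigned sign, unsigned exponent, uint64_t mantissa) {
    uint64_t bits = ((uint64_t)sign << 63) | ((uint64_t)(exponent & 0x7FF) << 52)
                  | (mantissa & 0xFFFFFFFFFFFFFull);
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Non-zero mantissas, times two signs, for each exponent extreme. */
#define MANTISSA_PATTERNS (1ull << 52)
static const uint64_t nan_patterns       = 2 * (MANTISSA_PATTERNS - 1);
static const uint64_t subnormal_patterns = 2 * (MANTISSA_PATTERNS - 1);
```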
| an1sotropy wrote:
| oh right - sorry stupid question.
| maglite77 wrote:
| It's been ages, but this use of unions reminds me of the VARIANT
| structure definition [1] from Windows OLE Automation/COM:
|
| [1]: https://docs.microsoft.com/en-us/windows/win32/api/oaidl/ns-...
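For comparison with NaN boxing, a minimal C sketch of the tagged-union style that VARIANT follows: an explicit type tag next to a union of payloads. The type and member names are hypothetical, and the real VARIANT has many more members.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical tag set for the sketch. */
typedef enum { VAL_NIL, VAL_BOOL, VAL_INT, VAL_DOUBLE, VAL_PTR } val_type;

/* The tag says which union member is currently valid. */
typedef struct {
    val_type type;
    union {
        int     b;
        int64_t i;
        double  d;
        void   *p;
    } as;
} tagged_value;

static tagged_value make_double(double d) {
    tagged_value v = { .type = VAL_DOUBLE, .as.d = d };
    return v;
}

static tagged_value make_int(int64_t i) {
    tagged_value v = { .type = VAL_INT, .as.i = i };
    return v;
}
```

On typical 64-bit targets the tag plus padding makes this 16 bytes, versus 8 for a NaN-boxed value, which is the space saving NaN boxing is after.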
| an1sotropy wrote:
| Some other things on the same topic:
|
| https://leonardschuetz.ch/blog/nan-boxing/
|
| https://anniecherkaev.com/the-secret-life-of-nan
|
| Is NaN-boxing actually used in the major browser JS engines?
| moonchild wrote:
| AFAIK, it is not used by Chrome, but it is used by Firefox.
| LuaJIT also uses it.
| ridiculous_fish wrote:
| V8 does not use NaN-boxes; instead it uses the low bit to
| distinguish between a 31-bit small integer ("smi") and a
| pointer. Doubles are additionally sometimes stored inline
| ("double field unboxing"); I'm not sure how this works exactly.
| Other times they are heap-allocated. I am not sure if there is
| a specialized double-allocator; I'd like to know.
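A minimal C sketch of the low-bit scheme described above: bit 0 clear means a 31-bit integer shifted left by one, bit 0 set means a pointer with 1 added. This follows the convention V8 is commonly described as using, but the names and details here are hypothetical, not V8's code.

```c
#include <assert.h>
#include <stdint.h>

typedef uintptr_t tagged;

/* 31-bit signed range that survives a shift left by one in 32 bits. */
#define SMI_MIN (-(1 << 30))
#define SMI_MAX ((1 << 30) - 1)

static int fits_smi(int32_t i) { return i >= SMI_MIN && i <= SMI_MAX; }

/* Shift in unsigned arithmetic to avoid UB on negative values. */
static tagged make_smi(int32_t i) { return (tagged)((uint32_t)i << 1); }

/* The shifted value is even, so dividing by 2 is exact. */
static int32_t smi_value(tagged t) { return (int32_t)(uint32_t)t / 2; }

static int is_smi(tagged t) { return (t & 1) == 0; }

/* Needs objects aligned to at least 2 bytes so bit 0 is free. */
static tagged make_ptr(void *p) {
    assert(((uintptr_t)p & 1) == 0);
    return (uintptr_t)p | 1;
}

static void *ptr_value(tagged t) { return (void *)(t - 1); }
```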
|
| JavaScriptCore uses a tweaked NaN-box [1]: values are stored
| via a NaN box minus a constant, which avoids requiring a mask
| when chasing pointers. This makes pointers cheaper but floating
| point operations more expensive.
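A C sketch of the offset idea, assuming user-space pointers fit below 2^48. The constants (a 2^49 offset added to doubles, a 0xFFFE... tag for int32 values) are modeled on how JavaScriptCore's scheme is commonly described, but treat them and all names as assumptions and check JSValue.h for the real ones. Pointers are stored raw, so dereferencing needs no mask, while every double pays an add or subtract.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumed constants, in the style of JavaScriptCore's encoding. */
#define DOUBLE_OFFSET (1ull << 49)
#define NUMBER_TAG    0xFFFE000000000000ull
#define CANON_NAN     0x7FF8000000000000ull

typedef uint64_t jsval;

/* Adding the offset moves every double's top 16 bits into 0x0002..0xFFF2,
   away from raw pointers (0x0000) and ints (0xFFFE). NaNs are canonicalized
   first, or a NaN could wrap around into the pointer or int range. */
static jsval from_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    if (d != d) bits = CANON_NAN;
    return bits + DOUBLE_OFFSET;
}

static double to_double(jsval v) {
    uint64_t bits = v - DOUBLE_OFFSET;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

static jsval from_int(int32_t i) { return NUMBER_TAG | (uint32_t)i; }
static int32_t to_int(jsval v)   { return (int32_t)(uint32_t)v; }

/* Pointers are stored unchanged: no masking needed to chase them. */
static jsval from_ptr(void *p) { return (jsval)(uintptr_t)p; }
static void *to_ptr(jsval v)   { return (void *)(uintptr_t)v; }

static int is_int(jsval v)    { return (v & NUMBER_TAG) == NUMBER_TAG; }
static int is_ptr(jsval v)    { return (v & NUMBER_TAG) == 0; }
static int is_double(jsval v) { return !is_int(v) && !is_ptr(v); }
```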
|
| SpiderMonkey and Hermes both use straight NaN-boxing to my
| knowledge.
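For contrast, a minimal C sketch of "straight" NaN boxing under a hypothetical generic layout (not SpiderMonkey's or Hermes' actual tag assignment): ordinary doubles are stored unchanged, and a pointer lives in the low 48 bits of a quiet NaN with the sign bit set. Here the doubles are free and the pointer needs a mask to extract, which is the reverse of the trade-off described above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical generic layout for the sketch. */
#define QNAN      0x7FFC000000000000ull
#define SIGN_BIT  0x8000000000000000ull
#define PTR_TAG   (SIGN_BIT | QNAN)
#define PTR_MASK  0x0000FFFFFFFFFFFFull
#define CANON_NAN 0x7FF8000000000000ull

typedef uint64_t value;

static value box_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    /* Keep real NaNs out of the tag space. */
    return d != d ? CANON_NAN : bits;
}

static double unbox_double(value v) {
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* Assumes the pointer fits in 48 bits. */
static value box_ptr(void *p) {
    assert((uintptr_t)p <= PTR_MASK);
    return PTR_TAG | (uint64_t)(uintptr_t)p;
}

/* Extracting the pointer requires masking off the tag bits. */
static void *unbox_ptr(value v) { return (void *)(uintptr_t)(v & PTR_MASK); }

static int is_ptr(value v)    { return (v & PTR_TAG) == PTR_TAG; }
static int is_double(value v) { return (v & QNAN) != QNAN; }
```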
|
| https://github.com/WebKit/WebKit/blob/ec6b5337e777f9b460ec6b...
| kmeisthax wrote:
| NaN boxing sounds like it would have very significant pointer
| provenance issues. CHERI would also turn its nose up at it.
|
| The tagged pointer type has the advantage of holding _real
| pointers_, meaning that any machine[0] that sticks extra
| information onto its pointers transparently just works. And
| reading the int half of the union to check if the pointer is
| valid should always be sound, but I'm not 100% sure on that.
| Architectures that don't enforce alignment are also probably not
| the sort of thing we care about running on, and I doubt there's a
| way for an optimizer to legally screw over programs that depend
| on pointer alignment being there on architectures where it has to
| be there.
|
| But, then again, that's what we said about int-to-pointer
| casts...
|
| [0] Real or virtual. Remember that when you compile your program
| with an optimizing compiler or JIT, it is executing on two
| architectures:
|
| 1. The compiler's expression/AST interpreter, which enforces ISO
| C undefined behavior rules
|
| 2. The target architecture, where those undefined behaviors
| become merely unspecified.
| armchairhacker wrote:
| Any incompatibility can be resolved by having an explicit "NaN
| box" type. For platforms where pointers always end in b000 and NaN
| boxing just works, it compiles to 8 bytes. For weird platforms like
| CHERI or those which store metadata in pointers, it compiles to
| an ordinary tagged union.
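A minimal C sketch of such an explicit box type: the same small API compiles either to an 8-byte NaN-boxed integer or to a tagged union. The build switch VALUE_USE_TAGGED_UNION and every name here are hypothetical; a real project would choose the switch per target.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#ifdef VALUE_USE_TAGGED_UNION
/* Fallback for targets where NaN boxing is not viable. */
typedef struct {
    enum { V_DOUBLE, V_PTR } tag;
    union { double d; void *p; } as;
} value;

static value make_double(double d) { value v; v.tag = V_DOUBLE; v.as.d = d; return v; }
static value make_ptr(void *p)     { value v; v.tag = V_PTR; v.as.p = p; return v; }
static int is_ptr(value v)         { return v.tag == V_PTR; }
static double as_double(value v)   { return v.as.d; }
static void *as_ptr(value v)       { return v.as.p; }
#else
/* 8-byte NaN box; assumes pointers fit in 48 bits. */
typedef uint64_t value;
#define PTR_TAG  0xFFFC000000000000ull
#define PTR_MASK 0x0000FFFFFFFFFFFFull

static value make_double(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return d != d ? 0x7FF8000000000000ull : bits; /* canonical NaN */
}
static value make_ptr(void *p)   { return PTR_TAG | (uint64_t)(uintptr_t)p; }
static int is_ptr(value v)       { return (v & PTR_TAG) == PTR_TAG; }
static double as_double(value v) { double d; memcpy(&d, &v, sizeof d); return d; }
static void *as_ptr(value v)     { return (void *)(uintptr_t)(v & PTR_MASK); }
#endif
```

Callers only see make_double, make_ptr, is_ptr, as_double and as_ptr, so switching representations does not touch them.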
| Dylan16807 wrote:
| NaN boxing doesn't need pointers to end in b000. You need
| more than ten bits that don't matter. So you need something
| like 48-bit pointers, and once you have that you're already
| done. That's enough to tag 30 different types of pointer and
| every smaller type.
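One way to see roughly where that figure comes from, assuming 48-bit pointers stored in the low mantissa bits: a NaN needs its 11 exponent bits all set, leaving the sign bit and the 52 mantissa bits free. With 48 of those taken by the pointer, the remaining tag bits give

```latex
2^{\,1 + 52 - 48} = 2^{5} = 32
```

tag patterns before any are reserved. Setting a couple aside so real infinities and NaNs stay distinguishable lands in the neighborhood of the 30 mentioned above; the exact count depends on what a given design reserves.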
| klodolph wrote:
| I don't think that pointer provenance rears its ugly head here.
| You are generally free to convert a pointer to an integer and
| then back, which is what's happening here. Pointer provenance
| problems come up when pointers that are supposed to point to
| different objects get mixed up. Basically, casting to an integer
| and back is supposed to be "safe", but doing math and getting
| pointers to different objects is not.
|
| Note that there are some implementation-dependent factors here
| which I'm not getting into.
|
| > And reading the int half of the union to check if the pointer
| is valid should always be sound, but I'm not 100% sure on that.
|
| After various alias problems a while back (Linux kernel folks
| had some words), everyone got together and agreed that you can
| do type punning through a union... you just get the byte
| representation of one type, reinterpreted as the byte
| representation of another type. This was codified a few C
| standards ago and it's fairly explicit in the spec now.
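A minimal C sketch of union type punning as described, next to the memcpy form that is also valid in C++ (where reading an inactive union member is not allowed). The names are hypothetical; the C rule is the one added to C99 via a later corrigendum and kept in C11.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Write one member, read the other: the bytes are reinterpreted. */
typedef union {
    double   d;
    uint64_t bits;
} pun;

static uint64_t bits_via_union(double d) {
    pun p;
    p.d = d;
    return p.bits;
}

/* Equivalent byte copy; compilers typically reduce it to a register move. */
static uint64_t bits_via_memcpy(double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}
```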
|
| I'll also just mention that you don't need an architecture that
| enforces alignment--you just need an allocator that returns
| aligned pointers.
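A minimal C sketch of that point: low-bit tags rely on the allocator's alignment guarantee, checked here at compile time against max_align_t, rather than on the CPU enforcing alignment. The tag values and function names are hypothetical. The pointer goes through uintptr_t, and untagging restores the exact integer value that came out of the original conversion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* malloc returns memory aligned for max_align_t, so the low 3 bits of
   its results are zero whenever that alignment is at least 8. */
_Static_assert(_Alignof(max_align_t) >= 8, "need 8-byte-aligned allocations");

#define TAG_BITS 3u
#define TAG_MASK ((uintptr_t)((1u << TAG_BITS) - 1))

/* Hypothetical tag values for the sketch. */
enum { TAG_STRING = 1, TAG_ARRAY = 2 };

static uintptr_t tag_ptr(void *p, unsigned tag) {
    assert(((uintptr_t)p & TAG_MASK) == 0 && tag <= TAG_MASK);
    return (uintptr_t)p | tag;
}

static unsigned get_tag(uintptr_t t) { return (unsigned)(t & TAG_MASK); }

/* Clearing the tag yields the original uintptr_t value again. */
static void *untag_ptr(uintptr_t t) { return (void *)(t & ~TAG_MASK); }
```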
| MaulingMonkey wrote:
| > I don't think that pointer provenance rears its ugly head
| here. You are generally free to convert a pointer to an
| integer and then back
|
| It's worth noting that Rust is experimenting with making this
| _not_ so free for one to do:
|
| https://doc.rust-lang.org/std/ptr/index.html#pointer-usize-p...
| wahern wrote:
| > I don't think that pointer provenance rears its ugly head
| here. You are generally free to convert a pointer to an
| integer and then back, which is what's happening here
|
| C [optionally] supports converting pointers to intptr_t (or
| uintptr_t), not to any integer type (even with nominally
| sufficient width), and certainly not to a floating point
| type, which is how NaN boxing works.
|
| In CHERI, intptr_t requires special treatment by the compiler.
| Plus, the nominal width of both pointers and intptr_t doubles
| in size, to 128 bits on 64-bit architectures. Most environments
| don't even have a 128-bit floating point type, even when the
| hardware might support it. (According to Wikipedia, Fortran
| is an exception, but it's still uncommon in C and other
| popular language environments.)
___________________________________________________________________
(page generated 2022-09-16 23:00 UTC)