[HN Gopher] One does not simply calculate the absolute value
___________________________________________________________________
One does not simply calculate the absolute value
Author : mot2ba
Score : 93 points
Date : 2021-08-22 18:12 UTC (4 hours ago)
(HTM) web link (habr.com)
(TXT) w3m dump (habr.com)
| Sharlin wrote:
| Computing the absolute value of a signed (fixed-size, two's
| complement) integer is not trivial either, because there is no
| positive i such that `abs(INT_MIN) == i`. Some languages like
| Java just give you `abs(INT_MIN) == INT_MIN` because:
| -100...000 = ~100...000 + 1 = 011...111 + 1 = 100_000
|
| where ~ is the bitwise complement. Even worse, in C and C++
| specifically, because overflow of signed integers is undefined
| behavior, so is `abs(INT_MIN)`! To be able to represent the
| absolute value, you need to either return the corresponding
| unsigned type (if your language has them), promote to a wider
| type, or return a tuple `(carry_bit, result)`.
| m4r35n357 wrote:
| Bah! Just square & take the square root!
| kurthr wrote:
| And don't forget complex numbers or higher dimensionality.
| beervirus wrote:
| Does this correctly handle all the different NaN values, weird
| corner cases, etc. properly?
| pedrocr wrote:
| For double it would be harder to check but for float (that
| should have the same corner cases) it's easy to just check all
| possible values if you have a gold standard. I've been doing
| that for image processing functions and it's amazing how many
| corner cases you fix and keep fixed by writing unit tests that
| iterate over all possible values or a good subset of them when
| that's not possible.
| JadeNB wrote:
| > Does this correctly handle all the different NaN values,
| weird corner cases, etc. properly?
|
| Of course the answer to "have you covered all weird corner
| cases" is _always_ 'no', but the article addresses the issue of
| different NaNs:
|
| > [...] However, it turns out that doubleToLongBits is also not
| entirely trivial, because it canonicalizes NaNs. There are many
| ways to encode a not-a-number as a double, but only one of them
| is canonical. These different NaNs are even more similar twins,
| they can be distinguished neither via Double.compare, nor via
| any operation. Even string representation is the same. But they
| look different in computer memory. To avoid surprises,
| doubleToLongBits converts any NaN to the canonical form, which
| is encoded in long as 0x7ff8000000000000L. Of course, this
| procedure adds more conditions, which we do not need here.
|
| > What can we do? It appears that there's another method
| doubleToRawLongBits. It doesn't do any smart conversion of NaN
| and just returns exactly the same bit representation: [...]
| sp332 wrote:
| The question is whether that last technique, of just flipping
| the sign bit, handles NaNs and infinities.
| pm215 wrote:
| It's the required semantics of the IEEE floating point
| spec's absolute-value operation. This does mean that abs()
| doesn't behave like eg addition in that it won't set fp
| exception flags for signaling NaNs, or canonicalize NaNs,
| or set a flag to note that this was a denormal number. But
| at the language level "follows IEEE semantics" is probably
| overall less surprising.
|
| chs() ("flip the sign bit") is the other operation that's
| defined as effectively just an operation on the
| representation rather than a "do a real floating point
| computation".
| MontyCarloHall wrote:
| Yes.
|
| The sign bit is totally independent of the value being
| encoded, whether that's a number, q/sNaN, or infinity.
|
| Nitpick: we are not flipping the sign bit, but rather
| setting it to 0.
| klyrs wrote:
| Yeah, the title is a head-scratcher. It _is_ simple to
| calculate the absolute value. Only, it doesn 't look simple
| in a high-level language.
| andi999 wrote:
| Does anybody know a good reference on the minus 0 thing? How do
| you end up with it? Is 4-4=+0.0 and -4 +4=-0.0? Or is it
| different?
| aaaaaaaaaaab wrote:
| Both are exactly 0.0.
|
| Regarding signed zero, read this paper from Kahan:
| https://homes.cs.washington.edu/~ztatlock/599z-17sp/papers/b...
| gvx wrote:
| The easiest way to get a signed zero is with a unary minus
| operation. That is, the expression -x produces -0.0 when x is
| an unsigned zero (so 0.0)
| MontyCarloHall wrote:
| No, those both yield +0.
|
| 0/(-x) would return -0, for all x > 0
|
| Likewise, underflows on negative numbers would also return -0.
| thaumasiotes wrote:
| > 0/(-x) would return -0, for all x > 0
|
| I find this one a little weird, but it seems to be true.
|
| The more normal way to get -0 would be to divide a positive
| number by negative infinity, or a negative number by positive
| infinity.
| 63 wrote:
| From some quick testing, -1.0 * 0.0 = -0.0 and the order
| doesn't matter.
| da_chicken wrote:
| > _For example, if you divide 1.0 by +0.0 and -0.0, you get
| completely different answers: +Infinity and -Infinity._
|
| This must be a Java-ism. In most languages I've used you get a
| division by zero error or NaN.
| titzer wrote:
| It's IEEE-754.
| magicalhippo wrote:
| And there are algorithms that rely on this, like this[1]
| branchless axis-aligned bounding box intersection test.
|
| It exploits the division by zero sign logic of IEEE-754, so
| that the min/max operations used also handles the edge cases
| where the ray is parallel to an axis.
|
| [1]: https://tavianator.com/2011/ray_box.html
| wodenokoto wrote:
| While parent isn't correct in their assumption, this was also
| my first thought after reading the article and I appreciate the
| ensuing discussion.
|
| I hope others will take the entire thread into consideration
| before downvoting.
| kmm wrote:
| Which languages did you try? I found that dividing 1.0 by 0.0
| gives Infinity in Java, C, Julia, JavaScript, Ruby, Clojure,
| Haskell and Rust. I get division errors in Python and Perl.
|
| Curiously, Go gives me a compiler error if I divide constant
| literals, but assigning 0.0 to a variable and dividing by it
| does work. On top of that, assigning -0.0 to a float variable
| leads to it having the value +0.0, which I would call a
| borderline compiler bug. Both are probably a consequence of Go
| constant literals not being IEEE-754 floats but some arbitrary
| precision number types, but it's very unexpected behavior.
| da_chicken wrote:
| > Which languages did you try?
|
| Python and C#.
| Kranar wrote:
| Python does throw an exception but C# does not.
| kmm wrote:
| Dividing 1.0 by 0.0 gives [?] in C# as well when I try it.
| da_chicken wrote:
| You're right. I must've divided 0 by 0 and -0. That's the
| only way C# gives me NaN. I thought I'd specifically
| avoided doing that.
| ketzu wrote:
| +/- Infinity seems to be correct according to IEEE754.
|
| IEEE754 section 7.3:
|
| The divideByZero exception shall be signaled if and only if an
| exact infinite result is defined for an operation on finite
| operands. The default result of divideByZero shall be an [?]
| correctly signed according to the operation: -- For division,
| when the divisor is zero and the dividend is a finite non-zero
| number, the sign of the infinity is the exclusive OR of the
| operands' signs (see 6.3). -- For logB(0) when logBFormat is a
| floating-point format, the sign of the infinity is minus
| (-[?]).
| Agentlien wrote:
| This is what you get according to IEE 754.
|
| You'd get NaN from 0.0/0.0 or division by zero if you did 1/0
| as integer division.
| titzer wrote:
| Haha, -0. After spending literal _years_ of my life dealing
| with that little encoding nuisance, in everything from the V8
| TurboFan JIT, to WebAssembly, to adding floating point to
| Virgil, I just gotta say, that little turd is so not worth it.
| I 've read a lot of different justifications for it (it's a
| limit, it represents _approaching_ 0 from the negatives, some
| algorithms _need_ it, etc), I have to admit, I just don 't buy
| it. It's an encoding artifact that's produced so much
| accidental complexity that, in my book, has been _far_ more
| trouble than it 's been worth.
|
| Change my mind.
| darig wrote:
| -mind
| nimish wrote:
| How else would you represent a branch cut or even a jump at
| 0?
|
| https://people.freebsd.org/~das/kahan86branch.pdf from the
| father of ieee-754 is illuminating.
| FabHK wrote:
| Haha, was just going to link that. Here's an ELI5, though
| not that illuminating:
|
| https://hackernoon.com/negative-zero-bbd5fd790af3
|
| But, bottom line, Kahan (not only the father of IEEE 754,
| but also, with Golub, father of the SVD!) has thought about
| it a lot, and though he called the signed zero "a pain in
| the ass", he also said "There really wasn't a way around
| that and you were stuck with it."
|
| http://history.siam.org/pdfs2/Kahan_final.pdf
| croes wrote:
| So for programming convenience we ignore correct math?
| FabHK wrote:
| We cannot represent "correct math" on a computer, because
| the set of reals is uncountable, but in 64 bits we can
| only represent finitely many numbers, not infinitely
| many, let alone uncountably many.
|
| And given that we have to implement some sort of
| approximation to the usual math on a closed subset of the
| reals (rationals, even), why not extend it with some
| "numbers" (negative zero, plus/minus infinity, NaNs)
| that, yes, enable programming convenience?
| croes wrote:
| There is a difference between 2.4987665678 as 2.5 and
| undefined as +-infinity especially because 1.0/+0.0 is
| +infinity and 1/0 is just an exception in Java. Shouldn't
| it be the same result?
| FabHK wrote:
| > In most languages I've used you get a division by zero error
| or NaN
|
| Or it raises an exception, which once disabled a navy warship
| for hours:
|
| > Now decommissioned, the USS Yorktown was among the first
| warhips extensively computerized to reduce crew (by 10% to 374)
| and costs (by $2.8 million per year).
|
| > On 21 Sept. 1997, the Yorktown was maneuvering off the coast
| of Cape Charles, VA, when a crewman accidentally ENTERed a
| blank field into a data base. The blank was treated as a zero
| and caused a Divide-by-Zero Exception which the data-base
| program could not handle. It aborted to the operating system,
| Microsoft Windows NT 4.0, which crashed, bringing down all the
| ship's LAN consoles and miniature remote terminals.
|
| > The Yorktown was paralyzed for 2 3/4 hours, unable to control
| steering, engines or weapons, until the operating system had
| been re-booted. Fortunately the Yorktown was not in combat nor
| in crowded shipping lanes.
|
| > If IEEE 754's default had been in force, the division by zero
| would have insinuated into the data-base an [?] and/or NaN,
| which would have been detected afterwards without a crash.
|
| http://people.eecs.berkeley.edu/~wkahan/Boulder.pdf
|
| https://www.wired.com/1998/07/sunk-by-windows-nt/
|
| EDIT: It was 2 3/4 hours, not 23 hours. I copied/pasted
| carelessly from the PDF.
| da_chicken wrote:
| Eh, there's like six separate issues that led to this.
|
| 1. Interface accepted blank value for a form when a numeric
| value is required.
|
| 2. Application and database permitted a null or coerced a
| zero when the value is required and cannot be zero. You've
| got to sanitize your input.
|
| 3. Application did not check for or gracefully handle
| something as common as a division by zero exception. Instead
| it proceeded to a buffer overrun and crashed.
|
| 4. Application crashing apparently caused an OS crash? Yes,
| okay, it's NT 4.0 and a buffer overrun, but that still should
| not happen. Protected mode is a thing.
|
| 5. There is no redundancy in a system that manages the
| steering, weapons, and engines of a warship.
|
| 6. A warship was unable to restore the single CNC system to
| working order for 23 hours.
|
| Furthermore... there's no evidence that the software would
| have behaved more sensibly or more controllably if it had a
| value of infinity instead of an error. It shouldn't have had
| a zero in the first place.
| togaen wrote:
| Easy solution: switch to C++, use std::copysign
| literallyaduck wrote:
| Just convert to string then replace the "-" char with empty, then
| you can convert to the type you want. Sure performance suffers,
| but it gets you home by quitting time. Don't actually do this.
| prof-dr-ir wrote:
| "1e-3"
| toxik wrote:
| It's exp abs, and it's a beautiful feature.
| [deleted]
| [deleted]
| im3w1l wrote:
| > return Double.longBitsToDouble( >
| Double.doubleToRawLongBits(value) & 0x7fffffffffffffffL);
|
| Technically, this will change NaN values into different ones.
| FabHK wrote:
| The sign of the NaN might change (as you'd expect from an
| absolute value), but the NaN payload (which IEEE 754 envisaged
| might carry debug information) won't change, right?
| HWR_14 wrote:
| The NaN payload's debug information is allowed to use the
| sign bit as part of the debug code (according to my knowledge
| of IEEE 754). Personally, I cannot imagine needing that many
| debug codes in most common usage, and I don't know if many
| implementations do, but it would definitely be something to
| check in your specific implementation details.
| ectopod wrote:
| Why do you say that? It extracts the bits using
| doubleToRawLongBits which does not do NaN canonicalization by
| design. And it gets back to a double with longBitsToDouble
| which "returns the double floating-point value with the same
| bit pattern". So it doesn't change anything either.
| kennywinker wrote:
| They really missed an opportunity to call it caNaNicalization
| im3w1l wrote:
| NaN values can have sign bit set. This will unset it.
| toxik wrote:
| So we defined abs(NaN), sounds like a feature to me ;-)
| gene91 wrote:
| You made a good point. But one might argue that -nan should
| become nan after an abs operation. I don't know enough
| about this topic to determine whether such an argument has
| merit though.
| HWR_14 wrote:
| It's not -NaN vs NaN. There are a bunch of NaN types,
| which are not related like that. It could (according to
| the spec) change "NaN: Attempt to arcsin value greater
| than 1" to "NaN: Zero divided by Zero operation"
| athrowaway3z wrote:
| Wouldn't max( x,-x ) cover everything ?
| Kranar wrote:
| That will not handle -0.0 unfortunately since 0 == -0.
| alisonkisk wrote:
| That doesn't matter because Math.max() knows the difference
| between +0.0 and -0.0.
|
| Try it and you'll see.
| bordercases wrote:
| It does that because of the way they implement arithmetic,
| not about an inherent property of the mathematical function
| `max`. The problem has to be solved once.
| alisonkisk wrote:
| The article's (dubious, as mentioned at the end) goal is
| finding the most efficient way in CPU time.
| kmm wrote:
| Naively, max(a, b) would be implemented as
| return a >= b ? a : b;
|
| But if x is -0.0, you're back to the same problem that -0.0 >=
| +0.0 actually turns out to be true and not false, hence the
| return value will be -0.0.
|
| Looking at the actual implementation[0], doubleToRawLongBits is
| used again here to take care of this edge case. So it would
| work, but a lot slower than the version given in the article.
|
| 0:
| https://github.com/openjdk/jdk/blob/36e2ddad4d2ef3ce27475af6...
| andi999 wrote:
| Couldn't you also implement max as return a>b? a:b;
|
| (changing greater equal to just greater). Would the problem
| then still persist?
| kmm wrote:
| Positive and negative zero cannot be distinguished by
| normal equality checks, meaning +0.0 == -0.0. Therefore,
| for consistency's sake +0.0 > -0.0 ought to be false, which
| would yield a return value of -0.0 for max(+0.0, -(+0.0))
| gvx wrote:
| That wouldn't solve it: max(0.0, -0.0) would be false ? 0.0
| : -0.0
|
| The problem is that 0.0 == -0.0. As long as you don't take
| care of signed zero, you cannot guarantee that abs(x) has
| no negative sign.
| amelius wrote:
| > After such interviews, there will be rumors about your company.
|
| It makes no sense to bring this up in interview questions. Just
| about every programmer knows that IEEE 754 contains some quirks.
| If you want to know the boring details, just look them up in the
| manual.
| MontyCarloHall wrote:
| It's interesting how fairly complex bit twiddling hacks are well-
| known for integers, but even very simple but useful bit twiddles
| for floats (like this one) are obscure enough to warrant their
| own article.
|
| Of course, the most famous bit manipulation of floats is the fast
| inverse square root trick [0]. The bits of a float as an integer
| literal are proportional to its log2 value (plus a constant),
| which is handy for computing a reciprocal square root in
| logspace, which when converted back to a float gets
| exponentiated, yielding the answer (after a couple more
| refinement steps). If you actually take the time to manually
| hammer out the bitwise manipulations, the fast inverse square
| root is quite simple, but it is nonetheless regarded as black
| magic, likely due to the totally non-explanatory comments in the
| Quake source code.
|
| Funnily enough, directly following the fast inverse square root
| in the Quake source code is a function to compute absolute values
| by zeroing the float sign bit [1], just as described in the
| article.
|
| [0]
| https://en.wikipedia.org/wiki/Fast_inverse_square_root#Alias...
|
| [1] https://github.com/id-Software/Quake-III-
| Arena/blob/dbe4ddb1...
| HWR_14 wrote:
| Well, with bit twiddling for floats, you have some very
| important questions. Like, do you maintain the distinct NaN
| values or are you okay with all NaNs producing a valid (but
| possibly different) NaN value.
| Y_Y wrote:
| What if the twiddle wasn't obscure and the article was
| unwarranted?
| MontyCarloHall wrote:
| Given the amount of discussion this article has sparked on
| HN, I'd argue that as obvious as it might be, the technique
| is nonetheless obscure. The amount of highly experienced and
| accomplished programmers I've encountered who have no idea
| how floating point numbers work is astoundingly high.
|
| If a similar article were written about how shifting an
| integer to the right by one bit is equivalent to dividing it
| by two, I highly suspect it would attract no attention on
| this site, aside from a couple dismissive comments grousing
| about why such a basic fact needs its own article.
| kevin_thibedeau wrote:
| Bit twiddling integers doesn't require unsafe type punning like
| this. It's also largely safe to assume a nominal two's
| complement machine is all your code will ever run on. If not,
| you'll know to be careful. You can't make such blanket
| assumptions with floating point representations.
___________________________________________________________________
(page generated 2021-08-22 23:00 UTC)