hngopher.com

       [HN Gopher] One does not simply calculate the absolute value
       ___________________________________________________________________
        
       One does not simply calculate the absolute value
        
       Author : mot2ba
       Score  : 93 points
       Date   : 2021-08-22 18:12 UTC (4 hours ago)
        
 (HTM) web link (habr.com)
 (TXT) w3m dump (habr.com)
        
       | Sharlin wrote:
       | Computing the absolute value of a signed (fixed-size, two's
       | complement) integer is not trivial either, because there is no
       | positive i such that `abs(INT_MIN) == i`. Some languages like
       | Java just give you `abs(INT_MIN) == INT_MIN` because:
       | -100...000 = ~100...000 + 1 = 011...111 + 1 = 100_000
       | 
       | where ~ is the bitwise complement. Even worse, in C and C++
       | specifically, because overflow of signed integers is undefined
       | behavior, so is `abs(INT_MIN)`! To be able to represent the
       | absolute value, you need to either return the corresponding
       | unsigned type (if your language has them), promote to a wider
       | type, or return a tuple `(carry_bit, result)`.
        
       | m4r35n357 wrote:
       | Bah! Just square & take the square root!
        
       | kurthr wrote:
       | And don't forget complex numbers or higher dimensionality.
        
       | beervirus wrote:
       | Does this correctly handle all the different NaN values, weird
       | corner cases, etc. properly?
        
         | pedrocr wrote:
         | For double it would be harder to check but for float (that
         | should have the same corner cases) it's easy to just check all
         | possible values if you have a gold standard. I've been doing
         | that for image processing functions and it's amazing how many
         | corner cases you fix and keep fixed by writing unit tests that
         | iterate over all possible values or a good subset of them when
         | that's not possible.
        
         | JadeNB wrote:
         | > Does this correctly handle all the different NaN values,
         | weird corner cases, etc. properly?
         | 
         | Of course the answer to "have you covered all weird corner
         | cases" is _always_ 'no', but the article addresses the issue of
         | different NaNs:
         | 
         | > [...] However, it turns out that doubleToLongBits is also not
         | entirely trivial, because it canonicalizes NaNs. There are many
         | ways to encode a not-a-number as a double, but only one of them
         | is canonical. These different NaNs are even more similar twins,
         | they can be distinguished neither via Double.compare, nor via
         | any operation. Even string representation is the same. But they
         | look different in computer memory. To avoid surprises,
         | doubleToLongBits converts any NaN to the canonical form, which
         | is encoded in long as 0x7ff8000000000000L. Of course, this
         | procedure adds more conditions, which we do not need here.
         | 
         | > What can we do? It appears that there's another method
         | doubleToRawLongBits. It doesn't do any smart conversion of NaN
         | and just returns exactly the same bit representation: [...]
        
           | sp332 wrote:
           | The question is whether that last technique, of just flipping
           | the sign bit, handles NaNs and infinities.
        
             | pm215 wrote:
             | It's the required semantics of the IEEE floating point
             | spec's absolute-value operation. This does mean that abs()
             | doesn't behave like eg addition in that it won't set fp
             | exception flags for signaling NaNs, or canonicalize NaNs,
             | or set a flag to note that this was a denormal number. But
             | at the language level "follows IEEE semantics" is probably
             | overall less surprising.
             | 
             | chs() ("flip the sign bit") is the other operation that's
             | defined as effectively just an operation on the
             | representation rather than a "do a real floating point
             | computation".
        
             | MontyCarloHall wrote:
             | Yes.
             | 
             | The sign bit is totally independent of the value being
             | encoded, whether that's a number, q/sNaN, or infinity.
             | 
             | Nitpick: we are not flipping the sign bit, but rather
             | setting it to 0.
        
             | klyrs wrote:
             | Yeah, the title is a head-scratcher. It _is_ simple to
             | calculate the absolute value. Only, it doesn 't look simple
             | in a high-level language.
        
       | andi999 wrote:
       | Does anybody know a good reference on the minus 0 thing? How do
       | you end up with it? Is 4-4=+0.0 and -4 +4=-0.0? Or is it
       | different?
        
         | aaaaaaaaaaab wrote:
         | Both are exactly 0.0.
         | 
         | Regarding signed zero, read this paper from Kahan:
         | https://homes.cs.washington.edu/~ztatlock/599z-17sp/papers/b...
        
         | gvx wrote:
         | The easiest way to get a signed zero is with a unary minus
         | operation. That is, the expression -x produces -0.0 when x is
         | an unsigned zero (so 0.0)
        
         | MontyCarloHall wrote:
         | No, those both yield +0.
         | 
         | 0/(-x) would return -0, for all x > 0
         | 
         | Likewise, underflows on negative numbers would also return -0.
        
           | thaumasiotes wrote:
           | > 0/(-x) would return -0, for all x > 0
           | 
           | I find this one a little weird, but it seems to be true.
           | 
           | The more normal way to get -0 would be to divide a positive
           | number by negative infinity, or a negative number by positive
           | infinity.
        
         | 63 wrote:
         | From some quick testing, -1.0 * 0.0 = -0.0 and the order
         | doesn't matter.
        
       | da_chicken wrote:
       | > _For example, if you divide 1.0 by +0.0 and -0.0, you get
       | completely different answers: +Infinity and -Infinity._
       | 
       | This must be a Java-ism. In most languages I've used you get a
       | division by zero error or NaN.
        
         | titzer wrote:
         | It's IEEE-754.
        
           | magicalhippo wrote:
           | And there are algorithms that rely on this, like this[1]
           | branchless axis-aligned bounding box intersection test.
           | 
           | It exploits the division by zero sign logic of IEEE-754, so
           | that the min/max operations used also handles the edge cases
           | where the ray is parallel to an axis.
           | 
           | [1]: https://tavianator.com/2011/ray_box.html
        
         | wodenokoto wrote:
         | While parent isn't correct in their assumption, this was also
         | my first thought after reading the article and I appreciate the
         | ensuing discussion.
         | 
         | I hope others will take the entire thread into consideration
         | before downvoting.
        
         | kmm wrote:
         | Which languages did you try? I found that dividing 1.0 by 0.0
         | gives Infinity in Java, C, Julia, JavaScript, Ruby, Clojure,
         | Haskell and Rust. I get division errors in Python and Perl.
         | 
         | Curiously, Go gives me a compiler error if I divide constant
         | literals, but assigning 0.0 to a variable and dividing by it
         | does work. On top of that, assigning -0.0 to a float variable
         | leads to it having the value +0.0, which I would call a
         | borderline compiler bug. Both are probably a consequence of Go
         | constant literals not being IEEE-754 floats but some arbitrary
         | precision number types, but it's very unexpected behavior.
        
           | da_chicken wrote:
           | > Which languages did you try?
           | 
           | Python and C#.
        
             | Kranar wrote:
             | Python does throw an exception but C# does not.
        
             | kmm wrote:
             | Dividing 1.0 by 0.0 gives [?] in C# as well when I try it.
        
               | da_chicken wrote:
               | You're right. I must've divided 0 by 0 and -0. That's the
               | only way C# gives me NaN. I thought I'd specifically
               | avoided doing that.
        
         | ketzu wrote:
         | +/- Infinity seems to be correct according to IEEE754.
         | 
         | IEEE754 section 7.3:
         | 
         | The divideByZero exception shall be signaled if and only if an
         | exact infinite result is defined for an operation on finite
         | operands. The default result of divideByZero shall be an [?]
         | correctly signed according to the operation: -- For division,
         | when the divisor is zero and the dividend is a finite non-zero
         | number, the sign of the infinity is the exclusive OR of the
         | operands' signs (see 6.3). -- For logB(0) when logBFormat is a
         | floating-point format, the sign of the infinity is minus
         | (-[?]).
        
         | Agentlien wrote:
         | This is what you get according to IEE 754.
         | 
         | You'd get NaN from 0.0/0.0 or division by zero if you did 1/0
         | as integer division.
        
         | titzer wrote:
         | Haha, -0. After spending literal _years_ of my life dealing
         | with that little encoding nuisance, in everything from the V8
         | TurboFan JIT, to WebAssembly, to adding floating point to
         | Virgil, I just gotta say, that little turd is so not worth it.
         | I 've read a lot of different justifications for it (it's a
         | limit, it represents _approaching_ 0 from the negatives, some
         | algorithms _need_ it, etc), I have to admit, I just don 't buy
         | it. It's an encoding artifact that's produced so much
         | accidental complexity that, in my book, has been _far_ more
         | trouble than it 's been worth.
         | 
         | Change my mind.
        
           | darig wrote:
           | -mind
        
           | nimish wrote:
           | How else would you represent a branch cut or even a jump at
           | 0?
           | 
           | https://people.freebsd.org/~das/kahan86branch.pdf from the
           | father of ieee-754 is illuminating.
        
             | FabHK wrote:
             | Haha, was just going to link that. Here's an ELI5, though
             | not that illuminating:
             | 
             | https://hackernoon.com/negative-zero-bbd5fd790af3
             | 
             | But, bottom line, Kahan (not only the father of IEEE 754,
             | but also, with Golub, father of the SVD!) has thought about
             | it a lot, and though he called the signed zero "a pain in
             | the ass", he also said "There really wasn't a way around
             | that and you were stuck with it."
             | 
             | http://history.siam.org/pdfs2/Kahan_final.pdf
        
             | croes wrote:
             | So for programming convenience we ignore correct math?
        
               | FabHK wrote:
               | We cannot represent "correct math" on a computer, because
               | the set of reals is uncountable, but in 64 bits we can
               | only represent finitely many numbers, not infinitely
               | many, let alone uncountably many.
               | 
               | And given that we have to implement some sort of
               | approximation to the usual math on a closed subset of the
               | reals (rationals, even), why not extend it with some
               | "numbers" (negative zero, plus/minus infinity, NaNs)
               | that, yes, enable programming convenience?
        
               | croes wrote:
               | There is a difference between 2.4987665678 as 2.5 and
               | undefined as +-infinity especially because 1.0/+0.0 is
               | +infinity and 1/0 is just an exception in Java. Shouldn't
               | it be the same result?
        
         | FabHK wrote:
         | > In most languages I've used you get a division by zero error
         | or NaN
         | 
         | Or it raises an exception, which once disabled a navy warship
         | for hours:
         | 
         | > Now decommissioned, the USS Yorktown was among the first
         | warhips extensively computerized to reduce crew (by 10% to 374)
         | and costs (by $2.8 million per year).
         | 
         | > On 21 Sept. 1997, the Yorktown was maneuvering off the coast
         | of Cape Charles, VA, when a crewman accidentally ENTERed a
         | blank field into a data base. The blank was treated as a zero
         | and caused a Divide-by-Zero Exception which the data-base
         | program could not handle. It aborted to the operating system,
         | Microsoft Windows NT 4.0, which crashed, bringing down all the
         | ship's LAN consoles and miniature remote terminals.
         | 
         | > The Yorktown was paralyzed for 2 3/4 hours, unable to control
         | steering, engines or weapons, until the operating system had
         | been re-booted. Fortunately the Yorktown was not in combat nor
         | in crowded shipping lanes.
         | 
         | > If IEEE 754's default had been in force, the division by zero
         | would have insinuated into the data-base an [?] and/or NaN,
         | which would have been detected afterwards without a crash.
         | 
         | http://people.eecs.berkeley.edu/~wkahan/Boulder.pdf
         | 
         | https://www.wired.com/1998/07/sunk-by-windows-nt/
         | 
         | EDIT: It was 2 3/4 hours, not 23 hours. I copied/pasted
         | carelessly from the PDF.
        
           | da_chicken wrote:
           | Eh, there's like six separate issues that led to this.
           | 
           | 1. Interface accepted blank value for a form when a numeric
           | value is required.
           | 
           | 2. Application and database permitted a null or coerced a
           | zero when the value is required and cannot be zero. You've
           | got to sanitize your input.
           | 
           | 3. Application did not check for or gracefully handle
           | something as common as a division by zero exception. Instead
           | it proceeded to a buffer overrun and crashed.
           | 
           | 4. Application crashing apparently caused an OS crash? Yes,
           | okay, it's NT 4.0 and a buffer overrun, but that still should
           | not happen. Protected mode is a thing.
           | 
           | 5. There is no redundancy in a system that manages the
           | steering, weapons, and engines of a warship.
           | 
           | 6. A warship was unable to restore the single CNC system to
           | working order for 23 hours.
           | 
           | Furthermore... there's no evidence that the software would
           | have behaved more sensibly or more controllably if it had a
           | value of infinity instead of an error. It shouldn't have had
           | a zero in the first place.
        
       | togaen wrote:
       | Easy solution: switch to C++, use std::copysign
        
       | literallyaduck wrote:
       | Just convert to string then replace the "-" char with empty, then
       | you can convert to the type you want. Sure performance suffers,
       | but it gets you home by quitting time. Don't actually do this.
        
         | prof-dr-ir wrote:
         | "1e-3"
        
           | toxik wrote:
           | It's exp abs, and it's a beautiful feature.
        
       | [deleted]
        
       | [deleted]
        
       | im3w1l wrote:
       | > return Double.longBitsToDouble(         >
       | Double.doubleToRawLongBits(value) & 0x7fffffffffffffffL);
       | 
       | Technically, this will change NaN values into different ones.
        
         | FabHK wrote:
         | The sign of the NaN might change (as you'd expect from an
         | absolute value), but the NaN payload (which IEEE 754 envisaged
         | might carry debug information) won't change, right?
        
           | HWR_14 wrote:
           | The NaN payload's debug information is allowed to use the
           | sign bit as part of the debug code (according to my knowledge
           | of IEEE 754). Personally, I cannot imagine needing that many
           | debug codes in most common usage, and I don't know if many
           | implementations do, but it would definitely be something to
           | check in your specific implementation details.
        
         | ectopod wrote:
         | Why do you say that? It extracts the bits using
         | doubleToRawLongBits which does not do NaN canonicalization by
         | design. And it gets back to a double with longBitsToDouble
         | which "returns the double floating-point value with the same
         | bit pattern". So it doesn't change anything either.
        
           | kennywinker wrote:
           | They really missed an opportunity to call it caNaNicalization
        
           | im3w1l wrote:
           | NaN values can have sign bit set. This will unset it.
        
             | toxik wrote:
             | So we defined abs(NaN), sounds like a feature to me ;-)
        
             | gene91 wrote:
             | You made a good point. But one might argue that -nan should
             | become nan after an abs operation. I don't know enough
             | about this topic to determine whether such an argument has
             | merit though.
        
               | HWR_14 wrote:
               | It's not -NaN vs NaN. There are a bunch of NaN types,
               | which are not related like that. It could (according to
               | the spec) change "NaN: Attempt to arcsin value greater
               | than 1" to "NaN: Zero divided by Zero operation"
        
       | athrowaway3z wrote:
       | Wouldn't  max( x,-x ) cover everything ?
        
         | Kranar wrote:
         | That will not handle -0.0 unfortunately since 0 == -0.
        
           | alisonkisk wrote:
           | That doesn't matter because Math.max() knows the difference
           | between +0.0 and -0.0.
           | 
           | Try it and you'll see.
        
             | bordercases wrote:
             | It does that because of the way they implement arithmetic,
             | not about an inherent property of the mathematical function
             | `max`. The problem has to be solved once.
        
         | alisonkisk wrote:
         | The article's (dubious, as mentioned at the end) goal is
         | finding the most efficient way in CPU time.
        
         | kmm wrote:
         | Naively, max(a, b) would be implemented as
         | return a >= b ? a : b;
         | 
         | But if x is -0.0, you're back to the same problem that -0.0 >=
         | +0.0 actually turns out to be true and not false, hence the
         | return value will be -0.0.
         | 
         | Looking at the actual implementation[0], doubleToRawLongBits is
         | used again here to take care of this edge case. So it would
         | work, but a lot slower than the version given in the article.
         | 
         | 0:
         | https://github.com/openjdk/jdk/blob/36e2ddad4d2ef3ce27475af6...
        
           | andi999 wrote:
           | Couldn't you also implement max as return a>b? a:b;
           | 
           | (changing greater equal to just greater). Would the problem
           | then still persist?
        
             | kmm wrote:
             | Positive and negative zero cannot be distinguished by
             | normal equality checks, meaning +0.0 == -0.0. Therefore,
             | for consistency's sake +0.0 > -0.0 ought to be false, which
             | would yield a return value of -0.0 for max(+0.0, -(+0.0))
        
             | gvx wrote:
             | That wouldn't solve it: max(0.0, -0.0) would be false ? 0.0
             | : -0.0
             | 
             | The problem is that 0.0 == -0.0. As long as you don't take
             | care of signed zero, you cannot guarantee that abs(x) has
             | no negative sign.
        
       | amelius wrote:
       | > After such interviews, there will be rumors about your company.
       | 
       | It makes no sense to bring this up in interview questions. Just
       | about every programmer knows that IEEE 754 contains some quirks.
       | If you want to know the boring details, just look them up in the
       | manual.
        
       | MontyCarloHall wrote:
       | It's interesting how fairly complex bit twiddling hacks are well-
       | known for integers, but even very simple but useful bit twiddles
       | for floats (like this one) are obscure enough to warrant their
       | own article.
       | 
       | Of course, the most famous bit manipulation of floats is the fast
       | inverse square root trick [0]. The bits of a float as an integer
       | literal are proportional to its log2 value (plus a constant),
       | which is handy for computing a reciprocal square root in
       | logspace, which when converted back to a float gets
       | exponentiated, yielding the answer (after a couple more
       | refinement steps). If you actually take the time to manually
       | hammer out the bitwise manipulations, the fast inverse square
       | root is quite simple, but it is nonetheless regarded as black
       | magic, likely due to the totally non-explanatory comments in the
       | Quake source code.
       | 
       | Funnily enough, directly following the fast inverse square root
       | in the Quake source code is a function to compute absolute values
       | by zeroing the float sign bit [1], just as described in the
       | article.
       | 
       | [0]
       | https://en.wikipedia.org/wiki/Fast_inverse_square_root#Alias...
       | 
       | [1] https://github.com/id-Software/Quake-III-
       | Arena/blob/dbe4ddb1...
        
         | HWR_14 wrote:
         | Well, with bit twiddling for floats, you have some very
         | important questions. Like, do you maintain the distinct NaN
         | values or are you okay with all NaNs producing a valid (but
         | possibly different) NaN value.
        
         | Y_Y wrote:
         | What if the twiddle wasn't obscure and the article was
         | unwarranted?
        
           | MontyCarloHall wrote:
           | Given the amount of discussion this article has sparked on
           | HN, I'd argue that as obvious as it might be, the technique
           | is nonetheless obscure. The amount of highly experienced and
           | accomplished programmers I've encountered who have no idea
           | how floating point numbers work is astoundingly high.
           | 
           | If a similar article were written about how shifting an
           | integer to the right by one bit is equivalent to dividing it
           | by two, I highly suspect it would attract no attention on
           | this site, aside from a couple dismissive comments grousing
           | about why such a basic fact needs its own article.
        
         | kevin_thibedeau wrote:
         | Bit twiddling integers doesn't require unsafe type punning like
         | this. It's also largely safe to assume a nominal two's
         | complement machine is all your code will ever run on. If not,
         | you'll know to be careful. You can't make such blanket
         | assumptions with floating point representations.
        
       ___________________________________________________________________
       (page generated 2021-08-22 23:00 UTC)