[HN Gopher] IEEE 754 round-off errors
       ___________________________________________________________________
        
       IEEE 754 round-off errors
        
       Author : kulikov009
       Score  : 69 points
       Date   : 2021-08-27 11:58 UTC (11 hours ago)
        
 (HTM) web link (yurichev.com)
 (TXT) w3m dump (yurichev.com)
        
       | madengr wrote:
        | My kid has 100% in several HS classes, yet Canvas shows
        | 99.99%. I tried explaining round-off errors, but she'll have
        | none of it.
        
       | chmod775 wrote:
        | There have been a few articles about rounding floating point
        | numbers correctly in the past few days, but I realized that
        | might not always be what you want.
        | 
        | For instance, when adding some small positive number to a
        | floating point value, I would typically prefer it to round up
        | if correct rounding would mean that the value remains
        | unchanged.
        | 
        | b ≠ 0 and a + b = a can be a source of fun bugs.
       | 
       | However a point can be made that if you even need to _consider_
       | such behavior, you should probably be using something more
       | precise than your current number representation.
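        | 
        | A minimal sketch in Python of both the failure and one
        | possible "round up instead" workaround (math.nextafter needs
        | 3.9+):
        | 
        |     import math
        |     
        |     a = 1.0
        |     b = 1e-17          # nonzero, yet below half an ULP of 1.0
        |     print(a + b == a)  # True: b vanishes under round-to-nearest
        |     
        |     s = a + b
        |     if b > 0 and s == a:
        |         s = math.nextafter(a, math.inf)  # force one ULP up
        |     print(s > a)       # True: the add now always registers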
        
         | gliptic wrote:
          | What about stochastic rounding, as used in many quantized
          | neural network implementations? :)
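          | 
          | (For reference, the idea in a few lines of Python; real
          | quantized-NN kernels do this in fixed point, not like
          | this:)
          | 
          |     import math, random
          |     
          |     def stochastic_round(x):
          |         # Round up with probability equal to the fractional
          |         # part, so the error is zero *in expectation*.
          |         lo = math.floor(x)
          |         return lo + (random.random() < x - lo)
          |     
          |     n = 10000
          |     print(sum(stochastic_round(2.3) for _ in range(n)) / n)
          |     # ~2.3 on average, though each call returns 2 or 3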
        
           | ChrisLomont wrote:
           | Leads to non-reproducible or non-portable software.
           | 
           | It's better to have well-defined and reproducible behavior,
           | so one can reason how to make correct algorithms.
           | 
            | Neural networks do it for very precise reasons nearly
            | unique to them. If they were built on randomly rounding
            | primitives, and that randomness was not the precise flavor
            | one model or another needed, you'd not easily be able to
            | get the distributions needed for the task.
        
         | da_chicken wrote:
         | > However a point can be made that if you even need to consider
         | such behavior, you should probably be using something more
         | precise than your current number representation.
         | 
         | I would agree with this. If you're in a place where you
         | actually need to care about how accurately you're representing
         | numbers with IEEE 754, you almost certainly should look at a
         | precise numeric data type instead and should set aside IEEE 754
          | entirely. It's less common than we think to need both the
          | range of values that a float or double represents _and_ to
          | need those values represented precisely enough to endure
          | aggregation.
         | 
          | It would be nice to be able to give fewer bits to the
          | exponent in a double, though. How often have I really needed
          | 1e308 or 1e-324? How often have I even needed 1e38 or 1e-45?
          | Then again, I don't remember how much precision we'd get back
          | with more bits in the mantissa.
         | 
         | Precision is a really funny thing, too. Remember, pi to ~37
         | decimal places is precise enough to calculate the circumference
         | of a circle of atoms (~10^-10 m) the size of the observable
         | universe (~10^26 m), as at that point the primary source of
         | imprecision is the fact that you used atoms to construct your
         | circle.
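          | 
          | (A quick check of that: the circumference error is about
          | diameter * pi_error,
          | 
          |     d = 1e26        # observable universe, meters
          |     pi_err = 1e-37  # from truncating pi at ~37 decimals
          |     print(d * pi_err)   # ~1e-11 m, below one atom (~1e-10 m)
          | 
          | so more digits of pi buy nothing a circle of atoms could
          | show.)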
        
           | tomsmeding wrote:
            | > It would be nice to be able to give fewer bits to the
            | exponent in a double, though. How often have I really
            | needed 1e308 or 1e-324? How often have I even needed 1e38
            | or 1e-45? Then again, I don't remember how much precision
            | we'd get back with more bits in the mantissa.
           | 
            | I'm not sure that helps. Going from 10^308 to 10^38,
            | approximately, takes 3 bits from the 11-bit exponent and
            | adds 3 bits to the 52-bit mantissa. This gives you _almost_
            | one extra decimal digit of precision. It's not worth it,
            | and if it is, you _certainly_ need a more special-purpose
            | number type.
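            | 
            | The arithmetic behind that, as a quick check (standard
            | IEEE layout with the implicit leading mantissa bit):
            | 
            |     import math
            |     
            |     for e_bits, m_bits in [(11, 52), (8, 55)]:
            |         max_e10 = 2 ** (e_bits - 1) * math.log10(2)
            |         digits = (m_bits + 1) * math.log10(2)
            |         print(f"~1e{max_e10:.0f}, {digits:.2f} digits")
            |     
            |     # ~1e308, 15.95 digits  (the standard double)
            |     # ~1e39, 16.86 digits   (3 bits moved to the mantissa)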
        
         | mbauman wrote:
          | That's just trading one class of bugs for another. Adding a
          | million 10^-50s to 1.0 still leaves you far closer to 1.0
          | than to the next floating point number.
          | 
          | Your algorithm would give an answer that's a million values
          | away.
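          | 
          | Concretely:
          | 
          |     import math
          |     
          |     s = 1.0
          |     for _ in range(1_000_000):
          |         s += 1e-50       # each add rounds straight back
          |     print(s == 1.0)      # True
          |     
          |     # forcing a round-up per add lands ~a million ULPs out:
          |     print(1.0 + 1_000_000 * math.ulp(1.0))
          |     # 1.0000000002220446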
        
           | chmod775 wrote:
            | It's a trade-off regardless of what you're doing.
            | 
            | One class of bug can make you overwrite values in a map or
            | get your code 'stuck' at a number; the other just gives you
            | a (more) wrong number.
            | 
            | I know which kind of error would be preferred in a video
            | game, for instance.
        
             | mbauman wrote:
             | > the other just gives you a (more) wrong number.
             | 
             | This is a common misconception: IEEE754 floating point
             | arithmetic is not wrong or fuzzy or inherently inexact. It
             | is very exact; it just has limited precision.
        
               | chmod775 wrote:
               | No. In mathematical terms typical floating point
               | arithmetic has _perfect_ precision - it is repeatable -
               | but limited accuracy - it can deviate from the real
               | value. It 's a shortcoming of CS vocabulary that
               | "precision" is also used to refer to the number of bits
               | used.
               | 
               | "Exactness" is a fuzzy term used in common english to
               | refer to the concept of both precision and accuracy, or
               | either really. It's not really used in mathematics.
        
               | mbauman wrote:
               | Oh cool, you want to use different definitions. No
               | matter. Point remains: the floating point representation
               | of 1/4 is not approximate. Similarly, the closest
               | floating point value to 1/10 also _precisely_ defines
               | some value: it 's exactly 0.10000000000000000555111512312
               | 57827021181583404541015625.
               | 
               | Now whether that's accurate to you depends upon your
               | application. But the numbers don't care.
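                | 
                | Python makes this easy to see, for instance:
                | 
                |     from decimal import Decimal
                |     
                |     # Decimal(float) converts the exact stored value:
                |     print(Decimal(0.1))   # 0.10000000000000000555...
                |     print(Decimal(0.25))  # 0.25 -- 1/4 is exact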
        
               | chmod775 wrote:
               | > Oh cool, you want to use different definitions. No
               | matter.
               | 
               | It's not "different" terminology. It's the correct
               | terminology. Which you should have down if you want to
               | wisecrack at people.
               | 
               | You really just took a dozen words that were referring to
               | the result of a calculation being wrong (as opposed to
               | the _real_ result) out of context to make them about
               | floating point numbers in a vacuum.
               | 
               | Then after construing some new context, you tried to
               | correct the straw man by giving an explanation in which
               | you used pretty much all the terminology incorrectly.
               | 
                | Don't be surprised when people come back at you in the
                | same tone of voice if you leave an opening as wide as a
                | barn door.
               | 
               | I assume both of us are capable of constructing sentences
               | using only precise mathematical language. This is not the
               | place or time to flaunt that ability and correct others
               | on things that are still perfectly understandable. We're
               | not writing a paper.
        
             | hermitdev wrote:
             | Another class of bugs silently transfers fractions of a
             | cent into a bank account...
             | 
              | I personally once had a bug in a script while working in
              | finance. I was working on a portfolio of test contracts
              | with trillions of dollars in notional value, and I was
              | off by a single penny. It was an easily found and fixed
              | one-line bug, due to truncation instead of correct
              | rounding. But, yes: being $1e-2 off on a portfolio worth
              | $1e12 was enough to block release.
        
         | deepsun wrote:
         | Well, you shouldn't do exact equality check on floats anyway,
         | that can be a source of fun bugs regardless of rounding.
        
           | gugagore wrote:
           | I see this advice often and I find that it misses the point.
           | The problem isn't the equality check, the problem is finite
           | precision.
           | 
           | There are situations that call for exact equality checking in
           | floating point.
        
             | cratermoon wrote:
             | In which case, you don't use IEEE 754 floating point for
             | math, you use some kind of numerics type or library that
             | correctly models the domain.
        
         | aardvark179 wrote:
         | If you require a distinct point then you can check if a is
         | smaller than the minimum increment that can be added to b, and
         | add that increment instead, but in the few cases I've needed
         | that sort of behaviour it has been easier to do the calculation
         | and then check for equality.
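          | 
          | With Python's math.ulp (3.9+), that check can look like:
          | 
          |     import math
          |     
          |     a, b = 1.0, 1e16            # ULP of b is 2.0 here
          |     print(b + a == b)           # True: the tie rounds to even
          |     c = b + max(a, math.ulp(b)) # bump by at least one ULP
          |     print(c > b)                # True: a distinct point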
        
         | jimktrains2 wrote:
         | > However a point can be made that if you even need to consider
         | such behavior, you should probably be using something more
         | precise than your current number representation.
         | 
         | Or a different algorithm. For instance, when computing a mean,
         | keeping the current mean and count instead of the current total
         | and count is more numerically stable.
         | 
         | https://statisticsblog412536603.wordpress.com/2019/10/17/str...
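          | 
          | A minimal sketch of the incremental form (one common
          | variant; not necessarily the linked post's exact code):
          | 
          |     def running_mean(xs):
          |         # Keep the mean, not the total: the update stays near
          |         # the data's own magnitude instead of growing into a
          |         # sum whose low-order bits get rounded away.
          |         mean = 0.0
          |         for n, x in enumerate(xs, 1):
          |             mean += (x - mean) / n
          |         return mean
          |     
          |     print(running_mean([1e16, 1.0, 1.0, 1.0]))  # ~(1e16+3)/4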
        
       | Joker_vD wrote:
       | Oh for the... don't loop until "x == y", loop until "abs(x-y) <
       | MAX_TOLERABLE_ABS_ERROR", that's been the standard advice since
       | forever.
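        | 
        | E.g., for an iteration like Newton's method for sqrt(2):
        | 
        |     TOL = 1e-12
        |     x, y = 1.0, 2.0
        |     while abs(x - y) >= TOL:          # not: while x != y
        |         x, y = y, (y + 2.0 / y) / 2.0
        |     print(y)                          # ~1.4142135623730951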
        
         | Igelau wrote:
          | That was my knee-jerk reaction as well, but if you look again
         | you'll see the loop is doing a different check than you think.
        
           | hermitdev wrote:
            | It took me a second look to realize what's going on. My
            | first thought was "only 54 iterations?", as in (initial
            | error) * 54 >= 1. On second look, it clicked that, no, it's
            | actually more like (initial error) * 2^54 >= 1.
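            | 
            | Roughly that flavor, sketched with the doubling map
            | (every float op below is exact, so all that's left is the
            | stored 0.1's initial ~5.5e-18 error being doubled each
            | step):
            | 
            |     x, n = 0.1, 0
            |     while x != 0.0:      # the true 1/10 would cycle forever
            |         x = 2 * x        # exact: just an exponent bump
            |         if x >= 1.0:
            |             x -= 1.0     # exact by Sterbenz's lemma
            |         n += 1
            |     print(n)             # 55: the dyadic 0.1 collapses to 0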
        
         | FabHK wrote:
         | (There's no '==' on the entire page (except in Mathematica
         | code, or for printing the results of an equality test).)
        
       | zokier wrote:
        | I feel a lot of the unintuitiveness of floats stems from
        | inexact decimal literals. With hex literals the situation
        | would be marginally clearer.
        
         | rbonvall wrote:
         | When I create my own programming language, the syntax for float
          | literals will be:                   ≈3.14159
        
           | enriquto wrote:
           | This does not make sense: floating point numbers are not
           | really "approximate". For example, small integer arithmetic
           | using floats is guaranteed to be exact. Your language would
           | be very confusing for this common use case.
        
             | frakt0x90 wrote:
             | I think it was a joke
        
           | lifthrasiir wrote:
           | This does kinda exist in Scheme and Racket:
            | 
            |     > (/ 1 10)
            |     1/10
            |     > (exact? (/ 1 10))
            |     #t
            |     > (/ 1.0 10.0)
            |     0.1
            |     > (exact? (/ 1.0 10.0))
            |     #f
            |     > (/ #e1.0 #e10.0)   ; #e for "exact"
            |     1/10
            |     > (/ #i1 #i10)       ; #i for "inexact"
            |     0.1
           | 
           | As mentioned in HN before [1], Pyret (which has the same
           | origin as Racket) went further and made the exact literal
           | (now "Exactnum") default, so the inexact literal (now
           | "Roughnum") always requires a preceding `~`.
           | 
           | [1] https://news.ycombinator.com/item?id=28263486
        
         | [deleted]
        
       | rcgorton wrote:
        | IEEE 754-1985 is VERY different from 754-2008. The 2008
        | version is much more complex (personal experience from trying
        | to support it in a compiler).
        
       | cratermoon wrote:
       | See also
       | 
       | https://0.30000000000000004.com/
       | 
       | https://scipython.com/blog/mullers-recurrence/
       | 
       | https://latkin.org/blog/2014/11/22/mullers-recurrence-roundo...
        
       | glouwbug wrote:
        | Just do everything with integers. RTS games of the '90s did
        | it. They were able to sync 8 PCs across the world running the
        | same historical simulation (think Age of Kings) by passing
        | nothing more than mouse clicks and keyboard presses over a
        | peer-to-peer connection. Determinism is king in lockstep
        | programming.
        
         | ChrisLomont wrote:
          | > Just do everything with integers.
          | 
          | It gets cripplingly inefficient to do many calculations with
          | integers. Simulating weather, or doing nearly any scientific
          | calculation, using only integers would be a massively
          | terrible idea.
         | 
         | Same with neural networks (even though there is some work in
         | that area), or a large part of signal processing, or medical
         | work.... and on and on.
         | 
         | >Determinism is king in lockstep programming
         | 
          | Pretty much no modern games do lockstep, because it's far
          | too slow, messy, and error-prone. It's much better to build
          | a tolerant system that can handle missed frames and lag, and
          | instead do predictive and catch-up computing.
          | 
          | Using the right tool for the job, and learning new tools to
          | access new jobs, is a better approach. Learning IEEE 754's
          | nuances is a useful skill, and one can program many things
          | by using this tool well that cannot be done efficiently or
          | cleanly otherwise.
        
         | throwaway858 wrote:
          | There is a persistent myth that floating point is non-
          | deterministic and therefore cannot be used for "lockstep"
          | multiplayer games.
          | 
          | This may have been true in the 90s because of buggy C
          | compilers, but it has long not been the case.
          | 
          | You can use 32-bit or 64-bit floating point and your code
          | will be completely deterministic across all modern CPUs and
          | operating systems, and even across most programming
          | languages.
          | 
          | The only things you need to avoid are trig functions (sin,
          | cos), since different platforms have different
          | implementations, so you use your own implementation for
          | these functions (for games you can use an approximation,
          | which will be even faster than the library implementation).
          | Many platforms have an identical sqrt function, but this
          | should be thoroughly tested.
          | 
          | Also, if you are running on an Intel CPU on a 32-bit OS
          | (rare nowadays), you need to call a function at the start of
          | your program to disable 80-bit extended FP.
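          | 
          | E.g., a minimal sketch of such a hand-rolled deterministic
          | sin (only adds and multiplies, so it is bit-identical
          | wherever IEEE 754 doubles are; a real one would add range
          | reduction first):
          | 
          |     def fast_sin(x):
          |         # 7th-order Taylor polynomial, usable for |x| <= pi/2
          |         # (max error ~1.6e-4 at the interval ends).
          |         x2 = x * x
          |         return x * (1.0 - x2/6.0 * (1.0 - x2/20.0
          |                     * (1.0 - x2/42.0)))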
        
           | protastus wrote:
           | > There is a persistent myth that floating point is non-
           | deterministic and therefore cannot be used for "lockstep"
           | multiplayer games.
           | 
           | Indeed, on all systems I've used, FP is deterministic in the
           | strict sense that running the same code twice on the same
           | platform (with identical HW and SW stack) will produce the
           | same results.
           | 
            | But the results may change depending on:
            | 
            | - programming language
            | - libraries used and their versions
            | - underlying silicon
            | - whether SIMD is used
            | - FPU flags
            | - compiler flags (example: ffast-math affects many things
            |   behind the scenes, as does /fp:fast, including things
            |   that dramatically affect accuracy like the use of
            |   approximate reciprocal instructions)
            | - bugs affecting all of the above
           | 
           | At scale, this matrix becomes difficult to track and control.
           | So it shouldn't be a surprise that reproducibility is an
           | issue, and this myth persists.
        
             | throwaway858 wrote:
             | > But the results may change depending on:
             | 
             | > - programming language
             | 
              | As I mentioned in my comment, this is part of the myth.
              | Most programming languages have the exact same semantics
              | regarding floating point, and if you port an algorithm
              | between them you will get identically deterministic
              | results. This is true for at least C/C++, Java, C#,
              | Python, JavaScript, and many more.
             | 
              | > - libraries used and their versions
              | 
              | As I mentioned in my comment, this is the only true part
              | of the myth. sqrt should be safe, but trig functions will
              | be different. For games, a lot of them use their own
              | "fast" versions anyway, so it's not a problem.
             | 
              | > - underlying silicon
              | 
              | Nothing to worry about here. All modern CPUs have
              | identical IEEE 754 semantics (except 32-bit Intel, which
              | I addressed in my original comment).
             | 
              | > - whether SIMD is used
              | 
              | Again, nothing to worry about here. Whether it is hand-
              | coded SIMD assembly or compiler-generated, you will
              | always get identical and portable results; the compiler
              | is obligated to ensure this for autogenerated SIMD.
             | 
              | > - FPU flags
              | 
              | Not relevant on any system from the last 10 years.
             | 
              | > - compiler flags (example: ffast-math affects many
              | things behind the scenes, as does /fp:fast, including
              | things that dramatically affect accuracy like the use of
              | approximate reciprocal instructions)
              | 
              | ffast-math by definition introduces imprecise semantics,
              | so simply don't use it.
             | 
              | > - bugs affecting all of the above
              | 
              | Compilers (and CPUs!) may have been buggy decades ago,
              | but in the modern age floating point is extensively
              | tested. I would be very surprised to see a floating
              | point related bug in a compiler or programming language
              | from the last 10 years.
        
               | benibela wrote:
               | I still have these problems
               | 
               | > - programming language
               | 
               | I decided to not use C, because it is unsafe, and write
               | all my code in Pascal.
               | 
                | Now FreePascal's double parsing does not work
                | correctly.
                | 
                | For one, it always parses using an 80-bit float. Even
                | when parsing a double, it rounds the 80-bit float to a
                | double after parsing, so the last bit is usually
                | wrongly rounded. They refuse to fix it, because it is
                | easier to maintain just one 80-bit parsing function
                | than multiple parsing functions for different types.
               | 
               | >- libraries used and their versions
               | 
                | So I moved to someone's double-parsing library.
                | 
                | It just did not work: it simply lacked the algorithms
                | required for full double-precision parsing.
                | 
                | But they were happy to fix it.
               | 
               | >- FPU flags
               | 
                | After fixing it, that double-parsing library still did
                | not work... on 32-bit.
                | 
                | Although the library was implementing the correct
                | operations, it turned out FreePascal compiles 64-bit
                | code using SSE, and 32-bit code using the x87 FPU,
                | which was set to 80-bit precision.
               | 
                | In reverse, formatting the double as a string is also
                | hard and ambiguous. FreePascal usually prints two more
                | decimal digits than required (in the new
                | implementation; they still had a broken implementation
                | a handful of years ago). I wrote my own implementation
                | that prints the shortest decimal using arbitrary-
                | precision arithmetic, but that is really slow. (I
                | tried to use a library, but it did not have enough
                | precision, and after they tried to fix it, it is
                | completely broken.)
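                | 
                | For comparison, CPython has printed the shortest
                | round-tripping decimal since 3.1:
                | 
                |     x = 0.1
                |     print(repr(x))              # 0.1, not 55 digits
                |     print(float(repr(x)) == x)  # True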
        
         | gumby wrote:
         | Or if you need rationals, use fixed point (unfortunately not
         | native in many languages)
        
       | kibwen wrote:
       | I like to consider what it would look like if a programming
       | language forbade floating point literals from containing inexact
       | values. This means that a literal like `0.1` would be disallowed,
        | and you would instead be forced to write out the literal as
        | the exact nearest representable value, e.g. `13421773 * 2^-27`
        | for a 32-bit float. Profoundly
       | awful for usability, but at least nobody could ever claim to be
       | surprised that floating point values are imprecise. :P
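        | 
        | (Hex float spellings already give an exact, if unreadable,
        | form; in Python, for instance:
        | 
        |     print((0.1).hex())   # 0x1.999999999999ap-4
        |     print((0.25).hex())  # 0x1.0p-2: exactly representable
        | 
        | and float.fromhex() round-trips these exactly.)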
        
         | johndough wrote:
          | I like the idea, but on second thought, it is easily fooled:
          | 
          |     >>> 3.0/10.0 == 1.0/10.0 + 2.0/10.0
          |     False
        
           | ivegotnoaccount wrote:
            | No need to even go that far. Numbers that are too far
            | apart are enough: 1e300 + 1 will be equal to 1e300. And it
            | would still be, even if 1 were added 1e300 times.
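            | 
            | Concretely:
            | 
            |     print(1e300 + 1 == 1e300)    # True: 1 is far below
            |                                  # half an ULP of 1e300
            |     s = 1e300
            |     for _ in range(10**6):       # every add rounds back,
            |         s += 1.0                 # so repetition never helps
            |     print(s == 1e300)            # True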
        
             | benibela wrote:
              | And you could not write 1e300, since it is not exact.
             | 
             | You need to write 10000000000000000525047602552044202487044
             | 68581108159154915854115511802457988908195786371375080447864
             | 04370444383288387817694252323536043057564479218478670698284
             | 83872009265758037378302337947880900593689532349707999450811
             | 19038967640880074652742780142494579258788820056842838115669
             | 472196386865459400540160
        
         | deckard1 wrote:
          | Not quite the same, but Scheme has rationals.
         | 
         | Now that I think about it, the Scheme standard (R5RS at least)
         | does not require an implementation to support the full numeric
         | tower. I can't remember which parts are required, but I do
         | wonder now if it permits an implementation that _only_ has
         | rationals and drops floating point. Most implementations,
         | obviously, go the other route and only support floats. But it
         | would be interesting to see.
        
       | gautamcgoel wrote:
       | There's a simple solution to all this floating point madness:
       | replace floating point arithmetic with fixed point arithmetic,
       | which is associative, more energy efficient in hardware, and much
       | easier to reason about.
        
         | nestorD wrote:
          | Fixed-point numbers have a very limited range, which leads
          | to them not playing well with multiplication and division
          | (doing scientific computation with them is a game of
          | searching for which division or multiplication got you that
          | 0 or +inf, and then searching for an alternative formula, if
          | one even exists; they are much more fragile than IEEE 754).
          | They are not used that often, so people are unfamiliar with
          | their shortcomings, but I would not make them the default.
          | 
          | Here is a little bit of information on them:
          | https://pages.tacc.utexas.edu/~eijkhout/istc/html/arithmetic...
        
         | tverbeure wrote:
         | If you're doing division and multiplications with fixed point
         | arithmetic and the result must fit in a same size word as the
         | operands, you need to decide for each operation how many
         | fractional bits from each operand you're going to retain in the
         | result.
         | 
         | In other words: you need to understand the range of numbers
         | that you're dealing with every step of the way.
         | 
         | That's not really an improvement...
         | 
         | (I wrote about this here:
         | https://tomverbeure.github.io/rtl/2018/11/26/Racing-the-
         | Beam...)
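          | 
          | A toy sketch of that bookkeeping (Python's ints are
          | unbounded, so imagine every intermediate clamped to a
          | 32-bit word):
          | 
          |     FRAC = 16                  # Q16.16: 16 fractional bits
          |     
          |     def to_fix(x):
          |         return round(x * (1 << FRAC))
          |     
          |     def fix_mul(a, b):
          |         # The raw product carries 32 fractional bits; you
          |         # must pick which 16 survive (here: truncate low).
          |         return (a * b) >> FRAC
          |     
          |     x = fix_mul(to_fix(3.25), to_fix(0.5))
          |     print(x / (1 << FRAC))     # 1.625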
        
         | cesaref wrote:
          | I think you're missing the point. Floating point isn't
          | madness, but people thinking it represents the real numbers,
          | without stopping to consider how it works, is madness.
          | 
          | For example, just consider the number of bugs that come down
          | to integer over/under flow. The reality is that we like to
          | work with a simplistic model of what is going on, and
          | ignoring the hard-to-reason-about edge cases really does
          | simplify problems; _if_ on revisiting the code we can either
          | handle these cases or rule them out as ever happening, then
          | we're golden.
         | 
         | As for fixed point arithmetic, this is how DSPs worked for
         | years, and there are still lots of edge cases to worry about. A
         | typical application would see many multiply accumulates for
         | something like an FIR filter, so you still have to consider how
         | large the value can end up, and split your integer into a
         | suitable scaling factor to avoid overflow whilst trading off
         | precision. The whole point of floating point is that this
         | tradeoff is always carried out on your behalf.
         | 
         | But don't get me started on denormals...
        
           | gautamcgoel wrote:
           | Thanks for your reply, this is an interesting perspective -
           | based on your comment, it seems likely that you have more
           | real-world experience with these issues than I do.
           | 
           | Regarding floating point performing this tradeoff for you:
           | yes, I understand that floating point arithmetic trades
           | simplicity for greater dynamic range, giving lots of
           | precision to small values and less precision to large values.
           | You can tackle some of these issues with fixed point
           | arithmetic at the cost of increased memory usage. For
           | example, instead of using a 32 bit float, you could use a 64
           | bit fixed point value, where the most significant 32 bits
           | represent the integral part and the least significant 32 bits
           | represent the fractional part. Obviously this type offers
           | both much greater range and greater precision (at every
           | scale) than the 32 bit float, at the cost of 2x memory usage.
           | I suspect that the 64 bit fixed point type might still be
           | faster to work with in hardware, due to the relative
           | simplicity of implementing fixed point arithmetic in
           | hardware. What do you think?
        
             | gpderetta wrote:
              | Why would you compare a 64-bit fixed point with a 32-bit
              | float? If you can afford twice the space, you would use a
              | 64-bit float. Also, I'm not a hardware designer, but I
              | think a 32-bit float multiplier is going to be less
              | complex than a 64-bit fixed-point multiplier.
        
           | Igelau wrote:
           | > floating point isn't madness, but people thinking it
           | represents the real numbers without stopping to consider how
           | it works is madness
           | 
           | This is certainly true in some places, but I would wager that
           | a large number of JS devs are completely ignorant of floating
           | point, and many of them likely lack the background knowledge
           | to understand it.
        
             | cratermoon wrote:
             | No need to pick on JS. Most devs in any mainstream language
             | are ignorant of how IEEE754 floating point works. In the
             | 2000s, it was Java developers. In the 90s, Perl
             | programmers. Today it's JS and Python. In another 10 years,
             | it could be Rust. https://0.30000000000000004.com/
        
               | Igelau wrote:
               | Not picking on JS. It's just the best representative of
               | the problem right now.
        
       | pixelpoet wrote:
       | Definitely also check out interval arithmetic:
       | https://en.wikipedia.org/wiki/Interval_arithmetic
       | 
        | Excellent article about it:
        | http://cs.utep.edu/interval-comp/hayes.pdf
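        | 
        | The core idea fits in a few lines (a toy sketch that fakes
        | outward rounding with math.nextafter, Python 3.9+; real
        | libraries switch the FPU rounding mode instead):
        | 
        |     from math import inf, nextafter
        |     
        |     def iadd(a, b):
        |         # Widen one float in each direction so the true sum is
        |         # guaranteed to lie inside the resulting interval.
        |         return (nextafter(a[0] + b[0], -inf),
        |                 nextafter(a[1] + b[1], inf))
        |     
        |     x = (0.1, 0.1)
        |     print(iadd(x, x))   # an interval enclosing the real 0.2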
        
         | enriquto wrote:
         | From my limited point of view, interval arithmetic looks
         | particularly wrong for everything.
         | 
         | It is as if we were using Bayes rule to update our knowledge of
         | numbers after arithmetic operations (which is a good idea), but
         | all the distributions were artificially changed to uniform
         | after the update (which is obviously a very bad idea).
         | 
         | Is there any practical application where interval arithmetic
         | makes sense?
        
           | petters wrote:
           | It can find the global optimum for a function of a few
           | variables.
        
           | gugagore wrote:
            | The big problem with interval arithmetic, stated in terms
            | of probability distributions, isn't that the distributions
            | are uniform, but that they are treated as independent.
            | 
            | The distribution of x-x is the degenerate distribution at
            | 0. But a naive implementation of uncertainty propagation
            | would not take into account the dependence between x and
            | x, and would give an over-approximation.
           | 
           | It's true that probability distributions generalize sets, and
           | sets generalize intervals. But there are many applications
           | where intervals are a great choice for representing
           | uncertainty.
           | 
           | Changing a distribution to uniform is not obviously a bad
           | idea because you can parameterize a uniform distribution and
           | represent the parameters compactly.
           | 
           | There is no compact representation for arbitrary
           | distributions.
        
       | johnklos wrote:
        | It's funny how closely some implementations can come to IEEE
        | 754 yet not show the same issues. The first example, which
        | takes just 54 iterations to finish on a typical system, runs
        | forever on a VAX, for instance. Which is to say: all the world
        | isn't IEEE 754, but accuracy and rounding issues should always
        | be considered.
        
       | jeffreygoesto wrote:
        | This would not be complete without a link to William Kahan
        | [0]...
       | 
       | [0] http://www.cs.berkeley.edu/~wkahan/Mindless.pdf
        
       ___________________________________________________________________
       (page generated 2021-08-27 23:02 UTC)