[HN Gopher] IEEE 754 round-off errors
___________________________________________________________________
IEEE 754 round-off errors
Author : kulikov009
Score : 69 points
Date : 2021-08-27 11:58 UTC (11 hours ago)
(HTM) web link (yurichev.com)
(TXT) w3m dump (yurichev.com)
| madengr wrote:
| My kid has 100% in several HS classes, yet Canvas shows
| 99.99%. I tried explaining round-off errors, but she'll have
| none of it.
| chmod775 wrote:
| There have been a few articles about rounding floating point
| numbers correctly in the past few days, but I realized that
| might not always be what you want.
|
| For instance, when adding some small positive number to a
| floating point value, I would typically prefer that it round
| up if correct rounding would mean that the value remains
| unchanged.
|
| b ≠ 0 and a + b = a can be a source of fun bugs.
|
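| A quick Python illustration of the absorption case (values
| chosen only for demonstration):
|
|       >>> a = 2.0 ** 53   # first double whose spacing exceeds 1
|       >>> b = 1.0
|       >>> a + b == a      # b is nonzero, yet the sum is unchanged
|       True
|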
| However a point can be made that if you even need to _consider_
| such behavior, you should probably be using something more
| precise than your current number representation.
| gliptic wrote:
| What about stochastic rounding, as used in many quantized
| neural network implementations? :)
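|
| For anyone unfamiliar, a toy Python sketch of the idea
| (rounding to integers rather than to the float grid, purely
| for illustration):
|
|       import math, random
|
|       def stochastic_round(x: float) -> int:
|           # round down or up with probability proportional to the
|           # distance to each neighbor, so the expected value is x
|           lo = math.floor(x)
|           return lo + (1 if random.random() < x - lo else 0)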
| ChrisLomont wrote:
| Leads to non-reproducible or non-portable software.
|
| It's better to have well-defined and reproducible behavior,
| so one can reason how to make correct algorithms.
|
| Neural networks do it for very precise reasons nearly unique
| to them. If they were built on randomly rounding primitives,
| and that randomness was not the precise flavor one model or
| another needed, then you'd not easily be able to get the
| distributions needed for the task.
| da_chicken wrote:
| > However a point can be made that if you even need to consider
| such behavior, you should probably be using something more
| precise than your current number representation.
|
| I would agree with this. If you're in a place where you
| actually need to care about how accurately you're representing
| numbers with IEEE 754, you almost certainly should look at a
| precise numeric data type instead and set aside IEEE 754
| entirely. It's less common than we think that we need both the
| range of values that a float or double represents _and_ need
| those values represented precisely enough to endure
| aggregation.
|
| It would be nice to be able to give fewer bits to the
| exponent in a double, though. How often have I really needed
| 1e308 or 1e-324? How often have I even needed 1e38 or 1e-45?
| Then again, I don't remember how much precision we'd get back
| with more bits in the mantissa.
|
| Precision is a really funny thing, too. Remember, pi to ~37
| decimal places is precise enough to calculate the
| circumference of a circle the size of the observable universe
| (~10^26 m) to within the size of an atom (~10^-10 m), as at
| that point the primary source of imprecision is the fact that
| you used atoms to construct your circle.
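|
| A quick back-of-envelope check in Python:
|
|       >>> import math
|       >>> # relative precision needed: one atom over the circumference
|       >>> math.log10(math.pi * 1e26 / 1e-10)   # ~36.5, so ~37 digits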
| tomsmeding wrote:
| > It would be nice to be able to give fewer bits to the
| exponent in a double, though. How often have I really needed
| 1e308 or 1e-324? How often have I even needed 1e38 or 1e-45?
| Then again, I don't remember how much precision we'd get back
| with more bits in the mantissa.
|
| I'm not sure that helps. Going from 10^308 to 10^38,
| approximately, takes 3 bits from the 11-bit exponent and adds
| 3 bits to the 52-bit mantissa. This gives you _almost_ one
| extra decimal digit of precision. It's not worth it, and if
| it were, you'd _certainly_ need a more special-purpose number
| type.
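|
| The arithmetic, sketched in Python:
|
|       >>> import math
|       >>> math.log10(2.0 ** 128)  # ~38.5: the ~1e38 range of an 8-bit exponent
|       >>> 3 * math.log10(2)       # ~0.9 extra decimal digits from 3 mantissa bits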
| mbauman wrote:
| That's just trading one class of bugs for another. Adding a
| million 10^-50s to 1.0 gives a result that is still far
| closer to 1.0 than to the next floating point number.
|
| Your algorithm would give an answer that's a million
| representable values away.
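|
| Concretely, in Python (3.9+ for math.nextafter):
|
|       >>> 1.0 + 1e-50 == 1.0        # the true sum is far closer to 1.0...
|       True
|       >>> import math
|       >>> math.nextafter(1.0, 2.0)  # ...than to the next double up
|       1.0000000000000002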
| chmod775 wrote:
| It's a trade-off regardless of what you're doing.
|
| One class of bug can make you overwrite values in a map or
| get your code 'stuck' at a number; the other just gives you a
| (more) wrong number.
|
| I know which kind of error would be preferred in a video
| game, for instance.
| mbauman wrote:
| > the other just gives you a (more) wrong number.
|
| This is a common misconception: IEEE754 floating point
| arithmetic is not wrong or fuzzy or inherently inexact. It
| is very exact; it just has limited precision.
| chmod775 wrote:
| No. In mathematical terms, typical floating point
| arithmetic has _perfect_ precision - it is repeatable -
| but limited accuracy - it can deviate from the real
| value. It's a shortcoming of CS vocabulary that
| "precision" is also used to refer to the number of bits
| used.
|
| "Exactness" is a fuzzy term used in common English to
| refer to the concept of both precision and accuracy, or
| either really. It's not really used in mathematics.
| mbauman wrote:
| Oh cool, you want to use different definitions. No
| matter. The point remains: the floating point
| representation of 1/4 is not approximate. Similarly, the
| closest floating point value to 1/10 also _precisely_
| defines some value: it's exactly 0.10000000000000000555111
| 51231257827021181583404541015625.
|
| Now whether that's accurate enough for you depends upon
| your application. But the numbers don't care.
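|
| Python's decimal module will show you that exact value:
|
|       >>> from decimal import Decimal
|       >>> Decimal(0.1)
|       Decimal('0.1000000000000000055511151231257827021181583404541015625')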
| chmod775 wrote:
| > Oh cool, you want to use different definitions. No
| matter.
|
| It's not "different" terminology. It's the correct
| terminology. Which you should have down if you want to
| wisecrack at people.
|
| You really just took a dozen words that were referring to
| the result of a calculation being wrong (as opposed to
| the _real_ result) out of context to make them about
| floating point numbers in a vacuum.
|
| Then after construing some new context, you tried to
| correct the straw man by giving an explanation in which
| you used pretty much all the terminology incorrectly.
|
| Don't be surprised when people come back at you in the
| same tone of voice if you leave an opening as wide as a
| barn door.
|
| I assume both of us are capable of constructing sentences
| using only precise mathematical language. This is not the
| place or time to flaunt that ability and correct others
| on things that are still perfectly understandable. We're
| not writing a paper.
| hermitdev wrote:
| Another class of bugs silently transfers fractions of a
| cent into a bank account...
|
| I did personally have a bug in a script once, while
| working in finance. Working on a portfolio of test
| contracts with trillions of dollars in notional value, I
| was off by a single penny. It was an easily found and
| fixed one-line bug due to truncation instead of correct
| rounding. But, yes: $1e-2 off on a portfolio worth $1e12
| was enough to block release.
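|
| The classic failure mode, sketched in Python:
|
|       >>> import math
|       >>> cents = 1234.56789 * 100
|       >>> math.trunc(cents) / 100   # truncation drops the part-cent
|       1234.56
|       >>> round(cents) / 100        # round-to-nearest keeps it
|       1234.57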
| deepsun wrote:
| Well, you shouldn't do exact equality checks on floats
| anyway; that can be a source of fun bugs regardless of
| rounding.
| gugagore wrote:
| I see this advice often and I find that it misses the point.
| The problem isn't the equality check, the problem is finite
| precision.
|
| There are situations that call for exact equality checking in
| floating point.
| cratermoon wrote:
| In which case, you don't use IEEE 754 floating point for
| math, you use some kind of numerics type or library that
| correctly models the domain.
| aardvark179 wrote:
| If you require a distinct point, you can check whether a is
| smaller than the minimum increment that can be added to b,
| and add that increment instead. But in the few cases I've
| needed that sort of behaviour, it has been easier to do the
| calculation and then check for equality.
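|
| A sketch of that check in Python (3.9+, for math.ulp):
|
|       >>> import math
|       >>> b, a = 1e16, 0.5
|       >>> a < math.ulp(b)              # a alone would be absorbed
|       True
|       >>> b + max(a, math.ulp(b)) > b  # so add the increment instead
|       True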
| jimktrains2 wrote:
| > However a point can be made that if you even need to consider
| such behavior, you should probably be using something more
| precise than your current number representation.
|
| Or a different algorithm. For instance, when computing a mean,
| keeping the current mean and count instead of the current total
| and count is more numerically stable.
|
| https://statisticsblog412536603.wordpress.com/2019/10/17/str...
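|
| A minimal sketch of that streaming-mean update:
|
|       def update_mean(mean: float, count: int, x: float):
|           # nudge the running mean toward the new sample instead of
|           # accumulating a total that dwarfs each incoming term
|           count += 1
|           mean += (x - mean) / count
|           return mean, count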
| Joker_vD wrote:
| Oh for the... don't loop until "x == y", loop until "abs(x-y) <
| MAX_TOLERABLE_ABS_ERROR", that's been the standard advice since
| forever.
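|
| That is, the usual shape of the loop (a sketch; the
| tolerance is a placeholder and must be chosen per problem):
|
|       def iterate_until_close(x: float, step, tol: float = 1e-9):
|           # repeat until successive values agree to within tol
|           y = step(x)
|           while abs(x - y) >= tol:
|               x, y = y, step(y)
|           return y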
| Igelau wrote:
| That was my knee jerk reaction as well, but if you look again
| you'll see the loop is doing a different check than you think.
| hermitdev wrote:
| Took me a second look to realize what's going on. My first
| thought was "only 54 iterations?", as in (initial error) *
| 54 >= 1. On the second look, it clicked that no, it's
| actually more like (initial error) * 2^54 >= 1.
| FabHK wrote:
| (There's no '==' on the entire page (except in Mathematica
| code, or for printing the results of an equality test).)
| zokier wrote:
| I feel a lot of the unintuitiveness of floats stems from
| inexact decimal literals. With hex literals the situation
| would be marginally clearer.
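|
| Python exposes hex float notation directly:
|
|       >>> (0.1).hex()    # 0.1 is not exactly representable
|       '0x1.999999999999ap-4'
|       >>> (0.25).hex()   # 1/4 is, so the literal is exact
|       '0x1.0000000000000p-2'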
| rbonvall wrote:
| When I create my own programming language, the syntax for float
| literals will be: ≈3.14159
| enriquto wrote:
| This does not make sense: floating point numbers are not
| really "approximate". For example, small integer arithmetic
| using floats is guaranteed to be exact. Your language would
| be very confusing for this common use case.
| frakt0x90 wrote:
| I think it was a joke
| lifthrasiir wrote:
| This does kinda exist in Scheme and Racket:
|       > (/ 1 10)
|       1/10
|       > (exact? (/ 1 10))
|       #t
|       > (/ 1.0 10.0)
|       0.1
|       > (exact? (/ 1.0 10.0))
|       #f
|       > (/ #e1.0 #e10.0)  ; #e for "exact"
|       1/10
|       > (/ #i1 #i10)      ; #i for "inexact"
|       0.1
|
| As mentioned in HN before [1], Pyret (which has the same
| origin as Racket) went further and made the exact literal
| (now "Exactnum") default, so the inexact literal (now
| "Roughnum") always requires a preceding `~`.
|
| [1] https://news.ycombinator.com/item?id=28263486
| [deleted]
| rcgorton wrote:
| IEEE 754-1985 is VERY different from 754-2008. The 2008
| revision is much more complex (personal experience from trying
| to support it in a compiler).
| cratermoon wrote:
| See also
|
| https://0.30000000000000004.com/
|
| https://scipython.com/blog/mullers-recurrence/
|
| https://latkin.org/blog/2014/11/22/mullers-recurrence-roundo...
| glouwbug wrote:
| Just do everything with integers. RTS games of the '90s did it.
| They were able to sync 8 PCs across the world running the same
| historical simulation (think Age of Kings) by passing nothing
| more than mouse clicks and keyboard presses over a peer-to-peer
| connection. Determinism is king in lockstep programming.
| ChrisLomont wrote:
| > Just do everything with integers.
|
| It gets cripplingly inefficient to do many calculations with
| integers. Simulating weather, or doing nearly any scientific
| calculation, using only integers would be a massively
| terrible idea.
|
| Same with neural networks (even though there is some work in
| that area), or a large part of signal processing, or medical
| work.... and on and on.
|
| >Determinism is king in lockstep programming
|
| Pretty much all modern games don't do lockstep because it's
| far too slow, messy, and error prone. It's much better to
| build a tolerant system that can handle missed frames and
| lag, and instead do predictive and catch-up computing.
|
| Maybe use the right tool for the job, and learn new tools to
| access new jobs, is a better approach. Learning IEEE 754
| nuances is a useful skill, and one can program many things by
| using this tool well that cannot be done efficiently or cleanly
| otherwise.
| throwaway858 wrote:
| There is a persistent myth that floating point is non-
| deterministic and therefore cannot be used for "lockstep"
| multiplayer games.
|
| This may have been true in the 90s because of buggy C
| compilers, but has long not been the case.
|
| You can use 32-bit or 64-bit floating point and your code
| will be completely deterministic across all modern CPUs and
| operating systems, and even across most programming
| languages.
|
| The only things you need to avoid are trig functions (sin,
| cos), since different platforms have different
| implementations, so you use your own implementations of these
| functions (for games you can use an approximation, which will
| be even faster than the library implementation). Many
| platforms have an identical sqrt function, but this should be
| thoroughly tested.
|
| Also, if you are running on an Intel CPU on a 32-bit OS (rare
| nowadays), you need to call a function at the start of your
| program to disable 80-bit extended FP.
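|
| As a toy illustration of rolling your own trig (a crude
| Taylor-series sine in Python, nowhere near production
| quality):
|
|       import math
|
|       def det_sin(x: float) -> float:
|           # naive range reduction; fmod is exactly rounded in IEEE 754
|           x = math.fmod(x, 2.0 * math.pi)
|           term, total = x, x
|           for n in range(1, 10):
|               # sin x = x - x^3/3! + x^5/5! - ...
|               term *= -x * x / ((2 * n) * (2 * n + 1))
|               total += term
|           return total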
| protastus wrote:
| > There is a persistent myth that floating point is non-
| deterministic and therefore cannot be used for "lockstep"
| multiplayer games.
|
| Indeed, on all systems I've used, FP is deterministic in the
| strict sense that running the same code twice on the same
| platform (with identical HW and SW stack) will produce the
| same results.
|
| But the results may change depending on:
|
| - programming language
| - libraries used and their versions
| - underlying silicon
| - whether SIMD is used
| - FPU flags
| - compiler flags (example: ffast-math affects many things
| behind the scenes, as does /fp:fast, including things that
| dramatically affect accuracy like the use of approximate
| reciprocal instructions)
| - bugs affecting all of the above
|
| At scale, this matrix becomes difficult to track and control.
| So it shouldn't be a surprise that reproducibility is an
| issue, and this myth persists.
| throwaway858 wrote:
| > But the results may change depending on:
|
| > - programming language
|
| As I mentioned in my comment, this is part of the myth.
| Most programming languages have the exact same semantics
| regarding floating point, and if you port an algorithm
| between them you will get identically deterministic
| results. This is true for at least C/C++, Java, C#, Python,
| JavaScript, and many more.
|
| > - libraries used and their versions
|
| As I mentioned in my comment, this is the only true part of
| the myth. sqrt should be safe, but trig functions will be
| different. For games, a lot of them use their own "fast"
| versions anyway, so it's not a problem.
|
| > - underlying silicon
|
| Nothing to worry about here. All modern CPUs have identical
| IEEE 754 semantics (except 32-bit Intel, which I addressed
| in my original comment).
|
| > - whether SIMD is used
|
| Again, nothing to worry about here: whether it is hand-
| coded SIMD assembly or compiler-generated, you will always
| get identical and portable results. The compiler is
| obligated to ensure this for autogenerated SIMD.
|
| > - FPU flags
|
| Not relevant on any system from the last 10 years.
|
| > - compiler flags (example: ffast-math affects many things
| behind the scenes, as does /fp:fast, including things that
| dramatically affect accuracy like the use of approximate
| reciprocal instructions)
|
| ffast-math by definition introduces imprecise semantics, so
| simply don't use it.
|
| > - bugs affecting all of the above
|
| Compilers (and CPUs!) may have been buggy decades ago, but
| in the modern age floating point is extensively tested. I
| would be very surprised to see a floating-point-related bug
| in a compiler or programming language from the last 10
| years.
| benibela wrote:
| I still have these problems
|
| > - programming language
|
| I decided not to use C, because it is unsafe, and to write
| all my code in Pascal.
|
| Now the FreePascal double parsing does not work correctly.
|
| For one, it always uses 80-bit floats: even when parsing a
| double, it parses to an 80-bit float and then rounds it to
| a double afterwards, so the last bit is usually rounded
| wrongly. They refuse to fix it, because it is easier to
| maintain just one 80-bit parsing function than multiple
| parsing functions for different types.
|
| >- libraries used and their versions
|
| So I moved to someone's double parsing library.
|
| It just did not work: it simply lacked the algorithms
| required for full double-precision parsing.
|
| But they were happy to fix it.
|
| >- FPU flags
|
| After fixing it, that double parsing library still did
| not work... on 32-bit.
|
| And that was although the library was implementing the
| correct operations. It turned out FreePascal compiles
| 64-bit targets using SSE, and 32-bit targets using the
| x87 FPU, which was set to 80-bit precision.
|
| In reverse, formatting the double as a string is also
| hard and ambiguous. FreePascal usually prints two more
| decimal digits than required (in the new implementation;
| they still had a broken implementation a handful of years
| ago). I wrote my own implementation that prints the
| shortest decimal using arbitrary-precision arithmetic,
| but it is really slow. (I tried to use a library, but it
| did not have enough precision, and after they tried to
| fix it, it is completely broken.)
| gumby wrote:
| Or if you need rationals, use fixed point (unfortunately not
| native in many languages)
| kibwen wrote:
| I like to consider what it would look like if a programming
| language forbade floating point literals from containing inexact
| values. This means that a literal like `0.1` would be disallowed,
| and you would instead be forced to write out the literal as the
| exact nearest representable value, e.g. 13421773 * 2^-27 for a
| 32-bit float. Profoundly awful for usability, but at least nobody
| could ever claim to be surprised that floating point values are
| imprecise. :P
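|
| Python can show that exact nearest value, here for a 64-bit
| double:
|
|       >>> from fractions import Fraction
|       >>> Fraction(0.1)   # denominator is 2**55
|       Fraction(3602879701896397, 36028797018963968)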
| johndough wrote:
| I like the idea, but on second thought, it is easily fooled:
|
|       >>> 3.0/10.0 == 1.0/10.0 + 2.0/10.0
|       False
| ivegotnoaccount wrote:
| No need to even go that far; numbers of sufficiently
| different magnitudes are enough. 1e300 + 1 will be equal to
| 1e300, and it would still be even if 1 were added 1e300
| times.
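|
| Easy to confirm:
|
|       >>> 1e300 + 1 == 1e300
|       True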
| benibela wrote:
| And you could not write 1e300, since it is not exact.
|
| You need to write 10000000000000000525047602552044202487044
| 68581108159154915854115511802457988908195786371375080447864
| 04370444383288387817694252323536043057564479218478670698284
| 83872009265758037378302337947880900593689532349707999450811
| 19038967640880074652742780142494579258788820056842838115669
| 472196386865459400540160
| deckard1 wrote:
| Not quite the same, but Scheme has rationals.
|
| Now that I think about it, the Scheme standard (R5RS at least)
| does not require an implementation to support the full numeric
| tower. I can't remember which parts are required, but I do
| wonder now if it permits an implementation that _only_ has
| rationals and drops floating point. Most implementations,
| obviously, go the other route and only support floats. But it
| would be interesting to see.
| gautamcgoel wrote:
| There's a simple solution to all this floating point madness:
| replace floating point arithmetic with fixed point arithmetic,
| which is associative, more energy efficient in hardware, and much
| easier to reason about.
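|
| The associativity contrast is easy to see in Python:
|
|       >>> (0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3)   # floats
|       False
|       >>> (1 + 2) + 3 == 1 + (2 + 3)               # integers
|       True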
| nestorD wrote:
| Fixed-point numbers have a very limited range, which means
| they do not play well with multiplication and division. Doing
| scientific computation with them is a game of hunting down
| which division or multiplication got you that 0 or +inf, and
| then searching for an alternative formula, if one even
| exists; they are much more fragile than IEEE 754. They are
| not used that often, so people are unfamiliar with their
| shortcomings, but I would not make them the default.
|
| Here is a little bit of information on them:
| https://pages.tacc.utexas.edu/~eijkhout/istc/html/arithmetic...
| tverbeure wrote:
| If you're doing divisions and multiplications with
| fixed-point arithmetic and the result must fit in a word of
| the same size as the operands, you need to decide for each
| operation how many fractional bits from each operand you're
| going to retain in the result.
|
| In other words: you need to understand the range of numbers
| you're dealing with every step of the way.
|
| That's not really an improvement...
|
| (I wrote about this here:
| https://tomverbeure.github.io/rtl/2018/11/26/Racing-the-
| Beam...)
| cesaref wrote:
| I think you're missing the point. Floating point isn't
| madness, but people thinking it represents the real numbers
| without stopping to consider how it works is madness.
|
| For example, just consider the number of bugs that come down
| to integer over/underflow. The reality is that we like to
| work with a simplistic model of what is going on, and
| ignoring the hard-to-reason-about edge cases really does
| simplify problems; _if_ on revisiting the code we can either
| handle or rule out these cases as ever happening, then we're
| golden.
|
| As for fixed point arithmetic, this is how DSPs worked for
| years, and there are still lots of edge cases to worry about. A
| typical application would see many multiply accumulates for
| something like an FIR filter, so you still have to consider how
| large the value can end up, and split your integer into a
| suitable scaling factor to avoid overflow whilst trading off
| precision. The whole point of floating point is that this
| tradeoff is always carried out on your behalf.
|
| But don't get me started on denormals...
| gautamcgoel wrote:
| Thanks for your reply, this is an interesting perspective -
| based on your comment, it seems likely that you have more
| real-world experience with these issues than I do.
|
| Regarding floating point performing this tradeoff for you:
| yes, I understand that floating point arithmetic trades
| simplicity for greater dynamic range, giving lots of
| precision to small values and less precision to large values.
| You can tackle some of these issues with fixed point
| arithmetic at the cost of increased memory usage. For
| example, instead of using a 32 bit float, you could use a 64
| bit fixed point value, where the most significant 32 bits
| represent the integral part and the least significant 32 bits
| represent the fractional part. Obviously this type offers
| both much greater range and greater precision (at every
| scale) than the 32 bit float, at the cost of 2x memory usage.
| I suspect that the 64 bit fixed point type might still be
| faster to work with in hardware, due to the relative
| simplicity of implementing fixed point arithmetic in
| hardware. What do you think?
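|
| A minimal Python sketch of that 32.32 layout (helper names
| made up for illustration; real hardware would also need
| overflow handling, which Python's big integers hide):
|
|       FRAC_BITS = 32
|       SCALE = 1 << FRAC_BITS
|
|       def to_fixed(x: float) -> int:
|           return round(x * SCALE)
|
|       def fixed_mul(a: int, b: int) -> int:
|           # the raw product carries 64 fractional bits; rescale
|           return (a * b) >> FRAC_BITS
|
|       def to_float(a: int) -> float:
|           return a / SCALE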
| gpderetta wrote:
| Why would you compare a 64-bit fixed point with a 32-bit
| float? If you can afford twice the space, you would use a
| 64-bit float. Also, I'm not a hardware designer, but I think
| that a 32-bit float multiplier is going to be less complex
| than a 64-bit fixed-point multiplier.
| Igelau wrote:
| > floating point isn't madness, but people thinking it
| represents the real numbers without stopping to consider how
| it works is madness
|
| This is certainly true in some places, but I would wager that
| a large number of JS devs are completely ignorant of floating
| point, and many of them likely lack the background knowledge
| to understand it.
| cratermoon wrote:
| No need to pick on JS. Most devs in any mainstream language
| are ignorant of how IEEE754 floating point works. In the
| 2000s, it was Java developers. In the 90s, Perl
| programmers. Today it's JS and Python. In another 10 years,
| it could be Rust. https://0.30000000000000004.com/
| Igelau wrote:
| Not picking on JS. It's just the best representative of
| the problem right now.
| pixelpoet wrote:
| Definitely also check out interval arithmetic:
| https://en.wikipedia.org/wiki/Interval_arithmetic
|
| Excellent article about it: http://cs.utep.edu/interval-
| comp/hayes.pdf
| enriquto wrote:
| From my limited point of view, interval arithmetic looks
| particularly wrong for everything.
|
| It is as if we were using Bayes' rule to update our knowledge
| of numbers after arithmetic operations (which is a good
| idea), but all the distributions were artificially changed to
| uniform after the update (which is obviously a very bad
| idea).
|
| Is there any practical application where interval arithmetic
| makes sense?
| petters wrote:
| It can find the global optimum for a function of a few
| variables.
| gugagore wrote:
| The big problem with interval arithmetic, stated in terms of
| probability distributions, isn't that the distributions are
| uniform, but that the distributions are independent.
|
| The distribution of x-x is the degenerate distribution 0. But
| the naive implementation of uncertainty propagation would not
| take into account the dependence between x and x, and would
| give an over-approximation.
|
| It's true that probability distributions generalize sets, and
| sets generalize intervals. But there are many applications
| where intervals are a great choice for representing
| uncertainty.
|
| Changing a distribution to uniform is not obviously a bad
| idea because you can parameterize a uniform distribution and
| represent the parameters compactly.
|
| There is no compact representation for arbitrary
| distributions.
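|
| A toy Python illustration of that dependency problem (the
| Interval class is made up just for this example):
|
|       class Interval:
|           def __init__(self, lo, hi):
|               self.lo, self.hi = lo, hi
|
|           def __sub__(self, other):
|               # naive rule: treats the operands as independent
|               return Interval(self.lo - other.hi, self.hi - other.lo)
|
|           def __repr__(self):
|               return f"[{self.lo}, {self.hi}]"
|
|       x = Interval(1.0, 2.0)
|       print(x - x)   # [-1.0, 1.0], though x - x is exactly 0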
| johnklos wrote:
| It's funny how closely some implementations can come to IEEE
| 754 yet not show the same issues. The first example, which
| takes just 54 iterations to finish on a typical system, runs
| forever on a VAX, for instance. Which is to say: all the
| world isn't IEEE 754, but accuracy and rounding issues should
| always be considered.
| jeffreygoesto wrote:
| Would not be complete without a link to William Kahan [0]...
|
| [0] http://www.cs.berkeley.edu/~wkahan/Mindless.pdf
___________________________________________________________________
(page generated 2021-08-27 23:02 UTC)