[HN Gopher] Examples of floating point problems
___________________________________________________________________
Examples of floating point problems
Author : grappler
Score : 161 points
Date : 2023-01-13 14:59 UTC (8 hours ago)
(HTM) web link (jvns.ca)
(TXT) w3m dump (jvns.ca)
| weakfortress wrote:
| Used to run into these problems all the time when I was doing
| work in numerical analysis.
|
| The PATRIOT missile error (it wasn't a _disaster_) was more due
| to the handling of timestamps than just floating point deviation.
| There were several concurrent failures that allowed the SCUD to
| hit its target. IIRC the clock drift was significant and was
| magnified by being converted to floating point and, importantly,
| _truncated_ into a 24 bit register. Moreover, they weren't
| "slightly off". The clock drift alone put the missile
| considerably off target.
|
| While I don't claim that floating points didn't have a hand in
| this error it's likely the correct handling of timestamps would
| not have introduced the problem in the first place. Unlike the
| other examples given this one is a better example of knowing your
| system and problem domain rather than simply forgetting to
| calculate a delta or being unaware of the limitations of IEEE
| 754. "Good enough for government work" struck again here.
| [deleted]
| ape4 wrote:
| All numbers in JavaScript are floats, unless you make an array
| with Int8Array(). https://developer.mozilla.org/en-
| US/docs/Web/JavaScript/Refe...
|
| I wonder if people sometimes make a one element integer array
| this way so they can have an integer to work with.
| mochomocha wrote:
| Regarding denormal/subnormal numbers mentioned as "weird": the
| main issue with them is that their hardware implementation is
| awfully slow, to the point of being unusable for most computations
| with even moderate FLOP counts.
| dunham wrote:
| I had one issue where pdftotext would produce different output on
| different machines (Linux vs Mac). It broke some of our tests.
|
| I tracked down where it was happening (involving an ==), but it
| magically stopped when I added print statements or looked at it
| in the debugger.
|
| It turns out the x86 was running the math at a higher precision
| and truncating when it moved values out of registers - as soon as
| it hit memory, things were equal. MacOS was defaulting to
| -ffloat-store to get consistency (their UI library is float
| based).
|
| There were too many instances of == in that code base (which IMO
| is a bad idea with floats), so I just added -ffloat-store to the
| Linux build and called it a day.
| alkonaut wrote:
| x86 (x87) FP is notoriously inconsistent because of the 80 bit
| extended precision that may or may not be used. In a JITed
| language like Java/C# it's even less fun, as it can theoretically
| be inconsistent even for the same compiled program on different
| machines.
|
| Thankfully the solution to that problem came when x86 (32 bit)
| mostly disappeared.
| WalterBright wrote:
| > NaN/infinity values can propagate and cause chaos
|
| NaN is the most misunderstood feature of IEEE floating point.
| Most people react to a NaN like they'd react to the dentist
| telling them they need a root canal. But NaN is actually a very
| valuable and useful tool!
|
| NaN is just a value that represents an invalid floating point
| value. The result of any operation on a NaN is a NaN. This means
| that NaNs propagate from the source of the original NaN to the
| final printed result.
|
| "This sounds terrible" you might think.
|
| But let's study it a bit. Suppose you are searching an array for
| a value, and the value is not in the array. What do you return
| for an index into the array? People often use -1 as the "not
| found" value. But then what happens when the -1 value is not
| noticed? It winds up corrupting further attempts to use it. The
| problem is that integers do not have a NaN value to use for this.
|
| What's the result of sqrt(-1.0)? It's not a number, so it's a
| NaN. If a NaN appears in your results, you know you've got a
| mistake in your algorithm or initial values. Yes, I know, it can
| be clumsy to trace it back to its source, but I submit it is
| _better_ than having a bad result go unrecognized.
|
| NaN has value beyond that. Suppose you have an array of sensors.
| One of those sensors goes bad (like they always do). What value
| do you use for the bad sensor? NaN. Then, when the data is
| crunched, if the result is NaN, you know that your result comes
| from bad data. Compare with setting the bad input to 0.0. You
| never know how that affects your results.
|
| This is why D (in one of its more controversial choices) sets
| uninitialized floating point values to NaN rather than the more
| conventional choice of 0.0.
|
| NaN is your friend!
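|
| A tiny Python sketch of the sensor idea (the readings are made up
| for illustration):
|
|     import math
|
|     readings = [20.1, 19.8, float("nan"), 20.3]  # third sensor died
|
|     mean = sum(readings) / len(readings)
|     print(mean)              # nan -- the bad reading poisons the result
|     print(math.isnan(mean))  # True, so you know not to trust the data
|
|     # versus silently substituting 0.0 for the dead sensor:
|     zeroed = [r if not math.isnan(r) else 0.0 for r in readings]
|     print(sum(zeroed) / len(zeroed))  # 15.05, plausible-looking but wrong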
| inetknght wrote:
| > _This means that NaNs propagate from the source of the
| original NaN to the final printed result._
|
| An exception would be better. Then you immediately get at the
| first problem instead of having to track down the lifetime of
| the observed problem to find the first problem.
| insulanus wrote:
| Definitely. Unfortunately, language implementations that
| guaranteed exceptions were not in wide use at the time. Also,
| to have a chance at being implemented on more than one CPU,
| it had to work in C and assembly.
| pwpwp wrote:
| I don't find this convincing.
|
| > What do you return for an index into the array?
|
| An option/maybe type would solve this much better.
|
| > Yes, I know, it can be clumsy to trace it back to its source
|
| An exception would be much better, alerting you to the exact
| spot where the problem occurred.
| WalterBright wrote:
| > An option/maybe type would solve this much better.
|
| NaN's are already an option type, although implemented in
| hardware. The checking comes for free.
|
| > An exception would be much better
|
| You can configure the FPU to cause an Invalid Operation
| Exception, but I personally don't find that attractive.
| pwpwp wrote:
| Good points!
| omginternets wrote:
| As far as I'm aware, there's no equivalent to a stack trace
| with NaN, so finding the origin of a NaN can be extremely
| tedious.
| ratorx wrote:
| The missing bit is language tooling. The regular floating
| point APIs exposed by most languages don't force handling of
| NaNs.
|
| The benefit of the option type is not necessarily just the
| extra value, but also the fact that the API forces you to
| handle the None value. It's the difference between null and
| Option.
|
| Even if the API were better, I think there's value in
| expressing it as Option<FloatGuaranteedToNotBeNaN>, which
| compiles down to using NaN for the extra value. That keeps it
| similar to other Option specialisations, and you don't have to
| remember that this special primitive type has Option built in.
| jcparkyn wrote:
| > NaN's are already an option type, although implemented in
| hardware
|
| The compromise with this is that it makes it impossible to
| represent a _non-optional_ float, which leads to the same
| issues as null pointers in C++/Java/etc.
|
| The impacts of NaN are almost certainly not as bad (in
| aggregate) as `null`, but it'd still be nice if more
| languages had ways to guarantee that certain numbers aren't
| NaN (e.g. with a richer set of number types).
| jordigh wrote:
| Exceptions are actually part of floats; they're called
| "signalling NaNs".
|
| So technically Python was correct when it decided that 0.0/0.0
| should raise an exception instead of just quietly returning
| NaN. Raising an exception is a standards-conforming option.
|
| https://stackoverflow.com/questions/18118408/what-is-the-
| dif...
| WalterBright wrote:
| In practice, I've found signalling NaNs to be completely
| unworkable and gave up on them. The trouble is they eagerly
| convert to quiet NaNs, too eagerly.
| maximilianburke wrote:
| I think the concept of NaNs is sound, but I think relying on
| them is fraught with peril, made so by the unobvious test for
| NaN-ness in many languages (i.e., "if (x != x)"), and by the
| lure of "fast math" optimizations, which do things like assume
| NaNs aren't possible and then dead-code-eliminate everything
| that's guarded by an "x != x" test.
|
| Really though, I'm a fan, I just think that we need better
| means for checking them in legacy languages and we need to
| entirely do away with "fast math" optimizations.
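|
| The self-inequality test, for the record (Python here, but the
| same holds anywhere IEEE semantics are respected):
|
|     import math
|     x = float("nan")
|     print(x == x)         # False: NaN compares unequal even to itself
|     print(x != x)         # True: the classic portable "is it NaN?" test
|     print(math.isnan(x))  # clearer way to say the same thing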
| WalterBright wrote:
| I call them "buggy math" optimizations. The dmd D compiler
| does not have a switch to enable buggy math.
| jordigh wrote:
| > What's the result of 1.0/0.0? It's not a number, so it's a
| NaN
|
| It's not often that I get to correct Mr D himself, but 1.0/0.0
| is...
| WalterBright wrote:
| You're right. I'll fix it.
| evancox100 wrote:
| Example 7 really got me; can anyone explain that? I'm not sure
| how "modulo" operation would be implemented in hardware, if it is
| a native instruction or not, but one would hope it would give a
| result consistent with the matching divide operation.
|
| Edit: x87 has FPREM1 which can calculate a remainder (accurately
| one hopes), but I can't find an equivalent in modern SSE or AVX.
| So I guess you are at the mercy of your language's library and/or
| compiler? Is this a library/language bug rather than a Floating
| Point gotcha?
| adrian_b wrote:
| This has nothing to do with the definition or implementation of
| the remainder or modulo function.
|
| It is a problem that appears whenever you compose an inexact
| function, like the conversion from decimal to binary, with a
| function that is not continuous, like the remainder a.k.a.
| modulo function.
|
| In decimal, 13.716 is exactly 3 times 4.572, so any kind of
| remainder must be zero, but after conversion from decimal to
| binary that relationship is no longer true, and because the
| remainder is not a continuous function its value may be wildly
| different from the correct value.
|
| When you compute with approximate numbers, like the floating-
| point numbers, as long as you compose only continuous
| functions, the error in the final result remains bounded and
| smaller errors in inputs lead to a diminished error in the
| output.
|
| However, it is enough to insert one discontinuous function in
| the computation chain to lose any guarantee about the
| magnitude of the error in the final result.
|
| The conclusion is that whenever computing with approximate
| numbers (which may also use other representations, not only
| floating-point) you have to be exceedingly cautious when using
| any function that is not continuous.
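|
| The effect is easy to reproduce in Python (64-bit floats): the
| inexact decimal-to-binary conversion feeds the discontinuous %
| operation.
|
|     print(13.716 / 4.572)  # 3.0 -- rounds back to the "right" answer
|     print(13.716 % 4.572)  # 4.571999999999999 rather than 0.0, because
|                            # the stored operands aren't exactly 13.716
|                            # and 4.572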
| timerol wrote:
| Based on the nearest numbers that floats represent, the two
| numbers are Y = 13.715999603271484375
| (https://float.exposed/0x415b74bc) and X =
| 4.57200002670288085938 (https://float.exposed/0x40924dd3).
|
| The division of these numbers is 2.9999998957049091386350361962
| 468173875300478102103478639802753918, but the nearest float to
| that is 3. (Exactly 3.) [2]
|
| The modulo operation can (presumably) determine that 3X > Y, so
| the modulo is Y - 2X, as normal.
|
| This gives inconsistent results, if you don't know that every
| float is actually a range, and "3" as a float includes some
| numbers that are smaller than 3.
|
| [1]
| https://www.wolframalpha.com/input?i=13.715999603271484375+%...
| [2] https://www.wolframalpha.com/input?i=2.99999989570490913863
| 5..., then https://float.exposed/0x40400000
| svat wrote:
| This is useful but note that Python uses 64-bit floats (aka
| "double"), so the right values are:
|
| * "13.716" means 13.7159999999999993037
| (https://float.exposed/0x402b6e978d4fdf3b)
|
| * "4.572" means 4.57200000000000006395
| (https://float.exposed/0x401249ba5e353f7d)
|
| * "13.716 / 4.572" means the nearest representable value to
| 13.7159999999999993037 / 4.57200000000000006395 which (https:
| //www.wolframalpha.com/input?i=13.7159999999999993037+...) is
| 3.0 (https://float.exposed/0x4008000000000000)
|
| * "13.716 % 4.572" means the nearest representable value to
| 13.7159999999999993037 % 4.57200000000000006395 namely to
| 4.5719999999999991758 (https://www.wolframalpha.com/input?i=1
| 3.7159999999999993037+...), which is 4.57199999999999917577
| (https://float.exposed/0x401249ba5e353f7c) printed as
| 4.571999999999999.
|
| ----------------
|
| Edit: For a useful analogy (answering the GP), imagine you're
| working in decimal fixed-point arithmetic with two decimal
| digits (like dollars and cents), and someone asks you for
| 10.01/3.34 and 10.01%3.34. Well,
|
| * 10.01 / 3.34 is well over 2.99 (it's over 2.997 in fact) so
| you'd be justified in answering 3.00 (the nearest
| representable value).
|
| * 10.01 % 3.34 is 3.33 (which you can represent exactly), so
| you'd answer 3.33 to that one.
|
| (For an even bigger difference: try 19.99 and 6.67 to get
| 3.00 as quotient, but 6.65 as remainder.)
| kilotaras wrote:
| Story time.
|
| Back in university I was taking part in a programming
| competition. I don't remember the exact details of the problem,
| but it was expected to be solved as a dynamic programming
| problem with dp[n][n] as the answer, n < 1000. But, wrangling
| some numbers around, one could show that dp[n][n] =
| dp[n-1][n-1] + 1/n, and the answer was just the sum of the
| first N elements of the harmonic series. Unluckily for us
| the intended solution had worse precision and our solution
| failed.
| HarryHirsch wrote:
| They didn't take into account that floats come with an
| estimated uncertainty, and that values that are the same within
| the limits of experimental error are identical? That's a really
| badly set problem!
| kilotaras wrote:
| I think in that particular case they just didn't do error
| analysis.
|
| The task was to output the answer with `10^-6` precision, which
| their solution didn't achieve. Funnily enough, a number of
| other teams went the "correct" route and passed (as they were
| doing the additions in the same order as the original
| solution).
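|
| A small illustration of how the order of the additions alone
| shifts the result (the exact digits may vary, but the two orders
| generally disagree in the last few places):
|
|     n = 10**7
|     forward  = sum(1.0 / i for i in range(1, n + 1))
|     backward = sum(1.0 / i for i in range(n, 0, -1))
|     print(forward == backward)      # typically False
|     print(abs(forward - backward))  # a handful of ulps of difference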
| jordigh wrote:
| One thing that pains me about this kind of zoo of problems is
| that people often have the takeaway, "floating point is full of
| unknowable, random errors, never use floating point, you will
| never understand it."
|
| Floating point is amazingly useful! There's a reason why it's
| implemented in hardware in all modern computers and why every
| programming language has a built-in type for floats. You should
| use it! And you should understand that most of its limitations
| are an inherent mathematical and fundamental limitation, it is
| logically impossible to do better on most of its limitations:
|
| 1. Numerical error is a fact of life, you can only delay it or
| move it to another part of your computation, but you cannot get
| rid of it.
|
| 2. You cannot avoid working with very small or very large things
| because your users are going to try, and floating point or not,
| you'd better have a plan ready.
|
| 3. You might not like that floats are in binary, which makes
| decimal arithmetic look weird. But doing decimal arithmetic does
| not get rid of numerical error, see point 1 (and binary
| arithmetic thinks your decimal arithmetic looks weird too).
|
| But sure, don't use floats for ID numbers, that's always a
| problem. In fact, don't use bigints either, nor any other
| arithmetic type for something you won't be doing arithmetic on.
| zokier wrote:
| > One thing that pains me about this kind of zoo of problems is
| that people often have the takeaway, "floating point is full of
| unknowable, random errors, never use floating point, you will
| never understand it."
|
| > Floating point is amazingly useful!
|
| Another thing about floats is that they are for the most part
| actually very predictable. In particular, all basic operations
| should produce bit-exact results to the last ulp. Also, because
| they are a language-independent standard, you can generally get
| the same behavior in different languages and platforms. This
| makes learning floats properly worthwhile because the knowledge
| is so widely applicable.
| jsmith45 wrote:
| >In particular all basic operations should produce bit-exact
| results to last ulp.
|
| As long as you are not using a compiler that utilizes x87's
| extended precision floats for intermediate calculations and
| silently rounds whenever it transfers values to memory (that
| used to be a common issue), and as long as you are not doing
| dumb stuff with compiler math flags.
|
| Also if you have any code anywhere in your program that relies
| on correct subnormal handling, then you need to be absolutely
| sure no code is compiled with `-ffast-math`, including in any
| dynamically loaded code in your entire program, or your math
| will break: https://simonbyrne.github.io/notes/fastmath/#flus
| hing_subnor...
|
| And of course if you are doing anything complicated with
| floating point numbers, there are entire fields of study about
| creating numerically stable algorithms, and determining the
| precision of algorithms with floating point numbers.
| gumby wrote:
| > Floating point is amazingly useful! There's a reason why it's
| implemented in hardware in all modern computers and why every
| programming language has a built-in type for floats.
|
| I completely agree with you even though I go out of my way to
| avoid FP, and even though, due to what I usually work on, I can
| often get away with avoiding FP (often fixed point works -- for
| me).
|
| IEEE-754 is a marvelous standard. It's a short, easy to
| understand standard attached to an absolutely mind boggling
| number of special cases and explanations as to why certain
| decisions in the simple standard were actually incredibly
| important (and often really smart and non-obvious). It's the
| product of some very smart people who had, through their
| careers, made FP implementations and discovered why various
| decisions turned out to have been bad ones.
|
| I'm glad it's in hardware, and not just because FP used to be
| quite slow and different on every machine. I'm glad it's in
| hardware because chip designers (unlike most software
| developers) are anal about getting things right, and
| implementing FP properly is _hard_ -- harder than using it!
| [deleted]
| carapace wrote:
| Floating point is a goofy hacky kludge.
|
| > There's a reason why it's implemented in hardware in all
| modern computers
|
| Yah, legacy.
|
| The reason we used it originally is that computers were small
| and slow. Now that they're big and fast we could do without it,
| except that there is already so much hardware and software out
| there that it will never happen.
| astrange wrote:
| Turning all your fixed-size numeric types into variable-sized
| numeric types introduces some really exciting performance and
| security issues. (At least if you consider DoS security.)
|
| I think fixed-point math is underrated though.
| dahfizz wrote:
| What replacement would you propose? They all have different
| tradeoffs.
| carapace wrote:
| (I just tried to delete my comment and couldn't because of
| your reply. Such is life.)
|
| ogogmad made a much more constructive comment than mine:
| https://news.ycombinator.com/item?id=34370745
|
| It really depends on your use case.
| ogogmad wrote:
| > And you should understand that most of its limitations are an
| inherent mathematical and fundamental limitation, it is
| logically impossible to do better on most of its limitations
|
| You can do exact real arithmetic. But this is only done by
| people who prove theorems with computers - or by the Android
| calculator! https://en.wikipedia.org/wiki/Computable_analysis
|
| Other alternatives (also niche) are exact rational arithmetic,
| computer algebra, arbitrary precision arithmetic.
|
| Fixed point sometimes gets used instead of floats because some
| operations lose no precision in fixed point, but most
| operations still do.
| saagarjha wrote:
| These are only relevant in some circumstances. For example, a
| calculator is typically bounded in the number of operations
| you can perform to a small number (humans don't add millions
| of numbers). This allows for certain representations that
| don't make sense elsewhere.
| lanstin wrote:
| I wouldn't call computable reals the reals. They are a subset
| of measure zero. Perhaps all we sentient beings can aspire to
| use, but still short of the glory of the completed infinities
| that even one arbitrary real represents.
|
| One half : )
| jordigh wrote:
| In my opinion, that's in the realm of "you can only delay
| it". Sure, you can treat real numbers via purely logical
| deductions like a human mathematician would, but at some
| point someone's going to ask, "so, where is the number on
| this plot?" and that's when it's time to pay the fiddler.
|
| Same for arbitrary-precision calculations like big rationals.
| That just gives you as much precision as your computer can
| fit in memory. You will still run out of precision, just
| later rather than sooner.
| ogogmad wrote:
| > Same for arbitrary-precision calculations like big
| rationals. That just gives you as much precision as your
| computer can fit in memory. You will still run out of
| precision, later rather than sooner.
|
| Oh, absolutely. This actually shows that floats are (in
| some sense) more rigorous than more idealised mathematical
| approaches, because they explicitly deal with finite
| memory.
|
| Oh, I remembered! There's also interval arithmetic, and
| variants of it like affine arithmetic. At least you _know_
| when you're losing precision. Why don't these get used
| more? These seem more ideal, somehow.
| gugagore wrote:
| If x is the interval [-1, 1], the typical implementation
| of IA will
|
| evaluate x-x to [-2, 2] (instead of [0, 0]), and
|
| evaluate x*x to [-1, 1] (instead of [0, 1]).
|
| Therefore the intervals become too conservative to be
| useful.
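|
| A toy Python version of naive interval arithmetic makes the
| dependency problem visible:
|
|     class Interval:
|         def __init__(self, lo, hi):
|             self.lo, self.hi = lo, hi
|         def __sub__(self, other):
|             return Interval(self.lo - other.hi, self.hi - other.lo)
|         def __mul__(self, other):
|             ps = [a * b for a in (self.lo, self.hi)
|                         for b in (other.lo, other.hi)]
|             return Interval(min(ps), max(ps))
|         def __repr__(self):
|             return f"[{self.lo}, {self.hi}]"
|
|     x = Interval(-1.0, 1.0)
|     print(x - x)  # [-2.0, 2.0]: x is treated as two unrelated intervals
|     print(x * x)  # [-1.0, 1.0] instead of [0.0, 1.0]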
| genneth wrote:
| Because the interval, on average, grows exponentially
| with the number of basic operations. So it quickly
| becomes practically useless.
| zokier wrote:
| > 3. You might not like that floats are in binary, which makes
| decimal arithmetic look weird. But doing decimal arithmetic
| does not get rid of numerical error, see point 1 (and binary
| arithmetic thinks your decimal arithmetic looks weird too).
|
| One thing that I suspect trips people a lot is decimal
| string/literal <-> (binary) float conversions instead of the
| floating point math itself. This includes the classic 0.1+0.2
| thing, and many of the problems in the article.
|
| I think these days using floating point hex strings/literals
| more would help a lot. There are also decimal floating point
| numbers that people largely ignore despite being standard for
| over 15 years.
| jordigh wrote:
| The only implementation of IEEE754 decimals I've ever seen is
| in Python's Decimal package. Is there an easily-available
| implementation anywhere else?
| zokier wrote:
| I don't think Python's Decimal is IEEE 754; instead it's some
| sort of arbitrary precision thingy.
|
| GCC has builtin support for decimal floats:
| https://gcc.gnu.org/onlinedocs/gcc/Decimal-Float.html
|
| There are also library implementations floating around,
| some of them are mentioned in this thread:
| https://discourse.llvm.org/t/rfc-decimal-floating-point-
| supp...
|
| decNumber also has Rust wrappers, if you are so inclined.
| jordigh wrote:
| Python's decimal absolutely is IEEE 754 (well, based on
| the older standard, which has now been absorbed into IEEE
| 754):
|
| https://github.com/python/cpython/blob/main/Lib/_pydecima
| l.p...
|
| Cool, didn't know that gcc had built-in support. But is
| it really as incomplete as it says there?
| zokier wrote:
| Huh, I didn't know it was that close, I'll grant that.
| But I'd say still no cigar.
|
| One of the most elementary requirements of IEEE754 is:
|
| > A programming environment conforms to this standard, in
| a particular radix, by implementing one or more of the
| basic formats of that radix as both a supported
| arithmetic format and a supported interchange format.
|
| (Section 3.1.2)
|
| While you could argue that you may configure Decimals
| context parameters to match those of some IEEE754 format
| and thus claim conformance as arithmetic format, Python
| has absolutely no support for the specified interchange
| formats.
|
| To be honest, seeing this I'm a bit befuddled as to why closer
| conformance with IEEE754 is not sought. A quick search
| found e.g. this issue report on adding IEEE754
| parametrized context, which is a trivial patch, and it
| has been just sitting there for 10 years:
| https://github.com/python/cpython/issues/53032
|
| Adding code to import/export BID/DPD formats, while maybe
| not as trivial, still seems like a comparatively small task
| and would improve interoperability significantly imho.
| Lind5 wrote:
| AI already has led to a rethinking of computer architectures, in
| which the conventional von Neumann structure is replaced by near-
| compute and at-memory floorplans. But novel layouts aren't enough
| to achieve the power reductions and speed increases required for
| deep learning networks. The industry also is updating the
| standards for floating-point (FP) arithmetic.
| https://semiengineering.com/will-floating-point-8-solve-ai-m...
| dkarl wrote:
| I'm not on Mastodon, so I'll share here: I inherited some
| numerical software that was used primarily to prototype new
| algorithms and check errors for a hardware product that solved
| the same problem. It was known that different versions of the
| software produced slightly different answers, for seemingly no
| reason. The hardware engineer who handed it off to me didn't seem
| to be bothered by it. He wasn't using version control, so I
| couldn't dig into it immediately, but I couldn't stop thinking
| about it.
|
| Soon enough I had two consecutive releases in hand, which
| produced different results, and which had _identical numerical
| code_. The only code I had changed that ran during the numerical
| calculations was code that ran _between_ iterations of the
| numerical parts of the code. IIRC, it printed out some status
| information like how long it had been running, how many
| calculations it had done, the percent completed, and the
| predicted time remaining.
|
| How could that be affecting the numerical calculations??? My
| first thought was a memory bug (the code was in C-flavored C++,
| with manual memory management) but I got nowhere looking for one.
| Unfortunately, I don't remember the process by which I figured
| out the answer, but at some point I wondered what instructions
| were used to do the floating-point calculations. The Makefile
| didn't specify any architecture at all, and for that compiler, on
| that architecture, that meant using x87 floating-point
| instructions.
|
| The x87 instruction set was originally created for floating point
| coprocessors that were designed to work in tandem with Intel
| CPUs. The 8087 coprocessor worked with the 8086, the 287 with the
| 286, the 387 with the 386. Starting with the 486 generation, the
| implementation was moved into the CPU.
|
| Crucially, the x87 instruction set includes a stack of eight
| 80-bit registers. Your C code may specify 64-bit floating point
| numbers, but since the compiled code has to copy those value into
| the x87 registers to execute floating-point instructions, the
| calculations are done with 80-bit precision. Then the values are
| copied back into 64-bit registers. If you are doing multiple
| calculations, a smart compiler will keep intermediate values in
| the 80-bit registers, saving cycles and gaining a little bit of
| precision as a bonus.
|
| Of course, the number of registers is limited, so intermediate
| values may need to be copied to a 64-bit register temporarily to
| make room for another calculation to happen, rounding them in the
| process. And that's how code interleaved with numerical
| calculations can affect the results even if it semantically
| doesn't change any of the values. Calculating percent completed,
| printing a progress bar -- the compiler may need to move values
| out of the 80-bit registers to make room for these calculations,
| and when the code changes (like you decide to also print out an
| estimated time remaining) the compiler might change which
| intermediate values are bumped out of the 80-bit registers and
| rounded to 64 bits.
|
| It was silly that we were executing these ancient instructions in
| 2004 on Opteron workstations, which supported SSE2, so I added a
| compiler flag to enable SSE2 instructions, and voila, the
| numerical results matched exactly from build to build. We also
| got a considerable speedup. I later found out that there's a bit
| you can flip to force x87 arithmetic to always round results to
| 64 bits, probably to solve exactly the problem I encountered, but
| I never circled back to try it.
| jordigh wrote:
| Oh man, those 80-bit registers on 32-bit machines were weird. I
| was very confused as an undergrad when I ran the basic program
| to find machine epsilon, and was getting a much smaller epsilon
| than I expected on a 64-bit float. Turns out, the compiler had
| optimised all of my code to run on registers and I was getting
| the machine epsilon of the registers instead.
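|
| The usual loop, for reference. In Python (which works on in-memory
| doubles) it lands on the expected 64-bit epsilon; the same idea
| compiled naively for x87 could report the register's smaller
| epsilon instead.
|
|     eps = 1.0
|     while 1.0 + eps / 2 != 1.0:
|         eps /= 2
|     print(eps)  # 2.220446049250313e-16, i.e. 2**-52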
| cratermoon wrote:
| Muller's Recurrence is my favorite example of floating point
| weirdness. See https://scipython.com/blog/mullers-recurrence/ and
| https://latkin.org/blog/2014/11/22/mullers-recurrence-roundo...
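|
| For anyone who doesn't want to click through, a quick Python
| version (digits shown are approximate):
|
|     from fractions import Fraction
|
|     def muller(x0, x1, n):
|         # x_{k+1} = 108 - (815 - 1500 / x_{k-1}) / x_k
|         for _ in range(n):
|             x0, x1 = x1, 108 - (815 - 1500 / x0) / x1
|         return x1
|
|     print(muller(4.0, 4.25, 30))    # drifts to ~100.0 in doubles
|     print(float(muller(Fraction(4), Fraction(17, 4), 30)))
|                                     # ~4.9999996, converging to 5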
| lifefeed wrote:
| My favorite floating point weirdness is that 0.1 can't be exactly
| represented in floating point.
| jrockway wrote:
| Isn't it equally weird that 1/3 can't be exactly represented in
| decimal?
| pitaj wrote:
| Yep! Too bad humanity has settled on decimal instead of
| dozenal (base 12).
| kps wrote:
| Indeed, 0.1 can be represented exactly in _decimal_ floating
| point, and can't be represented in _binary_ fixed point. It's
| just that fractional values are currently almost always
| represented using binary floating point, so the two get
| conflated.
| layer8 wrote:
| The reason why the 0.1 case is weird (unexpected) is that we
| use decimal notation in floating-point constants (in source
| code, in formats like JSON, and in UI number inputs), but the
| value that the constant actually ends up representing is
| really the closest binary number, where in addition the
| closeness depends on the FP precision used. If we would write
| FP values in binary or hexadecimal (which some languages
| support), the issue wouldn't arise.
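|
| Both points are easy to see from Python: Decimal can display the
| binary value that the literal 0.1 actually becomes, and a hex
| float literal names that value exactly.
|
|     from decimal import Decimal
|     print(Decimal(0.1))
|     # 0.1000000000000000055511151231257827021181583404541015625
|     print((0.1).hex())                        # 0x1.999999999999ap-4
|     print(float.fromhex("0x1.999999999999ap-4") == 0.1)  # True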
| dahfizz wrote:
| > Javascript only has floating point numbers - it doesn't have an
| integer type.
|
| Can anyone justify this? Do JS developers prefer not having exact
| integers, or is this something that everyone just kinda deals
| with?
| thdc wrote:
| I believe this is technically inaccurate; while Javascript
| groups most of the number values under, well, "number", modern
| underlying implementations may resort to performing integer
| operations when they recognize it is possible. There are also a
| couple of hacks you can do with bit operations to "work" with
| integers, although I don't remember them off the top of my head
| - typically used for truncating and whatnot, and mainly a
| performance thing.
|
| Also there are typed arrays and bigints if we can throw those
| in, too.
| saagarjha wrote:
| The way runtimes optimize arithmetic is an implementation
| detail and must conform to IEEE-754.
| thdc wrote:
| Fair point, I have been taking smis for granted
| enriquto wrote:
| > not having exact integers
|
| What do you mean? Floating-point arithmetic is, by design,
| exact for small integers. The result of adding 2.0 to 3.0 is
| exactly 5.0. This is one of the few cases where it is perfectly
| legitimate to compare floats for equality.
|
| In fact, using 64-bit doubles to represent ints you get way
| more ints than using plain 32-bit ints. Thus, choosing doubles
| to represent integers makes perfect sense (unless you worry
| about wasting a bit of memory and performance).
| josefx wrote:
| You can use doubles to store and calculate exact integer
| values. You just won't get 2^64 integers; instead you get the
| range +/-2^53.
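|
| Which is easy to check (Python floats are the same 64-bit
| doubles):
|
|     print(2.0**53)        # 9007199254740992.0, still exact
|     print(2.0**53 + 1.0)  # 9007199254740992.0, the +1 is lost
|     print(2.0**53 + 2.0)  # 9007199254740994.0, even integers survive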
| deathanatos wrote:
| Nowadays, it has BigInt.
|
| If you're very careful, a double can be an integer type. (A
| 53-bit one, I think?) (I don't love this line of thinking. It
| has _a lot_ of sharp edges. But JS programmers effectively do
| this all the time, often without thinking too hard about it.)
|
| (And even before BigInt, there's an odd u32-esque "type" in JS;
| it's not a real type -- it doesn't appear in the JS type
| system, but rather an internal one that certain operations will
| be converted to internally. That's why (0x100000000 | 0) == 0;
| even though 0x100000000 (and every other number in that
| expression, and the right answer) is precisely representable as
| an f64. This doesn't matter for JSON decoding, though, ... and
| most other things.)
| guyomes wrote:
| Example 4 mentions that the result might be different with the
| same code. Here is an example that is particularly counter-
| intuitive.
|
| Some CPU have the instruction FMA(a,b,c) = ab + c and it is
| guaranteed to be rounded to the nearest float. You might think
| that using FMA will lead to more accurate results, which is true
| most of the time.
|
| However, assume that you want to compute a dot product between 2
| orthogonal vectors, say (u,v) and (w,u) where w = -v. You will
| write:
|
| p = uv + wu
|
| Without FMA, that amounts to two products and an addition between
| two opposite numbers. This results in p = 0, which is the
| expected result.
|
| With FMA, the compiler might optimize this code to:
|
| p = FMA(u, v, wu)
|
| That is one FMA and one product. Now the issue is that wu is
| rounded to the nearest float, say x, which is not exactly -vu. So
| the result will be the nearest float to uv + x, which is not
| zero!
|
| So even for a simple formula like this, testing whether two
| vectors are orthogonal would not necessarily work by testing if
| the result is exactly zero. One recommended workaround in this
| case is to test if the dot product has an absolute value smaller
| than a small threshold.
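|
| A concrete sketch (this assumes math.fma, which landed in Python
| 3.13; the tiny value shown is approximate):
|
|     import math
|
|     u, v = 0.1, 0.3
|     w = -v
|     print(u * v + w * u)          # 0.0: the two rounded products
|                                   # cancel exactly
|     print(math.fma(u, v, w * u))  # small nonzero value (~1e-18):
|                                   # w*u was rounded before the fused
|                                   # multiply-add, so nothing cancels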
| [deleted]
| zokier wrote:
| Note that with gcc/clang you can control the auto-use of fma
| with compile flags (-ffp-contract=off). It is pretty crazy imho
| that gcc defaults to using fma.
| thxg wrote:
| > It is pretty crazy imho that gcc defaults to using fma
|
| Yes! Different people can make different performance-vs-
| correctness trade-offs, but I also think reproducible-by-
| default would be better.
|
| Fortunately, specifying a proper standard (e.g. -std=c99 or
| -std=c++11) implies -ffp-contract=off. I guess specifying
| such a standard is probably a good idea independently when we
| care about reproducibility.
|
| Edit: Thinking about it, in the days of 80-bit x87 FPUs,
| strictly following the standard (specifically, always
| rounding to 64 bits after every operation) may have been
| prohibitively expensive. This may explain gcc's GNU mode
| defaulting to -ffp-contract=fast.
| zokier wrote:
| > Edit: Thinking about it, it the days of 80-bit x87 FPUs,
| strictly following the standard (specifically, always
| rounding to 64 bits after every operation) may have been
| prohibitively expensive
|
| afaik you could just set the precision of x87 to 32/64/80
| bits and there would not be any extra cost to the
| operations
| lanstin wrote:
| In general, with reals with any source of error anywhere, this
| caution about equality is always correct. The odds of two reals
| being equal are zero.
| raphlinus wrote:
| I have an exception that proves the rule. I thought about
| responding to Julia's call, but decided this was too subtle.
| But here we go...
|
| A central primitive in 2D computational geometry is the
| orientation problem; in this case deciding whether a point
| lies to the left or right of a line. In real arithmetic, the
| classic way to solve it is to set up the line equation (so
| the value is zero for points on the line), then evaluate that
| for the given point and test the sign.
|
| The problem is of course that for points very near the line,
| roundoff error can give the wrong answer, it is in fact an
| example of cancellation. The problem has an exact answer, and
| can be solved with rational numbers, or in a related
| technique detecting when you're in the danger zone and upping
| the floating point precision just in those cases. (This
| technique is the basis of Jonathan Shewchuk's thesis).
|
| However, in work I'm doing, I want to take a different
| approach. If the y coordinate of the point matches the y
| coordinate of one of the endpoints of the line, then you can
| tell orientation exactly by comparing the x coordinates. In
| other cases, either you're far enough away that you know you
| won't get the wrong answer due to roundoff, or you can
| subdivide the line at that y coordinate. Then you get an
| orientation result that is not necessarily exactly correct
| wrt the original line, but you can count on it being
| consistent, which is what you really care about.
|
| So the ironic thing is that if you had a lint that said,
| "exact floating point equality is dangerous, you should use a
| within-epsilon test instead," it would break the reasoning
| outlined above, and you could no longer count on the
| orientations being consistent.
|
| As I said, though, this is a very special case. _Almost_
| always, it is better to use a fuzzy test over exact equality,
| and I can also list times I've been bitten by that
| (_especially_ in fastmath conditions, which are hard to avoid
| when you're doing GPU programming).
| thxg wrote:
| Yes, and this is not just a theoretical concern: There was an
| article here [1] in 2021 claiming that Apple M1's FMA
| implementation had "flaws". There was actually no such flaw.
| Instead, the author was caught off guard by the very phenomenon
| you are describing.
|
| [1] https://news.ycombinator.com/item?id=27880461
| kloch wrote:
| > if you add very big values to very small values, you can get
| inaccurate results (the small numbers get lost!)
|
| There is a simple workaround for this:
|
| https://en.wikipedia.org/wiki/Kahan_summation
|
| It's usually only needed when adding billions of values together
| and the accumulated truncation errors would be at an unacceptable
| level.
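|
| For reference, a minimal Python version (the exact drift of the
| naive sum depends on the inputs, but the compensated sum stays
| within an ulp or so of the true total):
|
|     def kahan_sum(values):
|         total = 0.0
|         c = 0.0                  # compensation for lost low-order bits
|         for x in values:
|             y = x - c
|             t = total + y
|             c = (t - total) - y  # what got rounded away this step
|             total = t
|         return total
|
|     vals = [0.1] * 10_000_000
|     print(sum(vals))        # naive: drifts noticeably from 1000000.0
|     print(kahan_sum(vals))  # compensated: agrees with 1e6 to ~1 ulp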
| phkahler wrote:
| It can also come up in simple control systems. A simple low-
| pass filter can fail to converge to a steady state value if the
| time constant is long and the sample rate is high.
|
| Y += (X-Y) * alpha * dt
|
| When dt is small and alpha is too, the right hand side can be
| too small to affect the 24-bit mantissa of the left.
|
| I prefer a 16/32-bit fixed point version that guarantees
| convergence to any 16-bit steady state. This happened in a power
| conversion system where dt=1/40000 and I needed a filter in the
| 10's of Hz.
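|
| A quick way to see the stall, assuming numpy is available to
| mimic single-precision hardware (parameters are illustrative,
| not the real converter's):
|
|     import numpy as np
|
|     x, y = np.float32(1.0), np.float32(0.0)
|     k = np.float32(1e-4)        # alpha * dt
|     for _ in range(300_000):    # ~30 time constants
|         y = np.float32(y + (x - y) * k)
|     print(y)  # stalls around 0.9997 rather than converging to 1.0,
|               # once (x - y) * k falls below half an ulp of y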
| jbay808 wrote:
| This is a very important application and a tougher problem
| than most would guess. There is a huge variety of ways to
| numerically implement even a simple transfer function, and
| they can have very different consequences in terms of
| rounding and overflow. Especially if you want to not only
| guarantee that it converges to a steady-state, but
| furthermore that the steady-state has no error. I spent a lot
| of time working on this problem for nanometre-accurate servo
| controls. Floating and fixed point each have advantages
| depending on the nature and dynamic range of the variable
| (eg. location parameter vs scale parameter).
| kergonath wrote:
| > It's usually only needed when adding billions of values
| together and the accumulated truncation errors would be at an
| unacceptable level.
|
| OTOH, it's easy to implement, so I have a couple of functions
| to do it easily, and I got quite a lot of use out of them. It's
| probably overkill sometimes, but sometimes it's useful.
| Aardwolf wrote:
| > but I wanted to mention it because:
|
| > 1. it has a funny name
|
| Reasoning accepted!
| [deleted]
| owisd wrote:
| My 'favourite' is that the quadratic formula
| (-b +- sqrt(b^2 - 4ac))/2a falls apart when you solve for the
| positive solution using floating point for cases where
| e = 4ac/b^2 is small, the workaround being to use the binomial
| expansion -b/2a * (0.5e + 0.125e^2 + O(e^3))
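|
| A Python illustration with a deliberately nasty case (the
| rationalized form below is another standard fix alongside the
| series expansion; digits shown are approximate):
|
|     import math
|
|     a, b, c = 1.0, 1e8, 1.0        # true small root is about -1e-8
|     d = math.sqrt(b * b - 4 * a * c)
|     print((-b + d) / (2 * a))      # ~ -7.45e-9: cancellation ate
|                                    # half the significant digits
|     print(2 * c / (-b - d))        # ~ -1e-08, matching the true root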
| svat wrote:
| If you have only a couple of minutes to develop a mental model of
| floating-point numbers (and you have none currently), the most
| valuable thing IMO would be to spend them staring at a diagram
| like this one:
| https://upload.wikimedia.org/wikipedia/commons/b/b6/Floating...
| (uploaded to Wikipedia by user Joeleoj123 in 2020, made using
| Microsoft Paint) -- it already covers the main things you need to
| know about floating-point, namely there are only finitely many
| discrete representable values (the green lines), and the gaps
| between them are narrower near 0 and wider further away.
|
| With just that understanding, you can understand the reason for
| most of the examples in this post. You avoid both the extreme of
| thinking that floating-point numbers are mathematical (exact)
| real numbers, and the extreme of "superstition" like believing
| that floating-point numbers are some kind of fuzzy blurry values
| and that any operation always has some error / is "random", etc.
| You won't find it surprising why 0.1 + 0.2 ≠ 0.3, but 1.0 + 2.0
| will always give 3.0, but 100000000000000000000000.0 +
| 200000000000000000000000.0 ≠ 300000000000000000000000.0. :-)
| (Sure this confidence may turn out to be dangerous, but it's
| better than "superstition".) The second-most valuable thing, if
| you have 5-10 minutes, may be to go to https://float.exposed/ and
| play with it for a while.
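|
| In Python terms:
|
|     print(0.1 + 0.2 == 0.3)     # False: the gaps near 0.3 don't line up
|     print(1.0 + 2.0 == 3.0)     # True: small integers are exact
|     print(1e23 + 2e23 == 3e23)  # False: up there the gap between
|                                 # neighbouring floats is ~33 million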
|
| Anyway, great post as always from Julia Evans. Apart from the
| technical content, her attitude is really inspiring to me as
| well, e.g. the contents of the "that's all for now" section at
| the end.
|
| The page layout example ("example 7") illustrates the kind of
| issue because of which Knuth avoided floating-point arithmetic in
| TeX (except where it doesn't matter) and does everything with
| scaled integers (fixed-point arithmetic). (It was even worse back
| then, before IEEE 754.)
|
| I think things like fixed-point arithmetic, decimal arithmetic,
| and maybe even exact real arithmetic / interval arithmetic are
| actually more feasible these days, and it's no longer obvious to
| me that floating-point should be the default that programming
| languages guide programmers towards.
| sacrosancty wrote:
| If you have even less time, just think of them as representing
| physical measurements made with practical instruments and the
| math done with analog equipment.
|
| The common cause of floating point problems is usually treating
| them as a mathematical ideal. The quirks appear at the extremes
| when you try to do un-physical things with them. You can't
| measure exactly 0 V with a voltmeter, or use an instrument for
| measuring the distance to stars then add a length obtained from
| a micrometer without entirely losing the latter's contribution.
| svat wrote:
| Thanks, I actually edited my post (made the second paragraph
| longer) after seeing your comment. The "physical" / "analog"
| idea does help in one direction (prevents us from relying on
| floating-point numbers in unsafe ways) but I think it brings
| us too close to the "superstition" end of the spectrum, where
| we start to think that floating-point operations are non-
| deterministic, start doubting whether we can rely on (say)
| the operation 2.0 + 3.0 giving exactly 5.0 (we can!), whether
| addition is commutative (it is, if working with non-NaN
| floats) and so on.
|
| You could argue that it's "safe" to distrust floating-point
| entirely, but I find it more comforting to be able to take at
| least some things as solid and reason about them, to refine
| my mental model of when errors can happen and not happen,
| etc.
|
| Edit: See also the _floating point isn't "bad" or random_
| section that the author just added to the post
| (https://twitter.com/b0rk/status/1613986022534135809).
| ogogmad wrote:
| Related: In numerical analysis, I found the distinction between
| forwards and backwards numerical error to be an interesting
| concept. The forwards error initially seems like the only right
| kind, but is often impossible to keep small in numerical linear
| algebra. In particular, Singular Value Decomposition cannot be
| computed with small forwards error. But the SVD can be computed
| with small backwards error.
|
| Also: The JSON example is nasty. Should IDs then always be
| strings?
| gugagore wrote:
| IIRC, forward error: the error between the given answer and the
| right answer to the given question.
|
| Backward error: the error between the given question, and the
| question whose right answer is the given answer.
|
| Easier to parse like this: a small forward error means that you
| give an answer close to the right one.
|
| A small backward error means that the answer you give is the
| right answer for a nearby question.
| deathanatos wrote:
| > _The JSON example is nasty._
|
| Specs, vs. their implementations, vs. backwards compat. JSON
| just defines a number type, and neither the grammar nor the
| spec places limits on it (though the spec does call out exactly
| this problem). So the JSON is to-spec valid. But
| implementations have limits as to what they'll decode: JS's is
| that it decodes to number (a double) by default, and thus,
| loses precision.
|
| (I feel like this issue is pretty well known, but I suppose it
| probably bites everyone once.)
|
| JS does have the BigInt type, nowadays. Unfortunately, while
| the JSON.parse API includes a "reviver" parameter, the way it
| ends up working means that it can't actually take advantage of
| BigInt.
|
| > _Should IDs then always be strings?_
|
| That's a decent-ish solution; as it side-steps the interop
| issues. String, to me, is not unreasonable for an ID, as you're
| not going to be doing math on it.
| mikehollinger wrote:
| Love it. I actually use Excel, which even power users take for
| granted, to highlight that people _really_ need to understand the
| underlying system, or that the system needs to have guard rails
| to prevent people from stubbing their toes. Microsoft even had to
| write a page explaining what might happen [1] with floating point
| weirdness.
|
| [1] https://docs.microsoft.com/en-
| us/office/troubleshoot/excel/f...
___________________________________________________________________
(page generated 2023-01-13 23:00 UTC)