[HN Gopher] Catastrophic Cancellation (2020)
       ___________________________________________________________________
        
       Catastrophic Cancellation (2020)
        
       Author : _ZeD_
       Score  : 69 points
       Date   : 2021-03-19 08:05 UTC (14 hours ago)
        
 (HTM) web link (twitter.com)
 (TXT) w3m dump (twitter.com)
        
       | ForHackernews wrote:
       | Does Python's Decimal type handle this correctly?
        
         | lazypenguin wrote:
         | Yes it does and should be used when precision is important.
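          | 
          | For illustration, a minimal sketch (assuming stock CPython;
          | this example is not from the thread):
          | 
          |     from decimal import Decimal, getcontext
          | 
          |     # float: 0.1 + 0.2 and 0.3 differ slightly in binary, and
          |     # the subtraction exposes that tiny gap as the result.
          |     print(0.1 + 0.2 - 0.3)  # 5.551115123125783e-17
          | 
          |     # Decimal: these inputs are represented exactly, so the
          |     # cancellation is exact too.
          |     print(Decimal("0.1") + Decimal("0.2") - Decimal("0.3"))  # 0.0
          | 
          |     # Precision is configurable (default 28 significant digits)
          |     # but still finite, so subtracting two values that agree in
          |     # their leading digits can still lose digits at that limit.
          |     getcontext().prec = 50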
        
       | fchu wrote:
       | Am I the only one who thinks the concept of relative error is not
       | meaningful in this context?
       | 
        | It gives a disproportionate meaning to 0 without real physical
        | consideration, e.g.:
       | 
        | - 0.10 C +- 0.1 (wow, 100% relative error)
        | 
        | - 273.25 K +- 0.1 (meh, 0.04% relative error)
        
         | jcheng wrote:
         | This is talking about the error in the _difference between_ two
         | values with the same units though. For temperature, it wouldn't
         | matter if you're using C, K, or F for your starting values, the
         | % error of the difference would be the same (I think).
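          | 
          | A quick numeric check of that claim (a sketch; it assumes the
          | same physical +-0.1 degree reading error, converted along with
          | the temperatures):
          | 
          |     t1_c, t2_c, err_c = 25.0, 20.0, 0.1
          |     t1_f, t2_f = t1_c * 9/5 + 32, t2_c * 9/5 + 32
          |     err_f = err_c * 9/5  # the error scales with the unit
          | 
          |     # worst-case relative error of the difference
          |     rel_c = 2 * err_c / (t1_c - t2_c)
          |     rel_f = 2 * err_f / (t1_f - t2_f)
          |     print(rel_c, rel_f)  # both ~0.04, i.e. the same 4%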
        
         | Archelaos wrote:
          | I sometimes joke with friends by suggesting: let's meet at 12
          | o'clock +- 5%.
        
         | andrepd wrote:
          | Yes, because there is an arbitrary choice of origin, which
          | renders the relative error dependent on units. If you're
         | measuring a length, for instance, or an interval of time, the
         | relative error is independent of which units you choose. If
         | you're measuring e.g. a distance to some point, then again you
         | have an arbitrary choice of origin.
        
         | _Microft wrote:
          | The Celsius temperature scale is an interval scale [0], which
          | means that it is possible to calculate differences but not
          | ratios. The Kelvin temperature scale is a ratio scale [0] (it
          | has an "absolute zero"), which does allow ratios.
          | 
          | Besides that, if there are uncertainties involved, one should
          | do proper propagation of uncertainty anyway. [1]
         | 
         | [0] https://en.wikipedia.org/wiki/Level_of_measurement
         | 
         | [1] https://en.wikipedia.org/wiki/Propagation_of_uncertainty
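          | 
          | For a simple difference, [1] reduces to adding uncertainties
          | in quadrature, assuming independent (uncorrelated) errors; a
          | minimal sketch:
          | 
          |     import math
          | 
          |     def diff_with_uncertainty(x1, s1, x2, s2):
          |         # d = x1 - x2, with s = sqrt(s1^2 + s2^2)
          |         return x1 - x2, math.sqrt(s1**2 + s2**2)
          | 
          |     d, s = diff_with_uncertainty(273.25, 0.1, 273.15, 0.1)
          |     print(f"{d:.2f} +- {s:.2f}")  # 0.10 +- 0.14, ~140% relative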
        
       | amelius wrote:
       | If you add a really large number, then your relative error will
       | decrease!
        
         | chias wrote:
         | If you add a function which increments a dummy value 4000
         | times, your test coverage will increase!
        
       | choeger wrote:
       | It is kind of relieving to not have that thread talk about the
       | latest stunt of cancel culture activists.
       | 
        | But I have to wonder: why should I use a floating point number
        | for something like a micro benchmark? Admittedly, counting from
        | 1970 wastes some bits, but I am usually interested in some
        | discrete quantity (ns, us, ...), so why introduce numerical
        | problems in the first place?
        
         | barbazoo wrote:
          | I think you're right if you are actually able to get ns or us,
          | but as far as I know the unit you're getting most of the time
          | in Python is seconds.
        
           | formerly_proven wrote:
           | perf_counter generally has nanosecond-ish resolution, but it
            | gives you a float in seconds so you can just drop it in
           | instead of time or other timers (perf_counter is like two
           | decades more recent than time.time()). Newer Pythons have
           | xxx_ns variants that give you an int instead.
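            | 
            | A sketch of both flavors (assuming CPython; the _ns variant
            | is 3.7+):
            | 
            |     import time
            | 
            |     t0 = time.perf_counter()       # float seconds
            |     total = sum(range(100_000))    # the work being timed
            |     dt = time.perf_counter() - t0  # float subtraction
            | 
            |     n0 = time.perf_counter_ns()    # int nanoseconds
            |     total = sum(range(100_000))
            |     dn = time.perf_counter_ns() - n0  # exact int subtraction
            |     print(dt, dn)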
        
       | Ancapistani wrote:
       | Threaded version:
       | https://threadreaderapp.com/thread/1275924648132149249.html
        
         | SkyMarshal wrote:
         | These tweets are already threaded on twitter, why is this app
         | even needed?
         | 
         | It tends to pollute the replies more often than not, with more
         | people invoking it than actually replying.
        
           | draw_down wrote:
           | I agree, that is so annoying, 8000 replies like
           | "@myCoolThreadBot unroll"
        
           | lupire wrote:
           | Twitter doesn't thread properly. It does weird things based
           | on likes.
        
           | s_gourichon wrote:
           | threadreaderapp.com opens quickly, works on a browser without
           | Javascript, doesn't nag about installing an app. twitter.com
           | on a browser fails those criteria (especially on mobile). So,
           | win for threadreaderapp.com.
        
       | majormajor wrote:
        | I don't understand how rearranging operations, as suggested
        | later in the thread, would avoid the large relative error in
        | "how much older is the earth than the oceans", given that our
        | estimates for the ages of the earth and oceans are only so
        | precise.
        
         | mam2 wrote:
          | Maybe he wants to implicitly display correlations. If you
          | subtract directly, you make the implicit assumption that the
          | two event dates are uncorrelated.
        
         | Someone wrote:
         | It wouldn't and cannot, because there's only one operation in
         | that calculation, so there's nothing to rearrange.
         | 
         | That example only shows that relative errors can explode even
         | when doing simple calculations.
         | 
         | Another thing is that, in computers, calculations often are
         | imprecise.
         | 
         | Because of catastrophic cancellation and similar issues, that
         | means that the computer result of a calculation can be quite
         | different from the mathematical result.
         | 
         | To make matters worse, in real life, we often don't know the
         | exact values of things we measure, so even if our calculations
         | are mathematically perfect, the outcome of a calculation by
         | computer can be quite different from the real result.
         | 
         | So, if you do a computer calculation, say to compute how strong
          | a bridge has to be, you really, really need to know how close
         | the computed value, at worst, is to the mathematically exact
         | result.
         | 
         | That's what numerical analysis is about. For a given
         | calculation, it might say such things as
         | 
         |  _"if the input is between 100 and 200, to get a result with n
         | decimal digits of precision, you'll have to compute all
         | intermediate results with 4 x n digits."_
         | 
         | or
         | 
         |  _"but if you rearrange the computation like this, you only
         | need to use 2 x n digits for n digits of precision in your
         | result"_
        
         | tylerhou wrote:
         | Doesn't help for the "older question." That's just used as an
         | intuitive example for how imprecision can arise from
         | subtraction.
         | 
         | Rearranging helps when you can only store intermediate results
         | with finite precision, but you can compute them to arbitrary
         | precision.
        
           | majormajor wrote:
           | I see. I might quibble that here the imprecision isn't just
           | from the subtraction, but because we originally just have
           | very rough estimates for the dates (though 'mam2 brings up a
           | very interesting point about correlation if estimates are
           | based on each other and using them like that isn't the ideal
           | way of answering the question).
           | 
           | It threw me off enough that I didn't get the original point
           | of it re: floating point numbers until your comment - in our
           | floating point formats, the imprecision is often an
           | accidental effect of the limits of the format, vs a true
           | unknown, and then magnifying the relative size of that
           | arbitrary limitation is a problem.
        
             | skybrian wrote:
              | Whether it's imprecise measurements or imprecise
             | calculations, there are still things you can do if you're
             | aware of the problem. When measurement is imprecise, it
             | might be possible to improve accuracy by using a different
             | measurement.
             | 
              | For geology, it's often easier to put things in the right
              | order using rock layers than to figure out how long ago
              | they were relative to the present.
             | 
             | In this case, obviously, the Earth is older than the
             | oceans, even if the estimated ages were even rougher and
             | the error bars implied they could be in the opposite order.
             | 
             | For history, you may be able to figure out the relative
             | order of events without knowing what year they were on our
             | calendar.
        
         | dragontamer wrote:
         | This is a strange tweet. It assumes people are familiar with
         | classical cancellation error, but not familiar with error
         | analysis. Which in my experience... people either understand
         | both, or are ignorant of both concepts.
         | 
         | The general point is that "cancellation error" happens more
         | than just in floating-point operations, but also in "classic
         | scientific sig-fig error analysis".
         | 
         | ---------
         | 
         | The tweet should either be dumbed down to discuss cancellation
         | error in floating-point arithmetic, or elevated up and assume
         | people know about sig-fig analysis. It sits at a weird point in
         | the "assumed knowledge" curve.
         | 
         | ---------
         | 
          | For people unfamiliar with cancellation error, try the
          | following two statements in Python 3 (which defaults to
          | double precision... aka 53 bits of mantissa):
          | 
          |     poor_ordering = (9007199254740992.0 + 1.0 + 1.0 + 1.0
          |                      + 1.0 - 9007199254740992.0)
          |     good_ordering = (9007199254740992.0 - 9007199254740992.0
          |                      + 1.0 + 1.0 + 1.0 + 1.0)
         | 
         | What are the values of "poor_ordering" vs "good_ordering" ??
         | What does this tell us about double-precision?
         | 
         | 9007199254740992.0 == 2^53. So it is impossible for a double-
         | precision number to accurately represent +/- 1.0 at 2^53. (Note
         | that +/- 2.0 will work out just fine).
         | 
         | Play around with 9007199254740992.0 +/- 1.0, or 2.0, and other
         | values for about 15 minutes, and you'll probably learn
         | everything you need to know about cancellation error from that
         | playtime alone.
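          | 
          | A starting point for that experiment (a sketch; math.ulp
          | needs Python 3.9+):
          | 
          |     import math
          | 
          |     big = 9007199254740992.0  # 2**53
          |     print(big + 1.0 - big)    # 0.0: the +1.0 is absorbed
          |     print(big + 2.0 - big)    # 2.0: representable, survives
          |     print(big - big + 1.0)    # 1.0: cancel first, then add
          |     print(math.ulp(big))      # 2.0: the gap between doubles here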
         | 
          | Double-precision numbers are composed of 52 explicit mantissa
          | bits + 1 implicit bit + 1 sign bit + 11 exponent bits (yes,
          | 65 bits total; the implicit bit "doesn't count" toward the 64
          | stored bits, but it makes 0.0 and subnormal numbers harder to
          | deal with).
        
         | khawkins wrote:
         | I think it's a bad example, but illustrates an important point
         | he doesn't make explicit, that sometimes the variable you need
         | to estimate is the DX itself, not just X1 and X2 to produce X2
         | - X1 = DX. With a sufficiently high amount of variance in your
         | approximations of X1 and X2 their difference will tell you
         | little to nothing about DX.
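          | 
          | A quick simulation of that failure mode (a sketch with made-up
          | spreads, loosely shaped like the thread's earth/ocean example):
          | 
          |     import random
          | 
          |     true_x1, true_dx = 4.54e9, 0.2e9  # X1, and the true DX
          |     sigma = 0.5e9                     # per-estimate spread
          |     random.seed(0)
          |     diffs = [random.gauss(true_x1, sigma)
          |              - random.gauss(true_x1 - true_dx, sigma)
          |              for _ in range(5)]
          |     print(diffs)  # scatter ~0.7e9 wide; the 0.2e9 signal drowns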
        
       ___________________________________________________________________
       (page generated 2021-03-19 23:01 UTC)