[HN Gopher] How to Correctly Sum Up Numbers
       ___________________________________________________________________
        
       How to Correctly Sum Up Numbers
        
       Author : sebg
       Score  : 28 points
       Date   : 2024-10-31 09:07 UTC (3 days ago)
        
 (HTM) web link (cedardb.com)
 (TXT) w3m dump (cedardb.com)
        
       | eesmith wrote:
        | This essay only covers underflow and overflow, and gives an
        | example of "correct" summation for Rust:
        | 
        |     fn sum_up_numbers_correctly(numbers: &[i32]) -> Option<i32> {
        |         let mut sum: i32 = 0;
        |         for i in numbers {
        |             sum = sum.checked_add(*i)?;
        |         }
        |         Some(sum)
        |     }
       | 
       | I expected it to also mention the Kahan summation algorithm
       | (https://en.wikipedia.org/wiki/Kahan_summation_algorithm), which
       | gives more accurate results than this checked_add solution, but
       | it did not. I wonder if that will be in one of the subsequent
       | blog posts.
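        | 
        | For reference, a minimal sketch of Kahan summation in Rust
        | (illustrative, not from the article):
        | 
        |     // A second variable carries the low-order bits that each
        |     // addition would otherwise round away.
        |     fn kahan_sum(numbers: &[f32]) -> f32 {
        |         let mut sum = 0.0f32;
        |         let mut c = 0.0f32; // running compensation
        |         for &x in numbers {
        |             let y = x - c;   // correct the incoming term
        |             let t = sum + y; // y's low bits are lost here...
        |             c = (t - sum) - y; // ...and recovered here
        |             sum = t;
        |         }
        |         sum
        |     }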
        
         | lifthrasiir wrote:
         | To be clear, the checked_add solution is for integers while the
         | Kahan summation is for IEEE 754 numbers. (There is no
         | `f32::checked_add` or similar.)
        
           | eesmith wrote:
            | You're right! I am not a Rust programmer and misread the
            | "i32" as "float32".
           | 
           | The text does imply the correct float solution has a similar
           | structure to the int32 case, with "So, you will have to
           | resort to writing machine code manually if you want to
           | efficiently check overflows of float numbers."
        
             | jepler wrote:
             | You don't need machine code these days to find overflow in
             | IEEE floating point arithmetic.
             | 
             | If any intermediate sum overflows, you get an infinity.
             | Further arithmetic can never return you to the realm of
             | finite numbers (though you can also get into the world of
             | "not a number" NaNs by calculating -inf + inf for
             | instance), so you can use a standard function like (C99
             | <math.h>) isfinite() to check if any intermediate step of
             | summing FP numbers overflowed the representation.
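              | 
              | In Rust that check could look like this (a sketch;
              | f32::is_finite plays the role of isfinite()):
              | 
              |     fn sum_f32_checked(numbers: &[f32]) -> Option<f32> {
              |         let sum: f32 = numbers.iter().sum();
              |         // An overflowed intermediate became +/-inf (or
              |         // NaN via inf - inf) and can never get back to a
              |         // finite value, so one final check suffices.
              |         sum.is_finite().then_some(sum)
              |     }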
        
             | tialaramex wrote:
              | Rust's 32-bit floating point type is, unsurprisingly,
              | named f32 (stable Rust also has f64, and unstable builds
              | offer f16 and possibly f128, which are both newer IEEE
              | formats).
             | 
              | If you're sure your answer is a 32-bit IEEE floating
              | point number, I think you could store the intermediate
              | sums as a fully expanded fixed-point number with enough
              | bits to go from the smallest possible f32 to the largest,
              | so you're just doing (big integer) summation. 36 bytes
              | ought to be enough. When you've finished summing, you
              | find the top-most set bit; its position decides the
              | exponent, and the subsequent 23 bits are your mantissa.
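              | 
              | A rough, untested sketch of that idea (names are made
              | up): every finite f32 equals sig * 2^(e - 150) with
              | sig < 2^24, so a fixed-point accumulator whose least
              | significant bit sits at 2^-149 can hold any f32 exactly;
              | five u64 limbs (320 bits, two's complement) leave
              | headroom for billions of addends.
              | 
              |     fn exact_f32_sum(numbers: &[f32]) -> f32 {
              |         // Little-endian limbs, two's complement.
              |         let mut acc = [0u64; 5];
              | 
              |         // acc += val, or acc -= val when negate is set.
              |         fn add_into(acc: &mut [u64; 5],
              |                     val: &[u64; 5], negate: bool) {
              |             let mut carry = negate as u64; // !val + 1
              |             for i in 0..5 {
              |                 let v = if negate { !val[i] } else { val[i] };
              |                 let (s, c1) = acc[i].overflowing_add(v);
              |                 let (s, c2) = s.overflowing_add(carry);
              |                 acc[i] = s;
              |                 carry = (c1 | c2) as u64;
              |             }
              |         }
              | 
              |         for &x in numbers {
              |             assert!(x.is_finite());
              |             let bits = x.to_bits();
              |             let mantissa = (bits & 0x007F_FFFF) as u64;
              |             let exp = (bits >> 23) & 0xFF;
              |             // Normals have an implicit leading bit.
              |             let sig = if exp == 0 { mantissa }
              |                       else { mantissa + 0x0080_0000 };
              |             // Bit position of sig; the LSB is 2^-149.
              |             let shift = exp.max(1) - 1;
              |             let (limb, off) =
              |                 ((shift / 64) as usize, shift % 64);
              |             let mut val = [0u64; 5];
              |             val[limb] = sig << off;
              |             if off != 0 {
              |                 val[limb + 1] = sig >> (64 - off);
              |             }
              |             add_into(&mut acc, &val, bits >> 31 == 1);
              |         }
              | 
              |         // Top bit is the sign: take the magnitude, fold
              |         // the limbs into an f64 (53 bits is plenty for a
              |         // 24-bit result), then scale the LSB back down.
              |         let negative = acc[4] >> 63 == 1;
              |         if negative {
              |             let mut mag = [0u64; 5];
              |             add_into(&mut mag, &acc, true); // mag = -acc
              |             acc = mag;
              |         }
              |         let mut mag = 0.0f64;
              |         for &limb in acc.iter().rev() {
              |             mag = mag * 2f64.powi(64) + limb as f64;
              |         }
              |         let sum = mag * 2f64.powi(-149);
              |         (if negative { -sum } else { sum }) as f32
              |     }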
        
         | msichert wrote:
         | Hey, I'm the author of the blog post!
         | 
         | I had planned to write about the Kahan summation, but I didn't
         | want to overload the blog post with too many different
         | concepts. I'm definitely considering writing about it in a
         | follow-up post.
        
       | Koshkin wrote:
       | The order in which you sum a large collection of floating-point
       | numbers is also important, if you want the result to be stable.
        
         | 77pt77 wrote:
         | Sum smaller absolute value numbers first.
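          | 
          | E.g. (a sketch; assumes the input contains no NaNs):
          | 
          |     fn sorted_sum(mut xs: Vec<f64>) -> f64 {
          |         // Small-magnitude terms go first so they aren't
          |         // absorbed by an already-large partial sum.
          |         xs.sort_by(|a, b| {
          |             a.abs().partial_cmp(&b.abs()).unwrap()
          |         });
          |         xs.iter().sum()
          |     }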
        
           | stabbles wrote:
            | It will still accumulate rounding errors of O(n eps). If
            | you care, you can make it O(lg n eps) with pairwise
            | summation, or O(eps) with Kahan summation.
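            | 
            | Pairwise summation is a short recursion (sketch):
            | 
            |     fn pairwise_sum(xs: &[f64]) -> f64 {
            |         // Each value passes through only O(lg n) adds.
            |         if xs.len() <= 8 {
            |             xs.iter().sum() // small base case
            |         } else {
            |             let (lo, hi) = xs.split_at(xs.len() / 2);
            |             pairwise_sum(lo) + pairwise_sum(hi)
            |         }
            |     }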
        
             | Pesthuf wrote:
             | I just wonder, in situations where precision is so
             | important that these small rounding errors actually matter,
             | aren't floats the wrong tool anyway?
        
               | owlbite wrote:
                | Sometimes it's more about numerical stability than
                | absolute accuracy (numerical stability being defined as
                | not having the property that small changes in input
                | cause large changes in the output).
        
               | stabbles wrote:
                | In many numerical algorithms, rounding errors add up
                | over iterations / time. Maybe you lose one significant
                | digit in one iteration of your algorithm; let it run
                | longer and rounding errors start to dominate -- the
                | result is garbage.
               | 
                | Since performance is important, not using floats is not
                | an option, and you rarely see Kahan-summation-style
                | tricks because the heavy lifting is usually done by
                | BLAS libraries (i.e. vectorized linear algebra
                | operations).
               | 
                | So, in numerical algorithms the typical trick is to use
                | stable matrix decompositions like the QR decomposition
                | A = QR where Q'Q = I (repeated multiplication with Q
                | leaves the quantities roughly the same size, whereas
                | with A things blow up).
               | 
               | And there are other tricks, for example if you need to
               | orthogonalize a vector x to this Q, in full precision you
               | compute y = x - Q * (Q' * x), so that Q' * y = 0. The Q'
               | * x computation is a reduction, so it can accumulate O(n
                | eps) errors in finite precision. Instead of doing a
                | more stable reduction (e.g. Kahan summation), you can
                | repeat the computation: z = y - Q * (Q' * y). The idea
                | is that the second orthogonalization step kills the
                | rounding error noise introduced by the first step.
               | 
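                | A sketch of that second-pass trick, using the ndarray
                | crate purely for illustration (any matrix library
                | works the same way):
                | 
                |     use ndarray::{Array1, Array2};
                | 
                |     // Orthogonalize x against the columns of Q,
                |     // where Q'Q = I is assumed.
                |     fn orthogonalize(q: &Array2<f64>,
                |                      x: &Array1<f64>) -> Array1<f64> {
                |         // First pass: y = x - Q(Q'x).
                |         let y = x - &q.dot(&q.t().dot(x));
                |         // Second pass kills the rounding noise.
                |         &y - &q.dot(&q.t().dot(&y))
                |     }
                | 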
               | So in short: you can still get accuracy and decent
               | performance with floats, you just have to know the
               | tricks.
               | 
               | [Another somewhat interesting point is that on GPUs
               | reductions are by necessity implemented with O(ln n eps)
               | error, and the naive but vectorized CPU implementation
               | has an error O(n/k eps) where k is the vector width]
        
               | TeMPOraL wrote:
                | They are, but that didn't stop people from using them.
                | They're convenient, they're _bounded_, and as such they
                | ended up having decades of hardware optimization behind
                | them. When precision turns out to be really important,
                | it's often cheaper to get some smart people to
                | restructure the computation to mitigate the errors than
                | it is to switch to arbitrary-precision math and deal
                | with the consequences. There's a whole field of CS that
                | has spent the last half century or more developing ways
                | to deal with floating point math problems - extending
                | the scope of their use, and making alternatives _even
                | less economical_.
               | 
                | There's an interesting case of it today: LLMs. In
                | theory, they should be fully deterministic with the
                | "temperature" parameter set to 0, but _they aren't_ -
                | all that matrix multiplication is massively
                | parallelized, so with floating point math being
                | order-dependent, partial results quickly accumulate
                | non-deterministic errors.
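                | 
                | The order-dependence is easy to demonstrate
                | (illustrative f64 values):
                | 
                |     let (a, b, c) = (1.0e16_f64, -1.0e16_f64, 1.0_f64);
                |     assert_eq!((a + b) + c, 1.0);
                |     // The 1.0 is absorbed by -1e16 before it counts.
                |     assert_eq!(a + (b + c), 0.0);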
        
         | stabbles wrote:
          | This is more relevant in practice than overflow.
        
       | GMoromisato wrote:
       | One of the things I love about software engineering is that there
       | are no absolute answers, only trade-offs.
       | 
       | You can't memorize your way into software engineering. You can't
       | say, "always wrap-around when an operation overflows" or "always
       | tell the user (panic) when an operation overflows".
       | 
       | This article is a great example. C wraps unsigned integer
       | addition on overflow, but Python upscales to arbitrary integer
       | sizes. The trade-off, of course, is performance. If you're
       | writing a real-time engine (for a rocket or a video game) and you
       | can guarantee integer sizes, then you need to use the fastest
       | algorithm possible. But if you're writing a general-purpose
       | calculator, you might prefer upscaling or at least catching
       | overflows and informing the user.
       | 
        | In almost any other discipline there would be a "correct" way
        | to do it, or at least an accepted "best practice." But for
        | software engineers, it's always, "it depends."
        
         | cratermoon wrote:
         | > You can't say, "always wrap-around when an operation
         | overflows" or "always tell the user (panic) when an operation
         | overflows".
         | 
         | If you were designing the flight software for the Ariane 5,
         | which choice would you make?
        
           | GMoromisato wrote:
           | I understood that reference.
           | 
           | But it just reinforces my point that there are no absolute
           | rules. You have to think.
        
           | TeMPOraL wrote:
            | The important choice was made elsewhere: the flight
            | termination system successfully destroyed the rocket once
            | it was unlikely to recover from its problem.
            | 
            | The Wiki quotes an interesting sentence from the official
            | report - "An underlying theme in the development of Ariane
            | 5 is the bias towards the mitigation of random failure."
            | It's fortunate that that bias didn't extend to the flight
            | termination system. At the point where flight control
            | software intersects with real life, it's better to destroy
            | an expensive rocket than have it rain fiery hell on some
            | random people somewhere else. That principle holds to this
            | day.
        
           | wang_li wrote:
           | > If you were designing the flight software for the Ariane 5,
           | which choice would you make?
           | 
            | I would take note of the underlying hardware and what range
            | of inputs is allowed, and code appropriately. Stop
            | pretending
           | that your software doesn't have a specific purpose and will
           | run on an actual computer. If you're doing finance you should
           | be using fixed point integer arithmetic with two or three
           | decimal places. If you're doing physics simulations you need
           | to be aware of how many significant digits exist. If you
           | ignore your problem space and the kind of system your code is
           | going to run on you're living in spherical cow world and it
           | doesn't matter if you over- or under-flow because your
           | software is garbage anyway.
        
         | Zamiel_Snawley wrote:
         | Saturation arithmetic doesn't get enough love, either.
         | 
         | I've heard Zig has convenient facilities for it.
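          | 
          | Rust has it built in as well, e.g. (sketch):
          | 
          |     fn saturating_sum(numbers: &[i32]) -> i32 {
          |         // Clamps at i32::MIN / i32::MAX instead of
          |         // wrapping or bailing out.
          |         numbers.iter()
          |                .fold(0i32, |acc, &x| acc.saturating_add(x))
          |     }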
        
         | IshKebab wrote:
         | There are plenty of absolutes. People just don't talk about
         | them much because there's not much to say:
         | 
          | * binary is the best way to encode information and numbers
          | * twos complement is the best general purpose way to encode
          |   negative integers
          | * static typing and stronger typing lead to fewer bugs
          | * length prefixes are better than using sentinel values and
          |   escaping
         | 
         | > In almost any other discipline there would be a "correct" way
         | to do it
         | 
         | Tell me you know nothing about anything except computers
         | without...
         | 
         | You think there's a "correct" way to build a bridge?
        
       | wongarsu wrote:
       | I'd argue this is about _safely_ summing up numbers, not
       | correctly.
       | 
        | For example, this is the proposed rust solution (imho the most
        | readable):
        | 
        |     fn sum_up_numbers_correctly(numbers: &[i32]) -> Option<i32> {
        |         let mut sum: i32 = 0;
        |         for i in numbers {
        |             sum = sum.checked_add(*i)?;
        |         }
        |         Some(sum)
        |     }
       | 
        | If you try to sum [i32::MAX, 2, i32::MIN] this will return
        | None, while the naive wrapping addition would have returned 1.
        | I'd call None a safe result, it's the algorithm bailing out
        | when it's unsure, and 1 the correct result.
       | 
        | The upside of wrapping addition is that you can overflow as
        | often as you want: as long as the final result is within the
        | range of the type, the result (of addition and subtraction) is
        | correct. The downside is that you don't know whether the result
        | is within range. So you trade between correctly handling a
        | larger range of inputs with wrapping, or being more certain of
        | the result with checked overflow.
       | 
        | My preferred solution would have been simply upcasting. If we
        | make sum an i64, we know the intermediate sum can't overflow as
        | long as we sum fewer than 2^32 i32 numbers, and on a 64-bit
        | machine you lose basically nothing by doing that. Keep the
        | checked_add if you are concerned about people summing extremely
        | long lists of numbers:
        | 
        |     fn sum_up_numbers_correctly(numbers: &[i32]) -> Option<i32> {
        |         let mut sum: i64 = 0;
        |         for i in numbers {
        |             sum = sum.checked_add(*i as i64)?;
        |         }
        |         Some(sum.try_into().ok()?)
        |     }
       | 
       | (the try_into().ok() turns values that overflow i32 into None).
       | 
        | When summing i64 you can do the same trick by summing into an
        | i128, e.g. (an illustrative sketch):
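        | 
        |     // Illustrative name; same shape as the version above.
        |     fn sum_up_i64s(numbers: &[i64]) -> Option<i64> {
        |         let mut sum: i128 = 0;
        |         for i in numbers {
        |             sum = sum.checked_add(*i as i128)?;
        |         }
        |         sum.try_into().ok()
        |     }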
        
       | mmphosis wrote:
       | alias tally='awk -F '\''[^-.0-9]'\'' '\''{ for (i = 1; i <= NF;
       | i++) s += $i } END { print s }'\'''
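        | 
        | Illustrative usage:
        | 
        |     $ echo '3 4.5 -2' | tally
        |     5.5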
        
       ___________________________________________________________________
       (page generated 2024-11-03 23:01 UTC)