[HN Gopher] How to average floating point numbers (2015)
       ___________________________________________________________________
        
       How to average floating point numbers (2015)
        
       Author : dolmen
       Score  : 130 points
       Date   : 2021-06-11 07:29 UTC (15 hours ago)
        
 (HTM) web link (www.nu42.com)
 (TXT) w3m dump (www.nu42.com)
        
       | ball_of_lint wrote:
       | That last example looks more like an outright bug than a floating
       | point error.
        
       | jfrunyon wrote:
       | YAGNI.
       | 
       | If you do, in fact, need to worry about floating point error,
       | then there is not some magical "it will never happen if you have
       | this much precision and use this neat algorithm". Heck, even
       | fixed/decimal/rational numbers aren't perfect solutions, they
       | just make it easier to tell if you're staying within your
       | precision.
       | 
       | Without infinite storage, you can't retain infinite precision, so
       | the only real question is how much error you are willing to
       | tolerate. And that depends very much on what you're doing - I
       | can tolerate a lot more error in rolls in an MMO than in my
       | bank account balance...
        
         | johnbcoughlin wrote:
         | > the only real question is how much error you are willing to
         | tolerate
         | 
         | So, you are, in fact, gonna need an algorithm for summation
         | that controls that error, if you're not willing to tolerate
         | very much.
        
           | jfrunyon wrote:
           | Yes. Similarly, if you design something with total disregard
           | to optimization, it will likely need optimization.
           | 
           | YAGNI is a prediction, not a rule. Just like premature
           | optimization, alternatives to floating point - or using
           | longer floating point types - have real costs, which must be
           | balanced against the needed precision.
        
         | commandlinefan wrote:
         | Well, his final example suggests otherwise, but I kind of
         | suspect that there must be something else going on there (that
         | I'm too lazy to try to track down). Still, while fixing most
         | floating point errors seems like rearranging the deck chairs
         | on the Titanic, if there's one algorithm that's 99.999%
         | accurate and another that's only 99.9% accurate, you might as
         | well use, and be aware of, the more accurate one.
        
           | jfrunyon wrote:
           | The error in the 'Variance calculated using
           | Statistics::Descriptive' is ~0.0002% of the mean (and of each
           | value).
           | 
           | As the OP says: "One saving grace of the real world is the
           | fact that a given variable is unlikely to contain values with
           | such an extreme range"
           | 
           | My point is that the 99.999% algorithm is going to have a
           | cost over the 99.9% algorithm (more complex code, more
           | resource usage), and depending on what you're doing, you may
           | have to consider that cost.
        
       | jl2718 wrote:
       | tl;dr: sort first
       | 
       | Better: recursive bucket accumulators using the exponent
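       | 
       | (A much-simplified, non-recursive sketch of the bucketing idea;
       | a real implementation would carry overflow between buckets, but
       | even this keeps values of similar magnitude together before
       | they ever meet a large running total.)
       | 
       |     import math
       |     from collections import defaultdict
       | 
       |     def bucketed_sum(xs):
       |         # One accumulator per binary exponent, so each partial
       |         # sum only combines numbers of similar magnitude.
       |         buckets = defaultdict(float)
       |         for x in xs:
       |             _, exp = math.frexp(x)
       |             buckets[exp] += x
       |         # Add the partial sums from smallest magnitude upward.
       |         total = 0.0
       |         for exp in sorted(buckets):
       |             total += buckets[exp]
       |         return total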
        
       | MaxBarraclough wrote:
       | Can't we just use Kahan summation [0] then divide by _n_?
       | 
       |  _edit_ Someone on the reddit thread [1] had the same thought.
       | 
       | [0] https://en.wikipedia.org/wiki/Kahan_summation_algorithm
       | 
       | [1]
       | https://old.reddit.com/r/perl/comments/2zneat/how_you_averag...
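       | 
       | A minimal sketch of that approach (textbook Kahan summation, not
       | the article's code), using the data from the article:
       | 
       |     def kahan_mean(xs):
       |         total = 0.0
       |         c = 0.0    # running compensation for lost low-order bits
       |         for x in xs:
       |             y = x - c
       |             t = total + y
       |             c = (t - total) - y
       |             total = t
       |         return total / len(xs)
       | 
       |     data = (1_000_000_000.1, 1.1) * 50_000
       |     print(kahan_mean(data))   # should print 500000000.6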
        
         | gpderetta wrote:
         | Interestingly enough, I was given this problem (well,
         | numerically stable online variance, but close enough) a while
         | ago as a take-home interview problem, and I ended up
         | researching it quite a bit.
         | 
         | It has been a while, but if I remember correctly pairwise
         | summation was in practice as good as or better than Kahan
         | summation (and both were more stable than Welford's), but
         | faster.
         | 
         | But by far the fastest and still very stable solution was to
         | use the legacy 80-bit x87 FPU as the accumulator. Which is not
         | surprising, as Kahan explicitly added 80-bit precision to x87
         | for intermediate computations.
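         | 
         | For reference, Welford's online algorithm is roughly the
         | following (a sketch, not the code from that interview):
         | 
         |     def online_mean_variance(xs):
         |         # Welford: single pass, numerically stable.
         |         n = 0
         |         mean = 0.0
         |         m2 = 0.0   # sum of squared deviations from the mean
         |         for x in xs:
         |             n += 1
         |             delta = x - mean
         |             mean += delta / n
         |             m2 += delta * (x - mean)
         |         var = m2 / (n - 1) if n > 1 else float('nan')
         |         return mean, var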
        
         | m4r35n357 wrote:
         | Or can we sum more accurately by taking the stable_mean and
         | multiplying by n?
        
         | SuchAnonMuchWow wrote:
         | Kahan summation is better than naive summation, but not
         | perfect. The stable_mean algorithm in the article is not
         | perfect either: it just has more precision than the naive one,
         | roughly what you would get from floats with twice the
         | precision.
         | 
         | >>> data = (1_000_000.1, 1.1, -1_000_000.1, -1.1) * 25_000
         | 
         | >>> sum(data) / len(data) == -2.3283153183228934e-16
         | 
         | >>> sum(sorted(data)) / len(data) == 2.7122441679239272e-11
         | 
         | >>> stable_mean(data) == -2.571225228716577e-13
         | 
         | >>> float(sum(numpy.array(data, dtype=numpy.float128)) /
         | len(data)) == 2.2648549702353195e-19
         | 
         | Note that in Python there is a math.fsum function which
         | returns a correctly rounded sum of floats. There are several
         | different techniques for that, but they are more complex than
         | Kahan summation.
         | 
         | >>> math.fsum(data) / len(data) == 0.0
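         | 
         | (stable_mean above is the article's algorithm; the incremental
         | running mean it is usually formulated as looks roughly like
         | this:)
         | 
         |     def stable_mean(xs):
         |         # m_k = m_{k-1} + (x_k - m_{k-1}) / k
         |         mean = 0.0
         |         for k, x in enumerate(xs, start=1):
         |             mean += (x - mean) / k
         |         return mean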
        
           | dataflow wrote:
           | Specifically https://code.activestate.com/recipes/393090/
        
           | vitus wrote:
           | > Note that in Python there is a math.fsum function which
           | returns a correctly rounded sum of floats.
           | 
           | Quirks of the underlying data can yield surprising results.
           | 
           | I was playing around with the "well-known" floating point
           | error result of 0.3 - 0.2 - 0.1 = some small number (instead
           | of 0), and found:
           | 
           | >>> sum((0.3, -0.2, -0.1) * 1_000_000) / 3e6
           | -2.775557561562891e-23
           | >>> math.fsum((0.3, -0.2, -0.1) * 1_000_000) / 3e6
           | -9.25185853854297e-18
           | >>> stable_mean((0.3, -0.2, -0.1) * 1_000_000)
           | -1.387894586067203e-17
           | 
           | Namely, the "naive" mean yielded the smallest error (due to
           | cancellation lining up just right such that it's capped after
           | two rounds), fsum yielded a nominally larger one, and
           | stable_mean yielded the biggest error.
           | 
           | Of course, if you sort the values (such that we're
           | accumulating -0.2 + ... + -0.2 + -0.1 + ... + -0.1 + 0.3 +
           | ... + 0.3), then sum performs much worse, stable_mean is a
           | little worse, and fsum performs exactly the same.
           | 
           | >>> sum(sorted((0.3, -0.2, -0.1) * 1_000_000)) / 3e6
           | -1.042215244884126e-12
           | >>> math.fsum(sorted((0.3, -0.2, -0.1) * 1_000_000)) / 3e6
           | -9.25185853854297e-18
           | >>> stable_mean(sorted((0.3, -0.2, -0.1) * 1_000_000))
           | -3.0046468865706775e-16
           | 
           | edit: Naturally, if you want accurate decimal calculations,
           | you should consider using decimal.Decimal or
           | fractions.Fraction (albeit at a significant performance
           | cost). Using numpy.float128 will help, but it's still
           | subject to rounding errors due to imprecise representation.
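           | 
           | For example, with the string constructor (so the decimal
           | values are taken exactly rather than via a float):
           | 
           | >>> from fractions import Fraction
           | >>> terms = [Fraction(d) for d in ('0.3', '-0.2', '-0.1')]
           | >>> sum(terms * 1_000_000)
           | Fraction(0, 1)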
        
             | MauranKilom wrote:
             | Some of these results are indeed very surprising. The
             | "correct" result of (double(0.3) - double(0.2) -
             | double(0.1)) is -1 * 2^-54 = -5.5511151231257827e-17, and
             | dividing by three comes out to -1.8503717077085942e-17.
             | 
             | Hence, as far as those functions are concerned, the naive
             | mean yielded the _worst_ error in both tests, but
             | stable_mean coincidentally happens to be the best in the
             | first test. I'm slightly surprised that none of them got
             | the correct result.
        
               | vitus wrote:
               | I think you might be off by one for the "correct" result
               | (i.e. 2^-55), which is what fsum arrives at.
               | >>> 0.3 - 0.2 - 0.1         -2.7755575615628914e-17
               | >>> (0.3 - 0.2 - 0.1) / 3         -9.25185853854297e-18
               | 
               | But either way, the fundamental problem in this case (as
               | you noted) is actually due to loss of precision from
               | numeric representation.
        
               | MauranKilom wrote:
               | Heh, I reasoned about the exponent by hand and got -55
               | but then wasn't sure whether it was correct so I searched
               | for discussions of 0.3 - 0.2 - 0.1 and took the value
               | they quoted (2^-54). That being an off-by-one error does
               | resolve my question.
        
           | Majromax wrote:
           | In this case, it seems as if stable_mean is less accurate
           | than the naive sum?
        
             | SuchAnonMuchWow wrote:
             | The two behave differently, so you cannot strictly compare
             | them. Most of the time stable_mean will give more accurate
             | results, but this is not always the case. For example, you
             | can create test cases where the naive approach returns an
             | exact result but stable_mean has some rounding error.
             | 
             | Kahan summation, however, is never less accurate than
             | naive summation.
        
       | worewood wrote:
       | This is useful for people working in critical applications. For
       | the majority of people the 0.000001% difference doesn't matter.
        
         | jart wrote:
         | That's the wrong way to think about floating point. Double
         | gives you ~15 digits of precision. If the answer is
         | 500000000.600000 but your algorithm says 500000000.601239 then
         | 25% of the bits you've got are garbage. It's also a question of
         | scalability. Increase the author's example array size to 1
         | gigabyte and now 40% of your double is garbage data. Increase
         | it to web scale, and you're toast. What's the solution? Divide
         | and conquer. Not only will that let you farm out the summation
         | across machines, but it also eliminates the linear growth in
         | rounding errors.
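         | 
         | A sketch of the divide-and-conquer idea (pairwise summation),
         | not anyone's production code:
         | 
         |     def pairwise_sum(xs, lo=0, hi=None):
         |         # Error grows O(log n) instead of O(n), and the two
         |         # halves can be summed independently, even on
         |         # different machines.
         |         if hi is None:
         |             hi = len(xs)
         |         if hi - lo <= 8:          # small base case
         |             return sum(xs[lo:hi])
         |         mid = (lo + hi) // 2
         |         return (pairwise_sum(xs, lo, mid)
         |                 + pairwise_sum(xs, mid, hi))
         | 
         |     data = (1_000_000_000.1, 1.1) * 50_000
         |     print(pairwise_sum(data) / len(data))
         |     # noticeably closer to 500000000.6 than the naive
         |     # 500000000.60091573 quoted elsewhere in the thread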
        
       | rjmunro wrote:
       | I still think the best way to deal with this stuff is to use
       | interval arithmetic. Calculate the numbers with rounding set to
       | round up, and separately to round down and compare the results.
       | See PyInterval https://pyinterval.readthedocs.io/ for an
       | implementation.
       | 
       | >>> from interval import interval
       | 
       | >>> data = (interval(1_000_000_000.1), interval(1.1)) * 50_000
       | 
       | >>> sum(data) / len(data)
       | 
       | interval([500000000.5971994, 500000000.601347])
       | 
       | The result is definitely between the two values. If you try
       | another method of doing the calculation (e.g. sorting first),
       | you will know which one was better without the answer having to
       | be obvious, as it is in this case.
       | 
       | You might find one method is better at the lower bound, the other
       | is better at the higher bound, which means you can narrow the
       | value further than using either individually. It's a really
       | powerful system that I think should be much better known.
        
         | nestorD wrote:
         | You should be okay with a sum, but interval arithmetic can
         | blow up in a way that is _not_ correlated with the numerical
         | accuracy of the algorithm. Newton's method, for example, is
         | strongly convergent but produces diverging intervals.
        
           | uyt wrote:
           | Why does that happen? Is it because the errors are in
           | opposite directions and will cancel each other out?
        
       | amelius wrote:
       | Does numpy.mean() do this?
        
         | eesmith wrote:
         | Simple testing shows it does not use the naive sum.
         | 
         | >>> import numpy as np
         | >>> arr = np.array((1_000_000_000.1, 1.1) * 50_000)
         | >>> arr.mean()
         | 500000000.6000003
        
       | asciimov wrote:
       | I have a special hatred of floating point numbers. I wish and
       | advocate for a better standard.
       | 
       | My issue is that I've come across errors due to floating point
       | in weird places. One example: searching a list of strings for
       | another string. If the sort function decides those strings look
       | like numbers, it may coerce them into doubles (ints being too
       | small to hold them). Then the sort won't match correctly, due to
       | the least significant bits being rounded off.
        
         | gameswithgo wrote:
         | There are "better" standards; they are just way, way slower.
         | Which sometimes is what you need.
        
         | Kranar wrote:
         | Not sure what this could possibly have to do with floating
         | point numbers, nor am I aware of any function that does what
         | you're describing. Is this some kind of quirky JavaScript
         | behavior? Nothing about floating point numbers has anything to
         | do with coercing strings.
        
           | TacticalCoder wrote:
           | I've definitely been bitten by similar (not identical)
           | stuff. I've seen data serialized and sent over the wire by
           | I-don't-know-what, where several parsers can't correctly
           | parse it and serialize it back to the same thing. Be it
           | 32 bit float / 64 bit float / string parsed as 32 bit float
           | / string parsed as 64 bit float / string parsed as big
           | decimals: I don't care, the fact is there are buggy
           | serializers, buggy parsers, and even sometimes parsers going
           | into an infinite loop (the JVM is famous for that: there was
           | a bug where Double.parseDouble could be sent into an
           | infinite loop by feeding it a carefully crafted string).
           | 
           | The fact is: this kind of SNAFU does happen.
        
         | TacticalCoder wrote:
         | > If the sort function decides those strings look like numbers
         | it may coerce them into doubles (ints being too small to hold
         | them).
         | 
         | You sometimes even see that with APIs that inconsistently wrap
         | some values in quotes (seemingly at random). And then it gets
         | trickier, because the parser may correctly parse some of the
         | values into, say, "big decimals", while others get
         | automagically, and imprecisely, parsed into floating-point
         | numbers.
         | 
         | The problem is of course made worse by people overall overusing
         | decimal numbers where integers would be totally fine.
         | 
         | One of the most beautiful floating-point bugs ever is this one
         | (from about ten years ago as I recall it): language preference
         | in browsers could be set as a decimal number between 0 and 1.
         | Java (and other languages) had a broken floating-point
         | implementation where parsing a certain floating-point number
         | would lead to an infinite loop. So you could send one core of
         | any webserver with that bug into an infinite loop simply by
         | requesting a page with the language preference set to that
         | floating-point number. Rinse and repeat a few times and all
         | cores were spinning at 100%, effectively DDoS'ing the
         | webserver.
        
         | TinkersW wrote:
         | That isn't floating point's fault, but that of whoever wrote
         | the buggy parser...
         | 
         | Floating point is overall pretty good. Sometimes I wish there
         | were a version without so many ways to store NaN (for 16 bit
         | floats it is so wasteful), and maybe versions of
         | sqrt/div/rcp/rsqrt that would clamp invalid inputs to 0, but
         | other than that it is hard to see how you could make any major
         | improvements given the fixed number of bits.
         | 
         | Actually, one other possible improvement would be a way to ask
         | the hardware to dither when rounding down to the fixed bit
         | count, although I'm not sure how it could do anything besides
         | pure white noise. But better than nothing...
        
         | titzer wrote:
         | I've gone back and forth over the years. Having implemented
         | some of the lowest-level floating point routines, taking care
         | to handle all the corner cases, it definitely is a chore. But
         | is there an alternative?
         | 
         | After all of this, I still think that IEEE 754 is pretty darn
         | good. The fact that it is available and fast in pretty much all
         | hardware is a major selling point.
         | 
         | That said, I absolutely hate -0 (minus zero). It's a stupid
         | design wart that I've never seen any good use case for. But it
         | is observable and therefore cannot be ignored, leading to _even
         | more_ icky special cases.
        
           | jart wrote:
           | How about determining the quadrant of an angle? That's how
           | atan2() uses signed zero and it's the greatest of all libm
           | functions.
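           | 
           | For example:
           | 
           | >>> import math
           | >>> math.atan2(0.0, -1.0)    # approaching -x axis from above
           | 3.141592653589793
           | >>> math.atan2(-0.0, -1.0)   # ...and from below
           | -3.141592653589793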
        
         | dolmen wrote:
         | Don't blame the floating point numbers.
         | 
         | Just use a programming language that doesn't coerce strings
         | into numbers. Or use a comparison function/operator that
         | explicitly compares arguments as strings and not as numbers.
        
         | an1sotropy wrote:
         | > I have a special hatred of floating point numbers. I wish and
         | advocate for a better standard.
         | 
         | Like what? Packing the real number line in 32 (or 64) bits is
         | an incredible trick. The roughly scale-invariant distribution
         | of represented values means you can do math without regard to
         | choice of units, which would be impossible with any fixed point
         | scheme.
        
       | metalliqaz wrote:
       | Wow, Perl. Now, that's a name I've not heard in a long time.
        
       | rhaps0dy wrote:
       | According to Radford Neal, it is possible to sum a sequence of
       | n-bit floating point numbers exactly, i.e. rounding to the n-bit
       | floating point value closest to the exact result.
       | 
       | The algorithm uses 2n bits for the sum.
       | https://radfordneal.wordpress.com/2015/05/21/exact-computati...
        
       | andi999 wrote:
       | Give me a (real life) example where these errors matter. Any
       | real data will be noisy anyway, and relying on anything at the
       | level of ppb (parts per billion) looks suspicious.
       | 
       | Clarification: I am not saying that treating floats carefully is
       | unimportant, and I know doing stuff like subtracting large
       | numbers can lead to inaccurate leading digits. This is clear,
       | and part and parcel of any numerical work. But this is not what
       | the article is about. I am claiming instead that the article is
       | looking for a problem where there is none (or let's say one that
       | I fail to see).
        
         | kstenerud wrote:
         | https://dolphin-emu.org/blog/2021/06/06/dolphin-progress-rep...
        
         | adrianN wrote:
         | A field where floating point errors accumulate quickly and can
         | totally ruin your results is computational geometry. See this
         | paper https://people.mpi-
         | inf.mpg.de/~mehlhorn/ftp/classroomExample... for examples.
        
         | pcwalton wrote:
         | Sure. Here's an example of blatantly incorrect font rendering
         | due to (what I assume is) floating point error in macOS:
         | https://twitter.com/pcwalton/status/971825249464475648
        
         | SuchAnonMuchWow wrote:
         | Rounding errors might be small, but catastrophic cancellation
         | can make errors much larger than parts per billion. Maybe for
         | your application it doesn't matter, but sometimes it does, and
         | you need to be aware of these cases.
         | 
         | You wouldn't want those kinds of rounding errors in financial
         | applications, in rocket science applications, etc.
        
           | andi999 wrote:
           | And how does the mentioned averaging procedure help?
        
         | vardump wrote:
         | If you're developing floating point code, you're in for a huge
         | surprise, _eventually_. What happens for example when you
         | subtract two close numbers with an error like this? There are a
         | lot of mechanisms for small errors to become significant.
         | 
         | Many of us have had our attitudes readjusted by a bug where
         | these "insignificant" errors became anything but.
         | 
         | We're often wrong in ways we can't imagine, so it is wise to
         | remain humble and strive for correctness over our potentially
         | misguided ideas of how things work.
         | 
         | For example, you'd better still have multiple layers of
         | security even though we've already made some other layer
         | "impenetrable". And still check for those "impossible" error
         | cases before they create nearly undebuggable issues.
        
           | andi999 wrote:
           | Yes, one has to be careful about floating point, but
           | honestly the article is not about that. Actually it is more
           | about something where you do _not_ have to be careful.
        
             | vardump wrote:
             | The article explicitly says that while you _might_ get
             | away with it with some luck due to characteristics of the
             | real world data, you _do_ have to be careful.
             | 
             | From TFA:
             | 
             | "Now, in the real world, you have programs that ingest
             | untold amounts of data. They sum numbers, divide them,
             | multiply them, do unspeakable things to them in the name of
             | "big data". Very few of the people who consider themselves
             | C++ wizards, or F# philosophers, or C# ninjas actually know
             | that one needs to pay attention to how you torture the
             | data. Otherwise, by the time you add, divide, multiply,
             | subtract, and raise to the nth power you might be reporting
             | mush and not data.
             | 
             | One saving grace of the real world is the fact that a given
             | variable is unlikely to contain values with such an extreme
             | range. On the other hand, in the real world, one hardly
             | ever works with just a single variable, and one can hardly
             | every verify the results of individual summations
             | independently.
             | 
             | Anyway, the point of this post was not to make a grand
             | statement, but to illustrate with a simple example that
             | when using floating point, the way numbers are added up,
             | and divided, matters."
        
         | GuB-42 wrote:
         | I work with avionics and it does matter.
         | 
         | Yes, real data is noisy, but testing needs to be precise and
         | repeatable. For example, if we need to test that a value is
         | <=100, then it must pass at 100 and fail at 100.00000001. And
         | yes, we use margins and fixed point numbers too, but sometimes
         | we need to be precise with floating point numbers.
         | 
         | It also matters in some calculations, for example when doing
         | GCD/LCM with periods and frequencies. For that, we eventually
         | switched to rational numbers because the precision of the
         | source data was all over the place.
         | 
         | We all know that parts per billion rarely make sense, but
         | where do we draw the line? Sometimes it really matters (e.g.
         | GPS clocks); sometimes PI=3 is fine. So when in doubt, use the
         | highest precision possible; you can relax later if you can
         | characterize your error.
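         | 
         | (A sketch of the rational-number approach with made-up
         | values, not our actual code; Python's Fraction keeps periods
         | exact, so LCM-style reasoning stays exact too.)
         | 
         |     from fractions import Fraction
         |     from math import gcd, lcm   # lcm needs Python 3.9+
         | 
         |     def rational_lcm(p, q):
         |         # lcm(a/b, c/d) = lcm(a, c) / gcd(b, d) in lowest terms
         |         return Fraction(lcm(p.numerator, q.numerator),
         |                         gcd(p.denominator, q.denominator))
         | 
         |     # periods of a 300 Hz and a 400 Hz signal, kept exact
         |     p1, p2 = Fraction(1, 300), Fraction(1, 400)
         |     print(rational_lcm(p1, p2))   # 1/100, i.e. every 10 ms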
        
           | habibur wrote:
           | Then use more bits for your floats and use the naive
           | average. This article is more like "how to squeeze the last
           | bits from a floating point operation."
           | 
           | And even if you use these techniques, the butterfly effect
           | will kick in eventually.
        
           | whatshisface wrote:
           | Of course, the fact that the threshold has to be so precise
           | is because those are the rules, not because of anything
           | related to aviation. That's a human-enforced quality. A
           | threshold at 100 probably has about one significant figure,
           | sometimes less.
        
             | MauranKilom wrote:
             | Well, not necessarily. The _"and repeatable"_ part can be
             | quite a constraint.
             | 
             | Imagine three redundant systems on a plane that want to
             | compute the same sum from the same input array (and then
             | check whether they all agree). You want to implement
             | this, and
             | figure that (since there's a lot of data) it makes sense to
             | parallelize the sum. Each thread sums some part and then in
             | the end you accumulate the per-thread results. No race
             | conditions, clean parallelization. And suddenly, warning
             | lights go off, because the redundant systems computed
             | different results. Why? Different thread team assignments
             | may lead to different summation orders and different
             | results, because floating point addition is not
             | associative.
             | 
             | More generally, whenever you want fully reproducible
             | results (like "the bits hash to the same value"), you need
             | to take care of these kinds of "irrelevant" problems.
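             | 
             | A two-line illustration of that non-associativity:
             | 
             | >>> (1e16 + 1.0) + 1.0
             | 1e+16
             | >>> 1e16 + (1.0 + 1.0)
             | 1.0000000000000002e+16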
        
             | GuB-42 wrote:
             | Aviation is all about rules, and sometimes a bit of flying.
        
               | mcguire wrote:
               | And most of the rules are written in blood.
        
           | sumtechguy wrote:
         | I didn't work in avionics, but in a field that used a lot of
         | GPS data. Depending on where your error is, it can be +/- 100
         | feet or more, or less, depending on where the cutoff is. Like
         | you say, you need to know what your target is.
        
         | [deleted]
        
         | snakeboy wrote:
         | In general, floating point error can accumulate in iterative
         | algorithms where you are feeding the result of one calculation
         | into the next step.
        
           | andi999 wrote:
           | Yes. If you have a positive Lyapunov coefficient, that will
           | happen. But then what the article does won't help you
           | either.
        
             | johnbcoughlin wrote:
             | If you know about Lyapunov coefficients then you should
             | also know about the concept of order of accuracy. If I'm
             | developing a code that is supposed to be second order in
             | time and space, then when I'm running my convergence study,
             | I better be able to control the floating point error in my
             | computation of the total energy, so that I can properly
             | attribute the error I do observe to power series
             | truncation, or what have you.
        
             | thehappypm wrote:
             | The article is not saying that this is even all that useful
             | or revelatory -- but it's just something to keep in mind.
        
         | CarVac wrote:
         | I was accumulating deltas in a PDE simulation on photographic
         | images, and when the calculation was tiled, different tiles
         | had noticeable differences in brightness.
         | 
         | Real image data, roughly 1 megapixel per tile at that time, and
         | 32-bit floats.
        
       | [deleted]
        
       | eesmith wrote:
       | Switching to Python instead of Perl, use the 'statistics' module,
       | https://docs.python.org/3/library/statistics.html :
       | 
       | >>> data = (1_000_000_000.1, 1.1) * 50_000
       | >>> sum(data) / len(data)
       | 500000000.60091573
       | >>> sum(sorted(data)) / len(data)
       | 500000000.6004578
       | >>> sum(sorted(data, reverse=True)) / len(data)
       | 500000000.6012391
       | >>> import statistics
       | >>> statistics.mean(data)
       | 500000000.6
       | 
       | and
       | 
       | >>> X = [10_000_000_000 + k for k in (4, 7, 13, 16)]
       | >>> statistics.variance(X)
       | 30
        
         | trinovantes wrote:
         | I guess the question is: does it still use floats internally,
         | or something like BigNum to handle the precision issue (and
         | thus more memory/CPU)?
        
           | _delirium wrote:
           | It looks like the latter. For many of its operations, the
           | statistics module first converts floats to integer ratios
           | (integers are bignums in Python), does intermediate
           | operations using integer arithmetic, and converts back to
           | floats at the end.
           | 
           | Here's the internal _sum function for example, which you can
           | see maps _exact_ratio over the data before summing:
           | https://github.com/python/cpython/blob/3.9/Lib/statistics.py...
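           | 
           | Those integer ratios are exact; for instance:
           | 
           | >>> (0.1).as_integer_ratio()
           | (3602879701896397, 36028797018963968)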
        
             | eesmith wrote:
             | There's also fmean() which is for floats:
             | 
             | >>> data = (1_000_000_000.1, 1.1) * 50_000
             | >>> statistics.fmean(data)
             | 500000000.6
             | 
             | Implementation at
             | https://github.com/python/cpython/blob/6df926f1c46eb6db7b5dc... .
        
         | pdkl95 wrote:
         | Ruby seems to handle this out of the box:
         | 
         | >> RUBY_VERSION
         | =  "2.7.2"
         | >> data = [1_000_000_000.1, 1.1] * 50_000 ; nil
         | =  nil
         | >> data.sum
         | =  50000000060000.0
         | >> data.sum / data.length
         | =  500000000.6
         | >> data.sort.sum / data.length
         | =  500000000.6
         | >> data.sort.reverse.sum / data.length
         | =  500000000.6
         | 
         | I know that ruby will auto-promote an Integer into a Bignum. It
         | seems like ruby will also auto-promote other types? I should
         | research this...
        
           | dmurray wrote:
           | FWIW, Ruby the language doesn't have Bignum any more, though,
           | just Integer which handles arbitrarily large numbers. Under
           | the hood, there are some optimizations for small integers.
           | 
           | https://bugs.ruby-lang.org/issues/12005
        
             | chrisseaton wrote:
             | Bignum is still there, working in almost exactly the same
             | way; it's just not exposed to the user anymore.
        
           | ska wrote:
           | This is a nice thing about Common Lisp: not just integer
           | promotion to bignums, but ratios too (and it works for
           | complex numbers as well).
        
           | aardvark179 wrote:
           | Nope. The sum method is implemented in C, and as soon as it
           | sees a float value in the array it switches to Kahan-Babuska
           | summation.
           | 
           | We don't do that in TruffleRuby and so don't produce quite
           | such an accurate result.
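           | 
           | (Kahan-Babuska summation is essentially Neumaier's variant
           | of Kahan's algorithm: it also compensates when the incoming
           | term is larger than the running total. A sketch in Python,
           | for illustration only:)
           | 
           |     def kahan_babuska_sum(xs):
           |         total = 0.0
           |         comp = 0.0   # accumulated compensation
           |         for x in xs:
           |             t = total + x
           |             if abs(total) >= abs(x):
           |                 comp += (total - t) + x
           |             else:
           |                 comp += (x - t) + total
           |             total = t
           |         return total + comp
           | 
           |     data = [1_000_000_000.1, 1.1] * 50_000
           |     print(kahan_babuska_sum(data))
           |     # 50000000060000.0, matching the Ruby output above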
        
       | _0ffh wrote:
       | I suppose if you don't mind wasting lots of time, do the
       | following for a precise sum: select the two smallest numbers in
       | the list, remove them, and insert their sum. Repeat until only
       | one number is left.
       | 
       | [Ed. smaller meaning closer to zero]
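       | 
       | A sketch with a min-heap keyed on magnitude (O(n log n), as
       | warned):
       | 
       |     import heapq
       | 
       |     def smallest_first_sum(xs):
       |         # Repeatedly pull the two values closest to zero, add
       |         # them, and push the partial sum back.
       |         heap = [(abs(x), x) for x in xs]
       |         heapq.heapify(heap)
       |         if not heap:
       |             return 0.0
       |         while len(heap) > 1:
       |             _, a = heapq.heappop(heap)
       |             _, b = heapq.heappop(heap)
       |             s = a + b
       |             heapq.heappush(heap, (abs(s), s))
       |         return heap[0][1]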
        
         | dolmen wrote:
         | This is exactly what happens when you sort the numbers
         | ascending and compute the sum starting from the smallest. Like
         | in the article.
         | 
         | Except this doesn't work if you must handle negative numbers.
         | So what matters is not the smallest number but the number
         | nearest to zero.
        
           | MauranKilom wrote:
           | No, that's not the same.
           | 
           | Summing [4, 5, 6, 7] would yield the series (4, )9, 15, 22
           | with "sort then add", but what the previous poster described
           | would first sum 4+5 (leaving the list [9, 6, 7]), then 6+7
           | (leaving the list [9, 13]) and finally arriving at 22.
        
             | dolmen wrote:
             | OK.
             | 
             | That algorithm could be improved (for accuracy, not speed)
             | by replacing repeated sums of the smallest number with a
             | multiplication, as I suppose that multiplying 3 or more
             | identical numbers loses less precision than multiple
             | additions.
             | 
             | For example, [4, 3, 7, 7, 8] could be calculated as:
             | 
             | (3+4)  => [7, 7, 7, 8]
             | (7*3)  => [21, 8]
             | (8+21) => 29
        
       | worik wrote:
       | If you are looking for such accuracy (where 50_000_000.00006 is
       | different from 50_000_000.06) then you need a specialised
       | library and very careful coding.
       | 
       | If you are averaging bazillions of numbers and they range from
       | weeny to huge and you think 50_000_000.000006 != 50_000_000.06
       | matters for your purposes, then it is hard for me to see that
       | you are doing anything sensible.
       | 
       | This is, except in very specialised cases that I cannot imagine,
       | a non-issue.
       | 
       | Just add them up and divide (and check for overflow, that could
       | ruin your day!)
        
       ___________________________________________________________________
       (page generated 2021-06-11 23:01 UTC)