[HN Gopher] Beware of Fast-Math
       ___________________________________________________________________
        
       Beware of Fast-Math
        
       Author : blobcode
       Score  : 277 points
       Date   : 2025-05-31 07:05 UTC (15 hours ago)
        
 (HTM) web link (simonbyrne.github.io)
 (TXT) w3m dump (simonbyrne.github.io)
        
       | Sophira wrote:
       | Previously discussed at
       | https://news.ycombinator.com/item?id=29201473 (which the article
       | itself links to at the end).
        
         | anthk wrote:
         | On Forth, there's the philosophy of the fixed point:
         | 
         | https://www.forth.com/starting-forth/5-fixed-point-arithmeti...
         | 
          | With 32 and 64 bit numbers, you can just scale decimals up. So,
          | Torvalds was right. In dangerous contexts (super-precise
          | medical doses), FP may have good reasons to exist, and I am
          | not completely sure.
         | 
          | Also, both Forth and Lisp suggest using rationals in
          | preference to floating-point numbers. Even toy Lisps from
          | https://t3x.org have rationals too. In Scheme, you have both
          | exact->inexact and inexact->exact, which convert between
          | rationals and FP.
         | 
         | If you have a Linux/BSD distro, you may already have Guile
         | installed as a dependency.
         | 
          | Hence, run it and then:
          | 
          |     scheme@(guile-user)> (inexact->exact 2.5)
          |     $2 = 5/2
          |     scheme@(guile-user)> (exact->inexact (/ 5 2))
          |     $3 = 2.5
         | 
          | Thus, in Forth, I have a good set of q{+,-,*,/} operations for
          | rationals (custom coded, literally four lines) and they work
          | great for a good 99% of the cases.
         | 
          | As for irrational numbers, NASA used up to 16 decimals, and
          | the old 355/113 can be precise enough for 99.99% of the pieces
          | built on Earth. Maybe not for astronomical distances, but
          | hey...
         | 
          | In Scheme:
          | 
          |     scheme@(guile-user)> (exact->inexact (/ 355 113))
          |     $5 = 3.1415929203539825
         | 
          | In Forth, you would just use
          | 
          |     : pi* 355 113 m*/ ;
         | 
          | with great precision for most of the objects being measured
          | against.
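For comparison, the same round-trips can be sketched with Python's standard fractions module (an editor's illustration, not part of the comment above):

```python
from fractions import Fraction

# inexact->exact: recover the exact rational behind a float
assert Fraction(2.5) == Fraction(5, 2)

# exact->inexact: back to a float
assert float(Fraction(5, 2)) == 2.5

# the 355/113 approximation of pi, as in the Guile session above
print(float(Fraction(355, 113)))  # 3.1415929203539825
```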
        
           | eqvinox wrote:
           | Those rational numbers fly out the window as soon as your
           | math involves any kind of more complicated trigonometry, or
           | even a square root...
        
             | stassats wrote:
              | You can turn them back into rationals:
              | 
              |     (rational (sqrt 2d0)) => 6369051672525773/4503599627370496
             | 
             | Or write your own operations that compute to the precision
             | you want.
        
               | anthk wrote:
                | My post already covered inexact->exact:
                | 
                |     scheme@(guile-user)> (inexact->exact (sqrt 2.0))
                |     $1 = 6369051672525773/4503599627370496
               | 
                | s9 Scheme fails on this as it's an irrational number,
                | but the rest of the Schemes, such as STklos, Guile and
                | MIT Scheme, will do it right.
                | 
                | With Forth (and even EForth, if the image is compiled
                | with FP support), you are on your own to check (or
                | rewrite) an fsqrt function with arbitrary precision.
               | 
                | Also, on trig, your parent commenter should check what
                | CORDIC is.
               | 
               | https://en.wikipedia.org/wiki/CORDIC
        
             | anthk wrote:
             | Check CORDIC, please.
             | 
             | https://en.wikipedia.org/wiki/CORDIC
             | 
              | Also, on sqrt functions, even an FP-enabled toy EForth
              | under the Subleq VM (just as a toy, again, but it works)
              | provides some sort of fsqrt function:
              | 
              |     2 f fsqrt f.  1.414 ok
              | 
              | Under PFE Forth, something 'bigger':
              | 
              |     40 set-precision  ok
              |     2e0 fsqrt f.
              |     1.4142135623730951454746218587388284504414 ok
             | 
              | EForth's FP precision is tiny but good enough for very
              | small microcontrollers. And it wasn't so far from the
              | precision the 80's engineers worked with to create
              | properly usable machinery/hardware and even software.
        
             | dreamcompiler wrote:
             | If you want high precision trig functions on rationals,
             | nothing's stopping you from writing a Taylor series library
              | for them. Or some other polynomial approximation, a lookup
              | table, or CORDIC.
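A hedged sketch of that approach in Python, using the standard fractions module for exact rational terms (`sin_taylor` and the 12-term cutoff are illustrative choices, not a library API):

```python
from fractions import Fraction
from math import sin

def sin_taylor(x: Fraction, terms: int = 12) -> Fraction:
    # sin(x) = x - x^3/3! + x^5/5! - ..., evaluated in exact rational arithmetic
    result = Fraction(0)
    term = x
    for n in range(terms):
        result += term
        # next term: multiply by -x^2 / ((2n+2)(2n+3))
        term *= -x * x
        term /= (2 * n + 2) * (2 * n + 3)
    return result

# compare the exact rational result against the float library function
print(float(sin_taylor(Fraction(1, 2))), sin(0.5))
```

The result stays an exact Fraction until the final conversion, so accuracy is limited only by the number of terms.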
        
           | AlotOfReading wrote:
           | Floats _are_ fixed point, just done in log space. The main
           | change is that the designers dedicated a few bits to variable
           | exponents, which introduces alignment and normalization steps
            | before/after the operation. If you don't mix exponents, you
           | can essentially treat it as identical to a lower precision
           | fixed point system.
        
             | anthk wrote:
             | No, not even close. Scaling integers to mimic decimals
             | under 32 and 64 bit can be much faster. And with 32 bit
              | double numbers you can cover Planck-scale values, so with
              | 64 bit
             | double numbers you can do any field.
        
       | orlp wrote:
       | I helped design an API for "algebraic operations" in Rust:
       | <https://github.com/rust-lang/rust/issues/136469>, which are
       | coming along nicely.
       | 
       | These operations are
       | 
       | 1. Localized, not a function-wide or program-wide flag.
       | 
        | 2. Completely safe; -ffast-math includes assumptions such as
        | that there are no NaNs, and violating them is _undefined
        | behavior_.
       | 
       | So what do these algebraic operations do? Well, one by itself
       | doesn't do much of anything compared to a regular operation. But
       | a sequence of them is allowed to be transformed using
       | optimizations which are algebraically justified, as-if all
       | operations are done using real arithmetic.
        
         | eqvinox wrote:
         | Are these calls going to clear the FTZ and DAZ flags in the
         | MXCSR on x86? And FZ & FIZ in the FPCR on ARM?
        
           | orlp wrote:
           | I don't believe so, no. Currently these operations only set
           | the LLVM flags to allow reassociation, contraction, division
           | replaced by reciprocal multiplication, and the assumption of
           | no signed zeroes.
           | 
           | This can be expanded in the future as LLVM offers more flags
           | that fall within the scope of algebraically motivated
           | optimizations.
        
             | eqvinox wrote:
             | Ah sorry I misunderstood and thought this API was for the
             | other way around, i.e. forbidding "unsafe" operations. (I
             | guess the question reverses to _setting_ those flags)
             | 
             | ('Naming: "algebraic" is not very descriptive of what this
             | does since the operations themselves are algebraic.' :D)
        
               | nextaccountic wrote:
               | > ('Naming: "algebraic" is not very descriptive of what
               | this does since the operations themselves are algebraic.'
               | :D)
               | 
               | Okay, the floating point operations are literally
               | algebraic (they form an algebra) but they don't follow
               | some common algebraic properties like associativity. The
               | linked tracking issue itself acknowledges that:
               | 
               | > Naming: "algebraic" is not very descriptive of what
               | this does since the operations themselves are algebraic.
               | 
               | Also this comment https://github.com/rust-
               | lang/rust/issues/136469#issuecomment...
               | 
                | > > On that note I added an unresolved question for
                | naming since algebraic isn't the most clear indicator of
                | what is going on.
                | >
                | > I think it is fairly clear. The operations allow
                | algebraically justified optimizations, as-if the
                | arithmetic was real arithmetic.
                | >
                | > I don't think you're going to find a clearer name, but
                | feel free to provide suggestions. One alternative one
                | might consider is real_add, real_sub, etc.
               | 
               | Then retorted here https://github.com/rust-
               | lang/rust/issues/136469#issuecomment...
               | 
                | > These names suggest that the operations are more
                | accurate than normal, where really they are less
                | accurate. One might misinterpret that these are
                | infinite-precision operations (perhaps with rounding
                | after a whole sequence of operations).
                | >
                | > The actual meaning isn't that these are real number
                | operations, it's quite the opposite: they have
                | best-effort precision with no strict guarantees.
                | >
                | > I find "algebraic" confusing for the same reason.
                | >
                | > How about approximate_add, approximate_sub?
               | 
               | And the next comment
               | 
                | > Saying "approximate" feels imperfect, as while these
                | operations don't promise to produce the exact IEEE
                | result on a per-operation basis, the overall result
                | might well be more accurate algebraically. E.g.:
                | >
                | > (...)
               | 
                | So there's a discussion going on about the naming.
        
               | eqvinox wrote:
                | It doesn't feel appropriate for me to comment there, not
                | knowing any Rust really, but "lax_" (or "relax_") would
                | have the extra benefit of being very short.
               | 
               | (Is this going to overload operators or are people going
               | to have to type this... a lot... ?)
        
               | Sharlin wrote:
                | Rust has some precedent for adding convenience newtypes
                | with overloaded operators (eg. `Wrapping<I>` for
                | `I.wrapping_add(I)` etc). Such a wrapper isn't currently
                | proposed AFAIK but there's no reason one couldn't be
                | added in the future, I believe.
        
               | eqvinox wrote:
               | Right, as long as the LLVM intrinsics are exposed you
               | could just put that in a crate somewhere.
        
               | Measter wrote:
               | For giggles, here's one I whipped up, along with an
               | example use: https://godbolt.org/z/Eezj35dzc
        
               | Sharlin wrote:
               | Wow, that's some hardcore unrolling.
        
               | CryZe wrote:
               | WebAssembly also ended up calling its set of similar
               | instructions relaxed.
        
         | evrimoztamur wrote:
          | Does that mean that a physics engine written with these
          | operations will always compile to yield the same deterministic
          | outcomes across different platforms (assuming they correctly
          | implement, or are able to implement, algebraic operations)?
        
           | orlp wrote:
           | No, there is no guarantee which (if any) optimizations are
           | applied, only that they _may_ be applied. For example a fused
           | multiply-add instruction may be emitted for a*b + c on
           | platforms which support it, which is not cross-platform.
        
           | SkiFire13 wrote:
           | No, the result may depend on how the compiler reorders them,
           | which could be different on different platforms.
        
           | Sharlin wrote:
           | It's more like the opposite. These tell the compiler to
           | assume for optimization purposes that floats are associative
           | and so on (ie. algebraic), even when in reality they aren't.
           | So the results may vary depending on what transformations the
           | compiler performs - in particular, they may vary between
           | optimized and non-optimized builds, which normally isn't
           | allowed.
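The classic one-line demonstration of that non-associativity, in Python:

```python
a, b, c = 0.1, 0.2, 0.3

left = (a + b) + c    # evaluates to 0.6000000000000001
right = a + (b + c)   # evaluates to 0.6
print(left == right)  # False
```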
        
             | vanderZwan wrote:
             | > _These tell the compiler to assume for optimization
             | purposes that floats are associative and so on (ie.
             | algebraic), even when in reality they aren 't._
             | 
             | I wonder if it is possible to add an additional constraint
             | that guarantees the transformation has equal or fewer
             | numerical rounding errors. E.g. for floating point doubles
             | (0.2 + 0.1) - 0.1 results in 0.20000000000000004, so I
             | would expect that transforming some (A + B) - B to just A
             | would always reduce numerical error. OTOH, it's floating
             | point maths, there's probably some kind of weird gotcha
             | here as well.
        
               | legobmw99 wrote:
               | Kahan summation is an example (also described in the top
               | level article) of one such "gotcha". It involves adding a
               | term that - if floats were algebraic in this sense -
               | would always be zero, so ffast-math often deletes it, but
               | this actually completely removes the accuracy improvement
                | of the algorithm.
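The gotcha is easy to see in a minimal Python sketch of Kahan summation (values chosen for illustration; the point is that c is algebraically always zero, which is exactly what licenses a fast-math compiler to delete it):

```python
def kahan_sum(xs):
    total = 0.0
    c = 0.0                    # compensation: the low-order bits lost so far
    for x in xs:
        y = x - c              # re-apply the correction from the last step
        t = total + y          # big + small: low-order bits of y are lost...
        c = (t - total) - y    # ...and recovered here; in real arithmetic this
        total = t              # is always 0, so -ffast-math may delete it
    return total

xs = [1.0] + [1e-16] * 10      # tiny terms that naive summation drops entirely
print(sum(xs), kahan_sum(xs))  # naive gives exactly 1.0; Kahan keeps ~1e-15
```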
        
               | anthk wrote:
               | Under EForth with FP done in software:
               | 
                |     2 f 1 f 1 f f+ f- f.  0.000 ok
               | 
               | PFE, I think reusing the GLIBC math library:
               | 
                |     2e0 1e0 1e0 f+ f- f.  0.000000 ok
        
               | StefanKarpinski wrote:
                | Pretty sure that's not possible. More accurate for some
                | inputs will be less accurate for others.
                | 
                | There's a very tricky tension in float optimization: the
                | most predictable operation structure is a fully skewed
                | op tree, as in naive left-to-right summation, but this
                | is the slowest and least accurate order of operations.
                | Using a more balanced tree is faster and more accurate
                | (great), but unfortunately which tree shape is fastest
                | depends very much on hardware-specific factors like SIMD
                | width (less great).
                | 
                | And no tree shape is universally guaranteed to be fully
                | accurate. A full binary tree tends to have the best
                | accuracy, but has bad base-case performance, so the
                | shape that tends to get used in high-performance kernels
                | is SIMD-width parallel in a loop up to some fixed size
                | like 256 elements, then pairwise recursive reduction
                | above that. The recursive reduction can also be
                | threaded. Anyway, there's no silver bullet here.
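The pairwise recursive reduction described above can be sketched in a few lines of Python (the base case is kept tiny for clarity; as noted, real kernels use a SIMD-width loop up to a few hundred elements instead):

```python
import math

def pairwise_sum(xs):
    # split in half, reduce each half recursively, then combine
    if len(xs) <= 2:
        return sum(xs)
    mid = len(xs) // 2
    return pairwise_sum(xs[:mid]) + pairwise_sum(xs[mid:])

xs = [0.1] * 4096
# math.fsum is correctly rounded, so it serves as the reference
naive_err = abs(sum(xs) - math.fsum(xs))
pairwise_err = abs(pairwise_sum(xs) - math.fsum(xs))
print(naive_err, pairwise_err)
```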
        
               | scythe wrote:
               | I think a restricted version might be possible to
               | implement: only allow transformations if the transformed
               | version has strictly fewer numerical rounding errors on
               | some inputs. This will usually only mean canceling terms
               | and collecting expressions like "x+x+x" into 3x.
               | 
               | In general, rules that allow fewer transformations are
               | probably easier to understand and use. Trying to optimize
               | everything is where you run into trouble.
        
         | glkindlmann wrote:
         | That sounds neat. What would be really neat is if the language
         | helped to expose the consequences of the ensuing rounding error
         | by automating things that are otherwise clumsy for programmers
         | to do manually, like running twice with opposite rounding
         | directions, or running many many times with internally
          | randomized directions (two of the options in Sec 4 of *). That
          | is, it would be cool if Rust enabled people to learn about the
          | subtleties of floating point, instead of hiding them away.
         | 
         | * https://people.eecs.berkeley.edu/~wkahan/Mindless.pdf
        
         | pclmulqdq wrote:
         | -ffast-math is actually something like 15 separate flags, and
         | you can use them individually if you want. 3 of them are "no
         | NaNs," "no infinities," and "no subnormals." Several of the
         | other flags allow you to treat math as associative or
         | distributive if you want that.
         | 
         | The library has some merit, but the goal you've stated here is
         | given to you with 5 compiler flags. The benefit of the library
         | is choosing when these apply.
        
       | eqvinox wrote:
       | I wish the Twitter links in this article weren't broken.
        
         | Smaug123 wrote:
         | They aren't, at least for the spot-check I performed; probably
         | you need to be logged in.
        
           | SunlitCat wrote:
            | Maybe an unpopular opinion, but having to be logged in is
            | being broken. ;)
        
           | eqvinox wrote:
           | All it says is "Something went wrong. Try reloading." -- no
           | indication having an account logged in would help (...and I
           | don't feel like creating an account just to check...)
        
         | genewitch wrote:
         | Change X to xcancel
        
       | rlpb wrote:
       | > I mean, the whole point of fast-math is trading off speed with
       | correctness. If fast-math was to give always the correct results,
       | it wouldn't be fast-math, it would be the standard way of doing
       | math.
       | 
       | A similar warning applies to -O3. If an optimization in -O3 were
       | to reliably always give better results, it wouldn't be in -O3;
       | it'd be in -O2. So blindly compiling with -O3 also doesn't seem
       | like a great idea.
        
         | CamouflagedKiwi wrote:
         | The optimisations in -O3 aren't supposed to give incorrect
         | results. They're not in -O2 because they make a more aggressive
         | space/speed tradeoff or increase compile times more
         | significantly. In the same way, the optimisations in -O2 are
         | not meant to be less correct than -O1, but they aren't in that
         | group for similar reasons.
         | 
         | -Ofast is the 'dangerous' one. (It includes -ffast-math).
        
           | rlpb wrote:
           | > The optimisations in -O3 aren't supposed to give incorrect
           | results.
           | 
           | I didn't mean to imply that they result in incorrect results.
           | 
           | > they make a more aggressive space/speed tradeoff...
           | 
            | Right... so "better" becomes subjective and depends on the
            | use case, so it doesn't make sense to choose -O3 blindly
            | unless you understand the trade-offs and want that side of
            | them for the particular builds you're doing. Things that
            | everyone wants would be in -O2. That's all I'm saying.
        
             | eqvinox wrote:
             | It doesn't become subjective; things in -O3 can objectively
             | be understood to produce equal or faster code for a higher
             | build cost in the vast majority of cases, roughly averaged
             | across platforms. (Without loss in correctness.)
             | 
             | If you know your exact target and details about your input
             | expectations, of course you can optimize further, which
             | might involve turning off some things in -O3 (or even -O2).
             | On a whole bunch of systems, -Os can be faster than -O3 due
             | to I-cache size limits. But at-large, you can expect -O3 to
             | be faster.
             | 
             | Similar considerations apply for LTO and PGO. LTO is
             | commonly default for release builds these days, it just
             | costs a whole lot of compile time. PGO is done when
             | possible (i.e. known majority inputs).
        
             | CamouflagedKiwi wrote:
             | If they're things that everyone wants, why aren't they in
             | -O1?
        
         | wffurr wrote:
         | If the answer can be wrong, you can make it as fast as you
         | want.
        
       | zinekeller wrote:
       | (2021)
       | 
       | Previous discussion: Beware of fast-math (Nov 12, 2021,
       | https://news.ycombinator.com/item?id=29201473)
        
       | Affric wrote:
       | For non-associativity what is the best way to order operations?
       | Is there an optimal order for precision whereby more similar
       | values are added/multiplied first?
       | 
       | EDIT: I am now reading Goldberg 1991
       | 
       | Double edit: Kahan Summation formula. Goldberg is always worth
       | going back to.
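One rule of thumb is to add small-magnitude values first, so they can accumulate before meeting the large ones; a small Python illustration of it, with adversarially chosen values:

```python
vals = [1e16, 1.0, 1.0, 1.0, 1.0]

# largest first: each 1.0 is rounded away against 1e16
# (the spacing between doubles near 1e16 is 2.0)
big_first = 0.0
for v in vals:
    big_first += v

# smallest first: the 1.0s accumulate to 4.0 before meeting 1e16
small_first = 0.0
for v in sorted(vals):
    small_first += v

print(big_first, small_first)  # 1e+16 1.0000000000000004e+16
```

Sorting every sum is rarely practical; compensated schemes like Kahan summation (mentioned in the edit above) get a similar benefit without the sort.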
        
         | zokier wrote:
         | Herbie can optimize arbitrary floating point expressions for
         | accuracy
         | 
         | https://herbie.uwplse.org/
        
       | Sharlin wrote:
       | > -funsafe-math-optimizations
       | 
       | What's wrong with fun, safe math optimizations?!
       | 
       | (:
        
         | keybored wrote:
          | Hah! I was just about to comment that I immediately read it as
          | fun-safe, every time I see it.
         | 
         | I guess that happens when I don't deal with compiler flags
         | daily.
        
         | vardump wrote:
         | "This roller coaster is optimized to be Fun and Safe!"
        
           | Sharlin wrote:
           | Many funroll loops in that coaster.
        
       | emn13 wrote:
        | I get the feeling that the real problem here is the IEEE specs
       | themselves. They include a huge bunch of restrictions that each
       | individually aren't relevant to something like 99.9% of floating
       | point code, and probably even in aggregate not a single one is
       | relevant to a large majority of code segments out in the wild.
       | That doesn't mean they're not important - but some of these
       | features should have been locally opt-in, not opt out. And at the
       | very least, standards need to evolve to support hardware
       | realities of today.
       | 
       | Not being able to auto-vectorize seems like a pretty critical bug
       | given hardware trends that have been going on for decades now; on
       | the other hand sacrificing platform-independent determinism isn't
       | a trivial cost to pay either.
       | 
        | I'm not familiar with the details of OpenCL and CUDA on this
        | front - do they have some way to guarantee a specific order-of-
        | operations such that code always has a predictable result on all
        | platforms and nevertheless parallelizes well on a GPU?
        
         | Affric wrote:
         | How does IEEE 754 prevent auto-vectorisation?
        
           | Kubuxu wrote:
           | IIRC reordering additions can cause the result to change
           | which makes auto-vectorisation tricky.
        
           | kzrdude wrote:
            | If you write a loop `for x in array { sum += x }`, then your
            | program is a specification that you want to add the elements
            | in exactly that order, one by one. Vectorization would
            | change the order.
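A small Python model of that: splitting the loop into two "lanes" (as a vectorizer would) reorders the additions, and with adversarial values the two orders give different answers:

```python
xs = [1e16, 1.0, -1e16, 1.0]

# sequential, left-to-right, as the loop specifies
seq = 0.0
for x in xs:
    seq += x              # the first 1.0 is absorbed into 1e16 and lost

# two-lane "vectorized" order: even and odd indices summed separately
lane0 = xs[0] + xs[2]     # 1e16 + -1e16 = 0.0
lane1 = xs[1] + xs[3]     # 1.0 + 1.0 = 2.0
vec = lane0 + lane1

print(seq, vec)           # 1.0 2.0
```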
        
             | stingraycharles wrote:
              | Yup, because of the imprecision of floating point, you
              | cannot just assume that "(a + c) + (b + d)" is the same as
              | "a + b + c + d".
             | 
             | It would be pretty ironic if at some point fixed point /
             | bignum implementations end up being faster because of this.
        
               | anthk wrote:
                | They are; just check anything fixed-point on a 486SX vs
                | anything floating-point on a 486DX. It's faster to
                | scale, sum and print at the desired precision than to
                | operate on floats.
        
               | einpoklum wrote:
               | I wonder... couldn't there just be some library type for
                | this, e.g. `associative::float` and `associative::double`
               | and such (in C++ terms), so that compilers can ignore
               | non-associativity for actions on values of these types?
               | Or attributes one can place on variables to force
               | assumption of associativity?
        
             | dahart wrote:
              | The bigger problem there is the language not offering a
              | way to signal the author's intent. If an author doesn't
              | care about the order of operations in a sum, they will
              | still write the exact same code as the author who does
              | care. This is a failure of the language to be expressive
              | enough, and doesn't reflect on the IEEE spec. (The spec
              | even does suggest that languages should offer and define
              | these sorts of semantics.)
              | 
              | Whether the program is specifying an order of operations
              | is lost when the language offers no way for a coder to
              | distinguish between caring about order and not caring.
              | This is especially difficult since the vast majority of
              | people don't care and don't consider their own code to be
              | a specification of the order of operations. Worse, most
              | people would even be surprised and/or annoyed if the
              | compiler didn't do certain simplifications and constant
              | folding, which change the results. The few cases where
              | people do care about order can be extremely important, but
              | they are rare nonetheless.
        
           | goalieca wrote:
            | Floating point arithmetic is neither commutative nor
            | associative, so you shouldn't.
        
             | lo0dot0 wrote:
              | While it is technically correct to say this, it also gives
              | the wrong impression, because it leaves out the fact that
              | ordering changes create only a small difference. Other
              | examples where arithmetic is not commutative, e.g. matrix
              | multiplication, can create much larger differences.
        
               | kstrauser wrote:
               | > ordering changes create only a small difference.
               | 
               | That can't be assumed.
               | 
                | You can easily fall into a situation like:
                | 
                |     total = large_float_value
                |     for _ in range(1_000_000_000):
                |         total += .01
                |     assert total == large_float_value
               | 
               | Without knowing the specific situation, it's impossible
               | to say whether that's a tolerably small difference.
        
             | eapriv wrote:
             | Why is it not commutative?
        
               | layer8 wrote:
               | It actually is commutative according to IEEE-754, except
               | that in the case of a NaN result you might get a
               | different NaN representation.
        
               | adgjlsfhk1 wrote:
               | having multiple NaNs and no spec for how they should
               | behave feels like such an unforced error to me
        
             | layer8 wrote:
              | IEEE-754 addition and multiplication are commutative.
              | Multiplication isn't distributive over addition, though.
        
           | dahart wrote:
           | The spec doesn't prevent auto-vectorization, it only says the
           | language should avoid it when it wants to opt in to producing
           | "reproducible floating-point results" (section 11 of IEEE
           | 754-2019). Vectorizing can be implemented in different ways,
           | so whether a language avoids vectorizing in order to opt in
           | to reproducible results is implementation dependent. It also
           | depends on whether there is an option to not vectorize. If a
           | language only had auto-vectorization, and the vectorization
           | result was deterministic and reproducible, and if the
           | language offered no serial mode, this could adhere to the
           | IEEE spec. But since C++ (for example) offers serial
           | reductions in debug & non-optimized code, and it wants to
           | offer reproducible results, then it has to be careful about
           | vectorizing without the user's explicit consent.
        
         | ajross wrote:
         | > I get the feeling that the real problem here are the IEEE
         | specs themselves.
         | 
         | Well, all standards are bad when you really get into them,
         | sure.
         | 
         | But no, the problem here is that floating point code is often
         | sensitive to precision errors. Relying on rigorous adherence to
         | a specification doesn't fix precision errors, but it does
         | guarantee that software behavior in the face of them is
         | deterministic. Which 90%+ of the time is enough to let you
         | ignore the problem as a "tuning" thing.
         | 
         | But no, precision errors _are bugs_. And the proper treatment
         | for bugs is to fix the bugs and not ignore them via tricks with
         | determinism. But that 's hard, as it often involves design
         | decisions and complicated math (consider gimbal lock: "fixing"
         | that requires understanding quaternions or some other
         | orthogonal orientation space, and that's hard!).
         | 
          | So we just deal with it. But IMHO -ffast-math does more good
          | than harm, and projects should absolutely enable it, because
          | the "problems" it discovers are bugs you want to fix anyway.
        
           | chuckadams wrote:
           | > (consider gimbal lock: "fixing" that requires understanding
           | quaternions or some other orthogonal orientation space, and
           | that's hard!)
           | 
           | Or just avoiding gimbal lock by other means. We went to the
           | moon using Euler angles, but I don't suppose there's much of
           | a choice when you're using real mechanical gimbals.
        
             | ajross wrote:
             | That is the "tuning" solution. And mostly it works by
             | limiting scope of execution ("just don't do that") and if
             | that doesn't work by having some kind of recovery method
             | ("push this button to reset", probably along with "use this
             | backup to recalibrate"). And it... works. But the bug is
             | still a bug. In software we prefer more robust techniques.
             | 
             | FWIW, my memory is that this was _exactly_ what happened
             | with Apollo 13. It lost its gyro calibration after the
             | accident (it did the thing that was the  "just don't do
             | that") and they had to do a bunch of iterative contortions
             | to recover it from things like the sun position (because
             | they couldn't see stars out the iced-over windows).
             | 
             | NASA would have strongly preferred IEEE doubles and
             | quaternions, in hindsight.
        
         | adrian_b wrote:
         | Not being able to auto-vectorize is not the fault of the IEEE
         | standard, but the fault of those programming languages which do
         | not have ways to express that the order of some operations is
         | irrelevant, so they may be executed concurrently.
         | 
         | Most popular programming languages have the defect that they
         | impose a sequential semantics even where it is not needed.
         | There have been programming languages without this defect, e.g.
         | Occam, but they have not become widespread.
         | 
         | Because nowadays only a relatively small number of users care
         | about computational applications, this defect has not been
         | corrected in any mainline programming language, though for some
         | programming languages there are extensions that can achieve
         | this effect, e.g. OpenMP for C/C++ and Fortran. CUDA is similar
         | to OpenMP, even if it has a very different syntax.
         | 
         | The IEEE standard for floating-point arithmetic has been one of
         | the most useful standards in all history. The reason is that
         | both hardware designers and naive programmers have always had
         | the incentive to cheat in order to obtain better results in
         | speed benchmarks, i.e. to introduce errors in the results with
         | the hope that this will not matter for users, who will be
         | more impressed by the great benchmark results.
         | 
         | There are always users who need correct results more than
         | anything else and it can be even a matter of life and death.
         | For the very limited in scope uses where correctness does not
         | matter, i.e. mainly graphics and ML/AI, it is better to use
         | dedicated accelerators, GPUs and NPUs, which are designed by
         | prioritizing speed over correctness. For general-purpose CPUs,
         | being not fully-compliant with the IEEE standard is a serious
         | mistake, because in most cases the consequences of such a
         | choice are impossible to predict, especially not by the people
         | without experience in floating-point computation who are the
         | most likely to attempt to bypass the standard.
         | 
         | Regarding CUDA, OpenMP and the like, by definition if some
         | operations are parallelizable, then the order of their
         | execution does not matter. If the order matters, then it is
         | impossible to provide guarantees about the results, on any
         | platform. If the order matters, it is the responsibility of the
         | programmer to enforce it, by synchronization of the parallel
         | threads, wherever necessary.
         | 
         | Whoever wants vectorized code should never rely on programming
         | languages like C/C++ and the like, but they should always use
         | one of the programming language extensions that have been
         | developed for this purpose, e.g. OpenMP, CUDA, OpenCL, where
         | vectorization is not left to chance.
        
           | emn13 wrote:
           | If you care about absolute accuracy, I'm skeptical you want
           | floats at all. I'm sure it depends on the use case.
           | 
           | Whether it's the standard's fault or the language's fault
           | for following the standard in terms of preventing auto-
           | vectorization is splitting hairs; the whole point of the
           | standard is to have predictable and usually fairly low-error
           | ways of performing these operations, which only works when
           | the order of operations is defined. That very aim is the
           | problem; to the extent the standard is harmless when
           | ordering guarantees don't exist, you're essentially applying
           | some of those tricky -ffast-math suboptimizations.
           | 
           | But to be clear in any case: there are obviously cases
           | whereby order-of-operations is relevant enough and accuracy
           | altering reorderings are not valid. It's just that those are
           | rare enough that for many of these features I'd much prefer
           | that to be the opt-in behavior, not opt-out. There's
           | absolutely nothing wrong with having a classic IEEE 754 mode
           | and I expect it's an essential feature in some niche corner
           | cases.
           | 
           | However, given the obviously huge application of massively
           | parallel processors and algorithms that accept rounding
           | errors (or sometimes conversely overly precise results!),
           | clearly most software is willing to generally accept rounding
           | errors to be able to run efficiently on modern chips. It just
           | so happens that none of the computer languages that rely on
           | mapping floats to IEEE 754 floats in a straightforward
           | fashion are any good at that, which seems like a bad trade-
           | off.
           | 
           | There could be multiple types of floats instead; or code-
           | local flags that delineate special sections that need precise
           | ordering; or perhaps even expressions that clarify how much
           | error the user is willing to accept and then just let the
           | compiler do some but not all transformations; and perhaps
           | even other solutions.
        
         | dzaima wrote:
         | The precise requirements of IEEE-754 may not be important for
         | any given program, but as long as you want your numbers to have
         | _any_ form of well-defined semantics beyond  "numbers exist,
         | and here's a list of functions that do Something(tm) that may
         | or may not be related to their name", any number format that's
         | capable of (approximately) storing both 10^20 and 10^-20 in 64
         | bits is gonna have those drawbacks.
         | 
         | AFAIK GPU code is basically always written as scalar code
         | acting on each "thing" separately, that's, as a whole,
         | semantically looped over by the hardware, same way as
         | multithreading would (i.e. no order guaranteed at all), so you
         | physically cannot write code that'd need operation reordering
         | to vectorize. You just can't write an equivalent to "for (each
         | element in list) accumulator += element;" (or, well, you can,
         | by writing that and running just one thread of it, but that's
         | gonna be slower than even the non-vectorized CPU equivalent
         | (assuming the driver respects IEEE-754)).
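A quick Python sketch of the dynamic-range drawback described above: once two magnitudes differ by more than roughly 16 decimal orders, adding the smaller one to the larger is a no-op.

```python
# float64 carries ~16 significant decimal digits, so a value
# 40 orders of magnitude smaller is absorbed without a trace.
big = 1e20
small = 1e-20

print(big + small == big)    # True: small vanished entirely
print((big + small) - big)   # 0.0, not 1e-20
```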
        
       | cycomanic wrote:
       | I think this article overstates the importance of the problems
       | even for scientific software. In the scientific code I've
       | written, noise processes are often orders of magnitude larger
       | than what is discussed here, and I believe this applies to
       | many (most?) simulations modelling the real world (physics,
       | chemistry, ...). At the same time, enabling fast-math has
       | often yielded a very significant (>10%) performance boost.
       | 
       | I find the discussion of -fassociative-math particularly
       | interesting, because I assume that most writers of code that
       | translates a mathematical formula into a simulation will not
       | know which would be the most accurate order of operations and
       | will simply codify their derivation of the equation to be
       | simulated (which could
       | have operations in any order). So if this switch changes your
       | results it probably means that you should have a long hard look
       | at the equations you're simulating and which ordering will give
       | you the most correct results.
       | 
       | That said I appreciate that the considerations might be quite
       | different for libraries and in particular simulations for
       | mathematics.
        
         | londons_explore wrote:
         | It would be nice if there was some syntax for "math order
         | matters, this is the order I want it done in".
         | 
         | Then all other math will be fast-math, except where annotated.
        
           | hansvm wrote:
           | The article mentioned that gcc and clang have such
           | extensions. Having it in the language is nice though, and
           | that's the approach Zig took.
        
           | sfn42 wrote:
           | I thought most languages have this? If you simply write a
           | formula operations are ordered according to the language
           | specifiction. If you want different ordering you use
           | parentheses.
           | 
           | Not sure how that interacts with this fast math thing, I
           | don't use C
        
             | kstrauser wrote:
             | That's a different kind of ordering.
             | 
             | Imagine a function like Python's `sum(list)`. In abstract,
             | Python should be able to add those values in any order it
             | wants. Maybe it could spawn a thread so that one process
             | sums the first half in the list, another sums the second
             | half at the same time, and then you return the sum of those
             | intermediate values. You could imagine a clever `sum()`
             | being many times faster, especially using SIMD instructions
             | or a GPU or something.
             | 
             | But alas, you can't optimize like that with common IEEE-754
             | floats and expect to get the same answer out as when using
             | the simple one-at-a-time addition. The result depends on
             | what order you add the numbers together. Order them
             | differently and you very well may get a different answer.
             | 
             | That's the kind of ordering we're talking about here.
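A minimal Python demonstration of that order dependence:

```python
# The same three values, grouped two ways, give two different
# float64 results, because float addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)

print(left)           # 0.6000000000000001
print(right)          # 0.6
print(left == right)  # False
```

This is exactly why a compiler (or a clever `sum()`) cannot regroup additions without potentially changing the answer.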
        
         | on_the_train wrote:
         | I worked in cad, robotics and now semiconductor optics. In
         | every single field, floating precision down to the very last
         | digits was a huge issue
        
           | cycomanic wrote:
           | Interesting, I stand corrected. In most of the fields I'm
           | aware of, one could easily work in 32-bit without any
           | issues.
           | 
           | I find the robotics example quite surprising in particular.
           | I think the precision of most input sensors is less than 16
           | bits. If your inputs have that much noise on them, how come
           | you need so much precision in your calculations?
        
             | spookie wrote:
             | The precision isn't uniform across a range of possible
             | inputs. This means you need a higher bit depth, even though
             | "you aren't really using it", just so you can establish a
             | good base precision you are sure you are hitting at every
             | range. The phrase "most sensors" is doing a lot of heavy
             | lifting here.
        
           | AlotOfReading wrote:
           | "precision" is an ambiguous term here. There's
           | reproducibility (getting the same results every time),
           | accuracy (getting as close as possible to same results
           | computed with infinite precision), and the native format
           | precision.
           | 
           | ffast-math is sacrificing both the first and the second for
           | performance. Compilers usually sacrifice the first for the
           | second by default with things like automatic fma
           | contraction. This isn't a necessary trade-off, it's just
           | easier.
           | 
           | There's very few cases where you actually need accuracy down
           | to the ULP though. No robot can do anything meaningful with
           | femtometer+ precision, for example. Instead you choose a
           | development balance between reproducibility (relatively easy)
           | and accuracy (extremely hard). In robotics, that will usually
           | swing a bit towards reproducibility. CAD would swing more
           | towards accuracy.
        
       | quotemstr wrote:
       | All I want for Christmas is a programming language that uses
       | dependent typing to make floating point precision part of the
       | type system. Catastrophic cancellation should be a compiler error
       | if you assign the output to a float with better ulps than you get
       | with worst case operands.
        
         | thesuperbigfrog wrote:
         | Ada might have what you want:
         | 
         | https://www.jviotti.com/2017/12/05/an-introduction-to-adas-s...
         | 
         | http://www.ada-auth.org/standards/22rm/html/RM-3-5-7.html
         | 
         | http://www.ada-auth.org/standards/22rm/html/RM-A-5-3.html
         | 
         | Ada also has fixed point types:
         | 
         | http://www.ada-auth.org/standards/22rm/html/RM-3-5-9.html
        
       | storus wrote:
       | This problem is happening even on Apple MPS with PyTorch in deep
       | learning, where fast math is used by default in many operations,
       | leading to a garbage output. I hit it recently while training an
       | autoregressive image generation model. Here is a discussion by
       | folks that hit it as well:
       | 
       | https://github.com/pytorch/pytorch/issues/84936
        
       | JKCalhoun wrote:
       | > Even compiler developers can't agree.
       | 
       | > This is perhaps the single most frequent cause of fast-math-
       | related StackOverflow questions and GitHub bug reports
       | 
       | The second line above should settle the first.
        
         | layer8 wrote:
         | The first line points out that it doesn't, even if one thinks
         | that it should. Also, note the "perhaps".
        
       | teleforce wrote:
       | "Nothing brings fear to my heart more than a floating point
       | number." - Gerald Jay Sussman
       | 
       | Is there any IEEE standards committee working on an FP
       | alternative, for example Unum and Posit? [1][2]
       | 
       | [1] Unum & Posit:
       | 
       | https://posithub.org/about
       | 
       | [2] The End of Error:
       | 
       | https://www.oreilly.com/library/view/the-end-of/978148223986...
        
         | Q6T46nT668w6i3m wrote:
         | Is this sarcasm? If not, see the proposed posit standard,
         | IEEE P3109.
        
           | teleforce wrote:
           | Great, didn't know that it exists.
        
           | pclmulqdq wrote:
           | The current P3109 draft has no posits in it.
        
         | kvemkon wrote:
         | I'm wondering why there are still no announcements of
         | hardware support for such approaches in CPUs.
        
           | neepi wrote:
           | HP had proper deterministic decimal arithmetic since the
           | 1970s.
        
       | datameta wrote:
       | Luckily outside of mission critical systems, like in demoscene
       | coding, I can happily use "44/7" as a 2pi approximation (my
       | beloved)
        
       | razighter777 wrote:
       | The worst thing that strikes fear into me is seeing floating
       | points used for real world currency. Dear god. So many things can
       | go wrong. I always use unsigned integers counting number of
       | cents. And if I gotta handle multiple currencies, then I'll use
       | or make a wrapper class.
        
         | knert wrote:
         | How do you store negative numbers?
        
           | psychoslave wrote:
           | Maybe as in accounting, one column for benefits, one for
           | debts?
        
           | MobiusHorizons wrote:
           | You use a signed integer type, so you just store a negative
           | number.
           | 
           | You can think of fixed point as equivalent to ieee754 floats
           | with a fixed exponent and a two's complement mantissa instead
           | of a sign bit.
        
         | rcleveng wrote:
         | Wrappers are good even when non dealing with multiple
         | currencies since in many places some transactions are in
         | fractions of cents, so depending on the usecase may need to
         | push that decimal a few places out.
         | 
         | I always have a wrapper class to put the logic of converting to
         | whole currency units when and if needed, as well as when
         | requirements change and now you need 4 digits past the decimal
         | instead of 2, etc.
        
         | simonw wrote:
         | I've been having an interesting challenge relating to this
         | recently. I'm trying to calculate costs for LLM usage, but the
         | amounts of money involved are _so tiny_. Gemini 1.5 Flash 8B is
         | $0.0375 per million tokens!
         | 
         | Should I be running my accounting system on units of 10
         | billionths of a dollar?
        
           | outurnate wrote:
           | You're better off representing values as rationals; a ratio
           | between two different numbers. For example, 0.0375 would be
           | represented as 375 over 10000, or 3 over 80
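Python's standard library can do this directly; a small sketch using the fractions module with the $0.0375-per-million-tokens price from upthread:

```python
from fractions import Fraction

# exact rational price: $0.0375 per million tokens
rate = Fraction(375, 10000)     # reduces automatically
print(rate)                     # 3/80

# exact cost (in dollars) for 12,345 tokens, no rounding anywhere
cost = rate * Fraction(12_345, 1_000_000)
print(cost)                     # 7407/16000000
print(float(cost))              # approximate only at the very end
```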
        
             | simonw wrote:
             | Sounds hard to model in SQLite.
        
               | teaearlgraycold wrote:
               | Two columns?
        
             | anthk wrote:
             | From Forth, here's how I'd set the rationals:
             | : gcd begin dup while tuck mod repeat drop ;
             | : lcm 2dup * abs -rot gcd / ;
             | : reduce 2dup gcd tuck / >r / r> ;
             | : q+ rot 2dup * >r rot * -rot * + r> reduce ;
             | : q- swap negate swap q+ ;
             | : q* rot * >r * r> reduce ;
             | : q/ >r * swap r> * swap reduce ;
             | 
             | Example: to compute 70 * 0.25 = 35/2
             | 
             | 70 1 1 4 q* reduce .s 35 2 ok
             | 
             | On stack managing words like 2dup, rot and such, these are
             | easily grasped under either Google/DDG or any Forth with
             | the words "see" and/or "help".
             | 
             | As a hint, q- swaps the top two cells on the stack (the
             | numerator and denominator of the second rational), negates
             | the numerator, swaps them back, and then calls q+.
             | 
             | So, 2/5 - 3/2 = 2/5 + -3/2.
        
           | marcosdumay wrote:
           | Accounting happens in the units people pay, not the ones
           | that generate expenses.
           | 
           | But you probably should run your billing in fixed point or
           | floating decimals with a billionth of a dollar precision,
           | yes. Either that or you should consolidate the expenses into
           | larger bunches.
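A minimal sketch of that suggestion, assuming integer billionths of a dollar as the fixed-point unit (the names and the truncation policy here are made up for illustration):

```python
# Bill in integer billionths of a dollar so per-token prices
# stay exact; convert to display dollars only at the edges.
BILLIONTHS_PER_DOLLAR = 10**9

# $0.0375 per million tokens, expressed in billionths
PRICE_PER_MILLION_TOKENS = 37_500_000

def cost_billionths(tokens: int) -> int:
    # pure integer arithmetic; truncates sub-billionth remainders
    return tokens * PRICE_PER_MILLION_TOKENS // 1_000_000

def as_dollars(units: int) -> str:
    return f"${units / BILLIONTHS_PER_DOLLAR:.9f}"

print(cost_billionths(1_000_000))              # 37500000
print(as_dollars(cost_billionths(1_000_000)))  # $0.037500000
```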
        
           | scott_w wrote:
           | Fixed point Decimal is your friend here. I'm guessing you buy
           | tokens in increments of 1,000,000 so it isn't too much of an
           | issue to account for. You can then normalise in your
           | accounting so 1,000,000 is just "1 unit," or you can just
           | account in increments of 1,000,000 but that does start
           | looking weird (but might be necessary!)
        
             | Filligree wrote:
             | No, billing happens per-token. It's entirely necessary to
             | use billionths of a dollar here, if you don't use floating
             | point.
        
               | scott_w wrote:
               | In which case, I'd look at this thread
               | https://news.ycombinator.com/item?id=44145263
        
           | latchkey wrote:
           | Ethereum is 1e18 or 1 wei.
           | 
           | https://ethereum.stackexchange.com/questions/158517/does-
           | sol...
        
           | kolbe wrote:
           | I've used Auroa Units to do this. You can define the dollars
           | dimension, and then all the nano-micro-whatever scaling
           | comes with it.
        
           | klysm wrote:
           | Convert to money as late as possible
        
             | roryirvine wrote:
             | This is surely the right answer: simply count the number of
             | tokens used, and do the billing reconciliation as a
             | separate step.
             | 
             | As an added benefit, it makes it much easier to deal with
             | price changes.
        
         | osigurdson wrote:
         | Wouldn't it be better to use a decimal type?
        
           | MobiusHorizons wrote:
           | This is what's called a fixed point decimal type. If you need
           | variable precision, then a decimal type might be a good idea,
           | but fixed point removes a lot of potential foot guns if the
           | constraints work for you.
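For contrast, a short sketch with Python's decimal module, which is a base-10 floating (not fixed) point type but makes the rounding step explicit:

```python
from decimal import Decimal, ROUND_HALF_EVEN

price = Decimal("19.99")
rate = Decimal("0.0825")          # 8.25% tax

# the product is exact in base 10 (1.649175); rounding to cents
# is a separate, explicit step with a named rounding mode
tax = (price * rate).quantize(Decimal("0.01"),
                              rounding=ROUND_HALF_EVEN)
print(tax)    # 1.65
```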
        
             | osigurdson wrote:
             | I meant a fixed-point decimal type, like C#'s 128-bit
             | decimal. I don't understand why the parent commenter (top
             | voted comment?)
             | used unsigned integers to track individual cents. Why roll
             | your own decimal type?
             | 
             | Using arbitrary precision doesn't make sense if the data
             | needs to be stored in a database (for most situations at
             | least). Regardless, infinite precision is magical thinking
             | anyway: try adding Pi to your bank account without loss of
             | precision.
        
               | MobiusHorizons wrote:
               | The C# decimal type is not fixed point; it's a floating
               | point implementation, but it uses a base-10 exponent
               | instead of a base-2 one like IEEE 754 floats.
               | 
               | Fixed point is a general technique that is commonly done
               | with machine integers when the necessary precision is
               | known at compile time. It is frequently used on embedded
               | devices that don't have a floating point unit to avoid
               | slow software based floating point implementations.
               | Limiting the precision to $0.01 makes sense if you only
               | do addition or subtraction. Precision of $0.001 (Tenths
               | of a cent also called mils) may be necessary when
               | calculating taxes or applying other percentages although
               | this is typically called out in the relevant laws or
               | regulations.
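A sketch of that mils-based fixed point in Python, with the rate in parts per million and an explicit round-half-up policy (names and policy are illustrative, not taken from any particular regulation):

```python
# Fixed point in mils (thousandths of a dollar): plain integer
# arithmetic, with the rounding rule visible in one place.
MILS_PER_DOLLAR = 1_000

def apply_rate(amount_mils: int, rate_ppm: int) -> int:
    # rate in parts per million; round half up to the nearest mil
    return (amount_mils * rate_ppm + 500_000) // 1_000_000

# $19.99 at 8.25% -> 1649 mils, i.e. $1.649
print(apply_rate(19_990, 82_500))   # 1649
```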
        
               | osigurdson wrote:
               | Good to know. In a scientific domain so haven't used it
               | previously.
        
               | MobiusHorizons wrote:
               | Fun fact: there is a decimal type on some hardware, I
               | believe POWER, and presumably mainframes. You can
               | actually use it from C, although it's a software
               | implementation on most hardware. See IEEE 754-2008 if
               | you are curious.
        
             | jjmarr wrote:
             | IEEE754 defines a floating point decimal type. What are
             | your opinions on that?
        
               | MobiusHorizons wrote:
               | It's very cool, but not present on most hardware. Fixed
               | point is a lot simpler though if you are dealing with
               | something with inherent granularity like currency
        
         | nurettin wrote:
         | I inherited systems that trade real world money using f64. They
         | work surprisingly well, and the errors and bugs are almost
         | never due to rounding. Those that are also have easy fixes. So
         | I'm always baffled by this "expert opinion" of using integers
         | for cents. It is pretty much up there with "never use python
         | pickle it is unsafe" and "never use http, even if the program
         | will never leave the subnet".
        
           | dataangel wrote:
           | you can't accurately represent 10 cents with floats, 0.1 is
           | not directly representable. Same with 1 cent, 0.01. Seems
           | like if you do any significant math on prices you should run
           | into rounding issues pretty quickly?
        
             | adgjlsfhk1 wrote:
             | no. Float64 has 16 digits of precision. Therefore even if
             | you're dealing with trillions of dollars, you have accuracy
             | down to the thousandth of a cent.
        
               | cstrahan wrote:
               | You might want to re-study this topic.
               | 
               | The decimal number 0.1 has an infinitely repeating binary
               | fraction.
               | 
               | Consider how 1/3 in decimal is 0.33333... If you truncate
               | that to some finite prefix, you no longer have 1/3. Now
               | let's suppose we know, in some context, that we'll only
               | ever have a finite number of digits -- let's say 5 digits
               | after the decimal point. Then, if someone asks "what
               | fraction is equivalent to 0.33333?", then it is
               | reasonable to reply with "1/3". That might sound like
               | we're lying, but remember that we agreed that, in this
               | context of discussion, we have a finite number of digits
               | -- so the value 1/3 _outside_ of this context has no way
               | of being represented faithfully _inside_ this context, so
               | we can only assume that the person is asking about the
               | nearest approximation of "1 /3 as it means outside this
               | context". If the person asking feels lied to, that's on
               | them for not keeping the base assumptions straight.
               | 
               | So back to floating point, and the case of 0.1
               | represented as 64 bit floating point number. In base 2,
               | the decimal number 0.1 looks like 0.0001100110011... (the
               | 0011 being repeated infinitely). But we don't have an
               | infinite number of digits. The finite truncation of that
               | is the closest we can get to the decimal number 0.1, and
               | by the same rationale as earlier (where I said that
               | equating 1/3 with 0.33333 is reasonable), your
               | programming language will likely parse "0.1" as a f64 and
               | print it back out as such. However, if you try something
               | like (a=0.1; a+a+a) you'll likely be surprised at what
               | you find.
        
               | adgjlsfhk1 wrote:
               | > you'll likely be surprised at what you find.
               | 
               | I very much doubt it. My day job is writing symbolic-
               | numeric code. The result of 0.1+0.1+0.1 != 0.3, but for
               | rounding to bring it up to 0.31 (i.e. rounding causing an
               | error of 1 cent), you would need to accumulate at least
               | .005 error, which will not happen unless you lose 13 out
               | of your 16 digits of precision, which will not happen
               | unless you do something incredibly stupid.
        
             | nulld3v wrote:
             | I'm curious where you got this idea from because it is
             | trivially disprovable by typing 0.1 or 0.01 into any python
             | or JS REPL?
        
               | krapht wrote:
               | https://docs.python.org/3/tutorial/floatingpoint.html
               | 
               | Stop at any finite number of bits, and you get an
               | approximation. On most machines today, floats are
               | approximated using a binary fraction with the numerator
               | using the first 53 bits starting with the most
               | significant bit and with the denominator as a power of
               | two. In the case of 1/10, the binary fraction is
               | 3602879701896397 / 2 * 55 which is close to but not
               | exactly equal to the true value of 1/10.
               | 
               | Many users are not aware of the approximation because of
               | the way values are displayed. Python only prints a
               | decimal approximation to the true decimal value of the
               | binary approximation stored by the machine. On most
               | machines, if Python were to print the true decimal value
               | of the binary approximation stored for 0.1, it would have
               | to display:                 >>> 0.1
               | 0.1000000000000000055511151231257827021181583404541015625
               | 
               | That is more digits than most people find useful, so
               | Python keeps the number of digits manageable by
               | displaying a rounded value instead:       >>> 1 / 10
               | 0.1
               | 
               | That being said, double should be fine unless you're
               | aggregating trillions of low cost transactions. (API
               | calls?)
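Python will show you that exact binary fraction directly:

```python
# float.as_integer_ratio exposes the stored binary fraction
num, den = (0.1).as_integer_ratio()
print(num)            # 3602879701896397
print(den == 2**55)   # True
```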
        
               | Izkata wrote:
               | For anyone curious about testing it themselves and/or
               | wanting to try other numbers:                 >>> from
               | decimal import Decimal       >>> Decimal(0.1)       Decim
               | al('0.100000000000000005551115123125782702118158340454101
               | 5625')
        
               | chowells wrote:
               | Do you believe that the way the REPL prints a number is
               | the way it's stored internally? If so, explaining this
               | will be a fun exercise:                   $ python3
               | Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0]
               | on linux         Type "help", "copyright", "credits" or
               | "license" for more information.         >>> a = 0.1
               | >>> a + a + a         0.30000000000000004
               | 
               | By way of explanation, the algorithm used to render a
               | floating point number to text used in most languages
               | these days is to find the shortest string representation
               | that will parse back to an identical bit pattern. This
               | has the direct effect of causing a REPL to print what you
               | typed in. (Well, within certain ranges of "reasonable"
               | inputs.) But this doesn't mean that the language stores
               | what you typed in - just an approximation of it.
        
               | anthk wrote:
               | Oddly, tcl prints 0.30000000000000004 while jimtcl prints
               | 0.3, while with 1/7 both crap out and round it to a
               | simple 0.
               | 
               | Edit: Now it does it fine after inputting floats:
               | 
               | puts [ expr { 1.0/7.0 } ]
               | 
               | Eforth on top of Subleq, a very small and dumb virtual
               | machine:                    1 f 7 f f/ f.          0.143
               | ok
               | 
               | Still, using rationals where possible (and mod operations
               | otherwise) gives a great 'precision', except for
               | irrationals.
        
               | nulld3v wrote:
               | :facepalm: my bad, I completely missed the more rational
               | interpretation of OP's comment...
               | 
               | I interpreted "directly representable" as "uniquely
               | representable": all < 15 digit decimals are uniquely
               | represented in fp64, so it is always safe to round-trip
               | between those decimals <-> f64, though indeed this
               | guarantee is lost once you perform any math.
        
             | CamperBob2 wrote:
             | At the end of a long chain of calculations you're going to
             | round to the nearest 0.01. It will be a LONG time before
             | errors caused by double-precision floats cause you to gain
             | or lose a penny.
        
           | SonOfLilit wrote:
           | You can make money modeling buy/sell decisions in floats and
           | then having the bank execute them, but if the bank models
           | your account as a float and loses a cent here and there, it
           | will be sued into bankruptcy.
        
             | kccqzy wrote:
             | You will not lose a cent here and there just by using
             | float64, for the range of values that banks deal with. For
             | added assurance, just round to the nearest cent after each
             | operation.
        
             | jcranmer wrote:
             | A double-precision float has ~16 decimal digits of
             | precision. Which means as long as your bank account is less
             | than a quadrillion dollars, it can accurately store the
             | balance to the nearest cent.
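A rough sketch of that headroom in Python (illustrative figures only): a large dollar balance survives a trip through float64 and rounds back to exact cents, and integer cent counts stay exact up to 2**53.

```python
# Illustrative only: a dollar balance stored as float64, recovered to exact cents.
balance = 999_999_999_999.99            # just under a trillion dollars
cents = round(balance * 100)            # nearest integer number of cents
assert cents == 99_999_999_999_999     # no cent was gained or lost

# Integers are exact in float64 up to 2**53 (about 9e15 cents):
assert float(2**53 - 1) == 2**53 - 1
```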
        
         | scott_w wrote:
         | For far too many years I had inherited a billing system that
         | used floats for all calculations then rounded up or down. Also
         | doing some calculations in JS and mirroring them on the Python
         | backend, so "just switch to Decimal" wasn't an easy change to
         | make...
        
         | layer8 wrote:
         | Wait until you learn that Excel calculates everything using
         | floating-point, and doesn't even fully observe IEEE 754.
         | 
         | https://learn.microsoft.com/en-us/office/troubleshoot/excel/...
         | 
         | (It nevertheless happens to work just fine for most of what
         | Excel is used for.)
        
         | jksflkjl3jk3 wrote:
         | Floating point math shouldn't be that scary. The rules are well
         | defined in standards, and for many domains are the only
         | realistic option for performance reasons.
         | 
         | I've spent most of my career writing trading systems that have
         | executed 100's of billions of dollars worth of trades, and have
         | never had any floating point related bugs.
         | 
         | Using some kind of fixed point math would be entirely
         | inappropriate for most HFT or scientific computing
         | applications.
        
           | eddd-ddde wrote:
           | How do you handle the lack of commutativity? I've always
           | wondered about the practical implications.
        
             | jcranmer wrote:
             | Floating-point is completely commutative (ignoring NaN
             | payloads).
             | 
             | It's the associativity law that it fails to uphold.
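The distinction is easy to check directly in Python:

```python
a, b, c = 0.1, 0.2, 0.3
assert a + b == b + a                   # commutative: swapping operands is exact
assert (a + b) + c != a + (b + c)       # not associative: grouping changes rounding
```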
        
             | jakevoytko wrote:
             | I asked an ex-Bloomberg coder this question once after he
             | told me he used floating points to represent currency all
             | the time, and his response was along the lines of "unless
             | you have blindingly-obvious problems like doing operations
             | on near-zero numbers against very large numbers, these
             | calculations are off by small amounts on their least-
             | significant digits. Why would you waste the time or the
             | electricity dealing with a discrepancy that's not even
             | worth the money to fix?"
        
             | BeetleB wrote:
             | Nitpick: FP arithmetic is commutative. It's not
             | associative.
        
           | kolbe wrote:
           | All your price field messages are sent to the exchange and
           | back via fixed point, so you are using fixed point for at
           | least some of the process (unless you're targeting those few
           | crypto exchanges that use fp prices).
           | 
           | If you need to be extremely fast (like fpga fast), you don't
           | waste compute transforming their fixed point representation
           | into floating.
        
             | djrj477dhsnv wrote:
             | Sure, string encodings are used for most APIs and ultra HFT
             | may pattern match on the raw bytes, but for regular HFT if
             | you're doing much math, it's going to be floating point
             | math.
        
           | usefulcat wrote:
           | You can certainly make trading systems that work using
           | floating point, but there are just so many fewer edge cases
           | to consider when using fixed point.
           | 
           | With fixed point and at least 2 decimal places, 10.01 + 0.01
           | is always _exactly_ equal to 10.02. But with FP you may end
           | up with something like 10.0199999999, and then you have to be
            | extra careful anywhere you convert that to a string that it
            | doesn't get truncated to 10.01. That could be logging (not
           | great but maybe not the end of the world if that goes wrong),
           | or you could be generating an order message and then it is a
           | real problem. And either way, you have to take care every
           | time you do that, as opposed to solving the problem once at
           | the source, in the way the value is represented.
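The representation point can be made concrete in Python (`decimal` is used here only to inspect the bits actually stored):

```python
from decimal import Decimal

# Fixed point as integer cents: exact by construction.
bid_cents, tick_cents = 1001, 1         # 10.01 and 0.01
assert bid_cents + tick_cents == 1002   # exactly 10.02, always

# The float literal 10.01 only approximates the decimal 10.01:
assert Decimal(10.01) != Decimal("10.01")
print(Decimal(10.01))   # the exact binary value actually stored
```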
           | 
           | > Using some kind of fixed point math would be entirely
           | inappropriate for most HFT or scientific computing
           | applications.
           | 
           | In the case of HFT, this would have to depend very greatly on
           | the particulars. I know the systems I write are almost never
           | limited by arithmetical operations, either FP or integer.
        
             | kolbe wrote:
             | It depends on what you're doing. If your system is a linear
             | regression on 30 features, you should probably use floating
             | point. My recollection is fixed is prohibitively slower and
             | with far less FOSS support.
        
             | gamescr wrote:
             | I work on game engines and the problem with floats isn't on
             | small values like 10.01 but on large ones like 400,010.01;
             | that's when the precision varies wildly.
        
               | malfist wrote:
               | Not only that but the precision loss accumulates.
               | Multiply too many numbers with small inaccuracies and you
               | wind up with numbers with large inaccuracies
        
               | osigurdson wrote:
               | The issue with floats is the mental model. The best way
               | to think about them is like a ruler with many points
               | clustered around 0 and exponentially fewer as the
               | magnitude grows. Don't think of it like a real value -
               | assume that there are hardly any values represented with
               | perfect precision. Even "normalish" numbers like 10.1 are
               | not in the set actually. When values are converted to
               | strings, even in debuggers sometimes, they are often
               | rounded which throws people off further ("hey, the value
               | is exactly 10.1 - it is right there in the debugger").
               | What you can count on however is that integers are
               | represented with perfect precision up to a point (e.g.
               | 2^53 -1 for f64).
               | 
               | The other "mental model" issue is associativity: in exact
               | math, a + (b + c) equals (a + b) + c, but in floating
               | point the two can differ due to rounding. This is where
               | fp-precise vs fp-fast comes
               | in. Let's not talk about 80 bit registers (though that
               | used to be another thing to think about).
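The integer guarantee at the end of that mental model can be verified directly in Python:

```python
# Every integer with magnitude up to 2**53 is represented exactly in float64...
assert float(2**53 - 1) == 2**53 - 1
assert 2.0**53 == 2**53
# ...but one past the limit collapses onto a neighbour:
assert float(2**53) + 1.0 == float(2**53)
```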
        
               | 01HNNWZ0MV43FF wrote:
               | Lua is telling me 0.1 + 0.1 == 0.2, but 0.1 + 0.2 != 0.3.
               | That's 64-bit precision. The issue is not with precision,
               | but with 1/10th being a repeating fraction in binary.
        
               | anthk wrote:
               | Not an issue on Scheme and Common Lisp and even Forth
               | operating directly with rationals with custom words.
        
           | phendrenad2 wrote:
           | I'm wondering if trading systems would run into the same
           | issues as a bank or scientific calculation. You might not be
           | making as many repeated calculations, and might not care if
           | things are "off" by a tiny amount, because you're trading
           | between money and securities, and the "loss" is part of your
           | overhead. If a bank lost $0.01 after every 1 million
           | transactions it would be a minor scandal.
        
             | usefulcat wrote:
             | Personally, I would be more concerned about something like
             | determining whether the spread is more than a penny.
             | Something like:
             | 
             |     if (ask - bid > 0.01) {
             |         // etc
             |     }
             | 
             | With floating point, I have to think about the following
             | questions:
             | 
             | * What if the constant 0.01 is actually slightly greater
             |   than mathematical 0.01?
             | * What if the constant 0.01 is actually slightly less than
             |   mathematical 0.01?
             | * What if ask - bid is actually slightly greater than the
             |   mathematical result?
             | * What if ask - bid is actually slightly less than the
             |   mathematical result?
             | 
             | With floating point, that seemingly obvious code is
             | anything but. With fixed point, you have none of those
             | problems.
             | 
             | Granted, this only works for things that are priced in
             | specific denominations (typically hundredths, thousandths,
             | or ten thousandths), which is most securities.
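A concrete instance of such a flipped comparison, sketched in Python with stand-in numbers:

```python
# Mathematically 0.3 - 0.2 == 0.1, but as floats the spread lands just below:
assert (0.3 - 0.2) < 0.1

# The same test with prices in integer cents (hypothetical one-cent tick) is exact:
bid, ask, tick = 1001, 1002, 1
assert not (ask - bid > tick)   # spread is exactly one tick, never "almost"
```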
        
               | CamperBob2 wrote:
               | So the spread is 0.0099999 instead of 0.01. When will
               | that difference matter?
        
               | usefulcat wrote:
               | It matters if the strategy is designed to do very
               | different things depending on whether or not the offers
               | are locked (when bid == ask, or spread is less than
               | 0.01).
               | 
               | In this example, I'm talking about securities that are
               | priced in whole cents. If you represent prices as floats,
               | then it's possible that the spread appears to be less (or
               | greater) than 0.01 when it's actually not, due to the
               | inability of floats to exactly represent most real
               | numbers.
        
               | CamperBob2 wrote:
               | But I'm still not understanding the real-world
               | consequences. What will those be, exactly? Any good
               | examples or case studies to look at?
        
               | ljosifov wrote:
               | I can imagine something like: if (bid ask blah blah) { send
               | order to buy 10 million of AAPL; }
        
           | T0Bi wrote:
           | > Using some kind of fixed point math would be entirely
           | inappropriate for most HFT or scientific computing
           | applications.
           | 
           | May I ask why? (generally curious)
        
             | jcranmer wrote:
             | For starters, it's giving up a lot of performance, since
             | fixed-point isn't accelerated by hardware like floating-
             | point is.
        
               | rendaw wrote:
               | Isn't fixed point just integer?
        
               | mitthrowaway2 wrote:
               | Yes, integer combined with bit-shifts.
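A minimal sketch of that idea, using a hypothetical Q16.16 format in Python (real fixed-point libraries add overflow and saturation handling on top):

```python
FRAC = 16                               # 16 fractional bits => scale of 2**16

def to_fix(x: float) -> int:
    return round(x * (1 << FRAC))       # encode: scale up and round

def fix_mul(a: int, b: int) -> int:
    return (a * b) >> FRAC              # integer multiply, shift restores the scale

a, b = to_fix(1.5), to_fix(2.25)
assert fix_mul(a, b) == to_fix(3.375)   # 1.5 * 2.25 == 3.375, exact in Q16.16
```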
        
               | Athas wrote:
               | Yes, but you're not going to have efficient
               | transcendental functions implemented in hardware.
        
               | rendaw wrote:
               | Ah okay, fair enough. But what sort of transcendental
               | functions would you use for HFT?
               | 
               | I guess I understood GGGGP's comment about using fixed
               | point for interacting with currency to be about
               | accounting. I'd expect floating point to be used for
               | trading algorithms, but that's mostly statistics and I
               | presume you'd switch back to fixed point before making
               | trades etc.
        
             | Athas wrote:
             | The problem with fixed point is in its, well, fixed point.
             | You assign a fixed number of bits to the fractional part of
             | the number. This gives you the same absolute precision
             | everywhere, but the relative precision (distance to the
             | next highest or lowest number) is worse for small numbers -
             | which is a problem, because those tend to be pretty
             | important. It's just overall a less efficient use of the
             | bit encoding space (not just performance-wise, but also in
             | the accuracy of the results you get back). Remember that
             | fixed point does not mean absence of rounding errors, and
             | if you use binary fixed point, you still cannot represent
             | many decimal fractions such as 0.1.
        
               | anthk wrote:
               | With fixed point you either scale it up or use rationals.
        
             | osigurdson wrote:
             | Fundamentally there is uncertainty associated with any
             | physical measurement which is usually proportional to the
             | magnitude being measured. As long as floating point is <<
             | this uncertainty results are equally predictive. Floating
             | point numbers bake these assumptions in.
        
           | f33d5173 wrote:
           | It's the front of house/back of house distinction. Front of
           | house should use fixed point, back of house should use
           | floating point. Unless you're doing trading, you want really
           | strict rules with regards to rounding and such, which are
           | going to be easier to achieve with fixed point.
        
             | pasc1878 wrote:
             | I don't think it is that clear. The split I think is
             | between calculating settlement amounts which lead to real
             | transfers of money and so should be fixed point whilst
             | risk, pricing (thus trading) and valuation use models which
             | need many calculations so need to be floating point.
        
         | pie_flavor wrote:
         | One of the things I always appreciate about the crypto
         | community is that you do not have to ask what numeric type is
         | being used for money, it is _always_ 8-digit fixed-point. No
         | floating-point rounding errors to be found anywhere.
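That convention is just integer arithmetic on the smallest unit, sketched in Python:

```python
SATS_PER_COIN = 10**8                   # 8-digit fixed point: amounts in satoshis
balance = 1 * SATS_PER_COIN             # 1.00000000 coins
fee = 1                                 # one satoshi, the smallest amount
assert balance - fee == 99_999_999      # exact; no rounding anywhere
```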
        
           | immibis wrote:
           | Correction: Bitcoin is 8-digit fixed-point. But Lightning is
           | 10, IIRC. Other currencies have different conventions. Still,
           | it's fixed within a given system and always fixed-point. As
           | far as I'm aware, there are no floating-point
           | cryptocurrencies at all, because it would be an obvious
           | exploit vector - keep withdrawing 0.000000001 units from your
           | account that has 1.0 units.
        
           | Athas wrote:
           | How does this avoid rounding error? Division and
            | multiplication can still result in nonrepresentable numbers,
           | right?
        
         | jcranmer wrote:
         | I've found fear of the use of floating-point in finance to be a
         | good litmus test for how knowledgeable people are about
         | floating-point. Because as far as I can tell, finance people
          | almost exclusively use (binary) floating-point [1], whereas a
         | lot of floating-point FUD focuses on how disastrous it is for
         | finance. And honestly, it's a bit baffling to me why so many
         | people seem to think that floating-point is disastrous.
         | 
         | My best guess for the latter proposition is that people are
         | reacting to the default float printing logic of languages like
         | Java, which display a float as the shortest base-10 number that
         | would correctly round to that value, which extremely
         | exaggerates the effect of being off by a few ULPs. By contrast,
         | C-style printf specifies the number of decimal digits to round
         | to, so all the numbers that are off by a few ULPs are still
         | correct.
         | 
         | [1] I'm not entirely sure about the COBOL mainframe
         | applications, given that COBOL itself predates binary floating-
         | point. I know that modern COBOL does have some support for IEEE
         | 754, but that tells me very little about what the applications
         | running around in COBOL do with it.
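The contrast between the two printing styles can be sketched in Python, whose repr also uses the shortest-round-trip algorithm while format specifiers behave like printf:

```python
x = 0.1 + 0.2
assert repr(x) == "0.30000000000000004"   # shortest round-trip: shows the ULP error
assert f"{x:.2f}" == "0.30"               # printf-style fixed digits hide it
```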
        
           | pgwhalen wrote:
           | I agree overall but my take is that it shows more ignorance
           | about the domain of finance (or a particular subdomain) than
           | it does about floating-point ignorance.
           | 
           | It's really more of a concern in accounting, when monetary
           | amounts are concrete and represent real money movement
           | between distinct parties. A ton of financial software systems
           | (HFT, trading in general) deal with money in a more abstract
           | way in most of their code, and the particular kinds of
           | imprecision that FP introduces doesn't result in bad business
           | outcomes that outweigh its convenience and other benefits.
        
             | munch117 wrote:
             | FP does not introduce imprecision. Quite the contrary: The
             | continuous rounding (or truncation) triggered by using
             | scaled integers is what introduces imprecision. Whereas
             | exponent scaling in floating point ensures that all the
             | bits in the mantissa are put to good use.
             | 
             | It's a trade-off between precision and predictability.
             | Floating point provides the former. Scaled integers provide
             | the latter.
        
               | pgwhalen wrote:
               | I was using imprecision in a more general and less
               | mathematical sense than the way you're interpreting it,
               | but yes this is a good point about why FP is useful in
               | many financial contexts, when the monetary amount is
               | derived from some model.
        
           | munch117 wrote:
           | The answer is accounting. In accounting you want
           | predictability and reproducibility more than anything, and
            | you are prepared to throw away precision on that altar.
           | 
           | If you're summing up the cost of items in a webshop, then
           | you're in the domain of accounting. If the result appears to
           | be off by a single cent because of a rounding subtlety, then
           | you're in trouble, because even though no one should care
           | about that single cent, it will give the appearance that you
           | don't know what you're doing. Not to mention the trouble you
           | could get in for computing taxes wrong.
           | 
           | If, on the other hand, you're doing financial forecasting or
           | computing stock price targets, then you're not in the domain
           | of accounting, and using floating point for money is just
           | fine.
           | 
           | I'm guessing from your post that your finance people are more
           | like the latter. I could be wrong though - accountants do
           | tend to use Excel.
        
             | jcranmer wrote:
             | To get the right answers for accounting, all you have to do
             | is pay attention to how you're doing rounding, which is no
             | harder for floating-point than it is for fixed-point.
             | Actually, it might be slightly easier for floating-point,
             | since you're probably not as likely to skip over the part
             | of the contract that tells you what the rounding rules you
             | have to follow are.
        
               | munch117 wrote:
               | Agreed. To do accounting, you need to employ some kind of
               | discipline to ensure that you get rounding right. So many
               | people erroneously believe that such a discipline has to
               | be based on fixed point or decimal floating point
               | numbers. But binary floating point can work just fine.
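One such discipline, sketched in Python: round to cents after every step, so the half-ULP noise of binary floats never reaches the ledger.

```python
def to_cents(x: float) -> float:
    return round(x, 2)                  # the explicit rounding rule, applied each step

subtotal = to_cents(0.10) + to_cents(0.20)   # 0.30000000000000004 before rounding
total = to_cents(subtotal)
assert total == 0.3                     # the visible result is exact to the cent
```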
        
       | chuckadams wrote:
       | I haven't worked with C in nearly 20 years and even I remember
       | warnings against -ffast-math. It really ought not to exist: it's
       | just a super-flag for things like -funsafe-math-optimizations, and
       | the latter makes it really clear that it's, well, unsafe (or
       | maybe it's actually _fun_ safe!)
        
       | smcameron wrote:
       | One thing I did not see mentioned in the article, or in these
       | comments (according to ctrl-f anyway) is the use of
       | feenableexcept()[1] to track down the source of NaNs in your
       | code.
       | 
       |     feenableexcept(FE_DIVBYZERO | FE_INVALID | FE_OVERFLOW);
       | 
       | will cause your code to get a SIGFPE whenever a NaN crawls out
       | from under a rock. Of course it doesn't work with fast-math
       | enabled, but if you're unknowingly getting NaNs _without_ fast-
       | math enabled, you obviously need to fix those before even trying
       | fast-math, and they can be hard to find, and feenableexcept()
       | makes finding them a lot easier.
       | 
       | [1] https://linux.die.net/man/3/feenableexcept
        
       | hyghjiyhu wrote:
       | One thing I wonder is what happens if you have an inline function
       | in a header that is compiled with fast math by one translation
       | unit and without in another.
        
       | sholladay wrote:
       | Correctness > performance, almost always. It's easier to notice
       | that you need more performance than to notice that you need more
       | correctness. Though performance outliers can definitely be a
       | hidden problem that will bite you.
       | 
       | Make it work. Make it right. Make it fast.
        
       | cbarrick wrote:
       | This page consistently crashes on Vivaldi for Android.
       | 
       | Vivaldi 7.4.3691.52
       | 
       | Android 15; ASUS_AI2302 Build/AQ3A.240812.002
        
       | dirtyhippiefree wrote:
       | I'm stunned by the following admission: "If fast-math was to give
       | always the correct results, it wouldn't be fast-math"
       | 
       | If it's not always correct, whoever chooses to use it chooses to
       | allow error...
       | 
       | Sounds worse than worthless to me.
        
       | mg794613 wrote:
       | Haha, the neverending cycle.
       | 
       | Stop trying. Let their story unfold. Let the pain commence.
       | 
       | Wait 30 years and see them being frustrated trying to tell the
       | next generation.
        
       | boulos wrote:
       | I've also come around to -ffast-math considered harmful. It's
       | useful though to help find optimization _opportunities_ , but in
       | the modern (AVX2+) world, I think the risks outweigh the
       | benefits.
       | 
       | I'm surprised by the take that FTZ is worse than reassociation.
       | FTZ being environmental rather than per instruction is certainly
       | unfortunate, but that's true of rounding modes generally in x86.
       | And I would argue that _most_ programs are unprepared to handle
       | subnormals anyway.
       | 
       | By contrast, reassociation definitely allows more optimization,
       | but it also prohibits you from specifying the order precisely:
       | 
       | > Allow re-association of operands in series of floating-point
       | operations. This violates the ISO C and C++ language standard by
       | possibly changing computation result.
       | 
       | I haven't followed standards work in forever, but I imagine that
       | the introduction of std::fma, gets people most of the benefit.
       | That combined with something akin to volatile (if it actually
       | worked) would probably be good enough for most people. Known,
       | numerically sensitive code paths would be carefully written,
       | while the rest of the code base can effectively be "meh, don't
       | care".
        
       | leephillips wrote:
       | This part was fascinating:
       | 
       | "The problem is how FTZ is actually implemented on most hardware: it
       | is not set per-instruction, but instead controlled by the
       | floating point environment: more specifically, it is controlled
       | by the floating point control register, which on most systems is
       | set at the thread level: enabling FTZ will affect all other
       | operations in the same thread.
       | 
       | "GCC with -funsafe-math-optimizations enables FTZ (and its close
       | relation, denormals-are-zero, or DAZ), even when building shared
       | libraries. That means simply loading a shared library can change
       | the results in completely unrelated code, which is a fun
       | debugging experience."
        
       ___________________________________________________________________
       (page generated 2025-05-31 23:00 UTC)