[HN Gopher] Understanding Fast-Math
       ___________________________________________________________________
        
       Understanding Fast-Math
        
       Author : ingve
       Score  : 63 points
       Date   : 2021-12-08 08:20 UTC (1 day ago)
        
 (HTM) web link (pspdfkit.com)
 (TXT) w3m dump (pspdfkit.com)
        
       | optimalsolver wrote:
       | "-fno-math-errno" and "-fno-signed-zeros" can be turned on
       | without any problems.
       | 
        | I got a 4x speedup on <cmath> functions with no loss in
        | accuracy.
       | 
       | See also "Why Standard C++ Math Functions Are Slow":
       | 
       | https://medium.com/@ryan.burn/why-standard-c-math-functions-...
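        | 
        | (An illustrative sketch, not from the thread, of why
        | -fno-math-errno helps: with the default -fmath-errno, the
        | compiler must call the libm sqrt() so that errno can be set
        | for negative inputs, which blocks vectorizing this loop;
        | without errno, sqrt instructions can be emitted directly. The
        | function name is made up.)
        | 
        |   #include <cmath>
        | 
        |   // Build with, e.g.: g++ -O3 -fno-math-errno -c sqrt_all.cpp
        |   void sqrt_all(double* out, const double* in, int n) {
        |       for (int i = 0; i < n; ++i)
        |           out[i] = std::sqrt(in[i]);  // vectorizable once errno is off
        |   }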
        
         | iamcreasy wrote:
          | Sorry if it's a basic question, but does recompiling C/C++
          | code (with different flags) produce more efficient code
          | most of the time? For example, let's say I am using a
          | binary that was compiled on a processor that didn't have
          | support for SIMD. Assuming the program is capable of taking
          | advantage of SIMD instructions, and also assuming my
          | processor supports SIMD, would it make sense to recompile
          | the C/C++ code on my system, hoping the newer binary would
          | run faster?
        
           | optimalsolver wrote:
           | The more CPU-bound your program, the more benefit you'll see
           | from the optimization flags. If your program is constantly
           | waiting around for input on a network channel, then it may
           | not help as much.
           | 
           | My currently used optimization flags are: -O3 -fno-math-errno
           | -fno-signed-zeros -march=native -flto
           | 
            | Only use -march=native if the program is only intended to
            | run on your own machine. It enables architecture-specific
            | optimizations that make the binary non-portable.
           | 
            | Also look into profile-guided optimization (PGO), where
            | you compile and run your program to collect a runtime
            | profile, then recompile using that profile. It can result
            | in some dramatic speedups.
           | 
           | https://ddmler.github.io/compiler/2018/06/29/profile-
           | guided-...
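            | 
            | (A minimal sketch of that workflow with GCC; Clang
            | accepts the same flags, and the file names here are
            | hypothetical.)
            | 
            |   g++ -O3 -fprofile-generate -o app main.cpp   # instrumented build
            |   ./app representative-input.dat               # writes *.gcda profile
            |   g++ -O3 -fprofile-use -o app main.cpp        # recompile with profile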
        
           | KMag wrote:
           | Yes. Careful selection of compilation flags can greatly
           | improve performance.
           | 
           | My employer spends many millions of dollars annually running
           | numerical simulations using double precision floating point
           | numbers. Some years ago when we retired the last machines
           | that didn't support SSE2, adding a flag to allow the compiler
            | to generate SSE2 instructions brought big time and cost
            | savings for our simulations.
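            | 
            | (For reference, a sketch of what such a flag looks like
            | on 32-bit x86, where GCC otherwise defaults to x87; the
            | file name is hypothetical, and x86-64 compilers already
            | emit SSE2 by default.)
            | 
            |   g++ -O2 -msse2 -mfpmath=sse -c simulation.cpp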
        
             | jcranmer wrote:
             | > Some years ago when we retired the last machines that
             | didn't support SSE2, adding a flag to allow the compiler to
              | generate SSE2 instructions brought big time and cost
              | savings for our simulations.
             | 
             | That's kind of a special case, though. Without SSE2, you're
             | using x87 for floating-point numbers, and even using scalar
             | floating point on x87 is going to be a fair bit slower than
             | using scalar floating point SSE instructions. Of course,
             | enabling SSE also allows you to vectorize floating point at
             | all, but you'll still be seeing improvements just from
             | scalar SSE instead of x87.
        
           | mhh__ wrote:
           | 3 thoughts:
           | 
           | 1. Using SIMD can be a big win, so yes.
           | 
            | 2. SIMD (vectorization) is not the only optimization your
            | compiler can do. The compiler has a model of the
            | processor, so it can pick the right instructions and lay
            | them out properly, with as many tricks as can be
            | described generically.
           | 
           | 3. Compilers have PGO. Use it (if you can). Compilers without
           | PGO are a bit like an engine management unit with no sensors
           | - all the gear, no idea. The compiler has to assume a hazy
           | middle-of-the-road estimate of what branches will be
           | exercised, whereas with PGO enabled your compiler can make
           | the cold code smaller, and be more aggressive with hot code
           | etc. etc.
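            | 
            | (A sketch of the coarse manual alternative when PGO isn't
            | available: C++20's [[likely]]/[[unlikely]] attributes, or
            | __builtin_expect in older code, hand the compiler the
            | branch weights it would otherwise have to guess. The
            | function and values here are made up.)
            | 
            |   int process(int value) {
            |       if (value < 0) [[unlikely]] {
            |           return -1;         // cold path, laid out out of line
            |       }
            |       return value * 2;      // hot path stays on the fall-through
            |   }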
        
             | bee_rider wrote:
             | > all the gear, no idea
             | 
             | I like this because it only makes sense in some accents.
             | For example it wouldn't work in Boston where the r would
             | only be pronounced on one of the words (idea).
        
           | bee_rider wrote:
           | gcc has the ability to target different architectures (look
           | up the -march and -mtune flags for example). Linux
           | distributions are typically set to be compatible with a
           | pretty wide range of devices, so they often don't take
           | advantage of recent instructions.
           | 
           | Compiling a big program can be a bit of a pain, though, so it
           | is probably only worthwhile if you have a program that you
            | use very frequently. Also, compilers aren't magic; the
           | bottleneck in the program you want to run could be various
           | things: CPU stuff, memory bandwidth, weird memory access
           | patterns, disk access, network access, etc. The compiler
           | mostly just helps with the first one.
           | 
           | Also, note that some libraries, like Intel's MKL, are able to
           | check what processor you are using and just dispatch the
           | appropriate code (your mileage may vary, they sometimes don't
           | keep up with changes in AMD processors, causing great
           | annoyance).
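            | 
            | (A sketch of that dispatch-at-runtime idea in user code:
            | GCC and Clang support function multiversioning, which
            | compiles a function several times and selects the best
            | version for the running CPU at load time. The function is
            | made up, and this needs an ifunc-capable target such as
            | x86 Linux.)
            | 
            |   __attribute__((target_clones("default", "sse4.2", "avx2")))
            |   double dot(const double* a, const double* b, int n) {
            |       double sum = 0.0;
            |       for (int i = 0; i < n; ++i)
            |           sum += a[i] * b[i];  // each clone vectorizes differently
            |       return sum;
            |   }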
        
         | AstralStorm wrote:
          | There are a few critical algorithms where floating-point
          | error cancellation or simple ifs get optimized out if you
          | disable signed zeros. Typically you would know which ones
          | these are: they tend to appear in statistical machine
          | learning code that uses the sign or expects monotonicity
          | near zero, or in filters with coefficients near zero (which
          | filter out NaNs explicitly).
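          | 
          | (One concrete sketch of where the sign of zero matters: it
          | decides which infinity a division produces, and
          | -fno-signed-zeros licenses folds like x + 0.0 -> x and
          | x * 0.0 -> 0.0 that can erase it.)
          | 
          |   #include <cmath>
          |   #include <cstdio>
          | 
          |   int main() {
          |       double nz = -0.0;
          |       std::printf("%g\n", 1.0 / nz);                // -inf normally
          |       std::printf("%g\n", std::copysign(1.0, nz));  // -1 normally
          |   }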
        
           | CamperBob2 wrote:
           | What would be an example of a filter with coefficients near
           | zero that would be adversely affected by the loss of signed-
           | zero support?
           | 
            | You're already in mortal peril if you're working with
            | "coefficients near zero" because of denormals - another
            | bad idea that should have been disabled by default and
            | turned on only in the vanishingly few applications that
            | benefit from them.
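            | 
            | (For reference, a sketch of turning denormals off
            | explicitly on x86; the helper name is ours, and this is
            | roughly what the crtfastmath startup object linked in by
            | -ffast-math does.)
            | 
            |   #include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
            |   #include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE
            | 
            |   void disable_denormals() {
            |       _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);          // results
            |       _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);  // inputs
            |   }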
        
       | bla3 wrote:
       | Recent post on the same topic that gets the same information
       | across with fewer words:
       | https://kristerw.github.io/2021/10/19/fast-math/
        
       | [deleted]
        
       | bee_rider wrote:
       | Nitpicky, but saying
       | 
       | > It turns out that, like with almost anything else relating to
       | IEEE floating-point math, it's a rabbit hole full of surprising
       | behaviors.
       | 
       | immediately before describing that they disabled IEEE floating-
        | point math is a bit funny. The standard isn't surprising;
        | floating-point numbers (arguably) are. The whole point of
        | standardizing floating-point math was to reduce this
        | surprisingness. You can't complain about IEEE floating-point
        | numbers if you tell the compiler not to use them.
        
         | sampo wrote:
         | It takes some time to learn how floating point and numerical
         | calculations work. Not too long, but more than one evening. If
         | you take a numerical analysis course in a university, the first
         | 2 or 3 lectures might be about floating point, error
         | propagation and error analysis. Or the first chapter of a
         | numerical analysis textbook.
         | 
         | But almost nobody spends this much effort to familiarize
         | themselves with the floating point system. So it keeps
         | surprising people.
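          | 
          | (The canonical first-lecture example of that surprise:
          | neither 0.1 nor 0.2 is exactly representable in binary, so
          | their rounded sum is not the double closest to 0.3.)
          | 
          |   #include <cstdio>
          | 
          |   int main() {
          |       double a = 0.1, b = 0.2;
          |       std::printf("%d\n", a + b == 0.3);  // prints 0
          |       std::printf("%.17g\n", a + b);      // 0.30000000000000004
          |   }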
        
       | toolslive wrote:
        | Once I spent two days trying to find out why a Python
        | prototype yielded different results on a different platform
        | (in this case, Solaris). Eventually I discovered that an
        | unnamed culprit had implemented the cunning plan of compiling
        | the interpreter with `-ffast-math`. Fun times.
        
       | kristofferc wrote:
       | https://simonbyrne.github.io/notes/fastmath/ also has a nice
       | discussion about the possible pitfalls of "fast"-math.
        
       | gumby wrote:
       | I am shocked by the plethora of posts by people surprised when
       | -ffast-math breaks their code (no insult to the pspdfkit folks
       | intended).
       | 
       | If it were a harmless flag you'd think it would be enabled by
       | default. That's a clue that you should look before you leap.
       | 
       | We have a few files in our code base that compile with -ffast-
       | math but the rest don't. Those files were written with that flag
       | in mind.
        
         | asveikau wrote:
         | People sometimes have unreasonable faith in compilers or
         | libraries. It's a common adage to look at bugs in application
         | code before suspecting these lower layers. That is _not_ the
         | same thing as the lower layers never producing unexpected
         | results. So when people get evidence that a compiler feature
         | may be rough around the edges, they resist it.
        
         | Sharlin wrote:
         | No, it's easy to assume (even after reading the documentation)
         | that -ffast-math just makes some calculations less precise or
          | otherwise not strictly IEEE 754 compliant. That alone would
          | be reason enough not to enable it by default, while still
          | making it available for those who want maximum performance
          | at the expense of precision (or accuracy), which may be a
          | perfectly reasonable tradeoff in cases like computer
          | graphics. It's very
         | understandable that people are surprised by the fact that
         | -ffast-math can actually _break_ code in unintuitive ways.
        
       | Symmetry wrote:
       | I sort of assume the main benefit of assuming no NaNs is being
       | able to replace `a == a` with `true`?
       | 
        | The handling of isnan is certainly a big question. I can see
        | wanting to respect it, but I can also see littering your
        | code with assertions that isnan is false, compiling with
        | normal optimizations, and then hoping that later recompiling
        | with fast-math and all its attendant "fun, safe
        | optimizations" will let you avoid any performance penalty
        | for all those asserts.
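        | 
        | (A sketch of that folding: under -ffast-math's
        | -ffinite-math-only, GCC and Clang may compile both of these
        | checks down to an unconditional `return false`, so NaNs sail
        | straight through. The function names are made up.)
        | 
        |   #include <cmath>
        | 
        |   bool nan_by_compare(double x) { return x != x; }
        |   bool nan_by_std(double x)     { return std::isnan(x); }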
        
         | jcranmer wrote:
          | There are also several cases where handling NaNs "correctly" may
         | require a fair amount of extra work (C99's behavior for complex
         | multiplication is far more complicated when NaNs are involved).
        
       | assbuttbuttass wrote:
       | > To my surprise, there were no measurable differences outside of
       | the standard deviation we already see when repeating tests
       | multiple times.
       | 
       | This should be the main takeaway. Don't enable ffast-math if
       | floating point calculations aren't a bottleneck, and especially
       | not if you don't understand all the other optimizations it
       | enables.
        
         | nickelpro wrote:
         | Counter argument: Always enable fast-math unless your
         | application has a demonstrable need for deterministically
         | consistent floating point math. If you're just trying to get a
         | "good enough" calculation (which is the vast majority of
         | floating point work, physics and rendering calculations for 3D
         | graphics), there's no reason to leave the performance on the
         | floor.
        
           | nemetroid wrote:
           | > there's no reason to leave the performance on the floor.
           | 
           | As the article demonstrates, there is: with ffast-math,
           | floating point subtly behaves in ways that don't match the
           | ways you've been taught it behaves.
        
           | a_e_k wrote:
           | This is the approach that I take! (Note: I write graphics and
           | rendering code for a living. "Good enough" for me tends to
           | mean quantizing to either identical pixels at the display bit
           | depth or at least perceptually identical pixels. Also, I
           | usually do see a measurable performance benefit to -Ofast
           | over -O3 or -O2. YMMV.)
           | 
           | Just like cranking the warning levels as high as I can at the
           | beginning of a project, I also like to build and test with
           | -ffast-math (really -Ofast) from the very beginning. Keeping
           | it warning-free and working under -ffast-math as I go is a
           | lot easier than trying to do it all at once later!
           | 
           | And much like the warnings, I find that any new code that
            | fails under -ffast-math tends to be a bit suspect. I've
            | found that stuff that breaks under -ffast-math will also
           | frequently break with a different compiler or on a different
           | hardware architecture. So -ffast-math is a nice canary for
           | that.
        
           | josefx wrote:
           | > Always enable fast-math unless your application has a
           | demonstrable need for deterministically consistent floating
           | point math.
           | 
            | As far as I remember, fast-math also breaks things like
            | NaN and infinity handling, which makes filtering out
            | invalid values before they hit something important "fun":
            | the invalid values obviously still exist and mess up your
            | results, but you can no longer check for them.
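            | 
            | (One workaround people reach for - a sketch, not from the
            | thread: test the bit pattern directly, which the
            | optimizer can't fold away under -ffinite-math-only. A
            | double is NaN iff, ignoring the sign bit, its bits
            | compare greater than those of infinity.)
            | 
            |   #include <cstdint>
            |   #include <cstring>
            | 
            |   bool is_nan_bits(double x) {
            |       std::uint64_t b;
            |       std::memcpy(&b, &x, sizeof b);  // well-defined type pun
            |       return (b & 0x7fffffffffffffffULL) > 0x7ff0000000000000ULL;
            |   }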
        
           | chpatrick wrote:
           | Unless you upgrade your compiler and now your climate model
           | produces different results... I think "good enough" is
           | actually pretty rare unless it's for games or something.
        
       | dahart wrote:
       | I've long wondered if changing the name to something like
       | -ffast_inaccurate_math might help stem the tide of surprises and
        | mistakes. Having only "fast" in the name makes it sound like
        | a good thing, rather than a tradeoff to consider.
        
       ___________________________________________________________________
       (page generated 2021-12-09 23:01 UTC)