[HN Gopher] Understanding Fast-Math
___________________________________________________________________
Understanding Fast-Math
Author : ingve
Score : 63 points
Date   : 2021-12-08 08:20 UTC (1 day ago)
(HTM) web link (pspdfkit.com)
(TXT) w3m dump (pspdfkit.com)
| optimalsolver wrote:
| "-fno-math-errno" and "-fno-signed-zeros" can be turned on
| without any problems.
|
| I got a four times speedup on <cmath> functions with no loss in
| accuracy.
|
| See also "Why Standard C++ Math Functions Are Slow":
|
| https://medium.com/@ryan.burn/why-standard-c-math-functions-...
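|
| A minimal sketch of the kind of hot loop this affects (file
| name and numbers made up; the exact speedup will vary by
| compiler and CPU):
|
|     // sqrt_bench.cpp
|     // With default flags the compiler must keep a path that
|     // calls the libm sqrt so errno can be set on negative
|     // input; with -fno-math-errno it can emit just the sqrt
|     // instruction and can often vectorize the loop as well.
|     //   g++ -O3 sqrt_bench.cpp
|     //   g++ -O3 -fno-math-errno sqrt_bench.cpp
|     #include <cmath>
|     #include <cstdio>
|     #include <vector>
|
|     int main() {
|         std::vector<double> v(10'000'000, 2.0);
|         double sum = 0.0;
|         for (double x : v)
|             sum += std::sqrt(x);  // hot loop dominated by sqrt
|         std::printf("%f\n", sum);
|     }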
| iamcreasy wrote:
| Sorry if it's a basic question - but does recompiling C/C++
| code (with or without flags) produce more efficient code most
| of the time? For example, let's say I am using a binary that
| was compiled on a processor that didn't have support for SIMD.
| Assuming the program is capable of taking advantage of SIMD
| instructions, and also assuming my processor supports SIMD -
| would it make sense to recompile the C/C++ code on my system,
| hoping the newer binary would run faster?
| optimalsolver wrote:
| The more CPU-bound your program, the more benefit you'll see
| from the optimization flags. If your program is constantly
| waiting around for input on a network channel, then it may
| not help as much.
|
| My currently used optimization flags are: -O3 -fno-math-errno
| -fno-signed-zeros -march=native -flto
|
| Only use -march=native if the program is only intended to run
| on your own machine. It carries out architecture-specific
| optimizations that make the program non-portable.
|
| Also look into profile-guided optimization, where you compile
| and run your program, automatically generate a statistical
| report, then recompile using that generated information. It
| can result in some dramatic speedups.
|
| https://ddmler.github.io/compiler/2018/06/29/profile-
| guided-...
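|
| The basic GCC workflow is roughly this (Clang accepts the same
| flags, though its native profile format differs; file names
| here are placeholders):
|
|     # 1. Build with instrumentation.
|     g++ -O3 -fprofile-generate -o app main.cpp
|     # 2. Run a representative workload; this writes .gcda
|     #    profile files alongside the object files.
|     ./app typical_input.dat
|     # 3. Rebuild using the collected profile.
|     g++ -O3 -fprofile-use -o app main.cpp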
| KMag wrote:
| Yes. Careful selection of compilation flags can greatly
| improve performance.
|
| My employer spends many millions of dollars annually running
| numerical simulations using double precision floating point
| numbers. Some years ago when we retired the last machines
| that didn't support SSE2, adding a flag to allow the compiler
| to generate SSE2 instructions had a big time and cost savings
| for our simulations.
| jcranmer wrote:
| > Some years ago when we retired the last machines that
| didn't support SSE2, adding a flag to allow the compiler to
| generate SSE2 instructions had a big time and cost savings
| for our simulations.
|
| That's kind of a special case, though. Without SSE2, you're
| using x87 for floating-point numbers, and even using scalar
| floating point on x87 is going to be a fair bit slower than
| using scalar floating point SSE instructions. Of course,
| enabling SSE also allows you to vectorize floating point at
| all, but you'll still be seeing improvements just from
| scalar SSE instead of x87.
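|
| For reference, the flags in question look roughly like this on
| GCC (file name made up; x86-64 already defaults to SSE scalar
| math, so this mainly matters for 32-bit builds):
|
|     # 32-bit build, x87 floating point (the old default):
|     gcc -m32 -O2 sim.c
|     # 32-bit build, scalar math through SSE2 instead of x87:
|     gcc -m32 -O2 -msse2 -mfpmath=sse sim.c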
| mhh__ wrote:
| 3 thoughts:
|
| 1. Using SIMD can be a big win, so yes.
|
| 2. SIMD (vectorization) is not the only optimization your
| compiler can do: the compiler has a model of the processor, so
| it can pick the right instructions and lay them out properly,
| with as many tricks as its authors can describe generically.
|
| 3. Compilers have PGO. Use it (if you can). Compilers without
| PGO are a bit like an engine management unit with no sensors
| - all the gear, no idea. The compiler has to assume a hazy
| middle-of-the-road estimate of what branches will be
| exercised, whereas with PGO enabled your compiler can make
| the cold code smaller, and be more aggressive with hot code
| etc. etc.
| bee_rider wrote:
| > all the gear, no idea
|
| I like this because it only makes sense in some accents.
| For example it wouldn't work in Boston where the r would
| only be pronounced on one of the words (idea).
| bee_rider wrote:
| gcc has the ability to target different architectures (look
| up the -march and -mtune flags for example). Linux
| distributions are typically set to be compatible with a
| pretty wide range of devices, so they often don't take
| advantage of recent instructions.
|
| Compiling a big program can be a bit of a pain, though, so it
| is probably only worthwhile if you have a program that you
| use very frequently. Also, compilers aren't magic; the
| bottleneck in the program you want to run could be various
| things: CPU work, memory bandwidth, weird memory access
| patterns, disk access, network access, etc. The compiler
| mostly just helps with the first one.
|
| Also, note that some libraries, like Intel's MKL, are able to
| check what processor you are using and just dispatch the
| appropriate code (your mileage may vary, they sometimes don't
| keep up with changes in AMD processors, causing great
| annoyance).
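|
| If you're curious what -march=native actually enables on your
| machine, GCC can report it (the grep pattern below is just
| illustrative):
|
|     gcc -march=native -Q --help=target | grep -E 'march|sse|avx'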
| AstralStorm wrote:
| There are a few critical algorithms where floating-point error
| cancellation or simple ifs get optimized out if you disable
| signed zeros. Typically you would know which these are; they
| tend to appear in statistical machine learning code that uses
| the sign or expects monotonicity near zero, or in filters with
| coefficients near zero (which filter out NaNs explicitly).
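|
| A tiny example of the kind of identity -fno-signed-zeros
| licenses (a sketch; whether a given compiler actually folds it
| depends on version and optimization level):
|
|     #include <cmath>
|     #include <cstdio>
|     #include <cstdlib>
|
|     int main(int argc, char** argv) {
|         if (argc < 2) return 1;
|         // Pass "-0.0" on the command line so the value is not
|         // a compile-time constant.
|         double x = std::atof(argv[1]);
|         // Under strict IEEE 754 rules, -0.0 + 0.0 is +0.0, so
|         // the addition cannot be dropped. With
|         // -fno-signed-zeros the compiler may rewrite x + 0.0
|         // as just x, flipping the sign this program prints.
|         std::printf("%d\n", std::signbit(x + 0.0));
|     }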
| CamperBob2 wrote:
| What would be an example of a filter with coefficients near
| zero that would be adversely affected by the loss of signed-
| zero support?
|
| You're already in mortal peril if you're working with
| "coefficients near zero" because of denormals, another bad
| idea that should have been disabled by default and turned on
| only in the vanishingly few applications that benefit from
| them.
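|
| For reference, the usual way to opt out of denormals at
| runtime on x86 (a sketch; -ffast-math builds typically set the
| same bits at program startup) is the SSE control register:
|
|     #include <xmmintrin.h>  // _MM_SET_FLUSH_ZERO_MODE
|     #include <pmmintrin.h>  // _MM_SET_DENORMALS_ZERO_MODE
|
|     void disable_denormals() {
|         // Results that would be subnormal are flushed to zero,
|         _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
|         // and subnormal inputs are treated as zero.
|         _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
|     }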
| bla3 wrote:
| Recent post on the same topic that gets the same information
| across with fewer words:
| https://kristerw.github.io/2021/10/19/fast-math/
| [deleted]
| bee_rider wrote:
| Nitpicky, but saying
|
| > It turns out that, like with almost anything else relating to
| IEEE floating-point math, it's a rabbit hole full of surprising
| behaviors.
|
| immediately before describing that they disabled IEEE floating-
| point math is a bit funny. The standard isn't surprising;
| floating point numbers (arguably) are. The whole point of
| standardizing floating-point math was to reduce this
| surprisingness. You can't complain about IEEE floating point
| numbers if you tell the compiler not to use them.
| sampo wrote:
| It takes some time to learn how floating point and numerical
| calculations work. Not too long, but more than one evening. If
| you take a numerical analysis course in a university, the first
| 2 or 3 lectures might be about floating point, error
| propagation and error analysis. Or the first chapter of a
| numerical analysis textbook.
|
| But almost nobody spends this much effort to familiarize
| themselves with the floating point system. So it keeps
| surprising people.
| toolslive wrote:
| Once spent 2 days trying to find out why a Python prototype
| yielded different results on a different platform (in this
| case Solaris). Eventually discovered that an unnamed culprit
| implemented the cunning plan of compiling the interpreter with
| `-ffast-math`. Fun times.
| kristofferc wrote:
| https://simonbyrne.github.io/notes/fastmath/ also has a nice
| discussion about the possible pitfalls of "fast"-math.
| gumby wrote:
| I am shocked by the plethora of posts by people surprised when
| -ffast-math breaks their code (no insult to the pspdfkit folks
| intended).
|
| If it were a harmless flag you'd think it would be enabled by
| default. That's a clue that you should look before you leap.
|
| We have a few files in our code base that compile with -ffast-
| math but the rest don't. Those files were written with that flag
| in mind.
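|
| If the build uses CMake, scoping the flag to individual files
| can be done roughly like this (file names made up):
|
|     # Only these files were written with -ffast-math in mind.
|     set_source_files_properties(fast_kernels.cpp blur.cpp
|         PROPERTIES COMPILE_OPTIONS "-ffast-math")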
| asveikau wrote:
| People sometimes have unreasonable faith in compilers or
| libraries. It's a common adage to look at bugs in application
| code before suspecting these lower layers. That is _not_ the
| same thing as the lower layers never producing unexpected
| results. So when people get evidence that a compiler feature
| may be rough around the edges, they resist it.
| Sharlin wrote:
| No, it's easy to assume (even after reading the documentation)
| that -ffast-math just makes some calculations less precise or
| otherwise not strictly IEEE 754 compliant. That would be
| reason enough not to enable it by default, but to make it
| available for those who want maximum performance at the
| expense of precision (or accuracy), which may be a perfectly
| reasonable tradeoff in cases like computer graphics. It's very
| understandable that people are surprised by the fact that
| -ffast-math can actually _break_ code in unintuitive ways.
| Symmetry wrote:
| I sort of assume the main benefit of assuming no NaNs is being
| able to replace `a == a` with `true`?
|
| The handling of isnan is certainly a big question. I can see
| wanting to respect that, but I can also see littering your code
| with assertions that isnan is false and compiling with normal
| optimizations and then hoping that later recompiling with fast-
| math and all its attendant "fun, safe optimizations" will let you
| avoid any performance penalty for all those asserts.
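|
| A sketch of the kind of guard this is about; under -ffast-math
| (specifically -ffinite-math-only) the compiler is allowed to
| assume the condition is always false, though whether it really
| deletes it varies by compiler and version:
|
|     #include <cassert>
|     #include <cmath>
|
|     double mean(const double* xs, int n) {
|         double sum = 0.0;
|         for (int i = 0; i < n; ++i) {
|             // May be optimized away under -ffinite-math-only;
|             // writing `xs[i] != xs[i]` fares no better.
|             assert(!std::isnan(xs[i]));
|             sum += xs[i];
|         }
|         return n > 0 ? sum / n : 0.0;
|     }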
| jcranmer wrote:
| There are also several cases where handling NaNs "correctly"
| may require a fair amount of extra work (C99's behavior for
| complex multiplication is far more complicated when NaNs are
| involved).
| assbuttbuttass wrote:
| > To my surprise, there were no measurable differences outside of
| the standard deviation we already see when repeating tests
| multiple times.
|
| This should be the main takeaway. Don't enable -ffast-math if
| floating point calculations aren't a bottleneck, and
| especially not if you don't understand all the other
| optimizations it enables.
| nickelpro wrote:
| Counterargument: Always enable fast-math unless your
| application has a demonstrable need for deterministically
| consistent floating point math. If you're just trying to get a
| "good enough" calculation (which covers the vast majority of
| floating point work, e.g. physics and rendering calculations
| for 3D graphics), there's no reason to leave the performance
| on the floor.
| nemetroid wrote:
| > there's no reason to leave the performance on the floor.
|
| As the article demonstrates, there is: with -ffast-math,
| floating point subtly behaves in ways that don't match what
| you've been taught.
| a_e_k wrote:
| This is the approach that I take! (Note: I write graphics and
| rendering code for a living. "Good enough" for me tends to
| mean quantizing to either identical pixels at the display bit
| depth or at least perceptually identical pixels. Also, I
| usually do see a measurable performance benefit to -Ofast
| over -O3 or -O2. YMMV.)
|
| Just like cranking the warning levels as high as I can at the
| beginning of a project, I also like to build and test with
| -ffast-math (really -Ofast) from the very beginning. Keeping
| it warning-free and working under -ffast-math as I go is a
| lot easier than trying to do it all at once later!
|
| And much like the warnings, I find that any new code that
| fails under -ffast-math tends to be a bit suspect. I've found
| that stuff that breaks under -ffast-math will also frequently
| break with a different compiler or on a different hardware
| architecture. So -ffast-math is a nice canary for that.
| josefx wrote:
| > Always enable fast-math unless your application has a
| demonstrable need for deterministically consistent floating
| point math.
|
| As far as I remember, fast-math also breaks things like NaN
| and infinity handling, which makes filtering out invalid
| values before they hit something important "fun": the invalid
| values will obviously still exist and mess up your results,
| but you can no longer check for them.
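|
| One workaround I've seen (a sketch, assuming IEEE 754 binary64)
| is to do the check on the bit pattern, which the fast-math
| assumptions don't touch:
|
|     #include <cstdint>
|     #include <cstring>
|
|     // True if x is NaN or +/-infinity, judged purely from its
|     // bits, so the test survives -ffinite-math-only.
|     bool is_nan_or_inf(double x) {
|         std::uint64_t bits;
|         std::memcpy(&bits, &x, sizeof bits);
|         // An all-ones exponent field means infinity or NaN.
|         return (bits & 0x7ff0000000000000ULL)
|                     == 0x7ff0000000000000ULL;
|     }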
| chpatrick wrote:
| Unless you upgrade your compiler and now your climate model
| produces different results... I think "good enough" is
| actually pretty rare unless it's for games or something.
| dahart wrote:
| I've long wondered if changing the name to something like
| -ffast_inaccurate_math might help stem the tide of surprises
| and mistakes. Having only "fast" in the name makes it sound
| like a good thing, rather than a tradeoff to consider.
___________________________________________________________________
(page generated 2021-12-09 23:01 UTC)