[HN Gopher] Optimizations Enabled by -ffast-math
___________________________________________________________________
Optimizations Enabled by -ffast-math
Author : bwidlar
Score : 200 points
Date : 2021-10-20 05:24 UTC (17 hours ago)
(HTM) web link (kristerw.github.io)
(TXT) w3m dump (kristerw.github.io)
| zarazas wrote:
| 1+1 is 2. Quick math
| tcpekin wrote:
| Why is this compiler optimization beneficial with -ffinite-math-
| only and -fno-signed-zeros?
|
| From
|
|     if (x > y) { do_something(); } else { do_something_else(); }
|
| to the form
|
|     if (x <= y) { do_something_else(); } else { do_something(); }
|
| What happens when x or y are NaN?
| zro wrote:
| > NaN is unordered: it is not equal to, greater than, or less
| than anything, including itself. x == x is false if the value
| of x is NaN [0]
|
| My read of this is that comparisons involving NaN on either
| side always evaluate to false.
|
| In the first one if X or Y is NaN then you'll get
| do_something_else, and in the second one you'll get
| do_something.
|
| As far as why one order would be more optimal than the other,
| I'm not sure. Maybe something to do with branch prediction?
|
| [0]
| https://www.gnu.org/software/libc/manual/html_node/Infinity-...
| progbits wrote:
| It is always false. In the first block do_something_else() gets
| executed when either is NaN, in the second it is
| do_something().
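|
| A minimal C sketch of the difference, using the same
| do_something()/do_something_else() stubs as above:
|
|     #include <math.h>
|     #include <stdio.h>
|
|     static void do_something(void)      { puts("do_something"); }
|     static void do_something_else(void) { puts("do_something_else"); }
|
|     int main(void) {
|         double x = NAN, y = 1.0;
|         /* original form: x > y is false for NaN, so the else runs */
|         if (x > y) { do_something(); } else { do_something_else(); }
|         /* swapped form: x <= y is also false, so do_something() runs */
|         if (x <= y) { do_something_else(); } else { do_something(); }
|         return 0;
|     }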
| ISL wrote:
| Fascinating to learn that something labelled 'unsafe' stands a
| reasonable chance of making a superior mathematical/arithmetic
| choice (even if it doesn't exactly match the unoptimized FP
| result).
|
| If you're taking the ratio of sine and cosine, I'll bet that most
| of the time you're better off with tangent....
| gigatexal wrote:
| I am not a C dev, but this was a really fascinating blog post
| about how compilers optimize things.
| superjan wrote:
| Excellent post, I agree, but isn't it more about how they can't
| optimize? Compilers can seem magical in how they optimize
| integer math, but this is a great explanation of why not to rely
| on the compiler if you want fast floating-point code.
| andraz wrote:
| The question is how to enable all this in Rust... Right now
| there's no simple way to just tell Rust to be fast & loose with
| floats.
| varajelle wrote:
| Rust is not really a fast & loose language where anything can
| be UB.
|
| At least it doesn't need the errno ones, since Rust doesn't use
| errno.
| kzrdude wrote:
| It must be clearly understood which of the flags are entirely
| safe and which need to be under 'unsafe', as a prerequisite.
|
| Blanket flags for the whole program do not fit very well with
| Rust, while targeted use of these flags is inconvenient or needs
| new syntax... but there are discussions about these topics.
| atoav wrote:
| Also, maybe you would like to have more granular control over
| which parts of your program give priority to speed and which
| parts favour accuracy. Maybe this could be done with a separate
| type (e.g. ff64) or a decorator (which would be useful if you
| want to enable this for someone else's library).
| andraz wrote:
| Maybe... or maybe there would be just a flag to enable all
| this and be fine with consequences.
| mjburgess wrote:
| ..have you used Rust?
|
| > just a flag to enable all this and be fine with
| consequences
|
| Is probably the opposite of what the language is about.
| wongarsu wrote:
| Reasoning about the consequences along a couple functions
| in your hot path is one thing. Reasoning about the
| consequences in your entire codebase and all libraries is
| quite another.
| jamesmishra wrote:
| I don't know if most Rust programmers would be happy with any
| fast and loose features making it into the official Rust
| compiler.
|
| Besides, algorithms that benefit from -ffast-math can be
| implemented in C and with Rust bindings automatically
| generated.
|
| This solution isn't exactly "simple", but it could help
| projects keep track of the expectations of correctness between
| different algorithm implementations.
| vlovich123 wrote:
| Rust should have native ways to express this stuff. You don't
| want the answer to be "write this part of your problem domain
| in C because we can't do it in Rust".
| pornel wrote:
| There are fast float intrinsics: https://doc.rust-
| lang.org/std/intrinsics/fn.fadd_fast.html
|
| but better support dies in endless bikeshed of:
|
| * People imagine enabling fast float by "scope", but there's no
| coherent way to specify that when it can mix with closures
| (even across crates) and math operators expand to std functions
| defined in a different scope.
|
| * Type-based float config could work, but any proposal of just
| "fastfloat32" grows into a HomerMobile of "Float<NaN=false,
| Inf=Maybe, NegZeroEqualsZero=OnFullMoonOnly, etc.>"
|
| * Rust doesn't want to allow UB in safe code, and LLVM will cry
| UB if it sees Inf or NaN, but nobody wants the compiler
| inserting div != 0 checks.
|
| * You could wrap existing fast intrinsics in a newtype, except
| newtypes don't support literals, and <insert user-defined
| literals bikeshed here>
| pjmlp wrote:
| The whole point of ALGOL-derived languages for systems
| programming is not being fast & loose with anything, unless the
| seatbelt and helmet are on as well.
| ptidhomme wrote:
| Do some applications actually use denormal floats? I'm curious.
| im3w1l wrote:
| Accidentally trigger that path now and then? Probably. Depends
| on subnormals to give a reasonable result? Probably rare. But
| what do I know...
| adrian_b wrote:
| Denormal floats are not a goal in themselves; they are not used
| intentionally.
|
| When the CPU generates denormal floats on underflow, that
| ensures that underflow does not matter, because the errors
| remain the same as at any other floating-point operation.
|
| Without denormal floats, underflow must be an exception
| condition that must be handled somehow by the program, because
| otherwise the computation errors can be much higher than
| expected.
|
| Enabling flush-to-zero instead is an optimization of the same
| kind as ignoring integer overflow.
|
| It can be harmless in a game or graphic application where a
| human will not care if the displayed image has some errors, but
| it can be catastrophic in a simulation program that is expected
| to provide reliable results.
|
| Providing a flush-to-zero option is a lazy solution for CPU
| designers, because there have been many examples in the past of
| CPU designs where denormal numbers were handled without
| penalties for the user (but of course with a slightly higher
| cost for the manufacturer) so there was no need for a flush-to-
| zero option.
| ptidhomme wrote:
| Thanks for this explanation. But yeah, I meant: are there
| applications where the use of denormals vs. flush-to-zero is
| actually useful? If your variable will be nearing zero, and
| e.g. you use it as a divisor, don't you need to handle a
| special case anyway? Just like you should handle your
| integer overflows.
|
| I'm seeing denormals as an extension of the floating point
| range, but with a tradeoff that's not worth it. Maybe I got
| it wrong?
| amadvance wrote:
| Denormals give the guarantee that if a != b, then a - b !=
| 0.
|
| Without denormals a check before a division may fail to
| detect a division by 0.
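|
| An x86-specific sketch of that failure mode (assuming SSE
| flush-to-zero is switched on via <xmmintrin.h>; the control
| mechanism differs on other CPUs):
|
|     #include <float.h>
|     #include <math.h>
|     #include <stdio.h>
|     #include <xmmintrin.h>
|
|     int main(void) {
|         /* two distinct doubles whose difference is subnormal */
|         volatile double a = DBL_MIN;            /* smallest normal */
|         volatile double b = nextafter(a, 1.0);  /* next value up   */
|
|         _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
|
|         volatile double d = b - a;  /* subnormal result, flushed to 0 */
|         if (d != 0.0)
|             printf("guarded division: %g\n", (b - a) / d);
|         else
|             printf("a != b is %d, yet a - b == 0\n", a != b);
|         return 0;
|     }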
| marcan_42 wrote:
| I agree with you; if your code "works" with denormals and
| "doesn't work" with flush-to-zero, then it's already broken
| and likely doesn't work for some inputs anyway, you just
| haven't run into it yet.
| adrian_b wrote:
| The use of denormals is always useful except when you want
| to make your program faster no matter what and your target
| CPU happens to be a CPU where operations with denormals are
| slower than operations with other numbers.
|
| You must keep in mind that if you enable flush-to-zero and
| you really see an increase in speed, that means that you
| have now injected errors in your FP computations. Whether
| the errors matter or not, that depends on the application.
|
| If you do not see any serious speed improvement, then you
| have no reason to enable flush-to-zero. Even when you might
| want to use flush-to-zero in a production binary, you
| should not use it during development.
|
| Prior to enabling flush-to-zero it would be good to enable
| trap-on-underflow instead of denormals, to see if this
| really happens frequently enough to matter for performance
| and possibly to investigate the reason, to see if underflow
| could be avoided in other ways.
|
| In conclusion using denormals is always useful, because it
| limits the errors accumulated during FP computation. On
| some CPUs you may get higher speed by not using denormals,
| but only if large errors do not matter. If you avoid
| underflows by other means, e.g. range checking of inputs,
| then it does not matter how slow the operations with
| denormals are, so there is no reason to use any option that
| enables flush-to-zero (e.g. -ffast-math).
|
| Making a FP computation with "double" numbers and enabling
| flush-to-zero on that would usually be quite stupid,
| because if the errors do not matter, then that is a task
| for single-precision "float" (unless you need "double" only
| for the exponent range, not for precision).
| marcan_42 wrote:
| Flushing denormals to zero only matters if your calculations
| are _already_ running into the lower end of floating-point
| exponents (and even with denormals, if they're doing that,
| they're going to run into lost precision anyway sooner or
| later).
|
| The useful thing denormals do is make the loss of precision
| at that point gradual, instead of sudden. But you're still
| losing precision, and a few orders of magnitude later you're
| going to wind up at 0 either way.
|
| If your formulas are producing intermediate results with
| values that run into the lower end of FP range, _and_ it
| matters that those values retain precision there, then you're
| either using the wrong FP type or you're using the wrong
| formulas. Your code is likely already broken, the breakage is
| just rarer than in flush-to-zero mode.
|
| So just enable FTZ mode, and if you run into issues, you need
| to fix that code (e.g. don't divide by values that can be too
| close to 0), not switch denormals on.
| adrian_b wrote:
| Your arguments are correct, but the conclusion does not
| follow from them.
|
| If we assume that underflows happen in your program and
| this, as you say, is a sign that greater problems will be
| caused by that, then you must not enable flush-to-zero, but
| you must enable trap-on-underflow, to see where underflows
| happen and to investigate the reason and maybe rearrange
| your formulas to avoid the too small results.
|
| Flush-to-zero may sometimes lead to crashes, when it
| becomes obvious that something is wrong, but more often you
| just get some results with large errors that are difficult
| to distinguish from good results.
|
| Opinions obviously vary, but I have never seen any good use
| case for flush-to-zero.
|
| Underflows are either so rare that their performance does
| not matter, or if they are so frequent that flush-to-zero
| will increase the speed, then your code has a problem and
| the formulas should be restructured, which will both
| increase the speed and eliminate the loss of precision.
| stephencanon wrote:
| > Opinions obviously vary, but I have never seen any good
| use case for flush-to-zero.
|
| The classical use case is real-time audio, where an IIR
| filter may have quite slow decay, such that the signal
| stays in the subnormal regime for many samples. If this
| happens and your hardware has significant stalls for
| subnormal data, you may miss your real-time deadline
| resulting in clicking or other audio corruption.
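|
| A rough sketch of that trap: a one-pole IIR low-pass whose state
| keeps decaying after the input goes silent spends a stretch of
| samples in the subnormal range before reaching zero:
|
|     #include <math.h>
|     #include <stdio.h>
|
|     int main(void) {
|         float y = 1.0f;          /* filter state left over from a signal */
|         const float a = 0.99f;   /* feedback coefficient, slow decay     */
|         long subnormal_samples = 0;
|
|         /* input is now silence: y[n] = a * y[n-1] just decays */
|         for (long n = 0; n < 20000; n++) {
|             y = a * y;
|             if (fpclassify(y) == FP_SUBNORMAL)
|                 subnormal_samples++;   /* each may stall the FPU */
|         }
|         printf("samples spent subnormal: %ld\n", subnormal_samples);
|         return 0;
|     }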
| adrian_b wrote:
| I agree that this is a good example and there are
| probably similar cases in video processing and in
| graphics rendering.
|
| However, not all CPUs stall for subnormal data, and if CPU
| designers had ensured that none of them stall, such
| discussions would never have been needed.
|
| Outside special cases like real-time DSP and
| graphics, using flush-to-zero or options like -Ofast or
| -ffast-math is seldom justified, and they can be
| dangerous, especially when they are used without a prior
| analysis and tests to determine whether exception
| conditions really appear when the program is run, and
| why.
| stephencanon wrote:
| > However, not all CPUs stall for subnormal data, and if
| CPU designers had ensured that none of them stall, such
| discussions would never have been needed.
|
| I don't really disagree (I've pushed hard for no
| subnormal stalls for years), but there are some timing
| and area tradeoffs that come with this that make it a
| somewhat hard sell for FPUs that aren't built entirely
| around FMA (there are tradeoffs even for designs that are
| purely FMA based, but they're a lot smaller).
|
| Mainstream x86 designs _still_ have some subnormal
| stalls!
| nyanpasu64 wrote:
| > Outside such special cases like real-time DSP and
| graphics
|
| Personally I see audio and graphics as the common case,
| and scientific computing as the special case ;) Do game
| physics engines expect denormals to be preserved?
| enriquto wrote:
| It's not that you decide specifically to use them, but
| sometimes they sort of happen very naturally. For example, if
| you evaluate a Gaussian function f(x)=e^(-x^2), for moderate
| values of x the value f(x) is denormal.
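|
| A quick double-precision check of where e^(-x^2) crosses into the
| subnormal range (f(26) is still normal, f(27) is subnormal, f(28)
| underflows to zero):
|
|     #include <math.h>
|     #include <stdio.h>
|
|     int main(void) {
|         for (int x = 26; x <= 28; x++) {
|             double f = exp(-(double)x * x);
|             printf("f(%d) = %g  subnormal: %d\n",
|                    x, f, fpclassify(f) == FP_SUBNORMAL);
|         }
|         return 0;
|     }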
| PaulDavisThe1st wrote:
| If you want maximum possible accuracy for FP computation, you
| need to pay the performance price that denormals incur. In many
| cases, that will likely be a long running computation where
| responsiveness from a user-perspective and/or meeting realtime
| deadlines is not an issue. Any time "I want the best possible
| answer that we can compute", you should leave denormals
| enabled.
|
| By contrast, in media software (audio and/or video) denormals
| getting flushed to zero is frequently necessary to meet
| performance goals, and ultimately has no impact on the result.
| It's quite common for several audio FX (notably reverb) to
| generate denormals as the audio fades out, but those values are
| so far below the physical noise floor that it makes no
| difference to flush them to zero.
| rocqua wrote:
| I found the following note for -ffinite-math-only and -fno-
| signed-zeros quite worrying:
|
| The program may behave in strange ways (such as not evaluating
| either the true or false part of an if-statement) if calculations
| produce Inf, NaN, or -0.0 when these flags are used.
|
| I always thought that -ffast-math was telling the compiler: "I do
| not care about floating point standards compliance, and I do not
| rely on it. So optimize things that break the standard".
|
| But instead it seems like this also implies a promise to the
| compiler. A promise that you will not produce Inf, NaN, or -0.0.
| Whilst especially Inf and NaN can be hard to exclude.
|
| This changes the flag from saying "don't care about standards,
| make it fast" to "I hereby guarantee this code meets a stricter
| standard" where also it becomes quite hard to actually deduce you
| will meet this standard. Especially if you want to actually keep
| your performance. Because if you need to start putting all
| divisions in if statements to prevent getting Inf or NaN, that is
| a huge performance penalty.
| 0xTJ wrote:
| That's the way a number of those special flags work. They are a
| contract to the compiler saying "act like this can't happen, I
| promise it won't".
|
| Picking and choosing the specific flags to enable might be better
| in general. The flag `-funsafe-math-optimizations` seems to be
| useful as general-purpose, although as this article mentions,
| some of those optimizations need the more unsafe ones to become
| active.
|
| I would say that `-ffast-math` is a dangerously safe-sounding
| flag; `-funsafe-fast-math` would be better IMO.
| dspillett wrote:
| _> I always thought that -ffast-math was telling the compiler:
| "I do not care about floating point standards compliance, ...
| So optimize things that break the standard". But instead it
| seems like this also implies a promise to the compiler [that]
| you will not produce Inf, NaN, or -0.0._
|
| I see it as both, as with many such flags[1], though as you say
| it does put the onus on the developer to avoid inputs the
| optimisation may cause problems for. You are saying "I promise
| that I'm being careful not to do things that will be affected
| by these parts of the standard, or won't blame you for
| undefined behaviour if I do, so you can skip those rules if it
| helps you speed things up".
|
| [1] like those that enable optimisations which can cause
| significant breakage if pointer aliasing is happening
| okl wrote:
| > A promise that you will not produce Inf, NaN, or -0.0. Whilst
| especially Inf and NaN can be hard to exclude.
|
| Cumbersome but not that difficult: Range-check all input
| values.
| contravariant wrote:
| And don't do any division.
|
| Easy.
| okl wrote:
| There's nothing wrong with divisions if you exclude
| problematic combinations of input values.
| contravariant wrote:
| You're going to have to do some intermediate checks if
| you want stuff like x/sin(x) to behave.
| scheme271 wrote:
| Or multiplications or additions to avoid overflows or
| denormals
| orangepanda wrote:
| As the compiler assumes you won't produce such values, wouldn't
| it also optimise those range checks away?
| nly wrote:
| He means simple stuff like ensuring a function like
|
|     float unit_price(float volume, int count) {
|         return volume / count;
|     }
|
| is only called where count > 0
| colejohnson66 wrote:
| But couldn't the compiler optimize out the check because
| you're guaranteeing to never divide by 0, so logically
| count would never be 0. Akin to undefined behavior
| almost.
| adrian_b wrote:
| What the compiler assumes depends on the compilation
| options.
|
| Unfortunately most compilers have too permissive default
| options.
|
| C and C++ programs should always be compiled, at least
| during development, with strict options, e.g.:
|
| -fsanitize=undefined -fsanitize-undefined-trap-on-error
|
| or equivalent options, which eliminate all the horror
| stories with unexpected behavior caused by integer
| divide-by-zero or overflow, pointer overflow, out-of-
| bounds access and many others.
|
| Because, unlike integer operations, the FP operations are
| standardized, there is no need for compiler options: the
| behavior for exceptions can be controlled from the
| program with the functions defined in <fenv.h> in C or
| <cfenv> in C++.
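|
| A minimal sketch of the <fenv.h> approach (checking the sticky
| exception flags after a computation; turning them into real traps
| is platform-specific):
|
|     #include <fenv.h>
|     #include <stdio.h>
|
|     int main(void) {
|         feclearexcept(FE_ALL_EXCEPT);
|
|         volatile double x = 1e-300, y = 1e300;
|         volatile double r1 = x / y;    /* underflows to 0   */
|         volatile double r2 = y * y;    /* overflows to Inf  */
|         volatile double r3 = r2 - r2;  /* Inf - Inf is NaN  */
|
|         if (fetestexcept(FE_UNDERFLOW)) puts("underflow happened");
|         if (fetestexcept(FE_OVERFLOW))  puts("overflow happened");
|         if (fetestexcept(FE_INVALID))   puts("invalid op happened");
|         (void)r1; (void)r3;
|         return 0;
|     }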
| throwawaylinux wrote:
| This test cannot be optimized to always true:
|
|     if (count > 0) total += unit_price(volume, count);
|
| Are you confusing it with the opposite scenario:
|
|     total += unit_price(volume, count);
|     if (count == 0) printf("oops\n");
|
| That test might be optimized out because divide by zero
| is undefined behavior.
| colejohnson66 wrote:
| I probably am confusing them. I don't program in C or
| C++, so I'm not aware of all the "gotchas", and can
| easily confuse them (as shown here). Not to mention that
| compilers have been getting more aggressive in optimizing
| undefined behavior.
| jcelerier wrote:
| At least GCC is, I believe, fairly buggy in that regard, since
| isnan is optimised out; thus you cannot check if, say, an
| incoming value read from the network is a NaN. There's some
| lengthy discussion about it on the bug tracker IIRC.
| jcelerier wrote:
| I was misremembering, was thinking about this recent
| convo on LLVM mailinglist:
| https://lists.llvm.org/pipermail/llvm-
| dev/2021-September/152...
| usefulcat wrote:
| I don't think you are misremembering; -ffast-math causes
| __builtin_isnan and __builtin_isinf to unconditionally
| return zero, for both gcc and clang:
|
| https://godbolt.org/z/1qjxvW3KK
|
| I found this out the hard way several years ago and still
| have my own implementations of these functions laying
| about in case I want to use -ffast-math.
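|
| One way such a hand-rolled check can look: test the IEEE 754 bit
| pattern through an integer, so the fast-math assumption about
| float comparisons never enters the picture (a sketch, 64-bit
| doubles assumed):
|
|     #include <stdint.h>
|     #include <string.h>
|
|     /* NaN: exponent bits all ones, mantissa nonzero */
|     static int my_isnan(double x) {
|         uint64_t bits;
|         memcpy(&bits, &x, sizeof bits);
|         return (bits & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL
|             && (bits & 0x000fffffffffffffULL) != 0;
|     }
|
|     /* Inf: exponent bits all ones, mantissa zero */
|     static int my_isinf(double x) {
|         uint64_t bits;
|         memcpy(&bits, &x, sizeof bits);
|         return (bits & 0x7fffffffffffffffULL) == 0x7ff0000000000000ULL;
|     }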
| fho wrote:
| That's where more advanced type systems shine. You get to
| encode things like "this number is not zero" and a safe
| division function will only take "nonzero" values as input.
|
| The workflow would then be:
|
| 1. prove that the input is nonzero (if input == 0), yielding a
| "nonzero number"
|
| 2. run your calculation (no check needed for multiplication or
| division; if you add something you have to reprove that the
| number is nonzero)
| benttoothpaste wrote:
| The checks could be expensive though, more expensive than the
| gains from this optimization. Often it is better to let the
| calculation proceed and use isnan to check the final result.
| heavenlyblue wrote:
| > such as not evaluating either the true or false part of an
| if-statement
|
| Do you mean `if-else if` or `if-else`? Because the second would
| work and would always short-circuit to else.
| Linosaurus wrote:
| I imagine it means that the compiler could turn an
| if(x<6)-else into two free-standing if statements, like
| if(x<6), if(x>=6), which will logically work under these
| assumptions.
| adrian_b wrote:
| The whole rationale for the standard for floating-point
| operations is to specify the FP operations with such properties
| that a naive programmer will be able to write programs which
| will behave as expected.
|
| If you choose any option that is not compliant with the
| standard, that means, exactly as you have noticed, that you
| claim that you are an expert in FP computations and you know
| how to write FP programs that will give the desired results
| even with FP operations that can behave in strange ways.
|
| So yes, that means that you become responsible to either
| guarantee that erroneous results do not matter or that you will
| take care to always check the ranges of input operands, as
| "okl" has already posted, to ensure that no overflows,
| underflows or undefined operations will happen.
| celrod wrote:
| > So yes, that means that you become responsible to either
| guarantee that erroneous results do not matter or that you
| will take care to always check the ranges of input operands,
| as "okl" has already posted, to ensure that no overflows,
| underflows or undefined operations will happen.
|
| This is why I dislike the fact that it removes simple methods
| for checking if values are NaN, for example. I do not find it
| ergonomic to disable these checks.
| billfruit wrote:
| Is it advised to detect overflow and do something about it
| after an FP computation rather than adding pre-checks to try
| to avert the overflow?
| KarlKemp wrote:
| The only obvious way to test if some operation is going to
| produce some value(s) is to perform that operation. Yes,
| for, say, a single multiplication I can anticipate the
| necessary bounds for each input value. But even short
| sequences of operations would quickly devolve into
| labyrinthine nested switch statements.
| owlbite wrote:
| So it depends on how you want to detect overflow.
|
| Trapping on math exceptions or otherwise pulling an
| emergency brake doesn't really fit with modern hardware
| design if you want performance (for one thing, what do you
| do with the rest of your vector? do you just live with a
| pipeline flush?).
|
| Adding targeted checks used sparingly can work, but
| probably requires a deeper understanding of the underlying
| numerical analysis of your problem. Generally just checking
| for NaN or Inf at the end is a better solution as these are
| in most cases absorbing states.
| owlbite wrote:
| Rename to: -fdisable-math-safeties
| gpderetta wrote:
| I think the documentation is fairly clear:
|
| "Allow optimizations for floating-point arithmetic that assume
| that arguments and results are not NaNs or +-Infs."
|
| The obvious implication is that arguments and results are always
| finite, and as a programmer you are responsible for guaranteeing
| the correct preconditions.
|
| Certainly do not use finite-math-only when dealing with
| external data.
| hurrrrrrrr wrote:
| >Certainly do not use finite-math-only when dealing with
| external data.
|
| so... never?
| gpderetta wrote:
| let me rephrase: data which you cannot guarantee meet the
| preconditions. Either the input is known good data or you
| sanitize it through some code path not compiled with fast-
| math.
| praptak wrote:
| Math optimizations are not the only ones like this. Strict
| aliasing is a promise to the compiler to not make two pointers
| of different types point to the same address.
|
| You technically make that promise by writing in C (except a
| narrow set of allowed conversions) but most compilers on
| standard settings do let you get away with it.
| phkahler wrote:
| I thought the idea was to never produce NaN or -0 in the first
| place, but it seems that's not the case.
| pinteresting wrote:
| Generally if you're seeing a NaN/Inf, something has gone wrong.
| It's very difficult to gracefully recover from, and if you tried
| I think you would lose both sanity and performance!
|
| Regarding performance, the cost of a real division is about 3-4
| orders of magnitude worse than an if statement that is very
| predictable. But the usual way is to have fast/safe versions of
| functions: where you need performance and can deduce that
| something is always/never true, create/use the fast function,
| but by default everything uses the slower/safer function.
| nemetroid wrote:
| > Generally if you're seeing a NaN/Inf something has gone
| wrong
|
| That's a bold claim.
| xioxox wrote:
| As a scientist, it depends on what you mean by wrong. It's
| nice to push an array of numbers through some array
| operation. If some of the outputs look like NaN or Inf then
| that tells me something went wrong in the operation and to
| take a closer look. If some optimizer was told that NaN or
| Inf couldn't happen, meaning that they wouldn't be
| generated, then some of this output could be pure nonsense
| and I wouldn't notice. As NaNs propagate through the
| calculation they're extremely helpful to indicate something
| went wrong somewhere in the call chain.
|
| NaN is also very useful itself as a missing-data value in
| an array. If you have some regularly sampled data, with
| some items missing, it makes a lot of things easier to
| populate the holes with values which guarantee they won't
| produce misleading output.
| jes wrote:
| What makes it a bold claim?
| djmips wrote:
| Personally, I thought they were being sarcastic.
| LeanderK wrote:
| it's true in my experience. NaN/Inf was always a bug and
| resulted in needing to rewrite the calculation
| pinteresting wrote:
| Can you think of a function where the input is valid, the
| output is NaN and nothing has gone wrong in the process?
|
| I can't think of any, haven't experienced any, not heard of
| any examples of it, so you're welcome to break my ignorance
| on the subject.
| jleahy wrote:
| Calculating the average value for some observation in a
| time bucket, where some buckets may have no observation
| (resulting in 0/0=nan), finally taking some kind of
| summary of these ignoring nan values (there is a whole
| library of functions for this in numpy for example).
|
| NaN and Inf are extremely valuable and useful encodings
| to represent real things. Particularly signed infinity is
| wonderful, as functions like exp behave correctly for inf
| (eg. exp(-inf)=0).
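|
| A sketch of that pattern in C, with made-up bucket data: the empty
| bucket yields 0.0/0.0 = NaN, and the summary skips it, nanmean
| style (note this is exactly the kind of isnan test that -ffast-math
| can silently delete):
|
|     #include <math.h>
|     #include <stdio.h>
|
|     int main(void) {
|         double sums[4]   = { 6.0, 0.0, 9.0, 3.0 };
|         double counts[4] = { 3.0, 0.0, 3.0, 1.0 };  /* bucket 1 empty */
|
|         double avg[4];
|         for (int i = 0; i < 4; i++)
|             avg[i] = sums[i] / counts[i];   /* 0/0 marks it as NaN */
|
|         double total = 0.0;
|         int n = 0;
|         for (int i = 0; i < 4; i++)
|             if (!isnan(avg[i])) { total += avg[i]; n++; }
|
|         printf("mean over non-empty buckets: %g\n", total / n);
|         return 0;
|     }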
| noctune wrote:
| >Calculating the average value for some observation in a
| time bucket, where some buckets may have no observation
| (resulting in 0/0=nan)
|
| But there are many ways such a calculation could result
| in NaN besides a division. For example, intermediate sums
| could result in something like inf - inf. I find it rare
| that you can definitely conclude that a NaN means some
| specific thing.
| ChrisLomont wrote:
| There are algorithms that behave correctly but require
| intermediate Inf or NaN to work. If I recall there is one
| related to finding bounding box intersections in ray
| tracing. I know when I introduce such code into a
| codebase I put a large "DO NOT CHANGE THIS..." style
| comment with an explanation so some enterprising
| programmer doesn't put some if statements in to remove
| what they think is an error, when it is not.
|
| I'd suspect the same is true for NaN. These things were
| introduced for reasons. If you don't have a deep
| knowledge of why, don't assume everyone else is the same.
|
| Edit: I found examples using NaN and Inf intermediates
| to get a large speed increase [1,2].
|
| [1] https://tavianator.com/2015/ray_box_nan.html
|
| [2] https://tavianator.com/2011/ray_box.html
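|
| A sketch in the flavor of the linked posts (not a copy of them): a
| slab test where a zero direction component produces +-Inf, and
| fmin/fmax absorb the NaN that appears when the origin lies exactly
| on a slab plane:
|
|     #include <math.h>
|     #include <stdbool.h>
|
|     /* true if origin + t*dir (t >= 0) hits the axis-aligned box */
|     bool ray_hits_box(const double origin[3], const double dir[3],
|                       const double bmin[3], const double bmax[3]) {
|         double tmin = 0.0, tmax = INFINITY;
|         for (int i = 0; i < 3; i++) {
|             double inv = 1.0 / dir[i];   /* +-Inf for axis-parallel */
|             double t1 = (bmin[i] - origin[i]) * inv;
|             double t2 = (bmax[i] - origin[i]) * inv;
|             /* fmin/fmax return the non-NaN operand, so 0 * Inf = NaN
|                is absorbed instead of poisoning the interval */
|             tmin = fmax(tmin, fmin(t1, t2));
|             tmax = fmin(tmax, fmax(t1, t2));
|         }
|         return tmin <= tmax;
|     }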
| dkersten wrote:
| Sure, of course they have uses otherwise why would they
| exist. But I would argue that the examples you have
| aren't the common case -- most people's floating point
| calculations do not rely on these special values and
| seeing them denotes something went wrong.
| ChrisLomont wrote:
| Of course they're not the common case, but the parent
| wanted examples where they are used in calculations that
| do have nice answers.
|
| So they are useful. They are more useful than most people
| realize; many simply assume they only mark errors.
|
| And without them in your case, you would not be alerted
| that a calculation went wrong. So even there they are
| very useful. Not using them correctly is like ignoring
| file API errors in your programs - sure they are rare,
| but you need to understand and handle them if you want to
| make good software.
| dkersten wrote:
| Ah right, fair enough! They are, indeed, good examples of
| the usefulness.
| ImprobableTruth wrote:
| Do you count under/overflow resulting in Inf as
| 'something has gone wrong'? If so, why would gracefully
| recovering be hard?
| vanviegen wrote:
| Because you're about 15 stack frames deep into some
| calculation when the problem gets noticed. So how do you
| handle this? You could throw an exception (or similar),
| which would allow the UI to state something semi-helpful
| like 'Calculation failed'. But recover? Usually not.
|
| I think NaN and inf can be really useful in specific
| cases, but they probably shouldn't be allowed to
| propagate through your program, and shouldn't occur
| unless you're specifically making use of them.
| adrian_b wrote:
| When you do not want such values to be propagated, it is
| always possible to choose to have overflow exceptions
| instead of infinities and undefined operation exceptions
| instead of NaNs. Also underflow exceptions instead of
| denormals.
|
| The programmer should choose which is most appropriate
| for an application. Choosing to both ignore the
| exceptions and not generate a special value that can be
| examined later is seldom acceptable.
| MaulingMonkey wrote:
| JavaScript's parseInt and parseFloat return NaN on
| something as benign as "well, that wasn't a number".
|
|     var jsonish = "...";
|     var number = parseFloat(jsonish);
|     if (!isNaN(number)) {
|         // deserialized a number
|     } else {
|         // ...try deserializing as a boolean or string next...
|     }
|
| Fast math optimizations can break code like this by
| breaking isNaN.
|
| I was porting a C++ project to a certain platform - and
| that platform enabled a -ffast-math equivalent by default
| in Release (but not Debug) builds! This broke duktape, a
| JS engine said project embedded, in some nasty and subtle
| ways. Instead of storing a number/pointer/??? (8 bytes) +
| type tag (4? bytes) for each dynamically typed JS value,
| duktape can bit-pack values into a single 8 byte "double"
| value by storing object/string handles as NaN values -
| this isn't an uncommon trick for dynamically typed
| scripting stuff:
|
| https://github.com/svaarala/duktape/blob/c3722054ea4a4e50
| f48...
|
| Naturally, the -ffast-math equivalent broke isNaN checks,
| which caused random object/string handles to be
| mistakenly reinterpreted as "numbers" - but only in
| Release builds, for this one particular platform, in one
| rarely taken branch, so neither QA nor CI caught it,
| leading to hours of manufacturing a repro case, stepping
| through an absurd amount of code, and then finally
| looking at the default build rules and facepalming.
|
| Cursing the platform vendor under my breath, I overrode
| the defaults to align with the defaults of every other
| config x platform combination we already had: no fast
| math. If you want those optimizations, use SSE-friendly
| NaN-avoiding intrinsics - or, if you _must_ use the
| compiler flags, ensure you do so consistently across
| build configs and platforms, perhaps limited to a few TUs
| or modules if possible. This allows you to have a chance
| at using your Debug builds to debug the resulting
| "optimizations".
| jacobolus wrote:
| This depends how you define "gone wrong". In evaluating
| rational functions (a useful tool for approximating all
| sorts of other functions), one efficient and well behaved
| algorithm is the "barycentric formula", based on
| interpolating function values at various specific input
| points. When the input is one of those points directly,
| this formula results in Inf / Inf = NaN. Evaluation code
| needs to check for this case, and then replace the output
| by the appropriate value for the input.
|
| +-Inf is a very natural output for many kinds of
| numerical codes.
| jlokier wrote:
| > When the input is one of those points directly,
|
| That's against one of the rules of well-behaved floating
| point programming: Never test for equality.
| gjm11 wrote:
| It's worth being a bit more explicit about this.
|
| The appearance of NaNs doesn't result from breaking the
| "never test for equality" rule. It happens when the point
| where you're evaluating the function _just happens to_
| exactly match one of the interpolation points.
|
| But if you decide to deal with the NaN issue by (1)
| testing whether your input exactly matches any of the
| interpolation points, or (2) testing the result for NaN,
| then you're testing for equality or doing something very
| similar, and then you may have one of the problems the
| "never test for equality" is meant to prevent: if bad
| stuff happens when x _exactly_ equals one of the x_j,
| then perhaps less-obvious bad stuff happens when x
| _almost exactly_ equals one of the x_j, and an equality
| test won't catch that.
|
| So, does it? Actually, I think not. Suppose x = xj+h with
| h very small. Then your interpolant is (h times something
| of normal size) times (1/h times something of normal size
| + smaller terms) and although one of those is very small
| and the other very large, ordinary floating-point
| arithmetic doesn't care about that. There's no
| catastrophic cancellation or anything. And floating point
| has much more dynamic range than precision, so to speak,
| so until h literally reaches zero you aren't running into
| trouble. (I guess there might be ways of organizing the
| calculation that _do_ give you catastrophic cancellation,
| but if you do it the obvious way then that doesn't
| happen.)
|
| So in this particular case you really could just test for
| equality or test for NaNs. I think.
|
| (For some interesting investigation of other aspects of
| the numerical stability of this kind of interpolation,
| see https://people.maths.ox.ac.uk/trefethen/publication/P
| DF/2011.... But note that the authors of this paper are
| using the formula for _extrapolation_ or, as they prefer
| to call it, "numerical analytic continuation".)
| jacobolus wrote:
| > _perhaps less-obvious bad stuff happens when x almost
| exactly equals one of the x_j, and an equality test won't
| catch that. So, does it? Actually, I think not._
|
| You are correct, it does not. Quite the opposite: the
| results are excellent when x doesn't quite equal one of
| the interpolation nodes, because you end up with a value
| of S * y / S, where S = <<very big but not close to
| overflowing float>> and y = <<expected output value>>.
|
| The only potentially problematic case is if S * y
| overflows but S does not. But floating point numbers have
| a ton of dynamic range, so unless you are looking at
| points _really_, _really_ close to zero, with very large
| outputs (say, x - x_j = 1e-200, y = 1e+200), there is no
| problem. And the failure case there is that you end up
| with Inf instead of 1e200 as an output.
|
| It isn't a problem in practice, and if you are worried
| about it as a problem in theory, you can take care in
| choice of interpolation nodes or do some scaling of the
| output values.
| jlokier wrote:
| > There's no catastrophic cancellation or anything. And
| floating point has much more dynamic range than
| precision, so to speak, so until h literally reaches zero
| you aren't running into trouble.
|
| I just tried it to be sure, in C with "double", and
| organized the calculation the way it's described in the
| paper you cited, formula 2.1.
|
| It failed due to catastrophic rational cancellation close
| to an interpolation point, depending on the values
| involved. An equality test was not enough.
|
| Approaching the first interpolation point, before it
| transitioned from the correct result, there were some
| incorrect Inf results prior to it switching to NaN for an
| input not equal to any interpolation point.
|
| This was trivial 1d interpolation between two points,
| where the points were (0.0 => 10.0, 1.0 => 20.0).
| gjm11 wrote:
| Hm, interesting. I also tried some numerical experiments
| to sanity-check and I got reasonable results right up to
| 1ulp away from the interpolation points. I tried again
| using exactly your example -- f(x) = 10+10x,
| interpolation points at 0 and 1 -- and again everything
| seemed fine.
|
| I wonder what we're doing differently, or whether we
| happened to test at different points. Here's my code:
|     def interpolate1(f,x,xs,ws):
|         p,s = 1,0
|         for (xj,wj) in zip(xs,ws):
|             p *= x-xj
|             s += wj*f(xj)/(x-xj)
|         return p*s
|
| (it's "interpolate1" because I also implemented the other
| version of the formula described in the article, which
| for this purpose behaves very similarly).
|
| For your example we want f(x) = 10+10x, xs=[0,1],
| ws=[-1,1].
|
| As I mentioned before, if you implement the calculation
| of the polynomial called l(x) in the paper -- the thing
| my code calls p -- in an unfortunate way, you can get
| cancellation problems, but for this case I'm not sure I
| see any way for that to happen.
|
| I tried calculating l(x) in a different way that could
| conceivably cause cancellation issues, aimed at this
| particular interpolant:
|
|     def foo(x):
|         p = x*x-x
|         s = -1*10/x + 1*20/(x-1)
|         return p*s
|
| but this also gave reasonable results for every value of
| x I tried other than exactly 0 and exactly 1.
|
| What exact values of x are giving you what bad results?
| What language are you implementing this in, with what
| data types?
| jlokier wrote:
|     #include <stdio.h>
|     #include <math.h>
|
|     double interp(double x, double x1, double y1, double x2, double y2)
|     {
|         double node_poly = (x - x1) * (x - x2);
|         double w1 = 1.0 / (x1 - x2);
|         double w2 = 1.0 / (x2 - x1);
|         double y = node_poly *
|             ((w1 * y1) / (x - x1) + (w2 * y2) / (x - x2));
|         return y;
|     }
|
|     int main()
|     {
|         for (int p = -300; p >= -330; p -= 1) {
|             double x = pow(10, p);
|             double y = interp(x, 0, 10, 1, 20);
|             printf("x = %.6g, y = %.6g\n", x, y);
|         }
|         return 0;
|     }
| jacobolus wrote:
| You implemented the wrong formula. You don't want to
| explicitly calculate "node poly".
|
| The formula you want is the "second barycentric formula",
| (4.2) at the top of page 505 of
| https://people.maths.ox.ac.uk/trefethen/barycentric.pdf
|
| Or formula (2.6) halfway down page 2 of https://people.ma
| ths.ox.ac.uk/trefethen/publication/PDF/2011...
|
| Also cf. Higham (2004) "The numerical stability of
| barycentric Lagrange interpolation" https://www.maths.man
| chester.ac.uk/~higham/narep/narep440.pd...
|
| If you want to implement the "first barycentric formula"
| instead you need to take some additional care to avoid
| overflow/underflow.
|
| As the papers above note, the choice of interpolation
| nodes makes a big difference. If you have an arbitrary
| function to approximate on an arbitrary complex domain,
| take a look at https://arxiv.org/abs/1612.00337
| jlokier wrote:
| Thanks for the pointer, and to the paper showing how to
| calculate error bounds. Interesting paper.
| gjm11 wrote:
| There are advantages to using either version of the
| formula. If you use the "first" then you do need to take
| care about overflow/underflow when the number of
| interpolation points is large, but that isn't an issue
| here. The "first" behaves better than the "second" for
| extrapolation, as discussed in the paper by Trefethen et
| al. And yes, the choice of interpolation points is
| important.
|
| But none of that should make a difference here. What's
| actually causing the dubious results jlokier reports
| isn't exactly proximity to the interpolation points -- I
| don't think it could happen if one of them weren't
| exactly zero. What's happening is that the very smallest
| values of x in jlokier's code are _denormal_ numbers;
| that is, ones whose floating-point representation has
| less precision than usual.
|
| Usually a floating-point number looks like 1.<stuff> x
| 2^n; for IEEE double precision, the <stuff> is 52 bits
| long. So if you start with 1, 1/2, 1/4, 1/8, and keep
| dividing by 2, you always have stuff=0 and n decreases by
| 1 each time. But what should you do once you reach the
| smallest value of n that will fit in the floating-point
| format? One answer is just to say that after that you get
| zero. But you can get a little extra dynamic range if
| instead you say that when n is as small as it's allowed
| to be, you can use 0.<stuff> instead of 1.<stuff>. These
| numbers have reduced precision for the sake of extra
| dynamic range, and they are called "denormal". Lots of
| things go subtly (or not so subtly) wrong when dealing
| with denormal numbers.
|
| One thing that goes wrong is the interpolation formula
| we're discussing here. The formula looks like f(x) ~=
| l(x) . sum of wj f(xj) / (x-xj) where xj are the
| interpolation points (in this case 0 and 1), l(x) is the
| product of (x-xj), so in this case x(x-1), and wj = 1 /
| product (xj-xi) where xi ranges over all the _other_
| interpolation points besides xj, so w0 = -1 and w1 = +1.
| So we have f(x) ~= x(x-1) . [-10 /x + 20/(x-1)]. In fact
| this is _exact_ provided x is neither 0 or 1, because the
| thing we 're approximating is a polynomial of degree <=
| (# interpolation points) - 1.
|
| So, what I claimed before was: suppose x = xj + h where h
| is small, and suppose you compute l(x) as the product of
| its factors. Then we get l(x) = h times product
| (xj+h-xi), and the sum is wj f(xj) / h + the sum of the
| other terms; and if h is tiny then the other terms in the
| sum are tiny in comparison, and the factors of h and 1/h
| cancel and everything's fine.
|
| But that doesn't happen when xj=0 and x (and hence h) is
| denormal -- because you can't pull the denormal trick at
| the _large_ end of the range, so the reciprocal of a
| denormal floating-point number is floating-point
| infinity!
|
| This isn't a cancellation problem. It isn't a matter of
| the formula going unstable when x is very close to one of
| the interpolation points. But it _is_ a problem, and it
| never occurred to me until jlokier demonstrated it,
| because denormals are kinda obscure and I didn't think
| to think about what might happen if they get involved. It
| _would_ be avoided if you made your interpolator check
| for values _very close_ to one of the xj. But -- I
| _think_, but this stuff is treacherous -- it would also
| be avoided if you checked for infinities as well as NaNs
| on the output side, which would be more efficient
| (provided your inputs are _usually_ sensible). Or -- I
| _think_, but this stuff is treacherous -- you could just
| check explicitly for denormal values of x and maybe of
| the xj, because if none of those is denormal this
| pathology can't arise.
| jlokier wrote:
| Let's say we switch to the "second barycentric formula":
|     double interp(double x, double x1, double y1, double x2,
|                   double y2)
|     {
|         double w1 = 1.0 / (x1 - x2);
|         double w2 = 1.0 / (x2 - x1);
|         double num = (w1 * y1) / (x - x1) + (w2 * y2) / (x - x2);
|         double denom = w1 / (x - x1) + w2 / (x - x2);
|         return num / denom;
|     }
|
| This still gives incorrect +Inf results in an interval
| around the first interpolation point.
|
| The interval is small for parameters interp(x, 0, 10, 1,
| 20). But interp(x, 0, 1e50, 1, 2e50) shows the interval
| can be much larger, as it depends on the y values.
|
| Both formulas return incorrect results in an input range
| larger than the denormals (1e50 times larger in the
| second example above). Plenty of "normal" inputs that can
| arise from other calculations. And these parameters are
| not unusual either.
|
| So you can't ensure a correct result from either
| barycentric formula in floating point by just checking
| for denormal input.
|
| I think you may be right that checking for +/-Inf result
| as well as NaN may be an adequate test, but I wouldn't
| assume it without a proof or some paper demonstrating
| that for sure.
| gjm11 wrote:
| Nice catch on using enormous values of y, where indeed it
| isn't just denormal values of x that cause trouble. Still
| not a catastrophic-cancellation problem, though. Let's
| look more closely. Suppose x is very very close to 0. (In
| particular, close enough that x-1 is indistinguishable
| from -1. This is not a source of trouble in this case.)
|
| The weights w1,w2 are -1 and +1, as before.
|
| The numerator is -10^50/x - 2*10^50. The denominator is
| -1/x - 1. If x is small enough (which it is in the cases
| that give spurious infinities) the first term in each of
| these is enough larger that adding the second doesn't
| change it; so what you're really computing is (-10^50/x)
| / (-1/x). And if x is small enough that 10^50/x is
| infinite, then _boom_.
|
| Something similar could happen even without numbers as
| large as 10^50. If the value at 0 were 3 instead, then
| values of 1/x bigger than 1/3 the largest representable
| non-infinity would produce the same problem. Using
| f(0)=10^50 enlarges the range in which this particular
| overflow happens by a factor of 10^50, though.
|
| The fact that this isn't an exotic catastrophic-
| cancellation problem doesn't, of course, detract from
| your original point (with which, for the avoidance of
| doubt, I agree): a calculation that can blow up for
| certain exact values is likely to misbehave for values
| that are merely close to those, so testing for the exact
| values is likely to be a very bad idea.
|
| If I were actually writing something that does
| interpolation in this way, I would certainly not want to
| trust that checking the results for infs and nans was
| sufficient (without, as you say, either proving it or
| finding someone else who had -- but I suspect it's
| actually only _almost_ true).
| jacobolus wrote:
| If x is very small (<1e-100 or something) you can
| alternatively evaluate a linear approximation accurate
| near 0. If y is going to be enormous (>1e+100 or
| something) you should rescale it.
| adrian_b wrote:
| Like any rule, this has exceptions.
|
| You should never test for equality numbers which are
| known only approximately, which is the case for most FP
| numbers.
|
| There are nonetheless cases when certain FP values are
| known exactly and it is OK to test them for equality.
|
| In general the rule does not depend on number
| representation, it applies equally to floating-point
| numbers, fixed-point numbers, rational numbers and even
| large integers in some cases.
|
| What counts is whether a number is known exactly or only
| approximately.
| jlokier wrote:
| There are exceptions.
|
| But the type of barycentric interpolation formula that
| leads to Inf/Inf at an interpolation point _isn't_ one
| of those exceptions - it's an example which shows why the
| rule should be borne in mind.
|
| When the input is very close to one of the interpolation
| points, the calculation becomes ill-conditioned,
| unsurprisingly as two values are diverging towards Inf,
| so the result can become incorrect as the input
| approaches the interpolation point, until it becomes
| Inf/Inf = NaN at some even closer, but non-zero,
| distance. Before reaching Nan it can also have an
| interval where the result is +/-Inf.
|
| It depends on the interpolation points. But under some
| circumstances, it is necessary to recognise when the
| input is _close_ to an interpolation point, and adjust
| the formula appropriately.
| twic wrote:
| What is the rationale for liking rational functions?
|
| Years ago I had some interpolation to do, and used the
| barycentric rational from Boost [1]. The resulting curve
| was all over the place. A natural cubic spline gave a
| much more subjectively sensible-looking fit.
|
| [1] https://www.boost.org/doc/libs/1_77_0/libs/math/doc/h
| tml/mat...
| jacobolus wrote:
| Without knowing a lot more about the context and what
| precisely you did, it's impossible to comment about your
| negative experience.
|
| > _What is the rationale for liking rational functions?_
|
| They are cheap to evaluate, (relatively) easy to prove
| things about symbolically, and will efficiently
| approximate any reasonably smooth function.
| stncls wrote:
| An example use of Inf is bounds on a value. Say you want to
| express that x sometimes has upper and/or lower bounds. With
| Inf you can simply write
|
|     l <= x <= u
|
| accepting that l is sometimes -Inf and u is sometimes +Inf.
| Without Inf, you get four different cases to handle. This is
| particularly handy when operations get slightly more complex,
| like transforming
|
|     l' <= x + b <= u'
|
| into the canonical form above, for some finite b. With Inf
| you can simply write
|
|     l = l' - b
|     u = u' - b
|
| and things will work as one expects. Again, without Inf,
| multiple cases to handle correctly.
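|
| A tiny sketch of that in C, with a hypothetical bounds struct
| where a missing bound is simply +-Inf:
|
|     #include <math.h>
|     #include <stdbool.h>
|     #include <stdio.h>
|
|     typedef struct { double lo, hi; } bounds;  /* may be +-Inf */
|
|     static bool in_bounds(bounds b, double x) {
|         return b.lo <= x && x <= b.hi;   /* no special cases */
|     }
|
|     /* canonicalize  l' <= x + b <= u'  into  l <= x <= u */
|     static bounds shift(bounds b, double offset) {
|         /* Inf - finite stays Inf, so unbounded sides just work */
|         return (bounds){ b.lo - offset, b.hi - offset };
|     }
|
|     int main(void) {
|         bounds upper_only = { -INFINITY, 10.0 };
|         bounds shifted = shift(upper_only, 3.0);   /* { -Inf, 7.0 } */
|         printf("%d %d\n", in_bounds(shifted, 5.0),
|                           in_bounds(shifted, 8.0));  /* prints 1 0 */
|         return 0;
|     }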
| beagle3 wrote:
| I often use NaNs on purpose to designate "unknown", and their
| properties of "infecting" computations mean that I don't have
| to check every step - just at the end of the computation.
|
| R goes even farther, and uses one specific NaN (out of the
| million-billions) to signal "not available", while other NaNs
| are just NaNs.
|
| Its properties, used properly, make code much more simple and
| sane.
| kloch wrote:
| NaN usually indicates a bug _or_ algorithmic problem, but not
| always. I have had cases where I need to detect NaN and present
| a useful message to the user even though everything is
| working as designed.
| SeanLuke wrote:
| * x+0.0 cannot be optimized to x because that is not true when x
| is -0.0
|
| Wait, what? What is x + -0.0 then? What are the special cases?
| The only case I can think of would be 0.0 + -0.0.
| foo92691 wrote:
| Seems like it might depend on the FPU rounding mode?
|
| https://en.wikipedia.org/wiki/Signed_zero#Arithmetic
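|
| The special case is just x = -0.0 itself: in the default
| round-to-nearest mode, -0.0 + 0.0 is +0.0, so folding x + 0.0 to x
| would flip the sign of the result. A quick check:
|
|     #include <math.h>
|     #include <stdio.h>
|
|     int main(void) {
|         double x = -0.0;
|         double folded   = x;        /* what the folded x + 0.0 gives */
|         double computed = x + 0.0;  /* what IEEE arithmetic gives    */
|         printf("folded:   %g (signbit %d)\n", folded, !!signbit(folded));
|         printf("computed: %g (signbit %d)\n", computed,
|                !!signbit(computed));
|         return 0;
|     }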
| mhh__ wrote:
| If you use the LLVM D compiler you can opt in or out of these
| individually on a per-function basis. I don't trust them
| globally.
| gpderetta wrote:
| with gcc you can also use #pragma GCC optimize or
| __attribute__((optimize(...))) for a similar effect.
|
| It is not 100% bug free (at least it didn't used to be) and often
| it prevents inlining a function into another having different
| optimization levels (so in practice its use has to be coarse
| grained).
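|
| For example, something along these lines (GCC syntax; exactly which
| options are honored per-function varies by version):
|
|     /* compiled with fast-math-style assumptions */
|     __attribute__((optimize("-ffast-math")))
|     double dot_fast(const double *a, const double *b, int n) {
|         double s = 0.0;
|         for (int i = 0; i < n; i++)
|             s += a[i] * b[i];    /* may be reassociated/vectorized */
|         return s;
|     }
|
|     /* same code, default strict FP semantics */
|     double dot_strict(const double *a, const double *b, int n) {
|         double s = 0.0;
|         for (int i = 0; i < n; i++)
|             s += a[i] * b[i];
|         return s;
|     }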
| nly wrote:
| This pragma doesn't quite work for -ffast-math
|
| https://gcc.godbolt.org/z/voMK7x7hG
|
| Try it with and without the pragma, and adding -ffast-math to
| the compiler command line. It seems that with the pragma
| sqrt(x) * sqrt(x) becomes sqrt(x*x), but with the command
| line version it is simplified to just x.
| gpderetta wrote:
| That's very interesting. The pragma does indeed do a lot of
| optimizations compared to no-pragma (for example it doesn't
| call sqrtf at all), but the last simplification is only
| done with the global flag set. I wonder if it is a missed
| optimization or if there is a reason for that.
|
| edit: well with pragma fast-math it appears it is simply
| assuming finite math and x>0 thus skipping the call to the
| sqrtf as there are not going to be any errors, basically
| only saving a jmp. Using pragma finite-math-only and an
| explicit check are enough to trigger the optimization (but
| passing finite-math as a command-line flag is not).
|
| Generally it seems that the various math flags behave
| slightly differently in the pragma/attribute.
| stabbles wrote:
| In Julia you can do `@fastmath ...`, which is basically just a
| find-and-replace of math operations and does not propagate into
| functions called in that code block, even when they are
| ultimately inlined. So what it does is:
|
|     julia> @macroexpand @fastmath x + y
|     :(Base.FastMath.add_fast(x, y))
|
| And maybe that's a good thing, because the scope of @fastmath
| is as limited as it gets.
| celrod wrote:
| Yeah, I really like that Julia and LLVM allow applying it on
| a per-operation basis.
|
| Because most of LLVM's backends don't allow for the same
| level of granularity, they do end up propagating some
| information more than I would like. For example, marking an
| operation as fast lets LLVM assume that it does not result in
| NaNs, letting nan checks get compiled away even though they
| themselves are not marked fast: julia>
| add_isnan(a,b) = isnan(@fastmath(a+b)) add_isnan
| (generic function with 1 method) julia>
| @code_llvm debuginfo=:none add_isnan(1.2,2.3) define i8
| @julia_add_isnan_597(double %0, double %1) #0 { top:
| ret i8 0 }
|
| meaning it remains more dangerous to use than it IMO should
| be. For this reason, LoopVectorization.jl does not apply
| "nonans".
| p0nce wrote:
| +1 Every time I tried, -ffast-math made things a bit slower.
| It's not that valuable with LLVM.
| m4r35n357 wrote:
| Scary stuff. I really hope the default is -fno-fast-math!!!
___________________________________________________________________
(page generated 2021-10-20 23:02 UTC)