[HN Gopher] Optimizations Enabled by -ffast-math
___________________________________________________________________
Optimizations Enabled by -ffast-math
Author : bwidlar
Score : 200 points
Date : 2021-10-20 05:24 UTC (17 hours ago)
(HTM) web link (kristerw.github.io)
(TXT) w3m dump (kristerw.github.io)
| zarazas wrote:
| 1+1 is 2. Quick math
| tcpekin wrote:
| Why is this compiler optimization beneficial with -ffinite-math-
| only and -fno-signed-zeros?
|
| From
|
|     if (x > y) { do_something(); } else { do_something_else(); }
|
| to the form
|
|     if (x <= y) { do_something_else(); } else { do_something(); }
|
| What happens when x or y are NaN?
| zro wrote:
| > NaN is unordered: it is not equal to, greater than, or less
| than anything, including itself. x == x is false if the value
| of x is NaN [0]
|
| My read of this is that comparisons involving NaN on either
| side always evaluate to false.
|
| In the first one if X or Y is NaN then you'll get
| do_something_else, and in the second one you'll get
| do_something.
|
| As far as why one order would be more optimal than the other,
| I'm not sure. Maybe something to do with branch prediction?
|
| [0]
| https://www.gnu.org/software/libc/manual/html_node/Infinity-...
| progbits wrote:
| It is always false. In the first block do_something_else() gets
| executed when either is NaN, in the second it is
| do_something().
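|
| A minimal C sketch of the difference, using the same
| do_something()/do_something_else() stubs as above:
|
|     #include <math.h>
|     #include <stdio.h>
|
|     static void do_something(void)      { puts("do_something"); }
|     static void do_something_else(void) { puts("do_something_else"); }
|
|     int main(void) {
|         double x = NAN, y = 1.0;
|         /* original form: x > y is false for NaN, so the else runs */
|         if (x > y) { do_something(); } else { do_something_else(); }
|         /* swapped form: x <= y is also false, so do_something() runs */
|         if (x <= y) { do_something_else(); } else { do_something(); }
|         return 0;
|     }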
| ISL wrote:
| Fascinating to learn that something labelled 'unsafe' stands a
| reasonable chance of making a superior mathematical/arithmetic
| choice (even if it doesn't exactly match the unoptimized FP
| result).
|
| If you're taking the ratio of sine and cosine, I'll bet that most
| of the time you're better off with tangent....
| gigatexal wrote:
| I am not a C dev, but this was a really fascinating blog post
| about how compilers optimize things.
| superjan wrote:
| Excellent post, I agree, but isn't it more about how they can't
| optimize? Compilers can seem magical in how they optimize
| integer math, but this is a great explanation of why not to rely
| on the compiler if you want fast floating-point code.
| andraz wrote:
| The question is how to enable all this in Rust... Right now
| there's no simple way to just tell Rust to be fast & loose with
| floats.
| varajelle wrote:
| Rust is not really a fast & loose language where anything can
| be UB.
|
| At least it doesn't need the errno ones, since Rust doesn't use
| errno.
| kzrdude wrote:
| It must be clearly understood which of the flags are entirely
| safe and which need to be under 'unsafe', as a prerequisite.
|
| Blanket flags for the whole program do not fit very well with
| Rust, while targeted use of these flags is inconvenient or needs
| new syntax... but there are discussions about these topics.
| atoav wrote:
| Also, maybe you would like to have more granular control over
| which parts of your program give priority to speed and which
| parts favour accuracy. Maybe this could be done with a separate
| type (e.g. ff64) or a decorator (which would be useful if you
| want to enable this for someone else's library).
| andraz wrote:
| Maybe... or maybe there would be just a flag to enable all
| this and be fine with consequences.
| mjburgess wrote:
| ..have you used Rust?
|
| > just a flag to enable all this and be fine with
| consequences
|
| Is probably the opposite of what the language is about.
| wongarsu wrote:
| Reasoning about the consequences along a couple functions
| in your hot path is one thing. Reasoning about the
| consequences in your entire codebase and all libraries is
| quite another.
| jamesmishra wrote:
| I don't know if most Rust programmers would be happy with any
| fast and loose features making it into the official Rust
| compiler.
|
| Besides, algorithms that benefit from -ffast-math can be
| implemented in C and with Rust bindings automatically
| generated.
|
| This solution isn't exactly "simple", but it could help
| projects keep track of the expectations of correctness between
| different algorithm implementations.
| vlovich123 wrote:
| Rust should have native ways to express this stuff. You don't
| want the answer to be "write this part of your problem domain
| in C because we can't do it in Rust".
| pornel wrote:
| There are fast float intrinsics: https://doc.rust-
| lang.org/std/intrinsics/fn.fadd_fast.html
|
| but better support dies in endless bikeshed of:
|
| * People imagine enabling fast float by "scope", but there's no
| coherent way to specify that when it can mix with closures
| (even across crates) and math operators expand to std functions
| defined in a different scope.
|
| * Type-based float config could work, but any proposal of just
| "fastfloat32" grows into a HomerMobile of "Float<NaN=false,
| Inf=Maybe, NegZeroEqualsZero=OnFullMoonOnly, etc.>"
|
| * Rust doesn't want to allow UB in safe code, and LLVM will cry
| UB if it sees Inf or NaN, but nobody wants the compiler
| inserting div != 0 checks.
|
| * You could wrap existing fast intrinsics in a newtype, except
| newtypes don't support literals, and <insert user-defined
| literals bikeshed here>
| pjmlp wrote:
| The whole point of ALGOL-derived languages for systems
| programming is not being fast & loose with anything, unless the
| seatbelt and helmet are on as well.
| ptidhomme wrote:
| Do some applications actually use denormal floats? I'm curious.
| im3w1l wrote:
| Accidentally trigger that path now and then? Probably. Depends
| on subnormals to give a reasonable result? Probably rare. But
| what do I know...
| adrian_b wrote:
| Denormal floats are not a goal in themselves; they are not used
| intentionally.
|
| When the CPU generates denormal floats on underflow, that
| ensures that underflow does not matter, because the errors
| remain the same as at any other floating-point operation.
|
| Without denormal floats, underflow must be an exception
| condition that must be handled somehow by the program, because
| otherwise the computation errors can be much higher than
| expected.
|
| Enabling flush-to-zero instead is an optimization of the same
| kind as ignoring integer overflow.
|
| It can be harmless in a game or graphic application where a
| human will not care if the displayed image has some errors, but
| it can be catastrophic in a simulation program that is expected
| to provide reliable results.
|
| Providing a flush-to-zero option is a lazy solution for CPU
| designers, because there have been many examples in the past of
| CPU designs where denormal numbers were handled without
| penalties for the user (but of course with a slightly higher
| cost for the manufacturer) so there was no need for a flush-to-
| zero option.
| ptidhomme wrote:
| Thanks for this explanation. But yeah, I meant: are there
| applications where the use of denormals vs. flush-to-zero is
| actually useful? If your variable will be nearing zero, and
| e.g. you use it as a divisor, don't you need to handle a
| special case anyway? Just like you should handle your
| integer overflows.
|
| I'm seeing denormals as an extension of the floating point
| range, but with a tradeoff that's not worth it. Maybe I got
| it wrong?
| amadvance wrote:
| Denormals give the guarantee that if a != b, then a - b !=
| 0.
|
| Without denormals a check before a division may fail to
| detect a division by 0.
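|
| An x86-specific sketch of that failure mode (assuming SSE
| flush-to-zero is switched on via <xmmintrin.h>; the control
| mechanism differs on other CPUs):
|
|     #include <float.h>
|     #include <math.h>
|     #include <stdio.h>
|     #include <xmmintrin.h>
|
|     int main(void) {
|         /* two distinct doubles whose difference is subnormal */
|         volatile double a = DBL_MIN;            /* smallest normal */
|         volatile double b = nextafter(a, 1.0);  /* next value up   */
|
|         _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
|
|         volatile double d = b - a;  /* subnormal result, flushed to 0 */
|         if (d != 0.0)
|             printf("guarded division: %g\n", (b - a) / d);
|         else
|             printf("a != b is %d, yet a - b == 0\n", a != b);
|         return 0;
|     }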
| marcan_42 wrote:
| I agree with you; if your code "works" with denormals and
| "doesn't work" with flush-to-zero, then it's already broken
| and likely doesn't work for some inputs anyway, you just
| haven't run into it yet.
| adrian_b wrote:
| The use of denormals is always useful except when you want
| to make your program faster no matter what and your target
| CPU happens to be a CPU where operations with denormals are
| slower than operations with other numbers.
|
| You must keep in mind that if you enable flush-to-zero and
| you really see an increase in speed, that means that you
| have now injected errors in your FP computations. Whether
| the errors matter or not, that depends on the application.
|
| If you do not see any serious speed improvement, then you
| have no reason to enable flush-to-zero. Even when you might
| want to use flush-to-zero in a production binary, you
| should not use it during development.
|
| Prior to enabling flush-to-zero it would be good to enable
| trap-on-underflow instead of denormals, to see if this
| really happens frequently enough to matter for performance
| and possibly to investigate the reason, to see if underflow
| could be avoided in other ways.
|
| In conclusion using denormals is always useful, because it
| limits the errors accumulated during FP computation. On
| some CPUs you may get higher speed by not using denormals,
| but only if large errors do not matter. If you avoid
| underflows by other means, e.g. range checking of inputs,
| then it does not matter how slow the operations with
| denormals are, so there is no reason to use any option that
| enables flush-to-zero (e.g. -ffast-math).
|
| Making a FP computation with "double" numbers and enabling
| flush-to-zero on that would usually be quite stupid,
| because if the errors do not matter, then that is a task
| for single-precision "float" (unless you need "double" only
| for the exponent range, not for precision).
| marcan_42 wrote:
| Flushing denormals to zero only matters if your calculations
| are _already_ running into the lower end of floating-point
| exponents (and even with denormals, if they're doing that,
| they're going to run into lost precision anyway sooner or
| later).
|
| The useful thing denormals do is make the loss of precision
| at that point gradual, instead of sudden. But you're still
| losing precision, and a few orders of magnitude later you're
| going to wind up at 0 either way.
|
| If your formulas are producing intermediate results with
| values that run into the lower end of FP range, _and_ it
| matters that those values retain precision there, then you're
| either using the wrong FP type or you're using the wrong
| formulas. Your code is likely already broken, the breakage is
| just rarer than in flush-to-zero mode.
|
| So just enable FTZ mode, and if you run into issues, you need
| to fix that code (e.g. don't divide by values that can be too
| close to 0), not switch denormals on.
| adrian_b wrote:
| Your arguments are correct, but the conclusion does not
| follow from them.
|
| If we assume that underflows happen in your program and
| this, as you say, is a sign that greater problems will be
| caused by that, then you must not enable flush-to-zero, but
| you must enable trap-on-underflow, to see where underflows
| happen and to investigate the reason and maybe rearrange
| your formulas to avoid the too small results.
|
| Flush-to-zero may sometimes lead to crashes, when it
| becomes obvious that something is wrong, but more often you
| just get some results with large errors that are difficult
| to distinguish from good results.
|
| Opinions obviously vary, but I have never seen any good use
| case for flush-to-zero.
|
| Underflows are either so rare that their performance does
| not matter, or if they are so frequent that flush-to-zero
| will increase the speed, then your code has a problem and
| the formulas should be restructured, which will both
| increase the speed and eliminate the loss of precision.
| stephencanon wrote:
| > Opinions obviously vary, but I have never seen any good
| use case for flush-to-zero.
|
| The classical use case is real-time audio, where an IIR
| filter may have quite slow decay, such that the signal
| stays in the subnormal regime for many samples. If this
| happens and your hardware has significant stalls for
| subnormal data, you may miss your real-time deadline
| resulting in clicking or other audio corruption.
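|
| A rough sketch of that trap: a one-pole IIR low-pass whose state
| keeps decaying after the input goes silent spends a stretch of
| samples in the subnormal range before reaching zero:
|
|     #include <math.h>
|     #include <stdio.h>
|
|     int main(void) {
|         float y = 1.0f;          /* filter state left over from a signal */
|         const float a = 0.99f;   /* feedback coefficient, slow decay     */
|         long subnormal_samples = 0;
|
|         /* input is now silence: y[n] = a * y[n-1] just decays */
|         for (long n = 0; n < 20000; n++) {
|             y = a * y;
|             if (fpclassify(y) == FP_SUBNORMAL)
|                 subnormal_samples++;   /* each may stall the FPU */
|         }
|         printf("samples spent subnormal: %ld\n", subnormal_samples);
|         return 0;
|     }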
| adrian_b wrote:
| I agree that this is a good example and there are
| probably similar cases in video processing and in
| graphics rendering.
|
| However, not all CPUs stall for subnormal data, and if CPU
| designers had ensured that none of them stall, such
| discussions would never have been needed.
|
| Outside special cases like real-time DSP and
| graphics, using flush-to-zero or options like -Ofast or
| -ffast-math is seldom justified, and they can be
| dangerous, especially when they are used without a prior
| analysis and tests to determine whether exception
| conditions really appear when the program is run, and
| why.
| stephencanon wrote:
| > However, not all CPUs stall for subnormal data, and if
| CPU designers had ensured that none of them stall, such
| discussions would never have been needed.
|
| I don't really disagree (I've pushed hard for no
| subnormal stalls for years), but there are some timing
| and area tradeoffs that come with this that make it a
| somewhat hard sell for FPUs that aren't built entirely
| around FMA (there are tradeoffs even for designs that are
| purely FMA based, but they're a lot smaller).
|
| Mainstream x86 designs _still_ have some subnormal
| stalls!
| nyanpasu64 wrote:
| > Outside such special cases like real-time DSP and
| graphics
|
| Personally I see audio and graphics as the common case,
| and scientific computing as the special case ;) Do game
| physics engines expect denormals to be preserved?
| enriquto wrote:
| It's not that you decide specifically to use them, but
| sometimes they sort of happen very naturally. For example, if
| you evaluate a Gaussian function f(x)=e^(-x^2), for moderate
| values of x the value f(x) is denormal.
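|
| A quick double-precision check of where e^(-x^2) crosses into the
| subnormal range (f(26) is still normal, f(27) is subnormal, f(28)
| underflows to zero):
|
|     #include <math.h>
|     #include <stdio.h>
|
|     int main(void) {
|         for (int x = 26; x <= 28; x++) {
|             double f = exp(-(double)x * x);
|             printf("f(%d) = %g  subnormal: %d\n",
|                    x, f, fpclassify(f) == FP_SUBNORMAL);
|         }
|         return 0;
|     }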
| PaulDavisThe1st wrote:
| If you want maximum possible accuracy for FP computation, you
| need to pay the performance price that denormals incur. In many
| cases, that will likely be a long running computation where
| responsiveness from a user-perspective and/or meeting realtime
| deadlines is not an issue. Any time "I want the best possible
| answer that we can compute", you should leave denormals
| enabled.
|
| By contrast, in media software (audio and/or video) denormals
| getting flushed to zero is frequently necessary to meet
| performance goals, and ultimately has no impact on the result.
| It's quite common for several audio FX (notably reverb) to
| generate denormals as the audio fades out, but those values are
| so far below the physical noise floor that it makes no
| difference to flush them to zero.
| rocqua wrote:
| I found the following note for -ffinite-math-only and -fno-
| signed-zeros quite worrying:
|
| The program may behave in strange ways (such as not evaluating
| either the true or false part of an if-statement) if calculations
| produce Inf, NaN, or -0.0 when these flags are used.
|
| I always thought that -ffast-math was telling the compiler: "I do
| not care about floating point standards compliance, and I do not
| rely on it. So optimize things that break the standard".
|
| But instead it seems like this also implies a promise to the
| compiler. A promise that you will not produce Inf, NaN, or -0.0.
| Whilst especially Inf and NaN can be hard to exclude.
|
| This changes the flag from saying "don't care about standards,
| make it fast" to "I hereby guarantee this code meets a stricter
| standard" where also it becomes quite hard to actually deduce you
| will meet this standard. Especially if you want to actually keep
| your performance. Because if you need to start putting all
| divisions in if statements to prevent getting Inf or NaN, that is
| a huge performance penalty.
| 0xTJ wrote:
| That's the way a number of those special flags work. They are a
| contract to the compiler saying "act like this can't happen, I
| promise it won't".
|
| Picking and choosing the specific flags to enable might be better
| in general. The flag `-funsafe-math-optimizations` seems to be
| useful as general-purpose, although as this article mentions,
| some of those optimizations need the more unsafe ones to become
| active.
|
| I would say that `-ffast-math` is a dangerously safe-sounding
| flag; `-funsafe-fast-math` would be better IMO.
| dspillett wrote:
| _> I always thought that -ffast-math was telling the compiler:
| "I do not care about floating point standards compliance, ...
| So optimize things that break the standard". But instead it
| seems like this also implies a promise to the compiler [that]
| you will not produce Inf, NaN, or -0.0._
|
| I see it as both, as with many such flags[1], though as you say
| it does put the onus on the developer to avoid inputs the
| optimisation may cause problems for. You are saying "I promise
| that I'm being careful not to do things that will be affected
| by these parts of the standard, or won't blame you for
| undefined behaviour if I do, so you can skip those rules if it
| helps you speed things up".
|
| [1] like those that enable optimisations which can cause
| significant breakage if pointer aliasing is happening
| okl wrote:
| > A promise that you will not produce Inf, NaN, or -0.0. Whilst
| especially Inf and NaN can be hard to exclude.
|
| Cumbersome but not that difficult: Range-check all input
| values.
| contravariant wrote:
| And don't do any division.
|
| Easy.
| okl wrote:
| There's nothing wrong with divisions if you exclude
| problematic combinations of input values.
| contravariant wrote:
| You're going to have to do some intermediate checks if
| you want stuff like x/sin(x) to behave.
| scheme271 wrote:
| Or multiplications or additions to avoid overflows or
| denormals
| orangepanda wrote:
| As the compiler assumes you won't produce such values, wouldn't
| it also optimise those range checks away?
| nly wrote:
| He means simple stuff like ensuring a function like
|
|     float unit_price(float volume, int count) {
|         return volume / count;
|     }
|
| is only called where count > 0
| colejohnson66 wrote:
| But couldn't the compiler optimize out the check because
| you're guaranteeing to never divide by 0, so logically
| count would never be 0. Akin to undefined behavior
| almost.
| adrian_b wrote:
| What the compiler assumes depends on the compilation
| options.
|
| Unfortunately most compilers have too permissive default
| options.
|
| C and C++ programs should always be compiled, at least
| during development, with strict options, e.g.:
|
| -fsanitize=undefined -fsanitize-undefined-trap-on-error
|
| or equivalent options, which eliminate all the horror
| stories with unexpected behavior caused by integer
| divide-by-zero or overflow, pointer overflow, out-of-
| bounds access and many others.
|
| Because, unlike integer operations, the FP operations are
| standardized, there is no need for compiler options: the
| behavior for exceptions can be controlled from the
| program with the functions defined in <fenv.h> in C or
| <cfenv> in C++.
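|
| A minimal sketch of the <fenv.h> approach (checking the sticky
| exception flags after a computation; turning them into real traps
| is platform-specific):
|
|     #include <fenv.h>
|     #include <stdio.h>
|
|     int main(void) {
|         feclearexcept(FE_ALL_EXCEPT);
|
|         volatile double x = 1e-300, y = 1e300;
|         volatile double r1 = x / y;    /* underflows to 0   */
|         volatile double r2 = y * y;    /* overflows to Inf  */
|         volatile double r3 = r2 - r2;  /* Inf - Inf is NaN  */
|
|         if (fetestexcept(FE_UNDERFLOW)) puts("underflow happened");
|         if (fetestexcept(FE_OVERFLOW))  puts("overflow happened");
|         if (fetestexcept(FE_INVALID))   puts("invalid op happened");
|         (void)r1; (void)r3;
|         return 0;
|     }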
| throwawaylinux wrote:
| This test cannot be optimized to always true:
|
|     if (count > 0) total += unit_price(volume, count);
|
| Are you confusing it with the opposite scenario:
|
|     total += unit_price(volume, count);
|     if (count == 0) printf("oops\n");
|
| That test might be optimized out because divide by zero
| is undefined behavior.
| colejohnson66 wrote:
| I probably am confusing them. I don't program in C or
| C++, so I'm not aware of all the "gotchas", and can
| easily confuse them (as shown here). Not to mention that
| compilers have been getting more aggressive in optimizing
| undefined behavior.
| jcelerier wrote:
| At least GCC is, I believe, fairly buggy in that regard, since
| isnan is optimised out; thus you cannot check if, say, an
| incoming value read from the network is a NaN. There's some
| lengthy discussion about it on the bug tracker IIRC.
| jcelerier wrote:
| I was misremembering, was thinking about this recent
| convo on LLVM mailinglist:
| https://lists.llvm.org/pipermail/llvm-
| dev/2021-September/152...
| usefulcat wrote:
| I don't think you are misremembering; -ffast-math causes
| __builtin_isnan and __builtin_isinf to unconditionally
| return zero, for both gcc and clang:
|
| https://godbolt.org/z/1qjxvW3KK
|
| I found this out the hard way several years ago and still
| have my own implementations of these functions laying
| about in case I want to use -ffast-math.
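|
| One way such a hand-rolled check can look: test the IEEE 754 bit
| pattern through an integer, so the fast-math assumption about
| float comparisons never enters the picture (a sketch, 64-bit
| doubles assumed):
|
|     #include <stdint.h>
|     #include <string.h>
|
|     /* NaN: exponent bits all ones, mantissa nonzero */
|     static int my_isnan(double x) {
|         uint64_t bits;
|         memcpy(&bits, &x, sizeof bits);
|         return (bits & 0x7ff0000000000000ULL) == 0x7ff0000000000000ULL
|             && (bits & 0x000fffffffffffffULL) != 0;
|     }
|
|     /* Inf: exponent bits all ones, mantissa zero */
|     static int my_isinf(double x) {
|         uint64_t bits;
|         memcpy(&bits, &x, sizeof bits);
|         return (bits & 0x7fffffffffffffffULL) == 0x7ff0000000000000ULL;
|     }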
| fho wrote:
| That's where more advanced type systems shine. You get to
| encode things like "this number is not zero" and a safe
| division function will only take "nonzero" values as input.
|
| The workflow would then be:
|
| 1. prove that the input is nonzero (if input == 0), yielding a
| "nonzero number"
|
| 2. run your calculation (no check needed for multiplication or
| division; if you add something you have to reprove that the
| number is nonzero)
| benttoothpaste wrote:
| The checks could be expensive though, more expensive than the
| gains from this optimization. Often it is better to let the
| calculation proceed and use isnan to check the final result.
| heavenlyblue wrote:
| > such as not evaluating either the true or false part of an
| if-statement
|
| Do you mean `if-else if` or `if-else`? Because the second would
| work and would always short-circuit to else.
| Linosaurus wrote:
| I imagine it means that the compiler could turn an
| if(x<6)-else into two free-standing if statements, like
| if(x<6), if(x>=6), which will logically work under these
| assumptions.
| adrian_b wrote:
| The whole rationale for the standard for floating-point
| operations is to specify the FP operations with such properties
| that a naive programmer will be able to write programs which
| will behave as expected.
|
| If you choose any option that is not compliant with the
| standard, that means, exactly as you have noticed, that you
| claim that you are an expert in FP computations and you know
| how to write FP programs that will give the desired results
| even with FP operations that can behave in strange ways.
|
| So yes, that means that you become responsible to either
| guarantee that erroneous results do not matter or that you will
| take care to always check the ranges of input operands, as
| "okl" has already posted, to ensure that no overflows,
| underflows or undefined operations will happen.
| celrod wrote:
| > So yes, that means that you become responsible to either
| guarantee that erroneous results do not matter or that you
| will take care to always check the ranges of input operands,
| as "okl" has already posted, to ensure that no overflows,
| underflows or undefined operations will happen.
|
| This is why I dislike the fact that it removes simple methods
| for checking if values are NaN, for example. I do not find it
| ergonomic to disable these checks.
| billfruit wrote:
| Is it advised to detect overflow and do something about it
| after an FP computation rather than adding pre-checks to try
| to avert the overflow?
| KarlKemp wrote:
| The only obvious way to test if some operation is going to
| produce some value(s) is to perform that operation. Yes,
| for, say, a single multiplication I can anticipate the
| necessary bounds for each input value. But even short
| sequences of operations would quickly devolve into
| labyrinthine nested switch statements.
| owlbite wrote:
| So it depends on how you want to detect overflow.
|
| Trapping on math exceptions or otherwise pulling an
| emergency brake doesn't really fit with modern hardware
| design if you want performance (for one thing, what do you
| do with the rest of your vector? do you just live with a
| pipeline flush?).
|
| Adding targeted checks used sparingly can work, but
| probably requires a deeper understanding of the underlying
| numerical analysis of your problem. Generally just checking
| for NaN or Inf at the end is a better solution as these are
| in most cases absorbing states.
| owlbite wrote:
| Rename to: -fdisable-math-safeties
| gpderetta wrote:
| I think the documentation is fairly clear:
|
| "Allow optimizations for floating-point arithmetic that assume
| that arguments and results are not NaNs or +-Infs."
|
| The obvious implication is that arguments and results are always
| finite, and as a programmer you are responsible for guaranteeing
| the correct preconditions.
|
| Certainly do not use finite-math-only when dealing with
| external data.
| hurrrrrrrr wrote:
| >Certainly do not use finite-math-only when dealing with
| external data.
|
| so... never?
| gpderetta wrote:
| let me rephrase: data which you cannot guarantee meet the
| preconditions. Either the input is known good data or you
| sanitize it through some code path not compiled with fast-
| math.
| praptak wrote:
| Math optimizations are not the only ones like this. Strict
| aliasing is a promise to the compiler to not make two pointers
| of different types point to the same address.
|
| You technically make that promise by writing in C (except a
| narrow set of allowed conversions) but most compilers on
| standard settings do let you get away with it.
| phkahler wrote:
| I thought the idea was to never produce NaN or -0 in the first
| place, but it seems that's not the case.
| pinteresting wrote:
| Generally if you're seeing a NaN/Inf, something has gone wrong.
| It's very difficult to gracefully recover from, and if you tried
| I think you would lose both sanity and performance!
|
| Regarding performance, the cost of a real division is about 3-4
| orders of magnitude worse than an if statement that is very
| predictable. But the usual way is to have fast/safe versions of
| functions: where you need performance and can deduce that
| something is always/never true, create/use the fast function,
| but by default everything uses the slower/safer function.
| nemetroid wrote:
| > Generally if you're seeing a NaN/Inf something has gone
| wrong
|
| That's a bold claim.
| xioxox wrote:
| As a scientist, it depends on what you mean by wrong. It's
| nice to push an array of numbers through some array
| operation. If some of the outputs look like NaN or Inf then
| that tells me something went wrong in the operation and to
| take a closer look. If some optimizer was told that NaN or
| Inf couldn't happen, meaning that they wouldn't be
| generated, then some of this output could be pure nonsense
| and I wouldn't notice. As NaNs propagate through the
| calculation they're extremely helpful to indicate something
| went wrong somewhere in the call chain.
|
| NaN is also very useful itself as a missing-data value in
| an array. If you have some regularly sampled data, with
| some items missing, it makes a lot of things easier to
| populate the holes with values which guarantee they won't
| produce misleading output.
| jes wrote:
| What makes it a bold claim?
| djmips wrote:
| Personally, I thought they were being sarcastic.
| LeanderK wrote:
| it's true in my experience. NaN/Inf was always a bug and
| resulted in needing to rewrite the calculation
| pinteresting wrote:
| Can you think of a function where the input is valid, the
| output is NaN and nothing has gone wrong in the process?
|
| I can't think of any, haven't experienced any, not heard of
| any examples of it, so you're welcome to break my ignorance
| on the subject.
| jleahy wrote:
| Calculating the average value for some observation in a
| time bucket, where some buckets may have no observation
| (resulting in 0/0=nan), finally taking some kind of
| summary of these ignoring nan values (there is a whole
| library of functions for this in numpy for example).
|
| NaN and Inf are extremely valuable and useful encodings
| to represent real things. Particularly signed infinity is
| wonderful, as functions like exp behave correctly for inf
| (eg. exp(-inf)=0).
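|
| A sketch of that pattern in C, with made-up bucket data: the empty
| bucket yields 0.0/0.0 = NaN, and the summary skips it, nanmean
| style (note this is exactly the kind of isnan test that -ffast-math
| can silently delete):
|
|     #include <math.h>
|     #include <stdio.h>
|
|     int main(void) {
|         double sums[4]   = { 6.0, 0.0, 9.0, 3.0 };
|         double counts[4] = { 3.0, 0.0, 3.0, 1.0 };  /* bucket 1 empty */
|
|         double avg[4];
|         for (int i = 0; i < 4; i++)
|             avg[i] = sums[i] / counts[i];   /* 0/0 marks it as NaN */
|
|         double total = 0.0;
|         int n = 0;
|         for (int i = 0; i < 4; i++)
|             if (!isnan(avg[i])) { total += avg[i]; n++; }
|
|         printf("mean over non-empty buckets: %g\n", total / n);
|         return 0;
|     }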
| noctune wrote:
| >Calculating the average value for some observation in a
| time bucket, where some buckets may have no observation
| (resulting in 0/0=nan)
|
| But there are many ways such a calculation could result
| in NaN besides a division. For example, intermediate sums
| could result in something like inf - inf. I find it rare
| that you can definitely conclude that a NaN means some
| specific thing.
| ChrisLomont wrote:
| There are algorithms that behave correctly but require
| intermediate Inf or NaN to work. If I recall there is one
| related to finding bounding box intersections in ray
| tracing. I know when I introduce such code into a
| codebase I put a large "DO NOT CHANGE THIS..." style
| comment with an explanation so some enterprising
| programmer doesn't put some if statements in to remove
| what they think is an error, when it is not.
|
| I'd suspect the same is true for NaN. These things were
| introduced for reasons. If you don't have a deep
| knowledge of why, don't assume everyone else is the same.
|
| Edit: I found examples using NaN and Inf intermediates
| to get a large speed increase [1,2].
|
| [1] https://tavianator.com/2015/ray_box_nan.html
|
| [2] https://tavianator.com/2011/ray_box.html
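|
| A sketch in the flavor of the linked posts (not a copy of them): a
| slab test where a zero direction component produces +-Inf, and
| fmin/fmax absorb the NaN that appears when the origin lies exactly
| on a slab plane:
|
|     #include <math.h>
|     #include <stdbool.h>
|
|     /* true if origin + t*dir (t >= 0) hits the axis-aligned box */
|     bool ray_hits_box(const double origin[3], const double dir[3],
|                       const double bmin[3], const double bmax[3]) {
|         double tmin = 0.0, tmax = INFINITY;
|         for (int i = 0; i < 3; i++) {
|             double inv = 1.0 / dir[i];   /* +-Inf for axis-parallel */
|             double t1 = (bmin[i] - origin[i]) * inv;
|             double t2 = (bmax[i] - origin[i]) * inv;
|             /* fmin/fmax return the non-NaN operand, so 0 * Inf = NaN
|                is absorbed instead of poisoning the interval */
|             tmin = fmax(tmin, fmin(t1, t2));
|             tmax = fmin(tmax, fmax(t1, t2));
|         }
|         return tmin <= tmax;
|     }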
| dkersten wrote:
| Sure, of course they have uses otherwise why would they
| exist. But I would argue that the examples you have
| aren't the common case -- most people's floating point
| calculations do not rely on these special values and
| seeing them denotes something went wrong.
| ChrisLomont wrote:
| Of course they're not the common case, but the parent
| wanted examples where they are used in calculations that
| do have nice answers.
|
| So they are useful. They are more useful than most people
| realize; many simply assume they only mark errors.
|
| And without them in your case, you would not be alerted
| that a calculation went wrong. So even there they are
| very useful. Not using them correctly is like ignoring
| file API errors in your programs - sure they are rare,
| but you need to understand and handle them if you want to
| make good software.
| dkersten wrote:
| Ah right, fair enough! They are, indeed, good examples of
| the usefulness.
| ImprobableTruth wrote:
| Do you count under/overflow resulting in Inf as
| 'something has gone wrong'? If so, why would gracefully
| recovering be hard?
| vanviegen wrote:
| Because you're about 15 stack frames deep into some
| calculation when the problem gets noticed. So how do you
| handle this? You could throw an exception (or similar),
| which would allow the UI to state something semi-helpful
| like 'Calculation failed'. But recover? Usually not.
|
| I think NaN and inf can be really useful in specific
| cases, but they probably shouldn't be allowed to
| propagate through your program, and shouldn't occur
| unless you're specifically making use of them.
| adrian_b wrote:
| When you do not want such values to be propagated, it is
| always possible to choose to have overflow exceptions
| instead of infinities and undefined operation exceptions
| instead of NaNs. Also underflow exceptions instead of
| denormals.
|
| The programmer should choose which is most appropriate
| for an application. Choosing to both ignore the
| exceptions and not generate a special value that can be
| examined later is seldom acceptable.
| MaulingMonkey wrote:
| JavaScript's parseInt and parseFloat return NaN on
| something as benign as "well, that wasn't a number".
|
|     var jsonish = "...";
|     var number = parseFloat(jsonish);
|     if (!isNaN(number)) {
|         // deserialized a number
|     } else {
|         // ...try deserializing as a boolean or string next...
|     }
|
| Fast math optimizations can break code like this by
| breaking isNaN.
|
| I was porting a C++ project to a certain platform - and
| that platform enabled a -ffast-math equivalent by default
| in Release (but not Debug) builds! This broke duktape, a
| JS engine said project embedded, in some nasty and subtle
| ways. Instead of storing a number/pointer/??? (8 bytes) +
| type tag (4? bytes) for each dynamically typed JS value,
| duktape can bit-pack values into a single 8 byte "double"
| value by storing object/string handles as NaN values -
| this isn't an uncommon trick for dynamically typed
| scripting stuff:
|
| https://github.com/svaarala/duktape/blob/c3722054ea4a4e50
| f48...
|
| Naturally, the -ffast-math equivalent broke isNaN checks,
| which caused random object/string handles to be
| mistakenly reinterpreted as "numbers" - but only in
| Release builds, for this one particular platform, in one
| rarely taken branch, so neither QA nor CI caught it,
| leading to hours of manufacturing a repro case, stepping
| through an absurd amount of code, and then finally
| looking at the default build rules and facepalming.
|
| Cursing the platform vendor under my breath, I overrode
| the defaults to align with the defaults of every other
| config x platform combination we already had: no fast
| math. If you want those optimizations, use SSE-friendly
| NaN-avoiding intrinsics - or, if you _must_ use the
| compiler flags, ensure you do so consistently across
| build configs and platforms, perhaps limited to a few TUs
| or modules if possible. This allows you to have a chance
| at using your Debug builds to debug the resulting
| "optimizations".
| jacobolus wrote:
| This depends how you define "gone wrong". In evaluating
| rational functions (a useful tool for approximating all
| sorts of other functions), one efficient and well behaved
| algorithm is the "barycentric formula", based on
| interpolating function values at various specific input
| points. When the input is one of those points directly,
| this formula results in Inf / Inf = NaN. Evaluation code
| needs to check for this case, and then replace the output
| by the appropriate value for the input.
|
| +-Inf is a very natural output for many kinds of
| numerical codes.
| jlokier wrote:
| > When the input is one of those points directly,
|
| That's against one of the rules of well-behaved floating
| point programming: Never test for equality.
| gjm11 wrote:
| It's worth being a bit more explicit about this.
|
| The appearance of NaNs doesn't result from breaking the
| "never test for equality" rule. It happens when the point
| where you're evaluating the function _just happens to_
| exactly match one of the interpolation points.
|
| But if you decide to deal with the NaN issue by (1)
| testing whether your input exactly matches any of the
| interpolation points, or (2) testing the result for NaN,
| then you're testing for equality or doing something very
| similar, and then you may have one of the problems the
| "never test for equality" is meant to prevent: if bad
| stuff happens when x _exactly_ equals one of the x_j,
| then perhaps less-obvious bad stuff happens when x
| _almost exactly_ equals one of the x_j, and an equality
| test won't catch that.
|
| So, does it? Actually, I think not. Suppose x = xj+h with
| h very small. Then your interpolant is (h times something
| of normal size) times (1/h times something of normal size
| + smaller terms) and although one of those is very small
| and the other very large, ordinary floating-point
| arithmetic doesn't care about that. There's no
| catastrophic cancellation or anything. And floating point
| has much more dynamic range than precision, so to speak,
| so until h literally reaches zero you aren't running into
| trouble. (I guess there might be ways of organizing the
| calculation that _do_ give you catastrophic cancellation,
| but if you do it the obvious way then that doesn't
| happen.)
|
| So in this particular case you really could just test for
| equality or test for NaNs. I think.
|
| (For some interesting investigation of other aspects of
| the numerical stability of this kind of interpolation,
| see https://people.maths.ox.ac.uk/trefethen/publication/P
| DF/2011.... But note that the authors of this paper are
| using the formula for _extrapolation_ or, as they prefer
| to call it, "numerical analytic continuation".)
| jacobolus wrote:
| > _perhaps less-obvious bad stuff happens when x almost
| exactly equals one of the x_j, and an equality test won't
| catch that. So, does it? Actually, I think not._
|
| You are correct, it does not. Quite the opposite: the
| results are excellent when x doesn't quite equal one of
| the interpolation nodes, because you end up with a value
| of S * y / S, where S = <<very big but not close to
| overflowing float>> and y = <<expected output value>>.
|
| The only potentially problematic case is if S * y
| overflows but S does not. But floating point numbers have
| a ton of dynamic range, so unless you are looking at
| points _really_, _really_ close to zero, with very large
| outputs (say, x - x_j = 1e-200, y = 1e+200), there is no
| problem. And the failure case there is that you end up
| with Inf instead of 1e200 as an output.
|
| It isn't a problem in practice, and if you are worried
| about it as a problem in theory, you can take care in
| choice of interpolation nodes or do some scaling of the
| output values.
| jlokier wrote:
| > There's no catastrophic cancellation or anything. And
| floating point has much more dynamic range than
| precision, so to speak, so until h literally reaches zero
| you aren't running into trouble.
|
| I just tried it to be sure, in C with "double", and
| organized the calculation the way it's described in the
| paper you cited, formula 2.1.
|
| It failed due to catastrophic rational cancellation close
| to an interpolation point, depending on the values
| involved. An equality test was not enough.
|
| Approaching the first interpolation point, before it
| transitioned from the correct result, there were some
| incorrect Inf results prior to it switching to NaN for an
| input not equal to any interpolation point.
|
| This was trivial 1d interpolation between two points,
| where the points were (0.0 => 10.0, 1.0 => 20.0).
| gjm11 wrote:
| Hm, interesting. I also tried some numerical experiments
| to sanity-check and I got reasonable results right up to
| 1ulp away from the interpolation points. I tried again
| using exactly your example -- f(x) = 10+10x,
| interpolation points at 0 and 1 -- and again everything
| seemed fine.
|
| I wonder what we're doing differently, or whether we
| happened to test at different points. Here's my code:
|     def interpolate1(f,x,xs,ws):
|         p,s = 1,0
|         for (xj,wj) in zip(xs,ws):
|             p *= x-xj
|             s += wj*f(xj)/(x-xj)
|         return p*s
|
| (it's "interpolate1" because I also implemented the other
| version of the formula described in the article, which
| for this purpose behaves very similarly).
|
| For your example we want f(x) = 10+10x, xs=[0,1],
| ws=[-1,1].
|
| As I mentioned before, if you implement the calculation
| of the polynomial called l(x) in the paper -- the thing
| my code calls p -- in an unfortunate way, you can get
| cancellation problems, but for this case I'm not sure I
| see any way for that to happen.
|
| I tried calculating l(x) in a different way that could
| conceivably cause cancellation issues, aimed at this
| particular interpolant:
|
|     def foo(x):
|         p = x*x-x
|         s = -1*10/x + 1*20/(x-1)
|         return p*s
|
| but this also gave reasonable results for every value of
| x I tried other than exactly 0 and exactly 1.
|
| What exact values of x are giving you what bad results?
| What language are you implementing this in, with what
| data types?
| jlokier wrote:
|     #include <stdio.h>
|     #include <math.h>
|
|     double interp(double x, double x1, double y1, double x2, double y2)
|     {
|         double node_poly = (x - x1) * (x - x2);
|         double w1 = 1.0 / (x1 - x2);
|         double w2 = 1.0 / (x2 - x1);
|         double y = node_poly *
|             ((w1 * y1) / (x - x1) + (w2 * y2) / (x - x2));
|         return y;
|     }
|
|     int main()
|     {
|         for (int p = -300; p >= -330; p -= 1) {
|             double x = pow(10, p);
|             double y = interp(x, 0, 10, 1, 20);
|             printf("x = %.6g, y = %.6g\n", x, y);
|         }
|         return 0;
|     }
| jacobolus wrote:
| You implemented the wrong formula. You don't want to
| explicitly calculate "node poly".
|
| The formula you want is the "second barycentric formula",
| (4.2) at the top of page 505 of
| https://people.maths.ox.ac.uk/trefethen/barycentric.pdf
|
| Or formula (2.6) halfway down page 2 of https://people.ma
| ths.ox.ac.uk/trefethen/publication/PDF/2011...
|
| Also cf. Higham (2004) "The numerical stability of
| barycentric Lagrange interpolation" https://www.maths.man
| chester.ac.uk/~higham/narep/narep440.pd...
|
| If you want to implement the "first barycentric formula"
| instead you need to take some additional care to avoid
| overflow/underflow.
|
| As the papers above note, the choice of interpolation
| nodes makes a big difference. If you have an arbitrary
| function to approximate on an arbitrary complex domain,
| take a look at https://arxiv.org/abs/1612.00337
| jlokier wrote:
| Thanks for the pointer, and to the paper showing how to
| calculate error bounds. Interesting paper.
| gjm11 wrote:
| There are advantages to using either version of the
| formula. If you use the "first" then you do need to take
| care about overflow/underflow when the number of
| interpolation points is large, but that isn't an issue
| here. The "first" behaves better than the "second" for
| extrapolation, as discussed in the paper by Trefethen et
| al. And yes, the choice of interpolation points is
| important.
|
| But none of that should make a difference here. What's
| actually causing the dubious results jlokier reports
| isn't exactly proximity to the interpolation points -- I
| don't think it could happen if one of them weren't
| exactly zero. What's happening is that the very smallest
| values of x in jlokier's code are _denormal_ numbers;
| that is, ones whose floating-point representation has
| less precision than usual.
|
| Usually a floating-point number looks like 1.<stuff> x
| 2^n; for IEEE double precision, the <stuff> is 52 bits
| long. So if you start with 1, 1/2, 1/4, 1/8, and keep
| dividing by 2, you always have stuff=0 and n decreases by
| 1 each time. But what should you do once you reach the
| smallest value of n that will fit in the floating-point
| format? One answer is just to say that after that you get
| zero. But you can get a little extra dynamic range if
| instead you say that when n is as small as it's allowed
| to be, you can use 0.<stuff> instead of 1.<stuff>. These
| numbers have reduced precision for the sake of extra
| dynamic range, and they are called "denormal". Lots of
| things go subtly (or not so subtly) wrong when dealing
| with denormal numbers.
|
| One thing that goes wrong is the interpolation formula
| we're discussing here. The formula looks like f(x) ~=
| l(x) . sum of wj f(xj) / (x-xj) where xj are the
| interpolation points (in this case 0 and 1), l(x) is the
| product of (x-xj), so in this case x(x-1), and wj = 1 /
| product (xj-xi) where xi ranges over all the _other_
| interpolation points besides xj, so w0 = -1 and w1 = +1.
| So we have f(x) ~= x(x-1) . [-10 /x + 20/(x-1)]. In fact
| this is _exact_ provided x is neither 0 or 1, because the
| thing we 're approximating is a polynomial of degree <=
| (# interpolation points) - 1.
|
| So, what I claimed before was: suppose x = xj + h where h
| is small, and suppose you compute l(x) as the product of
| its factors. Then we get l(x) = h times product
| (xj+h-xi), and the sum is wj f(xj) / h + the sum of the
| other terms; and if h is tiny then the other terms in the
| sum are tiny in comparison, and the factors of h and 1/h
| cancel and everything's fine.
|
| But that doesn't happen when xj=0 and x (and hence h) is
| denormal -- because you can't pull the denormal trick at
| the _large_ end of the range, so the reciprocal of a
| denormal floating-point number is floating-point
| infinity!
|
| This isn't a cancellation problem. It isn't a matter of
| the formula going unstable when x is very close to one of
| the interpolation points. But it _is_ a problem, and it
| never occurred to me until jlokier demonstrated it,
| because denormals are kinda obscure and I didn't think
| to think about what might happen if they get involved. It
| _would_ be avoided if you made your interpolator check
| for values _very close_ to one of the xj. But -- I
| _think_, but this stuff is treacherous -- it would also
| be avoided if you checked for infinities as well as NaNs
| on the output side, which would be more efficient
| (provided your inputs are _usually_ sensible). Or -- I
| _think_, but this stuff is treacherous -- you could just
| check explicitly for denormal values of x and maybe of
| the xj, because if none of those is denormal this
| pathology can't arise.
| jlokier wrote:
| Let's say we switch to the "second barycentric formula":
|     double interp(double x, double x1, double y1, double x2,
|                   double y2)
|     {
|         double w1 = 1.0 / (x1 - x2);
|         double w2 = 1.0 / (x2 - x1);
|         double num = (w1 * y1) / (x - x1) + (w2 * y2) / (x - x2);
|         double denom = w1 / (x - x1) + w2 / (x - x2);
|         return num / denom;
|     }
|
| This still gives incorrect +Inf results in an interval
| around the first interpolation point.
|
| The interval is small for parameters interp(x, 0, 10, 1,
| 20). But interp(x, 0, 1e50, 1, 2e50) shows the interval
| can be much larger, as it depends on the y values.
|
| Both formulas return incorrect results in an input range
| larger than the denormals (1e50 times larger in the
| second example above). Plenty of "normal" inputs that can
| arise from other calculations. And these parameters are
| not unusual either.
|
| So you can't ensure a correct result from either
| barycentric formula in floating point by just checking
| for denormal input.
|
| I think you may be right that checking for +/-Inf result
| as well as NaN may be an adequate test, but I wouldn't
| assume it without a proof or some paper demonstrating
| that for sure.
| gjm11 wrote:
| Nice catch on using enormous values of y, where indeed it
| isn't just denormal values of x that cause trouble. Still
| not a catastrophic-cancellation problem, though. Let's
| look more closely. Suppose x is very very close to 0. (In
| particular, close enough that x-1 is indistinguishable
| from -1. This is not a source of trouble in this case.)
|
| The weights w1,w2 are -1 and +1, as before.
|
| The numerator is -10^50/x - 2*10^50. The denominator is
| -1/x - 1. If x is small enough (which it is in the cases
| that give spurious infinities) the first term in each of
| these is enough larger that adding the second doesn't
| change it; so what you're really computing is (-10^50/x)
| / (-1/x). And if x is small enough that 10^50/x is
| infinite, then _boom_.
|
| Something similar could happen even without numbers as
| large as 10^50. If the value at 0 were 3 instead, then
| values of 1/x bigger than 1/3 the largest representable
| non-infinity would produce the same problem. Using
| f(0)=10^50 enlarges the range in which this particular
| overflow happens by a factor of 10^50, though.
|
| The fact that this isn't an exotic catastrophic-
| cancellation problem doesn't, of course, detract from
| your original point (with which, for the avoidance of
| doubt, I agree): a calculation that can blow up for
| certain exact values is likely to misbehave for values
| that are merely close to those, so testing for the exact
| values is likely to be a very bad idea.
|
| If I were actually writing something that does
| interpolation in this way, I would certainly not want to
| trust that checking the results for infs and nans was
| sufficient (without, as you say, either proving it or
| finding someone else who had -- but I suspect it's
| actually only _almost_ true).
| jacobolus wrote:
| If x is very small (<1e-100 or something) you can
| alternatively evaluate a linear approximation accurate
| near 0. If y is going to be enormous (>1e+100 or
| something) you should rescale it.
| adrian_b wrote:
| Like any rule, this has exceptions.
|
| You should never test for equality numbers which are
| known only approximately, which is the case for most FP
| numbers.
|
| There are nonetheless cases when certain FP values are
| known exactly and it is OK to test them for equality.
|
| In general the rule does not depend on number
| representation, it applies equally to floating-point
| numbers, fixed-point numbers, rational numbers and even
| large integers in some cases.
|
| What counts is whether a number is known exactly or only
| approximately.
| jlokier wrote:
| There are exceptions.
|
| But the type of barycentric interpolation formula that
| leads to Inf/Inf at an interpolation point _isn't_ one
| of those exceptions - it's an example which shows why the
| rule should be borne in mind.
|
| When the input is very close to one of the interpolation
| points, the calculation becomes ill-conditioned,
| unsurprisingly as two values are diverging towards Inf,
| so the result can become incorrect as the input
| approaches the interpolation point, until it becomes
| Inf/Inf = NaN at some even closer, but non-zero,
| distance. Before reaching Nan it can also have an
| interval where the result is +/-Inf.
|
| It depends on the interpolation points. But under some
| circumstances, it is necessary to recognise when the
| input is _close_ to an interpolation point, and adjust
| the formula appropriately.
| twic wrote:
| What is the rationale for liking rational functions?
|
| Years ago I had some interpolation to do, and used the
| barycentric rational from Boost [1]. The resulting curve
| was all over the place. A natural cubic spline gave a
| much more subjectively sensible-looking fit.
|
| [1] https://www.boost.org/doc/libs/1_77_0/libs/math/doc/h
| tml/mat...
| jacobolus wrote:
| Without knowing a lot more about the context and what
| precisely you did, it's impossible to comment about your
| negative experience.
|
| > _What is the rationale for liking rational functions?_
|
| They are cheap to evaluate, (relatively) easy to prove
| things about symbolically, and will efficiently
| approximate any reasonably smooth function.
| stncls wrote:
| An example use of Inf is bounds on a value. Say you want to
| express that x sometimes has upper and/or lower bounds. With
| Inf you can simply write
|
|     l <= x <= u
|
| accepting that l is sometimes -Inf and u is sometimes +Inf.
| Without Inf, you get four different cases to handle. This is
| particularly handy when operations get slightly more complex,
| like transforming
|
|     l' <= x + b <= u'
|
| into the canonical form above, for some finite b. With Inf
| you can simply write
|
|     l = l' - b
|     u = u' - b
|
| and things will work as one expects. Again, without Inf,
| multiple cases to handle correctly.
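|
| A tiny sketch of that in C, with a hypothetical bounds struct
| where a missing bound is simply +-Inf:
|
|     #include <math.h>
|     #include <stdbool.h>
|     #include <stdio.h>
|
|     typedef struct { double lo, hi; } bounds;  /* may be +-Inf */
|
|     static bool in_bounds(bounds b, double x) {
|         return b.lo <= x && x <= b.hi;   /* no special cases */
|     }
|
|     /* canonicalize  l' <= x + b <= u'  into  l <= x <= u */
|     static bounds shift(bounds b, double offset) {
|         /* Inf - finite stays Inf, so unbounded sides just work */
|         return (bounds){ b.lo - offset, b.hi - offset };
|     }
|
|     int main(void) {
|         bounds upper_only = { -INFINITY, 10.0 };
|         bounds shifted = shift(upper_only, 3.0);   /* { -Inf, 7.0 } */
|         printf("%d %d\n", in_bounds(shifted, 5.0),
|                           in_bounds(shifted, 8.0));  /* prints 1 0 */
|         return 0;
|     }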
| beagle3 wrote:
| I often use NaNs on purpose to designate "unknown", and their
| properties of "infecting" computations mean that I don't have
| to check every step - just at the end of the computation.
|
| R goes even farther, and uses one specific NaN (out of the
| million-billions) to signal "not available", while other NaNs
| are just NaNs.
|
| Its properties, used properly, make code much more simple and
| sane.
| kloch wrote:
| NaN usually indicates a bug _or_ algorithmic problem, but not
| always. I have had cases where I need to detect NaN and present
| a useful message to the user even though everything is
| working as designed.
| SeanLuke wrote:
| * x+0.0 cannot be optimized to x because that is not true when x
| is -0.0
|
| Wait, what? What is x + -0.0 then? What are the special cases?
| The only case I can think of would be 0.0 + -0.0.
| foo92691 wrote:
| Seems like it might depend on the FPU rounding mode?
|
| https://en.wikipedia.org/wiki/Signed_zero#Arithmetic
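|
| The special case is just x = -0.0 itself: in the default
| round-to-nearest mode, -0.0 + 0.0 is +0.0, so folding x + 0.0 to x
| would flip the sign of the result. A quick check:
|
|     #include <math.h>
|     #include <stdio.h>
|
|     int main(void) {
|         double x = -0.0;
|         double folded   = x;        /* what the folded x + 0.0 gives */
|         double computed = x + 0.0;  /* what IEEE arithmetic gives    */
|         printf("folded:   %g (signbit %d)\n", folded, !!signbit(folded));
|         printf("computed: %g (signbit %d)\n", computed,
|                !!signbit(computed));
|         return 0;
|     }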
| mhh__ wrote:
| If you use the LLVM D compiler you can opt in or out of these
| individually on a per-function basis. I don't trust them
| globally.
| gpderetta wrote:
| with gcc you can also use #pragma GCC optimize or
| __attribute__((optimize(...))) for a similar effect.
|
| It is not 100% bug free (at least it didn't used to be) and often
| it prevents inlining a function into another having different
| optimization levels (so in practice its use has to be coarse
| grained).
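|
| For example, something along these lines (GCC syntax; exactly which
| options are honored per-function varies by version):
|
|     /* compiled with fast-math-style assumptions */
|     __attribute__((optimize("-ffast-math")))
|     double dot_fast(const double *a, const double *b, int n) {
|         double s = 0.0;
|         for (int i = 0; i < n; i++)
|             s += a[i] * b[i];    /* may be reassociated/vectorized */
|         return s;
|     }
|
|     /* same code, default strict FP semantics */
|     double dot_strict(const double *a, const double *b, int n) {
|         double s = 0.0;
|         for (int i = 0; i < n; i++)
|             s += a[i] * b[i];
|         return s;
|     }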
| nly wrote:
| This pragma doesn't quite work for -ffast-math
|
| https://gcc.godbolt.org/z/voMK7x7hG
|
| Try it with and without the pragma, and adding -ffast-math to
| the compiler command line. It seems that with the pragma
| sqrt(x) * sqrt(x) becomes sqrt(x*x), but with the command
| line version it is simplified to just x.
| gpderetta wrote:
| That's very interesting. The pragma does indeed do a lot of
| optimizations compared to no-pragma (for example it doesn't
| call sqrtf at all), but the last simplification is only
| done with the global flag set. I wonder if it is a missed
| optimization or if there is a reason for that.
|
| edit: well with pragma fast-math it appears it is simply
| assuming finite math and x>0 thus skipping the call to the
| sqrtf as there are not going to be any errors, basically
| only saving a jmp. Using pragma finite-math-only and an
| explicit check are enough to trigger the optimization (but
| passing finite-math as a command-line flag is not).
|
| Generally it seems that the various math flags behave
| slightly differently in the pragma/attribute.
| stabbles wrote:
| In Julia you can do `@fastmath ...`, which is basically just a
| find-and-replace of math operations and does not propagate into
| functions called in that code block, even when they are
| ultimately inlined. So what it does is:
|
|     julia> @macroexpand @fastmath x + y
|     :(Base.FastMath.add_fast(x, y))
|
| And maybe that's a good thing, because the scope of @fastmath
| is as limited as it gets.
| celrod wrote:
| Yeah, I really like that Julia and LLVM allow applying it on
| a per-operation basis.
|
| Because most of LLVM's backends don't allow for the same
| level of granularity, they do end up propagating some
| information more than I would like. For example, marking an
| operation as fast lets LLVM assume that it does not result in
| NaNs, letting nan checks get compiled away even though they
| themselves are not marked fast: julia>
| add_isnan(a,b) = isnan(@fastmath(a+b)) add_isnan
| (generic function with 1 method) julia>
| @code_llvm debuginfo=:none add_isnan(1.2,2.3) define i8
| @julia_add_isnan_597(double %0, double %1) #0 { top:
| ret i8 0 }
|
| meaning it remains more dangerous to use than it IMO should
| be. For this reason, LoopVectorization.jl does not apply
| "nonans".
| p0nce wrote:
| +1 Every time I tried, -ffast-math made things a bit slower.
| It's not that valuable with LLVM.
| m4r35n357 wrote:
| Scary stuff. I really hope the default is -fno-fast-math!!!
___________________________________________________________________
(page generated 2021-10-20 23:02 UTC)