[HN Gopher] Someone's Been Messing with My Subnormals
___________________________________________________________________
Someone's Been Messing with My Subnormals
Author : jpegqs
Score : 260 points
Date : 2022-09-06 15:14 UTC (7 hours ago)
(HTM) web link (moyix.blogspot.com)
(TXT) w3m dump (moyix.blogspot.com)
| benreesman wrote:
| That's...terrifying. This is a fantastic find: big, big respect
| to @moyix, this is going to save people's ass.
| olliej wrote:
| Wow, I am surprised that -ffast-math triggers a mode switch
| in the FPU: partly because of the author's library problem,
| but also because the documentation for clang, at least[1],
| does not say it impacts the behaviour of denormals, and in
| fact has a separate mode switch for that which is not
| explicitly called out as being implied by -ffast-math.
|
| [1] https://clang.llvm.org/docs/UsersManual.html#cmdoption-
| ffast...
| nsajko wrote:
| -Ofast isn't a good name for the option, but in GCC's defense the
| manual is pretty clear about all this, and there's no excuse for
| blindly turning on compiler options - they literally change the
| semantics of your code.
| bombcar wrote:
| It's a quirk of language that, for compiler writers and
| other algorithmic people, "fast" often means "ballpark, but
| damn quick".
|
| It's hard to come up with a similar name that isn't long.
| cesarb wrote:
| > It's hard to come up with a similar name that isn't long.
|
| The suggestion given elsewhere in these comments to call it
| "unsafe math" instead of "fast math" sounds good. It's nearly
| as short, and properly conveys the "you must know what you're
| doing" aspect of these flags. It's even better if you're used
| to Rust.
| actually_a_dog wrote:
| I agree. I think -ffast-math should actually be called
| -finexact-math. One would also hope that explicitly
| disabling an option on the command line would, you know,
| explicitly disable the option, but maybe that's too much to
| ask.
| mbauman wrote:
| I don't think it should exist at all. It's such a crazy grab
| bag of code changes disguised as "optimizations" that it's
| completely impossible to reason about, even for folks that
| "don't care" about the exact floating point arithmetic.
|
| It has global effects like those in TFA, and even locally you
| no longer know if a line or two of arithmetic will become
| more precise (e.g., by using higher precision intermediate
| results), less precise, or become complete gibberish (e.g.,
| because it thinks it can prove you're now dividing by zero
| and thus can just return whatever it wants).
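|
| As one illustration (a common example, not from TFA; exact
| behavior depends on compiler and version): -ffast-math
| implies -ffinite-math-only, under which the compiler may
| assume NaNs don't exist and fold a NaN self-test away
| entirely:
|
|     /* Compiled with -ffast-math, many compilers reduce this
|        to "return 0": x != x is "provably" false once NaNs
|        are assumed away. */
|     int my_isnan(double x) {
|         return x != x;
|     }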
| trelane wrote:
| -fyolo-math?
|
| -fgoodenough-math?
|
| -fbroken-but-fast-math
| mbauman wrote:
| I wholeheartedly disagree.
|
|     -Ofast
|         Disregard strict standards compliance. ...
|
| There's strict standards compliance and then there's the crazy
| grab bag of code changes that is `-ffast-math`. Further, I'd
| say gevent can defensibly claim that -ffast-math is okay for
| them given what the manual says:
|
|     -ffast-math
|         ... it can result in incorrect output for programs
|         that depend on an exact implementation of IEEE or ISO
|         rules/specifications for math functions. It may,
|         however, yield faster code for programs that do not
|         require the guarantees of these specifications.
|
| This is 100% on the compiler people. For the option name, the
| documentation, and the behavior.
|
| https://gcc.gnu.org/onlinedocs/gcc-12.2.0/gcc/Optimize-Optio...
| nsajko wrote:
| Well, how would you improve the docs? Both documentation
| entries seem reasonable to me.
|
| That said, I don't see why the -Ofast option even needs to
| exist, except backwards compatibility, as -ffast-math and the
| others can (and should IMO) be specified explicitly.
| mrguyorama wrote:
| The fact that -ffast-math makes no mention that it will
| poison any other code executing in your process space is a
| huge missing piece of info. As the docs are written, anyone
| not doing scientific math would think they should use that
| flag, but the reality is that most people have some code
| somewhere in their process that expects fairly sane floating
| point math
| behavior, even if it's just displaying progress bars or
| something.
| nsajko wrote:
| > The fact that -ffast-math makes no mention that it will
| poison any other code executing in your process space
|
| Untrue. The doc entry for -ffast-math says "can result in
| incorrect output for _programs_ that depend on an exact
| implementation of IEEE or ISO rules/specifications for
| math functions". Emphasis mine.
|
| So they clearly say that the entire program can become
| invalid when -ffast-math is used.
|
| You and some other people here act like the docs say
| "translation unit" or something like that, instead of
| "program", but this is simply not the case.
|
| Furthermore, the entry for -ffast-math points to entries
| for suboptions that -ffast-math turns on (located right
| below in the man page), e.g. -funsafe-math-optimizations.
| These also make clear how dangerous they can be even when
| turned on one at a time.
| Athas wrote:
| Consider the documentation for the similar compiler flag in
| the OpenCL specification:
|
| > -cl-unsafe-math-optimizations
|
| > Allow optimizations for floating-point arithmetic that
| (a) assume that arguments and results are valid, (b) may
| violate IEEE 754 standard and (c) may violate the OpenCL
| numerical compliance requirements as defined in section 7.4
| for single-precision floating-point, section 9.3.9 for
| double-precision floating-point, and edge case behavior in
| section 7.5. This option includes the -cl-no-signed-zeros
| and -cl-mad-enable options.
|
| While it stops short of saying "this will likely break your
| code" (maybe because it doesn't have the nonlocal effects
| of -ffast-math), it makes it much more clear that this flag
| is generally unsafe and fragile, except under rather
| specific circumstances. Also, it is reasonably exact about
| what those circumstances are. I'm not sure -ffast-math is
| documented with enough precision for a programmer to even
| know whether it will break their code. The best you can do
| is try it and see if the program still works.
| nsajko wrote:
| The relevant GCC man page entries are even more clear
| than the OpenCL spec excerpt.
|
| -ffast-math:
|
| > This option is not turned on by any -O option besides
| -Ofast since it can result in incorrect output for
| programs that depend on an exact implementation of IEEE
| or ISO rules/specifications for math functions.
|
| It also points to the -funsafe-math-optimizations sub-
| option, where it is said that:
|
| > Allow optimizations for floating-point arithmetic that
| (a) assume that arguments and results are valid and (b)
| may violate IEEE or ANSI standards. When used at link
| time, it may include libraries or startup files that
| change the default FPU control word or other similar
| optimizations. [...]
| mbauman wrote:
| Yes, exactly: I'd deprecate it entirely. It shouldn't be a
| single flag.
| fweimer wrote:
| What's missing is that it also affects linking, and results in
| this strange action-at-a-distance. Maybe disabling the linker
| part with -shared would be a reasonable compromise.
| nsajko wrote:
| You're wrong, both the doc entry for -Ofast and the one for
| -ffast-math say that they can result in incorrect _programs_.
| Programs are produced by linking, so I don't see how else
| this could be interpreted.
| brigade wrote:
| Why not simply replace all FP math with a constant zero?
| That'd be _really_ fast and an equally valid strict
| interpretation of "can result in incorrect programs."
| nsajko wrote:
| See https://news.ycombinator.com/newsguidelines.html,
| e.g.:
|
| > Please don't post shallow dismissals, especially of
| other people's work. A good critical comment teaches us
| something.
| brigade wrote:
| Just because you're shallowly dismissing my comment
| doesn't make it wrong.
|
| Linking in code with undefined (in this case, _re_defined)
| behavior doesn't automatically invalidate the entire
| program. But that's the language used because once
| the undefined behavior is hit at runtime, the spec no
| longer defines what the behavior is and what the program
| will do afterwards.
| Const-me wrote:
| That thread-local MXCSR register is particularly entertaining in
| a thread pool environment, such as OpenMP. OSes carefully
| preserve that piece of thread state across context switches.
|
| I tend to avoid touching that value, even when it means extra
| instructions like roundpd for specific rounding mode, or shuffles
| to avoid division by 0 in the unused lanes.
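|
| For illustration, a minimal sketch (assuming x86 with SSE
| and the _mm_getcsr/_mm_setcsr intrinsics from <xmmintrin.h>)
| of preserving MXCSR across a hypothetical untrusted call;
| note that 0x8040 is exactly the FTZ|DAZ mask that
| crtfastmath sets:
|
|     #include <xmmintrin.h>  /* _mm_getcsr / _mm_setcsr */
|
|     #define MXCSR_FTZ 0x8000u  /* flush-to-zero */
|     #define MXCSR_DAZ 0x0040u  /* denormals-are-zero */
|
|     void call_untrusted(void (*fn)(void)) {
|         unsigned saved = _mm_getcsr(); /* save control/status */
|         fn();                  /* may flip FTZ/DAZ behind us */
|         _mm_setcsr(saved);     /* restore our FP environment */
|     }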
| mananaysiempre wrote:
| Following the article's links, I fail to find an actual example
| of anything failing to converge in flush-subnormals mode. I mean,
| I'm sure one could be squeezed out, but the justification given
| amounts to "Sterbenz's lemma [the one that rephrases
| "catastrophic cancellation" as "exact differences"] fails, maybe
| something somewhere also will". And my (shallow but not
| nonexistent) experience with numerical analysis is that proofs
| lump subnormals with underflow, and most of them don't survive
| even intermediate underflows.
|
| (AFAIU the original Intel justification for pushing subnormals
| into 754 was gradual underflow, i.e. to give people at least
| something to look at for debugging when they've run out of
| precision.)
|
| So, yes, it's not exactly polite to fiddle with floating-point
| flag bits that are not yours, and it's better that this not
| happen for reproducibility if nothing else, but I doubt it
| actually breaks any interesting numerics.
| moyix wrote:
| The gevent issue has an example:
|
| https://github.com/gevent/gevent/pull/1820
|
| I haven't examined the code of scipy.stats.skellam.sf so I
| can't say for sure that it's not converging, but it's clearly
| some kind of pathological behavior.
| mananaysiempre wrote:
| So somebody tried to calculate, for integer arguments from 0
| to 99 inclusive, the CDF of the difference of two Poisson
| variables with means 4e-6 and 1e-6? I... don't know if it is
| at all reasonable to expect an answer to that question. As
| in, genuinely don't know--obviously it's an utterly rotten
| thing to compute, but at the same time maybe somebody got
| really interested in that and figured out a way to make it
| work.
|
| Anyhow, my spelunking was cut off by sleep, but as best I can
| tell, that would end up in the CDFLIB[1] routine CUMCHN with X
| = 8e-6, PNONC = 2e-6, DF from 0 to 99. The insides don't
| really look like the kind of magic that is held up by
| Sterbenz's lemma and strategically arranged to take advantage
| of gradual underflow, so at first glance I wouldn't trust
| anything subnormal-dependent that it would compute, but maybe
| it still is? Sleep.
|
| [1]
| https://people.sc.fsu.edu/~jburkardt/f_src/cdflib/cdflib.f90
| moyix wrote:
| Yeah, unfortunately I have no idea if that was their
| original goal (which seems unlikely?) or if this is just a
| minimal example they came up with after tripping over the
| actual problem in a more realistic setting.
|
| I think it suffices to show that the behavior of FTZ/DAZ
| caused an actual problem for someone, though. I agree that
| the vast majority of numerical code won't care about
| FTZ/DAZ, but when it's enabled thread-wide you have no idea
| what kind of code you'll end up affecting.
| UncleEntity wrote:
| For my last bug report, I wrote a small C++ program to put
| all the values from 0x000 to 0xfff into a tree structure and
| then iterate over the tree, printing out the values.
|
| I'd have loved if the library author replied with "why
| don't you just print out the values directly?"
| leni536 wrote:
| Does this only affect pypi, or should I now worry about shared
| libraries shipped with my distro as well? Debian is not crazy
| enough to ship shared libs compiled with -ffast-math, right?
| RIGHT?
| moyix wrote:
| Please don't do this to me, I don't know if I have it in me to
| go on ANOTHER big scrape & scan.
| JonChesterfield wrote:
| If the package build scripts from upstream have that in them,
| Debian packaged versions probably do too
| cesarb wrote:
| At a previous company I worked at, we had an issue with our
| software (Windows-based, written in a proprietary language)
| randomly crashing. After some debugging, we found that this
| happened whenever the user took certain actions, but only
| if, in that session, the user had previously printed something or
| opened a file picker. The culprit was either a printer driver or
| a shell extension which, when loaded, changed the floating point
| control word to trap. That happened whenever the culprit DLL had
| been compiled by a specific compiler, which had the offending
| code in the startup routine it linked into every DLL it produced.
|
| Our solution was the inverse of the one presented in this
| article: instead of wrapping our routines to temporarily set the
| floating point control word to sane values, we wrapped the calls
| to either printing or the file picker, and reset the floating
| point control word to its previous (and sane) value after these
| calls.
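|
| Roughly, a sketch of that wrap-and-restore pattern on
| Windows, assuming MSVC's _controlfp_s from <float.h> (the
| real code was in a proprietary language; do_print is a
| hypothetical stand-in for the offending call):
|
|     #include <float.h>
|
|     extern void do_print(void);  /* may load the bad DLL */
|
|     void print_document_safely(void) {
|         unsigned int saved, dummy;
|         _controlfp_s(&saved, 0, 0);  /* mask 0: just read */
|         do_print();              /* may unmask FP exceptions */
|         _controlfp_s(&dummy, saved, _MCW_EM); /* re-mask them */
|     }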
| becurious wrote:
| Had this exact same problem. It was a specific color inkjet
| driver doing this, my guess is to enable dithering or something
| similar. It's one of those things that infects everything in
| the code base because the way you print with GDI is to
| progressively draw parts of the page - so you have to call in
| and out of code that talks to the printer DC. We also had to
| render one item using Direct3D retained mode and that added to
| the fp control word complexity. Things seemed to be more robust
| on NT based OSes.
| klysm wrote:
| That is one hell of a war story - I didn't realize that kind of
| failure was even possible, but it is truly terrifying.
| pavlov wrote:
| Direct3D used to flip the x87 FPU to single precision mode by
| default. This produced some amazing bugs when your other C
| libraries reasonably assumed that a double would be at least
| 64 bits. (The FPU mode settings affected the thread that
| called Direct3D, and most programs used to be single-
| threaded.)
|
| It seems they changed this behavior in Direct3D 10:
|
| https://microsoft.public.win32.programmer.directx.graphics.n.
| ..
| speeder wrote:
| I stumbled into this bug in a rather spectacular manner.
|
| I was making a game using D3D, Lua and Chipmunk physics, and
| some of the game's behaviour was odd.
|
| So I started printing random stuff from Lua; eventually I
| just tried print(5+5), and to my surprise my console output
| "11".
|
| I went into Lua's IRC channel to talk about this, and
| everyone said I was nuts, that the number was too small to
| trigger precision issues, that I was a troll, and so on.
|
| After a lot of searching I found out about this D3D bug, so
| I switched the game to use OpenGL instead, and there it was:
| 5+5 = 10 again!
|
| Now why fiddling with the FPU could make 5+5 become 11, I
| have no idea.
| titzer wrote:
| I've heard so many stories akin to this one that I just shake
| my head. It's a self-inflicted wound that people who prioritize
| _performance_ above other considerations _keep inflicting on
| everyone else_.
|
| I _hope_ we learned our lessons on this specific question in
| the design of Wasm. There are subnormals in Wasm and you
| can't turn them off for performance.
| ack_complete wrote:
| Had to deal with this same issue when I had a program
| supporting plugins: DLLs compiled with Delphi would turn on
| all the floating point traps. Took a while to track down
| what was
| causing FP faults in comctl32.dll. It got so bad that I had to
| put in a popup dialog that would name and shame the offending
| DLL so the authors would fix their broken plugins. It's an ABI
| violation in Windows since the ABI specifically defines FPU
| exceptions as masked, so this was more egregious than just
| turning on FTZ/DAZ (which Intel-compiled DLLs did).
|
| Many of these same DLLs would also hijack
| SetUnhandledExceptionFilter() for their custom exception
| support, which would also result in hard fastfail crashes when
| they failed to unhook properly. Ended up having to hotpatch
| SetUnhandledExceptionFilter() Detours-style to prevent my crash
| reporting filter from being overridden. Years later, Microsoft
| revealed that Office had done the same thing for the same
| reasons.
|
| The new version of this problem is DLLs that use AVX
| instructions and then don't execute a VZEROALL/VZEROUPPER
| instruction before returning. This is more sinister as it
| doesn't cause a failure, it just causes SSE2 code to run up to
| four times slower in the thread.
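|
| For reference, a sketch of the polite pattern (assuming AVX
| intrinsics from <immintrin.h>; compilers targeting AVX
| usually emit vzeroupper automatically, but hand-written
| assembly or some older toolchains do not):
|
|     #include <immintrin.h>
|
|     void scale_avx(float *x, float s, int n) {
|         __m256 vs = _mm256_set1_ps(s);
|         int i;
|         for (i = 0; i + 8 <= n; i += 8)
|             _mm256_storeu_ps(x + i,
|                 _mm256_mul_ps(_mm256_loadu_ps(x + i), vs));
|         for (; i < n; i++)
|             x[i] *= s;
|         _mm256_zeroupper();  /* clear upper YMM halves before
|                                 returning to SSE2 callers */
|     }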
| astrange wrote:
| You could also get an issue with x87/MMX where floating point
| code wouldn't work if you wrote some MMX code and didn't do
| an `emms` instruction afterward.
|
| This is basically the reason compiler autovectorization
| doesn't do MMX.
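|
| A sketch of what that looks like with intrinsics (assuming
| MMX via <mmintrin.h>; _mm_empty() is the intrinsic that
| emits emms):
|
|     #include <mmintrin.h>
|
|     /* MMX aliases the x87 register stack, so skipping
|        _mm_empty() leaves the FPU tag word marked in-use and
|        later x87 math misbehaves. */
|     void add_bytes_mmx(const __m64 *a, const __m64 *b,
|                        __m64 *out) {
|         *out = _mm_add_pi8(*a, *b);
|         _mm_empty();  /* the `emms` mentioned above */
|     }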
| pavon wrote:
| Yep, I've encountered floating point flag incompatibilities
| when dynamically loading Borland-compiled libraries into
| Visual Studio compiled applications, as well as when using
| C++ code via Java Native Interface.
|
| It is nice that diverse vendor-specific calling conventions
| and ABIs are less common these days.
| Xorlev wrote:
| I was interested in the last point about AVX instructions,
| and found https://john-h-k.github.io/VexTransitionPenalties.html
| which discusses the problem.
| puffoflogic wrote:
| Dynamic linking is the root of all kinds of evil, enough said.
| benreesman wrote:
| As a default (particularly an effectively _mandatory_ default,
| looking at you glibc) it is indeed insane.
|
| But for something like a Python extension it's what we've got.
|
| Which has the ancillary benefit of surfacing stuff like this.
| woodruffw wrote:
| The content of this post has nothing to do with the specifics
| of dynamic linking: it would be just as true if the wheels in
| question had static binaries instead.
| benreesman wrote:
| Eh, somewhere in the middle. Someone else put '-ffast-math'
| in a compile line and it poisons FP math far away with no
| recompile?
|
| I believe it's a necessary price in this case, but it does
| highlight how suboptimal it is to pay the price in other
| cases.
| woodruffw wrote:
| It's fair to point out that shared objects _surface_ the
| problem here, but I don't know if I would lay the blame
| with them: the underlying problem is that a FPU control
| register isn't (and can't be) meaningfully isolated. Python
| needs to use shared objects for loadable extensions, but
| the contaminating code might be statically linked into that
| shared object.
|
| (I don't say this because I want to excuse dynamic linking,
| which I also generally dislike! Only that I think the
| problem is somewhere else in this particular case.)
| jeroenhd wrote:
| What is the alternative here? To provide a python.so file with
| all possible binary Python packages statically linked into it?
| You'd need to update it every hour to include all the bugfixes
| in every native library pulled in! To recompile Python itself
| every time you install a package? Even with a compiler cache
| you'd have the Gentoo experience of waiting for ages every time
| you try to use the package manager.
|
| Dynamic linking solves a real problem, especially in this
| space. It comes with new problems of its own but so does the
| alternative.
| [deleted]
| [deleted]
| compiler-guy wrote:
| -funsafe-math is neither fun nor safe.
| kibwen wrote:
| I hereby propose that we rename "unsafe-math" to "ucking-
| broken-math".
| tomrod wrote:
| I approve. Let's get someone with authority to make the
| change.
| black_knight wrote:
| I ran Gentoo back in the good old days. The biggest draw was
| that after about a week of compiling, my system ran a lot
| faster, thanks to all the compiler optimisations one could
| enable when the code only had to work on your own CPU.
|
| I might be misremembering, but I think fastmath was one of the
| flags explicitly warned against in the Gentoo manual.
| bombcar wrote:
| It was, and people would still use it because "hey, it says
| fast".
|
| The CPU flags were less interesting to me than being able to
| disable features like X.
| p_l wrote:
| There was a big warning that it might produce a broken
| system, IIRC.
| jeffbee wrote:
| ChromeOS is sort of the successor to Gentoo. The images are
| built with profile-guided, link-time, and post-link
| optimization, and they are targeted to the specific CPU in a
| given Chromebook. Every other Linux distro leaves a large
| amount of performance on the table by targeting a common
| denominator CPU
| that's 20 years old and not having PGO.
| TazeTSchnitzel wrote:
| Apple avoid this problem with their OS by having a separate
| architecture slice for modern x64 (Haswell+).
| yjftsjthsd-h wrote:
| It's not a successor, it's a derivative. And yes, if you're
| only targeting specific known hardware then you can and
| probably should optimize for it, but most distributions fully
| intend to be usable on very nearly any x86(_64) hardware so
| they can't do that.
| jerf wrote:
| It's also a bit less relevant when everything is so fast. I
| used Gentoo on a cheap-for-the-time Pentium 133MHz. Gentoo
| was basically the difference between a modestly pleasant
| system and an unusably slow system if I tried to run a
| standard still-compiled-for-386 distro on it.
|
| I've long since stopped worrying about it because on the
| systems I run, which are not top-of-the-line but aren't
| RPis either, it's not worth worrying about anymore for most
| programs. At most maybe you should target the one
| particular program you use that could use a boost.
| yjftsjthsd-h wrote:
| Yeah, I don't know the breakdown between better hardware
| and better compiler optimizations (even in the default
| settings) and less differentiation between processors,
| but I've done some minor not-very-scientific tests of
| compiling packages with -O3 and -march=native -mtune=native,
| and in my limited experience it wasn't particularly
| useful. Like,
| not just small benefits, but zero or below the noise
| floor benefits in my benchmarks. Obviously this is super
| dependent on your workload and maybe hardware; it's an
| area where if you care, you _have_ to do your own
| testing.
| jeffbee wrote:
| Tune for native sometimes makes a difference but not
| always. Targeting a platform that is known to have AVX2,
| instead of detecting AVX2 at runtime and bouncing through
| the PLT, can make a large difference. PGO remains the
| largest opportunity.
| hackingthelema wrote:
| > I might be misremembering, but I think fastmath was one of
| the flags explicitly warned against in the Gentoo manual.
|
| It is, here:
| https://wiki.gentoo.org/wiki/GCC_optimization#But_I_get_bett...
| TazeTSchnitzel wrote:
| Global state is the root of so many evils! FPU rounding mode, FPU
| flush-to-zero mode, C locale, errno, and probably some other
| things should all be eliminated. The functionality should still
| exist but not as global flags.
| leni536 wrote:
| At least many of those are thread-local. But not the C
| locale; it is truly horrible.
| Tyr42 wrote:
| Oh man, great job digging through all that. This is exactly the
| kind of content I want to see.
|
| Don't you love your fun safe math?
| ChrisRackauckas wrote:
| The Julia package ecosystem has a lot of safeguards against
| silent incorrect behavior like this. For example, if you try to
| add a package binary build which would use fast math flags, it
| will throw an error and tell you to repent:
|
| https://github.com/JuliaPackaging/BinaryBuilderBase.jl/blob/...
|
| In user codes you can do `@fastmath`, but it's at the semantic
| level so it will change `sin` to `sin_fast` but not recurse down
| into other people's functions, because at that point you're just
| asking for trouble. There have also been calls to rename it
| `@unsafemath` in Julia, just to make it explicit. In
| summary: "fast math" is overused, and many times people
| actually want other optimizations (such as automatic FMA);
| people really need to stop throwing global changes around
| willy-nilly; and programming languages need to force people
| to avoid such global issues, both semantically and within
| their package ecosystems' norms.
| aidenn0 wrote:
| Automatic FMA can change the result of operations, so it makes
| (some) sense to be bundled in with fastmath.
| ChrisRackauckas wrote:
| But if what you want is automatic FMA, then why carry along
| every other possible behavior with it? Just because you want
| FMA, suddenly NaNs are turned into Infs, subnormal numbers go
| to zero, handling of sin(x) at small values is inaccurate,
| etc? To me that's painting numerical handling in way too
| broad of strokes. FMA also only increases numerical accuracy,
| never decreases it, so bundling it with unsafe
| transformations leaves one uncertain whether accuracy has
| improved or degraded.
|
| For reference, to handle this well we use MuladdMacro.jl
| which is a semantic transformation that turns x*y+z into
| muladd expressions; it does not recurse into functions, so
| it does not change the definitions of functions called
| inside the macro scope.
|
| https://github.com/SciML/MuladdMacro.jl
|
| This is something that will always increase performance and
| accuracy (performance because muladd in Julia is an FMA that
| is only applied if hardware FMA exists, effectively never
| resorting to a software FMA emulation) because it's targeted
| to do only a transformation that has that property.
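|
| A rough C analog of that targeted transformation (assuming
| C99's fma() from <math.h> and the FP_FAST_FMA macro, which
| is defined when hardware FMA makes fma() fast):
|
|     #include <math.h>
|
|     /* x*y+z with a single rounding when it's cheap, a plain
|        multiply-add otherwise; nothing else in the program
|        changes. */
|     static double muladd(double x, double y, double z) {
|     #ifdef FP_FAST_FMA
|         return fma(x, y, z);
|     #else
|         return x * y + z;
|     #endif
|     }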
| eigenspace wrote:
| This isn't really as valid a comparison as you might think it
| is. The results of operations varying is not the problem
| with 'fast-math'; the problem is that it can negatively
| impact accuracy in catastrophic ways (among other things).
|
| Sure, automatic FMA can change the result, but to my
| knowledge it always gives a _more_ accurate result, not a
| less accurate one, and the way in which the results may
| differ is bounded.
| raymondh wrote:
| This is a rockstar quality post. It is astonishing how much
| detective work was involved.
| stabbles wrote:
| See also https://simonbyrne.github.io/notes/fastmath/ for a
| similar story in Julia, where -ffast-math is now banned for
| C/C++/Fortran dependencies.
| jesse__ wrote:
| 10/10 yak shave. Would certainly read again
| bee_rider wrote:
| A decorator is a nice idea for this.
|
| I was going to suggest another package that just resets the MXCSR
| when imported, but I guess... hypothetically... some function
| might actually want the FTZ behavior.
| jcranmer wrote:
| If you want that behavior, you should explicitly enable
| it/disable it at the borders of the region where you want that
| behavior, rather than screwing over everybody for your own
| benefit.
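|
| A minimal sketch of that border discipline (assuming the
| _MM_SET_FLUSH_ZERO_MODE / _MM_SET_DENORMALS_ZERO_MODE macros
| from <xmmintrin.h> and <pmmintrin.h>; the kernel itself is
| hypothetical):
|
|     #include <xmmintrin.h>  /* FTZ macros */
|     #include <pmmintrin.h>  /* DAZ macros */
|
|     void kernel_wanting_ftz(float *x, int n) {
|         unsigned ftz = _MM_GET_FLUSH_ZERO_MODE();
|         unsigned daz = _MM_GET_DENORMALS_ZERO_MODE();
|         _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
|         _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
|
|         for (int i = 0; i < n; i++)  /* hot loop that is fine
|                                         with FTZ/DAZ */
|             x[i] *= 0.5f;
|
|         _MM_SET_FLUSH_ZERO_MODE(ftz);     /* restore at the */
|         _MM_SET_DENORMALS_ZERO_MODE(daz); /* region border  */
|     }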
| jcranmer wrote:
| The problem here is that enabling FTZ/DAZ flags involves
| modifying global (technically thread-local) state that is
| relatively expensive to do. Ideally, you'd want to twiddle these
| flags only for code that wants to work in this mode, but given
| the relative expense of this operation, it's not entirely
| practicable to auto-add twiddling to every function call, and
| doing it manually is somewhat challenging because compilers tend
| to support accessing the floating-point status rather poorly.
| Also, FTZ/DAZ aren't IEEE 754, so there's no portable function
| for twiddling these bits as there is for rounding modes or
| exception controls. I will note that icc's -fp-model=fast and
| MSVC's /fp:fast correctly do not link code with crtfastmath.
|
| As a side note, this kind of thing is why I think a good
| title for a fast-math writeup would be "Fast math, or how I
| learned to start worrying and hate floating point."
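|
| (For contrast, C99's <fenv.h> does give portable control of
| rounding modes, just not of FTZ/DAZ; a small sketch:)
|
|     #include <fenv.h>
|     #include <stdio.h>
|     #pragma STDC FENV_ACCESS ON
|
|     int main(void) {
|         volatile double x = 1.0, y = 3.0;
|         fesetround(FE_UPWARD);
|         printf("%.17g\n", x / y);  /* rounded up */
|         fesetround(FE_TONEAREST);
|         printf("%.17g\n", x / y);  /* default rounding */
|         /* ...but there is no fe*() call for FTZ/DAZ; that
|            takes vendor intrinsics like _mm_setcsr. */
|         return 0;
|     }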
| [deleted]
| titzer wrote:
| I don't think flipping these flags is expensive. Can you
| provide a source for that? AFAICT modern microarchitectures are
| going to register-rename that into the u-ops issued to the
| functional units, rather than flush the entire ROB.
| mrtesthah wrote:
| I thought the purpose of Python was to make development simple
| and predictable. Needing to track down the compilation and linker
| flags of every single shared library reveals the fallacy of this
| abstraction.
| RodgerTheGreat wrote:
| If a language wishes to reap the rewards of a pre-existing
| ecosystem, it must pay for the warts and misfeatures of that
| ecosystem. Python is deeply dependent on C libraries to achieve
| acceptable performance, and this is the price.
| magicalhippo wrote:
| Denormalized numbers are one reason why you really want to
| think carefully if you try to optimize code by rewriting
| expressions involving multiplication and division.
|
| For example, if you have "x = (a / b) * (c / d)", one might
| think that rewriting it as "x = (a * c) / (b * d)" will save
| you a division and gain you speed. It will and it might,
| respectively.
|
| However, it will also potentially break an otherwise safe
| operation. If the numbers are _very_ small, but still normal,
| then the product (b * d) might result in a denormalized number,
| and dividing by it will result in + /- infinity.
|
| However, the code might guarantee that the ratios (a / b) and (c
| / d) are not too small or too large, so that multiplying them is
| guaranteed to lead to a useful result.
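|
| To make that concrete, a minimal sketch (assuming x86 SSE
| and the FTZ/DAZ macros from <xmmintrin.h>/<pmmintrin.h>;
| the ratios are a modest 1e10 each, but b*d lands in the
| subnormal range around 1e-320):
|
|     #include <stdio.h>
|     #include <xmmintrin.h>
|     #include <pmmintrin.h>
|
|     int main(void) {
|         volatile double a = 1e-150, b = 1e-160;
|         volatile double c = 1e-150, d = 1e-160;
|         printf("%g\n", (a / b) * (c / d)); /* 1e+20 */
|         printf("%g\n", (a * c) / (b * d)); /* ~1e+20, via a
|                                               subnormal */
|         _MM_SET_FLUSH_ZERO_MODE(_MM_FLUSH_ZERO_ON);
|         _MM_SET_DENORMALS_ZERO_MODE(_MM_DENORMALS_ZERO_ON);
|         printf("%g\n", (a * c) / (b * d)); /* inf: b*d is
|                                               flushed to 0 */
|         return 0;
|     }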
| bee_rider wrote:
| Anyway, since there aren't any dependencies between a, b, c,
| and d, I would expect the two divisions to end up basically in
| parallel in the pipeline. So the critical path is a division
| and a multiplication either way. Of course that is just a
| guess.
| garaetjjte wrote:
| > it turns out that when you use -Ofast, -fno-fast-math does not,
| in fact, disable fast math. lol. lmao.
|
| What about -fno-unsafe-math-optimizations?
| moyix wrote:
| Nope, it still links in crtfastmath:
|
|     $ gcc -Ofast -fno-unsafe-math-optimizations -fpic -shared \
|         foo.c -o foo.so
|     $ objdump -j .text --disassemble=set_fast_math foo.so
|
|     foo.so:     file format elf64-x86-64
|
|     Disassembly of section .text:
|
|     0000000000001040 <set_fast_math>:
|       1040: f3 0f 1e fa             endbr64
|       1044: 0f ae 5c 24 fc          stmxcsr -0x4(%rsp)
|       1049: 81 4c 24 fc 40 80 00    orl    $0x8040,-0x4(%rsp)
|       1050: 00
|       1051: 0f ae 54 24 fc          ldmxcsr -0x4(%rsp)
|       1056: c3                      retq
| Night_Thastus wrote:
| Ouch. Two flags that should reasonably stop this, and neither
| does. This feels a bit like the time I was told "No, -Wall
| does not in fact enable all warnings".
| speeder wrote:
| Wait, it doesn't? O.o
| moyix wrote:
| Nope. clang has "-Weverything", and gcc has "-Wextra",
| both of which go beyond "-Wall".
|
| https://stackoverflow.com/questions/11714827/how-can-i-
| turn-...
| klysm wrote:
| Pain. This is so scuffed
___________________________________________________________________
(page generated 2022-09-06 23:00 UTC)