[HN Gopher] Emulating the FMAdd Instruction, Part 1: 32-bit Floats
___________________________________________________________________
Emulating the FMAdd Instruction, Part 1: 32-bit Floats
Author : TazeTSchnitzel
Score : 32 points
Date : 2025-01-02 08:37 UTC (14 hours ago)
(HTM) web link (drilian.com)
(TXT) w3m dump (drilian.com)
| msk-lywenn wrote:
| I think the link should be:
| https://drilian.com/posts/2024.12.31-emulating-the-fmadd-ins...
| RossBencina wrote:
| I would love to know how the author tested/verified that the code
| is correct.
| eapriv wrote:
| It doesn't even compile (the last code snippet uses some
| undefined variables).
| stephencanon wrote:
| The _algorithm_ is certainly correct (this is a relatively
| straightforward and well-known theorem of floating-point
| arithmetic, see e.g. "Handbook of Floating-Point Arithmetic" by
| JM Muller et al).
|
| Whether or not the code presented is a faithful translation of
| that algorithm is more subtle and somewhat dependent on the
| environment; the AddWithError operation requires that no excess
| precision is used (FLT_EVAL_METHOD == 0), which is the norm for
| compilers today but not quite universal, especially for
| compilers that target 32b x86. The `LowestBitOfMantissa`,
| `AddOneBitToMantissa` and `SubtractOneBitFromMantissa`
| operations are not implemented in the post--those aren't hard
| to implement, but doing it without invoking undefined behavior
| can be subtle (and what's actually permitted is a critical
| distinction between C and C++).
|
| It also may not handle some non-finite cases correctly--when
| the result is infinite or NaN, the error term as computed will
| be NaN, which will pass `errorTerm != 0.0 &&
| LowestBitOfMantissa(value) == 0`, which will cause a bit to be
| subtracted from the result, rounding infinity back to a finite
| value. That will overflow back to infinity on the final
| conversion in default rounding, but in directed rounding modes
| will give the wrong answer. It also will get the FP flags wrong
| in some cases, which doesn't matter much for most use, but
| means you couldn't use it as a C standard library
| implementation.
|
| A fully correct implementation that handles all these small
| details isn't really any more complicated, and can be written
| in ~20 lines of C without any code golfing (I wrote Apple's
| libm fma[f] for targets that do not have hardware support). The
| "hard" part is writing the thousands and thousands of tests
| that allow you to have some confidence in the implementation
| details--this can largely be automated, but it's definitely the
| harder side of the task.
___________________________________________________________________
(page generated 2025-01-02 23:02 UTC)