[HN Gopher] Emulating AMD Approximate Arithmetic Instructions on...
___________________________________________________________________
Emulating AMD Approximate Arithmetic Instructions on Intel
Author : zdw
Score : 63 points
Date : 2021-09-12 14:25 UTC (8 hours ago)
(HTM) web link (robert.ocallahan.org)
(TXT) w3m dump (robert.ocallahan.org)
| boulos wrote:
| Yikes.
|
| A lot of code uses _mm_rsqrt_ps, (sometimes) followed by a
| Newton-Raphson update, to compute a "precise" 1/sqrt(x). Here's a
| good example of NEON's rsqrt being sufficiently different from
| Intel's that more iterations were necessary for Embree on ARM [1].
|
| Because I only cared about vectorization a long time ago, and AMD
| was so uncompetitive then, I'd bet a lot of code assumes that the
| SSE rsqrtps values match.
|
| [1] https://github.com/lighttransport/embree-aarch64/issues/20
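The effect described above, where a less accurate hardware estimate forces extra refinement steps, can be sketched in plain Python. This is only an illustration: the helper names are made up, the "hardware estimate" is simulated by perturbing the exact value, and the 2**-12 and 2**-8 relative errors are assumed stand-ins for a more and a less accurate rsqrt, not measured values for any CPU.

```python
import math


def rsqrt_newton_step(x, y):
    """One Newton-Raphson step for f(y) = 1/y**2 - x, i.e. y ~ 1/sqrt(x)."""
    return y * (1.5 - 0.5 * x * y * y)


def iterations_needed(x, rel_err, target=2.0 ** -23):
    """Count Newton steps until the relative error is at most `target`.

    `rel_err` is the assumed relative error of the initial estimate;
    the estimate itself is simulated, not read from hardware.
    """
    exact = 1.0 / math.sqrt(x)
    y = exact * (1.0 + rel_err)  # hypothetical stand-in for an rsqrt result
    steps = 0
    while abs(y - exact) / exact > target:
        y = rsqrt_newton_step(x, y)
        steps += 1
    return steps


if __name__ == "__main__":
    print(iterations_needed(2.0, 2.0 ** -12))  # finer initial estimate
    print(iterations_needed(2.0, 2.0 ** -8))   # coarser initial estimate
```

Under these assumptions the finer estimate reaches single-precision accuracy after one step and the coarser one needs two, because each step roughly squares the relative error.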
| boulos wrote:
| (Too late for edit)
|
| Looks like Eigen also _defaults_ to EIGEN_FAST_MATH, which makes
| Eigen's psqrt ("packet sqrt") use _mm256_rsqrt_ps instead of
| _mm256_sqrt_ps [1].
|
| Interestingly, the thing they're trying to avoid (long latency
| of sqrt vs rsqrt) hasn't been true for a long time on Intel
| processors, but apparently is still true for AMD parts
| according to Agner Fog's tables [2] (though maybe I'm reading
| them wrong, there is no vsqrtps entry for Zen2/3).
|
| Hopefully, Eigen will separate the single global "fast math"
| config [3].
|
| [1]
| https://gitlab.com/libeigen/eigen/-/blob/a75122584594fb98db0...
|
| [2] https://agner.org/optimize/instruction_tables.pdf
|
| [3] https://gitlab.com/libeigen/eigen/-/issues/1687
| Varriount wrote:
| Neat, though I'm curious - when do instructions like this
| typically get used?
| pkhuong wrote:
| The Newton iteration is nicer for the inverse square root than
| for the square root. You can refine an initial approximation
| for `1 / sqrt(x)`, and multiply the result by `x` to compute an
| approximation of `sqrt(x)`. This less direct approach only
| needs FP multiplications and additions (and the initial
| approximation).
| sharpneli wrote:
| Normalizing a vector needs this. And it's super common, as
| whenever you just want to have a direction you're likely going
| to need it. Your GPU also has a fast instruction for this
| because in graphics it's a very common operation.
| _0ffh wrote:
| Presumably when speed of execution is more desirable than
| maximal precision. RSQRTSS for example could be useful for
| graphics applications I believe.
| saagarjha wrote:
| They compute reciprocal, or inverse, square roots. You may be
| familiar with the "fast inverse square root" code snippet: this
| is a hardware instruction for that operation, and is thus
| useful for games.
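For readers who have not seen it, the software trick referred to above can be transliterated to Python roughly as follows. This is a sketch, not the original C code: it reinterprets the bits of a 32-bit float through the `struct` module, uses the widely quoted magic constant 0x5F3759DF, and then applies one Newton step. It assumes a positive, finite, normal input, and the function name is made up.

```python
import struct


def fast_inverse_sqrt(x):
    """Approximate 1/sqrt(x) with the bit-level trick plus one Newton step."""
    # Reinterpret the float32 bit pattern of x as an unsigned 32-bit integer.
    i = struct.unpack("<I", struct.pack("<f", x))[0]
    # Initial guess from integer arithmetic on the bit pattern.
    i = 0x5F3759DF - (i >> 1)
    # Reinterpret the integer back as a float32 value.
    y = struct.unpack("<f", struct.pack("<I", i))[0]
    # One Newton-Raphson refinement step.
    return y * (1.5 - 0.5 * x * y * y)


if __name__ == "__main__":
    print(fast_inverse_sqrt(4.0))  # close to 0.5, but not exact
```

An instruction such as RSQRTSS replaces the integer trick that produces the initial guess; the refinement step, if wanted, stays the same.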
| dnautics wrote:
| Not a real answer because I've never deployed such a thing, but
| I could imagine using these functions to bootstrap an
| approximation procedure (e.g. Newton-Raphson, or something of
| the sort). You're going to do an accuracy refinement anyway,
| which buys you more accuracy and guarantees convergence to
| <1 ulp in a subsequent step, so you might as well use a fast
| instruction over an accurate one for the first step.
| snovv_crash wrote:
| Inverse square root is often used to normalize the length of a
| vector to 1.
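As a small illustration of why the reciprocal form is the convenient one here: normalizing needs a single inverse square root of the squared length, followed by one multiplication per component, instead of one division per component. The sketch below uses the exact `math.sqrt` in place of an approximate instruction, and the function name is made up.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length using one inverse square root."""
    inv_len = 1.0 / math.sqrt(sum(c * c for c in v))  # 1/sqrt(x . x)
    return [c * inv_len for c in v]                   # multiplies only


if __name__ == "__main__":
    print(normalize([3.0, 4.0]))
```

A zero vector has no direction, so this sketch simply raises ZeroDivisionError for it.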
| alex_smart wrote:
| To add to what others have already pointed out, inverse square
| root is an important operation in finance (for example in
| pricing options: calculating implied volatility in the
| Black-Scholes model).
|
| However, if speed is that important to you (e.g. if you are an
| HFT), you don't even want to be calculating the inverse square
| root in your hot path. Basically, the implied volatility is
| "seeded" once and then you update it using greeks (finance term
| for a derivative, don't ask me why).
| mhh__ wrote:
| Sometimes they aren't even real Greek!
| arcticbull wrote:
| Greeks are a family of derivatives that apply to options, not
| just one.
|
| Delta is the amount an option goes up or down in price for
| every $ the underlying moves.
|
| Gamma is the second derivative: the change in delta as a
| function of the change in price of the underlying.
|
| Theta is the amount an option goes down in price for each day
| you hold it ("time decay")
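These three definitions can be made concrete with finite differences on a pricing function. The sketch below assumes the textbook Black-Scholes formula for a European call without dividends; the parameter names, the bump size `h`, the 365-day year, and the sample numbers are all illustrative choices, not anything taken from the thread. The 1/sqrt(T) factor in `d1` is where an inverse square root shows up.

```python
import math


def norm_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def call_price(S, K, r, sigma, T):
    """Black-Scholes price of a European call (no dividends assumed)."""
    inv_sqrt_T = 1.0 / math.sqrt(T)
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) * inv_sqrt_T / sigma
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def greeks(S, K, r, sigma, T, h=0.01):
    """Finite-difference delta, gamma, and one-day theta of the call."""
    up = call_price(S + h, K, r, sigma, T)
    mid = call_price(S, K, r, sigma, T)
    down = call_price(S - h, K, r, sigma, T)
    delta = (up - down) / (2.0 * h)            # price change per $ of underlying
    gamma = (up - 2.0 * mid + down) / (h * h)  # change in delta per $ of underlying
    theta = call_price(S, K, r, sigma, T - 1.0 / 365.0) - mid  # one day passing
    return delta, gamma, theta


if __name__ == "__main__":
    print(greeks(S=100.0, K=100.0, r=0.01, sigma=0.2, T=0.5))
```

For an at-the-money call like the sample one, delta comes out a little above 0.5, gamma is positive, and theta is negative.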
| alex_smart wrote:
| You know you're in a forum of programmers when you
| implicitly switch a noun from plural to singular in a
| parenthetical remark and instead of inferring the meaning
| based on context, someone feels the need to correct you.
| gumby wrote:
| I didn't read it as a correction to number but rather as
| a response to your "I don't know why"
| Sniffnoy wrote:
| Since the post doesn't explain it, I'll note that while RSQRTSS
| is approximate reciprocal square root, RCPSS is just approximate
| reciprocal.
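The plain reciprocal can be refined the same way as the reciprocal square root, with a different Newton step: y' = y * (2 - x * y). A minimal sketch, in which the starting value stands in for an approximate-reciprocal result and the function name is made up:

```python
def refine_reciprocal(x, y, steps=1):
    """Refine an estimate y of 1/x using Newton steps y * (2 - x * y)."""
    for _ in range(steps):
        # If y = (1/x) * (1 + e), the new relative error is -e**2.
        y = y * (2.0 - x * y)
    return y


if __name__ == "__main__":
    print(refine_reciprocal(3.0, 0.33, steps=3))  # approaches 1/3
```

As with the square-root case, only multiplications and a subtraction are needed, so no division instruction is executed.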
___________________________________________________________________
(page generated 2021-09-12 23:01 UTC)