[HN Gopher] Emulating AMD Approximate Arithmetic Instructions on...
       ___________________________________________________________________
        
       Emulating AMD Approximate Arithmetic Instructions on Intel
        
       Author : zdw
       Score  : 63 points
       Date   : 2021-09-12 14:25 UTC (8 hours ago)
        
 (HTM) web link (robert.ocallahan.org)
 (TXT) w3m dump (robert.ocallahan.org)
        
       | boulos wrote:
       | Yikes.
       | 
        | A lot of code uses _mm_rsqrt_ps (sometimes) followed by a Newton-
        | Raphson update to compute a "precise" 1/sqrt(x). Here's a good
        | example of NEON's rsqrt being sufficiently different from Intel's
        | that more iterations were necessary for Embree on ARM [1].
       | 
       | Because I only cared about vectorization a long time ago, and AMD
       | was so uncompetitive then, I'd bet a lot of code assumes that the
       | SSE rsqrtps values match.
       | 
       | [1] https://github.com/lighttransport/embree-aarch64/issues/20
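The pattern boulos describes, a coarse hardware estimate refined by one Newton-Raphson step, can be sketched with SSE intrinsics (a minimal sketch; real code would also need to handle x = 0 and denormals):

```c
#include <assert.h>
#include <math.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Refine the ~12-bit _mm_rsqrt_ps estimate with one Newton-Raphson step:
 * y' = y * (1.5 - 0.5 * x * y * y). This roughly doubles the number of
 * correct bits, but the final result still depends on the hardware's
 * initial estimate -- which is exactly what differs between vendors. */
static __m128 rsqrt_nr(__m128 x) {
    __m128 y      = _mm_rsqrt_ps(x);                  /* coarse estimate */
    __m128 half_x = _mm_mul_ps(_mm_set1_ps(0.5f), x);
    __m128 y2     = _mm_mul_ps(y, y);
    return _mm_mul_ps(y, _mm_sub_ps(_mm_set1_ps(1.5f),
                                    _mm_mul_ps(half_x, y2)));
}
```

With one refinement step the result is accurate to roughly 22-23 bits, which is why code tuned against one vendor's estimate can need extra iterations on another's.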
        
         | boulos wrote:
         | (Too late for edit)
         | 
          | Looks like Eigen also _defaults_ to EIGEN_FAST_MATH, which makes
          | Eigen's psqrt ("packet sqrt") use _mm256_rsqrt_ps instead of
          | _mm256_sqrt_ps [1].
         | 
          | Interestingly, the thing they're trying to avoid (the long
          | latency of sqrt vs. rsqrt) hasn't been true on Intel processors
          | for a long time, but apparently is still true for AMD parts
          | according to Agner Fog's tables [2] (though maybe I'm reading
          | them wrong; there is no vsqrtps entry for Zen 2/3).
         | 
          | Hopefully, Eigen will break up the single global "fast math"
          | config [3].
         | 
         | [1]
         | https://gitlab.com/libeigen/eigen/-/blob/a75122584594fb98db0...
         | 
         | [2] https://agner.org/optimize/instruction_tables.pdf
         | 
         | [3] https://gitlab.com/libeigen/eigen/-/issues/1687
        
       | Varriount wrote:
       | Neat, though I'm curious - when do instructions like this
       | typically get used?
        
         | pkhuong wrote:
         | The Newton iteration is nicer for the inverse square root than
         | for the square root. You can refine an initial approximation
         | for `1 / sqrt(x)`, and multiply the result by `x` to compute an
         | approximation of `sqrt(x)`. This less direct approach only
         | needs FP multiplications and additions (and the initial
         | approximation).
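A scalar sketch of what pkhuong describes; the initial approximation y0 is taken as a parameter, standing in for whatever rsqrtss (or a bit trick) provides:

```c
#include <assert.h>
#include <math.h>

/* One Newton-Raphson step on f(y) = 1/y^2 - x, refining y ~ 1/sqrt(x)
 * using only multiplies and adds -- no division anywhere. */
static float refine_rsqrt(float x, float y) {
    return y * (1.5f - 0.5f * x * y * y);
}

/* sqrt(x) = x * (1/sqrt(x)): recover the square root from the refined
 * reciprocal square root, again without a division or a sqrt. */
static float sqrt_via_rsqrt(float x, float y0) {
    return x * refine_rsqrt(x, y0);
}
```

This is why the less direct route through the reciprocal square root is attractive: the whole pipeline is FMA-friendly.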
        
          | sharpneli wrote:
          | Normalizing a vector needs this, and it's super common: whenever
          | you just want a direction, you're likely going to need it. Your
          | GPU also has a fast instruction for this, because it's a very
          | common operation in graphics.
        
          | _0ffh wrote:
          | Presumably when speed of execution is more desirable than
          | maximal precision. RSQRTSS, for example, could be useful for
          | graphics applications, I believe.
        
         | saagarjha wrote:
         | They compute reciprocal, or inverse, square roots. You may be
         | familiar with the "fast inverse square root" code snippet: this
         | is a hardware instruction for that operation, and is thus
         | useful for games.
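The snippet saagarjha refers to is the Quake III routine; a cleaned-up version (using memcpy for the type pun instead of the original's undefined-behavior pointer cast):

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* The classic software "fast inverse square root": a magic-constant
 * integer guess followed by one Newton-Raphson refinement step.
 * RSQRTSS/RSQRTPS perform the estimation step in hardware instead. */
static float q_rsqrt(float x) {
    float y = x;
    uint32_t i;
    memcpy(&i, &y, sizeof i);      /* reinterpret the float's bits */
    i = 0x5f3759df - (i >> 1);     /* magic initial guess for 1/sqrt(x) */
    memcpy(&y, &i, sizeof y);
    return y * (1.5f - 0.5f * x * y * y);  /* one refinement step */
}
```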
        
          | dnautics wrote:
          | Not a real answer because I've never deployed such a thing, but
          | I could imagine using these instructions to bootstrap an
          | approximation procedure (e.g. Newton-Raphson, or something of
          | the sort): you're going to do an accuracy refinement anyway,
          | one that buys you more accuracy and guarantees convergence to
          | within <1 ulp in a subsequent step, so you might as well use a
          | fast instruction rather than an accurate one for the first
          | step.
        
         | snovv_crash wrote:
         | Inverse square root is often used to normalize the length of a
         | vector to 1.
        
         | alex_smart wrote:
         | To add to what others have already pointed out, inverse square
         | root is an important operation in finance (for example in
         | pricing options: calculating implied volatility in Black-
         | Scholes Model).
         | 
         | However, if speed is that important to you (e.g. if you are an
         | HFT), you don't even want to be calculating the inverse square
         | root in your hotpath. Basically, the implied volatility is
         | "seeded" once and then you update it using greeks (finance term
         | for a derivative, don't ask me why).
        
           | mhh__ wrote:
           | Sometimes they aren't even real Greek!
        
           | arcticbull wrote:
           | Greeks are a family of derivatives that apply to options, not
           | just one.
           | 
           | Delta is the amount an option goes up or down in price for
           | every $ the underlying moves.
           | 
           | Gamma is the second derivative. The change in delta as a
           | function of change in price of the underlying.
           | 
            | Theta is the amount an option goes down in price for each day
            | you hold it ("time decay").
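Since the greeks above are just partial derivatives of the pricing function, they can be estimated by central finite differences. This is only a toy illustration: `price()` below is a made-up quadratic stand-in, not Black-Scholes or any real model.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical, made-up price curve for illustration only:
 * the option price as some smooth function of the underlying S. */
static double price(double S) { return 0.5 * S * S; }

/* Delta: first derivative of price w.r.t. the underlying
 * (central difference). */
static double delta(double S, double h) {
    return (price(S + h) - price(S - h)) / (2.0 * h);
}

/* Gamma: second derivative -- the change in delta as S moves.
 * (Theta would differentiate w.r.t. time in the same way.) */
static double gamma_(double S, double h) {
    return (price(S + h) - 2.0 * price(S) + price(S - h)) / (h * h);
}
```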
        
             | alex_smart wrote:
             | You know you're in a forum of programmers when you
             | implicitly switch a noun from plural to singular in a
             | parenthetical remark and instead of inferring the meaning
             | based on context, someone feels the need to correct you.
        
               | gumby wrote:
               | I didn't read it as a correction to number but rather as
               | a response to your "I don't know why"
        
       | Sniffnoy wrote:
       | Since the post doesn't explain it, I'll note that while RSQRTSS
       | is approximate reciprocal square root, RCPSS is just approximate
       | reciprocal.
        
       ___________________________________________________________________
       (page generated 2021-09-12 23:01 UTC)