[HN Gopher] How do computers calculate sine?
___________________________________________________________________
How do computers calculate sine?
Author : vegesm
Score : 63 points
Date : 2024-03-07 18:59 UTC (4 hours ago)
(HTM) web link (androidcalculator.com)
(TXT) w3m dump (androidcalculator.com)
| paulpauper wrote:
| sine is easy because the series is globally convergent and fast
| converging
| ot wrote:
| Did you read the article? It is specifically about how the
| series looks simple, but the error is actually very bad if you
| do things naively.
| paulpauper wrote:
| That still makes it easier than computing functions whose
| series are not globally convergent, like the inverse trig
| functions. Obviously, you would have to break it apart to
| speed convergence.
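The point under discussion is easy to make concrete. A short sketch (an illustrative naive evaluation, not any library's actual implementation): the series converges everywhere mathematically, but summing it naively in doubles falls apart for large arguments, while reducing the argument first restores accuracy.

```python
import math

def taylor_sin(x, terms=80):
    """Naive Taylor series for sin: x - x^3/3! + x^5/5! - ..."""
    total = 0.0
    term = x
    for n in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2n+2)(2n+3)).
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

# Near zero the truncated series is essentially exact.
small_err = abs(taylor_sin(0.5) - math.sin(0.5))

# At x = 40 the intermediate terms peak around 1e16, so rounding
# error in the partial sums swamps the O(1) answer.
big_err = abs(taylor_sin(40.0) - math.sin(40.0))

# Reducing the argument into [-pi, pi] first fixes it.
reduced = math.remainder(40.0, 2 * math.pi)
reduced_err = abs(taylor_sin(reduced) - math.sin(40.0))
```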
| duped wrote:
| 1 - cos^2(x), obviously
| nh23423fefe wrote:
| instead, you could just double negate to optimize away the
| square root: -(-(sin(x)))
| ChainOfFools wrote:
| clear, but too verbose. 1/csc is what you want.
|
| or for style points just -cos'
| t-3 wrote:
| Link doesn't appear to be valid, but aren't these usually
| precalculated and stored in a lookup table?
| tails4e wrote:
| I've seen a third-order Taylor series used, but with the
| coefficients calculated at various offsets across a quarter
| wave. So you look up where you are in the quarter wave, then
| look up the 3 or 4 coefficients. This keeps the error bounded,
| since x stays small and the truncated series does not diverge
| much.
| amelius wrote:
| And arcsin?
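A minimal sketch of the quarter-wave scheme described above, assuming (hypothetically) 16 segments and cubic Taylor coefficients stored per segment:

```python
import math

SEGMENTS = 16                      # hypothetical table size
STEP = (math.pi / 2) / SEGMENTS

# Per-segment cubic Taylor coefficients of sin about each midpoint.
TABLE = []
for i in range(SEGMENTS):
    c = (i + 0.5) * STEP           # expansion point
    s, co = math.sin(c), math.cos(c)
    # sin(c + d) ~ s + co*d - (s/2)*d^2 - (co/6)*d^3
    TABLE.append((s, co, -s / 2.0, -co / 6.0))

def quarter_sin(x):
    """sin(x) for x in [0, pi/2]: table lookup, then a cubic in the offset."""
    i = min(int(x / STEP), SEGMENTS - 1)
    a0, a1, a2, a3 = TABLE[i]
    d = x - (i + 0.5) * STEP       # small offset, so the cubic is accurate
    return a0 + d * (a1 + d * (a2 + d * a3))
```

With |d| bounded by pi/64, the truncation error is on the order of d^4/24, roughly 2e-7: the bounded-error behavior the comment describes.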
| convolvatron wrote:
| http://steve.hollasch.net/cgindex/math/inccos.html is a great
| technique if you need a fast integer approximation for some
| arbitrary sampling interval (i.e. motor control)
| warpech wrote:
| This made me realize that trigonometric functions are not
| deterministic across different CPU architectures, OS, and
| programming languages (floating point precision aside).
|
| E.g. I would assume that Math.sin(x) returns the same thing in
| NodeJS on Windows and Mac/M1, but it turns out it is not
| necessarily so.
| https://stackoverflow.com/questions/74074312/standard-math-f...
| adgjlsfhk1 wrote:
| some languages (e.g. Julia) provide their own math library so
| that you get the same results across operating systems.
| TylerE wrote:
| Safer to assume that floats are never deterministic.
| jacobolus wrote:
| Floats follow a clear specification which determines
| precisely how basic arithmetic should work. They should work
| the same on all popular modern platforms. (Whether specific
| software libraries are the same is a separate question.)
| zokier wrote:
| But transcendentals like sine are not part of the strictly
| defined basic arithmetic; they are intentionally defined
| with relaxed behavior.
| gpderetta wrote:
| Even there many standard libraries provide very good
| precision at least within sane domain.
| TylerE wrote:
| Yeah, that's kind of my point. 99% consistent isn't.
| jacobolus wrote:
| If you implement sine in software using the same sequence
| of basic arithmetic instructions, the result should be
| the same across platforms. If you make two different
| implementations using different arithmetic, then of
| course you can't rely on them being the same.
| zokier wrote:
| Point being that IEEE 754 defines two sets of operations: the
| required operations (section 5), which must produce correctly
| rounded results, and the recommended operations (section 9),
| which have relaxed requirements. Sine belongs to the latter,
| so IEEE 754 does not mandate reproducible results for sine.
| jacobolus wrote:
| My understanding is that most software always uses some
| software implementation of sine, rather than calling a
| hardware instruction. Which is definitely what you should
| do if you care about getting the exact same results
| across platforms.
| aardvark179 wrote:
| Floats are well defined, and it is perfectly possible to
| reason about how algorithms based on them should behave. Few
| languages specify the accuracy of things like trig functions,
| so relying on them can be tricky, and JavaScript is
| particularly bad in that respect.
| saagarjha wrote:
| This is not always a safe assumption (in certain scenarios
| floating point results being nondeterministic has the
| possibility to introduce bugs and security issues) and is
| also a kind of sad way to look at the world. The response to
| "I don't understand how this works" should not be to adopt an
| incorrect viewpoint, but to know the limitations of your
| understanding.
| TylerE wrote:
| It's not that I don't understand, it's that I do. Floats
| are inherently lossy representations. Yes, this means the
| more operations you perform on a float input, the fuzzier
| the value is. You ignore that harsh reality at _your_ peril.
| If you find engineering rigor sad, I don't know what to
| tell you.
| saagarjha wrote:
| "Floats are not deterministic" is not engineering rigor,
| it's just wrong. They are specified precisely by IEEE-754
| in how they must behave and which operations are allowed
| to produce which results.
| TylerE wrote:
| IEEE 754 conforming floats conform to IEEE-754. If they
| actually conform. Low end devices with shitty software
| implementations often get the hard edge cases wrong.
| saagarjha wrote:
| Yes and when that happens it is important to know what
| went wrong rather than just handwaving "oh it's that
| float stuff again I can't trust it".
| jacobolus wrote:
| > _the more operations you perform on a float input, the
| fuzzier the value is_
|
| No, any float always precisely represents a specific
| number. The issue is that only a finite number of numbers
| are representable.
|
| Some algorithms are poorly conditioned and when
| implemented using floating point arithmetic will lead to
| a result that is different than what you would get in
| idealized real number arithmetic. That doesn't make any
| floating point value "fuzzy".
| cogman10 wrote:
| Well, what's fun is that (AFAIK) trigonometric functions tend
| not to be implemented in the newer floating point instructions,
| such as AVX or SSE.
|
| So while what you say is true about the x87 implementation of
| those functions, for anything targeting a machine built in the
| last 20 years it's likely the code will run consistently
| regardless of the architecture (barring architectural floating
| point bugs, which aren't terribly uncommon in the less
| significant bits and when overclocking comes into play).
|
| x86 compilers won't use x87 instructions when SSE2 and later
| are available. x87 is just a really weird and funky instruction
| set that's best left in the gutter of history.
| bnprks wrote:
| Sadly even SSE vs. AVX is enough to often give different
| results, as SSE doesn't have support for fused multiply-add
| instructions which allow calculation of a*b + c with
| guaranteed correct rounding. Even though this should allow
| CPUs from 2013 and later to all use FMA, gcc/clang don't
| enable AVX by default for the x86-64 targets. And even if
| they did, results are only guaranteed identical if
| implementations have chosen the exact same polynomial
| approximation method and no compiler optimizations alter the
| instruction sequence.
|
| Unfortunately, floating point results will probably continue
| to differ across platforms for the foreseeable future.
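The rounding difference described here can be shown without FMA hardware, using exact rational arithmetic to stand in for what a correctly rounded fused a*b + c would return (the values below are contrived so the effect is visible):

```python
from fractions import Fraction

a = 1.0 + 2.0 ** -30
b = 1.0 + 2.0 ** -30
c = -(1.0 + 2.0 ** -29)

# Unfused: a*b is rounded to a double first, discarding the 2^-60 term,
# and the subtraction then cancels everything that's left.
two_step = a * b + c

# A fused multiply-add rounds only once; here the exact result 2^-60 is
# itself representable, so an FMA would return it exactly.
exact = Fraction(a) * Fraction(b) + Fraction(c)
fused = float(exact)
```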
| cogman10 wrote:
| That's a bit of a different problem IMO.
|
| Barring someone doing a "check if AVX is available" check
| inside their code, binaries are generally compiled
| targeting either SSE or AVX and not both. You can
| reasonably expect that the same binary thrown against
| multiple architectures will have the same output.
|
| This, of course, doesn't apply if we are talking about a
| JIT. All bets are off if you are talking about javascript
| or the JVM.
|
| That is to say, you can expect that a C++ binary blob from
| the Ubuntu repo is going to get the same numbers regardless
| of the machine, since they generally target fairly old
| architectures.
| zokier wrote:
| > Barring someone doing a "check if AVX is available"
| check inside their code
|
| Afaik that is exactly what glibc does internally
| gpderetta wrote:
| GCC won't use FMA without fast-math though. Even when AVX
| is otherwise enabled.
| zokier wrote:
| Sure it will:
|
| > -ffp-contract=fast enables floating-point expression
| contraction such as forming of fused multiply-add
| operations if the target has native support for them
|
| > The default is -ffp-contract=off for C in a standards
| compliant mode (-std=c11 or similar), -ffp-contract=fast
| otherwise.
|
| https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...
| gpderetta wrote:
| Oh, wow, forgot about fp-contract. It says it is off in C
| by default, what about C++?
| zokier wrote:
| Read closer, it defaults to fast, not off
| bee_rider wrote:
| Intel does, however, provide intrinsics for trig functions on
| AVX in their compilers. Not as good as having an instruction,
| of course.
| enriquto wrote:
| > x87 is just a really weird and funky instruction set that's
| best left in the gutter of history
|
| hmmm, can you use the long doubles in sse or avx? They are
| glorious, and as far as I see from playing with godbolt, they
| still require dirtying your hands with the x87 stack.
| cogman10 wrote:
| The 80-bit float? Not as far as I'm aware. However, it's
| fairly trivial to represent a higher-precision value as the
| sum of two 64-bit floats ("double-double", giving roughly 106
| bits of significand). And with the nature of AVX/SSE, you
| don't take much of a performance hit for doing that, as you
| are often operating on both parts of the pair with the same
| instruction.
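The building block for that two-float trick is an error-free transform such as Knuth's two-sum, which recovers exactly what ordinary addition rounds away (a sketch of the primitive, not a full double-double library; Python floats are IEEE doubles, so it demonstrates the same arithmetic):

```python
def two_sum(a, b):
    """Knuth two-sum: s = fl(a + b), and e such that a + b == s + e exactly."""
    s = a + b
    bp = s - a                 # the part of b that made it into s
    ap = s - bp                # the part of a that made it into s
    e = (a - ap) + (b - bp)    # everything the rounding discarded
    return s, e

# Plain addition loses the small addend entirely...
assert 1.0 + 2.0 ** -60 == 1.0
# ...but the (hi, lo) pair keeps it, for ~106 significand bits in total.
hi, lo = two_sum(1.0, 2.0 ** -60)
```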
| enriquto wrote:
| Do you know if there's language support for that? Are
| there obscure gcc options that make "long double" be
| quadruple precision floats?
| cogman10 wrote:
| Which language?
|
| For C++, there's this:
| https://en.cppreference.com/w/cpp/types/floating-point
| zokier wrote:
| You can just use standard C _Float128 type
| https://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html
| aardvark179 wrote:
| Somewhat annoyingly, the ECMAScript standard only specifies
| that various math functions return an approximation but does
| not set any bounds on that approximation. So for many
| functions you could just return NaN and still be compliant.
| layer8 wrote:
| Isn't NaN the one value that can't possibly count as an
| approximation, because it's not a number and unordered? ;)
| aardvark179 wrote:
| You might think so, but if it's not specified in the
| standard...
| zardo wrote:
| It is the worst possible approximation though.
| layer8 wrote:
| I was submitting an interpretation of the standard.
| retrac wrote:
| Rounding transcendentals correctly has unknown time and space
| complexity. [1] Sort of brushes up against the halting problem.
| With limited precision, the upper limit becomes calculable but
| it's rather large - packages that offer correct rounding on
| 64-bit floating point use potentially hundreds of bytes to deal
| with a single floating point value. Dedicated circuitry to
| implement it fast would be big and complicated even by today's
| standards.
|
| https://en.wikipedia.org/wiki/Rounding#Table-maker's_dilemma
| microtherion wrote:
| P.J. Plauger's _The Standard C Library_ provides an
| implementation for all functions in the (then) C standard:
| https://www.amazon.com/Standard-Library-P-J-Plauger/dp/01383...
| dboreham wrote:
| This was my first use of open source, around 1978. I wondered how
| calculators and computers did trig functions, and was also using
| Unix V7. We had a large disk and kept the source on line. So I
| was able to find this:
| https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libm/s...
| and from there this book:
| https://www.biblio.com/book/computer-approximations-john-f-h...
| vardump wrote:
| CORDIC is how it's usually done in hardware (and FPGAs).
|
| https://en.wikipedia.org/wiki/CORDIC
| adgjlsfhk1 wrote:
| no it's not. cordic has awful convergence of 1 bit per
| iteration. pretty much everyone uses power series.
| f1shy wrote:
| That is 64 iterations for a double, that is nothing!
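For reference, the CORDIC iteration itself is tiny. A sketch in Python (floating point here for readability; a hardware version replaces each multiply by 2^-i with a right shift in fixed point, which is the whole appeal):

```python
import math

N = 32                                   # ~1 bit of accuracy per iteration
ANGLES = [math.atan(2.0 ** -i) for i in range(N)]
K = 1.0                                  # aggregate scaling of the rotations
for i in range(N):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_sin_cos(theta):
    """sin/cos for theta in [-pi/2, pi/2] via CORDIC micro-rotations."""
    x, y, z = K, 0.0, theta
    for i in range(N):
        d = 1.0 if z >= 0.0 else -1.0    # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x                          # (sin, cos)
```

32 iterations leaves a residual angle of about 2^-31, which is exactly the one-bit-per-iteration convergence being complained about.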
| perihelions wrote:
| CORDIC is pretty obsolete, AFAIK. Its advantage is that its
| hardware requirements are absolutely tiny: two (?) accumulator
| registers, and hardware adders and shifters--I think that's
| all. No multiplication needed, in particular. Very convenient
| if you're building things from _discrete transistors_, like
| some of those earlier scientific calculators!
|
| (Also has a nice property, apparently, that CORDIC-like
| routines exist for a bunch of special functions and they're
| very similar to each other. Does anyone have a good resource
| for learning the details of those algorithms? They sound
| elegant).
| pclmulqdq wrote:
| CORDIC still is used in tiny microcontrollers (smaller than
| Cortex-M0) and in FPGAs when you are very resource-
| constrained. Restricting the domain and using Chebyshev/Remez
| is the way to go pretty much everywhere.
| f1shy wrote:
| Multiplication is pretty much needed in CORDIC! And it is far
| from obsolete! It works perfectly fine, and doesn't have any
| of the problems described in the article.
| perihelions wrote:
| It uses only multiplication by powers of two, which is a bit
| shift (or an exponent adjustment in floating point).
| bigbillheck wrote:
| Why would you ever use CORDIC if you had any other option?
| vardump wrote:
| It's great for hardware implementations, because it's simple
| and you get good/excellent accuracy. I wouldn't be surprised
| if that's still how modern x86-64 CPUs compute sin, cos, etc.
|
| That said, last time I had to do that in software, I used
| Taylor series. Might not have been an optimal solution.
|
| EDIT:
|
| AMD's Zen 4 takes 50-200 cycles (latency) to compute sine. I
| think that strongly suggests AMD uses CORDIC.
| https://www.agner.org/optimize/instruction_tables.pdf page
| 130.
|
| Same for Intel, Tiger Lake (Intel gen 11) has 60-120 cycles
| of latency. Page 353.
|
| I'd guess usually ~50 cycles for Zen 4 (and ~60 for Intel)
| for the float32, float64, and float80 datatypes. Denormals
| might also cost more cycles.
| bigbillheck wrote:
| They switched away from CORDIC at one point:
| https://www.intel.com/content/www/us/en/developer/articles/t...
|
| (there doesn't seem to actually be a linked article there,
| just the summary)
| vardump wrote:
| Pretty weird Intel's sine computation latency hasn't
| changed all that much over the years. Latencies have been
| pretty similar for 20 years.
|
| EDIT: That's a paper for a software library, not the
| CPU's internal implementation. Which is probably still
| done with CORDIC.
| bigbillheck wrote:
| > EDIT: That's a paper for a software library, not the
| CPU's internal implementation.
|
| Unless you're seeing something I'm not, it's talking
| about x87, which hasn't been anything other than
| 'internal' since they stopped selling the 80486sx.
| eska wrote:
| You might also find this video interesting:
|
| "Finding the BEST sine function for Nintendo 64"
|
| https://www.youtube.com/watch?v=xFKFoGiGlXQ
| perihelions wrote:
| Is this still current? The paper has a publication year of 1999.
| jxy wrote:
| It's much clearer if you read the source code of one of the
| libm implementations.
|
| Plan 9: https://9p.io/sources/plan9/sys/src/libc/port/sin.c
|
| Freebsd: https://cgit.freebsd.org/src/tree/lib/msun/src/k_sin.c
| deepthaw wrote:
| Ignorant question:
|
| Given the ridiculous number of transistors and so on we can use
| in CPUs and GPUs nowadays how feasible is a relatively huge trig
| lookup table burned into rom?
| bigbillheck wrote:
| Seems like a terrible idea on latency grounds alone.
| zokier wrote:
| There is a huge number of 64-bit floats, and a huge portion
| of those are between 0 and pi/2.
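That point is easy to check: consecutive positive doubles have consecutive bit patterns, so subtracting the patterns counts the representable values in an interval.

```python
import math
import struct

def ordinal(x):
    """Rank of a non-negative double among non-negative doubles (its raw bit pattern)."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

# Number of representable doubles strictly between 0 and pi/2:
# roughly 4.6e18, about half of all positive finite doubles --
# far too many to tabulate.
count = ordinal(math.pi / 2) - ordinal(0.0)
```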
| azhenley wrote:
| I blogged my adventure of implementing cosine from scratch and
| how others have done it:
|
| https://austinhenley.com/blog/cosine.html
___________________________________________________________________
(page generated 2024-03-07 23:01 UTC)