[HN Gopher] How do computers calculate sine?
___________________________________________________________________
How do computers calculate sine?
Author : vegesm
Score : 63 points
Date : 2024-03-07 18:59 UTC (4 hours ago)
(HTM) web link (androidcalculator.com)
(TXT) w3m dump (androidcalculator.com)
| paulpauper wrote:
| sine is easy because the series is globally convergent and fast
| converging
| ot wrote:
| Did you read the article? It is specifically about how the
| series looks simple, but the error is actually very bad if you
| do things naively.
| paulpauper wrote:
| That still makes it easier than computing functions whose
| series are not globally convergent, like the inverse trig
| functions. Obviously, you would have to break it apart to
| speed convergence.
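The point under discussion is easy to make concrete. A short sketch (an illustrative naive evaluation, not any library's actual implementation): the series converges everywhere mathematically, but summing it naively in doubles falls apart for large arguments, while reducing the argument first restores accuracy.

```python
import math

def taylor_sin(x, terms=80):
    """Naive Taylor series for sin: x - x^3/3! + x^5/5! - ..."""
    total = 0.0
    term = x
    for n in range(terms):
        total += term
        # Next term: multiply by -x^2 / ((2n+2)(2n+3)).
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))
    return total

# Near zero the truncated series is essentially exact.
small_err = abs(taylor_sin(0.5) - math.sin(0.5))

# At x = 40 the intermediate terms peak around 1e16, so rounding
# error in the partial sums swamps the O(1) answer.
big_err = abs(taylor_sin(40.0) - math.sin(40.0))

# Reducing the argument into [-pi, pi] first fixes it.
reduced = math.remainder(40.0, 2 * math.pi)
reduced_err = abs(taylor_sin(reduced) - math.sin(40.0))
```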
| duped wrote:
| 1 - cos^2(x), obviously
| nh23423fefe wrote:
| instead, you could just double negate to optimize away the
| square root: -(-(sin(x)))
| ChainOfFools wrote:
| clear, but too verbose. 1/csc is what you want.
|
| or for style points just -cos'
| t-3 wrote:
| Link doesn't appear to be valid, but aren't these usually
| precalculated and stored in a lookup table?
| tails4e wrote:
| I've seen a third-order Taylor series used, but with the
| coefficients calculated at various offsets across a quarter
| wave. So you look up where you are in the quarter wave, then
| look up the 3 or 4 coefficients. This keeps the error bounded,
| since x stays small and the truncated series does not diverge
| much.
| amelius wrote:
| And arcsin?
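A minimal sketch of the quarter-wave scheme described above, assuming (hypothetically) 16 segments and cubic Taylor coefficients stored per segment:

```python
import math

SEGMENTS = 16                      # hypothetical table size
STEP = (math.pi / 2) / SEGMENTS

# Per-segment cubic Taylor coefficients of sin about each midpoint.
TABLE = []
for i in range(SEGMENTS):
    c = (i + 0.5) * STEP           # expansion point
    s, co = math.sin(c), math.cos(c)
    # sin(c + d) ~ s + co*d - (s/2)*d^2 - (co/6)*d^3
    TABLE.append((s, co, -s / 2.0, -co / 6.0))

def quarter_sin(x):
    """sin(x) for x in [0, pi/2]: table lookup, then a cubic in the offset."""
    i = min(int(x / STEP), SEGMENTS - 1)
    a0, a1, a2, a3 = TABLE[i]
    d = x - (i + 0.5) * STEP       # small offset, so the cubic is accurate
    return a0 + d * (a1 + d * (a2 + d * a3))
```

With |d| bounded by pi/64, the truncation error is on the order of d^4/24, roughly 2e-7: the bounded-error behavior the comment describes.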
| convolvatron wrote:
| http://steve.hollasch.net/cgindex/math/inccos.html is a great
| technique if you need a fast integer approximation for some
| arbitrary sampling interval (i.e. motor control)
| warpech wrote:
| This made me realize that trigonometric functions are not
| deterministic across different CPU architectures, OS, and
| programming languages (floating point precision aside).
|
| E.g. I would assume that Math.sin(x) returns the same thing in
| NodeJS on Windows and Mac/M1, but it turns out it is not
| necessarily so.
| https://stackoverflow.com/questions/74074312/standard-math-f...
| adgjlsfhk1 wrote:
| some languages (e.g. Julia) provide their own math library so
| that you get the same results across operating systems.
| TylerE wrote:
| Safer to assume that floats are never deterministic.
| jacobolus wrote:
| Floats follow a clear specification which determines
| precisely how basic arithmetic should work. They should work
| the same on all popular modern platforms. (Whether specific
| software libraries are the same is a separate question.)
| zokier wrote:
| But transcendentals like sine are not part of the strictly
| defined basic arithmetic; they are intentionally defined
| with relaxed behavior.
| gpderetta wrote:
| Even there many standard libraries provide very good
| precision at least within sane domain.
| TylerE wrote:
| Yeah, that's kind of my point. 99% consistent isn't.
| jacobolus wrote:
| If you implement sine in software using the same sequence
| of basic arithmetic instructions, the result should be
| the same across platforms. If you make two different
| implementations using different arithmetic, then of
| course you can't rely on them being the same.
| zokier wrote:
| Point being that IEEE 754 defines two sets of operations: the
| required operations (section 5), which must produce correctly
| rounded results, and the recommended operations (section 9),
| which have relaxed requirements. Sine belongs to the latter,
| so IEEE 754 does not mandate reproducible results for sine.
| jacobolus wrote:
| My understanding is that most software always uses some
| software implementation of sine, rather than calling a
| hardware instruction. Which is definitely what you should
| do if you care about getting the exact same results
| across platforms.
| aardvark179 wrote:
| Floats are well defined, and it is perfectly possible to
| reason about how algorithms based on them should behave. Few
| languages specify the accuracy of things like trig functions,
| so relying on them can be tricky, and JavaScript is
| particularly bad in that respect.
| saagarjha wrote:
| This is not always a safe assumption (in certain scenarios
| floating point results being nondeterministic has the
| possibility to introduce bugs and security issues) and is
| also a kind of sad way to look at the world. The response to
| "I don't understand how this works" should not be to adopt an
| incorrect viewpoint, but to know the limitations of your
| understanding.
| TylerE wrote:
| It's not that I don't understand, it's that I do. Floats
| are inherently lossy representations. Yes, this means the
| more operations you perform on a float input, the fuzzier
| the value is. You ignore that harsh reality at _your_ peril.
| If you find engineering rigor sad, I don't know what to
| tell you.
| saagarjha wrote:
| "Floats are not deterministic" is not engineering rigor,
| it's just wrong. They are specified precisely by IEEE-754
| in how they must behave and which operations are allowed
| to produce which results.
| TylerE wrote:
| IEEE 754 conforming floats conform to IEEE-754. If they
| actually conform. Low end devices with shitty software
| implementations often get the hard edge cases wrong.
| saagarjha wrote:
| Yes and when that happens it is important to know what
| went wrong rather than just handwaving "oh it's that
| float stuff again I can't trust it".
| jacobolus wrote:
| > _the more operations you perform on a float input, the
| fuzzier the value is_
|
| No, any float always precisely represents a specific
| number. The issue is that only a finite number of numbers
| are representable.
|
| Some algorithms are poorly conditioned and when
| implemented using floating point arithmetic will lead to
| a result that is different than what you would get in
| idealized real number arithmetic. That doesn't make any
| floating point value "fuzzy".
| cogman10 wrote:
| Well, what's fun is that (AFAIK) trigonometric functions tend
| not to be implemented in the newer floating point instructions,
| such as AVX or SSE.
|
| So while what you say is true about the x87 implementation of
| those functions, for anything targeting a machine built in the
| last 20 years it's likely the code will run consistently
| regardless of the architecture (barring architectural floating
| point bugs, which aren't terribly uncommon in the less
| significant bits and when overclocking comes into play).
|
| x86 compilers won't use x87 instructions when SSE2 and later
| are available. x87 is just a really weird and funky instruction
| set that's best left in the gutter of history.
| bnprks wrote:
| Sadly even SSE vs. AVX is enough to often give different
| results, as SSE doesn't have support for fused multiply-add
| instructions which allow calculation of a*b + c with
| guaranteed correct rounding. Even though this should allow
| CPUs from 2013 and later to all use FMA, gcc/clang don't
| enable AVX by default for the x86-64 targets. And even if
| they did, results are only guaranteed identical if
| implementations have chosen the exact same polynomial
| approximation method and no compiler optimizations alter the
| instruction sequence.
|
| Unfortunately, floating point results will probably continue
| to differ across platforms for the foreseeable future.
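The rounding difference described here can be shown without FMA hardware, using exact rational arithmetic to stand in for what a correctly rounded fused a*b + c would return (the values below are contrived so the effect is visible):

```python
from fractions import Fraction

a = 1.0 + 2.0 ** -30
b = 1.0 + 2.0 ** -30
c = -(1.0 + 2.0 ** -29)

# Unfused: a*b is rounded to a double first, discarding the 2^-60 term,
# and the subtraction then cancels everything that's left.
two_step = a * b + c

# A fused multiply-add rounds only once; here the exact result 2^-60 is
# itself representable, so an FMA would return it exactly.
exact = Fraction(a) * Fraction(b) + Fraction(c)
fused = float(exact)
```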
| cogman10 wrote:
| That's a bit of a different problem IMO.
|
| Barring someone doing a "check if AVX is available" check
| inside their code, binaries are generally compiled
| targeting either SSE or AVX and not both. You can
| reasonably expect that the same binary thrown against
| multiple architectures will have the same output.
|
| This, of course, doesn't apply if we are talking about a
| JIT. All bets are off if you are talking about javascript
| or the JVM.
|
| That is to say, you can expect that a C++ binary blob from
| the Ubuntu repo is going to get the same numbers regardless
| of the machine, since they generally target fairly old
| architectures.
| zokier wrote:
| > Barring someone doing a "check if AVX is available"
| check inside their code
|
| Afaik that is exactly what glibc does internally
| gpderetta wrote:
| GCC won't use FMA without fast-math though. Even when AVX
| is otherwise enabled.
| zokier wrote:
| Sure it will:
|
| > -ffp-contract=fast enables floating-point expression
| contraction such as forming of fused multiply-add
| operations if the target has native support for them
|
| > The default is -ffp-contract=off for C in a standards
| compliant mode (-std=c11 or similar), -ffp-contract=fast
| otherwise.
|
| https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#ind...
| gpderetta wrote:
| Oh, wow, forgot about fp-contract. It says it is off in C
| by default, what about C++?
| zokier wrote:
| Read closer, it defaults to fast, not off
| bee_rider wrote:
| Intel does, however, provide intrinsics for trig functions on
| AVX in their compilers. Not as good as having an instruction,
| of course.
| enriquto wrote:
| > x87 is just a really weird and funky instruction set that's
| best left in the gutter of history
|
| hmmm, can you use the long doubles in sse or avx? They are
| glorious, and as far as I see from playing with godbolt, they
| still require dirtying your hands with the x87 stack.
| cogman10 wrote:
| The 80-bit float? Not as far as I'm aware. However, it's
| fairly trivial to represent a higher-precision value as the
| sum of two 64-bit floats ("double-double", giving roughly 106
| bits of significand). And with the nature of AVX/SSE, you
| don't take much of a performance hit for doing that, as you
| are often operating on both parts of the pair with the same
| instruction.
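The building block for that two-float trick is an error-free transform such as Knuth's two-sum, which recovers exactly what ordinary addition rounds away (a sketch of the primitive, not a full double-double library; Python floats are IEEE doubles, so it demonstrates the same arithmetic):

```python
def two_sum(a, b):
    """Knuth two-sum: s = fl(a + b), and e such that a + b == s + e exactly."""
    s = a + b
    bp = s - a                 # the part of b that made it into s
    ap = s - bp                # the part of a that made it into s
    e = (a - ap) + (b - bp)    # everything the rounding discarded
    return s, e

# Plain addition loses the small addend entirely...
assert 1.0 + 2.0 ** -60 == 1.0
# ...but the (hi, lo) pair keeps it, for ~106 significand bits in total.
hi, lo = two_sum(1.0, 2.0 ** -60)
```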
| enriquto wrote:
| Do you know if there's language support for that? Are
| there obscure gcc options that make "long double" be
| quadruple precision floats?
| cogman10 wrote:
| Which language?
|
| For C++, there's this:
| https://en.cppreference.com/w/cpp/types/floating-point
| zokier wrote:
| You can just use standard C _Float128 type
| https://gcc.gnu.org/onlinedocs/gcc/Floating-Types.html
| aardvark179 wrote:
| Somewhat annoyingly, the ECMAScript standard only specifies
| that various math functions return an approximation but does
| not set any bounds on that approximation. So for many
| functions you could just return NaN and still be compliant.
| layer8 wrote:
| Isn't NaN the one value that can't possibly count as an
| approximation, because it's not a number and unordered? ;)
| aardvark179 wrote:
| You might think so, but if it's not specified in the
| standard...
| zardo wrote:
| It is the worst possible approximation though.
| layer8 wrote:
| I was submitting an interpretation of the standard.
| retrac wrote:
| Rounding transcendentals correctly has unknown time and space
| complexity. [1] Sort of brushes up against the halting problem.
| With limited precision, the upper limit becomes calculable but
| it's rather large - packages that offer correct rounding on
| 64-bit floating point use potentially hundreds of bytes to deal
| with a single floating point value. Dedicated circuitry to
| implement it fast would be big and complicated even by today's
| standards.
|
| https://en.wikipedia.org/wiki/Rounding#Table-maker's_dilemma
| microtherion wrote:
| P.J. Plauger's _The Standard C Library_ provides an
| implementation for all functions in the (then) C standard:
| https://www.amazon.com/Standard-Library-P-J-Plauger/dp/01383...
| dboreham wrote:
| This was my first use of open source, around 1978. I wondered how
| calculators and computers did trig functions, and was also using
| Unix V7. We had a large disk and kept the source on line. So I
| was able to find this:
| https://www.tuhs.org/cgi-bin/utree.pl?file=V7/usr/src/libm/s...
| and from there this book:
| https://www.biblio.com/book/computer-approximations-john-f-h...
| vardump wrote:
| CORDIC is how it's usually done in hardware (and FPGAs).
|
| https://en.wikipedia.org/wiki/CORDIC
| adgjlsfhk1 wrote:
| no it's not. cordic has awful convergence of 1 bit per
| iteration. pretty much everyone uses power series.
| f1shy wrote:
| That is 64 iterations for a double, that is nothing!
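For reference, the CORDIC iteration itself is tiny. A sketch in Python (floating point here for readability; a hardware version replaces each multiply by 2^-i with a right shift in fixed point, which is the whole appeal):

```python
import math

N = 32                                   # ~1 bit of accuracy per iteration
ANGLES = [math.atan(2.0 ** -i) for i in range(N)]
K = 1.0                                  # aggregate scaling of the rotations
for i in range(N):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * i))

def cordic_sin_cos(theta):
    """sin/cos for theta in [-pi/2, pi/2] via CORDIC micro-rotations."""
    x, y, z = K, 0.0, theta
    for i in range(N):
        d = 1.0 if z >= 0.0 else -1.0    # rotate toward the residual angle
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return y, x                          # (sin, cos)
```

32 iterations leaves a residual angle of about 2^-31, which is exactly the one-bit-per-iteration convergence being complained about.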
| perihelions wrote:
| CORDIC is pretty obsolete, AFAIK. Its advantage is that its
| hardware requirements are absolutely tiny: two (?) accumulator
| registers, and hardware adders and shifters--I think that's
| all. No multiplication needed, in particular. Very convenient
| if you're building things from _discrete transistors_, like
| some of those earlier scientific calculators!
|
| (Also has a nice property, apparently, that CORDIC-like
| routines exist for a bunch of special functions and they're
| very similar to each other. Does anyone have a good resource
| for learning the details of those algorithms? They sound
| elegant).
| pclmulqdq wrote:
| CORDIC still is used in tiny microcontrollers (smaller than
| Cortex-M0) and in FPGAs when you are very resource-
| constrained. Restricting the domain and using Chebyshev/Remez
| is the way to go pretty much everywhere.
| f1shy wrote:
| Multiplication is pretty much needed in CORDIC! And it is far
| from obsolete! It works perfectly fine, and doesn't have any
| of the problems described in the article.
| perihelions wrote:
| It uses only multiplication by powers of two, which is a bit
| shift (or an exponent adjustment in floating point).
| bigbillheck wrote:
| Why would you ever use CORDIC if you had any other option?
| vardump wrote:
| It's great for hardware implementations, because it's simple
| and you get good/excellent accuracy. I wouldn't be surprised
| if that's still how modern x86-64 CPUs compute sin, cos, etc.
|
| That said, last time I had to do that in software, I used
| Taylor series. Might not have been an optimal solution.
|
| EDIT:
|
| AMD's Zen 4 takes 50-200 cycles (latency) to compute sine. I
| think that strongly suggests AMD uses CORDIC.
| https://www.agner.org/optimize/instruction_tables.pdf page
| 130.
|
| Same for Intel, Tiger Lake (Intel gen 11) has 60-120 cycles
| of latency. Page 353.
|
| I'd guess usually ~50 cycles for Zen 4 (and ~60 for Intel)
| for the float32, float64, and float80 datatypes. Denormals
| might also cost more cycles.
| bigbillheck wrote:
| They switched away from CORDIC at one point:
| https://www.intel.com/content/www/us/en/developer/articles/t...
|
| (there doesn't seem to actually be a linked article there,
| just the summary)
| vardump wrote:
| Pretty weird Intel's sine computation latency hasn't
| changed all that much over the years. Latencies have been
| pretty similar for 20 years.
|
| EDIT: That's a paper for a software library, not the
| CPU's internal implementation. Which is probably still
| done with CORDIC.
| bigbillheck wrote:
| > EDIT: That's a paper for a software library, not the
| CPU's internal implementation.
|
| Unless you're seeing something I'm not, it's talking
| about x87, which hasn't been anything other than
| 'internal' since they stopped selling the 80486sx.
| eska wrote:
| You might also find this video interesting:
|
| "Finding the BEST sine function for Nintendo 64"
|
| https://www.youtube.com/watch?v=xFKFoGiGlXQ
| perihelions wrote:
| Is this still current? The paper has a publication year of 1999.
| jxy wrote:
| It's much clearer if you read the source code of one of the
| libm implementations.
|
| Plan 9: https://9p.io/sources/plan9/sys/src/libc/port/sin.c
|
| Freebsd: https://cgit.freebsd.org/src/tree/lib/msun/src/k_sin.c
| deepthaw wrote:
| Ignorant question:
|
| Given the ridiculous number of transistors and so on we can use
| in CPUs and GPUs nowadays how feasible is a relatively huge trig
| lookup table burned into rom?
| bigbillheck wrote:
| Seems like a terrible idea on latency grounds alone.
| zokier wrote:
| There is a huge number of 64-bit floats, and a huge portion
| of those are between 0 and pi/2.
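That point is easy to check: consecutive positive doubles have consecutive bit patterns, so subtracting the patterns counts the representable values in an interval.

```python
import math
import struct

def ordinal(x):
    """Rank of a non-negative double among non-negative doubles (its raw bit pattern)."""
    return struct.unpack("<q", struct.pack("<d", x))[0]

# Number of representable doubles strictly between 0 and pi/2:
# roughly 4.6e18, about half of all positive finite doubles --
# far too many to tabulate.
count = ordinal(math.pi / 2) - ordinal(0.0)
```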
| azhenley wrote:
| I blogged my adventure of implementing cosine from scratch and
| how others have done it:
|
| https://austinhenley.com/blog/cosine.html
___________________________________________________________________
(page generated 2024-03-07 23:01 UTC)