[HN Gopher] Cosine Implementation in C
       ___________________________________________________________________
        
       Cosine Implementation in C
        
       Author : opisthenar84
       Score  : 288 points
       Date   : 2023-03-31 03:03 UTC (19 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | spenczar5 wrote:
       | Here is a naive question. A lot of effort goes into dealing with
        | floating point imprecision and instabilities, right? So why
        | aren't these algorithms implemented with integers? A 64-bit
        | integer could represent the region from 0 to 1 with 2^64 evenly
        | spaced slices, for example.
       | 
       | Would that make things more accurate across operations? Is this
       | something that is done?
        
         | dzdt wrote:
         | Re: "imprecision and instabilities" on standard hardware there
         | really are neither. Floating point numbers have an exact
         | meaning (a particular fraction with an integer numerator and a
         | denominator that is a power of 2). There are exactly specified
         | rules for basic operations so that adding, subtracting,
         | multiplying, or dividing these fractions will give you the
         | nearest representable fraction to the exact result (assuming
         | the most common round-to-nearest mode).
         | 
         | The bit that appears as "imprecision and instabilities" is a
         | mismatch between three things: (1) what the hardware provides
         | (2) what controls or compiler optimisation settings the
         | programming language exposes (3) what the user expects.
         | 
         | There are several sticky points with this. One mentioned in the
         | parent article is higher precision. Starting from the original
         | 8087 floating point coprocessor x86 has supported both 80-bit
         | and 64-bit floating point in hardware, but programming language
          | support has mostly ignored this and only shows the user
          | "double" precision with 64 bits of storage allocated. But with
          | some compiler settings, parts of the computation would be done
          | at 80-bit precision with no user visibility or control over
          | which parts had that extra precision.
         | 
         | Newer hardware modes stick to 64 bit computation so you're
         | unlikely to run into that particular stickiness on a modern
         | programming stack. There is another that comes up more now:
         | fused multiply and add. Modern hardware can calculate
         | round(a*x+b)
         | 
         | in one step where a,x,b are floating point numbers (i.e.
         | fractions) and the "round" operation produces the nearest
         | representable fraction to the correct result. This is faster
         | and more accurate to calculate than doing the multiplication
          | and addition separately, which gives
         | round(round(a*x)+b)
         | 
         | But again programming language control is weak over whether and
         | when fused-multiply-and-add is used, and this is a detail that
         | almost all programmers almost always just want to ignore
         | anyway.
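          | 
          | A tiny illustration of the difference (the constants are just
          | picked to make the single rounding visible; note that whether
          | the plain a * x + b expression itself gets contracted into an
          | fma depends on your compiler's FP_CONTRACT / optimisation
          | settings, which is exactly the weak control mentioned above):
          | 
          |     #include <math.h>
          |     #include <stdio.h>
          | 
          |     int main(void) {
          |         double a = 1.0 + 0x1p-27;       /* 1 + 2^-27 */
          |         double x = 1.0 - 0x1p-27;       /* 1 - 2^-27 */
          |         double b = -1.0;
          | 
          |         /* exact product a*x = 1 - 2^-54, which rounds to 1.0 */
          |         double separate = a * x + b;    /* two roundings: 0.0 (unless contracted) */
          |         double fused    = fma(a, x, b); /* one rounding: -2^-54 */
          | 
          |         printf("separate: %.17g\n", separate);
          |         printf("fused:    %.17g\n", fused);
          |         return 0;
          |     }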
        
         | syockit wrote:
         | Speaking from no experience, but it should be doable. But since
         | in most applications you're going to use floating point anyway
         | (or, if you're using fixed point, maybe right bit shift or
         | maybe even do integer division until it fits the range of your
         | value), the extra precision would be discarded in the end.
        
         | jcranmer wrote:
         | What you're describing is fixed point. Fixed point absolutely
         | exists in practice, but it has an important limitation: its
         | accuracy is scale-dependent, unlike floating-point.
         | 
         | Suppose for example you were doing a simulation of the solar
         | system. What units do you use for coordinates--AUs, light-
         | seconds, kilometers, meters, even miles? With floating-point,
         | it doesn't matter: you get the same relative accuracy at the
         | end. With fixed point, you get vastly different relative
          | accuracy if your numbers are ~10^-6 versus ~10^6.
        
         | moonchild wrote:
         | That is called 'fixed point', and it is useful for some
         | applications. What makes floating point interesting is that it
         | can represent values with very different magnitudes; your
         | 64-bit fixed point number can't represent any number greater
         | than 1, for instance. Whereas a 32-bit float can represent
         | numbers with magnitudes between ~2^-127 and ~2^127.
        
           | psychphysic wrote:
            | Maybe cosine could be computed in fixed point and converted
            | to float before use, using identities to calculate the sine
            | and tangent functions.
        
             | bigbillheck wrote:
             | Seems like a lot of effort for no actual gain.
        
               | psychphysic wrote:
                | Just an option if magnitude is an issue. You only need
                | 0-1 plus a sign bit.
               | 
               | Fixed to float is easy enough. It'd let you use a fixed
               | point implementation for 0 - pi/2 benefiting from smaller
               | look up table and simpler implementation.
        
               | Someone wrote:
               | > It'd let you use a fixed point implementation for 0 -
               | pi/2 benefiting from smaller look up table
               | 
               | Why would that table be smaller if you use 2^64 values in
               | that range instead of the fewer than 2^55 doubles in that
               | range?
               | 
               | > and simpler implementation.
               | 
               | Why would it be simpler? With doubles, the hardware will
               | do scaling for you. With fixed point, you need a helper
               | function or macro to do multiplications, and those have
               | to use an intermediate result with more than 64 bits (in
               | this special case, some shortcuts are possible, but those
               | won't make the implementation simpler).
        
               | psychphysic wrote:
               | One or both of us have lost the thread here.
               | 
               | There was an exchange about using fixed point (which
               | apparently is being used according to other comments for
               | easier correctness verification, speed and simplicity).
               | Someone pointed out that float gives you a wider range of
               | values. But those values are not needed for cosine which
               | has a range of 0-1 in the interval 0 to pi/2.
        
               | dahart wrote:
               | > But those values are not needed for cosine which has a
               | range of 0-1 in the interval 0 to pi/2
               | 
               | Ah not quite true though; the "range" is also referring
               | to the exponent range of representable numbers, so don't
               | forget the small numbers near 0. You can get _much_
               | smaller numbers with much higher precision out of a
               | single-precision float than a fixed-point 64 bit [0,1]
               | number.
        
               | psychphysic wrote:
                | Ah, makes sense. I imagine that depending on how you
                | intend to use the numbers you need that precision at the
                | steeper parts too.
        
         | ChuckNorris89 wrote:
         | There should be plenty of C trigonometric algorithms made for
         | fixed point. They usually use lookup tables and interpolation.
         | Used a lot in embedded microcontrollers that don't have
         | hardware floating point or hardware trig functions, and in
         | automotive where floats were (still are?) a no-go due to the
         | dangerous float math pitfalls and the safety critical nature of
         | ECUs.
         | 
         | I remember writing a fast, small footprint, but very accurate
         | fixed point trig function for the microcontroller used in my
         | batcher project, many lives ago. I'll try to find it and put it
         | on github.
         | 
          | The neat thing is that you can use the native overflow wrap-
          | around property of C integers and processor registers
          | to rotate back and forth around the trigonometric circle
          | without needing to account for the edge cases of the 0/360
          | degree point. You do need to suppress MISRA warnings here,
          | though, as it will go nuts seeing that.
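          | 
          | To give a flavour of the approach (a from-scratch sketch, not
          | the code from that project): a 16-bit "binary angle" where
          | 0..65535 maps to 0..360 degrees, so unsigned overflow wraps
          | around the circle for free, plus a small table with linear
          | interpolation, all in Q15:
          | 
          |     #include <stdint.h>
          |     #include <stdio.h>
          |     #include <math.h>
          | 
          |     #define TABLE_BITS 8
          |     #define TABLE_SIZE (1 << TABLE_BITS)
          | 
          |     /* sin() over one full turn, Q15, filled once at startup */
          |     static int16_t sin_table[TABLE_SIZE + 1];
          | 
          |     static void init_table(void) {
          |         for (int i = 0; i <= TABLE_SIZE; i++)
          |             sin_table[i] =
          |                 (int16_t)lround(32767.0 * sin(2.0 * M_PI * i / TABLE_SIZE));
          |     }
          | 
          |     /* angle: 0..65535 == 0..2*pi; result in Q15 */
          |     static int16_t fixed_sin(uint16_t angle) {
          |         uint16_t idx  = angle >> (16 - TABLE_BITS);
          |         uint16_t frac = angle & ((1u << (16 - TABLE_BITS)) - 1);
          |         int32_t a = sin_table[idx], b = sin_table[idx + 1];
          |         return (int16_t)(a + (((b - a) * (int32_t)frac) >> (16 - TABLE_BITS)));
          |     }
          | 
          |     static int16_t fixed_cos(uint16_t angle) {
          |         return fixed_sin((uint16_t)(angle + 16384u)); /* +90 deg, wraps */
          |     }
          | 
          |     int main(void) {
          |         init_table();
          |         printf("cos(60 deg) ~ %f\n", fixed_cos(65536u / 6) / 32767.0);
          |         return 0;
          |     }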
        
           | rtfeldman wrote:
           | I'd love to see some implementations of fixed point trig
           | functions, but I haven't found any! Do you have any links by
           | any chance?
        
             | ChuckNorris89 wrote:
              | I don't have the code at hand to show you, nor do I have
              | any links to similar trig stuff, as I wrote everything from
              | scratch: first in Matlab to prove the concept, then in C,
              | since there wasn't anything similar online at the time. I
              | used the principles of fixed-point fractional math,
              | representing floating-point numbers as scaled integers in
              | Q notation, which is well documented for use in
              | microcontrollers and DSPs of the era (early to late '00s).
              | 
              | Here's the wiki gist with some C code snippet examples [1].
              | Here's a more detailed math paper that goes into the
              | weeds [2]. And here's a nice presentation from Motorola-
              | Freescale-NXP I just found [3], which I wish I had back
              | then; it explains the basics for everyone to understand,
              | and seeing those Motorola Windows 95 GUI tools brings a
              | nostalgic smile to my face.
             | 
             | [1] https://en.wikipedia.org/wiki/Q_(number_format)
             | 
              | [2] http://darcy.rsgc.on.ca/ACES/ICE4M/FixedPoint/FixedPointRepr...
              | 
              | [3] https://www.nxp.com/files-static/training_presentation/TP_CO...
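              | 
              | For a taste of what the Q15 math from [1] looks like in C
              | (a toy sketch rather than the project code mentioned
              | above): values are stored as value * 2^15 in an int16_t,
              | and a multiply needs a 32-bit intermediate plus a shift
              | back down.
              | 
              |     #include <stdint.h>
              |     #include <stdio.h>
              | 
              |     typedef int16_t q15_t;
              | 
              |     /* convert a constant in [-1, 1) to Q15 */
              |     #define Q15(x) ((q15_t)((x) * 32768.0 + ((x) >= 0 ? 0.5 : -0.5)))
              | 
              |     static q15_t q15_mul(q15_t a, q15_t b) {
              |         return (q15_t)(((int32_t)a * b) >> 15);  /* widen, then renormalize */
              |     }
              | 
              |     int main(void) {
              |         q15_t x = Q15(0.5), y = Q15(0.25);
              |         printf("0.5 * 0.25 ~ %f\n", q15_mul(x, y) / 32768.0); /* ~0.125 */
              |         return 0;
              |     }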
        
             | Someone wrote:
             | I'd Google "forth trig functions" and get, for example
             | 
             | https://web.stanford.edu/~dljaffe/SVFIG99/trig.html
             | 
              | https://www.hpmuseum.org/forum/thread-18123.html (which links to
              | https://archive.org/details/Forth_Dimension_Volume_04_Number...)
        
               | rtfeldman wrote:
               | Nice, I didn't realize Forth used them - thank you!
        
       | sylware wrote:
       | Programming accurate floating point calculations is where you
        | need a maths college/uni degree. 99% of the "other programming"
        | (including "AI") does not require skills higher than those from
        | high school.
        
       | djmips wrote:
       | sincos is better. Two for one deal. :)
        
       | pjmlp wrote:
       | If you care about this kind of stuff, the theory lies in
       | numerical methods, for example,
       | 
       | http://www.sci.brooklyn.cuny.edu/~mate/nml/numanal.pdf
        
       | rurban wrote:
        | Why does he link to his lame fork of musl instead of the official
        | FreeBSD sources for libm (taken from Sun, and used by musl)?
        | 
        | https://github.com/freebsd/freebsd-src/blob/main/lib/msun/sr...
        | or at least
        | https://git.musl-libc.org/cgit/musl/tree/src/math/__cos.c
        
         | erk__ wrote:
         | The history of the freebsd file is interesting in itself
        
       | claytongulick wrote:
       | This is the kind of thing that triggers my imposter syndrome big
       | time.
       | 
       | I've been coding professionally for almost 30 years, and I see
       | stuff like this and wonder if I'll ever be a real programmer.
        
         | Mike_12345 wrote:
         | This is Numerical Analysis, not "programming". It's a topic in
         | Applied Mathematics that you would need to study separately
         | from programming.
         | https://ocw.mit.edu/courses/18-330-introduction-to-numerical...
        
         | kolbe wrote:
          | There are thousands of different ways to be an expert in CS.
          | Implementing high-performance functions in numerical analysis
          | is only one.
        
         | bigdict wrote:
         | Just slow down and look at it line by line. Take detours to
         | make sure you understand every part.
        
           | claytongulick wrote:
           | Sure, but that's not what causes it.
           | 
           | I can reason through the code and understand it.
           | 
           | The thing that makes me wonder if I'll ever be a real
           | programmer when I grow up is that someone wrote this,
           | probably on a Thursday after lunch (at least, that's how my
           | brain imagines it).
           | 
           | They were able to conceptualize the math solution to a major
           | optimization problem, deal with all sorts of precision issues
           | with bit manipulation, and come up with a blazing fast
            | bedrock algorithm that's dazzling both in its simplicity and
           | complexity. And then likely popped out to a pub for a pint
           | and didn't think much about it again.
           | 
           | Intellectually, I understand that this person was
           | specialized, and may struggle mightily navigating the
           | complexity of modern web frameworks, something that I find
           | straightforward because I do it every day.
           | 
           | But for some reason, this stuff always strikes me as being
           | "real" programming, and what I'm doing as just plumbing,
           | piping together the bits that the "real" programmers write.
        
             | nicky0 wrote:
             | Your mind is inventing the part about them doing it on a
              | Thursday afternoon before popping out to the pub.
             | 
             | This could have taken weeks of research and refinement. You
             | can't tell from just looking at the final result.
             | 
             | Give yourself some credit. If you were tasked with
             | producing a cosine implementation, you'd pull out some
             | textbooks, read specs, research algorithms, try a few
             | ideas, test, iterate. You could come up with this. The only
             | reason you didn't is that you didn't need to make a cos
             | function yet; you had other stuff to do.
        
             | BeetleB wrote:
             | If it's any consolation, I worked with one of these teams
             | for a few years. Brilliant at the math stuff, and many
             | could not handle git. Or unit tests.
             | 
              | You've spent your time on X, they've spent their time on
              | Y. That's it. Most of these people are math people. They
              | _may_ have CS degrees, but they've spent the bulk of their
              | time in math.
             | 
             | Actually, now that I recall, one of them was a nuclear
             | physicist and the other was an industrial engineer -
             | neither had CS degrees. I don't know about the others.
        
             | nicoburns wrote:
             | IMO the part of programming that interfaces with humans and
             | requirements is just as tricky (in a different way) and
             | just as important.
             | 
             | Also, have you ever formally studied maths? People don't
             | just come up with this kind of thing. They read about
             | algorithms in textbooks. They might not have read this
             | specific algorithm (which may be original), but they'll
             | have read lots of similar ones.
        
             | nemo1618 wrote:
             | Do you ever feel a burning desire to just _solve the hell_
             | out of a particular problem? Usually this happens in the
             | context of performance optimization, but it can be
              | anything. Maybe you're supposed to be working on something
              | else, but all you can think about is your pet problem. You
              | throw a bunch of stuff at it, and mostly it doesn't work,
              | but the failures don't discourage you at all; making any
              | sort of progress is _addicting_. You look at other people's
              | code. You read blog posts and research papers. You fill
              | up notepads. None of this feels remotely like work. It's
              | _fun_, and you simply _cannot stop_ until you finally
              | grind the problem into the dust.
             | 
             | My guess is that the author of this code was possessed by a
             | spirit like this. Maybe I'm wrong; maybe he was just
             | assigned by his manager to write a cosine function that met
             | a certain performance threshold, and he immediately had a
             | general idea of how to do it, and banged it out in an
             | afternoon. But I doubt it.
             | 
             | My point is that, if you have felt this way about
             | programming, _you are doing "real" programming._ It's not
             | about whether you're writing a web framework or a cosine
             | function. It's about the thrill of the chase.
        
           | [deleted]
        
           | sillysaurusx wrote:
           | I don't think this would help anyone's imposter syndrome.
           | Taking hardcore math line by line will yield confusion.
           | 
           | A better approach might be to learn how cosine can be
           | approximated with a Taylor expansion, then realize this is
           | just an optimized version of that. (Even if it's a different
           | algorithm, it's still an approximation algorithm, which is
           | the point.)
           | 
           | Staring at math constants won't get you that insight.
        
         | jcranmer wrote:
          | You're probably more of a "real programmer" than whoever wrote
         | this code--it's numerics code, after all.
         | 
         | There's no real tricks or anything to this code, it's just a
         | polynomial evaluation of a Taylor series (which you probably
         | learned, and subsequently forgot, from calculus), with some
         | steps done using techniques for getting higher precision out of
         | a floating-point value, and (in some other functions in libm,
         | not here) occasional use of the bit representation of floating-
         | point values. It's standard techniques, but in a field that is
         | incredibly niche and where there is very little value in
         | delving into if you don't have a need for it.
        
           | opportune wrote:
           | Agreed with this. Also outside of very specific applications
           | like graphics libraries, robotics, game engines you would
           | never want to write something like this from scratch and
           | instead rely on an existing library or game engine. So unless
           | you're working in that area it's not imposter syndrome to
           | find it foreign, it's probably good!
           | 
           | I've never worked directly on floating point libraries like
           | this, but having dealt with similar stuff, I suspect if this
           | were in production it would end up peppered with IFDEF to
           | support various implementation differences across hardware
           | types, even if it is compliant with eg IEEE standards.
        
           | [deleted]
        
           | new2yc wrote:
            | If you don't care about "performance":
            | 
            |     /* assumes <math.h> for pow() and a factorial() helper
            |        that returns a double */
            |     double my_cos(double x) {
            |         double sum = 0;
            |         for (int n = 0; n < 10; n++) {  // 10 terms of Taylor Series
            |             sum += pow(-1, n) * pow(x, 2 * n) / factorial(2 * n);
            |         }
            |         return sum;
            |     }
            | 
            | Edit: code formatting
        
             | Sesse__ wrote:
             | Also if you don't care about accuracy at all. my_cos(2 *
             | M_PI) = 0.9965. my_cos(4 * M_PI) = -2917.7144 (!).
        
             | irchans wrote:
              | /* This code is silly, not super accurate, but fun */
              | 
              |     double cos2(double x, int n) {
              |         double numerator = 1.0;
              |         double denominator = 1.0;
              |         double pi = 3.14159265358979323846;
              |         int i;
              |         for (i = 1; i <= n; ++i) {
              |             numerator   *= pow((x - n * pi / 2.0), 2);
              |             denominator *= pow((n * pi / 2.0), 2);
              |         }
              |         return numerator / denominator;
              |     }
              | 
              | (Edit for format)
        
           | Sesse__ wrote:
           | It's not a Taylor series. It's a series, but a minimax
           | approximation, not Taylor.
           | 
           | Taylor series are essentially "Give me a polynomial p(x). At
           | x=a, I want the value to be b, the first derivative to be c,
           | the second derivative to be d, etc. etc.". (Usually, a=0,
            | also known as a Maclaurin series.) This gives you a _great_
           | approximation in the immediate vicinity of a, but usually a
           | pretty horrible one the further away you get; in a sense, you
           | spend all your approximation energy on that single point.
           | 
           | Minimax approximations are "Give me a polynomial p(x). Over
           | the area x=[a,b], I want it to never stray far from f(x);
           | give me the one that minimizes the worst-case distance". (The
           | distance function is usually either the absolute error
           | |p(x)-f(x)|, or the relative error |p(x)-f(x)|/|f(x)|, but
           | you can have an arbitrary weight function in some
           | implementations.) So you consider everything in that area to
           | be important, not just one point.
           | 
            | Taylor series are simple to calculate (which can help if you
            | need to calculate the coefficients on the fly, or by hand),
            | good for analysis, and easy to understand, so most engineers
            | will have heard of them in calculus. The Remez algorithm, on
            | the other hand, essentially requires a computer, and is
            | significantly more complicated to understand and implement.
            | (Remez is not the only way to generate minimax polynomials;
            | Chebyshev polynomials come close, and I believe Sollya uses
            | some other algorithm.) Both will converge to the actual
            | function given infinite terms, but minimax polynomials will
           | be _much_ closer in most real-life situations. Also, if you
           | are really pressed for it, you can have rational minimax
           | polynomials (p(x)/q(x)) that approximate tricky functions
           | _really_ well even at low degree, but of course at the cost
           | of a division.
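            | 
            | A quick way to see the "all the energy at one point"
            | behaviour: the error of the degree-6 Taylor polynomial for
            | cos is tiny at 0 and grows steadily toward the edge of
            | [0, pi/4]; a minimax polynomial of the same degree would
            | instead spread (and shrink) that worst-case error over the
            | whole interval. Sketch with exact Taylor coefficients only,
            | no minimax coefficients quoted:
            | 
            |     #include <math.h>
            |     #include <stdio.h>
            | 
            |     int main(void) {
            |         for (double x = 0.0; x <= M_PI / 4 + 1e-9; x += M_PI / 16) {
            |             double z = x * x;
            |             double taylor = 1 - z / 2 + z * z / 24 - z * z * z / 720;
            |             printf("x = %.4f   taylor - cos = %10.3e\n", x, taylor - cos(x));
            |         }
            |         return 0;
            |     }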
        
         | okaleniuk wrote:
         | This is a fascinating topic indeed. And the math behind
         | polynomial approximation is not that deep or unapproachable.
         | You just haven't found an explanation that works for you. Some
         | time ago I tried to make one with a sine as an example:
         | https://wordsandbuttons.online/sympy_makes_math_fun_again.ht...
         | 
        | It doesn't mention Taylor series anywhere, but if you make your
        | system out of N differential equations, you'll get N terms of
        | the power series.
        
       | kolbe wrote:
       | Memory model experts: if you define the constants in the way this
       | code has, will they be contiguous in memory? Or would it be
       | better to allocate these in an aligned(64) array?
        
       | stagger87 wrote:
       | Implementing low precision transcendental functions is much more
       | interesting IMO, as it allows for a wider variety of algorithms
       | and approaches. When you are limited to the IEEE definitions,
       | there really isn't much room for creativity and you have to
       | handle all the obscure edge cases and chase least significant
       | bits, bleh. That's not something you want to do if you need to
       | calculate a trillion atan's. If one is curious how these
       | algorithms work, check out the book "Elementary Functions".
        
         | l33t233372 wrote:
         | This function is completely constant in running time, so I
         | don't think these considerations are important here.
        
           | saulpw wrote:
           | You can always have a smaller constant!
        
             | l33t233372 wrote:
             | You sure can, but if you can beat a single IEEE
             | multiplication on an arbitrary processor my god that's a
             | program I hope to never ever in my life have to reverse
             | engineer.
        
           | dan_mctree wrote:
            | Never quite got this argument. You can frequently get 1000-
            | fold improvements, or even better, by optimizing within the
            | same algorithmic complexity. That frequently flips the bit of
            | "is this software usable".
            | 
            | Sometimes you can even make real-world performance
            | significantly better by ignoring the technically better
            | algorithmic complexity, especially when you're almost always
            | working with low n's.
        
         | mananaysiempre wrote:
         | The IEEE standard doesn't really require any particular
         | precision for transcendental functions IIUC? The original
         | standard doesn't even define them. The 2008 version includes
         | optional correctly-rounded transcendentals, but there seems to
         | be a total of one implementation, CRlibm, which is an academic
         | project of more than a decade (and a dead home page because of
         | the INRIA Forge shutdown). Your run-of-the-mill libm has little
         | to do with this.
        
           | lntue wrote:
            | The latest revision, IEEE 754-2019, does require correct
            | rounding for transcendental functions:
            | https://en.wikipedia.org/wiki/IEEE_754#Recommended_operation...
        
             | mananaysiempre wrote:
             | Right, it _requires_ correct rounding for the _optional_
             | correctly-rounded functions, which ISO C, for example, does
             | not require to correspond to your usual libm ones, or even
             | to be provided at all (even with __STDC_IEC_559__ or
             | __STDC_IEC_60559_BFP__). I _think_ I saw a spec for
             | correctly rounded functions for C somewhere, but even the
             | latest draft[1] of C23 is content with just reserving the
             | cr_* namespace and not mandating much for the usual
             | functions outside it (F.10 /16):
             | 
             | > IEC 60559 specifies correct rounding for the operations
             | in the F.3 table of operations recommended by IEC 60559,
             | and thereby preserves useful mathematical properties such
             | as symmetry, monotonicity, and periodicity. The
             | corresponding functions with (potentially) reserved
             | `cr_`-prefixed names (7.33.8) do the same. The C functions
             | in the table, however, are not required to be correctly
             | rounded, but implementations should still preserve as many
             | of these useful mathematical properties as possible.
             | 
              | [1]: https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3088.pdf
        
         | lifthrasiir wrote:
         | In reality though, a typical stock libm is not _that_ accurate;
         | their error generally ranges from 1 to 5 ulps with no actual
          | guarantees. I'm more interested in a hybrid libm with
         | configurable errors, from the most accurate possible (0.5 ulps)
         | to something like 2^20 ulps, and all in between.
        
       | cultofmetatron wrote:
        | Cosine and sine are easy thanks to the Taylor series. They are
        | both one-liners in Haskell. Tangent, on the other hand....
        
       | okaleniuk wrote:
       | I love these things! Approximation is art.
       | 
       | I have a similar example in my book Geometry for Programmers
       | (https://www.manning.com/books/geometry-for-programmers). There I
        | make a polynomial approximation for sine that is: 1) odd (meaning
        | antisymmetric); 2) precise at 0 and Pi/2 (and -Pi/2
        | automatically); 3) has precise derivatives at 0 and Pi/2 (and
        | -Pi/2); 4) has better overall precision than the equivalent
        | approximation with Taylor series; and it only consists of 4
        | terms! Adding more terms makes it impractical in the modern
        | world since we have hardware implementations for sine on CPUs and
        | GPUs.
       | 
       | Spoiler alert, the secret sauce is the integral equation ;-)
        
         | stkdump wrote:
          | CPUs do have a 'hardware' (I guess microcode) implementation,
          | dating back to the 80387-era FPU. But it isn't used. One reason
          | is that the software implementation is faster in modern
          | floating point code. I don't know about GPUs, but I would guess
          | compilers transparently take care of that.
        
           | sacnoradhq wrote:
           | It depends on the precision and accuracy and data types of
           | the arguments and result. The 80387 was absolutely faster
           | than software emulation for F32/64/80 by hooking real-mode
           | interrupt handlers the way it was back to the 8086/8087. For
           | I8/16, lookup tables were faster. For I32, a quality result
           | would be very expensive. There were no GPUs in the 387 era,
           | and there wasn't SIMD (MMX) either. In the x86 era, compilers
           | weren't that smart. If they were, then they were slow.
        
           | adgjlsfhk1 wrote:
           | the real problem is Intel screwed up their sin implementation
           | by only using 66 bits of pi and then refused to fix it.
        
         | boyka wrote:
         | The realization that a 6th order approximation is good enough
         | given our definition of float is beautiful.
        
           | commandlinefan wrote:
           | You can approximate 8 to 8 significant figures using
           | 987654321/123456789 = 8.0000000729
        
           | kolbe wrote:
           | It is technically 12th order, but every other coefficient is
           | zero, so it calculates a polynomial approximation in x^2
           | space
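            | 
            | Structurally it looks something like this (a sketch with
            | plain Taylor coefficients 1/4!, -1/6!, ... as stand-ins; the
            | real C1..C6 are presumably minimax-fitted values):
            | 
            |     #include <math.h>
            |     #include <stdio.h>
            | 
            |     /* cos(x) ~ 1 - z/2 + z*z*r, evaluated in z = x*x */
            |     static double cos_poly(double x) {
            |         double z = x * x;
            |         double r = 1.0/24 + z * (-1.0/720 + z * (1.0/40320 +
            |                    z * (-1.0/3628800 + z * (1.0/479001600))));
            |         return 1.0 - z * 0.5 + z * z * r;
            |     }
            | 
            |     int main(void) {
            |         printf("%.17g vs %.17g\n", cos_poly(0.5), cos(0.5));
            |         return 0;
            |     }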
        
         | okaleniuk wrote:
         | Oh, and I have a free explanation too:
         | https://wordsandbuttons.online/sympy_makes_math_fun_again.ht...
        
           | JunkEmu wrote:
           | That was a really fun read. Thanks for sharing!
        
       | olabyne wrote:
       | https://github.com/RobTillaart/FastTrig/blob/master/FastTrig...
       | 
        | If absolute precision is not required, there are many ways to
        | compute cosine!
        
       | himinlomax wrote:
       | What is y, "the tail of x"?
        
         | bonzini wrote:
         | You pass an argument, let's call it a, to cos. Because the
         | polynomial approximation is only valid for a relatively small
         | value of the argument, the first step is to compute the
         | remainder of a when divided by pi/2. The result, let's call it
         | x, is as small as possible but will (or should) satisfy
         | cos(x)=cos(a).
         | 
          | For example, let's say you're calculating cos(4). To reduce the
          | argument you want to compute 4-n*pi/2 for some n. In this case
          | n=2, so you want 4-pi, the smallest possible number whose
          | cosine equals cos(4) up to sign (the reduction step keeps track
          | of the quadrant). In double-precision floating point, pi is not
          | irrational: it's _exactly_ 7074237752028440/2^51. Thus, x=4-pi
          | will be 7731846010850208/2^53.
         | 
         | However, that's not enough to calculate cos(4) precisely,
         | because pi is actually 1.224... * 10^-16 more than the fraction
         | I gave before and thus cos(x) will not be exactly cos(4). To
         | fix this, the remainder is computed in multiple precision, and
         | that's where y comes in.
         | 
         | y is the result of computing (a-n*pi/2)-x with a value of pi
         | that has a higher precision than just 7074237752028440/2^51. It
         | is very small---so small that you don't need a polynomial to
         | compute the remaining digits of cos(4), you can just use the
         | linear approximation -x*y. But it's still very important in
         | order to get the best possible approximation of cos(a), even if
         | a is outside the interval -pi/2 to pi/2.
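          | 
          | A very rough sketch of the idea in C (not the real rem_pio2
          | code, which, as I understand it, splits pi/2 into several
          | pieces chosen so that multiplying them by n stays exact; here
          | n=2 happens to keep everything exact anyway). The pi_lo
          | constant is the ~1.2246e-16 difference mentioned above:
          | 
          |     #include <math.h>
          |     #include <stdio.h>
          | 
          |     int main(void) {
          |         double a = 4.0, n = 2.0;               /* the cos(4) example */
          |         double pi_hi = 3.141592653589793;      /* nearest double to pi */
          |         double pi_lo = 1.2246467991473532e-16; /* ~ pi - pi_hi */
          | 
          |         double x = a - n * (pi_hi / 2);  /* head of the reduced argument */
          |         double y = -n * (pi_lo / 2);     /* tail: the extra bits of pi */
          | 
          |         /* n=2 lands in the opposite quadrant, hence the sign flip */
          |         printf("cos(a)             = %.17g\n", cos(a));
          |         printf("-(cos(x)-sin(x)*y) = %.17g\n", -(cos(x) - sin(x) * y));
          |         return 0;
          |     }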
        
       | dboreham wrote:
       | My first experience with "Use the source Luke" was wondering as a
       | young lad how computers calculated trig functions. This led me to
       | the Unix V7 math lib source code[1], and thence to a book by Hart
       | & Cheney which by luck my university library had on the shelf.
       | Good times.
       | 
       | [1]
       | https://github.com/v7unix/v7unix/blob/master/v7/usr/src/libm...
        
         | abudabi123 wrote:
          | A like-minded reader may want to have a look at the Go
          | language's math standard library source.
         | 
         | * https://pkg.go.dev/math
        
           | chubot wrote:
           | Woah why do they have s390x assembly versions of everything?
           | 
            | https://cs.opensource.google/go/go/+/refs/tags/go1.20.2:src/...
           | 
           | That's for a mainframe?
        
           | cnity wrote:
        | What are the functions P and Q defined here?
        | https://cs.opensource.google/go/go/+/refs/tags/go1.20.2:src/...
        
             | hurrrrrrrr wrote:
             | Polynomials with the coefficients defined in the sin and
             | cos arrays (and likely in the referenced literature).
        
       | [deleted]
        
       | howerj wrote:
       | Personally I like Universal CORDIC, you can calculate much more
       | than Cosine, but it tends to be slower. See
       | https://en.wikibooks.org/wiki/Digital_Circuits/CORDIC#The_Un...
       | and a somewhat janky fixed point library that implements it
       | https://github.com/howerj/q
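        | 
        | For anyone who wants to see the shape of it, here is a minimal,
        | illustrative rotation-mode CORDIC in plain C (a sketch, not the
        | library above; the Q1.30 format, 28 iterations and the 1-radian
        | test angle are arbitrary choices, and the table is built with
        | libm just for brevity):
        | 
        |     #include <stdint.h>
        |     #include <stdio.h>
        |     #include <math.h>
        | 
        |     #define FRAC 30
        |     #define ONE  (1 << FRAC)
        |     #define ITER 28
        | 
        |     int main(void) {
        |         int32_t atan_tab[ITER];
        |         double k = 1.0;                      /* CORDIC gain correction */
        |         for (int i = 0; i < ITER; i++) {
        |             atan_tab[i] = (int32_t)(atan(pow(2, -i)) * ONE + 0.5);
        |             k *= 1.0 / sqrt(1.0 + pow(2, -2 * i));
        |         }
        | 
        |         int32_t x = (int32_t)(k * ONE), y = 0;
        |         int32_t z = (int32_t)(1.0 * ONE);    /* target angle: 1 radian */
        | 
        |         /* rotate (k, 0) towards the target using only shifts and adds */
        |         for (int i = 0; i < ITER; i++) {
        |             int32_t dx = y >> i, dy = x >> i;
        |             if (z >= 0) { x -= dx; y += dy; z -= atan_tab[i]; }
        |             else        { x += dx; y -= dy; z += atan_tab[i]; }
        |         }
        |         printf("cos(1) ~ %.9f, sin(1) ~ %.9f\n",
        |                (double)x / ONE, (double)y / ONE);
        |         return 0;
        |     }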
        
         | phazy wrote:
          | The CORDIC algorithm is great! Just a few months ago, I
          | implemented a sine generator with it, also using fixed-point
          | arithmetic. It is a real eye-opener to see that you can replace
          | multiplications with bit shifts when you store the numbers in
          | the "right" format.
        
         | userbinator wrote:
         | CORDIC is what many digital calculators use (along with decimal
         | floating point).
        
           | nsajko wrote:
           | > along with decimal floating point
           | 
           | Source? I'd find this a bit strange because:
           | 
           | 1. I suspect decimal floating point is quite expensive to
           | implement computationally.
           | 
           | 2. Not an expert, but I think the floating-point algorithms
           | and proofs for bases other than 2 are less well researched
           | and less well known.
           | 
           | 3. A calculator is a resource-constrained device.
           | 
           | 4. Are there even any real benefits? Yes, some human-friendly
           | numbers have an exact finite representation that is missing
           | in base-2, but I can't see why would that matter, compared to
           | just using 32-bit or 64-bit IEEE-754.
        
             | adgjlsfhk1 wrote:
              | Decimal floating point is pretty easy to implement in
              | hardware.
        
               | nsajko wrote:
               | You may be confusing floating-point with fixed-point?
               | 
               | Hardware with decimal floating-point only appeared in
               | 2007, and still doesn't seem to be easily available:
               | 
               | https://en.wikipedia.org/wiki/Decimal_floating_point
               | 
               | https://en.wikipedia.org/wiki/POWER6
        
               | Dylan16807 wrote:
               | I expect that's because the cost versus effort for having
                | _two_ kinds of floating point isn't very good. Not
               | because decimal floating point is notably more expensive
               | than binary floating point.
        
             | userbinator wrote:
             | One of the more memorable examples:
             | https://news.ycombinator.com/item?id=28559442
             | 
             | Decimal, and particularly BCD floating point, is much
             | simpler than binary for a calculator since the numbers
             | don't need any additional conversion steps to display.
             | 
             | And the fact that 0.1 + 0.1 works as expected is indeed a
             | huge benefit.
        
       | [deleted]
        
       | zhamisen wrote:
        | Very good explanation of the Remez algorithm:
        | https://xn--2-umb.com/22/approximation/index.html
        
       | vector_spaces wrote:
       | Does anyone know of a good reference on signals and systems for
       | people with a pure math background rather than an engineering or
       | physics background?
       | 
       | So, the implementation linked above uses the polynomial of best
       | approximation to cosine on an interval. That polynomial was
       | presumably obtained using the Remez algorithm, which, given a
       | function f and a set of n+2 points in your desired interval,
       | yields a polynomial of degree n best approximating f in the
       | interval (in the uniform norm sense)[1] by performing a sequence
       | of linear solves
       | 
       | I have a math and specifically approximation theory background,
       | and the wiki article on the Remez algorithm makes a lot of sense
       | to me. Like, this is the language I speak:
       | https://en.wikipedia.org/wiki/Remez_algorithm
       | 
       | I'm less comfortable though when I try to understand the
       | documentation for the remez implementation in SciPy because it is
       | using the language of signals and systems, which is how engineers
       | think about this stuff:
       | https://docs.scipy.org/doc/scipy/reference/generated/scipy.s...
       | 
       | Signals seems to use a lot of analogies to sound waves, and it
       | does map in some way or another to the mathematical language I'm
       | comfortable with. I've spent a little bit of time reading papers
       | and introductory texts on signals, but the definitions feel a
       | little loosey goosey to me.
       | 
       | Anyway, just opportunistically using this thread to ask if anyone
       | knows of some book or papers or blog posts that can help me
       | translate this stuff into math language :)
        
         | ur-whale wrote:
         | I feel your pain.
         | 
         | I've spent most of my career working at the intersection of
         | multiple highly technical domains, both theoretical and
          | applied, and a) the "jargon switch", b) the methodology
          | differences, and, worst of all, c) the switch of conceptual
          | framework required to work on a problem that spans multiple
          | domains is initially very irritating.
         | 
         | I'm pointing that out because if you are only looking for a
          | Rosetta stone to tackle only a), even if you find one, it'll
          | leave you wanting, because you'll still need maps to tackle b)
          | and c).
         | 
         | Example: analog electronics. From a mathematician's POV, an
         | analog circuit is just a system of differential equations.
         | 
          | But very few analog electronics specialists ever think about a
         | circuit that way.
         | 
         | The words they use are the least important part of the
         | "translation" problem. The way they _approach_ and _think
         | about_ the system will be utterly alien to a pure or even
         | applied math person.
         | 
         | Eventually, after learning the trade, you'll start to see
         | familiar patterns emerge that can be mapped back to stuff
         | you're familiar with, but a simple "this EE jargon means that
         | in math language" won't be enough: you're going to have to get
         | your hands dirty and learn _how_ they think of the problem.
        
         | lntue wrote:
         | I don't know much about numpy.signal package, but there are
         | other python packages that have good implementation for
         | generating polynomial approximation. Among them are
         | pythonsollya (a Python wrapper for Sollya), and SageMath.
         | 
         | Other than that, some of the overview references about Remez
         | algorithm (should be a bit closer to your taste) that I like
         | are:
         | 
          | - J.M. Muller et al., "Handbook of Floating-Point Arithmetic",
          | section 10.3.2 (and follow the references in that section).
          | 
          | - N. Brisebarre and S. Chevillard, "Efficient polynomial
          | L-approximations," 18th IEEE Symposium on Computer Arithmetic
          | (ARITH '07), which is implemented in the Sollya tool:
         | https://www.sollya.org/sollya-2.0/help.php?name=fpminimax
        
         | asdf_snar wrote:
         | Just noting that your link to docs.scipy is broken. I'm also in
         | a similar boat to you and would be interested in an answer to
         | this question.
        
           | vector_spaces wrote:
           | Thanks, fixed it!
        
         | stagger87 wrote:
         | The scipy documentation you linked is specifically for
         | generating a FIR filter using the Remez algorithm, which is why
         | it is written in that style. It is a very specific use case of
         | the algorithm and probably not the one I would recommend
         | learning from. Another top level comment has a link to a decent
         | overview of the Remez algorithm written without the "Signals
         | and Systems" language.
        
           | vector_spaces wrote:
           | Thanks, I think my request was unclear, but I'm specifically
            | trying to understand the documentation linked, not the Remez
           | algorithm itself. I'm trying to reconcile my knowledge of it
           | with the language used in the documentation.
           | 
           | I have a pretty comfortable understanding of the Remez
           | algorithm and generalizations and have written my own
           | implementations for a research project
        
             | stagger87 wrote:
             | Take a look at the 2 listed references by Parks and
             | McClellan. They are considered seminal papers in the field
             | of signal processing and are the basis of the function you
             | linked. And they will probably appeal to your math
             | background more.
             | 
             | The name of the algorithm is Parks McClellan filter design,
             | or sometimes you will see it named firpm, as popularized by
             | MATLAB.
             | 
              | https://en.wikipedia.org/wiki/Parks%E2%80%93McClellan_filter...
        
       | noobcoder wrote:
       | https://www.atwillys.de/content/cc/sine-lookup-for-embedded-...
       | 
       | Have you checked out this?
       | 
        | It's an alternative implementation which uses interpolated table
        | lookup and requires only 66 bytes of constant memory, with an
        | error of less than 0.05% compared to the floating point sine.
        
         | lifthrasiir wrote:
         | The (presumably relative) error of 0.05% implies only ~11
         | significant bits, while 32-bit and 64-bit floating point
         | formats have 24 and 53 significant bits respectively. If your
         | sine has only 11 correct bits and you are okay with that, why
         | do you have much more than 11 bits of _input_ precision after
         | all?
         | 
         | EDIT: Someone (sorry, the comment was removed before I could
         | cite one) mentioned that it is actually for Q1.15 fixed point.
         | So it was not even an "alternative" implementation...
        
           | [deleted]
        
       | schappim wrote:
       | This is a faster implementation of cosine that utilises six
       | static const double variables (C1, C2, C3, C4, C5, C6) to speed
       | things up. These variables represent the coefficients of a
       | polynomial approximation for the cosine function.
       | 
       | For the genuine article, check out:
       | https://github.com/ifduyue/musl/blob/master/src/math/cos.c
        
         | [deleted]
        
         | kolbe wrote:
         | Why do you return x-x for inf or nan? I can't test now, but I
         | imagine they both return nan?
         | 
         | I run a high performance math library in C#, and I'm always
         | looking to improve.
         | 
         | https://github.com/matthewkolbe/LitMath
        
           | jcranmer wrote:
           | Using the expression ensures that the FP exceptions are
           | properly set. It also does quieting of sNaN or NaN payload
           | propagation as appropriate.
        
           | hurrrrrrrr wrote:
           | It's an easy way to return nan and set FE_INVALID. If you're
           | not worried about floating point exceptions you can just
           | return NaN directly. Idk how much C# even guarantees about
           | the floating point operations and if it's applicable there.
           | 
           | https://en.cppreference.com/w/cpp/numeric/fenv/FE_exceptions
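            | 
            | A small standalone demo of the idiom (nothing C#-specific,
            | just ISO C): for an infinite argument, x - x yields NaN and
            | raises FE_INVALID, which returning a NaN literal would not:
            | 
            |     #include <fenv.h>
            |     #include <math.h>
            |     #include <stdio.h>
            | 
            |     #pragma STDC FENV_ACCESS ON
            | 
            |     int main(void) {
            |         volatile double x = INFINITY;
            |         feclearexcept(FE_ALL_EXCEPT);
            |         volatile double r = x - x;   /* inf - inf: invalid operation */
            |         printf("result %f, FE_INVALID %s\n", r,
            |                fetestexcept(FE_INVALID) ? "raised" : "not raised");
            |         return 0;
            |     }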
        
         | FabHK wrote:
         | It's the core implementation for a specific and very narrow
         | range of arguments. The "genuine article" moves the argument
         | into that range (via the rem_pio2 function performing range
         | reduction, which is worth an article in itself), and then calls
         | this core implementation.
        
       | zython wrote:
        | It's been a while since I've dealt with this; is this something
        | similar to the CORDIC algorithm? A couple of years ago I
        | implemented it in VHDL, very fun exercise.
       | 
       | https://en.wikipedia.org/wiki/CORDIC
        
       | metalrain wrote:
        | So why is that extra term needed here?
        | 
        |     w = 1.0-hz;
        |     return w + (((1.0-w)-hz) + (z*r-x*y));
        |                 ^^^^^^^^^^^^-- here
        
         | moonchild wrote:
          | Mathematically, we have (1 - (1 - hz)) - hz = hz - hz = 0, so
          | the computation seems redundant. But due to floating-
         | point rounding, it is not, and still creates information. As
         | the comment says:
         | 
         | > cos(x+y) = 1 - (x*x/2 - (r - x*y))
         | 
         | > For better accuracy, rearrange to
         | 
         | > cos(x+y) ~ w + (tmp + (r-x*y))
         | 
         | > where w = 1 - x*x/2 and tmp is a tiny correction term
         | 
         | > (1 - x*x/2 == w + tmp exactly in infinite precision)
         | 
         | (hz is x*x/2)
         | 
         | See double-double arithmetic -
         | https://en.wikipedia.org/wiki/Quadruple-precision_floating-p...
        
         | jcranmer wrote:
         | The idea is to be able to compute 1.0 - h * z + (z * r - x * y)
         | in high precision.
         | 
         | If h * z is slightly less than 1.0, then w = 1.0 - h * z can
         | lose a lot of bits of precision. To recover the bits that were
         | dropped in subtracting h * z from 1.0, we compute which bits of
         | h * z were used (the high bits) and then subtract those bits
         | from h * z to get only the low bits. We find the bits that were
         | used by simply undoing the operation: 1.0 - w is mathematically
         | the same as h * z, but is not the same as implemented due to
         | limited precision.
         | 
         | So (1.0 - w) - h * z is the extra bits that were dropped from
         | the computation of w. Add that to the final term z * r - x * y
         | _before_ that result is added to w, and we have a more precise
         | computation of w + z * r - x * y.
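          | 
          | A tiny demo of that head/tail split (an illustration, not the
          | libm code; the hz value is arbitrary):
          | 
          |     #include <stdio.h>
          | 
          |     int main(void) {
          |         double hz   = 0x1.5555555555555p-28; /* some small x*x/2-like value */
          |         double w    = 1.0 - hz;              /* rounded head */
          |         double tail = (1.0 - w) - hz;        /* the bits w dropped */
          |         printf("w    = %.17g\ntail = %.3g\n", w, tail);
          |         return 0;
          |     }
          | 
          | tail comes out tiny but nonzero; folding it into the tiny
          | (z * r - x * y) term before adding to w puts those bits back
          | into the final sum.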
        
           | bonzini wrote:
           | Also why does it do r = z * ..., and then does another
           | multiplication z * r at the end? Why not leave that z out and
           | do w * r at the end?
        
       | EVa5I7bHFq9mnYK wrote:
        | I remember an approximation function used in a vacuum-tube-based
        | surface-to-air system from the '60s: cos(x) = 0.7. As explained
        | to us rookies, that precision was enough to hit the target.
        
         | ogogmad wrote:
          | That can't be good for all x. You mean x = pi/4, or to 1 dp.
        
           | paulddraper wrote:
           | > That can't be good for all x.
           | 
           | It's within 1.3.
        
           | version_five wrote:
           | It's probably a pretty good 1st order guess that a surface to
           | air missile launcher is aimed at 45 degrees. It's not aimed
           | straight up and it's not flat on the ground.
        
             | Arrath wrote:
             | > It's not aimed straight up
             | 
             | Well that certainly depends!
             | 
             | The VLS in ships are exactly that, vertical launch. As are,
             | I believe, most Soviet/Russian large SAM systems like the
             | S300 and S400
        
           | dodnkskd wrote:
            | Well, cos(pi/4) happens to be the RMS value of the function.
        
           | Joker_vD wrote:
           | Eh, it's close enough. After all, in the time of war the
           | value of sine can go up to 4.
        
             | nroets wrote:
             | Only 4 ?
             | 
             | sin(1.5708) = 1
             | 
             | sin(1.5708 - 1.3170 i) = 2
             | 
             | sin(1.5708 - 2.0634 i) = 4
             | 
             | ...
        
               | dustingetz wrote:
               | wait what? ELI grad student, I know you can plug i into
               | the sine taylor series expansion but what is happening
               | here
        
               | hansvm wrote:
               | Sin/cos are tightly connected to the complex exponential
               | and exhibit a lot of such behavior.
               | 
                | In particular, sin(x+iy) = sin(x)cosh(y) + i
                | cos(x)sinh(y). Set x to zero out the cosine term, and
                | rely on the exponential growth of cosh to find a target
                | y value.
        
               | version_five wrote:
               | The above should be intelligible to someone with high
               | school math, no? I'd be interested in the grad student
               | version, just out of curiosity
        
               | nroets wrote:
               | Most highschool students mainly use sin to calculate the
               | lengths of sides of triangles. So it feels very
               | counterintuitive that sin can be greater than 1.
               | 
               | If you define sin as the solution to a particular
                | differential equation, then its relationship to the
               | hyperbolic functions is much more obvious.
        
               | some_furry wrote:
               | This is radians, which can go from 0 to tau (2 * pi,
               | about 6.28).
        
               | [deleted]
        
               | scythe wrote:
               | sin(x + pi/2) = cos(x) = cosh(ix)
               | 
               | Therefore, sin (pi/2 + i arcosh(2)) = cosh(-arcosh(2)) =
               | 2
               | 
               | where arcosh(y) is the familiar inverse of the hyperbolic
               | cosine, i.e. log(y + sqrt(y^2 - 1)), so we find sin(pi/2
               | + i log(2 + sqrt(3))) = 2.
        
         | marshray wrote:
          | I can see it now.
          | 
          |     float cos(float x)
          |     {
          |         return 0.7; // Chosen by fair dice roll
          |     }
        
       | mockery wrote:
        | Can someone shed light on these two steps in the approximation?
        | 
        |     *       since cos(x+y) ~ cos(x) - sin(x)*y
        |     *                      ~ cos(x) - x*y,
       | 
       | For the first, maybe it's an approximate identity I'd never heard
       | of?
       | 
       | For the second, it seems like it relies on X being small, but in
       | this code... Y might be small enough for sin==identity
       | approximation, but X isn't, is it?
        
         | dahart wrote:
         | The input is a double-double where y << x by definition. So the
         | first line is just using the standard cos identity cos(x+y) =
         | cos(x)cos(y) - sin(x)sin(y). Since y is very small, we can
         | approximate cos(y) ~= 1, and sin(y) ~= y. This yields the first
         | line.
         | 
         | The second line, I believe, is not relying on x being small,
         | but rather relying on sin(x)*y being small. The whole result of
         | the sin(x)*y term is approximately on the order of the LSB of
         | the result, and the _relative_ error of approximating sin(x)
          | using x is better than using 0, so this term will help produce
         | more accurate rounding of the bottom bits.
        
         | hurrrrrrrr wrote:
          | Looks like the angle sum identity.
          | 
          |     cos(x+y) = cos(x)*cos(y) - sin(x)*sin(y)     (always)
          |              ~ cos(x)*1      - sin(x)*y          (if y small)
          | 
          | Can't speak for the validity of small angle assumptions though.
        
         | lntue wrote:
         | I think the function inputs are in double-double format, so the
         | assumption is that the magnitude of y is significantly smaller
         | than the magnitude of x, ideally |y| < 2^(-52) |x|. So that 1
         | is a very good approximation for cos(y), since |cos(y) - 1| <
         | |y|^2 < 2^(-104) |x|^2. Similarly, if we assume |x| is also
         | small, |sin(x) - x| < |x|^3 is also a very good approximation
         | (same with sin(y) ~ y).
         | 
          | So using the cosine of sum formula:
          | 
          |     *  cos(x + y) = cos(x) cos(y) - sin(x) sin(y)
          |     *             ~ cos(x) - x*y.
        
       | squeaky-clean wrote:
       | Here's someone else's fast-sine code snippet I found and used
       | about a decade ago when I was working on an additive synthesizer
       | I never finished. It basically uses the same method, Taylor
       | series approximation.
       | 
       | Driving it as an oscillator relies on just a single integer
       | addition operation per sample to increment the phase, and relies
       | on integer overflow to reset the phase.
       | 
       | I like the comment from someone asking if it could really be so
       | fast since it has 7 whole multiplications in it. Ah I love the
       | realtime dsp world, so different from my python/JS day job.
       | 
       | https://www.musicdsp.org/en/latest/Synthesis/261-fast-sin-ap...
        
         | IIAOPSW wrote:
         | You might be interested to know about mod 2^m -1. Look at these
         | numbers:
         | 
         | 32640, 15420, 39321, 43690
         | 
         | In binary they look like: 1111111100000000, 0111100001111000,
         | 0011001100110011, 0101010101010101
         | 
         | How cute. They are like little one bit precision sine waves.
         | 
         | I assure you that these numbers multiplied by each other mod
         | (2^16 -1) are all equal to zero. When multiplied by themselves
         | they shift position. Test it for yourself. Long story short, in
         | the ring of integers mod 2^16 - 1, 2^x works just like e^pi i, these
         | numbers function just like sine and cosine, and you can even
         | make linear superposition out of them and use them as a basis
         | to decompose other ints into "frequency space."
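         | 
         | (A tiny C check one can run; it just prints the pairwise products
         | of those four constants reduced mod 2^16 - 1:)
         | 
         |     #include <stdio.h>
         |     #include <stdint.h>
         | 
         |     int main(void) {
         |         const uint32_t v[4] = { 32640, 15420, 39321, 43690 };
         |         for (int i = 0; i < 4; i++)
         |             for (int j = i + 1; j < 4; j++)
         |                 printf("%u * %u mod 65535 = %llu\n", v[i], v[j],
         |                        (unsigned long long)(((uint64_t)v[i] * v[j]) % 65535));
         |         return 0;
         |     }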
         | 
         | Even better, the number i is represented here. There's a number
         | in this modulo system which squares to 1111111111111110 (aka
         | -1). Once you see it, you can implement fast sine and cosine
         | directly through 2^{ix} = cos(x) + i sin(x).
         | 
         | Cool right.
         | 
         | Have fun.
        
           | squeaky-clean wrote:
           | Whoa that is awesome. You and Sesse__ are really inspiring me
           | to get a VST dev environment set up again this weekend.
        
         | Sesse__ wrote:
         | If you're doing a synth and want a sine wave, you usually won't
         | need to compute cos(x + at) from scratch every time. There's a
         | cheaper and more accurate trick.
         | 
         | First, figure out how much the phase (in radians) is going to
         | increase every sample. E.g., for 440 Hz at 44100 Hz sample
         | rate, you want 2pi * 440 / 44100 = 0.06268937721 (or so).
         | Precompute sin(a) and cos(a), once.
         | 
         | Now your oscillator is represented as a point in 2D space.
         | Start it at [1,0]. Every sample, you rotate this point around
         | the origin by a radians (this is a 2x2 matrix
         | multiplication, using the aforementioned sin(a) and cos(a)
         | constants, so four muladds instead of seven). You can now
         | directly read off your desired cos value; it's the x
         | coordinate. You get sin for free if you need it (in the y
         | coordinate).
         | 
         | If you run this for millions or billions of samples (most
         | likely not; few notes are longer than a few seconds), the
         | accumulated numerical errors will cause your [x,y] vector to be
         | of length different from 1; if so, just renormalize every now
         | and then.
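         | 
         | (For concreteness, a rough C sketch of the rotation trick; the
         | names osc_init/osc_next are made up, this isn't from any synth:)
         | 
         |     #include <math.h>
         | 
         |     typedef struct {
         |         double c, s;    /* current point: [cos(phase), sin(phase)] */
         |         double ca, sa;  /* per-sample rotation constants */
         |     } osc;
         | 
         |     void osc_init(osc *o, double freq_hz, double sample_rate) {
         |         double a = 2.0 * M_PI * freq_hz / sample_rate;
         |         o->ca = cos(a);  /* computed once per note */
         |         o->sa = sin(a);
         |         o->c = 1.0;      /* start at angle 0 */
         |         o->s = 0.0;
         |     }
         | 
         |     double osc_next(osc *o) {
         |         /* rotate [c, s] by a radians: four muladds */
         |         double c = o->c * o->ca - o->s * o->sa;
         |         double s = o->c * o->sa + o->s * o->ca;
         |         o->c = c;
         |         o->s = s;
         |         return c;        /* cos; o->s holds sin for free */
         |     }
         | 
         |     /* every so often, if needed:
         |        k = 1/sqrt(o->c*o->c + o->s*o->s); o->c *= k; o->s *= k; */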
         | 
         | Of course, if you need to change a during the course of a note,
         | e.g. for frequency or phase modulation (FM/PM), this won't
         | work. But if so, you probably already have aliasing problems
         | and usually can't rely on a simple cos() call anyway.
        
       | ducktective wrote:
       | Same vibes:
       | 
       |     float Q_rsqrt( float number )
       |     {
       |         long i;
       |         float x2, y;
       |         const float threehalfs = 1.5F;
       | 
       |         x2 = number * 0.5F;
       |         y  = number;
       |         i  = * ( long * ) &y;            // evil floating point bit level hacking
       |         i  = 0x5f3759df - ( i >> 1 );    // what the fuck?
       |         y  = * ( float * ) &i;
       |         y  = y * ( threehalfs - ( x2 * y * y ) );   // 1st iteration
       |         // y  = y * ( threehalfs - ( x2 * y * y ) );   // 2nd iteration, this can be removed
       |         return y;
       |     }
       | 
       | https://en.wikipedia.org/wiki/Fast_inverse_square_root#Overv...
        
         | 29athrowaway wrote:
         | It may be possible to rewrite this in a more readable way, with
         | fewer magic numbers, using constant expressions.
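         | 
         | (One possible cleanup, as a sketch rather than a claim about what
         | the original authors intended: memcpy instead of the pointer cast
         | removes the undefined behaviour, and the magic number gets a name.)
         | 
         |     #include <stdint.h>
         |     #include <string.h>
         | 
         |     static const uint32_t RSQRT_MAGIC = 0x5f3759df;
         | 
         |     float Q_rsqrt(float number) {
         |         uint32_t i;
         |         float y;
         | 
         |         memcpy(&i, &number, sizeof i);  /* reinterpret the float's bits */
         |         i = RSQRT_MAGIC - (i >> 1);     /* first guess via the exponent trick */
         |         memcpy(&y, &i, sizeof y);
         |         y = y * (1.5f - 0.5f * number * y * y);  /* one Newton-Raphson step */
         |         return y;
         |     }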
        
       | lntue wrote:
       | Some of the more recent implementations of cosine for single
       | precision that are correctly rounded for all rounding modes
       | according to the IEEE 754 standard:
       | 
       | - The CORE-MATH project: https://gitlab.inria.fr/core-math/core-
       | math/-/blob/master/sr...
       | 
       | - The RLIBM project: https://github.com/rutgers-apl/The-RLIBM-
       | Project/blob/main/l...
       | 
       | - The LLVM libc project: https://github.com/llvm/llvm-
       | project/blob/main/libc/src/math...
        
       | nhatcher wrote:
       | I recently had to implement some Bessel functions numerically.
       | It's fantastic to see that someone at Sun Microsystems
       | implemented all these procedures in the early 90's (I'm sure even
       | long before that). You will find these implementations in R,
       | Julia, musl, you name it!
       | 
       | I especially like the use of "GET_HIGH_WORD" and friends:
       | 
       | https://git.musl-libc.org/cgit/musl/tree/src/math/j0.c
       | 
       | It's a lot of fun to figure out what they meant by things like:
       | 
       | > if (ix >= 0x7ff00000) return 1/(x*x);
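       | 
       | (My reading of that line, with a sketch of what the macro does;
       | high_word here is a made-up helper, not the musl one:)
       | 
       |     #include <stdint.h>
       |     #include <string.h>
       | 
       |     /* GET_HIGH_WORD-style access: the upper 32 bits of an IEEE-754
       |        double hold the sign, the 11-bit exponent and the top 20
       |        mantissa bits. */
       |     static uint32_t high_word(double x) {
       |         uint64_t bits;
       |         memcpy(&bits, &x, sizeof bits);  /* well-defined bit copy */
       |         return (uint32_t)(bits >> 32);
       |     }
       | 
       |     /* With the sign already masked off, ix >= 0x7ff00000 means the
       |        exponent field is all ones, i.e. x is +/-Inf or NaN; 1/(x*x)
       |        then returns 0 for an infinity and propagates a NaN. */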
        
         | adgjlsfhk1 wrote:
         | you should check out Bessels.jl which has really high
          | performance implementations.
        
           | nhatcher wrote:
           | https://github.com/JuliaMath/Bessels.jl/blob/master/src/bess.
           | ..
           | 
           | Thanks! I love it, so easy to understand and follow.
           | 
           | My favourite work on the subject is Fredrik Johansson's:
           | 
           | https://github.com/fredrik-johansson/arb
           | 
           | Whenever I feel down and without energy I just read something
           | in there
        
             | adgjlsfhk1 wrote:
             | yeah, Fredrik's papers are great. A lot of the reason
             | Bessels.jl is so good is because ARB makes it easy for us
             | to test against a known source of truth (which makes
             | things like minimax polynomials and picking optimal
             | tradeoffs a ton easier).
        
               | nhatcher wrote:
               | You work with Julia? Nice! I actually use Julia to
               | generate my test cases:
               | 
               |     using Nemo
               |     CC = AcbField(100)
               |     values = [1, 2, 3, -2, 5, 30, 2e-8]
               |     for value in values
               |         y_acb = besseli(CC(1), CC(value))
               |         real64 = convert(Float64, real(y_acb))
               |         im64 = convert(Float64, imag(y_acb))
               |         println("(", value, ", ", real64, "),")
               |     end
        
       | IIAOPSW wrote:
       | Personally I go with the recursive definition. Use the trig
       | identity 2cos(x)^2 - 1 = cos(2x).
       | 
       |     2cos(x)^2 - 1 = cos(2x)
       |     2(2cos(x)^2 - 1)^2 - 1 = 2cos(2x)^2 - 1 = cos(4x)
       |     2(2(2cos(x)^2 - 1)^2 - 1)^2 - 1 = 2cos(4x)^2 - 1 = cos(8x)
       | 
       | Keep squaring, doubling and subtracting one on the left. Keep
       | raising by powers of 2 in the argument on the right. Eventually,
       | if you're working with rational numbers, multiplication by 2^k
       | moves all k fractional bits over to the integer side of the
       | decimal, and then integer times pi is the base case of the
       | recursion. If you rescale the units so that pi = 2^(n-1) for some
       | n, you don't even have to think about the non-fractional part.
       | The register value circling back to 0 is supposed to happen
       | anyway.
       | 
       | You could just as easily recursively define cos to "spiral
       | downwards in x rather than outwards", each recursive layer
       | reducing the argument of x by a factor of 2 until it reduces down
       | to the precomputed value cos(machine precision).
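       | 
       | (A toy C sketch of that "spiral downwards" variant: halve the
       | argument until it is tiny, take the small-angle value, then undo
       | the halvings with cos(2t) = 2*cos(t)^2 - 1. Illustrative only,
       | since the repeated squaring amplifies rounding error.)
       | 
       |     #include <math.h>
       | 
       |     double cos_by_halving(double x) {
       |         int k = 0;
       |         while (fabs(x) > 1e-4) { x *= 0.5; k++; }  /* reduce the argument */
       |         double c = 1.0 - x * x / 2.0;              /* cos of a tiny angle */
       |         while (k--) c = 2.0 * c * c - 1.0;         /* undo the halvings */
       |         return c;
       |     }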
       | 
       | This was all just circular reasoning
        
         | austinjp wrote:
         | > This was all just circular reasoning.
         | 
         | Nice :)
        
       | lifthrasiir wrote:
       | This is just a support function for cos (it doesn't do argument
       | reduction itself). See [1] for the actual function, and [2] &
       | [3] for the argument reduction code.
       | 
       | [1] https://github.com/ifduyue/musl/blob/master/src/math/cos.c
       | 
       | [2]
       | https://github.com/ifduyue/musl/blob/master/src/math/__rem_p...
       | 
       | [3]
       | https://github.com/ifduyue/musl/blob/master/src/math/__rem_p...
        
         | FabHK wrote:
         | To give a bit more background on this:
         | 
         | You really only need to compute one eighth of a circle segment
         | (so, zero to pi/4), everything else you can do by symmetry by
         | switching signs and flipping x, y. (And if you supported a
         | larger range, you'd waste precision).
         | 
         | The function supports double-double precision (which is a trick,
         | different from long double or quadruple precision, in which you
         | express a number as an unevaluated sum of two doubles, the head
         | (large) and the tail (much smaller)).
         | 
         | So, the "real" cosine function first takes its argument, and
         | then reduces it modulo pi/2 to the desired range -pi/4, +pi/4
         | (with minimal loss of precision - you utilise the sign bit, so
         | that's better than using the range 0, +pi/2) in double double
         | precision (rem_pio2 returns an int denoting which quadrant the
         | input was in, and two doubles), and those two doubles are then
         | passed to this function (or its cousin __sin, with the sign
         | flipped as appropriate, depending on the quadrant).
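         | 
         | (Paraphrased from memory of how that dispatch looks in musl's
         | cos.c, not the verbatim source, and with the small-|x| fast paths
         | omitted:)
         | 
         |     double cos(double x)
         |     {
         |         double y[2];
         |         int n = __rem_pio2(x, y);  /* quadrant + reduced double-double */
         | 
         |         switch (n & 3) {
         |         case 0:  return  __cos(y[0], y[1]);
         |         case 1:  return -__sin(y[0], y[1], 1);
         |         case 2:  return -__cos(y[0], y[1]);
         |         default: return  __sin(y[0], y[1], 1);
         |         }
         |     }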
         | 
         | Many people really have put in quite some care and thought to
         | come up with this neat and precise solution.
         | 
         | ETA: Double double (TIL that it's also something in
         | basketball?): https://en.m.wikipedia.org/wiki/Quadruple-
         | precision_floating...
         | 
         | Range reduction:
         | https://redirect.cs.umbc.edu/~phatak/645/supl/Ng-ArgReductio...
        
           | efitz wrote:
           | _You really only need to compute one eighth of a circle
           | segment (so, zero to pi /4), everything else you can do by
           | symmetry by switching signs and flipping x, y. (And if you
           | supported a larger range, you'd waste precision)._
           | 
           | In high school I was writing a paint program in BASIC on the
           | Apple ][. Drawing circles was mind-numbingly slow until I
           | read an article somewhere that pointed this out. It was like
           | magic.
        
         | toxik wrote:
         | Interesting idea to compare the absolute value by simply not
         | doing a float comparison.
         | 
         | I see they do a bit more than that now. Wonder how much time
         | that saves.
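         | 
         | (My reading of the trick, as a sketch with a made-up helper: the
         | high 32 bits of |x|'s bit pattern grow monotonically with |x|, so
         | a rough "|x| < threshold" check can be one integer compare.)
         | 
         |     #include <stdint.h>
         |     #include <string.h>
         | 
         |     static int abs_below(double x, uint32_t hi_threshold) {
         |         uint64_t bits;
         |         memcpy(&bits, &x, sizeof bits);
         |         uint32_t hi = (uint32_t)(bits >> 32) & 0x7fffffff;  /* drop the sign */
         |         return hi < hi_threshold;  /* e.g. 0x3fe921fb is about pi/4 */
         |     }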
        
       | mianos wrote:
       | This takes me back. I did the transcendental functions for a CP/M
       | C compiler in the early 80s using Chebyshev polynomials.
       | 
       | My tutor failed my implementation because my code formatting was
       | utterly crap. :) I am still laughing as there was no sin, cos in
       | the maths library at all at the time.
        
         | OnlyMortal wrote:
         | Bit shifted sin values in a pre-made table of integers could be
         | used so the maths was all integer based.
         | 
         | Commonly used on processors like the 6502 or 68000 for vector
         | graphics.
         | 
         | It was accurate enough for games.
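         | 
         | (A hypothetical sketch of that style of table, not from any
         | particular game: 256 entries per turn, values scaled by 256, so
         | every multiply is followed by a shift instead of a divide.)
         | 
         |     #include <stdint.h>
         | 
         |     /* precomputed elsewhere: sin256[i] = round(sin(2*pi*i/256) * 256) */
         |     extern const int16_t sin256[256];
         | 
         |     /* x*cos(a) - y*sin(a) in fixed point; cos is sin a quarter turn on */
         |     int32_t rotate_x(int32_t x, int32_t y, uint8_t angle) {
         |         int32_t c = sin256[(uint8_t)(angle + 64)];
         |         int32_t s = sin256[angle];
         |         return (x * c - y * s) >> 8;
         |     }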
        
           | mianos wrote:
           | It was floating point math. There was nothing in the CPU to
           | support it so it was ridiculously slow.
           | 
           | But, sometimes you just want to have those functions to do
           | some complex geometry.
        
           | richrichardsson wrote:
           | > It was accurate enough for games.
           | 
           | Unless like me you made a typo when copying the table from a
           | book and had a vector routine that mostly looked alright but
           | would jerk slightly when that one value that was incorrect
           | was chosen! <facepalm />
        
         | anonymousiam wrote:
         | I'm pretty sure that indent was around even in the early 80's.
         | Even if you couldn't be bothered to run it, your tutor could
         | have. Must not have been a very good tutor to fail you for
         | style.
        
           | Quarrel wrote:
           | How sure?
           | 
           | indent was written in 76, but didn't make it to BSD until 82.
           | 
           | CP/M was first released in 74!
           | 
           | It was a pretty different world.
        
             | anonymousiam wrote:
             | I was using CP/M since about 1978, but you're right --
             | AFAIK there never was a port of indent to CP/M. I did make
             | some assumptions, but developing for CP/M on CP/M was
             | somewhat painful. There just wasn't enough memory space to
             | work efficiently, or (often) even fit the compiled image
             | into memory (<~60k after BIOS/BDOS/CCP on CP/M 2.2).
             | Chebyshev polynomials certainly imply floating point, which
             | could be a _BIG_ memory hog with so little room.
             | 
             | If I had the resources at the time, I would have undertaken
             | such a development task on a more capable (Unix) platform.
             | Such an environment would have made it easier to develop and
             | debug the optimizations required. As you
             | say, indent was available in BSD since '82, and I know it
             | was present on the Sun2 systems I used in '83.
        
               | mianos wrote:
               | I had an Osborne 1. I had to swap disks multiple times to
               | compile anything. I remember the C compiler by some guy,
               | 'Walt Bolisvki'?, had the arguments passed in reverse on
               | the stack (last on the top), so printf and friends ended
               | up as crazy hacks.
               | 
               | I didn't have access to a Unix at the time, and there was
               | very limited open source, so indent was not a thing.
        
       | hardware2win wrote:
       | double_t hz,z,r,w;
       | 
       | Wtf?
       | 
       | *clicks request changes*
        
       | SeanLuke wrote:
       | Fascinating. But why are cos and sin implemented with different
       | polynomials?
        
       | vrglvrglvrgl wrote:
       | [flagged]
        
       | fwungy wrote:
       | It is possible to pass an (n-1)-degree polynomial through n
       | points, and since the sine/cosine curve is so simple you don't
       | need to take that many samples to get a good approximation.
       | 
       | My question is: would it not be faster to just store a lookup
       | table and interpolate? I recall hearing that's how it's done.
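       | 
       | (For concreteness, a minimal sketch of a lookup table with linear
       | interpolation; the size and the names are made up:)
       | 
       |     #include <math.h>
       | 
       |     #define LUT_SIZE 1024
       |     static double cos_lut[LUT_SIZE + 1];
       | 
       |     void lut_init(void) {
       |         for (int i = 0; i <= LUT_SIZE; i++)
       |             cos_lut[i] = cos(2.0 * M_PI * i / LUT_SIZE);
       |     }
       | 
       |     double cos_approx(double x) {
       |         double t = x / (2.0 * M_PI);
       |         t -= floor(t);              /* reduce to one period, [0, 1) */
       |         double pos = t * LUT_SIZE;
       |         int i = (int)pos;
       |         double frac = pos - i;
       |         return cos_lut[i] + frac * (cos_lut[i + 1] - cos_lut[i]);
       |     }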
        
         | 6nf wrote:
         | CPUs can do dozens of calculations for the cost of one L1 cache
         | fetch, so the polynomial is probably still going to be faster.
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-03-31 23:02 UTC)