[HN Gopher] Implementing Cosine in C from Scratch (2020)
___________________________________________________________________
Implementing Cosine in C from Scratch (2020)
Author : signa11
Score : 150 points
Date : 2022-03-29 16:38 UTC (6 hours ago)
(HTM) web link (austinhenley.com)
(TXT) w3m dump (austinhenley.com)
| aaaronic wrote:
| Mandatory Physics "troll" comment:
|
| For small angles, sin(x) ~= x and cos(x) ~= 1 -- the "small
| angle approximation"
| (https://en.wikipedia.org/wiki/Small-angle_approximation)
|
| It's actually kind of ridiculous how many times it comes up and
| works well enough in undergraduate Mechanics.
| andi999 wrote:
| Don't forget optics, where tan x = sin x
| aaaronic wrote:
| Yep, it's all over Physics, really. Many non-mechanical
| concepts are often mapped to a mechanical analog model (so
| many tiny oscillators out there!).
| Sharlin wrote:
| Useful in astronomy and telephoto photography as well.
| jhgb wrote:
| This is a perfectly nice Programming 101 example of a recursive
| function -- if an angle is too large, calculate its sine using
| the sine of the half-angle, otherwise return the angle (or, if
| you're fancy, some simple polynomial approximation of the
| sine). I'm sure everyone did this in school (we did, in fact).
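|
| A minimal sketch, using the triple-angle identity instead of the
| half-angle so no cosine is needed (my example, needs <math.h>):
|
|       double sin_rec(double x) {
|           if (fabs(x) < 1e-4)
|               return x;                      /* base: sin t ~ t */
|           double s = sin_rec(x / 3.0);
|           return 3.0 * s - 4.0 * s * s * s;  /* sin(3t) identity */
|       }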
| gugagore wrote:
| It's not obvious when you should switch over to an
| approximation (base case), so I'd say it's not a good example
| to introduce recursion. I have never seen it, and I did not
| do it in school.
| jhgb wrote:
| It's not obvious but that makes it a nice example in
| parameter optimization as well.
| scythe wrote:
| It's "obvious" if you remember how the Taylor series for
| sin(x) works. The error between x and sin(x) is bounded
| above by |x|^3/6. So there you go.
| aaaronic wrote:
| When the difference between iterations is less than
| whatever threshold of accuracy you're looking for, I'd
| assume.
|
| "Relaxation" algorithms for calculating fields work that
| way.
| dekhn wrote:
| IIRC this comes up in the derivation of the Boltzmann
| distribution. I also recall Stirling's approximation for x!.
| nwallin wrote:
| It's all well and good to have a library with fast sin/cos/tan
| functions, but always remember to cheat if you can.
|
| For instance, in a game, it's common to have a gun/spell or
| whatever that shoots enemies in a cone shape. Like a shotgun or
| burning hands. One way to code this is to calculate the angle
| between where you're pointing and the enemies (using arccos), and
| if that angle is small enough, apply the damage or whatever.
|
| A better way is to take the vector where the shotgun is
| pointing and the vector to a candidate enemy, and take the
| dot product of the two. Pre-compute the cosine of the angle of
| effect, and compare the dot product to that -- if the dot product
| is higher, it's a hit; if it's lower, it's a miss. (You can get
| away with _very_ rough normalization in a game, for instance
| using the rsqrt instruction in SSE2.) You've taken a potentially
| slow arccos operation and turned it into a fast handful of basic
| float operations.
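|
| A minimal C sketch of that test (my example; both vectors are
| assumed pre-normalized):
|
|       typedef struct { float x, y; } vec2;
|
|       static float dot(vec2 a, vec2 b) { return a.x*b.x + a.y*b.y; }
|
|       /* cos_half_angle precomputed once as cosf(cone_half_angle) */
|       int in_cone(vec2 aim, vec2 to_enemy, float cos_half_angle) {
|           return dot(aim, to_enemy) >= cos_half_angle; /* no acosf */
|       }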
|
| Or for instance, you're moving something in a circle over time.
| You might have something like:
|
|       float angle = 0;
|       for (...) {
|           angle += 0.01;
|           /* something with sin(angle), cos(angle) */
|       }
|
| Instead, you might do:
|
|       const float sin_dx = sin(0.01);
|       const float cos_dx = cos(0.01);
|       float sin_angle = 0;
|       float cos_angle = 1;
|       for (...) {
|           const float new_sin_angle = sin_angle * cos_dx + cos_angle * sin_dx;
|           cos_angle = cos_angle * cos_dx - sin_angle * sin_dx;
|           sin_angle = new_sin_angle;
|       }
|
| And you've replaced a sin/cos pair with 6 elementary float
| operations.
|
| And there's the ever popular comparing squares of distances
| instead of comparing distances, saving yourself a square root.
|
| In general, inner loops should never have a transcendental or
| exact square root in them. If you think you need it, there's
| almost always a way to hoist it from the inner loop out into an
| outer loop.
| aj7 wrote:
| So 57 years ago, there weren't any programming books. Well, there
| was one. McCracken and Dorn, Numerical Methods and Fortran
| Programming. A kindly childless couple, Ben and Bluma Goldin, who
| lived in the same apartment house in Brooklyn, bought me this
| book as a gift. An axiomatic lesson from the book was distrust of
| Taylor expansions, and the necessity of moving the expansion
| point of any function to a favorable location. Because harmonic
| functions are very well behaved, it should also be obvious that
| even a very coarse lookup table + interpolation should converge
| to high accuracy quickly. Finally, use of trigonometric
| identities to zero in on the desired angle is helpful. OK I'll
| shut up now.
| ok123456 wrote:
| A lot of BASIC implementations that had software support for
| floating-point numbers implemented it like this. Microsoft BASIC
| used a 6th-degree polynomial, IIRC.
| nonrandomstring wrote:
| Audio stuff uses different methods depending on the application.
| Additive oscillators need to be accurate and can use some closed
| forms and series, mostly people use Taylor and quadratic
| interpolation as the article shows. For tables, remember you only
| really need to compute and store one quadrant and shift it
| around. But for LFOs people use all manner of quick-and-dirty
| methods like Bhaskara.
|
| https://en.wikipedia.org/wiki/Bhaskara_I%27s_sine_approximat...
| dilyevsky wrote:
| I remember researching CORDIC for a pet FPGA project I had and
| finding some libs/discussions, so it's definitely used.
| xioxox wrote:
| Here's a solution which is vectorizable:
| http://gallium.inria.fr/blog/fast-vectorizable-math-approx/
|
| I've also read that Chebyshev polynomials are far better than
| Taylor series for approximating functions (e.g.
| https://en.wikipedia.org/wiki/Approximation_theory)
| freemint wrote:
| This matters less once you put things through the Horner
| scheme. And if you aren't, you're leaving performance on the
| table.
| dheera wrote:
| > So we are really only calculating cosine from 0 to pi.
|
| You could go further and do only 0 to pi/2, as it is mirrored and
| flipped for pi/2 to pi.
|
| Or you could go even further and do only 0 to pi/4, using a
| simple trig identity and your presumably parallel-developed sin
| function for the cos of pi/4 to pi/2.
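|
| A minimal sketch of that folding (my example; sin_kernel and
| cos_kernel are hypothetical [0, pi/4] approximations):
|
|       /* assumes x is already reduced to [0, pi] */
|       double cos_0_pi(double x) {
|           if (x > M_PI / 2)
|               return -cos_0_pi(M_PI - x);      /* mirror and flip */
|           if (x > M_PI / 4)
|               return sin_kernel(M_PI / 2 - x); /* cofunction */
|           return cos_kernel(x);
|       }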
|
| > One of the implementations is nearly 3x faster than math.h
|
| Is this true even for -O3?
| blt wrote:
| If you want sinusoidal oscillation over time, then you can
| integrate the second-order differential equation x'' = -w^2 x
| instead. This only costs a few arithmetic instructions per time
| step. On the other hand, it adds a dependency on state other than
| the current time, and you must be careful to use energy-
| conserving integration.
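|
| A tiny sketch of one energy-friendly choice, semi-implicit Euler
| (my example, not from the article):
|
|       /* integrates x'' = -w*w*x; x starts at cos(0) = 1 */
|       float x = 1.0f, v = 0.0f;
|       const float w = 2.0f, dt = 0.001f;
|       for (int i = 0; i < 10000; i++) {
|           v -= w * w * x * dt;  /* kick: update velocity first */
|           x += v * dt;          /* drift: then position */
|       }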
| duped wrote:
| Here's a quick way of doing it with one matrix multiplication;
| you just need to precompute cos(o) and sin(o) with a bit of
| intuition:
|
|       [ x(n+1) ]   [ cos(o)  -sin(o) ] [ x(n) ]
|       [ y(n+1) ] = [ sin(o)   cos(o) ] [ y(n) ]
|
| where
|
|       x(n) = cos(o n),  y(n) = sin(o n)
|       x(0) = 1,  y(0) = 0
|       o = frequency (radians) * time_step
|
| There are two ways to look at this, from the systems
| perspective this is computing the impulse response of a
| critically stable filter with poles at e^(+/-jo).
| Geometrically, it's equivalent to starting at (1, 0) and
| rotating around the unit circle by o radians each time step.
| You can offset to arbitrary phases.
|
| It's not suitable for numerous cases (notably if the frequency
| needs to change in real time), but provided that the
| coefficients sum to `1.0` it's guaranteed stable and will not
| alias.
| user_7832 wrote:
| Slightly off topic but kudos to the author for explaining the
| process extremely clearly. As someone without experience in low-
| level programming, this was quite understandable for me (albeit
| with uni-level maths knowledge).
| em3rgent0rdr wrote:
| Speaking of lookup tables, I recall reading about the DX-7
| (https://www.righto.com/2021/11/reverse-engineering-yamaha-dx...
| shared a few months ago on HN); the DX-7 has a 4096-entry 14-bit
| sine ROM built into the chip itself.
| anonymousiam wrote:
| I did the same thing for sqrt() seven years ago. It benchmarked
| at about five times faster than math.h/libm, but the trade was a
| reduction in the maximum value of the result. My version would
| not produce accurate results for inputs greater than 2^16. It did
| work very well for generating real-time laser pointing X/Y
| coordinates for an Archimedean Spiral.
|
| https://en.wikipedia.org/wiki/Archimedean_spiral
|
| (This was on an MSP430 platform with an FPU that only did
| multiplication.)
| adgjlsfhk1 wrote:
| I'm surprised this was faster than an initial guess (half the
| exponent and mantissa) followed by Newton's method.
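|
| A minimal sketch of that approach (my example; needs <stdint.h>):
|
|       float fast_sqrt(float x) {
|           union { float f; uint32_t i; } u = { x };
|           u.i = (u.i >> 1) + 0x1fc00000; /* halve exponent: guess */
|           float y = u.f;
|           y = 0.5f * (y + x / y);        /* Newton step */
|           y = 0.5f * (y + x / y);        /* and one more */
|           return y;
|       }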
| [deleted]
| ThePhysicist wrote:
| I remember doing this in Macromedia Flash as I wanted to make
| stuff go in circles and the language didn't have any higher math
| functions back then. Good ole times.
| todd8 wrote:
| The article uses a game movement function as a possible
| motivation for needing the cosine. If object.rotation is constant
| as the spiral animation seems to suggest, there is an important
| optimization for rapidly computing the sequence: cos(theta),
| cos(theta + delta), cos(theta + 2*delta), cos(theta + 3*delta),
| ...
|
| This comes up in movement under a constant rate of rotation as
| well as in Fourier transforms. See [1] for details, but the basic
| idea uses the simple trig identities:
|
|       cos(theta + delta) = cos(theta) - (a*cos(theta) + b*sin(theta))
|       sin(theta + delta) = sin(theta) - (a*sin(theta) - b*cos(theta))
|
| The values for a and b must be calculated, but only once if delta
| is constant:
|
|       a = 2 sin^2(delta/2)
|       b = sin(delta)
|
| By using these relationships, it is only necessary to calculate
| cos(theta) and sin(theta) _once_ for the first term in the series
| in order to calculate the entire series (until accumulated errors
| become a problem).
|
| [1] Press, William H. et al., _Numerical Recipes_, Third
| Edition, Cambridge University Press, p. 219.
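|
| A minimal sketch of the resulting loop (my example; theta, delta,
| and N are assumed given):
|
|       double a = 2.0 * sin(delta / 2) * sin(delta / 2);
|       double b = sin(delta);
|       double c = cos(theta), s = sin(theta); /* the only cos/sin calls */
|       for (int n = 0; n < N; n++) {
|           /* here c == cos(theta + n*delta), s == sin(theta + n*delta) */
|           double c_next = c - (a * c + b * s);
|           s = s - (a * s - b * c);   /* uses the old c */
|           c = c_next;
|       }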
| dancsi wrote:
| The implementation of the __cos kernel in Musl is actually quite
| elegant. After reducing the input to the range [-pi/4, pi/4], it
| just applies the best degree-14 polynomial for approximating the
| cosine on this interval. It turns out that this suffices for
| having an error that is less than the machine precision. The
| coefficients of this polynomial can be computed with the Remez
| algorithm, but even truncating the Chebyshev expansion is going
| to yield much better results than any of the methods proposed by
| the author.
| brandmeyer wrote:
| This has been the standard algorithm used by every libm for
| decades. It's not special to Musl.
| enriquto wrote:
| But isn't this code rarely called in practice? I guess on
| Intel architectures the compiler just calls the fsin
| instruction of the CPU.
| stephencanon wrote:
| No. FSIN has accuracy issues as sibling mentions, but is
| also much slower than a good software implementation (it
| varies with uArch, but 1 result every ~100 cycles is
| common; even mediocre scalar software implementations can
| produce a result every twenty cycles).
| jcelerier wrote:
| Wasn't there some blog article a few years ago which showed
| how glibc's implementation was faster than fsin?
| chrisseaton wrote:
| > I guess on intel architectures the compiler just calls
| the fsin instruction of the cpu.
|
| Do people do that in practice? It's on the FPU, which is
| basically legacy emulated these days, and it's inaccurate.
| enriquto wrote:
| > the FPU, which is basically legacy
|
| you'll pry my long doubles from my cold, dead hands!
| chrisseaton wrote:
| You're selectively quoting me - I said it's 'legacy
| emulated'. It's emulated using very long-form microcode,
| so basically emulated in software. I didn't say it was
| simply 'legacy'.
| jhgb wrote:
| The question is if you wouldn't be better served with
| double-doubles today. You get ~100 bits of mantissa AND
| you can still vectorize your computations.
| enriquto wrote:
| Sure. There should be a gcc flag to make "long double"
| become quadruple precision.
|
| The thing is, my first programming language was x86
| assembler and the fpu was the funniest part. Spent weeks
| as a teenager writing almost pure 8087 code. I have a lot
| of emotional investment in that tiny rolling stack of
| extended precision floats.
| colejohnson66 wrote:
| How is "quad floating point" implemented on x86? Is it
| software emulation?
| jhgb wrote:
| FSIN _only_ works on x87 registers which you will rarely
| use on AMD64 systems -- you really want to use at least
| scalar SSE2 today (since that is whence you receive your
| inputs as per typical AMD64 calling conventions anyway).
| Moving data from SSE registers to the FP stack _just_ to
| calculate FSIN and then moving it back to SSE will probably
| kill your performance even if your FSIN implementation is
| good. If you're vectorizing your computation over 4 double
| floats or 8 single floats in an AVX register, it gets even
| worse for FSIN.
| stephencanon wrote:
| Moving between x87 and xmm registers is actually fairly
| cheap (it's through memory, so it's not free, but it's
| also not _that_ bad). FSIN itself is catastrophically
| slow.
| jhgb wrote:
| Fair enough, and I imagine there may even be some
| forwarding going on? There often is when a load follows a
| store, if I remember correctly. (Of course this _will_ be
| microarchitecture-dependent.)
| [deleted]
| adgjlsfhk1 wrote:
| No. The fsin instruction is inaccurate enough to be
| useless. It gives 0 correct digits when the output is close
| to 0.
| enriquto wrote:
| > 0 correct digits when the output is close to 0
|
| this is an amusing way to describe the precision of sub-
| normal floating point numbers
| adgjlsfhk1 wrote:
| It's not just sub-normal numbers. As
| https://randomascii.wordpress.com/2014/10/09/intel-
| underesti... shows, fsin only uses 66 bits of pi, which
| means you have roughly no precision whenever
| abs(sin(x)) < 10^-16, which is way bigger than the biggest
| subnormal (10^-307 or so).
| lifthrasiir wrote:
| It is much more amusing if you describe it in ulps; for
| some inputs the error can reach > 2^90 ulps, more than
| the mantissa size itself.
| lifthrasiir wrote:
| x87 trig functions can be very inaccurate due to
| Intel's original sloppy implementation and subsequent
| compatibility requirements [1].
|
| [1] http://nighthacks.com/jag/blog/134/index.html
| toolslive wrote:
| Yup. Chebyshev is the way to go (after 30s of consideration)
| potiuper wrote:
| https://github.com/ifduyue/musl/blob/master/src/math/__cos.c
| ufo wrote:
| I think the polynomial calculation at the end looks
| interesting. It doesn't use Horner's rule.
| lifthrasiir wrote:
| It does use Horner's rule, but splits the expression into
| two halves in order to exploit instruction-level
| parallelism.
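|
| Schematically, the split looks like this (a generic sketch, not
| musl's exact code or coefficients):
|
|       /* p(z) = c0 + c1*z + c2*z^2 + ... + c5*z^5 */
|       double z2 = z * z;
|       double even = c0 + z2 * (c2 + z2 * c4);
|       double odd  = z  * (c1 + z2 * (c3 + z2 * c5));
|       double p = even + odd;  /* two independent dependency chains */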
| jacobolus wrote:
| Considering the form of both halves is the same, are
| compilers smart enough to vectorize this code?
| llimllib wrote:
| > Input y is the tail of x.
|
| What does "tail" mean in this context?
| lifthrasiir wrote:
| It is a double-double representation [1], where a logical
| fp number is represented with a sum of two machine fp
| numbers x (head, larger) and y (tail, smaller). This
| effectively doubles the mantissa. In the context of musl
| this representation is produced from the range reduction
| process [2].
|
| [1] https://en.wikipedia.org/wiki/Quadruple-
| precision_floating-p...
|
| [2] https://github.com/ifduyue/musl/blob/6d8a515796270eb6ce
| c8a27...
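|
| A minimal sketch of the representation itself (my example, not
| musl's code):
|
|       typedef struct { double hi, lo; } dd;  /* value = hi + lo */
|
|       /* Fast2Sum: a + b exactly as hi + lo, assuming |a| >= |b| */
|       dd fast_two_sum(double a, double b) {
|           double s = a + b;
|           double err = b - (s - a);  /* rounding error of s */
|           return (dd){ s, err };
|       }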
| jhgb wrote:
| Does it make sense to use a double-double input when you
| only have double output? Sine is Lipschitz with constant 1,
| so I don't see how this makes a meaningful difference.
| lifthrasiir wrote:
| The input might be double but the constant pi is not. Let
| f64(x) be a function from any real number to double, so
| that an ordinary expression `a + b` actually computes
| f64(a + b) and so on. Then in general f64(sin(x)) may
| differ from f64(sin(f64(x mod 2pi))); since you can't
| directly compute f64(sin(x mod 2pi)), you necessarily
| need more precision during argument reduction so that
| f64(sin(x)) = f64(sin(f64timeswhatever(x mod 2pi))).
| jhgb wrote:
| But am I correct in thinking that is at worst a 0.5 ulp
| error in this case? The lesser term in double-double
| can't be more than 0.5 ulp of the greater term and
| sensitivity of both sine and cosine to an error in the
| input will not be more than 1.
|
| Also, in case of confusion, I was specifically commenting
| on the function over the [-pi/4, pi/4] domain in https://
| github.com/ifduyue/musl/blob/master/src/math/__cos.c ,
| which the comment in
| https://news.ycombinator.com/item?id=30846546 was
| presumably about.
| adgjlsfhk1 wrote:
| Double rounding can still bite you. You are forced to
| incur up to half an ulp of error from your polynomial, so
| taking another half ulp in your reduction can lead to a
| total error of about 1 ulp.
| lifthrasiir wrote:
| Yeah, sine and cosine are not as sensitive (but note that
| many libms target 1 or 1.5 ulp error for them, so a 0.5
| ulp error might still be significant). For tangent
| however you definitely need more accurate range
| reduction.
| [deleted]
| skulk wrote:
| I learned how to write programs from Scratch[0]. Back in the
| pre-1.4 days (IIRC) there were no native sin/cos functions, so
| they had to be "implemented" somehow to translate arbitrary
| angles into delta (x,y). The way that I found people did this was
| as follows (for sin(A), cos(A)):
|
| Invisible sprite:
|
|       [ when I receive "sine" ]:
|           [ go to center ]
|           [ turn A degrees ]
|           [ move 1 step forward ]
|
| Now, anyone that wants to know sin(A), cos(A) just has to read
| that sprite's X and Y position.
|
| [0]: https://scratch.mit.edu
| em3rgent0rdr wrote:
| I guess Scratch (or Logo) is Turing complete, so someone could
| make a CPU built using a little turtle in a box where all
| functionality is built out of Logo commands.
| aaaronic wrote:
| I teach lookup tables specifically for sine/cosine (we just use
| one with a phase shift and modulus) on the Gameboy Advance. The
| hardware has no floating point unit support, so we do it in 8.8
| fixed point with 1-degree resolution.
|
| It turns out the BIOS on the hardware can do sine/cosine, but you
| have to do that via inline assembly (swi calls) and it's no
| faster than the simple lookup table.
|
| This is all just to get terms for the affine matrix when rotating
| sprites/backgrounds.
| waynecochran wrote:
| You are doing the polynomial evaluation wrong for a Taylor
| series. First of all, you should be using Horner's method to
| reduce the number of multiplications needed, and then you should
| be using FMA to increase the precision of the result.
| https://momentsingraphics.de/FMA.html
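|
| For instance, a minimal sketch (my example; fma() is the C99
| fused multiply-add from <math.h>):
|
|       /* 1 - z/2! + z^2/4! - z^3/6! in Horner form, z = x*x */
|       double cos_taylor(double x) {
|           double z = x * x;
|           double p = -1.0 / 720.0;    /* -1/6! */
|           p = fma(p, z, 1.0 / 24.0);  /* +1/4! */
|           p = fma(p, z, -0.5);        /* -1/2! */
|           return fma(p, z, 1.0);
|       }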
| phonebucket wrote:
| This is a fun article!
|
| Alternative avenues that complement the approaches shown:
|
| - Pade Approximants
| (https://en.wikipedia.org/wiki/Pad%C3%A9_approximant) can be
| better than long Taylor Series for this kind of thing.
|
| - Bhaskara I's sin approximation
| (https://en.wikipedia.org/wiki/Bhaskara_I%27s_sine_approximat...)
| is easily adaptable to cosine, remarkably accurate for its
| simplicity and also fast to calculate.
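|
| For reference, a minimal sketch of Bhaskara's formula (my
| transcription; valid for x in [0, pi], M_PI from <math.h>):
|
|       /* sin(x) ~ 16 x (pi - x) / (5 pi^2 - 4 x (pi - x)) */
|       double bhaskara_sin(double x) {
|           double t = x * (M_PI - x);
|           return 16.0 * t / (5.0 * M_PI * M_PI - 4.0 * t);
|       }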
| deepspace wrote:
| I was surprised that pre-calculating x*x as double xx = x*x
| made a difference.
|
| The author's compiler must not be great at optimization if this
| is the case.
| msk-lywenn wrote:
| My favorite implementation of a super fast but not that accurate
| cosine is from a now-defunct forum. Good thing we have the Web
| Archive!
| https://web.archive.org/web/20111104163237/http://www.devmas...
| karma_fountain wrote:
| You can also use the Chebyshev polynomials to economise a
| higher-degree Taylor series with a bit of a loss of accuracy.
| slim wrote:
| you can't because boycott Chebyshev
| rurban wrote:
| You can't boycott Chebyshev; he died in 1894 or so.
| gweinberg wrote:
| I think you should be able to get good results by precalculating
| the cosines for a set of angles and then using the formula for
| the cosine of the sum of two angles, combined with the Taylor
| series. The series converges very quickly for very small angles,
| right?
| dboreham wrote:
| This takes me back in time, to c. 1982 when I got curious about
| how computers calculated trig functions. Luckily I was using one
| of the first OSS systems (modulo AT&T lawyers): Unix V7. So I
| pulled up the source and took a look. You can still see it here:
| https://github.com/v7unix/v7unix/blob/master/v7/usr/src/libm...
| There's a comment referencing "Hart & Cheney", which is this book:
| https://www.google.com/books/edition/Computer_Approximations...
| Also luckily I had access to a library that had the book, which I
| was able to read, at least in part. This taught me a few things:
| that the behavior of a computer can be a white box, not a black
| box; the value of having access to the source code; to use
| comments effectively; that for many difficult tasks someone
| probably had written a book on the subject. Also that polynomials
| can approximate transcendental functions.
| hatsunearu wrote:
| Since cosine is even and periodic, you really only need, like,
| x = 0 to pi/4, and the rest you can infer just from that data.
| mhh__ wrote:
| That is how it's often implemented, i.e. shift back to the
| domain you actually need to be able to compute.
| stncls wrote:
| This is a very nice article, and I think the three initial
| objectives are fully attained (simple enough, accurate enough,
| fast enough).
|
| Some of the performance claims have a caveat, though: Lookup
| tables and micro-benchmarks don't mix well. At all.
|
| I just added a simple loop that thrashes the L3 cache every 500
| iterations (diff here:
| https://goonlinetools.com/snapshot/code/#sm4fjqtvjyn36dyednc...).
| Now the method recommended at the end (cos_table_0_001_LERP) is
| _slower_ than glibc's cos() (while still having an accuracy that
| is more than 10^8 times worse)!
|
| Time benchmark output:
|
|       cos_table_1_LERP        0.0038976235167879
|       cos_table_0_1_LERP      0.0042602838286585
|       cos_table_0_01_LERP     0.0048867938030232
|       cos_table_0_001_LERP    0.0091254562801794
|       cos_table_0_0001_LERP   0.0139627164397000
|       cos_math_h              0.0089332715693581
|
| Lookup table sizes:
|
|       cos_table_1_LERP        64 bytes
|       cos_table_0_1_LERP      512 bytes
|       cos_table_0_01_LERP     5040 bytes
|       cos_table_0_001_LERP    50280 bytes
|       cos_table_0_0001_LERP   502664 bytes
| adgjlsfhk1 wrote:
| How long does the cache thrashing loop take compared to 500
| iterations? I'm not sure how much you can thrash the cache while
| still being performance-bottlenecked by a table-based
| implementation of a trig function.
| stncls wrote:
| Yes, of course the cache thrashing takes a lot longer. But I
| modified the code to time only the cos() functions (see
| diff), using the rdtsc instruction. That's why it had to be
| every 500 iterations: if it were after each iteration, then
| rdtsc itself would become the bottleneck!
| adgjlsfhk1 wrote:
| My point wasn't about how you were timing the code; what I
| meant is that in the real world, I don't think there are
| programs where this will be an issue, because if they are
| thrashing their caches that much, they won't be bottlenecked
| on trig.
| they are really fast if you are calling a lot of them at
| once, but that's also the only case where they need to be
| fast. If only 1/10000 instructions is trig, it's OK if the
| trig is a little slower.
| anugrahit wrote:
| charlieyu1 wrote:
| I think an off-zero Taylor series along with a lookup table
| should at least be tried. You can approximate cos(x+a) using
| only cos(a) and sin(a), and the series converges really fast.
| SeanLuke wrote:
| A LERP lookup table is fast; but only somewhat slower, and
| probably quite a bit more accurate, would be a lookup table
| interpolated by a Catmull-Rom spline.
| marcodiego wrote:
| I don't know if this benchmark is trustworthy. Using a lookup
| table pollutes the cache. It definitely needs to be compared
| with interactions with other parts of the code to get a good
| performance measurement.
| naoqj wrote:
| >I couldn't find any notable projects on GitHub that use a table
| for trig functions, but I'm sure they exist.
|
| Doom?
| zekica wrote:
| Doom also uses the famous fast inverse square root method,
| which, although I know how it works, still amazes me.
| naoqj wrote:
| It doesn't, to the best of my knowledge. It appeared in an id
| game for the first time in Quake 3.
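|
| For reference, a sketch of the Quake 3 version (my transcription,
| using a union for the type pun; needs <stdint.h>):
|
|       float q_rsqrt(float number) {
|           union { float f; uint32_t i; } u = { number };
|           u.i = 0x5f3759df - (u.i >> 1);      /* the magic constant */
|           float y = u.f;
|           y *= 1.5f - 0.5f * number * y * y;  /* one Newton step */
|           return y;
|       }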
| freemint wrote:
| Julia?
| adgjlsfhk1 wrote:
| Julia uses a very similar approach to musl here. Reduction
| mod 2pi followed by polynomial kernel. We use tables for
| `exp` and `log`, but the trig functions have good reductions
| so we use those instead. The implementation is here for anyone
| interested:
| https://github.com/JuliaLang/julia/blob/03af78108afe6058f54c...
| dekhn wrote:
| lookup tables and lerp are the most common solution I've seen
| (for a wide range of functions). You can also memoize with an
| LRU cache.
| phkahler wrote:
| >> lookup tables and lerp are the most common solution I've
| seen
|
| The author missed one nice table method. When you need sin()
| and cos(), split the angle into a coarse and a fine angle.
| Coarse might be 1/256 of a circle and fine would be 1/65536 of
| a circle (whatever that angle is). You look up the sin/cos of
| the coarse angle in a 256-entry table. Then you rotate that by
| the fine angle, which uses another 256 sin and 256 cos entries.
| The rotation between coarse values is more accurate than linear
| interpolation and gives almost the same accuracy as a 65536-
| entry table using only 768 entries. You can use larger tables
| or even another rotation by finer angle steps to get more
| accuracy.
| LeifCarrotson wrote:
| > _Then you rotate that by the fine angle_
|
| Explain?
|
| Are you trying to reduce the error in the linear
| interpolation by describing the convex curve between the
| endpoints?
|
| The derivative of the cosine is (minus) the sine, and the
| derivative of the sine is the cosine... so I'd expect the
| required output of the fine angle table to interpolate 0.15
| between 0.1 and 0.2 degrees and to interpolate 45.15 between
| 45.1 and 45.2 degrees to be very different.
| marginalia_nu wrote:
| Remember that cos(a+b)=cos(a)cos(b)-sin(a)sin(b)
| phkahler wrote:
| >> Explain?
|
| Yeah I thought that might not be clear enough. Let's say
| you want 0.1 degree accuracy but don't want a 3600 entry
| table. Make one table for every 1 degree. You get to use
| the same table for sin and cos by changing the index. That
| is the coarse table with 360 entries. Then make another
| table with sin and cos values for every 0.1 degrees, or
| specifically 0.1*n for n going from 0-9. This is another 20
| values, 10 sin and 10 cos for small angles.
|
| Take your input angle and split it into a coarse (integer
| degrees) and fine (fractional degrees) angle. Now take the
| (sin(x),cos(x)) from the coarse table as a vector and
| rotate it using the sin & cos values of the fine angle
| using a standard 2x2 rotation matrix.
|
| You can size these tables however you need. I would not use
| 360 and 10 for their sizes, but maybe 256 and 256. This can
| also be repeated with a 3rd table for "extra fine" angles.
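|
| A minimal sketch with power-of-two sizes (my example; the four
| tables are assumed pre-filled, sin_c[i] = sin(2*pi*i/256) and
| sin_f[i] = sin(2*pi*i/65536), likewise for cos):
|
|       /* angle is a 16-bit fraction of a turn */
|       unsigned c = (angle >> 8) & 0xff, f = angle & 0xff;
|       /* rotate the coarse vector by the fine angle */
|       double s  = sin_c[c] * cos_f[f] + cos_c[c] * sin_f[f];
|       double co = cos_c[c] * cos_f[f] - sin_c[c] * sin_f[f];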
| marginalia_nu wrote:
| > You can also memorize with an LRU cache.
|
| That's probably not going to perform well for an operation this
| fast. The computation is faster than a main-memory read.
| dekhn wrote:
| You can fit quite a large LRU cache in the L2 cache of the
| processor. You never want to go to main memory for a lerp
| table.
| Sharlin wrote:
| In some cases (eg. oldskool 2D graphics) a 256-entry LUT is
| enough. And an 8-bit fixed-point format can be enough, so the
| whole table fits in 256 bytes.
| munificent wrote:
| Yeah, lerping into a lookup table is common in audio and (old
| school) games.
|
| A common trick to make that faster is to not use doubles and
| radians to represent the phase. Instead, represent phase using
| an integer in some power-of-two range. That lets you truncate
| the phase to fit in a single period using a bit mask instead of
| the relatively slow modulo.
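|
| A minimal sketch of the integer-phase trick (my example; the
| table is assumed pre-filled with one period of sine):
|
|       #include <stdint.h>
|       #define BITS 12
|       static float table[1 << BITS];
|
|       float next_sample(uint32_t *phase, uint32_t step) {
|           *phase += step;              /* wraps mod 2^32 for free */
|           return table[*phase >> (32 - BITS)]; /* top bits index */
|       }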
___________________________________________________________________
(page generated 2022-03-29 23:00 UTC)