https://win-vector.com/2024/05/09/solving-recurrence-relations/ < return [ ] * About Us + Company Information + Staff + Example Engagements * Service Offerings + Consulting + Training Overview + Use Cases * Blog * Talks and Presentations * Contact * Practical Data Science with R Menu Home * Win-Vector Blog * Company * @WinVectorLLC Twitter Win Vector LLC Data science advising, consulting, and training Solving Recurrence Relations By John Mount on May 9, 2024 * ( Leave a comment ) Introduction A neat bit of "engineering mathematics" is solving recurrence relations. The solution method falls out of the notation itself, and harkens back to a time where formal sums were often used in place of vector subscript notation. Unfortunately the variety of such solutions is small enough to allow teaching by memorization. In this note I try to avoid memorization, and motivate the methodology. I feel this is facilitated by separating a number of often smeared together representations (formulas, sequences, vectors, linear checks, characteristic polynomial, and polynomial check families) into distinct realizations. We are also going to emphasize calculating and confirming claims in Python. The problem A simple form of the recurrence problem is to write down a general solution for a subscripted family of linear equations such as the following F[n+2] = F[n+1] + F[n] where n is a subscript varying over all positive integers. Such a relation or equation can arise in number of situations or applications: * Time series analysis. * Estimating run times of algorithms. * Combinatorics. The above example is in fact the Fibonacci sequence. The question is: if we are given initial conditions F[1] = 1 and F[2] = 1, what is F[n] (for general non-negative integer n)? In this case the Wikipedia solution is F[n] = (r[1]^n - r[2]^n)/sqrt (5) where: * r[1] = (1 + sqrt(5))/2 * r[2] = (1 - sqrt(5))/2 Natural questions include: * Why is the solution in this form? * How do you find r[1] and r[2]? We will set up some notation and then solve a few examples. The Solution Vector space notation Let's formalize our notation a bit. First let's settle on working over the vector space of all infinite sequences of real numbers with non-negative subscripts. This is just saying we consider the infinite sequence F = (F[1], F[2], F[3], ...) we are solving for as one of many possible sequences. We can use "R [Z+]" as the group-ring style naming of this vector space. Now consider any such vector v that obeys the recurrence equations: v[n+2] = v[n+1] + v[n] for n = 1, 2, ... Let "S" denote the subset of R[Z+] that obey all of the above linear recurrence checks. We claim a few things about S (the set of solutions to our current example system): * The all zeroes vector is in S. So S is non-empty. * If v obeys all of the above constraints then so does c v for any constant c. * If u and v obey all of the above constraints then so does u + v. * By induction, all v[n] are completely determined by the values v [1] and v[2]. The first three observations tell us S is a vector subspace of R[Z+]. The fourth observation tells us this vector subspace is 2 dimensional, so any solution can be written as the linear combination of two basis solutions. For notational convenience we will associate functions with vectors in R[Z+]. For a function f() let [f(n) | n = 1, 2, ...] denote the graph or vector (f(1), f(2), f(3), ...). In particular we are interested in the vectors [r[1]^n | n = 1, 2, ...] = (r[1], r[1]^2, r [1]^3, ...) and [r[2]^n | n = 1, 2, ...] = (r[2], r[2]^2, r[2]^3, ...) (for the previously defined r[1] and r[2]). We claim [r[1]^n | n = 1, 2, ...] and [r[2]^n | n = 1, 2, ...] both obey the above linear check conditions. Inspecting the claimed Fibonacci solution Let's confirm both [r[1]^n | n = 1, 2, ...] and [r[2]^n | n = 1, 2, ...] both obey the linear check conditions. In my opinion one doesn't truly learn the math without working at least a few concrete examples. In[1]: # define [r1**n | n = 1, 2, ...] = f_r_1 import numpy as np r_1 = (1 + np.sqrt(5))/2 f_r_1 = np.asarray([r_1**n for n in range(1, 11)]) f_r_1 Out[1]: array([ 1.61803399, 2.61803399, 4.23606798, 6.85410197, 11.09016994, 17.94427191, 29.03444185, 46.97871376, 76.01315562, 122.99186938]) In[2]: # check f_r_1 obeys specified Fibonacci relations assert np.all([ np.abs(f_r_1[n+2] - (f_r_1[n+1] + f_r_1[n])) < 1e-8 for n in range(2, len(f_r_1) - 2) ]) In[3]: # define [r2**n | n = 1, 2, ...] = f_r_2 r_2 = (1 - np.sqrt(5))/2 f_r_2 = np.asarray([r_2**n for n in range(1, 11)]) f_r_2 Out[3]: array([-0.61803399, 0.38196601, -0.23606798, 0.14589803, -0.09016994, 0.05572809, -0.03444185, 0.02128624, -0.01315562, 0.00813062]) In[4]: # check f_r_2 obeys specified Fibonacci relations assert np.all([ np.abs(f_r_2[n+2] - (f_r_2[n+1] + f_r_2[n])) < 1e-8 for n in range(2, len(f_r_2) - 2) ]) In[5]: # generate a bit of the Fibonacci sequence using # the definitional recurrence F[n+2] = F[n+1] + F[n] f = [1, 1] for i in range(8): n = len(f) - 2 f.append(f[n + 1] + f[n]) f Out[5]: [1, 1, 2, 3, 5, 8, 13, 21, 34, 55] In[6]: # Wikipedia claimed solution prediction = (f_r_1 - f_r_2) / np.sqrt(5) prediction Out[6]: array([ 1., 1., 2., 3., 5., 8., 13., 21., 34., 55.]) In[7]: # confirm claimed combination matches Fibonacci sequence assert np.max(np.abs( np.asarray(f) - prediction )) < 1e-8 In[8]: # we can also solve for the mixture coefficients by # treating our sequences as column vectors and asking # a solver how to write the vector f as a linear # combination of the vectors f_r_1 and f_r_2. # # As a linear system in the first 2 terms this looks like: # # f[0] = soln[0] * f_r_1[0] + soln[1] * f_r_1[0] # f[1] = soln[0] * f_r_1[1] + soln[1] * f_r_1[1] # # And such a linear system is translated into # matrix notation as follows: soln = np.linalg.solve( np.asarray([ [f_r_1[0], f_r_2[0]], [f_r_1[1], f_r_2[1]], ]), [ f[0], f[1], ] ) soln Out[8]: array([ 0.4472136, -0.4472136]) In[9]: # our (identical) derived solution prediction2 = ( soln[0] * f_r_1 + soln[1] * f_r_2 ) prediction2 Out[9]: array([ 1., 1., 2., 3., 5., 8., 13., 21., 34., 55.]) In[10]: # confirm claimed combination matches Fibonacci sequence assert np.max(np.abs( np.asarray(f) - prediction2 )) < 1e-8 The neat trick The core of the solution follows from a neat trick: replace the subscripts with superscripts. This is very powerful trick. Let's see that in action. We gamble and hope some of the solutions are of the following simple form: [r^n | n = 1, 2, ...] = (r, r^2, r^3, ...), where r is a to be determined (possibly complex) number. Our claim is: if the number r is a solution to the polynomial equation (in x) x^2 = x + 1 then v = [r^n | n = 1, 2, ...] satisfies v[n+2] = v[n+1] + v[n] for n = 1, 2, ... . The polynomial is called "the characteristic polynomial" of the linear recurrence checks. The "trick" to this is: if x = r is a root of, or satisfies, x^2 = x + 1, then it also satisfies x^n+2 = x^n+1 + x^n (which is equivalent to x^n x^2 = x^n x + x^n 1) for all n >= 0 . So r^n+2 = r^n+1 + r^n for all n >= 0 which is exactly the claim v = [r^n | n = 1, 2, ...] is a solution to all of the subscripted v-checks. It pays to think of the characteristic polynomial p(x) as shorthand for the family of "check polynomials" x^n p(x) for n = 0, 1, 2, .... However, for some problems we will need to appeal directly to the family of check polynomials. In our case, the roots, or solutions, to this polynomial equation are the roots x^2 - x - 1 = 0. By the quadratic formula: * r[1] = (1 + sqrt(5))/2 * r[2] = (1 - sqrt(5))/2 This gives us two linearly independent solutions to the recurrence check equations: [r[1]^n | n = 1, 2, ...] and [r[2]^n | n = 1, 2, ...]. These solutions are enough to form a basis for the solution space S, so we know F is some linear combination of [r[1]^n | n = 1, 2, ...] and [r[2]^n | n = 1, 2, ...]. And we have already showed how to find the linear combination by setting up linear equations to match the first two entries of the vector. A harder example Suppose we want to solve the recurrence: W[n+2] = 6 W[n+1] - 9 W[n] for n = 1, 2, ... The above are the "W checks" we now want to satisfy. A solution to these is a specific vector of values we can substitute in for the W's, such that none of the equations are false. We will try the earlier solution strategy. We are then interested in roots of the corresponding characteristic polynomial: x^2 - 6 x + 9 = 0 The above polynomial factors into (x - 3)^2. So r[1] = r[2] = 3. So we know [3^n | n = 1, 2, ...] is a solution to the W checks. The space of solutions is again 2 dimensional. So to parameterize over all the possible solutions, we need a second linear independent solution. The question then is: how do we find a second linear independent solution? Dealing with repeated roots When our characteristic polynomial p(x) has repeated roots, the characteristic polynomial is not expressive enough to represent some solutions. However, the corresponding family of check polynomials x^n p(x) for n = 0, 1, ... are rich enough to find the extra solutions. We will use that when a polynomial p(x) has a repeated root (that is: for some number r, the fact that (x-r)^2 divides into p(x) with no remainder), then p(x) shares that root with p'(x) (where p'(x) is the derivative of p(x) with respect to x). Take the earlier "W check" polynomials p(x) = x^n+2 - 6 x^n+1 + 9 x^ n. Define the polynomial y(x) = x p'(x) = (n+2) x^n+2 - 6 (n+1) x^n+1 + 9 n x^n. y(x) itself is (by inspection) the check polynomials corresponding to the following (new) linear recurrence checks: (n+2) Y[n+2] = 6 (n+1) Y[n+1] - 9 (n) Y[n] for n = 1, 2, ... As y(3) = 0 (true because 3 is a double root of p(x)) we know [3^n | n = 1, 2, ...] is a solution to the new Y linear recurrence checks. Substitute this valid Y solution [3^n | n = 1, 2, ...] into the Y checks to derive the following family of (true or valid) equations. ((n+2) 3^n+2) = 6 ((n+1) 3^n+1) - 9 ((n) 3^n) for n = 1, 2, ... Notice the above is saying [n 3^n | n = 1, 2, ...] obeys the earlier W linear recurrence problem. We have our desired additional solution. Confirming the last claim Frankly, this sort of argument doesn't fully sink in until one confirms some examples and calculations. The derivation is a bit of "proof by change of notation", which is never very satisfying. So: let's confirm some calculations claimed in the example to try to build some familiarity with the items discussed. In[11]: # show down p(x) zeros out at x=3 r = 3 evals_p = [ r**(n+2) - 6 * r**(n+1) + 9 * r**n for n in range(10)] evals_p Out[11]: [0, 0, 0, 0, 0, 0, 0, 0, 0, 0] In[12]: assert np.all(np.abs(evals_p)) < 1e-8 In[13]: # show down y(x) = x p'(x) zeros out at x=3 r = 3 evals_y = [ r * ((n+2) * r**(n+1) - 6 * (n+1) * r**(n) + 9 * n * r**(n-1)) for n in range(10)] evals_y Out[13]: [0.0, 0, 0, 0, 0, 0, 0, 0, 0, 0] In[14]: assert np.all(np.abs(evals_y)) < 1e-8 In[15]: # write down the first p = [3**n | n = 1, 2, ...] solution p = np.asarray([ r**n for n in range(1, 11) ]) p Out[15]: array([ 3, 9, 27, 81, 243, 729, 2187, 6561, 19683, 59049]) In[16]: # confirm W checks on p' assert np.all([ p[n+2] - 6 * p[n+1] + 9 * p[n] == 0 for n in range(1, len(p) - 2)]) In[17]: # confirm Y checks on p # this works as W is the "Y" checks for p' assert np.all([ (n+2) * p[n+2] - 6 * (n+1) * p[n+1] + 9 * n * p[n] == 0 for n in range(1, len(p) - 2)]) Back to the example As we now have the required number of linearly independent solutions (2 solutions: [3^n | n = 1, 2, ...] and [n 3^n | n = 1, 2, ...]), we can solve for any specified initial conditions (as demonstrated earlier). Believe it or not, we are done. In general The general solution strategy is as follows. For a general homogeneous linear recurrence of the form: v[n+k] = c[k-1] v[n+k-1] + ... + c[0] v[n] Move to the characteristic polynomial equation: x^k = c[k-1] x^k-1 + ... + c[0] We can generate a basis for all solutions as v = [ n^k r^n | n = 1, 2, ...] where r is a root of the characteristic polynomial, and k is a non-negative integer smaller than the degree of multiplicity of the root r. Once we have enough linear independent solutions, we can write any other solution as a linear combination of what we have. We call all of the above the "swap subscripts (general and powerful) to powers (specific and weak)" strategy (though there are obviously a few more steps and details to the method). Conclusion Our solution strategy was exchanging powerful subscripts (which can implement any integer keyed lookup table) with less powerful superscripts (that can only express powers). We can lift any solution found in the weaker world (power series) to the more general one (subscripted sequences). We don't always find enough power series solutions, and in that case we transform the problem to find more solutions to modified polynomials. The trick is to track the details of how the transforms operate on both our vector solutions and check polynomials. The above system is general, it can solve a lot of problems. We concentrated on calculating over vector values. Related methods include the umbral calculus, the study of shift operators, and the Laplace transform. (The source code for this worksheet can be found here) Share this: * Twitter * LinkedIn * Facebook * Reddit * Email * Like this: Like Loading... Categories: Tutorials Tagged as: engineering mathematics python recurrence relations [a4b2fbd7b2c8d] John Mount The AI Problem is not Alignment Leave a ReplyCancel reply Site Map * Blog * About Us * Practical Data Science with R * Company Information + Staff + Example Engagements * Service Offerings + Consulting + Training Overview + Use Cases * Talks and Presentations * Contact Recent Posts * Solving Recurrence Relations * The AI Problem is not Alignment * The m = n Machine Learning Anomaly * How Data Quantity Drives Model Quality * Exact Sums Over a Group Action search [ ] Categories * Tutorials (453) * Statistics (434) * Opinion (303) * Pragmatic Data Science (255) * data science (251) search Discover more from Win Vector LLC Subscribe now to keep reading and get access to the full archive. Type your email... [ ] Subscribe Continue reading %d