https://win-vector.com/2024/05/09/solving-recurrence-relations/

< return
[                    ] 
  * About Us
      + Company Information
      + Staff
      + Example Engagements
  * Service Offerings
      + Consulting
      + Training Overview
      + Use Cases
  * Blog
  * Talks and Presentations
  * Contact
  * Practical Data Science with R

Menu Home

  * Win-Vector Blog
  * Company
  * @WinVectorLLC Twitter

Win Vector LLC

Data science advising, consulting, and training

Solving Recurrence Relations

By John Mount on May 9, 2024 * ( Leave a comment )

Introduction 

A neat bit of "engineering mathematics" is solving recurrence
relations. The solution method falls out of the notation itself, and
harkens back to a time where formal sums were often used in place of
vector subscript notation.

Unfortunately the variety of such solutions is small enough to allow
teaching by memorization. In this note I try to avoid memorization,
and motivate the methodology. I feel this is facilitated by
separating a number of often smeared together representations
(formulas, sequences, vectors, linear checks, characteristic
polynomial, and polynomial check families) into distinct
realizations. We are also going to emphasize calculating and
confirming claims in Python.

The problem 

A simple form of the recurrence problem is to write down a general
solution for a subscripted family of linear equations such as the
following

F[n+2] = F[n+1] + F[n]

where n is a subscript varying over all positive integers.

Such a relation or equation can arise in number of situations or
applications:

  * Time series analysis.
  * Estimating run times of algorithms.
  * Combinatorics.

The above example is in fact the Fibonacci sequence.

The question is: if we are given initial conditions F[1] = 1 and F[2]
= 1, what is F[n] (for general non-negative integer n)?

In this case the Wikipedia solution is F[n] = (r[1]^n - r[2]^n)/sqrt
(5) where:

  * r[1] = (1 + sqrt(5))/2
  * r[2] = (1 - sqrt(5))/2

Natural questions include:

  * Why is the solution in this form?
  * How do you find r[1] and r[2]?

We will set up some notation and then solve a few examples.

The Solution 

Vector space notation 

Let's formalize our notation a bit.

First let's settle on working over the vector space of all infinite
sequences of real numbers with non-negative subscripts. This is just
saying we consider the infinite sequence F = (F[1], F[2], F[3], ...)
we are solving for as one of many possible sequences. We can use "R
[Z+]" as the group-ring style naming of this vector space. Now
consider any such vector v that obeys the recurrence equations:

v[n+2] = v[n+1] + v[n] for n = 1, 2, ...

Let "S" denote the subset of R[Z+] that obey all of the above linear
recurrence checks. We claim a few things about S (the set of
solutions to our current example system):

  * The all zeroes vector is in S. So S is non-empty.
  * If v obeys all of the above constraints then so does c v for any
    constant c.
  * If u and v obey all of the above constraints then so does u + v.
  * By induction, all v[n] are completely determined by the values v
    [1] and v[2].

The first three observations tell us S is a vector subspace of R[Z+].
The fourth observation tells us this vector subspace is 2
dimensional, so any solution can be written as the linear combination
of two basis solutions.

For notational convenience we will associate functions with vectors
in R[Z+]. For a function f() let [f(n) | n = 1, 2, ...] denote the
graph or vector (f(1), f(2), f(3), ...). In particular we are
interested in the vectors [r[1]^n | n = 1, 2, ...] = (r[1], r[1]^2, r
[1]^3, ...) and [r[2]^n | n = 1, 2, ...] = (r[2], r[2]^2, r[2]^3,
...) (for the previously defined r[1] and r[2]). We claim [r[1]^n | n
= 1, 2, ...] and [r[2]^n | n = 1, 2, ...] both obey the above linear
check conditions.

Inspecting the claimed Fibonacci solution 

Let's confirm both [r[1]^n | n = 1, 2, ...] and [r[2]^n | n = 1, 2,
...] both obey the linear check conditions. In my opinion one doesn't
truly learn the math without working at least a few concrete
examples.
In[1]:

# define [r1**n | n = 1, 2, ...] = f_r_1
import numpy as np

r_1 = (1 + np.sqrt(5))/2

f_r_1 = np.asarray([r_1**n for n in range(1, 11)])

f_r_1

Out[1]:

array([  1.61803399,   2.61803399,   4.23606798,   6.85410197,
        11.09016994,  17.94427191,  29.03444185,  46.97871376,
        76.01315562, 122.99186938])

In[2]:

# check f_r_1 obeys specified  Fibonacci relations
assert np.all([
    np.abs(f_r_1[n+2] - (f_r_1[n+1] + f_r_1[n])) < 1e-8
    for n in range(2, len(f_r_1) - 2)
])

In[3]:

# define [r2**n | n = 1, 2, ...] = f_r_2

r_2 = (1 - np.sqrt(5))/2

f_r_2 = np.asarray([r_2**n for n in range(1, 11)])

f_r_2

Out[3]:

array([-0.61803399,  0.38196601, -0.23606798,  0.14589803, -0.09016994,
        0.05572809, -0.03444185,  0.02128624, -0.01315562,  0.00813062])

In[4]:

# check f_r_2 obeys specified  Fibonacci relations
assert np.all([
    np.abs(f_r_2[n+2] - (f_r_2[n+1] + f_r_2[n])) < 1e-8
    for n in range(2, len(f_r_2) - 2)
])

In[5]:

# generate a bit of the Fibonacci sequence using 
# the definitional recurrence F[n+2] = F[n+1] + F[n]
f = [1, 1]
for i in range(8):
    n = len(f) - 2
    f.append(f[n + 1] + f[n])

f

Out[5]:

[1, 1, 2, 3, 5, 8, 13, 21, 34, 55]

In[6]:

# Wikipedia claimed solution
prediction = (f_r_1 - f_r_2) / np.sqrt(5)

prediction

Out[6]:

array([ 1.,  1.,  2.,  3.,  5.,  8., 13., 21., 34., 55.])

In[7]:

# confirm claimed combination matches Fibonacci sequence
assert np.max(np.abs(
        np.asarray(f) - prediction
    )) < 1e-8

In[8]:

# we can also solve for the mixture coefficients by
# treating our sequences as column vectors and asking
# a solver how to write the vector f as a linear
# combination of the vectors f_r_1 and f_r_2.
#
# As a linear system in the first 2 terms this looks like:
#
# f[0] = soln[0] * f_r_1[0]  +  soln[1] * f_r_1[0]
# f[1] = soln[0] * f_r_1[1]  +  soln[1] * f_r_1[1]
#
# And such a linear system is translated into 
# matrix notation as follows:
soln = np.linalg.solve(
    np.asarray([
        [f_r_1[0], f_r_2[0]],
        [f_r_1[1], f_r_2[1]],
    ]),
    [
        f[0],
        f[1],
    ]
)

soln

Out[8]:

array([ 0.4472136, -0.4472136])

In[9]:

# our (identical) derived solution
prediction2 = (
    soln[0] * f_r_1
    + soln[1] * f_r_2
)

prediction2

Out[9]:

array([ 1.,  1.,  2.,  3.,  5.,  8., 13., 21., 34., 55.])

In[10]:

# confirm claimed combination matches Fibonacci sequence
assert np.max(np.abs(
        np.asarray(f) - prediction2
    )) < 1e-8

The neat trick 

The core of the solution follows from a neat trick: replace the
subscripts with superscripts. This is very powerful trick. Let's see
that in action.

We gamble and hope some of the solutions are of the following simple
form: [r^n | n = 1, 2, ...] = (r, r^2, r^3, ...), where r is a to be
determined (possibly complex) number.

Our claim is: if the number r is a solution to the polynomial
equation (in x)

x^2 = x + 1

then v = [r^n | n = 1, 2, ...] satisfies

v[n+2] = v[n+1] + v[n] for n = 1, 2, ...
.
The polynomial is called "the characteristic polynomial" of the
linear recurrence checks.

The "trick" to this is: if x = r is a root of, or satisfies, x^2 = x
+ 1, then it also satisfies x^n+2 = x^n+1 + x^n (which is equivalent
to x^n x^2 = x^n x + x^n 1) for all n >= 0 . So r^n+2 = r^n+1 + r^n
for all n >= 0 which is exactly the claim v = [r^n | n = 1, 2, ...] is
a solution to all of the subscripted v-checks. It pays to think of
the characteristic polynomial p(x) as shorthand for the family of
"check polynomials" x^n p(x) for n = 0, 1, 2, .... However, for some
problems we will need to appeal directly to the family of check
polynomials.

In our case, the roots, or solutions, to this polynomial equation are
the roots x^2 - x - 1 = 0. By the quadratic formula:

  * r[1] = (1 + sqrt(5))/2
  * r[2] = (1 - sqrt(5))/2

This gives us two linearly independent solutions to the recurrence
check equations: [r[1]^n | n = 1, 2, ...] and [r[2]^n | n = 1, 2,
...]. These solutions are enough to form a basis for the solution
space S, so we know F is some linear combination of [r[1]^n | n = 1,
2, ...] and [r[2]^n | n = 1, 2, ...]. And we have already showed how
to find the linear combination by setting up linear equations to
match the first two entries of the vector.

A harder example 

Suppose we want to solve the recurrence:

W[n+2] = 6 W[n+1] - 9 W[n] for n = 1, 2, ...

The above are the "W checks" we now want to satisfy. A solution to
these is a specific vector of values we can substitute in for the
W's, such that none of the equations are false.

We will try the earlier solution strategy. We are then interested in
roots of the corresponding characteristic polynomial:

x^2 - 6 x + 9 = 0

The above polynomial factors into (x - 3)^2. So r[1] = r[2] = 3. So
we know [3^n | n = 1, 2, ...] is a solution to the W checks.

The space of solutions is again 2 dimensional. So to parameterize
over all the possible solutions, we need a second linear independent
solution. The question then is: how do we find a second linear
independent solution?

Dealing with repeated roots 

When our characteristic polynomial p(x) has repeated roots, the
characteristic polynomial is not expressive enough to represent some
solutions. However, the corresponding family of check polynomials x^n
p(x) for n = 0, 1, ... are rich enough to find the extra solutions.
We will use that when a polynomial p(x) has a repeated root (that is:
for some number r, the fact that (x-r)^2 divides into p(x) with no
remainder), then p(x) shares that root with p'(x) (where p'(x) is the
derivative of p(x) with respect to x).

Take the earlier "W check" polynomials p(x) = x^n+2 - 6 x^n+1 + 9 x^
n. Define the polynomial y(x) = x p'(x) = (n+2) x^n+2 - 6 (n+1) x^n+1
+ 9 n x^n. y(x) itself is (by inspection) the check polynomials
corresponding to the following (new) linear recurrence checks:

(n+2) Y[n+2] = 6 (n+1) Y[n+1] - 9 (n) Y[n] for n = 1, 2, ...

As y(3) = 0 (true because 3 is a double root of p(x)) we know [3^n |
n = 1, 2, ...] is a solution to the new Y linear recurrence checks.
Substitute this valid Y solution [3^n | n = 1, 2, ...] into the Y
checks to derive the following family of (true or valid) equations.

((n+2) 3^n+2) = 6 ((n+1) 3^n+1) - 9 ((n) 3^n) for n = 1, 2, ...

Notice the above is saying [n 3^n | n = 1, 2, ...] obeys the earlier
W linear recurrence problem. We have our desired additional solution.

Confirming the last claim 

Frankly, this sort of argument doesn't fully sink in until one
confirms some examples and calculations. The derivation is a bit of
"proof by change of notation", which is never very satisfying.

So: let's confirm some calculations claimed in the example to try to
build some familiarity with the items discussed.

In[11]:

# show down p(x) zeros out at x=3
r = 3
evals_p = [
    r**(n+2) - 6 * r**(n+1) + 9 * r**n
    for n in range(10)]

evals_p

Out[11]:

[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In[12]:

assert np.all(np.abs(evals_p)) < 1e-8

In[13]:

# show down y(x) = x p'(x) zeros out at x=3
r = 3
evals_y = [
    r * ((n+2) * r**(n+1) - 6 * (n+1) * r**(n) + 9 * n * r**(n-1))
    for n in range(10)]

evals_y

Out[13]:

[0.0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

In[14]:

assert np.all(np.abs(evals_y)) < 1e-8

In[15]:

# write down the first p = [3**n | n = 1, 2, ...] solution
p = np.asarray([
    r**n for n in range(1, 11)
    ])

p

Out[15]:

array([    3,     9,    27,    81,   243,   729,  2187,  6561, 19683,
       59049])

In[16]:

# confirm W checks on p'
assert np.all([
    p[n+2] - 6 * p[n+1] + 9 * p[n] == 0
    for n in range(1, len(p) - 2)])

In[17]:

# confirm Y checks on p
# this works as W is the "Y" checks for p'
assert np.all([
    (n+2) * p[n+2] - 6 * (n+1) * p[n+1] + 9 * n * p[n] == 0
    for n in range(1, len(p) - 2)])

Back to the example 

As we now have the required number of linearly independent solutions
(2 solutions: [3^n | n = 1, 2, ...] and [n 3^n | n = 1, 2, ...]), we
can solve for any specified initial conditions (as demonstrated
earlier).

Believe it or not, we are done.

In general 

The general solution strategy is as follows.

For a general homogeneous linear recurrence of the form:

v[n+k] = c[k-1] v[n+k-1] + ... + c[0] v[n]

Move to the characteristic polynomial equation:

x^k = c[k-1] x^k-1 + ... + c[0]

We can generate a basis for all solutions as v = [ n^k r^n | n = 1,
2, ...] where r is a root of the characteristic polynomial, and k is
a non-negative integer smaller than the degree of multiplicity of the
root r. Once we have enough linear independent solutions, we can
write any other solution as a linear combination of what we have.

We call all of the above the "swap subscripts (general and powerful)
to powers (specific and weak)" strategy (though there are obviously a
few more steps and details to the method).

Conclusion 

Our solution strategy was exchanging powerful subscripts (which can
implement any integer keyed lookup table) with less powerful
superscripts (that can only express powers). We can lift any solution
found in the weaker world (power series) to the more general one
(subscripted sequences). We don't always find enough power series
solutions, and in that case we transform the problem to find more
solutions to modified polynomials. The trick is to track the details
of how the transforms operate on both our vector solutions and check
polynomials.

The above system is general, it can solve a lot of problems. We
concentrated on calculating over vector values. Related methods
include the umbral calculus, the study of shift operators, and the
Laplace transform.

(The source code for this worksheet can be found here)

Share this:

  * Twitter
  * LinkedIn
  * Facebook
  * Reddit
  * Email
  * 

Like this:

Like Loading...

Categories: Tutorials

Tagged as: engineering mathematics python recurrence relations

[a4b2fbd7b2c8d]

John Mount

The AI Problem is not Alignment

Leave a ReplyCancel reply

Site Map

  * Blog
  * About Us
  * Practical Data Science with R
  * Company Information
      + Staff
      + Example Engagements
  * Service Offerings
      + Consulting
      + Training Overview
      + Use Cases
  * Talks and Presentations
  * Contact

Recent Posts

  * Solving Recurrence Relations
  * The AI Problem is not Alignment
  * The m = n Machine Learning Anomaly
  * How Data Quantity Drives Model Quality
  * Exact Sums Over a Group Action

search

[                    ]  
Categories

  * Tutorials (453)
  * Statistics (434)
  * Opinion (303)
  * Pragmatic Data Science (255)
  * data science (251)

search

Discover more from Win Vector LLC

Subscribe now to keep reading and get access to the full archive.

Type your email... [                    ]

Subscribe

Continue reading

%d