Demystifying the LFSR
Feb 13, 2022, Programming

Contents

* What is an LFSR?
* Questions
* Brute Force
* Addition in Binary
* Subtraction in Binary
* Multiplication in Binary
* Division in Binary
* LFSRs, Now with More Mathematics!
* Irreducible Polynomials
* Primitive Polynomials
* Results
* More Information

Note: This uses JavaScript to render math formulas and provide animated circuit diagrams.

Are linear feedback shift registers really that hard to understand? I think not! All you need to know to understand LFSRs is high-school level mathematics and an understanding of binary:

* You know basic algebra and you can work with polynomials like x^2 + x + 1.
* You know how to do long division--or at least, you vaguely remember what long division is, and you can muddle through it.
* You know what prime numbers are, and how any number can be factored into prime factors, like 20 = 2^2 \cdot 5.
* You know some digital logic, like what "XOR" means.
* If you know some group theory, great! Some of this will be familiar.

Deeper Mathematics

I'm not going to leave out the deeper mathematics entirely; discussions of more advanced topics are set apart in sections marked like this. You do not need to understand the "deeper mathematics" sections in order to understand LFSRs.

What is an LFSR?

A linear feedback shift register is a simple way to generate a sequence of pseudorandom bits by taking a register, shifting its contents by one bit, and mixing in some new data using an XOR operation. This is not a rigorous definition; I just want to get you thinking about how an LFSR actually works--either as a digital circuit or as software. LFSRs are often used as random number generators, especially in constrained software and hardware environments.
If you travel back in time to the 1980s or 1970s, the LFSR was an easy and reliable way to generate random numbers on 8-bit CPUs like the 8080 or 6502. If you're working in hardware, you can implement an LFSR with only a handful of gates and get a fast, reliable random number generator.

There are two different ways to set up an LFSR: the "Fibonacci LFSR" and the "Galois LFSR". Both versions work by taking certain bits from the shift register, XORing them together, and putting them back into the shift register. The two versions just work in opposite directions.

How to Pronounce "Fibonacci" and "Galois"

"Fibonacci" is pronounced:

* /ˌfɪbəˈnɑːtʃi/ (or /ˌfiːb-/) in English, "fi-buh-na-chee", and
* /fi.boˈnat.tʃi/ in Italian.

See YouTube: How to Pronounce Fibonacci

"Galois" is pronounced:

* /ɡælˈwɑː/ in English, "gal-waa", and
* /ɡaˈlwa/ in French.

See YouTube: How to Pronounce Evariste Galois

[Animated diagram: Fibonacci configuration LFSR]
[Animated diagram: Galois configuration LFSR]

These are both LFSRs. It turns out that the Galois configuration is easier to implement in software, and it has higher performance when implemented in hardware. The two configurations are equivalent, but for this article, I'm just going to focus on the Galois configuration and ignore the Fibonacci configuration.

Code

It is simple to implement an LFSR in code. I'm choosing to use the Galois configuration, and to shift the register left each step.

#include <stdint.h>

// Advance the LFSR by one step.
uint32_t lfsr_advance(uint32_t value, int width, uint32_t taps) {
  const uint32_t high_bit = (uint32_t)1 << width;
  value <<= 1;
  // If a 1 bit gets shifted out, then...
  if ((value & high_bit) != 0) {
    // Clear that bit, and flip the taps.
    value ^= high_bit | taps;
  }
  return value;
}

You'll often hard-code the width of the LFSR and the taps, because it's not something most people care to configure.

// Calculate the next step of a 16-bit LFSR with the taps at
// position 0, 2, 3, and 5.
// This LFSR has maximum period.
//
// These taps are 101101 in binary, or 0x2d in hexadecimal.
uint32_t my_lfsr_advance(uint32_t value) {
  return lfsr_advance(value, 16, 0x2d);
}

This works as a reasonable random number generator, although it only generates one bit at a time.

I chose to shift the LFSR to the left to make the explanation a little easier. You could just as easily shift to the right instead. It will produce the same stream of bits, in the same order, but the bits in the register will be numbered in reverse.

// Advance a 16-bit LFSR by one step, shifting to the right. This is
// the same 16-bit LFSR as above, with taps at position 0, 2, 3, and
// 5, but the taps are numbered in the opposite direction.
uint32_t right_lfsr_advance(uint32_t value) {
  uint32_t new_value = value >> 1;
  if ((value & 1) != 0) {
    new_value ^= 0xb400;
  }
  return new_value;
}

Questions

The questions we have are:

* Where do we put the taps?
* How many times can you shift before it starts repeating? (What is the period of the shift register?)

Just based on our knowledge of digital logic, we can figure out that if there are k bits in the shift register, there are 2^k possible states, and the maximum possible period cannot exceed 2^k. We also realize that if the register contains all zeroes, then the next state will also be all zeroes. If our shift register gets into the zero state, it gets stuck. That means that, at best, our shift register could only loop through the other 2^k-1 possible states.

Is it possible to make a shift register that has a period of 2^k-1? The answer is "yes". First of all, we can crib that answer from Wikipedia. Wikipedia says that we can construct an LFSR with maximum period if the "reciprocal characteristic polynomial" is a "primitive polynomial". If you just need to implement an LFSR, you can get a list of primitive polynomials from Xilinx (Application Note XAPP-052) for any length up to 168 bits, and you're done!
Read on if you want to understand what that means and learn how to construct your own primitive polynomials.

Deeper Mathematics

You may ask, "Why can't we redesign the LFSR so that it gets out of the zero state?" You might consider adding NOT gates to your LFSR, or XORing the state with a constant, or some other trick. However, these simple redesigns will only change which state the LFSR gets stuck in; they won't prevent the LFSR from getting stuck.

Brute Force

Of course, for smaller LFSRs, you don't actually need to know about polynomials. You can find out the period of an LFSR easily enough by running it until it repeats.

#include <stdint.h>
#include <stdlib.h>

// Calculate the length of an LFSR's period.
int lfsr_period(uint32_t initial_state, int width, uint32_t taps) {
  // Record when each state is first visited.
  int *states = calloc(1 << width, sizeof(*states));
  if (states == NULL) {
    abort();
  }
  uint32_t state = initial_state;
  for (int t = 1;; t++) {
    if (states[state] != 0) {
      int period = t - states[state];
      free(states);
      return period;
    }
    states[state] = t;
    state = lfsr_advance(state, width, taps);
  }
}

Addition in Binary

Let's take a moment to step back and talk about addition in binary, because we're going to be doing addition a little differently.

You may have seen an explanation of how to add ordinary numbers in binary. For example, in binary, we can add 1011 + 110 = 10001:

 111
  1011
+  110
------
 10001

This is "normal" addition, with carrying. What happens when we don't carry? We get 1011 + 110 = 1121... which is not binary, but that's okay for now.

  1011
+  110
------
  1121

This is just like adding two polynomials together. In fact, mathematicians prefer to call this polynomial addition.

(x^3 + x + 1) + (x^2 + x) = x^3 + x^2 + 2x + 1

What happens when we do addition modulo 2? In modulo 2 arithmetic, 1 + 1 = 0. Addition modulo 2 is the same as the XOR operation in digital logic. Modulo 2, we get 1011 + 110 = 1101.
 1011
+0110
-----
 1101

Or if we think about it as adding polynomials,

(x^3 + x + 1) + (x^2 + x) = x^3 + x^2 + 1

Mathematicians usually call this "polynomial addition" too, but you can see that it doesn't matter if you write out the polynomial as x^2+x or if you just write the coefficients as 110. We're going to be using modulo 2 arithmetic to design LFSRs.

Code

It is easy enough to write a function for polynomial addition modulo 2. Here is the C code:

#include <stdint.h>

// Add two polynomials, modulo 2.
uint32_t mod2_add(uint32_t p, uint32_t q) {
  return p ^ q;
}

This only gives us 32 bits for polynomial coefficients. You may need to use larger types if you want to work with larger polynomials and larger LFSRs.

Deeper Mathematics

Mathematicians have different names for modulo 2 arithmetic depending on what kind of mathematics they are studying. People studying groups call it a "cyclic group," and write it as \mathbb{Z}/2\mathbb{Z}. People studying rings might call it a "quotient ring". People studying finite fields call it a "finite field", and write it as \mathbb{F}_2 or \mathrm{GF}(2)--the "G" stands for "Galois", who was mentioned earlier. I will call modulo 2 arithmetic \mathbb{F}_2, just because it makes explaining LFSRs a little easier.

Mathematicians aren't trying to be difficult by coming up with different names for the same thing--it turns out that finite fields and cyclic groups really are different things. For example, modulo 6 arithmetic is an ordinary cyclic group \mathbb{Z}/6\mathbb{Z}, but it is not a finite field! The cyclic group and the finite field are only the same as each other when the size is a prime number, like 2.

Note that the only numbers that exist in \mathbb{F}_2 are 0 and 1. We've been using larger numbers like 1011... what's up with that?
Well, when mathematicians do this kind of arithmetic, they say that numbers (like 1011) are actually polynomials (like x^3+x+1) where all the coefficients are in \mathbb{F}_2. The polynomials with coefficients in \mathbb{F}_2 are \mathbb{F}_2[x], which is called a "polynomial ring". So, in \mathbb{F}_2 we have the numbers 0 and 1. In \mathbb{F}_2[x], we have polynomials like x^2+1, x^3+x+1, and 0, but we can choose to just write the coefficients, like 101, 1011, and 0.

More Information

* Cyclic Group on Wikipedia, MathWorld
* Finite Field on Wikipedia, MathWorld
* Polynomial Ring on Wikipedia

Subtraction in Binary

What happens when you subtract modulo 2? Do you get negative numbers? Think about what happens when you subtract 1000 - 1011. Is 1011 bigger than 1000? It turns out that "bigger" or "smaller" isn't really something we think about when we are working modulo 2. Subtracting 1000 - 1011 we get 11, and we know that this is right because 1000 + 11 = 1011.

In modulo 2 arithmetic, x - y = x + y. Addition and subtraction are the same operation. Or we can say x = -x, which means the same thing.

Deeper Mathematics

There are a ton of theorems in algebra that can be proven for any field except one built on modulo 2 arithmetic. The fact that x = -x throws a wrench in many of these classic theorems and proofs. Mathematicians say that \mathbb{F}_2 has "characteristic 2", which means that if you add 2 copies of 1, you get 1+1=0. These theorems will start with a phrase like, "Let \mathbb{F} be a field with characteristic other than 2..."

More Information

* Characteristic on Wikipedia, MathWorld
* Mathematics Stack Exchange: What's so special about characteristic 2?

Multiplication in Binary

Let's continue exploring modulo 2 arithmetic by talking about multiplication. Multiplication is just repeated addition, so with our understanding of modulo 2 arithmetic, we can calculate 1011 \cdot 110 = 111010.
   1011
x   110
-------
   0000
  1011
+1011
-------
 111010

Keep in mind that this is special, modulo-2 multiplication, without carry. It's not the same kind of multiplication you get when you write 0b1011 * 0b110 in a program.

If you're a programmer, you'll recognize that multiplying by 10 is the same as shifting to the left by one bit. For example, we can shift 1001101 to the left by one bit:

  1001101
x      10
---------
  0000000
+1001101
---------
 10011010

If we think of it as a polynomial, shifting to the left by one bit is the same as multiplying the polynomial by x (and the polynomial x has coefficients 10).

Code

The straightforward way to multiply two polynomials, mod 2, is with long multiplication.

#include <stdint.h>

// Multiply two polynomials, mod 2.
uint32_t mod2_multiply(uint32_t p, uint32_t q) {
  uint32_t result = 0;
  for (int i = 0; i < 32; i++) {
    if (((q >> i) & 1) != 0) {
      result ^= p << i;
    }
  }
  return result;
}

It turns out that this kind of multiplication is important enough that various CPUs have instructions for it, to make it faster to multiply polynomials modulo 2. The x86 has pclmulqdq and ARM has vmull.p8.

Deeper Mathematics

In \mathbb{F}_2[x], what is (x+y)^2?

Answer

\begin{split} (x + y)^2 &= x^2 + 2xy + y^2 \\ &= x^2 + y^2 \end{split}

This is another reason why various theorems are not true for fields with characteristic 2, even if they are true for all other fields.

Division in Binary

The final piece of the arithmetic puzzle is division. You can use long division to divide numbers modulo 2. Division gives us the quotient and remainder. We're not interested in fractions, so the quotient will always be an integer. For example, we can divide 10001101 / 1011 and get 10110, with a remainder of 111.

          10110
       --------
1011 ) 10001101
       1011
       ----
         1111
         1011
         ----
          1000
          1011
          ----
            111

Note that unlike normal division, we just need to look at the highest bit in each division step.

See also Wikipedia: Polynomial long division for a more general discussion of how to divide polynomials, although the article doesn't cover modulo 2 arithmetic.
Code

To divide modulo 2, we just have to translate the old, familiar long division algorithm to code. This algorithm divides p by q. In each step, we shift q to the left so the highest 1 bit lines up with the remainder, and then subtract (which is XOR).

#include <assert.h>
#include <stdint.h>

// Return the index of the highest bit set, 0-31. Returns -1 if no
// bits are set.
int highest_bit(uint32_t p) {
  if (p == 0) {
    return -1;
  }
  int bit = 0;
  while (p > 1) {
    p >>= 1;
    bit++;
  }
  return bit;
}

// The result of dividing two polynomials, modulo 2.
struct division_result {
  uint32_t quotient;  // p / q
  uint32_t remainder; // p % q
};

// Divide the polynomial p by q, modulo 2.
struct division_result mod2_divide(uint32_t p, uint32_t q) {
  assert(q != 0);
  uint32_t quotient = 0;
  uint32_t remainder = p;
  int remainder_width = highest_bit(p);
  int q_width = highest_bit(q);
  while (remainder_width >= q_width) {
    int shift = remainder_width - q_width;
    remainder ^= q << shift;
    quotient |= 1 << shift;
    remainder_width = highest_bit(remainder);
  }
  return (struct division_result){
      .quotient = quotient,
      .remainder = remainder,
  };
}

If we want to check that the division algorithm worked correctly, there are two things to verify. First, we check that the remainder is smaller than q. If it's not, then we haven't finished dividing. Second, we multiply the quotient by q, add the remainder, and check that we get the original value of p.

// Divide the polynomial p by q, modulo 2, and check that the result
// is correct.
void mod2_divide_check(uint32_t p, uint32_t q) {
  struct division_result r = mod2_divide(p, q);
  // Check that the remainder is strictly lower order than q.
  assert(highest_bit(r.remainder) < highest_bit(q));
  // Check that we get the original value of p back.
  assert(mod2_add(mod2_multiply(r.quotient, q), r.remainder) == p);
}

LFSRs, Now with More Mathematics!

If we take a look at LFSRs again, we can explain what LFSRs are doing in terms of polynomials modulo 2.
We can think of the state of an LFSR with k bits as a polynomial with degree smaller than k. For example, a 16-bit LFSR will have a state that contains coefficients for x^0 up to x^{15}. If the LFSR shifts left, then the state 0x870C is the polynomial P(x):

P(x) = x^{15} + x^{10} + x^9 + x^8 + x^3 + x^2

When we shift the polynomial left one bit, it's the same as multiplying the polynomial by x:

x P(x) = x^{16} + x^{11} + x^{10} + x^9 + x^4 + x^3

We can think of the taps for the LFSR as a degree-16 polynomial, equal to x^{16}, plus x^n for each tap at position n. With our taps at 0, 2, 3, and 5, the taps represent the polynomial Q(x):

Q(x) = x^{16} + x^5 + x^3 + x^2 + 1

This is called the characteristic polynomial of the LFSR. Note that the number of bits in the LFSR is equal to the degree of the characteristic polynomial.

Propagating the high bit to the taps is the same as dividing by Q(x) and taking the remainder. This means we have a simple formula for the next state, P^\prime(x):

\begin{split} P^\prime(x) &= x P(x) \bmod Q(x) \\ &= x^{11} + x^{10} + x^9 + x^5 + x^4 + x^2 + 1 \end{split}

Now, how do we pick the taps? How can we calculate the period of an LFSR?

Deeper Mathematics

Just like we took the integers \mathbb{Z} and constructed the integers mod 2 as the quotient ring \mathbb{Z}/2\mathbb{Z}, we've taken the polynomials \mathbb{F}_2[x] and constructed the quotient ring \mathbb{F}_2[x]/\lang Q(x) \rang, which contains the polynomials in \mathbb{F}_2[x] modulo Q(x).

(At this point, a mathematician might decide to get technical, say that a quotient ring contains equivalence classes, and explain that \lang Q(x) \rang is called a "principal ideal", but that's not a distinction we need to make here.)
More Information

* Quotient Ring on Wikipedia, MathWorld
* Principal Ideal on Wikipedia, MathWorld
* Ideal on Wikipedia, MathWorld

Irreducible Polynomials

An irreducible polynomial is a polynomial that can't be factored into the product of two non-constant polynomials. For example, the polynomial P below is irreducible, but Q is reducible because it can be factored.

P(x) = x^4 + x + 1

\begin{split} Q(x) &= x^4 + x^3 + x^2 + 1 \\ &= (x^3 + x + 1) (x + 1) \end{split}

Any LFSR with maximum period must use an irreducible polynomial for the taps. Why? Think about what happens if you have a polynomial which can be factored,

Q(x) = Q_1(x) Q_2(x)

Suppose that the LFSR starts in state P(x) = Q_1(x). Each time this LFSR advances, if the original state is a multiple of Q_1(x), then the next state will also be a multiple of Q_1(x).

Proof

The next state is:

P^\prime(x) = x P(x) \bmod Q(x)

This means that there is a quotient, S(x),

P^\prime(x) + S(x) Q(x) = x P(x)

By rearranging, we get,

P^\prime(x) = x P(x) - S(x) Q(x)

If P(x) is divisible by Q_1(x), then x P(x) and S(x) Q(x) are both divisible by Q_1(x), and therefore P^\prime(x) must also be divisible by Q_1(x).

If Q_1(x) and Q_2(x) have no common factors, then you are getting the same result as if you were running two LFSRs in parallel--one for Q_1, and one for Q_2. Running two shorter LFSRs will give a shorter period than you would get for a single, maximum-period LFSR with the same number of bits.

Deeper Mathematics

If Q(x) is an irreducible polynomial in \mathbb{F}_2[x], then the quotient ring of polynomials modulo Q(x), called \mathbb{F}_2[x]/\lang Q(x) \rang, is a field. It turns out that all of these fields are isomorphic to each other, and if you are just trying to construct a field, it doesn't matter what you pick for Q(x) as long as it is irreducible. If Q(x) has degree k, then the field is often called \mathbb{F}_{2^k} or \mathrm{GF}(2^k).
Note that 2^k is sometimes just written out as a number and it means the same thing, so \mathbb{F}_{2^8}, \mathbb{F}_{256}, \mathrm{GF}(2^8), and \mathrm{GF}(256) are all the same thing.

It does matter for us what we choose for Q(x), because we care about how the elements of \mathrm{GF}(2^k) are represented.

More Information

* Irreducible Polynomial on Wikipedia, MathWorld

Finding Irreducible Polynomials

Like many problems, the way that you find irreducible polynomials depends on how big the polynomials should be. For smaller polynomials, it's easy enough to try to factor them. If a polynomial with degree n has factors, then at least one of the factors must have degree n/2 or smaller. We can list all of the irreducible polynomials up to degree n/2 and use this list to test whether polynomials of degree n are irreducible. This requires an exponentially increasing amount of CPU power and memory as the polynomials grow larger, but in practice, it works fine if you just need to find a polynomial of degree no more than, say, 40. There are faster algorithms which will let you find much larger irreducible polynomials.

Code

Here is some code for finding an irreducible polynomial of degree 31 or lower. For higher degree polynomials, the code will require some modification... but this approach is too slow for polynomials of high degree anyway.

#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

// A dynamic list of polynomials.
struct poly_list {
  uint32_t *polys;
  int count;
  int alloc_size;
};

// Append a single polynomial to the end of the list, growing the
// array if necessary.
void poly_append(struct poly_list *list, uint32_t poly) {
  if (list->count >= list->alloc_size) {
    int new_size = list->alloc_size == 0 ?
        8 : list->alloc_size * 2;
    uint32_t *new_array =
        realloc(list->polys, new_size * sizeof(*new_array));
    if (new_array == NULL) {
      abort();
    }
    list->polys = new_array;
    list->alloc_size = new_size;
  }
  list->polys[list->count] = poly;
  list->count++;
}

// Number of lists of irreducible polynomials.
#define LIST_COUNT 16

// List of irreducible polynomials of each degree, up to degree 15.
struct poly_list irreducible[LIST_COUNT];

// Test if a polynomial is irreducible. Requires that the lists of
// irreducible polynomials be initialized up to and including
// degree/2.
bool is_irreducible(uint32_t poly, int degree) {
  // Test polynomials with degree no larger than degree/2. If this
  // polynomial is divisible by none of those polynomials, it is
  // irreducible.
  for (int n = 1; n <= degree / 2; n++) {
    uint32_t *list = irreducible[n].polys;
    int count = irreducible[n].count;
    for (int i = 0; i < count; i++) {
      struct division_result r = mod2_divide(poly, list[i]);
      if (r.remainder == 0) {
        return false;
      }
    }
  }
  return true;
}

// Initialize the list of irreducible polynomials in order to find
// irreducible polynomials of the given degree.
void init_irreducible(int degree) {
  assert(degree / 2 < LIST_COUNT);
  for (int n = 1; n <= degree / 2; n++) {
    uint32_t start = (uint32_t)1 << n;
    uint32_t end = start << 1;
    for (uint32_t poly = start; poly < end; poly++) {
      // Test each degree-n polynomial against the lists built so
      // far; only the lists up to degree n/2 are needed here.
      if (is_irreducible(poly, n)) {
        poly_append(&irreducible[n], poly);
      }
    }
  }
}

// Find an irreducible polynomial with the given degree.
uint32_t find_irreducible(int degree) {
  init_irreducible(degree);
  uint32_t start = (uint32_t)1 << degree;
  uint32_t end = start << 1;
  for (uint32_t poly = start; poly < end; poly++) {
    if (is_irreducible(poly, degree)) {
      return poly;
    }
  }
  // Mathematically impossible! For any degree, it is always
  // possible to find an irreducible polynomial.
  assert(false);
}

Primitive Polynomials

Here's where we get into the real math.
Not all irreducible polynomials give us a maximum-period LFSR, but we know that the characteristic polynomial must be irreducible. It turns out that the polynomial must not only be irreducible, but also a primitive polynomial. There are different but equivalent ways of defining "primitive polynomial", but for our purposes, we can just say that a primitive polynomial is a polynomial that makes an LFSR with maximum possible period.

To simplify the notation, I'll write polynomials as p or q instead of P(x) or Q(x). Let q be the LFSR's characteristic polynomial, with degree k. The possible states for the LFSR are \mathcal{P}_k(\mathbb{F}_2), the set of polynomials with degree less than k. Let S(p) be the set of all states reachable from initial state p.

S(p) = \{ p x^n \bmod q : n \in \mathbb{N} \}

If q is a primitive polynomial, then S(p) will contain 2^k-1 states, and our LFSR has maximum period.

Deeper Mathematics

There are a couple of interesting bits of group and ring theory here.

Primitive polynomials are also defined as polynomials, Q(x), where Q(x) is a factor of x^n+1 for n=2^k-1, but not for any 0 < n < 2^k-1. If you think about the way LFSRs work, this is just equivalent to saying that the LFSR has period 2^k-1! (There is another common way of defining "primitive polynomial" which defines it in terms of something called field extensions.)

The nonzero elements \mathrm{GF}(2^k)\setminus\{0\} form a group, using multiplication as the group operation. S(1) is a subgroup of this group, also written as \lang x \rang, the subgroup generated by x. S(p) is a coset of this subgroup, written as p\lang x \rang.

More Information

* Primitive Polynomial on Wikipedia, MathWorld. MathWorld notes that, "Amazingly, primitive polynomials over GF(2) define a recurrence relation which can be used to obtain a new pseudorandom bit from the n preceding ones." Doesn't that sound familiar? It's talking about LFSRs!
* Subgroup on Wikipedia, MathWorld
* Coset on Wikipedia, MathWorld

Period and Initial State

We can prove that all sets S(p) for p \ne 0 have the same size. In other words, the period of the LFSR does not depend on which state you start in, as long as you don't start in the zero state.

First, we can redefine S(p) in terms of S(1):

\begin{split} S(p) &= \{ p x^n \bmod q : n \in \mathbb{N} \} \\ &= \{ p (x^n \bmod q) \bmod q : n \in \mathbb{N} \} \\ &= \{ p s \bmod q : s \in S(1) \} \end{split}

Written this way, it's obvious that S(p) cannot be larger than S(1). If S(p) is smaller, then it must mean that two different elements, s_1, s_2 \in S(1), become the same element in S(p), which means that p s_1 = p s_2 \mod q. By rearranging, p(s_1 - s_2) = 0 \mod q. Since q is irreducible, it must either be a factor of p or a factor of s_1 - s_2. If it's a factor of p, then p=0. If it's a factor of s_1 - s_2, then s_1 = s_2. Both possibilities contradict the premise, so S(1) and S(p) are the same size, if p \ne 0.

Partitioning All States

We can prove that any two sets of states, S(p_1) and S(p_2), are either equal to each other or disjoint. We can exclude S(0) because it is the only set which contains 0.

Suppose that some element is in both sets, p \in S(p_1) \cap S(p_2), where p \ne 0. From the way these sets are constructed, both S(p_1) and S(p_2) must contain all states reachable from p, S(p). Since all these sets are the same size, they must be the same set.

LFSR Period

This brings us to the final piece of the puzzle, where we can prove what the period of the LFSR is. Consider the set of all sets of reachable states, excluding the zero state.

T = \{ S(p) : p \in \mathcal{P}_k(\mathbb{F}_2), p \ne 0 \}

Each element in T is a set of states that the LFSR can loop through. Every nonzero state must appear in at least one of these sets, because every nonzero p appears in its own S(p).
However, no state appears in more than one of these sets, because we proved that these sets are either disjoint or equal. Since all of the sets in T are the same size, the total number of nonzero states must be equal to the number of sets in T, multiplied by the number of states reachable from a nonzero state.

\lvert T \rvert \cdot \lvert S(1) \rvert = 2^k-1

In other words, the period of an LFSR which uses an irreducible polynomial must be a divisor of 2^k-1. How does this help us?

Deeper Mathematics

This is Lagrange's Group Theorem.

More Information

* Lagrange's Group Theorem on Wikipedia, MathWorld

Finding Primitive Polynomials

We already have a system for finding irreducible polynomials. Once we find an irreducible polynomial, we can test to see if it is primitive by checking that the period is 2^k-1.

Instead of iterating through 2^k-1 possible states, there's a shortcut. Since the period of an LFSR with an irreducible characteristic polynomial must be a divisor of 2^k-1, we only have to check the divisors of 2^k-1. In fact, we don't have to check all the divisors; we just find every prime factor a, and check that the LFSR period is not a divisor of (2^k-1)/a.

For some number n, we can check whether the LFSR's period is a factor of n by calculating x^n \bmod q. If the period is a factor of n, then x^n \bmod q = 1.

Example

For an 8-bit LFSR, we will end up with a maximum period of 255. The prime factors of 255 are 3, 5, and 17. That means that the only periods we need to check for an 8-bit LFSR are 85 (255/3), 51 (255/5), and 15 (255/17).

In math terms, this means taking an irreducible polynomial Q(x) and testing that the following equations are true. If an irreducible polynomial with degree 8 satisfies these equations, it is a primitive polynomial.
\begin{split} x^{85} & \ne 1 \mod Q(x) \\ x^{51} & \ne 1 \mod Q(x) \\ x^{15} & \ne 1 \mod Q(x) \end{split}

Since the LFSR must repeat when advanced 255 times, however, the following formula will hold:

x^{255} = 1 \mod Q(x)

Code

First, we need to find the factors of 2^k-1 which we want to test.

// Maximum number of factors to test. Note that a 32-bit number can
// only have 9 prime factors, at most.
#define MAX_FACTORS 16

// Given an integer n, find all unique factors of the form n/a,
// where a is a prime number and a != n. Returns the number of
// factors.
int factorize(int *factors, int n) {
  int count = 0;
  int rem = n;
  int factor = 2;
  for (; rem >= factor * factor; factor++) {
    if ((rem % factor) == 0) {
      assert(count < MAX_FACTORS);
      factors[count] = n / factor;
      count++;
      while ((rem % factor) == 0) {
        rem /= factor;
      }
    }
  }
  // If rem == n, then n is prime, and we will return 0 factors.
  if (rem > 1 && rem != n) {
    assert(count < MAX_FACTORS);
    factors[count] = n / rem;
    count++;
  }
  return count;
}

Next, we write a function to calculate x^n \bmod q. There's an algorithm for this that runs in \mathrm{O}(\log n) time. It works by repeatedly squaring polynomials. First, we can write a function for multiplying two polynomials, modulo q.

// Multiply two polynomials, mod 2, and return the result, mod q.
uint32_t multiply_modq(uint32_t a, uint32_t b, uint32_t q,
                       int width) {
  uint32_t result = 0;
  // Loop invariant: p = b * x^i mod q.
  uint32_t p = b;
  for (int i = 0; i < width; i++) {
    // Accumulate the result.
    if (((a >> i) & 1) != 0) {
      result ^= p;
    }
    // Calculate p = p * x mod q.
    p <<= 1;
    if (((p >> width) & 1) != 0) {
      p ^= q;
    }
  }
  return result;
}

The exponentiation algorithm works by observing that:

a^b = \begin{cases} 1 &\text{if } b = 0 \\ (a^2)^{b/2} &\text{if $b$ even, $b \ne 0$} \\ a a^{b-1} &\text{if $b$ odd} \end{cases}

This formula still holds with modular arithmetic, polynomials, and LFSRs.

// Compute pow(a, b), mod q.
uint32_t pow_modq(uint32_t a, int b, uint32_t q, int width) {
  if (b == 0) {
    return 1;
  }
  // Loop invariant: The final result is always equal to:
  // accum * power ^ exponent.
  uint32_t accum = 1;
  uint32_t power = a;
  for (int exponent = b; exponent > 1; exponent >>= 1) {
    if ((exponent & 1) != 0) {
      accum = multiply_modq(accum, power, q, width);
    }
    power = multiply_modq(power, power, q, width);
  }
  return multiply_modq(accum, power, q, width);
}

Using these functions, we can find a primitive polynomial with any degree we want, and therefore a maximum-period LFSR of any size.

// Return true if a polynomial is a primitive polynomial. The
// factors must be the result of factorize.
bool is_primitive(uint32_t poly, int degree, int *factors,
                  int factor_count) {
  if (!is_irreducible(poly, degree)) {
    return false;
  }
  for (int i = 0; i < factor_count; i++) {
    // This calculates x^n mod poly, where n is the factor we are
    // testing. "2" is the representation of the polynomial x.
    if (pow_modq(2, factors[i], poly, degree) == 1) {
      return false;
    }
  }
  return true;
}

// Find a primitive polynomial with the given degree.
uint32_t find_primitive(int degree) {
  // List all factors we want to test.
  int factors[MAX_FACTORS];
  int factor_count = factorize(factors, (1 << degree) - 1);
  // Loop over irreducible polynomials.
  init_irreducible(degree);
  uint32_t start = (uint32_t)1 << degree;
  uint32_t end = start << 1;
  for (uint32_t poly = start; poly < end; poly++) {
    if (is_primitive(poly, degree, factors, factor_count)) {
      return poly;
    }
  }
  // Mathematically impossible! For any degree, it is always
  // possible to find a primitive polynomial.
  assert(false);
}

Results

With the code above, we can search for a primitive polynomial, then check that the period of an LFSR with that polynomial has the maximum possible length.
Here are the smallest primitive polynomials of various degrees up to 24:

Size  Polynomial (hex)
  8   0x11d
  9   0x211
 10   0x409
 11   0x805
 12   0x1053
 14   0x402b
 16   0x1002d
 20   0x100009
 24   0x100001b

More Information

* Wikipedia: Linear feedback shift register
* David A. Singer, Secrets of Linear Feedback Shift Registers. This presents a more rigorous explanation of primitive polynomials, which uses somewhat more advanced foundations. It's short and sweet, only ten pages long, but it does assume that you are comfortable with linear algebra, groups, rings, and fields.
* Jason Sachs, Linear Feedback Shift Registers for the Uninitiated, Part XVIII: Primitive Polynomial Generation. This article goes into more depth. Sachs discusses runtime performance and presents different algorithms for constructing primitive polynomials. The article covers eleven different algorithms, explains the math, and provides source code--it's quite thorough!