https://leetarxiv.substack.com/p/emulating-a-gpu-on-a-cpu-using-finite [https] LeetArxiv SubscribeSign in Share this post [https] LeetArxiv LeetArxiv Emulating a GPU on a CPU using Finite Field Assembly Copy link Facebook Email Notes More Emulating a GPU on a CPU using Finite Field Assembly A Friendly yet Thorough Introduction to Finite Field Assembly programming language and the Recursive Computing Paradigm [http] Murage Kibicho Jan 12, 2025 3 Share this post [https] LeetArxiv LeetArxiv Emulating a GPU on a CPU using Finite Field Assembly Copy link Facebook Email Notes More 1 Share Quick Note This is Chapter 4 in our book, A Super-friendly Guide to Finite Fields for Programmers. LeetArxiv and the Finite Field Assembly (FF-asm) programming language are entirely funded by our Substack readers. Subscribe to support us! [ ] Subscribe Past Chapters (for paying subscribers) * Chapter 1 : FizzBuzz and Finite Field Theory * Chapter 2 : Introducing Silly Math Jargon like Congruences and Primes. * Chapter 3 (Unfinished) : Constructing our First Finite Field * Chapter 4 (We are here): Emulating a GPU using a CPU using Congruences, Primes and Number Theory. --------------------------------------------------------------------- Introduction [https] FF-asm GitHub repo Finite Field Assembly (FF-asm) is a programming language founded on the thesis: Math is mostly invented, rarely discovered. For instance, binary digits (0 and 1) fall into the category of discovered math, while concepts like 2's complement, fixed-point arithmetic, and floating-point arithmetic are examples of invented math. The FF-asm language empowers programmers to create their own mathematical systems to solve the problem at hand. Finite Field Assembly's main feature is recursive computing : it's not vectorization, not parallelization, but performing a calculation inside a calculation inside another calculation. *The finite field is the most basic data structure in FF-asm. Defining our Problem The problem : We need to emulate a GPU on a regular CPU. Here are our constraints : 1. We're NOT using SIMD vectorization. 2. We're NOT utilizing OpenMP's parallelization routines. 3. We lack access to a GPU. Therefore, we need to build math that permits us perform lots of parallel computations on our CPU. [https] Meme explaining the RASM philosophy : You can just create your own math This tutorial guides us through constructing a custom finite field that supports addition and multiplication using unsigned 8-bit integers in FF-asm. Setting up our FF-asm Environment 1. Installing Dependencies 1. One needs a working installation of the GNU MP Bignum library.^ [1] Here's the link to install libgmp. 1. Ubuntu installation instructions here. 2. One also need the ff_asm_runtime.h file and the ff_asm_primes.h located in the official GitHub repository. 2. Verifying Installation : Writing Hello World in FF-asm Remember, FF-asm, just like CUDA, is an extension of C. It's super low-level so you need to handle memory allocations. This Hello World Program creates an 8-bit unsigned integer finite field. * Create a file called 01_HelloWorld.c in the same directory that contains ff_asm_runtime.h and ff_asm_primes.h + [https] Sample Directory Structure * Copy the code below. You can find the source here. #include #include "ff_asm_runtime.h" //Run : gcc 01_HelloWorld.c -lgmp -lm -o m.o && ./m.o int main() { //Unsigned 8-bit array uint8_t data[] = {50, 100, 20}; uint64_t fieldOrder[] = {8, 9, 11}; size_t dataLength = sizeof(data) / sizeof(uint8_t); //Allocate memory for a FF-asm field of type Int_8_U_Field ff_asmField field = ff_asmMalloc(dataLength, Int_8_U_Field); //Set our finite field order ff_asmSetFieldOrder(field, dataLength, fieldOrder); //Place data in the FF-asm fieldNode at index 0 ff_asmAppendData(field, dataLength, data, Int_8_U_Field); //Print the FF-asm field ff_asmPrintField(field); //Free the FF-asm field ff_asmFreeField(field); return 0; } Compile and run using these two commands: $ gcc 01_HelloWorld.c -lgmp -lm -o m.o && ./m.o $ ./m.o If your installation is working, you should get this Field Data Type: Int_8_U_Field Field Length: 3 Field Node Count: 1 Field Order : 8 9 11 Node Data Type: Int_8_U_Field Node Data Length: 3 Node Data : 50, 100, 20, Node integer : 658 1. Int_8_U_Field : This is the field data type, an unsigned 8-bit integer in this case. 2. (8,9,11): Field order (the number of elements your field can hold). i.e you can store 8 * 9 * 11 elements in the field . 3. 658 : This is a unique integer that represents your specific the set of finite field elements. It is the solution to a linear congruence. 3. Performing Recursive Addition on CPU Recursive computing is a first-class feature in Finite Field Assembly (FF-asm). We use the mathematical theory of congruences and primes to perform lots of different calculations at the same time. In this section, we show how to perform addition in Finite Field Assembly (FF-asm) using the Recursive Computing paradigm. We cover the mathematical theory of Recursive Computing in Chapter 2 of our book, A Super-friendly Guide to Finite Fields for Programmers. *Recursive computing is not parallelization, not cache-aware vectorization, but a calculation inside a calculation inside another calculation. [https] Visualizing addition in the Recursive Computing Paradigm with Finite Field Assembly *Recursive computing - performing one outer calculation lets you perform several inner calculations at the same time. Coding Lab : Recursive Addition for the vectors {5,3,6} and {2,4,1} This coding lab introduces the functions : * ff_asmMalloc * ff_asmSetFieldOrder * ff_asmAppendData * ff_asmAdd * ff_asmDataDebug * ff_asmPrintField ff_asmFreeField You can find the file on GitHub here. #include #include "ff_asm_runtime.h" //Run : gcc 02_RecursiveAddition.c -lgmp -lm -o m.o && ./m.o int main() { //Unsigned 8-bit arrays we want to add uint8_t data0[] = {5, 3, 6}; uint8_t data1[] = {2, 4, 1}; uint64_t fieldOrder[] = {8, 9, 11}; int dataLength = sizeof(data0) / sizeof(uint8_t); //Allocate memory for one FF-asm fields of type Int_8_U_Field ff_asmField field = ff_asmMalloc(dataLength, Int_8_U_Field); //Set our finite field order ff_asmSetFieldOrder(field, dataLength, fieldOrder); //Place data0 in the FF-asm fieldNode at index 0 ff_asmAppendData(field, dataLength, data0, Int_8_U_Field); //Place data1 in the FF-asm field at index 1 ff_asmAppendData(field, dataLength, data1, Int_8_U_Field); //Add data at index 0 and data at index 1 : store at index 2. ff_asmAdd(field, 0, 1, 2); //Prepare the data for printing. ff_asmDataDebug(field); //Print the FF-asm field ff_asmPrintField(field); //Free the FF-asm field ff_asmFreeField(field); return 0; } This will output Field Data Type: Int_8_U_Field Field Length: 3 Field Node Count: 3 Field Order : 8 9 11 Node Data Type: Int_8_U_Field Node Data Length: 3 Node Data : 5, 3, 6, Node integer : 237 Node Data Type: Int_8_U_Field Node Data Length: 3 Node Data : 2, 4, 1, Node integer : 562 Node Data Type: Int_8_U_Field Node Data Length: 3 Node Data : 7, 7, 7, Node integer : 7 * We observe that our result array {7,7,7} is the last element in the field. 4. Performing Recursive Multiplication on CPU [https] Visualizing multiplication in the Finite Field Assembly language Coding Lab : Recursive Multiplication for the vectors {5,3,6} and {2,4,1} Note : we increase the order of field (i.e the values in the fieldOrder array) to construct a field that can hold our expected result. This coding lab introduces the function: * ff_asmMultiply #include #include "ff_asm_runtime.h" //Run : gcc 03_RecursiveMultiplication.c -lgmp -lm -o m.o && ./m.o int main() { //Unsigned 8-bit arrays we want to add uint8_t data0[] = {5, 3, 6}; uint8_t data1[] = {2, 4, 1}; uint64_t fieldOrder[] = {15,17,19}; int dataLength = sizeof(data0) / sizeof(uint8_t); //Allocate memory for one FF-asm fields of type Int_8_U_Field ff_asmField field = ff_asmMalloc(dataLength, Int_8_U_Field); //Set our finite field order ff_asmSetFieldOrder(field, dataLength, fieldOrder); //Place data0 in the FF-asm fieldNode at index 0 ff_asmAppendData(field, dataLength, data0, Int_8_U_Field); //Place data1 in the FF-asm field at index 1 ff_asmAppendData(field, dataLength, data1, Int_8_U_Field); //Multiply data at index 0 and data at index 1 : store at index 2. ff_asmMultiply(field, 0, 1, 2); //Prepare the data for printing. ff_asmDataDebug(field); //Print the FF-asm field ff_asmPrintField(field); //Free the FF-asm field ff_asmFreeField(field); return 0; } Output Field Data Type: Int_8_U_Field Field Length: 3 Field Node Count: 3 Field Order : 15 17 19 Node Data Type: Int_8_U_Field Node Data Length: 3 Node Data : 5, 3, 6, Node integer : 785 Node Data Type: Int_8_U_Field Node Data Length: 3 Node Data : 2, 4, 1, Node integer : 4067 Node Data Type: Int_8_U_Field Node Data Length: 3 Node Data : 10, 12, 6, Node integer : 4585 * We observe that our result array {10, 12, 6} is the last element in the field. This was a short introduction to Finite Field Assembly. Make sure to look at our source code and star the repo here Funding Goals These are our immediate goals. It's mostly math research and tooling. *Paying subscribers please add a note describing what you're supporting. 1. Finite Field GEMM - writing a Matrix Multiplication library over Finite Fields. (Research was funded and completed here) 2. Add subtraction, division and better error handling. 3. Supporting signed integers, floats and doubles. 4. Add a polynomial arithmetic library. 5. Add Tree-Sitter syntax highlighting and LSP support. 6. Hire webdev for better documentation. [ ] Subscribe Special thanks to the Red Bull Futurism Fund and Redbull Futurist . 3 Share this post [https] LeetArxiv LeetArxiv Emulating a GPU on a CPU using Finite Field Assembly Copy link Facebook Email Notes More 1 Share Discussion about this post CommentsRestacks [htt] [ ] [ ] [ ] [ ] TopLatestDiscussions No posts Ready for more? [ ] Subscribe (c) 2025 Murage Kibicho Privacy [?] Terms [?] Collection notice Start WritingGet the app Substack is the home for great culture Share Copy link Facebook Email Notes More This site requires JavaScript to run correctly. Please turn on JavaScript or unblock scripts