https://leetarxiv.substack.com/p/emulating-a-gpu-on-a-cpu-using-finite

[https]

LeetArxiv

SubscribeSign in

Share this post

 
[https]
LeetArxiv
LeetArxiv
Emulating a GPU on a CPU using Finite Field Assembly
Copy link
Facebook
Email
Notes
More

Emulating a GPU on a CPU using Finite Field Assembly

A Friendly yet Thorough Introduction to Finite Field Assembly
programming language and the Recursive Computing Paradigm

 
[http]
Murage Kibicho
Jan 12, 2025
3

Share this post

 
[https]
LeetArxiv
LeetArxiv
Emulating a GPU on a CPU using Finite Field Assembly
Copy link
Facebook
Email
Notes
More
 
1
 
Share

Quick Note

This is Chapter 4 in our book, A Super-friendly Guide to Finite
Fields for Programmers.

LeetArxiv and the Finite Field Assembly (FF-asm) programming language
are entirely funded by our Substack readers. Subscribe to support us!

[                    ]
Subscribe
Past Chapters (for paying subscribers)

  * Chapter 1 : FizzBuzz and Finite Field Theory

  * Chapter 2 : Introducing Silly Math Jargon like Congruences and
    Primes.

  * Chapter 3 (Unfinished) : Constructing our First Finite Field

  * Chapter 4 (We are here): Emulating a GPU using a CPU using
    Congruences, Primes and Number Theory.

---------------------------------------------------------------------

Introduction

 
[https]
FF-asm GitHub repo

Finite Field Assembly (FF-asm) is a programming language founded on
the thesis: Math is mostly invented, rarely discovered.

For instance, binary digits (0 and 1) fall into the category of 
discovered math, while concepts like 2's complement, fixed-point
arithmetic, and floating-point arithmetic are examples of invented
math.

The FF-asm language empowers programmers to create their own
mathematical systems to solve the problem at hand.

Finite Field Assembly's main feature is recursive computing : it's
not vectorization, not parallelization, but performing a calculation
inside a calculation inside another calculation.

*The finite field is the most basic data structure in FF-asm.

Defining our Problem

The problem : We need to emulate a GPU on a regular CPU.

Here are our constraints :

 1. We're NOT using SIMD vectorization.

 2. We're NOT utilizing OpenMP's parallelization routines.

 3. We lack access to a GPU.

Therefore, we need to build math that permits us perform lots of
parallel computations on our CPU.

 
[https]
Meme explaining the RASM philosophy : You can just create your own
math

This tutorial guides us through constructing a custom finite field
that supports addition and multiplication using unsigned 8-bit
integers in FF-asm.

Setting up our FF-asm Environment

1. Installing Dependencies

 1. One needs a working installation of the GNU MP Bignum library.^
    [1] Here's the link to install libgmp.

     1. Ubuntu installation instructions here.

 2. One also need the ff_asm_runtime.h file and the ff_asm_primes.h
    located in the official GitHub repository.

2. Verifying Installation : Writing Hello World in FF-asm

Remember, FF-asm, just like CUDA, is an extension of C. It's super
low-level so you need to handle memory allocations.

This Hello World Program creates an 8-bit unsigned integer finite
field.

  * Create a file called 01_HelloWorld.c in the same directory that
    contains ff_asm_runtime.h and ff_asm_primes.h

      + 
         
        [https]
        Sample Directory Structure
  * Copy the code below. You can find the source here.

#include <stdio.h>
#include "ff_asm_runtime.h"

//Run : gcc 01_HelloWorld.c -lgmp -lm -o m.o && ./m.o
int main()
{
        //Unsigned 8-bit array
        uint8_t data[] = {50, 100, 20};
        uint64_t fieldOrder[] = {8, 9, 11};
        size_t dataLength = sizeof(data) / sizeof(uint8_t);

        //Allocate memory for a FF-asm field of type Int_8_U_Field
        ff_asmField field = ff_asmMalloc(dataLength, Int_8_U_Field);

        //Set our finite field order
        ff_asmSetFieldOrder(field, dataLength, fieldOrder);

        //Place data in the FF-asm fieldNode at index 0
        ff_asmAppendData(field, dataLength, data, Int_8_U_Field);

        //Print the FF-asm field
        ff_asmPrintField(field);

        //Free the FF-asm field
        ff_asmFreeField(field);
        return 0;
}

Compile and run using these two commands:

$ gcc 01_HelloWorld.c -lgmp -lm -o m.o && ./m.o
$ ./m.o

If your installation is working, you should get this

Field Data Type: Int_8_U_Field
Field Length: 3
Field Node Count: 1
Field Order : 8 9 11
        Node Data Type: Int_8_U_Field
        Node Data Length: 3
        Node Data :  50, 100,  20,
        Node integer : 658

 1. Int_8_U_Field : This is the field data type, an unsigned 8-bit
    integer in this case.

 2. (8,9,11): Field order (the number of elements your field can
    hold). i.e you can store 8 * 9 * 11 elements in the field .

 3. 658 : This is a unique integer that represents your specific the
    set of finite field elements. It is the solution to a linear
    congruence.

3. Performing Recursive Addition on CPU

Recursive computing is a first-class feature in Finite Field Assembly
(FF-asm). We use the mathematical theory of congruences and primes to
perform lots of different calculations at the same time.

In this section, we show how to perform addition in Finite Field
Assembly (FF-asm) using the Recursive Computing paradigm.

We cover the mathematical theory of Recursive Computing in Chapter 2
of our book, A Super-friendly Guide to Finite Fields for Programmers.

*Recursive computing is not parallelization, not cache-aware 
vectorization, but a calculation inside a calculation inside another
calculation.

 
[https]
Visualizing addition in the Recursive Computing Paradigm with Finite
Field Assembly

*Recursive computing - performing one outer calculation lets you
perform several inner calculations at the same time.

Coding Lab : Recursive Addition for the vectors {5,3,6} and {2,4,1}

This coding lab introduces the functions :

  * ff_asmMalloc

  * ff_asmSetFieldOrder

  * ff_asmAppendData

  * ff_asmAdd

  * ff_asmDataDebug

  * ff_asmPrintField

ff_asmFreeField

You can find the file on GitHub here.

#include <stdio.h>
#include "ff_asm_runtime.h"

//Run : gcc 02_RecursiveAddition.c -lgmp -lm -o m.o && ./m.o
int main()
{
        //Unsigned 8-bit arrays we want to add
        uint8_t data0[] = {5, 3, 6};
        uint8_t data1[] = {2, 4, 1};
        uint64_t fieldOrder[] = {8, 9, 11};
        int dataLength = sizeof(data0) / sizeof(uint8_t);

        //Allocate memory for one FF-asm fields of type Int_8_U_Field
        ff_asmField field = ff_asmMalloc(dataLength, Int_8_U_Field);

        //Set our finite field order
        ff_asmSetFieldOrder(field, dataLength, fieldOrder);

        //Place data0 in the FF-asm fieldNode at index 0
        ff_asmAppendData(field, dataLength, data0, Int_8_U_Field);

        //Place data1 in the FF-asm field at index 1
        ff_asmAppendData(field, dataLength, data1, Int_8_U_Field);

        //Add data at index 0 and data at index 1 : store at index 2.
        ff_asmAdd(field, 0, 1, 2);

        //Prepare the data for printing.
        ff_asmDataDebug(field);

        //Print the FF-asm field
        ff_asmPrintField(field);

        //Free the FF-asm field
        ff_asmFreeField(field);
        return 0;
}

This will output

Field Data Type: Int_8_U_Field
Field Length: 3
Field Node Count: 3
Field Order : 8 9 11
        Node Data Type: Int_8_U_Field
        Node Data Length: 3
        Node Data :   5,   3,   6,
        Node integer : 237

        Node Data Type: Int_8_U_Field
        Node Data Length: 3
        Node Data :   2,   4,   1,
        Node integer : 562

        Node Data Type: Int_8_U_Field
        Node Data Length: 3
        Node Data :   7,   7,   7,
        Node integer : 7

  * We observe that our result array {7,7,7} is the last element in
    the field.

4. Performing Recursive Multiplication on CPU

 
[https]
Visualizing multiplication in the Finite Field Assembly language

Coding Lab : Recursive Multiplication for the vectors {5,3,6} and
{2,4,1}

Note : we increase the order of field (i.e the values in the 
fieldOrder array) to construct a field that can hold our expected
result.

This coding lab introduces the function:

  * ff_asmMultiply

#include <stdio.h>
#include "ff_asm_runtime.h"

//Run : gcc 03_RecursiveMultiplication.c -lgmp -lm -o m.o && ./m.o
int main()
{
        //Unsigned 8-bit arrays we want to add
        uint8_t data0[] = {5, 3, 6};
        uint8_t data1[] = {2, 4, 1};
        uint64_t fieldOrder[] = {15,17,19};
        int dataLength = sizeof(data0) / sizeof(uint8_t);

        //Allocate memory for one FF-asm fields of type Int_8_U_Field
        ff_asmField field = ff_asmMalloc(dataLength, Int_8_U_Field);

        //Set our finite field order
        ff_asmSetFieldOrder(field, dataLength, fieldOrder);

        //Place data0 in the FF-asm fieldNode at index 0
        ff_asmAppendData(field, dataLength, data0, Int_8_U_Field);

        //Place data1 in the FF-asm field at index 1
        ff_asmAppendData(field, dataLength, data1, Int_8_U_Field);

        //Multiply data at index 0 and data at index 1 : store at index 2.
        ff_asmMultiply(field, 0, 1, 2);

        //Prepare the data for printing.
        ff_asmDataDebug(field);

        //Print the FF-asm field
        ff_asmPrintField(field);

        //Free the FF-asm field
        ff_asmFreeField(field);
        return 0;
}

Output

Field Data Type: Int_8_U_Field
Field Length: 3
Field Node Count: 3
Field Order : 15 17 19
        Node Data Type: Int_8_U_Field
        Node Data Length: 3
        Node Data :   5,   3,   6,
        Node integer : 785

        Node Data Type: Int_8_U_Field
        Node Data Length: 3
        Node Data :   2,   4,   1,
        Node integer : 4067

        Node Data Type: Int_8_U_Field
        Node Data Length: 3
        Node Data :  10,  12,   6,
        Node integer : 4585

  * We observe that our result array {10, 12, 6} is the last element
    in the field.

This was a short introduction to Finite Field Assembly. Make sure to
look at our source code and star the repo here

Funding Goals

These are our immediate goals. It's mostly math research and tooling.

*Paying subscribers please add a note describing what you're
supporting.

 1. Finite Field GEMM - writing a Matrix Multiplication library over
    Finite Fields. (Research was funded and completed here)

 2. Add subtraction, division and better error handling.

 3. Supporting signed integers, floats and doubles.

 4. Add a polynomial arithmetic library.

 5. Add Tree-Sitter syntax highlighting and LSP support.

 6. Hire webdev for better documentation.

    [                    ]
    Subscribe

Special thanks to the Red Bull Futurism Fund and
Redbull Futurist
.

3

Share this post

 
[https]
LeetArxiv
LeetArxiv
Emulating a GPU on a CPU using Finite Field Assembly
Copy link
Facebook
Email
Notes
More
 
1
 
Share

Discussion about this post

CommentsRestacks
[htt]
[                    ]
[                    ]
[                    ]
[                    ]
TopLatestDiscussions

No posts

Ready for more?

[                    ]
Subscribe
(c) 2025 Murage Kibicho
Privacy [?] Terms [?] Collection notice
 Start WritingGet the app
Substack is the home for great culture

Share

Copy link
Facebook
Email
Notes
More
This site requires JavaScript to run correctly. Please turn on
JavaScript or unblock scripts