https://www.mattkeeter.com/projects/prospero/

The Prospero Challenge
Matt Keeter, March 2025

---------------------------------------------------------------------

prospero.vm is a plain-text file containing 7866 math expressions. It begins as follows:

    # Text of a monologue from The Tempest
    _0 const 2.95
    _1 var-x
    _2 const 8.13008
    _3 mul _1 _2
    _4 add _0 _3

...and continues in that vein for many, many lines.

If you evaluate the expression with (x,y) values in the +-1 square, then color pixels black or white based on their sign, you get the following image:

[Image: text of a monologue from The Tempest, in black-on-white text]

The "challenge" is simple: render the image as quickly as possible.

A basic renderer is 28 lines of Python, using Numpy for pixel parallelism:

    import numpy as np
    with open('prospero.vm') as f:
        text = f.read().strip()

    image_size = 1024
    space = np.linspace(-1, 1, image_size)
    (x, y) = np.meshgrid(space, -space)

    v = {}
    for line in text.split('\n'):
        if line.startswith('#'):
            continue
        [out, op, *args] = line.split()
        match op:
            case "var-x": v[out] = x
            case "var-y": v[out] = y
            case "const": v[out] = float(args[0])
            case "add": v[out] = v[args[0]] + v[args[1]]
            case "sub": v[out] = v[args[0]] - v[args[1]]
            case "mul": v[out] = v[args[0]] * v[args[1]]
            case "max": v[out] = np.maximum(v[args[0]], v[args[1]])
            case "min": v[out] = np.minimum(v[args[0]], v[args[1]])
            case "neg": v[out] = -v[args[0]]
            case "square": v[out] = v[args[0]] * v[args[0]]
            case "sqrt": v[out] = np.sqrt(v[args[0]])
            case _: raise Exception(f"unknown opcode '{op}'")
    out = v[out]

    with open('out.ppm', 'wb') as f:  # write the image out
        f.write(f'P5\n{image_size} {image_size}\n255\n'.encode())
        f.write(((out < 0) * 255).astype(np.uint8).tobytes())

On my machine, this takes about 15 seconds to produce a 1024x1024 image. (Spoilers: it also allocates 60+ GB of RAM for intermediate results, so consider reducing the image size before running it on your own machine.)

"But wait!", you say. "There's so much obvious room for improvement! You could pre-parse the expression, or use numba, or run it on the GPU, or use LLVM..."

Yes. I would like you to do those things, and then tell me about them!

---------------------------------------------------------------------

This page has been seeded with a few of my projects, but I'd like for it to grow to contain different experiments by other people. A few ideas to think about:

* The Python implementation above is a simple interpreter. What would an optimized interpreter look like? How about a compiler? (One possible starting point is sketched after this list.)
* How can we take advantage of the expression's mathematical structure?
* Start-up time versus steady-state performance. What kinds of precomputation help, and how long does it take? (Obviously, you could precompute the entire image, but that's against the spirit of the challenge.)
* Along those lines, what happens if the user is making changes to the model? Strategies which rely on expensive precomputation may not be suitable for interactive use.
* How does rendering performance change with image size? A 1024x1024 image has 4x as many pixels as 512x512; does it necessarily take 4x longer to render?
* Panning and zooming is a simple transform of the incoming (x,y) coordinates. How does your system handle interactive viewing of the model? (A minimal version of that transform is also sketched below.)
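As a rough starting point for the first idea in the list (an optimized interpreter), here is a sketch, not a measured implementation: parse the file once into a flat tape that uses integer slots instead of string register names, so the per-frame work no longer touches strings or dictionaries. The parse_tape and run_tape helpers are names invented for this page, not part of any submission.

    import numpy as np

    def parse_tape(path):
        # Parse the .vm file once, up front: each instruction becomes
        # (opcode, output slot, input slots or constant), with the _N
        # names replaced by integer indices into a flat list of slots.
        ops = []
        names = {}
        with open(path) as f:
            for line in f:
                if line.startswith('#') or not line.strip():
                    continue
                out, op, *args = line.split()
                names[out] = len(names)
                if op == 'const':
                    ops.append((op, names[out], float(args[0])))
                else:
                    ops.append((op, names[out], [names[a] for a in args]))
        return ops, len(names)

    def run_tape(ops, n_slots, x, y):
        # Evaluate the pre-parsed tape; slots holds intermediate results,
        # indexed by integer rather than by string.
        slots = [None] * n_slots
        fns = {
            'add': np.add, 'sub': np.subtract, 'mul': np.multiply,
            'max': np.maximum, 'min': np.minimum,
        }
        for op, out, args in ops:
            if op == 'var-x':
                slots[out] = x
            elif op == 'var-y':
                slots[out] = y
            elif op == 'const':
                slots[out] = args  # args is the constant itself
            elif op == 'neg':
                slots[out] = -slots[args[0]]
            elif op == 'square':
                slots[out] = slots[args[0]] * slots[args[0]]
            elif op == 'sqrt':
                slots[out] = np.sqrt(slots[args[0]])
            else:
                slots[out] = fns[op](slots[args[0]], slots[args[1]])
        return slots[ops[-1][1]]  # result of the final instruction

Parsing then happens once, and run_tape(ops, n_slots, x, y) can be called per frame (or per tile), with x and y built by the same meshgrid as above.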
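And for the last idea in the list, a minimal sketch of the coordinate transform: panning and zooming never touch the expression, only the (x, y) grid that feeds it. The view_coordinates helper and its center/scale parameters are illustrative, not taken from any particular renderer.

    import numpy as np

    def view_coordinates(image_size, center=(0.0, 0.0), scale=1.0):
        # Build the usual +/-1 pixel grid, then shift and scale it; the
        # expression itself never changes, only its (x, y) inputs.
        span = np.linspace(-1, 1, image_size)
        (x, y) = np.meshgrid(span, -span)
        return (center[0] + x * scale, center[1] + y * scale)

    # e.g. a 4x zoom centered on the middle of the upper-left quadrant
    (x, y) = view_coordinates(1024, center=(-0.5, 0.5), scale=0.25)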
Submitting your experiments

Feel free to submit both successful and unsuccessful experiments via email to matt.j.keeter@gmail.com, and I will add them to this page.

The ideal submission would be a blog post or code repository, along with a few-sentence description that you'd like included on this page. I'd also be happy to include hero images or video embeds. The description should call out any particularly promising results, clever ideas, or interesting tools!

Don't get too fixated on absolute speed; the goal is to explore ideas, and that can absolutely be done without breaking any performance records!

Is there a prize?

Besides eternal fame and glory? If you submit an interesting write-up, I will buy you a coffee or drink if we happen to be in the same city. It will be particularly easy for Cambridge / Boston / Somerville residents to receive their prize; I make no guarantees for submissions from farther afield.

Background material

Interval Arithmetic and Recursive Subdivision for Implicit Functions and Constructive Solid Geometry (Duff '92) is a great introduction to common techniques. In particular, the combination of interval arithmetic, spatial subdivision, and expression simplification remains the basis for modern renderers.

These ideas are also presented in §3.7 of my thesis, Hierarchical Volumetric Object Representations for Digital Fabrication Workflows (2013), although my personal implementations have evolved since then (see below for examples).

Finally, I gave a talk titled Implicit Surfaces & Independent Research (2025), which covers all of this material and hews closely to my current implementations. The audience was CS undergraduates in their senior year, so it's intended to be approachable!
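Since interval arithmetic does much of the heavy lifting in those techniques, here is a minimal sketch of the core trick, independent of the implementations in the gallery below: evaluating the expression once over a whole rectangle of (x, y) values yields an interval for the result, and if that interval is entirely negative or entirely positive, every pixel in the rectangle can be filled without further work; only ambiguous rectangles need to be subdivided or evaluated per pixel. The Interval class and the circle() example are illustrative stand-ins, not code from any of the projects below.

    import math

    class Interval:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi

        def __add__(self, other):
            return Interval(self.lo + other.lo, self.hi + other.hi)

        def __sub__(self, other):
            return Interval(self.lo - other.hi, self.hi - other.lo)

        def __mul__(self, other):
            corners = [a * b for a in (self.lo, self.hi)
                             for b in (other.lo, other.hi)]
            return Interval(min(corners), max(corners))

        def square(self):
            # the result can't be negative, even if the input straddles zero
            if self.lo <= 0.0 <= self.hi:
                return Interval(0.0, max(self.lo**2, self.hi**2))
            return Interval(min(self.lo**2, self.hi**2),
                            max(self.lo**2, self.hi**2))

        def sqrt(self):
            # clamp negative inputs to zero to keep the sketch simple
            return Interval(math.sqrt(max(self.lo, 0.0)),
                            math.sqrt(max(self.hi, 0.0)))

    def circle(x, y, r=0.5):
        # distance-to-circle expression, written with the same primitives
        # that prospero.vm uses (square, add, sqrt, constants)
        return (x.square() + y.square()).sqrt() - Interval(r, r)

    # Classify a rectangular region of (x, y) space in one evaluation:
    d = circle(Interval(-0.1, 0.1), Interval(-0.1, 0.1))
    if d.hi < 0:
        print("region is entirely inside")   # fill it black, no per-pixel work
    elif d.lo > 0:
        print("region is entirely outside")  # fill it white
    else:
        print("ambiguous: subdivide or evaluate per pixel")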
Gallery

Massively Parallel Rendering of Complex Closed-Form Implicit Surfaces
Matt Keeter, 2020
Paper homepage

The MPR implementation compiles the expression down to a register-based bytecode, then executes it with a small interpreter running in a CUDA kernel. The interpreter evaluates the expression using interval arithmetic on many spatial regions in parallel, then simplifies the expression, subdivides the active regions, and recurses. Below a certain size, we swap over to per-pixel evaluation, also in parallel on the GPU. Expression simplification is particularly important for performance: by the time we evaluate individual pixels, the expression is 200x smaller! Most of the hard work went into making a deeply recursive algorithm (with heterogeneous workloads in each branch) into something more GPU-friendly.

Rendering a 1024x1024 image takes 3.9 milliseconds using a Tesla V100, which was a high-end GPU back in 2020 (and is still absurdly powerful). Performance is basically flat through 4096x4096, indicating that we're far from saturating the GPU for this kind of 2D rendering (the paper also describes a similar strategy for 3D rendering). Unfortunately, I don't have timing numbers for setup time, but the kernels are generic interpreters, i.e. not specialized to a particular expression.

Fidget
Matt Keeter, 2025
Project homepage

Fidget uses the same broad strategy of interval evaluation and expression simplification, but runs on the CPU and compiles expressions down to machine code with a JIT compiler. Running on an M1 Max CPU (2021), it renders a 1024x1024 image in 6.3 milliseconds, with about 11 ms of setup time:

    $ cargo run -pfidget-cli --release -- render2d -i models/prospero.vm \
        -s1024 --eval=jit -N500 --mode mono
    [2025-03-23T21:26:44Z INFO fidget_cli] Loaded file in 4.994333ms
    [2025-03-23T21:26:44Z INFO fidget_cli] Built shape in 6.04525ms
    [2025-03-23T21:26:47Z INFO fidget_cli] Rendered 500x at 6.331644 ms/frame

Scaling up to 4096x4096, it takes 57 milliseconds, which is about 9x slower; this is sublinear scaling, because it's 16x more pixels.

There are a few interesting aspects to this project:

* The JIT compiler's architecture is specialized for very fast compilation, because it's running many times to evaluate a single frame!
* There's also a VM interpreter, which is almost as fast: 7.5 ms for a 1024x1024 image
* Everything except the JIT compiler can be run in WebAssembly, albeit with a performance penalty

Prospero challenge, now with more garbage collection
Max Bernstein, 2025
Blog post

Max addresses the "60 GB of intermediate results" issue by adding garbage collection to the Python interpreter, seeing a quick 4x speedup (and a dramatic reduction in RAM usage, down to about 1 GB). The post also shows off CuPy, a drop-in replacement for Numpy which offloads computation to the GPU. Simply replacing the import statement yields a further 6.6x speedup!

---------------------------------------------------------------------

That's all for now; but you can help by submitting your own experiments!

(c) 2025 Matt Keeter