[HN Gopher] BBC BASIC raytracer in 432 characters
       ___________________________________________________________________
        
       BBC BASIC raytracer in 432 characters
        
       Author : neffo
       Score  : 184 points
       Date   : 2024-01-17 03:53 UTC (19 hours ago)
        
 (HTM) web link (mastodon.me.uk)
 (TXT) w3m dump (mastodon.me.uk)
        
       | LeoPanthera wrote:
       | In the replies underneath, someone ran it on a real Acorn
       | Electron computer (6502 at 1 MHz), where it completed in 8 hours
       | and 40 minutes. Not too bad really.
        
         | kragen wrote:
         | that's astounding; i would have guessed months or years
        
           | LeoPanthera wrote:
           | BBC BASIC is surprisingly efficient for an interpreted
           | language. It kind of had to be, running on processors that
           | slow.
        
             | KerrAvon wrote:
             | Some good discussion of that on The BBC BASIC wiki entry:
             | https://en.wikipedia.org/wiki/BBC_BASIC (paragraph
             | beginning "Due to a number of optimizations[...]".
        
               | crimbles wrote:
               | Also you could swap out hot spots with assembly trivially
               | as it had an inline assembler. And you had indirect
               | pointers in it!
        
               | mavhc wrote:
               | The OP's code has REM based assembly it in to save space
        
               | kragen wrote:
               | is that assembly or is that the dithering table
        
               | alphabetter wrote:
               | Yes, the REM is the lookup table used in line 70.
               | 
               | Because BBC Basic had a built-in assembler it was pretty
               | uncommon for BBC programs to inline machine code as raw
               | data (unlike some other computers from BITD).
        
               | londons_explore wrote:
               | just a table of constants.
        
             | kragen wrote:
             | all the basics i used on processors that slow were
             | surprisingly inefficient, even for an interpreted languages
             | 
             | i thought they had to be, running on processors with that
             | little memory. though later on i learned about forth, which
             | _is_ surprisingly efficient for an interpreted language
             | 
             | a more likely explanation is that sophie wilson was just a
             | better hacker than bill gates and paul allen
        
               | iainmerrick wrote:
               | Yeah, BBC BASIC is _really_ good. Both the language
               | design and the implementation.
               | 
               | I loved it at the time, and the more I think back on it
               | the more impressed I am. Like it had a very decent suite
               | of floating-point routines, which if I remember right
               | were very performant. In a 32KB ROM!
        
               | forinti wrote:
               | Basic was actually a 16KB ROM. MOS occupied another 16KB
               | ROM.
        
               | crq-yml wrote:
               | Some of the improvement in BBC BASIC over Microsoft's
               | BASIC is attributable to the larger ROM(the first Altair
               | BASIC was just 4k), but it's also that the BBC Micro's
               | 6502 and DRAM was clocked higher than contemporaries.
               | It's just a faster, more refined 8-bit machine all
               | around.
        
           | bunabhucan wrote:
           | I recreated a mandlebrot zoom I saw in a library book in bbc
           | basic and the last image took two weeks. I had 8 of them
           | saved on a floppy and that would slideshow them for a 2fps
           | "movie"
        
             | kragen wrote:
             | that is fucking awesome, you are a legend
        
         | PlunderBunny wrote:
         | I'm running this on a BBC micro emulator running at period-
         | accurate speed, and it's taking just under a minute to draw one
         | line. So I think it's going to finish in about 4 hours, which
         | would make sense - approximately twice as fast as the Acorn
         | Electron.
        
           | PlunderBunny wrote:
           | Update: Finished in only 2 hours 30 minutes.
        
             | tom_ wrote:
             | The Electron runs extra slowly in MODE 1 as the video
             | system needs 100% of the RAM bandwidth to form the display.
             | The CPU essentially ends up paused during the visible
             | portion of each scan line, so the effective speed is
             | probably more like 0.6 MHz.
        
         | Sharlin wrote:
         | I wonder how much of that it spent just computing those square
         | roots.
        
           | araes wrote:
           | Does anybody happen to know what SQR algorithm BBC Basic
           | uses? http://www.bbcbasic.co.uk/bbcwin/manual/bbcwin7.html#pr
           | intha... does not seem to have the actual algorithm.
           | 
           | Intel's optimization manual has suggestions for fast versions
           | at 15.12.3 (recommended by @James
           | https://stackoverflow.com/a/2637823/10981777) Don't look
           | especially long to implement for 22-bit approximation.
           | 
           | https://software.intel.com/content/www/us/en/develop/downloa.
           | ..
           | 
           | Also, what's the SQRD? Not finding that referred to anywhere.
        
             | Sharlin wrote:
             | SQRD is just SQR D :)
        
               | araes wrote:
               | Thanks. So much more of that code makes sense now. IFD,
               | SGNU, FORN, FORM. FORM especially. So many codes now have
               | the word FORM as a special word.
               | 
               | What a nightmare to read though if you don't know what
               | it's supposed to be written like. Is it a: "FORM=" or a
               | "FOR M=" ? Is it: "S GNU:G", "SG NU:G", "SGN U:G"? SQRD
               | totally looks like some special kind of square root
               | implementation.
        
               | Sharlin wrote:
               | Yeah, early BASICs tended to let you skip the spaces, and
               | in fact it sometimes did have a measurable effect on
               | parsing/execution speed. And of course on file size
               | too... Additionally, at least C64 Basic had two-character
               | shorthands for all keywords.
        
       | kragen wrote:
       | this says `goto 50` but there are no line numbers
        
         | LeoPanthera wrote:
         | The robot automatically adds line numbers in increments of 10s.
        
         | EspadaV9 wrote:
         | If you go to the source link - https://bbcmic.ro/?t=9ctpk -
         | then you can see the numbers with the lines. They increment in
         | steps of 10 (which IIRC you could override manually with
         | numbers in-between).
        
           | PlunderBunny wrote:
           | If you're playing with this on a BBC micro emulator, the best
           | way to get this code into the emulator is to download the
           | disk image from the above link. The actual basic program is
           | named "PROGRAM" on the disc. List the code by typing LIST20,
           | (the computer will lock up if you try to list line 10).
           | 
           | If you just want to see the result, type MODE 1 and then
           | press return, then type *LOAD SCREEN and press return again.
        
           | kragen wrote:
           | thanks! i didn't see that link. but i think the 432 bytes
           | leaves the line numbers out, so it should actually be 448
           | bytes
        
             | flohofwoe wrote:
             | I guess that's a philosophical question.
             | 
             | In most home computer BASIC interpreters, the source code
             | (including line numbers) never exists as text in memory or
             | on tape/disk, only as binary tokens. The only place where
             | the original input text exists is a small line buffer which
             | gets translated into binary token form when pressing
             | Enter/Return.
             | 
             | The LIST command translates the token form back into human
             | readable text.
             | 
             | And if you use 'AUTO' you also never type any line
             | numbers... so are they actually part of the source code? Is
             | there even 'source code' when the original source is never
             | stored as text anywhere? Is the 'source code size' then the
             | size of the original text data (that actually never exists
             | a whole), or the size of binary token representation in
             | memory and on tape/disk? :)
        
               | kragen wrote:
               | sure, you might want to count the tokenized
               | representation instead of the printed representation, so
               | maybe the real number is something smaller than 432, but
               | it contains the line numbers too. you definitely don't
               | want to count the 432 bytes of printed representation and
               | leave out the 16 bytes of line numbers because the
               | program won't run without them
        
       | ManuelKiessling wrote:
       | Ok, this is probably a dumb question: I get that the code traces
       | the rays for this scene, obviously -- but where is the scene,
       | then? That is, where is the "data" (as opposed to the algorithm)
       | that says "there's a sphere at coordinates x,y,z with radius r,
       | and here's the other one, and here's the checkerboard plane"?
        
         | sp332 wrote:
         | W=1/SQR(U*U+V*V+1)
         | 
         | This makes a sphere. It gets multiplied by two coordinates to
         | make spheres at different places, U and V.
        
           | kragen wrote:
           | i think this is incorrect, i think that's computing the z
           | component of the initial direction vector of the ray being
           | traced
           | 
           | if you change the initialization of i from sgn u to 1.2 * sgn
           | u you will get an image with the spheres a little further
           | apart
        
             | Sharlin wrote:
             | Yeah, that's something of a red herring. The actual ray-
             | sphere intersection happens on line 50:
             | // sphere_center = (E, F, Z)         E = X - I          F =
             | Y - I         // solving the ray-sphere quadratic eqn...
             | P = U * E + V * F - W * Z         // discriminant         D
             | = P * P - E * E - F * F - Z * Z + 1         // if solution
             | (ie. intersection) exists         IF D > 0            //
             | intersect_point = ray_orig + T * ray_dir           // this
             | is the closer pt, -P + SQR D is the other            T = -P
             | - SQR D
        
               | kragen wrote:
               | yeah, that was what i guessed it was doing, but i
               | couldn't remember or rederive the ray-sphere intersection
               | formula to be sure
        
         | Sharlin wrote:
         | It's mostly implicit. See how the spheres are symmetric about
         | the centerpoint? (Well, almost, the camera is at Y = -0.1.) The
         | (X, Y) coordinates of their centers are simply (-1, -1) and (1,
         | 1), given by the `I = SGN U` variable at the end of line 40.
         | The Z is implicitly zero. Their radius is 1. The plane is
         | basically defined by `P = Y - 2` at line 60.
        
           | Sharlin wrote:
           | ...to elucidate a bit, when tracing the rays on the left side
           | of the image, the sign of u is -1, so we're trying to
           | intersect a sphere at (x-1, y-1, z), and on the right side
           | similarly one at (x+1, y+1, z). This works because none of
           | the left-side rays can even in theory hit the right-side
           | sphere and vice versa. It's a very simple version of space
           | partitioning optimization. And when tracing the reflected
           | rays, we flip the sign of i, because there's no self-
           | reflection - the left-hand sphere can only reflect the right-
           | hand one and vice versa.
        
             | kragen wrote:
             | these comments were extremely helpful, thank you
        
               | Sharlin wrote:
               | No problem, this was a fun puzzle to solve =D
        
         | flohofwoe wrote:
         | The 3D scene is described by formulas instead of data. It's a
         | bit similar to how SDF demos describe a 3D scene by combing
         | shape formulas into a single formula for the whole scene (SDF =
         | signed distance fields).
         | 
         | E.g. see:
         | 
         | https://iquilezles.org/articles/distfunctions/
        
           | kragen wrote:
           | while i know you didn't say this was using an sdf, i want to
           | point out that it doesn't seem to be using an sdf, just in
           | case anyone is wondering; it seems to be a conventional
           | whitted-style raytracer that computes sphere-ray
           | intersections in closed form by solving the quadratic
        
         | kragen wrote:
         | sharlin has explained how the sphere coordinates are computed;
         | the checkerboard plane is implemented by converting downward-
         | pointing vectors (where the y-component v is negative) into
         | upward-pointing vectors toward a point in the sky with the
         | right color                   IF V < 0 then P = (Y + 2)/V : V =
         | -V*((INT(X - U*P) + INT(Z - W*P) AND 1)/2 + .3) + .2
         | 
         | then the (palette index of?) the color of the sky is computed
         | from the square root of the y-coordinate of the direction
         | vector as                   GCOL 0, 3 - (48*SQR V) DIV 16
         | 
         | plus a fixed dithering table
        
       | ChrisArchitect wrote:
       | This one was from 2021.
       | 
       | Do enjoy the insane things ppl managed to cram into tweets to the
       | bbcmicrobot over the years. Anyone who's followed any part of the
       | demoscene especially the classic smaller payload demos.. 4k
       | etc.....this kinda thing is mindblowing and totally in the same
       | spirit.
        
       | po wrote:
       | Also, don't miss the gallery site! https://www.bbcmicrobot.com
        
         | neffo wrote:
         | the gallery site is amazing in and of itself. the gallery isn't
         | PNGs or GIFs - it's generated in the browser from the emulator
         | state. if you hover over an item it resumes the emulator.
         | 
         | https://mastodon.me.uk/@bbcmicrobot/111760484438439751
        
       | jug wrote:
       | This effort is truly specular!
        
       | rwmj wrote:
       | https://www.pouet.net/prod.php?which=78045
       | 
       | Incredible raytraced demo in 256 bytes.
        
         | briansm wrote:
         | https://www.pouet.net/prod.php?which=53871
         | 
         | https://www.youtube.com/watch?v=36BPql6Nl_U
         | 
         | Incredible raytraced demo in 128 bytes. (An underwater journey
         | through a Menger-sponge fractal)
        
         | kragen wrote:
         | last year i wrote a tetris in arm assembly
         | https://asciinema.org/a/622461 and was pleased to get it down
         | below 1024 bytes
         | 
         | then i looked and rrrola's 4is256
         | https://www.pouet.net/prod.php?which=29286 is a working tetris
         | in under 256 bytes, and it's better than mine because it has
         | colors and scoring. i could blame a little bit of inflation on
         | arm being 32-bit rather than 16-bit, but not 4x
        
           | bbojan wrote:
           | You could try using Thumb instructions, they are 16 bit.
        
             | kragen wrote:
             | i was, but i wasn't talking about the size of the
             | instruction word, but the size of the registers
        
           | userbinator wrote:
           | The upper limit of instruction density on x86 is much higher
           | than any RISC.
        
             | kragen wrote:
             | that hasn't been my experience with instructions i've
             | written myself, but then again, i'm not rrrola
        
       | SillyUsername wrote:
       | Can anybody provide de-obfuscated/minified source so those of us
       | who understand the principle (tracing rays of light around a
       | scene) can understand the maths?
        
         | kragen wrote:
         | here's my attempt                   10 REMAECG$C*Ec!ae-e'd
         | rem dithering table in comment above consulted by GCOL line?
         | rem raytracer from https://bbcmic.ro/?t=9ctpk by
         | https://mastodon.me.uk/@coprolite9000            rem
         | deobfuscated and possibly broken by kragen javier sitaker
         | (untested)         20 MODE 1            VDU 5         30 FOR N
         | = 8 TO 247         : rem loop over rows (n) and columns (m) of
         | pixels               FOR M = 0 TO 319         40       X = 0
         | : rem three-dimensional vector                  Y = -.1
         | Z = 3                       U = (M - 159.5)/160    : rem xform
         | screen coords to direction vector                  V = (N -
         | 127.5)/160                       W = 1/SQR(U*U + V*V + 1) : rem
         | normalize direction vector                  U = U * W
         | V = V * W                       I = SGN U    : rem derive
         | coordinate of sphere center (i, i, 1)?                  G = 1
         | 50          E = X - I : rem beginning of do-while loop
         | F = Y - I                     P = U*E + V*F - W*Z
         | D = P*P - E*E - F*F - Z*Z + 1                     IF D <= 0
         | then goto 60                        T = -P - SQR D
         | IF T <= 0 then goto 60                            X = X + T*U
         | Y = Y + T*V                            Z = Z - T*W
         | E = X - I                            F = Y - I
         | G = Z                                 P = 2*(U*E + V*F - W*G)
         | U = U - P*E                            V = V - P*F
         | W = W + P*G                                 I = -I  : rem if we
         | are reflecting then check other sphere now
         | GOTO 50 : rem end of do-while loop, we have our escape from the
         | scene geometry                       rem color according to
         | y-component of direction vector to get sky gradient
         | rem if instead y component is negative, make a checkerboard of
         | pieces of sky         60       IF V < 0 then P = (Y + 2)/V : V
         | = -V*((INT(X - U*P) + INT(Z - W*P) AND 1)/2 + .3) + .2
         | 70       GCOL 0, 3 - (48*SQR V + ?(PAGE + 5 + M MOD 4 + N MOD
         | 4*4)/3) DIV 16         80       PLOT 69, 4*M, 4*N
         | NEXT,
         | 
         | i'm interested to hear what i got wrong
         | 
         | see also https://news.ycombinator.com/item?id=39026517 and
         | https://news.ycombinator.com/item?id=39026495
        
           | Sharlin wrote:
           | > rem if instead y component is negative, make a checkerboard
           | of pieces of sky
           | 
           | I'd maybe phrase this a bit differently - in both cases it
           | uses the v coordinate to calculate the shade but whereas the
           | sky is a simple gradient, the plane has a "ground fog" type
           | effect that can be adjusted with those +.3 and +.2 magic
           | constants.
        
             | kragen wrote:
             | i was trying to figure out how the ground fog happens, and
             | i still don't
        
               | Sharlin wrote:
               | The value of v corresponds to how far away the
               | intersection point is. At the horizon v=0 so the color
               | value is just constant 0.2. The closer to the camera, the
               | more the checker value (0 or 1) contributes. The exact
               | values are probably chosen experimentally, but note the
               | important property that -v*(checker/2+0.3)+0.2 is always
               | between 0 and 1. Note also that 1 means black and 0 white
               | here, as ultimately the value is negated at line 70.
        
       | kragen wrote:
       | a thing that puzzles me is how https://bbcmic.ro/?t=9ctpk is only
       | 4x faster in emulation (about 30 seconds per scan line and so on
       | the order of 2 hours for the whole image)
       | 
       | i'm running this on a ryzen 5 3500u at 2400 megahertz. the acorn
       | electron which supposedly takes 8 hours and 40 minutes is a 1
       | megahertz 6502 when running from ram, roughly, 262144
       | instructions per second. at 2 ipc one core of the ryzen should be
       | about 4800 mips. if the emulation slowdown is 10x (10 host
       | instructions to emulate one guest instruction), which is typical
       | for naive interpretation, it should be about 1000x faster on this
       | laptop as on the original hardware
       | 
       | (possibly the basic is in rom and so it's closer to 524288
       | instructions per second)
       | 
       | emulation through qemu-like dynamic code generation should cut
       | the 10x slowdown to 3x, so it should be 3000x faster than the
       | original hardware
       | 
       | where did that factor of 1000 go? not the javascript engine,
       | surely?
       | 
       | incidentally there is a rocket-ship button underneath the output
       | screen image labeled 'send to beebjit' which runs the program in
       | about a second
        
         | flohofwoe wrote:
         | If the emulator is entirely cycle correct for the entire system
         | (including the video system, which is usually the most
         | expensive to emulate, not the CPU), then the speedup on modern
         | computers will be much less than just comparing the clock
         | frequencies.
         | 
         | Consider that each chip of a home computer system (CPU, 1..2
         | IO/timer chips, audio, video, ...) needs to do 'emulation work'
         | for each 1 or 2 MHz clock cycle which can add up to quite a
         | number of host computer instructions (dozens to hundreds).
         | 
         | If each chip emulator just takes around 10..20 host system
         | clock cycles to emulate one emulator clock cycle, then you are
         | already looking at around 100 host system clock cycles per
         | emulated clock cycle for the entire home computer (in reality
         | it's probably worse).
         | 
         | Such 'vertically sliced' emulation code is also full of
         | conditional branches which put a heavy burden on the host CPU
         | branch predictor.
         | 
         | ...and that's how a theoretical 1000x speedup suddenly becomes
         | a 10x speedup, it's not the CPU emulation (this is usually
         | cheap) but the rest of the emulated system which can be
         | expensive.
         | 
         | Different emulators use all sorts of tricks and shortcuts, but
         | usually with tradeoffs like less precise emulation, or less
         | 'compartmentalized' code.
         | 
         | PS: case in point this is just the top-level per-cycle function
         | in my C64 emulator, which in turn calls per-cycle-functions in
         | each of the chip emulators (which may each be just as much
         | code):
         | 
         | https://github.com/floooh/chips/blob/9a7f6d659b5d1bbf72bc8d0...
         | 
         | I'm trying to strike a balance between 'purity' (e.g. an entire
         | emulated system can be 'plugged together' by wiring together
         | chip emulators, just like one would build a real 8-bit computer
         | on a breadboard), while still being fast enough to comfortably
         | run in realtime even in browsers
         | (https://floooh.github.io/tiny8bit/c64.html).
         | 
         | It's definitely possible to implement a faster cycle-correct
         | C64 emulator with a less 'pure' architecture, but it's quite
         | hard to do meaningful optimizations while keeping that same
         | 'virtual breadboard' architecture.
         | 
         | ...considering all the code that runs per cycle it's actually
         | amazing how fast modern CPUs are :)
        
           | kragen wrote:
           | that's an excellent point, i hadn't considered that they
           | might be running this program in basic on a cycle-correct
           | emulator of the bbc micro. thank you very much for explaining
        
         | londons_explore wrote:
         | The rocket ship icon seems to have disappeared. Maybe I used it
         | too much, or maybe the site owner removed it? (was it
         | serverside and overloaded?)
        
           | kragen wrote:
           | i had that problem when i visited via a saved url in
           | tinyurl.com
        
         | tom_ wrote:
         | The Acorn Electron is a very different design, and the BBC
         | Micro runs measurably faster (particularly in this case due to
         | the high res graphics). The bbcmic.ro emulation will almost
         | certainly be pretty much exactly the same speed as real BBC
         | Micro.
         | 
         | The Firefox profiler suggests it's spending most of its time
         | doing nothing, waiting for the next
         | Window.requestAnimationFrame.
        
       | londons_explore wrote:
       | In 398 characters, I can give it error diffusion dithering
       | capabilities, which makes the code smaller and the output quality
       | a bit better:
       | 
       | https://bbcmic.ro/#%7B%22v%22%3A1%2C%22program%22%3A%22MODE1...
       | 
       | It would look even better with bidirectional error diffusion, but
       | that requires reading back memory which I don't know how to do.
        
         | logicallee wrote:
         | Thanks for sharing. I asked ChatGPT to analyze your code, while
         | telling it what it does. How well do you think it did? Did it
         | make any mistakes?
         | 
         | https://chat.openai.com/share/6e754fa1-531d-432e-a767-8c8132...
        
           | bmsleight_ wrote:
           | Wow - I never thought of using ChatGPT for analysing code. It
           | reads well - but does the author agree ?
        
           | CamperBob2 wrote:
           | I just asked it to reformat the code for better readability,
           | and it guessed it was a ray tracer. I for one welcome our new
           | robot^H^H^H^H^Hparrot overlords.
           | 
           | Also tried having it generate a JavaScript version, but the
           | output doesn't look right: https://chat.openai.com/share/5d76
           | e32b-81a1-4b4a-a808-7682a8... I'd be inclined to suspect the
           | lack of line numbers, as a first guess towards the problem.
        
         | aardvark179 wrote:
         | It makes the output better in some ways, but since it's just 1
         | dimensional and there isn't really any noise in the image it
         | does produce a very stripy image on the portion I let it run
         | for.
        
         | qingcharles wrote:
         | I wish there was an option to run at greater than realtime!
        
       | anovikov wrote:
       | Just curious how fast could the same thing be if someone who
       | knows the hardware really well, wrote it in assembly.
        
         | aardvark179 wrote:
         | There's a bunch of things you can do. Replacing all the square
         | roots with a lookup table might be doable, and you could make
         | the final drawing faster by not using the generic plot
         | routines.
         | 
         | You don't have a lot of memory to play with though as you only
         | have 32k to start with and the frame buffer will use 20k of
         | that, so I'm not sure how good a look up table you could make
         | for SQR.
        
       | Sharlin wrote:
       | In a subthread, I got nerdsniped to figure out how this works,
       | and for fun transcribed it ~faithfully to Rust - this one just
       | outputs ASCII art instead of plotting pixels. You'll want to zoom
       | out your browser as far as you can:
       | 
       | https://www.rustexplorer.com/b#Zm4gbWFpbigpIHsKICAgIC8vIHJvd...
        
       | maxglute wrote:
       | The dither effect looks really neat on ray trace, then I
       | remembered thats just your average news paper photo. Still would
       | be cool to see it in a game.
        
         | mungoman2 wrote:
         | Check out the game Return of the Obra Dinn for exactly this.
        
       | mixedmath wrote:
       | This is interesting and serves as a potential "hook" for getting
       | people interested in coding. I understand the BASIC appeal!
       | 
       | In something like python, an analogous thing is turtle graphics.
       | Is there something more similar (but still very basic) in python
       | for something like this? A mini graphical language?
        
         | simonklitj wrote:
         | Hmm, something like graphics.py maybe?
         | https://mcsp.wartburg.edu/zelle/python/ppics2/code/graphics....
        
           | mixedmath wrote:
           | Yeah, something like that. It's hard to be both expressive
           | and useful.
           | 
           | This is a small set of primitives around a tkinter canvas. I
           | suppose one could expose a small set of primitives around a
           | pygame too.
           | 
           | I have to think hard about what exactly I would want.
        
       | miohtama wrote:
       | Nice! Is there a commented source code?
        
       ___________________________________________________________________
       (page generated 2024-01-17 23:02 UTC)