[HN Gopher] A toy RTOS inside Super Mario Bros. using emulator s...
___________________________________________________________________
A toy RTOS inside Super Mario Bros. using emulator save states
This started as a throwaway metaphor in a blog post, but is now
fully runnable: a toy RTOS with preemptive multitasking inside of
Super Mario Bros. on the NES. Essentially, this is: - A
rudimentary preemptive RTOS - Using an unmodified NES emulator
(FCEUX) as the CPU - "Unmodified" depending on
how you define terms - With emulator save states as the
thread contexts - With support for (very basic) mutexes, interrupt
masking, and condition variables - Demonstrated using Super Mario
Bros. 1-1 with sections of the map dedicated to various
synchronization primitives There are many simplifications and
shortcuts taken (doesn't even have task priorities), and it doesn't
map 1:1 to true multithreading (e.g., emulator save states
represent the state of the entire machine including RAM, whereas
thread contexts represent a much more minimal slice), but I think
it's A) pretty interesting and B) a unique visceral explanation of
threads.
Author : notorious_pgb
Score : 242 points
Date : 2025-05-28 20:15 UTC (1 days ago)
(HTM) web link (prettygoodblog.com)
(TXT) w3m dump (prettygoodblog.com)
| ignormies wrote:
| This is a super cool visual demonstration of RTOS/scheduling! I
| love the region-based critical sections!
|
| I took a real-time operating systems course in university as an
| elective. One of the hardest courses I took the whole four years,
| but also one of the most interesting. Had a great professor, who
| gave really demanding, but very instructive, project-based
| assignments.
|
| I need to find a toy project to play around with this domain
| again.
| notorious_pgb wrote:
| Thank you!
|
| I'm curious how effective you feel this specific example
| might've been if it were delivered during your course. I
| _suspect_ I 've stumbled across a really helpful teaching tool,
| but having not gone to university, I don't actually know how
| this stuff is being taught :v
| throwanem wrote:
| Nor I, but _I 've_ learned from the metaphor, for what that's
| worth. And the demystification of primitives has considerable
| value these days.
| notorious_pgb wrote:
| > Nor I, but I've learned from the metaphor, for what
| that's worth.
|
| It's the sickest possible outcome for me, so: worth a lot!
| Refreeze5224 wrote:
| This is a fantastic way of demonstrating threading and
| scheduling, well done!
| bitwize wrote:
| "...But first we need to talk about parallel universes."
| dsfdsfsfew wrote:
| Kind of pushing the definition of RTOS here, I was expecting
| something like using ACE to run a concurrent program, still an
| interesting and informative post.
| notorious_pgb wrote:
| You're goddamn right it's pushing it.
|
| I struggled with how to convey this; the ultimate goal is to
| _viscerally_ demonstrate _what a thread IS_ , so I'm
| comfortable with it being... something strange... but I do want
| it to resemble actual concurrent systems to a useful degree.
|
| Like, the thread scheduler code is userland code -- similar to,
| say, FreeRTOS -- but the "threads" themselves aren't _exactly_
| threads in an exact sense, since their context packages are
| entire save states for a console, not just its processor.
|
| Also, the game code (the true userland?) has no ability to
| access the threading API whatsoever, which really harms the
| analogy.
|
| So I went with RTOS because it's a preemptive scheduler with
| synchronous reschedule-on-block; where the scheduler is
| implemented in the same codebase as the code using it. But to
| be honest, nothing I know of fits.
| loas wrote:
| I really enjoyed this, thank you.
| shakna wrote:
| If you're going to make this your DooM thing, clearly the next
| thing to do is grab a DooM engine and get to work. (PsyDoom has
| Lua support even).
|
| But congrats. This is worth being included in a uni course.
| notorious_pgb wrote:
| It'd be really interesting to push this to that level -- the
| big challenge likely being that DooM's state (including video
| memory) is certainly much larger and expensive to swap than an
| NES save state.
|
| One would have to get really clever...
| noone_youknow wrote:
| ... and introduce something like virtual memory?
| anthk wrote:
| Forth does that too.
| https://www.bradrodriguez.com/papers/mtasking.html
| gwbas1c wrote:
| Last night I read an article about eliminating lag in emulators.
| It's done with a similar concept. Basically, for each frame, the
| emulator calculates the state for different button combinations.
| Then, based on the button you actually push, the state shown
| moves to the precalculated one.
| jchw wrote:
| Not 100% familiar with exactly what this is but I am familiar
| with run-ahead, which is basically also the same idea as GGPO,
| but for eliminating lag:
|
| - Run the emulator one frame normally, using real polled input.
| Somehow snapshot the state.
|
| - Then, run the emulator n frames (usually just one) with the
| same input. Present the video + audio from the _last_ frame.
|
| - Synchronize. (Some emulators can get very fancy; instead of
| just waiting for vsync, they'll also delay until the end of the
| window minus estimated processing time, to poll input at the
| last possible moment.)
|
| - Roll back to the saved snapshot. (I believe you can also
| optimize if you know the inputs really didn't change, avoiding
| a lot of rollbacks at the cost of less predictable frame
| times.)
|
| The main reason this is even a good idea is because most games
| will have some of their own processing latency by design, so
| jumping a frame or two ahead usually doesn't have any
| noticeable side-effects. This is a pretty cool idea since
| obviously modern computers with LCD screens have a lot more
| latency basically everywhere versus older simpler machines
| connected to CRTs.
|
| Unfortunately, this sort of approach only works when your
| emulator's state is small enough and fast enough to restore.
|
| I actually have been dying to experiment with designing an
| emulator to have fast incremental snapshots from the ground up
| to see if you could manage to make this feasible for more
| modern consoles. You could, for example, track dirty memory
| pages with userfaultfd/MEM_WRITE_WATCH, and design structures
| like JIT caches to be able to handle rewinding without having
| to drop the entire cache. I'm actually not sure that all
| emulators clear their caches upon loading state, but you know,
| more generally, I would like to know how fast and small you
| could get save states to be if you were designing for that from
| the ground up.
| achierius wrote:
| What do you mean by JIT cache in this case -- like an inline
| data cache, or like a cache for the jitted binary code?
| jchw wrote:
| The latter, though I'm just using it as an example really.
| I think for most emulator core designs today you wind up
| dumping a lot of ephemeral state when you rewind or load
| state (and sometimes even saving state has some overhead
| due to synchronization.)
| cortesoft wrote:
| i think this is kind of the inverse of how some emulators that
| add remote play deal with network lag... they will predict what
| control the opponent will give for the current frame and
| display it, but then when they actually receive the real
| opponent control some number of frames later, they will go back
| to that frame and re-calculate based on knowing the actual
| control input for that frame, and then catch up to the current
| frame. That way the game isn't waiting for remote inputs to
| refresh the screen, but can then reconcile once all information
| is known.
| potato3732842 wrote:
| Sounds a lot like speculative execution for hardware.
| sparky_ wrote:
| Was just thinking the same, this is effectively branch
| prediction/precalculation.
| cevn wrote:
| Umm, that's brilliant. Is this technique employed in all modern
| emulators or a recent thing?
| notorious_pgb wrote:
| Interesting; which emulator(s) was this for? I have to imagine
| this strategy is mostly effective below a certain total
| complexity level -- however you want to define that -- and the
| more modern you get, the more you're lucky to even render a
| frame at all.
| liamwire wrote:
| Brilliant visual demonstration, really helps build an intuition
| for the concepts at play.
| Sohcahtoa82 wrote:
| That's pretty neat.
|
| The metaphor I usually go for when it comes to for threads is
| having several widgets that need to be assembled. To begin
| working on Widget B, you have to pause the work on Widget A.
| Adding SMP is like adding a second person to help assemble
| widgets, but you're still using the same toolbox, so if you both
| need the same screwdriver, there's no performance benefit to
| having a second person. Multicore is having multiple workbenches
| each with their own toolbox so they can operate completely in
| parallel.
|
| A mutex is like having a specialized tool that can only be used
| by one widget assembly at a time. Like, each workbench might have
| a full set of screwdrivers, hammers, wrenches, sockets, etc., but
| you only have 1 welder.
|
| A semaphore is a tool that can be used by a limited number of
| widgets, like an oven that can fit up to 4 widgets.
| eru wrote:
| As a special challenge: explain (Software) Transactional
| Memory, as eg implemented in Haskell, in your metaphor.
|
| STM is pretty similar to how some databases implement
| transactions.
| ImPostingOnHN wrote:
| _> Adding SMP is like adding a second person to help assemble
| widgets, but you 're still using the same toolbox, so if you
| both need the same screwdriver, there's no performance benefit
| to having a second person._
|
| This sounds like hyperthreading - or, maybe I'm missing the
| screwdriver in the metaphor :)
| ranger207 wrote:
| Hyperthreading is Intel's implementation of SMP
| Sohcahtoa82 wrote:
| Funny thing is, when I was first typing that comment, I
| started with "Hyperthreading" and deleted it in favor of
| SMP to be vendor agnostic.
|
| EDIT: And it should have been SMT anyways!
| _ihaque wrote:
| I think you mean SMT (simultaneous multithreading), as
| opposed to SMP (symmetric multiprocessing)?
| Sohcahtoa82 wrote:
| Ah hell, that was MY mistake. It should be SMT in my
| original comment and it's way too late to edit it now.
| notorious_pgb wrote:
| I think this is a pretty damn good metaphor for the
| _introduction_ of these concepts, especially in a context where
| the learner(s) might need to imminently apply this knowledge.
|
| What drove me to write this post (as someone who didn't attend
| college where I might've learned exactly what I'm about to say
| -- big grain of salt) is that it took me being forced by
| circumstance to become a firmware engineer for a time to
| actually understand not just _how to multithread well_ , but
| _what the hell this all even was_.
|
| And I guess I saw one too many Reddit comments (mistake #1)
| which downplayed the idea that there's any benefit to knowing
| the answer to such things.
|
| That all said, this blog post / metaphor is definitely aimed at
| someone who's already been using threads effectively for some
| time -- I don't think it'd fare very well as a first
| introduction.
| iwontberude wrote:
| SMP is a second processor, even more redundant than a second
| core. Its own cache, memory, Northbridge access, etc.
| Sohcahtoa82 wrote:
| Yeah, I messed up SMT vs SMP. Can't edit it now, though.
| notorious_pgb wrote:
| Ah well -- the article itself has its own fair share of
| accuracy issues -- and I learned something from your
| conversation, which wouldn't have happened without said
| mistake.
| aftbit wrote:
| https://web.archive.org/web/20250528203416/https://prettygoo...
|
| https://archive.ph/msHkO
| hei-lima wrote:
| Wow! So cool!
| noone_youknow wrote:
| This is simply amazing work, I take my hat off to you. And not
| just the threading in an emulator (which is a genius concept all
| by itself) but for the article itself - I have spent years trying
| to find the right way to articulate exactly your points in hopes
| of engaging people to learn more about the systems they're
| building on top of.
|
| From now on, I'll simply send out a link to this. Thank you!
| notorious_pgb wrote:
| My person!
|
| I worried that I'd come across as a high-horse dickhead in the
| mystification section, when that's the fundamental opposite of
| my intention and attitude here. I'm glad to hear that it
| resonated with another.
|
| Although I guess we could both be high-horse dickheads.
| upghost wrote:
| Not only that but now, thanks to you, we can all be high-
| horse dickheads _concurrently_.
| peterkelly wrote:
| The mystification section was my favorite part - you make
| some excellent points there.
| noone_youknow wrote:
| IMO the mystification section is right on the money, I
| couldn't have said it better myself (and believe me, I've
| tried).
| CGamesPlay wrote:
| Would be neat if the subworld memory was shared between the save
| states, to really drive home the mutex concept.
| notorious_pgb wrote:
| This is one of the core weaknesses of the analogy that I'm
| trying to figure out how to resolve: there's no shared memory,
| unlike actual threads.
|
| I think the solution requires that the _game code_ is what 's
| executing threading syscalls, not the Lua around the game code.
|
| It'd definitely be a super sick evolution to add true shared
| in-game memory.
| kelnos wrote:
| I guess in a way the three Mario threads are more like
| processes? And they're using OS-level or shared-memory
| mutexes and condvars that can be shared between processes?
| notorious_pgb wrote:
| I see what you're getting at -- though then it raises the
| question: what would it mean for those processes to have
| multiple threads within them?
|
| The way I've been looking at it, if I want to take this
| further, processes would maybe be _NES ROMs_, and each
| process can have multiple threads. Swapping between
| processes would inherently be more expensive than between
| threads within a process, which I think would be a really
| instructive aspect. Also, the whole idea of "sharing
| memory" between _entirely different games_ is obviously
| ridiculous, which could be yet another teaching tool.
|
| Then if we want to expand it even _further_ , maybe
| multicore support would look like multiple emulator
| instances, connected via sockets to support cross-core IPC.
| You'd probably want to institute an "all the threads of a
| process have to be on the same core" rule (so they can
| locally share primitives in the same Lua script), which
| would be a great way to demonstrate the principle of
| building systems which adapt to their realistic
| constraints, as opposed to building systems which exactly
| model a textbook.
|
| I've gotten ahead of myself, though.
| Ericson2314 wrote:
| Easier to leave out the parallelism and just look at
| concurrency.
|
| For a game like Mario, you can split up the memory and
| decide what you want to be shared and what do you want to
| be per-thread. E.g. starting small you can rig up just a
| few fixed variables like scores or whatever to be
| persisted from one game to the next. trying to push that
| further without causing endless corruption should be fun
| :)
| leni536 wrote:
| You have critical section in the bonus section behind the
| pipe. Maybe shared state protected by that critical section
| could be your shared memory, like the state of the coins
| there.
| Cloudef wrote:
| Isn't this more like coroutines / green threads? I guess it
| doesn't matter as they are essentially same concept apart from
| cooperative scheduling vs. preemptive scheduling.
| notorious_pgb wrote:
| I suppose the _actual thing it is_ is closer to what you 're
| describing, but the _thing that it makes viscerally
| understandable_ is threads itself -- I hope.
| abyesilyurt wrote:
| Tsoding recently built coroutines for C from scratch. The concept
| of coroutines is related to threads, so I bet you'd like the
| video if you liked the article.
|
| https://youtu.be/sYSP_elDdZw
| Ericson2314 wrote:
| Sorry to be an pedant.
|
| From your previous post
|
| > The first thing to understand about threads is that processors
| do not support them.
|
| Yes, but I would say by changing the emulator (augmenting the
| emulator software with your other software, sure) in the
| _outside_ world, not doing something in the _inside_ world, you
| have are implementing something closer akin to hardware
| concurrency support or processes.
|
| > It doesn't map 1:1 to true multithreading (e.g., emulator save
| states represent the state of the entire machine including RAM,
| whereas thread contexts represent a much more minimal slice
|
| Right, but I think if we're going to make a threads vs process
| distinction (vs concurrency in general) whether we have only a
| small opt-in set of communication channels / synchronization
| primitives (the mutex and barrier), or a wide open one (shared
| memory, be careful!) is the operative distinction.
| mort96 wrote:
| This is probably a stupid question, but why real-time? Isn't it
| enough to "just" make an OS in SMB, doesn't adding real-time
| guarantees make it even more difficult, by restricting what
| allocator and scheduling algorithms you can use?
|
| In any case, this looks like a really cool project.
___________________________________________________________________
(page generated 2025-05-29 23:01 UTC)