[HN Gopher] What's the big deal about Deterministic Simulation T...
___________________________________________________________________
What's the big deal about Deterministic Simulation Testing?
Author : todsacerdoti
Score : 70 points
Date : 2024-08-20 11:16 UTC (11 hours ago)
(HTM) web link (notes.eatonphil.com)
(TXT) w3m dump (notes.eatonphil.com)
| gguergabo wrote:
| Thanks for this, Phil! Blog posts like this one that break down
| complex topics into digestible pieces are a big help for the
| space and are some of my favorites.
|
| Antithesis employee here. Happy to jump in and answer any burning
| questions people might have about Deterministic Simulation
| Testing (DST).
| msarrel wrote:
| Good stuff, thanks
| RsmFz wrote:
| We have that here at Firezone. To deal with I/O we don't use
| Tokio inside the test boundary at all, just futures. So no I/O,
| no sleeping, etc. Thomas explained it here:
| https://firezone-git-docs-blogsans-io-firezone.vercel.app/bl...
|
| I haven't dealt with it directly on Firezone but I wrote one or
| two games this way for game jams years ago, and I keep wishing it
| would catch on. It was harder with the games because floating-
| point math doesn't like to be deterministic across platforms.
| rhplus wrote:
| For anyone writing services in C# there's a project from MSR
| called Coyote that does similar deterministic simulation testing
| by systematically testing interleavings of async code.
|
| https://microsoft.github.io/coyote/
| RandomThoughts3 wrote:
| Considering you actually have to design around DST, I'm still
| largely unconvinced that the time and effort spent setting up DST
| and fuzzing, hoping it finds your bugs, wouldn't be better spent
| actually proving that your design is bug-free using tools like
| TLA+ before intelligently using static analysis and formal proof
| during implementation.
|
| I believe DST to be the wrong solution to the actual problem. Its
| main advantage is that it doesn't require people used to designing
| distributed systems to acquire a new skill set, and it doesn't
| challenge the status quo too much (after all, it's pretty much
| just fuzzing on paths you have chosen to make fuzzable).
| hwayne wrote:
| One of the big problems with using TLA+ is that it verifies
| your design, not your code. People are looking for ways to link
| the two. Formal proof works but is too expensive for most
| businesses.
|
| The most promising approach I've seen so far is... DST! First
| we simulate a system, we generate a bunch of timelines, then we
| see if those timelines are valid behaviors in the TLA+ design.
| I've heard of a few success stories and it's definitely cheaper
| than formal proof!
| joncrocks wrote:
| I don't think you have to have systems in the same thread/process
| if you bake in an API for controlling time and ingress/egress for
| each component (depending on what you're trying to test).
|
| You can have the communication channels between components under
| the control of the simulation environment rather than have them
| happen in their 'normal' manner. This allows you to inject
| latency between components, 'fiddle' with the inputs/outputs as
| suggested as well as record messaging (assuming you're working
| with systems that exchange messages/events) with appropriate
| event times.
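|
| A minimal sketch of simulation-controlled channels, assuming
| message-passing components. SimNetwork, send, run, and the
| ping/pong handlers are hypothetical names for illustration, not
| an API from any real framework. The harness injects latency from
| a seeded PRNG and records every delivery with its event time:

```python
import heapq
import random


class SimNetwork:
    """Hypothetical channel owned by the simulation, not the OS."""

    def __init__(self, seed, max_latency=5):
        self.rng = random.Random(seed)  # the only source of randomness
        self.max_latency = max_latency
        self.now = 0                    # simulated event time
        self.queue = []                 # heap of (deliver_at, seq, dst, payload)
        self.seq = 0                    # tie-breaker keeps ordering deterministic
        self.log = []                   # recorded (event_time, dst, payload)

    def send(self, dst, payload):
        # Inject a pseudo-random latency instead of using a real socket.
        latency = self.rng.randint(1, self.max_latency)
        heapq.heappush(self.queue, (self.now + latency, self.seq, dst, payload))
        self.seq += 1

    def run(self, handlers):
        # Deliver messages in event-time order until the system is quiet.
        while self.queue:
            deliver_at, _, dst, payload = heapq.heappop(self.queue)
            self.now = deliver_at
            self.log.append((deliver_at, dst, payload))
            handlers[dst](self, payload)
        return self.log


def simulate(seed):
    net = SimNetwork(seed)

    def ping(net, n):
        if n < 3:
            net.send("pong", n)

    def pong(net, n):
        net.send("ping", n + 1)

    net.send("ping", 0)
    return net.run({"ping": ping, "pong": pong})
```

| Calling simulate with the same seed twice should yield the same
| recorded log, which is what makes a failing run replayable.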
|
| Another important point around clocks is that you'll want to
| include a scheduling API in your clock components to be able to
| schedule events for themselves in the future. If you start moving
| to event-time then being able to fast-forward in time is another
| advantage.
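|
| A rough sketch of such a clock, under the assumption that
| components only ever see time through it. SimClock, call_later,
| advance, and Heartbeat are hypothetical names chosen for this
| example. Fast-forwarding simply jumps from one scheduled event to
| the next instead of sleeping:

```python
import heapq


class SimClock:
    """Hypothetical simulated clock with a scheduling API."""

    def __init__(self):
        self._now = 0
        self._timers = []  # heap of (due_time, seq, callback)
        self._seq = 0      # tie-breaker so callbacks are never compared

    def now(self):
        return self._now

    def call_later(self, delay, callback):
        # Components schedule future events for themselves through the clock.
        heapq.heappush(self._timers, (self._now + delay, self._seq, callback))
        self._seq += 1

    def advance(self, duration):
        # Fast-forward: run every timer due within the window, in order.
        deadline = self._now + duration
        while self._timers and self._timers[0][0] <= deadline:
            due, _, callback = heapq.heappop(self._timers)
            self._now = due
            callback()
        self._now = deadline


class Heartbeat:
    """Example component that reschedules itself every `interval` ticks."""

    def __init__(self, clock, interval):
        self.clock = clock
        self.interval = interval
        self.beats = []
        clock.call_later(interval, self._tick)

    def _tick(self):
        self.beats.append(self.clock.now())
        self.clock.call_later(self.interval, self._tick)


clock = SimClock()
hb = Heartbeat(clock, interval=10)
clock.advance(3600)  # an hour of simulated time, no real waiting
```

| Here an hour of heartbeats runs without any real sleeping, and
| hb.beats holds the simulated timestamps of each tick.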
|
| Overall it's worth considering the class of issue you're trying
| to detect with this approach. It's great to be able to run things
| in event time and debug ad nauseam, but you're not going to catch
| race conditions without the smarts mentioned around native
| scheduling.
|
| There are some parallels as well between this style of testing
| and production replication, so that's good to have at the back of
| your mind if you're looking to collect production
| telemetry/ingress + egress data. If you build a good system for
| this type of testing, producing reproducible code will be part of
| your development process and you're more likely to be able to
| produce tooling for investigation/reproduction of production
| issues.
|
| (The context in which I do work that's kind of in this area is
| robust backtesting of trading systems)
| 10000truths wrote:
| The reason for using a single underlying thread/process is to
| prevent the OS scheduler from interfering with deterministic
| execution. You can't control how and when the OS scheduler
| kicks in, nor can you perfectly reproduce the clock
| drift/jitter between multiple cores. If the program under test
| spawns threads, then you'll have to emulate the execution of
| those threads by writing your own scheduler whose time slicing
| and scheduling policies are done deterministically.
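|
| A small sketch of that idea, assuming the "threads" can be
| modelled as Python generators that yield at every point where
| preemption may happen. worker and run are hypothetical names, and
| the random-choice policy is just one possible deterministic
| scheduling policy, driven entirely by a seed:

```python
import random


def worker(name, shared, steps):
    """Emulated thread: each `yield` is a possible preemption point."""
    for _ in range(steps):
        value = shared["counter"]       # read
        yield                           # the scheduler may switch tasks here
        shared["counter"] = value + 1   # write back (racy on purpose)
        shared["trace"].append(name)
        yield


def run(seed, n_workers=3, steps=2):
    """Run all workers on one real thread under a seeded scheduler."""
    rng = random.Random(seed)           # the seed fully determines the schedule
    shared = {"counter": 0, "trace": []}
    runnable = [worker(f"t{i}", shared, steps) for i in range(n_workers)]
    while runnable:
        task = rng.choice(runnable)     # deterministic scheduling decision
        try:
            next(task)                  # execute until the next yield
        except StopIteration:
            runnable.remove(task)
    return shared["counter"], shared["trace"]
```

| Running run with many different seeds explores different
| interleavings and may surface the lost update in the counter;
| rerunning with the seed that failed replays the same schedule.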
| Veserv wrote:
| You can control scheduling; that is how many record-replay
| based time-traveling debuggers do it.
|
| Also, scheduling is independent of deterministic execution
| unless you are doing inherently non-deterministic things like
| multithreaded shared memory accesses, which you cannot
| simulate faithfully anyway. The only thing that matters in a
| deterministic execution model is runs of deterministic
| execution interrupted with non-deterministic events injected
| at precise points in the execution trace.
|
| When serializing onto a single thread you already need to
| define some sort of correspondence between "simulated
| scheduler state" and the number of instructions to execute, as
| you are already giving up on the actual scheduler (unless you
| do not care about correspondence to the actual schedule
| configuration). You just do that, but you get to execute with
| all of your cores until you reach the injection point (which
| is how replay systems can work already). Now you can execute
| in parallel (multiprocessing only though, no multithreading)
| and use blocking I/O.
| a_t48 wrote:
| This is actually an area I'm solving right now in the robotics
| space - you've got it exactly right. You need to restrict calls
| that query the system time, have deep control over the message
| passing layer, and use a custom scheduler for executing message
| handlers.
|
| Edit: trading systems isn't an area I considered for this work.
| A lot of parallels, though.
| skybrian wrote:
| JavaScript is single-threaded, but I/O (events) introduces
| nondeterminism. I'm wondering if there are tools that let you
| control how events get scheduled when testing async code?
___________________________________________________________________
(page generated 2024-08-20 23:01 UTC)