[HN Gopher] What's the big deal about Deterministic Simulation T...
       ___________________________________________________________________
        
       What's the big deal about Deterministic Simulation Testing?
        
       Author : todsacerdoti
       Score  : 70 points
       Date   : 2024-08-20 11:16 UTC (11 hours ago)
        
 (HTM) web link (notes.eatonphil.com)
 (TXT) w3m dump (notes.eatonphil.com)
        
       | gguergabo wrote:
       | Thanks for this, Phil! Blog posts like this one that break down
       | complex topics into digestible pieces are a big help for the
       | space and are some of my favorites.
       | 
       | Antithesis employee here. Happy to jump in and answer any burning
       | questions people might have about Deterministic Simulation
       | Testing (DST).
        
       | msarrel wrote:
       | Good stuff, thanks
        
       | RsmFz wrote:
       | We have that here at Firezone. To deal with I/O we don't use
       | Tokio inside the test boundary at all, just futures. So no I/O,
       | no sleeping, etc. Thomas explained it here https://firezone-git-
       | docs-blogsans-io-firezone.vercel.app/bl...
       | 
       | I haven't dealt with it directly on Firezone but I wrote one or
       | two games this way for game jams years ago, and I keep wishing it
       | would catch on. It was harder with the games because floating-
       | point math doesn't like to be deterministic across platforms.
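
The sans-I/O style described above can be sketched in Python (names and structure are hypothetical, not Firezone's actual code): the state machine never touches sockets or clocks, and a single seeded RNG drives the harness, so runs replay byte-for-byte.

```python
import random

class Node:
    """Hypothetical sans-I/O state machine: no sockets, no clock reads.
    All inputs (messages, current time) are handed to it by the caller."""
    def __init__(self):
        self.log = []

    def handle_message(self, now, msg):
        # Pure transition: output depends only on explicit inputs.
        self.log.append((now, msg))

def run_simulation(seed):
    rng = random.Random(seed)          # one seeded RNG drives everything
    node = Node()
    clock = 0.0
    for i in range(10):
        clock += rng.uniform(0.0, 5.0) # simulated network latency
        node.handle_message(clock, f"msg-{i}")
    return node.log

# Same seed -> identical run, so any failing execution replays exactly.
assert run_simulation(42) == run_simulation(42)
```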
        
       | rhplus wrote:
       | For anyone writing services in C# there's a project from MSR
       | called Coyote that does similar deterministic simulation testing
       | by systematically testing interleavings of async code.
       | 
       | https://microsoft.github.io/coyote/
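
Coyote's idea of systematically exploring interleavings can be illustrated with a toy Python sketch (not Coyote itself, and greatly simplified): enumerate every merge order of two tasks' atomic steps and check an invariant under each one.

```python
from itertools import combinations

def interleavings(a_steps, b_steps):
    """Yield every interleaving of two step sequences, preserving
    each task's internal order."""
    n, m = len(a_steps), len(b_steps)
    for a_slots in combinations(range(n + m), n):
        order, ai, bi = [], 0, 0
        for i in range(n + m):
            if i in a_slots:
                order.append(a_steps[ai]); ai += 1
            else:
                order.append(b_steps[bi]); bi += 1
        yield order

def check_increment_race():
    """Toy racy counter: read and write are separate steps, as in
    async code. Count interleavings that lose an update."""
    failures = 0
    for order in interleavings(["a_read", "a_write"], ["b_read", "b_write"]):
        state = {"x": 0, "a": None, "b": None}
        for step in order:
            who = step[0]
            if step.endswith("read"):
                state[who] = state["x"]
            else:
                state["x"] = state[who] + 1
        if state["x"] != 2:            # lost update detected
            failures += 1
    return failures

# The systematic search finds lost-update schedules that a single
# ordinary run would likely never hit.
assert check_increment_race() > 0
```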
        
       | RandomThoughts3 wrote:
        | Considering you actually have to design around DST, I'm still
        | wildly unconvinced that the time and effort spent setting up
        | DST and fuzzing in the hope that it finds your bugs wouldn't be
        | better spent actually proving that your design is bug-free
        | using tools like TLA+, before intelligently using static
        | analysis and formal proof during implementation.
       | 
        | I believe DST to be the wrong solution to the actual problem.
        | Its main advantage is that it doesn't require people used to
        | designing distributed systems to actually acquire a new skill
        | set, and it doesn't challenge the status quo too much (after
        | all, it's pretty much just fuzzing on paths you have chosen to
        | make fuzzable).
        
         | hwayne wrote:
         | One of the big problems with using TLA+ is that it verifies
         | your design, not your code. People are looking for ways to link
         | the two. Formal proof works but is too expensive for most
         | businesses.
         | 
          | The most promising approach I've seen so far is... DST! First
          | we simulate a system to generate a bunch of timelines, then
          | we check whether those timelines are valid behaviors of the
          | TLA+ design. I've heard of a few success stories and it's
          | definitely cheaper than formal proof!
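
A minimal Python sketch of this trace-validation idea (a toy lock spec stands in for a real TLA+ next-state relation; all names are invented):

```python
import random

# Toy "design": the set of transitions the spec's next-state relation
# allows, standing in for a TLA+ action definition.
ALLOWED = {("unlocked", "locked"), ("locked", "unlocked")}

def spec_allows(prev, nxt):
    return (prev, nxt) in ALLOWED

def simulate(seed, buggy=False):
    """Produce one timeline of states from a seeded simulation."""
    rng = random.Random(seed)
    state, trace = "unlocked", ["unlocked"]
    for _ in range(20):
        if buggy and rng.random() < 0.2:
            nxt = state                # implementation bug: illegal no-op
        else:
            nxt = "locked" if state == "unlocked" else "unlocked"
        trace.append(nxt)
        state = nxt
    return trace

def trace_valid(trace):
    """Check every observed transition against the design."""
    return all(spec_allows(a, b) for a, b in zip(trace, trace[1:]))

assert trace_valid(simulate(1))        # correct code conforms to the spec
```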
        
       | joncrocks wrote:
        | I don't think you have to run systems in the same thread/process
        | if you bake in an API for controlling time and ingress/egress
        | for each component (depending on what you're trying to test).
       | 
       | You can have the communication channels between components under
       | the control of the simulation environment rather than have them
       | happen in their 'normal' manner. This allows you to inject
       | latency between components, 'fiddle' with the inputs/outputs as
       | suggested as well as record messaging (assuming you're working
       | with systems that exchange messages/events) with appropriate
       | event times.
       | 
       | Another important point around clocks is that you'll want to
       | include a scheduling API in your clock components to be able to
       | schedule events for themselves in the future. If you start moving
       | to event-time then being able to fast-forward in time is another
       | advantage.
       | 
        | Overall it's worth considering the class of issue you're trying
        | to detect with this approach. It's great to be able to run
        | things in event time and debug ad nauseam, but you're not going
        | to catch race conditions without the smarts mentioned around
        | native scheduling.
       | 
        | There are some parallels as well between this style of testing
        | and production replication, so that's good to have at the back
        | of your mind if you're looking to collect production
        | telemetry/ingress + egress data. If you build a good system for
        | this type of testing, producing reproducible runs will be part
        | of your development process, and you're more likely to be able
        | to produce tooling for investigation/reproduction of production
        | issues.
       | 
       | (The context in which I do work that's kind of in this area is
       | robust backtesting of trading systems)
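
A controlled clock with a scheduling API and fast-forward, as described above, might look like this minimal Python sketch (a hypothetical API, not any particular framework's):

```python
import heapq

class SimClock:
    """Hypothetical simulated clock: components read time from it and
    schedule future callbacks; tests fast-forward through event time."""
    def __init__(self):
        self.now = 0.0
        self._pending = []   # heap of (fire_time, seq, callback)
        self._seq = 0        # tie-breaker keeps firing order deterministic

    def schedule(self, delay, callback):
        heapq.heappush(self._pending, (self.now + delay, self._seq, callback))
        self._seq += 1

    def advance(self, dt):
        """Fast-forward: run callbacks in event-time order, instantly."""
        deadline = self.now + dt
        while self._pending and self._pending[0][0] <= deadline:
            when, _, cb = heapq.heappop(self._pending)
            self.now = when
            cb()
        self.now = deadline

clock = SimClock()
fired = []
clock.schedule(10.0, lambda: fired.append("timeout"))
clock.schedule(2.0, lambda: fired.append("heartbeat"))
clock.advance(60.0)          # a minute of event time passes instantly
assert fired == ["heartbeat", "timeout"]
```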
        
         | 10000truths wrote:
         | The reason for using a single underlying thread/process is to
         | prevent the OS scheduler from interfering with deterministic
         | execution. You can't control how and when the OS scheduler
         | kicks in, nor can you perfectly reproduce the clock
         | drift/jitter between multiple cores. If the program under test
         | spawns threads, then you'll have to emulate the execution of
         | those threads by writing your own scheduler whose time slicing
         | and scheduling policies are done deterministically.
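
A deterministic user-space scheduler of the kind described can be sketched in Python, modeling "threads" as generators whose yields are the only preemption points (a toy, not a real thread emulator):

```python
import random

def worker(name, log):
    """A 'thread' as a generator: each yield is a preemption point."""
    for i in range(3):
        log.append(f"{name}:{i}")
        yield

def run(seed):
    """User-space scheduler: a seeded RNG, not the OS, decides which
    task runs at every step, so the whole schedule is reproducible."""
    rng = random.Random(seed)
    log = []
    tasks = [worker("a", log), worker("b", log)]
    while tasks:
        t = rng.choice(tasks)  # the only source of 'nondeterminism'
        try:
            next(t)
        except StopIteration:
            tasks.remove(t)
    return log

# Identical seeds give identical schedules, so any interleaving bug
# found during fuzzing replays exactly.
assert run(7) == run(7)
```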
        
           | Veserv wrote:
           | You can control scheduling; that is how many record-replay
           | based time-traveling debuggers do it.
           | 
            | Also, scheduling is independent of deterministic execution
            | unless you are doing inherently non-deterministic things
            | like multithreaded shared-memory accesses, which you cannot
            | simulate faithfully anyway. The only thing that matters in
            | a deterministic execution model is runs of deterministic
            | execution interrupted by non-deterministic events injected
            | at precise points in the execution trace.
           | 
            | When serializing onto a single thread you already need to
            | define some sort of correspondence between "simulated
            | scheduler state" and the number of instructions to execute,
            | as you are already giving up on the actual scheduler
            | (unless you do not care about correspondence to the actual
            | schedule configuration). You just do that, but you get to
            | execute with all of your cores until you reach the
            | injection point (which is how replay systems can work
            | already). Now you can execute in parallel (multiprocessing
            | only though, no multithreading) and use blocking I/O.
        
         | a_t48 wrote:
          | This is actually an area I'm solving right now in the
          | robotics space - you've got it exactly right. You need to
          | restrict calls that query the system time, take deep control
          | of the message-passing layer, and use a custom scheduler for
          | executing message handlers.
         | 
         | Edit: trading systems isn't an area I considered for this work.
         | A lot of parallels, though.
        
       | skybrian wrote:
       | JavaScript is single-threaded, but I/O (events) introduce
       | nondeterminism. I'm wondering if there are tools that let you
       | control how events get scheduled when testing async code?
        
       ___________________________________________________________________
       (page generated 2024-08-20 23:01 UTC)