[HN Gopher] Revy - proof-of-concept time-travel debugger for the...
       ___________________________________________________________________
        
       Revy - proof-of-concept time-travel debugger for the Bevy game
       engine
        
       Author : teh_cmc
       Score  : 83 points
       Date   : 2024-03-04 14:09 UTC (8 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | teh_cmc wrote:
       | Author here; we had some fun building this last week.
       | 
       | Feel free to ask me anything!
        
         | mysterydip wrote:
         | With Bevy being in early development still, are you worried
         | about frequent maintenance to fix breaking changes?
        
           | indigochill wrote:
           | I'd guess from "It is not a full-fledged, properly maintained
           | thing" in the README, probably not.
        
           | teh_cmc wrote:
           | As mentioned in the README, Revy is not meant to be a
           | polished / properly maintained project -- it's just a proof-
           | of-concept. I've talked more about how and why it came to
           | exist in the first place in this thread [1], if you're
           | interested.
           | 
           | That being said, I do intend to publish updates when new
           | versions of either Rerun or Bevy land; if only to experiment
           | with new APIs as they come online.
           | 
           | Now, to answer your question, I've been using Bevy since the
           | 0.1 release and, in my experience, keeping up with the
           | changes upstream has always been pretty painless. Their
           | organization and release process are top-notch, with some of
           | the highest-quality changelogs and migration guides I've
           | ever seen in any project, and releases are rare enough
           | (about once a quarter) to just not be an issue.
           | 
           | The community maintains compatibility matrices such as this
           | one [2], and things generally just work :tm:.
           | 
           | [1] https://www.reddit.com/r/rust/comments/1b6bqv1/revy_proof
           | ofc...
           | 
           | [2] https://github.com/rerun-io/revy?tab=readme-ov-
           | file#compatib...
        
         | ordinaryradical wrote:
         | Given that Bevy's systems scheduler is nondeterministic (for
         | everything not explicitly ordered), do you foresee issues
         | coming from that? Or does this approach sidestep that as an
         | issue?
        
           | teh_cmc wrote:
           | Revy is frame-based: it runs as the last system at the end of
           | the frame, with exclusive access to the `World`, and
           | synchronizes the state of the Bevy database with the state of
           | the Rerun database at that point in time (it keeps track of 3
           | timelines during that process: the wall-clock time given by
           | the OS, and the frame number and simulation time given by
           | Bevy itself).
           | 
           | So non-deterministic scheduling is just not an issue by
           | default.
           | 
           | You could of course access the Revy logger from any system
           | (it's just a `Resource` after all) and log arbitrary data to
           | Rerun from there (the resource is basically a handle to the
           | Rerun SDK). This still wouldn't be a problem. The data would
           | once again be logged to the 3 same timelines (wall-clock,
           | frame number and sim_time) and you would be able to visualize
           | in which order the different systems doing the logging were
           | scheduled during each frame.
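
A minimal sketch of the idea described above: every record is stamped on three timelines (wall-clock, frame number, sim time), so the order in which systems logged within a frame can be recovered afterwards. This is std-only Rust; `Logger`, `Stamped`, `begin_frame` and `order_in_frame` are hypothetical names for illustration and are not the Revy or Rerun API.

```rust
use std::time::{Duration, Instant};

/// One logged record, stamped on three timelines.
/// Hypothetical type, not part of Revy or Rerun.
#[derive(Debug, Clone)]
pub struct Stamped {
    pub wall_clock: Duration, // elapsed OS time since the logger was created
    pub frame: u64,           // frame counter
    pub sim_time: f64,        // accumulated simulation seconds
    pub source: &'static str, // name of the system that logged the record
    pub payload: String,
}

/// Hypothetical stand-in for a logger handle shared between systems.
pub struct Logger {
    start: Instant,
    frame: u64,
    sim_time: f64,
    pub records: Vec<Stamped>,
}

impl Logger {
    pub fn new() -> Self {
        Logger { start: Instant::now(), frame: 0, sim_time: 0.0, records: Vec::new() }
    }

    /// Advance the frame counter and the simulation clock by `dt` seconds.
    pub fn begin_frame(&mut self, dt: f64) {
        self.frame += 1;
        self.sim_time += dt;
    }

    /// Append a record stamped with the current value of all three timelines.
    pub fn log(&mut self, source: &'static str, payload: impl Into<String>) {
        self.records.push(Stamped {
            wall_clock: self.start.elapsed(),
            frame: self.frame,
            sim_time: self.sim_time,
            source,
            payload: payload.into(),
        });
    }

    /// Sources that logged during `frame`, in the order they actually ran.
    pub fn order_in_frame(&self, frame: u64) -> Vec<&'static str> {
        self.records
            .iter()
            .filter(|r| r.frame == frame)
            .map(|r| r.source)
            .collect()
    }
}
```

If two systems log in a different order on two consecutive frames, `order_in_frame` returns a different sequence for each frame, which is the kind of scheduling information the comment refers to.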
        
       | jasonjmcghee wrote:
       | So. Freaking. Cool.
       | 
       | Awesome stuff.
        
         | teh_cmc wrote:
         | Thanks!
        
       | diggan wrote:
       | This is really awesome! I recently picked up Bevy and Rust to
       | resume my attempt at making games and hopefully publishing
       | something worthwhile. This is something that I felt was missing
       | since day 2 of learning Bevy.
       | 
       | My own personal workaround has been to dump "user actions" to an
       | ndjson file, which I can load at runtime when I want a "replay",
       | but obviously it can't move forward/backwards; it just plays
       | the actions.
       | 
       | Would love to see it working with bevy_xpbd, although I'm not
       | sure how deterministic it is and if that gets in the way (I
       | assume so?). It does have an `enhanced-determinism` flag that
       | says "Enables increased determinism", but the lack of
       | "complete/full determinism" terms doesn't give me a lot of hope.
        
         | teh_cmc wrote:
         | Whether the physics engine is deterministic or not doesn't
         | matter here -- Revy (and more importantly, Rerun) doesn't
         | replay anything: it just stores state, every single frame, and
         | then visualizes that state at every timestamp available.
         | 
         | Check out e.g. the live demo of the breakout example [1]: if
         | you click on the paddle and then go to its parent node,
         | you'll see that we just store that node's final transform
         | (i.e. post-physics) every frame.
         | 
         | Happy gamedev!
         | 
         | [1]
         | https://app.rerun.io/version/0.14.1/index.html?url=https://s...
        
       | LarsDu88 wrote:
       | This kind of reminds me of the article:
       | https://spacetimedb.com/blog/databases-and-data-oriented-des...
       | 
       | Where basically the ECS boils down to what is essentially a
       | relational database, and here it looks like that's being
       | leveraged to do snapshotting and point-in-time queries!
        
         | teh_cmc wrote:
         | Oh for sure, there's a lot of overlap between traditional
         | relational databases and ECS designs. As always, in the end the
         | hard part is to match the performance requirements.
         | 
         | If you squint enough, most ECS out there are pretty much very
         | specialized relational databases that trade off flexibility in
         | favor of performance for common gamedev use cases (very wide
         | joins, very deep hierarchies (e.g. transform trees), full-table
         | filters, etc).
         | 
         | Rerun's ECS goes one step further and makes time a first-class
         | citizen, allowing for efficient joins across different
         | components across different timestamps.
         | 
         | This is what makes it possible to only log diffs in Revy (we
         | only snapshot the components that were modified during the last
         | frame), rather than having to take full snapshots every frame,
         | which would be prohibitively expensive (in both time and space).
         | Rerun then stitches everything back together during
         | visualization, in real-time!
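
A rough sketch of how logging only diffs can still answer "what was the state at frame N?": each component gets its own timeline, and a "latest-at" lookup returns the most recent value at or before the requested frame. Everything here (`DiffStore`, `log_if_changed`, `latest_at`) is a hypothetical illustration in std-only Rust, not Rerun's actual data store. The sketch detects changes by comparing values, whereas the comment above describes relying on change detection.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical store: for each (entity, component name), a timeline of
/// values keyed by the frame in which the value changed.
#[derive(Default)]
pub struct DiffStore {
    timelines: HashMap<(u32, &'static str), BTreeMap<u64, f32>>,
}

impl DiffStore {
    /// Record `value` only if it differs from the latest stored value.
    /// Returns true if something was written.
    /// Assumes each timeline is logged in increasing frame order.
    pub fn log_if_changed(
        &mut self,
        frame: u64,
        entity: u32,
        component: &'static str,
        value: f32,
    ) -> bool {
        let timeline = self.timelines.entry((entity, component)).or_default();
        if timeline.values().next_back() == Some(&value) {
            return false; // unchanged: nothing to store for this frame
        }
        timeline.insert(frame, value);
        true
    }

    /// "Latest-at" query: the most recent value at or before `frame`.
    pub fn latest_at(&self, frame: u64, entity: u32, component: &'static str) -> Option<f32> {
        self.timelines
            .get(&(entity, component))?
            .range(..=frame)
            .next_back()
            .map(|(_, value)| *value)
    }

    /// Total number of values actually stored across all timelines.
    pub fn stored_values(&self) -> usize {
        self.timelines.values().map(|t| t.len()).sum()
    }
}
```

Reconstructing a full snapshot at a given frame then amounts to running `latest_at` once per component, so unchanged components cost nothing to store on the frames where they did not change.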
        
       | anthk wrote:
       | 'Time travel'. Ah, capturing the state and rolling back.
       | Something a Z-Machine interpreter had 40 years ago I think, if
       | not more, with the 'undo' command at the prompt :D.
       | 
       | One day OS shells will have an undo command for everything,
       | but they will waste tons of CPU cycles. And not by virtualizing.
       | Although if you run your OS under a light hypervisor such as Xen,
       | that functionality might be callable from userland through
       | some kernel driver/hardware hook. Who knows.
        
         | mathteddybear wrote:
         | Bill Lewis, I presume, called it more or less like that
         | 
         | https://arxiv.org/abs/cs/0310016
         | 
         | https://www.youtube.com/watch?v=xpI8hIgOyko
         | 
         | Soon after, "debugging backwards in time" morphed into
         | "time-travel debugging".
        
           | Veserv wrote:
           | No, time travel debugging is almost certainly a root, but
           | comes from a different lineage.
           | 
           | https://jakob.engbloms.se/archives/1564
           | 
           | The Green Hills Software Time Machine product for time travel
           | debugging was commercially available by September 2003 [1]
           | which is at least contemporaneous with that paper by Bill
           | Lewis (i.e. terminology could not have been derived from it).
           | 
           | Given the alternative terminology frequently used for the
           | technology up to and after that point (bidirectional,
           | reverse, reversible, omniscient, replay, record-replay,
           | etc.), "time-travel debugging" as a term almost certainly
           | originates from, or was popularized by, Time Machine as the
           | first successful time travel product (yes, I see the
           | Lauterbach CTS is listed as existing first, but it was not
           | commercially distinguished and successful and obviously has
           | no terminology lineage).
           | 
           | [1] https://www.ghs.com/news/20030930_best_of_show.html
        
       | SeanAnderson wrote:
       | What's the performance of this like? It seems really appealing. I
       | would love to be able to use it to debug
       | https://github.com/MeoMix/symbiants because I use RNG heavily to
       | add variance to the world and that, combined with indeterminate
       | execution order of systems, can really leave me scratching my
       | head sometimes.
       | 
       | However, I'm building using a tilemap that's 144x144. So I've got
       | ~21000 entities to log. It seems impractical to snapshot the
       | world every tick, but maybe if it were able to snapshot deltas or
       | something?
        
         | teh_cmc wrote:
         | Revy already works with snapshot deltas (see other comments
         | scattered around this section for more details, but basically
         | we only sync components that changed during the previous frame
         | -- Rerun stitches everything back together at runtime)... but
         | at 21k entities, I'm afraid you'll be facing much bigger issues
         | on the Rerun-side of things :D
         | 
         | Rerun was originally designed for few (i.e. dozens up to
         | hundreds) massive entities (e.g. it's common for a single
         | entity to have a few million 3D points and color values
         | attached to it).
         | 
         | While we're slowly working towards improving the many-entities
         | use-case, the correct thing to do in this case would probably
         | be for Revy to identify that all these entities are really just
         | different instances of the same batch (either automatically, or
         | by exposing a marker component or something).
         | 
         | So, say, you'd set a marker component on all your tiles, Revy
         | would then snapshot them as a single batch of 144^2 instances,
         | and then in Rerun you'd see a single entity `/tiles` which
         | would be a batch of 144^2 instances (each with their own set of
         | components, that's fine!). From Rerun's point-of-view, this
         | would be similar to a point cloud, and at 21k instances you'd
         | be easily running at your monitor refresh rate with _a lot_ of
         | margin.
         | 
         | But by all means, try it! Not the web version though, you're
         | definitely going to need multithreading :D
         | 
         | Nice project btw; I'll keep an eye on it and probably use it as
         | a benchmark for the many-entities use-case!
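
To make the batching suggestion above concrete, here is a small sketch in which the whole tilemap is one logical entity holding one column per component and one row per tile, instead of one entity per tile. `TileBatch` and its fields are hypothetical names in std-only Rust; neither Revy nor Rerun is involved, and the marker-component mechanism mentioned in the comment is not modeled.

```rust
/// Hypothetical batch: a single logical entity (think `/tiles`) that holds
/// one column per component, with one row per tile instance.
pub struct TileBatch {
    pub width: usize,
    pub positions: Vec<(u16, u16)>, // grid coordinates of each instance
    pub kinds: Vec<u8>,             // example per-instance component
}

impl TileBatch {
    /// Build a batch covering a `width` x `height` grid, in row-major order.
    pub fn new(width: usize, height: usize) -> Self {
        let mut positions = Vec::with_capacity(width * height);
        for y in 0..height {
            for x in 0..width {
                positions.push((x as u16, y as u16));
            }
        }
        let kinds = vec![0; width * height];
        TileBatch { width, positions, kinds }
    }

    /// Number of instances in the batch.
    pub fn len(&self) -> usize {
        self.positions.len()
    }

    /// Row index of the tile at grid coordinates (`x`, `y`).
    pub fn index(&self, x: usize, y: usize) -> usize {
        y * self.width + x
    }
}
```

For the 144x144 map from the question, `TileBatch::new(144, 144).len()` is 144 * 144 = 20736, which matches the "~21000 entities" estimate while being a single entity with 20736 instances.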
        
           | SeanAnderson wrote:
           | Thanks for the response! :) Great to hear deltas work. Yeah,
           | sounds like it's the sort of thing that would need to run
           | natively until multithreading is supported in the web.
        
         | Veserv wrote:
         | Is a generic time travel debugging solution too much overhead?
         | A good multithreaded time travel debugger (not deterministic
         | replay based) should only incur ~100% overhead in the memory
         | bandwidth bound case. If you are not saturating your memory bus
         | without instrumentation then the overhead should be
         | proportionally less.
        
           | SeanAnderson wrote:
           | Nah, that'd probably work, too. I think the key here is
           | multithreading. I do most of my development in a WASM context
           | where Bevy doesn't support multithreading yet. I switch to
           | native debugging when I want breakpoints (or in this case,
           | when I'd want multithreading).
           | 
           | It's not the greatest workflow to default to WASM, but it
           | makes it easier to treat web as a first-class development
           | target. Still not sure that's worthwhile overall, but giving
           | it a shot for now.
        
             | Veserv wrote:
             | Wait, which is the hard one you wish you had time travel
             | debugging on, the single threaded WASM context or the
             | multithreaded native context?
             | 
             | The multithreaded native context is the one that is harder
             | in principle, but should only incur ~100% overhead for any
             | program including ones not using Bevy. Though I do not know
             | about the general availability of these products in your
             | field.
             | 
             | A single-threaded context is vastly simpler and can be done
             | with similar overhead without platform support or ~1-10%
             | overhead with platform support. Though I do not know if
             | anybody has implemented efficient WASM support or if
             | anybody with efficient multithreading implementations has
             | ported to WASM.
             | 
             | Likely the only available ones are the inefficient 1,000%
             | overhead or the hilariously bad 100,000% overhead ones like
             | the default gdb implementation. To be fair, these
             | implementations are much easier to write. Even ~100%
             | overhead in the single-threaded case is more common amongst
             | extant solutions since getting down to ~10% requires some
             | serious optimization. Still should be perfectly adequate
             | for development work.
        
               | SeanAnderson wrote:
               | Sounds like you know a lot more about this area than I :)
               | 
               | I would like an efficient way of time travelling in a
               | single threaded context.
               | 
               | As you describe it, it makes sense that supporting
               | multithreading would make the problem space much more
               | challenging to navigate. I wasn't thinking about that,
               | but it's clear once you point it out. I was just
               | considering the overhead of maintaining the undo state
               | without being able to delegate it to a separate thread.
               | 
               | As OP mentions, they use change detection to
               | calculate/store deltas, but Bevy's ECS change detection
               | isn't very performant. You still have to iterate over all
               | components and check a component's value to learn changed
               | state rather than being able to filter on a `Changed`
               | archetype. It kind of makes sense, though, because
               | adding/removing Changed components from tons of entities
               | every tick would also be expensive. Either way, change
               | detection feels like a sore spot when working with tons
               | of entities in ECS. I'm not super confident there's a way
               | around that without manually maintaining some data
               | structures outside of the ECS paradigm, but was thinking
               | that if I could at least run the change detection on a
               | separate thread that it might be tolerable.
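
The cost described above can be illustrated with a sketch of tick-based change detection: each component remembers the tick at which it was last written, and finding the changed ones means visiting every component and comparing ticks. As far as I understand, this is roughly the shape of Bevy's approach, but the code is a simplified stand-in with hypothetical names (`Tracked`, `changed_since`) and it ignores details such as tick wrap-around.

```rust
/// A component value plus the tick at which it was last written.
/// Hypothetical type for illustration only.
pub struct Tracked<T> {
    value: T,
    changed_tick: u32,
}

impl<T> Tracked<T> {
    pub fn new(value: T, tick: u32) -> Self {
        Tracked { value, changed_tick: tick }
    }

    pub fn get(&self) -> &T {
        &self.value
    }

    /// Overwrite the value and remember when the write happened.
    pub fn set(&mut self, value: T, tick: u32) {
        self.value = value;
        self.changed_tick = tick;
    }
}

/// Indices of the components written after `last_run`.
/// Note that this visits every element, changed or not: the filter is a
/// per-component check rather than a lookup in a precomputed "changed" set.
pub fn changed_since<T>(items: &[Tracked<T>], last_run: u32) -> Vec<usize> {
    items
        .iter()
        .enumerate()
        .filter(|(_, c)| c.changed_tick > last_run)
        .map(|(i, _)| i)
        .collect()
}
```

With tens of thousands of components, the scan is linear in the total count even when only a handful changed, which is the trade-off against adding and removing a marker on every write.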
        
       | cmrdporcupine wrote:
       | Does Revy depend just on the ECS crate, or does it bring in other
       | parts of Bevy? I see a blanket dep onto bevy, but is it really
       | using more than the ECS? I like the idea. I might try it out.
       | 
       | I've been playing with Bevy the last couple weeks, and in general
       | from my first impression I'd have to say that the bevy_ecs crate
       | seems more mature than the rest of it. It's not a bad ECS
       | framework, and actually quite useful independent of Bevy itself.
       | I'd like it if they cleaned up their crates deps a bit, but it's
       | pretty good standalone and not just for games, but for any
       | concurrent data driven application.
       | 
       | ECS has weird nomenclature when viewed outside of the games
       | industry. What it really has if you pan out, is queries and
       | binary relations/tables/facts/properties, but calls them
       | 'systems' and 'components'. "Components" outside of games & ECS
       | usually means something else, so it's a bit of a head scratcher
       | at first.
       | 
       | I think if you dig past the surface what you actually have is a
       | high performance version of what we used to call "tuple spaces",
       | a good model for managing state in parallel data-driven
       | applications, esp where there's lots and lots of bits of state
       | (e.g. vehicle autonomy with vision detection, or robotics, etc.)
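
The "queries over relations" reading above can be sketched as follows: each component type is a table keyed by entity id, and a query over two components is an inner join on that key. This is a deliberately naive std-only illustration with hypothetical names (`Entity`, `join`); real ECS storage such as bevy_ecs is organized very differently for performance.

```rust
use std::collections::BTreeMap;

/// Entity ids are plain integers in this sketch.
pub type Entity = u32;

/// Inner join of two component tables on entity id: returns every entity
/// that has both components, together with references to both values.
pub fn join<'a, A, B>(
    a: &'a BTreeMap<Entity, A>,
    b: &'a BTreeMap<Entity, B>,
) -> Vec<(Entity, &'a A, &'a B)> {
    a.iter()
        .filter_map(|(entity, va)| b.get(entity).map(|vb| (*entity, va, vb)))
        .collect()
}
```

A "system" in this picture is just a function that runs such a query and acts on the resulting rows, for example joining a position table with a velocity table and integrating each match.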
        
       ___________________________________________________________________
       (page generated 2024-03-04 23:00 UTC)