[HN Gopher] Revy - proof-of-concept time-travel debugger for the...
___________________________________________________________________
Revy - proof-of-concept time-travel debugger for the Bevy game
engine
Author : teh_cmc
Score : 83 points
Date : 2024-03-04 14:09 UTC (8 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| teh_cmc wrote:
| Author here; we had some fun building this last week.
|
| Feel free to ask me anything!
| mysterydip wrote:
| With Bevy being in early development still, are you worried
| about frequent maintenance to fix breaking changes?
| indigochill wrote:
| I'd guess from "It is not a full-fledged, properly maintained
| thing" in the README, probably not.
| teh_cmc wrote:
| As mentioned in the README, Revy is not meant to be a
| polished / properly maintained project -- it's just a proof-
| of-concept. I've talked more about how and why it came to
| exist in the first place in this thread [1], if you're
| interested.
|
| That being said, I do intend to publish updates when new
| versions of either Rerun or Bevy land; if only to experiment
| with new APIs as they come online.
|
| Now, to answer your question, I've been using Bevy since the
| 0.1 release and, in my experience, keeping up with the
| changes upstream has always been pretty painless. Their
| organization and release process is top-notch, with some of
| the highest-quality changelogs and migration guides I've
| ever seen in any project, and releases are rare enough
| (about once a quarter) to just not be an issue.
|
| The community maintains compatibility matrices such as this
| one [2], and things generally just work :tm:.
|
| [1] https://www.reddit.com/r/rust/comments/1b6bqv1/revy_proof
| ofc...
|
| [2] https://github.com/rerun-io/revy?tab=readme-ov-
| file#compatib...
| ordinaryradical wrote:
| Given that Bevy's systems scheduler is nondeterministic (for
| everything not explicitly ordered), do you foresee issues
| coming from that? Or does this approach sidestep that as an
| issue?
| teh_cmc wrote:
| Revy is frame-based: it runs as the last system at the end of
| the frame, with exclusive access to the `World`, and
| synchronizes the state of the Bevy database with the state of
| the Rerun database at that point in time (it keeps track of 3
| timelines during that process: the wall-clock time given by
| the OS, and the frame number and simulation time given by
| Bevy itself).
|
| So non-deterministic scheduling is just not an issue by
| default.
|
| You could of course access the Revy logger from any system
| (it's just a `Resource` after all) and log arbitrary data to
| Rerun from there (the resource is basically a handle to the
| Rerun SDK). This still wouldn't be a problem. The data would
| once again be logged to the 3 same timelines (wall-clock,
| frame number and sim_time) and you would be able to visualize
| in which order the different systems doing the logging were
| scheduled during each frame.
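
To make the three-timeline idea concrete, here is a minimal std-only Rust sketch. It does not use the Rerun SDK or Bevy; `FrameLogger`, `LogEntry` and their methods are hypothetical names invented for illustration. Every logged event is stamped with wall-clock time, frame number and simulation time, so the order in which systems ran within a frame can be read back afterwards.

```rust
use std::time::{Duration, Instant};

/// One logged event, stamped on all three timelines at once.
#[derive(Debug, Clone)]
struct LogEntry {
    wall_clock: Duration, // elapsed since logger creation, standing in for OS time
    frame: u64,           // frame counter
    sim_time: f64,        // accumulated simulation seconds
    source: String,       // which system did the logging
    payload: String,
}

/// Hypothetical logger; in a Bevy app this would live in a `Resource`.
struct FrameLogger {
    start: Instant,
    frame: u64,
    sim_time: f64,
    entries: Vec<LogEntry>,
}

impl FrameLogger {
    fn new() -> Self {
        Self { start: Instant::now(), frame: 0, sim_time: 0.0, entries: Vec::new() }
    }

    /// Advance the frame and simulation timelines by one step.
    fn begin_frame(&mut self, dt: f64) {
        self.frame += 1;
        self.sim_time += dt;
    }

    /// Record an event; call order within a frame is preserved.
    fn log(&mut self, source: &str, payload: &str) {
        self.entries.push(LogEntry {
            wall_clock: self.start.elapsed(),
            frame: self.frame,
            sim_time: self.sim_time,
            source: source.to_string(),
            payload: payload.to_string(),
        });
    }

    /// Systems that logged during `frame`, in the order they ran.
    fn sources_in_frame(&self, frame: u64) -> Vec<&str> {
        self.entries
            .iter()
            .filter(|e| e.frame == frame)
            .map(|e| e.source.as_str())
            .collect()
    }
}

fn main() {
    let mut logger = FrameLogger::new();
    logger.begin_frame(1.0 / 60.0);
    logger.log("physics", "moved ball");
    logger.log("ai", "picked target");
    logger.begin_frame(1.0 / 60.0);
    logger.log("ai", "picked target"); // scheduler chose the opposite order this frame
    logger.log("physics", "moved ball");
    for entry in &logger.entries {
        println!("{:?}", entry);
    }
    println!("frame 2 order: {:?}", logger.sources_in_frame(2));
}
```

Because entries carry the frame number, a nondeterministic schedule shows up as a different per-frame ordering rather than as corrupted data.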
| jasonjmcghee wrote:
| So. Freaking. Cool.
|
| Awesome stuff.
| teh_cmc wrote:
| Thanks!
| diggan wrote:
| This is really awesome! I recently picked up Bevy and Rust to
| resume my attempt at making games and hopefully publishing
| something worthwhile. This is something that I felt was missing
| since day 2 of learning Bevy.
|
| My own personal workaround has been to dump "user actions" to an
| ndjson file, which I can load at runtime when I want a "replay",
| but that obviously can't move forward/backwards; it
| just plays the actions.
|
| Would love to see it working with bevy_xpbd, although I'm not
| sure how deterministic it is and if that gets in the way (I
| assume so?). It does have an `enhanced-determinism` flag that says
| "Enables increased determinism", but the lack of "complete/full
| determinism" terms doesn't give me a lot of hope.
| teh_cmc wrote:
| Whether the physics engine is deterministic or not doesn't
| matter here -- Revy (and more importantly, Rerun) doesn't
| replay anything: it just stores state, every single frame, and
| then visualizes that state at every timestamp available.
|
| Check out the live demo of the breakout example [1], for
| instance: if you click on the paddle and then go to its
| parent node, you'll see that we just store that node's final
| transform (i.e. post-physics) every frame.
|
| Happy gamedev!
|
| [1]
| https://app.rerun.io/version/0.14.1/index.html?url=https://s...
| LarsDu88 wrote:
| This kind of reminds me of the article:
| https://spacetimedb.com/blog/databases-and-data-oriented-des...
|
| Where basically the ECS boils down to what is essentially a
| relational database, and here it looks like that's being
| leveraged to do snapshotting and point-in-time queries!
| teh_cmc wrote:
| Oh for sure, there's a lot of overlap between traditional
| relational databases and ECS designs. As always, in the end the
| hard part is to match the performance requirements.
|
| If you squint enough, most ECS out there are pretty much very
| specialized relational databases that trade off flexibility in
| favor of performance for common gamedev use cases (very wide
| joins, very deep hierarchies (e.g. transform trees), full-table
| filters, etc).
|
| Rerun's ECS goes one step further and makes time a first-class
| citizen, allowing for efficient joins across different
| components across different timestamps.
|
| This is what makes it possible to only log diffs in Revy (we
| only snapshot the components that were modified during the last
| frame), rather than having to take full snapshots every frame, which
| would be prohibitively expensive (both time and space). Rerun
| then stitches back everything together during visualization, in
| real-time!
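
The diff-then-stitch idea can be sketched in a few lines of std-only Rust. This is not Rerun's actual store; `DiffStore`, `log_change`, `latest_at` and `snapshot_at` are hypothetical names, and values are plain strings to keep it short. Only changes are written, and a "latest-at" lookup reconstructs the full state for any frame by taking, per component, the most recent value logged at or before that frame.

```rust
use std::collections::{BTreeMap, HashMap};

type Frame = u64;

/// Per-(entity, component) history; only frames where the value changed are stored.
#[derive(Default)]
struct DiffStore {
    history: HashMap<(String, String), BTreeMap<Frame, String>>,
}

impl DiffStore {
    /// Record that `component` of `entity` took a new value at `frame`.
    fn log_change(&mut self, entity: &str, component: &str, frame: Frame, value: &str) {
        self.history
            .entry((entity.to_string(), component.to_string()))
            .or_default()
            .insert(frame, value.to_string());
    }

    /// Most recent value logged at or before `frame`, if any.
    fn latest_at(&self, entity: &str, component: &str, frame: Frame) -> Option<&str> {
        self.history
            .get(&(entity.to_string(), component.to_string()))?
            .range(..=frame)
            .next_back()
            .map(|(_, v)| v.as_str())
    }

    /// Stitch the full state of `entity` at `frame` back together from the diffs.
    fn snapshot_at(&self, entity: &str, frame: Frame) -> BTreeMap<&str, &str> {
        let mut out = BTreeMap::new();
        for ((e, c), hist) in &self.history {
            if e == entity {
                if let Some((_, v)) = hist.range(..=frame).next_back() {
                    out.insert(c.as_str(), v.as_str());
                }
            }
        }
        out
    }
}

fn main() {
    let mut store = DiffStore::default();
    store.log_change("paddle", "position", 1, "0.0");
    store.log_change("paddle", "color", 1, "blue");
    store.log_change("paddle", "position", 3, "1.5"); // color unchanged, so not re-logged
    store.log_change("paddle", "position", 7, "2.0");
    println!("{:?}", store.snapshot_at("paddle", 5));
}
```

Storage grows with the number of changes rather than with frames times components, at the cost of a range lookup per component at query time.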
| anthk wrote:
| 'Time travel'. Ah, capturing the state and rolling back.
| Something a Z-Machine interpreter had 40 years ago I think, if
| not more, with the 'undo' command at the prompt :D.
|
| One day the OS' shells will have an undo command for everything,
| but they will waste tons of CPU cycles. And not by virtualizing.
| Although if you run your OS under a light hypervisor such as xen,
| that functionality might be able to be called from the userland
| and some kernel driver/hardware hook. Who knows.
| mathteddybear wrote:
| Bill Lewis, I presume, called it more or less like that
|
| https://arxiv.org/abs/cs/0310016
|
| https://www.youtube.com/watch?v=xpI8hIgOyko
|
| Soon after, "debugging backwards in time" morphed into "time-
| travel debugging"
| Veserv wrote:
| No, time travel debugging is almost certainly a root, but
| comes from a different lineage.
|
| https://jakob.engbloms.se/archives/1564
|
| The Green Hills Software Time Machine product for time travel
| debugging was commercially available by September 2003 [1]
| which is at least contemporaneous with that paper by Bill
| Lewis (i.e. terminology could not have been derived from it).
|
| Given the alternative terminology frequently used for the
| technology up to and after that point such as bidirectional,
| reverse, reversible, omniscient, replay, record-replay, etc.
| time-travel debugging as a term almost certainly
| originates/was popularized by Time Machine as the first
| successful time travel product (yes, I see the Lauterbach CTS
| is listed as existing first, but it was not commercially
| distinguished and successful and obviously has no terminology
| lineage).
|
| [1] https://www.ghs.com/news/20030930_best_of_show.html
| SeanAnderson wrote:
| What's the performance of this like? It seems really appealing. I
| would love to be able to use it to debug
| https://github.com/MeoMix/symbiants because I use RNG heavily to
| add variance to the world and that, combined with indeterminate
| execution order of systems, can really leave me scratching my
| head sometimes.
|
| However, I'm building using a tilemap that's 144x144. So I've got
| ~21000 entities to log. It seems impractical to snapshot the
| world every tick, but maybe if it were able to snapshot deltas or
| something?
| teh_cmc wrote:
| Revy already works with snapshot deltas (see other comments
| scattered around this section for more details, but basically
| we only sync components that changed during the previous frame
| -- Rerun stitches everything back together at runtime)... but
| at 21k entities, I'm afraid you'll be facing much bigger issues
| on the Rerun-side of things :D
|
| Rerun was originally designed for few (i.e. dozens up to
| hundreds) massive entities (e.g. it's common for a single
| entity to have a few million 3D points and color values
| attached to it).
|
| While we're slowly working towards improving the many-entities
| use-case, the correct thing to do in this case would probably
| be for Revy to identify that all these entities are really just
| different instances of the same batch (either automatically, or
| by exposing a marker component or something).
|
| So, say, you'd set a marker component on all your tiles, Revy
| would then snapshot them as a single batch of 144^2 instances,
| and then in Rerun you'd see a single entity `/tiles` which
| would be a batch of 144^2 instances (each with their own set of
| components, that's fine!). From Rerun's point-of-view, this
| would be similar to a point cloud, and at 21k instances you'd
| be easily running at your monitor refresh rate with _a lot_ of
| margin.
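
As a rough illustration of the batching idea (not something Revy does today, per the comment above), here is a std-only Rust sketch that collapses one-entity-per-tile data into a single batch with per-instance columns. `Tile`, `TileBatch` and `batch_tiles` are hypothetical names, and the `/tiles` path is taken from the comment.

```rust
/// A tile as the game sees it: one ECS entity per tile.
#[derive(Clone, Copy)]
struct Tile {
    x: u32,
    y: u32,
    kind: u8,
}

/// Hypothetical batched form: one logical entity with per-instance columns.
struct TileBatch {
    path: String,
    positions: Vec<[u32; 2]>,
    kinds: Vec<u8>,
}

/// Gather every marked tile into a single columnar batch.
fn batch_tiles(path: &str, tiles: &[Tile]) -> TileBatch {
    TileBatch {
        path: path.to_string(),
        positions: tiles.iter().map(|t| [t.x, t.y]).collect(),
        kinds: tiles.iter().map(|t| t.kind).collect(),
    }
}

fn main() {
    let size = 144u32;
    // Build the 144x144 tilemap mentioned in the thread.
    let tiles: Vec<Tile> = (0..size)
        .flat_map(|y| (0..size).map(move |x| Tile { x, y, kind: ((x + y) % 3) as u8 }))
        .collect();
    let batch = batch_tiles("/tiles", &tiles);
    println!(
        "{} holds {} positions and {} kinds",
        batch.path,
        batch.positions.len(),
        batch.kinds.len()
    );
}
```

The viewer then deals with one entity carrying 144 x 144 = 20736 instances, which is much closer to the point-cloud shape described above than 20736 separate entities.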
|
| But by all means, try it! Not the web version though, you're
| definitely going to need multithreading :D
|
| Nice project btw; I'll keep an eye on it and probably use it as
| a benchmark for the many-entities use-case!
| SeanAnderson wrote:
| Thanks for the response! :) Great to hear deltas work. Yeah,
| sounds like it's the sort of thing that would need to run
| natively until multithreading is supported in the web.
| Veserv wrote:
| Is a generic time travel debugging solution too much overhead?
| A good multithreaded time travel debugger (not deterministic
| replay based) should only incur ~100% overhead in the memory
| bandwidth bound case. If you are not saturating your memory bus
| without instrumentation then the overhead should be
| proportionally less.
| SeanAnderson wrote:
| Nah, that'd probably work, too. I think the key here is
| multithreading. I do most of my development in a WASM context
| where Bevy doesn't support multithreading yet. I switch to
| native debugging when I want breakpoints (or in this case,
| when I'd want multithreading).
|
| It's not the greatest workflow to default to WASM, but it
| makes it easier to treat web as a first-class development
| target. Still not sure that's worthwhile overall, but giving
| it a shot for now.
| Veserv wrote:
| Wait, which is the hard one you wish you had time travel
| debugging on, the single threaded WASM context or the
| multithreaded native context?
|
| The multithreaded native context is the one that is harder
| in principle, but should only incur ~100% overhead for any
| program including ones not using Bevy. Though I do not know
| about the general availability of these products in your
| field.
|
| A single-threaded context is vastly simpler and can be done
| with similar overhead without platform support or ~1-10%
| overhead with platform support. Though I do not know if
| anybody has implemented efficient WASM support or if
| anybody with efficient multithreading implementations has
| ported to WASM.
|
| Likely the only available ones are the inefficient 1,000%
| overhead or the hilariously bad 100,000% overhead ones like
| the default gdb implementation. To be fair, these
| implementations are much easier to write. Even ~100%
| overhead in the single-threaded case is more common amongst
| extant solutions since getting down to ~10% requires some
| serious optimization. Still should be perfectly adequate
| for development work.
| SeanAnderson wrote:
| Sounds like you know a lot more about this area than I :)
|
| I would like an efficient way of time travelling in a
| single threaded context.
|
| As you describe it, it makes sense that supporting
| multithreading would make the problem space much more
| challenging to navigate. I wasn't thinking about that,
| but it's clear once you point it out. I was just
| considering the overhead of maintaining the undo state
| without being able to delegate it to a separate thread.
|
| As OP mentions, they use change detection to
| calculate/store deltas, but Bevy's ECS change detection
| isn't very performant. You still have to iterate over all
| components and check a component's value to learn changed
| state rather than being able to filter on a `Changed`
| archetype. It kind of makes sense, though, because
| adding/removing Changed components from tons of entities
| every tick would also be expensive. Either way, change
| detection feels like a sore spot when working with tons
| of entities in ECS. I'm not super confident there's a way
| around that without manually maintaining some data
| structures outside of the ECS paradigm, but was thinking
| that if I could at least run the change detection on a
| separate thread that it might be tolerable.
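
The cost being described can be seen in a simplified, generic tick-based change-detection scheme. The sketch below is std-only Rust and is not Bevy's actual implementation; `Tracked`, `Column` and `changed_since` are hypothetical names, and tick wraparound is ignored. Each slot remembers the tick at which it was last written, and finding the changed slots still means scanning every slot, even when only a handful changed.

```rust
/// A component value plus the tick at which it was last written.
struct Tracked<T> {
    value: T,
    changed_tick: u32,
}

/// One column of components, indexed by a dense entity index.
struct Column<T> {
    items: Vec<Tracked<T>>,
}

impl<T> Column<T> {
    fn new(values: Vec<T>) -> Self {
        let items = values
            .into_iter()
            .map(|value| Tracked { value, changed_tick: 0 })
            .collect();
        Self { items }
    }

    fn get(&self, index: usize) -> &T {
        &self.items[index].value
    }

    /// Writing is cheap: store the value and stamp the current tick.
    fn set(&mut self, index: usize, value: T, now: u32) {
        let slot = &mut self.items[index];
        slot.value = value;
        slot.changed_tick = now;
    }

    /// Reading the change set is O(n): every slot's tick is compared,
    /// regardless of how few slots actually changed.
    fn changed_since(&self, last_run: u32) -> Vec<usize> {
        self.items
            .iter()
            .enumerate()
            .filter(|(_, t)| t.changed_tick > last_run)
            .map(|(i, _)| i)
            .collect()
    }
}

fn main() {
    let mut tiles = Column::new(vec![0u8; 144 * 144]);
    tiles.set(10, 1, 1);
    tiles.set(500, 2, 1);
    tiles.set(20_000, 3, 1);
    let changed = tiles.changed_since(0);
    println!("scanned {} slots, {} changed", tiles.items.len(), changed.len());
    println!("tile 500 is now {}", tiles.get(500));
}
```

An alternative is to push changed indices onto a side list at write time, which makes reads proportional to the number of changes but adds bookkeeping to every write; that is the kind of manually maintained structure the comment alludes to.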
| cmrdporcupine wrote:
| Does Revy depend just on the ECS crate, or does it bring in other
| parts of Bevy? I see a blanket dep onto bevy, but is it really
| using more than the ECS? I like the idea. I might try it out.
|
| I've been playing with Bevy the last couple weeks, and in general
| from my first impression I'd have to say that the bevy_ecs crate
| seems more mature than the rest of it. It's not a bad ECS
| framework, and actually quite useful independent of Bevy itself.
| I'd like it if they cleaned up their crates deps a bit, but it's
| pretty good standalone and not just for games, but for any
| concurrent data driven application.
|
| ECS has weird nomenclature when viewed outside of the games
| industry. What it really has if you pan out, is queries and
| binary relations/tables/facts/properties, but calls them
| 'systems' and 'components'. "Components" outside of games & ECS
| usually means something else, so it's a bit of a head scratcher
| at first.
|
| I think if you dig past the surface what you actually have is a
| high performance version of what we used to call "tuple spaces",
| a good model for managing state in parallel data-driven
| applications, esp where there's lots and lots of bits of state
| (e.g. vehicle autonomy with vision detection, or robotics, etc.)
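
The "components are tables, systems are queries" reading can be made concrete with a toy std-only Rust sketch. It is not bevy_ecs; `World`, `Position`, `Velocity` and `movement` are hypothetical names. Each component type is a table keyed by entity id, and a system is an inner join over those tables followed by an update.

```rust
use std::collections::BTreeMap;

type Entity = u32;

#[derive(Debug, Clone, Copy, PartialEq)]
struct Position(f32, f32);

#[derive(Debug, Clone, Copy, PartialEq)]
struct Velocity(f32, f32);

/// Each component type is a table keyed by entity id.
#[derive(Default)]
struct World {
    positions: BTreeMap<Entity, Position>,
    velocities: BTreeMap<Entity, Velocity>,
}

/// A "system": inner join of velocities and positions on entity id, then an update.
/// Entities missing either component are simply not matched by the join.
fn movement(world: &mut World, dt: f32) {
    for (entity, vel) in &world.velocities {
        if let Some(pos) = world.positions.get_mut(entity) {
            pos.0 += vel.0 * dt;
            pos.1 += vel.1 * dt;
        }
    }
}

fn main() {
    let mut world = World::default();
    world.positions.insert(1, Position(0.0, 0.0));
    world.velocities.insert(1, Velocity(1.0, 2.0));
    world.positions.insert(2, Position(5.0, 5.0)); // static: no velocity row
    movement(&mut world, 0.5);
    println!("{:?}", world.positions);
}
```

Real ECS implementations replace the map lookups with dense, archetype- or sparse-set-based storage so the join becomes a linear scan, which is where the performance difference from a general-purpose database comes from.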
___________________________________________________________________
(page generated 2024-03-04 23:00 UTC)