[HN Gopher] Debugging operating systems with time-traveling virt...
___________________________________________________________________
Debugging operating systems with time-traveling virtual machines
[pdf]
Author : Intralexical
Score : 41 points
Date : 2024-08-18 18:28 UTC (4 hours ago)
(HTM) web link (www.usenix.org)
(TXT) w3m dump (www.usenix.org)
| Intralexical wrote:
| Abstract: Operating systems are difficult to
| debug with traditional cyclic debugging. They are
| non-deterministic; they run for long periods of time; they interact
| directly with hardware devices; and their state is easily
| perturbed by the act of debugging. This paper describes a
| time-traveling virtual machine that overcomes many of the difficulties
| associated with debugging operating systems. Time travel enables
| a programmer to navigate backward and forward arbitrarily through
| the execution history of a particular run and to replay arbitrary
| segments of the past execution. We integrate time travel into a
| general-purpose debugger to enable a programmer to debug an OS in
| reverse, implementing commands such as reverse breakpoint,
| reverse watchpoint, and reverse single step. The space and time
| overheads needed to support time travel are reasonable for
| debugging, and movements in time are fast enough to support
| interactive debugging. We demonstrate the value of our
| time-traveling virtual machine by using it to understand and fix
| several OS bugs that are difficult to find with standard
| debugging tools. Reverse debugging is especially helpful in
| finding bugs that are fragile due to non-determinism, bugs in
| device drivers, bugs that require long runs to trigger, bugs that
| corrupt the stack, and bugs that are detected after the relevant
| stack frame is popped.
|
| Related work:
|
| "ConSnap: Taking continuous snapshots for running state
| protection of virtual machines"
|
| https://web.archive.org/web/20151014000129id_/http://act.bua...
| ...In this paper, we present ConSnap, a system designed to enable
| taking fine-grained continuous snapshots of virtual machines...
| decrease the snapshot interval to dozens of milliseconds. We have
| implemented ConSnap on QEMU/KVM.... Compared with the
| stop-and-copy based incremental snapshots, ConSnap reduces the performance
| loss by 71.1% ~ 10.2% under Compilation workload, and 14.5% ~
| 4.7% for the Ftp workload, when the interval varies from 1s to
| 60s.
|
| "ReVirt: Enabling Intrusion Analysis Through Virtual-Machine
| Logging and Replay"
|
| https://www.usenix.org/legacy/events/osdi02/tech/full_papers...
| ...ReVirt logs enough information to replay a long-term execution
| of the virtual machine instruction-by-instruction. This enables
| it to provide arbitrarily detailed observations about what
| transpired on the system, even in the presence of
| non-deterministic attacks and executions. ReVirt adds reasonable time
| and space overhead. Overheads due to virtualization are
| imperceptible for interactive use and CPU-bound workloads, and
| 13-58% for kernel-intensive workloads. Logging adds 0-8%
| overhead, and logging traffic for our workloads can be stored on
| a single disk for several months.
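|
| The logging-and-replay idea in the ReVirt excerpt can be shown with a
| toy model. This is a minimal sketch, not ReVirt's design: the "guest"
| is a hypothetical pure function of its inputs, and random.randrange
| stands in for non-deterministic events such as interrupts or device
| reads. Recording saves only those inputs; replay feeds them back and
| reaches the same final state.

```python
import random


def run(steps, source):
    """Toy deterministic 'guest': its state depends only on the inputs fed to it."""
    state = 0
    for _ in range(steps):
        state = (state * 31 + source()) % 1_000_003
    return state


def record(steps):
    """Run once, logging every non-deterministic input as it is consumed."""
    log = []

    def source():
        value = random.randrange(256)  # stand-in for an interrupt or device read
        log.append(value)
        return value

    return run(steps, source), log


def replay(log):
    """Re-run the guest, serving inputs from the log instead of the outside world."""
    inputs = iter(log)
    return run(len(log), lambda: next(inputs))


final_state, log = record(1000)
print(replay(log) == final_state)
```

| Only the inputs are stored, not the state after every step, which is
| why a log like this can stay small relative to the length of the run.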
| drewg123 wrote:
| This needs a [2005] qualifier
| Veserv wrote:
| A history of other time traveling debugging papers and products
| (including this one):
|
| https://jakob.engbloms.se/archives/1554
|
| https://jakob.engbloms.se/archives/1564
| fatcunt wrote:
| Microsoft has, or had, a similar technology they use internally,
| called TKO:
| https://www.microsoft.com/security/blog/2020/05/04/mitigatin...
|
| It's written in Rust and is based around a version of Bochs
| modified for deterministic execution. It's got time-travel
| debugging (with WinDbg), which works by replaying forward from
| the nearest preceding snapshot to the point the user wants to
| move back to.
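|
| A minimal sketch of that snapshot-plus-replay scheme, assuming a fully
| deterministic machine. ToyMachine, snapshot_every, seek and
| reverse_step are hypothetical names for illustration; nothing here
| reflects TKO's or Bochs's actual code.

```python
import bisect


class ToyMachine:
    """Hypothetical deterministic machine with periodic snapshots."""

    def __init__(self, snapshot_every=100):
        self.step_count = 0
        self.state = 1
        self.snapshot_every = snapshot_every
        self.snapshots = {0: self.state}  # step number -> saved state

    def _step(self):
        # Any deterministic transition works; this one is a 64-bit LCG.
        self.state = (self.state * 6364136223846793005 + 1442695040888963407) % 2**64
        self.step_count += 1

    def run_forward(self, n):
        for _ in range(n):
            self._step()
            if self.step_count % self.snapshot_every == 0:
                self.snapshots[self.step_count] = self.state

    def seek(self, target):
        """Restore the nearest snapshot at or before target, then replay forward."""
        if target < 0:
            raise ValueError("cannot seek before step 0")
        steps = sorted(self.snapshots)
        base = steps[bisect.bisect_right(steps, target) - 1]
        self.step_count, self.state = base, self.snapshots[base]
        self.run_forward(target - base)

    def reverse_step(self):
        """'Step backwards' is really: go to an earlier snapshot and re-execute."""
        self.seek(self.step_count - 1)


m = ToyMachine(snapshot_every=100)
m.run_forward(250)
m.reverse_step()
print(m.step_count)
```

| The snapshot interval is the trade-off: a smaller interval costs more
| storage but means fewer steps to re-execute per backward move.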
|
| The primary author of this software wanted to open source it, but
| the higher-ups at MSFT refused. He's been working on similar
| projects in a personal capacity though, e.g.
| https://gamozolabs.github.io/fuzzing/2020/12/06/fuzzos.html
| Cyph0n wrote:
| Watched some of his streams a while back. One of the most
| talented/productive devs I have ever seen tbh.
| grepfru_it wrote:
| The trick at Microsoft is to start working on your project in
| your spare time. Then incorporate it into your project at MSFT.
| You get the clout associated with having an open source project
| but then you also get to use it internally as a sanctioned tool.
| roca wrote:
| I wonder if this inspired the VMware VM record-and-replay
| functionality that came out in 2008. They discontinued it in
| 2011, but it's important to me because we used it at Mozilla to
| great effect and that made it easier for me to get Mozilla to
| support the development of rr, which started in 2011.
| icholy wrote:
| I don't get how rr isn't more popular.
| userbinator wrote:
| Debuggers have had history tracing functionality for a long time,
| but being extremely slow and consuming a lot of storage meant it
| was rarely used except for very specific cases. Now that CPUs are
| faster and the average machine has a lot more RAM, it becomes
| more feasible to do this.
___________________________________________________________________
(page generated 2024-08-18 23:00 UTC)