[HN Gopher] Bringing Record and Replay debugging everywhere on L...
       ___________________________________________________________________
        
       Bringing Record and Replay debugging everywhere on Linux
        
       Author : sidkshatriya
       Score  : 190 points
       Date   : 2025-03-26 12:49 UTC (4 days ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | yjftsjthsd-h wrote:
       | Well! That eliminates one of if not the biggest problems with
       | rr :) Is there some catch or tradeoff? Performance, maybe?
        
         | sidkshatriya wrote:
         | Author here. Yes there are some tradeoffs:
         | 
         | - Performance: slower than using rr with HW counters. Both
         | dynamic and static instrumentation employed by _Software
         | Counter mode_ rr slow things down
         | 
         | - Potential fragility: Dynamic and static instrumentation can
         | often make record/replay a bit more fragile
         | 
         | - Currently only x86-64 support has been publicly released. I
         | have aarch64 support working reasonably well internally and it
         | allows me, for instance, to run rr in a Linux VM running on
         | macOS! I have yet to figure out my strategy for the aarch64 release
         | so watch https://github.com/sidkshatriya/rr.soft for any
         | updates.
         | 
         | - Currently can run only on a few recent Linux distributions
         | (e.g. Fedora 40/41, Debian Unstable, Ubuntu 24.10) because it
         | relies on robust debuginfod support that is not widespread yet.
         | See https://github.com/sidkshatriya/rr.soft/wiki#how-does-
         | softwa... for why debuginfod is required. The debuginfod
         | requirement may be relaxed in the future with more work
         | 
         | Regardless of the tradeoffs this allows rr to be used in many
         | more situations i.e. wherever HW Performance counter access is
         | not possible/not reliable/broken.
         | 
         | I would love it if more people tried this out and let me know
         | if things worked out well for them (or not) with their
         | programs.
        
           | Agingcoder wrote:
           | Does it work with Pernosco ?
        
             | sidkshatriya wrote:
             | I've not used Pernosco but I am generally aware of it. My
             | guess is that Pernosco would need to be technically
             | modified to support recordings that have "soft ticks" i.e.
             | use Software Counters.
             | 
             | I don't see any compelling reason why this should _not_ be
             | possible from a broad level technical point of view.
             | Pernosco engineers of course would be able to give a more
             | authoritative reply.
        
           | vlovich123 wrote:
           | I'm assuming this changes nothing about the lack of io_uring
           | support?
        
             | sidkshatriya wrote:
             | Yes, io_uring is still not supported due to fundamental
             | issues in the overall rr architecture which my modification
             | does not resolve. My modification only addresses the HW
             | counter requirement of upstream rr and the other core
             | aspects of rr remain the same.
             | 
             | Normal system calls transition to kernel space and return
             | back from kernel space. They will change your program's
             | memory/process state as soon as they complete. This gives
             | rr an easy boundary when it "can do its thing" to record
             | memory/process state changes or insert results (during
             | replay).
             | 
             | When does an io_uring request/response complete ? That's
             | difficult to say. The kernel/userspace when using io_uring
             | communicate with each other by checking a queue head or
             | tail with memory accesses to see if something got
             | added/removed from request/response ring buffer.
             | 
             | Think of io_uring and userspace cooperating via memory.
             | (Yes, sometimes "proper" traditional ring crossing system
             | calls are made but what makes io_uring so fast is
             | communicating via memory and not via system calls most of
             | the time). Anyways all this makes it difficult for rr to
             | intervene on the boundary between kernel and userspace
             | because this boundary is elusive when it comes to io_uring.
             | The memory writes cannot be caught by ptrace ! This
             | explanation is simplified of course.
             | 
             | There are some plans to deal with io_uring by rr
             | maintainers https://github.com/rr-debugger/rr/issues/2613
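A minimal sketch of the memory-based handoff described in the comment above, assuming a single producer and a single consumer. The `Ring` class and its method names are hypothetical and exist only for illustration; this is not the io_uring ABI and it makes no system calls. The point is that a completion becomes visible through an ordinary read of `head` and `tail`, so there is no call boundary at which a ptrace-based recorder would be stopped.

```python
# Toy single-producer/single-consumer ring, loosely modelled on the
# head/tail handoff described above. All names are hypothetical; this
# is not the io_uring ABI and no system calls are involved.


class Ring:
    def __init__(self, size):
        # size must be a power of two so that (index & mask) wraps correctly
        assert size > 0 and size & (size - 1) == 0
        self.slots = [None] * size
        self.mask = size - 1
        self.head = 0  # advanced only by the consumer
        self.tail = 0  # advanced only by the producer

    def push(self, item):
        """Producer side: fill the slot, then publish it by bumping tail."""
        if self.tail - self.head == len(self.slots):
            return False  # ring is full
        self.slots[self.tail & self.mask] = item
        self.tail += 1
        return True

    def pop(self):
        """Consumer side: comparing head and tail is a plain memory read."""
        if self.head == self.tail:
            return None  # nothing has been published yet
        item = self.slots[self.head & self.mask]
        self.head += 1
        return item


completions = Ring(4)

# The "kernel" side publishes two completions purely by writing memory.
completions.push(("read", 512))
completions.push(("write", 128))

# The "userspace" side notices them by polling head/tail.
seen = []
while (entry := completions.pop()) is not None:
    seen.append(entry)
```

A real ring shared between kernel and userspace also needs memory barriers and a wakeup mechanism, both of which this sketch leaves out.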
        
           | CMCDragonkai wrote:
           | Will it run on latest NixOS?
        
       | IshKebab wrote:
       | I had to scroll a _very_ long way to get to the most important
       | bit:
       | 
       | > Running rr record/replay without access to CPU HW performance
       | counters is accomplished using lightweight dynamic (and static)
       | instrumentation. The Software Counters mode rr wiki has more
       | details in case you're curious about some more of the internals.
       | 
       | You should move that to the top.
        
         | sidkshatriya wrote:
         | Good suggestion. Added to the TL;DR section of the blog post !
        
       | db48x wrote:
       | Very cool. It's difficult to praise rr too much, and it just
       | keeps getting better. If you're not using it, you're missing out
       | on a superpower.
        
       | stuaxo wrote:
       | Very nice.
       | 
       | Has anyone got rr working with python?
        
         | sidkshatriya wrote:
         | rr has always worked with Python in the sense that it can
         | record and replay Python programs.
         | 
         | However, when you try to debug the program you can only debug
         | the C code the Python interpreter is written in.
         | 
         | I suppose you want to be able to debug the Python code itself.
         | Here is something that could do this
         | https://pypy.org/posts/2016/07/reverse-debugging-for-python-...
         | . I don't think the project is active nowadays though. Also I
         | haven't used it so can't say whether it is good or not.
         | 
         | It should be possible to build a Python reverse debugger on top
         | of rr. I know this should be possible because I built something
         | for PHP https://github.com/sidkshatriya/dontbug .
         | 
         | There are other fancy (and possibly better) things that are
         | possible -- instead of building a Python debugger atop rr you
         | can record the full trace of the Python program and then, for
         | example, store the values of important variables at each
         | executed line of the Python program in a database. This would
         | again use rr as the record/replay substrate but with a slightly
         | different approach. This is an area in which I've done some
         | work internally but nothing publicly released yet :-) !
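A rough illustration of the "store variable values per executed line in a database" idea from the comment above. This sketch does not use rr at all: it assumes plain in-process tracing with `sys.settrace` and an in-memory SQLite table as stand-ins, and the helper `record_lines` and the `trace` table layout are hypothetical. The unreleased work mentioned in the comment may look entirely different.

```python
import sqlite3
import sys


def record_lines(func, *args):
    """Run func(*args), storing its local variables before each executed line.

    Hypothetical helper: uses in-process tracing, not rr.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE trace (step INTEGER, line INTEGER, name TEXT, value TEXT)"
    )
    code = func.__code__
    step = 0

    def tracer(frame, event, arg):
        nonlocal step
        if frame.f_code is not code:
            return None  # ignore every function except the one being recorded
        if event == "line":
            # Line number relative to the function's "def" line.
            rel = frame.f_lineno - code.co_firstlineno
            for name, value in frame.f_locals.items():
                db.execute(
                    "INSERT INTO trace VALUES (?, ?, ?, ?)",
                    (step, rel, name, repr(value)),
                )
            step += 1
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the trace hook again
    return result, db


def total(n):
    acc = 0
    for i in range(n):
        acc += i
    return acc


result, db = record_lines(total, 3)

# Every recorded value of "acc", in execution order.
history = [
    value
    for (value,) in db.execute(
        "SELECT value FROM trace WHERE name = 'acc' ORDER BY step"
    )
]
```

Once the values are in a table, questions such as "when did `acc` last change?" become ordinary queries rather than interactive stepping sessions.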
        
           | senkora wrote:
           | Do the gdb commands for printing information about python
           | frames work with rr? E.g. py-bt, py-print, py-locals?
        
             | sidkshatriya wrote:
             | Any gdb integration scripts for Python to get stack frames
             | etc. should work fine in rr and Software Counters Mode rr.
             | 
             | `rr replay` and (the software counters equivalent) `rr
             | replay -W` both invoke gdb.
        
         | 29ebJCyy wrote:
         | https://retracesoftware.com/ Is python specific and seems
         | promising.
        
       | pabs3 wrote:
       | Is this getting merged back into rr?
        
         | sidkshatriya wrote:
         | Tough to say right now. Here is a long answer:
         | https://github.com/sidkshatriya/rr.soft?tab=readme-ov-file#w...
        
       | georgewsinger wrote:
       | Still waiting for rr to work more transparently/easily with
       | Haskell.
        
         | sidkshatriya wrote:
         | rr and gdb are very DWARF debugging focussed. As long as
         | Haskell has only basic DWARF debugging support I wonder how
         | much rr/gdb can do.
         | 
         | Though I do see a lot of promise in the future. rr can help
         | make premature evaluation (many expressions are evaluated
         | earlier than they might typically happen in a real Haskell
         | program because a user may want to inspect a value) in the
         | debugger not matter so much because that evaluation can be
         | executed in a diversion session.
        
       | chacham15 wrote:
       | I have tried SO hard to get rr to work for me, including buying a
       | separate pc just to use it...but it just consistently fails so
       | I've basically abandoned it. Something like this would absolutely
       | be a godsend. Just getting something consistently working with
       | Ubuntu is amazing. Does this approach make working in something
       | like WSL viable?
       | 
       | I would love if this were upstreamed. Is there a github issue
       | where you discuss the possibility of this with the rr devs? That
       | might be something to add to your readme for everyone else who
       | wants to follow along. Thanks!
        
         | sidkshatriya wrote:
         | Thanks for the encouraging words! Please do try it out and
         | report back if it worked well or not for you on the issue
         | tracker.
         | 
         | With sufficient usage I think we can make a good case to get
         | merged upstream. This patch introduces dynamic/static
         | instrumentation for ticks counting which is quite different to
         | how things have happened till now on rr. If there are many
         | success stories a stronger case for upstream merge can be made.
         | The rr maintainers are aware of this project but it is early
         | days yet for an upstream merge PR attempt.
        
           | chacham15 wrote:
           | With a big changeset, it's better to have a brief discussion
           | about how it works / what it needs before you actually make a
           | PR. Just big principles, high-level stuff. This way if you
           | build a train station, the devs won't be like "ooh, we really
           | need an airport." That's why an issue to track it is good: it
           | raises visibility for anyone who has an issue with the
           | approach etc. long before it's time to make a merge. Also, if
           | they're like "we'll never take this" or "we'll take this if
           | you build a space station" it's good to know that before
           | investing a ton of time into something PR-able.
        
       | maleldil wrote:
       | Patiently waiting for the day when someone makes something
       | similar for macOS.
        
         | sidkshatriya wrote:
         | It's very difficult for broad-based record/replay software
         | like rr to exist for macOS in my opinion. macOS system
         | interfaces are quite basic in terms of functionality compared
         | to Linux and increasingly locked down.
         | 
         | rr uses many advanced features of Linux `ptrace`. Compare `man
         | ptrace` on Linux with that on macOS for example and you will
         | notice that Linux gives a lot of features to `ptrace` that
         | macOS simply does not.
         | 
         | There are a large number of other features required for
         | practical record and replay -- I don't think macOS provides
         | them either.
         | 
         | It's probably possible to build _some_ record/replay system on
         | macOS with constraints, restrictions, workarounds and
         | compromises -- never say never as they say. But I don't think
         | it can be as capable/generic as rr on Linux.
        
           | transpute wrote:
           | Could this help?
           | https://developer.apple.com/documentation/xcode-release-note...
           | 
           | Instruments 16.3 includes a new
           | Processor Trace Instrument which uses hardware-supported,
           | low-overhead CPU execution tracing to accurately reconstruct
           | execution of the program. This tool provides metrics like
           | duration, number of cycles, and instructions retired for
           | every function executed on the CPU. Timeline in Instruments
           | presents execution flame graph, while detail views provide
           | aggregate-level data like Call Tree or aggregated metrics
           | (min, max, count, sum), divided by function. Traces can be
           | recorded using the new Processor Trace template on supported
           | devices: M4 Mac, M4 iPad, and iPhone 16/16 Pro. Tracing on
           | the device requires additional configuration in the System
           | Settings.
        
       | vchuravy wrote:
       | Very cool. Does the dynamic instrumentation handle JIT emitted
       | code?
        
       ___________________________________________________________________
       (page generated 2025-03-30 23:01 UTC)