[HN Gopher] What rr does
       ___________________________________________________________________
        
       What rr does
        
       Author : mmarq
       Score  : 796 points
       Date   : 2022-06-04 06:28 UTC (16 hours ago)
        
 (HTM) web link (rr-project.org)
 (TXT) w3m dump (rr-project.org)
        
       | stewbrew wrote:
       | We already have https://r-project.org. Now we have
       | https://rr-project.org. So, https://rrr-project.org is next?
        
         | leoff wrote:
         | It is actually a geometric progression, so
         | https://rrrr-project.org would be the next one.
        
           | throwamon wrote:
           | Common misconception. Actually it's a Fibonacci sequence, so
           | the next one really is https://rrr-project.org and then it's
           | https://rrrrr-project.org.
           | 
           | This does also mean that there's https://-project.org, and
           | that https://r-project.org secretly disambiguates into two
           | different projects.
        
       | InfiniteRand wrote:
       | Does rr require debug builds? Like if I took a random executable
       | on Linux and used rr record, would rr replay work?
        
         | teaearlgraycold wrote:
         | You might be relegated to stepping through disassembled machine
         | code. I was able to use rr with a home made JIT compiler,
         | stepping through JIT'd instructions. So I see no reason why you
         | can't at least get that experience with a production binary.
        
         | mstange wrote:
         | It works with optimized builds, and it works better with them
         | than gdb does.
         | 
         | When you debug an optimized build with debug info in gdb by
         | stepping line by line, it is easy to accidentally step "too
         | far" and completely lose your place. In rr, you can always step
         | back and recover.
        
       | blacksqr wrote:
       | Makes every day seem like Talk Like a Pirate Day.
        
       | sfink wrote:
       | If you use https://pernos.co/ then you don't need any of this,
       | but I have a set of only slightly buggy gdbinit scripts that
       | extend the rr debugging experience at
       | <https://github.com/hotsphink/sfink-tools>. The main things it
       | adds are:
       | 
       | 1. a `log` command that just records whatever you give it into a
       | plaintext file, together with its "point in time" according to
       | rr. This is useful because when using rr, you tend to move
       | forward and backward in time a lot, and it's hard to keep track
       | of the actual sequence of events and where you are within them.
       | It also creates a checkpoint so you can return to any one of your
       | log points. It also has some niceties like replacing any
       | expression enclosed in curly brackets with the results of
       | executing the gdb expression given, so you can do things like
       | 
       |     log starting execution of Init() with v={v}. About to crash.
       | 
       | 2. a `label` command that lets you assign names to random hex
       | values. Then in the output of `p expr` or the above `log` with no
       | arguments, which displays the full set of log messages you've
       | recorded, it will replace known hex values with their labels.
       | This is _so_ much nicer than memorizing numeric values and
       | matching them up.
       | 
       |     (rr) p obj
       |     $1 = (JSObject*) 0x7f606892a200
       |     (rr) label OUTER_OBJ=obj
       |     (rr) p $OUTER_OBJ
       |     (JSObject*) $OUTER_OBJ
       |     (rr) log
       |     701/31299795 [c4] starting processing with obj=(JSObject*) $OUTER_OBJ
       |     983/31299 [c2] starting processing with obj=(JSObject*) 0x7f6068a2a200
       |     2081/7382911 [c3] traversing to (JSObject*) 0x7f6069c2a7e8
       |     3316/199 [c1] crashing while accessing field of object (JSObject*) $OUTER_OBJ
       | 
       | The [c2] markers are the automatically-created checkpoints,
       | numbered in order that you made those log entries in the
       | debugger. It reorders the log messages to show them in execution
       | order rather than debugging order. Pernosco has a very similar
       | feature called the Notebook (where you only have to click on a
       | log entry to view the state at that point in time.)
       | 
       | The scripts are also intended for sharing log files and labels
       | between multiple concurrent replays of the same execution, which
       | I find useful to have separate windows each maintaining a
       | different context (point in time, and portion of the execution
       | that I'm examining.) That tends to be the buggier part of the
       | scripts, though. ;-)
       | 
       | If you're working with C or C++ (or Rust? haven't tried it), rr
       | really is a superpower. I rarely bother using straight gdb
       | anymore. It feels crippled.
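
The `log` and `label` behaviour described in the comment above can be illustrated with a small standalone sketch. This is not the code from sfink-tools: the function names (`expand_braces`, `apply_labels`, `render_log`) are hypothetical, the evaluator is a plain callback rather than gdb, and treating the two leading numbers of each log line as a sortable (event, ticks) pair is an assumption.

```python
import re


def expand_braces(message, evaluate):
    """Replace each {expr} in message with str(evaluate(expr))."""
    return re.sub(r"\{([^{}]+)\}", lambda m: str(evaluate(m.group(1))), message)


def apply_labels(text, labels):
    """Replace hex values that have a label with $LABEL; leave others alone."""
    by_value = {value: name for name, value in labels.items()}

    def swap(match):
        name = by_value.get(int(match.group(0), 16))
        return f"${name}" if name else match.group(0)

    return re.sub(r"0x[0-9a-fA-F]+", swap, text)


def render_log(entries, labels):
    """Format (event, ticks, checkpoint, message) tuples in execution order."""
    ordered = sorted(entries, key=lambda entry: (entry[0], entry[1]))
    return [
        f"{event}/{ticks} [c{checkpoint}] {apply_labels(message, labels)}"
        for event, ticks, checkpoint, message in ordered
    ]


# Entries are listed in the order they were logged (checkpoints c1..c4),
# which is not the order in which they happened during execution.
labels = {"OUTER_OBJ": 0x7F606892A200}
entries = [
    (3316, 199, 1, "crashing while accessing field of object (JSObject*) 0x7f606892a200"),
    (983, 31299, 2, "starting processing with obj=(JSObject*) 0x7f6068a2a200"),
    (2081, 7382911, 3, "traversing to (JSObject*) 0x7f6069c2a7e8"),
    (701, 31299795, 4, "starting processing with obj=(JSObject*) 0x7f606892a200"),
]
for line in render_log(entries, labels):
    print(line)
```

Sorting by the recorded position rather than by checkpoint number is what turns "debugging order" into "execution order"; the label pass runs last so that every displayed address goes through the same substitution.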
        
       | raydiatian wrote:
       | Man. I wish I had this for Typescript.
       | 
       | Well done!
        
       | stefantalpalaru wrote:
       | https://github.com/rr-debugger/rr#system-requirements :
       | 
       | "rr currently requires either:
       | 
       |   - An Intel CPU with Nehalem (2010) or later microarchitecture.
       |   - Certain AMD Zen or later processors (see
       |     https://github.com/rr-debugger/rr/wiki/Zen)"
        
       | ris58h wrote:
       | Could you next time provide some small but meaningful description
       | in the title? "Rr" is a little bit short in my opinion.
        
         | kzrdude wrote:
         | The RR debugger
        
         | mmarq wrote:
         | Can I edit it?
        
           | kzrdude wrote:
           | Normally yes, but maybe not so late after submission.
           | Moderators can update it.
        
           | pvg wrote:
           | You could but really the title is fine, just like reposting
           | after sufficient time is fine. It's ok for a title to require
           | a click.
        
         | mlochbaum wrote:
         | It also becomes a bit of a cheat in terms of HN points, as
         | mobile users miss the link and hit the up arrow next to it.
         | 
         | https://news.ycombinator.com/item?id=30906989
        
         | asicsp wrote:
         | For those reading the above comment, but hadn't clicked the
         | article link yet (like me):
         | 
         | > _rr aspires to be your primary C/C++ debugging tool for
         | Linux, replacing -- well, enhancing -- gdb. You record a
         | failure once, then debug the recording, deterministically, as
         | many times as you want. The same execution is replayed every
         | time._
         | 
         | > _rr also provides efficient reverse execution under gdb. Set
         | breakpoints and data watchpoints and quickly reverse-execute to
         | where they were hit._
        
         | pvinis wrote:
         | maybe "rr, a gdb replacement"
        
           | bajsejohannes wrote:
           | That feels like it's underselling it a bit, since gdb does
           | not have reverse execution, which is a pretty major
           | contribution by rr.
        
             | ncmncm wrote:
             | AIUI, Gdb does claim reverse execution, for certain
             | targets. So, there are differences, but I don't understand
             | them.
        
               | jcranmer wrote:
               | gdb claims it. I have not once ever gotten it to work,
               | however. For anything seemingly larger than a trivial
               | program, the reverse-execution state grows too big and
               | needs to be pruned. I also don't think it supports such
               | fancy things as "floating point".
        
               | ynik wrote:
               | gdb's reverse execution is incredibly slow.
               | 
               | Performance overhead of reverse debuggers:
               | 
               | * gdb: >1000x (note: I never tested this one myself; just
               | heard about this overhead in a HN comment a long time
               | ago)
               | 
               | * Microsoft WinDbg TimeTravel Debugger: >40x
               | 
               | * rr: 1.5x
               | 
               | rr is the only one fast enough to be used on a regular
               | basis -- the others are slow enough that they only make
               | sense on particularly nasty bugs (usually memory
               | corruption)
        
               | fanf2 wrote:
               | There is also a commercial alternative,
               | https://undo.io/solutions/products/udb/ though I have not
               | used it myself and I don't know what its overhead is. (I
               | know some of the people who work on it.)
        
               | zempfel wrote:
               | I would be surprised if the overhead of TTD is typically
               | 40x, given that it records multithreaded processes in
               | parallel. Which, to my knowledge, rr does not.
               | 
               | It also supports selective recording so, if this is
               | configured (e.g. selecting certain functions), only a
               | subset of the process execution is actually committed to
               | the trace file, further reducing the overhead.
        
               | ynik wrote:
               | I don't know about multithreaded processes -- the program
               | I use it on is single-threaded. My main use case is not to
               | make the crashes reproducible (they usually already are),
               | but to understand where a bogus value is coming from.
               | (memory breakpoint + run backwards) Which often ends up
               | being a certain third-party library that makes liberal
               | use of C unions, sometimes accessing the wrong variant...
               | 
               | I'll have to look into selective recording, but I'm not
               | sure how helpful it'll be in my use case (I don't know
               | said library well enough to predict which functions might
               | be causing the bogus values)
        
               | db48x wrote:
               | Using GDB's reverse execution requires you to already be
               | pretty sure where the bug you are looking for is, and
               | then recording a very short portion of the program,
               | preferably just 10k instructions or so. Recording for a
               | whole second could easily take 15 minutes. But it does
               | work well, within those limitations.
        
             | teddyh wrote:
             | > _gdb does not have reverse execution_
             | 
             | GDB _does_ have reverse execution:
             | 
             | https://sourceware.org/gdb/current/onlinedocs/gdb/Reverse-Ex...
        
           | sph wrote:
           | It doesn't look like it's a replacement, it's more a
           | companion tool for gdb to deterministically record, replay
           | and debug a process after the fact.
        
             | db48x wrote:
             | Yea, I wouldn't call it a replacement. It acts as a GDB
             | debugging target; basically you connect a GDB process to rr
             | and GDB controls rr for you. (To confuse things further,
             | the "rr replay" command starts both rr and GDB for you, so
             | it can be difficult to see the seams.)
        
       | qumpis wrote:
       | Is there a counterpart of this in Python?
        
         | BiteCode_dev wrote:
         | There was an attempt for Python 2, but it didn't catch on.
        
         | pizza wrote:
         | pytrace https://pytrace.com
        
       | bornfreddy wrote:
       | Now _this_ is what project documentation should look like!
       | Explains what the project does, gives some short examples of the
       | most common features, then goes into the details - while being
       | easy to understand for the target audience. Kudos to whoever
       | wrote this! (and rr sounds like a nice tool too ;) )
        
         | logbiscuitswave wrote:
         | I fully agree. I wish more intro-level documentation had this
         | kind of easy to follow and progressive level of detail.
         | 
         | All too many things like this either dive straight into the
         | deep end inundating you with superfluous details when all you
         | want is a primer, or provide so little information as to be
         | nearly useless.
         | 
         | The writers on this did a great job.
        
       | jchw wrote:
       | Historically, rr has not worked on AMD processors, which is a
       | bummer. However, I have been able to make good use of it on my
       | 5950X now with the workaround script and newer versions of rr.
       | This is good news.
       | 
       | I've not read their extended technical report, but I am kind of
       | curious exactly what performance counters AMD is implementing
       | poorly and how that impacts rr.
        
         | vchuravy wrote:
         | I think
         | https://github.com/rr-debugger/rr/issues/2034#issuecomment-6...
         | is the right synopsis.
        
         | KenoFischer wrote:
         | vchuravy's link gives the details, but basically, there's a
         | microarchitecture optimization in Zen that breaks determinism
         | of the performance counters. Fortunately, there's a chicken bit
         | that turns it off, which is what the script does. I've been
         | trying to convince AMD to officially document the bit such that
         | the kernel can set it automatically, but no luck so far.
         | 
         | There is still one remaining annoyance, which is that AMD's NMI
         | latency is super high, which directly tanks rr's reverse
         | execution latency. There are probably some improvements that
         | could be made to the replayer to be more aggressive about
         | optimistic assumptions on NMI latency and retrying if those
         | assumptions are off, but it'd be a fair bit of work. I don't
         | really understand why AMD decided to use this kind of
         | architecture. It also makes profiles much less accurate.
        
           | jchw wrote:
           | Thanks for the info. I was wondering what was going on in
           | that script. It's unfortunate that their architectural
           | decisions had to impact rr, but I guess these days, every
           | last bit of benchmark score really matters.
        
       | adgjlsfhk1 wrote:
       | Breakpoint plus reverse watch is incredibly powerful. It makes it
       | trivial to find the code that last modified a variable before a
       | breakpoint.
        
       | sph wrote:
       | This is incredible!
       | 
       | For those that have used it, how useful is it for debugging
       | multithreading heisenbugs? Can I let a process run under rr for
       | days, wait until it crashes due to a heisenbug, and replay the
       | trace without rr having to go through days of recording? i.e. is
       | it possible to fast forward the trace, somehow?
       | 
       | (I nerd sniped myself a bit here, wondering how fast forwarding
       | could be implemented. I think it might be achievable with
       | periodic process memory snapshots and incremental traces.)
        
         | pm215 wrote:
         | rr numbers each 'event' it records, and you can pass an event
         | number to the gdb 'run' command to tell it to start from that
         | event. Recent 'rr' now also supports the -e option to replay
         | meaning 'start the debug session pointing at the last recorded
         | event, whatever that was'. Details in the usage page:
         | https://github.com/rr-debugger/rr/wiki/Usage
         | 
         | AIUI you get 'start at an event' basically for free, because
         | 'step backward' is implemented as 'start at the preceding event
         | and then step forward by N', so events are frequent in the
         | trace and the machinery to get to that point without running
         | all the way from the start of the debug session exists anyway.
         | There's some stuff on the website about how this is all
         | implemented, I think.
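
The "step backward is really: go to an earlier known point, then run forward" idea from the comment above can be shown with a toy model. This is only a sketch of the general technique, not rr's implementation: the `ToyReplayer` class, the dict-based state, and the fixed snapshot interval are all assumptions, and the snapshots here are taken during a forward replay pass rather than during recording.

```python
def replay(program, state, start, stop):
    """Deterministically re-run steps [start, stop) on a copy of state."""
    state = dict(state)
    for step in program[start:stop]:
        step(state)
    return state


class ToyReplayer:
    def __init__(self, program, initial_state, snapshot_every=4):
        self.program = program
        self.snapshots = {}  # step index -> state before that step runs
        state = dict(initial_state)
        for index, step in enumerate(program):
            if index % snapshot_every == 0:
                self.snapshots[index] = dict(state)
            step(state)
        self.position = len(program)  # number of steps executed so far
        self.state = state

    def goto(self, target):
        """Restore the nearest earlier snapshot and run forward to target."""
        base = max(i for i in self.snapshots if i <= target)
        self.state = replay(self.program, self.snapshots[base], base, target)
        self.position = target
        return self.state

    def reverse_step(self):
        """Go back one step by replaying forward to position - 1."""
        if self.position == 0:
            raise ValueError("already at the start of the trace")
        return self.goto(self.position - 1)


def make_step(i):
    def step(state):
        state["x"] = state["x"] * 3 + i
    return step


program = [make_step(i) for i in range(10)]
replayer = ToyReplayer(program, {"x": 1})
print(replayer.position, replayer.state)
print(replayer.reverse_step())  # state after 9 steps, rebuilt from snapshot 8
```

Nothing is ever executed backwards here. Because every step is deterministic, re-running from a snapshot always reproduces the same intermediate states, so the cost of a reverse step is bounded by the distance to the nearest earlier snapshot.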
        
         | fanf2 wrote:
         | I have not used rr heavily, but I did use it to help find a
         | multithreading heisenbug in BIND
         | https://kb.isc.org/docs/aa-01606
         | 
         | I could not reproduce the bug in less than an hour of run time,
         | which meant that analysing the bug in gdb required an hour for
         | it to run forward to the crash point, after which it was
         | possible to skip back and forth.
        
         | roca wrote:
         | You probably could record a process running for days but it
         | would also take days to replay to the end, which would not be
         | much fun. We don't create checkpoints during the recording.
         | 
         | You'd be better off restarting the recording periodically.
         | Also, rr has a "chaos mode" which randomizes scheduling and
         | often makes threading bugs easier to reproduce.
         | https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo...
        
           | Agingcoder wrote:
           | Chaos mode works quite well in my experience, definitely
           | worth trying if you don't want to wait for days.
           | 
           | I had a heisenbug which would appear once a week, and that I
           | couldn't trigger on my workstation. Chaos mode did the trick.
        
           | anarazel wrote:
           | I've found that the scheduling quanta with chaos mode are too
           | high to hit concurrency issues in a reasonable amount of
           | time. And IIUC --num-cpu-ticks is not randomized. So if
           | something happens below that tick quantum it's hard to hit.
           | 
           | I wonder if a) rr could randomize the cpu ticks as well, at
           | least in chaos mode, b) profiled code could somehow hint to
           | rr that a certain instruction would be an "interesting"
           | scheduling point.
        
             | roca wrote:
             | Chaos mode varies the scale of the tick quantum to try to
             | catch stuff like that. It doesn't always work, especially
             | if the window of vulnerability to the bug is incredibly
             | small (e.g. a few instructions).
        
               | anarazel wrote:
               | Hm. Is it possible that that works better with multi-
               | threaded than multi-process programs?
        
           | pm215 wrote:
           | Hmm, when I asked for 'replay -e' I thought it would be
           | faster than 'type run and wait' -- is it not?
        
             | roca wrote:
             | No.
        
         | patrec wrote:
         | > For those that have used it, how useful is it for debugging
         | multithreading heisenbugs?
         | 
         | Not that useful, because as it says on the page (under
         | "Limitations"):
         | 
         | > emulates a single-core machine.
        
           | semiquaver wrote:
           | Multithreading does not require multiple processors, it has
           | existed since long before SMP was a thing.
        
             | db48x wrote:
             | But rr does squash all your threads into a single virtual
             | cpu core. It context-switches between them, but ultimately
             | only one of them is running at a time. This makes it hard
             | to capture some kinds of bugs. To compensate it also has a
             | chaos mode that randomly stops switching between the
             | threads fairly (starving some and giving others more than
             | their fair share) in the hopes of triggering those same
             | bugs.
             | 
             | For most uses rr is a major win, but for race conditions it
             | sometimes doesn't help.
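
To make the single-core point concrete, here is a toy scheduler in which only one "thread" runs at a time and the only thing that varies is when the switch happens. It is a sketch under stated assumptions, not rr's chaos mode: threads are Python generators, a "quantum" is a number of generator steps, and the randomized variant simply draws slice lengths from a seeded RNG.

```python
import random


def worker(shared, iterations):
    """Increment a shared counter non-atomically: read, pause, write, pause."""
    for _ in range(iterations):
        value = shared["counter"]      # read
        yield
        shared["counter"] = value + 1  # write
        yield


def run(iterations, pick_quantum):
    """Run two workers on one virtual core; pick_quantum() gives slice length."""
    shared = {"counter": 0}
    runnable = [worker(shared, iterations), worker(shared, iterations)]
    current = 0
    while runnable:
        current %= len(runnable)
        thread = runnable[current]
        for _ in range(pick_quantum()):
            try:
                next(thread)
            except StopIteration:
                runnable.pop(current)
                current -= 1  # the increment below then lands on the next thread
                break
        current += 1
    return shared["counter"]


def run_randomized(iterations, seed):
    """Same scheduler, but slice lengths come from a seeded RNG."""
    rng = random.Random(seed)
    return run(iterations, lambda: rng.randint(1, 4))


print(run(50, lambda: 2))     # switches only between complete read/write pairs
print(run(50, lambda: 1))     # switches between every read and its write
print(run_randomized(50, 0))  # repeatable for a given seed
```

With a slice length of 2 every read is followed by its write before the other worker runs, so no update is lost and the race stays hidden. With a slice length of 1 both workers read the same value before either writes, so half the updates disappear. The randomized run falls somewhere in that range and, because the schedule is derived from the seed, the same seed reproduces the same interleaving.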
        
         | IshKebab wrote:
         | I haven't used it, but it might be quite useful. It does force
         | code to run on a single core, so you won't get truly concurrent
         | execution which I guess might hide some multithreading bugs. On
         | the other hand it does come with "chaos mode" which is
         | basically thread schedule fuzzing.
         | 
         | You can "fast forward" the trace as you imagine. rr works by
         | recording all non-deterministic input and output to the program
         | so it can start from the core dump and step backwards.
         | 
         | As I understand it anyway; I've never actually used it - the
         | one time I really wanted something like rr was on a Mac.
        
           | sfink wrote:
           | > You can "fast forward" the trace as you imagine. rr works
           | by recording all non-deterministic input and output to the
           | program so it can start from the core dump and step
           | backwards.
           | 
           | Not exactly. rr can't magically inflate a core dump into all
           | the open file descriptors and other state accumulated during
           | a process's execution. It needs to run from the beginning.
           | 
           | So starting from the beginning, you can let it run to any
           | arbitrary point. (And there are ways of knowing useful
           | points, eg if you record with -M it will print out event
           | counts with anything written to stdout/stderr, so you can
           | quickly run with -g to start debugging at the point that
           | message was emitted.) But it does still need to run from the
           | beginning. And if you're recording a whole process tree, you
           | need to start from the initial process and let it go forward
           | to your requested point in time.
           | 
           | In practice, I usually use it by starting a replay,
           | continuing forward to a crash (or a breakpoint at some line
           | if it didn't crash), and only then starting to pay attention
           | to what's going on. It's a simple, muscle-memory process to
           | get to that point, and if it was a long recording you kind of
           | start it up and wait until it's ready. (Which will take
           | roughly as long as the initial run took to get to the same
           | point. A little slower because of the overhead, a little
           | faster because it doesn't actually have to wait for I/O,
           | averaging out to a mostly unnoticeable amount slower.)
           | 
           | I always have to mention: one of my favorite things about rr
           | is something that doesn't even require all the sophisticated
           | machinery. I often want to debug a single process within a
           | whole process tree, and with most things there aren't
           | --debugger flags (or they're broken). With rr, I can just
           | record the whole tree, then pick out the process I care about
           | after the fact. It's a small thing, but it saves me from my
           | usual hairball of wrapper scripts.
           | 
           | Random example: when debugging a gcc plugin, I record a call
           | to gcc, but the actual compile I care about is done by a
           | forked cc1plus process.
        
         | amelius wrote:
         | Next question: can I do the same with a multi-node system?
        
         | eloff wrote:
         | Only somewhat useful as it runs single-threaded, last I checked.
         | That will prevent some forms of heisenbugs from happening.
        
       | omginternets wrote:
       | How does this work, exactly? Is it recording every state change
       | in the program?
        
         | sfink wrote:
         | No, that has been done but is much slower. It records all
         | communication with the external world. The full answer is well
         | described in https://arxiv.org/pdf/1705.05937.pdf which I'll
         | quote here:
         | 
         | > We identify a boundary around state and computation, record
         | all sources of nondeterminism within the boundary and all
         | inputs crossing into the boundary, and reexecute the
         | computation within the boundary by replaying the nondeterminism
         | and inputs. If all inputs and nondeterminism have truly been
         | captured, the state and computation within the boundary during
         | replay will match that during recording.
         | 
         | So for any chunk of time spent entirely in user space doing
         | computation, the replay starts out in the same situation and
         | executes in exactly the same way the original process did, with
         | zero overhead. That's what enables rr to be so low overhead
         | overall; most programs spend the bulk of their time computing
         | stuff and reading/writing memory. The replayed process has no
         | way of knowing that its file descriptors aren't actually open,
         | since anything it does with them will be provided by the
         | recording. Quoting again:
         | 
         | > In particular, user-space memory and register values are
         | preserved exactly, with a few exceptions noted later in the
         | paper. This implies CPU-level control flow is identical between
         | recording and replay, as is memory layout.
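
The boundary idea quoted above can be illustrated at a much coarser level than rr works at. The sketch below is an assumption-laden toy: the boundary is a Python object rather than the system-call interface, and the names `RecordingBoundary`, `ReplayingBoundary` and `computation` are hypothetical. It records every nondeterministic input that crosses the boundary, then re-executes the same computation while feeding the recorded values back in.

```python
import random
import time


class RecordingBoundary:
    """Performs each nondeterministic call for real and logs its result."""

    def __init__(self):
        self.trace = []

    def call(self, source):
        value = source()
        self.trace.append(value)
        return value


class ReplayingBoundary:
    """Answers each call from the trace; the real source is never invoked."""

    def __init__(self, trace):
        self._values = iter(trace)

    def call(self, source):
        return next(self._values)


def computation(io):
    """Deterministic code whose only outside inputs go through io.call()."""
    total = 0
    for _ in range(5):
        sample = io.call(lambda: random.randint(0, 100))
        total = total * 31 + sample
    stamp = io.call(time.time)
    return total, stamp


recorder = RecordingBoundary()
original = computation(recorder)
replayed = computation(ReplayingBoundary(recorder.trace))
print(original == replayed)
print(len(recorder.trace), "values recorded")
```

Only the six values that crossed the boundary are stored; the arithmetic inside `computation` is not recorded at all and simply runs again at full speed during replay. That is the property the comment attributes rr's low overhead to, with the caveat that a real recorder also has to capture sources of nondeterminism that this toy ignores, such as scheduling and signals.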
        
           | omginternets wrote:
           | That is very cool, and very clever! :)
        
       | sirwhinesalot wrote:
       | This sounds like a debugger I might actually enjoy using (unlike
       | all the others).
        
         | db48x wrote:
         | rr is a superpower, but pernosco is several superpowers
         | (https://pernos.co/; it's built on rr). I recommend both!
        
           | Agingcoder wrote:
           | I second this. Pernosco is just unbelievable.
        
       | pizza wrote:
       | what I would give to have something like this that additionally
       | worked on Mac and Windows too
        
       | borodi wrote:
       | rr is a life changing experience for debugging things. One
       | underrated thing is being able to save and share rr traces. rr +
       | CI makes finding and potentially fixing heisenbugs a lot easier.
        
       | DoctorOW wrote:
       | Previously: https://news.ycombinator.com/item?id=18388879
        
         | mmarq wrote:
         | Sorry, I didn't know that. I thought the site would recognise
         | duplicated links.
        
           | dang wrote:
           | As jwilk correctly says, reposts are fine after a year or so.
           | Pointing to previous links with comments is just to satisfy
           | users who might be curious for more. You did good!
        
           | jwilk wrote:
           | From the FAQ <https://news.ycombinator.com/newsfaq.html>:
           | 
           | > _Are reposts ok?_
           | 
           | > _If a story has not had significant attention in the last
           | year or so, a small number of reposts is ok._
        
           | mkl95 wrote:
           | Reposting is OK. I once got an email from HN staff inviting
           | me to repost a link.
        
           | speps wrote:
           | It's quite common for people to refer to old discussions
           | (here 2+ years ago) for popular projects like rr.
        
             | PostOnce wrote:
             | Yes, because there may be something to learn from the old
             | comments and the new. It's good. Different people comment
             | in different eras.
        
         | dang wrote:
         | Thanks! Macroexpanded:
         | 
         |  _Instant replay: Debugging C and C++ programs with rr_ -
         | https://news.ycombinator.com/item?id=27034588 - May 2021 (66
         | comments)
         | 
         |  _Using time travel to remotely debug faulty DRAM_ -
         | https://news.ycombinator.com/item?id=24589597 - Sept 2020 (62
         | comments)
         | 
         |  _Time Traveling Linux Bug Reporting: Coming in Julia 1.5_ -
         | https://news.ycombinator.com/item?id=23069372 - May 2020 (21
         | comments)
         | 
         |  _rr: lightweight recording and deterministic debugging_ -
         | https://news.ycombinator.com/item?id=18388879 - Nov 2018 (52
         | comments)
         | 
         |  _Rr 5.0 Released_ -
         | https://news.ycombinator.com/item?id=15191445 - Sept 2017 (3
         | comments)
         | 
         |  _Debugging Leaks with rr_ -
         | https://news.ycombinator.com/item?id=10573308 - Nov 2015 (4
         | comments)
         | 
         |  _Back to the Futu-Rr-e: Deterministic Debugging with Rr_ -
         | https://news.ycombinator.com/item?id=10492664 - Nov 2015 (9
         | comments)
         | 
         |  _Rr 4.0 Debugger Released with Reverse Execution_ -
         | https://news.ycombinator.com/item?id=10441618 - Oct 2015 (11
         | comments)
         | 
         |  _Rr records nondeterministic executions and debugs them
         | deterministically_ -
         | https://news.ycombinator.com/item?id=8817954 - Dec 2014 (9
         | comments)
         | 
         |  _Rr 3.0 Released with x86-64 Support_ -
         | https://news.ycombinator.com/item?id=8734502 - Dec 2014 (6
         | comments)
         | 
         |  _Porting rr to x86-64_ -
         | https://news.ycombinator.com/item?id=8543624 - Nov 2014 (9
         | comments)
        
       | logbiscuitswave wrote:
       | This kind of replayable debugging can be wonderful - especially
       | for hard to debug issues like heap corruption and such.
       | 
       | Windows has something similar called Time Travel Debugging[1] but
       | in my experience the dump files it creates can be enormous and be
       | a pain to analyze as a result. (It also relies on WinDbg which
       | while being extremely powerful and capable, has a huge learning
       | and usability cliff. I've been using it for over a decade and I
       | still need a cheat sheet from time to time. The revamped WinDbg
       | Preview[2] improves the UI a lot, but ultimately it's still
       | WinDbg.)
       | 
       | [1] https://docs.microsoft.com/en-us/windows-hardware/drivers/de...
       | 
       | [2] https://docs.microsoft.com/en-us/windows-hardware/drivers/de...
        
       | jng wrote:
       | Eager for the day someone integrates this into VS Code.
        
         | hsivonen wrote:
         | Already done: https://farre.github.io/midas/
        
         | jchw wrote:
         | FWIW, rr integrates into gdb, so it should be possible to use
         | anything that integrates with gdb.
         | 
         | https://github.com/rr-debugger/rr/wiki/Using-rr-in-an-IDE
        
       | jonnycomputer wrote:
       | Looks like a cool debugging tool, but I clicked the link because
       | I thought maybe it was related to R. Maybe modify the title to
       | make it clearer?
        
       ___________________________________________________________________
       (page generated 2022-06-04 23:01 UTC)