[HN Gopher] What rr does
___________________________________________________________________
What rr does
Author : mmarq
Score : 796 points
Date : 2022-06-04 06:28 UTC (16 hours ago)
(HTM) web link (rr-project.org)
(TXT) w3m dump (rr-project.org)
| stewbrew wrote:
| We already have https://r-project.org. Now we have
| https://rr-project.org. So, https://rrr-project.org is next?
| [deleted]
| leoff wrote:
| It is actually a geometric progression, so
| https://rrrr-project.org would be the next one.
| throwamon wrote:
| Common misconception. Actually it's a Fibonacci sequence, so
| the next one really is https://rrr-project.org and then it's
| https://rrrrr-project.org.
|
| This does also mean that there's https://-project.org, and
| that https://r-project.org secretly disambiguates into two
| different projects.
| InfiniteRand wrote:
| Does rr require debug builds? Like if I took a random executable
| on Linux and used rr record, would rr replay work?
| teaearlgraycold wrote:
| You might be relegated to stepping through disassembled machine
| code. I was able to use rr with a home made JIT compiler,
| stepping through JIT'd instructions. So I see no reason why you
| can't at least get that experience with a production binary.
| mstange wrote:
| It works with optimized builds, and it works better with them
| than gdb does.
|
| When you debug an optimized build with debug info in gdb by
| stepping line by line, it is easy to accidentally step "too
| far" and completely lose your place. In rr, you can always step
| back and recover.
| blacksqr wrote:
| Makes every day seem like Talk Like a Pirate Day.
| sfink wrote:
| If you use https://pernos.co/ then you don't need any of this,
| but I have a set of only slightly buggy gdbinit scripts that
| extend the rr debugging experience at
| <https://github.com/hotsphink/sfink-tools>. The main things it
| adds are:
|
| 1. a `log` command that just records whatever you give it into a
| plaintext file, together with its "point in time" according to
| rr. This is useful because when using rr, you tend to move
| forward and backward in time a lot, and it's hard to keep track
| of the actual sequence of events and where you are within them.
| It also creates a checkpoint so you can return to any one of your
| log points. It also has some niceties like replacing any
| expression enclosed in curly brackets with the results of
| executing the gdb expression given, so you can do things like
|
|     log starting execution of Init() with v={v}. About to crash.
|
| 2. a `label` command that lets you assign names to random hex
| values. Then in the output of `p expr` or the above `log` with no
| arguments, which displays the full set of log messages you've
| recorded, it will replace known hex values with their labels.
| This is _so_ much nicer than memorizing numeric values and
| matching them up.
|
|     (rr) p obj
|     $1 = (JSObject*) 0x7f606892a200
|     (rr) label OUTER_OBJ=obj
|     (rr) p $OUTER_OBJ
|     (JSObject*) $OUTER_OBJ
|     (rr) log
|     701/31299795 [c4] starting processing with obj=(JSObject*) $OUTER_OBJ
|     983/31299 [c2] starting processing with obj=(JSObject*) 0x7f6068a2a200
|     2081/7382911 [c3] traversing to (JSObject*) 0x7f6069c2a7e8
|     3316/199 [c1] crashing while accessing field of object (JSObject*) $OUTER_OBJ
|
| The [c2] markers are the automatically-created checkpoints,
| numbered in order that you made those log entries in the
| debugger. It reorders the log messages to show them in execution
| order rather than debugging order. Pernosco has a very similar
| feature called the Notebook (where you only have to click on a
| log entry to view the state at that point in time.)
|
| The scripts are also intended for sharing log files and labels
| between multiple concurrent replays of the same execution, which
| I find useful to have separate windows each maintaining a
| different context (point in time, and portion of the execution
| that I'm examining.) That tends to be the buggier part of the
| scripts, though. ;-)
|
| If you're working with C or C++ (or Rust? haven't tried it), rr
| really is a superpower. I rarely bother using straight gdb
| anymore. It feels crippled.
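The `log` and `label` behaviour described in the comment above can be modelled in a few lines. What follows is a minimal sketch in plain Python, not the actual sfink-tools scripts and not the gdb Python API. The `Notebook` class, the `(event, ticks)` pair standing in for rr's point in time, and the `evaluate` callback are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Notebook:
    """Toy model of the `log` and `label` commands (hypothetical names)."""
    entries: list = field(default_factory=list)  # ((event, ticks), checkpoint, text)
    labels: dict = field(default_factory=dict)   # hex string -> label name

    def label(self, name, value):
        # Remember that this numeric value should be displayed as $NAME.
        self.labels[hex(value)] = name

    def log(self, when, template, evaluate):
        # Replace every {expr} with the result of evaluating expr.
        # `evaluate` stands in for asking the debugger; the caller supplies it.
        text = re.sub(r"\{([^{}]+)\}", lambda m: str(evaluate(m.group(1))), template)
        checkpoint = len(self.entries) + 1  # numbered in debugging order
        self.entries.append((when, checkpoint, text))

    def render(self):
        # Show entries in execution order, with known hex values relabelled.
        def relabel(match):
            name = self.labels.get(match.group(0))
            return "$" + name if name else match.group(0)

        lines = []
        for (event, ticks), checkpoint, text in sorted(self.entries):
            text = re.sub(r"0x[0-9a-f]+", relabel, text)
            lines.append(f"{event}/{ticks} [c{checkpoint}] {text}")
        return lines
```

Logging a late point first and an early point second still renders the early one first, while the `[cN]` markers keep the order in which the entries were made.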
| raydiatian wrote:
| Man. I wish I had this for Typescript.
|
| Well done!
| stefantalpalaru wrote:
| https://github.com/rr-debugger/rr#system-requirements :
|
| "rr currently requires either:
|
| - An Intel CPU with Nehalem (2010) or later microarchitecture.
| - Certain AMD Zen or later processors (see
|   https://github.com/rr-debugger/rr/wiki/Zen)"
| ris58h wrote:
| Could you next time provide some small but meaningful description
| in the title? "Rr" is a little bit short in my opinion.
| [deleted]
| kzrdude wrote:
| The RR debugger
| mmarq wrote:
| Can I edit it?
| kzrdude wrote:
| Normally yes, but maybe not so late after submission.
| Moderators can update it.
| pvg wrote:
| You could but really the title is fine, just like reposting
| after sufficient time is fine. It's ok for a title to require
| a click.
| mlochbaum wrote:
| It also becomes a bit of a cheat in terms of HN points, as
| mobile users miss the link and hit the up arrow next to it.
|
| https://news.ycombinator.com/item?id=30906989
| asicsp wrote:
| For those reading the above comment, but hadn't clicked the
| article link yet (like me):
|
| > _rr aspires to be your primary C/C++ debugging tool for
| Linux, replacing -- well, enhancing -- gdb. You record a
| failure once, then debug the recording, deterministically, as
| many times as you want. The same execution is replayed every
| time._
|
| > _rr also provides efficient reverse execution under gdb. Set
| breakpoints and data watchpoints and quickly reverse-execute to
| where they were hit._
| pvinis wrote:
| maybe "rr, a gdb replacement"
| bajsejohannes wrote:
| That feels like it's underselling it a bit, since gdb does
| not have reverse execution, which is a pretty major
| contribution by rr.
| ncmncm wrote:
| AIUI, Gdb does claim reverse execution, for certain
| targets. So, there are differences, but I don't understand
| them.
| jcranmer wrote:
| gdb claims it. I have not once ever gotten it to work,
| however. For anything seemingly larger than a trivial
| program, the reverse-execution state grows too big and
| needs to be pruned. I also don't think it supports such
| fancy things as "floating point".
| ynik wrote:
| gdb's reverse execution is incredibly slow.
|
| Performance overhead of reverse debuggers:
|
| * gdb: >1000x (note: I never tested this one myself; just
| heard about this overhead in a HN comment a long time
| ago)
|
| * Microsoft WinDbg TimeTravel Debugger: >40x
|
| * rr: 1.5x
|
| rr is the only one fast enough to be used on a regular
| basis -- the others are slow enough that they only make
| sense on particularly nasty bugs (usually memory
| corruption)
| fanf2 wrote:
| There is also a commercial alternative,
| https://undo.io/solutions/products/udb/ though I have not
| used it myself and I don't know what its overhead is. (I
| know some of the people who work on it.)
| zempfel wrote:
| I would be surprised if the overhead of TTD is typically
| 40x, given that it records multithreaded processes in
| parallel. Which, to my knowledge, rr does not.
|
| It also supports selective recording so, if this is
| configured (e.g. selecting certain functions), only a
| subset of the process execution is actually committed to
| the trace file, further reducing the overhead.
| ynik wrote:
| I don't know about multithreaded processes -- the program
| I use it on is single-threaded. My main use case is not to
| make the crashes reproducible (they usually already are),
| but to understand where a bogus value is coming from.
| (memory breakpoint + run backwards) Which often ends up
| being a certain third-party library that makes liberal
| use of C unions, sometimes accessing the wrong variant...
|
| I'll have to look into selective recording, but I'm not
| sure how helpful it'll be in my use case (I don't know
| said library well enough to predict which functions might
| be causing the bogus values)
| db48x wrote:
| Using GDB's reverse execution requires you to already be
| pretty sure where the bug you are looking for is, and
| then recording a very short portion of the program,
| preferably just 10k instructions or so. Recording for a
| whole second could easily take 15 minutes. But it does
| work well, within those limitations.
| teddyh wrote:
| > _gdb does not have reverse execution_
|
| GDB _does_ have reverse execution:
|
| https://sourceware.org/gdb/current/onlinedocs/gdb/Reverse-Ex...
| sph wrote:
| It doesn't look like it's a replacement, it's more a
| companion tool for gdb to deterministically record, replay
| and debug a process after the fact.
| db48x wrote:
| Yea, I wouldn't call it a replacement. It acts as a GDB
| debugging target; basically you connect a GDB process to rr
| and GDB controls rr for you. (To confuse things further,
| the "rr replay" command starts both rr and GDB for you, so
| it can be difficult to see the seams.)
| qumpis wrote:
| Is there a counterpart of this in Python?
| BiteCode_dev wrote:
| There was an attempt for Python 2, but it didn't catch on.
| pizza wrote:
| pytrace https://pytrace.com
| bornfreddy wrote:
| Now _this_ is what project documentation should look like!
| Explains what the project does, gives some short examples of the
| most common features, then goes into the details - while being
| easy to understand for the target audience. Kudos to whoever
| wrote this! (and rr sounds like a nice tool too ;) )
| logbiscuitswave wrote:
| I fully agree. I wish more intro-level documentation had this
| kind of easy to follow and progressive level of detail.
|
| All too many things like this either dive straight into the
| deep end inundating you with superfluous details when all you
| want is a primer, or provide so little information as to be
| nearly useless.
|
| The writers on this did a great job.
| jchw wrote:
| Historically, rr has not worked on AMD processors, which is a
| bummer. However, I have been able to make good use of it on my
| 5950X now with the workaround script and newer versions of rr.
| This is good news.
|
| I've not read their extended technical report, but I am kind of
| curious exactly what performance counters AMD is implementing
| poorly and how that impacts rr.
| vchuravy wrote:
| I think
| https://github.com/rr-debugger/rr/issues/2034#issuecomment-6...
| is the right synopsis.
| KenoFischer wrote:
| vchuravy's link gives the details, but basically, there's a
| microarchitecture optimization in Zen that breaks determinism
| of the performance counters. Fortunately, there's a chicken bit
| that turns it off, which is what the script does. I've been
| trying to convince AMD to officially document the bit such that
| the kernel can set it automatically, but no luck so far.
|
| There is still one remaining annoyance, which is that AMD's NMI
| latency is super high, which directly tanks rr's reverse
| execution latency. There are probably some improvements that
| could be made to the replayer to be more aggressive about
| optimistic assumptions on NMI latency and retrying if those
| assumptions are off, but it'd be a fair bit of work. I don't
| really understand why AMD decided to use this kind of
| architecture. It also makes profiles much less accurate.
| jchw wrote:
| Thanks for the info. I was wondering what was going on in
| that script. It's unfortunate that their architectural
| decisions had to impact rr, but I guess these days, every
| last bit of benchmark score really matters.
| adgjlsfhk1 wrote:
| Breakpoint plus reverse watch is incredibly powerful. It makes it
| trivial to find the code that last modified a variable before a
| breakpoint.
| sph wrote:
| This is incredible!
|
| For those that have used it, how useful is it for debugging
| multithreading heisenbugs? Can I let a process run under rr for
| days, wait until it crashes due to a heisenbug, and replay the
| trace without rr having to go through days of recording? i.e. is
| it possible to fast forward the trace, somehow?
|
| (I nerd sniped myself a bit here, wondering how fast forwarding
| could be implemented. I think it might be achievable with
| periodic process memory snapshots and incremental traces.)
| pm215 wrote:
| rr numbers each 'event' it records, and you can pass an event
| number to the gdb 'run' command to tell it to start from that
| event. Recent 'rr' now also supports the -e option to replay
| meaning 'start the debug session pointing at the last recorded
| event, whatever that was'. Details in the usage page:
| https://github.com/rr-debugger/rr/wiki/Usage
|
| AIUI you get 'start at an event' basically for free, because
| 'step backward' is implemented as 'start at the preceding event
| and then step forward by N', so events are frequent in the
| trace and the machinery to get to that point without running
| all the way from the start of the debug session exists anyway.
| There's some stuff on the website about how this is all
| implemented, I think.
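The idea in the comment above, that stepping backward means restoring an earlier point and stepping forward again, can be sketched with a toy deterministic replay. This is an illustration only and not rr's implementation: the `Replay` class, the integer state, and the `checkpoint_every` parameter are hypothetical.

```python
class Replay:
    """Toy deterministic replay: the state is an integer, each event maps state -> state."""

    def __init__(self, events, checkpoint_every=4):
        self.events = events                  # recorded list of functions state -> state
        self.checkpoint_every = checkpoint_every
        self.checkpoints = {0: 0}             # position -> saved state (initial state is 0)
        self.pos = 0
        self.state = 0

    def step(self):
        # Re-execute one recorded event and save a checkpoint every so often.
        self.state = self.events[self.pos](self.state)
        self.pos += 1
        if self.pos % self.checkpoint_every == 0:
            self.checkpoints[self.pos] = self.state

    def goto(self, target):
        # Restore the nearest checkpoint at or before `target`, then run forward.
        start = max(p for p in self.checkpoints if p <= target)
        self.pos, self.state = start, self.checkpoints[start]
        while self.pos < target:
            self.step()

    def reverse_step(self):
        # "Step backward" is really "go back to a checkpoint and step forward".
        if self.pos > 0:
            self.goto(self.pos - 1)
```

Because the replay is deterministic, arriving at a position from a checkpoint yields the same state as arriving there from the start. The cost of a backward step is bounded by the distance to the previous checkpoint.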
| fanf2 wrote:
| I have not used rr heavily, but I did use it to help find a
| multithreading heisenbug in BIND
| https://kb.isc.org/docs/aa-01606
|
| I could not reproduce the bug in less than an hour of run time,
| which meant that analysing the bug in gdb required an hour for
| it to run forward to the crash point, after which it was
| possible to skip back and forth.
| roca wrote:
| You probably could record a process running for days but it
| would also take days to replay to the end, which would not be
| much fun. We don't create checkpoints during the recording.
|
| You'd be better off restarting the recording periodically.
| Also, rr has a "chaos mode" which randomizes scheduling and
| often makes threading bugs easier to reproduce.
| https://robert.ocallahan.org/2016/02/introducing-rr-chaos-mo...
| Agingcoder wrote:
| Chaos mode works quite well in my experience, definitely
| worth trying if you don't want to wait for days.
|
| I had a heisenbug which would appear once a week, and that I
| couldn't trigger on my workstation. Chaos mode did the trick.
| anarazel wrote:
| I've found that the scheduling quanta with chaos mode are too
| high to hit concurrency issues in a reasonable amount of
| time. And IIUC --num-cpu-ticks is not randomized. So if
| something happens below that tick quantum it's hard to hit.
|
| I wonder if a) rr could randomize the cpu ticks as well, at
| least in chaos mode, b) profiled code could somehow hint to
| rr that a certain instruction would be an "interesting"
| scheduling point.
| roca wrote:
| Chaos mode varies the scale of the tick quantum to try to
| catch stuff like that. It doesn't always work, especially
| if the window of vulnerability to the bug is incredibly
| small (e.g. a few instructions).
| anarazel wrote:
| Hm. Is it possible that that works better with multi-
| threaded than multi-process programs?
| pm215 wrote:
| Hmm, when I asked for 'replay -e' I thought it would be
| faster than 'type run and wait' -- is it not?
| roca wrote:
| No.
| patrec wrote:
| > For those that have used it, how useful is it for debugging
| multithreading heisenbugs?
|
| Not that useful, because as it says on the page (under
| "Limitations"):
|
| > emulates a single-core machine.
| semiquaver wrote:
| Multithreading does not require multiple processors, it has
| existed since long before SMP was a thing.
| db48x wrote:
| But rr does squash all your threads into a single virtual
| cpu core. It context-switches between them, but ultimately
| only one of them is running at a time. This makes it hard
| to capture some kinds of bugs. To compensate it also has a
| chaos mode that randomly stops switching between the
| threads fairly (starving some and giving others more than
| their fair share) in the hopes of triggering those same
| bugs.
|
| For most uses rr is a major win, but for race conditions it
| sometimes doesn't help.
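To see why running all threads on one virtual core can hide a race, and why varying the scheduling can expose it again, here is a toy simulation. It is a sketch under stated assumptions and not rr's scheduler: the "threads" are Python generators, a "step" is one half of a non-atomic increment, and `pick_quantum` is a hypothetical hook that decides how many steps a thread runs before being switched out.

```python
import random


def worker(shared, iterations):
    # Non-atomic increment: the read and the write are separate steps,
    # and the scheduler may switch threads at either yield.
    for _ in range(iterations):
        seen = shared["counter"]
        yield
        shared["counter"] = seen + 1
        yield


def run(pick_quantum, threads=2, iterations=100):
    # Single virtual core: exactly one worker runs at any time.
    shared = {"counter": 0}
    runnable = [worker(shared, iterations) for _ in range(threads)]
    current = 0
    while runnable:
        current %= len(runnable)
        for _ in range(pick_quantum()):
            try:
                next(runnable[current])
            except StopIteration:
                del runnable[current]   # this worker is done
                break
        else:
            current += 1                # quantum used up, switch to the next worker
    return shared["counter"]


def chaotic(seed):
    # Randomized quanta, loosely in the spirit of a chaos mode.
    rng = random.Random(seed)
    return run(lambda: rng.randint(1, 5))
```

With a fixed quantum of 2 steps, no worker is ever interrupted between its read and its write, so `run(lambda: 2)` returns the full 200. With a quantum of 1, every increment is interrupted and `run(lambda: 1)` returns 100. `chaotic(seed)` lands somewhere in between depending on the seed, which is the point: an unlucky regular schedule can make the bug invisible.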
| IshKebab wrote:
| I haven't used it, but it might be quite useful. It does force
| code to run on a single core, so you won't get truly concurrent
| execution which I guess might hide some multithreading bugs. On
| the other hand it does come with "chaos mode" which is
| basically thread schedule fuzzing.
|
| You can "fast forward" the trace as you imagine. rr works by
| recording all non-deterministic input and output to the program
| so it can start from the core dump and step backwards.
|
| As I understand it anyway; I've never actually used it - the
| one time I really wanted something like rr was on a Mac.
| sfink wrote:
| > You can "fast forward" the trace as you imagine. rr works
| by recording all non-deterministic input and output to the
| program so it can start from the core dump and step
| backwards.
|
| Not exactly. rr can't magically inflate a core dump into all
| the open file descriptors and other state accumulated during
| a process's execution. It needs to run from the beginning.
|
| So starting from the beginning, you can let it run to any
| arbitrary point. (And there are ways of knowing useful
| points, eg if you record with -M it will print out event
| counts with anything written to stdout/stderr, so you can
| quickly run with -g to start debugging at the point that
| message was emitted.) But it does still need to run from the
| beginning. And if you're recording a whole process tree, you
| need to start from the initial process and let it go forward
| to your requested point in time.
|
| In practice, I usually use it by starting a replay,
| continuing forward to a crash (or a breakpoint at some line
| if it didn't crash), and only then starting to pay attention
| to what's going on. It's a simple, muscle-memory process to
| get to that point, and if it was a long recording you kind of
| start it up and wait until it's ready. (Which will take
| roughly as long as the initial run took to get to the same
| point. A little slower because of the overhead, a little
| faster because it doesn't actually have to wait for I/O,
| averaging out to a mostly unnoticeable amount slower.)
|
| I always have to mention: one of my favorite things about rr
| is something that doesn't even require all the sophisticated
| machinery. I often want to debug a single process within a
| whole process tree, and with most things there aren't
| --debugger flags (or they're broken). With rr, I can just
| record the whole tree, then pick out the process I care about
| after the fact. It's a small thing, but it saves me from my
| usual hairball of wrapper scripts.
|
| Random example: when debugging a gcc plugin, I record a call
| to gcc, but the actual compile I care about is done by a
| forked cc1plus process.
| amelius wrote:
| Next question: can I do the same with a multi-node system?
| eloff wrote:
| Only somewhat useful as it runs singlethreaded, last I checked.
| That will prevent some forms of heisenbugs from happening.
| omginternets wrote:
| How does this work, exactly? Is it recording every state change
| in the program?
| sfink wrote:
| No, that has been done but is much slower. It records all
| communication with the external world. The full answer is well
| described in https://arxiv.org/pdf/1705.05937.pdf which I'll
| quote here:
|
| > We identify a boundary around state and computation, record
| all sources of nondeterminism within the boundary and all
| inputs crossing into the boundary, and reexecute the
| computation within the boundary by replaying the nondeterminism
| and inputs. If all inputs and nondeterminism have truly been
| captured, the state and computation within the boundary during
| replay will match that during recording.
|
| So for any chunk of time spent entirely in user space doing
| computation, the replay starts out in the same situation and
| executes in exactly the same way the original process did, with
| zero overhead. That's what enables rr to be so low overhead
| overall; most programs spend the bulk of their time computing
| stuff and reading/writing memory. The replayed process has no
| way of knowing that its file descriptors aren't actually open,
| since anything it does with them will be provided by the
| recording. Quoting again:
|
| > In particular, user-space memory and register values are
| preserved exactly, with a few exceptions noted later in the
| paper. This implies CPU-level control flow is identical between
| recording and replay, as is memory layout.
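The record-the-boundary idea quoted above can be shown with a toy example. This is a minimal sketch, not how rr intercepts system calls: the `Recorder` and `Replayer` classes and the `world.ask` interface are hypothetical, and the "program" is an ordinary function whose only nondeterminism comes through that interface.

```python
import random
import time


class Recorder:
    """Runs against the real world and logs every nondeterministic input."""

    def __init__(self):
        self.trace = []

    def ask(self, name, source):
        value = source()
        self.trace.append((name, value))
        return value


class Replayer:
    """Never touches the real world; answers come from the recorded trace."""

    def __init__(self, trace):
        self.pending = iter(trace)

    def ask(self, name, source):
        recorded_name, value = next(self.pending)
        assert recorded_name == name, "replay diverged from the recording"
        return value


def program(world):
    # Everything outside the world.ask calls is deterministic computation,
    # so it behaves identically once the inputs are pinned down.
    stamp = world.ask("clock", time.time_ns)
    dice = [world.ask("random", lambda: random.randint(1, 6)) for _ in range(5)]
    return stamp % 1000 + sum(d * d for d in dice)
```

Running `program(Recorder())` once and then `program(Replayer(trace))` any number of times gives the same result each time. During replay the clock and the random number generator are never consulted, in the same way the comment says a replayed process cannot tell that its file descriptors are not really open.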
| omginternets wrote:
| That is very cool, and very clever! :)
| sirwhinesalot wrote:
| This sounds like a debugger I might actually enjoy using (unlike
| all the others).
| db48x wrote:
| rr is a superpower, but pernosco is several superpowers
| (https://pernos.co/; it's built on rr). I recommend both!
| Agingcoder wrote:
| I second this. Pernosco is just unbelievable.
| pizza wrote:
| what I would give to have something like this that additionally
| worked on Mac and Windows too
| borodi wrote:
| rr is a life changing experience for debugging things. One
| underrated thing is being able to save and share rr traces. rr +
| CI makes finding and potentially fixing heisenbugs a lot easier.
| DoctorOW wrote:
| Previously: https://news.ycombinator.com/item?id=18388879
| mmarq wrote:
| Sorry, I didn't know that. I thought the site would recognise
| duplicated links.
| dang wrote:
| As jwilk correctly says, reposts are fine after a year or so.
| Pointing to previous links with comments is just to satisfy
| users who might be curious for more. You did good!
| [deleted]
| jwilk wrote:
| From the FAQ <https://news.ycombinator.com/newsfaq.html>:
|
| > _Are reposts ok?_
|
| > _If a story has not had significant attention in the last
| year or so, a small number of reposts is ok._
| mkl95 wrote:
| Reposting is OK. I once got an email from HN staff inviting
| me to repost a link.
| speps wrote:
| It's quite common for people to refer to old discussions
| (here 2+ years ago) for popular projects like rr.
| PostOnce wrote:
| Yes, because there may be something to learn from the old
| comments and the new. It's good. Different people comment
| in different eras.
| dang wrote:
| Thanks! Macroexpanded:
|
| _Instant replay: Debugging C and C++ programs with rr_ -
| https://news.ycombinator.com/item?id=27034588 - May 2021 (66
| comments)
|
| _Using time travel to remotely debug faulty DRAM_ -
| https://news.ycombinator.com/item?id=24589597 - Sept 2020 (62
| comments)
|
| _Time Traveling Linux Bug Reporting: Coming in Julia 1.5_ -
| https://news.ycombinator.com/item?id=23069372 - May 2020 (21
| comments)
|
| _rr: lightweight recording and deterministic debugging_ -
| https://news.ycombinator.com/item?id=18388879 - Nov 2018 (52
| comments)
|
| _Rr 5.0 Released_ -
| https://news.ycombinator.com/item?id=15191445 - Sept 2017 (3
| comments)
|
| _Debugging Leaks with rr_ -
| https://news.ycombinator.com/item?id=10573308 - Nov 2015 (4
| comments)
|
| _Back to the Futu-Rr-e: Deterministic Debugging with Rr_ -
| https://news.ycombinator.com/item?id=10492664 - Nov 2015 (9
| comments)
|
| _Rr 4.0 Debugger Released with Reverse Execution_ -
| https://news.ycombinator.com/item?id=10441618 - Oct 2015 (11
| comments)
|
| _Rr records nondeterministic executions and debugs them
| deterministically_ -
| https://news.ycombinator.com/item?id=8817954 - Dec 2014 (9
| comments)
|
| _Rr 3.0 Released with x86-64 Support_ -
| https://news.ycombinator.com/item?id=8734502 - Dec 2014 (6
| comments)
|
| _Porting rr to x86-64_ -
| https://news.ycombinator.com/item?id=8543624 - Nov 2014 (9
| comments)
| logbiscuitswave wrote:
| This kind of replayable debugging can be wonderful - especially
| for hard to debug issues like heap corruption and such.
|
| Windows has something similar called Time Travel Debugging[1] but
| in my experience the dump files it creates can be enormous and be
| a pain to analyze as a result. (It also relies on WinDbg which
| while being extremely powerful and capable, has a huge learning
| and usability cliff. I've been using it for over a decade and I
| still need a cheat sheet from time to time. The revamped WinDbg
| Preview[2] improves the UI a lot, but ultimately it's still
| WinDbg.)
|
| [1] https://docs.microsoft.com/en-us/windows-hardware/drivers/de...
|
| [2] https://docs.microsoft.com/en-us/windows-hardware/drivers/de...
| jng wrote:
| Eager for the day someone integrates this into VS Code.
| hsivonen wrote:
| Already done: https://farre.github.io/midas/
| jchw wrote:
| FWIW, rr integrates into gdb, so it should be possible to use
| anything that integrates with gdb.
|
| https://github.com/rr-debugger/rr/wiki/Using-rr-in-an-IDE
| jonnycomputer wrote:
| Looks like a cool debugging tool, but I clicked the link because
| I thought maybe it was related to R. Maybe modify the title to
| make it clearer?
| [deleted]
___________________________________________________________________
(page generated 2022-06-04 23:01 UTC)