[HN Gopher] Reverse Debugging at Scale
___________________________________________________________________
Reverse Debugging at Scale
Author : kiyanwang
Score : 57 points
Date : 2021-05-03 11:26 UTC (11 hours ago)
(HTM) web link (engineering.fb.com)
(TXT) w3m dump (engineering.fb.com)
| jeffbee wrote:
| It is fairly surprising to me that FB would pay a roughly 5%
| throughput penalty to get this.
| adamfeldman wrote:
| Where does the 5% number come from? I didn't see it in the
| article or on the linked page for Intel Processor Trace.
| jeffbee wrote:
| Pulled from my experience. How much stuff needs to be logged
| depends on the structure of the program, and different
| programs are more or less sensitive to having some of their
| memory store bandwidth stolen.
| adamfeldman wrote:
| Thank you for sharing!
| sacheendra wrote:
| It is not that surprising. Facebook is a complex ecosystem, and
| therefore experiences lots of outages/problems.
|
| Looking at their status page
| (https://developers.facebook.com/status/dashboard/), they seem
| to have problems every week. These are only the public ones!
|
| I guess they figured they are losing more money due to these
| problems than the additional 5% they have to spend on
| infrastructure.
| jeffbee wrote:
| I can see why an org would choose to do this, but the number
| is still frightening. At Google, we were held to a limit of
| at most 0.01% cost for sampling LBR events. 5% for
| debuggability just seems really high.
| slver wrote:
| In a car, you get the best speed when you press the pedal to
| the metal and close your eyes. Yet we pay a performance penalty
| and drive with our eyes open instead.
| r00fus wrote:
| The car analogy is not applicable - think instead of lowering
| the speed of an entire high-speed roadway by 5 mph (e.g. via
| road material/quality) - that has a qualitative difference at
| that scale.
| slver wrote:
| The point is that slowing down a bit allows us to see what's
| happening, so we can make course corrections and crash less.
|
| That's an understandable tradeoff for car driving.
|
| It's an understandable tradeoff for Facebook debugging.
|
| Ergo, we do it.
| DSingularity wrote:
| They must have a reason. Probably helps them resolve otherwise
| costly failures in good time.
| bluedino wrote:
| Sort of surprised to see VS Code and LLDB mentioned. So Java or
| C++? Rust?
| Veserv wrote:
| The technology they are describing is largely language-agnostic,
| as it is just reconstructing the sequence of hardware
| instructions that executed. So, in principle, you can apply the
| underlying technique to any language as long as you can
| determine the source line that corresponds to a hardware
| instruction at a point in time. That mapping is already
| maintained by any standard debugger, at least for AOT-compiled
| languages; it is how a debugger uses the hardware instruction
| the processor stopped at to tell you which source code line you
| are stopped at. For JIT or interpreted languages it is slightly
| more complex, but still a solved problem.
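
To make that mapping concrete, here is a minimal sketch (not from the
article) of resolving instruction addresses back to source lines. It
assumes a Linux host with binutils' addr2line on PATH and a binary
built with debug info; the binary path and addresses are hypothetical.

    # Minimal sketch: resolve instruction addresses (e.g. recovered from
    # a decoded Intel PT trace) back to source lines with addr2line.
    # Assumes a Linux host, addr2line on PATH, and a binary built with -g.
    import subprocess

    def addresses_to_source_lines(binary_path, addresses):
        """Map raw instruction addresses to (function, file:line) pairs."""
        out = subprocess.run(
            ["addr2line", "-e", binary_path, "-f", "-C"]
            + [hex(a) for a in addresses],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        # With -f, addr2line prints two lines per address:
        # the function name, then file:line.
        return list(zip(out[0::2], out[1::2]))

    if __name__ == "__main__":
        # Hypothetical binary and addresses, purely for illustration.
        for func, loc in addresses_to_source_lines("./myserver",
                                                   [0x401136, 0x4011a2]):
            print(f"{func}  at  {loc}")
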
| roca wrote:
| It won't work for anything with a JIT or interpreter, not
| without _significantly_ more work.
| Veserv wrote:
| Assuming that a Java debugger can convert a breakpoint to
| its corresponding source line, it must maintain some sort
| of source<->assembly mapping that transforms over time to
| do that lookup. As long as you record those changes, namely
| the introduction or destruction of any branches that Intel
| PT would record, the same underlying approach should work.
| The primary complexities would be making sure those JIT
| records are ordered correctly with respect to branches in
| the actual program, and handling the case where the JIT
| deletes the original program text, since that might require
| actually reversing the execution and JIT history to recover
| the instructions as they were at the time of recording.
| This would require adding some instrumentation to the JIT
| to record branches that were inserted or deleted, but that
| seems like something that can be handled as a
| post-processing step at a relatively minor performance
| cost, so it seems quite doable. If there are no deletions,
| then you could just use the final JIT state for the
| source<->assembly mapping. Is there something I am missing,
| beyond glossing over the potential difficulties of engaging
| with a giant code base that might not be amenable to
| changes?
|
| As for an interpreter, I have not thought about it too
| hard. It might be harder than I originally assumed, because
| I was thinking in the context of a full data trace, which
| would just let you re-run the program plus interpreter.
| With just an instruction trace you might need a lot more
| support from the interpreter. Alternatively, you might be
| able to do it if the interpreter internals properly
| separate out the handling of each interpreted instruction,
| and you could use that to reverse engineer what the
| interpreted program executed. That would probably require a
| fair bit of language/interpreter-specific work, though.
| Also, given that interpreters typically run something like
| 10x slower, it would not be so great, since you get so much
| less execution per unit of storage.
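
The source<->assembly bookkeeping discussed above can be pictured as
a time-ordered map of JIT code regions. Below is a toy sketch (all
names hypothetical; real JITs emit comparable records, e.g. perf's
jitdump format) in which each region carries load/unload timestamps,
so a traced (timestamp, address) pair resolves against whichever code
was live at that moment.

    # Toy model of a time-ordered JIT code map. Resolving a traced
    # (timestamp, address) pair means finding the region that was
    # loaded, not yet unloaded, and covering the address at that time.
    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class JitRegion:
        start: int        # first code address of the region
        size: int         # region size in bytes
        source: str       # e.g. "Feed.render" or "Feed.java:42"
        loaded_at: int    # timestamp when the JIT emitted the code
        unloaded_at: Optional[int] = None   # when it was freed, if ever

    class JitMap:
        def __init__(self):
            self.regions = []

        def load(self, region):
            self.regions.append(region)

        def unload(self, start, ts):
            for r in self.regions:
                if r.start == start and r.unloaded_at is None:
                    r.unloaded_at = ts

        def resolve(self, address, ts):
            """Which source location did `address` belong to at time `ts`?"""
            for r in self.regions:
                live = r.loaded_at <= ts and (r.unloaded_at is None
                                              or ts < r.unloaded_at)
                if live and r.start <= address < r.start + r.size:
                    return r.source
            return None

    # The same address can map to different code at different points in
    # the trace, which is exactly the JIT complication described above.
    jmap = JitMap()
    jmap.load(JitRegion(0x7f000000, 0x100, "Feed.render (tier 1)", loaded_at=10))
    jmap.unload(0x7f000000, ts=50)
    jmap.load(JitRegion(0x7f000000, 0x180, "Feed.render (tier 2)", loaded_at=60))
    print(jmap.resolve(0x7f000010, ts=20))   # -> Feed.render (tier 1)
    print(jmap.resolve(0x7f000010, ts=70))   # -> Feed.render (tier 2)
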
| Veserv wrote:
| From what I can tell, they are just using a standard instruction
| trace rather than a full trace, so they can only inspect
| execution history rather than the full data history that most
| other time travel debugging solutions provide. The advantage of
| using the standard hardware instruction trace functionality is
| that it works even on shared-memory multithreaded applications
| at "full" speed, unlike most other time travel debugging
| solutions. The disadvantages are that it requires higher storage
| bandwidth, that Intel does not seem to support data trace, and
| that even if it did, data trace would require significantly more
| storage bandwidth (something like 10x-100x).
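
To see why an instruction-only trace is comparatively cheap, here is
a toy model of PT-style control-flow reconstruction (a simplification
of the idea, not Intel PT's actual packet format; the CFG and block
names are made up): the trace stores only conditional-branch outcomes
and indirect-branch targets, and the decoder replays them against the
static program to recover every executed block.

    # Toy control-flow reconstruction from a branch trace. The "trace"
    # holds only conditional outcomes (taken / not taken) and indirect
    # targets; everything else comes from a static view of the program.

    # Hypothetical static CFG: block -> (kind, targets)
    #   "cond":     targets = (taken_target, fallthrough_target)
    #   "jump":     targets = (target,)  direct, unconditional
    #   "indirect": the target comes from the trace
    #   "ret":      end of the traced region
    CFG = {
        "entry":     ("cond",     ("fast_path", "slow_path")),
        "fast_path": ("jump",     ("merge",)),
        "slow_path": ("indirect", ()),
        "handler_a": ("jump",     ("merge",)),
        "merge":     ("ret",      ()),
    }

    def reconstruct(cfg, start, cond_bits, indirect_targets):
        """Replay recorded branch decisions against the static CFG."""
        cond_bits = iter(cond_bits)
        indirect_targets = iter(indirect_targets)
        block, executed = start, []
        while True:
            executed.append(block)
            kind, targets = cfg[block]
            if kind == "ret":
                return executed
            if kind == "cond":
                block = targets[0] if next(cond_bits) else targets[1]
            elif kind == "jump":
                block = targets[0]
            else:  # indirect: only these targets must appear in the trace
                block = next(indirect_targets)

    # One conditional outcome plus one indirect target is enough to
    # recover the whole executed path.
    print(reconstruct(CFG, "entry", cond_bits=[False],
                      indirect_targets=["handler_a"]))
    # -> ['entry', 'slow_path', 'handler_a', 'merge']
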
| inglor wrote:
| Ok, so how is thousands of servers 0.1%? That implies they have
| millions of servers or one for every 9000 people on earth - are
| companies this size really that wasteful in terms of servers
| needed?
| akiselev wrote:
| If they have two million servers that would mean about 1000
| daily active users per server. Assuming the average user makes
| 2000 requests (API calls, images, videos, etc.) a day
| mindlessly browsing the infinite feed, that works out to about
| 1 request per second.
|
| Facebook makes its money from advertisers so that's likely
| where most of the compute resources are going - users just see
| the ads at the end of all that computation. Combined with the
| mandatory over-provisioning, the overhead of massive
| distributed systems, tracing, etc., I'm not surprised those
| are the numbers.
|
| Assuming each server costs an average of $20k, that's $40
| billion, which is about two quarters' worth of revenue, but
| amortized over 5+ years. It's really not all that much.
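
Running the cost side of that back-of-envelope estimate explicitly
(all inputs are the commenter's assumptions, not reported figures):

    # Back-of-envelope check of the parent comment's numbers: 2M servers,
    # ~1,000 daily active users per server, an assumed $20k per server,
    # amortized over 5 years. Assumptions, not reported figures.
    servers          = 2_000_000
    dau_per_server   = 1_000
    cost_per_server  = 20_000      # USD, assumed
    amortization_yrs = 5

    fleet_cost  = servers * cost_per_server        # ~= $40B up front
    yearly_cost = fleet_cost / amortization_yrs    # ~= $8B per year
    implied_dau = servers * dau_per_server         # ~= 2B daily actives

    print(f"fleet cost:     ${fleet_cost / 1e9:.0f}B")
    print(f"amortized/year: ${yearly_cost / 1e9:.0f}B")
    print(f"implied DAU:    {implied_dau / 1e9:.0f}B")
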
| packetslave wrote:
| The official public answer is "millions of servers" (see, for
| example, https://engineering.fb.com/2018/07/19/data-infrastructure/lo...).
|
| Keep in mind that this includes Instagram and WhatsApp too, as
| far as I know. As for "wasteful", well... 1.88 billion DAU (Q1
| earnings report) / 86,400 = ~21,759 "users per second" (note: I
| made this up).
|
| Multiply that by N where N is the number of frontend and
| backend queries it takes to service one user, and you have a
| _lot_ of QPS. Now add in headroom, redundancy, edge POPs and
| CDN to put servers as close to users as possible, etc.
|
| It's hard to fathom just how _big_ traffic to FAANG services
| can be, until you see what it takes to serve it. Is there some
| waste? Sure, probably, but not as much as you'd think.
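
The same arithmetic in sketch form, using the 1.88B DAU figure from
the comment above and a deliberately made-up fan-out factor N, since
the real per-user query count isn't public:

    # Sketch of the "users per second" arithmetic. DAU is the Q1 2021
    # figure quoted in the comment; the fan-out factor N is made up.
    dau             = 1.88e9
    seconds_per_day = 86_400
    fanout_n        = 100   # hypothetical frontend+backend queries per user-second

    users_per_second = dau / seconds_per_day          # ~= 21,759
    backend_qps      = users_per_second * fanout_n    # ~= 2.2M QPS at this N

    print(f"users/second: {users_per_second:,.0f}")
    print(f"backend QPS at N={fanout_n}: {backend_qps:,.0f}")
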
___________________________________________________________________
(page generated 2021-05-03 23:02 UTC)