[HN Gopher] Unwinding the Stack the Hard Way
___________________________________________________________________
Unwinding the Stack the Hard Way
Author : todsacerdoti
Score : 63 points
Date : 2023-04-16 17:51 UTC (5 hours ago)
(HTM) web link (lesenechal.fr)
(TXT) w3m dump (lesenechal.fr)
| zX41ZdbW wrote:
| We have implemented async-signal-safe in-process stack
| unwinding for the always-on profiler in ClickHouse:
| https://clickhouse.com/docs/en/operations/optimizing-perform...
|
| The downside - it required many patches to LLVM's libunwind, and
| not all of them are accepted yet:
| https://bugs.llvm.org/show_bug.cgi?id=48186
|
| ClickHouse source code: https://github.com/ClickHouse/ClickHouse
| brancz wrote:
| For the always-on open-source profiler we work on, we had to do
| similar things, and it was even more involved since we base the
| whole thing on eBPF to lower the overhead and therefore needed to
| get the verifier to accept it. [1]
|
| We really wish frame pointers were always present, but here we
| are.
|
| [1]
| https://www.polarsignals.com/blog/posts/2022/11/29/profiling...
| the_mitsuhiko wrote:
| Please just stop omitting frame pointers. You might lose out on 6
| months of CPU speed advances, but from then on out you will reap
| the benefits of better production observability for years to
| come.
| rwmj wrote:
| I would suggest _not_ omitting the frame pointer. Fedora recently
| changed the default and it makes collecting stack traces vastly
| simpler, leading to better profiling support
| (https://rwmj.wordpress.com/2023/02/14/frame-pointers-vs-dwar...)
|
| Since then I gave a short (15 min) talk about producing and
| understanding flame graphs:
| http://oirase.annexia.org/tmp/2023-03-08-flamegraphs.mp4
| fweimer wrote:
| I still hope we can revert that once we have complete, CPU-
| verified backtraces. As a linked list, frame pointers are still
| somewhat slow. Just copying the hardware address stack should
| be quite a bit faster, and more importantly, the CPU will
| enforce that the addresses are correct.
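
A minimal sketch of why frame-pointer unwinding amounts to walking a linked list, assuming the usual x86-64 layout where the caller's saved rbp sits at [rbp] and the return address at [rbp+8]. Memory is simulated with a dict; the function name `walk_frame_pointers` and all addresses are hypothetical and only serve the illustration.

```python
WORD = 8  # bytes per stack slot on x86-64

def walk_frame_pointers(memory, rbp, max_frames=64):
    """Follow the saved-rbp chain and collect return addresses.

    memory: dict mapping address -> 64-bit word (a simulated stack).
    rbp: frame pointer of the innermost frame.
    """
    frames = []
    while rbp != 0 and len(frames) < max_frames:
        if rbp not in memory or rbp + WORD not in memory:
            break  # chain left known memory; stop rather than guess
        frames.append(memory[rbp + WORD])  # return address of this frame
        next_rbp = memory[rbp]             # caller's saved rbp
        if next_rbp != 0 and next_rbp <= rbp:
            break  # stack grows down, so caller frames must sit higher
        rbp = next_rbp
    return frames

# Simulated stack with three frames; a saved rbp of 0 ends the chain.
memory = {
    0x7000: 0x7020, 0x7008: 0x401100,  # innermost frame
    0x7020: 0x7040, 0x7028: 0x401200,
    0x7040: 0x0,    0x7048: 0x401300,  # outermost frame
}
trace = walk_frame_pointers(memory, 0x7000)
```

Each step needs one dependent load before the next frame can be found, which is the "linked list" cost mentioned above. The monotonicity check is one cheap guard against corrupted or cyclic chains; a real unwinder would also validate that addresses lie within the thread's stack.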
| boulos wrote:
| Yeah, mandating rbp on amd64 would have saved years of
| headache. There are still cases like tail calls that wouldn't
| work, but the _vast majority_ of code doesn't need to care
| about dedicating a register to having usable backtraces.
| rco8786 wrote:
| I figured this article would be about Golang.
|
| if err != nil { return err }
| userbinator wrote:
| There's a far simpler and also _very_ generic method that I'm
| surprised no one has really mentioned much, although it's well-
| known in some corners of the RE/Asm community and I believe some
| debuggers (but not gdb?) use it: scan the stack for values that
| look like valid code addresses, then disassemble backwards from
| there to see if they were actually return addresses written there
| by a call instruction. You will find a chain that leads back to
| the entry point.
|
| Of course it won't work in edge cases like handwritten Asm that
| uses the stack in more clever ways, but when you're dealing with
| compiler output, it'll be fine. No need for all this complexity,
| and works in almost all cases.
| loeg wrote:
| Do you analyze the dynamic loader segments (to determine code-
| like addresses) or just do some hard-coded approximations per
| platform? Pretty cute idea and I agree it will often work.
| userbinator wrote:
| It just needs to be on an executable page. The processor
| doesn't care about anything else, so this will work with
| things like JIT-generated code too. The key idea is that in
| essentially all cases of compiler-generated code, every
| return address will be immediately after the call that wrote
| it.
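
A rough sketch of the scanning heuristic described above, under simplifying assumptions: a single executable region held in a bytes object, and only the 5-byte x86 direct call (opcode E8 followed by a 32-bit relative offset) is recognised. Indirect calls have variable-length encodings and are left out. The function name `scan_stack` and the sample addresses are hypothetical.

```python
import struct

CALL_REL32 = 0xE8  # x86 opcode for a direct near call with a 32-bit offset
CALL_LEN = 5       # opcode byte + 4-byte displacement

def scan_stack(stack_words, code, code_base):
    """Return (return_address, call_target) pairs found on the stack.

    stack_words: stack slots as integers, innermost first.
    code: bytes of one executable region, loaded at code_base.
    """
    code_end = code_base + len(code)
    hits = []
    for value in stack_words:
        call_site = value - CALL_LEN
        if not (code_base <= call_site and value <= code_end):
            continue  # not preceded by 5 bytes of executable memory
        offset = call_site - code_base
        if code[offset] != CALL_REL32:
            continue  # the preceding bytes are not a direct call
        (rel,) = struct.unpack_from("<i", code, offset + 1)
        hits.append((value, value + rel))  # target is relative to the next instruction
    return hits

# Region of NOPs at 0x1000 with one call at 0x1010 targeting 0x1030.
code = bytearray(b"\x90" * 0x40)
code[0x10:0x15] = b"\xE8" + struct.pack("<i", 0x1030 - 0x1015)
stack = [0x1015, 0x1020, 0xDEADBEEF, 0x1003]
hits = scan_stack(stack, bytes(code), 0x1000)
```

Here only 0x1015 survives: 0x1020 points into code but does not follow a call, 0xDEADBEEF is outside the region, and 0x1003 has no room for a call before it. The sketch stops at filtering candidates; linking them into a chain would additionally compare each call target against the function containing the next candidate.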
| loeg wrote:
| Yeah, but how do you look up executable page mappings?
| (Dynamic loader segments will have at least one executable
| segment mapped somewhere.)
| userbinator wrote:
| Linux: /proc/$pid/maps
|
| Windows: VirtualQuery()
|
| Mac: vm_region()
|
| Your own OS (like this article): you should know.
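
For the Linux case, a small sketch of extracting executable ranges from the text of /proc/$pid/maps. It assumes the documented field order (address range, permissions, offset, device, inode, optional pathname); the helper names `executable_ranges` and `is_code_address` and the sample mappings are made up for illustration.

```python
def executable_ranges(maps_text):
    """Parse /proc/<pid>/maps text into (start, end, path) for executable mappings."""
    ranges = []
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        if "x" not in fields[1]:
            continue  # mapping is not executable
        start_s, end_s = fields[0].split("-")
        path = fields[5].strip() if len(fields) > 5 else ""
        ranges.append((int(start_s, 16), int(end_s, 16), path))
    return ranges

def is_code_address(addr, ranges):
    """True if addr falls inside any executable mapping (end is exclusive)."""
    return any(start <= addr < end for start, end, _ in ranges)

# Hypothetical sample in the format of /proc/<pid>/maps.
sample = """\
55d0a1c00000-55d0a1c02000 r--p 00000000 08:01 1234 /usr/bin/demo
55d0a1c02000-55d0a1c05000 r-xp 00002000 08:01 1234 /usr/bin/demo
7ffd3a5e0000-7ffd3a601000 rw-p 00000000 00:00 0 [stack]
7f1b2c000000-7f1b2c001000 rwxp 00000000 00:00 0
"""
ranges = executable_ranges(sample)
```

On a Linux machine the same function can be fed the contents of /proc/self/maps. Anonymous executable mappings, such as the last sample line, are kept with an empty path, which matches the point about JIT-generated code living on executable pages without a backing file.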
| loeg wrote:
| Ok, some platform specific mechanisms. Cool.
| the_mitsuhiko wrote:
| That method is pretty much guaranteed to give you completely
| broken stacks. We (Sentry) use it as a fallback if nothing else
| works, and the success rate is awful.
| loeg wrote:
| Please elaborate on what breaks.
| xuhu wrote:
| Probably most valid code addresses on the stack are from
| older calls that are no longer part of the current call
| chain.
| the_mitsuhiko wrote:
| The act of stackwalking breaks. You often end up in
| completely nonsensical stacks halfway through and then all
| is lost.
| loeg wrote:
| Is this just an artifact of old return addresses on the
| stack not being overwritten?
| userbinator wrote:
| Are you also doing back-disassembly to link the call chain?
| I've had very good success with it. I agree that without
| analysing the potential call sites and making sure that they
| are actually possible, it won't work.
| the_mitsuhiko wrote:
| Generally what we do when we fall back to scanning depends
| on the architecture. For x86 you can find the logic here:
| https://github.com/rust-minidump/rust-minidump/blob/77638ab7...
|
| Since we do post-hoc stack walking our ability to actually
| look at the assembly is limited. In most cases where we
| have no CFI we also do not have the binary to begin with,
| so we're in random memory land.
___________________________________________________________________
(page generated 2023-04-16 23:00 UTC)