[HN Gopher] How to build highly-debuggable C++ binaries
___________________________________________________________________
How to build highly-debuggable C++ binaries
Author : synergy20
Score : 133 points
Date : 2024-07-26 00:00 UTC (3 days ago)
(HTM) web link (dhashe.com)
(TXT) w3m dump (dhashe.com)
| mark_undoio wrote:
| Good to see this discussed - debuggability is not talked about
| enough but, done right, it could be a superpower.
|
| Setting the build for an old x64 machine (https://dhashe.com/how-
| to-build-highly-debuggable-c-binaries...) for reversible / time
| travel debuggers seems unnecessarily restrictive to me. I'd
| expect a modern time travel debug tool (e.g. either rr or Undo -
| disclaimer, which I work on) to cope fine with most modern
| instructions (I believe GDB's built-in record / replay debugging
| tends to be further behind the curve on new CPU instructions -
| but if you're doing anything at scale it's not the right choice
| anyhow).
|
| Regarding compilation (https://dhashe.com/how-to-build-highly-
| debuggable-c-binaries...) - we generally advise customers to use
| -Og rather than -O0. As the article states, this will still
| optimise out some code but should be a good trade-off without
| being too slow. (NB. last I checked, clang currently uses -Og as
| an alias for -O1, so it may behave less satisfactorily than under
| GCC).
|
| It's also not said enough but: you don't need a special debug
| build to be able to debug. It's less user-friendly to debug a
| fully-optimised release build but it's totally possible. You just
| need to retain the DWARF debug info (instead of throwing it
| away). This is really important to know if you're debugging on a
| customer system or analysing a bug that's only in release builds.
| saagarjha wrote:
| Having debugged a lot of optimized code I would strongly
| recommend against it unless you are in a context where
| performance of your build is paramount (games?) or you cannot
| reproduce the bug when compiled without optimizations.
| Compilers really do a terrible job at preserving useful debug
| info when you turn them on. It's a massive pain to have
| everything be marked as "optimized out" and reassemble the
| things you want from other variables or by using a disassembler
| to manually track which register the value is hiding in.
| SleepyMyroslav wrote:
| It's not only games. Anything sizeable that needs to run to
| repro will crumble under a 20-100x slowdown.
| Multithreaded behaviors will be just different. All those
| wonderful templated abstractions do not come for free in -O0.
| Ranges are an especially egregious example. Debug build is truly
| dead outside of unit testing (imho).
|
| Realistic scenario that gamedev uses: deoptimize translation
| units you are interested in finding or reproducing bugs.
| senkora wrote:
| > Realistic scenario that gamedev uses: deoptimize
| translation units you are interested in finding or
| reproducing bugs.
|
| Yep, looks like that's this bullet point:
|
| https://dhashe.com/category/blog.html#partition-your-tus-
| int...
| mark_undoio wrote:
| > It's a massive pain to have everything be marked as
| "optimized out" and reassemble the things you want from other
| variables or by using a disassembler to manually track which
| register the value is hiding in.
|
| If you've got a time traveling / reversible debugger then you
| can (sometimes) go back to a point where the value was being
| written / used, at which point it'll often reappear in scope
| and be accessible.
|
| I believe DWARF's built-in virtual machine should be able to
| recompute missing values in many cases but I don't think
| compilers are great at putting the relevant info in, even
| where it should be possible to compute the right value fairly
| easily.
| mark_undoio wrote:
| The other trick I've found really helpful for "optimized
| out" values is to find places where they cross boundaries
| that block optimizations (e.g. procedure calls to another
| translation unit, so long as you're not doing some kind of
| link-time optimisation).
|
| e.g. if the value you're interested in is being passed to /
| returned from a function then inspecting it around the call
| / return site should have the value available.
| o11c wrote:
| Two particular notes around this (exact commands assuming
| gdb, but other debuggers should have equivalents):
|
| Pass various arguments to `backtrace` rather than just
| relying on the default. Chances are it will have _some_ non-
| optimized-out variables, which you can use to figure out
| what's going on.
|
| Use `info registers` and see what looks like a pointer,
| then cast it to a type you suspect it is. Note that this
| can be done for _any_ stack frame.
| DyslexicAtheist wrote:
| feels strange still to see complaints about debugging in
| production being inconvenient when we should have caught
| these issues in test/staging. secondly I think not having
| debug tools and debug data in production is a security
| feature.
| o11c wrote:
| > you don't need a special debug build to be able to debug
|
| Note that this is _highly_ dependent on choice of compiler.
| Clang is utter crap for debugging even at -O1, but I've
| encountered basically no trouble ever using GCC at -O2 (you do
| have to learn a little about how the binary changes but that's
| easy enough to pick up). I really would not recommend -O3;
| historically it introduced bugs and regardless it makes the
| build process _much_ slower, and the performance gain is fairly
| negligible (I can't say how much it destroys debuggability due
| to lack of experience). I can't speak for MSVC personally but
| it's a bad sign that its culture strongly promotes separate
| debug builds.
|
| That said, sanitizers are a place where a special debug build
| does help. Valgrind can do many of the things that sanitizers
| can but is around 10x slower which is a real pain if you can't
| isolate what you're targeting, so recompiling for sanitizers is
| a good idea.
|
| (Other brief notes)
|
| I have never actually encountered a case where the lack of
| frame pointers actually caused problems. As far as I'm
| concerned, any tool that breaks without them is a broken tool.
| (Theoretically they can speed up large traceback contexts if
| you're doing extensive profiling; good API design probably
| helps for the sanitizers case here)
|
| Rather than assembly int3, Unix-portable `SIGTRAP` is very
| useful for breakpoints; debuggers handle it specially. You can
| ignore it for non-debugged runs but get breakpoints when you
| are debugging without changing the binary or options!
| Alternatively you could leave it unignored if you have tooling
| that dumps core or something nicely for you.
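|
| A minimal sketch of the SIGTRAP approach described above,
| assuming a POSIX system. The helper names `debug_break` and
| `ignore_sigtrap_outside_debugger` are hypothetical, not part of
| any library:

```cpp
#include <csignal>

// Hypothetical helper: call once at startup. With SIGTRAP ignored, a
// run without a debugger carries on past debug_break(); a debugger that
// is attached still sees the signal and stops there.
inline bool ignore_sigtrap_outside_debugger() {
    return std::signal(SIGTRAP, SIG_IGN) != SIG_ERR;
}

// Hypothetical helper: a source-level breakpoint that needs no inline
// assembly. Returns true if the signal was raised successfully.
inline bool debug_break() {
    return std::raise(SIGTRAP) == 0;
}
```

| Skipping the startup call leaves the default disposition in
| place, which terminates the process (typically with a core dump)
| when no debugger is attached; that is the "leave it unignored"
| variant mentioned above.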
| omoikane wrote:
| Debugging experience aside, I found that "-O3" is generally
| worth it if you also set "-march=native". For example, here
| are some run times for computing SHA256, you can see that
| there is slightly more to be gained going from -O2 to -O3
| with -march=native:
|
|     -O2:                10.22
|     -O3:                 9.82
|     -O2 -march=native:   9.86
|     -O3 -march=native:   9.43
|
| This is basically SHA256 over ~8GB of data, averaged over 5
| runs. The numbers are rather crude here since I measured them
| just now, but I remember it was more significant when I first
| did it last month for
| https://news.ycombinator.com/item?id=40687942
| josephg wrote:
| Yeah -march=native is amazing. I use it when compiling &
| benchmarking rust code.
|
| But - to anyone reading this later - please don't do this
| blindly. You probably never want to distribute binaries
| with this flag set. It enables all the features available
| on the host CPU. So your build will change depending on the
| physical cpu you have installed. If you have a modern amd
| cpu, it may enable avx512 extensions and make your binary
| unusable on many Intel CPUs.
| hurpdurpdurp wrote:
| Great article with lots of good advice, but it makes me wonder
| what the consensus is on using ASAN in production.
|
| Once upon a time it was widely said that ASAN should not be used
| for production code. The authors advocated against it and from a
| general-purpose security perspective it gives attackers a very
| large writable memory region at a fixed offset to play with. But
| over time I see more and more ASAN code in production on the
| theory that ASAN _may_ make a system easier to exploit, but a
| memory corruption _will_ make it easier to exploit. And so it's
| better to have knowledge of the issue.
|
| Also, I've personally found the glibc malloc tunables very useful
| for debugging.
| ryandrake wrote:
| To me, leaving a debug tool on in production because it happens
| to mask a bug is like the old (mal)practice of turning off
| optimizations in Release builds because of hard-to-debug
| crashes. Better to just fix the crashes.
| ggambetta wrote:
| About a million years ago (OK, more like 20) I was making casual
| videogames in C++ and I wanted a cross-platform (Linux, Mac,
| Windows) way to get a stack trace whenever a game crashed. What I
| ended up doing was adding a macro to the first line of every
| function, let's call it STACKTRACE, which was something like
|
|     #define STACKTRACE \
|         GLOBAL_STACK_FILE[GLOBAL_STACK_IDX] = __FILE__; \
|         GLOBAL_STACK_LINE[GLOBAL_STACK_IDX++] = __LINE__; \
|         StackTraceCleaner stc;
|
| StackTraceCleaner was a class that didn't do anything but execute
| GLOBAL_STACK_IDX-- in its destructor.
|
| So at any point in time I could inspect GLOBAL_STACK_FILE and
| GLOBAL_STACK_LINE and have a complete stack trace of the game.
|
| Obviously this only worked because these games weren't
| performance-critical and because they were essentially single-
| threaded, but it did the job at the time. We're talking about a
| time when Visual Studio 6's support for templates was half-
| broken, and the STL wasn't exactly S, to the point that I had to
| roll out my own string, smart pointers, containers, etc -- made
| twice as hard because of the aforementioned broken template
| support in VS6 :(
|
| I do miss these simpler, more innocent times, though.
| VikingCoder wrote:
| Back in VS6 days, I did a similar thing, but I generated a GUID
| for the first line of every function.
|
| STACKTRACE("e0957136-fed3-414d-80b9-8bbf84f3fa03");
|
| With the GUID, I could see where functions moved as they were
| refactored.
|
| I would write out the GUID to a thread-local file handle, along
| with a time-stamp, and an "enter" or "exit" when an RAII object
| left the stack.
|
| Then I could retrospectively debug after running my program. I
| could see the callstack, and step through in time. I would walk
| my source and record GUID-to-filename/linenumber in a map. Then
| I could dump out a Visual Studio output that had the file name
| and line number, and execution time... allowing me to step
| forward and back through the execution.
|
| Stone knives and bearskins.
| arjvik wrote:
| My C++ is quite rusty, but why not have StackTraceCleaner's
| constructor take __FILE__ and __LINE__ as arguments and update
| the file and line arrays there?
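|
| A rough sketch of that constructor-based variant, under stated
| assumptions: the names `StackTraceScope`, `g_frames`, `g_depth`
| and `kMaxDepth` are made up for illustration, and the code is
| single-threaded like the original. A macro is still needed,
| because __FILE__ and __LINE__ have to expand at the call site:

```cpp
#include <cstddef>
#include <cstdio>

// Assumed fixed capacity; deeper frames are counted but not recorded.
constexpr std::size_t kMaxDepth = 256;

struct Frame {
    const char* file;
    int line;
};

Frame g_frames[kMaxDepth];
std::size_t g_depth = 0;

// RAII guard: the constructor records the frame, the destructor pops it.
class StackTraceScope {
public:
    StackTraceScope(const char* file, int line) {
        if (g_depth < kMaxDepth) {
            g_frames[g_depth] = {file, line};
        }
        ++g_depth;
    }
    ~StackTraceScope() { --g_depth; }

    StackTraceScope(const StackTraceScope&) = delete;
    StackTraceScope& operator=(const StackTraceScope&) = delete;
};

// Place as the first statement of a function body.
#define STACKTRACE StackTraceScope stack_trace_scope_(__FILE__, __LINE__)

// Print the recorded frames, innermost first.
void dump_stack(std::FILE* out) {
    const std::size_t recorded = g_depth < kMaxDepth ? g_depth : kMaxDepth;
    for (std::size_t i = recorded; i-- > 0;) {
        std::fprintf(out, "%s:%d\n", g_frames[i].file, g_frames[i].line);
    }
}
```

| Each instrumented function then starts with `STACKTRACE;`, and
| `dump_stack(stderr)` can be called from wherever a trace is
| wanted.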
| ggambetta wrote:
| Could have been that, honestly I'm not sure. 20-ish years :)
| FooBarWidget wrote:
| This is the same strategy I used in the Passenger application
| server.
| pjmlp wrote:
| We used a similar technique on a CRM server, UNIX based, back
| in 1999 - 2001.
|
| Another technique is that all key allocations were handle
| based, so we could also easily dump what the whole process map
| was about.
| jeffbee wrote:
| Anyone have tips on getting good stack traces in opt builds? I am
| really struggling with it at the moment. LLVM sanitizers all
| generate brilliant stack traces by forking llvm-symbolizer and
| feeding it the goods. But during runtime crashes on optimized
| binaries I don't seem to get good stack traces. One of the
| problems is that some library backtrace functions do not print
| the base address of the DSO mapping, which means they are
| printing a meaningless PC that can't be used to find file and
| line later.
| saagarjha wrote:
| Is calling dladdr on the addresses not enough for you?
| jeffbee wrote:
| It's not async signal safe, so I did not even try that.
|
| I think there's a huge amount of complexity both inherent to
| the problem and caused by fifty years of accumulated bad
| habits, which is indicated by the thousands of lines of code
| in compiler-rt dedicated to handling this issue. I'd like to
| call their library functions but they are all in private-
| looking namespaces. I also tried to use the Abseil failure
| signal handler but it often fails to unwind and even when it
| does unwind has a habit of just printing unknown for the
| symbol name or file, and never prints the DSO base addresses.
| bogwog wrote:
| Have you looked into using a library like Breakpad
| (https://chromium.googlesource.com/breakpad/breakpad/)? It's
| probably too much work to integrate for local debugging only
| though.
| mark_undoio wrote:
| If you're on *NIX have you tried just invoking gstack or
| similar as an external process?
| https://linux.die.net/man/1/gstack
|
| Or, indeed, getting a core dump and applying GDB to it. GDB
| seems generally pretty good at reconstructing stacks at
| arbitrary points in application runtime.
|
| We've also used a combination of libunwind and
| https://linux.die.net/man/1/addr2line to produce good crash
| dumps when GDB is not necessarily available.
| jeffbee wrote:
| To which of the projects that are all named "libunwind" do
| you refer?
| mark_undoio wrote:
| This one, I believe: https://github.com/libunwind/libunwind
|
| ETA: Thinking about it, I'm not really sure what it'd do
| for C++ - I guess you'd end up with mangled names, so if
| you want sensible names you might need to demangle (either
| as a post-processing step or within the dumper) too.
|
| I don't think you'll get any decoded argument values out of
| it either, so I guess it depends what backtrace info is
| needed.
| o11c wrote:
| Rule number one: never use Clang; its optimizers destroy too
| much information unlike GCC.
|
| You can use `dl_iterate_phdr` at startup if you need DSO info?
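|
| A minimal sketch of that idea, assuming Linux with glibc (where
| `dl_iterate_phdr` is declared in <link.h>). The names
| `LoadedSegment`, `snapshot_dsos`, `find_segment` and `pc_offset`
| are hypothetical. The table is built once at startup, so a later
| lookup only reads memory that already exists; objects loaded
| afterwards with dlopen would need another snapshot:

```cpp
#include <link.h>  // dl_iterate_phdr, dl_phdr_info, ElfW, PT_LOAD

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// One loadable segment of one loaded object (executable or shared library).
struct LoadedSegment {
    std::string dso;      // empty for the main executable on glibc
    std::uintptr_t base;  // load base of the object (dlpi_addr)
    std::uintptr_t start; // first address covered by this segment
    std::uintptr_t end;   // one past the last address covered
};

static std::vector<LoadedSegment> g_segments;

// Callback invoked by dl_iterate_phdr once per loaded object.
static int record_dso(dl_phdr_info* info, std::size_t, void*) {
    const char* name = info->dlpi_name ? info->dlpi_name : "";
    const std::uintptr_t base = static_cast<std::uintptr_t>(info->dlpi_addr);
    for (int i = 0; i < info->dlpi_phnum; ++i) {
        const ElfW(Phdr)& ph = info->dlpi_phdr[i];
        if (ph.p_type != PT_LOAD) continue;
        const std::uintptr_t start = base + static_cast<std::uintptr_t>(ph.p_vaddr);
        const std::uintptr_t end = start + static_cast<std::uintptr_t>(ph.p_memsz);
        g_segments.push_back(LoadedSegment{name, base, start, end});
    }
    return 0;  // zero means "keep iterating"
}

// Call at startup, before any crash handler might need the table.
void snapshot_dsos() {
    g_segments.clear();
    dl_iterate_phdr(record_dso, nullptr);
}

// Find the segment containing an address, or nullptr if none matches.
const LoadedSegment* find_segment(const void* pc) {
    const auto addr = reinterpret_cast<std::uintptr_t>(pc);
    for (const LoadedSegment& s : g_segments) {
        if (addr >= s.start && addr < s.end) return &s;
    }
    return nullptr;
}

// Offset of an address relative to its object's load base.
std::uintptr_t pc_offset(const LoadedSegment& seg, const void* pc) {
    return reinterpret_cast<std::uintptr_t>(pc) - seg.base;
}
```

| Printing the DSO name together with `pc_offset` for each frame
| gives a pair that stays meaningful after the process has exited,
| which is the piece missing from the backtraces described above.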
| dllu wrote:
| I enjoy using backward: https://github.com/bombela/backward-cpp
| jeffbee wrote:
| Looks worth investigating. Also making me wonder how many
| different backtrace implementations are out there on GitHub
| with Google copyrights!
| jcranmer wrote:
| Mozilla has a tool to fix up the bad dladdr-based printing
| methods in log files here: https://github.com/mozilla/fix-
| stacks/. Note that it relies on doing a little bit of post-
| processing on dladdr to get the base of the DSO it is in:
| https://searchfox.org/mozilla-central/source/mozglue/misc/St...
|
| As for whether or not you can use this in a signal handler...
| well, I hate reading the POSIX standard with regard to signal
| safety because it's just not well-written, but as far as I can
| tell, a non-async-signal-safe function can be safely called
| from a signal handler for a synchronous signal (which most of
| the interesting signals for dumping stack traces are--it's only
| something like dump-stack-trace-on-SIGUSR1 that's actually
| going to be an asynchronous signal), so long as it is not
| interrupting a non-async-signal-safe function. So as long as
| you're not crashing in libc, it should be kosher.
| forrestthewoods wrote:
| Great post. I'm surprised it requires so much effort. On Windows
| you pretty much just need to make a debug build and... that's it!
|
| A nice trick with MSVC is you can turn off optimizations for a TU
| or any block of code with:
|
|     #pragma optimize( "", off )
|
| Leaps and bounds easier than hacking the build system.
| FpUser wrote:
| >"On Windows you pretty much just need to make a debug build and...
| that's it!"
|
| I got pretty much the same on Linux when using CLion IDE from
| JetBrains.
| aseipp wrote:
| I mean, to be fair, if you're fiddling with/invoking cl.exe
| manually then you'd need to know a lot of the general
| equivalents and nits listed here. MSVC's debug build will do a
| lot for you out of the box though which is great. That said you
| often have to support/know 400 random build tools when using
| C++ to enable things like this, so it's often useful knowledge
| anyway.
| o11c wrote:
| Assuming spatulas are absent, I've never had a problem with the
| build system on Linux. Nonetheless, for GCC the equivalent is:
|
|     #pragma GCC optimize ("O0")
|
| (see also target, push_options, and pop_options)
|
| This is also available as a per-function attribute, using both
| gnu and standard syntaxes:
|
|     __attribute__((optimize("O0")))
|     [[gnu::optimize("O0")]]
| bialpio wrote:
| For completeness, in Clang `[[clang::optnone]]` as a per-
| function attribute also works fine. I'm using it for
| debugging quite frequently lately.
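|
| The per-function attributes from the last two comments can be
| wrapped in one macro so the same source builds with both
| compilers. A sketch, with `DEBUG_NO_OPT` and `sum_range` as
| made-up names; Clang is tested first because it also defines
| __GNUC__:

```cpp
#include <cstddef>

#if defined(__clang__)
#  define DEBUG_NO_OPT [[clang::optnone]]
#elif defined(__GNUC__)
#  define DEBUG_NO_OPT [[gnu::optimize("O0")]]
#else
#  define DEBUG_NO_OPT  // other compilers: no per-function attribute assumed
#endif

// Compiled without optimization even in an optimized build, so locals
// such as `total` and `i` should stay visible in the debugger.
DEBUG_NO_OPT int sum_range(const int* data, std::size_t count) {
    int total = 0;
    for (std::size_t i = 0; i < count; ++i) {
        total += data[i];
    }
    return total;
}
```

| Removing the macro from the function restores normal
| optimization once the bug is found.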
| Galanwe wrote:
| What irks me the most is that in 2024, I still can't reliably
| embed source code in dwarf5 to get meaningful source-
| contextualized stacktraces and have to ship source code
| separately and override the source mapping.
| forrestthewoods wrote:
| Strong agree.
|
| At least on Windows you can set up Symbol Server + Source
| Indexing to achieve the same result.
|
| Once upon a time I wrote a small tool that can embed full
| source code into PDBs. I doubt anyone has ever used it though.
| For proprietary software it's not uncommon to leak PDBs on
| accident at some point. It could be disastrous to also leak
| full source code!
|
| https://www.forrestthewoods.com/blog/embedding-source-code-i...
|
| It's relatively easy to add source indexing to PDBs. I've
| successfully done that for a non-standard Monorepo. Works
| great.
| mark_undoio wrote:
| There's `debuginfod` on Linux:
| https://developers.redhat.com/blog/2019/10/14/introducing-
| de...
|
| It builds a lot on quite a simple conceptual base, benefiting
| from native support in gcc / clang (for embedding unique
| build IDs) and in GDB (for contacting the server). It can
| serve up both source and symbol information.
|
| I would like to see this adopted more - e.g. build
| infrastructure automatically populating a debuginfod server
| so debugging is seamless.
| becurious wrote:
| There is SourceLink where you can get the mapping into the
| pdb files:
|
| https://github.com/dotnet/sourcelink
| Const-me wrote:
| Tangentially related, a few tips about offline debugging on
| Windows: http://const.me/articles/windbg/windbg-intro.pdf
|
| Not a silver bullet but still, being able to collect and analyze
| user-mode crash dumps is sometimes the best way to investigate
| and fix bugs.
| weinzierl wrote:
| Good points, but for me number one would be to avoid runtime
| polymorphism like the plague.
|
| If your call graph has more roots than your neighbor's garden and
| the whole thing is a forest and not a tree you will have a hard
| time understanding, analyzing and ultimately debugging.
| binary132 wrote:
| I particularly liked the point that not every TU needs to be
| compiled in debug mode. I am working on a build system and now
| I'm thinking of setting aside some time to make sure debug and
| optimization is an object level option. In general, I think the
| usefulness of a well specified ABI over object code is vastly
| underappreciated!
| renox wrote:
| > Avoid stepping into irrelevant code
|
| Thanks a lot!! I don't know how many times I've stepped into the
| C++ standard library and it gets really annoying..
___________________________________________________________________
(page generated 2024-07-29 23:00 UTC)