[HN Gopher] How to build highly-debuggable C++ binaries
       ___________________________________________________________________
        
       How to build highly-debuggable C++ binaries
        
       Author : synergy20
       Score  : 133 points
       Date   : 2024-07-26 00:00 UTC (3 days ago)
        
 (HTM) web link (dhashe.com)
 (TXT) w3m dump (dhashe.com)
        
       | mark_undoio wrote:
       | Good to see this discussed - debuggability is not talked about
       | enough but, done right, it could be a superpower.
       | 
       | Setting the build for an old x64 machine (https://dhashe.com/how-
       | to-build-highly-debuggable-c-binaries...) for reversible / time
       | travel debuggers seems unnecessarily restrictive to me. I'd
       | expect a modern time travel debug tool (e.g. either rr or Undo -
       | disclaimer, which I work on) to cope fine with most modern
       | instructions (I believe GDB's built-in record / replay debugging
       | tends to be further behind the curve on new CPU instructions -
       | but if you're doing anything at scale it's not the right choice
       | anyhow).
       | 
       | Regarding compilation (https://dhashe.com/how-to-build-highly-
       | debuggable-c-binaries...) - we generally advise customers to use
       | -Og rather than -O0. As the article states, this will still
       | optimise out some code but should be a good trade-off without
       | being too slow. (NB. last I checked, clang currently uses -Og as
       | an alias for -O1, so it may behave less satisfactorily than under
       | GCC).
       | 
       | It's also not said enough but: you don't need a special debug
       | build to be able to debug. It's less user-friendly to debug a
       | fully-optimised release build but it's totally possible. You just
       | need to retain the DWARF debug info (instead of throwing it
       | away). This is really important to know if you're debugging on a
       | customer system or analysing a bug that's only in release builds.
        
         | saagarjha wrote:
         | Having debugged a lot of optimized code I would strongly
         | recommend against it unless you are in a context where
         | performance of your build is paramount (games?) or you cannot
         | reproduce the bug when compiled without optimizations.
         | Compilers really do a terrible job at preserving useful debug
         | info when you turn them on. It's a massive pain to have
         | everything be marked as "optimized out" and reassemble the
         | things you want from other variables or by using a disassembler
         | to manually track which register the value is hiding in.
        
           | SleepyMyroslav wrote:
           | It's not only games. Anything sizeable that needs to run to
           | repro will crumble under a 20-100x slowdown. Multithreaded
           | behaviors will be just different. All those wonderful
           | templated abstractions do not come for free in -O0. Ranges
           | are an especially egregious example. Debug build is truly
           | dead outside of unit testing (imho).
           | 
           | Realistic scenario that gamedev uses: deoptimize translation
           | units you are interested in finding or reproducing bugs.
        
             | senkora wrote:
             | > Realistic scenario that gamedev uses: deoptimize
             | translation units you are interested in finding or
             | reproducing bugs.
             | 
             | Yep, looks like that's this bullet point:
             | 
             | https://dhashe.com/category/blog.html#partition-your-tus-
             | int...
        
           | mark_undoio wrote:
           | > It's a massive pain to have everything be marked as
           | "optimized out" and reassemble the things you want from other
           | variables or by using a disassembler to manually track which
           | register the value is hiding in.
           | 
           | If you've got a time traveling / reversible debugger then you
           | can (sometimes) go back to a point where the value was being
           | written / used, at which point it'll often reappear in scope
           | and be accessible.
           | 
           | I believe DWARF's built-in virtual machine should be able to
           | recompute missing values in many cases but I don't think
           | compilers are great at putting the relevant info in, even
           | where it should be possible to compute the right value fairly
           | easily.
        
             | mark_undoio wrote:
             | The other trick I've found really helpful for "optimized
             | out" values is to find places where they cross boundaries
             | that block optimizations (e.g. procedure calls to another
             | translation unit, so long as you're not doing some kind of
             | link-time optimisation).
             | 
             | e.g. if the value you're interested in is being passed to /
             | returned from a function then inspecting it around the call
             | / return site should have the value available.
        
               | o11c wrote:
               | Two particular notes around this (exact commands assuming
               | gdb, but other debuggers should have equivalents):
               | 
               | Pass various arguments to `backtrace` rather than just
               | relying on the default. Chances are it will have _some_
               | non-optimized-out variables, which you can use to figure
               | out what's going on.
               | 
               | Use `info registers` and see what looks like a pointer,
               | then cast it to a type you suspect it is. Note that this
               | can be done for _any_ stack frame.
        
           | DyslexicAtheist wrote:
           | feels strange still to see complaints about debugging in
           | production being inconvenient when we should have caught
           | these issues in test/staging. secondly I think not having
           | debug tools and debug data in production is a security
           | feature.
        
         | o11c wrote:
         | > you don't need a special debug build to be able to debug
         | 
         | Note that this is _highly_ dependent on choice of compiler.
         | Clang is utter crap for debugging even at -O1, but I've
         | encountered basically no trouble ever using GCC at -O2 (you do
         | have to learn a little about how the binary changes but that's
         | easy enough to pick up). I really would not recommend -O3;
         | historically it introduced bugs and regardless it makes the
         | build process _much_ slower, and the performance gain is fairly
         | negligible (I can't say how much it destroys debuggability due
         | to lack of experience). I can't speak for MSVC personally but
         | it's a bad sign that its culture strongly promotes separate
         | debug builds.
         | 
         | That said, sanitizers are a place where a special debug build
         | does help. Valgrind can do many of the things that sanitizers
         | can but is around 10x slower which is a real pain if you can't
         | isolate what you're targeting, so recompiling for sanitizers is
         | a good idea.
         | 
         | (Other brief notes)
         | 
         | I have never actually encountered a case where the lack of
         | frame pointers actually caused problems. As far as I'm
         | concerned, any tool that breaks without them is a broken tool.
         | (Theoretically they can speed up large traceback contexts if
         | you're doing extensive profiling; good API design probably
         | helps for the sanitizers case here)
         | 
         | Rather than assembly int3, Unix-portable `SIGTRAP` is very
         | useful for breakpoints; debuggers handle it specially. You can
         | ignore it for non-debugged runs but get breakpoints when you
         | are debugging without changing the binary or options!
         | Alternatively you could leave it unignored if you have tooling
         | that dumps core or something nicely for you.
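         | 
         | A minimal sketch of the SIGTRAP idea, assuming a POSIX system.
         | The helper names `ignore_sigtrap_outside_debugger`,
         | `debug_break` and `checked_divide` are made up for
         | illustration; only `signal`, `raise` and `SIGTRAP` come from
         | the platform:

```cpp
#include <cassert>
#include <csignal>

// Hypothetical helper: call once at startup. With SIGTRAP ignored, a plain
// run simply continues past debug_break(). A debugger tracing the process is
// still notified of the signal, so it can stop at the call site.
inline void ignore_sigtrap_outside_debugger() {
    std::signal(SIGTRAP, SIG_IGN);
}

// Hypothetical helper: a software breakpoint that needs no rebuild and no
// architecture-specific assembly such as int3.
inline void debug_break() {
    const int rc = std::raise(SIGTRAP);  // raise() returns 0 on success
    assert(rc == 0);
    (void)rc;  // silence unused-variable warnings when NDEBUG is defined
}

// Example: break when a suspicious input shows up, then carry on.
int checked_divide(int a, int b) {
    if (b == 0) {
        debug_break();
        return 0;
    }
    return a / b;
}
```

         | Call `ignore_sigtrap_outside_debugger()` once at startup. A
         | plain run then continues past `debug_break()`, while a run
         | under a debugger should stop there. Skipping the `SIG_IGN`
         | step gives the other behaviour mentioned above: the default
         | action terminates the process, usually with a core dump.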
        
           | omoikane wrote:
           | Debugging experience aside, I found that "-O3" is generally
           | worth it if you also set "-march=native". For example, here
           | are some run times for computing SHA256, you can see that
           | there is slightly more to be gained going from -O2 to -O3
           | with -march=native:
           | 
           |     -O2:               10.22
           |     -O3:                9.82
           |     -O2 -march=native:  9.86
           |     -O3 -march=native:  9.43
           | 
           | This is basically SHA256 over ~8GB of data, averaged over 5
           | runs. The numbers are rather crude here since I measured them
           | just now, but I remember it was more significant when I first
           | did it last month for
           | https://news.ycombinator.com/item?id=40687942
        
             | josephg wrote:
             | Yeah -march=native is amazing. I use it when compiling &
             | benchmarking rust code.
             | 
             | But - to anyone reading this later - please don't do this
             | blindly. You probably never want to distribute binaries
             | with this flag set. It enables all the features available
             | on the host CPU. So your build will change depending on the
             | physical cpu you have installed. If you have a modern amd
             | cpu, it may enable avx512 extensions and make your binary
             | unusable on many Intel CPUs.
        
       | hurpdurpdurp wrote:
       | Great article with lots of good advice, but it makes me wonder
       | what the consensus is on using ASAN in production.
       | 
       | Once upon a time it was widely said that ASAN should not be used
       | for production code. The authors advocated against it and from a
       | general-purpose security perspective it gives attackers a very
       | large writable memory region at a fixed offset to play with. But
       | over time I see more and more ASAN code in production on the
       | theory that ASAN _may_ make a system easier to exploit, but a
       | memory corruption _will_ make it easier to exploit. And so it's
       | better to have knowledge of the issue.
       | 
       | Also, I've personally found the glibc malloc tunables very useful
       | for debugging.
        
         | ryandrake wrote:
         | To me, leaving a debug tool on in production because it happens
         | to mask a bug is like the old (mal)practice of turning off
         | optimizations in Release builds because of hard-to-debug
         | crashes. Better to just fix the crashes.
        
       | ggambetta wrote:
       | About a million years ago (OK, more like 20) I was making casual
       | videogames in C++ and I wanted a cross-platform (Linux, Mac,
       | Windows) way to get a stack trace whenever a game crashed. What I
       | ended up doing was adding a macro to the first line of every
       | function, let's call it STACKTRACE, which was something like
       | 
       |     #define STACKTRACE \
       |         GLOBAL_STACK_FILE[GLOBAL_STACK_IDX] = __FILE__; \
       |         GLOBAL_STACK_LINE[GLOBAL_STACK_IDX++] = __LINE__; \
       |         StackTraceCleaner stc;
       | 
       | StackTraceCleaner was a class that didn't do anything but execute
       | GLOBAL_STACK_IDX-- in its destructor.
       | 
       | So at any point in time I could inspect GLOBAL_STACK_FILE and
       | GLOBAL_STACK_LINE and have a complete stack trace of the game.
       | 
       | Obviously this only worked because these games weren't
       | performance-critical and because they were essentially single-
       | threaded, but it did the job at the time. We're talking about a
       | time when Visual Studio 6's support for templates was half-
       | broken, and the STL wasn't exactly S, to the point that I had to
       | roll out my own string, smart pointers, containers, etc -- made
       | twice as hard because of the aforementioned broken template
       | support in VS6 :(
       | 
       | I do miss these simpler, more innocent times, though.
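       | 
       | A sketch of how such a macro might look today. The array names
       | and `StackTraceCleaner` follow the description above; the fixed
       | depth `kMaxDepth`, the bounds check and the `inner` / `outer`
       | demo functions are assumptions added for illustration. Like the
       | original, it only makes sense for single-threaded code:

```cpp
#include <cassert>
#include <cstddef>

// Assumption: a fixed maximum call depth; the original size is not stated.
constexpr std::size_t kMaxDepth = 256;

const char* GLOBAL_STACK_FILE[kMaxDepth];
int GLOBAL_STACK_LINE[kMaxDepth];
std::size_t GLOBAL_STACK_IDX = 0;

// Does nothing except pop the frame when the enclosing function returns
// (or unwinds because of an exception).
struct StackTraceCleaner {
    ~StackTraceCleaner() { --GLOBAL_STACK_IDX; }
};

// Goes on the first line of every function: records file and line, then
// declares the RAII object that undoes the push on scope exit.
#define STACKTRACE                                        \
    assert(GLOBAL_STACK_IDX < kMaxDepth);                 \
    GLOBAL_STACK_FILE[GLOBAL_STACK_IDX] = __FILE__;       \
    GLOBAL_STACK_LINE[GLOBAL_STACK_IDX++] = __LINE__;     \
    StackTraceCleaner stack_trace_cleaner_

// Demo: remember how deep the manual stack was inside the innermost call.
std::size_t depth_seen_in_inner = 0;

void inner() {
    STACKTRACE;
    depth_seen_in_inner = GLOBAL_STACK_IDX;  // frames for outer() and inner()
}

void outer() {
    STACKTRACE;
    inner();
}
```

       | After `outer()` returns, `GLOBAL_STACK_IDX` is back at zero. A
       | crash handler would walk the two arrays from index 0 up to
       | `GLOBAL_STACK_IDX` to print the trace.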
        
         | VikingCoder wrote:
         | Back in VS6 days, I did a similar thing, but I generated a GUID
         | for the first line of every function.
         | 
         | STACKTRACE("e0957136-fed3-414d-80b9-8bbf84f3fa03");
         | 
         | With the GUID, I could see where functions moved as they were
         | refactored.
         | 
         | I would write out the GUID to a thread-local file handle, along
         | with a time-stamp, and an "enter" or "exit" when an RAII object
         | left the stack.
         | 
         | Then I could retrospectively debug after running my program. I
         | could see the callstack, and step through in time. I would walk
         | my source and record GUID-to-filename/linenumber in a map. Then
         | I could dump out a Visual Studio output that had the file name
         | and line number, and execution time... allowing me to step
         | forward and back through the execution.
         | 
         | Stone knives and bearskins.
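         | 
         | A rough sketch of the enter/exit logging described above.
         | Assumptions: an in-memory per-thread buffer stands in for the
         | thread-local file handle, and `ScopeTrace`, `trace_log`,
         | `run_game`, `load_level` and the id "run_game-id" are
         | hypothetical names:

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <vector>

// One record per scope entry or exit.
struct TraceEvent {
    std::string id;    // stable per-function identifier (a GUID in the comment)
    bool is_enter;     // true on entry, false on exit
    long long micros;  // monotonic timestamp in microseconds
};

// Assumption: a per-thread in-memory log instead of a per-thread file.
inline std::vector<TraceEvent>& trace_log() {
    thread_local std::vector<TraceEvent> log;
    return log;
}

inline long long now_micros() {
    using namespace std::chrono;
    return duration_cast<microseconds>(
               steady_clock::now().time_since_epoch()).count();
}

// RAII object: logs "enter" when constructed and "exit" when destroyed.
class ScopeTrace {
public:
    explicit ScopeTrace(const char* id) : id_(id) {
        assert(id_ != nullptr);
        trace_log().push_back({id_, true, now_micros()});
    }
    ~ScopeTrace() { trace_log().push_back({id_, false, now_micros()}); }
    ScopeTrace(const ScopeTrace&) = delete;
    ScopeTrace& operator=(const ScopeTrace&) = delete;

private:
    const char* id_;
};

#define STACKTRACE(id) ScopeTrace scope_trace_guard_(id)

void load_level() {
    STACKTRACE("e0957136-fed3-414d-80b9-8bbf84f3fa03");
}

void run_game() {
    STACKTRACE("run_game-id");  // placeholder id, not a real GUID
    load_level();
}
```

         | Replaying the log in order gives the call stack at any point
         | in time. Mapping each id back to a file and line would be a
         | separate pass over the source, as described above.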
        
         | arjvik wrote:
         | My C++ is quite rusty, but why not have StackTraceCleaner's
         | constructor take __FILE__ and __LINE__ as arguments and update
         | the file and line arrays there?
        
           | ggambetta wrote:
           | Could have been that, honestly I'm not sure. 20-ish years :)
        
         | FooBarWidget wrote:
         | This is the same strategy I used in the Passenger application
         | server.
        
         | pjmlp wrote:
         | We used a similar technique on a CRM server, UNIX based, back
         | in 1999 - 2001.
         | 
         | Another technique was that all key allocations were
         | handle-based, so we could also easily dump what the whole
         | process map was about.
        
       | jeffbee wrote:
       | Anyone have tips on getting good stack traces in opt builds? I am
       | really struggling with it at the moment. LLVM sanitizers all
       | generate brilliant stack traces by forking llvm-symbolizer and
       | feeding it the goods. But during runtime crashes on optimized
       | binaries I don't seem to get good stack traces. One of the
       | problems is that some library backtrace functions do not print
       | the base address of the DSO mapping, which means they are
       | printing a meaningless PC that can't be used to find file and
       | line later.
        
         | saagarjha wrote:
         | Is calling dladdr on the addresses not enough for you?
        
           | jeffbee wrote:
           | It's not async signal safe, so I did not even try that.
           | 
           | I think there's a huge amount of complexity both inherent to
           | the problem and caused by fifty years of accumulated bad
           | habits, which is indicated by the thousands of lines of code
           | in compiler-rt dedicated to handling this issue. I'd like to
           | call their library functions but they are all in private-
           | looking namespaces. I also tried to use the Abseil failure
           | signal handler but it often fails to unwind and even when it
           | does unwind has a habit of just printing unknown for the
           | symbol name or file, and never prints the DSO base addresses.
        
         | bogwog wrote:
         | Have you looked into using a library like Breakpad
         | (https://chromium.googlesource.com/breakpad/breakpad/)? It's
         | probably too much work to integrate for local debugging only
         | though.
        
         | mark_undoio wrote:
         | If you're on *NIX have you tried just invoking gstack or
         | similar as an external process?
         | https://linux.die.net/man/1/gstack
         | 
         | Or, indeed, getting a core dump and applying GDB to it. GDB
         | seems generally pretty good at reconstructing stacks at
         | arbitrary points in application runtime.
         | 
         | We've also used a combination of libunwind and
         | https://linux.die.net/man/1/addr2line to produce good crash
         | dumps when GDB is not necessarily available.
        
           | jeffbee wrote:
           | To which of the projects that are all named "libunwind" do
           | you refer?
        
             | mark_undoio wrote:
             | This one, I believe: https://github.com/libunwind/libunwind
             | 
             | ETA: Thinking about it, I'm not really sure what it'd do
             | for C++ - I guess you'd end up with mangled names, so if
             | you want sensible names you might need to demangle (either
             | as a post-processing step or within the dumper) too.
             | 
             | I don't think you'll get any decoded argument values out of
             | it either, so I guess it depends what backtrace info is
             | needed.
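             | 
             | If demangling inside the dumper is wanted, one option on
             | GCC and Clang toolchains is `abi::__cxa_demangle` from
             | `<cxxabi.h>`. A small sketch; the wrapper name `demangle`
             | is an assumption, and since it allocates it is not
             | something to call from inside a signal handler:

```cpp
#include <cassert>
#include <cstdlib>
#include <cxxabi.h>
#include <memory>
#include <string>

// Returns the demangled form of an Itanium-ABI symbol name, or the input
// unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    assert(mangled != nullptr);
    int status = 0;
    // __cxa_demangle returns a malloc'd buffer, so free() is the deleter.
    std::unique_ptr<char, decltype(&std::free)> out(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), &std::free);
    return (status == 0 && out) ? std::string(out.get())
                                : std::string(mangled);
}
```

             | For example, `demangle("_ZN3foo3barEv")` gives
             | `foo::bar()`. Piping the raw dump through `c++filt`
             | afterwards is the post-processing alternative.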
        
         | o11c wrote:
         | Rule number one: never use Clang; its optimizers destroy too
         | much information unlike GCC.
         | 
         | You can use `dl_iterate_phdr` at startup if you need DSO info?
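         | 
         | A sketch of recording DSO base addresses at startup with
         | `dl_iterate_phdr`, assuming Linux/glibc or another ELF
         | platform that provides `<link.h>`. `LoadedObject`,
         | `collect_object` and `snapshot_loaded_objects` are
         | hypothetical names:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <link.h>  // dl_iterate_phdr, dl_phdr_info
#include <string>
#include <vector>

struct LoadedObject {
    std::string path;     // typically empty for the main executable on glibc
    std::uintptr_t base;  // load address; subtract it from a runtime PC
};

// Callback invoked by dl_iterate_phdr once per loaded ELF object.
static int collect_object(dl_phdr_info* info, std::size_t /*size*/, void* data) {
    assert(info != nullptr && data != nullptr);
    auto* out = static_cast<std::vector<LoadedObject>*>(data);
    out->push_back({info->dlpi_name ? info->dlpi_name : "",
                    static_cast<std::uintptr_t>(info->dlpi_addr)});
    return 0;  // a non-zero return would stop the iteration early
}

// Take the snapshot during startup, not inside a signal handler: this
// allocates memory. Objects loaded later via dlopen() will be missing.
std::vector<LoadedObject> snapshot_loaded_objects() {
    std::vector<LoadedObject> objects;
    dl_iterate_phdr(collect_object, &objects);
    return objects;
}
```

         | Subtracting `base` from a crash PC that falls inside that
         | object gives an offset that a tool such as addr2line can
         | resolve against the file on disk.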
        
         | dllu wrote:
         | I enjoy using backward: https://github.com/bombela/backward-cpp
        
           | jeffbee wrote:
           | Looks worth investigating. Also making me wonder how many
           | different backtrace implementations are out there on GitHub
           | with Google copyrights!
        
         | jcranmer wrote:
         | Mozilla has a tool to fix up the bad dladdr-based printing
         | methods in log files here: https://github.com/mozilla/fix-
         | stacks/. Note that it relies on doing a little bit of post-
         | processing on dladdr to get the base of the DSO it is in:
         | https://searchfox.org/mozilla-central/source/mozglue/misc/St...
         | 
         | As for whether or not you can use this in a signal handler...
         | well, I hate reading the POSIX standard with regard to signal
         | safety because it's just not well-written, but as far as I can
         | tell, a non-async-signal-safe function can be safely called
         | from a signal handler for a synchronous signal (which most of
         | the interesting signals for dumping stack traces are--it's only
         | something like dump-stack-trace-on-SIGUSR1 that's actually
         | going to be an asynchronous signal), so long as it is not
         | interrupting a non-async-signal-safe function. So as long as
         | you're not crashing in libc, it should be kosher.
        
       | forrestthewoods wrote:
       | Great post. I'm surprised it requires so much effort. On Windows
       | you pretty much just need to make a debug build and... that's it!
       | 
       | A nice trick with MSVC is you can turn off optimizations for a TU
       | or any block of code with:
       | 
       |     #pragma optimize( "", off )
       | 
       | Leaps and bounds easier than hacking the build system.
        
         | FpUser wrote:
         | >"On Windows you pretty much just need to make a debug build
         | and... that's it!"
         | 
         | I got pretty much the same on Linux when using CLion IDE from
         | JetBrains.
        
         | aseipp wrote:
         | I mean, to be fair, if you're fiddling with/invoking cl.exe
         | manually then you'd need to know a lot of the general
         | equivalents and nits listed here. MSVC's debug build will do a
         | lot for you out of the box though which is great. That said you
         | often have to support/know 400 random build tools when using
         | C++ to enable things like this, so it's often useful knowledge
         | anyway.
        
         | o11c wrote:
         | Assuming spatulas are absent, I've never had a problem with the
         | build system on Linux. Nonetheless, for GCC the equivalent is:
         | 
         |     #pragma GCC optimize ("O0")
         | 
         | (see also target, push_options, and pop_options)
         | 
         | This is also available as a per-function attribute, using both
         | gnu and standard syntaxes:
         | 
         |     __attribute__((optimize("O0")))
         |     [[gnu::optimize("O0")]]
        
           | bialpio wrote:
           | For completeness, in Clang `[[clang::optnone]]` as a per-
           | function attribute also works fine. I'm using it for
           | debugging quite frequently lately.
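           | 
           | For code that has to build with more than one compiler, the
           | attributes above can be hidden behind a macro. A sketch;
           | `NO_OPT` and `checksum` are made-up names, and other
           | compilers are left as a no-op here (on MSVC the
           | `#pragma optimize` form mentioned earlier would be needed
           | instead):

```cpp
#include <cassert>

// Hypothetical macro name. Clang is tested first because it also defines
// __GNUC__ but does not implement the GCC optimize attribute.
#if defined(__clang__)
#  define NO_OPT [[clang::optnone]]
#elif defined(__GNUC__)
#  define NO_OPT [[gnu::optimize("O0")]]
#else
#  define NO_OPT
#endif

// Intended to stay unoptimized even in an otherwise optimized build, so
// locals such as `sum` and `i` remain visible in the debugger.
NO_OPT int checksum(const unsigned char* data, int len) {
    assert(data != nullptr || len == 0);
    int sum = 0;
    for (int i = 0; i < len; ++i) {
        sum += data[i];
    }
    return sum;
}
```

           | Tagging only the function under investigation keeps the
           | rest of the translation unit at full speed.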
        
       | Galanwe wrote:
       | What irks me the most is that in 2024, I still can't reliably
       | embed source code in dwarf5 to get meaningful source-
       | contextualized stacktraces and have to ship source code
       | separately and override the source mapping.
        
         | forrestthewoods wrote:
         | Strong agree.
         | 
         | At least on Windows you can set up Symbol Server + Source
         | Indexing to achieve the same result.
         | 
         | Once upon a time I wrote a small tool that can embed full
         | source code into PDBs. I doubt anyone has ever used it though.
         | For proprietary software it's not uncommon to leak PDBs on
         | accident at some point. It could be disastrous to also leak
         | full source code!
         | 
         | https://www.forrestthewoods.com/blog/embedding-source-code-i...
         | 
         | It's relatively easy to add source indexing to PDBs. I've
         | successfully done that for a non-standard Monorepo. Works
         | great.
        
           | mark_undoio wrote:
           | There's `debuginfod` on Linux:
           | https://developers.redhat.com/blog/2019/10/14/introducing-
           | de...
           | 
           | It builds a lot on quite a simple conceptual base, benefiting
           | from native support in gcc / clang (for embedding unique
           | build IDs) and in GDB (for contacting the server). It can
           | serve up both source and symbol information.
           | 
           | I would like to see this adopted more - e.g. build
           | infrastructure automatically populating a debuginfod server
           | so debugging is seamless.
        
           | becurious wrote:
           | There is SourceLink where you can get the mapping into the
           | pdb files:
           | 
           | https://github.com/dotnet/sourcelink
        
       | Const-me wrote:
       | Tangentially related, a few tips about offline debugging on
       | Windows: http://const.me/articles/windbg/windbg-intro.pdf
       | 
       | Not a silver bullet but still, being able to collect and analyze
       | user-mode crash dumps is sometimes the best way to investigate
       | and fix bugs.
        
       | weinzierl wrote:
       | Good points, but for me number one would be to avoid runtime
       | polymorphism like the plague.
       | 
       | If your call graph has more roots than your neighbor's garden
       | and the whole thing is a forest and not a tree, you will have a
       | hard time understanding, analyzing and ultimately debugging.
        
       | binary132 wrote:
       | I particularly liked the point that not every TU needs to be
       | compiled in debug mode. I am working on a build system and now
       | I'm thinking of setting aside some time to make sure debug and
       | optimization are object-level options. In general, I think the
       | usefulness of a well specified ABI over object code is vastly
       | underappreciated!
        
       | renox wrote:
       | > Avoid stepping into irrelevant code
       | 
       | Thanks a lot!! I don't know how many times I've stepped into the
       | C++ standard library and it gets really annoying..
        
       ___________________________________________________________________
       (page generated 2024-07-29 23:00 UTC)