[HN Gopher] Rust: Investigating an Out of Memory Error
       ___________________________________________________________________
        
       Rust: Investigating an Out of Memory Error
        
       Author : erebe__
       Score  : 92 points
       Date   : 2025-01-15 09:07 UTC (4 days ago)
        
 (HTM) web link (www.qovery.com)
 (TXT) w3m dump (www.qovery.com)
        
       | xgb84j wrote:
       | In the article they talk about how printing an error from the
       | anyhow crate in debug format creates a full backtrace, which
       | leads to an OOM error. This happens even with 4 GB of memory.
       | 
       | Why does creating a backtrace need such a large amount of memory?
       | Is there a memory leak involved as well?
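        | 
        | A minimal sketch of the pattern in question, assuming the
        | anyhow crate and RUST_BACKTRACE=1 in the environment; the
        | Debug formatting is what forces symbolization:
        | 
        |     use anyhow::anyhow;
        | 
        |     fn main() {
        |         // With RUST_BACKTRACE=1, anyhow captures raw frame
        |         // addresses when the error is created (cheap);
        |         // symbol resolution is deferred.
        |         let err = anyhow!("connection reset by peer");
        | 
        |         // Printing with {:?} renders the backtrace; this is
        |         // when the debug sections are read and every frame
        |         // is symbolized.
        |         eprintln!("{:?}", err);
        |     }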
        
         | prerok wrote:
          | Well, based on the article, if there were a memory leak
          | they would see a steady increase in memory consumption,
          | which was not the case.
          | 
          | The only explanation I can see (if their conclusion is
          | accurate) is that the end result of the symbolization is
          | more than 400 MB of additional memory consumption (which is
          | a lot, in my opinion), while the process of symbolization
          | itself requires more than 2 GB of additional memory (which
          | is an incredible amount).
        
           | prerok wrote:
           | The author replied with additional explanations, so it seems
           | that the additional 400MB were needed because the debug
           | symbols were compressed.
        
         | CodesInChaos wrote:
         | I don't think the 4 GiB instance actually ran into an OOM
         | error. They merely observed a 400 MiB memory spike. The
         | crashing instances were limited to 256 and later 512 MiB.
         | 
          | (Assuming the article incorrectly wrote "Mib" where it
          | meant "MiB"; used correctly, b = bit and B = byte.)
        
         | erebe__ wrote:
         | Sorry if the article is misleading.
         | 
          | The first increase of the memory limit was not to 4 GiB,
          | but to something roughly around 300-400 MiB, and the OOM
          | did happen again with this setting.
          | 
          | That led to a second increase, to 4 GiB, to be sure the app
          | would not get OOM killed when the behavior got triggered.
          | We needed the app to stay alive/running for us to
          | investigate with memory profiling.
          | 
          | Regarding the increase of 400 MiB, yeah, it is a lot, and
          | it was a surprise to us too. We were not expecting such an
          | increase. There are, I think, two reasons behind this:
          | 
          | 1. This service is a gRPC server, which has a lot of
          | generated code, so lots of symbols.
          | 
          | 2. We compile the binary with debug symbols and a flag to
          | compress the debug symbol sections to avoid having a huge
          | binary, which may be part of this issue.
        
           | CodesInChaos wrote:
            | > we compile the binary with debug symbols and a flag to
            | compress the debug symbol sections to avoid having a huge
            | binary.
            | 
            | How big are the uncompressed debug symbols? I'd expect
            | processing uncompressed debug symbols to happen via a
            | memory-mapped file, while compressed debug symbols
            | probably need to be extracted into anonymous memory.
           | 
           | https://github.com/llvm/llvm-project/issues/63290
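            | 
            | A rough sketch of the contrast described above, assuming
            | the memmap2 and flate2 crates (an illustration, not the
            | actual backtrace machinery):
            | 
            |     use std::fs::File;
            |     use std::io::Read;
            | 
            |     use flate2::read::ZlibDecoder;
            |     use memmap2::Mmap;
            | 
            |     // Uncompressed debug info can be memory-mapped:
            |     // pages are file-backed, shared, and evictable
            |     // under memory pressure.
            |     fn map_uncompressed(path: &str) -> std::io::Result<Mmap> {
            |         let file = File::open(path)?;
            |         unsafe { Mmap::map(&file) }
            |     }
            | 
            |     // Compressed debug info must be inflated into
            |     // anonymous memory, which counts fully against the
            |     // process RSS and cannot be paged back to the file.
            |     fn inflate_compressed(section: &[u8]) -> std::io::Result<Vec<u8>> {
            |         let mut out = Vec::new();
            |         ZlibDecoder::new(section).read_to_end(&mut out)?;
            |         Ok(out)
            |     }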
        
             | erebe__ wrote:
              | Normal build:
              | 
              |     cargo build --bin engine-gateway --release
              |     Finished `release` profile [optimized + debuginfo] target(s) in 1m 00s
              | 
              |     ls -lh target/release/engine-gateway
              |     .rwxr-xr-x erebe erebe 198 MB Sun Jan 19 12:37:35 2025 target/release/engine-gateway
              | 
              | What we ship:
              | 
              |     export RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib -C force-frame-pointers=yes"
              |     cargo build --bin engine-gateway --release
              |     Finished `release` profile [optimized + debuginfo] target(s) in 1m 04s
              | 
              |     ls -lh target/release/engine-gateway
              |     .rwxr-xr-x erebe erebe 61 MB Sun Jan 19 12:39:13 2025 target/release/engine-gateway
              | 
              | The diff is more impressive on some bigger projects.
        
               | adastra22 wrote:
                | The compressed symbols sound like the likely culprit.
                | Do you really need a small executable? The
                | uncompressed symbols need to be loaded into RAM
                | anyway, and if that is delayed until they are needed,
                | you will have to allocate memory to uncompress them.
        
               | erebe__ wrote:
                | I will give it a shot next week ;P
                | 
                | For this particular service, the size does not really
                | matter. For others, it makes more of a difference
                | (several hundred MB), and as we deploy on customers'
                | infra, we want image sizes to stay reasonable. For
                | now, we apply the same build rules to all our
                | services to stay consistent.
        
               | adastra22 wrote:
               | Maybe I'm not communicating well. Or maybe I don't
               | understand how the debug symbol compression works at
               | runtime. But my point is that I don't think you are
               | getting the tradeoff you think you are getting. The
                | smaller executable may end up using _more_ RAM.
                | Usually, at the deployment stage, that's what matters.
               | 
               | Smaller executables are more for things like reducing
               | distribution sizes, or reducing process launch latency
               | when disk throughput is the issue. When you invoke
               | compression, you are explicitly trading off runtime
               | performance in order to get the benefit of smaller on-
               | disk or network transmission size. For a hosted service,
               | that's usually not a good tradeoff.
        
               | erebe__ wrote:
                | It is most likely me reading too quickly. I was
                | caught off guard by the article gaining traction on a
                | Sunday, and as I have other duties during the
                | weekend, I am reading/responding only when I can
                | sneak in.
                | 
                | As for your comment, I think you are right that the
                | compression of debug symbols adds to the peak memory,
                | but I think you are mistaken in thinking that the
                | debug symbols are decompressed when the app/binary is
                | started/loaded. For me, decompression only happens
                | when the section is accessed by a debugger or
                | equivalent. It is not the same thing as when the
                | binary is fully compressed, like with upx for
                | example.
               | 
                | I have done a quick sanity check on my desktop with:
                | 
                |     [profile.release]
                |     lto = "thin"
                |     debug = true
                |     strip = false
                | 
                |     export RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib -C force-frame-pointers=yes"
                |     cargo build --bin engine-gateway --release
               | 
                | From RSS memory at startup I get ~128 MB, and after
                | the panic, at peak, I get ~627 MB.
               | 
                | When compiled with these flags:
                | 
                |     export RUSTFLAGS="-C force-frame-pointers=yes"
                |     cargo build --bin engine-gateway --release
               | 
                | From RSS memory at startup I get ~128 MB, and after
                | the panic, at peak, I get ~474 MB.
               | 
                | So the peak is indeed taller when the debug section
                | is compressed, but the binary's memory when started
                | is roughly equivalent (virtual memory too).
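                | 
                | For reference, a rough sketch of how such a check can
                | be done (assuming Linux and /proc):
                | 
                |     use std::fs;
                | 
                |     // Parse VmRSS (in KiB) out of /proc/self/status.
                |     fn rss_kib() -> u64 {
                |         fs::read_to_string("/proc/self/status").unwrap()
                |             .lines()
                |             .find(|l| l.starts_with("VmRSS:"))
                |             .and_then(|l| l.split_whitespace().nth(1))
                |             .and_then(|v| v.parse().ok())
                |             .unwrap()
                |     }
                | 
                |     fn main() {
                |         println!("startup: {} KiB", rss_kib());
                |         // Formatting the backtrace is what forces
                |         // symbolization (and any decompression).
                |         let bt = std::backtrace::Backtrace::force_capture();
                |         let _rendered = format!("{bt}");
                |         println!("after: {} KiB", rss_kib());
                |     }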
               | 
                | I had a hard time finding a source to validate my
                | belief regarding when the debug symbols are
                | decompressed. But based on
                | https://inbox.sourceware.org/binutils/20080622061003.D279F3F...
                | and the help of claude.ai, I would say it is only
                | when those sections are accessed.
               | 
                | For what it's worth, the whole answer from claude.ai:
                | 
                | The debug sections compressed with
                | --compress-debug-sections=zlib are decompressed:
                | 
                | At runtime by the debugger (like GDB) when it needs
                | to access the debug information:
                | - When setting breakpoints
                | - When doing backtraces
                | - When inspecting variables
                | - During symbol resolution
                | 
                | When tools need to read debug info:
                | - During coredump analysis
                | - When using tools like addr2line
                | - During source-level debugging
                | - When using readelf with the -w option
                | 
                | The compression is transparent to these tools - they
                | automatically handle the decompression when needed.
                | The sections remain compressed on disk, and are only
                | decompressed in memory when required. This helps
                | reduce the binary size on disk while still
                | maintaining full debugging capabilities, with only a
                | small runtime performance cost when the debug info
                | needs to be accessed. The decompression is handled by
                | the libelf/DWARF libraries that these tools use to
                | parse the ELF files.
        
               | adastra22 wrote:
               | Thanks for running these checks. I'm learning from this
               | too!
        
               | the8472 wrote:
               | Container images use compression too, so having the debug
               | section uncompressed shouldn't actually make the images
               | any bigger.
        
           | the8472 wrote:
           | > 2. we compile the binary with debug symbols
           | 
            |  _symbols_ are usually included even with debug level 0,
            | unless stripped [0]. And debuginfo is configurable at
            | several levels [1]. If you've set it to 2/full, try
            | dropping to a lower level; that might also result in less
            | data to load for the backtrace implementation.
           | 
            | [0] https://users.rust-lang.org/t/difference-between-strip-symbo...
            | [1] https://doc.rust-lang.org/cargo/reference/profiles.html#debu...
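            | 
            | For example, a sketch of what a lower level can look like
            | in Cargo.toml (line tables only, still enough for
            | readable backtraces):
            | 
            |     [profile.release]
            |     # 2/true = full debuginfo; 1 = "limited";
            |     # "line-tables-only" keeps just file/line info.
            |     debug = "line-tables-only"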
        
             | erebe__ wrote:
             | Thanks, was not aware there was granularity for debuginfo
             | ;)
        
           | delusional wrote:
           | > Sorry if the article is misleading.
           | 
           | I don't think the article is misleading, but I do think it's
           | a shame that all the interesting info is saved for this
           | hackernews comment. I think it would make for a more exciting
           | article if you included more of the analysis along with the
           | facts. Remember, as readers we don't know anything about your
           | constraints/system.
        
             | erebe__ wrote:
              | It was a deliberate choice on my part: I wanted the
              | article to stay focused on the how, not so much on the
              | why. But I agree that even though the context is
              | specific to us, many people wanted more about the
              | surroundings and why it happened. I wanted to explain
              | the method ¯\_(ツ)_/¯
        
           | xgb84j wrote:
           | Thank you for this in-depth reply! Your answer makes a lot of
           | sense. Also thank you for writing the article!
        
       | malkia wrote:
       | Can't they print something that llvm-symbolizer would pick up
       | offline?
        
         | dgrunwald wrote:
         | Yes, that is typically the way to go.
         | 
         | Collecting a call stack only requires unwinding information
         | (which is usually already present for C++ exceptions / Rust
         | panics), not full debug symbols. This gives you a list of
         | instruction pointers. (on Linux, the glibc `backtrace` function
         | can help with this)
         | 
         | Print those instruction pointers in a relative form (e.g.
         | "my_binary+0x1234") so that the output is independent of ASLR.
         | 
         | The above is all that needs to happen on the
         | production/customer machines, so you don't need to ship debug
         | symbols -- you can ship `strip`ped binaries.
         | 
          | On your own infrastructure, keep the original un-stripped
          | binaries around. We use a script involving elfutils'
          | eu-addr2line with those original binaries to turn the
          | module+relative_address stack trace into a readable
          | symbolized stack trace. I wasn't aware of llvm-symbolizer
          | yet; it seems it can do the same job as eu-addr2line.
          | (There's also binutils' addr2line, but in my experience it
          | didn't work as well as eu-addr2line.)
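          | 
          | A minimal sketch of the capture side, using the backtrace
          | crate (rather than glibc's backtrace) to collect raw
          | instruction pointers without in-process symbolization;
          | making them module-relative, e.g. via /proc/self/maps, is
          | left out:
          | 
          |     // Walk the stack, recording raw instruction pointers
          |     // only; no debug info is touched.
          |     fn capture_raw_ips() -> Vec<usize> {
          |         let mut ips = Vec::new();
          |         backtrace::trace(|frame| {
          |             ips.push(frame.ip() as usize);
          |             true // continue to the next frame
          |         });
          |         ips
          |     }
          | 
          |     fn main() {
          |         for ip in capture_raw_ips() {
          |             // Feed these (made relative to the module
          |             // base) to llvm-symbolizer or eu-addr2line.
          |             println!("{ip:#x}");
          |         }
          |     }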
        
       | BimJeam wrote:
        | I once had a faulty Python-based AI image generator running
        | on my machine that used all 64 gigs of RAM and OOMed with a
        | memory dump written to the fs. This is no fun when it
        | happens. But mostly these kinds of bugs are misconfigurations
        | or bad code, never-ending while loops, whatever.
        
       | CodesInChaos wrote:
        | The analysis looks rather half-finished. They did not analyze
        | why so much memory was consumed: whether it is a cache that
        | persists after the first call, temporary working memory, or
        | an accumulating memory leak. Nor why it uses so much memory
        | at all.
        | 
        | I couldn't find any other complaints about Rust backtrace
        | printing consuming a lot of memory, which I would have
        | expected if this were normal behaviour. So I wonder if there
        | is anything special about their environment or use case?
       | 
       | I would assume that the same OOM problem would arise when
       | printing a panic backtrace. Either their instance has enough
       | memory to print backtraces, or it doesn't. So I don't understand
       | why they only disable lib backtraces.
        
         | erebe__ wrote:
         | Hello,
         | 
         | You can see my other comment
         | https://news.ycombinator.com/item?id=42708904#42756072 for more
         | details.
         | 
          | But yes, the cache does persist after the first call; the
          | resolved symbols stay in the cache to speed up the
          | resolution of subsequent calls.
          | 
          | Regarding the why, it is mainly because:
          | 
          | 1. This app is a gRPC server and contains a lot of
          | generated code (you can investigate binary bloat in Rust
          | with https://github.com/RazrFalcon/cargo-bloat).
          | 
          | 2. We ship our binary with debug symbols, with these
          | options:
          | 
          |     ENV RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib -C force-frame-pointers=yes"
         | 
          | For the panic: indeed, I had the same question on Reddit.
          | For this particular service, we don't expect panics at all;
          | it is just that by default we ship all our Rust binaries
          | with backtrace enabled. And we have added an extra API
          | endpoint to trigger a caught panic on purpose, for other
          | apps, to be sure our sizing is correct.
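          | 
          | A sketch of what the core of such an endpoint can look
          | like (names illustrative, not our actual service code):
          | 
          |     fn trigger_test_panic() {
          |         // The default panic hook prints the message (and,
          |         // with backtrace enabled, the symbolized
          |         // backtrace) before catch_unwind returns.
          |         let result = std::panic::catch_unwind(|| {
          |             panic!("synthetic panic to exercise sizing");
          |         });
          |         assert!(result.is_err());
          |     }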
        
       | samanthasu wrote:
        | Great one! I would recommend this hands-on guide for
        | diagnosing memory leaks in Rust applications. It explains how
        | to enable heap profiling in jemalloc, collect memory
        | allocation data, and generate flame graphs for analysis.
        | https://greptime.com/blogs/2024-01-18-memory-leak#diagnosing...
        
       | tasn wrote:
       | Reminds me a bit of a post my colleague wrote a while back:
       | https://www.svix.com/blog/heap-fragmentation-in-rust-applica...
        
         | dpc_01234 wrote:
          | Yep, I just recently discovered that pretty much any
          | long-lived app in Rust should switch to jemalloc for this
          | reason.
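          | 
          | A minimal sketch of the switch, assuming the
          | tikv-jemallocator crate:
          | 
          |     // Route all allocations through jemalloc instead of
          |     // the system allocator.
          |     #[global_allocator]
          |     static GLOBAL: tikv_jemallocator::Jemalloc =
          |         tikv_jemallocator::Jemalloc;
          | 
          |     fn main() {
          |         let v = vec![0u8; 1 << 20]; // served by jemalloc
          |         println!("{} bytes allocated", v.len());
          |     }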
        
       | submeta wrote:
        | Ahh, I did this in Python before I learned about Cursor and
        | Sourcegraph's Cody. I'd use a template where I provide a tree
        | of my project structure, then put the code file contents into
        | my template file, ending up with the full repo in one giant
        | markdown file. This only worked for smaller projects, but it
        | worked damn well for providing the full context to an LLM and
        | then asking questions about my code :)
        
       ___________________________________________________________________
       (page generated 2025-01-19 23:01 UTC)