[HN Gopher] Rust: Investigating an Out of Memory Error
___________________________________________________________________
Rust: Investigating an Out of Memory Error
Author : erebe__
Score : 92 points
Date : 2025-01-15 09:07 UTC (4 days ago)
(HTM) web link (www.qovery.com)
(TXT) w3m dump (www.qovery.com)
| xgb84j wrote:
| In the article they talk about how printing an error from the
| anyhow crate in debug format creates a full backtrace, which
| leads to an OOM error. This happens even with 4 GB of memory.
|
| Why does creating a backtrace need such a large amount of memory?
| Is there a memory leak involved as well?
| prerok wrote:
| Well, based on the article, if there was a memory leak they
| should have seen a steady increase in memory consumption,
| which was not the case.
|
| The only explanation I can see (if their conclusion is
| accurate) is that the end result of the symbolization is more
| than 400MB of additional memory consumption (which is a lot in
| my opinion), while the process of symbolization itself requires
| more than 2GB of additional memory (which is an incredible
| amount).
| prerok wrote:
| The author replied with additional explanations, so it seems
| that the additional 400MB were needed because the debug
| symbols were compressed.
| CodesInChaos wrote:
| I don't think the 4 GiB instance actually ran into an OOM
| error. They merely observed a 400 MiB memory spike. The
| crashing instances were limited to 256 and later 512 MiB.
|
| (Assuming that the article incorrectly used Mib when they
| meant MiB. Used correctly, b=bit and B=byte.)
| erebe__ wrote:
| Sorry if the article is misleading.
|
| The first increase of the memory limit was not to 4G, but to
| something roughly around 300MiB/400MiB, and the OOM did happen
| again with this setting.
|
| That led to a 2nd increase, to 4Gi, to be sure the app would
| not get OOM killed when the behavior got triggered. We needed
| the app to stay alive/running for us to investigate with
| memory profiling.
|
| Regarding the increase of 400MiB, yeah, it is a lot, and it
| was a surprise to us too. We were not expecting such an
| increase. There are, I think, two reasons behind this.
|
| 1. This service is a gRPC server, which has a lot of generated
| code, so lots of symbols.
|
| 2. We compile the binary with debug symbols and a flag to
| compress the debug-symbol sections to avoid having a huge
| binary, which may be part of this issue.
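|
| (For illustration, one way to verify that a shipped binary's
| debug sections are in fact compressed; my example, not from
| the article:)
|
|     # sections compressed by --compress-debug-sections carry
|     # the "C" (compressed) flag in readelf's section listing
|     readelf -WS target/release/engine-gateway | grep debug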
| CodesInChaos wrote:
| > we compile the binary with debug symbols and a flag to
| compress the debug-symbol sections to avoid having a huge
| binary.
|
| How big are the uncompressed debug symbols? I'd expect
| processing uncompressed debug symbols to happen via a memory-
| mapped file, while compressed debug symbols probably need to
| be extracted into anonymous memory.
|
| https://github.com/llvm/llvm-project/issues/63290
| erebe__ wrote:
| Normal build:
|
|     cargo build --bin engine-gateway --release
|     Finished `release` profile [optimized + debuginfo]
|     target(s) in 1m 00s
|
|     ls -lh target/release/engine-gateway
|     .rwxr-xr-x erebe erebe 198 MB Sun Jan 19 12:37:35 2025
|     target/release/engine-gateway
|
| What we ship:
|
|     export RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib \
|       -C force-frame-pointers=yes"
|     cargo build --bin engine-gateway --release
|     Finished `release` profile [optimized + debuginfo]
|     target(s) in 1m 04s
|
|     ls -lh target/release/engine-gateway
|     .rwxr-xr-x erebe erebe 61 MB Sun Jan 19 12:39:13 2025
|     target/release/engine-gateway
|
| The diff is more impressive on some bigger projects.
| adastra22 wrote:
| The compressed symbols sound like the likely culprit. Do
| you really need a small executable? The uncompressed
| symbols need to be loaded into RAM anyway, and if loading
| is delayed until the symbols are needed, then you will have
| to allocate memory to decompress them.
| erebe__ wrote:
| I will give it a shot next week to try it out ;P
|
| For this particular service, the size does not really
| matter. For others, it makes more of a difference (several
| hundred MB), and as we deploy on customers' infra, we want
| image sizes to stay reasonable. For now, we apply the same
| build rules to all our services to stay consistent.
| adastra22 wrote:
| Maybe I'm not communicating well. Or maybe I don't
| understand how the debug symbol compression works at
| runtime. But my point is that I don't think you are
| getting the tradeoff you think you are getting. The
| smaller executable may end up using _more_ RAM. Usually,
| at the deployment stage, that's what matters.
|
| Smaller executables are more for things like reducing
| distribution sizes, or reducing process launch latency
| when disk throughput is the issue. When you invoke
| compression, you are explicitly trading off runtime
| performance to get the benefit of smaller on-disk or
| network transmission size. For a hosted service, that's
| usually not a good tradeoff.
| erebe__ wrote:
| It is most likely me reading too quickly. I was caught
| off guard by the article gaining traction on a Sunday,
| and as I have other duties during the weekend, I am
| reading/responding only when I can sneak time in.
|
| As for your comment, I think you are right that the
| compression of debug symbols adds to the peak memory, but
| I think you are mistaken in thinking the debug symbols
| are decompressed when the app/binary is started/loaded.
| For me, decompression only happens when the section is
| accessed by a debugger or equivalent. It is not the same
| thing as when the binary is fully compressed, like with
| upx for example.
|
| I have done a quick sanity check on my desktop. Compiled
| with:
|
|     [profile.release]
|     lto = "thin"
|     debug = true
|     strip = false
|
|     export RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib \
|       -C force-frame-pointers=yes"
|     cargo build --bin engine-gateway --release
|
| the RSS at startup is ~128 MB, and the peak after the panic
| is ~627 MB.
|
| When compiled with these flags instead:
|
|     export RUSTFLAGS="-C force-frame-pointers=yes"
|     cargo build --bin engine-gateway --release
|
| the RSS at startup is ~128 MB, and the peak after the panic
| is ~474 MB.
|
| So the peak is indeed higher when the debug section is
| compressed, but the binary in memory when started is
| roughly equivalent (virtual memory too).
|
| I had a hard time finding a source to validate my belief
| regarding when the debug symbols are decompressed. But
| based on
| https://inbox.sourceware.org/binutils/20080622061003.D279F3F...
| and the help of claude.ai, I would say it is only when
| those sections are accessed.
|
| For what it's worth, the whole answer from claude.ai:
|
|     The debug sections compressed with
|     --compress-debug-sections=zlib are decompressed:
|
|     At runtime by the debugger (like GDB) when it needs to
|     access the debug information:
|       - When setting breakpoints
|       - When doing backtraces
|       - When inspecting variables
|       - During symbol resolution
|
|     When tools need to read debug info:
|       - During coredump analysis
|       - When using tools like addr2line
|       - During source-level debugging
|       - When using readelf with the -w option
|
|     The compression is transparent to these tools - they
|     automatically handle the decompression when needed. The
|     sections remain compressed on disk, and are only
|     decompressed in memory when required. This helps reduce
|     the binary size on disk while still maintaining full
|     debugging capabilities, with only a small runtime
|     performance cost when the debug info needs to be
|     accessed. The decompression is handled by the
|     libelf/DWARF libraries that these tools use to parse the
|     ELF files.
| adastra22 wrote:
| Thanks for running these checks. I'm learning from this
| too!
| the8472 wrote:
| Container images use compression too, so having the debug
| section uncompressed shouldn't actually make the images
| any bigger.
| the8472 wrote:
| > 2. we compile the binary with debug symbols
|
| _symbols_ are usually included even with debug level 0,
| unless stripped [0]. And debuginfo is configurable at several
| levels [1]; see the sketch after the links below. If you've
| set it to 2/full, try dropping to a lower level; that might
| also result in less data to load for the backtrace
| implementation.
|
| [0] https://users.rust-lang.org/t/difference-between-strip-
| symbo... [1] https://doc.rust-
| lang.org/cargo/reference/profiles.html#debu...
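|
| A minimal Cargo.toml sketch of those levels (illustrative;
| value names as documented by Cargo):
|
|     [profile.release]
|     # debug = 2                 # full debuginfo (alias: true)
|     # debug = 1                 # "limited": no type/variable info
|     debug = "line-tables-only"  # enough for readable backtraces
|     # debug = 0                 # none (alias: false); symbols may
|     #                           # remain unless stripped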
| erebe__ wrote:
| Thanks, I was not aware there was granularity for
| debuginfo ;)
| delusional wrote:
| > Sorry if the article is misleading.
|
| I don't think the article is misleading, but I do think it's
| a shame that all the interesting info is saved for this
| Hacker News comment. I think it would make for a more
| exciting article if you included more of the analysis along
| with the facts. Remember, as readers we don't know anything
| about your constraints/system.
| erebe__ wrote:
| It was a deliberate choice on my part: I wanted the article
| to stay focused on the how, not so much on the why. But I
| agree that even though the context is specific to us, many
| people were more interested in the surroundings and why it
| happened. I wanted to explain the method ¯\_(ツ)_/¯
| xgb84j wrote:
| Thank you for this in-depth reply! Your answer makes a lot of
| sense. Also thank you for writing the article!
| malkia wrote:
| Can't they print something that llvm-symbolizer would pick up
| offline?
| dgrunwald wrote:
| Yes, that is typically the way to go.
|
| Collecting a call stack only requires unwinding information
| (which is usually already present for C++ exceptions / Rust
| panics), not full debug symbols. This gives you a list of
| instruction pointers. (on Linux, the glibc `backtrace` function
| can help with this)
|
| Print those instruction pointers in a relative form (e.g.
| "my_binary+0x1234") so that the output is independent of ASLR.
|
| The above is all that needs to happen on the
| production/customer machines, so you don't need to ship debug
| symbols -- you can ship `strip`ped binaries.
|
| On your own infrastructure, keep the original un-stripped
| binaries around. We use a script involving elfutils' eu-
| addr2line with those original binaries to turn the
| module+relative_address stack trace into a readable
| symbolized stack trace. I wasn't aware of llvm-symbolizer
| yet; it seems it can do the same job as eu-addr2line.
| (There's also binutils' addr2line, but in my experience that
| didn't work as well as eu-addr2line.)
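|
| For illustration, a minimal sketch of the capture side using
| the `backtrace` crate (my example, not code from this
| thread):
|
|     use backtrace::Backtrace;
|
|     fn log_raw_backtrace() {
|         // Walk the stack without resolving symbols, so no
|         // debug info is read and no symbol cache is built up.
|         let bt = Backtrace::new_unresolved();
|         for frame in bt.frames() {
|             // These are absolute instruction pointers; to
|             // survive ASLR, subtract the module's load base
|             // (e.g. read from /proc/self/maps) before logging,
|             // giving offsets in the "my_binary+0x1234" form
|             // described above.
|             eprintln!("ip: {:p}", frame.ip());
|         }
|     }
|
| Offline, something like `llvm-symbolizer --obj=my_binary
| 0x1234` (or `eu-addr2line -e my_binary 0x1234`) resolves those
| offsets against the un-stripped binary.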
| BimJeam wrote:
| I once had a faulty Python-based AI image generator running on
| my machine that used all 64 gigs of RAM and OOMed with a
| memory dump written to the filesystem. It is no fun when that
| happens. But mostly these kinds of bugs are misconfigurations
| or bad code: never-ending while loops, whatever.
| CodesInChaos wrote:
| The analysis looks rather half-finished. They did not analyze
| why so much memory was consumed: whether it is a cache that
| persists after the first call, temporary working memory, or
| an accumulating memory leak. Nor why it uses so much memory
| at all.
|
| I couldn't find any other complaints about Rust backtrace
| printing consuming a lot of memory, which I would have
| expected if this were normal behaviour. So I wonder if there
| is anything special about their environment or use case?
|
| I would assume that the same OOM problem would arise when
| printing a panic backtrace. Either their instance has enough
| memory to print backtraces, or it doesn't. So I don't
| understand why they only disable lib backtraces.
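|
| (For reference, a hedged illustration of the split std
| documents between the two environment variables; this is not
| necessarily the article's exact configuration:)
|
|     RUST_BACKTRACE=1      # the panic handler prints a backtrace
|     RUST_LIB_BACKTRACE=0  # std::backtrace::Backtrace::capture()
|                           # stays disabled ("lib backtraces")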
| erebe__ wrote:
| Hello,
|
| You can see my other comment
| https://news.ycombinator.com/item?id=42708904#42756072 for more
| details.
|
| But yes, the cache does persist after the first call; the
| resolved symbols stay in the cache to speed up the
| resolution of subsequent calls.
|
| Regarding the why, it is mainly because:
|
| 1. this app is a gRPC server and contains a lot of generated
| code (you can investigate binary bloat in Rust with
| https://github.com/RazrFalcon/cargo-bloat; a usage sketch
| follows at the end of this comment)
|
| 2. we ship our binary with debug symbols, built with these
| options:
|
|     ENV RUSTFLAGS="-C link-arg=-Wl,--compress-debug-sections=zlib \
|       -C force-frame-pointers=yes"
|
| For the panic: indeed, I had the same question on Reddit. For
| this particular service we don't expect panics at all; it is
| just that by default we ship all our Rust binaries with
| backtraces enabled. And we have added an extra API endpoint
| to trigger a caught panic on purpose for other apps, to be
| sure our sizing is correct.
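|
| (A hedged example of a cargo-bloat invocation, with flags as
| documented in its README:)
|
|     cargo bloat --release --crates -n 10  # largest crates
|     cargo bloat --release -n 10           # largest functions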
| samanthasu wrote:
| Great one! I would recommend this hands-on guide for
| diagnosing memory leaks in Rust applications. It explains how
| to enable heap profiling in jemalloc, collect memory
| allocation data, and generate flame graphs for analysis:
| https://greptime.com/blogs/2024-01-18-memory-leak#diagnosing...
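|
| (A hedged sketch of the jemalloc profiling knobs such guides
| rely on; option names are from jemalloc's documentation, and
| the exact env var name depends on how jemalloc is linked and
| whether it was built with profiling enabled:)
|
|     MALLOC_CONF="prof:true,lg_prof_sample:19,prof_prefix:/tmp/jeprof" \
|       ./my-app
|     # inspect the resulting dumps, e.g.:
|     jeprof --svg ./my-app /tmp/jeprof.*.heap > profile.svg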
| tasn wrote:
| Reminds me a bit of a post my colleague wrote a while back:
| https://www.svix.com/blog/heap-fragmentation-in-rust-applica...
| dpc_01234 wrote:
| Yep, I just recently discovered that pretty much any
| long-lived app in Rust should switch to jemalloc for this
| reason.
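|
| (For illustration, the usual way to opt in is via the
| `tikv-jemallocator` crate; an assumed dependency, not named
| in this thread:)
|
|     // Route all Rust allocations through jemalloc instead of
|     // the system allocator.
|     #[global_allocator]
|     static GLOBAL: tikv_jemallocator::Jemalloc =
|         tikv_jemallocator::Jemalloc;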
| submeta wrote:
| Ahh, I did this in Python before I learned about Cursor and
| Sourcegraph's Cody. I'd use a template where I provide a tree
| of my project structure, then put the code file contents into
| my template file, ending up with the full repo in one giant
| markdown file. This only worked for smaller projects, but it
| worked damn well for providing the full context to an LLM and
| then asking questions about my code :)
___________________________________________________________________
(page generated 2025-01-19 23:01 UTC)