[HN Gopher] Identifying Rust's collect:<Vec<_>>() memory leak fo...
___________________________________________________________________
Identifying Rust's collect:<Vec<_>>() memory leak footgun
Author : muglug
Score : 123 points
Date : 2024-01-18 13:38 UTC (9 hours ago)
(HTM) web link (blog.polybdenum.com)
(TXT) w3m dump (blog.polybdenum.com)
| teknopaul wrote:
| Over-allocating maps and arrays in standard libs is not really a
| memory "leak". Other langs do it.
| gipp wrote:
| Yeah, came here to say this. Still an interesting post, but the
| title implies a much more serious problem
| hyperpape wrote:
| Agreed.
|
| In the Haskell world, "space leak" is used to refer to a lazy
| computation that overallocates because of laziness (classic
| example being the difference between foldl and foldr or maybe
| that's foldl' and foldr'--I forget some details/cannot be
| bothered to think about folds right now).
|
| I've personally stolen the terminology for a computation that
| temporarily holds on to a bunch of unneeded memory.
| beltsazar wrote:
| After skimming the first few paragraphs, I almost commented
| something similar to yours. But after reading it until the end,
| the problem is more nuanced than that.
|
| The issue is more about reusing memory that has been allocated
  | when you "convert" (in Rust terms: into_iter()) a Vec to another
| Vec. More surprisingly, the reuse still happens even when the
| new Vec has a different type than the original Vec's type. You
| should read the whole post--it's more interesting than I (and
| you) thought.
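  |
  | A minimal sketch of the surprise (the exact capacities depend on
  | whether the in-place collect specialization kicks in, which is an
  | implementation detail, not a guarantee):
  |
  |     fn main() {
  |         let v1: Vec<u16> = (0..100).collect();
  |         let cap1 = v1.capacity(); // 100
  |         // The u16 allocation may be reused for the u8 output:
  |         let v2: Vec<u8> = v1.into_iter().map(|x| x as u8).collect();
  |         // cap1 * 2 bytes of backing store holds ~2 * cap1 u8s, so
  |         // v2.capacity() can come out near 200 rather than 100.
  |         println!("cap1={} cap2={}", cap1, v2.capacity());
  |     }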
| wredue wrote:
  | You're right, but over-allocating or failing to free resources
  | has (I agree, incorrectly) been called a memory leak for a long
  | time.
  |
  | Back in my Java days, they called this a memory leak there
  | too.
|
  | It's an especially inappropriate term in Rust, where memory
  | leak already has a definite meaning.
| hulium wrote:
| Conceptually, collect seems to create a new vector. In that
| case you would expect that the memory of the old vector is
| freed. The library tries to be smart and reuses the old vector
| as an (undocumented?) optimization. So the old memory that is
| no longer needed is not released.
|
| Whether you call that a leak or not, Rust is known for its
  | predictability. Allocating 200x the memory a programmer would
  | expect, absent knowledge of the implementation details, is
  | bad.
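  |
  | If you rely on the conceptual "new vector" behavior, the escape
  | hatch is to release the excess explicitly. A sketch (capacities
  | are implementation-dependent):
  |
  |     fn main() {
  |         let big: Vec<[u16; 1024]> =
  |             (0u16..128).map(|i| [i; 1024]).collect();
  |         let mut small: Vec<u8> =
  |             big.into_iter().map(|a| a[0] as u8).collect();
  |         println!("before: len={} cap={}", small.len(), small.capacity());
  |         small.shrink_to_fit(); // drop the reused excess capacity
  |         println!("after:  len={} cap={}", small.len(), small.capacity());
  |     }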
| wavemode wrote:
  | The semantics are kind of beside the point, I think. This
| allocation reuse behavior was meant as an optimization, but the
| real-world programs this is meant to be "optimizing" run
| significantly slower in beta than they did in stable. So there
| is a bug here no matter how you look at it.
| pif wrote:
| > It's also an illustration of how an optimization in one place
| can lead to bugs downstream by violating programmers'
| expectations.
|
| This touched upon a pet peeve of mine for its resemblance with
| all the talk about undefined behaviour in C and C++. Programmers'
| expectations are not codified and do not need to be respected:
| international standards do.
| thfuran wrote:
| Respecting both is much better than respecting only the latter.
| phkahler wrote:
| >> Programmers' expectations are not codified and do not need
| to be respected
|
  | That's a pretty negative attitude - my immediate gut response was
| "and neither do yours". But Rust isn't an ISO standard and is
| still in development. Even if we do think in those terms,
| people developing a standard have IMHO an obligation to the
| people who will be using the standard.
| binary132 wrote:
| To be a bit pedantic, the people using the C++ standard are
| mostly compiler engineers.
| tialaramex wrote:
| I mean sure, so for _those_ people WG21 does them a great
| service, in most cases they can say "Oh that's IFNDR"
| [Ill-formed; No Diagnostic Required] or "That's UB" and so
| it's not the compiler's fault your program doesn't work and
| they can close your ticket as NOTABUG if they want.
|
| My guess is that all or almost-all non-trivial C++ projects
| trigger IFNDR and thus have no actual meaning per the ISO
| standard language. They do work, more or less, but that's
| just convention and can be taken away at any time or for
| any reason. Header files, use of concepts, unfortunate
| algorithmic goofs, careless naming, there are a billion+
| ways to trigger IFNDR in C++ and by definition that's fatal
| to your correctness.
|
| + A billion is likely an overestimate, but WG21 has no
| formal inventory of IFNDR in the sprawling ISO document so,
| who knows.
| phkahler wrote:
| Everyone writing C++ is following the standard. The people
| that often reference it directly are the compiler people.
| But if we are going to try and be stupid about it we can
| ignore the compiler engineers opinions too - they're just
| implementing an arbitrary specification that won't impact
| anyone but them anyway right?
| paulddraper wrote:
| > Programmers' expectations are not codified and do not need to
| be respected: international standards do.
|
| What if...stick with me...the standards sought to codify and
| respect programmers' expectations?
|
| https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...
| akira2501 wrote:
    | Then using a language without a published standard is a fool's
| errand.
| TwentyPosts wrote:
| No? How does that even follow? That a standard should
      | codify common expectations insofar as possible (without
| breaking rigor) is irrelevant to whether you "should" only
| use standardized languages or not.
| burntsushi wrote:
| Cross posting my comment from reddit[1] because I think it's
| interesting.
|
| -----
|
| Nice post. I love calling attention to this. Just a few months
| ago, I ran into the ~same~ similar problem, although it wasn't
| caused by `collect()`. It was caused by "normal" `Vec` usage:
|
| https://github.com/BurntSushi/aho-corasick/commit/474393be8d...
|
| The issue appeared when building large Aho-Corasick automatons.
| Otherwise, it usually doesn't matter too much because the
| absolute memory sizes are pretty small. But if your automaton
| uses 1GB of memory, then because of the excess capacity in a
| bunch of little `Vec` values, that usage could easily balloon to
| 2GB. And even at that size, _it was hard to notice that it was
| more than it should have been_! It is easy to just think that's
| normal memory usage. Especially when it's coming from the
| automaton representation that you know is less memory efficient.
| (What I'm talking about here is something I called a "non-
| contiguous NFA." The aho-corasick crate also has a "contiguous
| NFA" and a DFA. The "contiguous NFA" uses one contiguous `Vec`
| allocation with a bunch of bit-packing tricks.)
|
| But then someone actually reported an issue on the
| `ahocorasick_rs`[2] Python project (bindings to the `aho-
| corasick` crate) that the `pyahocorasick`[3] Python package (with
| a bespoke C implementation of Aho-Corasick) was using
| _substantially_ less memory. The contiguous NFA was still doing
| better, but the _peak_ memory usage of `ahocorasick_rs` was much
| much bigger than `pyahocorasick`. That prompted me to investigate
| and figure out what the heck was going wrong. (And this is one
| reason why benchmarks comparing tools or libraries are actually
| useful beyond just advertisements or flexing. They alert you to
| what is possible, and thus possibly push you to find bugs in your
| current implementation that might be otherwise easy to miss. In
| other words, benchmark comparisons can turn unknown unknowns into
| known unknowns.)
|
| So since this prompted me to look very carefully at memory usage,
| I noticed that the memory usage reported by
| `AhoCorasick::memory_usage`[4] was substantially smaller than
| peak RSS memory usage in a simple reproduction program of the
| original issue reported to `ahocorasick_rs`. I eventually figured
| out that was because of the excess capacity used by a zillion
| tiny `Vec`s in the non-contiguous representation. I fixed _most_
| of that by calling `shrink_to_fit()`. But as the commit linked
| above notes, it wasn't really feasible to call `shrink_to_fit()`
| on all `Vec`s because that ended up with a non-trivial impact on
| construction time. And on top of all of that, memory usage was
| still worse than `pyahocorasick`.
|
| But why was I using a bunch of little `Vec`s in the first place?
| It's because this is the "non-contiguous" NFA. This is the thing
| that gets built from a list of patterns by first building a trie
| then filling in the failure transitions. That trie construction
| really demands random access mutation, which puts a constraint on
| your data structure choices. Plus, I had designed the contiguous
| NFA as a release valve of sorts that lifts this constraint. So I
| reasoned, "sure, the non-contiguous NFA isn't designed to be
| fast. It just needs to get us started." But still,
| `pyahocorasick` only has one Aho-Corasick implementation (the
| `aho-corasick` crate has 3). So it must be doing something
| differently.
|
| It turns out that `pyahocorasick` uses linked lists! THE HORROR!
| But actually, they are a really good solution to this problem. As
| a Rust programmer, I reach for `Vec` first. But a C programmer
| reaches for linked lists first. And it turns out that linked
| lists are a much better fit here.
|
| And so, I switched to linked lists[5]. (But using indices instead
| of pointers. Because of course. This is Rust. :-)) And now
| `ahocorasick_rs` uses less memory than `pyahocorasick`!
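|
| To make the "indices instead of pointers" idea concrete, here is a
| minimal sketch (hypothetical types; nothing like the actual aho-
| corasick internals):
|
|     struct Node {
|         value: u32,
|         next: Option<u32>, // index of the next node in `nodes`
|     }
|
|     struct Arena {
|         nodes: Vec<Node>, // one shared backing allocation
|     }
|
|     impl Arena {
|         // Prepend to the list starting at `head`; return the new head.
|         fn push_front(&mut self, head: Option<u32>, value: u32) -> u32 {
|             let idx = self.nodes.len() as u32;
|             self.nodes.push(Node { value, next: head });
|             idx
|         }
|     }
|
|     fn main() {
|         let mut arena = Arena { nodes: Vec::new() };
|         let mut head = None;
|         for v in [3, 2, 1] {
|             head = Some(arena.push_front(head, v));
|         }
|         // Walk the list: prints 1 2 3
|         let mut cur = head;
|         while let Some(i) = cur {
|             let node = &arena.nodes[i as usize];
|             print!("{} ", node.value);
|             cur = node.next;
|         }
|     }
|
| Per-list overhead collapses because every list shares the arena's
| single Vec, so there's no per-Vec excess capacity lying around.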
|
| [1]:
| https://old.reddit.com/r/rust/comments/199jycb/identifying_r...
|
| [2]: https://pypi.org/project/ahocorasick-rs/
|
| [3]: https://pypi.org/project/pyahocorasick/
|
| [4]: https://docs.rs/aho-
| corasick/latest/aho_corasick/struct.AhoC...
|
| [5]: https://github.com/BurntSushi/aho-
| corasick/commit/31bb1fc30d...
| diarrhea wrote:
| Thanks for the great write up. Your writing and deep dives
| never fail to enlighten and entertain at the same time.
| beltsazar wrote:
| > Just a few months ago, I ran into the same problem, although
| it wasn't caused by `collect()`. It was caused by "normal"
| `Vec` usage
|
| If it wasn't caused by `collect()`, then I suppose it's a
| related problem, but not the same problem. Your problem was
| caused by, as your commit message says, "ignorance [original:
| ignorant] of the broader effects of this strategy [double-when-
| full] in aggregate"
|
| The OP's problem, otoh, is more about reusing memory that has
  | been allocated when you "convert" (in Rust terms: into_iter()) a
| Vec to another Vec. What's more, the reuse also happens even
| when the new Vec has a different type from the original Vec's
| type. This behavior is more surprising than the double-when-
| full strategy, which is more widely known by programmers (even
| if sometimes they "forget" about the implications).
| burntsushi wrote:
    | I feel like the second sentence you quoted clarified that? "same
| problem" in this context means, "there is excess and unused
| capacity in the Vec." Your description is more precise, and
| at that level of precision, sure, they are not the "same
| problem." But yet, they share the same characteristic that is
| easy to forget: a `Vec` might have unused capacity leading to
| higher memory usage than you might expect otherwise if you
| aren't accounting for that unused capacity.
|
| My phrasing is a reflection of what I think is a more
| universal lesson to draw from the blog, and one that is not
| provoked merely by how `collect()` works, but rather, the
| nature of growth amortization itself.
|
| > This behavior is more surprising than the double-when-full
| strategy
|
| I agree.
|
| I updated my comment to use "similar" instead of "same."
| cmrdporcupine wrote:
| What we need is a page-based Vec that mmaps (anon) for the
| storage but leaves the unused portions zero-bytes and therefore
| not part of RSS until actually required. (And when
| clearing/shrinking sections, madvise DONTNEED the pages).
|
| That is, the vec could expand to areas much larger than the
| actual used size, but this would have no effect on process RSS
| until those pages get dirtied with actual data.
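  |
  | A rough sketch of the reservation trick (assuming the `libc`
  | crate; Linux-specific, error handling elided):
  |
  |     fn main() {
  |         const RESERVE: usize = 1 << 30; // 1 GiB of address space
  |         unsafe {
  |             let base = libc::mmap(
  |                 std::ptr::null_mut(),
  |                 RESERVE,
  |                 libc::PROT_READ | libc::PROT_WRITE,
  |                 libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
  |                 -1,
  |                 0,
  |             );
  |             assert_ne!(base, libc::MAP_FAILED);
  |             // Only pages we actually dirty become resident:
  |             *(base as *mut u8) = 42;
  |             // "Shrink": hand the unused tail back to the kernel.
  |             libc::madvise(
  |                 (base as *mut u8).add(4096) as *mut libc::c_void,
  |                 RESERVE - 4096,
  |                 libc::MADV_DONTNEED,
  |             );
  |             libc::munmap(base, RESERVE);
  |         }
  |     }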
| cnity wrote:
| Couldn't this obfuscate OOM type problems by not triggering
| actual memory issues until much later than the allocation?
| cmrdporcupine wrote:
| Maybe? Depends on how you monitor and what your
| expectations are. But for workloads that use a lot of
| memory, and want to avoid wasteful empty allocations, or
| have sparse data, etc. it's a reasonable strategy.
| vlovich123 wrote:
| That can be not a great idea for subtle reasons even though
| it does seem like a better design at first.
|
    | When you do the madvise, based on the contract of the API,
    | the kernel has to do a TLB shootdown, which is really
    | expensive. And not just "expensive for my process" but
    | expensive in terms of a significant slowdown for the entire
    | machine. Honestly
| you could probably DDOS a machine if you could reliably
| trigger that shootdown.
|
| As such you want to be very very careful where you place that
| data structure.
|
    | For what it's worth, your memory allocator (or at least a
    | good one like mimalloc or the newer non-gperftools tcmalloc)
    | will do exactly what you mentioned: free memory back to the
    | OS using advanced heuristics tested out at scale.
|
    | As for why a shootdown is needed: it's so that a racing
    | allocation call in another process doesn't get that virtual
    | address while running on a core where the TLB still points
    | elsewhere (the core that did the allocation may not even be
    | the one where it's used).
| cmrdporcupine wrote:
| Yes, well, I think you're right it's a lot about the
| contract of the API and the expectation of the user.
|
| I'd certainly want intelligence (either by the user or by
| the framework) in how frequently you release pages back to
| the OS.
|
| But the DONTNEED is really not the core of the value. It's
| being able to create vectors that don't fragment as badly.
|
| And yes, you're right a decent allocator could help with
| this, and what I'm describing is really just a kind of
| specialized allocator.
| vlovich123 wrote:
| I don't follow. Fragmentation wasn't the problem here -
| it's that an optimization to trying to reuse the
| underlying allocation resulted in a severe over-
| allocation in a specific path. That would happen even
| with your idea unless you use DONTNEED to release all the
| unused pages.
|
        | I think fragmentation is an over-focused-on problem when
| in practice it's rarely the problem. It's also a problem
| that can be solved by an application reset and making
| sure to periodically reset your running application in a
| way that doesn't result in severe disruption is a better
| practice that works around many more issues than just
| fragmentation.
| LegionMammal978 wrote:
| Is MADV_FREE on Linux any better in this regard? It's what
| the libc allocators tend to use, if I recall correctly.
| vlovich123 wrote:
| It would have to be the same problem because there's no
| way to free memory without a shootdown AFAIK. As I
| describe elsewhere, I don't think there's any way to free
        | memory to the OS without one, as otherwise you can run
| into race conditions where another process allocating
| memory gets access to the memory being freed in another
| process through a stale TLB entry.
|
| Here's a discussion [1] of a hypothetical lazy_munmap
| option that would initiate the unmap without an immediate
| shoot down (i.e. the CPUs would lazily evict from the TLB
| when they notice the request) but no such option exists.
| It's also not immediately clear such a hypothetical API
| could exist on current CPU architectures or if it would
| require new HW support as I don't have enough hands-on
| expertise at that level. Anyway, [1] is an interesting
| discussion of this idea and the note about huge pages
| making these even less valuable is interesting to keep in
| mind.
|
| [1] https://news.ycombinator.com/item?id=23216590
| loeg wrote:
| Is this not going to be more or less what the memory
| allocator backing Vec does under the hood anyway?
| glandium wrote:
| That's already what it does (except when shrinking)
| Filligree wrote:
| What's the reason for not always using the contiguous variant?
| burntsushi wrote:
| You can't build the contiguous variant directly from a
| sequence of patterns. You need some kind of intermediate data
| structure to incrementally build a trie in memory. The
| contiguous NFA needs to know the complete picture of each
| state in order to compress it into memory. It makes decisions
| like, "if the number of transitions of this state is less
| than N, then use this representation" or "use the most
| significant N bits of the state pointer to indicate its
| representation." It is difficult to do this in an online
| fashion, and likely impossible to do without some sort of
| compromise. For example, you don't know how many transitions
| each state has until you've completed construction of the
| trie. But how do you build the trie if the state
| representation needs to know the number of transitions?
| Classic chicken-and-egg.
|
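    | As a toy illustration of that bit-stealing idea (nothing like
    | the real layout, just the shape of the trick):
    |
    |     const KIND_BITS: u32 = 2;
    |     const KIND_SHIFT: u32 = 32 - KIND_BITS;
    |     const IDX_MASK: u32 = (1 << KIND_SHIFT) - 1;
    |
    |     // Tag a state index with its representation in the top bits.
    |     fn pack(kind: u32, idx: u32) -> u32 {
    |         debug_assert!(kind < (1 << KIND_BITS) && idx <= IDX_MASK);
    |         (kind << KIND_SHIFT) | idx
    |     }
    |
    |     fn unpack(p: u32) -> (u32, u32) {
    |         (p >> KIND_SHIFT, p & IDX_MASK)
    |     }
    |
    |     fn main() {
    |         assert_eq!(unpack(pack(0b10, 12345)), (0b10, 12345));
    |     }
    |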
| Note that the conversion from a non-contiguous NFA to a
| contiguous NFA is, relatively speaking, pretty cheap. The
    | only real reason to _not_ use a contiguous NFA is that it
    | can't represent as many patterns as a non-contiguous NFA.
| (Because of the compression tricks it uses.)
|
| The interesting bits start here:
| https://github.com/BurntSushi/aho-
| corasick/blob/f227162f7c56...
| kccqzy wrote:
| Just a somewhat tangent thought: do you know of any tricks to
| speed up the construction phase of Aho Corasick? A recent
| problem of mine needed to use Aho Corasick but with several
| million strings in the dictionary it took a bit longer than I
| thought to construct it. Any tricks you know of to speed up
| construction? Don't particularly care about memory.
| burntsushi wrote:
| Hmmm. The phrasing of your question is a little ambiguous. I
| could interpret it two ways:
|
| 1. "Do you know if there are any config knobs in the aho-
| corasick crate that will speed up construction?"
|
| 2. "I have my own implementation of Aho-Corasick and
| construction isn't as fast as I hoped. What kinds of tricks
| can I employ to make my own implementation faster?"
|
| I'd be happy to try and answer either of these, but I think
| it would be better to ask with a concrete example on the
| issue tracker Discussions: https://github.com/BurntSushi/aho-
| corasick/discussions
|
| All of the tricks I know about (and I judged to be worth
| doing) are in the aho-corasick crate. There's really just no
| getting around the fact that you need to first build a trie
| (which requires visiting every byte in every pattern) and
| then do a breadth-first search across the entire trie to
| setup failure transitions. Making construction substantially
| faster probably requires some kind of innovation in those two
| parts at a fundamental level.
|
| You can also cheat at this. That is, by designing your Aho-
| Corasick automaton to be zero-copy deserializable. The DFAs
| in regex-automata support this. I waver on whether to do this
    | for aho-corasick, because it acts like a straitjacket for
| evolving the internals of the library.
| the_mitsuhiko wrote:
| This is a good time to check how your code performs in beta vs
| stable. This particular case is new in beta and it would be
| interesting to catch regressions before they land.
|
| Someone already filed this as a bug: https://github.com/rust-
| lang/rust/issues/120091
| tialaramex wrote:
| This re-use trick seems clever, but definitely needed more
| consideration before merging IMO. Re-use where we can determine
| the size & alignment are a perfect match ought to be a win, but
| the other cases seem very situational. And The8472's attitude
| here seems unlikely to lead to a better Rust.
| IshKebab wrote:
| Yeah I feel like it should reallocate if the size difference
| is great.
| WirelessGigabit wrote:
| I don't think it's a memory leak. It's re-using the same space of
| the underlying array. It's allocated. Dropping the Vec will
| release the memory.
|
| In v1 you put in 128 containers, each with 1024 boxes.
|
| Then v2 takes the first box out of each container, tosses the
| container, and puts the box in the space where the container
| was, packing them.
|
| The fact that the capacity doubles when you add the as u8 is...
| normal. You reuse the space, so you can fit 2 u8 in the space of
| 1 u16.
|
| The problem here is more that this optimization causes a
| Vec<T> -> Vec<Y> conversion, where sizeof(T) > sizeof(Y), to
| have much more capacity than expected.
|
| Which IS a valid bug report. But again, not a memory leak.
| vlovich123 wrote:
| It seems like you have a better grasp on the problem than I. I
| understand why capacity in terms of number of elements doubles.
| But why did he end up using 200x more memory? Is it that the
| transient memory used the 18GiB and then he reduced it to
| vector of elements that were 200x smaller? Or is it that
| there's some other problem that when you collect, it's
| reallocating 2x using the wrong type to do the computation?
| 0x457 wrote:
| Because it doubles in every node and OP has 300k of them[1].
| It's not a leak, since memory is...well...accounted for and
    | easily freed by either dropping Vec or calling shrink_to_fit.
| IMO it's expected behavior.
|
| Honestly, if I had 300k Vecs, I would call `shrink_to_fit`
| even without that optimization.
|
| 1: Since the edge list calculation logic is run for one node
| at a time in sequence, the fact that it temporarily uses
| 132kb of memory would normally not be a problem at all.
| However, the new collect() optimization meant that that
| internal storage was being preserved in the final vec which
| got inserted into the list of edge lists, thus keeping around
| that 132kb of obsolete memory forever. Multiply that by 300k
| nodes and suddenly you've leaked the computer's entire RAM.
| vlovich123 wrote:
| Ah right. Vec of vecs and the interior Vec has wasted
| capacity due to the reduction of the element size. Thanks!
| Groxx wrote:
| The `map` step converted the data to a smaller type, and the
| whole chain was smart enough to reuse the original Vec
| without reallocating. Since the original item size was [much
| bigger], there was a lot of "free" capacity left over.
|
| It is indeed a bit surprising, but seems entirely reasonable
| for the platform to do imo. It's efficient, in both memory
| (allocation) and CPU. If you want a specifically-sized
| container, you have to make sure that happens, otherwise
| you're letting things be optimized in language-general ways.
| That will frequently be wrong in edge cases, and may change
| at any time.
| paulddraper wrote:
| No, if that's all it was, the excess memory usage would be 2x.
|
| But it's 200x.
|
| Right?
| nickez wrote:
    | Please read the article. His code produces 200x intermediary
| values. Clickbait title, since it wasn't a leak.
| paulddraper wrote:
| I did read the article.
|
| > the memory waste from excess capacity should always be at
| most 2x, but I was seeing over 200x.
|
| So the 200x analysis is his problem?
| kbknapp wrote:
| Rust can re-use an allocation, but if the new item is
        | _smaller_ than the previous it doesn't automatically
| remove (free) the "wasted" memory left over from the
| previous allocation. I think this is categorically not a
| memory leak as the memory was absolutely accounted for
| and able to be freed (as evidenced by the
| `shrink_to_fit()`), but I can see how the author was
| initially confused by this optimization.
|
        | The 2x versus 200x confusion IMO is that the OP was
        | conflating this with Vec doubling in size when it needs
        | more space, so
| they were assuming the memory should have only ever been
| 2x in the worst case of the _new size_. Which in the OPs
| case because the new type size was smaller than the
| previous, it seemed like a massive over-allocation.
|
| Imagine you had a `Vec<Vec<u16>>` and to keep it simple
        | there were only 2 elements in both the inner and outer
| Vec's, which if we assume Rust doubled each Vec's
| allocation that'd be 4x4 "slots" of 2 bytes per slot (or
| 32 bytes total allocated...in reality it'd be a little
| different but to keep it simple let's just assume).
|
| Now imagine you replace that allocation with a
| `Vec<Vec<u8>>` which even with the same doubling of the
| allocation size would be a maximum of 4x4 slots of 1 byte
| per slot (16 bytes total allocation required). Well we
| already have a 32 byte allocation and we only need 16, so
| Rust just re-uses it, and now it looks like we have 16
| bytes of "waste."
|
| Now the author was expecting _at most_ 16 bytes
| (remember, 2x the _new size_ ) but was seeing 32 bytes
| because Rust just re-used the allocation and didn't free
| the "extra" 16 bytes. Further, when they ran
| `Vec::shrink_to_fit()` it shrunk down to only _used_
| space, which in our example would be a total of 4 bytes
| (2x2 of 1 byte slots actually used).
|
| Meaning the author was comparing an _observed_ 32 byte
| allocation, to an expectation of _at most_ 16 bytes, and
| a properly sized allocation of 4 bytes. Factored out to
        | their real-world data I can see how they'd see numbers
| greater than "at most 2x."
| IshKebab wrote:
| 200x is correct. What's happening is that he makes a
| vector with tons of capacity and only a few elements, so
| lots of wasted space. Then he turns that vector into
| another vector using an operation that _used_ to allocate
| a new vector (thus releasing the wasted space) but now
        | reuses the previous vector's allocation (retaining the
| wasted space).
|
| It's definitely a sneaky bug. Not a "memory leak" in the
| normal sense since the memory will still be freed
| eventually. I'd call it an unexpected waste of memory.
| hyperpape wrote:
| I agree that this is not a memory leak.
|
| However, the semantic distinction between "this uses much
| more memory than expected" and "this is a memory leak" is a
| little subtle, and it seems pretty rude to call it
| clickbait.
| devmor wrote:
| I disagree that it's a small semantic difference.
|
| I don't think it's clickbait though, I think the author
| was just misusing terminology.
| hyperpape wrote:
| I said it was subtle, not small. I agree it's a valuable
| distinction.
| mratsim wrote:
        | A memory leak means it leaks: it's no longer under
        | control. Here the memory is under control; it can be
        | reclaimed by the program.
| karmakaze wrote:
      | Clickbait (in the context of Rust). In languages with
      | managed memory there are no true memory leaks, so such
      | wastes are called leaks. In lower-level languages, we
      | should be stricter about what we call things.
| deathanatos wrote:
        | ... Box::leak [1] is a function that exists. That seems like
| a memory leak, no?
|
| Less tongue-in-cheek, if a program allocates far more
        | memory than expected of it, I'm going to colloquially
        | call that a "memory leak". If I see a Java program
| whose RSS is doing nothing but "up and to the right"
| until the VM runs out of memory and dies a sweet sweet
| page thrashing death, I'm going to describe that as a
| "memory leak". Having someone tell me, "well, actually,
| it's not a leak per se it's just that the JVM's GC didn't
| collect all the available garbage prior to running out of
| memory because ..." ... I don't care? You're just forcing
        | me to wordsmith the problem description -- the problem is
| still there. Program is still dead, and still exceeding
| the constraints of the environment it should have been
| operating in.
|
| The author had some assumptions: that Vec doesn't
| overalloc by more than 2x, and that collect allocates --
| one of those did turn out to be false, but I think if I
| polled Rust programmers, a fair number of them would make
| the wrong assumption. _I_ would, and TIL from this
| article that it was wrong, and that collect can reuse the
| original allocation, despite it not being readily
| apparent how it knows how to do that with a generic
| Iterator. (And, the article got me to understand that
| part, too!)
|
| Unlike most clickbaits which lure you in only to let you
| down, I learned something here. Reading it was
| worthwhile.
|
        | [1] https://doc.rust-
        | lang.org/stable/std/boxed/struct.Box.html#m...
| 0x457 wrote:
        | No, memory leak has a very distinct definition: memory
        | that is still allocated but no longer accessible. A
        | memory leak can be as small as a single word. In this
        | case, it's just memory. There is another term for this
        | scenario, which I don't remember.
|
| This is a case of optimization gone wrong, but nothing is
| leaked, and every single byte is accounted for.
|
        | The title is clickbait, but the article is still
        | interesting to read.
| WirelessGigabit wrote:
    | Looking at this example:
    |
    |     fn dbg_vec<T>(v: &Vec<T>) {
    |         println!(
    |             "vec data ptr={:?} len={} cap={}",
    |             v.as_ptr(),
    |             v.len(),
    |             v.capacity()
    |         );
    |     }
    |
    |     fn main() {
    |         {
    |             let v1 = (0u16..128).map(|i| [i; 1024]).collect::<Vec<_>>();
    |             dbg_vec(&v1);
    |             let v2 = v1.into_iter().map(|x| x[0] as u8).collect::<Vec<_>>();
    |             dbg_vec(&v2);
    |         }
    |         {
    |             let v1 = (0u16..128).map(|i| [i; 1024]).collect::<Vec<_>>();
    |             dbg_vec(&v1);
    |             let v2 = v1.into_iter().map(|x| x[0]).collect::<Vec<_>>();
    |             dbg_vec(&v2);
    |         }
    |     }
    |
    |     # cargo +nightly clean
    |          Removed 11 files, 7.3MiB total
    |     # cargo +nightly run --release
    |        Compiling vec-debug v0.1.0 (/home/neo/vec-debug)
    |         Finished release [optimized] target(s) in 0.17s
    |          Running `target/release/vec-debug`
    |     vec data ptr=0x7f47ee162010 len=128 cap=128
    |     vec data ptr=0x7f47ee121010 len=128 cap=262144
    |     vec data ptr=0x55c6514f3ba0 len=128 cap=128
    |     vec data ptr=0x55c6514f3ba0 len=128 cap=131072
|
    | What happens is that the original vector's assigned space
    | gets reused. That's it.
|
    | It LOOKS like there is more because the capacity inflates.
    | But capacity is written in terms of sizeof(T), not bits. In
    |
    |     (0u16..128).map(|i| [i; 1024]).collect::<Vec<_>>()
    |
    | the capacity (and length) is 128 times an array of 1024
    | u16s. We then re-use the same underlying array, which has a
    | CAPACITY of 128 times (1024 * 16) bits, and put in a u8 (the
    | cast). So each item went from 1024 * 16 bits to 8 bits.
|
| So previously we had 128 * 1024 * 16 = 2,097,152 bits.
|
| How many times can we put 8 bits in a capacity of 2,097,152
| bits?
|
| 262,144
|
| How many times can we put 16 bits in a capacity of 2,097,152
| bits?
|
| 131,072
| karmakaze wrote:
| A similar thing happens in Go if you don't use a pre-known
| 'capacity' when allocating slices. Each internal realloc could
| use a larger backing array. Adding elements 1 by 1 will keep
| doing this, making lots of garbage. They all get collected of
| course, but performance is horrible with memory fragmentation
| and all the copying.
| HideousKojima wrote:
| C# doubles the size of a List<T> each time you hit the
| capacity (if you didn't set it correctly when you created the
| list or left it at the default). Does Go do something similar
| or does it only increase it by 1 each time?
| znkr wrote:
| It grows by 2x for small sizes, then it transitions to
| growing it by 1.25x
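    |
    | For comparison, Rust's Vec currently grows by doubling (an
    | implementation detail, not a guarantee), which is easy to
    | watch:
    |
    |     fn main() {
    |         let mut v: Vec<u8> = Vec::new();
    |         let mut last = 0;
    |         for i in 0..1024u32 {
    |             v.push(i as u8);
    |             if v.capacity() != last {
    |                 last = v.capacity();
    |                 println!("len={} -> cap={}", v.len(), last);
    |             }
    |         }
    |     }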
| stcredzero wrote:
| It took me a few extra seconds to parse the headline. ("footgun"
| didn't help.) Is <Vec<_>>() perhaps to become another
| -\_(tsu)_/- ?
| tialaramex wrote:
| It's interesting that HN zapped the second colon. The headline
| here (at time of my posting) is:
|
| "Identifying Rust's collect:<Vec<_>>() memory leak footgun"
|
| but the actual title of the blog post is:
|
| "Identifying Rust's collect::<Vec<_>>() memory leak footgun"
|
| That extra colon matters to Rust's syntax.
| unshavedyak wrote:
| Not sure what you're asking. At risk of loosely explaining what
| you may already know, `<Vec<_>>` is a generic parameter `<T>`
| with the type `T` specified as `Vec`. Vec also has a param,
| `<_>`, where the type is ignored - as the compiler can figure
| that out.
|
| Sidenote, `<Vec<_>>` is missing the best part - the fish head,
| to become `::<Vec<_>>`, aka `::<>` - the Turbo Fish :D
|
| I admit to using this syntax more than necessary just so i can
| write a turbofish :D
|
| _edit_ : Wait no, `<>` is the head - isn't it. hah. So i guess
| it's missing the fins? /shrug
| avgcorrection wrote:
| Yes? What about unamused Vec?
| renewiltord wrote:
| Great write-up. Thank you. I wouldn't have guessed. The upshot of
| this whole thing is that Vec is for mutable collections and when
| you're done you make a boxed slice.
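|
| In code, that workflow is roughly (a sketch):
|
|     fn main() {
|         let mut v: Vec<u32> = Vec::with_capacity(1024);
|         v.extend(0..10); // mutate freely while building
|         // Freezing drops the spare capacity along the way:
|         let frozen: Box<[u32]> = v.into_boxed_slice();
|         assert_eq!(frozen.len(), 10);
|     }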
| invpt wrote:
| I appreciate the mention of Box<[T]>. I've never (or at least
| very rarely?) seen people using it, even in rustc, although I'm
| sure there's somewhere in the compiler that uses it. I've kind of
| gotten the feeling that Box<[T]> is frowned upon as a needless
| optimization because the one-word storage difference is so
| minimal, but I appreciate having the immutable size semantics as
| a way of ensuring no extra allocations occur by way of the
| methods on Vec.
| ironhaven wrote:
  | Another Rust type you should look out for is Box<str>. In my
  | Rust code I find a lot more uses of Box<str> than Box<[T]>.
| Strings are often fixed length identifiers or usernames that
| get passed around frequently and stored in data structures.
| Box<str> can be a drop-in replacement in a surprising amount of
| code
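  |
  | A sketch of the drop-in usage:
  |
  |     fn main() {
  |         // Freeze a String into a Box<str>: no capacity field,
  |         // and it can't be grown afterwards.
  |         let name: Box<str> = String::from("alice").into_boxed_str();
  |         greet(&name); // derefs to &str just like String does
  |     }
  |
  |     fn greet(s: &str) {
  |         println!("hello, {s}");
  |     }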
| zozbot234 wrote:
| Arc<str> is an atomically ref-counted string, also quite
| useful in async or multi-threaded code where (unlike w/ Box)
| you sometimes can't know in advance which task will drop the
| last reference to some data.
| bobbylarrybobby wrote:
| Box<str> is used pervasively in the rust compiler instead of
| String for exactly this reason. Basically every string of
| code the compiler looks at is, unsurprisingly, constant.
| steveklabnik wrote:
| Yeah, the general concept here is
| https://en.wikipedia.org/wiki/String_interning, Box<str> is
| great for it.
| tsnl wrote:
  | Another downside of Box<[T]> is that Vec::into_boxed_slice copies
| all elements from the source vector. Vec::shrink_to_fit does
| not appear to copy. This is based on the stable docs.
| sapiogram wrote:
| It doesn't copy the elements if the source vector is already
| at max capacity, which you can often arrange for beforehand.
| cmrx64 wrote:
| For example, by explicitly calling shrink_to_fit. So you
| either have fragmentation, or an extra copy if you're not
| careful, but none of the solutions let you forget about the
| detail entirely.
| invpt wrote:
| If you look at the source for into_boxed_slice, it calls
| shrink_to_fit at the beginning before doing anything
| else. Hence the documentation is slightly wrong, and no
| copies occur.
|
| Edit: I submitted a PR to clear up the docs:
| https://github.com/rust-lang/rust/pull/120110
| cmrx64 wrote:
| thank you!
| sapiogram wrote:
| Or alternatively, the vector was created with
| `Vec::with_capacity()` or `ExactSizeIterator::collect()`,
| and had the correct capacity all along.
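        |
        | Sketch: the pointer staying put is evidence no copy
        | happened (an implementation detail, but a reliable one
        | in practice):
        |
        |     fn main() {
        |         let mut v: Vec<u8> = Vec::with_capacity(4);
        |         v.extend_from_slice(&[1, 2, 3, 4]); // len == capacity
        |         let before = v.as_ptr();
        |         let b = v.into_boxed_slice(); // no shrink needed
        |         assert_eq!(before, b.as_ptr());
        |     }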
| loeg wrote:
| This is a pretty surprising behavior. Reusing the allocation
| without shrinking when the resulting capacity will be within
| 1.0-2.0x the length: seems reasonable, not super surprising.
| Reusing the allocation without shrinking when the resulting
| capacity will be a large multiple of the (known) length: pretty
| surprising! My intuition is that this is too surprising (at
| capacity >2x length) to be worth the possible optimization, but
| probably still worth doing when capacity is a small multiple of
| length.
| Groxx wrote:
| Is reusing an allocation _while changing its size_ a thing you
| expect to be able to do? I would believe that some languages
| /systems/etc can do that, but it certainly feels like an
| exception rather than a rule. Reuse generally means the whole
| block of memory is retained, from what I've seen, because you'd
| have to track that half-freed memory for reuse somehow and that
| has some associated cost. (A compacting-GC language would be a
| reasonable exception here, for example)
| vlovich123 wrote:
| I've read the article and I'm still lost as to how this
| optimization results in a 200x discrepancy between capacity and
| length. Was the mapped element size 200x smaller in this case? Or
| is there some bug where repeated mappings like this cause a 2x
| growth based on the previous element size instead of the target
| element size?
| hedgehog wrote:
| I think the behavior is that if you have a Vec of u128 (say
| 1000), filter that to fewer elements (say 10), and then collect
| it into a Vec of u32 you might expect the resulting value to be
| around 40 bytes but in beta Rust it is 16000 bytes. In current
| Rust the collect would cause a new 10 element Vec of u32 to be
| allocated, in beta it reuses the original larger allocation.
| The author's code is doing a bit more but essentially when
| moving to beta Rust the new optimization caused memory usage to
| increase by a big multiple.
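  |
  | In code, the scenario looks roughly like this (the capacity
  | assumes the beta's in-place collect applies; not guaranteed):
  |
  |     fn main() {
  |         let big: Vec<u128> = (0..1000).collect();
  |         let small: Vec<u32> = big
  |             .into_iter()
  |             .filter(|&x| x % 100 == 0) // keep 10 of the 1000
  |             .map(|x| x as u32)
  |             .collect();
  |         // len = 10, but the reused 16,000-byte u128 buffer can
  |         // back up to 4000 u32 slots.
  |         println!("len={} cap={}", small.len(), small.capacity());
  |     }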
| vlovich123 wrote:
| Ok. I missed that there's a filter step that compounds the
| problem. The more I read the less this sounds like a bug and
| more like application code is missing a shrink_to_fit and was
| relying on a pessimization.
|
| That being said, it's also not an unreasonable expectation on
| the user's behalf that the size and capacity don't get crazy
| different in code as innocuous and idiomatic as this.
|
    | I wonder how the open bug will end up being triaged.
| Groxx wrote:
| While I agree this is a bit surprising... I don't see _literally
| anything_ in the docs for `collect()` that implies anything about
| memory behaviors. You get _a collection_, nothing more is
| guaranteed.
|
| If you want specific memory behavior, you really do have to use a
| method that intends to do that. Like `shrink_to_fit()` (mentioned
| in the article).
| deathanatos wrote:
| I would say I don't (or now, didn't) expect collect to be able
| to know that there is an allocation under there, since that's
| not part of Iterator's interface.
|
  | I.e., the docs could call out that there _aren't_ such implied
| behaviors, and that in some circumstances, it might reuse the
| allocation, but that that's not guaranteed. (And ideally, offer
| me info on what to do if I _do_ want certain guarantees about
| the allocation of the Vec.)
|
  | Warding off what I think is a _reasonable_, but wrong,
| assumption, is well within the scope of the docs.
___________________________________________________________________
(page generated 2024-01-18 23:00 UTC)