[HN Gopher] Identifying Rust's collect::<Vec<_>>() memory leak fo...
       ___________________________________________________________________
        
       Identifying Rust's collect::<Vec<_>>() memory leak footgun
        
       Author : muglug
       Score  : 123 points
       Date   : 2024-01-18 13:38 UTC (9 hours ago)
        
 (HTM) web link (blog.polybdenum.com)
 (TXT) w3m dump (blog.polybdenum.com)
        
       | teknopaul wrote:
       | Over-allocating maps and arrays in standard libs is not really
       | a memory "leak". Other langs do it.
        
         | gipp wrote:
         | Yeah, came here to say this. Still an interesting post, but the
         | title implies a much more serious problem
        
         | hyperpape wrote:
         | Agreed.
         | 
          | In the Haskell world, "space leak" is used to refer to a lazy
          | computation that over-retains memory because of laziness (the
          | classic example being foldl, which builds up a chain of
          | unevaluated thunks that the strict foldl' avoids).
         | 
         | I've personally stolen the terminology for a computation that
         | temporarily holds on to a bunch of unneeded memory.
        
         | beltsazar wrote:
          | After skimming the first few paragraphs, I almost commented
          | something similar to yours. But after reading it to the end,
          | I realized the problem is more nuanced than that.
         | 
          | The issue is more about reusing memory that has already been
          | allocated when you "convert" (in Rust terms: into_iter()) a
          | Vec to another Vec. More surprisingly, the reuse still
          | happens even when the new Vec has a different element type
          | than the original's. You
         | should read the whole post--it's more interesting than I (and
         | you) thought.
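The behavior described above can be sketched in a few lines. Whether the original allocation is actually reused in place depends on the compiler/std version (the post is about a change that was in beta at the time), so this sketch only inspects capacity rather than assuming a particular value:

```rust
// Sketch of the collect() reuse discussed in the post: an oversized
// Vec<u64> collected into a Vec<u8>. Depending on the std version, the
// u64 allocation may be reused in place, leaving capacity() far larger
// than len() on the result.
pub fn collect_into_smaller() -> (Vec<u8>, usize) {
    let mut big: Vec<u64> = Vec::with_capacity(65_536); // oversized scratch
    big.extend(0u64..1_000);
    // Same pipeline shape as the post: into_iter() + map + collect,
    // converting to a smaller element type.
    let small: Vec<u8> = big.into_iter().map(|x| (x % 256) as u8).collect();
    let cap = small.capacity(); // may be much larger than small.len()
    (small, cap)
}
```

Calling `shrink_to_fit()` on the result releases any excess capacity, which is the workaround the post lands on.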
        
         | wredue wrote:
          | You're right, but over-allocating or failing to free
          | resources has (incorrectly, I agree) been called a memory
          | leak for a long time.
          | 
          | Back in my Java days, they called this a memory leak too.
          | 
          | It's an especially inappropriate term in Rust, where memory
          | leak already has a definite meaning.
        
         | hulium wrote:
         | Conceptually, collect seems to create a new vector. In that
         | case you would expect that the memory of the old vector is
         | freed. The library tries to be smart and reuses the old vector
         | as an (undocumented?) optimization. So the old memory that is
         | no longer needed is not released.
         | 
          | Whether you call that a leak or not, Rust is known for its
          | predictability. Using 200x the memory a programmer would
          | expect, unless they know the implementation details, is bad.
        
         | wavemode wrote:
          | The semantics are kind of beside the point, I think. This
         | allocation reuse behavior was meant as an optimization, but the
         | real-world programs this is meant to be "optimizing" run
         | significantly slower in beta than they did in stable. So there
         | is a bug here no matter how you look at it.
        
       | pif wrote:
       | > It's also an illustration of how an optimization in one place
       | can lead to bugs downstream by violating programmers'
       | expectations.
       | 
        | This touches on a pet peeve of mine because of its resemblance
        | to all the talk about undefined behaviour in C and C++.
        | Programmers' expectations are not codified and do not need to
        | be respected; international standards do.
        
         | thfuran wrote:
         | Respecting both is much better than respecting only the latter.
        
         | phkahler wrote:
         | >> Programmers' expectations are not codified and do not need
         | to be respected
         | 
          | That's a pretty negative attitude - my immediate gut response
          | was "and neither do yours". But Rust isn't an ISO standard
          | and is still in development. Even if we do think in those
          | terms,
         | people developing a standard have IMHO an obligation to the
         | people who will be using the standard.
        
           | binary132 wrote:
           | To be a bit pedantic, the people using the C++ standard are
           | mostly compiler engineers.
        
             | tialaramex wrote:
              | I mean sure, so for _those_ people WG21 does them a great
              | service: in most cases they can say "Oh that's IFNDR"
             | [Ill-formed; No Diagnostic Required] or "That's UB" and so
             | it's not the compiler's fault your program doesn't work and
             | they can close your ticket as NOTABUG if they want.
             | 
             | My guess is that all or almost-all non-trivial C++ projects
             | trigger IFNDR and thus have no actual meaning per the ISO
             | standard language. They do work, more or less, but that's
             | just convention and can be taken away at any time or for
             | any reason. Header files, use of concepts, unfortunate
             | algorithmic goofs, careless naming, there are a billion+
             | ways to trigger IFNDR in C++ and by definition that's fatal
             | to your correctness.
             | 
             | + A billion is likely an overestimate, but WG21 has no
             | formal inventory of IFNDR in the sprawling ISO document so,
             | who knows.
        
             | phkahler wrote:
              | Everyone writing C++ is following the standard. The
              | people who often reference it directly are the compiler
              | people. But if we are going to be stupid about it, we can
              | ignore the compiler engineers' opinions too - they're
              | just implementing an arbitrary specification that won't
              | impact anyone but them anyway, right?
        
         | paulddraper wrote:
         | > Programmers' expectations are not codified and do not need to
         | be respected: international standards do.
         | 
         | What if...stick with me...the standards sought to codify and
         | respect programmers' expectations?
         | 
         | https://en.wikipedia.org/wiki/Principle_of_least_astonishmen...
        
           | akira2501 wrote:
            | Then using a language without a published standard is a
            | fool's errand.
        
             | TwentyPosts wrote:
              | No? How does that even follow? That a standard should
              | codify common expectations insofar as possible (without
              | sacrificing rigor) is irrelevant to whether you "should"
              | only use standardized languages or not.
        
       | burntsushi wrote:
       | Cross posting my comment from reddit[1] because I think it's
       | interesting.
       | 
       | -----
       | 
       | Nice post. I love calling attention to this. Just a few months
       | ago, I ran into the ~same~ similar problem, although it wasn't
       | caused by `collect()`. It was caused by "normal" `Vec` usage:
       | 
       | https://github.com/BurntSushi/aho-corasick/commit/474393be8d...
       | 
       | The issue appeared when building large Aho-Corasick automatons.
       | Otherwise, it usually doesn't matter too much because the
       | absolute memory sizes are pretty small. But if your automaton
       | uses 1GB of memory, then because of the excess capacity in a
       | bunch of little `Vec` values, that usage could easily balloon to
       | 2GB. And even at that size, _it was hard to notice that it was
       | more than it should have been_! It is easy to just think that's
       | normal memory usage. Especially when it's coming from the
       | automaton representation that you know is less memory efficient.
       | (What I'm talking about here is something I called a "non-
       | contiguous NFA." The aho-corasick crate also has a "contiguous
       | NFA" and a DFA. The "contiguous NFA" uses one contiguous `Vec`
       | allocation with a bunch of bit-packing tricks.)
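The aggregate effect described here — many small `Vec`s each carrying slack from doubling growth — can be sketched with a toy model (not the aho-corasick code; exact numbers depend on the allocator and growth strategy):

```rust
// Toy model: build many small Vec<u32>s by pushing (so each carries
// excess capacity from doubling growth), then measure bytes used vs.
// bytes reserved, before and after shrink_to_fit().
pub fn aggregate_waste() -> (usize, usize, usize) {
    let mut lists: Vec<Vec<u32>> = Vec::new();
    for n in 0..10_000usize {
        let mut v = Vec::new();
        for i in 0..(n % 17 + 1) {
            v.push(i as u32); // capacity grows 4 -> 8 -> 16 -> 32 ...
        }
        lists.push(v);
    }
    let used: usize = lists.iter().map(|v| v.len() * 4).sum();
    let reserved: usize = lists.iter().map(|v| v.capacity() * 4).sum();
    for v in &mut lists {
        v.shrink_to_fit(); // releases the slack, at some construction cost
    }
    let shrunk: usize = lists.iter().map(|v| v.capacity() * 4).sum();
    (used, reserved, shrunk)
}
```

In aggregate, `reserved` can approach 2x `used` — the same 1GB-to-2GB ballooning described above — and the fix trades construction time (`shrink_to_fit` everywhere) for memory.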
       | 
       | But then someone actually reported an issue on the
       | `ahocorasick_rs`[2] Python project (bindings to the `aho-
       | corasick` crate) that the `pyahocorasick`[3] Python package (with
       | a bespoke C implementation of Aho-Corasick) was using
       | _substantially_ less memory. The contiguous NFA was still doing
       | better, but the _peak_ memory usage of `ahocorasick_rs` was much
       | much bigger than `pyahocorasick`. That prompted me to investigate
       | and figure out what the heck was going wrong. (And this is one
       | reason why benchmarks comparing tools or libraries are actually
       | useful beyond just advertisements or flexing. They alert you to
       | what is possible, and thus possibly push you to find bugs in your
       | current implementation that might be otherwise easy to miss. In
       | other words, benchmark comparisons can turn unknown unknowns into
       | known unknowns.)
       | 
       | So since this prompted me to look very carefully at memory usage,
       | I noticed that the memory usage reported by
       | `AhoCorasick::memory_usage`[4] was substantially smaller than
       | peak RSS memory usage in a simple reproduction program of the
       | original issue reported to `ahocorasick_rs`. I eventually figured
       | out that was because of the excess capacity used by a zillion
       | tiny `Vec`s in the non-contiguous representation. I fixed _most_
       | of that by calling `shrink_to_fit()`. But as the commit linked
       | above notes, it wasn't really feasible to call `shrink_to_fit()`
       | on all `Vec`s because that ended up with a non-trivial impact on
       | construction time. And on top of all of that, memory usage was
       | still worse than `pyahocorasick`.
       | 
       | But why was I using a bunch of little `Vec`s in the first place?
       | It's because this is the "non-contiguous" NFA. This is the thing
       | that gets built from a list of patterns by first building a trie
       | then filling in the failure transitions. That trie construction
       | really demands random access mutation, which puts a constraint on
       | your data structure choices. Plus, I had designed the contiguous
       | NFA as a release valve of sorts that lifts this constraint. So I
       | reasoned, "sure, the non-contiguous NFA isn't designed to be
       | fast. It just needs to get us started." But still,
       | `pyahocorasick` only has one Aho-Corasick implementation (the
       | `aho-corasick` crate has 3). So it must be doing something
       | differently.
       | 
       | It turns out that `pyahocorasick` uses linked lists! THE HORROR!
       | But actually, they are a really good solution to this problem. As
       | a Rust programmer, I reach for `Vec` first. But a C programmer
       | reaches for linked lists first. And it turns out that linked
       | lists are a much better fit here.
       | 
       | And so, I switched to linked lists[5]. (But using indices instead
       | of pointers. Because of course. This is Rust. :-)) And now
       | `ahocorasick_rs` uses less memory than `pyahocorasick`!
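The "linked lists, but with indices" idea can be sketched roughly as follows (hypothetical names; the real layout in the linked commit differs): every transition for every state lives in one shared `Vec`, chained by index rather than pointer, so there is no per-state allocation and no per-state slack capacity.

```rust
/// All links for all states in one backing Vec, chained by index.
#[derive(Default)]
pub struct Transitions {
    links: Vec<(u8, usize, Option<usize>)>, // (byte, target state, next link)
    heads: Vec<Option<usize>>,              // heads[state] -> first link index
}

impl Transitions {
    pub fn add_state(&mut self) -> usize {
        self.heads.push(None);
        self.heads.len() - 1
    }
    pub fn add(&mut self, state: usize, byte: u8, target: usize) {
        let idx = self.links.len();
        // Prepend to the state's list: O(1), and only the shared Vec grows.
        self.links.push((byte, target, self.heads[state]));
        self.heads[state] = Some(idx);
    }
    pub fn lookup(&self, state: usize, byte: u8) -> Option<usize> {
        let mut cur = self.heads[state];
        while let Some(i) = cur {
            let (b, target, next) = self.links[i];
            if b == byte {
                return Some(target);
            }
            cur = next;
        }
        None
    }
}
```

Lookups are O(transitions-per-state), but for trie construction that is usually small, and the memory overhead is exactly one link per transition — no doubling waste.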
       | 
       | [1]:
       | https://old.reddit.com/r/rust/comments/199jycb/identifying_r...
       | 
       | [2]: https://pypi.org/project/ahocorasick-rs/
       | 
       | [3]: https://pypi.org/project/pyahocorasick/
       | 
       | [4]: https://docs.rs/aho-
       | corasick/latest/aho_corasick/struct.AhoC...
       | 
       | [5]: https://github.com/BurntSushi/aho-
       | corasick/commit/31bb1fc30d...
        
         | diarrhea wrote:
         | Thanks for the great write up. Your writing and deep dives
         | never fail to enlighten and entertain at the same time.
        
         | beltsazar wrote:
         | > Just a few months ago, I ran into the same problem, although
         | it wasn't caused by `collect()`. It was caused by "normal"
         | `Vec` usage
         | 
         | If it wasn't caused by `collect()`, then I suppose it's a
         | related problem, but not the same problem. Your problem was
         | caused by, as your commit message says, "ignorance [original:
         | ignorant] of the broader effects of this strategy [double-when-
         | full] in aggregate"
         | 
          | The OP's problem, otoh, is more about reusing memory that has
          | been allocated when you "convert" (in Rust terms:
          | into_iter()) a Vec to another Vec. What's more, the reuse
          | happens even when the new Vec has a different type from the
          | original Vec's
         | type. This behavior is more surprising than the double-when-
         | full strategy, which is more widely known by programmers (even
         | if sometimes they "forget" about the implications).
        
           | burntsushi wrote:
            | I feel like the second sentence you quoted clarified that?
            | "same problem" in this context means "there is excess and
            | unused capacity in the Vec." Your description is more
            | precise, and at that level of precision, sure, they are not
            | the "same problem." And yet, they share the same
            | characteristic that is
           | easy to forget: a `Vec` might have unused capacity leading to
           | higher memory usage than you might expect otherwise if you
           | aren't accounting for that unused capacity.
           | 
           | My phrasing is a reflection of what I think is a more
           | universal lesson to draw from the blog, and one that is not
           | provoked merely by how `collect()` works, but rather, the
           | nature of growth amortization itself.
           | 
           | > This behavior is more surprising than the double-when-full
           | strategy
           | 
           | I agree.
           | 
           | I updated my comment to use "similar" instead of "same."
        
         | cmrdporcupine wrote:
         | What we need is a page-based Vec that mmaps (anon) for the
         | storage but leaves the unused portions zero-bytes and therefore
         | not part of RSS until actually required. (And when
         | clearing/shrinking sections, madvise DONTNEED the pages).
         | 
         | That is, the vec could expand to areas much larger than the
         | actual used size, but this would have no effect on process RSS
         | until those pages get dirtied with actual data.
        
           | cnity wrote:
            | Couldn't this obfuscate OOM-type problems by not triggering
            | actual memory issues until much later than the allocation?
        
             | cmrdporcupine wrote:
             | Maybe? Depends on how you monitor and what your
             | expectations are. But for workloads that use a lot of
             | memory, and want to avoid wasteful empty allocations, or
             | have sparse data, etc. it's a reasonable strategy.
        
           | vlovich123 wrote:
           | That can be not a great idea for subtle reasons even though
           | it does seem like a better design at first.
           | 
            | When you do the madvise, based on the contract of the API,
            | the kernel has to do a TLB shootdown, which is really
            | expensive. And not just "expensive for my process" but
            | expensive in terms of a significant slowdown for the entire
            | machine. Honestly
           | you could probably DDOS a machine if you could reliably
           | trigger that shootdown.
           | 
           | As such you want to be very very careful where you place that
           | data structure.
           | 
           | For what it's worth your memory allocator (or at least a good
           | one like mimalloc or the non-gperftools newer tcmalloc) will
           | do exactly as you mentioned where it will free memory back to
           | the OS using advanced heuristics tested out at scale.
           | 
           | As for why a shoot down is needed, it's so that a racing
           | allocation call in another process doesn't get that virtual
           | address but be running on a core where the TLB points
           | elsewhere (the core that did the allocation may not even be
           | the one where it's used).
        
             | cmrdporcupine wrote:
             | Yes, well, I think you're right it's a lot about the
             | contract of the API and the expectation of the user.
             | 
             | I'd certainly want intelligence (either by the user or by
             | the framework) in how frequently you release pages back to
             | the OS.
             | 
             | But the DONTNEED is really not the core of the value. It's
             | being able to create vectors that don't fragment as badly.
             | 
             | And yes, you're right a decent allocator could help with
             | this, and what I'm describing is really just a kind of
             | specialized allocator.
        
               | vlovich123 wrote:
                | I don't follow. Fragmentation wasn't the problem here -
                | it's that an optimization trying to reuse the
                | underlying allocation resulted in severe over-
                | allocation on a specific path. That would happen even
                | with your idea unless you use DONTNEED to release all
                | the unused pages.
               | 
                | I think fragmentation is an over-focused-on problem
                | that in practice is rarely the actual problem. It's
                | also a problem that can be solved by an application
                | reset, and making sure to periodically reset your
                | running application in a way that doesn't cause severe
                | disruption is a practice that works around many more
                | issues than just fragmentation.
        
             | LegionMammal978 wrote:
             | Is MADV_FREE on Linux any better in this regard? It's what
             | the libc allocators tend to use, if I recall correctly.
        
               | vlovich123 wrote:
               | It would have to be the same problem because there's no
               | way to free memory without a shootdown AFAIK. As I
               | describe elsewhere, I don't think there's any way to free
               | memory to the OS without one as otherwise you can run
               | into race conditions where another process allocating
               | memory gets access to the memory being freed in another
               | process through a stale TLB entry.
               | 
               | Here's a discussion [1] of a hypothetical lazy_munmap
               | option that would initiate the unmap without an immediate
               | shoot down (i.e. the CPUs would lazily evict from the TLB
               | when they notice the request) but no such option exists.
               | It's also not immediately clear such a hypothetical API
               | could exist on current CPU architectures or if it would
               | require new HW support as I don't have enough hands-on
               | expertise at that level. Anyway, [1] is an interesting
               | discussion of this idea and the note about huge pages
               | making these even less valuable is interesting to keep in
               | mind.
               | 
               | [1] https://news.ycombinator.com/item?id=23216590
        
           | loeg wrote:
           | Is this not going to be more or less what the memory
           | allocator backing Vec does under the hood anyway?
        
           | glandium wrote:
           | That's already what it does (except when shrinking)
        
         | Filligree wrote:
         | What's the reason for not always using the contiguous variant?
        
           | burntsushi wrote:
           | You can't build the contiguous variant directly from a
           | sequence of patterns. You need some kind of intermediate data
           | structure to incrementally build a trie in memory. The
           | contiguous NFA needs to know the complete picture of each
           | state in order to compress it into memory. It makes decisions
           | like, "if the number of transitions of this state is less
           | than N, then use this representation" or "use the most
           | significant N bits of the state pointer to indicate its
           | representation." It is difficult to do this in an online
           | fashion, and likely impossible to do without some sort of
           | compromise. For example, you don't know how many transitions
           | each state has until you've completed construction of the
           | trie. But how do you build the trie if the state
           | representation needs to know the number of transitions?
           | Classic chicken-and-egg.
           | 
           | Note that the conversion from a non-contiguous NFA to a
           | contiguous NFA is, relatively speaking, pretty cheap. The
            | only real reason to _not_ use a contiguous NFA is that it
            | can't represent as many patterns as a non-contiguous NFA.
           | (Because of the compression tricks it uses.)
           | 
           | The interesting bits start here:
           | https://github.com/BurntSushi/aho-
           | corasick/blob/f227162f7c56...
        
         | kccqzy wrote:
         | Just a somewhat tangent thought: do you know of any tricks to
         | speed up the construction phase of Aho Corasick? A recent
         | problem of mine needed to use Aho Corasick but with several
         | million strings in the dictionary it took a bit longer than I
         | thought to construct it. Any tricks you know of to speed up
         | construction? Don't particularly care about memory.
        
           | burntsushi wrote:
           | Hmmm. The phrasing of your question is a little ambiguous. I
           | could interpret it two ways:
           | 
           | 1. "Do you know if there are any config knobs in the aho-
           | corasick crate that will speed up construction?"
           | 
           | 2. "I have my own implementation of Aho-Corasick and
           | construction isn't as fast as I hoped. What kinds of tricks
           | can I employ to make my own implementation faster?"
           | 
           | I'd be happy to try and answer either of these, but I think
           | it would be better to ask with a concrete example on the
           | issue tracker Discussions: https://github.com/BurntSushi/aho-
           | corasick/discussions
           | 
           | All of the tricks I know about (and I judged to be worth
           | doing) are in the aho-corasick crate. There's really just no
           | getting around the fact that you need to first build a trie
           | (which requires visiting every byte in every pattern) and
           | then do a breadth-first search across the entire trie to
           | setup failure transitions. Making construction substantially
           | faster probably requires some kind of innovation in those two
           | parts at a fundamental level.
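The two construction phases described above can be sketched like this — a bare-bones illustration with `HashMap` transitions, nothing like the real crate's packed representations:

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Default)]
pub struct State {
    next: HashMap<u8, usize>,
    fail: usize,
    out: Vec<usize>, // pattern indices ending at this state
}

pub fn build(patterns: &[&str]) -> Vec<State> {
    let mut states: Vec<State> = vec![State::default()];
    // Phase 1: trie construction -- visits every byte of every pattern.
    for (pi, pat) in patterns.iter().enumerate() {
        let mut s = 0usize;
        for &b in pat.as_bytes() {
            s = if let Some(&t) = states[s].next.get(&b) {
                t
            } else {
                let t = states.len();
                states.push(State::default());
                states[s].next.insert(b, t);
                t
            };
        }
        states[s].out.push(pi);
    }
    // Phase 2: BFS over the trie to fill in failure transitions.
    let mut queue: VecDeque<usize> = VecDeque::new();
    let roots: Vec<usize> = states[0].next.values().copied().collect();
    queue.extend(roots); // root's children fail to the root (default 0)
    while let Some(s) = queue.pop_front() {
        let children: Vec<(u8, usize)> =
            states[s].next.iter().map(|(&b, &t)| (b, t)).collect();
        for (b, t) in children {
            // Walk s's failure chain to find t's longest proper suffix state.
            let mut f = states[s].fail;
            states[t].fail = loop {
                if let Some(&u) = states[f].next.get(&b) {
                    break u;
                }
                if f == 0 {
                    break 0;
                }
                f = states[f].fail;
            };
            // Patterns ending at a suffix state also end here.
            let inherited = states[states[t].fail].out.clone();
            states[t].out.extend(inherited);
            queue.push_back(t);
        }
    }
    states
}

/// Returns (end offset, pattern index) pairs for every match.
pub fn find(states: &[State], haystack: &str) -> Vec<(usize, usize)> {
    let mut s = 0usize;
    let mut hits = Vec::new();
    for (i, &b) in haystack.as_bytes().iter().enumerate() {
        loop {
            if let Some(&t) = states[s].next.get(&b) {
                s = t;
                break;
            }
            if s == 0 {
                break;
            }
            s = states[s].fail;
        }
        for &p in &states[s].out {
            hits.push((i, p));
        }
    }
    hits
}
```

Both phases are inherently linear passes over the patterns and the trie, which is why substantially faster construction needs a fundamentally different approach (like the zero-copy deserialization mentioned next).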
           | 
           | You can also cheat at this. That is, by designing your Aho-
           | Corasick automaton to be zero-copy deserializable. The DFAs
           | in regex-automata support this. I waver on whether to do this
           | for aho-corasick, because it acts like a straight-jacket for
           | evolving the internals of the library.
        
       | the_mitsuhiko wrote:
       | This is a good time to check how your code performs in beta vs
       | stable. This particular case is new in beta and it would be
       | interesting to catch regressions before they land.
       | 
       | Someone already filed this as a bug: https://github.com/rust-
       | lang/rust/issues/120091
        
         | tialaramex wrote:
         | This re-use trick seems clever, but definitely needed more
         | consideration before merging IMO. Re-use where we can determine
         | the size & alignment are a perfect match ought to be a win, but
         | the other cases seem very situational. And The8472's attitude
         | here seems unlikely to lead to a better Rust.
        
           | IshKebab wrote:
           | Yeah I feel like it should reallocate if the size difference
           | is great.
        
       | WirelessGigabit wrote:
        | I don't think it's a memory leak. It's re-using the same space
        | of the underlying array. It's allocated. Dropping the Vec will
        | release the memory.
        | 
        | In v1 you put in 128 containers, each with 1024 boxes.
       | 
       | Then v2 is taking out the first box out of each container,
       | tossing the container and putting the box at the space where the
       | container was, packing them.
       | 
        | The fact that capacity doubles when you remove the `as u8` is...
        | normal. You reuse the space, so you can fit 2 u8 in the space
        | of 1 u16.
        | 
        | The problem here is more that this optimization causes a
        | Vec<T> -> Vec<Y> where sizeof(T) > sizeof(Y) to have much more
        | capacity than expected.
        | 
        | Which IS a valid bug report. But again, not a memory leak.
        
         | vlovich123 wrote:
         | It seems like you have a better grasp on the problem than I. I
         | understand why capacity in terms of number of elements doubles.
         | But why did he end up using 200x more memory? Is it that the
         | transient memory used the 18GiB and then he reduced it to
         | vector of elements that were 200x smaller? Or is it that
         | there's some other problem that when you collect, it's
         | reallocating 2x using the wrong type to do the computation?
        
           | 0x457 wrote:
            | Because it doubles in every node and OP has 300k of
            | them[1]. It's not a leak, since the memory is...well...
            | accounted for and easily freed by either dropping the Vec
            | or calling shrink_to_fit. IMO it's expected behavior.
            | 
            | Honestly, if I had 300k Vecs, I would call `shrink_to_fit`
            | even without that optimization.
           | 
           | 1: Since the edge list calculation logic is run for one node
           | at a time in sequence, the fact that it temporarily uses
           | 132kb of memory would normally not be a problem at all.
           | However, the new collect() optimization meant that that
           | internal storage was being preserved in the final vec which
           | got inserted into the list of edge lists, thus keeping around
           | that 132kb of obsolete memory forever. Multiply that by 300k
           | nodes and suddenly you've leaked the computer's entire RAM.
        
             | vlovich123 wrote:
             | Ah right. Vec of vecs and the interior Vec has wasted
             | capacity due to the reduction of the element size. Thanks!
        
           | Groxx wrote:
           | The `map` step converted the data to a smaller type, and the
           | whole chain was smart enough to reuse the original Vec
           | without reallocating. Since the original item size was [much
           | bigger], there was a lot of "free" capacity left over.
           | 
           | It is indeed a bit surprising, but seems entirely reasonable
           | for the platform to do imo. It's efficient, in both memory
           | (allocation) and CPU. If you want a specifically-sized
           | container, you have to make sure that happens, otherwise
           | you're letting things be optimized in language-general ways.
           | That will frequently be wrong in edge cases, and may change
           | at any time.
        
         | paulddraper wrote:
         | No, if that's all it was, the excess memory usage would be 2x.
         | 
         | But it's 200x.
         | 
         | Right?
        
           | nickez wrote:
            | Please read the article. His code produces 200x
            | intermediary values. Clickbait title, since it wasn't a
            | leak.
        
             | paulddraper wrote:
             | I did read the article.
             | 
             | > the memory waste from excess capacity should always be at
             | most 2x, but I was seeing over 200x.
             | 
             | So the 200x analysis is his problem?
        
               | kbknapp wrote:
                | Rust can re-use an allocation, but if the new item is
                | _smaller_ than the previous one it doesn't
                | automatically remove (free) the "wasted" memory left
                | over from the previous allocation. I think this is
                | categorically not a memory leak, as the memory was
                | absolutely accounted for and able to be freed (as
                | evidenced by the `shrink_to_fit()`), but I can see how
                | the author was initially confused by this optimization.
                | 
                | The 2x versus 200x confusion, IMO, is that the OP knew
                | Vec doubles in size when it needs more space, so they
                | assumed the memory should only ever have been 2x in the
                | worst case of the _new size_. Because the new type was
                | smaller than the previous one, it looked like a massive
                | over-allocation.
               | 
                | Imagine you had a `Vec<Vec<u16>>` and, to keep it
                | simple, there were only 2 elements in both the inner
                | and outer Vecs. If we assume Rust doubled each Vec's
                | allocation, that'd be 4x4 "slots" of 2 bytes per slot
                | (or 32 bytes total allocated...in reality it'd be a
                | little different, but to keep it simple let's just
                | assume).
               | 
               | Now imagine you replace that allocation with a
               | `Vec<Vec<u8>>` which even with the same doubling of the
               | allocation size would be a maximum of 4x4 slots of 1 byte
               | per slot (16 bytes total allocation required). Well we
               | already have a 32 byte allocation and we only need 16, so
               | Rust just re-uses it, and now it looks like we have 16
               | bytes of "waste."
               | 
                | Now the author was expecting _at most_ 16 bytes
                | (remember, 2x the _new size_) but was seeing 32 bytes
                | because Rust just re-used the allocation and didn't
                | free the "extra" 16 bytes. Further, when they ran
                | `Vec::shrink_to_fit()` it shrunk down to only _used_
                | space, which in our example would be a total of 4 bytes
                | (2x2 of 1-byte slots actually used).
                | 
                | Meaning the author was comparing an _observed_ 32-byte
                | allocation to an expectation of _at most_ 16 bytes, and
                | a properly sized allocation of 4 bytes. Factored out to
                | their real-world data, I can see how they'd see numbers
                | greater than "at most 2x."
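The toy accounting above, spelled out in code (same simplifying assumptions as the comment: 2 elements per Vec, capacity doubled to 4 slots at each level):

```rust
use std::mem::size_of;

/// Returns (reused old allocation, "at most 2x the new size" expectation,
/// actually used) in bytes for the toy Vec<Vec<u16>> -> Vec<Vec<u8>> example.
pub fn toy_accounting() -> (usize, usize, usize) {
    let (slots, elems) = (4usize, 2usize);
    let reused = slots * slots * size_of::<u16>(); // 32: kept by the re-use
    let expected = slots * slots * size_of::<u8>(); // 16: doubled new size
    let used = elems * elems * size_of::<u8>(); // 4: after shrink_to_fit()
    (reused, expected, used)
}
```

The observed-vs-expected ratio here is 2x, but scaled to real data where the element shrink is larger, the same accounting produces ratios far beyond "at most 2x."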
        
               | IshKebab wrote:
                | 200x is correct. What's happening is that he makes a
                | vector with tons of capacity and only a few elements,
                | so lots of wasted space. Then he turns that vector into
                | another vector using an operation that _used_ to
                | allocate a new vector (thus releasing the wasted space)
                | but now reuses the previous vector's allocation
                | (retaining the wasted space).
               | 
               | It's definitely a sneaky bug. Not a "memory leak" in the
               | normal sense since the memory will still be freed
               | eventually. I'd call it an unexpected waste of memory.
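The scenario described here can be sketched in a few lines; the 1000-slot capacity and the `x + 1` map are made-up stand-ins, and whether the allocation is reused (keeping the big capacity) depends on the Rust version, so the function just reports what it observes:

```rust
/// Build a small Vec inside a much larger allocation, then collect it into
/// another Vec. Returns (len, capacity) of the result; the capacity may be
/// ~10 or ~1000 depending on whether collect() reused the buffer.
fn wasted_then_collected() -> (usize, usize) {
    let mut big: Vec<u32> = Vec::with_capacity(1000);
    big.extend(0u32..10); // 10 elements rattling around in a 1000-slot buffer
    let collected: Vec<u32> = big.into_iter().map(|x| x + 1).collect();
    (collected.len(), collected.capacity())
}

fn main() {
    let (len, cap) = wasted_then_collected();
    println!("len={len} cap={cap} (cap varies by toolchain)");
}
```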
        
             | hyperpape wrote:
             | I agree that this is not a memory leak.
             | 
             | However, the semantic distinction between "this uses much
             | more memory than expected" and "this is a memory leak" is a
             | little subtle, and it seems pretty rude to call it
             | clickbait.
        
               | devmor wrote:
               | I disagree that it's a small semantic difference.
               | 
               | I don't think it's clickbait though, I think the author
               | was just misusing terminology.
        
               | hyperpape wrote:
               | I said it was subtle, not small. I agree it's a valuable
               | distinction.
        
               | mratsim wrote:
                | A memory leak means the memory has leaked: it's no longer
                | under the program's control. Here the memory is still
                | under control; it can be reclaimed by the program.
        
               | karmakaze wrote:
               | Clickbait (in the context of Rust). In languages with
               | managed memory there are no true memory leaks so such
               | wastes are called leaks. In lower-level languages, we
               | should stay more strict with what we call things.
        
               | deathanatos wrote:
                | ... Box::leak [1] is a function that exists. That seems
                | like a memory leak, no?
               | 
                | Less tongue-in-cheek: if a program allocates far more
                | memory than expected of it, I'm going to colloquially
                | call that a "memory leak". If I see a Java program
                | whose RSS is doing nothing but "up and to the right"
                | until the VM runs out of memory and dies a sweet sweet
                | page thrashing death, I'm going to describe that as a
                | "memory leak". Having someone tell me, "well, actually,
                | it's not a leak per se, it's just that the JVM's GC
                | didn't collect all the available garbage prior to running
                | out of memory because ..." ... I don't care? You're just
                | forcing me to wordsmith the problem description; the
                | problem is still there. The program is still dead, and
                | still exceeding the constraints of the environment it
                | should have been operating in.
               | 
               | The author had some assumptions: that Vec doesn't
               | overalloc by more than 2x, and that collect allocates --
               | one of those did turn out to be false, but I think if I
               | polled Rust programmers, a fair number of them would make
               | the wrong assumption. _I_ would, and TIL from this
               | article that it was wrong, and that collect can reuse the
               | original allocation, despite it not being readily
               | apparent how it knows how to do that with a generic
               | Iterator. (And, the article got me to understand that
               | part, too!)
               | 
               | Unlike most clickbaits which lure you in only to let you
               | down, I learned something here. Reading it was
               | worthwhile.
               | 
                | [1] https://doc.rust-
                | lang.org/stable/std/boxed/struct.Box.html#m...
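The Box::leak mentioned above really is a deliberate, permanent leak, unlike the temporary over-allocation in the article. A minimal sketch (`leak_a_string` is a made-up name for illustration):

```rust
/// Box::leak turns an owned Box into a &'static mut reference; the
/// allocation is intentionally never freed, i.e. a real leak.
fn leak_a_string() -> &'static str {
    let boxed: Box<str> = "leaked forever".into();
    Box::leak(boxed)
}

fn main() {
    let s = leak_a_string();
    println!("{s}"); // the memory behind `s` is never reclaimed
}
```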
        
               | 0x457 wrote:
                | No, "memory leak" has a very distinct definition: memory
                | that is still allocated but no longer accessible. A
                | memory leak can be as small as a single word. In this
                | case, it's just unused memory. There is another term for
                | this scenario, which I don't remember.
                | 
                | This is a case of an optimization gone wrong, but nothing
                | is leaked, and every single byte is accounted for.
                | 
                | The title is clickbait, but the article is still
                | interesting to read.
        
           | WirelessGigabit wrote:
            | Looking at this example:
            | 
            |     fn dbg_vec<T>(v: &Vec<T>) {
            |         println!(
            |             "vec data ptr={:?} len={} cap={}",
            |             v.as_ptr(),
            |             v.len(),
            |             v.capacity()
            |         );
            |     }
            | 
            |     fn main() {
            |         {
            |             let v1 = (0u16..128).map(|i| [i; 1024]).collect::<Vec<_>>();
            |             dbg_vec(&v1);
            |             let v2 = v1.into_iter().map(|x| x[0] as u8).collect::<Vec<_>>();
            |             dbg_vec(&v2);
            |         }
            |         {
            |             let v1 = (0u16..128).map(|i| [i; 1024]).collect::<Vec<_>>();
            |             dbg_vec(&v1);
            |             let v2 = v1.into_iter().map(|x| x[0]).collect::<Vec<_>>();
            |             dbg_vec(&v2);
            |         }
            |     }
            | 
            |     # cargo +nightly clean
            |     Removed 11 files, 7.3MiB total
            | 
            |     # cargo +nightly run --release
            |        Compiling vec-debug v0.1.0 (/home/neo/vec-debug)
            |         Finished release [optimized] target(s) in 0.17s
            |          Running `target/release/vec-debug`
            |     vec data ptr=0x7f47ee162010 len=128 cap=128
            |     vec data ptr=0x7f47ee121010 len=128 cap=262144
            |     vec data ptr=0x55c6514f3ba0 len=128 cap=128
            |     vec data ptr=0x55c6514f3ba0 len=128 cap=131072
           | 
            | What happens is that the original vector's assigned space
            | gets reused. That's it.
            | 
            | It LOOKS like there is more because the capacity inflates.
            | But capacity is counted in units of sizeof(T), not bits. In
            | 
            |     (0u16..128).map(|i| [i; 1024]).collect::<Vec<_>>()
            | 
            | the capacity (and length) is 128 elements, each an array of
            | 1024 u16s. We then re-use the same underlying buffer, which
            | has a capacity of 128 * (1024 * 16) bits, and put u8s in it
            | (the cast). So each item went from 1024 * 16 bits to 8 bits.
           | 
           | So previously we had 128 * 1024 * 16 = 2,097,152 bits.
           | 
           | How many times can we put 8 bits in a capacity of 2,097,152
           | bits?
           | 
           | 262,144
           | 
           | How many times can we put 16 bits in a capacity of 2,097,152
           | bits?
           | 
           | 131,072
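A quick sanity check of the bit arithmetic above, as a sketch (the `total_bits` helper is made up for this check):

```rust
/// Hypothetical helper re-doing the comment's arithmetic:
/// 128 elements, each an array of 1024 u16s, 16 bits apiece.
fn total_bits() -> u64 {
    128 * 1024 * 16
}

fn main() {
    let bits = total_bits();
    println!("total: {bits} bits");
    println!("fits {} u8 (8-bit) elements", bits / 8);
    println!("fits {} u16 (16-bit) elements", bits / 16);
}
```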
        
         | karmakaze wrote:
         | A similar thing happens in Go if you don't use a pre-known
         | 'capacity' when allocating slices. Each internal realloc could
         | use a larger backing array. Adding elements 1 by 1 will keep
         | doing this making lots of garbage. They all get collected of
         | course, but performance is horrible with memory fragmentation
         | and all the copying.
        
           | HideousKojima wrote:
           | C# doubles the size of a List<T> each time you hit the
           | capacity (if you didn't set it correctly when you created the
           | list or left it at the default). Does Go do something similar
           | or does it only increase it by 1 each time?
        
             | znkr wrote:
             | It grows by 2x for small sizes, then it transitions to
             | growing it by 1.25x
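Rust's Vec grows in the same amortized fashion (currently roughly doubling, though the exact sequence is an implementation detail, not a guarantee). A small sketch that counts how many reallocations happen while pushing one element at a time:

```rust
/// Count how many times a Vec's capacity changes while pushing n elements
/// one by one. Because growth is amortized, the count stays logarithmic
/// in n rather than linear.
fn realloc_count(n: usize) -> usize {
    let mut v: Vec<u64> = Vec::new();
    let mut changes = 0;
    let mut cap = v.capacity();
    for i in 0..n {
        v.push(i as u64);
        if v.capacity() != cap {
            changes += 1;
            cap = v.capacity();
        }
    }
    changes
}

fn main() {
    println!(
        "capacity changes while pushing 1024 elements: {}",
        realloc_count(1024)
    );
}
```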
        
       | stcredzero wrote:
       | It took me a few extra seconds to parse the headline. ("footgun"
       | didn't help) Is                   <Vec<_>>()
       | 
       | perhaps to become another                   -\_(tsu)_/-
       | ?
        
         | tialaramex wrote:
         | It's interesting that HN zapped the second colon. The headline
         | here (at time of my posting) is:
         | 
         | "Identifying Rust's collect:<Vec<_>>() memory leak footgun"
         | 
         | but the actual title of the blog post is:
         | 
         | "Identifying Rust's collect::<Vec<_>>() memory leak footgun"
         | 
         | That extra colon matters to Rust's syntax.
        
         | unshavedyak wrote:
          | Not sure what you're asking. At risk of loosely explaining what
          | you may already know, `<Vec<_>>` is a generic parameter `<T>`
          | with the type `T` specified as `Vec<_>`. Vec also has a param,
          | `<_>`, where the type is left out, as the compiler can figure
          | it out.
          | 
          | Sidenote, `<Vec<_>>` is missing the best part - the fish head,
          | to become `::<Vec<_>>`, aka `::<>` - the Turbo Fish :D
          | 
          | I admit to using this syntax more than necessary just so I can
          | write a turbofish :D
          | 
          |  _edit_ : Wait, no, `<>` is the head, isn't it? Hah. So I guess
          | it's missing the fins? /shrug
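For anyone unfamiliar, the turbofish and a plain type annotation are interchangeable ways to tell `collect()` what to build; a minimal sketch (helper names are made up):

```rust
/// Collect with the full turbofish, `::<Vec<_>>`; the `_` lets the
/// compiler infer the element type.
fn squares_turbofish() -> Vec<i32> {
    (1..=4).map(|i| i * i).collect::<Vec<_>>()
}

/// The same collect, steered by a type annotation instead.
fn squares_annotated() -> Vec<i32> {
    let v: Vec<i32> = (1..=4).map(|i| i * i).collect();
    v
}

fn main() {
    assert_eq!(squares_turbofish(), squares_annotated());
    println!("{:?}", squares_turbofish());
}
```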
        
         | avgcorrection wrote:
         | Yes? What about unamused Vec?
        
       | renewiltord wrote:
       | Great write-up. Thank you. I wouldn't have guessed. The upshot of
       | this whole thing is that Vec is for mutable collections and when
       | you're done you make a boxed slice.
        
       | invpt wrote:
        | I appreciate the mention of Box<[T]>. I've never (or at least
        | very rarely?) seen people use it, even in rustc, although I'm
        | sure there's somewhere in the compiler that does. I've kind of
        | gotten the feeling that Box<[T]> is frowned upon as a needless
        | optimization, because the one-word storage difference is so
        | minimal, but I appreciate having the immutable-size semantics as
        | a way of ensuring no extra allocations occur by way of the
        | methods on Vec.
        
         | ironhaven wrote:
          | Another Rust type you should look out for is Box<str>. In my
          | Rust code I find a lot more uses of Box<str> than of Box<[T]>.
          | Strings are often fixed-length identifiers or usernames that
          | get passed around frequently and stored in data structures.
          | Box<str> can be a drop-in replacement in a surprising amount
          | of code.
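A small sketch of the drop-in quality described here; the `User` struct and helper are hypothetical:

```rust
/// Hypothetical record storing a fixed identifier as Box<str> instead of
/// String: same contents, but no spare capacity field to carry around.
struct User {
    name: Box<str>,
}

fn make_user(name: &str) -> User {
    User { name: name.into() } // &str -> Box<str> via the From impl
}

fn main() {
    let u = make_user("alice");
    // Box<str> derefs to &str, so most String-consuming code keeps working.
    println!("hello, {}", u.name);
}
```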
        
           | zozbot234 wrote:
           | Arc<str> is an atomically ref-counted string, also quite
           | useful in async or multi-threaded code where (unlike w/ Box)
           | you sometimes can't know in advance which task will drop the
           | last reference to some data.
        
           | bobbylarrybobby wrote:
           | Box<str> is used pervasively in the rust compiler instead of
           | String for exactly this reason. Basically every string of
           | code the compiler looks at is, unsurprisingly, constant.
        
             | steveklabnik wrote:
             | Yeah, the general concept here is
             | https://en.wikipedia.org/wiki/String_interning, Box<str> is
             | great for it.
        
         | tsnl wrote:
          | Another downside of Box<[T]> is that Vec::into_boxed_slice
          | copies all elements from the source vector, while
          | Vec::shrink_to_fit does not appear to copy. This is based on
          | the stable docs.
        
           | sapiogram wrote:
           | It doesn't copy the elements if the source vector is already
           | at max capacity, which you can often arrange for beforehand.
        
             | cmrx64 wrote:
             | For example, by explicitly calling shrink_to_fit. So you
             | either have fragmentation, or an extra copy if you're not
             | careful, but none of the solutions let you forget about the
             | detail entirely.
        
               | invpt wrote:
               | If you look at the source for into_boxed_slice, it calls
               | shrink_to_fit at the beginning before doing anything
               | else. Hence the documentation is slightly wrong, and no
               | copies occur.
               | 
               | Edit: I submitted a PR to clear up the docs:
               | https://github.com/rust-lang/rust/pull/120110
        
               | cmrx64 wrote:
               | thank you!
        
               | sapiogram wrote:
               | Or alternatively, the vector was created with
               | `Vec::with_capacity()` or `ExactSizeIterator::collect()`,
               | and had the correct capacity all along.
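The no-copy case discussed here can be checked by comparing data pointers; a sketch with a made-up helper name:

```rust
/// Fill a Vec to exactly its capacity, then convert to Box<[u8]>. When
/// len == capacity, the internal shrink_to_fit is a no-op, so
/// into_boxed_slice can hand over the existing buffer; we verify that by
/// comparing the data pointer before and after.
fn boxed_without_realloc() -> bool {
    let mut v: Vec<u8> = Vec::with_capacity(4);
    while v.len() < v.capacity() {
        v.push(v.len() as u8); // never reallocates: len < capacity
    }
    let before = v.as_ptr();
    let boxed: Box<[u8]> = v.into_boxed_slice();
    boxed.as_ptr() == before
}

fn main() {
    println!("buffer reused without copy: {}", boxed_without_realloc());
}
```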
        
       | loeg wrote:
       | This is a pretty surprising behavior. Reusing the allocation
       | without shrinking when the resulting capacity will be within
       | 1.0-2.0x the length: seems reasonable, not super surprising.
       | Reusing the allocation without shrinking when the resulting
       | capacity will be a large multiple of the (known) length: pretty
        | surprising! My intuition is that this is too surprising (at
        | capacity >2x the length) to be worth the possible optimization,
        | but probably still worth doing at capacities that are small
        | multiples of the length.
        
         | Groxx wrote:
         | Is reusing an allocation _while changing its size_ a thing you
          | expect to be able to do? I would believe that some languages/
          | systems/etc can do that, but it certainly feels like an
         | exception rather than a rule. Reuse generally means the whole
         | block of memory is retained, from what I've seen, because you'd
         | have to track that half-freed memory for reuse somehow and that
         | has some associated cost. (A compacting-GC language would be a
         | reasonable exception here, for example)
        
       | vlovich123 wrote:
       | I've read the article and I'm still lost as to how this
       | optimization results in a 200x discrepancy between capacity and
       | length. Was the mapped element size 200x smaller in this case? Or
       | is there some bug where repeated mappings like this cause a 2x
       | growth based on the previous element size instead of the target
       | element size?
        
         | hedgehog wrote:
         | I think the behavior is that if you have a Vec of u128 (say
         | 1000), filter that to fewer elements (say 10), and then collect
         | it into a Vec of u32 you might expect the resulting value to be
         | around 40 bytes but in beta Rust it is 16000 bytes. In current
         | Rust the collect would cause a new 10 element Vec of u32 to be
         | allocated, in beta it reuses the original larger allocation.
         | The author's code is doing a bit more but essentially when
         | moving to beta Rust the new optimization caused memory usage to
         | increase by a big multiple.
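The u128-to-u32 scenario sketched here, with the assumed sizes (1000 elements filtered down to 10); whether the resulting capacity stays huge depends on the toolchain's collect specialization, so the function just returns what it sees:

```rust
/// 1000 u128s, filtered down to 10 survivors, collected into u32s.
/// Returns the result's (len, capacity).
fn filtered_collect() -> (usize, usize) {
    let big: Vec<u128> = (0u128..1000).collect();
    let small: Vec<u32> = big
        .into_iter()
        .filter(|&x| x % 100 == 0) // keeps 0, 100, ..., 900
        .map(|x| x as u32)
        .collect();
    (small.len(), small.capacity())
}

fn main() {
    let (len, cap) = filtered_collect();
    println!("len={len} cap={cap} (cap varies by Rust version)");
}
```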
        
           | vlovich123 wrote:
           | Ok. I missed that there's a filter step that compounds the
           | problem. The more I read the less this sounds like a bug and
           | more like application code is missing a shrink_to_fit and was
           | relying on a pessimization.
           | 
           | That being said, it's also not an unreasonable expectation on
           | the user's behalf that the size and capacity don't get crazy
           | different in code as innocuous and idiomatic as this.
           | 
            | I wonder how the open bug will end up being resolved.
        
       | Groxx wrote:
       | While I agree this is a bit surprising... I don't see _literally
       | anything_ in the docs for `collect()` that implies anything about
        | memory behaviors. You get _a collection_, nothing more is
       | guaranteed.
       | 
       | If you want specific memory behavior, you really do have to use a
       | method that intends to do that. Like `shrink_to_fit()` (mentioned
       | in the article).
        
         | deathanatos wrote:
         | I would say I don't (or now, didn't) expect collect to be able
         | to know that there is an allocation under there, since that's
         | not part of Iterator's interface.
         | 
          | I.e., the docs could call out that there _aren't_ such implied
          | behaviors, and that in some circumstances it might reuse the
         | allocation, but that that's not guaranteed. (And ideally, offer
         | me info on what to do if I _do_ want certain guarantees about
         | the allocation of the Vec.)
         | 
          | Warding off what I think is a _reasonable_, but wrong,
          | assumption is well within the scope of the docs.
        
       ___________________________________________________________________
       (page generated 2024-01-18 23:00 UTC)