[HN Gopher] So We've Got a Memory Leak
___________________________________________________________________
So We've Got a Memory Leak
Author : todsacerdoti
Score : 169 points
Date   : 2024-05-10 05:14 UTC (1 day ago)
(HTM) web link (stevenharman.net)
(TXT) w3m dump (stevenharman.net)
| DonHopkins wrote:
| "I'm not a real programmer. I throw together things until it
| works then I move on. The real programmers will say 'Yeah it
| works but you're leaking memory everywhere. Perhaps we should fix
| that.' I'll just restart Apache every 10 requests." -Rasmus
| Lerdorf, PHP Non-Designer
|
| https://en.wikiquote.org/wiki/Rasmus_Lerdorf
| fragmede wrote:
| That explains PHP
| DonHopkins wrote:
| https://news.ycombinator.com/item?id=40256878
|
| >It's not "supposed" to be that way.
|
| >It just happened to end up that way because Rasmus Lerdorf
| just doesn't give a shit. ¯\_(ツ)_/¯
|
| [...]
| im3w1l wrote:
| There is a scene in Pirates of the Caribbean I think about
| a lot. "You are without a doubt the worst pirate I have ever
| heard of." "Ah, but you have heard of me."
|
| He kept the scope down. He shipped. It was hugely successful.
| In the end it was overtaken and rightly so, but that doesn't
| invalidate the success it had.
| DonHopkins wrote:
| But it doesn't justify the arrogance.
|
| "For all the folks getting excited about my quotes. Here is
| another - Yes, I am a terrible coder, but I am probably
| still better than you :)" -Rasmus Lerdorf
|
| OR the continued negligence.
|
| https://news.ycombinator.com/item?id=40256878
|
| And who remembers how careless, reckless, and blithe he was
| with the PHP 5.3.7 release he didn't bother to test because
| running tests was too much of a hassle because there were
| already so many test failures that wading through them all
| to see if there were any new ones was just too much to ask
| of him, the leader of the widely used project, in charge of
| cutting releases?
|
| >5.3.7 upgrade warning: [22-Aug-2011] Due to unfortunate
| issues with 5.3.7 (see bug#55439) users should postpone
| upgrading until 5.3.8 is released (expected in a few days).
|
| No seriously, he's literally as careless as he claims to be
| (when he says that repeatedly, you should believe him!),
| and his lack of giving a shit about things like tests and
| encryption and security that are extremely important has
| caused actual serious security problems, like breaking
| crypt() by checking in sloppy buggy code that would have
| caused a unit test to fail, but without bothering to run
| the unit tests (because so many of them failed anyway, so
| who cares??), and then MAKING A RELEASE of PHP 5.3.7 with,
| OF ALL THINGS, a broken untested crypt()!
|
| http://i.imgur.com/cAvSr.jpg
|
| Do you think that's just his sense of humor, a self
| deprecating joke, breaking then releasing crypt() without
| testing, that's funny in some context? What context would
| that be? Do you just laugh and shrug it off with "Let
| Rasmus be Rasmus!"
|
| https://www.reddit.com/r/programming/comments/jsudd/you_see
| _...
|
| >r314434 (rasmus): Make static analyzers happy
|
| >r315218 (stas): Unbreak crypt() (fix bug #55439) # If you
| want to remove static analyser messages, be my guest, but
| please run unit tests after
|
| http://svn.php.net/viewvc/php/php-
| src/trunk/ext/standard/php...
|
| https://plus.google.com/113641248237520845183/posts/g68d9Rv
| R... [broken link]
|
| >Rasmus Lerdorf
|
| >+Lorenz H.-S. We do. See http://gcov.php.net
|
| >You can see the code coverage, test case failures,
| Valgrind reports and more for each branch.
|
| >The crypt change did trigger a test to fail, we just went
| a bit too fast with the release and didn't notice the
| failure. This is mostly because we have too many test
| failures which is primarily caused by us adding tests for
| bug reports before actually fixing the bug. I still like
| the practice of adding test cases for bugs and then working
| towards making the tests pass, however for some of these
| non-critical bugs that are taking a while to change we
| should probably switch them to XFAIL (expected fail) so
| they don't clutter up the test failure output and thus
| making it harder to spot new failures like this crypt one.
|
| And don't even get me started about
| mysql_real_escape_string! It has the word "real" in it. I
| mean, come on, who would ever name a function "real", and
| why?
|
| That implies the existence of a not-so-real mysql escape
| string function. Why didn't they simply FIX the gaping
| security hole in the not-so-real mysql escape string
| function, instead of maintaining one that was real that you
| should use, and one that was not so real that you should
| definitely not use, in the name of backwards compatibility?
|
| Or were there actually people out there using the non-real
| mysql escape string function, and they didn't want to
| ruffle their feathers by forcing those people with code
| that had a security hole so big you could fly a space
| shuttle through to fix their gaping security holes?
|
| The name of the function "mysql_real_escape_string" says
| all you need to know about the culture and carelessness and
| lack of security consciousness of the PHP community.
|
| Melania Trump's "I REALLY DON'T CARE DO U?" nihilistic
| fashion statement sums up Rasmus Lerdorf's and the PHP
| community's attitude towards security, software quality,
| programming, standards, computer science, and unit tests.
|
| ¯\_(ツ)_/¯
|
| https://www.youtube.com/watch?v=l5imY2oQauE
| pavlov wrote:
| Never calling free() is a valid memory management strategy if
| you know your process lifetime exactly.
| Hamuko wrote:
| Oldie but a goodie: https://devblogs.microsoft.com/oldnewthin
| g/20180228-00/?p=98...
| alternatex wrote:
| I think the complaint is that there was no strategy. The dude
| just wanted to build some websites.
| rwmj wrote:
| I tried to do this in a real program once. The program only
| ever runs for a fraction of a second and then exits, so it
| seemed like a good candidate. Unfortunately what happened
| after is that our company started running tools like Coverity
| which complained about leaked memory. The path of least
| resistance was to fix that by adding free()'s everywhere.
| matja wrote:
| And then you accidentally add a use-after-free bug by
| trying to satisfy a "correctness" tool...
| SubjectToChange wrote:
| It's hard to fault the static analysis tooling here. Not
| freeing memory is, almost always, unintended behavior.
| Besides, if a business is going to blindly apply static
| analysis to their projects and then demand that every
| warning be "fixed", then the tool really isn't the
| problem.
|
| Still, any serious developer should, at the very least,
| have the patience to interrogate their code.
| nkrisc wrote:
| The tool doesn't demand it be satisfied, a manager does.
| jmb99 wrote:
| Coverity will definitely warn you about use-after-free.
| It's not a "correctness" tool, it's a static analyzer and
| probably the best one out there (imo). Yes in this use
| case it's probably not too important to care about, but
| really any code base of importance should be run through
| it on a fairly regular basis.
| adrianN wrote:
| How much use after free and double free did you have to
| fix?
| samatman wrote:
| Worth pointing out that in Zig, you wouldn't have had this
| problem in the first place. The natural way to write a
| program of that nature is to create an arena allocator,
| immediately defer arena.deinit() on the next line, and then
| allocate to your heart's content. When the function
| returns, on an error or otherwise, the arena is freed.
|
| No need to go back later and add a bunch of free(), because
| you correctly implemented the memory policy as a matter of
| course, in two lines.
| iveqy wrote:
| Git actually uses that approach, which means that libgit
| is pretty useless to embed. No one does that anyway, since
| it's GPL; everyone instead uses libgit2.
| mhh__ wrote:
| Don't do it though.
|
| Unless you know you will only allocate 100x times less memory
| than the average user has it will bite you.
|
| 1. Your code is now fragile (in the Taleb sense).
|
| 2. Your code is now unusable when someone wants to use it as
| a library in the future
|
| 3. You are now prone to memory fragmentation (many use a bump
| the pointer allocator when not freeing).
|
| 4. You encourage people to not care about free-ing anything
| -- when you do turn free on, or turn a GC on, you might
| struggle to actually free anything because of references all
| over the place.
| MaxBarraclough wrote:
| In certain circumstances it can be a good move. It might
| significantly improve real world performance, for instance.
|
| Walter did this in the DMD compiler years ago and it gave a
| drastic performance boost. [0]
|
| > You are now prone to memory fragmentation (many use a
| bump the pointer allocator when not freeing)
|
| Pointer-bump allocation is immune to fragmentation by
| definition, no?
|
| Keeping dead objects around might lead to caches filled
| with mostly 'dead' data though.
|
| [0] https://web.archive.org/web/20190126213344/https://www.
| drdob...
| mhh__ wrote:
| Walter doing that is why compiling my work project uses
| 70 gigs of ram and is slow - because nothing stays in the
| cache because everything gets copied so much (because
| copies are cheap right?)
|
| Anecdotally, people have told me that it's faster with a
| modern malloc impl anyway; I haven't tried it properly.
|
| > Pointer-bump allocation is immune to fragmentation by
| definition, no?
|
| In some sense, yes, but what I mean is that like should
| end up near like. If everything goes through a single
| bumping allocator, you can end up with a pattern of
| ABABAB, where a proper allocator would do AAABBB, which
| the cache can actually use.
| MaxBarraclough wrote:
| > Walter doing that is why compiling my work project uses
| 70 gigs of ram and is slow - because nothing stays in the
| cache
|
| This sounds like guesswork. Do you know the details of
| DMD's internals? Have you done profiling to confirm poor
| cache behaviour?
|
| Again, per the article I linked, when Walter made the
| change there was a drastic _improvement_ in performance.
|
| > everything gets copied so much (because copies are
| cheap right?)
|
| I'm not sure what copying you're referring to here.
| Declining to call _free_ has nothing to do with needless
| copy operations.
|
| > like should end up near like whereas if you have
| everything going through a single bumping allocator then
| you can have a pattern of ABABAB where a proper allocator
| would do AAABBB which the cache can actually use.
|
| A general purpose allocator like _malloc_ can't segment
| the allocations for different types/purposes; it only has
| the allocation size to go on. That's true whether or not
| you ever call _free_. If you want to manage memory using
| purpose-specific pools, you'd need to do that yourself.
|
| As for whether this really would improve cache
| performance, I imagine it's possible, which is why we
| have discussions about structure-of-arrays vs array-of-
| structures, and the _entity component system_ pattern
| used in gamedev. I'd be surprised if compiler code could
| be significantly accelerated with that kind of reworking,
| though.
| mhh__ wrote:
| I know the internals of DMD extremely well.
| mhh__ wrote:
| > I'm not sure what copying you're referring to here.
| Declining to call free has nothing to do with needless
| copy operations
|
| Think.
|
| You make malloc feel inexpensive both by using a crappy
| but fast allocator and disabling free. People
| (reflexively) know this, and then they don't get punished
| when they get extremely careless with their allocations
| because test suites never allocate enough memory to OOM
| the machine.
|
| This isn't some hypothetical, I have measured the amount
| of memory dmd ever actually writes to (i.e. after
| allocation) to be absurdly low. Like single digits.
|
| Pretty much every bit of semantic analysis that hasn't
| been optimized post-facto involves copying hundreds to
| thousands of bytes.
|
| > A general purpose allocator like malloc can't segment
| the allocations for different types/purposes, it only has
| the allocation size to go on. That's true whether or not
| you ever call free.
|
| The size is what I'm getting at, but a D allocator can do
| this because it gets given type information. (dmd
| allocates mainly with `new` which is then forwarded to
| the bump the pointer allocator if the GC is not un-
| disabled)
|
| Also evidence that the exact scheme dmd uses is
| suboptimal wrt allocator impl:
|
| https://forum.dlang.org/thread/zmknwhsidfigzsiqcibs@forum
| .dl...
| dur-randir wrote:
| Instagram ran for about a year with GC disabled just
| fine, citing 10% better memory utilisation. While you
| might be right for library code, at the application level
| it sometimes makes sense.
| leni536 wrote:
| free() doing nothing can be a valid application-level
| strategy. Not calling free() can bite you down the line.
| lanstin wrote:
| CGI. But once you lose control of your memory in a large
| code base it is practically impossible to regain it. So you
| can't move to long-lived server processes when you decide to
| enter the more modern era.
| chubs wrote:
| When I was a rails developer, the 'done thing' was to simply
| throw hardware at issues like these as an acceptable tradeoff for
| productivity. If you cared about this sort of thing, you'd use
| something more formal. I find it personally difficult to calm my
| pearl-clutching perfectionist tendencies to embrace that approach
| but I can't deny it does work :)
| jsheard wrote:
| Lifehack: instead of admitting you are rebooting the server
| every 10 minutes to clear the memory leaks, call it a "phased
| arena allocation strategy" and then it's fine.
| projektfu wrote:
| Oddly, Apache uses a pool allocator[1], so it is using
| essentially the same strategy as Rasmus already.
|
| 1. or it did 20 years ago, i might be out of date :)
| invisitor wrote:
| I didn't read it all but I noticed I enjoyed the way you write. I
| don't know if it's the emojis or the overall formatting you use.
| smallstepforman wrote:
| I just don't understand the fear with manual memory
| management. With RAII and simple diligence (clear ownership
| rules), managing memory is an easy engineering task. I
| actually find it *more* challenging to deal with frameworks
| that insist on reference counting and shared pointers, since
| ownership is now obscure.
|
| I create it, I free it. I transfer it, I no longer care.
| It's part of engineering discipline. Memory bugs are no
| worse than logic bugs; we fix the logic bugs, so it makes
| sense to fix the memory bugs. Disclaimer: I do embedded
| complex systems that run 24/7.
|
| We do the same for OS resources (handles, sockets, etc.) and
| don't use automatic resource managers; we do it manually. So
| why complicate the design with automatic memory management?
| yosefk wrote:
| 35% of vulnerabilities in the biggest tech companies being due
| to use after free bugs are a part of the answer. (More than 90%
| of severe vulnerabilities are due to memory bugs impossible in
| memory-safe languages.)
| samatman wrote:
| Every memory-safe language has a runtime written in a memory-
| managed language. Yes, even Rust: Rust is implemented using
| many (well-vetted) unsafe blocks.
|
| So projects to improve the quality, and lower the defect
| rate, in memory-managed programming, are far from wasted.
| Even if they only get used to write fast garbage collectors
| so that line coders can get on with the work of delivering
| value to customers.
| loeg wrote:
| Manual memory management vs GC is orthogonal to memory safe
| vs unsafe.
| AgentME wrote:
| In practice the lines of each issue are placed very close
| together. Other than Rust, there are no popular memory safe
| languages that use manual memory management.
| mrkeen wrote:
| > We do the same for OS resources (handles, sockets, etc) and
| don't use automatic resource managers
|
| 1) Modern languages have made inroads here.
|
| 2) The OS is my automatic resource manager. When I hit ctrl-c
| on my running C program, my free() is never hit, yet it is
| cleaned up for me.
|
| > So why complicate the design with automatic memory
| management?
|
| I don't know beforehand who is going to be reading my memory,
| how many times, in what order, or whether they're going to copy
| out of it, or hold onto pointers into it.
|
| > I just don't understand the fear with manual memory
| management. With RAII and simple diligence (clear ownership
| rules), managing memory is an easy engineering task.
|
| I claim that if managing memory is that straightforward,
| then it makes more sense to leave it to the compiler
| (Rust-style, not Java-style) rather than let a human risk
| messing it up.
| jeremyjh wrote:
| Memory leaks can be hard to track down, but overflows and use
| after free bugs can take a project weeks off schedule.
| Depending on what is overwritten and where, the effects show up
| very far from the source of the problem. For an engineering or
| product manager this is a terrifying prospect. Managed memory
| more or less solves those problems: it introduces a couple
| of others, and there is still a possibility of resource
| leaks, but these problems are both rarer and generally
| easier to pin down.
| elondaits wrote:
| I worked with manual memory management for a decade (24/7
| systems) and don't miss it. It's not per se hard, it's not
| scary, but if you're dealing with structures that may contain
| reference loops, or using an architecture based on event
| handlers that may be moving references around, you need to do
| some very careful design around memory management instead of
| just thinking of the problem domain.
| hedora wrote:
| By far, the worst memory leak I've ever had to debug involved
| a cycle like you are describing, but it was in a Java program
| (swing encourages/encouraged such leaks, and "memory leaks in
| java are impossible", so there weren't decent heap profilers
| at the time).
|
| For the last few decades, I've been writing c/c++/rust code,
| and the tooling there makes it trivial to find such things.
|
| One good approach is to use a C++ custom allocator (that
| wraps a standard allocator) that gets a reference to a call
| site specific counter (or type specific counter) at compile
| time. When an object is allocated it increments the counter.
| When deleted, it decrements.
|
| Every few minutes, it logs the top 100 allocation sites,
| sorted by object count or memory usage. At process exit,
| return an error code if any counters are non-zero.
|
| With that, people can't check in memory leaks that are
| encountered by tests.
|
| In practice, the overhead of such a thing is too low to be
| measured, so it can be on all the time. That lets it find
| leaks that only occur in customer environments.
| billjings wrote:
| But circular references don't leak in Java. You have to
| have a GC root (e.g. a static, or something in your
| runtime) somewhere pointing at the thing to actually leak
| it.
|
| There is one case where a "circular" reference can appear
| to cause a leak that I know of: WeakHashMap. But that's
| because the keys, which are indeed cleaned up at some point
| once the associated value is GC'd, are themselves strongly
| retained references.
| n4r9 wrote:
| > Memory bugs are no worse than logic bugs
|
| I guess it comes down to what you mean by "worse". It seems
| that memory bugs carry a higher risk of being completely
| invisible except in deliberately contrived situations. This
| makes them more dangerous because there's less of an incentive
| to fix something that clients will never notice.
| _gabe_ wrote:
| > It seems that memory bugs carry a higher risk of being
| completely invisible except in deliberately contrived
| situations.
|
| Do they _really_ carry a higher risk of being invisible
| though? You don't need to "contrive" situations either. My
| company just spent 6+ months trying to solve a JavaScript
| undefined symbol bug stemming from a library. When we finally
| tracked it down, it was because we were using import instead
| of require, which the documentation didn't clarify was
| important. The fix we implemented was nowhere near where we
| expected the bug, and the only reason we finally solved it
| was because we had exhausted all other options. Sounds the
| same as, if not worse than, a difficult-to-track memory bug
| to me.
| pylua wrote:
| Businesses need software to be less complex and easier to
| develop so it is cheaper.
|
| I can't imagine manual memory management on these large-
| scale projects with a high variance in developer skill and
| opinions.
| watermelon0 wrote:
| > We do the same for OS resources (handles, sockets, etc) and
| don't use automatic resource managers, we do it manually.
|
| Generally, in most languages, file handles and sockets are
| automatically closed, when the variable holding them gets out
| of scope.
| inetknght wrote:
| > _I just don't understand the fear with manual memory
| management. With RAII and simple diligence (clear ownership
| rules), managing memory is an easy engineering task. I
| actually find it *more* challenging to deal with frameworks
| that insist on reference counting and shared pointers since
| ownership is now obscure._
|
| I mostly agree.
|
| With RAII, memory management is simplified. "Just" evaluate the
| lifetime of the object. Super easy if you're good at software
| design.
|
| Reference counting and shared pointers still have their niche
| use case. But I've most often seen referencing counting used as
| a crutch where reference-counting is _easy_ but designing
| around object lifetimes is more appropriate.
|
| > _Memory bugs are no worse than logic bugs_
|
| It's true. A memory bug is "just" another logic bug. Memory
| bugs lead to null pointers and wild pointers. They're just as
| dangerous as logic bugs. Memory bugs are also _far_ more
| preventable with RAII.
|
| > _We do the same for OS resources (handles, sockets, etc)
| and don't use automatic resource managers, we do it
| manually. So why complicate the design with automatic
| memory management?_
|
| I'm just going to point out that I _don't_ do manual
| allocations for OS resources. I wrap OS resources into RAII
| objects. I've made lots (!) of custom RAII wrappers for OS
| objects and library objects. It's trivial to do and saves a
| _feckton_ of headaches.
|
| C++ has std::fstream. I'm not saying it's great. But it's
| definitely RAII for files and has been around for... well I
| don't know exactly but certainly 25+ years.
| xedrac wrote:
| RAII was a huge improvement over C and I was shocked to see
| Zig forego such an improvement. Rust adopted RAII, and took
| it all to the next level with no data races and a lot more
| bugs caught at compile time.
| gizmo686 wrote:
| Memory bugs are an entire class of bugs that we have simply
| solved. If you use a language with a modern garbage
| collector (e.g. one that can handle cycles), you will very
| likely go an entire project without running into a single
| memory bug. To a first approximation, these bugs were not
| replaced with other bugs; they are simply gone. Further, we
| do not ask anything more of the programmer to accomplish
| this. Instead, we need the programmer to do less work than
| they would with manual memory management.
|
| That is not to say that garbage collection is an unambiguous
| win. There are real downsides to using it. But for most
| programs, those modern garbage collectors are good enough that
| those downsides just don't matter.
| barbariangrunge wrote:
| Even game engines (including unreal) use gc these days, which
| is nuts. Still though, it's best to be careful with your
| allocations, use pooling, etc
| im3w1l wrote:
| There is one downside, though it's not inherent to garbage
| collection; it just happens to correlate in current
| languages. GC'd languages don't have RAII, and for that
| reason they actually make you do _more_ work when managing
| non-memory resources. There have been some attempts to work
| around this with e.g. Python's _with_, but in my opinion
| it's less ergonomic due to forcing indentation.
| vbezhenar wrote:
| Go's defer works like RAII and does not introduce
| indentation:
|
|     f := createFile("/tmp/defer.txt")
|     defer closeFile(f)
|     writeFile(f)
| seabrookmx wrote:
| Even when the syntax does require indentation ("with" in
| python, "using" in C#) it's still pretty clean IMO.
| david_allison wrote:
| Additionally, `using` no longer requires indentation in
| C#
|
| https://learn.microsoft.com/en-us/dotnet/csharp/language-
| ref...
| seabrookmx wrote:
| TIL!
|
| I just recently updated a few small services to C# 12 and
| there's a bunch of little niceties like this I'm finding
| (spread operator for instance).
| pjmlp wrote:
| Using also no longer requires inheritance from IDispose,
| it suffices that the type does support Dispose pattern,
| which is great when coupled with extension methods.
| HideousKojima wrote:
| Using no longer requires a new scope block in more recent
| versions of C#
| valicord wrote:
| This misses the main benefit of RAII which is that you
| can't forget to close the file.
| neonsunset wrote:
| Do they?
|
|     // Disposed when it goes out of scope
|     using var http = new HttpClient();
|     var text = await http.GetStringAsync(
|         "https://example.org/");
| mattpallissard wrote:
| > you will very likely go an entire project without running
| into a single memory bug
|
| Sure, you haven't lost the handle to the memory, but you
| can still "leak" memory with a GC. Happens all the time:
| add to a data structure in state 1, do something in state
| 2, remove the data in state 3. What happens if you never
| hit state 3?
|
| > But for most programs, those modern garbage collectors are
| good enough that those downsides just don't matter.
|
| I mostly agree with this. Although, most programs hit a point
| where you have to be aware what the GC is doing and try to
| avoid additional allocations. Which isn't very ergonomic at
| times. And is often more fucking around than manual memory
| management, just concentrated to a smaller portion of the
| code base.
| oivey wrote:
| Are you calling just allocating memory you'll never use
| leaking? It is not. It is recoverable, because for it not
| to be GC'd it has to be reachable from somewhere. A memory
| leak means you lost the pointer to the memory, so now you
| can't free it.
| arccy wrote:
| lost can include you don't know where to find it even
| though you know it exists...
| Maxatar wrote:
| >Are you calling just allocating memory you'll never use
| leaking? It is not.
|
| Yes it is, by definition a memory leak is any memory that
| has been reserved but won't be read from or written to.
| If you allocate memory into a data structure and after
| some point in time you will never read from or write to
| that memory, then that memory constitutes a leak.
| cobbal wrote:
| There are (at least) 2 definitions of memory leak.
|
| The upsides of the definition you gave are that it's
| simple, well defined, and maximally precise (nothing that
| is safe to collect is considered live)
|
| The significant downside to this definition is that it's
| uncomputable. To know if memory is used requires knowing
| if a loop halts.
|
| The second definition of memory leak is "unreachability"
| which is a bit harder to nail down. It's a conservative
| approximation of the first definition, but is more
| popular because it's computable, and it's practical to
| write programs with or without GC that don't leak.
| loeg wrote:
| The latter definition is not particularly useful for
| writing programs that run on hardware with finite memory.
| End users don't care whether or not the allocations are
| reachable when your program uses all the memory in the
| system and crowds out other programs, slows, and/or
| crashes.
| jrockway wrote:
| I'm OK with people using the expression "memory leak" to
| mean "unbounded growth in heap size over time". The case
| that I see most frequently is a routine that's set up
| like "use some memory, wait until an event that never
| happens happens, free memory". Technically if you let
| time run indefinitely, the memory would be freed. But
| since memory is finite, eventually you run out and your
| program crashes, which is annoying in this case.
| loeg wrote:
| And this kind of leak is the same kind that is actually
| difficult to prevent and debug with manual allocation --
| it's present / reachable in some structure, but not
| intentionally. Say you have a cache of items -- how big
| should it grow? Can it free space under memory pressure? If
| you reuse items, do they own collections (e.g., std::vec)
| that hold on to reserved but unused space (e.g. clear()
| doesn't free memory)? These are the hard problems and they
| are approximately the same with or without GC.
| sa46 wrote:
| Joshua Bloch, of Effective Java fame, used the term
| unintentional object retention as a more precise term for
| memory "leaks" in garbage collected languages.
| mattpallissard wrote:
| Seems a bit pedantic, but so am I. I like that term.
| DowsingSpoon wrote:
| I've also heard this referred to as Abandoned Memory, or
| sometimes a Space Leak. It's a common class of bug in a
| reference counted language, like Objective-C, where
| cycles need special consideration.
| Jasper_ wrote:
| Uncollectable reference cycles are shockingly easy to make in
| JS, especially with React. A classic example:
|     function closure() {
|       var smallObject = 3;
|       var largeObject = Array(1000000);
|       function longLived() { return smallObject; }
|       function shortLived() { return largeObject; }
|       shortLived();
|       return longLived;
|     }
|
| Will keep largeObject alive.
| cobbal wrote:
| Is this a property of JavaScript or the engine running it?
| This feels like something a sufficiently-smart(tm) closure
| implementation should be able to prevent.
| Jasper_ wrote:
| This is an artifact of V8's GC. Effectively, largeObject
| and smallObject are tracked together, as a unit.
| Splitting it out into two separate records increases
| average memory usage. They keep saying they want to fix
| it eventually, but it's been this way for 10+ years at
| this point.
|
| You really do have to know the quirks of what you're
| targeting.
| kevingadd wrote:
| JS runtimes are allowed to optimize this out, IIRC, and
| will often do so.
| AgentME wrote:
| This isn't an uncollectable reference cycle. It's true that
| with this code in most/all JS engines, if there's a
| reference to the function `longLived` then `largeObject`
| will be kept in memory, but reference cycles are
| collectable in standard garbage collection systems. Both of
| the values will be garbage-collectable once no outside
| references to `longLived` still exist. Pure reference
| counting systems (Rust Rc, C++ shared_ptr, etc) are the
| kind of system that fail to automatically handle cycles.
|
| You could test this with your code by setting a
| FinalizationRegistry to log when they're both finalized,
| unset any outside reference to `longLived`, and then do
| something to force a GC (like run Node with --expose-gc and
| call `global.gc()`, or just allocate a big Uint8Array).
| patrick451 wrote:
| Performance regression is a bug and GC has a horrific
| overhead in the average program. My computer is orders of
| magnitude faster than it was 15 years ago, but it spends
| all of its time wandering around in the wilderness hunting
| for memory to free. We could have just told it.
|
| > those modern garbage collectors are good enough that those
| downsides just don't matter.
|
| This is not what I see. Every time I look at a profile of a
| program written in a GCed language, most of the time is spent
| in the garbage collector. I can't recall the last time I
| looked at a c++ profile where more than 10% of the time spent
| in new/delete. I have seen 100x speedups by disabling GC. You
| can't ship that, but it proves there is massive overhead to
| garbage collection.
| macintux wrote:
| > GC has a horrific overhead in the average program
|
| It doesn't have to be that way. The BEAM is designed for
| tiny processes, and GC is cheap.
| toast0 wrote:
| I love Erlang and BEAM, but the reason the GC is (mostly)
| cheap is because self-referential data structures are
| impossible, so you can have a very simple and yet still
| very effective GC. One heap per process also helps
| immensely.
|
| Also, when your process has a giant heap, the GC gets
| expensive. There's been lots of improvement over the
| years, but I used to have processes that could usually go
| through N messages/sec, but if they ever got a backlog,
| the throughput would drop, and that would tend towards
| more backlog and even less throughput, and pretty soon it
| would never catch up.
|
| That sometimes happens with other GC systems, but it
| never feels quite so dire.
| armchairhacker wrote:
| IMO it's not that memory management is hard, it's that
| developers are imperfect, so writing a 100% UB-free, leak-free
| program is hard. And it only takes a single oversight to cause
| havoc, be it a CVE, leak that causes a long-running program to
| slowly grow memory, or hair-pulling bug that occurs 1 in 1000
| times.
|
| Yes, logic bugs have the same issues, and even in languages
| like Java we can sometimes (albeit rarely IIRC) get memory
| leaks. But memory-safe languages are an _improvement_. Just
| like TypeScript is an improvement over JavaScript, even though
| they have the same semantics. We have automated systems that
| can decrease the amount of memory errors from 1% to 0.01%, why
| keep leak and UB prevention a manual concern? Moreover, the
| drawbacks of memory-safe languages are often not an issue: you
| can use a GC-based language like Java which is easy but has
| overhead, or an enforced-ownership language like Rust which has
| a learning curve but no overhead. Meanwhile, while logic bugs
| can be a PITA, memory bugs are particularly notorious, since
| they don't give you a clear error message or in some cases even
| cause your program to halt when they occur.
|
| Tangential: another solution that practically eliminated a
| class of bugs is formal verification. And currently we don't
| see it in any system but those where correctness is most
| important, but that's because unlike memory management the
| downsides are very large (extremely verbose, tricky code that
| must be structured a certain way or it's even more verbose and
| tricky). But if formal verification gets better I believe that
| will also start to become more mainstream.
| jandrewrogers wrote:
| The use of manual memory management increases the cognitive
| load when reasoning about software. Working memory capacity
| varies considerably between people and is a performance
| limiting factor when designing complex systems. You see
| analogues of this in other engineering disciplines too.
|
| Over my many years of working in software development I have
| come around to the idea that most software developers do not
| have sufficient working memory capacity to reason about memory
| management in addition to everything else they must reason
| about at the same time. They may know mechanically how to
| properly do manual memory management but when writing code they
| drop things if they are juggling too many things at once
| mentally. It isn't a matter of effort, there simply is a
| complexity threshold past which something has to give.
| Automatic memory management has its tradeoffs but it also
| enables some people to be effective who otherwise would not be.
|
| On the other side of that you have the minority that get manual
| memory management right every time almost effortlessly, who
| don't grok why it is so difficult for everyone else because it
| is literally easy for them. I think some people can't imagine
| that such a group of engineers exist because they are not in
| that group and their brain doesn't work that way. If you are in
| this group, automatic memory management will seem to make
| things worse for unclear benefit.
|
| I've observed this bifurcation in systems software my entire
| career. In discussions about memory safety and memory
| management, it is often erroneously assumed that the latter
| population simply doesn't exist.
| mattpallissard wrote:
| I think the rift comes from people thinking manual memory
| management is a bunch of random allocs and frees all over the
| place. That is gross and those of us who don't mind managing
| memory don't like it either.
|
| Personally my gripe is when people don't think about memory
| or space complexity at all. I don't care if it's a custom
| memory management strategy or GC'd language, you need to
| think about it. I have the same gripe about persistence,
| socket programming, and database queries.
|
| Abstractions are great, use them, love them. But
| understanding what the hell they are doing under the hood not
| only improves efficiency, it prevents bugs, and gives you a
| really solid base when reasoning about unexpected behavior.
| mirsadm wrote:
| Any professional developer should understand these things.
| It was taught at first year of my computer science degree.
| As you say it isn't particularly difficult if you have a
| strategy. If you can't manage memory then what about any
| other resource that needs to be manually closed (sockets,
| file handles etc ).
| da_chicken wrote:
| I'm not sure I understand the causal link between
| something having a proven track record of being error-
| prone and not _understanding_ it.
| prerok wrote:
| The problem is that automatic memory management means to
| some newcomers that you don't have to worry about it.
| Which is not true at all.
|
| These newcomers may have heard about it, may even
| understand it at some level, but it's not in their rote
| knowledge. GC is great if you know that every allocation
| you do is expensive and trust that it will be taken care
| of properly, because it will be short-lived.
|
| But I have seen many cases where people just don't have
| it in their consciousness why allocating huge arrays is a
| problem or why allocating a new Character object for
| every single character in a string might be bad. As soon
| as I point it out, they get it, but it's not like they
| thought of it while writing their algorithm.
| Dylan16807 wrote:
| If the automatic allocation tools are good, then they're
| doing the same thing as quality manual management but with
| much more compiler enforcement.
| pjmlp wrote:
| Which is actually the case in enterprise source code,
| touched by hundreds of offshoring companies.
| artemonster wrote:
| Oh yes! And also don't forget about the hand-holding static
| type system that won't let you fart unless you explicitly
| convince it that you will not shit your pants!
| bigstrat2003 wrote:
| If manually managing memory were in fact an easy engineering
| task, software developers wouldn't be so demonstrably bad at
| it.
| derriz wrote:
| Sure, for simple flows-of-control, GC buys you little and an
| RAII type approach is fine. But RAII really only works if
| lifecycle and ownership spans are reflected in the lexical
| structure of the code base. RAII relies on creation and cleanup
| happening within a block/lexical scope.
|
| Unfortunately in the real C++ codebases I've had to work with,
| the flow of control is never so simple. Any correspondence
| between lexical scope and lifecycle is broken if the code uses
| exceptions, callbacks, any type of async, threads, etc.
| tlarkworthy wrote:
| ... or a persistent data-structure [1].
|
| [1] https://en.wikipedia.org/wiki/Persistent_data_structure
| pjmlp wrote:
| It doesn't scale in large teams of various skill levels.
| SomeoneFromCA wrote:
| "Much has been written about various tools for profiling a leak,
| understanding heap dumps, common causes of leaks".
|
| Eww.. leaks and heap dumps. Someone needs a healthier diet.
| gavinhoward wrote:
| Valgrind makes finding leaks so easy in C.
|
| Fixing them is harder, but it's usually easy if your design is
| right. I usually allocate and free in the same function unless
| the function is meant to allocate for the caller. (And then that
| call is considered allocation in the caller.)
| canucker2016 wrote:
| It's the "repro the bug" that's hard...
|
| When I worked on static analysis of codebases, the error
| handling codepath was the most likely source of problems.
| gavinhoward wrote:
| Yes.
|
| To do that, I fuzz like crazy, and every path becomes a test,
| even if it means a lot of manual work checking each one. That
| alone exercises tons of error paths.
|
| To cover 99% of the rest, I use SQLite's error injection [1]
| on all tests. Just doing error injection on allocation gets
| you 90% of the way there.
|
| [1]: https://sqlite.org/testing.html#anomaly_testing
| 1024core wrote:
| Reminds me of this story I heard about Yahoo. Their ads server
| had a memory leak and it would OOM after something like 10000
| requests.
|
| Their solution: restart the server after 8000 requests.
|
| This worked for a year or two. And then it started OOM-ing after
| 8000 requests.
|
| Next solution: restart the server after 6000 requests.
| xandrius wrote:
| If that gives you extra time to move the problem to the future,
| I'd say that's a win :D
| otabdeveloper4 wrote:
| > Their solution: restart the server after 8000 requests.
|
| 8000 requests is something like 500 milliseconds for the
| average ad server.
|
| You need exceptionally fast restarts for that to work.
| wging wrote:
| Perhaps now, but the story is about Yahoo, which means it
| could be from the early 2000s or late 90s. Traffic volumes
| were probably lower, computers were definitely slower,
| internet advertising was not as big as it is now, etc.
| toast0 wrote:
| This is in the context of (y)Apache. You set
| MaxRequestsPerChild, so when the request limit is hit, the
| child is killed, a new one started, and requests can be
| served by other children until the replacement is ready. In a
| pure config, idle children block on accept, so the kernel
| does all the coordination work required to get requests to
| children, the child exits after hitting the cap, and the
| parent catches the exit and spawns another. As long as all
| the children don't hit their cap at once, there's no gap in
| service. If they do all hit the cap at once, it's still not
| so bad.
|
| I don't know about ads, but on Yahoo Travel, I had no qualms
| about solving memory leaks that took months to show up with
| MaxRequestsPerChild 100000. I gave valgrind or something a
| whirl, but didn't find it right away and was tired of getting
| paged once a quarter for it, so...
|
| I did do some scaleout work for Shopping once, and found
| theirs set at 3. I don't remember where I left it, but much
| higher. Nobody knew why it was 3, so I took the chance that
| it would become apparent if I set it to 100, and nothing
| obvious was wrong and the servers were much happier. fork()
| is fast, but it's not fast enough to do it every 3 requests.
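As a sketch, the prefork-style directives involved look like this (values are illustrative, not Yahoo's actual configuration):

```apache
# Each child exits after serving this many requests; the parent
# forks a replacement, which bounds any per-child memory leak.
MaxRequestsPerChild 100000

# Keep enough spare children around that a recycling child never
# leaves incoming requests waiting.
MinSpareServers  5
MaxSpareServers 20
```

Setting the cap too low (as in the `3` above) makes fork/startup cost dominate; set high, it quietly turns a slow leak into a non-event.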
| seanhunter wrote:
| So one place I worked might win some sort of prize for the
| dumbest way to burn $5m with a memory leak.
|
| So back in the 90s the printer driver in Solaris had a memory
| leak[1]. At the time I was a contractor for a big bank that is
| sadly/not sadly no longer with us any more. This was when the
| status of faxes in confirming contracts hadn't been sufficiently
| tested in court so banks used to book trades by fax and the
| system that sent the fax would also send a document to a
| particular printer which would print the trade confirm out and
| there was some poor sap whose job consisted of picking up the
| confirms from this printer, calling the counterparty and reading
| them out so that they were all on tape[2] and legally confirmed.
|
| Anyhow one day the memory leak caused the printer driver to fall
| over and fail to print out one of the confirms so the person
| didn't read it out on the telephone. The market moved
| substantially and the counterparty DK'd the trade[3]. A lot of
| huffing and puffing by senior executives at the bank had no
| effect and they booked a $5m loss and made a policy to never
| trade with that particular bank again[4]. The fax printer job was
| moved to windows NT.
|
| [1] According to the excellent "Expert C programming" this was
| eventually fixed because Scott McNealy, then CEO of Sun
| Microsystems, had been given a very underpowered workstation
| (as CEO) so was affected a lot by this problem and eventually made
| enough of a fuss the devs got around to fixing it
| https://progforperf.github.io/Expert_C_Programming.pdf
|
| [2] Calls in the securities division of banks are pretty much
| always recorded for legal and compliance reasons
|
| [3] DK stands for "Don't know". If the other side says they
| "don't know" a trade they are disputing the fact that a contract
| was made.
|
| [4] Which I'm sure hurt us more than it hurt them as they could
| just trade somewhere else and pay some other bank their
| commission
| fbdab103 wrote:
| >...they booked a $5m loss and made a policy to never trade
| with that particular bank again
|
| Maybe I am too cynical, but would many businesses retroactively
| agree to a deal which would cost them a ton of money? If the
| process requires i's dotted, t's crossed, and a phone call
| confirmation which was never placed, why eat the loss when
| the other party should own the error?
|
| Citi just had a lawsuit because of paying back a loan too
| quickly. I expect everyone in finance to play hard ball on
| written agreements when it works in their favor.
| seanhunter wrote:
| Sometimes it happens. Particularly in big US old school
| broker-dealers, "Dictum Meum Pactum"[1] is something some
| people take very seriously especially since you will have a
| fruitful (if adversarial) working relationship over many
| years and may need someone to do you a personal favour in the
| future (eg giving you a job etc).
|
| For example I know of one US investment bank where a very
| large options position was "booked" by a trader using a
| spreadsheet rather than in the official booking system which
| meant that the normal "exercise and expiry" alerts didn't go
| off to warn people when the trade was about to expire. The
| trader in question went on holiday and as a result the trade
| expired more than a billion dollars[2] in the money. The CEO
| of the bank called up the counterparty and successfully
| persuaded them to honour the trade and pay up even though it
| had actually expired and everyone knew there was no legal
| obligation. As it was explained to me at the time, the
| counterparty had probably hedged the trade so was not
| scratching around the sofa trying to find the money.
|
| [1] "My word is my bond"
|
| [2] Yes. With a b.
___________________________________________________________________
(page generated 2024-05-11 23:00 UTC)