[HN Gopher] So We've Got a Memory Leak
       ___________________________________________________________________
        
       So We've Got a Memory Leak
        
       Author : todsacerdoti
       Score  : 169 points
        Date   : 2024-05-10 05:14 UTC (1 day ago)
        
 (HTM) web link (stevenharman.net)
 (TXT) w3m dump (stevenharman.net)
        
       | DonHopkins wrote:
       | "I'm not a real programmer. I throw together things until it
       | works then I move on. The real programmers will say 'Yeah it
       | works but you're leaking memory everywhere. Perhaps we should fix
       | that.' I'll just restart Apache every 10 requests." -Rasmus
       | Lerdorf, PHP Non-Designer
       | 
       | https://en.wikiquote.org/wiki/Rasmus_Lerdorf
        
         | fragmede wrote:
         | That explains PHP
        
           | DonHopkins wrote:
           | https://news.ycombinator.com/item?id=40256878
           | 
           | >It's not "supposed" to be that way.
           | 
           | >It just happened to end up that way because Rasmus Lerdorf
            | just doesn't give a shit. ¯\\_(ツ)_/¯
           | 
           | [...]
        
           | im3w1l wrote:
            | There is a scene in Pirates of the Caribbean that I think of
            | a lot. "You are without a doubt the worst pirate I have ever
            | heard of." "Ah, but you have heard of me."
           | 
           | He kept the scope down. He shipped. It was hugely successful.
           | In the end it was overtaken and rightly so, but that doesn't
           | invalidate the success it had.
        
             | DonHopkins wrote:
             | But it doesn't justify the arrogance.
             | 
             | "For all the folks getting excited about my quotes. Here is
             | another - Yes, I am a terrible coder, but I am probably
             | still better than you :)" -Rasmus Lerdorf
             | 
             | OR the continued negligence.
             | 
             | https://news.ycombinator.com/item?id=40256878
             | 
             | And who remembers how careless, reckless, and blithe he was
             | with the PHP 5.3.7 release he didn't bother to test because
             | running tests was too much of a hassle because there were
             | already so many test failures that wading through them all
             | to see if there were any new ones was just too much to ask
             | of him, the leader of the widely used project, in charge of
             | cutting releases?
             | 
             | >5.3.7 upgrade warning: [22-Aug-2011] Due to unfortunate
             | issues with 5.3.7 (see bug#55439) users should postpone
             | upgrading until 5.3.8 is released (expected in a few days).
             | 
             | No seriously, he's literally as careless as he claims to be
             | (when he says that repeatedly, you should believe him!),
             | and his lack of giving a shit about things like tests and
             | encryption and security that are extremely important has
             | caused actual serious security problems, like breaking
             | crypt() by checking in sloppy buggy code that would have
             | caused a unit test to fail, but without bothering to run
             | the unit tests (because so many of them failed anyway, so
             | who cares??), and then MAKING A RELEASE of PHP 5.3.7 with,
             | OF ALL THINGS, a broken untested crypt()!
             | 
             | http://i.imgur.com/cAvSr.jpg
             | 
              | Do you think that's just his sense of humor, a self-
              | deprecating joke, breaking and then releasing crypt()
              | without testing, that's funny in some context? What
              | context would that be? Do you just laugh and shrug it off
              | with "Let Rasmus be Rasmus!"?
             | 
             | https://www.reddit.com/r/programming/comments/jsudd/you_see
             | _...
             | 
             | >r314434 (rasmus): Make static analyzers happy
             | 
             | >r315218 (stas): Unbreak crypt() (fix bug #55439) # If you
             | want to remove static analyser messages, be my guest, but
             | please run unit tests after
             | 
             | http://svn.php.net/viewvc/php/php-
             | src/trunk/ext/standard/php...
             | 
             | https://plus.google.com/113641248237520845183/posts/g68d9Rv
             | R... [broken link]
             | 
             | >Rasmus Lerdorf
             | 
             | >+Lorenz H.-S. We do. See http://gcov.php.net
             | 
             | >You can see the code coverage, test case failures,
             | Valgrind reports and more for each branch.
             | 
             | >The crypt change did trigger a test to fail, we just went
             | a bit too fast with the release and didn't notice the
             | failure. This is mostly because we have too many test
             | failures which is primarily caused by us adding tests for
             | bug reports before actually fixing the bug. I still like
             | the practice of adding test cases for bugs and then working
             | towards making the tests pass, however for some of these
             | non-critical bugs that are taking a while to change we
             | should probably switch them to XFAIL (expected fail) so
             | they don't clutter up the test failure output and thus
             | making it harder to spot new failures like this crypt one.
             | 
             | And don't even get me started about
             | mysql_real_escape_string! It has the word "real" in it. I
             | mean, come on, who would ever name a function "real", and
             | why?
             | 
             | That implies the existence of a not-so-real mysql escape
             | string function. Why didn't they simply FIX the gaping
             | security hole in the not-so-real mysql escape string
             | function, instead of maintaining one that was real that you
             | should use, and one that was not so real that you should
             | definitely not use, in the name of backwards compatibility?
             | 
              | Or were there actually people out there using the non-real
              | mysql escape string function, and they didn't want to
              | ruffle those people's feathers by forcing them to fix code
              | with a security hole so big you could fly a space shuttle
              | through it?
             | 
             | The name of the function "mysql_real_escape_string" says
             | all you need to know about the culture and carelessness and
             | lack of security consciousness of the PHP community.
             | 
             | Melania Trump's "I REALLY DON'T CARE DO U?" nihilistic
             | fashion statement sums up Rasmus Lerdorf's and the PHP
             | community's attitude towards security, software quality,
             | programming, standards, computer science, and unit tests.
             | 
              | ¯\\_(ツ)_/¯
             | 
             | https://www.youtube.com/watch?v=l5imY2oQauE
        
         | pavlov wrote:
         | Never calling free() is a valid memory management strategy if
         | you know your process lifetime exactly.
        
           | Hamuko wrote:
           | Oldie but a goodie: https://devblogs.microsoft.com/oldnewthin
           | g/20180228-00/?p=98...
        
           | alternatex wrote:
           | I think the complaint is that there was no strategy. The dude
           | just wanted to build some websites.
        
           | rwmj wrote:
           | I tried to do this in a real program once. The program only
           | ever runs for a fraction of a second and then exits, so it
            | seemed like a good candidate. Unfortunately, what happened
            | later was that our company started running tools like
            | Coverity, which complained about leaked memory. The path of
            | least resistance was to fix that by adding free()'s
            | everywhere.
        
             | matja wrote:
              | And then you accidentally add a use-after-free bug by
              | trying
             | to satisfy a "correctness" tool...
        
               | SubjectToChange wrote:
               | It's hard to fault the static analysis tooling here. Not
               | freeing memory is, almost always, unintended behavior.
               | Besides, if a business is going to blindly apply static
               | analysis to their projects and then demand that every
               | warning be "fixed", then the tool really isn't the
               | problem.
               | 
               | Still, any serious developer should, at the very least,
               | have the patience to interrogate their code.
        
               | nkrisc wrote:
               | The tool doesn't demand it be satisfied, a manager does.
        
               | jmb99 wrote:
               | Coverity will definitely warn you about use-after-free.
               | It's not a "correctness" tool, it's a static analyzer and
               | probably the best one out there (imo). Yes in this use
               | case it's probably not too important to care about, but
               | really any code base of importance should be run through
               | it on a fairly regular basis.
        
             | adrianN wrote:
             | How much use after free and double free did you have to
             | fix?
        
             | samatman wrote:
             | Worth pointing out that in Zig, you wouldn't have had this
             | problem in the first place. The natural way to write a
             | program of that nature is to create an arena allocator,
             | immediately defer arena.deinit() on the next line, and then
             | allocate to your heart's content. When the function
             | returns, on an error or otherwise, the arena is freed.
             | 
             | No need to go back later and add a bunch of free(), because
             | you correctly implemented the memory policy as a matter of
             | course, in two lines.
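              | 
              | A rough C++ analogue of that arena pattern, as a sketch
              | only (this is not the commenter's Zig, just the same shape
              | using std::pmr):
              | 
              |     #include <memory_resource>
              |     #include <string>
              |     #include <vector>
              |     
              |     // Sketch: all allocations come from one arena and are
              |     // released together when it goes out of scope, on a
              |     // normal return or on an exception.
              |     void process_request() {
              |         std::pmr::monotonic_buffer_resource arena;
              |         std::pmr::vector<std::pmr::string> lines{&arena};
              |         lines.emplace_back("allocate");
              |         lines.emplace_back("to your heart's content");
              |         // ... use `lines` ...
              |     }   // arena destructor frees everything in one shot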
        
           | iveqy wrote:
            | Git actually uses that approach, which means that libgit is
            | pretty useless to embed; nobody does that anyway, since it's
            | GPL, and everyone uses libgit2 instead.
        
           | mhh__ wrote:
           | Don't do it though.
           | 
            | Unless you know you will only ever allocate 100x less memory
            | than the average user has, it will bite you.
            | 
            | 1. Your code is now fragile (in the Taleb sense).
            | 
            | 2. Your code is now unusable when someone wants to use it as
            | a library in the future.
            | 
            | 3. You are now prone to memory fragmentation (many use a
            | bump-the-pointer allocator when not freeing).
            | 
            | 4. You encourage people to not care about freeing anything
            | -- when you do turn free on, or turn a GC on, you might
            | struggle to actually free anything because of references all
            | over the place.
        
             | MaxBarraclough wrote:
             | In certain circumstances it can be a good move. It might
             | significantly improve real world performance, for instance.
             | 
             | Walter did this in the DMD compiler years ago and it gave a
             | drastic performance boost. [0]
             | 
             | > You are now prone to memory fragmentation (many use a
             | bump the pointer allocator when not freeing)
             | 
             | Pointer-bump allocation is immune to fragmentation by
             | definition, no?
             | 
             | Keeping dead objects around might lead to caches filled
             | with mostly 'dead' data though.
             | 
             | [0] https://web.archive.org/web/20190126213344/https://www.
             | drdob...
        
               | mhh__ wrote:
                | Walter doing that is why compiling my work project uses
                | 70 gigs of RAM and is slow - because nothing stays in
                | the cache, because everything gets copied so much
                | (because copies are cheap, right?)
                | 
                | Anecdotally, people have told me that it's faster with a
                | modern malloc impl anyway; I haven't tried it properly.
               | 
               | > Pointer-bump allocation is immune to fragmentation by
               | definition, no?
               | 
                | In some sense yes, but what I mean is that like should
                | end up near like, whereas if you have everything going
                | through a single bumping allocator you can end up with a
                | pattern of ABABAB, where a proper allocator would do
                | AAABBB, which the cache can actually use.
        
               | MaxBarraclough wrote:
               | > Walter doing that is why compiling my work project uses
               | 70 gigs of ram and is slow - because nothing stays in the
               | cache
               | 
               | This sounds like guesswork. Do you know the details of
               | DMD's internals? Have you done profiling to confirm poor
               | cache behaviour?
               | 
               | Again, per the article I linked, when Walter made the
               | change there was a drastic _improvement_ in performance.
               | 
               | > everything gets copied so much (because copies are
               | cheap right?)
               | 
               | I'm not sure what copying you're referring to here.
               | Declining to call _free_ has nothing to do with needless
               | copy operations.
               | 
               | > like should end up near like whereas if you have
               | everything going through a single bumping allocator then
               | you can have a pattern of ABABAB where a proper allocator
               | would do AAABBB which the cache can actually use.
               | 
                | A general purpose allocator like _malloc_ can't segment
                | the allocations for different types/purposes, it only
                | has the allocation size to go on. That's true whether or
                | not you ever call _free_. If you want to manage memory
                | using purpose-specific pools, you'd need to do that
                | yourself.
                | 
                | As for whether this really would improve cache
                | performance, I imagine it's possible, which is why we
                | have discussions about structure-of-arrays vs array-of-
                | structures, and the _entity component system_ pattern
                | used in gamedev. I'd be surprised if compiler code could
                | be significantly accelerated with that kind of
                | reworking, though.
        
               | mhh__ wrote:
               | I know the internals of DMD extremely well.
        
               | mhh__ wrote:
               | > I'm not sure what copying you're referring to here.
               | Declining to call free has nothing to do with needless
               | copy operations
               | 
               | Think.
               | 
               | You make malloc feel inexpensive both by using a crappy
               | but fast allocator and disabling free. People
               | (reflexively) know this, and then they don't get punished
               | when they get extremely careless with their allocations
               | because test suites never allocate enough memory to OOM
               | the machine.
               | 
               | This isn't some hypothetical, I have measured the amount
               | of memory dmd ever actually writes to (i.e. after
               | allocation) to be absurdly low. Like single digits.
               | 
               | Pretty much every bit of semantic analysis that hasn't
               | been optimized post-facto involves copying hundreds to
               | thousands of bytes.
               | 
               | > A general purpose allocator like malloc can't segment
               | the allocations for different types/purposes, it only has
               | the allocation size to go on. That's true whether or not
               | you ever call free.
               | 
               | The size is what I'm getting at, but a D allocator can do
               | this because it gets given type information. (dmd
               | allocates mainly with `new` which is then forwarded to
               | the bump the pointer allocator if the GC is not un-
               | disabled)
               | 
               | Also evidence that the exact scheme dmd uses is
               | suboptimal wrt allocator impl:
               | 
               | https://forum.dlang.org/thread/zmknwhsidfigzsiqcibs@forum
               | .dl...
        
             | dur-randir wrote:
              | Instagram ran for about a year with GC disabled just fine,
              | citing 10% better memory utilisation. While you might be
              | right for library code, at the application level it
              | sometimes makes sense.
        
           | leni536 wrote:
           | free() doing nothing can be a valid application-level
           | strategy. Not calling free() can bite you down the line.
        
           | lanstin wrote:
            | CGI. But once you lose control of your memory in a large
            | code base it is practically impossible to regain it. So you
            | can't move to long-lived server processes when you decide to
           | enter the more modern era.
        
       | chubs wrote:
       | When I was a rails developer, the 'done thing' was to simply
       | throw hardware at issues like these as an acceptable tradeoff for
       | productivity. If you cared about this sort of thing, you'd use
       | something more formal. I find it personally difficult to calm my
       | pearl-clutching perfectionist tendencies to embrace that approach
       | but I can't deny it does work :)
        
         | jsheard wrote:
         | Lifehack: instead of admitting you are rebooting the server
         | every 10 minutes to clear the memory leaks, call it a "phased
         | arena allocation strategy" and then it's fine.
        
           | projektfu wrote:
           | Oddly, Apache uses a pool allocator[1], so it is using
           | essentially the same strategy as Rasmus already.
           | 
            | 1. Or it did 20 years ago; I might be out of date :)
        
       | invisitor wrote:
       | I didn't read it all but I noticed I enjoyed the way you write. I
       | don't know if it's the emojis or the overall formatting you use.
        
       | smallstepforman wrote:
        | I just don't understand the fear of manual memory management.
        | With RAII and simple diligence (clear ownership rules),
        | managing memory is an easy engineering task. I actually find it
        | *more* challenging to deal with frameworks that insist on
        | reference counting and shared pointers, since ownership is now
        | obscure.
        | 
        | I create it, I free it. I transfer it, I no longer care. It's
        | part of engineering discipline. Memory bugs are no worse than
        | logic bugs; we fix the logic bugs, so it makes sense to fix the
        | memory bugs. Disclaimer: I do complex embedded systems that run
        | 24/7.
        | 
        | We do the same for OS resources (handles, sockets, etc.) and
        | don't use automatic resource managers; we do it manually. So why
        | complicate the design with automatic memory management?
        
         | yosefk wrote:
          | 35% of vulnerabilities at the biggest tech companies being due
          | to use-after-free bugs is part of the answer. (More than 90%
          | of severe vulnerabilities are due to memory bugs that are
          | impossible in memory-safe languages.)
        
           | samatman wrote:
           | Every memory-safe language has a runtime written in a memory-
           | managed language. Yes, even Rust: Rust is implemented using
           | many (well-vetted) unsafe blocks.
           | 
            | So projects to improve the quality and lower the defect
            | rate of memory-managed programming are far from wasted.
           | Even if they only get used to write fast garbage collectors
           | so that line coders can get on with the work of delivering
           | value to customers.
        
           | loeg wrote:
           | Manual memory management vs GC is orthogonal to memory safe
           | vs unsafe.
        
             | AgentME wrote:
              | In practice the two distinctions line up almost exactly.
              | Other than Rust, there are no popular memory-safe
              | languages that use manual memory management.
        
         | mrkeen wrote:
          | > We do the same for OS resources (handles, sockets, etc.) and
          | don't use automatic resource managers
         | 
         | 1) Modern languages have made inroads here.
         | 
         | 2) The OS is my automatic resource manager. When I hit ctrl-c
         | on my running C program, my free() is never hit, yet it is
         | cleaned up for me.
         | 
         | > So why complicate the design with automatic memory
         | management?
         | 
         | I don't know beforehand who is going to be reading my memory,
         | how many times, in what order, or whether they're going to copy
         | out of it, or hold onto pointers into it.
         | 
          | > I just don't understand the fear of manual memory
          | management. With RAII and simple diligence (clear ownership
          | rules), managing memory is an easy engineering task.
         | 
          | I claim that if managing memory is that straightforward, then
          | it makes more sense to leave it to the compiler (Rust-style,
          | not Java-style) rather than let a human risk messing it up.
        
         | jeremyjh wrote:
          | Memory leaks can be hard to track down, but overflows and
          | use-after-free bugs can take a project weeks off schedule.
          | Depending on what is overwritten and where, the effects show up
          | very far from the source of the problem. For an engineering or
          | product manager this is a terrifying prospect. Managed memory
          | more or less solves those problems - it introduces a couple of
          | others, and there is still the possibility of resource leaks,
          | but these problems are both rarer and generally easier to pin
          | down.
        
         | elondaits wrote:
         | I worked with manual memory management for a decade (24/7
          | systems) and don't miss it. It's not per se hard, it's not
         | scary, but if you're dealing with structures that may contain
         | reference loops, or using an architecture based on event
         | handlers that may be moving references around, you need to do
         | some very careful design around memory management instead of
         | just thinking of the problem domain.
        
           | hedora wrote:
           | By far, the worst memory leak I've ever had to debug involved
           | a cycle like you are describing, but it was in a Java program
            | (Swing encourages/encouraged such leaks, and "memory leaks
            | in Java are impossible", so there weren't decent heap
            | profilers at the time).
            | 
            | For the last few decades, I've been writing C/C++/Rust code,
            | and the tooling there makes it trivial to find such things.
           | 
           | One good approach is to use a C++ custom allocator (that
           | wraps a standard allocator) that gets a reference to a call
           | site specific counter (or type specific counter) at compile
           | time. When an object is allocated it increments the counter.
           | When deleted, it decrements.
           | 
           | Every few minutes, it logs the top 100 allocation sites,
           | sorted by object count or memory usage. At process exit,
           | return an error code if any counters are non-zero.
           | 
           | With that, people can't check in memory leaks that are
           | encountered by tests.
           | 
           | In practice, the overhead of such a thing is too low to be
           | measured, so it can be on all the time. That lets it find
           | leaks that only occur in customer environments.
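            | 
            | A minimal sketch of that counting-allocator idea; the names
            | and the reporting are made up for illustration, not the
            | poster's actual code:
            | 
            |     #include <atomic>
            |     #include <cstdio>
            |     #include <memory>
            |     #include <vector>
            |     
            |     // One counter per call site (or per type).
            |     struct SiteCounter {
            |         const char* name;
            |         std::atomic<long> live{0};
            |     };
            |     
            |     // Wraps std::allocator; bumps the site's counter on
            |     // allocate and drops it on deallocate.
            |     template <class T>
            |     struct CountingAllocator {
            |         using value_type = T;
            |         SiteCounter* site;
            |         explicit CountingAllocator(SiteCounter* s)
            |             : site(s) {}
            |         template <class U>
            |         CountingAllocator(const CountingAllocator<U>& o)
            |             : site(o.site) {}
            |         T* allocate(std::size_t n) {
            |             site->live += (long)n;
            |             return std::allocator<T>{}.allocate(n);
            |         }
            |         void deallocate(T* p, std::size_t n) {
            |             site->live -= (long)n;
            |             std::allocator<T>{}.deallocate(p, n);
            |         }
            |     };
            |     template <class T, class U>
            |     bool operator==(const CountingAllocator<T>& a,
            |                     const CountingAllocator<U>& b) {
            |         return a.site == b.site;
            |     }
            |     template <class T, class U>
            |     bool operator!=(const CountingAllocator<T>& a,
            |                     const CountingAllocator<U>& b) {
            |         return !(a == b);
            |     }
            |     
            |     static SiteCounter g_bufs{"request_buffers"};
            |     
            |     int main() {
            |         {
            |             std::vector<int, CountingAllocator<int>>
            |                 v{CountingAllocator<int>(&g_bufs)};
            |             v.resize(1024);  // counter goes up...
            |         }                    // ...and back down here
            |         // At exit: a nonzero counter means something
            |         // allocated at that site was never freed.
            |         if (g_bufs.live != 0) {
            |             std::fprintf(stderr, "leak at %s: %ld live\n",
            |                          g_bufs.name, g_bufs.live.load());
            |             return 1;
            |         }
            |         return 0;
            |     }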
        
             | billjings wrote:
             | But circular references don't leak in Java. You have to
             | have a GC root (e.g. a static, or something in your
             | runtime) somewhere pointing at the thing to actually leak
             | it.
             | 
             | There is one case where a "circular" reference can appear
             | to cause a leak that I know of: WeakHashMap. But that's
             | because the keys, which are indeed cleaned up at some point
             | once the associated value is GC'd, are themselves strongly
             | retained references.
        
         | n4r9 wrote:
         | > Memory bugs are no worse than logic bugs
         | 
         | I guess it comes down to what you mean by "worse". It seems
         | that memory bugs carry a higher risk of being completely
         | invisible except in deliberately contrived situations. This
         | makes them more dangerous because there's less of an incentive
         | to fix something that clients will never notice.
        
           | _gabe_ wrote:
           | > It seems that memory bugs carry a higher risk of being
           | completely invisible except in deliberately contrived
           | situations.
           | 
           | Do they _really_ carry a higher risk of being invisible
           | though? You don't need to "contrive" situations either. My
           | company just spent 6+ months trying to solve a JavaScript
           | undefined symbol bug stemming from a library. When we finally
           | tracked it down, it was because we were using import instead
           | of require, which the documentation didn't clarify was
           | important. The fix we implemented was nowhere near where we
           | expected the bug, and the only reason we finally solved it
            | was because we had exhausted all other options. Sounds the
            | same as, if not worse than, a difficult-to-track memory bug
            | to me.
        
         | pylua wrote:
         | Businesses need software to be less complex and easier to
         | develop so it is cheaper.
         | 
          | I can't imagine manual memory management on these large-scale
          | projects with a high variance of developer skill and opinions.
        
         | watermelon0 wrote:
          | > We do the same for OS resources (handles, sockets, etc.) and
          | don't use automatic resource managers, we do it manually.
          | 
          | Generally, in most languages, file handles and sockets are
          | automatically closed when the variable holding them goes out
          | of scope.
        
         | inetknght wrote:
          | > _I just don't understand the fear of manual memory
          | management. With RAII and simple diligence (clear ownership
          | rules), managing memory is an easy engineering task. I
          | actually find it *more* challenging to deal with frameworks
          | that insist on reference counting and shared pointers, since
          | ownership is now obscure._
         | 
         | I mostly agree.
         | 
         | With RAII, memory management is simplified. "Just" evaluate the
         | lifetime of the object. Super easy if you're good at software
         | design.
         | 
          | Reference counting and shared pointers still have their niche
          | use cases. But I've most often seen reference counting used
          | as a crutch where reference counting is _easy_ but designing
          | around object lifetimes is more appropriate.
         | 
         | > _Memory bugs are no worse than logic bugs_
         | 
         | It's true. A memory bug is "just" another logic bug. Memory
         | bugs lead to null pointers and wild pointers. They're just as
         | dangerous as logic bugs. Memory bugs are also _far_ more
         | preventable with RAII.
         | 
          | > _We do the same for OS resources (handles, sockets, etc.)
          | and don't use automatic resource managers, we do it manually.
          | So why complicate the design with automatic memory
          | management?_
         | 
          | I'm just going to point out that I _don't_ do manual
         | allocations for OS resources. I wrap OS resources into RAII
         | objects. I've made lots (!) of custom RAII wrappers for OS
         | objects and library objects. It's trivial to do and saves a
         | _feckton_ of headaches.
         | 
         | C++ has std::fstream. I'm not saying it's great. But it's
         | definitely RAII for files and has been around for... well I
         | don't know exactly but certainly 25+ years.
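          | 
          | A minimal sketch of that kind of wrapper, here for a POSIX
          | file descriptor (class name and usage are illustrative, not
          | any particular library's API):
          | 
          |     #include <fcntl.h>
          |     #include <unistd.h>
          |     #include <stdexcept>
          |     #include <utility>
          |     
          |     class UniqueFd {
          |     public:
          |         UniqueFd(const char* path, int flags)
          |             : fd_(::open(path, flags)) {
          |             if (fd_ < 0)
          |                 throw std::runtime_error("open failed");
          |         }
          |         ~UniqueFd() { if (fd_ >= 0) ::close(fd_); }
          |         UniqueFd(const UniqueFd&) = delete;  // sole owner
          |         UniqueFd& operator=(const UniqueFd&) = delete;
          |         UniqueFd(UniqueFd&& o) noexcept
          |             : fd_(std::exchange(o.fd_, -1)) {}
          |         int get() const { return fd_; }
          |     private:
          |         int fd_;
          |     };
          |     
          |     // The descriptor is closed on every exit path, including
          |     // exceptions, with no manual cleanup at the call site:
          |     // UniqueFd cfg("/etc/myapp.conf", O_RDONLY);
          |     // ... ::read(cfg.get(), buf, sizeof(buf)) ...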
        
           | xedrac wrote:
            | RAII was a huge improvement over C and I was shocked to see
            | Zig forgo such an improvement. Rust adopted RAII, and took
           | it all to the next level with no data races and a lot more
           | bugs caught at compile time.
        
         | gizmo686 wrote:
          | Memory bugs are an entire class of bugs that we have simply
          | solved. If you use a language with a modern garbage collector
          | (e.g. one that can handle cycles), you will very likely go an
          | entire project without running into a single memory bug. To a
          | first approximation, these bugs were not replaced with other
          | bugs; they are simply gone. Further, we do not ask anything
          | more of the programmer to accomplish this. Instead, we need
          | the programmer to do less work than they would need to do
          | with manual memory management.
         | 
          | That is not to say that garbage collection is an unambiguous
         | win. There are real downsides to using it. But for most
         | programs, those modern garbage collectors are good enough that
         | those downsides just don't matter.
        
           | barbariangrunge wrote:
           | Even game engines (including unreal) use gc these days, which
           | is nuts. Still though, it's best to be careful with your
           | allocations, use pooling, etc
        
           | im3w1l wrote:
            | There is one downside, though it's not inherent to garbage
            | collection; it just happens to correlate in current
            | languages. GC'd languages don't have RAII, and for that
            | reason they actually make you do _more_ work when managing
            | non-memory resources. There have been some attempts to work
            | around this with e.g. Python's _with_, but in my opinion
            | it's less ergonomic due to forcing indentation.
        
             | vbezhenar wrote:
              | Go's defer works like RAII and does not introduce
              | indentation.
              | 
              |     f := createFile("/tmp/defer.txt")
              |     defer closeFile(f)
              |     writeFile(f)
        
               | seabrookmx wrote:
               | Even when the syntax does require indentation ("with" in
               | python, "using" in C#) it's still pretty clean IMO.
        
               | david_allison wrote:
               | Additionally, `using` no longer requires indentation in
               | C#
               | 
               | https://learn.microsoft.com/en-us/dotnet/csharp/language-
               | ref...
        
               | seabrookmx wrote:
               | TIL!
               | 
               | I just recently updated a few small services to C# 12 and
               | there's a bunch of little niceties like this I'm finding
               | (spread operator for instance).
        
               | pjmlp wrote:
                | Using also no longer requires implementing IDisposable;
                | it suffices that the type supports the Dispose pattern,
               | which is great when coupled with extension methods.
        
               | HideousKojima wrote:
               | Using no longer requires a new scope block in more recent
               | versions of C#
        
               | valicord wrote:
               | This misses the main benefit of RAII which is that you
               | can't forget to close the file.
        
             | neonsunset wrote:
              | Do they?
              | 
              |     // Disposed when goes out of scope
              |     using var http = new HttpClient();
              |     var text = await
              |         http.GetStringAsync("https://example.org/");
        
           | mattpallissard wrote:
           | > you will very likely go an entire project without running
           | into a single memory bug
           | 
            | Sure, you haven't lost the handle to the memory, but you can
            | still "leak" memory with a GC. Happens all the time. Add to a
            | data structure at state 1, do something in state 2, remove
            | the data in state 3. What happens if you never hit state 3?
           | 
           | > But for most programs, those modern garbage collectors are
           | good enough that those downsides just don't matter.
           | 
            | I mostly agree with this. Although most programs hit a point
            | where you have to be aware of what the GC is doing and try
            | to avoid additional allocations. Which isn't very ergonomic
            | at times. And is often more fucking around than manual
            | memory management, just concentrated in a smaller portion of
            | the code base.
        
             | oivey wrote:
             | Are you calling just allocating memory you'll never use
             | leaking? It is not. It is recoverable because for it not to
                | be GC'd it has to be somehow accessible. A memory leak
             | means you lost the pointer to the memory, so now you can't
             | free it.
        
               | arccy wrote:
               | lost can include you don't know where to find it even
               | though you know it exists...
        
               | Maxatar wrote:
               | >Are you calling just allocating memory you'll never use
               | leaking? It is not.
               | 
                | Yes it is; by definition, a memory leak is any memory
                | that
               | has been reserved but won't be read from or written to.
               | If you allocate memory into a data structure and after
               | some point in time you will never read from or write to
               | that memory, then that memory constitutes a leak.
        
               | cobbal wrote:
               | There are (at least) 2 definitions of memory leak.
               | 
               | The upsides of the definition you gave are that it's
               | simple, well defined, and maximally precise (nothing that
                | is safe to collect is considered live).
               | 
               | The significant downside to this definition is that it's
               | uncomputable. To know if memory is used requires knowing
               | if a loop halts.
               | 
               | The second definition of memory leak is "unreachability"
               | which is a bit harder to nail down. It's a conservative
               | approximation of the first definition, but is more
               | popular because it's computable, and it's practical to
                | write programs, with or without GC, that don't leak.
        
               | loeg wrote:
               | The latter definition is not particularly useful for
               | writing programs that run on hardware with finite memory.
               | End users don't care whether or not the allocations are
               | reachable when your program uses all the memory in the
               | system and crowds out other programs, slows, and/or
               | crashes.
        
               | jrockway wrote:
               | I'm OK with people using the expression "memory leak" to
               | mean "unbounded growth in heap size over time". The case
               | that I see most frequently is a routine that's set up
               | like "use some memory, wait until an event that never
               | happens happens, free memory". Technically if you let
               | time run indefinitely, the memory would be freed. But
               | since memory is finite, eventually you run out and your
               | program crashes, which is annoying in this case.
        
             | loeg wrote:
             | And this kind of leak is the same kind that is actually
             | difficult to prevent and debug with manual allocation --
             | it's present / reachable in some structure, but not
             | intentionally. Say you have a cache of items -- how big
             | should it grow? Can it free space under memory pressure? If
             | you reuse items, do they own collections (e.g., std::vec)
             | that hold on to reserved but unused space (e.g. clear()
             | doesn't free memory)? These are the hard problems and they
             | are approximately the same with or without GC.
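              | 
              | A small illustration of both of those cases (a sketch; the
              | cache and the sizes are made up):
              | 
              |     #include <cstdio>
              |     #include <string>
              |     #include <unordered_map>
              |     #include <vector>
              |     
              |     // 1) A cache with no eviction policy: every distinct
              |     //    key pins its data for the life of the process,
              |     //    whether or not it is ever read again. It stays
              |     //    reachable, so neither a GC nor a leak checker
              |     //    will flag it.
              |     std::unordered_map<std::string,
              |                        std::vector<char>> cache;
              |     
              |     void handle(const std::string& key) {
              |         cache[key].resize(1 << 20);  // ~1 MiB per key
              |     }
              |     
              |     int main() {
              |         // 2) clear() destroys elements but keeps the
              |         //    reserved storage; swapping with an empty
              |         //    vector (or shrink_to_fit(), a non-binding
              |         //    request) is what actually returns it.
              |         std::vector<int> buf(1000000);
              |         buf.clear();
              |         std::printf("size=%zu capacity=%zu\n",
              |                     buf.size(), buf.capacity());
              |     }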
        
             | sa46 wrote:
             | Joshua Bloch, of Effective Java fame, used the term
             | unintentional object retention as a more precise term for
             | memory "leaks" in garbage collected languages.
        
               | mattpallissard wrote:
                | Seems a bit pedantic, but so am I. I like that term.
        
               | DowsingSpoon wrote:
               | I've also heard this referred to as Abandoned Memory, or
               | sometimes a Space Leak. It's a common class of bug in a
               | reference counted language, like Objective-C, where
               | cycles need special consideration.
        
           | Jasper_ wrote:
           | Uncollectable reference cycles are shockingly easy to make in
           | JS, especially with React. A classic example:
            | 
            |     function closure() {
            |         var smallObject = 3;
            |         var largeObject = Array(1000000);
            |     
            |         function longLived() { return smallObject; }
            |         function shortLived() { return largeObject; }
            |     
            |         shortLived();
            |         return longLived;
            |     }
           | 
           | Will keep largeObject alive.
        
             | cobbal wrote:
             | Is this a property of JavaScript or the engine running it?
             | This feels like something a sufficiently-smart(tm) closure
             | implementation should be able to prevent.
        
               | Jasper_ wrote:
               | This is an artifact of V8's GC. Effectively, largeObject
               | and smallObject are tracked together, as a unit.
               | Splitting it out into two separate records increases
               | average memory usage. They keep saying they want to fix
               | it eventually, but it's been this way for 10+ years at
               | this point.
               | 
               | You really do have to know the quirks of what you're
               | targeting.
        
             | kevingadd wrote:
             | JS runtimes are allowed to optimize this out, IIRC, and
             | will often do so.
        
             | AgentME wrote:
             | This isn't an uncollectable reference cycle. It's true that
             | with this code in most/all JS engines, if there's a
             | reference to the function `longLived` then `largeObject`
             | will be kept in memory, but reference cycles are
             | collectable in standard garbage collection systems. Both of
             | the values will be garbage-collectable once no outside
             | references to `longLived` still exist. Pure reference
             | counting systems (Rust Rc, C++ shared_ptr, etc) are the
             | kind of system that fail to automatically handle cycles.
             | 
             | You could test this with your code by setting a
             | FinalizationRegistry to log when they're both finalized,
             | unset any outside reference to `longLived`, and then do
             | something to force a GC (like run Node with --expose-gc and
             | call `global.gc()`, or just allocate a big Uint8Array).
        
           | patrick451 wrote:
            | Performance regression is a bug, and GC has a horrific
            | overhead in the average program. My computer is orders of
            | magnitude faster than it was 15 years ago but it spends all
            | of its time wandering around in the wilderness hunting for
            | memory to free. We could have just told it.
           | 
           | > those modern garbage collectors are good enough that those
           | downsides just don't matter.
           | 
           | This is not what I see. Every I look at a profile of a
           | program written in a GCed language, most of the time is spent
           | in the garbage collector. I can't recall the last time I
           | looked at a c++ profile where more than 10% of the time spent
           | in new/delete. I have seen 100x speedups by disabling GC. You
           | can't ship that, but it proves there is massive overhead to
           | garbage collection.
        
             | macintux wrote:
             | > GC has a horrific overhead in the average program
             | 
             | It doesn't have to be that way. The BEAM is designed for
             | tiny processes, and GC is cheap.
        
               | toast0 wrote:
               | I love Erlang and BEAM, but the reason the GC is (mostly)
               | cheap is because self-referential data structures are
               | impossible, so you can have a very simple and yet still
               | very effective GC. One heap per process also helps
               | immensely.
               | 
               | Also, when your process has a giant heap, the GC gets
               | expensive. There's been lots of improvement over the
               | years, but I used to have processes that could usually go
               | through N messages/sec, but if they ever got a backlog,
               | the throughput would drop, and that would tend towards
               | more backlog and even less throughput, and pretty soon it
                | would never catch up.
               | 
               | That sometimes happens with other GC systems, but it
               | never feels quite so dire.
        
         | armchairhacker wrote:
         | IMO it's not that memory management is hard, it's that
         | developers are imperfect, so writing a 100% UB-free, leak-free
          | program is hard. And it only takes a single oversight to cause
          | havoc, be it a CVE, a leak that causes a long-running program
          | to slowly grow memory, or a hair-pulling bug that occurs 1 in
          | 1000 times.
         | 
         | Yes, logic bugs have the same issues, and even in languages
         | like Java we can sometimes (albeit rarely IIRC) get memory
         | leaks. But memory-safe languages are an _improvement_. Just
         | like TypeScript is an improvement over JavaScript, even though
         | they have the same semantics. We have automated systems that
         | can decrease the amount of memory errors from 1% to 0.01%, why
         | keep leak and UB prevention a manual concern? Moreover, the
         | drawbacks of memory-safe languages are often not an issue: you
         | can use a GC-based language like Java which is easy but has
         | overhead, or an enforced-ownership language like Rust which has
         | a learning curve but no overhead. Meanwhile, while logic bugs
         | can be a PITA, memory bugs are particularly notorious, since
         | they don't give you a clear error message or in some cases even
         | cause your program to halt when they occur.
         | 
          | Tangential: another solution that practically eliminates a
          | class of bugs is formal verification. Currently we only see it
          | in systems where correctness matters most, but that's because,
          | unlike memory management, the downsides are very large
          | (extremely verbose, tricky code that must be structured a
          | certain way or it's even more verbose and tricky). But if
          | formal verification gets better I believe it will also start
          | to become more mainstream.
        
         | jandrewrogers wrote:
         | The use of manual memory management increases the cognitive
         | load when reasoning about software. Working memory capacity
         | varies considerably between people and is a performance
         | limiting factor when designing complex systems. You see
         | analogues of this in other engineering disciplines too.
         | 
         | Over my many years of working in software development I have
         | come around to the idea that most software developers do not
         | have sufficient working memory capacity to reason about memory
         | management in addition to everything else they must reason
         | about at the same time. They may know mechanically how to
         | properly do manual memory management but when writing code they
         | drop things if they are juggling too many things at once
         | mentally. It isn't a matter of effort, there simply is a
         | complexity threshold past which something has to give.
         | Automatic memory management has its tradeoffs but it also
         | enables some people to be effective who otherwise would not be.
         | 
         | On the other side of that you have the minority that get manual
         | memory management right every time almost effortlessly, who
         | don't grok why it is so difficult for everyone else because it
         | is literally easy for them. I think some people can't imagine
         | that such a group of engineers exist because they are not in
         | that group and their brain doesn't work that way. If you are in
         | this group, automatic memory management will seem to make
         | things worse for unclear benefit.
         | 
         | I've observed this bifurcation in systems software my entire
         | career. In discussions about memory safety and memory
         | management, it is often erroneously assumed that the latter
         | population simply doesn't exist.
        
           | mattpallissard wrote:
           | I think the rift comes from people thinking manual memory
           | management is a bunch of random allocs and frees all over the
           | place. That is gross and those of us who don't mind managing
           | memory don't like it either.
           | 
           | Personally my gripe is when people don't think about memory
           | or space complexity at all. I don't care if it's a custom
           | memory management strategy or GC'd language, you need to
           | think about it. I have the same gripe about persistence,
           | socket programming, and database queries.
           | 
           | Abstractions are great, use them, love them. But
           | understanding what the hell they are doing under the hood not
           | only improves efficiency, it prevents bugs, and gives you a
           | really solid base when reasoning about unexpected behavior.
        
             | mirsadm wrote:
              | Any professional developer should understand these things.
              | It was taught in the first year of my computer science
              | degree. As you say, it isn't particularly difficult if you
              | have a strategy. If you can't manage memory, then what
              | about any other resource that needs to be manually closed
              | (sockets, file handles, etc.)?
        
               | da_chicken wrote:
               | I'm not sure I understand the causal link between
               | something having a proven track record of being error-
               | prone and not _understanding_ it.
        
               | prerok wrote:
               | The problem is that automatic memory management means to
               | some newcomers that you don't have to worry about it.
               | Which is not true at all.
               | 
               | These newcomers may have heard about it, may even
               | understand it at some level, but it's not in their rote
                | knowledge. GC is great if you know that every allocation
                | you do is expensive and that it will be taken care of
                | properly, because it will be short-lived.
               | 
                | But I have seen many cases where people just don't have
                | it in their consciousness why allocating huge arrays is a
               | problem or why allocating a new Character object for
               | every single character in a string might be bad. As soon
               | as I point it out, they get it, but it's not like they
               | thought of it while writing their algorithm.
        
             | Dylan16807 wrote:
             | If the automatic allocation tools are good, then they're
             | doing the same thing as quality manual management but with
             | much more compiler enforcement.
        
             | pjmlp wrote:
             | Which is actually the case in enterprise source code,
             | touched by hundreds of offshoring companies.
        
         | artemonster wrote:
          | Oh yes! And also don't forget about the hand-holding static
          | type system that won't let you fart unless you explicitly
          | convince it that you will not shit your pants!
        
         | bigstrat2003 wrote:
         | If manually managing memory were in fact an easy engineering
         | task, software developers wouldn't be so demonstrably bad at
         | it.
        
         | derriz wrote:
          | Sure, for simple flows of control, GC buys you little and an
          | RAII-type approach is fine. But RAII really only works if
         | lifecycle and ownership spans are reflected in the lexical
         | structure of the code base. RAII relies on creation and cleanup
         | happening within a block/lexical scope.
         | 
         | Unfortunately in the real C++ codebases I've had to work with,
         | the flow of control is never so simple. Any correspondence
         | between lexical scope and lifecycle is broken if the code uses
         | exceptions, callbacks, any type of async, threads, etc.
        
           | tlarkworthy wrote:
           | ... or a persistent data-structure [1].
           | 
           | [1] https://en.wikipedia.org/wiki/Persistent_data_structure
        
         | pjmlp wrote:
         | It doesn't scale in large teams of various skill levels.
        
       | SomeoneFromCA wrote:
       | "Much has been written about various tools for profiling a leak,
       | understanding heap dumps, common causes of leaks".
       | 
       | Eww.. leaks and heap dumps. Someone needs a healthier diet.
        
       | gavinhoward wrote:
       | Valgrind makes finding leaks so easy in C.
       | 
       | Fixing them is harder, but it's usually easy if your design is
       | right. I usually allocate and free in the same function unless
       | the function is meant to allocate for the caller. (And then that
       | call is considered allocation in the caller.)
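        | 
        | A tiny sketch of that discipline (made-up names; the code is
        | valid as both C and C++):
        | 
        |     #include <stdlib.h>
        |     #include <string.h>
        |     
        |     /* Allocates *for the caller*; the caller treats this call
        |        as its own allocation and owns the result. */
        |     char* copy_string(const char* s) {
        |         char* out = (char*)malloc(strlen(s) + 1);
        |         if (out) strcpy(out, s);
        |         return out;  /* ownership transfers to the caller */
        |     }
        |     
        |     void use_it(void) {
        |         char* name = copy_string("leak-free"); /* "alloc" here */
        |         if (!name) return;
        |         /* ... use name ... */
        |         free(name);                            /* freed here */
        |     }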
        
         | canucker2016 wrote:
         | It's the "repro the bug" that's hard...
         | 
         | When I worked on static analysis of codebases, the error
         | handling codepath was the most likely source of problems.
        
           | gavinhoward wrote:
           | Yes.
           | 
           | To do that, I fuzz like crazy, and every path becomes a test,
           | even if it means a lot of manual work checking each one. That
           | alone exercises tons of error paths.
           | 
           | To cover 99% of the rest, I use SQLite's error injection [1]
           | on all tests. Just doing error injection on allocation gets
           | you 90% of the way there.
           | 
           | [1]: https://sqlite.org/testing.html#anomaly_testing
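            | 
            | For the allocation case, a minimal sketch of the idea (not
            | SQLite's actual harness; names are made up): a wrapper that
            | fails the Nth allocation, so a runner can repeat each test
            | with N = 1, 2, 3, ... and exercise every out-of-memory path.
            | 
            |     #include <cstdlib>
            |     
            |     static long g_alloc_count = 0;
            |     static long g_fail_at = -1;  // -1 = never inject
            |     
            |     void* test_malloc(std::size_t size) {
            |         if (g_fail_at >= 0 && ++g_alloc_count >= g_fail_at)
            |             return nullptr;  // simulate OOM at this call
            |         return std::malloc(size);
            |     }
            |     
            |     // Harness loop: set g_fail_at = n, reset the counter,
            |     // run the test; increment n until the test finishes
            |     // without ever reaching the injected failure.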
        
       | 1024core wrote:
       | Reminds me of this story I heard about Yahoo. Their ads server
       | had a memory leak and it would OOM after something like 10000
       | requests.
       | 
       | Their solution: restart the server after 8000 requests.
       | 
       | This worked for a year or two. And then it started OOM-ing after
       | 8000 requests.
       | 
       | Next solution: restart the server after 6000 requests.
        
         | xandrius wrote:
         | If that gives you extra time to move the problem to the future,
         | I'd say that's a win :D
        
         | otabdeveloper4 wrote:
         | > Their solution: restart the server after 8000 requests.
         | 
         | 8000 requests is something like 500 milliseconds for the
         | average ad server.
         | 
         | You need exceptionally fast restarts for that to work.
        
           | wging wrote:
           | Perhaps now, but the story is about Yahoo, which means it
           | could be from the early 2000s or late 90s. Traffic volumes
           | were probably lower, computers were definitely slower,
           | internet advertising was not as big as it is now, etc.
        
           | toast0 wrote:
           | This is in the context of (y)Apache. You set
           | MaxRequestsPerChild, so when the request limit is hit, the
           | child is killed, a new one started, and requests can be
           | served by other children until the replacement is ready. In a
           | pure config, idle children block on accept, so the kernel
           | does all the coordination work required to get requests to
           | children, the child exits after hitting the cap, and the
           | parent catches the exit and spawns another. As long as all
           | the children don't hit their cap at once, there's no gap in
           | service. If they do all hit the cap at once, it's still not
           | so bad.
           | 
           | I don't know about ads, but on Yahoo Travel, I had no qualms
           | about solving memory leaks that took months to show up with
           | MaxRequestsPerChild 100000. I gave valgrind or something a
           | whirl, but didn't find it right away and was tired of getting
           | paged once a quarter for it, so...
           | 
           | I did do some scaleout work for Shopping once, and found
           | theirs set at 3. I don't remember where I left it, but much
           | higher. Nobody knew why it was 3, so I took the chance that
            | it would become apparent if I set it to 100, and nothing
           | obvious was wrong and the servers were much happier. fork()
           | is fast, but it's not fast enough to do it every 3 requests.
        
       | seanhunter wrote:
       | So one place I worked might win some sort of prize for the
       | dumbest way to burn $5m with a memory leak.
       | 
       | So back in the 90s the printer driver in Solaris had a memory
       | leak[1]. At the time I was a contractor for a big bank that is
       | sadly/not sadly no longer with us any more. This was when the
        | status of faxes in confirming contracts hadn't been sufficiently
        | tested in court, so banks used to book trades by fax. The
        | system that sent the fax would also send a document to a
        | particular printer which would print the trade confirm out, and
        | there was some poor sap whose job consisted of picking up the
        | confirms from this printer, calling the counterparty, and
        | reading them out so that they were all on tape[2] and legally
        | confirmed.
       | 
       | Anyhow one day the memory leak caused the printer driver to fall
       | over and fail to print out one of the confirms so the person
       | didn't read it out on the telephone. The market moved
       | substantially and the counterparty DKd the trade[3]. A lot of
       | huffing and puffing by senior executives at the bank had no
       | effect and they booked a $5m loss and made a policy to never
       | trade with that particular bank again[4]. The fax printer job was
        | moved to Windows NT.
       | 
        | [1] According to the excellent "Expert C Programming", this was
        | eventually fixed because Scott McNealy, then CEO of Sun
        | Microsystems, had been given a very underpowered workstation
        | (as CEO), so he was affected a lot by this problem and
        | eventually made enough of a fuss that the devs got around to
        | fixing it.
        | https://progforperf.github.io/Expert_C_Programming.pdf
       | 
       | [2] Calls in the securities division of banks are pretty much
       | always recorded for legal and compliance reasons
       | 
       | [3] DK stands for "Don't know". If the other side says they
       | "don't know" a trade they are disputing the fact that a contract
       | was made.
       | 
       | [4] Which I'm sure hurt us more than it hurt them as they could
       | just trade somewhere else and pay some other bank their
       | commission
        
         | fbdab103 wrote:
         | >...they booked a $5m loss and made a policy to never trade
         | with that particular bank again
         | 
          | Maybe I am too cynical, but would many businesses retroactively
          | agree to a deal which would cost them a ton of money? If the
          | process requires i's dotted, t's crossed, and a phone call
          | confirmation which was never placed, why eat the loss when the
          | other party should own the error?
          | 
          | Citi just had a lawsuit because of paying back a loan too
          | quickly. I expect everyone in finance to play hardball on
          | written agreements when it works in their favor.
        
           | seanhunter wrote:
            | Sometimes it happens. Particularly in big US old-school
            | broker-dealers, "Dictum Meum Pactum"[1] is something some
            | people take very seriously, especially since you will have a
            | fruitful (if adversarial) working relationship over many
            | years and may need someone to do you a personal favour in
            | the future (e.g. giving you a job).
           | 
           | For example I know of one US investment bank where a very
           | large options position was "booked" by a trader using a
           | spreadsheet rather than in the official booking system which
           | meant that the normal "exercise and expiry" alerts didn't go
           | off to warn people when the trade was about to expire. The
           | trader in question went on holiday and as a result the trade
           | expired more than a billion dollars[2] in the money. The CEO
           | of the bank called up the counterparty and successfully
           | persuaded them to honour the trade and pay up even though it
           | had actually expired and everyone knew there was no legal
           | obligation. As it was explained to me at the time, the
           | counterparty had probably hedged the trade so was not
           | scratching around the sofa trying to find the money.
           | 
           | [1] "My word is my bond"
           | 
           | [2] Yes. With a b.
        
       ___________________________________________________________________
       (page generated 2024-05-11 23:00 UTC)