[HN Gopher] JavaScript garbage collection and closures
       ___________________________________________________________________
        
       JavaScript garbage collection and closures
        
       Author : jaffathecake
       Score  : 97 points
       Date   : 2024-07-30 16:35 UTC (6 hours ago)
        
 (HTM) web link (jakearchibald.com)
 (TXT) w3m dump (jakearchibald.com)
        
       | samanator wrote:
       | I wonder how many petabytes of leaked memory there is in the
       | world at any given time from open chrome tabs
        
         | sroussey wrote:
         | I'm sure this issue is found in react hooks all over the place.
        
         | adhamsalama wrote:
         | So it's like a ContextManager in Python? Nice!
        
           | orf wrote:
           | Why?
        
         | kccqzy wrote:
         | These days Chrome has a "memory saver" feature that deactivates
         | inactive tabs and free their memory. It's something Apple
         | pioneered in mobile Safari from the very beginning. For people
         | like me who keep 500+ tabs open always, it's a great feature.
        
       | packetlost wrote:
       | TLDR: the entire scope for a closure is retained as long as that
       | environment might still be referenced. There's no such thing as a
       | partial scope in JavaScript (to my knowledge, please correct if
       | wrong).
       | 
       | In the example, if you don't capture `id` in the returned
       | closure, the problem goes away.
        
         | jaffathecake wrote:
         | That's not quite right. See the final example in the article,
         | it doesn't reference anything in the parent scope.
        
           | packetlost wrote:
           | Ah, you're right. The fact that a closure exists at all is
           | enough to retain the parent environment.
        
         | hinkley wrote:
         | I know the debugger cannot always show you variables from the
         | originating scope or even completed blocks above the breakpoint
         | so it's clear that some data not used in the closure becomes
         | unreachable while the closure is still live.
         | 
         | But I've also spent time hunting down giant memory leaks where
         | some things did get preserved that should not have. I'm so glad
         | domains have finally gone from "deprecated" to deprecated
         | (telling people not to use a feature when you haven't provided
         | the replacement is fucking stupid) as most of that kind of
         | problem that I can recall involved domains in some manner.
        
         | munificent wrote:
         | _> There 's no such thing as a partial scope in JavaScript (to
         | my knowledge, please correct if wrong)._
         | 
         | There's also no such thing as garbage collection in JavaScript.
         | If you look purely at the language spec, then the language
         | behaves as if there is infinite memory.
         | 
         | Garbage collection and how that interacts with scopes is purely
         | an implementation detail. A conforming JavaScript
         | implementation definitely could determine that a given local
         | variable captured by a closure will no longer be accessed and
         | free the associated memory. It's just that the implemented
         | tested by the author (I'm assuming Chrome) doesn't do that.
        
           | packetlost wrote:
           | > There's also no such thing as garbage collection in
           | JavaScript.
           | 
           | Incorrect. The details of garbage collection are left to the
           | implementation, but the spec itself at least implies that GC
           | is expected of implementations [0]. The current spec has an
           | entire section on memory management [1] with several
           | constructs for garbage collection and many explicit
           | references to GC.
           | 
           | That being said, it would be more correct to say _this
           | implementation_ of JavaScript does not perform capture
           | analysis, effectively creating partial scopes.
           | 
           | [0]: https://262.ecma-international.org/7.0/#sec-weakmap-
           | objects [1]: https://tc39.es/ecma262/#sec-managing-memory
        
             | munificent wrote:
             | Ah, weak references are a good point.
             | 
             | Even so, you can have an entirely conforming JavaScript
             | implementation with no GC. It just means that WeakRefs and
             | WeakMaps will always continue to hold their references.
        
       | beardyw wrote:
       | Just today (re)discovered FinalizationRegistry which is a big
       | help if you are worried about what may be left behind. It's quite
       | nice to be able to see a log of objects disappearing.
        
         | syspec wrote:
         | Never heard of `FinalizationRegistry`, how do we access it?
        
           | simlevesque wrote:
           | https://developer.mozilla.org/en-
           | US/docs/Web/JavaScript/Refe...
        
         | jaffathecake wrote:
         | Hah, that's what I was playing with when we discovered this
         | issue.
        
           | beardyw wrote:
           | It's frustrating sometimes waiting for GC to kick in. The
           | Chrome debugger seems to offer a GC now thing, but either it
           | didn't work or it's just a kind of suggestion.
        
             | jaffathecake wrote:
             | I know this isn't helpful, but it worked fine for me.
        
         | laurencerowe wrote:
         | I found FinalizationRegistry had a severe performance cost in
         | v8 when I tried it last year.
        
       | qbane wrote:
       | Can reproduce in latest Firefox and Chromium. I wonder whether
       | this is an actual leak or there is a good reason for JS engines
       | to retain the array buffer.
        
       | blackhaj7 wrote:
       | Interesting, succinct article. Love it.
       | 
       | How can I try this myself? i.e. see bigArrayBuffer in memory and
       | see if it is/isn't garbage collated. I am guessing I can use the
       | Chrome debugger but I would love to know to do it how step by
       | step if anyone has a link
        
         | jaffathecake wrote:
         | Using the memory tab in dev tools, do a heap snapshot. If it's
         | 100mb, then that buffer is in there.
        
         | ProfessorZoom wrote:
         | look up how to use FinalizationRegistry
        
       | skrebbel wrote:
       | As an old OO guy, I like to think of a closure as syntax sugar
       | for a class that's being generated which has fields for all the
       | variables in a scope that are used in callbacks. In those terms,
       | this is quite a bit less surprising. (That said, you could also
       | imagine generating a separate class for each callback - I wonder
       | why JS engines don't do that)
        
         | dartos wrote:
         | That's funny.
         | 
         | As a functional guy, I see classes and objects as syntactic
         | sugar for closures.
        
           | lloeki wrote:
           | _The venerable master Qc Na was walking with his student,
           | Anton. Hoping to prompt the master into a discussion, Anton
           | said_ "Master, I have heard that objects are a very good
           | thing - is this true?" _Qc Na looked pityingly at his student
           | and replied,_ "Foolish pupil - objects are merely a poor
           | man's closures."
           | 
           |  _Chastised, Anton took his leave from his master and
           | returned to his cell, intent on studying closures. He
           | carefully read the entire "Lambda: The Ultimate..." series of
           | papers and its cousins, and implemented a small Scheme
           | interpreter with a closure-based object system. He learned
           | much, and looked forward to informing his master of his
           | progress._
           | 
           |  _On his next walk with Qc Na, Anton attempted to impress his
           | master by saying_ "Master, I have diligently studied the
           | matter, and now understand that objects are truly a poor
           | man's closures." _Qc Na responded by hitting Anton with his
           | stick, saying_ "When will you learn? Closures are a poor
           | man's object." _At that moment, Anton became enlightened._
           | 
           | https://people.csail.mit.edu/gregs/ll1-discuss-archive-
           | html/...
        
             | gavmor wrote:
             | I've often wondered if "Qc Na" were a pun of some kind.
        
               | dartos wrote:
               | In the link they posted:
               | 
               | > I'll take some koanic license and combine Norman Adams
               | (alleged source of "objects are a poor man's closures")
               | and Christian Queinnec ("closures are a poor man's
               | objects") into a single great Zen language master named
               | Qc Na.
        
       | kazinator wrote:
       | Could it be that the vector is hoisted outside of the function's
       | inner scope, promoted to a literal-like datum attached to the
       | outer function?
       | 
       | It's obvious from the function that the ArrayBuffer object never
       | escapes from the scope, and is never modified.
       | 
       | If the object is never modified, there is no need to keep newly
       | instantiating it; it can be hoisted to the function and attached
       | to it somehow, so then to get rid of it, we have to lose the
       | function itself.
        
       | DanielHB wrote:
       | This behavior is required because eval exists:
       | function test(text) {         const a = 1         function
       | inner() {           console.log(eval(text))         }
       | inner()       }       test("() => { return a }")
       | 
       | prints 1 to the console
       | 
       | This happens because the closure's context object is shared
       | between all closures in a given scope. So as soon as one variable
       | from a give scope is accessed through a closure then all
       | variables will be retained by all inner functions.
       | 
       | Technically the engines could be optimizing it when no eval used
       | is detected or when in strict mode (which blocks eval), but I
       | guess that dynamically dropping values from a closure context
       | based on inner lexical scopes can be really tricky thing to do
       | and probably not worth the overhead.
        
         | jaffathecake wrote:
         | Nah, that's not it. Browsers do statically detect if eval is
         | there or not, and react accordingly. This is possible because
         | of https://developer.mozilla.org/en-
         | US/docs/Web/JavaScript/Refe...
        
         | nathcd wrote:
         | > Technically the engines could be optimizing it when no eval
         | used is detected or when in strict mode (which blocks eval),
         | 
         | I just learned about direct vs indirect eval
         | (https://developer.mozilla.org/en-
         | US/docs/Web/JavaScript/Refe...), which I imagine makes this a
         | bit easier. The parent scope is only captured in a direct eval.
        
         | olliej wrote:
         | [later edit: I re-read the example code, and the issue is a
         | double capture and the author not really understanding the
         | semantics of capture in JS, I've given a more technical
         | breakdown and explanation at
         | https://news.ycombinator.com/item?id=41113481]
         | 
         | Nah, in the es3.1 or 5 era we (tc39) fixed the semantics of
         | eval to make it explicit that scope capture only occurs in a
         | direct/unqualified eval (essentially eval becomes a pseudo
         | keyword).
         | 
         | The big question is scope capture - in JSC I implemented free
         | vs captured variable analysis many many years ago, though the
         | primary reason was to be able to avoid allocating the
         | activation/scope object for a function at all, though that did
         | allow us to handle this case (I'm curious if JSC does still do
         | this).
         | 
         | The problem with doing that is that it impacts developer tools
         | significantly.
         | 
         | (To be continued after changing device) (Edit: Continuing)
         | 
         | Essentially developers want to be able to debug their code, but
         | you run into issues if they start debugging after page load
         | because if you do optimize this, then anything they may want to
         | inspect may have been optimized away, at that point you (the
         | engine dev) can't do anything to bring it back, but you could
         | say "from this point on I'll keep things around" (JSC at least
         | in the past would essentially discard all generated code when
         | the dev tools were opened). You might also say "if the dev
         | tools are _enabled_ (not open) " then I'll always do codegen
         | assuming they'll be needed. Or you might say "if the dev tools
         | are open on page load generate everything as [essentially]
         | debug code" (which is still optimized codegen, but less
         | aggressive about GC related things).
         | 
         | All of those options work, but the problem you then have is
         | potentially significant changes in behavior if the dev tools
         | are enabled and/or open - especially nowadays where JS exposes
         | various weak reference types where this kind of change directly
         | impacts the behaviour of said weak references. So now the
         | question becomes how much of a behavioural difference am I
         | willing to accept between execution with or without dev tools
         | involved.
         | 
         | It's possible (I haven't worked directly on browser engines in
         | a number of years now) that the consensus has become "no
         | difference is ok" - this kind of space leak is not super
         | common, etc and the confusion from different behavior
         | with/with-out dev tools might be considered fairly obnoxious.
        
         | paulddraper wrote:
         | No. JS engines detect of eval in the AST.
         | 
         | It's why you can't rename eval and reference the current scope.
         | const eval2 = eval;       (() => {         const a = 1;
         | eval2("a");       })();       // ReferenceError: a is not
         | defined
         | 
         | ---
         | 
         | The reason JS engines choose to share the closure record among
         | all closures create in the function scope is purely for
         | performance.                 function f() {         const a =
         | {};         const b = {};         return [           () => a,
         | () => b,         ];       }
         | 
         | Both functions share the same single closure record,
         | referencing a and b.
         | 
         | JavaScript developers have rediscovered this over and over and
         | over again.
         | 
         | * Meteor.js (2013) [1]
         | 
         | * My personal journey (2013) [2]
         | 
         | [1] https://blog.meteor.com/an-interesting-kind-of-javascript-
         | me...
         | 
         | [2] https://stackoverflow.com/questions/19798803/how-
         | javascript-...
        
           | Feathercrown wrote:
           | You can rename eval, but the renamed eval will be considered
           | "indirect" and will run in the global scope.
        
             | paulddraper wrote:
             | Yes, corrected, thank you.
        
         | ballenf wrote:
         | That code outputs "() => { return a }", since there's no
         | function invocation.
        
       | adhamsalama wrote:
       | Insightful article, thanks for sharing!
        
       | bgirard wrote:
       | At Meta I worked on memlab [1]. This tool is very effective at
       | finding memory leaks in our JavaScript. AFAIK we found one, but
       | only one, such leak that happened in production code. Once
       | discovered it was easily fixed. But understanding this class of
       | issue was important to make sense of the leak report.
       | 
       | [1] https://facebook.github.io/memlab/
        
         | brundolf wrote:
         | Can you speak at all to why it works this way?
        
           | bgirard wrote:
           | I spoke to V8 engineers about this many years ago. JS VMs
           | -could- handle this without leaking, but they would have to
           | do more analysis when compiling code. So are we willing to
           | trade-off startup time to fix a somewhat rare memory usage? I
           | don't think anyone has spent the time to study this in depth
           | and collect numbers to weight the tradeoff.
           | 
           | If suddenly this problem became a big deal and caused a lot
           | of leaks then I think we'd see JS VMs fix it. Perhaps one day
           | if a framework makes this kind of error more frequent.
        
       | unstirrer wrote:
       | See also https://news.ycombinator.com/item?id=5959020
        
       | brundolf wrote:
       | It would be intuitive to me if function closures only retained
       | things they reference
       | 
       | It would also be intuitive to me if closures naively retained
       | everything in scope
       | 
       | It's bizarre to me that the behavior is "if one closure
       | references something, then all of them retain it"
       | 
       | I guess maybe it's a stack vs heap thing? If nothing retains a
       | variable then it can be kept on the stack, but once it has to
       | outlive the function it has to be moved. Still odd the
       | bookkeeping can't distinguish closures that references it from
       | ones that don't, if it already has to check that for the entire
       | set
        
         | sam_perez wrote:
         | Yeah, this does seem like an odd middle ground.
         | 
         | But I suppose it's more about what you're tracking? For
         | example, instead of tracking reference counts to variable,
         | they're tracking reference counts to each scope. Then, if a
         | scope has no more references, the variables it owns are cleaned
         | up.
        
       | pizlonator wrote:
       | Fundamental downside of tracing garbage collection.
       | 
       | Reachability is a conservative approximation of the set of
       | objects that need to be kept alive.
       | 
       | Not saying don't use GC, but this is a great example of how there
       | are no silver bullets in memory management - only imperfect trade
       | offs.
        
         | paulddraper wrote:
         | Not really. It is trivially knowable that () =>
         | clearTimeout(id) does not reference bigArrayBuffer.
         | 
         | The design is not because it is hard to know what needs to be
         | kept alive, but rather it is more efficient to use one closure
         | record with id and bigArrayBuffer than two different ones.
        
           | pizlonator wrote:
           | Fix this one issue and you'll find another.
        
             | paulddraper wrote:
             | It's pretty simple: If the closure doesn't reference a
             | variable, it doesn't need the variable.
             | 
             | Can't get more basic than that.
        
               | pizlonator wrote:
               | Closures aren't the only case where this happens.
               | 
               | And your solution is not adequate: the closure does
               | reference the variable, but it's possible to have a
               | reference to the closure that exists for some reason
               | other than calling the closure (the second closure does
               | this - it references the closure that references the
               | variable, but won't call the closure that references the
               | variable). It's not straightforward to solve that
               | problem, and the solution involves a harder constraint
               | system than just object-to-object reachability.
        
               | paulddraper wrote:
               | This is just tracing GC. Nothing more or less than that.
               | 
               | The only difference is how you construct the closure
               | record for your GC.
               | 
               | Using the example:                 function f() {
               | const a = {};         const b = {};         return [
               | () => a,           () => b,         ];       }
               | 
               | You can choose to have one scope record shared by both
               | functions, or individual record.
               | function1 ----|   ----> a                     v   |
               | scope                     ^   |       function2 ----|
               | ----> b
               | 
               | Or                 function1 --> scope1 --> a
               | function2 --> scope2 --> b
               | 
               | JS engines choose to do the former, not out of simplicity
               | (arguably it's more complicated), but due to performance.
        
               | deredede wrote:
               | The language specification says to use a single scope
               | (see olliej's detailed comment). You could change it to
               | use separate scopes here but it won't solve the more
               | generic issue of imprecise captures, just this specific
               | case.
        
         | kaba0 wrote:
         | You are not absent from conservative approximation even with
         | manual memory management. E.g. a map is a trivial example where
         | you can't always know which elements will be used during the
         | lifetime of the program (and only those has to be kept alive).
        
           | pizlonator wrote:
           | Yeah totally. No silver bullets.
        
       | lxe wrote:
       | I got confused here, because in the timers example, I assumed
       | that without calling cancelDemo/clearTimeout, we cannot assume
       | that the timer's function is no longer callable. Heck, even WITH
       | clearTimeout... without understanding its implemetation, we
       | cannot assume it's no longer callable by some internal system.
       | 
       | I think you should call cancelDemo() in the article to show that
       | yes, even when we assume that the reference to the closure is
       | cleaned up, the allocation persists.                   (() => {
       | // Assumed implementation of setTimeout         // in which we
       | cannot make assumptions that fn         // is inaccessible after
       | the first invocation         const timers = [];         function
       | mySetTimeout(fn) {            fn();                 // retain the
       | function for some reason           // maybe because of
       | timers/intervals implemetation detail           timers.push(fn);
       | const id = timers.length - 1;           return id         }
       | function myClearTimeout(id) {           // this makes it to fn
       | has no more references           // but it still gets retained
       | const fn = timers.splice(id, 1);                 // For good
       | measure            delete fn;         }              function
       | demo() {           const bigArrayBuffer = new
       | ArrayBuffer(100_000_000);           const id = mySetTimeout(() =>
       | {             console.log(bigArrayBuffer.byteLength);
       | }, 1000);                return () => myClearTimeout(id);
       | }              cancelDemo = demo();              // Even when
       | calling clearTimeout,          // bigArrayBuffer is still
       | allocated,          // Which is the crux of the article
       | cancelDemo();                    // Now it's actually
       | deallocated, as          // shows in the article         delete
       | cancelDemo;               })();
        
       | olliej wrote:
       | Ok, so there have been a lot of comments on what causes this,
       | which are not correct, and a few that are pretty close. Here is
       | the actual explanation of what is happening and what is causing
       | the leak.
       | 
       | For context, I used to be on TC39 (the ecmascript standards
       | committee) and spent many many years working on JSC, and
       | specifically working on the GC and closure modeling.
       | 
       | First off: this is not due to eval. In ES3.1 or ES5 (alas I can't
       | recall which) we (tc39) clarified the semantics of eval, to only
       | evaluate in the containing scope if it is called directly -
       | essentially turning it into a pseudo operator (implementations
       | today generally implement a direct eval as `if (target function
       | == real eval function) { do eval } else { call the function }`.
       | Calling eval in any way other that `eval(<expression>)` will not
       | invoke the scope capturing behavior of eval (this is a strict
       | requirement to allow fast access to non-local variables).
       | 
       | The function being reported as exhibiting the bad/unexpected
       | behavior in the post is:                   function demo() {
       | const bigArrayBuffer = new ArrayBuffer(100_000_000);
       | const id = setTimeout(/* timeout closure */ () => {
       | console.log(bigArrayBuffer.byteLength);           }, 1000);
       | return /* cleanup closure */ () => clearTimeout(id);         }
       | 
       | If we were to follow the spec language fairly explicitly, the
       | behavior of this function is (eliding exact semantics of
       | everything other than creation of function objects and closures)
       | 1. enter the function         2. env = create an empty lexical
       | environment object                 (I may use "activation" by
       | accident because                  that was the spec language when
       | I was first                  working on JS engines)           a)
       | set the parent scope of env to the internal scope reference of
       | the              callee (in this case because demo is a global
       | function this will be              the global object)
       | b) add a property "bigArrayBuffer" to env, setting the value
       | to undefined           c) add a property "id" to env, setting the
       | value to undefined         3. evaluate `new
       | ArrayBuffer(100_000_000)` and assign the result            to the
       | "bigArrayBuffer" property of env         4. Construct a function
       | object for the timeout closure, and set its            internal
       | scope reference to *env* (i.e. capture the containing
       | scope)         5. call setTimeout passing the function from (4)
       | and 1000 as,            and assign the result to the "id"
       | property on the env object         6. construct the cleanup
       | closure, and set the internal scope             property to env
       | 
       | The result of this is that we end up with the following set of
       | objects:                   globalObject = {.....}         demo =
       | Function { @scope: globalObject }         <demo_env> (not
       | directly exposed anywhere) =             LexicalEnvironment {
       | @scope: demo.@scope,
       | bigArrayBuffer: big array,                                 id:
       | number                               }         <timeout closure>
       | = Function { @scope: demo_env }         <cleanup closure> =
       | Function { @scope: demo_env }
       | 
       | At which point you can see as long as either closure is live, the
       | reference to bigArrayBuffer is reachable and therefore kept
       | alive.
       | 
       | Now, I was confused about this report originally as I know JSC at
       | least does do free var anaylsis (and I can't imagine v8 doesn't,
       | not sure about SM these days) to reduce false captures, because I
       | had not properly read the example code, and was like "why is this
       | being kept alive", but having actually read the code properly and
       | written out the above it's hopefully very obvious to everyone
       | now.
       | 
       | The language semantics of JS mean that all closures in a given
       | scope chain share that scope chain, which means if one closure
       | captures a variable, then all closures will end up keeping that
       | capture alive, and there is not a lot the JS engine can do to
       | limit that.
       | 
       | There are some steps that could be taken to mitigate or reduce
       | this, but doing that kind of flow analysis can become expensive
       | and a real issue JS engines have is that the overwhelming
       | majority of JS runs a tiny number of times, and is extremely
       | latency sensitive (this is why JSC has put so much effort into
       | parsing + interpreter perf), and any real data flow analysis is
       | too expensive for such code, and by the time code is hot enough
       | to have warranted that kind of analysis the overall program state
       | has got to a point where you cannot retroactively remove closure
       | references, so they remain.
       | 
       | Now something that you _could_ try as a developer in this kind of
       | scenario would be to use let, or a scoped let, to reduce the
       | sharing of scopes, e.g.                   function demo() {
       | let id;           {             let bigArrayBuffer = new
       | ArrayBuffer(100_000_000);             id = setTimeout(/* timeout
       | closure */ () => {
       | console.log(bigArrayBuffer.byteLength);             }, 1000);
       | }                    return /* cleanup closure */ () =>
       | clearTimeout(id);         }
       | 
       | which might resolve this issue, in this particular kind of case.
       | 
       | In principle an engine could introduce logic to try to track
       | exactly how many live closures reference a captured variable, but
       | this is also tricky as you could easily end up with something
       | like:                   function f() {           let x = new
       | GiantObject;           return (a) => {               if (a)
       | return (g) => { g(x) }               return (g) => { g(null); }
       | }         }         y = f() // y needs to keep x alive         y
       | = y(some value) // you get a new closure which
       | // may or may not be the one referencing
       | // x.
       | 
       | This is something you _could_ support, but there's a lot of
       | complexity to ensuring correct behavior and maintaining
       | performance in all the common cases, and it's possibly just not
       | worth it given the JS capturing model.
       | 
       | There are also a few things you could do that would likely be
       | relatively easy/low cost from a JS engine that would remove some
       | cases of excessive capture, but they'd still just be helping
       | super trivial cases like this reduced example code, not
       | necessarily any actual real world examples.
        
       | jtbandes wrote:
       | Chrome bug tracking this issue (since 2013):
       | https://issues.chromium.org/issues/41070945
        
       ___________________________________________________________________
       (page generated 2024-07-30 23:00 UTC)