[HN Gopher] JavaScript garbage collection and closures
___________________________________________________________________
JavaScript garbage collection and closures
Author : jaffathecake
Score : 97 points
Date : 2024-07-30 16:35 UTC (6 hours ago)
(HTM) web link (jakearchibald.com)
(TXT) w3m dump (jakearchibald.com)
| samanator wrote:
| I wonder how many petabytes of leaked memory there is in the
| world at any given time from open chrome tabs
| sroussey wrote:
| I'm sure this issue is found in react hooks all over the place.
| adhamsalama wrote:
| So it's like a ContextManager in Python? Nice!
| orf wrote:
| Why?
| kccqzy wrote:
| These days Chrome has a "memory saver" feature that deactivates
| inactive tabs and frees their memory. It's something Apple
| pioneered in mobile Safari from the very beginning. For people
| like me who keep 500+ tabs open always, it's a great feature.
| packetlost wrote:
| TLDR: the entire scope for a closure is retained as long as that
| environment might still be referenced. There's no such thing as a
| partial scope in JavaScript (to my knowledge, please correct if
| wrong).
|
| In the example, if you don't capture `id` in the returned
| closure, the problem goes away.
| jaffathecake wrote:
| That's not quite right. See the final example in the article,
| it doesn't reference anything in the parent scope.
| packetlost wrote:
| Ah, you're right. The fact that a closure exists at all is
| enough to retain the parent environment.
| hinkley wrote:
| I know the debugger cannot always show you variables from the
| originating scope or even completed blocks above the breakpoint
| so it's clear that some data not used in the closure becomes
| unreachable while the closure is still live.
|
| But I've also spent time hunting down giant memory leaks where
| some things did get preserved that should not have. I'm so glad
| domains have finally gone from "deprecated" to deprecated
| (telling people not to use a feature when you haven't provided
| the replacement is fucking stupid) as most of that kind of
| problem that I can recall involved domains in some manner.
| munificent wrote:
| _> There's no such thing as a partial scope in JavaScript (to
| my knowledge, please correct if wrong)._
|
| There's also no such thing as garbage collection in JavaScript.
| If you look purely at the language spec, then the language
| behaves as if there is infinite memory.
|
| Garbage collection and how that interacts with scopes is purely
| an implementation detail. A conforming JavaScript
| implementation definitely could determine that a given local
| variable captured by a closure will no longer be accessed and
| free the associated memory. It's just that the implementation
| tested by the author (I'm assuming Chrome) doesn't do that.
| packetlost wrote:
| > There's also no such thing as garbage collection in
| JavaScript.
|
| Incorrect. The details of garbage collection are left to the
| implementation, but the spec itself at least implies that GC
| is expected of implementations [0]. The current spec has an
| entire section on memory management [1] with several
| constructs for garbage collection and many explicit
| references to GC.
|
| That being said, it would be more correct to say _this
| implementation_ of JavaScript does not perform capture
| analysis, effectively creating partial scopes.
|
| [0]: https://262.ecma-international.org/7.0/#sec-weakmap-objects
| [1]: https://tc39.es/ecma262/#sec-managing-memory
| munificent wrote:
| Ah, weak references are a good point.
|
| Even so, you can have an entirely conforming JavaScript
| implementation with no GC. It just means that WeakRefs and
| WeakMaps will always continue to hold their references.
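
A minimal sketch of the WeakRef point above, assuming a runtime with the ES2021 `WeakRef` built-in. While a strong reference exists, `deref()` returns the target. Whether it ever returns `undefined` afterwards is left to the engine, so an implementation with no GC would simply keep returning the object. The names `target` and `ref` are illustrative only.

```javascript
// Strong reference: the object cannot be collected while `target` is live.
const target = { name: "example" };
const ref = new WeakRef(target);

// deref() returns the target as long as it has not been collected.
const sameObject = ref.deref() === target;
console.log(sameObject);

// Once every strong reference is dropped, deref() MAY eventually return
// undefined. Nothing says when, or that it ever must.
```
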
| beardyw wrote:
| Just today (re)discovered FinalizationRegistry which is a big
| help if you are worried about what may be left behind. It's quite
| nice to be able to see a log of objects disappearing.
| syspec wrote:
| Never heard of `FinalizationRegistry`, how do we access it?
| simlevesque wrote:
| https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
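
For readers asking the same question: `FinalizationRegistry` is a global built-in (ES2021) in current browsers and Node.js, so no import is needed. The sketch below shows one plausible way to log objects as they disappear. The helper `makeTracked`, the `collected` array, and the label string are hypothetical names for illustration. The callback's timing is engine-dependent and it may never run, so treat this as a debugging aid rather than program logic.

```javascript
// The callback runs some time after a registered object is collected.
// It never runs synchronously, and it may not run at all.
const collected = [];
const registry = new FinalizationRegistry((label) => {
  collected.push(label);
  console.log(`${label} was garbage collected`);
});

function makeTracked(label) {
  const obj = { payload: new ArrayBuffer(1024) };
  // The held value (label) must not reference obj, or obj stays alive.
  registry.register(obj, label);
  return obj;
}

let tracked = makeTracked("demo object");
tracked = null; // drop the only strong reference; collection is now possible
```

In a browser, forcing a collection from the dev tools after running this is one way to encourage the log line to appear.
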
| jaffathecake wrote:
| Hah, that's what I was playing with when we discovered this
| issue.
| beardyw wrote:
| It's frustrating sometimes waiting for GC to kick in. The
| Chrome debugger seems to offer a GC now thing, but either it
| didn't work or it's just a kind of suggestion.
| jaffathecake wrote:
| I know this isn't helpful, but it worked fine for me.
| laurencerowe wrote:
| I found FinalizationRegistry had a severe performance cost in
| v8 when I tried it last year.
| qbane wrote:
| Can reproduce in latest Firefox and Chromium. I wonder whether
| this is an actual leak or there is a good reason for JS engines
| to retain the array buffer.
| blackhaj7 wrote:
| Interesting, succinct article. Love it.
|
| How can I try this myself? i.e. see bigArrayBuffer in memory and
| see if it is/isn't garbage collected. I am guessing I can use the
| Chrome debugger but I would love to know how to do it step by
| step if anyone has a link
| jaffathecake wrote:
| Using the memory tab in dev tools, do a heap snapshot. If it's
| 100mb, then that buffer is in there.
| ProfessorZoom wrote:
| look up how to use FinalizationRegistry
| skrebbel wrote:
| As an old OO guy, I like to think of a closure as syntax sugar
| for a class that's being generated which has fields for all the
| variables in a scope that are used in callbacks. In those terms,
| this is quite a bit less surprising. (That said, you could also
| imagine generating a separate class for each callback - I wonder
| why JS engines don't do that)
| dartos wrote:
| That's funny.
|
| As a functional guy, I see classes and objects as syntactic
| sugar for closures.
| lloeki wrote:
| _The venerable master Qc Na was walking with his student,
| Anton. Hoping to prompt the master into a discussion, Anton
| said_ "Master, I have heard that objects are a very good
| thing - is this true?" _Qc Na looked pityingly at his student
| and replied,_ "Foolish pupil - objects are merely a poor
| man's closures."
|
| _Chastised, Anton took his leave from his master and
| returned to his cell, intent on studying closures. He
| carefully read the entire "Lambda: The Ultimate..." series of
| papers and its cousins, and implemented a small Scheme
| interpreter with a closure-based object system. He learned
| much, and looked forward to informing his master of his
| progress._
|
| _On his next walk with Qc Na, Anton attempted to impress his
| master by saying_ "Master, I have diligently studied the
| matter, and now understand that objects are truly a poor
| man's closures." _Qc Na responded by hitting Anton with his
| stick, saying_ "When will you learn? Closures are a poor
| man's object." _At that moment, Anton became enlightened._
|
| https://people.csail.mit.edu/gregs/ll1-discuss-archive-html/...
| gavmor wrote:
| I've often wondered if "Qc Na" were a pun of some kind.
| dartos wrote:
| In the link they posted:
|
| > I'll take some koanic license and combine Norman Adams
| (alleged source of "objects are a poor man's closures")
| and Christian Queinnec ("closures are a poor man's
| objects") into a single great Zen language master named
| Qc Na.
| kazinator wrote:
| Could it be that the vector is hoisted outside of the function's
| inner scope, promoted to a literal-like datum attached to the
| outer function?
|
| It's obvious from the function that the ArrayBuffer object never
| escapes from the scope, and is never modified.
|
| If the object is never modified, there is no need to keep newly
| instantiating it; it can be hoisted to the function and attached
| to it somehow, so then to get rid of it, we have to lose the
| function itself.
| DanielHB wrote:
| This behavior is required because eval exists:
|
|     function test(text) {
|       const a = 1
|       function inner() {
|         console.log(eval(text))
|       }
|       inner()
|     }
|     test("() => { return a }")
|
| prints 1 to the console
|
| This happens because the closure's context object is shared
| between all closures in a given scope. So as soon as one variable
| from a given scope is accessed through a closure then all
| variables will be retained by all inner functions.
|
| Technically the engines could be optimizing it when no eval use
| is detected or when in strict mode (which blocks eval), but I
| guess that dynamically dropping values from a closure context
| based on inner lexical scopes can be a really tricky thing to do
| and probably not worth the overhead.
| jaffathecake wrote:
| Nah, that's not it. Browsers do statically detect if eval is
| there or not, and react accordingly. This is possible because
| of https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...
| nathcd wrote:
| > Technically the engines could be optimizing it when no eval
| use is detected or when in strict mode (which blocks eval),
|
| I just learned about direct vs indirect eval
| (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...),
| which I imagine makes this a
| bit easier. The parent scope is only captured in a direct eval.
| olliej wrote:
| [later edit: I re-read the example code, and the issue is a
| double capture and the author not really understanding the
| semantics of capture in JS, I've given a more technical
| breakdown and explanation at
| https://news.ycombinator.com/item?id=41113481]
|
| Nah, in the es3.1 or 5 era we (tc39) fixed the semantics of
| eval to make it explicit that scope capture only occurs in a
| direct/unqualified eval (essentially eval becomes a pseudo
| keyword).
|
| The big question is scope capture - in JSC I implemented free
| vs captured variable analysis many many years ago, though the
| primary reason was to be able to avoid allocating the
| activation/scope object for a function at all, though that did
| allow us to handle this case (I'm curious if JSC does still do
| this).
|
| The problem with doing that is that it impacts developer tools
| significantly.
|
| (To be continued after changing device) (Edit: Continuing)
|
| Essentially developers want to be able to debug their code, but
| you run into issues if they start debugging after page load
| because if you do optimize this, then anything they may want to
| inspect may have been optimized away, at that point you (the
| engine dev) can't do anything to bring it back, but you could
| say "from this point on I'll keep things around" (JSC at least
| in the past would essentially discard all generated code when
| the dev tools were opened). You might also say "if the dev
| tools are _enabled_ (not open)" then I'll always do codegen
| assuming they'll be needed. Or you might say "if the dev tools
| are open on page load generate everything as [essentially]
| debug code" (which is still optimized codegen, but less
| aggressive about GC related things).
|
| All of those options work, but the problem you then have is
| potentially significant changes in behavior if the dev tools
| are enabled and/or open - especially nowadays where JS exposes
| various weak reference types where this kind of change directly
| impacts the behaviour of said weak references. So now the
| question becomes how much of a behavioural difference am I
| willing to accept between execution with or without dev tools
| involved.
|
| It's possible (I haven't worked directly on browser engines in
| a number of years now) that the consensus has become "no
| difference is ok" - this kind of space leak is not super
| common, etc and the confusion from different behavior
| with/with-out dev tools might be considered fairly obnoxious.
| paulddraper wrote:
| No. JS engines detect uses of eval in the AST.
|
| It's why you can't rename eval and reference the current scope.
|
|     const eval2 = eval;
|     (() => {
|       const a = 1;
|       eval2("a");
|     })();
|     // ReferenceError: a is not defined
|
| ---
|
| The reason JS engines choose to share the closure record among
| all closures created in the function scope is purely for
| performance.
|
|     function f() {
|       const a = {};
|       const b = {};
|       return [
|         () => a,
|         () => b,
|       ];
|     }
|
| Both functions share the same single closure record,
| referencing a and b.
|
| JavaScript developers have rediscovered this over and over and
| over again.
|
| * Meteor.js (2013) [1]
|
| * My personal journey (2013) [2]
|
| [1] https://blog.meteor.com/an-interesting-kind-of-javascript-me...
|
| [2] https://stackoverflow.com/questions/19798803/how-javascript-...
| Feathercrown wrote:
| You can rename eval, but the renamed eval will be considered
| "indirect" and will run in the global scope.
| paulddraper wrote:
| Yes, corrected, thank you.
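
The direct versus indirect eval distinction discussed in this subthread can be sketched as follows. The function and variable names (`directEval`, `indirectEval`, `localSecret`) are made up for illustration, and the sketch assumes no global named `localSecret` exists. `typeof` is used in the indirect case so the lookup reports "undefined" instead of throwing a ReferenceError.

```javascript
function directEval() {
  const localSecret = 1;
  // Direct call: the name `eval` is invoked directly, so the code
  // runs in this function's scope and can see localSecret.
  return eval("localSecret");
}

function indirectEval() {
  const localSecret = 1;
  const eval2 = eval;
  // Indirect call: the code runs in the global scope, where
  // localSecret does not exist.
  return eval2("typeof localSecret");
}

console.log(directEval());   // 1
console.log(indirectEval()); // "undefined"
```

Because only the direct form can observe locals, an engine can tell from the source text alone whether a function's scope might be inspected by eval.
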
| ballenf wrote:
| That code outputs "() => { return a }", since there's no
| function invocation.
| adhamsalama wrote:
| Insightful article, thanks for sharing!
| bgirard wrote:
| At Meta I worked on memlab [1]. This tool is very effective at
| finding memory leaks in our JavaScript. AFAIK we found one, but
| only one, such leak that happened in production code. Once
| discovered it was easily fixed. But understanding this class of
| issue was important to make sense of the leak report.
|
| [1] https://facebook.github.io/memlab/
| brundolf wrote:
| Can you speak at all to why it works this way?
| bgirard wrote:
| I spoke to V8 engineers about this many years ago. JS VMs
| -could- handle this without leaking, but they would have to
| do more analysis when compiling code. So are we willing to
| trade-off startup time to fix a somewhat rare memory usage? I
| don't think anyone has spent the time to study this in depth
| and collect numbers to weigh the tradeoff.
|
| If suddenly this problem became a big deal and caused a lot
| of leaks then I think we'd see JS VMs fix it. Perhaps one day
| if a framework makes this kind of error more frequent.
| unstirrer wrote:
| See also https://news.ycombinator.com/item?id=5959020
| brundolf wrote:
| It would be intuitive to me if function closures only retained
| things they reference
|
| It would also be intuitive to me if closures naively retained
| everything in scope
|
| It's bizarre to me that the behavior is "if one closure
| references something, then all of them retain it"
|
| I guess maybe it's a stack vs heap thing? If nothing retains a
| variable then it can be kept on the stack, but once it has to
| outlive the function it has to be moved. Still odd the
| bookkeeping can't distinguish closures that reference it from
| ones that don't, if it already has to check that for the entire
| set
| sam_perez wrote:
| Yeah, this does seem like an odd middle ground.
|
| But I suppose it's more about what you're tracking? For
| example, instead of tracking reference counts to variables,
| they're tracking reference counts to each scope. Then, if a
| scope has no more references, the variables it owns are cleaned
| up.
| pizlonator wrote:
| Fundamental downside of tracing garbage collection.
|
| Reachability is a conservative approximation of the set of
| objects that need to be kept alive.
|
| Not saying don't use GC, but this is a great example of how there
| are no silver bullets in memory management - only imperfect trade
| offs.
| paulddraper wrote:
| Not really. It is trivially knowable that () =>
| clearTimeout(id) does not reference bigArrayBuffer.
|
| The design is not because it is hard to know what needs to be
| kept alive, but rather it is more efficient to use one closure
| record with id and bigArrayBuffer than two different ones.
| pizlonator wrote:
| Fix this one issue and you'll find another.
| paulddraper wrote:
| It's pretty simple: If the closure doesn't reference a
| variable, it doesn't need the variable.
|
| Can't get more basic than that.
| pizlonator wrote:
| Closures aren't the only case where this happens.
|
| And your solution is not adequate: the closure does
| reference the variable, but it's possible to have a
| reference to the closure that exists for some reason
| other than calling the closure (the second closure does
| this - it references the closure that references the
| variable, but won't call the closure that references the
| variable). It's not straightforward to solve that
| problem, and the solution involves a harder constraint
| system than just object-to-object reachability.
| paulddraper wrote:
| This is just tracing GC. Nothing more or less than that.
|
| The only difference is how you construct the closure
| record for your GC.
|
| Using the example:
|
|     function f() {
|       const a = {};
|       const b = {};
|       return [
|         () => a,
|         () => b,
|       ];
|     }
|
| You can choose to have one scope record shared by both
| functions, or an individual record for each.
|
|     function1 ----|      ----> a
|                   v      |
|                 scope
|                   ^      |
|     function2 ----|      ----> b
|
| Or
|
|     function1 --> scope1 --> a
|     function2 --> scope2 --> b
|
| JS engines choose to do the former, not out of simplicity
| (arguably it's more complicated), but due to performance.
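
The two layouts in the diagrams above can be compared with a toy reachability model. This is only a sketch: plain objects stand in for closures and scope records, the `reachable` helper and all variable names are hypothetical, and it says nothing about how any real engine stores closures. It only shows why, under tracing GC, a shared record keeps `a` alive through `function2`.

```javascript
// Collect every object reachable from `root` by following property values.
function reachable(root, seen = new Set()) {
  if (root === null || typeof root !== "object" || seen.has(root)) return seen;
  seen.add(root);
  for (const value of Object.values(root)) reachable(value, seen);
  return seen;
}

const a = { name: "a" };
const b = { name: "b" };

// Layout 1: one scope record shared by both functions.
const sharedScope = { a, b };
const shared1 = { scope: sharedScope };
const shared2 = { scope: sharedScope };

// Layout 2: each function gets a record holding only what it references.
const separate1 = { scope: { a } };
const separate2 = { scope: { b } };

console.log(reachable(shared2).has(a));   // true: function2 keeps a alive
console.log(reachable(separate2).has(a)); // false: a dies with function1
```
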
| deredede wrote:
| The language specification says to use a single scope
| (see olliej's detailed comment). You could change it to
| use separate scopes here but it won't solve the more
| generic issue of imprecise captures, just this specific
| case.
| kaba0 wrote:
| You are not exempt from conservative approximation even with
| manual memory management. E.g. a map is a trivial example where
| you can't always know which elements will be used during the
| lifetime of the program (and only those have to be kept alive).
| pizlonator wrote:
| Yeah totally. No silver bullets.
| lxe wrote:
| I got confused here, because in the timers example, I assumed
| that without calling cancelDemo/clearTimeout, we cannot assume
| that the timer's function is no longer callable. Heck, even WITH
| clearTimeout... without understanding its implementation, we
| cannot assume it's no longer callable by some internal system.
|
| I think you should call cancelDemo() in the article to show that
| yes, even when we assume that the reference to the closure is
| cleaned up, the allocation persists.
|
|     (() => {
|       // Assumed implementation of setTimeout
|       // in which we cannot make assumptions that fn
|       // is inaccessible after the first invocation
|       const timers = [];
|       function mySetTimeout(fn) {
|         fn();
|         // retain the function for some reason
|         // maybe because of timers/intervals implementation detail
|         timers.push(fn);
|         const id = timers.length - 1;
|         return id
|       }
|       function myClearTimeout(id) {
|         // this makes it so fn has no more references
|         // but it still gets retained
|         const fn = timers.splice(id, 1);
|         // For good measure
|         delete fn;
|       }
|       function demo() {
|         const bigArrayBuffer = new ArrayBuffer(100_000_000);
|         const id = mySetTimeout(() => {
|           console.log(bigArrayBuffer.byteLength);
|         }, 1000);
|         return () => myClearTimeout(id);
|       }
|       cancelDemo = demo();
|       // Even when calling clearTimeout,
|       // bigArrayBuffer is still allocated,
|       // Which is the crux of the article
|       cancelDemo();
|       // Now it's actually deallocated, as
|       // shown in the article
|       delete cancelDemo;
|     })();
| olliej wrote:
| Ok, so there have been a lot of comments on what causes this,
| which are not correct, and a few that are pretty close. Here is
| the actual explanation of what is happening and what is causing
| the leak.
|
| For context, I used to be on TC39 (the ecmascript standards
| committee) and spent many many years working on JSC, and
| specifically working on the GC and closure modeling.
|
| First off: this is not due to eval. In ES3.1 or ES5 (alas I can't
| recall which) we (tc39) clarified the semantics of eval, to only
| evaluate in the containing scope if it is called directly -
| essentially turning it into a pseudo operator (implementations
| today generally implement a direct eval as `if (target function
| == real eval function) { do eval } else { call the function }`).
| Calling eval in any way other than `eval(<expression>)` will not
| invoke the scope capturing behavior of eval (this is a strict
| requirement to allow fast access to non-local variables).
|
| The function being reported as exhibiting the bad/unexpected
| behavior in the post is:
|
|     function demo() {
|       const bigArrayBuffer = new ArrayBuffer(100_000_000);
|       const id = setTimeout(/* timeout closure */ () => {
|         console.log(bigArrayBuffer.byteLength);
|       }, 1000);
|       return /* cleanup closure */ () => clearTimeout(id);
|     }
|
| If we were to follow the spec language fairly explicitly, the
| behavior of this function is (eliding exact semantics of
| everything other than creation of function objects and closures):
|
|     1. enter the function
|     2. env = create an empty lexical environment object (I may
|        use "activation" by accident because that was the spec
|        language when I was first working on JS engines)
|        a) set the parent scope of env to the internal scope
|           reference of the callee (in this case because demo is
|           a global function this will be the global object)
|        b) add a property "bigArrayBuffer" to env, setting the
|           value to undefined
|        c) add a property "id" to env, setting the value to
|           undefined
|     3. evaluate `new ArrayBuffer(100_000_000)` and assign the
|        result to the "bigArrayBuffer" property of env
|     4. Construct a function object for the timeout closure, and
|        set its internal scope reference to *env* (i.e. capture
|        the containing scope)
|     5. call setTimeout passing the function from (4) and 1000 as
|        arguments, and assign the result to the "id" property on
|        the env object
|     6. construct the cleanup closure, and set the internal scope
|        property to env
|
| The result of this is that we end up with the following set of
| objects:
|
|     globalObject = {.....}
|     demo = Function { @scope: globalObject }
|     <demo_env> (not directly exposed anywhere) = LexicalEnvironment {
|       @scope: demo.@scope,
|       bigArrayBuffer: big array,
|       id: number
|     }
|     <timeout closure> = Function { @scope: demo_env }
|     <cleanup closure> = Function { @scope: demo_env }
|
| At which point you can see as long as either closure is live, the
| reference to bigArrayBuffer is reachable and therefore kept
| alive.
|
| Now, I was confused about this report originally as I know JSC at
| least does do free var analysis (and I can't imagine v8 doesn't,
| not sure about SM these days) to reduce false captures, because I
| had not properly read the example code, and was like "why is this
| being kept alive", but having actually read the code properly and
| written out the above it's hopefully very obvious to everyone
| now.
|
| The language semantics of JS mean that all closures in a given
| scope chain share that scope chain, which means if one closure
| captures a variable, then all closures will end up keeping that
| capture alive, and there is not a lot the JS engine can do to
| limit that.
|
| There are some steps that could be taken to mitigate or reduce
| this, but doing that kind of flow analysis can become expensive
| and a real issue JS engines have is that the overwhelming
| majority of JS runs a tiny number of times, and is extremely
| latency sensitive (this is why JSC has put so much effort into
| parsing + interpreter perf), and any real data flow analysis is
| too expensive for such code, and by the time code is hot enough
| to have warranted that kind of analysis the overall program state
| has got to a point where you cannot retroactively remove closure
| references, so they remain.
|
| Now something that you _could_ try as a developer in this kind of
| scenario would be to use let, or a scoped let, to reduce the
| sharing of scopes, e.g.
|
|     function demo() {
|       let id;
|       {
|         let bigArrayBuffer = new ArrayBuffer(100_000_000);
|         id = setTimeout(/* timeout closure */ () => {
|           console.log(bigArrayBuffer.byteLength);
|         }, 1000);
|       }
|       return /* cleanup closure */ () => clearTimeout(id);
|     }
|
| which might resolve this issue, in this particular kind of case.
|
| In principle an engine could introduce logic to try to track
| exactly how many live closures reference a captured variable, but
| this is also tricky as you could easily end up with something
| like:
|
|     function f() {
|       let x = new GiantObject;
|       return (a) => {
|         if (a) return (g) => { g(x) }
|         return (g) => { g(null); }
|       }
|     }
|     y = f()           // y needs to keep x alive
|     y = y(some value) // you get a new closure which
|                       // may or may not be the one referencing
|                       // x.
|
| This is something you _could_ support, but there's a lot of
| complexity to ensuring correct behavior and maintaining
| performance in all the common cases, and it's possibly just not
| worth it given the JS capturing model.
|
| There are also a few things you could do that would likely be
| relatively easy/low cost from a JS engine that would remove some
| cases of excessive capture, but they'd still just be helping
| super trivial cases like this reduced example code, not
| necessarily any actual real world examples.
| jtbandes wrote:
| Chrome bug tracking this issue (since 2013):
| https://issues.chromium.org/issues/41070945
___________________________________________________________________
(page generated 2024-07-30 23:00 UTC)