[HN Gopher] Debugging a memory leak in a Clojure service
___________________________________________________________________
Debugging a memory leak in a Clojure service
Author : whiteros_e
Score : 43 points
Date : 2024-09-05 17:03 UTC (1 days ago)
(HTM) web link (charanvasu.com)
(TXT) w3m dump (charanvasu.com)
| Sarkie wrote:
| 9 times out of 10 it's the classloader and a newInstance call on
| every request.
| ayewo wrote:
| Interesting article.
|
| 1. I'm having a bit of trouble parsing this paragraph:
|
| > _The reason eval loads a new classloader every time is
| justified as dynamically generated classes cannot be garbage
| collected as long as the classloader is referencing to them. In
| this case, single classloader evaluating all the forms and
| generating new classes can lead to the generated class not being
| garbage collected._
|
| _To avoid this, a new classloader is being created every time,
| this way once the evaluation is done. The classloader will no
| longer be reachable and all it's dynamically loaded class._
|
| It sounds like the solution they adopted was to instantiate a
| brand new classloader each time a dynamic class is evaluated,
| rather than use a singleton classloader for the app's lifetime.
| whiteros_e wrote:
| Dynamically generated classes cannot be GC'd until their
| classloader becomes unreachable. If eval reused an existing
| classloader, we would eventually exhaust metaspace and hit an
| OutOfMemoryError (Metaspace; PermGen space before Java 8).
|
| The initial Clojure implementation checked for an already
| created classloader and tried to reuse it. That code has since
| been commented out.
|
| Link to the code in the compiler:
| https://github.com/clojure/clojure/blob/clojure-1.11.0/src/j...
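
The reachability argument above can be sketched in plain Java. This is a minimal illustration, not Clojure's compiler code: it uses java.lang.reflect.Proxy as a stand-in for eval's generated classes, and the names ThrowawayLoaderDemo, freshLoader, generateIn and generateAndForget are made up for the example. Each generated class is owned by the loader passed in, so dropping a throwaway loader makes its classes eligible for unloading.

```java
import java.lang.ref.WeakReference;
import java.lang.reflect.Proxy;

final class ThrowawayLoaderDemo {

    // A new, empty loader whose only job is to own dynamically generated classes.
    static ClassLoader freshLoader() {
        return new ClassLoader(ThrowawayLoaderDemo.class.getClassLoader()) {};
    }

    // Stand-in for eval (an assumption of this sketch): ask the JDK to
    // generate a class at runtime and define it inside `loader`.
    static Class<?> generateIn(ClassLoader loader) {
        Object proxy = Proxy.newProxyInstance(
                loader, new Class<?>[] {Runnable.class}, (self, method, params) -> null);
        return proxy.getClass();
    }

    // Generate a class in a throwaway loader and keep only a weak reference to it.
    static WeakReference<Class<?>> generateAndForget() {
        return new WeakReference<>(generateIn(freshLoader()));
    }

    public static void main(String[] args) throws InterruptedException {
        WeakReference<Class<?>> ref = generateAndForget();
        // System.gc() is only a hint, so retry a few times and report either outcome.
        for (int i = 0; i < 10 && ref.get() != null; i++) {
            System.gc();
            Thread.sleep(50);
        }
        System.out.println(ref.get() == null
                ? "generated class was unloaded along with its loader"
                : "generated class still loaded (GC has not unloaded it yet)");
    }
}
```

Had the loader been stored in a long-lived field instead, every class it defined would stay reachable through it, which is the situation the comment describes for a shared classloader.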
| aardvark179 wrote:
| These days there is some machinery in the JVM for creating
| truly anonymous classes that can be garbage collected
| (https://docs.oracle.com/en/java/javase/22/docs/api/java.base......)),
| but they are trickier to generate as they can't have
| static fields (you don't have a class name so have no way to
| refer to them) etc.
| henning wrote:
| If you can go from ~60ms p99 response times to ~45 from reduced
| garbage collection, that means GC has a major impact on user-
| perceptible performance on your application and proves that it is
| an extremely expensive operation that should be carefully
| managed. If you have a modern microservices Kubernetes blah blah
| bullshit setup, this fraud detection service is probably only one
| part of a chain of service calls that occurs during common user
| operations at this company. How much of the time users wait for a
| few hundred bytes of actual text to load on screen is spent
| waiting for multiple cloud instances to GC?
|
| The only way to eliminate its massive cost is to code the way
| game programmers in managed languages do and not generate any
| garbage, in which case GC doesn't even help you very much.
|
| What should be hard about app scalability and performance is
| scaling up the database and dealing with fundamental difficulties
| of distributed systems. What is actually hard in practice is
| dealing with the infinite tower of janky bullshit the Clean Code
| Uncle Bob people have forced us to build which creates things
| like massive GC overhead that is impossible to eliminate without
| totally rewriting or redesigning the app.
| whiteros_e wrote:
| I've read somewhere that because these getter/setter patterns
| are so common, JVM authors had to optimise their JIT to detect
| and inline them.
|
| Related discussion on SO:
| https://stackoverflow.com/questions/37109924/if-getter-sette...
| mikmoila wrote:
| There is nothing special in getters and setters, the runtime
| sees them as methods and may optimize them as it'd do for any
| other methods.
| roenxi wrote:
| This article showcases 2 harder-to-articulate features of
| Clojure:
|
| 1) Digging into Clojure library source code is unsettlingly
| easy. Clojure's core implementation has 2 layers - a pure Clojure
| layer (which is remarkably terse, readable and interesting) and a
| Java layer (which is more verbose). RT (Runtime) happens to be
| one of the main parts of the Java layer. The experience of
| looking into a clojure.core function and finding a 2-10 line
| implementation is normal.
|
| 2) Code maintenance is generally pretty easy. In this case the
| answer was "don't use eval" and I've had a lot of good
| experiences where the answer to a performance problem is
| similarly basic. The language tends to be responsible about using
| resources.
| MBlume wrote:
| The article makes it sound like the system was using eval
| (probably on a per-request basis, not just on start-up), and also
| like ceasing to use eval was pretty trivial once they realized
| eval was the problem. I'd be curious why they were using eval and
| what they were able to do instead.
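
The thread does not say what replaced eval, so the following is only one common pattern, sketched under assumptions: do the expensive "compile" step once per distinct input and reuse the result on later requests. The RuleCache class, the "gt:100" rule format and the compile counter are all hypothetical; Java lambdas stand in for evaluated forms and do not reproduce Clojure's class generation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

final class RuleCache {
    // Counts how often the expensive step runs, so the effect of caching is visible.
    static final AtomicInteger compileCount = new AtomicInteger();

    private static final Map<String, IntPredicate> CACHE = new ConcurrentHashMap<>();

    // Hypothetical stand-in for a per-request eval: turn "gt:100" or "lt:5"
    // into a predicate over an integer amount.
    static IntPredicate compile(String rule) {
        compileCount.incrementAndGet();
        String[] parts = rule.split(":");
        int threshold = Integer.parseInt(parts[1]);
        switch (parts[0]) {
            case "gt": return amount -> amount > threshold;
            case "lt": return amount -> amount < threshold;
            default: throw new IllegalArgumentException("unknown rule: " + rule);
        }
    }

    // Per-request entry point: compiles each distinct rule at most once.
    static boolean check(String rule, int amount) {
        return CACHE.computeIfAbsent(rule, RuleCache::compile).test(amount);
    }

    public static void main(String[] args) {
        for (int amount = 0; amount < 1000; amount++) {
            check("gt:100", amount);
        }
        System.out.println("requests: 1000, compiles: " + compileCount.get());
    }
}
```

Whether something like this applies depends on why eval was being called in the first place, which is exactly the open question in the comment above.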
| NightMKoder wrote:
| If your clojure pods are getting OOMKilled, you have a
| misconfigured JVM. The code (e.g. eval or not) mostly doesn't
| matter.
|
| If you have an actual memory leak in a JVM app what you want is
| an exception called java.lang.OutOfMemoryError. This means the
| heap is full and has no space for new objects even after a GC
| run.
|
| An OOMKilled means the JVM attempted to allocate memory from the
| OS but the OS doesn't have any memory available. The kernel then
| immediately kills the process. The problem is that the JVM at the
| time thinks that _it should be able to allocate memory_ - i.e.
| it's not trying to garbage collect old objects - it's just
| calling malloc for some unrelated reason. It never gets a chance
| to say "man I should clear up some space cause I'm running out".
| The JVM doesn't know the cgroup memory limit.
|
| So how do you convince the JVM that it really shouldn't be using
| that much memory? It's...complicated. The big answer is -Xmx but
| there's a ton more flags that matter (-Xss, -XX:MaxMetaspaceSize,
| etc). Folks think that -XX:+UseContainerSupport fixes this whole
| thing, but it doesn't; there's no magic bullet. See
| https://ihor-mutel.medium.com/tracking-jvm-memory-issues-on-...
| for a good discussion.
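
To see why -Xmx alone does not bound the process, it can help to list what the JVM itself reports. The sketch below uses the standard java.lang.management API to print the heap limit next to the heap and non-heap memory pools; the class name JvmMemoryReport and its helper methods are invented for the example. Thread stacks (-Xss) and other native allocations do not appear as pools at all, so the container sees more memory than this report adds up to.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class JvmMemoryReport {

    // The heap ceiling the JVM will respect (what -Xmx controls).
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Names of the memory pools of one kind (HEAP or NON_HEAP) in this JVM.
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.printf("max heap: %d MiB%n", maxHeapBytes() >> 20);
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // pool is no longer valid
            }
            // getMax() returns -1 when no limit is defined for the pool.
            String max = usage.getMax() < 0 ? "unbounded" : (usage.getMax() >> 20) + " MiB";
            System.out.printf("%-9s %-35s used=%d MiB max=%s%n",
                    pool.getType(), pool.getName(), usage.getUsed() >> 20, max);
        }
    }
}
```

Pool names and counts differ between JVM implementations and garbage collectors, so the output is best read as a checklist of regions to budget for rather than a full accounting.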
| pwagland wrote:
| This is one of the areas where OpenJ9 does things a lot better
| than HotSpot. OpenJ9 uses one memory pool for _everything_,
| HotSpot has a dozen different memory pools for different
| purposes. This makes it much harder to tune HotSpot in
| containers.
___________________________________________________________________
(page generated 2024-09-06 23:01 UTC)