[HN Gopher] Debugging a memory leak in a Clojure service
       ___________________________________________________________________
        
       Debugging a memory leak in a Clojure service
        
       Author : whiteros_e
       Score  : 43 points
       Date   : 2024-09-05 17:03 UTC (1 day ago)
        
 (HTM) web link (charanvasu.com)
 (TXT) w3m dump (charanvasu.com)
        
       | Sarkie wrote:
       | 9 times out of 10 it's the classloader and a newInstance call on
       | every request.
        
       | ayewo wrote:
       | Interesting article.
       | 
       | 1. I'm having a bit of trouble parsing this paragraph:
       | 
       | > _The reason eval loads a new classloader every time is
       | justified as dynamically generated classes cannot be garbage
       | collected as long as the classloader is referencing to them. In
       | this case, single classloader evaluating all the forms and
       | generating new classes can lead to the generated class not being
       | garbage collected._
       | 
       |  _To avoid this, a new classloader is being created every time,
       | this way once the evaluation is done. The classloader will no
       | longer be reachable and all it's dynamically loaded class._
       | 
       | It sounds like the solution they adopted was to instantiate a
       | brand new classloader each time a dynamic class is evaluated,
       | rather than use a singleton classloader for the app's lifetime.
        
         | whiteros_e wrote:
         | Dynamic classes cannot be GC'd until their classloader is no
         | longer referenced. In this case, if eval reused an existing
         | classloader, we would end up exhausting metaspace and hitting
         | an OutOfMemoryError (Metaspace; PermGen on pre-Java 8 JVMs).
         | 
         | The initial Clojure implementation checked for an already
         | created classloader and tried to reuse it. They have since
         | commented out the code that did it.
         | 
         | Link to the code in the compiler:
         | https://github.com/clojure/clojure/blob/clojure-1.11.0/src/j...
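
The coupling between a generated class and its defining loader can be sketched in Java. This is a minimal illustration, not Clojure's actual eval path: `generateIn` and `freshLoader` are hypothetical helper names, and a `java.lang.reflect.Proxy` class stands in for a class generated by eval. Unlike eval, `Proxy` caches one class per loader, so the sketch only shows which loader owns each generated class.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

final class LoaderLifetimeSketch {
    // Hypothetical stand-in for "eval": asks the JVM to generate a class at
    // runtime (a dynamic proxy) and define it in the given loader.
    static Class<?> generateIn(ClassLoader loader) {
        InvocationHandler noop = (proxy, method, args) -> null;
        Object instance =
            Proxy.newProxyInstance(loader, new Class<?>[] {Runnable.class}, noop);
        return instance.getClass();
    }

    // A throwaway loader, analogous to the fresh loader created per eval.
    static ClassLoader freshLoader() {
        return new ClassLoader(LoaderLifetimeSketch.class.getClassLoader()) {};
    }

    public static void main(String[] args) {
        ClassLoader shared = freshLoader();
        Class<?> a = generateIn(shared);
        Class<?> b = generateIn(freshLoader());
        // The generated class is owned by the loader it was defined in.
        System.out.println("a owned by shared loader: " + (a.getClassLoader() == shared));
        // A different loader yields a different Class object.
        System.out.println("a and b are distinct classes: " + (a != b));
    }
}
```

A class keeps a reference to its loader and the loader keeps its classes, so a class defined in a throwaway loader can be unloaded only once that loader, the class, and all of its instances are unreachable. When that actually happens is up to the GC.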
        
           | aardvark179 wrote:
           | These days there is some machinery in the JVM for creating
           | truly anonymous classes that can be garbage collected
           | (https://docs.oracle.com/en/java/javase/22/docs/api/java.base......),
           | but they are trickier to generate as they can't have
           | static fields (you don't have a class name so have no way to
           | refer to them) etc.
        
       | henning wrote:
       | If you can go from ~60ms p99 response times to ~45 from reduced
       | garbage collection, that means GC has a major impact on user-
       | perceptible performance on your application and proves that it is
       | an extremely expensive operation that should be carefully
       | managed. If you have a modern microservices Kubernetes blah blah
       | bullshit setup, this fraud detection service is probably only one
       | part of a chain of service calls that occurs during common user
       | operations at this company. How much of the time users wait for a
       | few hundred bytes of actual text to load on screen is spent
       | waiting for multiple cloud instances to GC?
       | 
       | The only way to eliminate its massive cost is to code the way
       | game programmers in managed languages do and not generate any
       | garbage, in which case GC doesn't even help you very much.
       | 
       | What should be hard about app scalability and performance is
       | scaling up the database and dealing with fundamental difficulties
       | of distributed systems. What is actually hard in practice is
       | dealing with the infinite tower of janky bullshit the Clean Code
       | Uncle Bob people have forced us to build which creates things
       | like massive GC overhead that is impossible to eliminate without
       | totally rewriting or redesigning the app.
        
         | whiteros_e wrote:
         | I've read somewhere that because of these getter setter
         | patterns, JVM authors had to optimise their JIT to detect and
         | inline those.
         | 
         | Related discussion on SO:
         | https://stackoverflow.com/questions/37109924/if-getter-sette...
        
           | mikmoila wrote:
           | There is nothing special in getters and setters, the runtime
           | sees them as methods and may optimize them as it'd do for any
           | other methods.
        
       | roenxi wrote:
       | This article showcases 2 harder-to-articulate features of
       | Clojure:
       | 
       | 1) Digging into Clojure library source code is unsettlingly
       | easy. Clojure's core implementation has 2 layers - a pure Clojure
       | layer (which is remarkably terse, readable and interesting) and a
       | Java layer (which is more verbose). RT (Runtime) happens to be
       | one of the main parts of the Java layer. The experience of
       | looking into a clojure.core function and finding a 2-10 line
       | implementation is normal.
       | 
       | 2) Code maintenance is generally pretty easy. In this case the
       | answer was "don't use eval" and I've had a lot of good
       | experiences where the answer to a performance problem is
       | similarly basic. The language tends to be responsible about using
       | resources.
        
       | MBlume wrote:
       | The article makes it sound like the system was using eval
       | (probably on a per-request basis, not just on start-up), and also
       | like ceasing to use eval was pretty trivial once they realized
       | eval was the problem. I'd be curious why they were using eval and
       | what they were able to do instead.
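
The thread does not say what replaced eval. One common pattern, sketched here purely as an assumption, is to compile each distinct rule once and cache the result so that per-request work never reaches the compiler. `CompiledRuleCache`, the `gt:`/`lt:` rule syntax, and the `compilations` counter are all hypothetical and exist only for illustration.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntPredicate;

final class CompiledRuleCache {
    private final Map<String, IntPredicate> cache = new ConcurrentHashMap<>();

    // Counts how often the expensive step runs, for demonstration only.
    final AtomicInteger compilations = new AtomicInteger();

    // Hypothetical "compiler" for rules such as "gt:100" or "lt:5".
    // It stands in for whatever expensive step eval performed.
    private IntPredicate compile(String source) {
        compilations.incrementAndGet();
        String[] parts = source.split(":");
        int bound = Integer.parseInt(parts[1]);
        switch (parts[0]) {
            case "gt": return value -> value > bound;
            case "lt": return value -> value < bound;
            default: throw new IllegalArgumentException("unknown rule: " + source);
        }
    }

    // Per-request entry point: compiles on first use, reuses afterwards.
    boolean matches(String source, int value) {
        return cache.computeIfAbsent(source, this::compile).test(value);
    }

    public static void main(String[] args) {
        CompiledRuleCache rules = new CompiledRuleCache();
        for (int request = 0; request < 1000; request++) {
            rules.matches("gt:100", request);
        }
        System.out.println("compilations: " + rules.compilations.get());
    }
}
```

With this shape, the number of compiled artifacts grows with the number of distinct rules rather than with the number of requests.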
        
       | NightMKoder wrote:
       | If your Clojure pods are getting OOMKilled, you have a
       | misconfigured JVM. The code (e.g. eval or not) mostly doesn't
       | matter.
       | 
       | If you have an actual memory leak in a JVM app what you want is
       | an exception called java.lang.OutOfMemoryError. This means the
       | heap is full and has no space for new objects even after a GC
       | run.
       | 
       | An OOMKilled means the JVM attempted to allocate memory from the
       | OS but the OS doesn't have any memory available. The kernel then
       | immediately kills the process. The problem is that the JVM at the
       | time thinks that _it should be able to allocate memory_ - i.e.
       | it's not trying to garbage collect old objects - it's just
       | calling malloc for some unrelated reason. It never gets a chance
       | to say "man I should clear up some space cause I'm running out".
       | The JVM doesn't know the cgroup memory limit.
       | 
       | So how do you convince the JVM that it really shouldn't be using
       | that much memory? It's...complicated. The big answer is -Xmx but
       | there are a ton more flags that matter (-Xss,
       | -XX:MaxMetaspaceSize, etc). Folks think that
       | -XX:+UseContainerSupport fixes this whole thing, but it doesn't;
       | there's no magic bullet. See
       | https://ihor-mutel.medium.com/tracking-jvm-memory-issues-on-...
       | for a good discussion.
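
To see why -Xmx alone does not bound the process, it can help to list what the JVM itself reports. The sketch below uses the standard `java.lang.management` API. The class name `JvmMemoryBudget` and its helper methods are hypothetical, and the pool names printed vary by JVM and garbage collector.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;

final class JvmMemoryBudget {
    // Upper bound the JVM will try to use for the heap (what -Xmx controls).
    static long heapLimitBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    // Number of memory pools of the given type reported by this JVM.
    static long countPools(MemoryType type) {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
            .filter(pool -> pool.getType() == type)
            .count();
    }

    public static void main(String[] args) {
        System.out.printf("heap limit: %d MiB%n", heapLimitBytes() >> 20);
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() returns -1 when the pool has no defined limit.
            String max = usage.getMax() < 0
                ? "unbounded"
                : (usage.getMax() >> 20) + " MiB";
            System.out.printf("%-16s %-34s used=%d MiB max=%s%n",
                pool.getType(), pool.getName(), usage.getUsed() >> 20, max);
        }
    }
}
```

Non-heap pools such as Metaspace and the code cache sit outside the -Xmx limit, and thread stacks, direct buffers, and other native allocations do not appear in this list at all, which is part of why the container's view of the process can exceed the heap limit.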
        
         | pwagland wrote:
         | This is one of the areas where OpenJ9 does things a lot better
         | than HotSpot. OpenJ9 uses one memory pool for _everything_,
         | HotSpot has a dozen different memory pools for different
         | purposes. This makes it much harder to tune HotSpot in
         | containers.
        
       ___________________________________________________________________
       (page generated 2024-09-06 23:01 UTC)