[HN Gopher] Lichess on Scala3 - Help needed
       ___________________________________________________________________
        
       Lichess on Scala3 - Help needed
        
       Author : dzogchen
       Score  : 360 points
       Date   : 2022-12-05 14:18 UTC (8 hours ago)
        
 (HTM) web link (lichess.org)
 (TXT) w3m dump (lichess.org)
        
       | CraigJPerry wrote:
        | At a cursory glance at the thread dumps I see the time spent in
        | the server compiler (C2) is very high. It might be worth
        | exploring whether that's expected.
       | 
        | Alternatively, if you just want a quick way to rule it out, you
        | could turn off tiered compilation; I had to google the option:
        | -XX:-TieredCompilation
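        | 
        | For reference, the invocation would look something like this
        | (the jar name is just a placeholder):

```shell
# Run with tiered compilation disabled, so hot methods compile
# directly at the top tier (C2) instead of passing through C1 first.
java -XX:-TieredCompilation -jar app.jar
```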
        
          | agilob wrote:
          | C1 vs C2 is the best bet here IMO, barring an ordinary memory
          | leak of course.
          | 
          | The problem with the JVM is that these compilation stages are
          | difficult to monitor and tune. They require logging, parsing
          | the logs, and a trial-and-error approach.
          | 
          | OP, it's definitely worth logging tiered compilation with
          | `-XX:+PrintCompilation`, `-XX:+PrintInlining` and
          | `-XX:+LogCompilation`. If the code cache turns out to be
          | filled, emptied and filled again, try increasing
          | `-XX:ReservedCodeCacheSize`.
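          | 
          | Combined on one command line (note that `-XX:+PrintInlining`
          | and `-XX:+LogCompilation` are diagnostic flags and need
          | unlocking first; the jar name is a placeholder):

```shell
# Print each JIT compilation event and inlining decision, and write a
# detailed hotspot log that tools like JITWatch can read.
java -XX:+PrintCompilation \
     -XX:+UnlockDiagnosticVMOptions \
     -XX:+PrintInlining \
     -XX:+LogCompilation \
     -jar app.jar
```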
        
          | xxs wrote:
          | Turning off tiered compilation would switch off C1, not C2.
          | Tiered compilation is mostly an issue if the generated code
          | stays at C1 (the dumb compiler) and never gets promoted to
          | C2... or if there is an OSR (on-stack replacement) bug.
          | 
          | Side note: "java -XX:+PrintFlagsFinal -version" shows all
          | available flags and their values, including the ones chosen
          | by ergonomics.
        
       | michaelt wrote:
        | _> I don't think the garbage collector is to blame. During the
        | worst times, when the JVM almost maxed out 90 CPUs, the GC was
        | only using 3s of CPU time per minute._
       | 
       | The graph linked only shows 3s in _young gen GC_ - you should
       | check the time spent in the _old gen_ GC too.
       | 
       | You can get loads of time spent in GC even without running out of
       | memory - running out of other resources like file handles or
       | network connections will also trigger a full GC in the hopes of
       | freeing some up.
       | 
       | If you've got 1000 file handles available, one process that uses
       | 100 per second and doesn't leak, and another that uses 1 per hour
       | and leaks it, after 900 hours everything will look fine, then
       | after 1000 hours you'll run out - and the symptoms will manifest
       | in the first process, not the second.
       | 
       | Admittedly, there's a text file of jstats output linked which
       | doesn't show any full GC happening, so maybe this is nothing...
        
         | zug_zug wrote:
         | Would a way to test this hypothesis be to switch garbage-
         | collecting algorithm?
        
           | agilob wrote:
            | G1GC should be perfectly suitable, but you can try ZGC or
            | Shenandoah. Both carry some memory "penalty": each object
            | takes a bit more memory, so with the change you will see a
            | 5-15% increase in memory usage. This would be normal.
           | 
           | G1GC should be fine, so enable GC logging, analyse them
           | using:
           | 
           | https://www.tagtraum.com/gcviewer.html
           | 
           | or
           | 
           | https://gceasy.io/
           | 
            | You can log GC for a long time with rotation, something
            | like this https://dzone.com/articles/try-to-avoid-
            | xxusegclogfilerotati...
            | 
            | When analysing the GC logs you will be looking at the
            | tenured generation, so you must add this flag:
            | -XX:+PrintTenuringDistribution
           | 
           | You would be looking for GC major and GC evacuation times.
           | Major GCs are STW and take more time, so overall the goal is
           | to eliminate them as much as possible.
           | 
           | I usually find it very important to have charts of heap
           | usage. Overall heap allocated (all regions) vs complete heap
           | size. The same for non-heap area.
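            | 
            | On JDK 9+ the legacy rotation flags from that article were
            | replaced by unified logging; a rotated GC log including
            | tenuring distribution can be configured roughly like this
            | (paths are illustrative):

```shell
# Unified GC logging (JDK 9+): all GC events plus per-age tenuring
# info, with timestamps, rotated across 5 files of 20 MB each.
java -Xlog:'gc*,gc+age=trace:file=/var/log/app/gc.log:time,uptime:filecount=5,filesize=20m' \
     -jar app.jar
```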
        
             | randoglando wrote:
             | STW GC is only for ParallelGC right? Does not apply to
             | G1GC, ZGC or Shenandoah?
        
               | eonwe wrote:
                | G1 GC definitely has STW pauses.
        
       | nonethewiser wrote:
       | How much does lichess make from donations? Does it really cover
       | their $400k / year costs?
       | 
       | https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk...
        
         | Scarblac wrote:
         | Yes, they have no other income.
        
       | 0xFFFE wrote:
        | It appears the problem existed before lila3 was deployed on
        | 11/22. If you look at the GC graph, the number of GC
        | cycles/minute kept increasing gradually starting on 11/10,
        | almost doubling by 11/21. The 11/22 deployment of lila3 reset
        | the graph, and since you have been restarting every day since,
        | we can't see more than a day's growth. My wild hunch is that a
        | code push on 11/10 caused a memory leak; worth checking in my
        | opinion.
        
         | nickspag wrote:
          | There were no deployments near 11/10, according to that
          | graph. They also say in the blog that scala2 could go two
          | weeks without a restart, so they're presumably aware of some
          | sort of memory management issue and just okay with it.
        
         | reidrac wrote:
         | I haven't looked at the graphs, but they updated netty to
         | 4.1.85.Final on 2022-11-10.
         | 
          | https://netty.io/news/2022/11/09/4-1-85-Final.html mentions
          | that a potential memory leak was fixed. It includes a debug
          | log warning of the leak, but enabling debug logging may be a
          | no-no.
          | 
          | Perhaps it could be worth reverting that and seeing if
          | there's any change. Sounds cheap and harmless.
        
           | agilob wrote:
           | Netty is a super complex and also super poorly documented
           | project. I did weeks of exploration and found these JVM args:
           | -Dio.netty.allocator.numDirectArenas=0
           | -Dio.netty.noPreferDirect=true
           | -Dio.netty.noUnsafe=true
           | 
            | work pretty well for us on any HTTP server. They slightly
            | reduce performance, as the HTTP pool is weaker, but they
            | decrease memory usage by 25-40%; they also eliminated one
            | of a few memory leaks in an older version of Keycloak.
        
       | ulularem wrote:
       | Is the JVM running out of code/JIT cache space?
       | 
        | We had a similar problem recently in vanilla Java: monolith-
        | like servers would seem to eventually "go bad" for no
        | discernible reason after hours/days of uptime. It turned out we
        | needed to increase the code-cache size the JVM was allowed to
        | use.
       | 
       | Though supposedly the cache should have kept working (in the LRU-
       | like way many caches operate), we observed that formerly-fine
       | parts of the affected servers also seemed to be unreasonably
       | slow, as if the whole code/JIT caching behavior had been disabled
       | completely.
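          | 
          | One cheap way to watch for this from inside the process is to
          | poll the code-heap memory pools over JMX; a minimal sketch
          | (pool names vary slightly by JDK version):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;

public class CodeCacheUsage {
    public static void main(String[] args) {
        // On segmented code caches (JDK 9+) the pools are named
        // "CodeHeap 'non-nmethods'", "CodeHeap 'profiled nmethods'",
        // etc.; older JVMs expose a single "Code Cache" pool.
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            String name = pool.getName();
            if (name.startsWith("CodeHeap") || name.equals("Code Cache")) {
                MemoryUsage u = pool.getUsage();
                long max = u.getMax(); // may be -1 if no limit is set
                System.out.printf("%s: %d used of %d bytes%n",
                        name, u.getUsed(), max);
            }
        }
    }
}
```

Exporting these values periodically would give the kind of code-cache
chart the thread is asking for.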
        
         | cldellow wrote:
         | The jstat output also shows that CCS is at 99.21% usage, which
         | could support this theory.
         | 
         | At a previous company, we operated some Scala services and ran
         | into an issue like this. I forget if it was triggered from a
         | JDK update or a Scala update. Scala (at least the 2.x series)
         | generated a lot of classes, so there was a lot of memory
         | pressure on this part of the system.
         | 
         | IIRC, we increased the limit and it resolved the issue.
         | 
         | I feel like there's a flag you can pass to see JIT invocations,
         | which might also help validate this as the problem.
         | -XX:+PrintCompilation maybe?
         | 
         | Caveats: It's been a long time, so my memory may be faulty, and
         | this may not apply any longer.
        
           | xxs wrote:
           | >The jstat output also shows that CCS is at 99.21% usage,
           | which could support this theory.
           | 
           | If code cache has run out, the process effectively runs in
           | interpreted mode. I'd wonder however how they would have so
           | much code. Still, they should just run a profiler or any java
           | monitoring tool.
        
             | munificent wrote:
              | _> I'd wonder however how they would have so much code._
             | 
             | They probably didn't author that much code, but Scala 3 may
             | have many language features that its compiler desugars to
             | large amounts of generated code.
             | 
             | (For example, when C# added support for anonymous
             | functions, they initially did so by compiling each lambda
             | to a generated class with a field for each local variable
             | that the function closed over.)
        
               | xxs wrote:
                | They still need to call the functions quite a bit to
                | trigger even C1 compilation. Normally Java doesn't
                | compile immediately, but only after enough iterations.
                | 
                | Also, if Scala really does that, it would eat the
                | inline budget, effectively killing performance.
        
           | agilob wrote:
           | >The jstat output also shows that CCS is at 99.21% usage,
           | which could support this theory.
           | 
            | If the code cache is full, the cache sweeper will have
            | more work to do and will run slower. This cache is a
            | linked list (AFAIR), so any attempt to install more
            | C1/C2-optimised code will cause the allocator to traverse
            | the list, try to find enough contiguous space, and fail,
            | triggering an attempt to defragment the space and
            | occasionally evicting some less frequently used compiled
            | code.
            | 
            | This process isn't your normal GC process. If it runs out
            | of memory and nothing can be removed, you're at a plateau
            | of how fast code can execute, but your JVM is consuming
            | more CPU cycles, which means you're losing overall
            | performance. There is no OOM error here; it all fails and
            | slows down silently. No exceptions, no logs, nothing but
            | wasted CPU cycles. This is one of the worst aspects of the
            | JVM to monitor and tune. I don't know of any
            | Prometheus-like metric exporters that can be used here,
            | unlike for GC activity or stack/heap metrics.
           | 
            | As I stated in another comment, try
            | 
            | `-XX:+PrintCompilation`, `-XX:+PrintInlining` and
            | `-XX:+LogCompilation`. If the code cache turns out to be
            | filled, try increasing `-XX:ReservedCodeCacheSize`. This
            | comes out of your non-heap area.
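            | 
            | The bump might look like this (512m is an illustrative
            | figure; the default with tiered compilation is 240 MB on
            | recent JDKs):

```shell
# Reserve more room for JIT-compiled code, and print code-cache
# statistics when the JVM exits.
java -XX:ReservedCodeCacheSize=512m -XX:+PrintCodeCache -jar app.jar
```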
        
         | ulularem wrote:
         | To expand a bit more here (and I'll probably slightly flub some
         | terminology): the JVM of course does just-in-time hotspot
         | compilation of frequently-executed code. When it does so, it
         | caches the optimized code for later executions. There are
         | values set for how many times code is executed before it's
         | optimized in this way.
         | 
         | A monolith-like server (contains a lot of code to cache),
         | that's been up for hours/days (has eventually triggered many
         | disparate code paths enough to kick-off the optimization) in a
         | new version of a language (speculatively may contain more code
         | to optimize than the previous version) all seem like factors
         | pointing to this potential situation.
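            | 
            | The "values set for how many times code is executed" are
            | tunable flags; with tiered compilation the relevant
            | thresholds can be inspected like this:

```shell
# Show the invocation/backedge counts at which methods move to the
# C1-with-profiling tier (Tier3) and the C2 tier (Tier4).
java -XX:+PrintFlagsFinal -version | grep -E 'Tier[34].*Threshold'
```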
        
           | Floegipoky wrote:
           | Do you think that was driven by the code cache repeatedly
           | filling and flushing? And bumping the size resolved it
           | because it wasn't filling anymore? Did you experiment with
           | disabling flushing?
           | 
           | My team is scaling up a Java service and we don't have a lot
           | of institutional experience operating such systems, so I'm
           | really interested in JVM tuning "case studies".
        
           | whizzter wrote:
           | This definitely sounds like a plausible explanation if the
           | codegen for Scala3 has changed to enable more dynamism and
           | that in turn makes some functions/patterns far larger.
           | 
            | It seems the place to inform them might be their email or
            | the Discord, so join up and suggest it there?
           | 
           | Gonna be fun to hear the solution later.
        
         | michaelt wrote:
         | There's a JVM option - _-XX:+UnlockDiagnosticVMOptions
         | -XX:+LogCompilation_ which does what it sounds like.
         | 
         | The output might be instructive if you want to monitor the
         | compilation behaviour more closely. And there are tools like
         | JITWatch if you want to get into even more detail.
        
       | tmd83 wrote:
       | Does anyone know what they are generating
       | jvm.memory.allocation.sum data from?
        
       | ketzo wrote:
       | Off-topic response, but I had no idea Lichess did community-
       | driven development like this. Very cool.
        
         | Sebguer wrote:
         | It's open-source, so why wouldn't it?
        
           | ketzo wrote:
           | well for one, I didn't know Lichess was open source.
           | 
           | But for another, even for an open-source project, I love the
           | super public-but-detailed approach to this bug solving. It
           | almost feels like a bounty.
        
        
         | netsuitebitch wrote:
         | It's a bit rare.
        
       | morsch wrote:
       | Try looking at it with async-profiler. This can be done in
       | production. I discovered performance problems in unexpected (and
       | some expected) places with it in the past. It may be more helpful
       | if it's your application code that's to blame, though, less so if
       | it's the JVM itself.
       | 
       | https://github.com/jvm-profiling-tools/async-profiler
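        | 
        | A typical one-off capture against a live process looks like
        | this (async-profiler 2.x; the paths and the pid 12345 are
        | illustrative):

```shell
# 60-second CPU profile of a running JVM, rendered as a flame graph.
./profiler.sh -d 60 -e cpu -f /tmp/cpu-flame.html 12345

# Same capture, but sampling allocation sites instead of CPU.
./profiler.sh -d 60 -e alloc -f /tmp/alloc-flame.html 12345
```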
        
         | kelseyfrog wrote:
         | I'm surprised to find this comment at the bottom of the comment
         | section. My advice, like yours, is to profile performance
         | issues before getting into the hypothesis-change-measure loop.
         | Like you said, it quickly bifurcates the problem space into
         | application code and the JVM and eliminates entire classes of
         | performance problems.
         | 
         | It's important to point out too that while most people assume
         | cpu profiling, the JVM and hence most profiling tools also have
         | memory profiling which can be helpful in diagnosing problems
         | (especially ones related to GC). I hope the querent ends up
         | profiling and shares the results. It would be both fun and
         | productive to investigate the results collaboratively.
        
           | morsch wrote:
           | The cool thing is that async-profiler is lightweight enough
           | to run it in production, even against a server that's already
           | under heavy load (in my experience, ymmv). Oh and it's free.
           | We have a cron job running it three times a day (alloc and
           | cpu, both).
           | 
           | Of course the report is rudimentary compared to heavyweight
           | profilers.
        
         | Matthias247 wrote:
          | +1. Continuous profiling should help figure out the root
          | cause of such issues. I have only used an in-house Java tool
          | for this so far, so I don't have a recommendation for a
          | specific one. But the linked one looks reasonable, and there
          | might be other tools too.
         | 
         | It might not even require continuous profiling. One alternative
         | approach is to capture some minutes of profiling data after
         | startup (when CPU usage looks good), and some minutes after the
         | CPU spikes occur. Then compare those two and check for major
         | differences.
        
       | gavinray wrote:
       | I assume the author tried asking in the Scala discord?
       | 
       | https://discord.com/invite/scala
       | 
       | Most of the core community hangs out there, and some of the folks
       | that contribute to the compiler too. If there's someone that
       | knows, they're either on the Discord or the forums.
        
         | ajkjk wrote:
          | Hm, why assume that? I would never have thought to do that.
        
           | gavinray wrote:
           | I dunno, every language/tech thing has a Discord nowadays
           | (mostly). If I need help with something I usually go there
           | first.
           | 
           | Even for niche things like D language, the Discord is the
           | place to go. I learned Scala 3 and D mostly through the help
           | of folks from their Discord servers guiding me.
        
             | jraph wrote:
             | The use of Discord in free software communities never
             | ceases to depress and disappoint me. I hope this fad dies
             | soon. [1]
             | 
              | I would not assume that maintainers of an open source
              | project have the reflex of jumping into Discord to ask
              | questions.
             | 
             | Anyway, D has a forum and an IRC channel. Scala has a
             | Discourse.
             | 
             | The nice thing about forums is that problems and solutions
             | are searchable by other people in the future. I learned
             | many things by myself thanks to this. I would not like to
             | live in a world where you need to engage with people all
             | the time, asking the same questions again and again, to use
             | some tool or some programming language.
             | 
             | [1] https://drewdevault.com/2022/03/29/free-software-free-
             | infras...
        
               | vips7L wrote:
               | Discord is also searchable by other people in the future.
               | Forum channels are exactly the use case you describe:
               | https://support.discord.com/hc/en-
               | us/articles/6208479917079-...
        
               | kevincox wrote:
               | Searchable within Discord. It can't be found by general
               | search engines and can't be archived. It's a walled
               | garden.
        
               | InGoodFaith wrote:
               | You might be interested in Linen to make your discord
               | (and slack) searchable outside of the walled garden (can
               | also use to archive too).
               | 
               | https://github.com/linen-dev/linen.dev
               | 
               | https://news.ycombinator.com/item?id=31494908
        
        
       ___________________________________________________________________
       (page generated 2022-12-05 23:01 UTC)