[HN Gopher] Lichess on Scala3 - Help needed
___________________________________________________________________
Lichess on Scala3 - Help needed
Author : dzogchen
Score : 360 points
Date : 2022-12-05 14:18 UTC (8 hours ago)
(HTM) web link (lichess.org)
(TXT) w3m dump (lichess.org)
| CraigJPerry wrote:
| At a cursory glance of the thread dumps I see the time spent in
| the server compiler (C2) is very high. It might be worth
| exploring whether that's expected.
|
| Alternatively, if you just want a quick way to rule it out, you
| could turn off tiered compilation. I had to google the option:
| -XX:-TieredCompilation
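|
| If you want to confirm where the compiler time goes before
| flipping flags, a cheap first check might be something like
| this (a rough sketch; <pid> stands in for the lila JVM's
| process id):
|
|     # cumulative JIT stats: methods compiled, failures, time spent
|     jstat -compiler <pid>
|
|     # per-thread CPU; look for "C2 CompilerThread*" near the top
|     top -H -p <pid>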
| agilob wrote:
| C1 vs C2 is the best shot here IMO, barring a normal memory
| leak of course.
|
| The problem with the JVM is that these compilation stages are
| difficult to monitor and tune. They require logging, parsing
| the logs, and a trial-and-error approach.
|
| OP, it's definitely worth logging tiered compilation with
| `-XX:+PrintCompilation`, `-XX:+PrintInlining` and
| `-XX:+LogCompilation`. If the code cache turns out to be
| filled, emptied and filled again, try increasing
| `ReservedCodeCacheSize`.
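|
| Something like this on the lila command line would be my
| starting point (a rough sketch, not verified against their
| setup; PrintInlining and LogCompilation are diagnostic flags,
| so the unlock flag has to come first):
|
|     java -XX:+UnlockDiagnosticVMOptions \
|          -XX:+PrintCompilation \
|          -XX:+PrintInlining \
|          -XX:+LogCompilation \
|          ...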
| xxs wrote:
| Turning off tiered compilation would switch off C1, not C2.
| Tiered compilation is mostly an issue if the generated code
| remains at C1 (the dumb compiler) and never gets promoted to
| C2... or if there is an OSR (on-stack replacement) issue, i.e.
| a bug.
|
| Side note: "java -XX:+PrintFlagsFinal -version" shows all
| available flags and their values, including the ones set by
| ergonomics.
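|
| For example, to see how the code cache is currently sized
| (just a sketch; the exact flag names vary a bit between JDK
| versions):
|
|     java -XX:+PrintFlagsFinal -version | grep -i codecache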
| michaelt wrote:
| _> I don't think the garbage collector is to blame. During the
| worst times, when the JVM almost maxed out 90 CPUs, the GC was
| only using 3s of CPU time per minute._
|
| The graph linked only shows 3s in _young gen GC_ - you should
| check the time spent in the _old gen_ GC too.
|
| You can get loads of time spent in GC even without running out of
| memory - running out of other resources like file handles or
| network connections will also trigger a full GC in the hopes of
| freeing some up.
|
| If you've got 1000 file handles available, one process that uses
| 100 per second and doesn't leak, and another that uses 1 per hour
| and leaks it, after 900 hours everything will look fine, then
| after 1000 hours you'll run out - and the symptoms will manifest
| in the first process, not the second.
|
| Admittedly, there's a text file of jstat output linked which
| doesn't show any full GC happening, so maybe this is nothing...
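|
| A quick way to double-check live (a sketch; <pid> is the JVM
| process id, and the 5000 means one sample every 5 seconds):
|
|     jstat -gcutil <pid> 5000
|
| The O column is old gen occupancy, and FGC/FGCT are the full
| GC count and total time. If FGCT keeps climbing while the
| young gen columns look calm, the old gen is the place to look.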
| zug_zug wrote:
| Would a way to test this hypothesis be to switch the garbage
| collection algorithm?
| agilob wrote:
| G1GC should be completely suitable, but you can try ZGC or
| Shenandoah. Both have some memory "penalty": each object
| takes a bit more memory, so with the change you will see a
| 5-15% increase in memory usage. This would be normal.
|
| G1GC should be fine, so enable GC logging and analyse the
| logs using:
|
| https://www.tagtraum.com/gcviewer.html
|
| or
|
| https://gceasy.io/
|
| You can log GC for a long time with rotation, something like
| this: https://dzone.com/articles/try-to-avoid-
| xxusegclogfilerotati...
|
| When analysing the GC logs you will be looking at the tenured
| generation, so you must add this flag:
| -XX:+PrintTenuringDistribution
|
| You will be looking at major GC and evacuation times. Major
| GCs are STW and take more time, so the overall goal is to
| eliminate them as much as possible.
|
| I usually find it very important to have charts of heap
| usage: overall heap allocated (all regions) vs. complete heap
| size, and the same for the non-heap area.
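|
| If lila runs on a recent JDK (9+), the old Print* GC flags
| have been replaced by unified logging; the rough equivalent
| with rotation would be something like this (a sketch, with
| paths and sizes as placeholders):
|
|     -Xlog:gc*,gc+age=trace:file=gc-%t.log:time,uptime,level,tags:filecount=10,filesize=50m
|
| gc+age=trace is the unified-logging replacement for
| -XX:+PrintTenuringDistribution, and filecount/filesize give
| you the rotation. gceasy parses this format; check your
| GCViewer version for unified-log support.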
| randoglando wrote:
| STW GC is only for ParallelGC, right? It does not apply to
| G1GC, ZGC or Shenandoah?
| eonwe wrote:
| G1 GC totally has STW pauses.
| nonethewiser wrote:
| How much does lichess make from donations? Does it really cover
| their $400k / year costs?
|
| https://docs.google.com/spreadsheets/d/1Si3PMUJGR9KrpE5lngSk...
| Scarblac wrote:
| Yes, they have no other income.
| [deleted]
| 0xFFFE wrote:
| It appears the problem existed before lila3 was deployed on
| 11/22. If you look at the GC graph, the number of GC
| cycles/minute kept increasing gradually starting on 11/10,
| almost doubling by 11/21. The 11/22 deployment of lila3 reset
| the graph, and since you have been restarting every day since
| then, we can't see more than a day of growth. My wild hunch is
| a code push on 11/10 causing a memory leak; worth checking in
| my opinion.
| nickspag wrote:
| There were no deployments near 11/10, according to that graph.
| They also say in the blog that scala2 could go for two weeks
| without a restart, so they're presumably aware of some sort of
| memory management issue and just okay with it.
| reidrac wrote:
| I haven't looked at the graphs, but they updated Netty to
| 4.1.85.Final on 2022-11-10.
|
| https://netty.io/news/2022/11/09/4-1-85-Final.html mentions
| that a potential memory leak was fixed. It includes a debug log
| warning of the leak, but enabling debug logging may be a no-no.
|
| Perhaps it could be worth reverting that and seeing if there's
| any change. Sounds cheap and harmless.
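|
| Netty also has its own buffer leak detector, which reports
| leaks at error level, so it doesn't require turning on debug
| logging globally (a sketch; "paranoid" tracks every buffer and
| costs real CPU, "advanced" samples and is safer to run in
| production):
|
|     -Dio.netty.leakDetection.level=advanced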
| agilob wrote:
| Netty is a super complex and also super poorly documented
| project. I did weeks of exploration and found that these JVM
| args:
| -Dio.netty.allocator.numDirectArenas=0
| -Dio.netty.noPreferDirect=true
| -Dio.netty.noUnsafe=true
|
| work pretty well for us on any HTTP server. They slightly
| reduce performance since the buffer pooling is weaker, but
| they decrease memory usage by 25-40%, and they also eliminated
| one of a few memory leaks in an older version of Keycloak.
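|
| If you want to see what those flags actually do to off-heap
| usage, native memory tracking gives you a before/after view (a
| rough sketch; NMT itself adds a few percent of overhead):
|
|     java -XX:NativeMemoryTracking=summary ...
|     jcmd <pid> VM.native_memory summary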
| ulularem wrote:
| Is the JVM running out of code/JIT cache space?
|
| We had a similar problem recently in vanilla Java: monolith-like
| servers would seem to eventually "go bad" for no discernible
| reason after hours/days of uptime. It turned out we needed to
| increase the code cache size the JVM was allowed to use.
|
| Though supposedly the cache should have kept working (in the LRU-
| like way many caches operate), we observed that formerly-fine
| parts of the affected servers also seemed to be unreasonably
| slow, as if the whole code/JIT caching behavior had been disabled
| completely.
| cldellow wrote:
| The jstat output also shows that CCS is at 99.21% usage, which
| could support this theory.
|
| At a previous company, we operated some Scala services and ran
| into an issue like this. I forget if it was triggered from a
| JDK update or a Scala update. Scala (at least the 2.x series)
| generated a lot of classes, so there was a lot of memory
| pressure on this part of the system.
|
| IIRC, we increased the limit and it resolved the issue.
|
| I feel like there's a flag you can pass to see JIT invocations,
| which might also help validate this as the problem.
| -XX:+PrintCompilation maybe?
|
| Caveats: It's been a long time, so my memory may be faulty, and
| this may not apply any longer.
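|
| Either way, the code cache can be checked directly rather than
| inferred (a sketch; I think jstat's CCS column is actually the
| compressed class space rather than the code cache, so the
| direct check is more reliable):
|
|     jcmd <pid> Compiler.codecache
|     jcmd <pid> Compiler.queue
|
| and if it really is full, raising the limit is a one-liner,
| e.g. -XX:ReservedCodeCacheSize=512m (the value is just
| something to experiment with).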
| xxs wrote:
| >The jstat output also shows that CCS is at 99.21% usage,
| which could support this theory.
|
| If the code cache has run out, the process effectively runs in
| interpreted mode. I'd wonder however how they would have so
| much code. Still, they should just run a profiler or any Java
| monitoring tool.
| munificent wrote:
| _> I'd wonder however how they would have so much code._
|
| They probably didn't author that much code, but Scala 3 may
| have many language features that its compiler desugars to
| large amounts of generated code.
|
| (For example, when C# added support for anonymous
| functions, they initially did so by compiling each lambda
| to a generated class with a field for each local variable
| that the function closed over.)
| xxs wrote:
| They would still need to use the functions quite a bit to
| trigger C1. Normally Java doesn't compile immediately,
| only after enough iterations.
|
| Also, if Scala really does that, it'd eat the inlining
| budget, effectively killing performance.
| agilob wrote:
| >The jstat output also shows that CCS is at 99.21% usage,
| which could support this theory.
|
| If the code cache is full, the cache sweeper will have more
| work to do and will run slower. The cache is a linked list
| (afair), so any attempt to store another piece of C1/C2
| optimised code will cause the allocator to traverse the list,
| try to find enough contiguous space, and fail, triggering an
| attempt to make room in the space by occasionally removing
| some less frequently used compiled code.
|
| This process isn't your normal GC process. If it runs out of
| memory and nothing can be removed, you're at a plateau of how
| fast code can execute, but your JVM is consuming more CPU
| cycles, which means you're losing overall performance. There
| is no OOM error here; it all fails and slows down silently.
| No exceptions, no logs, nothing but wasted CPU cycles. This
| is one of the worst aspects of the JVM to monitor and tune. I
| don't know of any Prometheus-like metric exporters that can
| be used here, the way there are for GC activity or stack/heap
| metrics.
|
| As I stated in another comment, try
|
| `-XX:+PrintCompilation`, `-XX:+PrintInlining` and
| `-XX:+LogCompilation`. If the code cache turns out to be
| full, try increasing `ReservedCodeCacheSize`. This comes out
| of your non-heap area.
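|
| One more cheap data point, since lila is being restarted daily
| anyway (a sketch): -XX:+PrintCodeCache makes the JVM print the
| code cache layout and how full each segment got when the
| process exits.
|
|     -XX:+PrintCodeCache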
| ulularem wrote:
| To expand a bit more here (and I'll probably slightly flub some
| terminology): the JVM of course does just-in-time hotspot
| compilation of frequently-executed code. When it does so, it
| caches the optimized code for later executions. There are
| thresholds for how many times code must be executed before
| it's optimized in this way.
|
| A monolith-like server (a lot of code to cache) that's been up
| for hours/days (has eventually hit many disparate code paths
| often enough to kick off the optimization), running a new
| version of a language (which speculatively may generate more
| code to optimize than the previous version) - these all seem
| like factors pointing to this potential situation.
| Floegipoky wrote:
| Do you think that was driven by the code cache repeatedly
| filling and flushing? And bumping the size resolved it
| because it wasn't filling anymore? Did you experiment with
| disabling flushing?
|
| My team is scaling up a Java service and we don't have a lot
| of institutional experience operating such systems, so I'm
| really interested in JVM tuning "case studies".
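|
| (For anyone wanting to try this: the flag that controls it is,
| I believe, UseCodeCacheFlushing, which is on by default, so
| disabling it would look like the line below. Treat it as an
| experiment, not a recommendation.)
|
|     -XX:-UseCodeCacheFlushing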
| whizzter wrote:
| This definitely sounds like a plausible explanation if the
| codegen for Scala3 has changed to enable more dynamism and
| that in turn makes some functions/patterns far larger.
|
| It seems the place to inform them might be their email or the
| Discord, so join up and suggest it there?
|
| Gonna be fun to hear the solution later.
| michaelt wrote:
| There's a JVM option - _-XX:+UnlockDiagnosticVMOptions
| -XX:+LogCompilation_ which does what it sounds like.
|
| The output might be instructive if you want to monitor the
| compilation behaviour more closely. And there are tools like
| JITWatch if you want to get into even more detail.
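|
| For JITWatch specifically, something along these lines should
| produce the hotspot log it expects (a sketch; LogFile just
| controls where the compilation XML ends up):
|
|     java -XX:+UnlockDiagnosticVMOptions \
|          -XX:+LogCompilation \
|          -XX:LogFile=hotspot.log \
|          ...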
| tmd83 wrote:
| Does anyone know what they are generating
| jvm.memory.allocation.sum data from?
| ketzo wrote:
| Off-topic response, but I had no idea Lichess did community-
| driven development like this. Very cool.
| Sebguer wrote:
| It's open-source, so why wouldn't it?
| ketzo wrote:
| well for one, I didn't know Lichess was open source.
|
| But for another, even for an open-source project, I love the
| super public-but-detailed approach to this bug solving. It
| almost feels like a bounty.
| netsuitebitch wrote:
| It's a bit rare.
| morsch wrote:
| Try looking at it with async-profiler. This can be done in
| production. I discovered performance problems in unexpected (and
| some expected) places with it in the past. It may be more helpful
| if it's your application code that's to blame, though, less so if
| it's the JVM itself.
|
| https://github.com/jvm-profiling-tools/async-profiler
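|
| For reference, a typical invocation against a running JVM
| looks roughly like this (a sketch based on the async-profiler
| README; -e alloc instead of cpu gives allocation profiling):
|
|     ./profiler.sh -e cpu -d 60 -f /tmp/lila-cpu.html <pid>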
| kelseyfrog wrote:
| I'm surprised to find this comment at the bottom of the comment
| section. My advice, like yours, is to profile performance
| issues before getting into the hypothesis-change-measure loop.
| Like you said, it quickly bifurcates the problem space into
| application code and the JVM and eliminates entire classes of
| performance problems.
|
| It's important to point out too that while most people assume
| cpu profiling, the JVM and hence most profiling tools also have
| memory profiling which can be helpful in diagnosing problems
| (especially ones related to GC). I hope the querent ends up
| profiling and shares the results. It would be both fun and
| productive to investigate the results collaboratively.
| morsch wrote:
| The cool thing is that async-profiler is lightweight enough
| to run in production, even against a server that's already
| under heavy load (in my experience, YMMV). Oh, and it's free.
| We have a cron job running it three times a day (alloc and
| cpu, both).
|
| Of course the report is rudimentary compared to heavyweight
| profilers.
| Matthias247 wrote:
| +1. Continuous profiling should help figure out the root
| cause of such issues. I have only used an in-house tool for
| Java for this so far, so I don't have a recommendation for a
| specific one. But the linked one looks reasonable, and there
| might be other tools too.
|
| It might not even require continuous profiling. One alternative
| approach is to capture some minutes of profiling data after
| startup (when CPU usage looks good), and some minutes after the
| CPU spikes occur. Then compare those two and check for major
| differences.
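|
| For the comparison itself, collapsed-stack output diffs nicely
| (a sketch; difffolded.pl and flamegraph.pl come from Brendan
| Gregg's FlameGraph repo, not from async-profiler):
|
|     ./profiler.sh -e cpu -d 120 -o collapsed -f before.txt <pid>
|     ./profiler.sh -e cpu -d 120 -o collapsed -f after.txt <pid>
|     ./difffolded.pl before.txt after.txt | ./flamegraph.pl > diff.svg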
| gavinray wrote:
| I assume the author tried asking in the Scala discord?
|
| https://discord.com/invite/scala
|
| Most of the core community hangs out there, and some of the folks
| that contribute to the compiler too. If there's someone that
| knows, they're either on the Discord or the forums.
| ajkjk wrote:
| Hm, why assume that? I would never have thought to do that.
| gavinray wrote:
| I dunno, every language/tech thing has a Discord nowadays
| (mostly). If I need help with something I usually go there
| first.
|
| Even for niche things like D language, the Discord is the
| place to go. I learned Scala 3 and D mostly through the help
| of folks from their Discord servers guiding me.
| jraph wrote:
| The use of Discord in free software communities never
| ceases to depress and disappoint me. I hope this fad dies
| soon. [1]
|
| I would not risk assuming that the maintainers of an open
| source project have the reflex of jumping into Discord to ask
| questions.
|
| Anyway, D has a forum and an IRC channel. Scala has a
| Discourse.
|
| The nice thing about forums is that problems and solutions
| are searchable by other people in the future. I learned
| many things by myself thanks to this. I would not like to
| live in a world where you need to engage with people all
| the time, asking the same questions again and again, to use
| some tool or some programming language.
|
| [1] https://drewdevault.com/2022/03/29/free-software-free-
| infras...
| vips7L wrote:
| Discord is also searchable by other people in the future.
| Forum channels are exactly the use case you describe:
| https://support.discord.com/hc/en-
| us/articles/6208479917079-...
| kevincox wrote:
| Searchable within Discord. It can't be found by general
| search engines and can't be archived. It's a walled
| garden.
| InGoodFaith wrote:
| You might be interested in Linen to make your Discord
| (and Slack) searchable outside of the walled garden (you can
| also use it for archiving).
|
| https://github.com/linen-dev/linen.dev
|
| https://news.ycombinator.com/item?id=31494908
___________________________________________________________________
(page generated 2022-12-05 23:01 UTC)