[HN Gopher] Uber Engineering Tricks of the Trade: Tuning JVM Mem...
___________________________________________________________________
Uber Engineering Tricks of the Trade: Tuning JVM Memory for Large-
Scale Services
Author : commandlinefan
Score : 93 points
Date : 2021-05-18 16:41 UTC (6 hours ago)
(HTM) web link (eng.uber.com)
(TXT) w3m dump (eng.uber.com)
| SilurianWenlock wrote:
| I thought the concurrent GC algorithms (zing/azul, g1?) avoided
| these stop the world events by using concurrency?
| dharmab wrote:
| As discussed in the link, these algorithms reduce but do not
| entirely eliminate STW.
| _old_dude_ wrote:
| Recent GC algorithms slow down the application, either by not
| scheduling coroutines that allocate too much memory or by
| forcing them to do the marking or the evacuation (helping the
| GC) when running (or both).
|
| With that, STW pause time does not depend on the heap size or
| on the root set size (the stack size). In practice, it means
| pause < 1ms; at that point the OS becomes the bottleneck, not
| the GC.
|
| So latency is good but throughput can be reduced by 30%.
| crummybowley wrote:
| Just use Go.
| asguy wrote:
| "Our Apache Hadoop-based data platform ingests hundreds of
| petabytes of analytical data with minimum latency and stores it
| in a data lake built on top of the Hadoop Distributed File System
| (HDFS)."
|
| "Did you just tell me to go fuck myself, Bob?"
|
| "I believe I did."
| bpodgursky wrote:
| This isn't especially strange or complicated. I guess you're
| just trying to be witty? But it comes across as pretty
| ignorant.
|
| Yeah if Uber was born today and didn't want to use any on-prem
| resources they'd use S3, but HDFS is the best alternative to
| object storage out there, and there's a huge ecosystem of tools
| around it. If you're running your own datacenter, there's not
| any serious alternative.
| pc86 wrote:
| I knew I recognized this from somewhere.
|
| http://howfuckedismydatabase.com/nosql/
|
| https://news.ycombinator.com/item?id=1636113
| suyash wrote:
| @Uber - have you considered GraalVM ?
| notdang wrote:
| Will it solve the problem? As I understand the same GC tuning
| will need to take place.
| throwaway7783 wrote:
| GP might've meant AOT
| cellularmitosis wrote:
| Graal removes the need for JIT but does it also remove the
| need for GC?
| chrisseaton wrote:
| Graal has better escape analysis and so produces less
| garbage, which is the very best kind of garbage collection!
|
| That might be what they meant.
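A minimal sketch of the kind of allocation that escape analysis targets, assuming a hypothetical `EscapeDemo` class and `Point` type that are not from the thread. The `Point` created in the loop is never stored or returned, so a JIT compiler may replace it with plain local values instead of allocating it on the heap. Whether a particular JVM or compiler actually does so is not guaranteed.

```java
public class EscapeDemo {
    // Hypothetical small value holder, used only for illustration.
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Each Point lives only inside one loop iteration: it is never
    // stored in a field, passed to another thread, or returned.
    // That makes it a candidate for scalar replacement by the JIT.
    static long sumCoordinates(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            sum += p.x + p.y;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println(sumCoordinates(1_000_000));
    }
}
```

On HotSpot, running this with and without `-XX:-DoEscapeAnalysis` while GC logging is enabled is one way to check whether the allocations are actually being eliminated.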
| _old_dude_ wrote:
| Tuning GCs like it was 2014 again.
|
| It's a little bit sad that they did not hire an expert.
| vajrabum wrote:
| I don't understand what you're trying to say here. Is the
| methodology they're using outdated? Or perhaps their approach
| seems naive? Or are you saying something else?
| _old_dude_ wrote:
| The whole section on how GCs work is far from accurate. It
| only describes how one GC, CMS, works, or rather worked, since
| it has been removed. The permanent generation has not existed
| since Java 8, etc.
|
| And I'm far from being an expert in GCs, but the basics are
| just wrong.
| viscanti wrote:
| Maybe they have to use some kind of legacy JVM library that
| doesn't work on Java 8? It looks like they're a GoLang
| shop, so I'm not sure why we'd assume their Java stuff is
| representative of anything other than what they have to do
| because some useful JVM library for mapping or something
| needs to be supported.
| _old_dude_ wrote:
| The problem is that Java 7 is 10 years old, it means that
| you are missing 10 years of optimization. For GCs that's
| a lot.
|
| GCs before ~2010 have been optimized for throughput more
| than for latency. Since then, you can choose.
|
| For 10 years now, G1 has let you set your maximum pause time,
| and you get a huge warning if it misses that target. ZGC and
| Shenandoah have been latency first, throughput second from the
| beginning.
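A small sketch, assuming a hypothetical `GcInfo` class, of how to check which collectors a running JVM is actually using and how much time they have accumulated. It relies only on the standard `java.lang.management` API. The collector names it reports vary by JVM and by the GC selected.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcInfo {
    // Returns the names of the collectors registered in this JVM.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Print the collection count and accumulated collection time
        // (in milliseconds) reported by each collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms total%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}
```

On Java 11 or later this can be launched directly, for example with `java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 GcInfo.java`. The 50 ms value is only an illustrative pause goal, and G1 treats it as a target rather than a guarantee.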
| erik_seaberg wrote:
| Yikes. Fortunately Java is a supported language at Uber,
| though some teams are more serious about it than others.
| Scala is also accepted because of Spark.
| viscanti wrote:
| I guess I misunderstood things then. It's difficult to
| reconcile why they'd be using a legacy version of Java
| that required these specific GC tunings if they weren't
| forced to use it because of libraries that couldn't be
| replicated or replaced in GoLang. Sounds like it's a much
| bigger mess there than what I had assumed (I guess I was
| giving them too much of the benefit of the doubt).
| vsto wrote:
| It is Hadoop that requires (mostly) Java 8 and Java 11
| (runtime only).
| https://cwiki.apache.org/confluence/display/HADOOP/Hadoop+Ja...
| StreamBright wrote:
| Using Hadoop like it was 2014 again too.
|
| I am surprised how much legacy inefficient crap is lingering
| around in companies like Uber.
| pavitheran wrote:
| What should be used instead of Hadoop DFS in 2021?
| throwaway7783 wrote:
| Oh, these cloud kids...
|
| Edit: instant downvotes. Okay, S3/GCS/Azure is the typical
| answer (egress costs be damned)
| StreamBright wrote:
| I guess the same was said about those non-mainframe kids
| 30 years ago. Tech gets cheaper and better. You can rent
| CPU time instead of buying a server. It is just that
| simple.
| ppf wrote:
| You mean, like a mainframe? Tech goes round in circles,
| more like.
| StreamBright wrote:
| Depends on multiple factors:
|
| - S3 or compatible would be the trivial choice for storage
|
| - if on-prem is a must there are multiple options,
| generally something with erasure codes (it is a game
| changer for storage)
|
| So far I have been using enterprise storage (that has some
| potential problems when mounted as nfs volumes), works for
| petabytes, already decouples storage from compute.
|
| More recently I was experimenting with MinIO. No conclusion
| so far.
|
| The problems are with Hadoop:
|
| - unfortunate design choices (namenode??)
|
| - extremely unfortunate implementation (I probably spent
| more time in the Hadoop codebase than any other, found many
| bugs, some I could fix, most I couldn't)
|
| I think I have migrated 10 PB worth of data infra away from
| Hadoop in the last 5 years, mostly to AWS, some to Azure.
| Average cost saving is between 10-30% yoy.
|
| Some comments point out the network cost. The reality is
| most companies collect a giant amount of data (ingress) and
| publish dashboards (egress). It makes cloud pretty viable.
|
| S3 is beating the shit out of HDFS in reliability and cost,
| even though most Hadoop shops spread the fud that it is
| slow. Same way these companies used to spread the fud that
| snappy is best for data compression.
|
| As of 2021 even the latest adopters (banks and insurance
| companies) use cloud. Maybe extremely few dogmatic
| companies remain in the onprem crowd. Even those will
| eventually give up.
| elmalto wrote:
| Did you move the compute layer to AWS as well? Did you
| see similar savings there as well for non-burst payloads?
| viscanti wrote:
| > I am surprised how much legacy inefficient crap is
| lingering around in companies like Uber.
|
| Maybe there are some JVM specific libraries they need for
| things like mapping that don't exist on GoLang? From reading
| their tech blog, it sounds like they're mostly a GoLang shop
| and they were apparently Python before that. So it seems like
| they're probably forced into using Java because some of the
| libraries they use aren't worth rewriting in-house to avoid
| having to use the JVM.
| StreamBright wrote:
| Don't get me wrong I have nothing against Java/JVM. I
| pretty much appreciate that tech. The criticism was
| specifically against Hadoop. I spent nearly 10 years on it.
| Luckily it is a stack that most companies are migrating away
| from.
| gher-shyu3i wrote:
| What do they migrate to if they need on-prem solutions?
| closeparen wrote:
| Well, throwing away and recreating all the data pipelines
| that have been written since 2014 would be pretty inefficient.
___________________________________________________________________
(page generated 2021-05-18 23:01 UTC)