[HN Gopher] Uber Engineering Tricks of the Trade: Tuning JVM Mem...
       ___________________________________________________________________
        
       Uber Engineering Tricks of the Trade: Tuning JVM Memory for Large-
       Scale Services
        
       Author : commandlinefan
       Score  : 93 points
       Date   : 2021-05-18 16:41 UTC (6 hours ago)
        
 (HTM) web link (eng.uber.com)
 (TXT) w3m dump (eng.uber.com)
        
       | SilurianWenlock wrote:
       | I thought the concurrent GC algorithms (zing/azul, g1?) avoided
       | these stop the world events by using concurrency?
        
         | dharmab wrote:
         | As discussed in the link, these algorithms reduce but do not
         | entirely eliminate STW.
        
           | _old_dude_ wrote:
           | Recent GC algorithms slow down the application, either by not
           | scheduling coroutines that allocate too much memory or by
           | forcing them to do the marking or the evacuation (helping the
           | GC) when running (or both).
           | 
           | With that, STW pause time does not depend on the heap size or
           | on the root set size (the stack size). In practice, that means
           | pauses under 1 ms; at that point the OS becomes the bottleneck,
           | not the GC.
           | 
           | So latency is good but throughput can be reduced by 30%.
        
       | crummybowley wrote:
       | Just use go.
        
       | asguy wrote:
       | "Our Apache Hadoop-based data platform ingests hundreds of
       | petabytes of analytical data with minimum latency and stores it
       | in a data lake built on top of the Hadoop Distributed File System
       | (HDFS)." "Did you just tell me to go fuck myself, Bob?" "I
       | believe I did."
        
         | bpodgursky wrote:
         | This isn't especially strange or complicated. I guess you're
         | just trying to be witty? But it comes across as pretty
         | ignorant.
         | 
         | Yeah if Uber was born today and didn't want to use any on-prem
         | resources they'd use S3, but HDFS is the best alternative to
         | object storage out there, and there's a huge ecosystem of tools
         | around it. If you're running your own datacenter, there's not
         | any serious alternative.
        
         | pc86 wrote:
         | I knew I recognized this from somewhere.
         | 
         | http://howfuckedismydatabase.com/nosql/
         | 
         | https://news.ycombinator.com/item?id=1636113
        
       | suyash wrote:
       | @Uber - have you considered GraalVM?
        
         | notdang wrote:
         | Will it solve the problem? As I understand the same GC tuning
         | will need to take place.
        
           | throwaway7783 wrote:
           | GP might've meant AOT
        
             | cellularmitosis wrote:
             | Graal removes the need for JIT, but does it also remove the
             | need for GC?
        
           | chrisseaton wrote:
           | Graal has better escape analysis and so produces less
           | garbage, which is the very best kind of garbage collection!
           | 
           | That might be what they meant.
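
To make the escape-analysis point concrete, here is a minimal Java sketch. The class `EscapeDemo` and its methods are hypothetical names made up for illustration. In the first method each `Point` never leaves the loop body, so a JIT compiler with escape analysis may be able to avoid the heap allocation altogether; in the second, each `Point` is published to a static field, so it escapes. Whether a given compiler (C2 or Graal) actually removes the allocation is not guaranteed and is not something this code can demonstrate by itself.

```java
class EscapeDemo {
    // Small immutable value holder (hypothetical, for illustration only).
    static final class Point {
        final int x;
        final int y;

        Point(int x, int y) {
            this.x = x;
            this.y = y;
        }
    }

    // Static field used to make objects escape in sumEscaping.
    static Point last;

    // Each Point is read only inside the loop body: it is not stored in a
    // field, returned, or passed to another method. It is therefore a
    // candidate for scalar replacement by a compiler that does escape analysis.
    static long sumNonEscaping(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            total += p.x + p.y;
        }
        return total;
    }

    // Each Point is written to a static field, so it is reachable from
    // outside the method and has to be a real heap object.
    static long sumEscaping(int n) {
        long total = 0;
        for (int i = 0; i < n; i++) {
            Point p = new Point(i, i + 1);
            last = p;
            total += p.x + p.y;
        }
        return total;
    }

    public static void main(String[] args) {
        // Both methods compute the same sum; only the allocation behaviour
        // the compiler is allowed to choose differs.
        System.out.println(sumNonEscaping(1000)); // prints 1000000
        System.out.println(sumEscaping(1000));    // prints 1000000
    }
}
```

Comparing the two under an allocation profiler is one way to check what a particular JVM actually does with them.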
        
       | _old_dude_ wrote:
       | Tuning GCs like it was 2014 again.
       | 
       | It's a little bit sad that they did not hire an expert.
        
         | vajrabum wrote:
         | I don't understand what you're trying to say here. Is the
         | methodology they're using outdated? Or perhaps their approach
         | seems naive? Or are you saying something else?
        
           | _old_dude_ wrote:
           | The whole section on how GCs work is far from accurate. It
           | only describes how one GC, CMS, works (or rather worked, since
           | it has been removed). The permanent generation has not existed
           | since Java 8, etc.
           | 
           | And I'm far from being an expert in GCs, but the basics are
           | just wrong.
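
One way to check the permanent-generation point on a given JVM is to list its memory pools through the standard `java.lang.management` API. The sketch below is only an illustration: the class name `MemoryPools` is made up, and the exact pool names depend on the JVM vendor, version, and selected collector. On HotSpot from Java 8 onward the list is expected to contain a `Metaspace` pool and no permanent generation pool.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.ArrayList;
import java.util.List;

class MemoryPools {
    // Returns the names of all memory pools the running JVM reports.
    static List<String> poolNames() {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            names.add(pool.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Output varies by JVM and collector, so no expected output is shown.
        for (String name : poolNames()) {
            System.out.println(name);
        }
    }
}
```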
        
             | viscanti wrote:
             | Maybe they have to use some kind of legacy JVM library that
             | doesn't work on Java 8? It looks like they're a GoLang
             | shop, so I'm not sure why we'd assume their Java stuff is
             | representative of anything other than what they have to do
             | because some useful JVM library for mapping or something
             | needs to be supported.
        
               | _old_dude_ wrote:
               | The problem is that Java 7 is 10 years old, which means
               | you are missing 10 years of optimization. For GCs that's
               | a lot.
               | 
               | GCs before ~2010 were optimized for throughput more
               | than for latency. Since then, you can choose.
               | 
               | For 10 years now, G1 has let you set a maximum pause time,
               | and you get a huge warning if it misses that target. ZGC and
               | Shenandoah have been latency first, throughput second from
               | the beginning.
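
For reference, the G1 pause-time target is set on HotSpot with `-XX:MaxGCPauseMillis=<ms>` (alongside `-XX:+UseG1GC`), and G1 treats it as a soft goal rather than a guarantee. ZGC and Shenandoah are selected with `-XX:+UseZGC` and `-XX:+UseShenandoahGC` on builds that include them. The following is a minimal sketch, with a made-up class name `GcReport`, that uses the standard `java.lang.management` API to show which collectors a JVM is actually running and how much time they have accumulated. It does not measure individual pause lengths.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcReport {
    // Builds one summary line per collector reported by the running JVM.
    // getCollectionCount/getCollectionTime return -1 when undefined.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(String.format("%s: collections=%d, totalTimeMs=%d",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Collector names and counts depend on the JVM and its flags.
        describeCollectors().forEach(System.out::println);
    }
}
```

Assuming Java 11 or later (for single-file source launch), it can be run as `java -XX:+UseG1GC -XX:MaxGCPauseMillis=50 GcReport.java`; the value 50 is an arbitrary example, not a recommendation.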
        
               | erik_seaberg wrote:
               | Yikes. Fortunately Java is a supported language at Uber,
               | though some teams are more serious about it than others.
               | Scala is also accepted because of Spark.
        
               | viscanti wrote:
               | I guess I misunderstood things then. It's difficult to
               | reconcile why they'd be using a legacy version of Java
               | that required these specific GC tunings if they weren't
               | forced to use it because of libraries that couldn't be
               | replicated or replaced in GoLang. Sounds like it's a much
               | bigger mess there than what I had assumed (I guess I was
               | giving them too much of the benefit of the doubt).
        
               | vsto wrote:
               | It is Hadoop that requires (mostly) Java 8 and Java 11
               | (runtime only). https://cwiki.apache.org/confluence/displ
               | ay/HADOOP/Hadoop+Ja...
        
         | StreamBright wrote:
         | Using Hadoop like it was 2014 again too.
         | 
         | I am surprised how much legacy inefficient crap is lingering
         | around in companies like Uber.
        
           | pavitheran wrote:
           | What should be used instead of Hadoop DFS in 2021?
        
             | throwaway7783 wrote:
             | Oh, these cloud kids...
             | 
             | Edit: instant downvotes. Okay, S3/GCS/Azure is the typical
             | answer (egress costs be damned)
        
               | StreamBright wrote:
               | I guess the same was said about those non-mainframe kids
               | 30 years ago. Tech gets cheaper and better. You can rent
               | CPU time instead of buying a server. It is just that
               | simple.
        
               | ppf wrote:
               | You mean, like a mainframe? Tech goes round in circles,
               | more like.
        
             | StreamBright wrote:
             | Depends on multiple factors:
             | 
             | - S3 or compatible would be the trivial choice for storage
             | 
             | - if on-prem is a must there are multiple options,
             | generally something with erasure codes (it is a game
             | changer for storage)
             | 
             | So far I have been using enterprise storage (that has some
             | potential problems when mounted as nfs volumes), works for
             | petabytes, already decouples storage from compute.
             | 
             | More recently I was experimenting with MinIO. No conclusion
             | so far.
             | 
             | The problems are with Hadoop:
             | 
             | - unfortunate design choices (namenode??)
             | 
             | - extremely unfortunate implementation (I probably spent
             | more time in the Hadoop codebase than any other, found many
             | bugs, some I could fix, most I couldn't)
             | 
             | I think I have migrated 10 PB worth of data infra away from
             | Hadoop in the last 5 years, mostly to AWS, some to Azure.
             | Average cost saving is between 10% and 30% YoY.
             | 
             | Some comments point out the network cost. The reality is
             | most companies collect a giant amount of data (ingress) and
             | publish dashboards (egress). It makes cloud pretty viable.
             | 
             | S3 is beating the shit out of HDFS in reliability and cost,
             | even though most Hadoop shops spread the FUD that it is
             | slow. In the same way, these companies used to spread the FUD
             | that Snappy is best for data compression.
             | 
             | As of 2021 even the latest adopters (banks and insurance
             | companies) use cloud. Maybe a very few dogmatic companies
             | remain in the on-prem crowd. Even those will eventually give
             | up.
        
               | elmalto wrote:
               | Did you move the compute layer to AWS as well? Did you
               | see similar savings there as well for non-burst payloads?
        
           | viscanti wrote:
           | > I am surprised how much legacy inefficient crap is
           | lingering around in companies like Uber.
           | 
           | Maybe there are some JVM specific libraries they need for
           | things like mapping that don't exist on GoLang? From reading
           | their tech blog, it sounds like they're mostly a GoLang shop
           | and they were apparently Python before that. So it seems like
           | they're probably forced into using Java because some of the
           | libraries they use aren't worth rewriting in-house to avoid
           | having to use the JVM.
        
             | StreamBright wrote:
             | Don't get me wrong I have nothing against Java/JVM. I
             | pretty much appreciate that tech. The criticism was
             | specifically against Hadoop. I spent nearly 10 years on it.
             | Luckily it is a stack that most companies are migrating
             | away from.
        
               | gher-shyu3i wrote:
               | What do they migrate to if they need on-prem solutions?
        
           | closeparen wrote:
           | Well, throwing away and recreating all the data pipelines
           | that have been written since 2014 would be pretty inefficient.
        
       ___________________________________________________________________
       (page generated 2021-05-18 23:01 UTC)