[HN Gopher] Ray: A Distributed Framework for Emerging AI Applications
       ___________________________________________________________________
        
       Ray: A Distributed Framework for Emerging AI Applications
        
       Author : mlerner
       Score  : 78 points
       Date   : 2021-07-04 16:24 UTC (6 hours ago)
        
 (HTM) web link (www.micahlerner.com)
 (TXT) w3m dump (www.micahlerner.com)
        
       | orf wrote:
       | How does this compare to Dask?
        
       | reubenbond wrote:
       | Given how Ray "provides [...] exactly-once semantics" for its
       | actors, you could draw similarities between it and workflow-as-
       | code frameworks such as https://temporal.io. The way that Ray
       | splits up actors and tasks looks similar to Temporal's Workflows
       | + Activities split: Workflows (Ray actors) contain orchestration
       | logic and have their method calls/results durably logged.
       | Activities (Ray tasks) perform the expensive computations and any
       | interaction with external systems and are not durably logged.
       | 
       | If you're in the .NET ecosystem or interested in distributed
       | systems in general, you may like Orleans
       | (https://github.com/dotnet/orleans), which I work on at
        | Microsoft. Orleans contributed the Virtual Actor model, which
        | other modern actor frameworks are starting to adopt, since it is
       | well suited for the hectic, failure-prone environment of
       | distributed systems (which those so-called Cloud Native Apps live
       | in). The Ray paper linked from the article
       | (https://www.usenix.org/system/files/osdi18-moritz.pdf) discusses
       | some similarities. Slight correction on the paper: it states that
       | "For message delivery, Orleans provides at-least-once [...]
       | semantics". It's at-most-once. At-least-once messaging semantics
       | (usually implemented via automatic retries) aren't ideal for
       | these kinds of systems, in my opinion.
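        | 
        | For the Ray side of that analogy, here is a minimal sketch with
        | the core `ray` Python API (the class and function names are
        | illustrative, not from the paper):
        | 
        |   import ray
        | 
        |   ray.init()
        | 
        |   @ray.remote
        |   class Orchestrator:
        |       # Actor: stateful, analogous to a Temporal Workflow.
        |       def __init__(self):
        |           self.results = []
        | 
        |       def record(self, r):
        |           self.results.append(r)
        |           return len(self.results)
        | 
        |   @ray.remote
        |   def expensive_step(x):
        |       # Task: stateless computation, analogous to an Activity.
        |       return x * x
        | 
        |   orch = Orchestrator.remote()
        |   refs = [expensive_step.remote(i) for i in range(10)]
        |   for r in ray.get(refs):
        |       ray.get(orch.record.remote(r))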
        
         | mlerner wrote:
          | Neat, thanks for sharing! A friend mentioned Orleans, so I'm
          | queueing up the paper (found this one:
          | https://www.microsoft.com/en-us/research/publication/orleans...)
          | for future reading.
        
       | phissenschaft wrote:
       | Great work and kudos to the Ray team! It's definitely a fresh
       | look with a lot of lessons learned from previous generations
        | (e.g., Spark).
       | 
       | There are a few nice features I wish Ray would eventually get to.
       | 
        | On the user experience side, it would be nice to have task-level
        | logs: often it's easier for users to reason at the task level,
        | especially when the task is a facade that triggers other
        | complicated library/subprocess calls.
       | 
        | For the scheduler, it would be great to see more native support
        | for sharded/bundled/partitioned tasks and for dynamic work
        | rebalancing along the lines of
        | https://cloud.google.com/blog/products/gcp/no-shard-left-beh...
        
       | robertnishihara wrote:
        | Hi all, I'm one of the authors of Ray. Thanks for all the
        | comments and discussion! To add to it, I'll mention a few
        | conceptual things that have changed since we wrote the paper.
       | 
       | *Emphasis on the library ecosystem*
       | 
       | A lot of our focus is on building an ecosystem of libraries on
       | top of Ray (much, but not all, of the focus is on machine
       | learning libraries). Some of these libraries are built natively
       | on top of Ray such as Ray Tune for scaling hyperparameter search
       | (http://tune.io), RLlib for scaling reinforcement learning
       | (http://rllib.io), Ray Serve for scaling model serving
       | (http://rayserve.org/), and RaySGD for scaling training
       | (https://docs.ray.io/en/master/raysgd/raysgd.html).
       | 
       | Some of the libraries are popular libraries on their own, which
       | now integrate with Ray such as Horovod
       | (https://eng.uber.com/horovod-ray/), XGBoost
       | (https://xgboost.readthedocs.io/en/latest/tutorials/ray.html),
       | and Dask for dataframes (https://docs.ray.io/en/master/dask-on-
       | ray.html). While Dask itself has similarities to Ray (especially
       | the task part of the Ray API), Dask also has libraries for
       | scaling dataframes and arrays, which can be used as part of the
       | Ray ecosystem (more details at
       | https://www.anyscale.com/blog/analyzing-memory-management-an...).
       | 
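        | As an example of that integration, here's a rough sketch of
        | Dask-on-Ray (assuming the `ray.util.dask` scheduler shim and a
        | toy dataframe):
        | 
        |   import dask.dataframe as dd
        |   import pandas as pd
        |   import ray
        |   from ray.util.dask import ray_dask_get
        | 
        |   ray.init()
        |   pdf = pd.DataFrame({"x": range(100)})
        |   ddf = dd.from_pandas(pdf, npartitions=4)
        |   # The Dask graph executes on Ray workers instead of Dask's
        |   # own scheduler.
        |   total = ddf.x.sum().compute(scheduler=ray_dask_get)
        | 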
       | Many Ray users start using Ray for one of the libraries (e.g., to
       | scale training or hyperparameter search) as opposed to just for
       | the core system.
       | 
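        | As a taste of that library layer, here's a minimal Ray Tune
        | sketch (the trainable and search space are made up for
        | illustration):
        | 
        |   from ray import tune
        | 
        |   def trainable(config):
        |       # Stand-in for a real training loop.
        |       loss = (config["lr"] - 0.1) ** 2
        |       tune.report(loss=loss)
        | 
        |   analysis = tune.run(
        |       trainable,
        |       config={"lr": tune.grid_search([0.01, 0.1, 1.0])},
        |   )
        |   print(analysis.get_best_config(metric="loss", mode="min"))
        | 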
       | *Emphasis on serverless*
       | 
       | Our goal with Ray is to make distributed computing as easy as
       | possible. To do that, we think the serverless direction, which
       | allows people to just focus on their code and not on
       | infrastructure, is very important. Here, I don't mean serverless
       | purely in the sense of functions as a service, but something that
       | would allow people to run a wide variety of applications
       | (training, data processing, inference, etc) elastically in the
       | cloud without configuring or thinking about infrastructure.
       | There's a lot of ongoing work here (e.g., to improve autoscaling
        | up and down with heterogeneous resource types). More details on
        | the topic here:
        | https://www.anyscale.com/blog/the-ideal-foundation-for-a-gen....
       | 
       | If you're interested in this kind of stuff, consider joining us
       | at Anyscale https://jobs.lever.co/anyscale.
        
       | Tenoke wrote:
        | I've used Ray for about a year (typically for thousands of ML
        | tasks, spread across ~48-120 cores simultaneously) and it's a
        | pleasure to use, at least via the basic API. Admittedly, I had
        | problems when trying some of the more advanced approaches, but I
        | didn't really need them, and I can definitely recommend it since
        | the performance is great.
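        | 
        | That basic API is essentially just fan-out/fan-in. A sketch,
        | with a made-up `predict` standing in for the real model:
        | 
        |   import ray
        | 
        |   ray.init()
        | 
        |   @ray.remote
        |   def predict(batch):
        |       # Stand-in for real inference on one batch.
        |       return [len(x) for x in batch]
        | 
        |   batches = [["a", "bb"], ["ccc", "dddd"]]  # toy input
        |   refs = [predict.remote(b) for b in batches]  # one task each
        |   results = ray.get(refs)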
        
         | samvher wrote:
         | Just out of curiosity, what kind of work requires thousands of
         | ML tasks? (Assuming you're talking about training and not
         | inference?)
        
           | Tenoke wrote:
            | The thousands of tasks are inference, but I also use Ray to
            | train/update a double-digit number of models simultaneously
            | (~1 per user).
        
       | maxthegeek1 wrote:
       | Honestly looks really cool, I want to try using Ray.
        
       | wolfium3 wrote:
       | There was a recent talk at PyCon US 2021 on this :)
       | 
       | TALK / SangBin Cho / Data Processing on Ray
       | [https://www.youtube.com/watch?v=DNLqvdov_J4]
        
       | sillysaurusx wrote:
       | I used Ray to train a massive GPT model by putting each layer on
       | a separate TPU. Ray was able to send all the gradients back and
       | forth as needed.
       | 
       | It scaled fine up to 33 TPUs (i.e. 33 layers).
       | 
       | Ray is impressive as hell.
       | 
       | By the way, I didn't write the code to do any of that. kindiana,
       | aka "the guy that wrote GPT-J", also happened to write this:
       | 
       | https://github.com/kingoflolz/swarm-jax
       | 
       | I just ran it and it worked. Which is extraordinarily unusual for
       | TPUs, historically speaking.
       | 
       | I'm pushing my luck at this point, since it's crossing the line
       | from enthusiasm to spam. But if you want to try out Ray on TPUs
       | like I did here, I posted a (massive) amount of detail on how to
       | get started, and why:
       | https://news.ycombinator.com/item?id=27728225
       | 
       | That's the last I'll be mentioning it for some time, though.
       | 
       | Ray + JAX is such a killer combo.
        
       | Rich_Morin wrote:
       | Although it's early days, Jose Valim and some other folks are
       | working on adding AI-related capabilities to Elixir and the
       | (Erlang) BEAM. See "Introducing Nx"
       | (https://www.youtube.com/watch?v=fPKMmJpAGWc) for an intro.
       | 
       | Given that they already have a robust Actor model as a base to
       | work from, it occurs to me that they may be able to use some of
       | Ray's ideas as they go along...
        
       | ramoz wrote:
        | I spent the past year and a half deploying a distributed backend
        | for BERT-like models, and we ultimately chose a K8s architecture
        | with "precise" affinity mapped out, which is still hard due to
        | CPU-pinning issues. On the frontend API, Golang gives us the
        | ability to distribute and split incoming requests (10-20M/day,
        | with batch sizes averaging ~3K, each split into 50 due to model
        | constraints). Embeddings are stored on those nodes, on local
        | SSDs; those nodes are only a handful. Models run on two pools,
        | one dedicated and one preemptible (most nodes here), which gives
        | us cost optimization, and scheduling is simplified thanks to K8s.
        | We have anywhere from 120-300 of these high-compute nodes.
       | 
        | Wondering if anyone has similar deployments and has migrated to
        | Ray. We've evaluated it, but we can't afford a large migration
        | at this point, and we would also need to test quite a bit and
        | rebuild our whole automation for infra and apps.
       | 
        | Really interested, though, as the infrastructure isn't cheap,
        | and every time the model updates we are basically re-architecting
        | it. Right now we are moving everything away from Python
        | (gunicorn/Flask, and MKL) to Golang, as we can get better
        | efficiencies with data serialization (NumPy ops are the biggest
        | time eaters right now ... model input vectors are constructed
        | from flatbuffers).
        
         | vvladymyrov wrote:
         | Are you running inference on CPU or GPU?
        
         | vvladymyrov wrote:
          | Have you considered Rust? It has a neat Python interconnect.
          | My team is doing early experiments: offloading tight loops and
          | expensive computations in an existing Python inference app to
          | Rust.
        
       ___________________________________________________________________
       (page generated 2021-07-04 23:00 UTC)