[HN Gopher] Ray: A Distributed Framework for Emerging AI Applications
___________________________________________________________________
Ray: A Distributed Framework for Emerging AI Applications
Author : mlerner
Score : 78 points
Date : 2021-07-04 16:24 UTC (6 hours ago)
(HTM) web link (www.micahlerner.com)
(TXT) w3m dump (www.micahlerner.com)
| orf wrote:
| How does this compare to Dask?
| reubenbond wrote:
| Given how Ray "provides [...] exactly-once semantics" for its
| actors, you could draw similarities between it and workflow-as-
| code frameworks such as https://temporal.io. The way that Ray
| splits up actors and tasks looks similar to Temporal's Workflows
| + Activities split: Workflows (Ray actors) contain orchestration
| logic and have their method calls/results durably logged.
| Activities (Ray tasks) perform the expensive computations and any
| interaction with external systems and are not durably logged.
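|
| A minimal sketch (mine, not from the paper) of how that split
| looks in Ray's Python API: a @ray.remote function plays the
| stateless "task" role, and a @ray.remote class plays the stateful
| "actor" role.
|
|     import ray
|
|     ray.init()
|
|     @ray.remote
|     def score(x):            # task: stateless, safe to re-execute
|         return x * x
|
|     @ray.remote
|     class Tally:             # actor: holds state across calls
|         def __init__(self):
|             self.total = 0
|         def add(self, x):
|             self.total += x
|             return self.total
|
|     tally = Tally.remote()
|     parts = ray.get([score.remote(i) for i in range(4)])
|     print(ray.get(tally.add.remote(sum(parts))))   # -> 14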
|
| If you're in the .NET ecosystem or interested in distributed
| systems in general, you may like Orleans
| (https://github.com/dotnet/orleans), which I work on at
| Microsoft. Orleans contributes the Virtual Actor model which
| other modern actor frameworks are starting to adopt since it is
| well suited for the hectic, failure-prone environment of
| distributed systems (which those so-called Cloud Native Apps live
| in). The Ray paper linked from the article
| (https://www.usenix.org/system/files/osdi18-moritz.pdf) discusses
| some similarities. Slight correction on the paper: it states that
| "For message delivery, Orleans provides at-least-once [...]
| semantics". It's at-most-once. At-least-once messaging semantics
| (usually implemented via automatic retries) aren't ideal for
| these kinds of systems, in my opinion.
| mlerner wrote:
| Neat - thanks for sharing! A friend mentioned Orleans, so I'm
| queueing up the paper (found this one https://www.microsoft.com/en-
| us/research/publication/orleans...) for future reading.
| phissenschaft wrote:
| Great work and kudos to the Ray team! It's definitely a fresh
| look with a lot of lessons learned from previous generations
| (e.g. Spark).
|
| There are a few nice features I wish Ray would eventually get to.
|
| On the user experience side, it would be nice to have task-level
| logs: oftentimes it's easier for users to reason at the task
| level, especially when the task is a facade that triggers other
| complicated library/subprocess calls.
|
| For the scheduler, it would help to have more native support for
| sharded/bundled/partitioned tasks and for dynamic work rebalancing
| along the lines of
| https://cloud.google.com/blog/products/gcp/no-shard-left-beh...
| robertnishihara wrote:
| Hi all, I'm one of the authors of Ray, thanks for all the
| comments and discussion! To add to the discussion, I'll mention a
| few conceptual things that have changed since we wrote the paper.
|
| *Emphasis on the library ecosystem*
|
| A lot of our focus is on building an ecosystem of libraries on
| top of Ray (much, but not all, of the focus is on machine
| learning libraries). Some of these libraries are built natively
| on top of Ray such as Ray Tune for scaling hyperparameter search
| (http://tune.io), RLlib for scaling reinforcement learning
| (http://rllib.io), Ray Serve for scaling model serving
| (http://rayserve.org/), and RaySGD for scaling training
| (https://docs.ray.io/en/master/raysgd/raysgd.html).
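|
| To give a taste of the library APIs, here's a minimal Ray Tune
| sketch (the quadratic objective is a toy of my own, not from the
| docs):
|
|     from ray import tune
|
|     def objective(config):
|         # toy "loss": squared distance of x from 3
|         tune.report(loss=(config["x"] - 3) ** 2)
|
|     analysis = tune.run(
|         objective,
|         config={"x": tune.grid_search([1, 2, 3, 4])})
|     print(analysis.get_best_config(metric="loss", mode="min"))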
|
| Other libraries are popular on their own and now integrate with
| Ray, such as Horovod
| (https://eng.uber.com/horovod-ray/), XGBoost
| (https://xgboost.readthedocs.io/en/latest/tutorials/ray.html),
| and Dask for dataframes (https://docs.ray.io/en/master/dask-on-
| ray.html). While Dask itself has similarities to Ray (especially
| the task part of the Ray API), Dask also has libraries for
| scaling dataframes and arrays, which can be used as part of the
| Ray ecosystem (more details at
| https://www.anyscale.com/blog/analyzing-memory-management-an...).
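|
| A minimal Dask-on-Ray sketch (assuming the ray.util.dask helper
| described in the docs linked above): the dataframe graph is built
| with Dask as usual, but executed by Ray's scheduler.
|
|     import dask.dataframe as dd
|     import pandas as pd
|     import ray
|     from ray.util.dask import ray_dask_get
|
|     ray.init()
|     df = dd.from_pandas(pd.DataFrame({"a": range(100)}),
|                         npartitions=4)
|     # Run the Dask task graph on Ray instead of Dask's scheduler.
|     print(df.a.sum().compute(scheduler=ray_dask_get))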
|
| Many Ray users start using Ray for one of the libraries (e.g., to
| scale training or hyperparameter search) as opposed to just for
| the core system.
|
| *Emphasis on serverless*
|
| Our goal with Ray is to make distributed computing as easy as
| possible. To do that, we think the serverless direction, which
| allows people to just focus on their code and not on
| infrastructure, is very important. Here, I don't mean serverless
| purely in the sense of functions as a service, but something that
| would allow people to run a wide variety of applications
| (training, data processing, inference, etc) elastically in the
| cloud without configuring or thinking about infrastructure.
| There's a lot of ongoing work here (e.g., to improve autoscaling
| up and down with heterogeneous resource types). More details on
| the topic https://www.anyscale.com/blog/the-ideal-foundation-for-
| a-gen....
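|
| As a small illustration of what "not thinking about
| infrastructure" means at the API level: tasks just declare the
| resources they need, and those requests are what drive the
| autoscaler's decisions about which node types to add or remove (a
| sketch with made-up toy functions):
|
|     import ray
|
|     ray.init()  # or ray.init(address="auto") on a cluster
|
|     @ray.remote(num_cpus=2)
|     def preprocess(batch):
|         return [x * 2 for x in batch]
|
|     @ray.remote(num_gpus=1)
|     def train(data):
|         return sum(data)
|
|     data = ray.get(preprocess.remote(list(range(8))))
|     # train.remote(data) would be scheduled once a GPU node exists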
|
| If you're interested in this kind of stuff, consider joining us
| at Anyscale https://jobs.lever.co/anyscale.
| Tenoke wrote:
| I've used Ray for about a year (typically for thousands of ML
| tasks, spread across ~48-120 cores simultaneously) and it's a
| pleasure to use, at least via the basic API. Admittedly, I had
| problems when trying some of the more advanced approaches, but I
| didn't really need them, and I can definitely recommend it since
| the performance is great.
| samvher wrote:
| Just out of curiosity, what kind of work requires thousands of
| ML tasks? (Assuming you're talking about training and not
| inference?)
| Tenoke wrote:
| The thousands of tasks are inference, but I also use Ray to
| train/update a double-digit number of models simultaneously (~1
| per user).
| maxthegeek1 wrote:
| Honestly looks really cool, I want to try using Ray.
| wolfium3 wrote:
| There was a recent talk at PyCon US 2021 on this :)
|
| TALK / SangBin Cho / Data Processing on Ray
| [https://www.youtube.com/watch?v=DNLqvdov_J4]
| sillysaurusx wrote:
| I used Ray to train a massive GPT model by putting each layer on
| a separate TPU. Ray was able to send all the gradients back and
| forth as needed.
|
| It scaled fine up to 33 TPUs (i.e. 33 layers).
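|
| For the curious, here's the shape of the idea as a toy sketch (my
| own, far simpler than swarm-jax, with numpy standing in for JAX):
| one actor per layer, with activations and gradients flowing
| between them as Ray object references.
|
|     import numpy as np
|     import ray
|
|     ray.init()
|
|     @ray.remote
|     class Layer:
|         def __init__(self, dim):
|             self.w = np.random.randn(dim, dim) * 0.01
|         def forward(self, x):
|             self.x = x                   # cache input for backward
|             return x @ self.w
|         def backward(self, grad, lr=1e-3):
|             grad_prev = grad @ self.w.T  # grad for previous layer
|             self.w -= lr * (self.x.T @ grad)
|             return grad_prev
|
|     layers = [Layer.remote(16) for _ in range(4)]
|     act = np.random.randn(8, 16)
|     for layer in layers:                 # forward, layer by layer
|         act = layer.forward.remote(act)
|     grad = ray.get(act)                  # stand-in loss gradient
|     for layer in reversed(layers):       # backward through pipeline
|         grad = layer.backward.remote(grad)
|     ray.get(grad)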
|
| Ray is impressive as hell.
|
| By the way, I didn't write the code to do any of that. kindiana,
| aka "the guy that wrote GPT-J", also happened to write this:
|
| https://github.com/kingoflolz/swarm-jax
|
| I just ran it and it worked. Which is extraordinarily unusual for
| TPUs, historically speaking.
|
| I'm pushing my luck at this point, since it's crossing the line
| from enthusiasm to spam. But if you want to try out Ray on TPUs
| like I did here, I posted a (massive) amount of detail on how to
| get started, and why:
| https://news.ycombinator.com/item?id=27728225
|
| That's the last I'll be mentioning it for some time, though.
|
| Ray + JAX is such a killer combo.
| Rich_Morin wrote:
| Although it's early days, Jose Valim and some other folks are
| working on adding AI-related capabilities to Elixir and the
| (Erlang) BEAM. See "Introducing Nx"
| (https://www.youtube.com/watch?v=fPKMmJpAGWc) for an intro.
|
| Given that they already have a robust Actor model as a base to
| work from, it occurs to me that they may be able to use some of
| Ray's ideas as they go along...
| ramoz wrote:
| I spent the past year and a half deploying a distributed backend
| for BERT-like models & we ultimately chose a K8s architecture with
| "precise" affinity mapped out, which is still hard due to CPU
| pinning issues. On the frontend API, Golang gives us the ability
| to distribute & split incoming requests (10-20M / day, with batch
| sizes averaging ~3K that split into batches of 50 due to model
| constraints). Embeddings are stored on those nodes on local SSDs;
| those nodes are only a handful. Models run on 2 pools, 1 dedicated
| and one preemptible (most nodes here), which gives us cost
| optimization, and scheduling is simplified thanks to K8s. We have
| anywhere from 120-300 of these high-compute nodes.
|
| Wondering if anyone has similar deployments and migrated to Ray.
| We've evaluated it but can't afford a large migration at this
| point & would also need to test quite a bit & rebuild our whole
| automation for infra and apps.
|
| Really interested though, as the infrastructure isn't cheap and
| every time the model updates we are basically re-architecting it.
| Right now we are moving everything away from Python
| (gunicorn/flask, and MKL) to Golang, as we can get better
| efficiency with data serialization (numpy ops are the biggest time
| eaters right now ... model input vectors constructed from
| flatbuffers).
| vvladymyrov wrote:
| Are you running inference on CPU or GPU?
| vvladymyrov wrote:
| Have you considered Rust? It has a neat Python interconnect. My
| team is doing early experiments - offloading tight loops and
| expensive computations in an existing Python inference app to
| Rust.
___________________________________________________________________
(page generated 2021-07-04 23:00 UTC)