[HN Gopher] Elixir and Machine Learning in 2024 so far: MLIR, Ar...
___________________________________________________________________
Elixir and Machine Learning in 2024 so far: MLIR, Arrow, structured
LLM, etc.
Author : clessg
Score : 136 points
Date : 2024-05-29 12:38 UTC (10 hours ago)
(HTM) web link (dashbit.co)
| bnchrch wrote:
| I think Elixir might have the most wonderful community out there.
|
| Really cool to see the concerted effort going, in parallel, into
| both the ML problem space and the introduction of typing.
| davidw wrote:
| From a "marketing strategy" point of view, I wonder what the
| thinking is in investing in this stuff so heavily when Python
| seems to be kind of the go-to? Will they be able to create a
| "good enough" environment to do that kind of work with Elixir? Is
| it just someone or a company scratching their own itch?
|
| This is a genuine question - I don't know much about "AI stuff",
| but do know something about the economics of programming
| languages and I'm "intellectually curious" about what is driving
| this and what the goals are, rather than critical of Elixir. I
| love working with BEAM and miss it.
| jsiva wrote:
| I can't say I know much about AI stuff or BEAM. But my best
| guess is that Elixir-native ML should integrate well with OTP's
| distributed computing capabilities. As an outsider to the
| elixir ecosystem, I've seen glimpses of elixir ML here and
| there but no mention of attempting to bridge the python ML
| ecosystem into elixir.
| PaulStatezny wrote:
| One factor may be that a few years back the language creator
| (Jose Valim, also the author of this article) announced that
| the language is basically "completed", and that they would
| shift focus to other things like developer tooling and other
| projects outside of the language itself.
|
| Jose is quite prolific, so I think it's natural that he moves
| on to things like this. It's hard to know what reception will
| be like until you build it.
| davidw wrote:
| From a strictly "marketing" point of view, if you want to
| grow the language and ecosystem, it seems the successful move
| is to stake out a place where you're likely to win.
|
| I think often this happens more or less by accident rather
| than conscious design - but think of something like PHP which
| made it really easy to whip up quick web pages, or Rails
| which drove Ruby adoption with a better, more structured, but
| still dynamic and quick web programming experience.
|
| And I suppose part of those happy accidents are people just
| hacking on something they think is cool, so I wouldn't stress
| too much about the "marketing aspect". I'm just curious what
| drove it.
| jolux wrote:
| My guess is that Jose and the core team are both personally
| interested in the big wave of ML stuff we've been
| experiencing recently and also want to demonstrate that
| Elixir is a viable platform for doing this work to teams
| which have adopted Elixir and are interested in ML but
| don't want to add a bunch of Python into their codebase.
| barrell wrote:
| I would also guess that Elixir led to some crazy things
| they didn't even imagine were possible on the web, like
| Phoenix LiveView [1]. Even if they don't have explicit ideas of
| how it will impact ML, it'll be really interesting to
| try.
|
| [1]: https://phoenixframework.org/blog/phoenix-liveview-1.0-relea...
| bhaney wrote:
| > a few years back the language creator announced that the
| language is basically "completed"
|
| And then began adding an entire static type system to the
| language
| elbasti wrote:
| Strictly optional and without changing the language API,
| fwiw. So about as smooth/painless an experience as
| possible.
|
| Its addition in the next minor version (1.17) will bring
| warnings that address some of the most common footguns in
| the language, like comparing structs.
| dpflan wrote:
| Jose may show up here and answer your questions...
| https://news.ycombinator.com/user?id=josevalim
| andy_ppp wrote:
| I actually think the BEAM is an ideal environment for machine
| learning, sharding things across machines. The only thing I'm
| not sure of is whether PyTorch etc. are more optimised than XLA,
| the backend Axon uses. It would be good to see some performance
| comparisons of a big LLM running on both. For everything else,
| I'd suggest Elixir is the better experience.
| regulation_d wrote:
| We use Elixir for our primary application, with a fair amount
| of Python code to manage our ML pipelines. But we also need
| real-time inference and it's really convenient/performant to be
| able to just do that in-app. So I, for one, am very grateful
| for the work that's been done to provide this level of tooling in
| Elixir. It has worked quite well for us.
| cess11 wrote:
| It's very serious. The BEAM had a problem in that it lacked
| solid number-crunching libraries, so some folks solved that,
| and when they did distributed ML-capabilities kind of just fell
| out as a neat bonus so some folks did that too.
|
| So now it's integrated into the basic Livebook: just boot it
| and go to that example and you have a transcriber or whatever
| as boilerplate to play around with. Want something else from
| Huggingface? Just switch out a couple of strings referencing
| that model and all the rest is sorted for you, except if the
| tokenizer or whatever doesn't follow a particular format but in
| that case you just upload it to some free web service and make
| a PR with the result and reference that version hash
| specifically and it'll work.
|
| Python is awful for sharding, concurrency, distribution, that
| kind of thing. With the BEAM you can trivially cluster with an
| instance on a dedicated GPU-outfitted machine that runs the LLM
| models or what have you and there you have named processes that
| control process pools for running queries and they'll be
| immediately available to any BEAM that clusters with it. Fine,
| you'll do some VPN or something that requires a bit of
| experience with networking, but compared to building a robust,
| distributed system in Python it's easy mode.
|
| I don't know what the goals are, but I perceive the
| Nx/Bumblebee/BEAM platform as obviously better than Python for
| building production systems. There might be advantages to
| Python when creating and training models, I'm not sure, but if
| you already have the models and need to serve more than one,
| and want the latency to be low so the characteristically slow
| response feels a little faster, and don't already have a big
| Kubernetes system for running many Python applications in a
| distributed manner, then this is for you and it'll be better
| than good enough until you've created a rather large success.
| ricketycricket wrote:
| > except if the tokenizer or whatever doesn't follow a
| particular format but in that case you just upload it to some
| free web service and make a PR with the result and reference
| that version hash specifically and it'll work.
|
| May I ask to which service you are referring?
| TJSomething wrote:
| I feel like a big audience would be people moving away from
| Spark.
| vvpan wrote:
| Can you provide a little more context? I am curious.
| bluevlahblah wrote:
| People have to realise these are mostly hobby projects. It is
| really hard to get these working with other libraries.
|
| Take Explorer: it's a mess trying to implement dplyr verbs in
| Elixir. Anyone trying to use it is going to hit its limitations
| sooner or later. I tried migrating to it from Polars, but it was
| too frustrating, so I gave up after some time.
|
| Why would people use half-baked libraries instead of Python? I
| will stick to Keras/PyTorch, Polars, etc.
| victorbjorklund wrote:
| At this point it isn't trying to convert a happy Python user
| like you. Rather, it aims to give tools to teams whose app is
| already in Elixir and whose devs know Elixir. Instead of
| bringing Python into the mix, you can use Elixir.
| ch4s3 wrote:
| I don't think a lot of people are using these in production
| yet, but you can't become fully baked without spending some
| time being half baked.
|
| Sometimes though it's nice to have most of your stuff in one
| language.
| barrell wrote:
| I've been considering using elixir for a while. With these
| libraries, it's now entirely feasible to move my backend over
| to elixir, getting rid of 95% of my python code in the process.
| The tasks leftover that still need python can easily each fit
| in their own <100 line file and work just as well using other
| languages to process the data. Most of it needs to be run on a
| GPU anyway, so it's already sandboxed wrt deployments.
|
| It's not an all or nothing game, and I rue the day that I
| decided to do everything in python just because I needed a
| couple LoC to call a python module.
| jolux wrote:
| Explorer is still pre-1.0.
| mrdoops wrote:
| IMO the big win for Elixir/Nx/Bumblebee/etc is that you can do
| batched distributed inference out of the box without deploying
| anything separate to your app or hitting an API. Massive
| complexity reduction and you can more easily scale up or down.
| https://hexdocs.pm/nx/Nx.Serving.html#content
|
| And there's also a scale to 0 story for when you're not using
| that GPU at all: https://github.com/phoenixframework/flame
|
| 1 language/toolchain. 1 deployable app. Real time and distributed
| machine learning baked in. 1 dev can go really far.
| 6gvONxR4sf7o wrote:
| I've been really curious about BEAM languages but never made
| the leap. How well does it manage heterogeneous compute? I'm
| used to other languages making me define what happens on CPU vs
| GPU and defining cross-machine talk around those kinds of
| considerations.
|
| What parts of that does elixir (and company) allow me to not
| write? Is there a good balance between abstractions when it
| comes to still maybe wanting control over what goes where
| (heterogeneity)?
|
| Super curious and kinda looking for an excuse here :)
| ricketycricket wrote:
| May not answer all your questions, but this may be a good
| starting point: https://hexdocs.pm/nx/Nx.Defn.html
| elbasti wrote:
| The BEAM is pretty high level, and it's REALLY good at
| managing distributed compute at the thread or device level.
|
| If you have a parallelizeable workflow, it's very easy to
| make it (properly!) parallel locally, where by "properly" I
| mean having supervision trees, sane restart behavior, etc.
|
| And once you have that you can extend that parallelism to
| different nodes in a network (with the same sanity around
| supervision and discovery) basically for free. Like, one-
| line-of-code for free.
|
| Nonetheless, it's all message-passing, and so pretty high
| level. AFAIK it's not designed for parallelizing compute at
| GPU scale.
|
| That being said, if you have multiple GPUs and multiple
| machines that have to _coordinate_ between them,
| Elixir/Erlang is pretty much perfect.
| itronitron wrote:
| You might want to look at the Java Aparapi project.
|
| _Aparapi allows Java developers to take advantage of the
| compute power of GPU and APU devices by executing data
| parallel code fragments on the GPU rather than being confined
| to the local CPU. It does this by converting Java bytecode to
| OpenCL at runtime and executing on the GPU, if for any reason
| Aparapi can't execute on the GPU it will execute in a Java
| thread pool._
|
| https://aparapi.github.io/
|
| https://github.com/aparapi/aparapi
|
| but avoid the non-official fork which sometimes comes up in
| search results.
| melodyogonna wrote:
| MLIR unlocks so much potential for systems that use it.
| tonyhb wrote:
| MLIR is cool and has an exciting future for sure.
| behnamoh wrote:
| My experience with Elixir onboarding was meh. I spent hours
| trying to set up the LSP in VSCode and Neovim. Their pseudo-official LSP
| (elixir-ls) didn't work at all. I even made a post about it on
| Reddit, Github, and here. No one really knew what was going on.
|
| Even with Haskell you have something like ghcup and you're good
| to go. Not to mention Rust's amazing Cargo and Go's tooling as
| well.
|
| So far, Elixir has been even more challenging to just get up and
| running than Common Lisp!
|
| By the way, the official Elixir website recommends using Homebrew
| to install it. But almost everyone in the Github issues and
| comments says ASDF is the way to go.
| cess11 wrote:
| ASDF is easy to use once you've learned a few of the basic
| commands, and that way you'll have an easy time when new
| versions come out and you want to check out new features, like
| the built-in JSON parser, the gradual typing when it drops, and
| things like that.
|
| If you built something useful you might not want to upgrade it
| just to look at the new stuff, and if you built nothing and
| just drop into iex a couple of times per year it'll still be
| easier to pull in the latest BEAM and Elixir versions and play
| around than figuring out if Homebrew has the new version, and
| if so, whether it installs nicely over the old or not.
|
| I don't think I've got the LSP running, might check tomorrow.
| It's OK, the IDE autocompletes some things and for me
| development basically happens in the REPL and then gets pasted
| into tests and the project anyway. And iex has good
| autocomplete and help functions and so on.
|
| Edit: A hurdle might be to install the GUI libraries needed for
| Observer, but you'll probably be able to search out an
| incantation for your operating system once you're into doing
| stuff with process trees.
| behnamoh wrote:
| I think the Python dev style I adopted can't be easily ported
| to Elixir. In Python, I rely heavily on an LSP because I want
| to fiddle with a lot of functions/classes located deeply in
| libraries. In VSCode, I simply press CMD and click on any
| function (or in Neovim, I `gf` or `gd` it). I thought it'd
| make even more sense in Elixir because apparently everything
| is a module. Am I missing something? How do you use iex
| efficiently?
| ricketycricket wrote:
| Sorry you are having so many issues. That has not been my
| experience with elixir-ls either locally or in Codespaces.
| Just want to say that when you do get it working, it does
| indeed have that and many more features. If you are
| interested, the Elixir slack is very active and helpful and
| there is a #language-server channel.
| behnamoh wrote:
| Thank you! It seems the Elixir LSP does provide useful
| features... I'm worried that even after getting it
| working on my Mac, the same troubles happen again when I
| move to remote machines. I'll try the Elixir slack!
| clessg wrote:
| Yeah, the LSP situation remains a sore point, which is deeply
| unfortunate. One of the big reasons I like Gleam! Luckily,
| there are new contenders popping up to hopefully solve the
| issues with elixir-ls: try https://github.com/elixir-tools/next-ls
| or https://github.com/lexical-lsp/lexical. They
| might give a better experience. In fact, the creator of Elixir
| recently started directly sponsoring next-ls, so it's probably
| a safe enough bet.
|
| > By the way, the official Elixir website recommends using
| Homebrew to install it. But almost everyone in the Github
| issues and comments says ASDF is the way to go.
|
| The Elixir website is right. Just use Homebrew until you find a
| real need for asdf or similar tools. It's far simpler.
|
| asdf (or mise[0]) is merely a way to manage different runtime
| versions between various projects, you would use it the same
| way as one might use rbenv/rvm, nvm/n, pyenv, or even
| Docker/nix, and so on. You don't need it until you have several
| ongoing projects requiring different runtime versions. If you
| reach that point, great! It'll be worth the effort then, and it
| isn't difficult.
|
| Personally, I just use Homebrew Elixir for easy ad-hoc access
| to iex/livebook. If I truly need reproducible environments,
| devbox[1] (a sort of nix wrapper) is nice and extremely
| straightforward.
|
| Tl;dr: Just use Homebrew. If your requirements expand beyond
| that, you probably have far more challenging problems than
| installing asdf (or whatever).
|
| [0] https://mise.jdx.dev/dev-tools/comparison-to-asdf.html
|
| [1] https://www.jetify.com/devbox
| behnamoh wrote:
| Thanks! I just checked out devbox, it's so great to have a
| more user-friendly nix for people like me who don't want to
| mess with nix!
|
| Also, great point about Gleam! What I like about it, other
| than types and JS compilation, is exactly the tooling! The
| devs clearly understood that tooling is extremely important
| these days and have focused on that. If it's truly
| interoperable with Elixir/Erlang libs, I'll probably just use
| Gleam instead.
| dpflan wrote:
| To the author: I noticed a typo, "meachine" instead of
| "machine":
|
| """ These features bring Numerical Elixir and its ability to
| setup distributed model serving, over CPUs and GPUs, to
| traditional meachine learning algorithms, allowing developers and
| data practitioners to tackle a wider number of problems within
| the Elixir ecosystem. """
___________________________________________________________________
(page generated 2024-05-29 23:01 UTC)