hngopher.com

       [HN Gopher] Elixir and Machine Learning in 2024 so far: MLIR, Ar...
       ___________________________________________________________________
        
       Elixir and Machine Learning in 2024 so far: MLIR, Arrow, structured
       LLM, etc.
        
       Author : clessg
       Score  : 136 points
       Date   : 2024-05-29 12:38 UTC (10 hours ago)
        
 (HTM) web link (dashbit.co)
        
       | bnchrch wrote:
       | I think Elixir might have the most wonderful community out there.
       | 
       | Really cool to see the concerted effort going in parallel into
       | both the ML problem space and the introduction of typing.
        
       | davidw wrote:
       | From a "marketing strategy" point of view, I wonder what the
       | thinking is in investing in this stuff so heavily when Python
       | seems to be kind of the go-to? Will they be able to create a
       | "good enough" environment to do that kind of work with Elixir? Is
       | it just someone or a company scratching their own itch?
       | 
       | This is a genuine question - I don't know much about "AI stuff",
       | but do know something about the economics of programming
       | languages and I'm "intellectually curious" about what is driving
       | this and what the goals are, rather than critical of Elixir. I
       | love working with BEAM and miss it.
        
         | jsiva wrote:
         | I can't say I know much about AI stuff or BEAM. But my best
         | guess is that Elixir-native ML should integrate well with OTP's
         | distributed computing capabilities. As an outsider to the
         | Elixir ecosystem, I've seen glimpses of Elixir ML here and
         | there but no mention of attempting to bridge the Python ML
         | ecosystem into Elixir.
        
         | PaulStatezny wrote:
         | One factor may be that a few years back the language creator
         | (Jose Valim, also the author of this article) announced that
         | the language is basically "completed", and that they would
         | shift focus to other things like developer tooling and other
         | projects outside of the language itself.
         | 
         | Jose is quite prolific, so I think it's natural that he moves
         | on to things like this. It's hard to know what reception will
         | be like until you build it.
        
           | davidw wrote:
           | From a strictly "marketing" point of view, if you want to
           | grow the language and ecosystem, it seems the successful move
           | is to stake out a place where you're likely to win.
           | 
           | I think often this happens more or less by accident rather
           | than conscious design - but think of something like PHP which
           | made it really easy to whip up quick web pages, or Rails
           | which drove Ruby adoption for a better, more structured, but
           | still dynamic and quick web programming experience.
           | 
           | And I suppose part of those happy accidents are people just
           | hacking on something they think is cool, so I wouldn't stress
           | too much about the "marketing aspect". I'm just curious what
           | drove it.
        
             | jolux wrote:
             | My guess is that Jose and the core team are both personally
             | interested in the big wave of ML stuff we've been
             | experiencing recently and also want to demonstrate that
             | Elixir is a viable platform for doing this work to teams
             | which have adopted Elixir and are interested in ML but
             | don't want to add a bunch of Python into their codebase.
        
               | barrell wrote:
               | I would also guess that Elixir led to some crazy things
               | they didn't even imagine possible with web, like Phoenix's
               | LiveView [1]. Even if they don't have explicit ideas of
               | how it will impact ML, it'll be really interesting to
               | try.
               | 
               | [1]: https://phoenixframework.org/blog/phoenix-liveview-1.0-relea...
        
           | bhaney wrote:
           | > a few years back the language creator announced that the
           | language is basically "completed"
           | 
           | And then began adding an entire static type system to the
           | language
        
             | elbasti wrote:
             | Strictly optional and without changing the language API,
             | fwiw. So about as smooth/painless an experience as
             | possible.
             | 
             | Its addition into the next minor version (1.17) will bring
             | warnings that address some of the most common footguns in
             | the language, like comparing structs.
        
         | dpflan wrote:
         | Jose may show up here and answer your questions...
         | https://news.ycombinator.com/user?id=josevalim
        
         | andy_ppp wrote:
         | I actually think the BEAM is an ideal environment for machine
         | learning, sharding things across machines. The only thing I'm
         | not sure of is if PyTorch etc. are more optimised than XLA,
         | the backend Axon uses... would be good to see some performance
         | comparisons of a big LLM running on both. For everything else
         | I'd suggest Elixir was a better experience.
        
         | regulation_d wrote:
         | We use Elixir for our primary application, with a fair amount
         | of Python code to manage our ML pipelines. But we also need
         | real-time inference and it's really convenient/performant to be
         | able to just do that in-app. So I, for one, am very grateful
         | for the work that's been done to provide this level of tooling in
         | Elixir. It has worked quite well for us.
        
         | cess11 wrote:
         | It's very serious. The BEAM had a problem in that it lacked
         | solid number-crunching libraries, so some folks solved that,
         | and when they did, distributed ML capabilities kind of just fell
         | out as a neat bonus so some folks did that too.
         | 
         | So now it's integrated into the basic Livebook, just boot it
         | and go to that example and you have a transcriber or whatever
         | as boilerplate to play around with. Want something else from
         | Huggingface? Just switch out a couple of strings referencing
         | that model and all the rest is sorted for you, except if the
         | tokenizer or whatever doesn't follow a particular format but in
         | that case you just upload it to some free web service and make
         | a PR with the result and reference that version hash
         | specifically and it'll work.
         | 
         | Python is awful for sharding, concurrency, distribution, that
         | kind of thing. With the BEAM you can trivially cluster with an
         | instance on a dedicated GPU-outfitted machine that runs the LLM
         | models or what have you and there you have named processes that
         | control process pools for running queries and they'll be
         | immediately available to any BEAM that clusters with it. Fine,
         | you'll do some VPN or something that requires a bit of
         | experience with networking, but compared to building a robust,
         | distributed system in Python it's easy mode.
         | 
         | I don't know what the goals are, but I perceive the
         | Nx/Bumblebee/BEAM platform as obviously better than Python for
         | building production systems. There might be advantages to
         | Python when creating and training models, I'm not sure, but if
         | you already have the models and need to serve more than one,
         | and want the latency to be low so the characteristically slow
         | response feels a little faster, and don't already have a big
         | Kubernetes system for running many Python applications in a
         | distributed manner, then this is for you and it'll be better
         | than good enough until you've created a rather large success.
        
           | ricketycricket wrote:
           | > except if the tokenizer or whatever doesn't follow a
           | particular format but in that case you just upload it to some
           | free web service and make a PR with the result and reference
           | that version hash specifically and it'll work.
           | 
           | May I ask to which service you are referring?
        
         | TJSomething wrote:
         | I feel like a big audience would be people moving away from
         | Spark.
        
           | vvpan wrote:
           | Can you provide a little more context? I am curious.
        
       | bluevlahblah wrote:
       | People have to realise these are mostly for hobby use. It is really
       | hard to get these working with other libraries.
       | 
       | Take Explorer: it's a mess trying to implement dplyr verbs in
       | Elixir. Anyone trying to use it is going to hit its limitations
       | sooner or later. I tried migrating to it from Polars, but it was
       | too frustrating; I gave up after some time.
       | 
       | Why would people use half-baked libraries instead of Python? I
       | will stick to Keras/PyTorch, Polars, etc.
        
         | victorbjorklund wrote:
         | At this point it isn't trying to convert a happy Python user
         | like you, but rather to give tools to teams whose app is already
         | in Elixir and whose devs know Elixir. Instead of bringing Python
         | into the mix, you can use Elixir.
        
         | ch4s3 wrote:
         | I don't think a lot of people are using these in production
         | yet, but you can't become fully baked without spending some
         | time being half baked.
         | 
         | Sometimes though it's nice to have most of your stuff in one
         | language.
        
         | barrell wrote:
         | I've been considering using Elixir for a while. With these
         | libraries, it's now entirely feasible to move my backend over
         | to Elixir, getting rid of 95% of my Python code in the process.
         | The tasks left over that still need Python can easily each fit
         | in their own <100 line file and work just as well using other
         | languages to process the data. Most of it needs to be run on a
         | GPU anyway, so it's already sandboxed wrt deployments.
         | 
         | It's not an all or nothing game, and I rue the day that I
         | decided to do everything in Python just because I needed a
         | couple LoC to call a Python module.
        
         | jolux wrote:
         | Explorer is still pre-1.0.
        
       | mrdoops wrote:
       | IMO the big win for Elixir/Nx/Bumblebee/etc is that you can do
       | batched distributed inference out of the box without deploying
       | anything separate to your app or hitting an API. Massive
       | complexity reduction and you can more easily scale up or down.
       | https://hexdocs.pm/nx/Nx.Serving.html#content
       | 
       | And there's also a scale to 0 story for when you're not using
       | that GPU at all: https://github.com/phoenixframework/flame
       | 
       | 1 language/toolchain. 1 deployable app. Real time and distributed
       | machine learning baked in. 1 dev can go really far.
        
         | 6gvONxR4sf7o wrote:
         | I've been really curious about BEAM languages but never made
         | the leap. How well does it manage heterogeneous compute? I'm
         | used to other languages making me define what happens on CPU vs
         | GPU and defining cross-machine talk around those kinds of
         | considerations.
         | 
         | What parts of that does Elixir (and company) allow me to not
         | write? Is there a good balance between abstractions when it
         | comes to still maybe wanting control over what goes where
         | (heterogeneity)?
         | 
         | Super curious and kinda looking for an excuse here :)
        
           | ricketycricket wrote:
           | May not answer all your questions, but this may be a good
           | starting point: https://hexdocs.pm/nx/Nx.Defn.html
        
           | elbasti wrote:
           | The BEAM is pretty high level, and it's REALLY good at
           | managing distributed compute at the thread or device level.
           | 
           | If you have a parallelizable workflow, it's very easy to
           | make it (properly!) parallel locally, where by "properly" I
           | mean having supervision trees, sane restart behavior, etc.
           | 
           | And once you have that you can extend that parallelism to
           | different nodes in a network (with the same sanity around
           | supervision and discovery) basically for free. Like, one-
           | line-of-code for free.
           | 
           | Nonetheless, it's all message-passing, and so pretty high
           | level. AFAIK it's not designed for parallelizing compute at
           | GPU scale.
           | 
           | That being said, if you have multiple GPUs and multiple
           | machines that have to _coordinate_ between them,
           | Elixir/Erlang is pretty much perfect.
        
           | itronitron wrote:
           | You might want to look at the Java Aparapi project
           | 
           |  _Aparapi allows Java developers to take advantage of the
           | compute power of GPU and APU devices by executing data
           | parallel code fragments on the GPU rather than being confined
           | to the local CPU. It does this by converting Java bytecode to
           | OpenCL at runtime and executing on the GPU, if for any reason
           | Aparapi can't execute on the GPU it will execute in a Java
           | thread pool._
           | 
           | https://aparapi.github.io/
           | 
           | https://github.com/aparapi/aparapi
           | 
           | but avoid the non-official fork which sometimes comes up in
           | search results.
        
       | melodyogonna wrote:
       | MLIR unlocks so much potential for systems that use it.
        
         | tonyhb wrote:
         | MLIR is cool and has an exciting future for sure.
        
       | behnamoh wrote:
       | My experience with Elixir onboarding was meh. Spent hours trying
       | to set up the LSP in VSCode and Neovim. Their pseudo-official LSP
       | (elixir-ls) didn't work at all. I even made a post about it on
       | Reddit, GitHub, and here. No one really knew what was going on.
       | 
       | Even with Haskell you have something like ghcup and you're good
       | to go. Not to mention Rust's amazing Cargo and Go's tooling as
       | well.
       | 
       | So far, Elixir has been even more challenging to just get up and
       | running than Common Lisp!
       | 
       | By the way, the official Elixir website recommends using Homebrew
       | to install it. But almost everyone in the GitHub issues and
       | comments says ASDF is the way to go.
        
         | cess11 wrote:
         | ASDF is easy to use once you've learned a few of the basic
         | commands, and that way you'll have an easy time when new
         | versions come out and you want to check out new features, like
         | the built-in JSON parser, the gradual typing when it drops,
         | things like that.
         | 
         | If you built something useful you might not want to upgrade it
         | just to look at the new stuff, and if you built nothing and
         | just drop into iex a couple of times per year it'll still be
         | easier to pull in the latest BEAM and Elixir versions and play
         | around than figuring out if Homebrew has the new version, and
         | if so, whether it installs nicely over the old or not.
         | 
         | I don't think I've got the LSP running, might check tomorrow.
         | It's OK, the IDE autocompletes some things and for me
         | development basically happens in the REPL and then gets pasted
         | into tests and the project anyway. And iex has good
         | autocomplete and help functions and so on.
         | 
         | Edit: A hurdle might be to install the GUI libraries needed for
         | Observer, but you'll probably be able to search out an
         | incantation for your operating system once you're into doing
         | stuff with process trees.
        
           | behnamoh wrote:
           | I think the Python dev style I adopted can't be easily ported
           | to Elixir. In Python, I rely heavily on an LSP because I want
           | to fiddle with a lot of functions/classes located deeply in
           | libraries. In VSCode, I simply press CMD and click on any
           | function (or in Neovim, I `gf` or `gd` it). I thought it'd
           | make even more sense in Elixir because apparently everything
           | is a module. Am I missing something? How do you use iex
           | efficiently?
        
             | ricketycricket wrote:
             | Sorry you are having so many issues. That has not been my
             | experience with elixir-ls either locally or in Codespaces.
             | Just want to say that when you do get it working, it does
             | indeed have that and many more features. If you are
             | interested, the Elixir slack is very active and helpful and
             | there is a #language-server channel.
        
               | behnamoh wrote:
               | Thank you! It seems the Elixir LSP does provide useful
               | features... I'm worried that even after getting it
               | working on my Mac, the same troubles happen again when I
               | move to remote machines. I'll try the Elixir slack!
        
         | clessg wrote:
         | Yeah, the LSP situation remains a sore point, which is deeply
         | unfortunate. One of the big reasons I like Gleam! Luckily,
         | there are new contenders popping up to hopefully solve the
         | issues with elixir-ls: try
         | https://github.com/elixir-tools/next-ls or
         | https://github.com/lexical-lsp/lexical. They
         | might give a better experience. In fact, the creator of Elixir
         | recently started directly sponsoring next-ls, so it's probably
         | a safe enough bet.
         | 
         | > By the way, the official Elixir website recommends using
         | Homebrew to install it. But almost everyone in the GitHub
         | issues and comments says ASDF is the way to go.
         | 
         | The Elixir website is right. Just use Homebrew until you find a
         | real need for asdf or similar tools. It's far simpler.
         | 
         | asdf (or mise[0]) is merely a way to manage different runtime
         | versions between various projects; you would use it the same
         | way as one might use rbenv/rvm, nvm/n, pyenv, or even
         | Docker/nix, and so on. You don't need it until you have several
         | ongoing projects requiring different runtime versions. If you
         | reach that point, great! It'll be worth the effort then, and it
         | isn't difficult.
         | 
         | Personally, I just use Homebrew Elixir for easy ad-hoc access
         | to iex/livebook. If I truly need reproducible environments,
         | devbox[1] (a sort of nix wrapper) is nice and extremely
         | straightforward.
         | 
         | Tl;dr: Just use Homebrew. If your requirements expand beyond
         | that, you probably have far more challenging problems than
         | installing asdf (or whatever).
         | 
         | [0] https://mise.jdx.dev/dev-tools/comparison-to-asdf.html
         | 
         | [1] https://www.jetify.com/devbox
        
           | behnamoh wrote:
           | Thanks! I just checked out devbox, it's so great to have a
           | more user-friendly nix for people like me who don't want to
           | mess with nix!
           | 
           | Also, great point about Gleam! What I like about it, other
           | than types and JS compilation, is exactly the tooling! The
           | devs clearly understood that tooling is extremely important
           | these days and have focused on that. If it's truly
           | interoperable with Elixir/Erlang libs, I'll probably just use
           | Gleam instead.
        
       | dpflan wrote:
       | To the author, I noticed a typo: "meachine" instead of
       | "machine".
       | 
       | """ These features bring Numerical Elixir and its ability to
       | setup distributed model serving, over CPUs and GPUs, to
       | traditional meachine learning algorithms, allowing developers and
       | data practitioners to tackle a wider number of problems within
       | the Elixir ecosystem. """
        
       ___________________________________________________________________
       (page generated 2024-05-29 23:01 UTC)