[HN Gopher] Micrograd.jl
___________________________________________________________________
Micrograd.jl
Author : the_origami_fox
Score : 134 points
Date : 2024-08-28 07:24 UTC (2 days ago)
(HTM) web link (liorsinai.github.io)
(TXT) w3m dump (liorsinai.github.io)
| huqedato wrote:
| Julia is a splendid, high-performance language. And the most
| overlooked. Such a huge pity and shame that the entire current AI
| ecosystem is built on Python/PyTorch. Python - not a real
| programming language, and interpreted at that... such a huge
| loss of performance compared to Julia.
| FranzFerdiNaN wrote:
| > not a real programming language
|
| Really? Why do you feel the need to say this? Not liking
| Python, sure, but this kind of comment is just stupid elitism.
| What's next, the only REAL programmers are the ones that make
| their own punch cards?
| chasd00 wrote:
| They're just trolling for a reaction. It is indeed a
| ridiculous statement.
| brrrrrm wrote:
| It's all about the kernels tho. The language doesn't matter
| much. For the things that matter, everything is a dispatch to
| some CUDA graph.
|
| I'm not really a fan of this convergence but the old school
| imperative CPU way of thinking about things is dead in this
| space
| adgjlsfhk1 wrote:
| One of the really nice things about Julia for GPU programming
| is that you can write your own kernels. CUDA.jl isn't just a
| wrapper around C kernels. This is why (for example) DiffEqGPU.jl
| is able to be a lot faster than other GPU-based ODE solvers (see
| https://arxiv.org/abs/2304.06835 for details).
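|
| For a flavor, a hand-written kernel is only a few lines. A
| minimal sketch (untested here; the kernel name and launch
| configuration are arbitrary choices of mine):
|
|     using CUDA
|
|     # y .= a .* x .+ y, one GPU thread per element
|     function saxpy_kernel!(y, a, x)
|         i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
|         if i <= length(y)
|             @inbounds y[i] = a * x[i] + y[i]
|         end
|         return nothing
|     end
|
|     x = CUDA.rand(Float32, 1024)
|     y = CUDA.rand(Float32, 1024)
|     @cuda threads=256 blocks=cld(length(y), 256) saxpy_kernel!(y, 2f0, x)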
| atoav wrote:
| Python is not a real programming language? That must come as a
| shocking revelation to the many thousands of people running it
| successfully in production. /s
|
| As someone who programs in C/C++/Python/Rust/JS, you had me
| curious for the first half of the post. But that comment makes
| me wonder about the quality of the rest of what you're saying.
| Y_Y wrote:
| I recognize the use of "not a real language" as traditional
| hyperbole[0]. I have my own gripes with Python, even though it
| pays the bills, but this is just going to set off a load of
| people and is probably bad for discussion quality.
|
| Ironically, it's very hard to write actual low-level parallel
| code (like CUDA) through Python; there's really no choice but
| to call out to Fortran and C libraries for the likes of
| PyTorch.
|
| [0]
| https://en.wikipedia.org/wiki/Real_Programmers_Don't_Use_Pas...
| xaellison wrote:
| As a major Julia enthusiast, I gotta say this is not how you
| get people to check it out, buddy.
| xiaodai wrote:
| I kinda gave up on Julia for deep learning since it's so buggy. I
| am using PyTorch now. Not great but at least it works!
| enkursigilo wrote:
| Can you elaborate a bit more?
| moelf wrote:
| What is so buggy, Julia the language or the deep learning
| libraries in Julia? In either case it would be good to have
| some examples.
| catgary wrote:
| This is an old-ish article about Julia, but from what I can
| tell the core issues with autograd were never fixed:
|
| https://kidger.site/thoughts/jax-vs-julia/
| currymj wrote:
| julia the language is really good. but a lot of core
| infrastructure julia libraries are maintained by some
| overworked grad student.
|
| sometimes that grad student is a brilliantly productive
| programmer and the libraries reach escape velocity and build a
| community, and then you get areas where Julia is state of the
| art like in differential equation solving, or generally other
| areas of "classical" scientific computing.
|
| in other cases the grad student is merely a very good
| programmer, and they just sort of float along being "almost
| but not quite there" for a long time, maybe abandoned
| depending on the maintainer's career path.
|
| the latter case is pretty common in the machine learning
| ecosystem. a lot of people get excited about using a fast
| language for ML, see that Julia can do what they want in a
| really cool way, and then run into some breaking problem or
| missing feature ("will be fixed eventually") after investing
| some time in a project.
| barbarr wrote:
| I didn't find it too buggy personally; in fact it has an
| unexpected level of composability between libraries that I
| found exciting. Stuff "just works". But I felt it lacked
| performance in practical areas such as file I/O and one-off
| development in notebooks (e.g. plotting results), which is
| really important in the initial stages of model development.
|
| (I also remember getting frustrated by frequent uninterruptible
| kernel hangs in Jupyter, but that might have been a skill issue
| on my part. It was definitely friction I don't encounter with
| Python, though. When I was developing in Julia I remember
| feeling anxiety/dread about hitting enter on new cells, double
| and triple checking my code lest I trigger an uninterruptible
| error, have to restart my kernel, and lose all my compilation
| progress, which would mean waiting a long time again to run
| code and generate new graphs.)
| Lyngbakr wrote:
| Did you ever try alternatives to Jupyter like Pluto.jl? I'm
| curious if there's the same sort of friction.
| adgjlsfhk1 wrote:
| Julia does definitely need some love from devs with a strong
| understanding of IO performance. That said, for interactive
| use the compiler has gotten a bunch faster and better at
| caching results in the past few years. On Julia 1.10
| (released about 6 months ago) the time to load Plots.jl and
| display a plot from scratch is 1.6 seconds on my laptop,
| compared to 7.3 seconds in Julia 1.8 (2022).
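|
| (For reference, I'm measuring roughly this in a fresh session;
| exact numbers will of course vary by machine:
|
|     # "time to first plot": load the package, then render
|     @time using Plots
|     @time display(plot(rand(10)))
|
| summing the two reported times.)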
| Tarrosion wrote:
| I'm curious what kind of slow IO is a pain point for you -- I
| was surprised to read this comment because I normally think
| of Julia IO as being pretty fast. I don't doubt there are cases
| where the Julia experience is slower than in other languages;
| I'm just curious what you're encountering, since my experience
| is the opposite.
|
| Tiny example (which blends Julia-the-language and Julia-the-
| ecosystem, for better and worse): I just timed reading the
| most recent CSV I generated in real life, a relatively small
| 14k rows x 19 columns. 10ms in Julia+CSV+DataFrames, 37ms in
| Python+Pandas -- i.e. much faster in Julia, but also not a
| pain point either way.
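|
| (For concreteness, the Julia side of that timing is just the
| following -- with `data.csv` standing in for my real file, and
| timed on a second call to exclude compilation:
|
|     using CSV, DataFrames
|
|     @time df = CSV.read("data.csv", DataFrame)
|
| against `pandas.read_csv("data.csv")` on the Python side.)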
| barbarr wrote:
| My use case was a program involving many calls to an
| external program that generated XYZ-format files to read
| back in (computational chemistry). It's likely I was doing
| something wrong or inefficient, but I remember the whole
| process was rate-limited by this step in a way that it
| wasn't in Python.
| catgary wrote:
| I'm jealous of your experience with its autograd if it "just
| worked" for you. It was a huge pain for me to get it to do
| anything non-trivial.
| pkage wrote:
| Same here. I started my PhD with the full intention of doing
| most of my research with Julia (via Flux[0]), and while things
| worked well enough, there were a few things which made it
| challenging:
|
| - Lack of multi-GPU support,
|
| - some other weird bugs related to autograd which I never fully
| figured out,
|
| - and the killer one: none of my coauthors used Julia, so I
| decided to just go with PyTorch.
|
| PyTorch has been just fine, and it's nice to not have to
| reinvent the wheel for every new model architecture.
|
| [0] https://fluxml.ai/
| catgary wrote:
| Yeah I was pretty enthusiastic about Julia for a year or two,
| even using it professionally. But honestly, JAX gives you
| (almost) everything Julia promises and its automatic
| differentiation is incredibly robust. As Python itself becomes
| a pretty reasonable language (the static typing improvements in
| 3.12, the promise of a JIT compiler) and JAX develops (it now
| has support for dynamic shapes and AOT compilation), I can't
| see why I'd ever go back.
|
| The Julia REPL is incredibly nice though; I do miss that.
| mccoyb wrote:
| Can you link dynamic shape support? Big if true -- but I
| haven't been able to find anything on it.
|
| Edit: I see -- I think you mean exporting lowered StableHLO
| code in a shape-polymorphic format -- from the docs:
| https://jax.readthedocs.io/en/latest/export/shape_poly.html
|
| This is not the thing I usually think when someone says
| dynamic shape support.
|
| In this model, you have to construct a static graph initially
| -- then you're allowed to specify a restricted set of input
| shapes to be symbolic, to avoid the cost of lowering -- but
| you'll still incur the cost of compilation for any new shapes
| which the graph hasn't been specialized for (because those
| shapes affect the array memory layouts, which XLA needs to
| know in order to optimize aggressively).
| catgary wrote:
| It's part of model export/serialization; it's documented
| here:
|
| https://jax.readthedocs.io/en/latest/export/export.html#sup
| p...
|
| Edit: I think you need to look here as well; the Exported
| objects do in fact serialize a function and support shape
| polymorphism:
|
| https://jax.readthedocs.io/en/latest/export/shape_poly.html
| #...
| mccoyb wrote:
| Thanks! See above -- I don't think this is exactly
| dynamic shape support.
|
| My definition might be wrong, but I often think of full
| dynamic shape support as implying something dynamic about
| the computation graph.
|
| For instance, JAX supports a scan primitive -- whose
| length must be statically known. With full dynamic shape
| support, this length might be unknown -- which would mean
| one could express loops with shape dependent size.
|
| As far as I can tell, shape polymorphic exports may sort
| of give you that -- but you incur the cost of
| compilation, which will not be negligible with XLA.
| catgary wrote:
| I think you're right, so it is now as shape polymorphic
| as any framework with an XLA backend can be.
|
| I work with edge devices, so I have also been
| experimenting with IREE for deployment, which can handle
| dynamic shapes (mostly: it stopped working for a
| version but may be working again in the development
| branch).
| mccoyb wrote:
| I can't comment on the lowest leaf of this thread, but
| thanks for the update! I'll read through this section and see
| if my intuitions are wrong or right.
| adgjlsfhk1 wrote:
| IMO the Python JIT support won't help very much. Python
| currently is ~50x slower than "fast" languages, so even if
| the JIT provides a 3x speedup, pure Python will still be too
| slow for anything that needs performance. Sure, it will help
| on the margins, but a JIT can't magically make Python fast.
| catgary wrote:
| I'm only really thinking about ML/scientific computing
| workflows where all the heavy lifting happens in
| jax/torch/polars.
| adgjlsfhk1 wrote:
| Right, and in those cases a Python JIT will do nothing
| for your performance, because all the computation is
| happening in C/CUDA anyway.
| ssivark wrote:
| I think it would be more useful to list concrete
| bugs/complaints that the Julia devs could address.
| Blanket/vague claims like "Julia for deep learning [...] so
| buggy" are unfalsifiable and un-addressable. They promote
| gossip with tribal dynamics rather than helping ecosystems
| improve and helping people pick the right tools for their
| needs. This is even more so with pile-on second-hand claims
| (though the above comment might be first-hand, but potentially
| out-of-date).
|
| Also, it's now pretty easy to call Python from Julia (and vice
| versa) [1]. I haven't used it for deep learning, but I've been
| using it to implement my algorithms in Julia while making use
| of Jax-based libraries from Python, and it's certainly quite
| smooth and ergonomic.
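|
| For a flavor of how smooth it is, a minimal sketch (assuming
| numpy is available in the linked Python environment; the same
| pattern works for Jax modules, and the exact conversion call
| may differ by version):
|
|     using PythonCall
|
|     np = pyimport("numpy")                # import a Python module
|     x = np.arange(10)                     # a Python object, used from Julia
|     y = pyconvert(Vector, np.square(x))   # convert back to a Julia Vector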
|
| [1] https://juliapy.github.io/PythonCall.jl/
| xyproto wrote:
| Why did Julia select a package naming convention that makes every
| project name look like a filename?
| eigenspace wrote:
| It makes a Julia package name very recognizable and easily
| searchable. It's actually something I really miss when I'm
| trying to look up packages in other languages.
| NeuroCoder wrote:
| I thought that was weird too, but then I realized it was one of
| the most useful tools for searching stuff online and getting
| exactly what I wanted.
| infogulch wrote:
| Imagine searching for Plots.jl, Symbolics.jl, CUDA.jl if they
| didn't have the ".jl". I wish more package ecosystems used a
| convention like this.
| stackghost wrote:
| Presumably for similar reasons to JavaScript, with names like
| Next.js, etc.
| anon389r58r58 wrote:
| Almost feels like a paradox of Julia at this point: on the one
| hand Julia really needs a stable, high-performance AD engine,
| but on the other hand it seems to be fairly easy to get a
| minimal AD package off the ground (see the sketch below).
|
| And so the perennial cycle continues: another Julia AD package
| emerges and ignores all/most previous work in order to claim
| novelty.
|
| Without claiming a complete list: ReverseDiff.jl,
| ForwardDiff.jl, Zygote.jl, Enzyme.jl, Tangent.jl, Diffractor.jl,
| and many more whose names have disappeared in the short history
| of Julia...
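|
| To be fair, the tiny core is easy to see. An illustrative
| scalar reverse-mode sketch in the micrograd style (my own
| toy code, not taken from any of the packages above):
|
|     # each node stores a value, a gradient slot, and a backward rule
|     mutable struct Value
|         data::Float64
|         grad::Float64
|         backward::Function
|         parents::Vector{Value}
|     end
|     Value(x::Real) = Value(Float64(x), 0.0, () -> nothing, Value[])
|
|     function Base.:+(a::Value, b::Value)
|         out = Value(a.data + b.data)
|         out.parents = [a, b]
|         out.backward = () -> (a.grad += out.grad; b.grad += out.grad)
|         out
|     end
|
|     function Base.:*(a::Value, b::Value)
|         out = Value(a.data * b.data)
|         out.parents = [a, b]
|         out.backward = () -> (a.grad += b.data * out.grad;
|                               b.grad += a.data * out.grad)
|         out
|     end
|
|     # topological sort, then apply the backward rules in reverse
|     function backprop!(root::Value)
|         order, seen = Value[], Set{Value}()
|         function visit(v)
|             v in seen && return
|             push!(seen, v)
|             foreach(visit, v.parents)
|             push!(order, v)
|         end
|         visit(root)
|         root.grad = 1.0
|         foreach(v -> v.backward(), reverse(order))
|     end
|
|     x, y = Value(3.0), Value(4.0)
|     z = x * y + x
|     backprop!(z)
|     x.grad    # 5.0, i.e. y + 1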
| 0cf8612b2e1e wrote:
| I do not think this is meant to be a "real" library, but a
| didactic exercise inspired by Andrej Karpathy's Python
| implementation.
| nextos wrote:
| This is a didactic exercise. Julia is fantastic, but lacks
| funding to develop a differentiable programming ecosystem
| that can compete with Torch or Jax. These two have corporate
| juggernauts backing them. Still, it is quite remarkable how
| far Julia has come with so few resources.
|
| Having an alternative to Python would benefit the ML
| ecosystem, which is too much of a monoculture right now.
| Julia has some really interesting statistics, probabilistic
| programming and physics-informed ML packages.
| anon389r58r58 wrote:
| I think you are asking an ill-posed question in parts.
| Julia has a lot of great things, and needs to continue
| evolving to find an even better fit amongst the many
| programming languages available today and sustain itself
| long-term.
|
| Emulating Python's ML ecosystem is not going to be a viable
| strategy. The investment into the Python-based standard is
| just too large.
|
| What I could see happening, though, is that the continuous
| evolution of the ML ecosystem will further abstract
| components of the software stack down to an MLIR/LLVM
| abstraction level, at which point something like Julia could
| also use this componentry. Sort of a continuum of
| components, where the "frontend" language and associated
| programming style is the remaining choice for the user to
| make.
| mcabbott wrote:
| That's what Reactant.jl is aiming to do: take the LLVM
| code from Julia and pipe it to XLA, where it can benefit
| from all the investment which makes Jax fast.
|
| Same author as the more mature Enzyme.jl, which is AD
| done at the LLVM level.
|
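| Usage is deliberately boring. A sketch (the exact API has
| shifted a bit across Enzyme.jl versions):
|
|     using Enzyme
|
|     f(x) = x^2 + 3x
|     # reverse-mode derivative of f at 2.0
|     autodiff(Reverse, f, Active, Active(2.0))  # ((7.0,),)
|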
| Julia's flexibility makes it easy to write a simple AD
| system, but perhaps we're learning that it's the wrong
| level to write the program transformations needed for
| really efficient AD. Perhaps the same is true of "run
| this on a GPU, or 10" transformations.
| thetwentyone wrote:
| Odd that the author excluded ForwardDiff.jl and Zygote.jl, both
| of which get a lot of mileage in the Julia AD world. Nonetheless,
| awesome tutorial and great to see more Julia content like this!
___________________________________________________________________
(page generated 2024-08-30 23:01 UTC)