[HN Gopher] Micrograd.jl
       ___________________________________________________________________
        
       Micrograd.jl
        
       Author : the_origami_fox
       Score  : 134 points
       Date   : 2024-08-28 07:24 UTC (2 days ago)
        
 (HTM) web link (liorsinai.github.io)
 (TXT) w3m dump (liorsinai.github.io)
        
       | huqedato wrote:
        | Julia is a splendid, high-performance language, and the most
        | overlooked one. It's a huge pity and shame that the entire
        | current AI ecosystem is built on Python/PyTorch. Python is not a
        | real programming language, and it's interpreted on top of
        | that... such a huge loss of performance compared to Julia.
        
         | FranzFerdiNaN wrote:
         | > not a real programming language
         | 
         | Really? Why do you feel the need to say this? Not liking
          | Python, sure, but this kind of comment is just stupid elitism.
         | What's next, the only REAL programmers are the ones that make
         | their own punch cards?
        
           | chasd00 wrote:
            | They're just trolling for a reaction. It is indeed a
           | ridiculous statement.
        
         | brrrrrm wrote:
          | It's all about the kernels, though. The language doesn't
          | matter much. For the things that matter, everything is a
          | dispatch to some CUDA graph.
          | 
          | I'm not really a fan of this convergence, but the old-school
          | imperative CPU way of thinking about things is dead in this
          | space.
        
           | adgjlsfhk1 wrote:
            | One of the really nice things about Julia for GPU programming
            | is that you can write your own kernels. CUDA.jl isn't just a
            | wrapper around prewritten C kernels. This is why (for
            | example) DiffEqGPU.jl is able to be a lot faster than other
            | GPU-based ODE solvers (see https://arxiv.org/abs/2304.06835
            | for details).
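            | 
            | As a rough illustration (not from the paper above; the kernel
            | and sizes here are made up), a hand-written CUDA.jl kernel
            | can look roughly like this:
            | 
            |     using CUDA
            | 
            |     # each GPU thread updates one element: y[i] += a * x[i]
            |     function axpy_kernel!(y, x, a)
            |         i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
            |         if i <= length(y)
            |             @inbounds y[i] += a * x[i]
            |         end
            |         return nothing
            |     end
            | 
            |     x = CUDA.rand(Float32, 1024)
            |     y = CUDA.rand(Float32, 1024)
            |     nthreads = 256
            |     nblocks = cld(length(y), nthreads)
            |     @cuda threads=nthreads blocks=nblocks axpy_kernel!(y, x, 2.0f0)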
        
         | atoav wrote:
          | Python is not a real programming language? That must come as a
          | shocking revelation to the many thousands of people running it
          | successfully in production. /s
         | 
          | As someone who programs in C/C++/Python/Rust/JS, you had me
          | curious in the first half of the post. But that comment makes
          | me wonder about the quality of the rest of what you're saying.
        
         | Y_Y wrote:
          | I recognize the use of "not a real language" as traditional
          | hyperbole[0]. I have my own gripes with Python, even though it
          | pays the bills, but this kind of comment is just going to set
          | off a load of people and is probably bad for discussion
          | quality.
          | 
          | Ironically, it's very hard to write actual low-level parallel
          | code (like CUDA) in Python; there's really no choice but to
          | call out to Fortran and C libraries, as the likes of PyTorch
          | do.
         | 
         | [0]
         | https://en.wikipedia.org/wiki/Real_Programmers_Don't_Use_Pas...
        
         | xaellison wrote:
          | As a major Julia enthusiast, I gotta say this is not how you
          | get people to check it out, buddy.
        
       | xiaodai wrote:
       | I kinda gave up on Julia for deep learning since it's so buggy. I
       | am using PyTorch now. Not great but at least it works!
        
         | enkursigilo wrote:
         | Can you elaborate a bit more?
        
         | moelf wrote:
          | What is so buggy, Julia the language or the deep learning
          | libraries in Julia? In either case it would be good to have
         | some examples.
        
           | catgary wrote:
           | This is an old-ish article about Julia, but from what I can
           | tell the core issues with autograd were never fixed:
           | 
           | https://kidger.site/thoughts/jax-vs-julia/
        
           | currymj wrote:
            | Julia the language is really good. But a lot of core
            | infrastructure Julia libraries are maintained by some
            | overworked grad student.
           | 
            | Sometimes that grad student is a brilliantly productive
            | programmer, and the libraries reach escape velocity and build
            | a community, and then you get areas where Julia is state of
            | the art, like differential equation solving or, more
            | generally, other areas of "classical" scientific computing.
           | 
            | In other cases the grad student is merely a very good
            | programmer, and the libraries just sort of float along being
            | "almost but not quite there" for a long time, maybe getting
            | abandoned depending on the maintainer's career path.
           | 
            | The latter case is pretty common in the machine learning
            | ecosystem. A lot of people get excited about using a fast
            | language for ML, see that Julia can do what they want in a
            | really cool way, and then run into some breaking problem or
            | missing feature ("will be fixed eventually") after investing
            | some time in a project.
        
         | barbarr wrote:
          | I didn't find it too buggy personally; in fact, it has an
         | unexpected level of composability between libraries that I
         | found exciting. Stuff "just works". But I felt it lacked
         | performance in practical areas such as file I/O and one-off
         | development in notebooks (e.g. plotting results), which is
         | really important in the initial stages of model development.
         | 
         | (I also remember getting frustrated by frequent uninterruptible
         | kernel hangs in Jupyter, but that might have been a skill issue
          | on my part. But it was definitely a friction I don't encounter
          | with Python. When I was developing in Julia I remember feeling
          | anxiety/dread about hitting enter on new cells, double- and
          | triple-checking my code lest I trigger an uninterruptible
          | error and have to restart my kernel and lose all my compilation
          | progress, meaning I'd have to wait a long time again to run
          | code and generate new graphs.)
        
           | Lyngbakr wrote:
           | Did you ever try alternatives to Jupyter like Pluto.jl? I'm
           | curious if there's the same sort of friction.
        
           | adgjlsfhk1 wrote:
           | Julia does definitely need some love from devs with a strong
           | understanding of IO performance. That said, for interactive
           | use the compiler has gotten a bunch faster and better at
           | caching results in the past few years. On Julia 1.10
           | (released about 6 months ago) the time to load Plots.jl and
           | display a plot from scratch is 1.6 seconds on my laptop
            | compared to 7.3 seconds in Julia 1.8 (2022).
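            | 
            | (One rough way to measure this yourself -- a sketch, and
            | exact numbers will of course vary by machine and version:
            | 
            |     @time using Plots               # package load time
            |     @time display(plot(rand(10)))   # time to first plot
            | 
            | run in a fresh session so nothing is already compiled.)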
        
           | Tarrosion wrote:
           | I'm curious what kind of slow IO is a pain point for you -- I
           | was surprised to read this comment because I normally think
            | of Julia IO as being pretty fast. I don't doubt there are cases
           | where the Julia experience is slower than in other languages,
           | I'm just curious what you're encountering since my experience
           | is the opposite.
           | 
           | Tiny example (which blends Julia-the-language and Julia-the-
           | ecosystem, for better and worse): I just timed reading the
           | most recent CSV I generated in real life, a relatively small
           | 14k rows x 19 columns. 10ms in Julia+CSV+DataFrames, 37ms in
            | Python+Pandas... i.e. much faster in Julia, but also not a pain
           | point either way.
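            | 
            | For reference, the Julia side of that comparison is roughly
            | (a sketch; "data.csv" is a placeholder, and the first call
            | includes compilation, so time a second call):
            | 
            |     using CSV, DataFrames
            |     df = CSV.read("data.csv", DataFrame)   # warm up
            |     @time CSV.read("data.csv", DataFrame)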
        
             | barbarr wrote:
              | My use case was a program involving many calls to an
              | external program that generated XYZ files (a computational
              | chemistry format) to read back in. It's likely I was doing
              | something wrong or inefficient, but I remember the whole
              | process was rate-limited by this step in a way that the
              | Python version wasn't.
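              | 
              | For anyone curious, a bare-bones XYZ reader in Julia looks
              | roughly like the sketch below (the helper name and layout
              | are made up, and this is not necessarily the fast way to
              | do it):
              | 
              |     # XYZ format: line 1 = atom count, line 2 = comment,
              |     # then one "element x y z" line per atom
              |     function read_xyz(path)
              |         lines = readlines(path)
              |         natoms = parse(Int, lines[1])
              |         symbols = Vector{String}(undef, natoms)
              |         coords = Matrix{Float64}(undef, 3, natoms)
              |         for i in 1:natoms
              |             fields = split(lines[i + 2])
              |             symbols[i] = fields[1]
              |             coords[:, i] = parse.(Float64, fields[2:4])
              |         end
              |         return symbols, coords
              |     end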
        
           | catgary wrote:
           | I'm jealous of your experience with its autograd if it "just
           | worked" for you. It was a huge pain for me to get it to do
           | anything non-trivial.
        
         | pkage wrote:
         | Same here. I started my PhD with the full intention of doing
         | most of my research with Julia (via Flux[0]), and while things
          | worked well enough, there were a few things that made it
         | challenging:
         | 
         | - Lack of multi-GPU support,
         | 
          | - some other weird bugs related to autograd which I never fully
         | figured out,
         | 
         | - and the killer one: none of my coauthors used Julia, so I
         | decided to just go with PyTorch.
         | 
         | PyTorch has been just fine, and it's nice to not have to
          | reinvent the wheel for every new model architecture.
         | 
         | [0] https://fluxml.ai/
        
         | catgary wrote:
         | Yeah I was pretty enthusiastic about Julia for a year or two,
         | even using it professionally. But honestly, JAX gives you
          | (almost) everything Julia promises, and its automatic
         | differentiation is incredibly robust. As Python itself becomes
         | a pretty reasonable language (the static typing improvements in
         | 3.12, the promise of a JIT compiler) and JAX develops (it now
          | has support for dynamic shapes and AOT compilation), I can't
          | see why I'd ever go back.
         | 
          | The Julia REPL is incredibly nice though; I do miss that.
        
           | mccoyb wrote:
            | Can you link to the dynamic shape support? Big if true -- but
            | I haven't been able to find anything on it.
           | 
           | Edit: I see -- I think you mean exporting lowered StableHLO
            | code in a shape-polymorphic format -- from the docs:
           | https://jax.readthedocs.io/en/latest/export/shape_poly.html
           | 
            | This is not the thing I usually think of when someone says
           | dynamic shape support.
           | 
           | In this model, you have to construct a static graph initially
            | -- then you're allowed to specify a restricted set of input
           | shapes to be symbolic, to avoid the cost of lowering -- but
           | you'll still incur the cost of compilation for any new shapes
           | which the graph hasn't been specialized for (because those
           | shapes affect the array memory layouts, which XLA needs to
            | know in order to be aggressive).
        
             | catgary wrote:
              | It's part of model export/serialization; it is documented
             | here:
             | 
             | https://jax.readthedocs.io/en/latest/export/export.html#sup
             | p...
             | 
              | Edit: I think you need to look here as well; the Exported
             | objects do in fact serialize a function and support shape
             | polymorphism:
             | 
             | https://jax.readthedocs.io/en/latest/export/shape_poly.html
             | #...
        
               | mccoyb wrote:
               | Thanks! See above -- I don't think this is exactly
               | dynamic shape support.
               | 
               | My definition might be wrong, but I often think of full
               | dynamic shape support as implying something dynamic about
               | the computation graph.
               | 
               | For instance, JAX supports a scan primitive -- whose
               | length must be statically known. With full dynamic shape
               | support, this length might be unknown -- which would mean
                | one could express loops with shape-dependent size.
               | 
                | As far as I can tell, shape-polymorphic exports may sort
               | of give you that -- but you incur the cost of
               | compilation, which will not be negligible with XLA.
        
               | catgary wrote:
                | I think you're right, so it is now as shape-polymorphic
                | as any framework with an XLA backend can be.
                | 
                | I work with edge devices, so I have also been
                | experimenting with IREE for deployment, which can handle
                | dynamic shapes (at times; it stopped working for a
                | version but may be working again in the development
                | branch).
        
               | mccoyb wrote:
               | I can't comment on the lowest leaf of this thread, but
                | thanks for the update! I'll read through this section and see
               | if my intuitions are wrong or right.
        
           | adgjlsfhk1 wrote:
            | IMO the Python JIT support won't help very much. Python
            | currently is ~50x slower than "fast" languages, so even with
            | a 3x speedup from the JIT, pure Python would still be roughly
            | 17x slower -- too slow for anything that needs performance.
            | Sure, it will help at the margins, but a JIT can't magically
            | make Python fast.
        
             | catgary wrote:
             | I'm only really thinking about ML/scientific computing
             | workflows where all the heavy lifting happens in
             | jax/torch/polars.
        
               | adgjlsfhk1 wrote:
                | Right, and in those cases a Python JIT will do nothing
                | for your performance, because all the computation is
                | happening in C/CUDA anyway.
        
         | ssivark wrote:
          | I think it would be more useful to list concrete
          | bugs/complaints that the Julia devs could address.
          | Blanket/vague claims like "Julia for deep learning [...] so
          | buggy" are unfalsifiable and un-addressable. They promote
          | gossip with tribal dynamics rather than helping ecosystems
          | improve and helping people pick the right tools for their
          | needs. This is even more so with pile-on second-hand claims
          | (though the above comment might be first-hand, if potentially
          | out-of-date).
          | 
          | Also, it's now pretty easy to call Python from Julia (and vice
          | versa) [1]. I haven't used it for deep learning, but I've been
          | using it to implement my algorithms in Julia while making use
          | of Jax-based libraries from Python, and it's certainly quite
          | smooth and ergonomic.
         | 
         | [1] https://juliapy.github.io/PythonCall.jl/
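          | 
          | Calling a Python library from Julia with PythonCall.jl looks
          | roughly like this (a sketch; it assumes the Python package is
          | available in the environment PythonCall is linked against):
          | 
          |     using PythonCall
          |     np = pyimport("numpy")
          |     x = np.linspace(0, 1, 5)             # a Python object
          |     y = pyconvert(Vector{Float64}, x)    # back to a Julia array
          |     @show y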
        
       | xyproto wrote:
       | Why did Julia select a package naming convention that makes every
       | project name look like a filename?
        
         | eigenspace wrote:
          | It makes a Julia package name very recognizable and easily
         | searchable. It's actually something I really miss when I'm
         | trying to look up packages in other languages.
        
         | NeuroCoder wrote:
          | I thought that was weird too, but then I realized it was one of
         | the most useful tools for searching stuff online and getting
         | exactly what I wanted.
        
         | infogulch wrote:
         | Imagine searching for Plots.jl, Symbolics.jl, CUDA.jl if they
         | didn't have the ".jl". I wish more package ecosystems used a
         | convention like this.
        
         | stackghost wrote:
         | Presumably for similar reasons to JavaScript, with names like
         | Next.js, etc.
        
       | anon389r58r58 wrote:
        | It almost feels like a paradox of Julia at this point: on the
        | one hand, Julia really needs a stable, high-performance AD
        | engine, but on the other hand it seems to be fairly easy to get
        | a minimal AD package off the ground.
       | 
       | And so the perennial cycle continues and another Julia AD-package
       | emerges, and ignores all/most previous work in order to claim
       | novelty.
       | 
        | Without claiming a complete list: ReverseDiff.jl,
        | ForwardDiff.jl, Zygote.jl, Enzyme.jl, Tangent.jl, Diffractor.jl,
        | and many more whose names have disappeared in the short history
        | of Julia...
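        | 
        | (For anyone unfamiliar, the user-facing API of most of these is
        | tiny -- roughly:
        | 
        |     using ForwardDiff, Zygote
        |     f(x) = sum(abs2, x)
        |     ForwardDiff.gradient(f, [1.0, 2.0])   # forward mode
        |     Zygote.gradient(f, [1.0, 2.0])[1]     # reverse mode
        | 
        | -- the hard part is making that work fast and reliably on
        | arbitrary Julia code.)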
        
         | 0cf8612b2e1e wrote:
         | I do not think this is meant to be a "real" library, but a
          | didactic exercise inspired by Andrej Karpathy's Python
         | implementation.
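          | 
          | The core micrograd idea fits in a handful of lines -- not the
          | article's actual code, just a sketch of the approach: a mutable
          | scalar node that records its parents and a closure for pushing
          | gradients backward, e.g. for multiplication:
          | 
          |     mutable struct Value
          |         data::Float64
          |         grad::Float64
          |         backward::Function
          |         parents::Vector{Value}
          |     end
          |     Value(x::Real) = Value(x, 0.0, () -> nothing, Value[])
          | 
          |     function Base.:*(a::Value, b::Value)
          |         out = Value(a.data * b.data)
          |         out.parents = [a, b]
          |         out.backward = () -> begin
          |             a.grad += b.data * out.grad   # d(ab)/da = b
          |             b.grad += a.data * out.grad   # d(ab)/db = a
          |         end
          |         return out
          |     end
          | 
          | A full backward pass then just topologically sorts the nodes
          | and calls each node's backward closure in reverse order.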
        
           | nextos wrote:
           | This is a didactic exercise. Julia is fantastic, but lacks
           | funding to develop a differentiable programming ecosystem
           | that can compete with Torch or Jax. These two have corporate
           | juggernauts backing them. Still, it is quite remarkable how
           | far Julia has got with few resources.
           | 
           | Having an alternative to Python would benefit the ML
           | ecosystem, which is too much of a monoculture right now.
           | Julia has some really interesting statistics, probabilistic
           | programming and physics-informed ML packages.
        
             | anon389r58r58 wrote:
              | I think the question you are asking is partly ill-posed.
             | Julia has a lot of great things, and needs to continue
             | evolving to find an even better fit amongst the many
             | programming languages available today and sustain itself
             | long-term.
             | 
             | Emulating Python's ML ecosystem is not going to be a viable
             | strategy. The investment into the Python-based standard is
             | just too large.
             | 
              | What I could see happening, though, is that the continuous
              | evolution of the ML ecosystem will further abstract
              | components of the software stack down to an MLIR/LLVM
              | abstraction level, at which point something like Julia
              | could also use this componentry. Sort of a continuum of
              | components, where the "frontend" language and associated
              | programming style are the remaining choice left to the
              | user.
        
               | mcabbott wrote:
                | That's what Reactant.jl is aiming to do: take the LLVM
                | code from Julia and pipe it to XLA, where it can benefit
                | from all the investment that makes JAX fast.
               | 
               | Same author as the more mature Enzyme.jl, which is AD
               | done at the LLVM level.
               | 
               | Julia's flexibility makes it easy to write a simple AD
               | system, but perhaps we're learning that it's the wrong
               | level to write the program transformations needed for
               | really efficient AD. Perhaps the same is true of "run
               | this on a GPU, or 10" transformations.
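                | 
                | For context, Enzyme's user-facing call looks roughly like
                | this (a sketch -- the exact signature has shifted a bit
                | between versions):
                | 
                |     using Enzyme
                |     f(x) = x^2
                |     autodiff(Reverse, f, Active, Active(3.0))  # ((6.0,),)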
        
       | thetwentyone wrote:
       | Odd that the author excluded ForwardDiff.jl and Zygote.jl, both
       | of which get a lot of mileage in the Julia AD world. Nonetheless,
       | awesome tutorial and great to see more Julia content like this!
        
       ___________________________________________________________________
       (page generated 2024-08-30 23:01 UTC)