[HN Gopher] Implementing LLaMA3 in 100 Lines of Pure Jax
       ___________________________________________________________________
        
       Implementing LLaMA3 in 100 Lines of Pure Jax
        
       Author : jxmorris12
       Score  : 147 points
       Date   : 2025-02-19 02:37 UTC (20 hours ago)
        
 (HTM) web link (saurabhalone.com)
 (TXT) w3m dump (saurabhalone.com)
        
       | hustwindmaple1 wrote:
       | Cool blog
        
       | heyitsguay wrote:
       | Unreadable in portrait mode on mobile. The text column is way too
       | narrow, should be an easy fix!
        
         | abhgh wrote:
         | It's not just the width of the column - there are annotations
         | on certain lines (that appear on a right "margin") that don't
         | show up on mobile. I think that makes it not an easy fix, but
         | to your larger point, this is not very mobile friendly. It
         | looks quite good on a desktop though.
        
         | kccqzy wrote:
         | People have long forgotten that mobile browsers handle wide
         | content by zooming. If you are making a website but don't
         | bother optimizing it for mobile, leave off the viewport <meta>
         | element.
        
       | ein0p wrote:
       | It is a very bad idea to handle the KV cache in Jax naively like
       | that. Jax requires static shapes. You're creating dynamic shapes
       | there, causing a ton of recompilation.
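
A minimal sketch of the point above, not the blog's actual code: it
contrasts a cache that grows by concatenation with a preallocated,
fixed-shape buffer written in place. The names MAX_LEN, HEAD_DIM,
append_static, append_growing and trace_counts are made up for this
illustration. The counters rely on Python side effects running only
while JAX traces a function, so they count retraces (each of which
leads to a fresh compilation).

```python
import jax
import jax.numpy as jnp

MAX_LEN, HEAD_DIM = 16, 4  # hypothetical sizes, chosen for the demo
trace_counts = {"static": 0, "growing": 0}


@jax.jit
def append_static(cache, new_kv, pos):
    # Python side effect: executes only when JAX (re)traces this function.
    trace_counts["static"] += 1
    # Write new_kv into a fixed-shape buffer at row `pos`; shape never changes.
    return jax.lax.dynamic_update_slice_in_dim(cache, new_kv, pos, axis=0)


@jax.jit
def append_growing(cache, new_kv):
    trace_counts["growing"] += 1
    # The output is one row longer each step, so the next call sees a new shape.
    return jnp.concatenate([cache, new_kv], axis=0)


def run(steps=5):
    static_cache = jnp.zeros((MAX_LEN, HEAD_DIM))
    growing_cache = jnp.zeros((0, HEAD_DIM))
    for pos in range(steps):
        new_kv = jnp.full((1, HEAD_DIM), float(pos + 1))
        static_cache = append_static(static_cache, new_kv, pos)
        growing_cache = append_growing(growing_cache, new_kv)
    return static_cache, growing_cache


if __name__ == "__main__":
    run(5)
    print(trace_counts)
```

With the fixed-shape buffer the function is traced once; the
concatenating version is traced once per distinct cache length. A real
decoder would also need to mask the unused rows of the buffer in
attention, which this sketch leaves out.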
        
         | bravura wrote:
         | Is there any automatic way to get warned against these
         | antipatterns?
        
           | sega_sai wrote:
           | You can see each compilation if you set the JAX_LOG_COMPILES
           | environment variable or use a low enough logging level.
        
             | bravura wrote:
             | Sorry, not to belabor this point.
             | 
             | Would that suggest to you what you did wrong? Or purely
             | show you what you got right? How chatty is this variable?
        
               | sega_sai wrote:
               | I used this to see if something is repeatedly compiled.
               | I.e. I have code that runs in a loop, and you can
               | immediately see whether something is compiled only once
               | or every time (and it produces a lot of output). I'm not
               | saying this is the best way to do it though; it just
               | worked for me.
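
A small sketch of the workflow described above, under the assumption
that the jax_log_compiles config option is used instead of the
JAX_LOG_COMPILES environment variable (both should turn on the same
logging). The function name total and the loop sizes are made up for
illustration. The exact log text varies by JAX version, so none is
shown here.

```python
import jax
import jax.numpy as jnp

# Same effect as setting JAX_LOG_COMPILES=1 before starting Python.
jax.config.update("jax_log_compiles", True)


@jax.jit
def total(x):
    return jnp.sum(x)


def demo():
    results = []
    # Lengths 3, 3, 4: the second call reuses the first compilation,
    # the third has a new input shape and triggers another one.
    for n in (3, 3, 4):
        results.append(float(total(jnp.ones(n))))
    return results


if __name__ == "__main__":
    print(demo())
```

When run in a loop like this, log output that repeats on every
iteration is the sign that something is being recompiled each time.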
        
         | magicalhippo wrote:
         | The blog mentions it's not for production use. This sounds like
         | one thing you'd want to change.
         | 
         | I was curious what else made it not fit for production.
         | Anything fundamental or just minor issues like this?
        
         | YetAnotherNick wrote:
         | Just don't use jit in generation and it would be fine. Of
         | course there is some performance penalty but in my experience
         | jit is oversold and the difference is something like ~10-30%.
         | 
         | Also in any case to get optimized code you need flash attention
         | and many other tricks.
        
       | brcmthrowaway wrote:
       | these anime kids are going to take everyone's job
        
         | rfl890 wrote:
         | said no one ever
        
         | ge96 wrote:
         | Anya from Spy x Family
        
       | bravura wrote:
       | To the poster who wrote: "Hey Saurabh, will you be willing to
       | teach me this on a call? I'm willing to pay for it (im not rich,
       | so, dont expect much please). I will be having a lot of
       | questions, mostly related to core concepts of transformers and
       | jax in general."
       | 
       | This is the wrong way to ask for help.
       | 
       | Instead, consider offering your help and time apprenticing and
       | learning along the way. Can't code that well? Write test cases
       | and clean up. Or help with blog writing, etc. You certainly have
       | some valuable skill you could trade up.
        
         | saagarjha wrote:
         | I mean I'm no Saurabh but that didn't seem too unreasonable to
         | me? In fact I'll put my money where my mouth is and offer half
         | an hour for free just to spite you
        
       | zengid wrote:
       | There is some research in accelerating Reinforcement Learning by
       | implementing the simulator on the GPU using Jax. Really neat. I'm
       | curious if this could be done with Mojo, too?
       | 
       | https://arxiv.org/html/2311.10090v5
       | 
       | https://youtu.be/Jr_nGkCG3og?si=WKS7Hbz13A0QH4nf&t=520
        
       | ge96 wrote:
       | "focuses on the soul of pure functional programming which makes
       | it more cool"
       | 
       | This is tangential to this post's main point but if you're trying
       | for mass adoption this can go badly. Case in point, a hardware
       | company I backed decided to write their code in Haskell. Why?
       | "Because it's cool." Now the people who are trying to
       | modify/work with it have to deal with Haskell vs. a general
       | purpose language like C++, idk...
       | 
       | edit: I also realize most of this code is python but yeah
        
         | drdaeman wrote:
         | > deal with Haskell vs. a general purpose language like C++
         | 
         | What's the actual problem? Company decided to use Haskell
         | (which is also a general-purpose language) then hired people
         | who don't know it?
         | 
         | If so, hire a bunch of Pythonistas to work on a Rails project
         | and you'll have similar kind of struggles (and it won't mean
         | that Python or Ruby are somehow bad, it'll be an almost
         | entirely non-technical issue).
        
           | ge96 wrote:
           | The problem is it's intended to be an open source device, so
           | Haskell would be harder to work on than something simpler
           | like C++.
           | 
           | Again, my point is about adoption, hence offering multiple
           | languages in most products, like Stripe for example.
           | 
           | edit: it's alright, when they actually ship these things
           | (after putting down $3.5K) I hope to take it upon myself
           | to port it to C++
           | 
           | edit: "general purpose" is probably the wrong way to put it,
           | Haskell is harder to read than C++ is my pov
        
       ___________________________________________________________________
       (page generated 2025-02-19 23:01 UTC)