[HN Gopher] Implementing LLaMA3 in 100 Lines of Pure Jax
___________________________________________________________________
Implementing LLaMA3 in 100 Lines of Pure Jax
Author : jxmorris12
Score : 147 points
Date : 2025-02-19 02:37 UTC (20 hours ago)
(HTM) web link (saurabhalone.com)
(TXT) w3m dump (saurabhalone.com)
| hustwindmaple1 wrote:
| Cool blog
| heyitsguay wrote:
| Unreadable in portrait mode on mobile. The text column is way too
| narrow; it should be an easy fix!
| abhgh wrote:
| It's not just the width of the column - there are annotations
| on certain lines (that appear on a right "margin") that don't
| show up on mobile. I think that makes it not an easy fix, but
| to your larger point, this is not very mobile friendly. It
| looks quite good on a desktop though.
| kccqzy wrote:
| People have long forgotten that mobile browsers handle wide
| content by zooming. If you are making a website but don't
| bother optimizing it for mobile, leave off the viewport <meta>
| element.
| ein0p wrote:
| It is a very bad idea to handle the KV cache in JAX naively like
| that. JAX requires static shapes: jit compiles one program per
| input shape. You're creating dynamic shapes there, causing a ton
| of recompilation.
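The blog's code isn't reproduced in this thread, so the following is only a generic sketch of the usual static-shape alternative. It assumes the naive version grows the cache by concatenating one token per step. Instead, it preallocates a fixed-size cache, writes each new token into slot `pos`, and masks out unused slots rather than slicing them away, so every shape stays the same across decoding steps. The sizes (`MAX_LEN`, `N_HEADS`, `HEAD_DIM`) and the helper names are hypothetical.

```python
import jax
import jax.numpy as jnp

# Hypothetical sizes, chosen only for illustration.
MAX_LEN, N_HEADS, HEAD_DIM = 16, 2, 4

def init_cache():
    # Preallocate the whole cache once; its shape never changes afterwards.
    shape = (MAX_LEN, N_HEADS, HEAD_DIM)
    return jnp.zeros(shape), jnp.zeros(shape)

@jax.jit
def update_cache(k_cache, v_cache, k_new, v_new, pos):
    # Write this token's keys/values into slot `pos`. `pos` is a traced
    # integer, so every position reuses the same compiled function.
    return k_cache.at[pos].set(k_new), v_cache.at[pos].set(v_new)

@jax.jit
def attend(q, k_cache, v_cache, pos):
    # q: (N_HEADS, HEAD_DIM). Scores cover all MAX_LEN slots; slots after
    # `pos` are masked to -inf instead of being sliced off (slicing would
    # produce a new shape and force a recompile).
    scores = jnp.einsum("hd,thd->ht", q, k_cache) / jnp.sqrt(HEAD_DIM)
    valid = jnp.arange(MAX_LEN) <= pos
    scores = jnp.where(valid[None, :], scores, -jnp.inf)
    weights = jax.nn.softmax(scores, axis=-1)
    return jnp.einsum("ht,thd->hd", weights, v_cache)

# Usage: a toy decode loop where token `pos` has keys/values equal to pos.
k_cache, v_cache = init_cache()
for pos in range(5):
    kv = jnp.full((N_HEADS, HEAD_DIM), float(pos))
    k_cache, v_cache = update_cache(k_cache, v_cache, kv, kv, pos)
    out = attend(jnp.ones((N_HEADS, HEAD_DIM)), k_cache, v_cache, pos)
```

The trade-off is that attention always runs over `MAX_LEN` slots, doing some wasted work on masked positions in exchange for a single compiled program.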
| bravura wrote:
| Is there any automatic way to get warned against these
| antipatterns?
| sega_sai wrote:
| You can see each compilation if you set the JAX_LOG_COMPILES
| environment variable or use a low enough logging level.
| bravura wrote:
| Sorry, not to belabor this point.
|
| Would that suggest to you what you did wrong? Or purely
| show you what you got right? How chatty is this variable?
| sega_sai wrote:
| I used this to see whether something is repeatedly
| compiled, i.e., I have code that runs in a loop, and you
| immediately see whether something is compiled only once or
| every time (and it produces a lot of output). I'm not
| saying this is the best way to do it, though; it just
| worked for me.
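A minimal sketch of the loop-based check described above. `jax.config.update("jax_log_compiles", True)` is the in-code equivalent of the environment variable. The counting trick relies on the documented behavior that Python side effects inside a jitted function run only while JAX traces it, so a counter there shows how often it is retraced. The function names are hypothetical.

```python
import jax
import jax.numpy as jnp

# Same effect as setting JAX_LOG_COMPILES=1 in the environment: each
# compilation is logged (as noted above, this gets chatty).
jax.config.update("jax_log_compiles", True)

def count_traces(shapes):
    """Call a freshly jitted function once per input shape and return how
    many times JAX traced it. The `n += 1` side effect runs only during
    tracing, so it counts (re)traces, not calls."""
    n = 0

    def double(x):
        nonlocal n
        n += 1
        return x * 2

    jitted = jax.jit(double)  # new function object, so a fresh jit cache
    for shape in shapes:
        jitted(jnp.zeros(shape))
    return n

growing = count_traces([(t,) for t in range(1, 6)])  # shape changes every call
fixed = count_traces([(8,)] * 5)                      # same shape every call
```

Here `growing` should come out as 5 (one trace per distinct shape) and `fixed` as 1, which is the "compiled once vs. every time" signal the comment describes.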
| magicalhippo wrote:
| The blog mentions it's not for production use. This sounds like
| one thing you'd want to change.
|
| I'm curious what else makes it unfit for production. Is it
| anything fundamental, or just minor issues like this?
| YetAnotherNick wrote:
| Just don't use jit in generation and it will be fine. Of
| course there is some performance penalty, but in my experience
| jit is oversold and the difference is roughly 10-30%.
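One way to follow this advice without restructuring code is JAX's `jax.disable_jit()` context manager, which makes `jax.jit`-decorated functions run op-by-op. This is a sketch with a hypothetical toy `decode_step` standing in for a real model. It only checks that both paths give the same numbers and does not measure the commenter's 10-30% figure.

```python
import jax
import jax.numpy as jnp

# Hypothetical toy "decode step"; a real model's forward pass would go here.
W = jax.random.normal(jax.random.PRNGKey(0), (8, 8))

@jax.jit
def decode_step(h):
    return jnp.tanh(h @ W)

def generate(h, n_steps):
    # Plain Python loop over steps, as in a typical sampling loop.
    outs = []
    for _ in range(n_steps):
        h = decode_step(h)
        outs.append(h)
    return jnp.stack(outs)

h0 = jnp.ones((1, 8))
with_jit = generate(h0, 4)
with jax.disable_jit():
    # Same code, but each jnp op now executes eagerly, one at a time.
    without_jit = generate(h0, 4)
```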
|
| Also, to get optimized code you need flash attention and many
| other tricks anyway.
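For context on the flash attention mention, here is a sketch of the idea it is built on: process keys/values in blocks and keep a running max, denominator, and weighted sum (an "online softmax"), so the full score matrix is never materialized at once. This is plain `jnp` with `lax.scan`, not a fused kernel, so it does not reproduce flash attention's actual memory/IO behavior. The function names are hypothetical, and it assumes the key length is divisible by `block_size`.

```python
import jax
import jax.numpy as jnp

def naive_attention(q, k, v):
    scores = q @ k.T / jnp.sqrt(q.shape[-1])
    return jax.nn.softmax(scores, axis=-1) @ v

def blockwise_attention(q, k, v, block_size):
    # q: (Tq, d); k, v: (Tk, d) with Tk divisible by block_size.
    d = q.shape[-1]
    n_blocks = k.shape[0] // block_size
    k_blocks = k.reshape(n_blocks, block_size, d)
    v_blocks = v.reshape(n_blocks, block_size, d)

    def body(carry, kv):
        m, l, acc = carry  # running max, softmax denominator, weighted sum
        kb, vb = kv
        s = q @ kb.T / jnp.sqrt(d)              # (Tq, block_size)
        m_new = jnp.maximum(m, s.max(axis=-1))  # updated running max
        scale = jnp.exp(m - m_new)              # rescales earlier blocks' stats
        p = jnp.exp(s - m_new[:, None])
        l = l * scale + p.sum(axis=-1)
        acc = acc * scale[:, None] + p @ vb
        return (m_new, l, acc), None

    tq = q.shape[0]
    init = (jnp.full((tq,), -jnp.inf, dtype=q.dtype),
            jnp.zeros((tq,), dtype=q.dtype),
            jnp.zeros_like(q))
    (_, l, acc), _ = jax.lax.scan(body, init, (k_blocks, v_blocks))
    return acc / l[:, None]

# Usage: compare against the naive version on random inputs.
kq, kk, kv = jax.random.split(jax.random.PRNGKey(0), 3)
q = jax.random.normal(kq, (4, 8))
k = jax.random.normal(kk, (16, 8))
v = jax.random.normal(kv, (16, 8))
blocked = blockwise_attention(q, k, v, block_size=4)
reference = naive_attention(q, k, v)
```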
| brcmthrowaway wrote:
| these anime kids are going to take everyone's job
| rfl890 wrote:
| said no one ever
| ge96 wrote:
| Anya from Spy x Family
| bravura wrote:
| To the poster who wrote: "Hey Saurabh, will you be willing to
| teach me this on a call? I'm willing to pay for it (im not rich,
| so, dont expect much please). I will be having a lot of
| questions, mostly related to core concepts of transformers and
| jax in general."
|
| This is the wrong way to ask for help.
|
| Instead, consider offering your help and time apprenticing and
| learning along the way. Can't code that well? Write test cases
| and clean up code. Or help with the blog writing, etc. You
| certainly have some valuable skill you could trade.
| saagarjha wrote:
| I mean, I'm no Saurabh, but that didn't seem too unreasonable
| to me? In fact, I'll put my money where my mouth is and offer
| half an hour for free just to spite you.
| zengid wrote:
| There is some research on accelerating reinforcement learning by
| implementing the simulator on the GPU using JAX. Really neat. I'm
| curious whether this could be done with Mojo, too.
|
| https://arxiv.org/html/2311.10090v5
|
| https://youtu.be/Jr_nGkCG3og?si=WKS7Hbz13A0QH4nf&t=520
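A sketch of the general pattern behind GPU-side RL simulators in JAX: write the environment step as a pure function, `jax.vmap` it over thousands of environments, and run the time loop with `jax.lax.scan` inside one `jax.jit`-compiled program. The 1-D point-mass environment, random policy, and function names are hypothetical and are not taken from the linked paper.

```python
import jax
import jax.numpy as jnp

def env_step(state, action):
    # Hypothetical toy environment: a 1-D point mass pushed by `action`.
    pos, vel = state
    vel = vel + 0.1 * action
    pos = pos + 0.1 * vel
    reward = -jnp.abs(pos)  # reward for staying near the origin
    return (pos, vel), reward

def rollout(states, key, n_steps):
    # vmap turns the single-env step into a step over a whole batch of envs;
    # lax.scan keeps the time loop inside one compiled program.
    batched_step = jax.vmap(env_step)

    def body(carry, _):
        states, key = carry
        key, sub = jax.random.split(key)
        # Random actions stand in for a policy here.
        actions = jax.random.uniform(sub, states[0].shape, minval=-1.0, maxval=1.0)
        states, rewards = batched_step(states, actions)
        return (states, key), rewards

    (states, _), rewards = jax.lax.scan(body, (states, key), None, length=n_steps)
    return states, rewards

# n_steps sets the scan length, so it must be static under jit.
rollout_jit = jax.jit(rollout, static_argnums=2)

n_envs = 1024
init = (jnp.zeros(n_envs), jnp.zeros(n_envs))
final_states, rewards = rollout_jit(init, jax.random.PRNGKey(0), 10)
```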
| ge96 wrote:
| "focuses on the soul of pure functional programming which makes
| it more cool"
|
| This is tangential to this post's main point, but if you're going
| for mass adoption, this can go badly. Case in point: a hardware
| company I backed decided to write their code in Haskell (why?
| "because it's cool"), and now the people who are trying to
| modify/work with it have to deal with Haskell vs. a
| general-purpose language like C++, idk...
|
| edit: I also realize most of this code is Python, but yeah
| drdaeman wrote:
| > deal with Haskell vs. a general purpose language like C++
|
| What's the actual problem? Company decided to use Haskell
| (which is also a general-purpose language) then hired people
| who don't know it?
|
| If so, hire a bunch of Pythonistas to work on a Rails project
| and you'll have similar struggles (and it won't mean that
| Python or Ruby are somehow bad; it'll be an almost entirely
| non-technical issue).
| ge96 wrote:
| The problem is that it's intended to be an open-source device,
| so Haskell would be harder to work on than something simpler
| like C++.
|
| Again, my point is about adoption; that's why most products,
| like Stripe for example, offer multiple languages.
|
| edit: it's alright; when they actually ship these things
| (after I put down $3.5K), I hope to take it upon myself to
| port it to C++.
|
| edit: "general purpose" is probably the wrong way to put it;
| my POV is that Haskell is harder to read than C++.
___________________________________________________________________
(page generated 2025-02-19 23:01 UTC)