[HN Gopher] Implementing a ChatGPT-like LLM from scratch, step b...
       ___________________________________________________________________
        
       Implementing a ChatGPT-like LLM from scratch, step by step
        
       Author : rasbt
       Score  : 306 points
       Date   : 2024-01-27 16:19 UTC (6 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | AndrewKemendo wrote:
       | Writing a technical book in public is a level of anxiety I can't
       | imagine, so kudos to the author!
        
         | rasbt wrote:
         | It kind of is, but it's also kind of motivating :)
        
         | waynesonfire wrote:
          | It's actually less risky. The author may be able to reap the
          | benefits of writing a book without actually finishing it --
          | ideally, after writing not much more than Chapter 1.
        
       | kif wrote:
       | Looks like just the kind of book I'd want to read. I bought a
       | copy :)
        
         | rasbt wrote:
          | Glad to hear, and thanks for the support. Chapter 3 should be
          | in the MEAP soonish (I submitted the draft last week). I will
          | also upload my code for chapter 4 to GitHub in the next couple
          | of days; I just have to type up the notes.
        
       | intalentive wrote:
       | The model architecture itself is really not too complex,
       | especially with torch. The whole process is pretty
       | straightforward. Nice feasible project.
        
       | turnsout wrote:
       | This looks amazing @rasbt! Out of curiosity, is your primary goal
       | to cultivate understanding and demystify, or to encourage people
       | to build their own small models tailored to their needs?
        
         | rasbt wrote:
         | I'd say my primary motivation is an educational goal, i.e.,
         | helping people understand how LLMs work by building one. LLMs
         | are an important topic, and there are lots of hand-wavy videos
         | and articles out there -- I think if one codes an LLM from the
         | ground up, it will clarify lots of concepts.
         | 
          | Now, the secondary goal is, of course, also to help people
          | build their own LLMs if they need to. The book will code the
          | whole pipeline, including pretraining and finetuning, but I
          | will also show how to load pretrained weights because I don't
          | think pretraining an LLM is feasible from a financial
          | perspective. We are coding everything from scratch in this book
          | using a GPT-2-like LLM (so that we can load the weights for
          | models ranging from the 124M one that runs on a laptop to the
          | 1558M one that runs on a small GPU). In practice, you probably
          | want to use a framework like HF transformers or axolotl, but I
          | hope this from-scratch approach will demystify the process so
          | that these frameworks are less of a black box.
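          | 
          | For reference, loading the public GPT-2 weights through HF
          | transformers is just a couple of lines (an illustrative
          | sketch, not the book's from-scratch weight loader):
          | 
          |     from transformers import GPT2LMHeadModel, GPT2Tokenizer
          | 
          |     tok = GPT2Tokenizer.from_pretrained("gpt2")      # 124M model
          |     model = GPT2LMHeadModel.from_pretrained("gpt2")
          |     ids = tok("Every effort moves you", return_tensors="pt")
          |     out = model.generate(**ids, max_new_tokens=20)
          |     print(tok.decode(out[0]))
          | 
          | The book builds the equivalent of that loading step by hand.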
        
           | turnsout wrote:
           | Thanks for such a thoughtful response. I'm building with
           | LLMs, and do feel uncomfortable with my admittedly hand-wavy
           | understanding of the underlying transformer architecture.
           | I've ordered your book and look forward to following along!
        
       | clueless wrote:
        | Is the code for chapters 4 through 8 missing?
        
         | rasbt wrote:
          | It's still in progress. I have most of the code working, but
          | it's not organized into the chapter structure yet. I am
          | planning to add a new chapter every ~month (I wish I could do
          | this faster, but I also have some other commitments). Chapter 4
          | will be uploaded either by the end of this weekend or by the
          | end of next weekend.
        
         | _giorgio_ wrote:
          | Depending on your level, it could take many weeks to go
          | through the already available material (code and PDF), so I'd
          | suggest purchasing it anyway... It makes no sense to wait
          | until the end if you're interested in the subject.
        
       | malermeister wrote:
        | How does this compare to the Karpathy video [0]? I'm trying to
        | get into LLMs and am trying to figure out what the best resource
        | would be to get that level of understanding.
       | 
       | [0] https://www.youtube.com/watch?v=kCc8FmEb1nY
        
         | _giorgio_ wrote:
          | You can't understand it unless you already know most of the
          | material.
          | 
          | I've watched it many times to understand most of it well.
          | 
          | And obviously you must already know PyTorch really well,
          | including matrix multiplication, backpropagation, etc. He
          | speaks very fast too...
        
           | tayo42 wrote:
           | He has like 4 or 5 videos that can be watched before that one
           | where all of that is covered. He goes over stuff like writing
           | back prop from scratch and implementing layers without torch.
        
             | _giorgio_ wrote:
             | I know... That material isn't for beginners.
        
           | hadjian wrote:
            | Did you really watch all the videos in the playlist? I am at
            | video 4 and had no background in PyTorch or NumPy.
            | 
            | In my opinion he covers everything needed to understand his
            | lectures, even broadcasting and multidimensional indexing
            | with NumPy.
            | 
            | Also, in the first lecture you will implement your own Python
            | class for building expressions, including backprop, with an
            | API modeled after PyTorch.
            | 
            | IMHO it is one of only two lecture series I can recommend
            | without hesitation; the other is Gilbert Strang's on linear
            | algebra.
        
             | fbdab103 wrote:
              | To echo this sentiment, I thought he does a really
              | reasonable job of working up to the topic. Sure, it is
              | fast-paced, but it is a video you can rewind, and you can
              | play with the notebooks.
             | 
             | There is a lot to learn, but I think he touches on all of
             | the highlights which would give the viewer the tools to
             | have a better understanding if they want to explore the
             | topic in depth.
             | 
             | Plus, I like that the videos are not overly polished. He
             | occasionally makes a minor blunder, which really humanizes
             | the experience.
        
             | _giorgio_ wrote:
              | I was talking about the last video. It's difficult unless
              | you already know most of the material or have watched the
              | other videos in the series.
              | 
              | Anyway, those videos are quite advanced. Surely not for
              | beginners.
        
         | rasbt wrote:
         | Haven't fully watched this but from a brief skimming, here are
         | some differences that the book has:
         | 
         | - it implements a real word-level LLM instead of a character-
         | level LLM
         | 
         | - after pretraining also shows how to load pretrained weights
         | 
         | - instruction-finetune that LLM after pretraining
         | 
         | - code the alignment process for the instruction-finetuned LLM
         | 
         | - also show how to finetune the LLM for classification tasks
         | 
          | - the book overall has lots of figures; Chapter 3 alone has 26
          | figures :)
         | 
         | The video looks awesome though. I think it's probably a great
         | complementary resource to get a good solid intro because it's
         | just 2 hours. I think reading the book will probably be more
         | like 10 times that time investment.
        
           | malermeister wrote:
            | Thank you for the answer! What knowledge does your book
            | require? If I have a lot of software dev experience and
            | sorta kinda remember algebra from uni, would it be a good
            | fit?
        
       | SushiHippie wrote:
        | FYI, this probably qualifies as a "Show HN:"
        
       | npalli wrote:
       | import torch
       | 
       | From the first code sample, not quite from scratch :-)
        
         | rasbt wrote:
          | Lol ok, otherwise it would probably not be very readable due to
          | the verbosity. The book shows how to implement LayerNorm,
          | Softmax, Linear layers, GeLU, etc. without using the
          | pre-packaged torch versions, though.
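          | 
          | For example, a from-scratch LayerNorm is only a few lines on
          | top of basic tensor ops (an illustrative sketch, not the
          | book's exact code):
          | 
          |     import torch
          |     import torch.nn as nn
          | 
          |     class LayerNorm(nn.Module):
          |         def __init__(self, emb_dim, eps=1e-5):
          |             super().__init__()
          |             self.eps = eps
          |             self.scale = nn.Parameter(torch.ones(emb_dim))
          |             self.shift = nn.Parameter(torch.zeros(emb_dim))
          | 
          |         def forward(self, x):
          |             # normalize each token's embedding to zero mean and
          |             # unit variance, then apply learnable scale/shift
          |             mean = x.mean(dim=-1, keepdim=True)
          |             var = x.var(dim=-1, keepdim=True, unbiased=False)
          |             norm = (x - mean) / torch.sqrt(var + self.eps)
          |             return self.scale * norm + self.shift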
        
         | nerdponx wrote:
          | I don't think implementing autograd is relevant or in scope for
          | learning about how transformers work (or writing out the
          | gradient for a transformer by hand; I can't even imagine doing
          | that).
        
         | PheonixPharts wrote:
          | Automatic differentiation is _why_ we are able to have complex
          | models like transformers; it's arguably the key reason (in
          | addition to large amounts of data and massive compute
          | resources) that we have the revolution in AI that we have.
          | 
          |  _Nobody_ working in this space is hand-calculating derivatives
          | for these models. Thinking in terms of differentiable
          | programming is a given, and I think it certainly counts as
          | "from scratch" in this case.
          | 
          | Any time I see someone post a comment like this, I suspect they
          | don't really understand what's happening under the hood or how
          | contemporary machine learning works.
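          | 
          | As a tiny illustration of what autograd buys you (a toy
          | example, nothing to do with the book's code):
          | 
          |     import torch
          | 
          |     # autograd records the operations on w and derives the
          |     # gradient for us -- no hand-worked calculus required
          |     w = torch.tensor([2.0, -1.0], requires_grad=True)
          |     x = torch.tensor([0.5, 3.0])
          |     loss = torch.sigmoid(w @ x).pow(2)
          |     loss.backward()
          |     print(w.grad)  # d(loss)/d(w), computed automatically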
        
           | schneems wrote:
            | I'm very comfortable with AI in general but not so much with
            | Machine Learning. I understand transformers are a key piece
            | of the puzzle that enables tools like LLMs but don't know
            | much about them.
           | 
           | Do you (or others) have good resources explaining what they
           | are and how they work at a high level?
        
           | re wrote:
            | > Thinking in terms of differentiable programming is a given,
            | and I think it certainly counts as "from scratch" in this
            | case.
            | 
            | I have to disagree on that being an obvious assumption for
            | the meaning of "from scratch", especially given that the book
            | description says that readers only need to know Python. It
            | feels like reading "Crafting Interpreters" only to find
            | that step one is to download Lex and Yacc because everyone
            | working in the space already knows how parsers work.
            | 
            | > I suspect they don't really understand what's happening
            | under the hood or how contemporary machine learning works.
            | 
            | Everyone has to start somewhere. I thought I would be
            | interested in a book like this precisely _because_ I don't
            | already fully understand what's happening under the hood, but
            | it sounds like it might not actually be a good starting point
            | for my idea of "from scratch."
        
         | politelemon wrote:
          | They should probably
          | 
          |     import universe
          | 
          | first.
        
       | bosky101 wrote:
       | How was the process of pitching to Manning?
        
         | rasbt wrote:
          | That was pretty smooth. They reached out to ask whether I was
          | interested in writing a book for them (probably because of my
          | other writings online), I mentioned what kind of book I wanted
          | to write, submitted a proposal, and they liked the idea :)
        
       | wslh wrote:
        | I jumped to GitHub thinking this would be a free resource
        | (with all due respect to the author's work).
        | 
        | What free resources are available and recommended in the "from
        | scratch" vein?
        
         | larme wrote:
          | https://jaykmody.com/blog/gpt-from-scratch/ for a GPT-2
          | inference engine in NumPy
          | 
          | then
          | 
          | https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-k...
          | for adding a KV cache implementation
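          | 
          | The KV cache idea itself fits in a few lines: cache each
          | token's keys/values so that each decoding step only computes
          | attention for the new token. A rough single-head sketch (my
          | own illustration, not code from either post):
          | 
          |     import torch
          | 
          |     k_cache, v_cache = [], []
          | 
          |     def attend(q_new, k_new, v_new):
          |         # append this step's key/value, then attend over the
          |         # whole cache with just the new token's query
          |         k_cache.append(k_new)
          |         v_cache.append(v_new)
          |         K = torch.stack(k_cache)   # (steps, head_dim)
          |         V = torch.stack(v_cache)
          |         scores = q_new @ K.T / K.shape[-1] ** 0.5
          |         return torch.softmax(scores, dim=-1) @ V
          | 
          |     out = attend(torch.randn(64), torch.randn(64), torch.randn(64))
          |     print(out.shape)  # torch.Size([64])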
        
           | larme wrote:
            | I'd like to add that most of these texts only talk about the
            | inference part. This book (I also purchased the draft
            | version) has training and finetuning in the TOC. I assume it
            | will include material about how to do training and
            | finetuning from scratch.
        
         | rasbt wrote:
          | I added notes to the Jupyter notebooks; I hope they are also
          | readable as standalone material from the repo.
        
         | natrys wrote:
         | Neural Networks: Zero to Hero[1] by Andrej Karpathy
         | 
         | [1] https://karpathy.ai/zero-to-hero.html
        
           | villedespommes wrote:
            | +1, Andrej is an amazing educator! I'd also recommend his
            | https://youtu.be/kCc8FmEb1nY?si=mP0cQlQ4rcceL2uP and checking
            | out his GitHub repos. minGPT, for example, implements a small
            | GPT model that's compatible with the HF API, whereas the more
            | modern nanoGPT shows how to use newer features such as flash
            | attention. The quality of every video, every blog post is
            | just so high.
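            | 
            | In nanoGPT, the flash attention part boils down to a single
            | call to PyTorch's fused kernel; roughly (illustrative only,
            | assumes PyTorch >= 2.0):
            | 
            |     import torch
            |     import torch.nn.functional as F
            | 
            |     # toy shapes: (batch, heads, seq_len, head_dim)
            |     q = torch.randn(1, 8, 128, 64)
            |     k = torch.randn(1, 8, 128, 64)
            |     v = torch.randn(1, 8, 128, 64)
            | 
            |     # dispatches to a fused (flash) kernel when available
            |     out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
            |     print(out.shape)  # torch.Size([1, 8, 128, 64])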
        
         | PheonixPharts wrote:
         | I honestly cannot fathom why anyone working in the AI space
         | would find $50 too much to spend to gain a deeper insight into
         | the subject. Creating educational materials requires an insane
         | amount of work, and I can promise, no matter how successful
          | this book is, if rasbt were to do the math on income generated
          | over the hours spent creating it, it wouldn't make sense as an
          | hourly rate.
         | 
         | Plenty of other people have this understanding of these topics,
         | and you know what they chose to do with that knowledge? Keep it
         | to themselves and go work at OpenAI to make far more money
         | keeping that knowledge private.
         | 
         | If you want to live in a world where this knowledge is open, at
         | the very least refrain from publicly complaining about a book
          | that costs roughly the same as a decent dinner.
        
           | rasbt wrote:
            | Yeah, I don't think creating educational materials makes
            | sense from an economic perspective, but it's one of my
           | hobbies that gives me joy for some reason :). Hah, and
           | 'insane amount of work' is probably right -- lots of
           | sacrifices to carve out that necessary time.
        
           | wslh wrote:
            | Not talking about affordability but about following links
            | thinking that I would find another kind of resource. Beyond
            | this case, this happens all the time with click-baity
            | content. Again, if the link were to Amazon or the publisher,
            | it would clearly be associated with a product, while GitHub
            | is associated with open-source content. Not being pedantic,
            | just an observation from browsing the web.
        
           | _giorgio_ wrote:
            | Not to be pedantic, but in this case it's probably 30 USD for
            | print and ebook (there are always coupons on the Manning
            | website).
        
           | layer8 wrote:
           | > anyone working in the AI space
           | 
            | I would have expected the main target audience to be people
            | NOT working in the AI space, who don't have any prior
            | knowledge ("from scratch") and are just curious to learn how
            | an LLM works.
        
         | politelemon wrote:
         | I'd go with https://course.fast.ai/
         | 
          | It's much more accessible to regular developers and doesn't
          | make assumptions about any kind of mathematics background. It's
          | a good starting point after which other similar resources start
          | to make more sense.
        
       | whartung wrote:
       | Can I use any of the information in this book to learn about
       | reinforcement learning?
       | 
        | My goal is to have something learn to land, like a lunar lander.
        | Simple: start at 100 feet, thrust in one direction, keep trying
        | until you stop making craters.
        | 
        | Then start adding variables, such as horizontal movement and a
        | horizontal thruster.
        | 
        | Next, remove the horizontal thruster and let the lander pivot.
        | 
        | Etc.
        | 
        | I just have no idea how to start with this, but this seems like
        | "mainstream" ML, so I'm curious if this book would help with
        | that.
        
         | Buttons840 wrote:
          | I enjoyed "Grokking Deep Reinforcement Learning"[0]. It doesn't
          | include anything about transformers though. Also, see Python's
          | gymnasium[1] library for a lunar lander environment; it's the
          | one I focused on most while I was learning, and I've solved it
          | a few different ways now. You can also look at my own notebook
          | I used when implementing Soft Actor-Critic with PyTorch not too
          | long ago[2]; it's not great for teaching, but maybe you can get
          | something out of it.
          | 
          | [0]: https://www.manning.com/books/grokking-deep-reinforcement-le...
          | [1]: https://gymnasium.farama.org/environments/box2d/
          | [2]: https://github.com/DevJac/learn-pytorch/blob/main/SAC.ipynb
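          | 
          | Getting the lunar lander environment itself running is only a
          | few lines with gymnasium; something like this (a rough sketch
          | with a random agent, assuming the box2d extra is installed):
          | 
          |     import gymnasium as gym
          | 
          |     env = gym.make("LunarLander-v2")
          |     obs, info = env.reset(seed=0)
          |     for _ in range(200):
          |         # random actions; a learned policy goes here instead
          |         action = env.action_space.sample()
          |         obs, reward, terminated, truncated, info = env.step(action)
          |         if terminated or truncated:
          |             obs, info = env.reset()
          |     env.close()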
        
         | thatguysaguy wrote:
         | Try OpenAI's spinning up:
         | https://spinningup.openai.com/en/latest/
        
           | Buttons840 wrote:
           | This is a good and short introduction to RL. The density of
           | the information in Spinning Up was just right for me and I
           | think I've referred to it more often than any other resource
           | when actually implementing my own RL algorithms (PPO and
           | SAC).
           | 
           | If I had to recommend a curriculum to a friend I would say:
           | 
           | (1) Spend a few hours on Spinning Up.
           | 
           | (2) If the mathematical notation is intimidating, read
           | Grokking Deep Reinforcement Learning (from Manning), which is
           | slower paced and spends a lot of time explaining the notation
           | itself, rather than just assuming the mathematical notation
           | is self-explanatory as is so often the case. This book has
           | good theoretical explanations and will get you some running
           | code.
           | 
           | (3) Spend a few hours with Spinning Up again. By this point
           | you should be a little comfortable with a few different RL
           | algorithms.
           | 
           | (4) Read Sutton's book, which is "the bible" of reinforcement
           | learning. It's quite approachable, but it would be a bit dry
           | and abstract without some hands-on experience with RL I
           | think.
        
         | smokel wrote:
         | This book seems to focus on large language models, for which
         | RLHF is sometimes a useful addition.
         | 
          | To learn more about RL, most people would advise the Sutton and
          | Barto book, available at:
          | http://incompleteideas.net/book/the-book-2nd.html
        
           | Buttons840 wrote:
           | I would recommend this as a second book after reading a
           | "cookbook" style book that is more focused on getting real
           | code working. After some hands-on experience with RL (whether
           | you succeed or fail), Sutton's book will be a lot more
           | interesting and approachable.
        
         | PheonixPharts wrote:
          | Reinforcement learning is an entirely separate area of research
          | from LLMs and, while often seen as part of ML (Tom Mitchell's
          | classic _Machine Learning_ has a great section on Q-learning,
          | even if it feels a bit dated in other areas), it has little to
          | do with contemporary ML work. Even with things like AlphaGo,
          | what you find is basically work in using deep neural networks
          | as an input into classic RL techniques.
          | 
          | Sutton and Barto's _Reinforcement Learning: An Introduction_ is
          | widely considered the definitive intro to the topic.
        
         | rasbt wrote:
         | Sorry, in that case I would rather recommend a dedicated RL
         | book. The RL part in LLMs will be very specific to LLMs, and I
         | will only cover what's absolutely relevant in terms of
          | background info. I do have a longish intro chapter on RL in my
          | other general ML/DL book
          | (https://github.com/rasbt/machine-learning-book/tree/main/ch1...),
          | but like others said, I would recommend a dedicated RL book in
          | your case.
        
       | Karupan wrote:
       | Bought a copy. Good luck rasbt!
        
       | ijustwanttovote wrote:
       | Wow, great info. Thanks for sharing.
        
       | theogravity wrote:
       | Purchased the book. Really excited to read it!
        
       | photon_collider wrote:
       | Bought a copy! Looking forward to reading it. :)
       | 
       | Is there a way for readers to give feedback on the book as you
       | write it?
        
         | _giorgio_ wrote:
          | The book's forum on Manning.
        
       | canyon289 wrote:
        | For an additional resource, I'm writing a guidebook, though it's
        | in various stages of completion.
        | 
        | The fine-tuning guide is the best resource so far:
        | https://ravinkumar.com/GenAiGuidebook/language_models/finetu...
        
       | iamcreasy wrote:
       | Thank you for this endeavour.
       | 
       | Do you have an ETA for the completion of the book?
        
       ___________________________________________________________________
       (page generated 2024-01-27 23:00 UTC)