[HN Gopher] Implementing a ChatGPT-like LLM from scratch, step b...
___________________________________________________________________
Implementing a ChatGPT-like LLM from scratch, step by step
Author : rasbt
Score : 306 points
Date : 2024-01-27 16:19 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| AndrewKemendo wrote:
| Writing a technical book in public is a level of anxiety I can't
| imagine, so kudos to the author!
| rasbt wrote:
| It kind of is, but it's also kind of motivating :)
| waynesonfire wrote:
| It's actually less risky. The author may be able to reap the
| benefits of writing a book without actually finishing it.
| Ideally, maybe not much more than Chapter 1.
| kif wrote:
| Looks like just the kind of book I'd want to read. I bought a
| copy :)
| rasbt wrote:
| Glad to hear and thanks for the support. Chapter 3 should be in
| the MEAP soonish (submitted the draft last week). Will also
| upload my code for chapter 4 to GitHub in the next couple of
| days; I just have to type up the notes.
| intalentive wrote:
| The model architecture itself is really not too complex,
| especially with torch. The whole process is pretty
| straightforward. Nice feasible project.
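|
| For a sense of scale, a single GPT-style block fits in roughly
| twenty lines of torch. A rough sketch (using the packaged
| nn.MultiheadAttention rather than hand-rolled attention):
|
|     import torch
|     import torch.nn as nn
|
|     class TransformerBlock(nn.Module):
|         """One GPT-style decoder block: causal self-attention + MLP."""
|         def __init__(self, emb_dim=768, num_heads=12):
|             super().__init__()
|             self.norm1 = nn.LayerNorm(emb_dim)
|             self.attn = nn.MultiheadAttention(emb_dim, num_heads,
|                                               batch_first=True)
|             self.norm2 = nn.LayerNorm(emb_dim)
|             self.mlp = nn.Sequential(
|                 nn.Linear(emb_dim, 4 * emb_dim),
|                 nn.GELU(),
|                 nn.Linear(4 * emb_dim, emb_dim),
|             )
|
|         def forward(self, x):
|             # causal mask: True marks future positions a token
|             # may not attend to
|             seq_len = x.size(1)
|             mask = torch.triu(torch.ones(seq_len, seq_len,
|                                          dtype=torch.bool), diagonal=1)
|             h = self.norm1(x)
|             attn_out, _ = self.attn(h, h, h, attn_mask=mask,
|                                     need_weights=False)
|             x = x + attn_out                    # residual around attention
|             return x + self.mlp(self.norm2(x))  # residual around the MLP
|
|     x = torch.randn(1, 16, 768)          # (batch, seq_len, emb_dim)
|     print(TransformerBlock()(x).shape)   # torch.Size([1, 16, 768])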
| turnsout wrote:
| This looks amazing @rasbt! Out of curiosity, is your primary goal
| to cultivate understanding and demystify, or to encourage people
| to build their own small models tailored to their needs?
| rasbt wrote:
| I'd say my primary motivation is an educational goal, i.e.,
| helping people understand how LLMs work by building one. LLMs
| are an important topic, and there are lots of hand-wavy videos
| and articles out there -- I think if one codes an LLM from the
| ground up, it will clarify lots of concepts.
|
| Now, the secondary goal is, of course, also to help people with
| building their own LLMs if they need to. The book will code the
| whole pipeline, including pretraining and finetuning, but I
| will also show how to load pretrained weights, because I don't
| think pretraining an LLM from scratch is financially feasible
| for most readers. We are coding everything from scratch in this
| book using a GPT-2-like LLM (so that we can load the weights
| for models ranging from the 124M model that runs on a laptop
| to the 1558M model that runs on a small GPU). In practice, you
| probably want to use a
| framework like HF transformers or axolotl, but I hope this
| from-scratch approach will demystify the process so that these
| frameworks are less of a black box.
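|
| Just to illustrate what I mean by loading pretrained weights,
| a minimal sketch with HF transformers (not the book's code,
| which loads the weights into our from-scratch implementation)
| would be:
|
|     from transformers import GPT2LMHeadModel, GPT2Tokenizer
|
|     tokenizer = GPT2Tokenizer.from_pretrained("gpt2")  # 124M variant
|     model = GPT2LMHeadModel.from_pretrained("gpt2")
|     inputs = tokenizer("Hello, I am", return_tensors="pt")
|     output_ids = model.generate(**inputs, max_new_tokens=20)
|     print(tokenizer.decode(output_ids[0]))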
| turnsout wrote:
| Thanks for such a thoughtful response. I'm building with
| LLMs, and do feel uncomfortable with my admittedly hand-wavy
| understanding of the underlying transformer architecture.
| I've ordered your book and look forward to following along!
| clueless wrote:
| Is the code for chapters 4 through 8 missing?
| rasbt wrote:
| It's in progress still. I have most of the code working, but
| it's not organized into the chapter structure yet. I am
| planning to add a new chapter every ~month (I wish I could do
| this faster, but I also have some other commitments). Chapter 4
| will be uploaded either by the end of this weekend or by the
| end of next weekend.
| _giorgio_ wrote:
| Depending on your level, it could take many weeks to go
| through the already available material (code and PDF), so I'd
| suggest purchasing it anyway... It makes no sense to wait
| until the end if you're interested in the subject.
| malermeister wrote:
| How does this compare to the Karpathy video [0]? I'm trying to
| get into LLMs and am trying to figure out what the best resource
| to get that level of understanding would be.
|
| [0] https://www.youtube.com/watch?v=kCc8FmEb1nY
| _giorgio_ wrote:
| You can't understand it unless you already know most of the
| stuff.
|
| I've had to watch it many times to understand most of it.
|
| And obviously you must already know PyTorch really well,
| including matrix multiplication, backpropagation, etc. He
| speaks very fast too...
| tayo42 wrote:
| He has like 4 or 5 videos that can be watched before that one
| where all of that is covered. He goes over stuff like writing
| back prop from scratch and implementing layers without torch.
| _giorgio_ wrote:
| I know... That material isn't for beginners.
| hadjian wrote:
| Did you really watch all videos in the playlist? I am at
| video 4 and had no background in PyTorch or numpy.
|
| In my opinion he covers everything needed to understand his
| lectures. Even broadcasting and multidimensional indexing
| with numpy.
|
| Also in the first lecture you will implement your own Python
| class for building expressions, including backprop, with an
| API modeled after PyTorch.
|
| IMHO it is one of only two lecture series I can recommend
| without hesitation. The other is Gilbert Strang on linear
| algebra.
| fbdab103 wrote:
| To echo this sentiment, I thought he did a really
| reasonable job of working up to the topic. Sure, it is fast-
| paced, but it is a video you can rewind, plus you can play
| with the notebooks.
|
| There is a lot to learn, but I think he touches on all of
| the highlights which would give the viewer the tools to
| have a better understanding if they want to explore the
| topic in depth.
|
| Plus, I like that the videos are not overly polished. He
| occasionally makes a minor blunder, which really humanizes
| the experience.
| _giorgio_ wrote:
| I was talking about the last video. It's difficult unless
| you already know most of the material or have watched the
| other videos in the series.
|
| Anyway, those videos are quite advanced. Surely not for
| beginners.
| rasbt wrote:
| Haven't fully watched this, but from a brief skim, here are
| some differences that the book has:
|
| - it implements a real word-level LLM instead of a character-
| level LLM
|
| - after pretraining also shows how to load pretrained weights
|
| - instruction-finetune that LLM after pretraining
|
| - code the alignment process for the instruction-finetuned LLM
|
| - also show how to finetune the LLM for classification tasks
|
| - the book overall has lots of figures. For Chapter 3 alone,
| there are 26 figures :)
|
| The video looks awesome though. I think it's probably a great
| complementary resource to get a good solid intro because it's
| just 2 hours. I think reading the book will probably be more
| like 10 times that time investment.
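|
| To make the first point a bit more concrete, here's a quick
| illustration of character-level vs. (sub)word-level
| tokenization using the GPT-2 BPE tokenizer via tiktoken (just
| for illustration, not necessarily the book's exact code):
|
|     import tiktoken
|
|     enc = tiktoken.get_encoding("gpt2")
|     text = "Implementing an LLM"
|     print(list(text))        # character-level: one token per character
|     print(enc.encode(text))  # subword-level: GPT-2 BPE token IDs
|     print(enc.decode(enc.encode(text)))  # round-trips back to the text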
| malermeister wrote:
| Thank you for the answer! What prior knowledge does your
| book require? If I have a lot of software dev experience and
| sorta kinda remember algebra from uni, would it be a good
| fit?
| SushiHippie wrote:
| FYI, this probably qualifies as a "Show HN:"
| npalli wrote:
| import torch
|
| From the first code sample, not quite from scratch :-)
| rasbt wrote:
| Lol ok, otherwise it would probably not be very readable due
| to the verbosity. The book shows how to implement LayerNorm,
| Softmax, Linear layers, GeLU, etc. without using the pre-
| packaged torch versions, though.
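|
| For example, a from-scratch LayerNorm on top of plain torch
| tensors is only a few lines (a rough sketch of the idea, not
| necessarily the book's exact code):
|
|     import torch
|
|     class LayerNorm(torch.nn.Module):
|         def __init__(self, emb_dim, eps=1e-5):
|             super().__init__()
|             self.eps = eps
|             self.scale = torch.nn.Parameter(torch.ones(emb_dim))
|             self.shift = torch.nn.Parameter(torch.zeros(emb_dim))
|
|         def forward(self, x):
|             mean = x.mean(dim=-1, keepdim=True)
|             var = x.var(dim=-1, keepdim=True, unbiased=False)
|             norm = (x - mean) / torch.sqrt(var + self.eps)
|             return self.scale * norm + self.shift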
| nerdponx wrote:
| I don't think implementing autograd is relevant or in-scope for
| learning about how transformers work (or writing out the
| gradient for a transformer by hand; I can't even imagine doing
| that).
| PheonixPharts wrote:
| Automatic differentiation is _why_ we are able to have complex
| models like transformers; it's arguably the key reason (in
| addition to large amounts of data and massive compute
| resources) that we have the revolution in AI that we have.
|
| _Nobody_ working in this space is hand calculating derivatives
| for these models. Thinking in terms of differentiable
| programming is a given and I think certainly counts as "from
| scratch" in this case.
|
| Any time I see someone post a comment like this, I suspect they
| don't really understand what's happening under the hood or how
| contemporary machine learning works.
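|
| To illustrate what differentiable programming buys you, a toy
| autograd example in torch (hypothetical, just for flavor):
|
|     import torch
|
|     x = torch.tensor(3.0, requires_grad=True)
|     y = x ** 2 + 2 * x   # forward pass builds the computation graph
|     y.backward()         # autograd computes dy/dx automatically
|     print(x.grad)        # tensor(8.) since dy/dx = 2*x + 2 at x = 3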
| schneems wrote:
| I'm very comfortable with AI in general but not so much with
| Machine Learning. I understand transformers are a key piece
| of the puzzle that enables tools like LLMs but don't know
| much about them.
|
| Do you (or others) have good resources explaining what they
| are and how they work at a high level?
| re wrote:
| > Thinking in terms of differentiable programming is a given
| and I think certainly counts as "from scratch" in this case.
|
| I have to disagree on that being an obvious assumption for
| the meaning of "from scratch", especially given that the book
| description says that readers only need to know Python. It
| would feel like reading "Crafting Interpreters" only to find
| that step one is to download Lex and Yacc, because everyone
| working in the space already knows how parsers work.
|
| > I suspect the don't really understand what's happening
| under the hood or how contemporary machine learning works.
|
| Everyone has to start somewhere. I thought I would be
| interested in a book like this precisely _because_ I don't
| already fully understand what's happening under the hood, but
| it sounds like it might not actually be a good starting point
| for my idea of "from scratch."
| politelemon wrote:
| They should probably import universe
|
| first.
| bosky101 wrote:
| How was the process of pitching to Manning?
| rasbt wrote:
| That was pretty smooth. They reached out to ask whether I was
| interested in writing a book for them (probably because of my
| other writings online), I mentioned what kind of book I wanted
| to write, submitted a proposal, and they liked the idea :)
| wslh wrote:
| I jumped to GitHub thinking this would be a free resource
| (with all due respect to the author's work).
|
| What free resources are available and recommended in the "from
| scratch" vein?
| larme wrote:
| https://jaykmody.com/blog/gpt-from-scratch/ for a gpt2
| inference engine in numpy
|
| then
|
| https://www.dipkumar.dev/becoming-the-unbeatable/posts/gpt-k...
| for adding a kv cache implementation
| larme wrote:
| I'd like to add that most of these texts only cover the
| inference part. This book (I also purchased the draft
| version) has training and finetuning in the TOC. I assume it
| will include material about how to do training and
| finetuning from scratch.
| rasbt wrote:
| I added notes to the Jupyter notebooks; I hope they are also
| readable standalone from the repo.
| natrys wrote:
| Neural Networks: Zero to Hero[1] by Andrej Karpathy
|
| [1] https://karpathy.ai/zero-to-hero.html
| villedespommes wrote:
| +1, Andrej is an amazing educator! I'd also recommend his
| https://youtu.be/kCc8FmEb1nY?si=mP0cQlQ4rcceL2uP and checking
| out his GitHub repos. minGPT, for example, implements a small
| GPT model that's compatible with the HF API, whereas the more modern
| nanoGPT shows how to use newer features such as flash
| attention. The quality of every video, every blog post is
| just so high.
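|
| As a rough sketch of what that newer path looks like (assuming
| PyTorch 2.x; not copied from nanoGPT itself), causal attention
| can be delegated to a fused kernel:
|
|     import torch
|     import torch.nn.functional as F
|
|     # (batch, num_heads, seq_len, head_dim)
|     q = torch.randn(1, 8, 64, 32)
|     k = torch.randn(1, 8, 64, 32)
|     v = torch.randn(1, 8, 64, 32)
|
|     # dispatches to a fused flash-attention kernel when available
|     out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
|     print(out.shape)   # torch.Size([1, 8, 64, 32])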
| PheonixPharts wrote:
| I honestly cannot fathom why anyone working in the AI space
| would find $50 too much to spend to gain a deeper insight into
| the subject. Creating educational materials requires an insane
| amount of work, and I can promise, no matter how successful
| this book is, if rasbt were to do the math on income generated
| over hours spent creating it, it wouldn't make sense as an
| hourly rate.
|
| Plenty of other people have this understanding of these topics,
| and you know what they chose to do with that knowledge? Keep it
| to themselves and go work at OpenAI to make far more money
| keeping that knowledge private.
|
| If you want to live in a world where this knowledge is open, at
| the very least refrain from publicly complaining about a book
| that cost roughly the same as a decent dinner.
| rasbt wrote:
| Yeah, I don't think creating educational materials makes
| sense from an economic perspective, but it's one of my
| hobbies that gives me joy for some reason :). Hah, and
| 'insane amount of work' is probably right -- lots of
| sacrifices to carve out that necessary time.
| wslh wrote:
| Not talking about affordability but about following links
| thinking that I would find another kind of resource. Beyond
| this case, this happens all the time with click-baity
| content. Again, if the link were to Amazon or the publisher,
| it would be clearly associated with a product, while GitHub
| is associated with open-source content. Not being pedantic,
| just an observation from browsing the web.
| _giorgio_ wrote:
| Not to be pedantic, but in this case it's probably 30 USD for
| print and ebook (there are always coupons on the Manning
| website).
| layer8 wrote:
| > anyone working in the AI space
|
| I would have expected the main target audience to be people
| NOT working in the AI space, that don't have any prior
| knowledge ("from scratch"), just curious to learn how an LLM
| works.
| politelemon wrote:
| I'd go with https://course.fast.ai/
|
| It's much more accessible to regular developers, and doesn't
| make assumptions about any kind of mathematics background. It's
| a good starting point after which other similar resources start
| to make more sense.
| whartung wrote:
| Can I use any of the information in this book to learn about
| reinforcement learning?
|
| My goal is to have something learn to land, like a lunar lander.
| Simple: start at 100 feet, thrust in one direction, keep trying
| until you stop making craters.
|
| Then start adding variables, such as the lander now moving
| horizontally, and add a horizontal thruster.
|
| Next, remove the horizontal thruster and let the lander pivot.
|
| Etc.
|
| I just have no idea how to start with this, but this seems
| "mainstream" ML, curious if this book would help with that.
| Buttons840 wrote:
| I enjoyed "Grokking Deep Reinforcement Learning"[0]. It doesn't
| include anything about transformers though. Also, see Python's
| gymnasium[1] library for a lunar lander environment; it's the
| one I focused on most while I was learning and I've solved it a
| few different ways now. You can also look at my own notebook I
| used when implementing Soft Actor Critic with PyTorch not too
| long ago[2]; it's not great for teaching, but maybe you can get
| something out of it.
|
| [0]: https://www.manning.com/books/grokking-deep-reinforcement-le...
| [1]: https://gymnasium.farama.org/environments/box2d/
| [2]: https://github.com/DevJac/learn-pytorch/blob/main/SAC.ipynb
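|
| If it helps, a minimal random-agent loop for the lunar lander
| environment looks roughly like this (assuming gymnasium and
| its box2d extra are installed):
|
|     import gymnasium as gym
|
|     env = gym.make("LunarLander-v2")
|     obs, info = env.reset(seed=0)
|     total_reward, done = 0.0, False
|     while not done:
|         # random policy; swap in a learned one (e.g. SAC or PPO)
|         action = env.action_space.sample()
|         obs, reward, terminated, truncated, info = env.step(action)
|         total_reward += reward
|         done = terminated or truncated
|     print(total_reward)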
| thatguysaguy wrote:
| Try OpenAI's spinning up:
| https://spinningup.openai.com/en/latest/
| Buttons840 wrote:
| This is a good and short introduction to RL. The density of
| the information in Spinning Up was just right for me and I
| think I've referred to it more often than any other resource
| when actually implementing my own RL algorithms (PPO and
| SAC).
|
| If I had to recommend a curriculum to a friend I would say:
|
| (1) Spend a few hours on Spinning Up.
|
| (2) If the mathematical notation is intimidating, read
| Grokking Deep Reinforcement Learning (from Manning), which is
| slower paced and spends a lot of time explaining the notation
| itself, rather than just assuming the mathematical notation
| is self-explanatory as is so often the case. This book has
| good theoretical explanations and will get you some running
| code.
|
| (3) Spend a few hours with Spinning Up again. By this point
| you should be a little comfortable with a few different RL
| algorithms.
|
| (4) Read Sutton's book, which is "the bible" of reinforcement
| learning. It's quite approachable, but it would be a bit dry
| and abstract without some hands-on experience with RL I
| think.
| smokel wrote:
| This book seems to focus on large language models, for which
| RLHF is sometimes a useful addition.
|
| To learn more about RL, most people would advise the Sutton and
| Barto book, available at:
| http://incompleteideas.net/book/the-book-2nd.html
| Buttons840 wrote:
| I would recommend this as a second book after reading a
| "cookbook" style book that is more focused on getting real
| code working. After some hands-on experience with RL (whether
| you succeed or fail), Sutton's book will be a lot more
| interesting and approachable.
| PheonixPharts wrote:
| Reinforcement learning is an entirely separate area of research
| from LLMs and, while often seen as part of ML (Tom Mitchell's
| classic _Machine Learning_ has a great section on Q learning,
| even if it feels a bit dated in other areas) it has little to
| do with contemporary ML work. Even with things like AlphaGo,
| what you find is basically work in using deep neural networks
| as an input into classic RL techniques.
|
| Sutton and Barto's _Reinforcement Learning: An Introduction_ is
| widely considered the definitive intro to the topic.
| rasbt wrote:
| Sorry, in that case I would rather recommend a dedicated RL
| book. The RL part in LLMs will be very specific to LLMs, and I
| will only cover what's absolutely relevant in terms of
| background info. I do have a longish intro chapter on RL in my
| other general ML/DL book
| (https://github.com/rasbt/machine-learning-book/tree/main/ch1...)
| but like others said, I would recommend a dedicated RL book in
| your case.
| Karupan wrote:
| Bought a copy. Good luck rasbt!
| ijustwanttovote wrote:
| Wow, great info. Thanks for sharing.
| theogravity wrote:
| Purchased the book. Really excited to read it!
| photon_collider wrote:
| Bought a copy! Looking forward to reading it. :)
|
| Is there a way for readers to give feedback on the book as you
| write it?
| _giorgio_ wrote:
| The book's forum on Manning.
| canyon289 wrote:
| For an additional resource, I'm writing a guidebook, though
| it's in various stages of completion.
|
| The fine-tuning guide is the best resource so far:
| https://ravinkumar.com/GenAiGuidebook/language_models/finetu...
| iamcreasy wrote:
| Thank you for this endeavour.
|
| Do you have an ETA for the completion of the book?
___________________________________________________________________
(page generated 2024-01-27 23:00 UTC)