[HN Gopher] Building an LLM from Scratch: Automatic Differentiat...
___________________________________________________________________
Building an LLM from Scratch: Automatic Differentiation (2023)
Author : netwrt
Score : 321 points
Date : 2024-02-15 20:01 UTC (1 day ago)
(HTM) web link (bclarkson-code.github.io)
(TXT) w3m dump (bclarkson-code.github.io)
| asgraham wrote:
| As a chronic premature optimizer, my first reaction was, "Is this
| even possible in vanilla Python???" Obviously it's _possible_,
| but can you train an LLM before the heat death of the universe? A
| perceptron, sure, of course. A deep learning model, plausible if
| it's not too deep. But a _large_ language model? I.e. the kind of
| LLM necessary for "from vanilla python to functional coding
| assistant."
|
| But obviously the author already thought of that. The source repo
| has a great motto: "It don't go fast but it do be goin'" [1]
|
| I love the idea of the project and I'm curious to see what the
| endgame runtime will be.
|
| [1] https://github.com/bclarkson-code/Tricycle
| gkbrk wrote:
| Why wouldn't it be possible? You can generate machine code with
| Python and call into it with ctypes. All your deep learning
| code is still in Python, but at runtime it gets JIT-compiled
| into something faster.
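|
| A minimal sketch of what that looks like on x86-64 Linux/macOS
| (the bytes and names here are illustrative, not from the article):
|
|     import ctypes, mmap
|
|     # Machine code for: int add(int a, int b) { return a + b; }
|     # (System V calling convention: args in edi/esi, result in eax)
|     code = bytes([0x89, 0xf8,   # mov eax, edi
|                   0x01, 0xf0,   # add eax, esi
|                   0xc3])        # ret
|
|     # Copy it into an executable anonymous mapping
|     buf = mmap.mmap(-1, len(code),
|                     prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
|     buf.write(code)
|
|     # Wrap the buffer's address in a ctypes function pointer and call it
|     ftype = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)
|     addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
|     add = ftype(addr)
|     print(add(2, 3))  # 5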
| cafaxo wrote:
| I did a similar thing for Julia: Llama2.jl contains vanilla Julia
| code [1] for training small Llama2-style models on the CPU.
|
| [1] https://github.com/cafaxo/Llama2.jl/tree/master/src/training
| andxor_ wrote:
| Great stuff. Thanks for sharing.
| 3abiton wrote:
| How hard is it to find open-source data nowadays? I saw that
| Books3 has already been made illegal to train on.
| itissid wrote:
| Everyone should go through this _rite of passage_ work and get
| to the "Attention Is All You Need" implementation. It's an area
| where the engineering and the academic papers are very close and
| reproducible, and it's a must if you want to progress in the field.
|
| (See also Andrej Karpathy's Zero to Hero neural-network series on
| YouTube; it's very good and similar in spirit to this work.)
| bschne wrote:
| +1 for Karpathy, the series is really good
| calebkaiser wrote:
| I would also recommend going through Callum McDougall/Neel
| Nanda's fantastic Transformer from Scratch tutorial. It takes a
| different approach to conceptualizing the model (or at least,
| it implements it in a way which emphasizes different
| characteristics of Transformers and self-attention), which I
| found deeply satisfying when I first explored them.
|
| https://arena-ch1-transformers.streamlit.app/%5B1.1%5D_Trans...
| joshua11 wrote:
| Thanks for sharing. This is a nice resource
| wredue wrote:
| Is this YouTube series also "from scratch (but not really)"?
|
| Edit - it is. Not to talk down on the series. I'm sure it's
| good, but it is actually "LLM with PyTorch".
|
| Edit - I looked again and I was actually not correct. He does
| ultimately use frameworks, but spends some time early on
| explaining how they work under the hood.
| deadfast wrote:
| I appreciate you coming back and giving more details; it
| encourages me to look into it now. Maybe my expectations of
| the internet are just low, but I thought it was a virtuous
| act worth the effort. I wish more people would keep their
| skepticism but be willing to follow through and let their
| opinions change given solid evidence.
| simfoo wrote:
| That magic moment in Karpathy's first video when he gets to the
| loss function and calls backward for the first time - this is
| when it clicked for me. Highly recommended!
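|
| Roughly what that step looks like in a tiny micrograd-style
| scalar autograd (a from-memory sketch, not Karpathy's exact code):
|
|     class Value:
|         def __init__(self, data, children=()):
|             self.data = data
|             self.grad = 0.0
|             self._backward = lambda: None
|             self._children = children
|
|         def __mul__(self, other):
|             out = Value(self.data * other.data, (self, other))
|             def _backward():
|                 self.grad += other.data * out.grad
|                 other.grad += self.data * out.grad
|             out._backward = _backward
|             return out
|
|         def __sub__(self, other):
|             out = Value(self.data - other.data, (self, other))
|             def _backward():
|                 self.grad += out.grad
|                 other.grad -= out.grad
|             out._backward = _backward
|             return out
|
|         def backward(self):
|             # Visit the graph in topological order, then apply each
|             # node's local chain-rule step in reverse
|             order, seen = [], set()
|             def visit(v):
|                 if v not in seen:
|                     seen.add(v)
|                     for c in v._children:
|                         visit(c)
|                     order.append(v)
|             visit(self)
|             self.grad = 1.0
|             for v in reversed(order):
|                 v._backward()
|
|     w, x, target = Value(0.5), Value(2.0), Value(3.0)
|     loss = (w * x - target) * (w * x - target)  # squared error
|     loss.backward()
|     print(loss.data, w.grad)  # 4.0, d(loss)/dw = 2*(w*x-target)*x = -8.0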
| nqzero wrote:
| Is there an existing SLM that resembles an LLM in architecture
| and includes the code for training it?
|
| I realize the cost and time to train may be prohibitive and that
| quality on general English might be very limited, but is the code
| itself available?
| sva_ wrote:
| Not sure what you mean by SLM, but
| https://github.com/karpathy/nanoGPT
| andxor_ wrote:
| Very well written. AD is like magic, and this is a good exposition
| of the basic building block.
|
| I quite like Jeremy's approach:
| https://nbviewer.org/github/fastai/fastbook/blob/master/17_f...
|
| It shows a very simple "Pythonic" approach to assembling the
| gradient of a composition of functions from the gradients of
| its components.
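|
| A toy version of that idea (my own sketch, not Jeremy's notebook
| code): each function returns its output plus a closure that maps
| the upstream gradient to the gradient of its input, and composing
| those closures in reverse order is the whole backward pass:
|
|     import math
|
|     def square(x):
|         out = x * x
|         def backward(grad_out):      # d(x^2)/dx = 2x
|             return grad_out * 2 * x
|         return out, backward
|
|     def exp(x):
|         out = math.exp(x)
|         def backward(grad_out):      # d(exp(x))/dx = exp(x)
|             return grad_out * out
|         return out, backward
|
|     # Forward pass for f(x) = exp(x^2), keeping each local backward
|     x = 1.5
|     y1, back1 = square(x)
|     y2, back2 = exp(y1)
|
|     # Backward pass: apply the local gradients in reverse order
|     grad = back1(back2(1.0))         # df/dx = exp(x^2) * 2x
|     print(y2, grad)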
| revskill wrote:
| The only problem is that it's implemented in Python. One reason is
| that I hate installing Python on my machine, and I don't know how
| to manage the dependencies. macOS needs to be upgraded just to
| install native stuff. Such a hell.
___________________________________________________________________
(page generated 2024-02-16 23:02 UTC)