[HN Gopher] Building an LLM from Scratch: Automatic Differentiat...
       ___________________________________________________________________
        
       Building an LLM from Scratch: Automatic Differentiation (2023)
        
       Author : netwrt
       Score  : 321 points
       Date   : 2024-02-15 20:01 UTC (1 day ago)
        
 (HTM) web link (bclarkson-code.github.io)
 (TXT) w3m dump (bclarkson-code.github.io)
        
       | asgraham wrote:
       | As a chronic premature optimizer my first reaction was, "Is this
       | even possible in vanilla python???" Obviously it's _possible_,
       | but can you train an LLM before the heat death of the universe? A
       | perceptron, sure, of course. A deep learning model, plausible if
       | it's not too deep. But a _large_ language model? I.e. the kind of
       | LLM necessary for "from vanilla python to functional coding
       | assistant."
       | 
       | But obviously the author already thought of that. The source repo
       | has a great motto: "It don't go fast but it do be goin'" [1]
       | 
       | I love the idea of the project and I'm curious to see what the
       | endgame runtime will be.
       | 
       | [1] https://github.com/bclarkson-code/Tricycle
        
         | gkbrk wrote:
          | Why wouldn't it be possible? You can generate machine code
          | with Python and call into it with ctypes. All your deep
          | learning code is still in Python, but at runtime it gets
          | JIT-compiled into something faster.
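          | 
          | A minimal sketch of that pattern (my own illustration of the
          | idea, not anyone's actual code): write a tiny C kernel,
          | compile it at runtime, and call it through ctypes. It assumes
          | a C compiler (`cc`) is available on the PATH.
          | 
          |     # Hypothetical example: runtime compilation + ctypes call.
          |     import ctypes, os, subprocess, tempfile
          | 
          |     C_SRC = """
          |     double dot(const double *a, const double *b, int n) {
          |         double acc = 0.0;
          |         for (int i = 0; i < n; i++) acc += a[i] * b[i];
          |         return acc;
          |     }
          |     """
          | 
          |     tmp = tempfile.mkdtemp()
          |     src = os.path.join(tmp, "kernel.c")
          |     lib_path = os.path.join(tmp, "kernel.so")
          |     with open(src, "w") as f:
          |         f.write(C_SRC)
          |     # Compile the kernel into a shared library.
          |     subprocess.run(["cc", "-O3", "-shared", "-fPIC", src,
          |                     "-o", lib_path], check=True)
          | 
          |     # Load it and describe the function signature for ctypes.
          |     lib = ctypes.CDLL(lib_path)
          |     lib.dot.restype = ctypes.c_double
          |     lib.dot.argtypes = [ctypes.POINTER(ctypes.c_double),
          |                         ctypes.POINTER(ctypes.c_double),
          |                         ctypes.c_int]
          | 
          |     a = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
          |     b = (ctypes.c_double * 3)(4.0, 5.0, 6.0)
          |     print(lib.dot(a, b, 3))  # 32.0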
        
       | cafaxo wrote:
       | I did a similar thing for Julia: Llama2.jl contains vanilla Julia
       | code [1] for training small Llama2-style models on the CPU.
       | 
       | [1] https://github.com/cafaxo/Llama2.jl/tree/master/src/training
        
         | andxor_ wrote:
         | Great stuff. Thanks for sharing.
        
         | 3abiton wrote:
          | How hard is it to find open source data nowadays? I saw that
          | books3 has already been made illegal to train on.
        
       | itissid wrote:
        | Everyone should work through this _rite of passage_ and get to
        | the "Attention Is All You Need" implementation. It's a world
        | where the engineering and the academic papers are very close and
        | reproducible, and it's a must if you want to progress in the
        | field.
        | 
        | (See also Andrej Karpathy's Neural Networks: Zero to Hero series
        | on YouTube; it's very good and similar to this work.)
        
         | bschne wrote:
         | +1 for Karpathy, the series is really good
        
         | calebkaiser wrote:
         | I would also recommend going through Callum McDougall/Neel
         | Nanda's fantastic Transformer from Scratch tutorial. It takes a
         | different approach to conceptualizing the model (or at least,
         | it implements it in a way which emphasizes different
         | characteristics of Transformers and self-attention), which I
         | found deeply satisfying when I first explored them.
         | 
         | https://arena-ch1-transformers.streamlit.app/%5B1.1%5D_Trans...
        
           | joshua11 wrote:
           | Thanks for sharing. This is a nice resource
        
         | wredue wrote:
          | Is this YouTube series also "from scratch (but not really)"?
          | 
          | Edit - it is. Not to talk down on the series; I'm sure it's
          | good, but it is really "LLM with PyTorch".
          | 
          | Edit - I looked again and I was actually not correct. He does
          | ultimately use frameworks, but he spends some time early on
          | explaining how they work under the hood.
        
           | deadfast wrote:
            | I appreciate you coming back and giving more details; it
            | encourages me to look into it now. Maybe my expectations of
            | the internet are just low, but I thought that was a virtuous
            | act worth the effort. I wish more people would keep their
            | skepticism but be willing to follow through and let their
            | opinions change given solid evidence.
        
         | simfoo wrote:
          | That magic moment in Karpathy's first video when he gets to
          | the loss function and calls backward() for the first time -
          | that's when it clicked for me. Highly recommended!
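          | 
          | For anyone who hasn't seen it: the gist of that moment,
          | compressed into a toy sketch in the spirit of the video's
          | micrograd (my own illustration, not its actual code). Each
          | value records how it was produced, and backward() walks the
          | graph in reverse, applying the chain rule.
          | 
          |     # Toy scalar autograd value, micrograd-style (illustrative).
          |     class Value:
          |         def __init__(self, data, parents=()):
          |             self.data, self.grad = data, 0.0
          |             self._parents = parents
          |             self._backward = lambda: None
          | 
          |         def __add__(self, other):
          |             out = Value(self.data + other.data, (self, other))
          |             def _backward():
          |                 self.grad += out.grad
          |                 other.grad += out.grad
          |             out._backward = _backward
          |             return out
          | 
          |         def __mul__(self, other):
          |             out = Value(self.data * other.data, (self, other))
          |             def _backward():
          |                 self.grad += other.data * out.grad
          |                 other.grad += self.data * out.grad
          |             out._backward = _backward
          |             return out
          | 
          |         def backward(self):
          |             # Topologically sort, then apply the chain rule in
          |             # reverse from the loss back to the inputs.
          |             order, seen = [], set()
          |             def visit(v):
          |                 if v not in seen:
          |                     seen.add(v)
          |                     for p in v._parents:
          |                         visit(p)
          |                     order.append(v)
          |             visit(self)
          |             self.grad = 1.0
          |             for v in reversed(order):
          |                 v._backward()
          | 
          |     a, b = Value(2.0), Value(3.0)
          |     loss = a * b + a
          |     loss.backward()
          |     print(a.grad, b.grad)  # 4.0 2.0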
        
       | nqzero wrote:
        | Is there an existing SLM that resembles an LLM in architecture
        | and includes the code for training it?
        | 
        | I realize the cost and time to train may be prohibitive and that
        | quality on general English might be very limited, but is the
        | code itself available?
        
         | sva_ wrote:
          | Not sure what you mean by SLM, but
         | https://github.com/karpathy/nanoGPT
        
       | andxor_ wrote:
       | Very well written. AD is like magic and this is a good exposition
       | on the basic building block.
       | 
       | I quite like Jeremy's approach:
       | https://nbviewer.org/github/fastai/fastbook/blob/master/17_f...
       | 
        | It shows a very simple "Pythonic" approach to assembling the
        | gradient of a composition of functions from the gradients of its
        | components.
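        | 
        | Roughly the flavour of it (my paraphrase, not the notebook's
        | code): each component carries its own derivative, and the chain
        | rule multiplies them together using the intermediate values
        | saved on the forward pass.
        | 
        |     # Illustrative sketch: gradient of a composition from the
        |     # gradients of its components (chain rule).
        |     def square(x):   return x * x
        |     def d_square(x): return 2 * x
        | 
        |     def double(x):   return 2 * x
        |     def d_double(x): return 2.0
        | 
        |     def forward_and_grad(fns, x):
        |         """fns: list of (f, df) pairs, applied left to right."""
        |         saved = []
        |         for f, _ in fns:
        |             saved.append(x)   # remember each input (forward)
        |             x = f(x)
        |         grad = 1.0
        |         for (_, df), inp in zip(reversed(fns), reversed(saved)):
        |             grad *= df(inp)   # chain rule, walking backwards
        |         return x, grad
        | 
        |     # h(x) = double(square(x)) = 2*x**2, so h'(3) = 4*3 = 12
        |     value, grad = forward_and_grad(
        |         [(square, d_square), (double, d_double)], 3.0)
        |     print(value, grad)  # 18.0 12.0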
        
       | revskill wrote:
        | The only problem is that it's implemented in Python. For one
        | thing, I hate installing Python on my machine, and I don't know
        | how to manage the dependencies. macOS required an upgrade just
        | to install native stuff. Such a hassle.
        
       ___________________________________________________________________
       (page generated 2024-02-16 23:02 UTC)