[HN Gopher] Building an LLM from Scratch: Automatic Differentiation
___________________________________________________________________
Building an LLM from Scratch: Automatic Differentiation
Author : netwrt
Score : 95 points
Date : 2024-02-15 20:01 UTC (2 hours ago)
(HTM) web link (bclarkson-code.github.io)
(TXT) w3m dump (bclarkson-code.github.io)
| asgraham wrote:
| As a chronic premature optimizer, my first reaction was, "Is this
| even possible in vanilla Python???" Obviously it's _possible_,
| but can you train an LLM before the heat death of the universe? A
| perceptron, sure, of course. A deep learning model, plausible if
| it's not too deep. But a _large_ language model? I.e. the kind of
| LLM necessary for "from vanilla python to functional coding
| assistant."
|
| But obviously the author already thought of that. The source repo
| has a great motto: "It don't go fast but it do be goin'" [1]
|
| I love the idea of the project and I'm curious to see what the
| endgame runtime will be.
|
| [1] https://github.com/bclarkson-code/Tricycle
| gkbrk wrote:
| Why wouldn't it be possible? You can generate machine code with
| Python and call into it with ctypes. All your deep learning
| code is still in Python, but at runtime it gets JIT-compiled
| into something faster.
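|
| A minimal sketch of that approach (illustrative only, not code
| from the linked project; assumes Linux or macOS on x86-64 and
| that writable+executable mappings are allowed):
|
|     import ctypes
|     import mmap
|
|     # x86-64 machine code for: int add(int a, int b) { return a + b; }
|     # (System V calling convention: args in edi/esi, result in eax)
|     code = bytes([
|         0x89, 0xf8,  # mov eax, edi
|         0x01, 0xf0,  # add eax, esi
|         0xc3,        # ret
|     ])
|
|     # Allocate a writable + executable anonymous mapping and copy
|     # the generated code into it.
|     buf = mmap.mmap(-1, len(code),
|                     prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
|     buf.write(code)
|
|     # Wrap the buffer's address in a ctypes function pointer and call it.
|     addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
|     add = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)(addr)
|     print(add(2, 3))  # -> 5
|
| A real JIT would emit vectorized numeric kernels this way (or hand
| the hot loops to a C compiler) and keep only the orchestration in
| Python.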
| cafaxo wrote:
| I did a similar thing for Julia: Llama2.jl contains vanilla Julia
| code [1] for training small Llama2-style models on the CPU.
|
| [1] https://github.com/cafaxo/Llama2.jl/tree/master/src/training
| itissid wrote:
| Everyone should go through this _rite of passage_ and work their
| way up to an "Attention Is All You Need" implementation. It's an
| area where the engineering and the academic papers are very close
| and reproducible, and it's a must if you want to progress in the
| field.
|
| (See also Andrej Karpathy's Neural Networks: Zero to Hero series
| on YouTube; it's very good and similar in spirit to this work.)
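|
| For reference, the core of that paper is scaled dot-product
| attention; a minimal plain-Python sketch (illustrative only, not
| the article's code):
|
|     import math
|
|     def softmax(xs):
|         # Subtract the max before exponentiating for numerical stability.
|         m = max(xs)
|         exps = [math.exp(x - m) for x in xs]
|         total = sum(exps)
|         return [e / total for e in exps]
|
|     def attention(queries, keys, values):
|         # out_i = sum_j softmax_j(q_i . k_j / sqrt(d)) * v_j
|         d = len(queries[0])
|         out = []
|         for q in queries:
|             scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
|                       for k in keys]
|             weights = softmax(scores)
|             out.append([sum(w * v[j] for w, v in zip(weights, values))
|                         for j in range(len(values[0]))])
|         return out
|
|     # One query attending over a two-step sequence:
|     print(attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
|                     [[1.0, 2.0], [3.0, 4.0]]))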
___________________________________________________________________
(page generated 2024-02-15 23:00 UTC)