[HN Gopher] Building an LLM from Scratch: Automatic Differentiation
___________________________________________________________________
Building an LLM from Scratch: Automatic Differentiation
Author : netwrt
Score : 95 points
Date : 2024-02-15 20:01 UTC (2 hours ago)
(HTM) web link (bclarkson-code.github.io)
(TXT) w3m dump (bclarkson-code.github.io)
| asgraham wrote:
| As a chronic premature optimizer, my first reaction was, "Is this
| even possible in vanilla Python???" Obviously it's _possible_,
| but can you train an LLM before the heat death of the universe? A
| perceptron, sure, of course. A deep learning model, plausible if
| it's not too deep. But a _large_ language model? I.e. the kind of
| LLM necessary for "from vanilla python to functional coding
| assistant."
|
| But obviously the author already thought of that. The source repo
| has a great motto: "It don't go fast but it do be goin'" [1]
|
| I love the idea of the project and I'm curious to see what the
| endgame runtime will be.
|
| [1] https://github.com/bclarkson-code/Tricycle
| gkbrk wrote:
| Why wouldn't it be possible? You can generate machine code with
| Python and call into it with ctypes. All your deep learning
| code is still in Python, but at runtime it gets JIT-compiled
| into something faster.
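|
| A minimal sketch of that approach (illustrative only, not code
| from the linked project; assumes Linux or macOS on x86-64 and
| that writable+executable mappings are allowed):
|
|     import ctypes
|     import mmap
|
|     # x86-64 machine code for: int add(int a, int b) { return a + b; }
|     # (System V calling convention: args in edi/esi, result in eax)
|     code = bytes([
|         0x89, 0xf8,  # mov eax, edi
|         0x01, 0xf0,  # add eax, esi
|         0xc3,        # ret
|     ])
|
|     # Allocate a writable + executable anonymous mapping and copy
|     # the generated code into it.
|     buf = mmap.mmap(-1, len(code),
|                     prot=mmap.PROT_READ | mmap.PROT_WRITE | mmap.PROT_EXEC)
|     buf.write(code)
|
|     # Wrap the buffer's address in a ctypes function pointer and call it.
|     addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
|     add = ctypes.CFUNCTYPE(ctypes.c_int, ctypes.c_int, ctypes.c_int)(addr)
|     print(add(2, 3))  # -> 5
|
| A real JIT would emit vectorized numeric kernels this way (or hand
| the hot loops to a C compiler) and keep only the orchestration in
| Python.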
| cafaxo wrote:
| I did a similar thing for Julia: Llama2.jl contains vanilla Julia
| code [1] for training small Llama2-style models on the CPU.
|
| [1] https://github.com/cafaxo/Llama2.jl/tree/master/src/training
| itissid wrote:
| Everyone should go through this _rite of passage_ and work their
| way up to an "Attention Is All You Need" implementation. It's an
| area where the engineering and the academic papers are very close
| and reproducible, and it's a must if you want to progress in the
| field.
|
| (See also Andrej Karpathy's Neural Networks: Zero to Hero series
| on YouTube; it's very good and similar in spirit to this work.)
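|
| For reference, the core of that paper is scaled dot-product
| attention; a minimal plain-Python sketch (illustrative only, not
| the article's code):
|
|     import math
|
|     def softmax(xs):
|         # Subtract the max before exponentiating for numerical stability.
|         m = max(xs)
|         exps = [math.exp(x - m) for x in xs]
|         total = sum(exps)
|         return [e / total for e in exps]
|
|     def attention(queries, keys, values):
|         # out_i = sum_j softmax_j(q_i . k_j / sqrt(d)) * v_j
|         d = len(queries[0])
|         out = []
|         for q in queries:
|             scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
|                       for k in keys]
|             weights = softmax(scores)
|             out.append([sum(w * v[j] for w, v in zip(weights, values))
|                         for j in range(len(values[0]))])
|         return out
|
|     # One query attending over a two-step sequence:
|     print(attention([[1.0, 0.0]], [[1.0, 0.0], [0.0, 1.0]],
|                     [[1.0, 2.0], [3.0, 4.0]]))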
___________________________________________________________________
(page generated 2024-02-15 23:00 UTC)