[HN Gopher] The Structure of Neural Embeddings
___________________________________________________________________
The Structure of Neural Embeddings
Author : sean_pedersen
Score : 56 points
Date : 2024-12-27 17:55 UTC (5 hours ago)
(HTM) web link (seanpedersen.github.io)
(TXT) w3m dump (seanpedersen.github.io)
| tomrod wrote:
 | Oh wow, great set of reads. Thanks to @sean_pedersen for
 | posting; looking forward to reviewing these during my year-end
 | closeout.
| jmward01 wrote:
 | Current embeddings are badly trained and are massively holding
 | back networks. A core issue is something I call 'token drag':
 | low-frequency tokens, when they finally come up, drag the model
 | back towards an earlier state, wasting a lot of training. This
 | leads to the first few layers of a model effectively being
 | dedicated to buffering the bad embeddings feeding the model.
 | Luckily, fixing this is actually really easy: adding a
 | sacrificial two-layer network that predicts the embeddings
 | during training (and then computing the embeddings just once
 | for production inference) gives a massive boost to training. To
 | see this in action, check out the unified embeddings in this
 | project: https://github.com/jmward01/lmplay
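 |
 | A minimal PyTorch sketch of that scheme (the class and method
 | names here are illustrative, not lmplay's actual "unified
 | embeddings" API):
 |
 |   import torch
 |   import torch.nn as nn
 |
 |   class SacrificialEmbedding(nn.Module):
 |       """Embedding table refined by a small two-layer MLP during
 |       training; bake() collapses it to a plain lookup for
 |       inference."""
 |       def __init__(self, vocab_size, d_model, d_hidden=1024):
 |           super().__init__()
 |           self.base = nn.Embedding(vocab_size, d_hidden)
 |           self.mlp = nn.Sequential(
 |               nn.Linear(d_hidden, d_hidden),
 |               nn.GELU(),
 |               nn.Linear(d_hidden, d_model),
 |           )
 |           self.baked = None  # static table, filled in by bake()
 |
 |       def forward(self, token_ids):
 |           if self.baked is not None:   # prod: one table lookup
 |               return self.baked[token_ids]
 |           # training: predict embeddings through the MLP
 |           return self.mlp(self.base(token_ids))
 |
 |       @torch.no_grad()
 |       def bake(self):
 |           # Run every token id through the MLP once, then serve
 |           # the result as a static embedding table; the MLP is
 |           # not needed for production inference.
 |           ids = torch.arange(self.base.num_embeddings,
 |                              device=self.base.weight.device)
 |           self.baked = self.mlp(self.base(ids))
 |
 | During training every step updates the shared MLP, so rarely
 | seen tokens get pulled along by updates driven by frequent ones
 | instead of dragging the model back to a stale state.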
___________________________________________________________________
(page generated 2024-12-27 23:01 UTC)