[HN Gopher] The Structure of Neural Embeddings
       ___________________________________________________________________
        
       The Structure of Neural Embeddings
        
       Author : sean_pedersen
       Score  : 56 points
       Date   : 2024-12-27 17:55 UTC (5 hours ago)
        
 (HTM) web link (seanpedersen.github.io)
 (TXT) w3m dump (seanpedersen.github.io)
        
       | tomrod wrote:
        | Oh wow, great set of reads. Thanks to @sean_pedersen for
        | posting; looking forward to reviewing these during my year-end
        | closeout.
        
       | jmward01 wrote:
        | Current embeddings are badly trained and are massively holding
        | back networks. A core issue is something I call 'token drag':
        | low-frequency tokens, when they finally come up, drag the model
        | back toward an earlier state, causing a lot of lost training.
        | As a result, the first few layers of a model end up effectively
        | dedicated to buffering the bad embeddings feeding it. Luckily,
        | fixing this is actually really easy: creating a sacrificial
        | two-layer network to predict embeddings during training (and
        | then just computing the embeddings once for prod inference)
        | gives a massive boost to training. To see this in action,
        | check out the unified embeddings in this project:
        | https://github.com/jmward01/lmplay
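        |
        | A minimal PyTorch sketch of the idea (class and parameter
        | names here are hypothetical, not the actual lmplay code):
        | train a small two-layer MLP on top of the raw embedding
        | table, then bake its output into a plain lookup table once
        | for inference.
        |
        |   import torch
        |   import torch.nn as nn
        |
        |   class SacrificialEmbedding(nn.Module):
        |       def __init__(self, vocab_size, dim, hidden=2048):
        |           super().__init__()
        |           self.raw = nn.Embedding(vocab_size, dim)
        |           # Sacrificial two-layer network, trained jointly
        |           # with the model and discarded after training.
        |           self.mlp = nn.Sequential(
        |               nn.Linear(dim, hidden),
        |               nn.GELU(),
        |               nn.Linear(hidden, dim),
        |           )
        |
        |       def forward(self, token_ids):
        |           # Training path: every lookup goes through the MLP.
        |           return self.mlp(self.raw(token_ids))
        |
        |       @torch.no_grad()
        |       def bake(self):
        |           # Prod inference: run every vocabulary row through
        |           # the MLP once, yielding a plain embedding table
        |           # with zero extra runtime cost.
        |           baked = nn.Embedding(self.raw.num_embeddings,
        |                                self.raw.embedding_dim)
        |           baked.weight.copy_(self.mlp(self.raw.weight))
        |           return baked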
        
       ___________________________________________________________________
       (page generated 2024-12-27 23:01 UTC)