[HN Gopher] The Illustrated Transformer (2018)
       ___________________________________________________________________
        
       The Illustrated Transformer (2018)
        
       Author : debdut
       Score  : 152 points
       Date   : 2024-07-02 22:42 UTC (1 day ago)
        
 (HTM) web link (jalammar.github.io)
 (TXT) w3m dump (jalammar.github.io)
        
       | ryan-duve wrote:
       | I gave a talk on using Google BERT for financial services
       | problems at a machine learning conference in early 2019. During
       | my preparation, this was the only resource on transformers I
       | could find that was even remotely understandable to me.
       | 
        | I had a lot of trouble understanding what was going on from just
       | original publication[0].
       | 
       | [0] https://arxiv.org/abs/1706.03762
        
         | andoando wrote:
          | Maybe I'm dumb, but I still can't make much sense of this.
        
         | isaacfung wrote:
         | Maybe it's easier to understand in the format of annotated code
         | 
         | https://nlp.seas.harvard.edu/2018/04/03/attention.html
        
           | ronald_petty wrote:
           | Updated (in case above link goes away) -
           | https://nlp.seas.harvard.edu/annotated-transformer/
           | 
           | Thanks for original!
        
       | jerpint wrote:
        | I go back religiously to this post whenever I need a quick
        | visual refresh on how transformers work. I can't overstate how
        | fantastic it is.
        
       | xianshou wrote:
       | Illustrated Transformer is amazing as a way of understanding the
       | original transformer architecture step-by-step, but if you want
       | to truly visualize how information flows through a decoder-only
       | architecture - from nanoGPT all the way up to a fully represented
       | GPT-3 - _nothing_ beats this:
       | 
       | https://bbycroft.net/llm
        
         | cpldcpu wrote:
         | whoa, that's awesome.
        
       | crystal_revenge wrote:
       | While I absolutely love this illustration (and frankly everything
       | Jay Alammar does), it is worth recognizing there is a distinction
        | between visualizing _how_ a transformer (or any model, really)
        | works and _what_ the transformer is doing.
       | 
       | My favorite article on the latter is Cosma Shalizi's excellent
       | post showing that all "attention" is really doing is kernel
        | smoothing [0]. Personally, having this 'click' was a bigger
       | insight for me than walking through this post and implementing
       | "attention is all you need".
       | 
       | In a very real sense transformers are just performing compression
       | and providing a soft lookup functionality on top of an
       | unimaginably large dataset (basically the majority of human
        | writing). Seeing LLMs this way makes it easier to understand
        | their limitations as well as their, imho untapped, usefulness.
       | 
        | 0. http://bactra.org/notebooks/nn-attention-and-transformers.ht...
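        | 
        | To make this concrete, here is a rough NumPy sketch (function
        | names are mine, not Shalizi's) of the equivalence: attention
        | for a single query is exactly Nadaraya-Watson kernel smoothing
        | with the exponential kernel exp(q . k / sqrt(d)):
        | 
        |   import numpy as np
        | 
        |   def kernel_smoother(query, keys, values, kernel):
        |       # Nadaraya-Watson: a kernel-weighted average of values
        |       w = np.array([kernel(query, k) for k in keys])
        |       return (w / w.sum()) @ values
        | 
        |   def attention(query, keys, values):
        |       # single-query scaled dot-product attention
        |       scores = keys @ query / np.sqrt(query.shape[-1])
        |       w = np.exp(scores - scores.max())   # stable softmax
        |       return (w / w.sum()) @ values
        | 
        |   rng = np.random.default_rng(0)
        |   q = rng.normal(size=4)
        |   K = rng.normal(size=(5, 4))
        |   V = rng.normal(size=(5, 3))
        |   exp_kernel = lambda q, k: np.exp(q @ k / np.sqrt(len(q)))
        |   # both paths agree up to floating point: prints True
        |   print(np.allclose(kernel_smoother(q, K, V, exp_kernel),
        |                     attention(q, K, V)))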
        
       | photon_lines wrote:
        | Great post and write-up - I also made an in-depth exploration
        | and did my best to use visuals - for anyone interested, you can
        | find it here:
        | https://photonlines.substack.com/p/intuitive-and-visual-guid...
        
       ___________________________________________________________________
       (page generated 2024-07-03 23:02 UTC)