[HN Gopher] The Illustrated Transformer (2018)
___________________________________________________________________
The Illustrated Transformer (2018)
Author : debdut
Score : 152 points
Date   : 2024-07-02 22:42 UTC (1 day ago)
(HTM) web link (jalammar.github.io)
(TXT) w3m dump (jalammar.github.io)
| ryan-duve wrote:
| I gave a talk on using Google BERT for financial services
| problems at a machine learning conference in early 2019. During
| my preparation, this was the only resource on transformers I
| could find that was even remotely understandable to me.
|
| I had a lot of trouble understanding what was going on from just
| the original publication[0].
|
| [0] https://arxiv.org/abs/1706.03762
| andoando wrote:
| Maybe I'm dumb but I still can't make much sense of this.
| isaacfung wrote:
| Maybe it's easier to understand in the format of annotated code:
|
| https://nlp.seas.harvard.edu/2018/04/03/attention.html
| ronald_petty wrote:
| Updated (in case above link goes away) -
| https://nlp.seas.harvard.edu/annotated-transformer/
|
| Thanks for original!
| jerpint wrote:
| I go back religiously to this post whenever I need a quick
| visual refresher on how transformers work; I can't overstate how
| fantastic it is.
| xianshou wrote:
| Illustrated Transformer is amazing as a way of understanding the
| original transformer architecture step-by-step, but if you want
| to truly visualize how information flows through a decoder-only
| architecture - from nanoGPT all the way up to a fully represented
| GPT-3 - _nothing_ beats this:
|
| https://bbycroft.net/llm
| cpldcpu wrote:
| whoa, that's awesome.
| crystal_revenge wrote:
| While I absolutely love this illustration (and frankly everything
| Jay Alammar does), it is worth recognizing there is a distinction
| between visualizing _how_ a transformer (or any model, really)
| works and _what_ the transformer is doing.
|
| My favorite article on the latter is Cosma Shalizi's excellent
| post showing that all "attention" is really doing is kernel
| smoothing [0]. Personally having this 'click' was a bigger
| insight for me than walking through this post and implementing
| "attention is all you need".
|
| In a very real sense transformers are just performing compression
| and providing a soft lookup functionality on top of an
| unimaginably large dataset (basically the majority of human
| writing). This understanding of LLMs helps to better understand
| their limitations as well as their, imho untapped, usefulness.
|
| 0. http://bactra.org/notebooks/nn-attention-and-
| transformers.ht...
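The kernel-smoothing view described above can be made concrete in a few lines. This is a minimal NumPy sketch (an illustration, not code from the linked post): the attention output for a query is just a weighted average of the values, where the weights come from a softmax kernel over query-key similarity.

```python
import numpy as np

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector.

    Read as kernel smoothing: the result is a convex combination of
    the rows of `values`, weighted by how similar `query` is to each
    row of `keys` under a softmax kernel.
    """
    d = keys.shape[1]
    scores = keys @ query / np.sqrt(d)       # similarity of query to each key
    weights = np.exp(scores - scores.max())  # softmax kernel (numerically stable)
    weights /= weights.sum()                 # weights are non-negative, sum to 1
    return weights @ values                  # weighted average = soft lookup

# Toy example: 3 key/value pairs in 2 dimensions
keys = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
values = np.array([[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]])
query = np.array([1.0, 0.0])
out = attention(query, keys, values)
```

Because the query is most similar to the first and third keys, the output leans toward their values: the "lookup" is soft, blending all values rather than selecting one.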
| photon_lines wrote:
| Great post and write-up - I also made an in-depth exploration
| and did my best to use visuals - for anyone interested, you can
| find it here: https://photonlines.substack.com/p/intuitive-and-
| visual-guid...
___________________________________________________________________
(page generated 2024-07-03 23:02 UTC)