[HN Gopher] A Visual Guide to Vision Transformers
___________________________________________________________________
A Visual Guide to Vision Transformers
Author : md2rp
Score : 180 points
Date : 2024-04-16 14:00 UTC (9 hours ago)
(HTM) web link (blog.mdturp.ch)
(TXT) w3m dump (blog.mdturp.ch)
| md2rp wrote:
| A Visual Guide to Vision Transformers This is a visual guide to
| Vision Transformers (ViTs), a class of deep learning models that
| have achieved state-of-the-art performance on image
| classification tasks. Vision Transformers apply the transformer
| architecture, originally designed for natural language processing
| (NLP), to image data. This guide will walk you through the key
| components of Vision Transformers in a scroll story format, using
| visualizations and simple explanations to help you understand how
| these models work and what the flow of data through the model
| looks like.
| bArray wrote:
| Nice! A small piece of feedback: I would have the dimensions
| mentioned in the text also annotated on the diagram. It wasn't
| exactly clear how the input data was flattened for example.
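On the flattening question, here is a minimal sketch of the usual ViT patch step, assuming a toy 4 x 4 single-channel image and 2 x 2 patches. These sizes and the name `image_to_patches` are illustrative assumptions, not taken from the guide. Each P x P x C patch becomes one vector of length P * P * C.

```python
def image_to_patches(image, patch_size):
    """Split an H x W x C nested-list image into flattened patches.

    Returns (H / P) * (W / P) vectors, each of length P * P * C,
    with patches visited left to right, top to bottom.
    """
    height = len(image)
    width = len(image[0])
    if height % patch_size or width % patch_size:
        raise ValueError("image sides must be divisible by patch_size")
    patches = []
    for top in range(0, height, patch_size):
        for left in range(0, width, patch_size):
            flat = []
            for row in range(top, top + patch_size):
                for col in range(left, left + patch_size):
                    flat.extend(image[row][col])  # all channels of one pixel
            patches.append(flat)
    return patches


# Toy 4 x 4 image with 1 channel; each pixel value encodes its position.
toy_image = [[[row * 4 + col] for col in range(4)] for row in range(4)]
toy_patches = image_to_patches(toy_image, patch_size=2)
print(toy_patches)  # [[0, 1, 4, 5], [2, 3, 6, 7], [8, 9, 12, 13], [10, 11, 14, 15]]
```

With these toy numbers the image turns into 4 patch vectors of length 4, which is the sequence the transformer then consumes.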
| byteknight wrote:
| I would also add, as a 100% math idiot, that linear
| transformations, and how the model performs them, are not
| explained.
|
| Entirely plausible this is intended for someone more
| "mathematical" than myself but appreciate the work regardless.
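For readers asking what a linear transformation does, a minimal sketch: it computes y = W x + b, that is, each output number is a weighted sum of the inputs plus an offset. The weights below are hand-picked for illustration (in a trained model they are learned), and the names `linear`, `W`, and `b` are assumptions for this example only.

```python
def linear(x, weights, bias):
    """Compute y = W x + b for a single input vector.

    `weights` holds one row per output dimension.
    """
    return [
        sum(w * v for w, v in zip(row, x)) + offset
        for row, offset in zip(weights, bias)
    ]


flat_patch = [0.0, 1.0, 4.0, 5.0]  # a flattened 2 x 2 patch (toy values)
W = [
    [1.0, 0.0, 0.0, 0.0],      # output 0 copies the first pixel
    [0.25, 0.25, 0.25, 0.25],  # output 1 averages all four pixels
]
b = [0.5, 0.0]

embedding = linear(flat_patch, W, b)
print(embedding)  # [0.5, 2.5]
```

Here a 4-number patch is mapped to a 2-number embedding; a real ViT does the same thing with much larger, learned matrices.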
| md2rp wrote:
| Thanks for the feedback! I left it out intentionally but
| probably worth thinking about doing a more fundamental
| guide!
| md2rp wrote:
| Thanks for the feedback! Will add it in the revision!
| challenger-derp wrote:
| Very nice. I wish I could do this sort of scroll story in my
| digital notes. Is this done with a javascript library?
| md2rp wrote:
| Yes this was done with a combination of GSAP ScrollTrigger
| https://gsap.com/docs/v3/Plugins/ScrollTrigger/ and
| https://d3js.org/
| TuringTest wrote:
| That kind of scroll is OK-ish for a background parallax
| effect, or maybe some pretty fade-in/out effects while
| elements scroll into view (without changing their relative
| position in the page).
|
| When it interferes with the main functionality of the page,
| namely reading the content, it breaks accessibility, distracts
| from understanding the difficult topic, makes the content
| brittle against changes in the platform (different browsers or
| future standard updates), and, as others pointed out, makes it
| difficult or impossible to use alternative presentations.
|
| With most comments addressing the presentation rather than the
| content, I think it is clear that it detracts from the
| experience more than it helps.
| tantalor wrote:
| Stop scrollytelling! It's awful, nobody should do this.
| 4chandaily wrote:
| Agreed. My scroll wheel should scroll the page, not advance
| slides or split birds or whatever else. If you need to do this
| kind of information display, use buttons or a UI widget to
| control it. Don't hijack the HID devices I use for accessibly
| operating my computer.
|
| This goes for Scroll Wheels, Scrollbars, the Back Button, the
| Right Click Button, or any other standard input paradigm.
| (please) Don't fuck with these! Some of us make use of
| accessibility features, and messing with our interfaces makes
| these break or behave in unexpected ways.
| layer8 wrote:
| This. You can't use reader mode, you can't save the page as a
| PDF, you can't use PageUp/PageDown because you'll miss some
| in-between state, and the scroll position where a certain image
| is shown may not be the preferred one for reading the
| corresponding text. And the JS will invariably break sooner or
| later.
| elicash wrote:
| I'd be annoyed if my bank did this, or airlines, or anything
| where I just need to get a task done.
|
| For personal websites, I actually think individuality and fun
| and creativity are good.
| observationist wrote:
| It's aggressively inaccessible. I don't know if it's an "I'm a
| web designer, I know better" thing or what.
|
| Web designers: Don't let form interfere with function. The
| function of this page is to communicate information about
| transformers. The form effectively prevents that from
| happening. Don't do it. No, bad, stop.
| SpaceManNabs wrote:
| Lucas Beyer has a lot of references and material as well that I
| recommend.
| causal wrote:
| I like this, but think there is some crucial motivation missing
| in steps 10.1-10.3 regarding what query/key weights are and why
| they're needed.
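One way to motivate the query/key weights, sketched under toy assumptions: a plain dot product between two token embeddings is always symmetric, so token A would have to match token B exactly as strongly as B matches A. Separate query and key projections remove that constraint. The matrices `W_q` and `W_k` below are hand-picked for illustration, not learned and not taken from the guide.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


x_a = [1.0, 0.0]  # embedding of token A (toy values)
x_b = [0.0, 1.0]  # embedding of token B (toy values)

# Without projections the score cannot depend on direction.
raw_ab = dot(x_a, x_b)
raw_ba = dot(x_b, x_a)

# Hypothetical projections: queries keep the embedding as is,
# keys move the second coordinate into the first.
W_q = [[1.0, 0.0], [0.0, 1.0]]
W_k = [[0.0, 1.0], [0.0, 0.0]]

score_ab = dot(matvec(W_q, x_a), matvec(W_k, x_b))  # A asks, B answers
score_ba = dot(matvec(W_q, x_b), matvec(W_k, x_a))  # B asks, A answers

print(raw_ab, raw_ba)      # 0.0 0.0
print(score_ab, score_ba)  # 1.0 0.0
```

With the projections, A scores B highly while B does not score A at all, which raw embeddings alone could not express.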
| ThouYS wrote:
| yes, same issue in all transformer tutorials
| causal wrote:
| The 2b1b video was the first to make it click for me
| hotdogscout wrote:
| You mean 3b1b (three blue one brown)?
| causal wrote:
| Ah that's right, miscounted the blues
| lordswork wrote:
| I suspect this is because most people (including people
| writing these tutorials) don't have a strong grasp on this
| piece either.
| vikiomega9 wrote:
| this post made sense to me
| https://teltam.github.io/posts/soft-dictionary-keys.html
|
| It helps to think of kqv as a form of look up.
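The lookup framing can be sketched as a "soft" dictionary: instead of returning the value for one exactly matching key, every value is blended in proportion to how well its key matches the query. This is a minimal sketch of scaled dot-product attention for a single query; the function names and toy numbers are assumptions for illustration, not code from the linked post.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_lookup(query, keys, values):
    """Blend `values` according to how well each key matches `query`."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    blended = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return blended, weights


keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0]]

# A query that strongly matches the first key returns almost exactly
# the first value, much like an ordinary dictionary lookup.
sharp_result, sharp_weights = soft_lookup([8.0, 0.0], keys, values)

# A query that matches no key in particular returns an even blend.
flat_result, flat_weights = soft_lookup([0.0, 0.0], keys, values)
print(flat_result)  # [5.0, 5.0]
```

In a transformer the query, keys, and values are all produced from the token embeddings by learned linear projections, so the model learns what to look up and what to return.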
| lyapunova wrote:
| To be honest, I actually really like the visual delivery here.
| It's especially helpful for understanding what's going on with
| computer vision problems. Please make more!
___________________________________________________________________
(page generated 2024-04-16 23:01 UTC)