[HN Gopher] Three things everyone should know about Vision Trans...
___________________________________________________________________
Three things everyone should know about Vision Transformers
Author : reqo
Score : 43 points
Date : 2025-04-24 15:53 UTC (7 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| Centigonal wrote:
| There's something that tickles me about this paper's title. The
| thought that _everyone_ should know these three things. The idea
| of going to my neighbor who's a retired K-12 teacher and telling
| her about how adding MLP-based patch pre-processing layers
| improves BERT-like self-supervised training based on patch
| masking.
| pixl97 wrote:
| Hey, when the AI-powered T-rex is chasing you down, you'll wish
| you'd paid attention to the fact that the vision transformer's
| perception is based on movement!
|
| Had to throw some Jurassic Park humor in here.
| woopwoop wrote:
| Clickbait titles are something of a tradition in this field by
| now. Some important paper titles include "One weird trick for
| parallelizing convolutional neural networks", "Attention is all
| you need", and "A picture is worth 16x16 words". Personally I
| still find it kind of irritating, but to each their own I
| guess.
| minimaxir wrote:
| Only the first one is clickbait in the style of blogs that
| incentivize you to click on the headline (i.e. the
| information gap); the last two are just fun puns.
| janalsncm wrote:
| Honestly I took the first one as making fun of that trope.
| Usually the "one weird trick to" ends in some tabloid-style
| thing like lose 15 pounds or find out if your husband is
| loyal. So "parallelizing CNNs" is a joke, as if that's
| something you'd see in a checkout aisle.
| woopwoop wrote:
| In what sense is "Attention is all you need" a pun?
| minimaxir wrote:
| It's a reference to the lyric "love is all you need" from
| the song "All You Need Is Love" by the Beatles, with
| "attention" swapped in for "love" under a different meaning.
| adultSwim wrote:
| "Attention is all you need" is an outlier. They backed up
| their bold claim with breakthrough results.
|
| For modest incremental improvements, I greatly prefer boring
| technical titles. Not everything needs to be a stochastic
| parrot. We see this dynamic with building luxury condos: on
| any individual project, making that pick will help juice
| profits. When the whole city follows suit, it leads to a less
| desirable outcome.
| guerrilla wrote:
| Yeah, I guess today was the day that I learned I am not part of
| "everyone". I feel so left out now.
| i5heu wrote:
| I put this paper into 4o so I could check if it is relevant.
| So that you do not have to do the same, here are the bullet
| points:
|
| - Vision Transformers can be parallelized to reduce latency and
| improve optimization without sacrificing accuracy.
|
| - Fine-tuning only the attention layers is often sufficient for
| adapting ViTs to new tasks or resolutions, saving compute and
| memory (see the sketch after this list).
|
| - Using MLP-based patch preprocessing improves performance in
| masked self-supervised learning by preserving patch independence.
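|
| For the second point, here is a minimal sketch (mine, not code
| from the paper) of attention-only fine-tuning in PyTorch. It
| assumes timm's ViT parameter naming, where attention submodules
| are called "attn"; other implementations will need a different
| match string:
|
|     import timm
|     import torch
|
|     # Pretrained ViT with a fresh classification head for the
|     # new task (10 classes here, purely illustrative).
|     model = timm.create_model(
|         "vit_base_patch16_224", pretrained=True, num_classes=10
|     )
|
|     # Freeze everything except the attention blocks and the new
|     # head; MLP blocks, norms, and the patch embedding stay fixed.
|     for name, param in model.named_parameters():
|         is_attn = ".attn." in name
|         is_head = name.startswith("head")
|         param.requires_grad = is_attn or is_head
|
|     # Optimize only the parameters that remain trainable.
|     trainable = [p for p in model.parameters() if p.requires_grad]
|     optimizer = torch.optim.AdamW(trainable, lr=1e-4)
|
| Keeping the classifier head trainable is my own assumption for
| the new-task case; check the paper for exactly which weights
| the authors tune.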
| Jamesoncrate wrote:
| just read the abstract
| jmugan wrote:
| You would think. I don't know about this paper in particular,
| but I'm continually surprised by how much more I get out
| of LLM summaries of papers than from the abstracts
| written by the authors themselves.
| tough wrote:
| This would be an interesting metric to track: how different
| an LLM-generated abstract (given the full paper as source)
| is from the actual abstract, and whether that difference has
| any correlation with the overall quality of the paper.
| mananaysiempre wrote:
| Paper abstracts are not optimized for drive-by readers like
| you and me. They are optimized for active researchers in
| the field reading their daily arXiv digest that lists _all_
| the new papers across the categories they work in, and
| needing to take the read/don't-read decision for each
| entry there as efficiently as possible.
|
| If you've already decided you're interested in the paper,
| then the Introduction and/or Conclusion sections are what
| you're looking for.
| kridsdale3 wrote:
| Same. I don't think GP deserves the downvotes.
___________________________________________________________________
(page generated 2025-04-24 23:01 UTC)