[HN Gopher] Three things everyone should know about Vision Trans...
       ___________________________________________________________________
        
       Three things everyone should know about Vision Transformers
        
       Author : reqo
       Score  : 43 points
       Date   : 2025-04-24 15:53 UTC (7 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | Centigonal wrote:
       | There's something that tickles me about this paper's title. The
       | thought that _everyone_ should know these three things. The idea
        | of going to my neighbor who's a retired K-12 teacher and telling
        | her about how adding MLP-based patch pre-processing layers
        | improves BERT-like self-supervised training based on patch
       | masking.
        
         | pixl97 wrote:
          | Hey, when the AI-powered T-rex is chasing you down, you'll wish
          | you had paid attention to the fact that the vision
          | transformer's perception is based on movement!
         | 
         | Had to throw some Jurassic Park humor in here.
        
         | woopwoop wrote:
         | Clickbait titles are something of a tradition in this field by
         | now. Some important paper titles include "One weird trick for
         | parallelizing convolutional neural networks", "Attention is all
         | you need", and "A picture is worth 16x16 words". Personally I
         | still find it kind of irritating, but to each their own I
         | guess.
        
           | minimaxir wrote:
           | Only the first one is clickbait in the style of blogs that
           | incentivize you to click on the headline (i.e. the
           | information gap), the last two are just fun puns.
        
             | janalsncm wrote:
             | Honestly I took the first one as making fun of that trope.
             | Usually the "one weird trick to" ends in some tabloid-style
             | thing like lose 15 pounds or find out if your husband is
             | loyal. So "parallizing CNNs" is a joke, as if that's
             | something you'd see in a checkout isle.
        
             | woopwoop wrote:
             | In what sense is "Attention is all you need" a pun?
        
               | minimaxir wrote:
                | It's a reference to the lyric "love is all you need"
                | from the Beatles song "All You Need Is Love", with
                | "attention" swapped in to give it a different, technical
                | meaning.
        
           | adultSwim wrote:
           | "Attention is all you need" is an outlier. They backed up
           | their bold claim with breakthrough results.
           | 
            | For modest incremental improvements, I greatly prefer boring
            | technical titles. Not everything needs to be a stochastic
            | parrot. We see the same dynamic with building luxury condos:
            | on any individual project, the flashy choice helps juice
            | profit, but when the whole city follows suit, it leads to a
            | less desirable outcome.
        
         | guerrilla wrote:
         | Yeah, I guess today was the day that I learned I am not part of
         | "everyone". I feel so left out now.
        
       | i5heu wrote:
        | I put this paper into 4o so I could check whether it is
        | relevant. So that you do not have to do the same, here are the
        | bullet points:
       | 
       | - Vision Transformers can be parallelized to reduce latency and
       | improve optimization without sacrificing accuracy.
       | 
        | - Fine-tuning only the attention layers is often sufficient for
        | adapting ViTs to new tasks or resolutions, saving compute and
        | memory (a rough sketch of this follows after these bullets).
       | 
       | - Using MLP-based patch preprocessing improves performance in
       | masked self-supervised learning by preserving patch independence.
        
         | Jamesoncrate wrote:
         | just read the abstract
        
           | jmugan wrote:
           | You would think. I don't know about this paper in particular,
            | but I'm continually surprised by how much more I get out of
            | LLM summaries of papers than from the abstracts written by
            | the authors themselves.
        
             | tough wrote:
              | This would be an interesting metric to track: how different
              | an abstract generated by an LLM (given the paper as its
              | source) is from the actual abstract, and whether that
              | difference correlates at all with the overall quality of
              | the paper.
        
             | mananaysiempre wrote:
              | Paper abstracts are not optimized for drive-by readers like
              | you and me. They are optimized for active researchers in
              | the field reading their daily arXiv digest, which lists
              | _all_ the new papers across the categories they work in,
              | and who need to make the read/don't-read decision for each
              | entry as efficiently as possible.
             | 
             | If you've already decided you're interested in the paper,
             | then the Introduction and/or Conclusion sections are what
             | you're looking for.
        
             | kridsdale3 wrote:
             | Same. I don't think GP deserves the downvotes.
        
       ___________________________________________________________________
       (page generated 2025-04-24 23:01 UTC)