[HN Gopher] I Implemented Nystromformer
       ___________________________________________________________________
        
       I Implemented Nystromformer
        
       Author : dagli
       Score  : 23 points
       Date   : 2022-08-21 07:45 UTC (1 day ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | JohnDeHope wrote:
       | I thought it'd have something to do with the tech author and
       | Dart programming language guy, Bob Nystrom, aka @munificentbob.
        
         | nextfx wrote:
         | I assume the name refers to the Nystrom method for low-rank
         | matrix approximation.
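         | 
         | As a rough sketch of that idea (plain numpy, with
         | illustrative sizes; not code from the repo): sample m
         | landmark columns of an n x n PSD matrix K and reconstruct
         | it as C W^+ C^T, where C holds the sampled columns and W is
         | the m x m block at their intersection.
         | 
         |     import numpy as np
         |     
         |     rng = np.random.default_rng(0)
         |     
         |     # Hypothetical test matrix: an n x n PSD Gram matrix.
         |     X = rng.standard_normal((1000, 32))
         |     K = X @ X.T
         |     
         |     m = 64                        # number of landmarks
         |     idx = rng.choice(K.shape[0], size=m, replace=False)
         |     
         |     C = K[:, idx]                 # n x m sampled columns
         |     W = K[np.ix_(idx, idx)]       # m x m intersection block
         |     
         |     # Nystrom approximation: K ~= C W^+ C^T (rank <= m)
         |     K_hat = C @ np.linalg.pinv(W) @ C.T
         |     
         |     err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
         |     print(f"relative Frobenius error: {err:.3f}")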
        
           | throwawaymaths wrote:
           | Yannic Kilcher's explanation of the Nystromformer is
           | really good and clear (though it helps to understand
           | transformers first): https://youtu.be/m-zrcmRd7E4
        
       | sarosh wrote:
       | The repo from the paper: https://github.com/mlpen/Nystromformer
        
       | uniqueuid wrote:
       | There's quite a bit of impressive jargon in that repo and the
       | video!
       | 
       | I had to take a quick look at the paper abstract, which
       | explains that this is about addressing the sequence-length
       | limit of transformer text models:
       | 
       | >While beneficial, the quadratic complexity of self-attention on
       | the input sequence length has limited its application to longer
       | sequences -- a topic being actively studied in the community. To
       | address this limitation, we propose Nystromformer -- a model that
       | exhibits favorable scalability as a function of sequence length.
       | 
       | That's cool -- I'm looking forward to being able to process
       | texts longer than 512 tokens in the future, and I'd be
       | especially excited if that were possible for sentence-bert.
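       | 
       | For intuition, here's a minimal numpy sketch of how the
       | Nystrom idea gets applied to attention. It's simplified from
       | the paper: landmarks are segment means as described there,
       | but an exact pseudoinverse stands in for the paper's
       | iterative approximation, and the conv skip connection is
       | omitted. The point is that the n x n attention matrix is
       | never formed, so for fixed m the cost grows linearly in the
       | sequence length n.
       | 
       |     import numpy as np
       |     
       |     def softmax(x, axis=-1):
       |         x = x - x.max(axis=axis, keepdims=True)
       |         e = np.exp(x)
       |         return e / e.sum(axis=axis, keepdims=True)
       |     
       |     def nystrom_attention(Q, K, V, m=32):
       |         # Assumes sequence length n is divisible by m.
       |         n, d = Q.shape
       |         # Landmarks: means over m contiguous segments.
       |         Q_l = Q.reshape(m, n // m, d).mean(axis=1)  # m x d
       |         K_l = K.reshape(m, n // m, d).mean(axis=1)  # m x d
       |         s = np.sqrt(d)
       |         F = softmax(Q @ K_l.T / s)    # n x m
       |         A = softmax(Q_l @ K_l.T / s)  # m x m
       |         B = softmax(Q_l @ K.T / s)    # m x n
       |         # softmax(Q K^T / sqrt(d)) V  ~=  F A^+ (B V)
       |         return F @ (np.linalg.pinv(A) @ (B @ V))
       |     
       |     n, d = 512, 64
       |     rng = np.random.default_rng(0)
       |     Q, K, V = [rng.standard_normal((n, d)) for _ in range(3)]
       |     out = nystrom_attention(Q, K, V, m=32)
       |     print(out.shape)  # (512, 64)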
        
       ___________________________________________________________________
       (page generated 2022-08-22 23:01 UTC)