[HN Gopher] I Implemented Nystromformer
___________________________________________________________________
I Implemented Nystromformer
Author : dagli
Score : 23 points
Date : 2022-08-21 07:45 UTC (1 day ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| JohnDeHope wrote:
| I thought it'd have something to do with the tech author and
| Dart programming language guy Bob Nystrom, aka @munificentbob.
| nextfx wrote:
| I assume the name is referring to the Nystrom low-rank matrix
| approximation.
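| For a rough picture, here's a minimal NumPy sketch of that
| approximation (the function name and the RBF-kernel test data
| are illustrative, not taken from the linked repo):
|
|     import numpy as np
|
|     # Nystrom approximation of a symmetric PSD matrix K:
|     # sample m landmark columns C and the m x m block W at
|     # their intersection, then K ~= C @ pinv(W) @ C.T.
|     def nystrom_approx(K, m, seed=0):
|         n = K.shape[0]
|         rng = np.random.default_rng(seed)
|         idx = rng.choice(n, size=m, replace=False)
|         C = K[:, idx]                # n x m column sample
|         W = K[np.ix_(idx, idx)]      # m x m landmark block
|         return C @ np.linalg.pinv(W) @ C.T
|
|     # toy check on an RBF kernel matrix: the relative error
|     # shrinks as the number of landmarks m grows
|     X = np.random.default_rng(1).normal(size=(500, 8))
|     sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
|     K = np.exp(-sq)
|     K_hat = nystrom_approx(K, m=64)
|     print(np.linalg.norm(K - K_hat) / np.linalg.norm(K))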
| throwawaymaths wrote:
| Yannic Kilcher's explanation of the Nystromformer is really
| good and clear (though it helps to understand transformers
| first): https://youtu.be/m-zrcmRd7E4
| sarosh wrote:
| The repo from the paper: https://github.com/mlpen/Nystromformer
| uniqueuid wrote:
| There's quite a bit of impressive jargon in that repo and the
| video!
|
| I had to take a brief look at the paper abstract, which
| explains that this is about addressing the sequence-length
| limit of transformer text models:
|
| >While beneficial, the quadratic complexity of self-attention on
| the input sequence length has limited its application to longer
| sequences -- a topic being actively studied in the community. To
| address this limitation, we propose Nystromformer -- a model that
| exhibits favorable scalability as a function of sequence length.
|
| That's cool -- I'm looking forward to being able to process texts
| > 512 tokens in the future, and would be especially excited if
| that were possible for sentence-bert.
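| The underlying trick, as far as I can tell, is to replace the
| n x n softmax attention map with three small landmark-based
| factors, so cost grows with n*m rather than n^2. A minimal
| NumPy sketch of that idea (segment-mean landmarks as in the
| paper; np.linalg.pinv stands in for the paper's iterative
| pseudo-inverse, and n must be divisible by m here):
|
|     import numpy as np
|
|     def softmax(x):
|         x = x - x.max(axis=-1, keepdims=True)
|         e = np.exp(x)
|         return e / e.sum(axis=-1, keepdims=True)
|
|     # Nystrom-style attention: the n x n map is never formed.
|     # F (n x m), A (m x m) and B (m x n) are built from m
|     # landmark queries/keys (segment means of Q and K).
|     def nystrom_attention(Q, K, V, m):
|         n, d = Q.shape
|         s = 1.0 / np.sqrt(d)
|         Q_l = Q.reshape(m, n // m, d).mean(axis=1)  # m x d
|         K_l = K.reshape(m, n // m, d).mean(axis=1)  # m x d
|         F = softmax(Q @ K_l.T * s)       # n x m
|         A = softmax(Q_l @ K_l.T * s)     # m x m
|         B = softmax(Q_l @ K.T * s)       # m x n
|         # ~= softmax(Q K^T / sqrt(d)) @ V in O(n*m) memory
|         return F @ (np.linalg.pinv(A) @ (B @ V))
|
|     # e.g. a 2048-token sequence with 64 landmarks
|     rng = np.random.default_rng(0)
|     Q, K, V = (rng.normal(size=(2048, 64)) for _ in range(3))
|     out = nystrom_attention(Q, K, V, m=64)  # (2048, 64)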
___________________________________________________________________
(page generated 2022-08-22 23:01 UTC)