[HN Gopher] LoMA: Lossless Compressed Memory Attention
___________________________________________________________________
LoMA: Lossless Compressed Memory Attention
Author : PaulHoule
Score : 62 points
Date : 2024-01-27 17:37 UTC (5 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| buildbot wrote:
 | Very interesting, but on an initial skim I don't understand how
 | this technique is lossless. Reproduced from the methods
 | section:
|
 | 1. Select a sequence of t_c tokens that the model has already
 |    generated or completed predicting as the reading area.
 | 2. Insert t_m '<m>' tokens at once after the reading area to
 |    serve as the memory area.
 | 3. The model performs a single inference on the memory area,
 |    but discards the model's output, retaining only the KV pairs
 |    from each layer.
 | 4. Discard the reading area, and the model continues generating
 |    text from after the memory area.
|
| Isn't the memory area a lossy compression of the reading area?
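 |
 | For what it's worth, here is a minimal sketch of how I read
 | steps 2-4, written against a HuggingFace-style causal LM with a
 | tuple-of-(k, v) KV cache. The checkpoint, the mem_token_id, and
 | the compress_reading_area helper are all stand-ins of mine, not
 | the paper's code:
 |
 |     import torch
 |     from transformers import AutoModelForCausalLM
 |
 |     # Stand-in checkpoint; LoMA assumes a model fine-tuned with
 |     # a dedicated '<m>' token, which gpt2 does not actually have.
 |     model = AutoModelForCausalLM.from_pretrained("gpt2").eval()
 |
 |     def compress_reading_area(prefix_ids, reading_ids, t_m,
 |                               mem_token_id):
 |         # Step 2: append t_m '<m>' memory tokens after the
 |         # reading area.
 |         mem_ids = torch.full((1, t_m), mem_token_id,
 |                              dtype=torch.long)
 |         input_ids = torch.cat(
 |             [prefix_ids, reading_ids, mem_ids], dim=-1)
 |
 |         # Step 3: one forward pass; logits are thrown away and
 |         # only the per-layer KV pairs are kept.
 |         with torch.no_grad():
 |             out = model(input_ids, use_cache=True)
 |
 |         # Step 4: drop the reading area's positions, keeping the
 |         # prefix and the memory area in the cache.
 |         n_prefix = prefix_ids.shape[-1]
 |         n_total = input_ids.shape[-1]
 |         keep = torch.cat([torch.arange(n_prefix),
 |                           torch.arange(n_total - t_m, n_total)])
 |         return tuple((k[:, :, keep, :], v[:, :, keep, :])
 |                      for k, v in out.past_key_values)
 |
 | Generation then continues with the pruned cache passed back in
 | as past_key_values, so whatever the model retains about the
 | reading area has to fit in those t_m key/value vectors per
 | layer.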
| godelski wrote:
| The paper is very confusing and should have a title change. In
| their results (4.2.1) they say
|
| > The observation that L_Repeat converges rapidly to a value
| close to zero under smaller compression ratios is significant.
| It demonstrates that the method is highly effective in
| compressing information losslessly into memory tokens
|
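 | If I'm reading it right (this is just my guess at the
 | definition), L_Repeat is the usual autoregressive cross-entropy
 | for regenerating the t_c reading-area tokens conditioned on the
 | t_m memory tokens:
 |
 |     L_{Repeat} = -\frac{1}{t_c} \sum_{i=1}^{t_c}
 |         \log p_\theta(x_i \mid m_1, \ldots, m_{t_m}, x_{<i})
 |
 | A cross-entropy that converges "close to zero" still means
 | near-perfect reconstruction on average, not bit-exact recovery
 | of every token.
 |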
| So I'm also unconvinced
| godelski wrote:
 | Cool, but poorly written. Why spend half of page 4 on standard
 | attention (which should be assumed knowledge for a reader at
 | this point) but then not explain equation 10? What is L? Why is
 | there an identity matrix? I don't need equations 6-9, but I
 | sure do need more information on 10-14. I hope there's code.
|
| What a weird line in Figure 2
|
| > Note: L_LM represents L_LM, and L_Repeat represents L_Repeat,
| and Loss represents L.
|
| Tautology is not helpful here.
|
 | And what's everyone's aversion to log plots? Figure 3 is
 | unreadable but would be perfect with a log scale.
|
 | And where's the Appendix? It's referenced in the paper... I'm
 | also unconvinced it's lossless.
| Solvency wrote:
 | They spend that much time reiterating well-trodden, established
 | information because that's the easiest material to write about
 | at length. All of the novel stuff is suspiciously vague, with
 | the details left to the imagination, because the author is
 | fudging things.
___________________________________________________________________
(page generated 2024-01-27 23:00 UTC)