[HN Gopher] Unlimiformer: Long-Range Transformers with Unlimited...
___________________________________________________________________
Unlimiformer: Long-Range Transformers with Unlimited Length Input
Author : shishy
Score : 197 points
Date : 2023-05-05 17:53 UTC (5 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| sva_ wrote:
| I think infiniformer would've sounded better. The bench scores
| seem pretty marginal.
| mirekrusin wrote:
| Pretty marginal score gains once a week is all you need.
| sdenton4 wrote:
| Only so long as a) the gains are real, and not overfitting
| the test dataset, and b) you don't balloon in complexity, so
| that stacking approaches becomes impossible to manage.
|
| Point (a) is extremely hard to discern, especially when
| people are chasing third-significant-digit gains on common
| benchmarks; it's essentially multiple-testing false discovery
| in action. I've seen whole families of methods fail to
| transfer to new domains...
|
| Point (b) is also a real issue. As you increase the number of
| bells and whistles, each with their own hyperparameters with
| non-linear impacts on model quality, it becomes impossible to
| say what's working or not.
|
| In practice, I think we see some cycles of baroque
| incremental improvements, followed by someone spending a year
| stripping away the bullshit and getting something simple that
| outperforms the pack, essentially because it's easier to do
| hyperparam search over simpler models once you figure out the
| bits that actually matter.
| XorNot wrote:
| Hang on, how unlimited is unlimited here? Surely the immediate
| thing you'd do with this is just _never_ delete any prior inputs
| so it becomes de facto long-term memory for the model?
| shishy wrote:
| Last paragraph touches on that:
|
| The length of inputs is theoretically bounded by the memory
| limitations of the computer used. More practically, using a CPU
| datastore is many times slower than a GPU datastore because of
| slower search and the need to transfer retrieved embeddings to
| the GPU... (continues)
| 0xDEF wrote:
| The limit is RAM but GPU RAM is much faster than computer RAM.
| davrosthedalek wrote:
| Is that really the limit? There is no real restriction that
| everything is in memory at the same time, right? You could
| maybe stream from SSD?
| capableweb wrote:
| Create a swapfile and you essentially trade disk space for
| memory space.
| GistNoesis wrote:
| I've read the paper quickly, the main idea is simple and
| interesting, but maybe a little dubious (it's kind of an accuracy
| for memory trade-off).
|
| In the transformer architecture one has to compute Q K^T.
|
| Q K^T = (h_d W_q W_k^T) h_e^T (equation (2), page 3 in the
| paper).
|
| Where h_d is the hidden state of the decoder, h_e is the hidden
| state of the encoder, W_q and W_k are parameter matrices, and ^T
| denotes transposition.
|
| By grouping the calculation this way, in a transformer encoder-
| decoder architecture, they can build and use only a single index
| (you index the h_e vectors using a vector database) for all the
| decoder layers' queries, instead of having to build 2 * L * H
| indices (with L the number of layers of the decoder and H the
| number of heads in the decoder).
|
| But what makes it a little dubious is that this transformation
| means you make your near-neighbor queries in a space of dimension
| "dimension of the hidden state", instead of "dimension of a
| head", which is H times smaller.
|
| So if you had to build 2 * L * H indices, each index would be H
| times smaller.
|
| So you only gain a factor of 2 * L. But the trade-off is that you
| are doing a near-neighbor search in a higher dimension, where you
| are then subject to the curse of dimensionality (the higher the
| dimension, the more similar all points are to each other),
| whereas the whole point of projections in a transformer is to
| lower the dimension so that the knn search makes more sense. So
| to get the same accuracy, your near-neighbor search engine will
| have to work a lot harder.
|
| Also, as an approximation of the transformer, because it's using
| some knn search, it comes with the problems associated with that
| (for example, it's harder to train because it's more sparse, and
| it has a tendency to hyperfocus), but it can be complemented with
| a low-rank linearization of the attention so that the neural net
| also acts on the gist rather than only the closest neighbors.
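|
| A minimal numpy sketch of that regrouping (illustrative only,
| not the authors' code; shapes are made up and an exact dot-
| product search stands in for the real kNN datastore):
|
|     import numpy as np
|
|     d_model, d_head, n_tokens, k = 512, 64, 20_000, 16
|     rng = np.random.default_rng(0)
|
|     # Encoder hidden states: the single shared datastore.
|     h_e = rng.standard_normal((n_tokens, d_model))
|     # Current decoder hidden state for one layer/head.
|     h_d = rng.standard_normal(d_model)
|     # Hypothetical per-(layer, head) projections.
|     W_q = rng.standard_normal((d_model, d_head))
|     W_k = rng.standard_normal((d_model, d_head))
|
|     # Naive grouping: project every key and search in
|     # d_head dims; needs one index per (layer, head).
|     keys = h_e @ W_k                    # (n_tokens, d_head)
|     scores_naive = keys @ (h_d @ W_q)   # (n_tokens,)
|
|     # Regrouped: QK^T = (h_d Wq Wk^T) h_e^T, so the index
|     # stores raw h_e once; only the query changes per head.
|     query = h_d @ W_q @ W_k.T           # (d_model,)
|     scores_single = h_e @ query         # same scores
|
|     assert np.allclose(scores_naive, scores_single)
|
|     # Top-k retrieved encoder positions for this head.
|     top_k = np.argpartition(-scores_single, k)[:k]
|
| The scores match (up to floating point); the price, as noted
| above, is that the search now happens in d_model dimensions
| instead of d_head.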
| numeri wrote:
| This technique can be added on to any encoder-decoder
| Transformer model post-training, so the added training
| difficulties you mention don't apply. It honestly is a very
| interesting approach to me - the main issue I see (which they
| discuss in the paper) is in pure latency. If you're using a
| large enough vector database, it will be on the CPU, and
| transferring hidden states from GPU to CPU and then the
| embeddings back from CPU to GPU is going to eat up a ton of
| time.
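|
| A rough way to get a feel for that transfer cost (hypothetical
| sizes; a sketch that assumes PyTorch and a CUDA GPU):
|
|     import time
|     import torch
|
|     if torch.cuda.is_available():
|         # Pretend these are retrieved encoder embeddings.
|         emb = torch.randn(16, 1024, 1024)  # ~64 MB fp32
|         emb = emb.pin_memory()
|         torch.cuda.synchronize()
|         t0 = time.perf_counter()
|         emb_gpu = emb.to("cuda", non_blocking=True)
|         torch.cuda.synchronize()
|         t1 = time.perf_counter()
|         print(f"CPU->GPU copy: {(t1 - t0) * 1e3:.1f} ms")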
| szundi wrote:
| Input should be the Internet then.
| quickthrower2 wrote:
| Pricing: $0.1 per nano token.
| TeMPOraL wrote:
| Is this how Kagi's "universal summarizer" works? They wrote a lot
| of copy about how it's able to summarize websites and documents
| of arbitrary length, while not revealing how on Earth this
| actually works. It _does_ seem to work, though.
| adamnemecek wrote:
| The attention mechanism corresponds to the Hopf algebraic
| convolution, a generalization of the commonly known convolution.
|
| I'm in the process of implementing a framework based on this
| idea.
|
| I have written a paper on this recently,
| https://arxiv.org/abs/2302.01834
|
| I have a discord channel https://discord.cofunctional.ai.
| capableweb wrote:
| You never just work on something until it's ready to be shared,
| and then share it once? It has to be shared before it's even a
| little bit usable, with just some vague words about what it might
| be?
| adamnemecek wrote:
| I'm gauging interest and looking for potential users. Steve
| Blank and all that.
| verdverm wrote:
| The first step to crossing the chasm is finding those
| innovators and learning if you are solving a problem!
| adamnemecek wrote:
| I have and I am. Next.
| j2kun wrote:
| Appendix C of the PDF has an example book summary of The Brothers
| Karamazov, with noted minor factual errors
|
| https://arxiv.org/pdf/2305.01625.pdf
|
| > Unlimiformer summary:
|
| > The first part of the novel focuses on the question of whether
| or not the Russian nobleman, Dmitri Fyodorovitch, has killed his
| father. In the town of Ivanovna, the lieutenant-colonel of the
| Mushenkhanovitch is accused of the murder of his brother
| Ivanovitch. The lieutenant-incommand, Vasilyevitch, takes the
| form of a dog, and the two men-the two men and the woman who are
| questioned by the court-martial-murphy. The two men cry out to
| the God of Russia for help in their quest to save the town. The
| man, afraid of the wrath of the God, hands the dog a bunch of
| letters that are supposed to be proof of his love for his
| brother. The old man-the one who had killed his mother, and then
| found the letter-arrives. He reads it-asked the old man to
| forgive him for the murder and then takes the dog away. The other
| men, all of whom are prisoners, demand that the man confess his
| crime to the court. The first and most important thing they tell
| the court is that they love the man. The court acquits the man
| and sentences the man to death. The second man-an old officer of
| the town, Alekandrovitch-askes to tell them the same thing. The
| third man-in the process of confessing his crime-is Vashenka, a
| drunk man who has been sent to the town to kill his father, for
| reasons which are not entirely clear to the people. The woman's
| servant, Evgenyevna, is also the one who has told the court the
| story of the Medvedevitch's murder, for the good old man's and
| the young man's love. The three men, who are separated for the
| first time, are laughing at the man's attempt to seduce Mitya.
| The young man, in the meantime, is conscripted into the town-
| side. He tells the court that he loves her, but he has yet to
| tell her the true story. The men, in this room, demand a man to
| kill her, and she will not betray them. The women, in their own
| country, are rebelling against the man who had sent them three
| thousand roubles, and they will not allow the man of the people
| to see them. They will not let the man in the town be allowed to
| see the man-or Dmitriovitch; he will have her husband killed him.
| He will not tell the people who love him. The next man, named
| Vashenovitch, arrives, and takes the man away. They all begin to
| laugh at the fact that he has succeeded in seducing and
| entrusting his brother Dmitri. He is then taken away to the old
| woman's house, where the governor-side-of-the-world, and his
| sister, Arkadin, is being punished. The priestesses and the baron
| are shocked, for they have been so virtuous and well-suited. The
| only thing they will be able to do is kill the priest. They
| threaten to burn the priestess to death, for she has been so
| wicked and libidinous that she has not yet seen the priest, for
| her husband. The priests-ostensibly convinced that she is a woman
| who loves the priest and has been punished for her love and for
| allowing the priest to marry her. The last man, Yakivitch,
| arrives at the house, and, after a long day of drinking and then
| some of the men-is killed. He and the priest are ordered to leave
| the town so that the priest can finally be reunited with the
| people of the old lady. The final man, the commander of the St.
| Petersburg town of Arkadina, is sentenced to death for the crime
| of having killed and then the lieutenant of the governor, for
| taking the money. The commander, the former lieutenant-delegation
| of the People's Army, is summarily executed, and all the men,
| except for the commander, have been summarily punished for their
| crime. The entire town is shocked and, in a very dramatic way,
| the priestesses plead for the forgiveness of the man, for
| allowing them to kill and imprison Ivan. They plead for their
| brother to be restored as well, for all the people they have
| loved, and for the priestor to tell the story
| timy2shoes wrote:
| Just like the book, that summary was too long; didn't read.
| MacsHeadroom wrote:
| Sounds like your context window is too short.
| verdverm wrote:
| because internet?
| edflsafoiewq wrote:
| That summary hardly inspires confidence, it's awful.
| chrgy wrote:
| In the age of transformers, let's ask a transformer to summarize
| this paper:
|
| The Unlimiformer paper is about a new way to make computer
| programs that can summarize really long pieces of text. Normally,
| when you ask a computer program to summarize something, it can
| only handle a certain amount of text at once. But with
| Unlimiformer, the program can handle as much text as you want!
|
| The way Unlimiformer works is by using a special technique called
| a "k-nearest-neighbor index" to help the program pay attention to
| the most important parts of the text. This makes it possible for
| the program to summarize even really long documents without
| losing important information.
|
| Overall, Unlimiformer is an exciting new development in natural
| language processing that could make it easier for computers to
| understand and summarize large amounts of text.
| bighoki2885000 wrote:
| [dead]
| space_fountain wrote:
| As I understand it, the approach here is to use an approximate
| nearest neighbor database to retrieve highly relevant tokens from
| across large documents using the existing attention heads. So
| each attention head retrieves context from the entire document.
| They say this can work without fine-tuning, but performance
| improves with it. This is apparently extending a piece of prior
| work, but they've managed to rearrange the linear algebra of
| attention so they only need one database for all attention heads
| across all layers of the model. I'm a bit confused about how
| attention would work here for layers below the top, and a bit
| confused about how position is encoded for tokens across a long
| document like this.
| im3w1l wrote:
| I don't understand how this could work. Like if you select a
| small fixed number of tokens from a large document won't you
| necessarily lose a lot of important data?
| nephanth wrote:
| Btw, why do transformers have a limit input size in the first
| place? I'm pretty sure the self-attention mechanisms scale
| (although with bad complexity) to arbitrary sizes
| MacsHeadroom wrote:
| >(although with bad complexity)
|
| Because of exactly that.
|
| Also the attention mechanism is baked in during pretraining. So
| whatever max context length you want increases the compute cost
| of training by at least a function of said "bad complexity."
| Even just 4096 tokens of max context is much more expensive to
| train than 2048. So if we want models with 8k, 32k, or more
| context then the training costs get out of hand quickly.
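|
| A back-of-the-envelope sketch of that scaling (assuming fp16
| scores and one attention matrix per head per layer; numbers
| are illustrative, not any particular model's):
|
|     # Memory for a single n x n attention score matrix, and
|     # how it grows as the max context length doubles.
|     bytes_per_score = 2  # fp16
|     for n in (2048, 4096, 8192, 32768):
|         mb = n * n * bytes_per_score / 2**20
|         print(f"n={n:>6}: {mb:8.1f} MiB per head per layer")
|
|     # n=  2048:      8.0 MiB per head per layer
|     # n=  4096:     32.0 MiB per head per layer
|     # n=  8192:    128.0 MiB per head per layer
|     # n= 32768:   2048.0 MiB per head per layer
|
| That cost is incurred per head, per layer, at every training
| step, which is why doubling the max context is so expensive.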
| mxwsn wrote:
| 1. This is not exact attention, but an approximation of it.
| Specifically, they use k-nearest neighbors to retrieve the top-k
| most similar tokens, out of an "unlimited-length input" say of
| size N, where k << N.
|
| 2. This idea is quite similar to retrieval transformers and
| Hopfield networks which have been known and published for several
| years now. It's not really that novel.
|
| 3. Due to the preceding points, the title can easily mislead
| people. It's not really a conventional transformer, and it's not
| a breakthrough.
|
| 4. This paper is a preprint and not peer-reviewed.
|
| "I generally don't enjoy seeing preprints like this going to the
| top of Hacker News. This would be a higher quality submission if
| the paper was peer-reviewed or put into a greater context, like a
| blog post discussion or something like that."
|
| Let me retract this and say something a bit nicer :) I personally
| think that this specific preprint making it to the top of HN is
| potentially harmful, because of the hype around LLMs, the diverse
| audience of readers here, and the specific title that implies a
| claim of "transformer with unlimited context length", when this
| is misleading. I don't have anything against preprints in general
| - a lot of work outside of the peer-review process ends up being
| very impactful.
| joseph_grobbles wrote:
| [dead]
| dhruvdh wrote:
| I generally don't enjoy something being diminished on account
| of being "not really that novel".
|
| Your comment essentially says - this is not a high quality
| submission because readers might not actually read it, which is
| no fault of the work, or submitter.
| whimsicalism wrote:
| It doesn't have to be someone's fault to not be a well-suited
| submission.
| MasterScrat wrote:
| > Your comment essentially says - this is not a high quality
| submission because readers might not actually read it
|
| I'd argue that on average, most readers won't have a good
| enough understanding, or read the paper far enough, to
| understand that the reality is closer to "it's not a
| breakthrough" rather than "Transformers with Unlimited Length
| Input".
|
| So, I wholeheartedly welcome this type of hype-breaking
| leading comment.
| jjoonathan wrote:
| Agreed 100%. Not only do I appreciate "well actually"
| comments, I think they are the single most useful aspect of
| forum discussions.
|
| The headline will always be "BATTERY BREAKTHROUGH PROMISES
| TO ROCKET ELON MUSK TESLA TO THE MOON!!!" and while it's
| easy to know that _some_ amount of cold water is necessary,
| you need to spend a nontrivial amount of attention and have
| a nontrivial amount of knowledge to figure out just how
| much cold water. It's a useful thing to outsource. Did a
| research group see outperformance in an experiment with 1%
| probability of translating into production? Or is CATL
| scaling up a production process? The "well actually"
| comment will contextualize for you. If there's a "well
| actually" reply to the "well actually" comment, that tells
| you something too. Upvotes/downvotes dial in the
| distributed consensus.
|
| It's far from perfect, but I'd challenge detractors to
| point to a more effective method for large-scale democratic
| truth seeking.
| swores wrote:
| It's possible to approve of the "hype-breaking" (aka
| TLDRing / ELI5ing so that HN comment readers can understand
| the degree to which it's interesting for those of us not
| close enough to the field to understand that for ourselves)
| without agreeing that that same comment should also
| complain that preprints shouldn't be submitted to / upvoted
| on HN.
|
| That's how I feel, anyway. I'd rather have seen a comment
| that has the same explanations in it but is just generally
| less grumpy! Saying stuff like "It's not really that
| novel." doesn't really contribute much, when it could
| either explain why it isn't novel by pointing to something
| earlier and similar that can be referenced, or consider
| what, if anything, _is_ novel in this research - assuming
| it isn't being accused of just replicating something
| already done.
| godelski wrote:
| Honestly, these complaints (other than 4) apply to the vast
| majority of papers. #4 is just false. It has already been
| viewed by other lab members (peers) and open publication is
| peer reviewing. The "peer review system" (publishing to
| conferences/journals) is relatively new and I think ML
| demonstrates all the problems with the system (yay hype).
|
| Novelty is especially a joke. ViTs are "just" NLP encoding
| transformers. T2I models are "just" NLP models connected to
| generative models. Diffusion models are "just" whitening
| models. GPT3 is just GPT2 with more layers and more data which
| is just GPT with more layers and more data. We can go even
| deeper if we pull from math and physics works. But that doesn't
| mean these works haven't been highly fruitful and useful. I'm
| happy all of these have been published.
|
| > because of the hype around LLMs
|
| I too hate the hype, but it is often bimodal. There are people
| who are far too critical and people who are far too accepting.
| The harm is not preprints or people reading papers, the harm is
| people who have no business/qualifications evaluating works
| confidently spouting out critiques. It is people not
| understanding that researchers are just critical of one
| another's work by default and that doesn't mean it shouldn't
| have been published.
|
| It is well known that reviewers are good at identifying bad
| papers but not good at identifying good papers [0,1]. Which,
| let's be honest, means reviewers just have high reject rates
| in a noisy system, making publication a highly noisy metric
| for merit at best.
|
| As for the paper:
|
| Many LLMs and large models are using attention approximations,
| and the kNN technique isn't particularly new either. My main
| complaint is the lack of comparisons for Figures 3 and 4, but
| I'm not an NLP person, so I don't even know if there are other
| good works that would make better comparisons (BART is a common
| baseline). But generative models are (a fact that unfortunately
| isn't well known) extremely difficult to evaluate. The paper
| seems fine to me. It is useful to the community. I don't like
| the name either, but their input is limited by computer memory,
| not the model, and I would want to see more on this. Not being
| an NLP person, all I can say is that this looks neither like a
| strong reject nor a strong accept. I'll leave it to the
| community to determine whether they want more experiments for
| the conference publication, but the work seems useful.
|
| [0] https://inverseprobability.com/talks/notes/the-neurips-exper...
|
| [1] https://arxiv.org/abs/2109.09774
| ShamelessC wrote:
| > This idea is quite similar to retrieval transformers and
| Hopfield networks which have been known and published for
| several years now. It's not really that novel.
|
| Is it? I had thought retrieval transformers "merely" used
| retrieval as a backend of sorts rather than a substitute for
| the attention itself?
| mxwsn wrote:
| Yeah, RETRO [0] embeds an entire question/prompt, searches
| for similar text passages with k-NN, then does further
| processing. This can kind of be understood as attention on
| paragraphs. This preprint instead does k-NN over single
| tokens and calls it attention. So not the same, but similar.
|
| [0] https://jalammar.github.io/illustrated-retrieval-transformer...
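|
| A tiny sketch of the granularity difference (toy embeddings;
| not either system's actual code):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     d = 64
|     tokens = rng.standard_normal((10_000, d))   # token vecs
|     chunks = tokens.reshape(-1, 100, d).mean(1) # chunk vecs
|     q = rng.standard_normal(d)                  # query vec
|
|     # RETRO-style: retrieve whole passages (chunks).
|     best_chunks = np.argsort(chunks @ q)[-2:]
|     # Unlimiformer-style: retrieve individual token states
|     # and attend to just those.
|     best_tokens = np.argsort(tokens @ q)[-16:]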
| make3 wrote:
| retro doesn't attend to itself, which is a big difference
| ShamelessC wrote:
| Ah, I see - thanks for the clarification.
| chaxor wrote:
| There's nothing really wrong with a preprint making it to the
| top - there can be genuinely good work that stays in preprint
| for quite some time. I believe the original ELMo work that
| spurred the Sesame street gang is _still_ in preprint despite
| its importance in NLP (:shocked Pikachu face: not a
| transformer?!).
|
| But yes, you're correct in this instance that it's not
| necessarily 'huge news', since it is highly similar to a long
| list of prior work: the Reformer (LSH-based), Performer
| (FAVOR**), FNet (Fourier-based), Routing Transformer, Sparse
| Transformer, Longformer (task-specific sparse), BlockBERT,
| XLNet/Transformer-XL (slide + relative PE), BP-Transformer
| (binary partition), BigBird (global and random attention),
| RWKV which is..., etc.
|
| ** FAVOR actually is innovative and different in this space,
| but towards _similar_ ends anyway
| visarga wrote:
| How come you know the efficient-transformers family? When I
| ask questions about transformers in ML interviews, nobody has
| heard of them. I can't figure out why it's not common
| knowledge. For years all the transformer papers were about
| reducing O(N^2).
| f_devd wrote:
| To be fair, ML is (used to be?) pretty broad, so unless
| someone is actively keeping up with the sota in the high-
| data sequence modeling area it's quite possible to miss. I
| know ML teams which were entirely made up of OSML
| practitioners, because that was what was most commonly
| useful until recently.
| ftxbro wrote:
| > I generally don't enjoy seeing preprints like this going to
| the top of Hacker News. This would be a higher quality
| submission if the paper was peer-reviewed or put into a greater
| context, like a blog post discussion or something like that.
|
| This opinion seems totally backwards to me. I'm not sure what
| you think peer-reviewed means? Also, I prefer full preprints
| to blog posts. But then again, I have no idea why things like
| the daily blog posts of Seth Godin (to pick on one randomly,
| sorry, it's not personal) so often go to the top of Hacker
| News. Maybe opinions like yours explain it?
| MacsHeadroom wrote:
| > This opinion seems totally backwards to me.
|
| I agree.
|
| > I'm not sure what you think peer-reviewed means?
|
| Posting to HN is a form of peer-review, typically far better
| than the form of "peer-review" coopted by journal publishers.
| xg15 wrote:
| That's redefining what "peer-review" is. And I'll take
| credentialism over some board of anonymous internet people,
| I'm sorry.
|
| I mean, hypothetically, this whole thread could be stuffed
| with sock puppet accounts of the author. How would you
| know?
| pyth0 wrote:
| > Posting to HN is a form of peer-review, typically far
| better than the form of "peer-review" coopted by journal
| publishers.
|
| This is a rather self-aggrandizing view, and I think it
| speaks to the level of ego that underpins a lot of the
| discussion on here.
| 19h wrote:
| There's no need to attack the entire HN community over
| one person's opinion. Preprints and discussions here both
| have value, and different forms of review suit different
| needs.
| pyth0 wrote:
| This was not an attack against the community or the paper
| in question. I am only speaking from my experience as
| (primarily) a lurker.
| 19h wrote:
| My apologies, I misinterpreted your comment. You make a
| fair point that HN discussions are not equivalent to
| formal peer review.
| godelski wrote:
| There's a lot of junk comments on HN but there's also a
| lot of junk comments at top conferences like CVPR, ICCV,
| and NIPS. The system is just noisy. I've had plenty of
| inane reviews that clearly break reviewer guidelines (ACs
| do nothing)[0,1].
|
| Also, I want to remind everyone that ML uses conferences
| as the main publishing mechanism, not journals. While
| things like JMLR exist, that's not where papers are
| targeting.
|
| Maybe we just need to let researchers evaluate works
| based on their merits and not concern ourselves with
| things like popularity, prestige, and armchair experts'
| opinions. The latter seems antiscientific to me. We need
| to recognize that the system is noisy and Goodhart shows
| us we aren't optimizing merit.
|
| [0] an example is that I had a strong reject with 2 lines
| of text. One stating that it wasn't novel (no further
| justification) and the other noting a broken citation
| link to the appendix. No comments about actual content.
|
| [1] As another example, I've had reviewers all complain
| because I didn't compare one class of model to another
| and wasn't beating their performance. I beat the
| performance of my peers, but different models do
| different things. Image quality is only one metric. You
| wouldn't compare PixelCNN to StyleGAN.
| xg15 wrote:
| > _Maybe we just need to let researchers evaluate works
| based on their merits and not concern ourselves with
| things like popularity, prestige, and armchair experts'
| opinions._
|
| Ok, but how would the researchers communicate their
| evaluation to non-experts? (Or other experts who didn't
| have the time to validate the paper)
|
| Isn't that exactly what a review is?
|
| My impression is the armchair experts are more likely to
| be found on HN.
| Grimblewald wrote:
| >This is a rather self-aggrandizing view, and I think it
| speaks to the level of ego that underpins a lot of the
| discussion on here.
|
| I'm not so sure about that. I've read a lot of things
| that should have never left peer review or editing
| stages, while some of the most important papers for my
| field never left preprint.
|
| Overall I think the most important step of peer review
| is you, the reader in the field. Peer review should
| catch the worst offenders, saving us all some time,
| but it should never be viewed as a seal of approval.
| Everything you read should be critically evaluated as if
| it were a preprint anyway.
| pyth0 wrote:
| I realize some people have taken my comment to be
| speaking on the efficacy of the peer review process but
| that was not my intent. I have no experience reading or
| reviewing papers, or with the journal publication
| process. My point was more to the fact that HN is a
| public forum in which anyone can participate and so
| elevating it above (what I hope are) subject matter
| experts seemed rather arrogant. To be fair, the OP has
| since expanded with a more complete comment and it seems
| to be a similar sentiment to the things you and a couple
| others have shared.
| freeone3000 wrote:
| Having been on a paper review board, the selection
| process is essentially credentialism for credentialism's
| sake. Anyone who's done a paper or two is deemed to be
| qualified, and as it's unpaid, uncredited bonus work on
| top of your day job, the slots aren't competed for very
| hard.
|
| I would say the primary difference between a conference
| peer review board and HN is that the author is obliged to
| respond to the reviewers on the board. I would not say
| there's any particular difference in qualifications.
| xg15 wrote:
| > _Anyone who's done a paper or two_
|
| That already narrows it down greatly compared to the
| general public you find on the internet.
| JoshuaDavid wrote:
| Do you think it's _factually incorrect_ that the HN
| comment section is more likely to find problems which
| invalidate the conclusions of the paper than the journal-
| driven peer review process?
| mrbungie wrote:
| I think that it depends on what journal we are talking
| about. Most of them have some biases in their processes,
| just as HN commenters also do.
| anonymousDan wrote:
| Yes?
| JoshuaDavid wrote:
| On reflection, I probably agree that the answer is "yes"
| to the question as I phrased it. I think that if you take
| a random paper, the peer reviewers probably _do_ have
| much more useful feedback than HN would.
|
| However, if you limit the question to "papers which make
| bold conclusions of the type that generates lots of
| discussion on HN", I think HN will be more likely to find
| methodological flaws in those papers than the peer review
| process would. I think that's mostly because papers are
| optimized pretty hard not to have any problems which
| would cause them to be rejected by the peer review
| process, but _not_ optimized very hard to not have other
| problems.
|
| Which means, on average, I expect the HN comment section
| to have more interesting feedback about a paper, _given
| that it's the sort of paper that gets lots of HN
| discussion_, and also _given that the author put a lot of
| effort into anticipating and avoiding the concerns that
| would come up in the peer review process_.
|
| Which, to a reader of HN, looks like "a lot of peer-
| reviewed papers have obvious flaws that are pointed out
| by the HN commentariat".
|
| I do think, on the object level, a pre-print which the
| author intends to publish in a reputable journal will be
| improved _more_ by fixing any problems pointed out by HN
| commenters than by fixing any problems pointed out by
| peer reviewers, and as such I think "post the pre-print
| on HN and collect feedback before peer review" is still a
| good step if the goal is to publish the best paper
| possible.
| pyth0 wrote:
| This is a considerably more thoughtful comment and I
| appreciate your reflection. I also can see how my initial
| response was a little broad and over-generalizing. I do
| think there is an interesting conversation in there about
| whether a group of technically minded people outside the
| "in group" of the peer reviewer circle (of whatever paper
| in question) could offer different and potentially
| important feedback.
|
| Although I should add I have no background in academia
| and don't feel prepared to have that discussion.
| cs702 wrote:
| After a very quick read, that's my understanding too: It's just
| KNN search with some bells and whistles. So I agree on points
| 1-3.
|
| When something works well, I don't care much about point 4.
|
| Personally, I've had only mixed success with KNN search on long
| sequences. Maybe I haven't done it right? I don't know. In my
| experience, nothing seems to work quite as well as explicit
| token-token interactions by some form of attention, which as we
| all know is too costly for long sequences (O(n^2)). Lately I've
| been playing with https://github.com/hazyresearch/safari ,
| which uses a lot less compute and seems promising, though it
| reminds me of things like FNet. Otherwise, for long sequences
| I've yet to find something better than
| https://github.com/HazyResearch/flash-attention for nxn
| interactions and https://github.com/glassroom/heinsen_routing
| for nxm interactions. If anyone has other suggestions, I'd love
| to hear about them.
| ztratar wrote:
| Given that model performance thus depends on a k-nearest-neighbor
| search, and those algorithms are proving not great for baseline
| vector search, how well will this actually work?
|
| It seems mostly like a vertically integrated vector DB + existing
| LLM call, but correct me if I'm wrong. There are of course some
| performance gains with that, but the holy grail of
| "understanding" at unlimited length still seems unsolved.
| mrbungie wrote:
| Isn't the performance (as in the capacity of retrieval, not
| performance as compute/memory usage) of kNN mostly given by the
| quality of the vectors/embeddings themselves?
|
| Most vector DBs use (at least) some kind of KNN anyways.
| smusamashah wrote:
| What does it mean for ChatGPT and the like? Can they employ this
| method to virtually get rid of the context token limit?
| Kranar wrote:
| Yes, it looks like they can use this method. It is a
| preprocessor and post-processor that can be used on an existing
| GPT model to augment it to handle unlimited tokens.
| ftxbro wrote:
| Other times this was put on hacker news:
|
| https://news.ycombinator.com/item?id=35823039
|
| https://news.ycombinator.com/item?id=35803470
| swores wrote:
| While I appreciate your intent and effort, I don't think it's
| actually useful to link to other submissions unless either they
| have comments (ideally at least one interesting comment, but at
| minimum more than no comments at all), or they are submissions
| of the same subject to a different source link - in which case
| it's probably more useful to just link the alternative source,
| if it's worth reading, rather than potentially split the
| discussion into separate comment threads if the other is empty.
|
| Linking to a different submission of the same link with 0
| comments doesn't add anything.
| ftxbro wrote:
| I must have submitted it at the wrong time of day.
| swores wrote:
| Sure, or just random luck - maybe this submission just
| happened to take place when the few people who care about
| this subject happened to come online, or vice versa for bad
| luck before, etc.
|
| But unlike sites like Reddit, with the exception of self /
| ask HN / etc posts, nobody really pays attention to who the
| submitter is, so enjoy the conversation finally breaking
| out on it as consolation for not getting karma points, but
| skip linking to dead submissions :)
|
| FYI, if you ever submit something that fails to get any
| traction / upvotes, I've seen mods say before (@dang will
| hopefully correct me if I'm wrong) that a) it's OK to try
| submitting a second time, maybe after a day or so (but not
| to keep submitting over and over), or b) you can send the
| mods an email with a brief reason why it's a link that
| should interest HN readers, for it to be potentially added
| to a "second chance pool". Though in the case of this link,
| between the three of you it was posted two days ago, one
| day ago, and today, which has finally got a bit more
| notice, so it worked out alright in the end :)
___________________________________________________________________
(page generated 2023-05-05 23:00 UTC)