[HN Gopher] Artificial Needles to Real Haystacks: Improving Retr...
___________________________________________________________________
Artificial Needles to Real Haystacks: Improving Retrieval
Capabilities in LLMs
Author : veryluckyxyz
Score : 59 points
Date : 2024-06-29 04:55 UTC (18 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| anotherpaulg wrote:
| This is really interesting. They fine-tune on instances of this
| sort of task:
|
|     Do a task using the list of dictionaries below.
|     Dictionary [1] {122: 765, 4548: 1475, 4818: 4782}
|     Dictionary [2] {526: 290, 9205: 9318, 9278: 1565}
|     ...
|     Dictionary [32] {2931: 8364, 196: 1464, 812: 5363}
|     ...
|     Dictionary [85] {344: 1579, 116: 617, 330: 411}
|
|     Above is a list of dictionaries such that each key and
|     value is an integer. Report the value of key 2931 and the
|     dictionary it is in.
|
|     Desired answer: The value of key 2931 is 8364 and it is in
|     Dictionary [32].
|
| This task doesn't teach any new facts, but seems to encourage
| better ability to random-access data from a large context.
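Instances of this sort of task are cheap to generate synthetically. A minimal sketch of a generator (the function name, key ranges, and sizes here are illustrative assumptions, not the paper's exact settings):

```python
import random

def make_kv_task(n_dicts=85, keys_per_dict=3, max_int=10_000, seed=None):
    """Generate one synthetic key-value retrieval example: a list of
    small int->int dictionaries plus a question about one key."""
    rng = random.Random(seed)
    # Draw globally unique keys so the answer is unambiguous.
    keys = rng.sample(range(max_int), n_dicts * keys_per_dict)
    dicts = [
        {keys[i * keys_per_dict + j]: rng.randrange(max_int)
         for j in range(keys_per_dict)}
        for i in range(n_dicts)
    ]
    # Pick a "needle" key from a random dictionary in the context.
    target_idx = rng.randrange(n_dicts)
    target_key = rng.choice(list(dicts[target_idx]))
    prompt = "Do a task using the list of dictionaries below.\n"
    prompt += "\n".join(f"Dictionary [{i + 1}] {d}"
                        for i, d in enumerate(dicts))
    prompt += (f"\nAbove is a list of dictionaries such that each key "
               f"and value is an integer. Report the value of key "
               f"{target_key} and the dictionary it is in.")
    answer = (f"The value of key {target_key} is "
              f"{dicts[target_idx][target_key]} and it is in "
              f"Dictionary [{target_idx + 1}].")
    return prompt, answer
```

Scaling `n_dicts` up stretches the haystack; the needle's position is uniform over the context, which is what pushes the model toward random access rather than recency.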
| surfingdino wrote:
| cat list_of_dicts.txt | grep '2931:'
| optimalsolver wrote:
| This is why you need symbolic reasoning. The above is vastly
| more efficient than whatever these LLMs are doing.
| surfingdino wrote:
| I had a chat with a dev friend of mine. He's building an
| app and will have users enter venue booking info. He then
| plans to use ... AI to check venue availability and sort
| bookings, something a couple of simple SQL queries and
| views would easily take care of. Where did we go wrong? How
| did the LLM monkey gang manage to embed their shitty ideas
| into the minds of otherwise intelligent people?
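For reference, the kind of query being replaced: a minimal sqlite3 sketch with an assumed `bookings` schema (table and column names are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bookings (venue TEXT, start_date TEXT, end_date TEXT);
    INSERT INTO bookings VALUES
        ('Main Hall', '2024-07-01', '2024-07-03'),
        ('Main Hall', '2024-07-10', '2024-07-12');
""")

def is_available(venue, start, end):
    # A venue is free if no existing booking overlaps [start, end).
    row = conn.execute(
        """SELECT COUNT(*) FROM bookings
           WHERE venue = ? AND start_date < ? AND end_date > ?""",
        (venue, end, start),
    ).fetchone()
    return row[0] == 0
```

One query, deterministic, and the overlap condition is auditable, which is the point being made above.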
| vessenes wrote:
| Interesting. You could add counting and other jobs as well, e.g.
| "how many times does value 8364 appear, and in which
| dictionaries?"
|
| A lot of the needle-trained models clearly can't _reason_ about
| stuff anywhere in a long context even if they can retrieve it. I
| would like to see some extensions that require some reasoning.
|
| I guess you could use words as dict vals and ask it what
| sentence a set of keys spells, and to answer the sentence with
| a set of keys. Lots of interesting possibilities.
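The counting extension suggested above is an easy variation on the same synthetic setup. A hypothetical sketch (function name and parameters are assumptions; a small value range forces repeats so counting is non-trivial):

```python
import random

def make_count_task(n_dicts=50, keys_per_dict=3, max_int=100, seed=None):
    """Generate a counting variant: ask how many times a value appears
    and in which dictionaries."""
    rng = random.Random(seed)
    # Key collisions within a dict are possible but harmless here,
    # since the question is about values, not keys.
    dicts = [
        {rng.randrange(10_000): rng.randrange(max_int)
         for _ in range(keys_per_dict)}
        for _ in range(n_dicts)
    ]
    # Pick a target value that is guaranteed to occur at least once.
    target_val = rng.choice([v for d in dicts for v in d.values()])
    hits = [i + 1 for i, d in enumerate(dicts) if target_val in d.values()]
    count = sum(list(d.values()).count(target_val) for d in dicts)
    prompt = "\n".join(f"Dictionary [{i + 1}] {d}"
                       for i, d in enumerate(dicts))
    prompt += (f"\nHow many times does value {target_val} appear, "
               f"and in which dictionaries?")
    answer = (f"Value {target_val} appears {count} time(s), "
              f"in dictionaries {hits}.")
    return prompt, answer
```

Unlike single-key lookup, the correct answer now depends on aggregating over the whole context, which is a (small) step from retrieval toward reasoning.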
| viksit wrote:
| anyone have pointers on progress in symbolic reasoning vs context
| forcing approaches in LLMs?
| dvt wrote:
| I've seen a lot of papers recently tackle the needle-in-a-
| haystack problem wrt LLMs, and I think this approach (and more
| generally, any in-context solution) is a mistake.
|
| Imo the best way to handle this is RAG + multi-shot prompting (+
| symbolic mapping to an actual data structure). For example, a
| pre-processing step where you partition the context by "records,"
| another step where you insert the records (potentially splitting
| them up) into a RAG database, and another step where you make fuzzy
| queries. So, if you ask for record 1234 you get an exact match on
| that line (or set of lines, or record, or whatever) of the
| original context. And if you ask for "elephant" but there's no
| "elephant" in the context, you might get the "hippo" record
| because of the RAG reranking.
|
| This is a lot of work, and is essentially a data pipeline, but
| the results are much better-curated than just fine-tuning and
| hoping that generalized needle-in-a-haystack search will work
| reliably as part of a language model.
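A toy version of the exact-match-first, fuzzy-fallback lookup described above, using only the standard library (`difflib` stands in here for a real embedding-based reranker; record-per-line partitioning is a simplifying assumption):

```python
import difflib

def build_index(context):
    """Partition the raw context into records, one per non-empty line."""
    return [line.strip() for line in context.splitlines() if line.strip()]

def lookup(index, query, n=3):
    # Fast path: exact substring match on a record, e.g. a record ID.
    exact = [r for r in index if query in r]
    if exact:
        return exact
    # Fallback: fuzzy ranking over all records, standing in for
    # embedding similarity + reranking in a real RAG database.
    return difflib.get_close_matches(query, index, n=n, cutoff=0.0)

# Usage: asking for "1234" hits that record exactly; a term absent
# from the context still returns the nearest records.
index = build_index("record 1234: elephant enclosure\n"
                    "record 5678: hippo pool")
exact_hit = lookup(index, "1234")
fuzzy_hit = lookup(index, "rhino")
```

The two-stage shape is the point: deterministic retrieval when the query is precise, graceful degradation when it isn't, and every step is inspectable.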
| pilooch wrote:
| No. Because the model should and will replace any piece of
| code. This is what happened already for other tasks, in
| computer vision, text (entity recognition, etc...), audio, etc.
|
| RAG will go away, decision multimodal models / LLMs will take
| over. Not here yet, but inevitable, I believe.
| vatsadev wrote:
| not necessarily.
|
| like I would rather use code to find AprilTags than a ViT or
| other model
| snovv_crash wrote:
| End to end networks can sometimes have higher performance,
| but the failure mechanisms aren't explainable, and are
| unintuitive to humans.
|
| If you're building something that needs to be easy to work
| with, and that humans can understand the limitations of,
| splitting the network up into stages and having human-
| interpretable intermediate values is a good architecture
| choice.
___________________________________________________________________
(page generated 2024-06-29 23:01 UTC)