[HN Gopher] Artificial Needles to Real Haystacks: Improving Retr...
       ___________________________________________________________________
        
       Artificial Needles to Real Haystacks: Improving Retrieval
       Capabilities in LLMs
        
       Author : veryluckyxyz
       Score  : 59 points
       Date   : 2024-06-29 04:55 UTC (18 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | anotherpaulg wrote:
        | This is really interesting. They fine-tune on instances of this
        | sort of task:
        | 
        |   Do a task using the list of dictionaries below.
        |   Dictionary [1] {122: 765, 4548: 1475, 4818: 4782}
        |   Dictionary [2] {526: 290, 9205: 9318, 9278: 1565}
        |   ...
        |   Dictionary [32] {2931: 8364, 196: 1464, 812: 5363}
        |   ...
        |   Dictionary [85] {344: 1579, 116: 617, 330: 411}
        |   Above is a list of dictionaries such that each key and value is
        |   an integer. Report the value of key 2931 and the dictionary it
        |   is in.
        | 
        |   Desired answer: The value of key 2931 is 8364 and it is in
        |   Dictionary [32].
       | 
       | This task doesn't teach any new facts, but seems to encourage
       | better ability to random-access data from a large context.
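        | 
        | A toy generator for this kind of example might look roughly like
        | the sketch below (plain Python; the function name and defaults
        | are just illustrative, not the paper's actual data pipeline):
        | 
        |   import random
        | 
        |   def make_example(n_dicts=85, per_dict=3, hi=9999):
        |       # Distinct keys, so the needle sits in exactly one dict.
        |       keys = random.sample(range(hi), n_dicts * per_dict)
        |       dicts = [
        |           {k: random.randint(0, hi)
        |            for k in keys[i * per_dict:(i + 1) * per_dict]}
        |           for i in range(n_dicts)
        |       ]
        |       # Pick a random key as the needle to ask about.
        |       idx = random.randrange(n_dicts)
        |       key, val = random.choice(list(dicts[idx].items()))
        |       lines = ["Do a task using the list of"
        |                " dictionaries below."]
        |       lines += [f"Dictionary [{i + 1}] {d}"
        |                 for i, d in enumerate(dicts)]
        |       lines.append("Above is a list of dictionaries such"
        |                    " that each key and value is an integer.")
        |       lines.append(f"Report the value of key {key} and"
        |                    " the dictionary it is in.")
        |       answer = (f"The value of key {key} is {val} and it"
        |                 f" is in Dictionary [{idx + 1}].")
        |       return "\n".join(lines), answer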
        
         | surfingdino wrote:
         | cat list_of_dicts.txt | grep '2931:'
        
           | optimalsolver wrote:
           | This is why you need symbolic reasoning. The above is vastly
           | more efficient than whatever these LLMs are doing.
        
             | surfingdino wrote:
             | I had a chat with a dev friend of mine. He's building an
             | app and will have users enter venue booking info. He then
             | plans to use ... AI to check venue availability and sort
             | bookings, something a couple of simple SQL queries and
             | views would easily take care of. Where did we go wrong? How
             | did the LLM monkey gang manage to embed their shitty ideas
             | into the minds of otherwise intelligent people?
        
         | vessenes wrote:
         | Interesting. You could add counting and other jobs as well, eg
         | "how many times does value 8364 appear, and in which
         | dictionaries?"
         | 
          | A lot of the needle-trained models clearly can't _reason_ with
         | stuff everywhere in a long context even if they can retrieve. I
         | would like to see some extensions that require some reasoning.
         | 
         | I guess you could use words as dict vals and ask it what
         | sentence a set of keys spells, and to answer the sentence with
         | a set of keys. Lots of interesting possibilities.
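          | 
          | As a sketch (just an illustration, not from the paper), a
          | counting variant could be generated like this, with the values
          | deliberately left non-unique so a single lookup isn't enough:
          | 
          |   import random
          | 
          |   def counting_example(n_dicts=85, per_dict=3, hi=9999):
          |       # Values can repeat, so answering means scanning the
          |       # whole context, not retrieving one line.
          |       dicts = [
          |           {random.randint(0, hi): random.randint(0, hi)
          |            for _ in range(per_dict)}
          |           for _ in range(n_dicts)
          |       ]
          |       target = random.choice(
          |           [v for d in dicts for v in d.values()])
          |       count = sum(
          |           list(d.values()).count(target) for d in dicts)
          |       where = [i + 1 for i, d in enumerate(dicts)
          |                if target in d.values()]
          |       question = (f"How many times does value {target}"
          |                   " appear, and in which dictionaries?")
          |       hits = ", ".join(f"Dictionary [{i}]" for i in where)
          |       answer = f"It appears {count} time(s), in {hits}."
          |       return dicts, question, answer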
        
       | viksit wrote:
        | Anyone have pointers on progress in symbolic reasoning vs.
        | context-forcing approaches in LLMs?
        
       | dvt wrote:
       | I've seen a lot of papers recently tackle the needle-in-a-
       | haystack problem wrt LLMs, and I think this approach (and more
       | generally, any in-context solution) is a mistake.
       | 
       | Imo the best way to handle this is RAG + multi-shot prompting (+
       | symbolic mapping to an actual data structure). For example, a
       | pre-processing step where you partition the context by "records,"
        | another step where you insert the records (split up if needed)
        | into a RAG database, and another step where you make fuzzy
       | queries. So, if you ask for record 1234 you get an exact match on
       | that line (or set of lines, or record, or whatever) of the
       | original context. And if you ask for "elephant" but there's no
       | "elephant" in the context, you might get the "hippo" record
       | because of the RAG reranking.
       | 
       | This is a lot of work, and is essentially a data pipeline, but
       | the results are much better-curated than just fine-tuning and
       | hoping that generalized needle-in-a-haystack search will work
       | reliably as part of a language model.
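        | 
        | As a toy sketch of what I mean (difflib here just stands in for
        | a real embedding store + reranker, which is what you'd need for
        | the "elephant" -> "hippo" case):
        | 
        |   import difflib
        |   import re
        | 
        |   def build_index(context):
        |       # One record per line here; a real pipeline would
        |       # split on whatever record boundary the data has.
        |       records = [ln for ln in context.splitlines()
        |                  if ln.strip()]
        |       by_id = {}
        |       for rec in records:
        |           m = re.match(r"\s*(\w+)", rec)
        |           if m:
        |               by_id[m.group(1)] = rec
        |       return records, by_id
        | 
        |   def lookup(records, by_id, q):
        |       # Exact match on a record id first, fuzzy second.
        |       if q in by_id:
        |           return by_id[q]
        |       close = difflib.get_close_matches(
        |           q, records, n=1, cutoff=0.0)
        |       return close[0] if close else None
        | 
        |   records, by_id = build_index(
        |       "1234 hippo pool, 2pm\n5678 aviary tour, 4pm")
        |   print(lookup(records, by_id, "1234"))    # exact hit
        |   print(lookup(records, by_id, "aviary"))  # fuzzy hit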
        
         | pilooch wrote:
         | No. Because the model should and will replace any piece of
          | code. This has already happened for other tasks: computer
          | vision, text (entity recognition, etc.), audio, and so on.
         | 
         | RAG will go away, decision multimodal models / LLMs will take
          | over. Not here yet, but inevitable, I believe.
        
           | vatsadev wrote:
            | Not necessarily.
           | 
            | Like, I would rather use code to find AprilTags than a ViT
            | or other model.
        
           | snovv_crash wrote:
            | End-to-end networks can sometimes have higher performance,
           | but the failure mechanisms aren't explainable, and are
           | unintuitive to humans.
           | 
           | If you're building something that needs to be easy to work
           | with, and that humans can understand the limitations of,
           | splitting the network up into stages and having human-
           | interpretable intermediate values is a good architecture
           | choice.
        
       ___________________________________________________________________
       (page generated 2024-06-29 23:01 UTC)