[HN Gopher] Training LLMs to generate text with citations via fi...
       ___________________________________________________________________
        
       Training LLMs to generate text with citations via fine-grained
       rewards
        
       Author : PaulHoule
       Score  : 120 points
       Date   : 2024-02-16 16:42 UTC (6 hours ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | dmezzetti wrote:
       | Very interesting approach.
       | 
       | For those interested in an alternate method that doesn't depend
       | on a LLM, check out this article:
       | https://neuml.hashnode.dev/build-rag-pipelines-with-txtai
       | 
       | Disclaimer: I'm the primary author of txtai.
        
         | leobg wrote:
         | Are you affiliated with txtai?
        
           | dmezzetti wrote:
           | Yes, updated with a disclaimer.
        
           | FinnKuhn wrote:
           | profile description says creator of txtai...
        
         | jerpint wrote:
         | Looks like it's just RAG? The paper is proposing an alternative
         | to RAG due to its shortcomings
        
           | dmezzetti wrote:
           | It's RAG and a method that identifies citations from the
           | context.
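            | 
            | As a rough sketch of that citation step (purely
            | illustrative, using sentence-transformers for the
            | similarity; this is not txtai's actual code), each
            | generated sentence can be matched back to the context
            | passage it most resembles:
            | 
            |     from sentence_transformers import SentenceTransformer
            |     from sentence_transformers import util
            | 
            |     enc = SentenceTransformer("all-MiniLM-L6-v2")
            | 
            |     context = [
            |         "The Eiffel Tower was completed in 1889.",
            |         "It was designed by Gustave Eiffel's company.",
            |     ]
            |     answer = [
            |         "The tower opened in 1889.",
            |         "Gustave Eiffel's firm designed it.",
            |     ]
            | 
            |     sims = util.cos_sim(enc.encode(answer),
            |                         enc.encode(context))
            | 
            |     # Cite the closest context passage per sentence
            |     for sent, row in zip(answer, sims):
            |         print(f"{sent} [{int(row.argmax())}]")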
        
         | bugglebeetle wrote:
         | The shortcoming of most RAG-based approaches is the assumption
         | that the question resembles the answer in a way that also jives
         | with the embeddings model. Thus far, I've not seen strong
         | evidence (or in my testing) that this is true or works well,
         | but at least citation allows for better assessment. The problem
         | seems to be that we don't have a good feedback loop for ranking
         | RAG retrieval, as we have for LLMs with things like DPO.
        
           | whakim wrote:
           | 100%. This is why RAG and "classical search" will converge
           | for non-trivial use cases. The folks who are doing RAG well
           | still rely on many tried-and-true tricks of the trade:
           | combining semantic search with keyword-based search, using
           | graphs, doing re-ranking, etc. etc. Yet most discussions of
           | RAG on the internet seem to promise consistently awesome
           | query output by just jamming together some embeddings and an
           | LLM, which doesn't pan out in practice.
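            | 
            | To make that concrete, here's a minimal sketch of the
            | hybrid pattern (BM25 keyword scores fused with embedding
            | similarity via reciprocal rank fusion; the libraries and
            | fusion constant are just illustrative choices):
            | 
            |     from rank_bm25 import BM25Okapi
            |     from sentence_transformers import SentenceTransformer
            |     from sentence_transformers import util
            | 
            |     docs = ["first document ...",
            |             "second document ...",
            |             "third document ..."]
            |     query = "example question"
            | 
            |     # Keyword side: BM25 over whitespace tokens
            |     bm25 = BM25Okapi([d.lower().split() for d in docs])
            |     kw = bm25.get_scores(query.lower().split())
            | 
            |     # Semantic side: embedding cosine similarity
            |     enc = SentenceTransformer("all-MiniLM-L6-v2")
            |     sem = util.cos_sim(enc.encode([query]),
            |                        enc.encode(docs))[0]
            | 
            |     def ranks(scores):
            |         order = sorted(range(len(docs)),
            |                        key=lambda i: -float(scores[i]))
            |         return {d: r for r, d in enumerate(order)}
            | 
            |     kw_r, sem_r = ranks(kw), ranks(sem)
            | 
            |     # Reciprocal rank fusion: favor docs ranked high by
            |     # either retriever, then re-rank/inspect the top few
            |     fused = sorted(range(len(docs)),
            |                    key=lambda i: -(1 / (60 + kw_r[i]) +
            |                                    1 / (60 + sem_r[i])))
            |     print(fused)  # doc indices, best first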
        
         | dang wrote:
         | Can you please stop posting so promotionally on HN? If you read
         | https://news.ycombinator.com/newsguidelines.html, you'll see
         | it's against the site guidelines.
        
           | dmezzetti wrote:
           | You got it and I appreciate you asking kindly.
           | 
           | I will say though that hopefully you'll consider applying
           | that policy equally to all. Because many VC-backed and large
           | companies basically post press releases and they trend
           | without issue.
           | 
           | I'm a single person open-source project. But it's your site,
           | I'll respect your request and not post moving forward.
        
           | mountainriver wrote:
           | I found what he posted relevant and useful, not sure what the
           | issue is
        
       | skeptrune wrote:
       | This is soooo much more exciting than the "put 100M tokens in the
       | context window" idea
        
         | amelius wrote:
         | ELIZA was so much more exciting than the "put 100B artificial
         | neurons on a GPU" idea.
         | 
         | ;)
        
         | bugglebeetle wrote:
          | I would say large context sizes combined with the ability to
          | reliably cite said context would be the best possible outcome.
        
       | th0ma5 wrote:
       | It is my understanding there is no way to objectively or
       | programmatically tell if any of this stuff is correct or doesn't
       | obfuscate some dire error. These tricks don't give me confidence
       | that we're headed in that direction either.
        
         | nerdponx wrote:
         | Any LLM/GPT system is fundamentally stochastic, so yes, kind
         | of? But it's hardly a "trick" to come up with a way to make
         | them work the way we want them to work.
         | 
          | An LLM is based on the idea that there is some very
          | complicated function
          | 
          |       y(t) = f(y(t-1), y(t-2), ..., y(t-K))
          | 
          | for any sequence of text tokens y(1), y(2), ... and some huge
          | context size K.
         | 
         | We fit the model by attempting to find an approximation of f()
         | with a low badness score (called "loss"). We also don't usually
         | want the absolute lowest badness score, because there is always
          | some tradeoff between minimizing loss on the specific
         | test data that we happen to have, and preserving the ability to
         | generalize to content outside of the test data.
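          | 
          | A toy version of that setup, just to ground the notation (a
          | tiny PyTorch model that predicts the next token from the
          | previous K tokens and is fit by minimizing cross-entropy;
          | nothing here resembles a real LLM's architecture):
          | 
          |     import torch
          |     import torch.nn as nn
          | 
          |     K, vocab, dim = 4, 100, 32
          | 
          |     class TinyLM(nn.Module):
          |         # Approximates y(t) = f(y(t-1), ..., y(t-K))
          |         def __init__(self):
          |             super().__init__()
          |             self.emb = nn.Embedding(vocab, dim)
          |             self.out = nn.Linear(K * dim, vocab)
          | 
          |         def forward(self, window):  # window: (batch, K)
          |             h = self.emb(window).flatten(1)
          |             return self.out(h)      # next-token logits
          | 
          |     model = TinyLM()
          |     opt = torch.optim.Adam(model.parameters(), lr=1e-3)
          |     loss_fn = nn.CrossEntropyLoss()  # the "badness score"
          | 
          |     # Stand-in data: K-token windows plus the next token
          |     windows = torch.randint(0, vocab, (256, K))
          |     targets = torch.randint(0, vocab, (256,))
          | 
          |     for step in range(100):
          |         opt.zero_grad()
          |         loss = loss_fn(model(windows), targets)
          |         loss.backward()  # gradient descent on the loss
          |         opt.step()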
         | 
         | The technique here is an improvement to that process of finding
         | a good approximation of f(). The specific technique here is to
         | adjust the loss to mean something more specific than "error
         | rate on predicting the next word in a sequence."
         | 
         | The entire basis of the principle is that the loss function
         | controls what the model learns. If we want it to produce more
         | sentences that look a certain way, we introduce a reward for
         | generating sentences that look that way. If we want to avoid
         | certain kinds of outputs, we introduce a penalty for generating
         | those kinds of outputs. From there, it's just gradient descent.
         | Better outputs produce lower losses, so the model starts
         | producing better outputs, without us having to really know or
         | care about what the model is doing internally.
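          | 
          | Continuing the toy model above, a simplified sketch of that
          | reward-shaping loop (REINFORCE-flavoured, with a made-up
          | programmatic reward; the paper's fine-grained rewards are
          | far more involved than this):
          | 
          |     def sample_continuation(window, n=8):
          |         # Autoregressively sample n tokens from the model
          |         toks, logps = [], []
          |         w = window.clone()
          |         for _ in range(n):
          |             dist = torch.distributions.Categorical(
          |                 logits=model(w))
          |             t = dist.sample()
          |             logps.append(dist.log_prob(t))
          |             toks.append(int(t))
          |             w = torch.cat([w[:, 1:], t.view(1, 1)], dim=1)
          |         return toks, torch.stack(logps).sum()
          | 
          |     def reward(toks):
          |         # Made-up check: did the sample emit token 7, our
          |         # pretend "citation" token?
          |         return 1.0 if 7 in toks else -1.0
          | 
          |     for step in range(200):
          |         window = torch.randint(0, vocab, (1, K))
          |         toks, logp = sample_continuation(window)
          |         # Raise log-prob of rewarded samples, lower the rest
          |         loss = -reward(toks) * logp
          |         opt.zero_grad()
          |         loss.backward()
          |         opt.step()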
         | 
          | The technique of RLHF is similar along those lines. We
          | discourage the model from hallucinating by having a human
          | review its output and report when it hallucinates, so we can
          | impose a penalty for hallucination, and thereby shift the
          | model output in the direction of not-hallucinating after many
          | such rounds of "learning".
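          | 
          | In practice that human signal is usually distilled into a
          | small reward model first. A minimal sketch, with entirely
          | hypothetical labels and a plain classifier standing in for
          | a real reward model (it reuses vocab/K/dim from above):
          | 
          |     # 1 = grounded, 0 = hallucinated (human judgements)
          |     outputs = torch.randint(0, vocab, (64, K))
          |     labels = torch.randint(0, 2, (64,)).float()
          | 
          |     rm = nn.Sequential(nn.Embedding(vocab, dim),
          |                        nn.Flatten(),
          |                        nn.Linear(K * dim, 1))
          |     rm_opt = torch.optim.Adam(rm.parameters(), lr=1e-3)
          |     bce = nn.BCEWithLogitsLoss()
          | 
          |     for step in range(200):
          |         rm_opt.zero_grad()
          |         loss = bce(rm(outputs).squeeze(1), labels)
          |         loss.backward()
          |         rm_opt.step()
          | 
          |     # rm(x).sigmoid() ~ P(grounded), usable as the reward
          |     # (or penalty) signal when fine-tuning the generator.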
         | 
         | Is it a fundamental shift in how LLMs work? No. Could it
         | possibly lead to improvements? Yes.
        
       | serjester wrote:
        | This would be fantastic if it starts working on smaller models.
        | We've had solid results appending the following to our GPT-4
        | system prompt.
       | 
       | "When possible cite your sources. Use the following custom html
       | format <citation>{document_id}+{node_id}</citation>. Very
       | important!"
        
         | 3abiton wrote:
          | The field is racing at the speed of sound. GPT-3.5 was
          | yesterday.
        
       ___________________________________________________________________
       (page generated 2024-02-16 23:00 UTC)