[HN Gopher] Training LLMs to generate text with citations via fi...
___________________________________________________________________
Training LLMs to generate text with citations via fine-grained
rewards
Author : PaulHoule
Score : 120 points
Date : 2024-02-16 16:42 UTC (6 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| dmezzetti wrote:
| Very interesting approach.
|
 | For those interested in an alternate method that doesn't depend
 | on an LLM, check out this article:
| https://neuml.hashnode.dev/build-rag-pipelines-with-txtai
|
| Disclaimer: I'm the primary author of txtai.
| leobg wrote:
| Are you affiliated with txtai?
| dmezzetti wrote:
| Yes, updated with a disclaimer.
| FinnKuhn wrote:
| profile description says creator of txtai...
| jerpint wrote:
| Looks like it's just RAG? The paper is proposing an alternative
| to RAG due to its shortcomings
| dmezzetti wrote:
| It's RAG and a method that identifies citations from the
| context.
| bugglebeetle wrote:
 | The shortcoming of most RAG-based approaches is the assumption
 | that the question resembles the answer in a way that also jibes
 | with the embeddings model. Thus far, I've not seen strong
 | evidence (published or in my own testing) that this assumption
 | holds or works well, but at least citation allows for better
 | assessment. The problem seems to be that we don't have a good
 | feedback loop for ranking RAG retrieval, as we have for LLMs
 | with things like DPO.
| whakim wrote:
| 100%. This is why RAG and "classical search" will converge
| for non-trivial use cases. The folks who are doing RAG well
| still rely on many tried-and-true tricks of the trade:
| combining semantic search with keyword-based search, using
| graphs, doing re-ranking, etc. etc. Yet most discussions of
| RAG on the internet seem to promise consistently awesome
| query output by just jamming together some embeddings and an
| LLM, which doesn't pan out in practice.
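 |
 | A minimal sketch of the blending idea in Python (the two
 | scoring callables are stand-ins for your actual BM25 engine
 | and embeddings model, both assumed to return scores in
 | [0, 1]):
 |
 |     def hybrid_search(query, docs, keyword_score,
 |                       embedding_score, alpha=0.5, k=10):
 |         # Weighted blend of keyword and semantic relevance.
 |         scored = [
 |             (alpha * keyword_score(query, d)
 |              + (1 - alpha) * embedding_score(query, d), d)
 |             for d in docs
 |         ]
 |         # Best-first; a cross-encoder re-ranker could then
 |         # re-order this shortlist as a second stage.
 |         scored.sort(key=lambda s_d: s_d[0], reverse=True)
 |         return scored[:k]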
| dang wrote:
| Can you please stop posting so promotionally on HN? If you read
| https://news.ycombinator.com/newsguidelines.html, you'll see
| it's against the site guidelines.
| dmezzetti wrote:
| You got it and I appreciate you asking kindly.
|
| I will say though that hopefully you'll consider applying
| that policy equally to all. Because many VC-backed and large
| companies basically post press releases and they trend
| without issue.
|
| I'm a single person open-source project. But it's your site,
| I'll respect your request and not post moving forward.
| mountainriver wrote:
| I found what he posted relevant and useful, not sure what the
| issue is
| skeptrune wrote:
| This is soooo much more exciting than the "put 100M tokens in the
| context window" idea
| amelius wrote:
| ELIZA was so much more exciting than the "put 100B artificial
| neurons on a GPU" idea.
|
| ;)
| bugglebeetle wrote:
 | I would say large context sizes combined with the ability to
 | reliably cite said context would be the best possible outcome.
| th0ma5 wrote:
 | It is my understanding that there is no way to objectively or
 | programmatically tell whether any of this stuff is correct, or
 | whether it obfuscates some dire error. These tricks don't give
 | me confidence that we're headed in that direction either.
| nerdponx wrote:
| Any LLM/GPT system is fundamentally stochastic, so yes, kind
| of? But it's hardly a "trick" to come up with a way to make
| them work the way we want them to work.
|
 | An LLM is based on the idea that there is some very complicated
 | function
 |
 |     y(t) = f(y(t-1), y(t-2), ..., y(t-K))
 |
 | for any sequence of text tokens y(1), y(2), ... and some huge
 | context size K.
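 |
 | Concretely, generation just iterates that function. A toy
 | sketch (sample_next_token is a stand-in for the real model):
 |
 |     def generate(prompt_tokens, sample_next_token, K, n_new):
 |         # Each new token is a function of (at most) the
 |         # previous K tokens.
 |         tokens = list(prompt_tokens)
 |         for _ in range(n_new):
 |             tokens.append(sample_next_token(tokens[-K:]))
 |         return tokens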
|
 | We fit the model by attempting to find an approximation of f()
 | with a low badness score (called "loss"). We also don't usually
 | want the absolute lowest badness score, because there is always
 | some tradeoff between minimizing loss on the specific training
 | data that we happen to have and preserving the ability to
 | generalize to content outside of that data.
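 |
 | In code, the usual "badness score" for next-token prediction
 | is cross-entropy. A minimal PyTorch sketch (the model is
 | assumed to map token ids to next-token logits):
 |
 |     import torch.nn.functional as F
 |
 |     def next_token_loss(model, tokens):
 |         # Predict token t+1 from tokens up to t, then compare
 |         # predictions against the actual next tokens.
 |         logits = model(tokens[:, :-1])
 |         targets = tokens[:, 1:]
 |         return F.cross_entropy(
 |             logits.reshape(-1, logits.size(-1)),
 |             targets.reshape(-1))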
|
 | The technique in this paper is an improvement to that process
 | of finding a good approximation of f(). Specifically, it
 | adjusts the loss to mean something more specific than "error
 | rate on predicting the next word in a sequence."
|
| The entire basis of the principle is that the loss function
| controls what the model learns. If we want it to produce more
| sentences that look a certain way, we introduce a reward for
| generating sentences that look that way. If we want to avoid
| certain kinds of outputs, we introduce a penalty for generating
| those kinds of outputs. From there, it's just gradient descent.
| Better outputs produce lower losses, so the model starts
| producing better outputs, without us having to really know or
| care about what the model is doing internally.
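 |
 | A toy sketch of that shaping step (cites_correctly and
 | hallucinates are placeholder scoring functions, not the
 | paper's actual fine-grained rewards):
 |
 |     def shaped_loss(base_loss, output, cites_correctly,
 |                     hallucinates, reward_w=1.0, penalty_w=1.0):
 |         # Lower is better: subtract a reward for outputs we
 |         # want, add a penalty for outputs we don't.
 |         return (base_loss
 |                 - reward_w * cites_correctly(output)
 |                 + penalty_w * hallucinates(output))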
|
 | The technique of RLHF works along similar lines. We discourage
 | the model from hallucinating by having a human review its
 | output and report when the model hallucinates, so we can
 | impose a penalty for hallucination, and thereby shift the model
 | output in the direction of not-hallucinating after many such
 | rounds of "learning".
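 |
 | As a toy sketch, each round of review boils down to turning a
 | human judgment into a number the training loop can use
 | (human_flags_hallucination stands in for the review step):
 |
 |     def feedback_penalty(output, human_flags_hallucination):
 |         # Hallucinations earn a penalty; clean outputs a small
 |         # reward. In practice many such labels train a reward
 |         # model that scores outputs during fine-tuning.
 |         return 1.0 if human_flags_hallucination(output) else -0.1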
|
| Is it a fundamental shift in how LLMs work? No. Could it
| possibly lead to improvements? Yes.
| serjester wrote:
 | This would be fantastic if it starts working on smaller
 | models. We've had solid results appending the following to
 | our GPT-4 system prompt.
|
| "When possible cite your sources. Use the following custom html
| format <citation>{document_id}+{node_id}</citation>. Very
| important!"
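 |
 | For what it's worth, that format is easy to parse back out of
 | the model's output. A minimal sketch in Python:
 |
 |     import re
 |
 |     CITE_RE = re.compile(
 |         r"<citation>([^+<]+)\+([^<]+)</citation>")
 |
 |     def extract_citations(text):
 |         # Returns (document_id, node_id) pairs in order of
 |         # appearance.
 |         return CITE_RE.findall(text)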
| 3abiton wrote:
 | The field is racing along at the speed of sound. GPT-3.5 was
 | yesterday.
___________________________________________________________________
(page generated 2024-02-16 23:00 UTC)