[HN Gopher] Training and aligning LLMs with RLHF and RLHF altern...
___________________________________________________________________
Training and aligning LLMs with RLHF and RLHF alternatives
Author : rasbt
Score : 65 points
Date : 2023-09-10 14:04 UTC (8 hours ago)
(HTM) web link (magazine.sebastianraschka.com)
(TXT) w3m dump (magazine.sebastianraschka.com)
| scoresmoke wrote:
| Discussions about LLM alignment often miss topics of data quality
| and quantity. It turns out that current models like Llama 2 use
| 10K+ prompts and responses for supervised fine-tuning (SFT) and
| 100K+ human preference pairs. While the preferences are pretty
| easy to annotate, producing a good SFT dataset is not easy (a
| rough sketch of both formats is included below).
|
| https://evalovernite.substack.com/p/rlhf-math-aint-enough
|
| https://doi.org/10.5281/zenodo.8186168
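|
| (A rough sketch of what the two data formats typically look
| like - the field names and contents here are made up, not
| Llama 2's actual schema:)
|
|     # SFT example: one prompt paired with one human-written
|     # response the model should imitate.
|     sft_example = {
|         "prompt": "Explain RLHF in one sentence.",
|         "response": "RLHF fine-tunes a model against a reward "
|                     "model trained on human preference data.",
|     }
|
|     # Preference pair: two candidate responses to the same
|     # prompt, labeled by which one annotators preferred.
|     preference_pair = {
|         "prompt": "Explain RLHF in one sentence.",
|         "chosen": "RLHF optimizes a model with human feedback.",
|         "rejected": "RLHF is a kind of database index.",
|     }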
| jamesblonde wrote:
| I read here that Yann LeCun claimed that even with RLHF, LLMs
| will still hallucinate - that it's an unavoidable consequence of
| their autoregressive nature.
|
| https://www.hopsworks.ai/dictionary/rlhf-reinforcement-learn...
| ShamelessC wrote:
| That goes without saying.
|
| edit: I don't like your linked article at all. Subtly
| misleading and/or misinformed. Like Yahoo News but for ML.
|
| to clarify: No one (certainly not OpenAI) suggested that RLHF
| was useful for reducing hallucinations. It's not for that. The
| insinuation that it was designed (at least partially) for that
| purpose and yet "failed" is a faulty one. Hallucinations are a
| known issue with large language models, and while I appreciate
| LeCun reiterating that, even researchers far less prominent
| than LeCun are aware of that fact.
| og_kalu wrote:
| Likely yes. But "solving" hallucinations is not really
| important as long as mitigating them to a sufficiently low
| level is possible.
| phillipcarter wrote:
| Moreover, it's all about use case. If you need a high degree
| of reliability and reproducibility, don't use LLMs! Not yet,
| at least. That's fine though, because there's a ton of value
| they offer in solving problems where that isn't needed.
| 3abiton wrote:
| I wonder if there will be a new metric for evaluating LLMs: a
| hallucination score.
| bugglebeetle wrote:
| > If you need a high degree of reliability and
| reproducibility, don't use LLMs!
|
| This is true of pretty much all of machine learning. LLMs
| are just getting singled out because their outputs are not
| getting the same level of validation that typically occurs
| with older approaches. BERT models will also spit out
| whacky stuff, depending on how they're trained/fine-
| tuned/used/etc
| bugglebeetle wrote:
| For many NLP tasks (which is what I mostly use LLMs for),
| hallucinations can be prevented with simple, procedural
| checks against the input or a controlled vocabulary. For
| example, for NER tasks, you can just check whether the
| extracted entities actually appear in the input text or in the
| vocabulary (rough sketch below).
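|
| A minimal sketch of that kind of check (function name and
| example data are made up for illustration):
|
|     # Keep only extracted entities that appear in the source
|     # text or in a controlled vocabulary; drop the rest as
|     # likely hallucinations.
|     def validate_entities(entities, source_text, vocab=None):
|         valid = []
|         for entity in entities:
|             in_source = entity.lower() in source_text.lower()
|             in_vocab = vocab is not None and entity in vocab
|             if in_source or in_vocab:
|                 valid.append(entity)
|         return valid
|
|     text = "OpenAI and Meta released new language models."
|     extracted = ["OpenAI", "Meta", "Acme GmbH"]
|     # "Acme GmbH" never appears in the input, so it is dropped:
|     print(validate_entities(extracted, text))  # ['OpenAI', 'Meta']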
| Geee wrote:
| What datasets does OpenAI use for RLHF? Is the assumption
| correct that it's "time & labor intensive"? Couldn't you take
| responses from HN / Reddit / Stack Exchange / Quora etc., where
| answers are already ranked, and train the reward model on those?
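|
| (Reward models are indeed usually trained on exactly that kind
| of pairwise ranking. A minimal sketch of the standard pairwise
| loss, with a toy scoring head and random embeddings standing in
| for a real language model:)
|
|     import torch
|     import torch.nn as nn
|     import torch.nn.functional as F
|
|     # Toy stand-in for a reward model; in practice this is an
|     # LLM with a scalar head, not an MLP over fixed embeddings.
|     reward_model = nn.Sequential(
|         nn.Linear(768, 128), nn.ReLU(), nn.Linear(128, 1)
|     )
|     opt = torch.optim.Adam(reward_model.parameters(), lr=1e-4)
|
|     # Placeholder embeddings for higher- vs. lower-ranked
|     # answers to the same prompts (e.g. ordered by upvotes).
|     emb_chosen = torch.randn(8, 768)
|     emb_rejected = torch.randn(8, 768)
|
|     # Pairwise (Bradley-Terry) loss: push the reward of the
|     # preferred answer above the reward of the other one.
|     r_chosen = reward_model(emb_chosen)
|     r_rejected = reward_model(emb_rejected)
|     loss = -F.logsigmoid(r_chosen - r_rejected).mean()
|     loss.backward()
|     opt.step()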
___________________________________________________________________
(page generated 2023-09-10 23:00 UTC)