[HN Gopher] LLMs Unleashed: The Power of Fine-Tuning
___________________________________________________________________
LLMs Unleashed: The Power of Fine-Tuning
Author : lucaspauker
Score : 49 points
Date : 2023-07-27 17:08 UTC (5 hours ago)
(HTM) web link (lucaspauker.com)
(TXT) w3m dump (lucaspauker.com)
| treprinum wrote:
| This is 2020-level stuff. These days, with the emergent abilities
| of LLMs trained on over 1T tokens (like GPT-4), single-shot
| chain-of-thought prompting beats most fine-tuned models. I did
| research on transformer adapters, i.e. parameter-efficient
| fine-tuning, and that stuff is now completely obsolete outside of
| some restricted domains where small models can still perform
| well.
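|
| For concreteness, "single-shot chain-of-thought" here means
| prepending one worked example with its reasoning written out to
| the prompt, with no weight updates at all. A minimal sketch,
| assuming the OpenAI chat API of the time (openai<1.0); the model
| name and prompts are illustrative:
|
|   import openai  # pip install "openai<1.0"
|
|   # One worked example (the "single shot") with explicit
|   # reasoning, followed by the real question.
|   COT_EXAMPLE = (
|       "Q: A jug holds 4 liters. How many jugs fill a 12-liter "
|       "tank?\n"
|       "A: The tank is 12 liters and each jug holds 4 liters, "
|       "so 12 / 4 = 3. The answer is 3.\n\n"
|   )
|
|   response = openai.ChatCompletion.create(
|       model="gpt-4",
|       messages=[{
|           "role": "user",
|           "content": COT_EXAMPLE
|           + "Q: A box holds 6 eggs. How many boxes hold 42 "
|             "eggs?\nA: Let's think step by step.",
|       }],
|   )
|   print(response["choices"][0]["message"]["content"])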
| Der_Einzige wrote:
| Gosh, you are so wrong. Literally every bit of fine-tuning and
| fine-tuning-related work is more important than ever. Being
| able to fine-tune a giant model like GPT-4 would be a game
| changer. I don't get why people like to come on here and tell
| blatant lies like this.
| jph00 wrote:
| I haven't seen any recent papers that show that fine-tuning is
| obsolete - I've only seen papers showing the opposite. I'd be
| very interested to see any papers that have demonstrated
| applications where fine-tuning is not effective nowadays, if
| you have any links.
|
| Here's an example of a recent paper that shows good results
| from fine-tuning: https://arxiv.org/abs/2305.02301
| treprinum wrote:
| This Stanford seminar video can provide some references:
|
| https://www.youtube.com/watch?v=tVtOevLrt5U
| jph00 wrote:
| > " _The idea of fine-tuning has a strong research pedigree. The
| approach can be traced back to 2018, when two influential papers
| were published._ "
|
| The article refers to the BERT and GPT papers as the source of
| the fine-tuning idea. However, we actually first demonstrated it
| for universal models in 2017 and published the ULMFiT (Howard and
| Ruder) paper in early 2018. Prior to that, Dai and Le
| demonstrated the technique for in-corpus datasets. So it would be
| more accurate to say the approach can be traced back to those two
| papers, rather than to BERT and GPT.
|
| BERT and GPT showed the effectiveness of scaling up the amount of
| data and compute, and switching the model architecture to
| Transformers (amongst other things).
| SpaceManNabs wrote:
| I like that your article was well-cited. Fun read. Nothing
| stands out as too inaccurate.
|
| You should try a post on parameter-efficient tuning next!
| mickeyfrac wrote:
| The link to your terra cotta product, which I assume is the point
| of the article, is broken.
| [deleted]
| nullc wrote:
| > Fine tuning is better for complex tasks where the model's
| generated output must be accurate and trusted.
|
| uhhh. I understand what was intended there, but while
| fine-tuning may reduce the rate of hallucinations (and make the
| remaining ones sound more plausible), it's not magic
| accuracy-and-trustworthiness dust.
|
| Unfortunately, many people think this stuff is magic, and care
| should be taken not to encourage them to confuse improvements
| with resolving the issue.
|
| One way of characterizing the LLM accuracy problem is that the
| output often _looks_ very accurate and convincing even when it
| is nonsense. If you cast the problem in those terms -- as a
| problem of the model looking more trustworthy than it actually
| is -- fine-tuning actually exacerbates it.
| bugglebeetle wrote:
| Are there any good tutorials anywhere on fine-tuning the
| quantized versions of the LLaMA models? I have a few NLP tasks
| I'd like to test out, and plenty of training data, but
| everything I've seen either doesn't seem generalizable enough
| or lacks the necessary details.
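|
| Not a full tutorial, but the core of the QLoRA recipe that most
| of the LLaMA fine-tuning guides build on is short. A minimal
| sketch with Hugging Face transformers + peft + bitsandbytes;
| the checkpoint name and LoRA hyperparameters are illustrative:
|
|   import torch
|   from transformers import (AutoModelForCausalLM,
|                             BitsAndBytesConfig)
|   from peft import (LoraConfig, get_peft_model,
|                     prepare_model_for_kbit_training)
|
|   model_id = "openlm-research/open_llama_7b"  # any causal LM
|
|   # Load the base model quantized to 4-bit NF4.
|   bnb = BitsAndBytesConfig(
|       load_in_4bit=True,
|       bnb_4bit_quant_type="nf4",
|       bnb_4bit_compute_dtype=torch.bfloat16,
|   )
|   model = AutoModelForCausalLM.from_pretrained(
|       model_id, quantization_config=bnb, device_map="auto")
|   model = prepare_model_for_kbit_training(model)
|
|   # Attach small trainable LoRA adapters; the 4-bit base
|   # weights stay frozen.
|   lora = LoraConfig(
|       r=16, lora_alpha=32, lora_dropout=0.05,
|       target_modules=["q_proj", "v_proj"],  # LLaMA attention
|       task_type="CAUSAL_LM",
|   )
|   model = get_peft_model(model, lora)
|   model.print_trainable_parameters()
|
| From there it's an ordinary transformers Trainer loop over your
| task data; only the adapter weights get gradients.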
| phas0ruk wrote:
| Helpful. I was thinking today about when it makes sense to
| fine-tune vs. use embeddings to feed retrieved context into the
| LLM prompt, and this helped solidify my understanding.
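|
| For comparison, the embeddings route keeps the model frozen and
| just retrieves context at query time. A minimal sketch, again
| assuming the openai<1.0 API; the documents and question are
| illustrative:
|
|   import numpy as np
|   import openai
|
|   def embed(text):
|       resp = openai.Embedding.create(
|           model="text-embedding-ada-002", input=text)
|       return np.array(resp["data"][0]["embedding"])
|
|   docs = ["Refunds are processed within 5 business days.",
|           "Support hours are 9am-5pm UTC on weekdays."]
|   doc_vecs = [embed(d) for d in docs]
|
|   question = "How long do refunds take?"
|   q_vec = embed(question)
|
|   # ada-002 vectors are unit length, so a dot product gives
|   # cosine similarity; pick the closest document.
|   best = docs[int(np.argmax([q_vec @ v for v in doc_vecs]))]
|
|   prompt = (f"Answer using only this context:\n{best}\n\n"
|             f"Question: {question}")
|
| Fine-tuning changes how the model behaves; retrieval changes
| what context it sees at answer time.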
___________________________________________________________________
(page generated 2023-07-27 23:00 UTC)