[HN Gopher] LLMs Unleashed: The Power of Fine-Tuning
       ___________________________________________________________________
        
       LLMs Unleashed: The Power of Fine-Tuning
        
       Author : lucaspauker
       Score  : 49 points
       Date   : 2023-07-27 17:08 UTC (5 hours ago)
        
 (HTM) web link (lucaspauker.com)
 (TXT) w3m dump (lucaspauker.com)
        
       | treprinum wrote:
        | This is 2020-level stuff. These days, with the emergent
        | abilities of LLMs trained on over 1T tokens like GPT-4, a
        | single-shot chain-of-thought prompt beats most fine-tuning. I
        | did research on transformer adapters, i.e. parameter-
        | efficient fine-tuning, and that stuff is now completely
        | obsolete outside of some restricted domains where small
        | models can still perform well.
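        | 
        | For illustration, roughly what I mean by "single-shot chain-
        | of-thought": one worked example with its reasoning goes into
        | the prompt, and nothing gets trained. A minimal sketch with
        | the (pre-1.0) OpenAI Python API -- the model id and the toy
        | example are placeholders:
        | 
        |     import openai  # pip install openai; set openai.api_key
        | 
        |     # One worked example with explicit reasoning, then the
        |     # real question. No fine-tuning anywhere.
        |     prompt = (
        |         "Q: Roger has 5 tennis balls. He buys 2 cans of 3\n"
        |         "tennis balls each. How many does he have now?\n"
        |         "A: He started with 5. 2 cans of 3 is 6 balls.\n"
        |         "5 + 6 = 11. The answer is 11.\n\n"
        |         "Q: The cafeteria had 23 apples. It used 20 and\n"
        |         "bought 6 more. How many does it have?\n"
        |         "A:"
        |     )
        |     resp = openai.ChatCompletion.create(
        |         model="gpt-4",  # placeholder model id
        |         messages=[{"role": "user", "content": prompt}],
        |         temperature=0,
        |     )
        |     print(resp["choices"][0]["message"]["content"])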
        
         | Der_Einzige wrote:
          | Gosh, you are so wrong. Literally every bit of fine-tuning
          | and fine-tuning-related work is more important than ever.
          | Being able to fine-tune a giant model like GPT-4 would be a
          | game changer. I don't get why people like to come on here
          | and tell blatant lies like this.
        
         | jph00 wrote:
         | I haven't seen any recent papers that show that fine-tuning is
         | obsolete - I've only seen papers showing the opposite. I'd be
         | very interested to see any papers that have demonstrated
         | applications where fine-tuning is not effective nowadays, if
         | you have any links.
         | 
          | Here's one example of a paper showing good results from
          | fine-tuning: https://arxiv.org/abs/2305.02301
        
           | treprinum wrote:
           | This Stanford seminar video can provide some references:
           | 
           | https://www.youtube.com/watch?v=tVtOevLrt5U
        
       | jph00 wrote:
       | > " _The idea of fine-tuning has a strong research pedigree. The
       | approach can be traced back to 2018, when two influential papers
       | were published._ "
       | 
       | The article refers to the BERT and GPT papers as the source of
       | the fine-tuning idea. However, we actually first demonstrated it
       | for universal models in 2017 and published the ULMFiT (Howard and
       | Ruder) paper in early 2018. Prior to that, Dai and Le
       | demonstrated the technique for in-corpus datasets. So it would be
       | more accurate to say the approach can be traced back to those two
       | papers, rather than to BERT and GPT.
       | 
       | BERT and GPT showed the effectiveness of scaling up the amount of
       | data and compute, and switching the model architecture to
       | Transformers (amongst other things).
        
       | SpaceManNabs wrote:
       | I like that your article was well cited. Fun read. Nothing stands
       | out as too inaccurate.
       | 
       | You should try a post on parameter efficient tuning next!
        
       | mickeyfrac wrote:
        | The link to your Terracotta product, which I assume is the
        | point of the article, is broken.
        
         | [deleted]
        
       | nullc wrote:
       | > Fine tuning is better for complex tasks where the model's
       | generated output must be accurate and trusted.
       | 
        | Uhhh. I understand what was intended there, but while fine-
        | tuning may reduce the rate of hallucinations (while making
        | the remaining hallucinations more plausible-sounding), it's
        | not magic "accurate and trustworthy" dust.
        | 
        | Unfortunately, many people think this stuff is magic, and
        | care should be taken not to encourage them to confuse
        | improvements with resolving the issue.
       | 
        | One way of characterizing the LLM accuracy problem is that it
        | often _looks_ very accurate and convincing even when it is
        | emitting nonsense. If you cast the problem in those terms --
        | as a problem of looking more trustworthy than it actually is
        | -- fine-tuning actually exacerbates the problem.
        
       | bugglebeetle wrote:
        | Are there any good tutorials on fine-tuning the quantized
        | versions of the LLaMA models anywhere? I have a few NLP tasks
        | I'd like to test out, with plenty of training data, but
        | everything I've seen either doesn't generalize well or lacks
        | necessary details.
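        | 
        | For reference, the rough skeleton I've pieced together so far
        | (a QLoRA-style recipe via transformers + peft + bitsandbytes;
        | the model id and hyperparameters are guesses on my part, so
        | treat this as a sketch rather than a recipe):
        | 
        |     # pip install transformers peft bitsandbytes accelerate
        |     from transformers import AutoModelForCausalLM
        |     from peft import (LoraConfig, get_peft_model,
        |                       prepare_model_for_kbit_training)
        | 
        |     model_id = "meta-llama/Llama-2-7b-hf"  # placeholder
        | 
        |     # Load the base model quantized to 4-bit.
        |     model = AutoModelForCausalLM.from_pretrained(
        |         model_id, load_in_4bit=True, device_map="auto")
        |     model = prepare_model_for_kbit_training(model)
        | 
        |     # Train small LoRA adapters; the 4-bit base stays
        |     # frozen, so this fits on a single consumer GPU.
        |     lora = LoraConfig(r=16, lora_alpha=32,
        |                       lora_dropout=0.05,
        |                       target_modules=["q_proj", "v_proj"],
        |                       task_type="CAUSAL_LM")
        |     model = get_peft_model(model, lora)
        |     model.print_trainable_parameters()
        |     # ...then train as usual, e.g. with transformers.Trainer
        |     # on your task's prompt/response pairs.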
        
       | phas0ruk wrote:
        | Helpful. I was thinking today about when it makes sense to
        | fine-tune vs. use embeddings to feed relevant context into
        | the LLM prompt, and this helped solidify my understanding.
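        | 
        | For anyone else weighing the same tradeoff, the embeddings
        | route in a nutshell (a minimal sketch with the pre-1.0 OpenAI
        | API; the toy corpus is made up, and chunking/vector storage
        | are omitted):
        | 
        |     import numpy as np
        |     import openai  # set openai.api_key first
        | 
        |     docs = ["Refunds are issued within 14 days.",
        |             "Support is open 9-5 on weekdays."]
        | 
        |     def embed(texts):
        |         resp = openai.Embedding.create(
        |             model="text-embedding-ada-002", input=texts)
        |         return np.array([d["embedding"]
        |                          for d in resp["data"]])
        | 
        |     doc_vecs = embed(docs)
        |     q = "How long do refunds take?"
        |     q_vec = embed([q])[0]
        | 
        |     # Cosine similarity picks the most relevant chunk.
        |     sims = doc_vecs @ q_vec / (
        |         np.linalg.norm(doc_vecs, axis=1)
        |         * np.linalg.norm(q_vec))
        |     context = docs[int(sims.argmax())]
        | 
        |     # Retrieved context goes into the prompt; the model
        |     # itself is unchanged -- no fine-tuning.
        |     resp = openai.ChatCompletion.create(
        |         model="gpt-3.5-turbo",
        |         messages=[{"role": "user", "content":
        |                    f"Context: {context}\n\nQ: {q}"}])
        |     print(resp["choices"][0]["message"]["content"])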
        
       ___________________________________________________________________
       (page generated 2023-07-27 23:00 UTC)