[HN Gopher] Our Humble Attempt at "How Much Data Is Needed to Fi...
___________________________________________________________________
Our Humble Attempt at "How Much Data Is Needed to Fine-Tune"
Author : gnahzby
Score : 31 points
Date : 2023-09-24 20:23 UTC (2 hours ago)
(HTM) web link (barryzhang.substack.com)
(TXT) w3m dump (barryzhang.substack.com)
| tomohelix wrote:
| Is this something like short-term vs. long-term memory? The
| context window is the LLM's short-term memory: you can tell it to
| do things or quickly define something, and it can learn very
| quickly, even from a single example or sentence. But it forgets
| immediately once the work is done. Finetuning, on the other hand,
| commits the knowledge into its weights and gives it a "deeper"
| understanding? The cost is that it takes more effort and energy to
| do so?
|
| If so, let's say in the future we have an LLM with a 100K-token
| context window, plus a subsystem that notices when some knowledge
| keeps being repeated in the context and then stores that knowledge
| for finetuning while the LLM is not doing inference. Basically a
| mirror of the way we humans work? Is that possible? An LLM that
| constantly improves and can adapt to new knowledge?
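|
| Something like this, maybe (a toy sketch; the threshold, the
| normalization, and the queue format are all made up):
|
|     import json
|     from collections import Counter
|
|     seen = Counter()
|     REPEAT_THRESHOLD = 3  # assumption: 3 repeats = "stable" knowledge
|
|     def observe(snippet, queue_path="finetune_queue.jsonl"):
|         """Count repeated context snippets; once one recurs often
|         enough, queue it as a candidate for offline fine-tuning."""
|         key = " ".join(snippet.lower().split())  # crude normalization
|         seen[key] += 1
|         if seen[key] == REPEAT_THRESHOLD:
|             with open(queue_path, "a") as f:
|                 f.write(json.dumps({"text": snippet}) + "\n")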
| BoorishBears wrote:
| Fine tuning is mostly useless for direct addition of knowledge.
|
| You can use it to improve knowledge in indirect ways:
|
| - get the model better at crafting queries for an external data
| source (sketch at the end of this comment)
|
| - get the model better at tool usage, to do computation with an
| external system
|
| - get more useful embeddings from BERT/SBERT
|
| - tell the model what it cannot answer accurately
|
| But in general, fine tuning is noise right now because 99% of
| the people chasing it actually don't need it.
|
| If you want to change how the model presents text, use fine
| tuning. If you want to change what the model can present, fine
| tuning is a hopeless way of doing it.
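|
| For the query-crafting case, a single training example might look
| like this (hypothetical record; assumes a chat-style fine-tuning
| format, shown as a Python dict):
|
|     # Teaches query crafting, not facts: the model learns to emit
|     # a search query for an external store instead of answering.
|     example = {
|         "messages": [
|             {"role": "user",
|              "content": "What did our Q3 board deck say about churn?"},
|             {"role": "assistant",
|              "content": 'SEARCH("Q3 board deck churn")'},
|         ]
|     }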
| joewferrara wrote:
| They test two fine tuning tasks in the article - reliable output
| formatting and custom tone. These are two tasks (reliable output
| formatting in particular) that are advertised regularly as areas
| where fine tuning an LLM should work. The goal is not to change
| what the LLM knows, but to change how the LLM communicates what
| it knows. In theory the user wants to leverage the LLM's knowledge
| base, and the different output format is simply more useful to the
| user.
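|
| (A cheap way to quantify "reliable output formatting", assuming
| JSON was the target format: just measure the fraction of
| completions that parse. A sketch:)
|
|     import json
|
|     def format_reliability(completions):
|         """Fraction of model outputs that parse as valid JSON."""
|         ok = 0
|         for text in completions:
|             try:
|                 json.loads(text)
|                 ok += 1
|             except json.JSONDecodeError:
|                 pass
|         return ok / len(completions) if completions else 0.0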
|
| The hard question IMO is when it makes sense to fine tune an LLM
| to update its knowledge, and how much data is needed in that case.
| I have not seen anyone show a real example of succeeding at this,
| and I wonder whether it's close to as difficult as training the
| LLM from scratch or whether it's a feasible fine tuning use case.
| dnnssl2 wrote:
| Knowledge instillation is probably the holy grail of fine
| tuning. The hard part is:
|
| 1. Generalizing new facts. You can create a question-answer pair
| like "what is the population of the world in 2023?" -> "8
| billion", but the model may not pick up alternate phrasings such
| as "does the world have 8 billion people on it?"
|
| 2. Catastrophic and behavioral forgetting. Continued fine
| tuning after RLHF and instruction fine tuning may result in the
| loss of the alignment and instruction following capabilities
| trained by OpenAI. At worst, it will start spewing random
| tokens like the example in the post.
|
| I have not yet seen it successfully done, and I suspect that
| updating only a small fraction (~0.1%) of the original weights
| with PEFT methods won't help.
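|
| For reference, the kind of PEFT setup I mean (a sketch using the
| HuggingFace peft library; the model name and LoRA hyperparameters
| here are just placeholders):
|
|     from transformers import AutoModelForCausalLM
|     from peft import LoraConfig, get_peft_model
|
|     base = AutoModelForCausalLM.from_pretrained(
|         "meta-llama/Llama-2-7b-hf")
|     config = LoraConfig(r=8, lora_alpha=16,
|                         target_modules=["q_proj", "v_proj"],
|                         lora_dropout=0.05, task_type="CAUSAL_LM")
|     model = get_peft_model(base, config)
|     # Reports on the order of 0.1% trainable parameters -- the
|     # fraction I'm skeptical can hold genuinely new facts.
|     model.print_trainable_parameters()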
| BoorishBears wrote:
| Your answer doesn't really answer the question and is liable to
| confuse someone asking what this person asked... the answer to
| their question is a simple no.
|
| Current fine tuning techniques can only contribute to knowledge
| indirectly (getting better queries for an external data source,
| for example); you cannot directly embed new facts in the model in
| any generally efficient/effective manner.
|
| There are toy examples of fine tuning facts into models, but they
| are of no use outside of academic settings at this point, and I
| suspect they are contributing to the widespread confusion about
| fine-tuning's value proposition.
| dnnssl2 wrote:
| There are a few reputable academic examples of factual
| editing, such as: https://rome.baulab.info/
|
| I don't believe that the answer is strictly no. There are
| still many questions around the fine tuning method and the
| scale of data, as well as expectations of task accuracy
| from the perspective of an end user.
| ozr wrote:
| Fwiw, unpublished testing on LLaMA-1 13B showed that it was able
| to learn a new word and its meaning via PEFT with <50 examples.
| Finetuning can unquestionably add new data to a model.
|
| Jeremy Howard has written a bit about how quickly LLMs can pick
| up new concepts as well:
|
| https://www.fast.ai/posts/2023-09-04-learning-jumps/
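|
| To give a flavor of the setup (a hypothetical reconstruction, not
| the actual test data; the word and templates are made up):
|
|     # Build varied prompt/completion pairs teaching a made-up word;
|     # in practice, expand the templates to ~50 paraphrases.
|     word = "flumitous"
|     meaning = "prone to sudden, unprovoked enthusiasm"
|     templates = [
|         ("What does '{w}' mean?", "'{w}' means {m}."),
|         ("Use '{w}' in a sentence.",
|          "My {w} coworker cheered when the build passed."),
|         ("Is someone who is {m} best described as '{w}'?", "Yes."),
|     ]
|     pairs = [{"prompt": p.format(w=word, m=meaning),
|               "completion": c.format(w=word, m=meaning)}
|              for p, c in templates]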
| mikeagb wrote:
| The question of how to fine-tune to teach LLMs facts/knowledge is
| definitely something we're interested in exploring more in future
| work. The common opinion, at least as far as I can tell, is that
| fine-tuning is for teaching the model how to use the knowledge it
| already has to complete a specific task rather than for instilling
| new knowledge, and that RAG should be used to provide more
| specific context. However, I personally believe there is potential
| in fine-tuning for "memorization" or learning, and am excited to
| see new developments in the field.
___________________________________________________________________
(page generated 2023-09-24 23:00 UTC)