[HN Gopher] Chinchilla's Death
___________________________________________________________________
Chinchilla's Death
Author : KolmogorovComp
Score : 56 points
Date : 2023-09-04 18:31 UTC (4 hours ago)
(HTM) web link (espadrine.github.io)
(TXT) w3m dump (espadrine.github.io)
| newfocogi wrote:
| While the article makes good observations, this would appear to
| be a major oversight by leading research labs if they could have
| just kept the gas pedal down on simpler models for longer and
| gotten better performance. This is HackerNews - can we get
| someone from OpenAI, DeepMind, or MetaAI to respond and explain
| why cutting off the smaller models at a lower total compute
| budget is justified?
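|
| As a rough illustration of the tradeoff being debated, here is a
| minimal sketch (not from the article) in Python, using the
| parametric loss fit published in the Chinchilla paper (Hoffmann
| et al., 2022), L(N, D) = E + A/N^alpha + B/D^beta, together with
| the standard compute approximation C ~ 6*N*D; the 1e23 FLOP
| budget below is an arbitrary example value:
|
|     # Chinchilla parametric loss fit (Hoffmann et al., 2022)
|     E, A, B, alpha, beta = 1.69, 406.4, 410.7, 0.34, 0.28
|
|     def loss(n_params, n_tokens):
|         # Predicted pretraining loss for a model with n_params
|         # parameters trained on n_tokens tokens.
|         return E + A / n_params**alpha + B / n_tokens**beta
|
|     def tokens_for_budget(n_params, flops):
|         # Tokens affordable at a fixed FLOP budget via C ~ 6*N*D.
|         return flops / (6 * n_params)
|
|     budget = 1e23  # FLOPs; arbitrary example budget
|     for n in (70e9, 13e9, 7e9):
|         d = tokens_for_budget(n, budget)
|         print(f"{n/1e9:.0f}B params, {d/1e9:.0f}B tokens"
|               f" -> predicted loss {loss(n, d):.3f}")
|
| At a fixed budget, the fit lets you compare a large model trained
| on few tokens against a smaller one trained on many more, which
| is the tradeoff the article is questioning.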
| v64 wrote:
| The Llama 1 paper [1] was one of the earlier works to question
| the assumption that more params = better model. Since then
| they've released Llama 2, and this post offers more evidence
| that reinforces their hypothesis.
|
| I wouldn't say it was an oversight that other labs missed this.
| It's easier to just increase the params of a model over the same
| training set than to gather the larger training set a smaller
| model needs. And at first,
| increasing model size did seem to be the way forward, but we've
| since hit diminishing returns. Now that we've hit that point,
| we've begun exploring other options and the Llamas are early
| evidence of another way forward.
|
| [1] https://arxiv.org/abs/2302.13971
| binarymax wrote:
| What is the likelihood of overfitting the smaller models? It's
| not obvious what the criteria and hyperparams are that prevent
| that.
|
| If there's no overfitting and the results get reproduced then
| this is a very promising find.
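|
| One minimal sketch of such a criterion (a generic check, not
| something from the post): track loss on a held-out validation
| set alongside training loss and flag the run once validation
| loss stops improving while training loss keeps falling:
|
|     def looks_overfit(train_losses, val_losses,
|                       patience=3, tol=1e-3):
|         # Flag when val loss hasn't improved for `patience`
|         # evaluations while training loss is still dropping.
|         if len(val_losses) <= patience:
|             return False
|         best_earlier = min(val_losses[:-patience])
|         stalled = all(v > best_earlier - tol
|                       for v in val_losses[-patience:])
|         still_fitting = train_losses[-1] < train_losses[-patience - 1]
|         return stalled and still_fitting
|
|     train = [2.9, 2.6, 2.4, 2.3, 2.2, 2.1]
|     val   = [3.0, 2.8, 2.7, 2.7, 2.75, 2.8]
|     print(looks_overfit(train, val))  # True: train falls, val rises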
___________________________________________________________________
(page generated 2023-09-04 23:00 UTC)