[HN Gopher] Expanding Transformer size without losing function o...
___________________________________________________________________
Expanding Transformer size without losing function or starting from
scratch
Author : og_kalu
Score : 24 points
Date : 2023-08-18 17:14 UTC (5 hours ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| valine wrote:
| I'm surprised we weren't doing this already. I'd like to see what
| happens if you train a small language model on preschool-level
| reading material, and ramp up both the model size and training
| data complexity as you go. My hope would be that you'd need less
| data to train a model in this fashion than you would with our
| current approach of throwing the entire internet at the model.
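|
| A rough sketch of the function-preserving growth idea (not code
| from the paper; the model, sizes, and names here are all made up):
| widen a toy MLP Net2Net-style so the bigger network computes the
| same function before you keep training it on harder data.
|
|     import torch
|     import torch.nn as nn
|
|     # Toy "stage 1" model, small enough to train on simple data first.
|     class TinyMLP(nn.Module):
|         def __init__(self, d_in=32, d_hidden=64, d_out=32):
|             super().__init__()
|             self.fc1 = nn.Linear(d_in, d_hidden)
|             self.fc2 = nn.Linear(d_hidden, d_out)
|
|         def forward(self, x):
|             return self.fc2(torch.relu(self.fc1(x)))
|
|     def widen_pair(fc1, fc2, new_hidden):
|         """Duplicate random hidden units of fc1 and split the matching
|         columns of fc2 so the widened pair computes the same function
|         as before (up to float error)."""
|         old_hidden = fc1.out_features
|         extra = torch.randint(0, old_hidden, (new_hidden - old_hidden,))
|         mapping = torch.cat([torch.arange(old_hidden), extra])
|         new_fc1 = nn.Linear(fc1.in_features, new_hidden)
|         new_fc2 = nn.Linear(new_hidden, fc2.out_features)
|         with torch.no_grad():
|             new_fc1.weight.copy_(fc1.weight[mapping])
|             new_fc1.bias.copy_(fc1.bias[mapping])
|             # Divide each duplicated unit's outgoing weights evenly
|             # among its copies so the contributions sum to the original.
|             counts = torch.bincount(mapping, minlength=old_hidden).float()
|             new_fc2.weight.copy_(fc2.weight[:, mapping] / counts[mapping])
|             new_fc2.bias.copy_(fc2.bias)
|         return new_fc1, new_fc2
|
|     model = TinyMLP()
|     x = torch.randn(4, 32)
|     before = model(x)
|     model.fc1, model.fc2 = widen_pair(model.fc1, model.fc2, 128)
|     print(torch.allclose(before, model(x), atol=1e-5))  # True
|
| The curriculum part would then just be swapping in harder training
| data after each growth step; the growth itself never throws away
| what the smaller model has learned.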
| two_in_one wrote:
| > I'm surprised we weren't doing this already.
|
| Because approaches like this tend not to work very well in real
| life. The article doesn't offer any experimental proof that it
| works, saves training time, reduces model size, or anything else.
|
| And it doesn't mention another promising approach: mixture of
| experts. There are many ways of implementing it; I mean the
| general idea of specialized fragments of the NN that are
| selectively called. They don't have to be trained all at once.
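|
| Roughly what "selectively called fragments" looks like in code (a
| toy top-1 mixture-of-experts layer I made up for illustration, not
| anything from the article):
|
|     import torch
|     import torch.nn as nn
|     import torch.nn.functional as F
|
|     class TinyMoE(nn.Module):
|         def __init__(self, d_model=64, d_ff=128, n_experts=4):
|             super().__init__()
|             self.gate = nn.Linear(d_model, n_experts)
|             self.experts = nn.ModuleList([
|                 nn.Sequential(nn.Linear(d_model, d_ff), nn.ReLU(),
|                               nn.Linear(d_ff, d_model))
|                 for _ in range(n_experts)])
|
|         def forward(self, x):  # x: (tokens, d_model)
|             scores = F.softmax(self.gate(x), dim=-1)
|             weight, idx = scores.max(dim=-1)  # top-1 routing
|             out = torch.zeros_like(x)
|             for e, expert in enumerate(self.experts):
|                 mask = idx == e
|                 if mask.any():
|                     # Only the chosen expert runs for these tokens.
|                     out[mask] = weight[mask, None] * expert(x[mask])
|             return out
|
|     moe = TinyMoE()
|     print(moe(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
|
| Because each expert only sees the tokens routed to it, you could in
| principle add or fine-tune experts separately instead of training
| everything at once.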
| galaxytachyon wrote:
| Hilarious. Eventually we will create a real sentient AGI that is
| highly efficient and can do everything a normal human can, but it
| turns out it needs 20 years of training, demands too much
| entertainment, and often goes mad when we force it to do
| repetitive work for too long, i.e. a normal human.
|
| Would be so funny if by the end of all the AI research, we found
| out the human brain is already the best you can ever get.
| bilsbie wrote:
| Is that kind of the Star Trek universe?
| Ifkaluva wrote:
| I would be delighted by this. Sounds like something Douglas
| Adams would dream up.
| drdeca wrote:
| [delayed]
___________________________________________________________________
(page generated 2023-08-18 23:01 UTC)