[HN Gopher] New AI Training Technique Is Drastically Faster, Say...
___________________________________________________________________
New AI Training Technique Is Drastically Faster, Says Google
Author : moondistance
Score : 65 points
Date : 2024-07-06 18:58 UTC (4 hours ago)
(HTM) web link (decrypt.co)
(TXT) w3m dump (decrypt.co)
| eutropia wrote:
| https://arxiv.org/pdf/2406.17711 - link to the paper
| morbicer wrote:
| Nice. Google scientists come up with a groundbreaking idea, then
| Google's PMs bungle the chance to bring it to market and
| productize it, and someone like OpenAI or Anthropic swoops in to
| reap the rewards. And the cycle repeats.
|
| DeepMind people invent transformers, and then they watch people
| laugh at Bard, or whatever it's called nowadays, because product
| and engineering lost the plot. Kodak is paging you a message from
| the grave; read it, Google.
| kirubakaran wrote:
| Sounds like a management issue, not a PM/Engg issue
| morbicer wrote:
| Yes, PM to me stands for product _management_, so it's a
| management issue. Same for engineering: it doesn't mean just
| individual contributors; there's someone managing the engineering
| as well.
| kirubakaran wrote:
| I meant Senior Management. "Product Manager" manages the
| product, not people.
| butyo wrote:
| The context of the comment chain is Google failing to get
| product out the door before the competition based on its
| research results. It sounds like their point was that backend
| management is finding things and product management is failing
| to capitalize.
| kirubakaran wrote:
| Ah I see what you mean, thanks!
| dyauspitr wrote:
| Not being able to take ideas and turn them into products
| clients want is solely a product management issue.
| fhub wrote:
| Not always. For example, if bringing a new product to market is
| perceived as something that might eat into existing revenue,
| then all sorts of management shenanigans will likely happen at
| most big orgs.
| dbuser99 wrote:
| What are you on about? They publish their research, advancing
| the field. And Gemini has caught up with OpenAI and everybody
| else.
| morbicer wrote:
| I am glad they are advancing the field, but I think it's
| unfortunate that it doesn't make them top dog. Gemini is not top
| tier to me, though I admit that the confusing naming and spotty
| worldwide rollout might be why I am not familiar with their best
| model. But that's a signal on its own.
|
| The launch was faked and I don't think the real thing is here
| yet https://techcrunch.com/2023/12/07/googles-best-gemini-
| demo-w...
| AndyNemmity wrote:
| Based on this comment, decided to try out gemini.
|
| Total disaster. Doing similar tasks to OpenAI and Claude, it
| just borks. And it is complaining about my desire to use a
| gender guesser Python library, telling me that's inappropriate
| for non-binary people, and it won't do it.
|
| That's fun.
|
| Edit 1: Also, it refuses to print the entire script. I've
| tried many workarounds; it seems to only want to output a
| very small number of total lines.
|
| Threw it into ChatGPT, which immediately fixed all the issues
| Gemini had, and it worked on the first try.
|
| Edit 2: The only thing better about Gemini as far as I can
| tell, is that the copy code button is on the bottom.
| ChatGPT's is at the top, and that's dumb.
|
| Edit 3: I'm being downvoted heavily now, to be clear, I
| didn't intentionally seek out the gender issue, it's just
| what I was working on.
|
| I'm currently trying to generate infographics based on
| wrestlers, and I needed to split the men from the women for
| championship title rankings.
|
| I have no problem with it in general, it just came up, so I
| communicated it.
|
| Multiple times Gemini removed the code using the gender
| guesser library because it felt I shouldn't use it. When
| trying to work out wrestlers' title chances, it makes a lot
| of sense...
|
| But Gemini just refused to allow me to use it, which seems
| like a ridiculous thing. I want to make the choices here.
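|
| Edit 4: For context, the kind of usage I was after is only a few
| lines. A rough sketch (example names only, and assuming the
| gender-guesser package from PyPI):
|
|   import gender_guesser.detector as gender  # pip install gender-guesser
|
|   detector = gender.Detector()
|   wrestlers = ["Becky Lynch", "Roman Reigns", "Rhea Ripley"]
|
|   # get_gender() returns 'male', 'female', 'mostly_male',
|   # 'mostly_female', 'andy' (ambiguous), or 'unknown'
|   men = [n for n in wrestlers
|          if detector.get_gender(n.split()[0]) in ("male", "mostly_male")]
|   women = [n for n in wrestlers
|            if detector.get_gender(n.split()[0]) in ("female", "mostly_female")]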
| fswd wrote:
| I've had the same exact experiences.
| pheatherlite wrote:
| Problem with Google summed up: ethics and pseudo-science folks
| wanting to inject their opinions into the technology. That's
| akin to a kitchen knife refusing to cut gift wrapping paper
| because that's an inappropriate use of a knife. The silliness.
| dyauspitr wrote:
| Gemini is solid. I'll give them a year or two before they start
| building an unscalable moat.
| josephg wrote:
| Claude 3.5 seems better, to me. And ChatGPT is still
| excellent. Why on earth do you think Google will win this
| race?
| hiddencost wrote:
| DeepMind did not invent transformers...
|
| https://arxiv.org/abs/1706.03762
| https://arxiv.org/abs/1810.04805
| josephg wrote:
| Look at the author lists in the pdfs. Almost all of them are
| @google.com. They were Google employees when they wrote and
| published those papers.
| swax wrote:
| AI advancement is coming at us both ways - orders of magnitude
| more compute, with orders of magnitude more efficiency. Hyper
| exponential.
| downboots wrote:
| I only hope it brings about more integration of our vast
| amounts of data instead of more generative inaccuracy.
| Dylan16807 wrote:
| The efficiency has not improved all that much, and when you
| multiply two exponential growths it's still exponential.
|
| Though even when you add the efficiency improvements I think
| we're still lagging behind Moore's Law overall.
| vessenes wrote:
| So the paper itself is pretty significant, I think, from looking
| at it. The general methodology seems to be: train a small model as
| a discriminative scoring model on very high-quality data (JEST is
| mostly concerned with multimodal tasks, it seems, so think
| image/text caption pairs), have that model score 'maximally
| learnable' batches from a larger, lower-quality dataset, then
| train the big model using those scores.
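|
| Roughly, here's my own sketch of that selection loop (toy models
| and made-up names, not the paper's actual code):
|
|   import torch
|   import torch.nn as nn
|
|   def per_example_loss(model, x, y):
|       # per-example cross-entropy, no reduction
|       return nn.functional.cross_entropy(model(x), y, reduction="none")
|
|   def select_learnable(learner, reference, x, y, keep_ratio=0.1):
|       # "learnability": examples the learner still finds hard but
|       # the curated-data reference model finds easy
|       with torch.no_grad():
|           score = (per_example_loss(learner, x, y)
|                    - per_example_loss(reference, x, y))
|       k = max(1, int(keep_ratio * x.shape[0]))
|       idx = torch.topk(score, k).indices
|       return x[idx], y[idx]
|
|   learner = nn.Linear(32, 10)    # stand-in for the big model
|   reference = nn.Linear(32, 10)  # pretend: small model trained on curated data
|   opt = torch.optim.SGD(learner.parameters(), lr=0.1)
|
|   for _ in range(100):  # each step draws a large, noisy super-batch
|       x, y = torch.randn(512, 32), torch.randint(0, 10, (512,))
|       xs, ys = select_learnable(learner, reference, x, y)
|       opt.zero_grad()
|       per_example_loss(learner, xs, ys).mean().backward()
|       opt.step()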
|
| This turns out to be a significant FLOPs and quality win. Even
| accounting for the initial scoring-model training and the scoring
| pass, they claim roughly a 10x improvement in the quality/FLOP
| tradeoff, and they show numbers that significantly beat SOTA on
| some tasks at their model size.
|
| The bad part, to me, is that this is some significant engineering:
| it requires known high-quality datasets, training of the scoring
| model, and selection and scoring of the data for the big training
| run. This is not a bold new leap that's going to be easy for
| hobbyists to implement; it's a practitioner's excellent
| engineering showing the way forward for certain training needs.
|
| As always, I appreciate the publishing from DeepMind; this looks
| like great work. It would be nice to see a company like
| together.ai or others turn it into a pipeline; it might be a
| while, though. It looks relatively gnarly in the details on the
| data and scoring side.
| kmmlng wrote:
| Isn't this similar to what Microsoft did with their Phi models?
| vessenes wrote:
| I don't think so. The Phi training plan was to pull answers
| from textbooks and have GPT-4 write questions for the
| answers, thus ensuring high-quality completions. They then
| trained on this data fairly indiscriminately. This _is_
| about quality of training data, but it's much more general,
| in that it's an approach that can target broad-scale web data
| using a small, cheap model to 'sort' and prioritize.
| kelseyfrog wrote:
| Great, improvements in efficiency will lead to greater resource
| consumption due to Jevons Paradox[1].
|
| 1. https://en.wikipedia.org/wiki/Jevons_paradox
| Mehvix wrote:
| >the falling cost of use induces increases in demand enough
| that resource use is increased, rather than reduced
|
| This is just saying throughput is increased, yes? The time to
| train, and thus to iterate (i.e. dialing in hyperparams), will
| decrease.
| kelseyfrog wrote:
| It calls into question the article's subhead, "which could mean
| lower energy demands."
|
| I.e., more efficient steam engines led to increases in both
| steam-engine throughput and coal consumption; likewise, an
| increase in AI efficiency can lead to an increase in both
| training throughput and energy consumption.
|
| The paradox is a result of prevalence scaling faster than
| efficiency and efficiency driving prevalence.
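|
| A toy example with made-up numbers: if a 10x efficiency gain
| makes training cheap enough that 20x as many runs get done,
| total energy use still doubles.
|
|   energy_per_run = 100 // 10   # 10x more efficient per run
|   runs = 1 * 20                # demand grows faster than efficiency
|   print(energy_per_run * runs) # 200, vs. the original 100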
___________________________________________________________________
(page generated 2024-07-06 23:00 UTC)