[HN Gopher] New AI Training Technique Is Drastically Faster, Say...
       ___________________________________________________________________
        
       New AI Training Technique Is Drastically Faster, Says Google
        
       Author : moondistance
       Score  : 65 points
       Date   : 2024-07-06 18:58 UTC (4 hours ago)
        
 (HTM) web link (decrypt.co)
 (TXT) w3m dump (decrypt.co)
        
       | eutropia wrote:
       | https://arxiv.org/pdf/2406.17711 - link to the paper
        
       | morbicer wrote:
       | Nice. Google scientists come up with a groundbreaking idea, then
       | Google's PM bungles the chance to bring it to market and
       | productize it, and someone like OpenAI or Anthropic swoops in to
       | reap the rewards. And the cycle repeats.
       | 
       | DeepMind people invent transformers and then watch people laugh at
       | Bard, or whatever it's called nowadays, because product and
       | engineering lost the plot. Kodak is paging you a message from the
       | grave, Google. Read it.
        
         | kirubakaran wrote:
         | Sounds like a management issue, not a PM/Engg issue
        
           | morbicer wrote:
           | Yes, PM to me stands for product _management_, so it is a
           | management issue. Same for engineering - it doesn't mean just
           | individual contributors; there's someone managing the
           | engineering as well.
        
             | kirubakaran wrote:
             | I meant Senior Management. "Product Manager" manages the
             | product, not people.
        
               | butyo wrote:
               | The context of the comment chain is Google failing to get
               | a product out the door before the competition despite its
               | research results. Their point sounded like the research
               | side keeps finding things and product management keeps
               | failing to capitalize on them.
        
               | kirubakaran wrote:
               | Ah I see what you mean, thanks!
        
           | dyauspitr wrote:
           | Not being able to take ideas and turn them into products
           | clients want is solely a product management issue.
        
             | fhub wrote:
             | Not always. For example, if bringing a new product to market
             | is perceived as likely to eat into existing revenue, then all
             | sorts of management shenanigans will likely happen at most
             | big orgs.
        
         | dbuser99 wrote:
         | What are you on about? They publish their research, advancing
         | the field. And Gemini has caught up with OpenAI and everybody
         | else.
        
           | morbicer wrote:
           | I am glad they are advancing the field, but I think it's
           | unfortunate that this doesn't make them top dog. Gemini is not
           | top tier to me, though I admit that confusing naming and a
           | spotty worldwide rollout might be why I am not familiar with
           | their best model. But that's a signal on its own.
           | 
           | The launch was faked and I don't think the real thing is here
           | yet https://techcrunch.com/2023/12/07/googles-best-gemini-
           | demo-w...
        
           | AndyNemmity wrote:
           | Based on this comment, I decided to try out Gemini.
           | 
           | Total disaster. Doing similar tasks to OpenAI and Claude, it
           | just borks. It complains about my desire to use a gender
           | guesser Python library, tells me that's inappropriate for
           | non-binary people, and won't do it.
           | 
           | That's fun.
           | 
           | Edit 1: Also, it refuses to print the entire script. I've
           | tried many workarounds; it seems to only want to output a very
           | small number of total lines.
           | 
           | Threw it into ChatGPT, which immediately fixed all the issues
           | Gemini had, and it worked on the first try.
           | 
           | Edit 2: The only thing better about Gemini, as far as I can
           | tell, is that the copy-code button is at the bottom. ChatGPT's
           | is at the top, and that's dumb.
           | 
           | Edit 3: I'm being downvoted heavily now. To be clear, I didn't
           | intentionally seek out the gender issue; it's just what I was
           | working on.
           | 
           | I'm currently trying to generate infographics about wrestlers,
           | and I needed to split the men from the women for championship
           | title rankings.
           | 
           | I have no problem with it in general; it just came up, so I
           | mentioned it.
           | 
           | Multiple times Gemini removed the code that used the gender
           | guesser library because it felt I shouldn't use it. For
           | figuring out wrestlers and their title chances, using it makes
           | a lot of sense...
           | 
           | But Gemini just refused to let me use it, which seems
           | ridiculous. I want to make the choices here.
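           | 
           | For context, the split itself is only a few lines with that
           | library (a sketch, assuming the gender-guesser package from
           | PyPI; the wrestler names are made up for illustration):
           | 
           |     # Sketch of the kind of split described above, assuming
           |     # the gender-guesser package (pip install gender-guesser).
           |     import gender_guesser.detector as gender
           | 
           |     detector = gender.Detector()
           |     wrestlers = ["Rhea Ripley", "Seth Rollins", "Bianca Belair"]
           | 
           |     men, women, unknown = [], [], []
           |     for name in wrestlers:
           |         first = name.split()[0]
           |         guess = detector.get_gender(first)
           |         if guess in ("male", "mostly_male"):
           |             men.append(name)
           |         elif guess in ("female", "mostly_female"):
           |             women.append(name)
           |         else:
           |             unknown.append(name)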
        
             | fswd wrote:
             | I've had the same exact experiences.
        
             | pheatherlite wrote:
             | Problem with Google summed up: ethics and pseudo-science
             | folks wanting to inject their opinions into the technology.
             | That's akin to a kitchen knife refusing to cut gift wrapping
             | paper because that's an inappropriate use of a knife. The
             | silliness.
        
         | dyauspitr wrote:
         | Gemini is solid. I'll give them a year or two before they start
         | building an unscalable moat.
        
           | josephg wrote:
           | Claude 3.5 seems better, to me. And ChatGPT is still
           | excellent. Why on earth do you think Google will win this
           | race?
        
         | hiddencost wrote:
         | DeepMind did not invent transformers...
         | 
         | https://arxiv.org/abs/1706.03762
         | https://arxiv.org/abs/1810.04805
        
           | josephg wrote:
           | Look at the author lists in the pdfs. Almost all of them are
           | @google.com. They were Google employees when they wrote and
           | published those papers.
        
       | swax wrote:
       | AI advancement is coming at us from both directions - orders of
       | magnitude more compute and orders of magnitude more efficiency.
       | Hyper-exponential.
        
         | downboots wrote:
         | I only hope it brings about more integration of our vast
         | amounts of data instead of more generative inaccuracy
        
         | Dylan16807 wrote:
         | The efficiency has not improved all that much, and when you
         | multiply two exponential growth curves the result is still
         | exponential.
         | 
         | Though even including the efficiency improvements, I think we're
         | still lagging behind Moore's law overall.
        
       | vessenes wrote:
       | So the paper itself is pretty significant, I think, from looking
       | at it. The general methodology seems to be: train a small model as
       | a discriminative scoring model on very high-quality data (JEST is
       | mostly concerned with multimodal tasks, it seems, so think
       | image/text caption pairs), have that model score 'maximally
       | learnable' batches from a larger, lower-quality dataset, then
       | train the big model using those scores.
       | 
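       | A minimal sketch of that batch-scoring idea (illustrative only,
       | not the paper's code; the toy linear models, synthetic data, and
       | the loss-gap 'learnability' proxy are assumptions):
       | 
       |     # Illustrative sketch, not the paper's implementation. The toy
       |     # models, data, and squared-error loss stand in for JEST's
       |     # multimodal contrastive setup.
       |     import torch
       |     import torch.nn as nn
       | 
       |     def per_example_loss(model, x, y):
       |         # One loss value per example (JEST uses a contrastive loss).
       |         with torch.no_grad():
       |             return ((model(x) - y) ** 2).mean(dim=1)
       | 
       |     def select_learnable(learner, reference, x, y, keep=0.1):
       |         # "Learnability": the learner still finds the example hard,
       |         # but the small reference model trained on curated data
       |         # handles it well.
       |         scores = (per_example_loss(learner, x, y)
       |                   - per_example_loss(reference, x, y))
       |         k = max(1, int(keep * x.shape[0]))
       |         idx = torch.topk(scores, k).indices
       |         return x[idx], y[idx]
       | 
       |     learner = nn.Linear(32, 8)    # stand-in for the big model
       |     reference = nn.Linear(32, 8)  # stand-in for the small scorer
       |     opt = torch.optim.Adam(learner.parameters(), lr=1e-3)
       | 
       |     x = torch.randn(4096, 32)     # large, noisy candidate batch
       |     y = torch.randn(4096, 8)
       | 
       |     # Train the big model only on the selected slice.
       |     xs, ys = select_learnable(learner, reference, x, y, keep=0.1)
       |     loss = ((learner(xs) - ys) ** 2).mean()
       |     opt.zero_grad()
       |     loss.backward()
       |     opt.step()
       | 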
       | This turns out to be a significant FLOPs and quality win, even
       | accounting for the initial scoring-model training and the scoring
       | itself: they claim roughly a 10x improvement in the quality/FLOP
       | tradeoff, and they show numbers that significantly beat SOTA on
       | some tasks at their model size.
       | 
       | The downside, to me, is that this takes significant engineering
       | -- it requires known high-quality datasets, training of the
       | scoring model, and selection and scoring of the data for the big
       | training run. This is not a bold new leap that hobbyists will find
       | easy to implement; it's a practitioner's excellent engineering
       | showing the way forward for certain training needs.
       | 
       | As always, I appreciate the publishing from DeepMind - this looks
       | like great work. It would be nice to see a company like
       | together.ai or others turn it into a production pipeline; that
       | might take a while, though. It looks relatively gnarly in the
       | details on the data and scoring side.
        
         | kmmlng wrote:
         | Isn't this similar to what Microsoft did with their Phi models?
        
           | vessenes wrote:
           | I don't think so -- the Phi training plan was to pull answers
           | from textbooks and have GPT-4 write questions for those
           | answers, thus ensuring high-quality completions. They then
           | trained on that data fairly indiscriminately. This _is_ about
           | the quality of training data, but it's much more general, in
           | that it's an approach that can target broad-scale web data
           | using a small/cheap model to 'sort' and prioritize it.
        
       | kelseyfrog wrote:
       | Great, improvements in efficiency will lead to greater resource
       | consumption due to Jevons Paradox[1].
       | 
       | 1. https://en.wikipedia.org/wiki/Jevons_paradox
        
         | Mehvix wrote:
         | >the falling cost of use induces increases in demand enough
         | that resource use is increased, rather than reduced
         | 
         | This is just saying throughput is increased, yes? The time to
         | train, and thus to iterate (i.e. dialing in hyperparams), will
         | decrease.
        
           | kelseyfrog wrote:
           | It calls into question the subhead "which could mean lower
           | energy demands."
           | 
           | I.e., just as more efficient steam engines led to an increase
           | in both steam-engine throughput and coal consumption, an
           | increase in AI training efficiency can lead to an increase in
           | both training throughput and energy consumption.
           | 
           | The paradox arises when prevalence scales faster than
           | efficiency improves, with efficiency itself driving the
           | prevalence.
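           | 
           | A toy version of that arithmetic (all numbers made up purely
           | for illustration):
           | 
           |     # Made-up numbers, purely to illustrate the shape of the
           |     # effect: training gets 10x cheaper, but cheaper training
           |     # induces 30x more training runs.
           |     energy_per_run = 100.0   # arbitrary energy units
           |     runs = 10
           | 
           |     efficiency_gain = 10.0
           |     demand_growth = 30.0
           | 
           |     before = energy_per_run * runs                    # 1000.0
           |     cheaper_run = energy_per_run / efficiency_gain    # 10.0
           |     after = cheaper_run * runs * demand_growth        # 3000.0
           | 
           |     # Total energy use rises despite the 10x efficiency win.
           |     print(before, after)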
        
       ___________________________________________________________________
       (page generated 2024-07-06 23:00 UTC)