[HN Gopher] The bigger-is-better approach to AI is running out of road
       ___________________________________________________________________
        
       The bigger-is-better approach to AI is running out of road
        
       Author : pseudolus
       Score  : 67 points
       Date   : 2023-06-24 20:12 UTC (2 hours ago)
        
 (HTM) web link (www.economist.com)
 (TXT) w3m dump (www.economist.com)
        
       | d--b wrote:
       | Roads?! Where we're going, we don't need roads
        
       | jug wrote:
        | I also heard this. I unfortunately forget which study it was,
        | but yes, the paper spoke of likely diminishing returns at
        | around 400-500B parameters for current LLMs. The recent news
        | of GPT-4 running on 8x 220B LLMs (which doesn't add up to a
        | full 8x220B of unique parameters) fits that range. It's also
        | questionable how much further we can push LLMs by combining
        | multiple models like this, because this too eventually
        | introduces problems with granularity and with picking the
        | right model, if I understood an earlier Hacker News discussion
        | correctly. (Sorry for having no sources at all, lol.)
        
         | whimsicalism wrote:
         | Have those reports from George Hotz been confirmed? It seems
          | plausible to me, but it also suggests that we have further
          | to go by using that parameter budget for depth rather than
          | width.
        
           | woeirua wrote:
           | It seems consistent with the behavior we see when using GPT4
           | in chat mode. Every once in a while it will change its answer
           | as it's generating it, as though it's switched which model it
           | favors to produce the response. GPT3.5 doesn't do that.
        
             | og_kalu wrote:
              | MoE models don't work like that though - the router
              | picks experts per token inside a single forward pass,
              | not by swapping which whole model answers mid-response.
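              | 
              | A rough numpy sketch of what MoE routing actually looks
              | like (illustrative only, not OpenAI's implementation):
              | 
              |     # Experts are picked per *token* by a learned gate,
              |     # so there is no visible "model switch" mid-answer.
              |     import numpy as np
              | 
              |     rng = np.random.default_rng(0)
              |     d, n_experts = 16, 8
              |     experts = [rng.standard_normal((d, d))
              |                for _ in range(n_experts)]
              |     W_gate = rng.standard_normal((d, n_experts))
              | 
              |     def moe_layer(x):  # x: one token's state, shape (d,)
              |         logits = x @ W_gate
              |         p = np.exp(logits - logits.max())
              |         p /= p.sum()                # softmax gate
              |         top2 = np.argsort(p)[-2:]   # top-2 routing
              |         w = p[top2] / p[top2].sum()
              |         return sum(wi * (x @ experts[i])
              |                    for wi, i in zip(w, top2))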
        
       | awestroke wrote:
       | It's been just a few months since GPT4. Calm down.
        
         | codetrotter wrote:
         | Major AI winter #3. Let's gooo!
        
       | xwdv wrote:
        | I remember how hyped people were seeing the progress from
        | GPT3.5 to GPT4; people really felt like many jobs were going
        | to be replaced very soon, and that the next big advancement
        | was around the corner. I think the limitations of LLMs should
        | be more salient to them by now.
        
       | tomohelix wrote:
       | Non paywall: https://archive.ph/XwWTi
       | 
        | Imo, it is true that the current architecture is hitting its
        | limit. We need a breakthrough on the scale of the transistor
        | to get past this problem. We know it is possible, though:
        | every single human is proof that high-performance AI can be
        | run with less energy than a laptop uses. We just need a
        | dedicated architecture for its working mechanisms, the same
        | way the transistor is the embodiment of 1 and 0.
        | 
        | Unfortunately, in terms of understanding intelligence and how
        | it works, I don't think we have made any significant advance
        | in the last few decades. Maybe with better tools for probing
        | how LLMs work, we can get some new insights.
        
       | whimsicalism wrote:
        | Data requirements are overstated - you can train on longer
        | and longer sequences, and I am pretty sure most organizations
        | are still using the "show the model the data only once"
        | approach, which is just wasteful.
        | 
        | Compute challenges are more real, but we are seeing for the
        | first time huge amounts of global capital being allocated to
        | solve these specific problems, so I am curious what fruit
        | that will bear in a few years.
        | 
        | I mean, the stuff that some of these low-level people are
        | doing is already absolutely nuts. Tim Dettmers' work on
        | training with only 4 bits means only 16 possible values per
        | weight, and it still gets great results.
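        | 
        | A back-of-the-envelope sketch of what 16 levels per weight
        | looks like (simple absmax quantization in numpy; Dettmers'
        | actual schemes are blockwise and fancier):
        | 
        |     import numpy as np
        | 
        |     def quantize_4bit(w):
        |         scale = np.abs(w).max() / 7.0   # map into [-7, 7]
        |         q = np.clip(np.round(w / scale), -8, 7)  # 16 levels
        |         return q.astype(np.int8), scale
        | 
        |     def dequantize(q, scale):
        |         return q.astype(np.float32) * scale
        | 
        |     w = np.random.randn(4, 4).astype(np.float32)
        |     q, s = quantize_4bit(w)
        |     print(np.abs(w - dequantize(q, s)).max())  # quant. error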
        
         | nextworddev wrote:
         | Yep. We are in the very early innings of capital being deployed
         | to all this.
        
           | sgt101 wrote:
           | Ok - what's the ROI on the $10bn (++) that OpenAI have had?
           | 
           | So far I reckon <$10m in actual revenue.
           | 
            | This isn't what VCs (or Microsoft) dream of.
        
             | nextworddev wrote:
              | I think Azure OpenAI service is growing at 1000% per
              | quarter, according to the last earnings call.
        
               | TeMPOraL wrote:
               | Indeed. Azure OpenAI service is how you get corporate-
               | blessed ChatGPT that you can use with proprietary
               | information, among other things. There's a huge demand
               | for it.
        
             | Quarrelsome wrote:
              | The level of exposure since ChatGPT has resulted, and
              | will keep resulting, in a lot of money turning up,
              | especially for applications of the existing technology
              | (whether they succeed or fail). The stats on usage
              | demonstrate that the thundering herd has noticed, and
              | that attention can be extremely valuable.
              | 
              | I think it's quite likely that OpenAI will make that
              | money back and more, as both the industry leader and
              | with the power of their brand (ChatGPT).
        
         | PartiallyTyped wrote:
         | > using the "show the model the data only once" approach which
         | is just wasteful.
         | 
          | According to the InstructGPT paper, that is not the case:
          | showing the data multiple times results in overfitting.
        
           | whimsicalism wrote:
            | 1. You are just referring to fine-tuning; I am referring
            | to training the base model.
            | 
            | 2. They still saw performance improvements, which is why
            | they did train on the data multiple times - you can see
            | it in the paper.
            | 
            | 3. There was a recent paper demonstrating that reusing
            | data still produced continued improvements in perplexity;
            | I am on my iPad so cannot find it now.
        
             | PartiallyTyped wrote:
              | > 1. You are just referring to fine-tuning; I am
              | referring to training the base model.
             | 
             | Ahh mb! Sorry.
        
       | Animats wrote:
       | We need a way to make tight little specialist models that don't
       | hallucinate and reliably report when they don't know. Trying to
       | cram all of the web into a LLM is a dead end.
        
         | causalmodels wrote:
          | It's not an either-or. We're going to leverage the web-
          | trained LLMs to bootstrap the specialist models via a
          | combination of token-quality classifiers and synthetic data
          | generation. Phi-1 is a pretty good example of this.
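          | 
          | Roughly, the loop looks like this (hand-wavy sketch; `llm`
          | stands in for a web-trained model API, and the prompts are
          | made up, not from the phi-1 paper):
          | 
          |     def score_quality(llm, doc):
          |         reply = llm("Rate the educational value of this "
          |                     "text from 0 to 10:\n" + doc)
          |         return float(reply.strip())
          | 
          |     def build_specialist_corpus(llm, web_docs, topics):
          |         # filter web data using the LLM as a quality scorer
          |         kept = [d for d in web_docs
          |                 if score_quality(llm, d) >= 8]
          |         # generate synthetic "textbook" data per topic
          |         synth = [llm("Write a short textbook section on "
          |                      + t) for t in topics]
          |         return kept + synth  # train the small model on this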
        
         | twobitshifter wrote:
          | I don't see why a bunch of specialist models wouldn't
          | combine into a super useful generalist model. Or do we
          | believe too much knowledge breaks an LLM?
        
         | TX81Z wrote:
          | GPT4 is really good at code, and you can generally verify a
          | hallucination easily.
          | 
          | The other good use case is using an LLM to turn natural
          | language prompts into API calls against real data.
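          | 
          | For the prompts-to-API-calls part, the just-released OpenAI
          | function-calling interface is one way to do it. A minimal
          | sketch (get_weather is a made-up example function):
          | 
          |     import json, openai
          | 
          |     resp = openai.ChatCompletion.create(
          |         model="gpt-4-0613",
          |         messages=[{"role": "user",
          |                    "content": "Weather in Boston in C?"}],
          |         functions=[{
          |             "name": "get_weather",
          |             "description": "Get current weather for a city",
          |             "parameters": {
          |                 "type": "object",
          |                 "properties": {
          |                     "city": {"type": "string"},
          |                     "unit": {"type": "string",
          |                              "enum": ["celsius",
          |                                       "fahrenheit"]},
          |                 },
          |                 "required": ["city"],
          |             },
          |         }],
          |         function_call="auto",
          |     )
          |     call = resp.choices[0].message.get("function_call")
          |     if call:
          |         # structured arguments to pass to the real API
          |         args = json.loads(call["arguments"])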
        
           | simion314 wrote:
           | >GPT4 is really good at code
           | 
            | For popular languages. For JS, it outputs obsolete syntax
            | and code most of the time.
            | 
            | Today I tried to do a bit of scripting with my son in
            | Garry's Mod, which uses Expression 2 for its Wiremod
            | module. GPT hallucinated a lot of functions, and the
            | worst part is it switched from E2 to Lua almost every
            | time.
            | 
            | It is good at solving homework for students, or solving
            | popular problems in popular languages and libraries,
            | though it might give you an ugly solution and ugly code.
            | It was probably trained on bad code too and did not learn
            | to prefer good code over bad code.
        
           | LASR wrote:
            | This is what I've been doing: GPT-4 to generate some data
            | from some input, followed by a GPT-3.5 Turbo call to
            | check the output against the input. You can feed the 3.5
            | Turbo output straight back into GPT-4 and it will self-
            | correct.
           | 
           | Doing this a couple of times gives me 100% accuracy for my
           | use case that involves some level of summarization and
           | reasoning.
           | 
           | Hallucinations are not as big of a deal at all IMO. Not
           | enough that I'll just sit there and wait for models that
           | don't hallucinate.
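            | 
            | A rough sketch of the loop (prompts are illustrative, not
            | my production ones):
            | 
            |     import openai
            | 
            |     def chat(model, prompt):
            |         r = openai.ChatCompletion.create(
            |             model=model,
            |             messages=[{"role": "user", "content": prompt}])
            |         return r.choices[0].message["content"]
            | 
            |     def generate_checked(source, rounds=2):
            |         draft = chat("gpt-4", "Summarize:\n" + source)
            |         for _ in range(rounds):
            |             critique = chat("gpt-3.5-turbo",
            |                 "Source:\n" + source +
            |                 "\n\nSummary:\n" + draft +
            |                 "\n\nList any claims in the summary that "
            |                 "the source does not support.")
            |             draft = chat("gpt-4",
            |                 "Source:\n" + source +
            |                 "\n\nSummary:\n" + draft +
            |                 "\n\nCritique:\n" + critique +
            |                 "\n\nRewrite the summary, fixing the "
            |                 "issues.")
            |         return draft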
        
             | adriand wrote:
             | This sounds interesting, can you detail this data flow a
             | bit more and maybe provide an example?
        
         | skepticATX wrote:
         | This is ultimately just very powerful semantic search though,
         | is it not?
         | 
         | It seems that what we need to make a big leap forward is better
         | reasoning. There is a lot of debate between the GPT-4 can/can't
         | reason camps, but I haven't seen anyone try to argue that it
         | reasons _particularly_ well.
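          | 
          | Mechanically, the semantic-search part reduces to nearest
          | neighbors over embeddings; a numpy sketch (real embeddings
          | would come from a model, not a random generator):
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(0)
          |     docs = rng.standard_normal((1000, 384))  # stand-ins
          |     docs /= np.linalg.norm(docs, axis=1, keepdims=True)
          | 
          |     def search(query_vec, k=5):
          |         q = query_vec / np.linalg.norm(query_vec)
          |         scores = docs @ q                # cosine similarity
          |         return np.argsort(scores)[-k:][::-1]  # top-k docs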
        
         | nomel wrote:
         | > that don't hallucinate
         | 
         | "Hallucination" is part of thought. Solving a new problem
         | requires hallucinating new, non existing, possible outcomes and
         | solutions, to find one that will work. It seems that
         | eliminating the ability to interpolate and extrapolate
         | (hallucinations) would make intelligence impossible. It would
         | eliminate creativity, tying together new concepts, creation,
         | etc.
         | 
         | Is the goal AI, or a nice database front end, to reference
         | facts? Is intelligence facts, or is it the flexibility and the
         | ability to handle and create the novel, things that are _new_?
         | 
          | The ability to have _confidence_, and to know and respond
          | to it, seems important, but that's surely different from
          | the elimination of hallucinations.
         | 
         | I'm probably misunderstanding something, and/or don't know what
         | I'm talking about.
        
           | Barrin92 wrote:
           | >Is the goal AI, or a nice database front end, to reference
           | facts?
           | 
           | The latter given the kind of products that are currently
           | being built with it. You don't want your code completion or
           | news aggregator to hallucinate for the same reason you don't
           | want your wrench to hallucinate, it's a tool.
           | 
            | And as for "hallucinations", that's a PR-friendly misnomer
            | for "it made **** up". Using the same phrase doesn't mean
            | it has functionally anything to do with the cognitive
            | processes involved in human thought. In the same way, an
            | 'artificial' neural net is really a metaphorical neural
            | net; it has very few things in common with biological
            | neurons.
        
         | 4ndrewl wrote:
         | They don't know that they don't know. It's only hallucination
         | from a human's perspective. From the model's perspective it's
         | _all_ hallucination.
        
         | z3c0 wrote:
          | Logistic regression is simple to implement; supports
          | binomial, multinomial, and ordinal classification; and is a
          | key layer in neural networks, where it's often used as the
          | output stage that maps a probability onto a discrete
          | category. It's very good for specialized problems, and it's
          | easily trainable to sort unknown or nonsensical inputs into
          | a noncategory.
          | 
          | Linear regression is great for projections, and can even be
          | fit to time-series data using lagging.
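          | 
          | The noncategory trick can be as simple as thresholding the
          | predicted probability; a toy sklearn sketch (random data,
          | just to show the shape of it):
          | 
          |     import numpy as np
          |     from sklearn.linear_model import LogisticRegression
          | 
          |     X = np.random.randn(300, 4)
          |     y = np.random.randint(0, 3, 300)  # 3 known classes
          |     clf = LogisticRegression(
          |         multi_class="multinomial").fit(X, y)
          | 
          |     def classify(x, threshold=0.6):
          |         probs = clf.predict_proba(x.reshape(1, -1))[0]
          |         if probs.max() < threshold:
          |             return "unknown"          # the noncategory
          |         return int(np.argmax(probs))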
        
         | Solvency wrote:
         | The same way "attention" was a game changer, I'm not sure why
         | they don't invent some recursive self-maintenance algorithm
         | that constantly improves the neural network within. Self
         | directed attention, so to speak.
        
         | wilg wrote:
         | How are we determining it's a dead end here? Recently things
         | like GPT-4 have come out which have improved on that technique.
         | Why would specialization necessarily reduce hallucination and
         | improve accuracy?
        
         | rco8786 wrote:
          | Bingo. I've been beating this drum since the initial GPT-3
          | awe... The future of AI is bespoke, purpose-driven models
          | trained on a combination of public and (importantly)
          | proprietary data.
         | 
         | Data is still king.
        
           | samstave wrote:
            | It will be a terrific moment when we can browse a galaxy
            | map or other index of LLMs and select "context" groups -
            | so you can say, "from history, give me X".
        
             | thelittleone wrote:
                | That doesn't sound ideal at all to me. It also sounds
                | like a bottleneck. I want to ask the AI once and have
                | it recommend or select the best "context" group.
        
               | samstave wrote:
                | It would be a good visual training tool for kids, so
                | they can get an understanding early on, rather than a
                | black-box view.
        
           | JimtheCoder wrote:
           | So, search isn't dead after all...
        
             | stavros wrote:
                | Search has definitely been dead since before LLMs; we
                | just don't have a replacement yet.
        
               | JimtheCoder wrote:
               | I mean more the concept of search, not the current
               | implementation
        
               | stavros wrote:
               | Oh, I don't think we'll ever stop wanting to search for
               | things. Maybe not everything, but some things.
        
               | BolexNOLA wrote:
                | For a while my replacement was "use Google, add
                | 'reddit' at the end". Not sure how much longer that
                | will work, given that even just this limited blackout
                | impacted how effective it was, lol.
        
               | edgyquant wrote:
                | That hasn't worked since about three months after
                | companies found out people do it. It's all
                | astroturfing nowadays anyway, and if it applies to
                | products (which it for sure does) you can be sure
                | that government actors have caught on as well.
        
         | brucethemoose2 wrote:
          | It's complicated.
          | 
          | Training on something huge like "the internet" is what
          | gives rise to those amazing emergent properties missing in
          | smaller models (including the recent Pi model). And there
          | are only so many datasets that huge.
          | 
          | But it's also indeed a waste, as Pi proves.
          | 
          | There probably is some sweet spot (6B-40B?) for
          | specialized, heavily focused models pre-trained with high-
          | quality general data.
        
         | TeMPOraL wrote:
         | How much general "thinking"[0] would you want those "tight
         | little specialist models" to retain? I think that cramming "all
         | of the web" is actually crucial for this capability[1], so at
         | least with LLM-style models, you likely can't avoid it. The
         | text in the training data set doesn't encode _just_ the object-
         | level knowledge, but indirectly also higher-level, cross-domain
         | and general concepts; cutting down on the size and breadth of
         | the training data may cause the network to lose the ability to
         | "understand"[0].
         | 
         | --
         | 
         | [0] - Or "something very convincingly pretending to think by
         | parroting stuff back", if you're closer to the "stochastic
         | parrot" view.
         | 
         | [1] - Per my hand-wavy hypothesis that the bulk of what we call
         | thinking boils down to proximity search in extremely high-
         | dimensional space.
        
       | bluecoconut wrote:
        | Another recent result (not called out in this article) is the
        | "Textbooks Are All You Need" paper [1]; the results seem to
        | suggest that careful curation and curricula of training data
        | can significantly improve model capabilities (when training
        | domain-specific, smaller models). It claims a 10x smaller
        | model can outperform competitors (e.g. phi-1 vs. StarCoder).
       | 
       | [1] https://arxiv.org/abs/2306.11644
        
         | TaylorAlexander wrote:
          | I've long wondered if anyone has tried training with a
          | large dataset of published books, like something from
          | Library Genesis, or, in Google's case, the full text from
          | Google Books. There's all this talk of finding quality
          | text, and I've not heard of text from print books being a
          | major source beyond this textbooks paper.
        
           | startupsfail wrote:
            | That's how OpenAI was (is) doing it. Books downloaded
            | from the Internet are part of the dataset, as per the
            | GPT-3 model card. Right to read.
        
       | skepticATX wrote:
        | I think the worst-case scenario, and the one I think is most
        | likely, is that we plateau at broad and shallow AI: broad
        | enough that it can be used to replace many workers (it may
        | not do a great job, though), but shallow enough that we don't
        | really see the kinds of productivity increases that would
        | usher in the pseudo-utopia many AI folks talk about.
        
       | LeanderK wrote:
        | To an outsider it might seem that the only thing we've been
        | doing is scaling up the neural networks, but that's not true.
        | A lot of innovation and change has happened; some of it
        | enabled us to scale up more, and some just improved
        | performance. I am quite confident that innovation will
        | continue.
        
         | jgalt212 wrote:
          | Your statement is entirely fair, but the actual title is
          | "The bigger-is-better approach to AI is running out of
          | road". The article is actually saying what you are saying,
          | yet your comment seems to contest it.
        
       | sheepscreek wrote:
       | Sam Altman has been saying this for months. Nothing noteworthy
       | here for someone following the industry closely.
       | 
       | A16z's latest summary of the landscape was way more useful and
       | relevant than this.
        
       | satellite2 wrote:
        | How is The Economist qualified to answer this question?
        
         | krona wrote:
          | Usually it's the most credentialed whom I'm least
          | interested in listening to, especially when the value of
          | those credentials depends upon the future looking a
          | particular way.
        
         | semiquaver wrote:
          | Not sure where I heard this, but it's apparently a common
          | trope that many of the politicians and leaders who treat
          | The Economist as close to holy writ are often horrified to
          | learn that most of the staff is actually a bunch of very
          | precocious 20-somethings who are good at research and
          | writing in an authoritative tone.
         | 
         | Actually, now that I think of it, not so different from LLMs...
         | 
         | (Full disclosure, I've been a subscriber for a couple of
         | decades)
        
         | JimtheCoder wrote:
         | Using this logic, how are they qualified to answer 90% of the
         | questions their articles deal with...
        
           | satellite2 wrote:
            | Yes and no. Most of the questions they answer lean
            | political, meaning they are not optimization problems;
            | they answer the question of what kind of society we want
            | to be.
            | 
            | This specific article seems to be reporting on a very
            | technical issue: how to continue to scale LLMs. Even
            | scientific papers have a hard time answering this kind of
            | question, because except in very special circumstances
            | where we can show with good confidence that there are
            | limitations (P vs NP, for instance), the answer will
            | simply be given by the most successful approach.
        
       | dpeckett wrote:
        | Sparse networks are the future. There are definitely a few
        | major algorithmic hurdles we'll have to cross before they
        | become a real option, but long term they will dominate (after
        | all, they already do in the living world).
        | 
        | All our current approaches rely on dense matrix
        | multiplications, which necessitate a tremendous amount of
        | communication bandwidth (and low-latency collectives). This
        | is extremely challenging and expensive to scale, with cost
        | growing roughly as O(n^2.3).
        | 
        | The constraints of physics and finance make significantly
        | larger models out of reach for now.
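        | 
        | A toy illustration of the dense-vs-sparse tradeoff (scipy
        | sketch; real sparse training is far harder than this makes
        | it look):
        | 
        |     import numpy as np
        |     import scipy.sparse as sp
        | 
        |     n = 4000
        |     dense = np.random.randn(n, n).astype(np.float32)  # n^2
        |     sparse = sp.random(n, n, density=0.01, format="csr",
        |                        dtype=np.float32)   # ~1% of that
        |     x = np.random.randn(n).astype(np.float32)
        | 
        |     y_dense = dense @ x    # n^2 multiply-adds
        |     y_sparse = sparse @ x  # roughly 0.01 * n^2 multiply-adds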
        
       | CrzyLngPwd wrote:
       | https://archive.is/XwWTi
        
       | jackmott42 wrote:
        | I'm pretty excited by the possibilities. I am astounded by
        | how much these language models can do with nothing but
        | "predict the next word" as the core idea. I imagine in the
        | near future having collections of a hundred different models
        | - physics models, grammar models, fact models, sentiment
        | models, vision models - wired together by coordination
        | models, and wired up to math tools and databases for ground
        | truth when possible. I think it can get pretty wild.
        | 
        | Just ChatGPT wired up to Wolfram Alpha is already pretty
        | creepy-amazing.
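        | 
        | Something like this, in caricature (every function here is a
        | hypothetical stub, just to show the shape of the wiring):
        | 
        |     import re
        | 
        |     def physics_model(q): return "physics answer"
        |     def fact_model(q): return "fact answer"
        | 
        |     def math_tool(expr):  # ground truth via a calculator
        |         return str(eval(expr, {"__builtins__": {}}))
        | 
        |     SPECIALISTS = {"physics": physics_model,
        |                    "facts": fact_model}
        | 
        |     def coordinator(question):
        |         # crude routing model: bare arithmetic goes to a tool
        |         if re.fullmatch(r"[\d\s+\-*/().]+", question):
        |             return math_tool(question)
        |         topic = "physics" if "force" in question else "facts"
        |         return SPECIALISTS[topic](question)
        | 
        |     print(coordinator("2 + 2"))   # "4", grounded by the tool
        |     print(coordinator("What is force?"))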
        
       | [deleted]
        
       ___________________________________________________________________
       (page generated 2023-06-24 23:00 UTC)