[HN Gopher] The bigger-is-better approach to AI is running out o...
___________________________________________________________________
The bigger-is-better approach to AI is running out of road
Author : pseudolus
Score : 67 points
Date : 2023-06-24 20:12 UTC (2 hours ago)
(HTM) web link (www.economist.com)
(TXT) w3m dump (www.economist.com)
| d--b wrote:
| Roads?! Where we're going, we don't need roads
| jug wrote:
| I also heard this. I unfortunately forget which study it was, but
| yes, their paper spoke of likely diminishing returns at around
| 400-500B parameters for current LLMs. The recent news of GPT-4
| running on 8x 220B LLMs (which doesn't equal an 8*220B model)
| fits that range, and it's also questionable how much further we
| can push LLMs by combining multiple models like this, because
| this too eventually introduces problems with granularity and
| picking the right model, if I understood an earlier Hacker News
| discussion correctly. (sorry for altogether no sources lol)
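|
| The gating idea, very roughly (a toy top-k mixture-of-experts
| sketch in Python; it has nothing to do with OpenAI's actual,
| unconfirmed setup):
|
|   import numpy as np
|
|   def route(x, gate_w, experts, k=2):
|       """Mix the outputs of the k experts the gate scores highest."""
|       scores = x @ gate_w                    # one score per expert
|       top = np.argsort(scores)[-k:]          # indices of the k best
|       weights = np.exp(scores[top])
|       weights /= weights.sum()               # softmax over chosen experts
|       return sum(w * experts[i](x) for w, i in zip(weights, top))
|
|   experts = [lambda x, s=s: x * s for s in (0.5, 1.0, 2.0, 4.0)]
|   gate_w = np.random.randn(8, len(experts))  # toy gating weights
|   print(route(np.random.randn(8), gate_w, experts))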
| whimsicalism wrote:
| Have those reports from George Hotz been confirmed? It seems
| plausible to me, but also suggests to me that we have further
| to go by using that parameter budget for depth rather than for
| width.
| woeirua wrote:
| It seems consistent with the behavior we see when using GPT4
| in chat mode. Every once in a while it will change its answer
| as it's generating it, as though it's switched which model it
| favors to produce the response. GPT3.5 doesn't do that.
| og_kalu wrote:
| MoE models don't work like that though
| awestroke wrote:
| It's been just a few months since GPT4. Calm down.
| codetrotter wrote:
| Major AI winter #3. Let's gooo!
| xwdv wrote:
| I remember how hyped people were seeing the progress from GPT3.5
| to GPT4, people really felt like many jobs were going to be
| replaced very soon. The next big advancement was around the
| corner. I think the limitations of LLMs should be more salient to
| them by now.
| tomohelix wrote:
| Non-paywall link: https://archive.ph/XwWTi
|
| Imo, it is true that the current architecture is hitting its
| limit. We need a breakthrough on the scale of the transistor to
| get past this problem. We know it is possible though. Every
| single human is proof that high-performance AI can run on less
| energy than a laptop. We just need a dedicated architecture for
| its working mechanisms, the same way the transistor is the
| embodiment of 1 and 0.
|
| Unfortunately, in terms of understanding intelligence and how it
| works, I don't think we have made any significant advance in the
| last few decades. Maybe with better tools for probing how LLMs
| work, we can get some new insights.
| whimsicalism wrote:
| Data requirements are overstated - you can train on longer and
| longer sequences and I am pretty sure most organizations are
| still using the "show the model the data only once" approach
| which is just wasteful.
|
| Compute challenges are more real, but we are seeing for the first
| time huge amounts of global capital being allocated to solve
| specifically these problems, so I am curious what fruit that will
| bear in a few years.
|
| I mean, the stuff some of these low-level people are doing is
| already absolutely nuts. Tim Dettmers' work on training with only
| 4 bits means just 16 possible values per weight, and it still
| gets great results.
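|
| For intuition, a toy absmax quantizer to 4 bits (16 levels) in
| Python -- purely illustrative, not Dettmers' actual 4-bit scheme:
|
|   import numpy as np
|
|   def quantize_4bit(w):
|       scale = np.abs(w).max() / 7.0            # map weights into [-7, 7]
|       q = np.clip(np.round(w / scale), -8, 7)  # 16 integer levels
|       return q.astype(np.int8), scale
|
|   def dequantize(q, scale):
|       return q.astype(np.float32) * scale
|
|   w = np.random.randn(6).astype(np.float32)
|   q, s = quantize_4bit(w)
|   print(w, dequantize(q, s))  # roughly recovers the original weights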
| nextworddev wrote:
| Yep. We are in the very early innings of capital being deployed
| to all this.
| sgt101 wrote:
| Ok - what's the ROI on the $10bn (++) that OpenAI have had?
|
| So far I reckon <$10m in actual revenue.
|
| This isn't what VCs (or Microsoft) dream of.
| nextworddev wrote:
| I think the Azure OpenAI service is growing at 1000% per
| quarter, according to the last earnings call.
| TeMPOraL wrote:
| Indeed. Azure OpenAI service is how you get corporate-
| blessed ChatGPT that you can use with proprietary
| information, among other things. There's a huge demand
| for it.
| Quarrelsome wrote:
| The level of exposure since ChatGPT has resulted, and will keep
| resulting, in a lot of money turning up, especially for
| applications of the existing technology (whether they succeed or
| fail). The stats on usage demonstrate that the thundering herd
| has noticed, and that attention can be extremely valuable.
|
| I think it's quite likely that OpenAI will make that money back
| and more, as both the industry leader and with the power of
| their brand (ChatGPT).
| PartiallyTyped wrote:
| > using the "show the model the data only once" approach which
| is just wasteful.
|
| According to the InstructGPT paper, that is not the case,
| showing the data multiple times results in overfitting.
| whimsicalism wrote:
| 1. You are just referring to fine tuning, I am referring to
| training the base model.
|
| 2. They still saw performance improvements which is why they
| did train on the data multiple times, you can see in the
| paper.
|
| 3. There was a recent paper demonstrating that reusing data
| still saw continued improvements in perplexity; I am on my iPad
| so cannot find it now.
| PartiallyTyped wrote:
| > 1. You are just referring to fine tuning, I am referring
| to training the base model.
|
| Ahh mb! Sorry.
| Animats wrote:
| We need a way to make tight little specialist models that don't
| hallucinate and that reliably report when they don't know.
| Trying to cram all of the web into an LLM is a dead end.
| causalmodels wrote:
| It's not an either-or. We're going to leverage the web-trained
| LLMs to bootstrap the specialist models via a combination of
| quality classifiers for training tokens and synthetic data
| generation. Phi-1 is a pretty good example of this.
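|
| The loop, as a rough Python sketch (embed, label_quality, and
| generate are hypothetical stand-ins; phi-1's real pipeline is
| more involved):
|
|   from sklearn.linear_model import LogisticRegression
|
|   def build_specialist_corpus(web_docs, embed, label_quality, generate):
|       # 1. Have the big LLM label a small sample for quality...
|       sample = web_docs[:1000]
|       clf = LogisticRegression().fit([embed(d) for d in sample],
|                                      [label_quality(d) for d in sample])
|       # 2. ...filter the whole corpus with the cheap classifier...
|       keep = [d for d in web_docs if clf.predict([embed(d)])[0] == 1]
|       # 3. ...and top it up with synthetic data from the big LLM.
|       return keep + [generate(t) for t in ("sorting", "recursion")]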
| twobitshifter wrote:
| I don't know how a bunch of specialist models don't combine
| into a super useful generalist model. Do we believe too much
| knowledge breaks an LLM?
| TX81Z wrote:
| GPT4 is really good at code, and you can generally catch
| hallucinations there easily.
|
| The other good use case is using an LLM to turn natural-language
| prompts into API calls against real data.
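|
| A minimal sketch of "prompt in, API call out" (the system prompt
| and the get_weather stub are invented; the client call is the
| mid-2023 openai Python API):
|
|   import json, openai
|
|   DISPATCH = {"get_weather": lambda city: f"22C and sunny in {city}"}
|
|   def nl_to_call(user_text):
|       msg = openai.ChatCompletion.create(
|           model="gpt-4",
|           messages=[
|               {"role": "system", "content":
|                'Reply ONLY with JSON like '
|                '{"fn": "get_weather", "args": {"city": "..."}}'},
|               {"role": "user", "content": user_text},
|           ],
|       )["choices"][0]["message"]["content"]
|       call = json.loads(msg)                       # parse the model's JSON
|       return DISPATCH[call["fn"]](**call["args"])  # dispatch to real code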
| simion314 wrote:
| >GPT4 is really good at code
|
| For popular languages, that is; for JS it most of the time
| outputs obsolete syntax and code.
|
| Today I tried to do a bit of scripting with my son in Garry's
| Mod, which uses Expression 2 for a Wiremod module. GPT
| hallucinated a lot of functions, and the worst part is that it
| switched almost every time from E2 to Lua.
|
| It is good at solving homework for students, or solving popular
| problems in popular languages and libraries, though it might
| give you an ugly solution and ugly code; it was probably trained
| on bad code too and did not learn to prefer good code over bad
| code.
| LASR wrote:
| This is what I've been doing: GPT-4 to generate some data from
| some input, followed by a 3.5T call to verify the output against
| the input. You can feed the 3.5T output straight back into GPT-4
| and it will self-correct.
|
| Doing this a couple of times gives me 100% accuracy for my use
| case, which involves some level of summarization and reasoning.
|
| Hallucinations are not that big of a deal IMO. Not enough that
| I'll just sit there and wait for models that don't hallucinate.
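|
| The rough shape of the loop, in Python (prompts are invented;
| "3.5T" here means gpt-3.5-turbo):
|
|   import openai
|
|   def chat(model, prompt):
|       r = openai.ChatCompletion.create(
|           model=model, messages=[{"role": "user", "content": prompt}])
|       return r["choices"][0]["message"]["content"]
|
|   def generate_with_check(task, rounds=2):
|       draft = chat("gpt-4", task)              # expensive generator
|       for _ in range(rounds):
|           critique = chat("gpt-3.5-turbo",     # cheap verifier
|               f"Does this output follow from the input?\n"
|               f"INPUT: {task}\nOUTPUT: {draft}")
|           draft = chat("gpt-4",                # self-correction pass
|               f"Task: {task}\nDraft: {draft}\n"
|               f"Critique: {critique}\nRevise the draft.")
|       return draft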
| adriand wrote:
| This sounds interesting, can you detail this data flow a
| bit more and maybe provide an example?
| skepticATX wrote:
| This is ultimately just very powerful semantic search though,
| is it not?
|
| It seems that what we need to make a big leap forward is better
| reasoning. There is a lot of debate between the GPT-4 can/can't
| reason camps, but I haven't seen anyone try to argue that it
| reasons _particularly_ well.
| nomel wrote:
| > that don't hallucinate
|
| "Hallucination" is part of thought. Solving a new problem
| requires hallucinating new, non existing, possible outcomes and
| solutions, to find one that will work. It seems that
| eliminating the ability to interpolate and extrapolate
| (hallucinations) would make intelligence impossible. It would
| eliminate creativity, tying together new concepts, creation,
| etc.
|
| Is the goal AI, or a nice database front end, to reference
| facts? Is intelligence facts, or is it the flexibility and the
| ability to handle and create the novel, things that are _new_?
|
| The ability to have _confidence_, and to know and respond to it,
| seems important, but that's surely different from the
| elimination of hallucinations.
|
| I'm probably misunderstanding something, and/or don't know what
| I'm talking about.
| Barrin92 wrote:
| >Is the goal AI, or a nice database front end, to reference
| facts?
|
| The latter given the kind of products that are currently
| being built with it. You don't want your code completion or
| news aggregator to hallucinate for the same reason you don't
| want your wrench to hallucinate, it's a tool.
|
| And as for hallucinations, that's a PR-friendly misnomer for "it
| made **** up". Using the same phrase doesn't mean it has
| functionally anything to do with the cognitive processes
| involved in human thought. In the same way, an "artificial"
| neural net is really a metaphorical neural net; it has very few
| things in common with biological neurons.
| 4ndrewl wrote:
| They don't know that they don't know. It's only hallucination
| from a human's perspective. From the model's perspective it's
| _all_ hallucination.
| z3c0 wrote:
| Logistic regression is simple to implement, supports binomial,
| multinomial, and ordinal classification, and is a key layer in
| neural networks, where it's often the output stage that sorts a
| probability into a discrete category. Very good for specialized
| problems, and easily trainable to sort unknown or nonsensical
| inputs into a noncategory.
|
| Linear regression is great for projections, and can even be fit
| to time-series data using lagging.
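|
| A concrete version of the classifier-with-a-noncategory idea
| (scikit-learn; the 0.6 cutoff is an arbitrary choice):
|
|   import numpy as np
|   from sklearn.linear_model import LogisticRegression
|
|   X = np.random.randn(200, 4)                # toy features
|   y = (X[:, 0] + X[:, 1] > 0).astype(int)    # toy labels
|   clf = LogisticRegression().fit(X, y)
|
|   def classify(x):
|       p = clf.predict_proba([x])[0]
|       # Reject option: low-confidence inputs land in a noncategory.
|       return int(np.argmax(p)) if p.max() > 0.6 else "unknown"
|
|   print(classify(np.zeros(4)))               # likely "unknown"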
| Solvency wrote:
| The same way "attention" was a game changer, I'm not sure why
| they don't invent some recursive self-maintenance algorithm that
| constantly improves the neural network within. Self-directed
| attention, so to speak.
| wilg wrote:
| How are we determining it's a dead end here? Recently things
| like GPT-4 have come out which have improved on that technique.
| Why would specialization necessarily reduce hallucination and
| improve accuracy?
| rco8786 wrote:
| Bingo. I've been beating this drum since the initial GPT-3 awe.
| The future of AI is bespoke, purpose-driven models trained on a
| combination of public and (importantly) proprietary data.
|
| Data is still king.
| samstave wrote:
| It will be a terrific moment when we can browse a galaxy map or
| other lists of LLMs and select "context" groups - so you can
| say: from history, give me X.
| thelittleone wrote:
| That doesn't sound ideal at all to me. It also sounds like
| a bottleneck. I want to ask AI once and it recommends or
| selects the best "context" group.
| samstave wrote:
| It would be a good visual training tool for kids, so they can
| get an understanding early on, rather than a black-box view.
| JimtheCoder wrote:
| So, search isn't dead after all...
| stavros wrote:
| Search has definitely been dead since before LLMs, we just
| don't have a replacement yet.
| JimtheCoder wrote:
| I mean more the concept of search, not the current
| implementation
| stavros wrote:
| Oh, I don't think we'll ever stop wanting to search for
| things. Maybe not everything, but some things.
| BolexNOLA wrote:
| For a while my replacement was "use google, add 'reddit'
| at the end." Not sure how much longer that will work
| given even just this limited blackout impacted how
| effective that was lol
| edgyquant wrote:
| That hasn't worked since about three months after companies
| found out people do it. It's all astroturfing nowadays anyway,
| and if it applies to products (which it for sure does), you can
| be sure that government actors caught on as well.
| brucethemoose2 wrote:
| It's complicated.
|
| Training on something huge like "the internet" is what gives
| rise to those amazing emergent properties missing in smaller
| models (including the recent Pi model). And there are only so
| many datasets that huge.
|
| But it's also indeed a waste, as Pi proves.
|
| There probably is some sweet spot (6B-40B?) for specialized,
| heavily focused models pre-trained with high-quality general
| data.
| TeMPOraL wrote:
| How much general "thinking"[0] would you want those "tight
| little specialist models" to retain? I think that cramming in
| "all of the web" is actually crucial for this capability[1], so
| at least with LLM-style models, you likely can't avoid it. The
| text in the training data set doesn't encode _just_ the object-
| level knowledge, but indirectly also higher-level, cross-domain
| and general concepts; cutting down on the size and breadth of
| the training data may cause the network to lose the ability to
| "understand"[0].
|
| --
|
| [0] - Or "something very convincingly pretending to think by
| parroting stuff back", if you're closer to the "stochastic
| parrot" view.
|
| [1] - Per my hand-wavy hypothesis that the bulk of what we call
| thinking boils down to proximity search in extremely high-
| dimensional space.
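|
| A toy Python rendering of [1]: "thinking" as nearest-neighbour
| lookup over embeddings (the vectors are random stand-ins for
| real embeddings):
|
|   import numpy as np
|
|   vecs = {k: np.random.randn(768) for k in ("dog", "cat", "car")}
|
|   def nearest(query):
|       def cos(a, b):
|           return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
|       return max(vecs, key=lambda k: cos(vecs[k], query))
|
|   # A slightly perturbed "dog" vector should still land on "dog".
|   print(nearest(vecs["dog"] + 0.1 * np.random.randn(768)))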
| bluecoconut wrote:
| Another recent result (not called out in this article) is the
| "Textbooks Are All You Need" paper [1]; the results seem to
| suggest that careful curation and curricula of training data can
| significantly improve model capabilities (when training domain-
| specific, smaller models), claiming a 10x smaller model can
| outperform competitors (e.g. phi-1 vs. StarCoder).
|
| [1] https://arxiv.org/abs/2306.11644
| TaylorAlexander wrote:
| I've still wondered if anyone has tried training with a large
| dataset of published books, like something from Library Genesis,
| or, in Google's case, the full text from Google Books. There's
| all this talk of finding quality text, and I've not heard of
| text from print books being a major source beyond this textbooks
| paper.
| startupsfail wrote:
| That's how OpenAI was (is) doing it. Books downloaded from the
| Internet are part of the dataset, as per the GPT-3 model card.
| Right to read.
| skepticATX wrote:
| I think the worst-case scenario, and the one I think is most
| likely, is that we plateau at broad and shallow AI: broad enough
| that it can be used to replace many workers (it may not do a
| great job, though), but shallow enough that we don't really see
| the kinds of productivity increases that would usher in the
| pseudo-utopia that many AI folks talk about.
| LeanderK wrote:
| To an outsider it might seem that the only thing we've been
| doing is scaling up the neural networks, but that's not true. A
| lot of innovation and change has happened; some of it enabled us
| to scale up more, and some just improved performance. I am quite
| confident that innovation will continue.
| jgalt212 wrote:
| Your statement is entirely fair, but the actual title is "The
| bigger-is-better approach to AI is running out of road". They
| are actually saying what you are saying, but your comment seems
| to contest the article.
| sheepscreek wrote:
| Sam Altman has been saying this for months. Nothing noteworthy
| here for someone following the industry closely.
|
| A16z's latest summary of the landscape was way more useful and
| relevant than this.
| satellite2 wrote:
| How is The Economist qualified to answer this question?
| krona wrote:
| Usually it's the most credentialed whom I'm least interested in
| listening to, especially when the value of those credentials is
| dependent upon the future looking a particular way.
| semiquaver wrote:
| Not sure where I heard this, but it's apparently a common trope
| that many of the politicians and leaders who treat The Economist
| as close to holy writ are horrified to learn that most of the
| staff is actually a bunch of very precocious 20-somethings who
| are good at research and writing in an authoritative tone.
|
| Actually, now that I think of it, not so different from LLMs...
|
| (Full disclosure, I've been a subscriber for a couple of
| decades)
| JimtheCoder wrote:
| Using this logic, how are they qualified to answer 90% of the
| questions their articles deal with?
| satellite2 wrote:
| Yes and no. Most questions they answer are more politically
| leaning, meaning that they are not optimization problems; they
| answer the question of what kind of society we want to be.
|
| This specific article seems to be reporting on a very technical
| issue of how to continue to scale LLMs. Even scientific papers
| have a hard time answering those kinds of questions, because
| unless we are in the very special circumstances where we can
| show with good confidence that there are limitations (P vs NP,
| for instance), the answer will simply be given by the most
| successful approach.
| dpeckett wrote:
| Sparse networks are the future. There are definitely a few major
| algorithmic hurdles we'll have to cross before they become a
| real option, but long term they will dominate (after all, they
| already do in the living world).
|
| All our current approaches rely on dense matrix multiplications.
| These necessitate a tremendous amount of communication bandwidth
| (and low-latency collectives), which is extremely challenging
| and expensive to scale (roughly O(n^2.3) for dense matmul).
|
| The constraints of physics and finance make significantly larger
| models out of reach for now.
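|
| The cost gap in miniature (Python/scipy; a 90%-sparse matrix in
| CSR format only touches the stored nonzeros):
|
|   import numpy as np
|   from scipy import sparse
|
|   n = 2000
|   dense = np.random.randn(n, n)
|   dense[np.random.rand(n, n) < 0.9] = 0.0  # zero out ~90% of entries
|   csr = sparse.csr_matrix(dense)           # keeps only the ~10% nonzeros
|   x = np.random.randn(n)
|
|   print(np.allclose(dense @ x, csr @ x))   # same result, ~10x fewer FLOPs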
| CrzyLngPwd wrote:
| https://archive.is/XwWTi
| jackmott42 wrote:
| I'm pretty excited by the possibilities. I am astounded by how
| much these language models can do with nothing but "predict the
| next word" as the core idea. I imagine in the near future having
| collections of a hundred different models - physics models,
| grammar models, fact models, sentiment models, vision models -
| all wired together by coordination models, and wired up to math
| tools and databases to ground-truth when possible. I think it
| can get pretty wild.
|
| Just ChatGPT wired up to Wolfram Alpha is already pretty
| creepy-amazing.
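|
| The "coordination model" idea in miniature (Python; the router
| prompt and both specialists are invented stand-ins):
|
|   def coordinator(question, llm, tools):
|       # Ask a small routing model which specialist should answer.
|       route = llm(f"Pick one of {sorted(tools)} for: {question}")
|       return tools.get(route.strip(), tools["general"])(question)
|
|   tools = {
|       "math":    lambda q: "42 (from a math tool)",      # e.g. Wolfram
|       "general": lambda q: "prose answer (from an LLM)",
|   }
|   print(coordinator("what is 6 x 7?", lambda p: "math", tools))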
| [deleted]
___________________________________________________________________
(page generated 2023-06-24 23:00 UTC)