[HN Gopher] Ilya Sutskever NeurIPS talk [video]
___________________________________________________________________
Ilya Sutskever NeurIPS talk [video]
Author : mfiguiere
Score : 240 points
Date : 2024-12-14 00:49 UTC (22 hours ago)
(HTM) web link (www.youtube.com)
(TXT) w3m dump (www.youtube.com)
| skissane wrote:
| > "Pre-training as we know it will unquestionably end," Sutskever
| said onstage.
|
| > "We've achieved peak data and there'll be no more."
|
| > During his NeurIPS talk, Sutskever said that, while he believes
| existing data can still take AI development farther, the industry
| is tapping out on new data to train on. This dynamic will, he
| said, eventually force a shift away from the way models are
| trained today. He compared the situation to fossil fuels: just as
| oil is a finite resource, the internet contains a finite amount
| of human-generated content.
|
| > "We've achieved peak data and there'll be no more," according
| to Sutskever. "We have to deal with the data that we have.
| There's only one internet."
|
| What will replace Internet data for training? Curated synthetic
| datasets?
|
| There are massive proprietary datasets out there which people
| avoid using for training due to copyright concerns. But if you
| actually own one of those datasets, that resolves a lot of the
| legal issues with training on it.
|
| For example, Getty has a massive image library. Training on it
| would risk Getty suing you. But what if Getty decides to use it
| to train their own AI? Similarly, what if News Corp decides to
| train an AI using its publishing assets (Wall Street Journal,
| HarperCollins, etc)?
| _aavaa_ wrote:
| > just as oil is a finite resource, the internet contains a
| finite amount of human-generated content.
|
| I guess now they're being explicit about the blatantly
| extractive nature of these businesses and their models.
| kibae wrote:
| I always suspected that bots on Reddit were used to gain karma
| and then eventually sell the account, but maybe they're also
| being used for some kind of RLHF.
| vitorgrs wrote:
| Not sure if this was a good example. Getty already licenses
| their images to Nvidia.
|
| And they already have a generative image service... I believe
| it's powered by an Nvidia model.
| popularonion wrote:
| > What will replace Internet data for training? Curated
| synthetic datasets?
|
| Enter Neuralink
| zxexz wrote:
| Really not sure what you mean by this, could you explain?
| Gigachad wrote:
| AI can just suck up the content of peoples brains for
| training data.
| zxexz wrote:
| Yeah, people will go crazy for GPT-o2 trained on the
| readings of sensors "barely embedded" in the brains of
| tortured monkeys, for sure.
|
| EDIT: This comment may have been a bit too sassy. I get
| the thought behind the original comment, but I personally
| question the direction and premise of the Neuralink
| project, and know I am not alone in that regard. That
| being said, taking a step back, there for sure are plenty
| of rich data sources for non-text multimodal data.
| phillipcarter wrote:
| You need to go back to Twitter with low-quality posts
| like this.
| YetAnotherNick wrote:
| Humans don't need trillions of tokens to reason or the
| ability to know what they know. While a certain part of that
| comes from evolution, I think we have already matched the
| part that came from evolution using internet data, like basic
| language skills and basic world modelling. Current
| pretraining takes a lot more data than a human would need,
| and you don't need to look at every Getty image to draw a
| picture; neither would a self-aware/improving model (whatever
| that means).
|
| To reach expert level in any field, just training on
| next-token prediction over internet data, or any data, is not
| the solution.
| exe34 wrote:
| > Humans don't need trillions of tokens
|
| I wonder about that. We can fine-tune on calculus with far
| fewer tokens, but I'd be interested in some calculations of
| how many tokens evolution provides us (it's not about the DNA
| itself, but all the other things that were explored and
| discarded and are now out of reach) - but also the sheer
| amount of physics learnt by a baby by crawling around and
| putting everything in its mouth.
| YetAnotherNick wrote:
| Yes, as I said in the last comment: with current training
| techniques, one internet's worth of data is enough to give
| models what evolution gave us. For further training, I
| believe we would need different techniques to make the model
| self-aware about its own knowledge.
|
| Also, I believe a person who is blind and paralyzed for
| life could still attain knowledge if educated well enough.
| (Can't find any study on this, tbh.)
| exe34 wrote:
| Yeah, blind and paralysed from birth - I'm doubtful that
| hearing alone would give you the physics training. Although
| if it can be done, then it means the evolutionary
| pre-training is even more impressive.
| menaerus wrote:
| > What will replace Internet data for training? Curated
| synthetic datasets?
|
| Perhaps a different take on this: if I wanted to train a
| "state law" LLM that is exceedingly good at interpreting
| state law, what are the obstacles to downloading all the law
| and regulation material for a given state and training an LLM
| on it such that it reaches the 95th percentile of all law
| trainees and lawyers?
|
| In that case, and this is my point, we already don't need an
| "Internet". We just need a sufficiently sized and curated
| domain-specific dataset, and the result we can get is already
| scary. The "state law" LLM was just an example, but the same
| logic applies to basically any other domain - want a
| domain-specific (LLM) expert? Train it.
| pas wrote:
| You need context for the dry statutes.
|
| Sure, you download all the legal arguments, and hope that
| putting all this on top of a general LLM, which has enough
| context to deal with usual human, American, contemporary
| stuff, is enough.
|
| The argument is that it's not really enough for the next jump
| (as it would need "exponentially" more data), as far as I
| understand.
| menaerus wrote:
| I don't understand the limitation. E.g. how much data do
| you need to train a "state law" LLM that doesn't know
| anything else but that?
|
| Such an LLM does not need to have 400B parameters, since
| it's not a general-knowledge LLM, though perhaps I'm wrong on
| this (?). So my point is rather that it may very well be,
| say, a 30B-parameter LLM, which in turn means that we might
| have just enough data to train it. Larger contexts in smaller
| models are a solved problem.
| petesergeant wrote:
| > how much data do you need to train a "state law" LLM
| that doesn't know anything else but that?
|
| Law doesn't exist in a vacuum. You can't have a useful
| LLM for state law that doesn't have an exceptional
| grounding in real-world objects and mechanics.
|
| You could force a bright young child to memorize a large
| text, but without a strong general model of the world,
| they're just regurgitating words rather than able to
| reason about it.
| menaerus wrote:
| Counter-argument: code does not exist in a vacuum, yet we
| have small and mid-sized LLMs that can already output
| reasonable code.
| petesergeant wrote:
| Generally they've been distilled from much larger models,
| but also, code is a much smaller domain than the law.
| noirbot wrote:
| Code is both much smaller as a domain _and_ less prone to
| the chaos of human interpretation. There are many factors
| that go into why a given civil or criminal case in court
| turns out how it does, and often the biggest one is not
| "was it legal". Giving a computer access to the full
| written history of cases doesn't give you any of the
| context of why those cases turned out. A judge or jury
| isn't going to include in the written record that they
| just really didn't like one of the lawyers. Or that the
| case settled because one of the parties just couldn't
| afford to keep going. Or that one party or the other
| destroyed/withheld evidence.
|
| Generally speaking, your compiler won't just decide not
| to work as expected. Tons of legal decisions don't
| actually follow the law as written. Or even the precedent
| set by other courts. And that's even assuming the law and
| precedent are remotely clear in the first place.
| zozbot234 wrote:
| A model that's trained on legal decisions can still be
| used to _explore_ these questions, though. The model may
| end up being uncertain about which way the case will go,
| or even more strikingly, it may be confident about the
| outcome of a case that then is decided differently, and
| you can try and figure out what's going on with such
| cases.
| sharih wrote:
| Legal reasoning involves applying facts to the law, and
| it needs knowledge of the world. The expertise of a
| professional is in picking the right/winning path based
| on their study of the law, the facts, and their real-world
| training. The money is in codifying that to teach models to
| do the same.
| noirbot wrote:
| But what value does that have? The difference between an
| armchair lawyer and a real, actual lawyer is knowing
| when something is legal/illegal but unlikely to be seen
| that way in court or brought to a favorable verdict.
| It's knowing which cases you can actually win, and how
| much it'll cost and why.
|
| Most of that is not in scope of what an LLM could be
| trained on, or even what an LLM would be good at. What
| you're training in that case would be someone who's an
| opinion columnist or twitter poster. Not an actual
| lawyer.
| mkoryak wrote:
| I'm going to push back on "produce reasonable code".
|
| I've seen reasonable code written by AI, and also code
| that looks reasonable but contains bugs and logic errors
| that can be found if you're an expert in that type of
| code.
|
| In other words, I don't think we can rely solely on AI to
| write code.
| theptip wrote:
| For a "legal LLM" you need three things: general IQ /
| common sense at a substantially higher level than
| current, understanding of the specific rules, and
| hallucination-free recall of the relevant legal
| facts/cases.
|
| I think it's reasonable to assume you can get 2/3 with a
| small corpus IF you have an IQ 150 AGI. Empirically the
| current known method for increasing IQ is to make the
| model bigger.
|
| Part of what you're getting at is possible though, once
| you have the big model you can distill it down to a
| smaller number of parameters without losing much
| capability in your chosen narrow domain. So you forget
| physics and sports but remain good at law. That doesn't
| help you with improving the capability frontier though.
| pas wrote:
| And then your Juris R. Genius gets a new case about two
| Red Sox fans getting into a fight and, without missing a
| beat, starts blabbering about how overdosing on red
| pigment from the undergarments caused their rage!
| losvedir wrote:
| That's kind of going in a different direction. The big
| picture is that LLMs have until this point gotten better and
| better from larger datasets alone. See "The Bitter Lesson".
| But now we're running out of datasets, and so the only way we
| know of to improve models' reasoning abilities and everything
| else is coming to an end.
|
| You're talking about fine-tuning, which, yes, is a technique
| that's being used and explored in different domains, but my
| understanding is it's not a very good way for models to
| acquire knowledge. Instead, larger context windows and RAG
| work better for something like case law. Fine-tuning works
| for things like giving models a certain "voice" in how they
| produce text, and general alignment things.
|
| At least that's my understanding as an interested but not
| totally involved follower of this stuff.
| kranke155 wrote:
| A human being doesn't need to read the entire internet to
| pass the state bar.
|
| Seems to me that we need new ideas?
| yeahwhatever10 wrote:
| The problem remains the size of the dataset. You aren't going
| to get large enough datasets in these specific domains.
| sharih wrote:
| The big frontier models already have all laws, regulations
| and cases memorized/trained on, given they are public. The
| real advancement is in experts codifying their
| expertise/reasoning for models to learn from. Legal is no
| different from other fields in this.
| fidotron wrote:
| > What will replace Internet data for training? Curated
| synthetic datasets?
|
| My take is that the access Meta, Google etc. have to extra data
| has reduced the amount of research into using synthetic data
| because they have had such a surplus of it relative to everyone
| else.
|
| For example, when I've done training of object detectors (quite
| out of date now) I used Blender 3D models, scripts to adjust
| parameters, and existing ML models to infer camera calibration
| and overlay orientation. This works amazingly well for
| subsequently identifying the real analogue of the object, and I
| know of people doing vehicle training in similar ways using
| game engines.
|
| There were several surprising tactical details to all this
| which push the accuracy up dramatically and which you don't
| see discussed too widely, like ensuring that things which are
| not relevant are properly randomized in the training set,
| such as the surface texture of the 3D models (i.e. putting
| random fractal patterns on the object during training
| improves how robust the object detector is to disturbances in
| reality).
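|
| Very roughly, the per-sample randomization looked something
| like this (a simplified, numpy-only sketch; the parameter
| names are illustrative, and the actual rendering was driven
| by Blender scripts):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|
|     def fractal_noise(size=256, octaves=5):
|         """Random value-noise texture: a sum of upsampled random
|         grids at increasing frequencies, a cheap stand-in for
|         'random fractal patterns' on irrelevant surfaces."""
|         img = np.zeros((size, size))
|         for o in range(octaves):
|             cells = 2 ** (o + 2)      # grid resolution for this octave
|             reps = size // cells
|             grid = rng.random((cells, cells))
|             img += np.kron(grid, np.ones((reps, reps)))[:size, :size] / (2 ** o)
|         return (img - img.min()) / (img.max() - img.min() + 1e-9)
|
|     def sample_scene_params():
|         """One randomized training sample: the object's geometry is
|         held fixed; everything irrelevant to detection is random."""
|         return {
|             "object_texture": fractal_noise(),      # randomized surface
|             "background_texture": fractal_noise(),  # randomized backdrop
|             "camera_azimuth_deg": rng.uniform(0, 360),
|             "camera_elevation_deg": rng.uniform(5, 75),
|             "light_intensity": rng.uniform(0.3, 3.0),
|         }
|
|     # Each parameter set would drive a Blender render of a labelled
|     # training image; only the parameter sampling is shown here.
|     for params in (sample_scene_params() for _ in range(4)):
|         print({k: (v.shape if hasattr(v, "shape") else round(v, 2))
|                for k, v in params.items()})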
| robg wrote:
| The ones that stand out to me are industries like
| pharmaceuticals and energy exploration, where the data silos
| are the point of their (assumed) competitive advantages. Why
| even the playing field by opening up those datasets when
| keeping them closed locks in potential discoveries? Open data
| is the basis
| of the Internet. But whole industries are based on keeping
| discoveries closely guarded for decades.
| seydor wrote:
| Robots can acquire data on their own (hopefully not via human
| dissection)
| parkaboy wrote:
| I wonder if we will see (or already are/have been seeing) the
| XR/smart glasses space heat up. Seems eventually like a great
| way to generate and hoover up massive amounts of fresh training
| data.
| RicoElectrico wrote:
| I think we're not close to running out of training data. It's
| just that we want the knowledge, but not necessarily the
| behavior, of said texts. LLMs are very bad at recalling
| popular memes (known by any seasoned netizen) if they had no
| press coverage.
| Maybe training with 4chan isn't as pointless if you could make
| it memorize it, but not imitate it.
|
| Also, what about movie scripts and song lyrics? Transcripts of
| well known YouTube videos? Hell, television programs even.
| stavros wrote:
| We've run out of training data that definitely did not
| contain LLM outputs.
| DAGdug wrote:
| What about non-text modalities - image and video,
| specifically?
| riffraff wrote:
| video is probably still fine, but images sourced from the
| internet now contain a massive amount of AI slop.
|
| It seems, for example, that many newsletters, blogs etc
| resort to using AI-generated images to give some color to
| their writings (which is something I too intended to do,
| before realizing how annoyed I am by it)
| gcollard- wrote:
| All the publicly accessible sources you mentioned have
| already been scraped or licensed to avoid legal issues. This
| is why it's often said, "there's no public data left to train
| on."
|
| For evidence of this, consider observing non-English-speaking
| young children (ages 2-6) using ChatGPT's voice mode. The
| multimodal model frequently interprets a significant portion
| of their speech as "thank you for watching my video,"
| reflecting child-like patterns learned from YouTube videos.
| zozbot234 wrote:
| Synthetic datasets are useless (other than for _very_ specific
| purposes, such as enforcing known strong priors, and even then
| it's way better to do it directly by changing the
| architecture). You're better off spending that compute by
| making multiple passes over the data you do have.
| HeatrayEnjoyer wrote:
| This is contrary to what the big AI labs have found.
| Synthetic data is the new game in town.
| kranke155 wrote:
| Ilya is saying it doesn't work in this talk apparently.
| toxik wrote:
| Most priors are not encodable as architecture though, or only
| partially.
| oldgradstudent wrote:
| > There are massive proprietary datasets out there which people
| avoid using for training due to copyright concerns.
|
| The main legal concern is their unwillingness to pay to access
| these datasets.
| zozbot234 wrote:
| Yup, there's also a huge amount of copyright-free, public
| domain content on the Internet which just has to be
| transcribed, and would provide plenty of valuable training to
| a LLM on all sorts of varied language use. (Then you could
| use RAG over some trusted set of data to provide the bare
| "facts" that the LLM is supposed to be talking about.) But
| guess what, writing down that content accurately from scans
| costs money (and no, existing OCR is nowhere near good
| enough), so the job is left to purely volunteer efforts.
| numpad0 wrote:
| > What will replace Internet data for training?
|
| It means unlimited scaling with Transformer LLMs is over; they
| need a new architecture that scales better. Internet data
| respawns when you click [New Game...] - the oil analogy is an
| analogy and not a fact - but anyway the total amount available
| in a single game is finite, so combustion efficiency matters.
| neom wrote:
| Full talk is interesting:
| https://www.youtube.com/watch?v=YD-9NG1Ke5Y
| CuriousSkeptic wrote:
| On the slide of the body/brain weight relation, he highlighted
| the difference in scaling for hominids.
|
| What he didn't mention, which I found interesting, was that
| the same slide also highlighted a hard ceiling for
| non-hominids at the same point.
| imranhou wrote:
| This is a very interesting point, in some ways the implicit
| belief is that we just need to get beyond the 700g limitation
| in terms of scaling LLMs and we would get human
| intelligence/superintelligence. I admit I didn't really get
| the body/brain analogy, I would have been better satisfied
| with a simpler graph of brain weight to intelligence with a
| scaling barrier of 700g.
| stretchwithme wrote:
| AIs will need to start asking people questions. Should make for
| some very strange phone calls.
| wslh wrote:
| That's a good point. I think most people use LLMs by asking
| questions and receiving answers. But if you reverse the dynamic
| and have the LLM interview you instead, where you simply
| respond to its questions, you'll notice something interesting:
| the LLM as an interviewer is far less "smart" than it is when
| simply providing answers. I've tried it myself, and the
| interview felt more like interacting with ELIZA [1].
|
| There seemed to be a lack of intent when the LLM was the one
| asking the questions. This creates a reverse dynamic, where
| you become the one being "prompted", and it could be worth
| studying or adjusting further.
|
| [1] https://en.wikipedia.org/wiki/ELIZA
| airstrike wrote:
| Which LLM did you perform that test with?
| wslh wrote:
| ChatGPT Pro.
| afro88 wrote:
| That's not an LLM, that's a subscription plan. You can
| select any OpenAI LLM on ChatGPT Pro.
|
| You can share the chat here, and this will show the LLM
| you had selected for the conversation. The initial prompt
| is also pretty important. For claims like "current LLMs
| feel like conversing with ELIZA", you are most definitely
| missing something in how you're going about it.
|
| Advanced voice mode will give you better results for
| conversations too. It seems to be set up to converse
| rather than provide answers or perform work for you. No
| initial prompt, model selection, or setup required.
| Barrin92 wrote:
| >There seemed to be a lack of intent when the LLM was the one
| asking the questions
|
| There doesn't just seem to be a lack of intent; there is no
| intent, because by the nature of their architecture these
| systems are just a set of weights with a Python script
| attached to them, asked to give you one more token over and
| over.
|
| There are no needs, drives, motivations, desires, or any other
| parts of the cognitive architecture of humans in there that
| produce genuine intent.
| zxexz wrote:
| I can't help but feel that this talk was a lot of...fluff?
|
| The synopsis, as far as my tired brain can remember:
|
| - Here's a brief summary of the last 10 years
|
| - We're reaching the limit of our scaling laws, because we've
| trained on essentially all the data we have available
|
| - Some things that may be next are "agents", "synthetic data",
| and improving compute
|
| - Some "ANNs are like biological NNs" rehash that would feel
| questionable if there _were_ a thesis (which there wasn't?
| something about how body mass vs. brain mass are positively
| correlated?)
|
| - 3 questions: the first was something about "hallucinations"
| and whether a model would be able to understand if it is
| hallucinating; then something that involved cryptocurrencies;
| and then a _slightly_ interesting question about multi-hop
| reasoning
| coeneedell wrote:
| I attended this talk in person and some context is needed. He
| was invited for the "test of time" talk series. This explains
| the historical part of the talk. I think his general persona
| and association with ai led to the fluffy speculation at the
| end.
|
| I notice with Ilya that he wants to talk about these
| out-there, speculative topics but defends himself with
| statements like "I'm not saying when or how, just that it
| will happen", which makes his arguments impossible to
| address. Stuff like this openly invites the crazies to
| interact with him, as seen with the cryptocurrency question
| at the end.
|
| Right before this was a talk reviewing the impact of GANs that
| stayed on topic for the conference session throughout.
| mrbungie wrote:
| I mean, he repeatedly gave hints (even if just for the
| lulz and not seriously) that the audience is at least
| partially composed of people with little technical background
| or AI bros. An example is when he mentioned LSTMs and said
| "many of you may have never seen [one] before". Even if he
| didn't mean it, it ironically ended up being spot on when the
| crypto question came.
| killerstorm wrote:
| Well, it looks like the entire point was "you can no longer
| expect a capability gain from a model with a bigger ndim
| trained on a bigger internet dump".
|
| That's just one sentence, but it's pretty important. And while
| many people already know this, it's important to hear Sutskever
| say it, so people know it's common knowledge.
|
| The rest is basically intro/outro.
| mrbungie wrote:
| It is very important to have at least some kind of
| counterweight vs OpenAI/sama predicting AGI for 2025/2026.
| throwaway314155 wrote:
| Ilya very much had the same optimism about AGI during his
| time at OpenAI from my understanding.
| lottin wrote:
| He makes a good point. But then he jumps to "models will be
| self-aware". I fail to see any logical connection.
| killerstorm wrote:
| But they are self-aware, in fact it's impossible to make a
| good AI assistant which isn't: it has to know that it's an
| AI assistant, it has to be aware of its capabilities,
| limitations, etc.
|
| I guess you're interpreting "self-awareness" in some
| mythical way, like a soul. But in a trivial sense, they
| are. Perhaps not to the same extent as humans: models do not
| experience time in a continuous way. But given that a model
| can maintain a dialogue (voice mode, etc.), it seems to be
| phenomenologically equivalent.
| KuriousCat wrote:
| So, are we headed towards a bittersweet zone where modeling
| is going to get more prominent once again? Are massive
| datasets going to take a backseat?
| sigmar wrote:
| I found this week's DeepMind podcast with Oriol Vinyals to be on
| similar topics as this talk (current situation of LLMs, path
| ahead with training) but much more interesting:
| https://pca.st/episode/0f68afd5-2b2b-4ce9-964f-38193b7e8dd3
| sega_sai wrote:
| Very thought-provoking. One thing that was not clear to me:
| what does he mean by 'agentic' intelligence?
| kgeist wrote:
| When it autonomously performs tasks on behalf of the user,
| without their intervention. It sets goals, plans actions, etc.
| by itself.
| ototot wrote:
| Example of 'agentic': https://blog.google/technology/google-
| deepmind/google-gemini...
| ethbr1 wrote:
| It's the current industry buzzword for action-oriented LLMs.
|
| Instead of just generating text, it's about
| tightening/formalizing the loop around planning, executing,
| analyzing results, and replanning.
|
| As far as buzzwords go, it's far from the worst, as it captures
| the essentials -- creating semi-autonomous agents.
| ed wrote:
| Based on the context Ilya is not referring to that kind of
| agent. He's referring to something more fundamental (which I
| was curious about, too).
| legel wrote:
| I'm glad Ilya starts the talk with a photo of Quoc Le, who was
| the lead author of a 2012 paper on scaling neural nets that
| inspired me to go into deep learning at the time.
|
| His comments are relatively humble and based on public prior
| work, but it's clear he's working on big things today and also
| has a big imagination.
|
| I'll also just say that at this point "the cat is out of the
| bag", and probably it will be a new generation of leaders -- let
| us all hope they are as humanitarian -- who drive the future of
| AI.
| chipsrafferty wrote:
| Literally zero chance that the new generation of leaders of
| artificial intelligence will be humanitarian.
| mrbungie wrote:
| Let us all hope that they will be as humanitarian as they can
| be, but let's not forget they are still just human beings.
| tikkun wrote:
| As context on Ilya's predictions given in this talk, he predicted
| these in July 2017:
|
| > Within the next three years, robotics should be completely
| solved [wrong, unsolved 7 years later], AI should solve a long-
| standing unproven theorem [wrong, unsolved 7 years later],
| programming competitions should be won consistently by AIs
| [wrong, not true 7 years later, seems close though], and there
| should be convincing chatbots (though no one should pass the
| Turing test) [correct, GPT-3 was released by then, and I think
| with a good prompt it was a convincing chatbot]. In as little as
| four years, each overnight experiment will feasibly use so much
| compute capacity that there's an actual chance of waking up to
| AGI [didn't happen], given the right algorithm -- and figuring
| out the algorithm will actually happen within 2-4 further years
| of experimenting with this compute in a competitive multiagent
| simulation [didn't happen].
|
| Being exceptionally smart in one field doesn't make you
| exceptionally smart at making predictions about that field. Like
| AI models, human intelligence often doesn't generalize very well.
| padolsey wrote:
| >exceptionally smart at making predictions
|
| Is anyone though? Genuine question. I don't have much faith in
| predictions anymore.
| qeternity wrote:
| No, very few for things with this much uncertainty.
|
| Most of it is survivorship bias: if you have a million people
| all making predictions with coin flip accuracy, somebody is
| going to get a seemingly improbable number correct.
| exe34 wrote:
| so your prediction is that most predictions will be wrong?
| bobbruno wrote:
| A common saying in the stats field goes like this:
|
| "Predictions are hard, especially about the future".
| ethbr1 wrote:
| Predictions predicated on technological advancement are
| tricky: there's a reason breakthroughs are called
| breakthroughs.
| _giorgio_ wrote:
| He just wanted money from investors; that's why he used such
| short timelines.
|
| https://openai.com/index/elon-musk-wanted-an-openai-for-prof...
|
| > 2/3/4 will ultimately require large amounts of capital. If we
| can secure the funding, we have a real chance at setting the
| initial conditions under which AGI is born.
| InkCanon wrote:
| For all the discussion about it, this is the simple answer.
| It's not an engineering or scientific prediction, it's a line
| from a pitch deck.
| noirbot wrote:
| But isn't that part of the problem? Some of the brightest
| minds in the field's public statements are filtered by
| their need to lie in order to con the rich into funding
| their work. This leaves actual honest discussions of what's
| possible on what timelines to mostly be from people who
| aren't working directly in the field, which inclines
| towards people skeptical of it.
|
| Most of the people who could make an engineering prediction
| with any level of confidence or insight are locked up in
| businesses where doing so publicly would be disastrous to
| their funding, so we get fed hype that ends up falling flat
| again and again.
| trescenzi wrote:
| The opposite of this is also really interesting.
| Seemingly the people with money are happy to be fed these
| crazy predictions regardless of their accuracy. A
| charitable reading is they temper them and say "ok it's
| worth X if it has a 5% chance of being correct" but the
| past 4 years have made that harder for me to believe.
| noirbot wrote:
| To be honest, I think some of it is what you suggest - a
| gamble on long odds, but I think the bigger issue is just
| a carelessness that comes with having more money than you
| can ever effectively spend in your life if you tried. If
| you're so rich you could hand everyone you meet $100 and
| not notice, you have nothing in your life forcing you to
| care if you're making good decisions and not being
| conned.
|
| It certainly doesn't help that so many of the people who
| are that rich got that rich by conning other people this
| exact way. It's an incestuous cycle of con-artists who
| think they're geniuses, and the media only slavishly
| supports that by treating them like they're such.
| sangnoir wrote:
| It is important to note the context: it was in a private
| email to an investor with vested interests in those fields,
| and someone who is also prone to giving over-optimistic
| timelines ("Robo-taxis will be here next year, for sure"
| since 2015).
| _giorgio_ wrote:
| What a stupid talk.
|
| They gave 15 minutes to one of the most competent scientists.
|
| A joke.
| ldenoue wrote:
| LLM corrected transcript (using Gemini Flash 8B over the raw
| YouTube transcript)
| https://www.appblit.com/scribe?v=YD-9NG1Ke5Y#0
| oezi wrote:
| How do you prevent Gemini from just swallowing text after some
| time?
|
| Audio transcript correction is one area where I struggle to see
| good results from any LLM unless I chunk it to no more than one
| or two pages.
|
| Or did you use any tool?
| belter wrote:
| It's surprising that some prominent ML practitioners still liken
| transformer 'neurons' to actual biological neurons...
|
| Real neurons rely on spiking, ion gradients, complex dendritic
| trees, and synaptic plasticity governed by intricate biochemical
| processes. None of which apply to the simple, differentiable
| linear layers and pointwise nonlinearities in transformers.
|
| Are there any reputable neuroscientists or biologists endorsing
| such comparisons, or is this analogy strictly a convention
| maintained by the ML community? :-)
| martindbp wrote:
| You have to remember what came before 2012: SVMs, Random
| Forests etc, absolutely nothing like the brain (yes, NNs are
| old, but 2012 was the start of the deep learning revolution).
| With this frame of reference, the brain and neural networks are
| both a kind of Connectionism with similar properties, and I
| think it makes perfect sense to liken them to each other,
| and to draw inspiration from one and apply it to the other.
| signa11 wrote:
| Sorry, but I think neural networks came way before 2012,
| notably the works of Rumelhart, McClelland, etc. See the
| two-volume "Parallel Distributed Processing" to read almost
| all about it.
|
| the book(s):
| https://direct.mit.edu/books/monograph/4424/Parallel-
| Distrib...
|
| a-talk: https://www.youtube.com/watch?v=yQbJNEhgYUw
| mcshicks wrote:
| Jets and Sharks!
|
| https://github.com/acmiceli/IACModel
| FL33TW00D wrote:
| I raise you Warren McCulloch in 1962:
| https://www.youtube.com/watch?v=wawMjJUCMVw
| martindbp wrote:
| I knew someone would bring it up, which is why I added
| "(yes, NNs are old, but 2012 was the start of the deep
| learning revolution)"
| versteegen wrote:
| 2012 was when the revolutionaries stormed the Bastille
| and overthrew the old guard. But I say it was 2006 when
| the revolution started, when the manifesto was published:
| deep NNs can be trained end-to-end, learning their own
| features [1]. I think this is when "Deep Learning" became
| a term of art, and the paper has 24k citations.
| (Interestingly, in a talk Hinton gave at Vector two weeks
| ago, he said his paper on deep learning at NIPS 2006 was
| rejected because they already had one.)
|
| [1] G. E. Hinton and R. R. Salakhutdinov, 2006, Science,
| Reducing the Dimensionality of Data with Neural Networks
| zitterbewegung wrote:
| Neural networks are 200 years old (Legendre and Gauss defined
| feed-forward neural networks). The real difference between
| traditional ones and deep learning is a hierarchy of layers
| (hidden layers) which do different things to accomplish a
| goal. Even the concept of training is just providing weights
| for the neural network, and there are many algorithms for
| refinement, optimization, and network design.
| varjag wrote:
| Gauss did not define feed-forward neural networks; it all
| stems from a tweet by a very confused person.
| mrbungie wrote:
| I mean, sure, you can model a simple linear regression
| fitted via least squares (pretty much what they did 200
| years ago) with a one-hidden-layer feed-forward neural
| network, but the theoretical framework for NNs is quite
| different.
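|
| To make the correspondence concrete, here's a toy numpy
| sketch (purely illustrative, not how anyone trains real
| models): a single linear "neuron" fitted by gradient descent
| on squared error ends up at the same solution least squares
| gives in closed form.
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|
|     # Toy data: y = 3x + 1 + noise
|     X = rng.uniform(-1, 1, size=(200, 1))
|     y = 3.0 * X[:, 0] + 1.0 + 0.1 * rng.standard_normal(200)
|     A = np.hstack([X, np.ones((len(X), 1))])   # add a bias column
|
|     # 1) Classic least squares (the 200-year-old way)
|     w_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
|
|     # 2) The same model as a single linear "neuron", fitted by
|     #    gradient descent on the mean squared error
|     w = np.zeros(2)                            # [slope, bias]
|     for _ in range(5000):
|         grad = 2 * A.T @ (A @ w - y) / len(y)
|         w -= 0.1 * grad
|
|     print("least squares   :", w_ls)  # ~ [3.0, 1.0]
|     print("gradient descent:", w)     # converges to the same values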
| belter wrote:
| It is also odd to see such a weak argument as the brain-to-
| body mass ratio being used, as here:
| https://youtu.be/YD-9NG1Ke5Y?t=593
|
| If this metric were truly indicative, what should we make of
| the remarkable ratios found in small birds (1:12), tree
| shrews (1:10), or even small ants (1:7)?
|
| https://en.wikipedia.org/wiki/Brain%E2%80%93body_mass_ratio
| theptip wrote:
| > what should we make of the remarkable ratios found...
|
| We also can't implement those creatures' control systems in
| silicon, so they too are doing things we can learn from?
| zk4x wrote:
| What came before was regression, which is to this day the
| number one method if we want something interpretable,
| especially if we know which functions our variables follow.
| And self-attention is very similar to a correlation matrix.
| In a way, neural networks are just a bunch of regression
| models stacked on top of each other with some normalization
| and nonlinearity between them. It's cool, however, how
| closely it resembles biology.
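|
| A bare numpy sketch of that self-attention point (single
| head, no masking or multi-head plumbing; the weight shapes
| are illustrative): the n-by-n score matrix is the pairwise
| similarity table that invites the correlation-matrix
| comparison, and each output row is a weighted average of the
| value vectors.
|
|     import numpy as np
|
|     def softmax(z, axis=-1):
|         z = z - z.max(axis=axis, keepdims=True)
|         e = np.exp(z)
|         return e / e.sum(axis=axis, keepdims=True)
|
|     def self_attention(X, Wq, Wk, Wv):
|         """Single-head scaled dot-product self-attention."""
|         Q, K, V = X @ Wq, X @ Wk, X @ Wv
|         scores = Q @ K.T / np.sqrt(K.shape[-1])  # pairwise similarities
|         weights = softmax(scores, axis=-1)       # each row sums to 1
|         return weights @ V, weights
|
|     rng = np.random.default_rng(0)
|     n, d = 5, 8                                  # 5 tokens, 8-dim embeddings
|     X = rng.standard_normal((n, d))
|     Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
|     out, weights = self_attention(X, Wq, Wk, Wv)
|     print(weights.round(2))                      # the (5, 5) similarity table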
| criddell wrote:
| Is that wildly different from me calling a data structure where
| a parent node has child nodes a tree?
| wrs wrote:
| Depends -- do you then start claiming that because your data
| structure is like a tree, it's surely going to start bearing
| fruit and emitting oxygen?
| modzu wrote:
| What color are neurons? Is that relevant? ML has proven that
| artificial networks can think. The other stuff may be
| necessary to do other things, or maybe simply evolved to
| support the requisite biological structures. ML is of course
| inspired by biology, but that does not mean we need to
| simulate everything.
| chpatrick wrote:
| You don't need to simulate every atom in a planet to predict
| its orbit. A mathematical neuron could have similar function to
| a real one even if it works completely differently.
| sourcepluck wrote:
| Reading the replies to your comment, I think maybe the answer
| to your simple question is: "no". I also wonder if any "serious
| comparisons" have been made, and would be interested to read
| about it! A good question, I think.
| syassami wrote:
| https://www.bloomberg.com/news/articles/2024-12-13/liquid-ai...
| curious_cat_163 wrote:
| Not excusing the lack of caveats in his talk, but IMO the old
| adage "All models are wrong, but some are useful" applies
| here.
| sensanaty wrote:
| > just as oil is a finite resource, the internet contains a
| finite amount of human-generated content.
|
| The oil comparison is really apt. Indeed, let's boil a few more
| lakes dry so that Mr Worldcoin and his ilk can get another 3
| cents added to their net worth, totally worth it.
| seizethecheese wrote:
| I understand the oil analogy, but not your leap. What lake is
| getting boiled?
| olddog2 wrote:
| So much knowledge in the world is locked away, with empirical
| experimentation being the only way to unlock it, and compute
| can only really help that experimentation become more
| efficient.
| Something still has to run a randomized controlled trial on an
| intervention and that takes real time and real atoms to do.
| killthebuddha wrote:
| One thing he said I think was a profound understatement, and
| that's that "more reasoning is more unpredictable". I think we
| should be thinking about reasoning as in some sense _exactly the
| same thing as unpredictability_. Or, more specifically, _useful
| reasoning_ is by definition unpredictable. This framing is
| important when it comes to, e.g., alignment.
| bondarchuk wrote:
| Not necessarily true when you think about e.g. finding vs.
| verifying a solution (in terms of time complexity).
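|
| (A toy illustration with subset sum, just to make the
| asymmetry concrete: checking a proposed certificate takes a
| handful of operations, while finding one in general means
| searching an exponential space of subsets.)
|
|     from itertools import combinations
|
|     nums, target = [3, 34, 4, 12, 5, 2], 9
|
|     def verify(candidate):
|         """Checking a proposed answer: linear in its length."""
|         return sum(candidate) == target and all(x in nums for x in candidate)
|
|     def find():
|         """Finding an answer: brute-force search over all subsets."""
|         for r in range(1, len(nums) + 1):
|             for combo in combinations(nums, r):
|                 if sum(combo) == target:
|                     return combo
|         return None
|
|     print(find())           # (4, 5)
|     print(verify((4, 5)))   # True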
| killthebuddha wrote:
| IMO verifying a solution is a great example of how reasoning
| is unpredictable. To say "I need to verify this solution" is
| to say "I do not know whether the solution is correct or not"
| or "I cannot predict whether the solution is correct or not
| without reasoning about it first".
| bondarchuk wrote:
| But you will know beforehand some/a lot of properties that
| the solution will satisfy, which is a type of certainty.
| stevenhuang wrote:
| It's not clear any of that follows at all.
|
| Just look at inductive reasoning. Each step builds from a
| previous step using established facts and basic heuristics
| to reach a conclusion.
|
| Such a mechanistic process allows for a great deal of
| "predictability" at each step, or for estimating the
| likelihood that a solution is overall correct.
|
| In fact I'd go further and posit that perfect reasoning is
| 100% deterministic and systematic, and instead it's
| _creativity_ that is unpredictable.
| narrator wrote:
| Reasoning by analogy is more predictable because it is by
| definition more derivative of existing ideas. Reasoning from
| first principles though can create whole new intellectual
| worlds by replacing the underpinnings of ideas such that they
| grow in completely new directions.
| mike_hearn wrote:
| Wouldn't it be the reverse? The word unreasonable is often used
| as a synonym for volatile, unpredictable, even dangerous.
| That's because "reason" is viewed as highly predictable. Two
| people who rationally reason from the same set of known facts
| would be expected to arrive at similar conclusions.
|
| I think what Ilya is trying to get at here is more like:
| someone very smart can seem "unpredictable" to someone who is
| not smart, because the latter can't easily reason at the same
| speed or quality as the former. It's not that reason itself is
| unpredictable, it's that if you can reason quickly enough you
| might reach conclusions nobody saw coming in advance, even if
| they make sense.
| linsomniac wrote:
| ISTR reading back in the mid '90s, in a book on computing history
| which I have long since forgotten the exact name/author of,
| something along the lines of:
|
| In the mid '80s it was widely believed among AI researchers
| that AI was largely solved and just needed computing
| horsepower to grow. Because of this, AI research stalled for
| a decade or more.
|
| Considering the horsepower we are throwing at LLMs, I think there
| was something to at least part of that.
| LampCharger wrote:
| Ha. Do people understand that the time for humanity to save
| itself is running out? What is the point of having a
| superhuman AGI if there's no human civilization for it to
| help?
| HeatrayEnjoyer wrote:
| "We can totally control an entity with 10^x faster and stronger
| intelligence than us. There is no way this could go wrong, in
| fact we should spend all of our money building it as soon as
| possible."
| talldayo wrote:
| > We can totally control an entity with 10^x faster and
| stronger intelligence than us.
|
| Unless you're referencing an unreleased model that can count
| the number of 'r' occurrences in "strawberry", I don't
| even think we're dealing with .01*10^x intelligence right
| now. Maybe not even .001e, depending on how bad of a Chomsky
| apologist you are.
| ilaksh wrote:
| Larger models are more robust reasoners. Is there a limit? What
| if you made a 5 TB model trained on a lot of multimodal data,
| where the language information was fully grounded in videos,
| images, etc.? Could more robust reasoning be that simple?
| ryoshu wrote:
| It could be simpler. Humans don't need 5TB of data to reason.
| Workaccount2 wrote:
| The amount of data needed to train a human brain is enormous.
| Philpax wrote:
| Think about the sheer amount of data that you receive through
| your five senses over 570,000,000 seconds/18 years. It's a
| lot, lot more than 5TB.
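|
| Back-of-the-envelope (the per-second rates here are my own
| rough assumptions, not measurements):
|
|     seconds = 570_000_000                  # ~18 years
|     for label, bytes_per_second in [
|         ("100 KB/s (heavily compressed senses)", 100_000),
|         ("1 MB/s (roughly compressed-video rate)", 1_000_000),
|         ("10 MB/s (a generous multimodal rate)", 10_000_000),
|     ]:
|         total_tb = seconds * bytes_per_second / 1e12
|         print(f"{label}: ~{total_tb:,.0f} TB")
|
|     # 100 KB/s -> ~57 TB, 1 MB/s -> ~570 TB, 10 MB/s -> ~5,700 TB,
|     # all comfortably above 5 TB.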
| hackandthink wrote:
| What kind of reasoning is he talking about?
|
| Why should it be unpredictable? Deductive reasoning,
| inductive reasoning, abductive reasoning, analogical
| reasoning, pragmatic reasoning, moral reasoning, causal
| reasoning, counterfactual reasoning, heuristic reasoning,
| Bayesian reasoning.
|
| (List generated by ChatGPT)
| swyx wrote:
| this is stolen and reposted content. the source video is here.
| https://youtu.be/1yvBqasHLZs?si=pQihchmQG3xoeCPZ
| dang wrote:
| Ok, we've changed to that from
| https://www.youtube.com/watch?v=YD-9NG1Ke5Y. Thanks!
| error9348 wrote:
| It would be great if all NeurIPS talks were accessible for free
| like this one. I understand they generate some revenue from
| online ticket sales, but it would be a great resource. Maybe some
| big org could sponsor it.
| swyx wrote:
| they are - on a 1 month delay.
| https://slideslive.com/neurips-2023 last year.
|
| so if you have the patience, wait, if no patience, pay. fair?
|
| we did this for ai.engineer too except we believe in youtube a
| bit more for accessibility/discoverability.
| https://www.youtube.com/@aiDotEngineer/videos
| error9348 wrote:
| Wow, thanks for the correction. Didn't know this existed - to
| be fair, when I tried last year I only found a preview, and
| paid up.
| davidmurphy wrote:
| sweet! thanks for posting
| IWeldMelons wrote:
| How about NeurVA, NeurTN or NeurOLED?
| niyyou wrote:
| I'll take the risk of hurting the groupies here. But I have a
| genuine question: what did you learn from this talk? Like...
| really... what was new? or potentially useful? or insightful
| perhaps? I really don't want to sound bad-mouthed, but I'm
| sick of these prophetic talks (in this case, the tone was
| literally prophetic--with sudden high and grandiose
| pitches--and the content typically religious, full of
| beliefs and empty statements).
| niyyou wrote:
| Clarification: "pre-training data is exhausted" is something
| everyone has been saying for a while now. The graph plotting
| body mass against brain mass... what does it say exactly?
| (Where is the link to the prior point on data?) I think we
| would all benefit from being more critical here and stop
| idealizing these figures. I believe they have no more clue
| than any other average ML researcher on all these questions.
| XenophileJKO wrote:
| The other thing that bugged me is the built-in assumption
| that today's models have learned everything there is to learn
| from the Internet corpus. This is quite easy to disprove,
| both in factual retention and in metacognition about the
| context of the content.
| jebarker wrote:
| Yeah, exactly. A human can learn vastly more about, say,
| math from a much smaller quantity of text. I doubt we're
| anywhere close to exhausting the knowledge-extraction
| potential of web data.
| random3 wrote:
| What everyone could learn is to check their (and their
| communities') assumptions from not long ago. Who saw this, who
| didn't. Based on this, many can confirm their beliefs and
| others can realize they're clueless. In either case, there's
| something to be learned, but more to be learned when you
| realize you were wrong.
| 29athrowaway wrote:
| From your reaction I guess you were expecting a talk about a
| NeurIPS 2024 paper.
|
| This is a different situation. There's the "NeurIPS 2024 Test
| of Time Paper Awards" where they award a historical paper. In
| this case, a paper from 2014 was awarded and his talk is about
| that and why it passed the test of time.
|
| https://blog.neurips.cc/2024/11/27/announcing-the-neurips-20...
|
| The title chosen for the HN submission leaves out that
| important context. So that's why you are disappointed now.
| abetusk wrote:
| I'll give my take:
|
| * Before the current renaissance of neural networks (pre
| ~2014ish), it was unclear that scaling would work. That is,
| simple algorithms on lots of data. The last decade has pretty
| much addressed that critique and it's clear that scaling does
| work to a large extent, and spectacularly so.
|
| * Much of current neural network modelling and research is
| geared towards "one-shot" algorithms, doing pattern matching
| and giving an immediate result. Contrast this with approaches
| that do inference-time compute or search.
|
| * The exponential increase in power means that neural network
| models are quickly sponging up as much data as they can find
| and we're quickly running into the limits of science, art and
| other data that humans have created in the last 5k years or so.
|
| * Sutskever points out, as an analogy, that nature has created
| a better model in humans (the brain-to-body-mass ratio for
| animals), with hominids finding more efficient compute than
| other animals, even ones with much larger brains and neuron
| counts.
|
| * Sutskever is advocating for better models, presumably
| focusing more on inference-time compute.
|
| In some sense, we're coming a bit full circle where people who
| were advocating for pure scaling (simple algorithms + lots of
| data) for learning are now advocating for better algorithms,
| presumably with a focus on inference time compute (read:
| search).
|
| I agree that it's a little opaque, especially for people who
| haven't been paying attention to past and current research, but
| this message seems pretty clear to me.
|
| Noam Brown had a talk recently titled "Parables on the Power of
| Planning in AI" [0] which addresses this point more head on.
|
| I will also point out that the scaling hypothesis is closely
| related to "The Bitter Lesson" by Rich Sutton [1]. Most people
| focus on the "learning" aspect of scaling but "The Bitter
| Lesson" very clearly articulates learning _and_ search as the
| methods most amenable to compute. From Sutton:
|
| """
|
| ...
|
| Search and learning are the two most important classes of
| techniques for utilizing massive amounts of computation in AI
| research.
|
| ...
|
| """
|
| [0] https://youtube.com/watch?v=eaAonE58sLU
|
| [1]
| https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...
| abetusk wrote:
| Here's a more pithy summary:
|
| "We've made a copy of the internet, run current state of the
| art methods on it and GPT-O1 is the best we can do. We need
| better (inference/search) algorithms to make progress"
| eldenring wrote:
| He mentions this in the video, but the talk is specifically
| tailored for the "Test of Time" award. This being his 3rd year
| in a row receiving the award, I think he's earned permission
| to speak prophetically.
| 29athrowaway wrote:
| This talk is not for a 2024 NeurIPS paper.
|
| This talk is for the "NeurIPS 2024 Test of Time Paper Awards"
| where they recognize a historical paper that has aged well.
|
| https://blog.neurips.cc/2024/11/27/announcing-the-neurips-20...
|
| And the presentation is about how a 2014 paper aged. When you
| understand this context you will appreciate the talk more.
___________________________________________________________________
(page generated 2024-12-14 23:00 UTC)