[HN Gopher] Ilya Sutskever NeurIPS talk [video]
       ___________________________________________________________________
        
       Ilya Sutskever NeurIPS talk [video]
        
       Author : mfiguiere
       Score  : 240 points
       Date   : 2024-12-14 00:49 UTC (22 hours ago)
        
 (HTM) web link (www.youtube.com)
 (TXT) w3m dump (www.youtube.com)
        
       | skissane wrote:
       | > "Pre-training as we know it will unquestionably end," Sutskever
       | said onstage.
       | 
       | > "We've achieved peak data and there'll be no more."
       | 
       | > During his NeurIPS talk, Sutskever said that, while he believes
       | existing data can still take AI development farther, the industry
       | is tapping out on new data to train on. This dynamic will, he
       | said, eventually force a shift away from the way models are
       | trained today. He compared the situation to fossil fuels: just as
       | oil is a finite resource, the internet contains a finite amount
       | of human-generated content.
       | 
       | > "We've achieved peak data and there'll be no more," according
       | to Sutskever. "We have to deal with the data that we have.
       | There's only one internet."
       | 
       | What will replace Internet data for training? Curated synthetic
       | datasets?
       | 
       | There are massive proprietary datasets out there which people
       | avoid using for training due to copyright concerns. But if you
       | actually own one of those datasets, that resolves a lot of the
       | legal issues with training on it.
       | 
       | For example, Getty has a massive image library. Training on it
       | would risk Getty suing you. But what if Getty decides to use it
       | to train their own AI? Similarly, what if News Corp decides to
       | train an AI using its publishing assets (Wall Street Journal,
       | HarperCollins, etc)?
        
         | _aavaa_ wrote:
         | > just as oil is a finite resource, the internet contains a
         | finite amount of human-generated content.
         | 
         | I guess now they're being explicit about the blatantly
         | extractive nature of these businesses and their models.
        
         | kibae wrote:
         | I always suspected that bots on Reddit were used to gain karma
         | and then eventually sell the account, but maybe they're also
         | being used for some kind of RLHF.
        
         | vitorgrs wrote:
          | Not sure if this was a good example. Getty already licenses
          | their images to Nvidia.
          | 
          | And they already have a generative image service... I believe
          | it's powered by an Nvidia model.
        
         | popularonion wrote:
         | > What will replace Internet data for training? Curated
         | synthetic datasets?
         | 
         | Enter Neuralink
        
           | zxexz wrote:
           | Really not sure what you mean by this, could you explain?
        
             | Gigachad wrote:
              | AI can just suck up the content of people's brains for
              | training data.
        
               | zxexz wrote:
                | Yeah, people will go crazy for GPT-o2 trained on the
                | readings of sensors "barely embedded" in the brains of
                | tortured monkeys, for sure.
               | 
               | EDIT: This comment may have been a bit too sassy. I get
               | the thought behind the original comment, but I personally
               | question the direction and premise of the Neuralink
               | project, and know I am not alone in that regard. That
               | being said, taking a step back, there for sure are plenty
               | of rich data sources for non-text multimodal data.
        
               | phillipcarter wrote:
               | You need to go back to Twitter with low-quality posts
               | like this.
        
         | YetAnotherNick wrote:
          | Humans don't need trillions of tokens to reason, or the
          | ability to know what they know. While a certain part of it
          | comes from evolution, I think we have already matched the part
          | that came from evolution using internet data, like basic
          | language skills and basic world modelling. Current pretraining
          | takes a lot more data than a human would, and you don't need
          | to look at all the Getty images to draw a picture, and neither
          | would a self-aware/improving model (whatever that means).
          | 
          | To reach expert level in any field, just training on next
          | tokens from internet data, or any data, is not the solution.
        
           | exe34 wrote:
            | > Humans don't need trillions of tokens
           | 
            | I wonder about that. We can fine-tune on calculus with far
            | fewer tokens, but I'd be interested in some calculations of
            | how many tokens evolution provides us (it's not about the DNA
            | itself, but all the other things that were explored and
            | discarded and are now out of reach) - but also the sheer
            | amount of physics learnt by a baby by crawling around and
            | putting everything in its mouth.
        
             | YetAnotherNick wrote:
              | Yes, as I said in the last comment. With current training
              | techniques, one internet's worth of data is enough to give
              | models what evolution gives us. For further training, I
              | believe we would need different techniques to make the
              | model self-aware of its own knowledge.
              | 
              | Also, I believe a person who is blind and paralyzed for
              | life could still attain knowledge if educated well enough.
              | (Can't find any study here, tbh.)
        
               | exe34 wrote:
                | Yeah, blind and paralysed from birth - I'm doubtful that
                | hearing alone would give you the physics training.
                | Although if it can be done, then it means the
                | evolutionary pre-training is even more impressive.
        
         | menaerus wrote:
         | > What will replace Internet data for training? Curated
         | synthetic datasets?
         | 
          | Perhaps a different take on this: if I wanted to train a
          | "state law" LLM that is exceedingly good at interpreting state
          | law, what are the obstacles to downloading all the law and
          | regulation material for a given state and training an LLM on
          | it until it is at the 95th percentile of all law trainees and
          | lawyers?
          | 
          | My point is that in that case we already don't need an
          | "Internet". We just need a sufficiently sized and curated
          | domain-specific dataset, and the result we can get is already
          | scary. "State law" LLM was just an example, but the same logic
          | applies to basically any other domain - want a domain-specific
          | (LLM) expert? Train it.
        
           | pas wrote:
            | You need context for the dry statutes.
            | 
            | Sure, you download all the legal arguments, and hope that
            | putting all this on top of a general LLM, which has enough
            | context to deal with the usual human, American, contemporary
            | stuff, is enough.
            | 
            | The argument, as far as I understand it, is that it's not
            | really enough for the next jump (as it would need
            | "exponentially" more data).
        
             | menaerus wrote:
              | I don't understand the limitation, e.g. how much data do
              | you need to train a "state law" specific LLM that doesn't
              | know anything else but that?
              | 
              | Such an LLM does not need to have 400B parameters since
              | it's not a general-knowledge LLM, but perhaps I'm wrong on
              | this(?). So my point is rather that it may very well be,
              | say, a 30B-parameter LLM, which in turn means that we
              | might have just enough data to train it. Larger contexts
              | in smaller models are a solved problem.
        
               | petesergeant wrote:
                | > how much data do you need to train a "state law"
                | specific LLM that doesn't know anything else but that?
                | 
                | Law doesn't exist in a vacuum. You can't have a useful
                | LLM for state law that doesn't have an exceptional
                | grounding in real-world objects and mechanics.
                | 
                | You could force a bright young child to memorize a large
                | text, but without a strong general model of the world,
                | they're just regurgitating words rather than being able
                | to reason about it.
        
               | menaerus wrote:
                | Counter-argument: code does not exist in a vacuum, yet
                | we have small and mid-sized LLMs that can already output
                | reasonable code.
        
               | petesergeant wrote:
               | Generally they've been distilled from much larger models,
               | but also, code is a much smaller domain than the law.
        
               | noirbot wrote:
               | Code is both much smaller as a domain _and_ less prone to
               | the chaos of human interpretation. There are many factors
               | that go into why a given civil or criminal case in court
               | turns out how it does, and often the biggest one is not
               | "was it legal". Giving a computer access to the full
               | written history of cases doesn't give you any of the
               | context of why those cases turned out. A judge or jury
               | isn't going to include in the written record that they
               | just really didn't like one of the lawyers. Or that the
               | case settled because one of the parties just couldn't
               | afford to keep going. Or that one party or the other
               | destroyed/withheld evidence.
               | 
               | Generally speaking, your compiler won't just decide not
               | to work as expected. Tons of legal decisions don't
               | actually follow the law as written. Or even the precedent
               | set by other courts. And that's even assuming the law and
               | precedent are remotely clear in the first place.
        
               | zozbot234 wrote:
               | A model that's trained on legal decisions can still be
               | used to _explore_ these questions, though. The model may
               | end up being uncertain about which way the case will go,
               | or even more strikingly, it may be confident about the
               | outcome of a case that then is decided differently, and
                | you can try and figure out what's going on with such
               | cases.
        
               | sharih wrote:
                | Legal reasoning involves applying facts to the law, and
                | it needs knowledge of the world. The expertise of a
                | professional is in picking the right/winning path based
                | on their study of the law, the facts and their real-world
                | training. The money is in codifying that to teach models
                | to do the same.
        
               | noirbot wrote:
                | But what value does that have? The difference between an
                | armchair lawyer and a real actual lawyer is in knowing
                | when something is legal/illegal but unlikely to be seen
                | that way in a court or brought to a favorable verdict.
                | It's knowing which cases you can actually win, and how
                | much it'll cost and why.
               | 
               | Most of that is not in scope of what an LLM could be
               | trained on, or even what an LLM would be good at. What
               | you're training in that case would be someone who's an
               | opinion columnist or twitter poster. Not an actual
               | lawyer.
        
               | mkoryak wrote:
               | I'm going to push back on "produce reasonable code".
               | 
               | I've seen reasonable code written by AI, and also code
               | that looks reasonable but contains bugs and logic errors
               | that can be found if you're an expert in that type of
               | code.
               | 
               | In other words, I don't think we can rely solely on AI to
               | write code.
        
               | theptip wrote:
               | For a "legal LLM" you need three things: general IQ /
               | common sense at a substantially higher level than
               | current, understanding of the specific rules, and
               | hallucination-free recall of the relevant legal
               | facts/cases.
               | 
               | I think it's reasonable to assume you can get 2/3 with a
               | small corpus IF you have an IQ 150 AGI. Empirically the
               | current known method for increasing IQ is to make the
               | model bigger.
               | 
               | Part of what you're getting at is possible though, once
               | you have the big model you can distill it down to a
               | smaller number of parameters without losing much
               | capability in your chosen narrow domain. So you forget
               | physics and sports but remain good at law. That doesn't
               | help you with improving the capability frontier though.
        
               | pas wrote:
               | And then your Juris R. Genius gets a new case about two
               | Red Socks fans getting into a fight and without missing a
                | beat starts blabbering about how overdosing on too much
                | red pigment from the undergarments caused their rage!
        
           | losvedir wrote:
           | That's kind of going in a different direction. The big
           | picture is that LLMs have until this point gotten better and
           | better from larger datasets alone. See "The Bitter Lesson".
            | But now we're running out of datasets, and so the only way
            | we know of to improve models' reasoning abilities is coming
            | to an end.
           | 
            | You're talking about fine-tuning, which yes is a technique
            | that's being used and explored in different domains, but my
            | understanding is it's not a very good way for models to
            | acquire knowledge. Instead, larger context windows and RAG
            | work better for something like case law. Fine-tuning works
            | for things like giving models a certain "voice" in how they
            | produce text, and general alignment things.
           | 
           | At least that's my understanding as an interested but not
           | totally involved follower of this stuff.
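            | 
            | To make the RAG point concrete, here's a rough sketch of the
            | pattern in Python. The retrieval is a toy word-overlap
            | score, and llm() is a stand-in for whatever chat model you
            | call, so this is just the shape of the idea:
            | 
            |     # toy retrieval: rank documents by word overlap with the query
            |     def score(query, doc):
            |         q = set(query.lower().split())
            |         d = set(doc.lower().split())
            |         return len(q & d) / (len(q) or 1)
            | 
            |     docs = ["Case A: oral leases under one year held valid ...",
            |             "Case B: statute of frauds requires a writing ..."]
            |     query = "Is a verbal lease enforceable?"
            |     top = sorted(docs, key=lambda d: score(query, d), reverse=True)
            |     prompt = ("Answer using only these excerpts:\n"
            |               + "\n".join(top[:2]) + "\nQ: " + query)
            |     # answer = llm(prompt)  # knowledge comes from the retrieved
            |     #                       # text, not from the model's weights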
        
             | kranke155 wrote:
             | A human being doesn't need to read the entire internet to
             | pass the state bar.
             | 
              | Seems to me that we need new ideas?
        
           | yeahwhatever10 wrote:
           | The problem remains the size of the dataset. You aren't going
           | to get large enough datasets in these specific domains.
        
           | sharih wrote:
            | The big frontier models have already memorized/trained on
            | all laws, regulations and cases, given that they are public.
            | The real advancement is in experts codifying their
            | expertise/reasoning for models to learn from. Legal is no
            | different from other fields in this.
        
         | fidotron wrote:
         | > What will replace Internet data for training? Curated
         | synthetic datasets?
         | 
         | My take is that the access Meta, Google etc. have to extra data
         | has reduced the amount of research into using synthetic data
         | because they have had such a surplus of it relative to everyone
         | else.
         | 
         | For example, when I've done training of object detectors (quite
         | out of date now) I used Blender 3D models, scripts to adjust
         | parameters, and existing ML models to infer camera calibration
         | and overlay orientation. This works amazingly well for
         | subsequently identifying the real analogue of the object, and I
         | know of people doing vehicle training in similar ways using
         | game engines.
         | 
          | There were several surprising tactical details to all this
          | which push the accuracy up dramatically and which you don't
          | see discussed too widely, like ensuring that things which are
          | not relevant are properly randomized in the training set, such
          | as the surface texture of the 3D models (i.e. putting random
          | fractal patterns on the object for training improves how
          | robust the object detector is to disturbance in reality).
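          | 
          | To illustrate the texture-randomization part, here is a rough
          | Python sketch. The fractal-noise generator is plain numpy; the
          | apply_texture / render_view / save_sample calls are
          | hypothetical stand-ins for whatever renderer you drive
          | (Blender, a game engine, etc.), since only the randomization
          | pattern is the point:
          | 
          |     import numpy as np
          | 
          |     def random_fractal_texture(size=256, octaves=5):
          |         # sum of upsampled noise at several scales: a cheap fractal pattern
          |         tex = np.zeros((size, size))
          |         for o in range(octaves):
          |             s = 2 ** (o + 2)                      # 4, 8, 16, 32, 64
          |             up = np.ones((size // s, size // s))
          |             tex += np.kron(np.random.rand(s, s), up) / (o + 1)
          |         return tex / tex.max()
          | 
          |     for i in range(10000):
          |         texture = random_fractal_texture()
          |         # hypothetical renderer calls: put the random pattern on the
          |         # target object, randomize the camera, render, save image+bbox
          |         # apply_texture(target_object, texture)
          |         # image, bbox = render_view(camera_pose=random_pose())
          |         # save_sample(image, bbox, "sample_%05d" % i)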
        
         | robg wrote:
         | The ones that stand out to me are industries like
         | pharmaceuticals and energy exploration, where the data silos
         | are the point of their (assumed) competitive advantages. Why
          | even the playing field by opening up those datasets when
          | keeping them closed locks in potential discoveries? Open data
          | is the basis
         | of the Internet. But whole industries are based on keeping
         | discoveries closely guarded for decades.
        
         | seydor wrote:
         | Robots can acquire data on their own (hopefully not via human
         | dissection)
        
         | parkaboy wrote:
         | I wonder if we will see (or already are/have been seeing) the
         | XR/smart glasses space heat up. Seems eventually like a great
         | way to generate and hoover up massive amounts of fresh training
         | data.
        
         | RicoElectrico wrote:
          | I think we're not close to running out of training data. It's
          | just that we would like the knowledge, but not necessarily the
          | behavior, of said texts. LLMs are very bad at recalling popular
          | memes (known by any seasoned netizen) if they had no press
          | coverage. Maybe training on 4chan isn't as pointless if you
          | could make the model memorize it, but not imitate it.
         | 
         | Also, what about movie scripts and song lyrics? Transcripts of
         | well known YouTube videos? Hell, television programs even.
        
           | stavros wrote:
           | We've run out of training data that definitely did not
           | contain LLM outputs.
        
             | DAGdug wrote:
             | What about non-text modalities - image and video,
             | specifically?
        
               | riffraff wrote:
               | video is probably still fine, but images sourced from the
               | internet now contain a massive amount of AI slop.
               | 
               | It seems, for example, that many newsletters, blogs etc
               | resort to using AI-generated images to give some color to
               | their writings (which is something I too intended to do,
               | before realizing how annoyed I am by it)
        
           | gcollard- wrote:
           | All the publicly accessible sources you mentioned have
           | already been scraped or licensed to avoid legal issues. This
           | is why it's often said, "there's no public data left to train
           | on."
           | 
           | For evidence of this, consider observing non-English-speaking
           | young children (ages 2-6) using ChatGPT's voice mode. The
           | multimodal model frequently interprets a significant portion
           | of their speech as "thank you for watching my video,"
           | reflecting child-like patterns learned from YouTube videos.
        
         | zozbot234 wrote:
         | Synthetic datasets are useless (other than for _very_ specific
         | purposes, such as enforcing known strong priors, and even then
          | it's way better to do it directly by changing the
         | architecture). You're better off spending that compute by
         | making multiple passes over the data you do have.
        
           | HeatrayEnjoyer wrote:
           | This is contrary to what the big AI labs have found.
           | Synthetic data is the new game in town.
        
             | kranke155 wrote:
             | Ilya is saying it doesn't work in this talk apparently.
        
           | toxik wrote:
           | Most priors are not encodable as architecture though, or only
           | partially.
        
         | oldgradstudent wrote:
         | > There are massive proprietary datasets out there which people
         | avoid using for training due to copyright concerns.
         | 
         | The main legal concern is their unwillingness to pay to access
         | these datasets.
        
           | zozbot234 wrote:
           | Yup, there's also a huge amount of copyright-free, public
           | domain content on the Internet which just has to be
           | transcribed, and would provide plenty of valuable training to
            | an LLM on all sorts of varied language use. (Then you could
           | use RAG over some trusted set of data to provide the bare
           | "facts" that the LLM is supposed to be talking about.) But
           | guess what, writing down that content accurately from scans
           | costs money (and no, existing OCR is nowhere near good
           | enough), so the job is left to purely volunteer efforts.
        
         | numpad0 wrote:
         | > What will replace Internet data for training?
         | 
          | It means unlimited scaling with Transformer LLMs is over. They
          | need a new architecture that scales better. Internet data
          | respawns when they click [New Game...] - the oil analogy is an
          | analogy and not a fact - but anyway the total amount available
          | in a single game is finite, so combustion efficiency matters.
        
       | neom wrote:
       | Full talk is interesting:
       | https://www.youtube.com/watch?v=YD-9NG1Ke5Y
        
         | CuriousSkeptic wrote:
          | On the slide of the body/brain weight relation he highlighted
          | the hominids' difference in scaling.
          | 
          | What he didn't mention, which I found interesting, was that
          | the same slide also highlighted a hard ceiling for non-
          | hominids at the same point.
        
           | imranhou wrote:
            | This is a very interesting point. In some ways the implicit
            | belief is that we just need to get beyond the 700g
            | limitation in scaling LLMs and we would get human
            | intelligence/superintelligence. I admit I didn't really get
            | the body/brain analogy; I would have been better satisfied
            | with a simpler graph of brain weight vs. intelligence with a
            | scaling barrier at 700g.
        
       | stretchwithme wrote:
       | AIs will need to start asking people questions. Should make for
       | some very strange phone calls.
        
         | wslh wrote:
         | That's a good point. I think most people use LLMs by asking
         | questions and receiving answers. But if you reverse the dynamic
         | and have the LLM interview you instead, where you simply
         | respond to its questions, you'll notice something interesting:
         | the LLM as an interviewer is far less "smart" than it is when
         | simply providing answers. I've tried it myself, and the
         | interview felt more like interacting with ELIZA [1].
         | 
          | There seemed to be a lack of intent when the LLM was the one
          | asking the questions. This creates a reverse dynamic, where
          | you become the one being "prompted", and this dynamic could be
          | worth studying or adjusting further.
         | 
         | [1] https://en.wikipedia.org/wiki/ELIZA
        
           | airstrike wrote:
           | Which LLM did you perform that test with?
        
             | wslh wrote:
             | ChatGPT Pro.
        
               | afro88 wrote:
               | That's not an LLM, that's a subscription plan. You can
               | select any OpenAI LLM on ChatGPT Pro.
               | 
               | You can share the chat here, and this will show the LLM
               | you had selected for the conversation. The initial prompt
               | is also pretty important. For claims like current LLMs
               | feel like conversing with Eliza, you are most definitely
               | missing something in how you're going about it.
               | 
               | Advanced voice mode will give you better results for
               | conversations too. It seems to be set up to converse
               | rather than provide answers or perform work for you. No
               | initial prompt, model selection or setup required
        
           | Barrin92 wrote:
           | >There seemed to be a lack of intent when the LLM was the one
           | asking the questions
           | 
            | There doesn't just seem to be a lack of intent, there is no
            | intent, because by the nature of the architecture these
            | systems are just a set of weights with a Python script
            | attached to them, asking for one more token over and over.
            | 
            | There are no needs, drives, motivations, desires or any
            | other parts of the cognitive architecture of humans in there
            | that produce genuine intent.
        
       | zxexz wrote:
       | I can't help but feel that this talk was a lot of...fluff?
       | 
       | The synopsis, as far as my tired brain can remember:
       | 
       | - Here's a brief summary of the last 10 years
       | 
        | - We're reaching the limit of our scaling laws, because we've
        | trained on all the data we have available on the internet
       | 
       | - Some things that may be next are "agents", "synthetic data",
       | and improving compute
       | 
       | - Some "ANNs are like biological NNs" rehash that would feel
        | questionable if there _was_ a thesis (which there wasn't?
       | something about how body mass vs. brain mass are positively
       | correlated?)
       | 
        | - 3 questions: the first was something about "hallucinations"
        | and whether a model would be able to understand if it is
        | hallucinating? Then something that involved cryptocurrencies,
        | and then a _slightly_ interesting question about multi-hop
        | reasoning
        
         | coeneedell wrote:
          | I attended this talk in person and some context is needed. He
          | was invited for the "test of time" talk series, which explains
          | the historical part of the talk. I think his general persona
          | and association with AI led to the fluffy speculation at the
          | end.
          | 
          | I notice that Ilya wants to talk about these out-there
          | speculative topics but defends himself with statements like
          | "I'm not saying when or how, just that it will happen", which
          | makes his arguments impossible to address. Stuff like this
          | openly invites the crazies to interact with him, as seen with
          | the cryptocurrency question at the end.
         | 
         | Right before this was a talk reviewing the impact of GANs that
         | stayed on topic for the conference session throughout.
        
           | mrbungie wrote:
            | I mean, he repeatedly gave some hints (even if just for the
            | lulz and not seriously) that the audience is at least
            | partially composed of people with little technical
            | background, or AI bros. An example is when he mentioned
            | LSTMs and said "many of you may never seen before". Even if
            | he didn't mean it, ironically it ended up being spot on when
            | the crypto question came.
        
         | killerstorm wrote:
         | Well, it looks like the entire point was "you can no longer
         | expect a capability gain from a model with a bigger ndim
         | trained on a bigger internet dump".
         | 
          | That's just one sentence, but it's pretty important. And while
          | many people already know this, it's important to hear
          | Sutskever say it, so people know it's common knowledge.
         | 
         | The rest is basically intro/outro.
        
           | mrbungie wrote:
           | It is very important to have at least some kind of
           | counterweight vs OpenAI/sama predicting AGI for 2025/2026.
        
             | throwaway314155 wrote:
             | Ilya very much had the same optimism about AGI during his
             | time at OpenAI from my understanding.
        
           | lottin wrote:
           | He makes a good point. But then he jumps to "models will be
           | self-aware". I fail to see any logical connection.
        
             | killerstorm wrote:
             | But they are self-aware, in fact it's impossible to make a
             | good AI assistant which isn't: it has to know that it's an
             | AI assistant, it has to be aware of its capabilities,
             | limitations, etc.
             | 
             | I guess you're interpreting "self-awareness" in some
             | mythical way, like a soul. But in a trivial sense, they
              | are. Perhaps not to the same extent as humans: models do not
             | experience time in a continuous way. But given that it can
             | maintain a dialogue (voice mode, etc), it seems to be
             | phenomenologically equivalent.
        
           | KuriousCat wrote:
            | So, are we headed towards a bittersweet zone where modeling
            | is going to get more prominent once again? Are massive
            | datasets going to take a backseat?
        
       | sigmar wrote:
        | I found this week's DeepMind podcast with Oriol Vinyals to be on
       | similar topics as this talk (current situation of LLMs, path
       | ahead with training) but much more interesting:
       | https://pca.st/episode/0f68afd5-2b2b-4ce9-964f-38193b7e8dd3
        
       | sega_sai wrote:
        | Very thought-provoking. One thing was not clear to me: what does
        | he mean by 'agentic' intelligence?
        
         | kgeist wrote:
         | When it autonomously performs tasks on behalf of the user,
         | without their intervention. It sets goals, plans actions, etc.
         | by itself.
        
         | ototot wrote:
         | Example of 'agentic': https://blog.google/technology/google-
         | deepmind/google-gemini...
        
         | ethbr1 wrote:
         | It's the current industry buzzword for action-oriented LLMs.
         | 
          | Instead of just generating text, it's about
          | tightening/formalizing the loop around planning, executing,
          | analyzing results, and replanning.
         | 
         | As far as buzzwords go, it's far from the worst, as it captures
         | the essentials -- creating semi-autonomous agents.
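          | 
          | In pseudo-Python, the loop people usually mean looks roughly
          | like this (llm() and run_tool() are stand-ins for a chat model
          | and a tool executor, not any particular API):
          | 
          |     def agent(goal, llm, run_tool, max_steps=10):
          |         history = ["Goal: " + goal]
          |         for _ in range(max_steps):
          |             # plan: ask the model for the next action given the history
          |             prompt = "Pick the next action, or say DONE:\n"
          |             plan = llm(prompt + "\n".join(history))
          |             if plan.strip() == "DONE":
          |                 break
          |             result = run_tool(plan)   # execute: search, code, API call
          |             # analyze: record the result so the next plan can use it
          |             history.append("Action: " + plan + "\nResult: " + str(result))
          |         return llm("Summarize the outcome:\n" + "\n".join(history))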
        
           | ed wrote:
           | Based on the context Ilya is not referring to that kind of
           | agent. He's referring to something more fundamental (which I
           | was curious about, too).
        
       | legel wrote:
       | I'm glad Ilya starts the talk with a photo of Quoc Le, who was
       | the lead author of a 2012 paper on scaling neural nets that
       | inspired me to go into deep learning at the time.
       | 
       | His comments are relatively humble and based on public prior
       | work, but it's clear he's working on big things today and also
       | has a big imagination.
       | 
       | I'll also just say that at this point "the cat is out of the
       | bag", and probably it will be a new generation of leaders -- let
       | us all hope they are as humanitarian -- who drive the future of
       | AI.
        
         | chipsrafferty wrote:
          | Literally a zero chance that the new generation of leaders of
          | artificial intelligence will be humanitarian.
        
         | mrbungie wrote:
         | Let us all hope that they will be as humanitarian as they can
         | be, but let's not forget they are still just human beings.
        
       | tikkun wrote:
       | As context on Ilya's predictions given in this talk, he predicted
       | these in July 2017:
       | 
       | > Within the next three years, robotics should be completely
       | solved [wrong, unsolved 7 years later], AI should solve a long-
       | standing unproven theorem [wrong, unsolved 7 years later],
       | programming competitions should be won consistently by AIs
       | [wrong, not true 7 years later, seems close though], and there
       | should be convincing chatbots (though no one should pass the
       | Turing test) [correct, GPT-3 was released by then, and I think
       | with a good prompt it was a convincing chatbot]. In as little as
       | four years, each overnight experiment will feasibly use so much
       | compute capacity that there's an actual chance of waking up to
       | AGI [didn't happen], given the right algorithm -- and figuring
       | out the algorithm will actually happen within 2-4 further years
       | of experimenting with this compute in a competitive multiagent
       | simulation [didn't happen].
       | 
       | Being exceptionally smart in one field doesn't make you
       | exceptionally smart at making predictions about that field. Like
       | AI models, human intelligence often doesn't generalize very well.
        
         | padolsey wrote:
         | >exceptionally smart at making predictions
         | 
         | Is anyone though? Genuine question. I don't have much faith in
         | predictions anymore.
        
           | qeternity wrote:
           | No, very few for things with this much uncertainty.
           | 
           | Most of it is survivorship bias: if you have a million people
           | all making predictions with coin flip accuracy, somebody is
           | going to get a seemingly improbable number correct.
        
           | exe34 wrote:
           | so your prediction is that most predictions will be wrong?
        
             | bobbruno wrote:
             | A common saying in the stats field goes like this:
             | 
             | "Predictions are hard, especially about the future".
        
           | ethbr1 wrote:
           | Predictions predicated on technological advancement are
           | tricky: there's a reason breakthroughs are called
           | breakthroughs.
        
         | _giorgio_ wrote:
          | He just wanted money from investors; that's why he used such
          | short timelines.
         | 
         | https://openai.com/index/elon-musk-wanted-an-openai-for-prof...
         | 
         | > 2/3/4 will ultimately require large amounts of capital. If we
         | can secure the funding, we have a real chance at setting the
         | initial conditions under which AGI is born.
        
           | InkCanon wrote:
           | For all the discussion about it, this is the simple answer.
           | It's not an engineering or scientific prediction, it's a line
           | from a pitch deck.
        
             | noirbot wrote:
              | But isn't that part of the problem? The public statements
              | of some of the brightest minds in the field are filtered
              | by their need to lie in order to con the rich into funding
              | their work. This leaves actual honest discussions of
              | what's possible on what timelines to come mostly from
              | people who aren't working directly in the field, which
              | skews towards people skeptical of it.
              | 
              | Most of the people who could make an engineering
              | prediction with any level of confidence or insight are
              | locked up in businesses where doing so publicly would be
              | disastrous to their funding, so we get fed hype that ends
              | up falling flat again and again.
        
               | trescenzi wrote:
               | The opposite of this is also really interesting.
               | Seemingly the people with money are happy to be fed these
               | crazy predictions regardless of their accuracy. A
               | charitable reading is they temper them and say "ok it's
               | worth X if it has a 5% chance of being correct" but the
               | past 4 years have made that harder for me to believe.
        
               | noirbot wrote:
               | To be honest, I think some of it is what you suggest - a
               | gamble on long odds, but I think the bigger issue is just
               | a carelessness that comes with having more money than you
               | can ever effectively spend in your life if you tried. If
               | you're so rich you could hand everyone you meet $100 and
               | not notice, you have nothing in your life forcing you to
               | care if you're making good decisions and not being
               | conned.
               | 
               | It certainly doesn't help that so many of the people who
               | are that rich got that rich by conning other people this
               | exact way. It's an incestuous cycle of con-artists who
               | think they're geniuses, and the media only slavishly
                | supports that by treating them as such.
        
         | sangnoir wrote:
          | It is important to note the context: it was in a private email
          | to an investor with vested interests in those fields, and
          | someone who is also prone to giving over-optimistic timelines
          | ("Robo-taxis will be here next year, for sure" since 2015).
        
       | _giorgio_ wrote:
       | What a stupid talk.
       | 
        | They gave 15 minutes to one of the most competent scientists.
       | 
       | A joke.
        
       | ldenoue wrote:
       | LLM corrected transcript (using Gemini Flash 8B over the raw
       | YouTube transcript)
       | https://www.appblit.com/scribe?v=YD-9NG1Ke5Y#0
        
         | oezi wrote:
         | How do you prevent Gemini from just swallowing text after some
         | time?
         | 
         | Audio transcript correction is one area where I struggle to see
         | good results from any LLM unless I chunk it to no more than one
         | or two pages.
         | 
         | Or did you use any tool?
        
       | belter wrote:
       | It's surprising that some prominent ML practitioners still liken
       | transformer 'neurons' to actual biological neurons...
       | 
        | Real neurons rely on spiking, ion gradients, complex dendritic
        | trees, and synaptic plasticity governed by intricate biochemical
        | processes, none of which apply to the simple, differentiable
        | linear layers and pointwise nonlinearities in transformers.
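        | 
        | For contrast, the entire artificial "neuron" story fits in a few
        | lines of numpy (a toy sketch of one transformer MLP block,
        | ignoring attention, residuals and training):
        | 
        |     import numpy as np
        | 
        |     x = np.random.randn(512)            # input activations
        |     W1 = np.random.randn(2048, 512)     # learned weights
        |     W2 = np.random.randn(512, 2048)
        |     h = np.maximum(W1 @ x, 0)           # linear layer + pointwise ReLU
        |     y = W2 @ h                          # that's the whole "neuron"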
       | 
       | Are there any reputable neuroscientists or biologists endorsing
       | such comparisons, or is this analogy strictly a convention
       | maintained by the ML community? :-)
        
         | martindbp wrote:
         | You have to remember what came before 2012: SVMs, Random
         | Forests etc, absolutely nothing like the brain (yes, NNs are
         | old, but 2012 was the start of the deep learning revolution).
         | With this frame of reference, the brain and neural networks are
          | both a kind of Connectionism with similar properties, and I
          | think it makes perfect sense to liken them to each other, and
          | to draw inspiration from one and apply it to the other.
        
           | signa11 wrote:
            | Sorry, but I think neural networks came way before 2012,
            | notably the works of Rumelhart, McClelland etc. See the two-
            | volume "Parallel Distributed Processing" to read almost all
            | about it.
           | 
           | the book(s):
           | https://direct.mit.edu/books/monograph/4424/Parallel-
           | Distrib...
           | 
           | a-talk: https://www.youtube.com/watch?v=yQbJNEhgYUw
        
             | mcshicks wrote:
             | Jets and Sharks!
             | 
             | https://github.com/acmiceli/IACModel
        
             | FL33TW00D wrote:
             | I raise you Warren McCulloch in 1962:
             | https://www.youtube.com/watch?v=wawMjJUCMVw
        
             | martindbp wrote:
             | I knew someone would bring it up, which is why I added
             | "(yes, NNs are old, but 2012 was the start of the deep
             | learning revolution)"
        
               | versteegen wrote:
                | 2012 was when the revolutionaries stormed the Bastille
                | and overthrew the old guard. But I say it was 2006 when
                | the revolution started, when the manifesto was published:
                | deep NNs can be trained end-to-end, learning their own
                | features [1]. I think this is when "Deep Learning" became
                | a term of art, and the paper has 24k citations.
                | (Interestingly, in a talk Hinton gave at Vector two weeks
                | ago, he said his paper on deep learning at NIPS 2006 was
                | rejected because they already had one.)
               | 
               | [1] G. E. Hinton and R. R. Salakhutdinov, 2006, Science,
               | Reducing the Dimensionality of Data with Neural Networks
        
           | zitterbewegung wrote:
            | Neural networks are 200 years old (Legendre and Gauss defined
            | feed-forward neural networks). The real difference between
            | traditional ones and deep learning is a hierarchy of layers
            | (hidden layers) which do different things to accomplish a
            | goal. Even the concept of training is to provide weights for
            | the neural network, and there are many algorithms for
            | refinement, optimization and network design.
        
             | varjag wrote:
              | Gauss did not define feed-forward neural networks; it all
              | stems from a tweet by a very confused person.
        
             | mrbungie wrote:
             | I mean, sure, you can model a simple linear regression
             | fitted via Least Squares (pretty much what they did 200
             | years ago) with a one hidden layer feed-fwd Neural Network,
              | but the theoretical framework for NNs is quite different.
        
           | belter wrote:
           | It is also odd to see such a weak argument as the brain-to-
           | body mass ratio being used, as here:
           | https://youtu.be/YD-9NG1Ke5Y?t=593
           | 
           | If this metric were truly indicative, what should we make of
           | the remarkable ratios found in small birds (1:12), tree
           | shrews (1:10), or even small ants (1:7)?
           | 
           | https://en.wikipedia.org/wiki/Brain%E2%80%93body_mass_ratio
        
             | theptip wrote:
             | > what should we make of the remarkable ratios found...
             | 
             | We also can't implement those creatures' control systems in
             | silicon, so they too are doing things we can learn from?
        
           | zk4x wrote:
            | What came before was regression, which is to this day the
            | number one method if we want something interpretable,
            | especially if we know which functions our variables follow.
            | And self-attention is very similar to a correlation matrix.
            | In a way, neural networks are just a bunch of regression
            | models stacked on top of each other with some normalization
            | and nonlinearity between them. It's cool, however, how
            | closely it resembles biology.
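            | 
            | For what it's worth, the correlation-matrix flavor is easy
            | to see in a few lines of numpy (toy sizes, no learned Q/K/V
            | projections, so this is the degenerate Q=K=V=X case rather
            | than a real layer):
            | 
            |     import numpy as np
            | 
            |     n, d = 5, 8                     # sequence length, embedding dim
            |     X = np.random.randn(n, d)       # token embeddings
            |     scores = X @ X.T / np.sqrt(d)   # n x n token-token similarity
            |     weights = np.exp(scores)
            |     weights /= weights.sum(axis=1, keepdims=True)  # softmax rows
            |     out = weights @ X               # each token: weighted mix of the others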
        
         | criddell wrote:
         | Is that wildly different from me calling a data structure where
         | a parent node has child nodes a tree?
        
           | wrs wrote:
           | Depends -- do you then start claiming that because your data
           | structure is like a tree, it's surely going to start bearing
           | fruit and emitting oxygen?
        
         | modzu wrote:
          | What color are neurons? Is that relevant? ML has proven that
          | artificial networks can think. The other stuff may be necessary
          | to do other things, or maybe it simply evolved to support the
          | requisite biological structures. ML is of course inspired by
          | biology, but that does not mean we need to simulate everything.
        
         | chpatrick wrote:
         | You don't need to simulate every atom in a planet to predict
         | its orbit. A mathematical neuron could have similar function to
         | a real one even if it works completely differently.
        
         | sourcepluck wrote:
         | Reading the replies to your comment, I think maybe the answer
         | to your simple question is: "no". I also wonder if any "serious
         | comparisons" have been made, and would be interested to read
         | about it! A good question, I think.
        
         | syassami wrote:
         | https://www.bloomberg.com/news/articles/2024-12-13/liquid-ai...
        
         | curious_cat_163 wrote:
          | Not excusing the lack of a caveat in his talk, but IMO the old
          | adage "All models are wrong, but some are useful" applies
          | here.
        
       | sensanaty wrote:
       | > just as oil is a finite resource, the internet contains a
       | finite amount of human-generated content.
       | 
       | The oil comparison is really apt. Indeed, let's boil a few more
       | lakes dry so that Mr Worldcoin and his ilk can get another 3
       | cents added to their net worth, totally worth it.
        
         | seizethecheese wrote:
         | I understand the oil analogy, but not your leap. What lake is
         | getting boiled?
        
       | olddog2 wrote:
        | So much knowledge in the world is locked away with empirical
        | experimentation being the only way to unlock it, and compute can
        | only really help that experimentation become more efficient.
        | Something still has to run a randomized controlled trial on an
        | intervention, and that takes real time and real atoms to do.
        
       | killthebuddha wrote:
       | One thing he said I think was a profound understatement, and
       | that's that "more reasoning is more unpredictable". I think we
       | should be thinking about reasoning as in some sense _exactly the
       | same thing as unpredictability_. Or, more specifically, _useful
       | reasoning_ is by definition unpredictable. This framing is
       | important when it comes to, e.g., alignment.
        
         | bondarchuk wrote:
         | Not necessarily true when you think about e.g. finding vs.
         | verifying a solution (in terms of time complexity).
        
           | killthebuddha wrote:
           | IMO verifying a solution is a great example of how reasoning
           | is unpredictable. To say "I need to verify this solution" is
           | to say "I do not know whether the solution is correct or not"
           | or "I cannot predict whether the solution is correct or not
           | without reasoning about it first".
        
             | bondarchuk wrote:
             | But you will know beforehand some/a lot of properties that
             | the solution will satisfy, which is a type of certainty.
        
             | stevenhuang wrote:
             | It's not clear any of that follows at all.
             | 
             | Just look at inductive reasoning. Each step builds from a
             | previous step using established facts and basic heuristics
             | to reach a conclusion.
             | 
              | Such a mechanistic process allows for a great deal of
              | "predictability" at each step, or for estimating the
              | likelihood that a solution is overall correct.
             | 
             | In fact I'd go further and posit that perfect reasoning is
             | 100% deterministic and systematic, and instead it's
             | _creativity_ that is unpredictable.
        
         | narrator wrote:
         | Reasoning by analogy is more predictable because it is by
         | definition more derivative of existing ideas. Reasoning from
         | first principles though can create whole new intellectual
         | worlds by replacing the underpinnings of ideas such that they
         | grow in completely new directions.
        
         | mike_hearn wrote:
         | Wouldn't it be the reverse? The word unreasonable is often used
         | as a synonym for volatile, unpredictable, even dangerous.
         | That's because "reason" is viewed as highly predictable. Two
         | people who rationally reason from the same set of known facts
         | would be expected to arrive at similar conclusions.
         | 
         | I think what Ilya is trying to get at here is more like:
         | someone very smart can seem "unpredictable" to someone who is
         | not smart, because the latter can't easily reason at the same
         | speed or quality as the former. It's not that reason itself is
         | unpredictable, it's that if you can reason quickly enough you
         | might reach conclusions nobody saw coming in advance, even if
         | they make sense.
        
       | linsomniac wrote:
       | ISTR reading back in the mid '90s, in a book on computing history
       | which I have long since forgotten the exact name/author of,
       | something along the lines of:
       | 
        | In the mid '80s it was widely believed among AI researchers that
        | AI was largely solved, and just needed computing horsepower to
        | grow. Because of this, AI research stalled for a decade or more.
       | 
       | Considering the horsepower we are throwing at LLMs, I think there
       | was something to at least part of that.
        
       | LampCharger wrote:
        | Ha. Do people understand that the time for humanity to save
        | itself is running out? What is the point of having a superhuman
        | AGI if there's no human civilization for it to help?
        
         | HeatrayEnjoyer wrote:
         | "We can totally control an entity with 10^x faster and stronger
         | intelligence than us. There is no way this could go wrong, in
         | fact we should spend all of our money building it as soon as
         | possible."
        
           | talldayo wrote:
           | > We can totally control an entity with 10^x faster and
           | stronger intelligence than us.
           | 
           | Unless you're referencing an unreleased model that can count
           | the number of 'r' occurrences in "strawberry" then I don't
           | even think we're dealing with .01*10^x intelligence right
           | now. Maybe not even .001e depending on how bad of a Chomsky
           | apologist you are.
        
       | ilaksh wrote:
       | Larger models are more robust reasoners. Is there a limit? What
       | if you make a 5 TB model trained on a lot of multimodal data
       | where the language information was fully grounded in videos and
       | images etc. Could more robust reasoning be that simple?
        
         | ryoshu wrote:
         | It could be simpler. Humans don't need 5TB of data to reason.
        
           | Workaccount2 wrote:
           | The amount of data needed to train a human brain is enormous.
        
           | Philpax wrote:
           | Think about the sheer amount of data that you receive through
           | your five senses over 570,000,000 seconds/18 years. It's a
           | lot, lot more than 5TB.
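            | 
            | Back-of-envelope version (the bandwidth figure is a made-up,
            | deliberately low assumption, just to show the order of
            | magnitude):
            | 
            |     seconds = 18 * 365 * 24 * 60 * 60  # ~5.7e8 seconds in 18 years
            |     bytes_per_sec = 1_000_000          # assume a low 1 MB/s across all senses
            |     total = seconds * bytes_per_sec    # ~5.7e14 bytes ~ 570 TB, vs 5 TB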
        
       | hackandthink wrote:
        | What kind of reasoning is he talking about? Why should it be
        | unpredictable?
        | 
        |     Deductive Reasoning
        |     Inductive Reasoning
        |     Abductive Reasoning
        |     Analogical Reasoning
        |     Pragmatic Reasoning
        |     Moral Reasoning
        |     Causal Reasoning
        |     Counterfactual Reasoning
        |     Heuristic Reasoning
        |     Bayesian Reasoning
        | 
        | (List generated by ChatGPT)
        
       | swyx wrote:
        | This is stolen and reposted content. The source video is here:
        | https://youtu.be/1yvBqasHLZs?si=pQihchmQG3xoeCPZ
        
         | dang wrote:
         | Ok, we've changed to that from
         | https://www.youtube.com/watch?v=YD-9NG1Ke5Y. Thanks!
        
       | error9348 wrote:
       | It would be great if all NeurIPS talks were accessible for free
       | like this one. I understand they generate some revenue from
       | online ticket sales, but it would be a great resource. Maybe some
       | big org could sponsor it.
        
         | swyx wrote:
         | they are - on a 1 month delay.
         | https://slideslive.com/neurips-2023 last year.
         | 
         | so if you have the patience, wait, if no patience, pay. fair?
         | 
         | we did this for ai.engineer too except we believe in youtube a
         | bit more for accessibility/discoverability.
         | https://www.youtube.com/@aiDotEngineer/videos
        
           | error9348 wrote:
            | Wow, thanks for the correction. Didn't know this existed - to
            | be fair, when I tried last year I only found a preview, and
            | paid up.
        
           | davidmurphy wrote:
           | sweet! thanks for posting
        
       | IWeldMelons wrote:
       | How about NeurVA, NeurTN or NeurOLED?
        
       | niyyou wrote:
       | I'll take the risk of hurting the groupies here. But I have a
       | genuine question: what did you learn from this talk? Like...
       | really... what was new? or potentially useful? or insightful
        | perhaps? I really don't want to bad-mouth anyone, but I'm sick of
        | these prophetic talks (in this case, the tone was literally
        | prophetic--with sudden high and grandiose pitches--and the
        | content typically religious, full of beliefs and empty
        | statements).
        
         | niyyou wrote:
          | To clarify: "pre-training data is exhausted" -- everyone has
          | been saying that for a while now. The graph plotting body mass
          | against brain mass... what does it say exactly? (Where is the
          | link to the prior point about data?) I think we would all
          | benefit from being more critical here and stop idealizing
          | these figures. I believe they have no more clue than any other
          | average ML researcher on all these questions.
        
           | XenophileJKO wrote:
            | The other thing that bugged me is the built-in assumption
            | that today's models have learned everything there is to
            | learn from the Internet corpus. This is quite easy to
            | disprove, both in factual retention and in metacognition
            | about the context of the content.
        
             | jebarker wrote:
              | Yeah, exactly. A human can learn vastly more about, say,
              | math from a much smaller quantity of text. I doubt we're
              | anywhere close to exhausting the knowledge-extraction
              | potential of web data.
        
         | random3 wrote:
         | What everyone could learn is to check their (and their
         | communities') assumptions from not long ago. Who saw this, who
         | didn't. Based on this many can confirm their beliefs and others
         | can realize they're clueless. In either case, there's something
         | to be learned but more to be learned when you realize you were
         | wrong.
        
         | 29athrowaway wrote:
         | From your reaction I guess you were expecting a talk about a
         | NeurIPS 2024 paper.
         | 
         | This is a different situation. There's the "NeurIPS 2024 Test
         | of Time Paper Awards" where they award a historical paper. In
         | this case, a paper from 2014 was awarded and his talk is about
         | that and why it passed the test of time.
         | 
         | https://blog.neurips.cc/2024/11/27/announcing-the-neurips-20...
         | 
         | The title chosen for the HN submission leaves out that
         | important context. So that's why you are disappointed now.
        
         | abetusk wrote:
         | I'll give my take:
         | 
         | * Before the current renaissance of neural networks (pre
         | ~2014ish), it was unclear that scaling would work. That is,
         | simple algorithms on lots of data. The last decade has pretty
         | much addressed that critique and it's clear that scaling does
         | work to a large extent, and spectacularly so.
         | 
          | * Much of current neural network modeling and research is
          | geared towards "one-shot" algorithms, doing pattern matching
          | and giving an immediate result. Contrast this with approaches
          | that need to do inference-time compute, i.e. search.
         | 
         | * The exponential increase in power means that neural network
         | models are quickly sponging up as much data as they can find
         | and we're quickly running into the limits of science, art and
         | other data that humans have created in the last 5k years or so.
         | 
          | * Sutskever points out, as an analogy, that nature has created
          | a better model for humans (the brain-to-body-mass ratio for
          | animals), with hominids finding more efficient compute than
          | other animals, even ones with much larger brains and neuron
          | counts.
          | 
          | * Sutskever is advocating for better models, presumably
          | focusing more on inference-time compute.
         | 
         | In some sense, we're coming a bit full circle where people who
         | were advocating for pure scaling (simple algorithms + lots of
         | data) for learning are now advocating for better algorithms,
         | presumably with a focus on inference time compute (read:
         | search).
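          | 
          | (One crude way to picture "inference time compute" is best-of-
          | N sampling against a verifier; generate() and score() below
          | are hypothetical stand-ins, not any particular API:)
          | 
          |     def best_of_n(prompt, generate, score, n=16):
          |         # spend more compute at inference: sample many candidates,
          |         # keep the one the verifier likes best
          |         candidates = [generate(prompt) for _ in range(n)]
          |         return max(candidates, key=score)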
         | 
         | I agree that it's a little opaque, especially for people who
         | haven't been paying attention to past and current research, but
         | this message seems pretty clear to me.
         | 
         | Noam Brown had a talk recently titled "Parables on the Power of
         | Planning in AI" [0] which addresses this point more head on.
         | 
         | I will also point out that the scaling hypothesis is closely
         | related to "The Bitter Lesson" by Rich Sutton [1]. Most people
         | focus on the "learning" aspect of scaling but "The Bitter
         | Lesson" very clearly articulates learning _and_ search as the
         | methods most amenable to compute. From Sutton:
         | 
         | """
         | 
         | ...
         | 
         | Search and learning are the two most important classes of
         | techniques for utilizing massive amounts of computation in AI
         | research.
         | 
         | ...
         | 
         | """
         | 
         | [0] https://youtube.com/watch?v=eaAonE58sLU
         | 
         | [1]
         | https://www.cs.utexas.edu/~eunsol/courses/data/bitter_lesson...
        
           | abetusk wrote:
           | Here's a more pithy summary:
           | 
           | "We've made a copy of the internet, run current state of the
           | art methods on it and GPT-O1 is the best we can do. We need
           | better (inference/search) algorithms to make progress"
        
         | eldenring wrote:
         | He mentions this in the video, but the talk is specifically
         | tailored for the "Test of Time" award. This being his 3rd year
          | in a row receiving the award, I think he's earned permission to
         | speak prophetically.
        
       | 29athrowaway wrote:
       | This talk is not for a 2024 NeurIPS paper.
       | 
       | This talk is for the "NeurIPS 2024 Test of Time Paper Awards"
       | where they recognize a historical paper that has aged well.
       | 
       | https://blog.neurips.cc/2024/11/27/announcing-the-neurips-20...
       | 
       | And the presentation is about how a 2014 paper aged. When you
       | understand this context you will appreciate the talk more.
        
       ___________________________________________________________________
       (page generated 2024-12-14 23:00 UTC)