[HN Gopher] Why the deep learning boom caught almost everyone by...
___________________________________________________________________
Why the deep learning boom caught almost everyone by surprise
Author : slyall
Score : 201 points
Date : 2024-11-06 04:05 UTC (18 hours ago)
(HTM) web link (www.understandingai.org)
(TXT) w3m dump (www.understandingai.org)
| arcmechanica wrote:
| It was basically useful to average people and wasn't just some
| way to steal and resell data or dump ad after ad on us. A lot of
| dark patterns really ruin services.
| teknover wrote:
| "Nvidia invented the GPU in 1999" wrong on many fronts.
|
| Arguably the November 1996 launch of 3dfx kickstarted GPU
| interest and OpenGL.
|
| After reading that, it's hard to take author seriously on the
| rest of the claims.
| ahofmann wrote:
| Wow, that is harsh. The quoted claim is in the middle of a very
| long article. The background of the author seems to be more on
| the scientific side than the technical side. So throw out
| everything, because the author got one (not very important)
| date wrong?
| RicoElectrico wrote:
| Revisionist marketing should not be given a free pass.
| twelve40 wrote:
| yet it's almost the norm these days. Sick of hearing Steve
| Jobs invented smartphones when I personally was using a
| device with web and streaming music years before that.
| kragen wrote:
| You don't remember when Bill Gates and AOL invented the
| internet, Apple invented the GUI, and Tim Berners-Lee
| invented hypertext?
| santoshalper wrote:
| Possibly technically correct, but utterly irrelevant. The 3dfx
| chips accelerated parts of the 3d graphics pipeline and were
| not general-purpose programmable computers the way a modern GPU
| is (and thus would be useless for deep learning or any other
| kind of AI).
|
| If you are going to count 3dfx as a proper GPU and not just a
| geometry and lighting accelerator, then you might as well go
| back further and count things like the SGI Reality Engine.
| Either way, 3dfx wasn't really first to anything meaningful.
| FeepingCreature wrote:
| But the first NVidia GPUs didn't have general-purpose compute
| either. Google informs me that the first GPU with user-
| programmable shaders was the GeForce 3 in 2001.
| rramadass wrote:
| After actually having read the article I can say that your
| comment is unnecessarily negative and clueless.
|
| The article is a very good historical one showing how three
| important things came together to make the current progress
| possible, viz.:
|
| 1) Geoffrey Hinton's back-propagation algorithm for deep neural
| networks
|
| 2) Nvidia's GPU hardware used via CUDA for AI/ML and
|
| 3) Fei-Fei Li's huge ImageNet database to train the algorithm
| on the hardware. This team actually used "Amazon Mechanical
| Turk"(AMT) to label the massive dataset of 14 million images.
|
| Excerpts;
|
| _"Pre-ImageNet, people did not believe in data," Li said in a
| September interview at the Computer History Museum. "Everyone
| was working on completely different paradigms in AI with a tiny
| bit of data."_
|
| _"That moment was pretty symbolic to the world of AI because
| three fundamental elements of modern AI converged for the first
| time," Li said in a September interview at the Computer History
| Museum. "The first element was neural networks. The second
| element was big data, using ImageNet. And the third element was
| GPU computing."_
| Someone wrote:
| I would not call it "invent", but it seems Nvidia defined the
| term _GPU_. See https://www.britannica.com/technology/graphics-
| processing-un... and
| https://en.wikipedia.org/wiki/GeForce_256#Architecture:
|
| _"GeForce 256 was marketed as "the world's first 'GPU', or
| Graphics Processing Unit", a term Nvidia defined at the time as
| "a single-chip processor with integrated transform, lighting,
| triangle setup/clipping, and rendering engines that is capable
| of processing a minimum of 10 million polygons per second""_
|
| They may have been the first to market with a product that
| fit that definition.
| kragen wrote:
| That sounds like marketing wank, not a description of an
| invention.
|
| I don't think you can get a speedup by running neural
| networks on the GeForce 256, and the features listed there
| aren't really relevant (or arguably even _present_ ) in
| today's GPUs. As I recall, people were trying to figure out
| how to use GPUs to get faster processing in their Beowulfs in
| the late 90s and early 21st century, but it wasn't until
| about 02005 that anyone could actually get a speedup. The
| PlayStation 3's "Cell" was a little more flexible.
| KevinMS wrote:
| Can confirm. I was playing Unreal on my dual Voodoo2 SLI rig
| back in 1998.
| kragen wrote:
| Arguably the November 01981 launch of Silicon Graphics
| kickstarted GPU interest and OpenGL. You can read Jim Clark's
| 01982 paper about the Geometry Engine in https://web.archive.or
| g/web/20170513193926/http://excelsior..... His first key point
| in the paper was that the chip had a "general instruction set",
| although what he meant by it was quite different from today's
| GPUs. IRIS GL started morphing into OpenGL in 01992, and
| certainly when I went to SIGGRAPH 93 it was full of hardware-
| accelerated 3-D drawn with OpenGL on Silicon Graphics Hardware.
| But graphics coprocessors date back to the 60s; Evans &
| Sutherland was founded in 01968.
|
| I mean, I certainly don't think NVIDIA invented the GPU--that's
| a clear error in an otherwise pretty decent article--but it was
| a pretty gradual process.
| binarybits wrote:
| Defining who "really" invented something is often tricky. For
| example I mentioned in the article that there is some dispute
| about who discovered backpropagation.
|
| According to Wikipedia, Nvidia released its first product, the
| RV1, in November 1995, the same month 3dfx released its first
| Voodoo Graphics 3D chip. Is there reason to think the 3dfx card
| was more of a "true" GPU than the RV1? If not, I'd say Nvidia
| has as good a claim to inventing the GPU as 3dfx does.
| in3d wrote:
| NV1, not RV1.
|
| 3dfx Voodoo cards were initially more successful, but I don't
| think anything not actually used for deep learning should
| count.
| vdvsvwvwvwvwv wrote:
| Lesson: ignore detractors. Especially if their argument is "don't
| be a tall poppy"
| jakeNaround wrote:
| The lesson is that reality is not dialectics and symbolic
| logic, but all the stuff in it.
|
| Study story problems and you end up with string theory. Study
| data computed from the endless world of stuff and you find
| utility.
|
| What a shock building the bridge is more useful than a drawer
| full of bridge designs.
| aleph_minus_one wrote:
| > What a shock building the bridge is more useful than a
| drawer full of bridge designs.
|
| Here, opinions will differ.
| psd1 wrote:
| Also: look for fields that have stagnated, where progress is
| enabled by apparently-unrelated innovations elsewhere
| xanderlewis wrote:
| Unfortunately, they're usually right. We just don't hear about
| all the time wasted.
| blitzar wrote:
| On several occasions I have heard "they said it couldn't be
| done" - only to discover that yes it is technically correct,
| however, "they" was one random person who had no clue and
| anyone with any domain knowledge said it was reasonable.
| friendzis wrote:
| Usually when I hear "they said it couldn't be done", it is
| used as triumphant downplay of legitimate critique. If you
| dig deeper that "couldn't be done" usually is in relation
| to some constraints or performance characteristics, which
| the "done" thing still does not meet, but the goalposts
| have already been moved.
| Ukv wrote:
| > that "couldn't be done" usually is in relation to some
| constraints or performance characteristics, which the
| "done" thing still does not meet
|
| I'd say theoretical proofs of impossibility tend to make
| valid logical deductions within the formal model they set
| up, but the issue is that model often turns out to be a
| deficient representation of reality.
|
| For instance, Minsky and Papert's Perceptrons book,
| credited in part with prompting the first AI winter,
| gives a valid mathematical proof about inability of
| networks within their framework to represent the XOR
| function. This function is easily solved by multilayer
| neural networks, but Minsky/Papert considered those to be
| a "sterile" extension and believed neural networks
| trained by gradient descent would fail to scale up.
|
| Or more contemporary, Gary Marcus has been outspoken
| since 2012 that deep learning is hitting a wall - giving
| the example that a dense network trained on just `1000 ->
| 1000`, `0100 -> 0100`, `0010 -> 0010` can't then reliably
| predict `0001 -> 0001` because the fourth output neuron
| was never activated in training. Similarly, this function
| is easily solved by transformers representing
| input/output as a sequence of tokens thus not needing to
| light up an untrained neuron to give the answer (nor do
| humans when writing/speaking the answer).
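|
| (For concreteness, a minimal sketch of Marcus's toy setup,
| assuming PyTorch; the network and hyperparameters here are
| illustrative, not taken from his writing.)
|
      # Train only on the first three one-hot identity pairs; the
      # class-3 output is never the target, so its logit is pushed
      # down and the held-out 0001 -> 0001 case fails.
      import torch
      import torch.nn as nn

      X = torch.eye(4)[:3]                  # 1000, 0100, 0010
      Y = torch.tensor([0, 1, 2])           # index of the hot bit
      model = nn.Sequential(nn.Linear(4, 16), nn.ReLU(), nn.Linear(16, 4))
      opt = torch.optim.Adam(model.parameters(), lr=0.01)
      loss_fn = nn.CrossEntropyLoss()
      for _ in range(500):
          opt.zero_grad()
          loss_fn(model(X), Y).backward()
          opt.step()
      test = torch.tensor([[0., 0., 0., 1.]])
      print(model(test).argmax(dim=1))      # almost never tensor([3])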
|
| If I claimed that it was topologically impossible to
| drink a Capri-Sun, then someone comes along and punctures
| it with a straw (an unaccounted for advancement from the
| blindspot of my model), I could maybe cling on and argue
| that my challenge remains technically true and unsolved
| because that violates one of the constraints I set out -
| but at the very least the relevance of my proof to
| reality has diminished and it may no longer support the
| viewpoints/conclusions I intended it to ("don't buy
| Capri-Sun"). Not to say that theoretical results can't
| still be interesting in their own right - like the
| halting problem, which does not apply to real computers.
| marcosdumay wrote:
| It's extremely common that legitimate critique gets used
| to illegitimately attack people doing things differently
| enough that the relative importance of several factors
| change.
|
| This is really, really common. And it's done both by
| mistake and in bad faith. In fact, it's a guarantee that
| once anybody tries anything different enough, they'll be
| constantly attacked this way.
| vdvsvwvwvwvwv wrote:
| What if the time wasted is part of the search? The hive wins
| but a bee may not. (Capitalism means some bees win too)
| xanderlewis wrote:
| It is. But most people are not interested in simply being
| 'part of the search' -- they want a career, and that relies
| on individual success.
| madaxe_again wrote:
| I can't be the only one who has watched this all unfold with a
| sense of inevitability, surely.
|
| When the first serious CUDA based ML demos started appearing a
| decade or so ago, it was, at least to me, pretty clear that this
| would lead to AGI in 10-15 years - and here we are. It was the
| same sort of feeling as when I first saw the WWW aged 11, and
| knew that this was going to eat the world - and here we are.
|
| The thing that flummoxes me is that now that we are so obviously
| on this self-reinforcing cycle, how many are still insistent that
| AI will amount to nothing.
|
| I am reminded of how the internet was just a fad - although this
| is going to have an even greater impact on how we live, and our
| economies.
| BriggyDwiggs42 wrote:
| Downvoters are responding to a perceived arrogance. What does
| agi mean to you?
| nineteen999 wrote:
| Could be arrogance, or could be the delusion.
| madaxe_again wrote:
| Why is it a delusion, in your opinion?
| andai wrote:
| It's a delusion on the part of the downvoters.
| BriggyDwiggs42 wrote:
| Indeed, it sure could be arrogance.
| xen0 wrote:
| What makes you think AGI is either here or imminent?
|
| For me the current systems still clearly fall short of that
| goal.
| madaxe_again wrote:
| They do fall short, but progress in this field is not linear.
| This is the bit that I struggle to comprehend - that which
| was literally infeasible only a few years ago is now mocked
| and derided.
|
| It's like jet engines and cheap intercontinental travel
| becoming an inevitability once the rubicon of powered flight
| is crossed - and everyone bitching about the peanuts while
| they cruise at inconceivable speed through the atmosphere.
| diffeomorphism wrote:
| Just like supersonic travel between Europe and America
| becoming common place was inevitable. Oh, wait.
|
| Optimism is good, blind optimism isn't.
| madaxe_again wrote:
| It _is_ yet inevitable - but it wasn't sustainable in the
| slightest when it was first attempted - Concorde was akin
| to the Apollo programme, in being precocious and
| prohibitively expensive due to the technical limitations
| of the time. It will, ultimately, be little more
| remarkable than flying is currently, even as we hop
| around on suborbital trajectories.
|
| It isn't a question of optimism - in fact, I am deeply
| pessimistic as to what ML will mean for humanity as a
| whole, at least in the short term - it's a question of
| seeing the features of a confluence of technology, will,
| and knowledge that has in the past spurred technical
| revolution.
|
| Newcomen was far from the first to develop a steam
| engine, but there was suddenly demand for such beasts, as
| shallow mines became exhausted, and everything else
| followed from that.
|
| ML has been around in one form or another for decades now
| - however we are now at the point where the technology
| exists, insofar as modern GPUs exist, the will exists,
| insofar as trillions of dollars of investment flood into
| the space, and the knowledge exists, insofar as we have
| finally produced machine learning models which are non-
| trivial.
|
| Just as with powered flight, the technology - the
| internal combustion engine - had to be in place, as did
| the will (the First World War), and the knowledge, which
| we had possessed for centuries but had no means or will
| to act upon. The idea was, in fact, _ridiculous_. Nobody
| could see the utility - until someone realised you could
| use them to drop ordnance on your enemies.
|
| With supersonic flight - the technology is emerging, the
| will will be provided by the substantial increase in
| marginal utility provided by sub-hour transit compared to
| the relatively small improvement Concorde offered, and
| the knowledge, again, we already have.
|
| So no, not optimism - just observance of historical
| forces. When you put these things together, there tend to
| be technical revolutions, and resultant societal change.
| xen0 wrote:
| > It is yet inevitable
|
| The cool thing for you is that this statement is
| unfalsifiable.
|
| But what I really took issue with was your timeframe in
| the original comment. Your statements imply you fully
| expect AGI to be a thing within a couple of years. I do
| not.
| marcosdumay wrote:
| > insofar as modern GPUs exist, the will exists, insofar
| as trillions of dollars of investment flood into the
| space
|
| Funny thing. I expect deep learning to have a really bad
| next decade, people betting on it to see quick bad
| results, and maybe it even disappear from the visible
| economy for a while exactly because there have been
| hundreds of billions of dollars invested.
|
| That is no fault of the technology, which I expect to have
| some usefulness in the long term. I expect a really bad
| reaction to it coming exclusively from the excessive hype.
| oersted wrote:
| What do you think is next?
| madaxe_again wrote:
| An unravelling, as myriad possibilities become actualities.
| The advances in innumerate fields that ML will unlock will
| have enormous impacts.
|
| Again, I cannot understand for the life of me how people
| cannot see this.
| alexander2002 wrote:
| I had a hypothesis once and it is probably 1000% wrong, but I
| will state it here: once computers can talk to other computers
| over a network in a human-friendly way (abstraction by LLM),
| and such entities completely control the interfaces that we
| humans can easily use, effectively and multi-modally, then I
| think there is a slight chance "I" might believe there is AGI,
| or at least some indications of it.
| marcosdumay wrote:
| It's unsettling how the Turing Test turned out to be so
| independent of AGI, isn't it?
| selimthegrim wrote:
| Innumerable?
| 2sk21 wrote:
| I'm surprised that the article doesn't mention that one of the
| key factors that enabled deep learning was the use of ReLU as the
| activation function in the early 2010s. ReLU behaves a lot better
| than the logistic sigmoid that we used until then.
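|
| (A rough sketch of the difference, assuming numpy; the layer
| count and operating points are made up for illustration.)
|
      # Sigmoid gradients are at most 0.25 and shrink multiplicatively
      # through layers; ReLU passes a gradient of 1 for active units.
      import numpy as np

      def sigmoid(x):
          return 1.0 / (1.0 + np.exp(-x))

      x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
      sig_grad = sigmoid(x) * (1 - sigmoid(x))   # <= 0.25 everywhere
      relu_grad = (x > 0).astype(float)          # 0 or 1
      print(sig_grad ** 10)    # ~1e-6 or smaller: vanishing gradient
      print(relu_grad ** 10)   # stays 1 wherever the unit is active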
| sanxiyn wrote:
| Geoffrey Hinton (now a Nobel Prize winner!) himself did a
| summary. I think it is the single best summary on this topic.
| Our labeled datasets were thousands of times too small.
| Our computers were millions of times too slow. We
| initialized the weights in a stupid way. We used the
| wrong type of non-linearity.
| imjonse wrote:
| That is a pithier formulation of the widely accepted summary
| of "more data + more compute + algo improvements"
| sanxiyn wrote:
| No, it isn't. It emphasizes the importance of Glorot
| initialization and ReLU.
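|
| (For reference, a small sketch of Glorot/Xavier initialization,
| assuming numpy; the shapes are arbitrary.)
|
      # Scale weights by fan-in and fan-out so activation variance is
      # roughly preserved from layer to layer - the fix for "we
      # initialized the weights in a stupid way".
      import numpy as np

      def glorot_uniform(fan_in, fan_out, rng=np.random.default_rng(0)):
          limit = np.sqrt(6.0 / (fan_in + fan_out))
          return rng.uniform(-limit, limit, size=(fan_in, fan_out))

      W = glorot_uniform(512, 256)
      x = np.random.default_rng(1).standard_normal((1000, 512))
      print(x.var(), (x @ W).var())   # variances stay in the same ballpark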
| cma wrote:
| As compute has outpaced memory bandwidth most recent stuff has
| moved away from ReLU. I think Llama 3.x uses SwiGLU. Still
| probably closer to ReLU than logistic sigmoid, but it's back to
| being something more smooth than ReLU.
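|
| (A rough sketch of a SwiGLU feed-forward block, assuming numpy;
| the dimensions are illustrative, not Llama's actual sizes.)
|
      # SwiGLU: a SiLU-gated product of two linear projections followed
      # by a down-projection - smoother than ReLU but still ReLU-shaped.
      import numpy as np

      def silu(z):
          return z / (1.0 + np.exp(-z))   # "swish" with beta = 1

      rng = np.random.default_rng(0)
      d_model, d_ff = 64, 172
      W_gate = rng.standard_normal((d_model, d_ff)) * 0.02
      W_up = rng.standard_normal((d_model, d_ff)) * 0.02
      W_down = rng.standard_normal((d_ff, d_model)) * 0.02

      def swiglu_ffn(x):
          return (silu(x @ W_gate) * (x @ W_up)) @ W_down

      print(swiglu_ffn(rng.standard_normal((1, d_model))).shape)  # (1, 64)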
| 2sk21 wrote:
| Indeed, there have been so many new activation functions that
| I have stopped following the literature after I retired. I am
| glad to see that people are trying out new things.
| DeathArrow wrote:
| I think neural nets are just a subset of machine learning
| techniques.
|
| I wonder what would have happened if we poured the same amount of
| money, talent and hardware into SVMs, random forests, KNN, etc.
|
| I don't say that transformers, LLMs, deep learning and other
| great things that happened in the neural network space aren't
| very valuable, because they are.
|
| But I think in the future we should also study other options
| which might be better suited than neural networks for some
| classes of problems.
|
| Can a very large and expensive LLM do sentiment analysis or
| classification? Yes, it can. But so can simple SVMs and KNN and
| sometimes even better.
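|
| (A hedged sketch of the kind of lightweight classifier I mean,
| assuming scikit-learn; the example texts are made up.)
|
      # TF-IDF features plus a linear SVM: trains in milliseconds on a CPU.
      from sklearn.feature_extraction.text import TfidfVectorizer
      from sklearn.pipeline import make_pipeline
      from sklearn.svm import LinearSVC

      texts = ["great product, loved it", "terrible, waste of money",
               "works fine, happy with it", "broke after a day, awful"]
      labels = ["pos", "neg", "pos", "neg"]
      clf = make_pipeline(TfidfVectorizer(), LinearSVC())
      clf.fit(texts, labels)
      print(clf.predict(["absolutely awful experience"]))  # likely ['neg']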
|
| I saw some YouTube coders doing calls to OpenAI's o1 model for
| some very simple classification tasks. That isn't the best tool
| for the job.
| Meloniko wrote:
| And based on what though do you think that?
|
| I think neural networks are fundamental and we will
| focus/experiment a lot more with architecture, layers and other
| parts involved, but emergent features arise through size.
| mentalgear wrote:
| KANs (Kolmogorov-Arnold Networks) are one example of a
| promising exploration pathway to real AGI, with the advantage
| of full explainability.
| astrange wrote:
| "Explainable" is a strong word.
|
| As a simple example, if you ask a question and part of the
| answer is directly quoted from a book from memory, that text
| is not computed/reasoned by the AI and so doesn't have an
| "explanation".
|
| But I also suspect that any AGI would necessarily produce
| answers it can't explain. That's called intuition.
| diffeomorphism wrote:
| Why? If I ask you what the height of the Empire State
| Building is, then a reference is a great, explainable
| answer.
| astrange wrote:
| It wouldn't be a reference; "explanation" for an LLM
| means it tells you which of its neurons were used to
| create the answer, ie what internal computations it did
| and which parts of the input it read. Their architecture
| isn't capable of referencing things.
|
| What you'd get is an explanation saying "it quoted this
| verbatim", or possibly "the top neuron is used to output
| the word 'State' after the word 'Empire'".
|
| You can try out a system here:
| https://monitor.transluce.org/dashboard/chat
|
| Of course the AI could incorporate web search, but then
| what if the explanation is just "it did a web search and
| that was the first result"? It seems pretty difficult to
| recursively make every external tool also explainable...
| Retric wrote:
| LLM's are not the only possible option here. When talking
| about AGI none of what we are doing is currently that
| promising.
|
| The search is for something that can write an essay,
| drive a car, and cook lunch so we need something new.
| Vampiero wrote:
| When people talk about explainability I immediately think
| of Prolog.
|
| A Prolog query is explainable precisely because, by
| construction, it itself is the explanation. And you can
| go step by step and understand how you got a particular
| result, inspecting each variable binding and predicate
| call site in the process.
|
| Despite all the billions being thrown at modern ML, no
| one has managed to create a model that does something
| like what Prolog does with its simple recursive
| backtracking.
|
| So the moral of the story is that you can 100% trust the
| result of a Prolog query, but you can't ever trust the
| output of an LLM. Given that, which technology would you
| rather use to build software on which lives depend?
|
| And which of the two methods is more "artificially
| intelligent"?
| astrange wrote:
| The site I linked above does that for LLaMa 8B.
|
| https://transluce.org/observability-interface
|
| LLMs don't have enough self-awareness to produce really
| satisfying explanations though, no.
| diffeomorphism wrote:
| Then you should have a stronger notion of "explanation".
| Why were these specific neurons activated?
|
| Simplest example: OCR. A network identifying digits can
| often be explained as recognizing lines, curves, numbers
| of segments etc.. That is an explanation, not "computer
| says it looks like an 8"
| krisoft wrote:
| But can humans do that? If you show someone a picture of
| a cat, can they "explain" why is it a cat and not a dog
| or a pumpkin?
|
| And is that explanation the way how they obtained the
| "cat-nes" of the picture, or do they just see that it is
| a cat immediately and obviously and when you ask them for
| an explanation they come up with some explaining noises
| until you are satisfied?
| diffeomorphism wrote:
| Wild cat, house cat, lynx,...? Sure, they can. They will
| tell you about proportions, shape of the ears, size as
| compared to other objects in the picture etc.
|
| For cat vs pumpkin they will think you are making fun of
| them, but it very much is explainable. Though now I am
| picturing a puzzle about finding orange cats in a picture
| of a pumpkin field.
| fragmede wrote:
| Shown a picture of a cloud, why it looks like a cat does
| sometimes need an explanation until others can see the
| cat, and it's not just "explaining noises".
| trhway wrote:
| >I wonder what would have happened if we poured the same amount
| of money, talent and hardware into SVMs, random forests, KNN,
| etc.
|
| people did that to horses. No car resulted from it, just
| slightly better horses.
|
| >I saw some YouTube coders doing calls to OpenAI's o1 model for
| some very simple classification tasks. That isn't the best tool
| for the job.
|
| This "not best tool" is just there for the coders to call while
| the "simple SVMs and KNN" would require coding and training by
| those coders for the specific task they have at hand.
| guappa wrote:
| [citation needed]
| empiko wrote:
| Deep learning is easy to adapt to various domains, use cases,
| training criteria. Other approaches do not have the flexibility
| of combining arbitrary layers and subnetworks and then training
| them with arbitrary loss functions. The depth in deep learning
| is also pretty important, as it allows the model to create
| hierarchical representations of the inputs.
| f1shy wrote:
| But it is very hard to validate for important or critical
| applications.
| jasode wrote:
| _> I wonder what would have happened if we poured the same
| amount of money, talent and hardware into SVMs, random forests,
| KNN, etc._
|
| But that's backwards from how new techniques and progress are
| made. What actually happens is somebody (maybe a student at a
| university) has an _insight or new idea for an algorithm_ that's
| near $0 cost to implement as a proof of concept. Then everybody
| else notices the improvement and then extra millions/billions
| get directed toward it.
|
| New ideas -- that didn't cost much at the start -- ATTRACT the
| follow on billions in investments.
|
| This timeline of tech progress in computer science is the
| opposite from other disciplines such as materials science or
| bio-medical fields. Trying to discover the next super-alloy or
| cancer drug all requires expensive experiments. Manipulating
| atoms & molecules requires very expensive specialized
| equipment. In contrast, computer science experiments can be
| cheap. You just need a clever insight.
|
| An example of that was the 2012 AlexNet image recognition
| algorithm that blew all the other approaches out of the water.
| Alex Krizhevsky had a new insight on a convolutional neural
| network to run on CUDA. He bought 2 NVIDIA cards (GTX580 3GB
| GPU) from Amazon. It didn't require NASA levels of investment
| at the start to implement his idea. Once everybody else noticed
| his superior results, the billions began pouring in to
| iterate/refine on CNNs.
|
| Both the "attention mechanism" and the refinement of
| "transformer architecture" were also cheap to prove out at a
| very small scale. In 2014, Jakob Uszkoreit thought about an
| "attention mechanism" instead of RNN and LSTM for machine
| translation. It didn't cost billions to come up with that idea.
| Yes, ChatGPT-the-product cost billions but the "attention
| mechanism algorithm" did not.
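|
| (A bare-bones sketch of that attention idea, assuming numpy, to
| show how little machinery the core algorithm needs; dimensions
| are arbitrary.)
|
      # Scaled dot-product attention: each position mixes the others'
      # values, weighted by query/key similarity.
      import numpy as np

      def softmax(z, axis=-1):
          z = z - z.max(axis=axis, keepdims=True)
          e = np.exp(z)
          return e / e.sum(axis=axis, keepdims=True)

      def attention(Q, K, V):
          scores = Q @ K.T / np.sqrt(K.shape[-1])
          return softmax(scores) @ V

      rng = np.random.default_rng(0)
      Q, K, V = (rng.standard_normal((5, 16)) for _ in range(3))
      print(attention(Q, K, V).shape)   # (5, 16)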
|
| _> into SVMs, random forests, KNN, etc._
|
| If anyone has found an unknown insight into SVM, KNN, etc that
| everybody else in the industry has overlooked, they can do
| cheap experiments to prove it. E.g. The entire Wikipedia text
| download is currently only ~25GB. Run the new SVM
| classification idea on that corpus. Very low cost experiments
| in computer science algorithms can still be done in the
| proverbial "home garage".
| FrustratedMonky wrote:
| "$0 cost to implement a proof-of concept"
|
| This falls apart for breakthroughs that are not zero cost to
| do a proof of concept.
|
| I think that is what the parent is referring to: that other
| technologies might have more potential, but would take money
| to build out.
| scotty79 wrote:
| Do transformer architecture and attention mechanisms actually
| give any benefit to anything else than scalability?
|
| I thought the main insights were embeddings, positional
| encoding and shortcuts through layers to improve back
| propagation.
| DeathArrow wrote:
| True, you might not need lots of money to test some ideas.
| But LLMs and transformers are all the rage so they gather all
| attention and research funds.
|
| People don't even think of doing anything else and those that
| might do, are paid to pursue research on LLMs.
| edude03 wrote:
| Transformers were made for machine translation - someone had
| the insight that when going from one language to another the
| context mattered such that the tokens that came before would
| bias which ones came after. It just so happened that
| transformers were more performant on other tasks, and at the time
| you could demonstrate the improvement on a small scale.
| ldjkfkdsjnv wrote:
| This is such a terrible opinion, I'm so tired of reading the LLM
| deniers.
| f1shy wrote:
| > neural nets are just a subset of machine learning techniques.
|
| Fact by definition
| dr_dshiv wrote:
| The best tool for the job is, I'd argue, the one that does the
| job most reliably for the least amount of money. When you
| consider how little expertise or data you need to use openai
| offerings, I'd be surprised if sentiment analysis using
| classical ML methods is actually better (unless you are an
| expert and have a good dataset).
| jensgk wrote:
| > I wonder what would have happened if we poured the same
| amount of money, talent and hardware into SVMs, random forests,
| KNN, etc.
|
| From my perspective, that is actually what happened between the
| mid-90s to 2015. Neural networks were dead in that period, but
| any other ML method was very, very hot.
| macrolime wrote:
| I took some AI courses around the same time as the author, and I
| remember the professors were actually big proponents of neural
| nets, but they believed the missing piece was some new genius
| learning algorithm rather than just scaling up with more data.
| rramadass wrote:
| > rather than just scaling up with more data.
|
| That was the key takeaway for me from this article. I didn't
| know of Fei-Fei Li's ImageNet contribution which actually gave
| all the other researchers the essential data to train with. Her
| intuition that more data would probably make the accuracy of
| existing algorithms better is, I think, very much
| underappreciated.
|
| Key excerpt;
|
| _So when she got to Princeton, Li decided to go much bigger.
| She became obsessed with an estimate by vision scientist Irving
| Biederman that the average person recognizes roughly 30,000
| different kinds of objects. Li started to wonder if it would be
| possible to build a truly comprehensive image dataset--one that
| included every kind of object people commonly encounter in the
| physical world._
| aithrowawaycomm wrote:
| I think there is a slight disconnect here between making AI
| systems which are smart and AI systems which are _useful._ It's a
| very old fallacy in AI: pretending tools which _assist_ human
| intelligence by solving human problems must themselves be
| intelligent.
|
| The utility of big datasets was indeed surprising, but that
| skepticism came about from recognizing the scaling paradigm
| _must_ be a dead end: vertebrates across the board require less
| data to learn new things, by several orders of magnitude. Methods
| to give ANNs "common sense" are essentially identical to the old
| LISP expert systems: hard-wiring the answers to specific common-
| sense questions in either code or training data, even though fish
| and lizards can rapidly make common-sense deductions about
| manmade objects they couldn't have possibly seen in their
| evolutionary histories. Even spiders have generalization
| abilities seemingly absent in transformers: they spin webs inside
| human homes with unnatural geometry.
|
| Again it is surprising that the ImageNet stuff worked as well as
| it did. Deep learning is undoubtedly a useful way to build
| applications, just like Lisp was. But I think we are about as
| close to AGI as we were in the 80s, since we have made zero
| progress on common sense: in the 80s we knew Big Data can poorly
| emulate common sense, and that's where we're at today.
| j_bum wrote:
| > vertebrates across the board require less data to learn new
| things, by several orders of magnitude.
|
| Sometimes I wonder if it's fair to say this.
|
| Organisms have had billions of years of training. We might come
| online and succeed in our environments with very little data,
| but we can't ignore the information that's been trained into
| our DNA, so to speak.
|
| What's billions of years of sensory information that drove
| behavior and selection, if not training data?
| aithrowawaycomm wrote:
| My primary concern is the generalization to manmade things
| that couldn't possibly be in the evolutionary "training
| data." As a thought experiment, it seems very plausible that
| you can train a transformer ANN on spiderwebs between trees,
| rocks, bushes, etc, and get "superspider" performance (say in
| a computer simulation). But I strongly doubt this will
| generalize to building webs between garages and pantries like
| actual spiders, no matter how many trees you throw at it, so
| such a system wouldn't be ASI.
|
| This extends to all sorts of animal cognitive experiments:
| crows understand simple pulleys simply by inspecting them,
| but they couldn't have evolved to _use_ pulleys. Mice can
| quickly learn that hitting a button 5 times will give them a
| treat: does it make sense to say that they encountered a
| similar situation in their evolutionary past? It makes more
| sense to suppose that mice and crows have powerful abilities
| to reason causally about their actions. These abilities are
| more sophisticated than mere "Pavlovian" associative
| reasoning, which is about understanding _stimuli_. With AI we
| can emulate associative reasoning very well because we have a
| good mathematical framework for Pavlovian responses as a sort
| of learning of correlations. But causal reasoning is much
| more mysterious, and we are very far from figuring out a good
| mathematical formalism that a computer can make sense of.
|
| I also just detest the evolution = training data metaphor
| because it completely ignores architecture. Evolution is not
| just glomming on data, it's trying different types of
| neurons, different connections between them, etc. All
| organisms alive today evolved with "billions of years of
| training," but only architecture explains why we are so much
| smarter than chimps. In fact I think the "evolution" metaphor
| preys on our misconception that humans are "more evolved" than
| chimps, but our common ancestor was more primitive than a chimp.
| visarga wrote:
| I don't think "humans/animals learn faster" holds. LLMs
| learn new things on the spot, you just explain it in the
| prompt and give an example or two.
|
| A recent paper tested both linguists and LLMs at learning a
| language with less than 200 speakers and therefore
| virtually no presence on the web. All from a few pages of
| explanations. The LLMs come close to humans.
|
| https://arxiv.org/abs/2309.16575
|
| Another example is the ARC-AGI benchmark, where the model
| has to learn from a few examples to derive the rule. AI
| models are closing the gap to human level, they are around
| 55% while humans are at 80%. These tests were specifically
| designed to be hard for models and easy for humans.
|
| Besides these examples of fast learning, I think the other
| argument about humans benefiting from evolution is also
| essential here. Similarly, we can't beat AlphaZero at Go,
| as it evolved its own Go culture and plays better than us.
| Evolution is powerful.
| car wrote:
| It's all in the architecture. Also, biological neurons are
| orders of magnitude more complex than NN's. There's a
| plethora of neurotransmitters and all kinds of cellular
| machinery for dealing with signals (inhibitory, excitatory
| etc.).
| RaftPeople wrote:
| > _Organisms have had billions of years of training. We might
| come online and succeed in our environments with very little
| data, but we can't ignore the information that's been trained
| into our DNA, so to speak_
|
| It's not just information (e.g. sets of innate smells and
| response tendencies), but it's also all of the advanced
| functions built into our brains (e.g. making sense of
| different types of input, dynamically adapting the brain to
| conditions, etc.).
| lubujackson wrote:
| Good point. And don't forget the dynamically changing
| environment responding with a quick death for any false path.
|
| Like how good would LLMs be if their training set was built
| by humans responding with an intelligent signal at every
| crossroads.
| SiempreViernes wrote:
| This argument mostly just hollows out the meaning of
| training: evolution gives you things like arms and ears, but
| if you say evolution is like training you imply that you
| could have grown a new kind of arm in school.
| horsawlarway wrote:
| Training an LLM feels almost exactly like evolution - the
| gradient is "ability to procreate" and we're selecting
| candidates from related, randomized genetic traits and
| iterating the process over and over and over.
|
| Schooling/education feels much more like supervised
| training and reinforcement (and possibly just context).
|
| I think it's dismissive to assume that evolution hasn't
| influenced how well you're able to pick up new behavior,
| because it's highly likely it's not entirely novel in the
| context of your ancestry, and the traits you have that have
| been selected for.
| marcosdumay wrote:
| > but we can't ignore the information that's been trained
| into our DNA
|
| There's around 600MB in our DNA. Subtract this from the size
| of any LLM out there and see how much you get.
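|
| (Back-of-the-envelope check, assuming ~3.1e9 base pairs at 2 bits
| each and a hypothetical 8B-parameter model in 16-bit weights.)
|
      base_pairs = 3.1e9
      genome_mb = base_pairs * 2 / 8 / 1e6   # ~775 MB raw, same order as 600MB
      weights_mb = 8e9 * 2 / 1e6             # ~16,000 MB for the weights alone
      print(round(genome_mb), round(weights_mb))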
| rjsw wrote:
| Maybe we just collectively decided that it didn't matter
| whether the answer was correct or not.
| aithrowawaycomm wrote:
| Again I do think these things have utility and the
| unreliability of LLMs is a bit incidental here. Symbolic
| systems in LISP are highly reliable, but they couldn't
| possibly be extended to AGI without another component, since
| there was no way to get the humans out of the loop: someone
| had to assign the symbols semantic meaning and encode the
| LISP function accordingly. I think there's a similar
| conceptual issue with current ANNs, and LLMs in particular:
| they rely on far too much formal human knowledge to get off
| the ground.
| rjsw wrote:
| I meant more why the "boom caught almost everyone by
| surprise", people working in the field thought that correct
| answers would be important.
| nxobject wrote:
| Barring a stunning discovery that will stop putting the
| responsibility for NN intelligence on synthetic training
| set - it looks like NN and symbolic AI may have to coexist,
| symbiotically.
| spencerchubb wrote:
| > vertebrates across the board require less data to learn new
| things
|
| the human brain is absolutely inundated with data, especially
| from visual, audio, and kinesthetic mediums. the data is a very
| different form than what one would use to train a CNN or LLM,
| but it is undoubtedly data. newborns start out literally being
| unable to see, and they have to develop those neural pathways
| by taking in the "pixels" of the world for every millisecond of
| every day
| kirkules wrote:
| Do you have, offhand, any names or references to point me
| toward why you think fish and lizards can make rapid common
| sense deductions about man made objects they couldn't have seen
| in their evolutionary histories?
|
| Also, separately, I'm only assuming but it seems the reason you
| think these deductions are different from hard wired answers is
| that their evolutionary lineage can't have had to make similar
| deductions. If that's your reasoning, it makes me wonder if
| you're using a systematic description of decisions and of the
| requisite data and reasoning systems to make those decisions,
| which would be interesting to me.
| aleph_minus_one wrote:
| > I think there is a slight disconnect here between making AI
| systems which are smart and AI systems which are _useful_. It's
| a very old fallacy in AI: pretending tools which _assist_ human
| intelligence by solving human problems must themselves be
| intelligent.
|
| I have difficulties understanding why you could even believe in
| such a fallacy: just look around you: most jobs that have to be
| done require barely any intelligence, and on the other hand,
| there exist few jobs that _do_ require an insane amount of
| intelligence.
| kleiba wrote:
| _> "Pre-ImageNet, people did not believe in data," Li said in a
| September interview at the Computer History Museum. "Everyone was
| working on completely different paradigms in AI with a tiny bit
| of data."_
|
| That's baloney. The old ML adage "there's no data like more data"
| is as old as mankind itself.
| FrustratedMonky wrote:
| Not really. This is referring back to the 80's. People weren't
| even doing 'ML'. And back then people were more focused on
| teasing out 'laws' in as few data points as possible. The focus
| was more on formulas and symbols, and finding relationships
| between individual data points. Not the broad patterns we take
| for granted today.
| criddell wrote:
| I would say using backpropagation to train multi-layer neural
| networks would qualify as ML and we were definitely doing
| that in 80's.
| UltraSane wrote:
| Just with tiny amounts of data.
| jensgk wrote:
| Compared to today. We thought we used large amounts of
| data at the time.
| UltraSane wrote:
| "We thought we used large amounts of data at the time."
|
| Really? Did it take at least an entire rack to store?
| jensgk wrote:
| We didn't measure data size that way. At some point in
| the future someone will find this dialog and think that
| we don't have large amounts of data now, because we are
| not using entire solar systems for storage.
| UltraSane wrote:
| Why can't you use a rack as a unit of storage at the
| time? Were 19" server racks not in common use yet? The
| storage capacity of a rack will grow over time.
|
| My storage hierarchy goes: 1) one storage drive, 2) one server
| maxed out with the biggest storage drives available, 3) one
| rack filled with servers from 2, 4) one data center filled
| with racks from 3.
| fragmede wrote:
| How big is a rack in VW beetles though?
|
| It's a terrible measurement because it's an irrelevant
| detail about how the data is stored - nobody outside the
| team running a proprietary cloud actually knows how your
| data is laid out there.
|
| So while someone could say they used a 10 TiB data set,
| or 10T parameters, how many "racks" of AWS S3 that is, is
| not known outside of Amazon.
| mistrial9 wrote:
| mid-90s had neural nets, even a few popular science kinds of
| books on it. The common hardware was so much less capable
| then.
| sgt101 wrote:
| mid-60's had neural nets.
|
| mid-90's had LeCun telling everyone that big neural nets
| were the future.
| dekhn wrote:
| Mid 90s I was working on neural nets and other machine
| learning, based on gradient descent, with manually
| computed derivatives, on genomic data (from what I can
| recall, we had no awareness of LeCun; I didn't find out
| about his great OCR results until much later). it worked
| fine and it seemed like a promising area.
|
| My only surprise is how long it took to get to imagenet,
| but in retrospect, I appreciate that a number of
| conditions had to be met (much more data, much better
| algorithms, much faster computers). I also didn't
| recognize just how poorly suited MLPs were for sequence
| modelling, compared to RNNs and transformers.
| sgt101 wrote:
| I'm so out of things! What do you mean by manually computed
| derivatives?
| evrydayhustling wrote:
| Not baloney. The culture around data in 2005-2010 -- at least /
| especially in academia -- was night and day to where it is
| today. It's not that people didn't understand that more data
| enabled richer + more accurate models, but that they accepted
| data constraints as a part of the problem setup.
|
| Most methods research went into ways of building beliefs about
| a domain into models as biases, so that they could be more
| accurate in practice with less data. (This describes a lot of
| PGM work). This was partly because there was still a tug of war
| between CS and traditional statistics communities on ML, and
| the latter were trained to be obsessive about model
| specification.
|
| One result was that the models that were practical for
| production inference were often trained to the point of
| diminishing returns on their specific tasks. Engineers
| deploying ML weren't wishing for more training instances, but
| better data at inference time. Models that could perform more
| general tasks -- like differentiating 90k object classes rather
| than just a few -- were barely even on most people's radar.
|
| Perhaps folks at Google or FB at the time have a different
| perspective. One of the reasons I went ABD in my program was
| that it felt industry had access to richer data streams than
| academia. Fei Fei Li's insistence on building an academic
| computer science career around giant data sets really was
| ingenious, and even subversive.
| bsenftner wrote:
| The culture was and is skeptical in biased manners. Between
| '04 and '08 I worked with a group that had trained neural
| nets for 3D reconstruction of human heads. They were using it
| for prenatal diagnostics and a facial recognition pre-
| processor, and I was using it for creating digital doubles in
| VFX film making. By '08 I'd developed a system suitable for
| use in mobile advertising, creating ads with people in them,
| and 3D games with your likeness as the player. VCs thought we
| were frauds, and their tech advisors told them our tech was
| an old discredited technique that could not do what we
| claimed. We spoke to every VC, some of which literally kicked
| us out. Finally, after years of "no" that same AlexNet
| success begins to change minds, but _now_ they want the tech
| to create porn. At that point, after years of "no" I was
| making children's educational media, there was no way I was
| gonna do porn. Plus, president of my co was a woman, famous
| for creating children's media. Yeah, the culture was
| different then, not too long ago.
| evrydayhustling wrote:
| Wow, so early for generative -- although I assume you were
| generating parameters that got mapped to mesh positions,
| rather than generating pixels?
|
| I definitely remember that bias about neural nets, to the
| point of my first grad ML class having us recreate proofs
| that you should never need more than two hidden layers (one
| can pick up the thread at [1]). Of all the ideas clunking
| around in the AI toolbox at the time, I don't really have
| background on why people felt the need to kill NN with
| fire.
|
| [1] https://en.wikipedia.org/wiki/Universal_approximation_t
| heore...
| bsenftner wrote:
| It was annotated face images and 3D scans of heads
| trained to map one to the other. After a threshold in the
| size of the training data, good to great results from a
| single photo could be had to generate the mesh 3D
| positions, and then again to map the photo onto the mesh
| surface. Do that with multiple frames, and one is firmly
| in the Uncanny Valley.
| philipkglass wrote:
| Who's offering VC money for neural network porn technology?
| As far as I can tell, there is huge organic demand for this
| but prospective users are mostly cheapskates and the area
| is rife with reputational problems, app store barriers,
| payment processor barriers, and regulatory barriers. In
| practice I have only ever seen investors scared off by
| hints that a technology/platform would be well matched to
| adult entertainment.
| tucnak wrote:
| > they accepted data constraints as a part of the problem
| setup.
|
| I've never heard this be put so succinctly! Thank you
| littlestymaar wrote:
| In 2019, GPT-2 1.5B was trained on ~10B tokens.
|
| Last week Hugging Face released SmolLM v2 1.7B trained on 11T
| tokens, 3 orders of magnitude more training data for roughly
| the same number of parameters with almost the same architecture.
|
| So even back in 2019 we can say we were working with a tiny
| amount of data compared to what is routine now.
| kleiba wrote:
| True. But my point is that the quote "people didn't believe
| in data" is not true. Back in 2019, when GPT-2 was trained,
| the reason they didn't use the 3T of today was not because
| they "didn't believe in data" - they totally would have had
| it been technically feasible (as in: they had that much data
| + the necessary compute).
|
| The same has always been true. There has never been a stance
| along the lines of "ah, let's not collect more data - it's
| not worth it!". It's always been other reasons, typically the
| lack of resources.
| littlestymaar wrote:
| > they totally would have had it been technically feasible
|
| TinyLlama[1] was made by _an individual on their own_
| last year, training a 1.1B model on 3T tokens with just 16
| A100-40G GPUs in 90 days. It was definitely within reach of
| any funded org in 2019.
|
| In 2022 (IIRC), Google released the Chinchilla paper about
| the compute-optimal amount of data to train a given model,
| for a 1B model, the value was determined to be 20B tokens,
| which again is 3 orders of magnitude below the current
| state of the art for the same class of model.
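|
| (Quick arithmetic with the Chinchilla rule of thumb of roughly
| 20 tokens per parameter, against the SmolLM-style figures above.)
|
      params = 1.7e9
      chinchilla_tokens = 20 * params      # ~34B "compute-optimal" tokens
      actual_tokens = 11e12                # 11T tokens actually used
      print(actual_tokens / chinchilla_tokens)   # ~320x past that point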
|
| Until very recently (the first llama paper IIRC, and people
| noticing that the 7B model showed no sign of saturation
| during its already very long training) the ML community
| vastly underestimated the amount of training data that was
| needed to make an LLM perform at its potential.
|
| [1]: https://github.com/jzhang38/TinyLlama
| kleiba wrote:
| Answering to people arguing against my comment: you guys do not
| seem to take into account that the technical circumstances were
| totally different thirty, twenty or even ten years ago! People
| would have liked to train with more data, and there was a big
| interest in combining heterogeneous datasets to achieve exactly
| that. But one major problem was the compute! There weren't any
| pretrained models that you specialized in one way or the other
| - you always retrained from scratch. I mean, even today, who's
| got the capability to train a multibillion-parameter GPT from
| scratch?
| And not just retraining once a tried and trusted
| architecture+dataset, no, I mean as a research project trying
| to optimize your setup towards a certain goal.
| kccqzy wrote:
| Pre-ImageNet was like pre-2010. Doing ML with massive data
| really wasn't in vogue back then.
| mistrial9 wrote:
| except in Ivory Towers of Google + Facebook
| disgruntledphd2 wrote:
| Even then maybe Google but probably not Facebook. Ads used
| ML but there wasn't that much of it in feed. Like, there
| were a bunch of CV projects that I saw in 2013 that didn't
| use NNs. Three years later, otoh you couldn't find a
| devserver without tripping over an NN along the way.
| sgt101 wrote:
| It's not quite so - we couldn't handle it, and we didn't have
| it, so it was a bit of a non-question.
|
| I started with ML in 1994, I was in a small poor lab - so we
| didn't have state of the art hardware. On the other hand I
| think my experience is fairly representative. We worked with
| data sets on SPARC workstations that were stored in flat files
| and had thousands or sometimes tens of thousands of instances.
| We had problems keeping our data sets on the machines and often
| archived them to tape.
|
| Data came from very deliberate acquisition processes. For
| example I remember going to a field exercise with a particular
| device and directing its use over a period of days in order to
| collect the data that would be needed for a machine learning
| project.
|
| Sometime in the 2000's data started to be generated and
| collected as "exhaust" from various processes. People and
| organisations became instrumented in the sense that their daily
| activities were necessarily captured digitally. For a time this
| data was latent, people didn't really think about using it in
| the way that we think about it now, but by about 2010 it was
| obvious that not only was this data available but we had the
| processing and data systems to use it effectively.
| icf80 wrote:
| logic is data and data is logic
| hollerith wrote:
| The deep learning boom caught deep-learning researchers by
| surprise because deep-learning researchers don't understand their
| craft well enough to predict essential properties of their
| creations.
|
| A model is grown, not crafted like a computer program, which
| makes it hard to predict. (More precisely, a big growth phase
| follows the crafting phase.)
| lynndotpy wrote:
| I was a deep learning researcher. The problem is that accuracy
| (+ related metrics) were prioritized in research and funding.
| Factors like interpretability, extrapolation, efficiency, or
| consistency were not prioritized, but were clearly important
| before being implemented.
|
| Dall-E was the only big surprising consumer model-- 2022 saw a
| sudden huge leap from "txt2img is kind of funny" to "txt2img is
| actually interesting". I would have assumed such a thing could
| only come in 2030 or earlier. But deep learning is full of
| counterintuitive results (like the NFL theorem not mattering,
| or ReLU being better than sigmoid).
|
| But in hindsight, it was naive to think "this does not work
| yet" would get in the way of the products being sold and
| monetized.
| nxobject wrote:
| I'm still very taken aback by how far we've been able to take
| prompting as somehow our universal language to communicate with
| AI of choice.
| TheRealPomax wrote:
| It wasn't "almost everyone", it was straight up everyone.
| vl wrote:
| _So the AI boom of the last 12 years was made possible by three
| visionaries who pursued unorthodox ideas in the face of
| widespread criticism._
|
| I argue that Mikolov's word2vec was instrumental in the current
| AI revolution. It demonstrated the ease of extracting meaning
| from text in a mathematical way and directly led to all the
| advancements we have today with LLMs. And ironically, it didn't
| require a GPU.
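|
| (A hedged sketch of that word2vec-style vector arithmetic,
| assuming gensim; the toy corpus is far too small to give good
| analogies, it only shows the shape of the idea.)
|
      from gensim.models import Word2Vec

      sentences = [["king", "rules", "the", "kingdom"],
                   ["queen", "rules", "the", "kingdom"],
                   ["man", "walks"], ["woman", "walks"]]
      model = Word2Vec(sentences, vector_size=32, min_count=1, epochs=50)
      # The classic analogy query: king - man + woman ~= queen
      print(model.wv.most_similar(positive=["king", "woman"],
                                  negative=["man"], topn=3))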
| MichaelZuo wrote:
| How much easier was it compared to the next best method at the
| time?
| gregw2 wrote:
| The article credits two academics (Hinton, Fei Fei Li) and a CEO
| (Jensen Huang). But really it was three academics.
|
| Jensen Huang, reasonably, was desperate for any market that could
| suck up more compute, which he could pivot to from GPUs for
| gaming when gaming saturated its ability to use compute. Screen
| resolutions and visible polygons and texture maps only demand so
| much compute; it's an S-curve like everything else. So from a
| marketing/market-development and capital investment perspective I
| do think he deserves credit. Certainly the Intel guys struggled
| to similarly recognize it (and to execute even on plain GPUs.)
|
| But... the technical/academic insight of the CUDA/GPU vision in
| my view came from Ian Buck's "Brook" PhD thesis at Stanford under
| Pat Hanrahan (Pixar+Tableau co-founder, Turing Award Winner) and
| Ian promptly took it to Nvidia where it was commercialized under
| Jensen.
|
| For a good telling of this under-told story, see one of
| Hanrahan's lectures at MIT:
| https://www.youtube.com/watch?v=Dk4fvqaOqv4
|
| Corrections welcome.
| markhahn wrote:
| Jensen embraced AI as a way to recover TAM after ASICs took
| over crypto mining. You can see that in-between period in Nvidia
| revenue and profit graphs.
|
| By that time, GP-GPU had been around for a long, long time.
| CUDA still doesn't have much to do with AI - sure, it supports
| AI usage, even includes some AI-specific features (low-mixed
| precision blocked operations).
| cameldrv wrote:
| Jensen embraced AI way before that. CuDNN was released back
| in 2014. I remember being at ICLR in 2015, and there were
| three companies with booths: Google and Facebook who were
| recruiting, and NVIDIA was selling a 4 GPU desktop computer.
| dartos wrote:
| Well as soon as matmul has a marketable use (ML predictive
| algorithms) nvidia was on top of it.
|
| I don't think they were thinking of LLMs in 2014, tbf.
| aleph_minus_one wrote:
| > Jensen embraced AI as a way to recover TAM after ASICs took
| over crypto mining.
|
| TAM: Total Addressable Market
| AvAn12 wrote:
| Three legs to the stool - the NN algorithms, the data, and the
| labels. I think the first two are obvious but think about how
| much human time and care went into labeling millions of images...
| fragmede wrote:
| And the compute power!
| hyperific wrote:
| The article mentions Support Vector Machines being the hot topic
| in 2008. Is anyone still using/researching these?
|
| I often wonder how many useful technologies could exist if trends
| went a different way. Where would we be if neural nets hadn't
| caught on and SVMs and expert systems had.
| spencerchubb wrote:
| in insurance we use older statistical methods that are easily
| interpretable, because we are required to have rates approved
| by departments of insurance
___________________________________________________________________
(page generated 2024-11-06 23:02 UTC)