[HN Gopher] Why the deep learning boom caught almost everyone by...
       ___________________________________________________________________
        
       Why the deep learning boom caught almost everyone by surprise
        
       Author : slyall
       Score  : 201 points
       Date   : 2024-11-06 04:05 UTC (18 hours ago)
        
 (HTM) web link (www.understandingai.org)
 (TXT) w3m dump (www.understandingai.org)
        
       | arcmechanica wrote:
       | It was basically useful to average people and wasn't just some
       | way to steal and resell data or dump ad after ad on us. A lot of
       | dark patterns really ruin services.
        
       | teknover wrote:
       | "Nvidia invented the GPU in 1999" wrong on many fronts.
       | 
       | Arguably the November 1996 launch of 3dfx kickstarted GPU
       | interest and OpenGL.
       | 
       | After reading that, it's hard to take author seriously on the
       | rest of the claims.
        
         | ahofmann wrote:
         | Wow, that is harsh. The quoted claim is in the middle of a very
         | long article. The background of the author seems to be more on
         | the scientific side, than the technical side. So throw out
         | everything, because the author got one (not very important)
         | date wrong?
        
           | RicoElectrico wrote:
           | Revisionist marketing should not be given a free pass.
        
             | twelve40 wrote:
             | yet it's almost the norm these days. Sick of hearing Steve
             | Jobs invented smartphones when I personally was using a
             | device with web and streaming music years before that.
        
               | kragen wrote:
               | You don't remember when Bill Gates and AOL invented the
               | internet, Apple invented the GUI, and Tim Berners-Lee
               | invented hypertext?
        
         | santoshalper wrote:
         | Possibly technically correct, but utterly irrelevant. The 3dfx
         | chips accelerated parts of the 3d graphics pipeline and were
         | not general-purpose programmable computers the way a modern GPU
         | is (and thus would be useless for deep learning or any other
         | kind of AI).
         | 
         | If you are going to count 3dfx as a proper GPU and not just a
         | geometry and lighting accelerator, then you might as well go
         | back further and count things like the SGI Reality Engine.
         | Either way, 3dfx wasn't really first to anything meaningful.
        
           | FeepingCreature wrote:
           | But the first NVidia GPUs didn't have general-purpose compute
           | either. Google informs me that the first GPU with user-
           | programmable shaders was the GeForce 3 in 2001.
        
         | rramadass wrote:
          | After actually having read the article, I can say that your
         | comment is unnecessarily negative and clueless.
         | 
         | The article is a very good historical one showing how 3
         | important things came together to make the current progress
          | possible, viz.:
         | 
         | 1) Geoffrey Hinton's back-propagation algorithm for deep neural
         | networks
         | 
         | 2) Nvidia's GPU hardware used via CUDA for AI/ML and
         | 
         | 3) Fei-Fei Li's huge ImageNet database to train the algorithm
          | on the hardware. Her team actually used Amazon Mechanical
          | Turk (AMT) to label the massive dataset of 14 million images.
         | 
          | Excerpts:
         | 
         |  _"Pre-ImageNet, people did not believe in data," Li said in a
         | September interview at the Computer History Museum. "Everyone
         | was working on completely different paradigms in AI with a tiny
         | bit of data."_
         | 
         |  _"That moment was pretty symbolic to the world of AI because
         | three fundamental elements of modern AI converged for the first
         | time," Li said in a September interview at the Computer History
         | Museum. "The first element was neural networks. The second
         | element was big data, using ImageNet. And the third element was
         | GPU computing."_
        
         | Someone wrote:
          | I would not call it "invent", but it seems Nvidia defined the
         | term _GPU_. See https://www.britannica.com/technology/graphics-
         | processing-un... and
         | https://en.wikipedia.org/wiki/GeForce_256#Architecture:
         | 
         |  _"GeForce 256 was marketed as "the world's first 'GPU', or
         | Graphics Processing Unit", a term Nvidia defined at the time as
         | "a single-chip processor with integrated transform, lighting,
         | triangle setup/clipping, and rendering engines that is capable
         | of processing a minimum of 10 million polygons per second""_
         | 
          | They may have been the first to bring a product that fitted
          | that definition to market.
        
           | kragen wrote:
           | That sounds like marketing wank, not a description of an
           | invention.
           | 
           | I don't think you can get a speedup by running neural
           | networks on the GeForce 256, and the features listed there
           | aren't really relevant (or arguably even _present_ ) in
           | today's GPUs. As I recall, people were trying to figure out
           | how to use GPUs to get faster processing in their Beowulfs in
           | the late 90s and early 21st century, but it wasn't until
           | about 02005 that anyone could actually get a speedup. The
           | PlayStation 3's "Cell" was a little more flexible.
        
         | KevinMS wrote:
         | Can confirm. I was playing Unreal on my dual Voodoo2 SLI rig
         | back in 1998.
        
         | kragen wrote:
         | Arguably the November 01981 launch of Silicon Graphics
         | kickstarted GPU interest and OpenGL. You can read Jim Clark's
         | 01982 paper about the Geometry Engine in https://web.archive.or
         | g/web/20170513193926/http://excelsior..... His first key point
         | in the paper was that the chip had a "general instruction set",
         | although what he meant by it was quite different from today's
         | GPUs. IRIS GL started morphing into OpenGL in 01992, and
         | certainly when I went to SIGGRAPH 93 it was full of hardware-
         | accelerated 3-D drawn with OpenGL on Silicon Graphics Hardware.
         | But graphics coprocessors date back to the 60s; Evans &
         | Sutherland was founded in 01968.
         | 
         | I mean, I certainly don't think NVIDIA invented the GPU--that's
         | a clear error in an otherwise pretty decent article--but it was
         | a pretty gradual process.
        
         | binarybits wrote:
         | Defining who "really" invented something is often tricky. For
         | example I mentioned in the article that there is some dispute
         | about who discovered backpropagation. A
         | 
         | According to Wikipedia, Nvidia released its first product, the
         | RV1, in November 1995, the same month 3dfx released its first
         | Voodoo Graphics 3D chip. Is there reason to think the 3dfx card
         | was more of a "true" GPU than the RV1? If not, I'd say Nvidia
         | has as good a claim to inventing the GPU as 3dfx does.
        
           | in3d wrote:
           | NV1, not RV1.
           | 
           | 3dfx Voodoo cards were initially more successful, but I don't
           | think anything not actually used for deep learning should
           | count.
        
       | vdvsvwvwvwvwv wrote:
        | Lesson: ignore detractors. Especially if their argument is
        | "don't be a tall poppy".
        
         | jakeNaround wrote:
         | The lesson is reality is not the dialectics and symbolic logic
         | but all the stuff in it.
         | 
         | Study story problems and you end up with string theory. Study
         | data computed from endless world of stuff, find utility.
         | 
         | What a shock building the bridge is more useful than a drawer
         | full of bridge designs.
        
           | aleph_minus_one wrote:
           | > What a shock building the bridge is more useful than a
           | drawer full of bridge designs.
           | 
           | Here, opinions will differ.
        
         | psd1 wrote:
         | Also: look for fields that have stagnated, where progress is
         | enabled by apparently-unrelated innovations elsewhere
        
         | xanderlewis wrote:
         | Unfortunately, they're usually right. We just don't hear about
         | all the time wasted.
        
           | blitzar wrote:
           | On several occasions I have heard "they said it couldn't be
           | done" - only to discover that yes it is technically correct,
           | however, "they" was on one random person who had no clue and
           | anyone with any domain knowledge said it was reasonable.
        
             | friendzis wrote:
             | Usually when I hear "they said it couldn't be done", it is
             | used as triumphant downplay of legitimate critique. If you
             | dig deeper that "couldn't be done" usually is in relation
             | to some constraints or performance characteristics, which
             | the "done" thing still does not meet, but the goalposts
             | have already been moved.
        
               | Ukv wrote:
               | > that "couldn't be done" usually is in relation to some
               | constraints or performance characteristics, which the
               | "done" thing still does not meet
               | 
               | I'd say theoretical proofs of impossibility tend to make
               | valid logical deductions within the formal model they set
               | up, but the issue is that model often turns out to be a
               | deficient representation of reality.
               | 
               | For instance, Minsky and Papert's Perceptrons book,
               | credited in part with prompting the 1980s AI winter,
               | gives a valid mathematical proof about inability of
               | networks within their framework to represent the XOR
               | function. This function is easily solved by multilayer
               | neural networks, but Minsky/Papert considered those to be
               | a "sterile" extension and believed neural networks
               | trained by gradient descent would fail to scale up.
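                | 
                | As a minimal illustration of that point (my own sketch,
                | not from the book): with a single hidden layer and
                | hand-picked weights, XOR falls out immediately, which is
                | exactly what the single-layer framework cannot represent.
                | 
                |   import numpy as np
                | 
                |   step = lambda v: (v > 0).astype(int)
                | 
                |   # Hidden units act like OR and AND; the output is
                |   # OR AND NOT(AND), i.e. XOR.
                |   W1 = np.array([[1, 1], [1, 1]])
                |   b1 = np.array([-0.5, -1.5])
                |   W2 = np.array([1, -1])
                |   b2 = -0.5
                | 
                |   for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
                |       h = step(np.array(x) @ W1.T + b1)
                |       y = step(h @ W2 + b2)
                |       print(x, "->", int(y))   # 0, 1, 1, 0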
               | 
               | Or more contemporary, Gary Marcus has been outspoken
               | since 2012 that deep learning is hitting a wall - giving
               | the example that a dense network trained on just `1000 ->
               | 1000`, `0100 -> 0100`, `0010 -> 0010` can't then reliably
               | predict `0001 -> 0001` because the fourth output neuron
               | was never activated in training. Similarly, this function
               | is easily solved by transformers representing
               | input/output as a sequence of tokens thus not needing to
               | light up an untrained neuron to give the answer (nor do
               | humans when writing/speaking the answer).
               | 
               | If I claimed that it was topologically impossible to
               | drink a Capri-Sun, then someone comes along and punctures
               | it with a straw (an unaccounted for advancement from the
               | blindspot of my model), I could maybe cling on and argue
               | that my challenge remains technically true and unsolved
               | because that violates one of the constraints I set out -
               | but at the very least the relevance of my proof to
               | reality has diminished and it may no longer support the
               | viewpoints/conclusions I intended it to ("don't buy
               | Capri-Sun"). Not to say that theoretical results can't
               | still be interesting in their own right - like the
               | halting problem, which does not apply to real computers.
        
               | marcosdumay wrote:
               | It's extremely common that legitimate critique gets used
               | to illegitimately attack people doing things differently
               | enough that the relative importance of several factors
               | change.
               | 
               | This is really, really common. And it's done both by
               | mistake and in bad faith. In fact, it's a guarantee that
               | once anybody tries anything different enough, they'll be
               | constantly attacked this way.
        
           | vdvsvwvwvwvwv wrote:
           | What if the time wasted is part of the search? The hive wins
           | but a bee may not. (Capitalism means some bees win too)
        
             | xanderlewis wrote:
             | It is. But most people are not interested in simply being
             | 'part of the search' -- they want a career, and that relies
             | on individual success.
        
       | madaxe_again wrote:
       | I can't be the only one who has watched this all unfold with a
       | sense of inevitability, surely.
       | 
       | When the first serious CUDA based ML demos started appearing a
       | decade or so ago, it was, at least to me, pretty clear that this
       | would lead to AGI in 10-15 years - and here we are. It was the
       | same sort of feeling as when I first saw the WWW aged 11, and
       | knew that this was going to eat the world - and here we are.
       | 
       | The thing that flummoxes me is that now that we are so obviously
       | on this self-reinforcing cycle, how many are still insistent that
       | AI will amount to nothing.
       | 
       | I am reminded of how the internet was just a fad - although this
       | is going to have an even greater impact on how we live, and our
       | economies.
        
         | BriggyDwiggs42 wrote:
         | Downvoters are responding to a perceived arrogance. What does
         | agi mean to you?
        
           | nineteen999 wrote:
           | Could be arrogance, or could be the delusion.
        
             | madaxe_again wrote:
             | Why is it a delusion, in your opinion?
        
               | andai wrote:
               | It's a delusion on the part of the downvoters.
        
             | BriggyDwiggs42 wrote:
             | Indeed, it sure could be arrogance.
        
         | xen0 wrote:
         | What makes you think AGI is either here or imminent?
         | 
         | For me the current systems still clearly fall short of that
         | goal.
        
           | madaxe_again wrote:
           | They do fall short, but progress in this field is not linear.
           | This is the bit that I struggle to comprehend - that which
           | was literally infeasible only a few years ago is now mocked
           | and derided.
           | 
           | It's like jet engines and cheap intercontinental travel
           | becoming an inevitability once the rubicon of powered flight
           | is crossed - and everyone bitching about the peanuts while
           | they cruise at inconceivable speed through the atmosphere.
        
             | diffeomorphism wrote:
             | Just like supersonic travel between Europe and America
             | becoming common place was inevitable. Oh, wait.
             | 
             | Optimism is good, blind optimism isn't.
        
               | madaxe_again wrote:
               | It _is_ yet inevitable - but it wasn't sustainable in the
               | slightest when it was first attempted - Concorde was akin
               | to the Apollo programme, in being precocious and
               | prohibitively expensive due to the technical limitations
               | of the time. It will, ultimately, be little more
               | remarkable than flying is currently, even as we hop
               | around on suborbital trajectories.
               | 
               | It isn't a question of optimism - in fact, I am deeply
               | pessimistic as to what ML will mean for humanity as a
               | whole, at least in the short term - it's a question of
               | seeing the features of a confluence of technology, will,
               | and knowledge that has in the past spurred technical
               | revolution.
               | 
               | Newcomen was far from the first to develop a steam
               | engine, but there was suddenly demand for such beasts, as
               | shallow mines became exhausted, and everything else
               | followed from that.
               | 
               | ML has been around in one form or another for decades now
               | - however we are now at the point where the technology
               | exists, insofar as modern GPUs exist, the will exists,
               | insofar as trillions of dollars of investment flood into
               | the space, and the knowledge exists, insofar as we have
               | finally produced machine learning models which are non-
               | trivial.
               | 
               | Just as with powered flight, the technology - the
               | internal combustion engine - had to be in place, as did
               | the will (the First World War), and the knowledge, which
               | we had possessed for centuries but had no means or will
               | to act upon. The idea was, in fact, _ridiculous_. Nobody
               | could see the utility - until someone realised you could
               | use them to drop ordnance on your enemies.
               | 
               | With supersonic flight - the technology is emerging, the
               | will will be provided by the substantial increase in
               | marginal utility provided by sub-hour transit compared to
               | the relatively small improvement Concorde offered, and
               | the knowledge, again, we already have.
               | 
               | So no, not optimism - just observance of historical
               | forces. When you put these things together, there tend to
               | be technical revolutions, and resultant societal change.
        
               | xen0 wrote:
               | > It is yet inevitable
               | 
               | The cool thing for you is that this statement is
               | unfalsifiable.
               | 
               | But what I really took issue with was your timeframe in
               | the original comment. Your statements imply you fully
               | expect AGI to be a thing within a couple of years. I do
               | not.
        
               | marcosdumay wrote:
               | > insofar as modern GPUs exist, the will exists, insofar
               | as trillions of dollars of investment flood into the
               | space
               | 
                | Funny thing. I expect deep learning to have a really bad
                | next decade, people betting on it to see quick bad
                | results, and maybe even for it to disappear from the
                | visible economy for a while, exactly because there have
                | been hundreds of billions of dollars invested.
               | 
                | That is no fault of the technology, which I expect to
                | have some usefulness in the long term. I expect a really
                | bad reaction to it, coming exclusively from the excessive
                | hype.
        
         | oersted wrote:
         | What do you think is next?
        
           | madaxe_again wrote:
           | An unravelling, as myriad possibilities become actualities.
           | The advances in innumerate fields that ML will unlock will
           | have enormous impacts.
           | 
           | Again, I cannot understand for the life of me how people
           | cannot see this.
        
             | alexander2002 wrote:
              | I had a hypothesis once, and it is probably 1000% wrong,
              | but I will state it here. /// Once computers can talk to
              | other computers over the network in a human-friendly way
              | <abstraction by llm>, such that these entities completely
              | control the interfaces which we humans can easily use,
              | effectively and with multi-modality, then I think there is
              | a slight chance "I" might believe there is AGI, or at
              | least some indications of it.
        
               | marcosdumay wrote:
               | It's unsettling how the Turing Test turned out to be so
               | independent of AGI, isn't it?
        
             | selimthegrim wrote:
             | Innumerable?
        
       | 2sk21 wrote:
       | I'm surprised that the article doesn't mention that one of the
       | key factors that enabled deep learning was the use of RELU as the
       | activation function in the early 2010s. RELU behaves a lot better
       | than the logistic sigmoid that we used until then.
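        | 
        | A small numeric sketch of why (my own illustration): the logistic
        | sigmoid's gradient collapses toward zero away from the origin,
        | while ReLU's gradient stays at exactly 1 for any positive input,
        | so deep stacks of layers keep a usable training signal.
        | 
        |   import numpy as np
        | 
        |   x = np.array([-6.0, -2.0, 0.0, 2.0, 6.0])
        | 
        |   sig = 1 / (1 + np.exp(-x))
        |   d_sig = sig * (1 - sig)          # peaks at 0.25, ~0 in the tails
        | 
        |   d_relu = (x > 0).astype(float)   # exactly 1 for positive inputs
        | 
        |   print(np.round(d_sig, 4))        # ~[0.0025 0.105 0.25 0.105 0.0025]
        |   print(d_relu)                    # [0. 0. 0. 1. 1.]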
        
         | sanxiyn wrote:
         | Geoffrey Hinton (now a Nobel Prize winner!) himself did a
         | summary. I think it is the single best summary on this topic.
          | 
          |   Our labeled datasets were thousands of times too small.
          |   Our computers were millions of times too slow.
          |   We initialized the weights in a stupid way.
          |   We used the wrong type of non-linearity.
        
           | imjonse wrote:
           | That is a pithier formulation of the widely accepted summary
           | of "more data + more compute + algo improvements"
        
             | sanxiyn wrote:
              | No, it isn't. It emphasizes the importance of Glorot
              | initialization and ReLU.
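              | 
              | A minimal sketch of Glorot (Xavier) uniform initialization
              | as I understand it (the layer sizes are just example
              | numbers): weights are scaled by the layer's fan-in and
              | fan-out so activation and gradient variance stay roughly
              | constant from layer to layer, instead of the fixed-scale
              | random init Hinton calls "stupid".
              | 
              |   import numpy as np
              | 
              |   def glorot_uniform(fan_in, fan_out, seed=0):
              |       rng = np.random.default_rng(seed)
              |       limit = np.sqrt(6.0 / (fan_in + fan_out))
              |       return rng.uniform(-limit, limit,
              |                          size=(fan_in, fan_out))
              | 
              |   W = glorot_uniform(784, 256)
              |   print(W.std())   # ~= sqrt(2 / (784 + 256)) ~= 0.044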
        
         | cma wrote:
         | As compute has outpaced memory bandwidth most recent stuff has
         | moved away from ReLU. I think Llama 3.x uses SwiGLU. Still
         | probably closer to ReLU than logistic sigmoid, but it's back to
         | being something more smooth than ReLU.
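          | 
          | A rough sketch of a SwiGLU feed-forward gate (my reading of
          | the Llama-style block; the projection sizes are made up): a
          | smooth SiLU-activated projection gates a plain linear
          | projection elementwise, so it behaves ReLU-like for positive
          | inputs but is smooth everywhere.
          | 
          |   import numpy as np
          | 
          |   def silu(v):
          |       return v / (1 + np.exp(-v))   # smooth, unlike ReLU
          | 
          |   def swiglu(x, W, V):
          |       return silu(x @ W) * (x @ V)  # gated feed-forward
          | 
          |   rng = np.random.default_rng(0)
          |   x = rng.normal(size=(4, 8))           # 4 tokens, dim 8
          |   W, V = rng.normal(size=(2, 8, 16))    # two projections
          |   print(swiglu(x, W, V).shape)          # (4, 16)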
        
           | 2sk21 wrote:
           | Indeed, there have been so many new activation functions that
           | I have stopped following the literature after I retired. I am
           | glad to see that people are trying out new things.
        
       | DeathArrow wrote:
       | I think neural nets are just a subset of machine learning
       | techniques.
       | 
       | I wonder what would have happened if we poured the same amount of
       | money, talent and hardware into SVMs, random forests, KNN, etc.
       | 
       | I don't say that transformers, LLMs, deep learning and other
       | great things that happened in the neural network space aren't
       | very valuable, because they are.
       | 
       | But I think in the future we should also study other options
       | which might be better suited than neural networks for some
       | classes of problems.
       | 
       | Can a very large and expensive LLM do sentiment analysis or
       | classification? Yes, it can. But so can simple SVMs and KNN and
       | sometimes even better.
       | 
       | I saw some YouTube coders doing calls to OpenAI's o1 model for
       | some very simple classification tasks. That isn't the best tool
       | for the job.
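        | 
        | For the classification point, a minimal sketch (assuming
        | scikit-learn is available; the four strings are toy stand-ins
        | for a real labeled corpus): a linear SVM over TF-IDF features
        | handles simple sentiment tasks locally, with no API call at all.
        | 
        |   from sklearn.pipeline import make_pipeline
        |   from sklearn.feature_extraction.text import TfidfVectorizer
        |   from sklearn.svm import LinearSVC
        | 
        |   texts = ["great product", "terrible support",
        |            "loved it", "waste of money"]
        |   labels = [1, 0, 1, 0]
        | 
        |   clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
        |                       LinearSVC())
        |   clf.fit(texts, labels)
        |   print(clf.predict(["loved the product"]))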
        
         | Meloniko wrote:
         | And based on what though do you think that?
         | 
         | I think neural networks are fundamental and we will
         | focus/experiment a lot more with architecture, layers and other
          | parts involved, but emergent features arise through size.
        
         | mentalgear wrote:
         | KANs (Kolmogorov-Arnold Networks) are one example of a
         | promising exploration pathway to real AGI, with the advantage
          | of full explainability.
        
           | astrange wrote:
           | "Explainable" is a strong word.
           | 
           | As a simple example, if you ask a question and part of the
           | answer is directly quoted from a book from memory, that text
           | is not computed/reasoned by the AI and so doesn't have an
           | "explanation".
           | 
           | But I also suspect that any AGI would necessarily produce
           | answers it can't explain. That's called intuition.
        
             | diffeomorphism wrote:
             | Why? If I ask you what the height of the Empire State
             | Building is, then a reference is a great, explainable
             | answer.
        
               | astrange wrote:
               | It wouldn't be a reference; "explanation" for an LLM
               | means it tells you which of its neurons were used to
               | create the answer, ie what internal computations it did
               | and which parts of the input it read. Their architecture
               | isn't capable of referencing things.
               | 
               | What you'd get is an explanation saying "it quoted this
               | verbatim", or possibly "the top neuron is used to output
               | the word 'State' after the word 'Empire'".
               | 
               | You can try out a system here:
               | https://monitor.transluce.org/dashboard/chat
               | 
               | Of course the AI could incorporate web search, but then
               | what if the explanation is just "it did a web search and
               | that was the first result"? It seems pretty difficult to
               | recursively make every external tool also explainable...
        
               | Retric wrote:
               | LLM's are not the only possible option here. When talking
               | about AGI none of what we are doing is currently that
               | promising.
               | 
               | The search is for something that can write an essay,
               | drive a car, and cook lunch so we need something new.
        
               | Vampiero wrote:
               | When people talk about explainability I immediately think
               | of Prolog.
               | 
               | A Prolog query is explainable precisely because, by
               | construction, it itself is the explanation. And you can
               | go step by step and understand how you got a particular
               | result, inspecting each variable binding and predicate
               | call site in the process.
               | 
               | Despite all the billions being thrown at modern ML, no
               | one has managed to create a model that does something
               | like what Prolog does with its simple recursive
               | backtracking.
               | 
               | So the moral of the story is that you can 100% trust the
               | result of a Prolog query, but you can't ever trust the
               | output of an LLM. Given that, which technology would you
                | rather use to build software on which lives depend?
               | 
               | And which of the two methods is more "artificially
               | intelligent"?
        
               | astrange wrote:
               | The site I linked above does that for LLaMa 8B.
               | 
               | https://transluce.org/observability-interface
               | 
               | LLMs don't have enough self-awareness to produce really
               | satisfying explanations though, no.
        
               | diffeomorphism wrote:
               | Then you should have a stronger notion of "explanation".
               | Why were these specific neurons activated?
               | 
               | Simplest example: OCR. A network identifying digits can
               | often be explained as recognizing lines, curves, numbers
               | of segments etc.. That is an explanation, not "computer
               | says it looks like an 8"
        
               | krisoft wrote:
               | But can humans do that? If you show someone a picture of
               | a cat, can they "explain" why is it a cat and not a dog
               | or a pumpkin?
               | 
                | And is that explanation the way they obtained the
                | "cat-ness" of the picture, or do they just see that it is
               | a cat immediately and obviously and when you ask them for
               | an explanation they come up with some explaining noises
               | until you are satisfied?
        
               | diffeomorphism wrote:
               | Wild cat, house cat, lynx,...? Sure, they can. They will
               | tell you about proportions, shape of the ears, size as
               | compared to other objects in the picture etc.
               | 
               | For cat vs pumpkin they will think you are making fun of
               | them, but it very much is explainable. Though now I am
               | picturing a puzzle about finding orange cats in a picture
               | of a pumpkin field.
        
               | fragmede wrote:
               | Shown a picture of a cloud, why it looks like a cat does
               | sometimes need an explanation until others can see the
               | cat, and it's not just "explaining noises".
        
         | trhway wrote:
         | >I wonder what would have happened if we poured the same amount
         | of money, talent and hardware into SVMs, random forests, KNN,
         | etc.
         | 
         | people did that to horses. No car resulted from it, just
         | slightly better horses.
         | 
         | >I saw some YouTube coders doing calls to OpenAI's o1 model for
         | some very simple classification tasks. That isn't the best tool
         | for the job.
         | 
         | This "not best tool" is just there for the coders to call while
         | the "simple SVMs and KNN" would require coding and training by
         | those coders for the specific task they have at hand.
        
           | guappa wrote:
           | [citation needed]
        
         | empiko wrote:
         | Deep learning is easy to adapt to various domains, use cases,
         | training criteria. Other approaches do not have the flexibility
         | of combining arbitrary layers and subnetworks and then training
         | them with arbitrary loss functions. The depth in deep learning
         | is also pretty important, as it allows the model to create
         | hierarchical representations of the inputs.
        
           | f1shy wrote:
            | But it is very hard to validate for important or critical
            | applications.
        
         | jasode wrote:
         | _> I wonder what would have happened if we poured the same
         | amount of money, talent and hardware into SVMs, random forests,
         | KNN, etc._
         | 
          | But that's backwards from how new techniques and progress are
          | made. What actually happens is somebody (maybe a student at a
          | university) has an _insight or new idea for an algorithm_
          | that's near $0 cost to implement as a proof-of-concept. Then
          | everybody else notices the improvement and extra
          | millions/billions get directed toward it.
         | 
         | New ideas -- that didn't cost much at the start -- ATTRACT the
         | follow on billions in investments.
         | 
         | This timeline of tech progress in computer science is the
         | opposite from other disciplines such as materials science or
         | bio-medical fields. Trying to discover the next super-alloy or
         | cancer drug all requires expensive experiments. Manipulating
         | atoms & molecules requires very expensive specialized
         | equipment. In contrast, computer science experiments can be
         | cheap. You just need a clever insight.
         | 
         | An example of that was the 2012 AlexNet image recognition
         | algorithm that blew all the other approaches out of the water.
         | Alex Krizhevsky had an new insight on a convolutional neural
         | network to run on CUDA. He bought 2 NVIDIA cards (GTX580 3GB
         | GPU) from Amazon. It didn't require NASA levels of investment
         | at the start to implement his idea. Once everybody else noticed
         | his superior results, the billions began pouring in to
         | iterate/refine on CNNs.
         | 
         | Both the "attention mechanism" and the refinement of
         | "transformer architecture" were also cheap to prove out at a
         | very small scale. In 2014, Jakob Uszkoreit thought about an
         | "attention mechanism" instead of RNN and LSTM for machine
         | translation. It didn't cost billions to come up with that idea.
         | Yes, ChatGPT-the-product cost billions but the "attention
         | mechanism algorithm" did not.
         | 
         |  _> into SVMs, random forests, KNN, etc._
         | 
         | If anyone has found an unknown insight into SVM, KNN, etc that
         | everybody else in the industry has overlooked, they can do
         | cheap experiments to prove it. E.g. The entire Wikipedia text
         | download is currently only ~25GB. Run the new SVM
         | classification idea on that corpus. Very low cost experiments
         | in computer science algorithms can still be done in the
         | proverbial "home garage".
        
           | FrustratedMonky wrote:
           | "$0 cost to implement a proof-of concept"
           | 
           | This falls apart for breakthroughs that are not zero cost to
            | do a proof-of-concept.
           | 
            | I think that is what the parent is referring to: that other
           | technologies might have more potential, but would take money
           | to build out.
        
           | scotty79 wrote:
            | Do the transformer architecture and attention mechanisms
            | actually give any benefit to anything other than
            | scalability?
            | 
            | I thought the main insights were embeddings, positional
            | encoding and shortcuts through layers to improve
            | backpropagation.
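            | 
            | On the positional-encoding piece, a small sketch of the
            | sinusoidal scheme from the original transformer paper (my
            | simplified version, with sines and cosines concatenated
            | rather than interleaved): each position gets a distinct
            | pattern, so the otherwise order-blind attention layers can
            | tell tokens apart.
            | 
            |   import numpy as np
            | 
            |   def positions(seq_len, dim):
            |       pos = np.arange(seq_len)[:, None]
            |       i = np.arange(dim // 2)[None, :]
            |       angle = pos / (10000 ** (2 * i / dim))
            |       return np.concatenate([np.sin(angle),
            |                              np.cos(angle)], axis=-1)
            | 
            |   print(positions(seq_len=8, dim=16).shape)   # (8, 16)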
        
           | DeathArrow wrote:
           | True, you might not need lots of money to test some ideas.
           | But LLMs and transformers are all the rage so they gather all
           | attention and research funds.
           | 
           | People don't even think of doing anything else and those that
           | might do, are paid to pursue research on LLMs.
        
         | edude03 wrote:
         | Transformers were made for machine translation - someone had
         | the insight that when going from one language to another the
         | context mattered such that the tokens that came before would
         | bias which ones came after. It just so happened that
          | transformers were more performant on other tasks, and at the time
         | you could demonstrate the improvement on a small scale.
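          | 
          | A minimal sketch of the attention step being described (single
          | head, no masking and no learned projections, just to show the
          | shape of the idea): each position's output is a weighted mix
          | of all the value vectors, with weights set by how strongly its
          | query matches every key - which is how earlier tokens get to
          | bias later ones.
          | 
          |   import numpy as np
          | 
          |   def attention(Q, K, V):
          |       s = Q @ K.T / np.sqrt(K.shape[-1])
          |       w = np.exp(s - s.max(axis=-1, keepdims=True))
          |       w = w / w.sum(axis=-1, keepdims=True)  # softmax
          |       return w @ V
          | 
          |   rng = np.random.default_rng(0)
          |   Q = K = V = rng.normal(size=(5, 16))   # 5 tokens, dim 16
          |   print(attention(Q, K, V).shape)        # (5, 16)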
        
         | ldjkfkdsjnv wrote:
          | This is such a terrible opinion. I'm so tired of reading the
          | LLM deniers.
        
         | f1shy wrote:
         | > neural nets are just a subset of machine learning techniques.
         | 
         | Fact by definition
        
         | dr_dshiv wrote:
         | The best tool for the job is, I'd argue, the one that does the
         | job most reliably for the least amount of money. When you
         | consider how little expertise or data you need to use openai
         | offerings, I'd be surprised if sentiment analysis using
         | classical ML methods are actually better (unless you are an
         | expert and have a good dataset).
        
         | jensgk wrote:
         | > I wonder what would have happened if we poured the same
         | amount of money, talent and hardware into SVMs, random forests,
         | KNN, etc.
         | 
         | From my perspective, that is actually what happened between the
          | mid-90s and 2015. Neural networks were dead in that period, but
         | any other ML method was very, very hot.
        
       | macrolime wrote:
       | I took some AI courses around the same time as the author, and I
       | remember the professors were actually big proponents of neural
       | nets, but they believed the missing piece was some new genius
       | learning algorithm rather than just scaling up with more data.
        
         | rramadass wrote:
         | > rather than just scaling up with more data.
         | 
         | That was the key takeaway for me from this article. I didn't
         | know of Fei-Fei Li's ImageNet contribution which actually gave
         | all the other researchers the essential data to train with. Her
         | intuition that more data would probably make the accuracy of
          | existing algorithms better is, I think, very much
          | underappreciated.
         | 
          | Key excerpt:
         | 
         |  _So when she got to Princeton, Li decided to go much bigger.
         | She became obsessed with an estimate by vision scientist Irving
         | Biederman that the average person recognizes roughly 30,000
         | different kinds of objects. Li started to wonder if it would be
         | possible to build a truly comprehensive image dataset--one that
         | included every kind of object people commonly encounter in the
         | physical world._
        
       | aithrowawaycomm wrote:
       | I think there is a slight disconnect here between making AI
       | systems which are smart and AI systems which are _useful._ It's a
       | very old fallacy in AI: pretending tools which _assist_ human
       | intelligence by solving human problems must themselves be
       | intelligent.
       | 
       | The utility of big datasets was indeed surprising, but that
       | skepticism came about from recognizing the scaling paradigm
       | _must_ be a dead end: vertebrates across the board require less
       | data to learn new things, by several orders of magnitude. Methods
       | to give ANNs "common sense" are essentially identical to the old
       | LISP expert systems: hard-wiring the answers to specific common-
       | sense questions in either code or training data, even though fish
       | and lizards can rapidly make common-sense deductions about
       | manmade objects they couldn't have possibly seen in their
       | evolutionary histories. Even spiders have generalization
       | abilities seemingly absent in transformers: they spin webs inside
       | human homes with unnatural geometry.
       | 
       | Again it is surprising that the ImageNet stuff worked as well as
       | it did. Deep learning is undoubtedly a useful way to build
       | applications, just like Lisp was. But I think we are about as
       | close to AGI as we were in the 80s, since we have made zero
       | progress on common sense: in the 80s we knew Big Data can poorly
       | emulate common sense, and that's where we're at today.
        
         | j_bum wrote:
         | > vertebrates across the board require less data to learn new
         | things, by several orders of magnitude.
         | 
         | Sometimes I wonder if it's fair to say this.
         | 
         | Organisms have had billions of years of training. We might come
         | online and succeed in our environments with very little data,
         | but we can't ignore the information that's been trained into
         | our DNA, so to speak.
         | 
         | What's billions of years of sensory information that drove
         | behavior and selection, if not training data?
        
           | aithrowawaycomm wrote:
           | My primary concern is the generalization to manmade things
           | that couldn't possibly be in the evolutionary "training
           | data." As a thought experiment, it seems very plausible that
           | you can train a transformer ANN on spiderwebs between trees,
           | rocks, bushes, etc, and get "superspider" performance (say in
           | a computer simulation). But I strongly doubt this will
           | generalize to building webs between garages and pantries like
           | actual spiders, no matter how many trees you throw at it, so
           | such a system wouldn't be ASI.
           | 
           | This extends to all sorts of animal cognitive experiments:
           | crows understand simple pulleys simply by inspecting them,
           | but they couldn't have evolved to _use_ pulleys. Mice can
           | quickly learn that hitting a button 5 times will give them a
           | treat: does it make sense to say that they encountered a
           | similar situation in their evolutionary past? It makes more
           | sense to suppose that mice and crows have powerful abilities
           | to reason causally about their actions. These abilities are
           | more sophisticated than mere "Pavlovian" associative
           | reasoning, which is about understanding _stimuli_. With AI we
           | can emulate associative reasoning very well because we have a
           | good mathematical framework for Pavlovian responses as a sort
           | of learning of correlations. But causal reasoning is much
           | more mysterious, and we are very far from figuring out a good
           | mathematical formalism that a computer can make sense of.
           | 
           | I also just detest the evolution = training data metaphor
           | because it completely ignores architecture. Evolution is not
           | just glomming on data, it's trying different types of
           | neurons, different connections between them, etc. All
           | organisms alive today evolved with "billions of years of
           | training," but only architecture explains why we are so much
            | smarter than chimps. In fact I think the "evolution" metaphor
            | preys on our misconception that humans are "more evolved"
            | than chimps, but our common ancestor was more primitive than
            | a chimp.
        
             | visarga wrote:
             | I don't think "humans/animals learn faster" holds. LLMs
             | learn new things on the spot, you just explain it in the
             | prompt and give an example or two.
             | 
             | A recent paper tested both linguists and LLMs at learning a
             | language with less than 200 speakers and therefore
             | virtually no presence on the web. All from a few pages of
             | explanations. The LLMs come close to humans.
             | 
             | https://arxiv.org/abs/2309.16575
             | 
             | Another example is the ARC-AGI benchmark, where the model
             | has to learn from a few examples to derive the rule. AI
             | models are closing the gap to human level, they are around
             | 55% while humans are at 80%. These tests were specifically
             | designed to be hard for models and easy for humans.
             | 
             | Besides these examples of fast learning, I think the other
             | argument about humans benefiting from evolution is also
             | essential here. Similarly, we can't beat AlphaZero at Go,
             | as it evolved its own Go culture and plays better than us.
             | Evolution is powerful.
        
             | car wrote:
             | It's all in the architecture. Also, biological neurons are
             | orders of magnitude more complex than NN's. There's a
             | plethora of neurotransmitters and all kinds of cellular
             | machinery for dealing with signals (inhibitory, excitatory
             | etc.).
        
           | RaftPeople wrote:
           | > _Organisms have had billions of years of training. We might
           | come online and succeed in our environments with very little
           | data, but we can't ignore the information that's been trained
           | into our DNA, so to speak_
           | 
           | It's not just information (e.g. sets of innate smells and
           | response tendencies), but it's also all of the advanced
           | functions built into our brains (e.g. making sense of
           | different types of input, dynamically adapting the brain to
           | conditions, etc.).
        
           | lubujackson wrote:
           | Good point. And don't forget the dynamically changing
           | environment responding with a quick death for any false path.
           | 
           | Like how good would LLMs be if their training set was built
           | by humans responding with an intelligent signal at every
           | crossroads.
        
           | SiempreViernes wrote:
           | This argument mostly just hollows out the meaning of
           | training: evolution gives you things like arms and ears, but
           | if you say evolution is like training you imply that you
           | could have grown a new kind of arm in school.
        
             | horsawlarway wrote:
             | Training an LLM feels almost exactly like evolution - the
             | gradient is "ability to procreate" and we're selecting
             | candidates from related, randomized genetic traits and
             | iterating the process over and over and over.
             | 
             | Schooling/education feels much more like supervised
             | training and reinforcement (and possibly just context).
             | 
             | I think it's dismissive to assume that evolution hasn't
             | influenced how well you're able to pick up new behavior,
             | because it's highly likely it's not entirely novel in the
             | context of your ancestry, and the traits you have that have
             | been selected for.
        
           | marcosdumay wrote:
           | > but we can't ignore the information that's been trained
           | into our DNA
           | 
           | There's around 600MB in our DNA. Subtract this from the size
           | of any LLM out there and see how much you get.
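            | 
            | A back-of-envelope version of that comparison (rough assumed
            | figures, not exact): the human genome is about 3.1 billion
            | base pairs at 2 bits each, versus, say, an 8B-parameter
            | model stored in 16-bit floats.
            | 
            |   genome_gb = 3.1e9 * 2 / 8 / 1e9   # ~0.8 GB raw
            |   llm_gb = 8e9 * 2 / 1e9            # ~16 GB of weights
            |   print(genome_gb, llm_gb)          # 0.775 16.0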
        
         | rjsw wrote:
         | Maybe we just collectively decided that it didn't matter
         | whether the answer was correct or not.
        
           | aithrowawaycomm wrote:
           | Again I do think these things have utility and the
           | unreliability of LLMs is a bit incidental here. Symbolic
           | systems in LISP are highly reliable, but they couldn't
           | possibly be extended to AGI without another component, since
           | there was no way to get the humans out of the loop: someone
           | had to assign the symbols semantic meaning and encode the
           | LISP function accordingly. I think there's a similar
           | conceptual issue with current ANNs, and LLMs in particular:
           | they rely on far too much formal human knowledge to get off
           | the ground.
        
             | rjsw wrote:
             | I meant more why the "boom caught almost everyone by
             | surprise", people working in the field thought that correct
             | answers would be important.
        
             | nxobject wrote:
              | Barring a stunning discovery that stops putting the
              | responsibility for NN intelligence on synthetic training
              | sets, it looks like NNs and symbolic AI may have to
              | coexist, symbiotically.
        
         | spencerchubb wrote:
         | > vertebrates across the board require less data to learn new
         | things
         | 
         | the human brain is absolutely inundated with data, especially
         | from visual, audio, and kinesthetic mediums. the data is a very
         | different form than what one would use to train a CNN or LLM,
         | but it is undoubtedly data. newborns start out literally being
         | unable to see, and they have to develop those neural pathways
         | by taking in the "pixels" of the world for every millisecond of
         | every day
        
         | kirkules wrote:
         | Do you have, offhand, any names or references to point me
         | toward why you think fish and lizards can make rapid common
         | sense deductions about man made objects they couldn't have seen
         | in their evolutionary histories?
         | 
         | Also, separately, I'm only assuming but it seems the reason you
          | think these deductions are different from hard-wired answers is
         | that their evolutionary lineage can't have had to make similar
         | deductions. If that's your reasoning, it makes me wonder if
         | you're using a systematic description of decisions and of the
         | requisite data and reasoning systems to make those decisions,
         | which would be interesting to me.
        
         | aleph_minus_one wrote:
         | > I think there is a slight disconnect here between making AI
         | systems which are smart and AI systems which are _useful_. It's
         | a very old fallacy in AI: pretending tools which _assist_ human
         | intelligence by solving human problems must themselves be
         | intelligent.
         | 
         | I have difficulties understanding why you could even believe in
         | such a fallacy: just look around you: most jobs that have to be
         | done require barely any intelligence, and on the other hand,
         | there exist few jobs that _do_ require an insane amount of
         | intelligence.
        
       | kleiba wrote:
       | _> "Pre-ImageNet, people did not believe in data," Li said in a
       | September interview at the Computer History Museum. "Everyone was
       | working on completely different paradigms in AI with a tiny bit
       | of data."_
       | 
       | That's baloney. The old ML adage "there's no data like more data"
       | is as old as mankind itself.
        
         | FrustratedMonky wrote:
         | Not really. This is referring back to the 80's. People weren't
         | even doing 'ML'. And back then people were more focused on
         | teasing out 'laws' in as few data points as possible. The focus
         | was more on formulas and symbols, and finding relationships
         | between individual data points. Not the broad patterns we take
         | for granted today.
        
           | criddell wrote:
           | I would say using backpropagation to train multi-layer neural
           | networks would qualify as ML and we were definitely doing
           | that in 80's.
        
             | UltraSane wrote:
             | Just with tiny amounts of data.
        
               | jensgk wrote:
               | Compared to today. We thought we used large amounts of
               | data at the time.
        
               | UltraSane wrote:
               | "We thought we used large amounts of data at the time."
               | 
               | Really? Did it take at least an entire rack to store?
        
               | jensgk wrote:
               | We didn't measure data size that way. At some point in
                | the future someone will find this dialog and think that
                | we don't have large amounts of data now, because we are
               | not using entire solar systems for storage.
        
               | UltraSane wrote:
               | Why can't you use a rack as a unit of storage at the
               | time? Were 19" server racks not in common use yet? The
               | storage capacity of a rack will grow over time.
               | 
                | My storage hierarchy goes:
                | 
                |   1) one storage drive
                |   2) one server maxed out with the biggest storage
                |      drives available
                |   3) one rack filled with servers from 2
                |   4) one data center filled with racks from 3
        
               | fragmede wrote:
               | How big is a rack in VW beetles though?
               | 
                | It's a terrible measurement because it's an irrelevant
                | detail about how the data is stored, and if the data is
                | stored in a proprietary cloud, no one actually knows it
                | except for the people who work there on that team.
               | 
               | So while someone could say they used a 10 TiB data set,
               | or 10T parameters, how many "racks" of AWS S3 that is, is
               | not known outside of Amazon.
        
           | mistrial9 wrote:
           | mid-90s had neural nets, even a few popular science kinds of
           | books on it. The common hardware was so much less capable
           | then.
        
             | sgt101 wrote:
             | mid-60's had neural nets.
             | 
             | mid-90's had LeCun telling everyone that big neural nets
             | were the future.
        
               | dekhn wrote:
               | Mid 90s I was working on neural nets and other machine
               | learning, based on gradient descent, with manually
               | computed derivatives, on genomic data (from what I can
                | recall, we had no awareness of LeCun; I didn't find out
                | about his great OCR results until much later). It worked
                | fine and it seemed like a promising area.
               | 
               | My only surprise is how long it took to get to imagenet,
               | but in retrospect, I appreciate that a number of
               | conditions had to be met (much more data, much better
               | algorithms, much faster computers). I also didn't
                | recognize just how poor MLPs were for sequence
               | modelling, compared to RNNs and transformers.
        
               | sgt101 wrote:
                | I'm so out of things! What do you mean by manually
                | computed derivatives?
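                | 
                | Presumably it means deriving the gradient by hand and
                | coding it directly, before autodiff frameworks existed.
                | A minimal sketch of that workflow for logistic regression
                | (my illustration, not the original genomics code):
                | 
                |   import numpy as np
                | 
                |   def grad_step(w, X, y, lr=0.1):
                |       p = 1 / (1 + np.exp(-X @ w))
                |       grad = X.T @ (p - y) / len(y)  # hand-derived
                |       return w - lr * grad
                | 
                |   rng = np.random.default_rng(0)
                |   X = rng.normal(size=(100, 3))
                |   true_w = np.array([1.0, -2.0, 0.5])
                |   y = (X @ true_w > 0).astype(float)
                |   w = np.zeros(3)
                |   for _ in range(200):
                |       w = grad_step(w, X, y)
                |   print(np.round(w, 2))   # signs match true_w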
        
         | evrydayhustling wrote:
         | Not baloney. The culture around data in 2005-2010 -- at least /
         | especially in academia -- was night and day to where it is
         | today. It's not that people didn't understand that more data
         | enabled richer + more accurate models, but that they accepted
         | data constraints as a part of the problem setup.
         | 
         | Most methods research went into ways of building beliefs about
         | a domain into models as biases, so that they could be more
         | accurate in practice with less data. (This describes a lot of
         | PGM work). This was partly because there was still a tug of war
         | between CS and traditional statistics communities on ML, and
         | the latter were trained to be obsessive about model
         | specification.
         | 
         | One result was that the models that were practical for
         | production inference were often trained to the point of
         | diminishing returns on their specific tasks. Engineers
         | deploying ML weren't wishing for more training instances, but
         | better data at inference time. Models that could perform more
         | general tasks -- like differentiating 90k object classes rather
         | than just a few -- were barely even on most people's radar.
         | 
         | Perhaps folks at Google or FB at the time have a different
         | perspective. One of the reasons I went ABD in my program was
         | that it felt industry had access to richer data streams than
         | academia. Fei Fei Li's insistence on building an academic
         | computer science career around giant data sets really was
          | ingenious, and even subversive.
        
           | bsenftner wrote:
           | The culture was, and still is, skeptical in biased ways.
           | Between '04 and '08 I worked with a group that had trained
           | neural nets for 3D reconstruction of human heads. They were
           | using it for prenatal diagnostics and a facial recognition
           | pre-processor, and I was using it for creating digital
           | doubles in VFX filmmaking. By '08 I'd developed a system
           | suitable for use in mobile advertising, creating ads with
           | people in them, and 3D games with your likeness as the
           | player. VCs thought we were frauds, and their tech advisors
           | told them our tech was an old, discredited technique that
           | could not do what we claimed. We spoke to every VC, some of
           | whom literally kicked us out. Finally, after years of "no",
           | that same AlexNet success began to change minds, but _now_
           | they wanted the tech to create porn. By that point, after
           | years of "no", I was making children's educational media,
           | and there was no way I was going to do porn. Plus, the
           | president of my company was a woman, famous for creating
           | children's media. Yeah, the culture was different then, not
           | too long ago.
        
             | evrydayhustling wrote:
             | Wow, so early for generative -- although I assume you were
             | generating parameters that got mapped to mesh positions,
             | rather than generating pixels?
             | 
             | I definitely remember that bias about neural nets, to the
             | point of my first grad ML class having us recreate proofs
             | that you should never need more than two hidden layers (one
             | can pick up the thread at [1]). Of all the ideas clunking
             | around in the AI toolbox at the time, I don't really have
             | the background to say why people felt the need to kill
             | NNs with fire.
             | 
             | [1] https://en.wikipedia.org/wiki/Universal_approximation_t
             | heore...
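
               For reference, the single-hidden-layer version of the
               theorem at [1] (Cybenko 1989, later generalized by Hornik
               and others) can be sketched as: for any continuous f on a
               compact set K in R^n and any eps > 0, there exist N and
               weights such that

                   \sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N}
                   \alpha_i \, \sigma(w_i^\top x + b_i) \Big| < \varepsilon

               for a sigmoidal activation \sigma. The width N is purely
               existential: the theorem bounds neither the number of
               units nor the difficulty of finding the weights, which is
               part of why it said little about what depth buys in
               practice.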
        
               | bsenftner wrote:
                | It was annotated face images and 3D scans of heads,
                | trained to map one to the other. Past a threshold in
                | the size of the training data, good-to-great results
                | could be had from a single photo: first to generate
                | the 3D mesh positions, and then again to map the photo
                | onto the mesh surface. Do that with multiple frames,
                | and one is firmly in the Uncanny Valley.
        
             | philipkglass wrote:
             | Who's offering VC money for neural network porn technology?
             | As far as I can tell, there is huge organic demand for this
             | but prospective users are mostly cheapskates and the area
             | is rife with reputational problems, app store barriers,
             | payment processor barriers, and regulatory barriers. In
             | practice I have only ever seen investors scared off by
             | hints that a technology/platform would be well matched to
             | adult entertainment.
        
           | tucnak wrote:
           | > they accepted data constraints as a part of the problem
           | setup.
           | 
           | I've never heard this be put so succinctly! Thank you
        
         | littlestymaar wrote:
         | In 2019, GPT-2 1.5B was trained on ~10B tokens.
         | 
          | Last week Hugging Face released SmolLM v2 1.7B, trained on
          | 11T tokens: three orders of magnitude more training data for
          | roughly the same number of parameters and almost the same
          | architecture.
         | 
         | So even back in 2019 we can say we were working with a tiny
         | amount of data compared to what is routine now.
        
           | kleiba wrote:
            | True. But my point is that the quote "people didn't believe
            | in data" is not true. Back in 2019, when GPT-2 was trained,
            | the reason they didn't use the 3T tokens used today was not
            | that they "didn't believe in data" - they totally would
            | have, had it been technically feasible (as in: had they had
            | that much data + the necessary compute).
            | 
            | The same has always been true. There has never been a
            | stance along the lines of "ah, let's not collect more data
            | - it's not worth it!". It has always been other reasons,
            | typically a lack of resources.
        
             | littlestymaar wrote:
              | > they totally would have, had it been technically
              | feasible
              | 
              | TinyLlama[1] was built by _an individual on their own_
              | last year, training a 1.1B model on 3T tokens with just
              | 16 A100-40G GPUs in 90 days. It was definitely within
              | reach of any funded org in 2019.
              | 
              | In 2022 (IIRC), DeepMind released the Chinchilla paper
              | about the compute-optimal amount of data for training a
              | given model; for a 1B model, the value was determined to
              | be 20B tokens, which again is more than two orders of
              | magnitude below the current state of the art for the
              | same class of model.
              | 
              | Until very recently (the first LLaMA paper IIRC, and
              | people noticing that the 7B model showed no sign of
              | saturation during its already very long training), the
              | ML community vastly underestimated the amount of
              | training data needed to make an LLM perform at its
              | potential.
              | 
              | [1]: https://github.com/jzhang38/TinyLlama
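
               To make the gap concrete, a back-of-the-envelope sketch
               using the roughly 20-tokens-per-parameter Chinchilla rule
               of thumb; the parameter and token counts are the ones
               quoted in this thread, so treat the numbers as
               illustrative only:

                   # Tokens-per-parameter ratios for the models
                   # mentioned above, vs. the ~20x Chinchilla heuristic.
                   CHINCHILLA_RATIO = 20  # ~20 training tokens / param

                   models = {
                       "GPT-2 1.5B (2019)":     (1.5e9, 10e9),
                       "Chinchilla-optimal 1B": (1.0e9, 20e9),
                       "TinyLlama 1.1B (2023)": (1.1e9, 3e12),
                       "SmolLM v2 1.7B (2024)": (1.7e9, 11e12),
                   }

                   for name, (params, tokens) in models.items():
                       ratio = tokens / params
                       print(f"{name}: {tokens / 1e9:8,.0f}B tokens, "
                             f"{ratio:7,.0f} tokens/param "
                             f"({ratio / CHINCHILLA_RATIO:.1f}x Chinchilla)")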
        
         | kleiba wrote:
          | Answering the people arguing against my comment: you do not
          | seem to take into account that the technical circumstances
          | were totally different thirty, twenty or even ten years ago!
          | People would have liked to train with more data, and there
          | was big interest in combining heterogeneous datasets to
          | achieve exactly that. But one major problem was the compute!
          | There weren't any pretrained models that you specialized in
          | one way or the other - you always retrained from scratch. I
          | mean, even today, who has the capability to train a
          | multibillion-parameter GPT from scratch? And not just
          | retraining a tried-and-trusted architecture+dataset once - I
          | mean as a research project trying to optimize your setup
          | towards a certain goal.
        
         | kccqzy wrote:
         | Pre-ImageNet was like pre-2010. Doing ML with massive data
         | really wasn't in vogue back then.
        
           | mistrial9 wrote:
           | except in Ivory Towers of Google + Facebook
        
             | disgruntledphd2 wrote:
              | Even then, maybe Google, but probably not Facebook. Ads
              | used ML, but there wasn't that much of it in feed. Like,
              | there were a bunch of CV projects that I saw in 2013
              | that didn't use NNs. Three years later, OTOH, you
              | couldn't find a devserver without tripping over an NN
              | along the way.
        
         | sgt101 wrote:
          | It's not quite so - we couldn't handle it, and we didn't have
          | it, so it was a bit of a non-question.
          | 
          | I started with ML in 1994, in a small, poor lab - so we
          | didn't have state-of-the-art hardware. On the other hand, I
          | think my experience is fairly representative. We worked with
          | data sets on SPARC workstations that were stored in flat
          | files and had thousands or sometimes tens of thousands of
          | instances. We had problems keeping our data sets on the
          | machines and often archived them to tape.
          | 
          | Data came from very deliberate acquisition processes. For
          | example, I remember going to a field exercise with a
          | particular device and directing its use over a period of
          | days in order to collect the data that would be needed for a
          | machine learning project.
         | 
         | Sometime in the 2000's data started to be generated and
         | collected as "exhaust" from various processes. People and
         | organisations became instrumented in the sense that their daily
         | activities were necessarily captured digitally. For a time this
         | data was latent, people didn't really think about using it in
         | the way that we think about it now, but by about 2010 it was
         | obvious that not only was this data available but we had the
         | processing and data systems to use it effectively.
        
       | icf80 wrote:
       | logic is data and data is logic
        
       | hollerith wrote:
       | The deep learning boom caught deep-learning researchers by
       | surprise because deep-learning researchers don't understand their
       | craft well enough to predict essential properties of their
       | creations.
       | 
       | A model is grown, not crafted like a computer program, which
       | makes it hard to predict. (More precisely, a big growth phase
       | follows the crafting phase.)
        
         | lynndotpy wrote:
          | I was a deep learning researcher. The problem is that
          | accuracy (+ related metrics) were prioritized in research
          | and funding. Factors like interpretability, extrapolation,
          | efficiency, or consistency were not prioritized, even though
          | they were clearly important for anything actually deployed.
          | 
          | DALL-E was the only big surprising consumer model -- 2022
          | saw a sudden huge leap from "txt2img is kind of funny" to
          | "txt2img is actually interesting". I would have assumed such
          | a thing could only come in 2030 or later. But deep learning
          | is full of counterintuitive results (like the No Free Lunch
          | theorem not mattering in practice, or ReLU being better than
          | sigmoid).
          | 
          | But in hindsight, it was naive to think "this does not work
          | yet" would get in the way of the products being sold and
          | monetized.
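
           On the ReLU-vs-sigmoid point, one commonly cited reason it
           looks less counterintuitive in hindsight is the vanishing
           gradient: the sigmoid's derivative never exceeds 0.25, so
           stacking layers multiplies many small factors, while ReLU
           passes a gradient of 1 wherever a unit is active. A tiny
           illustrative numpy sketch:

               import numpy as np

               x = np.linspace(-6, 6, 7)

               sig = 1 / (1 + np.exp(-x))
               d_sigmoid = sig * (1 - sig)     # <= 0.25, ~0 for large |x|
               d_relu = (x > 0).astype(float)  # exactly 1 when active

               print("d_sigmoid:", np.round(d_sigmoid, 3))
               print("d_relu:   ", d_relu)

               # Best-case sigmoid gradient factor through 20 layers:
               print("0.25 ** 20 =", 0.25 ** 20)  # ~9.1e-13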
        
         | nxobject wrote:
          | I'm still very taken aback by how far we've been able to take
          | prompting as our universal language for communicating with
          | the AI of our choice.
        
       | TheRealPomax wrote:
       | It wasn't "almost everyone", it was straight up everyone.
        
       | vl wrote:
       | _So the AI boom of the last 12 years was made possible by three
       | visionaries who pursued unorthodox ideas in the face of
       | widespread criticism._
       | 
        | I argue that Mikolov, with word2vec, was instrumental in the
        | current AI revolution. It demonstrated the ease of extracting
        | meaning from text in a mathematical way and led directly to
        | the advancements we have today with LLMs. And, ironically, it
        | didn't require a GPU.
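
         As a small sketch of the kind of vector arithmetic word2vec
         made easy, using gensim's KeyedVectors (the embedding file
         name and its vocabulary are assumptions here, not something
         from the article):

             from gensim.models import KeyedVectors

             # Assumed local copy of a pretrained word2vec binary file.
             vectors = KeyedVectors.load_word2vec_format(
                 "GoogleNews-vectors-negative300.bin", binary=True)

             # The classic analogy: king - man + woman ~= queen
             print(vectors.most_similar(
                 positive=["king", "woman"], negative=["man"], topn=3))

             # Relatedness falls out of plain cosine similarity.
             print(vectors.similarity("coffee", "tea"))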
        
         | MichaelZuo wrote:
         | How much easier was it compared to the next best method at the
         | time?
        
       | gregw2 wrote:
        | The article credits two academics (Hinton, Fei-Fei Li) and a CEO
       | (Jensen Huang). But really it was three academics.
       | 
       | Jensen Huang, reasonably, was desperate for any market that could
       | suck up more compute, which he could pivot to from GPUs for
       | gaming when gaming saturated its ability to use compute. Screen
       | resolutions and visible polygons and texture maps only demand so
       | much compute; it's an S-curve like everything else. So from a
       | marketing/market-development and capital investment perspective I
        | do think he deserves credit. Certainly the Intel guys struggled
        | to recognize it similarly (and to execute even on plain GPUs).
       | 
       | But... the technical/academic insight of the CUDA/GPU vision in
       | my view came from Ian Buck's "Brook" PhD thesis at Stanford under
       | Pat Hanrahan (Pixar+Tableau co-founder, Turing Award Winner) and
       | Ian promptly took it to Nvidia where it was commercialized under
       | Jensen.
       | 
       | For a good telling of this under-told story, see one of
       | Hanrahan's lectures at MIT:
       | https://www.youtube.com/watch?v=Dk4fvqaOqv4
       | 
       | Corrections welcome.
        
         | markhahn wrote:
          | Jensen embraced AI as a way to recover TAM after ASICs took
          | over crypto mining. You can see that in-between period in
          | Nvidia revenue and profit graphs.
          | 
          | By that time, GP-GPU had been around for a long, long time.
          | CUDA still doesn't have much to do with AI - sure, it
          | supports AI usage, and even includes some AI-specific
          | features (low- and mixed-precision blocked operations).
        
           | cameldrv wrote:
            | Jensen embraced AI way before that. cuDNN was released back
            | in 2014. I remember being at ICLR in 2015, and there were
            | three companies with booths: Google and Facebook, who were
            | recruiting, and NVIDIA, which was selling a 4-GPU desktop
            | computer.
        
             | dartos wrote:
              | Well, as soon as matmul had a marketable use (ML
              | predictive algorithms), Nvidia was on top of it.
             | 
             | I don't think they were thinking of LLMs in 2014, tbf.
        
           | aleph_minus_one wrote:
           | > Jensen embraced AI as a way to recover TAM after ASICs took
           | over crypto mining.
           | 
           | TAM: Total Addressable Market
        
       | AvAn12 wrote:
       | Three legs to the stool - the NN algorithms, the data, and the
       | labels. I think the first two are obvious but think about how
       | much human time and care went into labeling millions of images...
        
         | fragmede wrote:
         | And the compute power!
        
       | hyperific wrote:
       | The article mentions Support Vector Machines being the hot topic
       | in 2008. Is anyone still using/researching these?
       | 
        | I often wonder how many useful technologies could exist if
        | trends had gone a different way. Where would we be if neural
        | nets hadn't caught on, and SVMs and expert systems had?
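
         SVMs are still a one-liner away in standard tooling; a minimal
         scikit-learn sketch on a bundled toy dataset (hyperparameters
         chosen arbitrarily):

             from sklearn.datasets import load_digits
             from sklearn.model_selection import train_test_split
             from sklearn.svm import SVC

             X, y = load_digits(return_X_y=True)
             X_train, X_test, y_train, y_test = train_test_split(
                 X, y, test_size=0.25, random_state=0)

             # Classic kernel-SVM setup: RBF kernel, modest C.
             clf = SVC(kernel="rbf", C=10, gamma="scale")
             clf.fit(X_train, y_train)
             print("test accuracy:", clf.score(X_test, y_test))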
        
         | spencerchubb wrote:
          | In insurance we use older statistical methods that are easily
          | interpretable, because we are required to have rates approved
          | by departments of insurance.
        
       ___________________________________________________________________
       (page generated 2024-11-06 23:02 UTC)