[HN Gopher] A non-technical explanation of deep learning
___________________________________________________________________
A non-technical explanation of deep learning
Author : tworats
Score : 186 points
Date : 2023-04-25 15:10 UTC (7 hours ago)
(HTM) web link (www.parand.com)
(TXT) w3m dump (www.parand.com)
| amelius wrote:
| The problem with deep learning is the opposite. You can
| understand most of it with just high school math. Advanced math
| is mostly useless because of the dimensionality of neural nets.
| Utkarsh_Mood wrote:
| can you elaborate further on what you mean by 'dimensionality
| of neural nets'? Thanks!
| amelius wrote:
| Yes, I mean the huge number of trainable parameters.
| [deleted]
| mach1ne wrote:
| Yes, it's really rather like alchemy in some sense. Stuff
| works, and often nobody knows exactly why.
| [deleted]
| uoaei wrote:
| "I don't follow the latest ML scaling and theory research"
| does not in any way equate to "these things are unknowable".
| lhnz wrote:
| Hm, watching Neel Nanda videos recently and I do get the
| feeling that there are lots of unknowns in ML and also in
| what trained networks have learnt.
| uoaei wrote:
| That's like saying you understand state-of-the-art CFD code
| because you can read Fortran.
|
| There are many aspects to learning systems that we still don't
| have any kind of grasp on, and will take more than a little
| advanced math (statistics/probability theory, transport theory,
| topology, etc.) to understand as a community.
|
| Dunning-Kruger is probably more common in spaces like this one,
| where people carry social capital for being able to "spin up
| quickly". But the true meta-skill of upskilling is turning
| unknown unknowns (UU) into known unknowns (KU), and then into
| known knowns (KK). It's not enough to just jump from UU to KK
| through osmosis by reading blog posts on a news aggregator,
| because there will still be a huge space of unknowns not
| covered by that approach.
| ftxbro wrote:
| > Advanced math is mostly useless because of the dimensionality
| of neural nets.
|
| It depends what you mean by advanced math. There is a lot of
| math that only really comes into play _because_ of the high
| dimensionality! For example math related to tensor wrangling,
| low rank approximations, spectral theory, harmonic theory,
| matrix calculus derivatives, universality principles, and other
| concepts that could be interesting or bewildering or horrifying
| depending how you react to it. Of course some of it is only
| linear algebra of the 'just high school math' kind but that's
| not how I would normally describe it. If you look at the math
| in the proofs in the appendices of the more technical AI papers
| on arxiv there is often some weird stuff in there, not just
| matrix multiply and softmax.
| lhnz wrote:
| I have a few funny analogies that I think kind of work.
|
| 1. "gradient descent" is like tuning a guitar by ear and
| listening to the beat frequencies ("loss") and then decreasing
| these by tuning a string up or down.
|
| 2. the best I can come up with for "backpropagation" is to
| imagine a clever device that can tirelessly optimize a Rube
| Goldberg machine for you but as a science, not an art.
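|
| To make the "loss" in (1) concrete, here's a minimal, made-up
| sketch in Python of what "decreasing the loss" means for a
| single parameter (the frequencies, learning rate, and loop count
| are invented for illustration, not taken from the article):
|
|     # loss: squared "beat" between the string and target pitch
|     def loss(pitch, target=440.0):
|         return (pitch - target) ** 2
|
|     def grad(pitch, target=440.0):   # d(loss)/d(pitch)
|         return 2 * (pitch - target)
|
|     pitch = 200.0                    # start badly out of tune
|     for _ in range(1000):
|         pitch -= 0.01 * grad(pitch)  # a small step "downhill"
|     # pitch ends up very close to 440.0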
| great_wubwub wrote:
| As someone who knows barely enough to be dangerous, I like this.
| I'm sure it leaves enough out to make most experts angry, but it
| makes a lot of sense to me.
| time_to_smile wrote:
| > I'm sure it leaves enough out to make most experts angry
|
| It's not that it leaves out details, it's that the article's
| metaphors are not actually correct with regard to the way deep
| learning works.
|
| This post mostly confuses both reinforcement learning and
| ensemble models with deep learning. If you only know "enough to
| be dangerous" then this post will steer your intuition in the
| _wrong_ direction.
| lxe wrote:
| This has Three-Body Problem vibes :)
| time_to_smile wrote:
| > This is how neural networks work: they see many examples and
| get rewarded or punished based on whether their guesses are
| correct.
|
| This description more closely describes reinforcement learning,
| rather than gradient based optimization.
|
| In fact, the entire metaphor of a confused individual being
| slapped or rewarded without understanding what's going on
| doesn't really make sense when considering gradient optimization
| because the gradient wrt the loss function tells the network
| _exactly_ how to change its behavior to improve its performance.
|
| This last point is incredibly important to understand correctly
| since it contains one of the biggest assumptions about network
| behavior: that the optimal solution, or at least a
| good-enough-for-our-concerns solution, _can be found_ by slowly
| taking small steps in the right direction.
|
| Neural networks are great at _refining_ their beliefs but have a
| difficult time radically changing them. A better analogy might be
| trying to very slowly convince your uncle that climate change is
| real, and not a liberal conspiracy.
|
| edit: it also does a poor job of explaining layers, which reads
| much more like how ensemble methods work (lots of little
| classifiers voting) than how deep networks work.
| apomekhanes wrote:
| [flagged]
| hgsgm wrote:
| Non-technical, non-accurate. "Truthy", buzzfeed/huffpo quality.
| nailer wrote:
| Does this article imply there are circumstances where a
| spreadsheet is a cat? What a poor example of technical writing.
| teacpde wrote:
| Not the author, but in the author's defense, it is meant to be
| non-technical. And the first paragraph reads as interesting to
| me.
| nailer wrote:
| Most non technical people would think there are zero
| circumstances where a spreadsheet could be a cat.
| adrianmonk wrote:
| It's obvious from context that it's the content of the
| media. To me at least.
|
| If I play you a song on Spotify and say, "Is this a
| saxophone?", you wouldn't say, "No, it's an iPhone running
| Spotify."
|
| If a policeman holds up a photograph of a person and says,
| "Is this the person who attacked you?", the victim doesn't
| say, "No, it's an 8 by 10 glossy print."
| ricardobeat wrote:
| That's part of the explanation. It might not make sense at
| first, but you'll figure something out to avoid being
| slapped.
| andysinclair wrote:
| He's saying that the spreadsheet represents the "picture" of
| the cat in terms of pixels and RGB values etc.
|
| The algorithm/workers are not really "looking" at a picture of
| a cat, they are analysing and looking for patterns in the data
| that defines the picture of the cat.
| charcircuit wrote:
| Why is violence and praise being used to illustrate gradient
| descent? Why does each person get to see the entire input data?
| _gmax0 wrote:
| The most concise and intuitive explanation I've been given goes
| along these lines:
|
| 1 - We want to model data, representative of some system, through
| functions.
|
| 2 - Virtually any function can be expressed by an n-th order
| polynomial.
|
| 3 - We wish to learn the parameters, the coefficients, of such
| polynomials.
|
| 4 - Neural networks allow us to brute-force test candidate
| values of such parameters (finding optimal candidate parameters
| such that the error between expected and actual values over our
| dataset is minimized).
|
| Whereas prior methods (e.g. PCA) could only model linear
| relationships, neural networks allowed us to begin modeling
| non-linear ones.
| civilized wrote:
| You don't need neural networks to do polynomial regression.
| Polynomial regression, perhaps surprisingly, can be implemented
| using only (multivariable) linear regression. You just include
| powers of your predictor x as terms in the regression formula:
| y = a + bx + cx^2 + dx^3 + ...
|
| The resulting model is linear, even though there are powers of
| x in your formula. Because x and y are _known_ from the data.
| They're not what you're solving for; you're solving for the
| unknown _coefficients_ (a, b, c, d...). This gives you a linear
| system of equations in those unknown coefficients, which can be
| solved using standard linear least squares methods.
|
| So fitting polynomials is easy. The problem is that it's not
| that useful. Deep learning has to solve much harder problems to
| get to a useful model.
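|
| A minimal sketch of this in Python/numpy, with made-up data
| generated from y = 1 + 2x + 3x^2 plus noise (not from the
| article, just to show that the fit really is ordinary linear
| least squares):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     x = np.linspace(-1, 1, 50)
|     y = 1 + 2 * x + 3 * x**2 + rng.normal(0, 0.1, size=x.shape)
|
|     # design matrix: columns 1, x, x^2 -- linear in the unknowns
|     A = np.column_stack([np.ones_like(x), x, x**2])
|     coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
|     print(coeffs)  # roughly [1, 2, 3]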
| jacksnipe wrote:
| Except gradient descent is about as far from brute force as it
| gets
| _gmax0 wrote:
| Sure, under the assumption that your parameter space is
| convex.
| [deleted]
| pedrosorio wrote:
| Mentioning polynomials is a pretty poor way to explain it for
| two reasons:
|
| - It requires some mathematical understanding so will exclude
| some part of the non-technical audience
|
| - It is the incorrect analogy. Non-linearities in neural
| networks have nothing to do with polynomials. In fact,
| polynomial regression is a type of linear regression, and for
| the most part, it sucks.
|
| Also, as someone mentioned, all the "serious" alternative ML
| methods prior to the deep learning revolution allow modeling
| non-linearities (even if just through modification of linear
| regressions, like polynomial regression).
| teruakohatu wrote:
| > Whereas prior, methods (e.g. PCA) could only model linear
| relationships,
|
| Prior methods also allowed modelling of non-linear
| relationships, eg. Random Forests.
| lhnz wrote:
| Hm, I don't think that's quite it. I went through my own
| process of learning how neural networks work recently and wrote
| this based on my learning: https://sebinsua.com/bridging-the-gap
|
| As far as my understanding goes, you can represent practically
| any function as layers of linear transformations followed by
| non-linear functions (e.g. `ReLU(x) = max(0, x)`). It's this
| sprinkling of non-linearity that allows the networks to be able
| to model complex functions.
|
| However, from my perspective, the secret sauce is (1)
| composability and (2) differentiability. These enable the
| backpropagation process (which is just "the chain rule" from
| calculus) and this is what allows these massive mathematical
| expressions to learn parameters (weights and biases) that
| perform well.
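|
| A rough numpy sketch of that composition (layer sizes, weights
| and the input are arbitrary, just to show the shape of the
| idea):
|
|     import numpy as np
|
|     def relu(x):
|         return np.maximum(0, x)
|
|     rng = np.random.default_rng(0)
|     W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)  # 3 in, 4 hid
|     W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)  # 4 hid, 1 out
|
|     def f(x):
|         h = relu(W1 @ x + b1)  # linear transform + non-linearity
|         return W2 @ h + b2     # another linear transform
|
|     y = f(np.array([0.5, -1.0, 2.0]))
|     # backpropagation is the chain rule applied through this
|     # composition, layer by layer, to get gradients of whatever
|     # loss you compute on y with respect to W1, b1, W2, b2.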
| jstx1 wrote:
| Does stuff like this help anyone?
|
| I still haven't forgiven CGP Grey for changing the title of his
| 2017 ML video to "How AIs, like ChatGPT, learn". The video is
| about genetic algorithms and has nothing to do with ChatGPT (or
| with anything else in modern AI).
| [deleted]
| gnicholas wrote:
| I read this to see if it would be useful to share with my 9
| year old. After reading it, I think it is not any more useful
| (alone) than watching the 3b1b video on this topic. The video
| is longer, but has more visualizations.
|
| I think that perhaps reading this description after watching
| the video might make the process more memorable. My guess is
| that if I had my daughter read this first, it wouldn't do much
| to make the video easier to parse. Reading this real-world
| example after watching the video could help solidify the
| concept.
|
| Disclaimer: I don't know a lot about AI/ML, so it's possible
| that I am 100% wrong here!
| SnooSux wrote:
| I've barely forgiven him for explaining genetic algorithms and
| acting like they have any relevance to contemporary ML
| research.
|
| The footnote video was an alright explanation of backprop. If
| that were part of the main video that would have been
| reasonable.
|
| I really like his history/geography videos but anything
| technical leaves a lot to be desired. And don't get me started
| on Humans Need Not Apply.
| flangola7 wrote:
| Humans Need Not Apply is one of the most phenomenal videos on
| YouTube, what do you think is wrong with it?
| jstx1 wrote:
| > And don't get me started on Humans Need Not Apply.
|
| Well now you have to tell us. :) Many of the concrete
| examples in that video are exaggerated and/or misunderstood
| but the general question it asks - what to do when automation
| makes many people unemployable through no fault of their own
| - seems valid.
| bolyarche wrote:
| [dead]
| Trufa wrote:
| What a strange word to use in that context, why would he need
| to be forgiven by you? How has he wronged you? Seems at worst,
| an honest mistake in a complicated topic.
| gregschlom wrote:
| Maybe GP is a non-native English speaker? This construct
| would be a pretty common way for a native French speaker to say
| they are angry at something. Not sure if it's common in
| English as well.
| mrbombastic wrote:
| This is a pretty common phrase in English as well, it is
| not meant to be taken literally.
| Cpoll wrote:
| You're not using "at worst" correctly. What you describe is
| an "at best". Worse would be that CGP Grey deliberately
| picked a misleading title in order to optimize views,
| algorithm, etc.
|
| This is, I think, the case. But I don't begrudge them too
| much, YouTube is cutthroat.
| zvmaz wrote:
| I have met people who think they understand a particular topic I
| am versed in, but actually don't. Similarly, I am often wary that
| I get superficial knowledge about a topic I don't know much about
| through "laymen" resources, and I doubt one can have an
| appropriate level of understanding mainly through analogies and
| metaphors. It's a kind of "epistemic anxiety". Of course, there
| are "laymen" books I stumbled upon which I think go to
| appropriate levels of depth and do not "dumb down" to shallow
| levels the topics, yet remain accessible, like Godel's Proof, by
| Ernest Nagel. I'd be glad to read about similar books on all
| topics, including the one discussed in this thread.
|
| Knowledge is hard to attain...
| wanderlust123 wrote:
| Can you please suggest other books similar in spirit to the
| Nagel book? Would love to read some over summer
| lxe wrote:
| I've noticed that the learning curve stays fairly flat when it
| comes to understanding weights, and layers, and neural
| networks, heck, even what gradient descent is for... but then
| when it comes to actually understanding why optimization
| algorithms are needed, and how they work, things just spiral
| into very hard math territory.
|
| I do think that maybe it feels inaccessible because we
| transition from discrete concepts easily digestible by CS grads
| into some complicated math with very terse mathematical
| notation,
| yet the math might not be as hard if presented in a way that
| doesn't scare away programmers.
| sainez wrote:
| I find the best way to learn technical topics is to build a
| simplified version of the thing. The trick is to understand the
| relationship between the high level components without getting
| lost in the details. This high level understanding then helps
| inform you when you drill down into specifics.
|
| I think this book is a shining example of that philosophy:
| https://www.buildyourownlisp.com/. In the book, you implement
| an extremely bare-bones version of lisp, but it has been
| invaluable in my career. I found I was able to understand
| nuanced language features much more quickly because I have a
| clear model of how programming languages are decomposed into
| their components.
| joe_the_user wrote:
| _I find the best way to learn technical topics is to build a
| simplified version of the thing. The trick is to understand
| the relationship between the high level components without
| getting lost in the details. This high level understanding
| then helps inform you when you drill down into specifics._
|
| I agree, but that's a good guide for building a technical
| understanding of a complex subject, not a sufficient-in-itself
| tool set for considering questions in that subject.
|
| Especially, I'll say that people combining some "non-technical
| summary" of quantum mechanics/Newtonian gravity/genetic
| engineering/etc with their personal common sense are a constant
| annoyance to me whenever such topics come here.
| nomel wrote:
| > constant annoyance to me whenever such topics come here.
|
| I'll say that thinking about things at the edge of my
| understanding, where "Eureka!" moments are low-hanging
| fruit, results in a bigger dump of dopamine than any
| other activity. Having silly fun speculating (and I make it
| clear when I am) my way through some thought process is
| literally the most fun I can have. Seeing those types of
| conversations, full of _genuine_ curiosity, thoughtful
| speculation, and all the resulting corrections
| /discussions/insight, etc, is why I love HN so much, and I
| hope it's always a place to nerd out.
|
| One man's trash is another man's treasure, I suppose. :)
| commandlinefan wrote:
| > an appropriate level of understanding mainly through
| analogies and metaphors
|
| I think it's actually worse than that - somebody who doesn't
| know actually realizes that he doesn't know, but somebody who
| _thinks_ he understands through analogies and metaphors will
| confidently come to the incorrect conclusion and then argue
| with somebody who actually does understand the topic - often
| managing to convince innocent bystanders because his reasoning
| is easier to grasp and the so-called expert seems to be getting
| more and more flustered (as he tries to explain why the analogy
| is actually correct, but oversimplified).
| somenameforme wrote:
| There are a million e.g. number parsing (image to digit) neural
| network type programs on GitHub. Go pick one in your preferred
| language and break it apart, and rebuild it, looking up the
| concepts behind parts you don't understand. After you finish up
| with the above, look up 'the xor problem' to see a common
| practical problem (which creating a network to replicate xor
| _illustrates_, rather than _is_) and you'll be well on your
| way to a nice fundamental understanding, built from the ground
| up.
|
| One of the most interesting things about this topic is that the
| fundamental concepts and implementations are all _really_
| simple. It's the fact that it actually works that's
| mind-boggling. In any case, the above is not a months-long
| affair, but more like one week of dedicated work.
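|
| As a rough sketch of how little code this takes (numpy; the
| hidden layer size, the learning rate of 1.0 implied by the bare
| updates, and the iteration count are arbitrary choices, and a
| given random init can occasionally fail to converge):
|
|     import numpy as np
|
|     X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
|     y = np.array([[0], [1], [1], [0]], dtype=float)  # xor
|
|     rng = np.random.default_rng(0)
|     W1, b1 = rng.normal(size=(2, 4)), np.zeros((1, 4))
|     W2, b2 = rng.normal(size=(4, 1)), np.zeros((1, 1))
|
|     def sigmoid(z):
|         return 1 / (1 + np.exp(-z))
|
|     for _ in range(10000):
|         h = sigmoid(X @ W1 + b1)             # forward pass
|         out = sigmoid(h @ W2 + b2)
|         d_out = (out - y) * out * (1 - out)  # chain rule, back
|         d_h = (d_out @ W2.T) * h * (1 - h)
|         W2 -= h.T @ d_out                    # gradient steps
|         b2 -= d_out.sum(0, keepdims=True)
|         W1 -= X.T @ d_h
|         b1 -= d_h.sum(0, keepdims=True)
|
|     print(out.round())  # usually rounds to 0, 1, 1, 0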
| giardini wrote:
| Nothing about LLMs?!
| fifteen1506 wrote:
| Yeah, I need something to explain those Transformer things to
| me. I know it was published by Google in 2017 and that it is
| 'magic'.
|
| End of knowledge.
|
| Maybe I should ask ChatGPT?
| jedberg wrote:
| > Maybe I should ask ChatGPT?
|
| You actually should, it spits out a pretty good explanation
| (sometimes).
| wrs wrote:
| This is the funniest refutation of the Chinese Room argument that
| I've seen. Note that at the end, it's still the case that none of
| these people can recognize a cat.
| pringk02 wrote:
| Doesn't that mean it supports the Chinese room argument? I'm
| not sure I follow your reasoning.
|
| (also, popular consciousness forgets that technically the
| Chinese Room argument is only arguing against the much
| narrower, and now philosophically unfashionable, "Hard AI"
| stance as it was held in the 70s)
| tasty_freeze wrote:
| > the Chinese Room argument is only arguing against the much
| narrower, and now philosophically unfashionable, "Hard AI"
| stance as it was held in the 70s
|
| Searle stood behind his argument in the 70s, and he has
| continued to stand behind it in every decade since.
|
| The main failure is that most people fundamentally don't
| believe they are mechanistic. If one believes in dualism, then
| it is easy to attribute various mental states to that dualism,
| and of course a computer neural network cannot experience
| qualia like humans do.
|
| I don't believe in a soul, and thus believe that a computer
| neural network, probably not today's models but a future one
| that is large enough and has the right recurrent topology,
| will be able to have qualia similar to what humans and
| animals experience.
| wrs wrote:
| I understand the Chinese Room argument to be that because the
| human in the room doesn't understand Chinese, the system
| doesn't understand Chinese. In this case, none of the humans
| can recognize cats, but the collective can.
| cscurmudgeon wrote:
| That's not the Chinese Room argument. The argument is that
| just because a system processes X doesn't imply it has
| consciousness of X.
| onikolas7 wrote:
| Funny. In the game Black & White you would slap or pet your
| creature to train it. The lead AI programmer on that was Demis
| Hassabis of DeepMind fame.
| Maultasche wrote:
| The description made me think of Black & White as well. I still
| have memories of smacking my creature around every time he ate
| someone.
| redog wrote:
| Somehow he knew AI would be our Gods.
| pkdpic wrote:
| I love this, but I'm always confused in these kinds of
| analogies about what the reward / punishment system really
| equates to...
|
| Also reminds me of Ted Chiang warning us that we will torture
| innumerable AI entities long before we start having real
| conversations about treating them with compassion.
| time_to_smile wrote:
| Don't love it, it's not correct.
|
| > what the reward / punishment system really equates to
|
| Nothing, at least as far as neural network training goes. This
| is an extremely poor analogy for how neural networks learn.
|
| If you've ever done any kind of physical training and have had
| a trainer slightly adjust the position of your limbs until
| whatever activity you're doing feels better, that's a much
| closer
| analogy. You're gently searching the space of possible correct
| positions, guided by an algorithm (your trainer) that knows how
| to move you towards a more correct solution.
|
| There's nothing analogous to a "reward" or "punishment" when
| neural networks are learning.
| GaggiX wrote:
| >There's nothing analogous to a "reward" or "punishment" when
| neural networks are learning.
|
| Well, deep reinforcement learning does.
| commandlinefan wrote:
| > what the reward / punishment system really equates to
|
| Well, in the article, it says the punishment was a slap. On the
| other hand, he just says "she gives you a wonderful reward"...
| so you're left to use your imagination there.
| clarle wrote:
| Totally aware that this isn't a fully formal definition of deep
| learning, but one interesting takeaway for me is realizing
| that, in a way, corporations with their formal and informal
| reporting structures are organized similarly to neural networks
| too.
|
| It seems like these sort of structures just regularly arise to
| help regulate the flow of information through a system.
| 0xBABAD00C wrote:
| There is research claiming the entire universe is a neural
| network: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7712105/
| mcbuilder wrote:
| That's the sort of blue sky research I'm glad exists.
| mxkopy wrote:
| Indra's Net is, in fact, a neural network
| joe_the_user wrote:
| Uh,
|
| The similarity of corporations and neural nets is pretty much
| only that both are information processing systems. An operating
| system or missile guidance system is far more like a
| corporation than a neural network.
|
| Neural networks have no memory and generally don't seek
| particular goals, they simply recognize, predict and generate
| similar instances.
| Retric wrote:
| Plenty of ways to think about this stuff. IMO neural networks
| don't inherently do anything, it's just a data structure.
|
| Different ways you can interact with that data structure can
| however provide meaning and store information in the weights
| etc.
| Myrmornis wrote:
| > they see 3 spreadsheets of numbers representing the RGB values
| of the picture.
|
| This needs expanding: it's the sort of thing that's easy for a
| programmer to say, but few people who aren't programmers or
| mathematically trained are going to see that an RGB value has 3
| parts and so a collection of RGB values could be sliced into 3
| sheets.
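|
| For anyone who does program, the slicing in question looks
| roughly like this in numpy (a made-up 2x2 picture, not taken
| from the article):
|
|     import numpy as np
|
|     # each pixel is an (R, G, B) triple of 0-255 values
|     img = np.array([[[255, 0, 0], [0, 255, 0]],
|                     [[0, 0, 255], [255, 255, 255]]])
|
|     red_sheet = img[:, :, 0]    # 2x2 "spreadsheet" of red values
|     green_sheet = img[:, :, 1]  # ... of green values
|     blue_sheet = img[:, :, 2]   # ... of blue values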
| romwell wrote:
| ...or know what Ruth Bader Ginsburg has to do with any of it.
|
| The RGB color model and representation of images in it is
| already technical. Anyone who knows what it means also wouldn't
| need to be told the following quip:
|
| >Also note that computers see things as multi-dimensional
| tables of data. They don't look at a "picture" - they see 3
| spreadsheets of numbers representing the RGB values of the
| picture.
|
| ...which is the only time RGB is mentioned in the article.
|
| That's before we get to the fact that "multidimensional" here
| is extraneous, and doesn't even match the typical usage (where
| RGBA is stored as a single 32-bit value). Everything is a tape
| of 1's and 0's; "multidimensionality" comes from the
| interpretation of the data.
|
| The _dimension_ of image data is still 2: each pixel is a
| sample of a 2D projection of a 3D world, and is related to
| other pixels in a way that's different than, say, letters
| in a line of text, or voxels (letters don't have a well-defined
| "up" neighbor, voxels have _more_ well-defined neighbors than
| pixels do).
| sainez wrote:
| If anyone is looking for a quick overview of how LLMs are built,
| I highly recommend this video by Steve Seitz:
| https://www.youtube.com/watch?v=lnA9DMvHtfI.
|
| It does an excellent job of taking you from 0 to a decent
| understanding without dumbing down the content or abusing
| analogies.
| vrglvrglvrgl wrote:
| [dead]
___________________________________________________________________
(page generated 2023-04-25 23:00 UTC)