[HN Gopher] A non-technical explanation of deep learning
       ___________________________________________________________________
        
       A non-technical explanation of deep learning
        
       Author : tworats
       Score  : 186 points
       Date   : 2023-04-25 15:10 UTC (7 hours ago)
        
 (HTM) web link (www.parand.com)
 (TXT) w3m dump (www.parand.com)
        
       | amelius wrote:
        | The problem with deep learning is the opposite. You can understand
       | most of it with just high school math. Advanced math is mostly
       | useless because of the dimensionality of neural nets.
        
         | Utkarsh_Mood wrote:
         | can you elaborate further on what you mean by 'dimensionality
         | of neural nets'? Thanks!
        
           | amelius wrote:
           | Yes, I mean the huge number of trainable parameters.
        
           | [deleted]
        
         | mach1ne wrote:
         | Yes, it's really rather like alchemy in some sense. Stuff
         | works, and often nobody knows exactly why.
        
           | [deleted]
        
           | uoaei wrote:
           | "I don't follow the latest ML scaling and theory research"
           | does not in any way equate to "these things are unknowable".
        
             | lhnz wrote:
              | Hm, I've been watching Neel Nanda videos recently and I do
              | get the feeling that there are lots of unknowns in ML and
              | also in what trained networks have learnt.
        
         | uoaei wrote:
         | That's like saying you understand state-of-the-art CFD code
         | because you can read Fortran.
         | 
         | There are many aspects to learning systems that we still don't
         | have any kind of grasp on, and will take more than a little
         | advanced math (statistics/probability theory, transport theory,
         | topology, etc.) to understand as a community.
         | 
         | Dunning-Kruger is probably more common in spaces like this one,
         | where people carry social capital for being able to "spin up
         | quickly". But the true meta-skill of upskilling is turning
         | unknown unknowns (UU) into known unknowns (KU), and then into
         | known knowns (KK). It's not enough to just jump from UU to KK
         | through osmosis by reading blog posts on a news aggregator,
         | because there will still be a huge space of unknowns not
         | covered by that approach.
        
         | ftxbro wrote:
         | > Advanced math is mostly useless because of the dimensionality
         | of neural nets.
         | 
         | It depends what you mean by advanced math. There is a lot of
         | math that only really comes into play _because_ of the high
         | dimensionality! For example math related to tensor wrangling,
         | low rank approximations, spectral theory, harmonic theory,
         | matrix calculus derivatives, universality principles, and other
         | concepts that could be interesting or bewildering or horrifying
          | depending on how you react to it. Of course some of it is only
          | linear algebra of the 'just high school math' kind but that's
         | not how I would normally describe it. If you look at the math
         | in the proofs in the appendices of the more technical AI papers
         | on arxiv there is often some weird stuff in there, not just
         | matrix multiply and softmax.
        
       | lhnz wrote:
       | I have a few funny analogies that I think kind of work.
       | 
       | 1. "gradient descent" is like tuning a guitar by ear and
       | listening to the beat frequencies ("loss") and then decreasing
       | these by tuning a string up or down.
       | 
       | 2. the best I can come up with for "backpropagation" is to
       | imagine a clever device that can tirelessly optimize a Rube
       | Goldberg machine for you but as a science, not an art.
        
       | great_wubwub wrote:
       | As someone who knows barely enough to be dangerous, I like this.
       | I'm sure it leaves enough out to make most experts angry, but it
       | makes a lot of sense to me.
        
         | time_to_smile wrote:
         | > I'm sure it leaves enough out to make most experts angry
         | 
          | It's not that it leaves out details, it's that the article's
          | metaphors are not actually correct with regard to the way deep
          | learning works.
         | 
          | This post mostly confuses both reinforcement learning and
          | ensemble models with deep learning. If you only know "enough
          | to be dangerous", then this post will steer your intuition in
          | the _wrong_ direction.
        
       | lxe wrote:
       | This has Three-Body Problem vibes :)
        
       | time_to_smile wrote:
       | > This is how neural networks work: they see many examples and
       | get rewarded or punished based on whether their guesses are
       | correct.
       | 
       | This description more closely describes reinforcement learning,
       | rather than gradient based optimization.
       | 
       | In fact, the entire metaphor of a confused individual being
       | slapped or rewarded without understanding what's going on doesn't
       | really make sense when considering gradient optimization because
        | the gradient with respect to the loss function tells the network
        | _exactly_ how to change its behavior to improve its performance.
       | 
       | This last point is incredibly important to understand correctly
       | since it contains one of the biggest assumptions about network
        | behavior: that the optimal solution, or at least a solution good
        | enough for our concerns, _can be found_ by slowly taking small
        | steps in the right direction.
       | 
       | Neural networks are great at _refining_ their beliefs but have a
       | difficult time radically changing them. A better analogy might be
       | trying to very slowly convince your uncle that climate change is
       | real, and not a liberal conspiracy.
       | 
        | edit: it also does a poor job of explaining layers, which reads
        | much more like how ensemble methods work (lots of little
        | classifiers voting) than how deep networks work.
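        | 
        | A minimal sketch of that "small steps guided by the gradient"
        | idea, assuming plain NumPy (a toy linear model with made-up
        | numbers, purely illustrative):
        | 
        |     import numpy as np
        | 
        |     # Toy data: y is roughly 3*x + 1 plus a little noise.
        |     rng = np.random.default_rng(0)
        |     x = rng.uniform(-1, 1, 100)
        |     y = 3 * x + 1 + 0.1 * rng.normal(size=100)
        | 
        |     w, b = 0.0, 0.0   # start out "confused"
        |     lr = 0.1          # step size: small steps, many times
        | 
        |     for step in range(200):
        |         pred = w * x + b
        |         err = pred - y
        |         # The gradient of the loss says exactly how to nudge
        |         # w and b so the loss goes down a little.
        |         grad_w = 2 * np.mean(err * x)
        |         grad_b = 2 * np.mean(err)
        |         w -= lr * grad_w
        |         b -= lr * grad_b
        | 
        |     print(w, b)   # ends up close to 3 and 1
        | 
        | Each pass nudges w and b exactly in the direction the gradient
        | indicates; there is no separate "reward" or "punishment" signal.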
        
       | apomekhanes wrote:
       | [flagged]
        
       | hgsgm wrote:
       | Non-technical, non-accurate. "Truthy", buzzfeed/huffpo quality.
        
       | nailer wrote:
       | Does this article imply there are circumstances where a
       | spreadsheet is a cat? What a poor example of technical writing.
        
         | teacpde wrote:
          | Not the author, but in the author's defense, it is meant to be
          | non-technical. And the first paragraph reads as interesting to
          | me.
        
           | nailer wrote:
           | Most non technical people would think there are zero
           | circumstances where a spreadsheet could be a cat.
        
             | adrianmonk wrote:
             | It's obvious from context that it's the content of the
             | media. To me at least.
             | 
             | If I play you a song on Spotify and say, "Is this a
             | saxophone?", you wouldn't say, "No, it's a iPhone running
             | Spotify."
             | 
             | If a policeman holds up a photograph of a person and says,
             | "Is this the person who attacked you?", the victim doesn't
             | say, "No, it's an 8 by 10 glossy print."
        
             | ricardobeat wrote:
             | That's part of the explanation. It might not make sense at
             | first, but you'll figure something out to avoid being
             | slapped.
        
         | andysinclair wrote:
         | He's saying that the spreadsheet represents the "picture" of
         | the cat in terms of pixels and RGB values etc.
         | 
         | The algorithm/workers are not really "looking" at a picture of
         | a cat, they are analysing and looking for patterns in the data
         | that defines the picture of the cat.
        
       | charcircuit wrote:
        | Why are violence and praise being used to illustrate gradient
        | descent? Why does each person get to see the entire input data?
        
       | _gmax0 wrote:
       | The most concise and intuitive line of explanation I've been
       | given goes along the lines of this:
       | 
       | 1 - We want to model data, representative of some system, through
       | functions.
       | 
        | 2 - Virtually any function can be expressed by an n-th order
        | polynomial.
       | 
       | 3 - We wish to learn the parameters, the coefficients, of such
       | polynomials.
       | 
        | 4 - Neural networks allow us to brute-force test candidate values
        | of such parameters (finding optimal candidate parameters such
        | that the error between expected and actual values of our dataset
        | is minimized).
       | 
        | Whereas prior methods (e.g. PCA) could only model linear
        | relationships, neural networks allowed us to begin modeling non-
        | linear ones.
        
         | civilized wrote:
         | You don't need neural networks to do polynomial regression.
         | Polynomial regression, perhaps surprisingly, can be implemented
         | using only (multivariable) linear regression. You just include
         | powers of your predictor x as terms in the regression formula:
         | y = a + bx + cx^2 + dx^3 + ...
         | 
          | The resulting model is linear, even though there are powers of
          | x in your formula, because x and y are _known_ from the data.
          | They're not what you're solving for; you're solving for the
          | unknown _coefficients_ (a, b, c, d...). This gives you a linear
         | system of equations in those unknown coefficients, which can be
         | solved using standard linear least squares methods.
         | 
         | So fitting polynomials is easy. The problem is that it's not
         | that useful. Deep learning has to solve much harder problems to
         | get to a useful model.
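          | 
          | A minimal sketch of that, assuming NumPy (the cubic and its
          | coefficients are made up for illustration):
          | 
          |     import numpy as np
          | 
          |     rng = np.random.default_rng(1)
          |     x = rng.uniform(-2, 2, 50)
          |     y = 0.5 * x**3 - x + 2 + 0.1 * rng.normal(size=50)
          | 
          |     # Powers of x are just extra columns ("features"), so
          |     # the system is linear in the unknown coefficients.
          |     X = np.column_stack([x**0, x, x**2, x**3])
          |     coeffs, *_ = np.linalg.lstsq(X, y, rcond=None)
          |     print(coeffs)   # roughly [2, -1, 0, 0.5] = a, b, c, d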
        
         | jacksnipe wrote:
         | Except gradient descent is about as far from brute force as it
         | gets
        
           | _gmax0 wrote:
           | Sure, under the assumption that your parameter space is
           | convex.
        
             | [deleted]
        
         | pedrosorio wrote:
         | Mentioning polynomials is a pretty poor way to explain it for
         | two reasons:
         | 
          | - It requires some mathematical understanding, so it will
          | exclude part of the non-technical audience
         | 
         | - It is the incorrect analogy. Non-linearities in neural
         | networks have nothing to do with polynomials. In fact,
         | polynomial regression is a type of linear regression, and for
         | the most part, it sucks.
         | 
          | Also, as someone mentioned, all the "serious" alternative ML
          | methods prior to the deep learning revolution allowed modeling
          | non-linearities (even if just through modification of linear
          | regressions, like polynomial regression).
        
         | teruakohatu wrote:
         | > Whereas prior, methods (e.g. PCA) could only model linear
         | relationships,
         | 
         | Prior methods also allowed modelling of non-linear
         | relationships, eg. Random Forests.
        
         | lhnz wrote:
         | Hm, I don't think that's quite it. I went through my own
         | process of learning how neural networks work recently and wrote
          | this based on my learning:
          | https://sebinsua.com/bridging-the-gap
         | 
         | As far as my understanding goes, you can represent practically
         | any function as layers of linear transformations followed by
         | non-linear functions (e.g. `ReLU(x) = max(0, x)`). It's this
         | sprinkling of non-linearity that allows the networks to be able
         | to model complex functions.
         | 
         | However, from my perspective, the secret sauce is (1)
         | composability and (2) differentiability. These enable the
         | backpropagation process (which is just "the chain rule" from
         | calculus) and this is what allows these massive mathematical
         | expressions to learn parameters (weights and biases) that
         | perform well.
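          | 
          | A minimal sketch of "linear transformation, then ReLU, stacked"
          | in NumPy (forward pass only; the layer sizes are arbitrary):
          | 
          |     import numpy as np
          | 
          |     def relu(x):
          |         return np.maximum(0, x)
          | 
          |     # One hidden layer: linear, non-linearity, linear again.
          |     # Every piece is differentiable, so the chain rule
          |     # (backpropagation) can adjust every weight.
          |     def forward(x, W1, b1, W2, b2):
          |         h = relu(x @ W1 + b1)
          |         return h @ W2 + b2
          | 
          |     rng = np.random.default_rng(0)
          |     W1, b1 = rng.normal(size=(3, 8)), np.zeros(8)
          |     W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
          |     out = forward(rng.normal(size=(5, 3)), W1, b1, W2, b2)
          |     print(out.shape)   # (5, 1)
          | 
          | Backpropagation then just runs the chain rule backwards through
          | exactly these operations.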
        
       | jstx1 wrote:
       | Does stuff like this help anyone?
       | 
        | I still haven't forgiven CGP Grey for changing the title of his
        | 2017 ML video to "How AIs, like ChatGPT, learn". The video is
       | about genetic algorithms and has nothing to do with ChatGPT. (or
       | with anything else in modern AI)
        
         | [deleted]
        
         | gnicholas wrote:
         | I read this to see if it would be useful to share with my 9
         | year old. After reading it, I think it is not any more useful
         | (alone) than watching the 3b1b video on this topic. The video
         | is longer, but has more visualizations.
         | 
         | I think that perhaps reading this description after watching
         | the video might make the process more memorable. My guess is
         | that if I had my daughter read this first, it wouldn't do much
         | to make the video easier to parse. Reading this real-world
         | example after watching the video could help solidify the
         | concept.
         | 
         | Disclaimer: I don't know a lot about AI/ML, so it's possible
         | that I am 100% wrong here!
        
         | SnooSux wrote:
         | I've barely forgiven him for explaining genetic algorithms and
         | acting like they have any relevance to contemporary ML
         | research.
         | 
         | The footnote video was an alright explanation of backprop. If
         | that were part of the main video that would have been
         | reasonable.
         | 
          | I really like his history/geography videos but anything
          | technical leaves a lot to be desired. And don't get me started
          | on Humans Need Not Apply.
        
           | flangola7 wrote:
           | Humans Need Not Apply is one of the most phenomenal videos on
           | YouTube, what do you think is wrong with it?
        
           | jstx1 wrote:
           | > And don't get me started on Humans Need Not Apply.
           | 
           | Well now you have to tell us. :) Many of the concrete
           | examples in that video are exaggerated and/or misunderstood
           | but the general question it asks - what to do when automation
           | makes many people unemployable through no fault of their own
           | - seems valid.
        
             | bolyarche wrote:
             | [dead]
        
         | Trufa wrote:
         | What a strange word to use in that context, why would he need
         | to be forgiven by you? How has he wronged you? Seems at worst,
         | an honest mistake in a complicated topic.
        
           | gregschlom wrote:
            | Maybe GP is a non-native English speaker? This construct
            | would be a pretty common way for a native French speaker to
            | say they are angry at something. Not sure if it's common in
            | English as well.
        
             | mrbombastic wrote:
             | This is a pretty common phrase in English as well, it is
             | not meant to be taken literally.
        
           | Cpoll wrote:
           | You're not using "at worst" correctly. What you describe is
           | an "at best". Worse would be that CGP Grey deliberately
           | picked a misleading title in order to optimize views,
           | algorithm, etc.
           | 
           | This is, I think, the case. But I don't begrudge them too
           | much, YouTube is cutthroat.
        
       | zvmaz wrote:
       | I have met people who think they understand a particular topic I
       | am versed in, but actually don't. Similarly, I am often wary that
       | I get superficial knowledge about a topic I don't know much about
       | through "laymen" resources, and I doubt one can have an
       | appropriate level of understanding mainly through analogies and
       | metaphors. It's a kind of "epistemic anxiety". Of course, there
       | are "laymen" books I stumbled upon which I think go to
       | appropriate levels of depth and do not "dumb down" to shallow
        | levels the topics, yet remain accessible, like Gödel's Proof, by
        | Ernest Nagel. I'd be glad to read about similar books on all
       | topics, including the one discussed in this thread.
       | 
       | Knowledge is hard to attain...
        
         | wanderlust123 wrote:
         | Can you please suggest other books similar in spirit to the
         | Nagel book? Would love to read some over summer
        
         | lxe wrote:
          | I've noticed that the learning curve stays fairly flat when it
          | comes to understanding weights, and layers, and neural
         | networks, heck, even what gradient descent is for... but then
         | when it comes to actually understanding why optimization
         | algorithms are needed, and how they work, things just spiral
         | into very hard math territory.
         | 
         | I do think that maybe it feels inaccessible because we
         | transition from discrete concepts easily digestible by CS grads
          | into some complicated math with very terse mathematical notation,
         | yet the math might not be as hard if presented in a way that
         | doesn't scare away programmers.
        
         | sainez wrote:
         | I find the best way to learn technical topics is to build a
         | simplified version of the thing. The trick is to understand the
         | relationship between the high level components without getting
         | lost in the details. This high level understanding then helps
         | inform you when you drill down into specifics.
         | 
         | I think this book is a shining example of that philosophy:
         | https://www.buildyourownlisp.com/. In the book, you implement
         | an extremely bare-bones version of lisp, but it has been
         | invaluable in my career. I found I was able to understand
         | nuanced language features much more quickly because I have a
         | clear model of how programming languages are decomposed into
         | their components.
        
           | joe_the_user wrote:
           | _I find the best way to learn technical topics is to build a
           | simplified version of the thing. The trick is to understand
           | the relationship between the high level components without
           | getting lost in the details. This high level understanding
           | then helps inform you when you drill down into specifics._
           | 
            | I agree, but that's a good guide to building a technical
            | understanding of a complex subject, not a sufficient-in-itself
            | tool set for considering questions in that complex subject.
           | 
            | Especially, people combining some "non-technical summary" of
            | quantum mechanics/Newtonian gravity/genetic engineering/etc.
            | with their personal common sense are a constant
            | annoyance to me whenever such topics come here.
        
             | nomel wrote:
             | > constant annoyance to me whenever such topics come here.
             | 
             | I'll say that thinking about things at the edge of my
             | understanding, where "Eureka!" moments are low hanging
             | fruit, results in the highest dump of dopamine out of any
             | other activity. Having silly fun speculating (and I make it
             | clear when I am) my way through some thought process is
             | literally the most fun I can have. Seeing those types of
             | conversations, full of _genuine_ curiosity, thoughtful
              | speculation, and all the resulting corrections/discussions/
              | insight, etc, is why I love HN so much, and I hope it's
              | always a place to nerd out.
             | 
              | One man's trash is another man's treasure, I suppose. :)
        
         | commandlinefan wrote:
         | > an appropriate level of understanding mainly through
         | analogies and metaphors
         | 
         | I think it's actually worse than that - somebody who doesn't
         | know actually realizes that he doesn't know, but somebody who
         | _thinks_ he understands through analogies and metaphors will
         | confidently come to the incorrect conclusion and then argue
         | with somebody who actually does understand the topic - often
         | managing to convince innocent bystanders because his reasoning
         | is easier to grasp and the so-called expert seems to be getting
         | more and more flustered (as he tries to explain why the analogy
         | is actually correct, but oversimplified).
        
         | somenameforme wrote:
          | There are a million neural-network-type programs on GitHub, e.g.
          | number parsing (image to digit). Go pick one in your preferred
         | language and break it apart, and rebuild it, looking up the
         | concepts behind parts you don't understand. After you finish up
         | with the above, look up 'the xor problem' to see a common
          | practical problem (which creating a network to replicate xor
          | _illustrates_, rather than _is_) and you'll be well on your
         | way to a nice fundamental understanding, built from the ground
         | up.
         | 
         | One of the most interesting things about this topic is that the
         | fundamental concepts and implementations are all _really_
          | simple. It's the fact that it actually works that's mind-
          | boggling. In any case, the above is not a months-long affair -
          | more like one week of dedicated work.
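          | 
          | For reference, a minimal sketch of the xor case in plain NumPy
          | (a hand-rolled one-hidden-layer network; the sizes, learning
          | rate and step count are arbitrary choices, not from any
          | particular repo):
          | 
          |     import numpy as np
          | 
          |     X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], float)
          |     y = np.array([[0], [1], [1], [0]], float)
          | 
          |     rng = np.random.default_rng(0)
          |     W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
          |     W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
          | 
          |     def sigmoid(z):
          |         return 1 / (1 + np.exp(-z))
          | 
          |     for step in range(5000):
          |         h = np.tanh(X @ W1 + b1)       # hidden layer
          |         p = sigmoid(h @ W2 + b2)       # predicted output
          |         dz = p - y                     # loss gradient at output
          |         dW2, db2 = h.T @ dz, dz.sum(axis=0)
          |         dh = (dz @ W2.T) * (1 - h**2)  # chain rule through tanh
          |         dW1, db1 = X.T @ dh, dh.sum(axis=0)
          |         for prm, grd in ((W1, dW1), (b1, db1),
          |                          (W2, dW2), (b2, db2)):
          |             prm -= 0.1 * grd           # small gradient steps
          | 
          |     print(p.round(2).ravel())  # heads toward [0, 1, 1, 0]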
        
       | giardini wrote:
       | Nothing about LLMs?!
        
         | fifteen1506 wrote:
          | Yeah, I need something to explain those Transformer things to
          | me. I know they were published by Google in 2017 and that they
          | are 'magic'.
         | 
         | End of knowledge.
         | 
         | Maybe I should ask ChatGPT?
        
           | jedberg wrote:
           | > Maybe I should ask ChatGPT?
           | 
           | You actually should, it spits out a pretty good explanation
           | (sometimes).
        
       | wrs wrote:
       | This is the funniest refutation of the Chinese Room argument that
       | I've seen. Note that at the end, it's still the case that none of
       | these people can recognize a cat.
        
         | pringk02 wrote:
         | Doesn't that mean it supports the Chinese room argument? I'm
         | not sure I follow your reasoning.
         | 
          | (also, popular consciousness forgets that technically the
         | Chinese Room argument is only arguing against the much
         | narrower, and now philosophically unfashionable, "Hard AI"
         | stance as it was held in the 70s)
        
           | tasty_freeze wrote:
           | > the Chinese Room argument is only arguing against the much
           | narrower, and now philosophically unfashionable, "Hard AI"
           | stance as it was held in the 70s
           | 
            | Searle stood behind his argument in the 70s, and in every
            | decade since then too.
           | 
           | The main failure is that most people fundamentally don't
            | believe they are mechanistic. If one believes in dualism, then
            | it is easy to attribute various mental states to that dualism,
           | and of course a computer neural network cannot experience
           | qualia like humans do.
           | 
           | I don't believe in a soul, and thus believe that a computer
           | neural network, probably not today's models but a future one
           | that is large enough and has the right recurrent topology,
           | will be able to have qualia similar to what humans and
           | animals experience.
        
           | wrs wrote:
           | I understand the Chinese Room argument to be that because the
           | human in the room doesn't understand Chinese, the system
           | doesn't understand Chinese. In this case, none of the humans
           | can recognize cats, but the collective can.
        
             | cscurmudgeon wrote:
              | That's not the Chinese Room argument. The argument says just
             | because a system processes X doesn't imply it has
             | consciousness of X.
        
       | onikolas7 wrote:
        | Funny. In the game Black & White you would slap or pet your
        | avatar to train it. The lead AI programmer on that was Demis
        | Hassabis of DeepMind fame.
        
         | Maultasche wrote:
         | The description made me think of Black & White as well. I still
         | have memories of smacking my creature around every time he ate
         | someone.
        
         | redog wrote:
         | Somehow he knew AI would be our Gods.
        
       | pkdpic wrote:
        | I love this, but I'm always confused in these kinds of analogies
        | about what the reward / punishment system really equates to...
       | 
       | Also reminds me of Ted Chiang warning us that we will torture
       | innumerable AI entities long before we start having real
       | conversations about treating them with compassion.
        
         | time_to_smile wrote:
         | Don't love it, it's not correct.
         | 
         | > what the reward / punishment system really equates to
         | 
          | Nothing, at least as far as neural network training goes. This
         | is an extremely poor analogy regarding how neural networks
         | learn.
         | 
         | If you've ever done any kind of physical training and have had
          | a trainer slightly adjust the position of your limbs until
          | whatever activity you're doing feels better, that's a much closer
         | analogy. You're gently searching the space of possible correct
         | positions, guided by an algorithm (your trainer) that knows how
         | to move you towards a more correct solution.
         | 
         | There's nothing analogous to a "reward" or "punishment" when
         | neural networks are learning.
        
           | GaggiX wrote:
           | >There's nothing analogous to a "reward" or "punishment" when
           | neural networks are learning.
           | 
            | Well, deep reinforcement learning.
        
         | commandlinefan wrote:
         | > what the reward / punishment system really equates to
         | 
         | Well, in the article, it says the punishment was a slap. On the
         | other hand, he just says "she gives you a wonderful reward"...
         | so you're left to use your imagination there.
        
       | clarle wrote:
       | Totally aware that this isn't a fully formal definition of deep
        | learning, but one interesting takeaway for me is realizing that
        | corporations, with their formal and informal reporting
        | structures, are structured in a way similar to neural networks
        | too.
       | 
        | It seems like these sorts of structures just regularly arise to
       | help regulate the flow of information through a system.
        
         | 0xBABAD00C wrote:
         | There is research claiming the entire universe is a neural
         | network: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7712105/
        
           | mcbuilder wrote:
           | That's the sort of blue sky research I'm glad exists.
        
           | mxkopy wrote:
           | Indra's Net is, in fact, a neural network
        
         | joe_the_user wrote:
         | Uh,
         | 
         | The similarity of corporations and neural nets is pretty much
         | only that both are information processing systems. An operating
         | system or missile guidance system is far more like a
         | corporation than a neural network.
         | 
          | Neural networks have no memory and generally don't seek
          | particular goals; they simply recognize, predict and generate
          | similar instances.
        
           | Retric wrote:
            | Plenty of ways to think about this stuff. IMO a neural network
            | doesn't inherently do anything; it's just a data structure.
           | 
           | Different ways you can interact with that data structure can
           | however provide meaning and store information in the weights
           | etc.
        
       | Myrmornis wrote:
       | > they see 3 spreadsheets of numbers representing the RGB values
       | of the picture.
       | 
       | This needs expanding: it's the sort of thing that's easy for a
       | programmer to say, but few non-{programmer,mathematically trained
       | person} are going to see that an RGB value has 3 parts and so a
       | collection of RGB values could be sliced into 3 sheets.
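        | 
        | A minimal sketch of that slicing, assuming NumPy (a made-up 2x2
        | picture, purely illustrative):
        | 
        |     import numpy as np
        | 
        |     # A tiny 2x2 picture: each pixel holds three numbers,
        |     # its red, green and blue values.
        |     picture = np.array([
        |         [[255,   0,   0], [  0, 255,   0]],
        |         [[  0,   0, 255], [255, 255, 255]],
        |     ])
        | 
        |     # Slicing out each channel gives three 2x2 "spreadsheets".
        |     red   = picture[..., 0]   # [[255, 0], [0, 255]]
        |     green = picture[..., 1]   # [[0, 255], [0, 255]]
        |     blue  = picture[..., 2]   # [[0, 0], [255, 255]]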
        
         | romwell wrote:
          | ...or know what Ruth Ginsburg Bader has to do with any of it.
         | 
         | The RGB color model and representation of images in it is
         | already technical. Anyone who knows what it means also wouldn't
         | need to be told the following quip:
         | 
         | >Also note that computers see things as multi-dimensional
         | tables of data. They don't look at a "picture" - they see 3
         | spreadsheets of numbers representing the RGB values of the
         | picture.
         | 
         | ...which is the only time RGB is mentioned in the article.
         | 
         | That's before we get to the part that "multidimensional" here
         | is extraneous, and doesn't even match the typical usage (where
          | RGBA is stored as a single 32-bit value). Everything is a tape
          | of 1's and 0's; "multidimensionality" comes from the
          | interpretation of the data.
         | 
          | The _dimension_ of image data is still 2: each pixel is a
          | sample of a 2D projection of a 3D world, and is related to other
          | pixels in a way that's different than, say, those of letters
          | in a line of text, or voxels (letters don't have a well-
          | defined "up" neighbor, voxels have _more_ well-defined
          | neighbors than pixels do).
        
       | sainez wrote:
       | If anyone is looking for a quick overview of how LLMs are built,
       | I highly recommend this video by Steve Seitz:
       | https://www.youtube.com/watch?v=lnA9DMvHtfI.
       | 
       | It does an excellent job of taking you from 0 to a decent
       | understanding without dumbing down the content or abusing
       | analogies.
        
       | vrglvrglvrgl wrote:
       | [dead]
        
       ___________________________________________________________________
       (page generated 2023-04-25 23:00 UTC)