[HN Gopher] Neural Networks from Scratch
___________________________________________________________________
Neural Networks from Scratch
Author : bane
Score : 297 points
Date : 2021-10-09 03:14 UTC (2 days ago)
(HTM) web link (aegeorge42.github.io)
(TXT) w3m dump (aegeorge42.github.io)
| synergy20 wrote:
| very cool, nice UI. The simplest of tutorials, but it grasps
| the gist; perfect for starters to get the big picture before
| diving into the details.
| spoonsearch wrote:
| Very nice, the color combination and the UI are so pleasing. The
| explanation is cool :)
| robomartin wrote:
| For those curious about the _nabla_ (∇) or gradient symbol (not
| a Greek letter):
|
| https://en.wikipedia.org/wiki/Nabla_symbol
| mLuby wrote:
| While most of the random starting weights converged quickly, this
| one got stuck with a fairly incorrect worldview, so to speak:
|
| 
|
| Is it overfitting to say the same is true for humans, where a
| brain's starting weights and early experiences may make it much
| more difficult to achieve an accurate model?
| sarathyweb wrote:
| The text is too small to read on my phone. I cannot zoom in
| either :(
| bnegreve wrote:
| I don't think it is a good idea to describe neural networks as a
| large graph of neurons interacting with each other. It doesn't
| really help you understand what is going on inside.
|
| It is more useful to understand them as a series of transforms
| that bend and fold the input space, in order to place pairs of
| similar items close to each other. I would like to see people
| trying to illustrate that instead.
|
| It also has the benefit of making the connection with linear
| algebra much easier to understand.
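|
| A rough sketch of that view (toy numbers only): each dense
| layer is just an affine map followed by a pointwise
| nonlinearity, which stretches and bends the input space:
|
|     import numpy as np
|
|     # two 2-D input points, one per row
|     X = np.array([[1.0, -2.0],
|                   [0.5,  0.5]])
|
|     # one dense layer: affine map plus pointwise nonlinearity
|     W = np.array([[0.8, -0.3],
|                   [0.2,  0.9]])
|     b = np.array([0.1, -0.1])
|
|     H = np.tanh(X @ W + b)   # the layer "bends" the space
|     print(H)                 # the same points, in a new space
|
| Stacking several such layers gives the repeated bending and
| folding described above.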
| mjburgess wrote:
| A Neural Network _isn't_ a graph in any case, and isn't based
| on the brain.
|
| As you said, it's a sequence of transformations.
|
| NB. If it's a graph: write out the edge list (etc.).
|
| NNs are diagrammed as graphs, but this is highly misleading.
| Retric wrote:
| Neural Networks are a graph, more specifically a Weighted
| Directed Graph.
|
| They are also very much modeled after the brain; more
| specifically, they originate from a 1943 paper by
| neurophysiologist Warren McCulloch and mathematician Walter
| Pitts, who described how neurons in the brain might work by
| modeling a simple neural network.
|
| Of course it's not an accurate model, but it very much is
| based on early understanding of biological neurons.
| jonnycomputer wrote:
| Yeah, I very much don't understand OP's argument. And it's
| trivial to write out the nodes and edges (at least for
| trivially sized neural networks).
| zwaps wrote:
| I think what op means is this: A graph is mathematically
| a set of vertices (nodes) and a set of ordered or
| unordered tuples giving the edges (ties). Now, sometimes
| you might have a weight on these edges, for example by
| specifying some function on the edge set.
|
| However, it is difficult to see how a neural network that
| includes operations like sum, multiply, and tanh might be
| modeled this way. How do you describe dropout as a
| graph?
|
| I think the argument is that a graph is not sufficient to
| describe a NN, so technically speaking a NN is not a
| graph. It is more. It has edges between x and f(x), but
| we also need to specify what f(x) is. The mathematical
| definition of a graph doesn't do that.
| Retric wrote:
| A weighted graph can include weights for both the vertices
| and the edges. For example, a network latency diagram may
| model the physical wires separately from the routers, since
| router latency may depend on network load. Similarly, routers
| themselves have internal bandwidth limitations, etc.
|
| As to the rest, separating the NN from everything needed
| to generate it is a useful distinction. You're not going
| to generate a different f(x) by slightly changing the
| training set, etc. It's a somewhat arbitrary distinction,
| however.
| [deleted]
| laGrenouille wrote:
| > NB. If it's a graph: write out the edge list (etc.) .
|
| I don't understand what issue you are referring to.
|
| For a dense network, each pair of adjacent layers forms a
| complete bipartite graph. In other words, edges are all pairs
| with one node in layer N and another in layer N+1.
|
| CNNs and RNNs take a little more work, but it's still easy to
| describe their graph structure.
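|
| A minimal sketch of that edge list (hypothetical layer sizes):
|
|     # layer sizes of a small dense network
|     layers = [3, 4, 2]
|
|     # one node per (layer, unit); edges are all pairs between
|     # adjacent layers, i.e. a complete bipartite graph per pair
|     nodes = [(l, i) for l, n in enumerate(layers)
|              for i in range(n)]
|     edges = [((l, i), (l + 1, j))
|              for l in range(len(layers) - 1)
|              for i in range(layers[l])
|              for j in range(layers[l + 1])]
|
|     print(len(nodes), len(edges))   # 9 nodes, 20 edges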
| zwaps wrote:
| I think op means that a graph is not sufficient to describe
| a NN. If a layer is Y = XB, then you draw Y as a set of nodes
| with the individual weights b_ij as edge weights from X.
| Right.
|
| But can you describe things like concat, max-pooling,
| attention, etc. without changing the meaning of the edges?
| Or do you have to annotate edges to now mean "apply a
| function here"? If so, op probably wants to say that you are
| describing more than a graph. There's a graph there, but you
| need more: you need elaborate descriptions of what the edges
| do. In that case, op could be correct to say that
| technically, NNs are not graphs.
|
| Or, perhaps NNs can generally be represented by vertices and
| edge lists. It certainly isn't the usual way to draw them,
| though.
| farresito wrote:
| Totally agree with you. The article that opened my eyes was
| this[0] one. This[1] video is also very good.
|
| [0] https://colah.github.io/posts/2014-03-NN-Manifolds-Topology/
|
| [1] https://www.youtube.com/watch?v=e5xKayCBOeU
| joefourier wrote:
| You really find large n-dimensional transforms easier to reason
| about and visualise, as opposed to layers of neurons with
| connections? You don't find it much more intuitive to see it as
| a graph once you start adding recurrence, convolutions,
| sparsity, dropout, connections across multiple layers, etc.,
| let alone coming up with new concepts?
|
| I think it's useful to understand it in both ways, but our
| intuitions about transforms are largely useless when the number
| of dimensions is high enough.
| nerdponx wrote:
| It's good to have both perspectives. Ideally you learn the
| layers-of-transforms version alongside the styled graph-of-
| neurons version. If you had to only pick one, which one you
| learn would depend a lot on what kind of student you are and
| what your goals are. I think the layers-of-transforms version
| is "less wrong" in general, though probably harder to
| understand, so it's maybe the better one if you had to learn
| just one.
| ravi-delia wrote:
| I think understanding how neural networks work is easiest if
| you think of them as networks. Reasoning about _why_ they
| work is a lot easier when thinking of them as transformations.
| It's not like you're actually picturing all the parameters
| of a nontrivial network one way or the other.
| farresito wrote:
| Not the person you are replying to, but I think it's all
| about the level of abstraction you want to reason at. I
| didn't grok neural networks until I visualized the
| transformations that were happening in a very simple network.
| Once that made sense, I could start thinking in terms of
| layers.
| shimonabi wrote:
| If anyone is interested, here is a simple symbol recognizer
| using backpropagation that I wrote in Python a while ago with
| the help of the book "Make Your Own Neural Network" by Tariq
| Rashid. NumPy is a great help with the matrix calculations.
|
| https://www.youtube.com/watch?v=IAQyVmTDz0A
| andreyk wrote:
| Pleasantly surprised by this: not yet another blog post, but
| rather a nice interactive lesson. Well done!
| aeg42x wrote:
| Hi everyone! I made this thing! I'm glad you all like it :) This
| is actually my first time using javascript, so if there are any
| issues please let me know and I'll do my best to fix them.
| Pensacola wrote:
| Hi, nice site! But since you asked, here's an issue: the little
| "Click to increase or decrease weights" feature doesn't work in
| Firefox.
| windsignaling wrote:
| "first time using javascript"
|
| Impressive. I think the first time I used Javascript I made a
| button.
| mdp2021 wrote:
| On mine, the textboxes are broken - overlapping other areas and
| rendered with a heavy blur.
| aeg42x wrote:
| Are you on mobile or a desktop browser? And could you please
| post a screenshot? I'll see what I can do! Thank you!
| pplanel wrote:
| Can't start in Android's Firefox.
| informationslob wrote:
| I can.
| moffkalast wrote:
| Probably can't start in Netscape Navigator either, the
| audacity.
| kebsup wrote:
| Very nice. I created a very similar thing a few years ago,
| but yours is nicer. :D https://nnplayground.com
| minihat wrote:
| Each time I teach neural nets to an engineer, there's only a 50%
| chance they can write down the chain rule. Colah's blog on
| backprop used to be my favorite resource to leave them with
| (https://colah.github.io/posts/2015-08-Backprop).
|
| The explanation of the calculus in this tool is equally
| fantastic. And the art is very cute.
|
| There are many ways to skin a cat, of course, but this is as good
| a tutorial as I've seen for getting you through backprop as fast
| as possible.
| jhgb wrote:
| > there's only a 50% chance they can write down the chain rule
|
| I blame the common mathematical notation for that.
| friebetill wrote:
| I found this 13 min explanation very helpful in understanding
| backpropagation (https://youtu.be/c36lUUr864M?t=2520).
|
| First he explains the necessary concepts:
|
| 1) Chain Rule
|
| 2) Computational Graph
|
| Then he explains backpropagation in these three steps (first in
| general and then with examples):
|
| 1) Forward pass: Compute loss
|
| 2) Compute local gradients
|
| 3) Backward pass: Compute dLoss/dWeights using the Chain Rule
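|
| For a concrete picture of those three steps, a rough numpy
| sketch (toy shapes and numbers, one hidden layer, squared
| loss):
|
|     import numpy as np
|
|     x = np.array([0.5, -1.0])          # input
|     y = 1.0                            # target
|     W1 = np.array([[0.1, 0.4],
|                    [-0.2, 0.3]])       # input -> hidden
|     W2 = np.array([0.7, -0.5])         # hidden -> output
|
|     # 1) forward pass: compute loss
|     h = np.tanh(x @ W1)
|     y_hat = h @ W2
|     loss = 0.5 * (y_hat - y) ** 2
|
|     # 2) local gradients and 3) backward pass (chain rule)
|     dL_dyhat = y_hat - y
|     dL_dW2 = dL_dyhat * h
|     dL_dh = dL_dyhat * W2
|     dL_dW1 = np.outer(x, dL_dh * (1 - h ** 2))  # tanh' = 1 - tanh^2
|
|     print(loss, dL_dW1, dL_dW2)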
| shaan7 wrote:
| Any recommendations for a 101 book for neural nets for someone
| who is "just a programmer"? OP's tutorial is quite nice, but I
| love to read books and find it easier to learn from them.
| aeg42x wrote:
| I highly recommend http://neuralnetworksanddeeplearning.com/
| (an online book that has some great code examples built in).
| carom wrote:
| There is a book called _Neural Networks from Scratch_ at
| https://nnfs.io.
| wesleywt wrote:
| Fastai has a course: Practical Deep Learning for Coders.
| carom wrote:
| There are also Coursera specializations from Andrew Ng at
| https://deeplearning.ai.
| matsemann wrote:
| Ng's course is bottom up: start with the basic math and
| expand upon it until you arrive at ML and neural nets.
|
| Fastai is top down: learn to use practical ML with
| abstractions, then dig deeper, explaining things as needed.
|
| I preferred fastai's approach, even though I enjoyed
| both. Ng's could be a bit too low level and fundamental
| for what I wanted to learn.
| carom wrote:
| This is a valuable take. Fastai was very frustrating for
| me because I wanted to understand the internals. I ended
| up not finishing it, so take my opinion with a grain of
| salt.
| windsignaling wrote:
| I much prefer Andrew Ng's courses as well.
|
| I tried Fast AI, but it seems to be trying too hard to
| take out the math, which oddly for me (as a STEM grad)
| makes it much more difficult to understand.
|
| Had to stop when I saw him using Excel spreadsheets to
| explain convolution.
| jacobcmarshall wrote:
| _Deep Learning with Python_ by Chollet is an excellent
| beginner resource if you are a hands-on learner.
|
| It starts off with some tutorials using the Keras library,
| and then gets into the math later on.
|
| By the end of the book, you build several different types
| of neural networks for identifying images, text, and more! I
| highly recommend it.
| baron_harkonnen wrote:
| Given the current state of automatic differentiation, I'm not
| so sure it's even necessary or particularly useful to focus on
| backpropagation any more.
|
| While backprop has major historical significance, in the end
| it's essentially just a pure calculation that no longer needs
| to be done by hand.
|
| Don't get me wrong, I still believe that understanding the
| gradient is hugely important, and conceptually it will always
| be essential to understand that one is optimizing a neural
| network by taking the derivative of the loss function. But
| backprop is neither necessary nor particularly useful to
| focus on for modern neural networks (nobody is computing
| gradients by hand for transformers).
|
| IMHO a better approach is to focus on a tool like JAX, where
| taking a derivative is abstracted away cleanly, but at the
| same time you remain fully aware of all the calculus that is
| being done.
|
| Especially for programmers, it's better to look at Neural
| Networks as just a specific application of Differentiable
| Programming. This makes them both easier to understand and
| opens up a much broader class of problems the learner can
| solve with the same tools.
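|
| As a small illustration of that differentiable-programming
| view (a sketch only, with made-up toy shapes), you define a
| loss and ask JAX for its gradient without writing any
| backprop yourself:
|
|     import jax
|     import jax.numpy as jnp
|
|     def loss(params, x, y):
|         w, b = params
|         y_hat = jnp.tanh(x @ w + b)
|         return jnp.mean((y_hat - y) ** 2)
|
|     grad_loss = jax.grad(loss)     # derivative, abstracted away
|
|     params = (jnp.ones((2, 1)), jnp.zeros(1))
|     x = jnp.array([[0.5, -1.0]])
|     y = jnp.array([[1.0]])
|
|     grads = grad_loss(params, x, y)  # same structure as params
|     params = jax.tree_util.tree_map(lambda p, g: p - 0.1 * g,
|                                     params, grads)
|
| The same jax.grad call works for any differentiable model you
| care to write down, not just a neural network.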
| medo-bear wrote:
| Backpropagation is a particular implementation of reverse-mode
| auto-differentiation, and it is the basis for all
| implementations of DL models. It is very strange for me to
| read this as though it were an obvious and commonly accepted
| fact, which I don't think it is.
| baron_harkonnen wrote:
| > to read this as though it is very obvious and commonly
| accepted fact
|
| I'm not entirely sure what you're referring to by "this", but
| assuming you mean my comment, I think what I'm saying is very
| much up for debate and not an "obvious and commonly accepted
| fact". Karpathy has a very reasonable argument that directly
| disagrees with what I'm suggesting [0]. Of course, he also
| agrees that in practice nobody will ever use backprop
| directly.
|
| Whether it's JAX, TF, PyTorch, etc., the chain rule will be
| applied for you. I'm arguing that it's helpful not to have to
| worry about the details of how your derivative is being
| computed, and rather to build an intuition about using
| derivatives as an abstraction. To be fair, I think Karpathy
| is correct for people who are explicitly learning to be
| experts in Neural Networks.
|
| My point is more that, given how powerful our tools for
| computing derivatives are today (I think JAX/Autograd have
| improved since Karpathy wrote that article), it's better to
| teach programmers to think of derivatives, gradients,
| Hessians, etc. as high-level abstractions, worrying less
| about how to compute them and more about how to use them.
| This way, thinking about modeling doesn't need to be
| restricted strictly to NNs; rather, use NNs as an example and
| then demonstrate to the student that they are free to build
| any model by defining how the model predicts, scoring the
| prediction, and using the tools of calculus to answer other
| common questions they might have.
|
| edit: a good analogy is logic programming and
| backtracking/unification. The entire point of logic
| programming is to abstract away backtracking. Sure, experts
| in Prolog do need to understand backtracking, but it's more
| helpful to get beginners to understand how Prolog behaves
| than to teach them the details of backtracking.
|
| [0] https://karpathy.medium.com/yes-you-should-understand-backpr...
| medo-bear wrote:
| but with backprop you do not worry about computing
| derivatives by hand. backprop, and AD in general, means you
| do not have to do that. maybe one of us is misunderstanding
| the other
|
| i am saying that if you want to work with ML algorithms on a
| deeper level you must learn backprop
|
| if you want to implement some models, on the other hand, you
| can just follow a recipe approach
| matsemann wrote:
| > _there's only a 50% chance they can write down the chain
| rule_
|
| Why should I, though? I remember the concept from calculus. I
| know pytorch keeps track of the various stuff I do to a vector
| and calculates a gradient based on it. What more do I need to
| know when all I want to do is to play with applications, not
| implement backprop myself?
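|
| (To be concrete, a minimal sketch of that bookkeeping, with
| toy numbers: pytorch records the ops applied to w, and
| backward() applies the chain rule for you.)
|
|     import torch
|
|     x = torch.tensor([0.5, -1.0])
|     w = torch.tensor([0.3, 0.7], requires_grad=True)
|
|     y_hat = torch.tanh(x @ w)     # ops are recorded on the fly
|     loss = (y_hat - 1.0) ** 2
|     loss.backward()               # chain rule applied for you
|     print(w.grad)                 # dloss/dw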
| medo-bear wrote:
| if you don't understand the chain rule then you don't
| understand backprop, which means you do not really understand
| how deep learning works. at most you can follow recipes,
| cookbook style. it is kind of like how one can make a website
| without a deep understanding of networking
| baron_harkonnen wrote:
| > at most you can follow recipes cook book style.
|
| Here I disagree with you pretty strongly. Once someone is
| comfortable with differentiable programming it's much more
| obvious how to build and optimize any type of model.
|
| People should be more concerned about when to use
| derivatives, gradients, hessians, Laplace approximation etc
| rather than worry about the implementation details of these
| tools.
|
| Abstraction can also aid depth of understanding. I know
| plenty of people who can implement backprop, but then don't
| understand how to estimate parameter uncertainty from the
| Hessian. The latter is much more important for general
| model building.
| medo-bear wrote:
| i am not sure what you are disagreeing with. chain rule is
| basic calculus that precedes understanding hessians. my
| argument is, if you cannot understand what the chain rule is,
| you will not understand more complicated mathematics in ML.
| do you think i am wrong?
|
| EDIT: also, uncertainty estimation is the stuff of the
| probabilistic approach to ML. i would say that people who do
| probabilistic ML are quite mathematically capable (at least
| in my experience)
| baron_harkonnen wrote:
| > chain rule is basic calculus that precedes
| understanding hessians.
|
| It doesn't have to be that way. The Hessian is an abstract
| idea, and the chain rule (and more specifically
| backpropagation) is a method of computing the results for
| that abstract idea. When I want the Hessian I want a matrix
| of second-order partial derivatives; I'm not interested
| in how those are computed.
|
| For a more concrete example, would you say that using the
| quantile function for the normal distribution requires
| you to be able to implement it from scratch?
|
| There are many very smart, very knowledgeable people who
| correctly use the normal quantile function (inverse CDF)
| every day for essential quantitative computation, yet have
| absolutely no idea how to implement the inverse error
| function (an essential part of the normal
| quantile). Would you say that you don't really know
| statistics if you can't do this? That a beginner must
| understand the implementation details of the inverse
| error function before making any claims about normal
| quantiles? I myself would absolutely need to pull up a
| copy of Numerical Recipes to do this. It would be, in my
| opinion, ludicrous to say that anyone wanting to write
| statistical code should understand and be able to
| implement the normal quantile function. Maybe in 1970
| that was true, but we have software to abstract that out
| for us.
|
| The same is becoming true of backprop. I can simply call
| jax.grad on my implementation of the loss for the forward
| pass of the NN I'm interested in and get the gradient of
| that function, the same way I can call scipy.stats.norm.ppf
| to get a quantile of a normal. All that is important for
| using it correctly is that you understand what the quantile
| function of the normal distribution means, and again I
| suspect there are many practicing statisticians who don't
| know how to implement this.
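|
| To make the analogy concrete (a trivial sketch):
|
|     from scipy.stats import norm
|
|     # using the normal quantile function without knowing how
|     # the inverse error function underneath is implemented
|     print(norm.ppf(0.975))   # ~1.96, the usual two-sided z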
|
| And to give you a bit of context, my view on this has
| developed from working with many people who can pass a
| calculus exam and perform the necessary steps to
| compute a derivative, yet have almost no intuition
| about what a derivative _means_ and how to use it and
| reason about it. Calculus historically focused on
| computation over intuition because that was what was
| needed to do practical work with calculus. Today the
| computation can take second place to the intuition
| because we have powerful tools that can take care of all
| the computation for you.
| tchalla wrote:
| > my argument is, if you can not understand what the
| chain rule is, you will not understand more complicated
| mathematics in ML.
|
| Are you sure about this?
| medo-bear wrote:
| yes. in europe, admission into an ML-type master's degree
| lists all three standard levels of mathematical analysis as a
| bare minimum for application
| tchalla wrote:
| If by "understand" you mean understand, and not regurgitate
| it when asked as a trivia question, then I agree with you.
| However, there are different interpretations of the chain
| rule.
| Imnimo wrote:
| Certainly there's a lot you can do without understanding
| backprop - you can train pre-made architectures, you can put
| pre-made layers together to build your own architecture, you
| can tweak hyperparameters and improve your model's accuracy,
| and so on. But I also think you will eventually run into a
| problem that would be much easier to debug if you understand
| backprop. If your model isn't learning, and your tensorboard
| graphs show your gradient magnitude is through the roof,
| it'll be much easier to track that down if you have a strong
| conceptual model of how gradients are calculated and how they
| flow backwards through the network.
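|
| For example, a quick sketch of that kind of check (toy model,
| PyTorch assumed):
|
|     import torch
|     import torch.nn as nn
|
|     model = nn.Sequential(nn.Linear(4, 8), nn.ReLU(),
|                           nn.Linear(8, 1))
|     x, y = torch.randn(16, 4), torch.randn(16, 1)
|
|     loss = nn.functional.mse_loss(model(x), y)
|     loss.backward()
|
|     # gradient magnitudes layer by layer; exploding values here
|     # are what a mental model of backprop helps you track down
|     for name, p in model.named_parameters():
|         print(name, p.grad.norm().item())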
___________________________________________________________________
(page generated 2021-10-11 23:00 UTC)