[HN Gopher] The matrix calculus you need for deep learning (2018)
       ___________________________________________________________________
        
       The matrix calculus you need for deep learning (2018)
        
       Author : cpp_frog
       Score  : 184 points
       Date   : 2023-07-30 17:18 UTC (1 day ago)
        
 (HTM) web link (explained.ai)
 (TXT) w3m dump (explained.ai)
        
       | trolan wrote:
       | I finished Vector Calculus last year and have no experience
       | in machine learning, but this seems exceptionally thorough.
       | It would have made my life easier to have a practical
       | explanation alongside the mathematical one, but woe is the
       | life of the engineering student, I guess.
        
         | parrt wrote:
         | Glad to be of assistance! Yeah, it really annoyed me that
         | this critical information was not listed in any one
         | particular spot.
        
       | cs702 wrote:
       | Please change the link to the original source:
       | 
       | https://arxiv.org/abs/1802.01528
       | 
       | ---
       | 
       | EDIT: It turns out explained.ai is the personal website of one of
       | the authors, so there's no need to change the link. See comment
       | below.
        
         | parrt wrote:
         | :) Yeah, I use my own internal markdown to generate really nice
         | html (with fast latex-derived images for equations) and then
         | full-on latex. (tool is https://github.com/parrt/bookish)
         | 
         | I prefer reading on the web unless I'm offline. The latex
         | is super handy for printing a nice document.
        
           | cs702 wrote:
           | Even though it's shockingly common, I never cease to be
           | surprised and delighted when authors who are on HN take the
           | time to reply to comments about their work.
           | 
           | Thank you for doing this with Jeremy and sharing it with the
           | world!
        
             | parrt wrote:
             | Sure thing! Very enjoyable to have people use our work.
        
         | liorben-david wrote:
          | Explained.ai seems to be Terence Parr's personal site.
        
           | cs702 wrote:
           | Thank you for pointing it out. I edited my comment.
        
       | godelski wrote:
        | There are two common beliefs: that you don't need math for
        | ML, and that you need a lot of math for ML. So let me
        | clarify:
       | 
       | You don't need math to make a model perform well, but you do need
       | math to know why your model is wrong.
        
       | rdedev wrote:
        | I had followed this when I was learning DL through Andrew
        | Ng's course. In one of the lessons, he had the formula for
        | calculating the loss as well as its derivatives.
        | 
        | I tried deriving these formulas from scratch using what I
        | learned from OP's post, but it felt like something was
        | missing. I think it boils down to me not knowing how to
        | aggregate those element-wise derivatives into matrix form.
        | In the end, it was the Matrix Cookbook and certain notes
        | from Stanford's CS231n that helped me grok it fully.
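        | 
        | For instance, here is a minimal NumPy sketch (using a
        | squared-error loss as a stand-in example; the course's
        | exact loss isn't shown here) of how the element-wise
        | partials dL/dw_j collect into the matrix form
        | 2 * X.T @ (X @ w - y):
        | 
        |     import numpy as np
        | 
        |     # loss L(w) = ||X @ w - y||^2
        |     rng = np.random.default_rng(0)
        |     X = rng.standard_normal((6, 4))
        |     y = rng.standard_normal(6)
        |     w = rng.standard_normal(4)
        | 
        |     # element-wise partials aggregated into matrix form
        |     grad = 2 * X.T @ (X @ w - y)
        | 
        |     # check each component against a finite difference
        |     loss = lambda v: np.sum((X @ v - y) ** 2)
        |     eps = 1e-6
        |     for j in range(w.size):
        |         e = np.zeros_like(w)
        |         e[j] = eps
        |         fd = (loss(w + e) - loss(w)) / eps
        |         assert np.isclose(grad[j], fd, atol=1e-3)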
        
       | jayro wrote:
        | We just released a comprehensive online course on
        | Multivariable Calculus
        | (https://mathacademy.com/courses/multivariable-calculus),
        | and we also have a course on Mathematics for Machine
        | Learning
        | (https://mathacademy.com/courses/mathematics-for-machine-lear...)
        | that covers just the matrix calculus you need, along with
        | just the linear algebra and statistics you need, etc. I'm a
        | founder and would be happy to answer any questions you
        | might have.
        
         | barrenko wrote:
          | Whom do you think Mathematics for Machine Learning
          | benefits? In my opinion, the plethora of courses and
          | articles available in that vein is useful mostly to
          | people who recently went through college-level Linear
          | Algebra.
          | 
          | I'd like more resources geared toward people who are done
          | with Khan Academy and want something as well made for
          | more advanced topics.
        
           | jayro wrote:
            | The Mathematics for Machine Learning course doesn't
            | assume knowledge of Linear Algebra; it covers the
            | basics of Linear Algebra you'll need, along with the
            | basics of Multivariable Calculus, Statistics,
            | Probability, etc. It does, however, assume knowledge
            | of high-school math and Single Variable Calculus. If
            | you've been out of school for a while, our adaptive
            | diagnostic exam will identify your knowledge gaps and
            | create a custom course for you that includes the
            | necessary remediation.
            | 
            | If you're REALLY rusty (maybe you've been out of
            | school for 5+ years), or maybe you just never learned
            | the material that well in the first place, then you
            | might want to start with one of our Mathematical
            | Foundations courses, which will scaffold you up to the
            | level where you can handle the content in Mathematics
            | for Machine Learning. More info can be found here:
            | https://mathacademy.com/courses
           | 
           | The Mathematics for Machine Learning course would be ideal
           | for anyone who majored in a STEM subject like CS (or at least
           | has a solid mathematical foundation) and is interested in
           | doing work in machine learning.
        
         | thewataccount wrote:
         | I understand you don't have a free trial, is there any chance
         | you have a demo somewhere of what it actually looks like
         | though? Like a tiny sample lesson or something along those
         | lines? It looks interesting but I'm just uncertain as to what
         | it actually "feels" like in practice vs lets say Brilliant,
         | etc.
         | 
         | I only see pictures, I'm curious the extent of the interaction
         | in the linear algebra/matrix calc specifically
        
       | quanto wrote:
       | The article/webpage is a nice walk-through for the uninitiated.
       | Half the challenge of doing matrix calculus is remembering the
       | dimension of the object you are dealing with (scalar, vector,
       | matrix, higher-dim tensor).
       | 
       | Ultimately, the point of using matrix calculus (or matrices in
       | general) is not just concision of notation but also understanding
       | that matrices are operators acting on members of some spaces,
       | i.e. vectors. It is this higher level abstraction that makes
       | matrices powerful.
       | 
       | For people who are familiar with the concepts but need a concise
       | refresher, the Wikipedia page serves well:
       | 
       | https://en.wikipedia.org/wiki/Matrix_calculus
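        | 
        | As a quick shape-bookkeeping sketch (an illustration, not
        | from the article): for f(x) = W @ x, with f mapping R^n to
        | R^m, the operator W is itself the m x n Jacobian:
        | 
        |     import numpy as np
        | 
        |     m, n = 3, 5
        |     rng = np.random.default_rng(1)
        |     W = rng.standard_normal((m, n))
        |     x = rng.standard_normal(n)
        | 
        |     def jacobian_fd(f, x, eps=1e-6):
        |         # finite-difference Jacobian: column j is df/dx_j
        |         fx = f(x)
        |         J = np.zeros((fx.size, x.size))
        |         for j in range(x.size):
        |             e = np.zeros_like(x)
        |             e[j] = eps
        |             J[:, j] = (f(x + e) - fx) / eps
        |         return J
        | 
        |     J = jacobian_fd(lambda v: W @ v, x)
        |     assert J.shape == (m, n)  # dimension bookkeeping
        |     assert np.allclose(J, W, atol=1e-4)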
        
         | PartiallyTyped wrote:
          | Adding to this: these operators are also "polymorphic";
          | for matrix multiplication, the only operations you need
          | are (non-commutative) multiplication and addition, so you
          | can use elements of any non-commutative ring, i.e. a set
          | of elements equipped with those two operations :D
         | 
         | Matrices themselves form non-commutative rings too; and based
         | on this, you can think of a 4N x 4N matrix as a 4x4 matrix
         | whose elements are NxN matrices [1] :D
         | 
         | [1] https://youtu.be/FX4C-JpTFgY?list=PL49CF3715CB9EF31D&t=1107
         | 
         | You already know whose lecture it is :D
         | 
         | I love math.. I should have become a mathematician ...
        
           | mrfox321 wrote:
           | Re [1]: it's fairly concrete to simply say that matrix
           | multiplication can be performed block-wise.
        
             | PartiallyTyped wrote:
              | I don't disagree, but that is just an example of MM.
              | The gist is not that you can do block multiplication,
              | but that you can define matrices over any
              | non-commutative ring, which includes other matrices,
              | i.e. blocks.
        
               | mrfox321 wrote:
                | Yeah, matrices are more abstract. I guess I am
                | just pointing out that your concrete example of
                | non-commutative rings (matrices of matrices) still
                | needs a proof to demonstrate the bijection between
                | 4N x 4N (scalar) matrices and 4 x 4 matrices of
                | N x N (scalar) blocks.
                | 
                | Block MM demonstrates the equivalence.
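                | 
                | A minimal NumPy check of that equivalence (an
                | illustration; the 4x4 grid of NxN blocks is
                | arbitrary):
                | 
                |     import numpy as np
                | 
                |     N = 3
                |     rng = np.random.default_rng(2)
                |     Ab = rng.standard_normal((4, 4, N, N))
                |     Bb = rng.standard_normal((4, 4, N, N))
                | 
                |     def assemble(blocks):
                |         # paste the 4x4 grid of NxN blocks into
                |         # one 4N x 4N matrix
                |         return np.block([[blocks[i, j]
                |                           for j in range(4)]
                |                          for i in range(4)])
                | 
                |     # block-wise product:
                |     # C[i,j] = sum_k A[i,k] @ B[k,j]
                |     Cb = np.einsum("ikab,kjbc->ijac", Ab, Bb)
                | 
                |     A, B = assemble(Ab), assemble(Bb)
                |     assert np.allclose(A @ B, assemble(Cb))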
        
       | _the_inflator wrote:
        | I just took a quick look at it. A good summary.
        | 
        | It seems these topics are covered in the first one or two
        | semesters of a Math degree. Of course, university is a bit
        | more advanced.
        
       | thatsadude wrote:
        | vec(ABC)=kron(C.T,A)vec(C) is all you need for matrix
        | calculus!
        
         | esafak wrote:
         | Can anyone provide an intuitive explanation?
        
           | hayasaki wrote:
            | They have an error in their formula, but the vectorized
            | form (stacking the columns of a matrix into a vector)
            | of the triple matrix product (A times B times C) can be
            | rewritten in a form involving a Kronecker product
            | applied to another vectorized matrix.
            | 
            | I wouldn't say that is everything, but it is a useful
            | trick.
        
             | esafak wrote:
             | That is just reading out the equation in English. My
             | question is, why is it so?
        
               | hayasaki wrote:
                | The correct version you can find here:
                | https://en.wikipedia.org/wiki/Kronecker_product#Matrix_equat...
                | 
                | The answer to why it is so is pretty trivial (just
                | do the indexing for each element) if you know the
                | definition of the Kronecker product and what the
                | 'vec' operation is.
                | 
                | For an intuitive explanation, think through how the
                | matrix multiplication works element by element and
                | consider how the Kronecker product pattern applies
                | to the vectorized matrix.
               | 
                | This honestly isn't a super interesting result,
                | and I would say the original commenter was
                | overstating its importance in matrix calculus. It
                | really is more useful for solving certain matrix
                | problems, or speeding up some tensor product
                | calculations when things have a certain structure.
                | For example, if we have a discretization of a PDE,
                | then depending on the representation, the operator
                | in the discrete space may be a sum of Kronecker
                | products, so applying those can be fast using a
                | matrix multiply, never storing the Kroneckered
                | matrices.
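                | 
                | A minimal numerical check of the corrected
                | identity vec(ABC) = kron(C.T, A) vec(B), where
                | vec stacks columns (hence Fortran order below):
                | 
                |     import numpy as np
                | 
                |     rng = np.random.default_rng(3)
                |     A = rng.standard_normal((2, 3))
                |     B = rng.standard_normal((3, 4))
                |     C = rng.standard_normal((4, 5))
                | 
                |     def vec(M):
                |         # stack the columns of M into one vector
                |         return M.reshape(-1, order="F")
                | 
                |     lhs = vec(A @ B @ C)
                |     rhs = np.kron(C.T, A) @ vec(B)
                |     assert np.allclose(lhs, rhs)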
        
       | scrubs wrote:
       | Darn good post!
        
       | bluerooibos wrote:
       | Oh nice, I did most of this in school, and during my non-CS
       | engineering degree. Thanks for sharing!
       | 
        | Always wanted to dip my toes into ML, but I've never been
        | convinced of its usefulness to the average solo developer,
        | in terms of things you can build with this new knowledge.
        | Likely I don't know enough about it to make that call,
        | though.
        
         | williamcotton wrote:
         | Here's an ML project I've been working on as a solo dev:
         | 
         | https://github.com/williamcotton/chordviz
         | 
          | Labeling software in React, a CNN in PyTorch, and
          | prediction in a SwiftUI app. 12,000 (and counting)
          | hand-labeled images of my hand on a guitar fretboard!
        
       | nsajko wrote:
       | Another matrix math reference:
       | https://github.com/r-barnes/MatrixForensics
        
       | dang wrote:
       | Related:
       | 
       |  _The matrix calculus you need for deep learning (2018)_ -
       | https://news.ycombinator.com/item?id=26676729 - April 2021 (40
       | comments)
       | 
       |  _Matrix calculus for deep learning part 2_ -
       | https://news.ycombinator.com/item?id=23358761 - May 2020 (6
       | comments)
       | 
       |  _Matrix Calculus for Deep Learning_ -
       | https://news.ycombinator.com/item?id=21661545 - Nov 2019 (47
       | comments)
       | 
       |  _The Matrix Calculus You Need for Deep Learning_ -
       | https://news.ycombinator.com/item?id=17422770 - June 2018 (77
       | comments)
       | 
       |  _Matrix Calculus for Deep Learning_ -
       | https://news.ycombinator.com/item?id=16267178 - Jan 2018 (81
       | comments)
        
       | SnooSux wrote:
       | This is the resource I wish I had in 2018. Every grad school
       | course had a Linear Algebra review lecture but never got into the
       | Matrix Calculus I actually needed.
        
         | dpflan wrote:
          | True. This was a designated resource during my studies
          | (2020-2022), though those were post-2018.
        
         | ayhanfuat wrote:
          | That was my struggle, too. Imperial College London has a
          | small online course which covers similar topics
          | (https://www.coursera.org/learn/multivariate-calculus-machine...).
          | It helped a lot.
        
       ___________________________________________________________________
       (page generated 2023-07-31 23:01 UTC)