[HN Gopher] The Big Six Matrix Factorizations
___________________________________________________________________
The Big Six Matrix Factorizations
Author : jjgreen
Score : 212 points
Date : 2022-05-18 12:07 UTC (10 hours ago)
(HTM) web link (nhigham.com)
(TXT) w3m dump (nhigham.com)
| shusaku wrote:
| I love Dr. Higham's blog. In case anyone missed it, the whole
| series is on GitHub: https://github.com/higham/what-is
| howlin wrote:
| Now apply them to the data used to identify the "Big Five"
| personality traits! This is an interesting application of factor
| analysis/matrix factorization:
|
| https://en.wikipedia.org/wiki/Big_Five_personality_traits
| FabHK wrote:
| Yes, the Big Five were found by factor analysis, typically PCA,
| which is basically an eigenvalue (spectral) decomposition of the
| (symmetric positive semi-definite) covariance matrix of the
| data, which coincides with the SVD of the (centred, normalised)
| data matrix.
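|
| In case it helps, a quick numerical check of that equivalence (a
| NumPy sketch on made-up data; the 200x5 "data matrix" is purely
| illustrative):
|
|     import numpy as np
|
|     rng = np.random.default_rng(0)
|     X = rng.standard_normal((200, 5))   # 200 samples, 5 "traits"
|     Xc = X - X.mean(axis=0)             # centre the data
|
|     # eigendecomposition of the covariance matrix ...
|     C = Xc.T @ Xc / (len(Xc) - 1)
|     evals, evecs = np.linalg.eigh(C)
|
|     # ... versus the SVD of the centred data matrix
|     U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
|
|     # squared singular values / (n-1) match the eigenvalues
|     assert np.allclose(sorted(s**2 / (len(Xc) - 1)), sorted(evals))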
|
| The SVD is everywhere. IIRC, the first Netflix Prize was won
| with the SVD:
|
| https://en.wikipedia.org/wiki/Netflix_Prize
|
| https://cseweb.ucsd.edu/classes/fa17/cse291-b/reading/Progre...
| Royi wrote:
| Where's the data? It could be indeed interesting.
| jetbooster wrote:
| And then sell it to the Big Four!
|
| https://en.m.wikipedia.org/wiki/Big_Four_accounting_firms
| rhyn00 wrote:
| Spectral decomposition is pretty cool.
|
| Application 1 - spectral clustering - an alternative to k-means
| for nonlinear clusters. Get a distance (or affinity) matrix of
| your data, do a spectral decomposition, run k-means on your top
| k eigenvectors, and those are your clusters.
|
| Application 2 - graph clustering (run spectral clustering on the
| adjacency matrix!)
|
| There are some tricks to getting it to work in practice, like
| normalization, but it's a simple and powerful method. Also, the
| matrices can get big, so it helps a lot to use sparse matrix
| libraries for the computations.
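|
| For the curious, a rough sketch of the normalised variant in
| Python (assuming SciPy and scikit-learn; real code would also
| handle isolated nodes, tolerances, etc.):
|
|     import numpy as np
|     from scipy.sparse.linalg import eigsh
|     from sklearn.cluster import KMeans
|
|     def spectral_clusters(A, k):
|         """A: symmetric affinity/adjacency matrix, k: number of clusters."""
|         d = A.sum(axis=1)                       # node degrees (assumed nonzero)
|         D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
|         # symmetrically normalised Laplacian L = I - D^{-1/2} A D^{-1/2}
|         L = np.eye(len(A)) - D_inv_sqrt @ A @ D_inv_sqrt
|         vals, vecs = eigsh(L, k=k, which='SM')  # k smallest eigenpairs
|         rows = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
|         return KMeans(n_clusters=k, n_init=10).fit_predict(rows)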
|
| [1] https://towardsdatascience.com/spectral-clustering-
| aba2640c0....
|
| [2] https://www.hindawi.com/journals/ddns/2020/4540302/
| arbitrandomuser wrote:
| I've never given a second thought about what the etymology of
| "spectral" in spectral decomposition is. Somewhere in the back
| of my mind (and I guess many students of physics have the same
| notion) subconsciously i assumed it originates from eigenvalues
| of the Hamiltonian determining the atomic spectral lines . But
| I've never followed up on it and actually looked it up .
| the_svd_doctor wrote:
| I might be wrong about the exact historical reason.
|
| But the way I see it, the "spectral decomposition of A" is a way
| to express A as a sum of orthogonal, rank-1 operators: A =
| \sum_i l_i u_i u_i^T. The l_i are the eigenvalues; the u_i are
| the eigenvectors.
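|
| A concrete check of that identity in NumPy (a throwaway 4x4
| example):
|
|     import numpy as np
|
|     rng = np.random.default_rng(1)
|     B = rng.standard_normal((4, 4))
|     A = B + B.T                        # a symmetric matrix
|
|     l, U = np.linalg.eigh(A)           # eigenvalues l_i, eigenvectors u_i
|     A_rebuilt = sum(l[i] * np.outer(U[:, i], U[:, i]) for i in range(4))
|     assert np.allclose(A, A_rebuilt)   # A = sum_i l_i u_i u_i^T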
|
| The eigenvectors look a whole lot like the "modes" in a
| Fourier decomposition. And if you plot (i, l_i), the
| eigenvalues are a bit like the "spectrum" (the amplitude of
| each mode).
|
| In fact, the complex exponentials (the modes in the Fourier
| decomposition) are also eigenfunctions of a specific operator
| (the Laplacian).
|
| Math people are good at finding connections between things.
| [deleted]
| throwawaymaths wrote:
| According to the almighty Wikipedia, the connection is correct
| but turned out to be an accident. David Hilbert, who coined the
| term "spectral theory", was surprised when it was later found to
| be applicable to quantum mechanical spectra.
| bloqs wrote:
| I love that on this website I can find comments that I simply
| don't understand a word of. Good on you for doing the stuff you
| do sir.
| jdeaton wrote:
| This is a fantastically clear outline of this topic. Thank you!
|
| > The terms "factorization" and "decomposition" are synonymous
| and it is a matter of convention which is used. Our list
| comprises three factorizations and three decompositions.
|
| I can't tell if this is a joke: right after saying that these two
| words mean the same thing in this context, they are then used to
| categorize the methods.
|
| Edit: This is the kind of content that proves the "dead internet
| theory" is wrong.
| FabHK wrote:
| It's not a categorisation, but some trivia regarding the name.
| For example: By convention, the SVD is called the SVD (singular
| value decomposition), not the SVF (singular value
| factorisation). That would be synonymous, but that's not what
| it's called. Similarly for the other 5 methods.
| tomnipotent wrote:
| > I can't tell if this is a joke
|
| It makes sense; the author is just using the original name of
| each method to bucket them, e.g. spectral decomposition.
| yuppiemephisto wrote:
| The words are used synonymously in practice, but I think
| there's an opportunity to distinguish them usefully.
|
| Suggestion:
|
| - 'factoring' is a multiplicative breakdown, a 'composition',
| like prime factorization.
|
| - 'decomposition' could also be called 'partition', and is an
| additive breakdown, like how 3 can be split into 2 + 1.
| isaacfrond wrote:
| The article presents a note on the 6 well-known matrix
| decompositions. He states that all of them have cubic
| complexity, but practical algorithms with better exponents exist
| for all of them.
| oh_my_goodness wrote:
| Could you link a practical algorithm with an exponent lower
| than 3? (I think of these things
| https://en.wikipedia.org/wiki/Computational_complexity_of_ma...
| as not being practical, but I'd love to be wrong. )
| daniel-cussen wrote:
| https://www.fgemm.com, coming soon.
| gnufx wrote:
| For something Strassen-ish you could look at
| https://jianyuhuang.com/papers/sc16.pdf and the GPU
| implementation https://apps.cs.utexas.edu/apps/sites/default/
| files/tech_rep...
| martinhath wrote:
| The fact that they're all cubic isn't really the notable part
| of the runtime of computing the different decompositions,
| because the constants involved are really different. In
| practice, a common reason for computing many of these
| decompositions is to solve a linear system `Ax=b`, because with
| the decomposition in hand it is really easy to solve the whole
| system (using e.g. backsubstitution). For instance, with C++s
| Eigen, look at the 100x100 column of [1], and we can see that
| there's orders of magnitude difference between the fast and
| slow approaches. THey're all still cubic, sure, but we're
| talking 168x difference here.
|
| (of course, it's not so clear cut, since robustness varies, not
| all methods are applicable, and the benchmark is for solving
| the system, not computing the decomposition, but overall,
| knowledge of which decomposition is fast and which is not is
| absolutely crucial to practitioners)
|
| [1]:
| https://eigen.tuxfamily.org/dox/group__DenseDecompositionBen...
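|
| Not Eigen, but the same "factor once, then every solve is cheap"
| pattern sketched with SciPy (sizes and variable names are made
| up for illustration):
|
|     import numpy as np
|     from scipy.linalg import lu_factor, lu_solve, cho_factor, cho_solve
|
|     n = 100
|     rng = np.random.default_rng(0)
|     A = rng.standard_normal((n, n))
|     A_spd = A @ A.T + n * np.eye(n)   # symmetric positive definite variant
|     b = rng.standard_normal(n)
|
|     # Factor once at O(n^3); every subsequent solve is O(n^2) substitution.
|     lu, piv = lu_factor(A)
|     x = lu_solve((lu, piv), b)
|
|     # For SPD matrices, Cholesky costs roughly half the flops of LU.
|     c, low = cho_factor(A_spd)
|     x_spd = cho_solve((c, low), b)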
| adgjlsfhk1 wrote:
| They're all basically O(M(n)), where M(n) is your matrix
| multiplication time. Even though M(n) = O(n^2.37...), it's
| reasonable to say that it's n^3, because in practice no one
| uses the sub-cubic algorithms. Strassen is possibly workable,
| but it isn't widely used, and all of the sub-cubic algorithms
| have accuracy tradeoffs.
| photon-torpedo wrote:
| > but practical algorithms with better exponents exist for all
| of them.
|
| I'm aware of randomized algorithms with better complexity,
| which come at the cost of only giving approximate results
| (though the approximation may be perfectly good for practical
| purposes). See e.g. [1]. Are there other approaches?
|
| [1] https://doi.org/10.1137/090771806
| owlbite wrote:
| If the data is sparse (which is not uncommon for large
| matrices in the real world), you can exploit the sparsity to
| do significantly better than O(n^3).
| akomtu wrote:
| Imo, a better way to present this is to draw a diagram where
| various decompositions are connected by arrows that explain how
| one decomposition can be turned into another.
| jdeaton wrote:
| That is fascinating. Do you have an example of this which you
| can point to?
| swframe2 wrote:
| See this playlist on semidefinite programming:
| https://www.youtube.com/playlist?list=PLqwozWPBo-FtOsjrla4jk...
| for the surprising way factorization can be used to partially
| solve an NP-hard problem.
| the_svd_doctor wrote:
| Higham is an outstanding researcher. One book of his I really
| enjoyed is
| https://epubs.siam.org/doi/book/10.1137/1.9780898717778. He has
| many, many other excellent ones.
| whatever1 wrote:
| Boo LU!!! Go with stable QR!
| the_svd_doctor wrote:
| But LU, when it works, takes fewer flops (only half, IIRC), and
| it doesn't require square roots. I think that's why it used to
| be preferred when compute was a very scarce resource. Clearly
| that's not really the case anymore today.
|
| But otherwise I agree: QR is better on almost every front.
| heinrichhartman wrote:
| From a theoretical perspective the most fundamental decomposition
| is the rank-decomposition:
|
| A = X D Y
|
| with X, Y invertible and D diagonal. It's called the rank
| decomposition because you can read off the rank of A by counting
| the non-zero entries of D. It's also useful for determining
| bases for the image and the kernel of A. Every math student
| learns a version of this in their first course on linear
| algebra.
|
| Curiously, the rank decomposition is not covered in the
| numerical literature. Also, it's not possible to derive a rank
| decomposition from the LU (LR) or QR decompositions, although
| the underlying algorithms (Gaussian elimination, Gram-Schmidt)
| could be used to do so.
|
| It took me multiple weeks of work to understand what the problem
| with these algorithms is, and what the practical options for
| establishing a rank decomposition are. Full details are
| available on my blog:
|
| https://www.heinrichhartmann.com/posts/2021-03-08-rank-decom...
|
| TL;DR: Your options are (1) SVD, (2) QR factorization with
| column pivoting (part of LAPACK), (3) LDU factorization with
| total pivoting (not implemented in BLAS/LAPACK/etc.).
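|
| For illustration, a small NumPy/SciPy sketch of options (1) and
| (2) on a toy matrix; the rank tolerance used here is just one
| common convention, not something prescribed above:
|
|     import numpy as np
|     from scipy.linalg import qr
|
|     A = np.array([[1., 2., 3.],
|                   [2., 4., 6.],
|                   [1., 0., 1.]])   # rank 2 by construction
|
|     # Option (2): QR with column pivoting; count "large" diagonal entries of R.
|     Q, R, piv = qr(A, pivoting=True)
|     tol = np.finfo(float).eps * max(A.shape) * abs(R[0, 0])
|     rank_qr = int((np.abs(np.diag(R)) > tol).sum())
|
|     # Option (1): SVD; count singular values above a tolerance.
|     s = np.linalg.svd(A, compute_uv=False)
|     rank_svd = int((s > np.finfo(float).eps * max(A.shape) * s[0]).sum())
|
|     assert rank_qr == rank_svd == 2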
| whatshisface wrote:
| What makes it the most fundamental?
| edflsafoiewq wrote:
| It's basically the matrix form of dim(domain(f)) =
| dim(ker(f)) + dim(im(f)), or domain(f)/ker(f) ≅ im(f), with
| f inducing the isomorphism.
| whatshisface wrote:
| But the image and kernel are not the only two properties of
| a matrix.
| heinrichhartman wrote:
| Sure, but from an algebraic (categorical) perspective, taking
| kernels, co-kernels and images is about the most fundamental
| set of operations you can think of.
|
| The question is: How do you compute them in an effective
| (avoiding full SVD) and stable way?
| heinrichhartman wrote:
| As it turns out, the author does cover the rank-decomposition
| (under the name "Rank-Revealing Factorization") in his blog as
| well:
|
| https://nhigham.com/2021/05/19/what-is-a-rank-revealing-fact...
| FabHK wrote:
| Higham there adds the condition that X, Y be well-conditioned
| (as they are in the SVD, for example). That's a problem with
| the Jordan decomposition A = X J X^-1: the X is non-singular
| in theory, but can be ill-conditioned, and thus the Jordan
| decomposition is not stable and not really used in practice
| (as highlighted in the original article).
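|
| A tiny NumPy illustration of that ill-conditioning, using a
| made-up nearly defective matrix:
|
|     import numpy as np
|
|     eps = 1e-8
|     A = np.array([[1.0, 1.0],
|                   [0.0, 1.0 + eps]])  # close to a 2x2 Jordan block
|
|     w, X = np.linalg.eig(A)            # A = X diag(w) X^{-1}
|     print(np.linalg.cond(X))           # huge (~1e8): X is nearly singular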
| joppy wrote:
| This decomposition is not used much because it is not unique in
| any sense, it is not numerically stable, and it is fairly
| uninteresting: the matrix D can always be taken to consist of a
| diagonal of 1s of length rank(A) and the rest zeros. It is much
| more interesting once X and Y are constrained to something like
| unitary matrices (in which case we get SVD).
|
| This "rank decomposition" is a bit interesting algebraically
| rather than numerically, since then the numerical stability
| problems disappear. Also, if we take all matrices to be over
| the integers (and require that "invertible" means the inverse
| is also an integer matrix), then the rank decomposition is a
| lot like the Smith normal form of a matrix.
| heinrichhartman wrote:
| It's not unique, but it can be implemented in a way that is
| stable. E.g. the SVD does this (but is quite overkill for the
| requirements).
|
| It's highly relevant from an algebraic perspective, hence it's
| curious that it's not covered (at all) in the numerical
| literature.
| bigbillheck wrote:
| Numerical linear algebra is very different from linear
| algebra.
| whatshisface wrote:
| I could be misunderstanding, but I think you'd get all the
| algebraic content out of it by computing the dimensions of
| the image and kernel, then working with them from there on.
| Why would you want to have a matrix decomposition that
| separated them?
|
| Granted, computing the dimension of the kernel is not so
| easy, especially because a pair of vectors can be
| arbitrarily close without being linearly dependent. No
| wonder there is no stable way to do it: it technically
| exceeds the capability of finite-precision numbers.
| Multiplying a vector of unequal components by _most_
| numbers, especially those with non-terminating base-two
| representations, will produce a vector that is linearly
| independent of the original when rounded back to finite
| precision.
|
| Clearly then, linear independence on computers has to be
| considered in the continuous sense in which singular values
| reveal it, where a very small singular value represents an
| "almost-kernel," which is the closest thing to a kernel you
| are likely to find outside of carefully constructed
| examples or integer matrices.
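|
| To make that concrete, a throwaway NumPy example:
|
|     import numpy as np
|
|     u = np.array([1.0, 2.0, 3.0])
|     A = np.vstack([u, 2.0 * u + 1e-12])  # rows exactly independent,
|                                          # but only just
|
|     s = np.linalg.svd(A, compute_uv=False)
|     print(s)  # one O(1) singular value and one tiny one; the tiny
|               # one flags the "almost-kernel" (numerical rank 1)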
| magnio wrote:
| Isn't that just Gaussian elimination?
| leephillips wrote:
| Thank you for this wonderfully concise summary: it's convenient
| to have all this in one compact document.
|
| I suppose "flops" means "floating-point operations" here?
| Heretofore I've always encountered this as an abbreviation for
| "floating-point operations per second".
| mvcalder wrote:
| When I used Matlab as an undergrad in the late 80's it used to
| report the flop count after each command, and those were
| floating point operations, no time units involved. I tried to
| find an image of the old command line but found this historical
| note first [1] and thought it would be of interest to readers
| of the article.
|
| [1]
| http://www.stat.uchicago.edu/~lekheng/courses/309f14/flops/v...
| and
| [deleted]
| the_svd_doctor wrote:
| It's always been very ambiguous in practice. To avoid ambiguity
| I usually use `flop/s` but not everyone likes that :)
| MatteoFrigo wrote:
| Speaking of SVD doctors, I heard many years ago (from Alan
| Edelman) that Gene Golub's license plate used to be "DR SVD".
| Later he switched to "PROF SVD".
|
| I couldn't confirm the DR SVD part, but the PROF SVD story
| appears to be real:
| https://www.mathworks.com/company/newsletters/articles/profe...
| the_svd_doctor wrote:
| I heard something very similar from some of his very close
| (ex-)colleagues, so it appears to be true :)
| arinlen wrote:
| > To avoid ambiguity I usually use `flop/s` but not everyone
| likes that :)
|
| Flop/s makes no sense and is outright wrong. The whole point
| of flops is to express how many floating point operations are
| required by an algorithm. How many operations are performed
| per second is a property of the hardware you're using to run
| an implementation of the algorithm.
|
| You want to express computational complexity in terms of
| floating point operations.
| the_svd_doctor wrote:
| Well, yeah, I use it in the context of floating-point
| operations per second. That's the most common use in my
| field. I was replying to the parent comment. No need to use
| this kind of tone.
| zodiac wrote:
| Maybe the_svd_doctor uses flop/s to describe hardware, like
| you said
| gnufx wrote:
| The number of operations per second varies by at least an
| order of magnitude on the same hardware depending on the
| GEMM algorithm you use (reference or Goto), and you quote
| the performance of the implementation in terms of FLOP/s
| knowing the number of FLOPs required by the computation.
| That makes sense to people who implement and measure these
| things.
| hanche wrote:
| Indeed, the meaning of "flops" is ambiguous, but that seems
| hard to avoid. Fortunately, the ambiguity is easily resolved
| from context in most cases, as it is here.
| melling wrote:
| Any suggestions on what to learn in Linear Algebra after Gilbert
| Strang's 18.06SC?
|
| https://ocw.mit.edu/courses/18-06sc-linear-algebra-fall-2011...
|
| My goal is to learn the math behind machine learning.
| isaacfrond wrote:
| Here are some links from my favourite reference website:
|
| Book: Mathematics for Machine Learning (mml-book.github.io)
| https://news.ycombinator.com/item?id=16750789
|
| Mathematics for Machine Learning [pdf] (mml-book.com)
| https://news.ycombinator.com/item?id=21293132
|
| Mathematics of Machine Learning (2016) (ibm.com)
| https://news.ycombinator.com/item?id=15146746
| [deleted]
| time_to_smile wrote:
| The vast majority of the mathematics required to really
| understand ML is just probability, calculus and basic linear
| algebra. If you know these already and still struggle, it's
| because the notation is often terse and assumes a specific
| context. Regarding this, the only answer is to keep reading key
| papers and work through the math, ideally in code as well.
|
| For most current-gen deep learning there's not even that much
| math; it's mostly a growing library of what are basically
| engineering tricks. For example, an LSTM is almost exclusively
| very basic linear algebra with some calculus used to optimize
| it. But by its nature the calculus can't be done by hand, and
| the actual implementation of all that basic linear algebra is
| tricky and takes practice.
|
| You'll learn more by implementing things from scratch based on
| the math than you will trying to read through all the
| background material hoping that one day it will all make sense.
| It only ever makes sense when implementing and by continuous
| reading/practice.
| natly wrote:
| These lectures are fantastic after you've mastered the basics
| of linear algebra https://www.youtube.com/watch?v=McLq1hEq3UY
| (convex optimization, by a very experienced and often funny
| lecturer)
| nightski wrote:
| Stephen Boyd is a very good lecturer! I watched his videos on
| linear dynamical systems almost a decade ago and thought he
| did a fantastic job. Would highly recommend.
| sjburt wrote:
| The problem set for the Stanford linear dynamical systems
| course that he taught is also fantastic. The level of
| difficulty and the breadth of applications leave you with so
| many tools.
| screye wrote:
| I am a big fan of Murphy's ML book [1].
|
| [1] https://smile.amazon.com/Probabilistic-Machine-Learning-
| Intr...
|
| It covers almost all the math you'd need to start doing ML
| research. I find it to be the ideal 'one book to rule them all'
| book for CS people in ML, although pure math grads in ML might
| find that the book doesn't go deep enough.
| ivad wrote:
| You might enjoy these notes:
| http://mlg.eng.cam.ac.uk/teaching/4f13/2122/ They give (I
| think) a good general overview, while also going a little bit
| more in-depth in a few areas (e.g., Gaussian Processes).
| swframe2 wrote:
| The math used in ML papers is very diverse. There is too much
| to learn. It is easier if you pick a problem and learn the math
| for it. Find a group that is already working on that problem
| and ask them for the best way to learn the math for it.
| DavidSJ wrote:
| A strong foundation in linear algebra, multivariable
| calculus, probability, and statistics is going to be
| generally applicable almost no matter what problem you work
| on.
| DavidSJ wrote:
| If you want a beautiful abstract perspective on linear algebra
| to complement Strang's more down-to-earth, matrix- and linear
| equation-oriented lectures, pick up Axler's _Linear Algebra
| Done Right_.
| natly wrote:
| He also released a lecture series recently (Axler himself!)
| which barely anyone seems to be talking about:
| https://www.youtube.com/playlist?list=PLGAnmvB9m7zOBVCZBUUmS...
| ngmc wrote:
| You could take Strang's follow-up course on learning from data.
|
| https://ocw.mit.edu/courses/18-065-matrix-methods-in-data-an...
| jstx1 wrote:
| I don't know if 18.065 was worth the time; it felt like a
| repeat of 18.06 with hardly anything new (and not nearly
| enough context around actual applications).
| mgaunard wrote:
| Missing the LDLT decomposition.
| the_svd_doctor wrote:
| It can't always be computed accurately; I suspect that's why
| it's not mentioned.
|
| Bunch-Kaufman is the "right" factorization for indefinite
| Hermitian/symmetric matrices, but it's not as well known.
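|
| For what it's worth, SciPy exposes this via scipy.linalg.ldl,
| which wraps LAPACK's sytrf (Bunch-Kaufman-style pivoting); a
| minimal sketch:
|
|     import numpy as np
|     from scipy.linalg import ldl
|
|     rng = np.random.default_rng(0)
|     B = rng.standard_normal((4, 4))
|     A = B + B.T                 # symmetric, generally indefinite
|
|     L, D, perm = ldl(A)         # D is block diagonal (1x1 and 2x2 blocks)
|     assert np.allclose(L @ D @ L.T, A)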
___________________________________________________________________
(page generated 2022-05-18 23:00 UTC)