[HN Gopher] New Bounds for Matrix Multiplication: From Alpha to ...
___________________________________________________________________
New Bounds for Matrix Multiplication: From Alpha to Omega
Author : beefman
Score : 45 points
Date : 2024-03-07 16:26 UTC (6 hours ago)
(HTM) web link (epubs.siam.org)
(TXT) w3m dump (epubs.siam.org)
| barbarr wrote:
| I love these types of improvements - I figure they don't _really_
| have a practical application (except for shaving a few
| milliseconds off ultra-large matrices), but they're a testament
| to human ingenuity and the desire to strive for perfection no
| matter how far out of reach.
| UncleOxidant wrote:
| There's a hell of a lot of matrix multiplications going on in
| AI/ML. Shaving off a few milliseconds for ultra large matrices
| could save power and money for the big players like Google,
| Microsoft/OpenAI and Meta.
| amanda99 wrote:
| Those are by and large many small matrices being multiplied;
| these results have nothing to do with that.
| amanda99 wrote:
| No, these aren't practical algorithms. No one uses them in
| reality.
|
| The hope is that these small improvements will lead to new
| insights that will then lead to big improvements. I doubt the way
| matrix multiplication is actually done will really change because
| of these theoretical results, unless something really
| groundbreaking and shocking is discovered.
|
| It's all relatively pretty in the grand scheme of things.
| hinkley wrote:
| I think the last thing I saw on matrices was looking toward
| optimizing slightly larger sub-problems.
|
| That smells a little bit like a loop unrolling and/or
| locality improvement to me. You can often beat a theoretical
| algorithmic improvement with a practical one in such
| situations. And if you do a little bit of both you hopefully
| end up with something that's faster than the current most
| practical implementation.
| jcranmer wrote:
| Strassen's algorithm is rarely used: its primary use, to my
| understanding, is in matrix algorithms over more exotic fields
| than the reals or complex numbers, where minimizing multiplies is
| extremely useful. Even then, Strassen only starts to beat out
| naive matrix multiplication when you're looking at n in the
| thousands--and
| at that size, you're probably starting to think about using
| sparse matrices where your implementation strategy is
| completely different. But for a regular dgemm or sgemm, it's
| not commonly used in BLAS implementations.
|
| Algorithms more advanced than Strassen's are even worse in
| terms of the cutover point, and are never seriously considered.
| taeric wrote:
| How big are the matrices in some modern training pipelines?
| We always talk of absurdly large parameter spaces.
| jedbrown wrote:
| The cross-over can be around 500
| (https://doi.org/10.1109/SC.2016.58) for 2-level Strassen.
| It's not used by regular BLAS because it is less numerically
| stable (a concern that becomes more severe for the fancier
| fast MM algorithms). Whether or not the matrix can be
| compressed (as sparse, fast transforms, or data-sparse such
| as the various hierarchical low-rank representations) is more
| a statement about the problem domain, though it's true a
| sizable portion of applications that produce large matrices
| are producing matrices that are amenable to data-sparse
| representations.
| hinkley wrote:
| I have this feeling that the lower bound for matrix
| multiplication is heading towards some transcendental number that
| we don't know about, or a function of one we do.
|
| I wonder if finding the answer to that will clue someone in to
| what the optimal solution is, by pointing to some hidden
| structure to the problem that we have all missed.
| staplung wrote:
| Maybe instead they'll be like Feigenbaum constants: apparently
| universal constants that we don't really understand. For
| instance, we don't even know for sure that Feigenbaum constants
| are irrational, though mathematicians who study them strongly
| suspect they're transcendental.
|
| https://en.wikipedia.org/wiki/Feigenbaum_constants
| pillusmany wrote:
| Tensor cores massively accelerate matrix multiplication with no
| algorithmic breakthrough, just by being smart about how you
| move/cache/reuse values - operations that are treated as O(1) in
| the usual analysis.
|
| There should be an O notation that takes everything into account
| - the various kinds of memory latency, the speed of logic
| operations, and so on. Obviously we do already have wall-clock
| time and total energy used as measures.
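| As a rough illustration of that reuse point (a hypothetical
| sketch in plain Python, not anything a tensor core actually
| runs): a blocked/tiled multiply does exactly the same O(n^3)
| arithmetic, but reuses each loaded value many times, which is
| the cost that plain O-notation ignores.
|
|   def blocked_matmul(A, B, n, tile=32):
|       # A, B are n x n matrices as lists of lists; returns C = A*B.
|       C = [[0.0] * n for _ in range(n)]
|       for i0 in range(0, n, tile):
|           for k0 in range(0, n, tile):
|               for j0 in range(0, n, tile):
|                   # Work one tile at a time so the submatrices stay
|                   # hot in cache and each loaded value is reused
|                   # roughly `tile` times.
|                   for i in range(i0, min(i0 + tile, n)):
|                       for k in range(k0, min(k0 + tile, n)):
|                           a_ik = A[i][k]
|                           for j in range(j0, min(j0 + tile, n)):
|                               C[i][j] += a_ik * B[k][j]
|       return C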
| _kulang wrote:
| Isn't O notation asymptotic and a truncation? Adding an
| additional O(n) of time to an O(n^2) algorithm still results in
| an O(n^2) algorithm, as far as I understand anyway. But you could
| certainly write a more descriptive expression for the running
| time based on what you know about the operations.
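| For instance (a small worked check, not from the thread): if
| T(n) = c1*n^2 + c2*n with positive constants c1 and c2, then
| T(n) <= (c1 + c2)*n^2 for every n >= 1, so the extra O(n) term
| disappears into the constant factor and T(n) is still O(n^2).
| A more descriptive expression would simply keep c1 and c2 around
| instead of throwing them away.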
| AdamH12113 wrote:
| Tangential question: How can an algorithm end up with a
| fractional exponent? It's clear to me how nested loops can give
| O(n^2) or O(n^3), and how a binary search can give O(log n), but
| how on earth can you get something like O(n^2.371552), or even
| O(n^2.7)?
| vman512 wrote:
| It's pretty easy to get to O(N^(3/2)) at least, since that is
| sqrt(N)^3
|
| Imagine an algorithm that re-arranges a list into a sqrt(N) by
| sqrt(N) grid, and does O(num_columns^2) work for each row.
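| A minimal sketch of that shape (hypothetical Python; the function
| name and the per-pair "work" are made up, it only counts
| operations):
|
|   import math
|
|   def grid_work(items):
|       n = len(items)
|       side = math.isqrt(n)      # assume n is a perfect square
|       rows = [items[r * side:(r + 1) * side] for r in range(side)]
|       ops = 0
|       for row in rows:          # sqrt(N) rows ...
|           for x in row:         # ... each doing sqrt(N) * sqrt(N)
|               for y in row:     #     = O(num_columns^2) steps
|                   ops += 1      # placeholder "work" per pair
|       return ops                # side**3, i.e. about N^(3/2)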
| jcranmer wrote:
| The classic example here is Strassen's algorithm. You divide
| your two input matrices into quarters, and then you need to do
| 7 matrix multiplies of the smaller (n/2) matrices. When you add
| up the total work done, this ends up needing to do n^(log base
| 2 of 7) work, or about n^2.8.
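| A hedged sketch of just that counting argument (hypothetical
| Python, no actual matrix arithmetic): the number of scalar
| multiplies satisfies T(n) = 7*T(n/2) with T(1) = 1, and the
| O(n^2) additions per level don't change the exponent.
|
|   import math
|
|   def strassen_multiply_count(n):    # n a power of two
|       if n == 1:
|           return 1
|       return 7 * strassen_multiply_count(n // 2)
|
|   n = 1024
|   print(strassen_multiply_count(n))  # 282475249 = 7^10
|   print(n ** math.log2(7))           # ~2.82e8, i.e. n^(log2 7) ~ n^2.807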
| csjh wrote:
| Many divide and conquer algorithms are expressed in the form
| O(n^logb(a)) because of the master theorem
| https://en.m.wikipedia.org/wiki/Master_theorem_(analysis_of_...
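| As a worked example with the Strassen numbers from the sibling
| comment: a = 7 subproblems, each of size n/b = n/2, with O(n^2)
| work to combine, so
|
|   T(n) = 7*T(n/2) + O(n^2),  log_b(a) = log_2(7) ~ 2.807 > 2,
|
| which is the recursion-dominated case of the master theorem and
| gives T(n) = Theta(n^(log_2 7)) ~ Theta(n^2.81).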
| kzrdude wrote:
| I think a simpler algorithm like
| https://en.wikipedia.org/wiki/Karatsuba_algorithm demonstrates
| how it can happen.
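| A minimal Karatsuba sketch (hypothetical Python on nonnegative
| ints): three half-size multiplies instead of four gives the
| recurrence T(n) = 3*T(n/2) + O(n), which solves to
| O(n^(log_2 3)) ~ O(n^1.585) digit operations.
|
|   def karatsuba(x, y):
|       if x < 10 or y < 10:
|           return x * y
|       m = max(x.bit_length(), y.bit_length()) // 2
|       hi_x, lo_x = x >> m, x & ((1 << m) - 1)  # x = hi_x*2^m + lo_x
|       hi_y, lo_y = y >> m, y & ((1 << m) - 1)
|       z2 = karatsuba(hi_x, hi_y)
|       z0 = karatsuba(lo_x, lo_y)
|       z1 = karatsuba(hi_x + lo_x, hi_y + lo_y) - z2 - z0
|       return (z2 << (2 * m)) + (z1 << m) + z0
|
|   assert karatsuba(123456789, 987654321) == 123456789 * 987654321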
___________________________________________________________________
(page generated 2024-03-07 23:00 UTC)