[HN Gopher] New Bounds for Matrix Multiplication: From Alpha to ...
       ___________________________________________________________________
        
       New Bounds for Matrix Multiplication: From Alpha to Omega
        
       Author : beefman
       Score  : 45 points
       Date   : 2024-03-07 16:26 UTC (6 hours ago)
        
 (HTM) web link (epubs.siam.org)
 (TXT) w3m dump (epubs.siam.org)
        
       | barbarr wrote:
       | I love these types of improvements - I figure they don't _really_
       | have a practical application (except for shaving a few
       | milliseconds off ultra large matrices), but they're a testament
       | to human ingenuity and the desire to strive for perfection no
       | matter how far out of reach.
        
         | UncleOxidant wrote:
         | There's a hell of a lot of matrix multiplications going on in
         | AI/ML. Shaving off a few milliseconds for ultra large matrices
         | could save power and money for the big players like Google,
         | Microsoft/OpenAI and Meta.
        
           | amanda99 wrote:
           | Those are by and large many small matrices being multiplied;
           | these results have nothing to do with that.
        
         | amanda99 wrote:
         | No, these aren't practical algorithms. No one uses them in
         | reality.
         | 
         | The hope is that these small improvements will lead to new
         | insights, which in turn lead to big improvements. I doubt the
         | way matrix multiplication is actually done will change because
         | of these theoretical results, unless something really
         | groundbreaking and shocking is discovered.
         | 
         | It's all relatively pretty in the grand scheme of things.
        
           | hinkley wrote:
           | I think the last thing I saw on matrices was looking toward
           | optimizing slightly larger sub-problems.
           | 
           | That smells a little bit like a loop unrolling and/or
           | locality improvement to me (something like the tiling sketch
           | below). You can often beat a theoretical algorithmic
           | improvement with a practical one in such situations, and if
           | you do a little of both you hopefully end up with something
           | faster than the current most practical implementation.
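           | 
           | A minimal tiling sketch in Python, purely for illustration
           | (the block size 64 is a made-up placeholder; the right value
           | depends on the cache):
           | 
           |     # Blocked matrix multiply: same O(n^3) arithmetic as the
           |     # naive triple loop, but a block of B's rows stays hot
           |     # in cache while we stream over all of A.
           |     def matmul_blocked(A, B, n, block=64):
           |         C = [[0.0] * n for _ in range(n)]
           |         for kk in range(0, n, block):
           |             for i in range(n):
           |                 for k in range(kk, min(kk + block, n)):
           |                     a = A[i][k]
           |                     for j in range(n):
           |                         C[i][j] += a * B[k][j]
           |         return C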
        
         | jcranmer wrote:
         | Strassen's algorithm is rarely used: its primary use, to my
         | understanding, is in matrix algorithms of more exotic fields
         | than reals or complex numbers, where minimizing multiplies is
         | extremely useful. Even then, Strassen only starts to beat out
         | naive matrix multiply when you're looking at n >= 1000's--and
         | at that size, you're probably starting to think about using
         | sparse matrices where your implementation strategy is
         | completely different. But for a regular dgemm or sgemm, it's
         | not commonly used in BLAS implementations.
         | 
         | Algorithms more advanced than Strassen's are even worse in
         | terms of the cutover point, and are never seriously considered
         | in practice.
        
           | taeric wrote:
           | How big are the matrices in some modern training pipelines?
           | We always talk of absurdly large parameter spaces.
        
           | jedbrown wrote:
           | The cross-over can be around 500
           | (https://doi.org/10.1109/SC.2016.58) for 2-level Strassen.
           | It's not used by regular BLAS because it is less numerically
           | stable (a concern that becomes more severe for the fancier
           | fast MM algorithms). Whether or not the matrix can be
           | compressed (as sparse, fast transforms, or data-sparse such
           | as the various hierarchical low-rank representations) is more
           | a statement about the problem domain, though it's true a
           | sizable portion of applications that produce large matrices
           | are producing matrices that are amenable to data-sparse
           | representations.
        
       | hinkley wrote:
       | I have this feeling that the lower bound for matrix
       | multiplication is heading towards some transcendental number that
       | we don't know about, or a function of one we do.
       | 
       | I wonder if finding the answer to that will clue someone in to
       | what the optimal solution is, by pointing to some hidden
       | structure to the problem that we have all missed.
        
         | staplung wrote:
         | Maybe instead they'll be like Feigenbaum constants: apparently
         | universal constants that we don't really understand. For
         | instance, we don't even know for sure that Feigenbaum constants
         | are irrational, though mathematicians who study them strongly
         | suspect they're transcendental.
         | 
         | https://en.wikipedia.org/wiki/Feigenbaum_constants
        
       | pillusmany wrote:
       | Tensor cores massively accelerate matrix multiplication with no
       | algorithmic breakthrough, just by being smart about how you
       | move/cache/reuse values: operations which are all considered
       | O(1).
       | 
       | There should be an O notation that takes everything into
       | account: the various kinds of memory latency, the speed of
       | logic operations, and so on. Obviously we do have wall clock
       | time and total energy used.
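       | 
       | A toy sketch of that kind of accounting in Python: a crude
       | roofline-style model where the predicted time is whichever
       | resource saturates first. The hardware numbers are made-up
       | placeholders, not measurements.
       | 
       |     PEAK_FLOPS = 100e12  # 100 TFLOP/s, hypothetical accelerator
       |     PEAK_BYTES = 2e12    # 2 TB/s bandwidth, hypothetical
       | 
       |     def matmul_time(n, bytes_per_elem=4):
       |         flops = 2 * n**3                 # multiply-add per term
       |         traffic = 3 * n**2 * bytes_per_elem  # read A, B; write C
       |         # compute-bound or bandwidth-bound, whichever dominates
       |         return max(flops / PEAK_FLOPS, traffic / PEAK_BYTES)
       | 
       |     for n in (1024, 8192):
       |         print(n, matmul_time(n))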
        
         | _kulang wrote:
         | Isn't O notation asymptotic and a truncation? Adding an
         | additional O(n) of time to an O(n^2) algorithm still gives an
         | O(n^2) algorithm, as far as I understand. But you could
         | certainly write a more descriptive expression for the running
         | time based on what you know about the operations.
        
       | AdamH12113 wrote:
       | Tangential question: How can an algorithm end up with a
       | fractional exponent? It's clear to me how nested loops can give
       | O(n^2) or O(n^3), and how a binary search can give O(log n), but
       | how on earth can you get something like O(n^2.371552), or even
       | O(n^2.7)?
        
         | vman512 wrote:
         | It's pretty easy to get to O(N^(3/2)) at least, since that is
         | just sqrt(N)^3.
         | 
         | Imagine an algorithm that re-arranges a list into a sqrt(N) by
         | sqrt(N) grid and does O(num_columns^2) = O(N) work for each of
         | the sqrt(N) rows: N^(3/2) steps in total.
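         | 
         | A toy Python sketch of that shape (the per-cell work is just a
         | placeholder):
         | 
         |     import math
         | 
         |     # sqrt(N) rows, each doing sqrt(N)^2 = N steps of work,
         |     # so N^(3/2) steps overall.
         |     def grid_work(items):
         |         n = len(items)
         |         side = math.isqrt(n)  # assume n is a perfect square
         |         total = 0
         |         for r in range(side):
         |             for c1 in range(side):
         |                 for c2 in range(side):
         |                     total += items[r*side + c1] * items[r*side + c2]
         |         return total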
        
         | jcranmer wrote:
         | The classic example here is Strassen's algorithm. You divide
         | your two input matrices into quarters, and then you only need
         | to do 7 matrix multiplies of the smaller (n/2) matrices
         | instead of 8. The total work satisfies T(n) = 7*T(n/2) +
         | O(n^2), which solves to n^(log base 2 of 7), or about n^2.8.
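         | 
         | A bare-bones NumPy sketch, assuming n is a power of two and
         | ignoring the padding, cutover, and numerical-stability issues
         | mentioned elsewhere in the thread:
         | 
         |     import numpy as np
         | 
         |     # 7 recursive multiplies instead of 8:
         |     # T(n) = 7*T(n/2) + O(n^2) = O(n^(log2 7)) ~ O(n^2.807)
         |     def strassen(A, B):
         |         n = A.shape[0]
         |         if n == 1:
         |             return A * B
         |         h = n // 2
         |         A11, A12, A21, A22 = A[:h,:h], A[:h,h:], A[h:,:h], A[h:,h:]
         |         B11, B12, B21, B22 = B[:h,:h], B[:h,h:], B[h:,:h], B[h:,h:]
         |         M1 = strassen(A11 + A22, B11 + B22)
         |         M2 = strassen(A21 + A22, B11)
         |         M3 = strassen(A11, B12 - B22)
         |         M4 = strassen(A22, B21 - B11)
         |         M5 = strassen(A11 + A12, B22)
         |         M6 = strassen(A21 - A11, B11 + B12)
         |         M7 = strassen(A12 - A22, B21 + B22)
         |         return np.block([[M1 + M4 - M5 + M7, M3 + M5],
         |                          [M2 + M4, M1 - M2 + M3 + M6]])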
        
         | csjh wrote:
         | Many divide and conquer algorithms are expressed in the form
         | O(n^logb(a)) because of the master theorem
         | https://en.m.wikipedia.org/wiki/Master_theorem_(analysis_of_...
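         | 
         | For the case that matters here, T(n) = a*T(n/b) + O(n^c) with
         | c < log_b(a) solves to Theta(n^(log_b a)), so the exponent is
         | just log(a)/log(b):
         | 
         |     from math import log
         | 
         |     def exponent(a, b):
         |         return log(a) / log(b)
         | 
         |     print(exponent(8, 2))  # naive divide and conquer: 3.0
         |     print(exponent(7, 2))  # Strassen: ~2.807
         |     print(exponent(3, 2))  # Karatsuba: ~1.585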
        
         | kzrdude wrote:
         | I think a simpler algorithm like
         | https://en.wikipedia.org/wiki/Karatsuba_algorithm demonstrates
         | how it can happen.
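         | 
         | A small Python sketch of Karatsuba on non-negative integers:
         | three recursive multiplies instead of four, so
         | T(n) = 3*T(n/2) + O(n), i.e. O(n^(log2 3)) ~ O(n^1.585) in
         | the number of digits:
         | 
         |     def karatsuba(x, y):
         |         if x < 10 or y < 10:
         |             return x * y  # single-digit base case
         |         m = max(len(str(x)), len(str(y))) // 2
         |         xh, xl = divmod(x, 10**m)
         |         yh, yl = divmod(y, 10**m)
         |         z0 = karatsuba(xl, yl)
         |         z2 = karatsuba(xh, yh)
         |         z1 = karatsuba(xl + xh, yl + yh) - z2 - z0
         |         return z2 * 10**(2*m) + z1 * 10**m + z0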
        
       ___________________________________________________________________
       (page generated 2024-03-07 23:00 UTC)