[HN Gopher] Discovering faster matrix multiplication algorithms with reinforcement learning
___________________________________________________________________
Discovering faster matrix multiplication algorithms with
reinforcement learning
Author : shantanu_sharma
Score : 331 points
Date : 2022-10-05 15:12 UTC (7 hours ago)
(HTM) web link (www.nature.com)
(TXT) w3m dump (www.nature.com)
| [deleted]
| didibus wrote:
| Can someone knowledgeable tell me if it discovered an algorithm
| we can understand and re-implement ourselves, like, is the
| pseudo-code for it known? Or is it kind of stuck in the inferred
| function of the ML model?
| bugzz wrote:
| yes, they implemented the resulting algorithms and have charts
| showing real world performance on GPUs and TPUs
| ckrapu wrote:
| 50% fewer operations for the skew-symmetric matrix-vector
| product is pretty big, IMO.
| xpe wrote:
| The paper is a great read -- an interesting approach, fun
| theoretical comparisons, plus benchmarks for optimized algorithms
| for specific hardware (CPU and GPU, allowing for differing
| instruction sets).
|
| But it doesn't stop there; from the "Discussion" section:
|
| > One important strength of AlphaTensor is its flexibility to
| support complex stochastic and non-differentiable rewards (from
| the tensor rank to practical efficiency on specific hardware), in
| addition to finding algorithms for custom operations in a wide
| variety of spaces (such as finite fields). We believe this will
| spur applications of AlphaTensor towards designing algorithms
| that optimize metrics that we did not consider here, such as
| numerical stability or energy usage.
| machinekob wrote:
| Wasn't this based on some other paper? I read something similar
| a year or more ago, also about a deep-learning-based matrix
| multiplication speedup.
| KKKKkkkk1 wrote:
| Reinforcement learning is usually introduced as a technique to
| train an agent in simulation that would then be let loose in a
| real environment. Interesting that they chose to treat it as a
| search strategy.
| mikewarot wrote:
| I have a gut feeling that there is a faster way to compute
| logarithms going from least to most significant bit. How would I
| go about using ML to find it?
|
| [Edit] I think Feynman's algorithm might do it:
|
| "Consider the problem of finding the logarithm of a fractional
| number between 1 and 2. (The algorithm can be generalized without
| too much difficulty.) Feynman observed that any such number can
| be uniquely represented as a product of numbers of the form 1 +
| 2^(-k), where k is an integer. Testing for the presence of each
| of these factors in a binary representation is simply a matter of
| a shift and a subtraction. Once the factors are determined, the
| logarithm can be computed by adding together the precomputed
| logarithms of the factors. The algorithm fit the Connection
| Machine especially well because the small table of the logarithms
| of 1 + 2^(-k) could be shared by all the processors. The entire
| computation took less time than doing a division."
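|
| A rough floating-point sketch of Feynman's scheme in Python, for
| anyone who wants to play with it (the factor test is written as a
| multiply-and-compare for clarity; a fixed-point version would use
| the shift-and-subtract described above):
|
|     import math
|
|     # the small shared table: log(1 + 2^-k) for k = 1..53
|     LOG_TABLE = [math.log(1 + 2.0 ** -k) for k in range(1, 54)]
|
|     def feynman_log(x):
|         """Approximate log(x) for 1 <= x < 2 as a sum of table entries."""
|         assert 1.0 <= x < 2.0
|         y, result = 1.0, 0.0
|         for k in range(1, 54):
|             trial = y + y * 2.0 ** -k  # y * (1 + 2^-k)
|             if trial <= x:             # the factor 1 + 2^-k is present
|                 y = trial
|                 result += LOG_TABLE[k - 1]
|         return result
|
|     print(feynman_log(1.5), math.log(1.5))  # should agree closely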
| radford-neal wrote:
| Seems similar to the CORDIC algorithm
| (https://en.wikipedia.org/wiki/CORDIC).
| adgjlsfhk1 wrote:
| note that this algorithm will no longer be good. for precisions
| up to roughly 1000 bits small tables combined with minimax
| polynomials are optimal and for higher precision, you want to
| use more complicated methods. if you're interested, the ARB
| library unlikely the currently fastest known methods.
| xpe wrote:
| Start with these questions:
|
| 1. How big is the search space?
|
| 2. What analysis approaches are likely to bear fruit for the
| search space? (theoretical analysis? optimization?)
|
| 3. If optimization is called for, what kind?
| svnt wrote:
| The big claim turns out to be a little overstated. The claim:
|
| > AlphaTensor's algorithm improves on Strassen's two-level
| algorithm for the first time, to our knowledge, since its
| discovery 50 years ago.
|
| reduces to:
|
| > AlphaTensor discovers algorithms that outperform the Strassen-
| square algorithm, which is a fast algorithm for large square
| matrices [31, 32]. Although the discovered algorithm has the same
| theoretical complexity as Strassen-square, it outperforms it in
| practice, as it is optimized for the considered hardware.
| Interestingly, AlphaTensor finds algorithms with a larger number
| of additions compared with Strassen-square (or equivalently,
| denser decompositions), but the discovered algorithms generate
| individual operations that can be efficiently fused by the
| specific XLA [33] grouping procedure and thus are more tailored
| towards the compiler stack we use. The algorithms found by
| AlphaTensor also provide gains on matrix sizes larger than what
| they were optimized for. Finally, Fig. 5c shows the importance of
| tailoring to particular hardware, as algorithms optimized for one
| hardware do not perform as well on other hardware.
|
| So they improved on Strassen's 50-year-old algorithm by
| optimizing it for hardware that has only existed for a few years
| and de-optimizing it for human use? There is so much cool stuff
| here without claiming to do something you really didn't.
| atty wrote:
| On the other hand, real world performance is really the only
| useful metric for matrix multiplication. I don't really care
| about the theoretical performance, or the number of operations.
| Not disagreeing with your take that the claim is grandiose,
| just pointing out that finding a generalizable way to automate
| the improvement of what is almost certainly the most important
| mathematical operation a computer can do is worth some
| attention. It also suggests other mathematical operations could
| be improved this way as well - potentially algorithms that
| haven't gotten as much theoretical or practical attention as
| matrix multiplication. With all that said, they even point out
| that other people have worked on this before using other
| optimization strategies. I'd guess they got into Nature by using
| the DeepMind name, because Nature loves reinforcement learning
| (it's a cool topic that draws views), or because they're the
| first group to find that their algorithm leads to real
| improvements. (Probably a mixture of all three, I'd guess.)
| throwawaymaths wrote:
| > the number of operations
|
| This is a very good proxy for actual real world speed. It's
| pretty much "as good as it gets" for most straight
| computational tasks, though sometimes memory movement is your
| real bottleneck.
| fulafel wrote:
| It's become a much worse proxy since the memory wall.
| sdenton4 wrote:
| This is questionable...
|
| The 'best' algorithms for matrix multiplication are
| galactic algorithms that provide no actual benefit. Raw
| operation counts are a good proxy for speed, but the big-O
| complexity that people actually chase hasn't been
| especially helpful for this problem in the last twenty+
| years.
|
| https://en.wikipedia.org/wiki/Matrix_multiplication_algorithm
| throwawaymaths wrote:
| Probably the most common matrix multiplication is (nx9) x
| (9xm) (9 = 3x3 cells from a convnet) . If you can
| optimize the shit out of those you might be in business
| for something interesting.
|
| Though to be honest the real slow step in machine
| learning is training and the slow step in training is the
| _outer product_ of two matrices.... I don't believe
| there is an algorithmic way out of that one.
|
| For non-ml/non-GF purposes, you might also worry about
| numerical stability of these matrix multiplications,
| which is not addressed in this paper.
| a1369209993 wrote:
| > and the slow step in training is the _outer product_ of
| two matrices.... I don't believe there is an algorithmic
| way out of that one.
|
| Well, there are several, but the obvious ones tend to
| require strange or unrealistic assumptions about the
| hardware. The most obvious such assumption, IMO, being
| that the hardware is arranged in 3D space in a manner
| roughly analogous to a human brain, which tends to be at
| odds with the common practice of mostly-planar
| photolithography, and with the preference to be able to
| change the network topology experimentally without
| building new hardware.
| dekhn wrote:
| Science proceeds at the rate by which linearly larger
| matrices can be decomposed in less than exponential time.
| matheusd wrote:
| This is excellent! You should make a T-Shirt with this
| quote on it!
| psychphysic wrote:
| Surely theoretical improvement begets real world, especially
| in the context of highly specialised hardware.
|
| It was theoretical performance improvement that motivated the
| creation of SIMD and led to real-world speedups.
| fulafel wrote:
| It's interesting how both SIMD and big-O have had such
| different reasons for being of limited but still significant
| relevance.
|
| SIMD, or vector processors as they were called in the 70s,
| delivered practical speedups in simple benchmarks right away,
| but most applications don't take advantage of it for software
| engineering reasons. Whereas big-O improvement ignores
| important components of performance per unit of time (memory
| access and constant factors) and is purely theoretical in an
| essential sense.
| taeric wrote:
| Not necessarily. In particular, this could have slower
| growth, but still higher cost on normal workloads.
| chrchang523 wrote:
| Sometimes. In the case of matrix multiplication, there is a
| pretty large backlog of "galactic algorithms" going down to
| ~O(n^2.373) that haven't yet led to real-world
| improvements. Strassen's ~O(n^2.807) algorithm is only
| considered over the basic O(n^3) strategy for n>1000.
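|
| To make that concrete, a rough multiplication count in Python;
| it deliberately ignores the additions and memory traffic that
| push the practical crossover out to n > 1000:
|
|     def naive_mults(n):
|         return n ** 3
|
|     def strassen_mults(n):
|         # one level of Strassen: 7 half-size products instead of 8
|         return 1 if n == 1 else 7 * strassen_mults(n // 2)
|
|     for n in (64, 1024):
|         print(n, naive_mults(n), strassen_mults(n))
|     # n = 64:   262144 vs 117649 scalar multiplications
|     # n = 1024: ~1.07e9 vs ~2.82e8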
| HarHarVeryFunny wrote:
| Winograd O(n^2.37) is a win for 3x3 convolutions in
| cuDNN, so it can be implemented efficiently.
| chrchang523 wrote:
| My understanding is that the Winograd minimal filtering
| algorithms used in cuDNN are different from the O(n^2.37)
| Coppersmith-Winograd-descended matrix multiplication
| algorithms. But I acknowledge that these can be
| considered cousins, produced by the same line of
| research.
| HarHarVeryFunny wrote:
| I'm pretty sure there's no difference. It does seem to be
| pretty hard to turn the theoretical win into a practical
| one - the GPU kernel needs to be coded extremely
| efficiently to match the underlying hardware. AFAIK it's
| only a win for 3x3 - maybe for one other size too.
| Originally Winograd wasn't supported by cuDNN on NVidia's
| Tensor Cores (matmul-specific hardware on more recentish
| GPUs), vs CUDA cores, but a Google search seems to
| indicate it can be done - not sure if that's in cuDNN
| though.
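|
| For what it's worth, the Winograd used in cuDNN is the minimal
| filtering construction (Lavin and Gray), not Coppersmith-
| Winograd. The smallest instance, F(2,3), computes two outputs of
| a 3-tap convolution with 4 multiplications instead of 6; a
| minimal Python sketch:
|
|     def winograd_f23(d, g):
|         """Two outputs of a 3-tap convolution in 4 multiplications."""
|         m1 = (d[0] - d[2]) * g[0]
|         m2 = (d[1] + d[2]) * (g[0] + g[1] + g[2]) / 2
|         m3 = (d[2] - d[1]) * (g[0] - g[1] + g[2]) / 2
|         m4 = (d[1] - d[3]) * g[2]
|         return [m1 + m2 + m3, m2 - m3 - m4]
|
|     # equals [d0*g0 + d1*g1 + d2*g2, d1*g0 + d2*g1 + d3*g2];
|     # the transformed filter terms are precomputed in practice.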
| letitgo12345 wrote:
| I think they found an N^2.778 complexity algorithm (obtained by
| applying their 4x4 discovered algorithm recursively) so this is
| not correct.
| Ar-Curunir wrote:
| You're wrong. They discovered an algorithm with fewer
| multiplications for 4x4 matrices.
|
| They also discovered better algorithms for other dimensions.
| dekhn wrote:
| That's the DeepMind modus operandi, and why they publish in
| Nature. It's in their best interest to claim as much covered
| ground as possible so they can keep working towards their end-
| goal. But most people who read Nature already know that results
| in that journal are overhyped. You just have to apply a
| specific prior/translator when you read their papers.
| noobermin wrote:
| This really is a depressingly true sentiment. Nature is the
| most important publication, but so often the big shots publish
| overhyped stuff there that doesn't deserve it and get away with
| it because they are big shots. Another piece of evidence of the
| dysfunction in science.
| brutusborn wrote:
| I don't doubt the sentiment is true, but hasn't this kind
| of science always been this way? By that I mean authors
| inflating the importance of their work; everyone wants to
| be seen as having the biggest breakthroughs.
|
| When I think of dysfunction in 'science' I usually think of
| unfalsifiable hypotheses, the replication crisis in
| Psychology, p-hacking in Medicine, misuse of statistical
| methods in Economics and other epistemic issues, but I
| don't think of exaggerations like this.
| dekhn wrote:
| Scientists have long been self-promoters who desire that
| their theories become the dominant ones and they use many
| techniques to achieve this.
|
| However, the trend towards maximizing the predicted
| outcomes of your research really took off during the
| Human Genome Project.
| adalacelove wrote:
| > Scientists have long been self-promoters
|
| The famous ones more than the rest, I guess. Nevertheless,
| Cavendish, my hero, comes to mind.
| YorkshireSeason wrote:
| I would imagine that it's the metrics universities and
| funding agencies apply in promotion decisions. For
| example, my (well-known) university decided to measure
| impact, and takes "Twitter engagement" as one proxy
| metric for impact -- against my explicit recommendations.
| I'll leave the consequences to everybody's imagination.
| dekhn wrote:
| Yes. In fact many scientists post to twitter but don't
| look at the replies (Emily Bender is an example) or even
| block you if you disagree with them. That's not
| engagement and I wonder what the dean would do about a
| promotion where the scientist just sort of blathered on
| twitter and had lots of followers, but wasn't actually
| providing any real scientific value (again, Emily Bender
| is an example).
| YorkshireSeason wrote:
| Deans typically lack detailed technical knowledge to
| evaluate candidates (not to mention time), they just go
| with the flow, and want to run the department smoothly.
|
| I think the real mechanism is this: those who make the
| decision (those with the most power, i.e. those who bring
| in the most funding and can threaten to leave if they
| don't get their way) already know whom they want and they
| cherry-pick data, eg "impact" figures, to bolster their
| case. That enables the dean then to justify the decision
| in public with those cherry-picked figures ... (the dean
| can hardly say we are hiring X because otherwise top
| funding getter Y will leave)
|
| I think Twitter is less important in STEM subjects than
| in social sciences or humanities, as STEM has more
| clearcut results.
| chabons wrote:
| Maybe I'm mis-reading the paper, but my interpretation was that
| the algorithm discovered using a reward which scales with HW
| performance matches the theoretical complexity of Strassen, but
| is more performant on the HW.
|
| They _also_ identified algorithms which use fewer scalar
| multiplications than Strassen, improving the best known upper
| bound on the number of multiplications required (they highlight
| this in Fig. 3).
|
| In that light, I thought their claim was fair. They've
| discovered (different) algorithms which are both theoretically
| and practically better than Strassen's.
| [deleted]
| boole1854 wrote:
| This is not a correct summary of the results of the paper.
|
| First, you cut out the initial part of the sentence about
| improving on Strassen's two-level algorithm. Here is the
| complete sentence:
|
| > Particularly relevant is the case of 4 x 4 matrices in a
| finite field, where AlphaTensor's algorithm improves on
| Strassen's two-level algorithm for the first time, to our
| knowledge, since its discovery 50 years ago.
|
| That is, note that the improvement cited in the latter part of
| the sentence is for the _4 x 4 matrix case_.
|
| But then your next quote is not in reference to 4 x 4 matrices.
| That is, for _larger matrix sizes_, the best algorithm
| AlphaTensor discovered had the same theoretical complexity as
| Strassen-square, but better performance.
|
| _EDIT: my next comments are confused and best ignored. See
| reply from pxx below._
|
| For the 4 x 4 matrix case, the theoretical complexity of
| AlphaTensor's algorithm was O(N^2.778) compared with the
| Strassen algorithm's complexity of O(N^2.8074), which _is_ an
| improvement.
| Nihmie wrote:
| To be pedantic about this complexity estimate for the 4 x 4
| case, both of them reduce to O(1). If N == 4, it's a
| constant, so we have O(4^2.778) == O(4^2.8074) == O(1).
|
| Talking about _scaling_ for a problem that has no scaling
| factor is a bit odd.
| vlovich123 wrote:
| If you have to process many such matrices, you're not at
| O(1). I would imagine that's what the O is referring to in
| this case.
| cochne wrote:
| If you look at it that way, then any algorithm is just
| O(n), where n is the number of matrices. Big O does not
| care about constant factors.
| est31 wrote:
| Yeah the paper uses O notation for complexity, but never
| when N is constant. For constant N and M, the paper uses
| the number of scalar multiplication operations performed as
| complexity measure. According to Figure 3, their algorithm
| has decreased the best known value from 49 to 47 for 4x4
| matrices, and from 98 to 96 for 5x5 matrices (as well as
| some further state of the art improvements).
| dgacmu wrote:
| In Z_2 - modular matrix multiplication, not conventional
| matrix multiplication. And in this case, modular over what I
| assume is GF(2), i.e., single bits.
| aaaaaaaaaaab wrote:
| Modular multiplication over GF(2)... also known as AND?
| pxx wrote:
| what, no, you have this all wrong. the complexity of
| multiplying two 4x4 matrices is clearly constant. the
| important thing about the constant, however, is that you can
| use this primitive on the block matrices recursively to
| possibly improve the asymptotic complexity.
|
| the complexity improvement in this paper is for arithmetic in
| Z_2 (modular arithmetic over single bits). in standard
| arithmetic, there is no asymptotic complexity improvement but
| the developed algorithms involve more efficiently fusible[0]
| operations, which is useful for the target hardware.
|
| [0] ed: this used to read 'with fewer multiplications,' which
| is clearly wrong: fewer multiplications would lower
| complexity. Interestingly, the discovered algorithm seems to
| have a larger number of additions, but still runs faster.
| ColinWright wrote:
| I thought that savings on the number of multiplications
| leads to savings in complexity. For example, multiplying
| 2x2 matrices using 7 multiplications instead of 8 gives an
| improvement in complexity.
|
| See my comment here:
| https://news.ycombinator.com/item?id=33098192
|
| It's possible that my understanding is completely wrong,
| but your comment is at odds with my other reading, so it
| would be useful to get some clarification.
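|
| For reference, the construction in question: one level of
| Strassen on 2x2 blocks, as a Python/numpy sketch (the classic
| 1969 scheme, not the paper's discovered algorithm):
|
|     import numpy as np
|
|     def strassen_step(A, B):
|         """7 block products instead of 8; even-dimension inputs."""
|         n = A.shape[0] // 2
|         A11, A12, A21, A22 = A[:n, :n], A[:n, n:], A[n:, :n], A[n:, n:]
|         B11, B12, B21, B22 = B[:n, :n], B[:n, n:], B[n:, :n], B[n:, n:]
|         M1 = (A11 + A22) @ (B11 + B22)
|         M2 = (A21 + A22) @ B11
|         M3 = A11 @ (B12 - B22)
|         M4 = A22 @ (B21 - B11)
|         M5 = (A11 + A12) @ B22
|         M6 = (A21 - A11) @ (B11 + B12)
|         M7 = (A12 - A22) @ (B21 + B22)
|         return np.block([[M1 + M4 - M5 + M7, M3 + M5],
|                          [M2 + M4, M1 - M2 + M3 + M6]])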
| pxx wrote:
| sorry; whoops, I don't know why I said multiplications
| there. I somehow thought the "more additions" somehow
| meant "fewer multiplications," but no, it's just "more
| additions but faster performance on target." edited.
|
| your other comment notes that for some shapes, even in
| standard arithmetic, they have found algorithms with
| fewer multiplications than best-known. but those don't
| seem to extend to an algorithm with better asymptotic
| complexity for multiplying general matrices. otherwise
| I'd assume they'd claim them :)
| sdenton4 wrote:
| This is correct, as I understand it.
|
| Most complexity measures are defined in terms of
| asymptotic behavior, so any specific finite algorithm has
| 'constant' complexity. This is an obviously unhelpful bit
| of pedantry on the theory side.
|
| In fact, it's as you say; we can compute a 'raw'
| complexity for a finite operation (eg, raw count of
| arithmetic ops), and then use that operation as a
| primitive to create an algorithm with an asymptotic
| complexity which depends directly on the finite
| operation's 'raw' complexity.
|
| (complexity theory, and specifically for matrix
| multiplication, is a great example of metrics becoming
| targets becoming bad incentives, a la Goodhart's Law. The
| so-called best algorithms are 'galactic' and therefore of
| no use to anyone. There's some fuzzy-headed hope that one
| day the people chasing the big-O exponent metric will
| come up with something practically useful, buuuuuut...
| you actually gotta work the practical problem to solve the
| practical problem, and that requires different metrics.
| This seems to be what's motivating the work under
| discussion here.)
| boole1854 wrote:
| You are correct. I will edit.
| a-dub wrote:
| if you're in gf2 on a traditional cpu and care about wall
| clock time, i start to wonder how something like simd xors
| and parity would compare...
| skybrian wrote:
| Isn't multiplying two 4x4 matrices a finite problem? What
| does big O complexity mean in that case?
| Rioghasarig wrote:
| My understanding of this situation is that they found a way
| to multiply 4x4 matrices that requires fewer
| multiplications than Strassen's . This implies that that
| can generalize to a way to multiply n x n matrices with a
| better theoretical complexity than Strassen's algorithm.
| adgjlsfhk1 wrote:
| any fixed-size multiplication algorithm has a corresponding
| general algorithm where you recursively divide a matrix
| into blocks and treat each block as an element. for any n x n
| algorithm that takes k multiplications, this yields a
| general algorithm with runtime O(N^(log_n k)).
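|
| A quick sanity check of that formula in Python, using the
| multiplication counts quoted elsewhere in the thread:
|
|     import math
|
|     # n x n base scheme with k multiplications => O(N^(log_n k))
|     print(math.log(7, 2))   # Strassen, 2x2, 7 mults:      ~2.8074
|     print(math.log(49, 4))  # Strassen squared, 4x4, 49:   ~2.8074
|     print(math.log(47, 4))  # AlphaTensor in Z_2, 4x4, 47: ~2.7773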
| ColinWright wrote:
| How can I reconcile your comment here with the comment in
| https://news.ycombinator.com/item?id=33098192 ??
| throwawaymaths wrote:
| No, they count multiplications: 47 vs 49 in the 4x4 case. This
| assumes that multiplications are the heavy lift relative to
| additions, memory moves, etc. That is not a bad assumption, and
| it is hardware independent, although there are some proposed
| 'exotic' hardware designs where addition and multiplication
| are equally difficult (via lookup table, limited to about 8-bit
| x 8-bit) or where addition is harder than multiplication (log-
| space number systems). But I don't think these have been
| deployed.
| lifthrasiir wrote:
| That assumption didn't age very well. In many if not most
| modern architectures multiplication still does take more
| cycles than addition but can have a higher throughput if
| scheduled well, and fused multiply-add can be as fast as a
| single multiplication, essentially giving a free addition.
| throwawaymaths wrote:
| > but can have a higher throughput if scheduled well
|
| Unlikely to be true for matrix multiplications, which have
| well-defined data dependencies.
|
| > and fused multiply-add can be as fast as a single
| multiplication, essentially giving a free addition.
|
| Yes, this supports the assumption that multiplication is
| the heavy lift.
| tlb wrote:
| With sufficiently parallel hardware, you can do an entire
| mat4 * mat4 multiplication in 3 cycles. First do all 64
| multiplications in parallel, then do 32 adds, then 16
| adds to get the final answer.
|
| For operations in GF(2) where they claim a result, a
| multiply is just an AND gate and an add is an XOR gate.
| So the fully parallel hardware version is 64 AND gates
| and 48 XOR gates, with a total gate delay of 3. This is a
| trivial amount of hardware and could easily be an
| instruction in some alternate universe where it was
| useful.
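|
| A software analogue of that gate-level picture, as a plain
| Python sketch: multiplication in GF(2) is AND and addition is
| XOR, so each output entry is 4 ANDs feeding a two-level XOR
| reduction (hence the three gate delays above):
|
|     def gf2_matmul(A, B, n=4):
|         """Naive n x n matrix product over GF(2); entries are 0/1."""
|         C = [[0] * n for _ in range(n)]
|         for i in range(n):
|             for k in range(n):
|                 acc = 0
|                 for j in range(n):
|                     acc ^= A[i][j] & B[j][k]  # multiply = AND, add = XOR
|                 C[i][k] = acc
|         return C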
| klyrs wrote:
| You missed the part where the "4x4 matrix multiplication"
| refers to multiplication of NxN matrices logically
| partitioned into sixteen blocks of size (N/4)x(N/4), done
| recursively.
| WhitneyLand wrote:
| In general to speed up matrix operations how much is some sort of
| result caching applied?
|
| For example if you profile calculations done when training a
| model, is there any significant repetition happening that would
| allow some kind of benefit from table lookups of certain
| solutions?
| bitwize wrote:
| It's begun. AI is improving itself.
| chillee wrote:
| Quoting myself on twitter:
|
| https://twitter.com/cHHillee/status/1577713102434361344
|
| I'm quite suspicious about their hardware benchmarks. They're not
| writing custom kernels, they're relying on a graph compiler like
| XLA to automatically fuse their decomposed matmuls (and my guess
| is that XLA will not be very good at this).
|
| Moreover, as far as I can tell, they don't report absolute
| performance numbers anywhere. In other words, I suspect that a
| naive N^3 matrix multiplication would absolutely smoke them in
| performance.
| [deleted]
| puffoflogic wrote:
| Why the heck is a math paper in Nature? I'd put Nature somewhere
| just below vixra in terms of math credibility.
| 323 wrote:
| If you could find a way to train cats to solve integrals,
| wouldn't that be a Nature paper and not a math one?
| optimalsolver wrote:
| Now, were these algorithms discovered, or invented?
|
| I.e., have they always been there, just sitting in Platonic space
| waiting for a conscious mind to stumble across them, or have they
| just now popped into existence?
| WFHRenaissance wrote:
| Discovered. I'd go as far as to say that all "invention" is
| actually just discovery that we find useful.
| bjourne wrote:
| Never heard of the "matrix multiplication tensor" before. Is
| there some digestible blog post or article somewhere explaining
| what it is and why it is useful?
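|
| Short version, as I read the paper: for C = A x B, the matmul
| tensor T_n is the 0/1 tensor of shape (n^2, n^2, n^2) with
| T[a, b, c] = 1 exactly when entry a of A times entry b of B
| contributes to entry c of C. Its usefulness is that a rank-R
| decomposition of T_n is precisely a multiplication algorithm
| using R scalar multiplications (R = 7 recovers Strassen for
| n = 2), which turns "find a faster algorithm" into "find a
| lower-rank decomposition". A minimal numpy sketch:
|
|     import itertools
|     import numpy as np
|
|     def matmul_tensor(n):
|         # T[(i,j), (j,k), (i,k)] = 1, since C[i,k] += A[i,j] * B[j,k]
|         T = np.zeros((n * n, n * n, n * n), dtype=int)
|         for i, j, k in itertools.product(range(n), repeat=3):
|             T[i * n + j, j * n + k, i * n + k] = 1
|         return T
|
|     print(matmul_tensor(2).shape)  # (4, 4, 4), with 8 nonzero entries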
| HarHarVeryFunny wrote:
| Can someone explain the approach here?
|
| I understand that they transform the problem into a gamified 3-D
| matrix decomposition, but what exactly is the motivation for
| using RL to beat this decomposition game ? Why not just use, for
| example, an evolution algorithm to find incrementally better
| decompositions ?
| a-dub wrote:
| this is cool. i suppose it's only a matter of time until we see
| optimizing compilers that use transformer-rl style searches for
| subsets of their optimization and codegen.
| mhh__ wrote:
| Generating pgo data with graph ML is already a thing
| a-dub wrote:
| i'm guessing most of that stuff is register allocation and
| cache management.
|
| rl-searches could potentially rewrite entire algorithms.
|
| edit: (more musing) maybe the future of software engineering
| will be authoring bullet proof acceptance tests.
| Psyladine wrote:
| Never mind compilers; consider the possibility of updating
| architectures and code bases by automation. A truly feasible
| attack on tech debt and legacy code.
| danuker wrote:
| Will also generate some fun bugs!
| mirekrusin wrote:
| Unless it provides or simply works within formal
| verification?
| f1shy wrote:
| Formal verification does not prove the absence of bugs. In the
| best case, it can only catch certain types of bugs.
|
| https://smartech.gatech.edu/handle/1853/62855
| adrianN wrote:
| Formal verification is very good at proving that compiler
| transformations preserve semantics. Programming language
| semantics are pretty well specified (at least for some
| programming languages...), so the chance of bugs in the
| specification is low.
| wokkel wrote:
| It can catch all kinds of bugs, but you have to ask the
| correct questions. So it comes down to defining what a bug
| is and the assumptions you use for testing. And therein
| lies the problem: what constitutes a bug? A function that
| adds two numbers but never terminates might be considered
| bug-free if you forget to specify that not giving an
| answer before the end of the universe is faulty
| behaviour. We humans are terrible at writing these kinds
| of specifications, so formal verification as a method
| just pushes the correctness problem from code to
| specification.
| hdjjhhvvhga wrote:
| Relax - we'll soon have self-learning debuggers, too.
| iKlsR wrote:
| Whenever I see stuff about automated algorithms it reminds me
| of a movie, can't remember the name or plot exactly but they
| create a being or upload someone's consciousness, and the "ai"
| is solving all the problems the world faces until it reaches
| the last one, humans. It then begins to synthesize some
| airborne virus or nukes that would scour us from the earth.
| We basically created tech that would eventually optimize us
| out of existence as a cancer to the earth.
|
| Edit: found it, Transcendence (2014). Also it seems my brain
| injected some random thoughts into that plot; it's not about
| that at all, but the point stands!
| haolez wrote:
| Transcendence is one of the few movies that present an
| almighty AI that's benevolent in nature.
| ASalazarMX wrote:
| Must have been an embarrassing incident for the engineer
| that commented out the "Don't kill humans" constraint for a
| quick debug session.
| jeffbee wrote:
| Superoptimizers are already extremely good at discovering
| latent flaws in high level source code, and in compilers.
| ColinWright wrote:
| Quoting:
|
| > _... AlphaTensor finds an algorithm for multiplying 4x4
| matrices using 47 multiplications in Z_2, thereby outperforming
| Strassen's two-level algorithm, which involves 7^2 = 49
| multiplications. By applying this algorithm recursively, one
| obtains a practical matrix multiplication algorithm in Z_2 with
| complexity O(N^2.778)._
|
| > _Moreover, AlphaTensor discovers efficient algorithms for
| multiplying matrices in standard arithmetic; for example,
| AlphaTensor finds a rank-76 decomposition of T_{4,5,5}, improving
| over the previous state-of-the-art complexity of 80
| multiplications._
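|
| Translating those ranks into exponents, via the standard bound
| that a rank-r algorithm for the <n, m, p> product gives
| omega <= 3 * log(r) / log(n * m * p):
|
|     import math
|
|     def omega_bound(n, m, p, r):
|         return 3 * math.log(r) / math.log(n * m * p)
|
|     print(omega_bound(4, 4, 4, 47))  # ~2.778, the Z_2 claim above
|     print(omega_bound(4, 5, 5, 76))  # the rank-76 T_{4,5,5} result
|     print(omega_bound(4, 5, 5, 80))  # previous state of the art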
| loufe wrote:
| This is completely beside the point, but reading "provably" in
| the abstract is a frank reminder of how terrible English
| spelling/pronunciation is. I can't imagine I'm the only well-
| read native English speaker who read this as "prov-ably" on
| first take. I don't know about most languages, but you just
| don't get nonsense like this in French, at least.
| eigenschwarz wrote:
| https://www.merriam-webster.com/dictionary/provably
|
| Edit: Sorry, missed your point on first reading, which perhaps
| reinforces your point. I read papers like this for a living so
| the language is unambiguous but I see how it might not be for
| others, even if English is their first language.
| pwdisswordfish9 wrote:
| I agree. This is completely beside the point.
| evmar wrote:
| English is terrible for sure, but this comment is pretty funny
| to compare it to the language that spells /kes k@ se/ as
| "qu'est-ce que c'est".
| elcomet wrote:
| I don't understand your point; "provably" means "able to be
| proven". Is there another meaning I'm missing?
| ogogmad wrote:
| He's talking about the spelling not matching the
| pronunciation
| 613style wrote:
| Think "probably" vs "provably" (it's about
| pronunciation/spelling)
___________________________________________________________________
(page generated 2022-10-05 23:00 UTC)