[HN Gopher] BLIS: A BLAS-like framework for basic linear algebra...
___________________________________________________________________
BLIS: A BLAS-like framework for basic linear algebra routines
Author : ogogmad
Score : 46 points
Date : 2024-01-24 20:32 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| optimalsolver wrote:
| New? Isn't this AMD's reference BLAS implementation?
| dang wrote:
| We've taken the word 'new' out of the title. Thanks!
| lloda wrote:
| BLIS is a different library from BLAS, with a better API. For
| example, when you pass an array, you can specify independent
| strides for each axis, and you can set independent
| transpose/conjugate flags per operand. It does have a BLAS
| compatibility interface.
|
| The interface isn't perfect, though. For example, they never
| committed to supporting zero strides. That used to work
| anyway, but it's now broken on some architectures.
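|
| To make the stride point concrete, here is a minimal sketch
| using BLIS's typed API (signature as documented in the
| project's BLISTypedAPI.md; worth double-checking against your
| BLIS version). Unlike classic BLAS, each matrix carries its
| own row stride and column stride, so row- and column-major
| operands can be mixed in a single call:
|
|     #include "blis.h"
|
|     int main( void )
|     {
|         /* 2x2 matrices in row-major storage. */
|         double A[4] = { 1.0, 2.0,
|                         3.0, 4.0 };
|         double B[4] = { 5.0, 6.0,
|                         7.0, 8.0 };
|         double C[4] = { 0.0, 0.0,
|                         0.0, 0.0 };
|         double alpha = 1.0, beta = 0.0;
|
|         /* C := alpha*A*B + beta*C.  Each operand passes
|            (row stride, column stride): (2,1) = row-major,
|            (1,2) would be column-major. */
|         bli_dgemm( BLIS_NO_TRANSPOSE, BLIS_NO_TRANSPOSE,
|                    2, 2, 2,
|                    &alpha,
|                    A, 2, 1,
|                    B, 2, 1,
|                    &beta,
|                    C, 2, 1 );
|         return 0;
|     }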
| bee_rider wrote:
| BLIS is a framework for instantiating (among other things) a
| BLAS. The exciting bit is that they distilled the library
| down to a couple of compute kernels: you provide the inner
| loops of a matrix multiplication, and it turns them into a
| full numerical library.
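|
| Schematically, the hand-written part is a "microkernel" that
| updates one small MR x NR block of C from packed panels of A
| and B. The scalar C sketch below is illustrative only: the
| names and parameter list are simplified from BLIS's real
| ukernel interface (see docs/KernelsHowTo.md), and a real
| kernel would use SIMD intrinsics or assembly:
|
|     #define MR 6
|     #define NR 8
|
|     /* C11 += A1 * B1 for one MR x NR block of C.  A1 is an
|        MR x k micro-panel packed column-by-column; B1 is a
|        k x NR micro-panel packed row-by-row. */
|     void dgemm_ukernel_sketch( int k,
|                                const double* a,
|                                const double* b,
|                                double* c, int rs_c, int cs_c )
|     {
|         for ( int p = 0; p < k; ++p )    /* one rank-1 update */
|             for ( int i = 0; i < MR; ++i )
|                 for ( int j = 0; j < NR; ++j )
|                     c[ i*rs_c + j*cs_c ] +=
|                         a[ p*MR + i ] * b[ p*NR + j ];
|     }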
|
| AMD did use it to create their BLAS library though.
|
| Also, side note: when I've heard "reference BLAS," it has
| been used in the opposite way. Netlib BLAS is the reference
| BLAS; its performance is basically bad, but it defines the
| functionality. AMD used BLIS to create a tuned vendor BLAS.
| dang wrote:
| Discussed just a little in the past:
|
| _BLIS: BLAS-Like Library Instantiation Software Framework_ -
| https://news.ycombinator.com/item?id=11369541 - March 2016 (2
| comments)
| boegel wrote:
| For people new to BLIS, I can recommend this talk from the
| EasyBuild User Meeting (2021):
| https://www.youtube.com/watch?v=eb3dXivyTzE
| quanto wrote:
| The real money shot is here:
| https://github.com/flame/blis/blob/master/docs/Performance.m...
|
| It seems that the selling point is that BLIS does multi-core
| quite well. I am especially impressed that it does as well as the
| highly optimized Intel MKL on Intel CPUs.
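|
| (For reproducing multi-threaded numbers like these, BLIS
| exposes a thread-count knob. The sketch below assumes a BLIS
| build configured with threading enabled, e.g.
| --enable-threading=openmp, and uses the runtime call
| described in its Multithreading.md docs:)
|
|     #include "blis.h"
|
|     int main( void )
|     {
|         /* Equivalent to setting the BLIS_NUM_THREADS
|            environment variable before running. */
|         bli_thread_set_num_threads( 8 );
|
|         /* ... BLIS operations here now use 8 threads ... */
|         return 0;
|     }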
|
| I do not see the selling point of BLIS-specific APIs, though. The
| whole point of having an open BLAS API standard is that numerical
| libraries should be drop-in replaceable, so when a new library
| (such as BLIS here) comes along, one could just re-link the
| library and reap the performance gain immediately.
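|
| As an illustration of that drop-in property, a plain BLAS
| call through the de-facto Fortran-77 interface compiles and
| runs unchanged against Netlib BLAS, OpenBLAS, MKL, or BLIS's
| BLAS compatibility layer; only the link line changes (e.g.
| -lblis instead of -lopenblas):
|
|     /* Declaration of the standard Fortran-77 BLAS symbol. */
|     extern void dgemm_( const char* transa, const char* transb,
|                         const int* m, const int* n, const int* k,
|                         const double* alpha,
|                         const double* a, const int* lda,
|                         const double* b, const int* ldb,
|                         const double* beta,
|                         double* c, const int* ldc );
|
|     int main( void )
|     {
|         int    n = 2;
|         double alpha = 1.0, beta = 0.0;
|         /* Column-major storage, as BLAS requires. */
|         double A[4] = { 1.0, 3.0, 2.0, 4.0 };
|         double B[4] = { 5.0, 7.0, 6.0, 8.0 };
|         double C[4] = { 0.0, 0.0, 0.0, 0.0 };
|
|         dgemm_( "N", "N", &n, &n, &n, &alpha,
|                 A, &n, B, &n, &beta, C, &n );
|         return 0;
|     }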
|
| What is interesting is that numerical linear algebra work is,
| by nature, mostly embarrassingly parallel, so it should not
| be too difficult to write multi-core implementations. And
| yet, BLIS here performs so much better than some other
| industry-leading implementations in multi-core
| configurations. So the question is not why BLIS does so well;
| the question is why some other implementations do so poorly.
| celrod wrote:
| I don't think these benchmarks are realistic. In my tests,
| BLIS does far worse than MKL in practice on Intel CPUs,
| especially when the arrays are not huge.
| mgaunard wrote:
| It's not that difficult to be faster than MKL when you write
| the benchmarks.
|
| MKL is a general-purpose library; it makes compromises so it
| is good across most sizes. If you care about the performance
| of specific sizes, you can write code that beats it on that
| very specific benchmark.
|
| Interestingly, BLIS does poorly at small sizes, which is the
| area where it is easiest to beat MKL.
| pletnes wrote:
| Is this approach similar to or inspired by ATLAS? Sounds like it
| to me at first glance.
| bee_rider wrote:
| They are solving the same problem (tuning a BLAS).
|
| ATLAS tries to automate it away.
|
| BLIS instead isolates the part that requires hand-tuning, so
| you only have to write a couple of kernels, which the
| framework then uses to build the whole library.
|
| BLIS is quite a bit newer and I'd be surprised if ATLAS kept
| up.
___________________________________________________________________
(page generated 2024-01-24 23:00 UTC)