[HN Gopher] BLIS: A BLAS-like framework for basic linear algebra...
       ___________________________________________________________________
        
       BLIS: A BLAS-like framework for basic linear algebra routines
        
       Author : ogogmad
       Score  : 46 points
       Date   : 2024-01-24 20:32 UTC (2 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | optimalsolver wrote:
       | New? Isn't this AMD's reference BLAS implementation?
        
         | dang wrote:
         | We've taken the word 'new' out of the title. Thanks!
        
         | lloda wrote:
          | BLIS is a different library from BLAS, with a better API. For
          | example, when you pass an array you can give independent
          | strides for each axis, and you can set independent
          | transpose/conjugate flags on each operand. It also provides a
          | BLAS compatibility interface.
         | 
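          | To be concrete, a gemm call through the typed API looks
          | something like this (I'm going from memory, so treat the exact
          | prototype as approximate and check blis.h):
          | 
          |     #include <blis.h>
          |     
          |     /* C := beta*C + alpha*A*B, with explicit row and column
          |        strides for every operand. The first two arguments can
          |        independently be BLIS_TRANSPOSE, and for the complex
          |        types also BLIS_CONJ_NO_TRANSPOSE / BLIS_CONJ_TRANSPOSE. */
          |     void gemm_strided(dim_t m, dim_t n, dim_t k,
          |                       double *a, inc_t rsa, inc_t csa,
          |                       double *b, inc_t rsb, inc_t csb,
          |                       double *c, inc_t rsc, inc_t csc)
          |     {
          |         double alpha = 1.0, beta = 1.0;
          |         bli_dgemm(BLIS_NO_TRANSPOSE, BLIS_NO_TRANSPOSE,
          |                   m, n, k,
          |                   &alpha,
          |                   a, rsa, csa,
          |                   b, rsb, csb,
          |                   &beta,
          |                   c, rsc, csc);
          |     }
          | 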
          | The interface isn't perfect, though. For example, they never
          | committed to supporting zero strides; those used to work
          | anyway, but they're now broken on some architectures.
        
         | bee_rider wrote:
          | BLIS is a framework for instantiating (among other things) a
          | BLAS. The exciting bit is that they distilled the hand-tuned
          | part down to a couple of compute kernels: you provide things
          | like the inner loop of a matrix multiplication, and it turns
          | that into a full numerical library.
         | 
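          | Roughly, the hand-tuned part is a tiny "microkernel" that
          | updates one MR x NR tile of C from packed panels of A and B;
          | the packing, cache blocking, and threading around it are
          | generic framework code. Schematically (this is not the real
          | BLIS kernel signature, just the shape of the loop you'd be
          | vectorizing):
          | 
          |     /* C_tile(MR x NR) += A_panel(MR x k) * B_panel(k x NR),
          |        with A and B already packed contiguously. A real kernel
          |        keeps the MR x NR accumulator in vector registers. */
          |     #define MR 8
          |     #define NR 6
          |     
          |     void gemm_ukr(int k, const double *a, const double *b,
          |                   double *c, int rs_c, int cs_c)
          |     {
          |         double ab[MR * NR] = { 0.0 };
          |         for (int p = 0; p < k; ++p)
          |             for (int j = 0; j < NR; ++j)
          |                 for (int i = 0; i < MR; ++i)
          |                     ab[i + j*MR] += a[p*MR + i] * b[p*NR + j];
          |         for (int j = 0; j < NR; ++j)
          |             for (int i = 0; i < MR; ++i)
          |                 c[i*rs_c + j*cs_c] += ab[i + j*MR];
          |     }
          | 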
         | AMD did use it to create their BLAS library though.
         | 
          | Also, side note: when I've heard "reference BLAS," it has been
          | used in the opposite sense. Netlib BLAS is the reference BLAS;
          | its performance is basically bad, but it defines the
          | functionality. AMD used BLIS to create a tuned vendor BLAS.
        
       | dang wrote:
       | Discussed just a little in the past:
       | 
       |  _BLIS: BLAS-Like Library Instantiation Software Framework_ -
       | https://news.ycombinator.com/item?id=11369541 - March 2016 (2
       | comments)
        
       | boegel wrote:
       | For people new to BLIS, I can recommend this talk from the
       | EasyBuild User Meeting (2021):
       | https://www.youtube.com/watch?v=eb3dXivyTzE
        
       | quanto wrote:
       | The real money shot is here:
       | https://github.com/flame/blis/blob/master/docs/Performance.m...
       | 
       | It seems that the selling point is that BLIS does multi-core
       | quite well. I am especially impressed that it does as well as the
       | highly optimized Intel MKL on Intel CPUs.
       | 
       | I do not see the selling point of BLIS-specific APIs, though. The
       | whole point of having an open BLAS API standard is that numerical
       | libraries should be drop-in replaceable, so when a new library
       | (such as BLIS here) comes along, one could just re-link the
       | library and reap the performance gain immediately.
       | 
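        | For example, the same standard call works unchanged whether you
        | link OpenBLAS, MKL, or BLIS's BLAS/CBLAS compatibility layer
        | (assuming BLIS was built with CBLAS support enabled); only the
        | link line changes:
        | 
        |     #include <cblas.h>
        |     
        |     /* C := A * B, column-major, via the portable CBLAS
        |        interface shared by all the implementations. */
        |     void gemm_colmajor(int m, int n, int k,
        |                        const double *a, const double *b,
        |                        double *c)
        |     {
        |         cblas_dgemm(CblasColMajor,
        |                     CblasNoTrans, CblasNoTrans,
        |                     m, n, k,
        |                     1.0, a, m,   /* lda */
        |                          b, k,   /* ldb */
        |                     0.0, c, m);  /* ldc */
        |     }
        | 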
       | What is interesting is that numerical algebra work, by nature, is
       | mostly embarrassingly parallel, so it should not be too difficult
       | to write multi-core implementations. And yet, BLIS here performs
       | so much better than some other industry-leading implementations
       | on multi-core configurations. So the question is not why BLIS
       | does so well; the question is why some other implementations do
       | so poorly.
        
         | celrod wrote:
          | I don't think these numbers are realistic. In my tests, BLIS
          | does far worse than MKL in practice on Intel CPUs, especially
          | if the arrays are not huge.
        
         | mgaunard wrote:
         | It's not that difficult to be faster than MKL when you write
         | the benchmarks.
         | 
          | MKL is a general-purpose library; it makes compromises to be
          | good across most sizes. If you care about the performance of
          | specific sizes, you can write code that beats it on that very
          | specific benchmark.
         | 
          | Interestingly, BLIS does poorly at small sizes, which is the
          | area where it is easiest to beat MKL.
        
       | pletnes wrote:
       | Is this approach similar to or inspired by ATLAS? Sounds like it
       | to me at first glance.
        
         | bee_rider wrote:
         | They are solving the same problem (tuning a BLAS).
         | 
         | ATLAS tries to automate it away.
         | 
          | BLIS instead isolates the part that requires hand-tuning, so
          | you only have to write a couple of kernels, which are then
          | reused across the whole library.
         | 
          | BLIS is quite a bit newer, and I'd be surprised if ATLAS has
          | kept up.
        
       ___________________________________________________________________
       (page generated 2024-01-24 23:00 UTC)