[HN Gopher] Blaze: A High Performance C++ Math Library
       ___________________________________________________________________
        
       Blaze: A High Performance C++ Math Library
        
       Author : optimalsolver
       Score  : 68 points
       Date   : 2024-04-17 10:15 UTC (12 hours ago)
        
 (HTM) web link (bitbucket.org)
 (TXT) w3m dump (bitbucket.org)
        
       | stargrazer wrote:
        | Is this represented here for posterity? The last news item
        | is from 15.8.2020, though there are recent commits for
        | compiler compatibility testing (Feb 2024).
       | 
       | What is of import here?
        
         | sevagh wrote:
         | People can post whatever they want on HN. It's a neat library,
         | why not post it?
         | 
         | Previous post (by the same submitter):
         | https://news.ycombinator.com/item?id=34407106
        
       | dannyz wrote:
        | It seems like every large project these days has coalesced
        | around Eigen. What are some of the advantages that Blaze has
        | over Eigen?
        
         | queuebert wrote:
          | Or cuBLAS. In practice, if I'm going to the trouble of
          | rewriting math in C++, I'd rather just write GPU kernels.
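          | 
          | A minimal host-side sketch of the cuBLAS route (standard
          | cuBLAS v2 API; the sizes here are illustrative and error
          | handling is omitted):
          | 
          |     #include <cublas_v2.h>
          |     #include <cuda_runtime.h>
          |     #include <vector>
          | 
          |     int main() {
          |         const int n = 1024;
          |         std::vector<double> hA(n * n, 1.0), hB(n * n, 2.0);
          |         const size_t bytes = n * n * sizeof(double);
          |         double *A, *B, *C;
          |         cudaMalloc(&A, bytes);
          |         cudaMalloc(&B, bytes);
          |         cudaMalloc(&C, bytes);
          |         cudaMemcpy(A, hA.data(), bytes,
          |                    cudaMemcpyHostToDevice);
          |         cudaMemcpy(B, hB.data(), bytes,
          |                    cudaMemcpyHostToDevice);
          | 
          |         cublasHandle_t h;
          |         cublasCreate(&h);
          |         const double alpha = 1.0, beta = 0.0;
          |         // C = alpha*A*B + beta*C, column-major as in BLAS
          |         cublasDgemm(h, CUBLAS_OP_N, CUBLAS_OP_N, n, n, n,
          |                     &alpha, A, n, B, n, &beta, C, n);
          | 
          |         cublasDestroy(h);
          |         cudaFree(A); cudaFree(B); cudaFree(C);
          |         return 0;
          |     }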
        
           | VHRanger wrote:
            | I mean, that only works for the small subset of workloads
            | where the data movement patterns fit, bandwidth matters
            | more than latency, etc.
            | 
            | The reality is that almost all workloads aren't anywhere
            | near saturating the maximum AVX throughput of any CPU
            | since Haswell.
        
             | queuebert wrote:
             | Depends on whether you measure workloads as "jobs" or
             | "flops". If "flops", I would hazard that the bulk of
             | computing on the planet right now is happening on GPUs.
        
             | chrsig wrote:
             | I'm by no means an expert in the topic, but to share my
              | take anyway: it seems to me like there are just
              | diminishing returns in SIMD approaches. If you're going
              | to organize your data well for SIMD use, then it's not
              | a far reach to make it work well on a GPU, which will
              | keep getting more cores.
              | 
              | I imagine we'll get to a point where CPUs are actually
              | just pretty dumb drivers for issuing GPU commands.
        
               | gdiamos wrote:
               | As someone who worked on CUDA 15 years ago - it's amazing
               | to me that someone on the internet posted this statement.
               | 
               | Did GPUs win?
        
               | thrtythreeforty wrote:
                | Yes and no. The compute density and memory bandwidth
                | are unmatched. But the programming model is markedly
                | worse,
               | even for something like CUDA: you inherently have to
               | think about parallelism, how to organize data, write your
               | kernels in a special language, deal with wacky
               | toolchains, and still get to deal with the CPU and
               | operating system.
               | 
               | There is great power in the convenience of "with
               | open('foo') as f:". Most workloads are still stitching
               | together I/O bound APIs, not doing memory-bound or CPU-
               | bound compute.
        
               | gdiamos wrote:
               | CUDA was always harder to program - even if you could get
               | better perf
               | 
               | It took a long time to find something that really took
               | advantage of it, but we did eventually. CUDA enabled deep
                | learning, which enabled LLMs. That's history.
               | 
               | What surprised me about the statement was that it implied
               | that the model of python driving optimized GPU kernels
               | was broader than deep learning.
               | 
                | That was the original vision of CUDA: most of the
                | computational work being done by massively parallel
                | cores.
        
               | chrsig wrote:
                | I don't think that there's a "win" here. It's just
                | sort of which way you tilt your head: how much space
                | do you have to cram a ton of cores connected to a
                | really wide memory bus, and how close can you get the
                | storage while keeping everything from catching on
                | fire, no? ("just sort of" is going to have to skip
                | leg day because of the herculean lift it just did)
               | 
                | It's a fairly fractal pattern in distributed
                | computing. Move the high throughput heavy computation
                | bits away from the low latency responsive bits ("low
                | latency" here is relative to the total computation).
                | Use an event loop for the reactive bits. Eventually
                | someone will invert the event loop to use coroutines
                | so everything looks synchronous (Go, anyone? python's
                | gevent?).
                | 
                | After that, it seems to me that the only real question
                | is whether it takes too long or costs too much to move
                | the data to the storage location the heavy computation
                | hardware uses. There's really not much of a conceptual
                | difference between Airflow driving Snowflake and C++
                | running on a CPU driving CUDA kernels. It takes a
                | certain scale to make going from an OLTP database to
                | an OLAP database worth it, just like it takes a
                | certain scale to make a GPU worth it over SIMD
                | instructions on the local processor.
        
               | CyberDildonics wrote:
               | Win what? This person said they were inexperienced. SIMD
               | is extremely valuable and the situations where it works
               | well are not rare at all.
        
               | VHRanger wrote:
               | Not really, no.
               | 
               | GPUs are still very limited, even compared to the SIMD
                | instruction set. You couldn't build a CUDAjson the
                | way the simdjson library is built, for example,
                | because the GPU doesn't handle branching in a way
                | that accommodates it.
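                | 
                | A minimal sketch of the branch-free byte
                | classification simdjson is built on; this kind of
                | mask-building maps naturally to CPU SIMD (AVX2 here;
                | quote_mask is a made-up name for illustration):
                | 
                |     #include <immintrin.h>
                |     #include <cstdint>
                | 
                |     // One bit per input byte, set where the byte is
                |     // a double quote; simdjson builds its structural
                |     // index out of masks like this.
                |     uint32_t quote_mask(const uint8_t* p) {
                |         __m256i chunk = _mm256_loadu_si256(
                |             reinterpret_cast<const __m256i*>(p));
                |         __m256i q = _mm256_cmpeq_epi8(
                |             chunk, _mm256_set1_epi8('"'));
                |         return static_cast<uint32_t>(
                |             _mm256_movemask_epi8(q));
                |     }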
               | 
                | Second, again, the latency issue. GPUs are only good
                | if you have a pipeline of data to constantly feed
                | them, so that the PCIe transfer latency is amortized.
        
             | Const-me wrote:
             | > almost all workloads aren't anywhere near saturating the
             | AVX instruction max bandwidth on a CPU since Haswell
             | 
              | That's true, but GPUs aren't only good at FLOPs; their
              | memory bandwidth is also an order of magnitude higher
              | than that of system memory.
             | 
             | In my previous computer, the numbers were 484 GB/second for
             | 1080 Ti, and 50 GB/second for DDR4 system memory. In my
             | current one, they are 672 GB/second for 4070 Ti super, and
             | 74 GB/second for DDR5 system memory.
        
         | oispakaljaa wrote:
         | According to the provided benchmarks [1], it seems to be quite
         | a bit faster.
         | 
         | [1] https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks
        
           | dannyz wrote:
           | These benchmarks look to be ~8 years old, and don't really
           | agree with benchmarks done by other sources
           | (https://romanpoya.medium.com/a-look-at-the-performance-of-
           | ex..., https://eigen.tuxfamily.org/index.php?title=Benchmark)
           | 
            | In general I would be skeptical of any benchmark that
            | claims to beat MKL significantly on standard operations.
        
             | adgjlsfhk1 wrote:
              | Beating MKL for matrices under 100x100 is pretty
              | doable. The BLAS interface has a decent amount of
              | inherent overhead, so just exposing a better API (e.g.
              | one that specifies the array types and sizes
              | statically) makes it pretty easy to improve things.
              | For big sizes, though, MKL is incredibly good.
        
         | VHRanger wrote:
         | Compile times for one.
         | 
         | Eigen uses C++ templates to do most things, which explodes
         | compile times.
        
           | planede wrote:
            | AFAIK Blaze is also somewhat heavy on templates, but
            | maybe it uses more modern metaprogramming techniques.
        
           | a_t48 wrote:
           | Compile times and binary sizes :(
        
         | 1over137 wrote:
         | Is Eigen still alive? There's been no release in 3 years, and
         | no news about it:
         | https://gitlab.com/libeigen/eigen/-/issues/2699
        
           | sevagh wrote:
           | The master branch is active and people use Eigen today. The
           | Discord has maintainers that are still active. Not sure how
           | it could be considered "dead"?
        
             | infamouscow wrote:
              | The generation of frontend developers that rose over
              | the last 5 years learned that everything must be new.
             | 
             | That a math library of all things could be complete is
             | several orders of thinking beyond their ability. I'm sure
             | the gut reaction is to downvote this for the embarrassing
             | criticism, but in all seriousness, this is the right
             | answer.
        
               | klaussilveira wrote:
               | What? You mean I don't need to refactor and break API
               | every 6 months?
        
               | sevagh wrote:
               | I realize asking for a new 4.0 release is fair (and the
               | GitLab issue does have a highly upvoted request for a
               | release).
               | 
                | But you can't just call things "dead" for no reason;
                | it's in poor taste. It's feature-complete, not dead!
        
               | stanleykm wrote:
                | Sure, code can be "feature complete", but the reality
                | is the rest of the world changes, so there will be
                | more and more friction for your users over time. For
                | example, someone in the issue mentions they need to
                | use mainline to use Eigen with CUDA now.
        
               | infamouscow wrote:
               | Mathematics is a priori. It's beyond the world changing.
               | You might be surprised to learn we still use Euclid's
               | geometry despite it being thousands of years old.
               | 
               | What you're actually saying is you expect open source
               | maintainers to add arbitrary functionality for free.
        
               | imadj wrote:
               | > Mathematics is a priori
               | 
                | Sure, but the discussion here is about a software
                | library, not the math concepts.
        
               | infamouscow wrote:
               | Software programs are equivalent to mathematical proofs.
               | [1]
               | 
               | Short of a bug in the implementation, there has yet to be
               | a valid explanation for why mathematics libraries need to
               | be continuously maintained. If I published an NPM library
               | called left-add, which adds the left parameter to the
                | right parameter (read: addition), how long, exactly,
               | should I expect to maintain this for others?
               | 
               | The only explanation so far is that scumbags expect open
               | source library maintainers to slave away indefinitely.
               | The further we steer into the weeds of ignorant
               | explanations, the more I'm inclined to believe this
               | really is the underlying rationale.
               | 
               | 1: https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_cor
               | respon...
        
               | imadj wrote:
                | There are many reasons why a library requires
                | continuous maintenance even when it's
                | "feature-complete"; off the top of my head:
               | 
               | 1. Bug fixes
               | 
               | 2. Security issues
               | 
               | 3. Optimization
               | 
               | 4. Compatibility/Adapt to landscape changes
               | 
                | People pointing out flaws in a library aren't
                | "scumbags that expect open source library maintainers
                | to slave away indefinitely".
               | 
                | No one is forcing the maintainer to "slave away";
                | they can step down any time and say they're not up
                | for the role anymore. Those interested will fork the
                | library and carry the torch.
               | 
               | No need to be so defensive and insult others just for
               | giving feedback.
        
               | infamouscow wrote:
                | I think you're constructing a strawman, arguing about
                | general software libraries. We're talking
                | specifically about math libraries.
               | 
               | Regardless of the strawman, the person(s) that authored
               | the code don't owe you anything. They don't have to step
               | down, make an announcement, or merge your changes just
                | because you can't read or comprehend the license text
                | that says, very clearly and in all capital letters,
                | that the software is warrantied for no purpose
                | whatsoever, implied or otherwise.
               | 
                | If one had a patch and was eager to see it upstreamed
                | quickly, it seems like you're arguing the maintenance
                | status actually doesn't matter, since "[t]hose
                | interested will fork the library and carry the torch"
                | if the patch isn't merged expediently.
               | 
                | But if you're confident the interested will fork and
                | carry the torch, why do you think you're entitled to
                | demand that the author(s) of software warrantied for
                | no purpose step down? That's genuinely deranged, and
                | my insults appear to be accurate descriptions rather
                | than ad hominem attacks, since no coherent
                | explanation has been provided as to why the four
                | reasons given somehow supersede the author's chosen
                | license.
        
               | stanleykm wrote:
               | I don't think I'm saying that at all. There are plenty of
               | little libraries out there written in C89 in 1994 that
               | still work perfectly well today. But they don't claim to
               | use the latest compiler or hardware features to make the
               | compiled binary fast, nor do they come with expectations
               | about how easy or hard it is to integrate. The code
               | simply exists and has not been touched in 30 years. Use
               | at your own peril.
               | 
                | If you have a math library that relies on hardware
                | and compilers to make it fast, you should acknowledge
                | that the software and hardware ecosystem in which you
                | exist is constantly changing even if the math is not.
        
               | infamouscow wrote:
               | > THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
               | CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
               | WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
               | WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
               | PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
               | COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY
               | DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
               | CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
               | PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
               | DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
               | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
               | CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
               | OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
               | SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
               | DAMAGE.
               | 
               | This is a pretty bold and loud acknowledgement.
               | 
                | What more could you really ask for, when even lawyers
                | think this is sufficient?
        
               | stanleykm wrote:
               | > What more could you really ask for
               | 
                | Some signal that the project is being maintained? If
                | it's not, that's fine, but don't go radio silent and
                | get pissy when people ask if a project is dead...
               | 
                | This is not a legal or moral issue; it's about being
                | considerate of others. You, the maintainer, made the
                | choice to maintain this project in public and foster
                | a userbase. This is not a one-way relationship.
                | People spend their time making patches and
                | integrating your software. You are under no
                | obligation to maintain it, of course, but don't be a
                | dick.
        
               | infamouscow wrote:
                | The reason open source maintainers get pissy is
                | because idiots selectively ignore _entire paragraphs
                | of the license_ that explicitly state the project
                | isn't maintained and that you shouldn't imply it is
                | under any circumstances. The author is being
                | extremely considerate. The problem is fools have no
                | respect for the author or chosen license. They'd
                | rather do the opposite of what the author's license
                | says. The only reason we're having this discussion is
                | because there are enough fools that think they might
                | be on to something.
               | 
               | The implication is the mistake, not the author for not
               | being explicit enough.
        
               | stanleykm wrote:
                | The only one being foolish here is you, with needless
                | pedantry. Yes, the legal contract says that the
                | authors don't owe anyone anything, but there is also
                | a social contract at play here that you are
                | apparently not understanding.
        
               | codingstream wrote:
               | I don't recall there ever being a social contract.
               | 
               | Further, what makes you assume everyone is on the same
               | page about what that social contract is? Have you even
               | considered the possibility that there might be
               | differences of opinion on a social contract which are
                | incompatible? That's why the best course of action is
                | to follow the license rather than delusional
                | fantasies.
               | 
               | The idea there's a social contract is sophistry. Plain
               | and simple.
        
           | VHRanger wrote:
           | I mean, it's not like linear algebra has changed that much in
           | 4 years?
        
             | touisteur wrote:
              | Randomized linear algebra and under-solving (mixed
              | precision or fp32 instead of fp64) seem to be taking
              | off more than in the past, mostly on GPU though (use of
              | tensor cores, expensive fp64, memory bandwidth limits).
              | 
              | And I wish Eigen had a larger spectrum of 'solvers' you
              | can choose from, depending on what you want. But in
              | general I agree with you, except there's always a cycle
              | to eke out somewhere, right?
        
             | delfinom wrote:
              | Too many people have had their brains rotted by the web
              | dev world, where things are reinvented every other
              | week.
        
         | flemishgun wrote:
          | I'm surprised people think this; there is also the
          | widely-used Armadillo linear algebra library. In my
          | opinion it has a much nicer syntax.
         | 
         | https://arma.sourceforge.net/
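          | 
          | For a taste of that syntax, a small sketch using
          | Armadillo's documented mat/vec types (values are
          | illustrative):
          | 
          |     #include <armadillo>
          | 
          |     int main() {
          |         // MATLAB-style: A = randu(4,4); x = A \ b;
          |         arma::mat A(4, 4, arma::fill::randu);
          |         arma::vec b(4, arma::fill::randu);
          |         arma::vec x = arma::solve(A, b);
          |         x.print("x:");
          |         return 0;
          |     }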
        
           | UncleOxidant wrote:
           | How's the performance?
           | 
              | EDIT: also, being on SourceForge is kind of a hindrance
              | to discovery these days. I wonder why they chose to be
              | on there instead of GitHub?
        
             | stagger87 wrote:
              | It's slower, but maybe the target audience is
              | different? Armadillo prioritizes MATLAB-like syntax. I
              | use Armadillo as a stepping stone between MATLAB
              | prototypes and a hand-rolled C++ solution, and in many
              | scenarios it can get you a long way down the road.
        
               | flemishgun wrote:
               | Tough to say something as blanket as "it's slower"...
               | there are lots of operations in any linear algebra
               | library. It's not a direct comparison with other C++
                | linear algebra libraries, but it's hard to say
                | Armadillo is slow based on benchmarks like this:
               | 
               | https://conradsanderson.id.au/pdfs/sanderson_curtin_armad
               | ill...
        
       | klaussilveira wrote:
       | If you want something similar, but for games:
       | 
       | https://github.com/EricLengyel/Terathon-Math-Library
        
         | OnionBlender wrote:
         | What is the advantage over glm? The geometric algebra stuff?
        
         | Arelius wrote:
          | Another good PGA library:
         | 
         | https://github.com/jeremyong/Klein
        
       | Solvency wrote:
        | Out of curiosity, when and/or how often do these
        | high-performance math libraries get folded into game physics
        | engines? Like, would Blaze offer any sort of advantage if
        | you were to develop a new 3D soft/hard body physics engine?
        
         | cyber_kinetist wrote:
         | For typical game physics engines... not that much. Math
         | libraries like Eigen or Blaze use lots of template
         | metaprogramming techniques under the hood that can help when
          | you're doing large batched matrix multiplications (since
          | they can remove temporary allocations at compile time and
          | fuse operations efficiently, as well as apply various SIMD
          | optimizations), but they don't really help when you need
          | lots of small operations (with mat3 / mat4 / vec3 / quat /
          | etc.).
         | Typically game physics engines tend to use iterative algorithms
         | for their solvers (Gauss-Seidel, PBD, etc...) instead of
         | batched "matrix"-oriented ones, so you'll get less benefits out
         | of Eigen / Blaze compared to what you typically see in deep
         | learning / scientific computing workloads.
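          | 
          | A minimal sketch of that fusion, assuming Blaze's
          | DynamicVector API:
          | 
          |     #include <blaze/Math.h>
          | 
          |     int main() {
          |         blaze::DynamicVector<double> x(100000UL, 1.0);
          |         blaze::DynamicVector<double> v(100000UL, 2.0);
          |         blaze::DynamicVector<double> y;
          |         // The right-hand side builds a single expression
          |         // template; assignment evaluates it in one fused,
          |         // SIMD-vectorized loop with no temporary vectors.
          |         y = 2.0 * x + 3.0 * v;
          |         return 0;
          |     }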
         | 
          | The codebases I've seen in many game physics engines seem
          | to all roll their own minimal math libraries for this
          | stuff, or even just use SIMD (SSE / AVX) intrinsics
          | directly. Examples:
         | PhysX (https://github.com/NVIDIA-Omniverse/PhysX), Box2D
         | (https://github.com/erincatto/box2d), Bullet
         | (https://github.com/bulletphysics/bullet3)...
        
       | floor_ wrote:
       | I don't know if I would call a math library that uses templates
       | so liberally "high performance". High performance also includes
       | compile time in my opinion.
        
         | murderfs wrote:
         | Your opinion is wrong.
        
           | oivey wrote:
           | Yeah. Avoiding templates almost certainly leads to losing run
           | time performance. The compile time is a drop in the bucket.
        
       ___________________________________________________________________
       (page generated 2024-04-17 23:01 UTC)