[HN Gopher] Blaze: A High Performance C++ Math Library
___________________________________________________________________
Blaze: A High Performance C++ Math Library
Author : optimalsolver
Score : 68 points
Date : 2024-04-17 10:15 UTC (12 hours ago)
(HTM) web link (bitbucket.org)
(TXT) w3m dump (bitbucket.org)
| stargrazer wrote:
| Is this represented here for posterity? The last news item is
| from 15.8.2020, though there are recent commits for compiler
| compatibility testing (Feb 2024).
|
| What is of import here?
| sevagh wrote:
| People can post whatever they want on HN. It's a neat library,
| why not post it?
|
| Previous post (by the same submitter):
| https://news.ycombinator.com/item?id=34407106
| dannyz wrote:
| It seems like every large project these days has coalesced around
| Eigen, what are some of the advantages that Blaze has over Eigen?
| queuebert wrote:
| Or cuBLAS. In practice, if I'm going through the trouble to
| rewrite math in C++, I'd rather just make GPU kernels.
| VHRanger wrote:
| I mean, that only works for a small subset of workloads where
| the data movement patterns fit, the bandwidth is more
| important than the latency, etc.
|
| The reality is that almost no workloads come anywhere near
| saturating the AVX throughput available on CPUs since
| Haswell.
| queuebert wrote:
| Depends on whether you measure workloads as "jobs" or
| "flops". If "flops", I would hazard that the bulk of
| computing on the planet right now is happening on GPUs.
| chrsig wrote:
| I'm by no means an expert in the topic, but to share my
| take anyway: It seems to me there are just diminishing
| returns in SIMD approaches. If you're going to organize
| your data well for SIMD use, then it's not a far reach to
| make it work well on a GPU, which will keep getting more
| cores.
|
| I imagine we'll get to a point where CPUs are actually just
| pretty dumb drivers for issuing GPU commands.
| gdiamos wrote:
| As someone who worked on CUDA 15 years ago - it's amazing
| to me that someone on the internet posted this statement.
|
| Did GPUs win?
| thrtythreeforty wrote:
| Yes and no. The compute density and memory bandwidth are
| unmatched. But the programming model is markedly worse,
| even for something like CUDA: you inherently have to
| think about parallelism, how to organize data, write your
| kernels in a special language, deal with wacky
| toolchains, and still get to deal with the CPU and
| operating system.
|
| There is great power in the convenience of "with
| open('foo') as f:". Most workloads are still stitching
| together I/O bound APIs, not doing memory-bound or CPU-
| bound compute.
| gdiamos wrote:
| CUDA was always harder to program - even if you could get
| better perf
|
| It took a long time to find something that really took
| advantage of it, but we did eventually. CUDA enabled deep
| learning, which enabled LLMs. That's history.
|
| What surprised me about the statement was that it implied
| that the model of Python driving optimized GPU kernels
| was broader than deep learning.
|
| That was the original vision of CUDA - most of the
| computational work being done by massively parallel cores
| chrsig wrote:
| I don't think that there's a "win" here. It's just sort
| of which way you tilt your head: how much space do you
| have to cram a ton of cores connected to a really wide
| memory bus, and how close can you get the storage while
| keeping everything from catching on fire, no? ("just sort
| of" is going to have to skip leg day because of the
| herculean lift it just did)
|
| It's a fairly fractal pattern in distributed computing.
| Move the high throughput heavy computation bits away from
| the low latency responsive bits ("low latency" here is
| relative to the total computation). Use an event loop for
| the reactive bits. Eventually someone will invert the
| event loop to use coroutines so everything looks
| synchronous (Go, anyone? Python's gevent?).
|
| After that, it seems to me the only real question is
| whether it takes too long or costs too much to move the
| data to the storage location the heavy computation
| hardware uses. There's really not much of a conceptual
| difference between Airflow driving Snowflake and C++
| running on a CPU driving CUDA kernels. It takes a certain
| scale to make going from an OLTP database to an OLAP
| database worth it, just like it takes a certain scale to
| make a GPU worth it over SIMD instructions on the local
| processor.
| CyberDildonics wrote:
| Win what? This person said they were inexperienced. SIMD
| is extremely valuable and the situations where it works
| well are not rare at all.
| VHRanger wrote:
| Not really, no.
|
| GPUs are still very limited, even compared to the SIMD
| instruction set. You couldn't build a CUDA equivalent of
| the simdjson library, for example, because the GPU
| doesn't handle SIMD-style branching in a way that
| accommodates it.
|
| Second, again, the latency issue. GPUs are only good if
| you have a pipeline of data to constantly feed them, so
| that the PCIe transfer latency issue is minimal.
| Const-me wrote:
| > almost all workloads aren't anywhere near saturating the
| AVX instruction max bandwidth on a CPU since Haswell
|
| That's true, but GPUs aren't only good at FLOPs; their
| memory bandwidth is also an order of magnitude higher than
| system memory's.
|
| In my previous computer, the numbers were 484 GB/second for
| 1080 Ti, and 50 GB/second for DDR4 system memory. In my
| current one, they are 672 GB/second for 4070 Ti super, and
| 74 GB/second for DDR5 system memory.
| oispakaljaa wrote:
| According to the provided benchmarks [1], it seems to be quite
| a bit faster.
|
| [1] https://bitbucket.org/blaze-lib/blaze/wiki/Benchmarks
| dannyz wrote:
| These benchmarks look to be ~8 years old, and don't really
| agree with benchmarks done by other sources
| (https://romanpoya.medium.com/a-look-at-the-performance-of-
| ex..., https://eigen.tuxfamily.org/index.php?title=Benchmark)
|
| In general I would be skeptical about any benchmark that
| claims to beat MKL significantly on standard operations.
| adgjlsfhk1 wrote:
| Beating MKL for <100x100 is pretty doable. The BLAS
| framework has a decent amount of inherent overhead, so just
| exposing a better API (e.g. one that specifies the array
| types and sizes well) makes it pretty easy to improve
| things. For big sizes though, MKL is incredibly good.
| VHRanger wrote:
| Compile times for one.
|
| Eigen uses C++ templates to do most things, which explodes
| compile times.
| planede wrote:
| AFAIK Blaze is also somewhat heavy on templates, but maybe it
| uses more modern metaprogramming techniques.
| a_t48 wrote:
| Compile times and binary sizes :(
| 1over137 wrote:
| Is Eigen still alive? There's been no release in 3 years, and
| no news about it:
| https://gitlab.com/libeigen/eigen/-/issues/2699
| sevagh wrote:
| The master branch is active and people use Eigen today. The
| Discord has maintainers that are still active. Not sure how
| it could be considered "dead"?
| infamouscow wrote:
| The rise of frontend developers over the last 5 years
| learned everything must be new.
|
| That a math library of all things could be complete is
| several orders of thinking beyond their ability. I'm sure
| the gut reaction is to downvote this for the embarrassing
| criticism, but in all seriousness, this is the right
| answer.
| klaussilveira wrote:
| What? You mean I don't need to refactor and break API
| every 6 months?
| sevagh wrote:
| I realize asking for a new 4.0 release is fair (and the
| GitLab issue does have a highly upvoted request for a
| release).
|
| But you can't just call things "dead" for no reason, it's
| in poor taste. It's feature-complete, not dead!
| stanleykm wrote:
| Sure, code can be "feature complete", but the reality is
| that the rest of the world changes, so there will be more
| and more friction for your users over time. For example,
| someone in the issue mentions they need to use mainline to
| use Eigen with CUDA now.
| infamouscow wrote:
| Mathematics is a priori. It's beyond the world changing.
| You might be surprised to learn we still use Euclid's
| geometry despite it being thousands of years old.
|
| What you're actually saying is you expect open source
| maintainers to add arbitrary functionality for free.
| imadj wrote:
| > Mathematics is a priori
|
| Sure, but the discussion here is about a software library
| not the math concepts
| infamouscow wrote:
| Software programs are equivalent to mathematical proofs.
| [1]
|
| Short of a bug in the implementation, there has yet to be
| a valid explanation for why mathematics libraries need to
| be continuously maintained. If I published an NPM library
| called left-add, which adds the left parameter to the
| right parameter (read: addition) how long, exactly,
| should I expect to maintain this for others?
|
| The only explanation so far is that scumbags expect open
| source library maintainers to slave away indefinitely.
| The further we steer into the weeds of ignorant
| explanations, the more I'm inclined to believe this
| really is the underlying rationale.
|
| 1: https://en.wikipedia.org/wiki/Curry%E2%80%93Howard_cor
| respon...
| imadj wrote:
| There are many reasons why a library requires continuous
| maintenance even when it's "feature-complete"; off the top
| of my head:
|
| 1. Bug fixes
|
| 2. Security issues
|
| 3. Optimization
|
| 4. Compatibility/Adapt to landscape changes
|
| People pointing out flaws in a library aren't "scumbags
| that expect open source library maintainers to slave away
| indefinitely".
|
| No one is forcing the maintainer to "slave away", they
| can step down any time and say I'm not up for this role
| anymore. Those interested will fork the library and carry
| the torch.
|
| No need to be so defensive and insult others just for
| giving feedback.
| infamouscow wrote:
| I think you're constructing a strawman, arguing about
| general software libraries. We're talking specifically
| about math libraries.
|
| Regardless of the strawman, the person(s) that authored
| the code don't owe you anything. They don't have to step
| down, make an announcement, or merge your changes just
| because you can't read or comprehend the license text that
| says very clearly, in all capital letters, that the
| software is warrantied for no purpose whatsoever, implied
| or otherwise.
|
| If one had a patch and was eager to see it upstreamed
| quickly, it seems like you're arguing the maintenance
| status actually doesn't matter. Since "[t]hose interested
| will fork the library and carry the torch" if the patch
| isn't merged expediently.
|
| But if you're confident the interested will fork and carry
| the torch, why do you think you're entitled to demand that
| the author(s) of software warrantied for no purpose step
| down? That's genuinely deranged, and my insults appear to
| be accurate descriptions rather than ad hominem attacks,
| since no coherent explanation has been provided as to why
| the four reasons given somehow supersede the author's
| chosen license.
| stanleykm wrote:
| I don't think I'm saying that at all. There are plenty of
| little libraries out there written in C89 in 1994 that
| still work perfectly well today. But they don't claim to
| use the latest compiler or hardware features to make the
| compiled binary fast, nor do they come with expectations
| about how easy or hard it is to integrate. The code
| simply exists and has not been touched in 30 years. Use
| at your own peril.
|
| If you have a math library that is relying on hardware
| and compilers to make it fast you should acknowledge that
| the software and hardware ecosystem in which you exist is
| constantly changing even if the math is not.
| infamouscow wrote:
| > THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND
| CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED
| WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
| WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A
| PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE
| COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY
| DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
| CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
| PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
| DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
| CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
| CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE
| OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
| SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH
| DAMAGE.
|
| This is a pretty bold and loud acknowledgement.
|
| What more could you really ask for, when even lawyers
| think this is sufficient?
| stanleykm wrote:
| > What more could you really ask for
|
| Some signal that the project is being maintained? If it's
| not, that's fine, but don't go radio silent and get pissy
| when people ask if a project is dead...
|
| This is not a legal or moral issue; it's about being
| considerate of others as well. You, the maintainer, made
| the choice to maintain this project in public and foster a
| userbase. This is not a one-way relationship. People spend
| their time making patches and integrating your software.
| You are under no obligation to maintain it, of course, but
| don't be a dick.
| infamouscow wrote:
| The reason open source maintainers get pissy is because
| idiots selectively ignore _entire paragraphs of the
| license_ that explicitly state the project isn't
| maintained and that you shouldn't imply it is under any
| circumstances. The author is being extremely considerate.
| The problem is fools have no respect for the author or
| chosen license. They'd rather do the opposite of what the
| author's license says. The only reason we're having this
| discussion is because there's enough fools that think they
| might be on to something.
|
| The implication is the mistake, not the author for not
| being explicit enough.
| stanleykm wrote:
| The only one being foolish here is you, with needless
| pedantry. Yes, the legal contract says that the authors
| don't owe anyone anything, but there is also a social
| contract at play here that you are apparently not
| understanding.
| codingstream wrote:
| I don't recall there ever being a social contract.
|
| Further, what makes you assume everyone is on the same
| page about what that social contract is? Have you even
| considered the possibility that there might be
| differences of opinion on a social contract which are
| incompatible? It's why the best course of action is to
| follow the license rather than delusional fantasies.
|
| The idea there's a social contract is sophistry. Plain
| and simple.
| VHRanger wrote:
| I mean, it's not like linear algebra has changed that much in
| 4 years?
| touisteur wrote:
| Randomized linear algebra and under-solving (mixed
| precision, or fp32 instead of fp64) seem to be taking off
| more than in the past, mostly on GPU though (use of tensor
| cores, expensive fp64, memory bandwidth limits).
|
| And I wish Eigen had a larger spectrum of 'solvers' you
| could choose from, depending on what you want. But in
| general I agree with you, except there's always a cycle to
| eke out somewhere, right?
| delfinom wrote:
| Too many people have their brain rotted from the web dev
| world where things are reinvented every other week.
| flemishgun wrote:
| I'm surprised people think this; there is also the widely-
| used Armadillo linear algebra library. In my opinion it
| has a much nicer syntax.
|
| https://arma.sourceforge.net/
| UncleOxidant wrote:
| How's the performance?
|
| EDIT: also, being on SourceForge is kind of a hindrance to
| discovery these days. I wonder why they chose to be there
| instead of GitHub?
| stagger87 wrote:
| It's slower, but maybe the target audience is different?
| Armadillo prioritizes MATLAB-like syntax. I use Armadillo
| as a stepping stone between MATLAB prototypes and a hand-
| rolled C++ solution, and in many scenarios it can get you
| a long way down the road.
| flemishgun wrote:
| It's tough to say something as blanket as "it's slower"...
| there are lots of operations in any linear algebra
| library. It's not a direct comparison with other C++
| linear algebra libraries, but it's hard to say Armadillo
| is slow based on benchmarks like this:
|
| https://conradsanderson.id.au/pdfs/sanderson_curtin_armad
| ill...
| klaussilveira wrote:
| If you want something similar, but for games:
|
| https://github.com/EricLengyel/Terathon-Math-Library
| OnionBlender wrote:
| What is the advantage over glm? The geometric algebra stuff?
| Arelius wrote:
| Another good PGA library
|
| https://github.com/jeremyong/Klein
| Solvency wrote:
| out of curiosity, when and/or how often do these high-performance
| math libraries get folded into game physics engines? Like would
| Blaze offer any sort of advantage if you were to develop a
| new 3D soft/hard-body physics engine?
| cyber_kinetist wrote:
| For typical game physics engines... not that much. Math
| libraries like Eigen or Blaze use lots of template
| metaprogramming techniques under the hood that can help when
| you're doing large batched matrix multiplications (since they
| can remove temporary allocations at compile time and fuse
| operations efficiently, as well as apply various SIMD
| optimizations), but they don't really help when you need lots
| of small operations (with mat3 / mat4 / vec3 / quat / etc.).
| Typically game physics engines tend to use iterative
| algorithms for their solvers (Gauss-Seidel, PBD, etc.)
| instead of batched "matrix"-oriented ones, so you'll get less
| benefit out of Eigen / Blaze compared to what you typically
| see in deep learning / scientific computing workloads.
|
| The codebases I've seen in many game physics engines seem to
| all roll their own minimal math libraries for these stuff, or
| even just use SIMD (SSE / AVX) intrinsics directly. Examples:
| PhysX (https://github.com/NVIDIA-Omniverse/PhysX), Box2D
| (https://github.com/erincatto/box2d), Bullet
| (https://github.com/bulletphysics/bullet3)...
| floor_ wrote:
| I don't know if I would call a math library that uses templates
| so liberally "high performance". High performance also includes
| compile time in my opinion.
| murderfs wrote:
| Your opinion is wrong.
| oivey wrote:
| Yeah. Avoiding templates almost certainly means losing
| run-time performance. The compile time is a drop in the
| bucket.
___________________________________________________________________
(page generated 2024-04-17 23:01 UTC)