[HN Gopher] Faer-rs: Linear algebra foundation for Rust
___________________________________________________________________
Faer-rs: Linear algebra foundation for Rust
Author : nateb2022
Score : 195 points
Date : 2024-04-24 12:41 UTC (10 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| rayrrr wrote:
| How exactly does this dovetail with https://github.com/rust-
| or/good_lp ? Will it be a replacement, an enhancement, or
| something else?
| sestep wrote:
| Seems like a different focus, no? Aren't linear programming
| solvers a much narrower domain than general linear algebra
| libraries?
| fastneutron wrote:
| Correct, linear algebra and linear programming are two very
| distinct things. The latter is a widely-used optimization
| technique, and computationally depends on the former, which
| is a general mathematical framework for essentially all
| numerical computing.
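|
| For concreteness, a linear program is the optimization
| problem "minimize c^T x subject to A x <= b, x >= 0", while
| linear algebra supplies the primitives (matrix
| factorizations, triangular solves) that LP solvers use
| internally.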
| rayrrr wrote:
| I asked because Faer describes itself as a low-level linear
| algebra library with a high-level wrapper, whereas good_lp
| describes itself as a high-level wrapper around many relevant
| low-level libraries.
| constantcrying wrote:
| Linear programming and linear algebra aren't the same thing.
| electrozav wrote:
| That's code for linear programming (optimization) not for
| linear algebra
| andyferris wrote:
| Does anyone understand the difference (in the benchmark tables)
| between faer and faer(par)? Which number should be considered
| important?
| royjacobs wrote:
| If you look at the benchmark code [0] you can see it indicates
| whether Rayon is used for parallelism.
|
| [0] https://github.com/sarah-ek/faer-rs/blob/main/faer-
| bench/src...
| llimllib wrote:
| the former is run with no parallelism, and the latter with
| `parallelism::Rayon(0)`, which presumably means that it has
| some parallelism, though I don't write rust or know what Rayon
| is.
|
| [1]: https://github.com/sarah-ek/faer-
| rs/blob/172f651fafe625a2534...
| nateb2022 wrote:
| Rayon is a library that helps to parallelize sequential
| computations while guaranteeing data-race freedom. Cf.
| https://docs.rs/rayon/latest/rayon/
| nindalf wrote:
| Rayon is a crate that makes it easy to write code that
| executes in parallel. Take a function that looks like this.
| It'll execute on a single thread.
|
| fn sum_of_squares(input: &[i32]) -> i32 {
|     input.iter()
|          .map(|&i| i * i)
|          .sum()
| }
|
| If you change it to input.par_iter(), it executes in
| parallel on multiple threads.
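|
| For reference, a minimal sketch of the parallel version
| (assuming rayon is added as a dependency; its prelude brings
| the par_iter extension trait into scope):
|
| use rayon::prelude::*;
|
| fn sum_of_squares(input: &[i32]) -> i32 {
|     input.par_iter()
|          .map(|&i| i * i)
|          .sum()
| }
|
| Rayon runs the iterator on a work-stealing thread pool, and
| the borrow checker guarantees the closure is data-race free.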
| bayindirh wrote:
| Why is Eigen not run in parallel mode with OpenMP?
|
| Eigen handles most (if not all, I just skimmed the tables)
| tasks in parallel [0]. Plus, it has hand-tuned SIMD code
| inside, so it needs "-march=native -mtune=native -O3" to make
| it "full send".
|
| Some solvers' speeds change by more than 3x with "-O3", to
| begin with.
|
| This is the Eigen benchmark file [1].
|
| [0]: https://eigen.tuxfamily.org/dox/TopicMultiThreading.html
|
| [1]: https://github.com/sarah-ek/faer-rs/blob/main/faer-
| bench/eig...
| sarah-ek wrote:
| author here, eigen is compiled with -fopenmp, which enables
| parallelism by default
| bayindirh wrote:
| Hi! Thanks for chiming in!
|
| Did you check the resource utilization? If you don't set
| "OMP_NUM_THREADS=n", Eigen doesn't auto-parallelize by
| default.
| sarah-ek wrote:
| i did check, yes
| dannyz wrote:
| Related, on the Eigen benchmarking: I see a lot of use of auto
| in the benchmarks. Eigen does not recommend using it like this
| (https://eigen.tuxfamily.org/dox/TopicPitfalls.html) because
| the template expressions can be quite complicated. I'm not sure
| if it matters here or not, but it would probably be better not
| to use it in the benchmarks.
| sarah-ek wrote:
| i've contributed to eigen in the past and know enough about
| the internals of the codebase to know my way around safe
| `auto` usage
| dannyz wrote:
| I wasn't worried about safe usage, more that some of the
| initialization may end up inside the benchmarked function
| instead of outside it, as intended. I'm sure you know more
| about it than me, though.
| bayindirh wrote:
| While auto is only a compile-time burden, it creates a lot of
| load during the compilation of this benchmark.
|
| My complete Ph.D. codebase, using a ton of Eigen components
| plus other libraries, compiled in 10 seconds flat on a much
| older computer. This benchmark requires gigs of RAM plus a
| minute.
| sarah-ek wrote:
| the long compile times are mostly because im instantiating
| every dense decomposition in the library in one translation
| unit, for several data types (f32, f64, f128, c32, c64,
| c128)
| citizen_friend wrote:
| Is there a rust API for LAPACK?
| sanxiyn wrote:
| Yes, see https://github.com/blas-lapack-rs.
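|
| A rough sketch of what calling it looks like with the
| `lapack` crate from that org (this assumes a provider crate
| such as `lapack-src` is linked in; `dgesv` solves A * X = B
| by LU with partial pivoting, with data in column-major
| order):
|
| use lapack::dgesv;
|
| fn main() {
|     let n = 3;
|     // 3x3 symmetric matrix A, column-major.
|     let mut a = vec![3.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0, 3.0];
|     let mut b = vec![15.0, 10.0, 10.0]; // right-hand side
|     let mut ipiv = vec![0i32; n as usize]; // pivot indices
|     let mut info = 0;
|     // LAPACK routines are exposed as unsafe FFI wrappers.
|     unsafe { dgesv(n, 1, &mut a, n, &mut ipiv, &mut b, n, &mut info) };
|     assert_eq!(info, 0); // 0 = success; b now holds the solution
| }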
| meisel wrote:
| Looking at thin matrix SVD, it appears _much_ faster than
| everyone else. I'm curious what it's doing differently at a high
| level and if there's any tradeoff in accuracy. I also wonder how
| it compares to MKL, which is typically the winner in all these
| benchmarks on Intel.
| sarah-ek wrote:
| im in the process of refactoring the benchmark code at the
| moment, and plan to include mkl in the benches soon.
|
| overall, the results show that faer is usually faster than or
| even with openblas, and slower than mkl on my desktop
| mjhay wrote:
| Wow, that's impressive! I wouldn't expect anything to be able
| to beat MKL, given the optimizations made based on
| proprietary information.
| netwrt wrote:
| Does this include an einsum function? I find that it makes
| complex operations easier to work with
| sarah-ek wrote:
| not yet. a tensor api is on the long term todo list, but it's a
| big undertaking and i'd like to focus on matrix operations for
| the time being
| Ericson2314 wrote:
| Nixpkgs has some pluggable BLAS/LAPACK implementation infra.
| If this offers a shim layer exposing exactly that interface,
| it would be nice to see it packaged as a new alternative!
| heinrichhartman wrote:
| This does not seem to depend on BLAS/LAPACK.
|
| Good to see LU decomposition with full pivoting being implemented
| here (which is missing from BLAS/LAPACK). This gives a fast,
| numerically stable way to compute the rank of a matrix (with a
| basis of the kernel and image spaces). Details:
| https://www.heinrichhartmann.com/posts/2021-03-08-rank-decom....
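|
| The rank-counting step itself is tiny once you have the
| factorization. A minimal sketch (hypothetical helper; with
| full pivoting the diagonal of U is non-increasing in
| magnitude, so the first small pivot marks the rank):
|
| // P * A * Q = L * U; count pivots above a tolerance.
| fn rank_from_lu(u_diag: &[f64], tol: f64) -> usize {
|     u_diag.iter().take_while(|d| d.abs() > tol).count()
| }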
| sarah-ek wrote:
| lapack does expose a full pivoting lu as far as i can tell?
| https://netlib.org/lapack/explore-html/d8/d4d/group__getc2_g...
| bee_rider wrote:
| If you are going to include vs MKL benchmarks in your repo,
| full pivoting LU might be one to consider. I think most
| people are happy with partial pivoting, so I sorta suspect
| Intel hasn't heavily tuned their implementation, might be
| room to beat up on the king of the hill, haha.
| sarah-ek wrote:
| funny you mention that, full pivoting lu is one of the few
| benchmarks where faer wins by a huge margin
|
| n       faer        mkl        openblas
| 1024    27.06 ms    186.33 ms  793.26 ms
| 1536    73.57 ms    605.71 ms  2.65 s
| 2048    280.74 ms   1.53 s     8.99 s
| 2560    867.15 ms   3.31 s     17.06 s
| 3072    1.87 s      6.13 s     55.21 s
| 3584    3.42 s      10.18 s    71.56 s
| 4096    6.11 s      15.70 s    168.88 s
| adgjlsfhk1 wrote:
| it might be interesting to add butterfly lu
| https://arxiv.org/abs/1901.11371. it's a way of doing a
| numerically stable lu-like factorization without any
| pivoting, which allows it to parallelize better.
| sarah-ek wrote:
| looks interesting! thanks for sharing
| bee_rider wrote:
| It looks like they are describing a preconditioner there.
| adgjlsfhk1 wrote:
| the key point is that the preconditioner allows you to
| skip pivoting, which is really nice because pivoting
| introduces a lot of data dependence.
| heinrichhartman wrote:
| Thanks for pointing this out. Looks like it's only the Python
| bindings that are missing: the routine isn't exposed in numpy.
| bmitc wrote:
| Is there a collection of algorithms published somewhere, in a
| coherent manner? It seems to me that there should be a
| collection of matrix algebra, symbolic algebra, etc.
| algorithms somewhere that makes it easy to implement these in
| any language or system. As it is, it seems you need to already
| be an expert or researcher in the space to make any movement.
| rvdca wrote:
| One could argue that libraries like BLAS/LAPACK are those
| collections...
| bmitc wrote:
| That can be argued, but they're pretty esoteric. For
| example, what language are they written in? Fortran,
| assembly, etc.? Probably nothing the majority of developers
| or mathematicians can understand. And creating bindings
| for them is non-trivial. I don't even know where to start
| from the website. And on the website, all the links to the
| routines are broken. https://www.netlib.org/blas/
| sevagh wrote:
| BLIS is an interesting new direction in that regard:
| https://github.com/flame/blis
|
| >The BLAS-like Library Instantiation Software (BLIS)
| framework is a new infrastructure for rapidly instantiating
| Basic Linear Algebra Subprograms (BLAS) functionality. Its
| fundamental innovation is that virtually all computation
| within level-2 (matrix-vector) and level-3 (matrix-matrix)
| BLAS operations can be expressed and optimized in terms of
| very simple kernels.
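|
| A toy sketch of the microkernel idea: a tiny fixed-size block
| update, around which the level-3 routines are built by
| packing and tiling loops (real BLIS kernels use SIMD and
| architecture-specific register blocking):
|
| // C += A * B for one 4x4 block, row-major.
| fn microkernel_4x4(a: &[f64; 16], b: &[f64; 16], c: &mut [f64; 16]) {
|     for i in 0..4 {
|         for k in 0..4 {
|             let aik = a[i * 4 + k];
|             for j in 0..4 {
|                 c[i * 4 + j] += aik * b[k * 4 + j];
|             }
|         }
|     }
| }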
| sevagh wrote:
| On the contrary, it seemingly can be used to make a BLAS
| implementation (example in a PR: https://github.com/sarah-
| ek/faer-rs/pull/37)
| arijun wrote:
| In a few areas of Rust we are starting to see a convergence
| on lower-level libraries that can then be shared amongst the
| higher-level crates. For example, wgpu is seeing broad use
| across a bunch of libraries, from game engines to UI
| libraries. This then allows the shared libraries to be made
| more robust, with more shared resources going into them.
|
| Does anyone know how much of this is happening in the
| matrix/array space in rust? There are several libraries that have
| overlapping goals: ndarray, nalgebra, etc. How much do they share
| in terms of underlying code? Do they share data structures, or is
| anything like that on the horizon?
| antimora wrote:
| WGPU is also used by the Burn deep learning framework and by
| WONNX for computation.
| sarah-ek wrote:
| as far as i know, very little is shared. ndarray-linalg is
| mostly a lapack wrapper. nalgebra and faer both implement the
| algorithms from scratch, with nalgebra focusing more on smaller
| matrices
| owlbite wrote:
| Something looks dubious with the benchmarking here to me.
|
| Top-tier numerical linear algebra libraries should all hit
| the same number (give or take a few percent) for matrix
| multiply, because they're all achieving the same hardware
| peak performance.
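|
| The back-of-the-envelope check: dense matmul costs about
| 2 * n^3 flops, and the hardware roof is cores x flops/cycle
| x clock. A sketch with made-up numbers for a hypothetical
| 8-core AVX2 machine (two 4-wide f64 FMA units per core = 16
| flops/cycle, at 3.0 GHz):
|
| fn main() {
|     let peak_gflops = 8.0 * 16.0 * 3.0; // 384 GFLOP/s peak
|     let n = 4096.0_f64;
|     let flops = 2.0 * n * n * n; // ~1.37e11 flops for matmul
|     println!("ideal time: {:.3} s", flops / (peak_gflops * 1e9));
| }
|
| Any library within a few percent of that ideal time is at the
| roof, which is why the tuned BLASes cluster together on big
| matmuls.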
| sarah-ek wrote:
| one issue that may be affecting the result is that openmp's
| threadpool doesn't play well with rayon's. i've seen some perf
| degradation in the past (on both sides) when both are used in
| the same program
|
| i plan to address that after refactoring the benches by
| executing each library individually
| jimbobraggins wrote:
| Very cool project! Before running the new benchmarks, I'd
| suggest reaching out to the developers of the packages you
| are testing against to see whether they think the benchmark
| you wrote uses efficient calling conventions. I work on a
| large open source software project and we've had people claim
| they are 10x faster than us while they were really using our
| code in some very malformed ways.
|
| Also, it stops them from grumbling after you post good
| results!
| sarah-ek wrote:
| fair enough. i try to stay in touch with the eigen and
| nalgebra developers, so i have a good idea of how to write
| code efficiently with them. for openblas and mkl i've been
| trying recently to call into the lapack api (benches doing
| that are still unpublished at the moment), so that im using
| a universal interface for that kinda stuff.
|
| and of course i do check the cpu utilization to make sure
| that all threads are spinning for multithreaded benchmarks,
| and occasionally check the assembly of the hot loops to make
| sure that the libraries were built properly and are
| dispatching to the right code (avx2, avx512, etc). so
| overall, i try to take it seriously, and i'll give credit
| where credit is due when it turns out another library is
| faster
| loeg wrote:
| A bit of a tangent, but the same author has something like a
| libdivide (C++) for Rust: https://github.com/sarah-ek/fastdiv .
| Cool.
___________________________________________________________________
(page generated 2024-04-24 23:01 UTC)