[HN Gopher] High-performance computing, with much less code
___________________________________________________________________
High-performance computing, with much less code
Author : mpweiher
Score : 91 points
Date : 2025-03-14 13:53 UTC (2 days ago)
(HTM) web link (news.mit.edu)
| fancyfredbot wrote:
| Previously discussed in this thread (which links to the actual
| code rather than the press release):
| https://news.ycombinator.com/item?id=43365734
| somethingsome wrote:
| Halide is mainly for image processing, while Exo seems oriented
| toward more general computation. Do they differ in other ways?
| gotoeleven wrote:
| Does anyone know of literature describing why compilers can't do
| these optimizations on top of vanilla code, without requiring an
| extra DSL? Is there just no ergonomic way to express the extra
| information about what operation orderings are optimal or allowed
| for specific platforms to the compiler?
| hansvm wrote:
| > on top of vanilla code
|
| > without requiring an extra DSL
|
| Give or take some nitpicking about where a language extension
| stops and a DSL begins, those things are the same. The only
| question is the syntax. The key thing the referenced paper
| shows is a set of composable operations (extremely nice
| property to have in a language).
|
| There are other solutions. E.g., Polly [0] is capable of
| creating the 6-loop tiled square matrix multiplication routine
| from the vanilla 3-loop implementation. Much like auto-
| vectorization though, there are limits. E.g., if changing the
| scheduling also requires a change in your data representation
| -- like AoSoA rather than AoS or SoA to fit collocated data
| into a cache line -- you'll have to manually write the
| scheduling code as well since a vanilla loop no longer indexes
| the right things.
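
To make the data-representation point above concrete, here is a
minimal sketch in C of the same reduction written against an AoS
layout and an AoSoA layout. The struct names, the LANES block
width, and the helper functions are all hypothetical and chosen
only for illustration; they do not come from the thread or from
Exo. The point is that the AoSoA version needs a different loop
nest (block index plus lane index), so a loop written for the AoS
layout cannot simply be rescheduled into it.

```c
#include <stddef.h>

/* Hypothetical block width, chosen only for illustration. */
#define LANES 8

/* AoS: one struct per particle; the fields of a particle are adjacent. */
struct ParticleAoS { float x, y, z; };

/* AoSoA: an array of small blocks; inside a block, each field's
 * LANES values are contiguous. */
struct ParticleBlock { float x[LANES], y[LANES], z[LANES]; };

/* Sum of x over n particles, AoS layout: a single flat index. */
float sum_x_aos(const struct ParticleAoS *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;
    return s;
}

/* Same sum, AoSoA layout: the flat index splits into (block, lane),
 * and the last block may be only partially filled. */
float sum_x_aosoa(const struct ParticleBlock *b, size_t n) {
    float s = 0.0f;
    size_t nblocks = (n + LANES - 1) / LANES;
    for (size_t blk = 0; blk < nblocks; blk++) {
        size_t remaining = n - blk * LANES;
        size_t lanes = remaining < LANES ? remaining : LANES;
        for (size_t l = 0; l < lanes; l++)
            s += b[blk].x[l];
    }
    return s;
}
```

Both functions compute the same value for the same logical data;
only the indexing, and therefore the loop structure, differs.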
|
| [0] https://polly.llvm.org/
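
For readers unfamiliar with the 3-loop versus 6-loop distinction
mentioned above, the following is a hand-written sketch in C of
both shapes. It is not Polly's actual output; the function names
and the `tile` parameter are assumptions made for illustration,
and a real tool would also pick the tile size and loop order for
the target cache hierarchy.

```c
#include <assert.h>
#include <stddef.h>

/* Vanilla 3-loop multiply: C = A * B, all n x n, row-major. */
void matmul_naive(size_t n, const float *A, const float *B, float *C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

static size_t min_sz(size_t a, size_t b) { return a < b ? a : b; }

/* 6-loop tiled multiply: three outer loops step over tiles, three
 * inner loops work inside one tile. min_sz handles the case where
 * n is not a multiple of the tile size. */
void matmul_tiled(size_t n, size_t tile,
                  const float *A, const float *B, float *C) {
    assert(tile > 0);
    for (size_t idx = 0; idx < n * n; idx++)
        C[idx] = 0.0f;
    for (size_t ii = 0; ii < n; ii += tile)
        for (size_t kk = 0; kk < n; kk += tile)
            for (size_t jj = 0; jj < n; jj += tile)
                for (size_t i = ii; i < min_sz(ii + tile, n); i++)
                    for (size_t k = kk; k < min_sz(kk + tile, n); k++)
                        for (size_t j = jj; j < min_sz(jj + tile, n); j++)
                            C[i * n + j] += A[i * n + k] * B[k * n + j];
}
```

The tiled version performs the same multiply-adds as the naive
one, just grouped so that each tile of A, B, and C is reused while
it is likely still in cache. Because floating-point addition is
not associative, results can differ in the last bits for general
inputs.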
| achierius wrote:
| One benefit of making a 'properly new' language is that you
| don't have to support old code; I've definitely seen a lot of
| cool optimizations dropped because while they'd help in one
| place, they'd blow up a loop somewhere else, and ultimately
| there's only so far you can go with using heuristics to make
| your optimization apply to one but not the other. Sometimes
| (e.g. when taking advantage of UB) the effect is even
| functional: 'breaking' some pieces of non-conformant code in
| exchange for better performance elsewhere. This is especially
| relevant in broad-domain languages like C/C++; Linus in
| particular has opined multiple times against the GCC devs
| choosing to implement new optimizations that ended up
| breaking code which had been working fine in Linux for years
| up to that point.
| dzaima wrote:
| You can quite often write C code that compilers will
| autovectorize into what you want, if the thing in question is
| a simple case or you put effort into doing the non-trivial
| aspects yourself.
|
| e.g. for the loop reordering example [1] in C you could just,
| well, manually reorder the loops; and the simple loop would
| autovectorize to AVX2 just fine.
|
| The things that are messy-to-impossible to achieve in current
| traditional compilers are things where the compiler thinks it
| does a smart transformation, but it ends up making things
| worse.
|
| [1]: https://github.com/exo-lang/exo/tree/main/examples/avx2_matm...
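
As a rough illustration of the manual loop reordering described
above, here is a sketch in C of a matrix multiply in i-j-k order
and the same computation in i-k-j order. This is not the code from
the linked Exo example; the function names are hypothetical. In
the reordered version the innermost loop walks both C and B with
unit stride, which is the kind of loop GCC and Clang can typically
autovectorize at -O3 with -mavx2, though whether they do depends
on the compiler version and flags.

```c
#include <stddef.h>

/* i-j-k order: the inner loop reads B with stride n and reduces
 * into a scalar, which is harder for a compiler to vectorize. */
void matmul_ijk(size_t n, const float *restrict A,
                const float *restrict B, float *restrict C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++) {
            float acc = 0.0f;
            for (size_t k = 0; k < n; k++)
                acc += A[i * n + k] * B[k * n + j];
            C[i * n + j] = acc;
        }
}

/* i-k-j order: the inner loop is unit-stride over a row of C and a
 * row of B, with A[i][k] hoisted out as a scalar. */
void matmul_ikj(size_t n, const float *restrict A,
                const float *restrict B, float *restrict C) {
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++)
            C[i * n + j] = 0.0f;
        for (size_t k = 0; k < n; k++) {
            float a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
    }
}
```

The `restrict` qualifiers promise the compiler that the three
arrays do not overlap, which removes one common obstacle to
vectorization; callers must actually honor that promise.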
| CyberDildonics wrote:
| This looks like they are rebranding Halide into something that
| can be sold as "AI".
| boxed wrote:
| I wonder how this would compete against something like Mojo.
| convolvatron wrote:
| idk if Mojo is 'the answer', but eventually, a compiled
| language with first class tensors and concurrency is necessary
| to simultaneously support heterogeneity, performance, and
| usability. https://chapel-lang.org is the only other one I know
| of that's currently in active development (?)
___________________________________________________________________
(page generated 2025-03-16 23:01 UTC)