
       [HN Gopher] High-performance computing, with much less code
       ___________________________________________________________________
        
       High-performance computing, with much less code
        
       Author : mpweiher
       Score  : 91 points
       Date   : 2025-03-14 13:53 UTC (2 days ago)
        
 (HTM) web link (news.mit.edu)
 (TXT) w3m dump (news.mit.edu)
        
       | fancyfredbot wrote:
       | Previously discussed in this thread (which links to the actual
       | code rather than the press release):
       | https://news.ycombinator.com/item?id=43365734
        
       | somethingsome wrote:
       | Halide is mainly for image processing, while Exo seems aimed at
       | more general computation. Do they differ in other ways?
        
       | gotoeleven wrote:
       | Does anyone know of literature describing why compilers can't do
       | these optimizations on top of vanilla code, without requiring an
       | extra DSL? Is there just no ergonomic way to express the extra
       | information about what operation orderings are optimal or allowed
       | for specific platforms to the compiler?
        
         | hansvm wrote:
         | > on top of vanilla code
         | 
         | > without requiring an extra DSL
         | 
         | Give or take some nitpicking about where a language extension
         | stops and a DSL begins, those things are the same. The only
         | question is the syntax. The key thing the referenced paper
         | shows is a set of composable operations (extremely nice
         | property to have in a language).
         | 
         | There are other solutions. E.g., Polly [0] is capable of
         | creating the 6-loop tiled square matrix multiplication routine
         | from the vanilla 3-loop implementation. Much like auto-
         | vectorization though, there are limits. E.g., if changing the
         | scheduling also requires a change in your data representation
         | -- like AoSoA rather than AoS or SoA to fit collocated data
         | into a cache line -- you'll have to manually write the
         | scheduling code as well since a vanilla loop no longer indexes
         | the right things.
         | 
         | [0] https://polly.llvm.org/
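
To make the tiling and data-layout points above concrete, here is a minimal C sketch. It is not output from Polly or Exo. The sizes `N`, `T`, and `LANES`, the `Particle*` structs, and every function name are illustrative assumptions. The first pair of functions shows the vanilla 3-loop multiply next to a hand-written 6-loop tiled form of the kind a polyhedral optimizer aims to derive. The second pair shows why a layout change to AoSoA cannot be expressed by rescheduling alone: the loop has to index blocks and lanes instead of plain elements.

```c
#include <assert.h>
#include <stddef.h>

#define N 64     /* matrix dimension (assumed; must be a multiple of T) */
#define T 8      /* tile edge length (assumed) */
#define LANES 8  /* elements per AoSoA block (assumed) */

/* Vanilla 3-loop multiply: C += A * B, row-major N x N matrices. */
void matmul_naive(const float *A, const float *B, float *C) {
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                C[i * N + j] += A[i * N + k] * B[k * N + j];
}

/* 6-loop tiled form: the same multiply-adds, visited one T x T tile at a
   time so the working set of the three inner loops stays small. */
void matmul_tiled(const float *A, const float *B, float *C) {
    for (size_t ii = 0; ii < N; ii += T)
        for (size_t jj = 0; jj < N; jj += T)
            for (size_t kk = 0; kk < N; kk += T)
                for (size_t i = ii; i < ii + T; i++)
                    for (size_t j = jj; j < jj + T; j++)
                        for (size_t k = kk; k < kk + T; k++)
                            C[i * N + j] += A[i * N + k] * B[k * N + j];
}

/* AoS: one struct per element, fields interleaved in memory. */
struct ParticleAoS { float x, y; };

/* AoSoA: an array of small blocks, each holding LANES values per field. */
struct ParticleBlock { float x[LANES]; float y[LANES]; };

/* The "vanilla" loop over an AoS layout. */
float sum_x_aos(const struct ParticleAoS *p, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += p[i].x;
    return s;
}

/* The same reduction over AoSoA needs a block index and a lane index. */
float sum_x_aosoa(const struct ParticleBlock *b, size_t n) {
    assert(n % LANES == 0); /* this sketch handles whole blocks only */
    float s = 0.0f;
    for (size_t blk = 0; blk < n / LANES; blk++)
        for (size_t lane = 0; lane < LANES; lane++)
            s += b[blk].x[lane];
    return s;
}
```

Both multiply routines accumulate each `C[i][j]` in the same `k` order, so they differ only in traversal order, not in arithmetic.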
        
           | achierius wrote:
           | One benefit of making a 'properly new' language is that you
           | don't have to support old code; I've definitely seen a lot of
           | cool optimizations dropped because while they'd help in one
           | place, they'd blow up a loop somewhere else, and ultimately
           | there's only so far you can go with using heuristics to make
           | your optimization apply to one but not the other. Sometimes
           | (e.g. taking advantage of UB) this can even be functional --
           | 'breaking' some pieces of non-conformant code in exchange for
           | getting better performance elsewhere. This is especially
           | relevant in broad-domain languages like C/C++; Linus in
           | particular has opined multiple times against the GCC devs
           | choosing to implement new optimizations that ended up
           | breaking code which had been working fine in Linux for years
           | up to that point.
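
A commonly cited illustration of the UB point above, sketched in C. This is not one of the Linux cases the comment alludes to, and both function names are hypothetical. The first check depends on signed overflow wrapping around, which the C standard leaves undefined, so an optimizer is permitted to assume it never happens and simplify the check away. The second expresses the same intent without relying on overflow.

```c
#include <limits.h>
#include <stdbool.h>

/* Overflow check that relies on signed wraparound. Signed overflow is
   undefined behavior in C, so an optimizer may assume a + 1 never wraps
   and reduce this function to "return false". */
bool next_would_overflow_ub(int a) {
    return a + 1 < a;
}

/* Same intent with fully defined behavior: compare against the limit
   instead of performing the overflowing addition. */
bool next_would_overflow(int a) {
    return a == INT_MAX;
}
```

Code like the first function may appear to work for years at one optimization level and then stop working when the compiler gains a new transformation. GCC and Clang accept `-fwrapv` to define signed overflow as wrapping, at the cost of some optimizations.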
        
         | dzaima wrote:
         | You can quite often write C code that compilers will
         | autovectorize into what you want, provided the case is simple
         | or you put in the effort to handle the non-trivial aspects
         | yourself.
         | 
         | e.g. for the loop reordering example [1] in C you could just,
         | well, manually reorder the loops; and the simple loop would
         | autovectorize to AVX2 just fine.
         | 
         | The things that are messy-to-impossible to achieve in current
         | traditional compilers are things where the compiler thinks it
         | does a smart transformation, but it ends up making things
         | worse.
         | 
         | [1]: https://github.com/exo-lang/exo/tree/main/examples/avx2_matm...
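
The manual loop reordering described above can be sketched in C as follows. This is not code from the linked Exo example. The function names and the use of `restrict` are assumptions, and whether a given compiler actually emits AVX2 for the second version depends on its version and flags (for example `-O3 -mavx2` with GCC or Clang).

```c
#include <stddef.h>

/* i-j-k order: the innermost loop walks B with stride n and reduces into
   a single element of C, which is awkward for an autovectorizer. */
void matmul_ijk(size_t n, const float *restrict A,
                const float *restrict B, float *restrict C) {
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            for (size_t k = 0; k < n; k++)
                C[i * n + j] += A[i * n + k] * B[k * n + j];
}

/* i-k-j order: the innermost loop is unit-stride over both B and C with
   a loop-invariant scalar from A, a shape compilers vectorize readily. */
void matmul_ikj(size_t n, const float *restrict A,
                const float *restrict B, float *restrict C) {
    for (size_t i = 0; i < n; i++)
        for (size_t k = 0; k < n; k++) {
            const float a = A[i * n + k];
            for (size_t j = 0; j < n; j++)
                C[i * n + j] += a * B[k * n + j];
        }
}
```

Each `C[i][j]` still accumulates its terms in increasing `k` order in both versions, so the reordering changes the memory access pattern without changing the arithmetic.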
        
       | CyberDildonics wrote:
       | This looks like they are rebranding Halide into something that
       | can be sold as "AI".
        
       | boxed wrote:
       | I wonder how this would compete against something like Mojo.
        
         | convolvatron wrote:
         | idk if Mojo is 'the answer', but eventually, a compiled
         | language with first class tensors and concurrency is necessary
         | to simultaneously support heterogeneity, performance, and
         | usability. https://chapel-lang.org is the only other one I know
         | of that's currently in active development (?)
        
       ___________________________________________________________________
       (page generated 2025-03-16 23:01 UTC)