[HN Gopher] Co-Dfns v5.7.0
___________________________________________________________________
Co-Dfns v5.7.0
Author : Tomte
Score : 26 points
Date : 2024-07-10 16:11 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| Duanemclemore wrote:
| Cool - I was just installing the drivers and ArrayFire needed to
| run Co-Dfns yesterday. When I get back to it later I'll pull this
| new version. Thanks for posting!
| mlochbaum wrote:
| Huh. This is the commit that introduces the register model:
| https://github.com/Co-dfns/Co-dfns/commit/f89919144f22441f21...
|
| In the compiler, it's working with dense adjacency matrix
| representations, so this will be at best O(n^2) in whatever the
| relevant sort of node is (expressions? functions?). "at present
| is not optimized" seems a bit of an understatement here: I've
| never heard anything about these sorts of graph algorithms being
| possible with good asymptotics in an array style. In practice, I
| think whatever value Co-dfns has lies more in the fact that it
| emits GPU code than in how quickly it compiles, but this calls
| into question the value of writing the compiler itself in APL, I
| would say (for what it's worth, I believe Co-dfns is still not
| able to compile itself).
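|
| (To make the asymptotics concrete, here's my own toy sketch in
| NumPy -- nothing from Co-dfns itself: reachability over a dense
| 0/1 adjacency matrix. The matrix alone is O(n^2) space no matter
| how sparse the graph is, and each bulk step is a matrix product.)
|
|     import numpy as np
|
|     def closure(adj):
|         # adj: n*n 0/1 matrix, adj[i, j] = 1 for an edge i -> j
|         r = adj.copy()
|         while True:
|             step = np.minimum(r + r @ r, 1)   # one squaring pass
|             if np.array_equal(step, r):
|                 return r
|             r = step
|
|     n = 4
|     adj = np.zeros((n, n), dtype=int)
|     adj[0, 1] = adj[1, 2] = adj[2, 3] = 1   # chain 0->1->2->3
|     print(closure(adj))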
|
| The phrase "allocate all intermediate arrays" seems fairly
| confusing: what's actually being allocated must be space for the
| pointers to these arrays and other metadata, not the data of the
| array itself. As the data is variable-size, it can't be fully
| planned out at compile time, and I'm fairly sure the register
| allocation is done when there's not enough shape or type
| information to even make an attempt at data allocation. This
| change can only improve constant overhead for primitive calls,
| and won't be relevant to computations where most of the work is
| on large arrays. I think it's helpful for Co-dfns in a practical
| sense, but not too exciting as it only helps with bad cases: if
| this is important for a user then they'd probably end up with
| better performance by switching to a statically-typed language
| when it comes time to optimize.
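|
| (A toy sketch of the distinction I mean -- my guess at the shape
| of the scheme, not Co-dfns's actual runtime: the compiler can lay
| out a fixed set of "register" slots per function, but a slot only
| holds metadata and a reference; the data is still sized and
| allocated at run time.)
|
|     from dataclasses import dataclass
|
|     @dataclass
|     class Slot:               # fixed size, so it can be laid out
|         dtype: str = ""       # statically, like a register file
|         shape: tuple = ()
|         data: object = None   # stands in for a heap pointer
|
|     regs = [Slot() for _ in range(3)]   # one-time "allocation"
|
|     def prim_iota(dst, n):
|         # the array's data is still allocated at run time
|         dst.dtype, dst.shape = "int", (n,)
|         dst.data = list(range(n))
|
|     prim_iota(regs[0], 10)
|     print(regs[0].shape, regs[0].data[:3])   # (10,) [0, 1, 2]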
|
| It does mention that "For most small array values, we will use
| static allocation types, which prevents the need for allocating
| additional space for an array above and beyond its header", so
| small arrays are taken care of by register allocation. There's a
| tradeoff there between being able to fit larger arrays in this
| space and wasting space for values that don't use the full
| header.
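|
| (Sketch of that tradeoff, again purely illustrative and not
| Co-dfns's real layout: give every slot a small inline buffer.
| Values that fit go straight into the slot; anything bigger still
| needs a separate buffer, and the inline space is then just dead
| weight.)
|
|     INLINE = 4                  # made-up inline capacity
|
|     class InlineSlot:
|         def __init__(self):
|             self.shape = ()
|             self.inline = [0] * INLINE  # paid for even if unused
|             self.heap = None
|
|         def store(self, values):
|             self.shape = (len(values),)
|             if len(values) <= INLINE:
|                 # "static allocation": data fits in the slot
|                 self.inline[:len(values)] = list(values)
|                 self.heap = None
|             else:
|                 self.heap = list(values)  # back to the heap
|
|     s = InlineSlot()
|     s.store([1, 2, 3])          # no heap allocation needed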
|
| Constant lifting within the compiler is pretty cool, I'll have to
| look into that.
| moonchild wrote:
| > I've never heard anything about these sorts of graph
| algorithms being possible with good asymptotics in an array
| style
|
| Yeah -- I don't know graph algorithms well, but I've heard that
| parallelising combinatorial search in general (e.g. SAT) is hard
| because forks and joins happen heterogeneously and erratically.
| This 2001-vintage paper has a bunch of completely sequential
| graph colouring algorithms in J:
| https://dl.acm.org/doi/pdf/10.1145/570406.570416 (and J at
| least has sparse matrices!)
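|
| A quick sketch (mine, not from the paper) of why the usual
| greedy colouring is like that: each vertex takes the smallest
| colour not used by its already-coloured neighbours, so vertex i
| depends on the answers for everything coloured before it.
|
|     def greedy_colour(adj):
|         # adj: dict mapping vertex -> set of neighbours
|         colour = {}
|         for v in adj:                     # the order matters
|             used = {colour[u] for u in adj[v] if u in colour}
|             c = 0
|             while c in used:
|                 c += 1
|             colour[v] = c
|         return colour
|
|     print(greedy_colour({0: {1, 2}, 1: {0, 2},
|                          2: {0, 1}, 3: {0}}))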
|
| > Constant lifting within the compiler is pretty cool, I'll
| have to look into that.
|
| Hrm, it seems to refer to two things: (1) constants allocated in
| a special space, and (2) interning. (2) is obviously worthwhile;
| (1) I would guess is related to memory management being awkward
| on the GPU?
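|
| A tiny sketch of (2) for concreteness (hypothetical, not
| Co-dfns's actual mechanism): identical constants are looked up
| in a pool, so repeated literals share one allocation.
|
|     _pool = {}
|
|     def intern_const(value):
|         key = (type(value).__name__, value)
|         if key not in _pool:
|             _pool[key] = value     # first sighting: keep this one
|         return _pool[key]
|
|     a = intern_const((1, 2, 3))
|     b = intern_const((1, 2, 3))
|     print(a is b)                  # True: one shared object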
| systems wrote:
| "High-performance, Reliable, and Parallel APL" -github
|
| because i had no clue what co-dfns is
| tokyovigilante wrote:
| Still not further enlightened.
| rscho wrote:
| It's a compiler for the APL language that generates GPU code.
| What's interesting about it is that it's also _hosted on the
| GPU_, and it has a very original architecture that represents
| syntax trees as flat arrays rather than pointer structures.
| jiggawatts wrote:
| This is a very valuable contribution to computer science
| that has been overlooked because of how obscure the
| implementation language is.
|
| The idea of flattening a tree into an array, with relative
| indexing to represent the structure, opens up a whole bag of
| performance tools.
|
| Parallel processing is just one (major!) aspect. It also
| enables GPU processing and unlocks SIMD instructions. Not
| to mention that instead of millions of tiny inefficient
| mallocs, this can use large contiguous arrays.
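|
| A rough sketch of the flattened-tree idea (my own toy layout,
| not Co-dfns's actual node tables): one row per node, with a
| parent-index column instead of pointers, so a whole-tree pass
| becomes a handful of bulk array operations.
|
|     import numpy as np
|
|     # the expression (a + b) * c as a node table
|     kind   = np.array(["*", "+", "a", "b", "c"])
|     parent = np.array([ 0,   0,   1,   1,   0 ])  # root -> itself
|
|     # depth of every node at once, one parent hop per pass
|     n = len(parent)
|     depth = np.zeros(n, dtype=int)
|     cur = np.arange(n)
|     while True:
|         active = parent[cur] != cur    # not yet at the root
|         if not active.any():
|             break
|         depth[active] += 1
|         cur = parent[cur]
|     print(depth)                       # [0 1 2 2 1]
|     print(kind[depth == 2])            # ['a' 'b']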
|
| An itch I've been meaning to scratch is implementing
| something like Equality Saturation on top of this data
| structure in the style of EGG (e-graphs good).
|
| Combine that with the category theory work done recently in the
| Haskell world on arbitrary structure-to-structure functional
| maps, and you could make a compiler that takes a bunch of
| term-rewrite rules and spits out parallel GPU code that applies
| optimisation rewrites at enormous speed.
|
| "The best thing about the computer revolution is that it
| hasn't happened yet."
___________________________________________________________________
(page generated 2024-07-10 23:01 UTC)