[HN Gopher] Co-Dfns v5.7.0
___________________________________________________________________
Co-Dfns v5.7.0
Author : Tomte
Score : 26 points
Date : 2024-07-10 16:11 UTC (6 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| Duanemclemore wrote:
| Cool - I was just installing the drivers and ArrayFire needed to
| run Co-Dfns yesterday. When I get back to it later I'll pull this
| new version. Thanks for posting!
| mlochbaum wrote:
| Huh. This is the commit that introduces the register model:
| https://github.com/Co-dfns/Co-dfns/commit/f89919144f22441f21...
|
| In the compiler, it's working with dense adjacency matrix
| representations, so this will be at best O(n^2) in whatever the
| relevant sort of node is (expressions? functions?). "at present
| is not optimized" seems a bit of an understatement here: I've
| never heard anything about these sorts of graph algorithms being
| possible with good asymptotics in an array style. In practice, I
| think whatever value Co-dfns has lies more in the fact that it
| emits GPU code than in how quickly it compiles, but this calls
| into question the value of writing the compiler itself in APL, I
| would say (for what it's worth, I believe Co-dfns is still not
| able to compile itself).
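|
| (To make the asymptotics concrete, here's my own toy sketch in
| NumPy -- nothing from Co-dfns itself: reachability over a dense
| 0/1 adjacency matrix. The matrix alone is O(n^2) space no matter
| how sparse the graph is, and each bulk step is a matrix product.)
|
|     import numpy as np
|
|     def closure(adj):
|         # adj: n*n 0/1 matrix, adj[i, j] = 1 for an edge i -> j
|         r = adj.copy()
|         while True:
|             step = np.minimum(r + r @ r, 1)   # one squaring pass
|             if np.array_equal(step, r):
|                 return r
|             r = step
|
|     n = 4
|     adj = np.zeros((n, n), dtype=int)
|     adj[0, 1] = adj[1, 2] = adj[2, 3] = 1   # chain 0->1->2->3
|     print(closure(adj))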
|
| The phrase "allocate all intermediate arrays" seems fairly
| confusing: what's actually being allocated must be space for the
| pointers to these arrays and other metadata, not the data of the
| array itself. As the data is variable-size, it can't be fully
| planned out at compile time, and I'm fairly sure the register
| allocation is done when there's not enough shape or type
| information to even make an attempt at data allocation. This
| change can only improve constant overhead for primitive calls,
| and won't be relevant to computations where most of the work is
| on large arrays. I think it's helpful for Co-dfns in a practical
| sense, but not too exciting as it only helps with bad cases: if
| this is important for a user then they'd probably end up with
| better performance by switching to a statically-typed language
| when it comes time to optimize.
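|
| (A toy sketch of the distinction I mean -- my guess at the shape
| of the scheme, not Co-dfns's actual runtime: the compiler can lay
| out a fixed set of "register" slots per function, but a slot only
| holds metadata and a reference; the data is still sized and
| allocated at run time.)
|
|     from dataclasses import dataclass
|
|     @dataclass
|     class Slot:               # fixed size, so it can be laid out
|         dtype: str = ""       # statically, like a register file
|         shape: tuple = ()
|         data: object = None   # stands in for a heap pointer
|
|     regs = [Slot() for _ in range(3)]   # one-time "allocation"
|
|     def prim_iota(dst, n):
|         # the array's data is still allocated at run time
|         dst.dtype, dst.shape = "int", (n,)
|         dst.data = list(range(n))
|
|     prim_iota(regs[0], 10)
|     print(regs[0].shape, regs[0].data[:3])   # (10,) [0, 1, 2]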
|
| It does mention that "For most small array values, we will use
| static allocation types, which prevents the need for allocating
| additional space for an array above and beyond its header", so
| small arrays are taken care of by register allocation. There's a
| tradeoff there between being able to fit larger arrays in this
| space and wasting space for values that don't use the full
| header.
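|
| (Sketch of that tradeoff, again purely illustrative and not
| Co-dfns's real layout: give every slot a small inline buffer.
| Values that fit go straight into the slot; anything bigger still
| needs a separate buffer, and the inline space is then just dead
| weight.)
|
|     INLINE = 4                  # made-up inline capacity
|
|     class InlineSlot:
|         def __init__(self):
|             self.shape = ()
|             self.inline = [0] * INLINE  # paid for even if unused
|             self.heap = None
|
|         def store(self, values):
|             self.shape = (len(values),)
|             if len(values) <= INLINE:
|                 # "static allocation": data fits in the slot
|                 self.inline[:len(values)] = list(values)
|                 self.heap = None
|             else:
|                 self.heap = list(values)  # back to the heap
|
|     s = InlineSlot()
|     s.store([1, 2, 3])          # no heap allocation needed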
|
| Constant lifting within the compiler is pretty cool, I'll have to
| look into that.
| moonchild wrote:
| > I've never heard anything about these sorts of graph
| algorithms being possible with good asymptotics in an array
| style
|
| Yeah -- I don't know graph algorithms well, but I've heard that
| parallelising combinatorial search in general (e.g. SAT) is hard
| because forks and joins happen heterogeneously and erratically.
| This 2001-vintage paper has a bunch of completely sequential
| graph colouring algorithms in J:
| https://dl.acm.org/doi/pdf/10.1145/570406.570416 (and J at
| least has sparse matrices!)
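|
| A quick sketch (mine, not from the paper) of why the usual
| greedy colouring is like that: each vertex takes the smallest
| colour not used by its already-coloured neighbours, so vertex i
| depends on the answers for everything coloured before it.
|
|     def greedy_colour(adj):
|         # adj: dict mapping vertex -> set of neighbours
|         colour = {}
|         for v in adj:                     # the order matters
|             used = {colour[u] for u in adj[v] if u in colour}
|             c = 0
|             while c in used:
|                 c += 1
|             colour[v] = c
|         return colour
|
|     print(greedy_colour({0: {1, 2}, 1: {0, 2},
|                          2: {0, 1}, 3: {0}}))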
|
| > Constant lifting within the compiler is pretty cool, I'll
| have to look into that.
|
| Hrm, it seems to refer to two things: (1) constants allocated in
| a special space, and (2) interning. (2) is obviously worthwhile;
| (1) I would guess is related to memory management being awkward
| on the GPU?
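|
| A tiny sketch of (2) for concreteness (hypothetical, not
| Co-dfns's actual mechanism): identical constants are looked up
| in a pool, so repeated literals share one allocation.
|
|     _pool = {}
|
|     def intern_const(value):
|         key = (type(value).__name__, value)
|         if key not in _pool:
|             _pool[key] = value     # first sighting: keep this one
|         return _pool[key]
|
|     a = intern_const((1, 2, 3))
|     b = intern_const((1, 2, 3))
|     print(a is b)                  # True: one shared object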
| systems wrote:
| "High-performance, Reliable, and Parallel APL" -github
|
| because i had no clue what co-dfns is
| tokyovigilante wrote:
| Still not further enlightened.
| rscho wrote:
| It's a compiler for the APL language that generates GPU code.
| What's interesting about it is that it's also _hosted on the
| GPU_, and it has a very original architecture that represents
| syntax trees as flat arrays rather than pointer structures.
| jiggawatts wrote:
| This is a very valuable contribution to computer science
| that has been overlooked because of how obscure the
| implementation language is.
|
| The idea of flattening a tree into an array, with relative
| indexing to represent the structure, opens up a whole bag of
| performance tools.
|
| Parallel processing is just one (major!) aspect. It also
| enables GPU processing and unlocks SIMD instructions. Not
| to mention that instead of millions of tiny inefficient
| mallocs, this can use large contiguous arrays.
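|
| A rough sketch of the flattened-tree idea (my own toy layout,
| not Co-dfns's actual node tables): one row per node, with a
| parent-index column instead of pointers, so a whole-tree pass
| becomes a handful of bulk array operations.
|
|     import numpy as np
|
|     # the expression (a + b) * c as a node table
|     kind   = np.array(["*", "+", "a", "b", "c"])
|     parent = np.array([ 0,   0,   1,   1,   0 ])  # root -> itself
|
|     # depth of every node at once, one parent hop per pass
|     n = len(parent)
|     depth = np.zeros(n, dtype=int)
|     cur = np.arange(n)
|     while True:
|         active = parent[cur] != cur    # not yet at the root
|         if not active.any():
|             break
|         depth[active] += 1
|         cur = parent[cur]
|     print(depth)                       # [0 1 2 2 1]
|     print(kind[depth == 2])            # ['a' 'b']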
|
| An itch I've been meaning to scratch is implementing
| something like Equality Saturation on top of this data
| structure in the style of EGG (e-graphs good).
|
| Combine that with the category theory work done recently in the
| Haskell world on arbitrary structure-to-structure functional
| maps, and you could make a compiler that takes a bunch of
| term-rewrite rules and spits out parallel GPU code that applies
| optimisation rewrites at enormous speed.
|
| "The best thing about the computer revolution is that it
| hasn't happened yet."
___________________________________________________________________
(page generated 2024-07-10 23:01 UTC)