[HN Gopher] Show HN: Zig Topological Sort Library for Parallel P...
       ___________________________________________________________________
        
       Show HN: Zig Topological Sort Library for Parallel Processing
        
       I believe the best way to learn a language is by doing an in-depth
       project. This is my first Zig project intended for learning the
       ropes on publishing a Zig package. It turns out to be quite solid
       and performant. It might be a bit over-engineered.  This little
       library is packed with the following features:                 -
       Building dependency graph from dependency data.       - Performing
       topological sort on the dependency graph.       - Generating
       dependence-free subsets for parallel processing.       - Cycle
       detection and cycle reporting.
        
       Author : ww520
       Score  : 72 points
       Date   : 2025-04-01 17:48 UTC (5 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | Cieric wrote:
       | I've been enjoying zig myself quite a bit, I'm fairly confident I
       | could do some larger projects in it (excluding comptime since
       | it's missing features I sorely need for some of my current
       | projects.) I like it a bit more than C/C++ in a lot of cases, I
       | need it to be pushed just a tiny bit further before I can really
       | dedicate effort towards large projects in it. I was even curious
       | if I could implement the features I need myself (there is even a
       | proposal already), but got the answer "Just don't." (I don't
       | blame andrew, he already has a lot on his plate and I'm a
       | stranger to him.) So I'm at the point of either waiting for the
       | feature or forking it, haven't decided what I'm going to do
       | though.
       | 
       | More on topic for the project though, did you have any project
       | ideas for this? I think I could use this for my opencv node
       | editor, I just did the naive method of marking all outputs dirty
       | as their inputs nodes got reprocessed. I assume this would fix
       | the potential problem of recomputing a node twice? I also see you
       | mention "Cycle detection and cycle reporting." but what
       | specifically happens if I do have a cycle? Does it just give up
       | or is it something like best effort?
        
         | kreco wrote:
         | Could you describe briefly what feature you are sorely missing?
         | 
         | I like the language intention but I can't get past the syntax.
        
           | klysm wrote:
           | Syntax is so much less important that semantics that it isn't
           | even really worth talking about in my opinion
        
             | codethief wrote:
             | Readability (in the sense of "How fast can the average
             | developer parse code in the given language?") and proneness
             | to errors are a thing, though.
             | 
             | Consider, e.g., how in TypeScript object literals ({ a: b,
             | c: d, e }), object types ({ a: b; c: d; }) and object
             | destructuring ({ a, c, e } = ... ) can all look very
             | similar. Same thing for lambdas ((a: b) => b) and function
             | types ((a: b) => b). Also, additional parentheses are
             | needed to prevent the compiler from mistaking an object
             | literal ({ ... }) for a function body ({ ... }) when it is
             | returned in a lambda expression. In short: Some of
             | TypeScript's syntactic constructs are _heavily_ overloaded
             | and their meaning depends on the context.
             | 
             | For an example of proneness to errors, consider that in Nix
             | function calls (<function name> <param1> <param2> ...) and
             | lists ([<item1> <item 2> ...]) look almost the same and
             | it's very easy to confuse the two and mess up, e.g.
             | 
             | ``` let foo = [ item1 myFunction "arg1" "arg2" item3 ] ```
             | 
             | defines a list with 5 items, not 3, due to missing
             | parentheses around the function call.
        
               | klysm wrote:
               | Sure but I don't think those examples really matter once
               | you establish basic familiarity with a language. The
               | semantics and constructs a language provides are much
               | more important and debating syntax is missing the forest
               | for the trees
        
               | FL33TW00D wrote:
               | The array syntax is very offensive: `const a = [3]i32{ 1,
               | 2, 3 };` A set is denoted by braces, not an array.
        
               | CooCooCaCha wrote:
               | This is exactly why I find the language unintuitive. I
               | don't understand why they made the choices they made. For
               | example, why curly brackets?
               | 
               | I find the rust equivalent much more intuitive `let a:
               | [i32; 3] = [1, 2, 3];`
        
           | Cieric wrote:
           | For me it's all comptime stuff and it's kind of arbitrary
           | things like parsing out the type information of a function
           | doesn't include the name of the function parameters, but
           | basically everything else that has a name has that
           | information present in their info structure. The other thing
           | is tags, being able to tag things that I can parse at compile
           | time. I'm making something close to a database orm,
           | (specifically it's spacetimedb, thought it'd be fun to use
           | zig with). But information about things like primary keys,
           | auto increments, constraints and similar all has to live in a
           | different structure completely untied to the original struct
           | or function. I'd like to be able to tie those things together
           | easily to avoid mistakes and confusion. I have different
           | workarounds that I've tried, but nothing that's universal for
           | all my test cases.
           | 
           | For syntax there are a few things that I'm iffy on, but
           | nothing that I'd consider a deal breaker. I found it very
           | easy to read right out of the gate, which is basically the
           | only thing I need to really learn a new language (probably
           | the only reason I haven't learned rust yet.)
        
             | airstrike wrote:
             | Just wanted to say that Rust may look strange early on but
             | very, very quickly becomes entirely natural, so don't let
             | that be the reason why you haven't learned it is my
             | unsolicited input
        
               | Cieric wrote:
               | Yeah, I just haven't needed the memory safety that comes
               | with it and I don't have the same gripes everyone else
               | has with c's include system. At this point it just
               | doesn't have anything to offer that I really care about.
               | I only learned zig because of the comptime stuff and some
               | ease of use when it came to tls encryption. I'm a little
               | interested in rust macros, but that's really it and I
               | don't think that's enough to learn a new language. I'm
               | sure I'll eventually have a project where memory safety
               | (with speed) is a priority, but to this point it's just
               | never come up at work or the projects I work on in my
               | free time.
        
               | airstrike wrote:
               | I hear you, people have different preferences and rank
               | their preferences in different order.
               | 
               | For what it's worth, I use Rust daily and I don't really
               | care about memory issue. It's nice that it comes with the
               | package, but it's not why I do it. Believe it or not, the
               | borrow checker is first and foremost why I enjoy writing
               | Rust. It's such a brilliant idea I don't understand why
               | it's not more widely used. A helpful compiler and a good
               | (if imperfect) crate ecosystem are probably 2nd and 3rd.
        
         | ww520 wrote:
         | > More on topic for the project though, did you have any
         | project ideas for this?
         | 
         | I implemented topological sort for bringing up and shutting
         | down OS services in order like 20 years ago, but it was like a
         | quick and dirty way doing it and never formally as a published
         | library. I do it this time because it's a well understood topic
         | so I can concentrate on learning the ropes of Zig and package
         | publishing while doing something useful. And I go the extra
         | miles to find dependence free subsets for parallel processing
         | and to detect cycles.
         | 
         | > I assume this would fix the potential problem of recomputing
         | a node twice?
         | 
         | Yes. Topological sort is perfect for laying out the path of
         | computation tasks to avoid redoing things.
         | 
         | > "Cycle detection and cycle reporting." but what specifically
         | happens if I do have a cycle? Does it just give up or is it
         | something like best effort?
         | 
         | It will keep going as a best effort when cycles are detected.
         | It will produce a partial topological sorted result and a set
         | of cyclic nodes.
        
           | klysm wrote:
           | I was surprised to see this as a library at all - isn't it
           | trivial especially for small collections?
        
             | ww520 wrote:
             | Yes. The core algorithm is like 20 lines of code in the
             | run_algorithm() function. But to make it easier to use, to
             | handle different types of input, and report/query output,
             | etc. take much more. This is the difference between an idea
             | and a product in a loose sense. It gives purpose to my
             | learning process anyway.
        
       | sharmasachin98 wrote:
       | Curious: have you benchmarked it against any similar tools in
       | other languages (like Go or Rust) for comparison?
        
         | ww520 wrote:
         | No. I haven't got the chance to benchmark against others. When
         | I ran the benchmarks in the library, I could process 1 million
         | pairs of dependency in 10's of milliseconds, in my 5-year old
         | laptop.                 Testing 1000000 items, 1-to-1 chaining
         | dependency.         Add dependency 1000000 items. Time: 93ms,
         | 10645885 items/s, 93 ns/item                   Sort 1000000
         | items. Time: 113ms, 8795778 items/s, 113 ns/item
         | Testing 1000000 items, 1-to-4 chaining dependency.         Add
         | dependency 1000000 items. Time: 87ms, 11428323 items/s, 87
         | ns/item                   Sort 1000000 items. Time: 44ms,
         | 22508986 items/s, 44 ns/item            Testing 1000000 items,
         | 1-to-10 chaining dependency.         Add dependency 1000000
         | items. Time: 102ms, 9748793 items/s, 102 ns/item
         | Sort 1000000 items. Time: 31ms, 31707077 items/s, 31 ns/item
         | Testing 1000000 items, 1-to-10 chaining dependency, with
         | max_range set.         Add dependency 1000000 items. Time:
         | 25ms, 39460028 items/s, 25 ns/item                   Sort
         | 1000000 items. Time: 31ms, 31633556 items/s, 31 ns/item
        
       | chubot wrote:
       | Hm this reminds me that the Python stdlib has grown a module for
       | topological sorting fo a graph:
       | 
       | https://docs.python.org/3/library/graphlib.html
       | 
       | I haven't used it yet, I'd be curious if anyone has
        
         | nerdponx wrote:
         | I used it to build a (now permanently unfinished) lightweight
         | DAG runner that uses only the Python standard library, with the
         | intention that you can just copy the .py file into a project
         | and use it without installing anything other than Python on
         | your system. I think it might be of niche use to some people,
         | but I personally wasn't even dogfooding it, I was just
         | scratching an itch.
         | 
         | The purpose of adding it to the standard library was to
         | implement linear MRO for classes:
         | https://bugs.python.org/issue17005
        
       | duped wrote:
       | > Generating dependence-free subsets for parallel processing.
       | 
       | Unsure how this is defined (*) but graph cutting approaches to
       | concurrent task scheduling is both pessimistic (poor utilization
       | of available resources) and (iirc) NP-hard, so you pay an big
       | cost upfront.
       | 
       | On the other hand, if you know the indegree/outdegree of each
       | node at the time they are visited (meaning the graph is static)
       | you can run Kahn's algorithm concurrently and put each node into
       | a shared work queue. You can optimize that further by having per-
       | thread work queues and stealing between them. Depending on what
       | the nodes represent there are even more optimizations and
       | heuristics, concurrent task scheduling is a hard problem.
       | 
       | * imagine the graph
       | 
       | (a, b) (a, c) (b, d) (c, d)
       | 
       | Is it possible to get nodes b and c in parallel "subsets" in your
       | library?
        
         | ww520 wrote:
         | Yes. It would produce dependence-free subsets. I just ran your
         | sample (assuming a,b means a depends on b).
         | Topologically sorted sets: [ { d }  { b c }  { a }  ]
         | Topologically sorted list: [ d b c a ]       Nodes: [ a b c d ]
         | Dependency tree:       [ d ]         d -> [ b c ]           b
         | -> [ a ]             a ->           c -> [ a ]             a ->
         | 
         | The dependence-free subset finding is probably not exhausting
         | and optimal. I haven't gone through formal proofing. It's
         | opportunistic and best effort at best currently.
        
           | duped wrote:
           | How are the subsets defined?
        
             | ww520 wrote:
             | At every round of the algorithm, all nodes with 0 in-degree
             | (i.e. they are not depending on anyone) are collected as a
             | dependence-free subset.
             | 
             | They serve as the root set to the rest of the graph for the
             | current round. The depending nodes reached from root set
             | have their in-degree decremented. When their in-degrees
             | reach 0, they are added to the next root set.
             | 
             | I'm using double-buffering to maintain the current root set
             | for processing and to collect the next root set for the
             | next round, instead of using a queue as in Kahn's
             | algorithm. At the end of the round, I simply swap the
             | double-buffers. It's very efficient. When the next root set
             | is empty, all nodes have been processed.
        
       | jedisct1 wrote:
       | Good job! This is really cool!
        
       | tediousgraffit1 wrote:
       | I really like this, this is the perfect size project for
       | exploring a new piece of tech. I especially like that you
       | implemented an actual cli and not just tests.
        
         | ww520 wrote:
         | Yes. The CLI is important. It's great for running tests and
         | driving development. It also forces me to dog-food my own
         | library to think in the shoes of the users of the library. In
         | fact a number of features were added due to shortcomings
         | exposed while implementing the CLI.
        
       ___________________________________________________________________
       (page generated 2025-04-01 23:00 UTC)