[HN Gopher] Show HN: Zig Topological Sort Library for Parallel P...
___________________________________________________________________
Show HN: Zig Topological Sort Library for Parallel Processing
I believe the best way to learn a language is by doing an in-depth
project. This is my first Zig project intended for learning the
ropes on publishing a Zig package. It turns out to be quite solid
and performant. It might be a bit over-engineered. This little
library is packed with the following features: -
Building dependency graph from dependency data. - Performing
topological sort on the dependency graph. - Generating
dependence-free subsets for parallel processing. - Cycle
detection and cycle reporting.
Author : ww520
Score : 72 points
Date : 2025-04-01 17:48 UTC (5 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| Cieric wrote:
| I've been enjoying zig myself quite a bit, I'm fairly confident I
| could do some larger projects in it (excluding comptime since
| it's missing features I sorely need for some of my current
| projects.) I like it a bit more than C/C++ in a lot of cases, I
| need it to be pushed just a tiny bit further before I can really
| dedicate effort towards large projects in it. I was even curious
| if I could implement the features I need myself (there is even a
| proposal already), but got the answer "Just don't." (I don't
| blame andrew, he already has a lot on his plate and I'm a
| stranger to him.) So I'm at the point of either waiting for the
| feature or forking it, haven't decided what I'm going to do
| though.
|
| More on topic for the project though, did you have any project
| ideas for this? I think I could use this for my opencv node
| editor, I just did the naive method of marking all outputs dirty
| as their inputs nodes got reprocessed. I assume this would fix
| the potential problem of recomputing a node twice? I also see you
| mention "Cycle detection and cycle reporting." but what
| specifically happens if I do have a cycle? Does it just give up
| or is it something like best effort?
| kreco wrote:
| Could you describe briefly what feature you are sorely missing?
|
| I like the language intention but I can't get past the syntax.
| klysm wrote:
| Syntax is so much less important that semantics that it isn't
| even really worth talking about in my opinion
| codethief wrote:
| Readability (in the sense of "How fast can the average
| developer parse code in the given language?") and proneness
| to errors are a thing, though.
|
| Consider, e.g., how in TypeScript object literals ({ a: b,
| c: d, e }), object types ({ a: b; c: d; }) and object
| destructuring ({ a, c, e } = ... ) can all look very
| similar. Same thing for lambdas ((a: b) => b) and function
| types ((a: b) => b). Also, additional parentheses are
| needed to prevent the compiler from mistaking an object
| literal ({ ... }) for a function body ({ ... }) when it is
| returned in a lambda expression. In short: Some of
| TypeScript's syntactic constructs are _heavily_ overloaded
| and their meaning depends on the context.
|
| For an example of proneness to errors, consider that in Nix
| function calls (<function name> <param1> <param2> ...) and
| lists ([<item1> <item 2> ...]) look almost the same and
| it's very easy to confuse the two and mess up, e.g.
|
| ``` let foo = [ item1 myFunction "arg1" "arg2" item3 ] ```
|
| defines a list with 5 items, not 3, due to missing
| parentheses around the function call.
| klysm wrote:
| Sure but I don't think those examples really matter once
| you establish basic familiarity with a language. The
| semantics and constructs a language provides are much
| more important and debating syntax is missing the forest
| for the trees
| FL33TW00D wrote:
| The array syntax is very offensive: `const a = [3]i32{ 1,
| 2, 3 };` A set is denoted by braces, not an array.
| CooCooCaCha wrote:
| This is exactly why I find the language unintuitive. I
| don't understand why they made the choices they made. For
| example, why curly brackets?
|
| I find the rust equivalent much more intuitive `let a:
| [i32; 3] = [1, 2, 3];`
| Cieric wrote:
| For me it's all comptime stuff and it's kind of arbitrary
| things like parsing out the type information of a function
| doesn't include the name of the function parameters, but
| basically everything else that has a name has that
| information present in their info structure. The other thing
| is tags, being able to tag things that I can parse at compile
| time. I'm making something close to a database orm,
| (specifically it's spacetimedb, thought it'd be fun to use
| zig with). But information about things like primary keys,
| auto increments, constraints and similar all has to live in a
| different structure completely untied to the original struct
| or function. I'd like to be able to tie those things together
| easily to avoid mistakes and confusion. I have different
| workarounds that I've tried, but nothing that's universal for
| all my test cases.
|
| For syntax there are a few things that I'm iffy on, but
| nothing that I'd consider a deal breaker. I found it very
| easy to read right out of the gate, which is basically the
| only thing I need to really learn a new language (probably
| the only reason I haven't learned rust yet.)
| airstrike wrote:
| Just wanted to say that Rust may look strange early on but
| very, very quickly becomes entirely natural, so don't let
| that be the reason why you haven't learned it is my
| unsolicited input
| Cieric wrote:
| Yeah, I just haven't needed the memory safety that comes
| with it and I don't have the same gripes everyone else
| has with c's include system. At this point it just
| doesn't have anything to offer that I really care about.
| I only learned zig because of the comptime stuff and some
| ease of use when it came to tls encryption. I'm a little
| interested in rust macros, but that's really it and I
| don't think that's enough to learn a new language. I'm
| sure I'll eventually have a project where memory safety
| (with speed) is a priority, but to this point it's just
| never come up at work or the projects I work on in my
| free time.
| airstrike wrote:
| I hear you, people have different preferences and rank
| their preferences in different order.
|
| For what it's worth, I use Rust daily and I don't really
| care about memory issue. It's nice that it comes with the
| package, but it's not why I do it. Believe it or not, the
| borrow checker is first and foremost why I enjoy writing
| Rust. It's such a brilliant idea I don't understand why
| it's not more widely used. A helpful compiler and a good
| (if imperfect) crate ecosystem are probably 2nd and 3rd.
| ww520 wrote:
| > More on topic for the project though, did you have any
| project ideas for this?
|
| I implemented topological sort for bringing up and shutting
| down OS services in order like 20 years ago, but it was like a
| quick and dirty way doing it and never formally as a published
| library. I do it this time because it's a well understood topic
| so I can concentrate on learning the ropes of Zig and package
| publishing while doing something useful. And I go the extra
| miles to find dependence free subsets for parallel processing
| and to detect cycles.
|
| > I assume this would fix the potential problem of recomputing
| a node twice?
|
| Yes. Topological sort is perfect for laying out the path of
| computation tasks to avoid redoing things.
|
| > "Cycle detection and cycle reporting." but what specifically
| happens if I do have a cycle? Does it just give up or is it
| something like best effort?
|
| It will keep going as a best effort when cycles are detected.
| It will produce a partial topological sorted result and a set
| of cyclic nodes.
| klysm wrote:
| I was surprised to see this as a library at all - isn't it
| trivial especially for small collections?
| ww520 wrote:
| Yes. The core algorithm is like 20 lines of code in the
| run_algorithm() function. But to make it easier to use, to
| handle different types of input, and report/query output,
| etc. take much more. This is the difference between an idea
| and a product in a loose sense. It gives purpose to my
| learning process anyway.
| sharmasachin98 wrote:
| Curious: have you benchmarked it against any similar tools in
| other languages (like Go or Rust) for comparison?
| ww520 wrote:
| No. I haven't got the chance to benchmark against others. When
| I ran the benchmarks in the library, I could process 1 million
| pairs of dependency in 10's of milliseconds, in my 5-year old
| laptop. Testing 1000000 items, 1-to-1 chaining
| dependency. Add dependency 1000000 items. Time: 93ms,
| 10645885 items/s, 93 ns/item Sort 1000000
| items. Time: 113ms, 8795778 items/s, 113 ns/item
| Testing 1000000 items, 1-to-4 chaining dependency. Add
| dependency 1000000 items. Time: 87ms, 11428323 items/s, 87
| ns/item Sort 1000000 items. Time: 44ms,
| 22508986 items/s, 44 ns/item Testing 1000000 items,
| 1-to-10 chaining dependency. Add dependency 1000000
| items. Time: 102ms, 9748793 items/s, 102 ns/item
| Sort 1000000 items. Time: 31ms, 31707077 items/s, 31 ns/item
| Testing 1000000 items, 1-to-10 chaining dependency, with
| max_range set. Add dependency 1000000 items. Time:
| 25ms, 39460028 items/s, 25 ns/item Sort
| 1000000 items. Time: 31ms, 31633556 items/s, 31 ns/item
| chubot wrote:
| Hm this reminds me that the Python stdlib has grown a module for
| topological sorting fo a graph:
|
| https://docs.python.org/3/library/graphlib.html
|
| I haven't used it yet, I'd be curious if anyone has
| nerdponx wrote:
| I used it to build a (now permanently unfinished) lightweight
| DAG runner that uses only the Python standard library, with the
| intention that you can just copy the .py file into a project
| and use it without installing anything other than Python on
| your system. I think it might be of niche use to some people,
| but I personally wasn't even dogfooding it, I was just
| scratching an itch.
|
| The purpose of adding it to the standard library was to
| implement linear MRO for classes:
| https://bugs.python.org/issue17005
| duped wrote:
| > Generating dependence-free subsets for parallel processing.
|
| Unsure how this is defined (*) but graph cutting approaches to
| concurrent task scheduling is both pessimistic (poor utilization
| of available resources) and (iirc) NP-hard, so you pay an big
| cost upfront.
|
| On the other hand, if you know the indegree/outdegree of each
| node at the time they are visited (meaning the graph is static)
| you can run Kahn's algorithm concurrently and put each node into
| a shared work queue. You can optimize that further by having per-
| thread work queues and stealing between them. Depending on what
| the nodes represent there are even more optimizations and
| heuristics, concurrent task scheduling is a hard problem.
|
| * imagine the graph
|
| (a, b) (a, c) (b, d) (c, d)
|
| Is it possible to get nodes b and c in parallel "subsets" in your
| library?
| ww520 wrote:
| Yes. It would produce dependence-free subsets. I just ran your
| sample (assuming a,b means a depends on b).
| Topologically sorted sets: [ { d } { b c } { a } ]
| Topologically sorted list: [ d b c a ] Nodes: [ a b c d ]
| Dependency tree: [ d ] d -> [ b c ] b
| -> [ a ] a -> c -> [ a ] a ->
|
| The dependence-free subset finding is probably not exhausting
| and optimal. I haven't gone through formal proofing. It's
| opportunistic and best effort at best currently.
| duped wrote:
| How are the subsets defined?
| ww520 wrote:
| At every round of the algorithm, all nodes with 0 in-degree
| (i.e. they are not depending on anyone) are collected as a
| dependence-free subset.
|
| They serve as the root set to the rest of the graph for the
| current round. The depending nodes reached from root set
| have their in-degree decremented. When their in-degrees
| reach 0, they are added to the next root set.
|
| I'm using double-buffering to maintain the current root set
| for processing and to collect the next root set for the
| next round, instead of using a queue as in Kahn's
| algorithm. At the end of the round, I simply swap the
| double-buffers. It's very efficient. When the next root set
| is empty, all nodes have been processed.
| jedisct1 wrote:
| Good job! This is really cool!
| tediousgraffit1 wrote:
| I really like this, this is the perfect size project for
| exploring a new piece of tech. I especially like that you
| implemented an actual cli and not just tests.
| ww520 wrote:
| Yes. The CLI is important. It's great for running tests and
| driving development. It also forces me to dog-food my own
| library to think in the shoes of the users of the library. In
| fact a number of features were added due to shortcomings
| exposed while implementing the CLI.
___________________________________________________________________
(page generated 2025-04-01 23:00 UTC)