[HN Gopher] dgsh - Directed Graph Shell
       ___________________________________________________________________
        
       dgsh - Directed Graph Shell
        
       Author : pabs3
       Score  : 116 points
       Date   : 2025-09-30 13:39 UTC (9 hours ago)
        
 (HTM) web link (www2.dmst.aueb.gr)
 (TXT) w3m dump (www2.dmst.aueb.gr)
        
       | uncletaco wrote:
       | Hello. In English this makes me think of the phrase "dog shit".
       | Not sure if that's intentional or not.
        
         | pentaphobe wrote:
         | Second English speaker here who didn't make that connection at
         | all
        
           | nasretdinov wrote:
           | English is my third language and I can confirm I didn't even
           | think about this
        
             | rirze wrote:
             | Same here: English is my fourth language and it didn't even
             | occur to me
        
               | DSpinellis wrote:
               | Author of dgsh here. This is definitely not what I had in
               | mind.
        
           | lucideer wrote:
           | Another English speaker data point here & I actually read
           | dogshit before I read dgsh.
        
         | DonHopkins wrote:
         | That's what I think when I hear "bash".
        
           | kbr2000 wrote:
           | batshit?
        
           | dotnetcarpenter wrote:
           | That's literally the word for shit in Norwegian: baesj
           | https://upload.wikimedia.org/wikipedia/commons/6/69/Nb-b%C3%...
        
         | goldenCeasar wrote:
         | Now I can't unconnect this. I hope OP was aware, because now he
         | won't forget it either.
        
           | em-bee wrote:
           | and no matter how much i try, i can't make the connection.
           | best i can come up with is dogshell, and even that is a
           | stretch. phew...
        
       | jimbokun wrote:
       | This is very interesting, but I'm wondering how it compares to
       | just using a dynamic language like Python or Ruby for the same
       | tasks. Curious how the line count to express the same tasks would
       | come out.
        
         | PaulHoule wrote:
         | There is a lot of stuff for Python which follows the "express
         | computation as a dag" approach, especially Apache Airflow
         | 
         | https://airflow.apache.org/
        
           | DSpinellis wrote:
           | Apache Airflow solves a very different problem. Its DAGs are
           | static dependencies between sequentially executed processing
           | steps, whereas the DAGs of dgsh express live direct data
           | flows.
        
             | PaulHoule wrote:
             | Yeah, there are also the boxes and lines tools like
             | 
             | https://www.knime.com/
             | 
             | which have their own subculture. You could solve the same
             | problems they do with pandas and scikit-learn but people
             | who use those tools would never use pandas and scikit-learn
             | and vice versa.
             | 
             | Circa 2015 I was thinking those tools all had the
             | architectural flaw that they pass relational rows over the
             | lines as opposed to JSON objects (or equivalent) which
             | means you had to realize joins as highly complex graphs
             | where things that seem like local concerns to me require a
             | global structure and where what seems like a little change
             | to management changes the whole graph in a big way.
             | 
             | I found the people who were buying up that sort of tools
             | didn't give a damn because they thought customers demanded
             | the speed of columnar execution which our way couldn't
             | deliver.
             | 
             | I made a prototype that gave the right answers every time
             | and then went to work for a place which had some luck
             | selling their own version that didn't always give the right
             | answers because: they didn't know what algebra it
             | supported, didn't believe something like that had an
             | algebra, and didn't properly tear the pipeline down at the
             | end.
        
             | jpitz wrote:
             | Do you mean to say that two non-dependent tasks in an
             | Airflow DAG aren't able to execute concurrently? That's not
             | my experience. I'm also confused by the use of 'static' in
             | this context.
        
               | DSpinellis wrote:
               | That's the point: non-dependent tasks can run
               | concurrently in Airflow. In sh/Bash/dgsh dependent tasks
               | can also run concurrently, as in tar cf - . | xz.
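
A minimal Python sketch of the point above, assuming two child Python processes as hypothetical stand-ins for `tar cf - .` and `xz` (the names `PRODUCER`, `COMPRESSOR` and `run_pipeline` are illustrative, not part of dgsh). The second stage depends on the first one's output, yet both run at the same time: the producer writes far more than a pipe typically buffers, so it can only finish because the compressor keeps draining the pipe.

```python
import lzma
import subprocess
import sys

# Hypothetical stand-in for `tar cf - .`: streams several hundred KB to stdout.
PRODUCER = """
for i in range(50000):
    print(f"line {i}")
"""

# Hypothetical stand-in for `xz`: compresses stdin to stdout chunk by chunk.
COMPRESSOR = """
import lzma, sys
compressor = lzma.LZMACompressor()
while True:
    chunk = sys.stdin.buffer.read(65536)
    if not chunk:
        break
    sys.stdout.buffer.write(compressor.compress(chunk))
sys.stdout.buffer.write(compressor.flush())
"""


def run_pipeline():
    """Run producer | compressor and return (compressed bytes, exit codes)."""
    producer = subprocess.Popen(
        [sys.executable, "-c", PRODUCER], stdout=subprocess.PIPE
    )
    consumer = subprocess.Popen(
        [sys.executable, "-c", COMPRESSOR],
        stdin=producer.stdout,
        stdout=subprocess.PIPE,
    )
    # Drop the parent's copy of the read end so that only the consumer holds it.
    producer.stdout.close()
    compressed, _ = consumer.communicate()
    return compressed, (producer.wait(), consumer.returncode)


if __name__ == "__main__":
    data, codes = run_pipeline()
    print(len(lzma.decompress(data).splitlines()), "lines,", codes)
```

Both `Popen` calls return immediately, so the two stages are live together, which is the shell-style behaviour described above rather than step-after-step scheduling.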
        
           | croemer wrote:
           | I was curious but the docs are a nightmare. I clicked through
           | a couple of pages and couldn't see a single simple
           | non-trivial example.
        
         | sunshine-o wrote:
         | I respect Python but the upgrade to Python 3 showed that data
         | processing workloads that can be handled by standard Unix
         | tooling should stay there.
         | 
         | The upgrade was a nightmare for so many organizations. It
         | shouldn't be that way but it was.
        
         | procaryote wrote:
         | Spawning shell commands and the equivalent of piping is
         | surprisingly hard in Python. It's almost easier to do in C.
         | 
         | There are probably libraries that could help, but then you need
         | to install dependencies, which is sad in Python for other
         | reasons.
        
           | croemer wrote:
           | We use snakemake a lot in bioinformatics to take advantage of
           | parallelism in workflows while staying close to Python:
           | https://github.com/snakemake/snakemake
           | 
           | Others use nextflow but that requires learning Groovy and
           | it's less intuitive.
        
         | everforward wrote:
         | From a glance, it looks like very similar tradeoffs vs bash.
         | Much harder to read in a medium-large application, but much
         | more ergonomic IO and process control.
         | 
         | I.e. much faster to use dgsh for a basic processing DAG, much
         | more painful to use dgsh for a large ETL pipeline.
         | 
         | Python with something like Prefect isn't something you'd use a
         | REPL to bang out a one-off on, but it'd be more maintainable.
         | dgsh would let you use a REPL to bang out a quick and dirty
         | DAG.
        
         | DSpinellis wrote:
         | I've found creating pipelines with Python to be messy and
         | unintuitive. Other than creating a DSL to express them I can't
         | see how DAGs can be expressed naturally with Python's syntax.
         | 
         | Even creating tools in Python that can be connected together in
         | a Unix shell pipeline isn't trivial. By default if a downstream
         | program stops processing Python's output you get an unsightly
         | broken pipe exception, so you need to execute
         | signal.signal(signal.SIGPIPE, signal.SIG_DFL) to avoid this.
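
A small sketch of the behaviour described above, assuming a POSIX system. The helper name `run_until_reader_quits` and the child script are hypothetical. The parent reads one line and then closes the pipe, much like `head -n 1` would, and reports how the child ended with and without the `signal.signal(signal.SIGPIPE, signal.SIG_DFL)` call.

```python
import signal
import subprocess
import sys

# Child script template: prints far more than a pipe can buffer.
CHILD = """
import signal
{setup}
for i in range(1_000_000):
    print(i)
"""


def run_until_reader_quits(restore_default):
    """Start the child, read one line, close the pipe, report the outcome."""
    setup = (
        "signal.signal(signal.SIGPIPE, signal.SIG_DFL)"
        if restore_default
        else "pass"
    )
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD.format(setup=setup)],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
    )
    first_line = child.stdout.readline()
    child.stdout.close()  # the downstream reader walks away
    errors = child.stderr.read().decode()
    child.stderr.close()
    return first_line, child.wait(), errors


if __name__ == "__main__":
    for restore in (False, True):
        _, code, errors = run_until_reader_quits(restore)
        print(f"restore_default={restore}: exit={code}, stderr={errors!r}")
```

On a typical Linux system the first run should end with a `BrokenPipeError` traceback on stderr and a non-zero exit status, while the second is terminated quietly by SIGPIPE, as a C filter would be.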
        
       | esafak wrote:
       | This would have been great 10-20 years ago, or even at the
       | coining of Unix pipes. By today's standards, however, the syntax
       | feels clunky and dated. I'd like to see contemporary shells like
       | nushell and elvish copy these ideas, with attribution of course,
       | in a more modern way. That is the best way I can see to honor
       | this stagnant project: https://github.com/dspinellis/dgsh
        
         | hnlmorg wrote:
         | Murex has had this capability for years.
         | (https://github.com/lmorg/murex)
         | 
         | I'm on my phone at the moment and cooking so cannot type any
         | examples, but if I get time, I'll throw together some
         | comparisons later tonight
        
           | esafak wrote:
           | I could not find any mention of DAGs or directed acyclic
           | graphs in the documentation.
        
             | hnlmorg wrote:
             | Yeah it's not technically a DAG since it uses iteration, but
             | then dgsh will use iteration under the hood too.
             | 
             | However Murex does support CSP-style concurrency. So while
             | there's no syntax sugar for writing graphs, you can very
             | easily create adhoc pipes and pass them around instead of
             | using stdout / stderr.
             | 
             | So it wouldn't actually take much to refine that with some
             | DAG-friendly syntax.
             | 
             | In fact maybe that can be my next project...
        
               | DSpinellis wrote:
               | I'm curious: what do you mean by "dgsh will use iteration
               | under the hood too"? Dgsh does several things under the
               | hood, but I wouldn't characterize any of them as
               | iteration.
        
               | hnlmorg wrote:
               | Yes you're right. My apologies. I was glancing at the
               | examples while cooking, specifically the git example
               | (https://www2.dmst.aueb.gr/dds/sw/dgsh/#commit-stats)
               | thinking that it was iterating over the lines output from
               | git, but clearly that's not even how bash would work.
               | That will teach me to comment without giving something
               | my full attention first. Doh!
               | 
               | Looking properly at this, I can see no iteration is
               | needed. Which actually makes the Murex implementation
               | even easier because Murex already has tee pipes just like
               | dgsh. It's just not (yet) particularly well documented.
        
               | DSpinellis wrote:
               | Admiring your multi-tasking!
        
               | em-bee wrote:
               | would you be able to share or point to some examples? i
               | am curious.
        
         | zokier wrote:
         | Well, the project started 12 years ago (as sgsh), so that fits
         | into your 10-20 years ago window :)
        
         | DSpinellis wrote:
         | I went through two iterations before adopting the current
         | syntax. Truth is neither me nor Doug McIlroy, the inventor of
         | Unix pipes, who kindly and generously provided feedback during
         | dgsh's development, had something better to propose.
         | 
         | What syntax would you propose?
        
           | esafak wrote:
           | Greetings, Diomidis.
           | 
           | I would suggest a familiar notation like "[a, b] -> c" in a
           | dedicated dag block:
           | 
           |     dag text_stats {
           |       tee -> [ split_words, count_chars ]
           | 
           |       # word-based frequencies
           |       split_words -> tee_words
           |       tee_words -> ngram2 -> save_digram
           |       tee_words -> ngram3 -> save_trigram
           |       tee_words -> ranked_frequency -> save_words
           | 
           |       # character-based frequencies
           |       count_chars -> add_percentage
           |       chars_to_lines -> ranked_frequency -> add_percentage -> save_chars
           |     }
           |     run text_stats < input.txt
           | 
           | https://www2.dmst.aueb.gr/dds/sw/dgsh/#text-properties
           | 
           | or
           | 
           |     dag commit_graph {
           |       git_log -> filter_recent -> sort -n -> [ uniq_committers, sort_by_email ]
           | 
           |       uniq_committers -> [ last_commit, first_commit, committer_positions ]
           |       [ last_commit, first_commit ] -> cat -> tr '\n' ' ' -> days_between
           |       [ committer_positions, sort_by_email ] -> join_by_email -> sort -k2n -> [ make_bitmap_header, plot_per_day ]
           |       [ uniq_committers, days_between ] -> emit_dims -> plot_per_day
           | 
           |       make_bitmap_header -> cat
           |       plot_per_day -> morphconv -> [ to_png_large, to_png_small ]
           |     }
           |     run commit_graph
           | 
           | https://www2.dmst.aueb.gr/dds/sw/dgsh/#committer-plot
           | 
           | The translations above are computer-assisted and may contain
           | mistakes, but you get the idea.
        
             | DSpinellis wrote:
             | Thank you for the suggestion. This would mean that you'd
             | also then create some mapping from each name (like git_log)
             | to its implementation, right?
        
               | esafak wrote:
               | Yes, using shell functions:
               | 
               |     git_log() {
               |       git log --pretty=tformat:'%at %ae'
               |     }
               | 
               | Separating function definitions allows you to run, test,
               | and re-use them.
        
               | DSpinellis wrote:
               | And, more importantly, assign a name to a process, so
               | that it can appear multiple times in the graph.
        
             | shanemhansen wrote:
             | The closeness of this syntax to Graphviz DOT is very
             | interesting.
             | 
             | Having dgsh output a Graphviz file in dry-run mode would be
             | a neat feature.
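
A rough Python sketch of that idea, assuming the "[a, b] -> c" notation proposed earlier in the thread (a suggestion, not something dgsh implements). The functions `parse_stage`, `parse_edges`, `quote` and `to_dot` are hypothetical names. The parser splits naively on "->" and ",", so commands that contain those characters would need a real tokenizer.

```python
from itertools import product


def parse_stage(text):
    """Return the node names in one stage: 'a' or '[ a, b ]'."""
    text = text.strip()
    if text.startswith("[") and text.endswith("]"):
        return [name.strip() for name in text[1:-1].split(",") if name.strip()]
    return [text]


def parse_edges(body):
    """Turn lines such as 'a -> [ b, c ] -> d' into (source, target) pairs."""
    edges = []
    for line in body.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        stages = [parse_stage(part) for part in line.split("->")]
        for sources, targets in zip(stages, stages[1:]):
            # Every node of one stage feeds every node of the next.
            edges.extend(product(sources, targets))
    return edges


def quote(name):
    """Quote a node name for DOT, escaping backslashes and double quotes."""
    return '"' + name.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_dot(graph_name, body):
    """Render the edge list as a Graphviz digraph."""
    lines = [f"digraph {graph_name} {{"]
    lines += [f"  {quote(src)} -> {quote(dst)};" for src, dst in parse_edges(body)]
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    body = """
    git_log -> filter_recent -> sort -n -> [ uniq_committers, sort_by_email ]
    [ last_commit, first_commit ] -> cat -> days_between
    """
    print(to_dot("commit_graph", body))
```

The resulting text can be fed to Graphviz's `dot` tool to draw the pipeline without running it.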
        
         | o11c wrote:
         | Frankly, I find that anything more than some preparatory `exec
         | {my_fd}< <(commands ...)` is an unmaintainable mess, so bash is
         | plenty for any program that _should_ be implemented in bash.
        
           | DSpinellis wrote:
           | Manually playing around with fds is definitely
           | unmaintainable. My hope is that a clean syntax can help
           | create maintainable complex pipelines.
        
       | politician wrote:
       | A solution to the One Billion Row Challenge (1brc.dev) written in
       | dgsh would be interesting as a benchmark.
        
         | DSpinellis wrote:
         | Nice benchmark! This is a (not at all efficient) awk one-liner.
         | 
         | awk -F\; '!($1 in max) || $2 > max[$1] { max[$1] = $2 }
         | !($1 in min) || $2 < min[$1] { min[$1] = $2 }
         | { sum[$1] += $2; count[$1]++ }
         | END { for (n in sum) printf("%s=%.1f/%.1f/%.1f, ", n, min[n],
         | sum[n] / count[n], max[n]) }'
         | 
         | Can't see how dgsh could be applied to it.
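
For the line-count comparison with Python asked about earlier in the thread, here is a hedged sketch of the same min/mean/max aggregation. It assumes input lines of the form `station;temperature`. The function names `station_stats` and `format_stats` are illustrative, stations are printed in sorted order (awk's `for (n in sum)` order is unspecified), and no attempt is made at efficiency.

```python
import sys


def station_stats(lines):
    """Aggregate 'station;temperature' lines into per-station statistics."""
    stats = {}  # station -> [min, max, sum, count]
    for line in lines:
        line = line.strip()
        if not line:
            continue
        station, _, value = line.partition(";")
        temp = float(value)
        entry = stats.get(station)
        if entry is None:
            stats[station] = [temp, temp, temp, 1]
        else:
            entry[0] = min(entry[0], temp)
            entry[1] = max(entry[1], temp)
            entry[2] += temp
            entry[3] += 1
    return stats


def format_stats(stats):
    """Format as 'name=min/mean/max', comma-separated, sorted by station."""
    return ", ".join(
        f"{name}={low:.1f}/{total / count:.1f}/{high:.1f}"
        for name, (low, high, total, count) in sorted(stats.items())
    )


if __name__ == "__main__":
    print(format_stats(station_stats(sys.stdin)))
```

Like the awk version, this is a single sequential pass over the input with no fan-out, which fits the remark that dgsh has little to add here.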
        
       | dang wrote:
       | Related. Others?
       | 
       |  _Dgsh - Directed Graph Shell_ -
       | https://news.ycombinator.com/item?id=21700014 - Dec 2019 (11
       | comments)
       | 
       |  _Dgsh - Directed graph shell_ -
       | https://news.ycombinator.com/item?id=13352659 - Jan 2017 (51
       | comments)
        
       | byearthithatius wrote:
       | Interesting. What are the benefits of thinking of data pipelines
       | in terms of a DAG? Why can't it be cyclical with exit conditions?
        
         | DSpinellis wrote:
         | A nicer syntax and a lower probability of deadlocks.
        
       ___________________________________________________________________
       (page generated 2025-09-30 23:01 UTC)