[HN Gopher] dgsh - Directed Graph Shell
___________________________________________________________________
Author : pabs3
Score : 116 points
Date : 2025-09-30 13:39 UTC (9 hours ago)
(HTM) web link (www2.dmst.aueb.gr)
| uncletaco wrote:
| Hello. In English this makes me think of the phrase "dog shit".
| Not sure if that's intentional or not.
| pentaphobe wrote:
| Second English speaker here who didn't make that connection at
| all
| nasretdinov wrote:
| English is my third language and I can confirm I didn't even
| think about this
| rirze wrote:
| Same. English is my fourth language and it didn't even
| occur to me.
| DSpinellis wrote:
| Author of dgsh here. This is definitely not what I had in
| mind.
| lucideer wrote:
| Another English speaker data point here & I actually read
| dogshit before I read dgsh.
| DonHopkins wrote:
| That's what I think when I hear "bash".
| kbr2000 wrote:
| batshit?
| dotnetcarpenter wrote:
| That's literally the word for shit in Norwegian: bæsj
| https://upload.wikimedia.org/wikipedia/commons/6/69/Nb-b%C3%...
| goldenCeasar wrote:
| Now I can't unconnect this. I hope OP was aware, because now he
| won't forget it either.
| em-bee wrote:
| and no matter how much i try, i can't make the connection.
| best i can come up with is dogshell, and even that is a
| stretch. phew...
| jimbokun wrote:
| This is very interesting, but I'm wondering how it compares to
| just using a dynamic language like Python or Ruby for the same
| tasks. Curious how the line count to express the same tasks would
| come out.
| PaulHoule wrote:
| There is a lot of stuff for Python which follows the "express
| computation as a dag" approach, especially Apache Airflow
|
| https://airflow.apache.org/
| DSpinellis wrote:
| Apache Airflow solves a very different problem. Its DAGs are
| static dependencies between sequentially executed processing
| steps, whereas the DAGs of dgsh express live direct data
| flows.
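| To make the contrast concrete, a minimal Python sketch of the
| "static dependencies" style, using the standard library's graphlib
| rather than Airflow's actual API: a step starts only after every
| step it depends on has finished, so no data streams between steps
| while they run. The step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical steps: each key depends on the steps in its set.
DEPENDENCIES = {
    "extract": set(),
    "clean": {"extract"},
    "stats": {"clean"},
    "report": {"clean", "stats"},
}


def run_in_dependency_order(deps: dict[str, set[str]]) -> list[str]:
    """Run each step to completion before any step that depends on it."""
    finished = []
    for step in TopologicalSorter(deps).static_order():
        # A real scheduler would launch the step here and wait for it to
        # exit; the next dependent step only ever sees the finished result.
        finished.append(step)
    return finished


if __name__ == "__main__":
    print(run_in_dependency_order(DEPENDENCIES))
    # ['extract', 'clean', 'stats', 'report']
```

| Independent steps could be dispatched in parallel (graphlib's
| get_ready()/done() interface supports that), but a dependent step
| still never overlaps in time with the steps feeding it.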
| PaulHoule wrote:
| Yeah, there are also the boxes and lines tools like
|
| https://www.knime.com/
|
| which have their own subculture. You could solve the same
| problems they do with pandas and scikit-learn but people
| who use those tools would never use pandas and scikit-learn
| and vice versa.
|
| Circa 2015 I was thinking those tools all had the
| architectural flaw that they pass relational rows over the
| lines as opposed to JSON objects (or equivalent) which
| means you had to realize joins as highly complex graphs
| where things that seem like local concerns to me require a
| global structure and where what seems like a little change
| to management changes the whole graph in a big way.
|
| I found the people who were buying up that sort of tools
| didn't give a damn because they thought customers demanded
| the speed of columnar execution which our way couldn't
| deliver.
|
| I made a prototype that gave the right answers every time
| and then went to work for a place which had some luck
| selling their own version that didn't always give the right
| answers because: they didn't know what algebra it
| supported, didn't believe something like that had an
| algebra, and didn't properly tear the pipeline down at the
| end.
| jpitz wrote:
| Do you mean to say that two non-dependent tasks in an
| Airflow DAG aren't able to execute concurrently? That's not
| my experience. I'm also confused by the use of 'static' in
| this context.
| DSpinellis wrote:
| That's the point: non-dependent tasks can run
| concurrently in Airflow. In sh/bash/dgsh, dependent tasks
| can also run concurrently, as in tar cf - . | xz.
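| A rough Python analogue of how dependent stages overlap in
| tar cf - . | xz, assuming threads and a bounded queue stand in for
| the two processes and the pipe between them: the consumer handles
| each item as soon as the producer emits it, instead of waiting for
| the producer to finish.

```python
import queue
import threading

SENTINEL = None  # marks end of stream, like EOF on a pipe


def producer(out_q: queue.Queue, items: list[str]) -> None:
    for item in items:
        out_q.put(item)  # blocks while the queue is full, like a full pipe buffer
    out_q.put(SENTINEL)


def consumer(in_q: queue.Queue, results: list[str]) -> None:
    while (item := in_q.get()) is not SENTINEL:
        results.append(item.upper())  # stand-in for compression or filtering


def run_pipeline(items: list[str]) -> list[str]:
    pipe: queue.Queue = queue.Queue(maxsize=2)  # small buffer forces the stages to overlap
    results: list[str] = []
    stages = [
        threading.Thread(target=producer, args=(pipe, items)),
        threading.Thread(target=consumer, args=(pipe, results)),
    ]
    for t in stages:
        t.start()
    for t in stages:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["a", "b", "c", "d", "e"]))  # ['A', 'B', 'C', 'D', 'E']
```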
| croemer wrote:
| I was curious but the docs are a nightmare. I clicked through
| a couple of pages and couldn't see a single simple non-
| trivial example.
| sunshine-o wrote:
| I respect Python but the upgrade to Python 3 showed that data
| processing workloads that can be handled by standard Unix
| tooling should stay there.
|
| The upgrade was a nightmare for so many organizations. It
| shouldn't be that way but it was.
| procaryote wrote:
| Spawning shell commands and the equivalent of piping is
| surprisingly hard in Python. It's almost easier to do in C.
|
| There are probably libraries that could help, but then you need
| to install dependencies, which is sad in Python for other
| reasons.
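| For reference, the standard library's subprocess module can build a
| two-stage pipe without extra dependencies, although it takes more
| ceremony than a shell's |. A minimal sketch, assuming printf and
| sort are on the PATH:

```python
import subprocess


def shell_pipe(producer: list[str], consumer: list[str]) -> str:
    """Run the equivalent of `producer | consumer`; return consumer's stdout."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.PIPE, text=True)
    # Drop the parent's copy of the pipe's read end so that p1 gets SIGPIPE
    # if p2 exits before reading everything.
    p1.stdout.close()
    output, _ = p2.communicate()
    p1.wait()
    return output


if __name__ == "__main__":
    # Equivalent of: printf 'b\na\nc\n' | sort
    print(shell_pipe(["printf", "b\\na\\nc\\n"], ["sort"]), end="")
```

| The p1.stdout.close() call is the easy-to-forget part; it follows the
| pattern the subprocess documentation gives for replacing a shell
| pipeline.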
| croemer wrote:
| We use snakemake a lot in bioinformatics to take advantage of
| parallelism in workflows while staying close to Python:
| https://github.com/snakemake/snakemake
|
| Others use nextflow but that requires learning Groovy and
| it's less intuitive.
| everforward wrote:
| At a glance, it looks like very similar tradeoffs vs bash.
| Much harder to read in a medium-large application, but much
| more ergonomic IO and process control.
|
| I.e. much faster to use dgsh for a basic processing DAG, much
| more painful to use dgsh for a large ETL pipeline.
|
| Python with something like Prefect isn't something you'd use a
| REPL to bang out a one-off on, but it'd be more maintainable.
| dgsh would let you use a REPL to bang out a quick and dirty
| DAG.
| DSpinellis wrote:
| I've found creating pipelines with Python to be messy and
| unintuitive. Other than creating a DSL to express them I can't
| see how DAGs can be expressed naturally with Python's syntax.
|
| Even creating tools in Python that can be connected together in
| a Unix shell pipeline isn't trivial. By default if a downstream
| program stops processing Python's output you get an unsightly
| broken pipe exception, so you need to execute
| signal.signal(signal.SIGPIPE, signal.SIG_DFL) to avoid this.
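| A minimal sketch of a pipeline-friendly Python filter along those
| lines. The script and function names are hypothetical; the hasattr()
| guard is there because signal.SIGPIPE does not exist on Windows.

```python
import signal
import sys


def restore_default_sigpipe() -> None:
    """Let the OS terminate the process quietly when a downstream reader exits."""
    # SIGPIPE is POSIX-only; the attribute is missing on Windows.
    if hasattr(signal, "SIGPIPE"):
        signal.signal(signal.SIGPIPE, signal.SIG_DFL)


def emit_numbers(limit: int) -> None:
    """Write 0..limit-1, one per line, to standard output."""
    for i in range(limit):
        sys.stdout.write(f"{i}\n")


if __name__ == "__main__":
    restore_default_sigpipe()
    emit_numbers(int(sys.argv[1]) if len(sys.argv) > 1 else 10)
```

| Saved as a hypothetical numbers.py, python3 numbers.py 1000000 | head -n 1
| should print 0 and exit silently; without the signal.signal call,
| Python typically reports a BrokenPipeError instead.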
| esafak wrote:
| This would have been great 10-20 years ago, or even at the
| coining of Unix pipes. By today's standards, however, the syntax
| feels clunky and dated. I'd like to see contemporary shells like
| nushell and elvish copy these ideas, with attribution of course,
| in a more modern way. That is the best way I can see to honor
| this stagnant project: https://github.com/dspinellis/dgsh
| hnlmorg wrote:
| Murex has had this capability for years.
| (https://github.com/lmorg/murex)
|
| I'm on my phone at the moment and cooking so cannot type any
| examples, but if I get time, I'll throw together some
| comparisons later tonight
| esafak wrote:
| I could not find any mention of DAGs or directed acyclic
| graphs in the documentation.
| hnlmorg wrote:
| Yeah, it's not technically a DAG since it uses iteration, but
| then dgsh will use iteration under the hood too.
|
| However Murex does support CSP-style concurrency. So while
| there's no syntax sugar for writing graphs, you can very
| easily create ad hoc pipes and pass them around instead of
| using stdout / stderr.
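| For comparison, roughly what handing an ad hoc pipe to a child
| process looks like in Python (POSIX only; a plain-Python sketch, not
| Murex code): the child writes its main output to stdout and a second
| stream to an extra pipe whose descriptor number it receives as an
| argument.

```python
import os
import subprocess
import sys

# Child program: main output on stdout, secondary output on the fd given in argv[1].
CHILD = (
    "import os, sys\n"
    "os.write(int(sys.argv[1]), b'side channel\\n')\n"
    "print('main output')\n"
)


def run_with_side_channel() -> tuple[str, str]:
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(write_fd)],
        stdout=subprocess.PIPE,
        text=True,
        pass_fds=(write_fd,),  # keep write_fd open, with the same number, in the child
    )
    # Keep only the read end here, so EOF arrives once the child exits.
    os.close(write_fd)
    with os.fdopen(read_fd) as side:
        side_output = side.read()
    main_output, _ = proc.communicate()
    return main_output, side_output


if __name__ == "__main__":
    print(run_with_side_channel())  # ('main output\n', 'side channel\n')
```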
|
| So it wouldn't actually take much to refine that with some
| DAG-friendly syntax.
|
| In fact maybe that can be my next project...
| DSpinellis wrote:
| I'm curious: what do you mean by "dgsh will use iteration
| under the hood too"? Dgsh does several things under the
| hood, but I wouldn't characterize any of them as
| iteration.
| hnlmorg wrote:
| Yes you're right. My apologies. I was glancing at the
| examples while cooking, specifically the git example
| (https://www2.dmst.aueb.gr/dds/sw/dgsh/#commit-stats)
| thinking that it was iterating over the lines output from
| git, but clearly that's not even how bash would work.
| That will teach me for commenting without giving
| something my full attention first doh!
|
| Looking properly at this, I can see no iteration is
| needed. Which actually makes the Murex implementation
| even easier because Murex already has tee pipes just like
| dgsh. It's just not (yet) particularly well documented.
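| For readers unfamiliar with tee pipes: one stream is copied to
| several consumers that run at the same time. A minimal Python sketch,
| assuming threads and queues stand in for processes and pipes (it is
| not how Murex or dgsh implement this), fanning lines out to a word
| counter and a character counter:

```python
import queue
import threading

END = None  # end-of-stream marker


def tee(lines: list[str], outputs: list[queue.Queue]) -> None:
    """Copy every line to every output queue, then signal end of stream."""
    for line in lines:
        for q in outputs:
            q.put(line)
    for q in outputs:
        q.put(END)


def count_words(q: queue.Queue, result: dict) -> None:
    total = 0
    while (line := q.get()) is not END:
        total += len(line.split())
    result["words"] = total


def count_chars(q: queue.Queue, result: dict) -> None:
    total = 0
    while (line := q.get()) is not END:
        total += len(line)
    result["chars"] = total


def text_stats(lines: list[str]) -> dict:
    word_q: queue.Queue = queue.Queue()
    char_q: queue.Queue = queue.Queue()
    result: dict = {}
    threads = [
        threading.Thread(target=tee, args=(lines, [word_q, char_q])),
        threading.Thread(target=count_words, args=(word_q, result)),
        threading.Thread(target=count_chars, args=(char_q, result)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result


if __name__ == "__main__":
    print(text_stats(["hello world", "dgsh tee"]))
```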
| DSpinellis wrote:
| Admiring your multi-tasking!
| em-bee wrote:
| would you be able to share or point to some examples? i
| am curious.
| zokier wrote:
| Well, the project started 12 years ago (as sgsh), so that fits
| into your 10-20 years ago window :)
| DSpinellis wrote:
| I went through two iterations before adopting the current
| syntax. Truth is, neither I nor Doug McIlroy, the inventor of
| Unix pipes, who kindly and generously provided feedback during
| dgsh's development, had anything better to propose.
|
| What syntax would you propose?
| esafak wrote:
| Greetings, Diomidis.
|
| I would suggest a familiar notation like "[a, b] -> c" in a
| dedicated dag block:
|
|   dag text_stats {
|     tee -> [ split_words, count_chars ]
|     # word-based frequencies
|     split_words -> tee_words
|     tee_words -> ngram2 -> save_digram
|     tee_words -> ngram3 -> save_trigram
|     tee_words -> ranked_frequency -> save_words
|     # character-based frequencies
|     count_chars -> add_percentage
|     chars_to_lines -> ranked_frequency -> add_percentage -> save_chars
|   }
|   run text_stats < input.txt
|
| https://www2.dmst.aueb.gr/dds/sw/dgsh/#text-properties
|
| or
|
|   dag commit_graph {
|     git_log -> filter_recent -> sort -n -> [ uniq_committers, sort_by_email ]
|     uniq_committers -> [ last_commit, first_commit, committer_positions ]
|     [ last_commit, first_commit ] -> cat -> tr '\n' ' ' -> days_between
|     [ committer_positions, sort_by_email ] -> join_by_email -> sort -k2n -> [ make_bitmap_header, plot_per_day ]
|     [ uniq_committers, days_between ] -> emit_dims -> plot_per_day
|     make_bitmap_header -> cat
|     plot_per_day -> morphconv -> [ to_png_large, to_png_small ]
|   }
|   run commit_graph
|
| https://www2.dmst.aueb.gr/dds/sw/dgsh/#committer-plot
|
| The translations above are computer-assisted and may contain
| mistakes, but you get the idea.
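| As a rough illustration of how this notation could become a graph,
| a minimal Python parser for single-line chains such as
| [a, b] -> c -> [d, e]. It is hypothetical: it treats everything
| between arrows as one node name, does not handle the dag { ... }
| wrapper or # comments, and connects each node in one group to each
| node in the next.

```python
def parse_group(token: str) -> list[str]:
    """'[a, b]' -> ['a', 'b'];  'c' -> ['c']"""
    token = token.strip()
    if token.startswith("[") and token.endswith("]"):
        return [name.strip() for name in token[1:-1].split(",") if name.strip()]
    return [token]


def parse_chain(line: str) -> list[tuple[str, str]]:
    """Turn 'a -> [b, c] -> d' into edges a->b, a->c, b->d, c->d."""
    groups = [parse_group(part) for part in line.split("->")]
    edges = []
    for sources, targets in zip(groups, groups[1:]):
        edges.extend((s, t) for s in sources for t in targets)
    return edges


if __name__ == "__main__":
    print(parse_chain("tee -> [ split_words, count_chars ]"))
    # [('tee', 'split_words'), ('tee', 'count_chars')]
```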
| DSpinellis wrote:
| Thank you for the suggestion. This would mean that you'd
| also then create some mapping from each name (like git_log)
| to its implementation, right?
| esafak wrote:
| Yes, using shell functions:
|
|   git_log() {
|     git log --pretty=tformat:'%at %ae'
|   }
|
| Separating function definitions allows you to run, test,
| and re-use them.
| DSpinellis wrote:
| And, more importantly, assign a name to a process, so
| that it can appear multiple times in the graph.
| shanemhansen wrote:
| The closeness of this syntax to graphviz dot is very
| interesting.
|
| Having dgsh output a Graphviz file in dry-run mode would be
| a neat feature.
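| A dry-run Graphviz export could be little more than printing each
| edge in DOT syntax. A minimal Python sketch, assuming the graph is
| already available as (source, target) pairs; to_dot is a hypothetical
| helper, not a dgsh option, and the edges are a fragment of the
| text_stats example above.

```python
def quote(name: str) -> str:
    """Quote a node name for DOT; only double quotes need escaping there."""
    return '"' + name.replace('"', '\\"') + '"'


def to_dot(graph_name: str, edges: list[tuple[str, str]]) -> str:
    """Render (source, target) pairs as a Graphviz digraph."""
    lines = [f"digraph {quote(graph_name)} {{"]
    for source, target in edges:
        lines.append(f"  {quote(source)} -> {quote(target)};")
    lines.append("}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical fragment of the text_stats graph.
    edges = [("tee", "split_words"), ("tee", "count_chars"), ("split_words", "tee_words")]
    print(to_dot("text_stats", edges))
```

| The output can then be rendered with, for example, dot -Tsvg.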
| o11c wrote:
| Frankly, I find that anything more than some preparatory `exec
| {my_fd}< <(commands ...)` is an unmaintainable mess, so bash is
| plenty for any program that _should_ be implemented in bash.
| DSpinellis wrote:
| Manually playing around with fds is definitely
| unmaintainable. My hope is that a clean syntax can help
| create maintainable complex pipelines.
| politician wrote:
| A solution to the One Billion Row Challenge (1brc.dev) written in
| dgsh would be interesting as a benchmark.
| DSpinellis wrote:
| Nice benchmark! This is a (not at all efficient) awk one-liner.
|
|   awk -F\; '
|     !($1 in max) || $2 > max[$1] { max[$1] = $2 }
|     !($1 in min) || $2 < min[$1] { min[$1] = $2 }
|     { sum[$1] += $2; count[$1]++ }
|     END {
|       for (n in sum)
|         printf("%s=%.1f/%.1f/%.1f, ", n, min[n], sum[n] / count[n], max[n])
|     }'
|
| Can't see how dgsh could be applied to it.
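| For comparison with the awk version, the same per-station
| min/mean/max aggregation in plain Python: a sketch that reads
| station;temperature lines and makes no attempt to be fast. Unlike
| the awk one-liner, it sorts the output by station name.

```python
from collections.abc import Iterable


def aggregate(lines: Iterable[str]) -> dict[str, tuple[float, float, float]]:
    """Map each station to (min, mean, max) from 'station;temperature' lines."""
    stats: dict[str, list] = {}  # station -> [min, max, sum, count]
    for line in lines:
        station, _, value = line.rstrip("\n").partition(";")
        temp = float(value)
        s = stats.get(station)
        if s is None:
            # The first reading initialises min and max, so all-negative stations work.
            stats[station] = [temp, temp, temp, 1]
        else:
            s[0] = min(s[0], temp)
            s[1] = max(s[1], temp)
            s[2] += temp
            s[3] += 1
    return {name: (s[0], s[2] / s[3], s[1]) for name, s in stats.items()}


def format_results(results: dict[str, tuple[float, float, float]]) -> str:
    """Render as {name=min/mean/max, ...}, sorted by station name."""
    parts = (
        f"{name}={lo:.1f}/{mean:.1f}/{hi:.1f}"
        for name, (lo, mean, hi) in sorted(results.items())
    )
    return "{" + ", ".join(parts) + "}"


if __name__ == "__main__":
    # In real use the lines would come from a file or sys.stdin.
    sample = ["Abha;-1.0", "Abha;3.0", "Bonn;2.5"]
    print(format_results(aggregate(sample)))  # {Abha=-1.0/1.0/3.0, Bonn=2.5/2.5/2.5}
```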
| dang wrote:
| Related. Others?
|
| _Dgsh - Directed Graph Shell_ -
| https://news.ycombinator.com/item?id=21700014 - Dec 2019 (11
| comments)
|
| _Dgsh - Directed graph shell_ -
| https://news.ycombinator.com/item?id=13352659 - Jan 2017 (51
| comments)
| byearthithatius wrote:
| Interesting. What are the benefits of thinking of data pipelines
| in terms of a DAG? Why can't it be cyclical with exit conditions?
| DSpinellis wrote:
| A nicer syntax and a lower probability of deadlocks.
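| One practical upside of requiring acyclic graphs is that a tool can
| reject a bad graph before starting any process. A minimal Python
| sketch of such a check using the standard library's graphlib;
| find_cycle is a hypothetical name, not part of dgsh:

```python
from graphlib import CycleError, TopologicalSorter
from typing import Optional


def find_cycle(edges: list[tuple[str, str]]) -> Optional[list[str]]:
    """Return None if the edges form a DAG, otherwise one offending cycle."""
    sorter: TopologicalSorter = TopologicalSorter()
    for source, target in edges:
        sorter.add(target, source)  # target consumes source's output
    try:
        sorter.prepare()  # raises CycleError if the graph is not acyclic
    except CycleError as err:
        # graphlib puts the cycle in args[1]; its first and last nodes are the same.
        return err.args[1]
    return None


if __name__ == "__main__":
    print(find_cycle([("a", "b"), ("b", "c")]))  # None
    print(find_cycle([("a", "b"), ("b", "c"), ("c", "a")]))
```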
___________________________________________________________________
(page generated 2025-09-30 23:01 UTC)