[HN Gopher] Reflow, a language for distributed, incremental data...
___________________________________________________________________
Reflow, a language for distributed, incremental data processing in
the cloud
Author : krab
Score : 67 points
Date : 2021-05-18 04:49 UTC (2 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| all2 wrote:
| From the README:
|
| Reflow comprises:
|
| - a functional, lazy, type-safe domain specific language for
| writing workflow programs;
|
| - a runtime for evaluating Reflow programs incrementally,
| coordinating cluster execution, and memoizing results transparently;
|
| - a cluster scheduler to dynamically provision and tear down
| resources from a cloud provider (AWS currently supported).
|
| and
|
| Reflow was designed to support sophisticated, large-scale
| bioinformatics workflows, but should be widely applicable to
| scientific and engineering computing workloads. It was built
| using Go.
|
| Reflow joins a long list of systems designed to tackle
| bioinformatics workloads, but differs from these in important
| ways:
|
| - it is a vertically integrated system with a minimal set of
| external dependencies; this allows Reflow to be "plug-and-play":
| bring your cloud credentials, and you're off to the races;
|
| - it defines a strict data model which is used for transparent
| memoization and other optimizations;
|
| - it takes workflow software seriously: the Reflow DSL provides
| type checking, modularity, and other constructors that are
| commonplace in general purpose programming languages; because of
| its high level data model and use of caching, Reflow computes
| incrementally: it is always able to compute the smallest set of
| operations given what has been computed previously.
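
The incremental behavior described in the README excerpt above can be
illustrated with a minimal sketch. This is not Reflow's implementation
or API: `MemoRunner`, `pipeline`, and the step names are hypothetical,
and the sketch only assumes that each step's result is cached under a
digest of the step name and its inputs, so that a rerun executes only
the steps whose inputs changed.

```python
import hashlib
import json


class MemoRunner:
    """Toy evaluator (hypothetical, not Reflow): caches each step's
    result under a digest of the step name and its inputs."""

    def __init__(self):
        self.cache = {}     # digest -> cached result
        self.executed = []  # names of steps that actually ran

    def key(self, name, inputs):
        # Inputs must be JSON-serializable in this sketch.
        payload = json.dumps({"step": name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def run(self, name, func, inputs):
        k = self.key(name, inputs)
        if k not in self.cache:
            # Cache miss: execute the step and remember its result.
            self.executed.append(name)
            self.cache[k] = func(*inputs)
        return self.cache[k]


def pipeline(runner, samples):
    # One "count" step per sample, then a "total" step over all counts.
    counts = [runner.run("count", len, [s]) for s in samples]
    return runner.run("total", lambda *xs: sum(xs), counts)


if __name__ == "__main__":
    runner = MemoRunner()
    pipeline(runner, ["ACGT", "GG"])
    runner.executed.clear()
    # Adding one sample reruns only the new "count" step and "total".
    pipeline(runner, ["ACGT", "GG", "T"])
    print(runner.executed)
```

In this sketch, adding a third sample to a previously computed run
executes one new "count" step and the "total" step; the two earlier
"count" results come from the cache.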
| dannykwells wrote:
| If you're into workflow runners, Reflow and Cromwell
| (https://github.com/broadinstitute/cromwell) are really the only
| two to consider. Having tried them all, I find these two by far
| the best and most supported (and there are 100s!)
|
| Cromwell is great because it is Google Cloud native and supported
| within the Terra ecosystem (https://app.terra.bio/), meaning you
| do not need to host it yourself - you can just connect your
| Google account and go.
|
| Reflow, I've heard, is a little more "professional" given that
| the Grail team is heavily ex-Google. But both can scale to
| massively parallel workloads (1000+ parallel analyses).
| aednichols wrote:
| Thanks for the shoutout. Cromwell/Terra developer here in an
| informal capacity, can answer Qs.
| epistasis wrote:
| Interesting to see Grail share this, I'm excited to try it out.
|
| I'm perpetually unsatisfied with bioinformatics workflow
| software. Snakemake and GNU make remain my favorites so far in
| terms of developing novel analysis. However, making GNU make into
| a reusable pipeline always feels like an awful and ugly hack. And
| GNU make requires a shared file system among nodes, which is
| problematic on AWS...
|
| This seems to have potential both for recording the steps for
| reproducible science and for easily turning that set of steps
| into a reusable pipeline.
| fwip wrote:
| My personal favorite is Nextflow (http://nextflow.io/). Quick
| to start up a one-off script in, and it's ready to run in
| production without too much tweaking.
|
| Edit: I especially appreciate the wide range of supported
| systems for both dependency management (running the gamut from
| GNU modules or conda to docker/singularity containers) and
| execution environments (local, SLURM, SGE, AWS, Azure, etc.)
| The_Amp_Walrus wrote:
| Is Reflow Go-only?
|
| Is the .rf file format a DSL or an existing language?
| prb2 wrote:
| Reflow is implemented in Go, but it can be used to run programs
| in any language.
|
| The Reflow language (.rf files) is a DSL; it is described in
| more detail here:
| https://github.com/grailbio/reflow/blob/master/LANGUAGE.md
| mariusae wrote:
| If you want to work with Go, check out bigslice [1] (and
| bigmachine [2]), which is built on a similar architecture.
|
| [1] https://github.com/grailbio/bigslice/
| [2] https://github.com/grailbio/bigmachine
___________________________________________________________________
(page generated 2021-05-20 23:00 UTC)