[HN Gopher] Show HN: Mandala - Automatically save, query and ver...
___________________________________________________________________
Show HN: Mandala - Automatically save, query and version Python
computations
`mandala` is a framework I wrote to automate tracking ML
experiments for my research. It differs from other experiment
tracking tools by making persistence, query and versioning logic a
generic part of the programming language itself, as opposed to an
external logging tool you must learn and adapt to. The goal is to
be able to write expressive computational code without thinking
about persistence (like in an interactive session), and still have
the full benefits of versioned, queryable storage afterwards.

Surprisingly, it turns out that this vision can pretty much be
achieved with two generic tools:

1. A memoization+versioning decorator, `@op`, which tracks inputs,
   outputs, code and runtime dependencies (other functions called,
   or global variables accessed) every time a function is called.
   Essentially, this makes function calls replace logging: if you
   want something saved, you write a function that returns it.
   Using (a lot of) hashing, `@op` ensures that the same version of
   the function is never executed twice on the same inputs.
   Importantly, the decorator encourages/enforces composition.
   Before a call, `@op` functions wrap their inputs in special
   objects, `Ref`s, and return `Ref`s in turn. Furthermore, data
   structures can be made transparent to `@op`s, so that an `@op`
   can be called on a list of outputs of other `@op`s, or on an
   element of the output of another `@op`. This creates an
   expressive "web" of `@op` calls over time.

2. A data structure, `ComputationFrame`, which can automatically
   organize any such web of `@op` calls into a high-level view, by
   grouping calls with a similar role into "operations", and their
   inputs/outputs into "variables". It can detect "imperative"
   patterns - like feedback loops, branching/merging, and grouping
   multiple results in a single object - and surface them in the
   graph. `ComputationFrame`s are a "synthesis" of computation
   graphs and relational databases, and can be automatically
   "exported" as dataframes, where columns are variables and
   operations in the graph, and rows contain values and calls for
   (possibly partial) executions of the graph. The upshot is that
   you can query the relationships between any variables in a
   project in one line, even in the presence of very heterogeneous
   patterns in the graph.

I'm very excited about this project - which is still in an alpha
version being actively developed - and especially about the
`ComputationFrame` data structure. I'd love to hear feedback from
the HN community.

Colab quickstart:
https://colab.research.google.com/github/amakelov/mandala/bl...
Blog post introducing `ComputationFrame`s (can be opened in Colab
too): https://amakelov.github.io/mandala/blog/01_cf/
Docs: https://amakelov.github.io/mandala/
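
To make the memoization idea in point 1 concrete, here is a minimal
standard-library sketch of a decorator that keys each call on a hash
of the function's code and its inputs. It is not mandala's
implementation or API: `toy_op`, `CACHE` and `CALL_LOG` are
hypothetical names, and it assumes picklable inputs and an in-memory
dict instead of persistent storage. It also skips `Ref` wrapping and
dependency tracking entirely.

```python
import hashlib
import pickle
from functools import wraps

CACHE = {}      # call key -> saved result (stand-in for real storage)
CALL_LOG = []   # one record per call that was actually executed


def toy_op(fn):
    """Toy memoizing decorator (hypothetical; not mandala's `@op`)."""
    # Fingerprint the compiled body so that editing the function
    # changes the key. Real versioning would also cover dependencies.
    code = fn.__code__
    version = hashlib.sha256(
        code.co_code + repr(code.co_consts).encode()
    ).hexdigest()

    @wraps(fn)
    def wrapper(*args, **kwargs):
        # Assumption: inputs are picklable, and pickling is stable
        # enough to act as a content hash for this illustration.
        payload = pickle.dumps(
            (fn.__name__, version, args, sorted(kwargs.items()))
        )
        key = hashlib.sha256(payload).hexdigest()
        if key not in CACHE:
            CACHE[key] = fn(*args, **kwargs)
            CALL_LOG.append(
                {"op": fn.__name__, "args": args, "output": CACHE[key]}
            )
        return CACHE[key]

    return wrapper


EXECUTIONS = []  # records every time the body of `square` really runs


@toy_op
def square(x):
    EXECUTIONS.append(x)
    return x * x


@toy_op
def total(values):
    return sum(values)


squares = [square(x) for x in (1, 2, 3)]
squares_again = [square(x) for x in (1, 2, 3)]  # served from CACHE
result = total(squares)  # composes: outputs of one op feed another
```

In this sketch the second list comprehension never re-runs `square`,
and `CALL_LOG` ends up as a flat record of the executed calls, which
is the kind of raw material a higher-level view would organize.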
Author : amakelov
Score : 16 points
Date : 2024-07-11 20:10 UTC (2 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
___________________________________________________________________
(page generated 2024-07-11 23:00 UTC)