[HN Gopher] Snapshottable Stores
___________________________________________________________________
Snapshottable Stores
Author : matt_d
Score : 58 points
Date : 2024-09-04 21:18 UTC (4 days ago)
(HTM) web link (dl.acm.org)
(TXT) w3m dump (dl.acm.org)
| jauntywundrkind wrote:
| This is the way.
|
| We already have computing starting to log itself clearly via
| OpenTelemtry. This feels like only a small jump, to recording
| your processing just a little bit better.
|
| There used to be a background rumbling about hardware
| trasnsactional memory. But it never seemed like we needed
| hardware, we always just needed better state snapshotting as we
| go.
| matt_d wrote:
| Abstract:
|
| "We say that an imperative data structure is snapshottable or
| supports snapshots if we can efficiently capture its current
| state, and restore a previously captured state to become the
| current state again. This is useful, for example, to implement
| backtracking search processes that update the data structure
| during search.
|
| Inspired by a data structure proposed in 1978 by Baker, we
| present a snapshottable store, a bag of mutable references that
| supports snapshots. Instead of capturing and restoring an array,
| we can capture an arbitrary set of references (of any type) and
| restore all of them at once. This snapshottable store can be used
| as a building block to support snapshots for arbitrary data
| structures, by simply replacing all mutable references in the
| data structure by our store references. We present use-cases of a
| snapshottable store when implementing type-checkers and automated
| theorem provers.
|
| Our implementation is designed to provide a very low overhead
| over normal references, in the common case where the
| capture/restore operations are infrequent. Read and write in
| store references are essentially as fast as in plain references
| in most situations, thanks to a key optimisation we call record
| elision. In comparison, the common approach of replacing
| references by integer indices into a persistent map incurs a
| logarithmic overhead on reads and writes, and sophisticated
| algorithms typically impose much larger constant factors.
|
| The implementation, which is inspired by Baker's and the OCaml
| implementation of persistent arrays by Conchon and Filliatre, is
| both fairly short and very hard to understand: it relies on
| shared mutable state in subtle ways. We provide a mechanized
| proof of correctness of its core using the Iris framework for the
| Coq proof assistant."
|
| Motivating example:
|
| "Union-Find is a central data structure in several algorithms.
| For example, it is at the core of ML type inference, which
| proceeds by repeated unification between type variables. Union-
| Find can also be used to track equalities between type
| constructors, as introduced in the typing environment when type-
| checking Guarded Algebraic Data Types (GADTs) for example.
|
| When using a Union-Find data structure to implement a type
| system, it is common to need backtracking, which requires the
| inference state to be snapshottable. For example:
|
| (1) A single unification between two types during ML type
| inference translates into several unifications between type
| variables, traversing the structure of the two types. If we
| discover that the two types are in fact incompatible, we fail
| with a type error. However, we may want to revert the
| unifications that were already performed, so that the error
| message shown to the user does not include confusing signs of
| being halfway through the unification, or so that the interactive
| toplevel session can continue in a clean environment.
|
| (2) Production languages unfortunately have to consider
| backtracking to implement certain less principled typing rules:
| try A, and if it fails revert to a clean state and try B instead.
|
| (3) GADT equations are only added to the typing environment in
| the context of a given match clause, and must then be rolled back
| before checking the other clauses.
|
| We have encountered requirements (1) and (2) in the
| implementation of the OCaml type-checker, and (1) and (3) in the
| development of Inferno [Pottier, 2014], a prototype type-
| inference library implemented in OCaml that aims to be efficient.
|
| Now a question for the reader: how would you change the Union-
| Find implementation above to support snapshots?"
| kragen wrote:
| this is a very nice result and promises to be very useful, if i
| am understanding it correctly. i was skeptical that they could
| find anything new to say in 02024 about such a well-studied
| problem, but the paper surprised me; they do. in fact there are
| many delightful things in this paper, starting with their clean
| and general abstract interface for union-find! (even though
| union-find is just their motivating example rather than anything
| central to the paper)
|
| in essence, i think their proposal is to journal all the updates
| to your mutable store, in much the same way that transactional
| memory does, so that you can travel to any past or future state
| by replaying the journal of updates; if you change the past, you
| cause the timeline to diverge, but you don't lose the ability to
| travel back to the original timeline, because you can journal-
| replay your way back to the divergence point. unlike
| transactional memory, they don't do anything special on reads of
| mutable state, just writes. because they're testing on algorithms
| that mutate state sparingly, they get very good efficiency.
| moreover, their comparison is with the same algorithm using plain
| ocaml references, but ocaml's write barrier makes it expensive to
| mutate state anyway, so their efficiency looks even better
|
| they avoid storing redundant journal entries between snapshots by
| using a generation counter that counts snapshots, so repeatedly
| mutating the same reference doesn't keep allocating memory; it
| just repeatedly checks the generation counter
|
| i wonder to what extent you can combine their algorithm with umut
| acar's 'self-adjusting computation' to get a generic, efficient
| way of converting an imperative program into an incremental
| dataflow program?
|
| perhaps inspired by union-find, their tree of store states is
| made out of only parent pointers, so you can walk up the tree
| (from a snapshot) to the root, but never back down. (a new child
| node is constructed for each parent when you do this.) this
| allows garbage collection to work on saved snapshots! however, a
| live snapshot can still keep alive an otherwise unreferenced
| mutable reference, even though there's no way for the user of the
| snapshottable store to access it
|
| they credit fully fp-persistent shallow binding to henry baker. i
| don't think that's fair, much as i appreciate his work; baker
| himself credits shallow binding to maclisp, citing moon's maclisp
| reference manual, and the more capable version he describes to
| greenblatt's lispm, each from four years earlier. and i don't
| think moon or greenblatt would claim full credit for it either,
| nor would they claim it only dated from 01974; what was new in
| greenblatt's design was that it could handle lexical scoping and
| upward funargs, while older implementations of shallow binding
| could only handle dynamic scoping and downward funargs. this is
| the difference between 'persistence' and 'semi-persistence' in
| the terminology of allain et al., as they explain on page 4. to a
| great extent what baker was doing was explaining the technique
| rather than claiming credit for it
|
| this reminds me of a section from the mother of all multics emacs
| papers (https://www.multicians.org/mepap.html), by bernie
| greenberg, about the first lisp emacs, which was written in
| maclisp
|
| > _The implementation of multiple buffers was viewed as a task of
| multiplexing the extant function of the editor over several
| buffers. The buffer being edited is defined by about two dozen
| Lisp variables of the basic editor, identifying the current
| Editorline, its current (open /closed) state, the first and last
| Editorlines of the buffer, the list of marks, and so forth.
| Switching buffers (i.e., switching the attention of the editor,
| as the user sees it) need consist only of switching the values of
| all of these variables. Neither the interactive driver nor the
| redisplay need be cognizant of the existence of multiple buffers;
| the redisplay will simply find that a different "current
| Editorline" exists if buffers are switched between calls to it.
| What is more, the only functions in the basic editor that have to
| be aware of the existence of multiple buffers are those that deal
| with many buffers, switch them, etc. All other code simply
| references the buffer state variables, and operates upon the
| current buffer._
|
| > _The function in the basic editor which implements the command
| that switches buffers does so by saving up the values of all of
| the relevant Lisp variables, that define the buffer, and placing
| a saved image (a list of their values) as a property of the Lisp
| symbol whose name is the current buffer's. The similarly saved
| list of the target buffer's is retrieved, and the contained
| values of the buffer state variables instated. A new buffer is
| created simply by replacing the "instatement" step with
| initialization of the state variables to default values for an
| empty buffer. Buffer destruction is accomplished simply by
| removing the saved state embedded as a property: all pointers to
| the buffer will vanish thereby, and the MacLisp garbage collector
| will take care of the rest._
|
| > _The alternate approach to multiple buffers would have been to
| have the buffer state variables referenced indirectly through
| some pointer which is simply replaced to change buffers. This
| approach, in spite of not being feasible in Lisp, is less
| desirable than the current approach, for it distributes cost at
| variable reference time, not buffer-switching time, and the
| former is much more common._
|
| > _One of the most interesting per-buffer state variables is
| itself a list of arbitrary variables placed there by extension
| code. Extension code can register variables by a call to an
| appropriate primitive in the basic editor. The values of all such
| variables registered in a given buffer will be saved and restored
| when that buffer is exited and re-entered. The ability of Lisp to
| treat a variable as a run-time object facilitates this. Variables
| can thus be given "per-buffer" dynamic scope on demand, allowing
| extensions to operate in many buffers simultaneously using the
| same code and the same variables, in an analogous fashion to the
| way Multics programs can be executing in many processes
| simultaneously._
|
| to some extent this reflects the computer architectures that were
| current at the time (nowadays indexing off a base pointer in a
| register is usually faster than accessing memory at an absolute
| address) but the overall principle is still applicable; it just
| manifests in different ways
|
| but of course what allain et al. have implemented is much more
| efficient at switching contexts, because it only updates the
| references that are different between the two contexts
|
| this paper is timely for me because, entirely by coincidence, i
| implemented dynamic scoping with shallow binding in ansi forth
| last week: : (let!) dup @ over swap 2r> rot >r
| rot >r >r >r ! ; : let! (let!) 2r> ! ;
|
| short example usage, a word to print a number in decimal
| regardless of what the current _base_ is:
| decimal : dec. 10 base let! . ;
|
| because this stores the saved value on the return stack, it is
| 'semi-persistent' in the terms of this paper; once you have
| undone an update, you cannot redo it
|
| longer example: variable src variable dest
| variable stor variable n defer move-disc
| : text-disc cr ." Move disc " n @ . ." from " src @
| emit ." to " dest @ emit ; ' text-disc is move-disc
| : hanoi ( src stor dest n -- ) n let! dest let! stor
| let! src let! n @ 0= if exit then src @
| dest @ stor @ n @ 1- recurse move-disc
| stor @ src @ dest @ n @ 1- recurse ; char A char
| B char C 4 hanoi
|
| this code should surprise you if you are familiar with the
| conventional wisdom that standard forth doesn't have local
| variables!
|
| (it's a pretty inefficient implementation though)
|
| and, also entirely by coincidence, i implemented union-find in
| forth a couple of weeks earlier than that:
| https://asciinema.org/a/672405
| eisbaw wrote:
| Abstract reads alot like the Nix store.
___________________________________________________________________
(page generated 2024-09-08 23:00 UTC)