[HN Gopher] Snapshottable Stores
       ___________________________________________________________________
        
       Snapshottable Stores
        
       Author : matt_d
       Score  : 58 points
       Date   : 2024-09-04 21:18 UTC (4 days ago)
        
 (HTM) web link (dl.acm.org)
 (TXT) w3m dump (dl.acm.org)
        
       | jauntywundrkind wrote:
       | This is the way.
       | 
       | We already have computing starting to log itself clearly via
       | OpenTelemtry. This feels like only a small jump, to recording
       | your processing just a little bit better.
       | 
       | There used to be a background rumbling about hardware
       | trasnsactional memory. But it never seemed like we needed
       | hardware, we always just needed better state snapshotting as we
       | go.
        
       | matt_d wrote:
       | Abstract:
       | 
       | "We say that an imperative data structure is snapshottable or
       | supports snapshots if we can efficiently capture its current
       | state, and restore a previously captured state to become the
       | current state again. This is useful, for example, to implement
       | backtracking search processes that update the data structure
       | during search.
       | 
       | Inspired by a data structure proposed in 1978 by Baker, we
       | present a snapshottable store, a bag of mutable references that
       | supports snapshots. Instead of capturing and restoring an array,
       | we can capture an arbitrary set of references (of any type) and
       | restore all of them at once. This snapshottable store can be used
       | as a building block to support snapshots for arbitrary data
       | structures, by simply replacing all mutable references in the
       | data structure by our store references. We present use-cases of a
       | snapshottable store when implementing type-checkers and automated
       | theorem provers.
       | 
       | Our implementation is designed to provide a very low overhead
       | over normal references, in the common case where the
       | capture/restore operations are infrequent. Read and write in
       | store references are essentially as fast as in plain references
       | in most situations, thanks to a key optimisation we call record
       | elision. In comparison, the common approach of replacing
       | references by integer indices into a persistent map incurs a
       | logarithmic overhead on reads and writes, and sophisticated
       | algorithms typically impose much larger constant factors.
       | 
       | The implementation, which is inspired by Baker's and the OCaml
       | implementation of persistent arrays by Conchon and Filliatre, is
       | both fairly short and very hard to understand: it relies on
       | shared mutable state in subtle ways. We provide a mechanized
       | proof of correctness of its core using the Iris framework for the
       | Coq proof assistant."
       | 
       | Motivating example:
       | 
       | "Union-Find is a central data structure in several algorithms.
       | For example, it is at the core of ML type inference, which
       | proceeds by repeated unification between type variables. Union-
       | Find can also be used to track equalities between type
       | constructors, as introduced in the typing environment when type-
       | checking Guarded Algebraic Data Types (GADTs) for example.
       | 
       | When using a Union-Find data structure to implement a type
       | system, it is common to need backtracking, which requires the
       | inference state to be snapshottable. For example:
       | 
       | (1) A single unification between two types during ML type
       | inference translates into several unifications between type
       | variables, traversing the structure of the two types. If we
       | discover that the two types are in fact incompatible, we fail
       | with a type error. However, we may want to revert the
       | unifications that were already performed, so that the error
       | message shown to the user does not include confusing signs of
       | being halfway through the unification, or so that the interactive
       | toplevel session can continue in a clean environment.
       | 
       | (2) Production languages unfortunately have to consider
       | backtracking to implement certain less principled typing rules:
       | try A, and if it fails revert to a clean state and try B instead.
       | 
       | (3) GADT equations are only added to the typing environment in
       | the context of a given match clause, and must then be rolled back
       | before checking the other clauses.
       | 
       | We have encountered requirements (1) and (2) in the
       | implementation of the OCaml type-checker, and (1) and (3) in the
       | development of Inferno [Pottier, 2014], a prototype type-
       | inference library implemented in OCaml that aims to be efficient.
       | 
       | Now a question for the reader: how would you change the Union-
       | Find implementation above to support snapshots?"
        
       | kragen wrote:
       | this is a very nice result and promises to be very useful, if i
       | am understanding it correctly. i was skeptical that they could
       | find anything new to say in 02024 about such a well-studied
       | problem, but the paper surprised me; they do. in fact there are
       | many delightful things in this paper, starting with their clean
       | and general abstract interface for union-find! (even though
       | union-find is just their motivating example rather than anything
       | central to the paper)
       | 
       | in essence, i think their proposal is to journal all the updates
       | to your mutable store, in much the same way that transactional
       | memory does, so that you can travel to any past or future state
       | by replaying the journal of updates; if you change the past, you
       | cause the timeline to diverge, but you don't lose the ability to
       | travel back to the original timeline, because you can journal-
       | replay your way back to the divergence point. unlike
       | transactional memory, they don't do anything special on reads of
       | mutable state, just writes. because they're testing on algorithms
       | that mutate state sparingly, they get very good efficiency.
       | moreover, their comparison is with the same algorithm using plain
       | ocaml references, but ocaml's write barrier makes it expensive to
       | mutate state anyway, so their efficiency looks even better
       | 
       | they avoid storing redundant journal entries between snapshots by
       | using a generation counter that counts snapshots, so repeatedly
       | mutating the same reference doesn't keep allocating memory; it
       | just repeatedly checks the generation counter
       | 
       | i wonder to what extent you can combine their algorithm with umut
       | acar's 'self-adjusting computation' to get a generic, efficient
       | way of converting an imperative program into an incremental
       | dataflow program?
       | 
       | perhaps inspired by union-find, their tree of store states is
       | made out of only parent pointers, so you can walk up the tree
       | (from a snapshot) to the root, but never back down. (a new child
       | node is constructed for each parent when you do this.) this
       | allows garbage collection to work on saved snapshots! however, a
       | live snapshot can still keep alive an otherwise unreferenced
       | mutable reference, even though there's no way for the user of the
       | snapshottable store to access it
       | 
       | they credit fully fp-persistent shallow binding to henry baker. i
       | don't think that's fair, much as i appreciate his work; baker
       | himself credits shallow binding to maclisp, citing moon's maclisp
       | reference manual, and the more capable version he describes to
       | greenblatt's lispm, each from four years earlier. and i don't
       | think moon or greenblatt would claim full credit for it either,
       | nor would they claim it only dated from 01974; what was new in
       | greenblatt's design was that it could handle lexical scoping and
       | upward funargs, while older implementations of shallow binding
       | could only handle dynamic scoping and downward funargs. this is
       | the difference between 'persistence' and 'semi-persistence' in
       | the terminology of allain et al., as they explain on page 4. to a
       | great extent what baker was doing was explaining the technique
       | rather than claiming credit for it
       | 
       | this reminds me of a section from the mother of all multics emacs
       | papers (https://www.multicians.org/mepap.html), by bernie
       | greenberg, about the first lisp emacs, which was written in
       | maclisp
       | 
       | > _The implementation of multiple buffers was viewed as a task of
       | multiplexing the extant function of the editor over several
       | buffers. The buffer being edited is defined by about two dozen
       | Lisp variables of the basic editor, identifying the current
       | Editorline, its current (open /closed) state, the first and last
       | Editorlines of the buffer, the list of marks, and so forth.
       | Switching buffers (i.e., switching the attention of the editor,
       | as the user sees it) need consist only of switching the values of
       | all of these variables. Neither the interactive driver nor the
       | redisplay need be cognizant of the existence of multiple buffers;
       | the redisplay will simply find that a different "current
       | Editorline" exists if buffers are switched between calls to it.
       | What is more, the only functions in the basic editor that have to
       | be aware of the existence of multiple buffers are those that deal
       | with many buffers, switch them, etc. All other code simply
       | references the buffer state variables, and operates upon the
       | current buffer._
       | 
       | > _The function in the basic editor which implements the command
       | that switches buffers does so by saving up the values of all of
       | the relevant Lisp variables, that define the buffer, and placing
       | a saved image (a list of their values) as a property of the Lisp
       | symbol whose name is the current buffer's. The similarly saved
       | list of the target buffer's is retrieved, and the contained
       | values of the buffer state variables instated. A new buffer is
       | created simply by replacing the "instatement" step with
       | initialization of the state variables to default values for an
       | empty buffer. Buffer destruction is accomplished simply by
       | removing the saved state embedded as a property: all pointers to
       | the buffer will vanish thereby, and the MacLisp garbage collector
       | will take care of the rest._
       | 
       | > _The alternate approach to multiple buffers would have been to
       | have the buffer state variables referenced indirectly through
       | some pointer which is simply replaced to change buffers. This
       | approach, in spite of not being feasible in Lisp, is less
       | desirable than the current approach, for it distributes cost at
       | variable reference time, not buffer-switching time, and the
       | former is much more common._
       | 
       | > _One of the most interesting per-buffer state variables is
       | itself a list of arbitrary variables placed there by extension
       | code. Extension code can register variables by a call to an
       | appropriate primitive in the basic editor. The values of all such
       | variables registered in a given buffer will be saved and restored
       | when that buffer is exited and re-entered. The ability of Lisp to
       | treat a variable as a run-time object facilitates this. Variables
       | can thus be given "per-buffer" dynamic scope on demand, allowing
       | extensions to operate in many buffers simultaneously using the
       | same code and the same variables, in an analogous fashion to the
       | way Multics programs can be executing in many processes
       | simultaneously._
       | 
       | to some extent this reflects the computer architectures that were
       | current at the time (nowadays indexing off a base pointer in a
       | register is usually faster than accessing memory at an absolute
       | address) but the overall principle is still applicable; it just
       | manifests in different ways
       | 
       | but of course what allain et al. have implemented is much more
       | efficient at switching contexts, because it only updates the
       | references that are different between the two contexts
       | 
       | this paper is timely for me because, entirely by coincidence, i
       | implemented dynamic scoping with shallow binding in ansi forth
       | last week:                   : (let!) dup @ over swap 2r> rot >r
       | rot >r >r >r ! ; : let! (let!) 2r> ! ;
       | 
       | short example usage, a word to print a number in decimal
       | regardless of what the current _base_ is:
       | decimal : dec.  10 base let!  . ;
       | 
       | because this stores the saved value on the return stack, it is
       | 'semi-persistent' in the terms of this paper; once you have
       | undone an update, you cannot redo it
       | 
       | longer example:                   variable src  variable dest
       | variable stor  variable n              defer move-disc
       | : text-disc           cr ." Move disc " n @ . ." from " src @
       | emit ."  to " dest @ emit ;              ' text-disc is move-disc
       | : hanoi ( src stor dest n -- )           n let!  dest let!  stor
       | let!  src let!           n @ 0= if exit then           src @
       | dest @  stor @  n @ 1-  recurse           move-disc
       | stor @  src @  dest @  n @ 1-  recurse ;              char A char
       | B char C 4 hanoi
       | 
       | this code should surprise you if you are familiar with the
       | conventional wisdom that standard forth doesn't have local
       | variables!
       | 
       | (it's a pretty inefficient implementation though)
       | 
       | and, also entirely by coincidence, i implemented union-find in
       | forth a couple of weeks earlier than that:
       | https://asciinema.org/a/672405
        
       | eisbaw wrote:
       | Abstract reads alot like the Nix store.
        
       ___________________________________________________________________
       (page generated 2024-09-08 23:00 UTC)