[HN Gopher] Orthogonal Persistence
       ___________________________________________________________________
        
       Orthogonal Persistence
        
       Author : mpweiher
       Score  : 41 points
       Date   : 2024-03-06 12:33 UTC (2 days ago)
        
        
       | convolvatron wrote:
       | having this as a model would be lovely and I agree wholeheartedly
       | with the exposition. it's interesting though to think about what
       | impact this model would have on programming. a lot of our
       | processing and tooling are built around the notion that programs
       | are _almost_ right, and that we can bring them in and out - and
       | hopefully in the process our precious data hasn't been mangled.
       | 
       | when we express state directly in programs, we gain a lot, but
       | our notion of trashy disposable execution goes away and now we
       | have to think a lot more about how that system evolves.
        
         | usrusr wrote:
         | Reminds me of the discussions on hn when Intel Optane wasn't
         | quite dead yet. Those always seemed to end with the conclusion
         | that if separation between volatile and persistent memory had
         | not been forced on us by technological reality, it would be a
         | concept we'd better have invented at some point.
        
           | catskul2 wrote:
           | Think you can find any of those discussions? I'd be curious
           | to read/browse.
        
             | usrusr wrote:
             | This is the one that was featured in my memory:
             | 
             | https://news.ycombinator.com/item?id=32314814
             | 
             | But the most recent one is also interesting:
             | https://news.ycombinator.com/item?id=38527437
        
           | 082349872349872 wrote:
           | I was tangentially involved with an orthogonally persistent
           | OS, and we indeed had had to reinvent a distinct journalling
           | channel and special optionally-volatile storage, for DBMS-
           | style applications.
        
             | kragen wrote:
             | other reasons to need optionally-volatile storage include
             | secure encryption key generation (reusing randomness often
             | fatally compromises it) and device drivers (if you restore
             | the internal state of your device driver from a checkpoint,
             | but not the state of the device, you will probably crash
             | the system the next time the driver tries to frob the
             | device)
             | 
             | despite this, virtual machine checkpoints in qemu work well
             | enough for many purposes
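
              The randomness-reuse hazard can be sketched in a few lines. This is
              only an illustration under stated assumptions: Python's `random.Random`
              is not a secure generator and stands in for whatever generator state a
              checkpoint would capture; the seed and variable names are made up.

```python
import random

# Hypothetical stand-in for a process whose whole state, including its
# random number generator, is captured by a persistence checkpoint.
# random.Random is NOT cryptographically secure; it is used here only
# because its internal state can be saved and restored explicitly.
rng = random.Random(1234)

checkpoint = rng.getstate()          # the system snapshots everything
key_before = rng.getrandbits(128)    # a "key" generated after the snapshot

rng.setstate(checkpoint)             # crash, then restore from the checkpoint
key_after = rng.getrandbits(128)     # the restored process makes a "new" key

# Both keys come from the same generator state, so they are identical.
print(key_before == key_after)
```

              Keeping the generator state in optionally-volatile storage, or
              reseeding after every restore, is one way such a system could avoid
              this.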
        
           | cmrdporcupine wrote:
           | I am not sure about this?
           | 
           | Volatile memory is at this time merely an outgrowth of the
           | uptime of the system. Back when people routinely turned their
           | machines "off and on again", it became part of that
           | convention. But now uptime can be measured in years, and even
           | personal laptops can enter and exit suspended state for weeks
           | on end without clearing volatile memory.
           | 
           | What we have developed in _software_ systems to accommodate
           | this on long running processes is garbage collection.
           | 
           | If the volatile/non-volatile distinction had never developed,
           | all that would have happened is that R&D into garbage
           | collection would have been more intense, and earlier.
           | 
           | In fact Lisp had garbage collection from day 1.
           | 
           | Systems like Smalltalk were also built from the ground up on
           | an image-based model where all reachable state was
           | persistent.
           | 
           | In other words: transient data does not necessitate volatile
           | memory. It necessitates garbage collection, though. (And
           | likely also a distinction in programming between "performant"
           | memory areas and non-performant, assuming our NV storage is
           | the latter.)
           | 
           | In a way, programmers having to deal with their garbage
            | upfront and not relying on _"have you tried turning it off
           | and on again?"_ could have created better software
           | engineering practices earlier? Maybe?
        
       | bugbuddy wrote:
       | Yes, we can absolutely implement this, but your computer now runs
       | x times slower and/or is y times more expensive. For the vast
       | majority of people, this would be a quaint exercise, and no real
       | market exists for it.
        
       | qazxcvbnm wrote:
       | > Transactions are not modular because every function needs to
       | know whether it's already in a transaction or not, to be
       | conscious of what global entry point in a completely different
       | module owns the transaction.
       | 
       | I fail to understand the section about why transactions are
       | unmodular. I've never encountered transaction code where the
       | initiator of the transaction would affect the computation; could
       | anyone elucidate this?
        
         | marcosdumay wrote:
         | You can't write this in any random function that you don't know
         | who will call:
         | 
         |   do $$
         |   begin transaction;
         |   update page set change_count = change_count + 1 where page_id = 1;
         |   if (select change_count = 100 from page where page_id = 1) then
         |     rollback;
         |   else
         |     commit;
         |   end if;
         |   $$
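
          The same problem can be reproduced as a minimal sketch, assuming an
          in-memory SQLite database in place of the database in the snippet
          above. The `audit` table, the `bump_change_count` helper, and the
          starting count of 99 are hypothetical. The helper omits its own BEGIN
          because SQLite rejects a BEGIN inside an open transaction.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN / COMMIT / ROLLBACK statements below are the only ones issued.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE page (page_id INTEGER PRIMARY KEY, change_count INTEGER)")
conn.execute("CREATE TABLE audit (note TEXT)")  # hypothetical table owned by the caller
conn.execute("INSERT INTO page VALUES (1, 99)")


def bump_change_count(conn):
    """Hypothetical helper that assumes it owns the current transaction."""
    conn.execute("UPDATE page SET change_count = change_count + 1 WHERE page_id = 1")
    count = conn.execute(
        "SELECT change_count FROM page WHERE page_id = 1"
    ).fetchall()[0][0]
    if count == 100:
        conn.execute("ROLLBACK")  # also discards whatever the caller did earlier
    else:
        conn.execute("COMMIT")    # also commits the caller's work early


# A caller that believes it controls the transaction boundaries.
conn.execute("BEGIN")
conn.execute("INSERT INTO audit VALUES ('caller work')")
bump_change_count(conn)

audit_rows = conn.execute("SELECT COUNT(*) FROM audit").fetchall()[0][0]
change_count = conn.execute(
    "SELECT change_count FROM page WHERE page_id = 1"
).fetchall()[0][0]
caller_still_in_transaction = conn.in_transaction
print(audit_rows, change_count, caller_still_in_transaction)
```

          The caller's insert vanishes along with the helper's update, and the
          caller is no longer inside a transaction even though it never ended
          one itself.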
        
         | layer8 wrote:
         | This is probably about the nesting of transactions.
         | 
         | If you start a transaction when the calling code already
         | started a transaction, then either you get an error because
         | nested transactions are forbidden, or a reference counter is
         | incremented for the transaction, so that when you close your
         | inner transaction, no commit is done at that point, and instead
         | the commit is only done when the outermost transaction closes.
         | 
         | This latter case means that you don't know when your inner
         | transaction really commits, and also if you perform multiple
         | inner transactions and the later one fails, the earlier one
         | will implicitly also be rolled back, because they are all
         | really just one outer transaction. Well, of course you could
         | use separate database connections with independent
         | transactions, but then you get into deadlocks or other problems
         | when you really work on the same data.
         | 
         | So you can't have modules that build on each other while each
         | uses transactions independently of the others.
         | Transactions don't compose in that way.
         | 
         | You would basically have to "color" every function based on
         | whether it may perform a transaction or not, and within a
         | transaction block you would only be allowed to call functions
         | that don't themselves perform a transaction. It becomes more
         | complicated when transactions are not lexically scoped, but
         | live in an object.
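
          The reference-counted variant described above can be sketched without
          a real database. Everything here is hypothetical: `SharedTransaction`
          and its fields are illustrative names, and plain Python lists stand in
          for pending and durable storage.

```python
class SharedTransaction:
    """Hypothetical wrapper: nested begin() calls share one real transaction."""

    def __init__(self):
        self.depth = 0        # reference counter for nested begin() calls
        self.doomed = False   # set once any participant rolls back
        self.pending = []     # writes made inside the transaction
        self.durable = []     # writes that have actually been committed

    def begin(self):
        self.depth += 1

    def write(self, row):
        self.pending.append(row)

    def commit(self):
        self._close()

    def rollback(self):
        self.doomed = True
        self._close()

    def _close(self):
        self.depth -= 1
        if self.depth > 0:
            return  # an inner close commits nothing yet
        if not self.doomed:
            self.durable.extend(self.pending)
        self.pending.clear()
        self.doomed = False


tx = SharedTransaction()
tx.begin()                              # outer transaction, opened by the caller

tx.begin()                              # first inner transaction
tx.write("a")
tx.commit()                             # looks like a commit to the inner module
durable_after_first_inner = list(tx.durable)

tx.begin()                              # second inner transaction
tx.write("b")
tx.rollback()                           # this failure dooms the shared transaction

tx.commit()                             # outermost close decides the outcome
durable_at_end = list(tx.durable)
print(durable_after_first_inner, durable_at_end)
```

          The first inner commit made nothing durable, and the later rollback
          took "a" down with it, which is the lack of composition described
          above.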
        
       | qazxcvbnm wrote:
       | > Persistence is Orthogonal to the Data Model, ...
       | 
       | I have some experience with a custom data runtime where the
       | persistence is orthogonal to the data model, with silhouettes
       | reminiscent of the described solutions in many of the features of
       | my system, including multiple orthogonal/model-agnostic
       | persistence backends, automatic data synchronisation, persistable
       | executions, automatable schema changes, automatic reactivity.
       | 
       | This direction can indeed bring about great savings in various
       | parts of development; however, it seems to me that more subtlety
       | than indicated in the post is required.
       | 
       | The programmer must be provided with ergonomic means to give
       | denotations for things like when and where to persist, in order
       | to reduce data movement, and to keep the system performant (this
       | does not violate orthogonality; we may specify e.g. to persist at
       | the _logical_ location, say, in the cloud, without having to
       | specify the physical persistence). For instance, considering the
       | case of schema changes, unless the system bundles its language
       | inside the database, for performance's sake, performing such
       | changes in an "Orthogonal Persistence" system external to the
       | database would take a completely disproportionate amount of time
       | relative to using SQL in the database. The data runtime I work
       | with uses the idea of lenses (where valid lenses would
       | necessarily be reversible) to allow for coherent, undoable schema
       | changes, but I still resort to SQL for regular (eager) migrations
       | (the lenses system for schema changes can still be useful for
       | migrations applied lazily).
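
       A reversible lens for a schema change might look roughly like the
       sketch below. This is not the commenter's runtime: the `Lens` type, the
       `rename_field` helper, and the example record are all hypothetical, and
       a field rename is only the simplest reversible change.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Lens:
    """Hypothetical reversible schema change: a pair of inverse functions."""
    forward: Callable[[dict], dict]    # old schema -> new schema
    backward: Callable[[dict], dict]   # new schema -> old schema


def rename_field(old: str, new: str) -> Lens:
    """Build a lens that renames one field and can undo the rename."""

    def forward(record: dict) -> dict:
        out = dict(record)             # copy so the stored record is untouched
        out[new] = out.pop(old)
        return out

    def backward(record: dict) -> dict:
        out = dict(record)
        out[old] = out.pop(new)
        return out

    return Lens(forward, backward)


v1_to_v2 = rename_field("name", "full_name")

old_record = {"id": 1, "name": "Ada"}
new_record = v1_to_v2.forward(old_record)    # applied lazily, e.g. on read
round_trip = v1_to_v2.backward(new_record)   # undoing the schema change

print(new_record, round_trip == old_record)
```

       Applying `forward` only when a record is read is one way to get a lazy
       migration, while an eager migration over a large table is the case
       where running SQL inside the database is likely to be far cheaper.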
        
         | jerf wrote:
         | Or, to put it another way: like visual programming, like
         | "programming languages should be able to wear syntaxes like
         | themes", like "there ought to be some sort of nocode type
         | solution with all the power of conventional programming but
         | easy enough for anyone to pick up", there are _reasons_ why
         | this is not how all programming works already. Good ones and
         | big ones. And none of those reasons are that nobody has had the
         | idea before or put work into implementing it. If you want to
         | succeed with an approach like this, you're going to need to
         | understand them.
         | 
         | To be honest, such experience as I've had with automated
         | persistence has generally actually _strongly_ convinced me of
         | the opposite, that it is a _positive good_ that we do not get
         | persistence everywhere. Consider the understanding that we get
         | from functional programming that state is generally dangerous
         | and to be carefully managed. Pervasive persistence fights
         | _hard_ against that careful management. Now state is not just
         | in your program up until the OS process is terminated, but
         | it's _all_ permanently and automatically persisted. You get a
         | _huge_ new class of bugs involving path dependence on what bits
         | of code were running across what bits of state when, and who
         | ran which versions, and you hit them _all the time_, and they
         | are _nightmares_ to debug. At least when the program has the
         | courtesy to completely cease existing and leave some particular
         | concrete bit of state behind for the future, and then run
         | through your code to load it back from that location, you have
         | boundaries, and procedures for minimization and reconstruction.
         | I actually shy away from too much automated persistence, and
         | also have a very skeptical eye on the ever-present promise of
         | memory that is as fast as RAM but persists like SSDs... I
         | rather expect the computing world will discover that
         | "rebooting" is not just a crutch, but actually a pretty
         | fundamental and useful tool. However much in _theory_ your
         | software should never need it, in practice it's just too
         | useful.
         | 
         | That said, best of luck to those jousting with this windmill.
         | I'm not saying don't joust, people in general probably don't
         | joust enough, I'm just saying, learn the history of why this
         | hasn't worked before and learn the challenges. Success is at
         | the very least more likely if one learns from the previous
         | efforts.
        
       | 082349872349872 wrote:
       | See the end for a discussion of "Unfriendly Persistence":
       | https://github.com/mighty-gerbils/gerbil-persist/blob/master...
       | 
       | (the data you'd like to keep is more volatile than you'd wish,
       | but the data others keep on you is much less volatile than you'd
       | wish)
        
       | nahuel0x wrote:
       | Surprised not to see Smalltalk mentioned in the article.
        
       | AnthonyMQ wrote:
       | You should have a look at https://internetcomputer.org (they built
       | everything around orthogonal persistence). Pretty interesting and
       | fun to build on. I developed https://www.aedile.io
        
       | thom wrote:
       | This was very fashionable 15-20 years ago, in both application
       | and OS research. One such Java framework:
       | 
       | https://prevayler.org/
       | 
       | Less ambitious than TFA overall, I grant you.
        
       | geophile wrote:
       | That phrase certainly brings back memories, from when I worked at
       | an object-oriented database startup.
       | 
       | The object-orientation was actually pretty unimportant (except
       | for those products that brought in persistence via inheritance --
       | so not _really_ orthogonal). No, the point was adding a new
       | storage class to programming languages.
       | 
       | I worked at Object Design, and we had (IMHO) an incredibly
       | elegant approach. In our approach, persistence really was
       | orthogonal to type, for C/C++. If you want a FooBar, you would
       | write "new FooBar(123)". That gives you a FooBar in the heap that
       | disappears at process end (or on deletion), etc. Or you could
       | write "new(db) FooBar(123)", and then on commit (we had
       | transactions of course), the FooBar would be in the database, and
       | accessible by other processes.
       | 
       | A page-faulting mechanism would bring in pages containing
       | locations that your program referenced. That itself was very
       | elegant.
       | 
       | But the really beautiful thing about this architecture was
       | getting it to work in a 32-bit address space. We did some clever
       | things about mapping portions of the address space during the
       | faulting process to make things work transparently. (This problem
       | pretty much disappears with a 64-bit address space.)
       | 
       | Separate from all that, we had a collection library, integrated
       | with an OO query language. E.g., you could have a collection of
       | widgets in your database, write "widgets[: weight < 0.01 and
       | !strcmp(color, 'red') :]", and get back a set containing the
       | qualifying widgets. We also supported 1:1, 1:n, and m:n
       | relationships, which would maintain pointers and sets of pointers
       | in both directions.
       | 
       | It was a "database system" because our VCs wanted it to be. But
       | it really wasn't. It really was a new storage class for C/C++,
       | and later, for Smalltalk and Java.
       | 
       | Object Design also had a spectacularly talented group of
       | engineers, many of whom came from MIT AI Lab/Symbolics.
        
       | Retr0id wrote:
       | >servers only see a unindexed random-looking key value store
       | 
       | (quoted from the main readme)
       | 
       | I bet there's some fun attacks waiting to happen, related to
       | watching for specific access patterns. Avoidable I'm sure, but I
       | imagine it'll require awareness from application developers.
        
         | Retr0id wrote:
         | idk if the authors are reading this, but here's some feedback
         | on the row encryption scheme:
         | 
         | 1. Please use an AEAD!
         | 
         | 2. IIUC, the current design exposes the hashes of the data
         | values. This seems undesirable and I think you can avoid it.
        
       ___________________________________________________________________
       (page generated 2024-03-08 23:00 UTC)