[HN Gopher] Orthogonal Persistence
___________________________________________________________________
Orthogonal Persistence
Author : mpweiher
Score : 41 points
Date : 2024-03-06 12:33 UTC (2 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| convolvatron wrote:
| having this as a model would be lovely and I agree wholeheartedly
| with the exposition. it's interesting though to think about what
| impact this model would have on programming. a lot of our
| processing and tooling are built around the notion that programs
| are _almost_ right, and that we can bring them in and out - and
| hopefully in the process our precious data hasn't been mangled.
|
| when we express state directly in programs, we gain a lot, but
| our notion of trashy disposable execution goes away and now we
| have to think a lot more about how that system evolves.
| usrusr wrote:
| Reminds me of the discussions on hn when Intel Optane wasn't
| quite dead yet. Those always seemed to end with the conclusion
| that if separation between volatile and persistent memory had
| not been forced on us by technological reality, it would be a
| concept we'd better have invented at some point.
| catskul2 wrote:
| Think you can find any of those discussions? I'd be curious
| to read/browse.
| usrusr wrote:
| This is the one that was featured in my memory:
|
| https://news.ycombinator.com/item?id=32314814
|
| But the most recent one is also interesting:
| https://news.ycombinator.com/item?id=38527437
| 082349872349872 wrote:
| I was tangentially involved with an orthogonally persistent
| OS, and we indeed had had to reinvent a distinct journalling
| channel and special optionally-volatile storage, for DBMS-
| style applications.
| kragen wrote:
| other reasons to need optionally-volatile storage include
| secure encryption key generation (reusing randomness often
| fatally compromises it) and device drivers (if you restore
| the internal state of your device driver from a checkpoint,
| but not the state of the device, you will probably crash
| the system the next time the driver tries to frob the
| device)
|
| despite this, virtual machine checkpoints in qemu work well
| enough for many purposes
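|
A minimal sketch of the "optionally-volatile" idea above, assuming a hypothetical `Session` class whose `nonce_seed` stands in for key-generation randomness. The class and field names are illustrative only and do not come from any system mentioned in the thread. The checkpoint leaves the volatile field out, and a restore regenerates it instead of reusing it.

```python
import json
import os


class Session:
    """Hypothetical object with one durable field and one volatile field."""

    # Fields named here are never written to a checkpoint.
    VOLATILE = ("nonce_seed",)

    def __init__(self, user):
        self.user = user                  # safe to restore from a checkpoint
        self.nonce_seed = os.urandom(16)  # must not be reused after a restore

    def checkpoint(self):
        # Serialize only the durable part of the state.
        durable = {k: v for k, v in vars(self).items() if k not in self.VOLATILE}
        return json.dumps(durable)

    @classmethod
    def restore(cls, blob):
        # Rebuild the object without calling __init__, then draw fresh randomness.
        obj = cls.__new__(cls)
        obj.__dict__.update(json.loads(blob))
        obj.nonce_seed = os.urandom(16)
        return obj


original = Session("alice")
blob = original.checkpoint()
restored = Session.restore(blob)
```

Restoring the same checkpoint twice yields two sessions with different seeds, which is the property the key-generation case needs. The device-driver case would additionally have to re-probe the real device on restore.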
| cmrdporcupine wrote:
| I am not sure about this?
|
| Volatile memory is at this time merely an outgrowth of the
| uptime of the system. Back when people routinely turned their
| machines "off and on again", it became part of that
| convention. But now uptime can be measured in years, and even
| personal laptops can enter and exit suspended state for weeks
| on end without clearing volatile memory.
|
| What we have developed in _software_ systems to accommodate
| this on long running processes is garbage collection.
|
| If the volatile/non-volatile distinction had never developed,
| all that would have happened is that R&D into garbage
| collection would have been more intense, and earlier.
|
| In fact Lisp had garbage collection from day 1.
|
| Systems like Smalltalk were also built from the ground up on
| an image-based model where all reachable state was
| persistent.
|
| In other words: transient data does not necessitate volatile
| memory. It necessitates garbage collection, though. (And
| likely also a distinction in programming between "performant"
| memory areas and non-performant, assuming our NV storage is
| the latter.)
|
| In a way, programmers having to deal with their garbage
| upfront and not relying on _"have you tried turning it off
| and on again?"_ could have created better software
| engineering practices earlier? Maybe?
| bugbuddy wrote:
| Yes, we can absolutely implement this but your computer now runs
| x times slower and/or is y times more expensive. For the vast
| majority of people, this would be a quaint exercise and no real
| market exists for it.
| qazxcvbnm wrote:
| > Transactions are not modular because every function needs to
| know whether it's already in a transaction or not, to be
| conscious of what global entry point in a completely different
| module owns the transaction.
|
| I fail to understand the section about why transactions are
| unmodular. I've never encountered transaction code where the
| initiator of the transaction would affect the computation; could
| anyone elucidate this?
| marcosdumay wrote:
| You can't write this in any random function that you don't know
| who will call:
|
|     do $$
|     begin transaction;
|       update page set change_count = change_count + 1 where page_id = 1;
|       if (select change_count = 100 from page where page_id = 1) then
|         rollback;
|       else
|         commit;
|       end if;
|     $$
| layer8 wrote:
| This is probably about the nesting of transactions.
|
| If you start a transaction when the calling code already
| started a transaction, then either you get an error because
| nested transactions are forbidden, or a reference counter is
| incremented for the transaction, so that when you close your
| inner transaction, no commit is done at that point, and instead
| the commit is only done when the outermost transaction closes.
|
| This latter case means that you don't know when your inner
| transaction really commits, and also if you perform multiple
| inner transactions and the later one fails, the earlier one
| will implicitly also be rolled back, because they are all
| really just one outer transaction. Well, of course you could
| use separate database connections with independent
| transactions, but then you get into deadlocks or other problems
| when you really work on the same data.
|
| So you can't have modules that build on each other, and each
| being able to use transactions independently from each other.
| Transactions don't compose in that way.
|
| You would basically have to "color" every function based on
| whether it may perform a transaction or not, and within a
| transaction block you would only be allowed to call functions
| that don't themselves perform a transaction. It becomes more
| complicated when transactions are not lexically scoped, but
| live in an object.
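|
The reference-counting behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not any particular library's API: `CountedTransaction`, `record` and `fail` are hypothetical names, and Python's standard `sqlite3` module is used only as a convenient database.

```python
import sqlite3


class CountedTransaction:
    """Hypothetical wrapper: nested begin/commit only moves a depth counter."""

    def __init__(self, conn):
        self.conn = conn
        self.depth = 0

    def __enter__(self):
        if self.depth == 0:
            self.conn.execute("BEGIN")  # only the outermost level really begins
        self.depth += 1
        return self

    def __exit__(self, exc_type, exc, tb):
        self.depth -= 1
        if self.depth == 0:
            # Only the outermost level really commits or rolls back.
            self.conn.execute("ROLLBACK" if exc_type else "COMMIT")
        return False  # let any exception propagate to the caller


# isolation_level=None disables sqlite3's implicit BEGIN, so we control it.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE log (msg TEXT)")
tx = CountedTransaction(conn)


def record(msg):
    # A "module" that believes it owns its own transaction.
    with tx:
        conn.execute("INSERT INTO log VALUES (?)", (msg,))


def fail():
    with tx:
        raise RuntimeError("second step failed")


def count():
    return conn.execute("SELECT COUNT(*) FROM log").fetchone()[0]


try:
    with tx:                   # outer transaction owned by some caller
        record("first step")   # its "commit" is silently deferred
        fail()                 # this failure undoes the first step too
except RuntimeError:
    pass

rows_after_failure = count()   # the "committed" first step is gone

record("standalone")           # same function, no outer transaction
rows_after_standalone = count()
```

Here `record` behaves differently depending on who calls it: called on its own, its insert is durable as soon as it returns, while inside someone else's transaction its fate is decided by code it has never seen.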
| qazxcvbnm wrote:
| > Persistence is Orthogonal to the Data Model, ...
|
| I have some experience with a custom data runtime where the
| persistence is orthogonal to the data model, with silhouettes
| reminiscent of the described solutions in many of the features of
| my system, including multiple orthogonal/model-agnostic
| persistence backends, automatic data synchronisation, persistable
| executions, automatable schema changes, automatic reactivity.
|
| This direction can indeed bring about great savings in various
| parts of development; however, it seems to me that more subtlety
| than indicated in the post is required.
|
| The programmer must be provided with ergonomic means to give
| denotations for things like when and where to persist, in order
| to reduce data movement, and to keep the system performant (this
| does not violate orthogonality; we may specify e.g. to persist at
| the _logical_ location, say, in the cloud, without having to
| specify the physical persistence). For instance, considering the
| case of schema changes, unless the system bundles its language
| inside the database, for performance's sake, to perform such
| changes in an "Orthogonal Persistence" system external to the
| database would take a completely disproportionate amount of time
| relative to using SQL in the database. The data runtime I work
| with uses the idea of lenses (where valid lenses would
| necessarily be reversible) to allow for coherent, undoable schema
| changes, but I still resort to SQL for regular (eager) migrations
| (the lenses system for schema changes can still be useful for
| migrations applied lazily).
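|
To make the reversible-lens idea concrete, here is a minimal sketch that treats a schema change as a pair of mutually inverse functions over plain dict records. It is an assumption-laden illustration, not the commenter's runtime: `Lens`, `rename` and the field names are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

Record = dict


@dataclass(frozen=True)
class Lens:
    forward: Callable[[Record], Record]   # old schema -> new schema
    backward: Callable[[Record], Record]  # new schema -> old schema

    def then(self, other: "Lens") -> "Lens":
        # Compose two lenses; undoing runs the inverses in reverse order.
        return Lens(
            forward=lambda r: other.forward(self.forward(r)),
            backward=lambda r: self.backward(other.backward(r)),
        )


def rename(old: str, new: str) -> Lens:
    """A field rename, which is trivially reversible."""

    def move(src: str, dst: str) -> Callable[[Record], Record]:
        def apply(record: Record) -> Record:
            out = dict(record)        # never mutate the stored record
            out[dst] = out.pop(src)
            return out
        return apply

    return Lens(forward=move(old, new), backward=move(new, old))


migration = rename("name", "full_name").then(rename("mail", "email"))

old_row = {"name": "Ada", "mail": "ada@example.org"}
new_row = migration.forward(old_row)     # applied lazily, e.g. on read
round_trip = migration.backward(new_row)
```

Applying `forward` when a record is read gives the lazy migration mentioned above. Rewriting every stored row eagerly this way, from outside the database, is the case where plain SQL is likely to be much faster.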
| jerf wrote:
| Or, to put it another way: like visual programming, like
| "programming languages should be able to wear syntaxes like
| themes", like "there ought to be some sort of nocode type
| solution with all the power of conventional programming but
| easy enough for anyone to pick up", there are _reasons_ why
| this is not how all programming works already. Good ones and
| big ones. And none of those reasons are that nobody has had the
| idea before or put work into implementing it. If you want to
| succeed with an approach like this, you're going to need to
| understand them.
|
| To be honest, such experience as I've had with automated
| persistence has generally actually _strongly_ convinced me of
| the opposite, that it is a _positive good_ that we do not get
| persistence everywhere. Consider the understanding that we get
| from functional programming that state is generally dangerous
| and to be carefully managed. Pervasive persistence fights
| _hard_ against that careful management. Now state is not just
| in your program up until the OS process is terminated, but
| it's _all_ permanently and automatically persisted. You get a
| _huge_ new class of bugs involving path dependence on what bits
| of code were running across what bits of state when, and who
| ran which versions, and you hit them _all the time_, and they
| are _nightmares_ to debug. At least when the program has the
| courtesy to completely cease existing and leave some particular
| concrete bit of state behind for the future, and then run
| through your code to load it back from that location, you have
| boundaries, and procedures for minimization and reconstruction.
| I actually shy away from too much automated persistence, and
| also have a very skeptical eye on the ever-present promise of
| memory that is as fast as RAM but persists like SSDs... I
| rather expect the computing world will discover that
| "rebooting" is not just a crutch, but actually a pretty
| fundamental and useful tool. However much in _theory_ your
| software should never need it, in practice it's just too
| useful.
|
| That said, best of luck to those jousting with this windmill.
| I'm not saying don't joust, people in general probably don't
| joust enough, I'm just saying, learn the history of why this
| hasn't worked before and learn the challenges. Success is at
| the very least more likely if one learns from the previous
| efforts.
| 082349872349872 wrote:
| See the end for a discussion of "Unfriendly Persistence":
| https://github.com/mighty-gerbils/gerbil-persist/blob/master...
|
| (the data you'd like to keep is more volatile than you'd wish,
| but the data others keep on you is much less volatile than you'd
| wish)
| nahuel0x wrote:
| Surprised not to see Smalltalk mentioned in the article.
| AnthonyMQ wrote:
| You should have a look at https://internetcomputer.org they built
| everything around orthogonal persistence. Pretty interesting and
| fun to build on it. I developed https://www.aedile.io
| thom wrote:
| This was very fashionable 15-20 years ago, in both application
| and OS research. One such Java framework:
|
| https://prevayler.org/
|
| Less ambitious than TFA overall, I grant you.
| geophile wrote:
| That phrase certainly brings back memories, from when I worked at
| an object-oriented database startup.
|
| The object-orientation was actually pretty unimportant, (except
| for those products that brought in persistence via inheritance --
| so not _really_ orthogonal). No, the point was adding a new
| storage class to programming languages.
|
| I worked at Object Design, and we had (IMHO) an incredibly
| elegant approach. In our approach, persistence really was
| orthogonal to type, for C/C++. If you want a FooBar, you would
| write "new FooBar(123)". That gives you a FooBar in the heap, which
| disappears at process end (or on deletion), etc. Or you could
| write "new(db) FooBar(123)", and then on commit (we had
| transactions of course), the FooBar would be in the database, and
| accessible by other processes.
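|
A rough Python analogue of "persistence as a storage class", to show the shape of the idea rather than how Object Design implemented it. Everything here is hypothetical: the `Database` class uses a dict in place of the disk, and `new` with its `key` argument is an invented helper. The point is that `FooBar` itself contains no persistence code and inherits from nothing.

```python
import json


class FooBar:
    """Plain class: no persistence base class, no persistence code."""

    def __init__(self, value):
        self.value = value


class Database:
    """Hypothetical stand-in for a persistent store; a dict plays the disk."""

    def __init__(self):
        self.disk = {}     # committed state, visible to other "processes"
        self.pending = {}  # objects allocated during the current transaction

    def commit(self):
        # Write every object allocated in this transaction to the "disk".
        for key, obj in self.pending.items():
            self.disk[key] = json.dumps(vars(obj))
        self.pending.clear()


def new(cls, *args, db=None, key=None):
    # Same constructor either way; `db` only selects where the object lives.
    obj = cls(*args)
    if db is not None:
        db.pending[key] = obj
    return obj


db = Database()
transient = new(FooBar, 123)                     # like `new FooBar(123)`
persistent = new(FooBar, 123, db=db, key="fb1")  # like `new(db) FooBar(123)`

visible_before_commit = dict(db.disk)
db.commit()
visible_after_commit = dict(db.disk)
```

Both objects have exactly the same type, and only the allocation site says which one outlives the process. That is the sense in which persistence is orthogonal to type here.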
|
| A page-faulting mechanism would bring in pages containing
| locations that your program referenced. That itself was very
| elegant.
|
| But the really beautiful thing about this architecture was
| getting it to work in a 32-bit address space. We did some clever
| things about mapping portions of the address space during the
| faulting process to make things work transparently. (This problem
| pretty much disappears with a 64-bit address space.)
|
| Separate from all that, we had a collection library, integrated
| with an OO query language. E.g., you could have a collection of
| widgets in your database, write "widgets[: weight < 0.01 and
| !strcmp(color, 'red') :]", and get back a set containing the
| qualifying widgets. We also supported 1:1, 1:n, and m:n
| relationships, which would maintain pointers and sets of pointers
| in both directions.
|
| It was a "database system" because our VCs wanted it to be. But
| it really wasn't. It really was a new storage class for C/C++,
| and later, for Smalltalk and Java.
|
| Object Design also had a spectacularly talented group of
| engineers, many of whom came from MIT AI Lab/Symbolics.
| Retr0id wrote:
| >servers only see a unindexed random-looking key value store
|
| (quoted from the main readme)
|
| I bet there are some fun attacks waiting to happen, related to
| watching for specific access patterns. Avoidable I'm sure, but I
| imagine it'll require awareness from application developers.
| Retr0id wrote:
| idk if the authors are reading this, but here's some feedback
| on the row encryption scheme:
|
| 1. Please use an AEAD!
|
| 2. IIUC, the current design exposes the hashes of the data
| values. This seems undesirable and I think you can avoid it.
___________________________________________________________________
(page generated 2024-03-08 23:00 UTC)