[HN Gopher] Orthogonal Persistence
___________________________________________________________________
Orthogonal Persistence
Author : mpweiher
Score : 100 points
Date : 2024-03-06 12:33 UTC (3 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| convolvatron wrote:
| Having this as a model would be lovely, and I agree wholeheartedly
| with the exposition. It's interesting, though, to think about what
| impact this model would have on programming. A lot of our
| processing and tooling is built around the notion that programs
| are _almost_ right, and that we can bring them in and out - and
| hopefully in the process our precious data hasn't been mangled.
|
| When we express state directly in programs, we gain a lot, but
| our notion of trashy disposable execution goes away and now we
| have to think a lot more about how that system evolves.
| usrusr wrote:
| Reminds me of the discussions on HN when Intel Optane wasn't
| quite dead yet. Those always seemed to end with the conclusion
| that if the separation between volatile and persistent memory had
| not been forced on us by technological reality, it would be a
| concept we'd have done well to invent at some point.
| catskul2 wrote:
| Think you can find any of those discussions? I'd be curious
| to read/browse.
| usrusr wrote:
| This is the one that was featured in my memory:
|
| https://news.ycombinator.com/item?id=32314814
|
| But the most recent one is also interesting:
| https://news.ycombinator.com/item?id=38527437
| 082349872349872 wrote:
| I was tangentially involved with an orthogonally persistent
| OS, and we did indeed have to reinvent a distinct journalling
| channel and special optionally-volatile storage for DBMS-
| style applications.
| kragen wrote:
| other reasons to need optionally-volatile storage include
| secure encryption key generation (reusing randomness often
| fatally compromises it) and device drivers (if you restore
| the internal state of your device driver from a checkpoint,
| but not the state of the device, you will probably crash
| the system the next time the driver tries to frob the
| device)
|
| despite this, virtual machine checkpoints in qemu work well
| enough for many purposes
| cmrdporcupine wrote:
| I am not sure about this?
|
| Volatile memory is at this point merely an outgrowth of how
| long systems stay up. Back when people routinely turned their
| machines "off and on again", clearing memory became part of
| that convention. But now uptime can be measured in years, and
| even personal laptops can enter and exit suspended states for
| weeks on end without clearing volatile memory.
|
| What we have developed in _software_ systems to accommodate
| this in long-running processes is garbage collection.
|
| If the volatile/non-volatile distinction had never developed,
| all that would have happened is that R&D into garbage
| collection would have been more intense, and earlier.
|
| In fact Lisp had garbage collection from day 1.
|
| Systems like Smalltalk were also built from the ground up on
| an image-based model where all reachable state was
| persistent.
|
| In other words: transient data does not necessitate volatile
| memory. It necessitates garbage collection, though. (And
| likely also a distinction in programming between "performant"
| memory areas and non-performant, assuming our NV storage is
| the latter.)
|
| In a way, programmers having to deal with their garbage
| upfront and not relying on _" have you tried turning it off
| and on again?"_ could have created better software
| engineering practices earlier? Maybe?
| 48864w6ui wrote:
| Back in the days when minicomputers (which required a walk
| to the air-conditioned machine room to reboot) and
| microcomputers (which had a case or keyboard switch)
| coexisted, the former were way less flaky than the latter.
| bugbuddy wrote:
| Yes, we can absolutely implement this, but your computer now runs
| x times slower and/or is y times more expensive. For the vast
| majority of people, this would be a quaint exercise, and no real
| market exists for it.
| qazxcvbnm wrote:
| > Transactions are not modular because every function needs to
| know whether it's already in a transaction or not, to be
| conscious of what global entry point in a completely different
| module owns the transaction.
|
| I fail to understand the section about why transactions are
| unmodular. I've never encountered transaction code where the
| initiator of the transaction would affect the computation; could
| anyone elucidate this?
| marcosdumay wrote:
| You can't write this in any random function when you don't know
| who will call it:
|
| do $$
| begin
|     -- this block assumes it owns the transaction boundary
|     update page set change_count = change_count + 1
|         where page_id = 1;
|     if (select change_count = 100 from page where page_id = 1) then
|         rollback;
|     else
|         commit;
|     end if;
| end
| $$;
| layer8 wrote:
| This is probably about the nesting of transactions.
|
| If you start a transaction when the calling code already
| started a transaction, then either you get an error because
| nested transactions are forbidden, or a reference counter is
| incremented for the transaction, so that when you close your
| inner transaction, no commit is done at that point; instead
| the commit is only done when the outermost transaction closes.
|
| This latter case means that you don't know when your inner
| transaction really commits, and also that if you perform
| multiple inner transactions and a later one fails, the earlier
| ones will implicitly also be rolled back, because they are all
| really just one shared outer transaction.
|
| Of course, you could use separate database connections with
| independent transactions, but then you get into deadlocks or
| other problems when you really work on the same data.
|
| So you can't have modules that build on each other, while each
| being able to use transactions independently from each other.
| Transactions don't compose in that way.
|
| You would basically have to "color" every function based on
| whether it may perform a transaction or not, and within a
| transaction block you would only be allowed to call functions
| that don't themselves perform a transaction. It becomes more
| complicated when you have transactions that are not lexically
| scoped, but for example live in an object.
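|
| A minimal Python sketch of the reference-counted nesting
| described above (Connection and Transaction here are
| hypothetical, for illustration only; real database drivers
| differ in the details):
|
|     class Connection:
|         """Toy connection that records SQL instead of running it."""
|         def __init__(self):
|             self.depth = 0
|             self.log = []
|         def execute(self, sql):
|             self.log.append(sql)
|
|     class Transaction:
|         """Nested 'transactions' share one real outer transaction."""
|         def __init__(self, conn):
|             self.conn = conn
|         def __enter__(self):
|             if self.conn.depth == 0:
|                 # only the outermost block really begins
|                 self.conn.execute("BEGIN")
|             self.conn.depth += 1
|             return self
|         def __exit__(self, exc_type, exc, tb):
|             self.conn.depth -= 1
|             if self.conn.depth == 0:
|                 # only the outermost exit commits or rolls back;
|                 # an inner "commit" is a no-op, and an inner
|                 # failure dooms the whole shared transaction
|                 sql = "ROLLBACK" if exc_type else "COMMIT"
|                 self.conn.execute(sql)
|             return False
|
|     conn = Connection()
|     with Transaction(conn):       # outer module's transaction
|         with Transaction(conn):   # inner module's "transaction"
|             conn.execute("UPDATE page SET ...")
|         # nothing has committed here; the outer block decides
|     print(conn.log)  # ['BEGIN', 'UPDATE page SET ...', 'COMMIT']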
| qazxcvbnm wrote:
| > Persistence is Orthogonal to the Data Model, ...
|
| I have some experience with a custom data runtime where
| persistence is orthogonal to the data model, and many of its
| features resemble the solutions described in the post, including
| multiple orthogonal/model-agnostic persistence backends,
| automatic data synchronisation, persistable executions,
| automatable schema changes, and automatic reactivity.
|
| This direction can indeed bring about great savings in various
| parts of development; however, it seems to me that more subtlety
| than indicated in the post is required.
|
| The programmer must be given ergonomic means to denote things
| like when and where to persist, in order to reduce data movement
| and keep the system performant (this does not violate
| orthogonality; we may specify e.g. persisting at a _logical_
| location, say in the cloud, without having to specify the
| physical persistence). For instance, consider schema changes:
| unless the system bundles its language inside the database for
| performance's sake, performing such changes in an "Orthogonal
| Persistence" system external to the database would take a
| completely disproportionate amount of time relative to using SQL
| in the database. The data runtime I work with uses the idea of
| lenses (where valid lenses must be reversible) to allow for
| coherent, undoable schema changes, but I still resort to SQL for
| regular (eager) migrations (the lens system can still be useful
| for migrations applied lazily).
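|
| A rough Python sketch of the lens idea (the names are
| hypothetical, not the actual runtime's API): a schema change is a
| pair of functions that migrate a row forward and back, so it can
| be applied lazily and undone coherently.
|
|     from dataclasses import dataclass
|     from typing import Callable
|
|     @dataclass
|     class Lens:
|         """A reversible schema change over row dictionaries."""
|         up: Callable[[dict], dict]     # migrate a row forward
|         down: Callable[[dict], dict]   # undo the migration
|
|     # Example: rename the column "fullname" to "name".
|     def _up(row):
|         row = dict(row)
|         row["name"] = row.pop("fullname")
|         return row
|
|     def _down(row):
|         row = dict(row)
|         row["fullname"] = row.pop("name")
|         return row
|
|     rename = Lens(up=_up, down=_down)
|
|     old_row = {"id": 1, "fullname": "Ada"}
|     new_row = rename.up(old_row)            # {'id': 1, 'name': 'Ada'}
|     assert rename.down(new_row) == old_row  # the change is undoable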
| jerf wrote:
| Or, to put it another way: like visual programming, like
| "programming languages should be able to wear syntaxes like
| themes", like "there ought to be some sort of nocode type
| solution with all the power of conventional programming but
| easy enough for anyone to pick up", there are _reasons_ why
| this is not how all programming works already. Good ones and
| big ones. And none of those reasons are that nobody has had the
| idea before or put work into implementing it. If you want to
| succeed with an approach like this, you're going to need to
| understand them.
|
| To be honest, such experience as I've had with automated
| persistence has generally actually _strongly_ convinced me of
| the opposite, that it is a _positive good_ that we do not get
| persistence everywhere. Consider the understanding that we get
| from functional programming that state is generally dangerous
| and to be carefully managed. Pervasive persistence fights
| _hard_ against that careful management. Now state is not just
| in your program up until the OS process is terminated, but it's
| _all_ permanently and automatically persisted. You get a
| _huge_ new class of bugs involving path dependence on what bits
| of code were running across what bits of state when, and who
| ran which versions, and you hit them _all the time_, and they
| are _nightmares_ to debug. At least when the program has the
| courtesy to completely cease existing and leave some particular
| concrete bit of state behind for the future, and then run
| through your code to load it back from that location, you have
| boundaries, and procedures for minimization and reconstruction.
| I actually shy away from too much automated persistence, and
| also have a very skeptical eye on the ever-present promise of
| memory that is as fast as RAM but persists like SSDs... I
| rather expect the computing world will discover that
| "rebooting" is not just a crutch, but actually a pretty
| fundamental and useful tool. However much in _theory_ your
| software should never need it, in practice it's just too
| useful.
|
| That said, best of luck to those jousting with this windmill.
| I'm not saying don't joust, people in general probably don't
| joust enough, I'm just saying, learn the history of why this
| hasn't worked before and learn the challenges. Success is at
| the very least more likely if one learns from the previous
| efforts.
| dTal wrote:
| Offtopic perhaps, but I am interested in reading an
| explanation of the good, big reasons why not "programming
| languages should be able to wear syntaxes like themes".
| Racket seems quite an interesting counterpoint, and I've
| never heard it argued as fundamentally flawed.
| 082349872349872 wrote:
| See the end for a discussion of "Unfriendly Persistence":
| https://github.com/mighty-gerbils/gerbil-persist/blob/master...
|
| (the data you'd like to keep is more volatile than you'd wish,
| but the data others keep on you is much less volatile than you'd
| wish)
| nahuel0x wrote:
| Surprised not to see Smalltalk mentioned in the article.
| layer8 wrote:
| Smalltalk doesn't persist by default, you have to explicitly
| save a snapshot of your image.
| mpweiher wrote:
| Still pretty orthogonal...
| jdougan wrote:
| Not so much Smalltalk, but Gemstone/S should have gotten a
| mention.
| AnthonyMQ wrote:
| You should have a look at https://internetcomputer.org - they
| built everything around orthogonal persistence. Pretty interesting
| and fun to build on. I developed https://www.aedile.io
| thom wrote:
| This was very fashionable 15-20 years ago, in both application
| and OS research. One such Java framework:
|
| https://prevayler.org/
|
| Less ambitious than TFA overall, I grant you.
| sillywalk wrote:
| I'm not sure if counts as "orthogonal persistence", but there's
| the Aurora Operating System[0] from 2021. It's apparently based
| on FreeBSD, and can run most unmodified apps, as well as having
| an API to support its Store.
|
| "We present the Aurora single level store (SLS), an OS that
| simplifies persistence by automatically per- sisting all
| traditionally ephemeral application state. With recent storage
| hardware like NVMe SSDs and NVDIMMs, Aurora is able to
| continuously checkpoint entire applications with millisecond
| granularity. Aurora is the first full POSIX single level store
| to han- dle complex applications ranging from databases to web
| browsers"
|
| [0] https://rcs.uwaterloo.ca/pubs/hotos21-aurora.pdf
| mdaniel wrote:
| BSD 3-clause
| https://github.com/prevayler/prevayler/blob/master/LICENSE.t...
| geophile wrote:
| That phrase certainly brings back memories, from when I worked at
| an object-oriented database startup.
|
| The object-orientation was actually pretty unimportant (except
| for those products that brought in persistence via inheritance --
| so not _really_ orthogonal). No, the point was adding a new
| storage class to programming languages.
|
| I worked at Object Design, and we had (IMHO) an incredibly
| elegant approach. In our approach, persistence really was
| orthogonal to type, for C/C++. If you want a FooBar, you would
| write "new FooBar(123)". That gives you a FooBar in the heap,
| disappears at process end (or on deletion), etc. Or you could
| write "new(db) FooBar(123)", and then on commit (we had
| transactions of course), the FooBar would be in the database, and
| accessible by other processes.
|
| A page-faulting mechanism would bring in pages containing
| locations that your program referenced. That itself was very
| elegant.
|
| But the really beautiful thing about this architecture was
| getting it to work in a 32-bit address space. We did some clever
| things about mapping portions of the address space during the
| faulting process to make things work transparently. (This problem
| pretty much disappears with a 64-bit address space.)
|
| Separate from all that, we had a collection library, integrated
| with an OO query language. E.g., you could have a collection of
| widgets in your database, write "widgets[: weight < 0.01 and
| !strcmp(color, 'red') :]", and get back a set containing the
| qualifying widgets. We also supported 1:1, 1:n, and m:n
| relationships, which would maintain pointers and sets of pointers
| in both directions.
|
| It was a "database system" because our VCs wanted it to be. But
| it really wasn't. It really was a new storage class for C/C++,
| and later, for Smalltalk and Java.
|
| Object Design also had a spectacularly talented group of
| engineers, many of whom came from MIT AI Lab/Symbolics.
| aeontech wrote:
| That sounds fascinating! Does anything like this exist now in
| the open source world?
| geophile wrote:
| Not that I know of.
| jinjin2 wrote:
| The closest I know of is Realm, which is an amazing object
| database and seems to do a lot of what the op describes. But
| after the acquisition by MongoDB they seem to have drifted a
| bit more in a document database direction.
| jdougan wrote:
| How would you compare it to Gemstone/S? I spent a large chunk
| of the 90s working on a maintenance management system with a GS
| back end.
| mpweiher wrote:
| Can you describe your experience working with that GS
| backend?
| jdougan wrote:
| What is it you want to know?
| Retr0id wrote:
| >servers only see a unindexed random-looking key value store
|
| (quoted from the main readme)
|
| I bet there's some fun attacks waiting to happen, related to
| watching for specific access patterns. Avoidable I'm sure, but I
| imagine it'll require awareness from application developers.
| Retr0id wrote:
| idk if the authors are reading this, but here's some feedback
| on the row encryption scheme:
|
| 1. Please use an AEAD!
|
| 2. IIUC, the current design exposes the hashes of the data
| values. This seems undesirable and I think you can avoid it.
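|
| For illustration, a minimal Python sketch of what row-value
| encryption with an AEAD could look like (AES-256-GCM via the
| "cryptography" package); key storage and nonce bookkeeping are
| deliberately simplified, and the names are hypothetical rather
| than the project's actual scheme:
|
|     import os
|     from cryptography.hazmat.primitives.ciphers.aead import AESGCM
|
|     row_key = AESGCM.generate_key(bit_length=256)
|
|     def encrypt_value(key, row_id, plaintext):
|         nonce = os.urandom(12)  # never reuse a nonce under one key
|         # Binding the row id as associated data means a ciphertext
|         # moved to another row fails to authenticate instead of
|         # decrypting "successfully".
|         return nonce + AESGCM(key).encrypt(nonce, plaintext, row_id)
|
|     def decrypt_value(key, row_id, blob):
|         nonce, ciphertext = blob[:12], blob[12:]
|         return AESGCM(key).decrypt(nonce, ciphertext, row_id)
|
|     blob = encrypt_value(row_key, b"row:42", b"secret value")
|     assert decrypt_value(row_key, b"row:42", blob) == b"secret value"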
| couchand wrote:
| > Atomic sections must be short: long atomic sections will break
| liveness, i.e. may cause the system to become unresponsive.
|
| > ...
|
| > Based on disk latency, we may target say a millisecond as
| duration before which to commit the current transaction. When the
| timer is reached, the transaction is delayed until all current
| atomic sections are completed; and (possibly after a grace
| period) new atomic sections are blocked from even being started,
| until after the transaction is committed.
|
| Maybe I'm reading this wrong, but the limitations on transaction
| duration seem to be disqualifying for real usage? If it's not
| possible to run an atomic transaction for longer than a few
| milliseconds without bringing the system down?
| pilgrim0 wrote:
| It seems like the Tuple Space [1] model for distributed
| computing, put forth by Linda, lends itself perfectly to this
| use case. I very much appreciate the ratio of simplicity to power
| offered by tuple spaces. In its original form it's maybe too
| simple, but there are many ways it can be improved upon and
| brought into modernity.
|
| [1] https://en.m.wikipedia.org/wiki/Tuple_space
___________________________________________________________________
(page generated 2024-03-09 23:01 UTC)