[HN Gopher] Permazen: a different persistence layer for Java
___________________________________________________________________
Permazen: a different persistence layer for Java
Author : mike_hearn
Score : 57 points
Date : 2023-09-19 09:44 UTC (1 days ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| cmrdporcupine wrote:
| I have come to the place where I feel like the word "persistence"
| when applied to data / knowledge / information in this way is
| almost offensively reductionist and actually ignorant of the
| background in the field...
|
| ... Just going to put this here, since it's clear the world still
| hasn't read and absorbed this knowledge, 50 years later.
|
| I wish people would stop retreading the same broken paths over
| and over again.
|
| https://twobithistory.org/2017/12/29/codd-relational-model.h...
|
| https://cs.uwaterloo.ca/~david/cs848s14/codd-relational.pdf
|
| _"Future users of large data banks must be protected from
| having to know how the data is organized in the machine (the
| internal representation) ... A model based on n-ary relations, a
| normal form for data base relations, and the concept of a
| universal data sublanguage are introduced."_
| oftenwrong wrote:
| Based on my reading, Permazen _does_ insulate the user from the
| data organisation.
|
| https://github.com/permazen/permazen/wiki/FAQ#architecture
|
| It provides a Java view of the data, and handles translating
| operations in that layer into operations on the underlying
| stores.
| cmrdporcupine wrote:
| _"The Java model layer is a Java-centric, type safe,
| object-oriented persistence layer for Java applications"_
|
| That's exactly the kind of thing Codd was arguing against 50
| years ago.
|
| Information does not fit in Java objects. It doesn't want to
| be stuck (and does not work well) in hierarchical property &
| inheritance relationships. That's not how knowledge works.
|
| Information (and the human mind) _does_ tend towards working
| better when formulated in terms of first order / predicate
| logic. Which is what the relational data model is based upon.
| SQL is a rather crude approximation of that.
| tadfisher wrote:
| True, but we're talking about persisting Java objects, not
| relational data or an approximation of human knowledge or
| whatever. This is a narrower problem space, and one
| solution is treating the storage layer as a KV store,
| ignoring the problem of translating queries to SQL or
| mapping classes to DDL. This library is deliberately
| throwing away the relational model and its power, and the
| README acknowledges such.
| cmrdporcupine wrote:
| The answer is to stop persisting Java objects, and start
| writing applications with relations (facts /
| propositions) in mind.
|
| The OO stuff I think has gotten in the way of our jobs.
| When I worked in Java it was a mountain of pointless
| data/transfer object transformations. Microservices has
| made this worse. It's a really bad paradigm.
| speed_spread wrote:
| So, it's an object database, like Zope's ZODB on Python?
|
| I like the idea, but I'd like to learn about use cases for it.
|
| Otherwise, in Java, MapDB is about as far as I'd be willing to
| go: https://github.com/jankotek/mapdb/
| KronisLV wrote:
| Someone else mentioned jOOQ, but personally I also rather enjoyed
| JDBI3: https://jdbi.org/#_introduction_to_jdbi_3
|
| It addresses the issues with using JDBC directly (not nice
| ergonomics), while still letting you work with SQL directly
| without too many abstractions in the middle. In combination with
| Dropwizard, it was pretty pleasant:
| https://www.dropwizard.io/en/stable/manual/jdbi3.html
|
| Other than that, I actually liked using myBatis with XML mappers:
| https://mybatis.org/mybatis-3/sqlmap-xml.html and their dynamic
| functionality: https://mybatis.org/mybatis-3/dynamic-sql.html
|
| It might sound a bit crazy on the surface, but their DSL
| actually made sense and was intertwined with the SQL you wrote, a
| bit like the templating you might use for front-end work, except
| applied directly to your database queries. It was great for
| adding complex WHERE parts for specific filters or re-using parts
| of queries.
|
| Either way, it's nice to have various different options and to
| also have some newcomers.
| icedchai wrote:
| I've used JDBI in the past. It seemed pretty solid and
| lightweight, and it eliminated much of the repetition/ceremony
| involved with plain old JDBC. I generally can't stand Java ORMs
| (Hibernate, anyone?)
| nmadden wrote:
| JDBI3 looks nice. I previously had good experiences with
| Dalesbred (https://dalesbred.org), which sounds similar to
| JDBI3's fluent API -- a fairly minimal layer over JDBC with
| better ergonomics.
| dang wrote:
| Recent and related: https://news.ycombinator.com/item?id=37553957
|
| Also _Permazen: Language-Natural Persistence Layer for Java_ -
| https://news.ycombinator.com/item?id=21646037 - Nov 2019 (5
| comments)
|
| The paper:
| https://cdn.jsdelivr.net/gh/permazen/permazen@master/permaze...
| mike_hearn wrote:
| Beyond my reply to gregopet, here's a few other useful features
| Permazen has (and although the library is for Java, you could do
| something similar in any language).
|
| - Schema migrations can be done online, "just in time" on a per
| row (object) basis. This avoids the common problem you can hit
| with some RDBMS where you need downtime to change table schemas,
| or you end up very constrained in what changes you can make.
| Permazen has some powerful schema evolution features that allow
| objects and graphs to be migrated transactionally and in
| arbitrary ways, including full blown type changes, but with
| sensible defaults for the most common kinds of changes (e.g.
| adding an enum value).
|
| - You get access to very highly optimized and scalable backends
| that you might not get if you're limited to an RDBMS, for
| example, you can use RocksDB or FoundationDB. It can be an
| advantage sometimes, for example, RocksDB doesn't need the same
| attention paid to "vacuuming" or "xid wraparound" as Postgres
| does: all maintenance is done automatically and online. It can
| also do things like store data on different tiers of hardware
| (cold data on hard disks, hot data on SSDs).
|
| - Because you can trivially instantiate a TreeMap as an in-memory
| KV store and then serialize it to an efficient encoding, you can
| use it as a file format or network protocol.
|
| - Beyond obvious things like indexes and secondary indexes, it
| abstracts the concept of indexing to arbitrary derived data. You
| can easily define "indexes" using any code you like, containing
| any data you like, and the library ensures the derived data will
| be kept up to date even if it needs to be updated in reaction to
| changes "far away" in the object graph. This is a bit like
| triggers, except it all runs client side and you're not coding it
| all up using SQL.
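|
| A minimal sketch of the TreeMap point in the list above. It
| assumes nothing about Permazen's real API: the class
| TreeMapSnapshot and its length-prefixed encoding are made up,
| and only show how a sorted in-memory map can double as a file
| format or network message.
|
```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of Permazen.
final class TreeMapSnapshot {

    // Sorted KV stores usually order keys as unsigned bytes.
    static NavigableMap<byte[], byte[]> newStore() {
        Comparator<byte[]> unsignedOrder = Arrays::compareUnsigned;
        return new TreeMap<>(unsignedOrder);
    }

    // Made-up encoding: entry count, then a length-prefixed key and value per entry.
    static byte[] serialize(NavigableMap<byte[], byte[]> store) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(store.size());
            for (Map.Entry<byte[], byte[]> entry : store.entrySet()) {
                out.writeInt(entry.getKey().length);
                out.write(entry.getKey());
                out.writeInt(entry.getValue().length);
                out.write(entry.getValue());
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    // Rebuilds the sorted map from the bytes produced by serialize().
    static NavigableMap<byte[], byte[]> deserialize(byte[] data) {
        NavigableMap<byte[], byte[]> store = newStore();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int count = in.readInt();
            for (int i = 0; i < count; i++) {
                byte[] key = new byte[in.readInt()];
                in.readFully(key);
                byte[] value = new byte[in.readInt()];
                in.readFully(value);
                store.put(key, value);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return store;
    }

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    // Round-trips two entries and returns the value stored under the smallest key.
    static String demo() {
        NavigableMap<byte[], byte[]> store = newStore();
        store.put(utf8("user/2/name"), utf8("Bob"));
        store.put(utf8("user/1/name"), utf8("Alice"));
        NavigableMap<byte[], byte[]> copy = deserialize(serialize(store));
        return new String(copy.firstEntry().getValue(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Alice"
    }
}
```
|
| The byte[] keys need an explicit comparator because Java arrays
| are not Comparable; Arrays.compareUnsigned needs Java 9 or later.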
|
| Then it has all the usual ORM features you'd expect but done
| better, like transactional validation of constraints.
|
| So you can think of this as a really, really advanced object
| persistence/serialization library, or if you like a software
| transactional memory, that has enough features to be usable as a
| database replacement when paired with a good backend.
|
| Last but not least, Archie (the guy who wrote it) is a really
| thorough guy. He's spent a lot of time considering and
| eliminating many of the edge cases that surface with ORMs. The
| API is fully documented and clean. I've used this library for
| small projects and it was always delightfully unsurprising.
|
| The big weakness is there's no real equivalent of the RDBMS
| network protocols, so your code needs to run near the data. We've
| talked occasionally about using sandboxing and code motion to
| move queries over slow links like the global internet, but it's
| never been implemented.
| gregopet wrote:
| Oh lord this looks terrible. Let's reinvent a bunch of wheels and
| maim the database and end up with.. I don't know exactly what,
| even after reading the page for over 5 minutes.
|
| I've tried various Java persistence technologies but this one is
| proving a hard sell to me. I'm currently using jOOQ which has a
| different philosophy, it embraces SQL and brings it closer to
| Java and frankly, I'm very happy with it. And sure, JPA does a
| few things wrong (as the Permazen comparison points out) but from
| what I'm reading I'd still rather use it over Permazen.
|
| The project used to be called JSimpleDB and it should have kept
| that name IMHO since very simple applications for developers
| unfamiliar with SQL may be one of the few valid niches for it.
| mike_hearn wrote:
| Let me try and explain why it's at least _interesting_, better
| than the website is doing, then.
|
| Firstly, why separate things into a pluggable KV layer and then
| object persistence on top? There are some advantages to doing
| things this way.
|
| One is it lets you use a single technology at every level of
| scalability. Want to store a few objects for your desktop app
| settings file? Use a KV store that's dirt simple (can even be
| text files). Need to store billions of objects? Plug it into
| FoundationDB or Spanner or some other highly scalable backend.
| Generate a tiny in-memory transaction containing a small object
| sub-graph, serialize it to a protobuf-sized message and then
| send it over the network? Use a TreeMap. Want a single process
| but ultra-fast store? Use RocksDB. All of those are easy,
| and the API remains the same. Because (sorted) KV stores have
| pretty uniform interfaces you can even do tricks like
| incremental online migration by composing multiple backends
| together, or get notified when specific object fields change,
| or add a caching layer, and again the API can be uniform.
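|
| The layering described above can be sketched roughly as follows.
| SortedKv, MemoryKv and NotifyingKv are hypothetical names
| invented for this example; Permazen's actual KV interface is
| richer and differs in detail. The point is only that a small
| sorted-KV contract lets backends and wrappers be swapped freely.
|
```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.function.BiConsumer;

// Hypothetical types for illustration; not Permazen's API.
final class KvLayersDemo {

    // The whole contract a backend has to satisfy in this sketch.
    interface SortedKv {
        String get(String key);
        void put(String key, String value);
        // All entries with minKey <= key < maxKey, in key order.
        NavigableMap<String, String> range(String minKey, String maxKey);
    }

    // Simplest possible backend: an in-memory TreeMap.
    static final class MemoryKv implements SortedKv {
        private final NavigableMap<String, String> data = new TreeMap<>();

        public String get(String key) {
            return data.get(key);
        }

        public void put(String key, String value) {
            data.put(key, value);
        }

        public NavigableMap<String, String> range(String minKey, String maxKey) {
            return new TreeMap<>(data.subMap(minKey, true, maxKey, false));
        }
    }

    // Wrapper that adds change notification to any backend, same API.
    static final class NotifyingKv implements SortedKv {
        private final SortedKv delegate;
        private final BiConsumer<String, String> listener;

        NotifyingKv(SortedKv delegate, BiConsumer<String, String> listener) {
            this.delegate = delegate;
            this.listener = listener;
        }

        public String get(String key) {
            return delegate.get(key);
        }

        public void put(String key, String value) {
            delegate.put(key, value);
            listener.accept(key, value);
        }

        public NavigableMap<String, String> range(String minKey, String maxKey) {
            return delegate.range(minKey, maxKey);
        }
    }

    static List<String> demo() {
        List<String> events = new ArrayList<>();
        SortedKv kv = new NotifyingKv(new MemoryKv(), (k, v) -> events.add(k + "=" + v));
        kv.put("settings/theme", "dark");
        kv.put("settings/font", "mono");
        kv.put("users/1", "alice");
        // Prefix scan: '0' is the character right after '/', so this covers "settings/*".
        events.add("scan:" + kv.range("settings/", "settings0").keySet());
        return events;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```
|
| A caching layer or a migration shim composing two backends would
| take the same shape as NotifyingKv: implement the interface and
| delegate.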
|
| The closest to this in the RDBMS world would be trying to
| migrate between SQLite and, say, Oracle or some cloud pseudo-
| RDBMS, probably relying on another large abstraction layer on
| top to try and cover up the major differences between the
| backends. And then importing yet another giant library with a
| different set of data definitions and protocols with their own
| limits to handle object serialization outside of the database.
|
| During development and testing you can run against a local or
| in-memory database, then switch to more powerful backends for
| the final rounds of testing. Normally in Java this requires
| something like H2 but then you're using a different DB entirely
| and it causes a whole round of new problems.
|
| As a concrete example of where this is useful, Permazen was
| written for a 911 call dispatching system in the USA that
| required multi-master synchronous replication between
| geographically distributed sites that could tolerate
| connectivity failure between any of the masters and still allow
| each site to function. Adding this sort of feature is fairly
| easy because Permazen comes with a Raft backend, and has ways
| to add hooks for reconciliation of transactions. The whole
| stack is or can be Java when done this way, which makes for a
| very transparent stack.
|
| Secondly, why use "language-integrated persistence" as the
| paper calls it?
|
| SQL and RDBMSs are, as I think you'll agree, powerful but very
| complicated technologies. SQL is a full blown programming
| language. If you're comfortable with jOOQ then that means
| you've mastered both Java and SQL and jOOQ and maybe the SQL
| dialect of your specific database as well, which is a lot. Plus
| subjects like schema setup and migration. Permazen asks a
| question about simplification: what if you just needed your
| knowledge of your primary programming language as well as a few
| database basics (like what indexes are)? What if that was
| enough to solve many problems? You've already accepted that
| this is valuable at some level because you're using jOOQ that
| adds something like this on top of Java+SQL, but jOOQ is a huge
| library. In Permazen you're using APIs from the standard
| library like NavigableSet, Stream, along with a few helper
| utilities for things like set intersection. Additionally,
| because you're relying on the host language more there are way
| fewer APIs to learn. This may be more intuitive for some
| developers, especially those that don't use SQL all the time.
|
| Writing queries out as functional maps, filters and folds may
| seem awkward compared to having an RDBMS try and work it out
| for you, but there are reliability and predictability
| advantages. Some RDBMS engines have a problematic failure mode
| in which query performance can change drastically in production
| without anyone actually changing anything, as the statistics of
| the underlying table shift and suddenly cause a change in the
| query plan. It's also easy to write queries in SQL that don't
| have the expected performance. There are _many_ stories on HN
| and elsewhere of heroes swooping in to save some company or
| other through the application of a judicious CREATE INDEX
| command. In Permazen, because you're writing out the sequence
| of steps to get the data, it's always clear from reading the
| code what the query performance will be, and that performance
| will be stable over time.
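|
| To illustrate the "cost is visible in the code" argument, here
| is a sketch using only the standard library (Java 17 or later
| assumed for records). The Person record and the hand-built
| age index are hypothetical stand-ins; in Permazen the index
| would be maintained by the library rather than built by hand.
|
```java
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.stream.Collectors;

// Hypothetical example types; not Permazen's API.
final class ExplicitIndexQuery {

    record Person(long id, String name, int age) {}

    static final List<Person> SAMPLE = List.of(
        new Person(1, "Alice", 30),
        new Person(2, "Bob", 25),
        new Person(3, "Carol", 41),
        new Person(4, "Dave", 35));

    // Stand-in for a secondary index: age -> people of that age, sorted by age.
    static NavigableMap<Integer, List<Person>> indexByAge(List<Person> people) {
        return people.stream().collect(
            Collectors.groupingBy(Person::age, TreeMap::new, Collectors.toList()));
    }

    // Index range scan: only visits entries with minAge <= age <= maxAge.
    static List<String> namesInAgeRange(NavigableMap<Integer, List<Person>> byAge,
                                        int minAge, int maxAge) {
        return byAge.subMap(minAge, true, maxAge, true).values().stream()
            .flatMap(List::stream)
            .map(Person::name)
            .sorted()
            .collect(Collectors.toList());
    }

    // Full scan: visits every person. Same answer, and the extra cost is plain to see.
    static List<String> namesInAgeRangeByScan(List<Person> people, int minAge, int maxAge) {
        return people.stream()
            .filter(p -> p.age() >= minAge && p.age() <= maxAge)
            .map(Person::name)
            .sorted()
            .collect(Collectors.toList());
    }

    static List<String> demo() {
        return namesInAgeRange(indexByAge(SAMPLE), 30, 40);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [Alice, Dave]
    }
}
```
|
| Whether a query walks an index or scans everything is decided by
| which method the programmer calls, not by a planner at runtime.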
|
| Finally, as you note, due to the object/relational mismatch JPA
| is a very complex technology yet lacks some features
| regardless. If you want to work with object graphs then
| Permazen does provide a much cleaner API. It also supports
| useful features like online schema migrations - becoming unable
| to change tables without downtime is a major problem with some
| RDBMS.
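|
| One way to picture online, per-object migration is the sketch
| below. It is an assumption-laden illustration, not Permazen's
| mechanism: the Stored record, the version numbers and the
| name-splitting rule are all invented. Each object carries its
| schema version and is upgraded the first time it is read, so no
| table-wide rewrite or downtime is needed.
|
```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of lazy, per-object schema upgrades.
final class LazyMigration {

    static final int CURRENT_VERSION = 2;

    // A stored object: the schema version it was written with, plus raw fields.
    record Stored(int version, Map<String, String> fields) {}

    // Invented rule: v1 had a single "name"; v2 splits it into first and last name.
    static Stored upgrade(Stored old) {
        if (old.version() >= CURRENT_VERSION) {
            return old;
        }
        Map<String, String> fields = new HashMap<>(old.fields());
        String name = fields.remove("name");
        String[] parts = (name == null ? "" : name).split(" ", 2);
        fields.put("firstName", parts[0]);
        fields.put("lastName", parts.length > 1 ? parts[1] : "");
        return new Stored(CURRENT_VERSION, fields);
    }

    // Read path: upgrade on first access and write the new form back.
    static Stored read(Map<Long, Stored> store, long id) {
        Stored stored = store.get(id);
        Stored upgraded = upgrade(stored);
        if (upgraded != stored) {
            store.put(id, upgraded);
        }
        return upgraded;
    }

    static String demo() {
        Map<Long, Stored> store = new HashMap<>();
        store.put(1L, new Stored(1, Map.of("name", "Ada Lovelace")));
        store.put(2L, new Stored(1, Map.of("name", "Alan Turing")));
        Stored ada = read(store, 1L);
        return ada.fields().get("firstName") + "/" + ada.fields().get("lastName")
            + " v" + store.get(1L).version()
            + ", untouched v" + store.get(2L).version();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Ada/Lovelace v2, untouched v1"
    }
}
```
|
| In a real store the read and the write-back would happen inside
| one transaction; the sketch leaves that out.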
|
| All that said, it must be noted that Permazen is the personal
| project of an ex-FreeBSD hacker. It has run successfully in
| production for years as part of a larger contracting project,
| but it's not a large commercial project. It's best thought of
| as the starting point for a conversation about persistence
| rather than something that's going to obliterate the
| competition tomorrow.
| lukaseder wrote:
| > This may be more intuitive for some developers, especially
| those that don't use SQL all the time.
|
| I tend to recommend my famous talk to such developers:
| https://www.youtube.com/watch?v=wTPGW1PNy_Y
| mike_hearn wrote:
| Nice talk, you're a great speaker! And BTW I also love jOOQ
| :)
|
| Your example translated to Permazen Kotlin (for concision)
| would be something like this:
|
|     data class IncomeByDate(
|         val film: Film, val date: LocalDate, val income: Long)
|
|     films
|         .flatMap { film ->
|             film.rentals.map { IncomeByDate(film, it.date, it.amount) }
|         }
|         .groupingBy { it.film }
|         .reduce { _, l, r -> l.copy(income = l.income + r.income) }
|         .values
|         .sortedWith(
|             comparing<IncomeByDate, String> { it.film.title }
|                 .thenBy { it.date })
|
|
| That's not using a convenience library like your jOOL.
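|
| For comparison, a plain-Java sketch of the same pipeline using
| only java.util.stream (Java 17 or later assumed for records).
| The Film and Rental records are hypothetical stand-ins for the
| persistent classes; only the shape of the query is the point.
|
```java
import java.time.LocalDate;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical domain types standing in for persistent classes.
final class IncomeQuery {

    record Rental(LocalDate date, long amount) {}
    record Film(String title, List<Rental> rentals) {}
    record IncomeByDate(Film film, LocalDate date, long income) {}

    // One row per film: summed income, keeping the first rental's date.
    static List<IncomeByDate> incomePerFilm(List<Film> films) {
        return films.stream()
            .flatMap(film -> film.rentals().stream()
                .map(r -> new IncomeByDate(film, r.date(), r.amount())))
            .collect(Collectors.toMap(
                IncomeByDate::film,
                row -> row,
                (l, r) -> new IncomeByDate(l.film(), l.date(), l.income() + r.income())))
            .values().stream()
            .sorted(Comparator.comparing((IncomeByDate row) -> row.film().title())
                .thenComparing(IncomeByDate::date))
            .collect(Collectors.toList());
    }

    static String demo() {
        List<Film> films = List.of(
            new Film("Brazil", List.of(new Rental(LocalDate.of(2023, 1, 3), 150))),
            new Film("Alien", List.of(
                new Rental(LocalDate.of(2023, 1, 2), 300),
                new Rental(LocalDate.of(2023, 1, 5), 200))));
        return incomePerFilm(films).stream()
            .map(row -> row.film().title() + "=" + row.income())
            .collect(Collectors.joining(", "));
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "Alien=500, Brazil=150"
    }
}
```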
|
| You can argue against this in a bunch of ways. We could
| debate readability, the fact that it's not Java (doesn't
| matter technically) or that maybe the RDBMS parallelizes
| operations. We could add parallelism with a
| .parallelStream() in the right spot easily enough. But the
| query is around the same length as the SQL and reads in a
| similar fashion.
|
| You discuss this a bit later but then say, look how much
| time we've spent! Sure, it comes later in your talk, but
| that doesn't equal time spent. You're comparing a SQL query
| you presented fait accompli, vs half a talk of iterating on
| the Java.
|
| I think you probably overestimate how easy SQL is because
| you're an expert in it. For occasional users like me, it
| can be a quirky pain. Even the way strings work is
| unintuitive. But we know the standard libraries of our
| languages pretty well, we have to, it's required for the
| job. Your whole product is built on the fact that SQL isn't
| good enough, there's a lot of problems that remain unsolved
| when you just use SQL. Otherwise jOOQ wouldn't exist.
|
| That's before we get into the different properties of the
| backends, e.g. horizontal read/write scalability for free
| (FDB) vs RDBMS, incremental online schema evolution and so
| on.
| lukaseder wrote:
| I don't know if your various flatMap / etc methods are
| purely implemented in the client (it would be quite bad
| from a performance perspective? But since you're
| implementing reducers in Kotlin, I guess that's what this
| is), or if you somehow translate the AST to SQL (similar
| to jinq.org in Java or Slick in Scala or LINQ in .NET).
|
| But in either case, I think that mimicking "idiomatic"
| client APIs is more of a distraction than something
| useful. I've explored this here, where I was asked about
| my opinion on Kotlin's Exposed:
| https://www.youtube.com/watch?v=6Ji9yKnZ3D8
|
| Obviously, this is ultimately a matter of taste, but just
| like all these "better SQL languages" (e.g. PRQL) come
| and go, these translations to "better APIs" also come and
| go. SQL is the only thing to stay (has been for more than
| 50 years now!)
|
| > We could add parallelism with a .parallelStream() in
| the right spot easily enough
|
| You typically don't even need to hint it, the optimiser
| might choose to parallelise on its own, or not, depending
| on production load... Anyway, that's an overrated topic,
| IMO.
|
| > I think you probably overestimate how easy SQL is
| because you're an expert in it.
|
| I'm happy when coding in any language / paradigm. When
| working with XML, I will happily use XSLT, XPath, etc.
| When working with JSON, I don't mind going into
| JavaScript. I'm just trying to stay curious.
|
| I really don't think that SQL is "harder" than any other
| language. It may just be something certain people don't
| really like, for various reasons.
|
| > Your whole product is built on the fact that SQL isn't
| good enough
|
| I think you're projecting your own distaste into my work
| here. I love SQL. SQL is wonderful. It's quirky, yes, but
| what isn't. jOOQ users just don't like working with an
| external DSL _within Java_ (though many are very happy to
| write views and stored procedures with SQL and procedural
| extensions). There's no need for a false dichotomy here.
| I've worked on large systems that were mostly implemented
| with SQL written in views, and it was perfect!
|
| Also, jOOQ is _much_ more than just the DSL. SQL
| transformation, parsing, etc., it's a vast product. The
| string-y SQL folks could use it as a template engine,
| without touching the DSL, and still profit from parsing /
| transformations / mapping:
| https://blog.jooq.org/using-java-13-text-blocks-for-plain-sq...
|
| Some customers use jOOQ merely to migrate off Oracle to
| PostgreSQL (via the translating JDBC proxy).
|
| And I'm looking forward to the OpenJDK's Project Babylon.
| Perhaps we'll get actual macros in Java, soon, which
| could work well with jOOQ.
|
| Anyway, I didn't mean to hi-jack too much. It's great
| when people try out new / different approaches. I'm just
| triggered whenever someone claims that SQL is harder than
| anything else, when they should have said, they prefer
| other things and don't want to learn more SQL (which is
| fine, but quite a different statement).
| mike_hearn wrote:
| Permazen is a library that runs in-process, there is no
| network protocol, so it expects to be close to your data.
| When you use flatMap then yes, that's compiled to a for
| loop under the hood, and reads on the objects trigger KV
| reads. There are ways to use hinting and pre-fetching and
| such if latency starts getting higher, say if you use
| FoundationDB (which does have a protocol). But in most KV
| stores even those over the network, there is a local
| cache and you can pre-read keys close to each other in
| keyspace.
|
| I guess by easy/hard, I am considering experience to be
| 99% of that. If you transform XML by using XSLT then
| that's great, but most devs won't know what to make of
| that, because they lack the experience. I haven't used
| XSLT since, I think, 2001, and if I had to use it today
| I'd need to relearn it from scratch. That's not _hard_
| hard, it's just time consuming, and I'd rather write
| perhaps slightly more verbose or less pretty code in a
| language that me+team already use than (re)learn a DSL in
| that case.
| jabradoodle wrote:
| > I don't know if your various flatMap / etc methods are
| purely implemented in the client (it would be quite bad
| from a performance perspective?
|
| Haven't quite grokked how it works myself; the readme does
| call out that if your data is separated from your
| application by a high latency network then performance
| likely won't be good enough.
|
| I'm interested in taking a look at this as I maintain
| apps that use local kv stores so this could complement
| the approach I'm already using.
| gregopet wrote:
| Thank you for your explanation!
| gregopet wrote:
| Personally, I found https://microstream.one/ interesting
| for simple persistence needs (haven't tried it, though).
| drzaiusx11 wrote:
| I switched from jOOQ to JDBI. I find JDBI's interfaces more
| ergonomic than jOOQ's, which was essentially just a nice SQL
| query builder pattern (when I last used it at least).
| ianlevesque wrote:
| jOOQ is pretty great, and recently got good Kotlin support. My
| only complaint is if you use their DSL you lose the SQL syntax
| highlighting JetBrains IDEs have.
| selimco wrote:
| In Kotlin (maybe also Java?) you can use the JetBrains
| @Language annotation to get syntax highlighting on string
| variables. A little more verbose but maybe it works for you.
| gregopet wrote:
| jOOQ doesn't work with string variables (I mean it can, but
| that's not the point of that library). It compiles your
| database schema into objects and then you invoke a very
| SQL-like API over them. The code then reads very close to
| the actual SQL and since the schema is encoded in the Java
| objects you get the usual code completion for things like
| columns, tables, stored procedures, indices..
|
| What you don't get is the more advanced, entire-query-level
| hints, because IDEs can't tell that Java / Kotlin / ...
| code is actually the SQL that will be emitted if you squint
| just a little bit :)
| drzaiusx11 wrote:
| While this may work for greenfield applications, I don't see this
| working well for preexisting schemas. From their getting started
| page: "Database fields are automatically created for any abstract
| getter methods", which definitely scares me away since they seem
| to be relying on automatic field type conversions.
|
| I prefer to manage my schemas when I can and do type and DAO
| conversions via mapper classes in the very simple and elegant
| JDBI framework where you write SQL annotations above your DAO
| methods https://jdbi.org/#_declarative_api
|
| JDBI does wonders for wonky old schemas you've inherited, since
| joins etc work out of the box (just throw them in your
| annotations!) The annotations can also link to .SQL files for the
| big hairy queries.
|
| All these "do magic" frameworks (Hibernate being one of the
| first) work in the simple cases but then fall apart whenever you
| need to do anything complex/not-prescribed. I end up having to
| dig into the internals of the framework to see what's going wrong
| which negates their whole value add.
| drzaiusx11 wrote:
| Note: JDBI is an ORM where you provide the M directly (or use
| its default bean mapper class for simple schemas)
|
| NOT to be confused with JDBC or ODBC which are entirely
| different beasts
| DaiPlusPlus wrote:
| The HN link title says "Language-natural persistence to KV
| stores" - but the page itself only mentions Java, and the linked
| "API docs" is just a Javadoc:
| http://permazen.github.io/permazen/site/apidocs/index.html?i...
| mike_hearn wrote:
| The concept can be implemented in any language, this
| implementation just happens to use Java.
|
| More subtly, the design is split into three layers. At the
| bottom is an abstraction over pluggable KV stores. The middle
| layer is the "database" layer, which is language neutral. None
| of the concepts in this API are Java specific.
|
| http://permazen.github.io/permazen/site/apidocs/io/permazen/...
|
| Then the final layer sits on top of the middle database layer,
| and is the language binding to Java. It uses specific Java
| features like annotations and annotation processors to generate
| object bindings, adds things like subtyping, support for enums,
| etc.
|
| In theory you could take a language like Python or JavaScript
| and connect it directly to the middle layer using e.g. GraalVM
| (or reimplement it). You'd want to work out what the equivalent
| language bindings would be for those languages.
| TOGoS wrote:
| It also appears to be about music production, or concerts, or
| something.
|
| Maybe this is an AI-generated website.
| DaiPlusPlus wrote:
| No, it's a real GitHub project: https://github.com/permazen -
| they just used a random website template.
| dang wrote:
| Ok, let's change to https://github.com/permazen/permazen
| instead of https://permazen.io/ above.
|
| Usually we go the other way and prefer the project page,
| but there's clearly not enough info there.
| activitypea wrote:
| "We want your code to look like Java, not learn a new query
| language" combined with "we will support any persistence layer
| under the sun" is going to be a recipe for disaster.
| minzak wrote:
| A bunch of arguments they list on their GitHub page are
| straw men.
|
| For example they say nobody wants to learn a new query language.
| I worked with Hibernate's QL as well as Ebean's QL and have to
| say none require any special training. Sure, the syntax differs
| somewhat but you can achieve what you want fairly quickly by
| looking at examples. And they all look like simplified SQL
| anyway.
|
| Also the page says one has to invent a DAO layer but again -
| neither Hibernate nor alternatives require this. On the contrary,
| for example if you look at the canonical Playframework/Ebean
| examples they suggest static finders inside POJO classes, which
| work just fine.
| pkage wrote:
| How does this compare to a traditional ORM? I looked through the
| slides and the paper and this seems to largely provide the same
| kind of functionality as a regular ORM but with the caveat that
| you can only do key/value transactions.
| vmfunction wrote:
| Think of it like the localStore in Deno. I guess at this point
| all the languages are going to jump on the bandwagon. Why not?
| Every language should have a native KV store.
|
| However, if we are to jump into the JDK world, persistence is
| not hard. Java has superb DBs like H2 with full postgres
| compliance, that can be embedded in memory.
| indigo945 wrote:
| > However, if we are to jump into the JDK world, persistence is
| not hard. Java has superb DBs like H2 with full postgres
| compliance, that can be embedded in memory.
|
| H2 is a cute database, but it doesn't even come close to
| having "full postgres compliance". Besides, it solves a
| completely different problem from Permazen or JPA - it's a
| storage engine, not a storage abstraction layer.
| mike_hearn wrote:
| Transactions apply across full object graphs. The fact that it
| uses a KV store under the hood is exposed to you, but it
| doesn't impose limits.
|
| For example if you read an object, read some fields, follow a
| reference to another object, loop over a list you find there,
| then make some changes and commit, the underlying transaction
| will only apply if those operations are conflict free.
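|
| A rough sketch of that "commit only if conflict free" behaviour,
| using generic optimistic concurrency control. This is not
| Permazen's implementation (conflict detection depends on the KV
| backend in use); Store, Tx and the per-key version counters are
| hypothetical. Each transaction remembers the version of every
| key it read, and commit fails if any of those versions moved.
|
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical optimistic-concurrency sketch; not Permazen's API.
final class OptimisticTx {

    // Shared store: every key has a value and a version bumped on each committed write.
    static final class Store {
        private final Map<String, String> values = new HashMap<>();
        private final Map<String, Long> versions = new HashMap<>();

        // Returns the value and records the version this transaction observed.
        synchronized String read(String key, Map<String, Long> readVersions) {
            readVersions.putIfAbsent(key, versions.getOrDefault(key, 0L));
            return values.get(key);
        }

        // Applies the writes only if nothing that was read has changed since.
        synchronized boolean commit(Map<String, Long> readVersions,
                                    Map<String, String> writes) {
            for (Map.Entry<String, Long> read : readVersions.entrySet()) {
                long current = versions.getOrDefault(read.getKey(), 0L);
                if (current != read.getValue()) {
                    return false;
                }
            }
            for (Map.Entry<String, String> write : writes.entrySet()) {
                values.put(write.getKey(), write.getValue());
                versions.merge(write.getKey(), 1L, Long::sum);
            }
            return true;
        }
    }

    // A transaction buffers its writes and tracks its read set.
    static final class Tx {
        private final Store store;
        private final Map<String, Long> readVersions = new HashMap<>();
        private final Map<String, String> writes = new HashMap<>();

        Tx(Store store) {
            this.store = store;
        }

        String read(String key) {
            String pending = writes.get(key);
            return pending != null ? pending : store.read(key, readVersions);
        }

        void write(String key, String value) {
            writes.put(key, value);
        }

        boolean commit() {
            return store.commit(readVersions, writes);
        }
    }

    static List<Boolean> demo() {
        Store store = new Store();
        Tx seed = new Tx(store);
        seed.write("order/1/customer", "customer/7");
        seed.write("customer/7/name", "Ann");
        seed.commit();

        // Tx a follows a reference from the order to the customer, then edits the customer.
        Tx a = new Tx(store);
        String customerKey = a.read("order/1/customer");
        a.write(customerKey + "/name", a.read(customerKey + "/name").toUpperCase());

        // Tx b repoints the reference and commits first.
        Tx b = new Tx(store);
        b.write("order/1/customer", "customer/8");
        boolean bCommitted = b.commit();

        // Tx a must fail: the reference it followed has changed underneath it.
        boolean aCommitted = a.commit();

        // Tx c touches only the customer, which nobody else changed.
        Tx c = new Tx(store);
        c.write("customer/7/name", c.read("customer/7/name") + "!");
        boolean cCommitted = c.commit();

        return List.of(bCommitted, aCommitted, cCommitted);
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [true, false, true]
    }
}
```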
| thom wrote:
| I remember a long time ago there was a fashion for "prevalence
| layers" instead of persistence layers. The idea being you persist
| your event log, but your app is the materialisation of all those
| events replayed. That way you get stop-the-world persistence but
| you're only ever dealing with your domain model (or the event
| sourced version of it, at least). Obviously you then run into
| issues around the size of the working set, and querying use
| cases. The approach in the article seems like an attempt to have
| your cake and eat it, and so my gut reaction is that it won't
| work, but then I never really had an issue with ORMs in the first
| place - Hibernate was great and I don't think I've seen an
| approach in 15+ years that convinces me we've moved forwards.
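|
| The prevalence idea described at the start of this comment can
| be sketched as below. Everything here is hypothetical (Event,
| Accounts, Journal), and the in-memory list stands in for a
| durable append-only log. Only events are stored; the domain
| model is whatever you get by replaying them.
|
```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical prevalence / event-replay sketch.
final class PrevalenceSketch {

    // The only thing that is ever persisted.
    record Event(String account, long delta) {}

    // The in-memory domain model.
    static final class Accounts {
        private final Map<String, Long> balances = new TreeMap<>();

        void apply(Event event) {
            balances.merge(event.account(), event.delta(), Long::sum);
        }

        Map<String, Long> balances() {
            return balances;
        }
    }

    // Appends each event to the log before applying it to the live model.
    static final class Journal {
        private final List<Event> log = new ArrayList<>(); // stand-in for a durable file
        private final Accounts live = new Accounts();

        void execute(Event event) {
            log.add(event);
            live.apply(event);
        }

        List<Event> log() {
            return List.copyOf(log);
        }

        Accounts live() {
            return live;
        }
    }

    // Recovery after a restart: replay the whole log into a fresh model.
    static Accounts recover(List<Event> log) {
        Accounts rebuilt = new Accounts();
        log.forEach(rebuilt::apply);
        return rebuilt;
    }

    static String demo() {
        Journal journal = new Journal();
        journal.execute(new Event("a", 100));
        journal.execute(new Event("b", 50));
        journal.execute(new Event("a", -30));
        return recover(journal.log()).balances().toString();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints "{a=70, b=50}"
    }
}
```
|
| The working-set limit mentioned above is visible here: the
| whole model has to fit in memory, and recovery time grows with
| the length of the log unless snapshots are added.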
___________________________________________________________________
(page generated 2023-09-20 23:03 UTC)