Post ATbosuRqZVL417ZAwK by zzzeek@fosstodon.org
 (DIR) Post #ATaT4htyeU2EQlz6Zc by blacklight@social.platypush.tech
       2023-03-13T22:49:00Z
       
       0 likes, 0 repeats
       
       I've finally managed to iron out most of the remaining wrinkles in the core ORM model for #Platypush. What a pain.

       I've decided to avoid ORMs if possible in the future, and write my own query templates rather than working around somebody else's workarounds.

       #SQLAlchemy is great, they said. It allows you to map your relational model to objects, and do all the mapping magic behind the scenes, they said.

       As long as you don't cache the objects that you read - you have to write your own caching layer and keep it synchronized with SQLAlchemy, preferably through events. Everything should preferably happen within a scoped session, or an (ugly and bug-prone) static global session.

       As long as you don't access objects with relations outside of a session - you never know what the eager/lazy logic decided to fetch, and, unless you want to deal with tens of pages of documentation, you'd better write your own logic that retrieves and deep-copies everything.

       As long as you don't access the same objects from different threads - unless you implement your own locking logic, your own algorithm to handle concurrent partial updates, create your own deep copies of the objects before passing them around, create your own JSON serializer/deserializer, and/or create your own producer/consumer architecture to make sure that only one thread processes all the updates.

       In other words, an ORM works for you as long as you don't build a complex modern application. If you do, then you'll probably have to write your own ORM around it anyway.

       What problem were we trying to solve with ORMs again? And when exactly did we decide that the SQL layer was too ugly and complex to use directly, so we had to wrap it into a big object-oriented model that introduces much more complexity than the original model?

       How much energy have we invested over the past decades trying to solve this problem of how to fit a circular peg into a square hole?
       
 (DIR) Post #ATaZkDgzxXIucr5kn2 by zzzeek@fosstodon.org
       2023-03-14T00:03:45Z
       
       0 likes, 0 repeats
       
       @blacklight since you tagged sqlalchemy, I would remark that it's fine if the Session model of objects representing proxies to rows in a transaction doesn't work for you; ORMs like peewee, Django, sqlmodel have other patterns that suit many devs better.

       But it's not reasonable to use the "we are all suffering because of ORMs" trope and/or to imply that only toy projects use them. It should be obvious that this is not true for lots of dev teams with serious applications today.
       
 (DIR) Post #ATaac5VSbwe0sHNhVw by blacklight@social.platypush.tech
       2023-03-14T00:13:30Z
       
       0 likes, 0 repeats
       
       @zzzeek I have indeed used SQLAlchemy in several large projects, and every time I've bumped into the same set of problems. Thread-safety is something that should be handled by the app, caching is something that should be handled by the app (or not handled at all, if you want to avoid headaches), lazy (and often not controllable) evaluation of nested relations leads to the software suddenly blowing up when reading an attribute outside of a session, etc.

       In most cases, I've resorted to implementing my own full copy of the object after flushing it, and then managing the merge with the existing db objects when I pass the "dummy" object back to the session - but it's a lot of work, and it's also far from efficient. The ORM promises to blend the relational world with the object-oriented world, but in any serious application you'll always need a mapping/schema anyway, because handling the proxy objects directly in the app comes with a lot of strings attached.
       
 (DIR) Post #ATaasIbZg7gk8GyH4a by zzzeek@fosstodon.org
       2023-03-14T00:16:26Z
       
       0 likes, 0 repeats
       
       @blacklight yeah, working w/ detached objects is an antipattern in SQLAlchemy; the tutorials all illustrate objects in an attached state. It looks like you want a kind of global bag of objects that can be called upon by any number of threads / transactions / connections. That's great, but that's not SQLAlchemy's built-in model. It's not the only ORM model.
       
 (DIR) Post #ATackL5Qlg7TQ8GhO4 by zzzeek@fosstodon.org
       2023-03-14T00:17:10Z
       
       0 likes, 0 repeats
       
       @blacklight suffice to say there are a *lot* of quite large real-world apps, like OpenStack, that use SQLAlchemy's ORM patterns as-is with great success. It's just a specific pattern.
       
 (DIR) Post #ATackLt3n9onu3UKkC by blacklight@social.platypush.tech
       2023-03-14T00:37:24Z
       
       0 likes, 0 repeats
       
       @zzzeek what's the "right" pattern then? I've worked in two organizations with large projects that made use of SQLAlchemy, and I make heavy use of it in Platypush as well. And it seems that, in all the use-cases I've been exposed to, everybody struggled to understand what the right pattern is.

       In one of the projects we used SQLAlchemy à la JDBC - only executing static SQL and retrieving static result sets.

       In the other one we created a global "singleton" session wrapped in a lock that all the pieces of the application would use, just to avoid the detached instance error.

       In Platypush, I'm working on a heavily multithreaded application, with a schema with polymorphic models and recursive parent-child relations. In some parts I use Marshmallow to map the ORM objects to data classes; in other parts I've resorted to deep-copying everything and recursively expanding the relationships myself - otherwise hell breaks loose when serializing the objects to JSON.

       In all these examples, something smelled very fishy to me, like me and my colleagues clearly didn't get the "right way" of using the framework.

       So what's the right way of using a proxy object? Start a session, use the proxies, explicitly copy the attributes that we want to reuse outside of the session, and toss it away?
       
 (DIR) Post #ATaea9QJTbNKGCJ0YS by zzzeek@fosstodon.org
       2023-03-14T00:57:57Z
       
       0 likes, 0 repeats
       
       @blacklight if you are using the unit of work pattern, then you'd want to use that data in the context of a DB transaction. If the objects are long-running outside the scope of a thread / transaction, then you have to build some system to keep those objects around in some way. You can detach them, sure, whatever. But if you are manipulating them and requesting data that they don't have, obviously you have to connect to the database to do that.
       
 (DIR) Post #ATagi91T3AI5F5cj3I by zzzeek@fosstodon.org
       2023-03-14T01:21:48Z
       
       0 likes, 0 repeats
       
       @blacklight it sounds like your applications want some kind of global "structure of truth" that is basically a proxy to (I would assume) a mostly static database, where requesting data that isn't loaded uses ad-hoc connections. The Session used to have a mode called "autocommit" that was more like this, but it had problems. Obviously that "global" structure is not so global, different parts of the app change it inconsistently, and you are essentially inventing your own object database.
       
 (DIR) Post #ATbG9bDZpEaau37A3c by zzzeek@fosstodon.org
       2023-03-14T01:25:40Z
       
       0 likes, 0 repeats
       
       @blacklight I'm not actually kidding here: if there's a robust "global structure of truth" pattern that is worth having, someone should build it. But I have a feeling that at its core it would need the objects to be fully immutable. It could use a different kind of Session that reintroduces the "autocommit" idea and loses "flush" and "transactions" entirely - as long as the lines aren't blurry, which is what we used to have. Blurry lines help no one.
       
 (DIR) Post #ATbG9blFo1WmaUs3Y8 by blacklight@social.platypush.tech
       2023-03-14T07:58:57Z
       
       0 likes, 0 repeats
       
       @zzzeek I don't have a silver bullet either, but what I've built in Platypush so far seems to work fine for all of my integrations. Basically my pattern can be summarized as:

       - Use the session to fetch the objects

       - Ability to tell the ORM engine "eagerly fetch all the relations in the result set without me expanding/copying all the attributes explicitly - don't worry dear ORM, I know what I'm doing" (one of the main problems in my implementation is that my main class has a reflective parent-children relation reported on both ends, and doing something like e.g. entity.parent.children[0].children[1] outside of the session is likely to break things with DetachedInstanceError randomly, even if the objects were retrieved with lazy=joined)

       - When the objects are flushed, committed and moved outside of the session, they should be "dehydrated" automatically - i.e. they should be accessible just like a data class, without SQLAlchemy magic happening behind the scenes, so those objects can be the actual model of the application (regardless of the complexity of the application) without needing additional mappers

       - When the objects are used within the context of a session again, they should be "re-hydrated" - merge any gaps between the db and local representation etc.

       My main problem is basically that I can't use objects fetched within a session outside of the session without having to think of a lot of internals of the framework.
       
 (DIR) Post #ATbGhwv7dvR63crV0i by rimu@mastodon.nzoss.nz
       2023-03-14T01:22:19Z
       
       0 likes, 0 repeats
       
       @blacklight Fabio: "fight me about ORMs"
       Me: "no thanks"

       Just use an ORM for that 90% of the app which is simple CRUD, and use SQL directly for the tricky bits. Isn't that what everyone does?
       
 (DIR) Post #ATbGhxeUvDjSKM5jjk by blacklight@social.platypush.tech
       2023-03-14T08:05:10Z
       
       0 likes, 0 repeats
       
       @rimu using plain SQL is apparently becoming kind of old-fashioned, and SQLAlchemy 2 has even removed some of the constructs that supported raw SQL. It's a trend that started around 2016 (when almost all the teams in Booking were invited to start migrating the raw SQL statements in their code to DBI's ORM), and I've only seen it grow since then.
       
 (DIR) Post #ATbfZoB2lQ31TiWXEu by zzzeek@fosstodon.org
       2023-03-14T12:43:46Z
       
       0 likes, 0 repeats
       
       @blacklight first off, it looks like you do have a notion of how ORM can work, so there's that. /1
       
 (DIR) Post #ATbostX7yNy7BD1sX2 by zzzeek@fosstodon.org
       2023-03-14T12:43:58Z
       
       0 likes, 0 repeats
       
       @blacklight secondly, I'm sorry to say a lot of critiques of SQLAlchemy seem to come this way, where you list out a bunch of things that are completely supported, possible, and ordinary. You can set all the relationships to eager load. If you are looking for more flexible option forms rather than structural ones, there are wildcards for that; if the wildcards need more flexibility, that's in the area of improvements that can be made with demonstrated use cases. /2
       
 (DIR) Post #ATbosuRqZVL417ZAwK by zzzeek@fosstodon.org
       2023-03-14T13:33:13Z
       
       0 likes, 0 repeats
       
       @blacklight I will add that the case of parent.children[0].children[1].parent.children gets a little crazy, and I think we've made some improvements to that recently. The issue is the ORM doesn't want to get into endless loops. selectinload has very new features for loading as deeply as possible until it encounters itself again, though it still might not cover this case. It should not be surprising that eager loaders try to avoid loading the whole DB as one giant graph.
       
 (DIR) Post #ATbosvBZpTv0IwxhDc by blacklight@social.platypush.tech
       2023-03-14T14:28:06Z
       
       0 likes, 0 repeats
       
       @zzzeek I didn't know about selectinload, but looking at how it's implemented, it looks like it could solve some of my issues.

       About "avoiding endless loops" - that's exactly why I laid out my argument about "hey dear ORM, I know what I'm doing, please just let me do it" :) in my application I've already added logic that prevents relational loops from being inserted. I mean, the line between relational and graph-like databases can eventually become thin (and that explains why graph DBs like Neo4j and Arango load everything in memory), but if a developer is aware of these trade-offs, and wants to be able to explore relationship networks even outside of the session, they should probably be allowed to do so.
       
 (DIR) Post #ATbosvpHRrg4I5XP6W by zzzeek@fosstodon.org
       2023-03-14T12:46:10Z
       
       0 likes, 0 repeats
       
       @blacklight also lazy=joined is not a good eager loader to use for very nested eager loading.  we have a loader called "selectinload" that is much better.   /3
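       A minimal sketch of what selectinload does, with hypothetical Parent/Child models (not taken from the thread): instead of the row-multiplying JOIN that lazy="joined" produces, it emits a second SELECT with an IN clause per relationship level, and the loaded collection stays readable after the session is gone.

```python
# Hypothetical models to illustrate selectinload vs. lazy="joined".
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    children = relationship("Child")

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))
    name = Column(String)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Parent(children=[Child(name="a"), Child(name="b")]))
    session.commit()

with Session(engine) as session:
    # selectinload runs a second SELECT ... WHERE parent_id IN (...),
    # rather than joining the child table into the parent query
    parent = session.query(Parent).options(selectinload(Parent.children)).first()

# the collection was eagerly populated, so it stays readable while detached
names = sorted(c.name for c in parent.children)
```

       Accessing `parent.children` here does not raise DetachedInstanceError precisely because the loader already populated it inside the session.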
       
 (DIR) Post #ATbosy7QvLO1Oy2vg0 by zzzeek@fosstodon.org
       2023-03-14T12:47:17Z
       
       0 likes, 0 repeats
       
       @blacklight third, on "dehydration": detached objects are plainly accessible. DetachedInstanceError only occurs when you access an attribute that was never loaded. Do you want it to just return None, which is essentially the wrong answer? There's actually a loader for that called noload. But returning the wrong answer is not really SQLAlchemy's style. I thought you were going to suggest detached objects just fire up a connection and load - there's actually a way to do that too.. /4
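       The point above can be demonstrated with a small sketch (hypothetical Parent/Child models): a detached object reads fine for everything that was loaded, and DetachedInstanceError fires only on the attribute that never was.

```python
from sqlalchemy import Column, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship
from sqlalchemy.orm.exc import DetachedInstanceError

Base = declarative_base()

class Parent(Base):
    __tablename__ = "parent"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    children = relationship("Child")  # default lazy="select": loaded on access

class Child(Base):
    __tablename__ = "child"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("parent.id"))

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Parent(name="p", children=[Child()]))
    session.commit()

with Session(engine) as session:
    parent = session.query(Parent).first()  # loads the columns, NOT children

# detached now: the loaded column attribute is plainly accessible...
loaded_name = parent.name

# ...while the never-loaded relationship is the only thing that raises
try:
    parent.children
    outcome = "accessible"
except DetachedInstanceError:
    outcome = "DetachedInstanceError"
```

       Had the query used an eager loader (or `noload`, for the "just give me something" behavior), the `children` access would not raise either.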
       
 (DIR) Post #ATbot0JCmXzWC3ZMIq by zzzeek@fosstodon.org
       2023-03-14T12:57:07Z
       
       0 likes, 0 repeats
       
       @blacklight 5/ on merge: there's of course session.merge(), and helpers like merge_frozen_result() for more full-on caching use cases (see the dogpile.cache example, which I used in production for this kind of problem). merge(), as you know, has to run a database query in order to refresh the objects. So if you have a complex graph of objects, you might want to requery it more efficiently. merge() now accepts loader options too, so you can do merge(object, options=[eagerloaders..])
       
 (DIR) Post #ATbot2Ox09m8gSGyXI by zzzeek@fosstodon.org
       2023-03-14T12:57:14Z
       
       0 likes, 0 repeats
       
       @blacklight seems like these are existing cases, or areas where, if improvements are needed, we can of course make them - thanks for the specifics
       
 (DIR) Post #ATbpvBhvTN1A4yjml6 by zzzeek@fosstodon.org
       2023-03-14T14:39:42Z
       
       0 likes, 0 repeats
       
       @blacklight the "load any number of levels deep" problem is very difficult. I did tackle it in 2.0 with "recursion_depth": https://docs.sqlalchemy.org/en/20/orm/queryguide/relationships.html#sqlalchemy.orm.selectinload - however, it only works so far for N1->N1 relationships, not N1->N2->... It was very difficult to keep it from spinning out of control.
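       For a self-referential relationship, a 1.4-compatible way to get a fixed number of levels is to chain the loader; a sketch with a hypothetical Node model (SQLAlchemy 2.0 adds `selectinload(Node.children, recursion_depth=-1)` for the unbounded case discussed above):

```python
from sqlalchemy import Column, ForeignKey, Integer, create_engine
from sqlalchemy.orm import Session, declarative_base, relationship, selectinload

Base = declarative_base()

class Node(Base):
    __tablename__ = "node"
    id = Column(Integer, primary_key=True)
    parent_id = Column(Integer, ForeignKey("node.id"))
    children = relationship("Node")  # self-referential parent -> children

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    # a three-level hierarchy: 1 -> 2 -> 3
    session.add(Node(id=1, children=[Node(id=2, children=[Node(id=3)])]))
    session.commit()

with Session(engine) as session:
    # two chained loaders = two levels eagerly loaded; in 2.0,
    # selectinload(Node.children, recursion_depth=-1) would instead
    # keep recursing until the graph stops producing new rows
    root = (
        session.query(Node)
        .filter(Node.id == 1)
        .options(selectinload(Node.children).selectinload(Node.children))
        .one()
    )

# both loaded levels remain reachable while detached
grandchild_id = root.children[0].children[0].id
```

       Going one level deeper than the chained loaders cover (e.g. a fourth level) would still raise DetachedInstanceError outside the session.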
       
 (DIR) Post #ATlsbYPqTlDOHMFWDI by blacklight@social.platypush.tech
       2023-03-19T10:56:53Z
       
       0 likes, 0 repeats
       
       @zzzeek after months spent fighting and debugging the ORM, I think I've finally reached a situation where things look quite stable and predictable.

       I think that the problem in my case is that my relational design pushed a bit too many boundaries at the same time.

       I have an `Entity` class, which is extended polymorphically via `polymorphic_identity` on a `type` column by a bunch of other classes (e.g. `Device`, `Sensor` etc.) - so there are multiple physical records for each entity in order to map its type.

       On top of that, each entity has a `parent` relationship that can point to another entity (with lazy=joined), with a backref to `children` (with lazy=immediate).

       Then I have a bunch of plugins, each running in its own thread. When these plugins want to push a new entity (either newly created or updated), they create a model instance on the fly and push it to the engine queue, where the engine is the single thread in charge of periodically persisting the received objects - I had to go for this design after too many headaches trying to handle concurrent sessions in SQLAlchemy.

       This also means that the plugins omit the primary key when creating their model instances. They use <external_id, plugin> as an alternative key instead. So when the engine has to flush the objects in the session, it first has to infer whether an object with the same alternative key already exists on the db, and update that one instead of creating a new one.

       Crucially, the application needs to support partial updates on multiple dimensions. For example, a plugin can push an update only for the `temperature` field of a `Sensor` object. In that case, the application must be clever enough to:

       1. Detect if the `Sensor` entity has already been persisted (by doing an alternative key lookup) and if it also has other columns (in that case, it should coalesce all the non-null attributes).
       2. Detect if the entity has persisted `children` that haven't been reported on the model instance, and if so merge the newly reported children with the existing ones (and do this recursively down the whole hierarchy), in order to prevent sudden "orphan" instances.
       3. Detect if the entity has a parent that has already been persisted, and wire it to the existing parent if that's the case (and do this recursively up the whole hierarchy).
       4. Eventually flush all the entities (both those that have been coalesced/merged and those that are newly created) in a consistent way.

       Implementing this design in SQLAlchemy has really given me headaches for months. I've eventually resorted to a solution that does the following:

       1. When a batch of entities has to be flushed, the engine creates a new session, extracts *all* the alternative keys of all the entities in the hierarchy (both upwards, following the `parent` relationship, and downwards, following the `children` relationship), and does a big select on the alternative key to eagerly fetch everything related to those entities that already exists, so I don't risk running into DetachedInstanceErrors later.
       2. Detect any loops in the branches (using the alternative key as an identifier).
       3. Recursively rewire all the given entities to those that already exist, being careful not to remove any links to any of the existing children when rewriting the `children` attribute.
       4. Coalesce the column values with those newly provided.
       5. Once the hierarchies have been merged/updated, climb all the way up to the root entities of those provided in the batch (i.e. the entities with `parent=None`), and use BFS to split the initial batch using a "layer cake" approach - i.e. first a `session.add_all` for the root entities, then for the batch containing the first layer of children, then the second, and so on.
       6. After flushing and committing the session, the engine performs a deep copy of the flushed entities (i.e. it creates new model instances that aren't attached to any session, and whose parent/children references are also detached all the way up and down) and notifies the upstream plugins about the new objects.

       Eventually it's quite convoluted, but it's the only way I could find, after a lot of research and debugging, to prevent the logic from exploding with DetachedInstanceError, with errors from reusing instances attached to other sessions (mostly when some of the `parent` or `children` in the hierarchy hadn't been eagerly reloaded in the current session), with IntegrityError when using `merge` on objects without a populated primary key, or with "orphan" relationships when a plugin pushes an update only for one child.

       I also feel that I've pushed the envelope of what is traditionally possible with a DBMS a bit too much, and the lines with a graph db are now a bit blurred, so probably some of the custom logic around the ORM was inevitable. But SQLAlchemy could probably have made my life a bit easier if:

       1. `Session.merge` could also support object merging on attributes other than the primary key.
       2. It provided an easy way to eagerly fetch everything (upwards and downwards) when using recursive relationships.
       3. It provided an easy way to coalesce all the columns of two instances (e.g. "consider this constant as an empty value, and update the existing instance using the most recent available value").
       4. It provided an easy way to add/merge an object to the collection of children of another object using an alternative key as an identifier, doing all the rewiring as required.

       Do you think there was an easier way to achieve this with the tools provided by SQLAlchemy? As a note, I'm still on SQLAlchemy 1.4.x - version 2.0 introduced several breaking changes in my non-ORM layer that I haven't managed to fix yet.
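       The "layer cake" split described above can be sketched in plain Python (a hypothetical minimal Entity stand-in, not the Platypush model): group the batch by BFS depth from the roots, so parents reach the session before the children that reference them.

```python
class Entity:
    """Hypothetical stand-in for an ORM entity with a children collection."""
    def __init__(self, name, children=()):
        self.name = name
        self.children = list(children)

def layers(roots):
    """Yield the hierarchy one BFS layer at a time: roots first, then their
    children, then grandchildren, and so on.  Each yielded layer would go to
    session.add_all() followed by a flush, so parent rows exist before the
    rows that reference them."""
    seen = set()
    current = list(roots)
    while current:
        yield current
        nxt = []
        for entity in current:
            for child in entity.children:
                if id(child) not in seen:  # guard against shared/looping refs
                    seen.add(id(child))
                    nxt.append(child)
        current = nxt

tree = Entity("root", [Entity("a", [Entity("a1")]), Entity("b")])
batches = [[e.name for e in layer] for layer in layers([tree])]
# batches is [['root'], ['a', 'b'], ['a1']]
```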
       
 (DIR) Post #ATm8PP8gk5fjIcS8K8 by zzzeek@fosstodon.org
       2023-03-19T13:53:58Z
       
       0 likes, 0 repeats
       
       @blacklight briefly, the problem with merge() overall is that it's slow, because it works a row at a time. It's a method taken straight from Hibernate. So we've never really embraced merge() as a place we want to expand things, as there are better ways to "merge" things:

       1. SELECT all the rows you want to merge on whatever criteria. Now those objects are all in the identity map.
       2. Now merge the state you have into that list of objects you just loaded.
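       The two steps above can be sketched against the thread's (external_id, plugin) alternative key; the Entity model here is a hypothetical simplification of the one described earlier:

```python
from sqlalchemy import Column, Integer, String, and_, create_engine, or_
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Entity(Base):
    __tablename__ = "entity"
    id = Column(Integer, primary_key=True)
    external_id = Column(String)
    plugin = Column(String)
    temperature = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Entity(external_id="s1", plugin="zwave", temperature=20))
    session.commit()

# incoming partial updates keyed on (external_id, plugin), no primary key
incoming = [
    {"external_id": "s1", "plugin": "zwave", "temperature": 25},
    {"external_id": "s2", "plugin": "zwave", "temperature": 18},
]

with Session(engine) as session:
    # step 1: one SELECT on the alternative key pulls every matching
    # row into the identity map
    criteria = or_(*(
        and_(Entity.external_id == row["external_id"],
             Entity.plugin == row["plugin"])
        for row in incoming
    ))
    existing = {
        (e.external_id, e.plugin): e
        for e in session.query(Entity).filter(criteria)
    }
    # step 2: merge the incoming state into the loaded objects
    for row in incoming:
        obj = existing.get((row["external_id"], row["plugin"]))
        if obj is None:
            session.add(Entity(**row))       # brand new: plain INSERT
        else:
            for attr, value in row.items():  # existing: coalesce non-null values
                if value is not None:
                    setattr(obj, attr, value)
    session.commit()

with Session(engine) as session:
    temps = {e.external_id: e.temperature for e in session.query(Entity)}
```

       Because the existing rows are already in the identity map, the updates go through the unit of work as ordinary attribute changes; no per-row merge() query is needed.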
       
 (DIR) Post #ATmWyUMvgjJMYQNOKW by zzzeek@fosstodon.org
       2023-03-19T13:57:10Z
       
       0 likes, 0 repeats
       
       @blacklight that is, instead of building up a big structure of your ORM objects outside of a session first, start with the session, load everything that exists first, then build out that object graph.

       If you truly have the objects on the outside already, do all these same steps, but then build up a dictionary on the criteria you are "merging" on. A simplistic version of this is in the very old recipe https://github.com/sqlalchemy/sqlalchemy/wiki/UniqueObject and there is also some thinking on this at https://github.com/sqlalchemy/sqlalchemy/wiki/SessionIndexing
       
 (DIR) Post #ATmWyUwNYvfSKMxhaK by zzzeek@fosstodon.org
       2023-03-19T13:57:15Z
       
       0 likes, 0 repeats
       
       @blacklight The other way to "merge" efficiently on "any" criteria is to use upserts; for PostgreSQL this is ON CONFLICT DO UPDATE.   We have good support for this in SQLAlchemy 1.4 and even better in 2.0; you can get ORM objects back from an INSERT..ON CONFLICT..RETURNING statement.
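       SQLAlchemy exposes this through per-dialect insert() constructs. A sketch using the SQLite flavor so it runs in-memory (the same `on_conflict_do_update()` API exists on `sqlalchemy.dialects.postgresql.insert`; the Entity model and alternative key are the hypothetical ones from earlier in the thread):

```python
from sqlalchemy import (Column, Integer, String, UniqueConstraint,
                        create_engine)
from sqlalchemy.dialects.sqlite import insert  # dialect-specific construct
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Entity(Base):
    __tablename__ = "entity"
    id = Column(Integer, primary_key=True)
    external_id = Column(String)
    plugin = Column(String)
    temperature = Column(Integer)
    # ON CONFLICT needs a unique index on the conflict target
    __table_args__ = (UniqueConstraint("external_id", "plugin"),)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

with Session(engine) as session:
    session.add(Entity(external_id="s1", plugin="zwave", temperature=20))
    session.commit()

stmt = insert(Entity.__table__).values(
    external_id="s1", plugin="zwave", temperature=25
)
stmt = stmt.on_conflict_do_update(
    index_elements=["external_id", "plugin"],        # the alternative key
    set_={"temperature": stmt.excluded.temperature}, # value from the new row
)

with Session(engine) as session:
    session.execute(stmt)
    session.commit()
    # .one() also proves no duplicate row was inserted
    row = session.query(Entity).filter_by(external_id="s1",
                                          plugin="zwave").one()
    temp = row.temperature
```

       In 2.0, appending `.returning(Entity)` to the statement is what lets you get ORM objects back from the upsert, as mentioned above.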
       
 (DIR) Post #ATmWyVOjrULvkKELmy by blacklight@social.platypush.tech
       2023-03-19T18:29:15Z
       
       0 likes, 0 repeats
       
       @zzzeek the approach you described is exactly what I've converged on, with the additional twist that an `Entity` object has recursive parent/children relationships to other `Entity` objects, so eagerly loading everything in the session involves recursively climbing hierarchies up and down in order to do a big SELECT FROM entity WHERE (external_id=.. AND plugin=..) OR (..) that loads all the objects that may require changes into the session. Then the merge logic on the custom key is handled by the application logic.

       The thing is that I barely feel like I'm using the ORM in this case. I mean, I use a session to load everything upfront (and it's up to my logic to figure out everything that needs loading), but then all the merging happens within my business logic - and I also have to disable autocommit and autoflush to prevent things from being flushed before they're ready. Then I hook into the ORM again at the end of the process to do a batched add_all followed by flush+commit. Granted, I expected to do some work to make the ORM fit onto my business logic, but not so much :)

       About ON CONFLICT DO UPDATE - I've seen it often referenced as an alternative solution, but apparently it piggybacks on the constructs provided by the underlying DBMS - so I have to explicitly use the MySQL or Postgres dialect to express it. That's a bit of a constraint, because my application has been quite database-agnostic so far - and the default DBMS is SQLite. Is there a way to support the on_conflict_update construct across multiple backends?
       
 (DIR) Post #ATmnK2dh479KBpW5q4 by zzzeek@fosstodon.org
       2023-03-19T21:32:25Z
       
       0 likes, 0 repeats
       
       @blacklight we don't quite have a dialect-agnostic upsert right now, because the three supporting DBs, plus SQL Server and Oracle, implement these in extremely different ways.
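       In the absence of a dialect-agnostic upsert, one portable fallback (not a SQLAlchemy feature; a common hand-rolled pattern, and racy under concurrent writers unless backed by a unique constraint plus retry) is UPDATE-then-INSERT inside one transaction, sketched here with the thread's hypothetical Entity model:

```python
from sqlalchemy import Column, Integer, String, create_engine, insert, update
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()

class Entity(Base):
    __tablename__ = "entity"
    id = Column(Integer, primary_key=True)
    external_id = Column(String)
    plugin = Column(String)
    temperature = Column(Integer)

engine = create_engine("sqlite://")
Base.metadata.create_all(engine)

def upsert(session, values):
    """Try an UPDATE on the alternative key; INSERT only if nothing matched.
    Uses only generic Core constructs, so it runs on any backend."""
    result = session.execute(
        update(Entity.__table__)
        .where(Entity.external_id == values["external_id"])
        .where(Entity.plugin == values["plugin"])
        .values(temperature=values["temperature"])
    )
    if result.rowcount == 0:  # no existing row: fall back to INSERT
        session.execute(insert(Entity.__table__).values(**values))

with Session(engine) as session:
    upsert(session, {"external_id": "s1", "plugin": "zwave", "temperature": 20})
    upsert(session, {"external_id": "s1", "plugin": "zwave", "temperature": 25})
    session.commit()
    # .one() proves the second call updated rather than duplicated
    row = session.query(Entity).filter_by(external_id="s1").one()
    temp = row.temperature
```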