[HN Gopher] Database Protocols Are Underwhelming
       ___________________________________________________________________
        
       Database Protocols Are Underwhelming
        
       Author : PaulHoule
       Score  : 71 points
       Date   : 2025-04-05 13:51 UTC (9 hours ago)
        
 (HTM) web link (byroot.github.io)
 (TXT) w3m dump (byroot.github.io)
        
       | tanelpoder wrote:
       | The title says Database Protocols, but the author writes only
       | about MySQL and Postgres. There are other databases out there,
       | for example Oracle that I'm the most familiar with and it has had
       | a break/reset mechanism built in to its protocol for at least 30
       | years [1]. Its a 2 step operation (2 roundtrips), one sending an
       | error back to the client instead of the expected result and the
       | second step will reset "current call" (not current session or
       | connection) state on the server side and then request the client
       | do the same.
       | 
       | Databases/protocols that differentiate between connection,
       | session and call scopes, can reliably perform resets without
       | requiring to tear down the entire connection.
       | 
       | Retrying is safe with no client side transaction idempotency
       | tokens needed, as the error handling will roll back any changes
       | done only by the failed _call_ , not the entire transaction or
       | session.
       | 
       | [1]: https://tanelpoder.com/2008/04/10/sqlnet-breakreset-to-
       | clien...
       | 
       | Edit: added the comment about retry and idempotency.
        
       | atombender wrote:
       | Postgres does have "reset" support.
       | 
       | There are several ways commands depending on what you want to
       | reset. If you're only interested in mutable state changeable with
       | SET, you can use RESET ALL instead. This resets configuration
       | settings such as the time zone, statement timeout, and so on.
       | 
       | Postgres has different fine-grained ways to reset other aspects
       | of a session, such as closing cursors (CLOSE ALL), unlistening
       | from notifications (UNLISTEN), deallocating prepared plans
       | (DEALLOCATE), and so on. Finally, there's DISCARD ALL, which
       | resets everything.
       | 
       | I agree about error handling. Most clients just disconnect on a
       | connection error.
       | 
       | One gotcha about Postgres queries is that running queries do not
       | get cancelled when the client disconnects. This has always been a
       | weird one that's quite dangerous, as it can lead to dog-piling --
       | if a client starts a slow query, then times out or cancels its
       | connection, the query lives on; then if another identical request
       | starts, same thing happens, and soon you have 100 queries running
       | that are doing the same thing and getting slower and slower. You
       | have to be very careful with retrying.
        
         | klysm wrote:
         | I can see the argument for letting a query finish if it is
         | doing any kind of mutation. For a read-only query though,
         | surely it would be a good addition in postgres to cancel the
         | query? Cancelling things can be quite difficult though in
         | practice.
        
           | atombender wrote:
           | My understanding is that the difficulty is knowing the client
           | disconnected. Postgres doesn't know until it tries to write
           | the result to the stream. That's the core team's explanation
           | in this [1] ancient thread, at least.
           | 
           | I don't know why Postgres can't write keepalive messages
           | while executing the query; speculating here, but it's
           | possible that the architecture is synchronous and doesn't
           | support doing anything with the connection while the query is
           | executing. It's an old threading model where one process is
           | started per connection.
           | 
           | I could have sworn I had read about a new configuration
           | option introduced in Postgres 14, but I can't find anything
           | about it.
           | 
           | [1] https://www.postgresql.org/message-
           | id/e09785e00907280622v9b1...
        
           | marcosdumay wrote:
           | The database must stay correct if you cut power to the
           | machine instead of disconnecting the client, so I don't see
           | how mutation would make it impossible to drop partial work.
        
       | throwanem wrote:
       | "I would like to share a prepared statement across database
       | client sessions" is nearly the exact use case for a view, or some
       | other possibly partially materialized decomposition. You are of
       | course still going to use prepared statements in your clients to
       | avoid naive SQL injection, but that's always going to be true; by
       | the time the engine has parsed a query sufficiently to validate
       | it can be safely executed with given parameters, it has
       | effectively been "prepared," so you're not actually getting a
       | meaningful perf benefit there. Where you _do_ see that benefit
       | is, as almost always, in query complexity: if your statements
       | cost so much to prepare and your sessions are so unreliable that
       | this becomes a genuine problem, optimizing query preparation is
       | very much the wrong place to start solving.
       | 
       | If you find yourself so frequently hitting socket timeouts
       | attempting to execute database queries that you need to modify
       | your ORM to work around this problem automatically, your problems
       | likewise almost certainly begin at the schema layer. I'm not
       | going to blame an ORM maintainer for having to deal with that
       | concern in a way that avoids irritating library consumers, and I
       | suppose I can see where it does no good arguing that one's own
       | clients (said consumers) are using the database wrong in order to
       | need such functionality.
       | 
       | I'd have thought to see perf considered to some extent beyond
       | blaming the engine, but I suppose there is a kind of honesty in
       | simply omitting any discussion of causes for these apparently so
       | common "network errors," in cases where they are so common and
       | thus must be so transparently retried that the entire existence
       | of this behavior could be totally overlooked by the application
       | developer nominally responsible for the database interaction.
       | 
       | (I'm curious why a library maintainer needs to assert
       | responsibility over session variables at all times, but I suppose
       | it makes sense from a perspective that one's users are not
       | especially trustworthy, as I suppose most ORM maintainers and
       | especially ActiveRecord's must conclude. I guess it's fair to say
       | that connections needing to manipulate the session that
       | granularly should exist outside the ORM entirely, but then we're
       | back to the whole tiresome architectural question that's vexed
       | the field for now nearly two decades since Neward's "Vietnam"
       | paper:
       | https://web.archive.org/web/20060703024451/http://blogs.tedn....)
        
         | atombender wrote:
         | > I'm curious why a library maintainer needs to assert
         | responsibility over session variables at all times
         | 
         | When pooling connections, you need to reset the connection when
         | putting it back in the pool. For example, if you grab a
         | connection from the pool and do:                   SET SESSION
         | statement_timeout to '1s';         SET timezone TO
         | 'Europe/London';
         | 
         | ...and then release the connection, you don't want the next
         | connection to inherit these settings.
         | 
         | It has nothing to do with the ORM (although the ORM itself may
         | want to set some baseline settings based on config), and
         | everything to do with sharing connections between different
         | parts of a single application.
        
           | throwanem wrote:
           | I mean I get that, what I don't get is why screwing around
           | that way is _not_ explicitly defined as  "don't do this,
           | here's why it isn't supported and here's what you should try
           | to do instead; if you ignore all that and still break
           | something, you get to keep both its pieces" in the
           | documentation, rather than trying to work around it with some
           | kind of global reset.
           | 
           | Why should the engine burden itself with an undo stack (even
           | if only one frame deep) for every client, on behalf of a
           | minority so incoherently implemented that it can't keep
           | straight what it's doing across the span of two (application
           | and ORM library) codebases? A metaphor already exists in the
           | interface for getting a fresh connection: you do this by
           | creating a fresh connection. If you want some other way to
           | accomplish the same task, it's fair to require justification
           | for the effort of providing that second option, and no such
           | justification is presented in the article under discussion.
           | 
           | I don't like ORMs, I think it's long obvious. But to be
           | excruciatingly clear, I grant them the same courtesy as any
           | other library in that they absolutely may declare things
           | explicitly out of scope. Seeing that _not_ done, in a case
           | where the consumer is pretty explicitly screwing around with
           | internals of a database connection belonging to the library,
           | is what 's surprising, even in the context of ORMs as a
           | category, where maximalism seems so constantly the order of
           | the day.
        
         | hobs wrote:
         | This very much depends on the SQL engine you are talking about
         | - many early sql engines literally compiled stored procedures
         | and didn't allow the dynamism you imply - some still offer such
         | features.
         | 
         | Some SQL engines are more sensitive (due to caching plans in
         | the first place or not) to this problem as well - SQL Server
         | famously utilizes parameter sniffing for performance, which has
         | positive implications of skipping work, and the negative of
         | skipping work you might need to do.
        
           | throwanem wrote:
           | A stored procedure and a prepared statement aren't the same,
           | though. I'm not sure how persuasive an argument from how one
           | is optimized is meant to be for the other.
           | 
           | My experience has long been that in almost every case where a
           | "database performance" problem occurs while an ORM is in use,
           | presumptively refactoring to eliminate the ORM almost
           | immediately reveals the buried n+1 or other trivially
           | pathological pattern which, if the ORM's implementor knows to
           | try to avoid it, the ORM's interface typically is not
           | sufficiently expressive to disambiguate in any case. (Fair
           | do's: In the uncommon case where that interface is so
           | expressive, one does usually find the implementation accounts
           | for the case.)
           | 
           | Hence my earlier reference to the "Vietnam" paper, which
           | dissects in extensive (if unfortunately mostly fruitless)
           | detail the sunk cost fallacy at the heart of this problem.
        
       | majkinetor wrote:
       | I am interested in "Idempotency Keys" that are mentioned in the
       | article. Anybody using those? How are you saving them? What about
       | responses? Are they cached or recalculated..
        
       | sgarland wrote:
       | Some of this doesn't make sense. If you have common settings that
       | you always want enabled, like sql_mode, then set it on the
       | server's config, and stop wasting time doing so on every session.
       | 
       | If you _do_ need some session-specific variables, use a
       | connection pooler (you should anyway) that handles that.
       | ProxySQL, for one, tracks the per-session variables that are set,
       | so when a client requests a connection, it hands it one already
       | set to its requested specification (if it's available).
       | 
       | > The reason this is important is that whenever the write or the
       | read fails, in the overwhelming majority of cases, you don't know
       | whether the query was executed on the server or not.
       | 
       | Yes, you definitely can know this in almost all cases. Did you
       | receive a result? It succeeded. Did you receive a warning? It
       | succeeded, but maybe read the warning. Did you receive an error?
       | It did not succeed. Parse the error message to determine what
       | went wrong, and if a retry makes sense. Did you receive nothing?
       | Now, and only now, do you not know the state.
       | 
       | It blows my mind the number of devs who haven't read the API
       | specification for their RDBMS. There is a ton of useful
       | information you can glean that's often ignored.
       | 
       | > These statements also have a cost on the server side. Each
       | statement requires some amount of memory in the database server.
       | So you have to be careful not to create an unbounded amount of
       | them, which for an ORM isn't easy to enforce.
       | 
       | Which is why you can limit them. MySQL has a default of 16382;
       | not sure about Postgres.
       | 
       | Also, to the complaint about them being session-scoped, this is
       | yet another reason to use a connection pooler. They can often
       | share them across connections. I know PgBouncer can.
       | 
       | Read. Application. Manuals.
        
       | kiitos wrote:
       | > In both cases the idea is the same, we're configuring the
       | connection, making it behave differently.
       | 
       | This is the impedance mismatch. The _connection_ exists at a
       | layer beneath the database/application layer. You're not
       | configuring the _connection_, you're configuring the _session_!
       | And the relationship between connections and sessions is one-to-
       | many, not one-to-one.
        
       ___________________________________________________________________
       (page generated 2025-04-05 23:01 UTC)