[HN Gopher] Database Protocols Are Underwhelming
___________________________________________________________________
Database Protocols Are Underwhelming
Author : PaulHoule
Score : 71 points
Date : 2025-04-05 13:51 UTC (9 hours ago)
(HTM) web link (byroot.github.io)
(TXT) w3m dump (byroot.github.io)
| tanelpoder wrote:
| The title says Database Protocols, but the author writes only
| about MySQL and Postgres. There are other databases out there,
| for example Oracle that I'm the most familiar with and it has had
| a break/reset mechanism built in to its protocol for at least 30
| years [1]. Its a 2 step operation (2 roundtrips), one sending an
| error back to the client instead of the expected result and the
| second step will reset "current call" (not current session or
| connection) state on the server side and then request the client
| do the same.
|
| Databases/protocols that differentiate between connection,
| session and call scopes, can reliably perform resets without
| requiring to tear down the entire connection.
|
| Retrying is safe with no client side transaction idempotency
| tokens needed, as the error handling will roll back any changes
| done only by the failed _call_ , not the entire transaction or
| session.
|
| [1]: https://tanelpoder.com/2008/04/10/sqlnet-breakreset-to-
| clien...
|
| Edit: added the comment about retry and idempotency.
| atombender wrote:
| Postgres does have "reset" support.
|
| There are several ways commands depending on what you want to
| reset. If you're only interested in mutable state changeable with
| SET, you can use RESET ALL instead. This resets configuration
| settings such as the time zone, statement timeout, and so on.
|
| Postgres has different fine-grained ways to reset other aspects
| of a session, such as closing cursors (CLOSE ALL), unlistening
| from notifications (UNLISTEN), deallocating prepared plans
| (DEALLOCATE), and so on. Finally, there's DISCARD ALL, which
| resets everything.
|
| I agree about error handling. Most clients just disconnect on a
| connection error.
|
| One gotcha about Postgres queries is that running queries do not
| get cancelled when the client disconnects. This has always been a
| weird one that's quite dangerous, as it can lead to dog-piling --
| if a client starts a slow query, then times out or cancels its
| connection, the query lives on; then if another identical request
| starts, same thing happens, and soon you have 100 queries running
| that are doing the same thing and getting slower and slower. You
| have to be very careful with retrying.
| klysm wrote:
| I can see the argument for letting a query finish if it is
| doing any kind of mutation. For a read-only query though,
| surely it would be a good addition in postgres to cancel the
| query? Cancelling things can be quite difficult though in
| practice.
| atombender wrote:
| My understanding is that the difficulty is knowing the client
| disconnected. Postgres doesn't know until it tries to write
| the result to the stream. That's the core team's explanation
| in this [1] ancient thread, at least.
|
| I don't know why Postgres can't write keepalive messages
| while executing the query; speculating here, but it's
| possible that the architecture is synchronous and doesn't
| support doing anything with the connection while the query is
| executing. It's an old threading model where one process is
| started per connection.
|
| I could have sworn I had read about a new configuration
| option introduced in Postgres 14, but I can't find anything
| about it.
|
| [1] https://www.postgresql.org/message-
| id/e09785e00907280622v9b1...
| marcosdumay wrote:
| The database must stay correct if you cut power to the
| machine instead of disconnecting the client, so I don't see
| how mutation would make it impossible to drop partial work.
| throwanem wrote:
| "I would like to share a prepared statement across database
| client sessions" is nearly the exact use case for a view, or some
| other possibly partially materialized decomposition. You are of
| course still going to use prepared statements in your clients to
| avoid naive SQL injection, but that's always going to be true; by
| the time the engine has parsed a query sufficiently to validate
| it can be safely executed with given parameters, it has
| effectively been "prepared," so you're not actually getting a
| meaningful perf benefit there. Where you _do_ see that benefit
| is, as almost always, in query complexity: if your statements
| cost so much to prepare and your sessions are so unreliable that
| this becomes a genuine problem, optimizing query preparation is
| very much the wrong place to start solving.
|
| If you find yourself so frequently hitting socket timeouts
| attempting to execute database queries that you need to modify
| your ORM to work around this problem automatically, your problems
| likewise almost certainly begin at the schema layer. I'm not
| going to blame an ORM maintainer for having to deal with that
| concern in a way that avoids irritating library consumers, and I
| suppose I can see where it does no good arguing that one's own
| clients (said consumers) are using the database wrong in order to
| need such functionality.
|
| I'd have thought to see perf considered to some extent beyond
| blaming the engine, but I suppose there is a kind of honesty in
| simply omitting any discussion of causes for these apparently so
| common "network errors," in cases where they are so common and
| thus must be so transparently retried that the entire existence
| of this behavior could be totally overlooked by the application
| developer nominally responsible for the database interaction.
|
| (I'm curious why a library maintainer needs to assert
| responsibility over session variables at all times, but I suppose
| it makes sense from a perspective that one's users are not
| especially trustworthy, as I suppose most ORM maintainers and
| especially ActiveRecord's must conclude. I guess it's fair to say
| that connections needing to manipulate the session that
| granularly should exist outside the ORM entirely, but then we're
| back to the whole tiresome architectural question that's vexed
| the field for now nearly two decades since Neward's "Vietnam"
| paper:
| https://web.archive.org/web/20060703024451/http://blogs.tedn....)
| atombender wrote:
| > I'm curious why a library maintainer needs to assert
| responsibility over session variables at all times
|
| When pooling connections, you need to reset the connection when
| putting it back in the pool. For example, if you grab a
| connection from the pool and do: SET SESSION
| statement_timeout to '1s'; SET timezone TO
| 'Europe/London';
|
| ...and then release the connection, you don't want the next
| connection to inherit these settings.
|
| It has nothing to do with the ORM (although the ORM itself may
| want to set some baseline settings based on config), and
| everything to do with sharing connections between different
| parts of a single application.
| throwanem wrote:
| I mean I get that, what I don't get is why screwing around
| that way is _not_ explicitly defined as "don't do this,
| here's why it isn't supported and here's what you should try
| to do instead; if you ignore all that and still break
| something, you get to keep both its pieces" in the
| documentation, rather than trying to work around it with some
| kind of global reset.
|
| Why should the engine burden itself with an undo stack (even
| if only one frame deep) for every client, on behalf of a
| minority so incoherently implemented that it can't keep
| straight what it's doing across the span of two (application
| and ORM library) codebases? A metaphor already exists in the
| interface for getting a fresh connection: you do this by
| creating a fresh connection. If you want some other way to
| accomplish the same task, it's fair to require justification
| for the effort of providing that second option, and no such
| justification is presented in the article under discussion.
|
| I don't like ORMs, I think it's long obvious. But to be
| excruciatingly clear, I grant them the same courtesy as any
| other library in that they absolutely may declare things
| explicitly out of scope. Seeing that _not_ done, in a case
| where the consumer is pretty explicitly screwing around with
| internals of a database connection belonging to the library,
| is what 's surprising, even in the context of ORMs as a
| category, where maximalism seems so constantly the order of
| the day.
| hobs wrote:
| This very much depends on the SQL engine you are talking about
| - many early sql engines literally compiled stored procedures
| and didn't allow the dynamism you imply - some still offer such
| features.
|
| Some SQL engines are more sensitive (due to caching plans in
| the first place or not) to this problem as well - SQL Server
| famously utilizes parameter sniffing for performance, which has
| positive implications of skipping work, and the negative of
| skipping work you might need to do.
| throwanem wrote:
| A stored procedure and a prepared statement aren't the same,
| though. I'm not sure how persuasive an argument from how one
| is optimized is meant to be for the other.
|
| My experience has long been that in almost every case where a
| "database performance" problem occurs while an ORM is in use,
| presumptively refactoring to eliminate the ORM almost
| immediately reveals the buried n+1 or other trivially
| pathological pattern which, if the ORM's implementor knows to
| try to avoid it, the ORM's interface typically is not
| sufficiently expressive to disambiguate in any case. (Fair
| do's: In the uncommon case where that interface is so
| expressive, one does usually find the implementation accounts
| for the case.)
|
| Hence my earlier reference to the "Vietnam" paper, which
| dissects in extensive (if unfortunately mostly fruitless)
| detail the sunk cost fallacy at the heart of this problem.
| majkinetor wrote:
| I am interested in "Idempotency Keys" that are mentioned in the
| article. Anybody using those? How are you saving them? What about
| responses? Are they cached or recalculated..
| sgarland wrote:
| Some of this doesn't make sense. If you have common settings that
| you always want enabled, like sql_mode, then set it on the
| server's config, and stop wasting time doing so on every session.
|
| If you _do_ need some session-specific variables, use a
| connection pooler (you should anyway) that handles that.
| ProxySQL, for one, tracks the per-session variables that are set,
| so when a client requests a connection, it hands it one already
| set to its requested specification (if it's available).
|
| > The reason this is important is that whenever the write or the
| read fails, in the overwhelming majority of cases, you don't know
| whether the query was executed on the server or not.
|
| Yes, you definitely can know this in almost all cases. Did you
| receive a result? It succeeded. Did you receive a warning? It
| succeeded, but maybe read the warning. Did you receive an error?
| It did not succeed. Parse the error message to determine what
| went wrong, and if a retry makes sense. Did you receive nothing?
| Now, and only now, do you not know the state.
|
| It blows my mind the number of devs who haven't read the API
| specification for their RDBMS. There is a ton of useful
| information you can glean that's often ignored.
|
| > These statements also have a cost on the server side. Each
| statement requires some amount of memory in the database server.
| So you have to be careful not to create an unbounded amount of
| them, which for an ORM isn't easy to enforce.
|
| Which is why you can limit them. MySQL has a default of 16382;
| not sure about Postgres.
|
| Also, to the complaint about them being session-scoped, this is
| yet another reason to use a connection pooler. They can often
| share them across connections. I know PgBouncer can.
|
| Read. Application. Manuals.
| kiitos wrote:
| > In both cases the idea is the same, we're configuring the
| connection, making it behave differently.
|
| This is the impedance mismatch. The _connection_ exists at a
| layer beneath the database/application layer. You're not
| configuring the _connection_, you're configuring the _session_!
| And the relationship between connections and sessions is one-to-
| many, not one-to-one.
___________________________________________________________________
(page generated 2025-04-05 23:01 UTC)