[HN Gopher] Safe relational database queries using the Rust type...
___________________________________________________________________
Safe relational database queries using the Rust type system
Author : lukastyrychtr
Score : 264 points
Date : 2024-11-30 09:29 UTC (1 day ago)
(HTM) web link (blog.lucasholten.com)
(TXT) w3m dump (blog.lucasholten.com)
| echelon wrote:
| Good luck with this, Lucas! We need all of the innovation in the
| Rust DB space we can get.
|
| I can't use yours in production yet as it only supports SQLite,
| but I'll keep my eyes peeled for updates.
|
| We're using Sqlx and Diesel in production currently.
|
| Sqlx is great because it's not an ORM and lets you write "type
| checked" raw SQL. It frustrates us in the fact that you can't
| write dynamic queries and have them be type checked, though. No
| `IN` clauses or conditional predicates, only static queries with
| simple bindings get type checked.
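|
| (Roughly, assuming a hypothetical `users` table and a
| `pool: sqlx::PgPool` in scope -- the macro form is checked
| against the dev database at compile time, the dynamic form
| is not:)
|
|       // compile-time checked against DATABASE_URL
|       let user = sqlx::query!("SELECT id, name FROM users WHERE id = $1", id)
|           .fetch_one(&pool)
|           .await?;
|
|       // dynamic SQL falls back to the unchecked runtime API;
|       // `placeholders` ("$1, $2, ...") is assumed built elsewhere
|       let sql = format!("SELECT id, name FROM users WHERE id IN ({placeholders})");
|       let rows = sqlx::query(&sql).fetch_all(&pool).await?;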
|
| Love to see more ORMs like yours! Diesel isn't our favorite.
| Simpliplant wrote:
| After being frustrated with Diesel, I gave SeaORM a try, and
| I couldn't be happier (I have been using it for more than a
| year). It is not perfect, but it is certainly the best ORM I
| have ever used myself (vs. Diesel, ActiveRecord, SQLAlchemy).
| kstrauser wrote:
| How come? I'm new to the space and picked Diesel a couple
| weeks ago for a new project because it seemed the most
| recommended. What do you prefer about SeaORM?
| Simpliplant wrote:
| To me, SeaORM feels like a well-designed library while many
| things in Diesel feel like an afterthought or a hack. Which
| mostly manifests itself when doing something complex and
| debugging it.
|
| Also recommend reading
| https://www.sea-ql.org/SeaORM/docs/internal-design/diesel/
| kstrauser wrote:
| I'll check it out. Thanks!
| echelon wrote:
| sea-query looks a lot like jOOQ!
| weiznich wrote:
| Please note that this comparison has been outdated for at
| least 2 years, given that diesel-async has existed for more
| than 2 years now and this page completely fails to
| mention it.
| satvikpendem wrote:
| SeaORM is not compile time safe by their own admission so
| it's automatically out of the running, personally speaking.
| kelnos wrote:
| That feels like a weird requirement to me; an ORM is by its
| very nature not going to be able to validate queries at
| compile-time.
|
| And even if you're writing queries directly, any time you
| need to write something dynamic, you're not going to be
| able to verify those at compile-time either.
| satvikpendem wrote:
| > an ORM is by its very nature not going to be able to
| validate queries at compile-time.
|
| I don't see why this is by its "very nature" when
| libraries like diesel or Prisma already do this.
|
| > any time you need to write something dynamic, you're
| not going to be able to verify those at compile-time
| either.
|
| Diesel can validate dynamic queries at compile time too, and
| sqlx can to some extent.
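|
| (E.g. with Diesel's boxed queries; a minimal sketch, where
| `users` comes from Diesel's `table!` macro and `User`,
| `name_filter` and `conn` are hypothetical:)
|
|       use diesel::prelude::*;
|
|       let mut query = users::table.into_boxed();
|       if let Some(name) = name_filter {
|           // the filter is only added at runtime, yet the whole
|           // query is still checked against the schema types
|           query = query.filter(users::name.eq(name));
|       }
|       let results = query.load::<User>(&mut conn)?;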
| written-beyond wrote:
| My first Rust PR was for a Diesel plugin crate; its macros
| weren't updated for Rust's newer spec.
|
| I honestly really enjoyed Diesel, but its extremely slow
| migration to async really made it lose traction.
|
| I've used SQLx extensively for projects after that and it was
| pretty decent.
| weiznich wrote:
| Please note that up until today an async rust database
| library does not give you any measurable performance
| advantage compared to a sync database library. In addition to
| that it won't even matter for most applications as they don't
| reach the required scale to even hit that bottleneck. As a
| matter of fact, crates.io ran just fine on sync diesel up
| until maybe a month ago. They have now switched to diesel-
| async, for specific reasons not related to performance.
| The lead developer there told me specifically that from a
| performance point of view the service would have been fine
| with sync diesel for quite a while without problems, and
| that's with somewhat exponential growth of requests.
|
| Other than that the async rust ecosystem is still in a place
| that makes it literally impossible to provide strong
| guarantees around handling transactions and making sure that
| they have ended, which is a main reason why diesel-async is not
| considered stable from my side yet. This problem exists in
| all other async rust database libraries as well, as it's a
| language-level problem. They just do not document this
| correctly.
| written-beyond wrote:
| WEIZNICH! I am your biggest fan. Thank you so much for the
| amazing crates you've made and for Diesel.
|
| When I started my rust journey Diesel and Rocket were the
| first few crates I worked with and it really opened up my
| mind and I haven't turned back since.
|
| As for async, my primary concern there was to make sure to
| minimise blocking calls in your futures. Even if there
| weren't many performance gains to be made from Diesel itself
| being async, having database calls marked as Futures could
| theoretically help the runtime schedule and manage other
| tasks.
|
| I'd like to thank you for your amazing work once more!
| k972 wrote:
| Depending on what database you're using, you might be able to
| restructure your dynamic queries to fit a static form...
|
|       WHERE ...
|         AND CASE WHEN ? IS NULL THEN TRUE ELSE col ILIKE ? || '%' END
|         AND CASE WHEN ? IS NULL THEN TRUE ELSE col2 IN (?) END
|
| Each "dynamic" parameter is then bound twice. (My personal
| preference is to use a CTE to bind and name my parameter values
| in the query instead of binding multiple times whenever
| possible).
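|
| A sketch of that CTE variant (names hypothetical):
|
|       WITH params AS (
|           SELECT ? AS name_prefix
|       )
|       SELECT t.*
|       FROM t CROSS JOIN params p
|       WHERE p.name_prefix IS NULL
|          OR t.col ILIKE p.name_prefix || '%'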
| kelnos wrote:
| That only works when the dynamic portion is in the WHERE
| clause, though. I often need to dynamically generate UPDATEs
| and even sometimes INSERTs.
| typicalset wrote:
| This looks interesting as an approach to relating schema to data
| types. One aspect that feels very counter-intuitive/unidiomatic
| is that, if I understand correctly, in your example there is no
| Schema enum. Usually this sort of macro does not erase the type
| defined, and I would find it extremely confusing that I cannot
| reference, say, Schema::User{..}. It would be clearer to me if
| this were defined inside of a macro_rules macro, more like
| lazy_static e.g.
|
| ```
| schema! {
|     User {
|         name: String,
|     },
|     Story {
|         author: User,
|         title: String,
|         content: String,
|     },
| }
| ```
| pwdisswordfishz wrote:
| I was just about to ask WTF is a Schema-typed value supposed to
| represent.
| davidatbu wrote:
| I am really happy to see some more exploration in the typesafe-
| db-access-in-Rust space.
|
| > The existing libraries don't provide the compile time
| guarantees that I want and are verbose or awkward like SQL.
|
| Worth noting: diesel definitely fulfills the "providing compile
| time guarantees" criteria.
|
| Here's where I stand on the inevitable ORM-vs-no-ORM debate
| that's about to erupt: I like typesafe query builders that don't
| abstract over SQL (I'd put diesel in this category, and i would
| not put ActiveRecord, Django's ORM, or SeaORM in this category).
|
| It looks like rust-query will lean towards the full-ORM side of
| that spectrum. Not my cup of tea, but the tea industry can
| accommodate many flavors :)
| pdimitar wrote:
| > _Worth noting: diesel definitely fulfills the "providing
| compile time guarantees" criteria._
|
| Also sqlx.
|
| I liked the article and I bookmarked the library, but this
| statement from the author confused me a bit. At least two
| libraries give you pretty good compile-time guarantees. sqlx
| even gives you the option to specify a DB against which it can
| check whether your queries will succeed at runtime.
| weiznich wrote:
| > Also sqlx.
|
| The guarantees provided by sqlx are less strong than what's
| provided by diesel, due to the fact that sqlx needs to know
| the complete query statically at compile time. This excludes
| dynamic constructs like `IN` expressions or dynamic where
| clauses from the set of checked queries. Diesel also verifies
| that these queries are correct at compile time.
| Kinrany wrote:
| Agree. SQL libraries should aim to make any valid query
| possible to express (though semantic equivalents are fine)
| while integrating query building into the rest of the language
| as seamlessly as possible.
| Diggsey wrote:
| SQL, while not great, does have some advantages that are hard to
| beat:
|
| - Everyone knows at least some basic SQL, even non-technical
| users have often encountered it in some form.
|
| - The documentation for eg. PostgreSQL is for SQL, so if you
| write queries in anything else you have to mentally translate
| back and forth, and so you need to know SQL anyway.
|
| - Any external tools you use to interact with the database will
| use SQL.
|
| - Changing the queries doesn't require an expensive compilation
| step - `sqlx` gets the best of both worlds in this respect, in
| that it is able to type-check the parameters and rely on the
| database itself to validate the query, so you don't end up with a
| ton of type-system shenanigans that increase compilation times.
|
| Maybe for a brand new database, a better query language could win
| out, but having used sqlx I can't imagine going back to a query-
| builder style interface for existing SQL databases.
| jeltz wrote:
| Yeah, I am using sqlx and it is amazing. The only thing it
| cannot do is help you in the case when you actually need a
| query builder, e.g. when you need support for queries with
| dynamic filters and sort orders. But other than that it is much
| nicer to work with than every other query library I have used.
| Diggsey wrote:
| FWIW, you very rarely need to make the query itself dynamic.
| After all, SQL alone is powerful enough to do conditionals.
|
| For example:
|
|       SELECT * FROM example
|       WHERE ($1 IS NULL OR field1 = $1)
|         AND ($2 IS NULL OR field2 = $2)
|       ...
|
| When you supply the parameters to this prepared statement,
| the constant conditions should be optimized away by the query
| planner if you're using a decent database.
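|
| (With sqlx, those optional parameters then bind naturally as
| `Option`s; a sketch with hypothetical filters -- the casts
| help Postgres infer the parameter types:)
|
|       let rows = sqlx::query(
|           "SELECT * FROM example
|            WHERE ($1::text IS NULL OR field1 = $1)
|              AND ($2::int4 IS NULL OR field2 = $2)")
|           .bind(field1_filter) // Option<String>
|           .bind(field2_filter) // Option<i32>
|           .fetch_all(&pool)
|           .await?;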
| ris wrote:
| Those tricks cover a very small subset of dynamic queries,
| and if you don't give your developers a proper tool to do
| dynamic queries in, your code will start growing these
| funny string-formatting-based query-mangling functions that
| have god-knows-what escaping problems behind them.
| Diggsey wrote:
| That has not been my experience. SQL is a fully fledged
| programming language so there's very little that you
| can't express this way.
|
| What are you doing that _requires_ dynamically building a
| query?
| kelnos wrote:
| > _That has not been my experience_
|
| Well, it _has_ been mine. That trick you suggested works
| for the simple binary "either include this WHERE clause
| filter or don't" -- which, agreed, is something that
| comes up often enough -- but it's only in the simplest of
| CRUD apps where that's all I've needed.
|
| A quick example: I have a REST endpoint that updates a
| resource. The JSON fields that map to the DB columns can
| be missing (leave the column value unchanged), null (set
| the column value to NULL), or have a value (set the new
| provided value). The list of assignments in the UPDATE
| needs to be dynamic; sometimes there might be only one
| column being updated, sometimes two, three, four, five...
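|
| (For what it's worth, a sketch of how that kind of UPDATE can
| be assembled with sqlx's QueryBuilder -- the `resources`
| table and the `patch` struct are hypothetical, with each
| field an Option<Option<_>> for missing/null/value:)
|
|       use sqlx::{Postgres, QueryBuilder};
|
|       let mut qb: QueryBuilder<Postgres> =
|           QueryBuilder::new("UPDATE resources SET ");
|       let mut sep = qb.separated(", ");
|       if let Some(name) = patch.name {
|           // name: Option<String>; binding None writes NULL
|           sep.push("name = ").push_bind_unseparated(name);
|       }
|       if let Some(desc) = patch.description {
|           sep.push("description = ").push_bind_unseparated(desc);
|       }
|       // real code would guard against an empty SET list here
|       qb.push(" WHERE id = ").push_bind(patch.id);
|       qb.build().execute(&pool).await?;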
| jon_richards wrote:
| Isn't the standard just to set the unchanged column to
| itself?
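|
| (i.e. something like the following -- though note this can't
| express the "explicitly set to NULL" case from the parent
| comment:)
|
|       UPDATE example
|       SET field1 = COALESCE($1, field1)
|       WHERE id = $2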
| LinXitoW wrote:
| It's a fully fledged, horrible language. Anything beyond
| basic queries is unreadable, ESPECIALLY when it's done in
| plain strings in another language. There's not even a way
| to not have to repeat values EVERY SINGLE TIME (think
| variables or constants in every other language).
|
| Oh, but what about <feature>? Well, is that SQL, or a
| frankensteined version of SQL, aka a "dialect"?
|
| SQL is the JavaScript of databases, and we'll be better
| for it once we admit this.
| 63stack wrote:
| Can you show an example where you have 4 possible fields to
| filter on, and then one more where the sorting field is
| dynamic? (Not the value, but the column)
| pimeys wrote:
| You can add sea-orm on top of sqlx for a query builder. It
| can also introspect your SQL database and generate entity
| types.
|
| I built my own back in the days when I worked for Prisma.
| None of these crates existed. It's not that complex if you
| limit the builder to queries you need in your application.
|
| But, to be honest, jOOQ or LINQ would be awesome to get for
| Rust.
| unshavedyak wrote:
| I am toying with spreadsheet backends these days and it's had
| me wondering - do spreadsheet query languages offer value that
| SQL is missing? Beyond "people are used to it" of course.
|
| One thing i am wanting beyond SQL though is a more Object
| oriented language. Ie i want something SQL-like (or more
| specifically PRQL, which i find very nice), but without it
| being so tailored towards a flat data structure. I want to use
| it on JSON/etc.
| wslh wrote:
| BTW, do you suggest a LINQ kind of integration in Rust? I just
| found that Microsoft has a Rust for C#/.NET developers resource
| [1][2], which is more about integration, I think. It is a little
| weird that there is no Rust.NET yet. .NET is not a garbage-
| collection-only platform. I guess Microsoft is working
| internally on this.
|
| [1] https://microsoft.github.io/rust-for-dotnet-devs/latest/linq...
| [2] https://microsoft.github.io/rust-for-dotnet-devs/latest/
| estebank wrote:
| I'm impressed by this individual's effort to bring rust to
| the clr: https://github.com/FractalFir/rustc_codegen_clr
|
| It is already way more complete, given its age, than I would
| have expected.
|
| I agree that the clr is a more natural target than, let's
| say, the JVM (at least until Project Valhalla comes out).
| kstrauser wrote:
| Thanks for releasing this! More ideas are great and welcome, and
| even if I don't use this directly, maybe the ORM I use will like
| some of the features and add them.
|
| But I have to admit, this one bit almost made me stop reading:
|
| > For those who don't know, SQL is the standard when it comes to
| interacting with databases.
|
| I can scarcely imagine anyone who both 1. would want to use this
| and 2. doesn't know what SQL is. So if I'm reading this, and
| you're telling me what a database is, I assume you assume I'm an
| idiot. This makes me wonder if it has design choices to protect
| hypothetical Rust-using, DB-using idiots from themselves at the
| expense of making it hard to handle edge cases.
|
| I'm sure you didn't mean it that way, but that's how it rubbed
| me. Maybe consider leaving out the extremely basic introductory
| bits.
| davidatbu wrote:
| A more charitable take is that OP tried to make his blogpost
| more accessible to newbies at a very small verbosity cost for
| non-newbies.
|
| Fwiw: i'm totally fine with that.
|
| > This makes me wonder if it has design choices to ...
|
| I personally felt that the relevant design choices were
| explicitly enumerated, and so one need not deduce them from
| the rest of the text.
| kstrauser wrote:
| That's fair, and I offered that mainly in the spirit of
| feedback. _To me_ , and perhaps to no one else, but just _to
| me_ , it was a little off putting. Like, "yeah, yeah, I know
| what SQL is. Do you think I don't? Or did you just learn
| about it last month and you're eager to share?"
|
| I'm certainly not gonna lose sleep over it. Someone came on
| HN to show us their neat new project. I wanted to give them
| my first impression to help them craft their future
| messaging. If they disagree, fine with me! They don't owe me
| anything.
| mattrighetti wrote:
| Looks nice! I've been using SeaQuery for a while now, but the
| documentation is not that great if you're looking to create
| some more advanced queries.
|
| While I liked the idea of having strongly typed queries, I've
| lately found out that sometimes they unnecessarily slow me down
| during the development process; I'm considering going back to
| good old prepared statements and binding values manually.
| giovannibonetti wrote:
| You might be interested in SQLx, since you write the SQL but
| the library generates the tedious bindings for you.
| mattrighetti wrote:
| That's my go-to! Maybe it was not super clear from my comment,
| but I'm not using the ORM, just the query builder; once the
| query is built you still have to feed it to sqlx
| (e.g. [0]) :)
|
| [0]: https://github.com/SeaQL/sea-query/blob/master/examples/sqlx...
| ocschwar wrote:
| Thanks for this! I am looking forward to putting this together
| with the Zola CMS for some hobby applications.
| ninetyninenine wrote:
| >My opinion is that SQL should be for computers to write. This
| would put it firmly in the same category as LLVM IR.
|
| This is nuts.
|
| SQL is a high level language. It's higher level than python or
| rust. It's basically a declarative statement that's almost
| english-like and it's specifically DESIGNED to be more readable
| and easier to use by humans. It compiles down into many
| procedures that don't easily map to the SQL statement itself. You
| can't get any higher level than SQL.
|
| The issue here is that the database exists at the bottleneck of
| web dev. It's where state mutation happens and it's usually
| the slowest part of the pipeline in a computer. Yet
| instead of having fine grained low level control over this part
| of the pipeline, we have a high level language on top of it. So
| if we want to optimize this part of the pipeline we have to HACK
| the query. We have to make the abstraction layer leaky in the API
| itself with EXPLAIN. We can't do direct optimizations because SQL
| is so high level. It is a bit of a problem but the industry is so
| entrenched in SQL that it's actually 10x more efficient to just
| use it than to develop an API that's more appropriate for this
| level. SQL is tech debt we are sort of stuck with. The ideal API
| would be one that is both high level but allows fine grained
| control... but we don't have one yet.
|
| To use a machine learning analogy: SQL is a local optimum.
| There's a much more optimal language somewhere in this space,
| but we are stuck in the local optimum and it's likely we will
| never end up finding the actual optimal API.
|
| In short SQL is the furthest thing from LLVM IR. It's crazy. You
| cannot treat it as the same thing. If you do there are huge
| problems.
|
| The problem with rust-query, and with ORMs in general, is
| that the APIs for these libraries are themselves high level.
| They are HIGH level abstractions ON TOP of high level
| abstractions. You want to optimize a query now? Well, you need to
| hack the first high level abstraction in such a way that it hacks
| the second high level abstraction such that it produces optimized
| compiled procedures. That's the problem here.
|
| All this ORM stuff is just programmer OCD. We don't want to do
| meta programming where we have another language living as a
| string in our web app. We want everything fully integrated so we
| create an abstraction in an attempt to get rid of an abstraction
| that was intended to be an abstraction in itself. It's aesthetics,
| and the aesthetics actually make life harder.
| vlovich123 wrote:
| > You can't get any higher level than SQL.
|
| Can't you? Wouldn't higher level be a goal-oriented natural
| language like "what's the most popular music file in the
| database", with an AI agent that just figures out how to give
| you the answer?
| ninetyninenine wrote:
| Well. You know what I mean.
|
| AI is also unreliable. Place chatGPT over a database and it's
| going to start lying about what is in it eventually.
|
| That being said my statement likely isn't strictly true even
| when you account for ML. I'm sure there are even higher level
| languages that are not probability based and thus
| deterministic with expected results. But in terms of a
| software developers average experience, SQL is the highest
| level you can get without getting into the minefield that is
| LLMs.
| threeseed wrote:
| > SQL is the highest level you can get without getting into
| the minefield that is LLMs
|
| You can use NLPs to safely write SQL in more business
| language.
|
| And of course there are GUIs.
| kelnos wrote:
| > _SQL is the highest level you can get without getting
| into the minefield that is LLMs_
|
| That's trivially false.
|
| I'm currently playing with sea-orm[0], and I can write
| things like:
|
|       let db = Database::connect(...).await?;
|       let yesterday: chrono::DateTime<chrono::Utc> = ...;
|       let foos: Vec<foos::Model> = Foos::find()
|           .filter(foos::Column::Status.eq(FooStatus::Frobbed))
|           .filter(foos::Column::UpdatedAt.gt(yesterday))
|           .left_join(Bars)
|           .also_select(Bars)
|           .all(&db)
|           .await?;
|
| How is that not higher-level than SQL?
|
| [0] https://crates.io/crates/sea-orm
| cwbriscoe wrote:
| Why not just write SQL, it would be much more readable.
| cmrdporcupine wrote:
| > SQL is the highest level you can get without getting into
| the minefield that is LLMs.
|
| I mean, not really true. Datalog is relational, like SQL,
| but gives you a higher level rule-oriented mechanism for
| querying that can avoid a lot of the mechanics of
| explicitly joining things, for example.
| Yoric wrote:
| That's a good point.
|
| It would be very nice to have an LLVM IR-style language for
| database queries, though. As a language, SQL is... well, a
| little behind the times. ORMs explore lots of interesting design
| choices, but, as you mention, by piling up high-level decisions
| on top of something that, in many respects, is even higher
| level.
|
| I wonder what an LLVM IR-like would look like in this space.
| ninetyninenine wrote:
| I agree there should be a common api. The sheer number of
| SQL dialects contributes to a lot of issues with ORMs.
|
| But SQL is definitely not something designed only for
| computers to write, or to compile another language into
| (which is what the OP is promoting).
| Yoric wrote:
| Yeah.
|
| I wonder if something like GlueSQL or the IndexedDB layer
| of most browsers could be used as such an API.
| josephg wrote:
| I hear you and I 70% agree.
|
| I've also been using Prisma for a project recently and I've
| been finding it quite good. It layers several useful features
| on top of sql, like schema management (checked in to git), code
| gen (incl typescript types), db connection management,
| serialization / deserialization from JavaScript objects to sql
| rows and back, and it lets you follow foreign keys. If you
| want, you can also easily just make raw sql queries. That works
| fine too.
|
| Sql is a very old programming language, and it's showing its
| age. The design is wacky - does GROUP BY go before or after
| ORDER BY in the query? Every command has custom, seemingly
| random words which are actually parameters. They are placed at
| totally arbitrary, hard to remember spots in the command. And
| what, we make our program output a string that gets immediately
| parsed by the sql server? What a horrible, stupid way to do
| RPC. It also gives terrible error messages if you get it wrong.
| And it's awful to learn and debug.
|
| All of that is a huge pity, because modern databases can do so,
| so much. Doing logic inside the database can often be an order
| of magnitude faster than doing it in client code. SQL is a
| fantastic way to represent your data.
|
| One path for fixing this would be to move the sql query parser
| inside the client library. Then internally, have the client
| library send optimized RPC instructions to the database. This
| would take cpu load off the database (usually a good move -
| databases are harder to scale). And it would open the door for
| the client library to provide other better, lighter and faster
| ways to programmatically construct database queries. Ideally
| without the totally unnecessary loop of constructing then
| parsing sql.
| ninetyninenine wrote:
| >One path for fixing this would be to move the sql query
| parser inside the client library. Then internally, have the
| client library send optimized RPC instructions to the
| database. This would take cpu load off the database (usually
| a good move - databases are harder to scale). And it would
| open the door for the client library to provide other better,
| lighter and faster ways to programmatically construct
| database queries. Ideally without the totally unnecessary
| loop of constructing then parsing sql.
|
| The problem here is that these low level optimizations the
| database is doing are freaking complicated. There's a lot of
| intelligence going on in the background and it's different
| depending on the database. We can solve it with a low level
| common IR, but that IR will be extremely complex. And that IR
| will likely give different performance profiles for the same
| code on different databases.
|
| I can see this being solved for one database. But not across
| all databases with a common api. We sort of do have a common
| api, and we paid for it with the trade-off of it being an
| easier high level language on top of the part of the web that
| needs to be optimized the most.
| bpicolo wrote:
| > can solve it with a low level common IR
|
| Wouldn't that basically just be a query plan? The language
| is the easy part. Determining the optimal query plan based
| on rough statistics about the data and the DBs ability to
| go about that is the hard part.
|
| There are more general purpose query optimization libs out
| there, like Calcite
| avianlyric wrote:
| > One path for fixing this would be to move the sql query
| parser inside the client library. Then internally, have the
| client library send optimized RPC instructions to the
| database. This would take cpu load off the database (usually
| a good move - databases are harder to scale). And it would
| open the door for the client library to provide other better,
| lighter and faster ways to programmatically construct
| database queries. Ideally without the totally unnecessary
| loop of constructing then parsing sql.
|
| Erm, that's already true. Proper clients will be parsing SQL
| locally and sending representations of the query and its
| parameters separately. They may often be using a wire-format
| that's very close to textual SQL, but parsing that SQL is
| hardly the slow part of executing a query.
|
| DB CPU load doesn't come from parsing SQL, it comes from the
| query planning process, and the actual work of reading and
| filtering data. The planning process alone is extremely
| complex as the DB will be using all manner of statistical
| data collected about the type of data stored, and exactly how
| it's stored, and how its various columns do or don't correlate
| with each other, in order to estimate the best possible way
| of constructing the query plan. A process that
| factors in additional elements like the relative speeds of
| reading from disk, vs memory, and CPU cache, and how that
| varies with working set size.
|
| In addition, the real CPU load comes from the actual
| execution of that query. Streaming data off disks into
| memory, building various temporary in-memory data structures
| to accelerate joins and filters, there's a lot of bytes that
| need to be shuffled around, and modern DB codebases are now
| optimising for the number of CPU cycles needed to operate
| tight inner loops. None of this work can be moved to the
| client, not unless you have a mechanism of streaming GB/s of
| data to your client on every query.
|
| To think of SQL as just an RPC protocol completely misses the
| point of SQL, or the incredible engineering involved in
| modern query planners. There are many reasons to replace SQL
| with something better, but the idea that it's an inefficient RPC
| protocol, and that clients should have more fine-grained
| control of the planning and execution of queries, really
| isn't one of them.
| 9rx wrote:
| _> We don't want to do meta programming where we have another
| language living as a string in our web app._
|
| Strings are fine in theory, but the trouble with SQL is that,
| while too high-level in all the wrong places, it is also too
| low-level in all the wrong places. Developers want to be able
| to do things like define the shape of the data beyond tables
| and compose queries, which SQL fails hard at. In order to
| accomplish what developers need, you either end up extending
| SQL into a new language, or you can use the programming
| constructs you already have, treating SQL as a compiler target,
| where necessary. The latter is considerably easier to
| implement.
| jimbokun wrote:
| I think the sweet spot is a query builder that just ensures you
| emit syntactically valid SQL to send to the database. So there
| are no surprises about what the database is executing, but it
| protects against typos and security issues from building SQL
| through string concatenation.
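|
| sea-query (mentioned elsewhere in this thread) sits roughly
| in that spot; a sketch along the lines of its documented
| usage, with hypothetical table/column names:
|
|       use sea_query::{Expr, Iden, PostgresQueryBuilder, Query};
|
|       #[derive(Iden)]
|       enum Users { Table, Id, Name }
|
|       let sql = Query::select()
|           .column(Users::Name)
|           .from(Users::Table)
|           .and_where(Expr::col(Users::Id).eq(42))
|           .to_string(PostgresQueryBuilder);
|       // SELECT "name" FROM "users" WHERE "id" = 42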
| bob1029 wrote:
| Every time I've found myself frustrated with a SQL query, there
| was some horrible misstep in how the underlying schema was
| designed.
|
| Having the wisdom to craft a high quality schema can be the
| biggest unlock in the whole enterprise. If the tables, columns
| & relations are ~1:1 with the actual business (i.e., you went
| and talked to real, live stakeholders at design time), then it
| should be deeply intuitive for a non-technical domain expert to
| interact with.
|
| Often, you don't get a chance to redo the whole thing, but that
| doesn't mean you can't still design an ideal schema and then
| shim it with views, CTEs, replicas, ETL, etc.
|
| SQL is a _domain specific_ language. It would be like
| complaining about how horrible it is to write Lua scripts. The
| criticism is baseless without understanding the underlying
| schema/bindings.
| habitue wrote:
| I see your point for one part, which is that
| `select([id]).from(Users)` is maybe in the best case as
| readable as `select id from users`, and in many common cases
| it'll be less readable than sql. Plus, it's not quite sql,
| you'll have to look up how to do things a lot of times.
|
| But we aren't just talking about SQL the high level language
| that you might execute when you are poking the data live. In a
| web app dynamically constructing sql, we have multiple issues:
|
| 1. Deserialization: we need to know what the query returns and
| plug it into data structures we want to use in our program
|
| 2. SQL is dynamically typed, and if you construct queries
| dynamically you're even more dynamically up a creek. Maybe even
| "sql injection vulnerability" up a creek.
|
| 1 and 2 are solved by strongly typed query builders like this:
| the type system has a proof that the ways you will construct
| queries in your program actually result in valid sql queries
| and that they return the right data shape.
|
| It's actually solving a real problem
| zamalek wrote:
| I call it "SQL fear", and I agree: people need to get over it.
| We used EF Core at my previous job, and I don't know how much
| time we spent figuring out how poorly performing queries were
| happening. The abstraction leaks like a sieve and isn't worth it.
|
| Now, if we could replace SQL with a purpose-built language,
| that I would be interested in. I have to concede that SQL
| doesn't have great ergonomics (e.g. if FROM came first then
| auto-complete would be better, trailing commas would eliminate
| much formatting bike-shedding, and why are INSERT and UPDATE so
| different?).
| threeseed wrote:
| > It's basically a declarative statement that's almost english-
| like and it's specifically DESIGNED to be more readable and
| easier to use by humans
|
| It has categorically failed at this task. That is why we have
| an entire ecosystem of tools to allow people to query and
| interact with databases without using SQL.
|
| Developers need to understand that SQL is an abstraction for
| developers, analysts, data engineers etc., not end users.
|
| And only basic SQL statements are English like.
| Animats wrote:
| I tend to agree.
|
| The main trouble with SQL is that the query parameters are
| mixed in with the query, which results in string escape
| problems. That dates from when people were expected to query a
| database by hand from SQL, which is sometimes useful.
|
| For machine generated use, SQL queries should consist of a well
| defined fill-in-the-blanks system, with a text SQL query and a
| machine data structure. Something like this:
|
| Query in SQL text as a constant string:
|
|       "SELECT foo FROM BAR WHERE name=NAME1"
|
| Parameters: key/value pairs in a safe native format for the
| language in use:
|
|       {"NAME1": "Smith"}
|
| That gets you out of the curse of character escaping and SQL
| injection, the only real problem. It avoids trying to rewrite
| SQL in yet another ORM.
|
| Microsoft has something which works this way, but it's
| unnecessarily clunky.[1]
|
| [1] https://learn.microsoft.com/en-us/sql/relational-databases/n...
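|
| For what it's worth, most client libraries already expose
| roughly this shape; a sketch with rusqlite (table and column
| names hypothetical):
|
|       use rusqlite::{named_params, Connection};
|
|       let conn = Connection::open("app.db")?;
|       // the SQL stays a constant; the values travel separately
|       let foo: String = conn.query_row(
|           "SELECT foo FROM bar WHERE name = :name1",
|           named_params! { ":name1": "Smith" },
|           |row| row.get(0),
|       )?;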
| Animats wrote:
| This suggests a simple API using a Rust macro:
|
|       fill_in_sql!(query_string, query_params)
|
| Where query_params is a structure containing the parameters:
|
|       struct SomeQuery<'a> {
|           name: &'a str,
|           id: u32,
|       }
|
| The macro would parse the query string, find the variables to
| be filled in, match them to fields in the "params" structure,
| and generate the code to safely escape and fill in the
| params. Any matching failures are found at compile time.
|
| This is similar to what the usual "derive" macros do -
| examine a data structure at compile time and generate
| appropriate code.
| kelnos wrote:
| I've used systems like this in Scala. It's actually pretty
| magical to write something like:
|
|       val foos = sql"SELECT * from foos WHERE status = 'frobbed'"
|           .as[List[Foo]]
|
| ... and have the macro system parse the query at compile
| time and verify that the query will indeed return a
| `List[Foo]`.
|
| I think this is a nice middle ground, but constructing
| dynamic queries often can't be checked at compile-time.
|
| (I'm probably getting the syntax somewhat wrong; I haven't
| done Scala in quite a few years at this point.)
| kelnos wrote:
| I don't really think SQL injection bugs are the main driver.
| Certainly everyone should be using a query builder that
| protects them from this (and yes, I know, there are still
| people who screw this up), but I think the issues with SQL go
| far beyond that.
|
| Personally I would much rather deal with a database table as
| a collection of objects with various fields in them. In a
| language with a reasonable collections library, I want to be
| operating on that data using functional combinators like
| filter, map, group_by, sort/order_by, etc. That feels much
| more natural to me than writing out stringly-typed SQL by
| hand. This is sorta an ORM, but not quite: it's somewhere in
| between writing raw queries, and using a full-fledged
| (usually inefficient) ORM.
|
| The downside, of course, is that the performance of my
| queries is at the mercy of whatever the query generator
| underneath it is doing. But the nice thing is that if I need
| to optimize a query, I can always bypass the generator and
| write it by hand. But I wouldn't do that until I've measured
| things; no reason to prematurely optimize.
| avianlyric wrote:
| > For machine generated use, SQL queries should consist of a
| well defined fill-in-the-blanks system, with a text SQL query
| and a machine data structure. Something like this:
|
| Erm, that's exactly how modern DB and DB clients work. The
| client passes the SQL query with placeholders for parameters,
| and then sends the parameters separately. It's also a core
| part of how prepared SQL queries work.
|
| Every SQL client in any worthwhile language implements this
| pattern of passing a SQL query and its parameters as separate
| data structures, and they have done for decades. SQL
| injection only happens when people don't use their SQL client
| properly, and attempt to compose SQL queries using naive
| string templating. Effectively doing an end run around all
| the protections DB and clients provide by default.
| Seb-C wrote:
| > The ideal API would be one that is both high level but allows
| fine grained control... but we don't have one yet.
|
| What kind of fine grained control are you talking about here? I
| never felt like I was missing anything when doing even complex
| SQL (unless I am forced to use a crappy MySQL, whose problems
| are not inherent to SQL itself).
|
| Quite the contrary actually: despite having this amazing high
| level of abstraction, most developers are ignorant or don't care
| to properly index the data, add integrity constraints, manage
| the locks or think about transactions.
|
| In 99% of the non-SQL code that I see, developers don't even go
| further than nested loops containing IOs, so I don't think that
| letting them decide which tree/trie is appropriate for a use
| case (for example) would help in any way.
| swaits wrote:
| > almost english-like
|
| Real world SQL from the analytics, DE, BIE side of things is
| _extremely far_ from readable. It's a form of actual hell.
| tyleo wrote:
| It looks like you just got this blog set up. Congratulations!
| Aeolun wrote:
| It seems to me that migrations with individual row-level
| manipulation would be _tragically_ slow to execute. Like, I have
| a table with a billion rows, and any normal update statement
| takes up to an hour. I don't want to think what an update per
| row would take.
| Hytak wrote:
| Hi, migrations are 1 select statement + `n` insert statements
| for `n` rows right now.
|
| This might be improved to insert in batches in the future
| without changing the API.
| Aeolun wrote:
| I imagine a select on 1B rows will be equally problematic :)
|
| But you are right, it's not something you necessarily have to
| worry about now. It just means it doesn't work for our use
| case.
|
| Just figured I should point it out.
| levkk wrote:
| My main concern with application-defined schemas is that this
| schema is validated by the wrong system. The database is the
| authority on what the schema is; all other layers in your
| application make assumptions based on effectively hearsay.
|
| The closest we have come so far to bridging this gap in a strictly typed
| language like Rust is SQLx, which creates a struct based on the
| database types returned by a query. This is validated at compile
| time against a database, which is good, but of course there is no
| guarantee that the production database will have the same types.
| Easiest mistake to make is to design a query against your local
| Postgres v15 and hit a runtime error in production running
| Postgres v12, e.g. a function like gen_random_uuid() doesn't
| exist. Another is to assume a migration in production was
| actually executed.
|
| In duck-typed languages like Ruby, the application objects are
| directly created from the database at runtime. They are as
| accurate as possible, since the schema is directly read at
| application startup. Then of course you see developers do
| something like:
|
|       if respond_to?(:column_x)
|         # do something with column_x
|       end
|
| To summarize, I think application-defined schemas provide a false
| sense of security and add another layer of work for the engineer.
| IshKebab wrote:
| This doesn't seem fundamentally different from any schema/API
| mismatch issue. For example using the wrong header for a C
| library, or the wrong Protobuf schema.
|
| I guess it would be good if it verified it at runtime somehow,
| though. E.g. when you first connect to the database it checks
| that PostgreSQL is at the minimum required version, and the tables
| match what was used at compile time.
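|
| A sketch of such a startup check, e.g. with sqlx against
| Postgres (the expected version and table name here are
| assumptions that would be baked in at build time):
|
|       // require a minimum server version
|       let v: String = sqlx::query_scalar(
|           "SELECT current_setting('server_version_num')")
|           .fetch_one(&pool)
|           .await?;
|       assert!(v.parse::<i32>().unwrap() >= 150000, "need Postgres 15+");
|
|       // fetch the live schema to diff against the compiled-in one
|       let cols: Vec<(String, String)> = sqlx::query_as(
|           "SELECT column_name::text, data_type::text
|            FROM information_schema.columns
|            WHERE table_name = $1")
|           .bind("users")
|           .fetch_all(&pool)
|           .await?;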
| dietr1ch wrote:
| It could be verified at runtime, but I haven't seen anyone
| trying to version/hash schemas and include that in the
| request.
|
| The workaround in practice seems to be to keep the DB behind
| a server that always(tm) uses a compatible schema and exposes
| an API that's either properly versioned or at least safe for
| slightly older clients. To be fair it's hard to get rid of
| the middleman and serve straight from the DB, it's always
| deemed too scary for many reasons, so it's not that bad.
| Someone wrote:
| It shouldn't be hard for a database to keep a hash around for
| each database, update it whenever a DDL
| (https://en.wikipedia.org/wiki/Data_definition_language)
| command is run, and, optionally, verify that a query is run
| against the exact database structure.
|
| Could be as simple as secure hashing the old hash with the
| text of the DDL command appended.
|
| That would mean two databases can be identical, structure-
| wise, but have different hashes (for example if tables are
| created in a different order), but would that matter in
| practice?
|
| Alternatively, they can keep a hash for every table, index,
| constraint, etc. and XOR them to get the database hash.
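|
| A minimal sketch of that chaining idea (using the sha2 crate;
| nothing database-specific here):
|
|       use sha2::{Digest, Sha256};
|
|       /// Fold each executed DDL statement into a running schema hash.
|       fn next_schema_hash(prev: &[u8; 32], ddl: &str) -> [u8; 32] {
|           let mut h = Sha256::new();
|           h.update(prev);
|           h.update(ddl.as_bytes());
|           h.finalize().into()
|       }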
| jeltz wrote:
| Sounds very hard to me. How do you handle online schema
| changes with this? Schema changes are trivial to do if you
| can be down for a long time.
| Hytak wrote:
| rust-query manages migrations and reads the schema from the
| database to check that it matches what was defined in the
| application. If at any point the database schema doesn't match
| the expected schema, then rust-query will panic with an error
| message explaining the difference (currently this error is not
| very pretty).
|
| Furthermore, at the start of every transaction, rust-query will
| check that the `schema_version` (sqlite pragma) did not change.
| (source: I am the author)
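|
| (For reference, that pragma is readable from any SQLite
| client, e.g. with rusqlite:)
|
|       // schema_version is bumped by SQLite on every DDL change
|       let v: i64 =
|           conn.query_row("PRAGMA schema_version", [], |row| row.get(0))?;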
| mjr00 wrote:
| > rust-query manages migrations and reads the schema from the
| database to check that it matches what was defined in the
| application. If at any point the database schema doesn't
| match the expected schema, then rust-query will panic with an
| error message explaining the difference (currently this error
| is not very pretty).
|
| IMO - this sounds like "tell me you've never operated a real
| production system before without telling me you've never
| operated a real production system before."
|
| Shit happens in real life. Even if you have a great
| deployment pipeline, at some point, you'll need to add a
| missing index in production _fast_ because a wave of users
| came in and revealed a shit query. Or your on-call DBA will
| need to modify a table over the weekend from i32 -> i64
| because you ran out of primary key values, and you can't
| spend the time updating all your code. (In Rust this is
| dicier, of course, but with something like Python it shouldn't
| cause issues in general.) Or you'll just need to run some
| operation out of band -- that is, not relying on a migration
| -- because it's what makes sense. A great example is using
| something like pt-osc[0] to create a temporary table copy and
| add temporary triggers to an existing table in order to do a
| zero-downtime copy.
|
| Or maybe you just need to drop and recreate an index because
| it got corrupted. Shit happens!
|
| Anyway, I really wouldn't recommend a design that relies on
| your database _always_ agreeing with your codebase _100% of
| the time_. What you should strive for is your codebase being
| _compatible_ with the database 100% of the time -- that means
| new columns get added with a default value (or NULL) so
| inserts work, you don't drop or rename columns or tables
| without a strict deprecation process (i.e. a rename is really
| add in db -> add writes to code -> backfill values in db ->
| remove from code -> remove from db), etc...
|
| But fundamentally panicking because a table has an extra
| column is crazy. How else would you add a column to a running
| production system?
|
| [0] https://docs.percona.com/percona-toolkit/pt-online-schema-ch...
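|
| (A sketch of that rename dance in SQL, with hypothetical
| names -- each step is deployed and verified before the next:)
|
|       -- step 1: add the new column (old code ignores it)
|       ALTER TABLE users ADD COLUMN full_name text;
|       -- step 2: deploy code that writes both columns
|       -- step 3: backfill
|       UPDATE users SET full_name = name WHERE full_name IS NULL;
|       -- step 4: deploy code that reads/writes only full_name
|       -- step 5: drop the old column
|       ALTER TABLE users DROP COLUMN name;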
| threeseed wrote:
| > Even if you have a great deployment pipeline, at some
| point, you'll need to add a missing index in production
| fast because a wave of users came in and revealed a shit
| query.
|
| This sounds more like a CI/CD and process issue.
|
| There is no reason why adding a new index in code and
| deploying it into Production should be more complex or
| error prone than modifying it on the database itself.
| mjr00 wrote:
| Direct execution of `CREATE INDEX...` on a database table
| is always going to be faster than going through a normal
| deployment pipeline. Even if we assume your pipeline is
| _really_ fast, which is probably not the case at most
| orgs, you are still comparing a single SQL statement
| execution, to a single SQL statement execution + git push
| + code reviews + merge + running through Jenkins/Circle/whatever.
| How long does that overhead take? How
| much money have you lost because your website won't load
| when your post is on the frontpage of HN? Seconds and
| minutes count. I don't want my code crashing because an
| unexpected index exists in this scenario.
| threeseed wrote:
| You should be able to deploy end to end to Production in
| less than a minute.
|
| Companies should be focused on solving that problem first
| before doing insanely short-sighted workarounds like
| skipping pushing to Git and code reviews.
| mjr00 wrote:
| > You should be able to deploy end to end to Production
| in less than a minute.
|
| When I was at AWS (RDS) our end-to-end production
| deployment process was 7 days. We were also pulling
| $25million/day or so in profit. I'm sure that number is
| much higher now.
|
| There's a large difference between what the theoretical
| "right" thing is from a textbook perspective, and what
| successful engineering teams do in reality.
|
| edit: besides, it doesn't even make sense in this
| context. I have 100 servers talking to the database. I
| need to create an index, ok, add it to the code. Deploy
| to server 1. Server 1 adds the index as part of the
| migration process, and let's say it's instant-ish (not
| realistic but whatever). Do the other 99 servers now
| panic because there's an unexpected index on the table?
| kelnos wrote:
| That's a lovely ideal, but in the real world there are
| relatively few companies that meet that metric.
| threeseed wrote:
| I've worked at FAANG and enterprise companies and we
| managed to do it.
|
| There are no technical reasons why it can't be done. Only
| process and will.
| kelnos wrote:
| Yes, and that's exactly the point. The reality doesn't
| usually match the ideals, and many orgs do not have good
| process, and do not have the political will to get good
| process implemented. Part of being a professional is
| recognizing where reality falls short of the ideals (an
| all-too-common occurrence), and doing the best you can to
| successfully get your work done in that environment.
|
| And of course I don't know which FAANGs you worked at,
| but I know folks at FAANGs who have complained to me
| about CI and deployment times. Hell, these are huge
| companies; while they try to harmonize tooling,
| deployment times (especially when test suites of varying
| quality are involved) can vary a lot across a company. I
| wouldn't be surprised if there were people at the
| companies you worked at that were upset with deployment
| times, even if the teams you worked on were in good
| shape.
|
| Honestly, when someone suggests something like you've
| suggested (that everyone should be able to get their
| deployment times to under a minute), I really do wonder
| if they're intentionally arguing in bad faith or are
| trolling. I know for a fact that things are not that
| rosy, and are rarely that rosy, even at the companies you
| claim to have worked at, and it's hard to believe that
| anyone could genuinely think that this is a broadly-
| attainable target. That doesn't mean that _no one_ can do
| it, but that does mean that designing tooling that
| assumes everyone can do it is... well, just kinda naive
| and not very useful.
| threeseed wrote:
| You have two choices: (1) try and solve your deployment
| issues or (2) make unmanaged, untested, unreviewed
| changes directly in Production.
|
| Now you may say I'm just trolling but option (1) seems
| better to me for the long-term health of the
| project/company. And I don't believe it's correct to say
| it is an unrealistic goal.
| pclmulqdq wrote:
| There are lots of reasons to do slow rollouts. You should
| be rolling out in stages anyway.
| swiftcoder wrote:
| > I've worked at FAANG and enterprise companies and we
| managed to do it.
|
| You have a very different experience to the rest of us, in
| that case.
|
| The big AWS services all had deployments measured in days
| - or even weeks (depending on how many regions they are
| deployed across). Facebook's monorepo took upwards of an
| hour just to get a PR through the merge queue. Both were
| notorious for "hand-jamming" critical fixes directly to
| production.
| jeltz wrote:
| I don't think I have ever seen a non-toy project where
| that was the case.
| tempodox wrote:
| You do code review in less than a minute?
| threeseed wrote:
| You do code reviews/automated testing in lower
| environments _before_ you make changes directly to
| Production.
|
| And in this case if it's an emergency hot fix then it's
| still better to do this through a managed, tracked,
| tested pipeline.
| kelnos wrote:
| It's a bummer that you've been downvoted, because it really
| does seem like people here have not operated databases at
| scale.
|
| I will never claim that we were great at managing databases
| at Twilio, but often a schema change would take hours,
| days, or even a week or two to complete. We're talking about
| tables with hundreds of millions of rows, or more.
|
| We'd start the change on a DB replica. When it would
| finish, we would have to wait for the replica to catch up
| with the primary. Then we would bring up new replicas,
| replicating from the replica with the new schema. Finally
| that replica would get promoted to primary, with all the
| old replicas (and the old primary, of course) removed from
| service, and the new replicas brought in.
|
| Only then could we deploy code that was aware of and used
| the updated schema. The previous code of course had to
| ignore unknown columns, and if we ever wanted to drop a
| column, we had to first deploy code that would stop using
| that column. Any column type changes would need to be
| backwards-compatible. If that wasn't possible, we'd have to
| add a new column and backfill it. Adding indexes would
| usually be fine without preparatory code changes, but if we
| wanted to drop an index we'd first have to make sure there
| were no queries still depending on it.
|
| Even for a "small" schema change that "only" took minutes
| or a few tens of seconds to complete, we'd still have to
| use this process. What, do you think we'd shut part or all
| of a real-time communications platform down while we do a
| schema change? Of course not.
|
| The idea that the application could or should be in control
| of this process, or could always be in sync with the
| database when it came to its understanding of the schema,
| is impossibly unrealistic.
| mjr00 wrote:
| Yep, sounds like we have similar experiences! I first had
| to start thinking about this stuff at Hootsuite, back in
| the exciting 1million+ DAU days a decade ago. Before
| then, to me databases were just a thing that got deployed
| along with the application, and deploys only happened on
| a Friday night so who cares about downtime? By the time
| anyone tries logging into the app on Monday morning, the
| code and database will all be up to date. Going to a
| place where deploys were happening constantly _and_
| nonzero downtime was unacceptable was eye-opening.
|
| > The idea that the application could or should be in
| control of this process, or could always be in sync with
| the database when it came to its understanding of the
| schema, is impossibly unrealistic.
|
| These days my attitude is to treat databases as a
| _completely separate service_ from the application code,
| which they effectively are. They're on a different set
| of servers, and the interface they provide is the
| columns/tables/views/etc, accessed through SQL. So yeah,
| no breaking changes, and the only thing application code
| should care about is if the queries it tries to execute
| return the expected sets of data, not if the schema
| itself matches. And certainly not about things like
| views, triggers or indexes.
|
| This does end up being more overhead than migrations
| alongside the application code, which I know a lot of
| developers prefer because they're easier to use, but the
| approach just doesn't work after a certain scale.
|
| (to be clear, I still use Liquibase etc to manage
| migrations, the process for applying those changes is
| just completely separate from deploying application
| code.)
| kelnos wrote:
| > _These days my attitude is to treat databases as a
| completely separate service from the application code,
| which they effectively are. They're on a different set
| of servers, and the interface they provide is the
| columns/tables/views/etc, accessed through SQL._
|
| I've never thought of it this way, but I think this is
| really smart. If I have a service that exposes a REST
| API, I can, say, add a new field to a JSON object that's
| returned from an API endpoint without telling clients
| about it. Those clients can update later in order to take
| advantage of the information returned in the new field.
|
| Same thing with a database: I can add a new column, and
| clients can learn about the new column later in the
| future, no problem. The database schema is just a part of
| the database's API, and it can be evolved in a backwards-
| compatible manner just like any other API.
|
| > _to be clear, I still use Liquibase etc to manage
| migrations, the process for applying those changes is
| just completely separate from deploying application
| code._
|
| Right, the schema needs to be managed and there needs to
| be a source of truth for it, with tooling to do
| migrations, but coupling that so closely with the
| application so the schema and application always must be
| in sync (like some others seem to think is the One True
| Way) is a mistake, and would be a complete non-starter
| for my past professional needs.
| jessekv wrote:
| IMO its a similar situation to the discussion here:
|
| https://capnproto.org/faq.html#how-do-i-make-a-field-require...
| dathinab wrote:
| I had to realize that at least in start up world most
| (non db focused) devs thing they might not be experts in
| SQL but have a very solid understanding
|
| ... and then don't know about a lot of very fundamental
| important parts and are blissfully unaware about that,
| too.
|
| And to be clear I'm not saying they don't remember the
| exact details of something.
|
| What I mean they don't even know that there are things
| they have to look up, nor any experience or willingness
| to understand what they did wrong by consulting the
| official documentation instead of just randomly googling
| and trying out "solutions" until one seem to happen to
| work.
|
| The most common example would be having so little
| understanding about transaction that they believe
| transactions are just magically fixing all race
| conditions, and then then being very surprised that they
| don't. Or believing that transactions in SQL are
| fundamentally broken after realizing that somehow their
| databases got corrupted.
|
| And again I don't mean junior deves, but people with 10+
| years of backend or "fullstack" experience, i.e. people
| which at least should know that when to consult
| documentation/lookup protections transactions provide
| etc.
|
| I have seen more then one time a (final state of) the
| situation where people started with believing SQL
| transaction magically fix everything, then get
| "corrupted" data then blame SQL for being broken and move
| to NoSql.
|
| The joke here is all the concurrency problem are very
| fundamental and independent of SQL vs. NoSQL.
|
| And SQL often gives you more powerful/easy to use (at
| small scale) tools to enforce synchronization, but at a
| cost. While NoSQL often gives you harder to use
| primitives where you have to do much more outside of the
| database to guarantee correctness, but then at least you
| will more likely blame you code instead of the db for
| things not working.
|
| The most ironic thing here is I'm not a db expert, I just
| know where my knowledge stops and where I can lookup the
| missing parts and can't even give you much tips about
| huge dbs in production luckily surprisingly many
| companies have comparatively "small" db needs.
|
| And honest where I see race condition related issues in
| SQL quite often I'm rarely not seeing them in NoSQL code.
| Where this issues in SQL make me sad as they are often
| very avoidable in NoSQL I often feel like giving up in
| resignation.
|
| Though, that experience is with "smallish" databases, not
| Twilio scale. But a surprisingly large number of companies
| have surprisingly "smallish" databases. Like, no joke, I
| have seen companies being very vocal about their "huge
| database" and then you realize it's like 5 GiB ;=)
|
| Honestly, I (metaphorically speaking) don't even want to
| know how DB experts feel about this. I'm not a DB expert
| and just have a solid enough foundation to know where my
| knowledge stops and when I have to look things up (which
| is all the time, because I'm not writing that much SQL).
| jessekv wrote:
| IMO "parse, don't validate" can apply to data coming out of
| the database too.
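| A tiny sketch of the idea (hypothetical types): parse each
| row into a domain type right at the boundary, so invalid
| states can't travel further into the program:
|
|       // Raw value as it comes from the driver:
|       struct RawUser { email: String }
|
|       // Domain type that can only hold a plausible email:
|       struct Email(String);
|
|       impl Email {
|           fn parse(raw: String) -> Result<Email, String> {
|               if raw.contains('@') { Ok(Email(raw)) } else { Err(raw) }
|           }
|       }
|
|       fn from_row(row: RawUser) -> Result<Email, String> {
|           // Fails here, at the edge, not deep in business logic.
|           Email::parse(row.email)
|       }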
| clutchski wrote:
| This is the correct answer.
| kelnos wrote:
| In addition to the deployment-time issues and other stuff I
| and others have commented downthread, I thought of another
| problem with this.
|
| I can't see how this would even work for trivial, quick,
| online schema changes. Let's say I have 10 servers running the
| same service that talks to the database (that is, the service
| fronting the database is scaled out horizontally). How would
| I do a migration? Obviously I can't deploy new code to all 10
| servers simultaneously that will do the schema migration;
| only one server can run the migration. So one server runs the
| migration, and... what, the other 9 servers immediately panic
| because their idea of the schema is out of date?
|
| Or I deploy code to all 10 servers but somehow designate that
| only one of them will actually do the schema migration. Well,
| now the other 9 servers are expecting the new schema, and
| will panic before that 1 server can finish doing the
| migration.
|
| It seems to me that rust-query is only suitable for
| applications where you have to schedule downtime in order to
| do schema changes. That's just unacceptable for any business
| I've worked at.
| dayjah wrote:
| This isn't unique to rust-query; this problem also exists
| with ActiveRecord, for example. At Twitch we just had to
| _really_ think about our migrations and write code to
| handle differences.
|
| Basically no free lunch!
| rendaw wrote:
| I think first and foremost, if you're going to use a tool
| like this, you need to do everything through the tool.
|
| That said, for zero downtime migrations there are a number
| of techniques, but it typically boils down to splitting the
| migration into two steps where each step is rolled out to
| each server before starting the next:
| https://teamplify.com/blog/zero-downtime-DB-migrations/
| https://johnnymetz.com/posts/multistep-database-changes/
| etc
|
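| As a hedged sketch of the two-step pattern those posts
| describe (made-up names, SQL carried as Rust constants),
| each step is rolled out everywhere before the next begins:
|
|       // Step 1 (expand): additive and nullable, so servers still
|       // running the old code are unaffected.
|       const STEP_1: &str =
|           "ALTER TABLE users ADD COLUMN email TEXT NULL";
|
|       // ...deploy app code that writes and backfills email...
|
|       // Step 2 (contract): run only once every server writes it.
|       const STEP_2: &str =
|           "ALTER TABLE users ALTER COLUMN email SET NOT NULL";
|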
| I'm not sure if there's anything that automates this, but
| it'd probably need to involve the infrastructure layer
| (like terraform) too.
|
| Edit: There's one other approach I've heard of for zero
| downtime deployments:
|
| Start running the new version in new instances/services
| parallel to the old version, but pause it before doing any
| database stuff. Drain client connections to the old version
| and queue them. Once drained, stop the old version, perform
| database migrations, and start the new version, then start
| consuming the queue.
|
| This is (I think) more general, but you could get client
| timeouts or need to kill long requests to the old version,
| and it requires coordination between infrastructure (load
| balancer?) and software versions.
| zozbot234 wrote:
| You can always use SQL views to expose a version of your
| schema to apps that's different from what's in the
| underlying DB tables.
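| A small sketch of that idea (hypothetical names, SQL
| carried as a Rust constant): old apps keep querying a view
| that preserves the shape they expect while the table
| underneath changes:
|
|       // The table was renamed/reshaped; the view keeps the old
|       // contract alive for apps that haven't migrated yet.
|       const COMPAT_VIEW: &str =
|           "CREATE VIEW users AS
|            SELECT id, full_name AS name FROM people";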
| Merad wrote:
| Unless you have some tricks up your sleeve that I'm not
| thinking of, an immediate consequence of this is that zero
| downtime deployments and blue/green deployments become
| impossible. Those both rely on your app being able to run in
| a state where the schema is not an exact match for what the
| app expects - but it's compatible so the app can still
| function.
| ComputerGuru wrote:
| Semantic versioning?
| Merad wrote:
| If I understand the GP correctly, there's no notion of
| semver involved. Any difference in the schema results in
| a runtime error.
| ComputerGuru wrote:
| Yes, that's what I gathered from the code. But I was
| proposing it as an "easy" solution that doesn't involve
| throwing away OP's main idea.
| Filligree wrote:
| And that's okay. Most applications don't need zero-downtime
| deployments, and there are already plenty of APIs that
| support that use case. I'd rather have more like this one.
| Filligree wrote:
| I'm so glad you made this. I've been searching for a decent
| Rust database library for half a year already, and this ticks
| all the boxes.
|
| I haven't tried it yet, so I might have to eat my words
| later, but - great job! It's going to save tons of effort.
| theptip wrote:
| In all the systems I've built (mostly Django) you need to
| tolerate vN and vN+1 simultaneously; you are not going to
| turn off your app to upgrade the DB.
|
| You'll have some Pods on the old application version while
| you do your gradual upgrade.
|
| How do you envision rolling upgrades working here?
| sobellian wrote:
| Surely it is easier to just check that all migrations have run
| before you start serving requests? Column existence is
| insufficient to verify that the database conforms to what the
| application expects (existence of indices, foreign key
| relationships with the right delete/update rules, etc).
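| A minimal sketch of such a startup gate (made-up names,
| assuming a conventional schema_migrations table and an
| sqlx-style API):
|
|       use sqlx::PgPool;
|
|       /// Refuse to serve traffic until the database reports at
|       /// least the newest migration this binary was built with.
|       async fn assert_migrated(pool: &PgPool, expected: i64)
|           -> sqlx::Result<()> {
|           let (latest,): (i64,) = sqlx::query_as(
|               "SELECT COALESCE(MAX(version), 0) FROM schema_migrations")
|               .fetch_one(pool)
|               .await?;
|           assert!(latest >= expected, "database is behind: {latest}");
|           Ok(())
|       }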
| Kinrany wrote:
| The application is necessarily the authority on its
| expectations of the database.
| mjr00 wrote:
| You can see my sibling comment, but in the real world of
| operating databases at any sort of scale, you need to have
| databases in transitory states where the application can
| continue to function even though the underlying database has
| changed.
|
| The quintessential example is adding a column. If you want to
| deploy with zero downtime, you have to square with the
| reality that a database schema change and deployment of
| application code is not an atomic operation. One must happen
| before the other. Particularly when you deal with fleets of
| servers with blue/green deploys where server 1 gets deployed
| at t=0minutes but server N doesn't get deployed until
| t=60minutes. Your application code will straight up fail if
| it tries to insert into a column that doesn't exist, so it's
| necessary to change the database first. This normally means
| adding a column that's either nullable or has a default
| value, to allow the application to function as normal,
| without knowing the column exists.
|
| So in a way, yes, the application is still the authority, but
| it's an authority on the _interface_ it expects from the
| database. It can define which columns _should_ exist, but not
| which columns _should not_ exist.
| Hytak wrote:
| You (and many other commenters) are right that rust-query
| currently requires downtime to do migrations. For many
| applications this is fine, but it would still be nice to
| support zero-downtime migrations.
|
| Your argument that the application should be the authority
| on the _interface_ it expects from the database makes a lot
| of sense. I will consider changing the schema check to be
| more flexible as part of support for zero-downtime
| migrations.
| throwawaymaths wrote:
| You might have more than one application hitting the same
| database.
| kelnos wrote:
| Absolutely not. Certainly it's necessary that the database
| schema be _compatible_ with whatever the application believes
| the schema to be, but the application need not be in control
| of it, and at the orgs I've worked at, there's no way we
| could build zero-downtime systems if the application had to
| be in charge of schema.
|
| Consider even a very simple case: let's say I have a database
| with two (identical) application servers talking to it (that
| is, I've horizontally scaled my application due to load and
| availability requirements). If I need to do a schema change,
| and the application needs to be in charge of schema, how
| would that even work? If I deploy the change & migration to
| one of the two servers, once the migration is complete, the
| second server will freak out because the schema doesn't match
| its "authoritative" view anymore. If I deploy the change to
| both servers at the same time, and somehow designate one of
| them to actually run the migration, then the other one will
| immediately panic on startup because the first server hasn't
| completed the migration yet.
|
| Not to mention this setup breaks red/black deployments: even
| in the case where I only have one application server in front
| of the database, how do I bring up a new deployment, allow
| both servers to run while the new one is passing health
| checks, and then bring down the original server? They would
| both have different "authoritative" views of what the schema
| should be.
|
| This also completely breaks the ability to roll back, at
| least without also rolling the schema back too. That's risky;
| I don't want to have my service rollback depend on schema
| rollback also working properly.
|
| This kind of "application is authoritative about schema" only
| works when you can schedule downtime to do schema changes.
| That would be a non-starter for any company I've worked at.
| Kinrany wrote:
| The application can ensure that its assumptions hold
| without asserting the exact state of the database.
|
| Almost all migrations are idempotent and backwards-
| compatible, and almost all of the rest of them can be made
| to be by splitting into multiple changes and waiting for
| old versions of the application to be shut down.
| seanhunter wrote:
| How does that work in the case where you have one database
| and n applications using it? Are they all somehow the
| authority, or do you have a way to pick one?
| ninetyninenine wrote:
| Agreed. Maybe having a schema check in the build step of
| the application would solve this. If the schema doesn't
| match, then it doesn't compile. Most ORMs of course do the
| opposite: they generate a migration for the database from
| the code.
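| sqlx's checked macros already do something in this
| direction (a sketch; it needs DATABASE_URL pointing at a
| dev database at build time, and assumes users.name is NOT
| NULL):
|
|       use sqlx::PgPool;
|
|       async fn load_name(pool: &PgPool) -> sqlx::Result<String> {
|           // The query is checked against the dev database when the
|           // crate compiles; a missing column fails the build.
|           let rec = sqlx::query!("SELECT name FROM users WHERE id = $1", 1i64)
|               .fetch_one(pool)
|               .await?;
|           Ok(rec.name)
|       }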
| ris wrote:
| And you end up with no canonical declaration of the schema in
| your application code, leaving developers to mentally apply
| potentially tens or hundreds of migrations to build up an idea of
| what the tables are expected to look like.
| eddd-ddde wrote:
| No matter how you define your schemas, you still have a
| series of migrations as data evolves. This is not an issue of
| schema definition.
| ojkelly wrote:
| Would it make more sense to consider the response from the DB,
| like a response from any other system or user input, and take
| the parse don't validate approach?
|
| After all, the DB is another system, and its state can be
| different from what you expected.
|
| At compile time we have a best guess. Unless there was a way to
| tell the DB what version of the schema we think it has, it
| could always be wrong.
| runeks wrote:
| > This is validated at compile time against a database, which
| is good, but of course there is no guarantee that the
| production database will have the same types. Easiest mistake
| to make is to design a query against your local Postgres v15
| and hit a runtime error in production running Postgres v12,
| e.g. a function like gen_random_uuid() doesn't exist. Another
| is to assume a migration in production was actually executed.
|
| One could have the backend fetch DB schema/version info at
| startup, compare it to its own view of what the schema should
| look like, and fail if the two disagree. That way, a new
| deployment would fail before being activated, instead of being
| deployed successfully and queries failing down the line.
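| A rough sketch of that check (made-up names, assuming sqlx
| with Postgres): diff the live column list against what the
| binary expects and fail fast on mismatch:
|
|       use sqlx::PgPool;
|
|       async fn check_schema(pool: &PgPool) -> sqlx::Result<()> {
|           let expected = ["id", "name"];
|           let live: Vec<(String,)> = sqlx::query_as(
|               "SELECT column_name::text FROM information_schema.columns
|                WHERE table_name = 'users' ORDER BY ordinal_position")
|               .fetch_all(pool)
|               .await?;
|           let live: Vec<&str> =
|               live.iter().map(|(c,)| c.as_str()).collect();
|           assert_eq!(live, expected, "schema drift; refusing to start");
|           Ok(())
|       }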
| nurettin wrote:
| > schema is validated by the wrong system. The database is the
| authority on what the schema is
|
| What you describe is DB-first. This Rust library is code-
| first. In code-first, the code is responsible for
| generating DDL statements using what is called a
| "migration", where the library detects changes to the code
| and applies them to the schema.
| shermantanktop wrote:
| Agree. Mid-tier developers who create queries for a SQL db to
| execute are doing the equivalent of using Java code to generate
| HTML. The target is not a programmatic API, it's a language
| that was designed for end users, and it is both more expressive
| and more idiosyncratic than any facade you build in front of
| it.
| api wrote:
| This is close to something I've wanted to see for a long time: a
| language where relational models are first class citizens and the
| engine for data storage is abstracted away.
|
| Tables, joins, etc. should all be representable within the
| language's type system and standard libraries. Connectivity to a
| database should be handled by plugins/implementations of storage
| engines. If no storage engine is used, the relational data just
| lives in RAM like any other variable, collection, etc.
|
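| A toy sketch of that idea in plain Rust (no storage
| engine; names are made up): relations as typed in-memory
| collections, with a join written directly in the language:
|
|       struct User { id: u32, name: String }
|       struct Post { author_id: u32, title: String }
|
|       /// An inner join over two in-RAM "tables".
|       fn posts_with_authors<'a>(users: &'a [User], posts: &'a [Post])
|           -> Vec<(&'a str, &'a str)> {
|           posts.iter()
|               .filter_map(|p| users.iter()
|                   .find(|u| u.id == p.author_id)
|                   .map(|u| (u.name.as_str(), p.title.as_str())))
|               .collect()
|       }
|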
| It's kind of the opposite of an ORM. An ORM tries to bolt a
| relational database into an OOP or other language model. This
| wraps the language around relational database concepts, fixing
| the impedance mismatch in the language rather than with a
| translation layer.
|
| It also means one does not have to reinvent relational concepts
| (badly) in every application with a complex data model even if
| that application is _not_ backed by a database.
| summerlight wrote:
| >My opinion is that SQL should be for computers to write. This
| would put it firmly in the same category as LLVM IR.
|
| Although I partially agree with the high-level idea of
| this statement (let's accept reality: no mature database
| is going to support your shiny new query language anytime
| soon), SQL is not the most convenient language to be
| written by a code generator. Actually, it is a sort of
| nightmare; a seemingly simple plan optimization can
| completely change the layout of the query, so there is a
| decent chance you end up writing a full query-rewrite
| stage just as a workaround for a specific query engine.
| Google's SQL pipe-syntax proposal seems to make this a bit
| better, but it has the same problem as any new query
| language unless it's broadly adopted.
| omani wrote:
| Every time I see something like this, I ask myself why I
| would want or need it over PostgREST.
|
| I mean, why reinvent the wheel? We live in a time where
| the DB itself is a REST API with PostgREST.
|
| https://postgrest.com
| satvikpendem wrote:
| Why would I want to interact with my database over the network
| when I can just...do it locally? It's orders of magnitude
| slower and more wasteful, and it doesn't even provide all
| the guarantees that using actual SQL or an ORM does.
| kccqzy wrote:
| If you are using PostgREST as intended, you don't have an
| application layer any more. It's not something extra you put
| in between the application and the database, but it replaces
| the application.
| satvikpendem wrote:
| Ah so it's like Hasura, it's a BaaS platform. Unfortunately
| I don't like putting all my logic in the client only.
| kccqzy wrote:
| No the logic is in the database. They encourage you to
| write functions to achieve custom logic in the database.
| Search for `create or replace function` in their tutorial
| to see an example.
| satvikpendem wrote:
| Yeah that is also something I don't want to do, after
| having used Hasura at a previous place. After a while,
| the logic gets convoluted when it's all in the database
| only.
| satvikpendem wrote:
| When this was posted on reddit, I saw someone [0] ask how it was
| different from diesel-rs (and diesel-async, the officially
| supported add-on crate, lest someone says diesel doesn't have
| async).
|
| I saw some replies by the diesel maintainer about how the creator
| of this Rust-Query might not have really understood in-depth how
| diesel worked and wanted to write their own solution, and there's
| nothing wrong with that, of course, but this thread might be some
| good context for others asking themselves similar questions.
|
| [0] https://www.reddit.com/r/rust/s/6midd79iDo
| pstoll wrote:
| When a sample showing a new SQL thing has, e.g.:
|
| a) no ID types, e.g. for User, just a string
|
| b) uses i64 for "stars"
|
| c) any other of a million things...
|
| I quickly conclude it's likely a cute toy by someone who wanted
| to write a thing, not ever likely to be a battle-hardened
| effort brought forth by a high performance / scale database
| tortured soul. And I move on.
| rendaw wrote:
| I'm (was? not sure after seeing this) working on a similar
| thing to this, and the most significant difference between this
| and diesel is that this has stronger safety guarantees by
| controlling migrations as well, whereas diesel just trusts the
| database is in the state you claim it is.
| weiznich wrote:
| You get the same safety guarantees with diesel by using
| the [`embed_migrations!`](https://docs.diesel.rs/2.2.x/die
| sel_migrations/macro.embed_m...) macro and running the
| embedded migrations at startup. Diesel decouples these
| because there are situations where you might want to work
| with a database slightly different from the one defined,
| for example if you don't control the database.
| rendaw wrote:
| I don't get the impression that there's any interaction or
| data sharing between the embedded migrations and the query
| generation part, that would validate the queries against
| the schema as defined by the migration -- is that wrong?
| The documentation doesn't seem to mention it at least.
| mijoharas wrote:
| > The actual row numbers are never exposed from the library API.
| Application logic should not need to know about them.
|
| This point confuses me. If we're thinking about a web server,
| you'll pass your data to the frontend with a row ID, so that they
| can refer to and modify that data in another request? How would
| it work otherwise?
|
| Am I missing something? I can think of some use cases, but this
| limitation cuts down on the uses I'd have for this massively.
| twiss wrote:
| I'm guessing they're talking about
| https://www.complexsql.com/rowid-rownum/, i.e. the row numbers
| defined by the database, rather than the value of the ID column
| you typically create in each table.
|
| I.e., it should still be possible to create a column with a
| public identifier by which you can refer to the row, I assume.
| Hytak wrote:
| Sending row IDs to your frontend has two potential problems:
|
| - Row IDs can be reused when the row is deleted. So when the
| frontend sends a message to the backend to modify that data, it
| might accidentally modify a different row that was created
| after the original was deleted.
|
| - You may be leaking the total number of rows (this allows
| detecting when new rows are created etc. which can be
| problematic).
|
| If you have nothing else to uniquely identify a row, then you
| can always create a random unique identifier and put a unique
| constraint on it.
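| A small sketch of that last option (made-up names,
| assuming the uuid crate with the v4 feature): keep the
| internal row ID private and hand the frontend only a
| random public identifier:
|
|       use uuid::Uuid;
|
|       // Hypothetical DDL: public_id is all the frontend ever sees.
|       const DDL: &str =
|           "CREATE TABLE users (
|                id INTEGER PRIMARY KEY,        -- internal, never exposed
|                public_id TEXT NOT NULL UNIQUE
|            )";
|
|       fn new_public_id() -> String {
|           Uuid::new_v4().to_string() // random, non-guessable, never reused
|       }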
| skeptrune wrote:
| Diesel basically being the same as Postgres SQL is an
| awesome feature. I'd much rather have that than an
| abstracted query language which is difficult to
| understand.
| runeks wrote:
| Nit: a DB query can never be made type safe (avoiding runtime
| failure) because its correctness depends on the state of an
| external service, which may change at any point in time.
| rendaw wrote:
| This is awesome, but also kind of hurts since I was working on
| something similar here: https://github.com/andrewbaxter/good-
| ormning
|
| They're very similar: Both define schema version "snapshots" not
| "diffs", with diffing happening automatically, and check queries
| based on the latest defined version, don't require a running
| database, etc.
|
| The main difference seems to be about queries. My approach
| was to try to replicate SQL query structure directly with
| the query builders, rather than abstract away from SQL,
| whereas this appears to aim for something that's less
| similar to the original SQL. I don't consider myself an
| SQL expert, and I've seen SQL abstractions that aren't
| able to replicate all functionality, but after seeing the
| Reddit debate with the diesel author, this seems fairly
| thorough.
|
| FWIW in `good-ormning` I was thinking of getting rid of the query
| builders/my own query ast and having a macro that parses with
| `datafusion-sqlparser-rs` and checks that ast instead, but it's a
| significant change.
|
| FWIW I think SQL is awful too, so I'm really happy to see the
| support here and I hope rust-query takes off.
| dboreham wrote:
| For folks reading who don't have experience in this space: this
| kind of thing is usually a bad idea. On the surface it looks like
| a neat thing, and the people who write these components clearly
| think they're a good thing. But not so much.
___________________________________________________________________
(page generated 2024-12-01 23:01 UTC)