[HN Gopher] Safe relational database queries using the Rust type...
___________________________________________________________________
Safe relational database queries using the Rust type system
Author : lukastyrychtr
Score : 264 points
Date : 2024-11-30 09:29 UTC (1 day ago)
(HTM) web link (blog.lucasholten.com)
(TXT) w3m dump (blog.lucasholten.com)
| echelon wrote:
| Good luck with this, Lucas! We need all of the innovation in the
| Rust DB space we can get.
|
| I can't use yours in production yet as it only supports SQLite,
| but I'll keep my eyes peeled for updates.
|
| We're using Sqlx and Diesel in production currently.
|
| Sqlx is great because it's not an ORM and lets you write "type
| checked" raw SQL. It frustrates us in the fact that you can't
| write dynamic queries and have them be type checked, though. No
| `IN` clauses or conditional predicates, only static queries with
| simple bindings get type checked.
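|
| (Roughly, assuming a hypothetical `users` table and a
| `pool: sqlx::PgPool` in scope -- the macro form is checked
| against the dev database at compile time, the dynamic form
| is not:)
|
|       // compile-time checked against DATABASE_URL
|       let user = sqlx::query!("SELECT id, name FROM users WHERE id = $1", id)
|           .fetch_one(&pool)
|           .await?;
|
|       // dynamic SQL falls back to the unchecked runtime API;
|       // `placeholders` ("$1, $2, ...") is assumed built elsewhere
|       let sql = format!("SELECT id, name FROM users WHERE id IN ({placeholders})");
|       let rows = sqlx::query(&sql).fetch_all(&pool).await?;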
|
| Love to see more ORMs like yours! Diesel isn't our favorite.
| Simpliplant wrote:
| After being frustrated with Diesel, I gave SeaORM a try, and
| I couldn't be happier (I have been using it for more than a
| year). It is not perfect, but it is certainly the best ORM I
| have ever used myself (vs. Diesel, ActiveRecord, SQLAlchemy).
| kstrauser wrote:
| How come? I'm new to the space and picked Diesel a couple
| weeks ago for a new project because it seemed the most
| recommended. What do you prefer about SeaORM?
| Simpliplant wrote:
| To me, SeaORM feels like a well-designed library while many
| things in Diesel feel like an afterthought or a hack. Which
| mostly manifests itself when doing something complex and
| debugging it.
|
| Also recommend reading
| https://www.sea-ql.org/SeaORM/docs/internal-design/diesel/
| kstrauser wrote:
| I'll check it out. Thanks!
| echelon wrote:
| sea-query looks a lot like jOOQ!
| weiznich wrote:
| Please note that this comparison has been outdated for at
| least 2 years, given that diesel-async has existed for more
| than 2 years now and this page completely fails to
| mention it.
| satvikpendem wrote:
| SeaORM is not compile time safe by their own admission so
| it's automatically out of the running, personally speaking.
| kelnos wrote:
| That feels like a weird requirement to me; an ORM is by its
| very nature not going to be able to validate queries at
| compile-time.
|
| And even if you're writing queries directly, any time you
| need to write something dynamic, you're not going to be
| able to verify those at compile-time either.
| satvikpendem wrote:
| > an ORM is by its very nature not going to be able to
| validate queries at compile-time.
|
| I don't see why this is by its "very nature" when
| libraries like diesel or Prisma already do this.
|
| > any time you need to write something dynamic, you're
| not going to be able to verify those at compile-time
| either.
|
| Diesel can validate dynamic queries at compile time too, and
| sqlx can to some extent.
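|
| (E.g. with Diesel's boxed queries; a minimal sketch, where
| `users` comes from Diesel's `table!` macro and `User`,
| `name_filter` and `conn` are hypothetical:)
|
|       use diesel::prelude::*;
|
|       let mut query = users::table.into_boxed();
|       if let Some(name) = name_filter {
|           // the filter is only added at runtime, yet the whole
|           // query is still checked against the schema types
|           query = query.filter(users::name.eq(name));
|       }
|       let results = query.load::<User>(&mut conn)?;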
| written-beyond wrote:
| My first Rust PR was for a Diesel plugin crate; its macros
| weren't updated for Rust's newer spec.
|
| I honestly really enjoyed Diesel, but its extremely slow
| migration to async really made it lose traction.
|
| I've used SQLx extensively for projects after that and it was
| pretty decent.
| weiznich wrote:
| Please note that up until today an async rust database
| library does not give you any measurable performance
| advantage compared to a sync database library. In addition to
| that it won't even matter for most applications as they don't
| reach the required scale to even hit that bottleneck. As a
| matter of fact, crates.io ran just fine on sync diesel up
| until maybe a month ago. They have now switched to diesel-
| async, for specific reasons not related to performance.
| The lead developer there told me specifically that from a
| performance point of view the service would have been fine
| with sync diesel for quite a while without problems, and
| that's with somewhat exponential growth of requests.
|
| Other than that the async rust ecosystem is still in a place
| that makes it literally impossible to provide strong
| guarantees around handling transactions and making sure that
| they have ended, which is a main reason why diesel-async is not
| considered stable from my side yet. This problem exists in
| all other async rust database libraries as well, as it's a
| language-level problem. They just do not document this
| correctly.
| written-beyond wrote:
| WEIZNICH! I am your biggest fan. Thank you so much for the
| amazing crates you've made and for Diesel.
|
| When I started my rust journey Diesel and Rocket were the
| first few crates I worked with and it really opened up my
| mind and I haven't turned back since.
|
| As for async, my primary concern there was to make sure to
| minimise blocking calls in your futures. Even if there
| weren't many performance gains to be made from Diesel itself
| being async, having database calls marked as Futures could
| theoretically help the runtime schedule and manage other
| tasks.
|
| I'd like to thank you for your amazing work once more!
| k972 wrote:
| Depending on what database you're using, you might be able to
| restructure your dynamic queries to fit a static form...
|
|       WHERE ...
|         AND CASE WHEN ? IS NULL THEN TRUE ELSE col ILIKE ? || '%' END
|         AND CASE WHEN ? IS NULL THEN TRUE ELSE col2 IN (?) END
|
| Each "dynamic" parameter is then bound twice. (My personal
| preference is to use a CTE to bind and name my parameter values
| in the query instead of binding multiple times whenever
| possible).
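|
| A sketch of that CTE variant (names hypothetical):
|
|       WITH params AS (
|           SELECT ? AS name_prefix
|       )
|       SELECT t.*
|       FROM t CROSS JOIN params p
|       WHERE p.name_prefix IS NULL
|          OR t.col ILIKE p.name_prefix || '%'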
| kelnos wrote:
| That only works when the dynamic portion is in the WHERE
| clause, though. I often need to dynamically generate UPDATEs
| and even sometimes INSERTs.
| typicalset wrote:
| This looks interesting as an approach to relating schema to data
| types. One aspect that feels very counter-intuitive/unidiomatic
| is that, if I understand correctly, in your example there is no
| Schema enum. Usually this sort of macro does not erase the type
| defined, and I would find it extremely confusing that I cannot
| reference, say, Schema::User{..}. It would be clearer to me if
| this were defined inside of a macro_rules macro, more like
| lazy_static e.g.
|
| ```
| schema! {
|     User {
|         name: String,
|     },
|     Story {
|         author: User,
|         title: String,
|         content: String,
|     },
| }
| ```
| pwdisswordfishz wrote:
| I was just about to ask WTF is a Schema-typed value supposed to
| represent.
| davidatbu wrote:
| I am really happy to see some more exploration in the typesafe-
| db-access-in-Rust space.
|
| > The existing libraries don't provide the compile time
| guarantees that I want and are verbose or awkward like SQL.
|
| Worth noting: diesel definitely fulfills the "providing compile
| time guarantees" criteria.
|
| Here's where I stand on the inevitable ORM-vs-no-ORM debate
| that's about to erupt: I like typesafe query builders that don't
| abstract over SQL (I'd put diesel in this category, and i would
| not put ActiveRecord, Django's ORM, or SeaORM in this category).
|
| It looks like rust-query will lean towards the full-ORM side of
| that spectrum. Not my cup of tea, but the tea industry can
| accommodate many flavors :)
| pdimitar wrote:
| > _Worth noting: diesel definitely fulfills the "providing
| compile time guarantees" criteria._
|
| Also sqlx.
|
| I liked the article and I bookmarked the library, but this
| statement from the author confused me a bit. At least two
| libraries give you pretty good compile-time guarantees. sqlx
| even gives you the option to specify a DB against which it can
| check whether your queries will succeed at runtime.
| weiznich wrote:
| > Also sqlx.
|
| The guarantees provided by sqlx are less strong than what's
| provided by diesel, due to the fact that sqlx needs to know
| the complete query statically at compile time. This excludes
| dynamic constructs like `IN` expressions or dynamic where
| clauses from the set of checked queries. Diesel also verifies
| that these queries are correct at compile time.
| Kinrany wrote:
| Agree. SQL libraries should aim to make any valid query
| possible to express (though semantic equivalents are fine)
| while integrating query building into the rest of the language
| as seamlessly as possible.
| Diggsey wrote:
| SQL, while not great, does have some advantages that are hard to
| beat:
|
| - Everyone knows at least some basic SQL, even non-technical
| users have often encountered it in some form.
|
| - The documentation for eg. PostgreSQL is for SQL, so if you
| write queries in anything else you have to mentally translate
| back and forth, and so you need to know SQL anyway.
|
| - Any external tools you use to interact with the database will
| use SQL.
|
| - Changing the queries doesn't require an expensive compilation
| step - `sqlx` gets the best of both worlds in this respect, in
| that it is able to type-check the parameters and rely on the
| database itself to validate the query, so you don't end up with a
| ton of type-system shenanigans that increase compilation times.
|
| Maybe for a brand new database, a better query language could win
| out, but having used sqlx I can't imagine going back to a query-
| builder style interface for existing SQL databases.
| jeltz wrote:
| Yeah, I am using sqlx and it is amazing. The only thing it
| cannot do is help you in the case when you actually need a
| query builder, e.g. when you need support for queries with
| dynamic filters and sort orders. But other than that it is much
| nicer to work with than every other query library I have used.
| Diggsey wrote:
| FWIW, you very rarely need to make the query itself dynamic.
| After all, SQL alone is powerful enough to do conditionals.
|
| For example:
|
|       SELECT * FROM example
|       WHERE ($1 IS NULL OR field1 = $1)
|         AND ($2 IS NULL OR field2 = $2)
|       ...
|
| When you supply the parameters to this prepared statement,
| the constant conditions should be optimized away by the query
| planner if you're using a decent database.
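|
| (With sqlx, those optional parameters then bind naturally as
| `Option`s; a sketch with hypothetical filters -- the casts
| help Postgres infer the parameter types:)
|
|       let rows = sqlx::query(
|           "SELECT * FROM example
|            WHERE ($1::text IS NULL OR field1 = $1)
|              AND ($2::int4 IS NULL OR field2 = $2)")
|           .bind(field1_filter) // Option<String>
|           .bind(field2_filter) // Option<i32>
|           .fetch_all(&pool)
|           .await?;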
| ris wrote:
| Those tricks cover a very small subset of dynamic queries,
| and if you don't give your developers a proper tool to do
| dynamic queries in, your code will start growing these
| funny string-formatting-based query-mangling functions that
| have god-knows-what escaping problems behind them.
| Diggsey wrote:
| That has not been my experience. SQL is a fully fledged
| programming language so there's very little that you
| can't express this way.
|
| What are you doing that _requires_ dynamically building a
| query?
| kelnos wrote:
| > _That has not been my experience_
|
| Well, it _has_ been mine. That trick you suggested works
| for the simple binary "either include this WHERE clause
| filter or don't" -- which, agreed, is something that
| comes up often enough -- but it's only in the simplest of
| CRUD apps where that's all I've needed.
|
| A quick example: I have a REST endpoint that updates a
| resource. The JSON fields that map to the DB columns can
| be missing (leave the column value unchanged), null (set
| the column value to NULL), or have a value (set the new
| provided value). The list of assignments in the UPDATE
| needs to be dynamic; sometimes there might be only one
| column being updated, sometimes two, three, four, five...
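|
| (For what it's worth, a sketch of how that kind of UPDATE can
| be assembled with sqlx's QueryBuilder -- the `resources`
| table and the `patch` struct are hypothetical, with each
| field an Option<Option<_>> for missing/null/value:)
|
|       use sqlx::{Postgres, QueryBuilder};
|
|       let mut qb: QueryBuilder<Postgres> =
|           QueryBuilder::new("UPDATE resources SET ");
|       let mut sep = qb.separated(", ");
|       if let Some(name) = patch.name {
|           // name: Option<String>; binding None writes NULL
|           sep.push("name = ").push_bind_unseparated(name);
|       }
|       if let Some(desc) = patch.description {
|           sep.push("description = ").push_bind_unseparated(desc);
|       }
|       // real code would guard against an empty SET list here
|       qb.push(" WHERE id = ").push_bind(patch.id);
|       qb.build().execute(&pool).await?;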
| jon_richards wrote:
| Isn't the standard just to set the unchanged column to
| itself?
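|
| (i.e. something like the following -- though note this can't
| express the "explicitly set to NULL" case from the parent
| comment:)
|
|       UPDATE example
|       SET field1 = COALESCE($1, field1)
|       WHERE id = $2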
| LinXitoW wrote:
| It's a fully fledged, horrible language. Anything beyond
| basic queries is unreadable, ESPECIALLY when it's done in
| plain strings in another language. There's not even a way
| to not have to repeat values EVERY SINGLE TIME (think
| variables or constants in every other language).
|
| Oh, but what about <feature>? Well, is that SQL, or a
| frankensteined version of SQL, aka a "dialect"?
|
| SQL is the JavaScript of databases, and we'll be better
| for it once we admit this.
| 63stack wrote:
| Can you show an example where you have 4 possible fields to
| filter on, and then one more where the sorting field is
| dynamic? (Not the value, but the column)
| pimeys wrote:
| You can add sea-orm on top of sqlx for a query builder. It
| can also introspect your SQL database and generate entity
| types.
|
| I built my own back in the days when I worked for Prisma.
| None of these crates existed. It's not that complex if you
| limit the builder to queries you need in your application.
|
| But, to be honest, jOOQ or LINQ would be awesome to get for
| Rust.
| unshavedyak wrote:
| I am toying with spreadsheet backends these days and it's had
| me wondering - do spreadsheet query languages offer value that
| SQL is missing? Beyond "people are used to it" of course.
|
| One thing i am wanting beyond SQL though is a more Object
| oriented language. Ie i want something SQL-like (or more
| specifically PRQL, which i find very nice), but without it
| being so tailored towards a flat data structure. I want to use
| it on JSON/etc.
| wslh wrote:
| BTW, do you suggest a LINQ kind of integration in Rust? I just
| found that Microsoft has a Rust for C#/.NET developers resource
| [1][2], which is more about integration, I think. It is a little
| weird that there is no Rust.NET yet. .NET is not a garbage-
| collection-only platform. I guess Microsoft is working
| internally on this.
|
| [1] https://microsoft.github.io/rust-for-dotnet-devs/latest/linq...
| [2] https://microsoft.github.io/rust-for-dotnet-devs/latest/
| estebank wrote:
| I'm impressed by this individual's effort to bring rust to
| the clr: https://github.com/FractalFir/rustc_codegen_clr
|
| It is already way more complete, given its age, than I would
| have expected.
|
| I agree that the clr is a more natural target than, let's
| say, the JVM (at least until Project Valhalla comes out).
| kstrauser wrote:
| Thanks for releasing this! More ideas are great and welcome, and
| even if I don't use this directly, maybe the ORM I use will like
| some of the features and add them.
|
| But I have to admit, this one bit almost made me stop reading:
|
| > For those who don't know, SQL is the standard when it comes to
| interacting with databases.
|
| I can scarcely imagine anyone who both 1. would want to use this
| and 2. doesn't know what SQL is. So if I'm reading this, and
| you're telling me what a database is, I assume you assume I'm an
| idiot. This makes me wonder if it has design choices to protect
| hypothetical Rust-using, DB-using idiots from themselves at the
| expense of making it hard to handle edge cases.
|
| I'm sure you didn't mean it that way, but that's how it rubbed
| me. Maybe consider leaving out the extremely basic introductory
| bits.
| davidatbu wrote:
| A more charitable take is that OP tried to make his blogpost
| more accessible to newbies at a very small verbosity cost for
| non-newbies.
|
| Fwiw: i'm totally fine with that.
|
| > This makes me wonder if it has design choices to ...
|
| I personally felt that the relevant design choices were
| explicitly enumerated, and so one need not deduce them from
| the rest of the text.
| kstrauser wrote:
| That's fair, and I offered that mainly in the spirit of
| feedback. _To me_ , and perhaps to no one else, but just _to
| me_ , it was a little off putting. Like, "yeah, yeah, I know
| what SQL is. Do you think I don't? Or did you just learn
| about it last month and you're eager to share?"
|
| I'm certainly not gonna lose sleep over it. Someone came on
| HN to show us their neat new project. I wanted to give them
| my first impression to help them craft their future
| messaging. If they disagree, fine with me! They don't owe me
| anything.
| mattrighetti wrote:
| Looks nice! I've been using SeaQuery for a while now, but the
| documentation is not that great if you're looking to create
| some more advanced queries.
|
| While I liked the idea of having strongly typed queries, I've
| lately found out that sometimes they unnecessarily slow me down
| during the development process; I'm considering going back to
| good old prepared statements and binding values manually.
| giovannibonetti wrote:
| You might be interested in SQLx, since you write the SQL but
| the library generates the tedious bindings for you.
| mattrighetti wrote:
| That's my go-to! Maybe it was not super clear from my comment,
| but I'm not using the ORM, just the query builder; once the
| query is built you still have to feed it to sqlx
| (e.g. [0]) :)
|
| [0]: https://github.com/SeaQL/sea-query/blob/master/examples/sqlx...
| ocschwar wrote:
| Thanks for this! I am looking forward to putting this together
| with the Zola CMS for some hobby applications.
| ninetyninenine wrote:
| >My opinion is that SQL should be for computers to write. This
| would put it firmly in the same category as LLVM IR.
|
| This is nuts.
|
| SQL is a high level language. It's higher level than python or
| rust. It's basically a declarative statement that's almost
| english-like and it's specifically DESIGNED to be more readable
| and easier to use by humans. It compiles down into many
| procedures that don't easily map to the SQL statement itself. You
| can't get any higher level than SQL.
|
| The issue here is that the database exists at the bottleneck of
| web dev. It's where state mutation happens and it's usually
| the slowest part of the pipeline in a computer. Yet
| instead of having fine grained low level control over this part
| of the pipeline, we have a high level language on top of it. So
| if we want to optimize this part of the pipeline we have to HACK
| the query. We have to make the abstraction layer leaky in the API
| itself with EXPLAIN. We can't do direct optimizations because SQL
| is so high level. It is a bit of a problem but the industry is so
| entrenched in SQL that it's actually 10x more efficient to just
| use it than to develop an API that's more appropriate for this
| level. SQL is tech debt we are sort of stuck with. The ideal API
| would be one that is both high level but allows fine grained
| control... but we don't have one yet.
|
| To use a machine learning analogy: SQL is a local optimum.
| There's a much more optimal language somewhere in this space,
| but we are stuck in the local optimum and it's likely we will
| never end up finding the actual optimal API.
|
| In short SQL is the furthest thing from LLVM IR. It's crazy. You
| cannot treat it as the same thing. If you do there are huge
| problems.
|
| The problem with rust-query, and with ORMs in general, is
| that the APIs for these libraries are themselves high level.
| They are HIGH level abstractions ON TOP of high level
| abstractions. You want to optimize a query now? Well, you need to
| hack the first high level abstraction in such a way that it hacks
| the second high level abstraction such that it produces optimized
| compiled procedures. That's the problem here.
|
| All this ORM stuff is just programmer OCD. We don't want to do
| meta programming where we have another language living as a
| string in our web app. We want everything fully integrated so we
| create an abstraction in an attempt to get rid of an abstraction
| that was intended to be an abstraction in itself. It's aesthetics,
| and the aesthetics actually make life harder.
| vlovich123 wrote:
| > You can't get any higher level than SQL.
|
| Can't you? Wouldn't higher level be a goal-oriented natural
| language like "what's the most popular music file in the
| database", with an AI agent that just figures out how to give
| you the answer?
| ninetyninenine wrote:
| Well. You know what I mean.
|
| AI is also unreliable. Place chatGPT over a database and it's
| going to start lying about what is in it eventually.
|
| That being said my statement likely isn't strictly true even
| when you account for ML. I'm sure there are even higher level
| languages that are not probability based and thus
| deterministic with expected results. But in terms of a
| software developers average experience, SQL is the highest
| level you can get without getting into the minefield that is
| LLMs.
| threeseed wrote:
| > SQL is the highest level you can get without getting into
| the minefield that is LLMs
|
| You can use NLPs to safely write SQL in more business
| language.
|
| And of course there are GUIs.
| kelnos wrote:
| > _SQL is the highest level you can get without getting
| into the minefield that is LLMs_
|
| That's trivially false.
|
| I'm currently playing with sea-orm[0], and I can write
| things like:
|
|       let db = Database::connect(...).await?;
|       let yesterday: chrono::DateTime<chrono::Utc> = ...;
|       let foos: Vec<foos::Model> = Foos::find()
|           .filter(foos::Column::Status.eq(FooStatus::Frobbed))
|           .filter(foos::Column::UpdatedAt.gt(yesterday))
|           .left_join(Bars)
|           .also_select(Bars)
|           .all(&db)
|           .await?;
|
| How is that not higher-level than SQL?
|
| [0] https://crates.io/crates/sea-orm
| cwbriscoe wrote:
| Why not just write SQL, it would be much more readable.
| cmrdporcupine wrote:
| > SQL is the highest level you can get without getting into
| the minefield that is LLMs.
|
| I mean, not really true. Datalog is relational, like SQL,
| but gives you a higher level rule-oriented mechanism for
| querying that can avoid a lot of the mechanics of
| explicitly joining things, for example.
| Yoric wrote:
| That's a good point.
|
| It would be very nice to have an LLVM IR-style language for
| database queries, though. As a language, SQL is... well, a
| little behind the times. ORMs explore lots of interesting design
| choices, but, as you mention, by piling up high-level decisions
| on top of something that, in many respects, is even higher
| level.
|
| I wonder what an LLVM IR-like would look like in this space.
| ninetyninenine wrote:
| I agree there should be a common api. The sheer number of
| SQL dialects contributes to a lot of issues with ORMs.
|
| But SQL is definitely not something designed only for
| computers to write, or to compile another language into
| (which is what the OP is promoting).
| Yoric wrote:
| Yeah.
|
| I wonder if something like GlueSQL or the IndexedDB layer
| of most browsers could be used as such an API.
| josephg wrote:
| I hear you and I 70% agree.
|
| I've also been using Prisma for a project recently and I've
| been finding it quite good. It layers several useful features
| on top of sql, like schema management (checked in to git), code
| gen (incl typescript types), db connection management,
| serialization / deserialization from JavaScript objects to sql
| rows and back, and it lets you follow foreign keys. If you
| want, you can also easily just make raw sql queries. That works
| fine too.
|
| Sql is a very old programming language, and it's showing its
| age. The design is wacky - does GROUP BY go before or after
| ORDER BY in the query? Every command has custom, seemingly
| random words which are actually parameters. They are placed at
| totally arbitrary, hard to remember spots in the command. And
| what, we make our program output a string that gets immediately
| parsed by the sql server? What a horrible, stupid way to do
| RPC. It also gives terrible error messages if you get it wrong.
| And it's awful to learn and debug.
|
| All of that is a huge pity, because modern databases can do so,
| so much. Doing logic inside the database can often be an order
| of magnitude faster than doing it in client code. SQL is a
| fantastic way to represent your data.
|
| One path for fixing this would be to move the sql query parser
| inside the client library. Then internally, have the client
| library send optimized RPC instructions to the database. This
| would take cpu load off the database (usually a good move -
| databases are harder to scale). And it would open the door for
| the client library to provide other better, lighter and faster
| ways to programmatically construct database queries. Ideally
| without the totally unnecessary loop of constructing then
| parsing sql.
| ninetyninenine wrote:
| >One path for fixing this would be to move the sql query
| parser inside the client library. Then internally, have the
| client library send optimized RPC instructions to the
| database. This would take cpu load off the database (usually
| a good move - databases are harder to scale). And it would
| open the door for the client library to provide other better,
| lighter and faster ways to programmatically construct
| database queries. Ideally without the totally unnecessary
| loop of constructing then parsing sql.
|
| The problem here is that these low level optimizations the
| database is doing are freaking complicated. There's a lot of
| intelligence going on in the background and it's different
| depending on the database. We can solve it with a low level
| common IR, but that IR will be extremely complex. And that IR
| will likely give different performance profiles for the same
| code on different databases.
|
| I can see this being solved for one database. But not across
| all databases with a common api. We sort of do have a common
| api, and we paid for it with the trade-off of it being an
| easier high level language on top of the part of the web that
| needs to be optimized the most.
| bpicolo wrote:
| > can solve it with a low level common IR
|
| Wouldn't that basically just be a query plan? The language
| is the easy part. Determining the optimal query plan based
| on rough statistics about the data and the DBs ability to
| go about that is the hard part.
|
| There are more general purpose query optimization libs out
| there, like Calcite
| avianlyric wrote:
| > One path for fixing this would be to move the sql query
| parser inside the client library. Then internally, have the
| client library send optimized RPC instructions to the
| database. This would take cpu load off the database (usually
| a good move - databases are harder to scale). And it would
| open the door for the client library to provide other better,
| lighter and faster ways to programmatically construct
| database queries. Ideally without the totally unnecessary
| loop of constructing then parsing sql.
|
| Erm, that's already true. Proper clients will be parsing SQL
| locally and sending representations of the query and its
| parameters separately. They may often be using a wire-format
| that's very close to textual SQL, but parsing that SQL is
| hardly the slow part of executing a query.
|
| DB CPU load doesn't come from parsing SQL, it comes from the
| query planning process, and the actual work of reading and
| filtering data. The planning process alone is extremely
| complex as the DB will be using all manner of statistical
| data collected about the type of data stored, and exactly how
| it's stored, and how its various columns do or don't correlate
| with each other, in order to estimate the best possible way
| of constructing the query plan. A process that
| factors in additional elements like the relative speeds of
| reading from disk, vs memory, and CPU cache, and how that
| varies with working set size.
|
| In addition, the real CPU load comes from the actual
| execution of that query. Streaming data off disks into
| memory, building various temporary in-memory data structures
| to accelerate joins and filters, there's a lot of bytes that
| need to be shuffled around, and modern DB codebases are now
| optimising for the number of CPU cycles needed to operate
| tight inner loops. None of this work can be moved to the
| client, not unless you have a mechanism of streaming GB/s of
| data to your client on every query.
|
| To think of SQL as just an RPC protocol completely misses the
| point of SQL, or the incredible engineering involved in
| modern query planners. There are many reasons to replace SQL
| with something better, but the idea that it's an inefficient RPC
| protocol, and that clients should have more fine-grained
| control of the planning and execution of queries, really
| isn't one of them.
| 9rx wrote:
| _> We don't want to do meta programming where we have another
| language living as a string in our web app._
|
| Strings are fine in theory, but the trouble with SQL is that,
| while too high-level in all the wrong places, it is also too
| low-level in all the wrong places. Developers want to be able
| to do things like define the shape of the data beyond tables
| and compose queries, which SQL fails hard at. In order to
| accomplish what developers need, you either end up extending
| SQL into a new language, or you can use the programming
| constructs you already have, treating SQL as a compiler target,
| where necessary. The latter is considerably easier to
| implement.
| jimbokun wrote:
| I think the sweet spot is a query builder that just ensures you
| emit syntactically valid SQL to send to the database. So there
| are no surprises about what the database is executing, but it
| protects against typos and security issues from building SQL
| through string concatenation.
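|
| sea-query (mentioned elsewhere in this thread) sits roughly
| in that spot; a sketch along the lines of its documented
| usage, with hypothetical table/column names:
|
|       use sea_query::{Expr, Iden, PostgresQueryBuilder, Query};
|
|       #[derive(Iden)]
|       enum Users { Table, Id, Name }
|
|       let sql = Query::select()
|           .column(Users::Name)
|           .from(Users::Table)
|           .and_where(Expr::col(Users::Id).eq(42))
|           .to_string(PostgresQueryBuilder);
|       // SELECT "name" FROM "users" WHERE "id" = 42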
| bob1029 wrote:
| Every time I've found myself frustrated with a SQL query, there
| was some horrible misstep in how the underlying schema was
| designed.
|
| Having the wisdom to craft a high quality schema can be the
| biggest unlock in the whole enterprise. If the tables, columns
| & relations are ~1:1 with the actual business (i.e., you went
| and talked to real, live stakeholders at design time), then it
| should be deeply intuitive for a non-technical domain expert to
| interact with.
|
| Often, you don't get a chance to redo the whole thing, but that
| doesn't mean you can't still design an ideal schema and then
| shim it with views, CTEs, replicas, ETL, etc.
|
| SQL is a _domain specific_ language. It would be like
| complaining about how horrible it is to write Lua scripts. The
| criticism is baseless without understanding the underlying
| schema/bindings.
| habitue wrote:
| I see your point for one part, which is that
| `select([id]).from(Users)` is maybe in the best case as
| readable as `select id from users`, and in many common cases
| it'll be less readable than sql. Plus, it's not quite sql,
| you'll have to look up how to do things a lot of times.
|
| But we aren't just talking about SQL the high level language
| that you might execute when you are poking the data live. In a
| web app dynamically constructing sql, we have multiple issues:
|
| 1. Deserialization: we need to know what the query returns and
| plug it into data structures we want to use in our program
|
| 2. SQL is dynamically typed, and if you construct queries
| dynamically you're even more dynamically up a creek. Maybe even
| "sql injection vulnerability" up a creek.
|
| 1 and 2 are solved by strongly typed query builders like this:
| the type system has a proof that the ways you will construct
| queries in your program actually result in valid sql queries
| and that they return the right data shape.
|
| It's actually solving a real problem
| zamalek wrote:
| I call it "SQL fear", and I agree: people need to get over it.
| We used EF Core at my previous job, and I don't know how much
| time we spent figuring out how poorly performing queries were
| happening. The abstraction leaks like a sieve and isn't worth it.
|
| Now, if we could replace SQL with a purpose-built language,
| that I would be interested in. I have to concede that SQL
| doesn't have great ergonomics (e.g. if FROM came first then
| auto-complete would be better, trailing commas would eliminate
| much formatting bike-shedding, and why are INSERT and UPDATE so
| different?).
| threeseed wrote:
| > It's basically a declarative statement that's almost english-
| like and it's specifically DESIGNED to be more readable and
| easier to use by humans
|
| It has categorically failed at this task. That is why we have
| an entire ecosystem of tools to allow people to query and
| interact with databases without using SQL.
|
| Developers need to understand that SQL is an abstraction for
| developers, analysts, data engineers etc., not end users.
|
| And only basic SQL statements are English like.
| Animats wrote:
| I tend to agree.
|
| The main trouble with SQL is that the query parameters are
| mixed in with the query, which results in string escape
| problems. That dates from when people were expected to query a
| database by hand from SQL, which is sometimes useful.
|
| For machine generated use, SQL queries should consist of a well
| defined fill-in-the-blanks system, with a text SQL query and a
| machine data structure. Something like this:
|
| Query in SQL text as a constant string:
|
|       "SELECT foo FROM BAR WHERE name=NAME1"
|
| Parameters: key/value pairs in a safe native format for the
| language in use:
|
|       {"NAME1": "Smith"}
|
| That gets you out of the curse of character escaping and SQL
| injection, the only real problem. It avoids trying to rewrite
| SQL in yet another ORM.
|
| Microsoft has something which works this way, but it's
| unnecessarily clunky.[1]
|
| [1] https://learn.microsoft.com/en-us/sql/relational-databases/n...
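|
| For what it's worth, most client libraries already expose
| roughly this shape; a sketch with rusqlite (table and column
| names hypothetical):
|
|       use rusqlite::{named_params, Connection};
|
|       let conn = Connection::open("app.db")?;
|       // the SQL stays a constant; the values travel separately
|       let foo: String = conn.query_row(
|           "SELECT foo FROM bar WHERE name = :name1",
|           named_params! { ":name1": "Smith" },
|           |row| row.get(0),
|       )?;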
| Animats wrote:
| This suggests a simple API using a Rust macro:
|
|       fill_in_sql!(query_string, query_params)
|
| Where query_params is a structure containing the parameters:
|
|       struct SomeQuery<'a> {
|           name: &'a str,
|           id: u32,
|       }
|
| The macro would parse the query string, find the variables to
| be filled in, match them to fields in the "params" structure,
| and generate the code to safely escape and fill in the
| params. Any matching failures are found at compile time.
|
| This is similar to what the usual "derive" macros do -
| examine a data structure at compile time and generate
| appropriate code.
| kelnos wrote:
| I've used systems like this in Scala. It's actually pretty
| magical to write something like:
|
|       val foos = sql"SELECT * from foos WHERE status = 'frobbed'"
|           .as[List[Foo]]
|
| ... and have the macro system parse the query at compile
| time and verify that the query will indeed return a
| `List[Foo]`.
|
| I think this is a nice middle ground, but constructing
| dynamic queries often can't be checked at compile-time.
|
| (I'm probably getting the syntax somewhat wrong; I haven't
| done Scala in quite a few years at this point.)
| kelnos wrote:
| I don't really think SQL injection bugs are the main driver.
| Certainly everyone should be using a query builder that
| protects them from this (and yes, I know, there are still
| people who screw this up), but I think the issues with SQL go
| far beyond that.
|
| Personally I would much rather deal with a database table as
| a collection of objects with various fields in them. In a
| language with a reasonable collections library, I want to be
| operating on that data using functional combinators like
| filter, map, group_by, sort/order_by, etc. That feels much
| more natural to me than writing out stringly-typed SQL by
| hand. This is sorta an ORM, but not quite: it's somewhere in
| between writing raw queries, and using a full-fledged
| (usually inefficient) ORM.
|
| The downside, of course, is that the performance of my
| queries is at the mercy of whatever the query generator
| underneath it is doing. But the nice thing is that if I need
| to optimize a query, I can always bypass the generator and
| write it by hand. But I wouldn't do that until I've measured
| things; no reason to prematurely optimize.
| avianlyric wrote:
| > For machine generated use, SQL queries should consist of a
| well defined fill-in-the-blanks system, with a text SQL query
| and a machine data structure. Something like this:
|
| Erm, that's exactly how modern DB and DB clients work. The
| client passes the SQL query with placeholders for parameters,
| and then sends the parameters separately. It's also a core
| part of how prepared SQL queries work.
|
| Every SQL client in any worthwhile language implements this
| pattern of passing a SQL query and its parameters as separate
| data structures, and they have done for decades. SQL
| injection only happens when people don't use their SQL client
| properly, and attempt to compose SQL queries using naive
| string templating. Effectively doing an end run around all
| the protections DB and clients provide by default.
| Seb-C wrote:
| > The ideal API would be one that is both high level but allows
| fine grained control... but we don't have one yet.
|
| What kind of fine grained control are you talking about here? I
| never felt like I was missing anything when doing even complex
| SQL (unless I am forced to use a crappy MySQL, whose problems
| are not inherent to SQL itself).
|
| Quite the contrary actually: despite having this amazing high
| level of abstraction, most developers are ignorant or don't care
| to properly index the data, add integrity constraints, manage
| the locks or think about transactions.
|
| In 99% of the non-SQL code that I see, developers don't even go
| further than nested loops containing IOs, so I don't think that
| letting them decide which tree/trie is appropriate for a use
| case (for example) would help in any way.
| swaits wrote:
| > almost english-like
|
| Real world SQL from the analytics, DE, BIE side of things is
| _extremely far_ from readable. It's a form of actual hell.
| tyleo wrote:
| It looks like you just got this blog set up. Congratulations!
| Aeolun wrote:
| It seems to me that migrations with individual row-level
| manipulation would be _tragically_ slow to execute. Like, I have
| a table with a billion rows, and any normal update statement
| takes up to an hour. I don't want to think what an update per
| row would take.
| Hytak wrote:
| Hi, migrations are 1 select statement + `n` insert statements
| for `n` rows right now.
|
| This might be improved to insert in batches in the future
| without changing the API.
| Aeolun wrote:
| I imagine a select on 1B rows will be equally problematic :)
|
| But you are right, it's not something you necessarily have to
| worry about now. It just means it doesn't work for our use
| case.
|
| Just figured I should point it out.
| levkk wrote:
| My main concern with application-defined schemas is that this
| schema is validated by the wrong system. The database is the
| authority on what the schema is; all other layers in your
| application make assumptions based on effectively hearsay.
|
| The closest we have come so far to bridging this gap in a strictly typed
| language like Rust is SQLx, which creates a struct based on the
| database types returned by a query. This is validated at compile
| time against a database, which is good, but of course there is no
| guarantee that the production database will have the same types.
| Easiest mistake to make is to design a query against your local
| Postgres v15 and hit a runtime error in production running
| Postgres v12, e.g. a function like gen_random_uuid() doesn't
| exist. Another is to assume a migration in production was
| actually executed.
|
| In duck-typed languages like Ruby, the application objects are
| directly created from the database at runtime. They are as
| accurate as possible, since the schema is directly read at
| application startup. Then of course you see developers do
| something like:
|
|       if respond_to?(:column_x)
|         # do something with column_x
|       end
|
| To summarize, I think application-defined schemas provide a false
| sense of security and add another layer of work for the engineer.
| IshKebab wrote:
| This doesn't seem fundamentally different from any schema/API
| mismatch issue. For example using the wrong header for a C
| library, or the wrong Protobuf schema.
|
| I guess it would be good if it verified it at runtime somehow,
| though. E.g. when you first connect to the database it checks
| that PostgreSQL is at the minimum required version, and the tables
| match what was used at compile time.
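|
| A sketch of such a startup check, e.g. with sqlx against
| Postgres (the expected version and table name here are
| assumptions that would be baked in at build time):
|
|       // require a minimum server version
|       let v: String = sqlx::query_scalar(
|           "SELECT current_setting('server_version_num')")
|           .fetch_one(&pool)
|           .await?;
|       assert!(v.parse::<i32>().unwrap() >= 150000, "need Postgres 15+");
|
|       // fetch the live schema to diff against the compiled-in one
|       let cols: Vec<(String, String)> = sqlx::query_as(
|           "SELECT column_name::text, data_type::text
|            FROM information_schema.columns
|            WHERE table_name = $1")
|           .bind("users")
|           .fetch_all(&pool)
|           .await?;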
| dietr1ch wrote:
| It could be verified at runtime, but I haven't seen anyone
| trying to version/hash schemas and include that in the
| request.
|
| The workaround in practice seems to be to keep the DB behind
| a server that always(tm) uses a compatible schema and exposes
| an API that's either properly versioned or at least safe for
| slightly older clients. To be fair it's hard to get rid of
| the middleman and serve straight from the DB, it's always
| deemed too scary for many reasons, so it's not that bad.
| Someone wrote:
| It shouldn't be hard for a database to keep a hash around for
| each database, update it whenever a DDL
| (https://en.wikipedia.org/wiki/Data_definition_language)
| command is run, and, optionally, verify that a query is run
| against the exact database structure.
|
| Could be as simple as secure hashing the old hash with the
| text of the DDL command appended.
|
| That would mean two databases can be identical, structure-
| wise, but have different hashes (for example if tables are
| created in a different order), but would that matter in
| practice?
|
| Alternatively, they can keep a hash for every table, index,
| constraint, etc. and XOR them to get the database hash.
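|
| A minimal sketch of that chaining idea (using the sha2 crate;
| nothing database-specific here):
|
|       use sha2::{Digest, Sha256};
|
|       /// Fold each executed DDL statement into a running schema hash.
|       fn next_schema_hash(prev: &[u8; 32], ddl: &str) -> [u8; 32] {
|           let mut h = Sha256::new();
|           h.update(prev);
|           h.update(ddl.as_bytes());
|           h.finalize().into()
|       }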
| jeltz wrote:
| Sounds very hard to me. How do you handle online schema
| changes with this? Schema changes are trivial to do if you
| can be down for a long time.
| Hytak wrote:
| rust-query manages migrations and reads the schema from the
| database to check that it matches what was defined in the
| application. If at any point the database schema doesn't match
| the expected schema, then rust-query will panic with an error
| message explaining the difference (currently this error is not
| very pretty).
|
| Furthermore, at the start of every transaction, rust-query will
| check that the `schema_version` (sqlite pragma) did not change.
| (source: I am the author)
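|
| (For reference, that pragma is readable from any SQLite
| client, e.g. with rusqlite:)
|
|       // schema_version is bumped by SQLite on every DDL change
|       let v: i64 =
|           conn.query_row("PRAGMA schema_version", [], |row| row.get(0))?;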
| mjr00 wrote:
| > rust-query manages migrations and reads the schema from the
| database to check that it matches what was defined in the
| application. If at any point the database schema doesn't
| match the expected schema, then rust-query will panic with an
| error message explaining the difference (currently this error
| is not very pretty).
|
| IMO - this sounds like "tell me you've never operated a real
| production system before without telling me you've never
| operated a real production system before."
|
| Shit happens in real life. Even if you have a great
| deployment pipeline, at some point, you'll need to add a
| missing index in production _fast_ because a wave of users
| came in and revealed a shit query. Or your on-call DBA will
| need to modify a table over the weekend from i32 -> i64
| because you ran out of primary key values, and you can't
| spend the time updating all your code. (In Rust this is
| dicier, of course, but with something like Python it shouldn't
| cause issues in general.) Or you'll just need to run some
| operation out of band -- that is, not relying on a migration
| -- because it's what makes sense. A great example is using
| something like pt-osc[0] to create a temporary table copy and
| add temporary triggers to an existing table in order to do a
| zero-downtime copy.
|
| Or maybe you just need to drop and recreate an index because
| it got corrupted. Shit happens!
|
| Anyway, I really wouldn't recommend a design that relies on
| your database _always_ agreeing with your codebase _100% of
| the time_. What you should strive for is your codebase being
| _compatible_ with the database 100% of the time -- that means
| new columns get added with a default value (or NULL) so
| inserts work, you don't drop or rename columns or tables
| without a strict deprecation process (i.e. a rename is really
| add in db -> add writes to code -> backfill values in db ->
| remove from code -> remove from db), etc...
|
| But fundamentally panicking because a table has an extra
| column is crazy. How else would you add a column to a running
| production system?
|
| [0] https://docs.percona.com/percona-toolkit/pt-online-schema-ch...
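|
| (A sketch of that rename dance in SQL, with hypothetical
| names -- each step is deployed and verified before the next:)
|
|       -- step 1: add the new column (old code ignores it)
|       ALTER TABLE users ADD COLUMN full_name text;
|       -- step 2: deploy code that writes both columns
|       -- step 3: backfill
|       UPDATE users SET full_name = name WHERE full_name IS NULL;
|       -- step 4: deploy code that reads/writes only full_name
|       -- step 5: drop the old column
|       ALTER TABLE users DROP COLUMN name;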
| threeseed wrote:
| > Even if you have a great deployment pipeline, at some
| point, you'll need to add a missing index in production
| fast because a wave of users came in and revealed a shit
| query.
|
| This sounds more like a CI/CD and process issue.
|
| There is no reason why adding a new index in code and
| deploying it into Production should be more complex or
| error prone than modifying it on the database itself.
| mjr00 wrote:
| Direct execution of `CREATE INDEX...` on a database table
| is always going to be faster than going through a normal
| deployment pipeline. Even if we assume your pipeline is
| _really_ fast, which is probably not the case at most
| orgs, you are still comparing a single SQL statement
| execution, to a single SQL statement execution + git push
| + code reviews + merge + running through Jenkins/Circle/whatever.
| How long does that overhead take? How
| much money have you lost because your website won't load
| when your post is on the frontpage of HN? Seconds and
| minutes count. I don't want my code crashing because an
| unexpected index exists in this scenario.
| threeseed wrote:
| You should be able to deploy end to end to Production in
| less than a minute.
|
| Companies should be focused on solving that problem first
| before doing insanely short-sighted workarounds like
| skipping pushing to Git and code reviews.
| mjr00 wrote:
| > You should be able to deploy end to end to Production
| in less than a minute.
|
| When I was at AWS (RDS) our end-to-end production
| deployment process was 7 days. We were also pulling
| $25million/day or so in profit. I'm sure that number is
| much higher now.
|
| There's a large difference between what the theoretical
| "right" thing is from a textbook perspective, and what
| successful engineering teams do in reality.
|
| edit: besides, it doesn't even make sense in this
| context. I have 100 servers talking to the database. I
| need to create an index, ok, add it to the code. Deploy
| to server 1. Server 1 adds the index as part of the
| migration process, and let's say it's instant-ish (not
| realistic but whatever). Do the other 99 servers now
| panic because there's an unexpected index on the table?
| kelnos wrote:
| That's a lovely ideal, but in the real world there are
| relatively few companies that meet that metric.
| threeseed wrote:
| I've worked at FAANG and enterprise companies and we
| managed to do it.
|
| There are no technical reasons why it can't be done. Only
| process and will.
| kelnos wrote:
| Yes, and that's exactly the point. The reality doesn't
| usually match the ideals, and many orgs do not have good
| process, and do not have the political will to get good
| process implemented. Part of being a professional is
| recognizing where reality falls short of the ideals (an
| all-too-common occurrence), and doing the best you can to
| successfully get your work done in that environment.
|
| And of course I don't know which FAANGs you worked at,
| but I know folks at FAANGs who have complained to me
| about CI and deployment times. Hell, these are huge
| companies; while they try to harmonize tooling,
| deployment times (especially when test suites of varying
| quality are involved) can vary a lot across a company. I
| wouldn't be surprised if there were people at the
| companies you worked at that were upset with deployment
| times, even if the teams you worked on were in good
| shape.
|
| Honestly, when someone suggests something like you've
| suggested (that everyone should be able to get their
| deployment times to under a minute), I really do wonder
| if they're intentionally arguing in bad faith or are
| trolling. I know for a fact that things are not that
| rosy, and are rarely that rosy, even at the companies you
| claim to have worked at, and it's hard to believe that
| anyone could genuinely think that this is a broadly-
| attainable target. That doesn't mean that _no one_ can do
| it, but that does mean that designing tooling that
| assumes everyone can do it is... well, just kinda naive
| and not very useful.
| threeseed wrote:
| You have two choices: (1) try and solve your deployment
| issues or (2) make unmanaged, untested, unreviewed
| changes directly in Production.
|
| Now you may say I'm just trolling but option (1) seems
| better to me for the long-term health of the
| project/company. And I don't believe it's correct to say
| it is an unrealistic goal.
| pclmulqdq wrote:
| There are lots of reasons to do slow rollouts. You should
| be rolling out in stages anyway.
| swiftcoder wrote:
| > I've worked at FAANG and enterprise companies and we
| managed to do it.
|
| You have a very different experience to the rest of us, in
| that case.
|
| The big AWS services all had deployments measured in days
| - or even weeks (depending on how many regions they are
| deployed across). Facebook's monorepo took upwards of an
| hour just to get a PR through the merge queue. Both were
| notorious for "hand-jamming" critical fixes directly to
| production.
| jeltz wrote:
| I don't think I have ever seen a non-toy project where
| that was the case.
| tempodox wrote:
| You do code review in less than a minute?
| threeseed wrote:
| You do code reviews/automated testing in lower
| environments _before_ you make changes directly to
| Production.
|
| And in this case if it's an emergency hot fix then it's
| still better to do this through a managed, tracked,
| tested pipeline.
| kelnos wrote:
| It's a bummer that you've been downvoted, because it really
| does seem like people here have not operated databases at
| scale.
|
| I will never claim that we were great at managing databases
| at Twilio, but often a schema change would take hours,
| days, or even a week or two to complete. We're talking about
| tables with hundreds of millions of rows, or more.
|
| We'd start the change on a DB replica. When it would
| finish, we would have to wait for the replica to catch up
| with the primary. Then we would bring up new replicas,
| replicating from the replica with the new schema. Finally
| that replica would get promoted to primary, with all the
| old replicas (and the old primary, of course) removed from
| service, and the new replicas brought in.
|
| Only then could we deploy code that was aware of and used
| the updated schema. The previous code of course had to
| ignore unknown columns, and if we ever wanted to drop a
| column, we had to first deploy code that would stop using
| that column. Any column type changes would need to be
| backwards-compatible. If that wasn't possible, we'd have to
| add a new column and backfill it. Adding indexes would
| usually be fine without preparatory code changes, but if we
| wanted to drop an index we'd first have to make sure there
| were no queries still depending on it.
|
| Even for a "small" schema change that "only" took minutes
| or a few tens of seconds to complete, we'd still have to
| use this process. What, do you think we'd shut part or all
| of a real-time communications platform down while we do a
| schema change? Of course not.
|
| The idea that the application could or should be in control
| of this process, or could always be in sync with the
| database when it came to its understanding of the schema,
| is impossibly unrealistic.
| mjr00 wrote:
| Yep, sounds like we have similar experiences! I first had
| to start thinking about this stuff at Hootsuite, back in
| the exciting 1million+ DAU days a decade ago. Before
| then, to me databases were just a thing that got deployed
| along with the application, and deploys only happened on
| a Friday night so who cares about downtime? By the time
| anyone tries logging into the app on Monday morning, the
| code and database will all be up to date. Going to a
| place where deploys were happening constantly _and_
| nonzero downtime was unacceptable was eye-opening.
|
| > The idea that the application could or should be in
| control of this process, or could always be in sync with
| the database when it came to its understanding of the
| schema, is impossibly unrealistic.
|
| These days my attitude is to treat databases as a
| _completely separate service_ from the application code,
| which they effectively are. They're on a different set
| of servers, and the interface they provide is the
| columns/tables/views/etc, accessed through SQL. So yeah,
| no breaking changes, and the only thing application code
| should care about is if the queries it tries to execute
| return the expected sets of data, not if the schema
| itself matches. And certainly not about things like
| views, triggers or indexes.
|
| This does end up being more overhead than migrations
| alongside the application code, which I know a lot of
| developers prefer because they're easier to use, but the
| approach just doesn't work after a certain scale.
|
| (to be clear, I still use Liquibase etc to manage
| migrations, the process for applying those changes is
| just completely separate from deploying application
| code.)
| kelnos wrote:
| > _These days my attitude is to treat databases as a
| completely separate service from the application code,
| which they effectively are. They're on a different set
| of servers, and the interface they provide is the
| columns/tables/views/etc, accessed through SQL._
|
| I've never thought of it this way, but I think this is
| really smart. If I have a service that exposes a REST
| API, I can, say, add a new field to a JSON object that's
| returned from an API endpoint without telling clients
| about it. Those clients can update later in order to take
| advantage of the information returned in the new field.
|
| Same thing with a database: I can add a new column, and
| clients can learn about the new column later in the
| future, no problem. The database schema is just a part of
| the database's API, and it can be evolved in a backwards-
| compatible manner just like any other API.
|
| > _to be clear, I still use Liquibase etc to manage
| migrations, the process for applying those changes is
| just completely separate from deploying application
| code._
|
| Right, the schema needs to be managed and there needs to
| be a source of truth for it, with tooling to do
| migrations, but coupling that so closely with the
| application so the schema and application always must be
| in sync (like some others seem to think is the One True
| Way) is a mistake, and would be a complete non-starter
| for my past professional needs.
| jessekv wrote:
| IMO its a similar situation to the discussion here:
|
| https://capnproto.org/faq.html#how-do-i-make-a-field-require...
| dathinab wrote:
| I had to realize that at least in start up world most
| (non db focused) devs thing they might not be experts in
| SQL but have a very solid understanding
|
| ... and then don't know about a lot of very fundamental
| important parts and are blissfully unaware about that,
| too.
|
| And to be clear I'm not saying they don't remember the
| exact details of something.
|
| What I mean they don't even know that there are things
| they have to look up, nor any experience or willingness
| to understand what they did wrong by consulting the
| official documentation instead of just randomly googling
| and trying out "solutions" until one seem to happen to
| work.
|
| The most common example would be having so little
| understanding about transaction that they believe
| transactions are just magically fixing all race
| conditions, and then then being very surprised that they
| don't. Or believing that transactions in SQL are
| fundamentally broken after realizing that somehow their
| databases got corrupted.
|
| And again I don't mean junior deves, but people with 10+
| years of backend or "fullstack" experience, i.e. people
| which at least should know that when to consult
| documentation/lookup protections transactions provide
| etc.
|
| I have seen more then one time a (final state of) the
| situation where people started with believing SQL
| transaction magically fix everything, then get
| "corrupted" data then blame SQL for being broken and move
| to NoSql.
|
| The joke here is all the concurrency problem are very
| fundamental and independent of SQL vs. NoSQL.
|
| And SQL often gives you more powerful/easy to use (at
| small scale) tools to enforce synchronization, but at a
| cost. While NoSQL often gives you harder to use
| primitives where you have to do much more outside of the
| database to guarantee correctness, but then at least you
| will more likely blame you code instead of the db for
| things not working.
|
| The most ironic thing here is I'm not a db expert, I just
| know where my knowledge stops and where I can lookup the
| missing parts and can't even give you much tips about
| huge dbs in production luckily surprisingly many
| companies have comparatively "small" db needs.
|
| And honest where I see race condition related issues in
| SQL quite often I'm rarely not seeing them in NoSQL code.
| Where this issues in SQL make me sad as they are often
| very avoidable in NoSQL I often feel like giving up in
| resignation.
|
| Though, that experience is with "smallish" databases, not
| Twilio scale. But a surprisingly large number of companies
| have surprisingly "smallish" databases. Like, no joke, I
| have seen companies being very vocal about their "huge
| database" and then you realize it's like 5 GiB ;=)
|
| Honestly, I (metaphorically speaking) don't even want to
| know how DB experts feel about this. I'm not a DB expert
| and just have a solid enough foundation to know where my
| knowledge stops and when I have to look things up (which
| is all the time, because I'm not writing that much SQL).
| jessekv wrote:
| IMO "parse, don't validate" can apply to data coming out of
| the database too.
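| A tiny sketch of the idea (hypothetical types): parse each
| row into a domain type right at the boundary, so invalid
| states can't travel further into the program:
|
|       // Raw value as it comes from the driver:
|       struct RawUser { email: String }
|
|       // Domain type that can only hold a plausible email:
|       struct Email(String);
|
|       impl Email {
|           fn parse(raw: String) -> Result<Email, String> {
|               if raw.contains('@') { Ok(Email(raw)) } else { Err(raw) }
|           }
|       }
|
|       fn from_row(row: RawUser) -> Result<Email, String> {
|           // Fails here, at the edge, not deep in business logic.
|           Email::parse(row.email)
|       }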
| clutchski wrote:
| This is the correct answer.
| kelnos wrote:
| In addition to the deployment-time issues and other stuff I
| and others have commented downthread, I thought of another
| problem with this.
|
| I can't see how this would even work for trivial, quick,
| online schema changes. Let's say I have 10 servers running the
| same service that talks to the database (that is, the service
| fronting the database is scaled out horizontally). How would
| I do a migration? Obviously I can't deploy new code to all 10
| servers simultaneously that will do the schema migration;
| only one server can run the migration. So one server runs the
| migration, and... what, the other 9 servers immediately panic
| because their idea of the schema is out of date?
|
| Or I deploy code to all 10 servers but somehow designate that
| only one of them will actually do the schema migration. Well,
| now the other 9 servers are expecting the new schema, and
| will panic before that 1 server can finish doing the
| migration.
|
| It seems to me that rust-query is only suitable for
| applications where you have to schedule downtime in order to
| do schema changes. That's just unacceptable for any business
| I've worked at.
| dayjah wrote:
| This isn't unique to rust-query; this problem also exists
| with ActiveRecord, for example. At Twitch we just had to
| _really_ think about our migrations and write code to
| handle differences.
|
| Basically no free lunch!
| rendaw wrote:
| I think first and foremost, if you're going to use a tool
| like this, you need to do everything through the tool.
|
| That said, for zero downtime migrations there are a number
| of techniques, but it typically boils down to splitting the
| migration into two steps where each step is rolled out to
| each server before starting the next:
| https://teamplify.com/blog/zero-downtime-DB-migrations/
| https://johnnymetz.com/posts/multistep-database-changes/
| etc
|
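| As a hedged sketch of the two-step pattern those posts
| describe (made-up names, SQL carried as Rust constants),
| each step is rolled out everywhere before the next begins:
|
|       // Step 1 (expand): additive and nullable, so servers still
|       // running the old code are unaffected.
|       const STEP_1: &str =
|           "ALTER TABLE users ADD COLUMN email TEXT NULL";
|
|       // ...deploy app code that writes and backfills email...
|
|       // Step 2 (contract): run only once every server writes it.
|       const STEP_2: &str =
|           "ALTER TABLE users ALTER COLUMN email SET NOT NULL";
|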
| I'm not sure if there's anything that automates this, but
| it'd probably need to involve the infrastructure layer
| (like terraform) too.
|
| Edit: There's one other approach I've heard of for zero
| downtime deployments:
|
| Start running the new version in new instances/services
| parallel to the old version, but pause it before doing any
| database stuff. Drain client connections to the old version
| and queue them. Once drained, stop the old version, perform
| database migrations, and start the new version, then start
| consuming the queue.
|
| This is (I think) more general, but you could get client
| timeouts or need to kill long requests to the old version,
| and it requires coordination between infrastructure (load
| balancer?) and software versions.
| zozbot234 wrote:
| You can always use SQL views to expose a version of your
| schema to apps that's different from what's in the
| underlying DB tables.
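| A small sketch of that idea (hypothetical names, SQL
| carried as a Rust constant): old apps keep querying a view
| that preserves the shape they expect while the table
| underneath changes:
|
|       // The table was renamed/reshaped; the view keeps the old
|       // contract alive for apps that haven't migrated yet.
|       const COMPAT_VIEW: &str =
|           "CREATE VIEW users AS
|            SELECT id, full_name AS name FROM people";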
| Merad wrote:
| Unless you have some tricks up your sleeve that I'm not
| thinking of, an immediate consequence of this is that zero
| downtime deployments and blue/green deployments become
| impossible. Those both rely on your app being able to run in
| a state where the schema is not an exact match for what the
| app expects - but it's compatible so the app can still
| function.
| ComputerGuru wrote:
| Semantic versioning?
| Merad wrote:
| If I understand the GP correctly, there's no notion of
| semver involved. Any difference in the schema results in
| a runtime error.
| ComputerGuru wrote:
| Yes, that's what I gathered from the code. But I was
| proposing it as an "easy" solution that doesn't involve
| throwing away OP's main idea.
| Filligree wrote:
| And that's okay. Most applications don't need zero-downtime
| deployments, and there are already plenty of APIs that
| support that use case. I'd rather have more like this one.
| Filligree wrote:
| I'm so glad you made this. I've been searching for a decent
| Rust database library for half a year already, and this ticks
| all the boxes.
|
| I haven't tried it yet, so I might have to eat my words
| later, but - great job! It's going to save tons of effort.
| theptip wrote:
| In all the systems I've built (mostly Django) you need to
| tolerate vN and vN+1 simultaneously; you are not going to
| turn off your app to upgrade the DB.
|
| You'll have some Pods on the old application version while
| you do your gradual upgrade.
|
| How do you envision rolling upgrades working here?
| sobellian wrote:
| Surely it is easier to just check that all migrations have run
| before you start serving requests? Column existence is
| insufficient to verify that the database conforms to what the
| application expects (existence of indices, foreign key
| relationships with the right delete/update rules, etc).
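| A minimal sketch of such a startup gate (made-up names,
| assuming a conventional schema_migrations table and an
| sqlx-style API):
|
|       use sqlx::PgPool;
|
|       /// Refuse to serve traffic until the database reports at
|       /// least the newest migration this binary was built with.
|       async fn assert_migrated(pool: &PgPool, expected: i64)
|           -> sqlx::Result<()> {
|           let (latest,): (i64,) = sqlx::query_as(
|               "SELECT COALESCE(MAX(version), 0) FROM schema_migrations")
|               .fetch_one(pool)
|               .await?;
|           assert!(latest >= expected, "database is behind: {latest}");
|           Ok(())
|       }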
| Kinrany wrote:
| The application is necessarily the authority on its
| expectations of the database.
| mjr00 wrote:
| You can see my sibling comment, but in the real world of
| operating databases at any sort of scale, you need to have
| databases in transitory states where the application can
| continue to function even though the underlying database has
| changed.
|
| The quintessential example is adding a column. If you want to
| deploy with zero downtime, you have to square with the
| reality that a database schema change and deployment of
| application code is not an atomic operation. One must happen
| before the other. Particularly when you deal with fleets of
| servers with blue/green deploys where server 1 gets deployed
| at t=0minutes but server N doesn't get deployed until
| t=60minutes. Your application code will straight up fail if
| it tries to insert into a column that doesn't exist, so it's
| necessary to change the database first. This normally means
| adding a column that's either nullable or has a default
| value, to allow the application to function as normal,
| without knowing the column exists.
|
| So in a way, yes, the application is still the authority, but
| it's an authority on the _interface_ it expects from the
| database. It can define which columns _should_ exist, but not
| which columns _should not_ exist.
| Hytak wrote:
| You (and many other commenters) are right that rust-query
| currently requires downtime to do migrations. For many
| applications this is fine, but it would still be nice to
| support zero-downtime migrations.
|
| Your argument that the application should be the authority
| on the _interface_ it expects from the database makes a lot
| of sense. I will consider changing the schema check to be
| more flexible as part of support for zero-downtime
| migrations.
| throwawaymaths wrote:
| You might have more than one application hitting the same
| database.
| kelnos wrote:
| Absolutely not. Certainly it's necessary that the database
| schema be _compatible_ with whatever the application believes
| the schema to be, but the application need not be in control
| of it, and at the orgs I've worked at, there's no way we
| could build zero-downtime systems if the application had to
| be in charge of schema.
|
| Consider even a very simple case: let's say I have a database
| with two (identical) application servers talking to it (that
| is, I've horizontally scaled my application due to load and
| availability requirements). If I need to do a schema change,
| and the application needs to be in charge of schema, how
| would that even work? If I deploy the change & migration to
| one of the two servers, once the migration is complete, the
| second server will freak out because the schema doesn't match
| its "authoritative" view anymore. If I deploy the change to
| both servers at the same time, and somehow designate one of
| them to actually run the migration, then the other one will
| immediately panic on startup because the first server hasn't
| completed the migration yet.
|
| Not to mention this setup breaks red/black deployments: even
| in the case where I only have one application server in front
| of the database, how do I bring up a new deployment, allow
| both servers to run while the new one is passing health
| checks, and then bring down the original server? They would
| both have different "authoritative" views of what the schema
| should be.
|
| This also completely breaks the ability to roll back, at
| least without also rolling the schema back too. That's risky;
| I don't want to have my service rollback depend on schema
| rollback also working properly.
|
| This kind of "application is authoritative about schema" only
| works when you can schedule downtime to do schema changes.
| That would be a non-starter for any company I've worked at.
| Kinrany wrote:
| The application can ensure that its assumptions hold
| without asserting the exact state of the database.
|
| Almost all migrations are idempotent and backwards-
| compatible, and almost all of the rest of them can be made
| to be by splitting into multiple changes and waiting for
| old versions of the application to be shut down.
| seanhunter wrote:
| How does that work in the case where you have one database
| and n applications using it? Are they all somehow the
| authority, or do you have a way to pick one?
| ninetyninenine wrote:
| Agreed. Maybe having a schema check in the build step of
| the application would solve this. If the schema doesn't
| match, then it doesn't compile. Most ORMs of course do the
| opposite: they generate a migration for the database from
| the code.
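| sqlx's checked macros already do something in this
| direction (a sketch; it needs DATABASE_URL pointing at a
| dev database at build time, and assumes users.name is NOT
| NULL):
|
|       use sqlx::PgPool;
|
|       async fn load_name(pool: &PgPool) -> sqlx::Result<String> {
|           // The query is checked against the dev database when the
|           // crate compiles; a missing column fails the build.
|           let rec = sqlx::query!("SELECT name FROM users WHERE id = $1", 1i64)
|               .fetch_one(pool)
|               .await?;
|           Ok(rec.name)
|       }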
| ris wrote:
| And you end up with no canonical declaration of the schema in
| your application code, leaving developers to mentally apply
| potentially tens or hundreds of migrations to build up an idea of
| what the tables are expected to look like.
| eddd-ddde wrote:
| No matter how you define your schemas, you still have a
| series of migrations as data evolves. This is not an issue of
| schema definition.
| ojkelly wrote:
| Would it make more sense to consider the response from the DB,
| like a response from any other system or user input, and take
| the parse don't validate approach?
|
| After all, the DB is another system, and its state can be
| different from what you expected.
|
| At compile time we have a best guess. Unless there was a way to
| tell the DB what version of the schema we think it has, it
| could always be wrong.
| runeks wrote:
| > This is validated at compile time against a database, which
| is good, but of course there is no guarantee that the
| production database will have the same types. Easiest mistake
| to make is to design a query against your local Postgres v15
| and hit a runtime error in production running Postgres v12,
| e.g. a function like gen_random_uuid() doesn't exist. Another
| is to assume a migration in production was actually executed.
|
| One could have the backend fetch DB schema/version info at
| startup, compare it to its own view of what the schema should
| look like, and fail if the two disagree. That way, a new
| deployment would fail before being activated, instead of being
| deployed successfully and queries failing down the line.
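| A rough sketch of that check (made-up names, assuming sqlx
| with Postgres): diff the live column list against what the
| binary expects and fail fast on mismatch:
|
|       use sqlx::PgPool;
|
|       async fn check_schema(pool: &PgPool) -> sqlx::Result<()> {
|           let expected = ["id", "name"];
|           let live: Vec<(String,)> = sqlx::query_as(
|               "SELECT column_name::text FROM information_schema.columns
|                WHERE table_name = 'users' ORDER BY ordinal_position")
|               .fetch_all(pool)
|               .await?;
|           let live: Vec<&str> =
|               live.iter().map(|(c,)| c.as_str()).collect();
|           assert_eq!(live, expected, "schema drift; refusing to start");
|           Ok(())
|       }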
| nurettin wrote:
| > schema is validated by the wrong system. The database is the
| authority on what the schema is
|
| What you describe is DB-first. This Rust library is code-
| first. In code-first, the code is responsible for
| generating DDL statements using what is called a
| "migration", where the library detects changes to the code
| and applies them to the schema.
| shermantanktop wrote:
| Agree. Mid-tier developers who create queries for a SQL db to
| execute are doing the equivalent of using Java code to generate
| HTML. The target is not a programmatic API, it's a language
| that was designed for end users, and it is both more expressive
| and more idiosyncratic than any facade you build in front of
| it.
| api wrote:
| This is close to something I've wanted to see for a long time: a
| language where relational models are first class citizens and the
| engine for data storage is abstracted away.
|
| Tables, joins, etc. should all be representable within the
| language's type system and standard libraries. Connectivity to a
| database should be handled by plugins/implementations of storage
| engines. If no storage engine is used, the relational data just
| lives in RAM like any other variable, collection, etc.
|
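| A toy sketch of that idea in plain Rust (no storage
| engine; names are made up): relations as typed in-memory
| collections, with a join written directly in the language:
|
|       struct User { id: u32, name: String }
|       struct Post { author_id: u32, title: String }
|
|       /// An inner join over two in-RAM "tables".
|       fn posts_with_authors<'a>(users: &'a [User], posts: &'a [Post])
|           -> Vec<(&'a str, &'a str)> {
|           posts.iter()
|               .filter_map(|p| users.iter()
|                   .find(|u| u.id == p.author_id)
|                   .map(|u| (u.name.as_str(), p.title.as_str())))
|               .collect()
|       }
|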
| It's kind of the opposite of an ORM. An ORM tries to bolt a
| relational database into an OOP or other language model. This
| wraps the language around relational database concepts, fixing
| the impedance mismatch in the language rather than with a
| translation layer.
|
| It also means one does not have to reinvent relational concepts
| (badly) in every application with a complex data model even if
| that application is _not_ backed by a database.
| summerlight wrote:
| >My opinion is that SQL should be for computers to write. This
| would put it firmly in the same category as LLVM IR.
|
| Although I partially agree with the high-level idea of
| this statement (let's accept reality: no mature database
| is going to support your shiny new query language anytime
| soon), SQL is not the most convenient language to be
| written by a code generator. Actually, it is a sort of
| nightmare; a seemingly simple plan optimization can
| completely change the layout of the query, so there is a
| decent chance you end up writing a full query-rewrite
| stage just as a workaround for a specific query engine.
| Google's SQL pipe-syntax proposal seems to make this a bit
| better, but it has the same problem as any new query
| language unless it's broadly adopted.
| omani wrote:
| Every time I see something like this, I ask myself why I
| would want or need it over PostgREST.
|
| I mean, why reinvent the wheel? We live in a time where
| the DB itself is a REST API with PostgREST.
|
| https://postgrest.com
| satvikpendem wrote:
| Why would I want to interact with my database over the network
| when I can just...do it locally? It's orders of magnitude
| slower and more wasteful, and it doesn't even provide all
| the guarantees that using actual SQL or an ORM does.
| kccqzy wrote:
| If you are using PostgREST as intended, you don't have an
| application layer any more. It's not something extra you put
| in between the application and the database, but it replaces
| the application.
| satvikpendem wrote:
| Ah so it's like Hasura, it's a BaaS platform. Unfortunately
| I don't like putting all my logic in the client only.
| kccqzy wrote:
| No the logic is in the database. They encourage you to
| write functions to achieve custom logic in the database.
| Search for `create or replace function` in their tutorial
| to see an example.
| satvikpendem wrote:
| Yeah that is also something I don't want to do, after
| having used Hasura at a previous place. After a while,
| the logic gets convoluted when it's all in the database
| only.
| satvikpendem wrote:
| When this was posted on reddit, I saw someone [0] ask how it was
| different from diesel-rs (and diesel-async, the officially
| supported add-on crate, lest someone says diesel doesn't have
| async).
|
| I saw some replies by the diesel maintainer about how the creator
| of this Rust-Query might not have really understood in-depth how
| diesel worked and wanted to write their own solution, and there's
| nothing wrong with that, of course, but this thread might be some
| good context for others asking themselves similar questions.
|
| [0] https://www.reddit.com/r/rust/s/6midd79iDo
| pstoll wrote:
| When a sample showing a new SQL thing has, e.g.:
|
| a) no ID types, e.g. for User, just a string
|
| b) uses i64 for "stars"
|
| c) any other of a million things...
|
| I quickly conclude it's likely a cute toy by someone who wanted
| to write a thing, not ever likely to be a battle-hardened
| effort brought forth by a high performance / scale database
| tortured soul. And I move on.
| rendaw wrote:
| I'm (was? not sure after seeing this) working on a similar
| thing to this, and the most significant difference between this
| and diesel is that this has stronger safety guarantees by
| controlling migrations as well, whereas diesel just trusts the
| database is in the state you claim it is.
| weiznich wrote:
| You get the same safety guarantees with diesel by using
| the [`embed_migrations!`](https://docs.diesel.rs/2.2.x/die
| sel_migrations/macro.embed_m...) macro and running the
| embedded migrations at startup. Diesel decouples these
| because there are situations where you might want to work
| with a database slightly different from the one defined,
| for example if you don't control the database.
| rendaw wrote:
| I don't get the impression that there's any interaction or
| data sharing between the embedded migrations and the query
| generation part, that would validate the queries against
| the schema as defined by the migration -- is that wrong?
| The documentation doesn't seem to mention it at least.
| mijoharas wrote:
| > The actual row numbers are never exposed from the library API.
| Application logic should not need to know about them.
|
| This point confuses me. If we're thinking about a web server,
| you'll pass your data to the frontend with a row ID, so that they
| can refer to and modify that data in another request? How would
| it work otherwise?
|
| Am I missing something? I can think of some use cases, but this
| limitation cuts down on the uses I'd have for this massively.
| twiss wrote:
| I'm guessing they're talking about
| https://www.complexsql.com/rowid-rownum/, i.e. the row numbers
| defined by the database, rather than the value of the ID column
| you typically create in each table.
|
| I.e., it should still be possible to create a column with a
| public identifier by which you can refer to the row, I assume.
| Hytak wrote:
| Sending row IDs to your frontend has two potential problems:
|
| - Row IDs can be reused when the row is deleted. So when the
| frontend sends a message to the backend to modify that data, it
| might accidentally modify a different row that was created
| after the original was deleted.
|
| - You may be leaking the total number of rows (this allows
| detecting when new rows are created etc. which can be
| problematic).
|
| If you have nothing else to uniquely identify a row, then you
| can always create a random unique identifier and put a unique
| constraint on it.
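| A small sketch of that last option (made-up names,
| assuming the uuid crate with the v4 feature): keep the
| internal row ID private and hand the frontend only a
| random public identifier:
|
|       use uuid::Uuid;
|
|       // Hypothetical DDL: public_id is all the frontend ever sees.
|       const DDL: &str =
|           "CREATE TABLE users (
|                id INTEGER PRIMARY KEY,        -- internal, never exposed
|                public_id TEXT NOT NULL UNIQUE
|            )";
|
|       fn new_public_id() -> String {
|           Uuid::new_v4().to_string() // random, non-guessable, never reused
|       }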
| skeptrune wrote:
| Diesel basically being the same as Postgres SQL is an
| awesome feature. I'd much rather have that than an
| abstracted query language which is difficult to
| understand.
| runeks wrote:
| Nit: a DB query can never be made type safe (avoiding runtime
| failure) because its correctness depends on the state of an
| external service, which may change at any point in time.
| rendaw wrote:
| This is awesome, but also kind of hurts since I was working on
| something similar here: https://github.com/andrewbaxter/good-
| ormning
|
| They're very similar: Both define schema version "snapshots" not
| "diffs", with diffing happening automatically, and check queries
| based on the latest defined version, don't require a running
| database, etc.
|
| The main difference seems to be about queries. My approach
| was to try to replicate SQL query structure directly with
| the query builders, rather than abstract away from SQL,
| whereas this appears to aim for something that's less
| similar to the original SQL. I don't consider myself an
| SQL expert, and I've seen SQL abstractions that aren't
| able to replicate all functionality, but after seeing the
| Reddit debate with the diesel author, this seems fairly
| thorough.
|
| FWIW in `good-ormning` I was thinking of getting rid of the query
| builders/my own query ast and having a macro that parses with
| `datafusion-sqlparser-rs` and checks that ast instead, but it's a
| significant change.
|
| FWIW I think SQL is awful too, so I'm really happy to see the
| support here and I hope rust-query takes off.
| dboreham wrote:
| For folks reading who don't have experience in this space: this
| kind of thing is usually a bad idea. On the surface it looks like
| a neat thing, and the people who write these components clearly
| think they're a good thing. But not so much.
___________________________________________________________________
(page generated 2024-12-01 23:01 UTC)