[HN Gopher] Type constraints in 65 lines of SQL
___________________________________________________________________
Type constraints in 65 lines of SQL
Author : oliverrice
Score : 93 points
Date : 2023-02-17 14:31 UTC (8 hours ago)
(HTM) web link (supabase.com)
(TXT) w3m dump (supabase.com)
| oliverrice wrote:
| hi, author here happy to answer any questions
|
| another approach that works great is to use `create type`[1] with
| an `input_function` and `output_function` to build on top of an
| existing scalar type. For example, using that method would allow
| semver to be persisted as a string natively. The only downside to
| that is you have to be superuser.
|
| [1] https://www.postgresql.org/docs/current/sql-createtype.html
| Groxx wrote:
| Is there a way to (reasonably/somewhat ergonomically) describe
| why a thing failed? True/false without reasons is pretty
| painful sometimes.
| mst wrote:
| Why the {1,255} ? Pg's text type is not varchar, it can get a
| lot longer than that and still index just fine, and while I can
| see a restriction on the pre_release stuff -maybe- being
| useful, compilation/build metadata doesn't seem like something
| where an enforced maximum length that low is necessarily
| preferable. (I could see URLs, file paths, repo+path+sha etc.
| type stuff being useful in metadata and not something the code
| recording it can really control the length of - it may not do
| any harm in practice at least for a while, but I don't
| immediately see what -good- the restriction does, albeit I'm
| perfectly willing to be told "here's the good thing about it
| that you didn't think of" ;)
| iddan wrote:
| Why are these practices not common with web developers? What's
| held them back in recent years using it?
| btown wrote:
| IMO the lack of a Terraform-esque ecosystem of tools around
| declaratively managing database-objects-as-infrastructure,
| and understanding how a rollout of a change would be planned,
| has historically been a big issue. If I'm choosing where in
| the stack to declare some new business logic or type
| constraint, I can trivially check that into Git if it's
| running on an application server, and most frameworks' ORMs
| can handle column-level migrations. And if I make a mistake,
| I can push a change that declaratively says what my function
| signature or DDL should be; it's one of the first things
| taught to new developers. Terraform extended this mentality
| to infrastructure, and it was gamechanging.
|
| But there are far fewer tutorials and best practices on how
| to, say, maintain a library of Postgres functions, types, and
| stored procedures that can be iterated on. I'd venture that
| most people have no idea how powerful their databases can be.
| dventimi wrote:
| I blame the Apple II computer.
| heywhatupboys wrote:
| no to forget: Turing's seminal paper
| RedShift1 wrote:
| I think most webdevs see databases as dumb datastores.
| weird-eye-issue wrote:
| Way easier handling this in the application code. If I had a
| developer join my team then commit this into the codebase
| rather than just using the ORM that everyone else is familiar
| with they would be fired UNLESS there was a very good reason
| for doing so
| thor_molecules wrote:
| So in this case, you've let a new dev commit to a code base
| without review.
|
| Then, instead of informing them of best practices and
| giving them time to re-work the code, you'd just fire them?
|
| Interesting strategy.
| oliverrice wrote:
| Agreed that there should be a data integrity or custom
| operation/filter/sorting requirement to motivate a user
| defined type
| cgh wrote:
| Easier, slower, more error-prone. Also, you should consider
| implementing code reviews rather than just firing people
| who commit code you don't like.
| ericHosick wrote:
| > using the ORM that everyone else is familiar with
|
| A relational database isn't always fronted by the same
| tech. You have RESTful APIs, GraphQL APIs, Low Code/No Code
| Solutions (like Retool), reporting tools, ETL systems, etc.
|
| You have company mergers where that one familiar ORM is now
| five to six unfamiliar ORMs.
|
| Ensuring data integrity is key and easiest way to do that
| is to push constraints as close to the persistence layer as
| possible.
| oliverrice wrote:
| I think there are a few things:
|
| 1) Its only been a few years since the trend of testing on
| sqlite locally and running postgres in production went away
| in favor of postgres everywhere (probably thanks to docker).
| That prevented using any feature that wasn't equally
| supported by both.
|
| 2) There's definitely a knowledge gap, and not just among
| developers. The features are most useful for building rich
| applications, so even DBAs didn't have much incentive to use
| them prior to tools like supabase using the database as the
| data model source-of-truth.
|
| 3) Companies are increasingly deciding that data is their
| competitive advantage and interest in data integrity is
| growing. Database constraints are unparalleled at that
| because they can't be sidestepped
| nickpeterson wrote:
| I often wonder about this kind of solution compared with a more
| traditional solution of using a table of columns and constraints.
| There seems like there should be an obvious heuristic of when to
| use one or the other but it eludes me.
| oliverrice wrote:
| Definitely. For the use-cases shown in the blog post, a table
| would be equally valid (and easier). The heuristic I use is to
| create custom types when the values will need to be used by SQL
| functions, or if support for things like ordering/aggregating
| are important to the application, and the rules to support
| those things require custom logic.
|
| For example, SemVer requires that two version that differ only
| in metadata e.g. `1.0.0+metaA` and `1.0.0+metaB` should be
| considered equal. It would be error prone and tedious to push
| that logic onto every query that wanted to sort the table, but
| with a type we can define the logic once and have it work
| everywhere.
| ericHosick wrote:
| A good heuristic to consider when designing composite types is
| whether any part of the type would lead to redundant storage.
| For example, a currency composite type might consist of an
| amount field and a type field.
|
| In most cases, a single invoice, order, deal, etc. is unlikely
| to be generated using multiple currencies. Therefore, a single
| currency_type field in the invoice table would be sufficient.
|
| If we used a composite currency type, the currency_type field
| would be highly redundant.
| skissane wrote:
| > If we used a composite currency type, the currency_type
| field would be highly redundant.
|
| It can help prevent the bug in which someone adds up multiple
| currencies without conversion.
|
| "25 USD + 25 EUR" will either fail or work via some currency
| conversion routine.
|
| "25 + 25" will produce a meaningless and useless wrong
| answer.
| doctor_eval wrote:
| Yes, this is exactly what I've done previously. For
| example, I implemented a sum() aggregate function that took
| "monetary" (currency + value) composite types and raised an
| exception if the currencies were not all the same.
| enugu wrote:
| Any idea of how to implement tagged unions in Postgres. For
| example, Tree a = Leaf a | Branch (Tree a, Tree a)
|
| This is one data type feature which would be great to have. I
| know you can create separate tables for each option in the type
| and use an id, but is a direct type implementation possible?
|
| Dont need polymorphism(say a = String). Even a non recursive
| tagged union would be helpful.
| csnweb wrote:
| While that might not fit your needs I map unions to Postgres
| with an enum encoding the type and then adding constraints on
| which additional columns must be set (and also must not be set)
| depending on the type. For your example the enum might be leaf
| or tree and then if it's of type tree you have non nullable
| left_child and right_child columns and if it's a leaf those
| must be null, and then you might have a value column that is
| non null in both cases or only for leaves.
| oliverrice wrote:
| One of the restrictions of composite types is that they can not
| contain an instance of themselves. So unfortunately, this is
| not currently possible.
|
| I had this issue when trying to implement an AST type for
| pg_graphql[1] back when it was written in SQL [2]. In the end
| we used a JSON type which was much less constrained. That might
| be solvable using pg_jsonschema [3] if you really wanted to
| have a good time though
|
| [1] https://github.com/supabase/pg_graphql
|
| [2]
| https://github.com/supabase/pg_graphql/blob/34cc266da972d356...
|
| [3] https://github.com/supabase/pg_jsonschema
| nextaccountic wrote:
| What's your opinion between jsonb vs hstore columns?
| https://www.postgresql.org/docs/15/hstore.html
| nextaccountic wrote:
| Actually, another thing. JSON works. But what about doing
| pattern matching on the tagged enum? Each variant should give
| access to its fields
|
| Can https://www.postgresql.org/docs/15/functions-
| conditional.htm... be used to do it?
| jerf wrote:
| It depends on your tolerance for the exact feature set that you
| get. The JSON type can be pressed into service to get something
| that represents that, in the sense that what you are asking for
| is a subset of the valid JSON values that Postgres can store.
| If you just want to slam the values into some local tagged
| union in your language when querying them, then problem
| effectively solved. If you want to start operating on it in the
| database itself, you have a lot more quirky things to work
| through. If you want the semantics _inside_ Postgres to be
| _exactly_ what a tagged union /sum type looks like then I think
| you'd be looking at a custom data type at a minimum, and even
| then there are probably limits as to how close you could get
| inside.
| nextaccountic wrote:
| This is my pet peeve with SQL. I hope some day PostgreSQL gain
| support for native tagged unions.
|
| I have saved some links about this
|
| https://github.com/solidsnack/pg-sql-variants
|
| https://thelyfsoshort.io/variant-types-in-postgresql-c63725f...
|
| https://www.parsonsmatt.org/2019/03/19/sum_types_in_sql.html
|
| https://typeable.io/blog/2019-11-21-sql-sum-types.html
|
| Right now I think the ideal is a jsonb column (with a field
| that stores the tag) plus
| https://github.com/supabase/pg_jsonschema. But this is only
| usable if your language translates to/from JSON and generates
| json schema for you.
|
| Rust can do this with https://serde.rs/ and
| https://docs.rs/schemars/latest/schemars/
___________________________________________________________________
(page generated 2023-02-17 23:00 UTC)