[HN Gopher] Type constraints in 65 lines of SQL
       ___________________________________________________________________
        
       Type constraints in 65 lines of SQL
        
       Author : oliverrice
       Score  : 93 points
       Date   : 2023-02-17 14:31 UTC (8 hours ago)
        
 (HTM) web link (supabase.com)
 (TXT) w3m dump (supabase.com)
        
       | oliverrice wrote:
       | hi, author here happy to answer any questions
       | 
       | another approach that works great is to use `create type`[1] with
       | an `input_function` and `output_function` to build on top of an
       | existing scalar type. For example, using that method would allow
       | semver to be persisted as a string natively. The only downside to
       | that is you have to be superuser.
       | 
       | [1] https://www.postgresql.org/docs/current/sql-createtype.html
        
         | Groxx wrote:
         | Is there a way to (reasonably/somewhat ergonomically) describe
         | why a thing failed? True/false without reasons is pretty
         | painful sometimes.
        
         | mst wrote:
         | Why the {1,255} ? Pg's text type is not varchar, it can get a
         | lot longer than that and still index just fine, and while I can
         | see a restriction on the pre_release stuff -maybe- being
         | useful, compilation/build metadata doesn't seem like something
         | where an enforced maximum length that low is necessarily
         | preferable. (I could see URLs, file paths, repo+path+sha etc.
         | type stuff being useful in metadata and not something the code
         | recording it can really control the length of - it may not do
         | any harm in practice at least for a while, but I don't
         | immediately see what -good- the restriction does, albeit I'm
         | perfectly willing to be told "here's the good thing about it
         | that you didn't think of" ;)
        
         | iddan wrote:
         | Why are these practices not common with web developers? What's
         | held them back in recent years using it?
        
           | btown wrote:
           | IMO the lack of a Terraform-esque ecosystem of tools around
           | declaratively managing database-objects-as-infrastructure,
           | and understanding how a rollout of a change would be planned,
           | has historically been a big issue. If I'm choosing where in
           | the stack to declare some new business logic or type
           | constraint, I can trivially check that into Git if it's
           | running on an application server, and most frameworks' ORMs
           | can handle column-level migrations. And if I make a mistake,
           | I can push a change that declaratively says what my function
           | signature or DDL should be; it's one of the first things
           | taught to new developers. Terraform extended this mentality
           | to infrastructure, and it was gamechanging.
           | 
           | But there are far fewer tutorials and best practices on how
           | to, say, maintain a library of Postgres functions, types, and
           | stored procedures that can be iterated on. I'd venture that
           | most people have no idea how powerful their databases can be.
        
           | dventimi wrote:
           | I blame the Apple II computer.
        
             | heywhatupboys wrote:
             | no to forget: Turing's seminal paper
        
           | RedShift1 wrote:
           | I think most webdevs see databases as dumb datastores.
        
           | weird-eye-issue wrote:
           | Way easier handling this in the application code. If I had a
           | developer join my team then commit this into the codebase
           | rather than just using the ORM that everyone else is familiar
           | with they would be fired UNLESS there was a very good reason
           | for doing so
        
             | thor_molecules wrote:
             | So in this case, you've let a new dev commit to a code base
             | without review.
             | 
             | Then, instead of informing them of best practices and
             | giving them time to re-work the code, you'd just fire them?
             | 
             | Interesting strategy.
        
             | oliverrice wrote:
             | Agreed that there should be a data integrity or custom
             | operation/filter/sorting requirement to motivate a user
             | defined type
        
             | cgh wrote:
             | Easier, slower, more error-prone. Also, you should consider
             | implementing code reviews rather than just firing people
             | who commit code you don't like.
        
             | ericHosick wrote:
             | > using the ORM that everyone else is familiar with
             | 
             | A relational database isn't always fronted by the same
             | tech. You have RESTful APIs, GraphQL APIs, Low Code/No Code
             | Solutions (like Retool), reporting tools, ETL systems, etc.
             | 
             | You have company mergers where that one familiar ORM is now
             | five to six unfamiliar ORMs.
             | 
             | Ensuring data integrity is key and easiest way to do that
             | is to push constraints as close to the persistence layer as
             | possible.
        
           | oliverrice wrote:
           | I think there are a few things:
           | 
           | 1) Its only been a few years since the trend of testing on
           | sqlite locally and running postgres in production went away
           | in favor of postgres everywhere (probably thanks to docker).
           | That prevented using any feature that wasn't equally
           | supported by both.
           | 
           | 2) There's definitely a knowledge gap, and not just among
           | developers. The features are most useful for building rich
           | applications, so even DBAs didn't have much incentive to use
           | them prior to tools like supabase using the database as the
           | data model source-of-truth.
           | 
           | 3) Companies are increasingly deciding that data is their
           | competitive advantage and interest in data integrity is
           | growing. Database constraints are unparalleled at that
           | because they can't be sidestepped
        
       | nickpeterson wrote:
       | I often wonder about this kind of solution compared with a more
       | traditional solution of using a table of columns and constraints.
       | There seems like there should be an obvious heuristic of when to
       | use one or the other but it eludes me.
        
         | oliverrice wrote:
         | Definitely. For the use-cases shown in the blog post, a table
         | would be equally valid (and easier). The heuristic I use is to
         | create custom types when the values will need to be used by SQL
         | functions, or if support for things like ordering/aggregating
         | are important to the application, and the rules to support
         | those things require custom logic.
         | 
         | For example, SemVer requires that two version that differ only
         | in metadata e.g. `1.0.0+metaA` and `1.0.0+metaB` should be
         | considered equal. It would be error prone and tedious to push
         | that logic onto every query that wanted to sort the table, but
         | with a type we can define the logic once and have it work
         | everywhere.
        
         | ericHosick wrote:
         | A good heuristic to consider when designing composite types is
         | whether any part of the type would lead to redundant storage.
         | For example, a currency composite type might consist of an
         | amount field and a type field.
         | 
         | In most cases, a single invoice, order, deal, etc. is unlikely
         | to be generated using multiple currencies. Therefore, a single
         | currency_type field in the invoice table would be sufficient.
         | 
         | If we used a composite currency type, the currency_type field
         | would be highly redundant.
        
           | skissane wrote:
           | > If we used a composite currency type, the currency_type
           | field would be highly redundant.
           | 
           | It can help prevent the bug in which someone adds up multiple
           | currencies without conversion.
           | 
           | "25 USD + 25 EUR" will either fail or work via some currency
           | conversion routine.
           | 
           | "25 + 25" will produce a meaningless and useless wrong
           | answer.
        
             | doctor_eval wrote:
             | Yes, this is exactly what I've done previously. For
             | example, I implemented a sum() aggregate function that took
             | "monetary" (currency + value) composite types and raised an
             | exception if the currencies were not all the same.
        
       | enugu wrote:
       | Any idea of how to implement tagged unions in Postgres. For
       | example, Tree a = Leaf a | Branch (Tree a, Tree a)
       | 
       | This is one data type feature which would be great to have. I
       | know you can create separate tables for each option in the type
       | and use an id, but is a direct type implementation possible?
       | 
       | Dont need polymorphism(say a = String). Even a non recursive
       | tagged union would be helpful.
        
         | csnweb wrote:
         | While that might not fit your needs I map unions to Postgres
         | with an enum encoding the type and then adding constraints on
         | which additional columns must be set (and also must not be set)
         | depending on the type. For your example the enum might be leaf
         | or tree and then if it's of type tree you have non nullable
         | left_child and right_child columns and if it's a leaf those
         | must be null, and then you might have a value column that is
         | non null in both cases or only for leaves.
        
         | oliverrice wrote:
         | One of the restrictions of composite types is that they can not
         | contain an instance of themselves. So unfortunately, this is
         | not currently possible.
         | 
         | I had this issue when trying to implement an AST type for
         | pg_graphql[1] back when it was written in SQL [2]. In the end
         | we used a JSON type which was much less constrained. That might
         | be solvable using pg_jsonschema [3] if you really wanted to
         | have a good time though
         | 
         | [1] https://github.com/supabase/pg_graphql
         | 
         | [2]
         | https://github.com/supabase/pg_graphql/blob/34cc266da972d356...
         | 
         | [3] https://github.com/supabase/pg_jsonschema
        
           | nextaccountic wrote:
           | What's your opinion between jsonb vs hstore columns?
           | https://www.postgresql.org/docs/15/hstore.html
        
           | nextaccountic wrote:
           | Actually, another thing. JSON works. But what about doing
           | pattern matching on the tagged enum? Each variant should give
           | access to its fields
           | 
           | Can https://www.postgresql.org/docs/15/functions-
           | conditional.htm... be used to do it?
        
         | jerf wrote:
         | It depends on your tolerance for the exact feature set that you
         | get. The JSON type can be pressed into service to get something
         | that represents that, in the sense that what you are asking for
         | is a subset of the valid JSON values that Postgres can store.
         | If you just want to slam the values into some local tagged
         | union in your language when querying them, then problem
         | effectively solved. If you want to start operating on it in the
         | database itself, you have a lot more quirky things to work
         | through. If you want the semantics _inside_ Postgres to be
         | _exactly_ what a tagged union /sum type looks like then I think
         | you'd be looking at a custom data type at a minimum, and even
         | then there are probably limits as to how close you could get
         | inside.
        
         | nextaccountic wrote:
         | This is my pet peeve with SQL. I hope some day PostgreSQL gain
         | support for native tagged unions.
         | 
         | I have saved some links about this
         | 
         | https://github.com/solidsnack/pg-sql-variants
         | 
         | https://thelyfsoshort.io/variant-types-in-postgresql-c63725f...
         | 
         | https://www.parsonsmatt.org/2019/03/19/sum_types_in_sql.html
         | 
         | https://typeable.io/blog/2019-11-21-sql-sum-types.html
         | 
         | Right now I think the ideal is a jsonb column (with a field
         | that stores the tag) plus
         | https://github.com/supabase/pg_jsonschema. But this is only
         | usable if your language translates to/from JSON and generates
         | json schema for you.
         | 
         | Rust can do this with https://serde.rs/ and
         | https://docs.rs/schemars/latest/schemars/
        
       ___________________________________________________________________
       (page generated 2023-02-17 23:00 UTC)