[HN Gopher] My notes on Gitlab's Postgres schema design (2022)
___________________________________________________________________
My notes on Gitlab's Postgres schema design (2022)
Author : daigoba66
Score : 437 points
Date : 2024-02-17 22:05 UTC (1 day ago)
(HTM) web link (shekhargulati.com)
(TXT) w3m dump (shekhargulati.com)
| kelnos wrote:
| > _For example, Github had 128 million public repositories in
| 2020. Even with 20 issues per repository it will cross the serial
| range. Also changing the type of the table is expensive._
|
| I expect the majority of those public repositories are forks of
| other repositories, and those forks only exist so someone could
| create pull requests against the main repository. As such, they
| won't ever have any issues, unless someone makes a mistake.
|
| Beyond that, there are probably a lot of small, toy projects that
| have no issues at all, or at most a few. Quickly-abandoned
| projects will suffer the same fate.
|
| I suspect that even though there are certainly some projects with
| hundreds and thousands of issues, the average across all 128M of
| those repos is likely pretty small, probably keeping things well
| under the 2B limit.
|
| Having said that, I agree that using a 4-byte type (well, 31-bit,
| really) for that table is a ticking time bomb for some orgs,
| github.com included.
| justinclift wrote:
| > github.com included
|
| Typo?
| rapfaria wrote:
| _> Having said that, I agree that using a 4-byte type (well,
| 31-bit, really) for that table is a ticking time bomb for some
| orgs_
|
| A bomb defused in a migration that takes eleven seconds
| aeyes wrote:
| The migration has to rewrite the whole table: bigint needs 8
| bytes, so you have to make room for that.
|
| I have done several such primary key migrations on tables
| with 500M+ records; they took anywhere from 30 to 120 minutes
| depending on the number of columns and indexes. If you have
| foreign keys it can be even longer.
|
| Edit: But there is another option which is logical
| replication. Change the type on your logical replica, then
| switch over. This way the downtime can be reduced to minutes.
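|
| A minimal sketch of that switchover (assuming a table named
| issues; logical replication happily applies int values into a
| wider bigint column on the subscriber):
|
|     -- on the primary
|     CREATE PUBLICATION pk_migration FOR TABLE issues;
|
|     -- on the replica, which already has the widened column
|     ALTER TABLE issues ALTER COLUMN id TYPE bigint;
|     CREATE SUBSCRIPTION pk_migration_sub
|       CONNECTION 'host=primary dbname=app'
|       PUBLICATION pk_migration;
|
|     -- once lag is zero: stop writes on the primary, wait for
|     -- the replica to catch up, then point the app at it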
| winrid wrote:
| This largely depends on the disk. I wouldn't expect that to
| take 30mins on a modern NVME drive, but of course it
| depends on table size.
| aeyes wrote:
| Disk wasn't the limit in my case; index creation is
| single-threaded.
| winrid wrote:
| Which is still very IO bound... Wonder what kinda IOPS
| you were observing? Also they make pretty fast CPUs these
| days :)
| scient wrote:
| Large tables take hours, if not days. I attempted a test
| case on AWS using souped up io2 disks (the fastest most
| expensive disks they have) and a beast of a DB server
| (r5.12xl I think) and it became abundantly clear that at
| certain scale you won't be doing any kind of in-place
| table updates like that on the system. Especially if your
| allowed downtime is one hour maintenance window per
| week...
| aeyes wrote:
| I did it on a r6.24xlarge RDS instance and the CPU wasn't
| doing much during the operation. IO peaked at 40k IOPS on
| EBS with provisioned IOPS, I'm not sure if a local disk
| would be any faster but I already know that rewriting the
| table and creating the indexes are all single threaded so
| there isn't much you could gain.
|
| Once I got the logical replication setup to work I
| changed 20 tables on the replica and made the switch with
| 15 minutes of downtime. That saved me a lot of long
| nights.
|
| You can get an idea of how long this could take by
| running pg_repack, it's basically doing the same thing:
| Copying the data and recreating all the indexes.
| semiquaver wrote:
| In practice the only option that I've seen work for very
| large teams and very large relational databases is online
| schema change tools like https://github.com/shayonj/pg-osc
| and https://github.com/github/gh-ost (the latter developed
| for GitHub's monolith). It's just too difficult to model
| what migrations will cause problems under load. Using a
| binlog/shadowtable approach for all migrations mostly
| obviates the problem.
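|
| The shadow-table idea, very roughly, in plain SQL (not what
| pg-osc literally runs, just the shape of it, with an issues
| table as a stand-in):
|
|     -- 1. copy of the table with the new definition
|     CREATE TABLE issues_new (LIKE issues INCLUDING ALL);
|     ALTER TABLE issues_new ALTER COLUMN id TYPE bigint;
|
|     -- 2. mirror new writes while the backfill runs
|     --    (only the INSERT path shown; UPDATE/DELETE need
|     --    the same treatment)
|     CREATE FUNCTION mirror_issues() RETURNS trigger AS $$
|     BEGIN
|       INSERT INTO issues_new SELECT (NEW).*;
|       RETURN NEW;
|     END;
|     $$ LANGUAGE plpgsql;
|     CREATE TRIGGER issues_mirror AFTER INSERT ON issues
|       FOR EACH ROW EXECUTE FUNCTION mirror_issues();
|
|     -- 3. backfill old rows in batches, then swap the names
|     --    in one short transaction
|     ALTER TABLE issues RENAME TO issues_old;
|     ALTER TABLE issues_new RENAME TO issues;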
| CubsFan1060 wrote:
| You can also migrate it using logical replication.
| scient wrote:
| Rails migrations really fail to be viable at scale. And
| it's not really because of the migrations in Rails, it's
| because changes in PostgreSQL get very very expensive.
| Things have gotten better in PG 15, but it's still not
| quite there yet.
|
| I've run into an integer PK being almost exhausted in
| values twice now. In one of these cases I used pg-osc to
| rewrite an 8TB table on disk without downtime over a
| period of a week to fix the issue. In the other case it
| was about 6TB within 2 days. It's doable, but the amount
| of planning and time it takes is non-trivial.
|
| I have so far not found any other viable solution.
| I keep hearing about logical replication but I haven't
| seen a great real-world solution at scale so far.
| tengbretson wrote:
| In JavaScript land, postgres bigints deserialize as strings.
| Is your application resilient to this? Are your downstream
| customers ready to handle that sort of schema change?
|
| Running the db migration is the easy part.
| Rapzid wrote:
| Depends on the lib. Max safe int size is like 9
| quadrillion. You can safely deserialize serial bigints to
| this without ever worrying about hitting that limit in many
| domains.
| duskwuff wrote:
| > Max safe int size is like 9 quadrillion.
|
| 2^53, to be precise. If your application involves
| assigning a unique identifier to every ant on the planet
| Earth (approx. 10^15 ≈ 2^50), you might need to think
| about this. Otherwise, I wouldn't worry about it.
| jameshart wrote:
| 11 seconds won't fix all your foreign keys. Nor all the code
| written against it that assumes an int type will accommodate
| the value.
| zX41ZdbW wrote:
| It is still under the limit today with 362,107,148 repositories
| and 818,516,506 unique issues and pull requests:
|
| https://play.clickhouse.com/play?user=play#U0VMRUNUIHVuaXEoc...
| nly wrote:
| That query took a long time
| zx8080 wrote:
| Elapsed: 12.618 sec, read 7.13 billion rows, 42.77 GB
|
| This is too long, seems the ORDER BY is not set up correctly
| for the table.
| zx8080 wrote:
| Also,
|
| > `repo_name` LowCardinality(String),
|
| This is not a low cardinality:
|
| 7133122498 = 7.1B
|
| Don't use low cardinality for such columns!
| ssalka wrote:
| I'm guessing this won't be including issues & PRs from
| private repos, which could be substantial
| mvdtnz wrote:
| Do we know for sure if gitlab cloud uses a multi-tenanted
| database, or a db per user/customer/org? In my experience
| products that offer both a self hosted and cloud product tend
| to prefer a database per customer, as this greatly simplifies
| the shared parts of the codebase, which can use the same
| queries regardless of the hosting type.
|
| If they use a db per customer then no one will ever approach
| those usage limits and if they do they would be better suited
| to a self hosted solution.
| Maxion wrote:
| I've toyed with various SaaS designs and multi-tenanted
| databases always come to the forefront of my mind. It seems to
| simplify the architecture a lot.
| tcnj wrote:
| Unless something has substantially changed since I last
| checked, gitlab.com is essentially self-hosted gitlab
| ultimate with a few feature flags to enable some marginally
| different behaviour. That is, it uses one multi-tenant DB for
| the whole platform.
| karolist wrote:
| Not according to [1] where the author said
|
| > This effectively results in two code paths in many parts
| of your platform: one for the SaaS version, and one for the
| self-hosted version. Even if the code is physically the
| same (i.e. you provide some sort of easy to use wrapper for
| self-hosted installations), you still need to think about
| the differences.
|
| 1. https://yorickpeterse.com/articles/what-it-was-like-
| working-...
| golergka wrote:
| Being two orders of magnitude away from running out of ids is
| too close for comfort anyway.
| iurisilvio wrote:
| Migrating primary keys from int to bigint is feasible. Requires
| some preparation and custom code, but zero downtime.
|
| I'm managing a big migration following mostly this recipe, with
| a few tweaks: http://zemanta.github.io/2021/08/25/column-
| migration-from-in...
|
| FKs, indexes and constraints in general make the process more
| difficult, but possible. The data migration took some hours in
| my case, but no need to be fast.
|
| AFAIK GitLab has tooling to run tasks after upgrade to make it
| work anywhere in a version upgrade.
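|
| The rough shape of that recipe (simplified; the linked post
| covers the FK and constraint handling properly):
|
|     ALTER TABLE issues ADD COLUMN id_bigint bigint;
|
|     -- keep new writes in sync
|     CREATE FUNCTION copy_issue_id() RETURNS trigger AS $$
|     BEGIN
|       NEW.id_bigint := NEW.id;
|       RETURN NEW;
|     END;
|     $$ LANGUAGE plpgsql;
|     CREATE TRIGGER issues_copy_id BEFORE INSERT OR UPDATE
|       ON issues FOR EACH ROW EXECUTE FUNCTION copy_issue_id();
|
|     -- backfill old rows in small batches
|     UPDATE issues SET id_bigint = id
|     WHERE id BETWEEN 1 AND 100000;  -- repeat per range
|
|     -- then: CREATE UNIQUE INDEX CONCURRENTLY on id_bigint,
|     -- and swap it in as the new primary key plus column
|     -- rename inside one short transaction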
| istvanu wrote:
| I'm convinced that GitHub's decision to move away from Rails
| was partly influenced by a significant flaw in ActiveRecord:
| its lack of support for composite primary keys. The need for
| something as basic as PRIMARY KEY(repo_id, issue_id) becomes
| unnecessarily complex within ActiveRecord, forcing developers
| to use workarounds that involve a unique key alongside a
| singular primary key column to meet ActiveRecord's requirements
| --a less than ideal solution.
|
| Moreover, the use of UUIDs as primary keys, while seemingly a
| workaround, introduces its own set of problems. Despite
| adopting UUIDs, the necessity for a unique constraint on the
| (repo_id, issue_id) pair persists to ensure data integrity, but
| this significantly increases the database size, leading to
| substantial overhead. This is a major trade-off with potential
| repercussions on your application's performance and
| scalability.
|
| This brings us to a broader architectural concern with Ruby on
| Rails. Despite its appeal for rapid development cycles, Rails'
| application-level enforcement of the Model-View-Controller
| (MVC) pattern, where there is a singular model layer, a
| singular controller layer, and a singular view layer, is
| fundamentally flawed. This monolithic approach to MVC will
| inevitably lead to scalability and maintainability issues as
| the application grows. The MVC pattern would be more
| effectively applied within modular or component-based
| architectures, allowing for better separation of concerns and
| flexibility. The inherent limitations of Rails, especially in
| terms of its rigid MVC architecture and database management
| constraints, are significant barriers for any project beyond
| the simplest MVPs, and these are critical factors to consider
| before choosing Rails for more complex applications.
| zachahn wrote:
| I can't really comment on GitHub, but Rails supports
| composite primary keys as of Rails 7.1, the latest released
| version [1].
|
| About modularity, there are projects like Mongoid which can
| completely replace ActiveRecord. And there are plugins for
| the view layer, like "jbuilder" and "haml", and we can bypass
| the view layer completely by generating/sending data inside
| controller actions. But fair, I don't know if we can
| completely replace the view and controller layers.
|
| I know I'm missing your larger point about architecture! I
| don't have so much to say, but I agree I've definitely worked
| on some hard-to-maintain systems. I wonder if that's an
| inevitability of Rails or an inevitability of software
| systems--though I'm sure there are exceptional codebases out
| there somewhere!
|
| [1] https://guides.rubyonrails.org/7_1_release_notes.html#com
| pos...
| silasb wrote:
| Do you have any sources on GitHub moving away from Rails?
| This is the first that I've heard and my googlefu has
| returned zero results around this. Just last year they had a
| blog post around Building GitHub with Ruby and Rails[0] so
| your remark caught me off guard.
|
| [0]: https://github.blog/2023-04-06-building-github-with-
| ruby-and...
| modderation wrote:
| Perhaps too late, but Rails 7.1[1] introduced composite
| primary key support, and there's been a third-party gem[2]
| offering the functionality for earlier versions of
| ActiveRecord.
|
| [1] https://guides.rubyonrails.org/7_1_release_notes.html#com
| pos...
|
| [2] https://github.com/composite-primary-
| keys/composite_primary_...
| martinald wrote:
| Is it just me that thinks in general schema design and
| development is stuck in the stone ages?
|
| I mainly know dotnet stuff, which does have migrations in EF (I
| note the point about gitlab not using this kind of thing because
| of database compatibility). It can warn about potential data
| loss when generating migrations.
|
| However, it still is always quite scary doing migrations,
| especially bigger ones refactoring something. Throw into this
| jsonb columns and I feel it is really easy to screw things up and
| suffer bad data loss.
|
| For example, renaming a column (at least in EF) will result in a
| column drop and column create on the autogenerated migrations.
| Why can't I give the compiler/migration tool more context on this
| easily?
|
| Also the point about external IDs and internal IDs - why can't
| the database/ORM do this more automatically?
|
| I feel there really hasn't been much progress on this since
| migration tooling came around 10+ years ago. I know ORMs are
| leaky abstractions, but I feel everyone reinvents this stuff
| themselves and every project does these common things a different
| way.
|
| Are there any tools people use for this?
| sjwhevvvvvsj wrote:
| One thing I like about hand designing schema is it makes you
| sit down and make very clear choices about what your data is,
| how it interrelates, and how you'll use it. You understand your
| own goals more clearly.
| nly wrote:
| Exactly that. Sitting down and thinking about your data
| structures and APIs before you start writing code seems to be
| a fading skill.
| EvanAnderson wrote:
| It absolutely shows in the final product, too.
|
| I wish more companies evaluated the suitability of software
| based on reviewing the back-end data storage schema. A lot
| of sins can be hidden in the application layer but many
| become glaringly obvious when you look at how the data is
| represented and stored.
| doctor_eval wrote:
| So many people I encounter seem to think it's the code that's
| important when building the back end of an application. You
| see this when people discussing database schemas start
| comparing, say, rails to hibernate. But ORMs emphasise code
| instead of data, which in my experience is a big mistake.
|
| In my experience, getting the data structures right is 99% of
| the battle. If you get that right, the code that follows is
| simple and obvious.
|
| For database applications, this means getting the schema
| right. To this end, I always start with the underlying table
| structures, and only start coding once I understand how the
| various tables are going to interact.
|
| Sadly, too many people think of the database as the annoying
| hoops we jump through in order to store the results of our
| code. In my world, the code I write is the minimum required
| to safely manipulate the database; it's the data that counts.
|
| Some people seem to think I'm weird for starting with the
| database (and for using plpgsql), but I think it's actually a
| superpower.
| jupp0r wrote:
| This is true for in memory data as well. Object oriented
| programming is great for some problems, but it's also
| limiting the way we think about data by putting it close to
| the code operating on it. ORMs do the same to databases by
| pretending that rows are objects when that's only one way
| of modeling your problem.
| sjwhevvvvvsj wrote:
| Yup - and you can't code your way to real scale either. At
| real scale the game is all about data structures. Code just
| gets them from A to B.
|
| Or as they say at Google, the job of SWE is "moving
| protos".
| jimbokun wrote:
| It's part of the dirty little secret of why document
| databases and other NoSQL systems became popular.
|
| Required even less up front thinking about how to model
| your data. Throw some blobs of JSON into Mongo or whatever,
| and worry about the rest later.
| thom wrote:
| It wasn't always code first - you mention Hibernate but
| 15-20 years ago it was entirely feasible to inherit a
| database schema or design one up front, and then create
| performant metadata mappings to a usable object model. That
| sort of practice was tainted by the Bad Enterprise brushes
| of verbosity and XML in general, and so everyone moved to
| some flavour of active record. This allowed programmers to
| write less code and fit it into neater boxes, at a time
| when there was enormous demand for basic CRUD web apps, but
| a lot of power and expressiveness was lost.
|
| Somewhat ironically, many modern enterprises have peeled
| all the way back to SQL for a huge amount of logic anyway,
| so I don't think we're done caring about database schemas
| quite yet.
| doctor_eval wrote:
| You are absolutely correct; my previous business ended up
| ejecting almost all database logic from Java ORMs, and
| moved almost everything to straight SQL manipulated by
| stored procedures. Doing so resulted in a significant
| performance and productivity increase, relative to all
| the layers of nonsense we used to jump through.
|
| One of the difficulties of doing this was that the
| tooling isn't great. We had to write our own tools to
| make it work, but the benefits of going back to SQL were
| more than worth it.
|
| (Many years later I made an open source version of that
| tooling, https://github.com/pgpkg/pgpkg)
| sjwhevvvvvsj wrote:
| A simple function call as a wrapper for a well optimized
| sql query just can't be beat for performance. I have
| never understood why anybody would use an ORM, it's
| usually as much work to learn and maintain them as SQL.
| cschmatzler wrote:
| If you use MySQL, Planetscale's branching is really amazing.
| Not using them, but wish I could for that. Gives you a complete
| diff of what you're doing, and can also pre-plan migrations and
| only apply them when you need with their gating.
| Atotalnoob wrote:
| EF Core doesn't do drop/create for columns in db providers that
| support renaming columns. It only does that for ones that
| don't, like MySQL or SQLite.
| winrid wrote:
| Not a silver bullet for every project but the Django ORM
| largely solves this with its migrations. You define your table
| classes and it just generates the migrations.
|
| Throw in a type checker and you're in pretty good shape.
|
| Rust also has sqlx which will type check your code against the
| DB.
| dxdm wrote:
| I'm assuming this is why you say it's not a silver bullet,
| but to make it more explicit: the Django ORM will happily
| generate migrations that will lock crucial tables for long
| amounts of time and bring down your production application in
| the process.
|
| You still need to know what SQL the migration will run (take
| a look at `manage.py sqlmigrate`) and most importantly how
| your database will apply it.
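|
| For example (generic Postgres, nothing Django-specific here),
| the kind of statement you want to catch in review is
|
|     -- full table rewrite under an ACCESS EXCLUSIVE lock
|     ALTER TABLE orders ALTER COLUMN id TYPE bigint;
|
| and at minimum you want a guard so a blocked DDL statement
| gives up instead of queueing behind long transactions and
| stalling every other query:
|
|     SET lock_timeout = '5s';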
| basil-rash wrote:
| Dealing with a bunch of automigrate headaches in the Prisma
| ORM convinced me to just drop the layer entirely and write
| plain old SQL everywhere. It's forced me to learn a bunch
| of new stuff, but the app runs faster now that I can
| optimize every query and migrations are much simpler with a
| single idempotent SQL setup script I can run to provision
| whatever deployment of the DB I need. I'm sure some problem
| spaces might benefit from all the additional complexity and
| abstraction, but the average app certainly can make do
| without for a long time.
| winrid wrote:
| It's a tradeoff! I think using the ORM to start and then
| move off it later is valid, depending on how much time
| you have to get an MVP out.
| kfajdsl wrote:
| I also switched to plain SQL migrations and queries
| because I find it much simpler. I hear this a lot, that
| ORMs are easier/quicker to use, but I've found that
| writing plain SQL has almost no friction for me. I mean,
| learning at least basic SQL is part of the first year in
| most CS degrees.
| fabian2k wrote:
| There are some things that are really annoying in raw SQL
| and that are much easier with ORMs. Dynamic queries are
| one of those, if you don't want to concatenate SQL
| strings yourself you need at least a query builder at
| that point. The other part is updating/creating entities
| with many relations; that gets very tedious quickly and
| is something ORMs can handle for you.
|
| It depends a lot on the specific application, for me
| those things are so common that I prefer to use an ORM
| even though I could write the SQL myself. ORMs are easier
| to use, but you still need to learn what they do under
| the hood and understand them or you will run into issues
| along the way.
| basil-rash wrote:
| I'm not suggesting anyone implement the raw binary
| protocol themselves! The postgres connection library
| you're using should be able to handle dynamic queries
| perfectly fine.
| mvdtnz wrote:
| An ORM is NEVER the solution, ever ever ever. Repeat after
| me: ORMs are not the solution to this problem. They work in
| your little toy apps with 4 customers but they are nothing
| but pain on real enterprise grade software.
| winrid wrote:
| I don't think insults are really necessary here.
|
| Also ORMs can be very useful, just don't do dumb stuff,
| like with any technology.
|
| I use them when appropriate.
| winrid wrote:
| also, every company I've worked at used ORMs in some
| capacity. Sometimes, not always.
|
| Also, I don't really work on any apps with only four
| customers. either they have almost a million or zero :P Try
| again. :)
| timacles wrote:
| There's no right abstraction for it because everyone's data is
| different. From my experience, what most developers don't
| realize is that data is more complex than code. Code is merely
| the stuff that sits on top of the data, shuffling it around...
| but designing and handling the data in an efficient way is the
| real engineering problem.
|
| Any abstraction you could come up with wouldn't fit 90% of the
| other cases.
| Merad wrote:
| > Also the point about external IDs and internal IDs - why
| can't the database/ORM do this more automatically?
|
| It has pretty big implications for how your application code
| interacts with the database. Queries that involve id's will
| need to perform joins in order to check the external id.
| Inserts or updates that need to set a foreign key need to
| perform an extra lookup to map the external id to the correct
| FK value (whether it's literally a separate query or a
| CTE/subquery). Those are things that are way outside the realm
| of what EF can handle automatically, at least as it exists
| today.
| emodendroket wrote:
| I think that stuff works about as well as it possibly could. If
| you think that's painful think about something like DynamoDB
| where if you didn't really think through the access patterns up
| front you're in for a world of pain.
| josephg wrote:
| > As I discussed in an earlier post[3] when you use Postgres
| native UUID v4 type instead of bigserial table size grows by 25%
| and insert rate drops to 25% of bigserial. This is a big
| difference.
|
| Does anyone know why UUIDv4 is so much worse than bigserial?
| UUIDs are just 128 bit numbers. Are they super expensive to
| generate or something? What's going on here?
| AprilArcus wrote:
| UUIDv4s are fully random, and btree indices expect "right-
| leaning" values with a sensible ordering. This makes indexing
| operations on UUIDv4 columns slow, and was the motivation for
| the development of UUIDv6 and UUIDv7.
| couchand wrote:
| I'm curious to learn more about this heuristic and how the
| database leverages it for indexing. What does right-leaning
| mean formally and what does analysis of the data structure
| look like in that context? Do variants like B+ or B* have the
| same charactersistics?
| stephen123 wrote:
| I think it's because of btrees. Btrees and the pages work better
| if only the last page is getting lots of writes. UUIDs cause
| lots of unordered writes, leading to page bloat.
| barrkel wrote:
| Random distribution in the sort order means the cache locality
| of a btree is poor - instead of inserts going to the last page,
| they go all over the place. Locality of batch inserts is also
| then bad at retrieval time, where related records are looked up
| randomly later.
|
| So you pay taxes at both insert time and later during
| selection.
| perrygeo wrote:
| The 25% increase in size is true but it's 8 bytes, a small and
| predictable linear increase per row. Compared to the rest of
| the data in the row, it's not much to worry about.
|
| The bigger issue is insert rate. Your insert rate is limited by
| the amount of available RAM in the case of UUIDs. That's not
| the case for auto-incrementing integers! Integers are
| correlated with time while UUID4s are random - so they have
| fundamentally different performance characteristics at scale.
|
| The author cites 25% but I'd caution every reader to take this
| with a giant grain of salt. At the beginning, for small tables
| < a few million rows, the insert penalty is almost negligible.
| If you did benchmarks here, you might conclude there's no
| practical difference.
|
| As your table grows, specifically as the size of the btree
| index starts reaching the limits of available memory, postgres
| can no longer handle the UUID btree entirely in memory and has
| to resort to swapping pages to disk. An auto-incrementing
| integer type won't have this problem since rows close in time
| will use the same index page and thus don't need to hit disk at
| all under the same load.
|
| Once you reach this scale, the difference in speed is orders of
| magnitude. It's NOT a steady 25% performance penalty, it's a
| 25x performance cliff. And the only solution (aside from a
| schema migration) is to buy more RAM.
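|
| A rough way to see this for yourself (sketch only; assumes
| Postgres 13+ for gen_random_uuid(), and the exact numbers
| depend heavily on shared_buffers and row width):
|
|     CREATE TABLE t_serial (
|       id  bigserial PRIMARY KEY,
|       pad text
|     );
|     CREATE TABLE t_uuid (
|       id  uuid DEFAULT gen_random_uuid() PRIMARY KEY,
|       pad text
|     );
|
|     INSERT INTO t_serial (pad)
|       SELECT repeat('x', 100) FROM generate_series(1, 10000000);
|     INSERT INTO t_uuid (pad)
|       SELECT repeat('x', 100) FROM generate_series(1, 10000000);
|
|     SELECT pg_size_pretty(pg_relation_size('t_serial_pkey')),
|            pg_size_pretty(pg_relation_size('t_uuid_pkey'));
|
| The gap in insert time and index size only becomes dramatic
| once the uuid index no longer fits in memory.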
| vinnymac wrote:
| I always wondered what that extra "I" in the CI variables
| `CI_PIPELINE_IID` and `CI_MERGE_REQUEST_IID` was for. Always
| assumed it was a database-related choice, but this article
| confirms it.
| zetalyrae wrote:
| The point about the storage size of UUID columns is unconvincing.
| 128 bits vs. 64 bits doesn't matter much when the table has five
| other columns.
|
| A much more salient concern for me is performance. UUIDv4 is
| widely supported but is completely random, which is not ideal for
| index performance. UUIDv7[0] is closer to Snowflake[1] and has
| some temporal locality but is less widely implemented.
|
| There's an orthogonal approach which is using bigserial and
| encrypting the keys: https://github.com/abevoelker/gfc64
|
| But this means 1) you can't rotate the secret and 2) if it's ever
| leaked everyone can now Fermi-estimate your table sizes.
|
| Having separate public and internal IDs seems both tedious and
| sacrifices performance (if the public-facing ID is a UUIDv4).
|
| I think UUIDv7 is the solution that checks the most boxes.
|
| [0]: https://uuid7.com/
|
| [1]: https://en.wikipedia.org/wiki/Snowflake_ID
| newaccount7g wrote:
| If I've learned anything in my 7 years of software development
| it's that this kind of expertise is just "blah blah blah" that
| will get you fired. Just make the system work. This amount of
| trying to anticipate problems will just screw you up. I
| seriously can't imagine a situation where knowing this would
| actually improve the performance noticeably.
| canadiantim wrote:
| Would it ever make sense to have a uuidv7 as primary key but
| then another slug field for a public-id, e.g. one that is
| shorter and better in a url or even allowing user to customize
| it?
| Horffupolde wrote:
| Yes sure but now you have to handle two ids and guaranteeing
| uniqueness across machines or clusters becomes hard.
| vrosas wrote:
| That and a uuid is going to be unique across all tables and
| objects, whereas a slug will only be unique within a
| certain subset e.g. users within an organization. I've seen
| a production issue IRL where someone (definitely not me)
| wrote a query fetching objects by slug and forgot to
| include the 'AND parent_slug = xxx'
| gfody wrote:
| think of the primary keys in a database like typedef void* ie
| it's your fundamental pointer and the size of it will impact
| every aspect of performance throughout - memory/disk footprint
| and corresponding throughput bottlenecks, cpu time comparing
| keys which is what every operation reduces to in the deepest
| inner-most loops of joins and lookups etc.
|
| when x86-64 cpus were new the performance impact from switching
| to 64-bit pointers was so bad we had to create x32/ilp32 and
| the reason .NET still has "prefer 32-bit" as a default even
| today.
|
| using 128-bit uuids as PKs in a database is an awful mistake
| Merad wrote:
| > The point about the storage size of UUID columns is
| unconvincing. 128 bits vs. 64 bits doesn't matter much when the
| table has five other columns.
|
| But it's not just the size of that one column, it's also the
| size of all the places that id is used as a FK and the indexes
| that may be needed on those FK columns. Think about something
| like a user id that might be referenced by dozens or even
| hundreds of FKs throughout your database.
| solidsnack9000 wrote:
| ...and this has not just a size impact but also a substantial
| performance impact.
| nine_k wrote:
| It very much does when you have a ton of FKs (enforced or not)
| using such a column, and thus indexed and used in many joins.
| Making it twice as hard for the hot part of an index to fit to
| RAM is never good for performance, nor for the cloud bill.
|
| If you have a column that is used in many joins, there are
| performance reasons to make it as compact as possible (but not
| smaller).
| s4i wrote:
| The v7 isn't a silver bullet. In many cases you don't want to
| leak the creation time of a resource. E.g. you want to upload a
| video a month before making it public to your audience without
| them knowing.
| solidsnack9000 wrote:
| > There's an orthogonal approach which is using bigserial and
| encrypting the keys...
|
| Another variant of this approach:
| https://pgxn.org/dist/permuteseq/
|
| It is also feasible to encrypt the value on display (when
| placing it in URLs, emails, &c):
|
| https://wiki.postgresql.org/wiki/Pseudo_encrypt
|
| This maintains many of the benefits of sequential indexes and
| _does_ allow you to change the key. However, if the key is
| changed, it would break any bookmarks, invalidate anything sent
| in older emails -- it would have the same effect as renaming
| everything.
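|
| For reference, the wiki function is a tiny Feistel network over
| the two 16-bit halves of the int, roughly along these lines
| (adapted from the wiki page; any round function would do, what
| matters is that the construction is a bijection):
|
|     CREATE FUNCTION pseudo_encrypt(value int) RETURNS int AS $$
|     DECLARE
|       l   int := (value >> 16) & 65535;
|       r   int := value & 65535;
|       tmp int;
|       i   int := 0;
|     BEGIN
|       WHILE i < 3 LOOP
|         tmp := l # ((((1366 * r + 150889) % 714025)
|                      / 714025.0) * 32767)::int;
|         l := r;
|         r := tmp;
|         i := i + 1;
|       END LOOP;
|       RETURN (r << 16) + l;
|     END;
|     $$ LANGUAGE plpgsql IMMUTABLE STRICT;
|
| Each int maps to a unique scrambled int, so you can expose
| pseudo_encrypt(id) in URLs while keeping a plain sequence
| underneath - but, as noted above, changing the function later
| breaks every existing link.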
| bluerooibos wrote:
| Has anyone written about or noticed the performance differences
| between Gitlab and GitHub?
|
| They're both Rails-based applications but I find page load times
| on Gitlab in general to be horrific compared to GitHub.
| heyoni wrote:
| I mean GitHub in general has been pretty reliable minus the two
| outages they had last year and is usually pretty performant or
| I wouldn't use their keyboard shortcuts.
|
| There are some complaints here from a former dev about gitlab
| that might provide insight into its culture and lack of regard
| for performance: https://news.ycombinator.com/item?id=39303323
|
| Ps: I do not use gitlab enough to notice performance issues but
| thought you might appreciate the article
| anoopelias wrote:
| More comments on this submission:
| https://news.ycombinator.com/item?id=39333220
| imiric wrote:
| > I mean GitHub in general has been pretty reliable minus the
| two outages they had last year
|
| Huh? GitHub has had major outages practically every other
| week for a few years now. There are pages of HN threads[1].
|
| There's a reason why githubstatus.com doesn't show historical
| metrics and uptime percentages: it would make them look
| incompetent. Many outages aren't even officially reported
| there.
|
| I do agree that when it's up, performance is typically better
| than Gitlab's. But describing GH as reliable is delusional.
|
| [1]: https://hn.algolia.com/?dateRange=all&page=0&prefix=fals
| e&qu...
| heyoni wrote:
| Delusional? Anecdotal maybe...I was describing my
| experience so thanks for elaborating.
|
| I only use it as a code repository. Was it specific
| services within GitHub that failed a lot?
| imiric wrote:
| My apologies, that came off a bit harsh.
|
| > Was it specific services within GitHub that failed a
| lot?
|
| Well, everything from its frontends, to the CI service,
| to the Git service itself. People weren't able to push,
| pull, deploy or review code for hours at a time. Just
| major disruptions all around, which happened regularly.
|
| I do think this has improved slightly over the past few
| months, but you were lucky if you weren't affected much.
| heyoni wrote:
| No worries you're fine. I guess downtime was much more
| widespread than I could even imagine.
|
| I do remember one time all of us at work joking that we
| should get the day off when GitHub was down. :D
| mvdtnz wrote:
| You can't throw around specific metrics ("two outages
| they had last year") then claim you were just sharing a
| personal anecdote when someone calls you out and brings
| receipts. Just take the L.
| heyoni wrote:
| I absolutely can because this is a discussion. Being
| called delusional for sharing my experience on a platform
| I use _at least_ 40 hours a week was a bit much but in
| case you missed it, I relented. They brought receipts and
| I came down from my position because there was none. I
| wasn't arguing with anyone or saying anyone was wrong
| about anything.
|
| Do you expect all of your conversations to be backed by
| facts and citation? Is that why the only comments you
| make are aggressive and confrontational? Because it's
| easier than the bar you're setting for everyone else?
| Yea, I looked back at your history, not a single url.
|
| And now I'm taking a position.
|
| > No worries you're fine. I guess downtime was much more
| widespread than I could even imagine.
|
| Did you miss that?
| golergka wrote:
| I used Gitlab a few years ago, but then it had severe client-
| side performance problems on large pull requests. Github isn't
| ideal with them too, but it manages to be decent.
| wdb wrote:
| Yeah, good reason to split up pull requests ;) I do think it
| improved a lot over the last two years, though
| oohffyvfg wrote:
| > compared to GitHub.
|
| this is like comparing chrome and other browsers, even chromium
| based.
|
| chrome and github will employ all tricks in the book, even if
| they screw you. for example, how many hours of despair I've
| wasted when manually dissecting a git history on employer
| github by opening merge diffs, hitting ctrl F, seeing no
| results and moving to the next... only to find on the 100th
| diff that, deep down the diff list, they had hidden the most
| important file because it was more convenient for them (so one
| team lead could hit some page load metric and get a promotion)
| gfody wrote:
| > 1 quintillion is equal to 1000000000 billions
|
| it is pretty wild that we generally choose between int32 and
| int64. we really ought to have a 5 byte integer type which would
| support cardinalities of ~1T
| klysm wrote:
| Yeah it doesn't make sense to pick something that's not a power
| of 2 unless you are packing it.
| postalrat wrote:
| I guess that depends on how the index works.
| gfody wrote:
| we usually work with some page size anyways, 64 5-byte ints
| fit nicely into 5 512-bit registers, and ~1T is a pretty
| flexible cardinality limit after ~2B
| sidcool wrote:
| Great read! And even better comments here.
| exabrial wrote:
| "Foreign keys are expensive" is an oft-repeated, rarely
| benchmarked claim. There are tons of ways to do it incorrectly.
| But in your stack you are always enforcing integrity
| _somewhere_ anyway. Leveraging the database instead of
| reimplementing it requires knowledge and experimentation, and
| more often than not it will save your bacon.
| rob137 wrote:
| I found this post very useful. I'm wondering where I could find
| others like it?
| aflukasz wrote:
| I recommend Postgres FM podcast, e.g. available as video on
| Postgres TV yt channel. Good content on its own, and many
| resources of this kind are linked in the episode notes. I
| believe one of the authors even helped Gitlab specifically with
| Postgres performance issues not that long ago.
| cosmicradiance wrote:
| Here's another - https://zerodha.tech/blog/working-with-
| postgresql/
| yellowapple wrote:
| > It is generally a good practice to not expose your primary keys
| to the external world. This is especially important when you use
| sequential auto-incrementing identifiers with type integer or
| bigint since they are guessable.
|
| What value would there be in preventing guessing? How would that
| even be possible if requests have to be authenticated in the
| first place?
|
| I see this "best practice" advocated often, but to me it reeks of
| security theater. If an attacker is able to do anything useful
| with a guessed ID without being authenticated and authorized to
| do so, then something else has gone horribly, horribly,
| _horribly_ wrong and _that_ should be the focus of one's energy
| instead of adding needless complexity to the schema.
|
| The only case I know of where this might be valuable is from a
| business intelligence standpoint, i.e. you don't want competitors
| to know how many customers you have. My sympathy for such
| concerns is quite honestly pretty low, and I highly doubt GitLab
| cares much about that.
|
| In GitLab's case, I'm reasonably sure the decision to use id +
| iid is less driven by "we don't want people guessing internal
| IDs" and more driven by query performance needs.
| s4i wrote:
| It's mentioned in the article. It's more to do with business
| intelligence than security. A simple auto-incrementing ID will
| reveal how many total records you have in a table and/or their
| growth rate.
|
| > If you expose the issues table primary key id then when you
| create an issue in your project it will not start with 1 and
| you can easily guess how many issues exist in the GitLab.
| JimBlackwood wrote:
| I follow this best practice, there's a few reasons why I do
| this. It doesn't have to do with using a guessed primary ID for
| some sort of privilege escalation, though. It has more to do
| with not leaking any company information.
|
| When I worked for an e-commerce company, one of our biggest
| competitors used an auto-incrementing integer as primary key on
| their "orders" table. Yeah... You can figure out how this was
| used. Not very smart by them, extremely useful for my employer.
| Neither of these will allow security holes or leak customer
| info/payment info, but you'd still rather not leak this.
| tomnipotent wrote:
| > extremely useful for my employer.
|
| I've been in these shoes before, and finding this information
| doesn't help you as an executive or leader make any better
| decisions than you could have before you had the data. No
| important decision is going to be swayed by something like
| this, and any decision that is probably wasn't important.
|
| Knowing how many orders is placed isn't so useful without
| average order value or items per cart, and the same is true
| for many other kinds of data gleaned from this method.
| JimBlackwood wrote:
| That's not correct. Not every market is the same in its
| dynamics.
|
| Yes, most of the time that information was purely
| insightful and was simply monitored. However, at some
| moments it definitely drove important decisions.
| tomnipotent wrote:
| Like what?
|
| What's going to change how a team develops (physical)
| products? What's a merchandiser or buyer going to learn
| that influences how they spend millions of dollars or
| deal with X weeks-on-hand of existing inventory? What's
| an operations manager going to learn that improves their
| ability to warehouse and ship product? How's marketing
| going to change their strategy around segmentation or
| channel distribution? What's a CEO going to learn that
| changes what departments or activities they want to
| invest in?
|
| At best you get a few little tidbits of data you can
| include in presentations or board decks, but nothing
| that's going to influence critical decisions on how money
| is getting spent to get the job done or how time is
| getting allocated to projects. Worst case you have an
| inexperienced CEO that's chasing rather than leading, and
| just end up copying superficial aspects of your
| competitors without the context or understanding of why
| they did what they did.
|
| I've called up execs at competitors and had friendly
| chats that revealed more about their business in 30
| minutes than you could possibly find out through this
| "method".
| lordgrenville wrote:
| > The only case where this might be valuable is business
| intelligence
|
| Nitpick: I would not call this "business intelligence" (which
| usually refers to internal use of the company's own data) but
| "competitive intelligence".
| https://en.wikipedia.org/wiki/Competitive_intelligence
| EE84M3i wrote:
| See also "German Tank Problem"
| https://en.m.wikipedia.org/wiki/German_tank_problem
| mnahkies wrote:
| It can be really handy for scraping/archiving websites if
| they're kind enough to use a guessable id
| tetha wrote:
| > I see this "best practice" advocated often, but to me it
| reeks of security theater. If an attacker is able to do
| anything useful with a guessed ID without being authenticated
| and authorized to do so, then something else has gone horribly,
| horribly, horribly wrong and that should be the focus of one's
| energy instead of adding needless complexity to the schema.
|
| Yes, but the ability to guess IDs can make this security issue
| horrible, or much much worse.
|
| If you had such a vulnerability and you are exposing the users
| to UUIDs, now people have to guess UUIDs. Even a determined
| attacker will have a hard time doing that or they would need
| secondary sources to get the IDs. You have a data breach, but
| you most likely have time to address it and then you can assess
| the amount of data lost.
|
| If you can just <seq 0 10000 | xargs -I ID curl
| service/ticket/ID> the security issue is instantly elevated
| onto a whole new level. Suddenly all data is leaked without
| further effort and we're looking at mandatory report to data
| protection agencies with a massive loss of data.
|
| To me, this is one of these defense in depth things that should
| be useless. And it has no effect in many, many cases.
|
| But there is truly horrid software out there that has been
| popped in exactly the described way.
| dijit wrote:
| Case in point, a recent security issue Gitlab experienced
| (CVE-2023-7028; arbitrary password reset by knowing one of
| the accounts associated mail addresses) was made worse by a
| feature of gitlab that few people know about; that the
| "userID" is associated with a meta/internal mail address.
|
| This meant that people could send password resets for any
| user if they knew their userID. The mail format was like
| user-1@no-reply.gitlab.com or something.
|
| Since it's a safe bet that "user ID 1" is an admin user,
| someone weaponised this.
| plagiarist wrote:
| I've already resolved to never use Gitlab entirely on the
| basis of that CVE but that makes it worse.
|
| Password resets should just never go to an email that
| hasn't been deliberately attached to an account by the
| account's owner, full stop. There should not be a code path
| where it is possible to send any such thing to arbitrary
| emails. And redirect emails should never be treated as
| account emails in any way.
| yellowapple wrote:
| Even without that auto-incrementing ID, there are plenty of
| other options for guessing valid email addresses to use
| with that exploit. For example, if you're able to figure
| out the format an organization uses for their email
| addresses (e.g. first.last@company.com), and you're able to
| figure out who works at that org (via e.g. LinkedIn), then
| there's a very good chance you can reset passwords for,
| say, the company's CTO or other likely-highly-privileged
| users.
|
| That is: this kind of proves my point. Removing
| autoincrementing IDs from the equation is of minimal
| benefit when things have already gone horribly horribly
| wrong like this. It's a _little_ bit more work on the
| attacker's part, but not by anywhere near enough for such
| a "mitigation" to be of much practical benefit.
| remus wrote:
| In general it's a defense-in-depth thing. You definitely
| shouldn't be relying on it, but as an attacker it just makes
| your life a bit harder if it's not straightforward to work out
| object IDs.
|
| For example, imagine you're poking around a system that uses
| incrementing ints as public identifiers. Immediately, you can
| make a good guess that there's probably going to be some high
| privileged users with user_id=1..100 so you can start probing
| around those accounts. If you used UUIDs or similar then you're
| not leaking that info.
|
| In gitlabs case this is much less relevant, and it's more of a
| cosmetic thing.
| serial_dev wrote:
| > In gitlabs case this is much less relevant (...)
|
| Why, though? GitLab is often self hosted, so being able to
| iterate through objects, like users, can be useful for an
| attacker.
| remus wrote:
| You're right, fair point.
| yellowapple wrote:
| In my experience self-hosted GitLabs are rarely publicly-
| accessible in the first place; they're usually behind some
| sort of VPN.
|
| As for an attacker being able to iterate through users, if
| that information is supposed to be private, and yet an
| attacker is getting anything other than a 404, then that's
| a problem in and of itself and my energy would be better
| spent fixing that.
| heax wrote:
| It really depends, but useful knowledge can be derived from
| this. If user accounts use sequential ids, the id 1 is most
| likely the admin account that was created as the first user.
| metafunctor wrote:
| Bugs happen also in access control. Unguessable IDs make it
| much harder to exploit some of those bugs. Of course the focus
| should be on ensuring correct access control in the first
| place, but unguessable IDs can make the difference between a
| horrible disaster and a close call.
|
| It's also possible to use auto-incrementing database IDs and
| encrypt them, if using UUIDs doesn't work for you. With
| appropriate software layers in place, encrypted IDs work more
| or less automatically.
| worksonmine wrote:
| > What value would there be in preventing guessing?
|
| It prevents enumeration, which may or may not be a problem
| depending on the data. If you want to build a database of user
| profiles it's much easier with incremental IDs than UUID.
|
| It is at least a data leak but can be a security issue. Imagine
| a server that, on a wrong password, correctly returns "invalid
| username OR password" to prevent enumeration. If you can still
| crawl all IDs and figure out if someone has an account that
| way, it helps filter out what username and password
| combinations to try from previous leaks.
|
| Hackers are creative and security is never about any single
| protection.
| yellowapple wrote:
| > If you can still crawl all IDs and figure out if someone
| has an account that way it helps filter out what username and
| password combinations to try from previous leaks.
|
| Right, but like I suggested above, if you're able to get any
| response other than a 404 for an ID other than one you're
| authorized to access, then that in and of itself is a severe
| issue. So is being able to log in with that ID instead of an
| actual username.
|
| Hackers are indeed creative, but they ain't wizards. There
| are countless other things that would need to go horribly
| horribly wrong for an autoincrementing ID to be useful in an
| attack, and the lack of autoincrementing IDs doesn't really
| do much in practice to hinder an attacker once those things
| have gone horribly, horribly wrong.
|
| I can think of maybe one exception to this, and that's with
| e-commerce sites providing guest users with URLs to their
| order/shipping information after checkout. Even this is
| straightforward to mitigate (e.g. by generating a random
| token for each order and requiring it as a URL parameter),
| and is entirely inapplicable to something like GitLab.
| SkyMarshal wrote:
| _> This is especially important when you use sequential auto-
| incrementing identifiers with type integer or bigint since they
| are guessable._
|
| I thought we had long since moved past that to GUIDs or UUIDs
| for primary keys. Then if you still need some kind of
| sequential numbering that has meaning in relation to the other
| fields, make a separate column for that.
| coldtea wrote:
| > _I see this "best practice" advocated often, but to me it
| reeks of security theater._
|
| The idea of "security theater" is overplayed. Security can be
| (and should be) multilayered, it doesn't have to be all or
| nothing. So that, when they break a layer (say the
| authentication), they shouldn't automatically gain easy access
| to the others
|
| > _If an attacker is able to do anything useful with a guessed
| ID without being authenticated and authorized to do so, then
| something else has gone horribly, horribly, horribly wrong and
| that should be the focus of one 's energy instead of adding
| needless complexity to the schema._
|
| Sure. But by that time, it will be game over if you don't
| also have the other layers in place.
|
| The thing is that you can't anticipate every contingency. Bugs
| tend to not preannounce themselves, especially tricky nuanced
| bugs.
|
| But when they do appear, and a user can "do [something] useful
| with an ID without being authenticated and authorized to do so"
| you'd be thanking all available Gods that you at least made the
| IDs not guessable - which would otherwise also give them access
| to every user account on the system.
| kehers wrote:
| One good argument I found [^1] about not exposing primary keys
| is that primary keys may change (during system/db change) and
| you want to ensure users have a consistent way of accessing
| data.
|
| [^1]:
| https://softwareengineering.stackexchange.com/questions/2183...
| azlev wrote:
| It's reasonable to not have auto-increment ids, but it's not
| clear to me if there are benefits to having 2 IDs, one internal and
| one external. This increases the number of columns / indexes,
| makes you always do a lookup first, and I can't see a security
| scenario where I would change the internal key without changing
| the external key. Am I missing something?
| Aeolun wrote:
| You always have the information at hand anyway when doing
| anything per project. It's also more user friendly to have
| every project's issues start with 1 instead of starting with
| two trillion, seven hundred billion, three hundred and five
| million, seven hundred and seventeen thousand, three hundred
| and twenty-five.
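|
| A simple way to hand out those per-project numbers (a sketch,
| not necessarily what GitLab actually does) is a small counter
| table that gets bumped atomically per project:
|
|     CREATE TABLE issue_counters (
|       project_id bigint PRIMARY KEY,
|       last_iid   bigint NOT NULL DEFAULT 0
|     );
|
|     -- allocate the next issue number for project 42
|     INSERT INTO issue_counters AS c (project_id, last_iid)
|     VALUES (42, 1)
|     ON CONFLICT (project_id)
|       DO UPDATE SET last_iid = c.last_iid + 1
|     RETURNING last_iid;
|
| The row lock taken by the upsert serialises issue creation
| within a project but doesn't touch any other project.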
| traceroute66 wrote:
| Slight nit-pick, but I would pick up the author on the text vs
| varchar section.
|
| The author effectively wastes many words trying to prove a non-
| existent performance difference and then concludes "there is not
| much performance difference between the two types".
|
| This horse bolted a long time ago. It's not "not much", it's
| "none".
|
| The Postgres Wiki[1] explicitly tells you to use text unless you
| have a very good reason not to. And indeed the docs themselves[2]
| tell us that "For many purposes, character varying acts as though
| it were a domain over text" and further down in the docs in the
| green Tip box, "There is no performance difference among these
| three types".
|
| Therefore Gitlab's use of (mostly) text would indicate that they
| have RTFM and that they have designed their schema for their
| choice of database (Postgres) instead of attempting to implement
| some stupid "portable" schema.
|
| [1]
| https://wiki.postgresql.org/wiki/Don%27t_Do_This#Don.27t_use...
| [2] https://www.postgresql.org/docs/current/datatype-
| character.h...
| alex_smart wrote:
| >The author effectively wastes many words trying to prove a
| non-existent performance difference and then concludes "there
| is not much performance difference between the two types".
|
| They then also show that there is in fact a significant
| performance difference when you need to migrate your schema to
| accommodate a change in the length of strings being stored.
| Altering a table to change a column from varchar(300) to
| varchar(200) needs to rewrite every single row, whereas
| updating the constraint on a text column is essentially free:
| just a full table scan to ensure that the existing values
| satisfy your new constraints.
|
| FTA:
|
| >So, as you can see, the text type with CHECK constraint allows
| you to evolve the schema easily compared to character varying
| or varchar(n) when you have length checks.
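|
| Concretely (a minimal example, not GitLab's actual DDL):
|
|     -- varchar(n): shrinking the limit is a type change,
|     -- i.e. a table rewrite under ACCESS EXCLUSIVE
|     ALTER TABLE issues
|       ALTER COLUMN title TYPE varchar(200);
|
|     -- text + CHECK: just swap the constraint
|     ALTER TABLE issues
|       ADD CONSTRAINT title_length_200
|       CHECK (char_length(title) <= 200) NOT VALID;
|     ALTER TABLE issues VALIDATE CONSTRAINT title_length_200;
|
| ADD CONSTRAINT ... NOT VALID only needs a brief lock, and
| VALIDATE CONSTRAINT does its full scan without blocking normal
| reads and writes.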
| traceroute66 wrote:
| > They then also show that there is in fact a significant
| performance difference when you need to migrate your schema
| to accodomate a change in length of strings being stored.
|
| Which is a pointless demonstration if you RTFM and design
| your schema correctly, using text, just like the manual and
| the wiki tells you to.
|
| > the text type with CHECK constraint allows you to evolve
| the schema easily compared to character varying or varchar(n)
| when you have length checks.
|
| Which is exactly what the manual tells you ....
|
| "For many purposes, character varying acts as though it were
| a domain over text"
| alex_smart wrote:
| And what is wrong with someone figuring out for themselves
| and explaining to others why some suggestion makes sense
| logically, rather than just quoting the manual?
| eezing wrote:
| We shouldn't assume that this schema was designed all at once,
| but rather is the product of evolution. For example, maybe the
| external_id was added after the initial release in order to
| support the creation of unique ids in the application layer.
___________________________________________________________________
(page generated 2024-02-18 23:01 UTC)