[HN Gopher] Citus 12: Schema-based sharding for PostgreSQL
       ___________________________________________________________________
        
       Citus 12: Schema-based sharding for PostgreSQL
        
       Author : mulander
       Score  : 82 points
       Date   : 2023-07-18 18:22 UTC (4 hours ago)
        
 (HTM) web link (www.citusdata.com)
 (TXT) w3m dump (www.citusdata.com)
        
       | mslot wrote:
       | Nice to see this on HN :)
       | 
       | The high-level is: You enable a setting and every CREATE SCHEMA
       | creates a new shard. All the tables in the schema will be co-
       | located so you can have efficient joins & foreign keys between
       | the tables.
       | 
       | On top of that, you can also have reference tables that are
       | replicated to all nodes, again for fast joins & foreign keys with
       | all schemas.
       | 
       | Everything else is about making every PostgreSQL feature work as
       | seamlessly as if there was no sharding. You can still do things
       | like transactions across schemas, create and use custom types,
       | access controls, work with other extensions, use procedures, etc.
        
       | __s wrote:
       | What's advantage over having tenant id as distribution column?
       | Seems like you make schema name the distribution column. Maybe
       | gross setups where same name function definition varies between
       | schemas (been there done that, don't want to do it again)
       | 
       | Seems like article only offers ease of use
        
       | skunkworker wrote:
       | Having used schema based sharding in postgres before, I hope
       | citus has a way around connection pooling and pgbouncer, as if
       | you use pgbouncer transaction pooling your search_path could
       | disappear at any time.
        
         | mulander wrote:
         | Great observation! :)
         | 
         | We worked upstream to have `search_path` properly handled
         | (tracked per client) by pgbouncer.
         | 
         | https://github.com/pgbouncer/pgbouncer/commit/8c18fc4d213ad4...
         | 
         | Check config.md in that commit for a verbose, humanized
         | description.
        
         | [deleted]
        
       | mariocesar wrote:
       | I'm having trouble getting CitusDB to work with RDS, even though
       | I'd really like to use it with AWS. Whenever I try to research
       | how to make it happen, I get stuck in a lot of challenges and end
       | up concluding that it might not be possible. While I could use
       | Azure instead, I'm hesitant because I have a lot of resources and
       | infrastructure in AWS and it wouldn't make sense to move the
       | database layer. The idea of having to do DBA and Ops work on my
       | own EC2 instances is overwhelming. I would appreciate any
       | guidance on how to use RDS with Citus on their documentation.
       | It's concerning that Citus might be withholding information about
       | the possibility of using RDS, given that Microsoft is its
       | partner.
        
         | aamederen wrote:
         | You can run Citus on EC2 for now.
        
         | AlexB138 wrote:
         | Microsoft isn't its "partner", they were entirely bought by
         | Microsoft. The goal is to make the technology an advantage for
         | Azure, as I understand it. It's very unlikely you'll see them
         | offer first-party support for RDS, or CloudSQL.
        
       | asah wrote:
       | love citus! this is useful.
       | 
       | one gotcha: schemas are a weird old thing in SQL that are kinda
       | the worst of all worlds, basically more like prefixing your SQL
       | object names than a real level of indirection. Schema objects
       | can't be manipulated as a batch, they don't provide the isolation
       | / access control of databases, and can't be manipulated in DML
       | and require stored procedures to manipulate.
        
         | nine_k wrote:
         | Schemas are namespaces, a convenience feature, not a security
         | feature.
        
           | rad_gruchalski wrote:
           | That's exactly the reason I went for YugabyteDB tablespaces
           | implementation on one of past projects. Dedicated database
           | locked down to selected nodes for full isolation.
        
         | paulddraper wrote:
         | > Schema objects can't be manipulated as a batch,
         | 
         | I would make a partitioned table, and put each partition in its
         | own schema.
        
         | mslot wrote:
         | True, though I'd phrase it more as a middle ground. Which model
         | is best depends on your needs.
         | 
         | A big advantage of schemas over databases is that you can
         | access different schemas over the same connection, which allows
         | much higher density, since connections are a scarce resource in
         | PostgreSQL. You can also have shared tables.
         | 
         | A big advantage over row-based sharding is that you can have
         | variations of the table definitions or a different set of
         | tables altogether in each schema. Plus for the purpose of
         | sharding it's easier to use, since normalized data models
         | complicate row-based sharding.
        
       | pickledish wrote:
       | Hm, question for people a bit more familiar with Postgres -- what
       | is meant by "schema" here?
       | 
       | My definition is "the columns and column types of a table", but,
       | that doesn't seem to make sense with what they're talking about
       | here ("large" and "small" schemas probably aren't referring to
       | wide and narrow tables for example, and I don't see how sharding
       | by my definition of "schema" could even make sense anyways)
        
         | hans_castorp wrote:
         | Schemas are namespaces (actually called that internally in
         | Postgres).
         | 
         | The SQL standard defines a two level namespace hierarchy. A
         | single "instance" of contains multiple catalogs and each
         | catalog contains multiple schemas (and each schema then
         | contains objects like tables, views, types, functions etc).
         | 
         | Many database products use the term "database" instead of
         | "catalog" e.g. in Postgres and SQL Server. But "schema" is used
         | quite uniformly. MySQL's "databases" are in fact "schemas"
         | though.
        
           | pickledish wrote:
           | Whew boy, that's a more sizable hierarchy than I thought was
           | going on under the hood lol, I guess I've got some reading to
           | do. TY for the pointers!
        
         | Hovertruck wrote:
         | Schema is a proper noun in Postgres:
         | https://www.postgresql.org/docs/current/ddl-schemas.html
        
         | mulander wrote:
         | Schemas are groupings of tables and other entities that can be
         | defined within a database. You can think of them like of a
         | namespace in programming languages. You can have the same table
         | definition, within the same database, defined multiple times
         | (each in a different schema) and each holding different data.
         | 
         | By large and small we are referring to the amount of data each
         | schema holds currently. They can grow over time and some of
         | them may become very big while others will remain small
         | (storage wise).
        
           | pickledish wrote:
           | Ah, interesting ok
           | 
           | > You can have the same table definition, within the same
           | database, defined multiple times (each in a different schema)
           | and each holding different data.
           | 
           | So in this respect, each table within a schema indeed already
           | acts like a "shard" of the overall table
           | 
           | Is this enforced? Like, if I create a table "messages" in
           | schema A and a table "messages" in table B, must they have
           | the same columns/column types, or is that just convention
        
             | mulander wrote:
             | They can be different, as long as your application can
             | handle them being different (which makes sense for
             | microservices as an example).
        
               | pickledish wrote:
               | I see! Fascinating, thanks for the help :)
        
         | ubertaco wrote:
         | I've seen MySQL clients/docs also use "schema" to refer to
         | "this collection of tables", basically what SQLite would call
         | "a database".
         | 
         | The words get really fuzzy here when you try to get a concrete
         | and ubiquitous definition, though.
         | 
         | Best I can do is to say, in these queries:
         | use foo;         select id, name from foo.people where id < 10;
         | 
         | ..."foo" is the schema in MySQL and, apparently, Postgres
         | parlance.
        
         | paulddraper wrote:
         | Not to be rude...but the same things as every other ANSI SQL
         | database. Nothing PostgreSQL specific here.
         | 
         | Schema is the thing you get from `CREATE SCHEMA`...a namespace
         | of tables/functions/views/etc.
        
           | pickledish wrote:
           | Ah I see! Yeah I guess it's just a concept I've never run
           | into before; unfortunate that it shares a name with the
           | "other" concept of schema used by e.g. https://json-
           | schema.org
           | 
           | Thank you!
        
       | metadat wrote:
       | Does Oracle support anything like this? Or _any other_ DBMS in
       | widespread use, for that matter?
       | 
       | The promise behind this approach to DB sharding has great
       | potential. Simultaneously impressive, novel, and badass.
       | 
       | I wish this had been available ten years ago at a few of my
       | startups!
        
         | brightball wrote:
         | I watched a Citus demo a few months back and it blew me away. I
         | hear upgrades are the big challenge.
        
       ___________________________________________________________________
       (page generated 2023-07-18 23:00 UTC)