[HN Gopher] Citus 12: Schema-based sharding for PostgreSQL
___________________________________________________________________
Citus 12: Schema-based sharding for PostgreSQL
Author : mulander
Score : 82 points
Date : 2023-07-18 18:22 UTC (4 hours ago)
(HTM) web link (www.citusdata.com)
(TXT) w3m dump (www.citusdata.com)
| mslot wrote:
| Nice to see this on HN :)
|
| The high-level is: You enable a setting and every CREATE SCHEMA
| creates a new shard. All the tables in the schema will be co-
| located so you can have efficient joins & foreign keys between
| the tables.
|
| On top of that, you can also have reference tables that are
| replicated to all nodes, again for fast joins & foreign keys with
| all schemas.
|
| Everything else is about making every PostgreSQL feature work as
| seamlessly as if there was no sharding. You can still do things
| like transactions across schemas, create and use custom types,
| access controls, work with other extensions, use procedures, etc.
| __s wrote:
| What's advantage over having tenant id as distribution column?
| Seems like you make schema name the distribution column. Maybe
| gross setups where same name function definition varies between
| schemas (been there done that, don't want to do it again)
|
| Seems like article only offers ease of use
| skunkworker wrote:
| Having used schema based sharding in postgres before, I hope
| citus has a way around connection pooling and pgbouncer, as if
| you use pgbouncer transaction pooling your search_path could
| disappear at any time.
| mulander wrote:
| Great observation! :)
|
| We worked upstream to have `search_path` properly handled
| (tracked per client) by pgbouncer.
|
| https://github.com/pgbouncer/pgbouncer/commit/8c18fc4d213ad4...
|
| Check config.md in that commit for a verbose, humanized
| description.
| [deleted]
| mariocesar wrote:
| I'm having trouble getting CitusDB to work with RDS, even though
| I'd really like to use it with AWS. Whenever I try to research
| how to make it happen, I get stuck in a lot of challenges and end
| up concluding that it might not be possible. While I could use
| Azure instead, I'm hesitant because I have a lot of resources and
| infrastructure in AWS and it wouldn't make sense to move the
| database layer. The idea of having to do DBA and Ops work on my
| own EC2 instances is overwhelming. I would appreciate any
| guidance on how to use RDS with Citus on their documentation.
| It's concerning that Citus might be withholding information about
| the possibility of using RDS, given that Microsoft is its
| partner.
| aamederen wrote:
| You can run Citus on EC2 for now.
| AlexB138 wrote:
| Microsoft isn't its "partner", they were entirely bought by
| Microsoft. The goal is to make the technology an advantage for
| Azure, as I understand it. It's very unlikely you'll see them
| offer first-party support for RDS, or CloudSQL.
| asah wrote:
| love citus! this is useful.
|
| one gotcha: schemas are a weird old thing in SQL that are kinda
| the worst of all worlds, basically more like prefixing your SQL
| object names than a real level of indirection. Schema objects
| can't be manipulated as a batch, they don't provide the isolation
| / access control of databases, and can't be manipulated in DML
| and require stored procedures to manipulate.
| nine_k wrote:
| Schemas are namespaces, a convenience feature, not a security
| feature.
| rad_gruchalski wrote:
| That's exactly the reason I went for YugabyteDB tablespaces
| implementation on one of past projects. Dedicated database
| locked down to selected nodes for full isolation.
| paulddraper wrote:
| > Schema objects can't be manipulated as a batch,
|
| I would make a partitioned table, and put each partition in its
| own schema.
| mslot wrote:
| True, though I'd phrase it more as a middle ground. Which model
| is best depends on your needs.
|
| A big advantage of schemas over databases is that you can
| access different schemas over the same connection, which allows
| much higher density, since connections are a scarce resource in
| PostgreSQL. You can also have shared tables.
|
| A big advantage over row-based sharding is that you can have
| variations of the table definitions or a different set of
| tables altogether in each schema. Plus for the purpose of
| sharding it's easier to use, since normalized data models
| complicate row-based sharding.
| pickledish wrote:
| Hm, question for people a bit more familiar with Postgres -- what
| is meant by "schema" here?
|
| My definition is "the columns and column types of a table", but,
| that doesn't seem to make sense with what they're talking about
| here ("large" and "small" schemas probably aren't referring to
| wide and narrow tables for example, and I don't see how sharding
| by my definition of "schema" could even make sense anyways)
| hans_castorp wrote:
| Schemas are namespaces (actually called that internally in
| Postgres).
|
| The SQL standard defines a two level namespace hierarchy. A
| single "instance" of contains multiple catalogs and each
| catalog contains multiple schemas (and each schema then
| contains objects like tables, views, types, functions etc).
|
| Many database products use the term "database" instead of
| "catalog" e.g. in Postgres and SQL Server. But "schema" is used
| quite uniformly. MySQL's "databases" are in fact "schemas"
| though.
| pickledish wrote:
| Whew boy, that's a more sizable hierarchy than I thought was
| going on under the hood lol, I guess I've got some reading to
| do. TY for the pointers!
| Hovertruck wrote:
| Schema is a proper noun in Postgres:
| https://www.postgresql.org/docs/current/ddl-schemas.html
| mulander wrote:
| Schemas are groupings of tables and other entities that can be
| defined within a database. You can think of them like of a
| namespace in programming languages. You can have the same table
| definition, within the same database, defined multiple times
| (each in a different schema) and each holding different data.
|
| By large and small we are referring to the amount of data each
| schema holds currently. They can grow over time and some of
| them may become very big while others will remain small
| (storage wise).
| pickledish wrote:
| Ah, interesting ok
|
| > You can have the same table definition, within the same
| database, defined multiple times (each in a different schema)
| and each holding different data.
|
| So in this respect, each table within a schema indeed already
| acts like a "shard" of the overall table
|
| Is this enforced? Like, if I create a table "messages" in
| schema A and a table "messages" in table B, must they have
| the same columns/column types, or is that just convention
| mulander wrote:
| They can be different, as long as your application can
| handle them being different (which makes sense for
| microservices as an example).
| pickledish wrote:
| I see! Fascinating, thanks for the help :)
| ubertaco wrote:
| I've seen MySQL clients/docs also use "schema" to refer to
| "this collection of tables", basically what SQLite would call
| "a database".
|
| The words get really fuzzy here when you try to get a concrete
| and ubiquitous definition, though.
|
| Best I can do is to say, in these queries:
| use foo; select id, name from foo.people where id < 10;
|
| ..."foo" is the schema in MySQL and, apparently, Postgres
| parlance.
| paulddraper wrote:
| Not to be rude...but the same things as every other ANSI SQL
| database. Nothing PostgreSQL specific here.
|
| Schema is the thing you get from `CREATE SCHEMA`...a namespace
| of tables/functions/views/etc.
| pickledish wrote:
| Ah I see! Yeah I guess it's just a concept I've never run
| into before; unfortunate that it shares a name with the
| "other" concept of schema used by e.g. https://json-
| schema.org
|
| Thank you!
| metadat wrote:
| Does Oracle support anything like this? Or _any other_ DBMS in
| widespread use, for that matter?
|
| The promise behind this approach to DB sharding has great
| potential. Simultaneously impressive, novel, and badass.
|
| I wish this had been available ten years ago at a few of my
| startups!
| brightball wrote:
| I watched a Citus demo a few months back and it blew me away. I
| hear upgrades are the big challenge.
___________________________________________________________________
(page generated 2023-07-18 23:00 UTC)