[HN Gopher] Partitioning in Postgres, 2022 Edition
___________________________________________________________________
Partitioning in Postgres, 2022 Edition
Author : plaur782
Score : 67 points
Date : 2022-10-05 19:02 UTC (3 hours ago)
(HTM) web link (brandur.org)
(TXT) w3m dump (brandur.org)
| nullwarp wrote:
| We use Postgres partitioning quite successfully to handle our
| customer based data. Everything is partitioned by customer and
| some customers are partitioned further.
|
| One gotcha to be careful with is that if you run a query spanning
| multiple partitions, it will run them all at once and if your
| database isn't super big - will bring it to its knees.
|
| Outside of that really no issues. We also use Timescale quite
| heavily, which also works fantastic.
| makr17 wrote:
| IIRC, it won't run them all at the _same time_, it will run
| them in parallel, which ends up being limited by the value of
| `max_parallel_workers` in your config. So if you find that
| parallel operations are swamping your DB, maybe trying stepping
| that config down a bit. In general, queries that span several
| partitions shouldn't be a problem. Hitting _all_ partitions for
| a really big table, that's another thing entirely. But since
| you're partitioned you really ought to be limiting by the value
| of the partition key in your queries anyways...
| Ozzie_osman wrote:
| Do you have any notes on how you set this up? Did you use
| extensions and if so which?
| didgetmaster wrote:
| I am building my own database engine using some data objects
| (key-value stores) I invented to form columnar store tables. It
| has some really fast query speeds and analytic features (e.g.
| pivot tables) that test favorably compared to Postgres and other
| RDBMS offerings. (https://www.youtube.com/watch?v=OVICKCkWMZE)
|
| Partitioning a big table is definitely on my TODO list. How big
| does a typical table need to grow before partitioning is seen as
| a 'necessity'? What are some ways current partitioning strategies
| have made things too difficult?
| eatonphil wrote:
| That's awesome. Link to your repo or any posts you've written
| about your progress?
| didgetmaster wrote:
| I haven't decided yet which parts I am going to open source
| or which license I am going to use; but the binaries are
| available for free download on our website
| https://www.Didgets.com and I have a newsletter at
| Didgets.substack.com
| eatonphil wrote:
| Is Postgres ever going to allow automatic partitioning?
|
| Something like: CREATE TABLE logs (
| time TIMESTAMPTZ, data JSONB ) PARTITION ON
| time IN INTERVAL '1 day'
|
| And it just creates/manages these partitions for every day?
|
| If you can already manage partitions like this manually it feels
| like the next step is to just have it be automatic. So you have
| less of a need to switch to Timescale or ClickHouse or whatever
| other database as the amount of data you're storing/querying
| grows. (Yeah that's a handwave-y suggestion but at least you
| could stick with Postgres for longer.)
| yawgmoth wrote:
| pg_partman, which is included in most distributions of Postgres
| (including RDS) can handle some cases
|
| - https://github.com/pgpartman/pg_partman
|
| -
| https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Postg...
| boomskats wrote:
| pg_partman is awesome. The SQL syntax for partitioning isn't
| dissimilar to your example (since PG11?), but like you say,
| it's the automatic creation and cleanup of partitions that's
| it's missing. pg_partman used to have to do a lot more
| things, but nowadays you can just have it do the job of
| creating X future partitions for you at a given interval, and
| truncating partitions older than retention period Y.
|
| It's a great extension. We have used it in production for a
| few years now, as part of a commercial on-prem observability
| tool (an app that keeps around 1TB of data at any one time,
| written at avg 15mb/sec, maybe 500ish partitions, user
| configurable retention strategy). It is extremely reliable.
| Support for Keith F working on it is one of the reasons I
| like Crunchy.
|
| edit: I clearly didn't read the link before writing this
| eatonphil wrote:
| Timescale is just an extension too, but still I wonder why
| Postgres should not support automatic partition management
| themselves.
| CSDude wrote:
| Timescale is not just an extension. It offers many features
| and a restrictive license. Partman is just a utility. But I
| agree something like partman should be part of core
| Postgres
| xdanger wrote:
| It's just an extension. You can build the Apache 2.0 part
| of the extension and ignore the Timescale License parts.
| Those parts are mainly about multinode, compression and
| continuous aggregates.
|
| TSL is mainly about not competing with their cloud
| offerings. So you can't run a database-as-a-service for
| time series data with it.
|
| More about the license:
| https://www.timescale.com/blog/building-open-source-
| business...
|
| Comparison about the open source and community editions:
| https://docs.timescale.com/timescaledb/latest/timescaledb
| -ed...
___________________________________________________________________
(page generated 2022-10-05 23:01 UTC)