[HN Gopher] Show HN: Pg_replicate - Build Postgres replication a...
___________________________________________________________________
Show HN: Pg_replicate - Build Postgres replication applications in
Rust
Author : imor80
Score : 101 points
Date : 2024-08-10 15:00 UTC (7 hours ago)
(HTM) web link (github.com)
(TXT) w3m dump (github.com)
| imor80 wrote:
| Hey HN,
|
| For the past few months, as part of my job at Supabase, I have
| been working on pg_replicate. pg_replicate lets you very easily
| build applications which can copy data (full table copies and
| CDC) from Postgres to any other data system. Around six months
| back I was figuring out what can be built by tailing Postgres'
| WAL. pg_replicate grew organically out of that effort. Many
| similar tools, like Debezium, exist already which do a great job,
| but pg_replicate is much simpler and focussed only on Postgres.
| Rust was used in the project because I am most comfortable with
| it. pg_replicate abstracts over the Postgres logical replication
| protocol[0] and lets you work with higher level concepts. There
| are three main concepts to understand pg_replicate: source, sink
| and pipeline.
|
| 1/ A source is a Postgres db from which data is to be copied. 2/
| A sink is a data system into which data will be copied. 3/ A
| pipeline connects a source to a sink.
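|
| As a rough illustration of how the three concepts fit together,
| here is a minimal in-memory sketch. Every name in it (Event,
| Source, Sink, Pipeline, VecSource, VecSink) is hypothetical and
| chosen for this example; it is not pg_replicate's actual API,
| which is async and talks to a real Postgres instance.

```rust
use std::collections::VecDeque;

// Hypothetical types for illustration only; not pg_replicate's real API.

/// A change captured from the source.
#[derive(Debug, Clone, PartialEq)]
pub enum Event {
    Insert { table: String, row: Vec<String> },
    Delete { table: String, key: String },
}

/// Something events are read from (stands in for a Postgres db).
pub trait Source {
    /// Returns the next event, or None when nothing is left.
    fn next_event(&mut self) -> Option<Event>;
}

/// Something events are written to (stands in for the target system).
pub trait Sink {
    fn write(&mut self, event: Event);
}

/// Connects one source to one sink.
pub struct Pipeline<S: Source, K: Sink> {
    pub source: S,
    pub sink: K,
}

impl<S: Source, K: Sink> Pipeline<S, K> {
    /// Drains the source into the sink; returns how many events were copied.
    pub fn run(&mut self) -> usize {
        let mut copied = 0;
        while let Some(event) = self.source.next_event() {
            self.sink.write(event);
            copied += 1;
        }
        copied
    }
}

/// In-memory source backed by a queue of events.
pub struct VecSource(pub VecDeque<Event>);

impl Source for VecSource {
    fn next_event(&mut self) -> Option<Event> {
        self.0.pop_front()
    }
}

/// In-memory sink that just collects what it receives.
#[derive(Default)]
pub struct VecSink(pub Vec<Event>);

impl Sink for VecSink {
    fn write(&mut self, event: Event) {
        self.0.push(event);
    }
}
```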
|
| Currently pg_replicate supports BigQuery, DuckDB local file, and
| MotherDuck as sinks. More sinks will be added in the future. To
| support a new data system, you just need to implement the
| BatchSink trait (the older Sink trait will be deprecated soon).
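|
| To give a feel for what a batch-oriented sink involves, here is
| a simplified, synchronous sketch. BatchSinkSketch, Batcher and
| CountingSink are made-up names for this example; the real
| BatchSink trait has its own methods and signatures, which are
| not reproduced here, so check the repository before relying on
| any of this.

```rust
// Hypothetical stand-in for a batch sink; NOT the real BatchSink trait.
pub trait BatchSinkSketch {
    /// Writes a whole batch of rows to the target system in one call.
    fn write_batch(&mut self, rows: &[String]) -> Result<(), String>;
}

/// Buffers rows and hands them to the sink once `max_batch` is reached.
pub struct Batcher<K: BatchSinkSketch> {
    sink: K,
    buffer: Vec<String>,
    max_batch: usize,
}

impl<K: BatchSinkSketch> Batcher<K> {
    pub fn new(sink: K, max_batch: usize) -> Self {
        // A batch size of zero would never flush, so clamp it to one.
        Batcher { sink, buffer: Vec::new(), max_batch: max_batch.max(1) }
    }

    /// Adds a row and flushes if the buffer is full.
    pub fn push(&mut self, row: String) -> Result<(), String> {
        self.buffer.push(row);
        if self.buffer.len() >= self.max_batch {
            self.flush()
        } else {
            Ok(())
        }
    }

    /// Sends any buffered rows to the sink; a no-op on an empty buffer.
    pub fn flush(&mut self) -> Result<(), String> {
        if self.buffer.is_empty() {
            return Ok(());
        }
        self.sink.write_batch(&self.buffer)?;
        self.buffer.clear();
        Ok(())
    }

    /// Flushes the remainder and returns the sink.
    pub fn into_sink(mut self) -> Result<K, String> {
        self.flush()?;
        Ok(self.sink)
    }
}

/// Example sink that only records the size of each batch it receives.
#[derive(Default)]
pub struct CountingSink {
    pub batch_sizes: Vec<usize>,
}

impl BatchSinkSketch for CountingSink {
    fn write_batch(&mut self, rows: &[String]) -> Result<(), String> {
        self.batch_sizes.push(rows.len());
        Ok(())
    }
}
```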
|
| pg_replicate is still under heavy development and is a little
| thin on documentation. Performance is another area which hasn't
| received much attention. We are releasing this to get feedback
| from the community and are still evaluating how (or if) we can
| integrate it with the Supabase platform. Comments and feedback
| are welcome.
|
| [0] Postgres logical replication protocol:
| https://www.postgresql.org/docs/current/protocol-logical-rep...
| convolvatron wrote:
| I've recently been playing with the logical replication
| protocol, and it enables all kinds of interesting usages. One
| really cool thing is that you see the transactional boundaries,
| so not only can you write a cache, you can do so in a way that's
| internally consistent.
|
| It's also inherently much nicer than listen/notify, since you
| don't have to go back and figure out what data was associated
| with the event.
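|
| A minimal sketch of the consistent-cache idea, assuming the
| stream has already been decoded into begin / change / commit
| messages. Msg and TxCache are hypothetical names for this
| example, and the key/value shape is a simplification of what
| the protocol actually carries. Changes are staged and only
| become visible at commit, so readers never see half of a
| transaction.

```rust
use std::collections::HashMap;

/// Simplified, hypothetical view of decoded replication messages.
pub enum Msg {
    Begin,
    Upsert { key: String, value: String },
    Delete { key: String },
    Commit,
}

/// Cache that exposes changes only once their transaction commits.
pub struct TxCache {
    visible: HashMap<String, String>,
    pending: Vec<Msg>,
}

impl TxCache {
    pub fn new() -> Self {
        TxCache { visible: HashMap::new(), pending: Vec::new() }
    }

    /// Feeds one message from the stream into the cache.
    pub fn apply(&mut self, msg: Msg) {
        match msg {
            // A new transaction starts with an empty staging area.
            Msg::Begin => self.pending.clear(),
            // On commit, publish every staged change in order.
            Msg::Commit => {
                for change in self.pending.drain(..) {
                    match change {
                        Msg::Upsert { key, value } => {
                            self.visible.insert(key, value);
                        }
                        Msg::Delete { key } => {
                            self.visible.remove(&key);
                        }
                        Msg::Begin | Msg::Commit => {}
                    }
                }
            }
            // Row changes are staged until the commit arrives.
            change => self.pending.push(change),
        }
    }

    /// Reads a committed value.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.visible.get(key).map(String::as_str)
    }
}
```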
| chucky_z wrote:
| I'm curious in your experience how many clients can run
| pg_replicate at once?
|
| With MySQL I saw the interesting use-case of the black hole
| storage engine to scale out replication logs but ultimately the
| only usage I'm aware of was for scaling other mysql read
| replicas.
|
| The idea of scaling an application by tailing logs from a
| database sounds very interesting to me, and I'm curious if
| you've explored this at all. There are of course things like
| Kafka (and then things like Debezium), but it's hard to beat
| direct!
| necubi wrote:
| This is so cool! I appreciate Debezium for its wide DB support
| (no part of me wants to know the inner workings of MSSQL's
| replication protocol) but it's finicky to run. Great to have
| an alternative, at least for postgres.
| phamilton wrote:
| Postgres + Rust is one of the most exciting intersections of tech
| I've seen in a while.
|
| There's external tooling like this project, but postgres
| extensions in Rust are exciting.
|
| Full extensions via pgrx have been cool to see, but plrust +
| pg_tle is also starting to show up.
|
| If you aren't familiar with TLE (Trusted Language Extensions), it
| is a postgres extension from AWS that created some privileged
| interfaces for procedural languages (used for user-defined
| functions) to do some extra stuff. Right now it's mostly auth-
| related hooks but my hope is that it expands in the future.
|
| Plrust is a procedural language extension for Rust, allowing user
| defined functions written in Rust.
|
| The combination of those two could open up a world of rich
| extensions usable in managed hosted environments like RDS.
| rubenfiszel wrote:
| This is super timely.
|
| Windmill (https://windmill.dev) used to only support webhooks to
| trigger code and flow jobs. We have just added email support by
| building our own MX server, and wanted to add CDC as well. We were
| gonna do it on Debezium but this will allow us to remove the need
| for a third-party service and just add this as a crate. Thank you
| Supabase for open-sourcing this.
| eknkc wrote:
| I was trying out the stdout example. Could not get it to log
| anything. DuckDB example worked so I went digging into the
| source. Apparently the stdout sink is using tracing and I did not
| have a `RUST_LOG` env var set.
|
| Might be a good idea to have it documented or have the default
| level set to info for the stdout example.
|
| Maybe this is common Rust knowledge and I just don't know what
| I'm doing though.
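|
| For what it's worth, the "default to info" fallback could look
| roughly like this. It is a std-only sketch with made-up function
| names; how the resulting filter string gets handed to the
| tracing setup in the example is left out, since I have not
| checked how the example wires that up.

```rust
use std::env;

/// Returns the given filter if it is set and non-empty, else "info".
pub fn log_filter_or_default(value: Option<String>) -> String {
    match value {
        Some(v) if !v.trim().is_empty() => v,
        _ => "info".to_string(),
    }
}

/// Reads RUST_LOG from the environment, falling back to "info".
pub fn log_filter_from_env() -> String {
    log_filter_or_default(env::var("RUST_LOG").ok())
}
```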
| stlava wrote:
| Nice! I'm one of the authors of pg-bifrost which is in the same
| space. Have you thought about, or solved, sharding consumption
| across multiple slots / multiple consumers to increase throughput?
| This is on my radar but not something I've investigated yet.
|
| The issue we've run into is that some team at work decides to
| rewrite an entire table and things get backed up until they stop
| updating rows.
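|
| One naive way to split the work, sketched below, is to assign
| each table to a consumer by hashing its name, with the idea
| that each group would get its own publication and slot. The
| function names are hypothetical and this is not how pg-bifrost
| or pg_replicate work today. It also gives up cross-table
| transactional ordering, and a single hot table still lands on
| one consumer; it only keeps that table from blocking the rest.

```rust
/// FNV-1a hash, written out so the mapping is stable across runs and builds.
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash: u64 = 0xcbf29ce484222325;
    for byte in bytes {
        hash ^= u64::from(*byte);
        hash = hash.wrapping_mul(0x100000001b3);
    }
    hash
}

/// Maps a table name to a consumer index in `0..consumers`.
pub fn consumer_for(table: &str, consumers: usize) -> usize {
    assert!(consumers > 0, "need at least one consumer");
    (fnv1a(table.as_bytes()) % consumers as u64) as usize
}

/// Groups tables by consumer index; group i would feed one slot.
pub fn assign_tables(tables: &[&str], consumers: usize) -> Vec<Vec<String>> {
    let mut groups = vec![Vec::new(); consumers];
    for table in tables {
        groups[consumer_for(table, consumers)].push(table.to_string());
    }
    groups
}
```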
___________________________________________________________________
(page generated 2024-08-10 23:00 UTC)