[HN Gopher] Show HN: Pg_replicate - Build Postgres replication a...
       ___________________________________________________________________
        
       Show HN: Pg_replicate - Build Postgres replication applications in
       Rust
        
       Author : imor80
       Score  : 101 points
       Date   : 2024-08-10 15:00 UTC (7 hours ago)
        
 (HTM) web link (github.com)
 (TXT) w3m dump (github.com)
        
       | imor80 wrote:
       | Hey HN,
       | 
       | For the past few months, as part of my job at Supabase, I have
       | been working on pg_replicate. pg_replicate lets you very easily
       | build applications which can copy data (full table copies and
       | cdc) from Postgres to any other data system. Around six months
       | back I was figuring out what can be built by tailing Postgres'
       | WAL. pg_replicate grew organically out of that effort. Many
       | similar tools, like Debezium, exist already which do a great job,
       | but pg_replicate is much simpler and focussed only on Postgres.
       | Rust was used in the project because I am most comfortable with
       | it. pg_replicate abstracts over the Postgres logical replication
       | protocol[0] and lets you work with higher level concepts. There
       | are three main concepts to understand pg_replicate: source, sink
       | and pipeline.
       | 
       | 1/ A source is a Postgres db from which data is to be copied.
       | 2/ A sink is a data system into which data will be copied.
       | 3/ A pipeline connects a source to a sink.
       | 
       | Currently pg_replicate supports BigQuery, DuckDB local file, and
       | MotherDuck as sinks. More sinks will be added in the future. To
       | support a new data system, you just need to implement the
       | BatchSink trait (the older Sink trait will be deprecated soon).
       | 
       | pg_replicate is still under heavy development and is a little
       | thin on documentation. Performance is another area which hasn't
       | received much attention. We are releasing this to get feedback
       | from the community and are still evaluating how (or if) we can
       | integrate it with the Supabase platform. Comments and feedback
       | are welcome.
       | 
       | [0] Postgres logical replication protocol:
       | [https://www.postgresql.org/docs/current/protocol-logical-rep...)
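The source / sink / pipeline split described above can be sketched in a few lines of Rust. This is only an illustration under stated assumptions: every name below (`Change`, `SketchSource`, `SketchBatchSink`, `SketchPipeline`, `VecSource`, `MemorySink`) is hypothetical and is not taken from the pg_replicate API, and the real crate's traits may well look different (for example, they may be async).

```rust
use std::collections::VecDeque;

// Hypothetical names throughout: this is NOT the pg_replicate API.

/// One change read from the source.
#[derive(Debug, Clone, PartialEq)]
pub enum Change {
    Insert { table: String, row: Vec<String> },
    Delete { table: String, key: String },
}

/// A source yields batches of changes (in the post's terms, a Postgres db).
pub trait SketchSource {
    /// Returns the next batch, or None once the stream is exhausted.
    fn next_batch(&mut self) -> Option<Vec<Change>>;
}

/// A sink accepts batches (in the post's terms, some other data system).
pub trait SketchBatchSink {
    fn write_batch(&mut self, batch: Vec<Change>) -> Result<(), String>;
}

/// A pipeline connects one source to one sink.
pub struct SketchPipeline<S, K> {
    source: S,
    sink: K,
}

impl<S: SketchSource, K: SketchBatchSink> SketchPipeline<S, K> {
    pub fn new(source: S, sink: K) -> Self {
        Self { source, sink }
    }

    /// Drains the source into the sink; returns how many changes were copied.
    pub fn run(&mut self) -> Result<usize, String> {
        let mut copied = 0;
        while let Some(batch) = self.source.next_batch() {
            copied += batch.len();
            self.sink.write_batch(batch)?;
        }
        Ok(copied)
    }

    /// Hands the sink back so the caller can inspect it.
    pub fn into_sink(self) -> K {
        self.sink
    }
}

/// In-memory source used only for demonstration.
pub struct VecSource {
    batches: VecDeque<Vec<Change>>,
}

impl VecSource {
    pub fn new(batches: Vec<Vec<Change>>) -> Self {
        Self { batches: batches.into() }
    }
}

impl SketchSource for VecSource {
    fn next_batch(&mut self) -> Option<Vec<Change>> {
        self.batches.pop_front()
    }
}

/// In-memory sink that simply records every change it receives.
#[derive(Default)]
pub struct MemorySink {
    pub written: Vec<Change>,
}

impl SketchBatchSink for MemorySink {
    fn write_batch(&mut self, batch: Vec<Change>) -> Result<(), String> {
        self.written.extend(batch);
        Ok(())
    }
}
```

In this sketch, supporting a new data system means writing one more `SketchBatchSink` implementation; the pipeline loop does not change.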
        
         | convolvatron wrote:
         | I've recently been playing with the logical replication
         | protocol, and it enables all kinds of interesting usages. One
         | really cool thing is that you see the transactional boundaries,
         | so not only can you write a cache, you can do so in a way that's
         | internally consistent.
         | 
         | It's also inherently much nicer than listen/notify, since you
         | don't have to go back and figure out what data was associated
         | with the event
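A rough sketch of the transactionally consistent cache idea from the comment above: changes are buffered between a begin and a commit marker and only become visible to readers at commit. The `Message` enum and `ConsistentCache` type are hypothetical simplifications made up for this example; the real protocol messages carry relation ids, tuple data, LSNs and more.

```rust
use std::collections::HashMap;

/// Simplified stand-ins for logical replication messages (hypothetical types).
#[derive(Debug, Clone)]
pub enum Message {
    Begin,
    Upsert { key: String, value: String },
    Delete { key: String },
    Commit,
}

/// A cache whose readers only ever observe fully committed transactions.
#[derive(Default)]
pub struct ConsistentCache {
    committed: HashMap<String, String>,
    pending: Vec<Message>,
}

impl ConsistentCache {
    /// Feeds one replication message into the cache.
    pub fn apply(&mut self, msg: Message) {
        match msg {
            // A new transaction starts: discard any half-received leftovers.
            Message::Begin => self.pending.clear(),
            // The transaction is complete: apply its changes in order.
            Message::Commit => {
                for change in self.pending.drain(..) {
                    match change {
                        Message::Upsert { key, value } => {
                            self.committed.insert(key, value);
                        }
                        Message::Delete { key } => {
                            self.committed.remove(&key);
                        }
                        // Markers are never buffered; nothing to do.
                        Message::Begin | Message::Commit => {}
                    }
                }
            }
            // Row changes stay invisible until the matching Commit arrives.
            change => self.pending.push(change),
        }
    }

    /// Reads the committed view only.
    pub fn get(&self, key: &str) -> Option<&str> {
        self.committed.get(key).map(String::as_str)
    }
}
```

Because `apply` takes `&mut self`, a single-threaded reader can never observe a half-applied transaction here; a concurrent version would need a lock or a snapshot swap around the commit step.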
        
         | chucky_z wrote:
         | I'm curious in your experience how many clients can run
         | pg_replicate at once?
         | 
         | With MySQL I saw the interesting use-case of the black hole
         | storage engine to scale out replication logs but ultimately the
         | only usage I'm aware of was for scaling other mysql read
         | replicas.
         | 
         | The idea of scaling an application by tailing logs from a
         | database sounds very interesting to me, and I'm curious if
         | you've explored this at all. There's of course things like
         | Kafka (and then things like Debezium), but it's hard to beat
         | direct!
        
         | necubi wrote:
         | This is so cool! I appreciate Debezium for its wide DB support
         | (no part of me wants to know the inner workings of MSSQL's
         | replication protocol) but it's finicky to run. Great to have
         | an alternative, at least for postgres.
        
       | phamilton wrote:
       | Postgres + Rust is one of the most exciting intersections of tech
       | I've seen in a while.
       | 
       | There's external tooling like this project, but postgres
       | extensions in Rust are exciting.
       | 
       | Full extensions via pgrx have been cool to see, but plrust +
       | pg_tle is also starting to show up.
       | 
       | If you aren't familiar with TLE (Trusted Language Extensions), it
       | is a postgres extension from AWS that created some privileged
       | interfaces for procedural languages (used for user-defined
       | functions) to do some extra stuff. Right now it's mostly auth-
       | related hooks but my hope is that it expands in the future.
       | 
       | Plrust is a procedural language extension for Rust, allowing user
       | defined functions written in Rust.
       | 
       | The combination of those two could open up a world of rich
       | extensions usable in managed hosted environments like RDS.
        
       | rubenfiszel wrote:
       | This is super timely.
       | 
       | Windmill (https://windmill.dev) used to only support webhooks to
       | trigger code and flow jobs. We have just added email support
       | building our own MX server, and wanted to add CDC. We were
       | gonna do it on Debezium but this will allow us to remove the need
       | for a third-party service and just add this as a crate. Thank you
       | supabase for open-sourcing this.
        
       | eknkc wrote:
       | I was trying out the stdout example. Could not get it to log
       | anything. DuckDB example worked so I went digging into the
       | source. Apparently the stdout sink is using tracing and I did not
       | have a `RUST_LOG` env var set.
       | 
       | Might be a good idea to have it documented or have the default
       | level set to info for the stdout example.
       | 
       | Maybe this is common Rust knowledge and I just don't know what
       | I'm doing though.
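The suggestion above (fall back to the info level when `RUST_LOG` is unset) could look roughly like this. It is a std-only sketch: `resolve_log_filter` and `log_filter_from_env` are hypothetical helpers, not part of pg_replicate or of the tracing crates, and the resulting string would still have to be handed to whatever logging setup the example actually uses.

```rust
use std::env;

/// Hypothetical helper: choose a log filter, defaulting to "info"
/// when the value is missing or blank.
pub fn resolve_log_filter(env_value: Option<&str>) -> String {
    match env_value.map(str::trim) {
        Some(value) if !value.is_empty() => value.to_string(),
        _ => "info".to_string(),
    }
}

/// Reads RUST_LOG from the process environment and applies the default.
pub fn log_filter_from_env() -> String {
    resolve_log_filter(env::var("RUST_LOG").ok().as_deref())
}
```

With a default like this, running the example without any environment setup would still produce info-level output, while `RUST_LOG=debug` keeps working as an override.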
        
       | stlava wrote:
       | Nice! I'm one of the authors of pg-bifrost which is in the same
       | space. Have you thought about, or solved, sharding consumption
       | across multiple slots / multiple consumers to increase throughput?
       | This is on my radar but not something I've investigated yet.
       | 
       | The issue we've run into is some team at work decides to re-write
       | an entire table and things get backed up until they stop updating
       | rows.
        
       ___________________________________________________________________
       (page generated 2024-08-10 23:00 UTC)