[HN Gopher] Postgres Webhooks with Pgstream
       ___________________________________________________________________
        
       Postgres Webhooks with Pgstream
        
       Author : mebcitto
       Score  : 70 points
       Date   : 2024-09-01 10:10 UTC (12 hours ago)
        
 (HTM) web link (xata.io)
 (TXT) w3m dump (xata.io)
        
       | jitl wrote:
       | The GitHub repo README gives a better sense of the capabilities
       | of the pgstream system:
       | https://github.com/xataio/pgstream?tab=readme-ov-file#archit...
       | 
       | For deployments at a more serious scale, it seems they support
       | buffering WAL events into Kafka, similar to Debezium (the current
       | leader for change data capture), to decouple the throughput of
       | the replication slot reader on the Postgres side from the event
       | handlers you deliver the events to.
       | 
       | Pgstream seems more batteries-included compared to using Debezium
       | to consume the PG log; Kafka is optional with pgstream and things
       | like webhook delivery and OpenSearch indexing are packaged in,
       | rather than being a "choose your own adventure" game in the Kafka
       | Streams middleware ecosystem jungle. If their offering's
       | constraints work for your use-case, why not prefer it over
       | Debezium since it seems easier? I'd rather write Go than
       | Java/JVM-language if I need to plug into the pipeline.
       | 
       | However, at any sort of serious scale you'll need Kafka in
       | there, and then I'm less sure you'd make use of the other plug
       | and play stuff like the OpenSearch indexer or webhooks at all.
       | Certainly as volume grows webhooks start to feel like a bad fit
       | for CDC events at the level of a single row change; at least in
       | my brief Debezium CDC experience, my consumer pulls batches of
       | 1000+ changes at once from Kafka.
       | 
       | The other thing I don't see is transaction metadata, maybe I
       | shouldn't worry much about it (not many people seem to be
       | concerned) but I'd like my downstream consumer to have delayed
       | consistency with my Postgres upstream, which means I need to
       | consume record changes with the same transactional grouping as in
       | Postgres, otherwise I'll probably never be consistent in
       | practice: https://www.scattered-thoughts.net/writing/internal-
       | consiste...
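
A minimal Go sketch of the transactional grouping described above. The `Change` type and its `XID` field are hypothetical and are not taken from pgstream; the sketch assumes each change carries the Postgres transaction ID and that changes from one transaction arrive contiguously, as they would from a stream that emits whole committed transactions.

```go
package main

import "fmt"

// Change is a hypothetical row-level CDC event. The XID field assumes the
// upstream exposes the Postgres transaction ID alongside each change.
type Change struct {
	XID   uint32
	Table string
	Op    string
}

// GroupByTransaction splits an ordered stream of changes into one batch per
// transaction. It assumes the changes of a transaction are contiguous.
func GroupByTransaction(changes []Change) [][]Change {
	var groups [][]Change
	for _, c := range changes {
		n := len(groups)
		if n == 0 || groups[n-1][0].XID != c.XID {
			// A new transaction ID starts a new batch.
			groups = append(groups, []Change{c})
			continue
		}
		groups[n-1] = append(groups[n-1], c)
	}
	return groups
}

func main() {
	changes := []Change{
		{XID: 100, Table: "orders", Op: "insert"},
		{XID: 100, Table: "order_items", Op: "insert"},
		{XID: 101, Table: "accounts", Op: "update"},
	}
	for _, g := range GroupByTransaction(changes) {
		// A consumer would apply each batch atomically downstream.
		fmt.Printf("apply xid=%d atomically (%d changes)\n", g[0].XID, len(g))
	}
}
```

Applying each batch in a single downstream transaction is what keeps the consumer from ever observing half of an upstream commit.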
        
       | adamcharnock wrote:
       | A quick google didn't reveal much about pgstream's delivery
       | semantics. Do these webhooks get called at most once, or at least
       | once? I hope the latter, but in the absence of information I'd guess
       | it is the former.
        
         | simonw wrote:
         | It looks to me like it's at-most-once. This code here
         | https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c...
         | logs an error if a delivery fails but does not appear to queue
         | it for a second attempt.
        
           | jitl wrote:
           | Sad! Suitable for driving "best effort" use cases like
           | reactive UI updates or stale-while-revalidate cache
           | refreshes, but not usable for "business logic" like updating
           | user access lists, billing, sending emails, or maintaining a
           | secondary index of any sort.
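
For the "business logic" cases listed above, the usual pairing is at-least-once delivery on the sender plus an idempotent receiver. The sketch below shows only the receiver side, and it is not something pgstream provides: `Deduper` and the event ID are hypothetical, and the in-memory map stands in for what would need to be durable storage in practice.

```go
package main

import (
	"fmt"
	"sync"
)

// Deduper remembers which event IDs were already processed. It is in-memory
// only; a real receiver would persist this set (an assumption of the sketch).
type Deduper struct {
	mu   sync.Mutex
	seen map[string]struct{}
}

func NewDeduper() *Deduper {
	return &Deduper{seen: make(map[string]struct{})}
}

// Handle runs apply at most once per event ID. It returns true if apply ran
// and succeeded, false for a duplicate. If apply fails, the ID is not
// recorded, so a later redelivery gets another chance. The lock is held while
// apply runs, which serializes handlers for simplicity.
func (d *Deduper) Handle(eventID string, apply func() error) (bool, error) {
	d.mu.Lock()
	defer d.mu.Unlock()
	if _, ok := d.seen[eventID]; ok {
		return false, nil
	}
	if err := apply(); err != nil {
		return false, err
	}
	d.seen[eventID] = struct{}{}
	return true, nil
}

func main() {
	d := NewDeduper()
	sent := 0
	sendEmail := func() error { sent++; return nil }
	// The same event is delivered twice, as at-least-once delivery allows.
	for i := 0; i < 2; i++ {
		ran, err := d.Handle("evt-42", sendEmail)
		fmt.Println("ran:", ran, "err:", err)
	}
	fmt.Println("emails sent:", sent)
}
```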
        
       | simonw wrote:
       | Implementing webhook deliveries is one of those things that's way
       | harder than you would initially imagine. The two things I always
       | look for in systems like this are:
       | 
       | 1. Can it handle deliveries to slow or unreliable endpoints? In
       | particular, what happens if the external server is deliberately
       | slow to respond - someone could be trying to crash your system by
       | forcing it to deliver to slow-loading endpoints.
       | 
       | 2. How are retries handled? If the server returns a 500, a good
       | webhooks system will queue things up for re-delivery with an
       | exponential backoff and try a few more times before giving up
       | completely.
       | 
       | Point 1 only matters if you are delivering webhooks to untrusted
       | endpoints - systems like GitHub where anyone can sign up for hook
       | deliveries.
       | 
       | Point 2 is more important.
       | 
       | https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c...
       | shows that the HTTP client can be configured with a timeout
       | (which defaults to 10s
       | https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... )
       | 
       | From looking at
       | https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c...
       | it doesn't look like this system handles retries.
       | 
       | Retrying would definitely be a useful addition. PostgreSQL is a
       | great persistence store for recording failures and retry
       | attempts, so that feature would be a good fit for this system.
        
         | maxbond wrote:
         | Really any time you're using a distributed system, you should
         | ask yourself, do I need retries? Do I need circuit breakers? Do
         | I need rate limits?
         | 
         | The answers are probably either "yes" or "yes, but not yet."
        
         | tasn wrote:
         | They should defo use Svix[1], I'll reach out to the blog
         | author, this looks like a cool blog post and use-case.
         | 
         | 1: https://www.svix.com
        
         | kgeist wrote:
         | >a good webhooks system will queue things up for re-delivery
         | with an exponential backoff and try a few more times before
         | giving up completely.
         | 
         | Microsoft's Graph API also requires consumers to re-register
         | webhooks every N days, a kind of "heartbeat" to make sure the
         | webhooks aren't sent to a dead server forever.
        
       | canadiantim wrote:
       | Xata putting out some cool pg things. Is pgroll still looking
       | good these days? Seemed pretty sexy last time I looked at it.
        
       | tazu wrote:
       | Curious if there's a reason this extension is written in Go and
       | not Zig? The blog posted [1] a few months ago about their cool
       | Zig library for making extensions and I've been playing around
       | with it. Is Zig just not mature enough?
       | 
       | [1]: https://xata.io/blog/introducing-pgzx
        
         | gobblegobble2 wrote:
         | It's not an extension. It's a standalone app that connects to
         | postgres and listens to a replication stream.
        
       | oars wrote:
       | Postgres webhooks could be very useful for me. Thanks for
       | sharing.
        
       ___________________________________________________________________
       (page generated 2024-09-01 23:00 UTC)