[HN Gopher] Postgres Webhooks with Pgstream
___________________________________________________________________
Postgres Webhooks with Pgstream
Author : mebcitto
Score : 70 points
Date : 2024-09-01 10:10 UTC (12 hours ago)
(HTM) web link (xata.io)
(TXT) w3m dump (xata.io)
| jitl wrote:
| The GitHub repo README gives a better sense of the capabilities
| of the pgstream system:
| https://github.com/xataio/pgstream?tab=readme-ov-file#archit...
|
| For deployments at a more serious scale, it seems they support
| buffering WAL events into Kafka, similar to Debezium (the current
| leader for change data capture), to de-couple the replication
| slot reader on the Postgres side throughput from the event
| handlers you deliver the events to.
|
| Pgstream seems more batteries-included compared to using Debezium
| to consume the PG log; Kafka is optional with pgstream and things
| like webhook delivery and OpenSearch indexing are packaged in,
 | rather than being a "choose your own adventure" trek through
 | the Kafka Streams middleware ecosystem jungle. If their
 | offering's constraints work for your use-case, why not prefer
 | it over Debezium, since it seems easier? I'd rather write Go
 | than a JVM language if I need to plug into the pipeline.
|
| However at any sort of serious scale you'll need the Kafka in
| there, and then I'm less sure you'd make use of the other plug
| and play stuff like the OpenSearch indexer or webhooks at all.
| Certainly as volume grows webhooks start to feel like a bad fit
| for CDC events at the level of a single row change; at least in
| my brief Debezium CDC experience, my consumer pulls batches of
| 1000+ changes at once from Kafka.
|
 | The other thing I don't see is transaction metadata. Maybe I
 | shouldn't worry much about it (not many people seem concerned),
 | but I'd like my downstream consumer to have delayed consistency
 | with my Postgres upstream, which means I need to consume record
 | changes with the same transactional grouping as in Postgres;
 | otherwise I'll probably never be consistent in
| practice: https://www.scattered-thoughts.net/writing/internal-
| consiste...
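The transactional grouping jitl describes can be sketched in Go. This is a minimal illustration with hypothetical event shapes, not pgstream's actual types: logical decoding emits BEGIN/COMMIT markers around each transaction's row changes, and the consumer buffers changes until the commit so it applies them atomically.

```go
package main

import "fmt"

// walEvent is a hypothetical, simplified shape for illustration;
// pgstream's real WAL event types differ.
type walEvent struct {
	Kind string // "begin", "change", or "commit"
	Xid  uint32 // transaction id
	Row  string // row-change payload
}

// groupByTransaction buffers row changes between BEGIN and COMMIT
// markers, so a downstream consumer can apply each transaction as a
// unit and preserve the grouping Postgres used upstream.
func groupByTransaction(events []walEvent) [][]walEvent {
	var txns [][]walEvent
	var current []walEvent
	for _, ev := range events {
		switch ev.Kind {
		case "begin":
			current = nil
		case "change":
			current = append(current, ev)
		case "commit":
			txns = append(txns, current)
		}
	}
	return txns
}

func main() {
	stream := []walEvent{
		{Kind: "begin", Xid: 101},
		{Kind: "change", Xid: 101, Row: "insert users id=1"},
		{Kind: "change", Xid: 101, Row: "insert audit id=9"},
		{Kind: "commit", Xid: 101},
		{Kind: "begin", Xid: 102},
		{Kind: "change", Xid: 102, Row: "update users id=1"},
		{Kind: "commit", Xid: 102},
	}
	for i, txn := range groupByTransaction(stream) {
		fmt.Printf("txn %d: %d changes\n", i, len(txn))
	}
}
```

Without this grouping, a consumer that applies row changes one at a time can expose states that never existed in the source database.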
| adamcharnock wrote:
| A quick google didn't reveal much about pgstream's delivery
| semantics. Do these webhooks get called at most once, or at least
 | once? I hope the latter, but in the absence of information I'd
| it is the former.
| simonw wrote:
| It looks to me like it's at-most-once. This code here
| https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c...
| logs an error if a delivery fails but does not appear to queue
| it for a second attempt.
| jitl wrote:
| Sad! Suitable for driving "best effort" use cases like
| reactive UI updates or stale-while-revalidate cache
| refreshes, but not usable for "business logic" like updating
| user access lists, billing, sending emails, or maintaining a
| secondary index of any sort.
| simonw wrote:
| Implementing webhook deliveries is one of those things that's way
| harder than you would initially imagine. The two things I always
| look for in systems like this are:
|
 | 1. Can it handle deliveries to slow or unreliable endpoints? In
| particular, what happens if the external server is deliberately
| slow to respond - someone could be trying to crash your system by
| forcing it to deliver to slow-loading endpoints.
|
| 2. How are retries handled? If the server returns a 500, a good
| webhooks system will queue things up for re-delivery with an
| exponential backoff and try a few more times before giving up
| completely.
|
| Point 1. only matters if you are delivering webhooks to untrusted
| endpoints - systems like GitHub where anyone can sign up for hook
| deliveries.
|
| 2. is more important.
|
| https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c...
| shows that the HTTP client can be configured with a timeout
| (which defaults to 10s
| https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c... )
|
| From looking at
| https://github.com/xataio/pgstream/blob/bab0a8e665d37441351c...
| it doesn't look like this system handles retries.
|
| Retrying would definitely be a useful addition. PostgreSQL is a
| great persistence store for recording failures and retry
| attempts, so that feature would be a good fit for this system.
| maxbond wrote:
| Really any time you're using a distributed system, you should
| ask yourself, do I need retries? Do I need circuit breakers? Do
| I need rate limits?
|
| The answers are probably either "yes" or "yes, but not yet."
| tasn wrote:
 | They should defo use Svix[1]. I'll reach out to the blog
 | author; this looks like a cool blog post and use-case.
|
| 1: https://www.svix.com
| kgeist wrote:
| >a good webhooks system will queue things up for re-delivery
| with an exponential backoff and try a few more times before
| giving up completely.
|
| Microsoft's Graph API also requires consumers to re-register
| webhooks every N days, a kind of "heartbeat" to make sure the
| webhooks aren't sent to a dead server forever.
| canadiantim wrote:
 | Xata putting out some cool pg things. Is pgroll still looking
 | good these days? Seemed pretty sexy last time I looked at it.
| tazu wrote:
| Curious if there's a reason this extension is written in Go and
| not Zig? The blog posted [1] a few months ago about their cool
| Zig library for making extensions and I've been playing around
| with it. Is Zig just not mature enough?
|
| [1]: https://xata.io/blog/introducing-pgzx
| gobblegobble2 wrote:
| It's not an extension. It's a standalone app that connects to
| postgres and listens to a replication stream.
| oars wrote:
 | Postgres webhooks could be very useful for me. Thanks for
| sharing.
___________________________________________________________________
(page generated 2024-09-01 23:00 UTC)