[HN Gopher] Idempotency Keys for Exactly-Once Processing
___________________________________________________________________
Idempotency Keys for Exactly-Once Processing
Author : defly
Score : 53 points
Date : 2025-12-01 12:07 UTC (4 days ago)
(HTM) web link (www.morling.dev)
(TXT) w3m dump (www.morling.dev)
| hinkley wrote:
| Failure resistant systems end up having a bespoke implementation
| of a project management workflow built into them and then
| treating each task like a project to be managed from start to
| finish, with milestones along the way.
| doctorpangloss wrote:
| another POV is that solutions that require no long term
| "durable workflow" style storage provide exponentially more
| value. if you are making something that requires durable
| workflows, you ought to spend a little bit of time in product
| development so that it does _not_ require durable workflows,
| instead of a ton of time making something that isn't very
| useful durable.
|
| for example, you can conceive of a software vendor that does
| the end-to-end of a real estate transaction: escrow, banking,
| signature, etc. The IT required to support the model of such a
| thing would be staggering. Does it make sense to do that kind
| of product development? That is inventing all of SAP, on top of
| solving your actual problem. Or making the mistake of adopting
| temporal, trigger, etc., who think they have a smaller problem
| than making all of SAP and spend considerable resources
| convincing you that they do.
|
| The status quo is that everyone focuses on their little part to
| do it as quickly as possible. The need for durable workflows is
| BAD. You should look at that problem as, make buying and
| selling homes much faster and simpler, or even change the order
| of things so that less durability is required; not re-enact the
| status quo as an IT driven workflow.
| whattheheckheck wrote:
| Interesting thought but how do you sell an idea that sounds
| like...
|
| "How we've been doing things is wrong and I am going to
| redesign it in a way that no one else knows about so I don't
| have to implement the thing that's asked of me"
| doctorpangloss wrote:
| Haha, another way of describing what you are saying is
| enterprise sales: "give people exactly what they ask for,
| not what makes the most sense."
|
| Businesses that require enterprise sales are probably the
| worst performing category of seed investing. They encompass
| all of Ed tech and health tech, which are the two worst
| industry verticals for VC; and Y Combinator has to focus on
| an index of B2B services for other programmers because
| without that constraint, nearly every "do what you are
| asked for" would fail. Most of the IT projects businesses do
| internally fail!
|
| In fact I think the idea you are selling is even harder, it
| is much harder to do B2B enterprise sales than knowing if
| the thing you are making makes sense and is good.
| majormajor wrote:
| Chesterton's Fence, no?
|
| Why are real-estate transactions complex and full of
| paperwork? Because there are history books filled with fraud.
| There are other types of large transactions that also involve
| a lot of paperwork too, for the same reason.
|
| Why does a company have extensive internal tracing of the
| progress of their business processes, and those of their
| customers? Same reason, usually. People want accountability
| and they want to discourage embezzlement and such things.
| leoqa wrote:
| Durable workflows are just distributed state machines. The
| complexity is there because guaranteeing a machine will
| always be available is _impossible_.
| ekjhgkejhgk wrote:
| Here's what I don't understand about distributed systems: TCP
| works amazing, so why not use the same ideas? Every message
| increments a counter, so the receiver can tell the ordering and
| whether some message is missing. Why is this complicated?
| exitb wrote:
| It needs a single consumer to be that simple.
| mkarrmann wrote:
| And a single producer! i.e. it breaks down if you add support
| for fault tolerance
| Etheryte wrote:
| TCP is a one to one relation, distributed systems are many to
| many.
| ekjhgkejhgk wrote:
| You mean like UDP which also works amazing?
| podgietaru wrote:
| UDP doesn't guarantee exactly once processing.
| ewidar wrote:
| Not trying to be snarly, but you should read the article and
| come back to discuss. This specific point is addressed.
| ekjhgkejhgk wrote:
| Can't be bothered, I don't think it's that interesting.
|
| TCP exists and it's amazing.
|
| Multiple cores within a CPU also communicate perfectly.
|
| So this is a solved problem. My suspicion is that the people
| who write articles on "distributed systems" aren't aware of
| what already exists.
| manoDev wrote:
| > The more messages you need to process overall, the more
| attractive a solution centered around monotonically increasing
| sequences becomes, as it allows for space-efficient duplicate
| detection and exclusion, no matter how many messages you have.
|
| It should be the opposite: with more messages you want to scale
| with independent consumers, and a monotonic counter is a disaster
| for that.
|
| You also don't need to worry about dropping old messages if you
| implement your processing to respect the commutative property.
| itishappy wrote:
| > It should be the opposite: with more messages you want to
| scale with independent consumers, and a monotonic counter is a
| disaster for that.
|
| Is there any method for uniqueness testing that works after
| fan-out?
|
| > You also don't need to worry about dropping old messages if
| you implement your processing to respect the commutative
| property.
|
| Commutative property protects if messages are received out of
| order. Duplicates require idempotency.
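|
| A minimal sketch of that distinction, using hypothetical helper
| names: addition tolerates reordering but not redelivery, while a
| set union tolerates both.

```python
def apply_increment(balance: int, amount: int) -> int:
    """Commutative but not idempotent: order is irrelevant, repeats are not."""
    return balance + amount


def apply_seen(seen: frozenset[str], event_id: str) -> frozenset[str]:
    """Commutative and idempotent: set union ignores both order and repeats."""
    return seen | {event_id}


# Reordering two increments gives the same balance ...
in_order = apply_increment(apply_increment(0, 5), 7)
reordered = apply_increment(apply_increment(0, 7), 5)

# ... but redelivering one of them changes it.
duplicated = apply_increment(apply_increment(apply_increment(0, 5), 7), 7)

# The set-based update survives reordering and redelivery alike.
once = apply_seen(apply_seen(frozenset(), "a"), "b")
replayed = apply_seen(apply_seen(apply_seen(frozenset(), "b"), "a"), "b")

print(in_order, reordered, duplicated)  # 12 12 19
print(once == replayed)                 # True
```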
| majormajor wrote:
| You only need monotonicity per producer here, and even with
| independent producer and consumer scaling you can make tracking
| that tractable as long as you can avoid every consumer needing
| to know about every producer while also having a truly huge
| cardinality of producers.
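|
| A rough sketch of per-producer tracking, assuming each producer
| numbers its own messages and delivers them in order. The class
| and producer names are made up for illustration. State is one
| integer per producer, which is why producer cardinality is the
| thing to watch.

```python
from dataclasses import dataclass, field


@dataclass
class SequenceDeduplicator:
    """Tracks the highest sequence number processed for each producer."""

    # producer id -> highest sequence number already processed
    high_water: dict[str, int] = field(default_factory=dict)

    def should_process(self, producer_id: str, seq: int) -> bool:
        """Return True and record seq if it is new for this producer."""
        last = self.high_water.get(producer_id, -1)
        if seq <= last:
            return False  # duplicate or stale redelivery
        self.high_water[producer_id] = seq
        return True


dedup = SequenceDeduplicator()
results = [
    dedup.should_process("producer-a", 1),
    dedup.should_process("producer-a", 2),
    dedup.should_process("producer-a", 2),  # redelivered duplicate
    dedup.should_process("producer-b", 1),  # other producer, own counter
    dedup.should_process("producer-a", 1),  # stale replay
]
print(results)  # [True, True, False, True, False]
```

| Note that a message arriving late with a lower number than the
| high-water mark would be dropped here, so this only fits
| transports that preserve order per producer.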
| bokohut wrote:
| This was my exact solution in the late 1990s that I formulated
| using a uid algorithm I created when confronted with a growing
| payment processing load issue that centralized hardware at the
| time could not handle. MsSQL could not process the ever
| increasing load yet the firehose of real-time payments
| transaction volume could not be turned off so an interim parallel
| solution involving microservices to walk everything over to
| Oracle was devised using this technique. Everything old is new
| again as the patterns and cycles ebb and flow.
| eximius wrote:
| These strategies only really work for stream processing. You also
| want idempotent APIs which won't really work with these. You'd
| probably go for the strategy they pass over which is having it be
| an arbitrary string key and just writing it down with some TTL.
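|
| For the API case, an in-memory sketch of "write the key down with
| a TTL" might look like the following. The store class, the key
| format and the handler are assumptions for illustration; a real
| service would keep this in a shared database or cache with an
| atomic insert rather than a Python dict.

```python
import time


class TtlIdempotencyStore:
    """Remembers the result for each idempotency key until it expires."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        # key -> (expiry time, stored result)
        self._responses: dict[str, tuple[float, object]] = {}

    def run_once(self, key: str, handler):
        """Run handler for a new key; replay the stored result otherwise."""
        now = self.clock()
        # Forget keys whose retention period has passed.
        self._responses = {
            k: v for k, v in self._responses.items() if v[0] > now
        }
        if key in self._responses:
            return self._responses[key][1]
        result = handler()
        self._responses[key] = (now + self.ttl_seconds, result)
        return result


# Usage with a fake clock so the expiry is easy to see.
fake_now = [0.0]
store = TtlIdempotencyStore(ttl_seconds=60, clock=lambda: fake_now[0])
calls = []


def charge():
    calls.append("charged")
    return {"status": "ok", "charge_number": len(calls)}


first = store.run_once("order-42", charge)
retry = store.run_once("order-42", charge)  # replayed from the store
fake_now[0] = 61.0
late = store.run_once("order-42", charge)   # key expired, runs again
```

| The last call shows the trade-off: once the TTL has passed, a
| retry with the same key is processed as a new request.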
| zmj wrote:
| I like the uuid v7 approach - being able to reject messages that
| have aged past the idempotency key retention period is a nice
| safeguard.
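|
| A sketch of that safeguard, relying on the version 7 layout in
| which the top 48 bits hold a Unix timestamp in milliseconds. The
| retention period and helper names are hypothetical, and the key
| is built by hand so the example does not depend on a particular
| library version.

```python
import secrets
import uuid

RETENTION_MS = 24 * 60 * 60 * 1000  # hypothetical retention: one day


def make_uuid7(ts_ms: int) -> uuid.UUID:
    """Build a version 7 UUID by hand from a millisecond timestamp."""
    value = (ts_ms & ((1 << 48) - 1)) << 80  # 48-bit unix_ts_ms
    value |= 0x7 << 76                       # 4-bit version
    value |= secrets.randbits(12) << 64      # 12 random bits (rand_a)
    value |= 0b10 << 62                      # 2-bit variant
    value |= secrets.randbits(62)            # 62 random bits (rand_b)
    return uuid.UUID(int=value)


def timestamp_ms(key: uuid.UUID) -> int:
    """Extract the embedded millisecond timestamp from a v7 UUID."""
    if key.version != 7:
        raise ValueError("expected a version 7 UUID")
    return key.int >> 80


def is_within_retention(key: uuid.UUID, now_ms: int) -> bool:
    """Accept only keys that are younger than the retention period."""
    return now_ms - timestamp_ms(key) <= RETENTION_MS


now_ms = 1_764_590_820_000  # arbitrary "current time" for the example
fresh = make_uuid7(now_ms - 1_000)
stale = make_uuid7(now_ms - RETENTION_MS - 1)

print(is_within_retention(fresh, now_ms))  # True
print(is_within_retention(stale, now_ms))  # False
```

| Rejecting the stale key up front means a replay can never slip
| through just because its entry has already been purged.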
| imron wrote:
| I like to use uuid5 for this. It produces unique keys in a given
| namespace (defined by a uuid) but also takes an input key and
| produces identical output ID for the same input key.
|
| This has a number of nice properties:
|
| 1. You don't need to store keys in any special way. Just make
| them a unique column of your db and the db will detect duplicates
| for you (and you can provide logic to handle as required, eg
| ignoring if other input fields are the same, raising an error if
| a message has the same idempotent key but different fields).
|
| 2. You can reliably generate new downstream keys from an incoming
| key without the need for coordination between consumers, getting
| an identical output key for a given input key regardless of
| consumer.
|
| 3. In the event of a replayed message it's fine to republish
| downstream events because the system is now deterministic for a
| given input, so you'll get identical output (including generated
| messages) for identical input, and generating duplicate outputs
| is not an issue because this will be detected and ignored by
| downstream consumers.
|
| 4. This parallelises well because consumers are deterministic and
| don't require any coordination except by db transaction.
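|
| A small sketch of points 1 and 2, using Python's uuid.uuid5 and an
| in-memory SQLite table. The namespace, table layout and step name
| are assumptions chosen for illustration.

```python
import sqlite3
import uuid

# Hypothetical namespace for this system's keys.
ORDERS_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "orders.example.com")


def idempotency_key(input_key: str) -> uuid.UUID:
    """Same input key always yields the same UUID in this namespace."""
    return uuid.uuid5(ORDERS_NAMESPACE, input_key)


def downstream_key(parent: uuid.UUID, step: str) -> uuid.UUID:
    """Derive a child key from an incoming key without coordination."""
    return uuid.uuid5(parent, step)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, payload TEXT NOT NULL)")


def record(key: uuid.UUID, payload: str) -> bool:
    """Insert the event; return False if it is an identical duplicate."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO events (id, payload) VALUES (?, ?)",
                (str(key), payload),
            )
        return True
    except sqlite3.IntegrityError:
        stored = conn.execute(
            "SELECT payload FROM events WHERE id = ?", (str(key),)
        ).fetchone()[0]
        if stored != payload:
            raise ValueError("same idempotency key, different payload")
        return False


key = idempotency_key("order-42")
first = record(key, "paid")                          # True: new event
replay = record(idempotency_key("order-42"), "paid")  # False: duplicate

# Two consumers derive the same downstream key independently.
child_a = downstream_key(key, "send-receipt")
child_b = downstream_key(idempotency_key("order-42"), "send-receipt")
print(first, replay, child_a == child_b)  # True False True
```

| Because the child key is a pure function of the parent key and
| the step name, republishing after a replay produces an event that
| downstream consumers can discard the same way.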
___________________________________________________________________
(page generated 2025-12-05 23:00 UTC)