[HN Gopher] Idempotency Keys for Exactly-Once Processing
       ___________________________________________________________________
        
       Idempotency Keys for Exactly-Once Processing
        
       Author : defly
       Score  : 53 points
       Date   : 2025-12-01 12:07 UTC (4 days ago)
        
 (HTM) web link (www.morling.dev)
 (TXT) w3m dump (www.morling.dev)
        
       | hinkley wrote:
        | Failure-resistant systems end up having a bespoke implementation
        | of a project-management workflow built into them, treating each
        | task like a project to be managed from start to finish, with
        | milestones along the way.
        
         | doctorpangloss wrote:
          | another POV is that solutions which require no long-term
          | "durable workflow" style storage provide exponentially more
          | value. if you are making something that requires durable
          | workflows, you ought to spend a little bit of time in product
          | development so that it does _not_ require durable workflows,
          | instead of a ton of time making something that isn't very
          | useful durable.
         | 
         | for example, you can conceive of a software vendor that does
         | the end-to-end of a real estate transaction: escrow, banking,
         | signature, etc. The IT required to support the model of such a
         | thing would be staggering. Does it make sense to do that kind
         | of product development? That is inventing all of SAP, on top of
         | solving your actual problem. Or making the mistake of adopting
         | temporal, trigger, etc., who think they have a smaller problem
         | than making all of SAP and spend considerable resources
         | convincing you that they do.
         | 
         | The status quo is that everyone focuses on their little part to
         | do it as quickly as possible. The need for durable workflows is
         | BAD. You should look at that problem as, make buying and
         | selling homes much faster and simpler, or even change the order
         | of things so that less durability is required; not re-enact the
         | status quo as an IT driven workflow.
        
           | whattheheckheck wrote:
           | Interesting thought but how do you sell an idea that sounds
           | like...
           | 
           | "How we've been doing things is wrong and I am going to
           | redesign it in a way that no one else knows about so I don't
           | have to implement the thing that's asked of me"
        
             | doctorpangloss wrote:
             | Haha, another way of describing what you are saying is
             | enterprise sales: "give people exactly what they ask for,
             | not what makes the most sense."
             | 
             | Businesses that require enterprise sales are probably the
             | worst performing category of seed investing. They encompass
             | all of Ed tech and health tech, which are the two worst
             | industry verticals for VC; and Y Combinator has to focus on
             | an index of B2B services for other programmers because
             | without that constraint, nearly every "do what you are
              | asked for" would fail. Most of the IT projects businesses
              | do internally fail!
             | 
              | In fact I think the idea you are selling is even harder:
              | doing B2B enterprise sales is much harder than knowing
              | whether the thing you are making makes sense and is good.
        
           | majormajor wrote:
           | Chesterton's Fence, no?
           | 
           | Why are real-estate transactions complex and full of
           | paperwork? Because there are history books filled with fraud.
           | There are other types of large transactions that also involve
           | a lot of paperwork too, for the same reason.
           | 
           | Why does a company have extensive internal tracing of the
           | progress of their business processes, and those of their
           | customers? Same reason, usually. People want accountability
           | and they want to discourage embezzlement and such things.
        
           | leoqa wrote:
           | Durable workflows are just distributed state machines. The
           | complexity is there because guaranteeing a machine will
           | always be available is _impossible_.
        
       | ekjhgkejhgk wrote:
        | Here's what I don't understand about distributed systems: TCP
        | works amazingly well, so why not use the same ideas? Every
        | message increments a counter, so the receiver can tell the
        | ordering and whether a message is missing. Why is this
        | complicated?
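The TCP-style scheme the comment describes can be sketched in a few lines: one producer stamps each message with an incrementing sequence number, and one consumer tracks the next sequence it expects. Duplicates and gaps then become trivially detectable. This is an illustrative sketch (the class and method names are mine, not from the article), and it only works because there is exactly one producer and one consumer, which is the limitation the replies below point out.

```python
class SequencedConsumer:
    """Tracks a single producer's sequence numbers, TCP-style."""

    def __init__(self):
        self.next_expected = 0

    def receive(self, seq: int, payload) -> str:
        if seq < self.next_expected:
            return "duplicate"       # already processed, safe to drop
        if seq > self.next_expected:
            return "gap"             # a message was lost or reordered
        self.next_expected += 1      # exactly the one we were waiting for
        return f"processed {payload}"
```

With a second producer numbering its own messages independently, the single counter no longer distinguishes duplicates from fresh messages, which is where the simplicity ends.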
        
         | exitb wrote:
         | It needs a single consumer to be that simple.
        
           | mkarrmann wrote:
           | And a single producer! i.e. it breaks down if you add support
           | for fault tolerance
        
         | Etheryte wrote:
         | TCP is a one to one relation, distributed systems are many to
         | many.
        
           | ekjhgkejhgk wrote:
            | You mean like UDP, which also works amazingly well?
        
             | podgietaru wrote:
             | UDP doesn't guarantee exactly once processing.
        
         | ewidar wrote:
          | Not trying to be snarky, but you should read the article and
          | come back to discuss. This specific point is addressed.
        
           | ekjhgkejhgk wrote:
           | Can't be bothered, I don't think it's that interesting.
           | 
           | TCP exists and it's amazing.
           | 
           | Multiple cores within a CPU also communicate perfectly.
           | 
           | So this is a solved problem. My suspicion is that the people
           | who write articles on "distributed systems" aren't aware of
           | what already exists.
        
       | manoDev wrote:
       | > The more messages you need to process overall, the more
       | attractive a solution centered around monotonically increasing
       | sequences becomes, as it allows for space-efficient duplicate
       | detection and exclusion, no matter how many messages you have.
       | 
       | It should be the opposite: with more messages you want to scale
       | with independent consumers, and a monotonic counter is a disaster
       | for that.
       | 
       | You also don't need to worry about dropping old messages if you
       | implement your processing to respect the commutative property.
        
         | itishappy wrote:
         | > It should be the opposite: with more messages you want to
         | scale with independent consumers, and a monotonic counter is a
         | disaster for that.
         | 
         | Is there any method for uniqueness testing that works after
         | fan-out?
         | 
         | > You also don't need to worry about dropping old messages if
         | you implement your processing to respect the commutative
         | property.
         | 
          | The commutative property protects you if messages are received
          | out of order; duplicates still require idempotency.
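The distinction between commutativity and idempotency can be shown with a toy example (the numbers here are illustrative): summing deltas is commutative, so ordering doesn't matter, but a duplicate delta changes the result; a running max is both commutative and idempotent, so a duplicate is harmless.

```python
from functools import reduce

# Commutative but NOT idempotent: order doesn't matter, duplicates do.
deltas = [5, -2, 7]
assert reduce(lambda a, b: a + b, deltas) == \
       reduce(lambda a, b: a + b, list(reversed(deltas)))  # reordering is safe
assert sum(deltas + [7]) != sum(deltas)                    # a replayed delta corrupts the sum

# Commutative AND idempotent: duplicates are absorbed.
readings = [3, 9, 4]
assert max(readings + [9]) == max(readings)                # replaying the max changes nothing
```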
        
          | majormajor wrote:
          | You only need monotonicity per producer here, and even with
          | independent producer and consumer scaling you can keep the
          | tracking tractable, as long as you avoid the combination of
          | every consumer needing to know about every producer and a
          | truly huge cardinality of producers.
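Per-producer monotonicity can be sketched as a consumer keeping one high-watermark per producer (names here are illustrative). Storage grows with the number of producers rather than the number of messages, which is the space-efficiency the article's quoted passage is getting at. Note the assumption: messages from a given producer arrive in order, otherwise a late message below the watermark would be wrongly dropped.

```python
class PerProducerDedup:
    """One high-watermark per producer; assumes in-order delivery per producer."""

    def __init__(self):
        self.watermarks: dict[str, int] = {}

    def accept(self, producer_id: str, seq: int) -> bool:
        last = self.watermarks.get(producer_id, -1)
        if seq <= last:
            return False             # duplicate: already covered by the watermark
        self.watermarks[producer_id] = seq
        return True                  # first time we've seen this sequence
```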
        
       | bokohut wrote:
        | This was my exact solution in the late 1990s, formulated using a
        | UID algorithm I created when confronted with a growing payment-
        | processing load that the centralized hardware of the time could
        | not handle. MS SQL could not process the ever-increasing load,
        | yet the firehose of real-time payment transaction volume could
        | not be turned off, so an interim parallel solution involving
        | microservices was devised using this technique to walk
        | everything over to Oracle. Everything old is new again as the
        | patterns and cycles ebb and flow.
        
       | eximius wrote:
        | These strategies only really work for stream processing. You
        | also want idempotent APIs, which won't really work with these.
        | You'd probably go for the strategy the article passes over:
        | accept an arbitrary string key and just write it down with some
        | TTL.
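The "arbitrary key plus TTL" approach the comment mentions can be sketched with an in-memory store standing in for Redis or a database table (the class and parameter names are mine, for illustration only). The first request with a given key is processed; replays within the TTL are rejected; after the TTL the key is forgotten and storage stays bounded.

```python
import time

class IdempotencyStore:
    """Remembers client-supplied keys for ttl_s seconds; a dict stands in
    for a real shared store such as Redis or a DB table."""

    def __init__(self, ttl_s: float):
        self.ttl_s = ttl_s
        self.seen: dict[str, float] = {}

    def first_time(self, key: str, now: float = None) -> bool:
        now = time.monotonic() if now is None else now
        # Purge expired keys so storage stays proportional to the TTL window.
        self.seen = {k: t for k, t in self.seen.items() if now - t < self.ttl_s}
        if key in self.seen:
            return False             # replay within the retention window
        self.seen[key] = now
        return True
```

In a real API the key would come from a client header (e.g. `Idempotency-Key`), and the check-and-insert would need to be atomic in the shared store.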
        
       | zmj wrote:
       | I like the uuid v7 approach - being able to reject messages that
       | have aged past the idempotency key retention period is a nice
       | safeguard.
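The UUIDv7 safeguard works because the first 48 bits of a version-7 UUID are a Unix millisecond timestamp, so a receiver can reject any key older than its retention window without a lookup. A sketch, with a minimal hand-rolled v7-style generator for illustration (Python's stdlib only gained `uuid.uuid7` very recently, so this avoids depending on it):

```python
import os
import time
import uuid

def make_uuid7(ts_ms: int) -> uuid.UUID:
    """Minimal v7-style UUID: 48-bit ms timestamp, version 7, RFC variant."""
    rand = bytearray(os.urandom(10))
    rand[0] = (rand[0] & 0x0F) | 0x70    # set version nibble to 7
    rand[2] = (rand[2] & 0x3F) | 0x80    # set RFC 4122 variant bits
    return uuid.UUID(bytes=ts_ms.to_bytes(6, "big") + bytes(rand))

def uuid7_timestamp_ms(u: uuid.UUID) -> int:
    # First 6 bytes of a UUIDv7 are the Unix timestamp in milliseconds.
    return int.from_bytes(u.bytes[:6], "big")

def within_retention(u: uuid.UUID, retention_ms: int, now_ms: int = None) -> bool:
    now_ms = int(time.time() * 1000) if now_ms is None else now_ms
    return now_ms - uuid7_timestamp_ms(u) <= retention_ms
```

A key stamped before the idempotency-key retention window can then be rejected outright, since any duplicate-detection record for it would already have been purged.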
        
       | imron wrote:
        | I like to use uuid5 for this. It produces unique keys in a given
        | namespace (defined by a UUID), but it also takes an input key
        | and produces the same output ID for the same input key.
       | 
       | This has a number of nice properties:
       | 
        | 1. You don't need to store keys in any special way. Just make
        | them a unique column in your db and the db will detect
        | duplicates for you (and you can provide logic to handle them as
        | required, e.g. ignoring if the other input fields are the same,
        | raising an error if a message has the same idempotency key but
        | different fields).
       | 
       | 2. You can reliably generate new downstream keys from an incoming
       | key without the need for coordination between consumers, getting
       | an identical output key for a given input key regardless of
       | consumer.
       | 
       | 3. In the event of a replayed message it's fine to republish
       | downstream events because the system is now deterministic for a
       | given input, so you'll get identical output (including generated
       | messages) for identical input, and generating duplicate outputs
       | is not an issue because this will be detected and ignored by
       | downstream consumers.
       | 
        | 4. This parallelises well because consumers are deterministic
        | and don't require any coordination other than the db
        | transaction.
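The uuid5 scheme described above can be sketched directly with the standard library: a fixed namespace UUID plus the natural business key yields the same idempotency key on every consumer with no coordination. The namespace URL and the in-memory `seen` set (standing in for a UNIQUE column in the db) are illustrative.

```python
import uuid

# A fixed, app-chosen namespace; the URL is a placeholder for illustration.
PAYMENTS_NS = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/payments")

def idempotency_key(business_key: str) -> uuid.UUID:
    # uuid5 is deterministic: the same namespace + input always yields
    # the same UUID, so every consumer derives identical keys (point 2).
    return uuid.uuid5(PAYMENTS_NS, business_key)

seen: set[uuid.UUID] = set()  # stand-in for a UNIQUE column in the db

def process(business_key: str) -> bool:
    key = idempotency_key(business_key)
    if key in seen:
        return False              # duplicate detected and ignored (point 1)
    seen.add(key)
    return True
```

Because the derivation is deterministic, a replayed message regenerates the exact same downstream keys, so republishing is harmless, which is what makes point 3 work.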
        
       ___________________________________________________________________
       (page generated 2025-12-05 23:00 UTC)