[HN Gopher] No such thing as exactly-once delivery
___________________________________________________________________
No such thing as exactly-once delivery
Author : todsacerdoti
Score : 72 points
Date : 2024-09-30 16:00 UTC (7 hours ago)
(HTM) web link (blog.sequinstream.com)
(TXT) w3m dump (blog.sequinstream.com)
| dondraper36 wrote:
| It's admirable when vendors don't try to abuse theory to make
| their product look better.
|
| There was some blog post last week that supposedly proved the
| impossibility of exactly once delivery wrong, but that was quite
| expectedly another attempt to rename the standard terms.
| convolvatron wrote:
| if you employ persistence to keep sequence numbers, and/or
| retransmissions and redundancy, you can drive the likelihood of
| reaching distributed consensus arbitrarily close to certainty
| over time, to the point where the residual failure probability
| doesn't matter anyway (like the earth suddenly being coincident
| with the sun).
|
| so what are we arguing about?
| sethammons wrote:
| there is a group of people that say that there is no such
| thing as exactly-once-delivery and they are correct. There is
| also a group of people who cannot hear nor read nor process
| the word "delivery" in that term and instead insist that
| exactly-once-processing is easily achievable via
| [deduplication, idempotent actions, counters, ...]. They are
| correct as long as they acknowledge "processing" vs
| "delivery" -- and there are several who refuse to acknowledge
| that and don't believe the distinction matters and they are
| wrong.
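|
| A minimal sketch of that dedup route (hypothetical names, no
| particular broker assumed): the broker still delivers
| at-least-once, the handler just refuses to act twice.
|
|     # At-least-once delivery, exactly-once *processing* via dedup.
|     # 'seen' stands in for a durable store; here it is an in-memory set.
|     seen = set()
|     processed = []
|
|     def handle(message_id, payload):
|         if message_id in seen:      # duplicate delivery: drop it
|             return
|         processed.append(payload)   # the "side effect" in this sketch
|         seen.add(message_id)        # recorded only after the effect succeeds
|
|     # The broker may deliver the same message twice; the effect runs once.
|     for mid, body in [("m1", "hello"), ("m1", "hello"), ("m2", "world")]:
|         handle(mid, body)
|
|     assert processed == ["hello", "world"]
|
| The remaining gap is a crash between the effect and seen.add(),
| which would reprocess the payload on redelivery; that gap is
| exactly why "processing" rather than "delivery" is the honest
| word here.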
| rwiggins wrote:
| Out of curiosity, how would you describe TCP in these
| terms? Does the TCP stack's handling of sequence numbers
| constitute processing (on the client and server both I
| assume)? Which part(s) of a TCP connection could be
| described as delivery?
| wmf wrote:
| From TCP's perspective it has delivered the data when
| read() returns successfully. This is also the point at
| which TCP frees the packet buffers. From the app's
| perspective, TCP is an at-most-once system because data
| can be lost in corner cases caused by failures.
|
| (Of course plenty of people on the Internet use different
| definitions to arrive at the opposite conclusion but they
| are all wrong and I am right.)
| YZF wrote:
| I agree this is what you'd consider delivery. Also agree
| everyone else is wrong and I am right ;)
|
| The similar question in TCP is what happens when the
| sender writes to their socket and then loses the
| connection. At this point the sender doesn't know whether
| the receiver has the data or not. Both are possible. For
| the sender to recover it needs to re-connect and re-send
| the data. Thus the data is potentially delivered more
| than once and the receiver can use various strategies to
| deal with that.
|
| But sure, within the boundary of a single TCP connection
| data (by definition) is never delivered twice.
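|
| A toy sketch of that reconnect-and-resend situation (hypothetical
| application-level framing, not real TCP): the receiver keeps a
| per-sender sequence number and discards replays.
|
|     class Receiver:
|         def __init__(self):
|             self.last_seq = 0
|             self.log = []
|
|         def accept(self, seq, data):
|             if seq <= self.last_seq:   # replay of something already applied
|                 return "dup"
|             self.log.append(data)
|             self.last_seq = seq
|             return "ok"
|
|     rx = Receiver()
|     rx.accept(1, "payload")   # first attempt; the ack may have been lost
|     rx.accept(1, "payload")   # sender reconnects and resends; ignored
|     assert rx.log == ["payload"]
|
| This is the "various strategies" part: the data may cross the
| wire twice, but the receiver applies it once.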
| wellpast wrote:
| is there a real difference between delivery and processing?
|
| In a streaming app framework (like Flink or Beam, or Kafka
| Streams), I can get an experience of exactly once
| "processing" by reverting to checkpoints on failure and re-
| processing.
|
| But what's the difference between doing this within the
| internal stores of a streaming app or coordinating a
| similar checkpointing/recovery/"exactly-once" mechanism
| with an external "delivery destination"?
| ithkuil wrote:
| Yes there is a difference and it's quite crucial to
| understanding how two groups of people can claim
| seemingly opposite things and both be correct.
|
| It's important to note that by processing here we don't
| just mean any "computing" but the act of "committing the
| side effects of the computation"
| wellpast wrote:
| A typical delivery target is a data store or cache.
| Writing to such a target can be construed as a "side
| effect". But if you accept eventual consistency,
| achieving exactly once semantics is possible.
|
| Eg a streaming framework will let you tally an exactly
| once counter. You can flush the value of that tally to a
| data store. External observers will see an eventually
| consistent exactly-once-delivery result.
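|
| A sketch of that checkpoint-and-replay idea (hypothetical code,
| not any framework's real API): the running tally is snapshotted
| together with the input offset, so a replay after failure
| recomputes instead of double-counting.
|
|     events = [1, 1, 1, 1, 1]
|     snapshot = (0, 0)            # (next offset to read, tally so far)
|
|     def run(crash_at=None):
|         global snapshot
|         offset, tally = snapshot
|         for i in range(offset, len(events)):
|             if i == crash_at:
|                 raise RuntimeError("crash before this event is counted")
|             tally += events[i]
|             if i == 2:
|                 snapshot = (i + 1, tally)   # durable checkpoint in a real system
|         return tally
|
|     try:
|         run(crash_at=4)          # first attempt dies mid-stream
|     except RuntimeError:
|         pass
|     assert run() == 5            # replay from the checkpoint: the exact tally
|
| The external store that the tally is flushed to only ever
| converges on 5, which is the eventually consistent exactly-once
| result.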
| andrewflnr wrote:
| Not that I disagree with getting terms exactly right, but
| if the "processing" that happens exactly once in your
| messaging system is firing a "delivery" event to a higher
| level of the stack (over a local channel), that's
| observationally indistinguishable from exactly-once
| delivery to that higher level, isn't it? What am I missing?
| wmf wrote:
| Nope, because _that_ event could also be lost/duplicated.
| Izkata wrote:
| In the systems I'm used to, that "passing up" is a
| function call within the same codebase. There isn't a
| second delivery system for a message to be
| lost/duplicated, though sometimes they obscure it by
| making it look more like such a system.
| ViewTrick1002 wrote:
| The problems always stem from the side effects. You can
| achieve exactly-once-like properties in a contained system.
|
| But if the processing includes, for example, sending an email,
| then the same guarantee has to extend to all downstream
| actions, which is where exactly-once as seen by an outside
| observer gets hard.
| SpicyLemonZest wrote:
| What you _can't_ do, and what naive consumers of distributed
| systems often expect, is drive the probability arbitrarily
| low solely from the producer side. The consumer has to be
| involved, even if they consider themselves to be a single-
| node process and don't want to bother with transactionality
| or idempotency keys.
| mgsouth wrote:
| Vendor: "Exactly once is possible. We do it!"
|
| No, as a vendor you're either deliberately lying, or
| completely incompetent. Let's insert the words they don't
| want to say.
|
| "We do exactly once good enough for your application!"
|
| How could you possibly know that?
|
| "We do exactly once good enough for all practical purposes!"
|
| So are my purposes practical? _Am I_ a true Scotsman?
|
| "We do exactly once good enough for applications we're good
| enough for!"
|
| Finally, a truthful claim. I'm not holding my breath.
|
| Edit: More directly address parent:
|
| It's not possible to even guarantee at-least-once in a finite
| amount of time, let alone exactly-once. For example, a high-
| frequency trading system wants to deliver exactly one copy of
| that 10-million-share order. And do it in under a
| microsecond. If you've got a 1.5 microsecond round trip, it's
| not possible to even get confirmation an order was received
| in that time limit, much less attempt another delivery. Your
| options are a) send once and hope. b) send lots of copies,
| with some kind of unique identifier so the receiver can do
| process-at-most-once, and hope you don't wind up sending to
| two different receiving servers.
| sidewndr46 wrote:
| This didn't stop AWS from at one point declaring "none of the
| above" as their delivery guarantees for S3 events:
|
| https://www.hydrogen18.com/blog/aws-s3-event-notifications-p...
|
| From what I can tell my blog post eventually motivated them to
| change this behavior.
| tombert wrote:
| Reminds me of this:
| https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...
|
| Part of the reason I find distributed and concurrent computing so
| fascinating is that we lose so many assumptions about how
| computers are supposed to work: ordering guarantees, data
| actually arriving at destinations, fairness, etc.
|
| While it is of course possible for data to get lost or mangled in
| a single-threaded, non-distributed application, it's rare enough
| to where you often don't need to plan much around it. Once you
| add concurrency, and especially if you make it distributed, you
| suddenly have to plan around the _eventuality_ of the messages
| not arriving, or arriving in the wrong order, or arriving
| multiple times.
|
| Personally, I've gotten into the habit of PlusCal and TLA+ for
| everything, and modeling in the assumption that stuff might never
| arrive or arrive multiple times, which can work as a very good
| sanity check (though it's of course not a silver bullet).
| Animats wrote:
| There's no such thing as finite-time arbitration for
| multiprocessor requests to memory, but it works anyway. This is
| similar. You can't guarantee time-bounded exactly once delivery,
| but after enough interchanges the probability of being wrong
| approaches zero.
|
| This shows up explicitly in blockchain systems, where confidence
| improves as the number of confirmation cycles increases.
| YZF wrote:
| This is the first time I'm hearing about the topic of
| arbitrating multiprocessor access to memory. Isn't there a
| trivial counterexample where the bandwidth is divided by the
| number of processors and each processor gets a timeslot?
|
| Processors, despite being a distributed system (well really
| every ASIC is), don't typically suffer from your classical
| distributed systems style issues because they have no failures,
| or rather they only have certain catastrophic failure modes
| that are low probability and well controlled for. Your typical
| distributed system, in contrast, has constant failures.
| Animats wrote:
| > This is the first time I'm hearing about the topic of
| arbitrating multiprocessor access to memory.
|
| See [1]. This was a very real problem in early multiprocessor
| computer design.
|
| [1] https://en.wikipedia.org/wiki/Arbiter_%28electronics%29
| rcxdude wrote:
| It's more to do with the underlying electronics: digital is
| an abstraction, everything is analog underneath, and logic
| gates and registers are actually just very non-linear analog
| circuits. What this means is that it's always in principle
| possible for an asynchronous edge (i.e. one that is not
| aligned to the other edges in the system) to be arbitrarily
| non-determinate: never actually being decided as a 1 or a 0
| as it propagates through the digital circuit. This isn't an
| entirely theoretical issue, either: it can and will cause
| very strange behaviour in a digital system because a marginal
| or metastable value can be interpreted differently by
| different parts of the circuit that should be seeing the same
| value. It's possible in practice to push the chance of this
| happening extremely low, but it comes at the cost of latency:
| you need to spend more time to allow the circuit to actually
| settle into one state or the other.
|
| Within a fully synchronous system, it's possible to avoid
| this, though. But most multi-processor systems are not
| synchronous.
| londons_explore wrote:
| > But most multi-processor systems are not synchronous.
|
| Is this actually true? A modern AMD or Intel CPU doesn't
| have one quartz crystal per core, so all per-core
| clocks have to be derived from one clock source, meaning
| the relationship between all the clock frequencies and
| phases is deterministic, even if not specced in the
| datasheet.
| dang wrote:
| Recent and related:
|
| _Yes, you can have exactly-once delivery_ -
| https://news.ycombinator.com/item?id=41598006 - Sept 2024 (133
| comments)
|
| Also. Others?
|
| _You cannot have exactly-once delivery (2015)_ -
| https://news.ycombinator.com/item?id=34986691 - March 2023 (217
| comments)
|
| _Show HN: Exactly-once delivery counter using Watermill
| messaging library_ -
| https://news.ycombinator.com/item?id=26888303 - April 2021 (9
| comments)
|
| _High-performance, exactly-once, failure-oblivious distributed
| programming (2018)_ -
| https://news.ycombinator.com/item?id=20562733 - July 2019 (26
| comments)
|
| _Exactly-once Semantics: How Kafka Does it_ -
| https://news.ycombinator.com/item?id=14670801 - June 2017 (39
| comments)
|
| _Delivering Billions of Messages Exactly Once_ -
| https://news.ycombinator.com/item?id=14664405 - June 2017 (133
| comments)
|
| _You Cannot Have Exactly-Once Delivery_ -
| https://news.ycombinator.com/item?id=9266725 - March 2015 (55
| comments)
|
| _Exactly-Once Messaging in Kafka_ -
| https://news.ycombinator.com/item?id=8880612 - Jan 2015 (8
| comments)
|
| _Exactly-Once Delivery May Not Be What You Want_ -
| https://news.ycombinator.com/item?id=8614264 - Nov 2014 (10
| comments)
| notfed wrote:
| Tangent, but: this reminds me of many philosophical debates,
| e.g. "do we have free will?".
|
| What I've found is that usually what's happening is that the
| "debaters" assume different definitions of the thing they're
| debating about, which means they're really debating about two
| entirely different things, with the same name, and that's the
| real reason why they've come to different conclusions.
| rzzzt wrote:
| One recommendation I've read is that in such cases you should
| treat the single label you and your opponent have attached to
| two different ideas as forbidden. Describe it in more detail
| to reveal the difference (don't just swap words in the
| starting combination with their synonyms).
| wmf wrote:
| That's fine for discussions between experts but when a
| hundred noobs enter the discussion it's just unpaid
| tutoring. These threads are bringing out a lot of "I've
| discovered something all the distributed systems experts
| missed" energy.
| aidenn0 wrote:
| For me "do we have free will" debate is useless because if
| the answer is "no" then that implies the result of the debate
| is predetermined and all involved lack any agency to change
| the outcome.
|
| Therefore any adult arguing the "no free will" side is either
|
| 1. Disingenuous
|
| 2. Purposefully playing Candyland as an adult.
|
| Either of which is so damaging to the Ethos that I lack any
| desire to continue the debate.
| yarg wrote:
| That doesn't follow.
|
| There's nothing disingenuous or moronic about believing
| that physical processes might be fundamentally
| deterministic.
| aidenn0 wrote:
| > There's nothing disingenuous or moronic about believing
| that physical processes might be fundamentally
| deterministic.
|
| My argument is that a game of Candyland is fundamentally
| deterministic in the same way.
| thadt wrote:
| > With that definition, SQS, Kafka, and Sequin are all systems
| that guarantee exactly-once processing. The term processing
| captures both the delivery of the message and the successful
| acknowledgment of the message.
|
| This terminology confuses my face. Maybe we need a better name
| for "exactly-once processing"? When I say "processing" I'm
| thinking of the actions that my program is going to take on said
| message - the very processing that they're talking about can
| fail. Can we not just say 'the message is held until
| acknowledgement'?
| theptip wrote:
| The simple way to visualize this is that all the processing can
| happen within a serializable transaction. If anything fails,
| the transaction rolls back. Marking the message as delivered is
| part of the same transaction.
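|
| A sketch of that idea with sqlite3 (hypothetical schema; it works
| because the side effect lives in the same database as the
| "processed" marker):
|
|     import sqlite3
|
|     db = sqlite3.connect(":memory:")
|     db.execute("CREATE TABLE processed (msg_id TEXT PRIMARY KEY)")
|     db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
|     db.execute("INSERT INTO accounts VALUES ('alice', 0)")
|
|     def handle(msg_id, amount):
|         try:
|             with db:   # one transaction: both rows commit or neither does
|                 db.execute("INSERT INTO processed VALUES (?)", (msg_id,))
|                 db.execute(
|                     "UPDATE accounts SET balance = balance + ? WHERE name = 'alice'",
|                     (amount,))
|         except sqlite3.IntegrityError:
|             pass       # duplicate delivery: primary-key insert failed, effect skipped
|
|     handle("m1", 100)
|     handle("m1", 100)  # redelivery rolls back and becomes a no-op
|     assert db.execute("SELECT balance FROM accounts").fetchone()[0] == 100
|
| Once the effect leaves that database (sending an email, say),
| this trick no longer covers it, which is the complaint in the
| reply below.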
| thadt wrote:
| Which means that the 'processing' in my application has to
| have the semantics of a serializable transaction. And it
| might not - as in the very example given in the post, where a
| processing application sends an email.
|
| Hence my complaint: a statement like "our system offers exactly
| once processing" can be completely correct from the point of
| view of the _messaging system_, yet misleading to the casual
| reader trying to understand how it fits into the broader
| solution, because the messaging system cannot guarantee that my
| application processes it exactly once. I'm saying the
| terminology doesn't feel right.
| griffgeorge wrote:
| > That's why we don't say Sequin offers exactly-once delivery
| anywhere. We say it offers at-least-once delivery and exactly-
| once processing.
|
| The author should note that the homepage at
| https://sequinstream.com/ does in fact claim that Sequin supports
| "Exactly-once delivery" under the "Key features" heading.
| notfed wrote:
| Well that doesn't seem consistent.
| bhaney wrote:
| It will be eventually
| _acco wrote:
| Aye, good catch! Honest mistake. Fixed.
| notfed wrote:
| Ack.
| rzzzt wrote:
| I didn't get this one.
| 0xbadcafebee wrote:
| I mean, you _can_ do exactly-once delivery, it's just a more
| complex state machine with a specific design requirement. Here it
| is using mail:
|
|     # Receive new mail message, incoming id '1234'.
|     # No other writer knows about this message.
|     $ echo "hello world" > incoming/1.1234.txt
|
|     # If files stay in incoming/ too long, it means they probably
|     # failed writing, and are thus stale and can be removed.
|
|     # When the message is done being written, the writer moves it
|     # to a new state.
|     $ mv incoming/1.1234.txt new/1.1234.txt
|
|     # The writer can then report to the sender that the message
|     # was accepted. In mail, this is seen as optional, as the
|     # whole point is to deliver the message, not to make sure you
|     # inform the sender that you will deliver the message.
|
|     # Here's the exactly-once bit:
|     # If you need assurance of exactly-once delivery, split out
|     # the delivery portion from the confirmation portion.
|     #
|     # Have one actor talk to the sender, and another actor queue
|     # the message in another temporary queue, like 'accepted/'.
|     # Once the message is queued there, you tell the client it is
|     # accepted, wait for successful termination of the connection,
|     # and then move the message to the 'new/' queue.
|     #
|     # This way, if there is an error between accepting the message
|     # and confirming with the sender, you can detect that and
|     # leave the message in 'accepted/' where you can later decide
|     # what to do with it. (But it won't get picked up for
|     # processing.)
|
|     # The message is now ready to be picked up for processing.
|     # To process the message, a processor picks up the new message
|     # and moves it to a new state.
|     #
|     # The processor adds an identifier '5678', so the processor
|     # can identify which file it's working on. It can have extra
|     # logic to re-attempt processing on this ID if it fails.
|     $ mv new/1.1234.txt process/1.1234.5678.txt
|
|     # If the processor dies or takes too long, you can identify
|     # that (stale file, processor tracking its work, etc) and this
|     # can be moved back to 'new/' for another attempt. Moving it
|     # back to 'new/' is still atomic so there is no danger of
|     # another processor continuing to work on it.
|
|     # Once the processor is done processing, it will move to the
|     # complete state.
|     $ mv process/1.1234.5678.txt done/1.txt
|
|     # If the processing file no longer exists, it means it was
|     # either already done, or previously died and was moved back
|     # to 'new/'.
|
| This process all depends on a database [filesystem] using
| synchronous atomic operations [mv]. If your distributed system
| can't handle that, yeah, you're gonna have a hard time.
| mgsouth wrote:
| So that everyone's on the same page, perhaps you could clarify
| what "delivery" means to you? For example:
|
| * The human to whom the message is addressed reads it on their
| screen.
|
| * The email is inserted into that person's inbox. It's possible
| something will happen to the message after that point, before
| the addressee actually reads it, but that isn't what you're
| talking about.
|
| * The email has been accepted by a server operating as an
| endpoint for the addressee (their company's Exchange server),
| _and_ acknowledgement has been received by the sending server.
|
| * The above, but no guarantees about the sending server getting
| the acknowledgement.
|
| etc.
|
| [Edit: also, what "guarantee" means to you. 100%, 99.99999% is
| good enough, and so on.]
| bcrl wrote:
| You have made a false assumption: directory operations across
| directories are not atomic. The typical way around that is to
| hardlink the file into the target directory and then sync the
| target directory and then delete the old name. But now, if the
| system is interrupted with the intermediate state on disk, the
| file sits in both directories and you no longer know whether it
| is new or has already been processed. Which is exactly the
| problem messaging systems have. There is always a point in the
| state machines across multiple systems where a crash at the
| right point in time can lead to replay of a message. I worked
| on persistent messaging for a few years, and it is impossible
| to resolve this in real world systems. The best you can do is
| to narrow the window by reducing latency of the network and for
| commits to persistent storage.
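|
| A sketch of the link-then-unlink "move" described above (POSIX
| flavored, Linux assumed for fsync on a directory fd): between the
| link() and the unlink() a crash leaves the file visible under
| both names, which is the replay window in question.
|
|     import os, tempfile
|
|     base = tempfile.mkdtemp()
|     os.makedirs(f"{base}/new")
|     os.makedirs(f"{base}/done")
|     with open(f"{base}/new/msg1", "w") as f:
|         f.write("payload")
|
|     os.link(f"{base}/new/msg1", f"{base}/done/msg1")   # 1. new name appears
|     dfd = os.open(f"{base}/done", os.O_RDONLY)
|     os.fsync(dfd)                                      # 2. make the new name durable
|     os.close(dfd)
|     # NOTE: a crash here leaves msg1 in both new/ and done/
|     os.unlink(f"{base}/new/msg1")                      # 3. old name disappears
|
|     assert os.path.exists(f"{base}/done/msg1")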
| magicalhippo wrote:
| This feels a bit like those relativity puzzles, for example the
| barn-pole paradox[1].
|
| That is, delivered exactly-once seems to be a relative statement
| where one party can state they did deliver exactly-once while
| another party can disagree, and they're both correct relative to
| their local frames of reference.
|
| So, like how GR has no global conservation of energy, just local,
| could one say that distributed systems have no global
| conservation of messages, only local?
|
| Or am I just late night rambling again...
|
| [1]: https://en.wikipedia.org/wiki/Ladder_paradox
| socketcluster wrote:
| The "exactly once delivery is impossible" narrative got so strong
| that people feel that they have to redefine the word 'delivery'
| to make it fit the premise.
|
| If a message can be guaranteed to be received one or more times
| but results in a single entry being inserted into a database
| (e.g. in an idempotent way using UUIDs to enforce uniqueness),
| then clearly, a consumer reading the records from that database
| would only receive the message exactly once. If you consider the
| intended recipient to be the process, not the host, then the
| definition of 'delivery' allows you to claim that exactly-once
| delivery is possible.
|
| Of course this is a pragmatic, not mathematical argument. In
| theory you can't mathematically guarantee even at least once
| delivery because you can't guarantee that there's not going to be
| a Carrington event which will bring down the internet and drive
| humanity to extinction, thus preventing the consumer server from
| being rebooted.
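|
| A minimal sketch of that argument (a dict standing in for a table
| with a UNIQUE key): duplicate receives collapse into one row, so
| a downstream reader of the table sees the message once.
|
|     import uuid
|
|     table = {}                          # stands in for a UNIQUE-keyed table
|
|     def receive(msg_id, body):
|         table.setdefault(msg_id, body)  # idempotent insert: duplicates are no-ops
|
|     mid = str(uuid.uuid4())
|     receive(mid, "ship order 42")       # first delivery
|     receive(mid, "ship order 42")       # network retry delivers it again
|
|     rows = list(table.items())          # the downstream consumer's view
|     assert len(rows) == 1               # it "receives" the message once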
| YZF wrote:
| What about other messages though? The question isn't usually
| limited to one message in isolation, it's related to the state
| of the system as a whole. So sure, trivially, setting one bit
| where that bit is never cleared can only be done once, so any
| process setting that bit is also trivially (and not very
| usefully?) "exactly once delivery".
| mgsouth wrote:
| On the contrary, that is an extremely useful process, and is
| the basis for vast swathes of multiprocessing algorithms
| (such as "has this thing finished initializing", or "has it
| been freed"). However, it is "at most once", not "exactly
| once". The bit may never be set.
| benlivengood wrote:
| Messages don't have to be delivered more than once to achieve
| exactly-once processing; the receiver can offer a query interface
| for delivery status. If every attempt at delivery checks for a
| unique message ID and only sends the message if it has not been
| received, then you get exactly-once delivery of the _message_, if
| not the _metadata_.
|
| Optimistically assuming the message has not been delivered yet
| will sometimes reduce average latency (most databases), but for
| very expensive messages it is better to check first (rsync).
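|
| A sketch of that check-before-send idea (hypothetical
| interfaces): the sender queries the receiver for the message ID
| and only ships the expensive payload if it is missing, the way
| rsync skips blocks the other side already has.
|
|     class Receiver:
|         def __init__(self):
|             self.store = {}
|
|         def has(self, msg_id):              # the "query interface"
|             return msg_id in self.store
|
|         def put(self, msg_id, payload):
|             self.store[msg_id] = payload
|
|     def send(rx, msg_id, payload, attempts=3):
|         shipped = 0
|         for _ in range(attempts):           # retries model lost acknowledgements
|             if not rx.has(msg_id):
|                 rx.put(msg_id, payload)
|                 shipped += 1
|         return shipped
|
|     rx = Receiver()
|     assert send(rx, "m1", "10 GB of expensive bytes") == 1   # payload crossed once
|
| The status queries themselves may of course repeat, which is the
| message/metadata distinction being drawn here.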
| dataflow wrote:
| If there is no such thing as exactly-once delivery because the
| recipient might not feel like deduplicating, then there can be no
| such thing as at-least-once delivery either, because the
| recipient might not feel like being endlessly available for
| delivery either. Or the mailman might not feel like
| (re-)attempting deliveries endlessly either. Or the sender might
| not feel like re-sending things endlessly either.
|
| Is this insightful? Surprising? What's the point of the debate?
___________________________________________________________________
(page generated 2024-09-30 23:00 UTC)