[HN Gopher] Message Deduplication with RabbitMQ Streams
___________________________________________________________________
Message Deduplication with RabbitMQ Streams
Author : DeadTrickster
Score : 39 points
Date : 2021-07-30 14:04 UTC (8 hours ago)
(HTM) web link (blog.rabbitmq.com)
(TXT) w3m dump (blog.rabbitmq.com)
| Uberzi wrote:
| I don't get this... Why is it the broker's responsibility to
| make sure publishers are not duplicating, rather than fixing the
| producer code and addressing the root cause?
|
| With this feature, the producer still needs to implement a
| strictly increasing sequence value, make sure its state is
| persisted, and use that as the publishing ID.
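|
| A minimal sketch of that mechanism with the RabbitMQ Stream Java
| client (com.rabbitmq:stream-client); the stream name, producer
| name, and readLastIdFromStorage() helper here are made up:
|
|     import com.rabbitmq.stream.Environment;
|     import com.rabbitmq.stream.Producer;
|
|     public class DedupPublisher {
|         // Placeholder for the producer's durable sequence
|         // state, e.g. a DB column; hypothetical helper.
|         static long readLastIdFromStorage() { return 0L; }
|
|         public static void main(String[] args) throws Exception {
|             try (Environment env = Environment.builder().build()) {
|                 Producer producer = env.producerBuilder()
|                     .name("my-app")     // names the producer for dedup
|                     .stream("my-stream")
|                     .build();
|                 long publishingId = readLastIdFromStorage() + 1;
|                 producer.send(
|                     producer.messageBuilder()
|                         .publishingId(publishingId) // strictly increasing
|                         .addData("hello".getBytes())
|                         .build(),
|                     status -> { /* persist publishingId on confirm */ });
|             }
|         }
|     }
|
| If I read the docs right, the Java client also exposes
| Producer#getLastPublishingId() to recover the sequence from the
| broker itself after losing local state, which would cover the
| restore-from-backup case.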
|
| What if you need to restore your DB after a crash and reprocess
| data to catch up? If you stored your publishing ID sequence in
| that DB, your "smart" broker will happily drop those
| messages...!?
|
| Unless there's something I don't understand, that sounds like a
| really bad "good idea".
| Dionakra wrote:
| Exactly-once semantics is virtually impossible to achieve;
| there is a lot of literature on this all over the internet, but
| an easy example is the following.
|
| Imagine a client sending messages to RabbitMQ with a retry
| policy for when it doesn't receive an ACK from the broker. If
| the client sends a message and no ACK arrives within the
| configured window (say, a 30s timeout), it resends the message,
| assuming the broker never received it. But it may be the ACK on
| the way back to the client that was lost: the broker actually
| received and stored the message, the client just doesn't know
| it, so it retries.
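|
| Roughly, in code (a hedged sketch; the Client interface and the
| 30s timeout are just stand-ins, not a real library API):
|
|     import java.util.concurrent.CompletableFuture;
|     import java.util.concurrent.TimeUnit;
|     import java.util.concurrent.TimeoutException;
|
|     interface Client {               // hypothetical client API
|         CompletableFuture<Void> send(byte[] message);
|     }
|
|     class RetryingPublisher {
|         static void publish(Client client, byte[] msg)
|                 throws Exception {
|             while (true) {
|                 try {
|                     // wait up to 30s for the broker's ACK
|                     client.send(msg).get(30, TimeUnit.SECONDS);
|                     return;          // ACK received, we're done
|                 } catch (TimeoutException e) {
|                     // Lost message or lost ACK? The client cannot
|                     // tell, so it loops and resends. If the broker
|                     // already stored the message, this duplicates it.
|                 }
|             }
|         }
|     }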
|
| Without some control over the messages, this retried message is
| _new_ to RabbitMQ, so it stores it and sends back an ACK. Maybe
| this time the ACK gets through and no further retries are made.
|
| With this scenario, the broker would have stored the same
| message twice. By adding this kind of control (Kafka does more
| or less the same, discarding messages whose IDs it has already
| processed when configured for exactly-once) you can avoid these
| duplicates. Of course it is limited by memory and is not in
| fact exactly-once semantics, so it is now being called
| _effectively exactly-once semantics_, which is more precise.
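|
| The dedup rule itself is simple; an illustrative in-memory
| sketch (my reconstruction, not RabbitMQ's actual
| implementation): keep the highest publishing ID seen per
| producer name and drop anything at or below it.
|
|     import java.util.HashMap;
|     import java.util.Map;
|
|     class DedupFilter {
|         private final Map<String, Long> lastId = new HashMap<>();
|
|         boolean shouldStore(String producer, long publishingId) {
|             Long last = lastId.get(producer);
|             if (last != null && publishingId <= last) {
|                 return false;   // already seen: drop as duplicate
|             }
|             lastId.put(producer, publishingId);
|             return true;        // new: store and remember the ID
|         }
|     }
|
| In this sketch the state is just one long per producer name,
| which is why it only handles ordered, per-producer sequences
| rather than arbitrary duplicate detection.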
___________________________________________________________________
(page generated 2021-07-30 23:01 UTC)