[HN Gopher] Message Deduplication with RabbitMQ Streams
       ___________________________________________________________________
        
       Message Deduplication with RabbitMQ Streams
        
       Author : DeadTrickster
       Score  : 39 points
       Date   : 2021-07-30 14:04 UTC (8 hours ago)
        
 (HTM) web link (blog.rabbitmq.com)
 (TXT) w3m dump (blog.rabbitmq.com)
        
       | Uberzi wrote:
       | I don't get this... Why is it the broker's responsibility to
       | make sure the publishers are not duplicating, rather than
       | fixing the code and addressing the root cause?
       | 
       | With this feature, the producer still needs to implement a
       | strictly increasing sequence value, make sure its state is
       | persisted, and use that as the publishing ID.
       | 
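       | Something like this toy Python sketch (the class, storage
       | layout, and method names are illustrative only, not the
       | actual RabbitMQ stream client API) is what "persist the
       | sequence" amounts to:

```python
import sqlite3

class DedupPublisher:
    """Toy producer that persists a strictly increasing publishing ID.
    Illustrative only -- not the real RabbitMQ stream client API."""

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute("CREATE TABLE IF NOT EXISTS seq (id INTEGER)")
        if self.db.execute("SELECT id FROM seq").fetchone() is None:
            self.db.execute("INSERT INTO seq VALUES (0)")
            self.db.commit()

    def next_publishing_id(self):
        # Persisting the counter lets a restarted producer continue
        # the strictly increasing sequence instead of restarting from
        # 0, which would get its messages dropped as duplicates.
        (last,) = self.db.execute("SELECT id FROM seq").fetchone()
        self.db.execute("UPDATE seq SET id = ?", (last + 1,))
        self.db.commit()
        return last + 1

p = DedupPublisher()
ids = [p.next_publishing_id() for _ in range(3)]
print(ids)  # [1, 2, 3]
```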
       | What if you need to restore your DB after a crash and need to
       | reprocess data to catch up? If you stored your publishing ID
       | sequence in that DB, your "smart" broker will happily drop
       | these messages...!?
       | 
       | Unless there's something I don't understand, that sounds like a
       | really bad "good idea".
        
         | Dionakra wrote:
         | Exactly-once semantics is virtually impossible to achieve;
         | there is a lot of literature on this all over the internet,
         | but an easy example is the following.
         | 
         | Imagine a client sending messages to RabbitMQ with a retry
         | policy for when it doesn't receive an ACK from the broker.
         | If the client sends a message and no ACK arrives within the
         | configured window (say, a 30s timeout), it retries, because
         | it assumes the broker never received the message. But it
         | may be the ACK on the way back to the client that was lost:
         | the broker actually received and stored the message, the
         | client just doesn't know it, so it sends the message again.
         | 
         | If the broker has no way to recognize the retry, the
         | retried message looks _new_ to RabbitMQ, so it stores it
         | and sends back the ACK. Maybe this time the ACK arrives and
         | no further retries are made.
         | 
         | In this scenario, the broker ends up storing the same
         | message twice. By adding this kind of control (Kafka does
         | more or less the same, discarding messages with
         | already-processed IDs when configured for exactly-once) you
         | can avoid the duplicate. Of course it is limited by memory
         | and is not, strictly speaking, exactly-once semantics,
         | which is why it is now being called _effectively
         | exactly-once semantics_, a more precise name.
        
       ___________________________________________________________________
       (page generated 2021-07-30 23:01 UTC)