[HN Gopher] RabbitMQ vs. Kafka - An Architect's Dilemma (Part 1)
___________________________________________________________________
RabbitMQ vs. Kafka - An Architect's Dilemma (Part 1)
Author : gslin
Score : 78 points
Date : 2023-09-19 18:54 UTC (4 hours ago)
(HTM) web link (eranstiller.com)
(TXT) w3m dump (eranstiller.com)
| dividedbyzero wrote:
| NATS (https://nats.io/) is another option, though I'm not sure if
| it's still considered a viable Kafka replacement.
| fuzzy2 wrote:
| NATS is something else, but it's awesome. It has awesome
| throughput and latency out of the box (without Jetstream),
| while using little resources.
|
| I'd recommend considering it, especially as an alternative to
| RabbitMQ.
| speedgoose wrote:
| I only tested NATS using JetStream and I struggled with the
| throughput in Python. I probably used it wrong. But your
| comment may imply that jetstream is slow.
| to11mtm wrote:
| I think sometimes the client bindings are/were in need of
| improvement.
|
| As an example, the C# API was originally very 'go-like' and
| written to .NET Framework, didn't take advantage of a lot
| of newer features... to the point a 3rd party client was
| able to get somewhere between 3-4x the throughput. This is
| now finally being rectified with a new C# client, however
| it wouldn't surprise me if other languages have similar
| pains.
|
| I haven't tested JetStream but my general understanding is
| that you do have to be mindful of the different options for
| publishing; especially in the case of JetStream, a
| synchronous publish call can be relatively time consuming;
| it's better to async publish and (ab)use the future for
| flow control as needed.
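The "async publish and use the future for flow control" idea above can be sketched roughly as follows. This is a minimal illustration using only `asyncio`; `publish` is a hypothetical stand-in that simulates a broker round trip, not the API of any real NATS client, and `max_in_flight` is an assumed tuning knob.

```python
import asyncio

async def publish(msg: bytes) -> int:
    """Hypothetical stand-in for a publish call that resolves on broker ack."""
    await asyncio.sleep(0.001)  # simulated round trip to the broker
    return len(msg)

async def publish_all(messages, max_in_flight=64):
    """Publish without awaiting each ack, but cap the unacknowledged count."""
    gate = asyncio.Semaphore(max_in_flight)
    in_flight = 0
    peak = 0

    async def one(msg):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        try:
            return await publish(msg)
        finally:
            in_flight -= 1
            gate.release()  # frees a slot for the next publish

    tasks = []
    for msg in messages:
        await gate.acquire()  # back-pressure: waits when too many are pending
        tasks.append(asyncio.create_task(one(msg)))
    acks = await asyncio.gather(*tasks)
    return acks, peak
```

Calling `asyncio.run(publish_all(batch, max_in_flight=8))` keeps at most eight publishes pending at a time, which is usually much faster than awaiting every ack in sequence while still bounding memory.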
| KaiserPro wrote:
| we used it for some low latency stuff in python. it was
| about 10ms to enqueue at worst. However we were using raw
| NATS, and had a clear SLA that meant that the queue was
| allowed to be offline or messages lost, so long as we
| notified the right services.
| klabb3 wrote:
| It's true FOSS, and the server is standalone Go binary that's
| so small it can even be embedded. Lots of language bindings for
| clients. Has persistence, durability, and nicely aligns into a
| raft-like cluster in a DC without a separate orchestrator.
|
| I'm a big fan - never understood why it's not at the top of the
| list in these tech reviews.
| nhumrich wrote:
| Rabbitmq is FOSS, has lots of language bindings. It has
| persistence, durability, and doesn't require a separate
| orchestrator.
| vorpalhex wrote:
| Different set of promises. NATS is great but has a different
| tradeoff bargain from Rabbit or Kafka.
| KaiserPro wrote:
| If I'm not buying a message bus in as a service, then NATS is
| great as a pub/sub and/or message passing system.
|
| it is simple to configure, has good documentation, and
| excellent integration into most languages. It guarantees
| uptime, and thats about it. It clusters really well, so you can
| swap out instances, or scale in/out as you need.
| bicijay wrote:
| Message ordering is an illusion. Unless you track/store the
| messages on the client and are willing to deal with stuck queues
| due to failures in one "poisoned" message.
| klabb3 wrote:
| There are different kinds of order. Yes, there's no total order
| in a distributed system, but you can have certain partial order
| guarantees. It's nice if something is added before it's
| updated, for instance.
| rafaelturk wrote:
| Couldn't agree more, messages should be completely agnostic
| from one-another. If you have a decent event-driven
| architecture, you don't need kafka. and you can be happy with
| Redis or RabbitMQ
| tannhaeuser wrote:
| I think this is an excellent article. The only thing I'd add is
| that RabbitMQ is an implementation of AMQP (optionally v1.0) as a
| standardized broker service protocol so is designed to be
| interchangeable with other extant implementations such as Apache
| ActiveMQ and Qpid whereas Kafka is one-of-a-kind software.
| Beyond that RabbitMQ has standardized client libs and frameworks
| in Java land if that matters to you - it did matter in the
| original context of message queue middlewares and SOA from where
| AMQP originated and where enterprise messaging sees major use.
| OTOH Kafka, with caveats, is in principle more "web scale" -
| though that is far from a free ride.
| purpleblue wrote:
| If someone is asking if they should decide between RabbitMQ vs
| Kafka, they should 100% use RabbitMQ. It means they have no idea
| what they're dealing with, the architectural differences, and the
| investment that the company needs in order to use Kafka.
|
| So use RabbitMQ.
| postalrat wrote:
| Redis isn't an option?
| rafaelturk wrote:
| Yes. Don't know why it's not mentioned in the article, maybe because
| it's more barebones.
| noitpmeder wrote:
| I'd also like someone with experience to contrast redis for
| these use cases.
| FridgeSeal wrote:
| I'm personally a fan of Kafka. I think the design of persisting
| the messages, and tracking offsets for progress instead of
| message acknowledgments is a much cleaner and more versatile
| design.
|
| You can get all the same advantages of message acknowledgments,
| but now you can also replay queues, let different applications
| use the messages (handy for cross cutting event/notification
| systems) and you get better scaling properties-which doesn't hurt
| at the small scale, and provides further scaling when you need
| it.
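A toy model may help show why offset tracking makes replay and multiple independent readers fall out naturally. The `Log` class below is purely illustrative (an in-memory list, with hypothetical group names), not Kafka's client API.

```python
class Log:
    """Toy append-only log; each consumer group tracks its own offset."""

    def __init__(self):
        self.records = []
        self.offsets = {}  # group name -> next offset to read

    def append(self, record):
        self.records.append(record)
        return len(self.records) - 1  # offset of the new record

    def poll(self, group, max_records=10):
        """Return (offset, record) pairs from the group's current position."""
        start = self.offsets.get(group, 0)
        return list(enumerate(self.records[start:start + max_records], start))

    def commit(self, group, offset):
        # Committing N means everything up to and including N is processed.
        self.offsets[group] = offset + 1

    def seek(self, group, offset):
        self.offsets[group] = offset  # rewind to replay from this offset


log = Log()
for event in ["created", "updated", "deleted"]:
    log.append(event)

log.commit("billing", 2)   # billing has processed everything
log.seek("billing", 1)     # ...and later decides to replay from offset 1
```

Because records are never removed on read, a second group such as `"audit"` still sees all three records regardless of what `"billing"` has committed.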
| whalesalad wrote:
| > You can get all the same advantages of message
| acknowledgments, but now you can also replay queues
|
| with rmq you can reject/nack a message and have it put back on
| the queue. rmq is not well suited for long term historical
| retention inside queues a-la kafka's logs but it is possible to
| do.
|
| > let different applications use the messages (handy for cross
| cutting event/notification systems)
|
| rmq also does a publish once and fanout to multiple queues to
| support this. data is replicated so that could be a deal
| breaker, but it is possible.
|
| how often have you had to diagnose a stuck consumer or some
| other kind of offset glitch where a consumer is unable to
| resume where it left off?
|
| not knocking kafka here but I do think it is a tool you should
| reach for when you need to solve a very hyper focused problem,
| while rabbit is a tool more suited to most cases where queuing
| is required. kafka is a code smell in a lot of organizations
| from my experience - most do not need it.
| raducu wrote:
| > kafka is a code smell in a lot of organizations from my
| experience - most do not need it.
|
| Kafka is really nice if you don't care that much about
| latency during peak load and you don't have absurd processing
| times for messages.
| raducu wrote:
| > You can get all the same advantages of message
| acknowledgments.
|
| Maybe 95% of cases, but not all.
|
| Long message processing time really kills kafka in a way it
| doesn't kill Rabbit Mq. Combine it with inherent read
| parallelism being limited to the number of partitions. Add in
| high variability of message rates and bingo, that's like 90% of
| the issues I've had with kafka over the years.
| KaiserPro wrote:
| > now you can also replay queues
|
| yeahnah, that leads to people treating queues like databases
| (I'm looking at you new york times, you know what you did
| wrong)
|
| its either a queue, or a pubsub, either way its ephemeral. Once
| its gone, it should stay gone. thats what database, object
| stores or filesystems are for.
|
| Kafka is a beast, has lots of bells and whistles and grinds to
| a halt when you look at it funny. Yes, it can scale, but also
| it can just sulk.
|
| rabbit has its own set of problems, and frankly I'd probably
| not choose either anymore.
| officialchicken wrote:
| > (I'm looking at you new york times, you know what you did
| wrong)
|
| You're going to have to be a tiny bit more specific here. NYT
| is THE factory of wrongness for sure. In every dimension. Are
| we talking "yellow cake" wrong, or somewhere else on the
| severity of f'up scale...
| supermatt wrote:
| The same MQ patterns as mentioned in the article (exactly once,
| consumer groups) can also be done in kafka, contrary to what the
| article suggests.
| robertlagrant wrote:
| > one is a message broker, and the other is a distributed
| streaming platform
|
| I think this is an odd way of putting it. One is smart messaging;
| dumb clients. The other is dumb messaging; smart clients. It
| turns out the latter (i.e. Kafka) scales wonderfully so you can
| send more data, but you add complexity to your clients, who can't
| just now pluck messages off a queue to process, or have messages
| retry upon the first 3 failures, as they could with RabbitMQ.
|
| Having said that, Kafka lets you keep all your data, so you don't
| have to worry about losing messages to unexpected interactions
| between RabbitMQ rules. But having said _that_ , now you have to
| store all your data.
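To make the "retry a few times, then give up" behaviour concrete, here is a rough sketch of the bookkeeping a client has to carry when the broker does not do it. Everything here is hypothetical (an in-memory deque, hashable messages, `max_attempts` counting total attempts); it is not RabbitMQ's or Kafka's API.

```python
from collections import deque

def consume(queue, handler, max_attempts=3):
    """Process messages; requeue failures, dead-letter after max_attempts."""
    attempts = {}          # message -> number of failed attempts so far
    done, dead = [], []
    pending = deque(queue)
    while pending:
        msg = pending.popleft()
        try:
            handler(msg)
            done.append(msg)
        except Exception:
            attempts[msg] = attempts.get(msg, 0) + 1
            if attempts[msg] >= max_attempts:
                dead.append(msg)      # give up: park it in a dead-letter list
            else:
                pending.append(msg)   # requeue at the back: ordering is lost
    return done, dead
```

Note that a requeued message is processed after messages that originally followed it, so per-queue ordering is traded away for progress.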
| supermatt wrote:
| > One is smart messaging; dumb clients. The other is dumb
| messaging; smart clients.
|
| All the smartness of the messaging can be implemented in the
| smart clients. Then you can expose that as a smart messaging
| api to dumb clients.
|
| The most obvious example is kafka streams which exposes a
| "simple" api rather than dealing directly with kafka, but
| obviously you could create a less featurefull wrapper than
| that.
| waynesonfire wrote:
| And reimplement rabbitmq? Great idea. Let's do it in rust
| too.
| NovemberWhiskey wrote:
| > _All the smartness of the messaging can be implemented in
| the smart clients._
|
| How do you do, for example, a queue with priorities client
| side without it being insanity? That's a relatively basic
| AMQP thing. Or managing the number of redeliveries for a
| message that's being repeatedly rejected.
|
| You can absolutely try to build some of this with a look-
| aside shared data store that all clients have to depend on in
| order to emulate having the capability in the broker, but you
| just introduced another common point of failure in addition
| to the messaging infrastructure. Life is too short for this.
| supermatt wrote:
| I totally agree that you cant do a lot of AMQP stuff. As
| you noted, you can build some of it by managing state via
| transactional producers, etc - but you definitely cant do
| everything. The biggest gripe for me is actually dynamic
| "queue" creation, patterns for topics, etc. So I use an MQ
| for an MQ ;)
|
| I'm just saying you can "dumb down" the client side on
| kafka by creating an abstraction layer (or one of the many
| higher level libs that already do that).
| eddythompson80 wrote:
| I can't help but think that this just gives you the worst of
| both worlds. You are now on the hook managing that non-
| standard "smart" wrapper which will quickly just become the
| status quo for the project. Anyone wanting to change how it
| works needs to understand exactly how "smart" you made it and
| all the side effects that will come with making a change
| there.
|
| I pushed against knative in our company particularly for that
| reason. Like we wanna use kafka because [Insert kafka sales
| pitch], but we don't want our developers to utilize any of
| the kafka features. We're just gonna define the kafka client
| in some yaml format and have our clients handle an http
| request per message. It didn't make sense to me.
| supermatt wrote:
| Thats kind of like saying dont use any software libraries
| because they all use the standard lib indirectly so you may
| as well just use that?
|
| Its just an abstraction layer to make things less effort.
| eddythompson80 wrote:
| yeah, don't wrap all calls to a standard lib in another
| homegrown or non-standard single-digit user lib that
| makes changes in all sort of subtle ways. There are
| plenty of C++ projects that make their own or wrap stdlib
| and they are always a big wtf.
|
| It's one thing to have an abstraction for kafka in your
| code, it's another to wrap the client in a smart client
| that reimplements something like rabbitmq, and much worse
| a smart service.
| supermatt wrote:
| > don't wrap all calls to a standard lib
|
| Im not saying to expose the same primitives - what would
| be the point in that? I am saying that EVERY lib you use
| will be using the standard lib or some abstraction of it
| to perform its own utility.
|
| > It's one thing to have an abstraction for kafka in your
| code, it's another to wrap the client in a smart client,
| and much worse a smart service.
|
| That abstraction is exactly what i am talking about. Why
| write 50 lines of boilerplate multiple times throughout
| your code when you can wrap that up in a single function
| call and expose THAT as your client. You know thats
| exactly what you will end up doing on any non-trivial
| project. Or you could use a lib that already does that -
| such as the "official" kafka streams lib.
| robertlagrant wrote:
| This would be my instinct too.
| raducu wrote:
| > who can't just now pluck messages off a queue to process
|
| The problem is you cannot mark individual messages as read, for
| a given consumer&partition you can only update the offset for a
| partition.
|
| If a certain message processing takes very long, all other
| messages in that partition will have to wait.
|
| Also, with kafka, the max read concurrency is equal to the
| number of partitions, for something like rabbitMq it is much
| higher; but you do get nice message ordering for any given
| partition in kafka which you do not get in RabbitMq (afaik); you
| also get some really nice data locality with kafka because
| unless the consumers get the partitions re-assigned, all
| messages for the same key are served on the same physical
| consumer.
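The two properties described above (same key always lands on the same partition, and read concurrency is capped by the partition count) can be illustrated with a small sketch. `zlib.crc32` is only a stand-in stable hash chosen for this example, not the hash Kafka clients use, and the round-robin `assign` is a simplified assumption about how partitions get owners.

```python
import zlib

def partition_for(key: str, num_partitions: int) -> int:
    # crc32 is a stand-in stable hash, not the one Kafka clients use.
    return zlib.crc32(key.encode("utf-8")) % num_partitions

def assign(num_partitions: int, consumers: list) -> dict:
    """Round-robin partitions over consumers; each partition has one owner."""
    return {p: consumers[p % len(consumers)] for p in range(num_partitions)}

# Five consumers but only three partitions: two consumers get nothing to read.
owners = assign(3, ["c1", "c2", "c3", "c4", "c5"])
busy = set(owners.values())
```

As long as the assignment does not change, every message for a given key is handled by the same consumer, which is where the ordering and data-locality benefits come from.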
| math wrote:
| Worth noting that Kafka is getting queues:
| https://cwiki.apache.org/confluence/display/KAFKA/KIP-932%3A...
| ryanjshaw wrote:
| > The problem is you cannot mark individual messages as read,
| for a given consumer&partition you can only update the offset
| for a partition.
|
| Hence "smart clients". If you MUST process every message at
| least once, you will anyway be tracking messages individually
| on the client (e.g. a DB or file system plus logic for
| idempotent message processing) and thus disable auto-offset
| commits back to the cluster for your consumer.
|
| RabbitMQ says "let me track this for you", Kafka says "you
| already need to track this so why duplicate the data in the
| cluster and complicate the protocol".
|
| If you don't have reliable persistent storage available and
| insist on using the Kafka cluster to track offsets, you can
| track processed offsets in memory and whenever your lowest
| processed offset moves forward, you have your consumer commit
| that offset manually as part of its message loop.
|
| If your service restarts your downstream commands need to be
| idempotent of course because you will reconsume messages you
| may have previously processed, but this would be the case
| regardless of Kafka or RabbitMQ unless you're using
| distributed transactions (yuck).
|
| > If a certain message processing takes very long, all other
| messages in that partition will have to wait.
|
| You can stream messages into a buffer and process them in
| parallel, and commit the low watermark offset whenever it
| changes, as described above. I've implemented this in .NET
| with Channels and saturate the CPUs with no problem.
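The low-watermark approach described above might look something like this in outline. `OffsetTracker` is a hypothetical helper, not part of any Kafka client; it assumes offsets within a partition are contiguous and follows the convention that the committed value is the next offset to read.

```python
class OffsetTracker:
    """Tracks out-of-order completions and exposes the safe commit point."""

    def __init__(self, next_offset=0):
        self.next_offset = next_offset  # lowest offset not yet processed
        self.done = set()               # finished offsets above the watermark

    def mark_done(self, offset):
        """Record a finished offset; return the new commit offset, or None."""
        if offset < self.next_offset:
            return None  # already covered by an earlier commit
        self.done.add(offset)
        moved = False
        while self.next_offset in self.done:
            self.done.remove(self.next_offset)
            self.next_offset += 1
            moved = True
        return self.next_offset if moved else None
```

Workers call `mark_done` as they finish, in any order, and the consumer loop commits only when a value comes back. After a restart, anything above the watermark is re-delivered, which is why the downstream handling needs to be idempotent.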
| lmm wrote:
| > You can stream messages into a buffer and process them in
| parallel, and commit the low watermark offset whenever it
| changes, as described above. I've implemented this in .NET
| with Channels and saturate the CPUs with no problem.
|
| And there are libraries that will manage all this for you
| e.g. https://github.com/line/decaton
| rafaelturk wrote:
| Nice post! RabbitMQ is a battle-tested, exceptionally fast and
| low-resource app, capable of handling millions of
| transactions/second. RabbitMQ will handle the vast majority of
| use cases. I'm puzzled why startups, or even banks, often use
| Kafka solely because of the hype. Kafka on the other hand requires
| massive CPU and memory, often requiring its own K8s cluster just to
| be alive.
| rafaelturk wrote:
| If you have a clean event-driven architecture, i.e. messages are
| completely agnostic and decoupled from one-another you don't
| need Kafka.
| bhouston wrote:
| If you use Confluent Kafka, the billing is pretty high. About 4
| years ago it was much cheaper, but then they completely revamped
| the pricing to something ridiculous. I found that switching to
| Google Pub/Sub, at least if it meets your needs, is cheaper.
| esafak wrote:
| I see they offer Kafka's exactly-once delivery:
| https://cloud.google.com/blog/products/data-analytics/cloud-...
| AndyPa32 wrote:
| Yes, I can confirm that. Confluent is the most expensive part
| of our current infrastructure.
| bhouston wrote:
| Can you switch away from it? Or do you need its advanced
| features?
| BWStearns wrote:
| My comment is mostly about part 2 of this post, but wrt message
| ordering being a kafka "win" I'd raise the point that in the
| actual use case of "a consumer fails in some way to process the
| message" you can still end up with out of order processing of the
| consumer's input since you might want to dump them into a DLQ or
| something. The fact that the message isn't reappended to the
| topic by default for processing is kind of an academic point no?
|
| Unrelatedly, I've been looking at Pulsar lately. Anyone have
| experience with Pulsar and either RMQ/Kafka want to throw out
| some opinions from having tried both?
| whalesalad wrote:
| One is a tomato, the other is an orange. From a distance they
| might look alike but they really are two completely different
| tools. This is a pretty solid explanation of the differences with
| good illustrations.
|
| Rabbit can do everything Kafka does - and much more - in a more
| configurable manner. Kafka is highly optimized for essentially
| one use case and does that well. Nothing in life is free, there
| are trade-offs everywhere. I am not privy to which one is
| theoretically faster - but once you reach that question methinks
| the particular workload is the deciding factor.
| JohnMakin wrote:
| I am not an expert in either and have only worked with Kafka.
| At a past job I had to write a connector job to parse and
| sanitize some extremely dirty, unstructured data and pass it
| along somewhere else. RabbitMQ supports this? What is the one
| use case of kafka? I think you have it backwards.
| whalesalad wrote:
| > parse and sanitize some extremely dirty, unstructured data
| and pass it along somewhere else
|
| can you be more specific? that to me sounds like hello world
| for either of these tools. "sanitize data" is an application
| level concern that neither rabbit nor kafka would handle. as
| far as "pass along somewhere else" again both tools can do.
| JohnMakin wrote:
| It was a Sink Connector. I don't know what it was or wasn't
| supposed to do but I was asked to do it, as is often the
| case in tech. I could have done any number of
| transformations in that process though, which I'm not sure
| rabbitmq supports
| whalesalad wrote:
| It sounds to me like you aren't really even sure what you
| built. I have operated both rabbit and kafka at scale; I
| definitely do not have it backwards :)
| JohnMakin wrote:
| No, I'm not, because it was years ago, and I'm asking for
| clarification because what was said immediately sounded
| wrong to me (I've managed a lot of rabbitmq deployments)
| and you've not really given one other than an appeal to
| authority. guess I have my answer. Can't find anything
| that suggests rabbitmq natively supports anything like
| sink connectors. thanks.
| NovemberWhiskey wrote:
| > _I am not an expert in either and have only worked with
| Kafka._
|
| > _I've managed a lot of rabbitmq deployments_
|
| ... ?
| JohnMakin wrote:
| You do not need to be an expert in something's internal
| workings to manage/monitor a deployment. Surely this does
| not need to be explained further.
| lnenad wrote:
| A classic exchange on the internet.
| supermatt wrote:
| > Can't find anything that suggests rabbitmq natively
| supports anything like sink connectors
|
| Kafka doesnt natively support them either. That would be
| Kafka Connect. I guess you could use it as an MQ, but it
| wouldnt be a very good one. Its more used as a data
| integration platform. If you want more MQ-like
| functionality OOTB on top of Kafka you would want to use
| something like Kafka Streams instead.
| JohnMakin wrote:
| Thanks for this clarification, this is what I was after.
| KaiserPro wrote:
| Rabbit is an arse to scale past one broker. It was possible,
| but a pain, that might have changed.
|
| Kafka is just a pain full stop.
| icedchai wrote:
| At a previous company, about 10 years ago, we had roughly 10
| RabbitMQ instances (brokers), all isolated. The system was
| essentially partitioned by queue server. We had a directory-
| ish service that would associate clients with their assigned
| node. It worked well, except if a client got too large we
| might have to move them to another queue.
| nhumrich wrote:
| The official rabbitmq controller for kubernetes is a breeze.
| Scales wonderfully without almost any effort.
___________________________________________________________________
(page generated 2023-09-19 23:00 UTC)