[HN Gopher] On efficiently partitioning a topic in Apache Kafka
___________________________________________________________________
On efficiently partitioning a topic in Apache Kafka
Author : belter
Score : 70 points
Date : 2022-05-20 15:42 UTC (2 days ago)
(HTM) web link (arxiv.org)
(TXT) w3m dump (arxiv.org)
| belter wrote:
| PDF: https://arxiv.org/pdf/2205.09415.pdf
|
| "...Even though Apache Kafka provides some out of the box
| optimizations, it does not strictly define how each topic shall
| be efficiently distributed into partitions. The well-formulated
| fine-tuning that is needed in order to improve an Apache Kafka
| cluster performance is still an open research problem.
|
| In this paper, we first model the Apache Kafka topic partitioning
| process for a given topic. Then, given the set of brokers,
| constraints and application requirements on throughput, OS load,
| replication latency and unavailability, we formulate the
| optimization problem of finding how many partitions are needed
| and show that it is computationally intractable, being an integer
| program.
|
| Furthermore, we propose two simple, yet efficient heuristics to
| solve the problem: the first tries to minimize and the second to
| maximize the number of brokers used in the cluster.
|
| Finally, we evaluate its performance via large-scale simulations,
| considering as benchmarks some Apache Kafka cluster configuration
| recommendations provided by Microsoft and Confluent. We
| demonstrate that, unlike the recommendations, the proposed
| heuristics respect the hard constraints on replication latency
| and perform better w.r.t. unavailability time and OS load, using
| the system resources in a more prudent way..."
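|
| For context, the Confluent guidance the paper benchmarks against
| boils down to a per-partition throughput rule of thumb: pick at
| least max(t/p, t/c) partitions, where t is the target topic
| throughput and p and c are the measured per-partition producer
| and consumer rates. A minimal sketch with made-up numbers:
|
|   public class PartitionSizing {
|       public static void main(String[] args) {
|           // Hypothetical measurements; substitute your own.
|           double targetMBps = 200.0;            // desired topic throughput
|           double producerMBpsPerPart = 25.0;    // per-partition producer rate
|           double consumerMBpsPerPart = 40.0;    // per-partition consumer rate
|
|           long partitions = (long) Math.ceil(Math.max(
|               targetMBps / producerMBpsPerPart,
|               targetMBps / consumerMBpsPerPart));
|           System.out.println("suggested partitions: " + partitions); // 8
|       }
|   }
|
| The paper's argument is that this kind of rule says nothing about
| replication latency, unavailability or OS load, which is what its
| integer program and heuristics try to account for.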
| claytonjy wrote:
| To what extent do these concerns change or disappear if using
| alternatives, like Pulsar or Redpanda?
|
| It's been a while since I dug into this, but a couple of jobs
| ago I was very concerned about what happens when you replay a
| topic from 0 for a new consumer: existing, up-to-date consumers
| are negatively impacted! As I recall this was due to the
| fundamental architecture around partitions, and a notable
| advantage of Pulsar was not having this issue. Is that correct?
| Is that still the case?
| bobgt wrote:
| Assuming i) you've set topic retention so that the old messages
| still exist and your new consumer can "replay from 0", and ii)
| your new consumer uses its own consumer group, then existing
| consumers won't be impacted.
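|
| Concretely, something like this is enough on the consumer side
| (topic and group names are placeholders):
|
|   import java.time.Duration;
|   import java.util.Collections;
|   import java.util.Properties;
|   import org.apache.kafka.clients.consumer.ConsumerRecords;
|   import org.apache.kafka.clients.consumer.KafkaConsumer;
|
|   public class ReplayConsumer {
|       public static void main(String[] args) {
|           Properties props = new Properties();
|           props.put("bootstrap.servers", "localhost:9092");
|           // A new, dedicated group id so offsets are tracked separately.
|           props.put("group.id", "replay-from-zero");
|           // With no committed offsets, start from the earliest message.
|           props.put("auto.offset.reset", "earliest");
|           props.put("key.deserializer",
|               "org.apache.kafka.common.serialization.StringDeserializer");
|           props.put("value.deserializer",
|               "org.apache.kafka.common.serialization.StringDeserializer");
|
|           try (KafkaConsumer<String, String> consumer =
|                   new KafkaConsumer<>(props)) {
|               consumer.subscribe(Collections.singletonList("my-topic"));
|               while (true) {
|                   ConsumerRecords<String, String> records =
|                       consumer.poll(Duration.ofMillis(500));
|                   records.forEach(r -> System.out.printf("%d:%d %s%n",
|                       r.partition(), r.offset(), r.value()));
|               }
|           }
|       }
|   }
|
| Since the group id is new, the coordinator tracks its offsets
| separately and the existing groups' assignments are untouched.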
| claytonjy wrote:
| Is there not resource contention when multiple consumer
| (groups) are reading from the same Kafka partition? You can
| of course over-provision your partitions to better allow for
| this, but rebalancing is not cheap and also tends to affect
| consumers.
|
| I may be describing the problem incorrectly, but I know
| vendors we talked to were aware of this issue and had
| workarounds; IIRC Aiven had tooling to easily spin up a
| temporary new "mirror" cluster for the new consumer to catch
| up.
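|
| (On the over-provisioning point: partitions can also be added to
| an existing topic later, roughly as sketched below, but that
| triggers consumer group rebalances and changes key-to-partition
| hashing, which is why people tend to size generously up front.
| Topic name and counts here are placeholders.)
|
|   import java.util.Collections;
|   import java.util.Properties;
|   import org.apache.kafka.clients.admin.Admin;
|   import org.apache.kafka.clients.admin.NewPartitions;
|
|   public class GrowTopic {
|       public static void main(String[] args) throws Exception {
|           Properties props = new Properties();
|           props.put("bootstrap.servers", "localhost:9092");
|           try (Admin admin = Admin.create(props)) {
|               // Raise the total partition count of "my-topic" to 12.
|               admin.createPartitions(Collections.singletonMap(
|                       "my-topic", NewPartitions.increaseTo(12)))
|                   .all().get();
|           }
|       }
|   }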
| morelisp wrote:
| Starting a new consumer group won't cause rebalances
| (except in the new group).
|
| It sounds like you're asking if you can double the number
| of readers in a system with no performance impact. If
| you're at capacity, the answer is obviously no. Yes, every
| consumer takes some i/o and CPU on the brokers serving the
| data. I have never used Pulsar but I'm sure that's also the
| case there.
| bacheaul wrote:
| There's also the file system cache to consider, which
| Kafka famously leans on heavily for IO performance. If
| the majority of your consumers are reading the latest
| messages which were just written, these will likely come
| from the in-memory cache. A consumer replaying from the earliest
| messages on a large topic reads cold data from disk and could
| conceivably evict cached pages that consumers reading the latest
| messages would otherwise hit, so they're not necessarily totally
| isolated. I haven't measured this to say it's an actual issue;
| I'm just saying I wouldn't dismiss it.
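|
| If it does turn out to matter, one existing knob is a per-client
| byte-rate quota on the replaying consumer, so its cold reads get
| throttled rather than competing freely with everyone else. A
| rough sketch via the admin API (client id and rate are made up;
| the replaying consumer would have to set that client.id):
|
|   import java.util.Collections;
|   import java.util.Properties;
|   import org.apache.kafka.clients.admin.Admin;
|   import org.apache.kafka.common.quota.ClientQuotaAlteration;
|   import org.apache.kafka.common.quota.ClientQuotaEntity;
|
|   public class ThrottleReplay {
|       public static void main(String[] args) throws Exception {
|           Properties props = new Properties();
|           props.put("bootstrap.servers", "localhost:9092");
|           try (Admin admin = Admin.create(props)) {
|               ClientQuotaEntity entity = new ClientQuotaEntity(
|                   Collections.singletonMap(
|                       ClientQuotaEntity.CLIENT_ID, "replay-client"));
|               // Cap the replaying client at roughly 10 MB/s of fetch.
|               ClientQuotaAlteration throttle = new ClientQuotaAlteration(
|                   entity, Collections.singletonList(
|                       new ClientQuotaAlteration.Op(
|                           "consumer_byte_rate", 10_485_760.0)));
|               admin.alterClientQuotas(
|                   Collections.singletonList(throttle)).all().get();
|           }
|       }
|   }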
| EdwardDiego wrote:
| No, Pulsar has partitions too. And I'm surprised you saw a
| massive effect unless the brokers were under huge load.
___________________________________________________________________
(page generated 2022-05-22 23:01 UTC)