[HN Gopher] On efficiently partitioning a topic in Apache Kafka
       ___________________________________________________________________
        
       On efficiently partitioning a topic in Apache Kafka
        
       Author : belter
       Score  : 70 points
       Date   : 2022-05-20 15:42 UTC (2 days ago)
        
 (HTM) web link (arxiv.org)
 (TXT) w3m dump (arxiv.org)
        
       | belter wrote:
       | PDF: https://arxiv.org/pdf/2205.09415.pdf
       | 
       | "...Even though Apache Kafka provides some out of the box
       | optimizations, it does not strictly define how each topic shall
       | be efficiently distributed into partitions. The well-formulated
       | fine-tuning that is needed in order to improve an Apache Kafka
       | cluster performance is still an open research problem.
       | 
       | In this paper, we first model the Apache Kafka topic partitioning
       | process for a given topic. Then, given the set of brokers,
       | constraints and application requirements on throughput, OS load,
       | replication latency and unavailability, we formulate the
       | optimization problem of finding how many partitions are needed
       | and show that it is computationally intractable, being an integer
       | program.
       | 
       | Furthermore, we propose two simple, yet efficient heuristics to
       | solve the problem: the first tries to minimize and the second to
       | maximize the number of brokers used in the cluster.
       | 
       | Finally, we evaluate its performance via large-scale simulations,
       | considering as benchmarks some Apache Kafka cluster configuration
       | recommendations provided by Microsoft and Confluent. We
       | demonstrate that, unlike the recommendations, the proposed
       | heuristics respect the hard constraints on replication latency
       | and perform better w.r.t. unavailability time and OS load, using
       | the system resources in a more prudent way..."
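        | 
        | For context on those benchmark recommendations: Confluent's
        | long-standing rule of thumb is roughly partitions =
        | max(T/p, T/c), where T is the target throughput and p and c
        | are the measured per-partition producer and consumer
        | throughput. Below is a toy sketch of just that rule (made-up
        | numbers, not the paper's model); per the abstract, the
        | paper's formulation also folds in replication latency,
        | unavailability and OS load.
        | 
        |   public class PartitionRuleOfThumb {
        |     // Enough partitions to hit a target throughput T given
        |     // measured per-partition producer (p) and consumer (c)
        |     // throughput: max(T/p, T/c), rounded up.
        |     static int partitionsFor(double targetMBps,
        |                              double producerMBps,
        |                              double consumerMBps) {
        |       double byProducer = targetMBps / producerMBps;
        |       double byConsumer = targetMBps / consumerMBps;
        |       return (int) Math.ceil(Math.max(byProducer, byConsumer));
        |     }
        | 
        |     public static void main(String[] args) {
        |       // Made-up numbers: 200 MB/s target, 20 MB/s per
        |       // partition producing, 50 MB/s consuming -> 10.
        |       System.out.println(partitionsFor(200, 20, 50));
        |     }
        |   }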
        
       | claytonjy wrote:
       | To what extent do these concerns change or disappear if using
       | alternatives, like Pulsar or Redpanda?
       | 
        | It's been a while since I dug into this, but a couple of jobs
        | ago I was very concerned with what happens when you replay a
        | topic from 0 for a new consumer: existing, up-to-date
        | consumers are negatively impacted! As I recall this was due
        | to the fundamental architecture around partitions, and a
        | notable advantage of Pulsar was not having such issues. Is
        | that correct? Is that still the case?
        
         | bobgt wrote:
         | Assuming i) you've set topic retention so that old messages
         | still exist and your new consumer can "replay from 0"; and ii)
          | your new consumer is using its own consumer group, then
         | existing consumers won't be impacted.
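          | 
          | (Rough sketch of that setup with the standard Java client;
          | the topic name and group id are made up. A brand-new
          | group.id plus auto.offset.reset=earliest makes the new
          | consumer start from the earliest retained offset without
          | touching any existing group's committed offsets.)
          | 
          |   import java.time.Duration;
          |   import java.util.List;
          |   import java.util.Properties;
          |   import org.apache.kafka.clients.consumer.*;
          |   import org.apache.kafka.common.serialization.*;
          | 
          |   public class ReplayFromZero {
          |     public static void main(String[] args) {
          |       Properties props = new Properties();
          |       props.put("bootstrap.servers", "localhost:9092");
          |       // Fresh group id: offsets are tracked per group, so
          |       // existing consumer groups are left alone.
          |       props.put("group.id", "replay-from-zero");
          |       // No committed offsets exist for this group yet, so
          |       // start from the earliest retained offset
          |       // ("replay from 0").
          |       props.put("auto.offset.reset", "earliest");
          |       props.put("key.deserializer",
          |           StringDeserializer.class.getName());
          |       props.put("value.deserializer",
          |           StringDeserializer.class.getName());
          |       try (KafkaConsumer<String, String> consumer =
          |                new KafkaConsumer<>(props)) {
          |         consumer.subscribe(List.of("events"));
          |         while (true) {
          |           ConsumerRecords<String, String> records =
          |               consumer.poll(Duration.ofMillis(500));
          |           for (ConsumerRecord<String, String> r : records) {
          |             System.out.printf("%d %s%n", r.offset(), r.key());
          |           }
          |         }
          |       }
          |     }
          |   }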
        
           | claytonjy wrote:
            | Is there not resource contention when multiple consumers
            | (or consumer groups) are reading from the same Kafka
            | partition? You can of course over-provision your
            | partitions to better allow for this, but rebalancing is
            | not cheap and also tends to affect consumers.
           | 
           | I may be describing the problem incorrectly, but I know
           | vendors we talked to were aware of this issue and had
           | workarounds; IIRC Aiven had tooling to easily spin up a
           | temporary new "mirror" cluster for the new consumer to catch
           | up.
        
             | morelisp wrote:
             | Starting a new consumer group won't cause rebalances
             | (except in the new group).
             | 
             | It sounds like you're asking if you can double the number
             | of readers in a system with no performance impact. If
             | you're at capacity, the answer is obviously no. Yes, every
             | consumer takes some i/o and CPU on the brokers serving the
             | data. I have never used Pulsar but I'm sure that's also the
             | case there.
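              | 
              | (If you want to convince yourself of the rebalance
              | point, a quick sketch with the Java AdminClient; the
              | group names are made up. Committed offsets are stored
              | per group id, so adding a new group only rebalances
              | that group's own members.)
              | 
              |   import java.util.Map;
              |   import java.util.Properties;
              |   import org.apache.kafka.clients.admin.AdminClient;
              |   import org.apache.kafka.clients.consumer.OffsetAndMetadata;
              |   import org.apache.kafka.common.TopicPartition;
              | 
              |   public class GroupOffsets {
              |     public static void main(String[] args) throws Exception {
              |       Properties props = new Properties();
              |       props.put("bootstrap.servers", "localhost:9092");
              |       try (AdminClient admin = AdminClient.create(props)) {
              |         // Each group's committed offsets live under its
              |         // own group id; "live" and "replay" share no
              |         // consumer-side state at all.
              |         for (String g : new String[] {"live", "replay"}) {
              |           Map<TopicPartition, OffsetAndMetadata> offsets =
              |               admin.listConsumerGroupOffsets(g)
              |                   .partitionsToOffsetAndMetadata().get();
              |           offsets.forEach((tp, om) -> System.out.printf(
              |               "%s %s %d%n", g, tp, om.offset()));
              |         }
              |       }
              |     }
              |   }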
        
               | bacheaul wrote:
                | There's also the file system cache to consider, which
                | Kafka famously leans on heavily for I/O performance.
                | If the majority of your consumers are reading the
                | latest messages, which were just written, those reads
                | will likely be served from the in-memory cache. A
                | consumer reading from the earliest messages on a
                | large topic could conceivably evict data that
                | consumers reading the latest messages would otherwise
                | find in the cache, so they're not necessarily totally
                | isolated. I haven't measured this to say it's an
                | actual issue, just that I wouldn't dismiss it.
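                | 
                | (One mitigation I'd look at, sketched with the Java
                | client and made-up numbers: cap how much the backfill
                | consumer fetches at a time, so the cold reads trickle
                | through the page cache rather than flooding it.
                | Broker-side client quotas, e.g. consumer_byte_rate,
                | are the sturdier option.)
                | 
                |   import java.util.Properties;
                | 
                |   public class BackfillProps {
                |     // Deserializers etc. omitted; just the knobs
                |     // that bound how much each fetch/poll pulls in.
                |     static Properties throttled() {
                |       Properties p = new Properties();
                |       p.put("bootstrap.servers", "localhost:9092");
                |       p.put("group.id", "replay");
                |       p.put("auto.offset.reset", "earliest");
                |       // Made-up, deliberately small limits so a cold
                |       // replay reads the backlog gently rather than
                |       // churning the page cache all at once.
                |       p.put("fetch.max.bytes", "1048576");
                |       p.put("max.partition.fetch.bytes", "262144");
                |       p.put("max.poll.records", "500");
                |       return p;
                |     }
                |   }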
        
         | EdwardDiego wrote:
         | No, Pulsar has partitions too. And I'm surprised you saw a
         | massive effect unless the brokers were under huge load.
        
       ___________________________________________________________________
       (page generated 2022-05-22 23:01 UTC)