https://arxiv.org/abs/2205.09415

Computer Science > Networking and Internet Architecture

arXiv:2205.09415 (cs)
[Submitted on 19 May 2022]

Title: On Efficiently Partitioning a Topic in Apache Kafka

Authors: Theofanis P. Raptis, Andrea Passarella

    Abstract: Apache Kafka addresses the general problem of
    delivering extremely high-volume event data to diverse consumers
    via a publish-subscribe messaging system. It uses partitions to
    scale a topic across many brokers so that producers can write
    data in parallel and consumers can read it in parallel. Even
    though Apache Kafka provides some out-of-the-box optimizations,
    it does not strictly define how each topic should be efficiently
    distributed into partitions. The well-formulated fine-tuning
    needed to improve the performance of an Apache Kafka cluster is
    still an open research problem. In this paper, we first model
    the Apache Kafka topic partitioning process for a given topic.
    Then, given the set of brokers, constraints and application
    requirements on throughput, OS load, replication latency and
    unavailability, we formulate the optimization problem of finding
    how many partitions are needed and show that it is
    computationally intractable, being an integer program.
    Furthermore, we propose two simple yet efficient heuristics to
    solve the problem: the first tries to minimize and the second to
    maximize the number of brokers used in the cluster. Finally, we
    evaluate their performance via large-scale simulations, using as
    benchmarks some Apache Kafka cluster configuration
    recommendations provided by Microsoft and Confluent. We
    demonstrate that, unlike the recommendations, the proposed
    heuristics respect the hard constraints on replication latency
    and perform better with respect to unavailability time and OS
    load, using the system resources more prudently.
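
To make the problem setting concrete, the sketch below shows two things in Python. The first function encodes a widely cited Confluent rule of thumb for the number of partitions, max(t/p, t/c), where t is the target throughput and p and c are the measured per-partition producer and consumer throughputs. The abstract does not say whether this is the exact recommendation used as a benchmark. The other two functions are hypothetical toy placements that only illustrate the contrast between using as few brokers as possible and using as many as possible. They are not the paper's heuristics, they ignore replication, and all function and parameter names are assumptions made for illustration.

```python
import math


def partitions_rule_of_thumb(target, per_partition_producer, per_partition_consumer):
    """Rule of thumb max(t/p, t/c), rounded up to whole partitions.

    All three arguments are throughputs in the same unit (e.g. MB/s).
    """
    if min(target, per_partition_producer, per_partition_consumer) <= 0:
        raise ValueError("throughputs must be positive")
    by_producer = math.ceil(target / per_partition_producer)
    by_consumer = math.ceil(target / per_partition_consumer)
    return max(by_producer, by_consumer)


def pack_fewest_brokers(num_partitions, num_brokers, cap_per_broker):
    """Toy placement: fill one broker up to its cap before using the next.

    Hypothetical illustration of "use as few brokers as possible".
    """
    if num_partitions > num_brokers * cap_per_broker:
        raise ValueError("not enough broker capacity")
    counts = [0] * num_brokers
    remaining = num_partitions
    for b in range(num_brokers):
        counts[b] = min(cap_per_broker, remaining)
        remaining -= counts[b]
    return counts


def spread_all_brokers(num_partitions, num_brokers, cap_per_broker):
    """Toy placement: spread partitions as evenly as possible over all brokers.

    Hypothetical illustration of "use as many brokers as possible".
    """
    base, extra = divmod(num_partitions, num_brokers)
    counts = [base + (1 if b < extra else 0) for b in range(num_brokers)]
    if max(counts) > cap_per_broker:
        raise ValueError("not enough broker capacity")
    return counts


if __name__ == "__main__":
    n = partitions_rule_of_thumb(100, 10, 20)
    print(n)
    print(pack_fewest_brokers(n, num_brokers=4, cap_per_broker=4))
    print(spread_all_brokers(n, num_brokers=4, cap_per_broker=4))
```

With a target of 100 and per-partition throughputs of 10 (producer) and 20 (consumer), the rule gives 10 partitions. On 4 brokers with an assumed cap of 4 partitions each, packing leaves one broker empty while spreading loads all four. A real solution would also have to respect the constraints the abstract names: OS load, replication latency and unavailability.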

Comments: This work has been submitted to the IEEE for possible
          publication. Copyright may be transferred without notice,
          after which this version may no longer be accessible. This
          work was funded by the European Union's Horizon 2020
          research and innovation programme MARVEL under grant
          agreement No 957337
Subjects: Networking and Internet Architecture (cs.NI); Distributed,
          Parallel, and Cluster Computing (cs.DC)
Cite as:  arXiv:2205.09415 [cs.NI]
          (or arXiv:2205.09415v1 [cs.NI] for this version)
          https://doi.org/10.48550/arXiv.2205.09415
          arXiv-issued DOI via DataCite

Submission history

From: Theofanis Raptis
[v1] Thu, 19 May 2022 09:30:04 UTC (379 KB)