Intelligent Kubernetes Load Balancing at Databricks

Real-Time, Client-Side Load Balancing for Internal and Ingress Traffic in Kubernetes

Published: September 30, 2025 | Engineering | 10 min read
By Gaurav Nanda, Vincent Cheng and Rohit Agrawal

Summary

* Why Kubernetes' default load balancing falls short for high-throughput, persistent connections like gRPC, especially at Databricks scale.
* How we built a client-side, control-plane-driven load balancing system using a custom RPC client and xDS.
* Trade-offs of alternative approaches like headless services and Istio, and why we chose a lightweight, client-driven model.

Introduction

At Databricks, Kubernetes is at the heart of our internal systems. Within a single Kubernetes cluster, the default networking primitives like ClusterIP services, CoreDNS, and kube-proxy are often sufficient. They offer a simple abstraction to route service traffic. But when performance and reliability matter, these defaults begin to show their limits. In this post, we'll share how we built an intelligent, client-side load balancing system to improve traffic distribution, reduce tail latencies, and make service-to-service communication more resilient.

If you are a Databricks user, you don't need to understand this blog to use the platform to its fullest. But if you're interested in taking a peek under the hood, read on to hear about some of the cool stuff we've been working on!

Problem statement

High-performance service-to-service communication in Kubernetes has several challenges, especially when using persistent HTTP/2 connections, as we do at Databricks with gRPC.

How Kubernetes Routes Requests by Default

* The client resolves the service name (e.g., my-service.default.svc.cluster.local) via CoreDNS, which returns the service's ClusterIP (a virtual IP).
* The client sends the request to the ClusterIP, assuming it's the destination.
* On the node, iptables, IPVS, or eBPF rules (configured by kube-proxy) intercept the packet. The kernel rewrites the destination IP to one of the backend Pod IPs based on basic load balancing, such as round-robin, and forwards the packet.
* The selected pod handles the request, and the response is sent back to the client.
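To make the first step concrete, here is a minimal sketch (not from the original post) of what the name resolution looks like from inside a pod, using the my-service example above. Run against a real cluster, the lookup returns only the single ClusterIP, so the client never sees individual pod addresses at this layer.

```scala
import java.net.InetAddress

// Minimal illustration: resolving a ClusterIP service name from inside a pod.
// CoreDNS answers with the service's virtual IP only, never the backend pod IPs,
// so per-endpoint load balancing decisions cannot be made at this stage.
object ClusterIpLookup extends App {
  val serviceName = "my-service.default.svc.cluster.local"
  // Outside a cluster this lookup fails with UnknownHostException; inside, it returns one virtual IP.
  val addresses = InetAddress.getAllByName(serviceName)
  addresses.foreach(addr => println(s"$serviceName -> ${addr.getHostAddress}"))
}
```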
While this model generally works, it quickly breaks down in performance-sensitive environments, leading to significant limitations.

Limitations

At Databricks, we operate hundreds of stateless services communicating over gRPC within each Kubernetes cluster. These services are often high-throughput, latency-sensitive, and run at significant scale. The default load balancing model falls short in this environment for several reasons:

* High tail latency: gRPC uses HTTP/2, which maintains long-lived TCP connections between clients and services. Since Kubernetes load balancing happens at Layer 4, the backend pod is chosen only once per connection. This leads to traffic skew, where some pods receive significantly more load than others. As a result, tail latencies increase and performance becomes inconsistent under load.
* Inefficient resource usage: When traffic is not evenly spread, it becomes hard to predict capacity requirements. Some pods get CPU or memory starved while others sit idle. This leads to over-provisioning and waste.
* Limited load balancing strategies: kube-proxy supports only basic algorithms like round-robin or random selection. There's no support for strategies like:
  + Weighted round robin
  + Error-aware routing
  + Zone-aware traffic routing

These limitations pushed us to rethink how we handle service-to-service communication within a Kubernetes cluster.

Our Approach: Client-Side Load Balancing with Real-Time Service Discovery

To address the limitations of kube-proxy and default service routing in Kubernetes, we built a proxyless, fully client-driven load balancing system backed by a custom service discovery control plane. The fundamental requirements were to support load balancing at the application layer and to remove the dependency on DNS from the critical path. A Layer 4 load balancer like kube-proxy cannot make intelligent per-request decisions for Layer 7 protocols (such as gRPC) that use persistent connections. This architectural constraint creates bottlenecks, necessitating a more intelligent approach to traffic management.

The following table summarizes the key differences and the advantages of a client-side approach:

Table 1: Default Kubernetes LB vs. Databricks' Client-Side LB

| Feature / Aspect | Default Kubernetes Load Balancing (kube-proxy) | Databricks' Client-Side Load Balancing |
| --- | --- | --- |
| Load Balancing Layer | Layer 4 (TCP/IP) | Layer 7 (Application / gRPC) |
| Decision Frequency | Once per TCP connection | Per-request |
| Service Discovery | CoreDNS + kube-proxy (virtual IP) | xDS-based control plane + client library |
| Supported Strategies | Basic (round-robin, random) | Advanced (P2C, zone-affinity, pluggable) |
| Tail Latency Impact | High (traffic skew on persistent connections) | Reduced (even distribution, dynamic routing) |
| Resource Utilization | Inefficient (over-provisioning) | Efficient (balanced load) |
| Dependency on DNS/Proxy | High | Minimal, and not on the critical path |
| Operational Control | Limited | Fine-grained |

This system enables intelligent, up-to-date request routing with minimal dependency on DNS or Layer 4 networking. It gives clients the ability to make informed decisions based on live topology and health data.

[Figure: Our custom Endpoint Discovery Service in action. It reads service and endpoint data from the Kubernetes API and translates it into xDS responses. Both Armeria clients and API proxies stream requests to it and receive live endpoint metadata, which is then used by application servers for intelligent routing, with fallback clusters as backup.]

Custom Control Plane (Endpoint Discovery Service)

We run a lightweight control plane that continuously monitors the Kubernetes API for changes to Services and EndpointSlices. It maintains an up-to-date view of all backend pods for every service, including metadata like zone, readiness, and shard labels.

RPC Client Integration

A strategic advantage for Databricks was the widespread adoption of a common framework for service communication across most of its internal services, which are predominantly written in Scala. This shared foundation allowed us to embed client-side service discovery and load balancing logic directly into the framework, making it easy to adopt across teams without requiring custom implementation effort.

Each service integrates with our custom client, which subscribes to updates from the control plane for the services it depends on during connection setup. The client maintains a dynamic list of healthy endpoints, including metadata like zone or shard, and updates automatically as the control plane pushes changes.
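To illustrate the shape of this integration, here is a minimal sketch, assuming a hypothetical EndpointDiscovery interface and Endpoint type; the actual client is built on our internal Scala/Armeria framework and its real API differs.

```scala
// Hypothetical interface, not the actual Databricks client API: the control plane pushes
// endpoint updates, and the client keeps a live, in-memory view for the load balancer.
final case class Endpoint(host: String, port: Int, zone: String, healthy: Boolean)

trait EndpointDiscovery {
  // Registers a callback invoked whenever the control plane pushes a new endpoint set.
  def watch(service: String)(onUpdate: Seq[Endpoint] => Unit): Unit
}

// Keeps the most recent healthy endpoints in memory; the load balancer reads from here,
// so no DNS lookup or kube-proxy hop sits on the request path.
final class DynamicEndpointList(discovery: EndpointDiscovery, service: String) {
  @volatile private var current: Seq[Endpoint] = Seq.empty
  discovery.watch(service) { updated => current = updated.filter(_.healthy) }

  def healthyEndpoints: Seq[Endpoint] = current
}
```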
Because the client bypasses both DNS resolution and kube-proxy entirely, it always has a live, accurate view of service topology. This allows us to implement consistent and efficient load balancing strategies across all internal services.

Advanced Load Balancing in Clients

The RPC client performs request-aware load balancing using strategies such as:

* Power of Two Choices (P2C): For the majority of services, a simple Power of Two Choices (P2C) algorithm has proven remarkably effective. This strategy randomly selects two backend servers and chooses the one with fewer active connections or lower load (a simplified sketch appears at the end of this section). Databricks' experience indicates that P2C strikes a strong balance between performance and implementation simplicity, consistently leading to uniform traffic distribution across endpoints.
* Zone-affinity-based routing: The system also supports more advanced strategies, such as zone-affinity-based routing. This capability is vital for minimizing cross-zone network hops, which can significantly reduce network latency and associated data transfer costs, especially in geographically distributed Kubernetes clusters. The system also accounts for scenarios where a zone lacks sufficient capacity or becomes overloaded. In such cases, the routing algorithm intelligently spills traffic over to other healthy zones, balancing load while still preferring local affinity whenever possible. This ensures high availability and consistent performance, even under uneven capacity distribution across zones.
* Pluggable support: The architecture's flexibility allows for pluggable support for additional load balancing strategies as needed.

More advanced strategies, like zone-aware routing, required careful tuning and deeper context about service topology, traffic patterns, and failure modes: a topic we plan to explore in a dedicated follow-up post.

To ensure the effectiveness of our approach, we ran extensive simulations, experiments, and real-world metric analysis. We validated that load remained evenly distributed and that key metrics like tail latency, error rate, and cross-zone traffic cost stayed within target thresholds. The flexibility to adapt strategies per-service has been valuable, but in practice, keeping it simple (and consistent) has worked best.
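To make P2C concrete, here is a minimal sketch, assuming a hypothetical Endpoint type that carries an in-flight request counter; the production client tracks load differently and layers in health and zone signals.

```scala
import scala.util.Random

// Minimal Power of Two Choices (P2C) sketch. Endpoint and its inflight counter are
// hypothetical stand-ins, not the actual Databricks client types.
final case class Endpoint(host: String, port: Int, inflight: Int)

object PowerOfTwoChoices {
  // Pick two distinct endpoints at random and keep the less loaded one.
  def choose(endpoints: IndexedSeq[Endpoint], rng: Random = new Random): Option[Endpoint] =
    endpoints.length match {
      case 0 => None
      case 1 => Some(endpoints.head)
      case n =>
        val first  = rng.nextInt(n)
        var second = rng.nextInt(n - 1)
        if (second >= first) second += 1 // guarantee two distinct candidates
        Some(
          if (endpoints(first).inflight <= endpoints(second).inflight) endpoints(first)
          else endpoints(second)
        )
    }
}
```

Comparing only two random candidates avoids the herding that comes from always routing to the globally least-loaded endpoint, while still steering traffic away from hot pods.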
xDS Integration with Envoy

Our control plane extends its utility beyond internal service-to-service communication. It also plays a crucial role in managing external traffic by speaking the xDS API, the discovery protocol that lets clients such as Envoy fetch up-to-date configuration (clusters, endpoints, and routing rules) dynamically. Specifically, it implements the Endpoint Discovery Service (EDS) to provide Envoy with consistent and up-to-date metadata about backend endpoints by programming ClusterLoadAssignment resources. This ensures that gateway-level routing (e.g., for ingress or public-facing traffic) aligns with the same source of truth used by internal clients.

Summary

This architecture gives us fine-grained control over routing behavior while decoupling service discovery from the limitations of DNS and kube-proxy. The key takeaways are:

1. Clients always have a live, accurate view of endpoints and their health.
2. Load balancing strategies can be tailored per-service, improving efficiency and tail latency.
3. Both internal and external traffic share the same source of truth, ensuring consistency across the platform.

Impact

After deploying our client-side load balancing system, we observed significant improvements across both performance and efficiency:

* Uniform Request Distribution: Server-side QPS became evenly distributed across all backend pods. Unlike the prior setup, where some pods were overloaded while others remained underutilized, traffic now spreads predictably. [Figure: Per-pod QPS distribution before EDS (top) and after EDS (bottom).]
* Stable Latency Profiles: The variation in latency across pods dropped noticeably. Latency metrics improved and stabilized across pods, reducing long-tail behavior in gRPC workloads. [Figure: P90 latency became more stable after client-side load balancing was enabled.]
* Resource Efficiency: With more predictable latency and balanced load, we were able to reduce over-provisioned capacity. Across several services, this resulted in approximately a 20% reduction in pod count, freeing up compute resources without compromising reliability.

Challenges and Lessons Learned

While the rollout delivered clear benefits, we also uncovered several challenges and insights along the way:

* Server cold starts: Before client-side load balancing, most requests were sent over long-lived connections, so new pods were rarely hit until existing connections were recycled. After the shift, new pods began receiving traffic immediately, which surfaced cold-start issues where they handled requests before being fully warmed up. We addressed this by introducing slow-start ramp-up and biasing traffic away from pods with higher observed error rates (see the sketch after this list). These lessons also reinforced the need for a dedicated warmup framework.
* Metrics-based routing: We initially experimented with skewing traffic based on resource usage signals such as CPU. Although conceptually attractive, this approach proved unreliable: monitoring systems had different SLOs than serving workloads, and metrics like CPU were often trailing indicators rather than real-time signals of capacity. We ultimately moved away from this model and chose to rely on more dependable signals such as server health.
* Client-library integration: Building load balancing directly into client libraries brought strong performance benefits, but it also created some unavoidable gaps. Languages without the library, or traffic flows that still depend on infrastructure load balancers, remain outside the scope of client-side balancing.
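The slow-start piece of that fix can be illustrated with a small sketch, assuming a hypothetical linear ramp over a fixed warm-up window; the actual ramp-up curve and the error-rate bias are not shown here.

```scala
import java.time.{Duration, Instant}

// Minimal slow-start weighting sketch (hypothetical, not the production implementation).
// A freshly started pod begins with a small share of traffic and ramps up linearly
// until the warm-up window elapses, after which it receives its full share.
object SlowStart {
  val warmup: Duration    = Duration.ofSeconds(60)
  val floorWeight: Double = 0.1 // never zero, or the pod would never see traffic and never warm up

  def weight(podStartedAt: Instant, now: Instant = Instant.now()): Double = {
    val age = Duration.between(podStartedAt, now)
    if (age.compareTo(warmup) >= 0) 1.0
    else math.max(floorWeight, age.toMillis.toDouble / warmup.toMillis)
  }
}
```

A weighted selector would then scale each endpoint's base weight by this factor before choosing a target, so new pods absorb load gradually instead of all at once.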
Alternatives Considered

While developing our client-side load balancing approach, we evaluated several alternative solutions. Here's why we ultimately decided against them:

Headless Services

Kubernetes headless services (clusterIP: None) provide direct pod IPs via DNS, allowing clients and proxies (like Envoy) to perform their own load balancing. This approach bypasses the limitation of connection-based distribution in kube-proxy and enables the advanced load balancing strategies offered by Envoy (such as round robin, consistent hashing, and least request). In theory, switching existing ClusterIP services to headless services (or creating additional headless services using the same selector) would mitigate connection reuse issues by giving clients direct endpoint visibility. However, this approach comes with practical limitations:

* Lack of Endpoint Weights: Headless services alone don't support assigning weights to endpoints, restricting our ability to implement fine-grained load distribution control.
* DNS Caching and Staleness: Clients frequently cache DNS responses, causing them to send requests to stale or unhealthy endpoints.
* No Support for Metadata: DNS records do not carry any additional metadata about the endpoints (e.g., zone, region, shard). This makes it difficult or impossible to implement strategies like zone-aware or topology-aware routing.

Although headless services can offer a temporary improvement over ClusterIP services, these practical challenges made them unsuitable as a long-term solution at Databricks' scale.

Service Meshes (e.g., Istio)

Istio provides powerful Layer 7 load balancing features using Envoy sidecars injected into every pod. These proxies handle routing, retries, circuit breaking, and more, all managed centrally through a control plane. While this model offers many capabilities, we found it unsuitable for our environment at Databricks for a few reasons:

* Operational complexity: Managing thousands of sidecars and control plane components adds significant overhead, particularly during upgrades and large-scale rollouts.
* Performance overhead: Sidecars introduce additional CPU, memory, and latency costs per pod, which become substantial at our scale.
* Limited client flexibility: Since all routing logic is handled externally, it's difficult to implement request-aware strategies that rely on application-layer context.

We also evaluated Istio's Ambient Mesh. Since Databricks already had proprietary systems for functions like certificate distribution, and our routing patterns were relatively static, the added complexity of adopting a full mesh outweighed the benefits. This was especially true for a small infra team supporting a predominantly Scala codebase.

It is worth noting that one of the biggest advantages of sidecar-based meshes is language-agnosticism: teams can standardize resiliency and routing across polyglot services without maintaining client libraries everywhere. At Databricks, however, our environment is heavily Scala-based, and our monorepo plus fast CI/CD culture make the proxyless, client-library approach far more practical. Rather than introducing the operational burden of sidecars, we invested in building first-class load balancing directly into our libraries and infrastructure components.

Future Directions and Areas of Exploration

Our current client-side load balancing approach has significantly improved internal service-to-service communication. Yet, as Databricks continues to scale, we're exploring several advanced areas to further enhance our system:

Cross-Cluster and Cross-Region Load Balancing: As we manage thousands of Kubernetes clusters across multiple regions, extending intelligent load balancing beyond individual clusters is critical. We are exploring technologies like flat L3 networking and service-mesh solutions that integrate seamlessly with multi-region Endpoint Discovery Service (EDS) clusters. This will enable robust cross-cluster traffic management, fault tolerance, and globally efficient resource utilization.

Advanced Load Balancing Strategies for AI Use Cases: We plan to introduce more sophisticated strategies, such as weighted load balancing, to better support advanced AI workloads. These strategies will enable finer-grained resource allocation and intelligent routing decisions based on specific application characteristics, ultimately optimizing performance, resource consumption, and cost efficiency.
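As one illustration of what weighted load balancing could mean in practice, here is a minimal sketch of weighted random endpoint selection, assuming a hypothetical per-endpoint weight (derived, for example, from accelerator capacity or a slow-start factor); the strategies we ultimately ship for AI workloads may look quite different.

```scala
import scala.util.Random

// Minimal weighted random selection sketch (hypothetical, for illustration only).
final case class WeightedEndpoint(host: String, port: Int, weight: Double)

object WeightedPick {
  // Sample one endpoint with probability proportional to its weight.
  def choose(endpoints: Seq[WeightedEndpoint], rng: Random = new Random): Option[WeightedEndpoint] = {
    val total = endpoints.map(_.weight).sum
    if (endpoints.isEmpty || total <= 0.0) None
    else {
      var remaining = rng.nextDouble() * total
      endpoints.find { e => remaining -= e.weight; remaining <= 0.0 }
        .orElse(endpoints.lastOption) // guard against floating-point rounding
    }
  }
}
```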
If you're interested in working on large-scale distributed infrastructure challenges like this, we're hiring. Come build with us: explore open roles at Databricks!