2024-09-02 - exponential rate limiting
Following my previous post on rate limiting with GCRA, leaky buckets
without the buckets, I reviewed my old notes on rate limiting for
Exim. I thought I should do a new write-up of the ideas that I hope
will be more broadly interesting.
Exponential rate limiting uses an exponentially-weighted moving
average to measure the client's rate. It is motivated by a shift of
perspective:
* first measure the client's rate,
* then compare it to the limit.
Algorithms like GCRA and leaky bucket don't allow you to separate
these two points because they don't measure the client's rate as a
concrete number.
A moving average allows more flexible policy enforcement because the
rate measurement is meaningful even when you don't apply
back-pressure. For example, it's useful in a dry run mode, or when
diverting messages to a quarantine.
An exponential rate limiter stores, for each client:
* last update time
* average rate
This is a similar amount of space to a leaky bucket. GCRA uses less
space because it only needs to store a time.
The main disadvantage is that an exponential rate limiter needs
fairly complicated floating point arithmetic.
* configuration parameters
* algorithm
* behaviour
* enforcement
* rationale
+ very slow clients
+ varying intervals
+ penalty time
+ fast bursts
* discussion
configuration parameters
A rate limiter can have three configuration parameters:
* a maximum rate
* a burst size that allows a sender to briefly exceed the rate
limit
* an averaging period that determines how quickly past behaviour is
forgotten
In a linear rate limiter like GCRA or leaky bucket, the period is
fixed as burst / rate owing to the way the model works.
An exponential rate limiter has two parameters:
* a limit which is also the maximum burst size
* an averaging period
The maximum rate is limit / period.
For example, I might set limit = 600 requests per period = 1 hour. If
I want to allow the same long-term average rate, but with a smaller
burst size, I might set limit = 10 requests per period = 1 minute.
Deriving the max rate from the other two parameters makes the
algorithm easy to configure, and it turns out to simplify the
mathematical model very nicely.
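As a quick check of the arithmetic, the two example configurations above work out to the same long-term rate (a tiny Python sketch, using only the numbers from this section):

    # limit 600 per 1 hour versus limit 10 per 1 minute:
    # same long-term average rate, very different burst sizes.
    print(600 / 3600)   # 0.1666... requests per second
    print(10 / 60)      # 0.1666... requests per second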
algorithm
* each client has a stored update time and rate
* a request has a cost, which is typically 1 for fixed-cost
requests, or (for example) the request size in bytes when
limiting bandwidth
* when a request arrives, get the client's details
t_prev = client ? client.time : 0
r_prev = client ? client.rate : 0
* calculate the interval since the previous request, relative to
the averaging period
interval = (t_now - t_prev) / period
* clamp the interval to avoid division by zero
interval = max(interval, 1.0e-10)
* the exponential smoothing weight is explained below
alpha = exp(-interval)
* the instantaneous rate, measured in cost per period
r_inst = cost / interval
* the updated average rate is
r_now = (1 - alpha) * r_inst + alpha * r_prev
* ensure rare requests are counted in full
r_now = max(r_now, cost)
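Put together, the steps above translate fairly directly into code. Here is a minimal Python sketch of the rate update; the function name, defaults, and client record are illustrative assumptions, not taken from any particular implementation:

    import math
    import time

    def update_rate(client, cost=1.0, period=60.0, now=None):
        # Returns the client's new average rate, measured in cost per period.
        # `client` is any object with .time and .rate attributes,
        # or None for a client we have not seen before.
        now = time.time() if now is None else now
        t_prev = client.time if client else 0.0
        r_prev = client.rate if client else 0.0
        # interval since the previous request, relative to the averaging period
        interval = (now - t_prev) / period
        # clamp the interval to avoid division by zero
        interval = max(interval, 1.0e-10)
        # exponential smoothing weight
        alpha = math.exp(-interval)
        # instantaneous rate, in cost per period
        r_inst = cost / interval
        # exponentially weighted moving average
        r_now = (1 - alpha) * r_inst + alpha * r_prev
        # ensure rare requests are counted in full
        return max(r_now, cost)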
behaviour
When a client starts making requests very fast, its average rate
(r_prev and r_now) increases by close to the cost each time, so it
will hit the limit after close to limit / cost requests.
When the client's rate is more modest, or closer to its measured
average, the average changes by a smaller amount for each request.
When the client slows down, its measured rate decays exponentially
towards the new level.
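To see this concretely, here is a small simulation built on the update_rate() sketch from the algorithm section: a client sends requests 10 ms apart against limit = 10 per period = 1 minute, and its measured rate climbs by roughly 1 per request until it crosses the limit. The SimpleNamespace record is just an illustrative stand-in for whatever per-client storage you use.

    from types import SimpleNamespace

    limit, period = 10.0, 60.0
    client, t = None, 0.0
    for i in range(12):
        # measure the rate for this request, ignoring enforcement policy
        r = update_rate(client, cost=1.0, period=period, now=t)
        client = SimpleNamespace(time=t, rate=r)
        print(f"request {i + 1:2d}: rate {r:6.3f} {'over limit' if r > limit else ''}")
        t += 0.01   # 10 ms between requests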
enforcement
When a client exceeds its limit, how long must it wait before it can
try again and its request will be allowed?
t_next = t_now + period * ln(r_now / limit)
The decision to allow or deny a request is separate from calculating
the client's average rate. It will typically look like,
if r_now > limit:
    return DENY(t_next)
client.time = t_now
client.rate = r_now
return ALLOW
This implements a "leaky" policy that measures the rate of requests
that are allowed, without increasing the client's rate for requests
that are denied. This is usually the right policy when DENY causes
backpressure and clients are expected to retry denied requests.
You can implement a "strict" policy by updating the client's stored
rate for both denied and allowed requests. A "strict" policy is often
appropriate when there is no backpressure. I used it when
quarantining email messages from end-users whose accounts might have
been compromised to send spam, or who might have been sending a
quarterly newsletter.
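As a sketch of both policies together, here is a hypothetical check_request() built on the update_rate() sketch above; store stands in for whatever persistent per-client table you use, and the strict flag chooses whether denied requests still count towards the measured rate:

    import math
    import time
    from types import SimpleNamespace

    def check_request(store, key, cost=1.0, limit=10.0, period=60.0,
                      strict=False, now=None):
        now = time.time() if now is None else now
        client = store.get(key)
        r_now = update_rate(client, cost=cost, period=period, now=now)
        if r_now > limit:
            # earliest time at which a request could be allowed again
            t_next = now + period * math.log(r_now / limit)
            if strict:
                # "strict" policy: denied requests also update the stored rate
                store[key] = SimpleNamespace(time=now, rate=r_now)
            return "DENY", t_next
        # "leaky" policy: only allowed requests update the stored rate
        store[key] = SimpleNamespace(time=now, rate=r_now)
        return "ALLOW", None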
rationale
The next few subsections explain how the algorithm works in more
detail. You don't need to read them to successfully use exponential
rate limiting.
very slow clients
When a client returns after a long gap, the interval is very large,
which means alpha is small, and r_inst is small. As a result r_now
becomes very small.
This is unhelpful in practice: it effectively means the client's
first request is not counted. A more useful way to handle an isolated
request is to say its rate is the cost of the request per the period.
That way it gets treated like the first request of a fast burst.
The algorithm implements this logic by ensuring that the average rate
is at least as large as the cost of the current request.
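For instance, a single request of cost 1 arriving ten periods after the previous one would, without the final clamp, be measured at only about a tenth of a request per period. A quick illustration of the formulas above, with an arbitrary old rate:

    import math

    interval = 10.0                      # ten averaging periods since the last request
    alpha = math.exp(-interval)          # ~4.5e-5: the old rate is almost entirely forgotten
    r_inst = 1.0 / interval              # 0.1 cost per period
    r_prev = 5.0                         # whatever the old rate was
    r_now = (1 - alpha) * r_inst + alpha * r_prev
    print(r_now)                         # ~0.1, so the request barely registers
    print(max(r_now, 1.0))               # the clamp counts the isolated request in full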
varying intervals
Where does the exponential smoothing weight exp(-interval) come from
in the algorithm above?
We are using the usual formula for an exponentially weighted moving
average,
r_now = (1 - alpha) * r_inst + alpha * r_prev
Moving averages are commonly calculated over fixed-size intervals, so
typically alpha is also fixed. The subexpression alpha * r_prev says
how much to forget past behaviour. Each time a fixed-size interval
passes, the old rate gets multiplied by alpha again: that is, the
forgetfulness scales exponentially with time.
In our scenario, we want to update our measurement of the rate of
requests each time a request occurs, at irregular and unpredictable
intervals. So our alpha must vary exponentially with the interval. We
derive it using the time since the previous request as a power of
some base.
t_delta = t_now - t_prev
alpha = pow(base, t_delta)
We set the base using the configured averaging period. I previously
said somewhat vaguely that the period determines how quickly past
behaviour is forgotten. In an exponential rate limiter it is the time
for 63% forgetfulness.
pow(base, period) == exp(-1.0)
exp(period * ln(base)) == exp(-1.0)
ln(base) == -1.0 / period
pow(base, t_delta) == exp(t_delta * ln(base))
Therefore,
alpha = exp(-t_delta / period)
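A quick numeric check of that 63% figure, using the formula above: after a gap of exactly one period with no requests, the old rate is multiplied by exp(-1), so roughly 37% of it remains.

    import math

    period = 60.0
    t_delta = period                     # a gap of exactly one averaging period
    alpha = math.exp(-t_delta / period)  # exp(-1)
    print(alpha)                         # ~0.368 of the old rate remains
    print(1 - alpha)                     # ~0.632 is forgotten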
penalty time
When a client exceeds its limit, it must wait for some time doing
nothing before its request will be allowed. The wait is derived as
follows:
limit == (1 - alpha) * 0 + alpha * r_now
limit / r_now == exp(-wait)
ln(limit / r_now) == -wait
wait = ln(r_now / limit)
This wait is relative to the averaging period, so it gets multiplied
by the period to calculate the next permitted time.
t_next = t_now + period * wait
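For example, with the illustrative numbers below, a client measured at twice its limit has to wait ln(2) ≈ 0.69 of a period, which is about 42 seconds when the period is a minute:

    import math

    limit, period = 10.0, 60.0
    r_now = 20.0                         # measured at twice the limit
    wait = math.log(r_now / limit)       # ~0.693 periods
    print(wait, period * wait)           # ~0.693, ~41.6 seconds until the next allowed request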
fast bursts
Basing the forgetfulness on e seems somewhat arbitrary: why not make
the forgetfulness 50% (a half life) or 90% instead of 63%?
Another seemingly arbitrary choice is to measure rates in cost per
period instead of per second.
It turns out that these choices fit together neatly so that fast
requests are counted at their full cost, so a client will hit its
limit when expected.
The updated rate is calculated as
r_now = (1 - alpha) * r_inst + alpha * r_prev
When the interval is very small, alpha is very nearly 1.0, and as a
result the calculation turns out to be counting approximately
linearly towards the limit
r_now ≈ cost + r_prev
The second subexpression is obvious but the first one is surprising!
Let's unpack it.
(1 - alpha) * r_inst
== (1 - exp(-interval)) * cost / interval
Factor out the cost; the surprise is that
1 ≈ (1 - exp(-interval)) / interval
This property comes from the fact that the gradient of e^x is 1 when
x is 0. To show why this is so, I need some basic calculus:
y ≡ f(x) ≡ e^x
dy ≡ f(x + dx) - f(x)
dx ≡ -interval
So, when x is zero and dx is small,
(1 - e^dx) / -dx
= (e^dx - 1) / dx
= (e^(0 + dx) - e^0) / dx
= ( f(0 + dx) - f(0) ) / dx
= dy / dx
≈ e^x     (as dx shrinks, the quotient approaches the gradient of e^x)
= 1       (when x is 0)
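A numeric check of the approximation shows it getting better as the interval shrinks:

    import math

    for interval in (1.0, 0.1, 0.01, 0.001):
        print(interval, (1 - math.exp(-interval)) / interval)
    # 1.0    0.632...
    # 0.1    0.952...
    # 0.01   0.995...
    # 0.001  0.9995...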
discussion
I originally developed this algorithm in 2005-2006 to throttle
outgoing email spam floods. It turned out to be easy to use and
produced results that made sense. (More so after I added the slow
client tweak!)
I have gone into some detail to explain how the algorithm is derived
and how it behaves. It was years before I understood some of the
mathematics properly, because I accidentally landed on a sweet spot
through some combination of luck and applied mathematical superstition:
1/e is more beautiful than 1/2 or 1/10, right?
I don't know if I reinvented exponential rate limiting, or if there
are other similar algorithms out there. When I was working on it I
was not able to find much literature on exponentially weighted moving
averages with varying intervals, so I laboriously worked it out for
myself.
I would love to hear about any other uses of exponential rate
limiting!
---------------------------------------------------------------------
Tony Finch