2024-09-02 - exponential rate limiting
Following my previous post on rate limiting with GCRA, leaky buckets
without the buckets, I reviewed my old notes on rate limiting for
Exim. I thought I should do a new write-up of the ideas that I hope
will be more broadly interesting.
Exponential rate limiting uses an exponentially-weighted moving
average to measure the client's rate. It is motivated by a shift of
perspective:
* first measure the client's rate,
* then compare it to the limit.
Algorithms like GCRA and leaky bucket don't allow you to separate
these two points because they don't measure the client's rate as a
concrete number.
A moving average allows more flexible policy enforcement because the
rate measurement is meaningful even when you don't apply
back-pressure. For example, it's useful in a dry run mode, or when
diverting messages to a quarantine.
An exponential rate limiter stores, for each client:
* last update time
* average rate
This is a similar amount of space to a leaky bucket. GCRA uses less
space because it only needs to store a time.
The main disadvantage is that an exponential rate limiter needs
fairly complicated floating point arithmetic.
* configuration parameters
* algorithm
* behaviour
* enforcement
* rationale
+ very slow clients
+ varying intervals
+ penalty time
+ fast bursts
* discussion
configuration parameters
A rate limiter can have three configuration parameters:
* a maximum rate
* a burst size that allows a sender to briefly exceed the rate
limit
* an averaging period that determines how quickly past behaviour is
forgotten
In a linear rate limiter like GCRA or leaky bucket, the period is
fixed as burst / rate owing to the way the model works.
An exponential rate limiter has two parameters:
* a limit which is also the maximum burst size
* an averaging period
The maximum rate is limit / period.
For example, I might set limit = 600 requests per period = 1 hour. If
I want to allow the same long-term average rate, but with a smaller
burst size, I might set limit = 10 requests per period = 1 minute.
Deriving the max rate from the other two parameters makes the
algorithm easy to configure, and it turns out to simplify the
mathematical model very nicely.
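As a quick check of the arithmetic, the two example configurations above work out to the same long-term rate (a tiny Python sketch, using only the numbers from this section):

    # limit 600 per 1 hour versus limit 10 per 1 minute:
    # same long-term average rate, very different burst sizes.
    print(600 / 3600)   # 0.1666... requests per second
    print(10 / 60)      # 0.1666... requests per second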
algorithm
* each client has a stored update time and rate
* a request has a cost, which is typically 1 for fixed-cost
requests, or (for example) the request size in bytes when
limiting bandwidth
* when a request arrives, get the client's details
t_prev = client ? client.time : 0
r_prev = client ? client.rate : 0
* calculate the interval since the previous request, relative to
the averaging period
interval = (t_now - t_prev) / period
* clamp the interval to avoid division by zero
interval = max(interval, 1.0e-10)
* the exponential smoothing weight is explained below
alpha = exp(-interval)
* the instantaneous rate, measured in cost per period
r_inst = cost / interval
* the updated average rate is
r_now = (1 - alpha) * r_inst + alpha * r_prev
* ensure rare requests are counted in full
r_now = max(r_now, cost)
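Put together, the steps above translate fairly directly into code. Here is a minimal Python sketch of the rate update; the function name, defaults, and client record are illustrative assumptions, not taken from any particular implementation:

    import math
    import time

    def update_rate(client, cost=1.0, period=60.0, now=None):
        # Returns the client's new average rate, measured in cost per period.
        # `client` is any object with .time and .rate attributes,
        # or None for a client we have not seen before.
        now = time.time() if now is None else now
        t_prev = client.time if client else 0.0
        r_prev = client.rate if client else 0.0
        # interval since the previous request, relative to the averaging period
        interval = (now - t_prev) / period
        # clamp the interval to avoid division by zero
        interval = max(interval, 1.0e-10)
        # exponential smoothing weight
        alpha = math.exp(-interval)
        # instantaneous rate, in cost per period
        r_inst = cost / interval
        # exponentially weighted moving average
        r_now = (1 - alpha) * r_inst + alpha * r_prev
        # ensure rare requests are counted in full
        return max(r_now, cost)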
behaviour
When a client starts making requests very fast, its average rate
(r_prev and r_now) increases by close to the cost each time, so it
will hit the limit after close to limit / cost requests.
When the client's rate is more modest, or closer to its measured
average, the average changes by a smaller amount for each request.
When the client slows down, its measured rate decays exponentially
towards the new level.
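To see this concretely, here is a small simulation built on the update_rate() sketch from the algorithm section: a client sends requests 10 ms apart against limit = 10 per period = 1 minute, and its measured rate climbs by roughly 1 per request until it crosses the limit. The SimpleNamespace record is just an illustrative stand-in for whatever per-client storage you use.

    from types import SimpleNamespace

    limit, period = 10.0, 60.0
    client, t = None, 0.0
    for i in range(12):
        # measure the rate for this request, ignoring enforcement policy
        r = update_rate(client, cost=1.0, period=period, now=t)
        client = SimpleNamespace(time=t, rate=r)
        print(f"request {i + 1:2d}: rate {r:6.3f} {'over limit' if r > limit else ''}")
        t += 0.01   # 10 ms between requests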
enforcement
When a client exceeds its limit, how long must it wait before it can
try again and its request will be allowed?
t_next = t_now + period * ln(r_now / limit)
The decision to allow or deny a request is separate from calculating
the client's average rate. It will typically look like,
if r_now > limit:
    return DENY(t_next)
client.time = t_now
client.rate = r_now
return ALLOW
This implements a "leaky" policy that measures the rate of requests
that are allowed, without increasing the client's rate for requests
that are denied. This is usually the right policy when DENY causes
backpressure and clients are expected to retry denied requests.
You can implement a "strict" policy by updating the client's stored
rate for both denied and allowed requests. A "strict" policy is often
appropriate when there is no backpressure. I used it when
quarantining email messages from end-users whose accounts might have
been compromised to send spam, or who might have been sending a
quarterly newsletter.
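As a sketch of both policies together, here is a hypothetical check_request() built on the update_rate() sketch above; store stands in for whatever persistent per-client table you use, and the strict flag chooses whether denied requests still count towards the measured rate:

    import math
    import time
    from types import SimpleNamespace

    def check_request(store, key, cost=1.0, limit=10.0, period=60.0,
                      strict=False, now=None):
        now = time.time() if now is None else now
        client = store.get(key)
        r_now = update_rate(client, cost=cost, period=period, now=now)
        if r_now > limit:
            # earliest time at which a request could be allowed again
            t_next = now + period * math.log(r_now / limit)
            if strict:
                # "strict" policy: denied requests also update the stored rate
                store[key] = SimpleNamespace(time=now, rate=r_now)
            return "DENY", t_next
        # "leaky" policy: only allowed requests update the stored rate
        store[key] = SimpleNamespace(time=now, rate=r_now)
        return "ALLOW", None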
rationale
The next few subsections explain how the algorithm works in more
detail. You don't need to read them to successfully use exponential
rate limiting.
very slow clients
When a client returns after a long gap, the interval is very large,
which means alpha is small, and r_inst is small. As a result r_now
becomes very small.
This is unhelpful in practice: it effectively means the client's
first request is not counted. A more useful way to handle an isolated
request is to say its rate is the cost of the request per the period.
That way it gets treated like the first request of a fast burst.
The algorithm implements this logic by ensuring that the average rate
is at least as large as the cost of the current request.
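For instance, a single request of cost 1 arriving ten periods after the previous one would, without the final clamp, be measured at only about a tenth of a request per period. A quick illustration of the formulas above, with an arbitrary old rate:

    import math

    interval = 10.0                      # ten averaging periods since the last request
    alpha = math.exp(-interval)          # ~4.5e-5: the old rate is almost entirely forgotten
    r_inst = 1.0 / interval              # 0.1 cost per period
    r_prev = 5.0                         # whatever the old rate was
    r_now = (1 - alpha) * r_inst + alpha * r_prev
    print(r_now)                         # ~0.1, so the request barely registers
    print(max(r_now, 1.0))               # the clamp counts the isolated request in full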
varying intervals
Where does the exponential smoothing weight exp(-interval) come from
in the algorithm above?
We are using the usual formula for an exponentially weighted moving
average,
r_now = (1 - alpha) * r_inst + alpha * r_prev
Moving averages are commonly calculated over fixed-size intervals, so
typically alpha is also fixed. The subexpression alpha * r_prev says
how much to forget past behaviour. Each time a fixed-size interval
passes, the old rate gets multiplied by alpha again: that is, the
forgetfulness scales exponentially with time.
In our scenario, we want to update our measurement of the rate of
requests each time a request occurs, at irregular and unpredictable
intervals. So our alpha must vary exponentially with the interval. We
derive it using the time since the previous request as a power of
some base.
t_delta = t_now - t_prev
alpha = pow(base, t_delta)
We set the base using the configured averaging period. I previously
said somewhat vaguely that the period determines how quickly past
behaviour is forgotten. In an exponential rate limiter it is the time
for 63% forgetfulness.
pow(base, period) == exp(-1.0)
exp(period * ln(base)) == exp(-1.0)
ln(base) == -1.0 / period
pow(base, t_delta) == exp(t_delta * ln(base))
Therefore,
alpha = exp(-t_delta / period)
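A quick numeric check of that 63% figure, using the formula above: after a gap of exactly one period with no requests, the old rate is multiplied by exp(-1), so roughly 37% of it remains.

    import math

    period = 60.0
    t_delta = period                     # a gap of exactly one averaging period
    alpha = math.exp(-t_delta / period)  # exp(-1)
    print(alpha)                         # ~0.368 of the old rate remains
    print(1 - alpha)                     # ~0.632 is forgotten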
penalty time
When a client exceeds its limit, it must wait for some time doing
nothing before its request will be allowed. The wait is derived as
follows:
limit == (1 - alpha) * 0 + alpha * r_now
limit / r_now == exp(-wait)
ln(limit / r_now) == -wait
wait = ln(r_now / limit)
This wait is relative to the averaging period, so it gets multiplied
by the period to calculate the next permitted time.
t_next = t_now + period * wait
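For example, with the illustrative numbers below, a client measured at twice its limit has to wait ln(2) ≈ 0.69 of a period, which is about 42 seconds when the period is a minute:

    import math

    limit, period = 10.0, 60.0
    r_now = 20.0                         # measured at twice the limit
    wait = math.log(r_now / limit)       # ~0.693 periods
    print(wait, period * wait)           # ~0.693, ~41.6 seconds until the next allowed request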
fast bursts
Basing the forgetfulness on e seems somewhat arbitrary: why not make
the forgetfulness 50% (a half life) or 90% instead of 63%?
Another seemingly arbitrary choice is to measure rates in cost per
period instead of per second.
It turns out that these choices fit together neatly so that fast
requests are counted at their full cost, so a client will hit its
limit when expected.
The updated rate is calculated as
r_now = (1 - alpha) * r_inst + alpha * r_prev
When the interval is very small, alpha is very nearly 1.0, and as a
result the calculation turns out to be counting approximately
linearly towards the limit
r_now ≈ cost + r_prev
The second subexpression is obvious but the first one is surprising!
Let's unpack it.
(1 - alpha) * r_inst
== (1 - exp(-interval)) * cost / interval
Factor out the cost; the surprise is that
1 ≈ (1 - exp(-interval)) / interval
This property comes from the fact that the gradient of e^x is 1 when
x is 0. To show why this is so, I need some basic calculus:
y ≡ f(x) ≡ e^x
dy ≡ f(x + dx) - f(x)
dx ≡ -interval
So, when x is zero and dx is small,
(1 - e^dx) / -dx
= (e^dx - 1) / dx
= (e^(0 + dx) - e^0) / dx
= ( f(0 + dx) - f(0) ) / dx
= dy / dx
≈ e^x     (as dx shrinks, the quotient approaches the gradient of e^x)
= 1       (when x is 0)
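A numeric check of the approximation shows it getting better as the interval shrinks:

    import math

    for interval in (1.0, 0.1, 0.01, 0.001):
        print(interval, (1 - math.exp(-interval)) / interval)
    # 1.0    0.632...
    # 0.1    0.952...
    # 0.01   0.995...
    # 0.001  0.9995...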
discussion
I originally developed this algorithm in 2005-2006 to throttle
outgoing email spam floods. It turned out to be easy to use and
produced results that made sense. (More so after I added the slow
client tweak!)
I have gone into some detail to explain how the algorithm is derived
and how it behaves. It was years before I understood some of the
mathematics properly, because I accidentally landed on a sweet spot
through some combination of luck and applied mathematical superstition:
1/e is more beautiful than 1/2 or 1/10, right?
I don't know if I reinvented exponential rate limiting, or if there
are other similar algorithms out there. When I was working on it I
was not able to find much literature on exponentially weighted moving
averages with varying intervals, so I laboriously worked it out for
myself.
I would love to hear about any other uses of exponential rate
limiting!
---------------------------------------------------------------------
Tony Finch