In a network system, a rate limiter is used to control the rate of traffic sent by a client or a service. In the HTTP world, a rate limiter limits the number of client requests allowed to be sent over a specified period. If the API request count exceeds the threshold defined by the rate limiter, all the excess calls are blocked. Here are a few examples:
• A user can write no more than 2 posts per second.
• You can create a maximum of 10 accounts per day from the same IP address.
• You can claim rewards no more than 5 times per week from the same device.
In this chapter, you are asked to design a rate limiter. Before starting the design, we first look at the benefits of using an API rate limiter:
• Prevent resource starvation caused by Denial of Service (DoS) attacks [1]. Almost all APIs published by large tech companies enforce some form of rate limiting. For example, Twitter limits the number of tweets to 300 per 3 hours [2]. Google Docs APIs have the following default limit: 300 read requests per user per 60 seconds [3]. A rate limiter prevents DoS attacks, either intentional or unintentional, by blocking the excess calls.
• Reduce cost. Limiting excess requests means fewer servers and allocating more resources to high-priority APIs. Rate limiting is extremely important for companies that use paid third-party APIs. For example, you are charged on a per-call basis for the following external APIs: check credit, make a payment, retrieve health records, etc. Limiting the number of calls is essential to reduce costs.
• Prevent servers from being overloaded. To reduce server load, a rate limiter is used to filter out excess requests caused by bots or users' misbehavior.
Rate limiting can be implemented using different algorithms, each with its pros and cons. The interactions between an interviewer and a candidate help to clarify the type of rate limiter we are trying to build.

Candidate: What kind of rate limiter are we going to design? Is it a client-side rate limiter or server-side API rate limiter?
Interviewer: Great question. We focus on the server-side API rate limiter.
Candidate: Does the rate limiter throttle API requests based on IP, the user ID, or other properties?
Interviewer: The rate limiter should be flexible enough to support different sets of throttle rules.
Candidate: What is the scale of the system? Is it built for a startup or a big company with a large user base?
Interviewer: The system must be able to handle a large number of requests.
Candidate: Will the system work in a distributed environment?
Interviewer: Yes.
Candidate: Is the rate limiter a separate service or should it be implemented in application code?
Interviewer: It is a design decision up to you.
Candidate: Do we need to inform users who are throttled?
Interviewer: Yes.
Here is a summary of the requirements for the system:
• Accurately limit excessive requests.
• Low latency. The rate limiter should not slow down HTTP response time.
• Use as little memory as possible.
• Distributed rate limiting. The rate limiter can be shared across multiple servers or processes.
• Exception handling. Show clear exceptions to users when their requests are throttled.
• High fault tolerance. If there are any problems with the rate limiter (for example, a cache server goes offline), it does not affect the entire system.
Let us keep things simple and use a basic client and server model for communication.
Intuitively, you can implement a rate limiter at either the client or server-side.
• Client-side implementation. Generally speaking, the client is an unreliable place to enforce rate limiting because client requests can easily be forged by malicious actors. Moreover, we might not have control over the client implementation.
• Server-side implementation. Figure 4-1 shows a rate limiter that is placed on the server-side.
Besides the client and server-side implementations, there is an alternative way. Instead of putting a rate limiter at the API servers, we create a rate limiter middleware, which throttles requests to your APIs as shown in Figure 4-2.
Let us use an example in Figure 4-3 to illustrate how rate limiting works in this design. Assume our API allows 2 requests per second, and a client sends 3 requests to the server within a second. The first two requests are routed to API servers. However, the rate limiter middleware throttles the third request and returns HTTP status code 429. The HTTP 429 response status code indicates a user has sent too many requests.
Cloud microservices [4] have become widely popular, and rate limiting is usually implemented within a component called the API gateway. The API gateway is a fully managed service that supports rate limiting, SSL termination, authentication, IP whitelisting, serving static content, etc. For now, we only need to know that the API gateway is a middleware that supports rate limiting.
While designing a rate limiter, an important question to ask ourselves is: where should the rate limiter be implemented, on the server-side or in a gateway? There is no absolute answer. It depends on your company's current technology stack, engineering resources, priorities, goals, etc. Here are a few general guidelines:
• Evaluate your current technology stack, such as programming language, cache service, etc. Make sure your current programming language is efficient to implement rate limiting on the server-side.
• Identify the rate limiting algorithm that fits your business needs. When you implement everything on the server-side, you have full control of the algorithm. However, your choice might be limited if you use a third-party gateway.
• If you have already used microservice architecture and included an API gateway in the design to perform authentication, IP whitelisting, etc., you may add a rate limiter to the API gateway.
• Building your own rate limiting service takes time. If you do not have enough engineering resources to implement a rate limiter, a commercial API gateway is a better option.
Rate limiting can be implemented using different algorithms, and each of them has distinct pros and cons. Even though this chapter does not focus on algorithms, understanding them at a high level helps to choose the right algorithm or combination of algorithms to fit our use cases. Here is a list of popular algorithms:
• Token bucket
• Leaking bucket
• Fixed window counter
• Sliding window log
• Sliding window counter
The token bucket algorithm is widely used for rate limiting. It is simple, well understood and commonly used by internet companies. Both Amazon [5] and Stripe [6] use this algorithm to throttle their API requests. The token bucket algorithm works as follows:
• A token bucket is a container that has a pre-defined capacity. Tokens are put in the bucket at preset rates periodically. Once the bucket is full, no more tokens are added. As shown in Figure 4-4, the token bucket capacity is 4. The refiller puts 2 tokens into the bucket every second. Once the bucket is full, extra tokens will overflow.
• Each request consumes one token. When a request arrives, we check if there are enough tokens in the bucket. Figure 4-5 explains how it works.
  • If there are enough tokens, we take one token out for each request, and the request goes through.
  • If there are not enough tokens, the request is dropped.
Figure 4-6 illustrates how token consumption, refill, and rate limiting logic work. In this example, the token bucket size is 4, and the refill rate is 4 per minute.
The token bucket algorithm takes two parameters:
• Bucket size: the maximum number of tokens allowed in the bucket.
• Refill rate: the number of tokens put into the bucket every second.

How many buckets do we need? This varies, and it depends on the rate-limiting rules. Here are a few examples.
• It is usually necessary to have different buckets for different API endpoints. For instance, if a user is allowed to make 1 post per second, add 150 friends per day, and like 5 posts per second, 3 buckets are required for each user.
• If we need to throttle requests based on IP addresses, each IP address requires a bucket.
• If the system allows a maximum of 10,000 requests per second, it makes sense to have a global bucket shared by all requests.

Pros:
• The algorithm is easy to implement.
• Memory efficient.
• Token bucket allows a burst of traffic for short periods. A request can go through as long as there are tokens left.

Cons:
• The two parameters in the algorithm are bucket size and token refill rate. However, it might be challenging to tune them properly.
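To make the mechanics concrete, here is a minimal single-process sketch of the token bucket algorithm in Python. The class and parameter names are illustrative, not from any particular library:

```python
import time

class TokenBucket:
    def __init__(self, bucket_size: int, refill_rate: float):
        self.bucket_size = bucket_size       # maximum tokens the bucket can hold
        self.refill_rate = refill_rate       # tokens added per second
        self.tokens = float(bucket_size)     # start with a full bucket
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Lazily refill based on elapsed time, capped at the bucket size.
        self.tokens = min(self.bucket_size,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                 # each request consumes one token
            return True
        return False                         # not enough tokens: drop the request

# Example mirroring Figure 4-4: capacity 4, refilled at 2 tokens per second.
limiter = TokenBucket(bucket_size=4, refill_rate=2)
```

Because a full bucket can be drained instantly, this is where the short-burst behavior listed under the pros comes from.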
The leaking bucket algorithm is similar to the token bucket except that requests are processed at a fixed rate. It is usually implemented with a first-in-first-out (FIFO) queue. The algorithm works as follows:
• When a request arrives, the system checks if the queue is full. If it is not full, the request is added to the queue.
• Otherwise, the request is dropped.
• Requests are pulled from the queue and processed at regular intervals.

Figure 4-7 explains how the algorithm works.
The leaking bucket algorithm takes the following two parameters:
• Bucket size: it is equal to the queue size. The queue holds the requests to be processed at a fixed rate.
• Outflow rate: it defines how many requests can be processed at a fixed rate, usually in seconds.

Shopify, an e-commerce company, uses leaky buckets for rate limiting [7].

Pros:
• Memory efficient given the limited queue size.
• Requests are processed at a fixed rate; therefore, it is suitable for use cases where a stable outflow rate is needed.

Cons:
• A burst of traffic fills up the queue with old requests, and if they are not processed in time, recent requests will be rate limited.
• There are two parameters in the algorithm. It might not be easy to tune them properly.
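A minimal sketch of the leaking bucket in Python, assuming a single process; `handle` is a hypothetical stand-in for the real request handler:

```python
import time
from collections import deque

def handle(request):
    print("processing", request)         # stand-in for forwarding to an API server

class LeakyBucket:
    def __init__(self, bucket_size: int, outflow_rate: float):
        self.queue = deque()             # FIFO queue of pending requests
        self.bucket_size = bucket_size   # maximum queue length
        self.outflow_rate = outflow_rate # requests processed per second

    def add_request(self, request) -> bool:
        if len(self.queue) < self.bucket_size:
            self.queue.append(request)   # room left: enqueue the request
            return True
        return False                     # queue full: drop the request

    def run_worker(self):
        # Pulls requests off the queue at the fixed outflow rate.
        while True:
            if self.queue:
                handle(self.queue.popleft())
            time.sleep(1 / self.outflow_rate)
```

Note how the fixed-rate worker loop, not the arrival rate, decides when requests are processed; that is what makes the outflow stable.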
The fixed window counter algorithm works as follows:
• The algorithm divides the timeline into fixed-sized time windows and assigns a counter to each window.
• Each request increments the counter by one.
• Once the counter reaches the pre-defined threshold, new requests are dropped until a new time window starts.

Let us use a concrete example to see how it works. In Figure 4-8, the time unit is 1 second, and the system allows a maximum of 3 requests per second. In each one-second window, if more than 3 requests are received, the extra requests are dropped, as shown in Figure 4-8.
A major problem with this algorithm is that a burst of traffic at the edges of time windows could cause more requests than allowed quota to go through. Consider the following case:
In Figure 4-9, the system allows a maximum of 5 requests per minute, and the available quota resets at the human-friendly round minute. As seen, there are five requests between 2:00:00 and 2:01:00 and five more requests between 2:01:00 and 2:02:00. For the one-minute window between 2:00:30 and 2:01:30, 10 requests go through. That is twice as many as allowed requests.

Pros:
• Memory efficient.
• Easy to understand.
• Resetting available quota at the end of a unit time window fits certain use cases.

Cons:
• Spike in traffic at the edges of a window could cause more requests than the allowed quota to go through.
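A minimal Python sketch of the fixed window counter; names are illustrative:

```python
import time
from collections import defaultdict

class FixedWindowCounter:
    def __init__(self, window_seconds: int, max_requests: int):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self.counters = defaultdict(int)   # window index -> request count

    def allow_request(self) -> bool:
        # Bucket the current time into a fixed window (stale entries could be
        # evicted periodically; omitted here for brevity).
        window = int(time.time()) // self.window_seconds
        if self.counters[window] < self.max_requests:
            self.counters[window] += 1
            return True
        return False   # threshold reached: drop until the next window starts

# Example mirroring Figure 4-8: at most 3 requests per 1-second window.
limiter = FixedWindowCounter(window_seconds=1, max_requests=3)
```

The edge-burst problem is visible in the code: the counter resets abruptly at each window boundary, so bursts on either side of a boundary both pass.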
As discussed previously, the fixed window counter algorithm has a major issue: it allows more requests to go through at the edges of a window. The sliding window log algorithm fixes the issue. It works as follows:
• The algorithm keeps track of request timestamps. Timestamp data is usually kept in cache, such as Redis sorted sets [8].
• When a new request comes in, remove all the outdated timestamps. Outdated timestamps are defined as those older than the start of the current time window.
• Add the timestamp of the new request to the log.
• If the log size is the same as or lower than the allowed count, the request is accepted. Otherwise, it is rejected.

We explain the algorithm with an example, as shown in Figure 4-10.
In this example, the rate limiter allows 2 requests per minute. Usually, Unix timestamps are stored in the log. However, a human-readable representation of time is used in our example for better readability.
• The log is empty when a new request arrives at 1:00:01. Thus, the request is allowed.
• A new request arrives at 1:00:30, and the timestamp 1:00:30 is inserted into the log. After the insertion, the log size is 2, not larger than the allowed count. Thus, the request is allowed.
• A new request arrives at 1:00:50, and the timestamp is inserted into the log. After the insertion, the log size is 3, larger than the allowed size 2. Therefore, this request is rejected even though the timestamp remains in the log.
• A new request arrives at 1:01:40. Requests in the range [1:00:40, 1:01:40) are within the latest time frame, but requests sent before 1:00:40 are outdated. Two outdated timestamps, 1:00:01 and 1:00:30, are removed from the log. After the remove operation, the log size becomes 2; therefore, the request is accepted.

Pros:
• Rate limiting implemented by this algorithm is very accurate. In any rolling window, requests will not exceed the rate limit.

Cons:
• The algorithm consumes a lot of memory because even if a request is rejected, its timestamp might still be stored in memory.
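A minimal Python sketch of the sliding window log; as in the walkthrough above, the timestamp is recorded even when the request is rejected:

```python
import time
from collections import deque

class SlidingWindowLog:
    def __init__(self, window_seconds: int, max_requests: int):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self.log = deque()    # request timestamps, oldest first

    def allow_request(self) -> bool:
        now = time.time()
        # Remove timestamps older than the start of the current window.
        while self.log and self.log[0] <= now - self.window_seconds:
            self.log.popleft()
        self.log.append(now)  # kept even if rejected, hence the memory cost
        return len(self.log) <= self.max_requests

# Example mirroring Figure 4-10: 2 requests per minute.
limiter = SlidingWindowLog(window_seconds=60, max_requests=2)
```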
The sliding window counter algorithm is a hybrid approach that combines the fixed window counter and sliding window log. The algorithm can be implemented by two different approaches. We will explain one implementation in this section and provide reference for the other implementation at the end of the section. Figure 4-11 illustrates how this algorithm works.
Assume the rate limiter allows a maximum of 7 requests per minute, and there are 5 requests in the previous minute and 3 in the current minute. For a new request that arrives at the 30% position of the current minute, the number of requests in the rolling window is calculated using the following formula:

requests in the current window + requests in the previous window * overlap percentage of the rolling window and previous window

Using this formula, we get 3 + 5 * 0.7 = 6.5 requests. Depending on the use case, the number can either be rounded up or down. In our example, it is rounded down to 6. Since the rate limiter allows a maximum of 7 requests per minute, the current request can go through. However, the limit will be reached after receiving one more request.

Due to the space limitation, we will not discuss the other implementation here. Interested readers should refer to the reference material [9]. This algorithm is not perfect; it has pros and cons.

Pros:
• It smooths out spikes in traffic because the rate is based on the average rate of the previous window.
• Memory efficient.

Cons:
• It only works for not-so-strict look-back windows. It is an approximation of the actual rate because it assumes requests in the previous window are evenly distributed. However, this problem may not be as bad as it seems. According to experiments done by Cloudflare [10], only 0.003% of requests are wrongly allowed or rate limited among 400 million requests.
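A minimal Python sketch of the implementation described above, keeping one counter for the current window and one for the previous window; names are illustrative:

```python
import time

class SlidingWindowCounter:
    def __init__(self, window_seconds: int, max_requests: int):
        self.window_seconds = window_seconds
        self.max_requests = max_requests
        self.current_window = 0    # index of the current fixed window
        self.current_count = 0
        self.previous_count = 0

    def allow_request(self) -> bool:
        now = time.time()
        window = int(now // self.window_seconds)
        if window != self.current_window:
            # Roll forward: the old current window becomes the previous one;
            # anything older than one full window no longer contributes.
            self.previous_count = (
                self.current_count if window == self.current_window + 1 else 0
            )
            self.current_count = 0
            self.current_window = window
        # Fraction of the previous window still covered by the rolling window.
        overlap = 1 - (now % self.window_seconds) / self.window_seconds
        estimated = self.current_count + self.previous_count * overlap
        if estimated < self.max_requests:
            self.current_count += 1
            return True
        return False

# The example above: 7 per minute, 5 in the previous window, 3 in the current
# one, 30% into the minute -> 3 + 5 * 0.7 = 6.5, so the request is allowed.
limiter = SlidingWindowCounter(window_seconds=60, max_requests=7)
```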
The basic idea of rate limiting algorithms is simple. At the high level, we need a counter to keep track of how many requests are sent from the same user, IP address, etc. If the counter is larger than the limit, the request is disallowed.

Where shall we store counters? Using the database is not a good idea due to the slowness of disk access. In-memory cache is chosen because it is fast and supports a time-based expiration strategy. For instance, Redis [11] is a popular option to implement rate limiting. It is an in-memory store that offers two commands: INCR and EXPIRE.
• INCR: It increases the stored counter by 1.
• EXPIRE: It sets a timeout for the counter. If the timeout expires, the counter is automatically deleted.
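As a sketch of how these two commands combine into a simple fixed-window counter, assuming the redis-py client and a Redis instance on localhost:

```python
import redis  # assumes the redis-py package: pip install redis

r = redis.Redis(host="localhost", port=6379)

def allow_request(client_id: str, max_requests: int, window_seconds: int) -> bool:
    key = f"rate:{client_id}"
    count = r.incr(key)                # INCR: bump the counter atomically
    if count == 1:
        r.expire(key, window_seconds)  # EXPIRE: auto-delete after the window
    return count <= max_requests

# Example: allow at most 10 requests per user per minute.
# allow_request("user_42", max_requests=10, window_seconds=60)
```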
Figure 4-12 shows the high-level architecture for rate limiting, and this works as follows:
• The client sends a request to rate limiting middleware.
• Rate limiting middleware fetches the counter from the corresponding bucket in Redis and checks if the limit is reached or not.
  • If the limit is reached, the request is rejected.
  • If the limit is not reached, the request is sent to API servers. Meanwhile, the system increments the counter and saves it back to Redis.
The high-level design in Figure 4-12 does not answer the following questions:
• How are rate limiting rules created? Where are the rules stored?
• How to handle requests that are rate limited?

In this section, we will first answer the questions regarding rate limiting rules and then go over the strategies to handle rate-limited requests. Finally, we will discuss rate limiting in a distributed environment, a detailed design, performance optimization and monitoring.
Lyft open-sourced their rate-limiting component [12]. We will peek inside the component and look at some examples of rate limiting rules:
```yaml
domain: messaging
descriptors:
  - key: message_type
    value: marketing
    rate_limit:
      unit: day
      requests_per_unit: 5
```
In the above example, the system is configured to allow a maximum of 5 marketing messages per day. Here is another example:
```yaml
domain: auth
descriptors:
  - key: auth_type
    value: login
    rate_limit:
      unit: minute
      requests_per_unit: 5
```
This rule shows that clients are not allowed to log in more than 5 times in 1 minute. Rules are generally written in configuration files and saved on disk.
In case a request is rate limited, APIs return HTTP response code 429 (too many requests) to the client. Depending on the use case, we may enqueue the rate-limited requests to be processed later. For example, if some orders are rate limited due to system overload, we may keep those orders to be processed later.
How does a client know whether it is being throttled? And how does a client know the number of allowed remaining requests before being throttled? The answer lies in HTTP response headers. The rate limiter returns the following HTTP headers to clients:
X-Ratelimit-Remaining: The remaining number of allowed requests within the window.
X-Ratelimit-Limit: It indicates how many calls the client can make per time window.
X-Ratelimit-Retry-After: The number of seconds to wait until you can make a request again without being throttled.

When a user has sent too many requests, a 429 too many requests error and an X-Ratelimit-Retry-After header are returned to the client.
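For illustration, a throttled response carrying these headers could look like the following; the header values are made-up example numbers:

```
HTTP/1.1 429 Too Many Requests
X-Ratelimit-Limit: 100
X-Ratelimit-Remaining: 0
X-Ratelimit-Retry-After: 30
```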
Figure 4-13 presents a detailed design of the system.
• Rules are stored on the disk. Workers frequently pull rules from the disk and store them in the cache.
• When a client sends a request to the server, the request is sent to the rate limiter middleware first.
• Rate limiter middleware loads rules from the cache. It fetches counters and the last request timestamp from the Redis cache. Based on the response, the rate limiter decides:
  • if the request is not rate limited, it is forwarded to API servers.
  • if the request is rate limited, the rate limiter returns a 429 too many requests error to the client. In the meantime, the request is either dropped or forwarded to the queue.
Building a rate limiter that works in a single server environment is not difficult. However, scaling the system to support multiple servers and concurrent threads is a different story. There are two challenges:
• Race condition
• Synchronization issue
As discussed earlier, the rate limiter works as follows at the high level:
• Read the counter value from Redis.
• Check if (counter + 1) exceeds the threshold.
• If not, increment the counter value by 1 in Redis.

Race conditions can happen in a highly concurrent environment, as shown in Figure 4-14.
Assume the counter value in Redis is 3. If two requests concurrently read the counter value before either of them writes the value back, each will increment the counter by one and write it back without checking the other thread. Both requests (threads) believe they have the correct counter value 4. However, the correct counter value should be 5.

Locks are the most obvious solution for solving race conditions. However, locks will significantly slow down the system. Two strategies are commonly used to solve the problem: Lua script [13] and the sorted sets data structure in Redis [8]. For readers interested in these strategies, refer to the corresponding reference materials [8] [13].
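As a sketch of the Lua script strategy, assuming the redis-py client: the read-check-increment sequence runs as a single atomic script inside Redis, so two concurrent requests can no longer interleave between the read and the write. The script itself is illustrative:

```python
import redis  # assumes the redis-py package

r = redis.Redis()

# GET, the threshold check, INCR, and EXPIRE execute atomically in Redis.
RATE_LIMIT_LUA = """
local count = tonumber(redis.call('GET', KEYS[1]) or '0')
if count + 1 > tonumber(ARGV[1]) then
    return 0
end
count = redis.call('INCR', KEYS[1])
if count == 1 then
    redis.call('EXPIRE', KEYS[1], ARGV[2])
end
return 1
"""
check_rate = r.register_script(RATE_LIMIT_LUA)

def allow_request(key: str, limit: int, window_seconds: int) -> bool:
    return check_rate(keys=[key], args=[limit, window_seconds]) == 1
```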
Synchronization is another important factor to consider in a distributed environment. To support millions of users, one rate limiter server might not be enough to handle the traffic. When multiple rate limiter servers are used, synchronization is required. For example, on the left side of Figure 4-15, client 1 sends requests to rate limiter 1, and client 2 sends requests to rate limiter 2. As the web tier is stateless, clients can send requests to a different rate limiter, as shown on the right side of Figure 4-15. If no synchronization happens, rate limiter 1 does not contain any data about client 2. Thus, the rate limiter cannot work properly.
One possible solution is to use sticky sessions that allow a client to send traffic to the same rate limiter. This solution is not advisable because it is neither scalable nor flexible. A better approach is to use centralized data stores like Redis. The design is shown in Figure 4-16.
Performance optimization is a common topic in system design interviews. We will cover two areas to improve.

First, a multi-data-center setup is crucial for a rate limiter because latency is high for users located far away from the data center. Most cloud service providers build many edge server locations around the world. For example, as of May 20, 2020, Cloudflare has 194 geographically distributed edge servers [14]. Traffic is automatically routed to the closest edge server to reduce latency.
Second, synchronize data with an eventual consistency model. If you are unclear about the eventual consistency model, refer to the "Consistency" section in "Chapter 6: Design a Key-value Store."
After the rate limiter is put in place, it is important to gather analytics data to check whether the rate limiter is effective. Primarily, we want to make sure:
• The rate limiting algorithm is effective.
• The rate limiting rules are effective.

For example, if rate limiting rules are too strict, many valid requests are dropped. In this case, we want to relax the rules a little bit. In another example, we notice our rate limiter becomes ineffective when there is a sudden increase in traffic like flash sales. In this scenario, we may replace the algorithm to support burst traffic. Token bucket is a good fit here.
In this chapter, we discussed different algorithms of rate limiting and their pros/cons. Algorithms discussed include:
• Token bucket
• Leaking bucket
• Fixed window
• Sliding window log
• Sliding window counter

Then, we discussed the system architecture, rate limiter in a distributed environment, performance optimization and monitoring. Similar to any system design interview question, there are additional talking points you can mention if time allows:
• Hard vs soft rate limiting.
  • Hard: The number of requests cannot exceed the threshold.
  • Soft: Requests can exceed the threshold for a short period.
• Rate limiting at different levels. In this chapter, we only talked about rate limiting at the application level (HTTP: layer 7). It is possible to apply rate limiting at other layers. For example, you can apply rate limiting by IP addresses using Iptables [15] (IP: layer 3). Note: The Open Systems Interconnection model (OSI model) has 7 layers [16]: Layer 1: Physical layer, Layer 2: Data link layer, Layer 3: Network layer, Layer 4: Transport layer, Layer 5: Session layer, Layer 6: Presentation layer, Layer 7: Application layer.
• Avoid being rate limited. Design your client with best practices (see the sketch after this list):
  • Use client cache to avoid making frequent API calls.
  • Understand the limit and do not send too many requests in a short time frame.
  • Include code to catch exceptions or errors so your client can gracefully recover from exceptions.
  • Add sufficient back-off time to retry logic.
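To make the last two points concrete, here is a sketch of a client that backs off when throttled, assuming the third-party requests package; the retry policy and header name follow the design described earlier and are illustrative:

```python
import time
import requests  # assumes the requests package: pip install requests

def get_with_backoff(url: str, max_retries: int = 5) -> requests.Response:
    delay = 1
    for _ in range(max_retries):
        resp = requests.get(url)
        if resp.status_code != 429:
            return resp
        # Honor the server's hint when present; otherwise back off exponentially.
        wait = int(resp.headers.get("X-Ratelimit-Retry-After", delay))
        time.sleep(wait)
        delay *= 2
    raise RuntimeError("request kept being throttled")
```

Congratulations on getting this far! Now give yourself a pat on the back. Good job!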
[1] Rate-limiting strategies and techniques: https://cloud.google.com/solutions/rate-limiting-strategies-techniques
[2] Twitter rate limits: https://developer.twitter.com/en/docs/basics/rate-limits
[3] Google docs usage limits: https://developers.google.com/docs/api/limits
[4] IBM microservices: https://www.ibm.com/cloud/learn/microservices
[5] Throttle API requests for better throughput: https://docs.aws.amazon.com/apigateway/latest/developerguide/api-gateway-request-throttling.html
[6] Stripe rate limiters: https://stripe.com/blog/rate-limiters
[7] Shopify REST Admin API rate limits: https://help.shopify.com/en/api/reference/rest-admin-api-rate-limits
[8] Better Rate Limiting With Redis Sorted Sets: https://engineering.classdojo.com/blog/2015/02/06/rolling-rate-limiter/
[9] System Design — Rate limiter and Data modelling: https://medium.com/@saisandeepmopuri/system-design-rate-limiter-and-data-modelling-9304b0d18250
[10] How we built rate limiting capable of scaling to millions of domains: https://blog.cloudflare.com/counting-things-a-lot-of-different-things/
[11] Redis website: https://redis.io/
[12] Lyft rate limiting: https://github.com/lyft/ratelimit
[13] Scaling your API with rate limiters: https://gist.github.com/ptarjan/e38f45f2dfe601419ca3af937fff574d#request-rate-limiter
[14] What is edge computing: https://www.cloudflare.com/learning/serverless/glossary/what-is-edge-computing/
[15] Rate Limit Requests with Iptables: https://blog.programster.org/rate-limit-requests-with-iptables
[16] OSI model: https://en.wikipedia.org/wiki/OSI_model#Layer_architecture