CHAPTER 8: DESIGN A URL SHORTENER



In this chapter, we will tackle an interesting and classic system design interview question:
designing a URL shortening service like tinyurl.

在本章中，我们将解决一个有趣而经典的系统设计面试问题：
设计一个像tinyurl这样的URL缩短服务。

Step 1 - Understand the problem and establish design scope 第 1 步 - 了解问题并建立设计范围


System design interview questions are intentionally left open-ended. To design a well-crafted
system, it is critical to ask clarification questions.

系统设计面试问题有意保持开放式。设计精心制作的
系统，提出澄清问题至关重要。


Candidate: Can you give an example of how a URL shortener work?
Interviewer: Assume URL
https://www.systeminterview.com/q=chatsystem&c=loggedin&v=v3&l=long is the original
URL. Your service creates an alias with shorter length: https://tinyurl.com/ y7keocwj. If you
click the alias, it redirects you to the original URL.
Candidate: What is the traffic volume?
Interviewer: 100 million URLs are generated per day.
Candidate: How long is the shortened URL?
Interviewer: As short as possible.
Candidate: What characters are allowed in the shortened URL?
Interviewer: Shortened URL can be a combination of numbers (0-9) and characters (a-z, AZ).
Candidate: Can shortened URLs be deleted or updated?
Interviewer: For simplicity, let us assume shortened URLs cannot be deleted or updated.

应聘者：你能举个例子来说明URL缩短器是如何工作的吗？
面试官：假设网址
https://www.systeminterview.com/q=chatsystem&c=loggedin&v=v3&l=long 是原版
网址。您的服务会创建长度较短的别名：https://tinyurl.com/ y7keocwj。如果你
单击别名，它会将您重定向到原始 URL。
应聘者：流量是多少？
采访者：每天生成 1 亿个 URL。
应聘者：缩短的URL有多长？
主持人：尽量短。
应聘者：缩短的 URL 中允许使用哪些字符？
面试官：缩短的URL可以是数字（0-9）和字符（a-z，AZ）的组合。
应聘者：缩短的网址可以删除或更新吗？
采访者：为简单起见，我们假设缩短的URL无法删除或更新。


Here are the basic use cases:
1.URL shortening: given a long URL => return a much shorter URL
2.URL redirecting: given a shorter URL => redirect to the original URL
3.High availability, scalability, and fault tolerance considerations

以下是基本用例：
1.URL缩短：给定一个长URL =>返回一个更短的URL
2.URL重定向：给定较短的URL =>重定向到原始URL
3.高可用性、可扩展性和容错注意事项

Back of the envelope estimation 包络背面估算


• Write operation: 100 million URLs are generated per day.
• Write operation per second: 100 million / 24 /3600 = 1160
• Read operation: Assuming ratio of read operation to write operation is 10:1, read
operation per second: 1160 * 10 = 11,600
• Assuming the URL shortener service will run for 10 years, this means we must support
100 million * 365 * 10 = 365 billion records.
• Assume average URL length is 100.
• Storage requirement over 10 years: 365 billion * 100 bytes * 10 years = 365 TB
It is important for you to walk through the assumptions and calculations with your
interviewer so that both of you are on the same page.


• 写入操作：每天生成 1 亿个 URL。
• 每秒写入操作数：1 亿 / 24 /3600 = 1160
• 读操作：假设读写操作比为10：1，读每秒操作数：1160 * 10 = 11，600
• 假设URL缩短器服务将运行10年，这意味着我们必须支持1 亿 * 365 * 10 = 3650 亿条记录。
• 假设平均 URL 长度为 100。
• 10 年以上的存储需求：3650 亿 * 100 字节 * 10 年 = 365 TB
对您来说，与您的假设和计算一起完成假设和计算非常重要
面试官，这样你们俩就在同一页面上。

Step 2 - Propose high-level design and get buy-in 第 2 步 - 提出高级设计并获得支持


In this section, we discuss the API endpoints, URL redirecting, and URL shortening flows.

在本节中，我们将讨论 API 端点、URL 重定向和 URL 缩短流程。

API Endpoints 接口端点


API endpoints facilitate the communication between clients and servers. We will design the
APIs REST-style. If you are unfamiliar with restful API, you can consult external materials,
such as the one in the reference material [1]. A URL shortener primary needs two API
endpoints.
1.URL shortening. To create a new short URL, a client sends a POST request, which contains
one parameter: the original long URL. The API looks like this:
POST api/v1/data/shorten
• request parameter: {longUrl: longURLString} 
• return shortURL
2.URL redirecting. To redirect a short URL to the corresponding long URL, a client sends a
GET request. The API looks like this:
GET api/v1/shortUrl
• Return longURL for HTTP redirection

API 端点有助于客户端和服务器之间的通信。我们将设计
API REST 样式。如果你不熟悉 restful API，可以查阅外部资料，
如参考资料[1]中的那个。一个 URL 缩短器主要需要两个 API端点。

1.网址缩短。要创建新的短 URL，客户端会发送一个 POST 请求，其中包含一个参数：原始长 URL。该 API 如下所示：
POST api/v1/data/short
• 请求参数：{longUrl： longURLString}
• 返回短网址
2.网址重定向。要将短 URL 重定向到相应的长 URL，客户端会发送获取请求。该 API 如下所示：
GET api/v1/shortUrl
• 返回用于 HTTP 重定向的长网址

URL redirecting


Figure 8-1 shows what happens when you enter a tinyurl onto the browser. Once the server
receives a tinyurl request, it changes the short URL to the long URL with 301 redirect.

图 8-1 显示了在浏览器上输入 tinyurl 时会发生什么。一旦服务器
收到一个 tinyurl 请求，它会将短 URL 更改为带有 301 重定向的长 URL。


The detailed communication between clients and servers is shown in Figure 8-2.


One thing worth discussing here is 301 redirect vs 302 redirect.
301 redirect. A 301 redirect shows that the requested URL is “permanently” moved to the
long URL. Since it is permanently redirected, the browser caches the response, and
subsequent requests for the same URL will not be sent to the URL shortening service.
Instead, requests are redirected to the long URL server directly.
302 redirect. A 302 redirect means that the URL is “temporarily” moved to the long URL,
meaning that subsequent requests for the same URL will be sent to the URL shortening
service first. Then, they are redirected to the long URL server.
Each redirection method has its pros and cons. If the priority is to reduce the server load,
using 301 redirect makes sense as only the first request of the same URL is sent to URL
shortening servers. However, if analytics is important, 302 redirect is a better choice as it can
track click rate and source of the click more easily.

这里值得讨论的一件事是 301 重定向与 302 重定向。
301重定向。301 重定向显示请求的 URL 已“永久”移动到长网址。由于它是永久重定向的，因此浏览器会缓存响应，并且对同一 URL 的后续请求将不会发送到 URL 缩短服务。相反，请求将直接重定向到长 URL 服务器。
302重定向。302 重定向意味着 URL “暂时”移动到长 URL，这意味着对同一 URL 的后续请求将被发送到 URL 缩短服务至上。然后，它们被重定向到长 URL 服务器。

每种重定向方法都有其优点和缺点。如果优先级是减少服务器负载，使用 301 重定向是有意义的，因为只有相同 URL 的第一个请求才会发送到 URL缩短服务器。但是，如果分析很重要，302重定向是更好的选择，因为它可以
更轻松地跟踪点击率和点击来源。


The most intuitive way to implement URL redirecting is to use hash tables. Assuming the
hash table stores <shortURL, longURL> pairs, URL redirecting can be implemented by the
following:
• Get longURL: longURL = hashTable.get(shortURL)
• Once you get the longURL, perform the URL redirect.


实现 URL 重定向的最直观方法是使用哈希表。假设
哈希表存储<短URL，长URL>对，URL重定向可以通过
以后：
• Get longURL： longURL = hashTable.get（shortURL）
• 获取长网址后，执行网址重定向。

URL shortening


Let us assume the short URL looks like this: www.tinyurl.com/{hashValue}. To support the
URL shortening use case, we must find a hash function fx that maps a long URL to the
hashValue, as shown in Figure 8-3.

让我们假设短URL看起来像这样：www.tinyurl.com/{hashValue}。为了支持
URL 缩短用例，我们必须找到一个哈希函数 fx 将长 URL 映射到
哈希值，如图 8-3 所示。


The hash function must satisfy the following requirements:
• Each longURL must be hashed to one hashValue.
• Each hashValue can be mapped back to the longURL.
Detailed design for the hash function is discussed in deep dive.

哈希函数必须满足以下要求：
• 每个长URL必须散列为一个哈希值。
• 每个哈希值都可以映射回长URL。
哈希函数的详细设计将在深入探讨中讨论。

Step 3 - Design deep dive 步骤 3 - 设计深入探讨


Up until now, we have discussed the high-level design of URL shortening and URL
redirecting. In this section, we dive deep into the following: data model, hash function, URL shortening and URL redirecting.

到目前为止，我们已经讨论了 URL 缩短和 URL 的高级设计重定向。在本节中，我们将深入探讨以下内容：数据模型、哈希函数、URL 缩短和 URL 重定向。

Data model 数据模型


In the high-level design, everything is stored in a hash table. This is a good starting point;
however, this approach is not feasible for real-world systems as memory resources are limited
and expensive. A better option is to store <shortURL, longURL> mapping in a relational
database. Figure 8-4 shows a simple database table design. The simplified version of the table
contains 3 columns: id, shortURL, longURL.

在高级设计中，所有内容都存储在哈希表中。这是一个很好的起点;
但是，这种方法对于实际系统是不可行的，因为内存资源有限。
而且价格昂贵。更好的选择是在关系中存储<短URL，长URL>映射
数据库。图 8-4 显示了一个简单的数据库表设计。表格的简化版本
包含 3 列：id、shortURL、longURL。

Hash function hash函数


Hash function is used to hash a long URL to a short URL, also known as hashValue.

哈希函数用于将长 URL 哈希为短 URL，也称为哈希值。

Hash value length 哈希值长度


The hashValue consists of characters from [0-9, a-z, A-Z], containing 10 + 26 + 26 = 62
possible characters. To figure out the length of hashValue, find the smallest n such that 62^n
≥ 365 billion. The system must support up to 365 billion URLs based on the back of the
envelope estimation. Table 8-1 shows the length of hashValue and the corresponding
maximal number of URLs it can support.

哈希值由 [0-9， a-z， A-Z] 中的字符组成，包含 10 + 26 + 26 = 62
可能的字符。要计算哈希值的长度，请找到最小的 n，使得 62^n≥3650亿。系统必须支持多达 3650 亿个基于
包络估计。表 8-1 显示了哈希值的长度和相应的它可以支持的最大 URL 数。


When n = 7, 62 ^ n = ~3.5 trillion, 3.5 trillion is more than enough to hold 365 billion URLs,
so the length of hashValue is 7.
We will explore two types of hash functions for a URL shortener. The first one is “hash +
collision resolution”, and the second one is “base 62 conversion.” Let us look at them one by
one


当 n = 7， 62 ^ n = ~3.5 万亿时，3.5 万亿足以容纳 3650 亿个 URL，
所以哈希值的长度是 7。
我们将探索 URL 缩短器的两种类型的哈希函数。第一个是“哈希+
冲突解决“，第二个是”Base 62 转换”。让我们一一看一下
一

Hash + collision resolution “哈希+冲突解决“


To shorten a long URL, we should implement a hash function that hashes a long URL to a 7-
character string. A straightforward solution is to use well-known hash functions like CRC32,
MD5, or SHA-1. The following table compares the hash results after applying different hash
functions on this URL: https://en.wikipedia.org/wiki/Systems_design.

为了缩短长 URL，我们应该实现一个哈希函数，将长 URL 哈希为 7-
字符串。一个简单的解决方案是使用众所周知的哈希函数，如CRC32，
MD5 或 SHA-1。下表比较了应用不同哈希后的哈希结果
此 URL 上的函数：https://en.wikipedia.org/wiki/Systems_design。


As shown in Table 8-2, even the shortest hash value (from CRC32) is too long (more than 7
characters). How can we make it shorter?
The first approach is to collect the first 7 characters of a hash value; however, this method
can lead to hash collisions. To resolve hash collisions, we can recursively append a new
predefined string until no more collision is discovered. This process is explained in Figure 8-
5.

如表 8-2 所示，即使是最短的哈希值（来自 CRC32）也太长（超过 7
字符）。我们怎样才能缩短它？
第一种方法是收集哈希值的前 7 个字符;但是，这种方法
可能导致哈希冲突。为了解决哈希冲突，我们可以递归地附加一个新的
预定义字符串，直到不再发现冲突。这个过程在图 8-5 中进行了说明


This method can eliminate collision; however, it is expensive to query the database to check
if a shortURL exists for every request. A technique called bloom filters [2] can improve
performance. A bloom filter is a space-efficient probabilistic technique to test if an element is
a member of a set. Refer to the reference material [2] for more details.

这种方法可以消除碰撞;但是，查询数据库进行检查的成本很高
如果每个请求都存在短 URL。一种称为布隆过滤器 [2] 的技术可以改善
性能。布隆过滤器是一种节省空间的概率技术，用于测试元素是否
集合的成员。有关更多详细信息，请参阅参考资料 [2]。

Base 62 conversion Base 62 转换


Base conversion is another approach commonly used for URL shorteners. Base conversion
helps to convert the same number between its different number representation systems. Base
62 conversion is used as there are 62 possible characters for hashValue. Let us use an
example to explain how the conversion works: convert 1115710 to base 62 representation
(1115710 represents 11157 in a base 10 system).
• From its name, base 62 is a way of using 62 characters for encoding. The mappings are:
0-0, ..., 9-9, 10-a, 11-b, ..., 35-z, 36-A, ..., 61-Z, where ‘a’ stands for 10, ‘Z’ stands for 61,
etc.
• 1115710 = 2 x 622
 + 55 x 621 + 59 x 620
 = [2, 55, 59] -> [2, T, X] in base 62
representation. Figure 8-6 shows the conversation process.

基本转换是 URL 缩短器常用的另一种方法。基本转换
有助于在其不同的数字表示系统之间转换相同的数字。基础
使用 62 转换，因为哈希值有 62 个可能的字符。让我们使用
说明转换工作原理的示例：将1115710转换为基数 62 表示形式
（1115710 表示以 10 为基数的系统中的 11157）。
• 顾名思义，base 62 是一种使用 62 个字符进行编码的方式。映射是：
0-0， ...， 9-9， 10-a， 11-b， ...， 35-z， 36-A， ...， 61-Z， 其中 'a' 代表 10， 'Z' 代表 61，
等。
• 1115710 = 2 x 622
 + 55 x 621 + 59 x 620
 = [2， 55， 59] -> [2， T， X] 以 62 为底
表示法。图 8-6 显示了对话过程。


• Thus, the short URL is https://tinyurl.com /2TX

• 因此，短网址是 https://tinyurl.com /2TX

Comparison of the two approaches


Table 8-3 shows the differences of the two approaches.

URL shortening deep dive


As one of the core pieces of the system, we want the URL shortening flow to be logically
simple and functional. Base 62 conversion is used in our design. We build the following
diagram (Figure 8-7) to demonstrate the flow.

作为系统的核心部分之一，我们希望 URL 缩短流程在逻辑上是
简单实用。我们的设计中使用了 Base 62 转换。我们构建以下内容
图（图 8-7）来演示流程。


1. longURL is the input.
2. The system checks if the longURL is in the database.
3. If it is, it means the longURL was converted to shortURL before. In this case, fetch the
shortURL from the database and return it to the client.
4. If not, the longURL is new. A new unique ID (primary key) Is generated by the unique
ID generator.
5. Convert the ID to shortURL with base 62 conversion.
6. Create a new database row with the ID, shortURL, and longURL.



1. 长网址是输入。
2. 系统检查长URL是否在数据库中。
3.如果是，则表示之前已将长URL转换为短URL。在这种情况下，获取数据库中的 shortURL 并将其返回给客户端。
4. 如果不是，则长网址是新的。一个新的唯一 ID（主键）由唯一ID 生成器。
5. 使用 base 62 转换将 ID 转换为短 URL。
6. 创建一个包含 ID、短 URL 和长 URL 的新数据库行。


To make the flow easier to understand, let us look at a concrete example.
• Assuming the input longURL is: https://en.wikipedia.org/wiki/Systems_design
• Unique ID generator returns ID: 2009215674938.
• Convert the ID to shortURL using the base 62 conversion. ID (2009215674938) is
converted to “zn9edcu”.
• Save ID, shortURL, and longURL to the database as shown in Table 8-4.

为了使流程更易于理解，让我们看一个具体的例子。
• 假设输入的长URL是：https://en.wikipedia.org/wiki/Systems_design
• 唯一 ID 生成器返回 ID：2009215674938。
• 使用 base 62 转换将 ID 转换为短 URL。ID （2009215674938） 是转换为“Zn9edcu”。
• 将 ID、短 URL 和长 URL 保存到数据库中，如表 8-4 所示。


The distributed unique ID generator is worth mentioning. Its primary function is to generate
globally unique IDs, which are used for creating shortURLs. In a highly distributed
environment, implementing a unique ID generator is challenging. Luckily, we have already
discussed a few solutions in “Chapter 7: Design A Unique ID Generator in Distributed
Systems”. You can refer back to it to refresh your memory.

分布式唯一 ID 生成器值得一提。它的主要功能是生成
全局唯一 ID，用于创建短 URL。在高度分布
环境中，实现唯一的 ID 生成器具有挑战性。幸运的是，我们已经
在“第 7 章：在分布式中设计唯一 ID 生成器”中讨论了一些解决方案
系统”。您可以参考它以刷新您的记忆。

URL redirecting deep dive


Figure 8-8 shows the detailed design of the URL redirecting. As there are more reads than
writes, <shortURL, longURL> mapping is stored in a cache to improve performance.

图 8-8 显示了 URL 重定向的详细设计。由于读取次数多于
writes、<shortURL、longURL>映射存储在缓存中以提高性能。



The flow of URL redirecting is summarized as follows:
1. A user clicks a short URL link: https://tinyurl.com/zn9edcu
2. The load balancer forwards the request to web servers.
3. If a shortURL is already in the cache, return the longURL directly.
4. If a shortURL is not in the cache, fetch the longURL from the database. If it is not in the
database, it is likely a user entered an invalid shortURL.
5. The longURL is returned to the user.


URL 重定向的流程总结如下：
1. 用户点击短 URL 链接：https://tinyurl.com/zn9edcu
2. 负载均衡器将请求转发到 Web 服务器。
3. 如果缓存中已有短网址，请直接返回长网址。
4. 如果缓存中没有短 URL，请从数据库中获取长 URL。如果它不在
数据库，则可能是用户输入了无效的短URL。
5. 将长 URL 返回给用户。

Step 4 - Wrap up


In this chapter, we talked about the API design, data model, hash function, URL shortening,
and URL redirecting.
If there is extra time at the end of the interview, here are a few additional talking points.
• Rate limiter: A potential security problem we could face is that malicious users send an
overwhelmingly large number of URL shortening requests. Rate limiter helps to filter out
requests based on IP address or other filtering rules. If you want to refresh your memory
about rate limiting, refer to “Chapter 4: Design a rate limiter”.
• Web server scaling: Since the web tier is stateless, it is easy to scale the web tier by
adding or removing web servers.
• Database scaling: Database replication and sharding are common techniques.
• Analytics: Data is increasingly important for business success. Integrating an analytics
solution to the URL shortener could help to answer important questions like how many
people click on a link? When do they click the link? etc. 
• Availability, consistency, and reliability. These concepts are at the core of any large
system’s success. We discussed them in detail in Chapter 1, please refresh your memory
on these topics.
Congratulations on getting this far! Now give yourself a pat on the back. Good job!



在本章中，我们讨论了 API 设计、数据模型、哈希函数、URL 缩短、
和网址重定向。
如果面试结束时有额外的时间，这里有一些额外的谈话要点。
• 速率限制器：我们可能面临的潜在安全问题是恶意用户发送大量的 URL 缩短请求。速率限制器有助于过滤掉
基于 IP 地址或其他筛选规则的请求。如果您想刷新记忆关于速率限制，请参阅“第 4 章：设计速率限制器”。
• Web 服务器缩放：由于 Web 层是无状态的，因此很容易通过以下方式扩展 Web 层添加或删除 Web 服务器。
• 数据库扩展：数据库复制和分片是常用技术。
• 分析：数据对于业务成功越来越重要。集成分析URL缩短器的解决方案可以帮助回答重要问题，例如有多少
人们点击链接？他们什么时候点击链接？等。
• 可用性、一致性和可靠性。这些概念是任何大型的核心系统的成功。我们在第一章详细讨论过，请刷新记忆
关于这些主题。
恭喜你走到了这一步！现在拍拍自己的背。干得好！

Reference materials


[1] A RESTful Tutorial: https://www.restapitutorial.com/index.html
[2] Bloom filter: https://en.wikipedia.org/wiki/Bloom_filter

目录

CHAPTER 8: DESIGN A URL SHORTENER

Step 1 - Understand the problem and establish design scope 第 1 步 - 了解问题并建立设计范围

Back of the envelope estimation 包络背面估算

Step 2 - Propose high-level design and get buy-in 第 2 步 - 提出高级设计并获得支持

API Endpoints 接口端点

URL redirecting

URL shortening

Step 3 - Design deep dive 步骤 3 - 设计深入探讨

Data model 数据模型

Hash function hash函数

Hash value length 哈希值长度

Hash + collision resolution “哈希+冲突解决“

Base 62 conversion Base 62 转换

Comparison of the two approaches

URL shortening deep dive

URL redirecting deep dive

Step 4 - Wrap up

Reference materials