CHAPTER 12: DESIGN A CHAT SYSTEM


In this chapter we explore the design of a chat system. Almost everyone uses a chat app.
Figure 12-1 shows some of the most popular apps in the marketplace.


A chat app performs different functions for different people, so it is extremely important to nail down the exact feature requirements. For example, you do not want to design a system that focuses on group chat when the interviewer has one-on-one chat in mind.

Step 1 - Understand the problem and establish design scope


It is vital to agree on the type of chat app to design. In the marketplace, there are one-on-one chat apps like Facebook Messenger, WeChat, and WhatsApp, office chat apps that focus on group chat like Slack, or game chat apps, like Discord, that focus on large group interaction and low voice chat latency.
The first set of clarification questions should nail down what the interviewer has in mind exactly when she asks you to design a chat system. At the very least, figure out if you should focus on a one-on-one chat or group chat app. Some questions you might ask are as follows:


Candidate: What kind of chat app shall we design? 1 on 1 or group based?
Interviewer: It should support both 1 on 1 and group chat.
Candidate: Is this a mobile app? Or a web app? Or both?
Interviewer: Both.
Candidate: What is the scale of this app? A startup app or massive scale?
Interviewer: It should support 50 million daily active users (DAU).
Candidate: For group chat, what is the group member limit?
Interviewer: A maximum of 100 people.
Candidate: What features are important for the chat app? Can it support attachments?
Interviewer: 1 on 1 chat, group chat, online indicator. The system only supports text messages.
Candidate: Is there a message size limit?
Interviewer: Yes, text length should be less than 100,000 characters.
Candidate: Is end-to-end encryption required?
Interviewer: Not required for now but we will discuss that if time allows.
Candidate: How long shall we store the chat history? 
Interviewer: Forever.


In this chapter, we focus on designing a chat app like Facebook Messenger, with an emphasis on the following features:
• A one-on-one chat with low delivery latency
• Small group chat (max of 100 people)
• Online presence
• Multiple device support. The same account can be logged in on multiple devices at the same time.
• Push notifications
It is also important to agree on the design scale. We will design a system that supports 50 million DAU.

Step 2 - Propose high-level design and get buy-in


To develop a high-quality design, we should have a basic knowledge of how clients and servers communicate. In a chat system, clients can be either mobile applications or web applications. Clients do not communicate directly with each other. Instead, each client connects to a chat service, which supports all the features mentioned above. Let us focus on fundamental operations. The chat service must support the following functions:
• Receive messages from other clients.
• Find the right recipients for each message and relay the message to the recipients.
• If a recipient is not online, hold the messages for that recipient on the server until she is online.
Figure 12-2 shows the relationships between clients (sender and receiver) and the chat service.



When a client intends to start a chat, it connects to the chat service using one or more network protocols. For a chat service, the choice of network protocols is important. Let us discuss this with the interviewer.

Requests are initiated by the client for most client/server applications. This is also true for the sender side of a chat application. In Figure 12-2, when the sender sends a message to the receiver via the chat service, it uses the time-tested HTTP protocol, which is the most common web protocol. In this scenario, the client opens an HTTP connection with the chat service and sends the message, informing the service to send the message to the receiver. Keep-alive is efficient here because the keep-alive header allows a client to maintain a persistent connection with the chat service, which also reduces the number of TCP handshakes. HTTP is a fine option on the sender side, and many popular chat applications such as Facebook [1] used HTTP initially to send messages.

However, the receiver side is a bit more complicated. Since HTTP is client-initiated, it is not trivial to send messages from the server. Over the years, many techniques have been used to simulate a server-initiated connection: polling, long polling, and WebSocket. These techniques come up frequently in system design interviews, so let us examine each of them.

Polling


As shown in Figure 12-3, polling is a technique in which the client periodically asks the server if there are messages available. Depending on the polling frequency, polling could be costly. It could consume precious server resources to answer a question whose answer is "no" most of the time.

Long polling


Because polling could be inefficient, the next progression is long polling (Figure 12-4).


In long polling, a client holds the connection open until there are actually new messages available or a timeout threshold has been reached. Once the client receives new messages, it immediately sends another request to the server, restarting the process. Long polling has a few drawbacks:
    • Sender and receiver may not connect to the same chat server. HTTP based servers are usually stateless. If you use round robin for load balancing, the server that receives the message might not have a long-polling connection with the client who receives the message.
    • A server has no good way to tell if a client is disconnected.
    • It is inefficient. If a user does not chat much, long polling still makes periodic connections after timeouts.
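The long-polling behavior described above can be sketched with Python's asyncio. This is a minimal in-memory illustration, not a production server: the LongPollInbox class, its method names, and the timeout values are all hypothetical. It only shows the core idea that a request is held open until a message arrives or a timeout is reached.

```python
import asyncio


class LongPollInbox:
    """Pending messages for one recipient (hypothetical in-memory helper)."""

    def __init__(self):
        self._queue = asyncio.Queue()

    def deliver(self, message):
        """Called when a new message for this recipient arrives."""
        self._queue.put_nowait(message)

    async def poll(self, timeout):
        """Hold the request open until a message arrives or the timeout expires."""
        try:
            first = await asyncio.wait_for(self._queue.get(), timeout)
        except asyncio.TimeoutError:
            return []  # timeout reached: the client simply polls again
        messages = [first]
        while not self._queue.empty():  # drain anything else already waiting
            messages.append(self._queue.get_nowait())
        return messages


async def demo():
    inbox = LongPollInbox()
    # A message shows up 0.05 s after the client starts waiting.
    asyncio.get_running_loop().call_later(0.05, inbox.deliver, "hello")
    received = await inbox.poll(timeout=1.0)    # returns once "hello" arrives
    timed_out = await inbox.poll(timeout=0.05)  # nothing arrives: empty response
    return received, timed_out
```

Running asyncio.run(demo()) exercises both outcomes. Note that the inbox lives in one process, which mirrors the first drawback listed above: the held request and the incoming message must meet on the same server.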

WebSocket


WebSocket is the most common solution for sending asynchronous updates from server to client. Figure 12-5 shows how it works.


A WebSocket connection is initiated by the client. It is bidirectional and persistent. It starts its life as an HTTP connection and can be “upgraded” via a well-defined handshake to a WebSocket connection. Through this persistent connection, a server can send updates to a client. WebSocket connections generally work even if a firewall is in place, because they use port 80 or 443, which are also used by HTTP/HTTPS connections.
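As an illustration of the upgrade handshake, the sketch below computes the Sec-WebSocket-Accept value that a server returns in response to the client's Sec-WebSocket-Key header, as specified in RFC 6455. The function name websocket_accept is our own; the GUID constant and the SHA-1 plus Base64 steps come from the RFC. In practice a WebSocket library would normally handle this for you.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake.
WEBSOCKET_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def websocket_accept(sec_websocket_key):
    """Derive the Sec-WebSocket-Accept header from the client's Sec-WebSocket-Key."""
    digest = hashlib.sha1((sec_websocket_key + WEBSOCKET_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")


# Sample key taken from the example in RFC 6455.
accept = websocket_accept("dGhlIHNhbXBsZSBub25jZQ==")
```

The client checks the returned value before treating the connection as upgraded, which guards against servers or intermediaries that do not actually understand WebSocket.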

Earlier we said that on the sender side HTTP is a fine protocol to use, but since WebSocket is bidirectional, there is no strong technical reason not to use it for sending as well. Figure 12-6 shows how WebSocket (ws) is used on both the sender and receiver sides.



Using WebSocket for both sending and receiving simplifies the design and makes implementation on both client and server more straightforward. Since WebSocket connections are persistent, efficient connection management is critical on the server side.

High-level design


We just chose WebSocket as the main communication protocol between the client and server for its bidirectional communication. It is important to note, however, that everything else does not have to use WebSocket. In fact, most features (sign up, login, user profile, etc.) of a chat application could use the traditional request/response method over HTTP. Let us drill in a bit and look at the high-level components of the system.

As shown in Figure 12-7, the chat system is broken down into three major categories: stateless services, stateful services, and third-party integration.

Stateless Services


Stateless services are traditional public-facing request/response services, used to manage login, signup, user profile, etc. These are common features among many websites and apps. Stateless services sit behind a load balancer whose job is to route requests to the correct services based on the request paths. These services can be monolithic or individual microservices. We do not need to build many of these stateless services ourselves, as there are services on the market that can be integrated easily. The one service that we will discuss further in the deep dive is service discovery. Its primary job is to give the client a list of DNS host names of chat servers that the client could connect to.

Stateful Service


The only stateful service is the chat service. The service is stateful because each client maintains a persistent network connection to a chat server. In this service, a client normally does not switch to another chat server as long as the server is still available. Service discovery coordinates closely with the chat service to avoid server overloading. We will go into detail in the deep dive.

Third-party integration


For a chat app, push notification is the most important third-party integration. It is a way to inform users when new messages have arrived, even when the app is not running. Proper integration of push notification is crucial. Refer to Chapter 10 Design a notification system for more information.


Scalability


On a small scale, all services listed above could fit in one server. Even at the scale we design for, it is in theory possible to fit all user connections in one modern cloud server. The number of concurrent connections that a server can handle will most likely be the limiting factor. In our scenario, at 1M concurrent users, assuming each user connection needs 10 KB of memory on the server (this is a very rough figure and very dependent on the language choice), it only needs about 10 GB of memory to hold all the connections on one box.
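The back-of-the-envelope estimate above can be written out as a short calculation. The two inputs are the rough figures from the text; the function name is our own, and decimal units (1 GB = 10^9 bytes) are an assumption made to keep the arithmetic simple.

```python
def connection_memory_gb(concurrent_users, bytes_per_connection):
    """Total memory needed to hold all connections, in decimal gigabytes."""
    return concurrent_users * bytes_per_connection / 1_000_000_000


CONCURRENT_USERS = 1_000_000   # 1M concurrent users, from the text
BYTES_PER_CONNECTION = 10_000  # roughly 10 KB per connection, a very rough figure

estimate_gb = connection_memory_gb(CONCURRENT_USERS, BYTES_PER_CONNECTION)
print(f"about {estimate_gb:.0f} GB")
```

Changing the per-connection figure shows how sensitive the estimate is to the language and runtime chosen for the chat servers.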

If we propose a design where everything fits in one server, this may raise a big red flag in the interviewer’s mind. No technologist would design for such a scale on a single server. A single-server design is a deal breaker for many reasons, and the single point of failure is the biggest among them.

However, it is perfectly fine to start with a single server design. Just make sure the interviewer knows this is a starting point. Putting everything we mentioned together, Figure 12-8 shows the adjusted high-level design.


In Figure 12-8, the client maintains a persistent WebSocket connection to a chat server for real-time messaging.
• Chat servers facilitate message sending/receiving.
• Presence servers manage online/offline status.
• API servers handle everything else, including user login, signup, change profile, etc.
• Notification servers send push notifications.
• Finally, the key-value store is used to store chat history. When an offline user comes online, she will see all her previous chat history.

Storage


At this point, we have servers ready, services up and running, and third-party integrations complete. Deep down the technical stack is the data layer. The data layer usually requires some effort to get right. An important decision we must make is choosing the right type of database: relational or NoSQL? To make an informed decision, we will examine the data types and read/write patterns.

Two types of data exist in a typical chat system. The first is generic data, such as user profiles, settings, and friends lists. This data is stored in robust and reliable relational databases. Replication and sharding are common techniques to satisfy availability and scalability requirements.

The second is unique to chat systems: chat history data. It is important to understand the read/write pattern.

• The amount of data is enormous for chat systems. A previous study [2] reveals that Facebook Messenger and WhatsApp process 60 billion messages a day.
• Only recent chats are accessed frequently. Users do not usually look up old chats.
• Although very recent chat history is viewed in most cases, users might use features that require random access of data, such as search, view your mentions, jump to specific messages, etc. These cases should be supported by the data access layer.
• The read to write ratio is about 1:1 for 1 on 1 chat apps.

Selecting the correct storage system that supports all of our use cases is crucial. We recommend key-value stores for the following reasons:
• Key-value stores allow easy horizontal scaling.
• Key-value stores provide very low latency to access data.
• Relational databases do not handle the long tail [3] of data well. When the indexes grow large, random access is expensive.
• Key-value stores are adopted by other proven, reliable chat applications. For example, both Facebook Messenger and Discord use key-value stores. Facebook Messenger uses HBase [4], and Discord uses Cassandra [5].


Data models


Just now, we talked about using key-value stores as our storage layer. The most important data is message data. Let us take a close look.

Message table for 1 on 1 chat


Figure 12-9 shows the message table for 1 on 1 chat. The primary key is message_id, which helps to decide message sequence. We cannot rely on created_at to decide the message sequence because two messages can be created at the same time.

Message table for group chat


Figure 12-10 shows the message table for group chat. The composite primary key is (channel_id, message_id). Channel and group have the same meaning here. channel_id is the partition key because all queries in a group chat operate within a channel.
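The two message tables can be sketched as plain Python records. Only message_id, created_at, and channel_id are named in the text; the remaining fields (message_from, message_to, user_id, content) and the channel_history helper are assumptions added for illustration, since the figures are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectMessage:
    message_id: int    # primary key; also decides message order
    message_from: int  # assumed column: sender's user ID
    message_to: int    # assumed column: recipient's user ID
    content: str       # assumed column: message text
    created_at: int    # timestamp; two messages may share the same value


@dataclass(frozen=True)
class GroupMessage:
    channel_id: int    # partition key: all group queries operate within a channel
    message_id: int    # orders messages within the channel
    user_id: int       # assumed column: sender's user ID
    content: str       # assumed column: message text
    created_at: int


def channel_history(rows, channel_id):
    """Return one channel's messages ordered by message_id, not created_at."""
    in_channel = [row for row in rows if row.channel_id == channel_id]
    return sorted(in_channel, key=lambda row: row.message_id)
```

Sorting by message_id rather than created_at keeps the order well defined even when two messages carry the same timestamp.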

Message ID


How to generate message_id is an interesting topic worth exploring. message_id carries the responsibility of ensuring the order of messages. To ascertain the order of messages, message_id must satisfy the following two requirements:

    • IDs must be unique.
    • IDs should be sortable by time, meaning new rows have higher IDs than old ones.
    
How can we achieve those two guarantees? The first idea that comes to mind is the “auto_increment” keyword in MySQL. However, NoSQL databases usually do not provide such a feature.

The second approach is to use a global 64-bit sequence number generator like Snowflake [6]. This is discussed in “Chapter 7: Design a unique ID generator in a distributed system”.

The final approach is to use a local sequence number generator. Local means IDs are only unique within a group. Local IDs work because maintaining message sequence within a one-on-one channel or a group channel is sufficient. This approach is easier to implement than the global ID approach.
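A local sequence number generator can be sketched as one counter per channel. The class below is a hypothetical single-process illustration: it keeps counters in memory behind a lock, whereas a real deployment would need the counter to be stored durably and incremented atomically by whatever store backs the channel.

```python
import threading
from collections import defaultdict


class LocalSequenceGenerator:
    """Issues IDs that are unique and increasing within each channel only."""

    def __init__(self):
        self._last_id = defaultdict(int)  # channel_id -> last ID issued
        self._lock = threading.Lock()     # guards concurrent callers in this process

    def next_id(self, channel_id):
        with self._lock:
            self._last_id[channel_id] += 1
            return self._last_id[channel_id]


generator = LocalSequenceGenerator()
```

Two different channels may both contain a message with ID 1, which is acceptable because ordering only matters inside a channel.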

Step 3 - Design deep dive


In a system design interview, you are usually expected to dive deep into some of the components in the high-level design. For the chat system, service discovery, messaging flows, and online/offline indicators are worth deeper exploration.

Service discovery


The primary role of service discovery is to recommend the best chat server for a client based on criteria such as geographical location, server capacity, etc. Apache ZooKeeper [7] is a popular open-source solution for service discovery. It registers all the available chat servers and picks the best chat server for a client based on predefined criteria.

Figure 12-11 shows how service discovery (ZooKeeper) works.


1. User A tries to log in to the app.
2. The load balancer sends the login request to API servers.
3. After the backend authenticates the user, service discovery finds the best chat server for User A. In this example, server 2 is chosen and the server info is returned to User A.
4. User A connects to chat server 2 through WebSocket.
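The selection in step 3 can be sketched as a simple ranking over registered servers. This example does not use the ZooKeeper API; the ChatServer record, its fields, and the rule "prefer the client's region, then the least loaded server" are assumptions chosen to illustrate criteria such as geographical location and server capacity.

```python
from dataclasses import dataclass


@dataclass
class ChatServer:
    host: str         # hypothetical DNS host name
    region: str       # hypothetical region label
    connections: int  # current number of client connections
    capacity: int     # maximum connections this server should accept


def pick_chat_server(servers, client_region):
    """Pick a server with spare capacity, preferring the client's region."""
    candidates = [s for s in servers if s.connections < s.capacity]
    if not candidates:
        raise RuntimeError("no chat server has spare capacity")
    # False sorts before True, so same-region servers rank first;
    # ties are broken by the lowest load ratio.
    return min(
        candidates,
        key=lambda s: (s.region != client_region, s.connections / s.capacity),
    )
```

Because full servers are filtered out before ranking, this also reflects how service discovery helps avoid overloading any one chat server.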


Message flows


It is interesting to understand the end-to-end flow of a chat system. In this section, we will explore 1 on 1 chat flow, message synchronization across multiple devices and group chat flow.

1 on 1 chat flow


Figure 12-12 explains what happens when User A sends a message to User B.


1. User A sends a chat message to Chat server 1.
2. Chat server 1 obtains a message ID from the ID generator.
3. Chat server 1 sends the message to the message sync queue.
4. The message is stored in a key-value store.
5.a. If User B is online, the message is forwarded to Chat server 2 where User B is connected.
5.b. If User B is offline, a push notification is sent from push notification (PN) servers.
6. Chat server 2 forwards the message to User B. There is a persistent WebSocket connection between User B and Chat server 2.
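The steps above can be traced in a small in-memory simulation. Everything here is a stand-in: the ChatService class collapses the chat servers, ID generator, message sync queue, key-value store, and PN servers into one object, and plain lists and dictionaries replace real queues and WebSocket connections.

```python
from itertools import count


class ChatService:
    """In-memory stand-in for the components in the 1 on 1 flow (all hypothetical)."""

    def __init__(self):
        self._id_generator = count(1)  # stand-in for the ID generator
        self.sync_queue = []           # stand-in for the message sync queue
        self.kv_store = {}             # message_id -> message
        self.connections = {}          # online user -> messages pushed over WebSocket
        self.push_notifications = []   # (user, message_id) pairs handed to PN servers

    def connect(self, user):
        self.connections[user] = []

    def send(self, sender, recipient, content):
        message_id = next(self._id_generator)                        # step 2
        message = {"message_id": message_id, "from": sender,
                   "to": recipient, "content": content}
        self.sync_queue.append(message)                              # step 3
        self.kv_store[message_id] = message                          # step 4
        if recipient in self.connections:
            self.connections[recipient].append(message)              # steps 5.a and 6
        else:
            self.push_notifications.append((recipient, message_id))  # step 5.b
        return message_id
```

Note that the message is stored before delivery is attempted, so an offline recipient can still fetch it later from the key-value store.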


Message synchronization across multiple devices


Many users have multiple devices. We will explain how to sync messages across multiple devices. Figure 12-13 shows an example of message synchronization.


In Figure 12-13, User A has two devices: a phone and a laptop. When User A logs in to the chat app with her phone, it establishes a WebSocket connection with Chat server 1. Similarly, there is a connection between the laptop and Chat server 1.


Each device maintains a variable called cur_max_message_id, which keeps track of the latest message ID on the device. Messages that satisfy the following two conditions are considered new messages:
    • The recipient ID is equal to the currently logged-in user ID.
    • The message ID in the key-value store is larger than cur_max_message_id.
With a distinct cur_max_message_id on each device, message synchronization is easy, as each device can get new messages from the KV store.
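The two conditions above translate directly into a filter. In this sketch the Device class and the message dictionaries (with "to" and "message_id" keys) are hypothetical, and a plain list stands in for the KV store.

```python
class Device:
    """One logged-in device that tracks its own sync position."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.cur_max_message_id = 0  # latest message ID seen on this device
        self.messages = []

    def sync(self, kv_store):
        """Fetch messages addressed to this user that are newer than the local maximum."""
        new_messages = sorted(
            (m for m in kv_store
             if m["to"] == self.user_id
             and m["message_id"] > self.cur_max_message_id),
            key=lambda m: m["message_id"],
        )
        self.messages.extend(new_messages)
        if new_messages:
            self.cur_max_message_id = new_messages[-1]["message_id"]
        return new_messages
```

Because each device keeps its own cur_max_message_id, a laptop that has been offline for a while catches up independently of the phone.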

Small group chat flow


In comparison to the one-on-one chat, the logic of group chat is more complicated. Figures 12-14 and 12-15 explain the flow.


Figure 12-14 explains what happens when User A sends a message in a group chat. Assume there are 3 members in the group (User A, User B, and User C). First, the message from User A is copied to each of the other group members’ message sync queues: one for User B and another for User C. You can think of the message sync queue as an inbox for a recipient. This design choice is good for small group chat because:

    • it simplifies message sync flow as each client only needs to check its own inbox to get new messages.
    • when the group size is small, storing a copy in each recipient’s inbox is not too expensive.
    
    
WeChat uses a similar approach, and it limits a group to 500 members [8]. However, for groups with a lot of users, storing a message copy for each member is not acceptable.

On the recipient side, a recipient can receive messages from multiple users. Each recipient has an inbox (message sync queue) which contains messages from different senders. Figure 12-15 illustrates the design.
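The fan-out to per-recipient inboxes can be sketched in a few lines. The fan_out function and the dictionary of lists standing in for message sync queues are hypothetical; the point is only that one sent message becomes one copy per recipient.

```python
from collections import defaultdict


def fan_out(inboxes, members, sender, content):
    """Copy one group message into every other member's inbox."""
    for member in members:
        if member != sender:
            inboxes[member].append({"from": sender, "content": content})


inboxes = defaultdict(list)  # user -> message sync queue (inbox)
group = ["A", "B", "C"]

fan_out(inboxes, group, "A", "hi all")
fan_out(inboxes, group, "B", "hey")
```

After these two calls, User C's inbox holds messages from both A and B, which matches the recipient-side view described above. The write cost grows linearly with group size, which is why this approach suits small groups.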


Online presence


An online presence indicator is an essential feature of many chat applications. Usually, you can see a green dot next to a user’s profile picture or username. This section explains what happens behind the scenes.

In the high-level design, presence servers are responsible for managing online status and communicating with clients through WebSocket. There are a few flows that will trigger online status change. Let us examine each of them.


User login


The user login flow is explained in the “Service discovery” section. After a WebSocket connection is established between the client and the real-time service, User A’s online status and last_active_at timestamp are saved in the KV store. The presence indicator shows the user is online after she logs in.


User logout


When a user logs out, it goes through the user logout flow as shown in Figure 12-17. The online status is changed to offline in the KV store. The presence indicator shows a user is offline.

User disconnection


We all wish our internet connection were consistent and reliable. However, that is not always the case; thus, we must address this issue in our design. When a user disconnects from the internet, the persistent connection between the client and server is lost. A naive way to handle user disconnection is to mark the user as offline and change the status to online when the connection re-establishes. However, this approach has a major flaw. It is common for users to disconnect and reconnect to the internet frequently in a short time. For example, network connections can be on and off while a user goes through a tunnel. Updating online status on every disconnect/reconnect would make the presence indicator change too often, resulting in poor user experience.

We introduce a heartbeat mechanism to solve this problem. Periodically, an online client sends a heartbeat event to presence servers. If presence servers receive a heartbeat event from the client within a certain time, say x seconds, the user is considered online. Otherwise, the user is considered offline.

In Figure 12-18, the client sends a heartbeat event to the server every 5 seconds. After sending 3 heartbeat events, the client is disconnected and does not reconnect within x = 30 seconds (this number is arbitrarily chosen to demonstrate the logic). The online status is then changed to offline.
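The heartbeat rule can be sketched as a timestamp comparison. The PresenceTracker class is hypothetical, and time is passed in explicitly as a number of seconds so that the example is deterministic; a real presence server would read a clock and persist the last heartbeat time.

```python
class PresenceTracker:
    """Considers a user online if a heartbeat arrived within the last x seconds."""

    def __init__(self, timeout_seconds):
        self.timeout_seconds = timeout_seconds  # the "x" from the text
        self._last_heartbeat = {}               # user -> time of last heartbeat

    def heartbeat(self, user, now):
        self._last_heartbeat[user] = now

    def is_online(self, user, now):
        last = self._last_heartbeat.get(user)
        return last is not None and now - last <= self.timeout_seconds


# Three heartbeats, 5 seconds apart, then silence (x = 30 as in the example).
tracker = PresenceTracker(timeout_seconds=30)
for t in (0, 5, 10):
    tracker.heartbeat("A", now=t)
```

A brief drop, such as passing through a tunnel, does not flip the status as long as a heartbeat arrives again before the timeout.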

Online status fanout


How do User A’s friends know about the status changes? Figure 12-19 explains how it works. Presence servers use a publish-subscribe model, in which each friend pair maintains a channel. When User A’s online status changes, it publishes the event to three channels: channel A-B, A-C, and A-D. Those three channels are subscribed to by User B, C, and D, respectively. Thus, it is easy for friends to get online status updates. The communication between clients and servers is through real-time WebSocket.
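The friend-pair channels can be sketched with a small in-process publish-subscribe class. PresencePubSub and its methods are hypothetical, and callbacks stand in for the WebSocket push to each friend's client.

```python
from collections import defaultdict


class PresencePubSub:
    """One channel per (user, friend) pair, for example "A-B"."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel name -> callbacks

    @staticmethod
    def channel(user, friend):
        return f"{user}-{friend}"

    def subscribe(self, user, friend, callback):
        """Register `friend` for status changes of `user`."""
        self._subscribers[self.channel(user, friend)].append(callback)

    def publish(self, user, friends, status):
        """Publish `user`'s new status to the channel shared with each friend."""
        for friend in friends:
            for callback in self._subscribers[self.channel(user, friend)]:
                callback(user, status)
```

One status change by User A results in one published event per friend, which is the cost that becomes a bottleneck for very large groups.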


The above design is effective for a small user group. For instance, WeChat uses a similar approach because its user group is capped at 500. For larger groups, informing all members about online status is expensive and time-consuming. Assume a group has 100,000 members. Each status change will generate 100,000 events. To solve the performance bottleneck, a possible solution is to fetch online status only when a user enters a group or manually refreshes the friend list.

Step 4 - Wrap up


In this chapter, we presented a chat system architecture that supports both 1-to-1 chat and small group chat. WebSocket is used for real-time communication between the client and server. The chat system contains the following components: chat servers for real-time messaging, presence servers for managing online presence, push notification servers for sending push notifications, key-value stores for chat history persistence and API servers for other functionalities.

If you have extra time at the end of the interview, here are additional talking points:
• Extend the chat app to support media files such as photos and videos. Media files are significantly larger than text in size. Compression, cloud storage, and thumbnails are interesting topics to talk about.
• End-to-end encryption. WhatsApp supports end-to-end encryption for messages. Only the sender and the recipient can read messages. Interested readers should refer to the article in the reference materials [9].
• Caching messages on the client-side is effective to reduce the data transfer between the client and server.
• Improve load time. Slack built a geographically distributed network to cache users’ data, channels, etc. for better load time [10].
• Error handling.
    • The chat server error. There might be hundreds of thousands, or even more, persistent connections to a chat server. If a chat server goes offline, service discovery (ZooKeeper) will provide a new chat server for clients to establish new connections with.
    • Message resend mechanism. Retry and queueing are common techniques for resending messages.
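As one possible shape for the retry technique mentioned in the last bullet, the sketch below retries a failed send with exponential backoff. The function name, its parameters, and the choice of ConnectionError as the retryable failure are all assumptions; the sleep function is injectable so the logic can be exercised without waiting.

```python
import time


def send_with_retry(send, message, max_attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call send(message), retrying on ConnectionError with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return send(message)
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            sleep(base_delay * 2 ** attempt)  # wait 1x, 2x, 4x, ... the base delay
```

A client could combine this with a local queue so that messages typed while offline are resent in order once a connection is available.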

Congratulations on getting this far! Now give yourself a pat on the back. Good job!

Reference materials


[1] Erlang at Facebook: https://www.erlangfactory.com/upload/presentations/31/EugeneLetuchy-ErlangatFacebook.pdf
[2] Messenger and WhatsApp process 60 billion messages a day: https://www.theverge.com/2016/4/12/11415198/facebook-messenger-whatsapp-number-messages-vs-sms-f8-2016
[3] Long tail: https://en.wikipedia.org/wiki/Long_tail
[4] The Underlying Technology of Messages: https://www.facebook.com/notes/facebook-engineering/the-underlying-technology-of-messages/454991608919/
[5] How Discord Stores Billions of Messages: https://blog.discordapp.com/how-discord-stores-billions-of-messages-7fa6ec7ee4c7
[6] Announcing Snowflake: https://blog.twitter.com/engineering/en_us/a/2010/announcing-snowflake.html
[7] Apache ZooKeeper: https://zookeeper.apache.org/
[8] From nothing: the evolution of WeChat background system (Article in Chinese): https://www.infoq.cn/article/the-road-of-the-growth-weixin-background
[9] End-to-end encryption: https://faq.whatsapp.com/en/android/28030015/
[10] Flannel: An Application-Level Edge Cache to Make Slack Scale: https://slack.engineering/flannel-an-application-level-edge-cache-to-make-slack-scale-b8a6400e2f6b
