A notification system has become a very popular feature of many applications in recent years. A notification alerts a user to important information such as breaking news, product updates, events, and offerings. It has become an indispensable part of our daily lives. In this chapter, you are asked to design a notification system. A notification is more than just a mobile push notification; there are three notification formats: mobile push notification, SMS message, and email. Figure 10-1 shows an example of each of these notifications.
Building a scalable system that sends out millions of notifications a day is not an easy task. It requires a deep understanding of the notification ecosystem. The interview question is purposely designed to be open-ended and ambiguous, and it is your responsibility to ask questions to clarify the requirements.
Candidate: What types of notifications does the system support?
Interviewer: Push notification, SMS message, and email.
Candidate: Is it a real-time system?
Interviewer: Let us say it is a soft real-time system. We want a user to receive notifications as soon as possible. However, if the system is under a high workload, a slight delay is acceptable.
Candidate: What are the supported devices?
Interviewer: iOS devices, Android devices, and laptops/desktops.
Candidate: What triggers notifications?
Interviewer: Notifications can be triggered by client applications. They can also be scheduled on the server side.
Candidate: Will users be able to opt out?
Interviewer: Yes, users who choose to opt out will no longer receive notifications.
Candidate: How many notifications are sent out each day?
Interviewer: 10 million mobile push notifications, 1 million SMS messages, and 5 million emails.
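To get a feel for the load these daily volumes imply, here is a back-of-the-envelope calculation. It assumes traffic is spread evenly over 24 hours; the peak factor of 10 is an illustrative assumption, not part of the stated requirements.

```python
# Daily volumes taken from the requirements above.
DAILY_VOLUME = {
    "push": 10_000_000,
    "sms": 1_000_000,
    "email": 5_000_000,
}
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
PEAK_FACTOR = 10  # assumption: peak traffic is ~10x the daily average


def average_per_second(daily: int) -> float:
    """Average sends per second if traffic were spread evenly over a day."""
    return daily / SECONDS_PER_DAY


total = sum(DAILY_VOLUME.values())
for channel, daily in DAILY_VOLUME.items():
    avg = average_per_second(daily)
    print(f"{channel:>5}: ~{avg:.0f}/s average, ~{avg * PEAK_FACTOR:.0f}/s at peak")
print(f"total: {total:,}/day, ~{average_per_second(total):.0f}/s average")
```

The 16 million daily notifications work out to roughly 185 per second on average, which is modest; the harder problems are bursts, slow third-party services, and reliability rather than raw throughput.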
This section shows the high-level design that supports various notification types: iOS push notification, Android push notification, SMS message, and email. It is structured as follows:
• Different types of notifications
• Contact info gathering flow
• Notification sending/receiving flow
We start by looking at how each notification type works at a high level.
We primarily need three components to send an iOS push notification:
• Provider: A provider builds and sends notification requests to Apple Push Notification service (APNs). To construct a push notification, the provider supplies the following data:
  • Device token: A unique identifier used for sending push notifications.
  • Payload: A JSON dictionary that contains a notification's payload.
• APNs: A remote service provided by Apple to propagate push notifications to iOS devices.
• iOS device: The end client, which receives push notifications.
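As a rough sketch of the provider side, the function below assembles the two pieces of data a provider supplies: the device token and the JSON payload. The top-level `aps` dictionary with `alert` (`title`, `body`) and `badge` keys follows Apple's documented payload format, and APNs addresses a device through the `/3/device/<token>` request path. The helper name and return shape are hypothetical, and the actual HTTP/2 call to APNs (with token- or certificate-based authentication) is deliberately left out.

```python
import json
from typing import Optional


def build_apns_request(device_token: str, title: str, body: str,
                       badge: Optional[int] = None) -> dict:
    """Assemble the target path and JSON payload for one APNs push (no network I/O)."""
    aps = {"alert": {"title": title, "body": body}}
    if badge is not None:
        aps["badge"] = badge  # number shown on the app icon
    return {
        # The device token identifies the target device in the request path.
        "path": f"/3/device/{device_token}",
        # Apple reads the notification content from the top-level "aps" dictionary.
        "payload": json.dumps({"aps": aps}),
    }


request = build_apns_request("a1b2c3", "Game Request", "Bob wants to play chess", badge=5)
print(request["path"])  # /3/device/a1b2c3
print(request["payload"])
```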
Android adopts a similar notification flow. Instead of APNs, Firebase Cloud Messaging (FCM) is commonly used to send push notifications to Android devices.
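The Android equivalent looks much the same. This sketch builds a request for the FCM HTTP v1 API, whose send endpoint is `projects/<project_id>/messages:send` and whose body wraps a `message` with a device `token` and a `notification` block. The helper name is hypothetical, and a real call must also carry an OAuth 2.0 access token in the `Authorization` header, which is omitted here.

```python
import json
from typing import Tuple


def build_fcm_request(project_id: str, device_token: str,
                      title: str, body: str) -> Tuple[str, str]:
    """Return the FCM HTTP v1 endpoint and JSON body for one push (no network I/O)."""
    url = f"https://fcm.googleapis.com/v1/projects/{project_id}/messages:send"
    body_json = json.dumps({
        "message": {
            "token": device_token,  # registration token of the target Android device
            "notification": {"title": title, "body": body},
        }
    })
    return url, body_json


url, payload = build_fcm_request("my-project", "android-token-xyz", "Sale", "50% off today")
print(url)
```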
For SMS messages, third-party SMS services such as Twilio [1], Nexmo [2], and many others are commonly used. Most of them are commercial services.
Although companies can set up their own email servers, many of them opt for commercial email services. Sendgrid [3] and Mailchimp [4] are among the most popular; they offer better delivery rates and data analytics.
Figure 10-6 shows the design after including all the third-party services.
To send notifications, we need to gather mobile device tokens, phone numbers, or email addresses. As shown in Figure 10-7, when a user installs our app or signs up for the first time, API servers collect user contact info and store it in the database.
Figure 10-8 shows simplified database tables to store contact info. Email addresses and phone numbers are stored in the user table, whereas device tokens are stored in the device table. A user can have multiple devices, so a push notification can be sent to all of the user's devices.
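The contact-info flow and the one-to-many relationship between users and devices can be sketched with an in-memory SQLite database. The column set here is a simplified assumption rather than the exact schema in Figure 10-8, and `register_device` / `device_tokens_for` are hypothetical helper names.

```python
import sqlite3

# Assumed, simplified schema: contact info on the user row, one row per device.
SCHEMA = """
CREATE TABLE user (
    user_id      INTEGER PRIMARY KEY,
    email        TEXT,
    phone_number TEXT
);
CREATE TABLE device (
    id           INTEGER PRIMARY KEY,
    device_token TEXT NOT NULL UNIQUE,
    user_id      INTEGER NOT NULL REFERENCES user(user_id)
);
"""


def register_device(conn: sqlite3.Connection, user_id: int, device_token: str) -> None:
    """Store a device token collected when the app is installed or the user signs in."""
    conn.execute(
        "INSERT OR IGNORE INTO device (device_token, user_id) VALUES (?, ?)",
        (device_token, user_id),
    )


def device_tokens_for(conn: sqlite3.Connection, user_id: int) -> list:
    """All device tokens for a user, so a push can fan out to every device."""
    rows = conn.execute(
        "SELECT device_token FROM device WHERE user_id = ? ORDER BY id", (user_id,)
    )
    return [token for (token,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO user VALUES (1, 'alice@example.com', '+15550100')")
register_device(conn, 1, "ios-token-abc")
register_device(conn, 1, "android-token-xyz")
print(device_tokens_for(conn, 1))  # ['ios-token-abc', 'android-token-xyz']
```

The `UNIQUE` constraint plus `INSERT OR IGNORE` makes re-registering the same token harmless, which matters because apps typically report their token on every launch.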
We will first present the initial design and then propose some optimizations.
Figure 10-9 shows the design, and each system component is explained below.
Service 1 to N: A service can be a microservice, a cron job, or a distributed system that triggers notification sending events. For example, a billing service sends emails to remind customers of payments due, or a shopping website sends SMS messages telling customers that their packages will be delivered tomorrow.
Notification system: The notification system is the centerpiece of sending/receiving notifications. To start simple, only one notification server is used. It provides APIs for services 1 to N and builds notification payloads for third-party services.
Third-party services: Third-party services are responsible for delivering notifications to users. When integrating with third-party services, we need to pay extra attention to extensibility. Good extensibility means a flexible system in which a third-party service can easily be plugged in or unplugged. Another important consideration is that a third-party service might be unavailable in new markets or might become unavailable in the future. For instance, FCM is unavailable in China, so alternative third-party services such as JPush and Pushy are used there.
iOS, Android, SMS, Email: Users receive notifications on their devices.
Three problems are identified in this design:
• Single point of failure (SPOF): A single notification server means a SPOF.
• Hard to scale: The notification system handles everything related to push notifications in one server, which makes it challenging to scale databases, caches, and the different notification processing components independently.
• Performance bottleneck: Processing and sending notifications can be resource-intensive. For example, constructing HTML pages and waiting for responses from third-party services can take time. Handling everything in one system can overload it, especially during peak hours.
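Returning to the extensibility requirement for third-party services described earlier, one way to make providers pluggable is a small registry keyed by channel and region, with a default fallback. Everything here is hypothetical: the `Provider` signature, the region codes, and the fake providers, which only record calls and do not contact FCM or any real China-market service.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical provider interface: (recipient, message) -> accepted by the provider?
# Real integrations (APNs, FCM, Twilio, SendGrid, ...) would wrap their own clients.
Provider = Callable[[str, str], bool]


class ProviderRegistry:
    """Lets third-party services be plugged in or unplugged per channel and region."""

    def __init__(self) -> None:
        self._providers: Dict[Tuple[str, str], Provider] = {}

    def register(self, channel: str, provider: Provider, region: str = "*") -> None:
        self._providers[(channel, region)] = provider

    def unregister(self, channel: str, region: str = "*") -> None:
        self._providers.pop((channel, region), None)

    def resolve(self, channel: str, region: str) -> Optional[Provider]:
        # Prefer a region-specific provider; otherwise fall back to the default ("*").
        return self._providers.get((channel, region)) or self._providers.get((channel, "*"))


# Fake providers that only record calls, standing in for FCM and a China-market alternative.
calls: List[Tuple[str, str]] = []


def fake_fcm(recipient: str, message: str) -> bool:
    calls.append(("fcm", recipient))
    return True


def fake_china_push(recipient: str, message: str) -> bool:
    calls.append(("china-push", recipient))
    return True


registry = ProviderRegistry()
registry.register("android", fake_fcm)                      # default for all regions
registry.register("android", fake_china_push, region="CN")  # FCM is unavailable in China
registry.resolve("android", "US")("token-1", "hello")
registry.resolve("android", "CN")("token-2", "hello")
print(calls)  # [('fcm', 'token-1'), ('china-push', 'token-2')]
```

Because callers only ever ask the registry for "the provider for this channel and region", swapping a vendor becomes a configuration change rather than a code change in every sender.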
After enumerating the challenges in the initial design, we improve it as follows:
• Move the database and cache out of the notification server.
• Add more notification servers and set up automatic horizontal scaling.
• Introduce message queues to decouple the system components.
Figure 10-10 shows the improved high-level design.
The best way to read the diagram above is from left to right:
Service 1 to N: They represent different services that send notifications via APIs provided by notification servers.
Notification servers: They provide the following functionalities:
• Provide APIs for services to send notifications. Those APIs are only accessible internally or by verified clients to prevent spam.
• Carry out basic validations to verify emails, phone numbers, etc.
• Query the database or cache to fetch data needed to render a notification.
• Put notification data into message queues for parallel processing.
Here is an example of the API to send an email:
POST https://api.example.com/v/sms/send
Cache: User info, device info, and notification templates are cached.
DB: It stores data about users, notifications, settings, etc.
Message queues: They remove dependencies between components. Message queues serve as buffers when high volumes of notifications are to be sent out. Each notification type is assigned a distinct message queue, so an outage in one third-party service will not affect other notification types.
Workers: Workers are a list of servers that pull notification events from message queues and send them to the corresponding third-party services.
Third-party services: Already explained in the initial design.
iOS, Android, SMS, Email: Already explained in the initial design.
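The notification server's front door (only verified callers, basic validation, then a hand-off to the per-type queue) can be sketched as follows. The request fields `channel` and `to`, the caller IDs, and both regular expressions are assumptions made for illustration; in-process `queue.Queue` objects stand in for a real message broker.

```python
import queue
import re
from typing import Dict

# One queue per notification type, so an outage in one provider stays isolated.
QUEUES: Dict[str, queue.Queue] = {ch: queue.Queue() for ch in ("ios", "android", "sms", "email")}
VERIFIED_CLIENTS = {"billing-service", "shipping-service"}  # hypothetical caller IDs

EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")  # deliberately loose format check
PHONE_RE = re.compile(r"^\+?[1-9]\d{6,14}$")          # E.164-style: digits only, max 15


class ValidationError(ValueError):
    """Raised when a send request fails basic validation."""


def handle_send(client_id: str, request: dict) -> None:
    """Authorize the caller, validate the request, and enqueue it for a worker."""
    if client_id not in VERIFIED_CLIENTS:
        raise PermissionError(f"unverified client: {client_id}")
    channel = request.get("channel")
    if channel not in QUEUES:
        raise ValidationError(f"unsupported channel: {channel!r}")
    to = str(request.get("to", ""))
    if channel == "email" and not EMAIL_RE.match(to):
        raise ValidationError(f"invalid email address: {to!r}")
    if channel == "sms" and not PHONE_RE.match(to):
        raise ValidationError(f"invalid phone number: {to!r}")
    # Return immediately; workers make the slow third-party calls asynchronously.
    QUEUES[channel].put(request)


handle_send("billing-service", {"channel": "email", "to": "alice@example.com",
                                "subject": "Payment due"})
```

Returning as soon as the event is queued is what turns slow third-party calls from a bottleneck on the API path into background work for the workers.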
Next, let us examine how the components work together to send a notification:
1. A service calls APIs provided by notification servers to send notifications.
2. Notification servers fetch metadata such as user info, device tokens, and notification settings from the cache or database.
3. A notification event is sent to the corresponding queue for processing. For instance, an iOS push notification event is sent to the iOS PN queue.
4. Workers pull notification events from message queues.
5. Workers send notifications to third-party services.
6. Third-party services send notifications to user devices.
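The six steps can be traced end to end in a single process. This is a simplified sketch, not a production layout: dictionaries stand in for the database and cache, a `queue.Queue` stands in for the iOS PN queue, and `fake_apns` is a hypothetical stub that records what it would have sent. Step 2 uses a cache-aside lookup (read the cache, fall back to the database, then fill the cache).

```python
import queue
from typing import Callable, Dict, List, Tuple

# Stand-ins (all hypothetical): a database, a cache in front of it, and a fake APNs client.
DATABASE: Dict[int, List[str]] = {42: ["ios-token-1", "ios-token-2"]}
CACHE: Dict[int, List[str]] = {}
QUEUES: Dict[str, queue.Queue] = {"ios": queue.Queue()}
delivered: List[Tuple[str, str]] = []


def fake_apns(token: str, text: str) -> None:
    delivered.append((token, text))


PROVIDERS: Dict[str, Callable[[str, str], None]] = {"ios": fake_apns}


def device_tokens(user_id: int) -> List[str]:
    """Step 2: read from the cache, fall back to the database, then fill the cache."""
    if user_id not in CACHE:
        CACHE[user_id] = DATABASE.get(user_id, [])
    return CACHE[user_id]


def send_notification(user_id: int, channel: str, text: str) -> int:
    """Steps 1-3: the API call fetches metadata and enqueues one event per device."""
    tokens = device_tokens(user_id)
    for token in tokens:
        QUEUES[channel].put({"token": token, "text": text})
    return len(tokens)


def run_worker_once(channel: str) -> int:
    """Steps 4-5: a worker drains its queue and hands each event to the third-party service."""
    q = QUEUES[channel]
    processed = 0
    while True:
        try:
            event = q.get_nowait()
        except queue.Empty:
            return processed
        PROVIDERS[channel](event["token"], event["text"])  # step 6 happens on the provider side
        processed += 1


send_notification(42, "ios", "Your package arrives tomorrow")
run_worker_once("ios")
print(delivered)
```

In the real design the API servers and workers are separate fleets connected by the queue, so either side can be scaled without touching the other.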
In the high-level design, we discussed different types of notifications, the contact info gathering flow, and the notification sending/receiving flow. In the deep dive, we will explore the following:
• Reliability.
• Additional components and considerations: notification templates, notification settings, rate limiting, the retry mechanism, security in push notifications, monitoring queued notifications, and event tracking.
• Updated design.
We must answer a few important reliability questions when designing a notification system in distributed environments.
One of the most important requirements in a notification system is that it cannot lose data. Notifications can usually be delayed or re-ordered, but never lost. To satisfy this requirement, the notification system persists notification data in a database and implements a retry mechanism. The notification log database is included for data persistence, as shown in Figure 10-11.
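A minimal sketch of the notification log, assuming a single table with a `status` column: a notification is written as `PENDING` before it is queued and flipped to `SENT` only after the provider accepts it, so anything left `PENDING` (for example, after a worker crash) can be found and re-queued. The schema, status values, and function names are assumptions for illustration; an in-memory SQLite database stands in for the real store.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE notification_log (
           id      TEXT PRIMARY KEY,
           user_id INTEGER NOT NULL,
           channel TEXT NOT NULL,
           body    TEXT NOT NULL,
           status  TEXT NOT NULL DEFAULT 'PENDING'
       )"""
)


def record(user_id: int, channel: str, body: str) -> str:
    """Persist the notification *before* it is queued, so a crash cannot lose it."""
    notification_id = str(uuid.uuid4())
    with conn:  # commits the transaction on success
        conn.execute(
            "INSERT INTO notification_log (id, user_id, channel, body) VALUES (?, ?, ?, ?)",
            (notification_id, user_id, channel, body),
        )
    return notification_id


def mark_sent(notification_id: str) -> None:
    """Called by a worker once the third-party service has accepted the notification."""
    with conn:
        conn.execute("UPDATE notification_log SET status = 'SENT' WHERE id = ?",
                     (notification_id,))


def pending_for_retry() -> list:
    """Anything still PENDING (e.g., a worker died mid-send) can be re-queued."""
    return [row[0] for row in conn.execute(
        "SELECT id FROM notification_log WHERE status = 'PENDING'")]
```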
Will recipients receive a notification exactly once? The short answer is no. Although a notification is delivered exactly once most of the time, the distributed nature of the system can result in duplicate notifications. To reduce duplicates, we introduce a dedupe mechanism and handle each failure case carefully. Here is a simple dedupe logic: when a notification event first arrives, we check whether it has been seen before by looking up its event ID. If it has, it is discarded; otherwise, we send out the notification. Interested readers can refer to [5] to explore why exactly-once delivery is not achievable.
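The dedupe logic above fits in a few lines. This sketch keeps seen event IDs in an in-memory set guarded by a lock so that the check and the mark happen atomically. It is only an illustration: with many worker servers, the seen-set would have to live in a shared store with an expiry (for example, a Redis `SET` with the `NX` and `EX` options), which is an assumption and not part of the original design.

```python
import threading
from typing import Callable, Set


class Deduplicator:
    """Drop events whose ID has been seen before (in-memory sketch)."""

    def __init__(self) -> None:
        self._seen: Set[str] = set()
        self._lock = threading.Lock()

    def first_time(self, event_id: str) -> bool:
        # Check-and-mark must be atomic; otherwise two workers could both see "new".
        with self._lock:
            if event_id in self._seen:
                return False
            self._seen.add(event_id)
            return True


def process(event: dict, dedupe: Deduplicator, send: Callable[[dict], None]) -> bool:
    """Send the event unless its ID was already seen; return whether it was sent."""
    if not dedupe.first_time(event["event_id"]):
        return False  # duplicate: discard
    send(event)
    return True
```

Note the trade-off this ordering makes: the ID is marked before sending, so a crash between the two steps drops that notification, while marking after sending would instead risk a duplicate. That unavoidable choice is exactly why exactly-once delivery cannot be guaranteed.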
We have discussed how to collect user contact info and how to send and receive a notification. A notification system is a lot more than that. Here we discuss additional components, including template reuse, notification settings, event tracking, system monitoring, and rate limiting.
A large notification system sends out millions of notifications per day, and many of these notifications follow a similar format. Notification templates are introduced to avoid building every notification from scratch. A notification template is a preformatted notification that can be customized with parameters, styling, tracking links, etc. to create a unique notification. Here is an example template for push notifications:
BODY: You dreamed of it. We dared it. [ITEM NAME] is back — only until [DATE].
CTA: Order Now. Or, Save My [ITEM NAME]
The benefits of using notification templates include maintaining a consistent format, reducing the margin of error, and saving time.
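Rendering such a template amounts to substituting named placeholders. This sketch treats any `[UPPER CASE NAME]` as a placeholder, mirroring the `[ITEM NAME]` and `[DATE]` style of the example, and raises an error if a parameter is missing so that a half-filled notification never goes out. The template dictionary and `render` function are hypothetical.

```python
import re
from typing import Dict

# Placeholders are written as [UPPER CASE NAME], as in the example template.
PLACEHOLDER = re.compile(r"\[([A-Z][A-Z ]*)\]")

PROMO_TEMPLATE = {
    "body": "You dreamed of it. We dared it. [ITEM NAME] is back, only until [DATE].",
    "cta": "Order Now. Or, Save My [ITEM NAME]",
}


def render(template: Dict[str, str], params: Dict[str, str]) -> Dict[str, str]:
    """Fill every placeholder in every part of the template; fail loudly on a missing one."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in params:
            raise KeyError(f"missing template parameter: {name}")
        return params[name]

    return {part: PLACEHOLDER.sub(substitute, text) for part, text in template.items()}


print(render(PROMO_TEMPLATE, {"ITEM NAME": "Winter Blend", "DATE": "Jan 31"}))
```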
Users generally receive far too many notifications daily and can easily feel overwhelmed. Thus, many websites and apps give users fine-grained control over notification settings. This information is stored in the notification setting table, with the following fields:
user_id bigInt
channel varchar # push notification, email, or SMS
opt_in boolean # opt in to receive notifications
Before any notification is sent to a user, we first check whether the user has opted in to receive this type of notification.
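The opt-in check can be sketched with the same three fields as the table above. What to do when a user has no row for a channel is a product decision the text leaves open; this sketch assumes "not opted in" by default and makes that default configurable. The class names are hypothetical, and a dictionary stands in for the table.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class NotificationSetting:
    user_id: int
    channel: str  # "push", "email", or "sms"
    opt_in: bool


class SettingsStore:
    """In-memory stand-in for the notification setting table."""

    def __init__(self, default_opt_in: bool = False) -> None:
        self._rows: Dict[Tuple[int, str], bool] = {}
        self._default = default_opt_in  # assumption: used when a user has no row for a channel

    def save(self, setting: NotificationSetting) -> None:
        self._rows[(setting.user_id, setting.channel)] = setting.opt_in

    def can_send(self, user_id: int, channel: str) -> bool:
        """Checked before every send; users who opted out are skipped."""
        return self._rows.get((user_id, channel), self._default)
```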
To avoid overwhelming users with too many notifications, we can limit the number of notifications a user can receive. This is important because recipients may turn off notifications completely if we send them too often.
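One common way to cap per-user frequency is a sliding-window log: remember the timestamps of recent sends per user and refuse a new one once the window is full. The text does not prescribe an algorithm, so this choice, the class name, and the example limit are all assumptions. Timestamps are passed in explicitly to keep the sketch deterministic.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class PerUserRateLimiter:
    """Sliding-window log: allow at most `limit` notifications per user per `window` seconds."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._sent: Dict[int, Deque[float]] = defaultdict(deque)

    def allow(self, user_id: int, now: float) -> bool:
        timestamps = self._sent[user_id]
        # Forget sends that have slid out of the window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True


limiter = PerUserRateLimiter(limit=3, window=3600)  # e.g., at most 3 per user per hour
```

Whether a rejected notification is dropped or deferred to a later window is a separate product choice; promotional messages are often dropped, while transactional ones are usually exempt from the cap.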
When a third-party service fails to send a notification, the notification will be added to the message queue for retrying. If the problem persists, an alert will be sent out to developers.
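The retry-then-alert behavior can be sketched as a bounded loop. In the design above a failed notification goes back onto the message queue; this simplified variant retries in-process instead, and adds exponential backoff between attempts, which is an assumption rather than something the text specifies. `send`, `alert`, and `sleep` are injected so the logic can be exercised without real providers or real delays.

```python
import time
from typing import Callable


def send_with_retry(send: Callable[[], None], alert: Callable[[str], None],
                    max_attempts: int = 3, base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Retry a failing third-party call with exponential backoff; alert once retries run out."""
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return True
        except Exception as exc:  # in practice, catch the provider's specific errors
            if attempt == max_attempts:
                alert(f"giving up after {attempt} attempts: {exc}")
                return False
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    return False
```

Backoff keeps a struggling provider from being hammered by every worker at once, and the final alert is what surfaces a persistent problem to developers.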
For iOS or Android apps, an appKey and appSecret are used to secure push notification APIs [6]. Only authenticated or verified clients are allowed to send push notifications using our APIs. Interested readers should refer to the reference material [6].
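The text does not say how the appKey/appSecret pair is checked. One common scheme, assumed here purely for illustration, is request signing: the client sends its appKey, a timestamp, and an HMAC-SHA256 signature of the request computed with its appSecret, and the server recomputes and compares the signature. The credential store, message format, and five-minute skew limit are all hypothetical.

```python
import hashlib
import hmac
import time
from typing import Dict, Optional

# Hypothetical registry of issued credentials: appKey -> appSecret.
APP_SECRETS: Dict[str, bytes] = {"app-123": b"s3cr3t-value"}
MAX_SKEW_SECONDS = 300  # reject stale requests to limit replay


def sign(app_secret: bytes, timestamp: int, body: bytes) -> str:
    """Signature the client attaches to its request."""
    message = str(timestamp).encode() + b"." + body
    return hmac.new(app_secret, message, hashlib.sha256).hexdigest()


def verify(app_key: str, timestamp: int, body: bytes, signature: str,
           now: Optional[float] = None) -> bool:
    """Server-side check: known appKey, fresh timestamp, and a matching signature."""
    secret = APP_SECRETS.get(app_key)
    if secret is None:
        return False  # unknown client
    now = time.time() if now is None else now
    if abs(now - timestamp) > MAX_SKEW_SECONDS:
        return False
    expected = sign(secret, timestamp, body)
    return hmac.compare_digest(expected, signature)  # constant-time comparison
```

The appSecret itself never travels over the wire; only the signature does, and any change to the body or timestamp invalidates it.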
A key metric to monitor is the total number of queued notifications. If the number is large, workers are not processing notification events fast enough. To avoid delays in notification delivery, more workers are needed. Figure 10-12 (credit to [7]) shows an example of queued messages to be processed.
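Turning the queue-depth metric into a scaling decision takes a little arithmetic: the workers must clear the backlog within some target time while also keeping up with new arrivals. The formula, parameter names, and bounds below are an illustrative assumption, not a prescribed autoscaling policy.

```python
import math


def workers_needed(queued: int, arrival_rate: float, per_worker_rate: float,
                   drain_seconds: float, min_workers: int = 1, max_workers: int = 100) -> int:
    """Workers required to clear the backlog within `drain_seconds` while keeping up with arrivals.

    queued          -- notifications currently waiting in the queue
    arrival_rate    -- new notifications per second
    per_worker_rate -- notifications one worker can send per second
    """
    if per_worker_rate <= 0 or drain_seconds <= 0:
        raise ValueError("per_worker_rate and drain_seconds must be positive")
    required_rate = queued / drain_seconds + arrival_rate
    needed = math.ceil(required_rate / per_worker_rate)
    return max(min_workers, min(max_workers, needed))


print(workers_needed(queued=120_000, arrival_rate=500, per_worker_rate=50, drain_seconds=60))  # 50
```

Here 120,000 queued notifications drained over 60 seconds need 2,000/s, plus 500/s of new traffic, for 2,500/s in total; at 50/s per worker that is 50 workers.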
Notification metrics, such as open rate, click rate, and engagement, are important in understanding customer behavior. An analytics service implements event tracking, so integration between the notification system and the analytics service is usually required. Figure 10-13 shows an example of events that might be tracked for analytics purposes.
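Once lifecycle events reach the analytics service, metrics like the ones above fall out of simple counting. The event names here are assumptions for illustration and may not match Figure 10-13 exactly; each notification is counted at most once per event type so that repeated opens or clicks do not inflate the rates.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical lifecycle events reported to the analytics service.
TRACKED_EVENTS = ("sent", "delivered", "open", "click", "unsubscribe", "error")


def engagement_metrics(events: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute delivery, open, and click rates from (notification_id, event_name) pairs."""
    seen: Dict[str, Set[str]] = {name: set() for name in TRACKED_EVENTS}
    for notification_id, name in events:
        if name in seen:
            seen[name].add(notification_id)  # count each notification at most once per event
    sent = max(len(seen["sent"]), 1)  # guard against division by zero
    delivered = max(len(seen["delivered"]), 1)
    return {
        "delivery_rate": len(seen["delivered"]) / sent,
        "open_rate": len(seen["open"]) / delivered,
        "click_rate": len(seen["click"]) / delivered,
    }
```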
Putting everything together, Figure 10-14 shows the updated notification system design.
In this design, many new components are added in comparison with the previous design.
• The notification servers are equipped with two more critical features: authentication and rate limiting.
• We also add a retry mechanism to handle notification failures. If the system fails to send notifications, they are put back in the message queue and the workers retry a predefined number of times.
• Furthermore, notification templates provide a consistent and efficient notification creation process.
• Finally, monitoring and tracking systems are added for system health checks and future improvements.
Notifications are indispensable because they keep us posted on important information. It could be a push notification about your favorite movie on Netflix, an email about discounts on new products, or a message confirming your online shopping payment.
In this chapter, we described the design of a scalable notification system that supports multiple notification formats: push notification, SMS message, and email. We adopted message queues to decouple system components.
Besides the high-level design, we dug deep into more components and optimizations.
• Reliability: We proposed a robust retry mechanism to minimize the failure rate.
• Security: An appKey/appSecret pair is used to ensure that only verified clients can send notifications.
• Tracking and monitoring: These are implemented at any stage of a notification flow to capture important stats.
• Respect user settings: Users may opt out of receiving notifications. Our system checks user settings before sending notifications.
• Rate limiting: Users will appreciate a frequency cap on the number of notifications they receive.
Congratulations on getting this far! Now give yourself a pat on the back. Good job!
[1] Twilio SMS: https://www.twilio.com/sms
[2] Nexmo SMS: https://www.nexmo.com/products/sms
[3] Sendgrid: https://sendgrid.com/
[4] Mailchimp: https://mailchimp.com/
[5] You Cannot Have Exactly-Once Delivery: https://bravenewgeek.com/you-cannot-have-exactly-once-delivery/
[6] Security in Push Notifications: https://cloud.ibm.com/docs/services/mobilepush?topic=mobile-pushnotification-security-in-push-notifications
[7] RabbitMQ: https://bit.ly/2sotIa6
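Author: Eric
Article link:
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under the BY-NC-SA license. Please credit the source when reposting!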