2023-06-05

CHAPTER 9: DESIGN A WEB CRAWLER


In this chapter, we focus on web crawler design: an interesting and classic system design
interview question. 
A web crawler is also known as a robot or spider. It is widely used by search engines to discover
new or updated content on the web. Content can be a web page, an image, a video, a PDF
file, etc. A web crawler starts by collecting a few web pages and then follows links on those
pages to collect new content. Figure 9-1 shows a visual example of the crawl process.

System Design Interview: An Insider's Guide, 08: Design a URL Shortener

2023-06-05

👨‍🎓 无限进步

CHAPTER 8: DESIGN A URL SHORTENER



In this chapter, we will tackle an interesting and classic system design interview question:
designing a URL shortening service like tinyurl.

System Design Interview: An Insider's Guide, 07: Design a Unique ID Generator in Distributed Systems

2023-06-02

CHAPTER 7: DESIGN A UNIQUE ID GENERATOR IN DISTRIBUTED SYSTEMS


In this chapter, you are asked to design a unique ID generator in distributed systems. Your
first thought might be to use a primary key with the auto_increment attribute in a traditional
database. However, auto_increment does not work in a distributed environment because a
single database server is not large enough, and generating unique IDs across multiple
databases with minimal delay is challenging.
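To make the multi-database problem concrete, here is a toy model in which each database server keeps its own auto_increment counter. The AutoIncrementTable class is hypothetical and only mimics a counter that starts at 1; it shows that, without coordination between servers, the same IDs are handed out more than once.

```python
import itertools


class AutoIncrementTable:
    """Hypothetical toy model of one database's auto_increment primary key."""

    def __init__(self):
        # Starts at 1, like a fresh auto_increment column.
        self._counter = itertools.count(1)

    def insert(self):
        # Each insert receives the next value of this server's local counter.
        return next(self._counter)


# Two database servers, each generating IDs independently with no coordination.
db1 = AutoIncrementTable()
db2 = AutoIncrementTable()

ids = [db1.insert(), db2.insert(), db1.insert(), db2.insert()]
duplicates = sorted({i for i in ids if ids.count(i) > 1})

print(ids)         # [1, 1, 2, 2]
print(duplicates)  # [1, 2]
```

Avoiding these collisions would require the servers to coordinate on every insert (for example, by sharing a single counter), and that extra round trip is the delay the chapter warns about.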
Here are a few examples of unique IDs:

2023-06-02

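Preface

Under normal conditions, data in a Redis cluster is distributed evenly across the nodes, and requests are spread evenly across the shards. In some special scenarios, however, such as external crawlers, attacks, or trending products, a few keys receive an enormous number of requests in a short period. The classic example is a celebrity announcing a divorce on Weibo: curious onlookers flood in and the Weibo comment feature crashes. Because every request for the same key is routed to the same data shard, that shard's load spikes and it becomes a bottleneck, which can lead to an avalanche (cascading failure) and other problems.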
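Why a hot key overloads a single shard follows from how Redis Cluster routes keys: each key maps to one of 16384 hash slots via CRC16(key) mod 16384, and each slot is served by exactly one master. The sketch below assumes a hypothetical three-shard layout with contiguous slot ranges and made-up key names, and it ignores Redis hash tags ({...}) for brevity.

```python
from collections import Counter

NUM_SLOTS = 16384  # Redis Cluster uses a fixed number of hash slots

# Hypothetical 3-master layout: each shard owns a contiguous slot range.
SHARD_RANGES = [(0, 5460), (5461, 10922), (10923, 16383)]


def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XMODEM variant), the checksum Redis Cluster uses for key slots."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    # HASH_SLOT = CRC16(key) mod 16384; hash tags such as {user}:1 are not handled here.
    return crc16_xmodem(key.encode("utf-8")) % NUM_SLOTS


def shard_for(key: str) -> int:
    """Return the index of the (hypothetical) shard that owns this key's slot."""
    slot = key_slot(key)
    for shard, (low, high) in enumerate(SHARD_RANGES):
        if low <= slot <= high:
            return shard
    raise ValueError(f"slot {slot} is not assigned")


# 300 ordinary keys spread across shards, plus 10,000 requests for one trending key.
requests = [f"user:{i}" for i in range(300)] + ["weibo:hot_post"] * 10_000
load = Counter(shard_for(key) for key in requests)

hot_shard = shard_for("weibo:hot_post")
print("per-shard requests:", dict(load))
print("hot key lands on shard", hot_shard)
```

Adding more shards does not relieve this pressure, because the slot for weibo:hot_post is fixed by its name; every request for it still lands on one master.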

2023-06-01

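Why Physical Foreign Keys Are Not Recommended

While designing databases in the past, I thought about this question: when should foreign keys be used, and when should they not?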
