To achieve horizontal scaling, it is important to distribute requests/data efficiently and evenly across servers. Consistent hashing is a commonly used technique to achieve this goal. But first, let us take an in-depth look at the problem.
If you have n cache servers, a common way to balance the load is to use the following hash method: serverIndex = hash(key) % N, where N is the size of the server pool. Let us use an example to illustrate how it works. As shown in Table 5-1, we have 4 servers and 8 string keys with their hashes.
Table 5-1

| key  | hash     | hash % 4 |
|------|----------|----------|
| key0 | 18358617 | 1        |
| key1 | 26143584 | 0        |
| key2 | 18131146 | 2        |
| key3 | 35863496 | 0        |
| key4 | 34085809 | 1        |
| key5 | 27581703 | 3        |
| key6 | 38164978 | 2        |
| key7 | 22530351 | 3        |
To fetch the server where a key is stored, we perform the modulo operation hash(key) % 4. For instance, hash(key0) % 4 = 1 means a client must contact server 1 to fetch the cached data. Figure 5-1 shows the distribution of keys based on Table 5-1.
This approach works well when the size of the server pool is fixed and the data distribution is even. However, problems arise when new servers are added or existing servers are removed. For example, if server 1 goes offline, the size of the server pool becomes 3. Using the same hash function, we get the same hash value for a key, but applying the modulo operation gives us different server indexes because the number of servers is reduced by 1. We get the results shown in Table 5-2 by applying hash % 3:

Table 5-2

| key  | hash     | hash % 3 |
|------|----------|----------|
| key0 | 18358617 | 0        |
| key1 | 26143584 | 0        |
| key2 | 18131146 | 1        |
| key3 | 35863496 | 2        |
| key4 | 34085809 | 1        |
| key5 | 27581703 | 0        |
| key6 | 38164978 | 1        |
| key7 | 22530351 | 0        |
Figure 5-2 shows the new distribution of keys based on Table 5-2.
As shown in Figure 5-2, most keys are redistributed, not just the ones originally stored on the offline server (server 1). This means that when server 1 goes offline, most cache clients will connect to the wrong servers to fetch data. This causes a storm of cache misses. Consistent hashing is an effective technique to mitigate this problem.
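To make the rehashing problem concrete, here is a minimal Python sketch (an illustration, not code from the original text) that applies serverIndex = hash % N to the hash values from Table 5-1, first with 4 servers and then with 3, and counts how many keys end up on a different server:

```python
# Illustrative sketch of the rehashing problem using the hash values from Table 5-1.
key_hashes = {
    "key0": 18358617, "key1": 26143584, "key2": 18131146, "key3": 35863496,
    "key4": 34085809, "key5": 27581703, "key6": 38164978, "key7": 22530351,
}

def server_index(key_hash: int, num_servers: int) -> int:
    """serverIndex = hash % N, where N is the size of the server pool."""
    return key_hash % num_servers

before = {k: server_index(h, 4) for k, h in key_hashes.items()}  # 4 servers online
after = {k: server_index(h, 3) for k, h in key_hashes.items()}   # server 1 goes offline

moved = [k for k in key_hashes if before[k] != after[k]]
print(f"{len(moved)} of {len(key_hashes)} keys are remapped: {moved}")
```

With these hash values, 6 of the 8 keys are remapped, which matches the observation in Figure 5-2 that most keys move even though only one server went offline.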
Quoted from Wikipedia: "Consistent hashing is a special kind of hashing such that when a hash table is re-sized and consistent hashing is used, only k/n keys need to be remapped on average, where k is the number of keys, and n is the number of slots. In contrast, in most traditional hash tables, a change in the number of array slots causes nearly all keys to be remapped [1]".
Now that we understand the definition of consistent hashing, let us find out how it works. Assume SHA-1 is used as the hash function f, and the output range of the hash function is: x0, x1, x2, x3, ..., xn. In cryptography, SHA-1's hash space goes from 0 to 2^160 - 1. That means x0 corresponds to 0, xn corresponds to 2^160 - 1, and all the other hash values in the middle fall between 0 and 2^160 - 1. Figure 5-3 shows the hash space.
By connecting both ends, we get a hash ring as shown in Figure 5-4:
Using the same hash function f, we map servers based on server IP or name onto the ring. Figure 5-5 shows that 4 servers are mapped on the hash ring.
One thing worth mentioning is that the hash function used here is different from the one in "the rehashing problem," and there is no modulo operation. As shown in Figure 5-6, 4 cache keys (key0, key1, key2, and key3) are hashed onto the hash ring.
To determine which server a key is stored on, we go clockwise from the key position on the ring until a server is found. Figure 5-7 explains this process. Going clockwise, key0 is stored on server 0; key1 is stored on server 1; key2 is stored on server 2; and key3 is stored on server 3.
Using the logic described above, adding a new server will only require redistribution of a fraction of the keys. In Figure 5-8, after a new server 4 is added, only key0 needs to be redistributed; key1, key2, and key3 remain on the same servers. Let us take a close look at the logic. Before server 4 is added, key0 is stored on server 0. Now, key0 will be stored on server 4 because server 4 is the first server it encounters by going clockwise from key0's position on the ring. The other keys are not redistributed under the consistent hashing algorithm.
When a server is removed, only a small fraction of keys require redistribution with consistent hashing. In Figure 5-9, when server 1 is removed, only key1 must be remapped to server 2. The rest of the keys are unaffected.
The consistent hashing algorithm was introduced by Karger et al. at MIT [1]. The basic steps are:
• Map servers and keys onto the ring using a uniformly distributed hash function.
• To find out which server a key is mapped to, go clockwise from the key position until the first server on the ring is found, as illustrated in the sketch below.
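These two steps map naturally onto a sorted list of ring positions plus a binary search for the clockwise lookup. The following is a minimal Python sketch of the idea, not the authors' code; it uses SHA-1 (as in the text), and names such as ConsistentHashRing and get_server are made up for this example:

```python
import bisect
import hashlib

def hash_fn(value: str) -> int:
    """SHA-1 maps a string to an integer in [0, 2^160 - 1], i.e. a point on the ring."""
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

class ConsistentHashRing:
    """Basic consistent hashing (no virtual nodes yet)."""

    def __init__(self):
        self._ring = []      # sorted positions of servers on the ring
        self._servers = {}   # position -> server name

    def add_server(self, server: str) -> None:
        position = hash_fn(server)
        bisect.insort(self._ring, position)
        self._servers[position] = server

    def remove_server(self, server: str) -> None:
        position = hash_fn(server)
        self._ring.remove(position)
        del self._servers[position]

    def get_server(self, key: str) -> str:
        """Go clockwise from the key's position until the first server is found."""
        position = hash_fn(key)
        index = bisect.bisect_right(self._ring, position)
        if index == len(self._ring):  # wrap around past the end of the ring
            index = 0
        return self._servers[self._ring[index]]

ring = ConsistentHashRing()
for name in ["server0", "server1", "server2", "server3"]:
    ring.add_server(name)
print(ring.get_server("key0"))  # the server that stores key0
```

Note that where each key lands depends on where SHA-1 happens to place the server and key names on the ring, so the assignments will not necessarily match Figures 5-6 and 5-7.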
Two problems are identified with this approach. First, it is impossible to keep the same size of partitions on the ring for all servers, considering that a server can be added or removed. A partition is the hash space between adjacent servers. It is possible that the size of the partitions on the ring assigned to each server is very small or fairly large. In Figure 5-10, if s1 is removed, s2's partition (highlighted with the bidirectional arrows) is twice as large as s0's and s3's partitions.
Second, it is possible to have a non-uniform key distribution on the ring. For instance, if servers are mapped to the positions listed in Figure 5-11, most of the keys are stored on server 2, while server 1 and server 3 have no data. A technique called virtual nodes or replicas is used to solve these problems.
A virtual node refers to a real node, and each server is represented by multiple virtual nodes on the ring. In Figure 5-12, both server 0 and server 1 have 3 virtual nodes. The 3 is arbitrarily chosen; in real-world systems, the number of virtual nodes is much larger. Instead of using s0, we have s0_0, s0_1, and s0_2 to represent server 0 on the ring. Similarly, s1_0, s1_1, and s1_2 represent server 1 on the ring. With virtual nodes, each server is responsible for multiple partitions. Partitions (edges) with label s0 are managed by server 0. On the other hand, partitions with label s1 are managed by server 1.
To find which server a key is stored on, we go clockwise from the key's location and find the first virtual node encountered on the ring. In Figure 5-13, to find out which server k0 is stored on, we go clockwise from k0's location and find virtual node s1_1, which refers to server 1.
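Virtual nodes only change how servers are placed on the ring: each server is hashed several times, once per virtual node, and the clockwise lookup stays the same. Below is a hedged sketch under the same assumptions as before (the s0_0-style suffix naming is just one possible convention):

```python
import bisect
import hashlib

def hash_fn(value: str) -> int:
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

class VirtualNodeRing:
    """Consistent hashing with virtual nodes (replicas) per server."""

    def __init__(self, replicas: int = 3):  # real systems use far more than 3
        self.replicas = replicas
        self._ring = []      # sorted positions of all virtual nodes
        self._owners = {}    # virtual node position -> real server name

    def add_server(self, server: str) -> None:
        for i in range(self.replicas):
            position = hash_fn(f"{server}_{i}")   # e.g. s0_0, s0_1, s0_2
            bisect.insort(self._ring, position)
            self._owners[position] = server

    def get_server(self, key: str) -> str:
        """Clockwise from the key, the first virtual node's owner stores the key."""
        index = bisect.bisect_right(self._ring, hash_fn(key)) % len(self._ring)
        return self._owners[self._ring[index]]

ring = VirtualNodeRing(replicas=3)
ring.add_server("s0")
ring.add_server("s1")
print(ring.get_server("k0"))  # whichever server owns the next virtual node clockwise
```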
As the number of virtual nodes increases, the distribution of keys becomes more balanced. This is because the standard deviation gets smaller with more virtual nodes, leading to balanced data distribution. Standard deviation measures how data are spread out. The outcome of an experiment carried out by online research [2] shows that with one or two hundred virtual nodes, the standard deviation is between 5% (200 virtual nodes) and 10% (100 virtual nodes) of the mean. The standard deviation will be smaller when we increase the number of virtual nodes. However, more space is needed to store data about virtual nodes. This is a tradeoff, and we can tune the number of virtual nodes to fit our system requirements.
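The relationship between the number of virtual nodes and balance can be checked empirically. The sketch below is an illustrative simulation (not the experiment from [2]): it hashes a large number of keys onto rings with different replica counts and reports the standard deviation of the per-server key counts as a percentage of the mean.

```python
import bisect
import hashlib
import statistics

def hash_fn(value: str) -> int:
    return int(hashlib.sha1(value.encode("utf-8")).hexdigest(), 16)

def key_counts(num_servers: int, replicas: int, num_keys: int) -> list:
    """Count how many of num_keys synthetic keys land on each server."""
    positions = []
    owners = {}
    for s in range(num_servers):
        for r in range(replicas):
            p = hash_fn(f"server{s}_{r}")
            positions.append(p)
            owners[p] = s
    positions.sort()
    counts = [0] * num_servers
    for k in range(num_keys):
        i = bisect.bisect_right(positions, hash_fn(f"key{k}")) % len(positions)
        counts[owners[positions[i]]] += 1
    return counts

for replicas in (1, 10, 100, 200):
    counts = key_counts(num_servers=10, replicas=replicas, num_keys=100_000)
    spread = statistics.pstdev(counts) / statistics.mean(counts) * 100
    print(f"{replicas:>3} virtual nodes per server -> std dev ~ {spread:.1f}% of mean")
```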
When a server is added or removed, a fraction of data needs to be redistributed. How can we find the affected range to redistribute the keys? In Figure 5-14, server 4 is added onto the ring. The affected range starts from s4 (the newly added node) and moves anticlockwise around the ring until a server is found (s3). Thus, keys located between s3 and s4 need to be redistributed to s4.
When a server (s1) is removed as shown in Figure 5-15, the affected range starts from s1 (the removed node) and moves anticlockwise around the ring until a server is found (s0). Thus, keys located between s0 and s1 must be redistributed to s2.
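Finding the affected range amounts to locating the immediate anticlockwise neighbour of the added or removed node among the sorted ring positions. A small sketch under simplified assumptions (integer positions and illustrative values chosen for this example):

```python
import bisect

def affected_range(ring_positions: list, node_position: int) -> tuple:
    """Return (start, end] of the hash range that moves when a node at
    node_position is added to or removed from the ring.

    ring_positions: sorted positions of the *other* nodes on the ring.
    The range starts at the first node found going anticlockwise from
    node_position and ends at node_position itself.
    """
    index = bisect.bisect_left(ring_positions, node_position) - 1
    predecessor = ring_positions[index]  # index -1 wraps around the ring
    return (predecessor, node_position)

# Example with small illustrative positions: s0=10, s2=50, s3=80 on the ring,
# and s4 added at position 30. Keys in (10, 30] move to s4.
print(affected_range([10, 50, 80], 30))  # -> (10, 30)
```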
In this chapter, we had an in-depth discussion about consistent hashing, including why it is needed and how it works. The benefits of consistent hashing include:
• Only a minimal set of keys is redistributed when servers are added or removed.
• It is easy to scale horizontally because data are more evenly distributed.
• It mitigates the hotspot key problem. Excessive access to a specific shard could cause server overload. Imagine data for Katy Perry, Justin Bieber, and Lady Gaga all end up on the same shard. Consistent hashing helps to mitigate the problem by distributing the data more evenly.

Consistent hashing is widely used in real-world systems, including some notable ones:
• Partitioning component of Amazon's Dynamo database [3]
• Data partitioning across the cluster in Apache Cassandra [4]
• Discord chat application [5]
• Akamai content delivery network [6]
• Maglev network load balancer [7]

Congratulations on getting this far! Now give yourself a pat on the back. Good job!
[1] Consistent hashing: https://en.wikipedia.org/wiki/Consistent_hashing
[2] Consistent Hashing: https://tom-e-white.com/2007/11/consistent-hashing.html
[3] Dynamo: Amazon's Highly Available Key-value Store: https://www.allthingsdistributed.com/files/amazon-dynamo-sosp2007.pdf
[4] Cassandra - A Decentralized Structured Storage System: http://www.cs.cornell.edu/Projects/ladis2009/papers/Lakshman-ladis2009.PDF
[5] How Discord Scaled Elixir to 5,000,000 Concurrent Users: https://blog.discord.com/scaling-elixir-f9b8e1e7c29b
[6] CS168: The Modern Algorithmic Toolbox, Lecture #1: Introduction and Consistent Hashing: http://theory.stanford.edu/~tim/s16/l/l1.pdf
[7] Maglev: A Fast and Reliable Software Network Load Balancer: https://static.googleusercontent.com/media/research.google.com/en//pubs/archive/44824.pdf