Computer Science

Consistent Hashing

Visualize consistent hashing with virtual nodes for distributed caching and load balancing.

Consistent Hashing

Concept Overview

Consistent hashing is a distributed hashing scheme that operates independently of the number of servers or objects in a distributed hash table. In traditional hash tables, changing the number of array slots causes nearly all keys to be remapped. Consistent hashing minimizes reorganization: when a hash table is resized, only k/n keys need to be remapped on average, where k is the number of keys and n is the number of slots or servers.

Mathematical Definition

Consistent hashing maps both data (keys) and nodes (servers) to the same circular hash space (or "ring"). The hash space is typically defined by a hash function returning values in the range [0, 2L-1], where L is the number of bits in the hash (e.g., L=160 for SHA-1).

// Ring Mapping
H(x) → [0, 2L-1]
// Server and Key Mapping
Server Position: si = H(Server_IDi)
Key Position: kj = H(Key_IDj)
// Assignment Rule
A key kj is assigned to the first server si encountered moving clockwise on the ring:
Assigned_Server(kj) = min(si) such that si ≥ kj
(If no such si exists, wrap around to the smallest si)

Key Concepts

The Rehashing Problem

In standard modulo hashing, a key is assigned to a server using index = H(key) % N, where N is the number of servers. If a server is added or removed, N changes, and the modulo operation yields different results for almost all keys. This causes a massive cache invalidation (a "thundering herd" problem) in distributed systems. Consistent hashing solves this by decoupling the hashing from the exact number of servers.

Virtual Nodes (Vnodes)

A raw consistent hash ring can lead to non-uniform data distribution if servers are placed randomly on the ring, resulting in some servers taking disproportionately large segments of the hash space. To fix this, we use "virtual nodes". Each physical server is assigned multiple positions (virtual nodes) on the ring.

By increasing the number of virtual nodes, the standard deviation of load distribution decreases, ensuring a much more balanced load across all physical servers. Virtual nodes also allow for heterogeneous clusters, where a more powerful server can simply be assigned more virtual nodes than a weaker one.

Historical Context

Consistent hashing was introduced by David Karger et al. at MIT in 1997 in their seminal paper "Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web". It was originally developed as a way to distribute requests among a changing population of web caches and was fundamental to the design of Akamai's content delivery network (CDN).

Real-world Applications

  • Distributed NoSQL Databases: Systems like Apache Cassandra and Amazon DynamoDB use consistent hashing (with virtual nodes) to partition data across a cluster of nodes. It allows them to scale incrementally without massive data movement.
  • Content Delivery Networks (CDNs): CDNs use consistent hashing to distribute web content to edge servers. If an edge server goes down, only a fraction of the cached content needs to be re-fetched from the origin.
  • Distributed Caching Systems: Memcached clients and Redis Cluster utilize consistent hashing algorithms to route keys to specific cache instances, ensuring high cache hit rates even during rolling restarts or scaling events.
  • Load Balancing: Advanced load balancers (like HAProxy or Envoy) use consistent hashing to provide "sticky sessions" or session affinity, ensuring requests for a specific user ID always hit the same backend server while allowing backend servers to be added or removed gracefully.

Related Concepts

  • Rendezvous Hashing (Highest Random Weight Hashing) — an alternative to consistent hashing that solves the same problem without a ring structure
  • Distributed Hash Tables (DHT) — decentralized systems that use consistent hashing (like Chord) for peer-to-peer data storage
  • Bloom Filter — another fundamental distributed systems data structure for fast set membership queries

Experience it interactively

Adjust parameters, observe in real time, and build deep intuition with Riano’s interactive Consistent Hashing module.

Try Consistent Hashing on Riano →

More in Computer Science