Cuckoo Hashing: A Clever Way to Resolve Collisions
Discover how Cuckoo Hashing efficiently resolves hash collisions using multiple hash functions and an eviction strategy. Learn how it ensures $O(1)$ worst-case lookups, supports deletions, and compares with traditional collision resolution techniques in high-performance applications.
A hash table is a fundamental data structure that provides fast insertion, deletion, and lookup operations, typically in $O(1)$ time. The core idea is simple: we have an underlying array where data is stored, and each item is mapped to an index using a hash function. By doing so, we can efficiently locate values without scanning the entire dataset.
However, no data structure is perfect, and hash tables come with their own set of challenges. One of the most common is hash collisions - a scenario where two different items are assigned the same index in the array. Since an array has a finite number of slots, collisions are inevitable once there are more possible keys than slots, no matter how well-designed the hash function is (see the pigeonhole principle for more detail). The **birthday paradox** tells us that the chance of a collision grows surprisingly quickly as the number of entries in the hash table increases.
Several solutions to hash collisions have been introduced throughout the history of computer science. Some of the most famous are:
Separate Chaining: Using linked lists (or other structures) to store multiple elements at the same index.
Linear Probing (Open Addressing): Resolving collisions by searching for the next available slot in the table.
Double Hashing (Open Addressing): Using a second independent hash function for collision resolution.
Cuckoo Hashing: Using two (or more) hash functions to resolve collisions.
In this blog, I want to introduce and take a deep dive into Cuckoo Hashing, which stands out for its worst-case $O(1)$ lookup time and efficient memory usage.
Cuckoo hashing is a hash table scheme using 2 (or more) hash tables $T_1, T_2, \dots, T_n$, each with $r$ buckets (slots). Each hash table is associated with a corresponding hash function $h_1, h_2, \dots, h_n$. An item $x$ is stored in exactly one of those hash tables, and where $x$ goes is determined by the hash function associated with that table: the output of the hash function is the index (bucket/slot) where we put the item.
⚠️ Please note that we can have more than 2 hash tables/hash functions, but the common approach is to use 2
From now on, we stick to just 2 hash tables ( $T_1$ & $T_2$) & 2 hash functions ($h_1$, $h_2$)
The deletion process in Cuckoo Hashing is straightforward—simply remove the item from its designated bucket by clearing the slot where it is stored. Since each key has a fixed number of possible locations, deletion is $O(1)$ worst-case time and does not require additional handling, such as tombstones used in other hashing schemes.
```
func delete(item):
    if T1[h1(item)] == item:
        T1[h1(item)] = null
    else if T2[h2(item)] == item:
        T2[h2(item)] = null
```
In the traditional version of Cuckoo Hashing, we always start inserting at $T_1$: we hash the item $x$ using hash function $h_1$ to get the chosen bucket in $T_1$.
If the bucket is empty, we place the item in that bucket of $T_1$ and stop there.
If the chosen bucket is held by another key $y$, then $y$ is evicted to its alternative position in the other hash table. For example, if $y$ is currently in table $T_1$, we compute $h_2(y)$ and move $y$ to $T_2$. If the new bucket for $y$ in the other hash table is also taken, by a key $z$, we still place $y$ in that bucket and continue by evicting $z$ in the same way. We continue this process until every key involved in the eviction chain has found an empty slot.
In this case, we have a series of keys to be updated: $x_1 = x$, $x_2 = y$, $x_3 = z$, … - let's call them nested keys.
Here’s the pseudo code
```
func insert(x):
    while True:
        if T1[h1(x)] is empty:
            T1[h1(x)] = x
            return True
        # Evict current item in T1
        swap x with T1[h1(x)]
        if T2[h2(x)] is empty:
            T2[h2(x)] = x
            return True
        # Evict current item in T2
        swap x with T2[h2(x)]
```
This is the case where all the nested keys are unique: we never encounter a chain where $x_1 = x$, $x_2 = y$, $x_3 = z$, and then $x_4 = x$ at some point.
For example, we want to add $x_1$ to the hash tables, and $h_1(x_1)$ points to bucket (2) in $T_1$.
Bucket (2) is already taken by $x_2$, so we evict $x_2$ and hash it with $h_2(x_2)$ to find its bucket in $T_2$: its new bucket is bucket (3) in $T_2$.
a. If the nested keys don’t contain any loop — figure 1
Unfortunately, bucket (3) in $T_2$ is already occupied by $x_3$.
a. If the nested keys don’t contain any loop — figure 2
So we continue by evicting $x_3$: rehashing it with $h_1(x_3)$ gives bucket (4) in $T_1$, which is empty.
a. If the nested keys don’t contain any loop — figure 3
So now, all the nested keys are placed, and we’re done
a. If the nested keys don’t contain any loop — figure 4
In this case, we have a nested key that is visited more than once. In this scenario, there are 2 smaller cases.
The loop exists, but we still find empty slots for all nested keys.
b. If the nested key involves a loop. — figure 1
In this case, evicting $x_5$ places it in the bucket taken by $x_1$, and rehashing $x_1$ with $h_2(x_1)$ leads to an empty slot in $T_2$ (remember that at the beginning we hashed $x_1$ using $h_1(x_1)$).
The loop exists and we can't find empty slots for all nested keys, since the keys evict each other in an infinite cycle. Sorry for not having an illustration for this case; you can imagine the nested keys bouncing each other around the same buckets forever.
When we encounter this infinite eviction process, it's usually because the load factor (the number of taken buckets divided by the total number of buckets) is high. To avoid it, we can:
Enlarge the table size (typically 2x in size)
Rehash all existing items in the hash tables.
One notable issue here is: what if we receive other requests during the rehashing?
If it's a write request (insert or delete), we wait for the rehashing to finish before processing it.
If it's a read request (lookup), we have a couple of solutions; here are 2 of them:
Pause lookups until rehashing completes: apply the same strategy as for write requests and force the client to wait until the rehashing is done.
Two versions of the hash tables co-exist: Redis also uses this approach for its hash table when it needs to rehash. In short, we keep the old hash tables and continue serving lookups from them; once rehashing is done, we switch to the new hash tables. (You can read more detail at link)
So the pseudo code shown in the last section lacks 2 things:
A stop condition for the process, since we can encounter an infinite eviction process. Let's say we have a variable named maxLoop: we stop the eviction process once we exceed maxLoop iterations.
So what should the value of maxLoop be?
Typical value: $O(\log n)$ (logarithmic in table size).
For small tables: maxLoop ≈ 10–50 is usually sufficient.
For large tables: higher values may be needed, but too high a limit leads to slow insertions.
A rehash step for when we encounter infinite eviction.
Here’s the full pseudo code for insert process:
```
func insert(x):
    for i from 0 to maxLoop:
        if T1[h1(x)] is empty:
            T1[h1(x)] = x
            return True
        # Evict current item in T1
        swap x with T1[h1(x)]
        if T2[h2(x)] is empty:
            T2[h2(x)] = x
            return True
        # Evict current item in T2
        swap x with T2[h2(x)]
    # If we reach this point, we hit the eviction limit (likely an infinite eviction)
    rehash()
    insert(x)  # x now holds the last evicted key; rehash() re-placed everything else
```
Unlike Chaining (which has $O(n)$ worst-case lookup time) and Linear Probing (which suffers from clustering), Cuckoo Hashing guarantees worst-case $O(1)$ lookups and deletions, since we only need to check one bucket in each of the 2 (or more) hash tables.
Even though insertion is $O(1)$ in the best case and $O(\log n)$ on average (the length of the eviction chain for a new item), it still has an $O(n)$ worst case when a rehash is required.
Chaining and Open Addressing avoid this particular issue, since they never require a full rehash because of insertion failures (though they are typically resized too once the load factor grows).
ByteDance has published a paper about their Monolith recommendation system, in which the authors address the critical issue of hash key collisions in embedding tables, which can significantly degrade model quality in recommendation systems. To tackle this, they introduce a collisionless embedding table built on a Cuckoo HashMap. This approach ensures that each feature ID maps to a unique embedding, effectively eliminating collisions. The Cuckoo HashMap assigns multiple potential locations to each key and relocates existing keys when necessary to accommodate new entries, maintaining efficient, collision-free storage. Additionally, to manage memory usage, the system incorporates mechanisms like expirable embeddings and frequency filtering, which remove infrequently accessed or obsolete embeddings. This design not only preserves the integrity of embeddings but also optimizes the memory footprint, leading to improved model performance and scalability in real-time recommendation scenarios.
Here's my simple Cuckoo Hashing implementation in Go, with some modifications & optimizations. You guys can take a look, and if there's any issue or something wrong, feel free to comment or create an issue in the GitHub repo. 🙏 🙏🙏
Cuckoo Hashing is a high-performance hashing scheme that ensures O(1) worst-case lookups using multiple hash functions and eviction-based collision resolution. It excels in read-heavy applications like networking, caching, and databases but has trade-offs: higher memory overhead (~50% empty space), complex insertions, and costly rehashing (O(n) worst-case). Despite these, it remains a top choice when predictable and fast lookups are critical.