Robin Hood Hashing – Fairness in Collision Resolution


A hash table is a fundamental data structure that is widely used in real systems. The core idea is simple: we give the hash table a key and a value, it stores that key-value pair internally, and later returns the value when we query the key again. A standard hash table typically consists of:

  • A fixed-size backing array where entries are stored.
  • A hash function that transforms an input key (which may be an integer, string, pair, tuple, or any custom type) into a numeric hash value.
  • An optional index function that maps the hash value to a bucket in the backing array. If not provided, the hash value is used directly as the bucket index.

On insertion, we locate an available bucket using the hash function and the optional index function and store the entry there. On lookup, we use the hash function and the optional index function to locate the bucket where the entry resides and return the stored value.

This simple idea gives us an amortized time complexity of O(1).

One major problem with hash tables is collision. No matter how good the hash function is, there is always a chance that two different keys will map to the same bucket. This happens because the key space is usually much larger than the number of buckets in the backing array. For example, if the backing array has size 100 and we have 1000 integers, then even if the hash function produces 1000 distinct hash values, those values still have to be mapped into only 100 buckets. As a result, multiple keys must eventually land in the same bucket.

  • The famous birthday paradox is a good analogy. It states that among 23 or more people, the probability that at least two of them share the same birthday is greater than 1/2. In other words, if we choose a random function that maps 23 records into a table of size 365, the probability that no two keys map to the same location is only 0.4927.
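We can verify that 0.4927 figure with a short standalone Go sketch (illustration only, not part of the hash table code):

```go
package main

import "fmt"

// noCollisionProb returns the probability that `records` keys hashed
// uniformly into `slots` buckets all land in distinct buckets:
// slots/slots * (slots-1)/slots * ... * (slots-records+1)/slots.
func noCollisionProb(records, slots int) float64 {
	p := 1.0
	for i := 0; i < records; i++ {
		p *= float64(slots-i) / float64(slots)
	}
	return p
}

func main() {
	fmt.Printf("%.4f\n", noCollisionProb(23, 365)) // 0.4927
}
```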

Since collisions are inevitable, there has been a great deal of research on how to resolve them. One of the most well-known families of techniques is open addressing.

In open addressing, when a collision occurs, the algorithm probes alternative positions based on a predefined probing sequence until an empty bucket is found.

It is called open addressing because of the following characteristic:

  • Each key has a home bucket: the bucket that its hash value maps to.
  • If that bucket is occupied, we search for another bucket (another address) in the same table.

In other words, the address of the key remains “open” for probing.

Some notable schemes in open addressing are the following.

Given a hash table of size n, and a home bucket i:

  • Linear probing: the probe step is 1, and the probe sequence is (i + 1) % n, (i + 2) % n, (i + 3) % n, and so on.
  • Quadratic probing: the probe step is 1², 2², 3², and so on. The probe sequence is (i + 1²) % n, (i + 2²) % n, (i + 3²) % n, and so on.
  • Double hashing: we use a second hash function to determine the step size. With two hash functions h1(k) and h2(k), the probe sequence is (h1(k) + 0·h2(k)) % n, (h1(k) + 1·h2(k)) % n, (h1(k) + 2·h2(k)) % n, and so on.
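To make the probe sequences concrete, here is a standalone sketch that prints the first few probes of each scheme for a table of size 7, home bucket 3, and a hypothetical second hash value of 5 for double hashing:

```go
package main

import "fmt"

func main() {
	const n = 7   // table size
	i, h2 := 3, 5 // home bucket; h2 is a hypothetical second hash value

	for step := 1; step <= 3; step++ {
		fmt.Printf("step %d  linear: %d  quadratic: %d  double: %d\n",
			step,
			(i+step)%n,      // linear probing
			(i+step*step)%n, // quadratic probing
			(i+step*h2)%n)   // double hashing
	}
}
```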

In this article, we will look at another open addressing scheme: Robin Hood hashing.

Throughout the article, we will compare Robin Hood hashing with linear probing, since linear probing is the most basic and well-known open-addressing scheme.

The name of this scheme comes from Robin Hood, the legendary outlaw who “steals from the rich and gives to the poor.”

In Robin Hood hashing, we define the Distance to Initial Bucket (DIB) as the distance between the bucket where an entry is actually stored and its home bucket, that is, the bucket where its hash value originally maps.

A smaller DIB means the entry is “richer,” because it is closer to its ideal position. A larger DIB means the entry is “poorer,” because it has already probed farther.

The core rule of Robin Hood hashing is:

When inserting, if the incoming entry has a larger DIB than the current occupant, it takes the bucket, and the displaced entry continues probing.

Let us walk through the insertion algorithm step by step:

  1. Compute the home bucket.
  2. If the bucket is empty, place the entry there.
  3. Otherwise, continue probing until an empty bucket is found.
    1. Compare the incoming entry's DIB with the existing entry's DIB.
    2. Swap them if the incoming entry has a larger DIB than the existing entry.
    3. Continue probing with the displaced entry.

Let’s take a look at an example:

Assume a hash table of size 7 with the following hash function:

h(k) = k % 7

For simplicity, we use the same expression for both the hash function and the index function. In a real implementation, these are often treated separately. We will return to this point in the implementation section.

We insert the keys in order:

A, B, C, X

With:

  • A → 0
  • B → 1
  • C → 2
  • X → 0

Step 1: Insert A

A hashes to index 0 and is placed directly.

Index:  0   1   2   3   4   5   6
Table:  A   -   -   -   -   -   -
DIB:    0

Step 2: Insert B

B hashes to index 1, which is empty.

Index:  0   1   2   3   4   5   6
Table:  A   B   -   -   -   -   -
DIB:    0   0

Step 3: Insert C

C hashes to index 2, which is also empty.

Index:  0   1   2   3   4   5   6
Table:  A   B   C   -   -   -   -
DIB:    0   0   0

At this point, all elements are in their home buckets.

Step 4: Insert X

X hashes to index 0, which is occupied. We begin probing.

At index 0

  • X has DIB = 0
  • A has DIB = 0

Since the DIBs are equal, no swap occurs. X moves forward.

At index 1

  • X has DIB = 1
  • B has DIB = 0

Since X has a larger DIB, it takes the bucket.

After the swap:

Index:  0   1   2   3   4   5   6
Table:  A   X   C   -   -   -   -
DIB:    0   1   0

Now B is displaced and continues probing with DIB = 1.

Continue inserting displaced B

At index 2:

  • B has DIB = 1
  • C has DIB = 0

Again, B has a larger DIB, so it swaps with C.

Index:  0   1   2   3   4   5   6
Table:  A   X   B   -   -   -   -
DIB:    0   1   1

Now C is displaced and continues probing.

Continue inserting displaced C

At index 3:

  • bucket is empty → insert C

Index:  0   1   2   3   4   5   6
Table:  A   X   B   C   -   -   -
DIB:    0   1   1   1

Final state

Index:  0   1   2   3   4   5   6
Table:  A   X   B   C   -   -   -
DIB:    0   1   1   1

We can also observe that, in Robin Hood hashing, colliding keys tend to end up adjacent to one another, and their DIB values form a non-decreasing sequence. This is one of the key properties that enables an important lookup optimization.
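The whole walkthrough can be reproduced with a tiny standalone sketch (toy code for illustration only: string keys, hard-coded home buckets as in the example, no resizing and no wrap-around handling):

```go
package main

import "fmt"

type slot struct {
	key string
	dib int // -1 marks an empty bucket
}

// rhInsert inserts key with home bucket `home` using the Robin Hood rule.
func rhInsert(table []slot, home int, key string) {
	dib := 0
	for i := home; ; i++ {
		if table[i].dib < 0 { // empty bucket: place the entry here
			table[i] = slot{key, dib}
			return
		}
		if dib > table[i].dib { // incoming entry is "poorer": steal the bucket
			table[i], key, dib = slot{key, dib}, table[i].key, table[i].dib
		}
		dib++ // keep probing (with the displaced entry, if a swap happened)
	}
}

func main() {
	table := make([]slot, 7)
	for i := range table {
		table[i].dib = -1
	}
	// Home buckets from the example: A→0, B→1, C→2, X→0.
	rhInsert(table, 0, "A")
	rhInsert(table, 1, "B")
	rhInsert(table, 2, "C")
	rhInsert(table, 0, "X")
	for i, s := range table[:4] {
		fmt.Printf("bucket %d: %s (DIB %d)\n", i, s.key, s.dib)
	}
	// bucket 0: A (DIB 0), bucket 1: X (DIB 1),
	// bucket 2: B (DIB 1), bucket 3: C (DIB 1)
}
```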

For lookup, a standard open-addressing table starts from the home bucket and keeps probing until it either finds the key or reaches an empty bucket.

Robin Hood hashing allows us to stop earlier.

During lookup, we may stop when either of the following conditions holds:

  • The current bucket has a smaller DIB than the current probe distance of the search key. This follows from the insertion rule. If our key had been displaced this far, it would have stolen the bucket from any entry with a smaller DIB. Therefore, once we encounter an entry with a smaller DIB, our key cannot appear later in the probe sequence.
  • If we maintain the maximum probe length seen so far during insertion, we may also stop once the current probe distance exceeds that value. If the key exists, it must appear within that maximum probe length.
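The first stop rule can be sketched as a standalone function, reusing the final A, X, B, C layout from the example (again toy code, no wrap-around):

```go
package main

import "fmt"

type slot struct {
	key string
	dib int // -1 marks an empty bucket
}

// rhLookup returns the bucket index of key (home bucket `home`), or -1.
func rhLookup(table []slot, home int, key string) int {
	for i, dist := home, 0; ; i, dist = i+1, dist+1 {
		if table[i].dib < dist { // empty (dib = -1) or "richer" than us:
			return -1 // the key cannot appear later in the probe sequence
		}
		if table[i].key == key {
			return i
		}
	}
}

func main() {
	// Final layout from the example: A X B C with DIBs 0 1 1 1.
	table := []slot{{"A", 0}, {"X", 1}, {"B", 1}, {"C", 1}, {"", -1}, {"", -1}, {"", -1}}
	fmt.Println(rhLookup(table, 0, "X")) // 1
	// "Z" is a key that also hashes to bucket 0 but was never inserted;
	// the search stops at bucket 2 (DIB 1 < probe distance 2),
	// before ever reaching an empty bucket.
	fmt.Println(rhLookup(table, 0, "Z")) // -1
}
```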

Deleting an entry from a hash table is not as simple as clearing a bucket.

In open addressing, lookup stops when it reaches an empty bucket. Because of that, if we simply remove an entry and leave the bucket empty, we may accidentally break the probe chain and make later entries unreachable.

For example, suppose we have four keys: A, B, C, and D.

All of them have the same home bucket 0, so with linear probing, after insertion we end up with the following table:

Index:  0   1   2   3   4   5   6
Table:  A   B   C   D   -   -   -

Now, we delete key B, and the hash table becomes:

Index:  0   1   2   3   4   5   6
Table:  A   -   C   D   -   -   -

Now suppose we search for key D. Since its hash value maps to 0, the lookup starts from bucket 0. It then continues to bucket 1, which is now empty, so the search stops and incorrectly reports that D is not present, even though it actually resides at bucket 3.

A common workaround in open addressing is to mark deleted entries with a special state instead of clearing them completely. This gives us three possible bucket states: empty, occupied, and deleted. A deleted bucket is treated as occupied during lookup, but as available during insertion. Such logically deleted entries are usually called tombstones.

Tombstones preserve correctness, but they create another problem. After enough insertions and deletions, the table may contain many tombstones. At that point, unsuccessful lookups can become very expensive because the probe sequence must pass through those tombstones rather than stopping early.

A further optimization is to use backward shift deletion.

When an entry is deleted, we shift subsequent entries backward until we reach either an empty bucket or an entry whose DIB is 0.

Every time an entry shifts backward by one bucket, its DIB also decreases by 1.

Let us return to the previous example where we delete key B.

To make the Robin Hood layout easier to read, the DIB row below shows the DIB value of each entry.

Index:  0   1   2   3   4   5   6
Table:  A   B   C   D   -   -   -
DIB:    0   1   2   3

Now let us see what happens when we delete key B:

Index:  0   1   2   3   4   5   6
Table:  A   C   D   -   -   -   -
DIB:    0   1   2

Step 1: Key B is deleted.

Step 2: We shift C into B's bucket and decrease its DIB by 1, from 2 to 1.

Step 3: We shift D into C's previous bucket and decrease its DIB by 1, from 3 to 2.

We stop after D because bucket 4 is empty.
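The shift loop can be sketched in a few standalone lines (toy code matching this example, where A, B, C, and D all share home bucket 0):

```go
package main

import "fmt"

type slot struct {
	key string
	dib int // -1 marks an empty bucket
}

// rhDelete clears bucket idx, then shifts the following run backward until
// it hits an empty bucket or an entry already in its home bucket (DIB 0).
func rhDelete(table []slot, idx int) {
	for {
		next := idx + 1
		if next >= len(table) || table[next].dib <= 0 {
			table[idx] = slot{"", -1} // end of the run
			return
		}
		// Shift the next entry back one bucket and decrease its DIB by 1.
		table[idx] = slot{table[next].key, table[next].dib - 1}
		idx = next
	}
}

func main() {
	// A, B, C, D all hash to bucket 0: layout A B C D with DIBs 0 1 2 3.
	table := []slot{{"A", 0}, {"B", 1}, {"C", 2}, {"D", 3}, {"", -1}, {"", -1}, {"", -1}}
	rhDelete(table, 1) // delete B
	for i, s := range table[:4] {
		fmt.Printf("bucket %d: %q (DIB %d)\n", i, s.key, s.dib)
	}
	// bucket 0: "A" (DIB 0), bucket 1: "C" (DIB 1),
	// bucket 2: "D" (DIB 2), bucket 3: "" (DIB -1)
}
```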

This approach is more expensive than logical deletion with tombstones. In the worst case, if we delete an entry near the front of a long cluster, we may need to shift many subsequent entries. The exact cost depends on the quality of the hash function and the current load factor. In practice, under a good hash function and a controlled load factor, the expected number of shifted entries is usually small, although the worst case can still be linear in the cluster size.

On average, all core operations (insertion, lookup, and deletion) run in O(1) time, assuming:

  • a reasonably uniform hash function
  • a controlled load factor, typically below 0.8

Insertion and lookup follow the usual probing process, with the extra step of swapping during insertion when necessary. Although this introduces more work than linear probing, the expected probe length remains small, so the overall average cost stays constant.

Deletion is slightly more involved because Robin Hood hashing typically uses backward-shift deletion to preserve the probe invariant. While deletion may require shifting multiple entries, the expected number of shifts remains small under normal conditions. As a result, deletion is also O(1) on average.

However, it is important to note that all operations can degrade to O(n) in the worst case, especially under poor hashing or very high load factors. This is not unique to Robin Hood hashing; it is a general characteristic of open-addressing hash tables.

Variance measures how spread out probe lengths are across all keys.

  • Low variance means most keys have similar probe lengths.
  • High variance means some keys are very cheap to find, while others are much more expensive.

For example, suppose we have five keys: A, B, C, D, and E. Keys A and E map to the same home bucket 0, while B, C, and D map to buckets 1, 2, and 3, respectively.

If we insert them in the order A, B, C, D, E, then linear probing produces the following table.

Index:  0   1   2   3   4   5   6
Table:  A   B   C   D   E   -   -
DIB:    0   0   0   0   4

With Robin Hood hashing, however, we get:

Index:  0   1   2   3   4   5   6
Table:  A   E   B   C   D   -   -
DIB:    0   1   1   1   1

We can see that the probe lengths in Robin Hood hashing are much more evenly distributed.
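Using the DIB values from the two layouts (0, 0, 0, 0, 4 under linear probing versus 0, 1, 1, 1, 1 under Robin Hood), a quick standalone computation confirms the difference:

```go
package main

import "fmt"

// variance computes the population variance of the given probe lengths.
func variance(dibs []float64) float64 {
	mean := 0.0
	for _, d := range dibs {
		mean += d
	}
	mean /= float64(len(dibs))
	v := 0.0
	for _, d := range dibs {
		v += (d - mean) * (d - mean)
	}
	return v / float64(len(dibs))
}

func main() {
	linear := []float64{0, 0, 0, 0, 4}    // A B C D E under linear probing
	robinHood := []float64{0, 1, 1, 1, 1} // A E B C D under Robin Hood
	fmt.Println(variance(linear))         // ≈ 2.56
	fmt.Println(variance(robinHood))      // ≈ 0.16
}
```

Both layouts have the same mean probe length (0.8), but Robin Hood hashing spreads the cost evenly instead of concentrating it on one unlucky key.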

All elements are stored directly in a flat array. During probing:

  • access patterns are sequential
  • memory accesses are predictable
  • CPU prefetching works effectively

This makes both lookups and insertions efficient on modern hardware.

While Robin Hood hashing improves lookup predictability and reduces the variance of probe lengths, it also introduces several trade-offs that are worth understanding in practice.

Unlike linear probing, insertion in Robin Hood hashing is not simply a matter of finding the next empty bucket.

  • It may require swapping entries when the incoming key has a larger probe distance.
  • Those swaps may cascade, displacing multiple entries along the probe sequence.

As a result, insertion incurs additional overhead and can be less predictable, especially when clusters become long.

To maintain correctness, Robin Hood hashing usually uses backward-shift deletion instead of tombstones.

  • After deleting an entry, subsequent entries may need to be shifted backward.
  • This continues until we reach either:
    • an empty bucket, or
    • an entry with DIB 0

Although the expected cost is small under typical conditions, the worst case can still be linear in the cluster size.

This makes deletion more expensive than simpler approaches.

Robin Hood hashing performs more memory writes than simpler open-addressing schemes:

  • swaps during insertion
  • shifts during deletion

This increases write amplification, which may hurt performance in write-heavy workloads or memory-bandwidth-constrained systems.

Compared to linear probing, Robin Hood hashing is more complex to implement correctly.

  • It requires maintaining or computing the Distance to Initial Bucket (DIB).
  • The insertion and deletion logic must carefully preserve the core invariants.

This added complexity makes the implementation harder to reason about and increases the risk of subtle bugs.

As the load factor increases:

  • probe sequences become longer
  • more swaps occur during insertion
  • deletion becomes more expensive

Although Robin Hood hashing handles high load factors better than plain linear probing, its performance still degrades as the table approaches capacity.


It is time to get our hands dirty.

The walkthrough below is distilled from a small Go reference implementation (rbhashing).

Let's start with the core data structures and their responsibilities.

The two main data structures are entry and table.

go
package main

// --- bucket: one key/value plus its distance from its ideal bucket (DIB) ---
type entry[K comparable, V any] struct {
	dib   int8 // distance-from-ideal-bucket; -1 = empty
	key   K
	value V
}

// --- table: metadata + backing array (logical size vs allocated length) ---
type table[K comparable, V any] struct {
	entries      []entry[K, V]
	hasher       func(key K) int64
	sizeMinusOne int64 // logical table is (sizeMinusOne + 1); power of two
	numElements  int64
	size         int64
	loadFactor   float64
	maxLookup    int8 // cap on probe length before grow
}

// --- routing ---
// getBucket(hash int64) int64          // home index from hash
// allocEntries(size int64)             // allocate entries + sentinel
// getMaxProb() int8                    // set maxLookup from table size
// grow()                                // grow the table and rehash
// rehash(newSize int64)                 // rehash the table

// --- operations ---
// Insert(key K, value V) bool          // Robin Hood insert; split fast/slow path
// Get(key K) (bool, V)                 // lookup; same probe stop rule as Insert
// Delete(key K) bool                   // delete the key
// Clear()                              // clear the table

hasher maps a key to a 64-bit hash, and getBucket turns that hash into a starting index. Everything else lives in entries, a flat slice that is probed forward according to Robin Hood rules.

Flow: hash → home index → linear probe in entries. The skeleton omits grow, rehash, and helper methods such as replace; grow and rehash follow the same sizing rules as allocEntries.

dib is the first field so that the hot probe loop usually touches only the first byte of each bucket. Empty buckets use dib < 0, while active entries use dib >= 0.

go
// dib is the first field so the probe loop reads only the first byte of each
// entry on every step (same layout as ska::detailv3::sherwood_v3_entry).
type entry[K comparable, V any] struct {
	dib   int8 // distance-from-ideal-bucket; -1 = empty
	key   K
	value V
}

func (e *entry[K, V]) replace(key K, newValue V, newDIB int8) {
	e.dib = newDIB
	e.key = key
	e.value = newValue
}

func (e *entry[K, V]) IsEmpty() bool {
	return e.dib < 0
}

func (e *entry[K, V]) Equal(key K) bool {
	return e.dib >= 0 && e.key == key
}

The logical table length is always a power of two, both when the table is first created and when it grows. That allows the home bucket to be computed with hash & sizeMinusOne instead of hash % size. The bitwise AND simply keeps the lower log2(size) bits of the hash, whereas modulo typically requires a division-based remainder operation, which is more expensive.

go
func (t *table[K, V]) getBucket(hash int64) int64 {
	return hash & t.sizeMinusOne
}
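As a quick sanity check, masking and modulo agree for power-of-two sizes as long as the hash is non-negative (standalone sketch; note that in Go a negative hash would make % return a negative remainder while & would not):

```go
package main

import "fmt"

func main() {
	const size = 1 << 4 // 16, a power of two
	const sizeMinusOne = size - 1

	for _, hash := range []int64{0, 7, 16, 12345, 1 << 40} {
		// Keeping the low log2(size) bits is the same as taking the remainder.
		fmt.Println(hash&sizeMinusOne == hash%size) // true for every value
	}
}
```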

The logical table size is size, and maxLookup is the maximum probe distance we allow during insertion or lookup. For that reason, the backing slice is allocated with length size + maxLookup, rather than just size.

The purpose of this extra space is to avoid wrap-around during probing. In a conventional circular table, once probing reaches the end of the array, it must wrap back to 0, 1, and so on. That usually requires modulo arithmetic, which adds cost to the hot path. Instead, this implementation allocates a small overflow region beyond the logical table so probing can continue linearly without wrapping.

In other words, the target entry should never be more than maxLookup slots away from its home bucket. That assumption allows us to replace circular probing with straight-line probing over a slightly larger slice.

Initially, every slot is marked as empty with dib == -1. The final slot acts as a sentinel with dib == 0, which allows the probe loop to keep incrementing i without needing an explicit bounds check on every iteration. Once probing reaches that sentinel, it stops naturally.

go
// allocEntries creates the flat entry slice and places the sentinel.
func (t *table[K, V]) allocEntries(size int64) {
	total := size + int64(t.maxLookup)
	t.entries = make([]entry[K, V], total)
	for i := range t.entries {
		t.entries[i].dib = -1
	}
	t.entries[total-1].dib = 0
}

maxLookup is roughly log2(size), with a floor such as MinProb = 4. If insertion would exceed that distance, the implementation grows the table and retries. This matches the idea from the lookup section that the maximum probe distance can be used as an additional stopping condition.

go
const MinProb = 4

func (t *table[K, V]) getMaxProb() int8 {
	v := int8(math.Log2(float64(t.size)))
	if v < MinProb {
		return MinProb
	}
	return v
}

Insert first walks forward while occupant.dib >= incoming.dib. Duplicate keys are updated in this fast path, without swapping. Once that condition no longer holds, insertSlowPath performs the Robin Hood swaps.

Marking insertSlowPath with //go:noinline keeps the common scan-only loop smaller and easier for the compiler to optimize.

After each swap, if dib reaches maxLookup, the table grows and the displaced key is inserted again. This is the implementation counterpart of the maximum-probe-length idea discussed earlier.

go
func (t *table[K, V]) Insert(key K, value V) bool {
	if float64(t.numElements+1) > float64(t.size)*t.loadFactor {
		t.grow()
	}

	hash := t.hasher(key)
	i := t.getBucket(hash)
	dib := int8(0)

	for ; t.entries[i].dib >= dib; dib++ {
		if t.entries[i].Equal(key) {
			t.entries[i].value = value
			return true
		}
		i++
	}

	return t.insertSlowPath(i, dib, key, value)
}

//go:noinline
func (t *table[K, V]) insertSlowPath(i int64, dib int8, key K, value V) bool {
	for {
		if t.entries[i].IsEmpty() {
			t.entries[i].replace(key, value, dib)
			t.numElements++
			return true
		}

		key, t.entries[i].key = t.entries[i].key, key
		value, t.entries[i].value = t.entries[i].value, value
		dib, t.entries[i].dib = t.entries[i].dib, dib

		dib++
		if dib == t.maxLookup {
			t.grow()
			t.Insert(key, value)
			return true
		}
		i++
	}
}

Lookup uses the same stop rule as insertion. Once entries[i].dib < dib, the searched key cannot appear later in the probe sequence.

go
func (t *table[K, V]) findIdx(key K) int64 {
	hash := t.hasher(key)
	i := t.getBucket(hash)
	for dib := int8(0); t.entries[i].dib >= dib; dib++ {
		if t.entries[i].Equal(key) {
			return i
		}
		i++
	}
	return -1
}

After removing the target entry, each following entry shifts one bucket to the left and its DIB decreases by 1. This continues until the next bucket is empty (dib < 0) or contains an entry already in its home bucket (dib == 0). The sentinel with dib == 0 stops the run in the same way.

go
func (t *table[K, V]) Delete(key K) bool {
	idx := t.findIdx(key)
	if idx < 0 {
		return false
	}

	t.numElements--

	for {
		next := idx + 1
		if t.entries[next].dib <= 0 {
			t.entries[idx].dib = -1
			break
		}
		t.entries[idx].key = t.entries[next].key
		t.entries[idx].value = t.entries[next].value
		t.entries[idx].dib = t.entries[next].dib - 1
		idx = next
	}

	return true
}

The block below follows the same order as the skeleton: entry → table metadata → getBucket → allocEntries → getMaxProb → Insert / insertSlowPath → findIdx → Delete, followed by grow / rehash so that Insert does not end with a dangling call. The helper functions at the bottom (maxInt64, nearestPowerOfTwo) match their counterparts in the rbhashing repository.

go
package main

import "math"

const MinProb = 4

type entry[K comparable, V any] struct {
	dib   int8
	key   K
	value V
}

func (e *entry[K, V]) replace(key K, newValue V, newDIB int8) {
	e.dib = newDIB
	e.key = key
	e.value = newValue
}

func (e *entry[K, V]) IsEmpty() bool     { return e.dib < 0 }
func (e *entry[K, V]) Equal(key K) bool { return e.dib >= 0 && e.key == key }

type table[K comparable, V any] struct {
	entries      []entry[K, V]
	hasher       func(key K) int64
	sizeMinusOne int64
	numElements  int64
	size         int64
	loadFactor   float64
	maxLookup    int8
}

func (t *table[K, V]) getBucket(hash int64) int64 {
	return hash & t.sizeMinusOne
}

func (t *table[K, V]) allocEntries(size int64) {
	total := size + int64(t.maxLookup)
	t.entries = make([]entry[K, V], total)
	for i := range t.entries {
		t.entries[i].dib = -1
	}
	t.entries[total-1].dib = 0
}

func (t *table[K, V]) getMaxProb() int8 {
	v := int8(math.Log2(float64(t.size)))
	if v < MinProb {
		return MinProb
	}
	return v
}

func (t *table[K, V]) Insert(key K, value V) bool {
	if float64(t.numElements+1) > float64(t.size)*t.loadFactor {
		t.grow()
	}
	hash := t.hasher(key)
	i := t.getBucket(hash)
	dib := int8(0)
	for ; t.entries[i].dib >= dib; dib++ {
		if t.entries[i].Equal(key) {
			t.entries[i].value = value
			return true
		}
		i++
	}
	return t.insertSlowPath(i, dib, key, value)
}

//go:noinline
func (t *table[K, V]) insertSlowPath(i int64, dib int8, key K, value V) bool {
	for {
		if t.entries[i].IsEmpty() {
			t.entries[i].replace(key, value, dib)
			t.numElements++
			return true
		}
		key, t.entries[i].key = t.entries[i].key, key
		value, t.entries[i].value = t.entries[i].value, value
		dib, t.entries[i].dib = t.entries[i].dib, dib
		dib++
		if dib == t.maxLookup {
			t.grow()
			t.Insert(key, value)
			return true
		}
		i++
	}
}

func (t *table[K, V]) findIdx(key K) int64 {
	hash := t.hasher(key)
	i := t.getBucket(hash)
	for dib := int8(0); t.entries[i].dib >= dib; dib++ {
		if t.entries[i].Equal(key) {
			return i
		}
		i++
	}
	return -1
}

func (t *table[K, V]) Delete(key K) bool {
	idx := t.findIdx(key)
	if idx < 0 {
		return false
	}
	t.numElements--
	for {
		next := idx + 1
		if t.entries[next].dib <= 0 {
			t.entries[idx].dib = -1
			break
		}
		t.entries[idx].key = t.entries[next].key
		t.entries[idx].value = t.entries[next].value
		t.entries[idx].dib = t.entries[next].dib - 1
		idx = next
	}
	return true
}

func (t *table[K, V]) grow() {
	t.rehash(maxInt64(4, t.size*2))
}

func (t *table[K, V]) rehash(newSize int64) {
	needed := maxInt64(newSize, int64(math.Round(float64(t.numElements)/t.loadFactor)))
	n := nearestPowerOfTwo(needed)
	oldEntries := t.entries
	t.size = n
	t.sizeMinusOne = n - 1
	t.maxLookup = t.getMaxProb()
	t.numElements = 0
	t.allocEntries(n)
	// Skip the sentinel slot. Using an index loop also keeps this safe on the
	// first allocation, when oldEntries is nil and len(oldEntries)-1 is negative.
	for i := 0; i < len(oldEntries)-1; i++ {
		if !oldEntries[i].IsEmpty() {
			t.Insert(oldEntries[i].key, oldEntries[i].value)
		}
	}
}

func maxInt64(a, b int64) int64 {
	if a < b {
		return b
	}
	return a
}

func nearestPowerOfTwo(n int64) int64 {
	if n <= 0 {
		return 1
	}
	return 1 << uint64(math.Ceil(math.Log2(float64(n))))
}

You can find the complete code in the rbhashing repository.

Benchmarks were run on an Apple M3 Pro (12 cores).

  • rb - the Robin Hood open-addressing table from this article (loadFactor = 0.5, SplitMix64 for int64 keys and FNV-1a for string keys).
  • gomap - Go’s built-in map, pre-sized with make(..., n) where applicable.

Sizes on the x-axis are n ∈ {4, 64, 1024, 16384, 262144, 4194304}. Int64 keys are 0 … n-1. String keys are fixed-width 16-character decimal strings ("0000000000000042" style); values are the key itself for integers and len(key) for strings so both tables do comparable work.

Insert: Each benchmark iteration builds a new empty table and inserts all n keys; items/op is reported so you can derive nanoseconds per insertion from ns/op ÷ items/op.

Hit lookup: The table is filled once before timing (not included in the loop). Each iteration does one successful Get in sequential order over the key slice, so the access pattern is cache-friendly for the key array.

Miss lookup: The table holds keys 0 … n-1 (or the string equivalents). Miss keys are drawn from a disjoint set (int64 values starting at 2n, strings with a miss- prefix) so every probe is a true miss.

Delete: One timed iteration deletes a key and immediately re-inserts it, cycling through all n keys so occupancy stays fixed and Robin Hood's backward-shift deletion stays on the hot path.

Cache miss: Lookups use a fixed random permutation of indices (rand.Perm(n) once per sub-benchmark). Access order is shuffled but repeatable; for large n the table stops fitting in cache, so the curve stresses pointer-chasing / cache-miss behavior rather than sequential locality.

Results are assigned to sink variables so the compiler cannot drop the lookups.

Insert benchmark, int64 keys
Insert benchmark, string keys
Hit lookup benchmark, int64 keys
Hit lookup benchmark, string keys
Cache miss benchmark, int64 keys
Cache miss benchmark, string keys
Delete benchmark, int64 keys
Delete benchmark, string keys

The benchmark results show that Go's built-in map is still the stronger general-purpose hash table. It consistently performs better at small and medium table sizes, especially for insertion, lookup, miss lookup, and nearly all string workloads. The built-in map is impressively fast, but Robin Hood hashing still holds up reasonably well.

To beat the built-in map, we would likely need further optimizations, perhaps some of the techniques mentioned in this video. I tried adding a prime-sized backing array, but it did not improve the benchmark results (it was actually worse); I am not sure whether I implemented it correctly. If I get the chance, I will revisit this and see whether there is any other room for optimization.
