Redis internals: Understanding the data structures under the hood
Discover the powerful internal data structures that make Redis a blazing-fast in-memory data store. This comprehensive guide explores how Redis uses SDS for strings, quicklists for lists, and skiplists for sorted sets, ensuring optimal performance, memory efficiency, and scalability. Perfect for developers and enthusiasts looking to deepen their understanding of Redis’s inner workings. Learn why Redis is the go-to choice for caching, real-time analytics, and more!
Redis, an incredibly fast in-memory data store, stands out for its simplicity and performance. What powers Redis under the hood? The answer lies in its meticulously designed internal data structures. These aren't mere abstract concepts—they're carefully optimized components handling everything from caching and messaging to real-time analytics and geospatial queries.
In this blog, we'll explore Redis's internal mechanisms, examining the building blocks behind its core data types—strings, lists, sets, hashes, and sorted sets. We'll see how Redis cleverly adapts its storage strategies to achieve an optimal balance of speed, memory efficiency, and scalability. From the elegant SDS (Simple Dynamic Strings) for string handling to the hybrid quicklists for list management and the precise skiplists for sorted sets, each structure is crafted to maximize Redis's potential.
Through this guide, you'll develop a richer understanding of Redis's internals and how its data structures drive its impressive performance and flexibility. Whether you're an experienced Redis user or a curious developer, this deep dive will illuminate why Redis continues to be a fundamental part of modern application architecture.
Let's dive in!
Here's a table mapping each value type to the data structure Redis uses for it:
| Data Type | Underlying Data Structure |
|---|---|
| String | SDS (Simple Dynamic String) |
| Hash | ZipList or Dict |
| List | Quicklist |
| Set | IntSet or Dict |
| Sorted Set | SkipList + Dict |
The implementation is available as a standalone library at https://github.com/antirez/sds.
Redis is written entirely in C but doesn't use traditional C strings (char*).
Key limitations of traditional C strings:
- Fixed memory allocation: modifying a string requires manual reallocation, with potential buffer-overflow risks
- Unable to store binary data: C strings use the null character (`\0`) to mark the end of a string, while binary data may contain null bytes
- O(n) length retrieval: calculating the length requires scanning the entire string
Let’s see how Redis eliminates those limitations using SDS
Here's the structure of SDS
It consists of two parts: the metadata part (sdshdr) and the data part (buf).
The metadata stores essential information using different header types based on the string's length.
There are 5 SDS header types:
| Header type | Header size | Used for |
|---|---|---|
| SDS_TYPE_5 | 1 byte | Very small strings (len < 32) |
| SDS_TYPE_8 | 3 bytes | Small strings (len < 256) |
| SDS_TYPE_16 | 5 bytes | Medium strings (len < 64 KB) |
| SDS_TYPE_32 | 9 bytes | Large strings (len < 4 GB) |
| SDS_TYPE_64 | 17 bytes | Very large strings |
```c
/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
```

Let's examine each field:
`alloc`:

- Total allocated capacity of the `buf[]` array, excluding the header and null terminator
- Always equals or exceeds `len`
- Includes preallocated spare space to minimize future reallocations as the string grows. The strategy: for strings up to 1 MB, Redis doubles the required length; beyond 1 MB, it adds a fixed 1 MB
  - 🧠 This amortizes the cost of reallocation across repeated updates to the value

`flags`:

- Always a 1-byte field. The 3 least-significant bits indicate the header type; for `SDS_TYPE_5`, the remaining 5 bits store the string length directly

`len`:

- Number of bytes used in the `buf[]` array
- Excludes the null terminator (`\0`)
  - 🧠 enables storing binary data
  - Enables constant-time length retrieval
- `SDS_TYPE_5` doesn't have this field because its length is stored directly in `flags` (🧠)

`buf[]`:

- Flexible array storing the string data plus a trailing null terminator (`\0`)
- The `\0` is kept for compatibility with C library functions but is never used for length calculation
Redis uses different header types to minimize metadata overhead. For instance, using a `uint32_t` `len` field would be wasteful for a small string like `"test"`.
Memory layout example:
For the SDS string "hello":
- `len`: 5 (the length of `"hello"`, excluding `\0`)
- `alloc`: 8 (total buffer capacity, leaving spare room to grow)
- `buf[]`: `"hello\0"` plus unused spare bytes
While SDS offers many advantages over traditional C strings, it has some trade-offs:
- Memory overhead: Additional metadata storage increases memory usage
- Fragmentation: Preallocation may waste memory if strings don't grow as anticipated
Note that SDS stores both Redis keys and string values — not just the string data type, but all string values inside other data types like lists, sets, and sorted sets.
A ziplist is a specially designed data structure that stores elements contiguously in memory to minimize overhead. Each element in a ziplist can be a string or an integer, with its size dynamically adjusted based on the actual content.
A ziplist consists of the following components:

- `zlbytes` (4 bytes): total size of the ziplist in bytes, including the header, entries, and end marker. Used for memory management.
- `zltail` (4 bytes): offset of the last entry, enabling quick tail access.
- `zllen` (2 bytes): number of entries.
  - Note: when the count exceeds 65535, this field is pinned at 65535, and the actual count requires a full traversal.
    - 🧠 This is acceptable given ziplist's purpose of storing small collections.
- Entries: a series of contiguous elements using flexible encoding to minimize memory usage. Each entry contains:
  - `prevlen`: length of the previous entry
    - 1 byte for lengths < 254 bytes
    - 5 bytes otherwise (a marker byte followed by a 4-byte length)
  - `encoding`: type and length information
    - For integers: records the integer type (very small values can be embedded in the encoding byte itself)
    - For strings: records the string length
  - `data`: the actual string or integer value
- `zlend` (1 byte): the `0xFF` end marker.
The ziplist supports mixed data types, allowing both integers and strings within the same list.
ZipList has several key advantages:
- Eliminates pointer overhead through contiguous storage
- Enables fast sequential element traversal
- Adjusts size dynamically as elements change
- Poor random access: accessing an element by index requires sequential traversal.
- Resizing overhead: large ziplists face significant costs when elements are added or removed.
  - Since data is stored contiguously, adding elements may require reallocating the entire list when space runs out.
  - Growing a value to 254 bytes or more can trigger a chain reaction of `prevlen` updates in subsequent entries — a "cascading update" with O(n) complexity.
- Size limitations: performance degrades with large datasets due to resizing and traversal costs.
ziplist serves multiple data types in Redis:
Hash:
Small hash objects use ziplist for storage.
Example hash:

```
{ "field1": "value1", "field2": "value2" }
```

Memory layout:

```
[ zlbytes ][ zltail ][ zllen ][ field1 ][ value1 ][ field2 ][ value2 ][ zlend ]
```

Size criteria are controlled by:
- `hash-max-ziplist-entries`: maximum number of field-value pairs (default: 128)
- `hash-max-ziplist-value`: maximum value size in bytes (default: 64)
List:
Small lists also use ziplist storage.
Example list:

```
LPUSH mylist "A" "B" "C"
```

Memory layout:

```
[ zlbytes ][ zltail ][ zllen ][ C ][ B ][ A ][ zlend ]
```

Criteria:
- Controlled by `list-max-ziplist-size`: maximum ziplist node size
  - Positive value: maximum number of entries per node
  - Negative value: a per-node size class (-1 = 4 KB, -2 = 8 KB, -3 = 16 KB, -4 = 32 KB, -5 = 64 KB)
- Default value is -2 (up to 8 KB per node)
Sorted Set:
Small sorted sets use ziplist storage.
Example sorted set:

```
ZADD myzset 1.0 "A"
ZADD myzset 2.0 "B"
```

Memory layout (each element is stored as a member/score pair):

```
[ zlbytes ][ zltail ][ zllen ][ member1 ][ score1 ][ member2 ][ score2 ][ zlend ]
```

Criteria controlled by:
zset-max-ziplist-entries: Maximum elements (default:128)zset-max-ziplist-value: Maximum member size in bytes (default:64 bytes)
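For reference, all of these thresholds live in redis.conf. The values below are the commonly shipped defaults; note that in Redis 7+ the options were renamed from `*-ziplist-*` to `*-listpack-*` (listpack being ziplist's successor), with the old names kept as aliases:

```
hash-max-ziplist-entries 128
hash-max-ziplist-value 64
list-max-ziplist-size -2
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
```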
A quicklist is a hybrid data structure that:
- Represents a
Redislist as alinked listofziplistnodes. - Combines the benefits of:
ziplist: Compact, contiguous memory storage for list elements.Linked List: Dynamic growth and efficient insertions and deletions.
Redis used to represent lists with:

- Plain linked lists: high memory overhead due to pointers; each node is separately allocated, leading to potential memory fragmentation.
- Ziplists: efficient for small lists but inefficient for large lists due to resizing costs.
How quicklist addresses these downsides: by grouping multiple elements into each ziplist node, it resolves both problems.

- Fewer pointers overall
  - For example, 10 elements in a plain linked list need 20 pointers (10 prev and 10 next), but grouping those elements into 2 ziplists cuts that to just 4 (2 prev and 2 next)
- Elements within a ziplist sit next to each other in memory
  - 🧠 reduces memory fragmentation and improves cache locality
- Splitting the list across multiple ziplists also keeps each individual ziplist from growing too large
  - 🧠 keeps resizing cheap
As mentioned in the previous section, a quicklist is simply a linked list of ziplist nodes.

Here's the simplified structure.

And here's how it's represented in the Redis source code (note: newer Redis versions use listpacks, the successor to ziplists, as the node payload, which is why the comments below mention listpacks):
```c
/* quicklistNode is a 32 byte struct describing a listpack for a quicklist.
 * We use bit fields keep the quicklistNode at 32 bytes.
 * count: 16 bits, max 65536 (max lp bytes is 65k, so max count actually < 32k).
 * encoding: 2 bits, RAW=1, LZF=2.
 * container: 2 bits, PLAIN=1 (a single item as char array), PACKED=2 (listpack with multiple items).
 * recompress: 1 bit, bool, true if node is temporary decompressed for usage.
 * attempted_compress: 1 bit, boolean, used for verifying during testing.
 * dont_compress: 1 bit, boolean, used for preventing compression of entry.
 * extra: 9 bits, free for future use; pads out the remainder of 32 bits */
typedef struct quicklistNode {
    struct quicklistNode *prev;
    struct quicklistNode *next;
    unsigned char *entry;
    size_t sz;                   /* entry size in bytes */
    unsigned int count : 16;     /* count of items in listpack */
    unsigned int encoding : 2;   /* RAW==1 or LZF==2 */
    unsigned int container : 2;  /* PLAIN==1 or PACKED==2 */
    unsigned int recompress : 1; /* was this node previous compressed? */
    unsigned int attempted_compress : 1; /* node can't compress; too small */
    unsigned int dont_compress : 1; /* prevent compression of entry that will be used later */
    unsigned int extra : 9;      /* more bits to steal for future usage */
} quicklistNode;

/* quicklistLZF is a 8+N byte struct holding 'sz' followed by 'compressed'.
 * 'sz' is byte length of 'compressed' field.
 * 'compressed' is LZF data with total (compressed) length 'sz'
 * NOTE: uncompressed length is stored in quicklistNode->sz.
 * When quicklistNode->entry is compressed, node->entry points to a quicklistLZF */
typedef struct quicklistLZF {
    size_t sz; /* LZF size in bytes */
    char compressed[];
} quicklistLZF;

/* Bookmarks are padded with realloc at the end of of the quicklist struct.
 * They should only be used for very big lists if thousands of nodes were the
 * excess memory usage is negligible, and there's a real need to iterate on them
 * in portions.
 * When not used, they don't add any memory overhead, but when used and then
 * deleted, some overhead remains (to avoid resonance).
 * The number of bookmarks used should be kept to minimum since it also adds
 * overhead on node deletion (searching for a bookmark to update). */
typedef struct quicklistBookmark {
    quicklistNode *node;
    char *name;
} quicklistBookmark;

/* quicklist is a 40 byte struct (on 64-bit systems) describing a quicklist.
 * 'count' is the number of total entries.
 * 'len' is the number of quicklist nodes.
 * 'compress' is: 0 if compression disabled, otherwise it's the number
 * of quicklistNodes to leave uncompressed at ends of quicklist.
 * 'fill' is the user-requested (or default) fill factor.
 * 'bookmarks are an optional feature that is used by realloc this struct,
 * so that they don't consume memory when not used. */
typedef struct quicklist {
    quicklistNode *head;
    quicklistNode *tail;
    unsigned long count;  /* total count of all entries in all listpacks */
    unsigned long len;    /* number of quicklistNodes */
    signed int fill : QL_FILL_BITS;       /* fill factor for individual nodes */
    unsigned int compress : QL_COMP_BITS; /* depth of end nodes not to compress; 0=off */
    unsigned int bookmark_count : QL_BM_BITS;
    quicklistBookmark bookmarks[];
} quicklist;
```

Let's go through them one by one.
- `quicklist`: the main container at the top level. It holds:
  - Metadata: information about the quicklist, such as the total number of elements (`count`) and nodes (`len`)
  - Head and tail pointers to the first and last `quicklistNode`
  - A doubly linked list of nodes, each a `quicklistNode` representing a portion of the list
- `quicklistNode`: a single node in the linked list. Its `unsigned char *entry` pointer references the node's ziplist, or a `quicklistLZF` if the node is compressed
- `quicklistLZF`: a ziplist compressed with the LZF algorithm. Quicklist compresses middle nodes to save memory and decompresses them when needed
- `quicklistBookmark`: optional named pointers to specific nodes in the quicklist, for fast access into very large lists
When it comes to quicklists, one of the trickiest decisions is configuration. There are two settings:

- `list-max-ziplist-size`: controls the maximum size of the ziplist within each quicklist node
  - Positive: maximum number of elements per ziplist
  - Negative: a per-node size class (e.g., -2 means up to 8 KB)
- `list-compress-depth`: controls how many nodes at the head and tail remain uncompressed
  - Middle nodes are compressed to save memory
  - Default: 0 (no compression)
The shorter each ziplist, the more memory fragmentation; the longer each ziplist, the harder it is to allocate a large contiguous block for it. Set these values carefully. If unsure, stick with the defaults: they aren't optimal for every workload, but they're a reasonable starting point.
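In redis.conf, these two settings look like this (defaults shown):

```
list-max-ziplist-size -2   # each node's ziplist capped at 8 KB
list-compress-depth 0      # 0 = middle-node compression disabled
```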
- Accessing a middle node requires traveling from the head or tail → O(n) complexity
- Splitting or merging ziplists on insertion or deletion adds overhead
- Compression and decompression add CPU overhead when middle nodes are frequently accessed or updated
Quicklist is the internal data structure Redis uses to implement lists.
IntSet is a Redis data structure used to store unique integer values. All items are stored contiguously in ascending order without using pointers.
Sets use IntSet when the number of items is small.
```c
typedef struct intset {
    uint32_t encoding;  /* current encoding (INTSET_ENC_INT16, INTSET_ENC_INT32, INTSET_ENC_INT64) */
    uint32_t length;    /* number of integers in the intset */
    int8_t contents[];  /* array of integers stored in the chosen encoding */
} intset;
```

An intset stores all items using the same byte width, starting with the smallest data type that can hold every value. If a new value exceeds the current type's capacity, the intset automatically upgrades all items to a larger type that fits.
- `encoding`: indicates the data type used to store integers
  - `INTSET_ENC_INT16` (16-bit integers, 2 bytes each)
  - `INTSET_ENC_INT32` (32-bit integers, 4 bytes each)
  - `INTSET_ENC_INT64` (64-bit integers, 8 bytes each)
- `length`: number of integers in the intset
- `contents[]`: array of integers stored in the chosen encoding
- Memory Efficient: Uses the smallest possible integer type (int16_t, int32_t, or int64_t) without extra overhead like pointers.
- Dynamic Upgrading: Automatically upgrades integer type as needed, optimizing memory usage for small sets.
- Fast operations: maintains sorted order, enabling O(log n) lookups and efficient insertions/deletions.
- Not suitable for Large Sets:
- Since insertion and deletion operations require shifting elements to maintain contiguity, these operations become inefficient for large sets
- Can only store integers
- All integers must use the same data type, even when some values could fit in a smaller type
Redis Sets use IntSet when all elements are integers and the set is small. If either condition isn't met, Redis Sets switch to using a hash table.
The maximum number of elements before a Redis set switches to a hash table is controlled by `set-max-intset-entries` (default: 512).
A skiplist is a probabilistic data structure that enables fast ordered operations like lookups, insertions, and deletions. In Redis, sorted sets (ZSETs) combine a skiplist with a hash table to efficiently manage members and their associated scores.

A skiplist improves upon the sorted linked list by speeding up search: while searching a regular linked list takes O(n) time, a skiplist achieves O(log n) on average.

It does this by maintaining multiple levels of sorted linked lists. The bottom level includes all nodes, and each higher level contains a subset of the nodes from the level below.
Here is a simplified diagram of SkipList
At the first level, we have all the nodes: 1, 4, 9, 12, 15, 22, and 27.
At the second level, we have nodes 4, 15, and 22.
At the third level, we have only node 15.
By using multiple levels of sorted linked lists, we reduce the search complexity from O(n) to O(log n).
Let's see how searching works with an example value of 9:
We begin at level 3 (the highest level). From the header, we compare 9 with the next node's value (15). Since 15 is greater than 9, we move down to level 2. Here, we advance to node 4 (since 4 < 9). The next node is 15, which is greater than 9, so we move down to level 1.
At level 1, the next node after 4 is 9—we've found our target.
The advantages of a skiplist are:

- Efficient sorted operations: search, insert, and delete all run in O(log n), where n is the total number of nodes.
```c
typedef struct zskiplistNode {
    sds ele;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;
```

Each zskiplistNode consists of:
- `ele`: the member value. Note that it's an `sds` — even if you store an integer in a sorted set, underneath it's stored as an SDS string
- `score`: the score value
- `backward`: pointer to the previous node
  - 🧠 supports reverse traversal (e.g., the `ZREVRANGE` command)
- `level[]`: the node appears in multiple levels; each level stores two pieces of information:
  - `forward`: the next node at this level
  - `span`: the number of nodes skipped at this level
    - 🧠 supports rank calculation for each element

The size of the `level[]` array is determined probabilistically: when inserting a new node, its level is chosen at random by this function:
```c
int zslRandomLevel(void) {
    static const int threshold = ZSKIPLIST_P*RAND_MAX;
    int level = 1;
    while (random() < threshold)
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}
```

Redis then creates a `zskiplistLevel` entry for each level from 1 up to the returned value.
The zskiplist consists of:

- `header`, `tail`: pointers to the head and tail nodes
- `length`: the number of nodes
- `level`: the current number of levels
- Efficient Range Queries: Supports fast range lookups (e.g., ZRANGEBYSCORE) with
O(log n)complexity. - Dynamic Balancing: Maintains balance probabilistically without the overhead of rebalancing like in balanced trees.
- Memory Efficient: Uses fewer pointers compared to trees, especially with a well-tuned maximum level.
- Simplicity: Easy to implement and debug compared to complex tree structures.
- Sorted Order: Keeps elements in sorted order, enabling efficient traversal for operations like iterating over a range.
Skiplist has some drawbacks:
- Requires extra memory to store multiple levels of forward pointers for each node.
- Node levels are chosen randomly, so in the worst case every level contains all the nodes, degrading search, insertion, and deletion to O(n) instead of O(log n) — though this case is rare.
- Skiplists are not cache friendly: nodes are scattered across memory.
- Lack of strict balance: because it relies on probability, the structure can become skewed, leading to occasional performance degradation.
- Sorted Set uses skiplist in combination with hash table to store & retrieve data when the amount of data is large.
Dictionaries are fundamental data structures in Redis. They serve as the foundation for the CommandTable (which stores all Redis commands) and the Redis database itself. All core database operations—adding, deleting, modifying, and querying data—are built on dictionary structures.
As far as I know, when people say "hash table" in Redis, they're referring to this dict data structure.
```c
struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; /* Next entry in the same hash bucket. */
};

struct dict {
    dictType *type;
    dictEntry **ht_table[2];
    unsigned long ht_used[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    /* Keep small vars at end for optimal (minimal) struct padding */
    unsigned pauserehash : 15;    /* If >0 rehashing is paused */
    unsigned useStoredKeyApi : 1; /* See comment of storedHashFunction above */
    signed char ht_size_exp[2];   /* exponent of size. (size = 1<<exp) */
    int16_t pauseAutoResize;      /* If >0 automatic resizing is disallowed (<0 indicates coding error) */
    void *metadata[];
};
```

Note: A void pointer is a universal pointer that can point to any variable not declared with the const or volatile keywords. Source
`dictEntry`:

- `void *key`: pointer to the key associated with the entry. Typically an SDS, but it can be another type depending on the hash table's usage
- `union v`: union representing the value associated with the key
  - `void *val`: pointer to the actual value (e.g., an SDS string, hash table, or skiplist)
  - `uint64_t u64`: stores unsigned integers directly
  - `int64_t s64`: stores signed integers directly
  - `double d`: stores double values directly
- `dictEntry *next`: next entry in the same hash bucket
`dict`:

- `dictType *type`: pointer to a `dictType` structure defining type-specific functions for the hash table
  - Provides custom operations for keys and values, including hashing functions, key comparisons, and destruction methods
- `dictEntry **ht_table[2]`: array of two hash table pointers for incremental rehashing
  - `ht_table[0]`: active hash table
  - `ht_table[1]`: temporary hash table used during rehashing
- `unsigned long ht_used[2]`: number of used entries in each hash table
  - Tracks occupancy of `ht_table[0]` and `ht_table[1]`
  - Used to determine when resizing or rehashing is needed
- `long rehashidx`: tracks rehashing progress (-1 when no rehashing is active)
- `unsigned pauserehash`: 15-bit field indicating whether rehashing is paused, preserving consistency during bulk operations
- `unsigned useStoredKeyApi`: 1-bit flag for special key handling during hashing
  - Controls use of a specific stored hash function for keys
  - Used by Redis modules and internal optimizations
- `signed char ht_size_exp[2]`: array storing the hash table size exponents
  - Table size is `1 << exp`; for example, if `ht_size_exp[0] = 16`, then `ht_table[0]` has 2^16 = 65536 buckets
- `int16_t pauseAutoResize`: controls automatic hash table resizing
  - Prevents resizing during operations that require a fixed table size
  - Negative values indicate a coding error
- `void *metadata[]`: flexible array for hash table-specific metadata
If you're not familiar with hash tables, think of them as arrays where each element is called a bucket. A bucket can store a linked list or a tree, depending on the implementation. Hash tables use a hash function that converts a key into an integer, which serves as the index where the value will be stored.
When retrieving a value, the process is simple: hash the key, find the corresponding bucket, and search for the value within that bucket.
A common challenge with hash tables is hash collisions — when different keys produce the same hash result. This happens more frequently with fewer buckets. For instance, if you're storing 1 million key-value pairs in a hash table with only 1,000 buckets, collisions are inevitable, regardless of the hash function's quality. During a collision, new items join the linked list or tree in the affected bucket. As these collisions accumulate, bucket sizes grow, degrading retrieval time complexity from O(1) to O(n).
This creates a tradeoff: too few buckets lead to performance-degrading collisions, while too many buckets waste memory through unused space.
Redis solves this through rehashing. It begins with a modest number of buckets and tracks the load factor, defined as the number of stored entries divided by the number of buckets (`ht_used / table size`).
When the load factor exceeds 1, Redis expands the number of buckets and rehashes the existing data. The new hash table temporarily lives in `ht_table[1]` during this process; once rehashing completes, it becomes the new `ht_table[0]`. Rehashing is performed incrementally, a few buckets at a time, so the main thread is never blocked for long and the existing table keeps serving requests throughout.
By keeping the load factor at or below 1 and using a strong hash function, Redis maintains efficient lookups and updates while minimizing collisions.
- Fast lookup / update / insert: O(1) on average
- Dynamic resizing keeps hash collisions in check
- Memory overhead: extra storage needed for pointers and metadata
- Rehashing adds memory and processing overhead
- Inefficient for range queries
It is used extensively across Redis’s core functionality. Below are the main areas where dict is utilized:
- Main Key-Value Store: Stores all database keys and their associated values.
- Hashes: Used for field-value pairs when the hash exceeds ziplist thresholds.
- Sorted Sets: Maps elements to scores for O(1) lookups alongside skiplists for sorting.
- Pub/Sub: Tracks channel-to-client and client-to-channel subscriptions.
- Expiry Tracking: Maps keys to their expiration timestamps.
- Cluster Management: Tracks node metadata and slot ownership in cluster mode.
- Replication: Manages replica clients and synchronization backlogs.
- Lua Scripting: Tracks variables and accessed keys during script execution.
Redis dynamically switches internal data structures based on specific scenarios and rules to optimize memory usage and performance. However, Redis does not revert to a smaller data structure once it has upgraded to a larger one, even if the conditions for the smaller structure are met again.
Example: Hash Storage
- Ziplist: used for a hash as long as:
  - the number of entries stays within `hash-max-ziplist-entries`, and
  - each entry's size stays within `hash-max-ziplist-value` (64 bytes by default).
- Dict: if either threshold is exceeded, Redis switches to a dict.
No Reversion: Redis will not convert back to a ziplist, even if the hash becomes small again.
Real-World Impact
This behavior can lead to unexpected issues if not understood. For example, an issue at Grab arose partly due to engineers being unaware of this feature. You can read the full story here: Uncovering the Truth Behind Lua and Redis Data Consistency.
Understanding this behavior is crucial for designing efficient Redis usage patterns and avoiding potential pitfalls.
Redis’s internal data structures are the foundation of its exceptional speed, versatility, and efficiency. By understanding how Redis uses SDS for strings, quicklists for lists, and other structures like ziplists and skiplists, I hope this blog has shed some light on the engineering brilliance behind Redis’s design. These structures enable Redis to handle a wide range of use cases, from simple caching to complex real-time analytics.
That said, this is a personal exploration, and there might be some mistakes or oversights. If you spot anything that could be improved or corrected, I’d love to hear your feedback. Your contributions can help make this content even more valuable for others.
Redis is not just a tool—it’s a masterclass in the art of efficient data storage and retrieval. Thanks for reading, and let’s keep learning and exploring together!