Redis internals: Understanding the data structures under the hood
Discover the powerful internal data structures that make Redis a blazing-fast in-memory data store. This comprehensive guide explores how Redis uses SDS for strings, quicklists for lists, and skiplists for sorted sets, ensuring optimal performance, memory efficiency, and scalability. Perfect for developers and enthusiasts looking to deepen their understanding of Redis’s inner workings. Learn why Redis is the go-to choice for caching, real-time analytics, and more!
Redis, an incredibly fast in-memory data store, stands out for its simplicity and performance. What powers Redis under the hood? The answer lies in its meticulously designed internal data structures. These aren't mere abstract concepts—they're carefully optimized components handling everything from caching and messaging to real-time analytics and geospatial queries.
In this blog, we'll explore Redis's internal mechanisms, examining the building blocks behind its core data types—strings, lists, sets, hashes, and sorted sets. We'll see how Redis cleverly adapts its storage strategies to achieve an optimal balance of speed, memory efficiency, and scalability. From the elegant SDS (Simple Dynamic Strings) for string handling to the hybrid quicklists for list management and the precise skiplists for sorted sets, each structure is crafted to maximize Redis's potential.
Through this guide, you'll develop a richer understanding of Redis's internals and how its data structures drive its impressive performance and flexibility. Whether you're an experienced Redis user or a curious developer, this deep dive will illuminate why Redis continues to be a fundamental part of modern application architecture.
Let's dive in!
Here's a table mapping each value type to the data structure Redis uses for it:
| Data Type | Underlying Data Structure |
|---|---|
| String | SDS (Simple Dynamic String) |
| Hash | ZipList or Dict |
| List | Quicklist |
| Set | IntSet or Dict |
| Sorted Set | SkipList + Dict |
The implementation is available as a standalone library at https://github.com/antirez/sds.
Redis is written entirely in C but doesn't use traditional C strings (char*).
Key limitations of traditional C strings:
- Fixed memory allocation: modifying a string requires manual reallocation, with potential buffer-overflow risks
- Unable to store binary data: C strings use the null character (`\0`) to mark the end of a string, while binary data may contain null bytes
- O(n) length retrieval: calculating the length requires scanning the entire string
Let’s see how Redis eliminates those limitations using SDS
Here's the structure of SDS
It consists of two parts: the metadata part (sdshdr) and the data part (buf).
The metadata stores essential information using different header types based on the string's length.
There are 5 SDS header types:
| Header type | Header size | Used for |
|---|---|---|
| SDS_TYPE_5 | 1 byte | Very small strings (len < 32) |
| SDS_TYPE_8 | 3 bytes | Small strings (len < 256) |
| SDS_TYPE_16 | 5 bytes | Medium strings (len < 64 KB) |
| SDS_TYPE_32 | 9 bytes | Large strings (len < 4 GB) |
| SDS_TYPE_64 | 17 bytes | Very large strings |
```c
/* Note: sdshdr5 is never used, we just access the flags byte directly.
 * However is here to document the layout of type 5 SDS strings. */
struct __attribute__ ((__packed__)) sdshdr5 {
    unsigned char flags; /* 3 lsb of type, and 5 msb of string length */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr8 {
    uint8_t len; /* used */
    uint8_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr16 {
    uint16_t len; /* used */
    uint16_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr32 {
    uint32_t len; /* used */
    uint32_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
struct __attribute__ ((__packed__)) sdshdr64 {
    uint64_t len; /* used */
    uint64_t alloc; /* excluding the header and null terminator */
    unsigned char flags; /* 3 lsb of type, 5 unused bits */
    char buf[];
};
```

Let's examine each field:
`alloc`:

- Total allocated capacity of the `buf[]` array, excluding the header and null terminator
- Always equals or exceeds `len`
- Includes preallocated spare space to minimize future reallocations as the string grows. The strategy: for strings up to 1 MB, Redis doubles the required length; beyond 1 MB, it adds a fixed 1 MB
  - 🧠 This amortizes the cost of reallocation across repeated updates to the value

`flags`:

- Always a 1-byte field. The 3 least-significant bits indicate the header type; for `SDS_TYPE_5`, the remaining 5 bits store the string length directly

`len`:

- Number of bytes used in the `buf[]` array
- Excludes the null terminator (`\0`)
  - 🧠 enables storing binary data
  - Enables constant-time length retrieval
- `SDS_TYPE_5` doesn't have this field because its length is stored directly in `flags` (🧠)

`buf[]`:

- Flexible array storing the string data plus a trailing null terminator (`\0`)
- The `\0` is kept for compatibility with C library functions but is never used for length calculation
Redis uses different header types to minimize metadata overhead. For instance, using a `uint32_t` `len` field would be wasteful for a small string like `"test"`.
Memory layout example:
For the SDS string "hello":
- `len`: 5 (the length of `"hello"`, excluding `\0`)
- `alloc`: 8 (total buffer capacity, leaving spare room to grow)
- `buf[]`: `"hello\0"` plus unused spare bytes
While SDS offers many advantages over traditional C strings, it has some trade-offs:
- Memory overhead: Additional metadata storage increases memory usage
- Fragmentation: Preallocation may waste memory if strings don't grow as anticipated
Note that SDS stores both Redis keys and string values — not just the string data type, but all string values inside other data types like lists, sets, and sorted sets.
A ziplist is a specially designed data structure that stores elements contiguously in memory to minimize overhead. Each element in a ziplist can be a string or an integer, with its size dynamically adjusted based on the actual content.
A ziplist consists of the following components:

- `zlbytes` (4 bytes): total size of the ziplist in bytes, including the header, entries, and end marker. Used for memory management.
- `zltail` (4 bytes): offset of the last entry, enabling quick tail access.
- `zllen` (2 bytes): number of entries.
  - Note: when the count exceeds 65535, this field is pinned at 65535, and the actual count requires a full traversal.
    - 🧠 This is acceptable given ziplist's purpose of storing small collections.
- Entries: a series of contiguous elements using flexible encoding to minimize memory usage. Each entry contains:
  - `prevlen`: length of the previous entry
    - 1 byte for lengths < 254 bytes
    - 5 bytes otherwise (a marker byte followed by a 4-byte length)
  - `encoding`: type and length information
    - For integers: records the integer type (very small values can be embedded in the encoding byte itself)
    - For strings: records the string length
  - `data`: the actual string or integer value
- `zlend` (1 byte): the `0xFF` end marker.
The ziplist supports mixed data types, allowing both integers and strings within the same list.
ZipList has several key advantages:
- Eliminates pointer overhead through contiguous storage
- Enables fast sequential element traversal
- Adjusts size dynamically as elements change
- Poor random access: accessing an element by index requires sequential traversal.
- Resizing overhead: large ziplists face significant costs when elements are added or removed.
  - Since data is stored contiguously, adding elements may require reallocating the entire list when space runs out.
  - Growing a value to 254 bytes or more can trigger a chain reaction of `prevlen` updates in subsequent entries — a "cascading update" with O(n) complexity.
- Size limitations: performance degrades with large datasets due to resizing and traversal costs.
ziplist serves multiple data types in Redis:
Hash:
Small hash objects use ziplist for storage.
Example hash:

```
{ "field1": "value1", "field2": "value2" }
```

Memory layout:

```
[ zlbytes ][ zltail ][ zllen ][ field1 ][ value1 ][ field2 ][ value2 ][ zlend ]
```

Size criteria are controlled by:
- `hash-max-ziplist-entries`: maximum number of field-value pairs (default: 128)
- `hash-max-ziplist-value`: maximum value size in bytes (default: 64)
List:
Small lists also use ziplist storage.
Example list:

```
LPUSH mylist "A" "B" "C"
```

Memory layout:

```
[ zlbytes ][ zltail ][ zllen ][ C ][ B ][ A ][ zlend ]
```

Criteria:
- Controlled by `list-max-ziplist-size`: maximum ziplist node size
  - Positive value: maximum number of entries per node
  - Negative value: a per-node size class (-1 = 4 KB, -2 = 8 KB, -3 = 16 KB, -4 = 32 KB, -5 = 64 KB)
- Default value is -2 (up to 8 KB per node)
Sorted Set:
Small sorted sets use ziplist storage.
Example sorted set:

```
ZADD myzset 1.0 "A"
ZADD myzset 2.0 "B"
```

Memory layout (each element is stored as a member/score pair):

```
[ zlbytes ][ zltail ][ zllen ][ member1 ][ score1 ][ member2 ][ score2 ][ zlend ]
```

Criteria controlled by:
zset-max-ziplist-entries: Maximum elements (default:128)zset-max-ziplist-value: Maximum member size in bytes (default:64 bytes)
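For reference, all of these thresholds live in redis.conf. The values below are the commonly shipped defaults; note that in Redis 7+ the options were renamed from `*-ziplist-*` to `*-listpack-*` (listpack being ziplist's successor), with the old names kept as aliases:

```
hash-max-ziplist-entries 128
hash-max-ziplist-value 64
list-max-ziplist-size -2
zset-max-ziplist-entries 128
zset-max-ziplist-value 64
```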
A quicklist is a hybrid data structure that:
- Represents a
Redislist as alinked listofziplistnodes. - Combines the benefits of:
ziplist: Compact, contiguous memory storage for list elements.Linked List: Dynamic growth and efficient insertions and deletions.
Redis used to represent lists with:

- Plain linked lists: high memory overhead due to pointers; each node is separately allocated, leading to potential memory fragmentation.
- Ziplists: efficient for small lists but inefficient for large lists due to resizing costs.
How quicklist addresses these downsides: by grouping multiple elements into each ziplist node, it resolves both problems.

- Fewer pointers overall
  - For example, 10 elements in a plain linked list need 20 pointers (10 prev and 10 next), but grouping those elements into 2 ziplists cuts that to just 4 (2 prev and 2 next)
- Elements within a ziplist sit next to each other in memory
  - 🧠 reduces memory fragmentation and improves cache locality
- Splitting the list across multiple ziplists also keeps each individual ziplist from growing too large
  - 🧠 keeps resizing cheap
As mentioned in the previous section, a quicklist is simply a linked list of ziplist nodes.

Here's the simplified structure.

And here's how it's represented in the Redis source code (note: newer Redis versions use listpacks, the successor to ziplists, as the node payload, which is why the comments below mention listpacks):
```c
/* quicklistNode is a 32 byte struct describing a listpack for a quicklist.
 * We use bit fields keep the quicklistNode at 32 bytes.
 * count: 16 bits, max 65536 (max lp bytes is 65k, so max count actually < 32k).
 * encoding: 2 bits, RAW=1, LZF=2.
 * container: 2 bits, PLAIN=1 (a single item as char array), PACKED=2 (listpack with multiple items).
 * recompress: 1 bit, bool, true if node is temporary decompressed for usage.
 * attempted_compress: 1 bit, boolean, used for verifying during testing.
 * dont_compress: 1 bit, boolean, used for preventing compression of entry.
 * extra: 9 bits, free for future use; pads out the remainder of 32 bits */
typedef struct quicklistNode {
    struct quicklistNode *prev;
    struct quicklistNode *next;
    unsigned char *entry;
    size_t sz;                   /* entry size in bytes */
    unsigned int count : 16;     /* count of items in listpack */
    unsigned int encoding : 2;   /* RAW==1 or LZF==2 */
    unsigned int container : 2;  /* PLAIN==1 or PACKED==2 */
    unsigned int recompress : 1; /* was this node previous compressed? */
    unsigned int attempted_compress : 1; /* node can't compress; too small */
    unsigned int dont_compress : 1; /* prevent compression of entry that will be used later */
    unsigned int extra : 9;      /* more bits to steal for future usage */
} quicklistNode;

/* quicklistLZF is a 8+N byte struct holding 'sz' followed by 'compressed'.
 * 'sz' is byte length of 'compressed' field.
 * 'compressed' is LZF data with total (compressed) length 'sz'
 * NOTE: uncompressed length is stored in quicklistNode->sz.
 * When quicklistNode->entry is compressed, node->entry points to a quicklistLZF */
typedef struct quicklistLZF {
    size_t sz; /* LZF size in bytes */
    char compressed[];
} quicklistLZF;

/* Bookmarks are padded with realloc at the end of of the quicklist struct.
 * They should only be used for very big lists if thousands of nodes were the
 * excess memory usage is negligible, and there's a real need to iterate on them
 * in portions.
 * When not used, they don't add any memory overhead, but when used and then
 * deleted, some overhead remains (to avoid resonance).
 * The number of bookmarks used should be kept to minimum since it also adds
 * overhead on node deletion (searching for a bookmark to update). */
typedef struct quicklistBookmark {
    quicklistNode *node;
    char *name;
} quicklistBookmark;

/* quicklist is a 40 byte struct (on 64-bit systems) describing a quicklist.
 * 'count' is the number of total entries.
 * 'len' is the number of quicklist nodes.
 * 'compress' is: 0 if compression disabled, otherwise it's the number
 * of quicklistNodes to leave uncompressed at ends of quicklist.
 * 'fill' is the user-requested (or default) fill factor.
 * 'bookmarks are an optional feature that is used by realloc this struct,
 * so that they don't consume memory when not used. */
typedef struct quicklist {
    quicklistNode *head;
    quicklistNode *tail;
    unsigned long count;  /* total count of all entries in all listpacks */
    unsigned long len;    /* number of quicklistNodes */
    signed int fill : QL_FILL_BITS;       /* fill factor for individual nodes */
    unsigned int compress : QL_COMP_BITS; /* depth of end nodes not to compress; 0=off */
    unsigned int bookmark_count : QL_BM_BITS;
    quicklistBookmark bookmarks[];
} quicklist;
```

Let's go through them one by one.
- `quicklist`: the main container at the top level. It holds:
  - Metadata: information about the quicklist, such as the total number of elements (`count`) and nodes (`len`)
  - Head and tail pointers to the first and last `quicklistNode`
  - A doubly linked list of nodes, each a `quicklistNode` representing a portion of the list
- `quicklistNode`: a single node in the linked list. Its `unsigned char *entry` pointer references the node's ziplist, or a `quicklistLZF` if the node is compressed
- `quicklistLZF`: a ziplist compressed with the LZF algorithm. Quicklist compresses middle nodes to save memory and decompresses them when needed
- `quicklistBookmark`: optional named pointers to specific nodes in the quicklist, for fast access into very large lists
When it comes to quicklists, one of the trickiest decisions is configuration. There are two settings:

- `list-max-ziplist-size`: controls the maximum size of the ziplist within each quicklist node
  - Positive: maximum number of elements per ziplist
  - Negative: a per-node size class (e.g., -2 means up to 8 KB)
- `list-compress-depth`: controls how many nodes at the head and tail remain uncompressed
  - Middle nodes are compressed to save memory
  - Default: 0 (no compression)
The shorter each ziplist, the more memory fragmentation; the longer each ziplist, the harder it is to allocate a large contiguous block for it. Set these values carefully. If unsure, stick with the defaults: they aren't optimal for every workload, but they're a reasonable starting point.
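In redis.conf, these two settings look like this (defaults shown):

```
list-max-ziplist-size -2   # each node's ziplist capped at 8 KB
list-compress-depth 0      # 0 = middle-node compression disabled
```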
- Accessing a middle node requires traveling from the head or tail → O(n) complexity
- Splitting or merging ziplists on insertion or deletion adds overhead
- Compression and decompression add CPU overhead when middle nodes are frequently accessed or updated
Quicklist is the internal data structure Redis uses to implement lists.
IntSet is a Redis data structure used to store unique integer values. All items are stored contiguously in ascending order without using pointers.
Sets use IntSet when the number of items is small.
```c
typedef struct intset {
    uint32_t encoding;  /* current encoding (INTSET_ENC_INT16, INTSET_ENC_INT32, INTSET_ENC_INT64) */
    uint32_t length;    /* number of integers in the intset */
    int8_t contents[];  /* array of integers stored in the chosen encoding */
} intset;
```

An intset stores all items using the same byte width, starting with the smallest data type that can hold every value. If a new value exceeds the current type's capacity, the intset automatically upgrades all items to a larger type that fits.
- `encoding`: indicates the data type used to store integers
  - `INTSET_ENC_INT16` (16-bit integers, 2 bytes each)
  - `INTSET_ENC_INT32` (32-bit integers, 4 bytes each)
  - `INTSET_ENC_INT64` (64-bit integers, 8 bytes each)
- `length`: number of integers in the intset
- `contents[]`: array of integers stored in the chosen encoding
- Memory Efficient: Uses the smallest possible integer type (int16_t, int32_t, or int64_t) without extra overhead like pointers.
- Dynamic Upgrading: Automatically upgrades integer type as needed, optimizing memory usage for small sets.
- Fast operations: maintains sorted order, enabling O(log n) lookups and efficient insertions/deletions.
- Not suitable for Large Sets:
- Since insertion and deletion operations require shifting elements to maintain contiguity, these operations become inefficient for large sets
- Can only store integers
- All integers must use the same data type, even when some values could fit in a smaller type
Redis Sets use IntSet when all elements are integers and the set is small. If either condition isn't met, Redis Sets switch to using a hash table.
The maximum number of elements before a Redis set switches to a hash table is controlled by `set-max-intset-entries` (default: 512).
A skiplist is a probabilistic data structure that enables fast ordered operations like lookups, insertions, and deletions. In Redis, sorted sets (ZSETs) combine a skiplist with a hash table to efficiently manage members and their associated scores.

A skiplist improves upon the sorted linked list by speeding up search: while searching a regular linked list takes O(n) time, a skiplist achieves O(log n) on average.

It does this by maintaining multiple levels of sorted linked lists. The bottom level includes all nodes, and each higher level contains a subset of the nodes from the level below.
Here is a simplified diagram of SkipList
At the first level, we have all the nodes: 1, 4, 9, 12, 15, 22, and 27.
At the second level, we have nodes 4, 15, and 22.
At the third level, we have only node 15.
By using multiple levels of sorted linked lists, we reduce the search complexity from O(n) to O(log n).
Let's see how searching works with an example value of 9:
We begin at level 3 (the highest level). From the header, we compare 9 with the next node's value (15). Since 15 is greater than 9, we move down to level 2. Here, we advance to node 4 (since 4 < 9). The next node is 15, which is greater than 9, so we move down to level 1.
At level 1, the next node after 4 is 9—we've found our target.
The advantages of a skiplist are:

- Efficient sorted operations: search, insert, and delete all run in O(log n), where n is the total number of nodes.
```c
typedef struct zskiplistNode {
    sds ele;
    double score;
    struct zskiplistNode *backward;
    struct zskiplistLevel {
        struct zskiplistNode *forward;
        unsigned long span;
    } level[];
} zskiplistNode;

typedef struct zskiplist {
    struct zskiplistNode *header, *tail;
    unsigned long length;
    int level;
} zskiplist;
```

Each zskiplistNode consists of:
- `ele`: the member value. Note that it's an `sds` — even if you store an integer in a sorted set, underneath it's stored as an SDS string
- `score`: the score value
- `backward`: pointer to the previous node
  - 🧠 supports reverse traversal (e.g., the `ZREVRANGE` command)
- `level[]`: the node appears in multiple levels; each level stores two pieces of information:
  - `forward`: the next node at this level
  - `span`: the number of nodes skipped at this level
    - 🧠 supports rank calculation for each element

The size of the `level[]` array is determined probabilistically: when inserting a new node, its level is chosen at random by this function:
```c
int zslRandomLevel(void) {
    static const int threshold = ZSKIPLIST_P*RAND_MAX;
    int level = 1;
    while (random() < threshold)
        level += 1;
    return (level<ZSKIPLIST_MAXLEVEL) ? level : ZSKIPLIST_MAXLEVEL;
}
```

Redis then creates a `zskiplistLevel` entry for each level from 1 up to the returned value.
The zskiplist consists of:

- `header`, `tail`: pointers to the head and tail nodes
- `length`: the number of nodes
- `level`: the current number of levels
- Efficient Range Queries: Supports fast range lookups (e.g., ZRANGEBYSCORE) with
O(log n)complexity. - Dynamic Balancing: Maintains balance probabilistically without the overhead of rebalancing like in balanced trees.
- Memory Efficient: Uses fewer pointers compared to trees, especially with a well-tuned maximum level.
- Simplicity: Easy to implement and debug compared to complex tree structures.
- Sorted Order: Keeps elements in sorted order, enabling efficient traversal for operations like iterating over a range.
Skiplist has some drawbacks:
- Requires extra memory to store multiple levels of forward pointers for each node.
- Node levels are chosen randomly, so in the worst case every level contains all the nodes, degrading search, insertion, and deletion to O(n) instead of O(log n) — though this case is rare.
- Skiplists are not cache friendly: nodes are scattered across memory.
- Lack of strict balance: because it relies on probability, the structure can become skewed, leading to occasional performance degradation.
- Sorted Set uses skiplist in combination with hash table to store & retrieve data when the amount of data is large.
Dictionaries are fundamental data structures in Redis. They serve as the foundation for the CommandTable (which stores all Redis commands) and the Redis database itself. All core database operations—adding, deleting, modifying, and querying data—are built on dictionary structures.
As far as I know, when people say "hash table" in Redis, they're referring to this dict data structure.
```c
struct dictEntry {
    void *key;
    union {
        void *val;
        uint64_t u64;
        int64_t s64;
        double d;
    } v;
    struct dictEntry *next; /* Next entry in the same hash bucket. */
};

struct dict {
    dictType *type;
    dictEntry **ht_table[2];
    unsigned long ht_used[2];
    long rehashidx; /* rehashing not in progress if rehashidx == -1 */
    /* Keep small vars at end for optimal (minimal) struct padding */
    unsigned pauserehash : 15;    /* If >0 rehashing is paused */
    unsigned useStoredKeyApi : 1; /* See comment of storedHashFunction above */
    signed char ht_size_exp[2];   /* exponent of size. (size = 1<<exp) */
    int16_t pauseAutoResize;      /* If >0 automatic resizing is disallowed (<0 indicates coding error) */
    void *metadata[];
};
```

Note: A void pointer is a universal pointer that can point to any variable not declared with the const or volatile keywords. Source
`dictEntry`:

- `void *key`: pointer to the key associated with the entry. Typically an SDS, but it can be another type depending on the hash table's usage
- `union v`: union representing the value associated with the key
  - `void *val`: pointer to the actual value (e.g., an SDS string, hash table, or skiplist)
  - `uint64_t u64`: stores unsigned integers directly
  - `int64_t s64`: stores signed integers directly
  - `double d`: stores double values directly
- `dictEntry *next`: next entry in the same hash bucket
`dict`:

- `dictType *type`: pointer to a `dictType` structure defining type-specific functions for the hash table
  - Provides custom operations for keys and values, including hashing functions, key comparisons, and destruction methods
- `dictEntry **ht_table[2]`: array of two hash table pointers for incremental rehashing
  - `ht_table[0]`: active hash table
  - `ht_table[1]`: temporary hash table used during rehashing
- `unsigned long ht_used[2]`: number of used entries in each hash table
  - Tracks occupancy of `ht_table[0]` and `ht_table[1]`
  - Used to determine when resizing or rehashing is needed
- `long rehashidx`: tracks rehashing progress (-1 when no rehashing is active)
- `unsigned pauserehash`: 15-bit field indicating whether rehashing is paused, preserving consistency during bulk operations
- `unsigned useStoredKeyApi`: 1-bit flag for special key handling during hashing
  - Controls use of a specific stored hash function for keys
  - Used by Redis modules and internal optimizations
- `signed char ht_size_exp[2]`: array storing the hash table size exponents
  - Table size is `1 << exp`; for example, if `ht_size_exp[0] = 16`, then `ht_table[0]` has 2^16 = 65536 buckets
- `int16_t pauseAutoResize`: controls automatic hash table resizing
  - Prevents resizing during operations that require a fixed table size
  - Negative values indicate a coding error
- `void *metadata[]`: flexible array for hash table-specific metadata
If you're not familiar with hash tables, think of them as arrays where each element is called a bucket. A bucket can store a linked list or a tree, depending on the implementation. Hash tables use a hash function that converts a key into an integer, which serves as the index where the value will be stored.
When retrieving a value, the process is simple: hash the key, find the corresponding bucket, and search for the value within that bucket.
A common challenge with hash tables is hash collisions — when different keys produce the same hash result. This happens more frequently with fewer buckets. For instance, if you're storing 1 million key-value pairs in a hash table with only 1,000 buckets, collisions are inevitable, regardless of the hash function's quality. During a collision, new items join the linked list or tree in the affected bucket. As these collisions accumulate, bucket sizes grow, degrading retrieval time complexity from O(1) to O(n).
This creates a tradeoff: too few buckets lead to performance-degrading collisions, while too many buckets waste memory through unused space.
Redis solves this through rehashing. It begins with a modest number of buckets and tracks the load factor, defined as the number of stored entries divided by the number of buckets (`ht_used / table size`).
When the load factor exceeds 1, Redis expands the number of buckets and rehashes the existing data. The new hash table temporarily lives in `ht_table[1]` during this process; once rehashing completes, it becomes the new `ht_table[0]`. Rehashing is performed incrementally, a few buckets at a time, so the main thread is never blocked for long and the existing table keeps serving requests throughout.
By keeping the load factor at or below 1 and using a strong hash function, Redis maintains efficient lookups and updates while minimizing collisions.
- Fast lookup / update / insert: O(1) on average
- Dynamic resizing keeps hash collisions in check
- Memory overhead: extra storage needed for pointers and metadata
- Rehashing adds memory and processing overhead
- Inefficient for range queries
It is used extensively across Redis’s core functionality. Below are the main areas where dict is utilized:
- Main Key-Value Store: Stores all database keys and their associated values.
- Hashes: Used for field-value pairs when the hash exceeds ziplist thresholds.
- Sorted Sets: Maps elements to scores for O(1) lookups alongside skiplists for sorting.
- Pub/Sub: Tracks channel-to-client and client-to-channel subscriptions.
- Expiry Tracking: Maps keys to their expiration timestamps.
- Cluster Management: Tracks node metadata and slot ownership in cluster mode.
- Replication: Manages replica clients and synchronization backlogs.
- Lua Scripting: Tracks variables and accessed keys during script execution.
Redis dynamically switches internal data structures based on specific scenarios and rules to optimize memory usage and performance. However, Redis does not revert to a smaller data structure once it has upgraded to a larger one, even if the conditions for the smaller structure are met again.
Example: Hash Storage
- Ziplist: used for a hash as long as:
  - the number of entries stays within `hash-max-ziplist-entries`, and
  - each entry's size stays within `hash-max-ziplist-value` (64 bytes by default).
- Dict: if either threshold is exceeded, Redis switches to a dict.
No Reversion: Redis will not convert back to a ziplist, even if the hash becomes small again.
Real-World Impact
This behavior can lead to unexpected issues if not understood. For example, an issue at Grab arose partly due to engineers being unaware of this feature. You can read the full story here: Uncovering the Truth Behind Lua and Redis Data Consistency.
Understanding this behavior is crucial for designing efficient Redis usage patterns and avoiding potential pitfalls.
Redis’s internal data structures are the foundation of its exceptional speed, versatility, and efficiency. By understanding how Redis uses SDS for strings, quicklists for lists, and other structures like ziplists and skiplists, I hope this blog has shed some light on the engineering brilliance behind Redis’s design. These structures enable Redis to handle a wide range of use cases, from simple caching to complex real-time analytics.
That said, this is a personal exploration, and there might be some mistakes or oversights. If you spot anything that could be improved or corrected, I’d love to hear your feedback. Your contributions can help make this content even more valuable for others.
Redis is not just a tool—it’s a masterclass in the art of efficient data storage and retrieval. Thanks for reading, and let’s keep learning and exploring together!