Roaring Bitmaps: The Powerful Bitmap Compression Format

A bitmap, also known as a bit array or bitset, is a data structure used to efficiently represent a set of distinct integers, particularly useful in scenarios where the range of the data is limited. It is an array of bits where each bit corresponds to an integer in the set, indicating whether the integer is present (1) or absent (0).

Bitmaps are particularly advantageous due to their fast and straightforward operations:

Setting a bit: This operation marks a particular integer as present in the set (set the bit to 0)
Clearing a bit: This corresponds to removing an integer from the set (set the bit to 1)
Testing a bit: This checks whether a specific integer is included in the set (check whether the bit is 0 or 1)

Due to their binary nature, bitmaps allow for rapid bitwise operations such as AND, OR, and XOR, which can efficiently perform set operations like intersections, unions, and symmetric differences.

You might wonder, okay, then what’s its usage?

There are many practical uses for bitmaps. Some of them include:

Database Indexing: Bitmaps are particularly useful in databases for indexing and quickly querying data. They are used to create bitmap indexes where each bit in a bitmap corresponds to a potential row in the database, and the bit's value (0 or 1) indicates whether a specific attribute or condition holds true for that row. This allows for fast retrieval, updates, and management of large datasets, especially when performing complex queries involving multiple conditions.
Data Analytics: Bitmaps are employed in data analytics to perform fast aggregation operations. For instance, they can help in counting the number of users who have a particular set of characteristics or behaviors by using bitwise operations to intersect or union multiple bitmaps, each representing a different characteristic.
Content Filtering: Search engines and content filters use bitmaps to store flags for filtering criteria, enabling them to quickly exclude or include content in search results based on user settings or regulatory requirements.

A bitmap is a great choice, not only for its performance but also for its storage efficiency, as its elements are just bits.

However, there are cases where bitmap compression is necessary.

The bitmap will set the n-th bit in case you add the number n to that bitmap.

Assume that we have an empty bitmap, then add a number 1 and number 8,000,000 to that bitmap. In this case, the bitmap will allocate 1**,000,008** bytes, then set the 8,000,000 th bit to 1

You see? we only want 2 numbers, but it has to allocate 1MB!! it’s way too many than what we actually need.

That’s why some bitmap compression formats were born. Some popular bitmap compression format are:

Oracle’s BBC is an obsolete format at this point; though it may provide good compression, it is likely much slower than more recent alternatives due to excessive branching.
WAH is a patented variation on BBC that provides better performance.
Concise is a variation on the patented WAH. In some specific instances, it can compress much better than WAH (up to 2x better), but it is generally slower.
EWAH is both free of patent and faster than all the above. On the downside, it does not compress quite as well. It is faster because it allows some form of “skipping” over uncompressed words. So though none of these formats are great at random access, EWAH is better than the alternatives.

All these bitmap compression formats share a common pain point: they do not support random access. This means that to check if a given value is present in the set, you must start from the beginning and "uncompress" the entire thing. Consequently, intersecting a large set with another big set requires decompressing the entire big set in the worst case.

That's where Roaring Bitmap comes in.

Roaring Bitmap not only outperforms these formats in terms of compression ratio but also addresses the random access issue. Let's explore the core concept behind Roaring Bitmap in the next section.

Instead of storing a gigantic bitmap, it partitions the range of 32-bit indexes ([0, n)) into chunks of $2^{16}$ integers sharing the same 16 most significant digits with the following characteristics:

There is $2^{16}$ chunks in total
Each chunk contains at most $2^{16}$ consecutive integers
The value in each chunk can be either bit or actual integer, based on the type of chunk, aka container
Chunk are called “containers.”

In the implementation, roaring bitmap is a key-value data structure.

The key is made of shared 16 bits.
The value is the container

In the actual implementation, the key-value is implemented as 2 arrays. Those arrays expand/shrink dynamically when there are insertions or deletions.

The key which makes Roaring Bitmap efficient in both performance and storage is the flexible container type. There are 3 types of containers, based on the number of integer in each container.

Each container has information about the cardinality—the number of integers in that container (except for run-length container) and numbers it stores

Array container: if there are ≤ 4096 integers, Roaring Bitmap uses a sorted array of packed 16-bit integers. The size of array container is calculated by formula: $2c + 2$ bytes where c is the cardinality.

Bitmap: In case there are > 4096 integers. Roaring bitmap uses bitmap of size $2^{16}$ . The size of bitmap container is always 8192 bytes.

https://www.vikramoberoi.com/a-primer-on-roaring-bitmaps-what-they-are-and-how-they-work/

Run container: a packed array of pair of 16-bit integers, the first value of each pair is the starting value, the second value is the length of the run (the number of contiguous values from the starting value). For example, we would store the values 10, 11, 12, 13, and 14 as the pair 10, 5. The size of run container is $2 + 4r$ where r is number of runs

One special note is that you might notice that the container (no matter what type it is) is always less than or equal to 8kB. This makes the container fit in the L1 CPU cache of most processors; this also boosts performance since we can utilize the CPU cache, especially for hot containers.

Starting from empty Roaring Bitmap, when a new value is added, an array container is created. If the inserted value makes the cardinality exceed 4096, the container is transformed to a bitmap container. On the other hand, if a value is removed from a bitmap container so that the cardinality falls to 4096, then it’s transformed into array container. If the array container is empty, it will be removed entirely from the top key-value store.

The run container is special case; it’s usually created upon request by calling the runOptimize function, which means that, by default, the Roaring Bitmap always switch back and forth between array container and bitmap container based on the cardinality of the container. But there are some special cases where the Roaring Bitmap will transform the container into a running container. The rule for this case is quite complex so I think that if you really want to dive deep into it, you can find the information on page 6 and 7 of this paper

To check for the presence of a 32-bit integer x, we first seek for the container corresponding to that number using its most 16 significant bits $x/2^{16}$

If the container is not exist, x is not in Roaring Bitmap

If the container is:

Bitmap: we just need to check if the bit at $x\%2^{16}$ is set
Array: we use binary search to find $x \% 2^{16}$ in sorted, packed array

we first seek for the container corresponding to that number using its most 16 significant bits $x/2^{16}$

If the container does not exist, create a new array container and add that number to the container

If the container exists,

Bitmap container: Set the bit at $x\%2^{16}$
Array container: Insert $x\%2^{16}$ to the array (make sure the array is still sorted after insertion)

To intersect Roaring bitmap A and B, we only need to intersect the containers that appear in both A and B. The intersect strategy is based on the type of matching container

Bitmap/Bitmap: Compute the bitwise AND of 2 bitmaps, then use bitCount function to get the result cardinality, If the cardinality exceeds 4096, we materialize the new bitmap using the result of AND; otherwise, create an array container
Bitmap/Array: Iterate over the array, checking if the bit of that integer is set; if yes, add it to resulting array container. The result of intersection of bitmap and array is always array container
Array/Array: The algorithm used to compute the intersection varies by a cardinality heuristic described at the bottom of page 5 here. It will either be a simple merge (as used in merge sort) or a galloping intersection, described in this paper.
Run/Run: The union algorithm creates a new run container with its capacity set to the sum of the runs in both containers.
- Iterates through each container, appending runs based on their starting points, either as new runs or extensions of existing runs.
- remaining runs in one container are appended after the other is exhausted.
- The resulting container may convert to a bitmap if it has too many runs.
Run/Array: The result of the intersection is always an array container, as it cannot exceed the cardinality of the array container. The algorithm iterates over the array values, try adding it to the run if possible, if, after adding, some runs overlap, we merge them into one (or more) runs.
Run/Bitmap: The intersection starts by checking the cardinality of the run container.
- If 4,096 or less, iterate over integers in the run, checking for their presence in the bitmap container and appending them to a new array container.
- Otherwise, clone the bitmap container and clear bits outside the run, converting to an array if necessary.

To combine Roaring Bitmaps A and B, combine all matching containers from A and B. The union algorithms depend on the types of containers involved, and the resulting container type is determined accordingly:

Bitmap: Perform a bitwise OR operation between the two bitmaps. The result will always be a new bitmap container.
Bitmap / Array: Copy the bitmap container and set bits corresponding to all integers in the array container. The result will always be a new bitmap container.
Array: If the combined cardinality of both array containers is 4,096 or less, create a new array container and add all integers from both arrays. Otherwise, create a new bitmap container, set bits for all integers in both arrays, and convert it back to an array container if its cardinality is 4,096 or less.
Run/Run: The union algorithm creates a new run container with its capacity set to the sum of the runs in both containers.
- Iterates through each container, appending runs based on their starting points, either as new runs or extensions of existing runs.
- remaining runs in one container are appended after the other is exhausted.
- The resulting container may convert to a bitmap if it has too many runs.
Run/Array: The union between a run and array container can lead to different outcomes.
- Treating both containers as runs, iterating over integers, and deciding between array or bitmap based on the resulting container.
- Alternatively, treat the array container as a series of single-run elements and follow the union heuristic for run containers.
Run/Bitmap: The union is achieved by cloning the bitmap container and setting bits corresponding to integers in the run container. If needed, it may convert to a run container for more efficient representation.

There is a lot of theory so far, and I think that it’s time we can have some real-world application of using Roaring Bitmap. Actually, Roaring bitmaps are used in many big tech companies, such as Google, Netflix, Microsoft, etc., as listed in their homepage

Grab shared their solution in Grab's Segmentation Platform, which made use of Roaring Bitmap, in this blog

In grab, the segment is a sub-group of passengers, driver-partners or merchant-partners. Checking whether a user belongs to a segment is a crucial part in many critical flows in Grab, so that they need to make it as fast as possible. The segment-user relationship is stored in ScyllaDB cluster, for example:

https://engineering.grab.com/streamlining-grabs-segmentation-platform

There are pain points:

Creating new segment with a lot of users creates a lot of queries to the database
The latency for checking if user is within a set of segments does not satisfy the requirements from other consumer teams.

After researching, they decided to use Roaring Bitmap, so each Segment is a Roaring Bitmap. Since the user id is integer, it’s very easy to adopt Roaring Bitmap. Now, service teams are able to perform set operations (union, interaction, and difference, etc) on the segments on the fly, no need to query to the database. The creation time is also reduce since they just need to create Roaring Bitmap instead of write to the database. The Segment Roaring Bitmap is store in a Blob store.

To mitigate the maintaining cost and minimize the latency, the developer team developed an SDK that handles the retrieval, caching, watching of segment.

The result is that: The communication platforms is able to perform membership checks on multiple multi-million member segments, achieving peak QPS 15K/s with a p99 latency of <1ms

https://roaringbitmap.org/about/

https://www.vikramoberoi.com/a-primer-on-roaring-bitmaps-what-they-are-and-how-they-work/

https://arxiv.org/pdf/1402.6407

https://arxiv.org/pdf/1603.06549

https://github.com/RoaringBitmap/RoaringBitmap

https://engineering.grab.com/streamlining-grabs-segmentation-platform

https://www.baeldung.com/java-roaring-bitmap-intro