TinyLFU: Highly Efficient Cache Admission Policy

Discover how TinyLFU improves cache efficiency by filtering out low-value items using approximate frequency tracking, optimizing both hit ratio and memory usage.

While reading a blog post from Redis about cache eviction strategies, I saw them mention a few common ones, including LRU and LFU, but there was another one I had never seen before: Window TinyLFU (W-TinyLFU). As described in the blog:

W-TinyLFU keeps the most relevant cache entries in the cache by considering both how recent and how frequently the data has been accessed. This cache eviction policy is especially beneficial in scenarios with varying access patterns and distributed caching environments.

It sparked my curiosity, and, yeah, that’s how this blog was born.

TinyLFU is a cache admission policy (answering the question: should I cache this new item?), not a cache eviction policy (answering: if I must, which item should I remove from the cache?). An admission policy can be combined with an eviction policy to improve both the hit ratio and the memory footprint of the cache.

We typically distinguish two request‐pattern scenarios:

  • Static distribution: Every key is accessed with roughly the same probability (1/N for N keys). There is no natural “hot” subset: all surviving items share similar counts once they’ve been seen.
  • Real-world distribution: Accesses follow a heavy-tailed pattern (e.g. Zipf or “80/20 rule”): a small fraction of keys account for the majority of requests.

The TinyLFU paper observes that, once the TinyLFU admission policy is added:

  • With static distributions: The performance gap (e.g. hit rate) between the various eviction policies becomes marginal.
  • With real-world distributions: A smarter eviction policy (e.g. SLRU or frequency-aware LRU) still provides some additional benefit, but it’s far smaller than the gains you get simply by filtering out one-hit items with TinyLFU.

LFU (Least Frequently Used) is a well-known cache eviction policy. It relies on the assumption that the more often an item is accessed, the more benefit we gain from caching it. So, with LFU, when we need to evict an item to reclaim cache space, we choose the one with the lowest access frequency.

LFU is based purely on the frequency count (how many times each item has been accessed). Items with the lowest access counts are evicted first. However, LFU has no built-in mechanism to “forget” old frequencies, leading to:

  • Old popular items never leave the cache: Items that were frequently accessed in the past but rarely accessed recently might remain in the cache too long because of their high frequency counts.
  • New items struggle to enter the cache: Even if a new item suddenly becomes popular, its frequency count is low compared to older items, so when eviction happens, it is likely among the first to be evicted.
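
To make these two problems concrete, here is a minimal, hypothetical in-memory LFU sketch in Python (the `LFUCache` name and structure are illustrative, not from any particular library). Notice that counts never age: an item that was hot long ago keeps its high count forever.

```python
class LFUCache:
    """Toy LFU cache: evicts the item with the lowest access count."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}    # key -> value
        self.counts = {}   # key -> access frequency (never decays!)

    def get(self, key):
        if key in self.items:
            self.counts[key] += 1
            return self.items[key]
        return None

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            # Evict the least frequently used item; ties broken arbitrarily.
            victim = min(self.counts, key=self.counts.get)
            del self.items[victim]
            del self.counts[victim]
        self.items[key] = value
        self.counts[key] = self.counts.get(key, 0) + 1
```

Because `counts` only ever grows, a formerly popular key will keep winning against any newcomer, which is exactly the aging problem described above.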

In order to decide which items should be evicted, we need to store some metadata. In LFU, that means:

  • Frequency counts (e.g. an integer or small byte counter) to track how many times each item has been accessed.

And, optionally (depending on the implementation):

  • Timestamp (to break ties if 2 items have the same frequency)
  • Pointers (in case we use heaps or buckets)

This can add significant memory consumption when using LFU.

📝 There are some notable LFU variations:

  • Perfect LFU (PLFU): It tracks the full history of all object accesses, even for objects not currently cached; hence, the frequency counts (and other metadata) are not embedded in the cache object.
    • By doing so, the frequency estimations across all objects are accurate.
    • Since it stores metadata even for objects that are not cached, it is deemed impractical and consumes a lot of memory.
  • In-memory LFU (ILFU): Only tracks access frequencies of cached objects. Once an object is evicted, its frequency is forgotten. This variant is more practical, but still suffers from the aging problem described above.
  • Window LFU (WLFU):
    • It caches the W most recent requests;
    • Cache object replacement is performed with LFU, using the object popularities within the window;
    • If two or more objects have the same frequency, LRU is used to identify the object to replace.

Below is the architecture of TinyLFU

1. Overview — figure 1

Remember: TinyLFU is an admission policy, which means it determines whether an item should be stored in the cache or not. The main cache can use any kind of eviction policy, not just LFU.

When a new item comes in:

  • If the cache has free space, the item is admitted immediately.
  • If the cache has no free space: we now have a cache victim (the cached item that would be evicted according to the eviction policy) and a candidate (the new item). TinyLFU compares their chances of producing cache hits using their estimated frequencies (we’ll go into detail in the next section). If estimated_freq(candidate) > estimated_freq(victim), TinyLFU tells the cache to evict the victim and admit the candidate; otherwise, the candidate is discarded.
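
The decision above fits in a few lines. Here is a hedged Python sketch, where `estimate` stands in for the frequency-sketch lookup described in the next sections (the function name is mine, not from the paper):

```python
def admit(candidate, victim, estimate):
    """Return True if the new item should replace the eviction victim.

    `estimate(key)` is assumed to return the item's approximate
    access frequency from a sketch such as a Minimal Increment CBF.
    """
    # Admit only when the candidate is expected to yield more hits
    # than the item the eviction policy would otherwise keep out.
    return estimate(candidate) > estimate(victim)
```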

TinyLFU needs to maintain frequency statistics over a sizable recent history. This data covers not only cached items but other requested items as well. The challenges we face with LFU therefore also apply to TinyLFU:

  • Freshness: We need a mechanism to keep the history recent and age out old items.
  • Memory overhead: Keeping frequency counts not only for cached items but also for historical ones consumes memory.

Approximate counting algorithms are techniques that let us count a large number of events using a very small amount of memory. The counter values returned by these algorithms are only approximate. There is a variety of algorithms for this; the Minimal Increment Counting Bloom Filter (Minimal Increment CBF), an augmented Counting Bloom Filter, is one of them.

A Bloom filter is a compact, probabilistic data structure used to test if an element might be in a set or definitely is not. A traditional implementation of this data structure is:

  • We have k hash functions and a bit array (aka sketch)
  • To add an item, it hashes the item through each function and sets the bits at the resulting positions in the array
    a. Bloom filter — figure 1
  • To check for membership, it hashes the item again and examines the corresponding bits:
    • If any bit is 0, the item is definitely not in the set
    • If all bits are 1, the item might be in the set: Because multiple keys can map to the same bit positions, those 1-bits may have been set by other items, producing a false positive.
      a. Bloom filter — figure 2
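
A toy Bloom filter along those lines might look like this. It is a sketch only: salted SHA-256 digests stand in for the k independent hash functions, while real implementations use much cheaper hashes.

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter sketch (illustrative, not production code)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [0] * num_bits

    def _positions(self, item):
        # Derive k bit positions from salted hashes of the item.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item):
        # False => definitely absent; True => possibly present
        # (false positives are allowed, false negatives are not).
        return all(self.bits[pos] for pos in self._positions(item))
```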

In a Counting Bloom Filter (CBF), the sketch no longer stores individual bits; each cell holds an integer counter that approximates an item’s access frequency. When adding an item to the CBF, the k hash functions locate k counters, and each is incremented by 1.

Why is the counter just an approximation? Because a single counter can be incremented by **multiple keys** → when you read that counter, you learn only that your key contributed to its value, not how much it contributed!

For example:

  • Counter value = 5
  • Key A has actually been accessed 3 times
  • Key B also hashes to the same slot and has added 2 more increments.

So the sketch reports 5, even though A’s true frequency is just 3.

Because the k hash functions can map a key to counters with different values, the CBF estimates that key’s frequency as the minimum of its k counters:

freq_estimate(item) = min(counter[h1(item)], …, counter[hk(item)])

Minimal Increment CBF is an optimized variant of the CBF. The main difference is that, instead of incrementing all k counters, this structure only increments the smallest counter(s), which reduces the overestimation caused by collisions.
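
Here is a minimal sketch of a Minimal Increment CBF, again assuming salted SHA-256 stands in for the k hash functions (names are illustrative):

```python
import hashlib

class MinimalIncrementCBF:
    """Sketch of a Minimal Increment Counting Bloom Filter."""

    def __init__(self, num_counters=1024, num_hashes=3):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.num_counters

    def add(self, item):
        positions = list(self._positions(item))
        smallest = min(self.counters[p] for p in positions)
        # Only bump the counters currently holding the minimum value;
        # a plain CBF would increment all k counters instead.
        for p in positions:
            if self.counters[p] == smallest:
                self.counters[p] += 1

    def estimate(self, item):
        # Frequency estimate = minimum of the k counters (may overestimate).
        return min(self.counters[p] for p in self._positions(item))
```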

This is the solution to the aging challenge. TinyLFU applies a method called reset. We maintain a global counter; every time we add an item to the sketch, we increment it. When this global counter reaches a value W (the sample size), we divide it and all counters in the sketch by 2.

It’s a cool trick that resolves the aging issue: even if an old item’s counters are high, if it is rarely accessed, its counters are likely to shrink at each reset operation. Hence, at some point, we can say it has been completely removed from the sketch.

Besides, this approach has a few interesting merits:

  • It does not require much extra space, since it only adds a global counter of log2(W) bits.
  • Low computational cost:
    • Even though dividing all counters sounds expensive, it happens rarely (only once every W insertions).
    • The division itself is fast: a bit shift (dividing by 2) is cheap in both hardware and software.
  • It preserves correctness: if an item has a higher frequency than others, its relative popularity is still the same after the reset (since the other counters are halved too!). The paper gives a mathematical proof of this in section 3.3.1, if you want to take a look.
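
The reset step can be sketched as follows (hypothetical names; the `counters` list stands in for the sketch array, and `state` holds the global sample counter):

```python
def record_and_maybe_reset(counters, state, w):
    """Count one insertion; halve everything once W samples are reached."""
    state["samples"] += 1
    if state["samples"] >= w:
        # Divide every counter (and the global counter) by 2 via bit shift.
        for i in range(len(counters)):
            counters[i] >>= 1
        state["samples"] >>= 1
```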

From the freshness mechanism above, we know that TinyLFU resets (divides counters by 2) every W requests. So the maximum value of one counter in the sketch array is W (say one item is requested in W consecutive requests); hence, the counters in the sketch array cost log2(W) bits each.

One observation is that, for a cache of size C, all items whose access frequency is above 1/C should be in the cache. Hence, for a given sample of size W, we can safely cap the counters at W/C, which means the maximum value of a counter is W/C.

→ We only need counters of log2(W/C) bits in the sketch. For example, with a sample size W = 16,000 and a cache of C = 1,000 entries, counters are capped at W/C = 16 and need only log2(16) = 4 bits each.

You might ask: “Wait, even if I only need log2(W/C) bits per counter, aren’t variables like int, long, or long long always fixed-size — say 32 or 64 bits?” You’re absolutely right. If you use a plain int[] or long[] to store frequency counts, each counter will occupy an entire word, even though you only need 3 or 4 bits.

But here’s the trick: you can pack multiple counters into a single word — and that’s exactly what efficient TinyLFU implementations do.

For instance, Caffeine uses a packed long[] array, where each 64-bit word holds 16 counters, each 4 bits wide. Bitwise operations are then used to efficiently read, increment, halve, and saturate these small counters.
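
A simplified illustration of that packing idea (the names and the saturation behavior below are assumptions in the spirit of Caffeine’s approach, not its actual code):

```python
COUNTER_BITS = 4
COUNTER_MASK = 0xF          # max value 15: the counter saturates here
WORD_MASK = (1 << 64) - 1   # keep results within a 64-bit word

def read_counter(word, index):
    """Read the 4-bit counter at `index` (0..15) from a 64-bit word."""
    return (word >> (index * COUNTER_BITS)) & COUNTER_MASK

def increment_counter(word, index):
    """Saturating increment: stop at 15 instead of overflowing a neighbor."""
    if read_counter(word, index) < COUNTER_MASK:
        word = (word + (1 << (index * COUNTER_BITS))) & WORD_MASK
    return word
```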

Reference:

Caffeine FrequencySketch.java

One challenge that arises is heavy-tailed workloads, in which a huge number of unique items are accessed only once (aka the long tail). In such scenarios, most frequency counters would be wasted on one-hit wonders that are unlikely to be requested again. Maintaining a detailed count for every unique item is memory-intensive and inefficient. That’s where the Doorkeeper comes into play.

The Doorkeeper in TinyLFU is implemented as a Bloom filter that fronts the main counting structure. It uses a fixed-size bit array and a few hash functions.

Each item’s hash bits serve as a single-bit counter (presence / absence).

Here’s how it works:

On each item request, TinyLFU performs the following steps:

  • Check the Doorkeeper: If the item is not present in the Doorkeeper, it’s treated as a first-time request. TinyLFU inserts the item into the Doorkeeper (setting its bits) and does not insert it into the main counting structure.
  • If the item is already in the Doorkeeper, TinyLFU promotes it to the main frequency sketch, updating the approximate count in the Minimal Increment CBF.
  • When TinyLFU needs to estimate the item’s frequency (for making the admission decision), it consults both structures. If the item is found in the Doorkeeper, its estimated frequency = the counter in the main structure + 1. We need to add one because the item’s first hit was not recorded in the main structure!
  • In the reset operation, TinyLFU clears the Doorkeeper in addition to halving all counters in the main structure.
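
The steps above can be sketched as a toy version, where a Python set stands in for the Doorkeeper’s Bloom filter and a dict for the Minimal Increment CBF (both are assumptions for readability):

```python
class DoorkeeperSketch:
    """Toy Doorkeeper + main sketch, using exact stand-in structures."""

    def __init__(self):
        self.doorkeeper = set()   # stand-in for a Bloom filter
        self.main = {}            # stand-in for the main counting sketch

    def record(self, item):
        if item not in self.doorkeeper:
            # First sighting: remember it in the Doorkeeper only.
            self.doorkeeper.add(item)
        else:
            # Seen before: count it in the main frequency sketch.
            self.main[item] = self.main.get(item, 0) + 1

    def estimate(self, item):
        count = self.main.get(item, 0)
        # The first hit was absorbed by the Doorkeeper, so add it back.
        return count + 1 if item in self.doorkeeper else count

    def reset(self):
        # Aging: clear the Doorkeeper and halve the main counters.
        self.doorkeeper.clear()
        self.main = {k: v >> 1 for k, v in self.main.items()}
```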

By using the Doorkeeper, we reduce the memory consumption of the main structure, since only the counters of items worth tracking live there.

False positive

A Bloom filter can return false positives: it can occasionally report that an item was seen before when in fact it wasn’t, because the filter’s bits for the new item happen to all be set due to other items’ insertions. In that case, TinyLFU would prematurely promote the item to the main structure and increment a counter for it. The impact is that some truly new (infrequent) items might be introduced into the main structure even though they have no prior access.

However, the false positive rate can be kept low by sizing the Bloom filter appropriately, so these cases are rare. And even when a false positive occurs, its impact is relatively small in the context of cache admission decisions, since that item’s frequency counters would not be large. At the end of the day, TinyLFU’s admission still compares the new item’s frequency against a victim’s frequency; an extra +1 usually doesn’t overturn the decision unless the counts were very close.

TinyLFU has some limitations:

  • Lack of recency awareness: TinyLFU focuses purely on frequency. A fresh new item starts with a low frequency count, so TinyLFU might reject it in favor of a slightly older item with a higher count. That new item can be evicted immediately, even if it might be relevant or re-requested in the near future.
  • “Sparse Burst” problem: Even if a new item is accessed repeatedly in a short time, as long as it hasn’t yet accumulated enough count, TinyLFU keeps rejecting it until its frequency count is large enough, causing repeated cache misses for that item.

W-TinyLFU is a cache management algorithm designed to optimize hit rates in environments with skewed or dynamic access patterns. It combines frequency-based admission policies with recency-aware caching strategies, addressing limitations of traditional methods like LRU and LFU.

Here’s the W-TinyLFU architecture:

1. Architecture — figure 3

W-TinyLFU adds a recency buffer (window) so that newly accessed items get a chance to stay in the cache briefly, even with a low frequency. It uses LRU for this window, which resolves the “Sparse Burst” problem.

W-TinyLFU organizes the cache into two distinct regions: a small LRU-based window and a larger main cache managed by TinyLFU’s admission logic.

  • Window Cache (LRU): A small portion of the cache (e.g. 1%–20% of capacity) is reserved as an LRU cache for newly arrived items. This window has no admission test and uses pure LRU.
  • Main cache (TinyLFU + SLRU): The majority of the cache (~80–99% of capacity) is allocated to the main cache, which is governed by the TinyLFU admission policy combined with a Segmented LRU (SLRU) eviction strategy. TinyLFU’s role here is to decide which items get to enter or stay in the main cache based on their estimated access frequency (using a compact frequency sketch). The main cache is typically implemented as an SLRU, meaning it is split into two LRU sub-sections:
    • A probationary section (often denoted A1) for items that have been admitted recently but not yet proven “hot”. New items entering the main cache start here.
    • A protected section (A2) for items that have been accessed again while in A1. The protected segment is given the majority of the space (commonly 80% of the main region) and holds long-term hot entries.

  1. Admission to Window: On a cache miss, new items are first inserted into the window cache (an LRU segment) without any admission test. Their access is also recorded in the TinyLFU frequency sketch.

  2. Eviction from Window: When the window is full, the least-recently-used item is evicted. Instead of discarding it immediately, it becomes a candidate for entering the main cache.

  3. Admission Test:

    If the main cache is not full yet, the candidate is added to the main cache immediately.

    Otherwise: The candidate’s frequency is compared to the frequency of the main cache’s weakest item (typically from the probation segment A1).

    • If the candidate’s frequency is higher, it is admitted to main and replaces the weaker item.
    • Otherwise, the candidate is rejected and evicted completely.
  4. Promotion in Main Cache (SLRU):

    Once admitted to the main cache, the item starts in the probationary segment (A1).

    If it is accessed again, it gets promoted to the protected segment (A2), gaining stronger protection against eviction.

    • If A2 is already full, the least-recently-used item in A2 is demoted to A1
      • If A1 is then full, the least-recently-used item in A1 is discarded completely
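
Putting the pieces together, here is a compact, hypothetical sketch of the W-TinyLFU flow. `OrderedDict`s stand in for the LRU segments, `estimate` for the TinyLFU frequency sketch; all names and sizes are assumptions for illustration, not Caffeine’s actual implementation.

```python
from collections import OrderedDict

class WTinyLFU:
    """Toy W-TinyLFU: LRU window + SLRU main cache + admission test."""

    def __init__(self, window_size, probation_size, protected_size, estimate):
        self.estimate = estimate
        self.window = OrderedDict()     # LRU window; oldest entry first
        self.probation = OrderedDict()  # SLRU probationary segment (A1)
        self.protected = OrderedDict()  # SLRU protected segment (A2)
        self.window_size = window_size
        self.probation_size = probation_size
        self.protected_size = protected_size

    def on_access(self, key):
        if key in self.window:
            self.window.move_to_end(key)          # LRU touch
        elif key in self.protected:
            self.protected.move_to_end(key)
        elif key in self.probation:
            # Second hit in main: promote A1 -> A2.
            del self.probation[key]
            self.protected[key] = True
            if len(self.protected) > self.protected_size:
                demoted, _ = self.protected.popitem(last=False)
                self._insert_probation(demoted)   # demotion, no admission test
        else:
            # Miss: new items always enter the window first.
            self.window[key] = True
            if len(self.window) > self.window_size:
                candidate, _ = self.window.popitem(last=False)
                self._try_admit(candidate)

    def _try_admit(self, candidate):
        # TinyLFU admission test against the main cache's weakest item.
        if len(self.probation) >= self.probation_size:
            victim = next(iter(self.probation))   # LRU of A1
            if self.estimate(candidate) <= self.estimate(victim):
                return                            # candidate discarded
            del self.probation[victim]
        self.probation[candidate] = True

    def _insert_probation(self, key):
        self.probation[key] = True
        if len(self.probation) > self.probation_size:
            self.probation.popitem(last=False)    # evict A1's LRU entirely
```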

I did a simple implementation of W-TinyLFU; you can check it out here:

https://github.com/trunghieu99tt/wtinylfu

Tagged:#Backend