Tired of the slowness of map ? Go for Policy Hash Table!

Back story: Before aware of Hash Table, I mostly use std::map and std::unordered_map to solve problems. They work fine (in most cases), sometimes if they're too slow, I have to find another solution or use another data structure (like vector or array). But there's a case when I need the key, value type so that's when the problem came, After researching on Google, I found a new tool and they are what I'm going to show you guys today

It's the Policy Hash Table.

Policy Hash Table belongs to Policy-Based Data Structure - is a family of data structures that are not part of the standard C ++ library. Maybe many of you will be quite new to these data structures, but this family provides data structures with amazing power that I believe after learning about it you will be persuaded and use it as a powerful tool in solving problems especially in competitive programming.

The Policy Hash Table has a remarkable power of 3 to 6 times faster for add and deletes instructions, 4 to 10 times for reading/writing. Sounds crazy, right? Below are benchmarks comparing the processing speed of the Policy Hash Table with unordered_map.

You can find the benchmarks code here

As I mentioned before, Policy Based Data Structure is not standard library of C++ so in order to use it, you must define these 2 lines in the beginning of your cpp file:

#inlude<ext/pb_ds/asso_container.hpp>
using namespace __gnu__pbds;

#inlude<ext/pb_ds/asso_container.hpp>
using namespace __gnu__pbds;

Declaration: The declaration of Hash Table in Policy Based Data Structure is similar to std::map, std::unordered_map, it contains key, value

gp_hash_table<int,int> table;

gp_hash_table<int,int> table;

Look at the benchmarks you can see how powerful Policy Hash Table is, but it's only theory, right?

So let's practice it with real problems.

Ex1:

Problem

Solution using map: (TLE on 10th test) - Link
Solution using unordered_map: (AC: 1075ms) - Link
Solution using gp_hash_table (AC: 405ms) - Link

Ex2:

Problem

Solution using unodered_map: (AC: 0.07s) Link
Solution using gp_hash_table: (AC: 0.07s) Link

As you can see, gp_hash_table is remarkably faster (it runs 2.5x faster in the first example compared to unordered_map)

But you guys may wonder if there is any case that can show the power of gp_hash_table more clearly, some cases that we can't use map (or even unordered_map). So here you go:

Ex3:

Problem

Soltuion using unordered_map: (TLE on 8th test) - Link
Solution using gp_hash_table: (AC: 3105ms) - Link

I know one example is not enough, but I'm sure that there are so many other cases where gp_hash_table can show you how power it is over map or unordered_map (I've been through some cases like that, but I can't find those example to show to you guys. My bad 😢)

Okay, we've seen the power of Policy Hash Table. So the question now is: Does it have any weakness? The answer is: Yes. Definitely.

Here is a example:

Problem

Solution using gp_hash_table: (TLE on 20th test) - Link
Solution using unordered_map (TLE on 18th test) - Link
Solution using map (AC - 234ms) - Link

I was very surprised by the fact that we can get AC with std::map.

But why we get that result ?

Policy Hash Table - as its name is built using the hash function. It will hash its keys using some hash functions so if someone knows these functions, they can easily build test cases which contain hashing collides that blow up our hash function. Therefore make the performance of Policy Hash Table (and unordered_map too). I'm not going into details about it. You can read it here for more details.

So basically, people can easily hack our solution by generating ant-hash test cases which have collisions offline and blow up the complexity of our hashmap. That's the case when our Policy Hash Table (or even unordered_map) will be the worst choice. But anytime a problem comes, a solution will come with it. 😌

We can defeat anti-hash tests by defining our hash function. Or there's a better solution. You just need to write your custom hash operator.

Like this:

const int RANDOM = chrono::high_resolution_clock::now().time_since_epoch().count();
struct chash {
    int operator()(int x) const { return x ^ RANDOM; }
};
gp_hash_table<key, int, chash> table;

const int RANDOM = chrono::high_resolution_clock::now().time_since_epoch().count();
struct chash {
    int operator()(int x) const { return x ^ RANDOM; }
};
gp_hash_table<key, int, chash> table;

Or:

const int RANDOM = chrono::high_resolution_clock::now().time_since_epoch().count();
struct chash 
{
	int operator()(int x) const {return x^RANDOM);}
}

const int RANDOM = chrono::high_resolution_clock::now().time_since_epoch().count();
struct chash 
{
	int operator()(int x) const {return x^RANDOM);}
}

struct custom_hash
{
    static uint64_t splitmix64(uint64_t x)
    {
        // http://xorshift.di.unimi.it/splitmix64.c
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }

    size_t operator()(uint64_t x) const
    {
        static const uint64_t FIXED_RANDOM = chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

struct custom_hash
{
    static uint64_t splitmix64(uint64_t x)
    {
        // http://xorshift.di.unimi.it/splitmix64.c
        x += 0x9e3779b97f4a7c15;
        x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9;
        x = (x ^ (x >> 27)) * 0x94d049bb133111eb;
        return x ^ (x >> 31);
    }

    size_t operator()(uint64_t x) const
    {
        static const uint64_t FIXED_RANDOM = chrono::steady_clock::now().time_since_epoch().count();
        return splitmix64(x + FIXED_RANDOM);
    }
};

Among above hash functions, I prefer the 3rd one because it's the one which I usually use and it's the most stable one. But it's longer and harder to remember compared to other hash functions.

How to use ?

gp_hash_table<int,int, custom_hash> [name_here];

gp_hash_table<int,int, custom_hash> [name_here];

It's really easy, right ?

So it's time to revenge. Yay

Solutions for Problem

Solution using 1st hash operator: (AC - 1715ms) - Link
Solution using 2nd hash operator: (AC - 156ms) - Link

Currently, Policy Hash Table only supports primitive data type as key. It means if you want to use an advanced data type like pair<..> you will have to write your hashing operator for it.

Like this:

// using pair<int,int> as key
struct chash_key
{
    int operator()(pii x) const { return x.first * 31 + x.second; }
};

// using pair<int,int> as key
struct chash_key
{
    int operator()(pii x) const { return x.first * 31 + x.second; }
};

🗒️

Another thing that you might wonder about is that in the benchmarks, there is another hash table cc_hash_table but I don't mention it because cc_hash_table and gp_hash_table use different hash functions. cc_hash_table uses chained hash tables while gp_hash_table use open addressed hash tables. You can check out this Link for more details. gp_hash_table sacrifices ~10% reading/writing speed to gain 3-6x in insertion/deletion/clearing

Okay, I've given you guys a brief introduction to Policy Hash Table, I think they're very interesting and useful. Since I knew them, I rarely use map, unordered_map. Moreover, Policy-Based Data Structures also has a few more data structures which are cool, you can check them out

So see you next time. Bye bye.

Tired of the slowness of map ? Go for Policy Hash Table!

1. Overview

2. How to use

3. Example

4. Weakness

4.1. Anti-hash tests

4.2. Lack of non-primitive key type

Summary