Advanced: SIMD-Accelerated Block Filter

The repository includes an additional high-performance filter implementation, SimdBlockFilter, located in src/simd-block.h. This is not a Cuckoo Filter but rather a highly optimized block Bloom filter included for performance comparison in the benchmarks.

Note: This filter requires a CPU that supports AVX2 (Advanced Vector Extensions 2) instructions.

Overview

SimdBlockFilter is based on the paper "Cache-, Hash- and Space-Efficient Bloom Filters" by Putze et al., with some modifications:

  1. Block-Based: The filter is divided into 32-byte blocks (or "buckets").
  2. Split Bloom Filter: Each block is a split Bloom filter, meaning each hash function maps to a separate portion of the block. In this implementation, each block is split into eight 32-bit lanes.
  3. SIMD-Optimized: The core operations (Add and Find) are implemented using AVX2 intrinsics. Instead of hashing an item multiple times, a single 64-bit hash value is used to derive 8 sub-hashes, which are processed in parallel to set or check bits across the 8 lanes of a 256-bit AVX register. This leads to extremely high throughput.

Usage

SimdBlockFilter is not integrated into the main CuckooFilter class. It is used as a standalone filter in the benchmark suite.

Here is a simplified example of its usage from benchmarks/bulk-insert-and-query.cc:

#include "simd-block.h"
#include <cmath>
#include <climits>

// Constructor takes the log2 of the desired heap space.
size_t add_count = 1000000;
// Allocate roughly add_count * 8 bits of space.
SimdBlockFilter<> filter(ceil(log2(add_count * 8.0 / CHAR_BIT)));

// Add an item
filter.Add(12345);

// Check for an item
bool found = filter.Find(12345); // returns true

Performance

The primary reason to consider a filter like this is for its exceptional performance. As seen in the output of the bulk-insert-and-query benchmark, the SimdBlock8 filter can achieve significantly higher insertion and lookup rates than the Cuckoo Filter implementations.

                  Million    Find
                 adds/sec      0% ...
     Cuckoo12       23.78   37.24 ...
   SimdBlock8       74.22   72.34 ...

In this example, SimdBlock8 is over 3x faster for insertions and 2x faster for lookups compared to a standard Cuckoo Filter.

When to Consider SimdBlockFilter

  • When your application's bottleneck is the speed of the probabilistic set.
  • When your target hardware is guaranteed to support AVX2.
  • When you do not require the ability to delete items, as SimdBlockFilter, like a standard Bloom filter, does not support deletion.