Advanced: SIMD-Accelerated Block Filter
The repository includes an additional high-performance filter implementation, SimdBlockFilter
, located in src/simd-block.h
. This is not a Cuckoo Filter but rather a highly optimized block Bloom filter included for performance comparison in the benchmarks.
Note: This filter requires a CPU that supports AVX2 (Advanced Vector Extensions 2) instructions.
Overview
SimdBlockFilter
is based on the paper "Cache-, Hash- and Space-Efficient Bloom Filters" by Putze et al., with some modifications:
- Block-Based: The filter is divided into 32-byte blocks (or "buckets").
- Split Bloom Filter: Each block is a split Bloom filter, meaning each hash function maps to a separate portion of the block. In this implementation, each block is split into eight 32-bit lanes.
- SIMD-Optimized: The core operations (
Add
andFind
) are implemented using AVX2 intrinsics. Instead of hashing an item multiple times, a single 64-bit hash value is used to derive 8 sub-hashes, which are processed in parallel to set or check bits across the 8 lanes of a 256-bit AVX register. This leads to extremely high throughput.
Usage
SimdBlockFilter
is not integrated into the main CuckooFilter
class. It is used as a standalone filter in the benchmark suite.
Here is a simplified example of its usage from benchmarks/bulk-insert-and-query.cc
:
#include "simd-block.h"
#include <cmath>
#include <climits>
// Constructor takes the log2 of the desired heap space.
size_t add_count = 1000000;
// Allocate roughly add_count * 8 bits of space.
SimdBlockFilter<> filter(ceil(log2(add_count * 8.0 / CHAR_BIT)));
// Add an item
filter.Add(12345);
// Check for an item
bool found = filter.Find(12345); // returns true
Performance
The primary reason to consider a filter like this is for its exceptional performance. As seen in the output of the bulk-insert-and-query
benchmark, the SimdBlock8
filter can achieve significantly higher insertion and lookup rates than the Cuckoo Filter implementations.
Million Find
adds/sec 0% ...
Cuckoo12 23.78 37.24 ...
SimdBlock8 74.22 72.34 ...
In this example, SimdBlock8
is over 3x faster for insertions and 2x faster for lookups compared to a standard Cuckoo Filter.
When to Consider SimdBlockFilter
- When your application's bottleneck is the speed of the probabilistic set.
- When your target hardware is guaranteed to support AVX2.
- When you do not require the ability to delete items, as
SimdBlockFilter
, like a standard Bloom filter, does not support deletion.