Usage Guide & Core Concepts
This guide covers the primary API of the Cuckoo Filter and the core concepts you need to understand to use it effectively.
Core API
The CuckooFilter
class provides a simple interface for adding, checking, and deleting items.
#include "cuckoofilter.h"
// All operations return a status code
enum Status {
Ok = 0,
NotFound = 1,
NotEnoughSpace = 2,
NotSupported = 3,
};
-
Status Add(const ItemType &item)
Adds an item to the filter. ReturnsOk
on success. If the filter is full and can't find a space after multiple attempts to move existing items, it will place the item in a temporary "victim cache" and returnOk
. If the victim cache is already occupied, it returnsNotEnoughSpace
. -
Status Contain(const ItemType &item) const
Checks for the presence of an item. ReturnsOk
if the item is likely in the set andNotFound
if it is definitely not. Due to the probabilistic nature of the filter, this can returnOk
for an item that was never added (a false positive). -
Status Delete(const ItemType &item)
Removes an item from the filter. ReturnsOk
on success andNotFound
if the item was not found. Important: You must ensure that the item being deleted is actually in the filter. Deleting a non-existent item can inadvertently remove a different item that happens to have the same fingerprint and hash to the same locations. -
size_t Size() const
Returns the total number of items currently stored in the filter. -
size_t SizeInBytes() const
Returns the total memory size of the filter's internal hash table in bytes.
Instantiation and Configuration
You create a filter by specifying the item type, bits per item, and maximum capacity. Additional template parameters allow for advanced configuration.
// Template parameters:
// CuckooFilter<ItemType, bits_per_item, TableType, HashFamily>
// Example: Filter for up to 1 million uint64_t items, using 12 bits per fingerprint.
CuckooFilter<uint64_t, 12> filter(1000000);
Template Parameters
-
ItemType
: The data type of items to be stored (e.g.,int
,uint64_t
,std::string
). The library's hasher must be able to process this type. -
bits_per_item
: The number of bits used for each item's fingerprint. This is the most critical parameter for tuning.- More bits: Lower false positive rate (FPR), but higher memory usage.
- Fewer bits: Higher FPR, but lower memory usage.
The theoretical lower bound on bits per item for a given FPR
ε
islog2(1/ε)
. The cuckoo filter requires slightly more space, typicallylog2(1/ε) + log2(2b/α)
whereb
is bucket size andα
is load factor.
-
TableType
(Optional): Specifies the internal storage mechanism. It defaults tocuckoofilter::SingleTable
. An alternative iscuckoofilter::PackedTable
, which offers better space efficiency through a technique called semi-sorting. See the Advanced Topics for details. -
HashFamily
(Optional): Specifies the hash function family to use. It defaults tocuckoofilter::TwoIndependentMultiplyShift
, a fast and effective universal hash function. Other options likeSimpleTabulation
are also available insrc/hashutil.h
.
How It Works: Cuckoo Hashing and Fingerprints
When you Add(item)
, the filter performs these steps:
- Hashing: The item is hashed to generate two potential bucket indices (
i1
,i2
) and a fingerprint (tag
). - Insertion Attempt: The filter tries to place the
tag
in an empty slot in bucketi1
ori2
. - Cuckoo Kicks: If both buckets are full, it randomly evicts an existing item from one of the buckets, moves it to its alternate location, and repeats the process. This chain of evictions is called a "cuckoo kick."
- Victim Cache: If a kick chain becomes too long (defaulting to 500 kicks), the filter gives up and places the last displaced item into a single-slot "victim cache". This prevents insertion failures at high load factors.