Usage Guide & Core Concepts

This guide covers the primary API of the Cuckoo Filter and the core concepts you need to understand to use it effectively.

Core API

The CuckooFilter class provides a simple interface for adding, checking, and deleting items.

#include "cuckoofilter.h"

// All operations return a status code
enum Status {
  Ok = 0,
  NotFound = 1,
  NotEnoughSpace = 2,
  NotSupported = 3,
};
  • Status Add(const ItemType &item) Adds an item to the filter. Returns Ok on success. If the filter is full and can't find a space after multiple attempts to move existing items, it will place the item in a temporary "victim cache" and return Ok. If the victim cache is already occupied, it returns NotEnoughSpace.

  • Status Contain(const ItemType &item) const Checks for the presence of an item. Returns Ok if the item is likely in the set and NotFound if it is definitely not. Due to the probabilistic nature of the filter, this can return Ok for an item that was never added (a false positive).

  • Status Delete(const ItemType &item) Removes an item from the filter. Returns Ok on success and NotFound if the item was not found. Important: You must ensure that the item being deleted is actually in the filter. Deleting a non-existent item can inadvertently remove a different item that happens to have the same fingerprint and hash to the same locations.

  • size_t Size() const Returns the total number of items currently stored in the filter.

  • size_t SizeInBytes() const Returns the total memory size of the filter's internal hash table in bytes.

Instantiation and Configuration

You create a filter by specifying the item type, bits per item, and maximum capacity. Additional template parameters allow for advanced configuration.

// Template parameters:
// CuckooFilter<ItemType, bits_per_item, TableType, HashFamily>

// Example: Filter for up to 1 million uint64_t items, using 12 bits per fingerprint.
CuckooFilter<uint64_t, 12> filter(1000000);

Template Parameters

  1. ItemType: The data type of items to be stored (e.g., int, uint64_t, std::string). The library's hasher must be able to process this type.

  2. bits_per_item: The number of bits used for each item's fingerprint. This is the most critical parameter for tuning.

    • More bits: Lower false positive rate (FPR), but higher memory usage.
    • Fewer bits: Higher FPR, but lower memory usage. The theoretical lower bound on bits per item for a given FPR ε is log2(1/ε). The cuckoo filter requires slightly more space, typically log2(1/ε) + log2(2b/α) where b is bucket size and α is load factor.
  3. TableType (Optional): Specifies the internal storage mechanism. It defaults to cuckoofilter::SingleTable. An alternative is cuckoofilter::PackedTable, which offers better space efficiency through a technique called semi-sorting. See the Advanced Topics for details.

  4. HashFamily (Optional): Specifies the hash function family to use. It defaults to cuckoofilter::TwoIndependentMultiplyShift, a fast and effective universal hash function. Other options like SimpleTabulation are also available in src/hashutil.h.

How It Works: Cuckoo Hashing and Fingerprints

When you Add(item), the filter performs these steps:

  1. Hashing: The item is hashed to generate two potential bucket indices (i1, i2) and a fingerprint (tag).
  2. Insertion Attempt: The filter tries to place the tag in an empty slot in bucket i1 or i2.
  3. Cuckoo Kicks: If both buckets are full, it randomly evicts an existing item from one of the buckets, moves it to its alternate location, and repeats the process. This chain of evictions is called a "cuckoo kick."
  4. Victim Cache: If a kick chain becomes too long (defaulting to 500 kicks), the filter gives up and places the last displaced item into a single-slot "victim cache". This prevents insertion failures at high load factors.