
Unexpected degradation of FPR when pattern_bits=4 #682

Open
kevkrist opened this issue Feb 26, 2025 · 5 comments
Labels: topic: bloom_filter (Issues related to bloom_filter), type: bug (Something isn't working)

Comments

@kevkrist

I am encountering unexpected behavior when using cuco::bloom_filter with pattern_bits = 4. The false positive rate (FPR) degrades far more dramatically than expected when moving from pattern_bits = 8 at a constant 'load factor' (i.e., the fraction of bits set in the filter). The issue may be related to the bit-pattern selection.

The following code demonstrates the issue:

#include <cuco/bloom_filter.cuh>
#include <iostream>
#include <thrust/count.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>

// 'Blocked' filter policy with 8B blocks
using policy_t = cuco::default_filter_policy<cuco::xxhash_64<uint32_t>, uint64_t, 1>;
using bf_t =
  cuco::bloom_filter<uint32_t, cuco::extent<std::size_t>, cuda::thread_scope_device, policy_t>;
constexpr size_t bits_per_block   = 64;
constexpr uint32_t pattern_bits_A = 4;
constexpr uint32_t pattern_bits_B = 8;
constexpr size_t bits_per_key_A   = 2 * pattern_bits_A;
constexpr size_t bits_per_key_B   = 2 * pattern_bits_B;

int main()
{
  // Initialize non-overlapping build and probe key sets
  thrust::device_vector<uint32_t> build_keys(1U << 20U);
  thrust::device_vector<uint32_t> probe_keys(1U << 25U);
  thrust::device_vector<bool> flags_A(1U << 25U, false);
  thrust::device_vector<bool> flags_B(1U << 25U, false);
  thrust::sequence(build_keys.begin(), build_keys.end(), 0, 2);
  thrust::sequence(probe_keys.begin(), probe_keys.end(), 1, 2);

  // Specify pattern bits for the policy
  policy_t policy_A(pattern_bits_A);
  bf_t filter_A(cuda::ceil_div(bits_per_key_A * build_keys.size(), bits_per_block), {}, policy_A);
  filter_A.add(build_keys.begin(), build_keys.end());
  filter_A.contains(probe_keys.begin(), probe_keys.end(), flags_A.begin());
  size_t fps_A   = thrust::count(flags_A.begin(), flags_A.end(), true);
  double fpr_A   = 100.0 * fps_A / flags_A.size();
  std::cout << "FPR A: " << fpr_A << "\n";

  policy_t policy_B(pattern_bits_B);
  bf_t filter_B(cuda::ceil_div(bits_per_key_B * build_keys.size(), bits_per_block), {}, policy_B);
  filter_B.add(build_keys.begin(), build_keys.end());
  filter_B.contains(probe_keys.begin(), probe_keys.end(), flags_B.begin());
  size_t fps_B   = thrust::count(flags_B.begin(), flags_B.end(), true);
  double fpr_B   = 100.0 * fps_B / flags_B.size();
  std::cout << "FPR B: " << fpr_B << "\n";

  return 0;
}

Observed Behavior:

FPR A: 16.9311
FPR B: 0.611573

Expected Behavior:
The FPR should increase more smoothly as pattern_bits / filter size decreases. This configuration, 8B blocks with 4 bits set per key, is common (e.g., in arrow/acero) and is not expected to produce such a high FPR at a 'load factor' of 0.5.

Environment:

  • Cuco version: 0.0.1
  • CUDA version: 12.2
  • Compiler: gcc 11.4.0
  • GPU: L4
  • OS: Ubuntu

Would appreciate any insights into what might be causing this, or whether I'm missing something. Thanks!

@PointKernel PointKernel added the topic: bloom_filter Issues related to bloom_filter label Feb 26, 2025
@sleeepyjack
Collaborator

Hi @kevkrist,

Thanks for raising this issue.
I'm looking into it.

I suspect two possible places this bug could originate from:

  1. I made some recent changes to the bulk add kernel in #672 (Bloom filter optimizations (4/5): Implement device bulk add) that aren't perfectly covered by our unit tests. I opened #676 ([ENHANCEMENT]: Better tests for operations that allow varying CG sizes) and am currently working to get these tests upstreamed.
  2. It could be that I messed up the logic that generates the bit pattern for a key here. I could use a second pair of eyes to verify that the logic is actually correct.

I'll send an update as soon as I find the culprit.

@sleeepyjack sleeepyjack added the type: bug Something isn't working label Feb 27, 2025
@sleeepyjack sleeepyjack self-assigned this Feb 27, 2025
@sleeepyjack
Collaborator

sleeepyjack commented Feb 27, 2025

I opened #683 and it seems that option 1 from my above post (regression after #672) is out of the race.

@kevkrist
Author

kevkrist commented Feb 27, 2025

The pattern bits look correctly set to me. I do have one suspicion: unless I'm mistaken, this filter configuration only uses the last 24 bits of the hash (4 bits set per 64-bit word, with the hash shifted by 6 bits at each bit-setting iteration). These are also the hash bits predominantly used to determine the block index into which to insert the pattern. So, keys that get hashed to the same block likely also share the same bit pattern.

This suspicion is partly confirmed with the following change to code in word_pattern (line 100 ff.):

    for (int32_t bit = 0; bit < bits_per_word; ++bit) {
      // Rotate so that the *upper* hash bits select the pattern bits,
      // leaving the lower bits for block indexing
      hash = cuda::std::rotl(hash, (int)bit_index_width);
      word |= word_type{1} << (hash & bit_index_mask);
      // hash >>= bit_index_width;  // original: consumed the lower hash bits
    }

Here I'm simply using the upper 24 bits of the hash to set the bits for the pattern. The output is now:

FPR A: 3.36147
FPR B: 0.49789

which is much better. Not sure yet if this is ideal. Let me know what you think.

EDIT: It might be a good idea, in general, to decouple (as much as possible) the hash bits for block indexing from those for bit pattern selection.

@sleeepyjack
Collaborator

Thanks @kevkrist!

Yeah, for this specific parameter combination this approach brings the FPR down a lot. However, it seems to perform worse in other cases.

> It might be a good idea, in general, to decouple (as much as possible) the hash bits for block indexing from those for bit pattern selection.

I totally agree, and I've been aware of this concept for a long time. This is where I messed up in the implementation :)

I tried a few other things, such as double hashing with a cheap secondary hash function (e.g., a Murmur integer finalizer), but I'm not happy with the results yet. Maybe I should give multiplicative hashing a try 🤔

Also, I'm working on #685 for sanity checking the FPR with our test suite.

@kevkrist
Author

Sounds great! Let me know what you end up deciding to do.
