In row_filter.rs, the required_columns field in both PushdownChecker and PushdownColumns structs uses BTreeSet<usize> to track column indices required for filter evaluation.
Current implementation:
struct PushdownChecker<'schema> {
    required_columns: BTreeSet<usize>,
    // ... other fields
}

struct PushdownColumns {
    required_columns: BTreeSet<usize>,
    nested: NestedColumnSupport,
}
Motivation
For typical filter predicates, the number of columns referenced is usually small (1-5 columns). Using BTreeSet<usize> adds overhead:
- Memory overhead from tree structure
- Insertion cost is O(log n), versus amortized O(1) for appending to a Vec
- The only operation that benefits from BTreeSet is deduplication, but for small sets a simple Vec with a linear scan is likely to be faster
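To make the linear-scan idea concrete, here is a minimal sketch of an insert-if-absent helper over a Vec. The name insert_unique is hypothetical and does not exist in row_filter.rs; it only illustrates the trade-off for the small sets described above.

```rust
/// Hypothetical helper (not part of row_filter.rs): append `idx` only if
/// it is not already present. The `contains` call is a linear scan, which
/// is cheap when the vector holds only a handful of column indices.
/// Returns `true` if the index was newly added.
fn insert_unique(columns: &mut Vec<usize>, idx: usize) -> bool {
    if columns.contains(&idx) {
        false
    } else {
        columns.push(idx);
        true
    }
}
```

Unlike BTreeSet, this keeps indices in insertion order rather than ascending order, so any caller that relies on sorted iteration would need an explicit sort.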
Proposed Solution
Replace BTreeSet<usize> with Vec<usize> and handle deduplication explicitly where needed.
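One possible shape for this change is sketched below, under some assumptions: PushdownColumnsSketch is a simplified stand-in for the real struct (the nested field is omitted), and the record and finish methods are hypothetical names. Indices are appended without checks and then sorted and deduplicated once at the end, which also reproduces the ascending order that BTreeSet iteration provides. Whether the existing callers depend on that order is not confirmed here.

```rust
/// Simplified, hypothetical stand-in for `PushdownColumns`; the real
/// struct also carries a `nested: NestedColumnSupport` field.
#[derive(Debug, Default)]
struct PushdownColumnsSketch {
    required_columns: Vec<usize>,
}

impl PushdownColumnsSketch {
    /// Record a column index; duplicates are allowed at this stage.
    fn record(&mut self, idx: usize) {
        self.required_columns.push(idx);
    }

    /// Sort and remove duplicates once, yielding the same ascending,
    /// unique sequence that iterating a `BTreeSet<usize>` would give.
    fn finish(mut self) -> Vec<usize> {
        self.required_columns.sort_unstable();
        self.required_columns.dedup();
        self.required_columns
    }
}
```

Calling record with 4, 1, 4, 2, 1 and then finish yields the indices 1, 2, 4. Deferring deduplication to a single pass keeps each insertion a plain push, at the cost of temporarily storing duplicates.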
source: #19545 (comment)