Optimize required_columns from BTreeSet<usize> to Vec<usize> in struct PushdownChecker<'schema> #19673

@kosiew

Description

In row_filter.rs, the required_columns field in both PushdownChecker and PushdownColumns structs uses BTreeSet<usize> to track column indices required for filter evaluation.

Current implementation:

struct PushdownChecker<'schema> {
    required_columns: BTreeSet<usize>,
    // ... other fields
}

struct PushdownColumns {
    required_columns: BTreeSet<usize>,
    nested: NestedColumnSupport,
}

Motivation

For typical filter predicates, the number of columns referenced is usually small (1-5 columns). Using BTreeSet<usize> adds overhead:

  • Memory overhead from tree structure
  • Insertion cost is O(log n) vs amortized O(1) for appending to a Vec
  • The main operation that benefits from BTreeSet is deduplication (it also keeps the indices sorted), but for small sets, a simple Vec with a linear scan would likely be faster
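
To make the linear-scan idea concrete, here is a minimal sketch of deduplicating on insert. The names push_unique and collect_required are hypothetical and are not taken from row_filter.rs; the sketch only assumes that column indices arrive one at a time.

```rust
/// Hypothetical helper (not from row_filter.rs): append `idx` only if it
/// is not already present. The scan is O(n), which is cheap for the
/// handful of columns a typical predicate references.
fn push_unique(columns: &mut Vec<usize>, idx: usize) {
    if !columns.contains(&idx) {
        columns.push(idx);
    }
}

/// Collects column indices without duplicates, in first-seen order.
fn collect_required(indices: &[usize]) -> Vec<usize> {
    let mut columns = Vec::new();
    for &idx in indices {
        push_unique(&mut columns, idx);
    }
    columns
}
```

Unlike BTreeSet, this keeps indices in insertion order rather than sorted order, so any caller that depends on sorted iteration would need an explicit sort.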

Proposed Solution

Replace BTreeSet<usize> with Vec<usize> and handle deduplication explicitly if needed.
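
One possible shape for this, as a sketch only: append freely while traversing the expression, then sort and deduplicate once at the end. ColumnCollector, record, and finish are hypothetical names used for illustration; the real PushdownChecker has other fields and a different API.

```rust
/// Hypothetical stand-in for the checker, reduced to the field under
/// discussion. Not the actual PushdownChecker definition.
#[derive(Default)]
struct ColumnCollector {
    required_columns: Vec<usize>,
}

impl ColumnCollector {
    /// Record a referenced column index; duplicates are allowed here.
    fn record(&mut self, idx: usize) {
        self.required_columns.push(idx);
    }

    /// Sort and deduplicate once, yielding the same ordered, unique
    /// sequence that iterating a BTreeSet<usize> would produce.
    fn finish(mut self) -> Vec<usize> {
        self.required_columns.sort_unstable();
        self.required_columns.dedup();
        self.required_columns
    }
}
```

Deferring the sort and dedup to a single finish step keeps each insertion a plain push, at the cost of temporarily holding duplicates in the Vec.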

source: #19545 (comment)
