feat(scan): support built-in global index search during scan process #23

Merged: lszskye merged 2 commits into alibaba:main from lszskye:global_index_batch_scan on Dec 22, 2025
Conversation

@lszskye (Collaborator) commented Dec 19, 2025

Purpose

feat(scan): support built-in global index lookup during scan

This PR enhances the scan operation to natively support global index lookups in two ways:

  1. Index-driven scan: When a distributed query provides precomputed index results, the scan can directly utilize these results to fetch only the relevant data, significantly reducing I/O overhead.

  2. Built-in scan: During the scan process, if global index search is enabled and applicable global indexes exist, the system automatically evaluates the scan predicates, uses them to query the relevant global index first, and then retrieves only the corresponding data.
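
The two modes above can be sketched as a single toy routine. This is only an illustration under stated assumptions: `RowRange`, `IndexLookup`, and `ScanEquals` are hypothetical names invented here, not the project's classes, and an in-memory column stands in for real data files. When index results are available the scan visits only the listed row ranges; otherwise it falls back to reading everything.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical inclusive row-id range; not the project's Range class.
struct RowRange {
    int64_t from;
    int64_t to;  // inclusive
};

// Hypothetical index lookup: returns the row ranges that may contain `key`,
// or std::nullopt when no global index can answer the predicate.
using IndexLookup =
    std::function<std::optional<std::vector<RowRange>>(int64_t key)>;

// Returns the row ids whose value equals `key` and counts rows actually read.
// Assumes every range returned by the lookup lies inside the column.
std::vector<int64_t> ScanEquals(const std::vector<int64_t>& column, int64_t key,
                                bool index_enabled, const IndexLookup& lookup,
                                int64_t* rows_read) {
    std::optional<std::vector<RowRange>> hit;
    if (index_enabled) {
        hit = lookup(key);  // query the index before touching any data
    }

    std::vector<RowRange> ranges;
    if (hit) {
        ranges = *hit;  // index-driven: read only the matching ranges
    } else if (!column.empty()) {
        // Fallback: full scan over all rows.
        ranges.push_back({0, static_cast<int64_t>(column.size()) - 1});
    }

    std::vector<int64_t> row_ids;
    *rows_read = 0;
    for (const RowRange& r : ranges) {
        for (int64_t i = r.from; i <= r.to; ++i) {
            ++*rows_read;
            if (column[static_cast<std::size_t>(i)] == key) {
                row_ids.push_back(i);
            }
        }
    }
    return row_ids;
}
```

In the index-driven mode, the caller would pass the precomputed ranges through the same path instead of performing the lookup inside the scan.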

Linked issue: #5

Tests

GlobalIndexTest

Comment thread src/paimon/core/global_index/indexed_split_test.cpp
Comment thread src/paimon/core/schema/table_schema.cpp
Comment thread src/paimon/core/table/source/data_evolution_batch_scan.cpp
Copilot AI (Contributor) left a comment

Pull request overview

This PR enhances the scan operation to natively support global index lookups in two ways: (1) utilizing precomputed index results provided by distributed queries, and (2) automatically evaluating scan predicates against global indexes during the scan process when enabled. The implementation introduces a new DataEvolutionBatchScan wrapper that orchestrates global index evaluation and integrates results into the scan pipeline.

Key changes:

  • Introduces DataEvolutionBatchScan to wrap DataTableBatchScan and handle global index integration
  • Refactors ScanContext to use GlobalIndexResult instead of raw std::vector<Range> for row ranges
  • Adds Range::Exclude() utility for computing non-indexed ranges and GlobalIndexResult::ToRanges() for conversion
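
As a rough illustration of the last bullet, the sketch below shows one way the two helpers could behave. It is not the code from this PR: `RowRange`, the free functions `ToRanges` and `Exclude`, and their signatures are assumptions made for the example, and a sorted vector of row ids stands in for whatever bitmap the real `GlobalIndexResult` holds.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical inclusive row-id range; not the project's Range class.
struct RowRange {
    int64_t from;
    int64_t to;  // inclusive
};

inline bool operator==(const RowRange& a, const RowRange& b) {
    return a.from == b.from && a.to == b.to;
}

// Collapses ascending row ids into maximal contiguous ranges.
std::vector<RowRange> ToRanges(const std::vector<int64_t>& sorted_ids) {
    std::vector<RowRange> out;
    for (int64_t id : sorted_ids) {
        if (!out.empty() && id <= out.back().to + 1) {
            out.back().to = std::max(out.back().to, id);  // extend current run
        } else {
            out.push_back({id, id});  // start a new run
        }
    }
    return out;
}

// Returns the parts of `whole` that are not covered by any range in `covered`.
std::vector<RowRange> Exclude(const RowRange& whole,
                              std::vector<RowRange> covered) {
    std::sort(covered.begin(), covered.end(),
              [](const RowRange& a, const RowRange& b) { return a.from < b.from; });
    std::vector<RowRange> out;
    int64_t next = whole.from;  // first row not yet accounted for
    for (const RowRange& c : covered) {
        if (c.to < next) continue;     // already behind the cursor
        if (c.from > whole.to) break;  // past the end of `whole`
        if (c.from > next) {
            out.push_back({next, c.from - 1});  // gap before this range
        }
        next = c.to + 1;
        if (next > whole.to) break;
    }
    if (next <= whole.to) {
        out.push_back({next, whole.to});  // trailing gap
    }
    return out;
}
```

For example, with `whole = [0, 9]` and covered ranges `[2, 3]` and `[7, 8]`, this `Exclude` yields `[0, 1]`, `[4, 6]`, and `[9, 9]`, which is the set of rows a scan would still have to read without index help.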

Reviewed changes

Copilot reviewed 58 out of 58 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| src/paimon/core/table/source/data_evolution_batch_scan.cpp | New class implementing built-in global index scan with automatic evaluation and result merging |
| src/paimon/core/global_index/global_index_scan_impl.cpp | Adds ParallelScan method for concurrent index evaluation across ranges |
| src/paimon/core/operation/scan_context.cpp | Changes ScanContext to accept GlobalIndexResult instead of raw row ranges |
| src/paimon/common/utils/range.cpp | Implements Range::Exclude() to compute complement ranges |
| src/paimon/common/global_index/global_index_result.cpp | Adds ToRanges() to convert bitmap results to range vectors |
| test/inte/global_index_test.cpp | Refactors tests to use new ScanGlobalIndexAndData helper and adds comprehensive scan integration tests |
| src/paimon/core/core_options.cpp | Adds global-index.enabled option (default true) to control feature enablement |
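
A minimal sketch of how a default-true switch such as `global-index.enabled` might be read from a string options map. The helper name `GlobalIndexEnabled`, the map type, and the rule that only the literal `"false"` disables the feature are assumptions for illustration; the actual parsing in `core_options.cpp` may differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical helper, not the project's CoreOptions API.
// Returns true when the option is absent, matching the default stated above.
bool GlobalIndexEnabled(const std::map<std::string, std::string>& options) {
    auto it = options.find("global-index.enabled");
    if (it == options.end()) {
        return true;  // option not set: feature stays enabled
    }
    // Assumption: only the exact string "false" turns the feature off.
    return it->second != "false";
}
```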

Comment thread src/paimon/core/table/source/data_evolution_batch_scan.cpp
Comment thread src/paimon/core/global_index/global_index_scan_impl.cpp
Comment thread src/paimon/core/table/source/data_evolution_batch_scan.cpp
Comment thread src/paimon/core/table/source/data_evolution_batch_scan.cpp Outdated
@lucasfang (Collaborator)

+1

@lszskye lszskye merged commit b9cd10e into alibaba:main Dec 22, 2025
6 checks passed