feat: optimize count(*) for pk/append table#317
Conversation
There was a problem hiding this comment.
Pull request overview
Adds an optimized CountReader API for PK tables. The default TableRead::CreateCountReader returns NotImplemented, while KeyValueTableRead provides a real implementation built on a new CountSplitRead that picks between a metadata-only count for raw-convertible/DV splits and a merge + drop-delete + count path (via a new RowCountAccumulator) for MOR splits, using a column-pruned read schema. Predicate pushdown and force_keep_delete are rejected for now. Integration tests verify counts against existing full-read results, plus empty-splits and unsupported-option cases.
Changes:
- New
CountReaderabstraction (include/paimon/reader/count_reader.h) andTableRead::CreateCountReaderwith PK implementation viaCountSplitRead+RowCountAccumulator. - Metadata-based counting reuses
DataSplitImpl::PartialMergedRowCount; MOR path reusesMergeFileSplitRead::CreateSortMergeReaderForSection(..., drop_delete=true)with a trimmed PK-only schema (addedInternalReadContext::CreateWithSchemaandMergeFileSplitRead::GetKeyComparator). - Adds count integration tests and an unrelated
NormalizePathscheme test.
Reviewed changes
Copilot reviewed 15 out of 15 changed files in this pull request and generated 4 comments.
Show a summary per file
| File | Description |
|---|---|
| include/paimon/reader/count_reader.h | New CountReader interface with CountRows(). |
| include/paimon/table/source/table_read.h | Adds virtual CreateCountReader(splits) on TableRead. |
| src/paimon/core/table/source/table_read.cpp | Default CreateCountReader returns NotImplemented. |
| src/paimon/core/table/source/key_value_table_read.{h,cpp} | PK CreateCountReader implementation; rejects predicate and force_keep_delete; stores path factory/context/executor. |
| src/paimon/core/operation/count_split_read.{h,cpp} | New CountSplitRead with metadata-count + merge-count strategies and PK-only read schema. |
| src/paimon/core/mergetree/row_count_accumulator.{h,cpp} | New helper that iterates SortMergeReader output and counts rows. |
| src/paimon/core/operation/internal_read_context.{h,cpp} | Adds CreateWithSchema factory to clone context with a new read schema. |
| src/paimon/core/operation/merge_file_split_read.h | Exposes GetKeyComparator() for use by count path. |
| src/paimon/CMakeLists.txt | Registers the two new source files. |
| test/inte/scan_and_read_inte_test.cpp | Adds count IT cases (MOR/DV/empty/consistency/predicate/force-keep-delete). |
| src/paimon/common/utils/path_util_test.cpp | Unrelated test for NormalizePath with URI scheme. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
469ae2d to
9ffcc24
Compare
|
Please add a issue for this pr. |
1d3b95c to
da51039
Compare
Purpose
Linked issue: close #321
This PR adds an optimized count(*) path for both PK and Append tables by introducing a dedicated CountReader API and table-specific implementations.
Main changes:
Tests
UT:
IT:
API and Format
API impact:
Storage format / protocol impact:
Documentation
No additional user-facing documentation is required for this PR.
Generative AI tooling
Generated-by: GitHub Copilot (GPT-5.3-Codex)