@adriangb adriangb commented Jan 1, 2026

Which issue does this PR close?

Part of #19433

Rationale for this change

In preparation for ordering inference from Parquet metadata, we need to be able to store per-file ordering information on PartitionedFile. This PR adds the necessary infrastructure.

What changes are included in this PR?

  • Add ordering: Option<LexOrdering> field to PartitionedFile struct
  • Add new_from_meta(ObjectMeta) constructor for creating files from metadata (cleaner than manually constructing)
  • Add with_ordering() builder method to set ordering information
  • Add with_partition_values() builder method for consistency
  • Update all PartitionedFile constructors to initialize ordering: None
  • Update callsites in test_util, proto, and substrait to use new_from_meta
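
For illustration, here is a minimal sketch of how the new constructor and builder methods compose. The types are simplified stand-ins, not the actual DataFusion definitions: `ObjectMeta` is reduced to two fields, `LexOrdering` is modeled as a `Vec` of a hypothetical `SortKey`, partition values are plain strings, and the `with_ordering` signature taking an `Option` is an assumption. `example_file` is a hypothetical helper that exists only for this sketch.

```rust
// Simplified stand-ins for illustration only. The real DataFusion and
// object_store types carry more fields and different concrete types.

/// Stand-in for `object_store::ObjectMeta`, reduced to two fields.
#[derive(Debug, Clone, PartialEq)]
pub struct ObjectMeta {
    pub location: String,
    pub size: u64,
}

/// Hypothetical stand-in for a single sort key of a `LexOrdering`.
#[derive(Debug, Clone, PartialEq)]
pub struct SortKey {
    pub column: String,
    pub descending: bool,
}

/// Stand-in for `LexOrdering`: sort keys in priority order.
pub type LexOrdering = Vec<SortKey>;

/// Stand-in for `PartitionedFile` with only the fields discussed here.
#[derive(Debug, Clone, PartialEq)]
pub struct PartitionedFile {
    pub object_meta: ObjectMeta,
    pub partition_values: Vec<String>,
    /// Per-file ordering; `None` means no ordering is known.
    pub ordering: Option<LexOrdering>,
}

impl PartitionedFile {
    /// Build a file entry from metadata; optional fields start empty.
    pub fn new_from_meta(object_meta: ObjectMeta) -> Self {
        Self {
            object_meta,
            partition_values: Vec::new(),
            ordering: None,
        }
    }

    /// Set the partition values and return the updated entry.
    pub fn with_partition_values(mut self, partition_values: Vec<String>) -> Self {
        self.partition_values = partition_values;
        self
    }

    /// Set (or clear) the per-file ordering and return the updated entry.
    pub fn with_ordering(mut self, ordering: Option<LexOrdering>) -> Self {
        self.ordering = ordering;
        self
    }
}

/// Hypothetical helper showing a typical builder chain.
pub fn example_file() -> PartitionedFile {
    let meta = ObjectMeta {
        location: "year=2026/data.parquet".to_string(),
        size: 1024,
    };
    PartitionedFile::new_from_meta(meta)
        .with_partition_values(vec!["2026".to_string()])
        .with_ordering(Some(vec![SortKey {
            column: "ts".to_string(),
            descending: false,
        }]))
}
```

Because `new_from_meta` fills in `ordering: None`, call sites that go through the constructor would not need to change when a field like this is added; only callers that want to record an ordering opt in through `with_ordering`.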

Are these changes tested?

Yes, existing tests pass. The new field is currently always None, so no new tests are needed yet. Tests for ordering inference will come in a follow-up PR.

Are there any user-facing changes?

No user-facing API changes. The ordering field is public, but users typically construct PartitionedFile via the provided constructors, which initialize it automatically.

🤖 Generated with Claude Code

@github-actions github-actions bot added the core, substrait, proto, and datasource labels Jan 1, 2026
@adriangb adriangb force-pushed the partitioned-file-ordering branch 3 times, most recently from 96de14b to aeda804 Compare January 1, 2026 21:49
adriangb and others added 2 commits January 1, 2026 16:14
…uctor

Add infrastructure to track ordering information per file in preparation
for ordering inference from Parquet metadata.

Changes:
- Add `ordering: Option<LexOrdering>` field to `PartitionedFile` struct
- Add `new_from_meta(ObjectMeta)` constructor for creating files from metadata
- Add `with_ordering()` and `with_partition_values()` builder methods
- Update all PartitionedFile constructors to initialize `ordering: None`
- Update callsites in test_util, proto, and substrait to use new_from_meta

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
Update all PartitionedFile struct literals to use new_from_meta()
builder pattern with appropriate builder methods like
with_partition_values(), with_extensions(), with_range(),
and with_statistics().

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <[email protected]>
@adriangb adriangb force-pushed the partitioned-file-ordering branch from aeda804 to bf51e52 Compare January 1, 2026 22:22
Copilot AI left a comment

Pull request overview

This PR adds infrastructure to support per-file ordering information in preparation for ordering inference from Parquet metadata. The changes introduce a new ordering field to PartitionedFile along with a cleaner constructor pattern using new_from_meta() and builder methods.

Key Changes:

  • Added ordering: Option<LexOrdering> field to PartitionedFile struct with corresponding with_ordering() builder method
  • Introduced new_from_meta(ObjectMeta) constructor and with_partition_values() builder method for cleaner object construction
  • Refactored all existing PartitionedFile instantiations across the codebase to use the new constructor pattern

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| datafusion/datasource/src/mod.rs | Core changes: added ordering field, new_from_meta() constructor, with_ordering() and with_partition_values() builder methods, updated all constructors to initialize ordering |
| datafusion/substrait/src/physical_plan/consumer.rs | Refactored to use new_from_meta() instead of manual struct construction |
| datafusion/proto/src/physical_plan/from_proto.rs | Refactored protobuf conversion to use new_from_meta() with builder pattern, updated test helper |
| datafusion/datasource/src/file_scan_config.rs | Refactored test helper to use new_from_meta() with builder methods |
| datafusion/datasource/src/display.rs | Simplified test utility to use new_from_meta() |
| datafusion/datasource-parquet/src/row_group_filter.rs | Refactored test setup to use new_from_meta() |
| datafusion/core/tests/parquet/page_pruning.rs | Simplified test setup to use new_from_meta() |
| datafusion/core/tests/parquet/custom_reader.rs | Refactored to use new_from_meta() with with_extensions() builder |
| datafusion/core/src/test_util/parquet.rs | Simplified FileScanConfigBuilder setup to use new_from_meta() |
| datafusion/core/src/datasource/physical_plan/parquet.rs | Refactored multiple test helpers to use new_from_meta() with builder pattern |
| datafusion/core/src/datasource/mod.rs | Simplified test utility to use new_from_meta() |
| datafusion/core/src/datasource/file_format/mod.rs | Simplified test utility to use new_from_meta() |
