
feat(rust/sedona-pointcloud): add laz chunk statistics #604

Merged: paleolimbot merged 2 commits into apache:main from b4l:statistics on Feb 13, 2026.

b4l (Contributor) commented Feb 12, 2026:

No description provided.

paleolimbot (Member) left a comment:

Thank you 🎉!

I took a high-level skim and this looks great! It does need more tests, though: you have pruning code here, but no tests exercise it. It is tricky to test whether pruning actually occurred (this can be done by inspecting the output of EXPLAIN ANALYZE and adding custom Metrics, as we do for GeoParquet), and you don't have to do that here. However, a test that checks the result is correct for a query that should exercise the pruning path is something you can copy from the GeoParquet tests. I have found that test particularly useful when upgrading DataFusion, because those internals change frequently (they will change again in 52, and again in 53). A few more test files may also help: test files containing a single point don't do a great job of ensuring that mins and maxes aren't swapped.
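The kind of correctness test described above can be sketched without DataFusion at all. The following is a minimal, std-only illustration, assuming each chunk's statistics are just the min/max of x and y and the filter is a bounding box; all names (Point, ChunkBounds, chunk_bounds, may_match, scan) are hypothetical and are not the sedona-pointcloud or DataFusion API. It compares a pruned scan against a full scan and uses chunks with several points each, so that swapped mins and maxes would make the assertion fail.

```rust
/// Hypothetical point record (only x and y are used here).
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: f64,
    y: f64,
}

/// Hypothetical per-chunk statistics: min/max of x and y over the chunk.
#[derive(Debug, Clone, Copy)]
struct ChunkBounds {
    min_x: f64,
    max_x: f64,
    min_y: f64,
    max_y: f64,
}

/// Computes min/max bounds for a non-empty chunk.
fn chunk_bounds(chunk: &[Point]) -> ChunkBounds {
    let mut b = ChunkBounds {
        min_x: f64::INFINITY,
        max_x: f64::NEG_INFINITY,
        min_y: f64::INFINITY,
        max_y: f64::NEG_INFINITY,
    };
    for p in chunk {
        b.min_x = b.min_x.min(p.x);
        b.max_x = b.max_x.max(p.x);
        b.min_y = b.min_y.min(p.y);
        b.max_y = b.max_y.max(p.y);
    }
    b
}

/// Query box as (xmin, ymin, xmax, ymax).
type BBox = (f64, f64, f64, f64);

fn point_in(p: &Point, q: BBox) -> bool {
    p.x >= q.0 && p.x <= q.2 && p.y >= q.1 && p.y <= q.3
}

/// A chunk can only contain matches if its bounds intersect the query box.
fn may_match(b: &ChunkBounds, q: BBox) -> bool {
    b.max_x >= q.0 && b.min_x <= q.2 && b.max_y >= q.1 && b.min_y <= q.3
}

/// Filters all chunks, optionally skipping chunks whose bounds cannot match.
/// Returns (matching points, number of chunks actually read).
fn scan(chunks: &[Vec<Point>], q: BBox, prune: bool) -> (Vec<Point>, usize) {
    let mut out = Vec::new();
    let mut read = 0;
    for chunk in chunks {
        if prune && !may_match(&chunk_bounds(chunk), q) {
            continue; // pruned: this chunk is never read
        }
        read += 1;
        out.extend(chunk.iter().copied().filter(|p| point_in(p, q)));
    }
    (out, read)
}

fn main() {
    // Multi-point chunks: with a single point, min == max and a swap would go unnoticed.
    let chunks = vec![
        vec![Point { x: 0.0, y: 0.0 }, Point { x: 2.0, y: 3.0 }],
        vec![Point { x: 10.0, y: 10.0 }, Point { x: 12.0, y: 15.0 }],
        vec![Point { x: 1.0, y: 14.0 }, Point { x: 11.0, y: 1.0 }],
    ];
    let query = (0.5, 0.5, 2.5, 3.5);
    let (full, _) = scan(&chunks, query, false);
    let (pruned, read) = scan(&chunks, query, true);
    // Pruning may skip chunks but must never change the result.
    assert_eq!(full, pruned);
    println!("matches: {:?}, chunks read: {}", pruned, read);
}
```

Here the second chunk is skipped and the third is read but yields nothing, so the pruned scan reads 2 of 3 chunks. If chunk_bounds accidentally swapped min and max, the first chunk would be pruned and the assertion would fail, which is exactly the kind of bug a single-point test file cannot reveal.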

This also needs documentation for anything marked pub (maybe some of these should be pub(crate)) and for any custom behaviour. For example, chunk_statistics() needs documentation that tells users how it works and why it is needed.

b4l (Contributor, Author) commented Feb 13, 2026:

I added some tests to ensure that pruning does not drop valid results.

Review thread on these diff lines:

```rust
    state: &dyn Session,
    conf: FileScanConfig,
) -> Result<Arc<dyn ExecutionPlan>, DataFusionError> {
    let mut source = conf
```
b4l (Contributor, Author):

@paleolimbot, these changes are necessary to actually cache the metadata. Perhaps the Parquet reader should do the same.
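The general idea of caching file metadata so that repeated scans don't re-parse it can be sketched as follows. This is a minimal, std-only illustration, not the PR's code or DataFusion's cache API: it assumes metadata is keyed by path, file size and modification time (so a rewritten file is re-read), and that the cache is shared between scans through an Arc. LazMetadata, MetadataCache and get_or_load are hypothetical names.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};

/// Hypothetical parsed file metadata (e.g., header plus chunk table).
#[derive(Debug)]
struct LazMetadata {
    num_chunks: usize,
}

/// Cache key: (path, file size, modification time).
type CacheKey = (String, u64, u64);

/// Minimal shared metadata cache; clones share the same underlying map.
#[derive(Clone, Default)]
struct MetadataCache {
    inner: Arc<Mutex<HashMap<CacheKey, Arc<LazMetadata>>>>,
}

impl MetadataCache {
    /// Returns cached metadata for `key`, calling `load` only on a cache miss.
    fn get_or_load<F>(&self, key: CacheKey, load: F) -> Arc<LazMetadata>
    where
        F: FnOnce() -> LazMetadata,
    {
        let mut map = self.inner.lock().unwrap();
        let entry = map.entry(key).or_insert_with(|| Arc::new(load()));
        Arc::clone(entry)
    }
}

fn main() {
    let cache = MetadataCache::default();
    let mut parses = 0;
    for _scan in 0..3 {
        // Every scan uses the same cache, so the file is parsed only once.
        let meta = cache.get_or_load(("points.laz".to_string(), 1024, 1_700_000_000), || {
            parses += 1;
            LazMetadata { num_chunks: 4 }
        });
        assert_eq!(meta.num_chunks, 4);
    }
    println!("parses: {}", parses);
}
```

The cache only helps if the same instance outlives a single scan. A cache created inside each plan would be empty every time, which may be why caching changes like the ones discussed here don't show a measurable difference unless the cache is actually reused. Holding the lock while loading keeps the sketch simple but serializes concurrent loads; a real implementation might load outside the lock.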

paleolimbot (Member):

Thank you! We do have cache issues in general 😬 (Peter tried something like this a while ago, but we didn't notice a difference at the time: #294).

paleolimbot (Member) left a comment:

Thank you!


paleolimbot merged commit 2f5f1e4 into apache:main on Feb 13, 2026.
17 checks passed.
b4l deleted the statistics branch on February 13, 2026 at 16:13.
paleolimbot added this to the 0.3.0 milestone on Feb 19, 2026.

Labels: none
Projects: none
Participants: 2