feat(rust/sedona-pointcloud): add laz chunk statistics#604
feat(rust/sedona-pointcloud): add laz chunk statistics#604paleolimbot merged 2 commits intoapache:mainfrom
Conversation
paleolimbot
left a comment
There was a problem hiding this comment.
Thank you 🎉 !
I took a high level skim and this looks great! It does need some more tests (e.g., you have pruning code here but there are no tests that exercise pruning). It is tricky to test if pruning actually occurred (can be done by inspecting the output of EXPLAIN ANALYZE + adding custom Metrics like we do for GeoParquet) and you don't have to do that here, but testing that the result is correct for a query that should be exercising the pruning path is something you can copy from the GeoParquet tests. I have found that test to be particularly useful when upgrading DataFusion because those internals change frequently (will change again in 52, and again in 53). A few test files may help (test files with a single point don't do a great job ensuring that mins and maxes aren't swapped).
This also needs some documentation for anything marked pub (maybe some of these should be pub(crate)) or behaviour that is custom (e.g., your chunk_statistics() needs some documentation to let users know how this works and why it is needed).
|
I added some tests to ensure no faulty pruning. |
| state: &dyn Session, | ||
| conf: FileScanConfig, | ||
| ) -> Result<Arc<dyn ExecutionPlan>, DataFusionError> { | ||
| let mut source = conf |
There was a problem hiding this comment.
@paleolimbot, these changes are necessary to actually cache the metadata. Maybe this should be considered for the Parquet reader as well.
There was a problem hiding this comment.
Thank you! We do in general have cache issues 😬 (Peter tried something like this a while ago but we didn't notice a difference at the time: #294 ).
| state: &dyn Session, | ||
| conf: FileScanConfig, | ||
| ) -> Result<Arc<dyn ExecutionPlan>, DataFusionError> { | ||
| let mut source = conf |
There was a problem hiding this comment.
Thank you! We do in general have cache issues 😬 (Peter tried something like this a while ago but we didn't notice a difference at the time: #294 ).
No description provided.