feat(rust/sedona-pointcloud): add optional round robin partitioning and parallel statistics extraction by b4l · Pull Request #648 · apache/sedona-db

b4l · 2026-02-20T08:50:25Z

This PR adds two optional features that greatly improve the performance of the LAS/LAZ listing table provider.

Round-robin partitioning: By default, a dataset is partitioned for parallel reading by DataFusion by splitting files by byte range into the target number of partitions. For selective queries on (partially) ordered datasets that support pruning, this can lead to uneven resource use, because all the work lands on one partition while the rest are pruned. It also breaks the existing locality of the input when the data is converted, because data from all partitions ends up in each output row group. Round-robin partitioning addresses both issues by assigning sequential chunks to partitions in round-robin order. This improves selective query performance by more than half.
Parallel statistics extraction: The schema inference method, adopted from the Parquet reader, uses concurrency (metadata fetch concurrency) but is not parallel. Extracting statistics in parallel can speed up extraction by up to a factor equal to the number of available cores.
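The distinction drawn in the second item, concurrent I/O versus parallel CPU work, can be illustrated with a minimal sketch. It is not the PR's implementation: `ChunkStats`, `chunk_stats`, and `parallel_stats` are hypothetical names, the statistic is a simple min/max over `f64` values, and scoped threads from the standard library stand in for whatever executor the crate actually uses.

```rust
use std::thread;

/// Minimum and maximum of one coordinate over a chunk of points (hypothetical type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct ChunkStats {
    min: f64,
    max: f64,
}

/// Compute statistics for a single chunk. This is the CPU-bound part.
fn chunk_stats(values: &[f64]) -> ChunkStats {
    let mut stats = ChunkStats {
        min: f64::INFINITY,
        max: f64::NEG_INFINITY,
    };
    for &v in values {
        stats.min = stats.min.min(v);
        stats.max = stats.max.max(v);
    }
    stats
}

/// Extract statistics for all chunks using up to `workers` OS threads.
/// Results come back in the same order as the input chunks.
fn parallel_stats(chunks: &[Vec<f64>], workers: usize) -> Vec<ChunkStats> {
    let workers = workers.max(1);
    // Each worker receives one contiguous batch of chunks (at least one per batch).
    let batch = ((chunks.len() + workers - 1) / workers).max(1);
    thread::scope(|scope| {
        // Spawn every worker first so that they all run at the same time.
        let handles: Vec<_> = chunks
            .chunks(batch)
            .map(|group| {
                scope.spawn(move || group.iter().map(|c| chunk_stats(c)).collect::<Vec<_>>())
            })
            .collect();
        // Joining in spawn order preserves the original chunk order.
        handles
            .into_iter()
            .flat_map(|h| h.join().expect("statistics worker panicked"))
            .collect()
    })
}
```

Calling `parallel_stats(&chunks, n)` with `n` set to the number of cores gives each core its own batch of chunks, which is where a speedup proportional to the core count would come from when extraction is CPU-bound.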

paleolimbot

Thank you for continuing to work on this!

At a high level, I think DataFusion automatically applies round-robin partitioning if it thinks it will benefit the query plan. The built-in Parquet reader doesn't do this, and I would be surprised if you needed to do anything explicit here, unless I'm misunderstanding what is going on.

This will also need tests that enable the various pieces you've added here. It would also benefit other members of the community to have a PR description with a brief summary / justification of the functionality being added.

paleolimbot · 2026-02-25T15:19:48Z

Ah, I see your point about the partitioning... the partitioning already occurred at the data source, but there are just a lot of empty partitions. This is probably also an issue we have with the pruning in the GeoParquet reader (and perhaps DataFusion's built-in Parquet reader, since I just copied how they do pruning).
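The empty-partition effect described here can be made concrete with a small sketch. It is an illustration only, not the code in `source.rs`: chunks are modeled as indices, pruning as a kept index range, and `contiguous`, `round_robin`, and `surviving_per_partition` are hypothetical helpers.

```rust
use std::ops::Range;

/// Contiguous split: each partition gets one consecutive run of chunks,
/// analogous to splitting a file by byte ranges.
fn contiguous(num_chunks: usize, num_partitions: usize) -> Vec<Vec<usize>> {
    assert!(num_partitions > 0, "need at least one partition");
    let per = (num_chunks + num_partitions - 1) / num_partitions;
    (0..num_partitions)
        .map(|p| (p * per..((p + 1) * per).min(num_chunks)).collect())
        .collect()
}

/// Round-robin split: chunk `i` goes to partition `i % num_partitions`.
fn round_robin(num_chunks: usize, num_partitions: usize) -> Vec<Vec<usize>> {
    assert!(num_partitions > 0, "need at least one partition");
    let mut parts = vec![Vec::new(); num_partitions];
    for chunk in 0..num_chunks {
        parts[chunk % num_partitions].push(chunk);
    }
    parts
}

/// Count the chunks left in each partition after pruning everything outside `keep`.
fn surviving_per_partition(parts: &[Vec<usize>], keep: &Range<usize>) -> Vec<usize> {
    parts
        .iter()
        .map(|p| p.iter().filter(|&&c| keep.contains(&c)).count())
        .collect()
}
```

With 16 chunks, 4 partitions, and a selective query that keeps only chunks `4..8` of an ordered dataset, the contiguous split leaves `[0, 4, 0, 0]` chunks per partition (one busy partition, three empty ones), while the round-robin split leaves `[1, 1, 1, 1]`.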

paleolimbot

Sorry it took me a while to circle back to this...I had a release TODO list and lost track of a few things. Feel free to ping me if this happens again!

This looks great! I added some optional suggestions for where you could put the nice text from the PR description into the code for future readers.

paleolimbot · 2026-02-28T04:16:53Z

rust/sedona-pointcloud/src/las/source.rs

    }

+    fn repartitioned(
+        &self,


Can you add some of the text you have in this PR to this section so that future readers have some background on why this is necessary?

done in c6f34a3

paleolimbot · 2026-02-28T04:18:26Z

rust/sedona-pointcloud/src/options.rs

+        pub parallel_statistics_extraction: bool, default = false
        pub persist_statistics: bool, default = false
+        pub round_robin_partitioning: bool, default = false


All of these would benefit from a brief summary docstring of when these values should be modified (e.g., use round robin partitioning when running queries with selective workloads that benefit from parallelization).

done in c6f34a3
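As a rough idea of what such docstrings might look like, here is a hypothetical sketch. The field names and `false` defaults come from the diff above; the struct name `LasOptionsSketch`, the plain-struct form (the diff's `default = false` syntax suggests the real code uses a config macro), and all doc wording are assumptions and not the text added in c6f34a3.

```rust
/// Hypothetical stand-in for the LAS/LAZ reader options (not the real definition).
#[derive(Debug, Clone, Default, PartialEq)]
pub struct LasOptionsSketch {
    /// Extract statistics on multiple threads. Consider enabling when statistics
    /// extraction is CPU-bound and several cores are available.
    pub parallel_statistics_extraction: bool,
    /// Existing option shown in the diff; its behavior is not described in this thread.
    pub persist_statistics: bool,
    /// Assign sequential chunks to partitions in round-robin order. Consider
    /// enabling for selective queries on (partially) ordered datasets, where
    /// pruning would otherwise leave most partitions empty.
    pub round_robin_partitioning: bool,
}
```

Deriving `Default` yields `false` for every field, matching the `default = false` entries in the diff.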

b4l · 2026-03-02T15:27:51Z

@paleolimbot, no worries, I guess you have a full plate already. Added some documentation and reworked the options to be self-contained in the las module for now, which seems more concise.

paleolimbot

Thank you!

b4l added 2 commits February 19, 2026 13:18

Optional round-robin partitioning

bc52f65

Small refactor for metadata

de0bcbc

b4l force-pushed the tuning branch from 604a9d5 to 2ae2e26 Compare February 20, 2026 08:57

b4l changed the title ~~feat(rust/sedona-pointcloud) add optional round robin partitioning and parallel statistics extraction~~ feat(rust/sedona-pointcloud): add optional round robin partitioning and parallel statistics extraction Feb 20, 2026

Optional parallel statistics extraction

745df7c

b4l force-pushed the tuning branch from 2ae2e26 to 745df7c Compare February 20, 2026 09:07

Fix las statistics extraction

1a0b58e

b4l force-pushed the tuning branch from d9d6ee6 to 1a0b58e Compare February 20, 2026 10:15

paleolimbot reviewed Feb 20, 2026

Add some tests

57b0338

paleolimbot approved these changes Feb 28, 2026

Documentation and options revamp

c6f34a3

spelling mistakes

419261e

paleolimbot approved these changes Mar 2, 2026

paleolimbot merged commit 1637efe into apache:main Mar 2, 2026
17 checks passed

b4l deleted the tuning branch March 3, 2026 09:01

paleolimbot merged 7 commits into apache:main from b4l:tuning
