Conversation

@pwrliang
Contributor

This PR introduces a major refactoring of the GPU-accelerated library. Compared to the previous design, this version decouples the spatial join into distinct filtering and refinement stages, making it easier to integrate into rust/sedona-spatial-join in an upcoming PR. Additionally, this update includes performance optimizations and minor structural improvements:

  • ParallelWkbLoader: Enhances parsing performance by balancing workloads across threads based on byte counts.
  • RelateEngine: Improves Point-in-Polygon (PIP) performance by allowing a single AABB to bound multiple line segments (conceptually similar to TG's Natural index).
  • MemoryManager: Enables the memory pool to reduce frequent allocation and deallocation overheads.
  • Header Renaming: Standardizes naming conventions by using .cuh for CUDA-exclusive headers and .hpp for mixed CUDA/C++ code.

@pwrliang pwrliang changed the title feat(c/sedona-sedona-libgpuspatial): Improving GPU Spatial Join Library feat(c/sedona-sedona-libgpuspatial): Refactoring GPU Spatial Join Library Jan 27, 2026
@petern48
Contributor

As you wait for a proper review, I'll suggest that you consider breaking this into multiple PRs. It looks like you described 4 separate changes (the bullets) that would lend themselves nicely to at least 4 isolated PRs. I know it's more work for you, but it reduces the review burden significantly, as reviewers don't need to figure out which of the 4 bullets a particular code change applies to.

You can always base branches off of each other to reuse work from your other branches. Something like below, or however you see fit.

Header Renaming -> MemoryManager -> RelateEngine
       \
        \---> ParallelWkbLoader

(this is a random example, I didn't actually look into it that much)

Doing a separate PR for Header Renaming and straightforward changes would cut down the 88-file diff significantly, and could even be reviewed by people with less context about GPU join (like me).

Separate PRs could help your changes land faster, and help with future debugging / understanding when someone tries to figure out what happened. WDYT? Would breaking this up be reasonable?

@paleolimbot
Member

As you wait for a proper review, I'll suggest that you consider breaking this into multiple PRs.

@pwrliang kindly split this one out from the larger parent PR at my request...you are absolutely correct that 88 files and 6000 lines is a huge diff and those are excellent suggestions on how to split that up further.

I do plan to attempt reviewing this tomorrow; however, we do need to respect that this development is happening in a public repository with a community, and in the future (or now, if Peter or other members of the community are blocked from participating because of this PR's size) we will need to have changes be incremental. We want this code in SedonaDB not just because it is awesome but because we want to be the place where future contributors add CUDA-accelerated spatial functions, and to do that we'll need to work in the open and incrementally.

My idea with scoping this to only include code in c/sedona-libgpuspatial was that it isn't used in the rest of the engine (and won't be by default without very special opt-in build/runtime configurations for some time). Even though this is large, the consequences of missing details are low...this is a very hard thing to do and until the end-to-end is in place it is hard to even know that it worked. Big PRs are definitely not ideal but also sometimes we need to do hard things. GPU spatial joins are a very very hard thing!

@petern48
Contributor

(or now, if Peter or other members of the community are blocked from participating because of this PR's size)

To be clear, I'm not dying to participate 😅. Definitely too busy for that. Just figured I'd encourage making it more modular, for both the purposes of reviewing and for cleaner traceability and git history. Though yeah, the fact that this is isolated to c/sedona-libgpuspatial makes those less important.

kindly split this one out from the larger parent PR at my request

I guess my point is that it looks like it could be broken up even more (primarily based on the PR description). But if you're all good with reviewing it as it is, go for it! I don't mean to stand in the way.

@pwrliang
Contributor Author

Thanks for your suggestions. For future PRs, I will make incremental changes to keep them easy to review.

@paleolimbot
Member

Thank you for working on this, and apologies for taking so long to review it! I promise all of my comments are just details...there just happens to be a lot of code in this PR and so there are also a lot of details. Generally:

  • This is great! I really appreciate the test coverage and the new C interface that aligns more readily with the way our existing spatial join works. I'm glad to know this approach is also faster
  • There are a few things around iterating over WKB arrays that could be faster. It is OK if you don't want to explore ways to make this faster in this PR...I was just here and looking at your code and so I put a comment if I noticed something that could be faster. I will be looking at making it easier to do this sort of thing in geoarrow-c soon because I need it for Geography support (also a binding to a C++ library) and at least now I know that functions like memory size estimates or rechunking are worth building in when I'm there.
  • I do think we should deduplicate the three tests where you compare results to the GEOS-calculated version. The one that uses GEOS C++ is the cleanest and perhaps the other two could use that approach as well. It is also OK if they are testing the same bit of code to have them be one test.
  • A few minor comments on the C API
  • Removing the Rust section of this PR (e.g., by just deleting everything from the Rust wrapper crate except the bindgen bindings) will make it easier to merge this PR and review the Rust code separately (see inline comment).

Comment on lines +967 to +968
// Avoid division by zero
if (num_threads > 0) {
Member

Should something like this fail with an assertion or exception?

Comment on lines +923 to 926
if (!ArrowArrayViewIsNull(array_view_.get(), offset)) {
auto item = ArrowArrayViewGetBytesUnsafe(array_view_.get(), offset);
total_bytes += item.size_bytes - 1 // byte order
- 2 * sizeof(uint32_t); // type + size
Member

Later you just use item.size_bytes. Are the extra five bytes per item important here and not elsewhere?

Contributor Author

It does not matter. We just need an estimate of the total number of bytes loaded into memory so that chunked parsing can avoid OOM.

Comment on lines +932 to +935
template <typename OFFSET_IT>
std::vector<uint32_t> assignBalancedWorks(OFFSET_IT begin, OFFSET_IT end,
uint32_t num_threads) const {
size_t total_bytes = 0;
Member

To my reading these are only iterating over sequences representable by offset and length. If this is the case you can optimize some things (e.g., calling ArrowArrayViewIsNull() in a loop is slow and can often be skipped by checking for a zero null count and proceeding with a branchless loop).

Also, if this can be reduced to an offset/length you can compute the value you're looking for without peeking into the values at all depending on the array type (binary/large binary this is "just" reading two values from the offsets buffer).

@pwrliang
Contributor Author
Feb 3, 2026

I changed it to two branches with/without nulls.


virtual void FinishBuilding() = 0;

virtual uint32_t Refine(const ArrowSchema* probe_schema, const ArrowArray* probe_array,
Member

For the purposes of the internal C++ class for this, can you simplify these signatures by passing a const ArrowArrayView*?

Comment on lines +144 to +145

struct SedonaSpatialRefiner {
Member

Can this (and its members) have docstrings?

Comment on lines +148 to +150
int (*push_build)(struct SedonaSpatialRefiner* self,
const struct ArrowSchema* build_schema,
const struct ArrowArray* build_array);
Member

Is it possible to add a callback int (*init_schema)(struct ArrowSchema* build_schema, struct ArrowSchema* probe_schema) and eliminate all the const struct ArrowSchema* arguments for the other callbacks?

I know this doesn't help you in Rust because arrow-rs likes to export the schema whether you want it or not. But this might be fixed at some point and from a C API standpoint it's much cleaner.

Comment on lines +158 to +159
uint32_t* build_indices, uint32_t* probe_indices,
uint32_t indices_size, uint32_t* new_indices_size);
Member

I noticed in a recent PR that @Kontinuation used 64-bit indices for the build side but 32-bit indices for the probe side. I am guessing that here the extra 4 bytes per row probably makes a difference because you have to shuffle them to and from the GPU, but if it doesn't, using a uint64 for the build side might better align with Kristin's join code.

Comment on lines +161 to +165
int (*refine)(struct SedonaSpatialRefiner* self, const struct ArrowSchema* schema1,
const struct ArrowArray* array1, const struct ArrowSchema* schema2,
const struct ArrowArray* array2,
enum SedonaSpatialRelationPredicate predicate, uint32_t* indices1,
uint32_t* indices2, uint32_t indices_size, uint32_t* new_indices_size);
Member

Some documentation for these parameters would be helpful (are these schemas the same as probe/build?)

Comment on lines +274 to +281
struct Payload {
GEOSContextHandle_t handle;
const GEOSGeometry* geom;
std::vector<uint32_t> build_indices;
std::vector<uint32_t> stream_indices;
SedonaSpatialRelationPredicate predicate;
};

Member

I believe a lot of this code also exists in another test? Is it worth testing the C wrapper in that test as well instead of duplicating the code that calculates the "correct" answer or consolidating the three places where that is currently done into some common code?

