fix(rust/sedona-raster-functions): RasterExecutor always call the closure only once for scalar raster input#559
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a bug in the RasterExecutor where scalar raster inputs were only processed once instead of being broadcast across all iterations when combined with array arguments, causing output arrays to have incorrect lengths.
Changes:
- Modified
execute_raster_void()to iteratenum_iterationstimes for scalar rasters, broadcasting the scalar value - Added regression tests for scalar raster with array coordinate inputs in both world-to-raster and raster-to-world coordinate functions
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| rust/sedona-raster-functions/src/executor.rs | Fixed scalar raster broadcasting logic to iterate across all batch rows |
| rust/sedona-raster-functions/src/rs_worldcoordinate.rs | Added tests for scalar raster + array coordinates in raster-to-world conversion |
| rust/sedona-raster-functions/src/rs_rastercoordinate.rs | Added tests for scalar raster + array coordinates in world-to-raster conversion |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
9b05141 to
efb42d9
Compare
| pub fn execute_raster_void<F>(&self, mut func: F) -> Result<()> | ||
| where | ||
| F: FnMut(usize, Option<RasterRefImpl<'_>>) -> Result<()>, | ||
| F: FnMut(usize, Option<&RasterRefImpl<'_>>) -> Result<()>, |
There was a problem hiding this comment.
Passing reference instead of moving value into the closure to avoid repeatedly constructing RasterRefImpl from raster array or cloning it. We will not want to persist the RasterRefImpl value passed in for most of the time.
paleolimbot
left a comment
There was a problem hiding this comment.
Just a few comments on how to make the tests less verbose for the future LLMs/people implementing new functions that will probably copy them. Thank you!
| let world_x = Arc::new(arrow_array::Float64Array::from(vec![2.0, 2.0, 2.0])); | ||
| let world_y = Arc::new(arrow_array::Float64Array::from(vec![3.0, 3.0, 3.0])); |
There was a problem hiding this comment.
This particular test seems like it might benefit from having >1 unique value in the input.
There was a problem hiding this comment.
Changed the array argument to have non-unique values.
| let raster_struct = rasters.as_any().downcast_ref::<StructArray>().unwrap(); | ||
| let raster_struct_one = raster_struct | ||
| .slice(1, 1) | ||
| .as_any() | ||
| .downcast_ref::<StructArray>() | ||
| .unwrap() | ||
| .clone(); | ||
| let scalar_raster = ScalarValue::Struct(Arc::new(raster_struct_one)); | ||
|
|
||
| let world_x = Arc::new(arrow_array::Float64Array::from(vec![2.0, 2.0, 2.0])); | ||
| let world_y = Arc::new(arrow_array::Float64Array::from(vec![3.0, 3.0, 3.0])); |
There was a problem hiding this comment.
Same comments as above (I think the scalar raster can be generated more compactly and probably more than one unique value in the output would be a better test).
There was a problem hiding this comment.
Changed the array argument to have non-unique values.
Fixes RasterExecutor::execute_raster_void() to broadcast scalar raster inputs across num_iterations when other arguments are arrays, and tightens coordinate UDF tests to cover scalar raster + array coordinates.
1fbc755 to
a5ab298
Compare
paleolimbot
left a comment
There was a problem hiding this comment.
Thanks!
I think it is worth one more round of making these compact. There are a lot of raster tests in our future!
| let expected_coords: Vec<Option<i64>> = world_x | ||
| .iter() | ||
| .zip(world_y.iter()) | ||
| .map(|(x, y)| match (x, y) { | ||
| (Some(x), Some(y)) => { | ||
| let (rx, ry) = to_raster_coordinate(&raster_ref, x, y).unwrap(); | ||
| Some(match coord { | ||
| Coord::X => rx, | ||
| Coord::Y => ry, | ||
| }) | ||
| } | ||
| _ => None, | ||
| }) | ||
| .collect(); |
There was a problem hiding this comment.
It would probably be better (and more compact) to hard-code these values. This is a bit circular since it's essentially the same as the UDF implementation.
(Also for other tests)
There was a problem hiding this comment.
Makes sense. My initial attempt is to test the RasterExecutor but not the RS function itself. Hardcoding expected values is the correct approach since this test lives in rs_rastercoordinate.rs so it should test the correctness of the RS function as its primary goal.
| // Use raster 1 (invertible) as scalar. | ||
| let rasters = generate_test_rasters(2, Some(0)).unwrap(); | ||
| let raster_struct = rasters.as_any().downcast_ref::<StructArray>().unwrap(); | ||
| let scalar_raster = ScalarValue::try_from_array(raster_struct, 1).unwrap(); |
There was a problem hiding this comment.
| let scalar_raster = ScalarValue::try_from_array(raster_struct, 1).unwrap(); | |
| let scalar_raster = ScalarValue::try_from_array(&rasters, 1).unwrap(); |
I know this was my suggestion but I think it can be even more compact 😬
(Also for the other tests)
There was a problem hiding this comment.
Removed redundant code generated by LLM
Summary
Why
DataFusion can invoke these scalar UDFs with a scalar raster and array coordinate arguments. Previously the executor only invoked the per-row closure once for scalar rasters, producing output arrays of length 1 instead of the expected batch length.
Testing