feat(r/sedonadb): Add basic DataFrame API with sd_select(), sd_transmute(), and sd_filter() #499
Why do we need ...
...probably a better example would be just ...
Pull request overview
This PR implements basic DataFrame manipulation functions (sd_select(), sd_transmute(), and sd_filter()) for the SedonaDB R package, wrapping the expression translation system introduced in a previous PR. These functions provide a familiar dplyr-like API for column selection, transformation, and row filtering.
Changes:
- Added three new exported functions (`sd_select()`, `sd_transmute()`, `sd_filter()`) in R/dataframe.R with corresponding Rust implementations
- Updated documentation to consistently describe the `.data` parameter as "A sedonadb_dataframe or an object that can be coerced to one"
- Added comprehensive test coverage for the new DataFrame API functions
Reviewed changes
Copilot reviewed 17 out of 17 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| r/sedonadb/R/dataframe.R | Implements sd_select(), sd_transmute(), and sd_filter() functions with expression translation support |
| r/sedonadb/src/rust/src/dataframe.rs | Adds Rust methods select() and filter() to InternalDataFrame for expression-based operations |
| r/sedonadb/src/rust/src/expression.rs | Makes exprs() method public to support DataFrame operations |
| r/sedonadb/tests/testthat/test-dataframe.R | Adds test cases for the three new DataFrame functions |
| r/sedonadb/R/000-wrappers.R | Auto-generated wrapper functions for new Rust methods |
| r/sedonadb/src/rust/api.h | Auto-generated C API declarations |
| r/sedonadb/src/init.c | Auto-generated C initialization code |
| r/sedonadb/NAMESPACE | Exports new functions |
| r/sedonadb/man/*.Rd | Documentation files for new and updated functions |

This PR implements `sd_select()`, `sd_transmute()`, and `sd_filter()`, wrapping the expression translation implemented in #468. The supported expressions are still very minimal, but this establishes the first API we can expose in this way.

I chose to do this instead of just implementing `dplyr::transmute()` and `dplyr::filter()` because those functions have other arguments and perhaps carry an expectation of exact compatibility. The `sd_...()` versions have the added benefit of converting the input to a SedonaDB data frame for you, and it is usually a good idea for this conversion to be explicit (particularly for now).

This doesn't support aggregate expressions in the arguments, which do work in SQL and in dplyr. I took a stab at translating the DataFusion logic that assembles SELECT statements, and it does work, but it is a bit more complicated and needs more testing than I have time to put together right now (https://gist.github.com/paleolimbot/de220c55c96e721a50a4752397f1cbf9).
The next step is to add blanket support for all functions in the sedona-specific function registry so that we can do geo stuff.
Also, `sd_join()` would be particularly useful to expose the (arguably) most useful part of SedonaDB as an engine.

Created on 2026-01-23 with reprex v2.1.1