Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement a rex backend for xarray (rexarray) #192

Open
wants to merge 62 commits into
base: main
Choose a base branch
from
Open

Conversation

ppinchuk
Copy link
Collaborator

@ppinchuk ppinchuk commented Feb 3, 2025

Add a backend for xarray that allows users to read in rex-style NREL data.

The implementation itself closely follows the h5netcdf backend implementation:
https://github.com/pydata/xarray/blob/main/xarray/backends/h5netcdf_.py

Lazy loading is fully supported, which is a big reason why this implementation is so long.

HSDS and S3 (via fsspec) access is explicitly supported.

@ppinchuk ppinchuk self-assigned this Feb 3, 2025
else:
lock = combine_locks([HDF5_LOCK, get_write_lock(filename)])

manager = CachingFileManager(h5py.File, filename, mode=mode,
Copy link
Member

@grantbuster grantbuster Feb 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay so you might have to walk me through how all of this works, but is this where you define how the file is actually being opened? and with what object handler?

My extra ask (or just naive question) is whether we can open an S3 or HSDS path with xarray.open_dataset() by using h5pyd.File or fsspec.open? It seems like maybe we can open an S3 path but my kernel just hangs when i try. I can't tell if it's latency because its doing a lot behind the scenes or if S3/fsspec is just not supported.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IDK if we can support those but I definitely want to if possible. Adding/validating h5py/fsspec is still on my TODO list (if it works already, its basically by accident). I plan to add tests for these cases as well, assuming it's possible to support these cases out of the box.

Sorry for dragging my feet on this code, I haven't touched it in a while. My plan is to get back into it this week

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries at all, i haven't had time to review or provide feedback! I'm basically trying to figure out where the h5 file is actually opened and where we specify which object opens it. I think if we can figure that out we could add hsds/fsspec integration? Why don't you reach out when you start work on this again and explain stuff to me and maybe i can help or brainstorm.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So actually, the file is opened pretty much where you left this comment.

Definitely still happy to walk though this code - just let me know when you have time!

As of now, HSDS and S3 are explicitly supported. I have added tests for reading files using xarray in both the hsds and s3 test files.

@ppinchuk ppinchuk requested review from castelao, grantbuster and bnb32 and removed request for castelao February 28, 2025 17:04
@ppinchuk ppinchuk added enhancement Update to logic or general code improvements p-medium Priority: medium labels Feb 28, 2025
@ppinchuk ppinchuk marked this pull request as ready for review February 28, 2025 17:05
@ppinchuk
Copy link
Collaborator Author

Dare I say this is ready for review? :P

One question for all, but especially @grantbuster : should we make fsspec a base rex dependency? I think this library is used all over the place, including in our other dependencies like pandas. This would mean users can immediately access S3 files without having to specify [s3] during installation. Did we make it an optional dependency intentionally? I know we had to make h5pyd optional because of compatibility issues with the wtk code

@ppinchuk ppinchuk requested a review from castelao February 28, 2025 17:43
@ppinchuk ppinchuk changed the title [WIP] Implement a rex backend for xarray (rexarray) Implement a rex backend for xarray (rexarray) Feb 28, 2025
Copy link
Collaborator

@bnb32 bnb32 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will take a deeper dive later but here are just a couple random comments.

return data * scale_factor + adder


def import_fsspec_or_fail(file_path=None):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Little thing - did you consider just having a combined function import_module_or_fail which accepts either "hsds" or "s3"?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, went fast and didn't consider this. I assume you are suggesting an if statement inside a more generic function?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, exactly. Always find myself with these choices for concision vs simplicity haha.

self.attrs = {}


class RexArrayWrapper(BackendArray):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This file mostly follows xarray code, right? Could you point to the main differences I should focus on? Is the main new thing this wrapper class?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yup, this class and RexStore are the ones that contain most of the rex-specific changes. I would focus on those two

@ppinchuk
Copy link
Collaborator Author

@bnb32, @grantbuster, @castelao Friendly bump :)

Curious to hear if anyone has tried using this and/or had success with it

Copy link
Member

@castelao castelao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ppinchuk , some minor comments.

@@ -1,4 +1,5 @@
click>=7.0
dask
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's probably a good idea to set an expected range of versions such as dask = ">=2025.2.0,<2026"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, will add!

"description": "Global horizontal irradiance is the total amount of solar radiation received per unit area on a horizontal surface."
},
"inversemoninobukhovlength": {
"standard_name": "latitude",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was probably left behind, but it looks like there is no standard name for the inverse. If that's right, the standard_name should be removed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shoot, great catch! Thanks!!

"standard_name": "air_temperature",
"units": "C"
},
"time": {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you use this time? A common approach is using f64 and using the metadata to encode/decode it, according to the calendar and units.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is just a convenience alias for time_index, since people might expect that kind of data in a time variable

@bnb32
Copy link
Collaborator

bnb32 commented Mar 19, 2025

@bnb32, @grantbuster, @castelao Friendly bump :)

Curious to hear if anyone has tried using this and/or had success with it

Digging back in is on the list for next week :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Update to logic or general code improvements p-medium Priority: medium
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants