Implement a `rex` backend for xarray (rexarray) #192
rex/external/rexarray.py
Outdated
else:
    lock = combine_locks([HDF5_LOCK, get_write_lock(filename)])

manager = CachingFileManager(h5py.File, filename, mode=mode,
Okay, so you might have to walk me through how all of this works, but is this where you define how the file is actually being opened, and with what object handler?
My extra ask (or just naive question) is whether we can open an S3 or HSDS path with xarray.open_dataset() by using `h5pyd.File` or `fsspec.open`? It seems like maybe we can open an S3 path, but my kernel just hangs when I try. I can't tell if it's latency because it's doing a lot behind the scenes or if S3/fsspec is just not supported.
IDK if we can support those, but I definitely want to if possible. Adding/validating h5py/fsspec is still on my TODO list (if it works already, it's basically by accident). I plan to add tests for these cases as well, assuming it's possible to support them out of the box.
Sorry for dragging my feet on this code, I haven't touched it in a while. My plan is to get back into it this week.
No worries at all, I haven't had time to review or provide feedback! I'm basically trying to figure out where the h5 file is actually opened and where we specify which object opens it. I think if we can figure that out, we could add HSDS/fsspec integration? Why don't you reach out when you start work on this again and explain stuff to me, and maybe I can help or brainstorm.
So actually, the file is opened pretty much where you left this comment.
Definitely still happy to walk through this code - just let me know when you have time!
As of now, HSDS and S3 are explicitly supported. I have added tests for reading files using xarray in both the hsds and s3 test files.
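To make the "which object opens the file" question from this thread concrete, here is a minimal sketch of how an opener could be chosen per path before it is handed to a file manager. It is not the code in this PR: `resolve_opener`, `load_opener`, and the explicit `hsds` flag are hypothetical names used for illustration only. The only library facts relied on are that `h5py.File`, `h5pyd.File`, and `fsspec.open` exist.

```python
import importlib
from urllib.parse import urlparse


def resolve_opener(path, hsds=False):
    """Name the callable that would open ``path`` as (module, attribute).

    Hypothetical helper: the ``hsds`` flag and the scheme check are
    assumptions for this sketch, not the PR's actual selection logic.
    """
    if hsds:
        # HSDS domains are served through h5pyd, which mirrors the h5py API.
        return ("h5pyd", "File")
    if urlparse(str(path)).scheme == "s3":
        # fsspec.open returns an OpenFile; calling .open() on it yields a
        # file-like object that h5py.File accepts in place of a filename.
        return ("fsspec", "open")
    # Plain local paths go straight to h5py.
    return ("h5py", "File")


def load_opener(path, hsds=False):
    """Import the backing module lazily and return the opener callable."""
    module_name, attr = resolve_opener(path, hsds=hsds)
    return getattr(importlib.import_module(module_name), attr)
```

Keeping the choice in one small function means the optional dependencies (`h5pyd`, `fsspec`) are only imported when a path actually needs them.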
Dare I say this is ready for review? :P One question for all, but especially @grantbuster : should we make |
Will take a deeper dive later but here are just a couple random comments.
    return data * scale_factor + adder


def import_fsspec_or_fail(file_path=None):
Little thing - did you consider just having a combined function `import_module_or_fail` which accepts either "hsds" or "s3"?
Hmmm, went fast and didn't consider this. I assume you are suggesting an `if` statement inside a more generic function?
Yeah, exactly. Always find myself with these choices for concision vs simplicity haha.
        self.attrs = {}


class RexArrayWrapper(BackendArray):
This file mostly follows xarray code, right? Could you point to the main differences I should focus on? Is the main new thing this wrapper class?
Yup, this class and `RexStore` are the ones that contain most of the rex-specific changes. I would focus on those two.
@bnb32, @grantbuster, @castelao Friendly bump :) Curious to hear if anyone has tried using this and/or had success with it.
@ppinchuk, some minor comments.
@@ -1,4 +1,5 @@
click>=7.0
dask
It's probably a good idea to set an expected range of versions such as dask = ">=2025.2.0,<2026"
Good call, will add!
    "description": "Global horizontal irradiance is the total amount of solar radiation received per unit area on a horizontal surface."
},
"inversemoninobukhovlength": {
    "standard_name": "latitude",
This was probably left behind, but it looks like there is no standard name for the inverse. If that's right, the `standard_name` should be removed.
Shoot, great catch! Thanks!!
    "standard_name": "air_temperature",
    "units": "C"
},
"time": {
How do you use this `time`? A common approach is to store it as f64 and use the metadata (calendar and units) to encode/decode it.
This is just a convenience alias for `time_index`, since people might expect that kind of data in a time variable.
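For context on the f64-plus-metadata approach mentioned in this thread, here is a minimal standard-library sketch of encoding timestamps as floats relative to a `units` string and decoding them back. The function names and the example units string are illustrative assumptions; it handles only the standard (proleptic Gregorian) calendar and naive timestamps, whereas real CF decoding (as done by xarray) covers more calendars and edge cases.

```python
from datetime import datetime, timedelta

# Seconds per supported unit; an assumption limited to fixed-length units.
_SECONDS_PER_UNIT = {
    "seconds": 1.0,
    "minutes": 60.0,
    "hours": 3600.0,
    "days": 86400.0,
}


def _parse_units(units):
    """Split e.g. 'hours since 2012-01-01 00:00:00' into (scale, reference)."""
    unit, _, reference = units.partition(" since ")
    return _SECONDS_PER_UNIT[unit.strip()], datetime.fromisoformat(reference.strip())


def encode_times(times, units):
    """Convert datetimes to float offsets from the reference in ``units``."""
    scale, reference = _parse_units(units)
    return [(t - reference).total_seconds() / scale for t in times]


def decode_times(values, units):
    """Convert float offsets back to datetimes using the same ``units``."""
    scale, reference = _parse_units(units)
    return [reference + timedelta(seconds=v * scale) for v in values]


units = "hours since 2012-01-01 00:00:00"
stamps = [datetime(2012, 1, 1, 0, 30), datetime(2012, 1, 1, 1, 0)]
encoded = encode_times(stamps, units)   # [0.5, 1.0]
decoded = decode_times(encoded, units)  # round-trips to the original stamps
```

Storing the offsets as floats keeps the on-disk variable numeric, while the `units` attribute carries everything needed to rebuild the timestamps.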
Digging back in is on the list for next week :)
Add a backend for `xarray` that allows users to read in rex-style NREL data.
The implementation itself closely follows the `h5netcdf` backend implementation: https://github.com/pydata/xarray/blob/main/xarray/backends/h5netcdf_.py
Lazy loading is fully supported, which is a big reason why this implementation is so long.
HSDS and S3 (via `fsspec`) access is explicitly supported.