Skip to content

Clean up ECMWF data sources#747

Open
swbg wants to merge 5 commits intoNVIDIA:mainfrom
swbg:stefan/minor-ecmwf-changes
Open

Clean up ECMWF data sources#747
swbg wants to merge 5 commits intoNVIDIA:mainfrom
swbg:stefan/minor-ecmwf-changes

Conversation

@swbg
Copy link
Collaborator

@swbg swbg commented Mar 12, 2026

Earth2Studio Pull Request

Description

The ECMWF data sources had some additional, unused code and xarray coords were not cleanly removed. Changes:

  • Rename _call to fetch and remove the previously unused fetch methods
  • Rename "sample" to "ensemble" for consistency with other parts of Earth2Studio
  • Rename fetch_wrapper to _download_wrapper, because it actually wraps the download method

Two questions:

  • Should we remove IFS_ENS? I see the point of keeping IFS in addition to IFS_FX because IFS has been around for longer, but we introduced IFS_ENS alongside IFS_ENS_FX, which has the same functionality. We also don't have AIFS_ENS, just AIFS_ENS_FX
  • The _ENS subclasses go out of their way to not allow fetching multiple ensemble members at once. Should we add this functionality back?

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.
  • Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies

@swbg swbg changed the title Stefan/minor ecmwf changes Clean up ECMWF data sources Mar 12, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 12, 2026

Greptile Summary

This PR cleans up the ECMWF open-data sources by consolidating a confusing dual-method design (__call__/_call + duplicate async fetch) into a single, clear fetch method on the base class, and tidies up dimension naming (sampleensemble) and method visibility (fetch_wrapper_download_wrapper).

Key changes:

  • API simplification: The previously unused async fetch overrides in every child class are removed; _call is promoted to the public fetch method on the base class, eliminating ~130 lines of boilerplate.
  • Coordinate clean-up: drop_vars("ensemble") and drop_vars("lead_time") are now called after isel reductions so that scalar coordinate remnants no longer leak into the returned xr.DataArray. This fixes a subtle pre-existing issue in AIFS_ENS_FX where .isel(sample=0) was called unconditionally (without guarding on whether the sample dim existed), which would have raised a ValueError for control-forecast members.
  • Naming consistency: The sample dimension is renamed to ensemble to match the rest of Earth2Studio; fetch_wrapper is renamed to _download_wrapper to make its private/internal nature explicit.
  • Breaking changes: Downstream code relying on await source.fetch(...) or the sample dimension name will need updating.

Confidence Score: 4/5

  • This PR is safe to merge; changes are well-scoped refactoring with one minor stale annotation to clean up.
  • The logic changes are correct — the drop_vars additions fix a real coord-leakage issue, the AIFS_ENS_FX unconditional isel(sample=0) guard is properly added, and the async consolidation removes dead code without altering runtime behaviour. The only finding is a leftover # type: ignore[override] comment that no longer applies.
  • No files require special attention beyond the single stale type-ignore comment noted.

Important Files Changed

Filename Overview
earth2studio/data/ecmwf.py Removes unused async fetch overrides from all child classes, renames _callfetch, sampleensemble dimension, and fetch_wrapper_download_wrapper; adds drop_vars calls to cleanly strip reduced coordinates. One stale # type: ignore[override] comment remains.

Last reviewed commit: 838660f

pass

def _call( # type: ignore[override]
def fetch( # type: ignore[override]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Stale # type: ignore[override] annotation

The # type: ignore[override] comment on fetch is a leftover from when the base class had an @abstractmethod async def fetch that this method was overriding (with a different, incompatible sync signature). Now that the abstract async def fetch has been removed from the base class entirely, there is nothing being overridden here and the suppression is no longer meaningful.

Suggested change
def fetch( # type: ignore[override]
def fetch(

@NickGeneva
Copy link
Collaborator

Should we remove IFS_ENS? I see the point of keeping IFS in addition to IFS_FX because IFS has been around for longer, but we introduced IFS_ENS alongside IFS_ENS_FX, which has the same functionality. We also don't have AIFS_ENS, just AIFS_ENS_FX

The non FX ones are useful when someone just wants the analysis states used by the forecast... in the case of when a model needs history the non FX version is useful. The FX is when the lead_time needs to reference the actual forecast steps.

The _ENS subclasses go out of their way to not allow fetching multiple ensemble members at once. Should we add this functionality back?

Right now none of the other ensemble classes output multiple ensembles. I think I prefer to keep the consistency of always returning [time, variable, lat, lon] over adding in the potential for the method to return an additional ensemble dim.

But if this would be particularly useful, perhaps we can change the behaviors but other ENS models like GEFS would need a similar update.

pass

@abstractmethod
async def fetch( # type: ignore[override]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fetch should still be async here. Call is the synchronous. If there isnt really async implementation then this should then just pass through to the call function.

Copy link
Collaborator

@NickGeneva NickGeneva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Stefan!

The updates to the xarray dim and also the rename of the _fetch_wrapper seem good. But the removal of the async fetch has some consistency issues with the interface for the data sources.

@swbg
Copy link
Collaborator Author

swbg commented Mar 18, 2026

I have renamed async def _fetch to async def fetch to comply with the data source protocol. This way, fetch is now also parallel in structure to other data sources (e.g., validating times, setting up the xarray array, creating tasks).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants