Skip to content

[WIP] Rewrite download infrastructure with pooch#1182

Draft
Copilot wants to merge 7 commits intomainfrom
copilot/rewrite-download-infrastructure
Draft

[WIP] Rewrite download infrastructure with pooch#1182
Copilot wants to merge 7 commits intomainfrom
copilot/rewrite-download-infrastructure

Conversation

Copy link
Copy Markdown
Contributor

Copilot AI commented Mar 30, 2026

  • Replace wget dependency with pooch
  • Rewrite data_handling.py to use pooch.retrieve()
  • Add known_hash / known_hashes fields and thread through download functions
  • Add explicit known_hash=None TODOs to all GridDescription/Experiment instances
  • Remove default values from known_hash/known_hashes fields (make them required)
  • Remove default from download_and_extract() and download_test_data() known_hash param
  • Use [] instead of .get() in _download_ser_data
  • Add explicit known_hash=None TODOs to all MuphysExperiment instances
  • Fix TODO format: add : after TODO(msimberg) to pass pre-commit checks

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

@jenkins-apn
Copy link
Copy Markdown

Hi there, this is jenkins continuous integration...
Do you want me to verify this patch?

1 similar comment
@jenkins-apn
Copy link
Copy Markdown

Hi there, this is jenkins continuous integration...
Do you want me to verify this patch?

Copilot AI linked an issue Mar 30, 2026 that may be closed by this pull request
Copilot AI and others added 2 commits March 30, 2026 10:34
- Replace `wget` library with `pooch` in data_handling.py
- Use `pooch.retrieve()` with `pooch.Untar(extract_dir=".")` for downloading
  and extracting tar archives
- Remove manual temp directory management and tarfile handling
- Remove the private `_perform_download()` helper function
- Update dependency in model/testing/pyproject.toml: wget>=3.2 -> pooch>=1.7.0
- Update root pyproject.toml: wget>=3.2 -> pooch>=1.7.0
- Update uv.lock to reflect dependency changes

Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/97eae63c-abce-4bf1-90d6-1f7c34967217

Co-authored-by: jcanton <5622559+jcanton@users.noreply.github.com>
Copy link
Copy Markdown
Contributor

@msimberg msimberg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle I think this already looks good because it's a simple change (I've very briefly tested it locally as well).

However, if copilot (not mentioning you, yet) is motivated, I think there are some adjacent changes that could be made.

Comment thread model/testing/src/icon4py/model/testing/data_handling.py Outdated

Downloads to a temporary directory in the destination directory
(not /tmp to avoid space constraints).
Uses pooch for downloading and archive extraction.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does someone know why we avoid /tmp? If we want to keep avoiding it then I think this comment should still mention /tmp. Otherwise I'd prefer to move to /tmp, or more correctly to $TMPDIR (is that a thing on macos?). Possibly with an override through ICON4PY_TMP_DATA_PATH or something like that?

Side note, but I'd also like to see ICON4PY_TEST_DATA_PATH default to something in ~/.cache/icon4py (or $XDG_CACHE_DIR/icon4py) but I don't know if others agree? I already set ICON4PY_TEST_DATA_PATH to ~/.cache/icon4py to share the cache between different icon4py worktrees.

This can also be out of scope for this PR.

Thread known_hash parameter through download_and_extract(),
download_test_data(), _download_grid_file(), and muphys test fixtures.
All hashes are None for now with TODO(msimberg) reminders.

Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/9803a0a8-5d2a-40c0-9350-bb28ca915902

Co-authored-by: msimberg <42977+msimberg@users.noreply.github.com>
Copilot AI requested a review from msimberg April 15, 2026 09:16
Comment thread model/testing/src/icon4py/model/testing/definitions.py Outdated
…nces

- Add known_hash=None with TODO(msimberg) to every GridDescription instance
- Add known_hashes dict field to Experiment dataclass (per-rank: 1, 2, 4)
- Add known_hashes={1: None, 2: None, 4: None} with TODO(msimberg) to every Experiment instance
- Thread known_hashes through _download_ser_data in datatest.py

Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/6dcb0164-6d9b-4703-8785-37167685647b

Co-authored-by: msimberg <42977+msimberg@users.noreply.github.com>
Copilot AI requested a review from msimberg April 15, 2026 10:47
Comment thread model/testing/src/icon4py/model/testing/fixtures/datatest.py Outdated
Comment thread model/testing/src/icon4py/model/testing/data_handling.py Outdated
Comment thread model/testing/src/icon4py/model/testing/definitions.py Outdated
Comment thread model/testing/src/icon4py/model/testing/definitions.py Outdated
… values

- Remove default=None from GridDescription.known_hash, MuphysExperiment.known_hash
- Remove default_factory from Experiment.known_hashes, reorder before version field
- Remove default=None from download_and_extract() and download_test_data() known_hash param
- Use [] instead of .get() for known_hashes lookup in _download_ser_data
- Add explicit known_hash=None TODO(msimberg) to all MuphysExperiment instances

Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/80900bd2-f4e9-4d60-8600-a3543d7c8cbd

Co-authored-by: msimberg <42977+msimberg@users.noreply.github.com>
Copilot AI requested a review from msimberg April 15, 2026 11:13
@github-actions
Copy link
Copy Markdown

Mandatory Tests

Please make sure you run these tests via comment before you merge!

  • cscs-ci run default
  • cscs-ci run distributed

Optional Tests

To run benchmarks you can use:

  • cscs-ci run benchmark-bencher

To run tests and benchmarks with the DaCe backend you can use:

  • cscs-ci run dace

To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:

  • cscs-ci run extra

For more detailed information please look at CI in the EXCLAIM universe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Rewrite download infrastructure with pooch

4 participants