[WIP] Rewrite download infrastructure with pooch #1182
Hi there, this is jenkins continuous integration...
- Replace `wget` library with `pooch` in data_handling.py
- Use `pooch.retrieve()` with `pooch.Untar(extract_dir=".")` for downloading and extracting tar archives
- Remove manual temp directory management and tarfile handling
- Remove the private `_perform_download()` helper function
- Update dependency in model/testing/pyproject.toml: wget>=3.2 -> pooch>=1.7.0
- Update root pyproject.toml: wget>=3.2 -> pooch>=1.7.0
- Update uv.lock to reflect dependency changes
Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/97eae63c-abce-4bf1-90d6-1f7c34967217
Co-authored-by: jcanton <5622559+jcanton@users.noreply.github.com>
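The approach listed in this commit can be sketched roughly as follows. This is a minimal illustration, not the code in the PR: the function name is borrowed from later in the conversation, and its signature and body are assumptions. `pooch.retrieve()` and `pooch.Untar(extract_dir=...)` are existing pooch APIs; passing `known_hash=None` skips checksum verification.

```python
from __future__ import annotations

import pathlib


def download_and_extract(uri: str, dst: pathlib.Path) -> list[str]:
    """Download a tar archive from ``uri`` and unpack it into ``dst`` (sketch)."""
    # Imported lazily so this module can be imported without pooch installed.
    import pooch

    dst.mkdir(parents=True, exist_ok=True)
    # pooch downloads the archive into `path`, then the Untar processor
    # unpacks it. extract_dir is interpreted relative to `path`, so "."
    # places the extracted files directly in `dst`.
    return pooch.retrieve(
        url=uri,
        known_hash=None,  # no verification in this sketch
        path=dst,
        processor=pooch.Untar(extract_dir="."),
    )
```

With a processor attached, `pooch.retrieve()` returns the list of extracted file paths rather than the path of the archive itself, which is why no separate `tarfile` or temporary-directory handling appears here.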
msimberg left a comment
In principle I think this already looks good because it's a simple change (I've very briefly tested it locally as well).
However, if Copilot (not mentioning you, yet) is motivated, I think there are some adjacent changes that could be made.
Downloads to a temporary directory in the destination directory
(not /tmp to avoid space constraints).
Uses pooch for downloading and archive extraction.
Does anyone know why we avoid /tmp? If we want to keep avoiding it, I think this comment should still mention /tmp. Otherwise I'd prefer to move to /tmp, or more correctly to $TMPDIR (is that a thing on macOS?), possibly with an override through ICON4PY_TMP_DATA_PATH or something like that.
Side note: I'd also like to see ICON4PY_TEST_DATA_PATH default to something in ~/.cache/icon4py (or $XDG_CACHE_HOME/icon4py), but I don't know if others agree. I already set ICON4PY_TEST_DATA_PATH to ~/.cache/icon4py to share the cache between different icon4py worktrees.
This can also be out of scope for this PR.
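The directory lookup suggested in this comment could look something like the sketch below. Nothing here is from the PR: both function names are hypothetical, and `ICON4PY_TMP_DATA_PATH` is only the name floated above. The cache variable used is `XDG_CACHE_HOME`, which is the name defined by the XDG Base Directory specification, and `tempfile.gettempdir()` is the standard-library call that honours `TMPDIR` (including on macOS).

```python
import os
import pathlib
import tempfile


def tmp_data_path() -> pathlib.Path:
    """Scratch directory: explicit override first, then the platform default."""
    # Hypothetical override variable, named as in the review comment.
    override = os.environ.get("ICON4PY_TMP_DATA_PATH")
    if override:
        return pathlib.Path(override)
    # gettempdir() consults TMPDIR, TEMP and TMP before falling back to /tmp.
    return pathlib.Path(tempfile.gettempdir())


def default_test_data_path() -> pathlib.Path:
    """Test data cache: ICON4PY_TEST_DATA_PATH, else an XDG-style cache dir."""
    override = os.environ.get("ICON4PY_TEST_DATA_PATH")
    if override:
        return pathlib.Path(override)
    # Fall back to ~/.cache when XDG_CACHE_HOME is unset or empty.
    cache_home = os.environ.get("XDG_CACHE_HOME") or str(pathlib.Path.home() / ".cache")
    return pathlib.Path(cache_home) / "icon4py"
```

A shared default like this would let several worktrees reuse one download cache without each setting the variable by hand.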
Thread known_hash parameter through download_and_extract(), download_test_data(), _download_grid_file(), and muphys test fixtures. All hashes are None for now with TODO(msimberg) reminders.
Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/9803a0a8-5d2a-40c0-9350-bb28ca915902
Co-authored-by: msimberg <42977+msimberg@users.noreply.github.com>
…nces
- Add known_hash=None with TODO(msimberg) to every GridDescription instance
- Add known_hashes dict field to Experiment dataclass (per-rank: 1, 2, 4)
- Add known_hashes={1: None, 2: None, 4: None} with TODO(msimberg) to every Experiment instance
- Thread known_hashes through _download_ser_data in datatest.py
Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/6dcb0164-6d9b-4703-8785-37167685647b
Co-authored-by: msimberg <42977+msimberg@users.noreply.github.com>
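As a rough picture of the per-rank field described in this commit, the sketch below uses a stand-in `Experiment` dataclass. Only the `known_hashes` field name and the rank keys 1, 2 and 4 come from the commit message; the `name` field, the `default_factory=dict` default, and the `hash_for` helper are assumptions made for illustration.

```python
from __future__ import annotations

import dataclasses


@dataclasses.dataclass(frozen=True)
class Experiment:
    name: str  # assumed field, for illustration only
    # One archive per number of ranks; None means "hash not pinned yet".
    known_hashes: dict[int, str | None] = dataclasses.field(default_factory=dict)


# TODO(msimberg): fill in real hashes.
EXAMPLE = Experiment(name="example", known_hashes={1: None, 2: None, 4: None})


def hash_for(experiment: Experiment, comm_size: int) -> str | None:
    """Look up the hash to pass on to the download for this rank count."""
    # .get() silently yields None for rank counts that were never declared.
    return experiment.known_hashes.get(comm_size)
```

Note that with `.get()` an undeclared rank count is indistinguishable from a declared but unpinned one.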
… values
- Remove default=None from GridDescription.known_hash, MuphysExperiment.known_hash
- Remove default_factory from Experiment.known_hashes, reorder before version field
- Remove default=None from download_and_extract() and download_test_data() known_hash param
- Use [] instead of .get() for known_hashes lookup in _download_ser_data
- Add explicit known_hash=None TODO(msimberg) to all MuphysExperiment instances
Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/80900bd2-f4e9-4d60-8600-a3543d7c8cbd
Co-authored-by: msimberg <42977+msimberg@users.noreply.github.com>
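The effect of making the field required and switching to `[]` can be illustrated with a stand-in dataclass. The `name` field, the `version` default, and the `hash_for` helper are assumptions; the point is only that a dataclass field without a default must precede fields that have one, which is presumably why the commit reorders `known_hashes` before `version`, and that `[]` fails loudly where `.get()` would return `None`.

```python
from __future__ import annotations

import dataclasses


@dataclasses.dataclass(frozen=True)
class Experiment:
    name: str  # assumed field, for illustration only
    # Required: every instance must state its hashes, even if they are None.
    known_hashes: dict[int, str | None]
    # Fields with defaults have to come after all required fields.
    version: int = 0  # assumed default


def hash_for(experiment: Experiment, comm_size: int) -> str | None:
    """Strict lookup: an undeclared rank count raises KeyError."""
    return experiment.known_hashes[comm_size]
```

Omitting `known_hashes` now raises a `TypeError` at construction time, so a missing entry is caught where the experiment is defined rather than at download time.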
Mandatory Tests
Please make sure you run these tests via comment before you merge!
Optional Tests
To run benchmarks you can use:
To run tests and benchmarks with the DaCe backend you can use:
To run test levels ignored by the default test suite (mostly simple datatest for static fields computations) you can use:
For more detailed information please look at CI in the EXCLAIM universe.
…-commit
Agent-Logs-Url: https://github.com/C2SM/icon4py/sessions/1df385a6-8a39-4e58-8aba-f1061bdf682c
Co-authored-by: msimberg <42977+msimberg@users.noreply.github.com>
- Replace `wget` dependency with `pooch`
- Update `data_handling.py` to use `pooch.retrieve()`
- Add `known_hash`/`known_hashes` fields and thread through download functions
- Add `known_hash=None` TODOs to all GridDescription/Experiment instances
- Remove defaults from `known_hash`/`known_hashes` fields (make them required)
- Remove default from `download_and_extract()` and `download_test_data()` `known_hash` param
- Use `[]` instead of `.get()` in `_download_ser_data`
- Add explicit `known_hash=None` TODOs to all MuphysExperiment instances
- Add `:` after `TODO(msimberg)` to pass pre-commit checks