Update data tools #61
Conversation
Pull request overview
This PR expands and cleans up data_tools by adding new dataset download helpers (Swarm, SuperMAG, SuperDARN, CHAMP, AMPERE, DMSP SSUSI/SSIES), a convenience loader for event-based workflows, and small SSUSI-related updates in the existing loader code.
Changes:
- Added new downloader modules for multiple data sources (SuperMAG, SuperDARN, Swarm, CHAMP, AMPERE, DMSP SSUSI/SSIES) plus an event-oriented “prepare + subset” helper.
- Updated `dataloader.read_ssusi()` to support selecting the SSUSI source and adjusted SSIES/Iridium handling.
- Added/updated supporting assets (station list CSV, SuperDARN Zenodo catalog CSV, `data_tools` README notes).
Reviewed changes
Copilot reviewed 13 out of 14 changed files in this pull request and generated 18 comments.
| File | Description |
|---|---|
| `lompe/utils/time.py` | Formatting changes + new `date2doy()` helper for YYYY-MM-DD strings. |
| `lompe/data_tools/swarm.py` | New Swarm downloader using viresclient and HDF output. |
| `lompe/data_tools/supermag.py` | New SuperMAG downloader with parallel station downloads and HDF output. |
| `lompe/data_tools/superdarn.py` | New SuperDARN downloader/parser driven by a Zenodo URL catalog and HDF output. |
| `lompe/data_tools/get_lompe_data.py` | New "prepare event data once" + "extract subsets into `lompe.Data`" helper and example main. |
| `lompe/data_tools/dmsp_ssusi.py` | New SSUSI downloader (CDAWeb) + processing to NetCDF. |
| `lompe/data_tools/dmsp_ssies.py` | New SSIES downloader via Madrigal FTP-like pages + HDF output. |
| `lompe/data_tools/dataloader.py` | SSUSI source option + SSIES file filtering changes + Iridium glob extension change + formatting tweaks. |
| `lompe/data_tools/champ.py` | New CHAMP downloader/processor producing HDF. |
| `lompe/data_tools/ampere.py` | New AMPERE raw download + conversion via existing `read_iridium()`. |
| `lompe/data_tools/README` | Updated dataset list and SSUSI source note. |
| `lompe/data/supermag_stations.csv` | Added SuperMAG station metadata CSV. |
| `lompe/data/sdarn_2010_to_2021.csv` | Added SuperDARN Zenodo catalog CSV. |
```python
# URL of the Zenodo record
url = filtered_df['url'].tolist()[0]
```
If there is no matching (year, month) entry in the CSV, filtered_df will be empty and tolist()[0] will raise IndexError. Add an explicit empty check and return/raise a clear “event not covered” error.
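Such a guard might look like the following sketch; `get_record_url` and the `year`/`month` column names are hypothetical stand-ins for the actual catalog lookup in the PR:

```python
import pandas as pd

def get_record_url(catalog: pd.DataFrame, year: int, month: int) -> str:
    """Look up the Zenodo record URL for (year, month), failing loudly
    when the event is not covered by the catalog CSV."""
    filtered_df = catalog[(catalog['year'] == year) & (catalog['month'] == month)]
    if filtered_df.empty:
        raise ValueError(
            f"No SuperDARN Zenodo record covers {year}-{month:02d}; "
            "the event is not in the catalog CSV."
        )
    # URL of the Zenodo record
    return filtered_df['url'].tolist()[0]
```

Raising here gives the caller an actionable message instead of an opaque `IndexError` from deep inside the downloader.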
```diff
@@ -0,0 +1,156 @@
from joblib import Parallel, delayed
```
joblib is imported at module import time, but it’s not listed in pyproject.toml dependencies. This will cause import lompe.data_tools.supermag to fail in a standard install. Consider moving the import inside download_supermag() with a helpful error message, and/or adding joblib as an optional dependency.
Suggested change:

```diff
-from joblib import Parallel, delayed
+try:
+    from joblib import Parallel, delayed
+except ImportError:
+    def _missing_joblib(*args, **kwargs):
+        raise ImportError(
+            "joblib is required to use download_supermag() with parallel execution. "
+            "Install joblib to enable this functionality."
+        )
+    Parallel = _missing_joblib
+    delayed = _missing_joblib
```
```python
from .champ import download_champ
from .superdarn import download_sdarn
from .swarm import download_swarm
from .dmsp_ssies import download_dmsp_ssies
```
download_dmsp_ssies is imported but never used in this module. Consider removing it until it’s actually wired into prepare_event_data() to avoid unused-import warnings.
Suggested change:

```diff
-from .dmsp_ssies import download_dmsp_ssies
```
```python
for line in lines:
    if line.startswith("token ="):
        token_value = line.split('=', 1)[1].strip()
        if token_value:
            print("Swarm token is present:", token_value)
```
The token-check loop only prints when a non-empty token is found; if the file exists but has no token = ... entry (or it’s empty), the function will still proceed to make the request and fail later. Consider explicitly detecting the “no token found/empty token” case and returning with a clear message.
Suggested change:

```diff
-for line in lines:
-    if line.startswith("token ="):
-        token_value = line.split('=', 1)[1].strip()
-        if token_value:
-            print("Swarm token is present:", token_value)
+token_value = None
+for line in lines:
+    if line.startswith("token ="):
+        token_value = line.split('=', 1)[1].strip()
+        break
+if not token_value:
+    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+    return
+print("Swarm token is present:", token_value)
```
```python
import requests
import pandas as pd
import xarray as xr
from bs4 import BeautifulSoup
```
This module imports third-party packages (requests, bs4) at import time, but they are not declared in pyproject.toml dependencies/optionals. That means import lompe.data_tools.superdarn can fail in a standard install even if the caller never uses these functions. Prefer lazy-importing inside the download functions (and raising a clear install hint) and/or declare these as optional dependencies.
```python
import numpy as np
import pandas as pd
import lompe
from lompe.data_tools import datadownloader, dataloader
```
datadownloader and dataloader are imported but not used in this module (only referenced in commented code). Removing unused imports avoids linter warnings and reduces import-time work.
Suggested change:

```diff
-from lompe.data_tools import datadownloader, dataloader
```
```python
try:
    files = glob.glob(
        f"{temp_sdarn_path}*{event.replace('-', '')}*.nc")
    files.sort()
```
If no files are downloaded/found, files will be empty and the downstream processing will fail (e.g., missing 'time' column / ddd.index[0]). Add a guard for if not files: (and ideally clean up the temp directory) before continuing.
Suggested change:

```diff
     files.sort()
+    if not files:
+        print(f"No SuperDARN files found for event {event}")
+        shutil.rmtree(temp_sdarn_path, ignore_errors=True)
+        return None
```
```python
    for line in lines:
        if line.startswith("token ="):
            token_value = line.split('=', 1)[1].strip()
            if token_value:
                print("Swarm token is present:", token_value)
except:
    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
    return
```
Bare except: will swallow unrelated errors (e.g., permission issues, parse errors) and makes debugging difficult. Catch specific exceptions (e.g., FileNotFoundError, PermissionError) and handle the “token missing” case explicitly.
Suggested change:

```diff
-    for line in lines:
-        if line.startswith("token ="):
-            token_value = line.split('=', 1)[1].strip()
-            if token_value:
-                print("Swarm token is present:", token_value)
-except:
-    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
-    return
+except FileNotFoundError:
+    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+    return
+except PermissionError:
+    print("Unable to read Swarm token configuration due to file permissions. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+    return
+except OSError as e:
+    print(f"Unable to read Swarm token configuration: {e}")
+    return
+token_value = None
+for line in lines:
+    if line.startswith("token ="):
+        token_value = line.split('=', 1)[1].strip()
+        break
+if not token_value:
+    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+    return
+print("Swarm token is present:", token_value)
```
```diff
 fn = basepath + event.replace('-', '') + 'Amp_invert.ncdf'
 if not os.path.isfile(fn):
-    files = glob.glob(basepath + '*' + event.replace('-', '') + '*.ncdf')
+    files = glob.glob(basepath + '*' + event.replace('-', '') + '*.nc')
```
The fallback search now only globs for *.nc, but AMPERE fitted files are often *.ncdf (and the default filename is still Amp_invert.ncdf). Consider globbing both *.nc and *.ncdf to avoid missing valid files.
Suggested change:

```diff
-    files = glob.glob(basepath + '*' + event.replace('-', '') + '*.nc')
+    files = (
+        glob.glob(basepath + '*' + event.replace('-', '') + '*.nc') +
+        glob.glob(basepath + '*' + event.replace('-', '') + '*.ncdf')
+    )
```
```python
        (irid.lon.values, irid.lat.values, irid.r.values))
else:
    irid_B = np.empty((3, 0))
    irid_coords = np.empty((2, 0))
```
When irid is empty, irid_coords is created with shape (2, 0), but space_mag_fac expects 3 coordinate rows (lon, lat, r). This will raise ShapeError in lompe.Data; make the empty-case coords (3, 0).
Suggested change:

```diff
-    irid_coords = np.empty((2, 0))
+    irid_coords = np.empty((3, 0))
```
Cleaned up data_tools and added a case example showing how to use the data tools.