Update data tools #60
Conversation
Pull request overview
This PR adds a set of data download/assembly utilities under lompe/data_tools (plus small supporting changes) to fetch common event-day datasets and produce Lompe-ready files/dataframes.
Changes:
- Added new download scripts for Swarm, SuperMAG, SuperDARN, CHAMP, and SuperMAG API utilities.
- Added SSUSI/SSIES download helpers and an orchestration module to prepare per-event datasets and Lompe Data objects.
- Updated existing time and loader utilities to support day-of-year handling and alternate SSUSI file sources/patterns.
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 15 comments.
| File | Description |
|---|---|
| lompe/utils/time.py | Formatting cleanup and new date2doy helper used by data tools. |
| lompe/data_tools/swarm.py | New Swarm downloader using viresclient and HDF output. |
| lompe/data_tools/supermag_api.py | Added SuperMAG API helper module (contains restrictive license header). |
| lompe/data_tools/supermag.py | New SuperMAG downloader (parallel station downloads) producing Lompe-style dataframe/HDF. |
| lompe/data_tools/superdarn.py | New SuperDARN downloader that pulls Zenodo files and converts to Lompe-style dataframe/HDF. |
| lompe/data_tools/get_lompe_data.py | New orchestration layer to download multiple sources and build Lompe Data subsets. |
| lompe/data_tools/dmsp_ssusi.py | New SSUSI downloader/processor + concurrent download helper. |
| lompe/data_tools/dmsp_ssies.py | New SSIES downloader/processor using Madrigal endpoints. |
| lompe/data_tools/dataloader.py | Updated SSUSI reader signature and added DOY handling + small robustness tweaks. |
| lompe/data_tools/champ.py | New CHAMP downloader/processor producing Lompe-style dataframe/HDF. |
| lompe/data_tools/ampere.py | New AMPERE/Iridium downloader (raw + processed) integrating with existing loader. |
| lompe/data_tools/README | Updated list of supported datasets and SSUSI source note. |
| lompe/data/supermag_stations.csv | Added SuperMAG station metadata CSV. |
| lompe/data/sdarn_2010_to_2021.csv | Added Zenodo record mapping used by SuperDARN downloader. |
Comments suppressed due to low confidence (1)
lompe/utils/time.py:30
- The error message has a typo (`date2ody`) and references a different function name than the actual one (`date_to_doy`). Since this is user-facing, consider correcting it to `date_to_doy` to make debugging clearer.

```python
if month.shape != day.shape:
    raise ValueError('date2ody: month and day must have the same shape')
```
```
;supermag-api.py
; ================
; Author S. Antunes, based on supermag-api.pro by R.J.Barnes
; (c) 2021 The Johns Hopkins University Applied Physics Laboratory
;LLC. All Rights Reserved.
;This material may be only be used, modified, or reproduced by or for
;the U.S. Government pursuant to the license rights granted under the
;clauses at DFARS 252.227-7013/7014 or FAR 52.227-14. For any other
;permission,
;please contact the Office of Technology Transfer at JHU/APL.
; NO WARRANTY, NO LIABILITY. THIS MATERIAL IS PROVIDED "AS IS."
; JHU/APL MAKES NO REPRESENTATION OR WARRANTY WITH RESPECT TO THE
; PERFORMANCE OF THE MATERIALS, INCLUDING THEIR SAFETY, EFFECTIVENESS,
; OR COMMERCIAL VIABILITY, AND DISCLAIMS ALL WARRANTIES IN THE
; MATERIAL, WHETHER EXPRESS OR IMPLIED, INCLUDING (BUT NOT LIMITED TO)
; ANY AND ALL IMPLIED WARRANTIES OF PERFORMANCE, MERCHANTABILITY,
; FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT OF
; INTELLECTUAL PROPERTY OR OTHER THIRD PARTY RIGHTS. ANY USER OF THE
; MATERIAL ASSUMES THE ENTIRERISK AND LIABILITY FOR USING THE
; MATERIAL. IN NO EVENT SHALL JHU/APL BE LIABLE TO ANY USER OF THE
; MATERIAL FOR ANY ACTUAL, INDIRECT, CONSEQUENTIAL, SPECIAL OR OTHER
; DAMAGES ARISING FROM THE USE OF, OR INABILITY TO USE, THE MATERIAL,
; INCLUDING, BUT NOT LIMITED TO, ANY DAMAGES FOR LOST PROFITS.
```
The module docstring contains a restrictive “All Rights Reserved… may only be used… by or for the U.S. Government” notice. This is incompatible with this repository’s MIT license and likely can’t be redistributed here. Please remove this file, replace it with a clean-room implementation, or include only code that is explicitly MIT-compatible (with appropriate attribution).
Suggested change:

```diff
-;supermag-api.py
-; ================
-; Author S. Antunes, based on supermag-api.pro by R.J.Barnes
-; (c) 2021 The Johns Hopkins University Applied Physics Laboratory
-;LLC. All Rights Reserved.
-;This material may be only be used, modified, or reproduced by or for
-;the U.S. Government pursuant to the license rights granted under the
-;clauses at DFARS 252.227-7013/7014 or FAR 52.227-14. For any other
-;permission,
-;please contact the Office of Technology Transfer at JHU/APL.
-; NO WARRANTY, NO LIABILITY. THIS MATERIAL IS PROVIDED "AS IS."
-; JHU/APL MAKES NO REPRESENTATION OR WARRANTY WITH RESPECT TO THE
-; PERFORMANCE OF THE MATERIALS, INCLUDING THEIR SAFETY, EFFECTIVENESS,
-; OR COMMERCIAL VIABILITY, AND DISCLAIMS ALL WARRANTIES IN THE
-; MATERIAL, WHETHER EXPRESS OR IMPLIED, INCLUDING (BUT NOT LIMITED TO)
-; ANY AND ALL IMPLIED WARRANTIES OF PERFORMANCE, MERCHANTABILITY,
-; FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT OF
-; INTELLECTUAL PROPERTY OR OTHER THIRD PARTY RIGHTS. ANY USER OF THE
-; MATERIAL ASSUMES THE ENTIRERISK AND LIABILITY FOR USING THE
-; MATERIAL. IN NO EVENT SHALL JHU/APL BE LIABLE TO ANY USER OF THE
-; MATERIAL FOR ANY ACTUAL, INDIRECT, CONSEQUENTIAL, SPECIAL OR OTHER
-; DAMAGES ARISING FROM THE USE OF, OR INABILITY TO USE, THE MATERIAL,
-; INCLUDING, BUT NOT LIMITED TO, ANY DAMAGES FOR LOST PROFITS.
+Utilities for building requests to the SuperMAG web services and
+parsing the returned data into Python-friendly structures.
+This module contains helper functions used to construct API URLs and
+work with SuperMAG service responses.
```
```python
savefile = tempfile_path + event.replace('-', '') + '_superdarn_grdmap.h5'
if os.path.isfile(savefile):
    return savefile
else:
    from lompe.data_tools.dataloader import radar_losvec_from_mag
    temp_sdarn_path = basepath + f"sdarn_files_{event.replace('-', '')}/"
    os.makedirs(temp_sdarn_path, exist_ok=True)
    download_sdarn_files(event, temp_sdarn_path)
    # looking for the .nc files for the event
    try:
        files = glob.glob(
            f"{temp_sdarn_path}*{event.replace('-', '')}*.nc")
```
Several paths are built with string concatenation (`basepath + ...`, `tempfile_path + ...`), which breaks when the base path doesn't end with `/` and is non-portable on Windows. Prefer `os.path.join(...)` when constructing `save_path`, `savefile`, and `temp_sdarn_path`.
Suggested change:

```diff
-savefile = tempfile_path + event.replace('-', '') + '_superdarn_grdmap.h5'
-if os.path.isfile(savefile):
-    return savefile
-else:
-    from lompe.data_tools.dataloader import radar_losvec_from_mag
-    temp_sdarn_path = basepath + f"sdarn_files_{event.replace('-', '')}/"
-    os.makedirs(temp_sdarn_path, exist_ok=True)
-    download_sdarn_files(event, temp_sdarn_path)
-    # looking for the .nc files for the event
-    try:
-        files = glob.glob(
-            f"{temp_sdarn_path}*{event.replace('-', '')}*.nc")
+event_id = event.replace('-', '')
+savefile = os.path.join(tempfile_path, f'{event_id}_superdarn_grdmap.h5')
+if os.path.isfile(savefile):
+    return savefile
+else:
+    from lompe.data_tools.dataloader import radar_losvec_from_mag
+    temp_sdarn_path = os.path.join(basepath, f"sdarn_files_{event_id}")
+    os.makedirs(temp_sdarn_path, exist_ok=True)
+    download_sdarn_files(event, temp_sdarn_path)
+    # looking for the .nc files for the event
+    try:
+        files = glob.glob(
+            os.path.join(temp_sdarn_path, f"*{event_id}*.nc"))
```
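Equivalently, `pathlib` keeps these constructions portable. A minimal sketch using the variable names from the snippet above (`sdarn_paths` itself is illustrative, not code from the PR):

```python
from pathlib import Path

def sdarn_paths(tempfile_path, basepath, event):
    """Build the save-file and temp-dir paths portably.

    Path's '/' operator inserts the correct separator regardless of
    whether the base path ends with one, on any platform.
    """
    event_id = event.replace('-', '')
    savefile = Path(tempfile_path) / f'{event_id}_superdarn_grdmap.h5'
    temp_sdarn_path = Path(basepath) / f'sdarn_files_{event_id}'
    return savefile, temp_sdarn_path
```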
```python
print('DMSP SSUSI file saved: ' + savefile)

imgs.to_netcdf(savefile)
shutil.rmtree(basepath)
return savefile
```
download_ssusi unconditionally deletes basepath via shutil.rmtree(basepath). If a caller passes a non-temporary directory, this can remove unrelated files. Only delete directories you created for this run (e.g., a dedicated temp dir under tempfile_path) or gate deletion behind an explicit cleanup=True option.
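A minimal sketch of the gated-cleanup pattern described above, assuming a dedicated temp directory created by the function itself (`build_product` and its internals are illustrative stand-ins, not PR code):

```python
import os
import shutil
import tempfile

def build_product(parent_dir, cleanup=True):
    """Work in a temp directory we created ourselves, and delete only
    that directory — never the caller-supplied parent_dir."""
    workdir = tempfile.mkdtemp(prefix='ssusi_', dir=parent_dir)
    try:
        outfile = os.path.join(parent_dir, 'result.nc')
        # stand-in for the real work, e.g. imgs.to_netcdf(outfile)
        open(outfile, 'w').close()
        return outfile
    finally:
        if cleanup:
            shutil.rmtree(workdir)  # safe: this function created workdir
```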
```python
def get_data_subsets(event_data, event, delta_minutes=2, sources=None, **kwargs):
    '''
    Extract data subsets for the given time interval [t0, t1] and prepare lompe.Data objects.
    Returns: iridium_data, supermag_data, superdarn_data, champ_data'''
    if sources is None:
        sources = ["supermag", "iridium", "superdarn", "champ"]
    T0 = pd.to_datetime(event)
    DT = pd.Timedelta(minutes=delta_minutes)
    t0, t1 = T0 - DT / 2, T0 + DT / 2

    def ensure_datetimeindex(df):
        if not isinstance(df.index, pd.DatetimeIndex):
            try:
                df = df.copy()
                df.index = pd.to_datetime(df.index)
            except Exception as e:
                raise TypeError(f"Failed to convert index to datetime: {e}")
        return df

    # --- iridium ---
    iridium = event_data["iridium"]
    irid = iridium[(iridium.time >= t0) & (iridium.time <= t1)]
```
get_data_subsets assumes required columns exist (e.g., iridium.time, smag.Be, etc.). If a download fails, safe_read_hdf returns an empty DataFrame without these columns, and attribute access like iridium.time will raise. Add guards for missing/empty datasets (and/or honor the sources argument) so the function can gracefully return empty lompe.Data objects.
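A hedged sketch of such a guard, using the column and key names from the snippet above (the helper name `subset_source` is an assumption, not the PR's implementation):

```python
import pandas as pd

def subset_source(event_data, key, t0, t1, time_col='time'):
    """Return rows of event_data[key] inside [t0, t1], or an empty
    DataFrame when the source is missing, empty, or lacks time_col,
    so downstream code can skip it instead of raising."""
    df = event_data.get(key)
    if df is None or df.empty or time_col not in df.columns:
        return pd.DataFrame()
    return df[(df[time_col] >= t0) & (df[time_col] <= t1)]
```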
```python
for sat in ['F16', 'F17', 'F18', 'F19']:
    if source == 'jhuapl':
        files = glob.glob(basepath + '*' + sat + '*' +
                          event[0:4] + event[5:7] + event[8:10] + '*.NC')
    elif source == 'cdaweb':
        files = glob.glob(basepath + '*' + sat + '*' +
                          event[0:4] + doy_str + '*.nc')
    files.sort()
```
In the source == 'cdaweb' branch, the glob uses '*' + sat + '*' where sat is 'F16' (uppercase). In this PR, download_ssusi_files() saves CDAWeb filenames verbatim (likely containing dmspf16 in lowercase), so this glob may miss the downloaded files on case-sensitive filesystems. Consider using sat.lower() (or a case-insensitive pattern) for the CDAWeb glob to align with the downloader’s naming.
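One way to make the match case-insensitive is to glob broadly and filter on a lowercased pattern. A sketch under the filename assumptions above (`find_sat_files` is illustrative, not PR code):

```python
import fnmatch
import glob
import os

def find_sat_files(basepath, sat, datestr):
    """Case-insensitively match downloaded files for one satellite,
    so 'F16' also matches CDAWeb-style 'dmspf16' names."""
    pattern = f'*{sat.lower()}*{datestr}*.nc'
    return sorted(
        path for path in glob.glob(os.path.join(basepath, '*'))
        if fnmatch.fnmatch(os.path.basename(path).lower(), pattern))
```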
```
event (_type_): _description_
userid (str, optional): Defaults to "lompe".
n_jobs (int, optional): Defaults to -1.
save (bool, optional): Defaults to False.
```
Docstring says save defaults to False, but the function signature sets save=True. Update either the default value or the docstring so callers don’t get surprised by a file being written by default.
Suggested change:

```diff
-save (bool, optional): Defaults to False.
+save (bool, optional): Defaults to True.
```
```python
import numpy as np
import pandas as pd
import lompe
from lompe.data_tools import datadownloader, dataloader
```
`from lompe.data_tools import datadownloader, dataloader` is currently unused (only referenced in commented-out code). Consider removing this import to avoid unnecessary import-time side effects and to keep the module clean.
Suggested change:

```diff
-from lompe.data_tools import datadownloader, dataloader
```
```python
try:
    with open(os.path.expanduser('~/.viresclient.ini'), 'r') as file:
        lines = file.readlines()
        for line in lines:
            if line.startswith("token ="):
                token_value = line.split('=', 1)[1].strip()
                if token_value:
                    print("Swarm token is present:", token_value)
except:
    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
    return
```
The access token is printed to stdout (print("Swarm token is present:", token_value)), which can leak credentials into logs/notebooks. Avoid printing token values; at most log that a token is configured (without the secret) or rely on viresclient’s own validation.
Suggested change:

```diff
-try:
-    with open(os.path.expanduser('~/.viresclient.ini'), 'r') as file:
-        lines = file.readlines()
-        for line in lines:
-            if line.startswith("token ="):
-                token_value = line.split('=', 1)[1].strip()
-                if token_value:
-                    print("Swarm token is present:", token_value)
-except:
-    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
-    return
+token_found = False
+try:
+    with open(os.path.expanduser('~/.viresclient.ini'), 'r') as file:
+        for line in file:
+            if line.startswith("token ="):
+                token_value = line.split('=', 1)[1].strip()
+                if token_value:
+                    token_found = True
+                    print("Swarm token is configured.")
+                    break
+except OSError:
+    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+    return
+if not token_found:
+    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+    return
```
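Since `.viresclient.ini` is an INI file, the standard-library `configparser` is another way to check for a token without printing the secret. A sketch under the assumption that the token lives in some section of that file (the section layout and `has_vires_token` name are assumptions; viresclient's own validation remains an option):

```python
import configparser
import os

def has_vires_token(path='~/.viresclient.ini'):
    """Report whether the viresclient config defines a non-empty
    token, without ever echoing the token value itself."""
    cfg = configparser.ConfigParser()
    if not cfg.read(os.path.expanduser(path)):
        return False  # file missing or unreadable
    return any(cfg.get(section, 'token', fallback='').strip()
               for section in cfg.sections())
```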
```python
filtered_df = file_loc[(file_loc['year'].astype(
    str) == year) & (file_loc['Month_Num'] == month)]

# Apply function and add to DataFrame
event_date_str = event.replace('-', '')

# URL of the Zenodo record
url = filtered_df['url'].tolist()[0]
```
filtered_df['url'].tolist()[0] will raise IndexError when the CSV has no matching record for the event’s year/month (or if Month_Num mapping yields NaN). Add an explicit empty-check and return/raise a clear error when no Zenodo record is found for the requested month.
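A sketch of the suggested empty-check, reusing the column names from the snippet above (the `zenodo_url_for` helper is illustrative, not the PR's code):

```python
import pandas as pd

def zenodo_url_for(file_loc, year, month):
    """Look up the Zenodo record URL for a year/month, raising a
    clear error instead of an IndexError when nothing matches."""
    match = file_loc[(file_loc['year'].astype(str) == str(year)) &
                     (file_loc['Month_Num'] == month)]
    if match.empty:
        raise ValueError(
            f'No Zenodo record found for {year}-{month}: check the CSV')
    return match['url'].iloc[0]
```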
```python
if day.min() < 1:
    raise ValueError('date2doy: day must not be less than 1')

# flatten arrays:
shape = month.shape
month = month.flatten()
day = day.flatten()

# check if day exceeds days in months
days_in_month = np.array(
    [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
days_in_month_ly = np.array(
    [0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
if ((np.any(day[~leapyear] > days_in_month[month[~leapyear]])) |
        (np.any(day[leapyear] > days_in_month_ly[month[leapyear]]))):
    raise ValueError(
        'date2doy: day must not exceed number of days in month')
```
Several ValueError messages in this function use the prefix `date2doy:` even though the function is named `date_to_doy`. Aligning the prefix with the actual function name will make tracebacks and user reports less confusing.
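For reference, a scalar day-of-year can always be cross-checked against the standard library, independently of the vectorized helper (this `doy` function is illustrative, not part of the PR):

```python
from datetime import date

def doy(year, month, day):
    """Scalar day-of-year via the standard library; handy for
    spot-checking a vectorized date-to-DOY implementation."""
    return date(year, month, day).timetuple().tm_yday
```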
Adding download scripts and sample usage (notebook)