
Update data tools #60

Closed
FasilGibdaw wants to merge 4 commits into klaundal:main from FasilGibdaw:update-data-folders

Conversation

@FasilGibdaw
Contributor

Adds data download scripts and sample usage (notebook).

Copilot AI review requested due to automatic review settings April 28, 2026 17:11

Copilot AI left a comment


Pull request overview

This PR adds a set of data download/assembly utilities under lompe/data_tools (plus small supporting changes) to fetch common event-day datasets and produce Lompe-ready files/dataframes.

Changes:

  • Added new download scripts for Swarm, SuperMAG, SuperDARN, CHAMP, and SuperMAG API utilities.
  • Added SSUSI/SSIES download helpers and an orchestration module to prepare per-event datasets and Lompe Data objects.
  • Updated existing time and loader utilities to support day-of-year handling and alternate SSUSI file sources/patterns.

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 15 comments.

Summary per file:

  • lompe/utils/time.py: Formatting cleanup and new date2doy helper used by data tools.
  • lompe/data_tools/swarm.py: New Swarm downloader using viresclient and HDF output.
  • lompe/data_tools/supermag_api.py: Added SuperMAG API helper module (contains restrictive license header).
  • lompe/data_tools/supermag.py: New SuperMAG downloader (parallel station downloads) producing Lompe-style dataframe/HDF.
  • lompe/data_tools/superdarn.py: New SuperDARN downloader that pulls Zenodo files and converts to Lompe-style dataframe/HDF.
  • lompe/data_tools/get_lompe_data.py: New orchestration layer to download multiple sources and build Lompe Data subsets.
  • lompe/data_tools/dmsp_ssusi.py: New SSUSI downloader/processor + concurrent download helper.
  • lompe/data_tools/dmsp_ssies.py: New SSIES downloader/processor using Madrigal endpoints.
  • lompe/data_tools/dataloader.py: Updated SSUSI reader signature and added DOY handling + small robustness tweaks.
  • lompe/data_tools/champ.py: New CHAMP downloader/processor producing Lompe-style dataframe/HDF.
  • lompe/data_tools/ampere.py: New AMPERE/Iridium downloader (raw + processed) integrating with existing loader.
  • lompe/data_tools/README: Updated list of supported datasets and SSUSI source note.
  • lompe/data/supermag_stations.csv: Added SuperMAG station metadata CSV.
  • lompe/data/sdarn_2010_to_2021.csv: Added Zenodo record mapping used by SuperDARN downloader.
Comments suppressed due to low confidence (1)

lompe/utils/time.py:30

  • The error message has a typo (date2ody) and references a different function name than the actual one (date_to_doy). Since this is user-facing, consider correcting it to date_to_doy: ... to make debugging clearer.
    if month.shape != day.shape:
        raise ValueError('date2ody: month and day must have the same shape')



Comment on lines +18 to +44
;supermag-api.py
; ================
; Author S. Antunes, based on supermag-api.pro by R.J.Barnes


; (c) 2021 The Johns Hopkins University Applied Physics Laboratory
;LLC. All Rights Reserved.

;This material may be only be used, modified, or reproduced by or for
;the U.S. Government pursuant to the license rights granted under the
;clauses at DFARS 252.227-7013/7014 or FAR 52.227-14. For any other
;permission,
;please contact the Office of Technology Transfer at JHU/APL.

; NO WARRANTY, NO LIABILITY. THIS MATERIAL IS PROVIDED "AS IS."
; JHU/APL MAKES NO REPRESENTATION OR WARRANTY WITH RESPECT TO THE
; PERFORMANCE OF THE MATERIALS, INCLUDING THEIR SAFETY, EFFECTIVENESS,
; OR COMMERCIAL VIABILITY, AND DISCLAIMS ALL WARRANTIES IN THE
; MATERIAL, WHETHER EXPRESS OR IMPLIED, INCLUDING (BUT NOT LIMITED TO)
; ANY AND ALL IMPLIED WARRANTIES OF PERFORMANCE, MERCHANTABILITY,
; FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT OF
; INTELLECTUAL PROPERTY OR OTHER THIRD PARTY RIGHTS. ANY USER OF THE
; MATERIAL ASSUMES THE ENTIRERISK AND LIABILITY FOR USING THE
; MATERIAL. IN NO EVENT SHALL JHU/APL BE LIABLE TO ANY USER OF THE
; MATERIAL FOR ANY ACTUAL, INDIRECT, CONSEQUENTIAL, SPECIAL OR OTHER
; DAMAGES ARISING FROM THE USE OF, OR INABILITY TO USE, THE MATERIAL,
; INCLUDING, BUT NOT LIMITED TO, ANY DAMAGES FOR LOST PROFITS.

Copilot AI Apr 28, 2026


The module docstring contains a restrictive “All Rights Reserved… may only be used… by or for the U.S. Government” notice. This is incompatible with this repository’s MIT license and likely can’t be redistributed here. Please remove this file, replace it with a clean-room implementation, or include only code that is explicitly MIT-compatible (with appropriate attribution).

Suggested change
-;supermag-api.py
-; ================
-; Author S. Antunes, based on supermag-api.pro by R.J.Barnes
-; (c) 2021 The Johns Hopkins University Applied Physics Laboratory
-;LLC. All Rights Reserved.
-;This material may be only be used, modified, or reproduced by or for
-;the U.S. Government pursuant to the license rights granted under the
-;clauses at DFARS 252.227-7013/7014 or FAR 52.227-14. For any other
-;permission,
-;please contact the Office of Technology Transfer at JHU/APL.
-; NO WARRANTY, NO LIABILITY. THIS MATERIAL IS PROVIDED "AS IS."
-; JHU/APL MAKES NO REPRESENTATION OR WARRANTY WITH RESPECT TO THE
-; PERFORMANCE OF THE MATERIALS, INCLUDING THEIR SAFETY, EFFECTIVENESS,
-; OR COMMERCIAL VIABILITY, AND DISCLAIMS ALL WARRANTIES IN THE
-; MATERIAL, WHETHER EXPRESS OR IMPLIED, INCLUDING (BUT NOT LIMITED TO)
-; ANY AND ALL IMPLIED WARRANTIES OF PERFORMANCE, MERCHANTABILITY,
-; FITNESS FOR A PARTICULAR PURPOSE, AND NON-INFRINGEMENT OF
-; INTELLECTUAL PROPERTY OR OTHER THIRD PARTY RIGHTS. ANY USER OF THE
-; MATERIAL ASSUMES THE ENTIRERISK AND LIABILITY FOR USING THE
-; MATERIAL. IN NO EVENT SHALL JHU/APL BE LIABLE TO ANY USER OF THE
-; MATERIAL FOR ANY ACTUAL, INDIRECT, CONSEQUENTIAL, SPECIAL OR OTHER
-; DAMAGES ARISING FROM THE USE OF, OR INABILITY TO USE, THE MATERIAL,
-; INCLUDING, BUT NOT LIMITED TO, ANY DAMAGES FOR LOST PROFITS.
+Utilities for building requests to the SuperMAG web services and
+parsing the returned data into Python-friendly structures.
+This module contains helper functions used to construct API URLs and
+work with SuperMAG service responses.

Copilot uses AI. Check for mistakes.
Comment on lines +88 to +99
savefile = tempfile_path + event.replace('-', '') + '_superdarn_grdmap.h5'
if os.path.isfile(savefile):
    return savefile
else:
    from lompe.data_tools.dataloader import radar_losvec_from_mag
    temp_sdarn_path = basepath + f"sdarn_files_{event.replace('-', '')}/"
    os.makedirs(temp_sdarn_path, exist_ok=True)
    download_sdarn_files(event, temp_sdarn_path)
    # looking for the .nc files for the event
    try:
        files = glob.glob(
            f"{temp_sdarn_path}*{event.replace('-', '')}*.nc")

Copilot AI Apr 28, 2026


Several paths are built with basepath + ... / tempfile_path + ..., which breaks when the base path doesn’t end with / and is non-portable on Windows. Prefer os.path.join(...) for save_path, savefile, and temp_sdarn_path construction.

Suggested change
-savefile = tempfile_path + event.replace('-', '') + '_superdarn_grdmap.h5'
-if os.path.isfile(savefile):
-    return savefile
-else:
-    from lompe.data_tools.dataloader import radar_losvec_from_mag
-    temp_sdarn_path = basepath + f"sdarn_files_{event.replace('-', '')}/"
-    os.makedirs(temp_sdarn_path, exist_ok=True)
-    download_sdarn_files(event, temp_sdarn_path)
-    # looking for the .nc files for the event
-    try:
-        files = glob.glob(
-            f"{temp_sdarn_path}*{event.replace('-', '')}*.nc")
+event_id = event.replace('-', '')
+savefile = os.path.join(tempfile_path, f'{event_id}_superdarn_grdmap.h5')
+if os.path.isfile(savefile):
+    return savefile
+else:
+    from lompe.data_tools.dataloader import radar_losvec_from_mag
+    temp_sdarn_path = os.path.join(basepath, f"sdarn_files_{event_id}")
+    os.makedirs(temp_sdarn_path, exist_ok=True)
+    download_sdarn_files(event, temp_sdarn_path)
+    # looking for the .nc files for the event
+    try:
+        files = glob.glob(
+            os.path.join(temp_sdarn_path, f"*{event_id}*.nc"))
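A minimal sketch of the same path handling factored into a helper, assuming the file-naming scheme used in superdarn.py. The name build_sdarn_paths is hypothetical and not part of lompe. It illustrates that os.path.join inserts the separator only where needed, so basepath='data' and basepath='data/' both work, whereas the string concatenation 'data' + 'sdarn_files_...' yields 'datasdarn_files_...'.

```python
import os


def build_sdarn_paths(basepath, tempfile_path, event):
    """Hypothetical helper: build SuperDARN output and input paths portably.

    event is assumed to be an ISO date string such as '2015-03-17'.
    """
    event_id = event.replace('-', '')
    # os.path.join adds the platform separator only where it is missing
    savefile = os.path.join(tempfile_path, f'{event_id}_superdarn_grdmap.h5')
    temp_sdarn_path = os.path.join(basepath, f'sdarn_files_{event_id}')
    nc_pattern = os.path.join(temp_sdarn_path, f'*{event_id}*.nc')
    return savefile, temp_sdarn_path, nc_pattern


savefile, sdarn_dir, pattern = build_sdarn_paths('data', 'tmp', '2015-03-17')
print(sdarn_dir)  # data/sdarn_files_20150317 on POSIX systems
```

The returned pattern can be passed straight to glob.glob, which keeps the download, search and save steps consistent on both POSIX and Windows.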

Comment on lines +188 to +192
print('DMSP SSUSI file saved: ' + savefile)

imgs.to_netcdf(savefile)
shutil.rmtree(basepath)
return savefile

Copilot AI Apr 28, 2026


download_ssusi unconditionally deletes basepath via shutil.rmtree(basepath). If a caller passes a non-temporary directory, this can remove unrelated files. Only delete directories you created for this run (e.g., a dedicated temp dir under tempfile_path) or gate deletion behind an explicit cleanup=True option.
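One way to gate the deletion, sketched under the assumption that the downloader can accept basepath=None to mean "create a private scratch directory". process_with_scratch_dir and its work callback are hypothetical names; tempfile.mkdtemp and shutil.rmtree are standard library calls.

```python
import os
import shutil
import tempfile


def process_with_scratch_dir(work, basepath=None, cleanup=True):
    """Hypothetical pattern: run work(directory) and delete only what we created.

    If basepath is None, a private temporary directory is created and removed
    afterwards (when cleanup is True). A caller-supplied basepath is never removed.
    """
    created_here = basepath is None
    workdir = tempfile.mkdtemp(prefix='ssusi_') if created_here else basepath
    try:
        return work(workdir)
    finally:
        # Only remove the directory if this call created it
        if cleanup and created_here:
            shutil.rmtree(workdir, ignore_errors=True)
```

Called with basepath=None, the scratch directory is gone after the call, even if work raises; called with a user directory, that directory and its contents are left intact.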

Comment on lines +72 to +94
def get_data_subsets(event_data, event, delta_minutes=2, sources=None, **kwargs):
    '''
    Extract data subsets for the given time interval [t0, t1]. and prepare lompe.Data objects.
    Returns: iridium_data, supermag_data, superdarn_data, champ_data'''
    if sources is None:
        sources = ["supermag", "iridium", "superdarn", "champ"]
    T0 = pd.to_datetime(event)
    DT = pd.Timedelta(minutes=delta_minutes)
    t0, t1 = T0 - DT / 2, T0 + DT / 2

    def ensure_datetimeindex(df):
        if not isinstance(df.index, pd.DatetimeIndex):
            try:
                df = df.copy()
                df.index = pd.to_datetime(df.index)
            except Exception as e:
                raise TypeError(f"Failed to convert index to datetime: {e}")
        return df

    # --- iridium ---
    iridium = event_data["iridium"]
    irid = iridium[(iridium.time >= t0) & (iridium.time <= t1)]


Copilot AI Apr 28, 2026


get_data_subsets assumes required columns exist (e.g., iridium.time, smag.Be, etc.). If a download fails, safe_read_hdf returns an empty DataFrame without these columns, and attribute access like iridium.time will raise. Add guards for missing/empty datasets (and/or honor the sources argument) so the function can gracefully return empty lompe.Data objects.
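A minimal guard, assuming each dataset is a pandas DataFrame with a time column as in the iridium branch above. subset_time_window is a hypothetical helper: instead of raising on a missing or empty dataset, it returns an empty frame with the expected column names, leaving it to the caller to decide how to represent that source.

```python
import pandas as pd


def subset_time_window(df, t0, t1, time_col='time', required=()):
    """Hypothetical guard: select rows with t0 <= time <= t1.

    Returns an empty DataFrame (with the expected column names) when the
    input is missing, empty, or lacks any of the required columns.
    """
    needed = [time_col, *required]
    if df is None or df.empty or not set(needed).issubset(df.columns):
        return pd.DataFrame(columns=needed)
    times = pd.to_datetime(df[time_col])
    return df[(times >= t0) & (times <= t1)]


# Same window construction as get_data_subsets: T0 +/- DT/2
T0 = pd.Timestamp('2015-03-17 12:00')
DT = pd.Timedelta(minutes=2)
t0, t1 = T0 - DT / 2, T0 + DT / 2
print(subset_time_window(pd.DataFrame(), t0, t1).empty)  # True
```

Checking the sources argument before calling such a helper would also skip datasets the caller did not request.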

Comment on lines 96 to 103
 for sat in ['F16', 'F17', 'F18', 'F19']:
-    files = glob.glob(basepath + '*' + sat + '*' +
-                      event[0:4]+event[5:7]+event[8:10] + '*.NC')
+    if source == 'jhuapl':
+        files = glob.glob(basepath + '*' + sat + '*' +
+                          event[0:4] + event[5:7] + event[8:10] + '*.NC')
+    elif source == 'cdaweb':
+        files = glob.glob(basepath + '*' + sat + '*' +
+                          event[0:4] + doy_str + '*.nc')
files.sort()

Copilot AI Apr 28, 2026


In the source == 'cdaweb' branch, the glob uses '*' + sat + '*' where sat is 'F16' (uppercase). In this PR, download_ssusi_files() saves CDAWeb filenames verbatim (likely containing dmspf16 in lowercase), so this glob may miss the downloaded files on case-sensitive filesystems. Consider using sat.lower() (or a case-insensitive pattern) for the CDAWeb glob to align with the downloader’s naming.
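A sketch of a case-insensitive match, assuming the downloaded names contain the satellite tag and a year+DOY token in some letter case. find_ssusi_files and the sample filenames are hypothetical (not actual CDAWeb names). Lower-casing both the basename and the pattern before fnmatch.fnmatchcase gives the same result on every platform, whereas glob.glob is case-sensitive on typical Linux filesystems.

```python
import fnmatch
import os


def find_ssusi_files(paths, sat, date_token):
    """Hypothetical helper: keep paths whose basename matches *sat*date_token*.nc,
    ignoring letter case (so 'F16' matches 'dmspf16' and '.NC' matches '.nc')."""
    pattern = f'*{sat}*{date_token}*.nc'.lower()
    return sorted(
        p for p in paths
        if fnmatch.fnmatchcase(os.path.basename(p).lower(), pattern)
    )


candidates = [  # made-up example names
    '/data/dmspf16_ssusi_edr-aurora_2015076_v01.nc',
    '/data/dmspf17_ssusi_edr-aurora_2015076_v01.nc',
    '/data/DMSPF16_SSUSI_2015077_V01.NC',
]
print(find_ssusi_files(candidates, 'F16', '2015076'))
# ['/data/dmspf16_ssusi_edr-aurora_2015076_v01.nc']
```

In the reader, the candidate list could come from glob.glob(os.path.join(basepath, '*')), with date_token set to event[0:4] + doy_str for the CDAWeb branch.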

event (_type_): _description_
userid (str, optional): Defaults to "lompe".
n_jobs (int, optional): Defaults to -1.
save (bool, optional): Defaults to False.

Copilot AI Apr 28, 2026


Docstring says save defaults to False, but the function signature sets save=True. Update either the default value or the docstring so callers don’t get surprised by a file being written by default.

Suggested change
-save (bool, optional): Defaults to False.
+save (bool, optional): Defaults to True.

import numpy as np
import pandas as pd
import lompe
from lompe.data_tools import datadownloader, dataloader

Copilot AI Apr 28, 2026


The imports from lompe.data_tools import datadownloader, dataloader are currently unused (they are only referenced in commented-out code). Consider removing them to avoid unnecessary import-time side effects and to keep the module clean.

Suggested change
-from lompe.data_tools import datadownloader, dataloader

Comment thread lompe/data_tools/swarm.py
Comment on lines +33 to +43
try:
    with open(os.path.expanduser('~/.viresclient.ini'), 'r') as file:
        lines = file.readlines()
        for line in lines:
            if line.startswith("token ="):
                token_value = line.split('=', 1)[1].strip()
                if token_value:
                    print("Swarm token is present:", token_value)
except:
    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
    return

Copilot AI Apr 28, 2026


The access token is printed to stdout (print("Swarm token is present:", token_value)), which can leak credentials into logs/notebooks. Avoid printing token values; at most log that a token is configured (without the secret) or rely on viresclient’s own validation.

Suggested change
-try:
-    with open(os.path.expanduser('~/.viresclient.ini'), 'r') as file:
-        lines = file.readlines()
-        for line in lines:
-            if line.startswith("token ="):
-                token_value = line.split('=', 1)[1].strip()
-                if token_value:
-                    print("Swarm token is present:", token_value)
-except:
-    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
-    return
+token_found = False
+try:
+    with open(os.path.expanduser('~/.viresclient.ini'), 'r') as file:
+        for line in file:
+            if line.startswith("token ="):
+                token_value = line.split('=', 1)[1].strip()
+                if token_value:
+                    token_found = True
+                    print("Swarm token is configured.")
+                    break
+except OSError:
+    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+    return
+if not token_found:
+    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+    return
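An alternative sketch that parses the file with configparser instead of matching the literal prefix "token =", which would miss a line written as "token=...". It assumes ~/.viresclient.ini is a standard INI file in which server sections may carry a token key; vires_token_configured is a hypothetical name, and viresclient's own validation remains the authoritative check. The helper only reports whether a token exists and never prints its value.

```python
import configparser
import os


def vires_token_configured(path='~/.viresclient.ini'):
    """Hypothetical check: True if any section of the INI file has a non-empty token.

    Assumes standard INI syntax with one section per server, each possibly
    holding a 'token = ...' entry. The token value is never printed.
    """
    parser = configparser.ConfigParser()
    try:
        found = parser.read(os.path.expanduser(path))
    except configparser.Error:
        return False  # malformed file
    if not found:
        return False  # file missing or unreadable (read() skips such files)
    return any(
        parser.get(section, 'token', fallback='').strip()
        for section in parser.sections()
    )
```

In the downloader this could replace the manual loop: if not vires_token_configured(), print the configuration hint and return.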

Comment on lines +52 to +60
filtered_df = file_loc[(file_loc['year'].astype(
str) == year) & (file_loc['Month_Num'] == month)]

# Apply function and add to DataFrame
event_date_str = event.replace('-', '')

# URL of the Zenodo record
url = filtered_df['url'].tolist()[0]


Copilot AI Apr 28, 2026


filtered_df['url'].tolist()[0] will raise IndexError when the CSV has no matching record for the event’s year/month (or if Month_Num mapping yields NaN). Add an explicit empty-check and return/raise a clear error when no Zenodo record is found for the requested month.
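A minimal sketch of the empty-check, assuming file_loc has the 'year', 'Month_Num' and 'url' columns used above (presumably loaded from sdarn_2010_to_2021.csv). zenodo_url_for_month is a hypothetical helper, and month is compared against 'Month_Num' as-is, so its type must match that column.

```python
import pandas as pd


def zenodo_url_for_month(file_loc, year, month):
    """Hypothetical helper: return the Zenodo URL for a year/month or raise clearly."""
    match = file_loc[(file_loc['year'].astype(str) == str(year)) &
                     (file_loc['Month_Num'] == month)]
    if match.empty:
        # Explicit error instead of an IndexError from .tolist()[0]
        raise LookupError(
            f'No SuperDARN Zenodo record listed for year={year}, month={month}')
    return match['url'].iloc[0]


# Made-up example table; the URLs are placeholders
table = pd.DataFrame({'year': [2015, 2015],
                      'Month_Num': [3, 4],
                      'url': ['https://zenodo.org/example/a',
                              'https://zenodo.org/example/b']})
print(zenodo_url_for_month(table, '2015', 4))  # https://zenodo.org/example/b
```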

Comment thread lompe/utils/time.py
Comment on lines 36 to +52
 if day.min() < 1:
     raise ValueError('date2doy: day must not be less than 1')

 # flatten arrays:
 shape = month.shape
 month = month.flatten()
-day = day.flatten()
+day = day.flatten()

 # check if day exceeds days in months
-days_in_month = np.array([0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
-days_in_month_ly = np.array([0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
-if ( (np.any(day[~leapyear] > days_in_month [month[~leapyear]])) |
-     (np.any(day[ leapyear] > days_in_month_ly[month[ leapyear]])) ):
-    raise ValueError('date2doy: day must not exceed number of days in month')
+days_in_month = np.array(
+    [0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
+days_in_month_ly = np.array(
+    [0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
+if ((np.any(day[~leapyear] > days_in_month[month[~leapyear]])) |
+        (np.any(day[leapyear] > days_in_month_ly[month[leapyear]]))):
+    raise ValueError(
+        'date2doy: day must not exceed number of days in month')

Copilot AI Apr 28, 2026


Several ValueError messages in this function use the prefix date2doy: even though the function is date_to_doy. Aligning the prefix with the actual function name will make tracebacks and user reports less confusing.
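A compact sketch of the validation with every message prefixed by the function's actual name, kept in a single variable so the prefixes cannot drift apart again. It is not the lompe implementation: the signature date_to_doy(month, day, leapyear=False) is an assumption, and the day of year is computed from cumulative month lengths using the same days_in_month tables as the diff.

```python
import numpy as np

DAYS_IN_MONTH = np.array([0, 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])
DAYS_IN_MONTH_LY = np.array([0, 31, 29, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])


def date_to_doy(month, day, leapyear=False):
    """Sketch: day of year from month/day arrays (assumed signature)."""
    prefix = 'date_to_doy'  # single source for the error-message prefix
    month = np.atleast_1d(np.asarray(month, dtype=int))
    day = np.atleast_1d(np.asarray(day, dtype=int))
    if month.shape != day.shape:
        raise ValueError(f'{prefix}: month and day must have the same shape')
    leapyear = np.broadcast_to(np.asarray(leapyear, dtype=bool), month.shape)
    if month.min() < 1 or month.max() > 12:
        raise ValueError(f'{prefix}: month must be between 1 and 12')
    if day.min() < 1:
        raise ValueError(f'{prefix}: day must not be less than 1')
    limit = np.where(leapyear, DAYS_IN_MONTH_LY[month], DAYS_IN_MONTH[month])
    if np.any(day > limit):
        raise ValueError(f'{prefix}: day must not exceed number of days in month')
    # cumsum(table)[m - 1] is the number of days before month m
    days_before = np.where(leapyear,
                           np.cumsum(DAYS_IN_MONTH_LY)[month - 1],
                           np.cumsum(DAYS_IN_MONTH)[month - 1])
    return days_before + day


print(date_to_doy([1, 3, 12], [1, 1, 31]).tolist())  # [1, 60, 365]
```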


2 participants