
Update data tools #61

Merged
klaundal merged 5 commits into klaundal:main from FasilGibdaw:update-data-folders
Apr 28, 2026

Conversation

@FasilGibdaw
Contributor

Cleaned up data_tools and added a case example showing how to use the data tools.

Copilot AI review requested due to automatic review settings April 28, 2026 18:33

Copilot AI left a comment


Pull request overview

This PR expands and cleans up data_tools by adding new dataset download helpers (Swarm, SuperMAG, SuperDARN, CHAMP, AMPERE, DMSP SSUSI/SSIES), a convenience loader for event-based workflows, and small SSUSI-related updates in the existing loader code.

Changes:

  • Added new downloader modules for multiple data sources (SuperMAG, SuperDARN, Swarm, CHAMP, AMPERE, DMSP SSUSI/SSIES) plus an event-oriented “prepare + subset” helper.
  • Updated dataloader.read_ssusi() to support selecting SSUSI source and adjusted SSIES/Iridium handling.
  • Added/updated supporting assets (station list CSV, SuperDARN Zenodo catalog CSV, data_tools README notes).

Reviewed changes

Copilot reviewed 13 out of 14 changed files in this pull request and generated 18 comments.

Show a summary per file
File Description
lompe/utils/time.py Formatting changes + new date2doy() helper for YYYY-MM-DD strings.
lompe/data_tools/swarm.py New Swarm downloader using viresclient and HDF output.
lompe/data_tools/supermag.py New SuperMAG downloader with parallel station downloads and HDF output.
lompe/data_tools/superdarn.py New SuperDARN downloader/parser driven by a Zenodo URL catalog and HDF output.
lompe/data_tools/get_lompe_data.py New “prepare event data once” + “extract subsets into lompe.Data” helper and example main.
lompe/data_tools/dmsp_ssusi.py New SSUSI downloader (CDAWeb) + processing to NetCDF.
lompe/data_tools/dmsp_ssies.py New SSIES downloader via Madrigal FTP-like pages + HDF output.
lompe/data_tools/dataloader.py SSUSI source option + SSIES file filtering changes + Iridium glob extension change + formatting tweaks.
lompe/data_tools/champ.py New CHAMP downloader/processor producing HDF.
lompe/data_tools/ampere.py New AMPERE raw download + conversion via existing read_iridium().
lompe/data_tools/README Updated dataset list and SSUSI source note.
lompe/data/supermag_stations.csv Added SuperMAG station metadata CSV.
lompe/data/sdarn_2010_to_2021.csv Added SuperDARN Zenodo catalog CSV.
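As a rough illustration of the new date2doy() helper listed above for lompe/utils/time.py, the conversion amounts to the following sketch (the exact signature and return type in the PR are assumptions here):

```python
from datetime import datetime

def date2doy(date_str):
    """Convert a 'YYYY-MM-DD' string to day of year (1-366).

    Hypothetical sketch of the helper added in lompe/utils/time.py;
    the real implementation may differ in signature and validation.
    """
    return datetime.strptime(date_str, '%Y-%m-%d').timetuple().tm_yday
```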


Comment on lines +58 to +60
# URL of the Zenodo record
url = filtered_df['url'].tolist()[0]


Copilot AI Apr 28, 2026


If there is no matching (year, month) entry in the CSV, filtered_df will be empty and tolist()[0] will raise IndexError. Add an explicit empty check and return/raise a clear “event not covered” error.

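No suggestion was attached to this comment; a minimal sketch of the guard, using a plain list of dicts in place of the actual pandas catalog (the 'year', 'month', and 'url' keys are assumptions based on the snippet above):

```python
def lookup_zenodo_url(catalog, year, month):
    """Return the Zenodo URL covering (year, month), or raise a clear error.

    `catalog` stands in for the rows of sdarn_2010_to_2021.csv; the
    key names are assumed from the snippet above.
    """
    matches = [row for row in catalog
               if row['year'] == year and row['month'] == month]
    if not matches:
        # Explicit "event not covered" error instead of a bare IndexError
        raise ValueError(f"No SuperDARN Zenodo record covers {year}-{month:02d}")
    return matches[0]['url']
```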
@@ -0,0 +1,156 @@
from joblib import Parallel, delayed

Copilot AI Apr 28, 2026


joblib is imported at module import time, but it’s not listed in pyproject.toml dependencies. This will cause import lompe.data_tools.supermag to fail in a standard install. Consider moving the import inside download_supermag() with a helpful error message, and/or adding joblib as an optional dependency.

Suggested change
-from joblib import Parallel, delayed
+try:
+    from joblib import Parallel, delayed
+except ImportError:
+    def _missing_joblib(*args, **kwargs):
+        raise ImportError(
+            "joblib is required to use download_supermag() with parallel execution. "
+            "Install joblib to enable this functionality."
+        )
+    Parallel = _missing_joblib
+    delayed = _missing_joblib

from .champ import download_champ
from .superdarn import download_sdarn
from .swarm import download_swarm
from .dmsp_ssies import download_dmsp_ssies

Copilot AI Apr 28, 2026


download_dmsp_ssies is imported but never used in this module. Consider removing it until it’s actually wired into prepare_event_data() to avoid unused-import warnings.

Suggested change
-from .dmsp_ssies import download_dmsp_ssies

Comment thread lompe/data_tools/swarm.py
Comment on lines +36 to +40
for line in lines:
    if line.startswith("token ="):
        token_value = line.split('=', 1)[1].strip()
        if token_value:
            print("Swarm token is present:", token_value)

Copilot AI Apr 28, 2026


The token-check loop only prints when a non-empty token is found; if the file exists but has no token = ... entry (or it’s empty), the function will still proceed to make the request and fail later. Consider explicitly detecting the “no token found/empty token” case and returning with a clear message.

Suggested change
-for line in lines:
-    if line.startswith("token ="):
-        token_value = line.split('=', 1)[1].strip()
-        if token_value:
-            print("Swarm token is present:", token_value)
+token_value = None
+for line in lines:
+    if line.startswith("token ="):
+        token_value = line.split('=', 1)[1].strip()
+        break
+if not token_value:
+    print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+    return
+print("Swarm token is present:", token_value)

Comment on lines +6 to +10
import requests
import pandas as pd
import xarray as xr
from bs4 import BeautifulSoup


Copilot AI Apr 28, 2026


This module imports third-party packages (requests, bs4) at import time, but they are not declared in pyproject.toml dependencies/optionals. That means import lompe.data_tools.superdarn can fail in a standard install even if the caller never uses these functions. Prefer lazy-importing inside the download functions (and raising a clear install hint) and/or declare these as optional dependencies.

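One way to realize the lazy-import pattern suggested above is a small helper that defers the import until first use and raises an actionable error (hedged sketch; the helper name and usage are illustrative, not part of lompe):

```python
import importlib

def require_optional(module_name, pip_hint):
    """Import an optional dependency on demand, with an install hint on failure.

    Illustrative helper, not part of lompe; module and package names
    below are examples only.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        raise ImportError(
            f"{module_name} is required for this function; "
            f"install it with: pip install {pip_hint}"
        ) from e

# Inside download_sdarn() one would then write, e.g.:
#     requests = require_optional('requests', 'requests')
#     bs4 = require_optional('bs4', 'beautifulsoup4')
```

With this pattern, `import lompe.data_tools.superdarn` succeeds in a standard install, and the missing dependency is only reported when a download function is actually called.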
import numpy as np
import pandas as pd
import lompe
from lompe.data_tools import datadownloader, dataloader

Copilot AI Apr 28, 2026


datadownloader and dataloader are imported but not used in this module (only referenced in commented code). Removing unused imports avoids linter warnings and reduces import-time work.

Suggested change
-from lompe.data_tools import datadownloader, dataloader

    try:
        files = glob.glob(
            f"{temp_sdarn_path}*{event.replace('-', '')}*.nc")
        files.sort()

Copilot AI Apr 28, 2026


If no files are downloaded/found, files will be empty and the downstream processing will fail (e.g., missing 'time' column / ddd.index[0]). Add a guard for if not files: (and ideally clean up the temp directory) before continuing.

Suggested change
-        files.sort()
+        files.sort()
+        if not files:
+            print(f"No SuperDARN files found for event {event}")
+            shutil.rmtree(temp_sdarn_path, ignore_errors=True)
+            return None

Comment thread lompe/data_tools/swarm.py
Comment on lines +36 to +43
        for line in lines:
            if line.startswith("token ="):
                token_value = line.split('=', 1)[1].strip()
                if token_value:
                    print("Swarm token is present:", token_value)
    except:
        print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
        return

Copilot AI Apr 28, 2026


Bare except: will swallow unrelated errors (e.g., permission issues, parse errors) and makes debugging difficult. Catch specific exceptions (e.g., FileNotFoundError, PermissionError) and handle the “token missing” case explicitly.

Suggested change
-        for line in lines:
-            if line.startswith("token ="):
-                token_value = line.split('=', 1)[1].strip()
-                if token_value:
-                    print("Swarm token is present:", token_value)
-    except:
-        print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
-        return
+    except FileNotFoundError:
+        print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+        return
+    except PermissionError:
+        print("Unable to read Swarm token configuration due to file permissions. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+        return
+    except OSError as e:
+        print(f"Unable to read Swarm token configuration: {e}")
+        return
+    token_value = None
+    for line in lines:
+        if line.startswith("token ="):
+            token_value = line.split('=', 1)[1].strip()
+            break
+    if not token_value:
+        print("Token is missing or empty. \nPlease visit https://viresclient.readthedocs.io/en/latest/config_details.html to configure it")
+        return
+    print("Swarm token is present:", token_value)

    fn = basepath + event.replace('-', '') + 'Amp_invert.ncdf'
    if not os.path.isfile(fn):
-        files = glob.glob(basepath + '*' + event.replace('-', '') + '*.ncdf')
+        files = glob.glob(basepath + '*' + event.replace('-', '') + '*.nc')

Copilot AI Apr 28, 2026


The fallback search now only globs for *.nc, but AMPERE fitted files are often *.ncdf (and the default filename is still Amp_invert.ncdf). Consider globbing both *.nc and *.ncdf to avoid missing valid files.

Suggested change
-        files = glob.glob(basepath + '*' + event.replace('-', '') + '*.nc')
+        files = (
+            glob.glob(basepath + '*' + event.replace('-', '') + '*.nc') +
+            glob.glob(basepath + '*' + event.replace('-', '') + '*.ncdf')
+        )

            (irid.lon.values, irid.lat.values, irid.r.values))
    else:
        irid_B = np.empty((3, 0))
        irid_coords = np.empty((2, 0))

Copilot AI Apr 28, 2026


When irid is empty, irid_coords is created with shape (2, 0), but space_mag_fac expects 3 coordinate rows (lon, lat, r). This will raise ShapeError in lompe.Data; make the empty-case coords (3, 0).

Suggested change
-        irid_coords = np.empty((2, 0))
+        irid_coords = np.empty((3, 0))

@klaundal klaundal merged commit 435bf11 into klaundal:main Apr 28, 2026
5 of 6 checks passed
3 participants