das-anomaly is an open-source Python package for unsupervised anomaly detection in distributed acoustic sensing (DAS) datasets using an autoencoder-based deep learning algorithm. It is being developed by Ahmad Tourei under the supervision of Dr. Eileen R. Martin at Colorado School of Mines.
If you use das-anomaly in your work, please cite the following:
Ahmad Tourei. (2025). DASDAE/das-anomaly: latest (Concept). Zenodo. http://doi.org/10.5281/zenodo.12747212
Required:
- Python 3.10, 3.11, or 3.12
- pip
Optional:
- MPI4Py (for parallel processing with MPI)
Dependency notes:
- Installation and loading of Open MPI is required prior to MPI4Py installation. Ensure proper installation using a hello-world example, such as the one below.
- If you'd like to train the model on a GPU, make sure you install TensorFlow with GPU support in your environment. More information can be found here.
- Currently waiting on TensorFlow to support Python 3.13 before we can support it as well.
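A minimal MPI4Py hello-world check (the script name hello_mpi.py is just an example) confirms that all ranks are visible; the same mpirun launch pattern applies to the run_parallel() calls described later in this README:

```python
# hello_mpi.py: minimal check that MPI4Py and Open MPI work together.
# Run with, e.g.: mpirun -n 4 python hello_mpi.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"Hello from rank {comm.Get_rank()} of {comm.Get_size()}")
```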
For clean dependency management, use a virtual environment or a fresh Conda environment. To install the package in editable mode with the required dependencies, run the following after cloning the repository and navigating to the repo directory:
```bash
pip install -e .
```

To install the package in editable mode with all optional dependencies, run:

```bash
pip install -e '.[all]'
```

To uninstall the package, run:

```bash
pip uninstall das_anomaly
```

The package implements a convolutional autoencoder designed to compress and reconstruct power spectral density (PSD) inputs; a rough sketch of the architecture follows the component list below.
- Encoder: A lightweight convolutional neural network reduces the input dimensionality, mapping it into a compact latent space.
- Decoder: A symmetric decoder reconstructs the data by upsampling the latent representation back to the original resolution.
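For illustration, here is a minimal Keras sketch of this encoder-decoder shape. The layer counts, filter sizes, and 256x256x3 input shape are assumptions chosen for the example, not the package's actual architecture (see das_anomaly.train for that):

```python
# Hypothetical convolutional autoencoder for RGB PSD images.
# Layer counts, filter sizes, and the 256x256x3 input shape are assumptions.
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(256, 256, 3))

# Encoder: convolution + pooling blocks map the image into a compact latent space.
x = layers.Conv2D(32, 3, activation="relu", padding="same")(inputs)
x = layers.MaxPooling2D(2)(x)
x = layers.Conv2D(16, 3, activation="relu", padding="same")(x)
encoded = layers.MaxPooling2D(2)(x)

# Decoder: a symmetric upsampling path reconstructs the original resolution.
x = layers.Conv2D(16, 3, activation="relu", padding="same")(encoded)
x = layers.UpSampling2D(2)(x)
x = layers.Conv2D(32, 3, activation="relu", padding="same")(x)
x = layers.UpSampling2D(2)(x)
decoded = layers.Conv2D(3, 3, activation="sigmoid", padding="same")(x)

autoencoder = keras.Model(inputs, decoded)
autoencoder.compile(optimizer="adam", loss="mse")
```

Because the model is trained only on anomaly-free PSDs, images it reconstructs poorly (high MSE) are candidates for anomalies; this is the basis of the thresholds set later in the workflow.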
The overall workflow for using the package is illustrated below:
The main steps are:
- Define constants and create a Spool of data:
Using the config_user script in the das_anomaly directory, define the constants and directory paths for the data, power spectral density (PSD) images, detected anomaly results, etc. (a rough sketch of these entries appears at the end of this step). You can fill in the values and paths as you work through the steps below. Then, using DASCore, create an index file for the spool of data the first time you read the DAS data directory:
```python
import dascore as dc
from das_anomaly.settings import SETTINGS

data_path = SETTINGS.DATA_PATH
# update() creates an index of the contents for fast querying/access;
# there is no need to call update() again in the future.
spool = dc.spool(data_path).update()
```

Note: Creating the spool for the first time may take some time if your directory contains hundreds of gigabytes or terabytes of DAS data. However, DASCore creates an index file, allowing it to quickly query the directory on subsequent accesses.
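For orientation, the config_user constants referenced throughout this README might look roughly like the sketch below. The names follow the constants mentioned in the steps, but every value is a placeholder you must replace for your own dataset:

```python
# Hypothetical excerpt of config_user; values are placeholders, not defaults.
DATA_PATH = "/path/to/das/data"                   # raw DAS data directory
BN_DATA_PATH = "/path/to/background/noise"        # anomaly-free example patches
ANOMALY_IMAGES_PATH = "/path/to/known/anomalies"  # known-anomaly PSD images
RESULTS_PATH = "/path/to/results"                 # detected anomalies are copied here
TIME_WINDOW = 10                                  # seconds averaged per PSD image
CLIP_VALUE_MAX = 1.0e-7                           # upper bound of the PSD colorbar
MSE_THRESHOLD = 0.005                             # reconstruction-error threshold
DENSITY_THRESHOLD = 1000.0                        # latent density-score threshold
```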
- Set a consistent upper bound for PSD amplitude values:
To ensure all PSD images share the same colorbar scale (in RGB), determine an appropriate CLIP_VALUE_MAX in the config_user input file. This can be done using the get_psd_max_clip function, which computes the mean of the maximum amplitudes across TIME_WINDOW-long segments of the data that do not include obvious anomalies (a quick exploratory data analysis is therefore needed here):
```python
from das_anomaly.psd import PSDConfig, PSDGenerator
from das_anomaly.settings import SETTINGS

# path to one or a few background noise patches
bn_data_path = SETTINGS.BN_DATA_PATH

cfg = PSDConfig(data_path=bn_data_path)
gen = PSDGenerator(cfg)
percentile = 90  # data dependent - needs visual inspection
clip_val = gen.run_get_psd_val(percentile=percentile)
print(f"Mean {percentile}-percentile amplitude across all patches: {clip_val:.3e}")
```

- Generate PSD plots:
Use the das_anomaly.psd module to create PSD plots in RGB format and in plain mode (with no axes or colorbar). das_anomaly.psd.PSDGenerator reads DAS data, creates a spool using the DASCore library, applies a detrend function to each patch of the chunked spool, averages the energy over a desired time window, and stacks all channels together to create a spatial PSD image with channels on the X-axis and frequency on the Y-axis. Because each PSD is computed independently, you can use MPI to distribute reading data and plotting PSDs across CPUs (an embarrassingly parallel workload).
```python
from das_anomaly.psd import PSDConfig, PSDGenerator

cfg = PSDConfig()
# serial processing with a single processor:
PSDGenerator(cfg).run()
# parallel processing with multiple processors using MPI:
PSDGenerator(cfg).run_parallel()
```

Note: If you'd like to use PSDs for purposes other than training the model, setting hide_axes=False will plot the PSDs with axes and a colorbar (the default is True):
```python
from das_anomaly.psd import PSDConfig, PSDGenerator

cfg = PSDConfig(hide_axes=False)
# serial processing with a single processor:
PSDGenerator(cfg).run()
# parallel processing with multiple processors using MPI (first, make sure
# you've installed the package with all optional dependencies, as explained above):
PSDGenerator(cfg).run_parallel()
```

- Select and copy known anomaly PSD plots:
From the generated PSD plots, visually identify and then copy examples of known anomalies to the ANOMALY_IMAGES_PATH specified in the config_user input script. These anomalies can include events such as earthquakes from an existing catalog, instrument noise, anthropogenic disturbances, etc. Including these examples helps improve thresholding during the detection process.
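If you keep a list of the anomalous PSD filenames, this copy step can be scripted. Below is a minimal sketch; the psd_dir value and the anomaly_files list are placeholders you supply, and SETTINGS.ANOMALY_IMAGES_PATH is assumed to expose the ANOMALY_IMAGES_PATH constant from config_user:

```python
# Minimal sketch: copy visually identified anomaly PSDs into the
# known-anomaly folder. psd_dir and anomaly_files are placeholders;
# SETTINGS.ANOMALY_IMAGES_PATH is an assumed attribute name.
import shutil
from pathlib import Path

from das_anomaly.settings import SETTINGS

psd_dir = Path("/path/to/psd/images")  # where the PSD plots were written
anomaly_dir = Path(SETTINGS.ANOMALY_IMAGES_PATH)
anomaly_dir.mkdir(parents=True, exist_ok=True)

# filenames flagged during visual inspection (placeholders)
anomaly_files = ["psd_2024-06-01T12-00-00.png", "psd_2024-06-01T12-05-00.png"]

for name in anomaly_files:
    shutil.copy2(psd_dir / name, anomaly_dir / name)
```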
- Train:
The das_anomaly.train module helps with randomly selecting training and test PSD images and with training the model (on CPU or GPU) on anomaly-free PSD images.
```python
from das_anomaly.settings import SETTINGS
from das_anomaly.train import TrainAEConfig, AutoencoderTrainer, TrainSplitConfig, ImageSplitter

# select and copy train and test datasets from the PSD images
cfg = TrainSplitConfig()
ImageSplitter(cfg).run()

# train the autoencoder model
cfg = TrainAEConfig()
AutoencoderTrainer(cfg).run()
```

Note: Since ImageSplitter randomly selects PSD images from the generated plots, you must ensure the training and testing datasets do not include obvious anomalies. If you have a spreadsheet with the timestamps of known anomalies (such as a catalog), use the exclude_known_events_from_training example in the examples directory to exclude them. Otherwise, manually inspect both the training and testing sets to ensure they do not contain apparent anomalies: review their time- and frequency-domain plots, and remove any suspicious samples to maintain the quality of training.
- Test and set thresholds:
Using the validate_and_plot_density and thresholding_f_score Jupyter notebooks in the examples directory, validate the trained model and find appropriate MSE and density-score thresholds for anomaly detection. Make sure to set the DENSITY_THRESHOLD and MSE_THRESHOLD parameters in the config_user script accordingly.
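The notebooks are the authoritative workflow, but the idea behind the MSE threshold can be sketched briefly: compute reconstruction errors on anomaly-free validation PSDs and set the threshold from their distribution. In the sketch below, the model path, image folder, and the 99th-percentile choice are all placeholder assumptions:

```python
# Hypothetical sketch of deriving an MSE threshold from anomaly-free
# validation PSDs. Paths and the percentile are placeholders; the image
# size must match the model's input shape.
import glob

import numpy as np
from tensorflow import keras

model = keras.models.load_model("autoencoder.keras")  # placeholder path
paths = glob.glob("validation_psds/*.png")            # placeholder folder

errors = []
for p in paths:
    img = keras.utils.img_to_array(keras.utils.load_img(p)) / 255.0
    recon = model.predict(img[np.newaxis, ...], verbose=0)[0]
    errors.append(float(np.mean((img - recon) ** 2)))  # per-image MSE

# e.g., flag anything above the 99th percentile of normal reconstruction error
print(f"Candidate MSE_THRESHOLD: {np.percentile(errors, 99):.3e}")
```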
- Run the trained model:
The das_anomaly.detect module uses the trained model to detect anomalies in the PSD images and writes out their information (e.g., timestamps). It also copies the detected anomalies to the RESULTS_PATH. MPI can be used to distribute PSDs across CPUs. Then, using the das_anomaly.count module, count the number of detected anomalies and display their details and file paths.
```python
from das_anomaly.count.counter import CounterConfig, AnomalyCounter
from das_anomaly.detect import DetectConfig, AnomalyDetector

cfg = DetectConfig()
# serial processing with a single processor:
AnomalyDetector(cfg).run()
# parallel processing with multiple processors using MPI:
AnomalyDetector(cfg).run_parallel()

# count the number of detected anomalies
cfg = CounterConfig(keyword="anomaly", classify_types=True, max_gap_seconds=0)
anomalies = AnomalyCounter(cfg).run()
print(anomalies)  # prints info on the number of anomalies and paths to them
```

Still under development. Use with caution.
Ahmad Tourei
Colorado School of Mines
[email protected]



