Capture and store software environment for analysis reproducibility#89

Open
Copilot wants to merge 4 commits into v0.8-preview from copilot/improve-environment-reproducibility

Copilot AI commented Jan 23, 2026

Asimov analyses currently run without recording their software environment (Python version, installed packages), making exact reproduction impossible.

Changes

Environment capture module (asimov/environment.py)

  • Detects environment type (conda/venv/system)
  • Captures conda list --export and pip freeze output
  • Validates commands exist before execution via shutil.which()
  • Saves to JSON metadata + text files
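
The detection and capture steps listed above could look roughly like the sketch below. It is an illustration under stated assumptions, not the code in asimov/environment.py: the function names `detect_environment_type` and `run_if_available` are hypothetical, and the detection order (a `conda-meta` directory first, then a `sys.prefix` / `sys.base_prefix` comparison) is inferred from the review comments on this PR.

```python
import os
import shutil
import subprocess
import sys


def detect_environment_type():
    """Return 'conda', 'virtualenv', or 'system' (hypothetical helper)."""
    # A conda environment keeps its package records in <prefix>/conda-meta.
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    # Inside a venv or virtualenv, sys.prefix differs from sys.base_prefix.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return "virtualenv"
    return "system"


def run_if_available(command):
    """Run `command` (a list of strings) and return its stdout, or None.

    The executable is checked with shutil.which() before it is run, so a
    missing tool (for example conda outside a conda setup) yields None.
    """
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=60, check=True
        )
    except (OSError, subprocess.SubprocessError):
        # Covers non-zero exit codes and timeouts as well as launch failures.
        return None
    return result.stdout


if __name__ == "__main__":
    print(detect_environment_type())
    pip_output = run_if_available([sys.executable, "-m", "pip", "freeze"])
    conda_output = run_if_available(["conda", "list", "--export"])
    print(pip_output is not None, conda_output is not None)
```

Calling pip through `sys.executable -m pip` is a design choice of this sketch: it ties the package list to the interpreter that is actually running rather than to whichever `pip` comes first on the PATH.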

Pipeline integration (asimov/pipeline.py)

  • Hooks into Pipeline.before_config() to capture at build time
  • Stores environment files in analysis rundir:
    • environment.json - metadata (Python version, env type, UTC timestamp)
    • environment-pip.txt - pip freeze output
    • environment-conda.txt - conda list (if applicable)
  • Uploads files to results store via _store_environment_files() on completion
  • Graceful degradation on errors or missing directories
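
As a rough sketch of the file layout described above, the helper below writes the three files into a run directory and skips anything it cannot write. The function name `save_environment` and the JSON key names are assumptions made for illustration; only the file names and the use of a UTC timestamp come from this PR.

```python
import json
import os
import platform
from datetime import datetime, timezone


def save_environment(rundir, env_type, pip_output=None, conda_output=None):
    """Write environment files into `rundir`; return {kind: path} for each file written."""
    created = {}
    if not os.path.isdir(rundir):
        # Graceful degradation: a missing rundir is not treated as an error.
        return created

    # Key names here are hypothetical; the PR only lists what is recorded.
    metadata = {
        "python_version": platform.python_version(),
        "environment_type": env_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    outputs = {
        "metadata": ("environment.json", json.dumps(metadata, indent=2)),
        "pip": ("environment-pip.txt", pip_output),
        "conda": ("environment-conda.txt", conda_output),
    }
    for kind, (filename, text) in outputs.items():
        if text is None:
            continue  # e.g. no conda output outside a conda environment
        path = os.path.join(rundir, filename)
        try:
            with open(path, "w", encoding="utf-8") as handle:
                handle.write(text)
        except OSError:
            continue  # skip this file rather than fail the build
        created[kind] = path
    return created
```

Returning a mapping of what was actually written lets the caller upload only the files that exist when the analysis completes.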

Usage
Environment capture is automatic and transparent:

# Build triggers capture
$ asimov manage build

# Environment files created in rundir:
working/GW150914/prod0/environment.json
working/GW150914/prod0/environment-pip.txt

# On completion, files uploaded to results store
# Retrievable via Storage API:
from asimov.storage import Store
store = Store(root=config.get("storage", "directory"))
env_file = store.fetch_file("GW150914", "prod0", "environment.json")

Environment capture is skipped when running in dry-run mode (the --dryrun flag).

Architecture

Modular design supports future extensions:

  • Container-based environment management (Docker/Singularity)
  • Shared environment pools to reduce storage duplication
  • System-level dependency tracking

No breaking changes. Existing analyses continue unchanged.

Original prompt

This section details the original issue you should resolve

<issue_title>Improve software environment reproducibility</issue_title>
<issue_description>At present asimov runs in a local python environment, typically either a python venv or a conda environment.
However, it does not record the software versions or environment details with each analysis, as would be required for full reproducibility.

In order to make asimov able to precisely reproduce an analysis we need two things

  • store the precise software environment (conda list or pip freeze) in the working directory and the results store so that it can be packaged (analysis packaging is for a future issue, but this information will be required)
  • The ability for asimov to create and control environments, allowing it to precisely reproduce an analysis

While allowing asimov to control software environments will be a valuable tool, we will need to think of a sensible way of managing these without using enormous amounts of storage; most, or potentially all analyses will use exactly the same environment, so having one environment per analysis is impractical.

An obvious option would be to run all analyses in versioned containers, and we should add support for this, however the need to build containers for simple workflows is also unpalatable to many users.</issue_description>

Comments on the Issue (you are @copilot in this section)


Copilot AI and others added 3 commits January 23, 2026 22:54
- Created asimov/environment.py module with EnvironmentCapture class
- Detects environment type (conda/venv/system)
- Captures conda list and pip freeze outputs
- Saves environment files to working directory
- Integrated into Pipeline.before_config() to capture at build time
- Added _store_environment_files() to store in results store
- Added comprehensive tests for environment capture

Co-authored-by: transientlunatic <4365778+transientlunatic@users.noreply.github.com>
- Created comprehensive documentation in environment-reproducibility.rst
- Added integration tests for pipeline environment capture
- Updated documentation index to include new environment reproducibility section
- All tests passing (21 total environment tests)

Co-authored-by: transientlunatic <4365778+transientlunatic@users.noreply.github.com>
- Added shutil.which() validation for conda command before execution
- Changed timestamp to use UTC (timezone.utc) for consistency
- Added Windows commands to documentation for cross-platform support
- All tests still passing

Co-authored-by: transientlunatic <4365778+transientlunatic@users.noreply.github.com>
Copilot AI changed the title from "[WIP] Add ability to record software environment details for reproducibility" to "Capture and store software environment for analysis reproducibility" Jan 23, 2026
@transientlunatic marked this pull request as ready for review January 28, 2026 15:14
Copilot AI review requested due to automatic review settings January 28, 2026 15:14
Copilot AI left a comment

Pull request overview

This PR adds automatic software environment capture to asimov to enable reproducible analyses. When an analysis is built, asimov now captures the Python version, installed packages (via pip freeze and conda list), and environment metadata, storing this information both in the working directory and the results store upon completion.

Changes:

  • New asimov/environment.py module providing environment detection and capture functionality
  • Integration into Pipeline.before_config() to capture environment at build time, with storage to results store via Pipeline._store_environment_files() on completion
  • Comprehensive test coverage in test_environment.py and test_pipeline_environment.py
  • Documentation in environment-reproducibility.rst with usage examples and technical details

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

| File | Description |
| --- | --- |
| asimov/environment.py | Core environment capture module with conda/venv/system detection, pip freeze and conda list export functionality |
| asimov/pipeline.py | Integrates environment capture into build pipeline via before_config hook and stores files to results store on completion |
| tests/test_environment.py | Unit tests for EnvironmentCapture class covering detection, capture, and save functionality |
| tests/test_pipeline_environment.py | Integration tests for pipeline environment capture including dryrun mode and error handling |
| docs/source/environment-reproducibility.rst | Complete documentation covering usage, API examples, and technical implementation details |
| docs/source/index.rst | Adds environment-reproducibility documentation to Advanced topics section |

Comment on lines +209 to +213
except (OSError, IOError) as e:
    self.logger.warning(f"Failed to store environment file {filepath}: {e}")
except Exception as e:
    # Handle case where file might already be in store
    self.logger.debug(f"Environment file {filepath} already in store or error: {e}")
Copilot AI Jan 28, 2026

The code catches generic Exception which could mask specific issues like AlreadyPresentException from the Store. Based on the pattern in asimov/pipelines/bayeswave.py (lines 541-544), AlreadyPresentException should be caught separately and logged as a debug message, not a warning, since it indicates the file is already stored rather than an error condition.
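
One way to apply this suggestion is sketched below. `AlreadyPresentException` and `FakeStore` are stand-ins defined locally so the example runs on its own; the real exception and Store live in asimov, and the `add_file(event, production, filepath)` signature used here is an assumption rather than a confirmed API.

```python
import logging

logger = logging.getLogger("environment-store-sketch")


class AlreadyPresentException(Exception):
    """Local stand-in for the exception raised by asimov's Store (assumed)."""


class FakeStore:
    """Hypothetical in-memory store, used only to exercise the handler."""

    def __init__(self):
        self.files = set()

    def add_file(self, event, production, filepath):
        if filepath in self.files:
            raise AlreadyPresentException(filepath)
        self.files.add(filepath)


def store_environment_file(store, event, production, filepath):
    """Return 'stored', 'present', or 'failed' for one environment file."""
    try:
        store.add_file(event, production, filepath)
    except AlreadyPresentException:
        # Not an error: the file was uploaded on an earlier run.
        logger.debug("Environment file %s already in store", filepath)
        return "present"
    except OSError as error:
        # IOError is an alias of OSError in Python 3, so one name is enough.
        logger.warning("Failed to store environment file %s: %s", filepath, error)
        return "failed"
    return "stored"
```

With the two cases split, any other exception propagates instead of being logged at debug level, so unexpected failures stay visible.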

Copilot uses AI. Check for mistakes.
Returns
-------
str
One of: 'conda', 'virtualenv', 'venv', or 'system'
Copilot AI Jan 28, 2026

The docstring states that this method returns one of: 'conda', 'virtualenv', 'venv', or 'system', but the implementation only returns 'conda', 'virtualenv', or 'system'. The return value 'venv' is documented but never returned. Either the implementation should distinguish between venv and virtualenv, or the docstring should be updated to remove 'venv' from the list of possible return values.

Suggested change
- One of: 'conda', 'virtualenv', 'venv', or 'system'
+ One of: 'conda', 'virtualenv', or 'system'
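
If the other option were preferred, and the implementation did distinguish venv from virtualenv, a sketch along the following lines might work. It relies on two assumptions that should be checked before use: recent virtualenv releases write a `virtualenv = <version>` line into `pyvenv.cfg`, and legacy virtualenv releases set `sys.real_prefix`. Both function names are hypothetical and nothing here is taken from the PR's code.

```python
import os
import sys


def classify_from_cfg(cfg_text):
    """Return 'virtualenv' if pyvenv.cfg text has a `virtualenv` key, else 'venv'."""
    for line in cfg_text.splitlines():
        key, separator, _value = line.partition("=")
        if separator and key.strip().lower() == "virtualenv":
            return "virtualenv"
    return "venv"


def detect_virtual_environment():
    """Return 'venv', 'virtualenv', or None when not in a virtual environment."""
    if hasattr(sys, "real_prefix"):
        # Set only by legacy virtualenv releases (assumption noted above).
        return "virtualenv"
    if sys.prefix == getattr(sys, "base_prefix", sys.prefix):
        return None
    cfg_path = os.path.join(sys.prefix, "pyvenv.cfg")
    try:
        with open(cfg_path, encoding="utf-8") as handle:
            return classify_from_cfg(handle.read())
    except OSError:
        # No readable config file: fall back to the standard-library label.
        return "venv"
```

Updating the docstring, as in the suggested change, is the smaller fix; this sketch only shows what the alternative would involve.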

"""Test environment type detection."""
capture = EnvironmentCapture()
# The environment type should be one of the expected values
self.assertIn(capture.env_type, ['conda', 'virtualenv', 'venv', 'system'])
Copilot AI Jan 28, 2026

The test expects the environment type to be one of 'conda', 'virtualenv', 'venv', or 'system', but the implementation only returns 'conda', 'virtualenv', or 'system'. The value 'venv' is never returned by _detect_environment_type, so this test assertion includes an impossible value.

Suggested change
- self.assertIn(capture.env_type, ['conda', 'virtualenv', 'venv', 'system'])
+ self.assertIn(capture.env_type, ['conda', 'virtualenv', 'system'])

with patch.object(sys, 'prefix', '/path/to/venv'):
    with patch.object(sys, 'base_prefix', '/usr'):
        capture = EnvironmentCapture()
        # Should detect as virtualenv (not system)
Copilot AI Jan 28, 2026

The test comment states "Should detect as virtualenv (not system)" but the assertion allows both 'virtualenv' and 'conda' as valid return values. The comment should clarify that conda is also a possible return value due to the conda-meta check in the implementation, or the test should be structured to explicitly test both cases separately.

Suggested change
- # Should detect as virtualenv (not system)
+ # Should detect as virtualenv or conda (not system)
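
The alternative mentioned above, testing the two cases separately, might be structured as in the sketch below. `detect_environment_type` is a local stand-in written for this example (the PR's `_detect_environment_type` may differ), and patching `os.path.isdir` is one assumed way of controlling the conda-meta check.

```python
import os
import sys
import unittest
from unittest.mock import patch


def detect_environment_type():
    """Stand-in for the detection logic (assumed, not the PR's code)."""
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    if sys.prefix != sys.base_prefix:
        return "virtualenv"
    return "system"


class TestEnvironmentDetection(unittest.TestCase):
    def test_virtualenv_without_conda_meta(self):
        # No conda-meta directory: a differing prefix means a virtualenv.
        with patch.object(sys, "prefix", "/path/to/venv"), \
                patch.object(sys, "base_prefix", "/usr"), \
                patch("os.path.isdir", return_value=False):
            self.assertEqual(detect_environment_type(), "virtualenv")

    def test_conda_when_conda_meta_exists(self):
        # A conda-meta directory takes precedence over the prefix comparison.
        with patch.object(sys, "prefix", "/path/to/env"), \
                patch.object(sys, "base_prefix", "/usr"), \
                patch("os.path.isdir", return_value=True):
            self.assertEqual(detect_environment_type(), "conda")
```

Each test then asserts exactly one value, so a regression in either branch fails with a clear name instead of being absorbed by a two-value assertion.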

if pip_output is not None:
    self.assertIsInstance(pip_output, str)
    # Should contain package information
    self.assertTrue(len(pip_output) > 0)
Copilot AI Jan 28, 2026

assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.

Suggested change
- self.assertTrue(len(pip_output) > 0)
+ self.assertGreater(len(pip_output), 0)
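
The difference in failure messages can be seen with a small standalone script. The `MessageDemo` class and `failure_message` helper are written only for this illustration, and an empty string stands in for a failed capture.

```python
import unittest


class MessageDemo(unittest.TestCase):
    """Throwaway test case used only to reach the assertion methods."""

    def runTest(self):
        pass


def failure_message(assertion, *args):
    """Call an assertion and return its failure message, or None if it passes."""
    try:
        assertion(*args)
    except AssertionError as error:
        return str(error)
    return None


demo = MessageDemo()
pip_output = ""  # simulate an empty capture

plain = failure_message(demo.assertTrue, len(pip_output) > 0)
informative = failure_message(demo.assertGreater, len(pip_output), 0)

print(plain)        # False is not true
print(informative)  # 0 not greater than 0
```

The second message reports the actual length, which is usually enough to tell an empty capture apart from other failures without rerunning the test.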

with open(created_files['pip'], 'r') as f:
    content = f.read()
# Should contain some package information
self.assertTrue(len(content) > 0)
Copilot AI Jan 28, 2026

assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.

Suggested change
- self.assertTrue(len(content) > 0)
+ self.assertGreater(len(content), 0)

import subprocess
import tempfile
import shutil
from pathlib import Path
Copilot AI Jan 28, 2026

Import of 'Path' is not used.

Suggested change
- from pathlib import Path

import tempfile
import shutil
from pathlib import Path
from unittest.mock import patch, MagicMock
Copilot AI Jan 28, 2026

Import of 'MagicMock' is not used.

Suggested change
- from unittest.mock import patch, MagicMock
+ from unittest.mock import patch

