Capture and store software environment for analysis reproducibility by Copilot · Pull Request #89 · etive-io/asimov

Copilot · 2026-01-23T22:49:16Z

Asimov analyses currently run without recording their software environment (Python version, installed packages), making exact reproduction impossible.

Changes

Environment capture module (asimov/environment.py)

Detects environment type (conda/venv/system)
Captures conda list --export and pip freeze output
Validates commands exist before execution via shutil.which()
Saves to JSON metadata + text files
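
The capture steps listed above might look roughly like the following. This is a minimal sketch, not the code in asimov/environment.py: the names `detect_environment_type` and `capture_command` are hypothetical, and the detection heuristics (a `conda-meta` directory under `sys.prefix`, or `sys.prefix` differing from `sys.base_prefix`) are assumptions about one reasonable approach.

```python
import os
import shutil
import subprocess
import sys


def detect_environment_type():
    """Classify the running interpreter as 'conda', 'virtualenv', or 'system'.

    Hypothetical helper: the heuristics here are assumptions, not asimov's.
    """
    # Conda environments keep their package records in <prefix>/conda-meta.
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    # Inside a virtual environment sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    if sys.prefix != getattr(sys, "base_prefix", sys.prefix):
        return "virtualenv"
    return "system"


def capture_command(command, timeout=60):
    """Run `command` and return its stdout, or None if it cannot be run."""
    # Validate that the executable exists before trying to run it.
    if shutil.which(command[0]) is None:
        return None
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout, check=True
        )
    except (OSError, subprocess.SubprocessError):
        return None
    return result.stdout


env_type = detect_environment_type()
# Use the current interpreter's pip so the right environment is listed.
pip_packages = capture_command([sys.executable, "-m", "pip", "freeze"])
conda_packages = (
    capture_command(["conda", "list", "--export"]) if env_type == "conda" else None
)
```

Returning None instead of raising keeps a missing `conda` or `pip` from interrupting a build, which matches the graceful degradation described in this PR.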

Pipeline integration (asimov/pipeline.py)

Hooks into Pipeline.before_config() to capture at build time
Stores environment files in analysis rundir:
- environment.json - metadata (Python version, env type, UTC timestamp)
- environment-pip.txt - pip freeze output
- environment-conda.txt - conda list (if applicable)
Uploads files to results store via _store_environment_files() on completion
Graceful degradation on errors or missing directories
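
As an illustration of the file layout above, the sketch below writes the metadata and package listings into a run directory. The function name `save_environment` and the JSON key names are assumptions made for this example; only the three file names come from the description.

```python
import json
import os
import platform
import tempfile
from datetime import datetime, timezone


def save_environment(rundir, env_type, pip_output=None, conda_output=None):
    """Write environment files into `rundir`; return a dict of created paths.

    Hypothetical helper: the JSON key names are illustrative assumptions.
    """
    created = {}
    # Graceful degradation: a missing run directory is skipped, not raised.
    if not os.path.isdir(rundir):
        return created

    metadata = {
        "python_version": platform.python_version(),
        "environment_type": env_type,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    created["metadata"] = os.path.join(rundir, "environment.json")
    with open(created["metadata"], "w") as handle:
        json.dump(metadata, handle, indent=2)

    # Only write package listings that were actually captured.
    for key, text in (("pip", pip_output), ("conda", conda_output)):
        if text is not None:
            created[key] = os.path.join(rundir, f"environment-{key}.txt")
            with open(created[key], "w") as handle:
                handle.write(text)
    return created


with tempfile.TemporaryDirectory() as rundir:
    files = save_environment(rundir, "virtualenv", pip_output="numpy==1.26.4\n")
    print(sorted(files))  # ['metadata', 'pip']
```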

Usage
Environment capture is automatic and transparent:

# Build triggers capture
$ asimov manage build

# Environment files created in rundir:
working/GW150914/prod0/environment.json
working/GW150914/prod0/environment-pip.txt

# On completion, files uploaded to results store
# Retrievable via Storage API:
from asimov.storage import Store
store = Store(root=config.get("storage", "directory"))
env_file = store.fetch_file("GW150914", "prod0", "environment.json")

Skip capture in dry-run mode via --dryrun flag.

Architecture

Modular design supports future extensions:

Container-based environment management (Docker/Singularity)
Shared environment pools to reduce storage duplication
System-level dependency tracking

No breaking changes. Existing analyses continue unchanged.

Original prompt

This section details the original issue you should resolve

<issue_title>Improve software environment reproducibility</issue_title>
<issue_description>At present asimov runs in a local python environment, typically either a python venv or a conda environment.
However, it does not record the software versions or environment details with each analysis, as would be required for full reproducibility.

In order to make asimov able to precisely reproduce an analysis we need two things:

1. Store the precise software environment (conda list or pip freeze) in the working directory and the results store so that it can be packaged (analysis packaging is for a future issue, but this information will be required).

2. The ability for asimov to create and control environments, allowing it to precisely reproduce an analysis.

While allowing asimov to control software environments will be a valuable tool, we will need to think of a sensible way of managing these without using enormous amounts of storage; most, or potentially all analyses will use exactly the same environment, so having one environment per analysis is impractical.

An obvious option would be to run all analyses in versioned containers, and we should add support for this, however the need to build containers for simple workflows is also unpalatable to many users.</issue_description>

Comments on the Issue (you are @copilot in this section)

Fixes Improve software environment reproducibility #88

- Created asimov/environment.py module with EnvironmentCapture class
- Detects environment type (conda/venv/system)
- Captures conda list and pip freeze outputs
- Saves environment files to working directory
- Integrated into Pipeline.before_config() to capture at build time
- Added _store_environment_files() to store in results store
- Added comprehensive tests for environment capture

Co-authored-by: transientlunatic <4365778+transientlunatic@users.noreply.github.com>

- Created comprehensive documentation in environment-reproducibility.rst
- Added integration tests for pipeline environment capture
- Updated documentation index to include new environment reproducibility section
- All tests passing (21 total environment tests)

Co-authored-by: transientlunatic <4365778+transientlunatic@users.noreply.github.com>

- Added shutil.which() validation for conda command before execution
- Changed timestamp to use UTC (timezone.utc) for consistency
- Added Windows commands to documentation for cross-platform support
- All tests still passing

Co-authored-by: transientlunatic <4365778+transientlunatic@users.noreply.github.com>

Copilot

Pull request overview

This PR adds automatic software environment capture to asimov to enable reproducible analyses. When an analysis is built, asimov now captures the Python version, installed packages (via pip freeze and conda list), and environment metadata, storing this information both in the working directory and the results store upon completion.

Changes:

New asimov/environment.py module providing environment detection and capture functionality
Integration into Pipeline.before_config() to capture environment at build time, with storage to results store via Pipeline._store_environment_files() on completion
Comprehensive test coverage in test_environment.py and test_pipeline_environment.py
Documentation in environment-reproducibility.rst with usage examples and technical details

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.

File	Description
asimov/environment.py	Core environment capture module with conda/venv/system detection, pip freeze and conda list export functionality
asimov/pipeline.py	Integrates environment capture into build pipeline via before_config hook and stores files to results store on completion
tests/test_environment.py	Unit tests for EnvironmentCapture class covering detection, capture, and save functionality
tests/test_pipeline_environment.py	Integration tests for pipeline environment capture including dryrun mode and error handling
docs/source/environment-reproducibility.rst	Complete documentation covering usage, API examples, and technical implementation details
docs/source/index.rst	Adds environment-reproducibility documentation to Advanced topics section

Copilot · 2026-01-28T15:20:54Z

asimov/pipeline.py

+                except (OSError, IOError) as e:
+                    self.logger.warning(f"Failed to store environment file {filepath}: {e}")
+                except Exception as e:
+                    # Handle case where file might already be in store
+                    self.logger.debug(f"Environment file {filepath} already in store or error: {e}")


The code catches generic Exception which could mask specific issues like AlreadyPresentException from the Store. Based on the pattern in asimov/pipelines/bayeswave.py (lines 541-544), AlreadyPresentException should be caught separately and logged as a debug message, not a warning, since it indicates the file is already stored rather than an error condition.
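
The handler ordering this comment asks for could be sketched as follows. `AlreadyPresentException` is the name given in the comment, but the class defined here is a stand-in, and `FakeStore`, its `add_file` method, and `store_environment_file` are hypothetical names used only to make the example self-contained.

```python
import logging

logger = logging.getLogger("environment-sketch")


class AlreadyPresentException(Exception):
    """Stand-in for the Store exception named in the review comment."""


class FakeStore:
    """Hypothetical in-memory store used only to exercise the handler."""

    def __init__(self):
        self.files = set()

    def add_file(self, event, production, filepath):
        if filepath in self.files:
            raise AlreadyPresentException(filepath)
        self.files.add(filepath)


def store_environment_file(store, event, production, filepath):
    """Store one file; return 'stored', 'present', or 'failed'."""
    try:
        store.add_file(event, production, filepath)
    except AlreadyPresentException:
        # Not an error: the file is already in the store.
        logger.debug("Environment file %s already in store", filepath)
        return "present"
    except OSError as error:
        # A real failure worth surfacing to the user.
        logger.warning("Failed to store environment file %s: %s", filepath, error)
        return "failed"
    return "stored"


store = FakeStore()
print(store_environment_file(store, "GW150914", "prod0", "environment.json"))  # stored
print(store_environment_file(store, "GW150914", "prod0", "environment.json"))  # present
```

In Python 3, IOError is an alias of OSError, so catching OSError alone covers both names used in the original handler.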

Copilot · 2026-01-28T15:20:55Z

asimov/environment.py

+        Returns
+        -------
+        str
+            One of: 'conda', 'virtualenv', 'venv', or 'system'


The docstring states that this method returns one of: 'conda', 'virtualenv', 'venv', or 'system', but the implementation only returns 'conda', 'virtualenv', or 'system'. The return value 'venv' is documented but never returned. Either the implementation should distinguish between venv and virtualenv, or the docstring should be updated to remove 'venv' from the list of possible return values.

Suggested change

One of: 'conda', 'virtualenv', 'venv', or 'system'

One of: 'conda', 'virtualenv', or 'system'

Copilot · 2026-01-28T15:20:55Z

tests/test_environment.py

+        """Test environment type detection."""
+        capture = EnvironmentCapture()
+        # The environment type should be one of the expected values
+        self.assertIn(capture.env_type, ['conda', 'virtualenv', 'venv', 'system'])


The test expects the environment type to be one of 'conda', 'virtualenv', 'venv', or 'system', but the implementation only returns 'conda', 'virtualenv', or 'system'. The value 'venv' is never returned by _detect_environment_type, so this test assertion includes an impossible value.

Suggested change

self.assertIn(capture.env_type, ['conda', 'virtualenv', 'venv', 'system'])

self.assertIn(capture.env_type, ['conda', 'virtualenv', 'system'])

Copilot · 2026-01-28T15:20:55Z

tests/test_environment.py

+        with patch.object(sys, 'prefix', '/path/to/venv'):
+            with patch.object(sys, 'base_prefix', '/usr'):
+                capture = EnvironmentCapture()
+                # Should detect as virtualenv (not system)


The test comment states "Should detect as virtualenv (not system)" but the assertion allows both 'virtualenv' and 'conda' as valid return values. The comment should clarify that conda is also a possible return value due to the conda-meta check in the implementation, or the test should be structured to explicitly test both cases separately.

Suggested change

# Should detect as virtualenv (not system)

# Should detect as virtualenv or conda (not system)
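
The alternative mentioned in this comment, testing the two cases separately, might look like the sketch below. `detect_environment_type` is a hypothetical stand-in for the detection logic, written under the assumption that a `conda-meta` directory under `sys.prefix` takes priority over the prefix comparison; it is not the asimov implementation.

```python
import os
import sys
import tempfile
import unittest
from unittest.mock import patch


def detect_environment_type():
    """Hypothetical stand-in for the detection logic under test."""
    if os.path.isdir(os.path.join(sys.prefix, "conda-meta")):
        return "conda"
    if sys.prefix != sys.base_prefix:
        return "virtualenv"
    return "system"


class TestDetection(unittest.TestCase):
    def test_virtualenv_without_conda_meta(self):
        # An empty temporary prefix has no conda-meta directory.
        with tempfile.TemporaryDirectory() as prefix:
            with patch.object(sys, "prefix", prefix), \
                    patch.object(sys, "base_prefix", "/usr"):
                self.assertEqual(detect_environment_type(), "virtualenv")

    def test_conda_with_conda_meta(self):
        # Creating conda-meta under the prefix should flip the result.
        with tempfile.TemporaryDirectory() as prefix:
            os.mkdir(os.path.join(prefix, "conda-meta"))
            with patch.object(sys, "prefix", prefix), \
                    patch.object(sys, "base_prefix", "/usr"):
                self.assertEqual(detect_environment_type(), "conda")


suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestDetection)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Using a temporary directory as the patched prefix makes each test independent of whether the machine running it happens to be a conda installation.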

Copilot · 2026-01-28T15:20:56Z

tests/test_environment.py

+        if pip_output is not None:
+            self.assertIsInstance(pip_output, str)
+            # Should contain package information
+            self.assertTrue(len(pip_output) > 0)


assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.

Suggested change

self.assertTrue(len(pip_output) > 0)

self.assertGreater(len(pip_output), 0)
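
The difference in failure messages can be seen with a small standalone script. The `Demo` class and the empty `pip_output` value are made up for this illustration.

```python
import unittest


class Demo(unittest.TestCase):
    """Bare TestCase used only to call the assertion methods directly."""

    def runTest(self):
        pass


case = Demo()
pip_output = ""  # Simulate an empty capture so both assertions fail.
messages = []

for check in (
    lambda: case.assertTrue(len(pip_output) > 0),
    lambda: case.assertGreater(len(pip_output), 0),
):
    try:
        check()
    except AssertionError as error:
        messages.append(str(error))

print(messages[0])  # False is not true
print(messages[1])  # 0 not greater than 0
```

The second message reports the actual length, which makes a failing test easier to diagnose.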

Copilot · 2026-01-28T15:20:56Z

tests/test_environment.py

+            with open(created_files['pip'], 'r') as f:
+                content = f.read()
+                # Should contain some package information
+                self.assertTrue(len(content) > 0)


assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.

Suggested change

self.assertTrue(len(content) > 0)

self.assertGreater(len(content), 0)

Copilot · 2026-01-28T15:20:56Z

tests/test_environment.py

+import subprocess
+import tempfile
+import shutil
+from pathlib import Path


Import of 'Path' is not used.

Suggested change

from pathlib import Path

Copilot · 2026-01-28T15:20:57Z

tests/test_environment.py

+import tempfile
+import shutil
+from pathlib import Path
+from unittest.mock import patch, MagicMock


Import of 'MagicMock' is not used.

Suggested change

from unittest.mock import patch, MagicMock

from unittest.mock import patch

Initial plan

eb678bc

Copilot AI assigned Copilot and transientlunatic Jan 23, 2026

Copilot started work on behalf of transientlunatic January 23, 2026 22:49

Copilot AI and others added 3 commits January 23, 2026 22:54

Copilot AI changed the title ~~[WIP] Add ability to record software environment details for reproducibility~~ Capture and store software environment for analysis reproducibility Jan 23, 2026

Copilot finished work on behalf of transientlunatic January 23, 2026 23:02

Copilot AI requested a review from transientlunatic January 23, 2026 23:02

transientlunatic approved these changes Jan 23, 2026

transientlunatic marked this pull request as ready for review January 28, 2026 15:14

Copilot AI review requested due to automatic review settings January 28, 2026 15:14

Copilot started reviewing on behalf of transientlunatic January 28, 2026 15:14

Copilot AI reviewed Jan 28, 2026

Copilot wants to merge 4 commits into v0.8-preview from copilot/improve-environment-reproducibility


3 participants
