Capture and store software environment for analysis reproducibility#89
Capture and store software environment for analysis reproducibility#89Copilot wants to merge 4 commits intov0.8-previewfrom
Conversation
- Created asimov/environment.py module with EnvironmentCapture class - Detects environment type (conda/venv/system) - Captures conda list and pip freeze outputs - Saves environment files to working directory - Integrated into Pipeline.before_config() to capture at build time - Added _store_environment_files() to store in results store - Added comprehensive tests for environment capture Co-authored-by: transientlunatic <4365778+transientlunatic@users.noreply.github.com>
- Created comprehensive documentation in environment-reproducibility.rst - Added integration tests for pipeline environment capture - Updated documentation index to include new environment reproducibility section - All tests passing (21 total environment tests) Co-authored-by: transientlunatic <4365778+transientlunatic@users.noreply.github.com>
- Added shutil.which() validation for conda command before execution - Changed timestamp to use UTC (timezone.utc) for consistency - Added Windows commands to documentation for cross-platform support - All tests still passing Co-authored-by: transientlunatic <4365778+transientlunatic@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR adds automatic software environment capture to asimov to enable reproducible analyses. When an analysis is built, asimov now captures the Python version, installed packages (via pip freeze and conda list), and environment metadata, storing this information both in the working directory and the results store upon completion.
Changes:
- New
asimov/environment.pymodule providing environment detection and capture functionality - Integration into
Pipeline.before_config()to capture environment at build time, with storage to results store viaPipeline._store_environment_files()on completion - Comprehensive test coverage in
test_environment.pyandtest_pipeline_environment.py - Documentation in
environment-reproducibility.rstwith usage examples and technical details
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
| asimov/environment.py | Core environment capture module with conda/venv/system detection, pip freeze and conda list export functionality |
| asimov/pipeline.py | Integrates environment capture into build pipeline via before_config hook and stores files to results store on completion |
| tests/test_environment.py | Unit tests for EnvironmentCapture class covering detection, capture, and save functionality |
| tests/test_pipeline_environment.py | Integration tests for pipeline environment capture including dryrun mode and error handling |
| docs/source/environment-reproducibility.rst | Complete documentation covering usage, API examples, and technical implementation details |
| docs/source/index.rst | Adds environment-reproducibility documentation to Advanced topics section |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| except (OSError, IOError) as e: | ||
| self.logger.warning(f"Failed to store environment file {filepath}: {e}") | ||
| except Exception as e: | ||
| # Handle case where file might already be in store | ||
| self.logger.debug(f"Environment file {filepath} already in store or error: {e}") |
There was a problem hiding this comment.
The code catches generic Exception which could mask specific issues like AlreadyPresentException from the Store. Based on the pattern in asimov/pipelines/bayeswave.py (lines 541-544), AlreadyPresentException should be caught separately and logged as a debug message, not a warning, since it indicates the file is already stored rather than an error condition.
| Returns | ||
| ------- | ||
| str | ||
| One of: 'conda', 'virtualenv', 'venv', or 'system' |
There was a problem hiding this comment.
The docstring states that this method returns one of: 'conda', 'virtualenv', 'venv', or 'system', but the implementation only returns 'conda', 'virtualenv', or 'system'. The return value 'venv' is documented but never returned. Either the implementation should distinguish between venv and virtualenv, or the docstring should be updated to remove 'venv' from the list of possible return values.
| One of: 'conda', 'virtualenv', 'venv', or 'system' | |
| One of: 'conda', 'virtualenv', or 'system' |
| """Test environment type detection.""" | ||
| capture = EnvironmentCapture() | ||
| # The environment type should be one of the expected values | ||
| self.assertIn(capture.env_type, ['conda', 'virtualenv', 'venv', 'system']) |
There was a problem hiding this comment.
The test expects the environment type to be one of 'conda', 'virtualenv', 'venv', or 'system', but the implementation only returns 'conda', 'virtualenv', or 'system'. The value 'venv' is never returned by _detect_environment_type, so this test assertion includes an impossible value.
| self.assertIn(capture.env_type, ['conda', 'virtualenv', 'venv', 'system']) | |
| self.assertIn(capture.env_type, ['conda', 'virtualenv', 'system']) |
| with patch.object(sys, 'prefix', '/path/to/venv'): | ||
| with patch.object(sys, 'base_prefix', '/usr'): | ||
| capture = EnvironmentCapture() | ||
| # Should detect as virtualenv (not system) |
There was a problem hiding this comment.
The test comment states "Should detect as virtualenv (not system)" but the assertion allows both 'virtualenv' and 'conda' as valid return values. The comment should clarify that conda is also a possible return value due to the conda-meta check in the implementation, or the test should be structured to explicitly test both cases separately.
| # Should detect as virtualenv (not system) | |
| # Should detect as virtualenv or conda (not system) |
| if pip_output is not None: | ||
| self.assertIsInstance(pip_output, str) | ||
| # Should contain package information | ||
| self.assertTrue(len(pip_output) > 0) |
There was a problem hiding this comment.
assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.
| self.assertTrue(len(pip_output) > 0) | |
| self.assertGreater(len(pip_output), 0) |
| with open(created_files['pip'], 'r') as f: | ||
| content = f.read() | ||
| # Should contain some package information | ||
| self.assertTrue(len(content) > 0) |
There was a problem hiding this comment.
assertTrue(a > b) cannot provide an informative message. Using assertGreater(a, b) instead will give more informative messages.
| self.assertTrue(len(content) > 0) | |
| self.assertGreater(len(content), 0) |
| import subprocess | ||
| import tempfile | ||
| import shutil | ||
| from pathlib import Path |
There was a problem hiding this comment.
Import of 'Path' is not used.
| from pathlib import Path |
| import tempfile | ||
| import shutil | ||
| from pathlib import Path | ||
| from unittest.mock import patch, MagicMock |
There was a problem hiding this comment.
Import of 'MagicMock' is not used.
| from unittest.mock import patch, MagicMock | |
| from unittest.mock import patch |
Asimov analyses currently run without recording their software environment (Python version, installed packages), making exact reproduction impossible.
Changes
Environment capture module (
asimov/environment.py)conda list --exportandpip freezeoutputshutil.which()Pipeline integration (
asimov/pipeline.py)Pipeline.before_config()to capture at build timerundir:environment.json- metadata (Python version, env type, UTC timestamp)environment-pip.txt- pip freeze outputenvironment-conda.txt- conda list (if applicable)_store_environment_files()on completionUsage
Environment capture is automatic and transparent:
Skip capture in dry-run mode via
--dryrunflag.Architecture
Modular design supports future extensions:
No breaking changes. Existing analyses continue unchanged.
Original prompt
✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.