feat: add S3 results tagging workflow for orphaned directories cleanup #173
base: main
Conversation
Add automated GitHub Action workflow to identify and tag orphaned S3 results directories that cause "no AWS results found" issues on the website.

The workflow:
- Fetches current pipeline releases from pipelines.json
- Scans the S3 bucket for all results-* directories
- Tags current releases with metadata (pipeline, release, sha, status)
- Tags orphaned directories with deleteme=true for future cleanup
- Runs weekly by default, with manual trigger options
- Includes safety features: dry-run mode and a deletion toggle

This addresses the hash mismatch issue where re-tagged releases leave unreachable results-* directories in S3, preventing the website from displaying AWS results tabs properly.
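For orientation, a minimal sketch of what the tagging pass could look like. It assumes a bucket layout of `<pipeline>/results-<sha>/…`; the bucket name, tag keys, and function names here are illustrative and may differ from the actual s3_results_tagger.py. Since S3 has no real directories, a "directory" is tagged by applying the tag set to every object under its prefix.

```python
import boto3

BUCKET = "nf-core-awsmegatests"  # assumed bucket name, not confirmed by this PR

def tag_prefix(s3, bucket: str, prefix: str, tags: dict) -> None:
    """Apply the same tag set to every object under an S3 prefix."""
    tag_set = [{"Key": k, "Value": v} for k, v in tags.items()]
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.put_object_tagging(
                Bucket=bucket,
                Key=obj["Key"],
                Tagging={"TagSet": tag_set},
            )

def tag_results_dirs(current_shas: dict[str, set[str]], dry_run: bool = True) -> None:
    """current_shas maps pipeline name -> set of SHAs belonging to current releases."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for pipeline, shas in current_shas.items():
        # List the results-* "directories" (common prefixes) for this pipeline
        pages = paginator.paginate(Bucket=BUCKET, Prefix=f"{pipeline}/results-", Delimiter="/")
        for page in pages:
            for cp in page.get("CommonPrefixes", []):
                prefix = cp["Prefix"]  # e.g. "rnaseq/results-abc123/"
                sha = prefix.rstrip("/").split("results-")[-1]
                orphaned = sha not in shas
                tags = {"pipeline": pipeline, "sha": sha,
                        "status": "orphaned" if orphaned else "current"}
                if orphaned:
                    tags["deleteme"] = "true"
                if dry_run:
                    print(f"[dry-run] would tag {prefix} with {tags}")
                else:
                    tag_prefix(s3, BUCKET, prefix, tags)
```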
Add pull request trigger to run the S3 results tagging workflow in safe dry-run mode for testing changes to the workflow or script.

Safety features for PR runs:
- Forces dry-run mode regardless of inputs
- Disables deletion functionality completely
- Only triggers when workflow or script files are modified
- Clear logging about safety mode activation

This allows testing the tagging logic and workflow changes safely before they're merged and run in production.
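The commit enforces this at the workflow level; a script-side guard with the same effect could look roughly like the sketch below. Only `GITHUB_EVENT_NAME` is a standard GitHub Actions variable, the function name is illustrative.

```python
import os

def resolve_safety_flags(requested_dry_run: bool, requested_delete: bool) -> tuple[bool, bool]:
    """Force dry-run and disable deletion when the workflow runs from a pull request."""
    if os.environ.get("GITHUB_EVENT_NAME") == "pull_request":
        print("PR safety mode: forcing dry-run and disabling deletion")
        return True, False
    return requested_dry_run, requested_delete
```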
…tching

The S3 results tagging script was failing with a 404 error when trying to fetch pipeline data from https://nf-core.s3.amazonaws.com/pipelines.json, which doesn't exist.

Changes:
- Replace S3 URL with GitHub API calls to fetch nf-core pipeline releases
- Add GitHub token support for higher API rate limits
- Update script to use GitHub API endpoints for repository and release data
- Maintain backward compatibility with custom JSON endpoints
- Update GitHub workflow to use GitHub token instead of S3 URL
- Add uv script dependencies for boto3 and requests

This resolves "no AWS results found" issues on the nf-core website by ensuring the tagging script can properly identify current vs orphaned results directories.
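A hedged sketch of the GitHub API approach described above, not the script itself: the function names and the SHA-resolution detail are illustrative and may differ from the real implementation.

```python
import os
import requests

API = "https://api.github.com"

def gh_headers() -> dict:
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")  # optional, raises the API rate limit
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers

def fetch_releases(repo: str) -> list[dict]:
    """Return tag name and tag SHA for each release of an nf-core repository."""
    url = f"{API}/repos/nf-core/{repo}/releases"
    resp = requests.get(url, headers=gh_headers(), params={"per_page": 100}, timeout=30)
    resp.raise_for_status()
    releases = []
    for rel in resp.json():
        tag = rel["tag_name"]
        # Resolve the tag to a SHA; annotated tags may need one more dereference step
        ref = requests.get(f"{API}/repos/nf-core/{repo}/git/ref/tags/{tag}",
                           headers=gh_headers(), timeout=30)
        ref.raise_for_status()
        releases.append({"release": tag, "sha": ref.json()["object"]["sha"]})
    return releases
```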
…management

Replace pip-based dependency installation with uv for faster, more reliable package management in the S3 results tagging workflow.

Changes:
- Add astral-sh/setup-uv action to install uv
- Use uv python install instead of actions/setup-python
- Replace pip install commands with uv run for script execution
- Leverage inline script dependencies defined in s3_results_tagger.py

This improves build performance and ensures consistent dependency resolution with the uv script format already implemented in the Python script.
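For reference, inline script dependencies use the PEP 723 metadata block at the top of the Python file, which `uv run` reads to resolve packages on the fly. The exact block in s3_results_tagger.py may pin versions and a different Python constraint; this is just the general shape.

```python
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "boto3",
#     "requests",
# ]
# ///
```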
Fix GitHub Actions error by updating astral-sh/setup-uv from an invalid hash reference to the correct v6 tag reference, and remove the explicit version specification so the latest version is used.
…agger

Enhance the S3 results tagging script with comprehensive reporting that shows:
- List of all orphaned directories that will be tagged for deletion
- Breakdown by pipeline showing the count of orphaned directories
- Clear indication of deletion status (tagged only vs will be deleted)
- Summary of current release directories (up to 10, with truncation for more)
- Improved visual formatting with emojis and structured output

This provides better visibility into what directories are being processed and helps operators understand the impact of the cleanup operations before and after execution.
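As a rough sketch of the per-pipeline breakdown, assuming the orphaned entries carry pipeline, prefix, and SHA fields (the record shape and function name are illustrative, not the script's actual data structure):

```python
from collections import Counter

def print_orphan_report(orphaned: list[dict], delete: bool) -> None:
    """orphaned: [{"pipeline": ..., "prefix": ..., "sha": ...}, ...]"""
    action = "WILL BE DELETED" if delete else "tagged for deletion only"
    print(f"🗑️  Orphaned results directories ({len(orphaned)} found, {action}):")
    for entry in orphaned:
        print(f"  - {entry['prefix']} (sha: {entry['sha']})")
    counts = Counter(entry["pipeline"] for entry in orphaned)
    print("\n📊 Breakdown by pipeline:")
    for pipeline, n in counts.most_common():
        print(f"  {pipeline}: {n} orphaned director{'y' if n == 1 else 'ies'}")
```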
(force-pushed from 6b2caac to 78592a7)
…port

Improve the S3 results tagging workflow to capture and display detailed information in the GitHub Actions job summary, including:
- Enhanced statistics table with execution mode and deletion status
- Complete list of orphaned directories found during the run
- Pipeline breakdown showing orphaned directory counts per pipeline
- Clear visual indicators for deletion vs tagging-only mode
- Links to full workflow logs for detailed information

The workflow now captures script output to a log file and parses it to extract relevant information for the GitHub Actions summary, providing better visibility into cleanup operations directly in the Actions UI.
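The workflow does this parsing in its Actions steps; a Python equivalent writing to the same place would look roughly like the sketch below. Only `GITHUB_STEP_SUMMARY` is a standard GitHub Actions variable, the rest is illustrative.

```python
import os

def write_job_summary(stats: dict, orphaned: list[str]) -> None:
    """Append a markdown summary to the GitHub Actions step summary, if available."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return  # not running inside GitHub Actions
    lines = [
        "## S3 results tagging report",
        "",
        "| Metric | Value |",
        "| --- | --- |",
    ]
    lines += [f"| {key} | {value} |" for key, value in stats.items()]
    lines += ["", f"### Orphaned directories ({len(orphaned)})", ""]
    lines += [f"- `{prefix}`" for prefix in orphaned[:50]]
    if len(orphaned) > 50:
        lines.append(f"- … and {len(orphaned) - 50} more (see workflow logs)")
    with open(summary_path, "a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
```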
(force-pushed from 78592a7 to 8855ec9)
- Format orphaned directories as markdown list with SHA details
- Style pipeline breakdown with bold headers and directory counts
- Replace comma-separated format with readable bullet points
- Limit display to first 50 directories with overflow indicator
Hmm, I would not like this to run weekly. For now, we often manually fix bucket names so that we don't lose the results and the compute already spent. This sometimes happens months after the run finished. I would first want something that finds or fixes those issues automatically before adding a cleanup step.
Hmm, that gives me an idea to check how many release results we're missing.
- Add generate_test_coverage_report.py script to analyze pipeline test coverage
- Create test-coverage-report.yml workflow for automated weekly reporting
- Cross-reference all nf-core releases with S3 test results to identify gaps
- Generate reports in multiple formats (markdown, CSV, JSON) with actionable insights
- Include priority analysis for pipelines with the lowest coverage
- Integrate with existing GitHub Actions and S3 infrastructure
- Update README with comprehensive documentation and usage examples
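The core cross-reference reduces to comparing release SHAs against the SHAs that actually have results in the bucket. A minimal sketch, assuming release and S3 data have already been collected into plain dicts; the function and field names are illustrative, not the script's actual interface.

```python
def summarise_coverage(releases: dict[str, dict[str, str]],
                       s3_shas: dict[str, set[str]]) -> list[dict]:
    """releases: pipeline -> {tag_name: sha}; s3_shas: pipeline -> SHAs with results in S3."""
    rows = []
    for pipeline, tags in releases.items():
        found = s3_shas.get(pipeline, set())
        missing = [tag for tag, sha in tags.items() if sha not in found]
        total = len(tags)
        rows.append({
            "pipeline": pipeline,
            "releases": total,
            "missing": missing,
            "coverage": 0.0 if total == 0 else (total - len(missing)) / total,
        })
    # Lowest-coverage pipelines first, for the priority section of the report
    return sorted(rows, key=lambda row: row["coverage"])
```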
- Add --test-mode flag to generate_test_coverage_report.py
- Enable testing without AWS credentials by using mock S3 data
- Create synthetic test results for the first 3 pipelines with 2 releases each
- Improve development workflow and testing capabilities
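A sketch of how such a flag might be wired up, reusing the release dict shape from the sketch above; the actual script's argument handling and mock data may differ.

```python
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="nf-core test coverage report")
    parser.add_argument("--test-mode", action="store_true",
                        help="use synthetic S3 data instead of calling AWS")
    return parser.parse_args()

def mock_s3_shas(releases: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Pretend the first 3 pipelines each have results for 2 releases."""
    mock = {}
    for pipeline in sorted(releases)[:3]:
        mock[pipeline] = set(list(releases[pipeline].values())[:2])
    return mock
```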
- Replace GitHub API repository enumeration with the official nf-core pipelines.json
- Eliminate non-pipeline repositories (vale, setup-nf-test, etc.) from the analysis
- Improve efficiency with a single HTTP request vs multiple GitHub API calls
- Use the authoritative data source matching the nf-core website
- Process 137 official pipelines with 593 total releases
- Remove rate limit issues and improve accuracy
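A sketch of the single-request approach; the URL and field names (`remote_workflows`, `tag_name`, `tag_sha`) are assumptions based on the public nf-core pipeline listing and should be checked against the real file.

```python
import requests

PIPELINES_JSON = "https://nf-co.re/pipelines.json"  # assumed location of the official list

def fetch_pipeline_releases() -> dict[str, dict[str, str]]:
    """Return pipeline -> {tag_name: tag_sha} from the official nf-core pipelines.json."""
    data = requests.get(PIPELINES_JSON, timeout=30).json()
    releases = {}
    for workflow in data["remote_workflows"]:
        tags = {
            rel["tag_name"]: rel["tag_sha"]
            for rel in workflow.get("releases", [])
            if rel.get("tag_name") and rel.get("tag_name") != "dev"
        }
        releases[workflow["name"]] = tags
    return releases
```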
@mashehu to clarify: right now this doesn't run the deletion unless you trigger it manually, but we can turn it off. It would just tag anything that would get deleted with `deleteme=true`. I added a report action to see what versions we're missing in the bucket: test_coverage_report_20250822_163155.json
Add automated GitHub Action workflow to identify and tag orphaned S3 results directories that cause "no AWS results found" issues on the website.
The workflow:
- Fetches current pipeline releases from pipelines.json
- Scans the S3 bucket for all results-* directories
- Tags current releases with metadata (pipeline, release, sha, status)
- Tags orphaned directories with deleteme=true for future cleanup
- Runs weekly by default, with manual trigger options
- Includes safety features: dry-run mode and a deletion toggle
This addresses the hash mismatch issue where re-tagged releases leave unreachable results-* directories in S3, preventing the website from displaying AWS results tabs properly.
https://nfcore.slack.com/archives/C01QPMKBYNR/p1755718410593069
https://nfcore.slack.com/archives/CE6SDBX2A/p1755739921163179