feat: add S3 results tagging workflow for orphaned directories cleanup #173
base: main
Conversation
Add automated GitHub Action workflow to identify and tag orphaned S3 results directories that cause "no AWS results found" issues on the website.

The workflow:
- Fetches current pipeline releases from pipelines.json
- Scans the S3 bucket for all results-* directories
- Tags current releases with metadata (pipeline, release, sha, status)
- Tags orphaned directories with deleteme=true for future cleanup
- Runs weekly by default, with manual trigger options
- Includes safety features: dry-run mode and a deletion toggle

This addresses the hash mismatch issue where re-tagged releases leave unreachable results-* directories in S3, preventing the website from displaying AWS results tabs properly.
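For orientation, a minimal sketch of what the tagging pass could look like. It assumes a bucket layout of `<pipeline>/results-<sha>/…`; the bucket name, tag keys, and function names here are illustrative and may differ from the actual s3_results_tagger.py. Since S3 has no real directories, a "directory" is tagged by applying the tag set to every object under its prefix.

```python
import boto3

BUCKET = "nf-core-awsmegatests"  # assumed bucket name, not confirmed by this PR

def tag_prefix(s3, bucket: str, prefix: str, tags: dict) -> None:
    """Apply the same tag set to every object under an S3 prefix."""
    tag_set = [{"Key": k, "Value": v} for k, v in tags.items()]
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            s3.put_object_tagging(
                Bucket=bucket,
                Key=obj["Key"],
                Tagging={"TagSet": tag_set},
            )

def tag_results_dirs(current_shas: dict[str, set[str]], dry_run: bool = True) -> None:
    """current_shas maps pipeline name -> set of SHAs belonging to current releases."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for pipeline, shas in current_shas.items():
        # List the results-* "directories" (common prefixes) for this pipeline
        pages = paginator.paginate(Bucket=BUCKET, Prefix=f"{pipeline}/results-", Delimiter="/")
        for page in pages:
            for cp in page.get("CommonPrefixes", []):
                prefix = cp["Prefix"]  # e.g. "rnaseq/results-abc123/"
                sha = prefix.rstrip("/").split("results-")[-1]
                orphaned = sha not in shas
                tags = {"pipeline": pipeline, "sha": sha,
                        "status": "orphaned" if orphaned else "current"}
                if orphaned:
                    tags["deleteme"] = "true"
                if dry_run:
                    print(f"[dry-run] would tag {prefix} with {tags}")
                else:
                    tag_prefix(s3, BUCKET, prefix, tags)
```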
Add pull request trigger to run the S3 results tagging workflow in safe dry-run mode for testing changes to the workflow or script.

Safety features for PR runs:
- Forces dry-run mode regardless of inputs
- Disables deletion functionality completely
- Only triggers when workflow or script files are modified
- Clear logging about safety mode activation

This allows testing the tagging logic and workflow changes safely before they're merged and run in production.
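The commit enforces this at the workflow level; a script-side guard with the same effect could look roughly like the sketch below. Only `GITHUB_EVENT_NAME` is a standard GitHub Actions variable, the function name is illustrative.

```python
import os

def resolve_safety_flags(requested_dry_run: bool, requested_delete: bool) -> tuple[bool, bool]:
    """Force dry-run and disable deletion when the workflow runs from a pull request."""
    if os.environ.get("GITHUB_EVENT_NAME") == "pull_request":
        print("PR safety mode: forcing dry-run and disabling deletion")
        return True, False
    return requested_dry_run, requested_delete
```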
…tching

The S3 results tagging script was failing with a 404 error when trying to fetch pipeline data from https://nf-core.s3.amazonaws.com/pipelines.json, which doesn't exist.

Changes:
- Replace S3 URL with GitHub API calls to fetch nf-core pipeline releases
- Add GitHub token support for higher API rate limits
- Update script to use GitHub API endpoints for repository and release data
- Maintain backward compatibility with custom JSON endpoints
- Update GitHub workflow to use GitHub token instead of S3 URL
- Add uv script dependencies for boto3 and requests

This resolves "no AWS results found" issues on the nf-core website by ensuring the tagging script can properly identify current vs orphaned results directories.
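A hedged sketch of the GitHub API approach described above, not the script itself: the function names and the SHA-resolution detail are illustrative and may differ from the real implementation.

```python
import os
import requests

API = "https://api.github.com"

def gh_headers() -> dict:
    headers = {"Accept": "application/vnd.github+json"}
    token = os.environ.get("GITHUB_TOKEN")  # optional, raises the API rate limit
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers

def fetch_releases(repo: str) -> list[dict]:
    """Return tag name and tag SHA for each release of an nf-core repository."""
    url = f"{API}/repos/nf-core/{repo}/releases"
    resp = requests.get(url, headers=gh_headers(), params={"per_page": 100}, timeout=30)
    resp.raise_for_status()
    releases = []
    for rel in resp.json():
        tag = rel["tag_name"]
        # Resolve the tag to a SHA; annotated tags may need one more dereference step
        ref = requests.get(f"{API}/repos/nf-core/{repo}/git/ref/tags/{tag}",
                           headers=gh_headers(), timeout=30)
        ref.raise_for_status()
        releases.append({"release": tag, "sha": ref.json()["object"]["sha"]})
    return releases
```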
…management

Replace pip-based dependency installation with uv for faster, more reliable package management in the S3 results tagging workflow.

Changes:
- Add astral-sh/setup-uv action to install uv
- Use uv python install instead of actions/setup-python
- Replace pip install commands with uv run for script execution
- Leverage inline script dependencies defined in s3_results_tagger.py

This improves build performance and ensures consistent dependency resolution with the uv script format already implemented in the Python script.
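For reference, inline script dependencies use the PEP 723 metadata block at the top of the Python file, which `uv run` reads to resolve packages on the fly. The exact block in s3_results_tagger.py may pin versions and a different Python constraint; this is just the general shape.

```python
#!/usr/bin/env python3
# /// script
# requires-python = ">=3.11"
# dependencies = [
#     "boto3",
#     "requests",
# ]
# ///
```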
Fix GitHub Actions error by updating astral-sh/setup-uv from an invalid hash reference to the correct v6 tag reference, and remove the explicit version specification so the latest version is used.
…agger

Enhance the S3 results tagging script with comprehensive reporting that shows:
- List of all orphaned directories that will be tagged for deletion
- Breakdown by pipeline showing the count of orphaned directories
- Clear indication of deletion status (tagged only vs will be deleted)
- Summary of current release directories (up to 10, with truncation for more)
- Improved visual formatting with emojis and structured output

This provides better visibility into what directories are being processed and helps operators understand the impact of the cleanup operations before and after execution.
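As a rough sketch of the per-pipeline breakdown, assuming the orphaned entries carry pipeline, prefix, and SHA fields (the record shape and function name are illustrative, not the script's actual data structure):

```python
from collections import Counter

def print_orphan_report(orphaned: list[dict], delete: bool) -> None:
    """orphaned: [{"pipeline": ..., "prefix": ..., "sha": ...}, ...]"""
    action = "WILL BE DELETED" if delete else "tagged for deletion only"
    print(f"🗑️  Orphaned results directories ({len(orphaned)} found, {action}):")
    for entry in orphaned:
        print(f"  - {entry['prefix']} (sha: {entry['sha']})")
    counts = Counter(entry["pipeline"] for entry in orphaned)
    print("\n📊 Breakdown by pipeline:")
    for pipeline, n in counts.most_common():
        print(f"  {pipeline}: {n} orphaned director{'y' if n == 1 else 'ies'}")
```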
(force-pushed from 6b2caac to 78592a7)
…port

Improve the S3 results tagging workflow to capture and display detailed information in the GitHub Actions job summary, including:
- Enhanced statistics table with execution mode and deletion status
- Complete list of orphaned directories found during the run
- Pipeline breakdown showing orphaned directory counts per pipeline
- Clear visual indicators for deletion vs tagging-only mode
- Links to full workflow logs for detailed information

The workflow now captures script output to a log file and parses it to extract relevant information for the GitHub Actions summary, providing better visibility into cleanup operations directly in the Actions UI.
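The workflow does this parsing in its Actions steps; a Python equivalent writing to the same place would look roughly like the sketch below. Only `GITHUB_STEP_SUMMARY` is a standard GitHub Actions variable, the rest is illustrative.

```python
import os

def write_job_summary(stats: dict, orphaned: list[str]) -> None:
    """Append a markdown summary to the GitHub Actions step summary, if available."""
    summary_path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return  # not running inside GitHub Actions
    lines = [
        "## S3 results tagging report",
        "",
        "| Metric | Value |",
        "| --- | --- |",
    ]
    lines += [f"| {key} | {value} |" for key, value in stats.items()]
    lines += ["", f"### Orphaned directories ({len(orphaned)})", ""]
    lines += [f"- `{prefix}`" for prefix in orphaned[:50]]
    if len(orphaned) > 50:
        lines.append(f"- … and {len(orphaned) - 50} more (see workflow logs)")
    with open(summary_path, "a", encoding="utf-8") as fh:
        fh.write("\n".join(lines) + "\n")
```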
(force-pushed from 78592a7 to 8855ec9)
- Format orphaned directories as markdown list with SHA details
- Style pipeline breakdown with bold headers and directory counts
- Replace comma-separated format with readable bullet points
- Limit display to first 50 directories with overflow indicator
Hmm, I would not like this to run weekly. For now, we often manually fix bucket names so that we don't lose the results and the compute already spent. This sometimes happens months after the run finished. I would first want something that finds or fixes those issues automatically before adding a cleanup step.
Hmm, that gives me an idea to check how many release results we're missing.
- Add generate_test_coverage_report.py script to analyze pipeline test coverage
- Create test-coverage-report.yml workflow for automated weekly reporting
- Cross-reference all nf-core releases with S3 test results to identify gaps
- Generate reports in multiple formats (markdown, CSV, JSON) with actionable insights
- Include priority analysis for pipelines with the lowest coverage
- Integrate with existing GitHub Actions and S3 infrastructure
- Update README with comprehensive documentation and usage examples
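The core cross-reference reduces to comparing release SHAs against the SHAs that actually have results in the bucket. A minimal sketch, assuming release and S3 data have already been collected into plain dicts; the function and field names are illustrative, not the script's actual interface.

```python
def summarise_coverage(releases: dict[str, dict[str, str]],
                       s3_shas: dict[str, set[str]]) -> list[dict]:
    """releases: pipeline -> {tag_name: sha}; s3_shas: pipeline -> SHAs with results in S3."""
    rows = []
    for pipeline, tags in releases.items():
        found = s3_shas.get(pipeline, set())
        missing = [tag for tag, sha in tags.items() if sha not in found]
        total = len(tags)
        rows.append({
            "pipeline": pipeline,
            "releases": total,
            "missing": missing,
            "coverage": 0.0 if total == 0 else (total - len(missing)) / total,
        })
    # Lowest-coverage pipelines first, for the priority section of the report
    return sorted(rows, key=lambda row: row["coverage"])
```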
- Add --test-mode flag to generate_test_coverage_report.py
- Enable testing without AWS credentials by using mock S3 data
- Create synthetic test results for the first 3 pipelines with 2 releases each
- Improve development workflow and testing capabilities
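A sketch of how such a flag might be wired up, reusing the release dict shape from the sketch above; the actual script's argument handling and mock data may differ.

```python
import argparse

def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="nf-core test coverage report")
    parser.add_argument("--test-mode", action="store_true",
                        help="use synthetic S3 data instead of calling AWS")
    return parser.parse_args()

def mock_s3_shas(releases: dict[str, dict[str, str]]) -> dict[str, set[str]]:
    """Pretend the first 3 pipelines each have results for 2 releases."""
    mock = {}
    for pipeline in sorted(releases)[:3]:
        mock[pipeline] = set(list(releases[pipeline].values())[:2])
    return mock
```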
- Replace GitHub API repository enumeration with the official nf-core pipelines.json
- Eliminate non-pipeline repositories (vale, setup-nf-test, etc.) from the analysis
- Improve efficiency with a single HTTP request vs multiple GitHub API calls
- Use the authoritative data source matching the nf-core website
- Process 137 official pipelines with 593 total releases
- Remove rate limit issues and improve accuracy
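A sketch of the single-request approach; the URL and field names (`remote_workflows`, `tag_name`, `tag_sha`) are assumptions based on the public nf-core pipeline listing and should be checked against the real file.

```python
import requests

PIPELINES_JSON = "https://nf-co.re/pipelines.json"  # assumed location of the official list

def fetch_pipeline_releases() -> dict[str, dict[str, str]]:
    """Return pipeline -> {tag_name: tag_sha} from the official nf-core pipelines.json."""
    data = requests.get(PIPELINES_JSON, timeout=30).json()
    releases = {}
    for workflow in data["remote_workflows"]:
        tags = {
            rel["tag_name"]: rel["tag_sha"]
            for rel in workflow.get("releases", [])
            if rel.get("tag_name") and rel.get("tag_name") != "dev"
        }
        releases[workflow["name"]] = tags
    return releases
```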
@mashehu to clarify: right now this doesn't run the deletion unless you trigger it manually, but we can turn it off. It would just tag anything that would get deleted with `deleteme=true`. I added a report action to see what versions we're missing in the bucket: test_coverage_report_20250822_163155.json
Add automated GitHub Action workflow to identify and tag orphaned S3 results directories that cause "no AWS results found" issues on the website.
The workflow:
- Fetches current pipeline releases from pipelines.json
- Scans the S3 bucket for all results-* directories
- Tags current releases with metadata (pipeline, release, sha, status)
- Tags orphaned directories with deleteme=true for future cleanup
- Runs weekly by default, with manual trigger options
- Includes safety features: dry-run mode and a deletion toggle
This addresses the hash mismatch issue where re-tagged releases leave unreachable results-* directories in S3, preventing the website from displaying AWS results tabs properly.
https://nfcore.slack.com/archives/C01QPMKBYNR/p1755718410593069
https://nfcore.slack.com/archives/CE6SDBX2A/p1755739921163179