edmundmiller
Contributor

@edmundmiller edmundmiller commented Aug 21, 2025

Add automated GitHub Action workflow to identify and tag orphaned S3 results directories that cause "no AWS results found" issues on the website.

The workflow:

  • Fetches current pipeline releases from pipelines.json
  • Scans S3 bucket for all results-* directories
  • Tags current releases with metadata (pipeline, release, sha, status)
  • Tags orphaned directories with deleteme=true for future cleanup
  • Runs weekly by default with manual trigger options
  • Includes safety features: dry-run mode and deletion toggle

This addresses the hash mismatch issue where re-tagged releases leave unreachable results-* directories in S3, preventing the website from displaying AWS results tabs properly.
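
The classification step described above can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: the function name `classify_results_dirs`, the assumed prefix layout `<pipeline>/results-<sha>/`, and the shape of the `current_releases` mapping are all hypothetical.

```python
from typing import Dict, Iterable


def classify_results_dirs(
    prefixes: Iterable[str],
    current_releases: Dict[str, Dict[str, str]],
) -> Dict[str, Dict[str, str]]:
    """Return the tags each results-* prefix would receive.

    Assumptions (not confirmed by the PR):
    - prefixes look like "<pipeline>/results-<sha>/"
    - current_releases maps pipeline name -> {commit sha: release tag}
    """
    tags: Dict[str, Dict[str, str]] = {}
    for prefix in prefixes:
        parts = prefix.strip("/").split("/")
        if len(parts) != 2 or not parts[1].startswith("results-"):
            continue  # not a results directory, so it is left untouched
        pipeline, sha = parts[0], parts[1][len("results-"):]
        release = current_releases.get(pipeline, {}).get(sha)
        if release is not None:
            # The SHA matches a current release: record descriptive metadata
            tags[prefix] = {
                "pipeline": pipeline,
                "release": release,
                "sha": sha,
                "status": "current",
            }
        else:
            # The SHA matches no release (for example after a re-tag): mark for cleanup
            tags[prefix] = {"pipeline": pipeline, "sha": sha, "deleteme": "true"}
    return tags
```

Keeping the decision in a pure function like this would make it possible to review what a run intends to tag before any S3 call is made.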

https://nfcore.slack.com/archives/C01QPMKBYNR/p1755718410593069
https://nfcore.slack.com/archives/CE6SDBX2A/p1755739921163179

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@edmundmiller edmundmiller self-assigned this Aug 21, 2025
edmundmiller and others added 5 commits August 21, 2025 15:22
Add pull request trigger to run S3 results tagging workflow in safe
dry-run mode for testing changes to the workflow or script.

Safety features for PR runs:
- Forces dry-run mode regardless of inputs
- Disables deletion functionality completely
- Only triggers when workflow or script files are modified
- Clear logging about safety mode activation

This allows testing the tagging logic and workflow changes safely
before they're merged and run in production.
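
One way to express the PR safety override is a small resolver that runs before anything else. This is a sketch under assumptions: `RunMode` and `resolve_run_mode` are hypothetical names, and the rule that deletion is also disabled during any dry run is an assumption rather than something the commit states. The event name would typically come from the `GITHUB_EVENT_NAME` environment variable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunMode:
    dry_run: bool
    enable_deletion: bool


def resolve_run_mode(event_name: str, dry_run: bool, enable_deletion: bool) -> RunMode:
    """Decide the effective mode, ignoring user inputs on pull requests."""
    if event_name == "pull_request":
        # Log clearly so the safety mode is visible in the workflow output
        print("Pull request run: forcing dry-run mode and disabling deletion")
        return RunMode(dry_run=True, enable_deletion=False)
    # Assumption: a dry run never deletes, whatever the deletion toggle says
    return RunMode(dry_run=dry_run, enable_deletion=enable_deletion and not dry_run)
```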

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…tching

The S3 results tagging script was failing with a 404 error when trying to fetch
pipeline data from https://nf-core.s3.amazonaws.com/pipelines.json, which doesn't exist.

Changes:
- Replace S3 URL with GitHub API calls to fetch nf-core pipeline releases
- Add GitHub token support for higher API rate limits
- Update script to use GitHub API endpoints for repository and release data
- Maintain backward compatibility with custom JSON endpoints
- Update GitHub workflow to use GitHub token instead of S3 URL
- Add uv script dependencies for boto3 and requests

This resolves "no AWS results found" issues on the nf-core website by ensuring
the tagging script can properly identify current vs orphaned results directories.
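
For illustration, fetching release tags through the GitHub REST API with an optional token might look like the sketch below. It uses the standard library instead of `requests` so it runs without extra packages, ignores pagination, and all function names are hypothetical rather than taken from the script.

```python
import json
import urllib.request
from typing import Dict, List, Optional

GITHUB_API = "https://api.github.com"


def github_headers(token: Optional[str] = None) -> Dict[str, str]:
    """Build request headers; a token raises the API rate limit."""
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return headers


def releases_url(repo: str, org: str = "nf-core") -> str:
    """URL of the 'list releases' endpoint for one repository."""
    return f"{GITHUB_API}/repos/{org}/{repo}/releases"


def release_tags(payload: List[dict]) -> List[str]:
    """Extract tag names from a releases response, skipping drafts."""
    return [release["tag_name"] for release in payload if not release.get("draft")]


def fetch_release_tags(repo: str, token: Optional[str] = None) -> List[str]:
    """Fetch the first page of releases for an nf-core repository."""
    request = urllib.request.Request(releases_url(repo), headers=github_headers(token))
    with urllib.request.urlopen(request, timeout=30) as response:
        return release_tags(json.load(response))
```

In a workflow, the token would usually be passed in through an environment variable populated from `secrets.GITHUB_TOKEN`.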

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…management

Replace pip-based dependency installation with uv for faster, more reliable
package management in the S3 results tagging workflow.

Changes:
- Add astral-sh/setup-uv action to install uv
- Use uv python install instead of actions/setup-python
- Replace pip install commands with uv run for script execution
- Leverage inline script dependencies defined in s3_results_tagger.py

This improves build performance and ensures consistent dependency resolution
with the uv script format already implemented in the Python script.
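
For readers unfamiliar with the uv script format, inline dependencies are declared in a comment header at the top of the file (PEP 723), and `uv run s3_results_tagger.py` installs them into a temporary environment before executing the script. The header below is a sketch: the `requires-python` bound is an assumption, and `main` is a placeholder that a real script would call under the usual `if __name__ == "__main__":` guard.

```python
# /// script
# requires-python = ">=3.9"
# dependencies = [
#     "boto3",
#     "requests",
# ]
# ///
"""Placeholder body showing where the declared dependencies become available."""


def main() -> None:
    # These imports resolve because uv installs the dependencies declared above
    import boto3
    import requests

    print(f"boto3 {boto3.__version__}, requests {requests.__version__}")
```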

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
Fix GitHub Actions error by updating astral-sh/setup-uv from invalid hash
reference to the correct v6 tag reference, and remove explicit version
specification to use the latest version.

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…agger

Enhance the S3 results tagging script with comprehensive reporting that shows:
- List of all orphaned directories that will be tagged for deletion
- Breakdown by pipeline showing count of orphaned directories
- Clear indication of deletion status (tagged only vs will be deleted)
- Summary of current release directories (up to 10, with truncation for more)
- Improved visual formatting with emojis and structured output

This provides better visibility into what directories are being processed
and helps operators understand the impact of the cleanup operations before
and after execution.
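
A report along these lines could be assembled as in the following sketch. The function name `format_report`, the exact wording of each line, and the assumption that prefixes start with the pipeline name are all hypothetical; only the truncation at 10 current directories comes from the commit message.

```python
from collections import Counter
from typing import List


def format_report(
    orphaned: List[str],
    current: List[str],
    delete: bool,
    max_current: int = 10,
) -> str:
    """Render a plain-text report of orphaned and current results directories."""
    action = "will be deleted" if delete else "tagged only"
    lines = [f"Orphaned directories ({len(orphaned)}, {action}):"]
    lines.extend(f"  - {prefix}" for prefix in orphaned)

    # Per-pipeline breakdown; the pipeline is assumed to be the first path segment
    per_pipeline = Counter(prefix.split("/", 1)[0] for prefix in orphaned)
    lines.append("Orphaned by pipeline:")
    for pipeline, count in sorted(per_pipeline.items()):
        lines.append(f"  {pipeline}: {count}")

    # Show only the first few current directories and summarise the rest
    lines.append(f"Current release directories ({len(current)}):")
    lines.extend(f"  - {prefix}" for prefix in current[:max_current])
    if len(current) > max_current:
        lines.append(f"  ... and {len(current) - max_current} more")
    return "\n".join(lines)
```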

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
…port

Improve the S3 results tagging workflow to capture and display detailed
information in the GitHub Actions job summary, including:

- Enhanced statistics table with execution mode and deletion status
- Complete list of orphaned directories found during the run
- Pipeline breakdown showing orphaned directory counts per pipeline
- Clear visual indicators for deletion vs tagging-only mode
- Links to full workflow logs for detailed information

The workflow now captures script output to a log file and parses it to
extract relevant information for the GitHub Actions summary, providing
better visibility into cleanup operations directly in the Actions UI.
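
The log-to-summary step could be sketched as below. GitHub Actions exposes the summary file through the `GITHUB_STEP_SUMMARY` environment variable; everything else here is an assumption, including the `Key: 123` log line format and the function names.

```python
import os
import re
from typing import Dict

# Assumed log format: lines such as "Orphaned directories: 3"
STAT_LINE = re.compile(r"^(?P<key>[A-Za-z ]+):\s*(?P<value>\d+)$")


def parse_stats(log_text: str) -> Dict[str, int]:
    """Collect 'Key: number' lines from the captured script output."""
    stats: Dict[str, int] = {}
    for line in log_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match:
            stats[match.group("key").strip()] = int(match.group("value"))
    return stats


def render_summary(stats: Dict[str, int], dry_run: bool, deletion_enabled: bool) -> str:
    """Render a markdown table with execution mode, deletion status, and counts."""
    rows = [
        "| Metric | Value |",
        "| --- | --- |",
        f"| Mode | {'dry run' if dry_run else 'live'} |",
        f"| Deletion | {'enabled' if deletion_enabled else 'tagging only'} |",
    ]
    rows += [f"| {key} | {value} |" for key, value in stats.items()]
    return "\n".join(rows) + "\n"


def append_to_job_summary(markdown: str) -> None:
    """Append to the Actions job summary, or print when run outside Actions."""
    path = os.environ.get("GITHUB_STEP_SUMMARY")
    if not path:
        print(markdown)
        return
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(markdown)
```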

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Format orphaned directories as markdown list with SHA details
- Style pipeline breakdown with bold headers and directory counts
- Replace comma-separated format with readable bullet points
- Limit display to first 50 directories with overflow indicator
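
A formatter matching these bullet points might look like the sketch below. The name `orphans_markdown`, the input shape (pipeline name mapped to a list of SHAs), and the overflow wording are hypothetical; the limit of 50 comes from the commit message.

```python
from typing import Dict, List


def orphans_markdown(orphans: Dict[str, List[str]], limit: int = 50) -> str:
    """Render orphaned directories as a markdown list, capped at `limit` entries."""
    total = sum(len(shas) for shas in orphans.values())
    lines: List[str] = []
    shown = 0
    for pipeline in sorted(orphans):
        if shown >= limit:
            break
        shas = orphans[pipeline]
        noun = "directory" if len(shas) == 1 else "directories"
        lines.append(f"**{pipeline}** ({len(shas)} {noun})")
        # Only list as many entries as the remaining budget allows
        for sha in shas[: limit - shown]:
            lines.append(f"- `results-{sha}`")
            shown += 1
    if total > shown:
        lines.append(f"... and {total - shown} more (see the full workflow log)")
    return "\n".join(lines)
```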

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@edmundmiller edmundmiller marked this pull request as ready for review August 21, 2025 22:23
@edmundmiller edmundmiller requested review from maxulysse and a team as code owners August 21, 2025 22:23
@edmundmiller edmundmiller requested a review from mashehu August 21, 2025 22:23
@mashehu
Contributor

mashehu commented Aug 22, 2025

Hmm, I would not like this to run weekly. For now, we often manually fix bucket names to not lose the results and spent compute. This happens sometimes months after the run finished. I would first have something that finds or fixes the above issues automatically before adding a cleanup section.

@edmundmiller
Contributor Author

Hmm, that gives me an idea: check how many releases we're missing results for.

edmundmiller and others added 3 commits August 22, 2025 10:15
- Add generate_test_coverage_report.py script to analyze pipeline test coverage
- Create test-coverage-report.yml workflow for automated weekly reporting
- Cross-reference all nf-core releases with S3 test results to identify gaps
- Generate reports in multiple formats (markdown, CSV, JSON) with actionable insights
- Include priority analysis for pipelines with lowest coverage
- Integrate with existing GitHub Actions and S3 infrastructure
- Update README with comprehensive documentation and usage examples
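
The cross-referencing idea can be illustrated with a short sketch. It is not the code in `generate_test_coverage_report.py`: the function names, the input shapes, and the column names are assumptions chosen for the example.

```python
import csv
import io
import json
from typing import Dict, List, Set

FIELDS = ["pipeline", "releases", "with_results", "coverage_pct", "missing"]


def coverage_rows(
    releases: Dict[str, Dict[str, str]],
    s3_shas: Dict[str, Set[str]],
) -> List[dict]:
    """Compare releases (pipeline -> {tag: sha}) with SHAs found in S3.

    Rows are sorted with the lowest coverage first so gaps stand out.
    """
    rows = []
    for pipeline, by_tag in releases.items():
        found = s3_shas.get(pipeline, set())
        missing = sorted(tag for tag, sha in by_tag.items() if sha not in found)
        total = len(by_tag)
        covered = total - len(missing)
        rows.append({
            "pipeline": pipeline,
            "releases": total,
            "with_results": covered,
            "coverage_pct": round(100 * covered / total, 1) if total else 0.0,
            "missing": missing,
        })
    rows.sort(key=lambda row: (row["coverage_pct"], row["pipeline"]))
    return rows


def to_json(rows: List[dict]) -> str:
    return json.dumps(rows, indent=2)


def to_csv(rows: List[dict]) -> str:
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        # Flatten the list of missing tags into one cell
        writer.writerow({**row, "missing": ";".join(row["missing"])})
    return buffer.getvalue()
```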

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Add --test-mode flag to generate_test_coverage_report.py
- Enable testing without AWS credentials by using mock S3 data
- Create synthetic test results for first 3 pipelines with 2 releases each
- Improve development workflow and testing capabilities
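
A test mode of this kind could be wired up roughly as follows. Only the `--test-mode` flag name and the "first 3 pipelines, 2 releases each" rule come from the commit message; `build_parser`, `mock_s3_shas`, and the input shape are hypothetical.

```python
import argparse
from typing import Dict, Set


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Generate a test coverage report")
    parser.add_argument(
        "--test-mode",
        action="store_true",
        help="use synthetic S3 data so no AWS credentials are needed",
    )
    return parser


def mock_s3_shas(
    releases: Dict[str, Dict[str, str]],
    n_pipelines: int = 3,
    n_releases: int = 2,
) -> Dict[str, Set[str]]:
    """Pretend the first few releases of the first few pipelines have results.

    `releases` maps pipeline -> {tag: sha}; dict insertion order decides
    which entries count as "first".
    """
    mock: Dict[str, Set[str]] = {}
    for pipeline in list(releases)[:n_pipelines]:
        mock[pipeline] = set(list(releases[pipeline].values())[:n_releases])
    return mock
```

With this split, the reporting code can take either the mock mapping or a real S3 listing without knowing which one it received.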

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
- Replace GitHub API repository enumeration with official nf-core pipelines.json
- Eliminate non-pipeline repositories (vale, setup-nf-test, etc.) from analysis
- Improve efficiency with single HTTP request vs multiple GitHub API calls
- Use authoritative data source matching nf-core website
- Process 137 official pipelines with 593 total releases
- Remove rate limit issues and improve accuracy
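
Reading releases from a single pipelines.json document might look like the sketch below. The URL and the field names (`remote_workflows`, `name`, `releases`, `tag_name`, `tag_sha`) are assumptions about the file's layout, as is the choice to skip the `dev` entry; they should be checked against the real file before relying on this.

```python
import json
import urllib.request
from typing import Dict

# Assumed location of the file; not confirmed by the commit message
PIPELINES_JSON_URL = "https://nf-co.re/pipelines.json"


def parse_pipelines(payload: dict) -> Dict[str, Dict[str, str]]:
    """Return pipeline -> {release tag: commit sha} from a pipelines.json payload."""
    releases: Dict[str, Dict[str, str]] = {}
    for workflow in payload.get("remote_workflows", []):
        releases[workflow["name"]] = {
            release["tag_name"]: release["tag_sha"]
            for release in workflow.get("releases", [])
            # Assumption: "dev" is a branch pointer, not a published release
            if release.get("tag_name") != "dev" and release.get("tag_sha")
        }
    return releases


def fetch_pipelines(url: str = PIPELINES_JSON_URL) -> Dict[str, Dict[str, str]]:
    """One HTTP request replaces per-repository GitHub API calls."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return parse_pipelines(json.load(response))
```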

🤖 Generated with [Claude Code](https://claude.ai/code)

Co-Authored-By: Claude <[email protected]>
@edmundmiller
Contributor Author

@mashehu to clarify: right now this doesn't run the deletion unless you trigger it manually, but we can turn that off. It would only tag anything that would get deleted with deleteme.

I added a report action to see what versions we're missing in the bucket:

test_coverage_report_20250822_163155.json
test_coverage_report_20250822_163155.csv

@edmundmiller edmundmiller moved this from In Progress to In Review in nf-core infrastructure projects Sep 9, 2025
Labels
None yet
Projects
Status: In Review
2 participants