marinak-ebi (Contributor) commented Jun 27, 2025

Improvements

Optimisation

  • Log compression now runs as an sbatch job instead of a console command
  • Added a deterministic shuf so that statpacket sizes are evenly distributed
  • Reduced the number of statpackets by increasing the number of records per file
  • Closes #57 (Consider using SSD instead of HDD)
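
The shuffle-then-split idea can be sketched in a few lines. This is only an illustration in Python, not the shell code from this PR: the function names, the seed, the record names and the `records_per_file` values are all hypothetical, and it assumes the input is ordered so that expensive records cluster together.

```python
import random


def deterministic_shuffle(lines, seed=42):
    """Return a shuffled copy of lines; the same seed always gives the same order."""
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    return shuffled


def split_into_files(lines, records_per_file):
    """Split lines into consecutive chunks of at most records_per_file records."""
    return [
        lines[i:i + records_per_file]
        for i in range(0, len(lines), records_per_file)
    ]


# Hypothetical input: "heavy" records sit next to each other before shuffling.
records = [f"heavy_{i}" for i in range(6)] + [f"light_{i}" for i in range(18)]

shuffled = deterministic_shuffle(records, seed=42)

# Larger records_per_file means fewer output files.
small_chunks = split_into_files(shuffled, records_per_file=8)
large_chunks = split_into_files(shuffled, records_per_file=12)

print(len(small_chunks), len(large_chunks))
```

Fixing the seed keeps reruns reproducible, so a failed chunk can be regenerated with exactly the same contents.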

Performance Notes

  • Statistical pipeline execution time:
    • HDD: 3 days 12 hours
    • SSD: 3 days 5 hours (annotation pipeline failed due to timeout)
  • SSD offers faster read/write operations but slower job submission.
  • Decision: Continue using HDD for now.
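
For reference, the gap between the two reported timings can be worked out as below. This only restates the figures above; the SSD run did not finish the annotation pipeline, so the numbers are not a like-for-like comparison.

```python
from datetime import timedelta

# Timings as reported in the performance notes
hdd = timedelta(days=3, hours=12)
ssd = timedelta(days=3, hours=5)

saving = hdd - ssd
saving_hours = saving.total_seconds() / 3600
saving_pct = 100 * (saving / hdd)  # timedelta / timedelta gives a float ratio

print(f"{saving_hours:.0f} h saved ({saving_pct:.1f}%)")
```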

marinak-ebi self-assigned this on Jun 27, 2025
marinak-ebi added the bug and refactor labels on Jun 27, 2025
marinak-ebi changed the title from "Add shuf" to "Restructure the pipeline, optimise logs and performance" on Aug 12, 2025
marinak-ebi requested a review from ficolo on August 12, 2025 12:49
marinak-ebi marked this pull request as ready for review on September 5, 2025 10:16
Labels: bug (Something isn't working), refactor (Restructure code, but not its functionality)
Development

Successfully merging this pull request may close these issues.

  • Fix argument list too long error in orchestration.sh
  • Consider using SSD instead of HDD