Use Slurm array jobs to limit concurrent extraction jobs #335
Conversation
Codecov Report
Attention: Patch coverage is

Additional details and impacted files

@@            Coverage Diff             @@
##           master     #335      +/-   ##
==========================================
+ Coverage   74.81%   75.27%   +0.46%
==========================================
  Files          32       32
  Lines        4892     4959      +67
==========================================
+ Hits         3660     3733      +73
+ Misses       1232     1226       -6

☔ View full report in Codecov by Sentry.
😮💨 no, we can't use array task IDs directly as run numbers, because:
So this will need a rework.
OK, now it creates one script per run in |
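To make that rework concrete, here's a minimal sketch of the one-script-per-run scheme, with a wrapper that dispatches on the array task ID so task IDs no longer have to equal run numbers. The directory layout, script names, and the `extract-run` command are illustrative assumptions, not the PR's actual code:

```python
import subprocess
from pathlib import Path

def submit_per_run_scripts(script_dir: Path, runs: list[int], limit: int = 30) -> None:
    """Write one shell script per run, then submit a single array job whose
    task i runs the i-th script, decoupling task IDs from run numbers."""
    script_dir.mkdir(parents=True, exist_ok=True)
    for i, run in enumerate(runs):
        script = script_dir / f"{i}.sh"
        # 'extract-run' stands in for the real extraction command
        script.write_text(f"#!/bin/bash\nextract-run --run {run}\n")
        script.chmod(0o755)

    # Every array task runs this wrapper, which picks its script by task ID
    wrapper = script_dir / "wrapper.sh"
    wrapper.write_text(
        f'#!/bin/bash\nexec bash "{script_dir}/${{SLURM_ARRAY_TASK_ID}}.sh"\n'
    )
    wrapper.chmod(0o755)

    subprocess.run(
        ["sbatch", f"--array=0-{len(runs) - 1}%{limit}", str(wrapper)],
        check=True,
    )
```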
I'm increasingly worried by this swathe of code that's entirely untested 😅 Maybe in the future we can add some tests that'll only run on Maxwell. Anyway, LGTM!
Good point, I'll try to figure out some ways to test this stuff better. Thanks for the review!
When reprocessing a whole bunch of runs at once, submit them as a Slurm array job, and ask Slurm to limit how many will run at once. This should mitigate the 'database locked' errors, without requiring people to manually batch their jobs and monitor their progress.
The limit is set at 30 concurrent jobs for now. Obviously we can make that configurable.
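For reference, Slurm expresses that cap with a `%` suffix on the `--array` specification. Here's a minimal sketch of such a submission; the helper name and arguments are assumptions, not the PR's actual code:

```python
import subprocess

def submit_throttled_array(job_script: str, n_tasks: int, limit: int = 30) -> str:
    """Submit one Slurm array job; the '%limit' suffix asks Slurm to run
    at most `limit` tasks of this array at any one time."""
    spec = f"0-{n_tasks - 1}%{limit}"
    result = subprocess.run(
        ["sbatch", "--parsable", f"--array={spec}", job_script],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()  # --parsable makes sbatch print just the job ID
```

On the command line this is equivalent to e.g. `sbatch --array=0-99%30 job.sh`.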
I've taken the shortcut of using the array task IDs for the run numbers. This avoids having to define a way to communicate 'array task 1 -> run 53', but it does mean that only the run numbers can vary within an array. So the one job we ask to update the variables table in the DB is submitted separately, and if you do `reprocess all` on a database spanning multiple proposals, each proposal would be submitted separately, raising the effective limit to N * 30 jobs. That's not ideal, but so far it's largely theoretical.

Closes #96.
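As a rough illustration of that shortcut, here's what the job script's entry point could look like; the argument handling and print statement are assumptions, not the actual extraction code:

```python
import os
import sys

def main() -> None:
    # Under the shortcut, the array task ID *is* the run number, so the
    # job script needs no task-to-run lookup table.
    run_number = int(os.environ["SLURM_ARRAY_TASK_ID"])
    proposal = sys.argv[1]  # everything but the run number is fixed per array
    print(f"Extracting proposal {proposal}, run {run_number}")
    # ... call the real extraction code here ...

if __name__ == "__main__":
    main()
```

This works because `sbatch --array` accepts explicit lists as well as ranges (e.g. `--array=53,54,60-70%30`), so arbitrary run numbers can serve directly as task IDs.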