Legacy Scheduler Design Notes
Adrian Edwards edited this page Jun 28, 2026 · 1 revision
Scheduler Design
- Requirements
  - Be able to permanently sideline jobs for dead repos.
  - Periodically review dead repos to see whether they have in fact “come back” or “really exist”. I think this is necessary to replace our “constant rechecking”, which would have to be disabled if we are permanently sidelining jobs.
  - Enable administrator override of the queue to manually put some repos at the top.
    - Put repos at the front of the line.
    - Pause collection on specific repos for a set period of time (i.e., the pause expires on its own, so the admin does not have to remember to come back and release it, which they will too often forget to do).
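The requirements above can be sketched as plain data plus a few helper functions. This is only an illustration under stated assumptions: `RepoJob`, `pause`, `runnable`, `next_jobs`, and `due_for_review` are hypothetical names, not existing Augur code, and ordering by "oldest collection first" is an assumed default.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class RepoJob:
    """Hypothetical per-repo scheduling record (not an existing Augur model)."""
    repo: str
    last_collected: datetime
    sidelined: bool = False                    # dead repo: skipped by normal scheduling
    last_reviewed: Optional[datetime] = None   # last time a sidelined repo was rechecked
    pinned: bool = False                       # admin override: front of the line
    paused_until: Optional[datetime] = None    # pause that expires on its own


def pause(job: RepoJob, now: datetime, duration: timedelta) -> None:
    """Pause collection for a fixed period; no manual release is needed."""
    job.paused_until = now + duration


def runnable(job: RepoJob, now: datetime) -> bool:
    """A job runs unless it is sidelined or its pause has not yet expired."""
    if job.sidelined:
        return False
    return job.paused_until is None or now >= job.paused_until


def next_jobs(jobs: List[RepoJob], now: datetime) -> List[RepoJob]:
    """Pinned repos first, then the repos collected longest ago."""
    ready = [j for j in jobs if runnable(j, now)]
    # False sorts before True, so `not j.pinned` places pinned jobs first.
    return sorted(ready, key=lambda j: (not j.pinned, j.last_collected))


def due_for_review(jobs: List[RepoJob], now: datetime,
                   interval: timedelta) -> List[RepoJob]:
    """Sidelined repos whose periodic 'did it come back?' check is due."""
    return [
        j for j in jobs
        if j.sidelined
        and (j.last_reviewed is None or now - j.last_reviewed >= interval)
    ]
```

Storing an expiry timestamp rather than a boolean "paused" flag is what removes the need for the admin to come back: the pause simply stops matching once `now` passes `paused_until`.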
- Monitoring API
  - Component metric ideas:
    - Flower/Celery task completion rate (and maybe over time too, with a running average?)
    - Postgres checkpoint lag
  - Collection metric ideas:
    - task completion rate by task
    - API key expiry (with censored keys)
    - oldest last collected date for each worker/task?
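A few of the metric ideas above are simple enough to sketch directly. The helpers below are hypothetical (none of these names come from Augur, Flower, or Celery), and the exponential moving average is just one assumed way to get a "running average" of completion rate.

```python
from datetime import datetime
from typing import Dict, List, Optional


def censor_key(key: str, visible: int = 4) -> str:
    """Mask all but the last `visible` characters of an API key."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]


class RunningRate:
    """Exponential moving average of tasks completed per sampling interval."""

    def __init__(self, alpha: float = 0.2) -> None:
        self.alpha = alpha                   # weight given to the newest sample
        self.value: Optional[float] = None   # no data until the first update

    def update(self, completed: int) -> float:
        if self.value is None:
            self.value = float(completed)
        else:
            self.value = self.alpha * completed + (1 - self.alpha) * self.value
        return self.value


def oldest_last_collected(
    stamps_by_task: Dict[str, List[datetime]]
) -> Dict[str, datetime]:
    """For each task, the oldest last-collected timestamp among its repos."""
    return {task: min(stamps) for task, stamps in stamps_by_task.items() if stamps}
```

A monitoring endpoint could return the output of these helpers as JSON; how the raw counts and timestamps would be pulled from Celery or the database is left open here.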
Implementation scheme: possibly some kind of scoring or points-based framework/mental model. Admins could then drill down to see where each point was assigned from (its source) and adjust how many points each task has.
- This might give admins enough control to tune or move Augur's bottlenecks to suit the system it's on. For example, if collection is maxing out the available disk I/O, the admin could tune it to do less of that, or adjust the scheduling mix to include more CPU-, memory-, or network-intensive tasks so the system is as fully utilized as it can be (or intentionally bottleneck on another resource that's cheaper or less problematic).
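One way to picture the points idea is a ledger that records every award along with its source, so a task's score can always be broken down again. This is a minimal sketch under assumptions: `PriorityLedger` and the source labels in the usage note are invented for illustration and do not describe an existing design.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class PriorityLedger:
    """Hypothetical points ledger: every award keeps its source for drill-down."""

    def __init__(self) -> None:
        # task name -> list of (source, points) entries
        self._entries: Dict[str, List[Tuple[str, float]]] = defaultdict(list)

    def award(self, task: str, source: str, points: float) -> None:
        """Add (or, with negative points, subtract) priority from a named source."""
        self._entries[task].append((source, points))

    def score(self, task: str) -> float:
        """Total points for a task; unknown tasks score zero."""
        return sum(points for _, points in self._entries.get(task, []))

    def breakdown(self, task: str) -> Dict[str, float]:
        """Points per source, so an admin can see where the score came from."""
        totals: Dict[str, float] = defaultdict(float)
        for source, points in self._entries.get(task, []):
            totals[source] += points
        return dict(totals)

    def ranked(self) -> List[str]:
        """Task names ordered from highest to lowest total score."""
        return sorted(self._entries, key=self.score, reverse=True)
```

For example, an admin on a disk-bound host might call `ledger.award("facade", "disk_io_penalty", -5)` (both labels made up here) to push disk-heavy work down the ranking, and later inspect `ledger.breakdown("facade")` to see that adjustment listed next to the automatically assigned points.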