Proposal for implementing hold, release, and info methods #521
Hi, and thank you for the PR. I am adding @andre-merzky to the discussion. This makes good sense, and I believe it is within the scope of PSI/J. Hold/release were not part of the initial design because we mostly focused on what we perceived automation to be, namely workflows. As far as I can tell without doing a full review, the code looks clean and follows the existing codebase in style and organization. I think info/JobInfo will likely need some discussion because it overlaps with existing functionality. Specifically, the walltime, the various state transition times, the node list, etc. are already available in other places, although not as nicely aggregated. We would also want to ensure that a potential [...] So I'll add a few research tasks here that we should probably work out. Some likely have obvious answers, but it's probably a good idea to list them anyway.
Let's try to answer these and go from there.
Thank you for your prompt response. I'm happy to be able to discuss this with you. I believe we need to consider how hold and release would be used within workflow automation. At the very least, hold and release are necessary for OOD, but I would also like to investigate whether other workflow tools have similar use cases. I would also like to discuss info/JobInfo. Monitoring not only job status but also real-time job information, such as CPU usage, can be useful for verifying job health. Since some of this information overlaps with what is already available, I think we need to organize it properly. Regarding info: when monitoring has to be done separately from the job submission process, the current system does not seem to provide all the necessary data. Since job schedulers retain submission-time information, I was considering using the scheduler itself as the place to query and update job details. At least for OOD, this was essential. I will look into the research tasks raised.
@andre-merzky can weigh in on this, but I think that hold/release are a reasonable part of interacting with a scheduler and should be included. We'll need to do some work on our end beyond this PR, but that can be done separately.
These are indeed useful. However, real-time CPU usage would be beyond the scope of PSI/J. There are two reasons for this:
Perhaps we could start by listing exactly what information is needed by OOD, and then we can see if there is a way to implement a solution that can be layered on top of PSI/J rather than within.
And thank you for your input and contributions.
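To make the "layered on top of PSI/J rather than within" idea more concrete, here is a minimal sketch of what such a layer could look like. It is not PSI/J code and does not call any PSI/J API: `AggregatedInfo`, `InfoLayer`, `on_status`, and the `query_extras` callable are all hypothetical names chosen for illustration. The assumption is that state changes, timestamps, and node lists arrive from whatever status mechanism already exists, and that any OOD-specific fields come from a separate scheduler query supplied by the caller.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class AggregatedInfo:
    """Hypothetical record collecting data that is otherwise spread out."""
    native_id: str
    state_times: Dict[str, float] = field(default_factory=dict)
    node_list: List[str] = field(default_factory=list)
    extras: Dict[str, str] = field(default_factory=dict)


class InfoLayer:
    """Hypothetical layer that sits outside the job library.

    It is fed status updates by the caller and asks a user-supplied
    function for any scheduler-specific fields it cannot derive itself.
    """

    def __init__(self, query_extras: Callable[[str], Dict[str, str]]) -> None:
        self._query_extras = query_extras
        self._records: Dict[str, AggregatedInfo] = {}

    def on_status(self, native_id: str, state: str, timestamp: float,
                  nodes: Optional[List[str]] = None) -> None:
        # Record when each state was first reported for this job.
        rec = self._records.setdefault(native_id, AggregatedInfo(native_id))
        rec.state_times.setdefault(state, timestamp)
        if nodes:
            rec.node_list = list(nodes)

    def info(self, native_id: str) -> AggregatedInfo:
        # Refresh the scheduler-specific fields on every call.
        rec = self._records[native_id]
        rec.extras = dict(self._query_extras(native_id))
        return rec
```

A layer like this only covers the case where the monitoring process also sees the status updates; the separate-process case raised earlier would still need a scheduler query for everything.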
This is part of our effort to integrate PSI/J into Open OnDemand (URL: https://openondemand.org/), a web portal for HPC systems.
Currently, Open OnDemand maintains an adapter (backend) for each scheduler, leading to increased maintenance costs.
We are planning to utilize PSI/J to abstract different schedulers and create a single adapter for all schedulers supported by PSI/J.
Open OnDemand requires APIs for hold, release, and info operations, in addition to job submission and deletion already supported by PSI/J. We have implemented these methods as follows:
Once this PR is approved, we will update the documentation accordingly. We would appreciate any feedback.
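For readers without the diff at hand, the rough shape of a hold/release/info surface can be sketched as below. This is an illustrative toy only, not the code in this PR and not the PSI/J API: `State`, `JobInfo`, and `InMemoryScheduler` are hypothetical names, the fields of `JobInfo` are placeholders, and the rule that only queued jobs can be held is an assumption made to keep the example small.

```python
from dataclasses import dataclass, replace
from enum import Enum
from typing import Dict, Optional


class State(Enum):
    QUEUED = "queued"
    HELD = "held"
    ACTIVE = "active"
    COMPLETED = "completed"


@dataclass
class JobInfo:
    """Hypothetical aggregate; field names are illustrative only."""
    native_id: str
    state: State
    queue: Optional[str] = None
    wall_time_s: Optional[int] = None


class InMemoryScheduler:
    """Toy stand-in for a batch scheduler (assumption, not a real backend)."""

    def __init__(self) -> None:
        self._jobs: Dict[str, JobInfo] = {}
        self._next_id = 1

    def submit(self, queue: str) -> str:
        native_id = str(self._next_id)
        self._next_id += 1
        self._jobs[native_id] = JobInfo(native_id, State.QUEUED, queue=queue)
        return native_id

    def hold(self, native_id: str) -> None:
        # Assumption: only jobs still waiting in the queue can be held.
        job = self._jobs[native_id]
        if job.state is not State.QUEUED:
            raise ValueError(f"cannot hold job in state {job.state.name}")
        job.state = State.HELD

    def release(self, native_id: str) -> None:
        # Releasing returns a held job to the queue.
        job = self._jobs[native_id]
        if job.state is not State.HELD:
            raise ValueError(f"cannot release job in state {job.state.name}")
        job.state = State.QUEUED

    def info(self, native_id: str) -> JobInfo:
        # Return a copy so callers cannot mutate scheduler state.
        return replace(self._jobs[native_id])
```

In this sketch `info` can be called from any process that knows the native job ID, which is the property OOD needs when monitoring runs separately from submission.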