Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maestro platform #2235

Open
wants to merge 25 commits into
base: master
Choose a base branch
from
Open

Maestro platform #2235

wants to merge 25 commits into from

Conversation

isimo00
Copy link
Contributor

@isimo00 isimo00 commented Mar 18, 2025

  • Write this prototype workflow for docs/tests
  • Try to extend the Slurm platform (create a new class, extend slurm platform, override one function and do something different)
  • Then start checking what you think will be needed for those that want/need to extend a platform like HPE

Closes #2165

Sorry, something went wrong.

@isimo00 isimo00 added this to the 4.1.13 milestone Mar 18, 2025
@isimo00 isimo00 marked this pull request as draft March 18, 2025 14:26
@codecov-commenter
Copy link

codecov-commenter commented Mar 18, 2025

Codecov Report

Attention: Patch coverage is 46.32353% with 73 lines in your changes missing coverage. Please review.

Project coverage is 55.30%. Comparing base (53b2a14) to head (92ca4a2).
Report is 8 commits behind head on master.

Files with missing lines Patch % Lines
autosubmit/platforms/slurmplatform.py 49.10% 51 Missing and 6 partials ⚠️
autosubmit/platforms/slurm_example.py 0.00% 12 Missing ⚠️
autosubmit/job/job_list.py 57.14% 3 Missing ⚠️
autosubmit/autosubmit.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #2235      +/-   ##
==========================================
+ Coverage   53.90%   55.30%   +1.40%     
==========================================
  Files          72       73       +1     
  Lines       17242    17237       -5     
  Branches     3352     3350       -2     
==========================================
+ Hits         9294     9533     +239     
+ Misses       7103     6866     -237     
+ Partials      845      838       -7     
Flag Coverage Δ
fast-tests 55.30% <46.32%> (+1.40%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@VindeeR
Copy link
Contributor

VindeeR commented Mar 26, 2025

@kinow we have added quite a bit of things, skeleton to the MaestroPlatform, we removed some redundant and repetitive explanation on the Documentation and add a new session, albeit mostly a draft for now, of the SlurmPlatform test.

Could you double check what we got so far so we know we are going in the right direction?

@kinow kinow requested a review from dbeltrankyl March 28, 2025 11:21
@kinow
Copy link
Member

kinow commented Mar 28, 2025

Probably won't have time to review this one (and it's still a draft). Can you have a look at this one later, please, @dbeltrankyl ? You don't need to worry about the Maestro part. The idea here is to provide documentation with enough information (and examples, if possible) to be used by developers outside of the BSC to create new platforms, focused on Slurm as the idea for Maestro is to extend it. So you'd have to just take a look if you find that useful for external developers, and if there's anything wrong in the docs/code. 🙇

Ini docstring

fixing SlurmPlatform docstring

Add some descriptions (testing)

Add skeleton of Maestro platform class

adding docstring (#2236)

* adding docstring

* import failling CI/CD

* docstring slurmplatform

* linting + docstring

Fix exit code in autosubmit commands that return bool

4.1.12 recovery is slow (#2237)

* Changelog

* WIP DROPME Use a dev version of config parser

* WIP DROPME Use a dev version of config parser

* rebse

* update asconfigparser version

* removed update_parameters call

---------

Co-authored-by: Bruno P. Kinoshita <[email protected]>

specify python version and update command line

4.1.12 release branch (#2042)

* Changelog

* WIP DROPME Use a dev version of config parser

* WIP DROPME Use a dev version of config parser

* rebse

* update asconfigparser version

* 4.1.12 recovery is slow (#2237)

* Changelog

* WIP DROPME Use a dev version of config parser

* WIP DROPME Use a dev version of config parser

* rebse

* update asconfigparser version

* removed update_parameters call

---------

Co-authored-by: Bruno P. Kinoshita <[email protected]>

* WIP DROPME Use a dev version of config parser

* WIP DROPME Use a dev version of config parser

* rebse

* changelog added

* changelog added

---------

Co-authored-by: Bruno P. Kinoshita <[email protected]>

Add documentation about workflow commit, fix make watch

update eb recipe (#2243)

Update setuptools requirement from <=76.0.0 to <=77.0.3

Updates the requirements on [setuptools](https://github.com/pypa/setuptools) to permit the latest version.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](pypa/setuptools@0.6...v77.0.3)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Update setuptools requirement from <=77.0.3 to <=78.0.2

Updates the requirements on [setuptools](https://github.com/pypa/setuptools) to permit the latest version.
- [Release notes](https://github.com/pypa/setuptools/releases)
- [Changelog](https://github.com/pypa/setuptools/blob/main/NEWS.rst)
- [Commits](pypa/setuptools@0.6...v78.0.2)

---
updated-dependencies:
- dependency-name: setuptools
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Use pytestify to convert unittests to pytest

Clean unused code

Adding slurm_monitor and strategies tests, deleting unused function in slurm_monitor (see BSC-ES/autosubmit-api#175)

Add Docker marker, update CONTRIBUTING docs with information for contributors to run integration+docker tests

Update portalocker requirement from <=2.7.0 to <=3.1.1

Updates the requirements on [portalocker](https://github.com/wolph/portalocker) to permit the latest version.
- [Release notes](https://github.com/wolph/portalocker/releases)
- [Changelog](https://github.com/wolph/portalocker/blob/develop/CHANGELOG.rst)
- [Commits](wolph/portalocker@v0.5.6...v3.1.1)

---
updated-dependencies:
- dependency-name: portalocker
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <[email protected]>

Add DbManager factory function (#2107)

* add DbManager factory function

* add extra typings

* Update autosubmit/job/job_list_persistence.py

Co-authored-by: Luiggi Tenorio <[email protected]>

---------

Co-authored-by: Bruno P. Kinoshita <[email protected]>

verify args to format message (#2113)

Documentation improvement maestro (#2247)

* fixing some minor documentation problems and removing verbosity that would cause confusion

* fixing sphinx identation problem

* Creating a step-by-step of slurm tests + reaorganizing existing documentation

* refining test

* removing file that shouldnt be there

* removing duplicated parameters simplifying description and proportions

* adjusting docstring of the function

improving quality of documentation bettering wording of instruction and giving hints

changes to platform to simulate different behaviours between slurm and maestro

Change file name

Update job.py

revert

Update ecplatform.py

revert

Update paramiko_submitter.py

revert

removing unused imports

fixing instructions

Fix sphyinx local build and example_platform name change

Small fixes

adjusting configure platform

draft of a extending platform

Add some references

extra information where might be needed changes

Clean Configure Experiments page

small fixes document

Move autosubmit create syntax doc

Move autosubmit create docs

Rewrite some snippets

Add permalinks and formatting
@isimo00 isimo00 force-pushed the create-maestro-platform#2165 branch from 1167908 to b9e4eea Compare March 31, 2025 12:21
@isimo00 isimo00 marked this pull request as ready for review March 31, 2025 12:31
@dbeltrankyl
Copy link
Contributor

I'll take a look at this today

Copy link
Contributor

@dbeltrankyl dbeltrankyl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello

Thank you for the work and the improvement in code quality!

I've partially revised it and added some comments; I'll finish it tomorrow. I didn't fully read the RST changes

I have one comment about the example_platform.py,.

Do you plan to rename it to maestro_platform.py at some point? Otherwise I'm not sure if we should maintain the example_platform.py inside the code as it will be unused code. Maybe a better place would be to include it as a snippet in the readthedocs? what do you think?

return SlurmPlatform(self.expid, self.name, self.config)

def get_submit_cmd_x11(self, args, script_name, job):
def get_submit_cmd_x11(self, args: str, script_name: str) -> str:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maestro doesn't need to support this

return int(job_id)
return None

def submit_Script(self, hold: bool=False) -> Union[List[int], int]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name of this function can be confused with the def submit_job one.

This always submits a script that has a bunch of sbatch ... and each line represents one job submission

While the def submit_job

Sometimes, it submits the job directly ( I don't recall the cases; I think that some conditions are not slurm platform or x11 is enabled)

Usually, the slurm platform sends the submission script to the remote and calls later by the submit_script function.

And in other cases, it submits the script

This has to be remade at some point, so we always submit all jobs in batches. The current "solution" was a quick patch due to the necessity of the moment ( before, we were only submitting one by one, and the submission time was a performance bottleneck)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is declared in various platforms besides the slurm one, so I created a separate issue to address this, see #2261

Copy link
Contributor

@dbeltrankyl dbeltrankyl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello!

Thanks for the work!

Continuing the feedback... I still miss some files I'm trying to find time to, apologies for the delay

@dbeltrankyl
Copy link
Contributor

dbeltrankyl commented Apr 3, 2025

Hello!

Thanks for the work!

Continuing the feedback... I still miss some files I'm trying to find time to, apologies for the delay

Oh, I think I don't miss files

One general comment since it is meant for developers:

I miss:

  1. Some explanation of the platform attributes and what they do ( some affect to the header setting, other indicates the scheduler that will be used, some affect the path's etc )

  2. How to obtain the yaml configuration and how to add new platform attributes based on that. ex.

How to read this:

PLATFORMS:
  MARENOSTRUM5:
     maestro_attribute: "something"

And Autosubmit understands that maestro_attribute is a platform attribute, not something the user adds for the templates.

  1. How to trigger your new platform code through the yml configuration.

After the feedback resolve, I'll take another general look and see if we can fit something else

Thanks for the work!

EDIT: Also, I found some typos but I'm not good at finding them. Maybe it is worth revising the doc using Grammarly or another language tool

@VindeeR VindeeR force-pushed the create-maestro-platform#2165 branch from 597dc81 to 743f744 Compare April 3, 2025 15:11

.. warning::
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why did you remove this? @VindeeR :(

@dbeltrankyl
Copy link
Contributor

dbeltrankyl commented Apr 4, 2025

Let me know when you want me to re-revise the .rst files ( maybe @LuiggiTenorioK want also read it and see if it can be followed well )

Copy link
Member

@LuiggiTenorioK LuiggiTenorioK left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems well-structured @VindeeR @isimo00! Congrats!

I didn't go deep into the correctness of the content, but here are some comments to improve it.

@VindeeR
Copy link
Contributor

VindeeR commented Apr 4, 2025

Let me know when you want me to re-revise the .rst files ( maybe @LuiggiTenorioK want also read it and see if it can be followed well )

Aside from the last question I've done I guess it's ready to be reviewed again.

Thanks both @dbeltrankyl @LuiggiTenorioK for the big help reviewing the concepts, structure, and bullet proofing the documentation

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create Maestro Platform based on Slurm
6 participants