-
Notifications
You must be signed in to change notification settings - Fork 0
Simplify plan-workflow maintenance #62
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: pegasus
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Summary of Changes
Hello @lispandfound, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly refactors the 'plan-workflow' utility to enhance its maintainability and adaptability. The core changes involve modularizing the Cylc template structure, implementing a smarter dependency discovery mechanism, and generating more human-readable Cylc flow files. This overhaul aims to streamline workflow generation and facilitate future contributions.
Highlights
- Modular Cylc Templates: The previous host-specific Cylc template structure has been replaced with a single set of templates combined with host-specific environment configurations, reducing redundancy and simplifying updates.
- Dynamic Dependency Discovery: The plan-workflow tool now infers workflow dependencies from required and provided files/configurations, eliminating hardcoded workflow graphs and automatically placing new stages correctly.
- Improved Cylc Flow File Generation: Generated Cylc flow files are now more human-readable, utilizing shell environment variables and Cylc template variables, which simplifies updates when environmental paths change.
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | /gemini review |
Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary |
Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | /gemini help |
Displays a list of available commands. |
Customization
To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
-
Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution. ↩
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This is an excellent and significant refactoring of the plan-workflow utility. The move to a data-driven approach with stages.toml for dependency management and modular, host-specific configurations is a huge improvement for maintainability and extensibility. The use of modern Cylc features like parameterized tasks and Jinja2 templating for generating the workflow files is also a great step forward.
While the overall design is very strong, I've found several critical issues in the new template files that would prevent the generated workflows from running correctly. These seem to be mostly copy-paste errors or oversights during the refactoring. I've detailed them in the specific comments below. Once these are addressed, this will be a fantastic improvement to the project.
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
Co-authored-by: Copilot <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR refactors the plan-workflow utility to simplify maintenance and improve modularity. The goal is to make the tool easier to update as environments change and enable broader contributor participation.
- Unified template structure with one set of cylc templates and host-specific environment configurations
- Dynamic dependency inference from stage requirements and outputs instead of hardcoded dependencies
- Environment variable and template variable usage for better human-readable and maintainable cylc files
Reviewed Changes
Copilot reviewed 66 out of 69 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| workflow/templates/stages/*.cylc | New unified workflow stage templates extending base stage template |
| workflow/templates/stages.toml | Stage definitions with parameters, requirements, and outputs for dependency inference |
| workflow/templates/stage.cylc | Base template for all workflow stages with host environment integration |
| workflow/templates/environments/*.toml | Host-specific environment configurations replacing per-host cylc directories |
| workflow/templates/flow.cylc | Updated main flow template using new parameterized approach |
| workflow/scripts/plan_workflow.py | Complete rewrite implementing dynamic dependency resolution and modular template system |
| pyproject.toml | Removed pyvis dependency |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
This PR refactors the
plan-workflowutility to make it hopefully easier to keep the tool up-to-date and hence usable for the future.The Changes in a Nutshell
The
plan-workflowtool takes in a list of realisations, workflow stages to target (like intensity measure calculation) , workflow stages to exclude (like generating a realisation from GCMT solutions), and produces a cylc flow file that can be run on a target host (like Hypocentre, NeSI, RCH, TACC, KISTI, etc). The goal of this tool is to simplify the generation of workflows for researchers.This tool needs to change a lot because our environments change a lot. It needs to do this in a way that means that everybody can contribute. The previous design of
plan-workflowwas not modular enough, which results in the tool lagging behind workflow changes. To prevent this occurring in the future I have refactored the way the tool loads the modules by:hf-simrequired updating it forNhosts. Most of the only slurm directives or environment variables change between different hosts. Hence, there was a lot of redundant definitions that were tedious to maintain. Now, there is one set of cylc templates and a set of host environments that add the parts that changes (directives, environment, pre-script).plan-workflowsmarter at discovering dependencies. Previously, theplan-workflowscript had the workflow graph baked into the python source code. This meant, in addition to updating the cylc templates, you also needed to potentially change the dependencies inplan_workflow.pypython code. Moreover, it wasn't easy to remember the dependencies between workflow stages. This PR changes the behaviour so that dependencies are inferred from the required and provided files rather than explicit dependencies where possible. The upshot of this is that the workflow graph is created dynamically and new stages are automatically placed in the right spot in the graph.plan-workflownow generates cylc flow files using shell environment variables and cylc template variables. Previously, cylc flow files were essentially compiled output and not intended to be human readable. This meant that if you generate a cylc file and then the environment changed (say, the container path changed), the only way to update the changes would be to do lots of error-prone find and replace. Now cylc files generate fewer stages with cylc template variables, and uses environment variables to simplify stage definitions.Anatomy of a Workflow Plan
Workflows are now composed from two short files and a directory of cylc templates.
The Stage Definitions
Inside
workflow/templates/stages.toml