Skip to content

Conversation

@lispandfound
Copy link
Contributor

This PR refactors the plan-workflow utility to make it hopefully easier to keep the tool up-to-date and hence usable for the future.

The Changes in a Nutshell

The plan-workflow tool takes in a list of realisations, workflow stages to target (like intensity measure calculation) , workflow stages to exclude (like generating a realisation from GCMT solutions), and produces a cylc flow file that can be run on a target host (like Hypocentre, NeSI, RCH, TACC, KISTI, etc). The goal of this tool is to simplify the generation of workflows for researchers.

This tool needs to change a lot because our environments change a lot. It needs to do this in a way that means that everybody can contribute. The previous design of plan-workflow was not modular enough, which results in the tool lagging behind workflow changes. To prevent this occurring in the future I have refactored the way the tool loads the modules by:

  1. Changing the directory structure of the cylc templates. Previously, every host needed its own set of cylc files. Hence, updating the calling procedure for hf-sim required updating it for N hosts. Most of the only slurm directives or environment variables change between different hosts. Hence, there was a lot of redundant definitions that were tedious to maintain. Now, there is one set of cylc templates and a set of host environments that add the parts that changes (directives, environment, pre-script).
  2. Making plan-workflow smarter at discovering dependencies. Previously, the plan-workflow script had the workflow graph baked into the python source code. This meant, in addition to updating the cylc templates, you also needed to potentially change the dependencies in plan_workflow.py python code. Moreover, it wasn't easy to remember the dependencies between workflow stages. This PR changes the behaviour so that dependencies are inferred from the required and provided files rather than explicit dependencies where possible. The upshot of this is that the workflow graph is created dynamically and new stages are automatically placed in the right spot in the graph.
  3. plan-workflow now generates cylc flow files using shell environment variables and cylc template variables. Previously, cylc flow files were essentially compiled output and not intended to be human readable. This meant that if you generate a cylc file and then the environment changed (say, the container path changed), the only way to update the changes would be to do lots of error-prone find and replace. Now cylc files generate fewer stages with cylc template variables, and uses environment variables to simplify stage definitions.

Anatomy of a Workflow Plan

Workflows are now composed from two short files and a directory of cylc templates.

The Stage Definitions

Inside workflow/templates/stages.toml

Copilot AI review requested due to automatic review settings August 12, 2025 02:52
Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary of Changes

Hello @lispandfound, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the 'plan-workflow' utility to enhance its maintainability and adaptability. The core changes involve modularizing the Cylc template structure, implementing a smarter dependency discovery mechanism, and generating more human-readable Cylc flow files. This overhaul aims to streamline workflow generation and facilitate future contributions.

Highlights

  • Modular Cylc Templates: The previous host-specific Cylc template structure has been replaced with a single set of templates combined with host-specific environment configurations, reducing redundancy and simplifying updates.
  • Dynamic Dependency Discovery: The plan-workflow tool now infers workflow dependencies from required and provided files/configurations, eliminating hardcoded workflow graphs and automatically placing new stages correctly.
  • Improved Cylc Flow File Generation: Generated Cylc flow files are now more human-readable, utilizing shell environment variables and Cylc template variables, which simplifies updates when environmental paths change.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in issue comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

This comment was marked as outdated.

Copy link

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This is an excellent and significant refactoring of the plan-workflow utility. The move to a data-driven approach with stages.toml for dependency management and modular, host-specific configurations is a huge improvement for maintainability and extensibility. The use of modern Cylc features like parameterized tasks and Jinja2 templating for generating the workflow files is also a great step forward.

While the overall design is very strong, I've found several critical issues in the new template files that would prevent the generated workflows from running correctly. These seem to be mostly copy-paste errors or oversights during the refactoring. I've detailed them in the specific comments below. Once these are addressed, this will be a fantastic improvement to the project.

This comment was marked as outdated.

@lispandfound lispandfound requested a review from Copilot August 12, 2025 23:20
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR refactors the plan-workflow utility to simplify maintenance and improve modularity. The goal is to make the tool easier to update as environments change and enable broader contributor participation.

  • Unified template structure with one set of cylc templates and host-specific environment configurations
  • Dynamic dependency inference from stage requirements and outputs instead of hardcoded dependencies
  • Environment variable and template variable usage for better human-readable and maintainable cylc files

Reviewed Changes

Copilot reviewed 66 out of 69 changed files in this pull request and generated 1 comment.

Show a summary per file
File Description
workflow/templates/stages/*.cylc New unified workflow stage templates extending base stage template
workflow/templates/stages.toml Stage definitions with parameters, requirements, and outputs for dependency inference
workflow/templates/stage.cylc Base template for all workflow stages with host environment integration
workflow/templates/environments/*.toml Host-specific environment configurations replacing per-host cylc directories
workflow/templates/flow.cylc Updated main flow template using new parameterized approach
workflow/scripts/plan_workflow.py Complete rewrite implementing dynamic dependency resolution and modular template system
pyproject.toml Removed pyvis dependency

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants