Skip to content

Feature: dry run support for deploy command#106

Merged
vasu2856 merged 6 commits intoaws:mainfrom
vasu2856:feature/dry_run
Apr 12, 2026
Merged

Feature: dry run support for deploy command#106
vasu2856 merged 6 commits intoaws:mainfrom
vasu2856:feature/dry_run

Conversation

@vasu2856
Copy link
Copy Markdown
Contributor

feat(dry-run): add deploy dry-run validation engine with comprehensive checkers

  • Add DryRunEngine orchestrating phase-specific validation checkers
  • Implement 11 specialized checkers (bootstrap, bundle, catalog, connectivity, dependency, git, manifest, permission, project, quicksight, storage, workflow)
  • Add PermissionChecker using iam:SimulatePrincipalPolicy for IAM validation
  • Add DependencyChecker validating pre-existing AWS resources and DataZone types
  • Add DryRunReport model collecting findings classified as OK/WARNING/ERROR
  • Integrate dry-run as pre-deployment validation step in deploy command
  • Add --dry-run flag for standalone validation mode
  • Add --skip-validation flag to bypass pre-deployment checks
  • Add --output option for JSON report generation
  • Include comprehensive unit and integration tests for all checkers
  • Add design, requirements, and testing documentation
  • Update CLI and project dependencies for dry-run support

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…e checkers

- Add DryRunEngine orchestrating phase-specific validation checkers
- Implement 11 specialized checkers (bootstrap, bundle, catalog, connectivity, dependency, git, manifest, permission, project, quicksight, storage, workflow)
- Add PermissionChecker using iam:SimulatePrincipalPolicy for IAM validation
- Add DependencyChecker validating pre-existing AWS resources and DataZone types
- Add DryRunReport model collecting findings classified as OK/WARNING/ERROR
- Integrate dry-run as pre-deployment validation step in deploy command
- Add --dry-run flag for standalone validation mode
- Add --skip-validation flag to bypass pre-deployment checks
- Add --output option for JSON report generation
- Include comprehensive unit and integration tests for all checkers
- Add design, requirements, and testing documentation
- Update CLI and project dependencies for dry-run support
@vasu2856 vasu2856 requested review from Shnekit and abaror75 March 25, 2026 01:45
Comment thread .kiro/specs/deploy-dry-run/requirements.md

## Introduction

The Deploy Dry Run feature adds a `--dry-run` option to the existing `smus-cicd deploy` command. When enabled, the CLI walks through every phase of the deployment pipeline — manifest loading, bundle exploration, project initialization, storage deployment, git deployment, catalog import, QuickSight dashboard deployment, workflow creation, and bootstrap actions — without creating, modifying, or deleting any actual resources. It also proactively verifies IAM permissions, S3 bucket accessibility, DataZone domain/project reachability, and catalog asset availability, producing a structured report of what would happen and any issues detected. The goal is to let operators confirm a deployment will succeed before committing to it, avoiding partial deployment failures.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, are we saying

If the --dry-run deploy does not fail, the actual deploy will not fail

?

We might want to stay away from such claims because a lot of things (like notebooks for example) are in customer control and we cannot predict whether these will execute okay.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, though i'm not able to see it anymore. Looks like its updated. Will do a search for it on other docs. Do keep an eye when you review the updated revision.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, my comment was around this phrasing:

The goal is to let operators confirm a deployment will succeed before committing to it,...

The deployment will not 100% succeed even after successful dry-run. We need to make it clear

Comment thread .kiro/specs/deploy-dry-run/requirements.md
1. WHEN a bundle archive path is provided, THE Dry_Run_Engine SHALL open the bundle archive and enumerate all files contained within it.
2. WHEN a bundle archive path is not provided, THE Dry_Run_Engine SHALL attempt to locate the bundle in the `./artifacts` directory using the same resolution logic as the deploy command.
3. THE Dry_Run_Engine SHALL verify that each storage item referenced in the Deployment_Configuration has corresponding files in the bundle or on the local filesystem.
4. THE Dry_Run_Engine SHALL verify that each git item referenced in the Deployment_Configuration has corresponding content in the bundle or is accessible via the configured repository URL.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we still do not have a github workflow for git stuff. We should add it to verify this.

2. WHEN dry-run mode is active, THE Permission_Checker SHALL verify that the current IAM identity has DataZone permissions (`datazone:GetDomain`, `datazone:GetProject`, `datazone:SearchListings`) required for the target domain and project.
3. WHEN the Deployment_Configuration includes catalog assets, THE Permission_Checker SHALL verify that the current IAM identity has catalog import permissions (`datazone:CreateAsset`, `datazone:CreateGlossary`, `datazone:CreateGlossaryTerm`, `datazone:CreateFormType`).
4. WHEN the manifest configures IAM role creation or update, THE Permission_Checker SHALL verify that the current IAM identity has `iam:CreateRole`, `iam:AttachRolePolicy`, and `iam:PutRolePolicy` permissions.
5. WHEN the manifest configures QuickSight dashboard deployment, THE Permission_Checker SHALL verify that the current IAM identity has QuickSight permissions (`quicksight:DescribeDashboard`, `quicksight:CreateDashboard`, `quicksight:UpdateDashboard`).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is actually another QuickSightServiceRole which is used by default to perform dashboard refresh where I faced issues last time: https://github.com/aws/CICD-for-SageMakerUnifiedStudio/tree/main/examples/analytic-workflow/dashboard-glue-quick#quicksight-dataset-refresh-fails

We might want to figure out, if we can set a different role for it in the examples

2. WHEN dry-run mode is active, THE Dry_Run_Engine SHALL simulate storage deployment and report the target S3 bucket, prefix, and file count for each storage item.
3. WHEN dry-run mode is active, THE Dry_Run_Engine SHALL simulate git deployment and report the target connection, repository, and file count for each git item.
4. WHEN dry-run mode is active AND the bundle contains catalog export data, THE Dry_Run_Engine SHALL simulate catalog import and report the count and types of catalog resources that would be created, updated, or deleted.
5. WHEN dry-run mode is active AND the manifest configures QuickSight dashboards, THE Dry_Run_Engine SHALL simulate QuickSight deployment and report which dashboards would be exported and imported.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wondering: do we have any more "special" resources like QuickSight? Does it make sense to create a unique logic for handling it?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Technically, the CLI directly interacts only with Quicksight and MWAA serverless services. I can explore what's needed in addition for MWAA-S


**User Story:** As a DevOps engineer, I want the dry run to verify that target AWS resources are reachable, so that I can detect network or configuration issues before deployment.

#### Acceptance Criteria
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we add necessary connection checks to this as well?

- Add form type status validation in target domain to detect DISABLED form types
- Implement _check_disabled_form_types method to query DataZone API for form type status
- Resolve target_domain_id and target_region in dry-run engine after manifest validation
- Add target_domain_id and target_region fields to DryRunContext for downstream checkers
- Update catalog_import to re-enable DISABLED form types during import via create_form_type upsert
- Add add_policy_grants integration test utility for catalog import workflows
- Enhance cleanup_catalog_resources with improved resource deletion handling
- Emit WARNING findings when form types exist but are DISABLED in target environment
2. WHEN the `--dry-run` flag is not provided, THE CLI SHALL execute the deploy command with its current behavior unchanged, preceded by an automatic dry-run validation step (see Requirement 14).
3. THE CLI SHALL accept the `--dry-run` flag in combination with all existing deploy options (`--manifest`, `--targets`, `--bundle-archive-path`, `--emit-events/--no-events`, `--event-bus-name`).
4. WHEN the `--dry-run` flag is provided, THE CLI SHALL suppress EventBridge event emission regardless of the `--emit-events` setting.
5. THE CLI SHALL accept a `--skip-validation` flag that, when provided, skips the automatic pre-deployment dry-run validation step and proceeds directly to deployment.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this like --force? Maybe we should use that since we already use it for many other commands, though it is less descriptive

Comment thread .kiro/specs/deploy-dry-run/requirements.md
Comment thread .kiro/specs/deploy-dry-run/requirements.md

1. WHEN the deploy command is invoked without `--dry-run` and without `--skip-validation`, THE CLI SHALL automatically execute the Dry_Run_Engine as a pre-deployment validation step before calling the actual deploy logic.
2. IF the pre-deployment dry-run validation produces one or more ERROR findings, THEN THE CLI SHALL abort the deployment, display the Dry_Run_Report, and exit with a non-zero exit code without executing any deployment actions.
3. IF the pre-deployment dry-run validation produces only OK and WARNING findings (zero ERROR findings), THEN THE CLI SHALL log a summary of the validation results and proceed with the actual deployment.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we request a go-ahead from the customer to allow them time to review the summary and the warnings?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if its practical to expect customers to enter input as part of CI/CD deployment. Instead, the option is setup to return error code, if there is a failure expected failing the step.


## Overview

The deploy dry-run feature adds a `--dry-run` flag to the existing `smus-cicd deploy` command. When active, the CLI executes every deployment phase in read-only mode: it loads and validates the manifest, explores the bundle archive, verifies IAM permissions, checks AWS resource reachability, simulates each deployment phase (project init, storage, git, catalog import, QuickSight, workflows, bootstrap actions), validates pre-existing resource dependencies (Glue Data Catalog resources including tables, views, and databases; data sources; form types; asset types), and produces a structured report — all without creating, modifying, or deleting any resources.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be aws-smus-cicd-cli throughout the PR

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah. Mainline merge hadn't happened. It should now all be 'aws-smus-cicd-cli'

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rebase the branch against main, so that we get rid of them all?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i had already rebased it and but still saw a few of them. But they should all be addressed now.

Comment thread .kiro/specs/deploy-dry-run/design.md
Comment thread src/smus_cicd/cli.py Outdated
Comment thread .kiro/specs/deploy-dry-run/design.md
Comment thread .kiro/specs/deploy-dry-run/design.md
Comment thread docs/cli-commands.md Outdated
Comment thread docs/cli-commands.md Outdated
Comment thread docs/github-actions-integration.md Outdated
Comment thread docs/pipeline-deployment-metrics.md Outdated
Comment thread examples/catalog-import-export/app_tests/test_catalog_deployed.py Outdated
Comment thread src/smus_cicd/commands/dry_run/checkers/bootstrap_checker.py Outdated
Comment thread src/smus_cicd/commands/dry_run/checkers/connectivity_checker.py Outdated
# Resolve project_name — may come from config or target_config.
project_name = config.get("project_name")
if not project_name:
project_cfg = getattr(context.target_config, "project", None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is target_config?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

target config is the config of the target in the manifest.

Comment thread src/smus_cicd/commands/dry_run/checkers/connectivity_checker.py Outdated
Comment thread src/smus_cicd/commands/dry_run/checkers/connectivity_checker.py Outdated
Comment thread src/smus_cicd/commands/dry_run/checkers/connectivity_checker.py Outdated
Finding(
severity=Severity.WARNING,
message=(
f"DataZone project '{project_name}' not found "
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can check the manifest and see if the project is getting created through the CLI. If it is, then we are fine. If it is not, we should throw an ERROR. I do not think we need this warning at all

Comment thread src/smus_cicd/commands/dry_run/checkers/connectivity_checker.py Outdated
if not self._db_cache[db_name]:
findings.append(
Finding(
severity=Severity.ERROR,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we marking this one ERROR and the above ones WARNING? Are we advising the customer to have the data prior to deployment or are we blocking the deployment?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are expecting customers to have the assets with the same names.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then I think previous ones should be marked ERRORs as well, no?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup. already fixed.

Comment thread src/smus_cicd/commands/dry_run/checkers/dependency_checker.py Outdated
Comment thread src/smus_cicd/commands/dry_run/checkers/git_checker.py
permissions: Dict[str, List[str]],
dz_arn: str,
) -> None:
"""Add catalog import and grant permissions when catalog assets exist."""
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But it can also contain catalog assets with disabled: true. We should probably include this edge-case

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup. updated this already in the code to make sure we do this, only if disable is not true.

Copy link
Copy Markdown
Contributor

@Shnekit Shnekit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, reviewing. Please take a look at the comments :)

vasu2856 and others added 3 commits April 8, 2026 16:10
…ramework

- Add preflight_checker.py to consolidate pre-deployment validation logic
- Implement comprehensive dry-run validation step in GitHub Actions workflow before test deployment
- Update all dry-run checker modules with improved error handling and validation logic
- Enhance dry-run engine and models to support preflight validation patterns
- Update catalog import/export helpers and integration tests with refined validation
- Correct CLI command references from smus-cli to aws-smus-cicd-cli across documentation and specs
- Add unit tests for new preflight_checker module
- Update requirements and testing guide documentation for catalog import/export feature
- Improve manifest and quickstart documentation with updated command references
- Move DryRunEngine import after describe_command import for logical grouping
- Reformat deploy_command function calls to split arguments across multiple lines
- Improve code readability and maintain consistent import organization
@vasu2856 vasu2856 requested a review from Shnekit April 9, 2026 23:05
bundle_path = find_bundle_file(
"./artifacts",
context.manifest.application_name,
str(context.config.get("region", "")) if context.config else "",
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove defaults altogether and fail when the region is not present? With this, it is going to use empty string as the region, right?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yup. updated the code to remove any region fallbacks and fail if region is not available.

Copy link
Copy Markdown
Contributor

@Shnekit Shnekit left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see there are several comments which are not addressed, but I think they are all minor. Feel free to address them later, if needed

@vasu2856 vasu2856 merged commit aca32a8 into aws:main Apr 12, 2026
3 checks passed
@vasu2856
Copy link
Copy Markdown
Contributor Author

I see there are several comments which are not addressed, but I think they are all minor. Feel free to address them later, if needed

Will catchup offline to understand which comments were not addressed. Will post a separate PR for those.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants