Feature: dry run support for deploy command by vasu2856 · Pull Request #106 · aws/CICD-for-SageMakerUnifiedStudio

vasu2856 · 2026-03-25T01:45:53Z

feat(dry-run): add deploy dry-run validation engine with comprehensive checkers

Add DryRunEngine orchestrating phase-specific validation checkers
Implement 11 specialized checkers (bootstrap, bundle, catalog, connectivity, dependency, git, manifest, permission, project, quicksight, storage, workflow)
Add PermissionChecker using iam:SimulatePrincipalPolicy for IAM validation
Add DependencyChecker validating pre-existing AWS resources and DataZone types
Add DryRunReport model collecting findings classified as OK/WARNING/ERROR
Integrate dry-run as pre-deployment validation step in deploy command
Add --dry-run flag for standalone validation mode
Add --skip-validation flag to bypass pre-deployment checks
Add --output option for JSON report generation
Include comprehensive unit and integration tests for all checkers
Add design, requirements, and testing documentation
Update CLI and project dependencies for dry-run support

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

…e checkers - Add DryRunEngine orchestrating phase-specific validation checkers - Implement 11 specialized checkers (bootstrap, bundle, catalog, connectivity, dependency, git, manifest, permission, project, quicksight, storage, workflow) - Add PermissionChecker using iam:SimulatePrincipalPolicy for IAM validation - Add DependencyChecker validating pre-existing AWS resources and DataZone types - Add DryRunReport model collecting findings classified as OK/WARNING/ERROR - Integrate dry-run as pre-deployment validation step in deploy command - Add --dry-run flag for standalone validation mode - Add --skip-validation flag to bypass pre-deployment checks - Add --output option for JSON report generation - Include comprehensive unit and integration tests for all checkers - Add design, requirements, and testing documentation - Update CLI and project dependencies for dry-run support

Shnekit · 2026-03-27T19:54:06Z

+
+## Introduction
+
+The Deploy Dry Run feature adds a `--dry-run` option to the existing `smus-cicd deploy` command. When enabled, the CLI walks through every phase of the deployment pipeline — manifest loading, bundle exploration, project initialization, storage deployment, git deployment, catalog import, QuickSight dashboard deployment, workflow creation, and bootstrap actions — without creating, modifying, or deleting any actual resources. It also proactively verifies IAM permissions, S3 bucket accessibility, DataZone domain/project reachability, and catalog asset availability, producing a structured report of what would happen and any issues detected. The goal is to let operators confirm a deployment will succeed before committing to it, avoiding partial deployment failures.


Hmmm, are we saying

If the --dry-run deploy does not fail, the actual deploy will not fail

?

We might want to stay away from such claims because a lot of things (like notebooks for example) are in customer control and we cannot predict whether these will execute okay.

Good catch, though i'm not able to see it anymore. Looks like its updated. Will do a search for it on other docs. Do keep an eye when you review the updated revision.

Sorry, my comment was around this phrasing:

The goal is to let operators confirm a deployment will succeed before committing to it,...

The deployment will not 100% succeed even after successful dry-run. We need to make it clear

Shnekit · 2026-03-27T20:05:47Z

+1. WHEN a bundle archive path is provided, THE Dry_Run_Engine SHALL open the bundle archive and enumerate all files contained within it.
+2. WHEN a bundle archive path is not provided, THE Dry_Run_Engine SHALL attempt to locate the bundle in the `./artifacts` directory using the same resolution logic as the deploy command.
+3. THE Dry_Run_Engine SHALL verify that each storage item referenced in the Deployment_Configuration has corresponding files in the bundle or on the local filesystem.
+4. THE Dry_Run_Engine SHALL verify that each git item referenced in the Deployment_Configuration has corresponding content in the bundle or is accessible via the configured repository URL.


I think we still do not have a github workflow for git stuff. We should add it to verify this.

Shnekit · 2026-03-27T20:12:39Z

+2. WHEN dry-run mode is active, THE Permission_Checker SHALL verify that the current IAM identity has DataZone permissions (`datazone:GetDomain`, `datazone:GetProject`, `datazone:SearchListings`) required for the target domain and project.
+3. WHEN the Deployment_Configuration includes catalog assets, THE Permission_Checker SHALL verify that the current IAM identity has catalog import permissions (`datazone:CreateAsset`, `datazone:CreateGlossary`, `datazone:CreateGlossaryTerm`, `datazone:CreateFormType`).
+4. WHEN the manifest configures IAM role creation or update, THE Permission_Checker SHALL verify that the current IAM identity has `iam:CreateRole`, `iam:AttachRolePolicy`, and `iam:PutRolePolicy` permissions.
+5. WHEN the manifest configures QuickSight dashboard deployment, THE Permission_Checker SHALL verify that the current IAM identity has QuickSight permissions (`quicksight:DescribeDashboard`, `quicksight:CreateDashboard`, `quicksight:UpdateDashboard`).


There is actually another QuickSightServiceRole which is used by default to perform dashboard refresh where I faced issues last time: https://github.com/aws/CICD-for-SageMakerUnifiedStudio/tree/main/examples/analytic-workflow/dashboard-glue-quick#quicksight-dataset-refresh-fails

We might want to figure out, if we can set a different role for it in the examples

Shnekit · 2026-03-27T20:31:03Z

+2. WHEN dry-run mode is active, THE Dry_Run_Engine SHALL simulate storage deployment and report the target S3 bucket, prefix, and file count for each storage item.
+3. WHEN dry-run mode is active, THE Dry_Run_Engine SHALL simulate git deployment and report the target connection, repository, and file count for each git item.
+4. WHEN dry-run mode is active AND the bundle contains catalog export data, THE Dry_Run_Engine SHALL simulate catalog import and report the count and types of catalog resources that would be created, updated, or deleted.
+5. WHEN dry-run mode is active AND the manifest configures QuickSight dashboards, THE Dry_Run_Engine SHALL simulate QuickSight deployment and report which dashboards would be exported and imported.


Wondering: do we have any more "special" resources like QuickSight? Does it make sense to create a unique logic for handling it?

Technically, the CLI directly interacts only with Quicksight and MWAA serverless services. I can explore what's needed in addition for MWAA-S

Shnekit · 2026-03-27T20:33:25Z

+
+**User Story:** As a DevOps engineer, I want the dry run to verify that target AWS resources are reachable, so that I can detect network or configuration issues before deployment.
+
+#### Acceptance Criteria


Should we add necessary connection checks to this as well?

- Add form type status validation in target domain to detect DISABLED form types - Implement _check_disabled_form_types method to query DataZone API for form type status - Resolve target_domain_id and target_region in dry-run engine after manifest validation - Add target_domain_id and target_region fields to DryRunContext for downstream checkers - Update catalog_import to re-enable DISABLED form types during import via create_form_type upsert - Add add_policy_grants integration test utility for catalog import workflows - Enhance cleanup_catalog_resources with improved resource deletion handling - Emit WARNING findings when form types exist but are DISABLED in target environment

Shnekit · 2026-04-03T20:26:25Z

+2. WHEN the `--dry-run` flag is not provided, THE CLI SHALL execute the deploy command with its current behavior unchanged, preceded by an automatic dry-run validation step (see Requirement 14).
+3. THE CLI SHALL accept the `--dry-run` flag in combination with all existing deploy options (`--manifest`, `--targets`, `--bundle-archive-path`, `--emit-events/--no-events`, `--event-bus-name`).
+4. WHEN the `--dry-run` flag is provided, THE CLI SHALL suppress EventBridge event emission regardless of the `--emit-events` setting.
+5. THE CLI SHALL accept a `--skip-validation` flag that, when provided, skips the automatic pre-deployment dry-run validation step and proceeds directly to deployment.


Is this like --force? Maybe we should use that since we already use it for many other commands, though it is less descriptive

Shnekit · 2026-04-03T20:32:50Z

+
+1. WHEN the deploy command is invoked without `--dry-run` and without `--skip-validation`, THE CLI SHALL automatically execute the Dry_Run_Engine as a pre-deployment validation step before calling the actual deploy logic.
+2. IF the pre-deployment dry-run validation produces one or more ERROR findings, THEN THE CLI SHALL abort the deployment, display the Dry_Run_Report, and exit with a non-zero exit code without executing any deployment actions.
+3. IF the pre-deployment dry-run validation produces only OK and WARNING findings (zero ERROR findings), THEN THE CLI SHALL log a summary of the validation results and proceed with the actual deployment.


Shall we request a go-ahead from the customer to allow them time to review the summary and the warnings?

Not sure if its practical to expect customers to enter input as part of CI/CD deployment. Instead, the option is setup to return error code, if there is a failure expected failing the step.

Shnekit · 2026-04-03T20:39:39Z

+
+## Overview
+
+The deploy dry-run feature adds a `--dry-run` flag to the existing `smus-cicd deploy` command. When active, the CLI executes every deployment phase in read-only mode: it loads and validates the manifest, explores the bundle archive, verifies IAM permissions, checks AWS resource reachability, simulates each deployment phase (project init, storage, git, catalog import, QuickSight, workflows, bootstrap actions), validates pre-existing resource dependencies (Glue Data Catalog resources including tables, views, and databases; data sources; form types; asset types), and produces a structured report — all without creating, modifying, or deleting any resources.


Should be aws-smus-cicd-cli throughout the PR

yeah. Mainline merge hadn't happened. It should now all be 'aws-smus-cicd-cli'

Can you rebase the branch against main, so that we get rid of them all?

i had already rebased it and but still saw a few of them. But they should all be addressed now.

Shnekit · 2026-04-06T15:21:14Z

+        # Resolve project_name — may come from config or target_config.
+        project_name = config.get("project_name")
+        if not project_name:
+            project_cfg = getattr(context.target_config, "project", None)


What is target_config?

target config is the config of the target in the manifest.

Shnekit · 2026-04-06T15:26:24Z

+                    Finding(
+                        severity=Severity.WARNING,
+                        message=(
+                            f"DataZone project '{project_name}' not found "


We can check the manifest and see if the project is getting created through the CLI. If it is, then we are fine. If it is not, we should throw an ERROR. I do not think we need this warning at all

Shnekit · 2026-04-06T16:24:25Z

+            if not self._db_cache[db_name]:
+                findings.append(
+                    Finding(
+                        severity=Severity.ERROR,


Why are we marking this one ERROR and the above ones WARNING? Are we advising the customer to have the data prior to deployment or are we blocking the deployment?

we are expecting customers to have the assets with the same names.

Then I think previous ones should be marked ERRORs as well, no?

yup. already fixed.

Shnekit · 2026-04-06T20:04:00Z

+        permissions: Dict[str, List[str]],
+        dz_arn: str,
+    ) -> None:
+        """Add catalog import and grant permissions when catalog assets exist."""


But it can also contain catalog assets with disabled: true. We should probably include this edge-case

yup. updated this already in the code to make sure we do this, only if disable is not true.

Shnekit

Done, reviewing. Please take a look at the comments :)

…ramework - Add preflight_checker.py to consolidate pre-deployment validation logic - Implement comprehensive dry-run validation step in GitHub Actions workflow before test deployment - Update all dry-run checker modules with improved error handling and validation logic - Enhance dry-run engine and models to support preflight validation patterns - Update catalog import/export helpers and integration tests with refined validation - Correct CLI command references from smus-cli to aws-smus-cicd-cli across documentation and specs - Add unit tests for new preflight_checker module - Update requirements and testing guide documentation for catalog import/export feature - Improve manifest and quickstart documentation with updated command references

- Move DryRunEngine import after describe_command import for logical grouping - Reformat deploy_command function calls to split arguments across multiple lines - Improve code readability and maintain consistent import organization

Shnekit · 2026-04-10T21:17:48Z

+            bundle_path = find_bundle_file(
+                "./artifacts",
+                context.manifest.application_name,
+                str(context.config.get("region", "")) if context.config else "",


Can we remove defaults altogether and fail when the region is not present? With this, it is going to use empty string as the region, right?

yup. updated the code to remove any region fallbacks and fail if region is not available.

Shnekit

I see there are several comments which are not addressed, but I think they are all minor. Feel free to address them later, if needed

vasu2856 · 2026-04-12T22:09:44Z

I see there are several comments which are not addressed, but I think they are all minor. Feel free to address them later, if needed

Will catchup offline to understand which comments were not addressed. Will post a separate PR for those.

vasu2856 requested review from Shnekit and abaror75 March 25, 2026 01:45

Shnekit reviewed Mar 27, 2026

View reviewed changes

Comment thread .kiro/specs/deploy-dry-run/requirements.md

Shnekit reviewed Mar 27, 2026

View reviewed changes

Comment thread .kiro/specs/deploy-dry-run/requirements.md

Shnekit reviewed Mar 27, 2026

View reviewed changes