feat(gitops-update): manifest-driven topology + anacleto cluster#212
feat(gitops-update): manifest-driven topology + anacleto cluster#212
Conversation
…to support Introduce config/deployment-matrix.yaml as the single source of truth for which apps deploy to which clusters. The workflow now reads the manifest at the same pinned ref as itself (sparse checkout) and resolves the cluster set from app_name. Adds anacleto as the third deployment target. deploy_in_<cluster> inputs become force-off overrides — they subtract clusters from the manifest-resolved set but cannot add a cluster the manifest does not list. This prevents accidental cross-cluster spillover while still allowing emergency containment. Adds src/lint/deployment-matrix composite (Python embedded, follows the composite-schema pattern) that validates schema, app/cluster integrity, duplicates, and orphan apps. Wired into self-pr-validation as a gated job that only runs when config/deployment-matrix.yaml changes. The manifest topology was inferred empirically from the GitOps repo by cross-referencing folder presence with CI commit history — only apps that are real callers of this workflow are included (excludes apps managed manually like underwriter, jd-mock-api, mock-btg-server, control-plane, platform-console, ledger, dockerhub-secret).
WalkthroughSwitches gitops-update to manifest-driven cluster resolution via Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes 🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
🛡️ CodeQL Analysis ResultsLanguages analyzed: Found 2 issue(s): 2 Medium
🔍 View full scan logs | 🛡️ Security tab |
🔍 Lint Analysis
|
…tution Address PR #212 lint failures: 1. Pin all `actions/checkout@v6` occurrences in self-pr-validation.yml to the SHA already used in gitops-update.yml. Required by pinned-actions lint for external (non-LerianStudio) actions. Also clears pre-existing tech debt in this file that surfaced because the new deployment-matrix job touched it. 2. Replace `echo "$RESOLVED" | sed 's/^/ - /'` with a bash `while read` loop in the resolve_clusters step. Fixes shellcheck SC2001 (prefer bash parameter expansion over sed for simple substitutions).
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/gitops-update-workflow.md (1)
199-209:⚠️ Potential issue | 🟡 MinorFinish the terminology migration from
servertocluster.These sections still document the path placeholders, sync naming, and examples in terms of
<server>/ “selected servers”, even though the workflow now resolves clusters from the deployment matrix. That leaves the docs internally inconsistent right where callers look for path and ArgoCD naming rules.As per coding guidelines,
docs/gitops-update-workflow.mdmust update naming/path conventions to use<cluster>in GitOps paths and ArgoCD app naming (not the old single-server<server>terminology).Also applies to: 240-263
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@docs/gitops-update-workflow.md` around lines 199 - 209, The docs still use the old placeholder `<server>` and phrases like “selected servers” in the GitOps path and ArgoCD naming examples; update all occurrences in docs/gitops-update-workflow.md to use `<cluster>` and “selected clusters” instead, e.g. change the path template to gitops/environments/<cluster>/helmfile/applications/<env>/<app_name>/values.yaml and adjust any ArgoCD app naming examples to reference cluster (also apply the same replacements in the later section referenced as lines 240-263); ensure wording and examples are consistent with the deployment matrix terminology and keep the `<env>` values unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/gitops-update.yml:
- Around line 195-198: The command that sets RESOLVED currently silences errors
with "|| true", masking parse/query failures as an empty RESOLVED; remove the
"|| true" fallback so yq/jq errors propagate and fail the job, i.e. change the
RESOLVED assignment (the pipeline using yq -o=json '.' "$MANIFEST" | jq -r --arg
app "$APP_NAME" '.clusters | to_entries[] | select(.value.apps // [] |
index($app)) | .key' | sort -u) to not append "|| true"; ensure the rest of the
script still treats an actual empty RESOLVED as the legitimate “no matching
clusters” case and let the pipeline fail on real yq/jq errors.
In `@config/deployment-matrix.yaml`:
- Around line 1-109: The file was added with a .yaml extension which violates
the repo convention; rename deployment-matrix.yaml to deployment-matrix.yml and
update all references in this PR (workflow inputs, docs, and any helper scripts)
so the new path is used; ensure the renamed manifest still contains the same
top-level keys (version, apps.registry, clusters) and that any CI/workflow that
reads the file (gitops-update.yml or helper script) is updated to point to
config/deployment-matrix.yml before merging.
---
Outside diff comments:
In `@docs/gitops-update-workflow.md`:
- Around line 199-209: The docs still use the old placeholder `<server>` and
phrases like “selected servers” in the GitOps path and ArgoCD naming examples;
update all occurrences in docs/gitops-update-workflow.md to use `<cluster>` and
“selected clusters” instead, e.g. change the path template to
gitops/environments/<cluster>/helmfile/applications/<env>/<app_name>/values.yaml
and adjust any ArgoCD app naming examples to reference cluster (also apply the
same replacements in the later section referenced as lines 240-263); ensure
wording and examples are consistent with the deployment matrix terminology and
keep the `<env>` values unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9dc319dc-e1d1-407d-8f29-4e3e18ff298f
📒 Files selected for processing (8)
.github/workflows/gitops-update.yml.github/workflows/self-pr-validation.ymlconfig/deployment-matrix.yamldocs/gitops-update-workflow.mdsrc/lint/deployment-matrix/README.mdsrc/lint/deployment-matrix/action.ymlsrc/notify/pr-lint-reporter/README.mdsrc/notify/pr-lint-reporter/action.yml
Resolves 6 medium-severity findings from github-advanced-security:
CODE INJECTION (4 findings — actions/code-injection/medium):
- Move `${{ github.workflow_ref }}` to step env: WORKFLOW_REF
- Bonus: replace `echo | sed -E 's|.*@||'` with bash `${VAR##*@}`
- Eliminates injection vectors at lines 106 + 108
- Move resolve_clusters outputs (has_clusters, clusters) to step env:
HAS_CLUSTERS + RESOLVED_SERVERS in apply_tags step
- Move inputs.yaml_key_mappings + inputs.configmap_updates to step env:
MAPPINGS + CONFIGMAP_MAPPINGS
- Replace `${{ env.IS_BETA/RC/PRODUCTION/SANDBOX }}` with direct
`$IS_BETA/...` (already in job-level env, no need to re-interpolate)
- Replace `${{ github.ref }}` with `${GITHUB_REF}` (auto-set by runner)
UNTRUSTED CHECKOUT (2 findings — actions/untrusted-checkout/medium):
- Add `persist-credentials: false` to manifest sparse checkout (read-only,
no credentials needed, never executes code from this checkout)
- Document trust model inline for the GitOps repo checkout (workflow_call
is not triggered by untrusted PRs; inputs.gitops_repository comes from
trusted internal callers; MANAGE_TOKEN is required for the subsequent
commit/push step, so we cannot drop persist-credentials there)
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
♻️ Duplicate comments (1)
.github/workflows/gitops-update.yml (1)
195-198:⚠️ Potential issue | 🟠 MajorDo not swallow manifest/query failures here.
The trailing
|| truestill turns malformed YAML, missing tools, or a brokenjqquery into the same empty-resolution path as “app not registered”, so the workflow exits successfully instead of surfacing a real topology failure.Suggested fix
RESOLVED=$(yq -o=json '.' "$MANIFEST" \ | jq -r --arg app "$APP_NAME" \ '.clusters | to_entries[] | select(.value.apps // [] | index($app)) | .key' \ - | sort -u || true) + | sort -u)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/gitops-update.yml around lines 195 - 198, The RESOLVED assignment is swallowing failures because of the trailing "|| true"; update the RESOLVED logic (the pipeline using yq/jq with MANIFEST and APP_NAME) to detect and propagate real errors instead of converting them to an empty value: remove the "|| true", capture and check the pipeline/command exit status (or enable errexit for the step) and, on error, log the failing command and exit non‑zero so malformed YAML, missing tools, or bad jq queries fail the workflow rather than being treated as “app not registered.”
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/gitops-update.yml:
- Around line 99-119: The code currently parses github.workflow_ref (which
points to the caller) and may checkout the wrong ref; add a required workflow
input named workflow_ref (default callers should pass ${{ github.workflow_ref }}
or their known pinned ref) and replace the sed parsing logic with using that
input directly: stop extracting REF from github.workflow_ref in the Resolve
shared-workflows ref step and instead set ref="${{ inputs.workflow_ref }}" (or
assign it to GITHUB_OUTPUT via steps.shared_ref) so the Checkout deployment
matrix manifest step uses the passed workflow_ref; update the reusable workflow
inputs to include workflow_ref and adjust any references to
steps.shared_ref.outputs.ref or similar to source the new input.
---
Duplicate comments:
In @.github/workflows/gitops-update.yml:
- Around line 195-198: The RESOLVED assignment is swallowing failures because of
the trailing "|| true"; update the RESOLVED logic (the pipeline using yq/jq with
MANIFEST and APP_NAME) to detect and propagate real errors instead of converting
them to an empty value: remove the "|| true", capture and check the
pipeline/command exit status (or enable errexit for the step) and, on error, log
the failing command and exit non‑zero so malformed YAML, missing tools, or bad
jq queries fail the workflow rather than being treated as “app not registered.”
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 5a5313c9-d193-45b6-9ffd-2619d0b198e0
📒 Files selected for processing (2)
.github/workflows/gitops-update.yml.github/workflows/self-pr-validation.yml
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/gitops-update.yml:
- Around line 18-29: The deploy_in_* inputs (deploy_in_firmino,
deploy_in_clotilde, deploy_in_anacleto) were changed to "force-off only"
semantics which can silently break callers who set them to true expecting to add
clusters; update the code and docs to mitigate this by (1) documenting the
semantic change in the migration guide and calling out that apps must be listed
in the deployment matrix to be deployed, and (2) adding runtime warnings in the
deployment input-processing logic that detect when a caller passed
deploy_in_<cluster>: true but the app is not present in that cluster's manifest
(emit a clear warning mentioning the cluster and app name); implement the
warning near the existing validation that currently warns only when an app is in
no cluster so it reuses the same validation flow and message style.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 2b5758c8-abf1-471d-ab63-9475e540737a
📒 Files selected for processing (1)
.github/workflows/gitops-update.yml
1. [CRITICAL] Replace `github.workflow_ref` with `github.job_workflow_sha` for manifest checkout. In reusable workflows, github.workflow_ref points to the CALLER's workflow file/ref, not the called reusable workflow — my previous design would have failed for every external caller. `job_workflow_sha` is the commit SHA of the running reusable workflow, which is exactly what we need. Bonus: SHA is more secure than textual ref, and removes the need for the `Resolve shared-workflows ref` step entirely (−18 lines). 2. [HIGH] Remove `|| true` from the RESOLVED pipeline. Silenced yq/jq failures would collapse into the "app not registered" warning path, hiding real manifest/query errors. Now fails fast on parse errors; empty RESOLVED from a successful query remains the legitimate "no matching clusters" case (handled explicitly below). 3. [MEDIUM] Rename config/deployment-matrix.yaml → .yml to match the repo convention (77 .yml files vs 2 .yaml). Updated all references: workflow input default, self-pr-validation gate, composite default, README docs, and the workflow doc. 4. [LOW] Add prominent migration callout to docs about deploy_in_* semantic change — apps must be in the manifest; inputs only subtract. Declined: per-cluster warning when deploy_in_<cluster>: true but app is absent from that cluster's manifest list. Inputs default to true, so this would fire for every app missing from any cluster on every run — noise without signal. Existing "app in zero clusters" warning already covers the actionable case.
actionlint v1.7.x (pinned via raven-actions/actionlint@v2.1.2) does not
yet include `github.job_workflow_sha` in its GitHub context schema,
triggering a false-positive "property not defined" error on the previous
direct reference.
Replace the inline `${{ github.job_workflow_sha }}` expression with an
intermediate step that reads the equivalent auto-set env var
GITHUB_JOB_WORKFLOW_SHA and exports it as a step output. Functionally
identical (the runner populates both from the same source) but the
`steps.X.outputs.Y` expression is recognized by actionlint.
Also adds a defensive guard that fails fast if GITHUB_JOB_WORKFLOW_SHA
is empty — which would mean the workflow is being called outside a
reusable-workflow context, catching that misconfiguration loudly.
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
.github/workflows/gitops-update.yml (2)
7-33:⚠️ Potential issue | 🟠 MajorAdd the required
dry_runinput before merging.This reusable workflow still mutates the GitOps repo and can trigger ArgoCD sync, but
on.workflow_call.inputsdoes not expose adry_runboolean. That leaves callers without the required preview path for the new manifest-driven rollout. As per coding guidelines, "Every reusable workflow must supportworkflow_calltrigger, expose explicitinputs, and always include adry_runinput (type: boolean,default: false)."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/gitops-update.yml around lines 7 - 33, The workflow is missing the required dry_run input under workflow_call.inputs; add a new boolean input named dry_run with type: boolean and default: false alongside the existing inputs (e.g., near gitops_repository, app_name, deploy_in_firmino, etc.) so callers can opt into preview mode; ensure the input key is exactly dry_run and its default is false to satisfy the reusable-workflow guideline and avoid mutating the GitOps repo when callers request a dry run.
71-83:⚠️ Potential issue | 🟠 MajorDeclare an explicit workflow-level
permissions:block.
permissionsis still undeclared, so this workflow inherits repository defaults. For a shared reusable workflow that checks out repositories and downloads artifacts, that is broader than the repo baseline and should be pinned to least privilege explicitly. As per coding guidelines, "Always declare explicitpermissions:block at workflow level; default tocontents: read; only escalate per-job when needed."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/gitops-update.yml around lines 71 - 83, Add an explicit workflow-level permissions block to the reusable workflow so it no longer inherits repo defaults: declare permissions: contents: read (and any other least-privilege scopes required) at the top-level of the workflow file that contains the update_gitops job, then if a specific step or job (e.g., update_gitops job, steps: checkout or artifact download) needs extra scopes, escalate only within that job via a per-job permissions override; ensure the workflow-level block is present before jobs: and limits privileges to the minimum.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/gitops-update.yml:
- Around line 215-220: The current "no-cluster" branches set has_clusters=false
in the resolve step (writing to GITHUB_OUTPUT) but downstream artifact
download/validation and update/commit steps still run and can fail before the
no-op is honored; update the workflow so any steps that download artifacts,
validate, or perform updates (the artifact/download/validate and commit steps)
are conditional on steps.resolve_clusters.outputs.has_clusters == 'true' (or
move an early conditional/if that exits the job immediately when RESOLVED is
empty or has_clusters == 'false'), and ensure the resolve step writes the output
key has_clusters consistently (GITHUB_OUTPUT) so the later if conditions (e.g.,
uses of steps.resolve_clusters.outputs.has_clusters) can gate those steps and
prevent them from running when an app is not in any cluster.
---
Outside diff comments:
In @.github/workflows/gitops-update.yml:
- Around line 7-33: The workflow is missing the required dry_run input under
workflow_call.inputs; add a new boolean input named dry_run with type: boolean
and default: false alongside the existing inputs (e.g., near gitops_repository,
app_name, deploy_in_firmino, etc.) so callers can opt into preview mode; ensure
the input key is exactly dry_run and its default is false to satisfy the
reusable-workflow guideline and avoid mutating the GitOps repo when callers
request a dry run.
- Around line 71-83: Add an explicit workflow-level permissions block to the
reusable workflow so it no longer inherits repo defaults: declare permissions:
contents: read (and any other least-privilege scopes required) at the top-level
of the workflow file that contains the update_gitops job, then if a specific
step or job (e.g., update_gitops job, steps: checkout or artifact download)
needs extra scopes, escalate only within that job via a per-job permissions
override; ensure the workflow-level block is present before jobs: and limits
privileges to the minimum.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 9dbe78ac-275b-42a0-a723-793548e8c828
📒 Files selected for processing (1)
.github/workflows/gitops-update.yml
…ssuming auto env var GITHUB_JOB_WORKFLOW_SHA is not exposed automatically by the runner. The github.job_workflow_sha context must be mapped explicitly through the step's env: block like any other context value. Prior implementation relied on a nonexistent auto env var and failed with 'is this job really running as part of a reusable workflow?' on every execution. Validated against real run: https://github.com/LerianStudio/plugin-br-pix-indirect-btg/actions/runs/24458387402/job/71466177318
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
.github/workflows/gitops-update.yml (1)
91-103:⚠️ Potential issue | 🟠 MajorAdd repository allowlist guard before checkout.
inputs.gitops_repositoryremains completely caller-controlled. The workflow passessecrets.MANAGE_TOKENdirectly toactions/checkout, which (per v6 defaults) persists credentials for subsequentgit pushoperations in the same job. Without a validation guard, any caller can direct this privileged token to arbitrary repositories.Enforce an allowlist of permitted GitOps repositories before the checkout step. The suggested guard pattern (case statement validating against
LerianStudio/midaz-firmino-gitops|LerianStudio/midaz-clotilde-gitops|LerianStudio/midaz-anacleto-gitops) blocks this privilege escalation.Suggested guard
+ - name: Validate GitOps repository target + shell: bash + env: + GITOPS_REPOSITORY: ${{ inputs.gitops_repository }} + run: | + set -euo pipefail + case "$GITOPS_REPOSITORY" in + LerianStudio/midaz-firmino-gitops|LerianStudio/midaz-clotilde-gitops|LerianStudio/midaz-anacleto-gitops) + ;; + *) + echo "::error::Unsupported gitops_repository: $GITOPS_REPOSITORY" + exit 1 + ;; + esac + - name: Checkout GitOps Repository uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6 with: repository: ${{ inputs.gitops_repository }}🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/gitops-update.yml around lines 91 - 103, Add a repository allowlist check before the "Checkout GitOps Repository" step that validates inputs.gitops_repository against the approved list (e.g., LerianStudio/midaz-firmino-gitops|LerianStudio/midaz-clotilde-gitops|LerianStudio/midaz-anacleto-gitops); implement this as a step that uses a shell `case` or `if` to compare ${{ inputs.gitops_repository }} and fail the workflow (exit non‑zero) when it is not on the allowlist so the subsequent actions/checkout step (which uses token: ${{ secrets.MANAGE_TOKEN }}) never runs for unapproved repositories. Ensure the step name and condition reference inputs.gitops_repository and that the checkout step remains unchanged other than being gated by the new validation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/gitops-update.yml:
- Around line 30-33: Remove the public input "deployment_matrix_file" so callers
cannot override the topology source; instead hard-code or load the canonical
path "config/deployment-matrix.yml" (from the pinned github.job_workflow_sha)
and perform sparse-checkout only for that exact file into
"shared-workflows/...". If you must accept overrides, replace the free-form
input "deployment_matrix_file" with a strict allowlist validation function that
rejects meta-patterns (globs, ../, absolute paths) and only permits enumerated
filenames, and ensure the checkout step uses the validated/whitelisted filename
rather than concatenating unchecked input.
---
Outside diff comments:
In @.github/workflows/gitops-update.yml:
- Around line 91-103: Add a repository allowlist check before the "Checkout
GitOps Repository" step that validates inputs.gitops_repository against the
approved list (e.g.,
LerianStudio/midaz-firmino-gitops|LerianStudio/midaz-clotilde-gitops|LerianStudio/midaz-anacleto-gitops);
implement this as a step that uses a shell `case` or `if` to compare ${{
inputs.gitops_repository }} and fail the workflow (exit non‑zero) when it is not
on the allowlist so the subsequent actions/checkout step (which uses token: ${{
secrets.MANAGE_TOKEN }}) never runs for unapproved repositories. Ensure the step
name and condition reference inputs.gitops_repository and that the checkout step
remains unchanged other than being gated by the new validation.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 6ece0172-5adc-4f1d-ad4d-de207071971b
📒 Files selected for processing (1)
.github/workflows/gitops-update.yml
Drops the 'Resolve reusable workflow SHA' step entirely — github.job_workflow_sha is empty when evaluated inside a job of a reusable workflow invoked via jobs.X.uses (empirically confirmed on run 24461037331). Three prior attempts to source that SHA all failed for different reasons: - parsing github.workflow_ref: points to the caller, not the reusable - GITHUB_JOB_WORKFLOW_SHA env var: does not exist - github.job_workflow_sha context: empty in this evaluation context This commit is a TEMP workaround for end-to-end validation: manifest checkout is hardcoded to the feature branch. Before merging #212 this will be replaced with a proper 'deployment_matrix_ref' input (default 'main').
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/gitops-update.yml:
- Around line 104-118: The workflow hardcodes the checkout ref to
feat/gitops-deployment-matrix-anacleto which will break post-merge; add a
workflow input deployment_matrix_ref with default "main" and replace the
hardcoded ref in the actions/checkout step (the step using
actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd and path:
shared-workflows) to use ${{ inputs.deployment_matrix_ref }}; alternatively, if
github.job_workflow_sha is now reliable, set the ref to that as a fallback
(e.g., prefer inputs.deployment_matrix_ref and fall back to
github.job_workflow_sha).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 1f52bd39-4857-47d2-8aee-2c7b01277cc7
📒 Files selected for processing (1)
.github/workflows/gitops-update.yml
… external action The LerianStudio/github-actions-argocd-sync action suppresses stderr via '> /dev/null 2>&1' on every CLI invocation. Any failure (auth, permission, network, malformed URL, expired token) is rendered indistinguishable from 'app does not exist' and skipped silently when skip-if-not-exists=true. Replaces the external action with inline argocd CLI calls that surface the real error output. Preserves the skip-if-not-exists semantics (warn + exit 0 on app get failure), but syncs fail the job loudly.
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
.github/workflows/gitops-update.yml (5)
7-8:⚠️ Potential issue | 🟠 MajorMissing explicit
permissions:block.Workflow-level
permissionsis required per coding guidelines. This workflow writes to GitOps repo and triggers ArgoCD sync but currently inherits caller permissions without bounds.Suggested fix
on: workflow_call: + +permissions: + contents: readAs per coding guidelines: "Always declare explicit
permissions:block at workflow level; default tocontents: read; only escalate per-job when needed."🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/gitops-update.yml around lines 7 - 8, Add an explicit workflow-level permissions block to the top-level workflow that currently begins with "on: workflow_call"; declare at minimum "contents: read" and only escalate per-job where needed (e.g., add "permissions:" followed by "contents: read" and any additional scoped permissions required for GitOps repo writes or ArgoCD triggers). Update the workflow that defines on: workflow_call to include this permissions block so the workflow no longer inherits caller permissions and only grants the minimal required scopes.
636-647:⚠️ Potential issue | 🔴 CriticalLocal path
uses: ./.github/workflows/slack-notify.ymlwill fail for external callers.When an external repo calls
gitops-update.yml, theuses: ./.github/workflows/slack-notify.ymlresolves to the caller's workspace, notgithub-actions-shared-workflows. The notify job will fail with "workflow not found" for every external caller.Use an absolute external ref pinned to the same mechanism as other cross-repo calls (e.g., a
deployment_matrix_refinput or hardcode to a stable ref for now).Suggested fix
notify: name: Notify needs: [update_gitops, argocd_sync] if: always() - uses: ./.github/workflows/slack-notify.yml + uses: LerianStudio/github-actions-shared-workflows/.github/workflows/slack-notify.yml@develop with:Based on learnings: "In LerianStudio/github-actions-shared-workflows, reusable workflows called from external repositories must reference them via an absolute external ref."
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/gitops-update.yml around lines 636 - 647, The notify job's reusable workflow reference (uses: ./.github/workflows/slack-notify.yml) will break for external callers; change the uses value to an absolute repo ref (e.g., uses: LerianStudio/github-actions-shared-workflows/.github/workflows/slack-notify.yml@${{ inputs.deployment_matrix_ref }} or a pinned tag/sha) so callers resolve the workflow in this repo rather than the caller's workspace; update the workflow inputs (or add an inputs.deployment_matrix_ref default) and replace the local relative path in the notify job with that external ref (keep the same with: and secrets: entries).
8-70:⚠️ Potential issue | 🟠 MajorMissing
dry_runinput for state-changing workflow.This workflow unconditionally commits/pushes to the GitOps repository and triggers ArgoCD sync. Per coding guidelines, every reusable workflow that applies state changes must include a
dry_runinput (boolean, default: false). When enabled, it should output verbose diagnostics and skip actual writes.Suggested addition
configmap_updates: description: 'JSON object mapping artifact names to configmap keys (e.g., {"pix.tag": ".pix.configmap.VERSION"})' type: string required: false + dry_run: + description: 'When true, skip commit/push and ArgoCD sync; output verbose diagnostics only.' + type: boolean + default: falseThen guard the commit/push and ArgoCD sync steps with
if: ${{ !inputs.dry_run }}.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/gitops-update.yml around lines 8 - 70, The workflow is missing a required dry_run input which should allow running without state changes; add a boolean input named dry_run (default: false) to the workflow_call inputs and update the commit/push and ArgoCD sync steps so they only execute when dry_run is false (use if: ${{ !inputs.dry_run }}). Also when dry_run is true, ensure the job emits verbose diagnostics/logging instead of performing writes—e.g., enhance the steps that perform git commit/push and the ArgoCD sync step (look for steps referencing commit/push and the ArgoCD sync action) to skip writes and print detailed output when inputs.dry_run is true.
151-161: 🧹 Nitpick | 🔵 TrivialSame pattern: pass
yq_versionthrough env.- name: Install yq shell: bash + env: + YQ_VERSION: ${{ inputs.yq_version }} run: | set -euo pipefail - YQ_VERSION="${{ inputs.yq_version }}" YQ_BIN="yq_linux_amd64"🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/gitops-update.yml around lines 151 - 161, The step currently reads the version from the workflow input variable inputs.yq_version; change it to read from an environment variable instead so callers can pass yq_version via env. Update the script to use YQ_VERSION from the environment (reference YQ_VERSION, YQ_BIN, YQ_URL) e.g. read YQ_VERSION="${{ env.yq_version }}" and ensure the job or workflow step defines env: yq_version: ${{ inputs.yq_version }} (or other env source) so the install step uses process environment rather than directly referencing inputs.yq_version.
120-150: 🧹 Nitpick | 🔵 TrivialPrefer env vars over direct
${{ inputs.* }}interpolation in shell.Lines 125, 128, 134, 137, 143, 146 interpolate
inputs.app_name,inputs.commit_message_prefix, andinputs.artifact_patterndirectly in therun:block. While these areworkflow_callinputs from trusted callers (not untrusted PR/issue content), the defensive pattern is to pass throughenv:like theresolve_clustersstep does.Suggested refactor
- name: Setup application name and paths id: setup shell: bash + env: + INPUT_APP_NAME: ${{ inputs.app_name }} + INPUT_COMMIT_PREFIX: ${{ inputs.commit_message_prefix }} + INPUT_ARTIFACT_PATTERN: ${{ inputs.artifact_pattern }} run: | # Extract app name from repository if not provided - if [[ -z "${{ inputs.app_name }}" ]]; then + if [[ -z "$INPUT_APP_NAME" ]]; then APP_NAME="${GITHUB_REPOSITORY##*/}" else - APP_NAME="${{ inputs.app_name }}" + APP_NAME="$INPUT_APP_NAME" fi # ... similar for other inputs🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/gitops-update.yml around lines 120 - 150, The run block in the setup step uses direct `${{ inputs.* }}` interpolation inside the shell (for inputs.app_name, inputs.commit_message_prefix, inputs.artifact_pattern); change this to expose those workflow_call inputs via environment variables and then reference the env vars (e.g., APP_NAME, COMMIT_PREFIX, ARTIFACT_PATTERN) inside the run script (similar to how resolve_clusters handles inputs), so update the step to add an env: mapping from inputs to APP_NAME/COMMIT_PREFIX/ARTIFACT_PATTERN and replace all `${{ inputs.* }}` occurrences in the shell logic with the corresponding `$APP_NAME`, `$COMMIT_PREFIX`, `$ARTIFACT_PATTERN` variables.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/gitops-update.yml:
- Line 602: The ArgoCD sync step currently uses "set -uo pipefail" without "-e",
causing inconsistency with other steps; either add a short inline comment near
the "set -uo pipefail" invocation explaining that "-e" is intentionally omitted
because the script uses explicit "if ! ...; then ...; fi" guards and manual
exits, or include "-e" (change to "set -euo pipefail") and remove/adjust
redundant manual exit handling; update the ArgoCD sync step where "set -uo
pipefail" appears to make one of these two approaches explicit to maintain
consistency and clarity.
---
Outside diff comments:
In @.github/workflows/gitops-update.yml:
- Around line 7-8: Add an explicit workflow-level permissions block to the
top-level workflow that currently begins with "on: workflow_call"; declare at
minimum "contents: read" and only escalate per-job where needed (e.g., add
"permissions:" followed by "contents: read" and any additional scoped
permissions required for GitOps repo writes or ArgoCD triggers). Update the
workflow that defines on: workflow_call to include this permissions block so the
workflow no longer inherits caller permissions and only grants the minimal
required scopes.
- Around line 636-647: The notify job's reusable workflow reference (uses:
./.github/workflows/slack-notify.yml) will break for external callers; change
the uses value to an absolute repo ref (e.g., uses:
LerianStudio/github-actions-shared-workflows/.github/workflows/slack-notify.yml@${{
inputs.deployment_matrix_ref }} or a pinned tag/sha) so callers resolve the
workflow in this repo rather than the caller's workspace; update the workflow
inputs (or add an inputs.deployment_matrix_ref default) and replace the local
relative path in the notify job with that external ref (keep the same with: and
secrets: entries).
- Around line 8-70: The workflow is missing a required dry_run input which
should allow running without state changes; add a boolean input named dry_run
(default: false) to the workflow_call inputs and update the commit/push and
ArgoCD sync steps so they only execute when dry_run is false (use if: ${{
!inputs.dry_run }}). Also when dry_run is true, ensure the job emits verbose
diagnostics/logging instead of performing writes—e.g., enhance the steps that
perform git commit/push and the ArgoCD sync step (look for steps referencing
commit/push and the ArgoCD sync action) to skip writes and print detailed output
when inputs.dry_run is true.
- Around line 151-161: The step currently reads the version from the workflow
input variable inputs.yq_version; change it to read from an environment variable
instead so callers can pass yq_version via env. Update the script to use
YQ_VERSION from the environment (reference YQ_VERSION, YQ_BIN, YQ_URL) e.g. read
YQ_VERSION="${{ env.yq_version }}" and ensure the job or workflow step defines
env: yq_version: ${{ inputs.yq_version }} (or other env source) so the install
step uses process environment rather than directly referencing
inputs.yq_version.
- Around line 120-150: The run block in the setup step uses direct `${{ inputs.*
}}` interpolation inside the shell (for inputs.app_name,
inputs.commit_message_prefix, inputs.artifact_pattern); change this to expose
those workflow_call inputs via environment variables and then reference the env
vars (e.g., APP_NAME, COMMIT_PREFIX, ARTIFACT_PATTERN) inside the run script
(similar to how resolve_clusters handles inputs), so update the step to add an
env: mapping from inputs to APP_NAME/COMMIT_PREFIX/ARTIFACT_PATTERN and replace
all `${{ inputs.* }}` occurrences in the shell logic with the corresponding
`$APP_NAME`, `$COMMIT_PREFIX`, `$ARTIFACT_PATTERN` variables.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 20c8e12e-3a55-4673-9c02-54fbf2e1bc87
📒 Files selected for processing (1)
.github/workflows/gitops-update.yml
…to validate resolution Temporary change for end-to-end testing of the manifest-driven gitops pipeline on PR #212. Expected behavior on next beta of plugin-br-pix-indirect-btg: - resolve_clusters: {firmino, anacleto} (clotilde dropped) - values.yaml updated only in firmino/dev and anacleto/dev - argocd_sync fan-out: 2 jobs (firmino-*-dev, anacleto-*-dev) Revert this commit before merging #212.
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@config/deployment-matrix.yml`:
- Around line 76-94: The deployment-matrix.yml entry for cluster "clotilde" is
missing the app "plugin-br-pix-indirect-btg" that was temporarily removed for
testing; restore it by adding "plugin-br-pix-indirect-btg" back into the
clotilde.apps list (matching how it's present under "firmino" and "anacleto") so
the cluster definitions are consistent before merging.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 11b1a77e-f142-488b-bc29-13379b02e0f6
📒 Files selected for processing (1)
config/deployment-matrix.yml
…d restore matrix - Adds deployment_matrix_ref input (default 'main'). Callers on pinned tags get the latest manifest automatically; test runs can override via the input without editing the workflow. - Drops the temporary hardcoded ref to the feature branch. - Restores plugin-br-pix-indirect-btg in the clotilde cluster (removed temporarily during exclusion-validation test). End-to-end validation completed against plugin-br-pix-indirect-btg: - v1.5.2-beta.9: full fan-out to firmino + clotilde + anacleto, sync OK - v1.5.2-beta.10: manifest exclusion respected (firmino + anacleto only)
…anges - New 'deployment-matrix' label auto-applied by the labeler on PRs that touch config/deployment-matrix.yml. - config/deployment-matrix.yml added to self-release.yml paths-ignore: since callers resolve the manifest from main at runtime (via the deployment_matrix_ref input with default 'main'), matrix-only changes propagate to all callers without requiring a new release tag. - Mixed commits that touch the matrix plus workflow/action code still trigger a release as usual.
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/self-release.yml:
- Line 19: The paths-ignore entry currently lists 'config/deployment-matrix.yml'
which misses the '.yaml' canonical name; update the paths-ignore filter in the
workflow so it either includes both 'config/deployment-matrix.yml' and
'config/deployment-matrix.yaml' or replace the existing entry with the exact
canonical filename ('config/deployment-matrix.yaml') to ensure matrix-only
changes are correctly ignored; locate the paths-ignore array in the self-release
workflow and apply the change to the string entries accordingly.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: ASSERTIVE
Plan: Pro
Run ID: 4a2f825d-5f73-4880-9477-946e4511bf67
📒 Files selected for processing (3)
.github/labeler.yml.github/labels.yml.github/workflows/self-release.yml
Description
Introduces
config/deployment-matrix.yamlas the single source of truth for which apps deploy to which Kubernetes clusters via thegitops-update.ymlreusable workflow. Adds anacleto as the third deployment target alongside firmino and clotilde.The workflow now:
app_nameby scanningclusters.<name>.apps.deploy_in_<cluster>inputs as force-off overrides — they can subtract clusters from the resolved set but cannot add a cluster the manifest doesn't list. Prevents accidental cross-cluster spillover while still allowing emergency containment.A new
src/lint/deployment-matrixcomposite validates the manifest schema, app/cluster integrity, and duplicates on every PR that touchesconfig/deployment-matrix.yaml(gated job inself-pr-validation.yml, follows the same Python-embedded pattern ascomposite-schema).The manifest topology was inferred empirically from the
midaz-firmino-gitopsrepo by cross-referencing folder presence with CI commit history (ci(<app>): update image tags), so only real workflow callers are included — manually-managed apps likeunderwriter,jd-mock-api,mock-btg-server,control-plane,platform-console,ledger,dockerhub-secretare excluded.Cluster membership today (18 apps):
firmino: 12 apps (core platform + plugins + plugin-br-bank-transfer)clotilde: 17 apps (core + plugins + plugin-br-bank-transfer + Lerian platform suite: backoffice-console, cs-platform, forge, lerian-map, tenant-manager)anacleto: 12 apps (core + plugins + plugin-br-bank-transfer-jd)Files affected:
.github/workflows/gitops-update.yml— manifest loader, resolve_clusters step, force-off semanticsconfig/deployment-matrix.yaml— new manifest (source of truth)src/lint/deployment-matrix/{action.yml,README.md}— new composite validator.github/workflows/self-pr-validation.yml— wires the new lint jobsrc/notify/pr-lint-reporter/{action.yml,README.md}— adds deployment-matrix result row to the PR commentdocs/gitops-update-workflow.md— new "Deployment Matrix" section, updated examples, migration guideType of Change
feat: New workflow or new input/output/step in an existing workflowfix: Bug fix in a workflowperf: Performance improvementrefactor: Internal restructuring with no behavior changedocs: Documentation onlyci: Changes to self-CIchore: Dependency bumps, config updates, maintenancetest: Adding or updating testsBREAKING CHANGE: Callers must update their configuration after this PRBreaking Changes
None. This is fully backward-compatible:
deploy_in_firmino: true, deploy_in_clotilde: truecontinue working — the inputs are now force-off (true is no-op, false subtracts).Testing
yaml.safe_load)plugin-br-bank-transfervsplugin-br-bank-transfer-jdedge case (caught a critical bug:jq contains([$app])does substring matching on strings — switched toindex($app)for exact equality)@developor the beta tagCaller repo / workflow run: Pending real release after merge.
Related Issues
Closes #
Summary by CodeRabbit
New Features
CI
Documentation
Behavior
Chores