feat: Improve ArgoCD sync action resilience (async sync + error visibility)

### Request Type

Performance improvement

### Affected Workflow (if applicable)

Infrastructure (gitops-update, build, helm-update-chart, dispatch-helm)

### Problem / Motivation

The `github-actions-argocd-sync` action intermittently reports a failed sync even when the sync has actually been applied. This was first attributed to the ArgoCD server taking longer than the CLI's default timeout (~90s) to respond to a sync request, but the root cause turned out to be different (see below). It was observed on the Reporter repo ([run #23253916955](https://github.com/LerianStudio/reporter/actions/runs/23253916955/job/67604135788)), where `firmino-reporter-dev` failed after 5 sync attempts even though ArgoCD had applied the sync and the new image was running correctly in dev.

**Root cause (confirmed):** The `argocd app sync` command completes the sync successfully (`successfully synced (all tasks run)`), but the CLI exits with code 1 because there are orphaned resources that require pruning:

```
{"level":"fatal","msg":"2 resources require pruning","time":"2026-03-18T14:39:07-03:00"}
```

The orphaned resources are:
- `ClusterRole reporter-manager-midaz-plugins-dev`
- `ClusterRoleBinding reporter-manager-midaz-plugins-dev`

These were left behind after a rename from namespace-suffixed names to plain `reporter-manager`. The current `entrypoint.sh` redirects all sync output to `/dev/null`, which hides this error. The script then retries 5 times; each retry syncs successfully but also exits with code 1 because of the same pruning requirement.

**The initial hypothesis (timeout) was incorrect.** The ~1 min per attempt was the actual sync duration, not a timeout, and the exit code 1 came from the fatal pruning log, not from a gRPC timeout.
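If the two leftover objects need to be removed right away, independently of the action changes, a one-off cleanup could look like the sketch below. It assumes the current `kubectl` context targets the dev cluster and has permission to delete cluster-scoped RBAC objects; the function name `cleanup_leftover_rbac` is hypothetical. By default it only performs a server-side dry run, and it deletes for real only when called with `apply`.

```shell
#!/usr/bin/env bash
# Hypothetical one-off cleanup of the two RBAC objects left behind by the
# rename. Assumes the current kubectl context points at the dev cluster.
cleanup_leftover_rbac() {
  local mode="${1:-dry-run}"   # "dry-run" (default) or "apply"
  local dry_run=(--dry-run=server)
  if [ "$mode" = "apply" ]; then
    dry_run=()                 # no dry-run flag: perform the real deletion
  fi
  # Remove the binding, then the role it references.
  kubectl delete "${dry_run[@]}" clusterrolebinding reporter-manager-midaz-plugins-dev
  kubectl delete "${dry_run[@]}" clusterrole reporter-manager-midaz-plugins-dev
}
```

Run `cleanup_leftover_rbac` first to preview, then `cleanup_leftover_rbac apply` once the dry run reports only these two objects.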

### Proposed Solution

Changes to `github-actions-argocd-sync/entrypoint.sh`:

1. **Remove `> /dev/null 2>&1`** from the sync command to expose the actual error message, so failures can be diagnosed from the GitHub Actions log. This is the most critical change: without it, the real error is invisible.

2. **Add `--prune` flag support** via a new optional input `prune` (default: `false`). When enabled, pass `--prune` to `argocd app sync` so orphaned resources are cleaned up automatically during sync. This prevents the "resources require pruning" fatal error from causing false failures.

3. **Use `--async` on `argocd app sync`** to dispatch the sync without waiting for it to complete. The script already runs an `argocd app wait` step afterward that handles confirmation, so this separates sync dispatch from sync verification.

4. **Increase the retry interval** from 5s to 30s to give a previous sync attempt time to complete before retrying.

5. **Add an explicit `--timeout`** to the sync and wait commands (e.g., `--timeout 180`) for predictable behavior regardless of CLI defaults.
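The five changes above could fit together roughly as follows. This is a minimal sketch, not the current `entrypoint.sh`: the environment variables `APP_NAME`, `PRUNE`, `MAX_RETRIES`, `RETRY_INTERVAL` and `SYNC_TIMEOUT`, and the functions `build_sync_args` and `sync_and_wait`, are hypothetical names. It assumes the `argocd` CLI is already logged in, and that the existing wait step checks both sync and health status (`--sync --health`). Defaults mirror the proposal: 5 attempts, 30s between them, a 180s timeout, and pruning off.

```shell
#!/usr/bin/env bash
# Sketch of the proposed sync step. Hypothetical inputs (env vars):
#   APP_NAME        ArgoCD application, e.g. firmino-reporter-dev
#   PRUNE           "true" to pass --prune (default: false)
#   MAX_RETRIES     number of sync attempts (default: 5)
#   RETRY_INTERVAL  seconds between attempts (default: 30, previously 5)
#   SYNC_TIMEOUT    seconds passed to --timeout (default: 180)

# Build the sync command as an array so optional flags stay correctly quoted.
build_sync_args() {
  local timeout="$1"
  # --async only dispatches the sync; `argocd app wait` verifies it afterward.
  SYNC_ARGS=(app sync "$APP_NAME" --async --timeout "$timeout")
  if [ "${PRUNE:-false}" = "true" ]; then
    SYNC_ARGS+=(--prune)   # opt-in: delete resources that require pruning
  fi
}

sync_and_wait() {
  local max="${MAX_RETRIES:-5}"
  local interval="${RETRY_INTERVAL:-30}"
  local timeout="${SYNC_TIMEOUT:-180}"
  local attempt=1

  build_sync_args "$timeout"
  while [ "$attempt" -le "$max" ]; do
    echo "Sync attempt ${attempt}/${max} for ${APP_NAME}"
    # No "> /dev/null 2>&1": argocd errors now appear in the job log.
    if argocd "${SYNC_ARGS[@]}" &&
      argocd app wait "$APP_NAME" --sync --health --timeout "$timeout"; then
      echo "${APP_NAME} is synced and healthy"
      return 0
    fi
    if [ "$attempt" -lt "$max" ]; then
      # Longer pause so a previous sync operation can finish first.
      echo "Attempt ${attempt} failed; retrying in ${interval}s" >&2
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done
  echo "Sync of ${APP_NAME} failed after ${max} attempts" >&2
  return 1
}
```

In the entrypoint, this would be called as `sync_and_wait || exit 1`. Keeping the argument list in an array means the optional `--prune` flag is either present as a single word or absent entirely, with no empty-string argument passed to the CLI.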

### Alternatives Considered

- Only removing the `/dev/null` redirect (helps diagnosis but doesn't prevent the failure)
- Always pruning (risky in production; better as an opt-in flag)
- Adding `--force` to sync retries (risky; could cause unintended overwrites)

### Example Usage

```yaml
# Existing usage remains the same (backward compatible)
- uses: LerianStudio/github-actions-argocd-sync@main
  with:
    app-name: firmino-reporter
    argo-cd-token: ${{ secrets.ARGOCD_TOKEN }}
    argo-cd-url: ${{ secrets.ARGOCD_URL }}
    env-prefix: dev
    skip-if-not-exists: true

# New: with opt-in pruning enabled
- uses: LerianStudio/github-actions-argocd-sync@main
  with:
    app-name: firmino-reporter
    argo-cd-token: ${{ secrets.ARGOCD_TOKEN }}
    argo-cd-url: ${{ secrets.ARGOCD_URL }}
    env-prefix: dev
    skip-if-not-exists: true
    prune: true
```

### Would This Be a Breaking Change?

No, fully backward compatible (the new `prune` input defaults to `false`).

### Checklist

- [x] I searched existing issues and this is not a duplicate.
- [x] This feature aligns with the repository's goal of providing reusable, organization-wide workflows.

### Additional Context

- Related Jira ticket: [DSINT-860](https://lerian.atlassian.net/browse/DSINT-860)
- Reported by Arthur Ribeiro in #devops-team
- Investigated by Lucas Bedatty, who confirmed the root cause by running the sync locally without the `/dev/null` redirect
- The orphaned resources come from PR `fix/reporter-cluster-role-unique-names` (March 12): a namespace suffix was added and then reverted, leaving the old resources behind
