Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: check ephemeral metadata is set before delete #4089

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

com6056
Copy link

@com6056 com6056 commented Jan 30, 2025

We're seeing issues with rollout sync failing due to errors like this:

roCtx.reconcile err Operation cannot be fulfilled on pods \"REDACTED\": the object has been modified; please apply your changes to the latest version and try again"

It seems that we're trying to delete the metadata on every pod every sync, even if the pod doesn't have the keys. This means that in environments with high pod churn, this loop can get stuck for a very long time. This simply adds a check to make sure the key is actually present in the metadata before we delete it and mark it as modified.

Before the fix with the new test:

go test ./utils/replicaset/...
--- FAIL: TestSyncEphemeralPodMetadata (0.00s)
    canary_test.go:1353:
        	Error Trace:	/Users/jrodgers/argo-rollouts/utils/replicaset/canary_test.go:1353
        	Error:      	Should be false
        	Test:       	TestSyncEphemeralPodMetadata
time="2025-01-30T13:43:28-08:00" level=warning msg="Failed to determine existing ephemeral metadata from annotation: foo"
time="2025-01-30T13:43:28-08:00" level=info msg="ComputePodTemplateHash hash changed (new hash: 6c8776dbbb, old hash: fcb6c6ff9)" namespace=default rollout=red
time="2025-01-30T13:43:28-08:00" level=info msg="Pod template change detected (new: 8657b54cf, old: different-hash)" namespace=default rollout=nginx
time="2025-01-30T13:43:28-08:00" level=info msg="Canary steps change detected (new: 5ffbfbbd64, old: different-hash)" namespace=default rollout=nginx
time="2025-01-30T13:43:28-08:00" level=warning msg="Unable to convert ReplicaSet revision to int: strconv.Atoi: parsing \"not-an-int\": invalid syntax" ReplicaSet= namespace= rollout=ro
time="2025-01-30T13:43:28-08:00" level=warning msg="Unable to convert ReplicaSet revision to int: strconv.Atoi: parsing \"not-an-int\": invalid syntax" ReplicaSet= namespace= rollout=ro
FAIL
FAIL	github.com/argoproj/argo-rollouts/utils/replicaset	0.685s
FAIL

After the fix with the new test:

go test ./utils/replicaset/...
ok  	github.com/argoproj/argo-rollouts/utils/replicaset	(cached)

Checklist:

  • Either (a) I've created an enhancement proposal and discussed it with the community, (b) this is a bug fix, or (c) this is a chore.
  • The title of the PR is (a) conventional with a list of types and scopes found here, (b) states what changed, and (c) suffixes the related issues number. E.g. "fix(controller): Updates such and such. Fixes #1234".
  • I've signed my commits with DCO
  • I have written unit and/or e2e tests for my change. PRs without these are unlikely to be merged.
  • My builds are green. Try syncing with master if they are not.
  • My organization is added to USERS.md.

@com6056 com6056 force-pushed the check-metadata-set-before-delete branch from e89d0ad to 3f8a37c Compare January 30, 2025 20:55
Copy link
Contributor

github-actions bot commented Jan 30, 2025

Published E2E Test Results

  4 files    4 suites   3h 8m 36s ⏱️
113 tests 104 ✅  7 💤 2 ❌
454 runs  424 ✅ 28 💤 2 ❌

For more details on these failures, see this check.

Results for commit 68b5f6c.

♻️ This comment has been updated with latest results.

Copy link
Contributor

github-actions bot commented Jan 30, 2025

Published Unit Test Results

2 291 tests   2 291 ✅  2m 59s ⏱️
  128 suites      0 💤
    1 files        0 ❌

Results for commit 68b5f6c.

♻️ This comment has been updated with latest results.

Signed-off-by: Jordan Rodgers <[email protected]>
@com6056 com6056 marked this pull request as ready for review January 30, 2025 21:44
@@ -1367,10 +1376,8 @@ func TestSyncEphemeralPodMetadata(t *testing.T) {
"do-not": "touch",
},
}
assert.True(t, modified)
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This same condition is being checked already right above this

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants