
[release-4.16] OCPBUGS-59647: Reduce Frequency of Update Requests for Copied CSVs (#3597) #1042


Open
wants to merge 1 commit into
base: release-4.16

Contributor

@tmshort tmshort commented Jul 22, 2025

Cherry-pick from release-4.17

  • (bugfix): reduce frequency of update requests for CSVs

by adding annotations to copied CSVs that are populated with hashes of the non-status fields and the status fields.

This seems to be how this was intended to work, but was not actually working this way because the annotations never actually existed on the copied CSV. This resulted in a hot loop of update requests being made on all copied CSVs.
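The hash-annotation idea described above can be sketched roughly as follows. This is a minimal illustration, not the OLM implementation: the `csv` struct is a simplified stand-in for a ClusterServiceVersion, the helper names (`hashOf`, `neededUpdates`, `stamp`) are hypothetical, and the full annotation keys are an assumption built from the `nonStatusCopyHash`/`statusCopyHash` names mentioned in this PR.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// Assumed annotation keys; the keys used by OLM may differ.
const (
	nonStatusHashKey = "olm.operatorframework.io/nonStatusCopyHash"
	statusHashKey    = "olm.operatorframework.io/statusCopyHash"
)

// csv is a simplified, hypothetical stand-in for a ClusterServiceVersion.
type csv struct {
	Annotations map[string]string
	Spec        map[string]string
	Status      map[string]string
}

// hashOf returns a hex SHA-256 digest of the JSON encoding of v.
// encoding/json sorts map keys, so the digest is deterministic.
func hashOf(v any) string {
	b, err := json.Marshal(v)
	if err != nil {
		panic(err) // maps of strings always marshal
	}
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// neededUpdates compares the hashes stored on the copy with hashes of the
// primary and reports which writes are still required.
func neededUpdates(primary, copied *csv) (spec, status bool) {
	spec = copied.Annotations[nonStatusHashKey] != hashOf(primary.Spec)
	status = copied.Annotations[statusHashKey] != hashOf(primary.Status)
	return spec, status
}

// stamp records the primary's hashes on the copy. If this step is skipped,
// every sync sees a mismatch and issues updates again (the hot loop).
func stamp(primary, copied *csv) {
	if copied.Annotations == nil {
		copied.Annotations = map[string]string{}
	}
	copied.Annotations[nonStatusHashKey] = hashOf(primary.Spec)
	copied.Annotations[statusHashKey] = hashOf(primary.Status)
}

func main() {
	primary := &csv{
		Spec:   map[string]string{"version": "1.0.0"},
		Status: map[string]string{"phase": "Succeeded"},
	}
	copied := &csv{}
	fmt.Println(neededUpdates(primary, copied)) // both writes needed
	stamp(primary, copied)
	fmt.Println(neededUpdates(primary, copied)) // nothing to do
}
```

The point of the sketch is the second call: once the hashes are actually persisted on the copy, an unchanged primary produces no further update requests.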

  • update unit tests

  • updates to test so far

  • Small changes

  • Add metadata drift guard to copyToNamespace

Since we switched to a PartialObjectMetadata cache to save memory, we lost visibility into copied CSV spec and status fields, and the reintroduced nonStatusCopyHash/statusCopyHash annotations only partially solved the problem. Manual edits to a copied CSV could still go undetected, causing drift without reconciliation.

This commit adds two new annotations: olm.operatorframework.io/observedGeneration and olm.operatorframework.io/observedResourceVersion. It implements a mechanism to guard against metadata drift at the top of the existing-copy path in copyToNamespace. If a stored observedGeneration or observedResourceVersion no longer matches the live object, the operator now:

  • Updates the spec and hash annotations
  • Updates the status subresource
  • Records the new generation and resourceVersion in the guard annotations

Because the guard only fires when its annotations are already present, all existing unit tests pass unchanged. We preserve the memory benefits of the metadata‐only informer, avoid extra GETs, and eliminate unnecessary API churn.
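A rough sketch of the guard described above, under stated assumptions: `meta` is a hypothetical stand-in for the metadata visible through a metadata-only cache, and `drifted`/`reconcile` are illustrative names rather than functions from the PR. Only the two annotation keys are taken from the commit message.

```go
package main

import (
	"fmt"
	"strconv"
)

const (
	observedGenerationKey      = "olm.operatorframework.io/observedGeneration"
	observedResourceVersionKey = "olm.operatorframework.io/observedResourceVersion"
)

// meta is a hypothetical stand-in for a copied CSV's object metadata.
type meta struct {
	Generation      int64
	ResourceVersion string
	Annotations     map[string]string
}

// drifted reports whether a stored guard annotation disagrees with the live
// object. A missing annotation never triggers the guard.
func drifted(m meta) bool {
	if gen, ok := m.Annotations[observedGenerationKey]; ok &&
		gen != strconv.FormatInt(m.Generation, 10) {
		return true
	}
	if rv, ok := m.Annotations[observedResourceVersionKey]; ok &&
		rv != m.ResourceVersion {
		return true
	}
	return false
}

// reconcile returns the names of the writes the guard would issue and
// records the live generation and resourceVersion in the annotations.
func reconcile(m *meta) []string {
	if !drifted(*m) {
		return nil
	}
	// drifted implies at least one annotation exists, so the map is non-nil.
	m.Annotations[observedGenerationKey] = strconv.FormatInt(m.Generation, 10)
	m.Annotations[observedResourceVersionKey] = m.ResourceVersion
	return []string{"spec", "status", "guard"}
}

func main() {
	m := &meta{
		Generation:      2, // bumped by a manual spec edit
		ResourceVersion: "100",
		Annotations: map[string]string{
			observedGenerationKey:      "1",
			observedResourceVersionKey: "100",
		},
	}
	fmt.Println(reconcile(m)) // [spec status guard]
	fmt.Println(reconcile(m)) // []
}
```

The sketch does not model the fact that, on a real cluster, every write changes the object's resourceVersion, so the recorded value has to come from the object returned by the final update.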

Future work may explore a WithTransform informer to regain full object visibility with minimal memory impact.

  • Tests for metadata guard

Verifies that exactly three updates (spec, status, guard) are issued when the observedGeneration doesn’t match.

  • Persist observed annotations on all status updates

  • GCI the file

  • Use TransformFunc

Unit tests not updated
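For readers unfamiliar with informer transforms, here is a minimal sketch of the idea. It assumes the general shape of client-go's `cache.TransformFunc` (a function from `interface{}` to `(interface{}, error)`), which is mirrored here by a local type so the example stays self-contained. The `csv` struct, the `olm.copiedFrom` label key, and the choice of which fields to drop are assumptions for illustration, not the code in this PR.

```go
package main

import "fmt"

// transformFunc mirrors the shape of client-go's cache.TransformFunc.
type transformFunc func(obj interface{}) (interface{}, error)

// Assumed label that marks a CSV as a copy and names its source namespace.
const copiedFromLabel = "olm.copiedFrom"

// csv is a simplified, hypothetical stand-in for a ClusterServiceVersion.
type csv struct {
	Name, Namespace string
	Labels          map[string]string
	Annotations     map[string]string
	Spec, Status    map[string]string
}

// stripCopied keeps only identifying metadata for copied CSVs before they
// are stored in the informer cache. Primary CSVs and unknown types pass
// through unchanged.
func stripCopied(obj interface{}) (interface{}, error) {
	c, ok := obj.(*csv)
	if !ok {
		return obj, nil
	}
	if _, copied := c.Labels[copiedFromLabel]; !copied {
		return c, nil
	}
	return &csv{
		Name:        c.Name,
		Namespace:   c.Namespace,
		Labels:      c.Labels,
		Annotations: c.Annotations,
	}, nil
}

// Compile-time check that stripCopied has the transform signature.
var _ transformFunc = stripCopied

func main() {
	in := &csv{
		Name:      "my-operator.v1.0.0",
		Namespace: "team-a",
		Labels:    map[string]string{copiedFromLabel: "operators"},
		Spec:      map[string]string{"large": "payload"},
	}
	out, _ := stripCopied(in)
	fmt.Printf("%+v\n", out)
}
```

The trade-off is the one the commit list hints at: the cache holds typed objects again, but copies carry only the fields the controller needs, which keeps memory use close to the metadata-only approach.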

  • Update operatorgroup tests to compile

  • Restore operatorgroup_test from master

Remove metadatalister

  • Remove more PartialObjectMetadata

  • Remove hashes from operator_test

  • Fix error messages for static-analysis

  • Update test annotations and test client

  • Rename pruning to listerwatcher

  • Set resync to 6h

  • Add CSV copy revert syncer

  • Log tweaks

  • Consolidate revert and gc syncers

  • Add logging and reduce the amount of metadata in the TransformFunc

  • Handle the copy CSV revert via a requeue of the primary CSV
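One way to picture "revert via a requeue of the primary CSV" is the sketch below. It is an assumption-laden illustration: the `olm.copiedFrom` label key, the convention that a copy shares its primary's name, and the `queue`/`onCopiedChange` names are hypothetical and not taken from the PR.

```go
package main

import "fmt"

// Assumed label on a copied CSV that names the primary's namespace.
const copiedFromLabel = "olm.copiedFrom"

// objMeta is a hypothetical stand-in for a CSV's metadata.
type objMeta struct {
	Name, Namespace string
	Labels          map[string]string
}

// queue is a tiny de-duplicating work queue keyed by "namespace/name".
type queue struct {
	keys []string
	seen map[string]bool
}

// Add enqueues key unless it is already pending.
func (q *queue) Add(key string) {
	if q.seen == nil {
		q.seen = map[string]bool{}
	}
	if q.seen[key] {
		return
	}
	q.seen[key] = true
	q.keys = append(q.keys, key)
}

// primaryKey derives the primary CSV's key from a copy, assuming the copy
// has the same name as its primary.
func primaryKey(m objMeta) (string, bool) {
	ns := m.Labels[copiedFromLabel]
	if ns == "" {
		return "", false
	}
	return ns + "/" + m.Name, true
}

// onCopiedChange is what an update or delete handler for copied CSVs might
// do: rather than repairing the copy directly, requeue the primary so the
// normal copy path rewrites or recreates it.
func onCopiedChange(q *queue, m objMeta) {
	if key, ok := primaryKey(m); ok {
		q.Add(key)
	}
}

func main() {
	q := &queue{}
	labels := map[string]string{copiedFromLabel: "operators"}
	onCopiedChange(q, objMeta{Name: "op.v1", Namespace: "team-a", Labels: labels})
	onCopiedChange(q, objMeta{Name: "op.v1", Namespace: "team-b", Labels: labels})
	fmt.Println(q.keys) // [operators/op.v1]
}
```

Funnelling every copy event into a single key for the primary means many edited or deleted copies collapse into one reconcile of the source object.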

  • Revert "Set resync to 6h"

This reverts commit 855f940a2199bd4071c51f14ef44728550bf13cf.

  • Add delete handler for copied csv

  • Revert whitespace change

  • Rename function, fix comment


Upstream-repository: operator-lifecycle-manager
Upstream-commit: d055f28750cf62f966f566d36990fff5285c7a71 (cherry picked from commit bc111a9) (cherry picked from commit 882eb21) (cherry picked from commit e4bc847)

@openshift-ci-robot openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Jul 22, 2025
@openshift-ci-robot

@tmshort: This pull request references Jira Issue OCPBUGS-59647, which is invalid:

  • expected dependent Jira Issue OCPBUGS-59253 to be in one of the following states: VERIFIED, RELEASE PENDING, CLOSED (ERRATA), CLOSED (CURRENT RELEASE), CLOSED (DONE), CLOSED (DONE-ERRATA), but it is ON_QA instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci openshift-ci bot requested review from grokspawn and kevinrizza July 22, 2025 14:52
Contributor

openshift-ci bot commented Jul 22, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: tmshort

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jul 22, 2025
@tmshort tmshort force-pushed the OPCBUGS-59647 branch 2 times, most recently from 4b8b6f0 to 638375e Compare July 22, 2025 17:21
…3597)

* (bugfix): reduce frequency of update requests for CSVs

by adding annotations to copied CSVs that are populated with
hashes of the non-status fields and the status fields.

This seems to be how this was intended to work, but was not actually
working this way because the annotations never actually existed on the
copied CSV. This resulted in a hot loop of update requests being made
on all copied CSVs.

Signed-off-by: everettraven <[email protected]>

* update unit tests

Signed-off-by: everettraven <[email protected]>

* updates to test so far

Signed-off-by: everettraven <[email protected]>

* Small changes

Signed-off-by: Brett Tofel <[email protected]>

* Add metadata drift guard to copyToNamespace

Since we switched to a PartialObjectMetadata cache to save memory, we lost visibility into copied CSV spec and status fields, and the reintroduced nonStatusCopyHash/statusCopyHash annotations only partially solved the problem. Manual edits to a copied CSV could still go undetected, causing drift without reconciliation.

This commit adds two new annotations: olm.operatorframework.io/observedGeneration and olm.operatorframework.io/observedResourceVersion. It implements a mechanism to guard against metadata drift at the top of the existing-copy path in copyToNamespace. If a stored observedGeneration or observedResourceVersion no longer matches the live object, the operator now:

      • Updates the spec and hash annotations
      • Updates the status subresource
      • Records the new generation and resourceVersion in the guard annotations

Because the guard only fires when its annotations are already present, all existing unit tests pass unchanged. We preserve the memory benefits of the metadata‐only informer, avoid extra GETs, and eliminate unnecessary API churn.

Future work may explore a WithTransform informer to regain full object visibility with minimal memory impact.

Signed-off-by: Brett Tofel <[email protected]>

* Tests for metadata guard

Verifies that exactly three updates (spec, status, guard) are issued when the observedGeneration doesn’t match.

Signed-off-by: Brett Tofel <[email protected]>

* Persist observed annotations on all status updates

Signed-off-by: Brett Tofel <[email protected]>

* GCI the file

Signed-off-by: Brett Tofel <[email protected]>

* Use TransformFunc

Unit tests not updated

Signed-off-by: Todd Short <[email protected]>

* Update operatorgroup tests to compile

Signed-off-by: Todd Short <[email protected]>

* Restore operatorgroup_test from master

Remove metadatalister

Signed-off-by: Todd Short <[email protected]>

* Remove more PartialObjectMetadata

Signed-off-by: Todd Short <[email protected]>

* Remove hashes from operator_test

Signed-off-by: Todd Short <[email protected]>

* Fix error messages for static-analysis

Signed-off-by: Todd Short <[email protected]>

* Update test annotations and test client

Signed-off-by: Todd Short <[email protected]>

* Rename pruning to listerwatcher

Signed-off-by: Todd Short <[email protected]>

* Set resync to 6h

Signed-off-by: Todd Short <[email protected]>

* Add CSV copy revert syncer

Signed-off-by: Todd Short <[email protected]>

* Log tweaks

Signed-off-by: Todd Short <[email protected]>

* Consolidate revert and gc syncers

Signed-off-by: Todd Short <[email protected]>

* Add logging and reduce the amount of metadata in the TransformFunc

Signed-off-by: Todd Short <[email protected]>

* Handle the copy CSV revert via a requeue of the primary CSV

Signed-off-by: Todd Short <[email protected]>

* Revert "Set resync to 6h"

This reverts commit 855f940a2199bd4071c51f14ef44728550bf13cf.

Signed-off-by: Todd Short <[email protected]>

* Add delete handler for copied csv

Signed-off-by: Todd Short <[email protected]>

* Revert whitespace change

Signed-off-by: Todd Short <[email protected]>

* Rename function, fix comment

Signed-off-by: Todd Short <[email protected]>

---------

Signed-off-by: everettraven <[email protected]>
Signed-off-by: Brett Tofel <[email protected]>
Signed-off-by: Todd Short <[email protected]>
Co-authored-by: everettraven <[email protected]>
Co-authored-by: Brett Tofel <[email protected]>
Upstream-repository: operator-lifecycle-manager
Upstream-commit: d055f28750cf62f966f566d36990fff5285c7a71
(cherry picked from commit bc111a9)
(cherry picked from commit 882eb21)
(cherry picked from commit e4bc847)
@tmshort
Contributor Author

tmshort commented Jul 22, 2025

I should've been using go 1.21 for building/testing this... hence the verify failures.

@tmshort
Contributor Author

tmshort commented Jul 22, 2025

/retest-required

@kuiwang02

/hold

Sorry to hold this; I need time to pre-verify it.
Once pre-verification passes, I will unhold it. Thanks.

@openshift-ci openshift-ci bot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Jul 23, 2025
@tmshort
Contributor Author

tmshort commented Jul 23, 2025

/retest-required

@kuiwang02

@tmshort when I pre-verified this, I found that the updates are still a mass operation, although there are fewer of them than in the previous release.
I am afraid it could still cause issues for customers.
I updated the ticket https://issues.redhat.com/browse/OCPBUGS-59647# with a comment.
Could you please check it? Thanks.

By the way, when I verified the issue on 4.17 (4.16 to 4.17), it showed the same behavior, but I did not see this behavior when I verified it on 4.19.

@tmshort
Contributor Author

tmshort commented Jul 24, 2025

/retest-required

@tmshort
Contributor Author

tmshort commented Jul 25, 2025

/retest-required

@tmshort
Contributor Author

tmshort commented Jul 30, 2025

/retest-required

Contributor

openshift-ci bot commented Jul 30, 2025

@tmshort: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name            Commit   Details  Required  Rerun command
ci/prow/e2e-gcp-olm  c2fb6b2  link     true      /test e2e-gcp-olm

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type.

3 participants