Skip to content

NETOBSERV-2375: make consistent status handling#2520

Merged
jotak merged 2 commits intonetobserv:mainfrom
jotak:status-refactor
Mar 18, 2026
Merged

NETOBSERV-2375: make consistent status handling#2520
jotak merged 2 commits intonetobserv:mainfrom
jotak:status-refactor

Conversation

@jotak
Copy link
Copy Markdown
Member

@jotak jotak commented Mar 10, 2026

Description

Address some old "TODOs" in status management. All statuses now follow this pattern:

  • They are initialized unknown
  • When controller found unused, status set to unused
  • At any point, if an error occurs, it's set with that error
  • Deployment/DaemonSet check progress set status either as InProgress, or Ready
  • New Degraded condition can be optionally set; a degraded status does not prevent it from being "ready".

Make the LokiStack status fit in this model (it was previously using an entirely different pattern)

Also, previously, in 2 different places LokiStack was fetched to get its status (for propagating status into FlowCollector status and into console plugin config) Now consolidated into a single entry point to get the loki status.

Status changes:

  • New status: WaitingEBPFAgents
  • New status: WaitingWebConsole
  • New status: WaitingDemoLoki
  • New status: WaitingLokiStack
  • Renamed status: WaitingFlowCollectorLegacy => WaitingFlowCollectorController
  • Removed: LokiIssue and LokiWarning

Dependencies

Checklist

  • Does the changes in PR need specific configuration or environment set up for testing?
    • if so please describe it in PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
    • Standard QE validation, with pre-merge tests unless stated otherwise.
    • Regression tests only (e.g. refactoring with no user-facing change).
    • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

@openshift-ci-robot
Copy link
Copy Markdown
Collaborator

openshift-ci-robot commented Mar 10, 2026

@jotak: This pull request references NETOBSERV-2375 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Description

Address some old "TODOs" in status management. All statuses now follow this pattern:

  • They are initialized unknown
  • When controller found unused, status set to unused
  • At any point, if an error occurs, it's set with that error
  • Deployment/DaemonSet check progress set status either as InProgress, or Ready
  • New Degraded condition can be optionally set; a degraded status does not prevent it from being "ready".

Make the LokiStack status fit in this model (it was previously using an entirely different pattern)

Also, previously, in 2 different places LokiStack was fetched to get its status (for propagating status into FlowCollector status and into console plugin config) Now consolidated into a single entry point to get the loki status.

Status changes:

  • New status: EBPFAgents
  • New status: WebConsole
  • New status: DemoLoki
  • New status: LokiStack
  • Renamed status: FlowCollectorLegacy => FlowCollectorController
  • Removed: LokiIssue and LokiWarning

Dependencies

n/a

Checklist

  • Does the changes in PR need specific configuration or environment set up for testing?
    • if so please describe it in PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci
Copy link
Copy Markdown

openshift-ci Bot commented Mar 10, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign oliviercazade for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot
Copy link
Copy Markdown
Collaborator

openshift-ci-robot commented Mar 10, 2026

@jotak: This pull request references NETOBSERV-2375 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Description

Address some old "TODOs" in status management. All statuses now follow this pattern:

  • They are initialized unknown
  • When controller found unused, status set to unused
  • At any point, if an error occurs, it's set with that error
  • Deployment/DaemonSet check progress set status either as InProgress, or Ready
  • New Degraded condition can be optionally set; a degraded status does not prevent it from being "ready".

Make the LokiStack status fit in this model (it was previously using an entirely different pattern)

Also, previously, in 2 different places LokiStack was fetched to get its status (for propagating status into FlowCollector status and into console plugin config) Now consolidated into a single entry point to get the loki status.

Status changes:

  • New status: EBPFAgents
  • New status: WebConsole
  • New status: DemoLoki
  • New status: LokiStack
  • Renamed status: FlowCollectorLegacy => FlowCollectorController
  • Removed: LokiIssue and LokiWarning

Dependencies

Checklist

  • Does the changes in PR need specific configuration or environment set up for testing?
    • if so please describe it in PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@openshift-ci-robot
Copy link
Copy Markdown
Collaborator

openshift-ci-robot commented Mar 10, 2026

@jotak: This pull request references NETOBSERV-2375 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Description

Address some old "TODOs" in status management. All statuses now follow this pattern:

  • They are initialized unknown
  • When controller found unused, status set to unused
  • At any point, if an error occurs, it's set with that error
  • Deployment/DaemonSet check progress set status either as InProgress, or Ready
  • New Degraded condition can be optionally set; a degraded status does not prevent it from being "ready".

Make the LokiStack status fit in this model (it was previously using an entirely different pattern)

Also, previously, in 2 different places LokiStack was fetched to get its status (for propagating status into FlowCollector status and into console plugin config) Now consolidated into a single entry point to get the loki status.

Status changes:

  • New status: WaitingEBPFAgents
  • New status: WaitingWebConsole
  • New status: WaitingDemoLoki
  • New status: WaitingLokiStack
  • Renamed status: WaitingFlowCollectorLegacy => WaitingFlowCollectorController
  • Removed: LokiIssue and LokiWarning

Dependencies

Checklist

  • Does the changes in PR need specific configuration or environment set up for testing?
    • if so please describe it in PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jotak jotak added the needs-review Tells that the PR needs a review label Mar 10, 2026
jotak added 2 commits March 10, 2026 18:05
Address some old "TODOs" in status management. All statuses now follow
this pattern:

- They are initialized unknown
- When controller found unused, status set to unused
- At any point, if an error occurs, it's set with that error
- Deployment/DaemonSet check progress set status either as InProgress,
  or Ready
- New Degraded condition can be optionally set; a degraded status does
  not prevent it from being "ready".

Make the LokiStack status fit in this model (it was previously using an
entirely different pattern)

Also, previously, in 2 different places LokiStack was fetched to get its
status (for propagating status into FlowCollector status and into  console plugin config)
Now consolidated into a single entry point to get the loki status.

Status changes:
- New status: EBPFAgents
- New status: WebConsole
- New status: DemoLoki
- New status: LokiStack
- Renamed status: FlowCollectorLegacy => FlowCollectorController
- Removed: LokiIssue and LokiWarning
// Set status failure unless it was already set
if !r.status.HasFailure() {
r.status.SetFailure("FlowCollectorGenericError", err.Error())
r.status.SetFailure("ChildReconcilerError", err.Error())
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced by this namming but I don't know what could be better here actually 🤔

Copy link
Copy Markdown
Member Author

@jotak jotak Mar 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"SubReconcilerError" ? "DelegateReconcilerError" ? "DependentReconcilerError" ?

Copy link
Copy Markdown
Member

@jpinsonneau jpinsonneau left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work ! That will help a lot the users to have a clear view of what's going on !

Thanks @jotak

@openshift-ci openshift-ci Bot added the lgtm label Mar 17, 2026
@jotak jotak removed the needs-review Tells that the PR needs a review label Mar 17, 2026
@Amoghrd
Copy link
Copy Markdown
Member

Amoghrd commented Mar 17, 2026

/ok-to-test

@openshift-ci openshift-ci Bot added the ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. label Mar 17, 2026
@Amoghrd Amoghrd added ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. and removed ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. labels Mar 17, 2026
@github-actions
Copy link
Copy Markdown

New images:

quay.io/netobserv/network-observability-operator:542cb66
quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-542cb66
quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-542cb66

They will expire in two weeks.

To deploy this build:

# Direct deployment, from operator repo
IMAGE=quay.io/netobserv/network-observability-operator:542cb66 make deploy

# Or using operator-sdk
operator-sdk run bundle quay.io/netobserv/network-observability-operator-bundle:v0.0.0-sha-542cb66

Or as a Catalog Source:

apiVersion: operators.coreos.com/v1alpha1
kind: CatalogSource
metadata:
  name: netobserv-dev
  namespace: openshift-marketplace
spec:
  sourceType: grpc
  image: quay.io/netobserv/network-observability-operator-catalog:v0.0.0-sha-542cb66
  displayName: NetObserv development catalog
  publisher: Me
  updateStrategy:
    registryPoll:
      interval: 1m

@Amoghrd
Copy link
Copy Markdown
Member

Amoghrd commented Mar 17, 2026

@jotak PR looks good overall except I am not able to simulate a degraded state.

  • I tried high memory requests for FLP(100Gi) it showed failed.
  • I tried setting an incorrect nodeSelector for eBPF pods such that it does not get deployed on any node but the eBPF pod status still showed Ready with no failure which was weird too.

So let me know how do I hit the degraded state scenario

@jotak
Copy link
Copy Markdown
Member Author

jotak commented Mar 18, 2026

@Amoghrd the degraded condition should show up in the same conditions that previously would trigger a "LokiWarning" condition, so if you had tests previously with LokiWarning, you should be able to reuse the same. It's currently the only thing that triggers degraded.

Looking at the unit tests, an example of issue that triggers it is this one: Warning: The schema configuration does not contain the most recent schema version

I tried setting an incorrect nodeSelector for eBPF pods such that it does not get deployed on any node but the eBPF pod status still showed Ready with no failure which was weird too.

would be good to check if it's a regression of this PR; if not, we should open a bug

@Amoghrd
Copy link
Copy Markdown
Member

Amoghrd commented Mar 18, 2026

/label qe-approved

@openshift-ci openshift-ci Bot added the qe-approved QE has approved this pull request label Mar 18, 2026
@openshift-ci-robot
Copy link
Copy Markdown
Collaborator

openshift-ci-robot commented Mar 18, 2026

@jotak: This pull request references NETOBSERV-2375 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the bug to target the "4.22.0" version, but no target version was set.

Details

In response to this:

Description

Address some old "TODOs" in status management. All statuses now follow this pattern:

  • They are initialized unknown
  • When controller found unused, status set to unused
  • At any point, if an error occurs, it's set with that error
  • Deployment/DaemonSet check progress set status either as InProgress, or Ready
  • New Degraded condition can be optionally set; a degraded status does not prevent it from being "ready".

Make the LokiStack status fit in this model (it was previously using an entirely different pattern)

Also, previously, in 2 different places LokiStack was fetched to get its status (for propagating status into FlowCollector status and into console plugin config) Now consolidated into a single entry point to get the loki status.

Status changes:

  • New status: WaitingEBPFAgents
  • New status: WaitingWebConsole
  • New status: WaitingDemoLoki
  • New status: WaitingLokiStack
  • Renamed status: WaitingFlowCollectorLegacy => WaitingFlowCollectorController
  • Removed: LokiIssue and LokiWarning

Dependencies

Checklist

  • Does the changes in PR need specific configuration or environment set up for testing?
    • if so please describe it in PR description.
  • I have added thorough unit tests for the change.
  • QE requirements (check 1 from the list):
  • Standard QE validation, with pre-merge tests unless stated otherwise.
  • Regression tests only (e.g. refactoring with no user-facing change).
  • No QE (e.g. trivial change with high reviewer's confidence, or per agreement with the QE team).

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

@jotak jotak merged commit 8ce036a into netobserv:main Mar 18, 2026
12 of 13 checks passed
@Amoghrd
Copy link
Copy Markdown
Member

Amoghrd commented Mar 18, 2026

Yup the eBPF with nodeSelector bug is a regression, created a bug https://redhat.atlassian.net/browse/NETOBSERV-2674

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

jira/valid-reference lgtm ok-to-test To set manually when a PR is safe to test. Triggers image build on PR. qe-approved QE has approved this pull request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants