Custom taints and toleration node operation #9920
273c9cf
to
ee87c8d
Compare
PR validation
Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: vkathole-t26
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation
Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation
Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: vkathole-o1
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: vkathole-o1
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation
Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
if "config" in subscription_data.get("spec", {}):
    params = '[{"op": "remove", "path": "/spec/config"}]'
    sub_obj.patch(resource_name=sub, params=params, format_type="json")
    time.sleep(180)
Aren't we also supposed to remove the tolerations from the rook-ceph operator configmap and ocsinitializations.ocs.openshift.io?
They are no longer added there; we now add the tolerations to the storagecluster YAML only.
I see you have added tolerations on the configmap and ocsinit in apply_custom_taint_and_toleration(). Since this test can run on ODF versions both above and below 4.16, the cleanup should match that. Removing the toleration only from the storagecluster might not be right for versions < 4.16.
Please check and add cleanup accordingly.
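One way to read the request above is a teardown that picks its cleanup targets from the ODF version. The sketch below is only an illustration: the function names and the target labels are hypothetical, and the assumption that 4.16 itself belongs to the "storagecluster only" side is ours, not something stated in this thread.

```python
def parse_version(version):
    """Turn '4.16' or '4.16.2' into a comparable (major, minor) tuple of ints."""
    return tuple(int(part) for part in version.split(".")[:2])


def tolerations_cleanup_targets(odf_version):
    """Return labels of the resources whose tolerations teardown should revert.

    Assumption (not confirmed by the PR): from 4.16 on, tolerations are set
    only on the storagecluster and the subscriptions; older versions also set
    them on the rook-ceph operator configmap and the ocsinitialization.
    """
    targets = ["storagecluster", "subscription"]
    if parse_version(odf_version) < (4, 16):
        # Hypothetical labels for the two extra resources named by the reviewer.
        targets += ["rook-ceph-operator-configmap", "ocsinitialization"]
    return targets
```

The teardown could then loop over the returned labels and revert each resource, so a run on an older version does not leave tolerations behind.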
tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
def test_negative_custom_taint(self, nodes):
    """
    Test runs the following steps
    1. Taint odf nodes with non-ocs taint
Suggested change:
- 1. Taint odf nodes with non-ocs taint
+ 1. Taint odf worker nodes with non-ocs taint
Done
assert not wait_for_pods_to_be_running(
    timeout=120, sleep=15
), "Pods are running when they should not be."
Are we expecting all pods to go into a bad state?
I see we apply tolerations on the storagecluster and on subscriptions other than ODF. Are we sure that no pods will be running if the toleration is simply not applied properly on the subscription while it is set properly on the storagecluster? Please check the scenario again. If we set the toleration properly on the storagecluster, a few pods should be up and running.
No, we are not expecting all pods to be in a bad state; some will be Pending and some Running. wait_for_pods_to_be_running fails if even one pod is in Pending state, so assert not wait_for_pods_to_be_running(...) works fine for that.
Instead of checking all the pods, we can check only the pods that are affected by the subscription changes. This is not a blocker for the merge, though.
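The narrower check suggested above could look roughly like the following. This is a sketch over plain dictionaries, not ocs-ci's pod objects; the helper name, the "name"/"phase" keys, and the pod names are all made up for illustration.

```python
def pods_not_running(pods, name_prefixes):
    """Return names of pods matching any prefix whose phase is not Running."""
    prefixes = tuple(name_prefixes)  # str.startswith accepts a tuple of prefixes
    return [
        pod["name"]
        for pod in pods
        if pod["name"].startswith(prefixes) and pod["phase"] != "Running"
    ]


# Hypothetical snapshot: only the operator pod lacks the toleration.
pods = [
    {"name": "rook-ceph-osd-0-abc", "phase": "Running"},
    {"name": "odf-operator-controller-manager-xyz", "phase": "Pending"},
    {"name": "noobaa-core-0", "phase": "Running"},
]
affected = pods_not_running(pods, ["odf-operator"])
```

Asserting that `affected` is non-empty ties the negative test to the pods the subscription change is expected to break, rather than to any Pending pod in the namespace.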
PR validation
Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation on existing cluster
Cluster Name: vkathole-m4
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_negative_custom_taint
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: vkathole The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
Signed-off-by: Vishakha Kathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
49b3b71
to
bc093fd
Compare
Signed-off-by: vkathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
Signed-off-by: vkathole <[email protected]>
PR validation on existing cluster
Cluster Name: vkathole-m11
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_non_ocs_taint_and_tolerations tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_reboot_on_tainted_node tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_negative_custom_taint
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
Signed-off-by: vkathole <[email protected]>
PR validation on existing cluster
Cluster Name: vkathole-m11
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_non_ocs_taint_and_tolerations tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_reboot_on_tainted_node tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_negative_custom_taint
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
PR validation
Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_non_ocs_taint_and_tolerations tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_reboot_on_tainted_node tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_negative_custom_taint
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job UNSTABLE (some or all tests failed).
Signed-off-by: vkathole <[email protected]>
ocs_ci/helpers/helpers.py
@@ -5392,3 +5401,130 @@ def verify_performance_profile_change(perf_profile):
    ), f"Performance profile is not updated successfully to {perf_profile}"
    logger.info(f"Performance profile successfully got updated to {perf_profile} mode")
    return True


def apply_custom_taint_and_toleration(taint_label):
You can define the custom taint label here, in the argument itself.
Done
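If the comment above means giving the argument a default value, the idea can be sketched as below. The helper name, the default label "xyz", the value "true", and the NoSchedule effect are assumptions for illustration; only the key/operator/value/effect field names come from the Kubernetes toleration schema.

```python
def build_toleration(taint_label="xyz"):
    """Build a toleration entry matching a custom taint.

    "xyz" is a placeholder default, not the label used in this PR.
    """
    return {
        "key": taint_label,
        "operator": "Equal",
        "value": "true",         # assumed taint value
        "effect": "NoSchedule",  # assumed taint effect
    }
```

Callers that do not care about the label can then omit it, while tests that need a specific taint can still pass their own.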
PR validation
Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_non_ocs_taint_and_tolerations tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_reboot_on_tainted_node tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_negative_custom_taint
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Signed-off-by: vkathole <[email protected]>
Unknown PR validation on existing cluster
Cluster Name: vkathole-m11
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_non_ocs_taint_and_tolerations tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_reboot_on_tainted_node tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_negative_custom_taint
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job state: ABORTED.
PR validation
Cluster Name:
Cluster Configuration:
PR Test Suite: tier4b
PR Test Path: tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_non_ocs_taint_and_tolerations tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_reboot_on_tainted_node tests/functional/z_cluster/nodes/test_non_ocs_taint_and_toleration.py::TestNonOCSTaintAndTolerations::test_negative_custom_taint
Additional Test Params:
OCP VERSION: 4.17
OCS VERSION: 4.17
tested against branch: master
Job FAILED (installation failed, tests not executed).
logger.info(
    "After adding toleration wait for some time for pods to respin as expected"
)
time.sleep(300)
Re-initializing the pod variable would be a better approach than a sleep. Otherwise, reduce the sleep time and the timeout in line 214.
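A common alternative to a fixed five-minute sleep is to poll a condition and return as soon as it holds. The helper below is a generic sketch, not an ocs-ci API; its name and parameters are hypothetical, and the condition callable stands in for whatever re-fetches the pod list and checks that the pods have respun.

```python
import time


def wait_until(condition, timeout=300, interval=10,
               sleep=time.sleep, clock=time.monotonic):
    """Poll condition() until it returns True or timeout seconds elapse.

    Returns True as soon as the condition holds, False on timeout.
    sleep and clock are injectable so the helper can be tested without waiting.
    """
    deadline = clock() + timeout
    while True:
        if condition():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

With this shape, the wait ends early on a healthy cluster and the timeout only bounds the worst case.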
from tests.functional.z_cluster.nodes.test_node_replacement_proactive import (
    delete_and_create_osd_node,
    select_osd_node_name,
)
We might have to modify the @pytest.mark.polarion_id("OCS-2705") markers. I see many tests have been added to this class, so the markers should be updated accordingly.
No description provided.