Skip to content

Conversation

AritraDey-Dev
Copy link
Member

@AritraDey-Dev AritraDey-Dev commented May 23, 2025

Description

fixes #16429

Reviewers

  • Ambient
  • Docs
  • Installation
  • Networking
  • Performance and Scalability
  • Extensions and Telemetry
  • Security
  • Test and Release
  • User Experience
  • Developer Infrastructure
  • Localization/Translation

@AritraDey-Dev AritraDey-Dev requested review from a team as code owners May 23, 2025 14:21
@istio-testing istio-testing added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. needs-ok-to-test labels May 23, 2025
@istio-testing
Copy link
Contributor

Hi @AritraDey-Dev. Thanks for your PR.

I'm waiting for a istio member to verify that this patch is reasonable to test. If it is, they should reply with /ok-to-test on its own line. Until that is done, I will not automatically test new commits in this PR, but the usual testing commands by org members will still work. Regular contributors should join the org to skip this step.

Once the patch is verified, the new status will be reflected by the ok-to-test label.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@AritraDey-Dev AritraDey-Dev requested a review from dhawton May 23, 2025 14:59
Copy link
Member

@Arhell Arhell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/ok-to-test

@istio-testing istio-testing added ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. and removed needs-ok-to-test labels May 24, 2025
@AritraDey-Dev
Copy link
Member Author

/retest

@AritraDey-Dev
Copy link
Member Author

/test doc.test.profile-default

@craigbox
Copy link
Contributor

/retest

@craigbox
Copy link
Contributor

Does this PR cause a meaningful change to the time the tests take to run?


while (( attempt <= max_attempts )); do
# Check if the resource exists
if kubectl get "$kind" "$name" -n "$namespace" >/dev/null 2>&1; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aren't you basically doing

kubectl wait --for=create -n $namespace $kind/$name --timeout=30s

with this whole function?

Perhaps that would be cleaner?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what some of the other functions do.

The thing here is, that the goal of this wasn't so much to ensure that the configuration was created in Kubernetes, which either method checks for, but to ensure all the sidecars were programmed with it. They generally are, but in the instance that the object is created but is not translated to XDS and sent to the other proxy servers, there's no obvious way to tell.

So kubectl wait is probably necessary but not sufficient.

@craigbox
Copy link
Contributor

Congratulations, your testing has found a bug!

In #13208 the doc was changed to not require samples/bookinfo/networking/virtual-service-all-v1.yaml, so checking for productpage and details will fail as those are not actually created.

I'll change the test to only check for the services that are used by this example. The cleanup will remove all of them, whether they exist or not.

@craigbox craigbox requested a review from a team as a code owner May 27, 2025 00:05
@craigbox
Copy link
Contributor

That one feels a little more like a flake I've seen before.

/retest

@AritraDey-Dev
Copy link
Member Author

I'll change the test to only check for the services that are used by this example. The cleanup will remove all of them, whether they exist or not.

Thank you for catching that and updating the test accordingly!

@craigbox
Copy link
Contributor

np. Before we merge, please change this function to use kubectl wait and maybe even rename the function to _wait_for_resource throughout all the tests?

@istio-testing istio-testing added size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. and removed size/M Denotes a PR that changes 30-99 lines, ignoring generated files. labels May 27, 2025
@AritraDey-Dev
Copy link
Member Author

AritraDey-Dev commented May 27, 2025

please change this function to use kubectl wait and maybe even rename the function to _wait_for_resource throughout all the tests?

I have updated the implementation to use kubectl wait, and also renamed the function to _wait_for_resource throughout all the tests.

@craigbox
Copy link
Contributor

craigbox commented May 29, 2025

https://prow.istio.io/view/gs/istio-prow/pr-logs/pull/istio_istio.io/16515/doc.test.profile-default_istio.io/1928015336469172224

https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio.io/16515/doc.test.profile-default_istio.io/1928015336469172224/artifacts/tests-setup-profile-default-a89/TestDocs/tasks/traffic-management/ingress/secure-ingress/test.sh/test.sh/_test_context/test.sh_debug.txt

Interesting error

+_wait_for_istio gateway default mygateway
+_wait_for_istio(): local kind=gateway
+_wait_for_istio(): local namespace=default
+_wait_for_istio(): local name=mygateway
+_wait_for_istio(): kubectl wait --for=create -n default gateway/mygateway --timeout 30s
+_wait_for_istio virtualservice default helloworld-v1
+_wait_for_istio(): local kind=virtualservice
+_wait_for_istio(): local namespace=default
+_wait_for_istio(): local name=helloworld-v1
+_wait_for_istio(): kubectl wait --for=create -n default virtualservice/helloworld-v1 --timeout 30s
+_wait_for_istio(): echo 'Timed out waiting for virtualservice helloworld-v1 in namespace default to be created.'
+_wait_for_istio(): exit 1

Either this is a flake, or it's confused by (a) there being two different types of Gateway (b) waiting on create where a CRD should be condition=established ?

(Again, this shows why _wait_for_istio and kubectl wait are not the same)

@craigbox
Copy link
Contributor

/test doc.test.profile-default

@AritraDey-Dev
Copy link
Member Author

/retest

@AritraDey-Dev
Copy link
Member Author

AritraDey-Dev commented May 30, 2025

Our options here are (a) always sleep 1s, and hope that is enough (b) sleep something like 5s in every test where we install httpbin. I'd like feedback from other docs maintainers here.

As we're encountering edge cases where config propagation is still flaky, I guess we could consider adding an optional advanced wait mechanism in the future (e.g., polling Envoy config or using istioctl proxy-config).
For now, I'm adding a sleep 2s just to check the logs of this test. A simple sleep 1s might be sufficient, but this is for initial validation.

@AritraDey-Dev
Copy link
Member Author

/test doc.test.profile-default

1 similar comment
@dhawton
Copy link
Member

dhawton commented Jun 1, 2025

/test doc.test.profile-default

@dhawton
Copy link
Member

dhawton commented Jun 1, 2025

These aren't flakes it seems..

@AritraDey-Dev AritraDey-Dev force-pushed the remove-istioctl-experimental-wait branch from df798e2 to 71487b5 Compare June 1, 2025 20:12
@AritraDey-Dev
Copy link
Member Author

/retest

@AritraDey-Dev
Copy link
Member Author

Screenshot from 2025-06-02 14-17-45

The real issue appears to be Istiod control plane health or authentication problems, not the _wait_for_istio helper. I believe the root cause is unrelated to this script and likely due to Istiod not being healthy or reachable during the tests.

look at the logs here...
https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio.io/16515/doc.test.profile-default_istio.io/1929316089271947264/build-log.txt

@AritraDey-Dev AritraDey-Dev requested a review from craigbox June 2, 2025 10:18
Signed-off-by: Aritra Dey <[email protected]>
Signed-off-by: Aritra Dey <[email protected]>
@AritraDey-Dev
Copy link
Member Author

/retest

@AritraDey-Dev
Copy link
Member Author

Hi @craigbox @dhawton,
Just checking in on this PR as it has been on hold for a while.Any thoughts on this?

@craigbox
Copy link
Contributor

The real issue appears to be Istiod control plane health or authentication problems, not the _wait_for_istio helper. I believe the root cause is unrelated to this script and likely due to Istiod not being healthy or reachable during the tests.

look at the logs here... https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio.io/16515/doc.test.profile-default_istio.io/1929316089271947264/build-log.txt

I read that error as "test failed, trying to clean up, error logged while cleaning up"

I'm not 100% sure how best to progress on this. Right now it seems like we've just slowed everything down by 1s per test?
Daniel asked that we put back the time check.

Signed-off-by: Aritra Dey <[email protected]>
Signed-off-by: Aritra Dey <[email protected]>
@dhawton
Copy link
Member

dhawton commented Jun 10, 2025

The real issue appears to be Istiod control plane health or authentication problems, not the _wait_for_istio helper. I believe the root cause is unrelated to this script and likely due to Istiod not being healthy or reachable during the tests.
look at the logs here... https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio.io/16515/doc.test.profile-default_istio.io/1929316089271947264/build-log.txt

I read that error as "test failed, trying to clean up, error logged while cleaning up"

I'm not 100% sure how best to progress on this. Right now it seems like we've just slowed everything down by 1s per test? Daniel asked that we put back the time check.

I agree that we slowed everything down.. unfortunately, whichout a replacement for the old x wait command of istioctl, I don't know of a way to properly apply and wait for programming, deployment, etc. It's really unfortuante that that experimental command was dropped,.

I'm ok with implementing this for now... it makes things slower but hopefully will address some of the flakes that happen because we checked or tried to use something that hadn't yet applied.

@craigbox
Copy link
Contributor

Ok, approving, and we'll see what the impact is to other PRs.

@istio-testing istio-testing merged commit 7d560ea into istio:master Jun 10, 2025
13 checks passed
@AritraDey-Dev AritraDey-Dev deleted the remove-istioctl-experimental-wait branch June 11, 2025 07:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ok-to-test Set this label allow normal testing to take place for a PR not submitted by an Istio org member. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

remove use of istioctl x wait
5 participants