kube/controller: Prevent leaking stale endpoints #55171
Skipping CI for Draft Pull Request.
/test all
@@ -148,8 +148,18 @@ func (pc *PodCache) onEvent(old, pod *v1.Pod, ev model.Event) error {
	ip := pod.Status.PodIP
	// PodIP will be empty when pod is just created, but before the IP is assigned
	// via UpdateStatus.
	// In the case of a pod being created, we should not add it to the cache until it is ready.
	// However, if the pod *used to* have an IP, we do need to actually delete it.
	if len(ip) == 0 {
IIUC, the IP is missing when the pod is being evicted. In that case, you can allow the IP to be unset if this is a deletion event.
But the problem is that all the caches (podCache and endpointShards) are indexed by podIP, not pod name.
Isn't there a map, ipByPods map[types.NamespacedName]string?
Either way the effect is the same, but checking for a delete event first may reduce map querying.
Sorry, missed this! But yes, this PR checks in ipByPods, but even for "non deletions" (i.e. Failed), we need to check in the map. So we only do so if len(ip) == 0 already.
Change-Id: Ibe1264bfc48b4dee7e52964ad19cee9659631f1c
This looks great, I think a test covering this would go a long way. We have a lot of similar tests+infra for this exact type of test in pilot/pkg/serviceregistry/serviceregistry_test.go already that would be a good fit
@@ -149,7 +149,14 @@ func (pc *PodCache) onEvent(old, pod *v1.Pod, ev model.Event) error {
	// PodIP will be empty when pod is just created, but before the IP is assigned
	// via UpdateStatus.
	if len(ip) == 0 {
		return nil
		// However, in the case of an Eviction, the event that marks the pod as Failed may *also* have removed the IP.
		// If the pod *used to* have an IP, then we need to actually delete it.
Talking to myself...
So the event may be "update", but then we hit deleteIP anyway due to shouldPodBeInEndpoints, which translates it to a delete.
deleteIP already handles the case of "IP was removed", since it's a map of ip->[]pods.
We then fall through to notifyWorkloadHandlers and build an EP with no IP, but the WorkloadInstance is identified by name, not IP. This looks good ✔️
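The "update becomes delete" step in the comment above can be illustrated with a minimal sketch. This is not the Istio code: the `pod` struct, the `event` type, and the simplified `shouldBeInEndpoints` check are hypothetical stand-ins, and the real `shouldPodBeInEndpoints` may consider more conditions than the two assumed here.

```go
package main

import "fmt"

// event is a hypothetical stand-in for model.Event.
type event string

const (
	eventAdd    event = "add"
	eventUpdate event = "update"
	eventDelete event = "delete"
)

// pod is a hypothetical, heavily reduced view of *v1.Pod.
type pod struct {
	name   string
	ip     string
	failed bool // true once the pod has reached the terminal Failed phase
}

// shouldBeInEndpoints is an assumed simplification: a pod belongs in
// endpoints only while it is not terminal and has an IP.
func shouldBeInEndpoints(p pod) bool {
	return !p.failed && p.ip != ""
}

// effectiveEvent translates an add/update for a pod that no longer
// belongs in endpoints into a delete; other events pass through unchanged.
func effectiveEvent(ev event, p pod) event {
	if ev != eventDelete && !shouldBeInEndpoints(p) {
		return eventDelete
	}
	return ev
}

func main() {
	// An eviction can arrive as an update that marks the pod Failed
	// and clears its IP in the same event.
	evicted := pod{name: "web-0", ip: "", failed: true}
	fmt.Println(effectiveEvent(eventUpdate, evicted))

	running := pod{name: "web-1", ip: "10.0.0.7"}
	fmt.Println(effectiveEvent(eventUpdate, running))
}
```

In this sketch the evicted pod's update is reported as a delete, while the running pod's update is left alone.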
👍 👍
Fixes istio#54997 Change-Id: I59d90f775d86821060588f446e4d50d77ab97921
Backported from istio#55171
* pod.go: unexport ipByPods
* kube/controller: Prevent leaking stale endpoints
Fixes istio#54997
Change-Id: If9d45d7563e9e94ec0bbd477dfb0f57e7608e9a9
In response to a cherrypick label: #55171 failed to apply on top of branch "release-1.23":
In response to a cherrypick label: new issue created for failed cherrypick: #55195
In response to a cherrypick label: #55171 failed to apply on top of branch "release-1.23":
In response to a cherrypick label: new issue created for failed cherrypick: #55196
In response to a cherrypick label: #55171 failed to apply on top of branch "release-1.24":
In response to a cherrypick label: new issue created for failed cherrypick: #55197
* pod.go: unexport ipByPods
Change-Id: Ibe1264bfc48b4dee7e52964ad19cee9659631f1c
* kube/controller: Prevent leaking stale endpoints
Fixes istio#54997
Change-Id: I59d90f775d86821060588f446e4d50d77ab97921
Please provide a description of this PR:
When selecting pods via a ServiceEntry, there are times when the pod may not have an IP address:
The code today handles (1). However, there is also a case where the IP is removed from the Pod in the same event that marks it with a terminal (Failed) status. Because of the cross-controller interaction between the kube registry and the serviceentry registry, the WorkloadInstance needs a valid IP: the podCache and endpointShards are both keyed by address. This PR adds logic to fetch the old IP from the ipByPods cache and uses it when building the WorkloadInstance.
Fixes #54997
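The idea in the description can be sketched as follows. This is a simplified model under stated assumptions, not the actual patch: `podKey` stands in for types.NamespacedName, and `podCache`, `update`, `deleteIP`, and `onPodEvent` are hypothetical names that only mirror the two indexes discussed in the review (pod -> last known IP, and IP -> pods).

```go
package main

import "fmt"

// podKey is a hypothetical stand-in for types.NamespacedName.
type podKey struct{ namespace, name string }

// podCache is a reduced model with the two indexes from the discussion.
type podCache struct {
	ipByPod  map[podKey]string              // last known IP per pod
	podsByIP map[string]map[podKey]struct{} // pods indexed under each IP
}

func newPodCache() *podCache {
	return &podCache{
		ipByPod:  map[podKey]string{},
		podsByIP: map[string]map[podKey]struct{}{},
	}
}

// update records ip for key, dropping any stale entry under a previous IP.
func (c *podCache) update(key podKey, ip string) {
	if old, ok := c.ipByPod[key]; ok && old != ip {
		c.deleteIP(key, old)
	}
	c.ipByPod[key] = ip
	if c.podsByIP[ip] == nil {
		c.podsByIP[ip] = map[podKey]struct{}{}
	}
	c.podsByIP[ip][key] = struct{}{}
}

// deleteIP removes key from the set stored under ip and clears both indexes.
func (c *podCache) deleteIP(key podKey, ip string) {
	delete(c.podsByIP[ip], key)
	if len(c.podsByIP[ip]) == 0 {
		delete(c.podsByIP, ip)
	}
	if c.ipByPod[key] == ip {
		delete(c.ipByPod, key)
	}
}

// onPodEvent handles an event whose pod reports ip (possibly empty).
// With an empty IP it falls back to the last known IP: if there is none,
// the pod was never cached and the event is skipped; otherwise the stale
// entry is deleted using the old IP.
func (c *podCache) onPodEvent(key podKey, ip string) string {
	if ip == "" {
		old, ok := c.ipByPod[key]
		if !ok {
			return "skipped"
		}
		c.deleteIP(key, old)
		return "deleted " + old
	}
	c.update(key, ip)
	return "updated " + ip
}

func main() {
	c := newPodCache()
	key := podKey{namespace: "default", name: "web-0"}
	fmt.Println(c.onPodEvent(key, ""))         // just created, no IP yet
	fmt.Println(c.onPodEvent(key, "10.0.0.5")) // IP assigned
	fmt.Println(c.onPodEvent(key, ""))         // evicted, IP cleared in the same event
	fmt.Println(len(c.podsByIP))               // no stale entry remains
}
```

Without the fallback to the last known IP, the third event would be skipped like the first one, and the entry under 10.0.0.5 would stay in the IP-keyed index.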