Update rule management to avoid sporadic 503 errors #4039
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: shraddhabang.
"resourceID", resLR.ID(),
"arn", awssdk.ToString(sdkLR.ListenerRule.RuleArn))
return nil
func (m *defaultListenerRuleManager) SetRulePriorities(ctx context.Context, unmatchedSDKLRs []ListenerRuleWithTags, lastAvailablePriority int32) (int32, error) {
nit: unmatchedSDKLRs seems to be the wrong name. Aren't these listener rules that are matched but just at the wrong priority?
I read further into the implementation; it seems we use this function for both matched and unmatched rules.
Should we move "elasticloadbalancing:SetRulePriorities" to the statement that has the condition, like the other modify calls?
"Condition": {
"Null": {
"aws:ResourceTag/elbv2.k8s.aws/cluster": "false"
}
}
matchedResAndSDKLRs, unmatchedResLRs, unmatchedSDKLRs := matchResAndSDKListenerRules(resLRs, sdkLRs)
// matchedResAndSDKLRsBySettings : A slice of matched resLR and SDKLR rule pairs that have matching settings like actions and conditions
// unmatchedResLRs : A slice of resLR) that do not have a corresponding match in the sdkLRs. These rules need to be created on the load balancer.
Typo: "resLR)"
if err != nil {
return err
}
resLR.SetStatus(lrStatus)
}
for _, resAndSDKLR := range matchedResAndSDKLRs {
// Update existing listener rules on the load balancer for their tags
for _, resAndSDKLR := range matchedResAndSDKLRsBySettings {
lsStatus, err := s.lrManager.Update(ctx, resAndSDKLR.resLR, resAndSDKLR.sdkLR)
Looks like lrManager.Update is used only for updating tags with your change? Should we rename the method to make that clear?
sdkLR: sdkLR,
})
}
unmatchedSDKLRs = append(unmatchedSDKLRs[:i], unmatchedSDKLRs[i+1:]...)
nit: I am not a fan of modifying the input of a function like this. I see that we do this in other parts of the code, but overall I think it's an anti-pattern. Using this pattern also means you have to make multiple modifications to i to get the index correct.
I would prefer to just allocate a new array to use.
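For illustration, here is a minimal sketch of the non-mutating alternative suggested in this comment. The `rule` type and the `splitMatched` function are hypothetical stand-ins, not the controller's actual types or code. Instead of deleting from the input slice while iterating over it, the sketch tracks which existing rules were consumed and collects the leftovers into freshly allocated slices.

```go
package main

import "fmt"

// rule is a hypothetical stand-in for a listener rule; only the fields
// needed for matching are modeled here.
type rule struct {
	Priority int32
	Settings string // stands in for the rule's actions and conditions
}

// splitMatched pairs each desired rule with the first unused existing rule
// that has the same settings. It never modifies its inputs: leftovers are
// collected into newly allocated slices.
func splitMatched(desired, existing []rule) (matched [][2]rule, unmatchedDesired, unmatchedExisting []rule) {
	used := make([]bool, len(existing)) // marks existing rules already paired
	for _, d := range desired {
		found := false
		for i, e := range existing {
			if !used[i] && e.Settings == d.Settings {
				used[i] = true
				matched = append(matched, [2]rule{d, e})
				found = true
				break
			}
		}
		if !found {
			unmatchedDesired = append(unmatchedDesired, d)
		}
	}
	// Anything not consumed above is an existing rule with no desired match.
	for i, e := range existing {
		if !used[i] {
			unmatchedExisting = append(unmatchedExisting, e)
		}
	}
	return matched, unmatchedDesired, unmatchedExisting
}

func main() {
	desired := []rule{{1, "a"}, {2, "c"}}
	existing := []rule{{1, "a"}, {2, "b"}}
	m, ud, ue := splitMatched(desired, existing)
	// The input slice keeps its original length because nothing was removed from it.
	fmt.Println(len(m), len(ud), len(ue), len(existing))
}
```

Because the `used` markers replace the in-place `append(s[:i], s[i+1:]...)` deletion, no index adjustment is needed inside the loop, at the cost of one extra boolean slice.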
break
}
}
if !found {
I think this will still result in 503s (please correct me if I'm wrong).
Basically, from my read of the code, when a user changes their conditions or actions, we delete the ListenerRule and re-create it. If we're unlucky, the delete can propagate while the create is delayed a bit, which causes the 503 errors.
Ideally, we can use modify-rule (https://docs.aws.amazon.com/cli/latest/reference/elbv2/modify-rule.html) to detect these rules with changed actions / conditions and modify them in place, rather than deleting and re-creating them.
PR needs rebase. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
Issue
#3816
Description
The root cause of the above issue was identified as a race condition in the ALB's rule management process. The controller uses the ModifyRule API sequentially to add any new rule at an existing priority, which results in the loss of the older rule at that priority. When the first API call to modify rules is received, the ALB waits 10 seconds to batch all subsequent changes before deploying them to the data plane. If some rule modification calls arrive after the initial 10-second window, they are applied in the next batch, leading to a brief period during which an arbitrary rule may not exist, causing 404 errors.
The proposed improvement to the rule management logic uses the SetRulePriorities API to first push the unwanted rules down in the listener instead of deleting them. This creates gaps in the rule order where any new or updated rules can be added. Once the new rules are successfully added, the controller deletes all the unwanted rules. This ensures that no rules are lost before the new or updated rules are present, which avoids the sporadic 404 errors caused by race conditions.
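The ordering described above can be sketched as a small planning function. This is an illustration under stated assumptions, not the PR's implementation: the `rule` and `step` types and the `planUpdate` function are hypothetical, and parking unwanted rules by counting down from a `lastAvailablePriority` of 50000 (the highest priority an ALB listener rule can have) is an assumed strategy. The sketch only plans the steps in memory and makes no AWS calls.

```go
package main

import "fmt"

// rule is a hypothetical, simplified listener rule.
type rule struct {
	ID       string
	Priority int32
}

// step is one planned operation against the listener.
type step struct {
	Action   string // "reprioritize", "create" or "delete"
	RuleID   string
	Priority int32
}

// planUpdate orders operations as in the description: first park the unwanted
// rules at the bottom of the listener, then create the new rules in the freed
// priorities, and only then delete the parked rules. It returns the plan and
// the next free parking priority.
func planUpdate(unwanted, toCreate []rule, lastAvailablePriority int32) ([]step, int32) {
	var steps []step
	next := lastAvailablePriority // assumption: parking slots are handed out counting down
	parked := make([]rule, 0, len(unwanted))
	for _, r := range unwanted {
		steps = append(steps, step{"reprioritize", r.ID, next})
		parked = append(parked, rule{r.ID, next})
		next--
	}
	for _, r := range toCreate {
		steps = append(steps, step{"create", r.ID, r.Priority})
	}
	// Deletion comes last, so traffic always has a rule to match.
	for _, r := range parked {
		steps = append(steps, step{"delete", r.ID, r.Priority})
	}
	return steps, next
}

func main() {
	unwanted := []rule{{"old-a", 1}, {"old-b", 2}}
	toCreate := []rule{{"new-a", 1}, {"new-b", 2}}
	steps, next := planUpdate(unwanted, toCreate, 50000)
	for _, s := range steps {
		fmt.Printf("%-12s %-6s priority=%d\n", s.Action, s.RuleID, s.Priority)
	}
	fmt.Println("next free parking priority:", next)
}
```

The point of the ordering is that every delete appears after every create, so at no moment in the plan is a priority left without either the old or the new rule.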