GEP 3388 Retry Budget API Design #3573
Hi @ericdbishop. Thanks for your PR. I'm waiting for a kubernetes-sigs member to verify that this patch is reasonable to test. If it is, they should reply with Once the patch is verified, the new status will be reflected by the I understand the commands that are listed here. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.
75e5d0b to 78667d6
Add API design for GEP-3388 Retry Budgets
Planning to expand on the doc comments further during the implementation phase, but put together an initial draft of possible Go struct and YAML representations. The tricky part I tried to illustrate in the YAML examples is the potential complexity of applicability to both N/S (ingress) and E/W (mesh) traffic, indicating the need for some sort of source discriminator (the Open questions/comments:
/ok-to-test
Thanks @mikemorris! I've talked with @mlavacca, @shaneutt, and @youngnick and we're approving this exception. Of course we would strongly prefer that this gets in sooner to leave time for API types, docs, conformance tests, etc. Let us know if there's anything we can help with!
Yes, just stay in touch! Feel free to reach out on Slack if you're feeling stuck 🫡
@kflynn the Envoy retry budget doesn't care about connections. You're right to be confused by that. It only tracks concurrent requests and the ratio of those that are retries. FWIW, I agree that modifying Envoy in the way folks here are suggesting (adding a time interval parameter) will work and be a fairly innocuous change. I said something similar in envoyproxy/envoy#30205 (comment). I can help with making that change in Envoy whenever you folks flesh out the specifics here.
@mikemorris @ericdbishop any updates on this one? Are you still trying to get this in to v1.3?
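For readers following along, the concurrency-based behavior described above can be sketched roughly as follows. This is a simplified, hypothetical model and not Envoy's actual implementation: the type `concurrencyBudget`, its field names, and the numbers in `main` are all assumptions made for illustration.

```go
package main

import "fmt"

// concurrencyBudget is a hypothetical, simplified model of a retry budget
// that limits retries to a percentage of the requests currently in flight.
// It is not Envoy's implementation; all names here are illustrative.
type concurrencyBudget struct {
	budgetPercent  float64 // share of active requests that may be retries
	minRetries     int     // concurrent retries always permitted, even at low traffic
	activeRequests int     // requests currently in flight
	activeRetries  int     // retries currently in flight
}

// allowRetry reports whether one more concurrent retry fits within the budget.
func (b *concurrencyBudget) allowRetry() bool {
	limit := int(float64(b.activeRequests) * b.budgetPercent / 100.0)
	if limit < b.minRetries {
		limit = b.minRetries
	}
	return b.activeRetries < limit
}

func main() {
	b := &concurrencyBudget{budgetPercent: 20, minRetries: 3, activeRequests: 100, activeRetries: 19}
	fmt.Println(b.allowRetry()) // 19 retries in flight, limit is 20

	b.activeRetries = 20
	fmt.Println(b.allowRetry()) // budget exhausted
}
```

Note that nothing in this model refers to a time window, which is why an interval-based budget would need an extra parameter along the lines discussed in this thread.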
Yes, I've just been a bit swamped with other tasks this week and I think @ericdbishop has been busy on-call. Opened ericdbishop#2 against Eric's branch with revisions in response to discussion in comments. After merging that, I think @ericdbishop would just need to accept @kflynn's minor phrasing tweaks and then we would be in good shape!
Gep 3388 revisions
Co-authored-by: Flynn <[email protected]>
@mikemorris @robscott Done, it's been a low-bandwidth week for me, apologies.
Co-authored-by: Flynn <[email protected]>
@robscott @kflynn @howardjohn I think this should be ready for final review now!
Looks good to me, thank you both! |
Thanks @ericdbishop and @mikemorris! A few follow up questions we can cover in the next phase, but otherwise LGTM.
/approve
// Support: Extended
//
// +optional
BudgetPercent *Int `json:"budgetPercent,omitempty"`
Can save this for the follow up, but we'll need some more comprehensive validation here, at least a min and max
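As a rough sketch of what min/max validation could look like: the `+kubebuilder:validation:Minimum` and `+kubebuilder:validation:Maximum` markers are standard kubebuilder CRD markers, but the 0 to 100 bounds, the `retryBudget` type, and the `validateBudgetPercent` helper are assumptions for illustration, not what the PR specifies.

```go
package main

import "fmt"

// retryBudget is a hypothetical stand-in for the struct under review.
type retryBudget struct {
	// BudgetPercent bounds are assumed to be 0-100 for this sketch.
	//
	// +optional
	// +kubebuilder:validation:Minimum=0
	// +kubebuilder:validation:Maximum=100
	BudgetPercent *int `json:"budgetPercent,omitempty"`
}

// validateBudgetPercent mirrors the assumed marker bounds in plain Go.
// A nil pointer means the field was omitted, which is allowed.
func validateBudgetPercent(p *int) error {
	if p == nil {
		return nil
	}
	if *p < 0 || *p > 100 {
		return fmt.Errorf("budgetPercent must be between 0 and 100, got %d", *p)
	}
	return nil
}

func main() {
	valid, invalid := 20, 150
	for _, rb := range []retryBudget{{BudgetPercent: &valid}, {BudgetPercent: &invalid}, {}} {
		fmt.Println(validateBudgetPercent(rb.BudgetPercent))
	}
}
```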
// CommonRetryPolicy defines the configuration for when to retry a request.
//
type CommonRetryPolicy struct {
What's the minimum viable set of fields here for an implementation to say that they support retry budgets?
// Implementations SHOULD retry on connection errors (disconnect, reset, timeout,
// TCP failure) if a retry stanza is configured.
We'll likely need to add some guidance in the spec for how this overlaps with the other retry config already in the API.
Oh this might've just been an inadvertent copy-paste, was talking with @kflynn earlier about how we should add guidance on interaction with HTTPRoute retry clause - my initial thought is a retry budget is a constraint, but wouldn't on its own imply requests should be retried - thoughts?
Agreed, as a retry budget implies an overall constraint, and additionally application developers will decide which routes should require retries based on idempotency of requests, etc.
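The distinction drawn in this thread (the route decides whether to retry, the budget only caps how much) might be sketched like this. The function `decideRetry` and its parameters are hypothetical and only illustrate the proposed semantics; they are not part of the API under review.

```go
package main

import "fmt"

// decideRetry is a hypothetical helper illustrating the proposed semantics:
// a retry budget never triggers a retry on its own, it can only veto one
// that the route's retry configuration has already asked for.
func decideRetry(routeRetryConfigured, failureRetryable, budgetAvailable bool) bool {
	if !routeRetryConfigured || !failureRetryable {
		return false // no retry requested, so the budget is irrelevant
	}
	return budgetAvailable // the budget acts purely as a constraint
}

func main() {
	fmt.Println(decideRetry(false, true, true)) // budget alone does not imply retries
	fmt.Println(decideRetry(true, true, false)) // route wants a retry, budget is exhausted
	fmt.Println(decideRetry(true, true, true))  // route wants a retry and budget allows it
}
```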
// RequestRate expresses a rate of requests over a given period of time.
//
type RequestRate struct {
Are both of the fields in here meant to be optional? Surely at least one of them needs to be set? Can they both be set at the same time?
Yes, both would likely be set at the same time. I'm not sure if there's an obvious enough default for either field on its own, so maybe both should be required, and what should actually be optional and could have a reasonable default is just the MinRetryRate field of this type in CommonRetryPolicy?
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ericdbishop, robscott The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
/hold cancel
What type of PR is this?
/kind gep
What this PR does / why we need it:
Following up on #3488, where GEP 3388 was moved to provisional and the general goals and some potential designs for retry budgets within Gateway API were agreed upon. This PR will present multiple API implementations based off of previous discussion.
Which issue(s) this PR fixes:
Fixes #3388
Does this PR introduce a user-facing change?: