-
Notifications
You must be signed in to change notification settings - Fork 246
A100: Client-side WRR slow start configuration #498
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
A100: Client-side WRR slow start configuration #498
Conversation
CC @dfawley |
@dfawley did we get a chance to check this |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This would be a nice addition (it's something we've had requests for internally at Datadog). Generally the approach sounds good to me. Ideally we would be able to support this mechanism for other load balancers supported by gRPC and Envoy (lr & rr). But given that lr & rr do not implement endpoint weights, it seems difficult to fit the feature in those balancers.
@atollena I have addressed your comments regarding the proposal, let me know if I need to make any changes in it |
@ejona86 can I start with the implementation of the feature? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be useful to see an implementation of this in one of the 3 languages. I wouldn't wait for the gRFC to be approved before starting, especially since the result can be used as a custom, private LB policy before it is merged to upstream-supported gRPC implementations.
|
||
## Implementation | ||
|
||
This will be implemented in all languages C++, Java, and Go. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a good place to add the actual implementation of your choice when you have it ready (it can be a draft or closed PR that you can re-open once the gRFC has been approved).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have added the implementation for Java, would love if you could review the changes :)
10be215
to
43b016a
Compare
@atollena I have opened the respective MR at the envoy side as well, they will merge it once the proposal is approved at our end. For this proposal I just need to handle the case for the timing part and make the respective changes in the proposal once done I will assign for re-review. |
0a3ea16
to
e490283
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This mostly seems fine to me, I just have some nits. Also, since I don't see a slow start config in the envoy WRR policy proto, I'm curious if you've (or someone else has?) sent a proposal to add it there already.
|
||
## Metrics | ||
|
||
The following metric will be exposed to help monitor the slow start behavior: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How important is this metric to include? Should we consider not adding it until it's needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally in most of the cases the slow_start_window will be applicable only for a short duration like 1-2 mins and if everything works well, it will not be called again.
I just thought that we can implement it in disabled mode, if someone wants to keeps a track of it.
We can decide this and I will make the respective changes in the proposal
@dfawley envoyproxy/envoy#40090 this is the MR I have created at envoy's end to include the slow_start_config in proto. However it requires this grpc proposal to be approved first before it is merged to the master. Then we will need to sync across repo and then we are good to implement this feature. I have added sample MR with the java implementation as well, please do check it as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing this up!
@markdroth I have made the respective changes and added a few comments for where I was confused, please help me resolve the same. |
@ejona86 @markdroth @atollena @dfawley can we re-check the latest changes and re-review the same. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the delay; I was on vacation for a few weeks, and I got Covid while I was out, and I'm still recovering...
This looks good overall, but the config content still needs work.
Please let me know if you have any questions.
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Co-authored-by: Antoine Tollenaere <[email protected]>
Co-authored-by: Doug Fawley <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
Signed-off-by: anurag.ag <[email protected]>
…osal. Adds links, clarifies slow start implementation details, and aligns with linked A24 proposal. Signed-off-by: anurag.ag <[email protected]>
e495afb
to
c346e9a
Compare
@markdroth I have made the respective changes, let me know if there are more changes required |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is looking really good! Only a few comments remaining.
For future reference, please don't force-push to a PR after reviews have started, because that makes it very difficult for a reviewer to tell what's changed since their last review pass.
Please let me know if you have any questions. Thanks!
…rmatting, and changes `aggression` type to `double`. Signed-off-by: anurag.ag <[email protected]>
… formatting fixes. Signed-off-by: anurag.ag <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great!
No description provided.