anuragagarwal561994

No description provided.

@ejona86
Member

ejona86 commented Jun 2, 2025

CC @dfawley

@anuragagarwal561994
Author

@dfawley, did you get a chance to check this?

Contributor

@atollena atollena left a comment

This would be a nice addition (it's something we've had requests for internally at Datadog). Generally the approach sounds good to me. Ideally we would be able to support this mechanism for other load balancers supported by gRPC and Envoy (lr & rr). But given that lr & rr do not implement endpoint weights, it seems difficult to fit the feature in those balancers.

@anuragagarwal561994
Author

@atollena I have addressed your comments on the proposal. Let me know if I need to make any further changes to it.

@anuragagarwal561994
Author

@ejona86 @atollena I have addressed the new comments. Requesting a review for any further modifications needed.

@anuragagarwal561994
Author

@ejona86 can I start with the implementation of the feature?

@anuragagarwal561994
Author

@ejona86 @atollena would it be possible to review the proposal this week so that I can begin implementing the feature?

Contributor

@atollena atollena left a comment

It would be useful to see an implementation of this in one of the 3 languages. I wouldn't wait for the gRFC to be approved before starting, especially since the result can be used as a custom, private LB policy before it is merged to upstream-supported gRPC implementations.


## Implementation

This will be implemented in all languages C++, Java, and Go.
Contributor

This is a good place to add the actual implementation of your choice when you have it ready (it can be a draft or closed PR that you can re-open once the gRFC has been approved).

Author

I have added the implementation for Java, would love if you could review the changes :)

grpc/grpc-java#12200

@anuragagarwal561994 anuragagarwal561994 changed the title from "Client-side WRR slow start configuration" to "A100: Client-side WRR slow start configuration" on Jul 1, 2025
@anuragagarwal561994 anuragagarwal561994 force-pushed the wrrslowstart branch 2 times, most recently from 10be215 to 43b016a on July 1, 2025 12:39
@anuragagarwal561994
Author

@atollena I have opened the corresponding MR on the Envoy side as well; they will merge it once the proposal is approved on our end. For this proposal, I just need to handle the timing case and update the proposal accordingly. Once that is done, I will request a re-review.

@anuragagarwal561994
Author

@atollena @ejona86 I have created a rough implementation, but I have not updated the test cases yet.

I have also updated the proto on the Envoy side, but merging it requires this proposal to be approved first; the proto sync could then happen in a separate MR.

How should we go about this?

Contributor

@atollena atollena left a comment

👍

Member

@dfawley dfawley left a comment

This mostly seems fine to me, I just have some nits. Also, since I don't see a slow start config in the envoy WRR policy proto, I'm curious if you've (or someone else has?) sent a proposal to add it there already.


## Metrics

The following metric will be exposed to help monitor the slow start behavior:
Member

How important is this metric to include? Should we consider not adding it until it's needed?

Ideally, in most cases the slow_start_window will apply only for a short duration, such as 1-2 minutes, and if everything works well it will not apply again.

I just thought we could implement the metric disabled by default, in case someone wants to keep track of it.

We can decide this, and I will make the corresponding changes in the proposal.

@anuragagarwal561994
Author

@dfawley envoyproxy/envoy#40090 is the MR I have created on the Envoy side to add slow_start_config to the proto.

However, it requires this gRPC proposal to be approved before it can be merged to master.

We will then need to sync the proto across repos, after which we can implement this feature. I have also added a sample MR with the Java implementation; please take a look at it too.

Member

@markdroth markdroth left a comment

Thanks for writing this up!

@anuragagarwal561994
Author

@markdroth I have made the requested changes and added a few comments where I was confused; please help me resolve them.

@anuragagarwal561994
Author

@ejona86 @markdroth @atollena @dfawley could you check the latest changes and re-review them?

Member

@markdroth markdroth left a comment

Sorry for the delay; I was on vacation for a few weeks, and I got Covid while I was out, and I'm still recovering...

This looks good overall, but the config content still needs work.

Please let me know if you have any questions.

anuragagarwal561994 and others added 16 commits August 28, 2025 14:06
Signed-off-by: anurag.ag <[email protected]>
Co-authored-by: Antoine Tollenaere <[email protected]>
…osal. Adds links, clarifies slow start implementation details, and aligns with linked A24 proposal.

Signed-off-by: anurag.ag <[email protected]>
@anuragagarwal561994
Author

@markdroth I have made the requested changes. Let me know if any more changes are required.

Member

@markdroth markdroth left a comment

This is looking really good! Only a few comments remaining.

For future reference, please don't force-push to a PR after reviews have started, because that makes it very difficult for a reviewer to tell what's changed since their last review pass.

Please let me know if you have any questions. Thanks!

…rmatting, and changes `aggression` type to `double`.

Signed-off-by: anurag.ag <[email protected]>
Member

@markdroth markdroth left a comment

Looks great!

@anuragagarwal561994
Author

@ejona86 @atollena @dfawley can you help me review and approve this proposal?
