-
-
Notifications
You must be signed in to change notification settings - Fork 315
fix: do not assign capacity for pending streams #860
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
fix: do not assign capacity for pending streams #860
Conversation
34de0d6
to
98d1c12
Compare
I am hitting this problem in production, and I've spent weeks tracking it down. I am running a tonic GRPC client that is creating thousands of streams to a server that limits streams to 400, and I keep hitting a deadlock. I caught logs where h2 gives capacity to streams that are pending_open, and it triggers a deadlock where no more progress is made. I ended up with a patch to fix it that looks almost exactly like this, which led to me finding this PR. I support this PR. Here's my question though: Is it always a bug to assign capacity to a stream that is pending_open? If so, do you think there should be a check inside of |
Just wanted to chime in here, I think this fix sounds good, but part of me worries that it has deeper repercussions. (Like, does I hope to find time to thoroughly understand how this affects any other internal invariants... But I appreciate any reports that testing this out with live traffic improved things. |
98d1c12
to
716dc1e
Compare
`Prioritize::send_data` has a check to prevent assigning capacity to streams that are not yet open. Assigning flow control window to pending streams could starve already open streams. This change adds a similar check to `Prioritize::reserve_capacity`. Test `capacity_not_assigned_to_unopened_streams` in `flow_control.rs` demonstrates the fix. A number of other tests must be changed because they were assuming that pending streams immediately received connection capacity. This may be related to hyperium#853.
716dc1e
to
0980a62
Compare
I can confirm that I have tried out this change with live traffic and it improved things. But to demonstrate things even easier, I have created PR #863 that demonstrates the bug in this PR. The requirements to trigger this deadlock are:
It's pretty easy to hit the case where all of the connection capacity ends up in pending_open streams, and then a deadlock occurs. |
We are getting similar bug reports in production related to these same deadlock issues reported here, so any help to review these changes would be appreciated! |
Prioritize::send_data
has a check to prevent assigning capacity to streams that are not yet open. Assigning flow control window to pending streams could starve already open streams.This change moves the
stream.is_pending_open
check intoPrioritize::try_assign_capacity
. This prevents capacity from ever being assigned to pending streams. In particular, neither a client'sreserve_capacity
call nor a remote's initial window size adjustment will assign capacity to pending streams.Tests
capacity_not_assigned_to_unopened_streams
andnew_initial_window_size_capacity_not_assigned_to_unopened_streams
inflow_control.rs
demonstrates the fix.A number of other tests must be changed because they were assuming that pending streams immediately received connection capacity.
This may be related to #853.