-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
grpc: move to TransientFailure
in pick_first
LB policy when all addresses are removed
#5274
Conversation
…esses are removed
pickfirst.go
Outdated
b.ResolverError(errors.New("produced zero addresses")) | ||
return balancer.ErrBadResolverState | ||
} | ||
if b.sc == nil { | ||
|
||
if b.subConn == nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe invert, early return, and outdent?
if b.subConn != nil {
b.cc.UpdateAddresses()
b.subConn.Connect() // do we really need this?
return
}
...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
And yes, we don't have to call Connect
here. UpdateAddresses
does that for us if the current active connection is not in the new address list.
pickfirst.go
Outdated
} | ||
return nil | ||
} | ||
|
||
func (b *pickfirstBalancer) UpdateSubConnState(sc balancer.SubConn, s balancer.SubConnState) { | ||
func (b *pickfirstBalancer) UpdateSubConnState(subConn balancer.SubConn, s balancer.SubConnState) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was OK with the parameter name being short, since the scope is more limited. But if you're changing that, let's also change s
to scs
or subConnState
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done. Changed to state
both here and in UpdateClientConnState
.
// TestEmptyAddrs verifies client behavior when a working connection is | ||
// removed. In pick first and round-robin, both will continue using the old | ||
// connections. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure there's still a test for this set of circumstances, but that the RPCs begin failing instead of keep using old connections.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I thought we confirmed RR was removing the subconns, so RPCs should have been failing? But this test is validating the RPCs would still succeed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I thought we confirmed RR was removing the subconns, so RPCs should have been failing? But this test is validating the RPCs would still succeed?
Yes, this test was quite broken because it was using the pick_first clientConn to verify round_robin behaviour:
// Confirm several new RPCs succeed on round robin.
for i := 0; i < 10; i++ {
if _, err := pfclient.UnaryCall(ctx, &testpb.SimpleRequest{}); err != nil {
t.Fatalf("UnaryCall(_) = _, %v; want _, nil", err)
}
time.Sleep(5 * time.Millisecond)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's make sure there's still a test for this set of circumstances, but that the RPCs begin failing instead of keep using old connections.
Yes, the newly added test verifies that the channel moves to TransientFailure
when addresses are removed and that a WaitForReady
RPC blocks until new addresses are added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this test was quite broken because it was using the pick_first clientConn to verify round_robin behaviour:
Oof
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the newly added test verifies that the channel moves to TransientFailure when addresses are removed and that a WaitForReady RPC blocks until new addresses are added.
But only with pick first -- is there one for RR too elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- clarify comments when pf receives no addresses - skip calling `Connect` when the address list changes. UpdateAddresses takes care of this - switch conditional to outindent more code
pickfirst.go
Outdated
@@ -161,7 +163,7 @@ type picker struct { | |||
err error | |||
} | |||
|
|||
func (p *picker) Pick(info balancer.PickInfo) (balancer.PickResult, error) { | |||
func (p *picker) Pick(_ balancer.PickInfo) (balancer.PickResult, error) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: delete parameter name entirely instead of using "_". And below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
This behavior is in line with what other languages do and what we should have been doing all along.
Summary of changes:
pick_first
LB policy implementation, if a resolver address removes previously provided addresses, delete the existing subConn and move channel toTransientFailure
.TestEmptyAddrs
, which is no longer required because we have tests intest/pickfirst_test.go
andtest/roundrobin_test.go
for this scenario. Also,TestEmptyAddrs
was broken anyway.test/pibkfirst_test.go
to exercise this functionality.RELEASE NOTES:
TransientFailure
inpick_first
LB policy when all addresses are removed