LoadBalancer keyed on slot instead of primary node, not reset on NodesManager.initialize()
#3683
Pull Request check-list
Description of change
- `LoadBalancer` now uses `slot_to_idx` instead of `primary_to_idx`, keying on the slot rather than the primary node name.
- When `NodesManager` resets in `initialize()`, it no longer calls `LoadBalancer.reset()`, which would clear the `slot_to_idx` dictionary.
- `TestNodesManager.test_load_balancer` updated accordingly.

As noted in #3681, resetting the load balancer on `NodesManager.initialize()` causes the index associated with the primary node to reset to 0. If a `ConnectionError` or `TimeoutError` is raised by an attempt to connect to a primary node, `NodesManager.initialize()` is called and the load balancer's index for that node resets to 0. The next attempt in the retry loop therefore does not move on from the primary node to a replica node (with index > 0) as expected, but instead retries the primary node again (and presumably raises the same error).

Calling `NodesManager.initialize()` on `ConnectionError` or `TimeoutError` is the valid strategy, and the primary node's host will often be replaced in tandem with the events that cause these errors (e.g. when a primary node is deleted and then recreated in Kubernetes), so keying the `LoadBalancer` dictionary on the primary node's name (`host:port`) doesn't feel appropriate. Keying the dictionary on the Redis Cluster's slot seems a better strategy: the `server_index` corresponding to a given `slot` doesn't need to be reset to 0 on `NodesManager.initialize()`, because the `slot` isn't expected to change; only the `host:port` would require that. The `slot` can instead maintain its "state" even when the `NodesManager` is reinitialized, thus resolving #3681.

With the fix in this PR implemented, the output of the loop from #3681 becomes what is expected when the primary node goes down (the load balancer continues to the next node on a `TimeoutError`):
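To illustrate the slot-keyed approach, here is a minimal, hypothetical sketch (not the actual redis-py implementation; names loosely mirror `LoadBalancer` and `slot_to_idx`) showing that a slot's round-robin index survives a re-initialization that would have reset a primary-keyed balancer:

```python
class LoadBalancer:
    """Minimal sketch of a round-robin balancer keyed on cluster slot.

    Hypothetical illustration only: because the key is the slot number
    rather than the primary's host:port, the index is stable even when
    NodesManager.initialize() remaps the primary to a new host.
    """

    def __init__(self, start_index: int = 0) -> None:
        self.slot_to_idx: dict[int, int] = {}
        self.start_index = start_index

    def get_server_index(self, slot: int, list_size: int) -> int:
        # Return the current index for this slot, then advance it
        # (round-robin over the list_size nodes serving the slot).
        idx = self.slot_to_idx.setdefault(slot, self.start_index)
        self.slot_to_idx[slot] = (idx + 1) % list_size
        return idx


# Three nodes serve slot 12182: index 0 is the primary, 1-2 are replicas.
lb = LoadBalancer()
assert lb.get_server_index(12182, 3) == 0  # primary tried first

# Simulate NodesManager.initialize() after a TimeoutError: with this fix
# the balancer is NOT reset, so the next retry advances to a replica
# instead of hitting the (down) primary at index 0 again.
assert lb.get_server_index(12182, 3) == 1  # replica
assert lb.get_server_index(12182, 3) == 2  # next replica
assert lb.get_server_index(12182, 3) == 0  # wraps around
```

With the pre-fix, primary-keyed design, the reset in `initialize()` would clear the dictionary between attempts, so every retry would start over at index 0 and never reach a replica.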