fix(cluster-hashring): heartbeat/full-sync never reconnect after Redis endpoint change #237
Merged
dcadenas merged 3 commits on May 19, 2026
Pushed follow-up cleanup in commit bdbc033. What changed:
Validation I ran on the updated branch:
cargo test --workspace --verbose still hits the existing local Postgres setup failure in atproto_http_test (PoolTimedOut in api/tests/common/mod.rs), so I'm using the GitHub test check as the merge gate for the full suite on this push.
NotThatKindOfDrLiz approved these changes on May 19, 2026
Final pass looks good. The follow-up commit only tightens the Redis reconnect/IAM documentation and fixes the stale registry connection comment. GitHub checks are green on bdbc033, and I do not see any remaining blockers.
Summary
Redis command connections could get stuck after Redis closed an old socket.
This change makes cluster membership and API Redis helper commands reconnect, so later heartbeats, full syncs, polling writes, and admin Redis calls can recover without restarting the process.
Motivation
The root cause was that command paths reused long-lived multiplexed Redis connections.
When Redis closed one of those sockets, later commands kept using the broken connection.
The fix routes these command paths through redis::aio::ConnectionManager.
Related Issue
Testing
I tested the broken-socket case directly and ran the Rust workspace checks.
The regression tests kill the active Redis client connection, confirm the first command can see the closed socket, then confirm the next command reconnects and succeeds.
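The flow those regression tests exercise can be sketched with a small standard-library model. This is only an illustration under stated assumptions: `FakeServer` and `ReconnectingConn` are hypothetical names that stand in for Redis and for a reconnecting connection wrapper, and none of this is the redis crate's API or the PR's actual test code.

```rust
// Hypothetical model only: these types are illustrative, not the redis crate.
use std::cell::Cell;
use std::rc::Rc;

/// Stands in for the Redis server and tracks which client connection is live.
struct FakeServer {
    live_conn: Cell<Option<u64>>,
    next_id: Cell<u64>,
}

impl FakeServer {
    fn new() -> Rc<Self> {
        Rc::new(FakeServer { live_conn: Cell::new(None), next_id: Cell::new(1) })
    }

    /// Accepts a new client connection and returns its id.
    fn accept(&self) -> u64 {
        let id = self.next_id.get();
        self.next_id.set(id + 1);
        self.live_conn.set(Some(id));
        id
    }

    /// Simulates the server closing the active client socket.
    fn kill_active(&self) {
        self.live_conn.set(None);
    }
}

/// A connection wrapper that reconnects after it observes a dead socket.
struct ReconnectingConn {
    server: Rc<FakeServer>,
    conn_id: u64,
}

impl ReconnectingConn {
    fn connect(server: Rc<FakeServer>) -> Self {
        let conn_id = server.accept();
        ReconnectingConn { server, conn_id }
    }

    /// Runs one command. A dead socket fails this call, then a fresh
    /// connection is opened so that later calls can succeed.
    fn ping(&mut self) -> Result<&'static str, String> {
        if self.server.live_conn.get() == Some(self.conn_id) {
            Ok("PONG")
        } else {
            self.conn_id = self.server.accept();
            Err("connection closed".to_string())
        }
    }
}

fn main() {
    let server = FakeServer::new();
    let mut conn = ReconnectingConn::connect(Rc::clone(&server));
    assert_eq!(conn.ping(), Ok("PONG"));

    server.kill_active();
    let first = conn.ping(); // observes the closed socket
    let second = conn.ping(); // runs on the reconnected socket
    assert!(first.is_err());
    assert_eq!(second, Ok("PONG"));
    println!("first = {:?}, second = {:?}", first, second);
}
```

In this model the first command after the kill always fails. The description above only says the first command can see the closed socket, so a real test would likely tolerate either outcome for that call and assert success on the next one.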
cargo test --workspace --verbose
cargo clippy --workspace --all-targets --all-features -- -D warnings -A deprecated
cargo fmt --all -- --check
TEST_REDIS_URL=redis://localhost:16379 cargo test -p cluster-hashring registry::tests::test_registry_recovers_after_connection_killed -- --ignored --exact
and
TEST_REDIS_URL=redis://localhost:16379 cargo test -p keycast_api redis::tests::test_prefixed_redis_recovers_after_connection_killed -- --ignored --exact
Visuals
Risks
The command that first discovers a killed socket can still fail once.
That matches the existing retry/backoff behavior, and the following command should use the reconnected manager.
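To illustrate why a single failed command is tolerable when callers already retry, here is a minimal sketch of a retry loop with exponential backoff. The helper name `retry_with_backoff`, its parameters, and the delay values are assumptions made for illustration; the PR does not show how the existing retry/backoff logic is implemented.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical helper, not taken from the PR: retries `op` up to
/// `max_attempts` times and doubles the delay after each failure.
fn retry_with_backoff<T, E>(
    max_attempts: u32,
    initial_delay: Duration,
    mut op: impl FnMut() -> Result<T, E>,
) -> Result<T, E> {
    let mut delay = initial_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Ok(value) => return Ok(value),
            // Out of attempts: surface the last error to the caller.
            Err(err) if attempt >= max_attempts => return Err(err),
            Err(_) => {
                sleep(delay);
                delay *= 2;
                attempt += 1;
            }
        }
    }
}

fn main() {
    // Simulated command: fails once (the killed socket), then succeeds.
    let mut calls = 0;
    let result: Result<&str, &str> =
        retry_with_backoff(3, Duration::from_millis(10), || {
            calls += 1;
            if calls == 1 { Err("connection closed") } else { Ok("PONG") }
        });
    assert_eq!(result, Ok("PONG"));
    assert_eq!(calls, 2);
}
```

With a reconnecting connection underneath, the second attempt lands on a fresh socket, so the caller only pays one backoff interval.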
git fetch origin main could not run here because SSH auth was unavailable.
origin/main ref.