Reconnect after failure #2

Open
billstron wants to merge 9 commits into master from reconnect-after-failure-2
@billstron
Member

@billstron billstron commented Apr 2, 2024

The goal of this PR is to recover better from master node failures. It introduces the following:

  1. It adds a configurable commandTimeout that fails any command that takes longer than the configured value. This ensures that a command cannot hang indefinitely.
  2. It adds a timeout to the getSlots method so that an accidental call to a failing node does not hang the system. The timeout from (1) could not be reused here without a more disruptive rewrite of the package.
  3. When reconfiguring the slots, we call subscribeAll if the subscribeClient is pointed at the failing node. This is done after the failing node is removed from the list so that it doesn't get used again.
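
For reviewers who want a picture of the timeout behaviour in (1) and (2), here is a minimal sketch of the general pattern. It is not the code in this PR. `withTimeout` and `TimeoutError` are hypothetical names invented for illustration, and the sketch assumes a promise-based call, which may not match how the package dispatches commands.

```typescript
// Hypothetical error type so callers can tell a timeout from other failures.
class TimeoutError extends Error {
  constructor(label: string, ms: number) {
    super(`${label} timed out after ${ms} ms`);
    this.name = "TimeoutError";
  }
}

// Hypothetical helper: settles with the result of `work`, or rejects with a
// TimeoutError if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number, label = "command"): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new TimeoutError(label, ms)), ms);
    work.then(
      (value) => {
        clearTimeout(timer); // do not leave a stray timer behind on success
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}

// A promise that never settles stands in for a call to an unresponsive node.
const hung = new Promise<string>(() => {});
withTimeout(hung, 50, "getSlots").catch((err) => console.error(err.message));
```

Note that a wrapper like this only unblocks the caller. It does not cancel the underlying request, so the failing node still has to be removed from the rotation, as described in (3).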

closes virtual-peaker/product-vp#3408

@billstron billstron changed the title from "Reconnect after failure 2" to "Reconnect after failure" Apr 2, 2024
@billstron billstron requested a review from josephmudd April 2, 2024 19:59