
feat: Implement distributed Redis lock before provisioning mqinfra #433


Open
wants to merge 3 commits into
base: main

@alexluong alexluong commented Jun 25, 2025

When a node starts up, it needs to ensure the required infrastructure (queues, topics, etc.) exists. Since we might have multiple nodes starting simultaneously, we need to coordinate who creates the infrastructure.

Here's how it works:

The retry loop: Each node tries up to 5 times to handle the infrastructure setup, with a 5-second wait between attempts. This gives nodes time to coordinate without hammering Redis.

The flow for each attempt:

  1. First, we check if the infrastructure already exists - if it does, we're done! No lock needed.
  2. If the infrastructure doesn't exist, we try to grab a distributed lock in Redis (using SET NX with a 10-second expiry).
  3. If we get the lock, we provision the infrastructure and then release the lock.
  4. If someone else has the lock, we wait 5 seconds and try the whole thing again
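
The flow above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the function name `ensure_infra` and its callable parameters (`exists`, `provision`, `try_lock`, `unlock`, `sleep`) are hypothetical, and skipping the wait after the final attempt is an assumption.

```python
import time
from typing import Callable

MAX_ATTEMPTS = 5       # from the description: up to 5 tries
RETRY_DELAY_S = 5.0    # from the description: 5-second wait between attempts


def ensure_infra(
    exists: Callable[[], bool],
    provision: Callable[[], None],
    try_lock: Callable[[], bool],
    unlock: Callable[[], None],
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True once the infrastructure exists, False if all attempts are used up."""
    for attempt in range(1, MAX_ATTEMPTS + 1):
        # Step 1: skip locking entirely when the infrastructure is already there.
        if exists():
            return True
        # Step 2: try to take the distributed lock (SET NX with an expiry in Redis).
        if try_lock():
            try:
                # Step 3: we hold the lock, so provision and then release it.
                provision()
            finally:
                unlock()
            return True
        # Step 4: another node holds the lock; wait before the next attempt.
        # (Assumption: no wait after the last attempt.)
        if attempt < MAX_ATTEMPTS:
            sleep(RETRY_DELAY_S)
    return False
```

Passing the checks and the lock as callables keeps the sketch independent of any particular Redis client or message-queue backend; a caller that gets `False` back would treat startup as failed.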

Why this works well:

  • Most of the time (after initial setup), infrastructure already exists, so nodes skip everything and start immediately
  • During initial cluster deployment, one node will win the lock race and create the infrastructure while others wait
  • If a node crashes while holding the lock, the 10-second expiry frees the lock automatically, so no lock is held forever
  • If something goes catastrophically wrong and a node still can't start after 5 attempts, it fails health checks and gets recycled
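
To make the expiry behaviour concrete, here is a small in-memory model of a SET NX lock with a 10-second TTL. It is a sketch under assumptions, not the PR's implementation: the class `InMemoryLock`, the injected clock, and the token-checked release are all illustrative choices, and the description above does not say how the lock is released.

```python
import uuid
from typing import Callable, Dict, Optional, Tuple

LOCK_TTL_S = 10.0  # from the description: 10-second lock expiry


class InMemoryLock:
    """Stand-in for a Redis key written with SET <key> <token> NX EX 10 (illustrative only)."""

    def __init__(self, now: Callable[[], float]) -> None:
        self._now = now
        self._store: Dict[str, Tuple[str, float]] = {}  # key -> (token, expires_at)

    def acquire(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is not None and entry[1] > self._now():
            return None  # key is still live: NX refuses to overwrite it
        token = uuid.uuid4().hex
        self._store[key] = (token, self._now() + LOCK_TTL_S)
        return token

    def release(self, key: str, token: str) -> bool:
        entry = self._store.get(key)
        # Delete only if we still own a live key, so a late release cannot
        # remove a lock that expired and was re-acquired by another node.
        if entry is not None and entry[0] == token and entry[1] > self._now():
            del self._store[key]
            return True
        return False


# A "crashed" holder never releases; the lock frees itself once the TTL passes.
clock = {"t": 0.0}
lock = InMemoryLock(lambda: clock["t"])
first = lock.acquire("provision-lock")    # hypothetical key name
blocked = lock.acquire("provision-lock")  # None while the first holder's TTL is live
clock["t"] = 11.0
second = lock.acquire("provision-lock")   # succeeds after expiry
```

With a real Redis server, the acquire step would map to a single `SET` with the `NX` and `EX` options, and a token-checked release would need to run atomically (for example in a Lua script) to keep the same guarantee.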

In practice, the locking path only matters during the very first deployment. After that, the infrastructure exists and every node passes the "does it exist?" check immediately.

