Correct behaviour of Started/Ready/Live in the Health and Other endpoints #1827

Open
22 of 24 tasks
stevenj opened this issue Feb 11, 2025 · 0 comments · May be fixed by #1974
Assignees
Labels
rust Pull requests that update Rust code

Comments

@stevenj
Collaborator

stevenj commented Feb 11, 2025

Summary

Fix health statuses

Description

There are three health states:

  1. Live - to detect a non-responsive application
  2. Started - to identify when the application has finished starting up, so that requests are held back until it is prepared to handle them
  3. Ready - to ensure that a service is ready to receive traffic

See: https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/
for a detailed explanation of how they are used.

Live - #1870

  • Define a "Live" atomic boolean, defaulting to true.
  • The HTTP API has a panic catcher.
  • Each caught panic should increment an atomic counter.
  • Every 30 seconds, the counter should be reset to zero.
  • If the counter reaches a pre-defined threshold (say 100), the "Live" flag gets set to false.
  • Once "Live" goes false, there is no way for it to go back to true.
  • If "Live" is true, return No Content, otherwise Service Unavailable.
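The Live rules above could be sketched roughly as follows. This is a minimal illustration using only the standard library, not the project's actual code: the `Liveness` type, its method names, and `PANIC_THRESHOLD` are hypothetical, and the 30 second timer and the panic catcher that would call these methods are left out.

```rust
use std::sync::atomic::{AtomicBool, AtomicU32, Ordering};

/// Hypothetical threshold; the issue only suggests "say 100".
pub const PANIC_THRESHOLD: u32 = 100;

/// Hypothetical holder for the "Live" flag and the panic counter.
pub struct Liveness {
    live: AtomicBool,
    panics: AtomicU32,
}

impl Liveness {
    /// "Live" defaults to true, the counter to zero.
    pub const fn new() -> Self {
        Self {
            live: AtomicBool::new(true),
            panics: AtomicU32::new(0),
        }
    }

    /// Intended to be called by the panic catcher for every caught panic.
    pub fn record_panic(&self) {
        let count = self.panics.fetch_add(1, Ordering::SeqCst).saturating_add(1);
        if count >= PANIC_THRESHOLD {
            // One-way transition: nothing in this type sets it back to true.
            self.live.store(false, Ordering::SeqCst);
        }
    }

    /// Intended to be called by a timer every 30 seconds.
    /// Resets only the counter, never the "Live" flag.
    pub fn reset_window(&self) {
        self.panics.store(0, Ordering::SeqCst);
    }

    pub fn is_live(&self) -> bool {
        self.live.load(Ordering::SeqCst)
    }

    /// HTTP status for the live endpoint: 204 No Content or 503 Service Unavailable.
    pub fn status_code(&self) -> u16 {
        if self.is_live() { 204 } else { 503 }
    }
}
```

Because `reset_window` touches only the counter, the service is marked not live only when the threshold is reached within a single 30 second window, and it stays that way afterwards.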

Started - #1921

  • Define a "Started" atomic boolean, set to false on startup.
  • Define a set of "Live" atomic booleans, currently "LiveIndexDB" and "LiveEventDB"; by default these all start as false.
  • Started also requires a third atomic boolean (defaulting to false), which is set to true once the chain indexer reaches tip for the first time.
  • When each DB is first connected, its respective flag is set to true.
  • If all three flags (both DB flags and the chain indexer flag) are true simultaneously, the "Started" flag gets set to true.
  • "Started" cannot be set back to false once it is true.
  • If "Started" is true, return No Content, otherwise Service Unavailable.
  • On startup of the service, try to connect to each of the databases, and keep trying until both are connected, at which point the initial connection check ceases.
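A rough sketch of the Started latch described above, again using only the standard library. The `HealthFlags` type and its field and method names (including `chain_at_tip`) are hypothetical stand-ins for "LiveIndexDB", "LiveEventDB" and the chain indexer flag; the startup connection retry loop is not shown.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical container for the flags named in the issue.
pub struct HealthFlags {
    pub live_index_db: AtomicBool,
    pub live_event_db: AtomicBool,
    /// Assumed name for the "chain indexer reached tip" flag.
    pub chain_at_tip: AtomicBool,
    started: AtomicBool,
}

impl HealthFlags {
    /// Every flag starts as false.
    pub const fn new() -> Self {
        Self {
            live_index_db: AtomicBool::new(false),
            live_event_db: AtomicBool::new(false),
            chain_at_tip: AtomicBool::new(false),
            started: AtomicBool::new(false),
        }
    }

    /// Re-evaluate "Started"; meant to be called whenever one of the three flags changes.
    /// The three loads are separate reads, so this approximates "simultaneously".
    pub fn update_started(&self) -> bool {
        let all_true = self.live_index_db.load(Ordering::SeqCst)
            && self.live_event_db.load(Ordering::SeqCst)
            && self.chain_at_tip.load(Ordering::SeqCst);
        if all_true {
            // Latch: only ever written with true, so it cannot go back to false.
            self.started.store(true, Ordering::SeqCst);
        }
        self.started.load(Ordering::SeqCst)
    }

    pub fn is_started(&self) -> bool {
        self.started.load(Ordering::SeqCst)
    }

    /// HTTP status for the started endpoint: 204 No Content or 503 Service Unavailable.
    pub fn started_status_code(&self) -> u16 {
        if self.is_started() { 204 } else { 503 }
    }
}
```

Keeping "Started" as its own latch, rather than recomputing it from the three flags on each request, is what lets a later DB outage affect Ready without un-starting the service.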

Ready - #1919

  • Using the Liveness flags defined above: if a flag is false, try to reconnect to the DB which is not Live.
  • If it can be contacted, set the flag to true.
  • After trying to reconnect to the DBs (if required), check whether all Live flags are true; if so, return No Content, otherwise Service Unavailable.
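One way the Ready check above might look, as a sketch under stated assumptions: the reconnect attempts are passed in as closures returning `true` on success (a stand-in for the real DB connection code), `DbFlags`, `reprobe` and `ready_status` are hypothetical names, and only the two DB flags are modelled here.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical pair of DB liveness flags ("LiveIndexDB" and "LiveEventDB").
pub struct DbFlags {
    pub live_index_db: AtomicBool,
    pub live_event_db: AtomicBool,
}

/// Attempt a reconnect only when the flag is currently false;
/// set the flag to true if the reconnect succeeds.
fn reprobe(flag: &AtomicBool, reconnect: impl FnOnce() -> bool) {
    if !flag.load(Ordering::SeqCst) && reconnect() {
        flag.store(true, Ordering::SeqCst);
    }
}

/// Ready check: re-probe any DB that is not live, then report
/// 204 No Content if both flags are true, otherwise 503 Service Unavailable.
pub fn ready_status(
    flags: &DbFlags,
    reconnect_index: impl FnOnce() -> bool,
    reconnect_event: impl FnOnce() -> bool,
) -> u16 {
    reprobe(&flags.live_index_db, reconnect_index);
    reprobe(&flags.live_event_db, reconnect_event);
    let all_live = flags.live_index_db.load(Ordering::SeqCst)
        && flags.live_event_db.load(Ordering::SeqCst);
    if all_live { 204 } else { 503 }
}
```

A DB whose flag is already true is never re-probed, so a healthy service pays no reconnection cost on each readiness check.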

Each Endpoint.

  • Before acting on an endpoint that needs either the Event DB or the Index DB, check if all the Live flags are true.
  • If any are false, immediately return "Service Unavailable" and do not execute the endpoint.
  • If any DB interaction returns an error, set the respective Liveness flag to false, and return "Service Unavailable".
  • Endpoints do not re-probe the DB to see if it is active again. The health endpoint is solely responsible for re-establishing the connection to the database if a connection fails.
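The per-endpoint rules above could be wrapped in a small guard such as the one below. This is only a sketch: `EndpointResult` and `run_endpoint` are hypothetical names, the DB call is represented by a closure, and how the real service wires this into its HTTP framework is not covered.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Hypothetical endpoint outcome: either the handler's value or Service Unavailable.
#[derive(Debug, PartialEq)]
pub enum EndpointResult<T> {
    Success(T),
    ServiceUnavailable,
}

/// Run a DB-backed endpoint under the liveness rules:
/// - if any Live flag is false, return Service Unavailable without running the endpoint;
/// - if the DB call fails, mark that DB's flag false and return Service Unavailable.
/// The guard never tries to reconnect; that is left to the health endpoint.
pub fn run_endpoint<T, E>(
    live_flags: &[&AtomicBool],
    db_flag: &AtomicBool,
    db_call: impl FnOnce() -> Result<T, E>,
) -> EndpointResult<T> {
    if live_flags.iter().any(|flag| !flag.load(Ordering::SeqCst)) {
        return EndpointResult::ServiceUnavailable;
    }
    match db_call() {
        Ok(value) => EndpointResult::Success(value),
        Err(_) => {
            db_flag.store(false, Ordering::SeqCst);
            EndpointResult::ServiceUnavailable
        }
    }
}
```

Once a flag has been cleared by a failed call, every guarded endpoint short-circuits until the health endpoint's re-probe sets it back to true.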

Acceptance Criteria

  • Logic is implemented as outlined above.
  • Endpoint descriptions and documentation are updated to broadly reflect the above logic. (See the Kubernetes reference for details on how to document the endpoints so that they align with Kubernetes expectations and our behaviour.)
@Mr-Leshiy Mr-Leshiy changed the title 🛠️ [TASK] : Correct behaviour of Started/Ready/Live in the Health and Other endpoints Correct behaviour of Started/Ready/Live in the Health and Other endpoints Feb 11, 2025
@Mr-Leshiy Mr-Leshiy added F14 rust Pull requests that update Rust code and removed F14 labels Feb 11, 2025
@Mr-Leshiy Mr-Leshiy moved this from New to 🔖 Ready in Catalyst Feb 13, 2025
@Mr-Leshiy Mr-Leshiy self-assigned this Feb 13, 2025
@Mr-Leshiy Mr-Leshiy moved this from 🔖 Ready to 🏗 In progress in Catalyst Feb 13, 2025
@Mr-Leshiy Mr-Leshiy moved this from 🏗 In progress to 🔖 Ready in Catalyst Feb 18, 2025
@saibatizoku saibatizoku assigned saibatizoku and unassigned Mr-Leshiy Feb 18, 2025
@saibatizoku saibatizoku moved this from 🔖 Ready to 🏗 In progress in Catalyst Feb 18, 2025
@saibatizoku saibatizoku linked a pull request Mar 6, 2025 that will close this issue
8 tasks
Projects
Status: 🏗 In progress
3 participants