Detect and warn about clock drift #6037

Open
nflaig opened this issue Oct 16, 2023 · 0 comments
Labels
meta-discussion Indicates a topic that requires input from various developers. meta-feature-request Issues to track feature requests.

nflaig commented Oct 16, 2023

Problem description

Users have reported many issues caused by a skewed or unsynchronized system clock. Symptoms range from the client being unable to sync, to missed head votes and missed attestations, to generally degraded effectiveness, all without any errors or warnings being logged.

Determining that clock drift is the root cause can be pretty difficult and most of the time comes down to asking the user to check their system time via the terminal or the clock drift metric.

Fixing the issue itself is pretty easy: install chrony / ntp for time synchronization.

Solution description

Lodestar could try to detect clock drift by observing messages from the network: if many messages arrive too early or too late, a warning could be logged to inform the user. This solution relies on heuristics and might not be simple to implement, but it would most likely be the best way to solve the issues mentioned above.
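
To make the heuristic more concrete, here is a minimal sketch of how it could look. All names (`ClockDriftEstimator`, `ObservedMessage`, the option fields) are hypothetical and not part of Lodestar, and the expected propagation delay, window size and warning threshold are assumptions that would need tuning against real network data. The idea is to record how far each message's arrival deviates from its expected arrival time and take the median over a sliding window, so a few slow peers do not trigger a warning.

```typescript
// Hypothetical sketch; none of these names exist in Lodestar.
const SECONDS_PER_SLOT = 12; // mainnet slot duration

interface ObservedMessage {
  slot: number; // slot the gossip message belongs to
  receivedAtSec: number; // local UNIX time (seconds) when it arrived
}

class ClockDriftEstimator {
  private readonly offsetsSec: number[] = [];
  private readonly genesisTimeSec: number;
  private readonly expectedDelaySec: number;
  private readonly windowSize: number;
  private readonly minSamples: number;
  private readonly warnThresholdSec: number;

  constructor(opts: {
    genesisTimeSec: number;
    expectedDelaySec: number; // assumed typical arrival time into the slot
    windowSize?: number;
    minSamples?: number;
    warnThresholdSec?: number;
  }) {
    this.genesisTimeSec = opts.genesisTimeSec;
    this.expectedDelaySec = opts.expectedDelaySec;
    this.windowSize = opts.windowSize ?? 256;
    this.minSamples = opts.minSamples ?? 32;
    this.warnThresholdSec = opts.warnThresholdSec ?? 1;
  }

  // Record how much earlier (negative) or later (positive) than expected
  // the message arrived according to the local clock.
  observe(msg: ObservedMessage): void {
    const slotStartSec = this.genesisTimeSec + msg.slot * SECONDS_PER_SLOT;
    this.offsetsSec.push(msg.receivedAtSec - slotStartSec - this.expectedDelaySec);
    if (this.offsetsSec.length > this.windowSize) this.offsetsSec.shift();
  }

  // Median offset over the window, or null until enough samples exist.
  // Positive suggests the local clock runs ahead, negative that it lags.
  estimateDriftSec(): number | null {
    if (this.offsetsSec.length < this.minSamples) return null;
    const sorted = [...this.offsetsSec].sort((a, b) => a - b);
    const mid = Math.floor(sorted.length / 2);
    return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  }

  shouldWarn(): boolean {
    const drift = this.estimateDriftSec();
    return drift !== null && Math.abs(drift) > this.warnThresholdSec;
  }
}
```

A caller would feed every received block or attestation into `observe()` and check `shouldWarn()` periodically, for example once per epoch, before logging a warning.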

Alternative solution

Another, simpler option is to detect a time discrepancy between the beacon node and the validator client. This could be done by comparing the clock slot of the validator client with the head slot + sync distance reported by the beacon node sync API. The problem with this solution is that it might not detect all clock drift issues: the granularity is slot based, and it cannot detect clock drift if both instances run on the same server, which is the setup for most solo stakers. This solution has already been implemented by Lighthouse but might produce false positives because clients interpret differently what value should be used for sync_distance when the node is synced (see sigp/lighthouse#3421 (comment)).
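
A rough sketch of this comparison, assuming the validator client has already fetched and decoded the syncing status into numbers. The function names and the one-slot tolerance are hypothetical, and this is not how Lighthouse or Lodestar implement it:

```typescript
// Hypothetical sketch; names and tolerance are assumptions.
interface SyncingStatus {
  headSlot: number; // head slot reported by the beacon node
  syncDistance: number; // slots the node reports being behind
}

// Slot according to the local (validator client) clock.
function getClockSlot(genesisTimeSec: number, nowSec: number, secondsPerSlot = 12): number {
  return Math.max(0, Math.floor((nowSec - genesisTimeSec) / secondsPerSlot));
}

// Positive: validator client clock is ahead of what the beacon node implies.
// Negative: validator client clock is behind.
function slotDiscrepancy(vcClockSlot: number, status: SyncingStatus): number {
  return vcClockSlot - (status.headSlot + status.syncDistance);
}

// Tolerate small differences, since head slot + sync distance can
// legitimately lag the clock slot by a slot (e.g. a missed block).
function isClockDiscrepancy(vcClockSlot: number, status: SyncingStatus, toleranceSlots = 1): boolean {
  return Math.abs(slotDiscrepancy(vcClockSlot, status)) > toleranceSlots;
}
```

Because the comparison works on whole slots, a drift of a few seconds stays invisible here, which is the granularity limitation described above.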

Additional context

At the moment, Lodestar throws an error if the clock drift is significant enough to trigger an ATTESTATION_ERROR_FUTURE_SLOT / BLOCK_ERROR_FUTURE_SLOT error, but that has only been observed once (or twice) so far.

There is also a metric to track clock drift on the Lodestar summary dashboard, but this requires metrics to be enabled, and it is not something a user would actively look at during normal operation, whereas a warning log would most likely get their attention.
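
To illustrate why these errors are rare, here is a simplified sketch of a future-slot check. This is not Lodestar's actual validation code and the function name is hypothetical; only the 500 ms `MAXIMUM_GOSSIP_CLOCK_DISPARITY` tolerance is taken from the consensus p2p spec.

```typescript
// Simplified, hypothetical sketch of a future-slot check.
const MAXIMUM_GOSSIP_CLOCK_DISPARITY_MS = 500; // consensus p2p spec value

// A message counts as "from the future" if its slot has not started yet
// according to the local clock, even after allowing for clock disparity.
function isFutureSlot(
  messageSlot: number,
  nowMs: number,
  genesisTimeMs: number,
  msPerSlot = 12_000
): boolean {
  const slotStartMs = genesisTimeMs + messageSlot * msPerSlot;
  return slotStartMs > nowMs + MAXIMUM_GOSSIP_CLOCK_DISPARITY_MS;
}
```

Since messages normally arrive some time after their slot has started, the local clock has to lag by more than that arrival delay plus the tolerance before a check like this fails, so smaller drifts degrade performance without ever producing an error.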


nflaig added the meta-feature-request label on Oct 16, 2023
philknows added the meta-discussion label on Oct 22, 2024