Detect and warn about clock drift #6037
Labels
meta-discussion
Indicates a topic that requires input from various developers.
meta-feature-request
Issues to track feature requests.
Problem description
Users have reported a lot of issues caused by their system clock being skewed / not synced correctly. These range from the client not being able to sync, to missed head votes / missed attestations, or just generally degraded effectiveness without any errors or warnings being logged.
Determining that clock drift is the root cause can be pretty difficult and most of the time comes down to asking the user to check their system time via the terminal or the clock drift metric.
Fixing the issue itself is pretty easy: install chrony / ntp for time synchronization.
Solution description
Lodestar could try to detect clock drift by observing messages from the network: if many messages arrive too early or too late, a warning could be logged to inform the user. This solution relies on heuristics and might not be simple to implement, but it would most likely be the best way to solve the issues mentioned above.
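To make the heuristic more concrete, here is a minimal sketch of what such a detector could look like. Everything in it is an assumption for illustration: `ClockDriftDetector`, its parameters, the default values and the returned hints are hypothetical and not part of Lodestar. It only assumes the mainnet slot duration of 12 seconds and that each observed message carries a slot number.

```typescript
// Hypothetical sketch, not Lodestar's actual implementation.
const SECONDS_PER_SLOT = 12; // mainnet value

type DriftHint = "local-clock-behind" | "local-clock-ahead" | null;

class ClockDriftDetector {
  // Arrival offsets (ms) of recent messages relative to the start of their slot
  private readonly offsetsMs: number[] = [];
  private readonly genesisTimeSec: number;
  private readonly windowSize: number;
  private readonly toleranceMs: number;
  private readonly suspiciousRatio: number;

  constructor(genesisTimeSec: number, windowSize = 100, toleranceMs = 500, suspiciousRatio = 0.5) {
    this.genesisTimeSec = genesisTimeSec;
    this.windowSize = windowSize; // number of recent messages to consider
    this.toleranceMs = toleranceMs; // assumed tolerance, not a tuned value
    this.suspiciousRatio = suspiciousRatio; // share of samples that must look off
  }

  /** Record the local receive time of a message that belongs to `slot`. */
  onMessage(slot: number, receivedAtMs: number): void {
    const slotStartMs = (this.genesisTimeSec + slot * SECONDS_PER_SLOT) * 1000;
    this.offsetsMs.push(receivedAtMs - slotStartMs);
    if (this.offsetsMs.length > this.windowSize) this.offsetsMs.shift();
  }

  /** Returns a hint once the window is full and enough samples are too early / too late. */
  check(): DriftHint {
    if (this.offsetsMs.length < this.windowSize) return null;
    const slotMs = SECONDS_PER_SLOT * 1000;
    // Received before the slot even started according to the local clock
    const early = this.offsetsMs.filter((o) => o < -this.toleranceMs).length;
    // Received after the slot already ended according to the local clock
    const late = this.offsetsMs.filter((o) => o > slotMs + this.toleranceMs).length;
    if (early / this.windowSize >= this.suspiciousRatio) return "local-clock-behind";
    if (late / this.windowSize >= this.suspiciousRatio) return "local-clock-ahead";
    return null;
  }
}
```

A caller could run `check()` periodically and log a warning whenever it returns a non-null hint. Late messages can also be caused by network latency or slow peers, so the "ahead" hint is weaker than the "behind" hint and the thresholds would likely need tuning against real traffic.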
Alternative solution
A simpler option is to just detect a time discrepancy between the beacon node and the validator client. This could be done by comparing the clock slot of the validator client with the head slot + sync distance reported by the beacon node sync API. The problem with this solution is that it might not detect all clock drift issues: the granularity is slot based, and it does not help to detect clock drift if both instances run on the same server, which is the setup for most solo stakers. This solution has already been implemented by Lighthouse but might produce false positives because clients interpret differently what value should be used for sync_distance if the node is synced, see sigp/lighthouse#3421 (comment).
Additional context
At the moment, Lodestar would throw an error if the clock drift is significant enough to trigger an ATTESTATION_ERROR_FUTURE_SLOT / BLOCK_ERROR_FUTURE_SLOT error, but that has only been observed once (or twice) so far.
There is also a metric to track clock drift on the Lodestar summary dashboard, but this requires metrics to be enabled and it is not something a user would actively look at during normal operation, whereas a warning log would most likely get their attention.
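For reference, the slot-based comparison from the alternative solution could look roughly like the sketch below. The function names, the `SyncingStatus` shape and the one-slot tolerance are assumptions for illustration only. The status is assumed to be already parsed into numbers from the `head_slot` and `sync_distance` fields of the beacon node syncing endpoint.

```typescript
// Hypothetical sketch of the validator client vs. beacon node comparison.
const SECONDS_PER_SLOT = 12; // mainnet value

/** Assumed, already-parsed subset of the beacon node syncing status. */
interface SyncingStatus {
  headSlot: number;
  syncDistance: number;
}

/** Slot the local clock believes we are in. */
function clockSlot(genesisTimeSec: number, nowMs: number): number {
  return Math.floor((nowMs / 1000 - genesisTimeSec) / SECONDS_PER_SLOT);
}

/** Positive if the validator client clock is ahead of the beacon node's view, negative if behind. */
function slotDiscrepancy(vcClockSlot: number, status: SyncingStatus): number {
  return vcClockSlot - (status.headSlot + status.syncDistance);
}

/** The tolerance is an assumption to absorb slot-boundary races and differing sync_distance semantics. */
function hasClockDiscrepancy(vcClockSlot: number, status: SyncingStatus, toleranceSlots = 1): boolean {
  return Math.abs(slotDiscrepancy(vcClockSlot, status)) > toleranceSlots;
}
```

As noted above, this only catches drift of at least a slot and cannot detect anything when both processes share the same system clock.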
Related