Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix NewSlotTickerWithIntervals Panics #13307

Open
wants to merge 8 commits into
base: develop
Choose a base branch
from

Conversation

v4lproik
Copy link
Contributor

@v4lproik v4lproik commented Dec 8, 2023

What type of PR is this?

Bug fix: #13280

There's several issues highlighted by this bug which are:

@CLAassistant
Copy link

CLAassistant commented Dec 8, 2023

CLA assistant check
All committers have signed the CLA.

@v4lproik v4lproik changed the title fix tests fix NewSlotTickerWithIntervals Panics Dec 8, 2023
@v4lproik v4lproik marked this pull request as ready for review December 8, 2023 18:28
@v4lproik v4lproik requested a review from a team as a code owner December 8, 2023 18:28
@@ -111,6 +111,16 @@ func (s *Service) spawnProcessAttestationsRoutine() {
}()
}

// saturatingReorgInterval makes sure that the reorg interval cannot be a negative number
// it should be saturated to 0 if it's the case
func saturatingReorgInterval() time.Duration {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than setting this as zero, it might be better to set the following below as a dynamic value:

const reorgLateBlockCountAttestations = 2 * time.Second

Ex: If 2 seconds maps to 12 second slots, then for 1 second slots it should be 160ms. Otherwise the reorg interval starts breaking for non-standard configs.

cc @potuz might have some thoughts since this was introduced by him

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, we had discussed that this constant would be irrelevant on non-mainnet networks because we wouldn't be reorging late blocks, but now this constant becomes a higher issue if we're removing the flag to disable this behavior. I am not sure what is the right approach here. Computing this to be SecondsPerSlot / 2 / intervalsPerSlot would produce the right value for mainnet. I suspect that any network that tries to have short blocks would be very broken by the reorg feature.

@nisdas
Copy link
Member

nisdas commented Mar 25, 2024

any update on this @v4lproik ?

@v4lproik
Copy link
Contributor Author

v4lproik commented Mar 26, 2024

Sorry @nisdas, I thought since you weren't entirely certain whether it was the right approach or not, you might discuss it internally.

In my opinion, based on what I have observed in other CL implementations and specifically in LH, attempting to handle specific configurations depending on a value that is not expected initially is counterintuitive and prone to misunderstanding. I believe that calculating an interval or distance that directly depends on slots and/or durations should only make sense within the realm of the chain specs. Here, the value doesn't make sense as we have a negative value for the reorg interval, meaning that several options are available in order to tackle this issue (from least favorable to best in my opinion):

1/ We disable the feature for that particular case. I think this approach won't pay off in the long run, as it would be misleading trying to understand which values could potentially be accepted. For example, if the slot duration is 2, then the value is not negative but equal to 0, what if it's 3 then? What are the impact on other features in the code with those values? What's acceptable? It sounds like a rabbit hole.

2/ We do not run the service at all if we have a negative interval value. This is critical enough to prevent the service from running as distances and intervals cannot be negative. (As you can see, I added a check in this PR and a test).

3/ We saturate the value so it cannot be negative, we throw a log error as it’s critical enough informing the user that it's not the expected configuration, surely won't work as intended and should only be used at their own risk. Happy to add the error log to this PR if that's the consensus.

Let me know your thoughts!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

NewSlotTickerWithIntervals Panics if a One Second Slot is Used
4 participants