Karknu/max recon #5265
base: release/ouroboros-network-0.21
Conversation
Force-pushed from c1d52b4 to 110a1be
Force-pushed from 110a1be to 58c6ca5
coot left a comment
LGTM, just some minor suggestions.
ouroboros-network/src/Ouroboros/Network/PeerSelection/Governor/ActivePeers.hs
(Outdated)
-> Set peeraddr
-- ^ peers with failure
-> (peeraddr -> Bool)
-- ^ do we have to remember the peer?
Suggested change:
- -- ^ do we have to remember the peer?
+ -- ^ do we have to remember the fail count for a peer?
Could you also add which peers are remembered, e.g. local roots & extra root peers (aka bootstrap peers).
unrelated rant
This is another reason why I think bootstrap peers should actually be part of ouroboros-network rather than cardano-diffusion; e.g. it's awkward for us to do this part in cardano-diffusion, which would be the proper way in the current split between ouroboros-network and cardano-diffusion.
I am not committing the suggestion. The purpose of the function is to control whether we can forget the peer or not, not simply whether we need to track its fail count.
-> (Set peeraddr -> a)
-- ^ callback for forgotten peers
-> KnownPeers peeraddr
-> (KnownPeers peeraddr, a)
It seems the callback is only used on the returned value, so we can leave that to the caller and just return the forgotten peers.
Suggested change:
- -> (Set peeraddr -> a)
- -- ^ callback for forgotten peers
- -> KnownPeers peeraddr
- -> (KnownPeers peeraddr, a)
+ -> KnownPeers peeraddr
+ -> (KnownPeers peeraddr, Set peeraddr)
👍 Great suggestion, will make that change
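A minimal sketch of the suggested refactor, using hypothetical simplified types (the real KnownPeers record in ouroboros-network carries much more state than a fail-count map): instead of threading a `Set peeraddr -> a` callback through the function, return the forgotten peers directly and let the caller act on them.

```haskell
import qualified Data.Map.Strict as Map
import           Data.Map.Strict (Map)
import qualified Data.Set as Set
import           Data.Set (Set)

-- Hypothetical stand-in for the real KnownPeers: just a fail-count map.
newtype KnownPeers peeraddr = KnownPeers (Map peeraddr Int)
  deriving Show

-- Bump the fail count of every failed peer, drop the peers we are
-- allowed to forget, and return the forgotten set to the caller
-- (rather than invoking a callback on it).
reportFailures :: Ord peeraddr
               => Set peeraddr          -- ^ peers with failure
               -> (peeraddr -> Bool)    -- ^ must we remember the peer?
               -> KnownPeers peeraddr
               -> (KnownPeers peeraddr, Set peeraddr)
reportFailures failed mustRemember (KnownPeers counts) =
  let forgotten = Set.filter (not . mustRemember) failed
      bumped    = Map.unionWith (+) (Map.fromSet (const 1) failed) counts
      counts'   = Map.withoutKeys bumped forgotten
  in (KnownPeers counts', forgotten)
```

Returning the set keeps the function's type first-order and makes the caller's use of the forgotten peers explicit at the call site.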
} =
assert (all (`Map.member` allPeers) (Map.keysSet times)) $
let knownPeers' = knownPeers {
reportFailures :: Ord peeraddr
Maybe a more explicit name would be setConnectTimesAndFailCount
The real description would be setConnectionTimesAndFailCountAndPossiblyForgetPeers ;)
I prefer reportFailures to that, but I've added a comment to the function to make it clear what it is and what it does.
Enforce a maximum limit on the number of times we will attempt to promote a peer to warm. Local root peers, bootstrap relays, and manually configured public root peers are exempt from this limit. The clearing of the reconnection counter is delayed until a connection has remained active for a specific time (currently 120 s).
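The promotion-limit logic described above can be sketched as follows. This is a minimal illustration, not the PR's implementation: `maxReconnects = 5`, the `PeerKind` type, and all function names here are hypothetical; only the 120 s hold time and the exempt peer classes come from the description.

```haskell
-- Which class a known peer belongs to (hypothetical type for this sketch).
data PeerKind = LocalRoot | BootstrapRelay | PublicRoot | Other
  deriving (Eq, Show)

-- Assumed attempt limit for the sketch; not the PR's actual value.
maxReconnects :: Int
maxReconnects = 5

-- A cleared counter requires the connection to have stayed active
-- this long (the PR uses 120 s).
activeHoldTime :: Double
activeHoldTime = 120

-- Local roots, bootstrap relays and manually configured public roots
-- are exempt from the promotion limit.
exempt :: PeerKind -> Bool
exempt Other = False
exempt _     = True

-- May we attempt another promotion to warm?
mayPromote :: PeerKind -> Int -> Bool
mayPromote kind attempts = exempt kind || attempts < maxReconnects

-- Clearing is delayed: only reset the counter once the connection
-- has been active for at least activeHoldTime seconds.
clearCounter :: Double   -- ^ seconds the connection has been active
             -> Int      -- ^ current reconnection counter
             -> Int
clearCounter activeFor attempts
  | activeFor >= activeHoldTime = 0
  | otherwise                   = attempts
```

Delaying the reset means a peer that connects, stays up briefly, and drops again keeps accumulating attempts instead of getting a fresh budget on every short-lived connection.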
In case of an error, use a shorter timeout when waiting for ChainSync to exit.
Exclude shutting-down peers from the active-peer calculations. It can take a while for peers to exit because BlockFetch has to sync with ChainSync as it exits, but we shouldn't count those peers as active or preferred anymore.
With P2P peer selection and the keep-alive protocol we are not that dependent on the ChainSync timeout for detecting bad upstream peers. By bumping the timeout from between 135 s and 269 s to between 601 s and 911 s, we change the false-positive rate from something that happens a few times per epoch to something that happens less than once in a decade.
Force-pushed from 1463634 to 3270d14
Description
This is a series of changes that makes the node more robust.