Conversation

Contributor

@karknu karknu commented Dec 11, 2025

Description

This is a series of changes that make the node more robust.

Checklist

Quality

  • Commit sequence makes sense and has useful messages, see ref.
  • New tests are added and existing tests are updated.
  • Self-reviewed the PR.

Maintenance

  • Linked an issue or added the PR to the current sprint of the ouroboros-network project.
  • Added labels.
  • Updated changelog files.
  • The documentation has been properly updated, see ref.

@github-project-automation github-project-automation bot moved this to In Progress in Ouroboros Network Dec 11, 2025
@karknu karknu marked this pull request as ready for review December 11, 2025 12:15
@karknu karknu requested a review from a team as a code owner December 11, 2025 12:15
@karknu karknu added the block-fetch, outbound-governor and chain-sync client labels Dec 11, 2025
Collaborator

@coot coot left a comment

LGTM, just some minor suggestions.

-> Set peeraddr
-- ^ peers with failure
-> (peeraddr -> Bool)
-- ^ do we have to remember the peer?
Collaborator

Suggested change
-- ^ do we have to remember the peer?
-- ^ do we have to remember the fail count for a peer?

could you also add which peers are remembered, e.g. local roots & extra root peers aka bootstrap peers.

unrelated rant

This is another reason why I think bootstrap peers should actually be part of ouroboros-network rather than cardano-diffusion: it's awkward for us to do this part in cardano-diffusion, which would be the proper way in the current split between ouroboros-network and cardano-diffusion.

Contributor Author

I am not committing the suggestion. The purpose of the function is to control whether we can forget the peer or not, not simply whether we need to track its fail count.
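
A minimal sketch of such a predicate, with hypothetical names (mustRememberPeer and its arguments are illustrative assumptions, not the actual ouroboros-network API):

import           Data.Set (Set)
import qualified Data.Set as Set

-- Illustrative only: decide whether a peer may ever be forgotten,
-- which is more than just tracking its fail count.  Local roots and
-- extra (bootstrap) root peers are always remembered.
mustRememberPeer
  :: Ord peeraddr
  => Set peeraddr   -- ^ local root peers
  -> Set peeraddr   -- ^ extra root peers, aka bootstrap peers
  -> peeraddr
  -> Bool
mustRememberPeer localRoots bootstrapPeers peer =
  peer `Set.member` localRoots || peer `Set.member` bootstrapPeers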

Comment on lines 442 to 445
-> (Set peeraddr -> a)
-- ^ callback for forgotten peers
-> KnownPeers peeraddr
-> (KnownPeers peeraddr, a)
Collaborator

It seems the callback is only used on the returned value, so we can leave that to the caller and just return the forgotten peers.

Suggested change
-> (Set peeraddr -> a)
-- ^ callback for forgotten peers
-> KnownPeers peeraddr
-> (KnownPeers peeraddr, a)
-> KnownPeers peeraddr
-> (KnownPeers peeraddr, Set peeraddr)

Contributor Author

👍 Great suggestion, will make that change
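
A minimal self-contained sketch of the agreed direction, using a toy stand-in for KnownPeers (the real type and function differ):

import           Data.Map.Strict (Map)
import qualified Data.Map.Strict as Map
import           Data.Set (Set)

-- Toy stand-in: peer address mapped to its fail count.
newtype KnownPeers peeraddr = KnownPeers (Map peeraddr Int)

-- Instead of taking a `Set peeraddr -> a` callback, return the
-- forgotten peers and leave their handling to the caller.
forgetFailedPeers :: Ord peeraddr
                  => Int                 -- ^ maximum tolerated fail count
                  -> KnownPeers peeraddr
                  -> (KnownPeers peeraddr, Set peeraddr)
forgetFailedPeers maxFails (KnownPeers m) =
  let (keep, forget) = Map.partition (<= maxFails) m
  in  (KnownPeers keep, Map.keysSet forget)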

} =
assert (all (`Map.member` allPeers) (Map.keysSet times)) $
let knownPeers' = knownPeers {
reportFailures :: Ord peeraddr
Collaborator

Maybe a more explicit name would be setConnectTimesAndFailCount

Contributor Author

@karknu karknu Dec 11, 2025

The real description would be setConnectionTimesAndFailCountAndPossiblyForgetPeers ;)

I prefer reportFailures to that, but I've added a comment to the function to make it clear what it is and what it does.

Commits

Enforce a maximum limit on the number of times we will attempt to promote a peer to warm. Local root peers, bootstrap relays and manually configured public root peers are exempt from this limit.
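
A sketch of how such a limit could be enforced; the names and the representation of the exempt set are assumptions, not the PR's actual code:

import           Data.Set (Set)
import qualified Data.Set as Set

-- Illustrative only: allow another warm-promotion attempt if the peer
-- is exempt (local root, bootstrap relay or manually configured public
-- root) or has not yet reached the attempt limit.
mayPromoteToWarm
  :: Ord peeraddr
  => Int            -- ^ maximum promotion attempts
  -> Set peeraddr   -- ^ exempt peers
  -> Int            -- ^ failed attempts so far for this peer
  -> peeraddr
  -> Bool
mayPromoteToWarm maxAttempts exempt attempts peer =
  peer `Set.member` exempt || attempts < maxAttempts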

The clearing of the reconnection counter is delayed until a connection has managed to stay active for a specific time (currently 120s).
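
A sketch of the delayed reset, assuming a simplified model where we track when the connection became active (gracePeriod and resetFailCountIfStable are hypothetical names):

import Data.Time.Clock (NominalDiffTime, UTCTime, diffUTCTime)

-- The fail count is only reset once the connection has stayed active
-- for the grace period (120s in this PR); a connection that dies
-- earlier keeps its accumulated count.
gracePeriod :: NominalDiffTime
gracePeriod = 120

resetFailCountIfStable
  :: UTCTime  -- ^ when the connection became active
  -> UTCTime  -- ^ current time
  -> Int      -- ^ current fail count
  -> Int
resetFailCountIfStable activeSince now failCount
  | now `diffUTCTime` activeSince >= gracePeriod = 0
  | otherwise                                    = failCount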

In case of an error, use a shorter timeout when waiting for chainsync to exit.

Exclude shutdown peers from active peer calculations. It can take a while for peers to exit, because blockfetch has to sync with chainsync as it exits, but we shouldn't count those peers as active or preferred anymore.

With p2p peer selection and the keepalive protocol we are not that dependent on the chainsync timeout for detecting bad upstream peers.

By bumping the timeout from between 135s and 269s to between 601s and 911s we change the false positive rate from something that happens a few times per epoch to something that happens less than once in a decade.
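
The "between X and Y" phrasing suggests the timeout is drawn at random from a range; a sketch under that assumption (chainSyncIdleTimeout is a hypothetical name):

import System.Random (randomRIO)

-- Draw the chainsync timeout (in seconds) uniformly from the new,
-- much wider range.
chainSyncIdleTimeout :: IO Double
chainSyncIdleTimeout = randomRIO (601, 911)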