fix: better greylisting of sync peers #332

Open
stringhandler wants to merge 4 commits into development from st-ban-sync-peers
Conversation

@stringhandler
Contributor

@stringhandler stringhandler commented May 16, 2025

Adds stricter greylisting of failed peers. Specifically, if a peer fails to return data during catch-up sync, we greylist it.
This should help us find a better peer to sync from.

In addition, I've reduced the greylist-clearing interval to 2 minutes, to help us get through intermittent errors.

Summary by CodeRabbit

  • Bug Fixes
    • Improved peer filtering to skip blacklisted and greylisted peers during synchronization and metadata exchange.
    • Peers are now properly disconnected and blacklisted or greylisted after certain validation failures.
  • New Features
    • Added a check to determine if a peer is currently greylisted.
  • Refactor
    • Enhanced efficiency by reusing peer information during metadata exchange.
  • Chores
    • Adjusted the default interval for clearing the greylist from 15 minutes to 2 minutes.
    • Disabled automatic promotion of peers from greylist to whitelist on ping updates.
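The behaviour summarized above (greylist a peer that fails during catch-up sync, then clear the greylist on a short timer) can be sketched roughly as follows. This is a minimal stand-in, not the actual p2pool code: the `String` keys and method bodies are simplified assumptions, whereas the real store in `p2pool/src/server/p2p/peer_store.rs` is keyed by libp2p `PeerId` and tracks richer per-peer records:

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Simplified stand-in for the real peer store; illustrative only.
struct PeerStore {
    greylist: HashMap<String, Instant>,
    grey_list_clear_interval: Duration,
}

impl PeerStore {
    fn new() -> Self {
        Self {
            greylist: HashMap::new(),
            // Reduced from 15 minutes to 2 minutes in this PR.
            grey_list_clear_interval: Duration::from_secs(60 * 2),
        }
    }

    // Called when a peer fails to return data during catch-up sync.
    fn move_to_grey_list(&mut self, peer_id: &str) {
        self.greylist.insert(peer_id.to_string(), Instant::now());
    }

    fn is_greylisted(&self, peer_id: &str) -> bool {
        self.greylist.contains_key(peer_id)
    }

    // Runs on a timer; drops entries once the interval has elapsed,
    // so peers with intermittent errors get another chance.
    fn clear_grey_list(&mut self) {
        let interval = self.grey_list_clear_interval;
        self.greylist.retain(|_, since| since.elapsed() < interval);
    }
}

fn main() {
    let mut store = PeerStore::new();
    store.move_to_grey_list("12D3KooWExamplePeer");
    println!("greylisted: {}", store.is_greylisted("12D3KooWExamplePeer")); // prints "greylisted: true"
}
```

Note that the real change additionally disables the ping handler's automatic promotion back to the whitelist, so a greylisted peer stays greylisted until the clearing timer fires.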

@stringhandler stringhandler requested a review from Copilot May 16, 2025 09:12
@coderabbitai
Contributor

coderabbitai bot commented May 16, 2025

Walkthrough

The changes introduce stricter peer filtering by adding blacklist and greylist checks in several peer-handling methods. The metadata exchange process is refactored to reuse a pre-created PeerInfo. Automatic promotion from greylist to whitelist on ping is disabled, and a new method for checking greylist status is added. The greylist clearing interval is reduced.

Changes

File(s) Change Summary
p2pool/src/server/p2p/network.rs Refactored initiate_meta_data_exchange to accept a PeerInfo argument; reused peer info in main loop; added blacklist/greylist checks in multiple methods; disconnect peers on validation or sync failures; reduced grey_list_clear_interval from 15 to 2 minutes; updated method signature.
p2pool/src/server/p2p/peer_store.rs Disabled automatic greylist-to-whitelist promotion on ping; added public method is_greylisted to check greylist status.

Poem

A hop and a skip, the peer list is tight,
Black and grey, we check left and right.
No more quick jumps from grey to the white,
Info reused, our code feels light!
Rabbits rejoice, the network’s secure—
With careful checks, our peers endure.
🐇✨


📜 Recent review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between afc0b8a and fda8c5c.

📒 Files selected for processing (1)
  • p2pool/src/server/p2p/network.rs (6 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • p2pool/src/server/p2p/network.rs
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: test (esmeralda)
  • GitHub Check: cargo check with stable
  • GitHub Check: ci
✨ Finishing Touches
  • 📝 Generate Docstrings


Copilot AI left a comment

Pull Request Overview

This PR tightens greylisting behavior by preventing failed peers from immediate re-whitelisting, adding a flag to check greylisted peers, and accelerating the greylist reset interval.

  • Comment out automatic re-whitelisting on successful ping
  • Introduce is_greylisted and skip greylisted peers in sync/exchange flows
  • Reduce grey_list_clear_interval from 15 min to 2 min (intended 1 min per description)
  • Update initiate_meta_data_exchange to receive PeerInfo externally

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File Description
peer_store.rs Commented out re-whitelisting logic; added is_greylisted method
network.rs Adjusted greylist interval; added greylist checks; refactored metadata exchange

Comment on lines +113 to +117
// let record = self.greylist_peers.remove(&peer_id.to_base58()).unwrap();
// record.num_grey_listings = 0;
-self.whitelist_peers.insert(peer_id.to_base58(), record);
+// self.whitelist_peers.insert(peer_id.to_base58(), record);

Copilot AI May 16, 2025

[nitpick] Remove the commented-out re-whitelist logic or document why it remains commented to avoid confusion and improve code clarity.

self.blacklist_peers.contains_key(&peer_id.to_base58())
}

pub fn is_greylisted(&self, peer_id: &PeerId) -> bool {

Copilot AI May 16, 2025

Calling to_base58() for every greylist check may incur unnecessary allocations; consider storing keys as PeerId or caching the string form to reduce overhead.
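The allocation concern can be illustrated with a minimal sketch. The `PeerId` newtype here is a hypothetical stand-in for libp2p's (which already implements `Hash + Eq`), not the real type:

```rust
use std::collections::HashMap;

// Hypothetical stand-in for libp2p's PeerId.
#[derive(Hash, PartialEq, Eq, Clone, Debug)]
struct PeerId(u64);

struct GreyList {
    // Keyed by PeerId instead of its base58 String, so each lookup
    // borrows the key and performs no heap allocation.
    peers: HashMap<PeerId, u32>,
}

impl GreyList {
    fn is_greylisted(&self, peer_id: &PeerId) -> bool {
        self.peers.contains_key(peer_id)
    }
}

fn main() {
    let mut peers = HashMap::new();
    peers.insert(PeerId(42), 1);
    let grey = GreyList { peers };
    println!("{}", grey.is_greylisted(&PeerId(42))); // prints "true"
}
```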

peer,
format!("ShareChainError during share chain sync:{}", error),
);
let _ = self.swarm.disconnect_peer_id(peer).inspect(|e| {

Copilot AI May 16, 2025

If disconnect_peer_id returns a Future, it needs to be .awaited to actually execute the disconnect logic.

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🔭 Outside diff range comments (1)
p2pool/src/server/p2p/peer_store.rs (1)

1-483: 💡 Verification agent

🧩 Analysis chain

Note about greylisting interval

The PR objectives mention reducing the greylisting clearing interval to 1 minute to help recover from intermittent errors more quickly. However, this change doesn't appear to be in this file. Make sure this change is implemented in the appropriate file, likely wherever the periodic clearing of the greylist is scheduled.


🏁 Script executed:

#!/bin/bash
# Find where the greylisting clearing interval is defined/used
rg -A 3 "clear_grey_list" --glob "*.rs"

Length of output: 937


🏁 Script executed:

#!/bin/bash
# Show context around clear_grey_list invocation to locate its scheduling interval
rg -B 10 -A 10 "clear_grey_list" p2pool/src/server/p2p/network.rs

Length of output: 1288


🏁 Script executed:

#!/bin/bash
# Locate where grey_list_clear_interval is instantiated to verify its duration
rg -n "grey_list_clear_interval" -A 5 -B 5 p2pool/src/server/p2p/network.rs

Length of output: 3227


Reduce the greylist clearing interval to 1 minute

The grey_list_clear_interval is still using a 2 minute default in your network configuration. To meet the PR objectives of faster recovery, update it to 1 minute:

• File: p2pool/src/server/p2p/network.rs
• Location: in the Config default block around line 166

Suggested change:

-    grey_list_clear_interval: Duration::from_secs(60 * 2),
+    grey_list_clear_interval: Duration::from_secs(60),
🧹 Nitpick comments (5)
p2pool/src/server/p2p/peer_store.rs (1)

112-118: Consider adding a comment explaining the greylisting behavior

While the commented-out code makes it clear that automatic promotion from greylist to whitelist has been disabled, it would be helpful to add a brief comment explaining this change in behavior and that peers now remain greylisted until the clearing interval runs.

            *entry = new_record;
            // Move it to the whitelist
-            //            let record = self.greylist_peers.remove(&peer_id.to_base58()).unwrap();
-
-            // record.num_grey_listings = 0;
-
-            // self.whitelist_peers.insert(peer_id.to_base58(), record);
+            // Automatic promotion from greylist to whitelist on ping is disabled.
+            // Peers now remain greylisted until the periodic clearing interval runs.
            self.update_peer_stats();
p2pool/src/server/p2p/network.rs (4)

892-917: initiate_meta_data_exchange refactor – good reuse, but propagate error handling

Accepting a pre-built PeerInfo eliminates the expensive create_peer_info per-peer call – nice!
Two follow-ups:

  1. Defensive clone: my_info is currently moved into the request; because the caller now passes the same instance repeatedly (my_info.clone() in the loop later), clone cost is proportional to peer count. If PeerInfo grows, consider passing an Arc<PeerInfo> to share the allocation cheaply.

  2. Lost error-path: the previous function returned early if building PeerInfo failed; now the onus is on the caller. Make that explicit in the signature (Result<(), Error> or at least a comment) to avoid silent “nothing happened” bugs when future callers forget the if let Ok(...) guard.
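A minimal sketch of the `Arc<PeerInfo>` suggestion, with a hypothetical trimmed-down `PeerInfo` and a free function standing in for the real method:

```rust
use std::sync::Arc;

// Hypothetical, trimmed-down PeerInfo; the real struct carries
// addresses, tip info, and more.
struct PeerInfo {
    public_addresses: Vec<String>,
}

// Taking Arc<PeerInfo> means every call shares one allocation;
// cloning the Arc only bumps a reference count, no matter how
// large PeerInfo grows.
fn initiate_meta_data_exchange(peer: &str, my_info: Arc<PeerInfo>) {
    println!("exchanging with {peer}: {} addresses", my_info.public_addresses.len());
}

fn main() {
    let my_info = Arc::new(PeerInfo {
        public_addresses: vec!["/ip4/127.0.0.1/tcp/9000".to_string()],
    });
    for peer in ["peer-a", "peer-b", "peer-c"] {
        // Cheap: increments the refcount instead of deep-copying.
        initiate_meta_data_exchange(peer, Arc::clone(&my_info));
    }
}
```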


1090-1097: Multiple read-locks per check – merge to avoid needless contention

The new blacklist / greylist guards are great, but they acquire an RwLock twice in quick succession.
Combining them reduces lock churn and makes intent clearer:

-                if self.network_peer_store.read().await.is_blacklisted(&peer_id) {
-                    debug!(target: LOG_TARGET, "Peer {} is blacklisted, skipping", peer_id);
-                    return;
-                }
-                if self.network_peer_store.read().await.is_greylisted(&peer_id) {
-                    debug!(target: LOG_TARGET, "Peer {} is greylisted, skipping", peer_id);
-                    return;
-                }
+                {
+                    let store = self.network_peer_store.read().await;
+                    if store.is_blacklisted(&peer_id) {
+                        debug!(target: LOG_TARGET, "Peer {} is blacklisted, skipping", peer_id);
+                        return;
+                    }
+                    if store.is_greylisted(&peer_id) {
+                        debug!(target: LOG_TARGET, "Peer {} is greylisted, skipping", peer_id);
+                        return;
+                    }
+                }

2261-2270: Greylist check duplicated lock acquisition

Nice addition aborting catch-up when peer is greylisted.
As above, both is_blacklisted and is_greylisted obtain the read-lock separately.
Use a single scope to improve performance (same pattern suggested earlier).


2996-3005: Single PeerInfo per tick – good, but clone cost can be avoided

Creating my_info once per timer tick then cloning for each peer already reduces work drastically.
For further optimisation, pass an Arc<PeerInfo> into initiate_meta_data_exchange so each peer shares the same allocation:

- if let Ok(my_info) = self.create_peer_info(... ) {
-     for peer in connected_peers.iter().take(NUM_PEERS_TO_META_DATA_EXCHANGE) {
-         self.initiate_meta_data_exchange(peer, my_info.clone()).await;
-     }
+ if let Ok(my_info) = self.create_peer_info(... ).map(Arc::new) {
+     for peer in connected_peers.iter().take(NUM_PEERS_TO_META_DATA_EXCHANGE) {
+         self.initiate_meta_data_exchange(peer, my_info.clone()).await;
+     }
 }

Requires changing initiate_meta_data_exchange to accept Arc<PeerInfo>.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between cff9241 and ecb7161.

📒 Files selected for processing (2)
  • p2pool/src/server/p2p/network.rs (6 hunks)
  • p2pool/src/server/p2p/peer_store.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
  • GitHub Check: ci
  • GitHub Check: test (esmeralda)
  • GitHub Check: cargo check with stable
  • GitHub Check: Cucumber tests / Base Layer
🔇 Additional comments (3)
p2pool/src/server/p2p/peer_store.rs (2)

113-117: Prevents automatic promotion from greylist to whitelist on ping response

This change ensures peers stay greylisted until the periodic clearing interval runs, rather than being immediately promoted to whitelist on ping response. This aligns with the PR objective of enforcing stricter greylisting rules.


454-456: Added utility method to check peer greylist status

This method follows the same pattern as the existing is_blacklisted method and provides a clean way for network components to check if a peer is currently greylisted before attempting to interact with it.

p2pool/src/server/p2p/network.rs (1)

164-168: Grey-list reset interval shortened – double-check risk of premature un-greylisting

Changing the default grey_list_clear_interval from 15 min to 2 min (line 166) will let a misbehaving peer be reconsidered after only 120 s.
While this helps “recover quickly from intermittent errors”, it also increases the chance that a permanently faulty or malicious peer will oscillate between good ↔ grey status, wasting bandwidth and permits.

Consider:

  • making the interval configurable via CLI/env instead of hard-coding 2 min;
  • keeping an exponential back-off per peer (e.g. double the delay each time it re-enters the greylist) to avoid ping-pong;
  • at minimum, add comments/tests showing that 2 min was validated in production.
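The exponential back-off idea can be sketched as follows; the 2-minute base matches the new default, while the cap and exponent limit are illustrative assumptions, not values from the PR:

```rust
use std::time::Duration;

// Base interval matches the PR's new default; cap is an assumption.
const BASE_GREY_INTERVAL: Duration = Duration::from_secs(60 * 2);
const MAX_GREY_INTERVAL: Duration = Duration::from_secs(60 * 32);

// Double the delay each time the peer re-enters the greylist,
// capping the exponent so the multiplication cannot overflow and
// the delay cannot grow without bound.
fn grey_duration(num_grey_listings: u32) -> Duration {
    let factor = 2u32.saturating_pow(num_grey_listings.min(4));
    std::cmp::min(BASE_GREY_INTERVAL * factor, MAX_GREY_INTERVAL)
}

fn main() {
    for n in 0..6 {
        println!("greylisting #{n}: cleared after {:?}", grey_duration(n));
    }
}
```

A persistently faulty peer thus waits 2, 4, 8, 16, then 32 minutes between retries instead of oscillating every 2 minutes.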

stringhandler and others added 3 commits May 16, 2025 11:20
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

♻️ Duplicate comments (1)
p2pool/src/server/p2p/network.rs (1)

1793-1795: ⚠️ Potential issue

Use inspect_err instead of inspect for error logging.

The code currently uses inspect() which executes the closure for both successful and failed disconnects, but the warning message is only appropriate for failures.

Apply this fix:

-let _ = self.swarm.disconnect_peer_id(peer).inspect(|e| {
+let _ = self.swarm.disconnect_peer_id(peer).inspect_err(|e| {
    warn!(target: SYNC_REQUEST_LOG_TARGET, "Failed to disconnect peer {peer}: {e:?}");
});
🧹 Nitpick comments (1)
p2pool/src/server/p2p/network.rs (1)

892-909: Good refactoring of metadata exchange to reuse peer info.

The function now accepts a pre-created PeerInfo object instead of creating it internally, which avoids redundant creation of peer info objects when initiating multiple metadata exchanges.

However, there are commented-out lines (907-909) that should be removed for cleaner code.

-        // } else {
-        //     error!(target: LOG_TARGET, "Failed to create peer info");
-        // }
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting

📥 Commits

Reviewing files that changed from the base of the PR and between ecb7161 and afc0b8a.

📒 Files selected for processing (1)
  • p2pool/src/server/p2p/network.rs (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
  • GitHub Check: test (esmeralda)
  • GitHub Check: ci
  • GitHub Check: cargo check with stable
🔇 Additional comments (4)
p2pool/src/server/p2p/network.rs (4)

166-166: The greylisting clear interval was reduced to 2 minutes.

The change reduces the greylisting clear interval from 15 minutes to 2 minutes as intended. This allows peers to recover from temporary issues more quickly, improving network resilience.


1083-1090: Good addition of blacklist and greylist checks in metadata exchange.

These checks prevent processing metadata exchange responses from peers that are blacklisted or greylisted, effectively enforcing stricter peer filtering as per the PR objectives.


2259-2262: Good addition of greylist checks in catch-up sync.

The change ensures we don't attempt to sync from greylisted peers, which aligns with the PR objective of enforcing stricter greylisting rules during the sync process.


2989-2998: Correctly uses the refactored metadata exchange function.

This section correctly creates the peer information once and passes it to each call to initiate_meta_data_exchange for multiple peers, which is more efficient than creating it for each peer separately.

Contributor

@hansieodendaal hansieodendaal left a comment

Just one thing to consider: avoid re-using the same peers all the time. Apart from that, LGTM.

utACK

Comment on lines +2990 to +2994
.create_peer_info(self.swarm.external_addresses().cloned().collect())
.await
.inspect_err(|error| {
error!(target: LOG_TARGET, "Failed to create peer info: {error:?}");
}) {

formatting

 for peer in connected_peers.iter().take(NUM_PEERS_TO_META_DATA_EXCHANGE) {
     // Update their latest tip.
-    self.initiate_meta_data_exchange(peer).await;
+    self.initiate_meta_data_exchange(peer, my_info.clone()).await;

Maybe we can shuffle here and then take, instead of:

for peer in connected_peers.iter().take(NUM_PEERS_TO_META_DATA_EXCHANGE) {

 num_squads: 1,
 user_agent: "tari-p2pool".to_string(),
-grey_list_clear_interval: Duration::from_secs(60 * 15),
+grey_list_clear_interval: Duration::from_secs(60 * 2),

yes

Comment on lines +1083 to +1090
if self.network_peer_store.read().await.is_blacklisted(&peer_id) {
debug!(target: LOG_TARGET, "Peer {} is blacklisted, skipping", peer_id);
return;
}
if self.network_peer_store.read().await.is_greylisted(&peer_id) {
debug!(target: LOG_TARGET, "Peer {} is greylisted, skipping", peer_id);
return;
}

yes

Comment on lines +2259 to +2262
if self.network_peer_store.read().await.is_greylisted(&peer) {
warn!(target: SYNC_REQUEST_LOG_TARGET, "Peer {} is greylisted, not syncing", peer);
return Ok(());
}

yes
