fix: better greylisting of sync peers#332
Conversation
Walkthrough: The changes introduce stricter peer filtering by adding blacklist and greylist checks in several peer-handling methods. The metadata exchange process is refactored to reuse a pre-created `PeerInfo` object instead of building one per peer.
Pull Request Overview
This PR tightens greylisting behavior by preventing failed peers from immediate re-whitelisting, adding a flag to check greylisted peers, and accelerating the greylist reset interval.
- Comment out automatic re-whitelisting on successful ping
- Introduce `is_greylisted` and skip greylisted peers in sync/exchange flows
- Reduce `grey_list_clear_interval` from 15 min to 2 min (intended 1 min per description)
- Update `initiate_meta_data_exchange` to receive `PeerInfo` externally
Reviewed Changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| peer_store.rs | Commented out re-whitelisting logic; added is_greylisted method |
| network.rs | Adjusted greylist interval; added greylist checks; refactored metadata exchange |
```diff
  // let record = self.greylist_peers.remove(&peer_id.to_base58()).unwrap();

  // record.num_grey_listings = 0;

- self.whitelist_peers.insert(peer_id.to_base58(), record);
+ // self.whitelist_peers.insert(peer_id.to_base58(), record);
```
[nitpick] Remove the commented-out re-whitelist logic or document why it remains commented to avoid confusion and improve code clarity.
```rust
    self.blacklist_peers.contains_key(&peer_id.to_base58())
}

pub fn is_greylisted(&self, peer_id: &PeerId) -> bool {
```
Calling to_base58() for every greylist check may incur unnecessary allocations; consider storing keys as PeerId or caching the string form to reduce overhead.
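One way to avoid the per-check allocation is to key the peer maps on the `PeerId` itself rather than its base58 string. Below is a minimal sketch of that idea; the `PeerId` newtype and the `PeerStore` shape are illustrative stand-ins, since libp2p's real `PeerId` already implements `Eq + Hash` and could be used as a map key directly:

```rust
use std::collections::HashMap;

// Illustrative stand-in for libp2p's PeerId; the real type already
// derives Clone + Eq + Hash and can be used as a HashMap key as-is.
#[derive(Clone, PartialEq, Eq, Hash, Debug)]
pub struct PeerId(pub [u8; 32]);

#[derive(Default)]
pub struct PeerStore {
    // value = number of times the peer has been greylisted
    greylist_peers: HashMap<PeerId, u64>,
}

impl PeerStore {
    // No to_base58() string allocation: the key is hashed in place.
    pub fn is_greylisted(&self, peer_id: &PeerId) -> bool {
        self.greylist_peers.contains_key(peer_id)
    }

    pub fn greylist(&mut self, peer_id: PeerId) {
        *self.greylist_peers.entry(peer_id).or_insert(0) += 1;
    }
}
```

This also removes the possibility of two base58 encodings drifting apart, since equality is on the raw ID.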
p2pool/src/server/p2p/network.rs (Outdated)

```rust
    peer,
    format!("ShareChainError during share chain sync:{}", error),
);
let _ = self.swarm.disconnect_peer_id(peer).inspect(|e| {
```
If disconnect_peer_id returns a Future, it needs to be .awaited to actually execute the disconnect logic.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
p2pool/src/server/p2p/peer_store.rs (1)
1-483: 💡 Verification agent, 🧩 Analysis chain
Note about greylisting interval
The PR objectives mention reducing the greylisting clearing interval to 1 minute to help recover from intermittent errors more quickly. However, this change doesn't appear to be in this file. Make sure this change is implemented in the appropriate file, likely wherever the periodic clearing of the greylist is scheduled.
🏁 Script executed:

```bash
#!/bin/bash
# Find where the greylisting clearing interval is defined/used
rg -A 3 "clear_grey_list" --glob "*.rs"
```

Length of output: 937

🏁 Script executed:

```bash
#!/bin/bash
# Show context around clear_grey_list invocation to locate its scheduling interval
rg -B 10 -A 10 "clear_grey_list" p2pool/src/server/p2p/network.rs
```

Length of output: 1288

🏁 Script executed:

```bash
#!/bin/bash
# Locate where grey_list_clear_interval is instantiated to verify its duration
rg -n "grey_list_clear_interval" -A 5 -B 5 p2pool/src/server/p2p/network.rs
```

Length of output: 3227
Reduce the greylist clearing interval to 1 minute

The `grey_list_clear_interval` is still using a 2 minute default in your network configuration. To meet the PR objective of faster recovery, update it to 1 minute:

• File: p2pool/src/server/p2p/network.rs
• Location: in the `Config` default block around line 166

Suggested change:

```diff
- grey_list_clear_interval: Duration::from_secs(60 * 2),
+ grey_list_clear_interval: Duration::from_secs(60),
```
🧹 Nitpick comments (5)

p2pool/src/server/p2p/peer_store.rs (1)

112-118: Consider adding a comment explaining the greylisting behavior

While the commented-out code makes it clear that automatic promotion from greylist to whitelist has been disabled, it would be helpful to add a brief comment explaining this change in behavior and that peers now remain greylisted until the clearing interval runs.

```diff
  *entry = new_record; // Move it to the whitelist
- // let record = self.greylist_peers.remove(&peer_id.to_base58()).unwrap();
-
- // record.num_grey_listings = 0;
-
- // self.whitelist_peers.insert(peer_id.to_base58(), record);
+ // Automatic promotion from greylist to whitelist on ping is disabled.
+ // Peers now remain greylisted until the periodic clearing interval runs.
  self.update_peer_stats();
```

p2pool/src/server/p2p/network.rs (4)
892-917: `initiate_meta_data_exchange` refactor – good reuse, but propagate error handling

Accepting a pre-built `PeerInfo` eliminates the expensive `create_peer_info` per-peer call – nice!

Two follow-ups:

1. Defensive clone: `my_info` is currently moved into the request; because the caller now passes the same instance repeatedly (`my_info.clone()` in the loop later), clone cost is proportional to peer count. If `PeerInfo` grows, consider passing an `Arc<PeerInfo>` to share the allocation cheaply.
2. Lost error-path: the previous function returned early if building `PeerInfo` failed; now the onus is on the caller. Make that explicit in the signature (`Result<(), Error>`) or at least in a comment, to avoid silent "nothing happened" bugs when future callers forget the `if let Ok(...)` guard.
1090-1097: Multiple read-locks per check – merge to avoid needless contention

The new blacklist / greylist guards are great, but they acquire an `RwLock` twice in quick succession. Combining them reduces lock churn and makes intent clearer:

```diff
- if self.network_peer_store.read().await.is_blacklisted(&peer_id) {
-     debug!(target: LOG_TARGET, "Peer {} is blacklisted, skipping", peer_id);
-     return;
- }
- if self.network_peer_store.read().await.is_greylisted(&peer_id) {
-     debug!(target: LOG_TARGET, "Peer {} is greylisted, skipping", peer_id);
-     return;
- }
+ {
+     let store = self.network_peer_store.read().await;
+     if store.is_blacklisted(&peer_id) {
+         debug!(target: LOG_TARGET, "Peer {} is blacklisted, skipping", peer_id);
+         return;
+     }
+     if store.is_greylisted(&peer_id) {
+         debug!(target: LOG_TARGET, "Peer {} is greylisted, skipping", peer_id);
+         return;
+     }
+ }
```
2261-2270: Greylist check duplicated lock acquisition

Nice addition aborting catch-up when the peer is greylisted. As above, both `is_blacklisted` and `is_greylisted` obtain the read-lock separately. Use a single scope to improve performance (same pattern suggested earlier).
2996-3005: Single `PeerInfo` per tick – good, but clone cost can be avoided

Creating `my_info` once per timer tick and then cloning it for each peer already reduces work drastically. For further optimisation, pass an `Arc<PeerInfo>` into `initiate_meta_data_exchange` so each peer shares the same allocation:

```diff
- if let Ok(my_info) = self.create_peer_info(... ) {
-     for peer in connected_peers.iter().take(NUM_PEERS_TO_META_DATA_EXCHANGE) {
-         self.initiate_meta_data_exchange(peer, my_info.clone()).await;
-     }
+ if let Ok(my_info) = self.create_peer_info(... ).map(Arc::new) {
+     for peer in connected_peers.iter().take(NUM_PEERS_TO_META_DATA_EXCHANGE) {
+         self.initiate_meta_data_exchange(peer, my_info.clone()).await;
+     }
  }
```

Requires changing `initiate_meta_data_exchange` to accept `Arc<PeerInfo>`.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (2)
- p2pool/src/server/p2p/network.rs (6 hunks)
- p2pool/src/server/p2p/peer_store.rs (2 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (4)
- GitHub Check: ci
- GitHub Check: test (esmeralda)
- GitHub Check: cargo check with stable
- GitHub Check: Cucumber tests / Base Layer
🔇 Additional comments (3)
p2pool/src/server/p2p/peer_store.rs (2)
113-117: Prevents automatic promotion from greylist to whitelist on ping response

This change ensures peers stay greylisted until the periodic clearing interval runs, rather than being immediately promoted to the whitelist on a ping response. This aligns with the PR objective of enforcing stricter greylisting rules.
454-456: Added utility method to check peer greylist status

This method follows the same pattern as the existing `is_blacklisted` method and provides a clean way for network components to check if a peer is currently greylisted before attempting to interact with it.

p2pool/src/server/p2p/network.rs (1)
164-168: Grey-list reset interval shortened – double-check risk of premature un-greylisting

Changing the default `grey_list_clear_interval` from 15 min to 2 min (line 166) will let a misbehaving peer be reconsidered after only 120 s. While this helps "recover quickly from intermittent errors", it also increases the chance that a permanently faulty or malicious peer will oscillate between good and grey status, wasting bandwidth and permits.

Consider:
- making the interval configurable via CLI/env instead of hard-coding 2 min;
- keeping an exponential back-off per peer (e.g. double the delay each time it re-enters the greylist) to avoid ping-pong;
- at minimum, adding comments/tests showing that 2 min was validated in production.
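The per-peer exponential back-off suggested above could be sketched roughly as follows. `GreylistEntry`, the base interval, and the cap are all assumed names and values for illustration, not the project's actual configuration:

```rust
use std::time::Duration;

// Assumed values: base delay matches the proposed 1-minute clear interval,
// and a cap keeps repeat offenders from being banned effectively forever.
const BASE_CLEAR_INTERVAL: Duration = Duration::from_secs(60);
const MAX_CLEAR_INTERVAL: Duration = Duration::from_secs(60 * 30);

pub struct GreylistEntry {
    // How many times this peer has entered the greylist.
    pub num_grey_listings: u32,
}

impl GreylistEntry {
    // Double the clearing delay each time the peer re-enters the greylist:
    // 1st offence waits the base interval, 2nd waits 2x, 3rd waits 4x, ...
    pub fn clear_after(&self) -> Duration {
        // Clamp the exponent so the shift cannot overflow a u32 multiplier.
        let exp = self.num_grey_listings.saturating_sub(1).min(16);
        let delay = BASE_CLEAR_INTERVAL.saturating_mul(1u32 << exp);
        delay.min(MAX_CLEAR_INTERVAL)
    }
}
```

The periodic clearing task would then skip any entry whose `clear_after()` has not yet elapsed since its last greylisting, instead of wiping the whole list on a fixed timer.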
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
…l into st-ban-sync-peers
Actionable comments posted: 0
♻️ Duplicate comments (1)
p2pool/src/server/p2p/network.rs (1)
1793-1795: ⚠️ Potential issue: use `inspect_err` instead of `inspect` for error logging.

The code currently uses `inspect()`, which runs its closure on the success value, so the failure warning would be logged on successful disconnects and never on actual errors. `inspect_err()` runs only on failures, which is what the message intends.

Apply this fix:

```diff
-let _ = self.swarm.disconnect_peer_id(peer).inspect(|e| {
+let _ = self.swarm.disconnect_peer_id(peer).inspect_err(|e| {
     warn!(target: SYNC_REQUEST_LOG_TARGET, "Failed to disconnect peer {peer}: {e:?}");
 });
```
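The `inspect` / `inspect_err` distinction is easy to verify on a plain `Result` (both combinators are in the standard library, stable since Rust 1.76). This toy function is purely illustrative and not the project's code:

```rust
// Returns (ran_on_ok, ran_on_err) for the given Result, showing which
// combinator's closure actually executes.
fn count_inspections(r: Result<u32, &'static str>) -> (bool, bool) {
    let mut ok_seen = false;
    let mut err_seen = false;
    let _ = r
        .inspect(|_| ok_seen = true)       // runs only for Ok values
        .inspect_err(|_| err_seen = true); // runs only for Err values
    (ok_seen, err_seen)
}
```

With `inspect`, the "failed to disconnect" warning would fire on the happy path; `inspect_err` restricts it to genuine failures.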
🧹 Nitpick comments (1)
p2pool/src/server/p2p/network.rs (1)
892-909: Good refactoring of metadata exchange to reuse peer info.

The function now accepts a pre-created `PeerInfo` object instead of creating it internally, which avoids redundant creation of peer info objects when initiating multiple metadata exchanges. However, there are commented-out lines (907-909) that should be removed for cleaner code.

```diff
- // } else {
- //     error!(target: LOG_TARGET, "Failed to create peer info");
- // }
```
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
Cache: Disabled due to data retention organization setting
Knowledge Base: Disabled due to data retention organization setting
📒 Files selected for processing (1)
- p2pool/src/server/p2p/network.rs (6 hunks)
⏰ Context from checks skipped due to timeout of 90000ms (3)
- GitHub Check: test (esmeralda)
- GitHub Check: ci
- GitHub Check: cargo check with stable
🔇 Additional comments (4)
p2pool/src/server/p2p/network.rs (4)
166: The greylisting clear interval was reduced to 2 minutes.

The change reduces the greylisting clear interval from 15 minutes to 2 minutes as intended. This allows peers to recover from temporary issues more quickly, improving network resilience.
1083-1090: Good addition of blacklist and greylist checks in metadata exchange.

These checks prevent processing metadata exchange responses from peers that are blacklisted or greylisted, effectively enforcing stricter peer filtering as per the PR objectives.
2259-2262: Good addition of greylist checks in catch-up sync.

The change ensures we don't attempt to sync from greylisted peers, which aligns with the PR objective of enforcing stricter greylisting rules during the sync process.
2989-2998: Correctly uses the refactored metadata exchange function.

This section correctly creates the peer information once and passes it to each call to `initiate_meta_data_exchange` for multiple peers, which is more efficient than creating it for each peer separately.
hansieodendaal
left a comment
There was a problem hiding this comment.
Just one thing to consider: avoid re-using the same peers all the time. Apart from that, LGTM.
utACK
```rust
.create_peer_info(self.swarm.external_addresses().cloned().collect())
.await
.inspect_err(|error| {
    error!(target: LOG_TARGET, "Failed to create peer info: {error:?}");
}) {
```
```diff
  for peer in connected_peers.iter().take(NUM_PEERS_TO_META_DATA_EXCHANGE) {
      // Update their latest tip.
-     self.initiate_meta_data_exchange(peer).await;
+     self.initiate_meta_data_exchange(peer, my_info.clone()).await;
```
Maybe we can shuffle here and then take, instead of:

```rust
for peer in connected_peers.iter().take(NUM_PEERS_TO_META_DATA_EXCHANGE) {
```
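The shuffle-then-take idea might look like the sketch below. To keep the example dependency-free it uses a tiny xorshift generator in place of a real RNG; in the actual code, `rand`'s `SliceRandom::shuffle` or `choose_multiple` would be the natural choice. All names here are illustrative:

```rust
// Fisher-Yates shuffle with a small xorshift PRNG standing in for a
// cryptographic or rand-crate RNG (illustration only).
fn shuffle<T>(items: &mut [T], mut seed: u64) {
    for i in (1..items.len()).rev() {
        seed ^= seed << 13;
        seed ^= seed >> 7;
        seed ^= seed << 17;
        let j = (seed % (i as u64 + 1)) as usize;
        items.swap(i, j);
    }
}

// Shuffle the connected peers first, then take n, so metadata exchange
// does not always target the same leading peers in the list.
fn pick_peers(mut peers: Vec<String>, n: usize, seed: u64) -> Vec<String> {
    shuffle(&mut peers, seed);
    peers.truncate(n);
    peers
}
```

Because the shuffle is a permutation, the picked peers are always distinct members of the original list; only the selection order changes between ticks.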
```diff
  num_squads: 1,
  user_agent: "tari-p2pool".to_string(),
- grey_list_clear_interval: Duration::from_secs(60 * 15),
+ grey_list_clear_interval: Duration::from_secs(60 * 2),
```
```rust
if self.network_peer_store.read().await.is_blacklisted(&peer_id) {
    debug!(target: LOG_TARGET, "Peer {} is blacklisted, skipping", peer_id);
    return;
}
if self.network_peer_store.read().await.is_greylisted(&peer_id) {
    debug!(target: LOG_TARGET, "Peer {} is greylisted, skipping", peer_id);
    return;
}
```
```rust
if self.network_peer_store.read().await.is_greylisted(&peer) {
    warn!(target: SYNC_REQUEST_LOG_TARGET, "Peer {} is greylisted, not syncing", peer);
    return Ok(());
}
```
Adds stricter greylisting of failed peers. Specifically, if a peer fails to return data during catch-up sync, we greylist it.
This should help find a better peer to sync from.
In addition, I've changed the greylisting clearing interval down to 2 minutes, to help us get through intermittent errors.