Description
The /api/v1/clients endpoint reports clients that no longer have an active TCP connection to the pool. This produces a mismatch between what the monitoring API reports and what is actually connected.
Reproduction Evidence
Running the following commands on the pool server shows a clear discrepancy:
$ ss -tn | grep 3333
ESTAB 0 0 <POOL_IP>:3333 <CLIENT_IP_1>:60497
ESTAB 0 0 <POOL_IP>:3333 <CLIENT_IP_1>:54639
ESTAB 0 0 <POOL_IP>:3333 <CLIENT_IP_2>:56512
ESTAB 0 0 <POOL_IP>:3333 <CLIENT_IP_1>:58306
There are 4 active TCP connections, but the monitoring API reports 5 clients, two of which have 0 channels and a hashrate of -0:
$ curl -s http://0.0.0.0:9090/api/v1/clients | jq
{
"offset": 0, "limit": 25, "total": 5,
"items": [
{ "client_id": 174, "extended_channels_count": 1, "standard_channels_count": 0, "total_hashrate": 626961400000 },
{ "client_id": 2, "extended_channels_count": 0, "standard_channels_count": 0, "total_hashrate": -0 },
{ "client_id": 175, "extended_channels_count": 0, "standard_channels_count": 1, "total_hashrate": 944576860000 },
{ "client_id": 146, "extended_channels_count": 1, "standard_channels_count": 0, "total_hashrate": 5661616000000 },
{ "client_id": 7, "extended_channels_count": 0, "standard_channels_count": 0, "total_hashrate": -0 }
]
}
Client IDs 2 and 7 have no active TCP connections, no channels, and zero hashrate — yet they persist in the API response.
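The check applied to the response above can be written as a small filter. This is only a minimal sketch: `ClientInfo`, `find_zombies`, and `sample_items` are hypothetical names that mirror the JSON fields shown in the response, not the pool's actual types.

```rust
/// Hypothetical mirror of one item in the /api/v1/clients response.
#[derive(Debug, Clone, PartialEq)]
struct ClientInfo {
    client_id: u64,
    extended_channels_count: u32,
    standard_channels_count: u32,
    total_hashrate: f64,
}

/// Returns the IDs of clients that report no channels and zero hashrate.
fn find_zombies(items: &[ClientInfo]) -> Vec<u64> {
    items
        .iter()
        .filter(|c| {
            c.extended_channels_count == 0
                && c.standard_channels_count == 0
                // -0.0 compares equal to 0.0, so this also matches "-0".
                && c.total_hashrate == 0.0
        })
        .map(|c| c.client_id)
        .collect()
}

/// The five items from the response above, in the same order.
fn sample_items() -> Vec<ClientInfo> {
    let item = |id, ext, std, rate| ClientInfo {
        client_id: id,
        extended_channels_count: ext,
        standard_channels_count: std,
        total_hashrate: rate,
    };
    vec![
        item(174, 1, 0, 626961400000.0),
        item(2, 0, 0, -0.0),
        item(175, 0, 1, 944576860000.0),
        item(146, 1, 0, 5661616000000.0),
        item(7, 0, 0, -0.0),
    ]
}

fn main() {
    let zombies = find_zombies(&sample_items());
    println!("suspected stale clients: {:?}", zombies);
}
```

A client with no channels may also be one that has just connected and not yet opened a channel, so a match here is a hint rather than proof of a stale entry.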
Hypothesis (needs deeper analysis)
The pool has a remove_downstream() function and a DownstreamShutdown state handler that should clean up disconnected clients:
sv2-apps/pool-apps/pool/src/lib/channel_manager/mod.rs
Lines 424 to 435 in a13b643
pub fn remove_downstream(
    &self,
    downstream_id: DownstreamId,
) -> PoolResult<(), error::ChannelManager> {
    self.channel_manager_data.super_safe_lock(|cm_data| {
        cm_data.downstream.remove(&downstream_id);
        cm_data
            .vardiff
            .retain(|key, _| key.downstream_id != downstream_id);
    });
    Ok(())
}
sv2-apps/pool-apps/pool/src/lib/mod.rs
Lines 229 to 240 in a13b643
message = status_receiver.recv() => {
    if let Ok(status) = message {
        match status.state {
            State::DownstreamShutdown{downstream_id,..} => {
                warn!("Downstream {downstream_id:?} disconnected — cleaning up channel manager.");
                // Remove downstream from channel manager to prevent memory leak
                if let Err(e) = channel_manager_for_cleanup.remove_downstream(downstream_id) {
                    error!("Failed to remove downstream {downstream_id:?}: {e:?}");
                    cancellation_token.cancel();
                    break;
                }
            }
The monitoring API's get_sv2_clients() reads directly from the downstream HashMap:
sv2-apps/pool-apps/pool/src/lib/monitoring.rs
Lines 85 to 97 in a13b643
fn get_sv2_clients(&self) -> Vec<Sv2ClientInfo> {
    // Clone Downstream references and release lock immediately to avoid contention
    // with template distribution and message handling
    let downstream_refs: Vec<Downstream> = self
        .channel_manager_data
        .safe_lock(|data| data.downstream.values().cloned().collect())
        .unwrap_or_default();
    downstream_refs
        .iter()
        .filter_map(downstream_to_sv2_client_info)
        .collect()
}
One hypothesis is that State::DownstreamShutdown is only sent on graceful disconnections, so abrupt TCP disconnections (RST, network timeout, etc.) never trigger the cleanup path and leave stale entries in the downstream HashMap. This needs deeper analysis to confirm; there may be other explanations.
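If the hypothesis holds, the fix would amount to making every terminal read outcome reach the same cleanup. The sketch below illustrates that idea only and is not the pool's code. `ReadOutcome`, `classify`, and `handle_read` are hypothetical names, `DownstreamId` is aliased to `usize` for the example, and a plain `HashMap` stands in for the channel manager's downstream map.

```rust
use std::collections::HashMap;
use std::io;

// Hypothetical stand-in for the pool's downstream identifier type.
type DownstreamId = usize;

#[derive(Debug, PartialEq)]
enum ReadOutcome {
    /// Bytes were received; the connection is alive.
    Data(usize),
    /// Nothing to read right now; try again later.
    Retry,
    /// The peer closed the connection cleanly (read returned Ok(0)).
    Closed,
    /// The connection ended abruptly (reset, timeout, or other I/O error).
    Aborted,
}

/// Maps the result of a socket read onto a connection outcome.
fn classify(result: io::Result<usize>) -> ReadOutcome {
    match result {
        Ok(0) => ReadOutcome::Closed,
        Ok(n) => ReadOutcome::Data(n),
        Err(e)
            if matches!(
                e.kind(),
                io::ErrorKind::WouldBlock | io::ErrorKind::Interrupted
            ) =>
        {
            ReadOutcome::Retry
        }
        Err(_) => ReadOutcome::Aborted,
    }
}

/// Removes the downstream entry on BOTH graceful and abrupt termination.
fn handle_read(
    downstreams: &mut HashMap<DownstreamId, String>,
    id: DownstreamId,
    result: io::Result<usize>,
) -> ReadOutcome {
    let outcome = classify(result);
    if matches!(outcome, ReadOutcome::Closed | ReadOutcome::Aborted) {
        downstreams.remove(&id);
    }
    outcome
}

fn main() {
    let mut downstreams: HashMap<DownstreamId, String> = HashMap::new();
    downstreams.insert(2, "miner-a".to_string());
    downstreams.insert(7, "miner-b".to_string());

    // A clean close and a connection reset both lead to removal.
    handle_read(&mut downstreams, 2, Ok(0));
    let reset = io::Error::from(io::ErrorKind::ConnectionReset);
    handle_read(&mut downstreams, 7, Err(reset));

    println!("remaining downstreams: {}", downstreams.len());
}
```

A peer that disappears without sending FIN or RST produces no read result at all, so an idle timeout or TCP keepalive would be needed to cover that case as well.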
Impact
• Monitoring API reports inflated client counts
• Zombie entries with total_hashrate: -0 suggest uninitialized/stale state
• Makes it harder to diagnose real connectivity issues
Environment
• pool_sv2 running on Linux
• Observed with multiple miner clients connecting simultaneously