- 
                Notifications
    You must be signed in to change notification settings 
- Fork 254
Bugfix: Add circuit breaker #6143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| The latest updates on your projects. Learn more about Vercel for GitHub. 
 1 Skipped Deployment
 | 
| pub fn get_sdk_shutdown_tracker() -> Result<ShutdownTracker, RegistryAccessError> { | ||
| Ok(runtime_registry::RuntimeRegistry::get_or_create_sdk()?.shutdown_tracker_owned()) | ||
| pub fn create_sdk_shutdown_tracker() -> Result<ShutdownTracker, RegistryAccessError> { | ||
| Ok(runtime_registry::RuntimeRegistry::create_sdk()?.shutdown_tracker_owned()) | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did you change the existing behaviour to overwrite a pre-existing shutdown manager? this could be potentially dangereous, especially if it had already registered some signals, tasks, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having a global shutdown means that if you cancel the underlying token, you can never spin up a new mixnet client again, because the underlying token will be cancelled to start with.
The SDK manager is only used if no other shutdown manager is provided
| .clone()) | ||
| } | ||
|  | ||
| /// Get the ShutdownManager for SDK use. | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- if it's not yet used, just remove it
- don't leak SDK needs into the common task library (it's like exposing VPN-specific methods in the monorepo)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I merely modified the existing SDK shutdown manager. It's more of an internal shutdown manager rather than a specific one imo. Either you give your client a custom one, or it creates one for internal use. We could have an entire discussion on the shutdown process which I still think could use a proper look into
| // Use custom shutdown if provided, otherwise the sdk one will be used later down the line | ||
| if let Some(shutdown_tracker) = self.custom_shutdown { | ||
| base_builder = base_builder.with_shutdown(shutdown_tracker); | ||
| } | 
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so why do we no longer get the default static one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We still need it, but it was set a first time here, and then later in the base client startup too. There is no point in setting it in two places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reference :
| let shutdown_tracker = match self.shutdown { | 
I'd be fine with setting it up there and remove it here mentioned. Then BaseClientBuilder.shutdown should no longer be optional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@pronebird reviewed 6 of 7 files at r2.
Reviewable status: 6 of 7 files reviewed, 4 unresolved discussions (waiting on @durch and @jstuczyn)
common/client-core/src/client/real_messages_control/real_traffic_stream.rs line 283 at r2 (raw file):
}; let sending_res = tokio::select! {
NIT: This could be written as:
let sending_res = self.shutdown_token.run_until_cancelled(self.mix_tx.send(vec![next_message])).await;There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 7 files reviewed, 5 unresolved discussions (waiting on @durch, @jstuczyn, and @tommyv1987)
common/client-core/src/client/mix_traffic/mod.rs line 166 at r2 (raw file):
// Gateway is dead, we have to shut down currently error!("Signalling shutdown from the MixTrafficController"); self.shutdown_token.cancel();
Would it not be easier to pass a channel from parent task and bubble error back up? Then let parent cancel everything and exit cleanly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 7 files reviewed, 5 unresolved discussions (waiting on @durch, @jstuczyn, @pronebird, and @tommyv1987)
common/client-core/src/client/mix_traffic/mod.rs line 166 at r2 (raw file):
Previously, pronebird (Andrej Mihajlov) wrote…
Would it not be easier to pass a channel from parent task and bubble error back up? Then let parent cancel everything and exit cleanly.
Easier no, much better yes, hence my comment about it. We need to come back on that, that PR is just a bandaid over a bigger issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 7 files reviewed, 5 unresolved discussions (waiting on @durch, @jstuczyn, and @tommyv1987)
common/client-core/src/client/mix_traffic/mod.rs line 166 at r2 (raw file):
Previously, simonwicky (Simon Wicky) wrote…
Easier no, much better yes, hence my comment about it. We need to come back on that, that PR is just a bandaid over a bigger issue
If we have a clear parent task, to me it seems logical to eliminate bandaid and just add a simple channel, bubble up the message and then cancel all child tasks from above but not from below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 6 of 7 files reviewed, 5 unresolved discussions (waiting on @durch, @jstuczyn, @pronebird, and @tommyv1987)
common/client-core/src/client/mix_traffic/mod.rs line 166 at r2 (raw file):
Previously, pronebird (Andrej Mihajlov) wrote…
If we have a clear parent task, to me it seems logical to eliminate bandaid and just add a simple channel, bubble up the message and then cancel all child tasks from above but not from below.
Bubbling up messages from everywhere is gonna be real ugly, we need a signalling channel within the custom shutdown mechanism we have, and I don't have the time to do it properly
| Closing with more work being undertaken here: https://nymtech.atlassian.net/browse/NET-688 | 
Bugfix: Add circuit breaker
Cherry pick - request #6143 from nymtech/bugfix/mix-tx-closed-v2
Bugfix: Add circuit breaker
When the mixnet client's mix_tx channel closes during network drops,
OutQueueControlwould retry sending packets through the closed channel, flooding logs and
hanging the daemon. This affects all clients (VPN, SOCKS5, native clients), not just VPN.
Solution
Cancelling the root token from the MixTrafficController
Additionally, reduced
MAX_FAILURE_COUNTfrom 100 → 20 to detect dead gateways faster(~1-2s instead of ~6s), improving reconnection speed during mobile network drops and
sleep/wake cycles.
Example Logs (Before Fix)
This change is