
Migrate pool to dashmap #304

Open

Shourya742 wants to merge 17 commits into stratum-mining:main from Shourya742:02-03-2026-migrate-pool-to-dashmap

Conversation

@Shourya742 (Collaborator)

closes: #205

@Shourya742 Shourya742 force-pushed the 02-03-2026-migrate-pool-to-dashmap branch from 2ecf9d3 to 5a23188 Compare March 2, 2026 03:04
@Shourya742 Shourya742 marked this pull request as ready for review March 3, 2026 00:12
@Shourya742 Shourya742 force-pushed the 02-03-2026-migrate-pool-to-dashmap branch from 247919a to 281946f Compare March 3, 2026 00:13
@average-gary (Contributor) left a comment:

Two concurrency concerns flagged below.

let Some(downstream) = channel_manager_data.downstream.get(&downstream_id) else {
return Err(PoolError::disconnect(PoolErrorKind::DownstreamNotFound(downstream_id), downstream_id));
};
let Some(downstream) = self.downstream.get(&downstream_id) else {
Contributor:

DashMap guard held across .await

Unlike every other handler in this file, handle_update_channel doesn't use the closure pattern to scope DashMap guards. The downstream Ref acquired here lives through the for message in messages { message.forward(...).await; } loop at the bottom, blocking the entire shard for the duration of the async send.

Wrap the body in a closure like the other handlers do:

let process_update_channel = || {
    let Some(downstream) = self.downstream.get(&downstream_id) else { ... };
    // ... build messages ...
    Ok(messages)
};
let messages = process_update_channel()?;

@Shourya742 (Collaborator, Author):

The downstream object gets dropped as soon as it stops being used, and we only call .await at the very end, so their scopes don't intersect.
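For context on why explicit scoping is the safer form: a DashMap `Ref` implements `Drop` and releases its shard lock only when the binding goes out of scope, not at its last use, so a block expression makes the release point unambiguous. Below is a minimal stand-alone sketch of that scoping pattern; since dashmap itself isn't assumed here, a `RefCell`-logging `ShardGuard` stands in for the shard guard, and the names (`handle`, `ShardGuard`) are illustrative, not from the PR.

```rust
use std::cell::RefCell;

// Stand-in for a DashMap shard guard: records the moment it is dropped.
struct ShardGuard<'a> {
    log: &'a RefCell<Vec<&'static str>>,
}

impl Drop for ShardGuard<'_> {
    fn drop(&mut self) {
        self.log.borrow_mut().push("guard dropped");
    }
}

// Build the messages inside a block expression so the guard is dropped
// *before* the send loop (the stand-in for the async forwarding) runs.
fn handle(log: &RefCell<Vec<&'static str>>) -> Vec<&'static str> {
    let messages = {
        let _guard = ShardGuard { log }; // stands in for self.downstream.get(..)
        vec!["send msg1", "send msg2"]   // built while the guard is live
    }; // _guard dropped here, at the end of the block
    for m in &messages {
        log.borrow_mut().push(*m); // stands in for message.forward(...).await
    }
    messages
}
```

Running `handle` shows "guard dropped" logged before any send, i.e. no guard is held across the loop.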

let vardiff_key = vardiff.key().clone();
let vardiff_state = vardiff.value_mut();
let downstream_id = &vardiff_key.downstream_id;
let channel_id = &vardiff_key.channel_id;
Contributor:

Deadlock risk: inverted lock ordering with submit handlers

This loop holds three nested DashMap guards simultaneously: self.vardiff (iter_mut) → self.downstream (get_mut) → downstream.standard_channels (get_mut).

The submit handlers acquire these in the opposite order: self.downstream → standard_channels → self.vardiff.

Under shard collision this is a classic lock-ordering deadlock. Consider collecting the keys first to avoid holding the vardiff iter guard while acquiring the others:

let keys: Vec<_> = self.vardiff.iter().map(|r| r.key().clone()).collect();
for key in keys {
    let Some(mut vardiff) = self.vardiff.get_mut(&key) else { continue };
    // ...
    drop(vardiff); // or scope it tightly
}

@Shourya742 (Collaborator, Author):

TBH, I don't understand this. ;)
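To unpack the reviewer's point: a lock-ordering deadlock happens when two tasks hold one lock each and wait for the other's; collecting the keys up front means only one guard is ever held at a time, so no ordering can invert. Here is a single-threaded sketch of that "collect keys first" pattern, with a `RefCell`-backed `HashMap` standing in for a DashMap shard (the function name `update_all` is illustrative, not from the PR); with `RefCell`, an overlapping borrow panics instead of deadlocking, which makes the hazard visible without threads.

```rust
use std::cell::RefCell;
use std::collections::HashMap;

// Snapshot the keys first, then take each guard one at a time.
// Holding an iterator borrow while calling borrow_mut() here would
// panic, just as holding the iter guard while locking another shard
// risks deadlock with DashMap.
fn update_all(vardiff: &RefCell<HashMap<u32, u64>>) -> Vec<u32> {
    let mut keys: Vec<u32> = vardiff.borrow().keys().copied().collect();
    keys.sort(); // deterministic order for the demo
    for key in &keys {
        // The RefMut guard lives only for this statement.
        if let Some(v) = vardiff.borrow_mut().get_mut(key) {
            *v += 1; // stands in for the per-channel vardiff update
        }
    }
    keys
}
```

The entry may have been removed between the snapshot and the `get_mut`, which is why the real version must tolerate a miss (`else { continue }` in the suggestion above).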

let messages = self.channel_manager_data.super_safe_lock(|channel_manager_data| {
let Some(downstream) = channel_manager_data.downstream.get_mut(&downstream_id) else {
return Err(PoolError::disconnect(PoolErrorKind::DownstreamIdNotFound, downstream_id));
let process_open_standard_mining_channel = || {
Member:

Why did you put this here instead of the let messages = ?

@Shourya742 (Collaborator, Author):

The reason we require a closure here is that the block contains return statements. Without the closure, those returns would exit the entire handler method instead of just the block.

That said, I am not a big fan of this pattern anymore. It originally existed to work with the nested locking pattern we had before. Since that is no longer the case, we don’t really need this structure anymore. The code can likely be simplified to something much leaner and easier to reason about.
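For readers unfamiliar with the pattern, here is a self-contained sketch of why the closure is needed (the names `handler` and `process` are illustrative, not from the PR): the `return` inside the closure exits only the closure, so the enclosing handler can continue after the fallible block.

```rust
// A `return` inside the closure exits the closure, not `handler`,
// which is what makes early-exit error handling inside a block work.
fn handler(input: Option<i32>) -> Result<String, String> {
    let process = || -> Result<Vec<i32>, String> {
        let Some(v) = input else {
            return Err("downstream not found".to_string()); // exits closure only
        };
        Ok(vec![v, v + 1]) // "build messages"
    };
    let messages = process()?; // propagate the closure's error, or keep going
    Ok(format!("{:?}", messages))
}
```

Without the nested locking, the same effect is usually achievable with plain `?` propagation or `ok_or_else`, which is the simplification discussed below.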

Member:

I would simplify this block though, and I would do the same in all the other places where you introduced this closure.

@Shourya742 (Collaborator, Author):

For sure, I am currently doing that. Will push the changes in a bit.

@Shourya742 (Collaborator, Author):

Made the changes; the handlers should be leaner now. The commits are structured so that each method change is its own atomic commit, which should make reviewing this latest set of changes easier.

3af7b21
e567925
f11b665
6fde432
cc66092

Member:

Before merging this, I would squash them in the previous commits accordingly.

@Shourya742 (Collaborator, Author):

For sure, they were there for ease of review.

@Shourya742 (Collaborator, Author):

Commits are squashed now.

@Shourya742 Shourya742 force-pushed the 02-03-2026-migrate-pool-to-dashmap branch from 2645ba9 to cc66092 Compare March 11, 2026 11:08
if downstream.requires_custom_work.load(Ordering::SeqCst) {
error!("OpenStandardMiningChannel: Standard Channels are not supported for this connection");
let open_standard_mining_channel_error = OpenMiningChannelError {
let send_error = |error_code: &'static str| async {
Member:

Shouldn't this become a function in utils.rs, which can be called from every place where we need it?

I see it's currently defined and repeated multiple times.

@Shourya742 (Collaborator, Author):

If we look at the implementation of all such closures, we can see that each points to a very specific error message tied to its method.

Member:

It's not completely true, because we have some cases where the error message is exactly the same.

For example, we have two identical closures for the OpenMiningChannelError.

Since this seems like something that can be used for different error messages, why can't it be a function in utils.rs, where you also pass the error message you want and it does the job (probably by matching on the error message passed)?

@Shourya742 (Collaborator, Author):

Passing the entire error message to a helper method somewhat defeats the purpose of the closure in the first place. The closure was introduced to avoid constructing the error message repeatedly and to eliminate boilerplate across multiple call sites when the only variation is the error code.

@GitGab19 (Member) Mar 12, 2026:

With one helper method you put the logic which is inside the different closures only in one place, but you can use it for different error messages, and then call it from different contexts to send a specific error message with a specific error code.

Example:

forward_error_message_to_channel_manager(error_message_type, error_code)

and then:

forward_error_message_to_channel_manager(OPEN_MINING_CHANNEL_ERROR_MESSAGE_TYPE, "standard-channels-not-supported-for-custom-work")

or

forward_error_message_to_channel_manager(SET_CUSTOM_MINING_JOB_ERROR_MESSAGE_TYPE, "pool-payout-script-missing")
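A rough sketch of the helper being proposed, to make the shape concrete: the constants' values, the string return type, and the message bodies are placeholders, not the pool's actual message types or wire format.

```rust
// Hypothetical message-type constants -- values are illustrative only.
const OPEN_MINING_CHANNEL_ERROR_MESSAGE_TYPE: u8 = 1;
const SET_CUSTOM_MINING_JOB_ERROR_MESSAGE_TYPE: u8 = 2;

// One place that builds the error message; call sites vary only the
// message type and the error code, as in the examples above.
fn forward_error_message_to_channel_manager(
    error_message_type: u8,
    error_code: &str,
) -> String {
    match error_message_type {
        OPEN_MINING_CHANNEL_ERROR_MESSAGE_TYPE => {
            format!("OpenMiningChannelError: {error_code}")
        }
        SET_CUSTOM_MINING_JOB_ERROR_MESSAGE_TYPE => {
            format!("SetCustomMiningJobError: {error_code}")
        }
        _ => format!("UnknownError: {error_code}"),
    }
}
```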

Member:

unless I'm missing context, which is a very real possibility

@Shourya742 (Collaborator, Author):

I don't think that needs an issue to track it; it's just a helper closure to remove repetitive message construction during method execution.

@plebhash (Member) Mar 13, 2026:

there seems to be two topics of discussion here:

  1. RouteTo
  2. closure

I'm just pointing out that (IIUC) @GitGab19 said we need an issue to keep track of 2. (replying to your ping); #330 was presented as the answer, while its scope only covers 1.

I'm fine if we decide to move forward without addressing the concerns raised about closure convolution, I'm just trying to make sure we're all on the same page and not masquerading one issue with another

anyways, I'm hitting the road in a bit so won't be able to do a deep dive on this PR today so I'll leave it for you guys to figure it out

@Shourya742 (Collaborator, Author):

I am removing the closure.

@Shourya742 (Collaborator, Author):

Should be good now, updated the commits. IT passes. :)

@Shourya742 Shourya742 force-pushed the 02-03-2026-migrate-pool-to-dashmap branch from cc66092 to 49b0f44 Compare March 13, 2026 08:42
@Shourya742 Shourya742 force-pushed the 02-03-2026-migrate-pool-to-dashmap branch from 49b0f44 to 5a88034 Compare March 13, 2026 08:42
@Shourya742 Shourya742 force-pushed the 02-03-2026-migrate-pool-to-dashmap branch from 5a88034 to 6f19477 Compare March 13, 2026 14:31
@Shourya742 Shourya742 marked this pull request as draft March 13, 2026 17:38
@Shourya742 Shourya742 marked this pull request as ready for review March 14, 2026 11:35

Development

Successfully merging this pull request may close these issues:

Refactor Pool to Reduce Nested Locking