Respect settle queue size #4338

Open
fafk wants to merge 8 commits into main from respect-queue-size

Conversation

@fafk fafk commented Apr 15, 2026

Description

The reference driver rejects new solutions when there is already a backlog of solutions that still need to be submitted, because the new solutions would most likely not be mined in time. This is intended to protect very competitive solvers from penalties when they win too often but can't submit fast enough.
#4167 introduced a bug where the check that decides whether to reject a /solve request only looks at the available tx submission slots but not at the settle queue.
As a consequence, a solver with only a single submission EOA that has won an auction rejects /solve requests until the previous solution has been submitted.

Changes

Add a semaphore with capacity equal to the settle queue size to mimic the missing queue behavior.
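To illustrate the idea, here is a minimal std-only sketch of a bounded admission gate. It is not the code from this PR: the names `AdmissionGate`, `AdmissionPermit`, and `has_capacity` are hypothetical, an atomic counter stands in for a real semaphore, and the split between a non-consuming check for /solve and a consuming one for /settle is an assumption about how such a gate might be used.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Arc;

/// Hypothetical admission gate: bounds how many settlements may be pending.
#[derive(Clone)]
pub struct AdmissionGate {
    in_flight: Arc<AtomicUsize>,
    capacity: usize,
}

/// RAII permit: dropping it frees the slot again.
pub struct AdmissionPermit {
    in_flight: Arc<AtomicUsize>,
}

impl AdmissionGate {
    pub fn new(capacity: usize) -> Self {
        Self {
            in_flight: Arc::new(AtomicUsize::new(0)),
            capacity,
        }
    }

    /// Non-consuming check (assumed use: deciding whether to answer /solve).
    pub fn has_capacity(&self) -> bool {
        self.in_flight.load(Ordering::Acquire) < self.capacity
    }

    /// Consuming check (assumed use: admitting a /settle request).
    /// Returns `None` when all slots are taken.
    pub fn try_admit(&self) -> Option<AdmissionPermit> {
        let capacity = self.capacity;
        self.in_flight
            .fetch_update(Ordering::AcqRel, Ordering::Acquire, |n| {
                // Only increment while we are below capacity.
                (n < capacity).then_some(n + 1)
            })
            .ok()
            .map(|_| AdmissionPermit {
                in_flight: Arc::clone(&self.in_flight),
            })
    }
}

impl Drop for AdmissionPermit {
    fn drop(&mut self) {
        // Release the slot when the settlement finishes or is cancelled.
        self.in_flight.fetch_sub(1, Ordering::AcqRel);
    }
}
```

Because the permit releases its slot on drop, a cancelled or panicked settle task cannot leak capacity as long as the permit is owned by that task.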

@fafk fafk closed this Apr 15, 2026
@github-actions github-actions bot locked and limited conversation to collaborators Apr 15, 2026
@fafk fafk reopened this Apr 15, 2026
@fafk fafk marked this pull request as ready for review April 16, 2026 05:45
@fafk fafk requested a review from a team as a code owner April 16, 2026 05:45
@cowprotocol cowprotocol unlocked this conversation Apr 16, 2026
Comment thread crates/driver/src/domain/competition/mod.rs Outdated
Comment thread crates/driver/src/domain/competition/mod.rs Outdated

let this = Arc::clone(self);
let tracing_span = tracing::Span::current();
let handle = tokio::spawn(

This needs a good comment explaining that we spawn the task so that we can still cancel the submission after the autopilot has already terminated the request (i.e. when the autopilot knows before we do that we didn't submit within the deadline).
Otherwise this code just looks super strange. 😅

self.settle_queue.try_send(request).map_err(|err| {
tracing::warn!(?err, "Failed to enqueue /settle request");
let admission_permit = self.submitter_pool.try_admit().ok_or_else(|| {
tracing::warn!("no idle submission slots; settle request rejected");

This permission is only for getting into the queue, not yet for actually starting the submission process.

Suggested change
- tracing::warn!("no idle submission slots; settle request rejected");
+ tracing::warn!("too many pending settlements; settle request rejected");

.await,
);

let solution_ids = join_all(vec![

Probably good to have a comment explaining that we can submit as many bids as we want as long as there is at least 1 settle queue spot available.

Comment on lines +374 to +382
let mut admitted = 0;
let mut rejected = 0;
for result in &results {
match result.error_kind().as_deref() {
None | Some("FailedToSubmit") => admitted += 1,
Some("TooManyPendingSettlements") => rejected += 1,
Some(other) => panic!("unexpected error kind: {other}"),
}
}

If we sleep briefly (10ms) between the /settle calls, does the outcome of the settle futures become deterministic? That way we could have a stricter test asserting the exact sequence of results:
None (success), "FailedToSubmit" (order already filled), "FailedToSubmit" (order already filled), "TooManyPendingSettlements", "TooManyPendingSettlements"
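If the ordering does turn out to be deterministic, the stricter check could look roughly like the sketch below. `SettleResult` is a hypothetical stand-in for whatever the test harness returns per /settle call, and `error_kind()` returning `Option<String>` is an assumption inferred from the `.as_deref()` call in the snippet above.

```rust
/// Hypothetical stand-in for the per-request result type used by the test.
pub struct SettleResult {
    error_kind: Option<String>,
}

impl SettleResult {
    pub fn new(error_kind: Option<&str>) -> Self {
        Self {
            error_kind: error_kind.map(str::to_owned),
        }
    }

    /// Assumed accessor: `None` means the settlement succeeded.
    pub fn error_kind(&self) -> Option<String> {
        self.error_kind.clone()
    }
}

/// The exact sequence proposed in the review comment.
pub const EXPECTED: [Option<&str>; 5] = [
    None,
    Some("FailedToSubmit"),
    Some("FailedToSubmit"),
    Some("TooManyPendingSettlements"),
    Some("TooManyPendingSettlements"),
];

/// Returns true only if the results match `EXPECTED` in order,
/// instead of merely counting admitted and rejected requests.
pub fn matches_expected_sequence(results: &[SettleResult]) -> bool {
    let kinds: Vec<Option<String>> = results.iter().map(SettleResult::error_kind).collect();
    let actual: Vec<Option<&str>> = kinds.iter().map(|k| k.as_deref()).collect();
    actual == EXPECTED
}
```

Compared to counting, this also catches a regression where the right number of requests is rejected but in the wrong order.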

/// requests to `pool_slots + settle_queue_size` (default 1 + 2 = 3).
#[tokio::test]
#[ignore]
async fn admission_capacity_is_respected() {

Am I wrong or is this test a better version of discards_excess_settle_and_solve_requests? Can we delete the other one?

Comment on lines +762 to 765
handle.await.map_err(|err| {
tracing::error!(?err, "settle task panicked");
Error::SubmissionError
})?

@MartinquaXD MartinquaXD Apr 16, 2026

You probably already worked on the new code before I edited one of my previous comments but I think getting rid of the channels comes with an edge case.
If the block stream of the driver is stuck (i.e. it doesn't see new incoming blocks) the driver will never stop submitting the current txs and the settle queue will fill up.
Since we can consider the autopilot the source of truth we should probably keep the logic that continues polling the settle future for a second. Either the block stream is healthy and the driver will quickly see that it should try to cancel the tx, or it's not healthy and it will still not cancel the submission but at least it will free up the submission slot again so that it can theoretically submit new solutions going forward. (see #3427)

However, we can probably still keep the "grace period" idea but a bit simpler than what we had with the oneshot channels. Something like this should do the trick:

struct SettleTaskHandle<T: Send + 'static>(tokio::task::JoinHandle<T>);

impl<T: Send + 'static> Drop for SettleTaskHandle<T> {
    fn drop(&mut self) {
        if self.0.is_finished() {
            return;
        }
        // continue polling the settle future for a short grace period
        // see <https://github.com/cowprotocol/services/pull/3427>
        let abort_handle = self.0.abort_handle();
        tokio::task::spawn(async move {
            tokio::time::sleep(std::time::Duration::from_secs(1)).await;
            abort_handle.abort();
        });
    }
}

2 participants