Conversation
```rust
let this = Arc::clone(self);
let tracing_span = tracing::Span::current();
let handle = tokio::spawn(
```
This needs a good comment explaining that we spawn the task so that we can still cancel the submission after the autopilot has already terminated the request (i.e. when the autopilot concluded we didn't submit within the deadline before we did).
Otherwise this code just looks super strange. 😅
```rust
self.settle_queue.try_send(request).map_err(|err| {
    tracing::warn!(?err, "Failed to enqueue /settle request");
let admission_permit = self.submitter_pool.try_admit().ok_or_else(|| {
    tracing::warn!("no idle submission slots; settle request rejected");
```
This permit only grants entry into the queue; it does not yet start the actual submission process.
Suggested change:
```diff
- tracing::warn!("no idle submission slots; settle request rejected");
+ tracing::warn!("too many pending settlements; settle request rejected");
```
```rust
    .await,
);

let solution_ids = join_all(vec![
```
Probably good to have a comment explaining that we can submit as many bids as we want as long as there is at least 1 settle queue spot available.
```rust
let mut admitted = 0;
let mut rejected = 0;
for result in &results {
    match result.error_kind().as_deref() {
        None | Some("FailedToSubmit") => admitted += 1,
        Some("TooManyPendingSettlements") => rejected += 1,
        Some(other) => panic!("unexpected error kind: {other}"),
    }
}
```
If we sleep briefly (10ms) between the `/settle` calls, does the outcome of the settle futures become deterministic? That way we could have a stricter test asserting the exact sequence of results:
`None` (success), `"FailedToSubmit"` (order already filled), `"FailedToSubmit"` (order already filled), `"TooManyPendingSettlements"`, `"TooManyPendingSettlements"`
```rust
/// requests to `pool_slots + settle_queue_size` (default 1 + 2 = 3).
#[tokio::test]
#[ignore]
async fn admission_capacity_is_respected() {
```
Am I wrong, or is this test a better version of `discards_excess_settle_and_solve_requests`? Can we delete the other one?
```rust
handle.await.map_err(|err| {
    tracing::error!(?err, "settle task panicked");
    Error::SubmissionError
})?
```
You probably already worked on the new code before I edited one of my previous comments, but I think getting rid of the channels comes with an edge case.
If the driver's block stream is stuck (i.e. it doesn't see new incoming blocks), the driver will never stop submitting the current txs and the settle queue will fill up.
Since we can consider the autopilot the source of truth, we should probably keep the logic that continues polling the settle future for a second. Either the block stream is healthy and the driver will quickly see that it should try to cancel the tx, or it's not healthy and the driver will still not cancel the submission, but at least it will free up the submission slot again so that it can theoretically submit new solutions going forward. (see #3427)
However, we can probably still keep the "grace period" idea but a bit simpler than what we had with the oneshot channels. Something like this should do the trick:
```rust
struct SettleTaskHandle<T: Send + 'static>(tokio::task::JoinHandle<T>);

impl<T: Send + 'static> Drop for SettleTaskHandle<T> {
    fn drop(&mut self) {
        if self.0.is_finished() {
            return;
        }
        // continue polling the settle future for a short grace period
        // see <https://github.com/cowprotocol/services/pull/3427>
        let abort_handle = self.0.abort_handle();
        tokio::task::spawn(async move {
            tokio::time::sleep(std::time::Duration::from_secs(1)).await;
            abort_handle.abort();
        });
    }
}
```
Description
The reference driver rejects new solutions when there is already a backlog of solutions that still need to be submitted because they will most likely not be mined in time. This is intended to protect very competitive solvers from penalties when they win too much but can't submit fast enough.
#4167 introduced a bug where the check for whether to reject a `/solve` request only looks at the available tx submission slots, but not at the settle queue. As a consequence, a solver with only a single submission EOA that won an auction will reject `/solve` requests until the previous solution has been submitted.
Changes
Add a semaphore with capacity equal to the queue size to mimic the missing queue behavior.