wo-o commented Nov 22, 2025

Problem

A race condition exists because the BestTransactions iterator creates an independent snapshot via BTreeMap::clone(). When the maintenance job removes transactions from the pool, existing iterators remain unaware, causing removed transactions to be included in blocks.
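The staleness follows directly from BTreeMap::clone() producing an owned copy. Below is a minimal sketch using std::collections::BTreeMap, with u64 ids and string payloads as hypothetical stand-ins for pool transactions (snapshot_then_remove is an illustrative name, not a reth function):

```rust
use std::collections::BTreeMap;

/// Hypothetical helper: takes a snapshot of `pool`, then removes `removed_id`
/// from the live pool, mimicking a maintenance job that runs after the
/// iterator was created. Returns (live pool, snapshot).
fn snapshot_then_remove(
    mut pool: BTreeMap<u64, &'static str>,
    removed_id: u64,
) -> (BTreeMap<u64, &'static str>, BTreeMap<u64, &'static str>) {
    // Like the iterator's snapshot, this is an owned, independent copy.
    let snapshot = pool.clone();
    // The removal only touches the live pool.
    pool.remove(&removed_id);
    (pool, snapshot)
}

fn main() {
    let pool = BTreeMap::from([(1, "tx-a"), (2, "tx-b")]);
    let (live, snapshot) = snapshot_then_remove(pool, 2);
    // The live pool lost tx 2, but the snapshot still contains it.
    println!("live: {:?}", live.keys().collect::<Vec<_>>());
    println!("snapshot: {:?}", snapshot.keys().collect::<Vec<_>>());
}
```

Running it prints live: [1] and snapshot: [1, 2]; the clone never learns about the removal unless something notifies it.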

Production Impact

During load testing with --txpool.lifetime 600, this caused a sequencer failure:

2025-11-15T13:30:01.790071Z WARN The database read transaction has been open for too long.
    open_duration=190.664088693s self.txn_id=7039115

2025-11-15T13:30:01.947723Z WARN Attempt to calculate state root for an old block
    might result in OOM target=2846869

The sequencer stopped producing blocks because:

  1. Maintenance removed expired transactions from pool
  2. Block builder's snapshot still contained the removed transactions
  3. Execution attempted to process non-existent transactions
  4. Database read stuck
  5. State root calculation failed

This is critical for transactions with time-based expiration.

Root Cause

crates/transaction-pool/src/pool/pending.rs:110 - The iterator snapshot is created via BTreeMap::clone(), which creates an independent copy. The pool has notification support for new transactions but lacks removal notifications.

Solution

Add removal notifications following the existing pattern for new transactions:

  1. PendingPool broadcasts transaction removals via a new removed_transaction_notifier channel
  2. BestTransactions subscribes to removal notifications on creation
  3. Iterator processes removals before yielding next transaction
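The three steps above can be sketched as a simplified model. This is not reth's implementation: std::sync::mpsc stands in for whatever channel the pool actually uses, ids are plain u64 values, the snapshot is drained in ascending id order rather than by priority, and Pool, SnapshotIter, best and remove are hypothetical names.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{channel, Receiver, Sender};

// Hypothetical stand-in for a transaction id.
type TxId = u64;

/// Simplified "best transactions" iterator over an owned snapshot of the pool.
struct SnapshotIter {
    snapshot: BTreeMap<TxId, String>,
    removed_rx: Receiver<TxId>,
}

impl SnapshotIter {
    /// Drains queued removal notifications and drops those ids from the snapshot.
    fn apply_removals(&mut self) {
        // try_recv never blocks; it returns Err once the queue is empty or disconnected.
        while let Ok(id) = self.removed_rx.try_recv() {
            self.snapshot.remove(&id);
        }
    }
}

impl Iterator for SnapshotIter {
    type Item = (TxId, String);

    fn next(&mut self) -> Option<Self::Item> {
        // Apply removals before yielding, so ids removed after creation are skipped.
        self.apply_removals();
        // Simplification: yields in ascending id order instead of by priority.
        self.snapshot.pop_first()
    }
}

/// Hypothetical pool that notifies every live iterator on removal.
struct Pool {
    txs: BTreeMap<TxId, String>,
    subscribers: Vec<Sender<TxId>>,
}

impl Pool {
    /// Creates an iterator: clone the map and subscribe to removals.
    fn best(&mut self) -> SnapshotIter {
        let (tx, rx) = channel();
        self.subscribers.push(tx);
        SnapshotIter { snapshot: self.txs.clone(), removed_rx: rx }
    }

    /// Removes a transaction and broadcasts its id to all subscribers.
    fn remove(&mut self, id: TxId) {
        if self.txs.remove(&id).is_some() {
            // send fails once an iterator is dropped; prune those subscribers.
            self.subscribers.retain(|s| s.send(id).is_ok());
        }
    }
}

fn main() {
    let mut pool = Pool { txs: BTreeMap::new(), subscribers: Vec::new() };
    for (id, name) in [(1, "a"), (2, "b"), (3, "c")] {
        pool.txs.insert(id, name.to_string());
    }
    let mut best = pool.best();
    let first = best.next().map(|(id, _)| id); // Some(1)
    pool.remove(2); // maintenance removes tx 2 mid-iteration
    let rest: Vec<TxId> = best.map(|(id, _)| id).collect(); // [3]
    println!("{:?} then {:?}", first, rest);
}
```

Because retain drops senders whose receiver is gone, iterators that have finished or been dropped are pruned on the next removal instead of accumulating in the subscriber list.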

Previously, there was a race condition where:
1. Block builder creates a BestTransactions iterator (snapshot via BTreeMap::clone())
2. Maintenance job removes a transaction from the pool
3. Block builder's snapshot was independent, still containing the removed transaction

This could cause expired transactions with time-based validity to be included in
blocks after removal.

This commit adds removal notifications:
- PendingPool broadcasts transaction removals via a new channel
- BestTransactions iterator subscribes to removal notifications
- Iterator removes transactions from its snapshot when notified
- Test verifies that removed transactions are not found in snapshots

Fixes race condition for transactions with time-based expiration.
wo-o force-pushed the fix/txpool-race-condition branch from 6574689 to 91709cf on November 22, 2025 04:51
Collaborator

mattsse commented Nov 23, 2025

could you elaborate on why "causing removed transactions to be included in blocks" would be problematic? atm I believe this would only happen if the pending pool is at capacity, but I assume you need this to satisfy some protocol-level rules?

Block builder's snapshot still contained the removed transactions
Execution attempted to process non-existent transactions
Database read stuck

not following the sequence of events here that's causing the DB/state root issues; maybe this has something to do with some protocol-specific rules?
if the block builder adds this tx to the block, how would this cause execution errors?

Collaborator

mattsse left a comment


lacking some context on why excluding removed txs is critical, but maybe this is similar to the OP interop inclusion rules:

impl<Cons, Pooled> MaybeInteropTransaction for OpPooledTransaction<Cons, Pooled> {
    fn set_interop_deadline(&self, deadline: u64) {
        self.interop.store(deadline, Ordering::Relaxed);
    }

    fn interop_deadline(&self) -> Option<u64> {
        let interop = self.interop.load(Ordering::Relaxed);
        if interop > NO_INTEROP_TX {
            return Some(interop)
        }
        None
    }
}

which is checked during block building for example

let interop = tx.interop_deadline();

// We skip invalid cross chain txs, they would be removed on the next block update in
// the maintenance job
if let Some(interop) = interop &&
    !is_valid_interop(interop, self.config.attributes.timestamp())
{
    best_txs.mark_invalid(tx.signer(), tx.nonce());
    continue
}
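For comparison, the skip-during-building approach quoted above can be modeled in isolation. This is a hypothetical sketch: Tx, is_valid_deadline and select are illustrative names, the rule "valid while the block timestamp has not passed the deadline" is an assumption (the real is_valid_interop may differ), and skipping a sender's later transactions is only a simplified analogue of best_txs.mark_invalid.

```rust
use std::collections::HashSet;

/// Hypothetical pool entry: sender, nonce and an optional validity deadline.
struct Tx {
    sender: &'static str,
    nonce: u64,
    deadline: Option<u64>,
}

/// Assumed rule: a deadline-bound tx is valid while the block timestamp
/// has not passed its deadline. The real is_valid_interop may differ.
fn is_valid_deadline(deadline: u64, block_timestamp: u64) -> bool {
    block_timestamp <= deadline
}

/// Picks txs for a block from an already best-ordered sequence,
/// skipping expired txs and every later tx from the same sender.
fn select(best: Vec<Tx>, block_timestamp: u64) -> Vec<(&'static str, u64)> {
    let mut invalid_senders: HashSet<&'static str> = HashSet::new();
    let mut selected = Vec::new();
    for tx in best {
        // Simplified analogue of mark_invalid: descendants of an invalid tx are skipped.
        if invalid_senders.contains(&tx.sender) {
            continue;
        }
        if let Some(deadline) = tx.deadline {
            if !is_valid_deadline(deadline, block_timestamp) {
                invalid_senders.insert(tx.sender);
                continue;
            }
        }
        selected.push((tx.sender, tx.nonce));
    }
    selected
}

fn main() {
    let best = vec![
        Tx { sender: "alice", nonce: 0, deadline: Some(90) }, // expired at t = 100
        Tx { sender: "alice", nonce: 1, deadline: None },     // skipped: same sender
        Tx { sender: "bob", nonce: 0, deadline: Some(200) },  // still valid
    ];
    println!("{:?}", select(best, 100)); // [("bob", 0)]
}
```

Under this model the builder never relies on the pool having already evicted an expired transaction; it re-checks validity against the block timestamp as it iterates.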

Comment on lines 101 to 106
pub(crate) new_transaction_receiver: Option<Receiver<PendingTransaction<T>>>,
/// Used to receive transaction removals from the pool after this iterator was created.
///
/// Removed transactions are deleted from this iterator's snapshot before yielding the next
/// value, preventing inclusion of transactions that were removed by maintenance jobs.
pub(crate) removed_transaction_receiver: Option<Receiver<TransactionId>>,
Collaborator


not opposed to this, but since this is an internal channel we can unify this by introducing

enum PendingPoolEvent<T> { Added(PendingTransaction<T>), Removed(TransactionId) }
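A sketch of the suggested unification, assuming a std::sync::mpsc channel and u64 ids (both hypothetical simplifications; Snapshot and sync are illustrative names, not reth types). One practical benefit of a single ordered channel is that an add followed by a remove of the same id is applied in that order, which two independent channels cannot guarantee.

```rust
use std::collections::BTreeMap;
use std::sync::mpsc::{channel, Receiver};

// Hypothetical stand-in for a transaction id.
type TxId = u64;

/// One event type carried over a single channel, instead of two separate channels.
#[derive(Debug, PartialEq)]
enum PendingPoolEvent<T> {
    Added(TxId, T),
    Removed(TxId),
}

/// Hypothetical iterator-side snapshot that consumes pool events.
struct Snapshot<T> {
    txs: BTreeMap<TxId, T>,
    events: Receiver<PendingPoolEvent<T>>,
}

impl<T> Snapshot<T> {
    /// Applies every queued event in the order the pool sent it.
    fn sync(&mut self) {
        while let Ok(event) = self.events.try_recv() {
            match event {
                PendingPoolEvent::Added(id, tx) => {
                    self.txs.insert(id, tx);
                }
                PendingPoolEvent::Removed(id) => {
                    self.txs.remove(&id);
                }
            }
        }
    }
}

fn main() {
    let (sender, receiver) = channel();
    let mut snap = Snapshot { txs: BTreeMap::from([(1, "a")]), events: receiver };
    // Tx 5 arrives and is removed again before the iterator advances.
    sender.send(PendingPoolEvent::Added(5, "e")).unwrap();
    sender.send(PendingPoolEvent::Removed(5)).unwrap();
    sender.send(PendingPoolEvent::Removed(1)).unwrap();
    snap.sync();
    println!("{:?}", snap.txs); // {}
}
```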

Author

wo-o, Nov 25, 2025


@mattsse Thanks for the feedback. I think my original description was misleading.

The issue is not that removed transactions get included in blocks. The actual problem is that when a transaction is removed from the pool (e.g., due to --txpool.lifetime expiration), the BestTransactions iterator still holds references to it in its snapshot. During block building, this mismatch causes the iterator to get stuck trying to access transactions that no longer exist in the pool's state, leading to long-running DB read transactions and eventually blocking block production.

I've also unified the two notification channels into a single PendingPoolEvent enum as you suggested:
0f2d0af

Collaborator


leading to long-running DB read transactions and eventually blocking block production.

can you expand on this a bit more? I don't quite understand how this would cause issues during building. Is there something that just endlessly loops then, and if so, how exactly?

github-project-automation bot moved this from Backlog to In Progress in Reth Tracker Nov 23, 2025
Merge new_transaction_notifier and removed_transaction_notifier
into a single event_notifier using PendingPoolEvent enum.
wo-o requested a review from mattsse on November 28, 2025 01:31
Collaborator

mattsse commented Nov 28, 2025

@wo-o we had another recent fix for updates in this one (c7b6890); unsure if related, but we now have a smol conflict, sorry about this.

I have one more question, because it's unclear how stale behaviour would stall block production when this iter is a full snapshot, basically independent of the pool.


Projects

Status: In Progress
