
Let BackgroundProcessor drive HTLC forwarding #3891


Open
tnull wants to merge 24 commits into main from 2025-06-batch-forwarding-delays

Conversation

tnull
Contributor

@tnull tnull commented Jun 25, 2025

Closes #3768.

Previously, we'd require the user to manually call process_pending_htlc_forwards as part of PendingHTLCsForwardable event handling. Here, we instead move this responsibility to BackgroundProcessor, which simplifies the flow and allows us to implement reasonable forwarding delays on our side rather than delegating to users' implementations.

Note this also introduces batching rounds rather than calling process_pending_htlc_forwards individually for each PendingHTLCsForwardable event. The per-event approach had been unintuitive anyway: subsequent PendingHTLCsForwardable events could lead to overlapping batch intervals, with the shortest timespan 'winning' every time, since process_pending_htlc_forwards would of course handle all pending HTLCs at once.

To this end, we implement random sampling of batch delays from a log-normal distribution with a mean of 50ms and drop the PendingHTLCsForwardable event.
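
For intuition, here is a minimal sketch of such a batching loop, with hypothetical names (`sample_batch_delay`, `run_forwarding_loop`); the closure stands in for `ChannelManager::process_pending_htlc_forwards`, and the real `BackgroundProcessor` multiplexes this with its other timers:

```rust
use std::time::{Duration, Instant};

// Stand-in for the sampler described above: draws the next batch interval
// from a log-normal distribution with a mean of roughly 50ms. Returning the
// mean keeps this sketch deterministic; the real code samples randomly.
fn sample_batch_delay() -> Duration {
	Duration::from_millis(50)
}

// Sketch of the batching loop: rather than reacting to individual
// `PendingHTLCsForwardable` events, the processor wakes on a randomized
// schedule and forwards all pending HTLCs as one batch.
fn run_forwarding_loop(process_pending_htlc_forwards: impl Fn(), should_stop: impl Fn() -> bool) {
	let mut cur_batch_delay = sample_batch_delay();
	let mut last_call = Instant::now();
	while !should_stop() {
		if last_call.elapsed() >= cur_batch_delay {
			process_pending_htlc_forwards();
			last_call = Instant::now();
			// Re-sample so observers can't predict the next batch boundary.
			cur_batch_delay = sample_batch_delay();
		}
		// Sleep briefly between checks; the real loop multiplexes this with
		// the other background-processor timers.
		std::thread::sleep(Duration::from_millis(10));
	}
}
```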

Draft for now as I'm still cleaning up the code base as part of the final commit dropping PendingHTLCsForwardable.

@ldk-reviews-bot

ldk-reviews-bot commented Jun 25, 2025

👋 Thanks for assigning @TheBlueMatt as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

@tnull tnull marked this pull request as draft June 25, 2025 15:12
@joostjager
Contributor

joostjager commented Jun 25, 2025

Does this in any way prevent users from having no delays or no batching, assuming that's what they want?

@tnull
Contributor Author

tnull commented Jun 25, 2025

> Does this in any way prevent users from having no delays or no batching, assuming that's what they want?

On the contrary, actually: it effectively reduces the (mean and min forwarding) delay quite a bit, which we can allow as we're going to add larger receiver-side delays in the next step. And, while it gets rid of the event, users are still free to call process_pending_htlc_forwards on a faster schedule if they really want to. IMO, this should result in a win-win situation: substantially reduced forwarding delays on average and by default, while still considerably improving receiver anonymity.
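
As a sketch of what such a faster schedule could look like on the user side (illustrative only; the closure again stands in for `ChannelManager::process_pending_htlc_forwards`):

```rust
use std::time::Duration;

// Illustrative tight polling loop for users who want lower forwarding
// latency than the default batching. Processing is cheap when nothing is
// pending, so the main cost is the extra lock traffic discussed below.
fn run_fast_forwarding_schedule(process_pending_htlc_forwards: impl Fn(), should_stop: impl Fn() -> bool) {
	while !should_stop() {
		process_pending_htlc_forwards();
		std::thread::sleep(Duration::from_millis(5));
	}
}
```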

@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from ceb3335 to 9ba691c Compare June 26, 2025 08:13
@joostjager
Contributor

Isn't it the case that without the event, as a user you are forced to "poll" for forwards, making extra delays unavoidable?

@tnull
Contributor Author

tnull commented Jun 26, 2025

> Isn't it the case that without the event, as a user you are forced to "poll" for forwards, making extra delays unavoidable?

LDK always processes HTLCs in batches (note that process_pending_htlc_forwards never allowed forwarding just a single HTLC, for good reason). Having some batching delay makes a lot of sense in any scenario. And given that 'polling' is really cheap, users could consider doing that frequently. But they really shouldn't try to skip the batching entirely, as IO overhead/delay would come to bite them (especially on busier forwarding nodes), and of course they should be 'good citizens' providing some privacy by default for the network.

@joostjager
Contributor

Polling may be cheap, but is forcing users to poll really the right choice when there is an event mechanism available? Perhaps the event is beneficial for testing, debugging, and monitoring too?

@tnull
Contributor Author

tnull commented Jun 26, 2025

> Polling may be cheap, but is forcing users to poll really the right choice when there is an event mechanism available? Perhaps the event is beneficial for testing, debugging, and monitoring too?

The event never carried any information, so it is not helpful for debugging or 'informational' purposes. Plus, it means at least 1-2 more rounds of ChannelManager persistence, just to queue and remove the event. So since we don't need it anymore, we should definitely drop it in production. As you know, I was on the fence about whether to drop it for testing, but have now gone this way, especially given that nobody indicated a strong opinion either way. If we indeed want to introspect the holding cell during testing (or, e.g., in fuzzing), we should add another approach to do it, but that's up for discussion.

@joostjager
Contributor

joostjager commented Jun 26, 2025

But at least the event could wake up the background processor, whereas now nothing is waking it up for forwards and the user is forced to call into the channel manager at a high frequency? Not sure if there is a lighter way to wake up the BP without persistence involved.

Also, if you have to call into the channel manager always anyway, aren't there more events/notifiers that can be dropped?

> As you know, I was on the fence about whether to drop it for testing, but have now gone this way, especially given that nobody indicated a strong opinion either way.

I may have missed this deciding moment.

If the assertions were useless to begin with, no problem dropping them, of course. I can imagine, though, that at some points a peek into the pending HTLC state is still required to not reduce the coverage of the tests?

@tnull
Contributor Author

tnull commented Jun 26, 2025

> But at least the event could wake up the background processor, whereas now nothing is waking it up for forwards and the user is forced to call into the channel manager at a high frequency? Not sure if there is a lighter way to wake up the BP without persistence involved.
>
> Also, if you have to call into the channel manager always anyway, aren't there more events/notifiers that can be dropped?
>
> > As you know, I was on the fence about whether to drop it for testing, but have now gone this way, especially given that nobody indicated a strong opinion either way.
>
> I may have missed this deciding moment.

Again, the default behavior we had intended to switch to for quite some time is to introduce batching intervals (especially given that the current event-based approach was essentially broken/racy). This is what is implemented here. If users want to bend the recommended/default approach, they are free to do so, but I don't think it makes sense to keep all the legacy codepaths, including persistence overhead, around if they're not used anymore.

> If the assertions were useless to begin with, no problem dropping them, of course. I can imagine, though, that at some points a peek into the pending HTLC state is still required to not reduce the coverage of the tests?

I don't think this is generally the case, no. The 'assertion' that is mainly dropped is 'we generated an event'; everything else remains the same.

@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from 9ba691c to b38c19e Compare June 26, 2025 09:49
@joostjager
Contributor

> Again, the default behavior we had intended to switch to for quite some time is to introduce batching intervals (especially given that the current event-based approach was essentially broken/racy). This is what is implemented here. If users want to bend the recommended/default approach, they are free to do so, but I don't think it makes sense to keep all the legacy codepaths, including persistence overhead, around if they're not used anymore.

This doesn't rule out a notification when there's something to forward, to at least not keep spinning when there's nothing to do?
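
For illustration, a rough sketch of such a hybrid, using `tokio::sync::Notify` as a generic stand-in rather than an existing LDK interface: the loop sleeps until something is queued, then still applies the batching delay before processing:

```rust
use std::time::Duration;
use tokio::sync::Notify;

// Hypothetical wakeup-driven variant: `forwards_pending` would be signaled
// whenever a new HTLC is queued, so the loop does no work (and no polling)
// while the forwarding queue is empty.
async fn forwarding_loop(forwards_pending: &Notify, process_pending_htlc_forwards: impl Fn()) {
	loop {
		forwards_pending.notified().await;
		// Still apply a randomized batching delay so HTLCs arriving shortly
		// after the first one land in the same batch.
		tokio::time::sleep(Duration::from_millis(50)).await;
		process_pending_htlc_forwards();
	}
}
```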

@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from c1a0b35 to d35c944 Compare June 26, 2025 13:17
@tnull tnull self-assigned this Jun 26, 2025
@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from d35c944 to c21aeab Compare June 27, 2025 09:29
@tnull tnull requested a review from TheBlueMatt June 27, 2025 09:29
@tnull tnull marked this pull request as ready for review June 27, 2025 09:29
@tnull
Contributor Author

tnull commented Jun 27, 2025

Finished for now with the test refactoring after dropping the PendingHTLCsForwardable event. This should be good for a first round of (concept) review. Whether or not we should add a notifier on top is up for debate.

@tnull tnull removed the request for review from TheBlueMatt June 27, 2025 09:36
@tnull tnull moved this to Goal: Merge in Weekly Goals Jun 27, 2025
@ldk-reviews-bot

✅ Added second reviewer: @valentinewallace

@tnull tnull requested review from TheBlueMatt and removed request for TheBlueMatt June 27, 2025 09:51
```diff
@@ -360,12 +376,24 @@ macro_rules! define_run_body {
 				break;
 			}
 
+			if $timer_elapsed(&mut last_forwards_processing_call, cur_batch_delay) {
+				$channel_manager.get_cm().process_pending_htlc_forwards();
```
Contributor

Looked a bit closer at this function. There is a lot of logic in there, and various locks are obtained.


@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from 1ad6ce7 to e088025 Compare July 3, 2025 07:25

codecov bot commented Jul 3, 2025

Codecov Report

Attention: Patch coverage is 97.72727% with 8 lines in your changes missing coverage. Please review.

Project coverage is 88.78%. Comparing base (257ebad) to head (e088025).

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| lightning-background-processor/src/lib.rs | 76.66% | 7 Missing ⚠️ |
| lightning/src/ln/outbound_payment.rs | 75.00% | 1 Missing ⚠️ |
Additional details and impacted files
```diff
@@            Coverage Diff             @@
##             main    #3891      +/-   ##
==========================================
- Coverage   88.82%   88.78%   -0.05%
==========================================
  Files         165      166       +1
  Lines      119075   119576     +501
  Branches   119075   119576     +501
==========================================
+ Hits       105769   106165     +396
- Misses      10986    11099     +113
+ Partials     2320     2312       -8
```


@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch 2 times, most recently from 7920f35 to 8a67f2a Compare July 3, 2025 11:52
```diff
@@ -6337,6 +6337,14 @@ where
 	///
 	/// Will regularly be called by the background processor.
 	pub fn process_pending_htlc_forwards(&self) {
+		static REENTRANCY_GUARD: AtomicBool = AtomicBool::new(false);
```
Contributor

Did this happen, another round of processing still underway? Also wondering if processing can be skipped accidentally.

Contributor Author

> Did this happen, another round of processing still underway?

Yes, for example if users were to manually call process_pending_htlc_forwards in addition to the background processor.

> Also wondering if processing can be skipped accidentally.

No? How would this happen?

Contributor

> No? How would this happen?

process_pending_htlc_forwards is executing, and while that's happening, new forwards arrive. Then, concurrently, another call to process_pending_htlc_forwards is initiated, which becomes a silent no-op. At that point there are forwards that haven't been processed and must wait until the next round, if it comes, depending on the implementation.

Also there can be a race condition between the persistence guard and the atomic bool I think?

Contributor Author

> process_pending_htlc_forwards is executing, and while that's happening, new forwards arrive. Then, concurrently, another call to process_pending_htlc_forwards is initiated, which becomes a silent no-op. At that point there are forwards that haven't been processed and must wait until the next round, if it comes, depending on the implementation.

That's not an accidental skip though, they will be processed as part of the next batch.

> Also there can be a race condition between the persistence guard and the atomic bool I think?

What do you mean by that?

Contributor

> That's not an accidental skip though, they will be processed as part of the next batch.

Not everyone may use our background processor. I think the expectation is that process_pending_htlc_forwards does what it says? Or otherwise perhaps returns an error.

> What do you mean by that?

The persistence lock is released and then the atomic bool is reset. In between, another forward may come in that is then not processed, because the atomic bool is still set. The same argument basically: it isn't fully safe unless you keep retrying.

Contributor Author

> The persistence lock is released and then the atomic bool is reset.

No, the persistence guard will be dropped and trigger persistence at the end of the scope; i.e., first the atomic bool is set to false, then we'll trigger persistence.

Contributor

Ah, I see indeed. But is it really necessary to have the reentrancy guard? If users call this a bit more often, the only thing that would happen is that they may have to wait for the lock. And waiting for the lock may happen much more often anyway, because we are now polling.

Contributor Author

> Ah, I see indeed. But is it really necessary to have the reentrancy guard? If users call this a bit more often, the only thing that would happen is that they may have to wait for the lock. And waiting for the lock may happen much more often anyway, because we are now polling.

I think it's safer to have it, as otherwise individual calls might get stacked up, all waiting on the locks, which might get more and more congested if callers end up calling in faster than we can process.
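
To make the pattern under discussion concrete, here is a sketch of such a reentrancy guard; the wrapper name and closure are illustrative, not the PR's exact code:

```rust
use std::sync::atomic::{AtomicBool, Ordering};

static REENTRANCY_GUARD: AtomicBool = AtomicBool::new(false);

// Illustrative wrapper: only one caller runs batch processing at a time;
// concurrent callers become silent no-ops instead of piling up on the locks.
fn process_pending_htlc_forwards_guarded(do_processing: impl FnOnce()) {
	// Atomically flip false -> true; failure means a round is still underway.
	if REENTRANCY_GUARD
		.compare_exchange(false, true, Ordering::AcqRel, Ordering::Acquire)
		.is_err()
	{
		return;
	}
	do_processing();
	// Clear the flag before persistence would be triggered, so forwards that
	// arrived during processing are picked up by the next call (cf. the
	// ordering discussion above).
	REENTRANCY_GUARD.store(false, Ordering::Release);
}
```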

```diff
@@ -360,12 +376,24 @@ macro_rules! define_run_body {
 				break;
 			}
 
+			if $timer_elapsed(&mut last_forwards_processing_call, cur_batch_delay) {
+				$channel_manager.get_cm().process_pending_htlc_forwards();
```
Contributor

@joostjager joostjager Jul 3, 2025


I was also wondering whether this function needs to be called twice per HTLC (add and fail/settle), and whether it hits the delay twice? And whether it is also required to be called at the sender and at the receiver node? (I know that in lnd the abstraction was such that, for example, on the final hop it would 'forward' to the invoice.)


```rust
use core::time::Duration;

pub(crate) struct BatchDelay {
```
Contributor

Can this and the below be pub(super)?

Contributor Author

Given they are at the first hierarchy level, it's the same thing?

```rust
// log_normal_data <- round(rlnorm(n, meanlog = meanlog, sdlog = sdlog))
// cat(log_normal_data, file = "log_normal_data.txt", sep = ", ")
// ```
const FWD_DELAYS_MILLIS: [u16; 10000] = [
```
Contributor

Hmm, I'm still not convinced this achieves much for the AS-level attacker. The Revelio paper states "...the adversary can perfectly group payments (i.e., maintaining the success rate of 100%) with a per-channel transaction rate of up to 0.33 tx/s. ... While this transaction rate may look small at first, it is in fact, 4-orders of magnitude larger than the estimated average transaction rate in current LN (i.e., 0.000019 tx/s per channel)." It also doesn't mention forwarding delays as a potential mitigation in the "Countermeasures" section, though I can see why that's somewhat intuitive.

I guess for me, for the AS threat model it all seems a bit security theater until we have constant bandwidth, basically? If this will be reused for receiver-side delays, it probably isn't worth holding up the PR over it though.
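
As an aside on the mechanics: for a log-normal distribution the mean is exp(meanlog + sdlog²/2), so meanlog would be chosen as ln(50) - sdlog²/2 to hit the 50ms target. Consuming a pregenerated table like this then only needs a uniform random index at runtime; a sketch with a tiny stand-in array (values made up):

```rust
// Tiny stand-in for the real 10000-entry `FWD_DELAYS_MILLIS` table of
// pregenerated log-normal samples (these particular values are made up).
const FWD_DELAYS_MILLIS: [u16; 8] = [23, 31, 38, 45, 52, 61, 74, 96];

// Drawing a batch delay needs only a uniform random index; no floating-point
// or transcendental math is required at runtime.
fn next_batch_delay_millis(uniform_random: u32) -> u16 {
	let idx = (uniform_random as usize) % FWD_DELAYS_MILLIS.len();
	FWD_DELAYS_MILLIS[idx]
}
```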

tnull added 21 commits July 4, 2025 10:34
Previously, all `TIMER` constants were `u64`s implicitly assumed to
represent seconds. Here, we switch them over to be `Duration`s, which
allows for the introduction of sub-second timers. Moreover, it avoids
any future confusion due to the implicitly assumed units.
Previously, we'd require the user to manually call
`process_pending_htlc_forwards` as part of `PendingHTLCsForwardable`
event handling. Here, we rather move this responsibility to
`BackgroundProcessor`, which simplifies the flow and allows us to
implement reasonable forwarding delays on our side rather than
delegating to users' implementations.

Note this also introduces batching rounds rather than calling
`process_pending_htlc_forwards` individually for each
`PendingHTLCsForwardable` event, which had been unintuitive anyway, as
subsequent `PendingHTLCsForwardable` events could lead to overlapping batch
intervals, resulting in the shortest timespan 'winning' every time, as
`process_pending_htlc_forwards` would of course handle all pending HTLCs
at once.
Now that we have `BackgroundProcessor` drive the batch forwarding of
HTLCs, we implement random sampling of batch delays from a log-normal
distribution with a mean of 50ms.
We move the code into the `optionally_notify` closure, but maintain the
behavior for now. In the next step, we'll use this to make sure we only
repersist when necessary.

Best reviewed via `git diff --ignore-all-space`
We skip repersisting `ChannelManager` when nothing is actually
processed.
We add a reentrancy guard to disallow entering
`process_pending_htlc_forwards` multiple times. This makes sure that
we'd skip any additional processing calls if a prior round/batch of
processing is still underway.
@tnull tnull force-pushed the 2025-06-batch-forwarding-delays branch from a8c1d66 to eb83451 Compare July 4, 2025 08:39
Successfully merging this pull request may close these issues.

Revisit PendingHTLCsForwardable delay duration