Cancel back outgoing dust htlcs before commitment is confirmed. #9068

Open
wants to merge 8 commits into
base: master

Conversation

ziggie1984
Collaborator

@ziggie1984 ziggie1984 commented Sep 5, 2024

Fixes #7969

Contributor

coderabbitai bot commented Sep 5, 2024

Important

Review skipped

Auto reviews are limited to specific labels.

Labels to auto review (1)
  • llm-review

Walkthrough

The changes enhance the handling of canceled Hash Time-Locked Contracts (HTLCs) within the contract arbitration process. New methods for inserting and fetching canceled HTLCs are added to the ArbitratorLog interface and its implementation. Additionally, logic for managing dust HTLCs is improved, ensuring they are canceled promptly to prevent issues during channel closures. The updates also include tests to validate these functionalities and address a bug related to dust HTLCs not being handled correctly during channel closures.

Changes

| Files | Change Summary |
|---|---|
| contractcourt/briefcase.go | Added methods for inserting and fetching canceled HTLCs in the ArbitratorLog interface and its implementation. |
| contractcourt/briefcase_test.go | Introduced a test function to validate the functionality of storing and retrieving canceled HTLCs. |
| contractcourt/channel_arbitrator.go | Implemented logic to handle dust HTLCs, including immediate cancellation and improved resolution processes. |
| contractcourt/channel_arbitrator_test.go | Updated mock arbitrator log to track canceled HTLCs and modified tests for dust HTLC resolution. |
| contractcourt/contract_resolver.go | Added functions in ResolverConfig for managing canceled HTLCs. |
| contractcourt/htlc_success_resolver_test.go | Introduced placeholder functions for managing canceled HTLCs in tests. |
| contractcourt/htlc_timeout_resolver.go | Enhanced logic to prevent duplicate resolution attempts for canceled HTLCs during the resolution process. |
| contractcourt/htlc_timeout_resolver_test.go | Added placeholder functions for managing canceled HTLCs in tests. |
| docs/release-notes/release-notes-0.19.0.md | Documented a bug fix related to dust HTLC handling. |
| itest/lnd_multi-hop_test.go | Modified logic for dust HTLC handling during multi-hop timeout processes for clarity. |

Assessment against linked issues

Objective Addressed Explanation
Fail dust HTLCs upstream before downstream channel closure (7969)
Consider dust HTLCs in deadline computation for close transaction (7969)

Possibly related PRs

  • bumpforceclosefee rpc #8843: Enhancements to HTLC handling, specifically regarding the ability to bump close fees when no HTLCs are present.

Suggested labels

rpc, channel closing, force closes

Poem

🐇 In the land of contracts, where the rabbits play,
Canceled HTLCs now find their way.
Dust no longer lingers, it hops away fast,
With new methods in place, our troubles are past.
So let’s celebrate this code, oh so bright,
For a smoother transaction, all day and night! 🌟


@ziggie1984 ziggie1984 added the P0 very high priority issue/PR, blocker on all others label Sep 5, 2024
@ziggie1984 ziggie1984 added this to the v0.19.0 milestone Sep 5, 2024
@ziggie1984 ziggie1984 changed the title from "Cancel back dust outgoing dust htlcs before commitment is confirmed." to "Cancel back outgoing dust htlcs before commitment is confirmed." Sep 5, 2024
@ziggie1984 ziggie1984 force-pushed the cancel-back-dust-htlc branch 2 times, most recently from 923c6b0 to fa7a925 Compare September 9, 2024 12:56
@ziggie1984 ziggie1984 marked this pull request as ready for review September 9, 2024 12:56
@ziggie1984
Collaborator Author

ziggie1984 commented Sep 10, 2024

Hi reviewers, I'm not happy with the following part of this PR; maybe someone has a good idea for making it cleaner:

Right now, when we force close the channel locally, we fail the dust HTLCs twice, and the second attempt logs an error saying the circuit to close is already gone. The second attempt is currently needed because we have to cancel dust HTLCs even when the force close is not initiated by us, or is initiated by us but not through LND (the force close is broadcast by some other means). The result is an annoying log entry similar to:

Example:

[ERR] HSWC: Unable to forward resolution msg: unable to find target channel for HTLC fail: channel ID = 443:2:0, HTLC ID = 0

We should probably do the extra work and remove the outgoing HTLC from the commitSet as soon as we cancel the incoming one back. Will investigate.

@ziggie1984
Collaborator Author

We could query the circuitMap and skip canceling the incoming HTLC, but maybe just failing it and hitting the error is equally efficient?

Collaborator

@yyforyongyu yyforyongyu left a comment

Nice refactors! My main comment is we could cancel the dust even earlier, once we've decided there are chain actions to be taken here,

if len(chainActions) == 0 && trigger == chainTrigger {
	log.Debugf("ChannelArbitrator(%v): no actions for "+
		"chain trigger, terminating", c.cfg.ChanPoint)
	return StateDefault, closeTx, nil
}

The current design may end up calling the canceling logic twice, as indicated by this state transition diagram,

// StateDefault
// |
// |-> StateDefault: no actions and chain trigger
// |
// |-> StateBroadcastCommit: chain/user trigger
// | |
// | |-> StateCommitmentBroadcasted: chain/user trigger
// | | |
// | | |-> StateCommitmentBroadcasted: chain/user trigger
// | | |
// | | |-> StateContractClosed: local/remote/breach close trigger
// | | | |
// | | | |-> StateWaitingFullResolution: contract resolutions not empty
// | | | | |
// | | | | |-> StateWaitingFullResolution: contract resolutions not empty
// | | | | |
// | | | | |-> StateFullyResolved: contract resolutions empty
// | | | |
// | | | |-> StateFullyResolved: contract resolutions empty
// | | |
// | | |-> StateFullyResolved: coop/breach(legacy) close trigger
// | |
// | |-> StateContractClosed: local/remote/breach close trigger
// | | |
// | | |-> StateWaitingFullResolution: contract resolutions not empty
// | | | |
// | | | |-> StateWaitingFullResolution: contract resolutions not empty
// | | | |
// | | | |-> StateFullyResolved: contract resolutions empty
// | | |
// | | |-> StateFullyResolved: contract resolutions empty
// | |
// | |-> StateFullyResolved: coop/breach(legacy) close trigger
// |
// |-> StateContractClosed: local/remote/breach close trigger
// | |
// | |-> StateWaitingFullResolution: contract resolutions not empty
// | | |
// | | |-> StateWaitingFullResolution: contract resolutions not empty
// | | |
// | | |-> StateFullyResolved: contract resolutions empty
// | |
// | |-> StateFullyResolved: contract resolutions empty
// |
// |-> StateFullyResolved: coop/breach(legacy) close trigger.

contractcourt/channel_arbitrator.go Show resolved Hide resolved
contractcourt/channel_arbitrator.go Show resolved Hide resolved
contractcourt/channel_arbitrator.go Outdated Show resolved Hide resolved
contractcourt/channel_arbitrator.go Outdated Show resolved Hide resolved
contractcourt/channel_arbitrator.go Outdated Show resolved Hide resolved
contractcourt/channel_arbitrator.go Outdated Show resolved Hide resolved
// confirms or we broadcast the local commitment by some other
// means we need to fail them back here as well to be sure all
// outgoing dust htlcs resolve their incoming htlc.
case HtlcFailDanglingAction, HtlcFailDustAction:
Collaborator

looks like we may cancel the same dust HTLCs twice.

Collaborator

Maybe we can remove this logic from prepContractResolutions - it's not related to contract resolutions anyway. Then we can call the new cancelDustPreMaturely wherever needed.

Collaborator Author

That's a very good point.

Collaborator Author

I'm persisting the canceled HTLCs in the ArbLog for now, because I did not see another way to make sure we do not query the CircuitMap more than once and hence cancel back an HTLC twice, which leads to log errors.

contractcourt/channel_arbitrator.go Show resolved Hide resolved
@ziggie1984 ziggie1984 force-pushed the cancel-back-dust-htlc branch 7 times, most recently from 12fb4ee to 67cd269 Compare September 19, 2024 10:34
@ziggie1984
Collaborator Author

ziggie1984 commented Sep 19, 2024

Updated the commit-structure and also the code design quite a bit. Needed to persist the canceled htlcs in the arbLog to make sure we do not fail htlcs twice.

This PR is also created in a way so we can now add the canceling of non-dust htlcs as well in a followup PR.

I think some unit tests might still be missing. I need to dig into all the tests in the contractcourt package, but let's first see whether this new approach is the way we want to go before adding more tests.

@ziggie1984
Collaborator Author

@coderabbitai review

Contributor

coderabbitai bot commented Sep 19, 2024

Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@ziggie1984 ziggie1984 added size/kilo medium, proper context needed, less than 1000 lines HTLC force closes labels Sep 19, 2024
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 1

Outside diff range and nitpick comments (6)
contractcourt/htlc_success_resolver_test.go (2)

120-122: Please provide more context on the purpose and usage of InsertCanceledHTLCs.

  • What is the expected behavior of this function?
  • Under what conditions should it return an error?
  • How and where is this function intended to be called?

Consider adding a comment to document the function's purpose and contract.


123-125: Clarify the scope and persistence of canceled HTLCs data.

  • Is the set of canceled HTLCs meant to be persisted across resolver restarts?
  • If yes, how and where is this data persisted?
  • If no, consider renaming the function to clarify that it only returns in-memory data (e.g. FetchCanceledHTLCsFromMemory).

Adding a comment to explain the function's behavior would be helpful.

contractcourt/htlc_timeout_resolver_test.go (2)

333-335: Implement the function or clarify it's intentionally left empty.

The InsertCanceledHTLCs function has an empty body. If this is intentional to provide a no-op stub for testing, please add a comment to clarify that. Otherwise, consider implementing the actual functionality to insert canceled HTLCs as the function name suggests.


336-338: Implement the function or rename it to clarify it returns an empty result.

The FetchCanceledHTLCs function always returns an empty set and a nil error without doing any actual fetching. If this is meant to be a stub for testing, please consider renaming the function to make it clear it's a stub that returns an empty result (e.g. FetchCanceledHTLCsStub). Otherwise, consider implementing the actual functionality to fetch canceled HTLCs as the function name suggests.

contractcourt/htlc_timeout_resolver.go (1)

Line range hint 1-1: New consumeSpendEvents function looks good with some suggestions

The new consumeSpendEvents function provides a robust way to consume spend events from both block and mempool subscriptions. The checkpointing logic to handle mempool spends is important.

Suggestions:

  1. Consider adding more comments explaining the reorg and preimage spend handling logic for better code readability.
  2. The TODO comment indicates some planned enhancements. Suggest creating a dedicated GitHub issue to track those enhancements.
contractcourt/channel_arbitrator_test.go (1)

947-963: Verify the resolution message for the outgoing dust HTLC.

The test checks for an immediate resolution message for the outgoing dust HTLC after the local force close. However, it doesn't verify the contents of the resolution message. Consider adding an assertion to check that the resolution message contains the expected details for the dust HTLC.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 611852f and 67cd269.

Files selected for processing (10)
  • contractcourt/briefcase.go (6 hunks)
  • contractcourt/briefcase_test.go (2 hunks)
  • contractcourt/channel_arbitrator.go (16 hunks)
  • contractcourt/channel_arbitrator_test.go (3 hunks)
  • contractcourt/contract_resolver.go (1 hunks)
  • contractcourt/htlc_success_resolver_test.go (1 hunks)
  • contractcourt/htlc_timeout_resolver.go (1 hunks)
  • contractcourt/htlc_timeout_resolver_test.go (1 hunks)
  • docs/release-notes/release-notes-0.19.0.md (1 hunks)
  • itest/lnd_multi-hop_test.go (1 hunks)
Additional context used
LanguageTool
docs/release-notes/release-notes-0.19.0.md

[duplication] ~27-~27: Possible typo: you repeated a word
Context: ...l/9068) where dust htlcs although not being being able to be resolved onchain were not ca...

(ENGLISH_WORD_REPEAT_RULE)

Markdownlint
docs/release-notes/release-notes-0.19.0.md

26-26: Expected: dash; Actual: asterisk
Unordered list style

(MD004, ul-style)

Additional comments not posted (36)
docs/release-notes/release-notes-0.19.0.md (1)

26-29: Critical bug fix for handling dust HTLCs.

This bug fix addresses a crucial issue where dust HTLCs were not being canceled before the commitment transaction was confirmed, potentially leading to force closures of incoming channels. Ensuring the proper cancellation of dust HTLCs is essential for maintaining the reliability and stability of channel operations in the Lightning Network.

contractcourt/contract_resolver.go (2)

102-105: LGTM!

The addition of the FetchCanceledHTLCs function to the ResolverConfig struct is a good enhancement. It provides a way to retrieve all canceled HTLCs from the database, which can help prevent HTLCs from being failed back more than once. The function signature is clear, follows Go conventions, and returns a set of uint64 values representing the IDs of the canceled HTLCs, along with an error to handle potential failures.

This change aligns with the PR objective of improving the handling of canceled HTLCs and maintaining data integrity.


107-108: Looks good!

The addition of the InsertCanceledHTLCs function to the ResolverConfig struct is a valuable complement to the FetchCanceledHTLCs function. It allows for the insertion of a list of canceled HTLCs into the database, enabling the system to keep track of HTLCs that have already been processed and avoid duplicate failures.

The function signature is clear, follows Go conventions, and takes a set of uint64 values representing the IDs of the canceled HTLCs to be inserted. It also returns an error to handle potential failures during the insertion process.

This change aligns with the PR objective of improving the management of canceled HTLCs and maintaining data consistency.

contractcourt/briefcase_test.go (1)

787-811: LGTM!

The new test TestCanceledHTLCStorage looks good:

  • It correctly tests the functionality of reading and writing a set of canceled HTLCs using a test log.
  • The test logic is sound, covering the essential steps of creating a test log, creating a set of canceled HTLCs, inserting the set into the log, retrieving the set from the log, and comparing the retrieved set with the original set.
  • The test fails with a detailed error message if the retrieved set doesn't match the original set.

Great job adding this test to improve the test coverage!

contractcourt/htlc_timeout_resolver.go (2)

456-484: Looks good!

The added check for canceled HTLCs before proceeding with the resolution is a good safeguard. The resolution process of logging, sending failure message, and updating canceled HTLCs set is handled correctly.


Line range hint 1-1: Skipped reviewing claimCleanUp as no changes were made to this function in the diff.

contractcourt/briefcase.go (5)

1206-1235: LGTM!

The function correctly converts the set of canceled HTLCs to a slice, serializes the data efficiently, and stores it in the database under the appropriate key.


1237-1292: LGTM!

The function correctly fetches the serialized data from the database, deserializes it into a slice, and converts it back to a set. It handles the case when no canceled HTLCs are found and returns the appropriate error.
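
The round trip described in the two comments above (set to slice, serialize, store, then read back and rebuild the set) can be sketched as follows. This is only an illustration under stated assumptions, not the code in contractcourt/briefcase.go: `htlcSet`, `encodeCanceledHTLCs`, and `decodeCanceledHTLCs` are hypothetical names, the database write is left out, and the fixed-width big-endian layout is chosen here purely for brevity.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"io"
	"sort"
)

// htlcSet is a hypothetical stand-in for the set of canceled HTLC IDs.
type htlcSet map[uint64]struct{}

// encodeCanceledHTLCs writes the number of IDs followed by each ID as a
// big-endian uint64. The IDs are sorted so the encoding is deterministic.
func encodeCanceledHTLCs(set htlcSet) []byte {
	ids := make([]uint64, 0, len(set))
	for id := range set {
		ids = append(ids, id)
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })

	var (
		b       bytes.Buffer
		scratch [8]byte
	)
	binary.BigEndian.PutUint64(scratch[:], uint64(len(ids)))
	b.Write(scratch[:])
	for _, id := range ids {
		binary.BigEndian.PutUint64(scratch[:], id)
		b.Write(scratch[:])
	}

	return b.Bytes()
}

// decodeCanceledHTLCs reverses encodeCanceledHTLCs and rebuilds the set.
func decodeCanceledHTLCs(data []byte) (htlcSet, error) {
	r := bytes.NewReader(data)

	var scratch [8]byte
	if _, err := io.ReadFull(r, scratch[:]); err != nil {
		return nil, err
	}
	n := binary.BigEndian.Uint64(scratch[:])

	// Reject a count that the remaining bytes cannot possibly satisfy.
	if n > uint64(r.Len())/8 {
		return nil, errors.New("canceled HTLC count exceeds payload")
	}

	set := make(htlcSet, n)
	for i := uint64(0); i < n; i++ {
		if _, err := io.ReadFull(r, scratch[:]); err != nil {
			return nil, err
		}
		set[binary.BigEndian.Uint64(scratch[:])] = struct{}{}
	}

	return set, nil
}
```

A storage test in the spirit of TestCanceledHTLCStorage would encode a set, decode the bytes, and compare the result with the original set.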


Line range hint 1294-1328: LGTM!

The function correctly encodes the SignDetails struct, handling the case when it is nil. It uses the appropriate encoding functions for each field and writes the data efficiently to the provided writer.


Line range hint 1330-1369: LGTM!

The function correctly decodes the SignDetails struct from the provided reader, handling the case when it is not present. It uses the appropriate decoding functions for each field and parses the DER-encoded signature to create a Signature object.


Line range hint 1756-1835: LGTM!

The function correctly decodes the taproot specific data from the provided reader and populates the ContractResolutions struct. It handles various cases for IncomingHTLCs and OutgoingHTLCs based on the presence of SignedSuccessTx, SignedTimeoutTx, and SignDetails. It uses the resolver ID to retrieve the appropriate control blocks from the decoded maps.

itest/lnd_multi-hop_test.go (12)

253-256: Test logic looks good.

The test correctly asserts that dust HTLCs are immediately canceled backwards as soon as the commitment transaction is broadcast to the mempool. This prevents any loss of funds on dust HTLCs.


Line range hint 432-435: Looks good!

Using runMultiHopHtlcClaimTest to run the test case with different combinations of commitment types and zero-conf settings provides good test coverage.


Line range hint 437-585: Test logic is correct.

The test makes good use of hold invoices and restarting Bob to force an on-chain claim by Carol. It then correctly asserts that Bob extracts the preimage from Carol's claim transaction and settles the HTLC backwards to Alice. This ensures the multi-hop HTLC resolution works as expected.


Line range hint 587-590: Looks good!

Using runMultiHopHtlcClaimTest to run the test case with different combinations of commitment types and zero-conf settings provides good test coverage.


Line range hint 592-736: Test logic is sound.

The test correctly handles the scenario where Bob force closes his channel with Carol while an HTLC is in-flight. It asserts that Bob hands off the HTLC to his utxo nursery, and eventually confirms the second-level timeout transaction to fully resolve the HTLC. This ensures the HTLC is properly canceled backwards through the route.


Line range hint 738-741: Looks good!

Using runMultiHopHtlcClaimTest to run the test case with different combinations of commitment types and zero-conf settings provides good test coverage.


Line range hint 743-844: Test logic is correct.

The test properly handles the scenario where Carol force closes while an incoming HTLC from Bob is pending. It asserts that Bob waits until the HTLC expires, then broadcasts his timeout transaction. Once the timeout transaction confirms, it verifies that the HTLC is canceled back to Alice. This ensures the incoming HTLC is properly handled on the commitment transaction.


Line range hint 846-849: Looks good!

Using runMultiHopHtlcClaimTest to run the test case with different combinations of commitment types and zero-conf settings provides good test coverage.


Line range hint 851-1000: Test logic is sound.

The test correctly handles the scenario where Bob force closes a channel with an incoming HTLC, then later learns the preimage once Carol settles her invoice on-chain. It asserts that Bob broadcasts his second-level HTLC success transaction once he sees Carol's, and mines blocks to confirm the transactions and fully resolve the HTLC back to Alice. This ensures an incoming HTLC can be properly settled on-chain using the preimage.


Line range hint 1002-1005: Looks good!

Using runMultiHopHtlcClaimTest to run the test case with different combinations of commitment types and zero-conf settings provides good test coverage.


Line range hint 1007-1131: Test logic is correct.

The test properly handles the scenario where Alice force closes while Bob has an outgoing HTLC to Carol. It asserts that once Carol settles on-chain and Bob learns of the preimage, he claims the HTLC output on Alice's commitment using the preimage. Mining blocks to confirm the transactions ensures the HTLC is fully resolved on-chain back to Alice. This verifies an outgoing HTLC can be settled on the remote party's commitment when the preimage is learned.


Line range hint 1133-1136: Looks good!

Using runMultiHopHtlcClaimTest to run the test case with different combinations of commitment types and zero-conf settings provides good test coverage.

contractcourt/channel_arbitrator.go (13)

962-985: LGTM: Correctly cancelling outgoing dust HTLCs before commitment broadcast

The code appropriately handles the immediate cancellation of outgoing dust HTLCs by collecting them from chainActions, converting them into a set, and calling cancelIncomingHTLCs. Persisting the canceled HTLCs ensures they are not processed multiple times.
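
The de-duplication idea in this comment (cancel each dust HTLC once and remember which ones were already canceled) can be sketched roughly as below. This is a simplified illustration, not the logic in channel_arbitrator.go: `htlcSet` and `cancelOnce` are hypothetical names, the `cancel` callback stands in for whatever fails the incoming HTLC back through the switch, and persistence of the set is omitted.

```go
package main

// htlcSet is a hypothetical set of HTLC indices that were already
// canceled back.
type htlcSet map[uint64]struct{}

// cancelOnce invokes cancel for every HTLC in dust that is not yet in
// canceled, records each success in canceled, and returns the indices
// that were newly canceled by this call.
func cancelOnce(dust []uint64, canceled htlcSet,
	cancel func(id uint64) error) ([]uint64, error) {

	var done []uint64
	for _, id := range dust {
		// Skip HTLCs that an earlier state transition already
		// canceled, so the same HTLC is never failed twice.
		if _, ok := canceled[id]; ok {
			continue
		}

		if err := cancel(id); err != nil {
			return done, err
		}

		canceled[id] = struct{}{}
		done = append(done, id)
	}

	return done, nil
}
```

Calling `cancelOnce` a second time with an overlapping list only touches the HTLCs that were not handled before, which is the behavior that avoids the duplicate "unable to find target channel" log entries discussed earlier in the thread.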


1231-1238: LGTM: Resolving HTLCs immediately after commitment confirmation

The call to resolveHTLCsImmediately efficiently processes HTLCs that can be acted upon immediately once the commitment transaction is confirmed, enhancing the timeliness of HTLC resolution.


1685-1691: LGTM: Added new chain action HtlcFailDustAction

The HtlcFailDustAction is correctly defined to handle the immediate failure of outgoing dust HTLCs that have no corresponding output on the commitment transaction.


1711-1716: LGTM: Added new chain action HtlcFailDanglingAction

The HtlcFailDanglingAction is appropriately introduced to handle outgoing HTLCs that need to be failed immediately after the commitment transaction is confirmed, excluding dust HTLCs covered by HtlcFailDustAction.


1926-1928: LGTM: Categorizing outgoing dust HTLCs for immediate failure

Dust HTLCs are correctly added to HtlcFailDustAction for prompt cancellation, ensuring they are handled appropriately since they cannot be enforced on-chain.


2120-2133: LGTM: Handling dangling dust HTLCs in remote commitments

The code correctly identifies and schedules outgoing dust HTLCs, present only in the remote commitment, for immediate failure by adding them to HtlcFailDustAction.


2232-2234: LGTM: Scheduling dangling HTLCs for immediate failure

HTLCs present only on the remote dangling commitment and not on the confirmed commitment are appropriately added to HtlcFailDanglingAction, ensuring they are failed promptly.


Line range hint 2299-2308: LGTM: Modifying prepContractResolutions to accept confCommitSet

Adding the confCommitSet parameter to prepContractResolutions allows access to the confirmed commitment set, aligning with the updated HTLC handling logic.


2327-2327: LGTM: Proper error handling when fetching historical channel state

The return of nil and the error ensures that any issues encountered while fetching the historical channel are appropriately propagated.


2381-2381: LGTM: Returning HTLC resolvers after preparing contract resolutions

The function returns the slice of ContractResolver correctly, concluding the preparation of contract resolutions.


2501-2505: Logging improvement: Enhanced error message for missing outgoing resolution

The log message provides clear context by including the channel point and the missing HTLC outpoint, aiding in debugging.


2545-2677: LGTM: Introducing resolveHTLCsImmediately for prompt HTLC resolution

The new function resolveHTLCsImmediately efficiently processes HTLCs that can be resolved immediately after the commitment transaction is confirmed, preventing duplicate cancellations by checking already canceled HTLCs.


3269-3308: LGTM: Implementing cancelIncomingHTLCs to cancel incoming HTLCs

The function correctly constructs failure messages for incoming HTLCs associated with outgoing HTLCs that need to be canceled, and ensures they are sent to the switch only when necessary.

@@ -23,6 +23,11 @@
propagate mission control and debug level config values to the main LND config
struct so that the GetDebugInfo response is accurate.

* [Fixed a bug](https://github.com/lightningnetwork/lnd/pull/9068) where dust
htlcs although not being being able to be resolved onchain were not canceled
Contributor

Fix the typo in the bug fix description.

There is a typo in the description where the word "being" is repeated twice. Please remove the duplicate word to improve clarity.

Apply this diff to fix the typo:

-  htlcs although not being being able to be resolved onchain were not canceled
+  htlcs although not being able to be resolved onchain were not canceled

Collaborator

@yyforyongyu yyforyongyu left a comment

Looking good - my main comment is whether we should start putting the canceled dust HTLCs into the db in this PR or not. It seems to me that it's only used to de-dup dust HTLCs, which I think can be avoided if we only ever cancel once in StateDefault. The db is still needed for the future PR where we cancel the incoming HTLCs though, where it will be used to track the total canceled amount, but I think we can defer the implementation to that PR to land this one faster.

@@ -1204,10 +1204,18 @@ func (c *ChannelArbitrator) stateStep(
break
}

// With the commitment confirmed, we'll then send over all
// messages we can send immediately.
err = c.resolveHTLCsImmediately(contractResolutions,
Collaborator

style nit: use new lines


// resolveHTLCsImmediately filters out all HTLCs for which an action can be
// taken immediately and performs the action.
func (c *ChannelArbitrator) resolveHTLCsImmediately(
Collaborator

nit: resolveHTLCsImmediately -> resolveIncomingHTLCsImmediately or resolveIncomingHTLCs to make it clear that we are handling the corresponding incoming HTLCs?


// canceledHTLCsKey is the primary key under the logScope that we'll
// use to store the set of HTLCs that were prematurely canceled back
// before the commitment transaction was confirmed onchain.
Collaborator

let's also mention what the keys and values are?


var b bytes.Buffer

htlcs := canceledHTLCs.ToSlice()
Collaborator

I think we should use TLV records here, and use the BigSize type for both the number of HTLCs and the HTLC index.

Collaborator Author

ohh true my bad sorry, should have thought of tlv in the first place.
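
To illustrate the suggestion above, here is a minimal sketch of serializing a set of HTLC indices with BigSize integers (the big-endian variable-length integer defined in BOLT 1). It only shows the integer encoding for the count and the indices, not a full TLV stream, and it uses only the Go standard library. The names `writeBigSize`, `readBigSize`, `encodeHTLCSet`, and `decodeHTLCSet` are hypothetical helpers for this example, not functions from lnd.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"errors"
	"fmt"
	"io"
	"sort"
)

// writeBigSize writes val in BOLT 1 BigSize form: one byte for values below
// 0xfd, otherwise a 0xfd/0xfe/0xff prefix plus a 2/4/8-byte big-endian value.
func writeBigSize(w io.Writer, val uint64) error {
	var buf [9]byte
	n := 1
	switch {
	case val < 0xfd:
		buf[0] = byte(val)
	case val <= 0xffff:
		buf[0], n = 0xfd, 3
		binary.BigEndian.PutUint16(buf[1:], uint16(val))
	case val <= 0xffffffff:
		buf[0], n = 0xfe, 5
		binary.BigEndian.PutUint32(buf[1:], uint32(val))
	default:
		buf[0], n = 0xff, 9
		binary.BigEndian.PutUint64(buf[1:], val)
	}
	_, err := w.Write(buf[:n])
	return err
}

// readBigSize reads one BigSize value and rejects non-canonical encodings,
// i.e. values that would have fit into a shorter form.
func readBigSize(r io.Reader) (uint64, error) {
	var first [1]byte
	if _, err := io.ReadFull(r, first[:]); err != nil {
		return 0, err
	}
	var size int
	var minVal uint64
	switch first[0] {
	case 0xfd:
		size, minVal = 2, 0xfd
	case 0xfe:
		size, minVal = 4, 0x10000
	case 0xff:
		size, minVal = 8, 0x100000000
	default:
		return uint64(first[0]), nil
	}
	// Right-align the payload so it can be read as a single uint64.
	var payload [8]byte
	if _, err := io.ReadFull(r, payload[8-size:]); err != nil {
		return 0, err
	}
	val := binary.BigEndian.Uint64(payload[:])
	if val < minVal {
		return 0, errors.New("non-canonical BigSize value")
	}
	return val, nil
}

// encodeHTLCSet writes the number of HTLCs followed by each index, sorted so
// that the output is deterministic.
func encodeHTLCSet(htlcs map[uint64]struct{}) ([]byte, error) {
	indices := make([]uint64, 0, len(htlcs))
	for idx := range htlcs {
		indices = append(indices, idx)
	}
	sort.Slice(indices, func(i, j int) bool { return indices[i] < indices[j] })

	var b bytes.Buffer
	if err := writeBigSize(&b, uint64(len(indices))); err != nil {
		return nil, err
	}
	for _, idx := range indices {
		if err := writeBigSize(&b, idx); err != nil {
			return nil, err
		}
	}
	return b.Bytes(), nil
}

// decodeHTLCSet is the inverse of encodeHTLCSet.
func decodeHTLCSet(data []byte) (map[uint64]struct{}, error) {
	r := bytes.NewReader(data)
	count, err := readBigSize(r)
	if err != nil {
		return nil, err
	}
	htlcs := make(map[uint64]struct{})
	for i := uint64(0); i < count; i++ {
		idx, err := readBigSize(r)
		if err != nil {
			return nil, err
		}
		htlcs[idx] = struct{}{}
	}
	return htlcs, nil
}

func main() {
	set := map[uint64]struct{}{0: {}, 252: {}, 253: {}, 70000: {}}
	enc, err := encodeHTLCSet(set)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%d indices -> %d bytes: %x\n", len(set), len(enc), enc)
}
```

Small indices cost a single byte each, which is the main appeal over a fixed 8-byte encoding when most HTLC indices are low.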

@@ -959,6 +959,30 @@ func (c *ChannelArbitrator) stateStep(
return StateDefault, closeTx, nil
}

// Cancel all the outgoing dust htlcs available either on the
// local or the remote/remote pending commitment transaction.
dustHTLCs := chainActions[HtlcFailDustAction]
Collaborator

let's create a new method, sth like cancelDustHTLCs or handleDustHTLCs since this method is already very long?
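
A rough sketch of what such an extraction could look like, so that `stateStep` only makes a single call. The types below are simplified stand-ins, not the real contractcourt definitions, and the `failBack` hook is a hypothetical placeholder for whatever sends the failure to the incoming link.

```go
package main

import "fmt"

// ChainAction, HTLC and ChainActionMap are simplified stand-ins for the
// contractcourt types; only the fields needed for this sketch are included.
type ChainAction uint8

const (
	NoAction ChainAction = iota
	HtlcFailDustAction
)

type HTLC struct {
	HtlcIndex uint64
	Outgoing  bool
}

type ChainActionMap map[ChainAction][]HTLC

// arbitrator is a hypothetical, trimmed-down arbitrator. failBack stands in
// for the code that fails the corresponding incoming HTLC.
type arbitrator struct {
	failBack func(HTLC) error
}

// cancelDustHTLCs fails back every outgoing HTLC that was tagged with
// HtlcFailDustAction and reports how many were canceled.
func (a *arbitrator) cancelDustHTLCs(actions ChainActionMap) (int, error) {
	canceled := 0
	for _, htlc := range actions[HtlcFailDustAction] {
		// Only outgoing dust HTLCs have an incoming HTLC to fail back.
		if !htlc.Outgoing {
			continue
		}
		if err := a.failBack(htlc); err != nil {
			return canceled, fmt.Errorf("fail back htlc %d: %w",
				htlc.HtlcIndex, err)
		}
		canceled++
	}
	return canceled, nil
}

func main() {
	a := &arbitrator{failBack: func(h HTLC) error {
		fmt.Println("failing back htlc", h.HtlcIndex)
		return nil
	}}
	actions := ChainActionMap{
		HtlcFailDustAction: {{HtlcIndex: 1, Outgoing: true}},
	}
	n, err := a.cancelDustHTLCs(actions)
	fmt.Println(n, err)
}
```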

@@ -2579,8 +2627,22 @@ func (c *ChannelArbitrator) resolveHTLCsImmediately(
// If we can fail an HTLC immediately (an outgoing HTLC with no
// contract and it was not canceled before), then we'll assemble
// an HTLC fail packet to send.
//
// NOTE: In case we initiated the force close locally all the
Collaborator

I think by the time we hit here we should cancel all the dust HTLCs, for all possible scenarios. This is because the channel always starts at state StateDefault, then we advance it.

Collaborator Author

Ohh true did not think about this, but you are absolutely right.

@@ -959,6 +959,30 @@ func (c *ChannelArbitrator) stateStep(
return StateDefault, closeTx, nil
}

// Cancel all the outgoing dust htlcs available either on the
Collaborator

I think instead of using c.checkLocalChainActions, we should use constructChainActions here, then call resolveHTLCsImmediately. Maybe we can break resolveHTLCsImmediately into two methods, one to cancel in the breach case and one for the rest. Need to think twice, hmm.

Collaborator Author

I am not sure why you think we can call resolveHTLCsImmediately because the commitment is still not confirmed here ?


// FetchCanceledHTLCs fetches all canceled HTLCs from the database so
// no HTLCs are failed back more than once.
FetchCanceledHTLCs func() (fn.Set[uint64], error)
Collaborator

This should instead be put in ChannelArbitratorConfig, similar to PutResolverReport

Collaborator Author

wondering why Checkpoint is done the other way, but sure will move.

// database. It converts the set to a slice and persists the slice.
//
// NOTE: Part of the ArbitratorLog interface.
func (b *boltArbitratorLog) InsertCanceledHTLCs(
Collaborator

I think these commits can be moved to a new PR where we want to cancel non-dust incoming HTLCs.

Collaborator Author

hmm not sure, it solves the case for outgoing_htlc_resolver, when we restart before the second stage htlc is confirmed ?

Refactor the part where we are failing back the incoming htlc
when the channel of the corresponding outgoing htlc is force
closed.

Add an insert and fetch method for outgoing htlcs whose incoming
htlcs have already been failed back. This is necessary to make
sure we do not fail back htlcs more than once although the incoming
htlcs have already been resolved.
Collaborator Author

ziggie1984 commented Sep 20, 2024

Thank you for your review @yyforyongyu

I am in favour of also adding the db persistence change here, because in the third commit, contractcourt: dont fail incoming htlcs more than once, I am fixing behaviour I encountered during this PR, where we would fail an incoming htlc twice when the second-level sweep is not yet confirmed and we restart the node. Moreover, I kept the check for already canceled htlcs in place when the commitment is confirmed, just because of a potential failure scenario where we would then also try to fail back htlcs more than once. Example:
We successfully fail back the htlcs after the confirmation but fail to create the Resolvers in prepContractResolutions, which might never happen. So the question is: should we program more defensively here, or remove the check for already canceled htlcs? Open to both ways.

To insert and fetch the canceled htlcs in the HTLC_Timeout_Resolver, I think I cannot just put those methods into the ChannelArbitratorConfig as suggested in #9068 (comment),
because that would mean I need to define those two methods outside the scope of the ArbLog. I think that's also why the CheckPoint function is implemented this way. But I agree these are not the best solutions, so maybe there is a bigger refactor to do to make the arbLog available in the resolvers. I need to think about this.
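
The restart scenario described above can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `resolverConfig`, `memLog`, and `failBackOnce` are hypothetical names, the in-memory log stands in for the bolt-backed ArbitratorLog, and a plain map stands in for `fn.Set[uint64]`. The resolver receives the fetch and insert functions as config fields and consults the persisted set before failing an incoming HTLC.

```go
package main

import (
	"fmt"
	"sync"
)

// htlcSet is a stand-in for fn.Set[uint64].
type htlcSet map[uint64]struct{}

// resolverConfig is a hypothetical config that hands the persistence hooks
// to a resolver as plain function fields.
type resolverConfig struct {
	FetchCanceledHTLCs  func() (htlcSet, error)
	InsertCanceledHTLCs func(htlcSet) error
	FailIncomingHTLC    func(htlcIndex uint64) error
}

// memLog is an in-memory stand-in for the persistent arbitrator log. It
// outlives the "restarts" simulated in main.
type memLog struct {
	mu       sync.Mutex
	canceled htlcSet
}

func (m *memLog) fetch() (htlcSet, error) {
	m.mu.Lock()
	defer m.mu.Unlock()
	out := make(htlcSet, len(m.canceled))
	for idx := range m.canceled {
		out[idx] = struct{}{}
	}
	return out, nil
}

func (m *memLog) insert(htlcs htlcSet) error {
	m.mu.Lock()
	defer m.mu.Unlock()
	if m.canceled == nil {
		m.canceled = make(htlcSet)
	}
	for idx := range htlcs {
		m.canceled[idx] = struct{}{}
	}
	return nil
}

// failBackOnce fails the incoming HTLC only if it is not yet recorded as
// canceled, then records it. It reports whether a failure was sent.
func failBackOnce(cfg resolverConfig, htlcIndex uint64) (bool, error) {
	canceled, err := cfg.FetchCanceledHTLCs()
	if err != nil {
		return false, err
	}
	if _, ok := canceled[htlcIndex]; ok {
		return false, nil
	}
	if err := cfg.FailIncomingHTLC(htlcIndex); err != nil {
		return false, err
	}
	err = cfg.InsertCanceledHTLCs(htlcSet{htlcIndex: {}})
	return err == nil, err
}

func main() {
	store := &memLog{}
	failures := 0
	// newCfg simulates building a fresh resolver after a restart while the
	// persisted log stays the same.
	newCfg := func() resolverConfig {
		return resolverConfig{
			FetchCanceledHTLCs:  store.fetch,
			InsertCanceledHTLCs: store.insert,
			FailIncomingHTLC: func(uint64) error {
				failures++
				return nil
			},
		}
	}
	first, _ := failBackOnce(newCfg(), 7)
	second, _ := failBackOnce(newCfg(), 7)
	fmt.Println(first, second, failures)
}
```

One trade-off in this sketch: because the index is persisted after the failure is sent, a crash between the two steps could still lead to a second attempt. Persisting first would avoid that but risks never failing the HTLC back.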

Before failing the incoming htlc back after the outgoing htlc is
confirmed we make sure that this incoming htlc hasn't already been
canceled back. This can happen during restarts when the stage 1
htlc is already confirmed but the csv lock hasn't expired yet.

We distinguish between dangling and dust htlcs. This does not
change any logic but only introduces new types to later act on them
differently when we begin to fail dust htlcs earlier in a later
commit.

Now that we introduced the dangling category we group dust
dangling htlcs into another category.

We will now cancel dust htlcs on the local/remote commits after
we decided to go onchain. This can be done because dust cannot
be enforced onchain and therefore there is no way to also reveal
the preimage onchain. Moreover we do not take dust htlcs into
account when setting the cpfp fee rate of the commitment transaction,
so it's even more important to fail them earlier, before we force
close the incoming link.
In addition to this change we also persist the already canceled
htlcs, because otherwise we would fail them more than once.

Now outgoing dust-htlcs are canceled back before the commitment
is confirmed onchain.
Labels
force closes; HTLC; P0: very high priority issue/PR, blocker on all others; size/kilo: medium, proper context needed, less than 1000 lines
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[bug]: dust HTLC is not failed upstream before downstream channel close is confirmed on-chain
3 participants