channeld: fix channel_ready retransmission after splice via funding_tx_index#9256
channeld: fix channel_ready retransmission after splice via funding_tx_index#9256nGoline wants to merge 3 commits into
Conversation
Add regression tests that each funding transaction carries a funding_tx_index (0 for the original funding, +1 per splice), that it is persisted and reloaded across restart, and that a pending splice inflight carries its index. Changelog-None
Give every funding tx a stable index: 0 for the original funding (including RBF attempts), incrementing by 1 per splice. Threaded through the channeld inflight, the channel, the channeld_init and channeld_add_inflight wire messages, set on splice creation and carried onto the channel funding when a splice locks, and persisted in the channel_funding_inflights and channels tables.
is_splice_active compared the peer's my_current_funding_locked txid against peer->channel->funding.txid, which is wrong: once a splice locks, funding.txid becomes the splice txid, both sides agree on it, the check falls through, and channel_ready is incorrectly retransmitted on reconnect. Resolve the txid to its funding_tx_index (the current channel funding or a pending splice inflight) and treat > 0 as a splice. Changelog-Fixed: channeld: correctly detect splice transactions when deciding whether to retransmit `channel_ready` on reconnect.
ddustin
left a comment
There was a problem hiding this comment.
This is great! 👏 Mostly style notes with only one major concern:
test_splice_reconnect_after_lock_no_channel_ready may not actually be testing for the lack of channel_ready because of the unneeded billboard: Channel ready for use. check.
| # add extra sats to pay fee | ||
| funds_result = l1.rpc.fundpsbt("107527sat", 0, 0, excess_as_change=True) | ||
|
|
||
| result = l1.rpc.splice_init(chan_id, 100000, funds_result['psbt']) |
There was a problem hiding this comment.
This can be done easier with the splicein command or the dev-splice command for more complex ones.
| def do_splice(amount): | ||
| funds = l1.rpc.fundpsbt('{}sat'.format(amount + 7527), 0, 0, | ||
| excess_as_change=True) | ||
| result = l1.rpc.splice_init(chan_id, amount, funds['psbt']) |
There was a problem hiding this comment.
Same here this can become a one liner with splicein or dev-splice (for scripts).
|
|
||
| l1.daemon.wait_for_log(r'peer_in WIRE_CHANNEL_REESTABLISH') | ||
| l2.daemon.wait_for_log(r'peer_in WIRE_CHANNEL_REESTABLISH') | ||
| l1.daemon.wait_for_log(r'billboard: Channel ready for use.') |
There was a problem hiding this comment.
Need to remove l1.daemon.wait_for_log(r'billboard: Channel ready for use.') check for the later on assert not l1.daemon.is_in_log(r'Retransmitting channel_ready') to have any effect as it could eat the "channel_ready" message.
| # Restart l1: the inflight (and its index) must reload from the DB. | ||
| l1.restart() | ||
| l1.daemon.wait_for_log(r'peer_in WIRE_CHANNEL_REESTABLISH') | ||
| inflights = l1.db_query("SELECT funding_tx_index FROM" |
There was a problem hiding this comment.
It looks like you meant to query this from the RPC instead of db? Doesn't seem critical since test_splice_reconnect_after_lock_no_channel_ready should cover this already.
| &recv_tlvs->my_current_funding_locked->my_current_funding_locked_txid, | ||
| &peer->channel->funding.txid)); | ||
| && funding_tx_index_for_txid(peer, | ||
| &recv_tlvs->my_current_funding_locked->my_current_funding_locked_txid) > 0); |
There was a problem hiding this comment.
The correct style is to have the BOLT reference at the code where it is used and the code needs to match the spec language as closely as possible. This is for many reasons.
I understand moving the check into the is_splice_active variable may feel like a clean up, but being clean to the spec is more important here.
Let's drop this variable, put the check back where it was on the channel_ready check, and use a correct BOLT reference.
The bolt reference should read something like:
/* BOLT #2:
* - if next_commitment_number is 1 in both the channel_reestablish it sent and received, and none of those channel_reestablish messages contain my_current_funding_locked or next_funding for a splice transaction:
* - MUST retransmit channel_ready.
* - otherwise:
* - MUST NOT retransmit channel_ready, but MAY send channel_ready with a different short_channel_id alias field.
* /Using the Bolt reference this way makes CI enforce the bolt reference is correct and serves as a reminder to easily scan the code directly beneath it to verify we are following the spec.
This ensures we're keeping up with the spec as it changes, makes audits significantly easier, and even helps us notice when the spec itself is wrong so we can update the spec.
There was a problem hiding this comment.
Also, adding a comment here explaining something like "tx_index of 1 or more means it was from a splice." could be useful.
| # the new funding_tx_index columns) and reconnects/reestablishes. | ||
| l1.restart() | ||
|
|
||
| # Force the reconnect rather than waiting on auto-reconnect backoff. |
There was a problem hiding this comment.
Not a big deal but probably should be in the first commit
Problem
On reconnect, channeld decides whether to retransmit
channel_readybased on whether the peer'schannel_reestablishreferences a splice (is_splice_active). It computed that by comparing the peer'smy_current_funding_lockedtxid againstpeer->channel->funding.txid.That comparison is wrong once a splice locks:
channel_update_funding()overwritesfunding.txidwith the splice txid, so afterwards both sides agree on the same txid, the check falls through, andchannel_readyis incorrectly retransmitted on reconnect, which can disrupt channels.This is the problem @ddustin raised reviewing #9079 (comparing against the txid is incorrect, and we should track metadata on the inflight instead), suggesting a
funding_tx_index, matching what eclair does.Fix
Give every funding tx a stable
funding_tx_index:0for the original funding (including RBF attempts), incrementing by 1 per splice. It is:channeld_initandchanneld_add_inflightwire,parent + 1) and copied onto the channel funding when a splice locks,channel_funding_inflightsandchannels(two migrations,DEFAULT 0, so existing channels are unaffected).The reestablish check now resolves the
my_current_funding_lockedtxid to itsfunding_tx_index(current funding or a pending inflight) and treats> 0as a splice.Commits (bisectable)
splice: add tests for funding_tx_index tracking, regression tests (xfail).splice: track funding_tx_index on inflights and channel funding, the plumbing.channeld: detect splice on reestablish via funding_tx_index, not txid, the fix.Tests
test_splice_reconnect_after_lock_no_channel_ready: splice, restart, reestablish; asserts the index persists andchannel_readyis not retransmitted. It fails on commit 2 (old txid compare) and passes on commit 3, so it genuinely guards the fix.test_splice_funding_tx_index_increments: two sequential splices reach index 2.test_splice_inflight_funding_tx_index: a pending splice inflight carries index 1 and survives a restart.test_reconnect_no_updateconfirms non-splice channels still retransmitchannel_ready.Risk and plan
The behavioural change is a single line in the reestablish check. The plan is to get the
funding_tx_indexsemantics reviewed properly, and if we don't converge before the v26.09 freeze, revert that one line with a FIXME and keep the plumbing for the follow-up. So this carries low risk.@ddustin, this implements the
funding_tx_indexapproach you suggested; would appreciate your review of the assignment semantics and whether it matches what you and tbast had in mind.