TL;DR
dash-spv's get_quorum_at_height resolves a quorum only through the single active-window masternode list at or below the lookup height. Platform/Drive signs proofs with a signing quorum selected at a lagged height (~4.5 DKG intervals back on devnet), so by the proof's core_chain_locked_height that quorum can already have retired out of Core's active set. MasternodeList::apply_diff drops a retired quorum from the list's .quorums, so the active-window lookup misses, even though the quorum's public key is still resident in the engine's insert-only quorum_statuses by-hash index, which the read path never consults. The result is Quorum not found → InvalidQuorum. It is intermittent: most proofs reference an in-window quorum and verify fine; the failure fires only at the retirement edge.
Environment
rust-dashcore branch fix/sml-extnetinfo-v3-decode @ 2a68c3819131b71e42df39612e6d82228bd00a82 (the PR feat: decode SML ProTx v3 entries #797 head; confirmed against the local checkout).
Dash Core 23.1.2 devnet (paloma), protocol 70240.
LLMQ type 107 (llmq_devnet_platform / LLMQType::LlmqtypeDevnetPlatform), signing_active_quorum_count = 4, DKG interval 24 → active window = 96 blocks.
Symptom
WARN dash_spv::client::queries: Quorum not found: type 107_Dev-Platform at list height 16596
(requested 16596) with hash 50973dd2ab53091024fc2c8e344c91d07a98281e717ac83b9885607ab6020000
(masternode list exists with 4 quorums of this type)
Downstream this surfaces as ContextProviderError(InvalidQuorum) during proof verification. It is intermittent — at the same synced SPV state, proof verification succeeds for one Drive response and fails 200 ms later for another, the only difference being the quorum hash Drive embedded (see Evidence).
Root cause
Walking the read path on 2a68c38:
dash-spv/src/client/queries.rs:48-107: get_quorum_at_height is the defect site. It calls masternode_lists_around_height(height), takes the before list, then ml.quorums.get(&quorum_type)?.get(&quorum_hash). On a miss it returns SpvError::QuorumLookupError (lines 71-82); it never consults quorum_statuses.
dash/src/sml/masternode_list_engine/helpers.rs:29-40: masternode_lists_around_height picks the single highest list ≤ height via self.masternode_lists.range(..=core_block_height).next_back() (lines 33-34). One list, no fallback to history.
dash/src/sml/masternode_list/apply_diff.rs:70-78 removes a retired quorum from that list's .quorums the instant Core advertises it as deleted.
So a retired quorum is absent from every list whose known_height ≥ its retirement height, including the list ≤ the proof height.
dash/src/sml/masternode_list_engine/mod.rs:265-279: quorum_statuses is, by contrast, an insert-only by-hash index of every quorum public key the engine has ever applied.
It is written on every ingest path (mod.rs:627, :998, :1080, :1240, :1272, :1306, :1374) and never removed on retirement: a grep for remove/retain/prune/truncate/clear against quorum_statuses across both dash/src and dash-spv/src returns nothing.
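The single-list selection described above can be illustrated with a plain BTreeMap. This is a minimal sketch, not the engine's code: `Lists`, `list_at_or_below`, and `sample_lists` are hypothetical stand-ins, and only the `range(..=height).next_back()` shape is taken from the walk-through.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the engine's height-keyed list map:
/// height -> labels of the quorums held by the list at that height.
type Lists = BTreeMap<u32, Vec<&'static str>>;

/// Same selection shape as `range(..=height).next_back()`:
/// the single highest list at or below `height`, or None if there is none.
fn list_at_or_below(lists: &Lists, height: u32) -> Option<(u32, &Vec<&'static str>)> {
    lists.range(..=height).next_back().map(|(h, l)| (*h, l))
}

/// Two lists: Q is present at 148 and has been deleted by 208.
fn sample_lists() -> Lists {
    let mut lists = Lists::new();
    lists.insert(148, vec!["Q", "A", "B", "C"]);
    lists.insert(208, vec!["A", "B", "C", "D"]);
    lists
}
```

Calling `list_at_or_below(&sample_lists(), 208)` yields only the list at 208, which no longer contains Q; the earlier list at 148 is never considered.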
The miss condition: a proof carries (quorum_type = 107, quorum_hash = Q, core_chain_locked_height = H). When H ≥ mint(Q) + signing_active_quorum_count × DKG_interval (Q has retired) while Drive legitimately selected Q at the lagged selection height, the active list ≤ H no longer holds Q → get_quorum_at_height errors. Yet quorum_statuses[107][Q].1 still holds Q's BLSPublicKey — the read path simply never looks there.
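The inequality can be written as a small predicate. This is a sketch with hypothetical helper names (`retirement_height`, `misses_active_window`), and it assumes one new quorum per DKG interval and a fixed active count, as in the devnet parameters listed under Environment.

```rust
/// First height at which a quorum minted at `mint_height` is no longer in the
/// active set, assuming one new quorum per DKG interval and a fixed active count.
fn retirement_height(mint_height: u32, active_count: u32, dkg_interval: u32) -> u32 {
    mint_height + active_count * dkg_interval
}

/// True when a proof at `proof_height` references a quorum that has already
/// left the active window (the miss condition described above).
fn misses_active_window(
    mint_height: u32,
    proof_height: u32,
    active_count: u32,
    dkg_interval: u32,
) -> bool {
    proof_height >= retirement_height(mint_height, active_count, dkg_interval)
}
```

With the devnet parameters (active count 4, interval 24) and the paloma values reported in this issue, `misses_active_window(16488, 16596, 4, 24)` is true, while the same quorum referenced at 16583 would still be in the window.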
Why the reference is legitimate (not a node bug)
Platform/Drive signs with the type-107 quorum selected at a lagged height, and the proof carries that quorum's hash. Verification should resolve the signing quorum by hash, regardless of whether the quorum is still in the active set at H. The verifier consumes only the 48-byte public key — height is pure context, not a membership constraint.
Live confirmation on paloma: dash-cli quorum info 107 000002b67a6085983bc87a711e28987ad0914c348e2cfc24100953abd23d9750 (the big-endian form of the logged 50973dd2…b6020000) returns a real, valid quorum:
height 16488, type llmq_devnet_platform, quorumIndex 0, 12 valid members
quorumPublicKey b1801046775dc6ca7c2b42bc3084b819ccb31712fcc4dea97d973c73261f92359c55a593d898ffa70ba617e4988b72d8
The active-4 at the tip: 0000056d…, 000001da…, 0000030a…, 00000088…
Heights 16536/16560/16584/16608 → DKG interval 24. The full sequence including the retired one is …16488, 16512, 16536, 16560, 16584, 16608… With active_count = 4, Q (16488) is active in [16488, 16584) and retires at 16584. The proof references it at core_chain_locked_height = 16596, which is 12 blocks after retirement, i.e. exactly one step out of the active window. The selection offset 16596 − 16488 = 108 > 96 (the active window), which is the precise inequality above.
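The relationship between the logged hash and the one passed to dash-cli is a plain byte-order reversal, which can be checked mechanically. A minimal sketch follows; `reverse_hex_bytes` is a hypothetical helper, not part of rust-dashcore.

```rust
/// Reverse the byte order of a hex string (two hex digits per byte).
/// Returns None if the input has odd length or is not ASCII.
fn reverse_hex_bytes(hex: &str) -> Option<String> {
    if hex.len() % 2 != 0 || !hex.is_ascii() {
        return None;
    }
    let bytes = hex.as_bytes();
    let mut out = String::with_capacity(hex.len());
    // Walk the two-character byte pairs from last to first.
    for pair in bytes.chunks(2).rev() {
        out.push(pair[0] as char);
        out.push(pair[1] as char);
    }
    Some(out)
}
```

Applied to the hash from the warning (50973dd2…b6020000), this produces the 000002b6…d23d9750 form used in the dash-cli query above.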
Evidence
Same-state success-then-fail, 200 ms apart (fully synced, chain-locked at 16596): proof verification successful at 10:31:16.703, then InvalidQuorum for a different embedded quorum hash at 10:31:16.905. Identical SPV state across those 200 ms rules out staleness — the discriminator is the hash, not height or list freshness.
54 type-107 Quorum not found warnings in a run where the engine's window was frozen at 16596 (all at list height 16596 (requested 16596)); hundreds in another run. A non-advancing window lags every incoming proof, so the edge condition that is rare against a live engine becomes pervasive — same defect, amplified.
4 distinct retired hashes recur across heights (50973dd2, 5eec7acc, 1c0b8f69, b7fc2340), rotating with height in a quorum-aging pattern.
Framing: this is a retirement-edge timing race. It is pervasive when the engine lags, rare-but-real when live — it fires only when a proof's signing quorum has just left the active window.
Deterministic reproduction (hermetic — no devnet)
The retirement asymmetry is directly observable without a live network. Mirror the existing engine_with_lists fixture pattern (dash-spv/src/sync/masternodes/manager.rs:711) and MasternodeList::empty(block_hash, block_height) (dash/src/sml/masternode_list/mod.rs:39). Parameters mirror the devnet cadence and the paloma instance: active_count = 4, interval = 24 → window 96; quorum Q minted at M = 100 (retires at 100 + 4×24 = 196); lookup at H = 208 (≥ 196; 208 − 100 = 108 > 96 — same inequality as paloma's 16488/16584/16596).
    // `client`, `Q`, `PK`, `list_with_quorum`, and `list_without_quorum` are test scaffolding not shown here.
    let mut engine = MasternodeListEngine::default_for_network(Network::Regtest); // mod.rs:342
    let type107 = LLMQType::LlmqtypeDevnetPlatform;

    // 1. A list at height 148 (Q still active) holding Q with pubkey PK.
    //    Build via MasternodeList::empty(anchor_hash, 148), then insert Q into .quorums[type107].
    engine.masternode_lists.insert(148, list_with_quorum(type107, Q, PK));

    // 2. A post-retirement list at 208 WITHOUT Q (Q deleted on retirement),
    //    .quorums[type107] populated by the then-active 4, excluding Q.
    engine.masternode_lists.insert(208, list_without_quorum(type107, /* excludes Q */));

    // 3. Mirror Q into the never-pruned by-hash index exactly as apply_diff does
    //    (mod.rs:1240/1272/1306). Planted explicitly so the test doesn't depend on diff plumbing.
    engine
        .quorum_statuses
        .entry(type107)
        .or_default()
        .insert(Q, (BTreeSet::from([148]), PK, LLMQEntryVerificationStatus::Verified));

    // CURRENT behavior (proves the bug deterministically):
    assert!(engine
        .masternode_lists_around_height(208).0.unwrap() // list at 208
        .quorums.get(&type107).unwrap().get(&Q).is_none()); // Q absent from active-window list
    assert!(client.get_quorum_at_height(208, type107, Q).await.is_err()); // => Quorum not found (THE BUG)
    assert_eq!(engine.quorum_statuses[&type107][&Q].1, PK); // but the pubkey IS resident by hash

    // AFTER the fix (same state, now resolves). The proposed accessor returns
    // (pubkey, status), so compare only the pubkey:
    assert_eq!(engine.quorum_public_key_by_hash(type107, Q).map(|(pk, _)| pk), Some(PK)); // new by-hash accessor
Higher-fidelity variant (optional, guards the diff plumbing): feed a base list then a sequence of MnListDiffs across ≥ 5 cycles, where the cycle-M diff carries Q in new_quorums and the cycle-196 diff carries Q in deleted_quorums. Real binary MnListDiff fixtures already live at dash/tests/data/test_DML_diffs/*.bin and back the existing apply_diff tests; a maintainer can synthesize an analogous devnet fixture.
Suggested fix
dash: add MasternodeListEngine::quorum_public_key_by_hash(&self, llmq_type, quorum_hash) -> Option<(BLSPublicKey, LLMQEntryVerificationStatus)> reading quorum_statuses. (The existing test at mod.rs:1958 already reads quorum_statuses via .get(&type).and_then(|m| m.get(&hash)).map(|(_, _, status)| ...) — same access shape.)
dash-spv: in get_quorum_at_height (client/queries.rs), on the active-list miss (queries.rs:71-82) fall back to the by-hash accessor instead of returning Err; keep QuorumLookupError only when both miss.
This is window-independent by construction: it keys on the hash, not on active-set membership at H, so it resolves the just-retired signing quorum that the active-window lookup cannot. The common in-window path is untouched (the active list hits first). The verifier needs only the 48-byte key, which BLSPublicKey provides directly. Effort ~S. The fallback should prefer Verified entries; correctness is ultimately gated by the BLS threshold-signature check against the returned pubkey, so a wrong pubkey fails the proof rather than forging one. Suggest an independent PR off dev (not stacked on #797; the fix does not depend on the SML-v3 decode work).
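The shape of the proposed read path can be modelled without any rust-dashcore types. The sketch below is a toy model under stated assumptions: `ToyEngine`, `Status`, `active_lookup`, `get_quorum_with_fallback`, and `retirement_edge_engine` are hypothetical stand-ins, quorum types are plain `u8` values, and keys are raw byte arrays. Only the accessor name `quorum_public_key_by_hash` and the `(heights, pubkey, status)` tuple shape follow the description in this issue.

```rust
use std::collections::{BTreeMap, BTreeSet};

// Simplified stand-ins, not the real rust-dashcore types.
type QuorumHash = [u8; 32];
type PubKey = [u8; 48];

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq)]
enum Status {
    Verified,
    Unknown,
}

#[derive(Default)]
struct ToyEngine {
    /// height -> quorum type -> hash -> pubkey (the active-window lists)
    lists: BTreeMap<u32, BTreeMap<u8, BTreeMap<QuorumHash, PubKey>>>,
    /// quorum type -> hash -> (heights seen, pubkey, status); insert-only
    quorum_statuses: BTreeMap<u8, BTreeMap<QuorumHash, (BTreeSet<u32>, PubKey, Status)>>,
}

impl ToyEngine {
    /// Current behavior: only the highest list at or below `height` is consulted.
    fn active_lookup(&self, height: u32, ty: u8, hash: &QuorumHash) -> Option<PubKey> {
        let (_, list) = self.lists.range(..=height).next_back()?;
        list.get(&ty)?.get(hash).copied()
    }

    /// Proposed accessor: resolve by hash, independent of the active window.
    fn quorum_public_key_by_hash(&self, ty: u8, hash: &QuorumHash) -> Option<(PubKey, Status)> {
        self.quorum_statuses
            .get(&ty)?
            .get(hash)
            .map(|(_, pk, status)| (*pk, *status))
    }

    /// Proposed read path: active list first, by-hash index only on a miss.
    fn get_quorum_with_fallback(&self, height: u32, ty: u8, hash: &QuorumHash) -> Option<PubKey> {
        self.active_lookup(height, ty, hash)
            .or_else(|| self.quorum_public_key_by_hash(ty, hash).map(|(pk, _)| pk))
    }
}

/// Hermetic scenario: Q is in the list at 148, absent from the list at 208,
/// and still resident in the by-hash index.
fn retirement_edge_engine(q: QuorumHash, pk: PubKey) -> ToyEngine {
    let mut e = ToyEngine::default();
    e.lists.entry(148).or_default().entry(107).or_default().insert(q, pk);
    e.lists.entry(208).or_default().insert(107, BTreeMap::new()); // Q deleted on retirement
    e.quorum_statuses
        .entry(107)
        .or_default()
        .insert(q, (BTreeSet::from([148]), pk, Status::Verified));
    e
}
```

In this model, `active_lookup(208, 107, &q)` returns None while `get_quorum_with_fallback(208, 107, &q)` returns the key, and a hash that was never ingested still resolves to None, which corresponds to keeping QuorumLookupError for the case where both lookups miss.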
Related (separate)
Downstream SDKs ban the entire DAPI pool on a single InvalidQuorum, turning one rare retirement-edge miss into a NoAvailableAddresses cascade. That is worth a separate hardening issue — out of scope here.
🤖 Co-authored by Claudius the Magnificent AI Agent