fix(paths-through): use canonical resolved_path instead of naive prefix match - fixes wrong-node attribution (#1352) #1353
…ttribution

Two TDD tests covering issue #1352:

1. PrefixCollision_1352 (canonical resolved_path branch): 3 nodes share prefix "c0", tx has canonical resolved_path naming B. Only paths-through-B must include the tx; A and C must exclude. (Already passes thanks to #1278's Option-A canonical handling.)
2. PrefixCollision_1352_FallbackBranch (NULL resolved_path): 3 GPS-having prefix siblings, NULL resolved_path. Old fallback uses biased resolver with hopContext=[targetPK], so every paths-through-X query attributes the tx to X. Asserts ≤1 attribution.

Currently fails on the fallback test: A=1 B=1 C=1 (all three include).
…ibution - #1352

handleNodePaths fallback branch (txs with NULL persisted resolved_path) biased the hop resolver with hopContext=[lowerPK], anchoring tier-2 geo and tier-3 GPS tiers on the queried target. When multiple sibling nodes shared a hop's 1-byte prefix (e.g. 5 nodes with 'c0' on staging), every paths-through-X query for any X in the sibling set resolved that hop to X and counted the tx as a path through X. The same tx appeared in paths-through for ALL collisions: wrong-node attribution.

Fix: in the fallback branch only, accept a biased-resolver match as evidence of target membership when EITHER (a) the tx is already pre-confirmed via the resolved_path index or SQL contains-pubkey check, OR (b) the hop's candidate set is unique (single prefix candidate, no collision possible). Multi-candidate prefix hops without independent confirmation are now treated as ambiguous and excluded.

Server is read-only; this is a query-time logic change only. No DB writes, no schema changes (#1289 invariant preserved).

The canonical resolved_path Option-A branch (#1278) is unchanged: when a tx has a persisted canonical resolved_path, membership is decided solely from the persisted pubkeys (no biased re-resolution).

Tests (red-then-green): cmd/server/paths_through_collision_1352_test.go
- PrefixCollision_1352: canonical resolved_path naming B only attributes to B (was already correct via #1278)
- PrefixCollision_1352_FallbackBranch: 3 GPS siblings, NULL resolved_path → previously A=1 B=1 C=1, now ≤1 attribution

Fixes #1352
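The anchoring effect described in the commit message above can be illustrated with a small Go sketch. It is not the project's resolver: `resolveBiased`, `pathsThroughOld`, and the sample pubkeys are hypothetical, and the real tier-2/tier-3 logic is reduced here to "prefer a candidate that appears in the hop context". Under that assumption, every prefix sibling ends up claiming the same tx.

```go
package main

import (
	"fmt"
	"strings"
)

// resolveBiased is a hypothetical stand-in for a context-anchored hop resolver.
// Among the nodes whose pubkey starts with the hop prefix, it prefers one that
// appears in hopContext; otherwise it falls back to the first candidate.
func resolveBiased(hop string, hopContext []string, nodes []string) string {
	var candidates []string
	for _, pk := range nodes {
		if strings.HasPrefix(pk, hop) {
			candidates = append(candidates, pk)
		}
	}
	if len(candidates) == 0 {
		return "" // unresolvable hop
	}
	for _, c := range candidates {
		for _, ctx := range hopContext {
			if c == ctx {
				return c // the context anchors the choice
			}
		}
	}
	return candidates[0]
}

// pathsThroughOld mimics the pre-fix fallback: it resolves each hop with
// hopContext=[target] and counts the tx as soon as a hop resolves to target.
func pathsThroughOld(target string, hops []string, nodes []string) bool {
	for _, hop := range hops {
		if resolveBiased(hop, []string{target}, nodes) == target {
			return true
		}
	}
	return false
}

func main() {
	// Three made-up pubkeys sharing the 1-byte prefix "c0".
	nodes := []string{"c0aa01", "c0bb02", "c0cc03"}
	hops := []string{"c0"} // one tx with a single colliding hop
	for _, target := range nodes {
		fmt.Printf("paths-through-%s includes tx: %v\n",
			target, pathsThroughOld(target, hops, nodes))
	}
}
```

In this toy model all three queries report `true` for the same tx, which mirrors the A=1 B=1 C=1 result of the failing fallback test.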
Independent review (round 1): adversarial
Cold read against …
Must-fix
Out-of-scope
Verdict: NEEDS-WORK, must-fix count: 10.
Kent Beck Gate (round 1): TDD + test quality
Verdict: NEEDS-WORK
TDD history: PASS (structural)

TDD history check
Caveat: GH Actions only ran on the PR head SHA (commit 2 = success). No CI artifact proves commit 1 was red on an assertion. The PR body's mutation claim (revert …

The Six Questions
a. Mutation: revert …
b. Smallest test that would have caught the original bug?
c. Could a WRONG implementation pass these tests?
Nothing in this test file pins the positive fallback case. The weak …
d. What edge cases are NOT tested?
e. Test names describe behavior or implementation?
f. Setup more complex than test? API smell?

Must-fix
Out of scope (not blocking)

Requesting changes pending must-fix #1 and #2. Both are small additions to the same test file; no production-code touch needed.
-- Kent Beck Gate (round 1)
MeshCore Review (round 1)
Independent expert (MeshCore protocol engineer) cold review.
Verdict: NEEDS-WORK. The fix is protocol-correct, but the diff has improvable items.
Protocol correctness: verified ✅
Must-fix (5)
Out-of-scope
Summary

/api/nodes/{pk}/paths (paths-through-node) attributed the same transmission to every prefix-sibling when their hop bytes collided (e.g. 5 nodes with c0… on staging). Querying any of them returned the tx. This is the visible bug in #1352, where Kpa Roof Solar's view included a packet whose actual relay was C0ffee SF.

Root cause

handleNodePaths has two branches:

- Canonical branch: when a tx has a persisted resolved_path, membership is decided from the stored pubkeys. This branch is correct.
- Fallback branch: when resolved_path is NULL/missing, the code invoked pm.resolveWithContext(hop, []string{lowerPK}, graph) to re-resolve hops. The hopContext=[lowerPK] anchors the resolver on the queried target, so the tier-2 (geo-proximity) / tier-3 (GPS+observation-count) tiers preferentially pick the target. Every paths-through-X call for any X in the sibling set then resolved the colliding hop to X and counted the tx, producing wrong-node attribution across the whole sibling set.

Fix
Server-side, query-time only. No DB writes (#1289 read-only invariant preserved). No canonical-branch changes; only the fallback path is touched.

In the fallback branch, accept a biased-resolver match as evidence of target membership only when either:

- the tx is already pre-confirmed via the resolved_path index or the SQL INSTR(resolved_path, pubkey) check, or
- the hop's candidate set is unique (len(pm.m[hop]) <= 1): no collision, no bias possible.

Multi-candidate prefix hops without independent SQL/index confirmation are now treated as ambiguous and excluded from paths-through. The same rule is applied to the unresolvable-hop sub-case (when resolveHop returns nil but the prefix could match the target).

Which canonical resolved_path source is used
This PR does not introduce a new resolved_path source. It piggybacks on what's already in place:

- s.store.fetchResolvedPathForTxBest(tx) → SQLite observations.resolved_path (populated upstream by the hop-disambiguator from #1198/#1200/#1235).
- confirmedByFullKey (membership index s.store.byPathHop[lowerPK]) and confirmedBySQL (s.store.confirmResolvedPathContains → INSTR(LOWER(resolved_path), "pubkey")).

So when canonical data exists, attribution is purely persisted-path driven; when it doesn't, attribution requires either an index or SQL pubkey hit, or a unique prefix candidate. Biased resolution alone is no longer sufficient.
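The fallback acceptance rule can be sketched as a small predicate. This is a minimal illustration, not the code in routes.go: the `hopEvidence` type and the `acceptFallbackMatch` function are hypothetical, and only the flag names confirmedByFullKey and confirmedBySQL and the candidate-count threshold come from the description above. The unresolvable-hop sub-case is not modeled.

```go
package main

import "fmt"

// hopEvidence is a hypothetical summary of what the fallback branch knows
// about one hop of a tx whose persisted resolved_path is NULL.
type hopEvidence struct {
	resolvedToTarget bool // the biased resolver picked the queried target
	candidateCount   int  // number of nodes sharing the hop's prefix
}

// acceptFallbackMatch reports whether a biased-resolver match may count as
// evidence that the tx passed through the target. A match is accepted only
// with independent confirmation (index or SQL) or when the prefix has at
// most one candidate, so no collision is possible.
func acceptFallbackMatch(ev hopEvidence, confirmedByFullKey, confirmedBySQL bool) bool {
	if !ev.resolvedToTarget {
		return false
	}
	if confirmedByFullKey || confirmedBySQL {
		return true
	}
	return ev.candidateCount <= 1
}

func main() {
	collision := hopEvidence{resolvedToTarget: true, candidateCount: 3}
	unique := hopEvidence{resolvedToTarget: true, candidateCount: 1}

	fmt.Println(acceptFallbackMatch(collision, false, false)) // false: ambiguous collision
	fmt.Println(acceptFallbackMatch(collision, false, true))  // true: SQL-confirmed
	fmt.Println(acceptFallbackMatch(unique, false, false))    // true: single candidate
}
```

Keeping the rule as one pure predicate makes the mutation check easy to reason about: dropping the final `candidateCount <= 1` comparison in favor of an unconditional `true` would bring back the all-siblings attribution.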
TDD: red, then green

Two new tests in cmd/server/paths_through_collision_1352_test.go:

- TestHandleNodePaths_PrefixCollision_1352: canonical branch (already green via #1278). 3 nodes share c0, tx canonical resolved_path = [B]. Only paths-through-B includes the tx.
- TestHandleNodePaths_PrefixCollision_1352_FallbackBranch: red before the fix. 3 GPS-having c0 siblings, NULL resolved_path. Before: A=1 B=1 C=1 (wrong-node attribution on all). After: ≤1 attribution.

Mutation: reverting the len(pm.m[hop]) <= 1 guard in routes.go restores the failing red state.

Existing tests preserved:

- TestHandleNodePaths_PrefixCollisionExclusion (#929): still green.
- TestHandleNodePaths_AnchorBiasInconsistency_Issue1278 (#1278): still green.
- go test ./... on cmd/server and cmd/ingestor: green.

Acceptance criteria (from #1352)
Out of scope
Fixes #1352