
Release prep v0.7.2: relay_node tracking (PR #45) + CRC_BAD packet drop (closes #34) #46

Merged
KMX415 merged 5 commits into main from release/v0.7.2 on May 5, 2026


@KMX415 KMX415 commented May 5, 2026

Summary

Prep branch for the v0.7.2 release. Bundles two changes that have been ready to ship since May 1:

  1. Relay-node debugging on the dashboard (originally PR #45, "Added information for debugging mesh connects", accidentally closed by the contributor on May 4)
  2. Drop CRC_BAD packets at source (the fix/crc-bad-drop branch, never opened as a PR)

Version bump and CHANGELOG entry are deliberately not in this branch yet — they will land as a final commit once hardware validation on the test RAK is green. That keeps the fleet-wide update indicator (__version__ on main) from firing until we've confirmed the bundle runs cleanly on real silicon.

What's in this branch

89b3dbe  KMX415          Drop CRC_BAD packets at source instead of passing them to decoder
1c19534  KMX415          Add relay_node tests and small review polish
2c77d91  kendelmccarley  Add relay_node tracking and packet-focus map line  (Co-Authored-By: Claude Sonnet 4.6)
1627b3f  kendelmccarley  Fix: read hop_limit from TransmitConfig instead of hardcoding 3  (Co-Authored-By: Claude Sonnet 4.6)
─── 1779b66  Fix coding rate on Short and Medium presets, add LongTurbo (#44)

Co-authored-by: trailers preserved per the May 4 attribution policy ("contributors choose their own attribution").

1. Relay-node tracking (originally #45)

Surfaces the Meshtastic header relay_node byte (the lowest byte of the last relay node's ID) through the decoder → Packet model → SQLite schema (with ALTER TABLE migration for existing installs) → WebSocket payload → dashboard packet feed. Clicking a row in the packet feed now draws a line on the map between the source and the relay, and the source cell shows !src ↝ !relay with full short-ID resolution when the relay is in the local node registry.
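
As a rough illustration of the first and last steps of that pipeline, the sketch below reads the relay byte out of a raw header and tries to resolve it against known node IDs. The offset (byte 15) comes from the commit message in this branch; the 16-byte header length, the function names, and the "exactly one match" resolution rule are assumptions made for the example, not the project's actual code.

```python
from typing import Iterable, Optional

HEADER_LEN = 16          # assumed fixed header size
RELAY_NODE_OFFSET = 15   # "header byte 15" per the commit message


def parse_relay_node(raw: bytes) -> int:
    """Return the relay_node byte; 0 is treated as 'heard directly'."""
    if len(raw) < HEADER_LEN:
        raise ValueError("packet shorter than the assumed 16-byte header")
    return raw[RELAY_NODE_OFFSET]


def resolve_relay(relay_node: int, known_node_ids: Iterable[int]) -> Optional[int]:
    """Hypothetical resolver: map the low byte back to a full node ID.

    Only resolves when exactly one known node ends in that byte, since
    a single byte cannot disambiguate between several candidates.
    """
    if relay_node == 0:
        return None
    matches = [nid for nid in known_node_ids if (nid & 0xFF) == relay_node]
    return matches[0] if len(matches) == 1 else None
```

Because only one byte of the relay's ID travels in the header, any resolver of this kind has to decide what to do when two registry entries share a low byte; returning nothing in that case is one conservative option.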

Real-world utility: kendel used this to trace a hop chain back to a rooftop node and discovered its ERP was bad.

Bonus fix in the same series: text-message and NodeInfo TX paths now read hop_limit from TransmitConfig instead of hardcoding 3 in two places. Behavior unchanged for anyone running the default (still 3); fixes the silent "I set hop_limit in my yaml and it didn't take" bug.
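
A minimal sketch of the fallback pattern described above, assuming a simplified TransmitConfig with an optional hop_limit field. DEFAULT_HOP_LIMIT is the constant name introduced in this branch; the dataclass shape and the resolve_hop_limit helper are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

DEFAULT_HOP_LIMIT = 3  # constant name from the tx_service.py rename


@dataclass
class TransmitConfig:
    # Hypothetical stand-in; the real config class has more fields.
    hop_limit: Optional[int] = None


def resolve_hop_limit(config: Optional[TransmitConfig]) -> int:
    """Prefer the configured hop_limit, else fall back to the default."""
    if config is None or config.hop_limit is None:
        return DEFAULT_HOP_LIMIT
    return config.hop_limit
```

The explicit None checks matter here: writing `config.hop_limit or DEFAULT_HOP_LIMIT` would silently turn a configured hop_limit of 0 back into 3.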

2. CRC_BAD packets dropped at the HAL boundary

SX1302Wrapper.receive() was logging the warning for STAT_CRC_BAD packets but still appending them to the returned packet list. The crc_ok=False field on ConcentratorPacket has never been read by any downstream code (concentrator_source, packet_router, decoders), so RF-corrupted bytes flowed all the way into MeshtasticDecoder.decode().

Three observable downstream symptoms produced by this:

  1. Phantom node IDs in the local SQLite node table and the cloud DynamoDB node catalog. A bit flip in the source-ID field creates a "new node" that shares all-but-one bit with a real source ID.
  2. False ENCRYPTED packets. A corrupted channel-hash byte stops matching the LongFast hash and the packet is filed under "encrypted on a private channel we lack the key for."
  3. Garbled-but-readable text. AES-CTR XORs corrupted ciphertext with the keystream and produces mostly-correct plaintext with a few mangled characters. This is the "3 hopr frOO\"Nc >> Mesa" symptom reported in issue #34 ("Parsing Error on Some messages").
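
The third symptom follows from how any XOR stream cipher behaves: a flipped ciphertext bit flips exactly the same plaintext bit and nothing else. The sketch below demonstrates this with a toy SHA-256 counter keystream standing in for AES-CTR (an assumption made to keep the example stdlib-only; the key and helper names are hypothetical).

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy counter-mode keystream from SHA-256; a stand-in for AES-CTR."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(out[:length])


def xor_bytes(data: bytes, stream: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(a ^ b for a, b in zip(data, stream))


plaintext = b"3 hops from Costa Mesa"
ks = keystream(b"demo-key", len(plaintext))
ciphertext = bytearray(xor_bytes(plaintext, ks))

# Simulate RF corruption: flip the lowest bit of the byte at index 5 ("s").
ciphertext[5] ^= 0x01

recovered = xor_bytes(bytes(ciphertext), ks)
print(recovered)  # b'3 hopr from Costa Mesa'
```

With no integrity check between the radio and the decoder, nothing downstream can tell this output apart from a message the sender actually typed.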

This bug predates v0.7.0 (the pre-source-publication HAL didn't even define STAT_CRC_BAD), but v0.7.0's RX diagnostic logging made it newly visible. The append-anyway behavior has been live since the initial RAK2287 wiring commit.

Fix is one line: continue after the CRC_BAD log. The diagnostic WARNING and crc_bad_count counter both keep working unchanged. STAT_CRC_OK and STAT_NO_CRC paths are deliberately untouched.
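
The shape of the fix can be sketched as a filter loop like the one below. This is not the real SX1302Wrapper.receive(); RawPacket and ReceiveFilter are hypothetical stand-ins, and the numeric status values are assumptions chosen for the example.

```python
import logging
from dataclasses import dataclass
from typing import List

logger = logging.getLogger("sx1302_sketch")

# Status names follow the PR text; the numeric values are assumptions.
STAT_NO_CRC = 0x01
STAT_CRC_OK = 0x10
STAT_CRC_BAD = 0x11


@dataclass
class RawPacket:
    status: int
    payload: bytes


class ReceiveFilter:
    """Hypothetical stand-in for the HAL-boundary filtering step."""

    def __init__(self) -> None:
        self.crc_bad_count = 0

    def filter(self, raw_packets: List[RawPacket]) -> List[RawPacket]:
        accepted = []
        for pkt in raw_packets:
            if pkt.status == STAT_CRC_BAD:
                # Diagnostics stay intact: count and log, then drop.
                self.crc_bad_count += 1
                logger.warning("CRC_BAD packet dropped (%d bytes)", len(pkt.payload))
                continue  # the one-line fix: never append a CRC_BAD packet
            # CRC_OK and NO_CRC both pass through unchanged.
            accepted.append(pkt)
        return accepted
```

Keeping the counter and the warning ahead of the `continue` means the drop is still observable in logs and metrics after the fix.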

Closes #34.

Cloud-side evidence (informational)

A diagnostic audit on meshradar.io this morning found the cloud active_nodes_24h = 18,374 count is dominated by phantom IDs that match the CRC_BAD-passthrough fingerprint:

  • 89.6% of "active" nodes were heard by exactly one Meshpoint (real RF mesh nodes are usually heard by more than one Meshpoint, because antenna coverage overlaps)
  • 76.7% have no long_name AND no position (decryption fails on corrupted ciphertext → ENCRYPTED with no decoded payload)
  • 24.2% at hop_count ≥ 4 (the hop_count is 3 bits in the flags byte — a single bit flip promotes hop=0 → hop=4)
  • Top contributors are the busiest-RF Meshpoints in the fleet
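
To make the hop_count bullet concrete, here is a small sketch of how one flipped bit can turn a direct packet into a 4-hop packet. It assumes the flags byte carries hop_limit in bits 0-2 and hop_start in bits 5-7, with hops taken computed as hop_start minus hop_limit; the masks and the helper name are assumptions for illustration, and the project's decoder may derive the value differently.

```python
HOP_LIMIT_MASK = 0x07    # bits 0-2 (assumed layout)
HOP_START_MASK = 0xE0    # bits 5-7 (assumed layout)
HOP_START_SHIFT = 5


def hops_taken(flags: int) -> int:
    """Hops already travelled, derived from the two 3-bit fields."""
    hop_limit = flags & HOP_LIMIT_MASK
    hop_start = (flags & HOP_START_MASK) >> HOP_START_SHIFT
    return hop_start - hop_limit


# A packet heard directly: hop_start=3, hop_limit=3, so 0 hops taken.
clean_flags = (3 << HOP_START_SHIFT) | 3

# One corrupted bit: the top bit of hop_start flips, so hop_start becomes 7.
corrupted_flags = clean_flags ^ 0x80

print(hops_taken(clean_flags), hops_taken(corrupted_flags))  # 0 4
```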

After this fix lands and the fleet rolls forward, expect active_nodes_24h to drop sharply (rough estimate: from 18k toward 5–8k) over the following 24–48h as phantom IDs age out of the 24h window. We'll watch the trend post-deploy and decide whether to run a one-shot purge of pre-fix phantoms in DynamoDB or just let the 30-day NODE_TTL clean them up naturally.

Test plan

  • pytest tests/ → 297 passed, 1 pre-existing warning, 2.60s
  • ruff check . → all checks passed
  • Hardware validation on RAK V2 .141 from this branch:
    • git checkout release/v0.7.2, sudo systemctl restart meshpoint, watch logs for clean startup
    • MESHPOINT_DEBUG_RX=1 for 5 minutes — confirm STAT_CRC_BAD lines no longer carry an associated phantom-node DB write
    • Send Meshtastic DM to/from .141 and validate relay_node populates correctly in the dashboard packet feed
    • Click a packet row, confirm the map line draws between source and relay
    • Check the meshpoint logs for any new tracebacks
  • After hardware-green: append a single Release v0.7.2 commit bumping __version__, default.yaml firmware_version, README badge, and CHANGELOG. Push, watch CI, squash-merge.

Authoring + attribution

Kendel's two commits on this branch (1627b3f and 2c77d91) carry his GitHub noreply email so his profile links correctly in the contributors graph + git blame, identical to the PR #38 squash on main. The Co-Authored-By: Claude Sonnet 4.6 trailers from his original commits are preserved.

PR #45 was accidentally closed by the contributor on May 4 with a misdirected force-push that also wiped the work from his fork. Local pr-45-kendelmccarley retained everything, and these commits are cherry-picked from there. A linkback comment will be posted on the closed PR pointing here so the original contribution thread stays connected.

kendelmccarley and others added 4 commits May 5, 2026 12:01

1627b3f  Fix: read hop_limit from TransmitConfig instead of hardcoding 3

Both text message and NodeInfo TX were ignoring the configured hop_limit
and always sending with hop_limit=3/hop_start=3. Packets now carry the
value from TransmitConfig, falling back to the constant if no config is
present.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

2c77d91  Add relay_node tracking and packet-focus map line

Surfaces the Meshtastic header relay_node byte (last byte of the
last relay node's ID) through the full stack so the dashboard can
show which node retransmitted each packet.

- meshtastic_decoder: parse header byte 15 as relay_node
- models/packet: add relay_node field and include in to_dict()
- storage/database: add relay_node column + migration
- storage/packet_repository: insert and restore relay_node
- simple_packet_feed: show relay as "!src ↝ !relay" in source cell;
  resolve full short-ID from node registry when known
- node_map: add drawFocusLine / clearFocusLine for persistent lines
- app.js: wire packet-row focus to map line draw/clear; keep node
  registry in sync with 15-second refresh

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

1c19534  Add relay_node tests and small review polish

Bundled additions on top of the relay_node tracking work in PR #45:

- tests/test_relay_node_header.py: locks the Meshtastic header byte 15
  read into Packet.relay_node. Covers the static parser in isolation
  (extracts nonzero byte, treats zero as direct, full-byte-range
  round-trip, no bleed from byte 14) and the full decode() path
  (Packet.relay_node populated, defaults to 0, included in to_dict()
  for the WebSocket payload).
- tests/test_database_migration.py: adds TestRelayNodeMigration with
  four cases for the packets.relay_node ALTER TABLE migration:
  fresh-install schema already includes the column, pre-PR-45
  databases get it added on next connect(), the migration is
  idempotent across restarts (no duplicate column, no data loss),
  and rows that predated the column read back with relay_node=0 by
  default. Module docstring updated to cover both migrations.
- frontend/css/dashboard.css: .relay-hop now uses var(--accent-amber)
  instead of var(--accent-yellow, #f0c040). The dashboard does not
  define an --accent-yellow token, so the hex fallback would have
  fired everywhere; --accent-amber (#f59e0b) is the canonical token
  already used by the position packet type, mid-band RSSI, and other
  amber accents on the page.
- src/transmit/tx_service.py: renamed NODEINFO_HOP_LIMIT to
  DEFAULT_HOP_LIMIT and used it in both fallback expressions, so the
  text-message fallback no longer relies on a bare literal 3 while
  the NodeInfo fallback uses a named constant. The constant has no
  importers outside this file. Behavior unchanged (still 3).
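
The four migration cases listed above describe an idempotent ALTER TABLE. One common way to get that behavior in SQLite is to check PRAGMA table_info before altering, sketched here with the stdlib sqlite3 module. The function name and the exact column definition are assumptions; the real code in storage/database may differ.

```python
import sqlite3


def ensure_relay_node_column(conn: sqlite3.Connection) -> bool:
    """Add packets.relay_node if missing; return True when it was added."""
    # PRAGMA table_info yields one row per column; index 1 is the name.
    columns = {row[1] for row in conn.execute("PRAGMA table_info(packets)")}
    if "relay_node" in columns:
        return False  # fresh install or already migrated: nothing to do
    # DEFAULT 0 makes rows that predate the column read back as 0.
    conn.execute(
        "ALTER TABLE packets ADD COLUMN relay_node INTEGER NOT NULL DEFAULT 0"
    )
    conn.commit()
    return True


# Usage against a hypothetical pre-migration schema:
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packets (id INTEGER PRIMARY KEY, source TEXT)")
conn.execute("INSERT INTO packets (source) VALUES ('!4d8b18a1')")
first_run = ensure_relay_node_column(conn)    # adds the column
second_run = ensure_relay_node_column(conn)   # no-op on restart
old_row = conn.execute("SELECT relay_node FROM packets").fetchone()
```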

All 280 tests pass. ruff clean. No edge-runtime behavior changes.

89b3dbe  Drop CRC_BAD packets at source instead of passing them to decoder

SX1302Wrapper.receive() was logging the WARNING for STAT_CRC_BAD
packets but still appending them to the returned packet list. The
crc_ok=False field on ConcentratorPacket has never been read by any
downstream code (concentrator_source, packet_router, decoders), so
RF-corrupted bytes flowed all the way into MeshtasticDecoder.decode().

Three observable downstream symptoms produced by this:

1. Phantom node IDs in the local SQLite node table and the cloud
   DynamoDB node catalog. A bit flip in the source-ID field creates
   a "new node" that shares all-but-one bit with a real source ID.
   Visible in production logs as clusters like 4d8b18a1 / 4d8b98a1
   / 7d8b18a9 / 7d8b98a9 (single-bit flips of one or two real IDs).
2. False ENCRYPTED packets. A corrupted channel-hash byte stops
   matching 0x08 (LongFast) and the packet is filed under "encrypted
   on a private channel we lack the key for." Real-world fingerprint:
   uniform-random distribution of channel hashes across hex space,
   inconsistent with actual private-channel popularity.
3. Garbled-but-readable text. AES-CTR XORs corrupted ciphertext with
   the keystream and produces mostly-correct plaintext with a few
   mangled characters. Reported as issue #34 ("3 hopr frOO\"Nc >> Mesa"
   instead of "3 hops from Costa Mesa").
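
The phantom-ID fingerprint in symptom 1 can be checked mechanically by counting differing bits between a suspect ID and known-good IDs. The sketch below uses the IDs quoted above; the helper names and the one-bit threshold are assumptions for illustration, not part of the project.

```python
from typing import Iterable, List


def bit_distance(a: int, b: int) -> int:
    """Number of differing bits between two 32-bit node IDs."""
    return bin((a ^ b) & 0xFFFFFFFF).count("1")


def near_duplicates(candidate: int, known_ids: Iterable[int], max_bits: int = 1) -> List[int]:
    """Known IDs within max_bits bit flips of the candidate (excluding itself)."""
    return [k for k in known_ids if 0 < bit_distance(candidate, k) <= max_bits]


# IDs taken from the production-log cluster quoted above.
print(bit_distance(0x4D8B18A1, 0x4D8B98A1))  # 1
print(bit_distance(0x7D8B18A9, 0x7D8B98A9))  # 1
```

A check like this only flags candidates for review; two legitimate nodes can also differ by a single bit, so it is a heuristic rather than proof of corruption.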

Confirmed via private-repo git log that this predates v0.7.0: the
pre-v0.7.0 source did not even define STAT_CRC_BAD, so no filtering
was possible. v0.7.0 added the WARNING + crc_bad_count counter (RX
diagnostic logging from the deferred core-module bundle) which made
the symptoms newly visible. The append-anyway behavior has been live
since the initial RAK2287 wiring commit.

Fix is one line: continue after the CRC_BAD log. The diagnostic
WARNING and crc_bad_count counter both keep working unchanged.
STAT_CRC_OK and STAT_NO_CRC paths are deliberately untouched
(NO_CRC packets pass through as before; if it turns out they should
also be dropped, that is a follow-up decision with its own evidence).

Tests: tests/test_sx1302_wrapper.py covers the filter contract
(CRC_BAD dropped, CRC_OK passed, NO_CRC still passed, mixed batch
returns only good packets, counter persists across calls, size==0
fast-path unchanged, not-started returns empty, lgw_receive error
returns empty, request size matches LGW_PKT_MAX). 9 new tests, all
passing alongside the existing 269 in the suite. Ruff clean.

Closes #34.

Release v0.7.2

- src/version.py: 0.7.1 -> 0.7.2
- config/default.yaml: device.firmware_version 0.7.1 -> 0.7.2
- README.md: version badge bump
- docs/CHANGELOG.md: v0.7.2 entry covering CRC_BAD drop fix (closes #34),
  relay_node tracking, and the hop_limit TransmitConfig fix

Hardware-validated on RAK V2 .141:
- 14 historical phantom IDs in local DB match the bit-flip fingerprint of
  legitimate neighbors, all (NO NAME), all pre-fix
- Zero phantoms registered since checkout (12/12 named arrivals over 2.5h)
- One CRC_BAD WARNING fired during soak (sf7 bw125 rssi=-102 snr=-12.5),
  no decoder follow-on - confirms the new continue statement engages on
  real RF traffic

Tests: 297 passed, ruff clean.

KMX415 merged commit c5c0abf into main on May 5, 2026 (1 check passed)
KMX415 deleted the release/v0.7.2 branch on May 5, 2026 20:09
