diff --git a/ALERTS.md b/ALERTS.md new file mode 100644 index 0000000000..92b0974a1c --- /dev/null +++ b/ALERTS.md @@ -0,0 +1,115 @@ +# Fault Alerts (Group Channel) + +This document describes MeshCore repeater fault alerts, including configuration, CLI commands, and operational behavior. + +The repeater can broadcast a one-line fault notification on a configured group channel when WiFi or any active MQTT slot has been disconnected longer than a configurable threshold. + +The alert is sent over **LoRa** as a `PAYLOAD_TYPE_GRP_TXT` flood packet on the configured channel (with sender = device name) - *not* over MQTT. This is intentional: the MQTT path is what's broken, so the only working delivery is the mesh itself. Anyone in radio range subscribed to the same channel/hashtag in their companion app will see the alert inline with normal channel chat. + +> **A small list of community channels is intentionally NOT supported.** Fault alerts are operator-infrastructure noise - broadcasting them on shared community channels would spam every node in the area (and on `#test` / `#bot` would amplify via well-known auto-responders). The currently banned destinations are: +> +> - The well-known **Public** group PSK (`izOH6cXN6mrJ5e26oRXNcg==`) +> - **`#test`** (`sha256("#test")[0..15]`) +> - **`#bot`** (`sha256("#bot")[0..15]`) +> +> The list lives in `BANNED_ALERT_CHANNELS[]` in [src/helpers/AlertReporter.cpp](src/helpers/AlertReporter.cpp); adding a new entry is one line (label + 32 hex chars). The matcher runs at both the CLI validation step (`set alert.psk`, `set alert.hashtag`) and the alert-send path, so a saved-config bypass is still refused at runtime. You must point alerts at a **private PSK** (`set alert.psk`) or a non-banned **hashtag channel** (`set alert.hashtag`) before alerts can fire. 
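The banned-channel check described above comes down to comparing a 32-hex-char key against a small table. The following is a minimal sketch, not the firmware's code: `bannedLabelFor` and `kBanned` are hypothetical names used only for illustration, and the table rows are copied from the values quoted in this PR. The firmware's own matcher, `alertReporterBannedChannelMatch`, works on the decoded 16 bytes instead.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Illustrative copy of the banned-channel rows quoted in this PR:
// a label plus the 16-byte channel secret as 32 lowercase hex chars.
struct BannedAlertChannel {
  const char* label;
  const char* secret_hex;
};

static const BannedAlertChannel kBanned[] = {
  { "PUBLIC", "8b3387e9c5cdea6ac9e5edbaa115cd72" },
  { "#test",  "9cd8fcf22a47333b591d96a2b848b73f" },
  { "#bot",   "eb50a1bcb3e4e5d7bf69a57c9dada211" },
};

// Hypothetical helper (assumption, not a firmware function): returns the
// label of the banned channel that psk_hex matches, or nullptr when the key
// is allowed or is not a well-formed 32-char hex string.
const char* bannedLabelFor(const std::string& psk_hex) {
  if (psk_hex.size() != 32) return nullptr;
  std::string lower;
  for (unsigned char c : psk_hex) {
    if (!std::isxdigit(c)) return nullptr;
    lower.push_back(static_cast<char>(std::tolower(c)));
  }
  for (const BannedAlertChannel& row : kBanned) {
    if (lower == row.secret_hex) return row.label;
  }
  return nullptr;
}
```

Lower-casing the input before the comparison means a key pasted in upper case is still caught; a caller would refuse the setting whenever the helper returns a non-null label.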
+ +## Scope and routing + +Alert floods ride the **repeater's default scope** by default (the same TransportKey used for adverts and channel broadcasts - set via `region default ...`). Operators can override on a per-alert-feature basis with `set alert.region `: + +- If `alert.region` is set and the name resolves via `RegionMap`, that region's TransportKey is used. +- If `alert.region` is unset, or the name doesn't resolve, the repeater's `default_scope` is used. +- If both are null, the alert is sent unscoped (matches the pre-scoped firmware's behavior). + +`alert.region` is stored as-is - it does **not** create the region. Use `region put ` first if it doesn't exist. + +## What triggers an alert + +- **WiFi**: continuously down for at least `alert.wifi` minutes (default 30) +- **MQTT slot N**: enabled, has connected at least once since boot, and has been disconnected for at least `alert.mqtt` minutes (default 240, i.e. 4 h) + +A "recovered" message is sent once when the underlying connection comes back. After firing, a fault is rate-limited by `alert.interval` (default 60 minutes) before it can re-fire - this prevents flapping links from spamming the channel. + +## Defaults + +| Setting | Default | Notes | +|---------|---------|-------| +| `alert` | `off` | Master enable for automatic fault alerts | +| `alert.psk` | *(unset)* | Private channel secret as **32 hex chars** (16-byte channel key) - the same format the mobile app's "Share Channel" emits, and what every other secret-shaped CLI command (e.g. `prv.key`) uses. | +| `alert.hashtag` | *(unset)* | Informational only; set via `set alert.hashtag` to pre-derive `alert.psk` from `sha256("#name")[0..15]`. Cleared when `alert.psk` is set directly. | +| `alert.region` | *(unset)* | Optional region name; overrides the repeater's `default_scope` for alert sends only. Empty = use `default_scope`. Looked up lazily via `RegionMap`; unknown names silently fall back to `default_scope`. 
| +| `alert.wifi` | `30` (min) | 0 disables WiFi alerts | +| `alert.mqtt` | `240` (min) | 0 disables MQTT alerts | +| `alert.interval` | `60` (min) | Minutes between repeat alerts of the same fault. **Hard floor of 60 min** so a flapping link can't spam the mesh; the CLI rejects lower values and AlertReporter clamps stale prefs at runtime. | + +> `alert.psk` is unset on a fresh flash. **Alerts cannot fire and `alert test` will refuse to send until you configure either `alert.psk` directly or `alert.hashtag` (which derives one).** The sender shown on outgoing alert messages is always the node name (`set name ...`); there is no separate `alert.name`. + +## CLI + +Get: +- `get alert` - master on/off +- `get alert.psk` - the active 32-hex-char PSK (or `(unset)`) (**serial console only**) +- `get alert.hashtag` - the originating hashtag (or `(unset)`, e.g. after `set alert.psk` overrides the hashtag-derived key) +- `get alert.region` - alert-only scope override (or `(unset, using default scope)`) +- `get alert.wifi` / `get alert.mqtt` / `get alert.interval` + +Set: +- `set alert on` / `set alert off` +- `set alert.psk ` - 32 hex chars (16-byte channel secret); rejects banned channels (Public, `#test`, `#bot`). Paste the mobile app's "Share Channel" output as-is. Clears `alert.hashtag` since the new key is operator-supplied. +- `set alert.psk` (no argument) - clears both `alert.psk` and `alert.hashtag` +- `set alert.hashtag ` - derives the 16-byte key from `sha256("#name")` *once*, stores it as `alert.psk`, and remembers the hashtag for `get alert.hashtag`. `#` prefix is added if omitted (so `alerts` and `#alerts` are equivalent). Refuses banned hashtag names. 
+- `set alert.hashtag` (no argument) - clears both `alert.psk` and `alert.hashtag` +- `set alert.region ` - alert-only scope override (no region-map mutation; unknown names silently fall back to `default_scope`) +- `set alert.region` (no argument) - clear override, use `default_scope` +- `set alert.wifi ` (0-1440; 0 = disabled) +- `set alert.mqtt ` (0-10080; 0 = disabled) +- `set alert.interval ` (60-10080; 60-minute floor to protect mesh airtime) + +Action: +- `alert test` - send a one-off `[test] alert channel ok` immediately on the configured channel; ignores `alert on/off` so operators can verify the channel before enabling fault firing. Returns an error if no channel is configured. +- `alert test ` - send a custom test message: `[test] `. + +## Example: dedicated hashtag channel (recommended for operator groups) + +```bash +set alert.hashtag ops-alerts # stored as "#ops-alerts"; key = sha256("#ops-alerts")[0..15] +set alert.wifi 10 # tighter for ops monitoring +set alert.mqtt 60 +set alert on +alert test +``` + +Anyone running a companion app and subscribed to the `#ops-alerts` hashtag channel will see the alerts inline. + +## Example: dedicated alerts channel with a private PSK + +Generate a 16-byte random PSK as 32 hex chars (`openssl rand -hex 16`), or use the companion app's "Add channel" feature and copy the "Share Channel" output. Then: + +```bash +set alert.psk <32_hex_chars> # 16-byte channel secret; mobile "Share Channel" pastes in directly +set alert.wifi 10 +set alert.mqtt 60 +set alert on +alert test +``` + +Subscribers running a MeshCore companion app should add a channel with the same PSK; alerts will appear in that channel's chat view. (Pick any local name for it - the sender of incoming alert messages is the repeater's node name.) 
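How the thresholds configured in the examples above interact with the 60-minute re-fire floor can be sketched as a two-state machine per fault. This is a simplified, host-side illustration rather than the firmware's code: `stepFault`, `formatAge` as written here, the `Fault` fields, and the explicit `has_fired` flag are assumptions of the sketch, and it returns the message text (without the node-name prefix) instead of sending a packet.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

enum class FaultState { Ok, Firing };

// Per-fault bookkeeping (field names are illustrative).
struct Fault {
  FaultState state = FaultState::Ok;
  bool has_fired = false;          // sketch-only flag: has a "down" alert ever been sent?
  uint64_t fired_at_ms = 0;        // time of the last "down" alert
  uint64_t outage_started_ms = 0;  // remembered so the recovery message can quote a duration
};

// Formats a duration as "47m" or "1h3m".
std::string formatAge(uint64_t age_ms) {
  unsigned long long secs = age_ms / 1000ULL;
  unsigned long long h = secs / 3600ULL;
  unsigned long long m = (secs % 3600ULL) / 60ULL;
  char buf[32];
  if (h > 0) std::snprintf(buf, sizeof(buf), "%lluh%llum", h, m);
  else       std::snprintf(buf, sizeof(buf), "%llum", m);
  return buf;
}

// One evaluation tick. Returns the text to send, or "" when nothing fires.
std::string stepFault(Fault& f, bool down, uint64_t outage_start_ms, uint64_t now_ms,
                      uint32_t threshold_min, uint32_t interval_min) {
  if (threshold_min == 0) {           // 0 disables this alert: silently re-arm
    f.state = FaultState::Ok;
    return "";
  }
  if (interval_min < 60) interval_min = 60;  // 60-minute floor
  const uint64_t thresh_ms = uint64_t(threshold_min) * 60000ULL;
  const uint64_t interval_ms = uint64_t(interval_min) * 60000ULL;

  if (f.state == FaultState::Ok) {
    if (!down) return "";
    const uint64_t down_ms = now_ms - outage_start_ms;
    const bool rate_ok = !f.has_fired || (now_ms - f.fired_at_ms) >= interval_ms;
    if (down_ms >= thresh_ms && rate_ok) {
      f.state = FaultState::Firing;
      f.has_fired = true;
      f.fired_at_ms = now_ms;
      f.outage_started_ms = outage_start_ms;
      return "down " + formatAge(down_ms);
    }
    return "";
  }
  // Firing: stay quiet until the link comes back, then report once.
  if (!down) {
    f.state = FaultState::Ok;
    return "recovered after " + formatAge(now_ms - f.outage_started_ms);
  }
  return "";
}
```

With `alert.wifi` set to 10, a link that drops for ten minutes yields one "down" message, one "recovered" message when it returns, and no second "down" message until at least 60 minutes after the first, even if a lower interval is passed in.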
+ +## Sample messages + +``` +MyObserver: WiFi down 47m (reason 201) +MyObserver: WiFi recovered after 1h3m +MyObserver: MQTT slot 1 (analyzer-us) down 4h12m +MyObserver: MQTT slot 1 (analyzer-us) recovered after 4h45m +``` + +## Notes + +- A reboot during an outage resets the timer; the alert won't double-fire because `millis()` starts at 0 at boot. The fault must persist `alert.wifi` / `alert.mqtt` minutes from boot. +- Fault state is stored in RAM only - no persistence across reboots. +- The MQTT-slot watcher uses a separate per-slot `current_outage_started_ms` field that is reset on each reconnect, distinct from the `first_disconnect_time` shown in `mqttN.diag` (which remains a "first disconnect since boot" counter for diagnostics). +- WiFi-down alerts can only be delivered if the LoRa radio is up. There is no fallback path. +- Banned channels (Public, `#test`, `#bot`) are **rejected** at both `set alert.psk` / `set alert.hashtag` and at the alert-send path, so even if you somehow set one via a saved config file, the firmware will silently refuse to broadcast on it. To add another banned channel, append a row to `BANNED_ALERT_CHANNELS[]` in [src/helpers/AlertReporter.cpp](src/helpers/AlertReporter.cpp); the format is `{ "label", "32-lowercase-hex-chars" }` (compute as `printf '#name' | openssl dgst -sha256 -r | cut -c1-32`; the `-r` flag puts the hex digest first, without the `(stdin)= ` prefix that would otherwise be cut instead). +- Alerts are sent via `sendFlood` with the resolved TransportKey codes attached, so they appear on the configured scope just like other broadcast traffic. Operators monitoring a specific region need to be subscribed to that region's scope to hear alerts. diff --git a/MQTT_IMPLEMENTATION.md b/MQTT_IMPLEMENTATION.md index c562179df8..d08363d9cb 100644 --- a/MQTT_IMPLEMENTATION.md +++ b/MQTT_IMPLEMENTATION.md @@ -682,13 +682,8 @@ set timezone UTC-5 # UTC offset Observer nodes include an optional SNMP v2c agent that exposes radio stats, MQTT connectivity, memory usage, and network information to standard monitoring tools. 
See [MQTT_SNMP.md](MQTT_SNMP.md) for setup and OID reference. -## Dependencies - -- **PsychicMqttClient**: MQTT client library (supports WSS and direct MQTT) -- **ArduinoJson**: JSON message formatting -- **NTPClient**: Network time protocol client -- **Timezone**: Timezone conversion library (JChristensen/Timezone) -- **WiFi**: ESP32 WiFi functionality -- **Ed25519**: Cryptographic library for JWT token signing -- **JWTHelper**: Custom JWT token generation for device authentication -- **SNMP_Agent**: Optional SNMPv2c agent (0neblock/SNMP_Agent, observer builds only) + +## Fault Alerts + +Fault alerts broadcast LoRa group-channel notifications when WiFi or configured MQTT links stay down past configured thresholds, with optional recovery notices and rate limiting to avoid spam. +For configuration, CLI commands, examples, and operational notes, see [ALERTS.md](ALERTS.md). diff --git a/examples/simple_repeater/MyMesh.cpp b/examples/simple_repeater/MyMesh.cpp index ecad3cd67f..0153ed9edc 100644 --- a/examples/simple_repeater/MyMesh.cpp +++ b/examples/simple_repeater/MyMesh.cpp @@ -947,6 +947,20 @@ MyMesh::MyMesh(mesh::MainBoard &board, mesh::Radio &radio, mesh::MillisecondCloc #endif _prefs.radio_watchdog_minutes = 5; // 5 minutes default + // Alert channel defaults — disabled by default, and the channel is left + // unconfigured so a freshly-flashed observer never broadcasts on the + // well-known Public hashtag. Operators must explicitly pick a private + // key (`set alert.psk`) or a hashtag (`set alert.hashtag`) before alerts + // can fire. The sender prefix on outgoing alert messages is always the + // node name (`set name ...`), so there's no separate `alert.name`. 
+ _prefs.alert_enabled = 0; + _prefs.alert_psk_hex[0] = '\0'; + _prefs.alert_hashtag[0] = '\0'; + _prefs.alert_region[0] = '\0'; // empty = use default_scope + _prefs.alert_wifi_minutes = 30; // 30 minutes + _prefs.alert_mqtt_minutes = 240; // 4 hours + _prefs.alert_min_interval_min = 60; // re-arm window: 1 hour + // bridge defaults _prefs.bridge_enabled = 1; // enabled _prefs.bridge_delay = 500; // milliseconds @@ -1074,6 +1088,15 @@ void MyMesh::begin(FILESYSTEM *fs) { } #endif + // Wire fault-alert reporter. begin() is safe regardless of bridge state. + // Passing `this` as the callbacks lets the reporter resolve a TransportKey + // scope (alert.region override, falling back to default_scope) so alert + // floods ride the same scope as adverts/channel messages. + _alerter.begin(&_prefs, this, this); +#if defined(WITH_MQTT_BRIDGE) + _alerter.setBridge(bridge); +#endif + radio_driver.setParams(_prefs.freq, _prefs.bw, _prefs.sf, _prefs.cr); radio_driver.setTxPower(_prefs.tx_power_dbm); @@ -1100,6 +1123,24 @@ void MyMesh::sendFloodScoped(const TransportKey& scope, mesh::Packet* pkt, uint3 } } +bool MyMesh::resolveAlertScope(TransportKey& dest) { + // Prefer an explicit alert.region override; look it up lazily via + // RegionMap so the operator can name a region that doesn't exist yet + // without polluting region_map state — we just silently fall through + // to default_scope on miss. 
+ if (_prefs.alert_region[0]) { + auto r = region_map.findByNamePrefix(_prefs.alert_region); + if (r && region_map.getTransportKeysFor(*r, &dest, 1) > 0 && !dest.isNull()) { + return true; + } + } + if (!default_scope.isNull()) { + dest = default_scope; + return true; + } + return false; +} + void MyMesh::applyTempRadioParams(float freq, float bw, uint8_t sf, uint8_t cr, int timeout_mins) { set_radio_at = futureMillis(2000); // give CLI reply some time to be sent back, before applying temp radio params pending_freq = freq; @@ -1425,6 +1466,8 @@ void MyMesh::loop() { uptime_millis += now - last_millis; last_millis = now; + _alerter.onLoop(now); + #ifdef WITH_SNMP // Push radio stats to SNMP agent every 2 seconds if (_snmp_agent.isRunning()) { diff --git a/examples/simple_repeater/MyMesh.h b/examples/simple_repeater/MyMesh.h index 4d3e6d4e36..57c7a877b9 100644 --- a/examples/simple_repeater/MyMesh.h +++ b/examples/simple_repeater/MyMesh.h @@ -34,6 +34,7 @@ #endif #include +#include #include #include #include @@ -130,6 +131,7 @@ class MyMesh : public mesh::Mesh, public CommonCLICallbacks { #ifdef WITH_SNMP MeshSNMPAgent _snmp_agent; #endif + AlertReporter _alerter; void putNeighbour(const mesh::Identity& id, uint32_t timestamp, float snr); uint8_t handleLoginReq(const mesh::Identity& sender, const uint8_t* secret, uint32_t sender_timestamp, const uint8_t* data, bool is_flood); @@ -211,6 +213,10 @@ class MyMesh : public mesh::Mesh, public CommonCLICallbacks { // CommonCLICallbacks void applyTempRadioParams(float freq, float bw, uint8_t sf, uint8_t cr, int timeout_mins) override; + + void onAlertConfigChanged() override { _alerter.onConfigChanged(); } + bool sendAlertText(const char* text) override { return _alerter.sendText(text); } + bool resolveAlertScope(TransportKey& dest) override; bool formatFileSystem() override; void sendSelfAdvertisement(int delay_millis, bool flood) override; void updateAdvertTimer() override; @@ -265,10 +271,16 @@ class MyMesh : public 
mesh::Mesh, public CommonCLICallbacks { bridge->setStatsSources(this, _radio, _cli.getBoard(), _ms); #endif bridge->begin(); +#ifdef WITH_MQTT_BRIDGE + _alerter.setBridge(bridge); +#endif } else { bridge->end(); +#ifdef WITH_MQTT_BRIDGE + _alerter.setBridge(nullptr); +#endif } } diff --git a/examples/simple_room_server/MyMesh.cpp b/examples/simple_room_server/MyMesh.cpp index fe2af05353..de7e3e1111 100644 --- a/examples/simple_room_server/MyMesh.cpp +++ b/examples/simple_room_server/MyMesh.cpp @@ -675,6 +675,16 @@ MyMesh::MyMesh(mesh::MainBoard &board, mesh::Radio &radio, mesh::MillisecondCloc _prefs.gps_interval = 0; _prefs.advert_loc_policy = ADVERT_LOC_PREFS; + // Alert channel defaults (same as repeater; off by default and unconfigured). + // Operator must pick `set alert.psk` or `set alert.hashtag` before alerts fire. + _prefs.alert_enabled = 0; + _prefs.alert_psk_hex[0] = '\0'; + _prefs.alert_hashtag[0] = '\0'; + _prefs.alert_region[0] = '\0'; + _prefs.alert_wifi_minutes = 30; + _prefs.alert_mqtt_minutes = 240; + _prefs.alert_min_interval_min = 60; + // bridge defaults (same as repeater) _prefs.bridge_enabled = 1; // enabled _prefs.bridge_delay = 500; // milliseconds @@ -795,6 +805,24 @@ void MyMesh::sendFloodScoped(const TransportKey& scope, mesh::Packet* pkt, uint3 } } +bool MyMesh::resolveAlertScope(TransportKey& dest) { + // Same resolution policy as simple_repeater: alert.region > default_scope. + // The room server doesn't currently embed an AlertReporter, but keeping + // the override in lockstep means the callback path works the same on both + // builds and we won't get caught out if/when it does. 
+ if (_prefs.alert_region[0]) { + auto r = region_map.findByNamePrefix(_prefs.alert_region); + if (r && region_map.getTransportKeysFor(*r, &dest, 1) > 0 && !dest.isNull()) { + return true; + } + } + if (!default_scope.isNull()) { + dest = default_scope; + return true; + } + return false; +} + void MyMesh::sendFloodReply(mesh::Packet* packet, unsigned long delay_millis, uint8_t path_hash_size) { if (recv_pkt_region && !recv_pkt_region->isWildcard()) { // if _request_ packet scope is known, send reply with same scope TransportKey scope; diff --git a/examples/simple_room_server/MyMesh.h b/examples/simple_room_server/MyMesh.h index 74e57e808a..11c2dba2b4 100644 --- a/examples/simple_room_server/MyMesh.h +++ b/examples/simple_room_server/MyMesh.h @@ -198,6 +198,7 @@ class MyMesh : public mesh::Mesh, public CommonCLICallbacks { // CommonCLICallbacks void applyTempRadioParams(float freq, float bw, uint8_t sf, uint8_t cr, int timeout_mins) override; + bool resolveAlertScope(TransportKey& dest) override; bool formatFileSystem() override; void sendSelfAdvertisement(int delay_millis, bool flood) override; void updateAdvertTimer() override; diff --git a/src/helpers/AlertReporter.cpp b/src/helpers/AlertReporter.cpp new file mode 100644 index 0000000000..df38d6fc5e --- /dev/null +++ b/src/helpers/AlertReporter.cpp @@ -0,0 +1,313 @@ +#include "AlertReporter.h" + +#include +#include +#include +#include + +// Header layout for PAYLOAD_TYPE_GRP_TXT before encryption: +// [0..3] timestamp (uint32_t LE) — also helps make packet_hash unique +// [4] TXT_TYPE_PLAIN +// [5..] ": " (null-terminated by sender for legacy parsers) +#ifndef MAX_ALERT_TEXT_LEN +// Conservative ceiling: matches BaseChatMesh::MAX_TEXT_LEN (10 * 16 = 160) and +// stays under MAX_PACKET_PAYLOAD - 4(timestamp) - 1(type) - CIPHER_MAC_SIZE - 1. +#define MAX_ALERT_TEXT_LEN 160 +#endif + +#ifndef ALERT_TXT_TYPE_PLAIN +#define ALERT_TXT_TYPE_PLAIN 0 +#endif + +#ifdef MQTT_DEBUG +#include +#define ALERT_DEBUG_PRINTLN(...) 
Serial.printf("Alert: " __VA_ARGS__); Serial.println() +#else +#define ALERT_DEBUG_PRINTLN(...) do {} while (0) +#endif + +AlertReporter::AlertReporter() + : _prefs(nullptr), _mesh(nullptr), _callbacks(nullptr), +#ifdef WITH_MQTT_BRIDGE + _bridge(nullptr), +#endif + _next_check_ms(0) { +#ifdef WITH_MQTT_BRIDGE + memset(&_wifi, 0, sizeof(_wifi)); + memset(&_mqtt, 0, sizeof(_mqtt)); +#endif +} + +void AlertReporter::begin(NodePrefs* prefs, mesh::Mesh* mesh, CommonCLICallbacks* callbacks) { + _prefs = prefs; + _mesh = mesh; + _callbacks = callbacks; + onConfigChanged(); +} + +#ifdef WITH_MQTT_BRIDGE +void AlertReporter::setBridge(MQTTBridge* bridge) { + _bridge = bridge; +} +#endif + +// Channels banned as fault-alert destinations. Fault alerts are noisy +// operator-infrastructure messages; routing them to community channels would +// flood every nearby companion app (and amplify via well-known auto-responder +// bots), so the firmware refuses these keys at both CLI set-time and at +// runtime in resolveChannel. +// +// Provenance for each row can be re-derived with (-r prints the hex digest +// first, without the "(stdin)= " prefix): +// printf '#name' | openssl dgst -sha256 -r | cut -c1-32 +// or for the Public PSK: +// echo 'izOH6cXN6mrJ5e26oRXNcg==' | base64 -d | xxd -p -c 16 +// +// To ban an additional channel: append one new row; no other code changes +// required. Both the table entries and `alert_psk_hex` are 32 lowercase hex +// chars (16-byte secret), so the matcher is a direct strcmp. 
+struct BannedAlertChannel { + const char* label; + const char* secret_hex; // 32 lowercase hex chars (no 0x, no separators) +}; + +static const BannedAlertChannel BANNED_ALERT_CHANNELS[] = { + // Public group PSK ("izOH6cXN6mrJ5e26oRXNcg==") + { "PUBLIC", "8b3387e9c5cdea6ac9e5edbaa115cd72" }, + // sha256("#test")[0..15] — auto-responders in many regions + { "#test", "9cd8fcf22a47333b591d96a2b848b73f" }, + // sha256("#bot")[0..15] — generic bot channel, frequent auto-responders + { "#bot", "eb50a1bcb3e4e5d7bf69a57c9dada211" }, +}; + +const char* alertReporterBannedChannelMatch(const uint8_t* secret16) { + char hex[33]; + mesh::Utils::toHex(hex, secret16, 16); + for (size_t i = 0; i < sizeof(BANNED_ALERT_CHANNELS) / sizeof(BANNED_ALERT_CHANNELS[0]); i++) { + if (strcmp(hex, BANNED_ALERT_CHANNELS[i].secret_hex) == 0) { + return BANNED_ALERT_CHANNELS[i].label; + } + } + return nullptr; +} + +const char* alertReporterBannedChannelMatchHex(const char* psk_hex) { + if (!psk_hex || strlen(psk_hex) != 32) return nullptr; + uint8_t secret[16]; + if (!mesh::Utils::fromHex(secret, 16, psk_hex)) return nullptr; + return alertReporterBannedChannelMatch(secret); +} + +bool AlertReporter::resolveChannel(mesh::GroupChannel& out) const { + if (!_prefs) return false; + + // alert_psk_hex is the single source of truth — `set alert.hashtag` + // pre-derives the hex-encoded PSK from sha256("#name")[0..15] at CLI time. + // Only 16-byte secrets (32 hex chars) are supported; 32-byte channel keys + // are not used anywhere in MeshCore practice and not represented in the + // banned table either. 
+ const char* psk = _prefs->alert_psk_hex; + if (strlen(psk) != 32) return false; + + memset(out.secret, 0, sizeof(out.secret)); + if (!mesh::Utils::fromHex(out.secret, 16, psk)) return false; + + // Belt-and-suspenders against an operator pasting a banned PSK directly + // into alert.psk, or a hashtag whose hash somehow collides with one of the + // banned 16-byte secrets (astronomically improbable, but free to check). + const char* banned = alertReporterBannedChannelMatch(out.secret); + if (banned) { + ALERT_DEBUG_PRINTLN("refused banned channel '%s' for alert", banned); + return false; + } + + mesh::Utils::sha256(out.hash, sizeof(out.hash), out.secret, 16); + return true; +} + +void AlertReporter::onConfigChanged() { + // Reset transient state so a config change re-arms the edge detector. +#ifdef WITH_MQTT_BRIDGE + _wifi.state = OK; + _wifi.fired_at_ms = 0; + for (size_t i = 0; i < sizeof(_mqtt) / sizeof(_mqtt[0]); i++) { + _mqtt[i].state = OK; + _mqtt[i].fired_at_ms = 0; + } +#endif +} + +bool AlertReporter::sendChannel(const char* text) { + if (!_mesh || !_prefs) return false; + + mesh::GroupChannel channel; + if (!resolveChannel(channel)) return false; + + // Build ": " plaintext payload. Sender = node name (current). + uint8_t buf[5 + MAX_ALERT_TEXT_LEN + 32]; + uint32_t timestamp = _mesh->getRTCClock()->getCurrentTime(); + memcpy(buf, &timestamp, 4); + buf[4] = ALERT_TXT_TYPE_PLAIN; + + const char* sender = _prefs->node_name[0] ? 
_prefs->node_name : "node"; + int n = snprintf((char*)&buf[5], MAX_ALERT_TEXT_LEN, "%s: %s", sender, text); + if (n < 0) return false; + if (n >= MAX_ALERT_TEXT_LEN) n = MAX_ALERT_TEXT_LEN - 1; + + mesh::Packet* pkt = _mesh->createGroupDatagram(PAYLOAD_TYPE_GRP_TXT, channel, + buf, 5 + (size_t)n); + if (!pkt) { + ALERT_DEBUG_PRINTLN("createGroupDatagram failed (pool empty?)"); + return false; + } + + // Ride the repeater's default scope (or `alert.region` override) when the + // host MyMesh provides one — same path MyMesh uses for adverts and + // broadcast channel messages. Falls back to plain (unscoped) flood when + // no callbacks are wired or no scope is configured, matching the + // pre-scoped behavior on builds without RegionMap. + // + // path_hash_size must honor the repeater's configured path.hash.mode (1, 2, + // or 3-byte hashes); the Mesh.h default of 1 would silently downgrade + // observers running on 2/3-byte regional meshes. + const uint8_t path_hash_size = (uint8_t)(_prefs->path_hash_mode + 1); + TransportKey scope; + bool have_scope = _callbacks && _callbacks->resolveAlertScope(scope) && !scope.isNull(); + if (have_scope) { + uint16_t codes[2]; + codes[0] = scope.calcTransportCode(pkt); + codes[1] = 0; + _mesh->sendFlood(pkt, codes, 0, path_hash_size); + } else { + _mesh->sendFlood(pkt, 0, path_hash_size); + } + ALERT_DEBUG_PRINTLN("sent: %s", text); + return true; +} + +bool AlertReporter::sendText(const char* text) { + // sendText() is the manual entry point (`alert test` CLI). Deliberately + // does NOT check alert_enabled so operators can verify the PSK / hashtag + // setup without enabling automatic fault firing. 
+ if (!_prefs || !text || !*text) return false; + return sendChannel(text); +} + +void AlertReporter::formatAge(unsigned long age_ms, char* out, size_t out_size) const { + unsigned long secs = age_ms / 1000UL; + unsigned long h = secs / 3600UL; + unsigned long m = (secs % 3600UL) / 60UL; + if (h > 0) { + snprintf(out, out_size, "%luh%lum", h, m); + } else { + snprintf(out, out_size, "%lum", m); + } +} + +void AlertReporter::onLoop(unsigned long now_ms) { + if (!_prefs || !_prefs->alert_enabled) return; + if (!_mesh) return; + + // Throttle: ~5 s cadence. The thresholds are minutes-scale so this is fine. + if ((long)(now_ms - _next_check_ms) < 0) return; + _next_check_ms = now_ms + 5000UL; + +#ifdef WITH_MQTT_BRIDGE + // Clamp to a 60-minute floor regardless of what's in NodePrefs. The CLI + // already enforces this on set, but a stale prefs file or future field + // tweak shouldn't be able to drag the floor below 1 hour and let a + // flapping link spam the mesh. + uint16_t cfg_min = _prefs->alert_min_interval_min; + if (cfg_min < 60) cfg_min = 60; + unsigned long min_interval_ms = (unsigned long)cfg_min * 60000UL; + + // -------- WiFi fault -------- + if (_prefs->alert_wifi_minutes > 0) { + unsigned long wifi_disc_ms = MQTTBridge::getLastWifiDisconnectTime(); + unsigned long wifi_conn_ms = MQTTBridge::getWifiConnectedAtMillis(); + bool wifi_down = (wifi_disc_ms != 0 && wifi_conn_ms == 0); + unsigned long down_ms = wifi_down ? 
(now_ms - wifi_disc_ms) : 0; + unsigned long thresh_ms = (unsigned long)_prefs->alert_wifi_minutes * 60000UL; + + if (_wifi.state == OK) { + if (wifi_down && down_ms >= thresh_ms && + (now_ms - _wifi.fired_at_ms) >= min_interval_ms) { + char age[16]; + formatAge(down_ms, age, sizeof(age)); + uint8_t reason = MQTTBridge::getLastWifiDisconnectReason(); + char text[80]; + if (reason != 0) { + snprintf(text, sizeof(text), "WiFi down %s (reason %u)", age, (unsigned)reason); + } else { + snprintf(text, sizeof(text), "WiFi down %s", age); + } + if (sendChannel(text)) { + _wifi.state = FIRING; + _wifi.fired_at_ms = now_ms; + _wifi.last_outage_started_ms = wifi_disc_ms; + } + } + } else { // FIRING + if (!wifi_down) { + unsigned long total = (wifi_conn_ms != 0 && _wifi.last_outage_started_ms != 0) + ? (wifi_conn_ms - _wifi.last_outage_started_ms) : 0; + char age[16]; + formatAge(total, age, sizeof(age)); + char text[80]; + snprintf(text, sizeof(text), "WiFi recovered after %s", age); + sendChannel(text); + _wifi.state = OK; + } + } + } else if (_wifi.state == FIRING) { + _wifi.state = OK; // threshold disabled mid-fault: silently re-arm + } + + // -------- MQTT slot faults -------- + if (_prefs->alert_mqtt_minutes > 0 && _bridge != nullptr) { + int n = MQTTBridge::getRuntimeSlotCount(); + if (n > (int)(sizeof(_mqtt) / sizeof(_mqtt[0]))) n = (int)(sizeof(_mqtt) / sizeof(_mqtt[0])); + unsigned long thresh_ms = (unsigned long)_prefs->alert_mqtt_minutes * 60000UL; + + for (int i = 0; i < n; i++) { + Fault& f = _mqtt[i]; + if (!_bridge->isSlotEnabledAndAttempted(i)) { + if (f.state == FIRING) f.state = OK; // slot disabled mid-fault + continue; + } + unsigned long outage_start = _bridge->getSlotCurrentOutageStartMs(i); + bool down = (outage_start != 0); + unsigned long down_ms = down ? 
(now_ms - outage_start) : 0; + + if (f.state == OK) { + if (down && down_ms >= thresh_ms && + (now_ms - f.fired_at_ms) >= min_interval_ms) { + char age[16]; + formatAge(down_ms, age, sizeof(age)); + char text[100]; + snprintf(text, sizeof(text), "MQTT slot %d (%s) down %s", + i + 1, _bridge->getSlotPresetName(i), age); + if (sendChannel(text)) { + f.state = FIRING; + f.fired_at_ms = now_ms; + f.last_outage_started_ms = outage_start; + } + } + } else { // FIRING + if (!down) { + unsigned long total = (f.last_outage_started_ms != 0) + ? (now_ms - f.last_outage_started_ms) : 0; + char age[16]; + formatAge(total, age, sizeof(age)); + char text[100]; + snprintf(text, sizeof(text), "MQTT slot %d (%s) recovered after %s", + i + 1, _bridge->getSlotPresetName(i), age); + sendChannel(text); + f.state = OK; + } + } + } + } +#else + (void)now_ms; +#endif +} diff --git a/src/helpers/AlertReporter.h b/src/helpers/AlertReporter.h new file mode 100644 index 0000000000..dbbd154f65 --- /dev/null +++ b/src/helpers/AlertReporter.h @@ -0,0 +1,109 @@ +#pragma once + +#include +#include +#include "CommonCLI.h" + +#ifdef WITH_MQTT_BRIDGE +#include "bridges/MQTTBridge.h" +#endif + +/** + * Returns the label of a banned alert channel if \a secret16 matches one of + * the channels in the BANNED_ALERT_CHANNELS table (e.g. "PUBLIC", "#test", + * "#bot"), or nullptr otherwise. Centralized here so both AlertReporter and + * the CommonCLI `set alert.psk` / `set alert.hashtag` handlers can share one + * source of truth — adding a new banned channel is a one-line table edit. + */ +const char* alertReporterBannedChannelMatch(const uint8_t* secret16); + +/** + * Convenience: hex-decodes \a psk_hex (32 lowercase/uppercase hex chars) and + * forwards to alertReporterBannedChannelMatch. Returns nullptr if not banned + * (or if the input isn't a valid 32-char hex string — only 16-byte secrets + * are present in the banned table). 
+ */ +const char* alertReporterBannedChannelMatchHex(const char* psk_hex); + +/** + * \brief Send-only group-channel "fault alert" reporter for repeater/observer + * builds. + * + * Polls WiFi and per-MQTT-slot outage timers from MQTTBridge. When any timer + * exceeds its configured threshold, floods a single PAYLOAD_TYPE_GRP_TXT + * message on the configured alert channel ("MyObserver: WiFi down 47m"), + * then arms a "recovered" message for the next state transition. + * + * The alert channel must be explicitly configured to either a private hex + * PSK (`set alert.psk`) or a hashtag name (`set alert.hashtag`); the + * well-known PUBLIC group key (and a small list of other auto-responder + * channels - see BANNED_ALERT_CHANNELS in AlertReporter.cpp) is rejected on + * purpose so fault alerts never spam community channels. + * + * Edge-triggered + rate-limited via NodePrefs::alert_min_interval_min so a + * flapping link cannot spam the channel. + * + * Designed to compile and run on any repeater build: + * - The channel-send path uses only mesh::Mesh primitives that already + * exist in the Dispatcher hierarchy (createGroupDatagram + sendFlood). + * - WiFi/MQTT polling is #ifdef WITH_MQTT_BRIDGE-gated; without it, the + * reporter still supports manual `alert test` sends. + */ +class AlertReporter { +public: + AlertReporter(); + + /** + * Wire up the reporter. Must be called from MyMesh::begin() after prefs + * are loaded. \a callbacks is optional - when non-null the reporter uses + * it to resolve a TransportKey scope for outgoing alert floods (so the + * packet rides the repeater's default scope or an `alert.region` override). + */ + void begin(NodePrefs* prefs, mesh::Mesh* mesh, CommonCLICallbacks* callbacks = nullptr); + +#ifdef WITH_MQTT_BRIDGE + /** Bridge can be (re)created lazily; pass nullptr to detach. */ + void setBridge(MQTTBridge* bridge); +#endif + + /** + * Reset the transient per-fault state so the edge detector re-arms. The + * GroupChannel itself is re-derived from alert_psk_hex on every send. 
Call from the + * CLI hot-reload hook after `set alert.psk` / `set alert.hashtag` / `set alert on|off`. + */ + void onConfigChanged(); + + /** + * Cooperative tick. Fast: returns immediately if disabled, throttled + * internally to ~5 s checks. Safe to call every loop(). + */ + void onLoop(unsigned long now_ms); + + /** + * Send an arbitrary text immediately (used by `alert test` CLI). Returns + * false when the text is empty, the PSK is unset, invalid, or banned, or + * the packet cannot be created. Does not check alert_enabled. + * Bypasses the rate limiter and edge logic. + */ + bool sendText(const char* text); + +private: + bool resolveChannel(mesh::GroupChannel& out) const; + bool sendChannel(const char* text); + void formatAge(unsigned long age_ms, char* out, size_t out_size) const; + + enum FaultState { OK, FIRING }; + struct Fault { + FaultState state; + unsigned long fired_at_ms; // millis() when we last sent a "down" alert + unsigned long last_outage_started_ms; // remembered so the recovered msg can quote duration + }; + + NodePrefs* _prefs; + mesh::Mesh* _mesh; + CommonCLICallbacks* _callbacks; +#ifdef WITH_MQTT_BRIDGE + MQTTBridge* _bridge; + Fault _wifi; + Fault _mqtt[RUNTIME_MQTT_SLOTS]; +#endif + unsigned long _next_check_ms; +}; diff --git a/src/helpers/CommonCLI.cpp b/src/helpers/CommonCLI.cpp index 39ef6eb8fb..2544e16b90 100644 --- a/src/helpers/CommonCLI.cpp +++ b/src/helpers/CommonCLI.cpp @@ -2,7 +2,9 @@ #include "CommonCLI.h" #include "TxtDataHelpers.h" #include "AdvertDataHelpers.h" +#include "AlertReporter.h" // for alertReporterBannedChannelMatch() #include +#include #ifndef BRIDGE_MAX_BAUD #define BRIDGE_MAX_BAUD 115200 @@ -278,7 +280,32 @@ void CommonCLI::loadPrefsInt(FILESYSTEM* fs, const char* filename) { if (file.available() >= (int)sizeof(_prefs->radio_watchdog_minutes)) { file.read((uint8_t *)&_prefs->radio_watchdog_minutes, sizeof(_prefs->radio_watchdog_minutes)); // 316 } - // next: 317 + // Alert channel fields (appended; older files won't have them - defaults from MyMesh ctor remain) + if 
(file.available() >= (int)sizeof(_prefs->alert_enabled)) { + file.read((uint8_t *)&_prefs->alert_enabled, sizeof(_prefs->alert_enabled)); + } + if (file.available() >= (int)sizeof(_prefs->alert_psk_hex)) { + file.read((uint8_t *)&_prefs->alert_psk_hex, sizeof(_prefs->alert_psk_hex)); + } + if (file.available() >= (int)sizeof(_prefs->alert_wifi_minutes)) { + file.read((uint8_t *)&_prefs->alert_wifi_minutes, sizeof(_prefs->alert_wifi_minutes)); + } + if (file.available() >= (int)sizeof(_prefs->alert_mqtt_minutes)) { + file.read((uint8_t *)&_prefs->alert_mqtt_minutes, sizeof(_prefs->alert_mqtt_minutes)); + } + if (file.available() >= (int)sizeof(_prefs->alert_min_interval_min)) { + file.read((uint8_t *)&_prefs->alert_min_interval_min, sizeof(_prefs->alert_min_interval_min)); + } + if (file.available() >= (int)sizeof(_prefs->alert_hashtag)) { + file.read((uint8_t *)&_prefs->alert_hashtag, sizeof(_prefs->alert_hashtag)); + } + if (file.available() >= (int)sizeof(_prefs->alert_region)) { + file.read((uint8_t *)&_prefs->alert_region, sizeof(_prefs->alert_region)); + } + // ensure null termination after raw read + _prefs->alert_psk_hex[sizeof(_prefs->alert_psk_hex) - 1] = '\0'; + _prefs->alert_hashtag[sizeof(_prefs->alert_hashtag) - 1] = '\0'; + _prefs->alert_region[sizeof(_prefs->alert_region) - 1] = '\0'; // sanitise bad pref values _prefs->rx_delay_base = constrain(_prefs->rx_delay_base, 0, 20.0f); @@ -401,7 +428,14 @@ void CommonCLI::savePrefs(FILESYSTEM* fs) { file.write((uint8_t *)&_prefs->snmp_enabled, sizeof(_prefs->snmp_enabled)); // 291 file.write((uint8_t *)&_prefs->snmp_community, sizeof(_prefs->snmp_community)); // 292 file.write((uint8_t *)&_prefs->radio_watchdog_minutes, sizeof(_prefs->radio_watchdog_minutes)); // 316 - // next: 317 + // Alert channel fields (appended) + file.write((uint8_t *)&_prefs->alert_enabled, sizeof(_prefs->alert_enabled)); + file.write((uint8_t *)&_prefs->alert_psk_hex, sizeof(_prefs->alert_psk_hex)); + file.write((uint8_t 
*)&_prefs->alert_wifi_minutes, sizeof(_prefs->alert_wifi_minutes)); + file.write((uint8_t *)&_prefs->alert_mqtt_minutes, sizeof(_prefs->alert_mqtt_minutes)); + file.write((uint8_t *)&_prefs->alert_min_interval_min, sizeof(_prefs->alert_min_interval_min)); + file.write((uint8_t *)&_prefs->alert_hashtag, sizeof(_prefs->alert_hashtag)); + file.write((uint8_t *)&_prefs->alert_region, sizeof(_prefs->alert_region)); file.close(); } @@ -808,6 +842,21 @@ void CommonCLI::handleCommand(uint32_t sender_timestamp, char* command, char* re } else if (memcmp(command, "clear stats", 11) == 0) { _callbacks->clearStats(); strcpy(reply, "(OK - stats reset)"); + } else if (memcmp(command, "alert test", 10) == 0 && (command[10] == 0 || command[10] == ' ')) { + // Send a one-off test alert on the configured alert channel. + const char* extra = command[10] == ' ' ? &command[11] : ""; + char text[120]; + if (*extra) { + snprintf(text, sizeof(text), "[test] %s", extra); + } else { + strcpy(text, "[test] alert channel ok"); + } + if (!_prefs->alert_psk_hex[0]) { + strcpy(reply, "Error: alert channel not configured (set alert.psk or set alert.hashtag)"); + } else { + bool ok = _callbacks->sendAlertText(text); + strcpy(reply, ok ? 
"OK - alert sent" : "Error: alert send failed (bad PSK or PUBLIC key refused?)"); + } } else if (memcmp(command, "get ", 4) == 0) { handleGetCmd(sender_timestamp, command, reply); } else if (memcmp(command, "set ", 4) == 0) { @@ -1528,6 +1577,173 @@ void CommonCLI::handleSetCmd(uint32_t sender_timestamp, char* command, char* rep savePrefs(); strcpy(reply, "OK"); #endif + } else if (memcmp(config, "alert ", 6) == 0) { + // set alert on|off + const char* val = &config[6]; + if (memcmp(val, "on", 2) == 0 && (val[2] == 0 || val[2] == ' ')) { + _prefs->alert_enabled = 1; + savePrefs(); + _callbacks->onAlertConfigChanged(); + strcpy(reply, "OK - alerts on"); + } else if (memcmp(val, "off", 3) == 0 && (val[3] == 0 || val[3] == ' ')) { + _prefs->alert_enabled = 0; + savePrefs(); + _callbacks->onAlertConfigChanged(); + strcpy(reply, "OK - alerts off"); + } else { + strcpy(reply, "Error: usage set alert on|off"); + } + } else if (memcmp(config, "alert.psk", 9) == 0 && (config[9] == 0 || config[9] == ' ')) { + // `set alert.psk` with no argument clears the field (alerts then disabled + // until a new psk/hashtag is configured). + const char* val = (config[9] == ' ') ? &config[10] : ""; + while (*val == ' ') val++; + size_t len = strlen(val); + if (len == 0) { + _prefs->alert_psk_hex[0] = '\0'; + _prefs->alert_hashtag[0] = '\0'; + savePrefs(); + _callbacks->onAlertConfigChanged(); + strcpy(reply, "OK - alert.psk cleared (alerts disabled until configured)"); + } else if (val[0] == '#') { + strcpy(reply, "Error: use 'set alert.hashtag' for hashtag channels"); + } else if (len != 32) { + // 16-byte channel secret = 32 hex chars. This is what the mobile app's + // "Share Channel" emits, what `set alert.hashtag` derives, and what the + // BANNED_ALERT_CHANNELS table holds. 32-byte channels aren't used + // anywhere in MeshCore practice. 
+ strcpy(reply, "Error: PSK must be 32 hex chars (16-byte channel secret)"); + } else { + // Validate all-hex, then normalize via fromHex/toHex so the stored + // form is always lowercase regardless of input case. + uint8_t raw[16]; + bool all_hex = true; + for (size_t i = 0; i < len; i++) { + if (!mesh::Utils::isHexChar(val[i])) { all_hex = false; break; } + } + if (!all_hex || !mesh::Utils::fromHex(raw, 16, val)) { + strcpy(reply, "Error: PSK must be 32 hex chars (16-byte channel secret)"); + } else { + char normalized[33]; + mesh::Utils::toHex(normalized, raw, 16); + if (const char* banned = alertReporterBannedChannelMatchHex(normalized)) { + // Refuse any key on the banned channel list (Public PSK, well-known + // auto-responder hashtags like #test/#bot, etc.). Fault alerts on + // those channels would spam every node in the area. + sprintf(reply, "Error: refusing banned channel '%s'; pick a private key or hashtag", banned); + } else { + StrHelper::strncpy(_prefs->alert_psk_hex, normalized, sizeof(_prefs->alert_psk_hex)); + // The new PSK is operator-supplied, so any previously-derived + // hashtag name is no longer accurate provenance — drop it. + _prefs->alert_hashtag[0] = '\0'; + savePrefs(); + _callbacks->onAlertConfigChanged(); + strcpy(reply, "OK - alert.psk updated"); + } + } + } + } else if (memcmp(config, "alert.hashtag", 13) == 0 && (config[13] == 0 || config[13] == ' ')) { + const char* val = (config[13] == ' ') ? &config[14] : ""; + while (*val == ' ') val++; + size_t in_len = strlen(val); + if (in_len == 0) { + _prefs->alert_psk_hex[0] = '\0'; + _prefs->alert_hashtag[0] = '\0'; + savePrefs(); + _callbacks->onAlertConfigChanged(); + strcpy(reply, "OK - alert.hashtag cleared (alerts disabled until configured)"); + } else { + // Canonical stored form is "#name" because the leading '#' is part of + // the sha256 input (matching the companion-app hashtag-channel + // derivation in docs/companion_protocol.md). 
Accept the user typing + // either "alerts" or "#alerts". + char hashtag[sizeof(_prefs->alert_hashtag)]; + size_t need = (val[0] == '#') ? in_len : in_len + 1; + if (need >= sizeof(hashtag)) { + strcpy(reply, "Error: hashtag too long"); + } else { + if (val[0] == '#') { + StrHelper::strncpy(hashtag, val, sizeof(hashtag)); + } else { + hashtag[0] = '#'; + StrHelper::strncpy(&hashtag[1], val, sizeof(hashtag) - 1); + } + + // Derive the channel key once: first 16 bytes of sha256("#name"), + // store hex-encoded in alert_psk_hex. We don't re-derive on every + // send — operators can later override with `set alert.psk` without + // leaving stale hashtag text behind. + uint8_t digest[32]; + mesh::Utils::sha256(digest, sizeof(digest), + (const uint8_t*)hashtag, (int)strlen(hashtag)); + if (const char* banned = alertReporterBannedChannelMatch(digest)) { + // Hashtag derives to a banned key (e.g. `set alert.hashtag test` + // hits the #test entry). Refuse before clobbering existing config. + sprintf(reply, "Error: refusing banned channel '%s'", banned); + } else { + char hex[33]; + mesh::Utils::toHex(hex, digest, 16); + StrHelper::strncpy(_prefs->alert_hashtag, hashtag, sizeof(_prefs->alert_hashtag)); + StrHelper::strncpy(_prefs->alert_psk_hex, hex, sizeof(_prefs->alert_psk_hex)); + savePrefs(); + _callbacks->onAlertConfigChanged(); + sprintf(reply, "OK - alert.hashtag: %s", _prefs->alert_hashtag); + } + } + } + } else if (memcmp(config, "alert.region", 12) == 0 && (config[12] == 0 || config[12] == ' ')) { + // `set alert.region ` overrides the repeater's default_scope for + // alert sends only. `set alert.region` (no arg) clears it. The name is + // looked up lazily via RegionMap at send time; we deliberately don't + // mutate the region map here, so naming an unknown region is allowed + // but will silently fall back to default_scope until the operator runs + // `region put` for it. + const char* val = (config[12] == ' ') ? 
&config[13] : ""; + while (*val == ' ') val++; + size_t len = strlen(val); + if (len == 0) { + _prefs->alert_region[0] = '\0'; + savePrefs(); + _callbacks->onAlertConfigChanged(); + strcpy(reply, "OK - alert.region cleared (using default scope)"); + } else if (len >= sizeof(_prefs->alert_region)) { + strcpy(reply, "Error: alert.region too long"); + } else { + StrHelper::strncpy(_prefs->alert_region, val, sizeof(_prefs->alert_region)); + StrHelper::stripSurroundingQuotes(_prefs->alert_region, sizeof(_prefs->alert_region)); + savePrefs(); + _callbacks->onAlertConfigChanged(); + sprintf(reply, "OK - alert.region: %s", _prefs->alert_region); + } + } else if (memcmp(config, "alert.wifi ", 11) == 0) { + int mins = (int)_atoi(&config[11]); + if (mins < 0 || mins > 1440) { + strcpy(reply, "Error: alert.wifi must be 0-1440 minutes (0=off)"); + } else { + _prefs->alert_wifi_minutes = (uint16_t)mins; + savePrefs(); + sprintf(reply, "OK - alert.wifi %d min%s", mins, mins == 0 ? " (disabled)" : ""); + } + } else if (memcmp(config, "alert.mqtt ", 11) == 0) { + int mins = (int)_atoi(&config[11]); + if (mins < 0 || mins > 10080) { + strcpy(reply, "Error: alert.mqtt must be 0-10080 minutes (0=off)"); + } else { + _prefs->alert_mqtt_minutes = (uint16_t)mins; + savePrefs(); + sprintf(reply, "OK - alert.mqtt %d min%s", mins, mins == 0 ? " (disabled)" : ""); + } + } else if (memcmp(config, "alert.interval ", 15) == 0) { + int mins = (int)_atoi(&config[15]); + // Floor at 60 min: faster re-fires would let a flapping link spam the + // mesh with a fresh GRP_TXT flood every minute — terrible for airtime. 
+ if (mins < 60 || mins > 10080) { + strcpy(reply, "Error: alert.interval must be 60-10080 minutes"); + } else { + _prefs->alert_min_interval_min = (uint16_t)mins; + savePrefs(); + sprintf(reply, "OK - alert.interval %d min", mins); + } } else if (memcmp(config, "adc.multiplier ", 15) == 0) { _prefs->adc_multiplier = atof(&config[15]); if (_board->setAdcMultiplier(_prefs->adc_multiplier)) { @@ -1835,6 +2051,22 @@ void CommonCLI::handleGetCmd(uint32_t sender_timestamp, char* command, char* rep #else strcpy(reply, "ERROR: unsupported"); #endif + } else if (memcmp(config, "alert.hashtag", 13) == 0) { + sprintf(reply, "> %s", _prefs->alert_hashtag[0] ? _prefs->alert_hashtag : "(unset)"); + } else if (sender_timestamp == 0 && memcmp(config, "alert.psk", 9) == 0) { // from serial command line only + sprintf(reply, "> %s", _prefs->alert_psk_hex[0] ? _prefs->alert_psk_hex : "(unset)"); + } else if (memcmp(config, "alert.region", 12) == 0) { + sprintf(reply, "> %s", _prefs->alert_region[0] ? _prefs->alert_region : "(unset, using default scope)"); + } else if (memcmp(config, "alert.wifi", 10) == 0) { + sprintf(reply, "> %u min%s", (unsigned)_prefs->alert_wifi_minutes, + _prefs->alert_wifi_minutes == 0 ? " (disabled)" : ""); + } else if (memcmp(config, "alert.mqtt", 10) == 0) { + sprintf(reply, "> %u min%s", (unsigned)_prefs->alert_mqtt_minutes, + _prefs->alert_mqtt_minutes == 0 ? " (disabled)" : ""); + } else if (memcmp(config, "alert.interval", 14) == 0) { + sprintf(reply, "> %u min", (unsigned)_prefs->alert_min_interval_min); + } else if (memcmp(config, "alert", 5) == 0 && (config[5] == 0 || config[5] == '\n' || config[5] == '\r')) { + sprintf(reply, "> %s", _prefs->alert_enabled ? 
"on" : "off"); } else if (memcmp(config, "adc.multiplier", 14) == 0) { float adc_mult = _board->getAdcMultiplier(); if (adc_mult == 0.0f) { diff --git a/src/helpers/CommonCLI.h b/src/helpers/CommonCLI.h index aaa7a77e0b..0d50aa7904 100644 --- a/src/helpers/CommonCLI.h +++ b/src/helpers/CommonCLI.h @@ -104,6 +104,25 @@ struct NodePrefs { // persisted to file uint8_t snmp_enabled; // boolean: 0=off, 1=on char snmp_community[24]; // community string (default "public") uint8_t radio_watchdog_minutes; // 0=disabled, 1-120 minutes + + // Fault alert channel (LoRa group-channel "observer status" message on prolonged WiFi/MQTT outage). + // Sent over the radio (NOT over MQTT) so the alert still works while the MQTT path is broken. + // All fields are appended at the end of NodePrefs for binary-compatible upgrades. + uint8_t alert_enabled; // 0 = off (default), 1 = on + char alert_psk_hex[33]; // 32 lowercase hex chars (16-byte channel secret) + null; empty = alerts disabled. Banned keys (Public/#test/#bot) are rejected. + uint16_t alert_wifi_minutes; // WiFi-down threshold in minutes (0 = disabled), default 30 + uint16_t alert_mqtt_minutes; // MQTT-down threshold in minutes (0 = disabled), default 240 (4 h) + uint16_t alert_min_interval_min; // min minutes between alerts for the same fault, default 60, floor 60 + // When the operator configures via `set alert.hashtag `, we derive + // alert_psk_hex from sha256("#name")[0..15] once and remember the hashtag + // text here purely for `get alert.hashtag` readback. A subsequent + // `set alert.psk` clears this field so it doesn't lie about provenance. + char alert_hashtag[24]; + // Optional region name (e.g. "us", "eu"); empty = use the repeater's + // default_scope. Looked up lazily via RegionMap::findByNamePrefix at send + // time, so the operator can name a region that doesn't exist yet without + // polluting region_map state. Falls back to default_scope on miss. 
+ char alert_region[31]; }; #ifdef WITH_MQTT_BRIDGE @@ -274,6 +293,24 @@ class CommonCLICallbacks { virtual void setRxBoostedGain(bool enable) { // no op by default }; + + // Fault-alert channel hooks (see NodePrefs::alert_*). The default no-op + // implementations keep CLI commands harmless on builds that don't wire up + // an AlertReporter. + virtual void onAlertConfigChanged() { + // no op by default + } + virtual bool sendAlertText(const char* /*text*/) { + return false; // no op by default + } + // Resolve the TransportKey scope to use for outgoing fault-alert floods. + // Implementations should consult NodePrefs::alert_region first (look up via + // RegionMap), then fall back to the repeater's default_scope, then return + // false if neither yields a usable key. AlertReporter falls back to an + // unscoped flood when this returns false. + virtual bool resolveAlertScope(TransportKey& /*dest*/) { + return false; // no op by default + } }; class CommonCLI { diff --git a/src/helpers/bridges/MQTTBridge.cpp b/src/helpers/bridges/MQTTBridge.cpp index e43ee162df..8a251b484b 100644 --- a/src/helpers/bridges/MQTTBridge.cpp +++ b/src/helpers/bridges/MQTTBridge.cpp @@ -198,6 +198,25 @@ void MQTTBridge::formatMqttStatusReply(char* buf, size_t bufsize, const NodePref uint8_t MQTTBridge::getLastWifiDisconnectReason() { return s_wifi_disconnect_reason; } unsigned long MQTTBridge::getLastWifiDisconnectTime() { return s_wifi_disconnect_time; } +unsigned long MQTTBridge::getSlotCurrentOutageStartMs(int slot_index) const { + if (slot_index < 0 || slot_index >= RUNTIME_MQTT_SLOTS) return 0; + return _slots[slot_index].current_outage_started_ms; +} + +bool MQTTBridge::isSlotEnabledAndAttempted(int slot_index) const { + if (slot_index < 0 || slot_index >= RUNTIME_MQTT_SLOTS) return false; + const MQTTSlot& s = _slots[slot_index]; + return s.enabled && s.initial_connect_done; +} + +const char* MQTTBridge::getSlotPresetName(int slot_index) const { + if (slot_index < 0 || slot_index 
>= RUNTIME_MQTT_SLOTS) return "?"; + const MQTTSlot& s = _slots[slot_index]; + if (s.preset && s.preset->name) return s.preset->name; + if (!s.enabled) return MQTT_PRESET_NONE; + return MQTT_PRESET_CUSTOM; +} + const char* MQTTBridge::wifiReasonStr(uint8_t reason) { switch (reason) { case 2: return "auth expired"; @@ -997,6 +1016,7 @@ void MQTTBridge::initSlotClients() { _slots[index].last_tls_stack_err = 0; _slots[index].last_sock_errno = 0; _slots[index].last_error_time = 0; + _slots[index].current_outage_started_ms = 0; // clear current-outage timer for AlertReporter updateCachedConnectionStatus(); publishStatusToSlot(index); }); @@ -1006,6 +1026,9 @@ void MQTTBridge::initSlotClients() { if (_slots[index].first_disconnect_time == 0) { _slots[index].first_disconnect_time = millis(); } + if (_slots[index].current_outage_started_ms == 0) { + _slots[index].current_outage_started_ms = millis(); + } _slots[index].connected = false; updateCachedConnectionStatus(); }); diff --git a/src/helpers/bridges/MQTTBridge.h b/src/helpers/bridges/MQTTBridge.h index e633e1dea5..c9a511d2ed 100644 --- a/src/helpers/bridges/MQTTBridge.h +++ b/src/helpers/bridges/MQTTBridge.h @@ -96,6 +96,12 @@ class MQTTBridge : public BridgeBase { unsigned long last_error_time; // millis() of last error uint32_t disconnect_count; // Number of disconnect callbacks since boot unsigned long first_disconnect_time; // millis() of first disconnect after boot + + // Current-outage timer (used by AlertReporter to fire faults after a sustained + // outage). Reset to 0 on each successful connect, set to millis() on first + // disconnect-after-connect. first_disconnect_time is intentionally separate + // so the existing 'mqttN.diag' "first_disc" semantics don't change. 
+ unsigned long current_outage_started_ms; }; MQTTSlot _slots[RUNTIME_MQTT_SLOTS]; @@ -381,6 +387,23 @@ class MQTTBridge : public BridgeBase { bool isReady() const; static unsigned long getWifiConnectedAtMillis(); + + /** + * Per-slot outage accessors used by AlertReporter to detect prolonged + * MQTT broker outages. Indices are 0..RUNTIME_MQTT_SLOTS-1. + * + * - getSlotCurrentOutageStartMs(): millis() of the current outage start + * (0 when the slot is connected). Reset on each reconnect. + * - isSlotEnabledAndAttempted(): true when the slot is enabled (preset + * != "none") and has reached at least one connect attempt — i.e. it is + * meaningful to alarm on its connection state. + * - getSlotPresetName(): preset name for friendly status text. Returns + * "custom"/"none"/preset->name; never null. + */ + unsigned long getSlotCurrentOutageStartMs(int slot_index) const; + bool isSlotEnabledAndAttempted(int slot_index) const; + const char* getSlotPresetName(int slot_index) const; + static int getRuntimeSlotCount() { return RUNTIME_MQTT_SLOTS; } /** Resolved origin for MQTT JSON: node_name when mqtt_origin is empty, else mqtt_origin (with quote stripping). */ static void getEffectiveMqttOrigin(const NodePrefs* prefs, char* buf, size_t buf_size); static void formatMqttStatusReply(char* buf, size_t bufsize, const NodePrefs* prefs);
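The per-slot outage timer and the `Fault` bookkeeping above can be read together as a small state machine. The following is a minimal sketch of one way a consumer could combine them; it is not the actual `AlertReporter` implementation. Only `current_outage_started_ms`, `fired_at_ms` and `last_outage_started_ms` are taken from the diff. `OutageTimer`, `Action`, `evaluate()` and the `has_fired` flag are hypothetical names introduced for illustration. The way the edge trigger and the minimum interval interact (one "down" message per outage, and no new "down" message until the interval since the previous one has elapsed) is an assumption.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the timer field added to MQTTSlot.
// 0 means "connected / no outage in progress".
struct OutageTimer {
  unsigned long current_outage_started_ms = 0;

  // A successful connect clears the timer.
  void onConnect() { current_outage_started_ms = 0; }

  // Only the first disconnect of an outage records the start time;
  // repeated disconnect callbacks leave it untouched.
  void onDisconnect(unsigned long now_ms) {
    if (current_outage_started_ms == 0) current_outage_started_ms = now_ms;
  }
};

enum class FaultState { Ok, Firing };

// Mirrors the Fault struct in AlertReporter.h; has_fired is an added,
// hypothetical flag so that fired_at_ms == 0 is not used as a sentinel.
struct Fault {
  FaultState state = FaultState::Ok;
  bool has_fired = false;
  unsigned long fired_at_ms = 0;
  unsigned long last_outage_started_ms = 0;
};

enum class Action { None, SendDown, SendRecovered };

// Decide what (if anything) to send for one fault source.
// Unsigned subtraction keeps the age arithmetic correct across a
// millis() rollover.
Action evaluate(Fault& f, unsigned long outage_started_ms, unsigned long now_ms,
                uint16_t threshold_min, uint16_t min_interval_min) {
  const unsigned long kMsPerMin = 60000UL;

  if (outage_started_ms == 0) {  // link is up
    if (f.state == FaultState::Firing) {
      f.state = FaultState::Ok;
      return Action::SendRecovered;  // sent once per recovery
    }
    return Action::None;
  }

  if (f.state == FaultState::Firing) return Action::None;  // already announced
  if (threshold_min == 0) return Action::None;             // 0 = disabled

  if (now_ms - outage_started_ms < threshold_min * kMsPerMin) return Action::None;

  // Rate limit: a flapping link cannot announce a new outage until the
  // minimum interval since the previous "down" message has passed.
  if (f.has_fired && now_ms - f.fired_at_ms < min_interval_min * kMsPerMin) {
    return Action::None;
  }

  f.state = FaultState::Firing;
  f.has_fired = true;
  f.fired_at_ms = now_ms;
  f.last_outage_started_ms = outage_started_ms;
  return Action::SendDown;
}
```

With a 30-minute threshold and a 60-minute interval, an outage that begins at minute 1 would yield `SendDown` at minute 31 and `SendRecovered` on the first evaluation after reconnect. A second outage that begins at minute 33 would stay silent at minute 63 and only yield `SendDown` once minute 91 is reached.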