42 changes: 42 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,48 @@ Each entry corresponds to a [GitHub Release](https://github.com/timescale/rsigma

## [Unreleased]

### Unknown-field discovery API (#149)

The `engine daemon` learns to surface two halves of detection coverage live from inside the process: which event fields are not referenced by any loaded rule (gap signal) and which rule fields have never appeared in an event (broken-coverage signal). RSigma owns both rule parsing and event ingestion end-to-end, so this view does not need an external pipeline.

**Two new flags on `rsigma engine daemon`** (off by default; when unset, the engine task skips field iteration entirely):

| Flag | Default | Purpose |
|------|---------|---------|
| `--observe-fields` | off | Enable the field observer. When enabled, every event evaluated by the engine task has its dotted field paths recorded. |
| `--observe-fields-max-keys <N>` | `10000` | Hard ceiling on distinct field names. Existing keys keep counting once the cap is hit; new keys are dropped and counted as overflow. |
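
The cap semantics in the table above can be sketched with a small counter. This is a minimal illustration, not the rsigma implementation: `CappedCounter`, its fields, and `record` are hypothetical names, and only the behavior (existing keys keep counting, new keys past the cap are dropped and counted as overflow) is taken from the flag description.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the observer's counter table (not the rsigma type).
#[derive(Debug)]
struct CappedCounter {
    max_keys: usize,
    counts: HashMap<String, u64>,
    overflow_dropped: u64,
}

impl CappedCounter {
    fn new(max_keys: usize) -> Self {
        CappedCounter {
            max_keys,
            counts: HashMap::new(),
            overflow_dropped: 0,
        }
    }

    /// Record one sighting of `key`, honoring the distinct-key ceiling.
    fn record(&mut self, key: &str) {
        if let Some(n) = self.counts.get_mut(key) {
            // Existing keys keep counting even once the cap is reached.
            *n += 1;
        } else if self.counts.len() < self.max_keys {
            self.counts.insert(key.to_owned(), 1);
        } else {
            // New key past the cap: drop it and count the drop.
            self.overflow_dropped += 1;
        }
    }
}
```

With a cap of 2, recording `a`, `b`, `a`, `c` leaves `a` at 2, keeps the map at two keys, and increments the overflow count once for `c`.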

**Four new HTTP endpoints.**

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/api/v1/fields` | Snapshot bundling `summary` + `unknown` + `missing` for a one-shot dashboard read. |
| `GET` | `/api/v1/fields/unknown` | Event fields not referenced by any rule. Sorted by descending count. |
| `GET` | `/api/v1/fields/missing` | Rule fields never seen in events. Each entry includes up to 10 rule titles with a `truncated` flag for fields that span more rules. |
| `DELETE` | `/api/v1/fields/observer` | Clear the observer's counters and return `{previous_keys, previous_events}`. |

Each list endpoint accepts `?limit=N&offset=M` (default `limit=100`, cap `1000`) and returns `total` + `next_offset` for deterministic pagination. All four return `503 Service Unavailable` with `{"error":"field observation disabled","hint":"..."}` when `--observe-fields` is not set.
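
The pagination contract can be sketched as a pure function over an already-sorted list. The names `Page` and `paginate` are hypothetical, and two behaviors are assumptions rather than documented facts: an over-cap or zero `limit` is clamped instead of rejected, and `next_offset` is absent on the last page.

```rust
/// Hypothetical page envelope. `total` and `next_offset` are named in the
/// changelog; the rest of the shape is an assumption.
#[derive(Debug, PartialEq)]
struct Page<T> {
    items: Vec<T>,
    total: usize,
    next_offset: Option<usize>,
}

const DEFAULT_LIMIT: usize = 100;
const MAX_LIMIT: usize = 1000;

fn paginate<T: Clone>(all: &[T], limit: Option<usize>, offset: Option<usize>) -> Page<T> {
    // Assumption: out-of-range limits are clamped to 1..=MAX_LIMIT.
    let limit = limit.unwrap_or(DEFAULT_LIMIT).clamp(1, MAX_LIMIT);
    // An offset past the end yields an empty page instead of panicking.
    let start = offset.unwrap_or(0).min(all.len());
    let end = start.saturating_add(limit).min(all.len());
    Page {
        items: all[start..end].to_vec(),
        total: all.len(),
        // Assumption: no `next_offset` once the final item has been returned.
        next_offset: if end < all.len() { Some(end) } else { None },
    }
}
```

A client can loop until `next_offset` is `None`. Because the input order is fixed before slicing, repeated reads over an unchanged snapshot return the same pages.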

**Three new Prometheus surfaces.**

| Metric | Type | Description |
|--------|------|-------------|
| `rsigma_fields_observed_total` | counter | Total events scanned by the opt-in field observer. |
| `rsigma_fields_observer_unique_keys` | gauge | Distinct field names currently tracked. |
| `rsigma_fields_observer_overflow_dropped_total` | counter | New-key insert attempts dropped because the observer was at capacity. |

These metrics refresh on every `/metrics` scrape and after every successful `/api/v1/fields/*` call, so a Prometheus alert on `rsigma_fields_observer_overflow_dropped_total` can fire as soon as an operator's `--observe-fields-max-keys` choice proves too low for the deployment.

**Shared extraction with `rsigma rule fields`.** The rule-field side of the join lives in a new `rsigma_eval::fields` module (`RuleFieldSet`) that both the CLI subcommand and the daemon import. The daemon caches the post-pipeline set on `RuntimeEngine` via `ArcSwap` and refreshes it on every successful `load_rules()`, so the HTTP handlers run lock-free against a stable view even during hot reloads.

**Shared join primitive.** `FieldObservation::coverage(&RuleFieldSet) -> FieldCoverage` lives in `rsigma-eval` and partitions an observation snapshot into the unknown / intersection / missing buckets in one pass. Both the daemon's HTTP handlers and the eval report consume this, so the partition semantics cannot drift across runtimes.
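
The three-bucket partition can be illustrated with plain standard-library collections. This sketch does not reproduce `FieldObservation::coverage`: the `Coverage` struct and the free function `coverage` are hypothetical, and it uses two simple loops rather than claiming to match the single-pass implementation.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Hypothetical result type mirroring the three buckets named above.
#[derive(Debug, Default)]
struct Coverage {
    /// Seen in events, referenced by no rule (gap signal), with counts.
    unknown: Vec<(String, u64)>,
    /// Seen in events and referenced by at least one rule.
    intersection: Vec<String>,
    /// Referenced by a rule, never seen in an event (broken coverage).
    missing: Vec<String>,
}

fn coverage(observed: &BTreeMap<String, u64>, rule_fields: &BTreeSet<String>) -> Coverage {
    let mut out = Coverage::default();
    for (field, count) in observed {
        if rule_fields.contains(field) {
            out.intersection.push(field.clone());
        } else {
            out.unknown.push((field.clone(), *count));
        }
    }
    for field in rule_fields {
        if !observed.contains_key(field) {
            out.missing.push(field.clone());
        }
    }
    // Descending count, ties broken by name so the order is stable.
    out.unknown
        .sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    out
}
```

Keeping the partition in one function shared by both callers is what prevents the daemon and the offline report from disagreeing about which bucket a field belongs to.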

**Implementation cost.** Default-off; the engine task takes a single `ArcSwap` load per batch when no observer is attached and skips field iteration entirely. With `--observe-fields` set, the only added work is one `Event::field_keys()` walk per parsed event (one `String` allocation per leaf path, depth-capped at 64; flat formats like `KvEvent` return `Cow::Borrowed`) plus a short `std::sync::Mutex` lock to update counters. Memory is bounded by `--observe-fields-max-keys` (10k default ≈ a few hundred KB; keys stored as `Arc<str>` so snapshots refcount-bump rather than copy).
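
A depth-capped walk that produces dotted leaf paths might look like the sketch below. It is not `Event::field_keys()`: the `Value` enum is a hypothetical stand-in for a parsed event, arrays are omitted, and emitting a truncated path at the depth cap (rather than dropping it) is an assumption. Unlike the real walk described above, it also allocates for intermediate prefixes.

```rust
use std::collections::BTreeMap;

/// Minimal JSON-like value; a hypothetical stand-in for a parsed event.
enum Value {
    /// Any scalar; the payload is irrelevant for key extraction.
    Leaf,
    Object(BTreeMap<String, Value>),
}

/// Depth cap taken from the changelog text.
const MAX_DEPTH: usize = 64;

fn field_keys(root: &Value) -> Vec<String> {
    let mut out = Vec::new();
    walk(root, String::new(), 0, &mut out);
    out
}

fn walk(node: &Value, prefix: String, depth: usize, out: &mut Vec<String>) {
    match node {
        Value::Object(map) if depth < MAX_DEPTH => {
            for (key, child) in map {
                let path = if prefix.is_empty() {
                    key.clone()
                } else {
                    format!("{}.{}", prefix, key)
                };
                walk(child, path, depth + 1, out);
            }
        }
        // Scalars, and (by assumption) objects nested past the cap,
        // are emitted as a single path.
        _ => {
            if !prefix.is_empty() {
                out.push(prefix);
            }
        }
    }
}
```

For an event shaped like `{"host": ..., "process": {"name": ..., "pid": ...}}` the walk yields `host`, `process.name`, and `process.pid`.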

**Offline coverage report.** `rsigma engine eval` mirrors the daemon's field-observability surface with three new flags: `--observe-fields` enables observation; `--observe-fields-max-keys <N>` (default 10000, validated as `NonZeroUsize` so 0 is rejected at parse time); `--observe-fields-report <PATH>` writes the JSON report to a file (defaults to stderr if omitted so detections on stdout stay machine-consumable; clap-`requires` `--observe-fields` so the typo case fails fast). The report has the same shape as `GET /api/v1/fields`, so the same `jq` queries work against either runtime. To make this possible without coupling `engine eval` to the `daemon` Cargo feature, `FieldObserver` lives in `rsigma-eval` (which every consumer already links) and uses `std::sync::Mutex` to keep `rsigma-eval` dependency-light. `rsigma-runtime` keeps a `pub use rsigma_eval::{FieldObserver, FieldObservation, FieldObservationEntry, FieldCoverage}` re-export so existing imports continue to compile unchanged.
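
The parse-time rejection of `0` follows from the standard library's `FromStr` implementation for `NonZeroUsize`. The helper below is a hypothetical value parser written for illustration, not the code clap runs in `rsigma`.

```rust
use std::num::NonZeroUsize;

/// Hypothetical value parser for a `--observe-fields-max-keys`-style argument.
fn parse_max_keys(raw: &str) -> Result<NonZeroUsize, String> {
    // `NonZeroUsize::from_str` fails on "0", negatives, and non-numeric input,
    // so an invalid cap never reaches the observer.
    raw.parse::<NonZeroUsize>()
        .map_err(|e| format!("invalid max-keys value {:?}: {}", raw, e))
}
```

Encoding the constraint in the type means downstream code never has to handle a zero-capacity observer.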

**Docs.** Endpoint reference under "Field observability" in `docs/reference/http-api.md`; flag rows in `docs/cli/engine/daemon.md` and `docs/cli/engine/eval.md`; metric rows in `docs/reference/metrics.md`; combined daemon/eval workflow in `docs/guide/observability.md`.

### Server-side TLS for the daemon API listener (#128)

The `engine daemon` API listener now terminates TLS in-process for every protocol that already shares `--api-addr`: the Axum HTTP REST API (`/healthz`, `/readyz`, `/metrics`, `/api/v1/*`), OTLP/HTTP on `POST /v1/logs`, and OTLP/gRPC via `LogsService/Export`. Operators can drop the sidecar reverse proxy they previously needed for confidentiality, integrity, and agent-to-daemon pinning.
1 change: 1 addition & 0 deletions README.md
@@ -26,6 +26,7 @@ For rule quality and editor integration, a built-in linter validates rules again
* **Post-evaluation enrichment:** Inject contextual data (asset info, IP reputation, identity, GeoIP, runbook URLs, ...) into detection and correlation results via four primitives (`template`, `lookup`, `http`, `command`) with kind-aware template namespaces, response cache, scope filtering, and hot-reload
* **Rule conversion:** Convert rules into backend-native query strings via a pluggable backend trait (PostgreSQL/TimescaleDB SQL, LynxDB)
* **Eval prefilters:** Use optional prefilters for large rule sets, including a bloom filter for substring matchers (`--bloom-prefilter`) and cross-rule Aho-Corasick index for whole-rule pruning (`--cross-rule-ac`, requires `daachorse-index` feature)
* **Field observability:** Opt-in `--observe-fields` mode on both `engine daemon` (live, exposed over `GET /api/v1/fields*` with Prometheus counters) and `engine eval` (one-shot JSON report at end-of-run, ideal for CI gap analysis) surfaces which event fields no rule references (gap signal) and which rule fields have never appeared in an event (broken-coverage signal); same JSON shape across runtimes
* **TLS termination:** Use in-process TLS termination for the daemon API listener (HTTP REST, `/metrics`, OTLP/HTTP, OTLP/gRPC) with optional mutual TLS, `aws-lc-rs` crypto, and cross-platform certificate hot-reload
* **NATS JetStream:** Use NATS JetStream support with authentication (credentials, mTLS), replay, consumer groups, and dead-letter queues
* **OTLP ingestion:** Use OTLP support for any OpenTelemetry-compatible agent (Grafana Alloy, Vector, Fluent Bit, OTel Collector) via HTTP or gRPC
9 changes: 9 additions & 0 deletions crates/rsigma-cli/README.md
@@ -173,6 +173,8 @@ Unlike `engine eval`, the daemon stays alive after stdin reaches EOF and support
| `--bloom-prefilter` | flag | `false` | Enable bloom-filter pre-filtering of positive substring matchers (workload-dependent; see `crates/rsigma-eval/README.md`) |
| `--bloom-max-bytes` | integer | **1048576** | Memory budget for the bloom index (no effect without `--bloom-prefilter`) |
| `--cross-rule-ac` | flag | `false` | Enable cross-rule Aho-Corasick pre-filter (requires `--features daachorse-index`; see `crates/rsigma-eval/README.md`) |
| `--observe-fields` | flag | `false` | Record the field keys of every event evaluated by the engine so `/api/v1/fields*` can report gap and broken-coverage signals. Off by default; when off the engine task does not iterate event fields at all |
| `--observe-fields-max-keys` | integer | **10000** | Hard ceiling on distinct field names tracked by the observer. Overflow drops are counted via `rsigma_fields_observer_overflow_dropped_total`. No effect without `--observe-fields` |
| `--buffer-size` | integer | **10000** | Bounded channel capacity for source-to-engine and engine-to-sink queues |
| `--batch-size` | integer | **1** | Maximum events per engine lock acquisition (reduces mutex overhead under load) |
| `--drain-timeout` | integer | **5** | Seconds to wait for in-flight events to drain on shutdown |
@@ -311,6 +313,10 @@ rsigma engine daemon \
| `/api/v1/sources` | GET | List dynamic sources and their resolution status |
| `/api/v1/sources/resolve` | POST | Trigger re-resolution of all dynamic sources (or specific ones via request body) |
| `/api/v1/sources/cache/{source_id}` | DELETE | Invalidate the cached value for a specific source |
| `/api/v1/fields` | GET | Combined snapshot with summary, unknown (gap signal), and missing (broken coverage). Returns 503 unless `--observe-fields` is set. Paginated via `?limit=&offset=` |
| `/api/v1/fields/unknown` | GET | Event fields no rule references, sorted by descending count. Requires `--observe-fields`. Paginated |
| `/api/v1/fields/missing` | GET | Rule fields never observed in events, with sample rule titles. Requires `--observe-fields`. Paginated |
| `/api/v1/fields/observer` | DELETE | Clear the observer's counters and return `{previous_keys, previous_events}`. Requires `--observe-fields` |
| `/v1/logs` | POST | OTLP log ingestion (`application/x-protobuf` or `application/json`, gzip supported). Requires `daemon-otlp` feature |

**OTLP log ingestion** (requires `daemon-otlp` feature):
@@ -441,6 +447,9 @@ Evaluate JSON events against Sigma detection and correlation rules.
| `--bloom-prefilter` | flag | `false` | Enable bloom-filter pre-filtering of positive substring matchers (see `crates/rsigma-eval/README.md` for the trade-off) |
| `--bloom-max-bytes` | integer | **1048576** | Memory budget for the bloom index (no effect without `--bloom-prefilter`) |
| `--cross-rule-ac` | flag | `false` | Enable cross-rule Aho-Corasick pre-filter (requires `--features daachorse-index`; see `crates/rsigma-eval/README.md`) |
| `--observe-fields` | flag | `false` | Record the field keys of every evaluated event and emit a coverage report at end-of-run (gap signal + broken-coverage signal). Same JSON shape as the daemon's `GET /api/v1/fields` endpoint |
| `--observe-fields-max-keys` | integer | **10000** | Hard ceiling on distinct field names tracked. Overflow is counted via `overflow_dropped` in the report. No effect without `--observe-fields` |
| `--observe-fields-report` | path | none | Path to write the report. Defaults to stderr when omitted so detections on stdout stay machine-consumable. Requires `--observe-fields` |

\* Feature-gated: `logfmt` requires the `logfmt` feature, `cef` requires the `cef` feature, `evtx` requires the `evtx` feature.

32 changes: 32 additions & 0 deletions crates/rsigma-cli/src/commands/daemon.rs
@@ -227,6 +227,30 @@ pub(crate) struct DaemonArgs {
#[arg(long = "bloom-max-bytes")]
pub bloom_max_bytes: Option<usize>,

/// Enable opt-in observation of every event's field keys so the
/// daemon can answer two coverage questions over its admin API:
/// which fields appear in events but are never referenced by any
/// loaded rule (gap signal), and which fields are referenced by
/// rules but have never appeared in an event (broken coverage).
///
/// Off by default. When set, an in-memory counter records the field
/// keys of every event evaluated by the engine task; the counter is
/// hard-capped by `--observe-fields-max-keys` and surfaced via the
/// `/api/v1/fields`, `/api/v1/fields/unknown`, and
/// `/api/v1/fields/missing` endpoints (plus
/// `DELETE /api/v1/fields/observer` to reset).
#[arg(long = "observe-fields")]
pub observe_fields: bool,

/// Hard ceiling on the number of distinct field names tracked by
/// the field observer. Once the ceiling is reached, new keys are
/// dropped (and counted via
/// `rsigma_fields_observer_overflow_dropped_total`); existing keys
/// keep incrementing. Default: 10000. Has no effect unless
/// `--observe-fields` is set.
#[arg(long = "observe-fields-max-keys", default_value_t = 10_000)]
pub observe_fields_max_keys: usize,

/// Enable the cross-rule Aho-Corasick pre-filter (daachorse-index).
///
/// Off by default. When enabled, the engine builds a single
@@ -408,6 +432,8 @@ pub(crate) fn cmd_daemon(args: DaemonArgs) {
allow_remote_include,
bloom_prefilter,
bloom_max_bytes,
observe_fields,
observe_fields_max_keys,
#[cfg(feature = "daachorse-index")]
cross_rule_ac,
enrichers,
@@ -507,6 +533,8 @@ pub(crate) fn cmd_daemon(args: DaemonArgs) {
allow_remote_include,
bloom_prefilter,
bloom_max_bytes,
observe_fields,
observe_fields_max_keys,
#[cfg(feature = "daachorse-index")]
cross_rule_ac,
enrichers,
@@ -560,6 +588,8 @@ fn run_daemon(
allow_remote_include: bool,
bloom_prefilter: bool,
bloom_max_bytes: Option<usize>,
observe_fields: bool,
observe_fields_max_keys: usize,
#[cfg(feature = "daachorse-index")] cross_rule_ac: bool,
enrichers_path: Option<PathBuf>,
source_paths: Vec<PathBuf>,
@@ -685,6 +715,8 @@ fn run_daemon(
allow_remote_include,
bloom_prefilter,
bloom_max_bytes,
observe_fields,
observe_fields_max_keys,
#[cfg(feature = "daachorse-index")]
cross_rule_ac,
enrichers_path,