Releases: timescale/rsigma
v0.12.0
TL;DR
RSigma v0.12.0 is the "operability, performance, and documentation" release:
- Comprehensive daemon and CLI observability: tower-http API access logs, per-request OTLP tracing, batch processing spans, source resolution spans, DLQ visibility, NATS and sink lifecycle events, correlation state eviction warnings, rule load diagnostics, daemon lifecycle logs, and a global --log-format flag for non-daemon subcommands.
- Eval rule loading is no longer O(N²): Engine::add_rule is amortized O(1), and bulk loaders (Engine::add_rules, extend_compiled_rules, add_collection) rebuild indexes exactly once per batch. The full 3,120-rule SigmaHQ corpus that previously appeared to hang now loads in ~120 ms.
- CLI subcommands reorganized into five noun-led groups (engine, rule, backend, pipeline, attack). Flat aliases continue to work as deprecated forwarders for one release.
- Full documentation site live at https://timescale.github.io/rsigma/: 47 pages spanning Getting Started, User Guide, CLI Reference, Library API, Developers, Reference (including a 66-rule lint catalogue and a 27-metric Prometheus catalogue), Deployment, Editors, and Ecosystem. Built from docs/ on every merge to main via the new .github/workflows/docs.yml.
- Test reliability: cli_daemon_http and cli_daemon_otlp E2E suites are now flake-free on macOS under load.
- Dependency bumps: opentelemetry-proto 0.31.0 to 0.32.0, async-nats 0.47 to 0.48, yamlpath/yamlpatch 1.25.2 (with the serde_yaml cargo rename replaced by yaml_serde directly), tokio 1.52.3, jsonschema 0.46.4, tower-http 0.6.10, tonic 0.14.6.
Daemon and CLI observability (PR #107)
The daemon and CLI ship with structured logs, distributed tracing spans, and profiling hooks across the three observability pillars. All new instrumentation flows through the existing tracing-subscriber (JSON, env-filter) and is controlled via RUST_LOG. Spans are designed to be consumable by future tokio-console or tracing-opentelemetry exporters without code changes.
Phases. One commit per phase, in landing order:
| Phase | Scope |
|---|---|
| HTTP API access logs | tower-http::TraceLayer::new_for_http() on the Axum router; each request produces a span with method, URI, status, and latency |
| Event pipeline | Per-batch debug span (batch_size, input_format, match count, elapsed_ms); DLQ parse-failure debug events; checked DLQ channel send with warn-on-closed; DLQ task lifecycle logging |
| Source resolution | InstrumentedResolver debug span (source_id, source_type); cache hit / fetch boundary events; refresh scheduler cycle completion logs (sources, duration_ms) |
| Correlation memory pressure | Warn on hard-cap eviction (current count, max, evicted, target capacity) so high-cardinality traffic causing data loss is no longer silent |
| NATS, sinks, backpressure | NATS source/sink publish and ack events; spawn_source backpressure warn alongside the existing metric; Sink::FanOut per-sink labels (sink_index, sink_type, error) |
| Rule load diagnostics | load_rules info span (rules_path, duration_ms); first three parse error details when bad rules fail to compile |
| OTLP per-request tracing | otlp_ingest debug span on both HTTP and gRPC handlers; record_count event after decoding ExportLogsServiceRequest |
| Daemon lifecycle | Health state transitions; file watcher errors; reload-channel coalesce vs closed events; periodic state snapshot duration and serialized size; SQLite migration column events; per-task shutdown-join logs |
| --log-format for CLI | Global --log-format <json\|text> initializes a stderr subscriber on non-daemon subcommands. engine eval, rule validate, and rule lint emit info events on completion (rules loaded, validation totals, lint summary) when a subscriber is installed. The daemon always logs JSON, so the flag is a no-op there. |
Verbosity targets.
| RUST_LOG filter | Surfaces |
|---|---|
| info,tower_http=debug | HTTP API access logs |
| info,rsigma=debug | Batch processing spans, DLQ routing, OTLP per-request fields, snapshot save duration |
| info,rsigma_runtime::sources=debug | Dynamic source resolution and refresh scheduler |
| info,rsigma_eval=debug | Correlation engine internals |
Span correctness fix. Holding an EnteredSpan guard from Span::enter() across .await is an anti-pattern on the multi-threaded tokio runtime: when the task is suspended, the thread-local span context can leak into other tasks scheduled on the same thread, producing incorrect span nesting. InstrumentedResolver::resolve, the OTLP HTTP and gRPC handlers, and the engine batch loop now use .instrument() on async blocks instead. Span fields, event payloads, and runtime behavior are unchanged.
Documentation. A new Observability section in the root README and an updated Logging paragraph in the CLI README list the supported RUST_LOG filter targets and document the new --log-format flag.
Eval rule loading performance (PRs #119, #121, #122, #123)
Loading rules into an engine is no longer O(N²) in the rule count.
Batched loaders rebuild indexes exactly once. New Engine::add_rules (compiles each rule with the configured pipelines and collects per-rule compile errors without aborting the batch) and Engine::extend_compiled_rules (pre-compiled equivalent) rebuild the inverted index and per-field bloom exactly once at the end of the batch. Engine::add_collection, the rsigma rule validate path, and the rsigma engine eval rule load path now route through these APIs so the daemon and every RuntimeEngine caller share the one-rebuild fast path. Loading the SigmaHQ corpus (~3,120 rules) used to pay around 3K full index rebuilds and appeared to hang; it now completes in roughly 120 ms.
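To make the difference between per-rule and per-batch rebuilds concrete, here is a toy model. It is a sketch only, not the rsigma API: `ToyEngine`, its fields, and the `field=value` "compile" step are hypothetical stand-ins chosen so the example runs with the standard library alone.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for an engine with a field -> rule-index inverted index.
#[derive(Default)]
struct ToyEngine {
    rules: Vec<(String, String)>, // (field, value) pairs standing in for compiled rules
    index: HashMap<String, Vec<usize>>,
    rebuilds: usize, // counts full index rebuilds
}

impl ToyEngine {
    /// Full rebuild: cost grows with the number of rules already loaded.
    fn rebuild_index(&mut self) {
        self.index.clear();
        for (i, (field, _)) in self.rules.iter().enumerate() {
            self.index.entry(field.clone()).or_default().push(i);
        }
        self.rebuilds += 1;
    }

    /// Toy "compile": a rule is `field=value`; anything else is a compile error.
    fn compile(src: &str) -> Result<(String, String), String> {
        match src.split_once('=') {
            Some((f, v)) if !f.is_empty() => Ok((f.to_string(), v.to_string())),
            _ => Err(format!("cannot compile rule: {src:?}")),
        }
    }

    /// Naive single-rule path: one full rebuild per rule, O(N^2) when called in a loop.
    fn add_rule_naive(&mut self, src: &str) -> Result<(), String> {
        let rule = Self::compile(src)?;
        self.rules.push(rule);
        self.rebuild_index();
        Ok(())
    }

    /// Batched path: collect per-rule errors without aborting, then rebuild once.
    fn add_rules(&mut self, sources: &[&str]) -> Vec<String> {
        let mut errors = Vec::new();
        for src in sources {
            match Self::compile(src) {
                Ok(rule) => self.rules.push(rule),
                Err(e) => errors.push(e),
            }
        }
        self.rebuild_index();
        errors
    }
}
```

Loading three rules through `add_rule_naive` performs three rebuilds, while `add_rules` performs one regardless of batch size and still reports the rules that failed to compile.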
Single-rule add path is amortized O(1). Engine::add_rule and Engine::add_compiled_rule no longer rebuild the indexes from scratch on every push. They fold the new rule into the inverted index incrementally via the new RuleIndex::append_rule(rule_idx, rule) primitive, and into the per-field bloom via FieldBloomIndex::append_rule(rule). The bloom uses a doubling watermark with a 64-rule floor to schedule full rebuilds when the rule count has at least doubled past the last rebuild, capping false-positive-rate drift while keeping the amortized per-rule cost O(1). Rules that introduce a brand-new indexed field get a fresh bloom on the fly.
| Rules | add_collection | add_rules | add_rule loop |
|---|---|---|---|
| 1,000 | 1.15 ms | 1.17 ms | 1.64 ms |
| 10,000 | 11.82 ms | 11.85 ms | 17.23 ms |
| 100,000 | 121.65 ms | 122.13 ms | 166.07 ms |
(M4 Pro, release build. Run via cargo bench -p rsigma-eval --bench eval -- rule_load.)
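The doubling watermark used by the single-rule bloom path can be sketched as follows. This is one plausible reading of "doubling watermark with a 64-rule floor", not the rsigma implementation: `RebuildWatermark`, `on_append`, and `count_rebuilds` are hypothetical names, and the exact threshold rule is an assumption.

```rust
/// Assumed minimum rule count before the first scheduled rebuild (the "64-rule floor").
const REBUILD_FLOOR: usize = 64;

/// Hypothetical tracker for when the bloom was last rebuilt from scratch.
struct RebuildWatermark {
    rules_at_last_rebuild: usize,
}

impl RebuildWatermark {
    fn new() -> Self {
        Self { rules_at_last_rebuild: 0 }
    }

    /// Call after each appended rule. Returns true when a full rebuild is due,
    /// i.e. the rule count has at least doubled since the last rebuild
    /// and has reached the floor.
    fn on_append(&mut self, rule_count: usize) -> bool {
        let threshold = REBUILD_FLOOR.max(self.rules_at_last_rebuild.saturating_mul(2));
        if rule_count >= threshold {
            self.rules_at_last_rebuild = rule_count;
            true
        } else {
            false
        }
    }
}

/// Number of full rebuilds triggered while appending `total` rules one at a time.
fn count_rebuilds(total: usize) -> usize {
    let mut wm = RebuildWatermark::new();
    (1..=total).filter(|&n| wm.on_append(n)).count()
}
```

Under this reading, appending 1,000 rules one by one triggers rebuilds at 64, 128, 256, and 512 rules. Each rebuild costs time proportional to the current rule count, so the total rebuild work is a geometric sum bounded by a constant multiple of the final count, which is what makes the per-rule cost amortized O(1).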
When cross_rule_ac_enabled is on, the daachorse cross-rule index has no incremental update story, so the single-rule add path falls back to a full Engine::rebuild_index. Bulk loaders are unaffected.
Correctness. Between bloom rebuilds, probes may answer MaybeMatch where the batched-rebuild path would answer DefinitelyNoMatch. Both verdicts are correct (MaybeMatch is always safe); the engine just evaluates the rule directly instead of short-circuiting. The new differential test append_rule_matches_build_verdicts pins this property by checking that positive verdicts match exactly and that disjoint haystacks are still rejected at >= 90% under incremental builds.
Benchmarks. A new rule_load Criterion group compares the three load entry points at 1K / 10K / 100K rules. Numbers recorded in BENCHMARKS.md under the Rule Load Paths (0.11.x) subsection.
CLI command groups (PR #124)
The 12 flat top-level subcommands are reorganized into five noun-led command groups so the CLI scales as more subcommands arrive. The flat aliases continue to work for one release as visible-deprecated forwarders, are hidden in the next release, and are removed in v1.0. Every existing invocation keeps working, so there is no breaking change in this release.
$ rsigma
Parse, validate, and evaluate Sigma detection rules
Usage: rsigma [OPTIONS] <COMMAND>
Commands:
engine Run rules against events (eval / daemon)
rule Inspect and operate on Sigma rule files
backend Convert Sigma rules to backend-native queries
pipeline Pipeline tooling (resolve dynamic sources, …)
attack MITRE ATT&CK tooling (reserved; populated by the ATT&CK contributor PR)
eval [deprecated] Use `rsigma engine eval` instead
daemon [deprecated] Use `rsigma engine daemon` instead
parse [deprecated] Use `rsigma rule parse` instead
validate [deprecated] Use `rsigma rule validate` instead
lint [deprecated] Use `rsigma rule lint` instead
fields [deprecated] Use `rsigma rule fields` instead
condition [deprecated] Use `rsigma rule condition` instead
stdin [deprecated] Use `rsigma rule stdin` instead
convert [deprecated] Use `rsigma backend convert` instead
list-targets [deprecated] Use `rsigma backend targets` instead
list-formats [deprecated] Use `rsigma backend formats` instead
resolve [deprecated] Use `rsigma pipeline resolve` instead
help Print this message or the help of the given subcommand(s)
Options:
--log-format <LOG_FORMAT> Emit structured diagnostic logs to stderr (for CI / log aggregation) [possible values: json, text]
-h, --help Print help (see more with '--help')
-V, --version Print version
Migration:
| Old (flat) | New (grouped) |
|---|---|
| rsigma eval ... | rsigma engine eval ... |
| rsigma daemon ... | rsigma engine daemon ... |
| rsigma parse ... | rsigma rule parse ... |
| rsigma validate ... | `r... |
v0.11.0
TL;DR
RSigma v0.11.0 is the "eval performance" release:
- Matcher optimizer: batches |contains lists into Aho-Corasick automata, groups sibling regex matchers into RegexSet DFAs, and eliminates redundant to_lowercase() calls via shared case-folding groups.
- Opt-in bloom filter pre-filtering for substring matchers, skipping entire detection items when trigrams cannot match.
- Opt-in cross-rule Aho-Corasick prefilter via daachorse (behind the daachorse-index feature flag), pruning entire rules before evaluation with up to ~100x speedup on substring-heavy workloads.
- Security hardening for dynamic pipeline sources: 10 MB body/payload caps on HTTP, command stdout, and NATS; 30-second command execution timeout; 1-second refresh interval floor. Closes all v0.10.0 Known Limitations.
- Parser fix: the unsupported |not modifier is now rejected with guidance toward condition-level negation.
- Dependency bumps: criterion 0.5.1 to 0.8.2, jsonschema 0.42.2 to 0.46.3.
What's New
Matcher optimizer (PRs #99, #100, #101, #105)
The compiler now includes an optimization pass that restructures AnyOf matcher trees for better runtime performance. The optimizer is always on and preserves evaluation semantics exactly. Three transformations are applied in order:
Aho-Corasick batching. When an AnyOf node contains 8 or more plain |contains children with the same case sensitivity, they are collapsed into a single Aho-Corasick automaton (AhoCorasickSet). Instead of N sequential substring scans, the engine makes one linear pass over the haystack. The threshold of 8 was chosen empirically from a benchmark sweep: below 8 patterns, sequential str::contains with SIMD acceleration (memchr / Two-Way) is faster; at 8 and above, throughput flattens because the AC automaton scans once regardless of pattern count.
| Patterns | h=100 B | h=1 KB | h=8 KB | h=64 KB |
|---|---|---|---|---|
| 1 | 13.0 Melem/s | 7.77 Melem/s | 1.85 Melem/s | 248 Kelem/s |
| 4 | 9.08 Melem/s | 2.03 Melem/s | 293 Kelem/s | 35.6 Kelem/s |
| 8 | 5.17 Melem/s | 620 Kelem/s | 79.0 Kelem/s | 9.76 Kelem/s |
| 16 | 5.19 Melem/s | 628 Kelem/s | 78.6 Kelem/s | 9.67 Kelem/s |
| 32 | 4.99 Melem/s | 607 Kelem/s | 76.4 Kelem/s | 8.88 Kelem/s |
RegexSet batching. When an AnyOf node contains 3 or more |re children, they are collapsed into a single RegexSet DFA. One DFA pass replaces N independent regex evaluations. Falls back to individual matchers if set construction fails.
Case-insensitive grouping. After AC and RegexSet restructuring, if 2 or more surviving children are all case-insensitive and "pre-lowerable," they are wrapped in a CaseInsensitiveGroup. The haystack is lowered once via ascii_lowercase_cow (borrow-if-already-lower fast path), and all children use matches_pre_lowered against the shared lowered string, eliminating repeated allocation.
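The "lower once, borrow if already lower" idea behind the case-insensitive group can be sketched with the standard library. The function bodies below are illustrative guesses at the technique; `ascii_lowercase_cow` is named in these notes but its real signature is not shown, and `any_contains_pre_lowered` is a hypothetical helper.

```rust
use std::borrow::Cow;

/// Lowercase ASCII letters, borrowing the input when nothing needs to change.
fn ascii_lowercase_cow(s: &str) -> Cow<'_, str> {
    if s.bytes().any(|b| b.is_ascii_uppercase()) {
        Cow::Owned(s.to_ascii_lowercase()) // allocate only when an uppercase byte exists
    } else {
        Cow::Borrowed(s) // fast path: no allocation
    }
}

/// OR over substring needles that were already lowered at compile time.
/// The haystack is lowered once and shared by every child matcher.
fn any_contains_pre_lowered(haystack: &str, lowered_needles: &[&str]) -> bool {
    let lowered = ascii_lowercase_cow(haystack);
    lowered_needles.iter().any(|n| lowered.contains(*n))
}
```

With N case-insensitive children, the haystack is allocated at most once instead of up to N times, and not at all when it is already lowercase.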
The optimizer only applies to AnyOf (OR) groups, never to AllOf (AND). This is a correctness constraint: collapsing AND-of-contains into AC with any-match semantics would change the logic.
Correctness guarantee. A new differential fuzz target (fuzz_eval_matcher_diff) asserts that optimize_any_of(matchers) produces identical match results to AnyOf(matchers) for arbitrary needle sets, haystacks, and case sensitivity.
Bloom filter pre-filtering (PRs #102, #104)
An opt-in trigram-based bloom index that can skip expensive substring matching before it starts. The bloom filter operates at the detection-item level, inside evaluate_rule.
How it works. At rule load time, the engine extracts positive substring needles (|contains, |startswith, |endswith, and AhoCorasickSet needles) from all compiled rules and inserts every 3-byte trigram into a per-field bloom filter (double hashing from AHash-derived pairs). At eval time, for each string field value, the engine slides trigrams over the lowered haystack; if no trigram from any pattern is present in the bloom, the item returns DefinitelyNoMatch and the matcher is skipped entirely.
One-sided correctness. The bloom filter has no false negatives for "definitely no match." If it says MaybeMatch, the full matcher runs as usual. Negated branches, non-string fields, and short/huge values conservatively return MaybeMatch.
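A minimal sketch of a trigram bloom with one-sided verdicts follows. It is not the rsigma implementation: `TrigramBloom` and its methods are hypothetical, the standard library's `DefaultHasher` is swapped in for AHash, and the filter size and hash count are arbitrary.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug, PartialEq)]
enum Verdict {
    MaybeMatch,
    DefinitelyNoMatch,
}

/// Hypothetical per-field trigram bloom filter.
struct TrigramBloom {
    bits: Vec<u64>,
    num_bits: u64, // must be > 0
    num_hashes: u64,
    /// Set when a needle is too short to yield a trigram, so nothing can be ruled out.
    always_maybe: bool,
}

impl TrigramBloom {
    fn new(num_bits: u64, num_hashes: u64) -> Self {
        let words = ((num_bits + 63) / 64) as usize;
        Self { bits: vec![0; words], num_bits, num_hashes, always_maybe: false }
    }

    /// Double hashing: position_i = (h1 + i * h2) mod num_bits.
    fn positions(&self, trigram: &[u8]) -> Vec<u64> {
        let mut hasher = DefaultHasher::new();
        trigram.hash(&mut hasher);
        let h = hasher.finish();
        let (h1, h2) = (h & 0xffff_ffff, (h >> 32) | 1);
        (0..self.num_hashes).map(|i| (h1 + i * h2) % self.num_bits).collect()
    }

    /// Insert every 3-byte window of the lowered needle.
    fn insert_needle(&mut self, needle: &str) {
        let lowered = needle.to_ascii_lowercase();
        let bytes = lowered.as_bytes();
        if bytes.len() < 3 {
            self.always_maybe = true;
            return;
        }
        for trigram in bytes.windows(3) {
            for pos in self.positions(trigram) {
                self.bits[(pos / 64) as usize] |= 1u64 << (pos % 64);
            }
        }
    }

    fn contains(&self, trigram: &[u8]) -> bool {
        self.positions(trigram)
            .into_iter()
            .all(|pos| (self.bits[(pos / 64) as usize] & (1u64 << (pos % 64))) != 0)
    }

    /// Slide trigrams over the lowered haystack; short values stay conservative.
    fn probe(&self, haystack: &str) -> Verdict {
        let lowered = haystack.to_ascii_lowercase();
        let bytes = lowered.as_bytes();
        if self.always_maybe || bytes.len() < 3 {
            return Verdict::MaybeMatch;
        }
        if bytes.windows(3).any(|t| self.contains(t)) {
            Verdict::MaybeMatch
        } else {
            Verdict::DefinitelyNoMatch
        }
    }
}
```

Because every trigram of every inserted needle sets its bits, a haystack that contains a needle always has at least one trigram that passes `contains`, so `DefinitelyNoMatch` is never returned for a true match. False positives only cost a redundant matcher run.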
Memory budget. Default total budget is 1 MiB (DEFAULT_MAX_TOTAL_BYTES), with a 64 KiB per-field cap. If the total exceeds the budget, fields with the worst bits-per-pattern density are dropped first. The budget is configurable via Engine::set_bloom_max_bytes.
CLI flags.
rsigma eval -r rules/ -e @events.json --bloom-prefilter
rsigma eval -r rules/ -e @events.json --bloom-prefilter --bloom-max-bytes 131072
rsigma daemon -r rules/ --bloom-prefilter
rsigma daemon -r rules/ --bloom-prefilter --bloom-max-bytes 2097152
When to enable. The bloom index adds approximately 1 microsecond of per-event trigram probing overhead. It pays off when you have many substring-heavy rules and most events do not match (the common case for threat intel feeds against high-volume telemetry). Benchmark with your own data before enabling in production.
Cross-rule Aho-Corasick prefilter (PR #106)
An opt-in whole-rule prefilter that prunes entire rules before evaluate_rule runs. This is distinct from the per-item matcher optimizer and the per-item bloom filter: it operates at the rule level.
How it works. At index build time, the engine collects all positive substring needles (lowered) from every rule and builds one DoubleArrayAhoCorasick<u32> automaton per field using the daachorse crate. Pattern IDs map back to rule indices. At eval time, for each indexed field with a string value, one overlapping scan on the lowered haystack marks which rules had at least one pattern hit. Rules that are "AC-prunable" (all detections consist exclusively of positive substring matchers, no negation in conditions, no field-less keywords) and received zero hits are skipped entirely.
Benchmark results. 200 non-matching events against N pure-substring rules (best-case workload):
| Rules | Off (default) | On (--cross-rule-ac) | Speedup |
|---|---|---|---|
| 1,000 | 17.34 ms (11.5 Kelem/s) | 253.0 us (790 Kelem/s) | ~68x |
| 5,000 | 85.51 ms (2.34 Kelem/s) | 883.0 us (226 Kelem/s) | ~97x |
| 10,000 | 173.37 ms (1.15 Kelem/s) | 1.71 ms (117 Kelem/s) | ~101x |
The cross-rule index turns O(rules x patterns) per event into O(haystack_length) for the AC scan, so throughput is essentially constant in rule count.
Feature flag. The daachorse dependency is optional and gated behind the daachorse-index Cargo feature. Build with:
cargo install rsigma --features daachorse-index
# or
cargo build --release --features daachorse-index
CLI flags.
rsigma eval -r rules/ -e @events.json --cross-rule-ac
rsigma daemon -r rules/ --cross-rule-ac
When to enable. This is off by default. For typical mixed workloads (substring + exact + regex rules, events that hit multiple fields, smaller rule sets), the index adds build-time and lookup overhead with smaller wins or none, and can cause a slowdown. Enable for large (5K+ rules), substring-heavy, shared-pattern packs where most events do not match. Always benchmark against representative data first.
Composition. The three prefilter layers stack: the rule index narrows by exact field values, the cross-rule AC narrows by substring patterns, and the bloom filter skips individual detection items. All three can be enabled simultaneously; regression tests assert that the combined output matches the no-prefilter baseline.
Security hardening for dynamic pipeline sources (PR #96)
This release closes all four items listed under "Known Limitations" in the v0.10.0 release notes. Dynamic pipeline sources that fetch from HTTP, command, or NATS now enforce resource limits.
HTTP response body size limit. Responses are capped at 10 MB (MAX_SOURCE_RESPONSE_BYTES). If the server advertises a Content-Length exceeding the limit, the response is rejected without buffering the body. During streaming, if the accumulated body exceeds the limit, the connection is dropped. A 30-second client timeout is also enforced.
Command execution timeout and stdout size limit. Command sources are killed after 30 seconds (DEFAULT_COMMAND_TIMEOUT). Stdout is read in 8 KB chunks and capped at 10 MB; exceeding the limit kills the child process. Stderr is separately capped at 64 KB to prevent a chatty failing command from exhausting memory.
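The chunked, capped read described above can be sketched against any `std::io::Read`. This is an illustration, not the rsigma code: `read_capped` and `CHUNK_SIZE` are hypothetical names, and killing the child process on overflow is left to the caller.

```rust
use std::io::{self, Read};

/// Read size per iteration (8 KB, matching the chunk size mentioned above).
const CHUNK_SIZE: usize = 8 * 1024;

/// Read everything from `reader`, failing as soon as more than `max_bytes` arrive.
/// The buffer never grows past the cap, so a runaway producer cannot exhaust memory.
fn read_capped<R: Read>(mut reader: R, max_bytes: usize) -> io::Result<Vec<u8>> {
    let mut out = Vec::new();
    let mut chunk = [0u8; CHUNK_SIZE];
    loop {
        let n = match reader.read(&mut chunk) {
            Ok(0) => return Ok(out), // end of stream
            Ok(n) => n,
            Err(e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        };
        if out.len() + n > max_bytes {
            return Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("output exceeds {max_bytes} byte limit"),
            ));
        }
        out.extend_from_slice(&chunk[..n]);
    }
}
```

A caller wrapping a child process would pass the child's stdout handle and a 10 MB cap, and kill the child when this function returns an error.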
NATS message payload size limit. NATS messages exceeding 10 MB are rejected before parsing.
Refresh interval floor. Source refresh intervals below 1 second are clamped to 1 second with a structured warning log. This prevents config mistakes or hostile configs from causing tight polling loops.
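The clamping rule is small enough to show in full. The names `MIN_REFRESH_INTERVAL` and `clamp_refresh_interval` are hypothetical; the returned flag stands in for the structured warning the daemon logs.

```rust
use std::time::Duration;

/// Floor for source refresh intervals (1 second, per the notes above).
const MIN_REFRESH_INTERVAL: Duration = Duration::from_secs(1);

/// Returns the effective interval and whether the requested value was clamped.
fn clamp_refresh_interval(requested: Duration) -> (Duration, bool) {
    if requested < MIN_REFRESH_INTERVAL {
        (MIN_REFRESH_INTERVAL, true) // caller would emit a warning here
    } else {
        (requested, false)
    }
}
```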
All limits use a new SourceErrorKind::ResourceLimit variant with descriptive messages. Integration tests validate timeout killing, stdout size rejection, and NATS payload rejection.
Parser: reject |not modifier (PR #103)
Writing field|not: value in a Sigma rule is a common mistake. The not keyword is a condition-level operator, not a value modifier. Previously this would produce a generic "unknown modifier" error. Now the parser returns a dedicated NotIsNotAModifier error with guidance:
not is not a value modifier in Sigma; express negation in the condition (e.g. not selection) or move the inverted check into a separate detection used as a filter (e.g. selection and not other)
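The shape of the check can be sketched as a special case ahead of the generic unknown-modifier path. Only the `NotIsNotAModifier` variant name comes from these notes; `ModifierError`, `parse_field_key`, and the abbreviated `KNOWN_MODIFIERS` list are hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum ModifierError {
    /// Dedicated error so the message can point at condition-level negation.
    NotIsNotAModifier,
    Unknown(String),
}

/// A small subset of Sigma value modifiers, enough for the example.
const KNOWN_MODIFIERS: &[&str] = &["contains", "startswith", "endswith", "re", "all"];

/// Split `field|mod1|mod2` into the field name and its modifiers.
fn parse_field_key(key: &str) -> Result<(&str, Vec<&str>), ModifierError> {
    let mut parts = key.split('|');
    let field = parts.next().unwrap_or("");
    let mut modifiers = Vec::new();
    for m in parts {
        if m == "not" {
            // Checked first so the user gets guidance instead of a generic error.
            return Err(ModifierError::NotIsNotAModifier);
        }
        if !KNOWN_MODIFIERS.contains(&m) {
            return Err(ModifierError::Unknown(m.to_string()));
        }
        modifiers.push(m);
    }
    Ok((field, modifiers))
}
```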
Regression test suite (PRs #105, #106)
A new regression_eval.rs test file (459 lines) locks down optimizer and prefilter correctness with differential tests:
| Test | What it validates |
|------|---------...
v0.10.0
TL;DR
RSigma v0.10.0 is the "dynamic pipelines" release:
- Dynamic Sigma Pipelines: declare HTTP, command, file, and NATS sources inside pipeline YAML, with template expansion, include directives, TTL caching, background refresh, and three extract languages (jq, JSONPath, CEL).
- A new rsigma resolve CLI command and full daemon integration with Prometheus instrumentation.
- Native EVTX input: evaluate Sigma rules directly against Windows Event Log binary files.
- Pipeline hot-reload: the daemon now watches pipeline files alongside rules.
- Builtin pipelines: ecs_windows and sysmon embedded at compile time.
- Comprehensive fuzz testing: 14 cargo-fuzz harnesses covering all untrusted input surfaces.
- Security hardening: SQL injection prevention, recursion limits, condition DoS caps, SIGTERM handler, and event size limits.
- CI and supply chain: MSRV enforcement, cargo-deny, serde_yaml migration, Dependabot, SECURITY.md, and CONTRIBUTING.md.
What's New
Dynamic Sigma Pipelines (PRs #86-#93)
Pipelines can now declare external data sources that are resolved at runtime and injected into pipeline fields via template expansion. This is a capability unique to RSigma: no other Sigma engine supports dynamic processing pipelines.
Four source types. A new sources section in pipeline YAML declares named data sources:
sources:
threat_intel:
type: http
url: https://feeds.example.com/iocs.json
format: json
extract:
expr: ".indicators[].value"
type: jsonpath
refresh:
interval: 300
on_error: use_cached
required: false

| Source type | Description |
|---|---|
| http | Fetch from a URL (GET/POST) with optional headers |
| command | Execute a local command and capture stdout |
| file | Read from a local file path |
| nats | Subscribe to a NATS subject for push-based updates |
Template expansion. Pipeline field values reference resolved source data via ${source.threat_intel} syntax. Templates are expanded after all sources resolve, before the pipeline is applied to rules.
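As a rough illustration of the expansion step, the sketch below substitutes `${source.<id>}` placeholders from a map of already-resolved values. It assumes each source resolves to a single string; how rsigma injects lists or structured data is not covered here, and `expand_templates` is a hypothetical name.

```rust
use std::collections::HashMap;

/// Replace every `${source.<id>}` in `input` with the resolved value for `<id>`.
/// Unknown source IDs and unterminated placeholders are reported as errors.
fn expand_templates(input: &str, sources: &HashMap<String, String>) -> Result<String, String> {
    const OPEN: &str = "${source.";
    let mut out = String::with_capacity(input.len());
    let mut rest = input;
    while let Some(start) = rest.find(OPEN) {
        out.push_str(&rest[..start]); // copy literal text before the placeholder
        let after = &rest[start + OPEN.len()..];
        let end = after
            .find('}')
            .ok_or_else(|| "unterminated source template".to_string())?;
        let id = &after[..end];
        let value = sources
            .get(id)
            .ok_or_else(|| format!("unknown source: {id}"))?;
        out.push_str(value);
        rest = &after[end + 1..];
    }
    out.push_str(rest);
    Ok(out)
}
```

Running expansion only after every source has resolved, as described above, means a missing ID is a configuration error rather than a timing issue.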
Three extract languages. Source responses can be filtered before injection:
| Type | Engine | Example |
|---|---|---|
| jq (default) | jaq | .records[] \| .ip |
| jsonpath | jsonpath-rust | $.indicators[*].value |
| cel | cel-interpreter | data.filter(x, x.severity > 3) |
Include directives. Pipelines can include other pipeline fragments via include sources, with a recursive depth limit of 1. Remote includes (HTTP, NATS) require the --allow-remote-include daemon flag.
TTL-based caching. Resolved source data is cached in SQLite with configurable TTL. A cache invalidation API allows on-demand refresh without waiting for expiry.
Background refresh. After startup, sources refresh on their configured interval in the background. Failures for non-required sources do not block the pipeline; the last cached value is used (configurable via on_error: use_cached | fail | ignore).
SIGHUP re-resolution. Sending SIGHUP to the daemon triggers both a rule reload and a full source re-resolution cycle.
NATS control subject. A NATS message on a configurable control subject triggers source re-resolution, enabling external orchestration of pipeline updates.
rsigma resolve command (PR #88). A new CLI subcommand resolves dynamic sources and prints results:
# Resolve all sources in a pipeline
rsigma resolve -p pipelines/dynamic_threat_intel.yml
# Resolve a specific source by ID
rsigma resolve -p pipelines/dynamic_threat_intel.yml -s threat_intel --pretty
# Dry-run: show source metadata without resolving
rsigma resolve -p pipelines/dynamic_threat_intel.yml --dry-run
rsigma validate --resolve-sources (PR #88). Validate that pipeline sources can be resolved successfully alongside rule validation.
Prometheus metrics (PR #88). Five new metrics track source resolution in the daemon:
| Metric | Labels | Description |
|---|---|---|
| rsigma_source_resolves_total | source_id, source_type | Total source resolution attempts |
| rsigma_source_resolve_errors_total | source_id, error_kind | Resolution errors by kind (Fetch, Parse, Extract, Timeout) |
| rsigma_source_resolve_seconds | | Resolution latency histogram |
| rsigma_source_cache_hits_total | | Cache hit counter |
| rsigma_source_last_resolved_timestamp | source_id | Unix timestamp of last successful resolution |
/api/v1/status extension (PR #88). The status endpoint now includes a dynamic_sources summary when sources are configured:
{
"status": "running",
"dynamic_sources": {
"total": 3,
"resolves_total": 42,
"errors_total": 1,
"cache_hits": 38
}
}

Full test coverage. Integration and E2E tests validate the entire dynamic pipeline lifecycle against real daemon instances (PR #90). Criterion benchmarks measure resolution throughput and template expansion overhead (PR #91). Seven dedicated fuzz targets cover source YAML parsing, template expansion, extract expressions, include parsing, and HTTP response handling (PR #92). SigmaHQ corpus regression validates that dynamic pipelines do not regress existing static pipeline behavior (PR #93).
EVTX input adapter (PR #85)
RSigma can now evaluate Sigma rules directly against Windows Event Log binary files (.evtx). The adapter uses the evtx crate to parse the binary format and yield JSON records that feed directly into the detection engine.
# Evaluate rules against a Windows Event Log file
rsigma eval -r rules/windows/ -e @Security.evtx
# Works with pipelines
rsigma eval -r rules/ -p sysmon -e @Microsoft-Windows-Sysmon%4Operational.evtx
Auto-detection is extension-based: any @path argument ending in .evtx (case-insensitive) is routed through the EVTX parser. The feature is compile-time gated behind the evtx feature flag (included in default features).
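The extension check can be sketched in a few lines. `is_evtx_argument` is a hypothetical helper written for illustration; only the `@path` convention and the case-insensitive `.evtx` rule come from the notes above.

```rust
use std::path::Path;

/// True when `arg` is an `@path` argument whose extension is `.evtx` (any case).
fn is_evtx_argument(arg: &str) -> bool {
    match arg.strip_prefix('@') {
        Some(path) => Path::new(path)
            .extension()
            .and_then(|ext| ext.to_str())
            .map_or(false, |ext| ext.eq_ignore_ascii_case("evtx")),
        None => false, // not a file argument at all
    }
}
```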
Pipeline hot-reload (PR #68)
The daemon file watcher now monitors pipeline YAML files alongside the rules directory. Changes to any referenced pipeline file trigger the same debounced reload cycle as rule changes:
- Filesystem events on watched .yml/.yaml files (500 ms debounce)
- SIGHUP signal (Unix)
- POST /api/v1/reload endpoint
If a pipeline file fails to parse during reload, the old engine configuration is preserved and rsigma_reloads_failed_total is incremented.
Builtin pipelines (ecs_windows, sysmon) are embedded at compile time and excluded from the file watcher.
Bundled pipelines (PR #69)
Two processing pipelines are now embedded in the binary via include_str!():
| Name | Description |
|---|---|
| ecs_windows | Sigma/Sysmon field names to Elastic Common Schema (process creation, network, file, registry, DNS, pipe, driver, remote thread, process access) |
| sysmon | Adds EventID conditions for logsource-to-Sysmon-event routing |
Reference them by name instead of a file path:
rsigma eval -r rules/ -p ecs_windows -e @events.json
rsigma daemon -r rules/ -p sysmon
rsigma convert -r rules/ -t postgres -p ecs_windows
Fuzz testing (PR #70, PR #92)
Fourteen cargo-fuzz harnesses now cover every untrusted input surface:
| Target | Surface |
|---|---|
| fuzz_parse_yaml | Sigma YAML parser |
| fuzz_condition | Condition expression parser |
| fuzz_field_modifiers | Field modifier parsing |
| fuzz_eval_matching | Event evaluation engine |
| fuzz_regex_compile | Regex pattern compilation |
| fuzz_pipeline_yaml | Pipeline YAML parsing |
| fuzz_input_formats | Input format auto-detection (JSON, syslog, logfmt, CEF) |
| fuzz_pipeline_sources_yaml | Dynamic source YAML parsing |
| fuzz_extract_jq | jq extract expression evaluation |
| fuzz_extract_jsonpath | JSONPath extract expression evaluation |
| fuzz_extract_cel | CEL extract expression evaluation |
| fuzz_template_expand | Template ${source.*} expansion |
| fuzz_include_parse | Include directive parsing |
| fuzz_http_response | HTTP response body handling |
Seed corpora include real SigmaHQ rules, handcrafted adversarial inputs, and valid pipeline examples. A weekly scheduled CI job runs all targets with per-target --max_len limits. Crashes upload as artifacts.
Security hardening (PRs #71-#76)
Six PRs address security, robustness, and code quality:
SQL injection prevention (PR #71). The PostgreSQL backend now validates all identifiers (table, schema, field segments) against ^[A-Za-z_][A-Za-z0-9_$]*$ before embedding them in SQL. Malicious inputs are rejected with ConvertError::InvalidIdentifier instead of being interpolated.
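The identifier pattern above is simple enough to check without a regex engine. The sketch below is an equivalent hand-written check for ^[A-Za-z_][A-Za-z0-9_$]*$; `is_valid_identifier` is a hypothetical name, and the real backend may well use a compiled regex instead.

```rust
/// Accepts identifiers matching ^[A-Za-z_][A-Za-z0-9_$]*$ and nothing else.
fn is_valid_identifier(s: &str) -> bool {
    let mut chars = s.chars();
    // First character: ASCII letter or underscore (empty input is rejected).
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    // Remaining characters: ASCII letters, digits, underscore, or dollar sign.
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '$')
}
```

Allow-listing characters this way rejects quotes, whitespace, semicolons, and comment markers outright, so there is nothing left to escape before the identifier is embedded in SQL.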
Unbounded recursion limits (PR #71). YAML deep-merge is capped at 64 levels (MAX_DEPTH). Exceeding the limit returns SigmaParserError::MergeTooDeep.
Condition DoS caps (PR #71). Condition expressions are limited to 64 KiB (MAX_CONDITION_LEN) and 64 nesting levels (MAX_CONDITION_DEPTH). Both limits return descriptive parse errors instead of stack overflow.
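A pre-parse guard for both caps might look like the following. It is a simplified sketch: only the two constant names come from these notes, `ConditionLimitError` and `check_condition_limits` are hypothetical, and nesting is approximated by counting parentheses, whereas a real parser would track depth during recursive descent.

```rust
const MAX_CONDITION_LEN: usize = 64 * 1024; // 64 KiB
const MAX_CONDITION_DEPTH: usize = 64;

#[derive(Debug, PartialEq)]
enum ConditionLimitError {
    TooLong { len: usize },
    TooDeep { depth: usize },
}

/// Reject oversized or overly nested conditions before any recursive parsing starts.
fn check_condition_limits(condition: &str) -> Result<(), ConditionLimitError> {
    if condition.len() > MAX_CONDITION_LEN {
        return Err(ConditionLimitError::TooLong { len: condition.len() });
    }
    let mut depth = 0usize;
    for b in condition.bytes() {
        match b {
            b'(' => {
                depth += 1;
                if depth > MAX_CONDITION_DEPTH {
                    return Err(ConditionLimitError::TooDeep { depth });
                }
            }
            b')' => depth = depth.saturating_sub(1),
            _ => {}
        }
    }
    Ok(())
}
```

Failing fast with a descriptive error keeps the recursion depth bounded, which is what prevents the stack overflow mentioned above.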
SIGTERM handler (PR #74). The daemon now handles SIGTERM with the same graceful shutdown path as Ctrl+C: drain the pipeline within --drain-timeout, persist correlation state, and exit cleanly.
parking_lot mutexes (PR #74). Internal mutexes migrated from std::sync::Mutex to parking_lot::Mutex for fairer scheduling and no poisoning.
Event size cap (PR #74). HTTP ingestion rejects individual lines exceeding 1 MiB with 413 Payload Too Large.
Code quality (PR #75). KEY_CACHE completeness test ensures all modifier keys are cached. partial_cmp replaced with total_cmp for deterministic float comparisons.
Testing gaps (PR #76). Runtime integration tests and parser AST snapshot tests added to cover previously untested paths.
CI and supply chain (PRs #72-#73)
MSRV enforcement. A dedicated CI job runs cargo check --workspace --all-features --locked on the declared MSRV (...
v0.9.0
TL;DR
RSigma v0.9.0 is one of the largest releases yet:
- Production-grade NATS JetStream with at-least-once delivery, authentication and TLS, dead-letter queues, replay from offset or timestamp, consumer groups, and sequence-aware correlation state restoration
- Native OpenTelemetry log ingestion over HTTP (protobuf + JSON) and gRPC
- A new LynxDB conversion backend for SPL2-compatible queries
- The rsigma fields field catalog
- Structured exit codes for CI/CD scripting
- Per-rule Prometheus metric labels
- The entire codebase restructured into directory-based modules
- And a comprehensive E2E test suite validating every I/O path against real Postgres and NATS instances via testcontainers
What's New
NATS production hardening (PR #59)
Five features bring the NATS pipeline from development-grade to production-ready.
At-least-once delivery with deferred ack. The streaming pipeline has been refactored from at-most-once to at-least-once delivery. Messages are now held in an AckToken until the sink confirms delivery. A new RawEvent struct bundles each payload with its ack token, and a dedicated ack task resolves tokens after sink confirmation. If the daemon crashes before ack, NATS redelivers the message after ack_wait expires. The EventSource trait now returns Option<RawEvent> instead of Option<String>, and NatsSink has been upgraded from core NATS publish to JetStream publish with server-confirmed persistence.
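The deferred-ack pattern can be modelled with a standard-library channel standing in for the ack task. The type names `AckToken` and `RawEvent` come from the notes above, but their fields, `deliver_all`, and the channel-based design are assumptions made for illustration; the real pipeline is asynchronous and talks to JetStream.

```rust
use std::sync::mpsc::{self, Sender};

/// Token that must be explicitly acked; dropping it unacked means "redeliver".
struct AckToken {
    sequence: u64,
    acks: Sender<u64>,
}

impl AckToken {
    fn ack(self) {
        let _ = self.acks.send(self.sequence);
    }
}

/// Payload bundled with its ack token, as in the refactored event source.
struct RawEvent {
    payload: String,
    ack_token: AckToken,
}

/// Push each payload through `sink`, acking only after the sink confirms delivery.
/// Returns the sequences that were acked.
fn deliver_all<F>(payloads: &[&str], mut sink: F) -> Vec<u64>
where
    F: FnMut(&str) -> Result<(), String>,
{
    let (ack_tx, ack_rx) = mpsc::channel();
    let events: Vec<RawEvent> = payloads
        .iter()
        .enumerate()
        .map(|(i, p)| RawEvent {
            payload: p.to_string(),
            ack_token: AckToken { sequence: i as u64 + 1, acks: ack_tx.clone() },
        })
        .collect();
    drop(ack_tx); // only the tokens hold senders now
    for event in events {
        if sink(&event.payload).is_ok() {
            event.ack_token.ack();
        }
        // On failure the token is dropped unacked; a broker would redeliver
        // the message once its ack wait expires.
    }
    ack_rx.into_iter().collect()
}
```

The key property is ordering: the ack happens strictly after the sink reports success, so a crash between receive and delivery leaves the message unacknowledged rather than lost.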
Authentication and TLS. A new NatsConnectConfig struct supports credentials file, token, username/password, NKey, mutual TLS (client cert + key), and require-TLS. Auth methods are mutually exclusive; the first configured one wins. Sensitive values can also be read from environment variables.
| CLI flag | Environment variable | Description |
|---|---|---|
| --nats-creds | NATS_CREDS | Credentials file path |
| --nats-token | NATS_TOKEN | Authentication token |
| --nats-user / --nats-password | NATS_USER / NATS_PASSWORD | Username and password |
| --nats-nkey | NATS_NKEY | NKey seed |
| --nats-tls-cert / --nats-tls-key | | Client certificate and key for mutual TLS |
| --nats-require-tls | | Require TLS on the connection |
Dead-letter queue. Events that fail processing are routed to a configurable DLQ instead of being silently discarded. The --dlq flag accepts the same URL schemes as --output (stdout://, file://, nats://). Each DLQ entry is a JSON object containing original_event, error, and timestamp. Integration points: parse errors detected before engine processing and sink delivery failures. A new rsigma_dlq_events_total Prometheus counter tracks DLQ volume.
# Route failed events to a file
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --dlq file:///var/log/rsigma-dlq.ndjson
# Route failed events to a NATS subject
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --dlq nats://localhost:4222/dlq.rsigma
Replay from offset or timestamp. A ReplayPolicy enum (Resume, FromSequence, FromTime, Latest) controls the JetStream consumer's starting position. Three mutually exclusive CLI flags set the policy. Correlation state restoration is handled intelligently based on the replay direction (see "Smart correlation state restoration" below).
# Replay from a specific stream sequence
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --replay-from-sequence 42
# Replay from a point in time
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --replay-from-time 2026-04-30T00:00:00Z
# Start from the latest message, ignoring history
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --replay-from-latest
Consumer groups for horizontal scaling. The --consumer-group flag sets a shared durable consumer name across multiple daemon instances. All instances using the same group name pull from a single JetStream consumer, and NATS automatically distributes messages for load balancing. When not specified, the consumer name is auto-derived from the subject (existing behavior).
# Two daemon instances sharing a consumer group
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --consumer-group detection-workers
Smart correlation state restoration (PR #61)
The daemon now makes intelligent decisions about whether to restore correlation state from SQLite when restarting with a replay flag. Previously, any non-Resume replay policy unconditionally cleared correlation state to avoid double-counting. This was correct for forensic replay but overly conservative for forward catch-up scenarios where the daemon shuts down and restarts with --replay-from-sequence pointing after the last processed event.
Sequence-aware auto-restore. The daemon now tracks the NATS JetStream stream sequence and published timestamp of the last acknowledged message. This SourcePosition is stored alongside the correlation snapshot in SQLite (two new columns added via automatic schema migration). On restart, the decide_state_restore function compares the replay start point against the stored position: if the replay starts after the stored position (forward catch-up), state is restored safely; if at or before (backward replay), state is cleared to prevent double-counting.
Explicit overrides. Two new mutually exclusive CLI flags give operators direct control when the automatic decision is not appropriate:
| Flag | Behavior |
|---|---|
| --keep-state | Always restore correlation state, regardless of replay policy |
| --clear-state | Always clear correlation state and start fresh |
| (neither) | Automatic decision based on replay direction and stored position |
# Forward catch-up: state is auto-restored (replay starts after stored position)
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --replay-from-sequence 1001 --state-db state.db
# Forensic replay: state is auto-cleared (replay starts before stored position)
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --replay-from-sequence 1 --state-db state.db
# Force restore regardless of replay direction
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --replay-from-sequence 1 --state-db state.db --keep-state
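The restore decision for sequence-based replay can be sketched as follows. This is a simplified Python illustration under stated assumptions: the function signature, the SourcePosition field names, and the behavior when no position is stored are hypothetical, and time-based replay is omitted. Only the comparison rule (after the stored position restores, at or before clears) and the two override flags come from the text above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SourcePosition:
    sequence: int     # last acknowledged JetStream stream sequence
    timestamp: float  # published time of that message (epoch seconds)


def decide_state_restore(replay_sequence: Optional[int],
                         stored: Optional[SourcePosition],
                         keep_state: bool = False,
                         clear_state: bool = False) -> bool:
    """Return True to restore correlation state, False to clear it."""
    if keep_state and clear_state:
        raise ValueError("--keep-state and --clear-state are mutually exclusive")
    if keep_state:
        return True
    if clear_state:
        return False
    if replay_sequence is None:
        return True   # no replay flag: plain resume keeps state
    if stored is None:
        return False  # assumption: nothing to compare against, start fresh
    # Forward catch-up restores; replay at or before the stored position clears.
    return replay_sequence > stored.sequence


pos = SourcePosition(sequence=1000, timestamp=0.0)
print(decide_state_restore(1001, pos))  # forward catch-up
print(decide_state_restore(1, pos))     # forensic replay
```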
Timestamp fallback control. A new --timestamp-fallback flag (wallclock or skip) controls how correlation windows handle events without parseable timestamp fields. The default wallclock substitutes the current time (existing behavior). The new skip mode causes detections to still fire but omits the event from correlation state updates, preventing wall-clock times from corrupting temporal windows during forensic replay of historical logs.
# Skip events without timestamps for correlation (detections still fire)
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --timestamp-fallback skip
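The difference between the two fallback modes can be sketched in Python. The function name, the single "timestamp" field, and the ISO 8601 parsing are assumptions for illustration; RSigma's actual timestamp extraction is not described here.

```python
import time
from datetime import datetime
from typing import Optional


def correlation_timestamp(event: dict, fallback: str = "wallclock",
                          field: str = "timestamp") -> Optional[float]:
    """Return the time to use for correlation windows, or None to skip."""
    raw = event.get(field)
    if isinstance(raw, str):
        try:
            return datetime.fromisoformat(raw.replace("Z", "+00:00")).timestamp()
        except ValueError:
            pass  # unparseable: fall through to the fallback policy
    if fallback == "wallclock":
        return time.time()  # substitute the current time
    if fallback == "skip":
        return None         # caller leaves correlation state untouched
    raise ValueError(f"unknown fallback mode: {fallback}")


print(correlation_timestamp({"timestamp": "1970-01-01T00:00:10Z"}))
print(correlation_timestamp({"msg": "no time"}, fallback="skip"))
```

In skip mode the caller would still run detections on the event and only bypass the correlation update when None is returned.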
Automatic schema migration. Existing SQLite state databases are transparently migrated on first open. The migration adds the source_sequence and source_timestamp columns without losing the existing correlation snapshot.
Codebase modularization (PRs #46-#58)
Thirteen PRs systematically split 12 large single-file modules into directory-based module structures across all six crates, improving navigability and reducing merge conflicts. The refactoring is purely structural with no behavioral changes.
| PR | File | Result |
|---|---|---|
| #46 | lint.rs (4,991 lines) | lint/{mod,rules/{metadata,detection,correlation,filter,shared}}.rs |
| #47 | main.rs (2,221 lines) | commands/{parse,validate,lint,eval,convert}.rs |
| #48 | postgres.rs (3,183 lines) | postgres/{mod,correlation,tests}.rs |
| #49 | correlation_engine.rs (4,395 lines) | correlation_engine/{mod,types,tests}.rs |
| #50 | transformations.rs (3,379 lines) | pipeline/transformations/{mod,helpers,tests}.rs |
| #51 | parser.rs (2,276 lines) | parser/{mod,detection,correlation,filter,tests}.rs |
| #52 | pipeline/mod.rs (2,235 lines) | pipeline/{mod,parsing}.rs |
| #53 | compiler.rs (1,824 lines) | compiler/{mod,helpers,tests}.rs |
| #54 | correlation.rs (1,781 lines) | correlation/{mod,types,buffers,compiler,keys,window,tests}.rs |
| #55 | engine.rs (1,656 lines) | engine/{mod,filters,tests}.rs |
| #56 | matcher.rs (1,118 lines) | matcher/{mod,matching,helpers}.rs |
| #57 | event.rs (758 lines) | event/{mod,json,kv,plain,map}.rs |
| #58 | cli/tests/cli.rs (1,745 lines) | tests/{cli_parse,cli_validate,cli_lint,cli_eval,cli_daemon,common/mod}.rs |
Additional cleanup: is_valid_uuid was de-duplicated across lint rule modules, and pipeline parsing logic was extracted from mod.rs into its own submodule.
E2E test suite (PR #60)
A comprehensive end-to-end test suite validates every major I/O path against real infrastructure. All container-based tests use testcontainers and are automatically skipped when Docker is unavailable.
PostgreSQL integration tests. Convert Sigma rules to SQL and execute the generated queries against a real PostgreSQL instance. Uses the Okta cross-tenant impersonation scenario with JSONB schema, 6 sample events, and 4 SigmaHQ detection rules. Tests cover default format, VIEW creation, multi-rule conversion, event_count correlation, and the no-match case.
NATS E2E tests (binary-level). Spawn the rsigma daemon as a child process with --input/--output NATS URLs pointed at a testcontainers NATS instance. Four tests cover single detection, no-match silence, event_count correlation, and fan-out to multiple output subjects.
NATS E2E tests (library-level). Additional integration tests in rsigma-runtime covering JetStream publish/subscribe, detection routing, and the article scenarios from the companion blog ...
v0.8.1
TL;DR
RSigma v0.8.1 is a patch release for the PostgreSQL backend. Dotted Sigma field names (like securityContext.isProxy) now generate correct chained JSONB operators when using -O json_field=....
What's New
Nested JSONB field paths (#45)
When json_field is set (e.g. -O json_field=data), the PostgreSQL backend now generates chained -> / ->> operators for dotted Sigma field names instead of treating the entire dotted string as a single flat key.
Before (v0.8.0):
-- securityContext.isProxy treated as a literal top-level key (incorrect)
SELECT * FROM okta_events WHERE data->>'securityContext.isProxy' = 'true'
After (v0.8.1):
-- Nested traversal into the securityContext object (correct)
SELECT * FROM okta_events WHERE data->'securityContext'->>'isProxy' = 'true'
Deeply nested paths work as expected:
| Sigma field | Generated SQL |
|---|---|
| eventType | data->>'eventType' (unchanged) |
| securityContext.isProxy | data->'securityContext'->>'isProxy' |
| actor.detail.sub.field | data->'actor'->'detail'->'sub'->>'field' |
All intermediate segments use -> (returns jsonb), and the final segment uses ->> (returns text). Flat field names without dots are unaffected. NULL propagation works correctly for existence checks: data->'nonexistent'->>'child' returns NULL, so IS NOT NULL behaves as expected on nested paths.
This is particularly important for Okta System Log rules from SigmaHQ, where fields like securityContext.isProxy and client.ipAddress reference nested JSON objects.
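The path-chaining rule above (-> for every intermediate segment, ->> for the last) can be sketched in a few lines of Python. The function name jsonb_path and the quote-doubling escape are assumptions for illustration; this is not the backend's actual code.

```python
def jsonb_path(json_field: str, sigma_field: str) -> str:
    """Turn a dotted Sigma field into chained JSONB operators (sketch)."""
    def quote(segment: str) -> str:
        # Double single quotes so the segment is a safe SQL string literal.
        return "'" + segment.replace("'", "''") + "'"

    *parents, leaf = sigma_field.split(".")
    # Intermediate segments use -> (returns jsonb); the final one uses ->> (text).
    chain = "".join(f"->{quote(p)}" for p in parents)
    return f"{json_field}{chain}->>{quote(leaf)}"


print(jsonb_path("data", "eventType"))
print(jsonb_path("data", "securityContext.isProxy"))
print(jsonb_path("data", "actor.detail.sub.field"))
```

A flat field name has no parent segments, so it collapses to the single ->> form, matching the "unchanged" row in the table.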
Upgrade
cargo install rsigma
# or
docker pull ghcr.io/timescale/rsigma:0.8.1
Full Changelog
v0.8.0
TL;DR
RSigma v0.8.0 is the "rule conversion" release. A new rsigma-convert crate transforms Sigma rules into backend-native query strings through a pluggable Backend trait. The first production backend targets PostgreSQL/TimescaleDB, a backend unique to RSigma and inspired by pySigma-backend-sqlite and pySigma-backend-athena. The CLI gains convert, list-targets, and list-formats commands. Multi-arch Docker images are now published to GHCR on every release. Processing pipelines support one-to-many field name mapping, and filter rules reach full behavioral parity with pySigma.
Please test this (and RSigma in general) and provide feedback. Contributions are also very welcome.
What's New
rsigma-convert crate (#36)
A new library crate for converting parsed Sigma rules into backend-native queries (SQL, SPL, KQL, Lucene, etc.):
- Backend trait with ~30 methods covering condition dispatch, detection item conversion, field/value escaping, regex, CIDR, comparison operators, field existence, field references, keywords, IN-list optimization, deferred expressions, and query finalization.
- TextQueryConfig with ~90 configuration fields mirroring pySigma's TextQueryBackend class variables: precedence, boolean operators, wildcards, string/field quoting, match expressions (startswith/endswith/contains + case-sensitive variants), regex/CIDR templates, compare ops, IN-list optimization, unbound values, deferred parts, and query envelope.
- Condition tree walker that recursively converts ConditionExpr nodes into query strings with selector/quantifier support.
- Orchestrator via convert_collection(), which applies pipelines, converts each rule, and collects results and errors.
- Deferred expressions through the DeferredExpression trait and DeferredTextExpression for backends that need post-query appendages (e.g. Splunk | regex, | where).
- Test backend (TextQueryTestBackend and MandatoryPipelineTestBackend) for backend-neutral foundation testing.
PostgreSQL/TimescaleDB backend (#37, #38, #43, #44)
The first production backend, and one that has no equivalent in the pySigma ecosystem. It is inspired by pySigma-backend-sqlite and pySigma-backend-athena, targeting PostgreSQL natively and leveraging features that map cleanly to Sigma modifiers:
| Sigma Modifier | PostgreSQL SQL |
|---|---|
| contains | ILIKE (case-insensitive) |
| startswith / endswith | ILIKE |
| cased | LIKE (case-sensitive) |
| re | ~* (case-insensitive regex) or ~ (with cased) |
| cidr | field::inet <<= 'value'::cidr |
| exists | IS NOT NULL / IS NULL |
| keywords | to_tsvector() @@ plainto_tsquery() |
Five output formats:
| Format | Description |
|---|---|
| default | Plain SELECT * FROM {table} WHERE ... queries |
| view | CREATE OR REPLACE VIEW sigma_{id} AS SELECT ... |
| timescaledb | Queries with time_bucket() for TimescaleDB optimization |
| continuous_aggregate | CREATE MATERIALIZED VIEW ... WITH (timescaledb.continuous) |
| sliding_window | Correlation queries using window functions for per-row sliding detection |
Additional capabilities:
- SELECT column selection (inspired by pySigma-backend-athena): when a Sigma rule specifies fields:, the backend emits SELECT field1, field2, ... instead of SELECT *. Supports field as alias syntax and passthrough of function calls.
- CLI backend options: -O key=value flags are now wired through to the PostgreSQL backend. Recognized keys: table, schema, database, timestamp_field, json_field, case_sensitive_re.
- Custom table/schema/database resolution at three levels: rule-level custom_attributes, pipeline set_state, and backend defaults.
- Multi-table temporal correlations: when referenced detection rules target different tables (via per-logsource pipeline routing or custom attributes), the backend automatically generates a UNION ALL CTE. Single-table correlations use the simpler direct approach.
- CTE-based correlation pre-filtering (inspired by pySigma-backend-athena): non-temporal correlations wrap referenced rules' queries in a WITH combined_events AS (q1 UNION ALL q2 ...) CTE, so aggregations only count events matching the detection logic rather than scanning the entire table.
- Sliding window correlations (inspired by pySigma-backend-athena): the sliding_window output format uses SQL window functions (COUNT(*) OVER (PARTITION BY ... ORDER BY ... RANGE BETWEEN INTERVAL ... PRECEDING AND CURRENT ROW)) for event_count correlations. This produces a per-row sliding window that identifies every event crossing the threshold, complementing the default GROUP BY + HAVING approach for periodic polling.
- OCSF processing pipelines: two included pipelines for single-table (ocsf_postgres.yml) and per-logsource multi-table routing (ocsf_postgres_multi_table.yml).
- Reference TimescaleDB schema with hypertable setup, indexes (B-tree, GIN for full-text and JSONB), compression, retention policies, and an example continuous aggregate.
- Correlation SQL generation using GROUP BY/HAVING for aggregation types (event_count, value_count, value_sum, value_avg, value_percentile, value_median) and CTEs with window functions for temporal correlation.
CLI: convert, list-targets, list-formats
# Convert rules to backend-native queries
rsigma convert -r rules/ -t postgres
# Convert with a processing pipeline and specific output format
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres.yml -f view
# Multi-table pipeline (per-logsource routing)
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres_multi_table.yml
# Generate TimescaleDB continuous aggregates
rsigma convert -r rules/ -t postgres -p pipelines/ocsf_postgres.yml -f continuous_aggregate
# Custom backend options (table, schema, timestamp field, etc.)
rsigma convert -r rules/ -t postgres -O table=security_logs -O schema=public -O timestamp_field=created_at
# Sliding window correlation format
rsigma convert -r rules/ -t postgres -f sliding_window
# List available conversion backends
rsigma list-targets
# List available output formats for a backend
rsigma list-formats postgres
Options include -p / --pipeline (repeatable), -f / --format, -o / --output, --skip-unsupported, --without-pipeline, and -O / --option for backend-specific key=value pairs.
Multi-arch Docker image (#39)
Multi-arch images (linux/amd64, linux/arm64) are published to GHCR on every release:
docker pull ghcr.io/timescale/rsigma:latest
docker run --rm ghcr.io/timescale/rsigma:latest --help
One-to-many field name mapping (#40, #41)
Thanks to @fwosar, FieldNameMapping now supports mapping a single source field to multiple alternative field names. When more than one alternative is present, the matched detection item is replaced with an OR group (AnyOf) of items, one per alternative. The rule's original AND structure across the remaining items in the same selection is preserved via Cartesian expansion.
# Pipeline mapping: one source field -> multiple alternatives
transformations:
  - id: multi_field_mapping
    type: field_name_mapping
    mapping:
      CommandLine:
        - process.command_line
        - process.args
The expansion is capped at 4,096 combinations per detection to prevent runaway Cartesian products in rules with many multi-mapped fields. For correlation rules, group_by fields are expanded to include all alternatives, while the aliases mapping values and the threshold field reject one-to-many mappings with an error, since those positions are inherently scalar.
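The Cartesian expansion and its cap can be sketched in Python. The data model here (a selection as a flat dict of field to value, and the function name expand_selection) is a simplification assumed for illustration; only the 4,096 limit and the OR-of-ANDs outcome come from the notes.

```python
from itertools import product
from math import prod

MAX_COMBINATIONS = 4096  # cap stated in the release notes


def expand_selection(selection: dict, mapping: dict) -> list:
    """Expand one AND-selection into OR'ed alternatives (hypothetical model).

    Each returned dict is one fully AND'ed variant; the rule matches if
    any variant matches.
    """
    fields = list(selection)
    # Unmapped fields keep their own name as the single alternative.
    alternatives = [mapping.get(f, [f]) for f in fields]
    total = prod(len(a) for a in alternatives)
    if total > MAX_COMBINATIONS:
        raise ValueError(f"{total} combinations exceed cap of {MAX_COMBINATIONS}")
    return [
        {new_name: selection[old] for new_name, old in zip(combo, fields)}
        for combo in product(*alternatives)
    ]


mapping = {"CommandLine": ["process.command_line", "process.args"]}
for variant in expand_selection({"CommandLine": "whoami", "User": "root"}, mapping):
    print(variant)
```

Checking the product of alternative counts before materializing the combinations keeps the cap cheap to enforce.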
pySigma filter parity (#42)
Filter rules now match pySigma semantics across parsing, application, and linting:
Structural parity (Phase 0):
- FilterRule and CorrelationRule AST types now carry the full set of standard Sigma fields: related, license, fields, scope, taxonomy, references, tags, falsepositives, level, and custom_attributes.
- RelationType::Correlation added for the related field.
- CorrelationCondition::Threshold gains a percentile field (previously the percentile rank was overloaded from the condition's threshold value).
- Correlation condition.field supports lists (Option<Vec<String>>) for multi-field value_count.
- Lint rules updated: correlation and filter rule known keys expanded, "correlation" added as a valid related type.
Behavioral parity (Phase 1):
- filter.rules accepts "any" (string) and omission, both meaning "apply to all rules". The new FilterRuleTarget enum (Any | Specific(Vec<String>)) replaces the old Vec<String>.
- Filter condition expressions are rewritten with namespaced identifiers (__filter_0_selection) and applied as written, instead of hardcoding AND-NOT. Filters that exclude events must use not selection explicitly in their condition.
- Logsource matching changed from symmetric compatibility to asymmetric containment: every field the filter specifies must be present and equal in the rule, but fields the filter omits are treated as wildcards.
- Lint rule MissingFilterRules updated: filter.rules is now optional (omitted means "ap...
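The asymmetric containment rule for logsource matching described in the list above can be sketched in Python. The function name and the plain-dict representation are assumptions, and any normalization RSigma applies (for example case folding) is not modeled.

```python
def filter_applies(filter_logsource: dict, rule_logsource: dict) -> bool:
    """Asymmetric containment (sketch): every field the filter names must be
    present and equal in the rule; fields the filter omits act as wildcards."""
    return all(rule_logsource.get(key) == value
               for key, value in filter_logsource.items())


rule = {"product": "windows", "category": "process_creation"}
print(filter_applies({"product": "windows"}, rule))
print(filter_applies({"product": "windows", "service": "security"}, rule))
```

The check is deliberately one-directional: extra fields on the rule never block a filter, but an extra field on the filter does.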
v0.7.0
TL;DR
RSigma v0.7.0 is the "any log format" release. The evaluation engine now operates on a generic Event trait instead of raw JSON, a new rsigma-runtime library crate decouples the streaming pipeline from the CLI, and the daemon can ingest JSON, syslog (RFC 3164/5424), logfmt, CEF, and plain text, with auto-detection by default. Hand-rolled zero-dependency parsers for logfmt and CEF keep the dependency tree lean.
This release is inspired by sigma_engine, thanks to @thomaspatzke and Sigma HQ folks.
What's New
Generic Event trait (breaking)
The rsigma-eval::Event struct has been replaced by an Event trait with three concrete implementations:
- JsonEvent: wraps serde_json::Value (the previous behavior)
- KvEvent: key-value map for structured formats (syslog, logfmt, CEF)
- PlainEvent: raw text for keyword-only matching
An EventValue enum provides typed access to field values across all implementations. This is a breaking change: callers using Event::new(value) should switch to JsonEvent::borrow(&value) or JsonEvent::owned(value).
rsigma-runtime crate
The streaming pipeline has been extracted from the CLI daemon into a reusable library crate:
- RuntimeEngine: wraps Engine + CorrelationEngine with rule loading, hot-reload, and state management.
- LogProcessor: batch processing pipeline with ArcSwap for atomic engine swap, pluggable MetricsHook, and EventFilter for JSON payload extraction (e.g. .records[]).
- Input format adapters (input/ module): JSON, syslog, logfmt, CEF, plain text, and auto-detect. Each adapter parses a raw line into EventInputDecoded, a static-dispatch enum that implements Event without dyn overhead.
- I/O primitives: EventSource trait and Sink enum (stdin, stdout, file, NATS) moved from the CLI.
Multi-format input (--input-format)
The daemon and eval commands now accept --input-format and --syslog-tz:
# Auto-detect (default): tries JSON → syslog → plain
rsigma daemon -r rules/
# Explicit syslog with timezone offset
rsigma daemon -r rules/ --input-format syslog --syslog-tz +0530
# logfmt (requires logfmt feature)
rsigma eval -r rules/ --input-format logfmt < app.log
# CEF (requires cef feature)
rsigma eval -r rules/ --input-format cef < arcsight.log
Auto-detect validates syslog parsing results (checking for facility, severity, and hostname) before accepting them, so it won't misparse random text as syslog.
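The JSON → syslog → plain fallback order can be illustrated with a small Python sketch. It is a deliberate simplification: it only validates the syslog PRI value (facility × 8 + severity, at most 191) rather than the fuller facility/severity/hostname check, and the requirement that JSON input be an object is an assumption.

```python
import json
import re

SYSLOG_PRI = re.compile(r"^<(\d{1,3})>")


def detect_format(line: str) -> str:
    """Try JSON first, then syslog, then fall back to plain text (sketch)."""
    try:
        if isinstance(json.loads(line), dict):  # assumption: objects only
            return "json"
    except ValueError:
        pass
    match = SYSLOG_PRI.match(line)
    # A valid PRI encodes facility (0-23) and severity (0-7), so max is 191.
    if match and int(match.group(1)) <= 191:
        return "syslog"
    return "plain"


print(detect_format('{"CommandLine": "whoami"}'))
print(detect_format("<34>Oct 11 22:14:15 mymachine su: auth failure"))
print(detect_format("just some text"))
```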
Zero-dependency parsers
- logfmt: hand-rolled parser supporting quoted values with escape sequences, bare keys, and mixed whitespace. No external dependencies.
- CEF (Common Event Format): hand-rolled parser for the full ArcSight CEF spec including 7-field pipe-delimited header + key=value extensions with \=, \n, \\ escapes. Handles syslog-wrapped CEF via find_cef_start().
Both are feature-gated (logfmt, cef) and thoroughly tested with real-world log samples.
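To give a feel for what a hand-rolled logfmt parser involves, here is a compact Python sketch covering quoted values with escapes, bare keys, and mixed whitespace. It is not RSigma's Rust parser: the function name, the choice to map bare keys to an empty string, and the supported escape set are assumptions.

```python
def parse_logfmt(line: str) -> dict:
    """Parse key=value pairs from one logfmt line (illustrative sketch)."""
    out, i, n = {}, 0, len(line)
    while i < n:
        if line[i].isspace():            # skip any whitespace between pairs
            i += 1
            continue
        start = i
        while i < n and line[i] != "=" and not line[i].isspace():
            i += 1
        key = line[start:i]
        if i >= n or line[i] != "=":     # bare key without a value
            out[key] = ""
            continue
        i += 1                           # consume '='
        if i < n and line[i] == '"':     # quoted value with backslash escapes
            i += 1
            chars = []
            while i < n and line[i] != '"':
                if line[i] == "\\" and i + 1 < n:
                    i += 1
                    chars.append({"n": "\n", "t": "\t"}.get(line[i], line[i]))
                else:
                    chars.append(line[i])
                i += 1
            i += 1                       # consume the closing quote
            out[key] = "".join(chars)
        else:                            # unquoted value runs to whitespace
            start = i
            while i < n and not line[i].isspace():
                i += 1
            out[key] = line[start:i]
    return out


print(parse_logfmt('level=info msg="hi there"  debug user=alice'))
```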
Examples and benchmarks
- examples/jsonl_stdin.rs: read NDJSON from stdin, print detections.
- examples/tail_syslog.rs: read a syslog file, parse and evaluate.
- Throughput benchmark suite (runtime_throughput): Criterion benchmarks for the LogProcessor pipeline across all formats.
Baseline results (Apple M4 Pro, 100 rules):
| Format | Throughput |
|---|---|
| Plain text | 5.5–10.9 Melem/s |
| Syslog | 1.26–1.40 Melem/s |
| JSON | 955 Kelem/s–1.15 Melem/s |
| Auto-detect | ~966 Kelem/s–1.09 Melem/s |
Rule-count scaling is near-flat from 100 to 1,000 rules thanks to logsource index pruning.
Other changes
- Custom attributes (custom_attributes): custom rule attributes now propagate through results and are unified across detection and correlation rules into a single custom_attributes field (breaking: custom_rule_attributes removed). Thanks to @fwosar (#26).
- Lint --exclude: glob patterns to skip files during linting, plus detection of deprecated aggregation syntax.
- Line feeds in conditions: fixed parsing of condition expressions containing line breaks. Thanks to @fwosar (#24).
- Dependencies: notify 7 → 8.2, rustls-webpki → 0.103.13.
- Architecture diagram: updated in README to reflect the runtime layer and Event trait.
- BENCHMARKS.md: documents all benchmark groups, baseline results, and the 5% regression threshold.
Breaking Changes
| Before (0.6.0) | After (0.7.0) |
|---|---|
| use rsigma_eval::Event; (struct) | use rsigma_eval::event::Event; (trait) |
| Event::new(value) | JsonEvent::borrow(&value) or JsonEvent::owned(value) |
| Event::from_value(v) | JsonEvent::borrow(&v) |
| result.custom_rule_attributes | result.custom_attributes |
Contributors
Thanks to @fwosar for their contributions to this release (#24, #26).
Full Changelog
v0.6.0
TL;DR
RSigma grew up. v0.6.0 makes the daemon production-ready for streaming detection: plug it into NATS JetStream or HTTP, fan out to multiple sinks, and let rayon + an inverted index chew through rules 2-3× faster. Stateful correlation still survives restarts via SQLite, stdin/stdout still works by default, and cargo audit is back to zero vulnerabilities.
This release resolves the "not meant for streaming logs" gap correctly identified in Detection Engineering Weekly #149 by Zack Allen and positions RSigma as a single-node streaming detection engine, not just a CLI forensics tool. Three levels of work landed:
- Level 1: pluggable I/O adapters (NATS, HTTP, file, fan-out)
- Level 2: async pipeline hardening (backpressure, micro-batching, drain)
- Level 3: inverted index + feature-gated rayon parallel batch evaluation
Full details below.
What's New
v0.6.0 turns the RSigma daemon from a stdin/stdout pipe into a deployable streaming detection engine with parallel evaluation.
Streaming I/O adapters (Level 1)
The daemon now speaks NATS JetStream, HTTP, and files, not just stdin/stdout.
# HTTP POST input
rsigma daemon -r rules/ --input http
# Then: curl -X POST http://localhost:9090/api/v1/events -d '{"CommandLine":"whoami"}'
# NATS JetStream end-to-end (requires daemon-nats feature)
rsigma daemon -r rules/ --input nats://localhost:4222/events.> --output nats://localhost:4222/detections
# File output
rsigma daemon -r rules/ --output file:///var/log/detections.ndjson
# Fan-out to multiple sinks
rsigma daemon -r rules/ --output stdout --output file:///tmp/detections.ndjson
- EventSource trait + Sink enum: pluggable adapters with enum dispatch; async-friendly Sink::FanOut(Vec<Sink>) for multi-sink output.
- --input / --output URL schemes: stdin://, http://, nats://, file://, stdout://; multiple --output flags clone ProcessResult per sink via bounded mpsc channels.
- daemon-nats feature flag: gates async-nats; durable JetStream consumer with ACK, publisher sink.
- process_line() refactored: now returns typed ProcessResult; serialization is a sink concern, engine stays pure.
Async pipeline hardening (Level 2)
- Fully async stdin via tokio::io::AsyncBufReadExt (no more spawn_blocking).
- Configurable back-pressure: --buffer-size (default 10,000) sets bounded mpsc capacity for both source→engine and engine→sink queues.
- Micro-batched evaluation: --batch-size (default 1); engine collects up to N events per mutex acquisition via try_recv().
- Graceful drain on shutdown: --drain-timeout (default 5s) lets in-flight events finish before state save; natural EOF drains without timeout.
- 5 new Prometheus metrics:
  - rsigma_input_queue_depth, rsigma_output_queue_depth (gauges)
  - rsigma_back_pressure_events_total (counter)
  - rsigma_pipeline_latency_seconds, rsigma_batch_size (histograms)
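The micro-batching pattern from the list above (wait for one event, then opportunistically take more without blocking) can be sketched in Python, with queue.Queue standing in for the bounded mpsc channel and get_nowait() as the analogue of try_recv(). The function name is hypothetical.

```python
import queue


def collect_batch(q: queue.Queue, batch_size: int) -> list:
    """Collect up to batch_size events with a single blocking wait (sketch)."""
    batch = [q.get()]  # block until at least one event is available
    while len(batch) < batch_size:
        try:
            batch.append(q.get_nowait())  # non-blocking, like try_recv()
        except queue.Empty:
            break  # queue drained: process what we have instead of waiting
    return batch


q = queue.Queue(maxsize=10_000)  # bounded, mirroring --buffer-size
for n in range(5):
    q.put(n)
print(collect_batch(q, 3))
print(collect_batch(q, 3))
```

With a batch size of 1 the loop never runs, which reduces to per-event processing and matches the stated default.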
Performance: inverted index + parallel batch evaluation (Level 3)
- Inverted index: RuleIndex maps (field, exact_value) → rule indices at load time. Engine::evaluate() queries candidates instead of scanning all rules. Rules without exact-match items are marked unindexable and always evaluated (no false negatives).
- Feature-gated rayon: new parallel feature on rsigma-eval enables Engine::evaluate_batch() and CorrelationEngine::process_batch(). Parallel detection + sequential correlation via a borrow split.
- Daemon integration: process_batch_lines() replaces the per-event loop.
- Benchmark results (5,000 rules, synthetic events):
  - Detection evaluation: 2.4–2.7× speedup from indexing alone.
  - Correlation throughput: ~1.7× improvement (indexed + sequential).
  - Batch evaluation scales with core count.
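The inverted-index idea can be sketched in Python. The rule representation (a dict whose "exact" entry lists field/value pairs, empty meaning unindexable) and the method names are assumptions for illustration; the real RuleIndex operates on compiled Sigma rules and its indexing criteria are more involved.

```python
from collections import defaultdict


class RuleIndex:
    """Map (field, exact_value) -> rule indices, built once at load time."""

    def __init__(self, rules: list):
        self.by_value = defaultdict(set)
        self.unindexable = set()
        for idx, rule in enumerate(rules):
            exact = rule.get("exact") or {}
            if not exact:
                self.unindexable.add(idx)  # must always be evaluated
            for field, value in exact.items():
                self.by_value[(field, value)].add(idx)

    def candidates(self, event: dict) -> list:
        """Return indices of rules that could match this event."""
        hits = set(self.unindexable)  # never skip unindexable rules
        for field, value in event.items():
            if isinstance(value, (str, int, float, bool)):
                hits |= self.by_value.get((field, value), set())
        return sorted(hits)


index = RuleIndex([
    {"exact": {"CommandLine": "whoami"}},
    {"exact": {}},                 # e.g. a regex-only rule
    {"exact": {"User": "root"}},
])
print(index.candidates({"CommandLine": "whoami", "User": "alice"}))
```

Candidates still go through full evaluation afterwards; the index only prunes rules that cannot match.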
New public APIs (rsigma-eval)
- Engine::evaluate_batch(&self, events: &[&Event]) -> Vec<Vec<MatchResult>>
- CorrelationEngine::evaluate(&self, event: &Event) -> Vec<MatchResult>
- CorrelationEngine::process_with_detections(&mut self, event, detections, timestamp_secs) -> ProcessResult
- CorrelationEngine::process_batch(&mut self, events: &[&Event]) -> Vec<ProcessResult>
Pipeline parity
- Named condition IDs supported in rule_cond_expression (not just numeric indices).
- Correlation rules now apply processing pipelines consistently with detection rules.
Dependencies & security
- async-nats 0.46 → 0.47 (drops pinned vulnerable rustls-webpki 0.102.8, rustls-pemfile 2.2.0, rand 0.8.5)
- rustls-webpki → 0.103.12 (RUSTSEC-2026-0049/-0098/-0099)
- rand → 0.9.4 (GHSA-cq8v-f236-94qc/RUSTSEC-2026-0097)
- lodash override → 4.18.x in VS Code extension (devDependency-only)
- Dropped now-obsolete --ignore RUSTSEC-2026-0049 from audit.yml
- cargo audit: 0 vulnerabilities
Docs
- README updated with streaming examples (NATS/HTTP/file/fan-out/batching).
- rsigma-eval README updated with new APIs and benchmark tables.
Full Changelog
v0.5.0
Daemon mode (rsigma daemon)
rsigma can now run as a long-running service for real-time event processing, with hot-reload, health checks, metrics, and a REST API.
rsigma daemon -r rules/ -p ecs.yml --api-addr 127.0.0.1:8080
- Hot-reload: file watcher, SIGHUP, and /api/v1/reload endpoint. Correlation state is preserved across reloads.
- Health endpoints: /healthz, /readyz
- Prometheus metrics: events processed, detection/correlation matches, rules loaded, uptime, state entries
- REST API: /api/v1/status, /api/v1/rules, /api/v1/reload
- Structured logging: JSON via tracing with RUST_LOG control
SQLite state persistence (--state-db)
Correlation state (windows, suppression timers, event buffers) now survives daemon restarts.
rsigma daemon -r rules/ -p ecs.yml --state-db ./rsigma-state.db --state-save-interval 10
- Periodic snapshots (configurable via --state-save-interval, default 30s)
- Graceful shutdown save
- Schema-versioned snapshots for forward compatibility
- Base64-encoded compressed event buffers for efficient storage
- State preserved across hot-reloads (export before engine swap, re-import after)
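The "Base64-encoded compressed event buffers" item can be illustrated with a round-trip sketch in Python. The compression algorithm (zlib here), the JSON payload layout, and the function names are assumptions; the notes do not say which codec or snapshot schema RSigma uses.

```python
import base64
import json
import zlib


def encode_buffer(events: list) -> str:
    """Compress an event buffer and make it text-safe for a snapshot column."""
    raw = json.dumps(events).encode("utf-8")
    return base64.b64encode(zlib.compress(raw)).decode("ascii")


def decode_buffer(blob: str) -> list:
    """Reverse encode_buffer: Base64-decode, decompress, parse."""
    return json.loads(zlib.decompress(base64.b64decode(blob)))


events = [{"User": "root", "CommandLine": "whoami"}] * 50
blob = encode_buffer(events)
print(len(json.dumps(events)), "bytes raw ->", len(blob), "chars stored")
```

Base64 keeps the compressed bytes representable as text, at the cost of roughly a third more space than the raw compressed form.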
CI
- All workflows now use --all-features to cover daemon-gated code
Dependencies
- Removed protobuf transitive dependency (disabled prometheus default features), which resolves RUSTSEC-2024-0437
Full Changelog
v0.4.0
Bug Fixes, Validation, and Dependency Upgrades
Bug fixes
- Filter name collision: Multiple filters sharing detection names (e.g. both using selection) no longer overwrite each other. Filter detections are now namespaced with a counter to prevent key collisions.
- CVE-2026-26996: Upgraded minimatch to 10.2.1 in the VS Code extension.
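The filter name collision fix can be pictured with a short Python sketch of counter-based namespacing. The __filter_N_name key format is borrowed from the v0.8.0 notes and is an assumption for this release; the function name and dict model are hypothetical.

```python
def namespace_filter(index: int, detections: dict) -> dict:
    """Prefix each detection name with a per-filter counter (sketch)."""
    return {f"__filter_{index}_{name}": body for name, body in detections.items()}


filters = [
    {"selection": {"User": "svc_backup"}},
    {"selection": {"Image": "C:\\Windows\\System32\\svchost.exe"}},
]

merged = {}
for i, detections in enumerate(filters):
    merged.update(namespace_filter(i, detections))

# Without namespacing, the second "selection" would overwrite the first.
print(sorted(merged))
```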
Validation improvements
- UnknownDetection at compile time: Condition expressions referencing non-existent detections now fail eagerly during compile_rule() instead of silently at eval time.
- UnknownRuleRef at load time: Correlation rule_refs are validated to resolve to known rules or correlations when calling add_collection().
Dependency upgrades
- yamlpatch 0.11 → 0.12, yamlpath 0.33 → 0.34 (Unicode-aware patching, empty-route RewriteFragment fix).
- jsonschema 0.29 → 0.42 (13 minor versions of improvements).
- tower-lsp 0.20 → tower-lsp-server 0.23 (actively maintained community fork; native async traits).
- 49 transitive crate updates via cargo update.
Test coverage
- ~1,300 lines of new tests: end-to-end integration, correlation edge cases, parser/eval error paths, and pipeline error handling.
Breaking changes
- Removed EvalError::TimestampParse variant (unused).
Full changelog: v0.3.0...v0.4.0