[BUG] Doc-Level Monitor Produces False Negatives When Detector Has More Than 10 Rules #1656

@joshikunal94

Description


Summary

The doc-level monitor in the OpenSearch Alerting plugin uses SearchSourceBuilder with the default size=10 for the percolate query that matches ingested documents against detection rules. When a detector has more than 10 rules, any rule beyond the first 10 matched hits is silently dropped: malicious events that should trigger findings are missed entirely, and no error, warning, or log entry is emitted.

This is a silent false negative bug: the system accepts the rules, accepts the documents, and reports success — but simply never generates findings for the dropped rules. Operators have no indication that their detection coverage is incomplete.

Security Consideration

This bug has direct security implications for any organization relying on OpenSearch Security Analytics for threat detection:

  • Undetected threats: Any detector with more than 10 custom rules will have blind spots. Rules that appear correctly deployed and enabled will never fire, creating a false sense of coverage. An attacker whose activity matches one of the dropped rules will go undetected.

  • Silent degradation: There is no alert, error, or health indicator that rules are being dropped. Security teams reviewing their detector configuration will see all rules listed and enabled — nothing suggests that some rules are non-functional. This makes the issue extremely difficult to discover through normal operations.

  • No compensating control: The _execute API (manual detector execution) uses the same code path and has the same limitation, so it cannot be used to verify detection coverage either.

Affected Components

  • Plugin: opensearch-alerting (used by opensearch-security-analytics)
  • File: TransportDocLevelMonitorFanOutAction.kt
  • Function: runPercolateQueryOnTransformedDocs() (line ~985)
  • Versions tested: OpenSearch 3.5.0 (the code has been this way since doc-level monitors were introduced — likely affects all versions)

Root Cause

In TransportDocLevelMonitorFanOutAction.kt, lines 1012-1016:

val searchRequest =
    SearchRequest().indices(*queryIndices.toTypedArray()).preference(Preference.PRIMARY_FIRST.type())
val searchSourceBuilder = SearchSourceBuilder()
searchSourceBuilder.query(boolQueryBuilder)
searchRequest.source(searchSourceBuilder)

SearchSourceBuilder() defaults to size=10 (inherited from SearchSourceBuilder.DEFAULT_SIZE). No .size() call is made, so the percolate search returns at most 10 hits. Each hit represents one matched rule (with _percolator_document_slot listing which documents matched that rule). The downstream code at lines 926-933 iterates over response.hits and builds the docsToQueries map — any rule beyond the 10th hit is never processed, producing a false negative.
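The effect can be reduced to a toy model (illustrative Python, not the plugin's actual code; `percolate` is a hypothetical stand-in for the capped search call):

```python
# Toy model of the bug: SearchSourceBuilder() defaults to size=10,
# so the percolate search returns at most 10 hits. Each hit is one
# matched rule; rules past the cap never reach the docsToQueries map.
DEFAULT_SIZE = 10  # SearchSourceBuilder.DEFAULT_SIZE

def percolate(matching_rule_ids):
    # Stand-in for the search call: with no .size(), only the first
    # DEFAULT_SIZE hits are ever returned.
    return matching_rule_ids[:DEFAULT_SIZE]

rules = [f"rule-{i:02d}" for i in range(1, 13)]  # 12 rules, all matching
fired = percolate(rules)
dropped = [r for r in rules if r not in fired]
print(f"fired={len(fired)} dropped={dropped}")
# fired=10 dropped=['rule-11', 'rule-12']
```

This mirrors the 12-rule reproduction below: the two rules that sort last are the ones that never fire.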

Proposed Fix

Add .size() to the percolate SearchSourceBuilder. The size should cover all rules in the queries index:

val searchSourceBuilder = SearchSourceBuilder()
searchSourceBuilder.query(boolQueryBuilder)
searchSourceBuilder.size(10000)  // or dynamically from queryIndex doc count
searchRequest.source(searchSourceBuilder)
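A fixed size=10000 is simple but still a cap (and bounded by index.max_result_window, which also defaults to 10000). A more robust variant pages through all hits. The paging loop, sketched as a toy model in Python (`fetch_page` is a hypothetical stand-in for a search call with from/size parameters):

```python
# Sketch of from/size paging over percolate hits, so no rule is
# dropped regardless of how many match. Toy model only.
PAGE_SIZE = 10

def fetch_page(all_hits, offset, size):
    # Stand-in for a search call with .from_(offset) and .size(size)
    return all_hits[offset:offset + size]

def all_matched_rules(all_hits):
    results, offset = [], 0
    while True:
        page = fetch_page(all_hits, offset, PAGE_SIZE)
        if not page:           # empty page: every hit has been consumed
            return results
        results.extend(page)
        offset += len(page)

rules = [f"rule-{i:02d}" for i in range(1, 13)]
print(len(all_matched_rules(rules)))  # 12, not just the first 10
```

For result sets deeper than max_result_window, search_after-based pagination would be needed instead of from/size.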

Impact

  • False negatives: Rules beyond the 10th match silently produce no findings. Malicious activity matching these rules goes undetected.
  • Deterministic: The same rules always produce false negatives (determined by _id sort order in the queries index — the last N rules by sort order are dropped when there are 10+N rules).
  • Affects both scheduled and _execute API: The _execute endpoint uses the same runPercolateQueryOnTransformedDocs code path, so manual execution cannot be used to work around or verify the issue.
  • Any detector with >10 custom rules is affected: This is not edge-case usage — production SIEM deployments commonly have dozens of rules per log type.

Reproduction

A self-contained reproduction script is provided at reproduce_percolate_bug.sh. It runs against a local OpenSearch 3.5.0 container with no external dependencies.

./reproduce_percolate_bug.sh                          # default localhost:9200
./reproduce_percolate_bug.sh http://host:9200          # custom endpoint

The script performs the following:

  1. Creates a simple index with one keyword field (event_action)
  2. Registers SA field mappings (alias action -> event_action)
  3. Creates 12 custom rules, each matching a unique action value (action-01 through action-12)
  4. Creates a single detector with all 12 rules
  5. Waits for the monitor to initialize (65s)
  6. Ingests 12 documents — one per rule, each guaranteed to match exactly one rule
  7. Waits for the detector to execute (90s)
  8. Queries findings via the _plugins/_security_analytics/findings/_search API

Expected result

Rules in detector:    12
Documents ingested:   12
Findings produced:    12
Distinct rules fired: 12

All rules produced findings (bug NOT reproduced)

Actual result

Rules in detector:    12
Documents ingested:   12
Findings produced:    10
Distinct rules fired: 10

DROPPED (2 rules silently produced no findings):
  <rule_id_1>
  <rule_id_2>

BUG CONFIRMED: expected 12 findings, got 10

Two rules produce false negatives. The cap is always 10 regardless of how many rules match. Increasing to N rules (where N > 10) results in exactly N − 10 false negatives.
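The N − 10 relationship is pure arithmetic and can be stated directly (illustrative snippet, no cluster needed):

```python
def expected_false_negatives(num_rules, cap=10):
    # Findings are produced only for rules within the percolate hit cap;
    # everything past the cap is silently dropped.
    return max(0, num_rules - cap)

for n in (10, 12, 50):
    print(n, "->", expected_false_negatives(n))
# 10 -> 0
# 12 -> 2
# 50 -> 40
```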

Reproduction script (reproduce_percolate_bug.sh)

#!/usr/bin/env bash
# Reproduces the OpenSearch percolate size=10 bug.
# Creates 12 simple rules in a single detector — expects 12 findings, gets 10.
#
# Usage:
#   ./reproduce_percolate_bug.sh [OPENSEARCH_ENDPOINT]
#
# Default endpoint: http://localhost:9200

set -euo pipefail

EP="${1:-http://localhost:9200}"
INDEX="test-percolate-bug-000001"
ALIAS="test-percolate-bug"
LOG_TYPE="others_application"
NUM_RULES=12
MONITOR_INIT_WAIT=65
FINDINGS_WAIT=90

# ── helpers ──────────────────────────────────────────────────────────

api() { curl -sf -H 'Content-Type: application/json' "$@"; }

wait_healthy() {
    echo "Waiting for OpenSearch at ${EP}..."
    for _ in $(seq 1 30); do
        if api "${EP}/_cluster/health" -o /dev/null 2>/dev/null; then
            echo "  OK: $(api "${EP}" 2>/dev/null | python3 -c "import sys,json; v=json.load(sys.stdin)['version']; print(f\"{v['distribution']} {v['number']}\")")"
            return 0
        fi
        sleep 2
    done
    echo "ERROR: OpenSearch not reachable at ${EP}" >&2; exit 1
}

# ── clean slate ──────────────────────────────────────────────────────

cleanup() {
    echo -e "\n=== Cleanup ==="

    # detectors
    local det_ids
    det_ids=$(api -X POST "${EP}/_plugins/_security_analytics/detectors/_search" \
        -d '{"query":{"match_all":{}},"size":100}' 2>/dev/null \
        | python3 -c "import sys,json; [print(h['_id']) for h in json.load(sys.stdin).get('hits',{}).get('hits',[])]" 2>/dev/null) || true
    for did in $det_ids; do
        api -X DELETE "${EP}/_plugins/_security_analytics/detectors/${did}" -o /dev/null 2>/dev/null && echo "  Deleted detector ${did}" || true
    done

    # SA internal indices
    for pat in ".opensearch-sap-*-findings*" ".opensearch-sap-*-queries*" ".opensearch-sap-*-alerts*"; do
        curl -sf -X DELETE "${EP}/${pat}" -o /dev/null 2>/dev/null || true
    done

    # custom rules
    local rule_ids
    rule_ids=$(api -X POST "${EP}/_plugins/_security_analytics/rules/_search?pre_packaged=false" \
        -d '{"query":{"match_all":{}},"size":1000}' 2>/dev/null \
        | python3 -c "import sys,json; [print(h['_id']) for h in json.load(sys.stdin).get('hits',{}).get('hits',[])]" 2>/dev/null) || true
    for rid in $rule_ids; do
        api -X DELETE "${EP}/_plugins/_security_analytics/rules/${rid}?forced=true" -o /dev/null 2>/dev/null || true
    done

    # index
    curl -sf -X DELETE "${EP}/${INDEX}" -o /dev/null 2>/dev/null || true

    echo "  Done"
}

# ── main ─────────────────────────────────────────────────────────────

wait_healthy
cleanup

echo -e "\n=== Step 1: Create index ==="
api -X PUT "${EP}/${INDEX}" -d "{
  \"settings\": { \"number_of_shards\": 1 },
  \"mappings\": {
    \"properties\": {
      \"event_action\": { \"type\": \"keyword\" },
      \"timestamp\":    { \"type\": \"date\" }
    }
  },
  \"aliases\": { \"${ALIAS}\": {} }
}" -o /dev/null
echo "  OK: ${INDEX} (alias: ${ALIAS})"

echo -e "\n=== Step 2: Create SA field mappings ==="
api -X POST "${EP}/_plugins/_security_analytics/mappings" -d "{
  \"index_name\": \"${ALIAS}\",
  \"rule_topic\": \"${LOG_TYPE}\",
  \"partial\": true,
  \"alias_mappings\": {
    \"properties\": {
      \"action\": { \"path\": \"event_action\", \"type\": \"alias\" }
    }
  }
}" -o /dev/null
echo "  OK"

echo -e "\n=== Step 3: Create ${NUM_RULES} custom rules ==="
for i in $(seq -w 1 "$NUM_RULES"); do
    rule_id=$(api -X POST "${EP}/_plugins/_security_analytics/rules?category=${LOG_TYPE}" -d "
title: Test Rule ${i}
id: $(cat /proc/sys/kernel/random/uuid)
description: Matches action-${i}
status: test
level: high
author: test
date: 2026/01/01
logsource:
    category: ${LOG_TYPE}
detection:
    selection:
        action: action-${i}
    condition: selection
" | python3 -c "import sys,json; print(json.load(sys.stdin)['_id'])")
    echo "  Rule ${i} -> ${rule_id}"
done

echo -e "\n=== Step 4: Collect rule IDs ==="
RULE_IDS=$(api -X POST "${EP}/_plugins/_security_analytics/rules/_search?pre_packaged=false" \
    -d '{"query":{"match_all":{}},"size":100}' \
    | python3 -c "
import sys, json
hits = json.load(sys.stdin)['hits']['hits']
rules = [{'id': h['_id']} for h in hits if 'Test Rule' in h['_source'].get('title','')]
print(json.dumps(rules))
")
RULE_COUNT=$(echo "$RULE_IDS" | python3 -c "import sys,json; print(len(json.load(sys.stdin)))")
echo "  Rules collected: ${RULE_COUNT}"

echo -e "\n=== Step 5: Create detector with all ${RULE_COUNT} rules ==="
DET_ID=$(api -X POST "${EP}/_plugins/_security_analytics/detectors" -d "{
  \"name\": \"Percolate Bug Repro\",
  \"detector_type\": \"${LOG_TYPE}\",
  \"enabled\": true,
  \"schedule\": { \"period\": { \"interval\": 1, \"unit\": \"MINUTES\" } },
  \"inputs\": [{
    \"detector_input\": {
      \"description\": \"Repro for percolate size=10 bug\",
      \"indices\": [\"${ALIAS}\"],
      \"custom_rules\": ${RULE_IDS},
      \"pre_packaged_rules\": []
    }
  }],
  \"triggers\": []
}" | python3 -c "import sys,json; print(json.load(sys.stdin)['_id'])")
echo "  Detector: ${DET_ID}"

echo -e "\n=== Step 6: Wait ${MONITOR_INIT_WAIT}s for monitor to initialize ==="
for i in $(seq 1 "$MONITOR_INIT_WAIT"); do
    printf "\r  %ds / %ds" "$i" "$MONITOR_INIT_WAIT"
    sleep 1
done
echo ""

echo -e "\n=== Step 7: Ingest ${NUM_RULES} documents (one per rule) ==="
TS=$(date -u +%Y-%m-%dT%H:%M:%SZ)
BULK=""
for i in $(seq -w 1 "$NUM_RULES"); do
    BULK="${BULK}{\"index\":{}}\n{\"event_action\":\"action-${i}\",\"timestamp\":\"${TS}\"}\n"
done
echo -e "$BULK" | curl -s -X POST "${EP}/${ALIAS}/_bulk" \
    -H 'Content-Type: application/x-ndjson' --data-binary @- \
    | python3 -c "import sys,json; r=json.load(sys.stdin); print(f'  Errors: {r[\"errors\"]}, Items: {len(r[\"items\"])}')"

echo -e "\n=== Step 8: Wait ${FINDINGS_WAIT}s for detector execution ==="
for i in $(seq 1 "$FINDINGS_WAIT"); do
    printf "\r  %ds / %ds" "$i" "$FINDINGS_WAIT"
    sleep 1
done
echo ""

echo -e "\n=== Step 9: Results ==="
python3 -c "
import json, urllib.request

ep = '${EP}'
det_id = '${DET_ID}'
num_rules = ${NUM_RULES}

def get(path):
    req = urllib.request.Request(f'{ep}/{path}')
    return json.loads(urllib.request.urlopen(req).read())

# findings via SA findings API, filtered by detector
findings_resp = get(f'_plugins/_security_analytics/findings/_search?detector_id={det_id}')
findings = findings_resp.get('findings', [])

fired_rules = {}  # id -> name
for f in findings:
    for q in f.get('queries', []):
        fired_rules[q['id']] = q.get('name', '?')

# all rules created for this detector
det_resp = get(f'_plugins/_security_analytics/detectors/{det_id}')
detector = det_resp.get('detector', {})
custom_rule_ids = [r['id'] for inp in detector.get('inputs', [])
                   for r in inp.get('detector_input', {}).get('custom_rules', [])]

print(f'  Rules in detector:    {len(custom_rule_ids)}')
print(f'  Documents ingested:   {num_rules}')
print(f'  Findings produced:    {len(findings)}')
print(f'  Distinct rules fired: {len(fired_rules)}')
print()

missing_ids = [rid for rid in custom_rule_ids if rid not in fired_rules]
if missing_ids:
    print(f'  DROPPED ({len(missing_ids)} rules silently produced no findings):')
    for rid in missing_ids:
        print(f'    {rid}')
    print()
    print(f'  BUG CONFIRMED: expected {len(custom_rule_ids)} findings, got {len(findings)}')
else:
    print(f'  All rules produced findings (bug NOT reproduced)')
"

echo ""
