[BUG] Doc-Level Monitor Produces False Negatives When Detector Has More Than 10 Rules #1656

@joshikunal94

Description


Summary

The doc-level monitor in the OpenSearch Alerting plugin uses SearchSourceBuilder with the default size=10 for the percolate query that matches ingested documents against detection rules. When a detector has more than 10 rules, any rule beyond the first 10 matched hits is silently dropped: malicious events that should trigger findings are missed entirely, and no error, warning, or log entry is emitted.

This is a silent false negative bug: the system accepts the rules, accepts the documents, and reports success — but simply never generates findings for the dropped rules. Operators have no indication that their detection coverage is incomplete.

Security Consideration

This bug has direct security implications for any organization relying on OpenSearch Security Analytics for threat detection:

  • Undetected threats: Any detector with more than 10 custom rules will have blind spots. Rules that appear correctly deployed and enabled will never fire, creating a false sense of coverage. An attacker whose activity matches one of the dropped rules will go undetected.

  • Silent degradation: There is no alert, error, or health indicator that rules are being dropped. Security teams reviewing their detector configuration will see all rules listed and enabled — nothing suggests that some rules are non-functional. This makes the issue extremely difficult to discover through normal operations.

  • No compensating control: The _execute API (manual detector execution) uses the same code path and has the same limitation, so it cannot be used to verify detection coverage either.

Affected Components

  • Plugin: opensearch-alerting (used by opensearch-security-analytics)
  • File: TransportDocLevelMonitorFanOutAction.kt
  • Function: runPercolateQueryOnTransformedDocs() (line ~985)
  • Versions tested: OpenSearch 3.5.0 (the code has been this way since doc-level monitors were introduced — likely affects all versions)

Root Cause

In TransportDocLevelMonitorFanOutAction.kt, lines 1012-1016:

val searchRequest =
    SearchRequest().indices(*queryIndices.toTypedArray()).preference(Preference.PRIMARY_FIRST.type())
val searchSourceBuilder = SearchSourceBuilder()
searchSourceBuilder.query(boolQueryBuilder)
searchRequest.source(searchSourceBuilder)

SearchSourceBuilder() defaults to size=10 (inherited from SearchSourceBuilder.DEFAULT_SIZE). No .size() call is made, so the percolate search returns at most 10 hits. Each hit represents one matched rule (with _percolator_document_slot listing which documents matched that rule). The downstream code at lines 926-933 iterates over response.hits and builds the docsToQueries map — any rule beyond the 10th hit is never processed, producing a false negative.
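The effect can be reduced to a toy model (illustrative Python, not the plugin's actual code; `percolate` is a hypothetical stand-in for the capped search call):

```python
# Toy model of the bug: SearchSourceBuilder() defaults to size=10,
# so the percolate search returns at most 10 hits. Each hit is one
# matched rule; rules past the cap never reach the docsToQueries map.
DEFAULT_SIZE = 10  # SearchSourceBuilder.DEFAULT_SIZE

def percolate(matching_rule_ids):
    # Stand-in for the search call: with no .size(), only the first
    # DEFAULT_SIZE hits are ever returned.
    return matching_rule_ids[:DEFAULT_SIZE]

rules = [f"rule-{i:02d}" for i in range(1, 13)]  # 12 rules, all matching
fired = percolate(rules)
dropped = [r for r in rules if r not in fired]
print(f"fired={len(fired)} dropped={dropped}")
# fired=10 dropped=['rule-11', 'rule-12']
```

This mirrors the 12-rule reproduction below: the two rules that sort last are the ones that never fire.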

Proposed Fix

Add .size() to the percolate SearchSourceBuilder. The size should cover all rules in the queries index:

val searchSourceBuilder = SearchSourceBuilder()
searchSourceBuilder.query(boolQueryBuilder)
searchSourceBuilder.size(10000)  // or dynamically from queryIndex doc count
searchRequest.source(searchSourceBuilder)
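A fixed size=10000 is simple but still a cap (and bounded by index.max_result_window, which also defaults to 10000). A more robust variant pages through all hits. The paging loop, sketched as a toy model in Python (`fetch_page` is a hypothetical stand-in for a search call with from/size parameters):

```python
# Sketch of from/size paging over percolate hits, so no rule is
# dropped regardless of how many match. Toy model only.
PAGE_SIZE = 10

def fetch_page(all_hits, offset, size):
    # Stand-in for a search call with .from_(offset) and .size(size)
    return all_hits[offset:offset + size]

def all_matched_rules(all_hits):
    results, offset = [], 0
    while True:
        page = fetch_page(all_hits, offset, PAGE_SIZE)
        if not page:           # empty page: every hit has been consumed
            return results
        results.extend(page)
        offset += len(page)

rules = [f"rule-{i:02d}" for i in range(1, 13)]
print(len(all_matched_rules(rules)))  # 12, not just the first 10
```

For result sets deeper than max_result_window, search_after-based pagination would be needed instead of from/size.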

Impact

  • False negatives: Rules beyond the 10th match silently produce no findings. Malicious activity matching these rules goes undetected.
  • Deterministic: The same rules always produce false negatives (determined by _id sort order in the queries index — the last N rules by sort order are dropped when there are 10+N rules).
  • Affects both scheduled and _execute API: The _execute endpoint uses the same runPercolateQueryOnTransformedDocs code path, so manual execution cannot be used to work around or verify the issue.
  • Any detector with >10 custom rules is affected: This is not edge-case usage — production SIEM deployments commonly have dozens of rules per log type.

Reproduction

A self-contained reproduction script is provided at reproduce_percolate_bug.sh. It runs against a local OpenSearch 3.5.0 container with no external dependencies.

./reproduce_percolate_bug.sh                          # default localhost:9200
./reproduce_percolate_bug.sh http://host:9200          # custom endpoint

The script performs the following:

  1. Creates a simple index with one keyword field (event_action)
  2. Registers SA field mappings (alias action -> event_action)
  3. Creates 12 custom rules, each matching a unique action value (action-01 through action-12)
  4. Creates a single detector with all 12 rules
  5. Waits for the monitor to initialize (65s)
  6. Ingests 12 documents — one per rule, each guaranteed to match exactly one rule
  7. Waits for the detector to execute (90s)
  8. Queries findings via the _plugins/_security_analytics/findings/_search API

Expected result

Rules in detector:    12
Documents ingested:   12
Findings produced:    12
Distinct rules fired: 12

All rules produced findings (bug NOT reproduced)

Actual result

Rules in detector:    12
Documents ingested:   12
Findings produced:    10
Distinct rules fired: 10

DROPPED (2 rules silently produced no findings):
  <rule_id_1>
  <rule_id_2>

BUG CONFIRMED: expected 12 findings, got 10

Two rules produce false negatives. The cap is always 10 regardless of how many rules match. Increasing to N rules (where N > 10) results in exactly N − 10 false negatives.
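The N − 10 relationship is pure arithmetic and can be stated directly (illustrative snippet, no cluster needed):

```python
def expected_false_negatives(num_rules, cap=10):
    # Findings are produced only for rules within the percolate hit cap;
    # everything past the cap is silently dropped.
    return max(0, num_rules - cap)

for n in (10, 12, 50):
    print(n, "->", expected_false_negatives(n))
# 10 -> 0
# 12 -> 2
# 50 -> 40
```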

Reproduction script (reproduce_percolate_bug.sh)

#!/usr/bin/env bash
# Reproduces the OpenSearch percolate size=10 bug.
# Creates 12 simple rules in a single detector — expects 12 findings, gets 10.
#
# Usage:
#   ./reproduce_percolate_bug.sh [OPENSEARCH_ENDPOINT]
#
# Default endpoint: http://localhost:9200

set -euo pipefail

EP="${1:-http://localhost:9200}"
INDEX="test-percolate-bug-000001"
ALIAS="test-percolate-bug"
LOG_TYPE="others_application"
NUM_RULES=12
MONITOR_INIT_WAIT=65
FINDINGS_WAIT=90

# ── helpers ──────────────────────────────────────────────────────────

api() { curl -sf -H 'Content-Type: application/json' "$@"; }

wait_healthy() {
    echo "Waiting for OpenSearch at ${EP}..."
    for _ in $(seq 1 30); do
        if api "${EP}/_cluster/health" -o /dev/null 2>/dev/null; then
            echo "  OK: $(api "${EP}" 2>/dev/null | python3 -c "import sys,json; v=json.load(sys.stdin)['version']; print(f\"{v['distribution']} {v['number']}\")")"
            return 0
        fi
        sleep 2
    done
    echo "ERROR: OpenSearch not reachable at ${EP}" >&2; exit 1
}

# ── clean slate ──────────────────────────────────────────────────────

cleanup() {
    echo -e "\n=== Cleanup ==="

    # detectors
    local det_ids
    det_ids=$(api -X POST "${EP}/_plugins/_security_analytics/detectors/_search" \
        -d '{"query":{"match_all":{}},"size":100}' 2>/dev/null \
        | python3 -c "import sys,json; [print(h['_id']) for h in json.load(sys.stdin).get('hits',{}).get('hits',[])]" 2>/dev/null) || true
    for did in $det_ids; do
        api -X DELETE "${EP}/_plugins/_security_analytics/detectors/${did}" -o /dev/null 2>/dev/null && echo "  Deleted detector ${did}" || true
    done

    # SA internal indices
    for pat in ".opensearch-sap-*-findings*" ".opensearch-sap-*-queries*" ".opensearch-sap-*-alerts*"; do
        curl -sf -X DELETE "${EP}/${pat}" -o /dev/null 2>/dev/null || true
    done

    # custom rules
    local rule_ids
    rule_ids=$(api -X POST "${EP}/_plugins/_security_analytics/rules/_search?pre_packaged=false" \
        -d '{"query":{"match_all":{}},"size":1000}' 2>/dev/null \
        | python3 -c "import sys,json; [print(h['_id']) for h in json.load(sys.stdin).get('hits',{}).get('hits',[])]" 2>/dev/null) || true
    for rid in $rule_ids; do
        api -X DELETE "${EP}/_plugins/_security_analytics/rules/${rid}?forced=true" -o /dev/null 2>/dev/null || true
    done

    # index
    curl -sf -X DELETE "${EP}/${INDEX}" -o /dev/null 2>/dev/null || true

    echo "  Done"
}

# ── main ─────────────────────────────────────────────────────────────

wait_healthy
cleanup

echo -e "\n=== Step 1: Create index ==="
api -X PUT "${EP}/${INDEX}" -d "{
  \"settings\": { \"number_of_shards\": 1 },
  \"mappings\": {
    \"properties\": {
      \"event_action\": { \"type\": \"keyword\" },
      \"timestamp\":    { \"type\": \"date\" }
    }
  },
  \"aliases\": { \"${ALIAS}\": {} }
}" -o /dev/null
echo "  OK: ${INDEX} (alias: ${ALIAS})"

echo -e "\n=== Step 2: Create SA field mappings ==="
api -X POST "${EP}/_plugins/_security_analytics/mappings" -d "{
  \"index_name\": \"${ALIAS}\",
  \"rule_topic\": \"${LOG_TYPE}\",
  \"partial\": true,
  \"alias_mappings\": {
    \"properties\": {
      \"action\": { \"path\": \"event_action\", \"type\": \"alias\" }
    }
  }
}" -o /dev/null
echo "  OK"

echo -e "\n=== Step 3: Create ${NUM_RULES} custom rules ==="
for i in $(seq -w 1 "$NUM_RULES"); do
    rule_id=$(api -X POST "${EP}/_plugins/_security_analytics/rules?category=${LOG_TYPE}" -d "
title: Test Rule ${i}
id: $(cat /proc/sys/kernel/random/uuid)
description: Matches action-${i}
status: test
level: high
author: test
date: 2026/01/01
logsource:
    category: ${LOG_TYPE}
detection:
    selection:
        action: action-${i}
    condition: selection
" | python3 -c "import sys,json; print(json.load(sys.stdin)['_id'])")
    echo "  Rule ${i} -> ${rule_id}"
done

echo -e "\n=== Step 4: Collect rule IDs ==="
RULE_IDS=$(api -X POST "${EP}/_plugins/_security_analytics/rules/_search?pre_packaged=false" \
    -d '{"query":{"match_all":{}},"size":100}' \
    | python3 -c "
import sys, json
hits = json.load(sys.stdin)['hits']['hits']
rules = [{'id': h['_id']} for h in hits if 'Test Rule' in h['_source'].get('title','')]
print(json.dumps(rules))
")
RULE_COUNT=$(echo "$RULE_IDS" | python3 -c "import sys,json; print(len(json.load(sys.stdin)))")
echo "  Rules collected: ${RULE_COUNT}"

echo -e "\n=== Step 5: Create detector with all ${RULE_COUNT} rules ==="
DET_ID=$(api -X POST "${EP}/_plugins/_security_analytics/detectors" -d "{
  \"name\": \"Percolate Bug Repro\",
  \"detector_type\": \"${LOG_TYPE}\",
  \"enabled\": true,
  \"schedule\": { \"period\": { \"interval\": 1, \"unit\": \"MINUTES\" } },
  \"inputs\": [{
    \"detector_input\": {
      \"description\": \"Repro for percolate size=10 bug\",
      \"indices\": [\"${ALIAS}\"],
      \"custom_rules\": ${RULE_IDS},
      \"pre_packaged_rules\": []
    }
  }],
  \"triggers\": []
}" | python3 -c "import sys,json; print(json.load(sys.stdin)['_id'])")
echo "  Detector: ${DET_ID}"

echo -e "\n=== Step 6: Wait ${MONITOR_INIT_WAIT}s for monitor to initialize ==="
for i in $(seq 1 "$MONITOR_INIT_WAIT"); do
    printf "\r  %ds / %ds" "$i" "$MONITOR_INIT_WAIT"
    sleep 1
done
echo ""

echo -e "\n=== Step 7: Ingest ${NUM_RULES} documents (one per rule) ==="
TS=$(date -u +%Y-%m-%dT%H:%M:%SZ)
BULK=""
for i in $(seq -w 1 "$NUM_RULES"); do
    BULK="${BULK}{\"index\":{}}\n{\"event_action\":\"action-${i}\",\"timestamp\":\"${TS}\"}\n"
done
echo -e "$BULK" | curl -s -X POST "${EP}/${ALIAS}/_bulk" \
    -H 'Content-Type: application/x-ndjson' --data-binary @- \
    | python3 -c "import sys,json; r=json.load(sys.stdin); print(f'  Errors: {r[\"errors\"]}, Items: {len(r[\"items\"])}')"

echo -e "\n=== Step 8: Wait ${FINDINGS_WAIT}s for detector execution ==="
for i in $(seq 1 "$FINDINGS_WAIT"); do
    printf "\r  %ds / %ds" "$i" "$FINDINGS_WAIT"
    sleep 1
done
echo ""

echo -e "\n=== Step 9: Results ==="
python3 -c "
import json, urllib.request

ep = '${EP}'
det_id = '${DET_ID}'
num_rules = ${NUM_RULES}

def get(path):
    req = urllib.request.Request(f'{ep}/{path}')
    return json.loads(urllib.request.urlopen(req).read())

# findings via SA findings API, filtered by detector
findings_resp = get(f'_plugins/_security_analytics/findings/_search?detector_id={det_id}')
findings = findings_resp.get('findings', [])

fired_rules = {}  # id -> name
for f in findings:
    for q in f.get('queries', []):
        fired_rules[q['id']] = q.get('name', '?')

# all rules created for this detector
det_resp = get(f'_plugins/_security_analytics/detectors/{det_id}')
detector = det_resp.get('detector', {})
custom_rule_ids = [r['id'] for inp in detector.get('inputs', [])
                   for r in inp.get('detector_input', {}).get('custom_rules', [])]

print(f'  Rules in detector:    {len(custom_rule_ids)}')
print(f'  Documents ingested:   {num_rules}')
print(f'  Findings produced:    {len(findings)}')
print(f'  Distinct rules fired: {len(fired_rules)}')
print()

missing_ids = [rid for rid in custom_rule_ids if rid not in fired_rules]
if missing_ids:
    print(f'  DROPPED ({len(missing_ids)} rules silently produced no findings):')
    for rid in missing_ids:
        print(f'    {rid}')
    print()
    print(f'  BUG CONFIRMED: expected {len(custom_rule_ids)} findings, got {len(findings)}')
else:
    print(f'  All rules produced findings (bug NOT reproduced)')
"

echo ""
