Add new query level trigger evaluation that doesn't depend on ScriptingService or _scripts API by eirsep · Pull Request #2079 · opensearch-project/alerting

eirsep · 2026-04-01T15:03:13Z

Adds remote query-level trigger evaluation behind the plugins.alerting.multi_tenant_trigger_eval_enabled feature flag (default: false).

When enabled, query-level trigger scripts are evaluated remotely via filter aggregations on the data index instead of locally via ScriptService. This
rewrites Painless trigger conditions (e.g. ctx.results[0].hits.total.value > 5) into filter aggregations, batches all triggers into a single _search request
with size: 0, and determines trigger state from doc_count. On evaluation error, triggers default to triggered=true (fail-open) so users are notified.

When disabled (default), the existing ScriptService evaluation path is completely unchanged.
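
For readers who want a concrete picture of the remote path, here is a minimal sketch of the rewrite and request-building steps described above. The `ctx.results[0]` to `params.results_0` substitution, `size: 0`, and the `_query_trigger_` key prefix come from this PR; the function names (`rewriteScript`, `buildEvalBody`) and the exact nesting of the request body are assumptions made for illustration, not the PR's actual code.

```kotlin
// Hypothetical sketch: helper names and the request body shape are assumptions.
val AGG_PREFIX = "_query_trigger_"

/** Rewrites references to the first input result so the script reads it from params. */
fun rewriteScript(source: String): String =
    source.replace("ctx.results[0]", "params.results_0")

/** Builds a size-0 search body with one filter aggregation per trigger. */
fun buildEvalBody(
    triggerScripts: Map<String, String>, // trigger id -> Painless condition
    results0: Map<String, Any?>          // the monitor's first query response
): Map<String, Any> {
    val aggs = triggerScripts.entries.associate { (id, source) ->
        (AGG_PREFIX + id) to mapOf(
            "filter" to mapOf(
                "script" to mapOf(
                    "script" to mapOf(
                        "lang" to "painless",
                        "source" to rewriteScript(source),
                        "params" to mapOf("results_0" to results0)
                    )
                )
            )
        )
    }
    // size 0: only the aggregation doc counts matter, not the hits themselves.
    return mapOf("size" to 0, "aggs" to aggs)
}
```

Calling `buildEvalBody` with several trigger ids yields a single body containing one aggregation per trigger, which is what allows all triggers to be batched into one `_search` request.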

Changes:

AlertingSettings / AlertingPlugin / MonitorRunnerExecutionContext / MonitorRunnerService — register and wire the new boolean cluster setting
TriggerScriptRewriter — rewrites ctx.results[0] → params.results_0 in trigger scripts
RemoteQueryLevelTriggerEvaluator — builds filter-agg search request, parses response into QueryLevelTriggerRunResult
QueryLevelMonitorRunner — branches on feature flag: remote eval path vs existing ScriptService path
AlertingRestTestCase — adds includeDefaults param to getSettings() helper
10 integration tests covering: simple threshold, aggregation value, boolean logic (AND/OR), loop over hits, multiple triggers, script error (fail-open),
large response, dry run
3 regression tests verifying flag=false preserves existing behavior

engechas

No blocking comments but it'd be great to have the test coverage include all existing cases for the non-remote monitor equivalents. See comment on parametrized test

engechas · 2026-04-01T20:27:07Z

alerting/src/test/kotlin/org/opensearch/alerting/RemoteQueryLevelTriggerIT.kt

+ * These tests verify that trigger scripts are correctly evaluated remotely via filter aggregations
+ * on the customer's cluster instead of locally via ScriptService.
+ */
+class RemoteQueryLevelTriggerIT : AlertingRestTestCase() {


Is there a way we can run the full set of QueryLevelTriggerIT with and without the remote flag enabled? Something similar to JUnit's @ParameterizedTest annotation. Since the input/output should be the same for both modes, that sort of test would give confidence that correctness is fully intact in the new mode.

The OpenSearch test framework uses JUnit 4 (via RandomizedRunner), not JUnit 5, so @ParameterizedTest isn't available. I have added regression tests in this same class where I toggle the setting on and off and compare the results and validate that they are the same.
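
The toggle-and-compare approach can be sketched without any JUnit 5 features. The example below is a hedged illustration only: `scriptServicePath`, `remotePath`, and `evaluatorFor` are hypothetical stand-ins for the two evaluation paths, not classes from the plugin, and a real integration test would flip the cluster setting over REST rather than pass a boolean.

```kotlin
// Hypothetical stand-ins for the two evaluation paths; not the plugin's classes.
val scriptServicePath: (Long, Long) -> Boolean = { hits, threshold -> hits > threshold }
val remotePath: (Long, Long) -> Boolean = { hits, threshold -> hits > threshold }

/** Picks an evaluation path the way a feature flag would. */
fun evaluatorFor(remoteEvalEnabled: Boolean): (Long, Long) -> Boolean =
    if (remoteEvalEnabled) remotePath else scriptServicePath

/** Runs one scenario with the flag off, then on, and returns both outcomes. */
fun <T> runWithBothFlagValues(scenario: (Boolean) -> T): Pair<T, T> =
    scenario(false) to scenario(true)

/** A plain JUnit 4 test method could call this and rely on the equality check. */
fun assertSameInBothModes(hitCount: Long, threshold: Long) {
    val (flagOff, flagOn) = runWithBothFlagValues { enabled ->
        evaluatorFor(enabled)(hitCount, threshold)
    }
    check(flagOff == flagOn) { "modes disagree for hits=$hitCount threshold=$threshold" }
}
```

Wrapping each existing scenario in a helper like `runWithBothFlagValues` gives much of the benefit of a parameterized test while staying within a single ordinary test method.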

engechas · 2026-04-01T20:30:52Z

alerting/src/main/kotlin/org/opensearch/alerting/QueryLevelMonitorRunner.kt

+            monitorResult.inputResults.results.isNotEmpty()
+        ) {
+            val searchInput = monitor.inputs[0] as SearchInput
+            val queryLevelTriggers = monitor.triggers.filterIsInstance<QueryLevelTrigger>()


Is it possible for a trigger type to be non-QueryLevelTrigger if the monitor type is QueryLevel?

Wondering why the filterIsInstance is needed/if it excludes any valid triggers

Well, in principle it can't be anything but query-level triggers, but for sanity I'm leaving this here as a guardrail against missing this translation if we add any newer trigger types.

I added this because there is a possibility we need to support a DataFusion engine instead of the Lucene engine, and it has some limitations which might force us to introduce new triggers.
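
To make the guardrail concrete, here is a small sketch of what `filterIsInstance` does when a list mixes trigger types. The `Trigger`, `QueryLevelTrigger`, and `OtherTrigger` types below are simplified, hypothetical stand-ins rather than the plugin's model classes; `OtherTrigger` plays the role of a future trigger type.

```kotlin
// Hypothetical trigger hierarchy for illustration; the plugin's real types differ.
interface Trigger { val name: String }
data class QueryLevelTrigger(override val name: String, val condition: String) : Trigger
data class OtherTrigger(override val name: String) : Trigger

/** Keeps only query-level triggers and reports the names of any that were skipped. */
fun selectQueryLevel(triggers: List<Trigger>): Pair<List<QueryLevelTrigger>, List<String>> {
    val selected = triggers.filterIsInstance<QueryLevelTrigger>()
    val skipped = triggers.filter { it !is QueryLevelTrigger }.map { it.name }
    return selected to skipped
}
```

Returning the skipped names alongside the selection is one way to address the reviewer's concern: a silently dropped trigger becomes something the caller can log or assert on.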

engechas · 2026-04-01T20:34:28Z

alerting/src/main/kotlin/org/opensearch/alerting/util/TriggerScriptRewriter.kt

+ * They are sent to the customer's cluster via a filter aggregation.
+ * The script context on the customer's cluster uses `params` instead of `ctx`,


nit: customer -> user

Register new boolean cluster setting to gate multi-tenant trigger evaluation code. Setting is dynamic, node-scoped, defaults to false.
- Define MULTI_TENANT_TRIGGER_EVAL_ENABLED in AlertingSettings
- Register in AlertingPlugin.getSettings()
- Add multiTenantTriggerEvalEnabled field to MonitorRunnerExecutionContext
- Wire initial read + dynamic update consumer in MonitorRunnerService
- Add unit tests for default value and registration
🤖 Assisted by the code-assist SOP
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

…luator Add TriggerScriptRewriter that replaces ctx.results[0] with params.results_0 for remote Painless evaluation on customer's cluster. Add RemoteQueryLevelTriggerEvaluator that builds a filter-agg search request with one agg per trigger, sends it to the customer's cluster, and parses doc_count to determine trigger results. Multiple triggers are batched in a single request. Errors default to triggered (fail-open). 🤖 Assisted by the code-assist SOP Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

When multiTenantTriggerEvalEnabled is true and monitor type is QUERY_LEVEL_MONITOR, batch-evaluate all triggers via RemoteQueryLevelTriggerEvaluator before the trigger loop. Results are looked up per trigger from the pre-computed map. When flag is false, existing ScriptService-based evaluation path is completely unchanged. 🤖 Assisted by the code-assist SOP Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

Add comprehensive integration tests for multi-tenant query-level trigger evaluation covering all trigger script patterns from the design doc:
- Simple threshold (hits.total.value comparison)
- Aggregation value (avg agg comparison)
- Boolean logic (AND and OR conditions)
- Loop over hits (for loop checking _source fields)
- Multiple triggers (mixed fire/no-fire)
- Script error (malformed Painless triggers on error)
- Large response (50 docs, non-trivial response)
- Dry run (verify no alerts persisted)
Each test enables the multi_tenant_trigger_eval_enabled flag, creates an index with test data, builds a monitor with the appropriate trigger script, executes via the _execute API, and verifies trigger results.
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

Add 3 integration tests to RemoteQueryLevelTriggerIT verifying that existing query-level alerting behavior is preserved when the multi_tenant_trigger_eval_enabled flag is disabled:
- test query level trigger flag disabled: monitor executes correctly via ScriptService path when flag is off (default)
- test query level trigger flag default is false: verify setting defaults to false via cluster settings REST API
- test query level trigger toggle flag during execution: both ScriptService and remote eval paths produce correct results when flag is toggled between executions
Also adds includeDefaults param to AlertingRestTestCase.getSettings() helper to support querying default cluster settings.
🤖 Assisted by the code-assist SOP
Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

…igger_ Replace internal jargon in the filter aggregation key prefix used by RemoteQueryLevelTriggerEvaluator. The prefix _query_trigger_ better describes its scope (query-level monitor triggers). Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

On evaluation error or missing trigger results, default to triggered=false instead of triggered=true. Errors are still logged at WARN/ERROR level and surfaced in the trigger run result. This prevents false-positive alerts when the remote evaluation encounters issues. Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>
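
The fail-closed behavior this commit describes can be sketched as follows. This is an illustration under stated assumptions: `TriggerOutcome` and `toOutcomes` are hypothetical names (the PR's own result type is QueryLevelTriggerRunResult, which is not reproduced here), and a `null` map stands in for a failed evaluation request.

```kotlin
// Hypothetical result type and helper, used only to illustrate the defaulting rule.
data class TriggerOutcome(val triggered: Boolean, val error: String? = null)

val AGG_KEY_PREFIX = "_query_trigger_"

/**
 * Maps per-aggregation doc counts to trigger outcomes.
 * A trigger fires only when its aggregation is present with doc_count > 0.
 * A failed request (docCounts == null) or a missing aggregation yields
 * triggered=false with the error kept on the outcome.
 */
fun toOutcomes(
    triggerIds: List<String>,
    docCounts: Map<String, Long>?,
    requestError: String? = null
): Map<String, TriggerOutcome> = triggerIds.associateWith { id ->
    val count = docCounts?.get(AGG_KEY_PREFIX + id)
    when {
        docCounts == null -> TriggerOutcome(false, requestError ?: "evaluation request failed")
        count == null -> TriggerOutcome(false, "no aggregation result for trigger $id")
        else -> TriggerOutcome(count > 0)
    }
}
```

Keeping the error on the outcome rather than discarding it matches the commit's intent that failures remain visible in the trigger run result even though no alert fires.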

Use open source terminology consistently throughout the codebase. Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

AWSHurneyt · 2026-04-02T16:27:34Z

alerting/src/main/kotlin/org/opensearch/alerting/trigger/RemoteQueryLevelTriggerEvaluator.kt

+        val searchSource = buildEvalSearchSource(triggerData, searchResponse)
+
+        return try {
+            val evalRequest = SearchRequest(*indices.toTypedArray()).source(searchSource)


Not blocking:
Do we want to make use of the plugins.alerting.request_timeout or one of our other timeout-related cluster settings here?

riysaxen-amzn · 2026-04-02T16:53:50Z

alerting/src/main/kotlin/org/opensearch/alerting/QueryLevelMonitorRunner.kt

+            Monitor.MonitorType.valueOf(monitor.monitorType.uppercase(Locale.ROOT)) == Monitor.MonitorType.QUERY_LEVEL_MONITOR &&
+            monitorResult.inputResults.results.isNotEmpty()
+        ) {
+            val searchInput = monitor.inputs[0] as SearchInput


nit: Query-level monitors should always have a SearchInput, but if one ever doesn't, the ClassCastException thrown here would be unhandled.
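
One way to address this nit is a safe cast that lets the caller decide what to do when the input is not a SearchInput. The sketch below uses simplified, hypothetical `Input`, `SearchInput`, and `OtherInput` types rather than the plugin's model classes, and `firstSearchInput` is an illustrative helper name.

```kotlin
// Hypothetical input types for illustration; not the plugin's model classes.
interface Input
data class SearchInput(val indices: List<String>) : Input
data class OtherInput(val uri: String) : Input

/**
 * Returns the first input as a SearchInput, or null (without throwing)
 * when the list is empty or the first input is of another type.
 */
fun firstSearchInput(inputs: List<Input>): SearchInput? =
    inputs.firstOrNull() as? SearchInput
```

With a helper like this, the runner could fall back to the existing ScriptService path or record an error on the monitor result when `null` comes back, instead of surfacing an unhandled exception.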

eirsep requested review from AWSHurneyt, amsiglan, bowenlan-amzn, engechas, getsaurabh02, goyamegh, jowg-amazon, lezzago, praveensameneni, rishabhmaurya, riysaxen-amzn, sbcd90 and toepkerd as code owners April 1, 2026 15:03

eirsep force-pushed the feat/multi-tenant-trigger-eval branch from 45c682c to c1faf0b Compare April 1, 2026 18:34

engechas approved these changes Apr 1, 2026

eirsep added 8 commits April 2, 2026 00:55

docs: Replace 'customer' with 'user' in code comments

007a4a7

Use open source terminology consistently throughout the codebase. Signed-off-by: Surya Sashank Nistala <snistala@amazon.com>

eirsep force-pushed the feat/multi-tenant-trigger-eval branch from f7b83de to 007a4a7 Compare April 2, 2026 07:55

AWSHurneyt approved these changes Apr 2, 2026

AWSHurneyt reviewed Apr 2, 2026

riysaxen-amzn reviewed Apr 2, 2026

riysaxen-amzn approved these changes Apr 2, 2026

Add new query level trigger evaluation that doesn't depend on ScriptingService or _scripts API #2079
eirsep wants to merge 8 commits into opensearch-project:main from eirsep:feat/multi-tenant-trigger-eval

4 participants
