[sec-core] bug(sec-core): prompt_scan fails on every scan when ML model not downloaded (no graceful degradation) #790

Description

@jfeng18

Summary

On ECS, every prompt_scan event fails (result=failed) because the Llama-Prompt-Guard-2 model was never downloaded, yet MLClassifier is treated as a mandatory, always-available detector. The scanner then raises at scan time on every prompt, leaving prompts effectively unscanned.

agent-sec-cli events --event-type prompt_scan on a fresh ECS shows all events failed with error_type=ModelLoadError.

Root Cause

MLClassifier (L2) does not override is_available(); it inherits DetectionLayer.is_available(), which unconditionally returns True (detectors/base.py). So the scanner believes L2 is available even when the model files are absent. At scan time, detect() → classify() → ModelManager._resolve_local_model_path() raises ModelLoadError ("Model not available locally, run scan-prompt warmup").

Because ml_classifier is not in _OPTIONAL_DETECTORS (scanner.py), there is no graceful degradation — the whole scan errors out instead of falling back to L1 (regex).
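The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the project's actual code: the class bodies, the model path, and the directory check are assumptions made for illustration, and only the class names and the error message come from this issue.

```python
from pathlib import Path


class ModelLoadError(RuntimeError):
    """Raised when the model files cannot be found locally."""


class DetectionLayer:
    """Hypothetical stand-in for the base class in detectors/base.py."""

    name = "base"

    def is_available(self) -> bool:
        # Default behavior described in the issue: always report availability.
        return True

    def detect(self, text: str) -> list[str]:
        raise NotImplementedError


class MLClassifier(DetectionLayer):
    """Hypothetical stand-in for the L2 detector."""

    name = "ml_classifier"

    def __init__(self, model_dir: Path) -> None:
        self.model_dir = Path(model_dir)

    # No is_available() override here, so the base-class True is inherited.

    def detect(self, text: str) -> list[str]:
        # Assumed check: the model counts as missing if its directory is absent.
        if not self.model_dir.is_dir():
            raise ModelLoadError("Model not available locally, run scan-prompt warmup")
        return []


# The layer claims to be available, then fails lazily at scan time.
layer = MLClassifier(Path("/nonexistent/prompt_scanner/models"))
claims_available = layer.is_available()
try:
    layer.detect("test")
    scan_error = None
except ModelLoadError as exc:
    scan_error = exc
```

Because the availability check and the actual load disagree, the error can only surface once a prompt is being scanned, which matches the per-event failures reported above.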

Note: torch/transformers ARE installed; only the model files are missing. This is distinct from #698/#680 (which change the cosh hook's fail-open→fail-ask behavior but do not fix the underlying model-availability gap).

How to Reproduce on ECS

# Ensure model is not downloaded
rm -rf ~/.cache/prompt_scanner/models

# Standard mode (L1+L2) on any prompt
agent-sec-cli scan-prompt --mode standard --format text --text "test"

# Result: verdict=error, "Detector 'ml_classifier' is not available" / ModelLoadError
# Every prompt_scan event recorded as failed.

Suggested Fix

Two coupled changes (must land together):

  1. MLClassifier.is_available() override — return False when torch/transformers missing OR the model is not downloaded (reuse a new ModelManager.is_model_downloaded() predicate), mirroring SemanticDetector's override pattern.

  2. Add ml_classifier to _OPTIONAL_DETECTORS — so when L2 is unavailable, the scanner skips it and degrades to L1 (regex) with a warning, instead of erroring on every scan.

Why coupled: applying only change 1 makes things worse. With is_available()=False on a mandatory detector, the constructor raises LayerNotAvailableError (an eager crash) instead of the current lazy scan-time error.
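The two changes and their coupling might look roughly like the sketch below. It is written under several assumptions: `select_detectors` and the free function `is_model_downloaded` are hypothetical names, the "directory exists and is non-empty" test is a placeholder for whatever ModelManager would really check, and the real scanner's constructor logic may differ.

```python
import importlib.util
import logging
import tempfile
from pathlib import Path

logger = logging.getLogger("prompt_scanner_sketch")


class LayerNotAvailableError(RuntimeError):
    """Raised when a mandatory detector reports itself unavailable."""


def is_model_downloaded(model_dir: Path) -> bool:
    # Hypothetical predicate: the directory exists and contains at least one file.
    return model_dir.is_dir() and any(model_dir.iterdir())


class RegexDetector:
    name = "regex"

    def is_available(self) -> bool:
        return True


class MLClassifier:
    name = "ml_classifier"

    def __init__(self, model_dir: Path) -> None:
        self.model_dir = Path(model_dir)

    def is_available(self) -> bool:
        # Change 1: unavailable if dependencies OR model files are missing.
        deps_ok = all(
            importlib.util.find_spec(mod) is not None
            for mod in ("torch", "transformers")
        )
        return deps_ok and is_model_downloaded(self.model_dir)


def select_detectors(detectors, optional):
    """Keep available detectors; skip optional ones, fail on mandatory ones."""
    active = []
    for det in detectors:
        if det.is_available():
            active.append(det)
        elif det.name in optional:
            # Change 2: degrade with a warning instead of erroring.
            logger.warning("Detector %r unavailable, skipping", det.name)
        else:
            raise LayerNotAvailableError(f"Detector {det.name!r} is not available")
    return active


with tempfile.TemporaryDirectory() as tmp:
    layers = [RegexDetector(), MLClassifier(Path(tmp) / "models")]

    # Both changes applied: the scan degrades to the regex layer.
    degraded = [d.name for d in select_detectors(layers, optional={"ml_classifier"})]

    # Only change 1 applied: the mandatory detector crashes construction.
    try:
        select_detectors(layers, optional=set())
        eager_crash = False
    except LayerNotAvailableError:
        eager_crash = True
```

With the model directory absent, the first call keeps only the regex layer, while the second raises immediately, which is the eager crash that makes change 1 unsafe on its own.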

Also: PromptScanner.warmup() must bypass the is_available() gate (build detectors directly from config), otherwise warmup can never download a model that is currently unavailable (chicken-and-egg).
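The chicken-and-egg problem can be illustrated with a small sketch. Everything here is an assumption for illustration: the `_build_all` helper, the `model_dir` config key, and the `warmup()` method on the detector (which writes a placeholder file instead of downloading anything) are hypothetical and are not taken from the project.

```python
import tempfile
from pathlib import Path


class MLClassifier:
    name = "ml_classifier"

    def __init__(self, model_dir: Path) -> None:
        self.model_dir = Path(model_dir)

    def is_available(self) -> bool:
        # Assumed check: model directory exists and is non-empty.
        return self.model_dir.is_dir() and any(self.model_dir.iterdir())

    def warmup(self) -> None:
        # Placeholder for the real download: just create a marker file.
        self.model_dir.mkdir(parents=True, exist_ok=True)
        (self.model_dir / "config.json").write_text("{}")


class PromptScanner:
    def __init__(self, config: dict) -> None:
        self.config = config
        # Normal construction keeps only detectors that report availability.
        self.detectors = [d for d in self._build_all(config) if d.is_available()]

    @staticmethod
    def _build_all(config: dict) -> list:
        # Build detectors directly from config, with no availability gate.
        return [MLClassifier(config["model_dir"])]

    def warmup(self) -> None:
        # Bypass is_available(): an unavailable detector is exactly the one
        # that needs its model fetched.
        for det in self._build_all(self.config):
            det.warmup()
        # Re-evaluate availability now that the files exist.
        self.detectors = [d for d in self._build_all(self.config) if d.is_available()]


with tempfile.TemporaryDirectory() as tmp:
    scanner = PromptScanner({"model_dir": Path(tmp) / "models"})
    before = [d.name for d in scanner.detectors]
    scanner.warmup()
    after = [d.name for d in scanner.detectors]
```

If warmup iterated over `self.detectors` instead, the list would be empty on a fresh host and nothing would ever be downloaded.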

Impact

Until fixed, prompt injection scanning is non-functional on any host where scan-prompt warmup was never run — which is the default after install (no install-time or daemon auto-warmup exists). The security layer silently does nothing.
