feat(content_safety): add support to auto select multilingual refusal bot messages #1530
base: develop
Conversation
```python
DEFAULT_REFUSAL_MESSAGES: Dict[str, str] = {
    "en": "I'm sorry, I can't respond to that.",
    "es": "Lo siento, no puedo responder a eso.",
    "zh": "抱歉,我无法回应。",
    "de": "Es tut mir leid, darauf kann ich nicht antworten.",
    "fr": "Je suis désolé, je ne peux pas répondre à cela.",
    "hi": "मुझे खेद है, मैं इसका जवाब नहीं दे सकता।",
    "ja": "申し訳ありませんが、それには回答できません。",
    "ar": "عذراً، لا أستطيع الرد على ذلك.",
    "th": "ขออภัย ฉันไม่สามารถตอบได้",
}
```
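For illustration, the lookup this table supports might look like the following (a minimal sketch assuming the fallback-to-English behavior described later in this PR; `get_refusal_message` is a hypothetical name, not the actual implementation):

```python
from typing import Dict, Optional

def get_refusal_message(
    detected_lang: Optional[str],
    messages: Dict[str, str] = DEFAULT_REFUSAL_MESSAGES,
) -> str:
    """Return the refusal message for the detected language.

    Unsupported or undetected languages fall back to the English default.
    """
    if detected_lang is None or detected_lang not in messages:
        detected_lang = "en"
    return messages[detected_lang]
```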
If we later had other multilingual rails, would we be repeating this mechanism in each rail? Or just the set of supported languages per rail? I don't think we need to do it now (since we don't have other multilingual rails to test it), but we should be aware of what refactoring would be needed to move the below language detection to a shared level.
Of course, we can relax this constraint later and allow users more flexibility. Once we need to support other models or other types of rails (beyond content safety) that require multilingual responses, we can:
- Move the `detect_language` action from `library/content_safety/actions.py` to a shared location (`nemoguardrails/actions/`), making it available to all rails (rough sketch below)
- Introduce a Colang-level abstraction like `bot refuse to respond $multilang=true`; this could be done easily for Colang 2.0, but I think it is better if we don't add new Colang features for now
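A rough sketch of that shared action (hypothetical; this refactor is not part of the PR, and it assumes the standard `@action` decorator from `nemoguardrails.actions`):

```python
# nemoguardrails/actions/detect_language.py (hypothetical shared location)
from typing import Optional

from nemoguardrails.actions import action


@action(name="detect_language")
async def detect_language(text: str) -> Optional[str]:
    """Detect the language of `text`, returning an ISO 639 code or None."""
    try:
        # Optional dependency, installed via the extras in pyproject.toml.
        from fast_langdetect import detect
    except ImportError:
        return None  # callers fall back to "en"
    result = detect(text, k=1)
    return result[0]["lang"] if result else None
```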
I agree, for now, keeping it scoped to content safety keeps the implementation focused.
```python
try:
    from fast_langdetect import detect

    result = detect(text, k=1)
```
Does fast-langdetect ever return a full locale with dialect, like en-US versus en? I don't see it in the docs, but I do see some upper/lowercase inconsistency.
Fair point, thanks for raising it. I just took a closer look at the fast-langdetect source code and the fastText model behavior:
- The fast-langdetect README mentions BCP-47 tags like `"zh-cn"` and `"pt-br"`
- But the fastText `lid.176.bin` model uses simple ISO 639 codes: `zh`, `pt`, `en`, etc.
- The fast-langdetect source simply strips the `__label__` prefix from the fastText output; no regional mapping is applied

Validated with an actual test:

```python
>>> detect("抱歉,我无法处理该请求", k=2)
[{'lang': 'zh', 'score': 0.80}, {'lang': 'ta', 'score': 0.08}]
```

It returns `"zh"`, not `"zh-cn"`, so no regional variant handling is needed.
This looks really good @Pouyanpi! I have a few comments:
Not needed in this PR, but I'm thinking of RAG prompts, where LLM instructions, the user query, and relevant context chunks are all in one flattened prompt. These prompts can be pretty long (up to 7k tokens in some cases). I would be interested in a follow-on where we sample part of a prompt before running classification on the sample (e.g. the first 200 chars; a sketch follows below). This would be an optional config field, giving customers a knob to trade off accuracy against latency for language detection.
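A minimal sketch of that knob (hypothetical; `sample_chars` and the function name are illustrative and not part of this PR, and it assumes the list-of-dicts return shape shown in the test output above):

```python
from fast_langdetect import detect

def detect_language_sampled(text: str, sample_chars: int = 200):
    """Classify only the first `sample_chars` characters, trading accuracy for latency."""
    result = detect(text[:sample_chars], k=1)
    # e.g. [{'lang': 'en', 'score': 0.98}] -> "en"
    return result[0]["lang"] if result else None
```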
I've included them in the temp/lang-detect-benchmark branch to make review easier. If you'd find it easier, I can move them here.
Yes
Yes, I would like to avoid adding Colang-level features as much as possible.
Done! Updated the description.
fast-langdetect already does this truncation by default, but we can indeed give users that flexibility:
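Something along these lines (a sketch; `LangDetector` comes from the commit message below, while `LangDetectConfig` and the `max_input_length`/`cache_dir` parameters are my assumption about the fast-langdetect API based on this thread):

```python
from fast_langdetect import LangDetectConfig, LangDetector

# Raise the default 80-char truncation and pin the model cache location.
config = LangDetectConfig(
    max_input_length=200,
    cache_dir="/tmp/fasttext-models",
)
detector = LangDetector(config)
result = detector.detect("Lo siento, no puedo responder a eso.")
```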
Why wouldn't we merge them into develop? It's best practice in ML to make any results reproducible, for which we need the input datasets and scripts. The datasets are public and linked above. I'd imagine we'll have to re-run evals for new languages as they're added to the content-safety and other models. So we'll run this script periodically.
Was that measured at a concurrency of 1? Having a 100% overhead for each language inference is a lot higher than I'd expect. We don't need to fix it in this PR.
+1
Could you check? I didn't see any length description.
Could you add optional Pydantic fields for any of these values that it makes sense to expose to users? Looking at the config, I think
…age support
Detect user input language and return refusal messages in the same language when content safety rails block unsafe content. Supports 9 languages: English, Spanish, Chinese, German, French, Hindi, Japanese, Arabic, and Thai.
Add configurable parameters for language detection:
- max_text_length: control maximum input text length for detection
- normalize_text: toggle text normalization before detection
- cache_dir: specify custom cache directory for detection models

Updated MultilingualConfig with new optional fields and modified _detect_language to use LangDetector with custom configuration instead of the simple detect function.
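For reviewers, a sketch of what the extended model might look like (field names taken from this commit message and the sequence diagram below; the actual definition lives in `nemoguardrails/rails/llm/config.py`):

```python
from typing import Dict, Optional

from pydantic import BaseModel

class MultilingualConfig(BaseModel):
    # Sketch only; see config.py in this PR for the real model.
    enabled: bool = False
    custom_messages: Optional[Dict[str, str]] = None  # per-language overrides
    max_text_length: Optional[int] = None  # cap input length before detection
    normalize_text: Optional[bool] = None  # toggle normalization before detection
    cache_dir: Optional[str] = None  # custom cache dir for detection models
```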
Force-pushed from 1a6ea08 to 4339ac2
Thanks @tgasser-nv! Yes, you're right; here is how it looks (I hadn't saved the change):
I've extended the configuration, but I'm not sure we actually need to (I think it's better to revert that commit). Do we actually need these config options exposed?
Maybe the cleanest solution is: don't expose these options at all and keep `_detect_language(text)` simple. If a power user really needs custom settings, they can:
Regarding the evaluation scripts and datasets: let's address those in follow-up PRs to keep this one focused, once we establish a clear pattern for where they should live (e.g., scripts/benchmarks/, eval/, etc.) and how they should be maintained.
Greptile Overview
Greptile Summary
This PR adds automatic language detection to content safety refusal messages, returning responses in the user's detected language across 9 supported languages (English, Spanish, Chinese, German, French, Hindi, Japanese, Arabic, Thai).
Key Changes:
Implementation Quality:
| Filename | Score | Overview |
|---|---|---|
| nemoguardrails/library/content_safety/actions.py | 5/5 | Added language detection action with proper error handling, fallbacks, and integration with config |
| nemoguardrails/library/content_safety/flows.co | 5/5 | Integrated multilingual refusal messages into content safety flows for both input and output checks |
| nemoguardrails/rails/llm/config.py | 5/5 | Added well-structured configuration models for multilingual content safety with clear documentation |
| pyproject.toml | 5/5 | Added optional fast-langdetect dependency with proper extras configuration |
| tests/test_content_safety_actions.py | 5/5 | Comprehensive test coverage for language detection, refusal messages, and edge cases |
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant ContentSafetyFlow
    participant ContentSafetyCheck
    participant Config
    participant DetectLanguageAction
    participant LangDetector
    participant Bot
    User->>ContentSafetyFlow: Input message
    ContentSafetyFlow->>ContentSafetyCheck: Check safety (input/output)
    ContentSafetyCheck-->>ContentSafetyFlow: {allowed: false, policy_violations: [...]}
    alt multilingual enabled
        ContentSafetyFlow->>Config: Check multilingual.enabled
        Config-->>ContentSafetyFlow: enabled=true
        ContentSafetyFlow->>DetectLanguageAction: detect_language(user_message)
        DetectLanguageAction->>Config: Get multilingual config
        Config-->>DetectLanguageAction: custom_messages, max_text_length, normalize_text, cache_dir
        DetectLanguageAction->>LangDetector: detect(text)
        LangDetector-->>DetectLanguageAction: detected_lang (or None)
        DetectLanguageAction->>DetectLanguageAction: Fallback to 'en' if None or unsupported
        DetectLanguageAction->>DetectLanguageAction: Get refusal message (custom or default)
        DetectLanguageAction-->>ContentSafetyFlow: {language: lang, refusal_message: message}
        ContentSafetyFlow->>Bot: Send refusal_message
    else multilingual disabled
        ContentSafetyFlow->>Bot: Send default "refuse to respond"
    end
    Bot-->>User: Refusal message (in detected language)
```
9 files reviewed, no comments

Description
Detect user input language and return refusal messages in the same language when content safety rails block unsafe content. Supports 9 languages: English, Spanish, Chinese, German, French, Hindi, Japanese, Arabic, and Thai.
Language Detection Benchmark Results
Datasets Used
Chinese samples in Nemotron are all REDACTED; Chinese coverage validated via papluca dataset.
Prompt Length Analysis (characters)
Note: fast-langdetect truncates input at 80 characters by default (`max_input_length=80`), so longer prompts are effectively evaluated on their first 80 chars.

Overall Accuracy comparison
Latency comparison (μs)
Per-Language Accuracy (fast-langdetect)
Per-Language Accuracy (lingua)
Why fast-langdetect?
https://github.com/LlmKira/fast-langdetect
Error analysis
Most errors occur with:
The action correctly falls back to English (en) for unsupported detected languages.
Benchmark Scripts
Check out the temp/lang-detect-benchmark branch.
Located in eval/language_detection/:
Make sure you have `datasets` and `pandas` installed:
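For example, assuming a standard pip setup: `pip install datasets pandas`.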