Extend privacy-guard masking from tool results to free-text user prompts (RFC + validation plan) #361

## Problem

The privacy-guard plugin currently masks PII only in **structured tool result payloads** before they reach the LLM, then restores the original values after the turn. **Free-text user prompts are not processed.** If a user writes

> *"What should we pay Anna Schmidt (32, lives at Bahnhofstr. 5, 60311 Frankfurt) given her current salary of €72,000?"*

every one of those identifiers reaches the LLM verbatim. This is a privacy hole in a feature that is otherwise a core value prop.

## What we can reuse (restore is already solved)

The hard parts of the existing implementation are generic and reusable for prompts:

- `«TYPE_N»`-token format and per-turn `TokenizeMap`
- Restore pass on the LLM response
- System-prompt directive that teaches the LLM about tokens (with the Token-Storm fallback)
- Three-tier allowlist hierarchy
- Detector registry with confidence-based span deduplication and word-boundary extension
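The reuse argument is easier to see with a concrete picture of what the per-turn map does. The following is a minimal sketch that assumes detected spans arrive as non-overlapping `(start, end, type)` tuples. `TurnTokenMap`, `mask` and `restore` are hypothetical names for illustration only. They are not the plugin's actual `TokenizeMap` API.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTokenMap:
    """Hypothetical stand-in for the per-turn TokenizeMap: value <-> «TYPE_N» token."""

    token_by_value: dict[tuple[str, str], str] = field(default_factory=dict)
    value_by_token: dict[str, str] = field(default_factory=dict)
    counters: dict[str, int] = field(default_factory=dict)

    def token_for(self, entity_type: str, value: str) -> str:
        # A repeated value reuses its token, so coreference survives masking.
        key = (entity_type, value)
        if key not in self.token_by_value:
            n = self.counters.get(entity_type, 0) + 1
            self.counters[entity_type] = n
            token = f"«{entity_type}_{n}»"
            self.token_by_value[key] = token
            self.value_by_token[token] = value
        return self.token_by_value[key]

    def mask(self, text: str, spans: list[tuple[int, int, str]]) -> str:
        # Assumes spans were already reconciled to be non-overlapping.
        ordered = sorted(spans)
        # Assign tokens left to right so numbering follows reading order.
        tokens = [self.token_for(etype, text[start:end]) for start, end, etype in ordered]
        out = text
        # Substitute right to left so earlier offsets stay valid.
        for (start, end, _), token in reversed(list(zip(ordered, tokens))):
            out = out[:start] + token + out[end:]
        return out

    def restore(self, text: str) -> str:
        # Restore pass on the LLM response: swap every known token back.
        for token, value in self.value_by_token.items():
            text = text.replace(token, value)
        return text
```

The closing `»` delimiter is what makes the naive `str.replace` restore safe. Without it, `«PERSON_1»` could match inside `«PERSON_10»`. None of this logic depends on where the spans came from, which is why the detection step is the only new work.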

**The new work is exclusively detecting PII spans in unstructured text.** That is fundamentally harder than schema-driven JSON masking because:

- no schema → spans must be detected probabilistically;
- coreference matters → restored text must preserve LLM utility;
- the failure mode is asymmetric: a missed span = privacy leak, over-masking = token-soup prompt the LLM can't answer.

## Approach landscape (ranked by integration value for omadia)

Detection ≠ substitution. Substitution is mostly solved by the existing token map; detection is the actual gap. Ranking criterion: how worthwhile each approach is to pursue for omadia.

| # | Approach (layer) | Pro | Con | Why this rank |
| --- | --- | --- | --- | --- |
| 1 | **Hybrid detector ensemble on the prompt** — dedicated PII transformer (GLiNER-PII / Piiranha) added to existing Regex/Presidio/LLM detectors, confidence-reconciled (Detection) | Closes the free-text recall gap; uses existing detector registry + dedup; on-prem | More latency/engineering; locale tuning + threshold calibration needed | Best quality-per-effort: fits omadia's architecture, directly addresses the gap, no new trust boundary |
| 2 | **Local-LLM pass for contextual / quasi-identifiers** (Detection) — already present via Ollama detector | Only layer that catches context-only PII ("my daughter who broke her leg last summer"); semantic | Latency, nondeterminism, hallucination/miss; not guarantee-able | The only thing catching context-only PII that NER/regex structurally cannot; must stay as complement, not base |
| 3 | **Consistent realistic surrogates** (Faker-style, LangChain PresidioReversibleAnonymizer pattern) instead of opaque tokens (Substitution) | Preserves fluency + coreference → better LLM answer; defuses "token storm" | Surrogate can collide with real text; restore needs exact uniqueness | Substitution-layer upgrade with high leverage on response quality, but a refinement, not a closing of the gap |
| 4 | **Opaque placeholder tokens + turn map** (current) extended to prompts (Substitution) | Already built, proven, leaks nothing; near-zero cost to extend | Does *not* solve detection; degrades on heavy free text (reasoning/coreference) | Necessary baseline, but on its own does not deliver the feature value |
| 5 | **Format-Preserving Encryption / crypto tokenization** for format-bound IDs (Substitution) | Stateless reversibility via key, preserves format/referential integrity | Only meaningful for numbers/IDs, not names/sentences; key management | Narrow scope — elegant for IDs, inapplicable to the bulk of free-text PII |
| 6 | **Commercial cloud DLP** (Google SDP, AWS Comprehend, Azure) — detection + reidentify (Buy/Detection) | Mature, multilingual detection; native reversible crypto tokens | Ships the very PII you're protecting to a third party → new trust boundary, conflicts with data-residency mode | Strong capability, but the trust boundary breaks the privacy story |
| 7 | **Commercial LLM privacy vault/gateway** (Skyflow, Private AI, Protecto) (Buy) | Turnkey: detect + tokenize + detokenize, context-preserving, compliance tooling | This *is* omadia's own differentiator | Adopting = outsourcing the core feature; useful only as competitive reference |
| 8 | **Local Differential Privacy text sanitization** (paradigm) | Formal guarantees, detection-free, defends against inference attacks | **Not** exactly reversible by design (incompatible with the restore goal); degrades utility; recent work shows LLM reconstruction | Listed for completeness, structurally disqualified by the exact-restore requirement |

## Recommended path: Approach #1

Add a dedicated PII transformer as a new detector in the existing registry; orchestrate it together with the current regex/Presidio/Ollama detectors via the existing confidence-reconciliation/dedup logic, and apply the full ensemble to the user prompt before it reaches the LLM. Substitution stays on the current opaque-token + turn-map mechanism (approach #4 above); revisiting it for realistic surrogates (#3) is a separate follow-up issue.
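A sketch of the reconciliation step follows. It pools spans from all detectors, extends each one to full word boundaries, and then keeps spans greedily by confidence, so that on overlap the more confident span wins. This greedy policy is an assumption for illustration. The existing registry's actual tie-breaking rule may differ. `Span`, `extend_to_word_boundary` and `reconcile` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    entity_type: str
    score: float
    detector: str


def extend_to_word_boundary(text: str, span: Span) -> Span:
    # Grow a partial hit (e.g. "Schmi") to the surrounding alphanumeric word ("Schmidt").
    start, end = span.start, span.end
    while start > 0 and text[start - 1].isalnum():
        start -= 1
    while end < len(text) and text[end].isalnum():
        end += 1
    return Span(start, end, span.entity_type, span.score, span.detector)


def reconcile(text: str, detections: list[list[Span]]) -> list[Span]:
    """Merge spans from all detectors; on overlap, the higher-confidence span wins."""
    candidates = [extend_to_word_boundary(text, s) for spans in detections for s in spans]
    # Highest confidence first; a longer span breaks ties.
    candidates.sort(key=lambda s: (-s.score, -(s.end - s.start)))
    kept: list[Span] = []
    for cand in candidates:
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda s: s.start)


text = "Anna Schmidt earns €72,000"
merged = reconcile(
    text,
    [
        [Span(0, 4, "GIVENNAME", 0.97, "piiranha"), Span(5, 10, "SURNAME", 0.91, "piiranha")],
        [Span(0, 12, "PERSON", 0.85, "presidio")],
        [Span(19, 26, "SALARY", 0.99, "regex")],
    ],
)
```

Scores produced by regex rules, Presidio, and a transformer are not calibrated against one another. For that reason, a raw `score` comparison like this one only becomes meaningful after the threshold calibration listed as a cost of approach #1.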

## Scope decision: 6 Western-EU Latin-script languages

Target locales for the **initial** implementation: **EN, DE, FR, ES, IT, NL.** This matches the language coverage of the candidate model (Piiranha-v1) and the ai4privacy dataset, which keeps both the model choice and the eval set lean.

**Out of scope for this round** (and therefore a re-test trigger when market expansion brings them in):

- CJK (中文 / 日本語 / 한국어): Chinese and Japanese have no whitespace word boundaries, which breaks the word-boundary-extension trick
- MENA / RTL (Arabic, Hebrew)
- Cyrillic, Indic, Turkish, other low-resource locales

Cross-locale caveat: no single fixed transformer covers all locales. A future re-test for non-Latin scripts will lean more heavily on a multilingual GLiNER variant plus the LLM pass (#2). The architecture stays the same; only the model mix changes.

## Decision gate: lean validation plan

Before integrating anything, run a standalone validation that answers one question: **does the candidate detector ensemble reach the per-language quality bar required to justify shipping?** Pass/fail thresholds are committed *before* the run, otherwise it's a vibe check.

### Configurations compared

- **C0 (control):** current omadia detectors — regex + Presidio + Ollama
- **C1 (candidate):** C0 + Piiranha-v1 (`iiiorg/piiranha-v1-detect-personal-information`) added to the detector registry
- **Ablations:** each detector solo, to expose marginal contribution

Alternative / second measurement point: GLiNER-multi (zero-shot, tunable label set) and a DeBERTa fine-tuned on ai4privacy as a ceiling check.
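For C1, most of the work is adapting the model output to the span shape the other detectors already produce. The sketch below assumes the Hugging Face `transformers` token-classification pipeline with `aggregation_strategy="simple"`, which returns dicts containing `entity_group`, `score`, `start` and `end`. `PiiranhaDetector`, `min_score` and the tuple shape are hypothetical. Entity labels pass through unmapped.

```python
from typing import Any, Callable

# (start, end, entity_type, score): hypothetical registry span shape
SpanTuple = tuple[int, int, str, float]


class PiiranhaDetector:
    """Hypothetical registry adapter around a HF token-classification pipeline."""

    name = "piiranha"

    def __init__(self, pipe: Callable[[str], list[dict[str, Any]]], min_score: float = 0.5):
        self.pipe = pipe
        self.min_score = min_score  # to be calibrated per language during the validation run

    @classmethod
    def from_hub(cls, min_score: float = 0.5) -> "PiiranhaDetector":
        from transformers import pipeline  # heavy dependency, imported only when used

        pipe = pipeline(
            "token-classification",
            model="iiiorg/piiranha-v1-detect-personal-information",
            aggregation_strategy="simple",
        )
        return cls(pipe, min_score)

    def detect(self, text: str) -> list[SpanTuple]:
        spans: list[SpanTuple] = []
        for ent in self.pipe(text):
            score = float(ent["score"])  # pipeline scores are numpy floats
            if score >= self.min_score:
                spans.append((int(ent["start"]), int(ent["end"]), str(ent["entity_group"]), score))
        return spans
```

Because the pipeline is injected, the harness can swap in GLiNER-multi or the ai4privacy-tuned DeBERTa for the second measurement point without touching the scoring code. It also makes the ablation runs straightforward.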

### Eval set (~750–1000 items)

- **Backbone, near-free:** balanced slice from ai4privacy `pii-masking-300k`, ~100–150 items × 6 languages (~600–900), already human-validated
- **Hard slice, hand-built:** ~20–30/language focusing on context-only PII, per-locale ID formats (Steuer-ID, codice fiscale, NINO, NIE, BSN), ambiguous tokens, multi-part Spanish surnames. EN/DE in-house; FR/ES/IT/NL via LLM-generation + native-speaker spotcheck
- **Negatives:** 20–30 % PII-free prompts per language → measures over-masking directly
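A fixed seed makes the backbone slice reproducible, so C0 and C1 are guaranteed to see the same items. The sketch below assumes records are dicts with a lowercase `language` code and a gold-span list. Check the real ai4privacy column names before using it. `build_slice` and its parameters are hypothetical. Negatives are sized so that they form the requested share of each language's slice.

```python
import random
from collections import defaultdict
from typing import Any

LANGS = ("en", "de", "fr", "es", "it", "nl")  # assumed language codes
Record = dict[str, Any]


def group_by_language(records: list[Record]) -> dict[str, list[Record]]:
    groups: dict[str, list[Record]] = defaultdict(list)
    for record in records:
        groups[record["language"]].append(record)
    return groups


def build_slice(
    positives: list[Record],
    negatives: list[Record],
    per_lang: int = 120,
    negative_share: float = 0.25,
    seed: int = 0,
) -> list[Record]:
    """Balanced per-language sample; negatives make up `negative_share` of each language."""
    rng = random.Random(seed)  # fixed seed -> identical slice on every run
    pos_by_lang = group_by_language(positives)
    neg_by_lang = group_by_language(negatives)
    # n_neg / (per_lang + n_neg) = share  =>  n_neg = per_lang * share / (1 - share)
    n_neg = round(per_lang * negative_share / (1 - negative_share))
    sample: list[Record] = []
    for lang in LANGS:
        pos, neg = pos_by_lang.get(lang, []), neg_by_lang.get(lang, [])
        if len(pos) < per_lang or len(neg) < n_neg:
            raise ValueError(f"{lang}: need {per_lang} positives and {n_neg} negatives")
        sample.extend(rng.sample(pos, per_lang))
        sample.extend(rng.sample(neg, n_neg))
    return sample
```

With the defaults, each language gets 120 positives and 40 negatives, so negatives are 25 % of that language's items.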

### Critical methodological caveat

Piiranha is trained on ai4privacy-style data, so evaluating Piiranha on ai4privacy is partially **in-distribution** and will inflate numbers. The honest go/no-go signal is the **out-of-distribution hand slice**, not the ai4privacy slice. The ai4privacy backbone provides breadth and per-language coverage; the hand slice tells us whether the model generalizes to real omadia prompts.

### Scoring

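- Instance-level leak check: a gold PII instance counts as masked only if the detected spans together cover every identifying character of it; any uncovered identifying character = leak. This is a coverage criterion. It is not what `nervaluate`'s boundary-based `exact`/`strict` schemes compute, since those reject a correct span that is merely one character too long. Compute it directly and report `nervaluate` scores alongside for comparability. It is stricter than partial-match NER F1 (a half-covered name counts as a miss), but it is the right lens for the privacy goal.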
- Per language × per severity tier:

| Tier | Entity types | Recall gate | Leak tolerance |
| --- | --- | --- | --- |
| Critical | Financial (IBAN, salary), health, ID numbers (Steuer-ID, SSN-equiv) | ≥ 0.98 | ~0 |
| High | Name, address, DOB, email, phone | ≥ 0.95 | low |
| Medium | Age, job, employer, other quasi-IDs | ≥ 0.85 | tolerable |

- **Guardrails:** precision ≥ 0.85 (over-masking); added p95 latency ≤ budget (proposed 150–300 ms; final value tunable based on UX measurement)
- **Decision rule:** C1 must beat C0 on leak-rate (Critical + High tiers) in **every** one of the 6 languages; the weakest language gates the verdict.
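The leak criterion and the decision rule are simple enough to write down before the run, and doing so also commits the thresholds. The sketch below assumes gold spans are `(start, end, tier)` character offsets and predicted spans are `(start, end)`, and it treats whitespace inside a gold span as non-identifying. All function names and data shapes are hypothetical. The gate values come from the severity table. Precision and latency guardrails are not covered here.

```python
from typing import Iterable, Sequence

# Recall gates per severity tier, taken from the severity table.
RECALL_GATES = {"critical": 0.98, "high": 0.95, "medium": 0.85}

Offsets = tuple[int, int]
# One eval item: (prompt, gold spans as (start, end, tier), predicted spans as (start, end)).
Item = tuple[str, Sequence[tuple[int, int, str]], Sequence[Offsets]]


def is_leaked(text: str, gold: Offsets, predicted: Sequence[Offsets]) -> bool:
    """True if any non-whitespace character of the gold instance is left uncovered."""
    start, end = gold
    for i in range(start, end):
        if text[i].isspace():
            continue  # whitespace between name parts is not identifying
        if not any(ps <= i < pe for ps, pe in predicted):
            return True
    return False


def leak_rate(items: Iterable[Item], tiers: Sequence[str] = ("critical", "high")) -> float:
    """Share of gold instances in `tiers` with at least one uncovered character."""
    total = leaked = 0
    for text, gold, predicted in items:
        for start, end, tier in gold:
            if tier in tiers:
                total += 1
                leaked += is_leaked(text, (start, end), predicted)
    return leaked / total if total else 0.0


def passes_recall_gates(items: Sequence[Item]) -> dict[str, bool]:
    # Under the coverage criterion, recall per tier is simply 1 - leak rate.
    return {tier: 1.0 - leak_rate(items, (tier,)) >= gate for tier, gate in RECALL_GATES.items()}


def c1_wins(c0: dict[str, Sequence[Item]], c1: dict[str, Sequence[Item]]) -> bool:
    """Decision rule: C1 must lower the Critical+High leak rate in every language."""
    return all(leak_rate(c1[lang]) < leak_rate(c0[lang]) for lang in c0)
```

This sketch reads "beat" as a strict improvement. If C0 already reaches a zero leak rate in some language, that language can never be won. The rule should therefore state explicitly how ties are treated before the run.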

### Effort

~2–4 days total:

- Harness (detector calls → nervaluate → per-language/per-type aggregation + latency): ~1 day
- Eval set (mostly the hand slice): ~1–2 days
- Analysis + categorized miss list: ~0.5 day
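The latency part of the harness needs only per-call wall-clock timings and a p95. Below is a minimal standard-library sketch. `time_detector` and `p95_ms` are hypothetical helper names, and the detector can be any callable that maps text to spans. If the new detector runs in sequence with the existing ones, timing it alone approximates the latency it adds. For that estimate to mean anything, the timings should come from the same hardware class as production.

```python
import statistics
import time
from typing import Callable, Sequence


def time_detector(
    detect: Callable[[str], object], prompts: Sequence[str], warmup: int = 3
) -> list[float]:
    """Wall-clock milliseconds per detector call, measured after a few warm-up calls."""
    for prompt in prompts[:warmup]:
        detect(prompt)  # warm caches / lazy model init; not measured
    timings: list[float] = []
    for prompt in prompts:
        t0 = time.perf_counter()
        detect(prompt)
        timings.append((time.perf_counter() - t0) * 1000.0)
    return timings


def p95_ms(timings: Sequence[float]) -> float:
    # quantiles(n=20) returns 19 cut points; the last one is the 95th percentile.
    # Needs at least two timings on older Python versions.
    return statistics.quantiles(timings, n=20)[-1]
```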

### Deliberately out of scope (to keep the validation lean)

- No wiring into the plugin pipeline yet — detectors run standalone against the text
- No mask → LLM → restore round-trip (restore is already solved); optional Phase 2
- No LLM-response utility measurement (that belongs to the substitution layer)
- No commercial cloud DLP path (would breach the on-prem trust boundary)
- No DP-style text sanitization (incompatible with exact restore)

## Open questions / risks

- **Per-locale recall variance:** Piiranha's reported overall accuracy hides per-language variance, especially on lower-frequency entity types and quasi-identifiers. The test must surface this. If e.g. NL recall on Critical-tier IDs is below gate, the option is feature-flagged rollout or LLM-pass-only fallback for that language.
- **Latency budget on the prompt path:** Piiranha is 280M parameters; on CPU this is non-trivial added latency for *every* user prompt. The p95 budget needs an actual UX number, not a guess.
- **In-distribution inflation** of Piiranha vs. ai4privacy as noted — hand slice is the real signal.
- **Re-test trigger:** any new market that requires non-Latin script support invalidates the model choice and reopens the ranking.

## Next step

Build the harness + eval set, run the validation, post results back into this issue, then decide go/no-go.

## References

- Piiranha-v1 — [model card](https://huggingface.co/iiiorg/piiranha-v1-detect-personal-information), [announcement](https://www.marktechpost.com/2024/09/14/piiranha-v1-released-a-280m-small-encoder-open-model-for-pii-detection-with-98-27-token-detection-accuracy-supporting-6-languages-and-17-pii-types-released-under-mit-license/)
- ai4privacy [pii-masking-300k dataset](https://huggingface.co/datasets/ai4privacy/pii-masking-300k)
- Microsoft Presidio [analyzer + anonymizer + deanonymization](https://microsoft.github.io/presidio/anonymizer/) — already integrated
- LangChain [PresidioReversibleAnonymizer](https://python.langchain.com/api_reference/experimental/data_anonymizer/langchain_experimental.data_anonymizer.presidio.PresidioReversibleAnonymizer.html) — pattern reference for substitution upgrade (#3)
- [Hybrid methods for multilingual PII detection — RECAP (arXiv 2510.07551)](https://arxiv.org/abs/2510.07551)
- [Unmasking the Reality of PII Masking Models (arXiv 2504.12308)](https://arxiv.org/pdf/2504.12308)
- [GLiNER (arXiv 2311.08526)](https://arxiv.org/abs/2311.08526) · [GLiNER2-PII](https://pioneer.ai/blog/gliner2-pii-open-source-privacy-filtering-with-pii-detection)
- [Survey on text anonymization (arXiv 2508.21587)](https://arxiv.org/html/2508.21587v1)