Problem
The privacy-guard plugin currently masks PII only in structured tool result payloads before they reach the LLM, then restores the original values after the turn. Free-text user prompts are not processed. If a user writes
"What should we pay Anna Schmidt (32, lives at Bahnhofstr. 5, 60311 Frankfurt) given her current salary of €72,000?"
every one of those identifiers reaches the LLM verbatim. This is a privacy hole in a feature that is otherwise a core value prop.
What we can reuse (restore is already solved)
The hard parts of the existing implementation are generic and reusable for prompts:
- «TYPE_N»-token format and per-turn TokenizeMap
- Restore pass on the LLM response
- System-prompt directive that teaches the LLM about tokens (with the Token-Storm fallback)
- Three-tier allowlist hierarchy
- Detector registry with confidence-based span deduplication and word-boundary extension
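For orientation, the token format and per-turn map listed above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the field names, the `mask`/`restore` helpers, and the span tuple shape are all assumptions made for the example; only the «TYPE_N» token shape and the name TokenizeMap come from this issue.

```python
import re
from dataclasses import dataclass, field


@dataclass
class TokenizeMap:
    """Per-turn map between «TYPE_N» tokens and original values (hypothetical shape)."""
    token_to_value: dict = field(default_factory=dict)
    value_to_token: dict = field(default_factory=dict)
    counters: dict = field(default_factory=dict)

    def token_for(self, pii_type: str, value: str) -> str:
        # The same (type, value) pair always maps to the same token within a turn.
        key = (pii_type, value)
        if key not in self.value_to_token:
            n = self.counters.get(pii_type, 0) + 1
            self.counters[pii_type] = n
            token = f"«{pii_type}_{n}»"
            self.value_to_token[key] = token
            self.token_to_value[token] = value
        return self.value_to_token[key]


def span_of(text: str, needle: str, pii_type: str):
    """Helper for the example only: locate a literal string as a (start, end, type) span."""
    start = text.index(needle)
    return (start, start + len(needle), pii_type)


def mask(text: str, spans, tmap: TokenizeMap) -> str:
    """Replace non-overlapping (start, end, type) spans with tokens."""
    out, cursor = [], 0
    for start, end, pii_type in sorted(spans):
        out.append(text[cursor:start])
        out.append(tmap.token_for(pii_type, text[start:end]))
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


TOKEN_RE = re.compile(r"«[A-Z_]+_\d+»")


def restore(text: str, tmap: TokenizeMap) -> str:
    """Swap known tokens back to their originals; unknown tokens are left untouched."""
    return TOKEN_RE.sub(lambda m: tmap.token_to_value.get(m.group(0), m.group(0)), text)
```

Under these assumptions, substitution for a prompt only needs spans as input. Producing those spans from free text is exactly the missing piece.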
The new work is exclusively detecting PII spans in unstructured text. That is fundamentally harder than schema-driven JSON masking because:
- no schema → spans must be detected probabilistically;
- coreference matters → restored text must preserve LLM utility;
- the failure mode is asymmetric: a missed span = privacy leak, over-masking = token-soup prompt the LLM can't answer.
Approach landscape (ranked by integration value for omadia)
Detection ≠ substitution. Substitution is mostly solved by the existing token map; detection is the actual gap. Ranking criterion: how worthwhile each approach is for omadia to pursue.
| # | Approach (layer) | Pro | Con | Why this rank |
|---|---|---|---|---|
| 1 | Hybrid detector ensemble on the prompt: dedicated PII transformer (GLiNER-PII / Piiranha) added to existing Regex/Presidio/LLM detectors, confidence-reconciled (Detection) | Closes the free-text recall gap; uses existing detector registry + dedup; on-prem | More latency/engineering; locale tuning + threshold calibration needed | Best quality-per-effort: fits omadia's architecture, directly addresses the gap, no new trust boundary |
| 2 | Local-LLM pass for contextual / quasi-identifiers (Detection); already present via Ollama detector | Only layer that catches context-only PII ("my daughter who broke her leg last summer"); semantic | Latency, nondeterminism, hallucination/miss; not guarantee-able | The only thing catching context-only PII that NER/regex structurally cannot; must stay as complement, not base |
| 3 | Consistent realistic surrogates (Faker-style, LangChain PresidioReversibleAnonymizer pattern) instead of opaque tokens (Substitution) | Preserves fluency + coreference → better LLM answer; defuses "token storm" | Surrogate can collide with real text; restore needs exact uniqueness | Substitution-layer upgrade with high leverage on response quality, but a refinement, not a closing of the gap |
| 4 | Opaque placeholder tokens + turn map (current) extended to prompts (Substitution) | Already built, proven, leaks nothing; near-zero cost to extend | Does not solve detection; degrades on heavy free text (reasoning/coreference) | Necessary baseline, but on its own does not deliver the feature value |
| 5 | Format-Preserving Encryption / crypto tokenization for format-bound IDs (Substitution) | Stateless reversibility via key, preserves format/referential integrity | Only meaningful for numbers/IDs, not names/sentences; key management | Narrow scope: elegant for IDs, inapplicable to the bulk of free-text PII |
| 6 | Commercial cloud DLP (Google SDP, AWS Comprehend, Azure): detection + reidentify (Buy/Detection) | Mature, multilingual detection; native reversible crypto tokens | Ships the very PII you're protecting to a third party → new trust boundary, conflicts with data-residency mode | Strong capability, but the trust boundary breaks the privacy story |
| 7 | Commercial LLM privacy vault/gateway (Skyflow, Private AI, Protecto) (Buy) | Turnkey: detect + tokenize + detokenize, context-preserving, compliance tooling | This is omadia's own differentiator | Adopting = outsourcing the core feature; useful only as competitive reference |
| 8 | Local Differential Privacy text sanitization (paradigm) | Formal guarantees, detection-free, defends against inference attacks | Not exactly reversible by design (incompatible with the restore goal); degrades utility; recent work shows LLM reconstruction | Listed for completeness; structurally disqualified by the exact-restore requirement |
Recommended path: Approach #1
Add a dedicated PII transformer as a new detector in the existing registry; orchestrate it together with the current regex/Presidio/Ollama detectors via the existing confidence-reconciliation/dedup logic, and apply the full ensemble to the user prompt before it reaches the LLM. Substitution stays on the current opaque-token + turn-map mechanism (approach #4 above); revisiting it for realistic surrogates (#3) is a separate follow-up issue.
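To make the shape of that orchestration concrete, here is a rough sketch of a confidence-based reconciliation step over spans from several detectors. It is an assumption-laden stand-in: the real registry and dedup logic are not shown in this issue, so the `Span` fields, the greedy highest-confidence-wins rule, and the toy detectors below are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    start: int
    end: int
    pii_type: str
    confidence: float
    detector: str


def reconcile(spans):
    """Greedy dedup: take spans by descending confidence, drop any that overlap a kept span."""
    kept = []
    ranked = sorted(spans, key=lambda s: (-s.confidence, -(s.end - s.start), s.start))
    for cand in ranked:
        if all(cand.end <= k.start or cand.start >= k.end for k in kept):
            kept.append(cand)
    return sorted(kept, key=lambda s: s.start)


def run_ensemble(text, detectors):
    """Call every detector on the prompt and reconcile the pooled spans."""
    pooled = []
    for detect in detectors:
        pooled.extend(detect(text))
    return reconcile(pooled)


# --- toy detectors, for illustration only ---
def regex_salary(text):
    return [Span(m.start(), m.end(), "SALARY", 0.99, "regex")
            for m in re.finditer(r"€\d{1,3}(?:,\d{3})*", text)]


def stub_transformer(text):
    # Stand-in for the PII transformer; a real detector would run model inference here.
    i = text.find("Anna Schmidt")
    return [] if i < 0 else [Span(i, i + len("Anna Schmidt"), "NAME", 0.90, "stub-transformer")]


def stub_weak(text):
    # Lower-confidence partial hit that should lose against the full-name span.
    i = text.find("Anna")
    return [] if i < 0 else [Span(i, i + len("Anna"), "NAME", 0.60, "stub-weak")]
```

With this rule, `run_ensemble(prompt, [regex_salary, stub_transformer, stub_weak])` keeps the full-name span and the salary span and drops the weaker partial hit. Whether the existing dedup resolves overlaps the same way needs to be checked against the actual implementation.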
Scope decision: 6 Western-EU Latin-script languages
Target locales for the initial implementation: EN, DE, FR, ES, IT, NL. This lands exactly where the candidate model and dataset live, which makes both the model choice and the eval set genuinely lean.
Out of scope for this round (and therefore a re-test trigger when market expansion brings them in):
- CJK (中文 / 日本語 / 한국어) — no word boundaries, breaks the word-boundary-extension trick
- MENA / RTL (Arabic, Hebrew)
- Cyrillic, Indic, Turkish, other low-resource locales
Honest cross-impact: internationally, no single fixed transformer covers all locales — a future re-test for non-Latin scripts will lean more heavily on a multilingual GLiNER variant + the LLM-pass (#2). The architecture is the same; the model mix changes.
Decision gate: lean validation plan
Before integrating anything, run a standalone validation that answers one question: does the candidate detector ensemble reach the per-language quality bar required to justify shipping? Pass/fail thresholds are committed before the run, otherwise it's a vibe check.
Configurations compared
- C0 (control): current omadia detectors — regex + Presidio + Ollama
- C1 (candidate): C0 + Piiranha-v1 (iiiorg/piiranha-v1-detect-personal-information) added to the detector registry
- Ablations: each detector solo, to expose marginal contribution
Alternative / second measurement point: GLiNER-multi (zero-shot, tunable label set) and a DeBERTa fine-tuned on ai4privacy as a ceiling check.
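The run matrix implied by the configurations above is small enough to enumerate explicitly. A minimal sketch, assuming detectors are addressed by short string names (the names and the `solo:` prefix are placeholders for this example, not identifiers from the plugin):

```python
# Hypothetical detector names; the real registry keys may differ.
BASELINE = ("regex", "presidio", "ollama")
CANDIDATE = "piiranha-v1"


def build_configs(baseline=BASELINE, candidate=CANDIDATE):
    """Return {config name: tuple of detector names} for C0, C1 and the solo ablations."""
    configs = {
        "C0": tuple(baseline),
        "C1": tuple(baseline) + (candidate,),
    }
    for name in (*baseline, candidate):
        configs[f"solo:{name}"] = (name,)
    return configs
```

This yields six runs per language: C0, C1, and one solo run per detector. GLiNER-multi or the DeBERTa ceiling check would add further entries in the same shape.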
Eval set (~750–1000 items)
- Backbone, near-free: balanced slice from ai4privacy pii-masking-300k, ~100–150 items × 6 languages (~600–900), already human-validated
- Hard slice, hand-built: ~20–30/language focusing on context-only PII, per-locale ID formats (Steuer-ID, codice fiscale, NINO, NIE, BSN), ambiguous tokens, multi-part Spanish surnames. EN/DE in-house; FR/ES/IT/NL via LLM-generation + native-speaker spotcheck
- Negatives: 20–30 % PII-free prompts per language → measures over-masking directly
Critical methodological caveat
Piiranha is trained on ai4privacy-style data, so evaluating Piiranha on ai4privacy is partially in-distribution and will inflate numbers. The honest go/no-go signal is the out-of-distribution hand slice, not the ai4privacy slice. The ai4privacy backbone provides breadth and per-language coverage; the hand slice tells us whether the model generalizes to real omadia prompts.
Scoring
- Instance-level matching via nervaluate (Exact-Match): a PII instance counts as masked only if the detected span covers it fully; any uncovered identifying character = leak. Stricter than standard NER F1, but the right lens for the privacy goal.
- Per language × per severity tier:
| Tier | Entity types | Recall gate | Leak tolerance |
|---|---|---|---|
| Critical | Financial (IBAN, salary), health, ID numbers (Steuer-ID, SSN-equiv) | ≥ 0.98 | ~0 |
| High | Name, address, DOB, email, phone | ≥ 0.95 | low |
| Medium | Age, job, employer, other quasi-IDs | ≥ 0.85 | tolerable |
- Guardrails: precision ≥ 0.85 (over-masking); added p95 latency ≤ budget (proposed 150–300 ms; final value tunable based on UX measurement)
- Decision rule: C1 must beat C0 on leak-rate (Critical + High tiers) in every one of the 6 languages; the weakest language gates the verdict.
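The full-coverage criterion and the decision rule can be sketched in plain Python as below. This deliberately does not call nervaluate; it implements the "every character covered" reading directly so the logic is visible. The item tuple shape and function names are assumptions for the example, and "beat" is interpreted as a strictly lower leak rate, which is also an assumption that should be fixed before the run.

```python
from collections import defaultdict


def is_fully_covered(gold, predicted):
    """True if every character of the gold (start, end) span lies inside some predicted span."""
    start, end = gold
    pos = start
    for p_start, p_end in sorted(predicted):
        if p_start > pos:
            break  # gap before this span: at least one character stays uncovered
        pos = max(pos, p_end)
        if pos >= end:
            return True
    return pos >= end


def leak_rates(items):
    """items: iterable of (language, tier, gold_span, predicted_spans).

    Returns {(language, tier): share of gold instances not fully covered}.
    """
    total, leaked = defaultdict(int), defaultdict(int)
    for language, tier, gold, predicted in items:
        key = (language, tier)
        total[key] += 1
        if not is_fully_covered(gold, predicted):
            leaked[key] += 1
    return {key: leaked[key] / total[key] for key in total}


GATED_TIERS = ("Critical", "High")


def failing_cells(leak_c0, leak_c1, languages, tiers=GATED_TIERS):
    """(language, tier) cells where C1 does not strictly beat C0; an empty list means 'go'."""
    return [
        (lang, tier)
        for lang in languages
        for tier in tiers
        if not leak_c1[(lang, tier)] < leak_c0[(lang, tier)]
    ]
```

Because the verdict is the list of failing cells, a single weak language keeps the result non-empty, which matches the "weakest language gates the verdict" rule. Note that under the strict interpretation a cell where C0 already leaks nothing can never be beaten; how ties are treated should be part of the committed thresholds.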
Effort
~2–4 days total:
- Harness (detector calls → nervaluate → per-language/per-type aggregation + latency): ~1 day
- Eval set (mostly the hand slice): ~1–2 days
- Analysis + categorized miss list: ~0.5 day
Deliberately out of scope (to keep the validation lean)
- No wiring into the plugin pipeline yet — detectors run standalone against the text
- No mask → LLM → restore round-trip (restore is already solved); optional Phase 2
- No LLM-response utility measurement (that belongs to the substitution layer)
- No commercial cloud DLP path (would breach the on-prem trust boundary)
- No DP-style text sanitization (incompatible with exact restore)
Open questions / risks
- Per-locale recall variance: Piiranha's reported overall accuracy hides per-language variance, especially on lower-frequency entity types and quasi-identifiers. The test must surface this. If e.g. NL recall on Critical-tier IDs is below gate, the option is feature-flagged rollout or LLM-pass-only fallback for that language.
- Latency budget on the prompt path: Piiranha is 280M parameters; on CPU this is non-trivial added latency for every user prompt. The p95 budget needs an actual UX number, not a guess.
- In-distribution inflation of Piiranha vs. ai4privacy as noted — hand slice is the real signal.
- Re-test trigger: any new market that requires non-Latin script support invalidates the model choice and reopens the ranking.
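For the latency risk, the harness needs a p95 over per-prompt timings of the added detector. A minimal sketch, assuming the new detector runs as an extra sequential call (so its own latency is the added latency) and using a nearest-rank percentile; the function names and the default budget value are placeholders, with 300 ms taken from the upper end of the proposed range:

```python
import time


def p95(samples):
    """Nearest-rank 95th percentile of a non-empty list of numbers."""
    if not samples:
        raise ValueError("p95 needs at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def measure_ms(detector, prompts):
    """Wall-clock time per prompt, in milliseconds, for a single detector callable."""
    timings = []
    for prompt in prompts:
        started = time.perf_counter()
        detector(prompt)
        timings.append((time.perf_counter() - started) * 1000.0)
    return timings


def within_budget(timings_ms, budget_ms=300.0):
    """True if the p95 of the measured timings stays at or below the budget."""
    return p95(timings_ms) <= budget_ms
```

Timings should be taken on the target hardware (CPU vs. GPU) and after a warm-up call, since the first inference typically includes model loading. If detectors run in parallel instead of sequentially, the added latency is the difference in end-to-end time, not the detector's own time.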
Next step
Build the harness + eval set, run the validation, post results back into this issue, then decide go/no-go.