[professional-crm] Proposal: tag_dictionary + validation trigger to prevent tag drift

## Problem

`extensions/professional-crm` stores contact tags as a free-form `text[]`. There's no canonical vocabulary, alias mapping, or write-time validation. This guarantees tag drift over time. In a real CRM I built on top of this extension, after ~700 contacts the data already showed:

```
medtronic (33), post-medtronic-network (4), ex-mdt (1), pan-mdt (1),
medtronic-employment-counsel (1), post-medtronic (1), ex-medtronic (1)
```

— **seven distinct ways to spell "Medtronic"**, all accidental. Other examples: `kids-school` + `school-staff` + `school-admin` (three variants of "involved with a kid's school"); `sponsor-active` + `sponsor-grade` + `advice-grade-not-sponsor-grade` (overlapping grade tags).

This is not a one-time data accident; it's an inevitable consequence of tools that accept arbitrary strings. The Gemma-/Claude-/agent-driven workflows OB1 enables will accelerate it.

## Proposed pattern

A small, **opt-in** addition that any user can adopt without breaking the standard schema:

1. **New `tag_dictionary` table** — canonical names, aliases, definitions, parent (for hierarchy), dimension (affiliation / role / cohort / etc.), `is_classifier_safe` flag (so legacy or strategic tags can be preserved but not classifier-applied).
2. **`validate_contact_tags()` trigger** — `BEFORE INSERT OR UPDATE OF tags` on `professional_contacts`. Auto-resolves known aliases to canonical names. Raises a clear error on unknown tags.
3. **`resolveTags()` helper in `index.ts`** — pre-validates before the DB write, returns Levenshtein-based "did you mean" suggestions in the error message.
4. **Optional consolidation migration** — one-time pass to canonicalize existing contact tags using the dictionary.

The standard `professional_contacts.tags` column type stays `text[]`. The trigger gracefully no-ops if `tag_dictionary` is empty (or absent, with a guard). So existing OB1 deployments aren't broken.

## Behavior changes (when adopted)

**Successful write** — `crm_add_contact({ tags: ['medtronic', 'ent', 'sponsor-active'] })` resolves to stored `['mdt', 'mdt-ent', 'sponsor']` (via alias map).

**Rejected write** — `crm_add_contact({ tags: ['mdt', 'bogus'] })` returns:
```
Tag validation failed. Add new tags to tag_dictionary first, or use a canonical name:
  'bogus' is not a known tag — did you mean: bus, busy?
```

Tested end-to-end against a deployed Supabase Edge Function (post-#330 patch). Defense in depth: trigger catches all writers (MCP tool, direct SQL, dashboard, future tools) while the tool-layer helper provides clearer errors.

## Open questions for the maintainer

1. Is this within scope for OB1, or better as a separate "ob-extension-tag-hygiene" companion?
2. If in scope: should the dictionary itself be part of the canonical schema.sql (with no seed rows) or shipped as an optional `tag-dictionary.sql` add-on?
3. The trigger as drafted is **strict** (rejects unknown tags). Alternative is **soft** (warns + writes anyway). Strong opinions either way?
4. The Gemma/agent workflow benefits significantly from the dictionary because it can be read at classification time and constrains the model's output vocabulary. Would docs/examples for that flow be valuable upstream?

Happy to draft a PR if there's interest. Local reference implementation:
- `tag_dictionary` schema: composite of name, aliases[], definition, parent, dimension, is_classifier_safe, notes
- Trigger: ~30 lines plpgsql
- Helper: ~50 lines TS

Related: PR #330 (deploy-blockers) is the prerequisite — without the StreamableHTTPTransport fix the extension doesn't run as published.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[professional-crm] Proposal: tag_dictionary + validation trigger to prevent tag drift #332

Problem

Proposed pattern

Behavior changes (when adopted)

Open questions for the maintainer

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

[professional-crm] Proposal: tag_dictionary + validation trigger to prevent tag drift #332

Description

Problem

Proposed pattern

Behavior changes (when adopted)

Open questions for the maintainer

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions