Problem
extensions/professional-crm stores contact tags as a free-form text[]. There's no canonical vocabulary, alias mapping, or write-time validation. This guarantees tag drift over time. In a real CRM I built on top of this extension, after ~700 contacts the data already showed:
medtronic (33), post-medtronic-network (4), ex-mdt (1), pan-mdt (1),
medtronic-employment-counsel (1), post-medtronic (1), ex-medtronic (1)
— seven distinct ways to spell "Medtronic", all accidental. Other examples: kids-school + school-staff + school-admin (three variants of "involved with a kid's school"); sponsor-active + sponsor-grade + advice-grade-not-sponsor-grade (overlapping grade tags).
This is not a one-time data accident; it's an inevitable consequence of tools that accept arbitrary strings. The Gemma-/Claude-/agent-driven workflows OB1 enables will accelerate it.
Proposed pattern
A small, opt-in addition that any user can adopt without breaking the standard schema:
- New tag_dictionary table: canonical names, aliases, definitions, parent (for hierarchy), dimension (affiliation / role / cohort / etc.), is_classifier_safe flag (so legacy or strategic tags can be preserved but not classifier-applied).
- validate_contact_tags() trigger: BEFORE INSERT OR UPDATE OF tags on professional_contacts. Auto-resolves known aliases to canonical names. Raises a clear error on unknown tags.
- resolveTags() helper in index.ts: pre-validates before the DB write, returns Levenshtein-based "did you mean" suggestions in the error message.
- Optional consolidation migration: one-time pass to canonicalize existing contact tags using the dictionary.
The standard professional_contacts.tags column type stays text[]. The trigger gracefully no-ops if tag_dictionary is empty (or absent, with a guard). So existing OB1 deployments aren't broken.
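To make the alias-resolution step concrete, here is a rough sketch of what a resolveTags()-style helper could look like. It is not the reference implementation: the TagDictionaryRow and ResolveResult shapes are hypothetical (modelled on the columns listed above), and the case-insensitive matching, trimming, and de-duplication are my assumptions rather than settled behavior.

```typescript
// Hypothetical row shape, mirroring the columns proposed for tag_dictionary.
interface TagDictionaryRow {
  name: string; // canonical tag
  aliases: string[]; // alternate spellings that resolve to `name`
  definition: string;
  parent: string | null;
  dimension: string; // e.g. "affiliation", "role", "cohort"
  is_classifier_safe: boolean;
}

// Hypothetical result shape for this sketch.
interface ResolveResult {
  resolved: string[]; // canonical names, de-duplicated, input order kept
  unknown: string[]; // inputs matching neither a canonical name nor an alias
}

// Map every canonical name and alias (lower-cased) to its canonical name.
function buildAliasIndex(rows: TagDictionaryRow[]): Map<string, string> {
  const index = new Map<string, string>();
  for (const row of rows) {
    index.set(row.name.toLowerCase(), row.name);
    for (const alias of row.aliases) {
      index.set(alias.toLowerCase(), row.name);
    }
  }
  return index;
}

function resolveTags(tags: string[], rows: TagDictionaryRow[]): ResolveResult {
  // Empty dictionary: pass tags through untouched, mirroring the no-op rule.
  if (rows.length === 0) {
    return { resolved: tags.slice(), unknown: [] };
  }

  const index = buildAliasIndex(rows);
  const resolved: string[] = [];
  const unknown: string[] = [];
  for (const tag of tags) {
    // Assumption: matching ignores case and surrounding whitespace.
    const canonical = index.get(tag.trim().toLowerCase());
    if (canonical === undefined) {
      unknown.push(tag);
    } else if (resolved.indexOf(canonical) === -1) {
      resolved.push(canonical);
    }
  }
  return { resolved, unknown };
}
```

With dictionary rows for mdt (alias medtronic), mdt-ent (alias ent), and sponsor (alias sponsor-active), calling resolveTags(['medtronic', 'ent', 'sponsor-active'], rows) yields resolved ['mdt', 'mdt-ent', 'sponsor'] and an empty unknown list. The caller would reject the write whenever unknown is non-empty.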
Behavior changes (when adopted)
Successful write — crm_add_contact({ tags: ['medtronic', 'ent', 'sponsor-active'] }) resolves to stored ['mdt', 'mdt-ent', 'sponsor'] (via alias map).
Rejected write — crm_add_contact({ tags: ['mdt', 'bogus'] }) returns:
Tag validation failed. Add new tags to tag_dictionary first, or use a canonical name:
'bogus' is not a known tag — did you mean: bus, busy?
Tested end-to-end against a deployed Supabase Edge Function (post-#330 patch). Defense in depth: trigger catches all writers (MCP tool, direct SQL, dashboard, future tools) while the tool-layer helper provides clearer errors.
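For the tool-layer error, a minimal sketch of the "did you mean" step might look like the following. The function names suggestTags and formatTagError, the distance threshold, the suggestion limit, and the exact punctuation of the message are all illustrative assumptions; only the Levenshtein distance itself is the standard algorithm.

```typescript
// Standard dynamic-programming Levenshtein distance, keeping two rows in memory.
function levenshtein(a: string, b: string): number {
  let prev: number[] = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr: number[] = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a.charAt(i - 1) === b.charAt(j - 1) ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Return up to `limit` known tags within `maxDistance` edits, closest first.
// Both defaults are assumptions; tune them against real tag data.
function suggestTags(
  input: string,
  known: string[],
  maxDistance = 2,
  limit = 3
): string[] {
  return known
    .map((tag) => ({
      tag,
      distance: levenshtein(input.toLowerCase(), tag.toLowerCase()),
    }))
    .filter((c) => c.distance <= maxDistance)
    .sort((x, y) => x.distance - y.distance || x.tag.localeCompare(y.tag))
    .slice(0, limit)
    .map((c) => c.tag);
}

// Build one error string covering every unknown tag in the request.
function formatTagError(unknown: string[], known: string[]): string {
  const header =
    "Tag validation failed. Add new tags to tag_dictionary first, or use a canonical name:";
  const lines = unknown.map((tag) => {
    const hints = suggestTags(tag, known);
    const suffix =
      hints.length > 0 ? `; did you mean: ${hints.join(", ")}?` : "";
    return `'${tag}' is not a known tag${suffix}`;
  });
  return [header, ...lines].join("\n");
}
```

The helper only improves the message; the trigger stays the authority on what gets written, so a writer that bypasses the tool layer still hits the same validation.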
Open questions for the maintainer
- Is this within scope for OB1, or better as a separate "ob-extension-tag-hygiene" companion?
- If in scope: should the dictionary itself be part of the canonical schema.sql (with no seed rows) or shipped as an optional tag-dictionary.sql add-on?
- The trigger as drafted is strict (rejects unknown tags). Alternative is soft (warns + writes anyway). Strong opinions either way?
- The Gemma/agent workflow benefits significantly from the dictionary because it can be read at classification time and constrains the model's output vocabulary. Would docs/examples for that flow be valuable upstream?
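To illustrate the classification-time idea in the last question, here is a sketch of how the dictionary could constrain a model's vocabulary. Everything in it is hypothetical: the ClassifierTag shape is a subset of the proposed columns, and buildTagVocabularyPrompt and filterToVocabulary are names I made up for this example. It makes no assumptions about any particular model's API; it only builds a prompt fragment and filters the returned tags.

```typescript
// Hypothetical subset of a tag_dictionary row; only the fields used here.
interface ClassifierTag {
  name: string;
  definition: string;
  dimension: string;
  is_classifier_safe: boolean;
}

// Render the classifier-safe tags as a prompt fragment, grouped by dimension.
function buildTagVocabularyPrompt(rows: ClassifierTag[]): string {
  const byDimension = new Map<string, ClassifierTag[]>();
  for (const row of rows) {
    if (!row.is_classifier_safe) continue; // legacy or strategic tags stay out
    const group = byDimension.get(row.dimension) ?? [];
    group.push(row);
    byDimension.set(row.dimension, group);
  }
  const sections = Array.from(byDimension.entries()).map(
    ([dimension, tags]) => {
      const lines = tags.map((t) => `- ${t.name}: ${t.definition}`);
      return `${dimension}:\n${lines.join("\n")}`;
    }
  );
  return ["Use only the tags listed below.", ...sections].join("\n\n");
}

// Drop anything the model returned that is not in the allowed vocabulary.
function filterToVocabulary(
  modelTags: string[],
  rows: ClassifierTag[]
): string[] {
  const allowed = new Set(
    rows.filter((r) => r.is_classifier_safe).map((r) => r.name)
  );
  return modelTags.filter((tag) => allowed.has(tag));
}
```

Filtering the output is a belt-and-braces step: even with the vocabulary in the prompt, a model can still emit an off-list tag, and the write-time trigger would reject it anyway.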
Happy to draft a PR if there's interest. Local reference implementation:
- tag_dictionary schema: composite of name, aliases[], definition, parent, dimension, is_classifier_safe, notes
- Trigger: ~30 lines plpgsql
- Helper: ~50 lines TS
Related: PR #330 (deploy-blockers) is the prerequisite — without the StreamableHTTPTransport fix the extension doesn't run as published.