fix: add confusable/homograph protection to username validation #48

@mbradley

problem

the name server validates usernames against IDNA 2008 rules (punycode encoding, DNS label limits, emoji rejection, ZWJ rejection), but does not protect against cross-script confusable attacks. a user can register a name that is visually identical to an existing name by using characters from a different Unicode script, and NIP-05 verification will confirm both as legitimate @divine.video identities.

example: matt (ASCII) vs mаtt (Cyrillic а at position 2) are visually indistinguishable but would resolve to different pubkeys. any client displaying a verified checkmark would show both as legitimate.
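
to make the difference visible, here is a small sketch that prints the code points of both names. the helper name `codePoints` is made up for illustration and is not part of the name server.

```typescript
// Lists each character of a name as a U+XXXX code point, to expose look-alikes.
function codePoints(name: string): string[] {
  return Array.from(name, (ch) =>
    "U+" + ch.codePointAt(0)!.toString(16).toUpperCase().padStart(4, "0"),
  );
}

const asciiName = "matt";
const spoofName = "m\u0430tt"; // U+0430 CYRILLIC SMALL LETTER A at position 2

console.log(codePoints(asciiName)); // second entry is U+0061
console.log(codePoints(spoofName)); // second entry is U+0430
console.log(asciiName === spoofName); // false: different strings, same rendering
```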

this was surfaced while reviewing a punycode-encoded registration containing a trademark symbol, which is itself benign but highlighted the gap.

risk

  • identity spoofing via visually identical NIP-05 names
  • undermines the trust signal that name@divine.video verification provides to clients
  • low likelihood today given user count, but the attack is trivial to execute

approaches (based on browser and platform best practices)

1. single-script restriction
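reject labels that mix scripts (e.g., Latin + Cyrillic in the same name). this mirrors the mixed-script checks Chrome and Firefox apply when deciding whether to display an IDN in Unicode or fall back to punycode. simple to implement, and it blocks the mixed-script attack in the example above, but it does not catch whole-script confusables, where every character comes from a single non-Latin script (e.g., an all-Cyrillic look-alike of a Latin name).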
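
a minimal sketch of a mixed-script check, using the Unicode property escapes built into JavaScript regular expressions. the script list and the function names (`scriptsIn`, `isSingleScript`) are assumptions for illustration; a real check would cover every script the validator accepts and would likely use `Script_Extensions` rather than `Script`.

```typescript
// Scripts to test for. This short list is an assumption for illustration.
const SCRIPT_PATTERNS: Array<[string, RegExp]> = [
  ["Latin", /\p{Script=Latin}/u],
  ["Cyrillic", /\p{Script=Cyrillic}/u],
  ["Greek", /\p{Script=Greek}/u],
  ["Han", /\p{Script=Han}/u],
  ["Hiragana", /\p{Script=Hiragana}/u],
  ["Katakana", /\p{Script=Katakana}/u],
  ["Arabic", /\p{Script=Arabic}/u],
  ["Thai", /\p{Script=Thai}/u],
];

// Returns the listed scripts that appear in the label. Digits and hyphens
// belong to the Common script, so they match none of the patterns.
function scriptsIn(label: string): Set<string> {
  const found = new Set<string>();
  for (const ch of Array.from(label)) {
    for (const [name, pattern] of SCRIPT_PATTERNS) {
      if (pattern.test(ch)) found.add(name);
    }
  }
  return found;
}

// A label passes if it uses at most one of the listed scripts.
function isSingleScript(label: string): boolean {
  return scriptsIn(label).size <= 1;
}

console.log(isSingleScript("matt"));      // true
console.log(isSingleScript("m\u0430tt")); // false: Latin + Cyrillic
```

one design caveat: a strict one-script rule would also reject legitimate Japanese names that combine Han, Hiragana, and Katakana, so an allowlist of script combinations would probably be needed on top of this.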

2. skeleton-based confusable detection (UTS #39)
apply Unicode Technical Standard #39's skeleton algorithm (NFD, confusable character mapping, then NFD again) at registration time. reject any new name whose skeleton collides with the skeleton of an existing registered name. this is the mechanism UTS #39 defines for detecting confusable identifiers. more comprehensive than script restriction alone: it catches both cross-script and within-script confusables. requires vendoring Unicode confusable data (published by the Unicode Consortium as confusables.txt) and keeping it current with Unicode releases, or depending on an npm package that wraps it. the check only runs at registration time, not on NIP-05 lookups.
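
a rough sketch of the registration-time check. the mapping table below is a hand-picked handful of entries for illustration, not the real confusables.txt data, and `skeletonOf` and `NameRegistry` are hypothetical names rather than existing code in the name server.

```typescript
// Illustrative entries only; real data comes from the Unicode Consortium's
// confusables.txt and is far larger.
const PROTOTYPES = new Map<string, string>([
  ["\u0430", "a"], // CYRILLIC SMALL LETTER A
  ["\u0435", "e"], // CYRILLIC SMALL LETTER IE
  ["\u043E", "o"], // CYRILLIC SMALL LETTER O
  ["\u0440", "p"], // CYRILLIC SMALL LETTER ER
  ["\u0441", "c"], // CYRILLIC SMALL LETTER ES
]);

// Skeleton in the UTS #39 sense: NFD, map each character to its prototype,
// then NFD again.
function skeletonOf(name: string): string {
  const mapped = Array.from(name.normalize("NFD"), (ch) => PROTOTYPES.get(ch) ?? ch);
  return mapped.join("").normalize("NFD");
}

type RegisterResult = { ok: true } | { ok: false; conflictsWith: string };

// In-memory stand-in for the name store, keyed by skeleton.
class NameRegistry {
  private bySkeleton = new Map<string, string>();

  register(name: string): RegisterResult {
    const key = skeletonOf(name);
    const existing = this.bySkeleton.get(key);
    if (existing !== undefined) return { ok: false, conflictsWith: existing };
    this.bySkeleton.set(key, name);
    return { ok: true };
  }
}

const registry = new NameRegistry();
console.log(registry.register("matt"));      // accepted
console.log(registry.register("m\u0430tt")); // rejected, conflicts with "matt"
```

storing the skeleton alongside each name (or indexing on it) keeps the collision check to a single lookup per registration.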

3. NFKC normalization before storage
ensure equivalent Unicode representations (e.g., é as a single codepoint vs e + combining accent) are normalized to the same form before storage and comparison. NFKC also folds compatibility variants, such as fullwidth letters, into their ordinary counterparts. prevents "same name, different bytes" collisions. should be applied regardless of which other approach is chosen.
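
a short sketch using the built-in `String.prototype.normalize`. the wrapper name `normalizeForStorage` is hypothetical.

```typescript
// Normalizes a name to NFKC so equivalent inputs are stored as the same string.
function normalizeForStorage(name: string): string {
  return name.normalize("NFKC");
}

const composedE = "\u00E9";    // é as a single code point
const decomposedE = "e\u0301"; // e + COMBINING ACUTE ACCENT
const fullwidthMatt = "\uFF4D\uFF41\uFF54\uFF54"; // fullwidth letters m, a, t, t

console.log(composedE === decomposedE); // false before normalization
console.log(normalizeForStorage(composedE) === normalizeForStorage(decomposedE)); // true
console.log(normalizeForStorage(fullwidthMatt)); // "matt"
```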

4. ASCII-only restriction
restrict NIP-05 names to [a-z0-9-] only. eliminates the problem entirely but drops internationalized name support, which the current validation explicitly supports across multiple scripts (CJK, Cyrillic, Arabic, Thai, etc.).
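
for comparison, the ASCII-only rule fits in one pattern. this sketch checks the character set only (the existing DNS label limits would still apply), and `isAsciiOnlyName` is a made-up name.

```typescript
// Accepts only lowercase ASCII letters, digits, and hyphens.
const ASCII_NAME = /^[a-z0-9-]+$/;

function isAsciiOnlyName(name: string): boolean {
  return ASCII_NAME.test(name);
}

console.log(isAsciiOnlyName("matt"));         // true
console.log(isAsciiOnlyName("m\u0430tt"));    // false: the spoof is blocked
console.log(isAsciiOnlyName("\u5C71\u7530")); // false: a legitimate CJK name is blocked too
```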

recommendation

option 4 is untenable: the IDN support is intentional and well-tested. option 1 is largely covered by option 2, since a mixed-script spoof of an existing name produces the same skeleton as that name. recommend implementing option 2 (skeleton-based confusable detection) + option 3 (NFKC normalization) to get this right from the start.

also worth considering: an audit of existing registered names for confusable collisions with each other.
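
one way such an audit could look: group the registered names by a comparison key and report every group with more than one member. `findCollisions` is a hypothetical helper, and `demoKey` is a stand-in that folds only Cyrillic а; a real audit would pass in the UTS #39 skeleton function.

```typescript
// Groups names by key and returns the groups that contain more than one name.
function findCollisions(names: string[], keyOf: (name: string) => string): string[][] {
  const groups = new Map<string, string[]>();
  for (const name of names) {
    const key = keyOf(name);
    const group = groups.get(key);
    if (group) group.push(name);
    else groups.set(key, [name]);
  }
  return Array.from(groups.values()).filter((group) => group.length > 1);
}

// Stand-in key for the example: maps CYRILLIC SMALL LETTER A to Latin "a".
const demoKey = (name: string): string =>
  name.normalize("NFD").replace(/\u0430/g, "a");

const registered = ["matt", "alice", "m\u0430tt", "bob"];
console.log(findCollisions(registered, demoKey)); // one group: "matt" and its look-alike
```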
