docs: audit all product surfaces and reposition for market by Fieldnote-Echo · Pull Request #33 · Project-Navi/navi-sanitize

Nelson Spence (Fieldnote-Echo) · 2026-04-05T18:38:49Z

Summary

Full five-agent documentation audit across all product surfaces (README, CLAUDE.md, docs site, pipeline code)
Fixed 1 HIGH, 8 MEDIUM, 9 LOW accuracy issues across README, CLAUDE.md, and 5 docs files
Repositioned README and docs/index.md based on four-agent market research
Pipeline code: _normalize_nfkc() returns count instead of bool, type: ignore replaced with cast(T, ...)

What changed

Accuracy fixes:

Pipeline table: homoglyph scope, invisible char categories, stage return types
CI section: job count, fuzz workflow, smoke test scope
Benchmark numbers: rerun on local hardware (AMD Ryzen 9 9950X, Python 3.13)
ZWJ emoji decomposition caveat added to comparison table
NFKC warning now includes count per project convention

Positioning:

Hero line: mechanism specificity + "no ML" differentiator
Install command: moved to first screen (was line 150)
"Why This Matters": compressed to scannable bullets, LLM leads
OWASP LLM Prompt Injection Prevention Cheat Sheet alignment
"Only maintained homoglyph library" (both alternatives archived)
CVE-2024-43093 proof point (actively exploited fullwidth bypass)

Code:

_normalize_nfkc() → tuple[str, int] (count of changed codepoints)
type: ignore[return-value] → cast(T, ...) with explanatory comments
Zero type: ignore remaining in src/

Test plan

405 tests pass (-W error, warnings-as-errors)
mypy strict clean (0 issues, 0 type: ignore in src/)
ruff check + format clean
All pre-commit hooks pass (gitleaks, trailing whitespace, ruff, mypy)
CI passes on GitHub
Docs site renders correctly (check after merge)

Five-agent documentation audit found and fixed: - README pipeline table understated homoglyph scope (missing Armenian/Cherokee/typographic) - README "emoji intact" claim lacked ZWJ decomposition caveat - CLAUDE.md invisible char description listed 5 of 9 categories (omitted 304 of 492 chars) - CLAUDE.md stage return type claim incorrect for escaper (str, not tuple) - CLAUDE.md CI section said "five parallel jobs" (actually 7, only 4 parallel) - CLAUDE.md omitted fuzz workflow entirely - Benchmark numbers stale (from CI runner); rerun on local hardware - type: ignore annotations replaced with typed cast() - NFKC warning lacked count, violating project's own convention Pipeline code change: - _normalize_nfkc() returns tuple[str, int] (count) instead of tuple[str, bool] - Warning message now includes count per project convention - Both type: ignore[return-value] replaced with cast(T, ...) + explanatory comments Positioning changes (informed by market research): - Hero line: added mechanism specificity and "no ML" differentiator - Install command moved to first screen (was buried at line 150) - "Why This Matters" compressed to scannable bullets, LLM leads - Added OWASP LLM Prompt Injection Prevention Cheat Sheet alignment - Added "only maintained homoglyph library" positioning (both alternatives archived) - Added CVE-2024-43093 proof point (actively exploited fullwidth bypass) - docs/index.md: mirrored positioning, added OWASP and competitive bullets All surfaces updated: README, CLAUDE.md, docs/index.md, pipeline-architecture, comparison, threat-model, performance, quickstart. 405 tests pass. mypy strict clean. ruff clean.

codecov · 2026-04-05T18:39:39Z

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

github-actions · 2026-04-05T18:41:37Z


-def _normalize_nfkc(s: str) -> tuple[str, bool]:
-    """NFKC normalize. Returns (cleaned, changed)."""
+def _normalize_nfkc(s: str) -> tuple[str, int]:


🟡 MEDIUM: _normalize_nfkc() may miscount changed codepoints when string-level normalization changes more than one character

Confidence: 95%

count = sum(1 for c in s if unicodedata.normalize("NFKC", c) != c) return normalized, max(count, 1)

The revised _normalize_nfkc() function attempts to count the number of codepoints changed by per-character normalization. However, performing unicodedata.normalize("NFKC", c) for each character and comparing it to c does not always accurately count the number of changed codepoints. Some NFKC normalizations operate on substrings (e.g., character pairs), so the per-codepoint approach may undercount or misrepresent true changes.

Suggestion: Consider a more robust diffing mechanism for counting NFKC changes, such as using difflib to compare original and normalized strings, or explicitly document that this is an approximate count. Alternatively, clarify in both docstrings and warning logs that the count may be inexact.

— Counting NFKC-changed codepoints is tricky-per-char normalization can be misleading when composite sequences normalize as one. Document the approximation or consider stronger diff logic.

github-actions · 2026-04-05T18:41:37Z

@@ -10,7 +10,11 @@
 [![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)
 [![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)


🔵 LOW: Documentation repositioned: install command and purpose statement now immediately visible

Confidence: 98%

pip install navi-sanitize (now at top of README), hero line specifies differentiators

Moving the install command and improving the opening description aligns the README with common OSS best practices and makes onboarding easier for newcomers.

Suggestion: No action required; this is a positive example for future documentation.

— README onboarding sequence now matches what most people actually want to find. ...no arguments here.

github-actions · 2026-04-05T18:41:37Z

@@ -1,25 +1,25 @@
 # Performance


🔵 LOW: Benchmark updates: hardware, Python version, and numbers clarified

Confidence: 96%

Benchmarks measured on Python 3.13, single thread, AMD Ryzen 9 9950X. ... expect ±20% on different hardware; CI runners are typically 2-3x slower.

Benchmarks now specify hardware details (AMD Ryzen 9 9950X, Python 3.13), clarify expected variance, and report more realistic ops/sec. This transparency improves reproducibility.

Suggestion: Continue to benchmark and document performance in context for user clarity.

— Modernizing benchmark methodology is a step forward. Numbers are harder to fudge when they're precise.

github-actions

Grippy approves — PASS (95/100)

github-actions · 2026-04-05T18:41:40Z

✅ Grippy Review — PASS

Score: 100/100 | Findings: 3

Delta: 3 new

_{Commit: 6eeb02e}

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bf54a5ea56

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".

chatgpt-codex-connector · 2026-04-05T18:41:56Z

+    count = sum(1 for c in s if unicodedata.normalize("NFKC", c) != c)
+    return normalized, max(count, 1)  # at least 1 if string changed


Compute NFKC change counts from whole-string diffs

The new count logic under-reports normalization changes when NFKC effects depend on neighboring code points (for example, combining-mark sequences like "e\u0301e\u0301"), because it checks each character in isolation and then falls back to 1 via max(count, 1). In these cases the string changes but multiple characters are normalized, so the warning count is inaccurate; this breaks the new “count for traceability” behavior and can skew any log-based metrics built on that count.

Useful? React with 👍 / 👎.

Copilot

Pull request overview

This PR performs a broad documentation audit/repositioning across the README + docs site and updates the core sanitization pipeline’s NFKC stage to return a change count (and removes remaining type: ignore usage in src/).

Changes:

Update _normalize_nfkc() to return (normalized_text, change_count) and log NFKC normalization with a count.
Replace type: ignore[return-value] with cast(T, ...) in walk() return paths.
Rework README + docs landing pages and supporting docs for accuracy and market positioning (pipeline table, benchmarks, caveats, etc.).

Reviewed changes

Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
src/navi_sanitize/_pipeline.py	NFKC stage now returns an int count; logging updated; typing uses `cast` instead of `type: ignore`.
README.md	Repositioning, install moved up, pipeline table + benchmarks + additional caveats.
docs/index.md	Landing page refresh + new feature bullets.
docs/getting-started/quickstart.md	Logging section updated to reflect NFKC count-based warning.
docs/explanation/threat-model.md	Add caveats around ZWJ/bidi stripping and homoglyph scope text updates.
docs/explanation/pipeline-architecture.md	Update stage semantics/diagram text to reflect count-based indicators.
docs/explanation/performance.md	Benchmark environment and results updated; `walk()` description updated.
docs/explanation/comparison.md	Comparison positioning updates + ZWJ footnote + homoglyph scope update.
CLAUDE.md	CI/pipeline documentation updates (but stage return-type text needs reconciliation).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-05T18:42:23Z

+def _normalize_nfkc(s: str) -> tuple[str, int]:
+    """NFKC normalize. Returns (cleaned, count of changed codepoints)."""
    normalized = unicodedata.normalize("NFKC", s)
-    return normalized, normalized != s
+    if normalized == s:
+        return s, 0
+    count = sum(1 for c in s if unicodedata.normalize("NFKC", c) != c)
+    return normalized, max(count, 1)  # at least 1 if string changed


_normalize_nfkc() claims to return a “count of changed codepoints”, but the current counting approach is only per-codepoint NFKC differences and then forces a minimum of 1 when the string changes due to cross-codepoint composition. This makes the count potentially misleading (e.g., a sequence that changes only when normalized as a whole will log 1 even though no individual codepoint differs in isolation). Consider either (a) redefining/documenting the count as an “approximate/non-zero change indicator” or (b) computing a count that better matches the wording (and log message).

Copilot · 2026-04-05T18:42:24Z

+def _normalize_nfkc(s: str) -> tuple[str, int]:
+    """NFKC normalize. Returns (cleaned, count of changed codepoints)."""
    normalized = unicodedata.normalize("NFKC", s)
-    return normalized, normalized != s
+    if normalized == s:
+        return s, 0
+    count = sum(1 for c in s if unicodedata.normalize("NFKC", c) != c)
+    return normalized, max(count, 1)  # at least 1 if string changed


_normalize_nfkc() now does an additional per-codepoint unicodedata.normalize() pass to compute a count when the string changes. Since _normalize_nfkc() is also used for the stage-5 “Re-NFKC” pass where the count isn’t logged/used, consider splitting into two helpers (normalize-only vs normalize+count) or adding an option to skip counting to avoid unnecessary O(n) work in that path.

Copilot · 2026-04-05T18:42:24Z

+    text, nfkc_count = _normalize_nfkc(text)
+    if nfkc_count:
+        logger.warning("Normalized %d fullwidth/compatibility character(s) in value", nfkc_count)


The PR changes stage 3’s warning to include an NFKC normalization count, but the current test suite doesn’t appear to assert the count/value for this warning (unlike invisibles/homoglyph counts). Adding a focused test that checks the warning contains the expected count for a representative NFKC case would help prevent regressions in the new counting behavior.

Copilot · 2026-04-05T18:42:24Z

 - `"Removed 2 null byte(s) from value"`
 - `"Stripped 3 invisible character(s) from value"`
- `"Normalized fullwidth character(s) in value"`
+- `"Normalized {N} fullwidth/compatibility character(s) in value"`


This section lists example warning strings as literal messages, but the NFKC entry is shown as "Normalized {N} ..." rather than an example with an actual number. Since the real log message includes a concrete integer, consider either using a concrete example (e.g. "Normalized 1 ...") or explicitly stating that {N} is a placeholder to avoid confusion/copy-paste mismatch.

Suggested change

- `"Normalized {N} fullwidth/compatibility character(s) in value"`

- `"Normalized 1 fullwidth/compatibility character(s) in value"`

Copilot · 2026-04-05T18:42:24Z

 6. **Escaper** (optional) — pluggable `Callable[[str], str]` runs last

-Each stage returns `(cleaned_string, count_or_flag)` — either an `int` count of removals/replacements or a `bool` changed flag. Stages have no side effects — the orchestrator logs.
+Stages 1–5 each return `(cleaned_string, count_or_flag)` — either an `int` count of removals/replacements or a `bool` changed flag. Stage 6 (escaper) is a `Callable[[str], str]` that returns a bare `str`. Stages have no side effects — the orchestrator logs.


This still documents that stages 1–5 may return a bool “changed flag”, but _pipeline._normalize_nfkc() now returns an int count (and the other stages return counts as well). Please update this description to match the current implementation so it doesn’t mislead contributors about stage return types.

Suggested change

Stages 1–5 each return `(cleaned_string, count_or_flag)` — either an `int` count of removals/replacements or a `bool` changed flag. Stage 6 (escaper) is a `Callable[[str], str]` that returns a bare `str`. Stages have no side effects — the orchestrator logs.

Stages 1–5 each return `(cleaned_string, count)` where `count` is an `int` for removals, replacements, or normalization changes. Stage 6 (escaper) is a `Callable[[str], str]` that returns a bare `str`. Stages have no side effects — the orchestrator logs.

Copilot · 2026-04-05T18:42:25Z

 # Pipeline Architecture

-Every string passed to `clean()` flows through six stages in strict order. Each stage is a deterministic function that returns the cleaned string and a change indicator (a count for stages that strip or replace, a boolean for normalization). The pipeline orchestrator logs warnings when stages modify input.
+Every string passed to `clean()` flows through six stages in strict order. Each of the five universal stages (1--5) is a deterministic function that returns the cleaned string and a change indicator (a count of removals or replacements). The escaper (stage 6, if provided) is a plain `str -> str` function. The pipeline orchestrator logs warnings when stages modify input.


The intro says universal stages return “a count of removals or replacements”, but stage 3 (NFKC) is normalization (not removal/replacement) and the implementation now reports a count of normalized/changed codepoints. Consider tweaking this wording to include normalization (or more generally: “a count of affected codepoints/changes”) so it matches the actual stage semantics.

Suggested change

Every string passed to `clean()` flows through six stages in strict order. Each of the five universal stages (1--5) is a deterministic function that returns the cleaned string and a change indicator (a count of removals or replacements). The escaper (stage 6, if provided) is a plain `str -> str` function. The pipeline orchestrator logs warnings when stages modify input.

Every string passed to `clean()` flows through six stages in strict order. Each of the five universal stages (1--5) is a deterministic function that returns the cleaned string and a change indicator (a count of affected codepoints). The escaper (stage 6, if provided) is a plain `str -> str` function. The pipeline orchestrator logs warnings when stages modify input.

Copilot · 2026-04-05T18:42:25Z

+- **Only maintained option** --- both [confusable_homoglyphs](https://github.com/vhf/confusable_homoglyphs) and [homoglyphs](https://github.com/life4/homoglyphs) are archived; navi-sanitize is the only maintained Python library covering homoglyph replacement
 - **Deterministic** --- same input always produces the same output; no probabilistic models, no heuristics
- **Zero dependencies** --- Python 3.12+ stdlib only
+- **Zero dependencies** --- Python 3.12+ stdlib only; zero supply chain risk


“Zero dependencies … zero supply chain risk” is an absolute claim that isn’t strictly accurate (there’s still supply-chain risk in the package itself and its distribution channel; “zero third‑party dependency risk” would be closer). Consider rephrasing to avoid overstating the security property.

Suggested change

- **Zero dependencies** --- Python 3.12+ stdlib only; zero supply chain risk

- **Zero dependencies** --- Python 3.12+ stdlib only; no third-party dependency risk

NFKC count logic: - Replace per-character normalization with whole-string positional diff (fixes combining-mark sequences like e+accent that compose as pairs) - Add count=False fast path for stage-5 re-NFKC where count is discarded - Add 2 tests asserting NFKC warning includes correct count Doc fixes from Copilot suggestions: - CLAUDE.md: stage return types now all described as int count (no bool) - pipeline-architecture.md: "affected codepoints" instead of "removals or replacements" - quickstart.md: concrete example "Normalized 1 ..." instead of "{N}" placeholder - docs/index.md: "no third-party dependency risk" instead of absolute "zero supply chain risk"

github-actions

Grippy approves — PASS (100/100)

Greek U+03A5 + combining tilde triggers the count=False fast path: stage 4 replaces U+03A5 -> Y, stage 5 re-NFKC composes Y+tilde -> U+1EF8.

github-actions

Grippy approves — PASS (100/100)

Stage 5 re-NFKC never changes the string because stage 3 NFKC already handles compatibility decomposition and stage 4 NFC handles canonical composition. The count=False optimization was unreachable dead code — _normalize_nfkc always returns (s, 0) at the early equality check in the stage 5 path, so the counting diff never runs there. Removed: count parameter, max(n, 1) guard (n >= 1 is guaranteed when normalized != s since equal-length unequal strings differ in at least one position).

Copilot

Pull request overview

Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.

Comments suppressed due to low confidence (3)

src/navi_sanitize/_pipeline.py:66

_normalize_nfkc()’s “affected codepoints” count is computed via positional comparison + length delta, which can significantly overcount when normalization changes length early in the string (e.g., ligatures like U+FB00 → "ff" will report 2 affected codepoints even though only 1 input codepoint changed). Consider either (a) switching to an edit-script based diff (e.g., SequenceMatcher opcodes over codepoints) to produce a more meaningful count, or (b) documenting/renaming this value as an approximate upper bound rather than a codepoint count.

    acute) compose to a single precomposed codepoint but each constituent
    appears unchanged when normalized in isolation.
    """
    normalized = unicodedata.normalize("NFKC", s)
    if normalized == s:
        return s, 0
    min_len = min(len(s), len(normalized))
    n = sum(1 for i in range(min_len) if s[i] != normalized[i])
    n += abs(len(s) - len(normalized))
    return normalized, n


def _replace_homoglyphs(s: str) -> tuple[str, int]:
    """Replace homoglyphs with Latin equivalents. Returns (cleaned, count).

    Decomposes to NFD first so that combining marks cannot hide

src/navi_sanitize/_pipeline.py:125

The Stage 3 warning message presents nfkc_count as an exact number of “fullwidth/compatibility character(s)”, but the implementation is explicitly approximate and may overcount for certain compatibility folds. Either adjust the log wording to indicate the count is approximate/upper-bound, or tighten the counting logic so the warning’s number matches what users will reasonably interpret as “characters affected”.

    text, glyph_count = _replace_homoglyphs(text)
    if glyph_count:
        logger.warning("Replaced %d homoglyph(s) in value", glyph_count)
        # Stage 5: Re-NFKC — homoglyph replacement can produce Latin chars that

src/navi_sanitize/_pipeline.py:168

walk() is typed as walk[T](data: T) -> T, but for str subclasses it returns a plain str (clean() always returns str). Using cast(T, ...) silences mypy but makes the public type signature unsound for callers relying on subclass preservation. Consider adding overloads (str -> str, dict/list -> same container type) or constraining the TypeVar so the signature reflects the actual runtime behavior.


    Uses a single iterative pass (no recursion, no deepcopy) so hostile
    nesting depth cannot cause stack overflow. Logs a warning when nesting
    exceeds *max_depth* but continues sanitizing. Cyclic references are
    handled via identity tracking — each container is copied and sanitized
    exactly once.

    Only dict and list contents are traversed; tuples, sets, and other
    types pass through by reference.
    """
    if max_depth < 0:
        raise ValueError("max_depth must be >= 0")

    # Scalars and strings — no traversal needed
    if isinstance(data, str):
        # clean() returns str; T is str (or a subclass whose extra semantics
        # sanitization intentionally discards).  Mypy can't prove str <: T.
        return cast(T, clean(data, escaper=escaper))
    if not isinstance(data, (dict, list)):
        return data

    # Single-pass iterative copy-and-sanitize (boltons remap pattern).
    # Each stack entry pairs (original, its_new_copy, depth). Children
    # are pushed onto the stack instead of recursing. The `seen` dict

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2026-04-05T18:52:25Z

+| **Preserves Unicode** | Yes (CJK, Arabic, emoji¹ intact) | No (destroys all non-ASCII) | Yes | Yes | Yes |
 | **Pluggable escaper** | Yes | No | No | No | N/A (HTML-specific) |
 | **Dependencies** | Zero | Zero | Zero | wcwidth | C ext / Rust ext |

+¹ ZWJ (U+200D) is stripped as a zero-width character, which decomposes ZWJ emoji sequences (e.g. family emoji) into individual emoji. Single emoji are unaffected. Bidi formatting marks (U+061C, U+200E/F, etc.) used in Arabic/Hebrew are also stripped — correct rendering may require re-adding directional marks downstream.


Copilot · 2026-04-05T18:52:26Z

+2. **Legitimate Unicode preserved** --- CJK, Arabic, Hebrew, emoji,¹ and non-confusable text pass through unchanged. A string that passes through unmodified was already clean.
+
+¹ ZWJ (U+200D) is stripped as a zero-width character, decomposing ZWJ emoji sequences into individual emoji. Bidi formatting marks (U+061C, U+200E/F, etc.) are also stripped — see [Stripping Arabic Letter Mark](#stripping-arabic-letter-mark-and-mongolian-fvs) below.


This section says Arabic/Hebrew “pass through unchanged” but the footnote notes bidi formatting marks (used in Arabic/Hebrew) are stripped. Also, “U+200E/F” is ambiguous. Consider adjusting the main sentence (or adding footnote markers on Arabic/Hebrew too) and spelling out the exact bidi codepoints (e.g., U+200E, U+200F) so the “preserved” claim is accurate.

Suggested change

2. **Legitimate Unicode preserved** --- CJK, Arabic, Hebrew, emoji,¹ and non-confusable text pass through unchanged. A string that passes through unmodified was already clean.

¹ ZWJ (U+200D) is stripped as a zero-width character, decomposing ZWJ emoji sequences into individual emoji. Bidi formatting marks (U+061C, U+200E/F, etc.) are also stripped — see [Stripping Arabic Letter Mark](#stripping-arabic-letter-mark-and-mongolian-fvs) below.

2. **Legitimate Unicode preserved** --- CJK, Arabic and Hebrew text (except stripped bidi formatting marks), emoji,¹ and non-confusable text pass through unchanged. A string that passes through unmodified was already clean.

¹ ZWJ (U+200D) is stripped as a zero-width character, decomposing ZWJ emoji sequences into individual emoji. Bidi formatting marks (U+061C, U+200E, U+200F, etc.) are also stripped — see [Stripping Arabic Letter Mark](#stripping-arabic-letter-mark-and-mongolian-fvs) below.

Copilot · 2026-04-05T18:52:26Z

 **Why they're different:** Transliteration destroys content. Unidecode turns Chinese characters into pinyin, Cyrillic sentences into romanized gibberish, and Arabic into Latin approximations. It's designed for slug generation, not security.

-navi-sanitize normalizes only the 66 highest-risk Latin lookalikes and leaves legitimate Unicode intact. CJK, Arabic, emoji, and non-confusable Cyrillic pass through unchanged.
+navi-sanitize normalizes only the 66 highest-risk Latin lookalikes and leaves legitimate Unicode intact. CJK, Arabic, emoji,¹ and non-confusable Cyrillic pass through unchanged.


The sentence claims “emoji … pass through unchanged” but the footnote clarifies ZWJ sequences are decomposed (and the library also strips some bidi/formatting marks). Consider tweaking the wording to avoid “unchanged” for emoji in general, or make the footnote marker placement reflect exactly what’s affected (e.g., “emoji sequences”). This will reduce confusion for readers comparing tools.

Suggested change

navi-sanitize normalizes only the 66 highest-risk Latin lookalikes and leaves legitimate Unicode intact. CJK, Arabic, emoji,¹ and non-confusable Cyrillic pass through unchanged.

navi-sanitize normalizes only the 66 highest-risk Latin lookalikes and leaves legitimate Unicode intact. CJK, Arabic, single emoji,¹ and non-confusable Cyrillic pass through unchanged.

github-actions · 2026-04-05T18:53:20Z

@@ -40,10 +41,23 @@ def _strip_invisible(s: str) -> tuple[str, int]:
    return s, 0


🔵 LOW: NFKC normalization now returns count of codepoint changes

Confidence: 98%

def _normalize_nfkc(s: str) -> tuple[str, int]: # ... n = sum(1 for i in range(min_len) if s[i] != normalized[i]) n += abs(len(s) - len(normalized))

The function _normalize_nfkc() was updated to return a count of affected codepoints rather than a boolean for change detection. This provides granular visibility into how many characters were actually altered by NFKC normalization, which improves logging and makes warnings traceable to the magnitude of normalization.

Suggestion: No action necessary. This change improves both accuracy and diagnostic capability.

— Granular counts for normalization changes add precision and better logs. ...acceptable.

github-actions · 2026-04-05T18:53:20Z

@@ -143,7 +157,9 @@ def walk[T](data: T, *, escaper: Escaper | None = None, max_depth: int = 128) ->



🔵 LOW: Use of cast(T, ...) eliminates type: ignore

Confidence: 98%

return cast(T, clean(data, escaper=escaper))

Removed type: ignore[return-value] in favor of explicit cast(T, ...) from the typing module in the walk function. This removes type ignores from the codebase and clarifies the intent for the type-checker.

Suggestion: No further action needed. Prefer explicit cast over broad ignores for clarity and mypy cleanliness.

— Explicit cast makes intent clear. The type checker will stop complaining. Progress.

github-actions · 2026-04-05T18:53:20Z

@@ -216,4 +232,6 @@ def _resolve(v: object, depth: int) -> object:
            for item in orig_l:


🔵 LOW: Root result of walk uses cast(T, result) for mypy correctness

Confidence: 95%

return cast(T, result)

The final return in walk uses cast(T, result) post-structure-copy, replacing an old type: ignore. This matches the per-item typing discipline of the function.

Suggestion: No follow-up required. The cast gives both safety and silence for type checkers.

— Type: ignore is extinct in this codebase. ...which is how it should be.

github-actions

Grippy approves — PASS (100/100)

Nelson Spence (Fieldnote-Echo) requested review from Copilot and Navi Bot (project-navi-bot) April 5, 2026 18:38

Copilot started reviewing on behalf of Nelson Spence (Fieldnote-Echo) April 5, 2026 18:39 View session

Navi Bot (project-navi-bot) previously approved these changes Apr 5, 2026

View reviewed changes

Merge branch 'main' into docs/audit-and-positioning

9943f86

github-actions Bot reviewed Apr 5, 2026

View reviewed changes

github-actions Bot previously approved these changes Apr 5, 2026

View reviewed changes

chatgpt-codex-connector Bot reviewed Apr 5, 2026

View reviewed changes

Copilot AI reviewed Apr 5, 2026

View reviewed changes

Nelson Spence (Fieldnote-Echo) dismissed stale reviews from github-actions[bot] and Navi Bot (project-navi-bot) via 518285f April 5, 2026 18:45

Navi Bot (project-navi-bot) self-requested a review April 5, 2026 18:45

Navi Bot (project-navi-bot) previously approved these changes Apr 5, 2026

View reviewed changes

github-actions Bot previously approved these changes Apr 5, 2026

View reviewed changes

test: cover re-NFKC stage-5 path for codecov

d61bb6b

Greek U+03A5 + combining tilde triggers the count=False fast path: stage 4 replaces U+03A5 -> Y, stage 5 re-NFKC composes Y+tilde -> U+1EF8.

Copilot AI review requested due to automatic review settings April 5, 2026 18:48

Nelson Spence (Fieldnote-Echo) dismissed stale reviews from github-actions[bot] and Navi Bot (project-navi-bot) via d61bb6b April 5, 2026 18:48

Copilot started reviewing on behalf of Nelson Spence (Fieldnote-Echo) April 5, 2026 18:48 View session

github-actions Bot previously approved these changes Apr 5, 2026

View reviewed changes

Nelson Spence (Fieldnote-Echo) dismissed github-actions[bot]’s stale review via 6eeb02e April 5, 2026 18:51

Copilot AI reviewed Apr 5, 2026

View reviewed changes

github-actions Bot reviewed Apr 5, 2026

View reviewed changes

github-actions Bot approved these changes Apr 5, 2026

View reviewed changes

Navi Bot (project-navi-bot) merged commit c846736 into main Apr 5, 2026
18 checks passed

Navi Bot (project-navi-bot) deleted the docs/audit-and-positioning branch April 5, 2026 18:56

		@@ -10,7 +10,11 @@
		[![Python 3.12+](https://img.shields.io/badge/python-3.12%2B-blue.svg)](https://www.python.org/downloads/)
		[![Ruff](https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/astral-sh/ruff/main/assets/badge/v2.json)](https://github.com/astral-sh/ruff)

		count = sum(1 for c in s if unicodedata.normalize("NFKC", c) != c)
		return normalized, max(count, 1) # at least 1 if string changed

	- `"Normalized {N} fullwidth/compatibility character(s) in value"`
	- `"Normalized 1 fullwidth/compatibility character(s) in value"`

	Stages 1–5 each return `(cleaned_string, count_or_flag)` — either an `int` count of removals/replacements or a `bool` changed flag. Stage 6 (escaper) is a `Callable[[str], str]` that returns a bare `str`. Stages have no side effects — the orchestrator logs.
	Stages 1–5 each return `(cleaned_string, count)` where `count` is an `int` for removals, replacements, or normalization changes. Stage 6 (escaper) is a `Callable[[str], str]` that returns a bare `str`. Stages have no side effects — the orchestrator logs.

	Every string passed to `clean()` flows through six stages in strict order. Each of the five universal stages (1--5) is a deterministic function that returns the cleaned string and a change indicator (a count of removals or replacements). The escaper (stage 6, if provided) is a plain `str -> str` function. The pipeline orchestrator logs warnings when stages modify input.
	Every string passed to `clean()` flows through six stages in strict order. Each of the five universal stages (1--5) is a deterministic function that returns the cleaned string and a change indicator (a count of affected codepoints). The escaper (stage 6, if provided) is a plain `str -> str` function. The pipeline orchestrator logs warnings when stages modify input.

	- Zero dependencies --- Python 3.12+ stdlib only; zero supply chain risk
	- Zero dependencies --- Python 3.12+ stdlib only; no third-party dependency risk

		2. Legitimate Unicode preserved --- CJK, Arabic, Hebrew, emoji,¹ and non-confusable text pass through unchanged. A string that passes through unmodified was already clean.

		¹ ZWJ (U+200D) is stripped as a zero-width character, decomposing ZWJ emoji sequences into individual emoji. Bidi formatting marks (U+061C, U+200E/F, etc.) are also stripped — see [Stripping Arabic Letter Mark](#stripping-arabic-letter-mark-and-mongolian-fvs) below.

	navi-sanitize normalizes only the 66 highest-risk Latin lookalikes and leaves legitimate Unicode intact. CJK, Arabic, emoji,¹ and non-confusable Cyrillic pass through unchanged.
	navi-sanitize normalizes only the 66 highest-risk Latin lookalikes and leaves legitimate Unicode intact. CJK, Arabic, single emoji,¹ and non-confusable Cyrillic pass through unchanged.

		@@ -40,10 +41,23 @@ def _strip_invisible(s: str) -> tuple[str, int]:
		return s, 0

		@@ -143,7 +157,9 @@ def walk[T](data: T, *, escaper: Escaper \| None = None, max_depth: int = 128) ->

		@@ -216,4 +232,6 @@ def _resolve(v: object, depth: int) -> object:
		for item in orig_l:

Conversation

Nelson Spence (Fieldnote-Echo) commented Apr 5, 2026

Summary

What changed

Test plan

Uh oh!

codecov Bot commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

github-actions Bot Apr 5, 2026

Choose a reason for hiding this comment

🟡 MEDIUM: _normalize_nfkc() may miscount changed codepoints when string-level normalization changes more than one character

Uh oh!

github-actions Bot Apr 5, 2026

Choose a reason for hiding this comment

🔵 LOW: Documentation repositioned: install command and purpose statement now immediately visible

Uh oh!

github-actions Bot Apr 5, 2026

Choose a reason for hiding this comment

🔵 LOW: Benchmark updates: hardware, Python version, and numbers clarified

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented Apr 5, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Grippy Review — PASS

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

github-actions Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Apr 5, 2026

Choose a reason for hiding this comment

codecov Bot commented Apr 5, 2026 •

edited

Loading

github-actions Bot commented Apr 5, 2026 •

edited

Loading