docs: audit all product surfaces and reposition for market#33
Conversation
Five-agent documentation audit found and fixed: - README pipeline table understated homoglyph scope (missing Armenian/Cherokee/typographic) - README "emoji intact" claim lacked ZWJ decomposition caveat - CLAUDE.md invisible char description listed 5 of 9 categories (omitted 304 of 492 chars) - CLAUDE.md stage return type claim incorrect for escaper (str, not tuple) - CLAUDE.md CI section said "five parallel jobs" (actually 7, only 4 parallel) - CLAUDE.md omitted fuzz workflow entirely - Benchmark numbers stale (from CI runner); rerun on local hardware - type: ignore annotations replaced with typed cast() - NFKC warning lacked count, violating project's own convention Pipeline code change: - _normalize_nfkc() returns tuple[str, int] (count) instead of tuple[str, bool] - Warning message now includes count per project convention - Both type: ignore[return-value] replaced with cast(T, ...) + explanatory comments Positioning changes (informed by market research): - Hero line: added mechanism specificity and "no ML" differentiator - Install command moved to first screen (was buried at line 150) - "Why This Matters" compressed to scannable bullets, LLM leads - Added OWASP LLM Prompt Injection Prevention Cheat Sheet alignment - Added "only maintained homoglyph library" positioning (both alternatives archived) - Added CVE-2024-43093 proof point (actively exploited fullwidth bypass) - docs/index.md: mirrored positioning, added OWASP and competitive bullets All surfaces updated: README, CLAUDE.md, docs/index.md, pipeline-architecture, comparison, threat-model, performance, quickstart. 405 tests pass. mypy strict clean. ruff clean.
Codecov Report✅ All modified and coverable lines are covered by tests. 📢 Thoughts on this report? Let us know! |
|
|
||
| def _normalize_nfkc(s: str) -> tuple[str, bool]: | ||
| """NFKC normalize. Returns (cleaned, changed).""" | ||
| def _normalize_nfkc(s: str) -> tuple[str, int]: |
There was a problem hiding this comment.
🟡 MEDIUM: _normalize_nfkc() may miscount changed codepoints when string-level normalization changes more than one character
Confidence: 95%
count = sum(1 for c in s if unicodedata.normalize("NFKC", c) != c)
return normalized, max(count, 1)
The revised _normalize_nfkc() function attempts to count the number of codepoints changed by per-character normalization. However, performing unicodedata.normalize("NFKC", c) for each character and comparing it to c does not always accurately count the number of changed codepoints. Some NFKC normalizations operate on substrings (e.g., character pairs), so the per-codepoint approach may undercount or misrepresent true changes.
Suggestion: Consider a more robust diffing mechanism for counting NFKC changes, such as using difflib to compare original and normalized strings, or explicitly document that this is an approximate count. Alternatively, clarify in both docstrings and warning logs that the count may be inexact.
— Counting NFKC-changed codepoints is tricky-per-char normalization can be misleading when composite sequences normalize as one. Document the approximation or consider stronger diff logic.
| @@ -10,7 +10,11 @@ | |||
| [](https://www.python.org/downloads/) | |||
| [](https://github.com/astral-sh/ruff) | |||
There was a problem hiding this comment.
🔵 LOW: Documentation repositioned: install command and purpose statement now immediately visible
Confidence: 98%
pip install navi-sanitize (now at top of README), hero line specifies differentiators
Moving the install command and improving the opening description aligns the README with common OSS best practices and makes onboarding easier for newcomers.
Suggestion: No action required; this is a positive example for future documentation.
— README onboarding sequence now matches what most people actually want to find. ...no arguments here.
| @@ -1,25 +1,25 @@ | |||
| # Performance | |||
There was a problem hiding this comment.
🔵 LOW: Benchmark updates: hardware, Python version, and numbers clarified
Confidence: 96%
Benchmarks measured on Python 3.13, single thread, AMD Ryzen 9 9950X. ... expect ±20% on different hardware; CI runners are typically 2-3x slower.
Benchmarks now specify hardware details (AMD Ryzen 9 9950X, Python 3.13), clarify expected variance, and report more realistic ops/sec. This transparency improves reproducibility.
Suggestion: Continue to benchmark and document performance in context for user clarity.
— Modernizing benchmark methodology is a step forward. Numbers are harder to fudge when they're precise.
✅ Grippy Review — PASSScore: 100/100 | Findings: 3 Delta: 3 new Commit: 6eeb02e |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bf54a5ea56
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "Codex (@codex) address that feedback".
| count = sum(1 for c in s if unicodedata.normalize("NFKC", c) != c) | ||
| return normalized, max(count, 1) # at least 1 if string changed |
There was a problem hiding this comment.
Compute NFKC change counts from whole-string diffs
The new count logic under-reports normalization changes when NFKC effects depend on neighboring code points (for example, combining-mark sequences like "e\u0301e\u0301"), because it checks each character in isolation and then falls back to 1 via max(count, 1). In these cases the string changes but multiple characters are normalized, so the warning count is inaccurate; this breaks the new “count for traceability” behavior and can skew any log-based metrics built on that count.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Pull request overview
This PR performs a broad documentation audit/repositioning across the README + docs site and updates the core sanitization pipeline’s NFKC stage to return a change count (and removes remaining type: ignore usage in src/).
Changes:
- Update
_normalize_nfkc()to return(normalized_text, change_count)and log NFKC normalization with a count. - Replace
type: ignore[return-value]withcast(T, ...)inwalk()return paths. - Rework README + docs landing pages and supporting docs for accuracy and market positioning (pipeline table, benchmarks, caveats, etc.).
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/navi_sanitize/_pipeline.py | NFKC stage now returns an int count; logging updated; typing uses cast instead of type: ignore. |
| README.md | Repositioning, install moved up, pipeline table + benchmarks + additional caveats. |
| docs/index.md | Landing page refresh + new feature bullets. |
| docs/getting-started/quickstart.md | Logging section updated to reflect NFKC count-based warning. |
| docs/explanation/threat-model.md | Add caveats around ZWJ/bidi stripping and homoglyph scope text updates. |
| docs/explanation/pipeline-architecture.md | Update stage semantics/diagram text to reflect count-based indicators. |
| docs/explanation/performance.md | Benchmark environment and results updated; walk() description updated. |
| docs/explanation/comparison.md | Comparison positioning updates + ZWJ footnote + homoglyph scope update. |
| CLAUDE.md | CI/pipeline documentation updates (but stage return-type text needs reconciliation). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| def _normalize_nfkc(s: str) -> tuple[str, int]: | ||
| """NFKC normalize. Returns (cleaned, count of changed codepoints).""" | ||
| normalized = unicodedata.normalize("NFKC", s) | ||
| return normalized, normalized != s | ||
| if normalized == s: | ||
| return s, 0 | ||
| count = sum(1 for c in s if unicodedata.normalize("NFKC", c) != c) | ||
| return normalized, max(count, 1) # at least 1 if string changed |
There was a problem hiding this comment.
_normalize_nfkc() claims to return a “count of changed codepoints”, but the current counting approach is only per-codepoint NFKC differences and then forces a minimum of 1 when the string changes due to cross-codepoint composition. This makes the count potentially misleading (e.g., a sequence that changes only when normalized as a whole will log 1 even though no individual codepoint differs in isolation). Consider either (a) redefining/documenting the count as an “approximate/non-zero change indicator” or (b) computing a count that better matches the wording (and log message).
| def _normalize_nfkc(s: str) -> tuple[str, int]: | ||
| """NFKC normalize. Returns (cleaned, count of changed codepoints).""" | ||
| normalized = unicodedata.normalize("NFKC", s) | ||
| return normalized, normalized != s | ||
| if normalized == s: | ||
| return s, 0 | ||
| count = sum(1 for c in s if unicodedata.normalize("NFKC", c) != c) | ||
| return normalized, max(count, 1) # at least 1 if string changed |
There was a problem hiding this comment.
_normalize_nfkc() now does an additional per-codepoint unicodedata.normalize() pass to compute a count when the string changes. Since _normalize_nfkc() is also used for the stage-5 “Re-NFKC” pass where the count isn’t logged/used, consider splitting into two helpers (normalize-only vs normalize+count) or adding an option to skip counting to avoid unnecessary O(n) work in that path.
| text, nfkc_count = _normalize_nfkc(text) | ||
| if nfkc_count: | ||
| logger.warning("Normalized %d fullwidth/compatibility character(s) in value", nfkc_count) |
There was a problem hiding this comment.
The PR changes stage 3’s warning to include an NFKC normalization count, but the current test suite doesn’t appear to assert the count/value for this warning (unlike invisibles/homoglyph counts). Adding a focused test that checks the warning contains the expected count for a representative NFKC case would help prevent regressions in the new counting behavior.
| - `"Removed 2 null byte(s) from value"` | ||
| - `"Stripped 3 invisible character(s) from value"` | ||
| - `"Normalized fullwidth character(s) in value"` | ||
| - `"Normalized {N} fullwidth/compatibility character(s) in value"` |
There was a problem hiding this comment.
This section lists example warning strings as literal messages, but the NFKC entry is shown as "Normalized {N} ..." rather than an example with an actual number. Since the real log message includes a concrete integer, consider either using a concrete example (e.g. "Normalized 1 ...") or explicitly stating that {N} is a placeholder to avoid confusion/copy-paste mismatch.
| - `"Normalized {N} fullwidth/compatibility character(s) in value"` | |
| - `"Normalized 1 fullwidth/compatibility character(s) in value"` |
| 6. **Escaper** (optional) — pluggable `Callable[[str], str]` runs last | ||
|
|
||
| Each stage returns `(cleaned_string, count_or_flag)` — either an `int` count of removals/replacements or a `bool` changed flag. Stages have no side effects — the orchestrator logs. | ||
| Stages 1–5 each return `(cleaned_string, count_or_flag)` — either an `int` count of removals/replacements or a `bool` changed flag. Stage 6 (escaper) is a `Callable[[str], str]` that returns a bare `str`. Stages have no side effects — the orchestrator logs. |
There was a problem hiding this comment.
This still documents that stages 1–5 may return a bool “changed flag”, but _pipeline._normalize_nfkc() now returns an int count (and the other stages return counts as well). Please update this description to match the current implementation so it doesn’t mislead contributors about stage return types.
| Stages 1–5 each return `(cleaned_string, count_or_flag)` — either an `int` count of removals/replacements or a `bool` changed flag. Stage 6 (escaper) is a `Callable[[str], str]` that returns a bare `str`. Stages have no side effects — the orchestrator logs. | |
| Stages 1–5 each return `(cleaned_string, count)` where `count` is an `int` for removals, replacements, or normalization changes. Stage 6 (escaper) is a `Callable[[str], str]` that returns a bare `str`. Stages have no side effects — the orchestrator logs. |
| # Pipeline Architecture | ||
|
|
||
| Every string passed to `clean()` flows through six stages in strict order. Each stage is a deterministic function that returns the cleaned string and a change indicator (a count for stages that strip or replace, a boolean for normalization). The pipeline orchestrator logs warnings when stages modify input. | ||
| Every string passed to `clean()` flows through six stages in strict order. Each of the five universal stages (1--5) is a deterministic function that returns the cleaned string and a change indicator (a count of removals or replacements). The escaper (stage 6, if provided) is a plain `str -> str` function. The pipeline orchestrator logs warnings when stages modify input. |
There was a problem hiding this comment.
The intro says universal stages return “a count of removals or replacements”, but stage 3 (NFKC) is normalization (not removal/replacement) and the implementation now reports a count of normalized/changed codepoints. Consider tweaking this wording to include normalization (or more generally: “a count of affected codepoints/changes”) so it matches the actual stage semantics.
| Every string passed to `clean()` flows through six stages in strict order. Each of the five universal stages (1--5) is a deterministic function that returns the cleaned string and a change indicator (a count of removals or replacements). The escaper (stage 6, if provided) is a plain `str -> str` function. The pipeline orchestrator logs warnings when stages modify input. | |
| Every string passed to `clean()` flows through six stages in strict order. Each of the five universal stages (1--5) is a deterministic function that returns the cleaned string and a change indicator (a count of affected codepoints). The escaper (stage 6, if provided) is a plain `str -> str` function. The pipeline orchestrator logs warnings when stages modify input. |
| - **Only maintained option** --- both [confusable_homoglyphs](https://github.com/vhf/confusable_homoglyphs) and [homoglyphs](https://github.com/life4/homoglyphs) are archived; navi-sanitize is the only maintained Python library covering homoglyph replacement | ||
| - **Deterministic** --- same input always produces the same output; no probabilistic models, no heuristics | ||
| - **Zero dependencies** --- Python 3.12+ stdlib only | ||
| - **Zero dependencies** --- Python 3.12+ stdlib only; zero supply chain risk |
There was a problem hiding this comment.
“Zero dependencies … zero supply chain risk” is an absolute claim that isn’t strictly accurate (there’s still supply-chain risk in the package itself and its distribution channel; “zero third‑party dependency risk” would be closer). Consider rephrasing to avoid overstating the security property.
| - **Zero dependencies** --- Python 3.12+ stdlib only; zero supply chain risk | |
| - **Zero dependencies** --- Python 3.12+ stdlib only; no third-party dependency risk |
NFKC count logic:
- Replace per-character normalization with whole-string positional diff
(fixes combining-mark sequences like e+accent that compose as pairs)
- Add count=False fast path for stage-5 re-NFKC where count is discarded
- Add 2 tests asserting NFKC warning includes correct count
Doc fixes from Copilot suggestions:
- CLAUDE.md: stage return types now all described as int count (no bool)
- pipeline-architecture.md: "affected codepoints" instead of "removals or replacements"
- quickstart.md: concrete example "Normalized 1 ..." instead of "{N}" placeholder
- docs/index.md: "no third-party dependency risk" instead of absolute "zero supply chain risk"
518285f
Greek U+03A5 + combining tilde triggers the count=False fast path: stage 4 replaces U+03A5 -> Y, stage 5 re-NFKC composes Y+tilde -> U+1EF8.
d61bb6b
Stage 5 re-NFKC never changes the string because stage 3 NFKC already handles compatibility decomposition and stage 4 NFC handles canonical composition. The count=False optimization was unreachable dead code — _normalize_nfkc always returns (s, 0) at the early equality check in the stage 5 path, so the counting diff never runs there. Removed: count parameter, max(n, 1) guard (n >= 1 is guaranteed when normalized != s since equal-length unequal strings differ in at least one position).
There was a problem hiding this comment.
Pull request overview
Copilot reviewed 10 out of 10 changed files in this pull request and generated 3 comments.
Comments suppressed due to low confidence (3)
src/navi_sanitize/_pipeline.py:66
- _normalize_nfkc()’s “affected codepoints” count is computed via positional comparison + length delta, which can significantly overcount when normalization changes length early in the string (e.g., ligatures like U+FB00 → "ff" will report 2 affected codepoints even though only 1 input codepoint changed). Consider either (a) switching to an edit-script based diff (e.g., SequenceMatcher opcodes over codepoints) to produce a more meaningful count, or (b) documenting/renaming this value as an approximate upper bound rather than a codepoint count.
acute) compose to a single precomposed codepoint but each constituent
appears unchanged when normalized in isolation.
"""
normalized = unicodedata.normalize("NFKC", s)
if normalized == s:
return s, 0
min_len = min(len(s), len(normalized))
n = sum(1 for i in range(min_len) if s[i] != normalized[i])
n += abs(len(s) - len(normalized))
return normalized, n
def _replace_homoglyphs(s: str) -> tuple[str, int]:
"""Replace homoglyphs with Latin equivalents. Returns (cleaned, count).
Decomposes to NFD first so that combining marks cannot hide
src/navi_sanitize/_pipeline.py:125
- The Stage 3 warning message presents nfkc_count as an exact number of “fullwidth/compatibility character(s)”, but the implementation is explicitly approximate and may overcount for certain compatibility folds. Either adjust the log wording to indicate the count is approximate/upper-bound, or tighten the counting logic so the warning’s number matches what users will reasonably interpret as “characters affected”.
text, glyph_count = _replace_homoglyphs(text)
if glyph_count:
logger.warning("Replaced %d homoglyph(s) in value", glyph_count)
# Stage 5: Re-NFKC — homoglyph replacement can produce Latin chars that
src/navi_sanitize/_pipeline.py:168
- walk() is typed as walk[T](data: T) -> T, but for str subclasses it returns a plain str (clean() always returns str). Using cast(T, ...) silences mypy but makes the public type signature unsound for callers relying on subclass preservation. Consider adding overloads (str -> str, dict/list -> same container type) or constraining the TypeVar so the signature reflects the actual runtime behavior.
Uses a single iterative pass (no recursion, no deepcopy) so hostile
nesting depth cannot cause stack overflow. Logs a warning when nesting
exceeds *max_depth* but continues sanitizing. Cyclic references are
handled via identity tracking — each container is copied and sanitized
exactly once.
Only dict and list contents are traversed; tuples, sets, and other
types pass through by reference.
"""
if max_depth < 0:
raise ValueError("max_depth must be >= 0")
# Scalars and strings — no traversal needed
if isinstance(data, str):
# clean() returns str; T is str (or a subclass whose extra semantics
# sanitization intentionally discards). Mypy can't prove str <: T.
return cast(T, clean(data, escaper=escaper))
if not isinstance(data, (dict, list)):
return data
# Single-pass iterative copy-and-sanitize (boltons remap pattern).
# Each stack entry pairs (original, its_new_copy, depth). Children
# are pushed onto the stack instead of recursing. The `seen` dict
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| | **Preserves Unicode** | Yes (CJK, Arabic, emoji¹ intact) | No (destroys all non-ASCII) | Yes | Yes | Yes | | ||
| | **Pluggable escaper** | Yes | No | No | No | N/A (HTML-specific) | | ||
| | **Dependencies** | Zero | Zero | Zero | wcwidth | C ext / Rust ext | | ||
|
|
||
| ¹ ZWJ (U+200D) is stripped as a zero-width character, which decomposes ZWJ emoji sequences (e.g. family emoji) into individual emoji. Single emoji are unaffected. Bidi formatting marks (U+061C, U+200E/F, etc.) used in Arabic/Hebrew are also stripped — correct rendering may require re-adding directional marks downstream. |
There was a problem hiding this comment.
The comparison table says “Preserves Unicode … Arabic … intact” but the new footnote states bidi formatting marks used in Arabic/Hebrew are stripped, which can affect rendering. Also, “U+200E/F” is ambiguous notation; consider spelling out both codepoints (U+200E, U+200F) or using a range (U+200E–U+200F). Updating the table wording and/or footnote marker placement would make the claim consistent and unambiguous.
| | **Preserves Unicode** | Yes (CJK, Arabic, emoji¹ intact) | No (destroys all non-ASCII) | Yes | Yes | Yes | | |
| | **Pluggable escaper** | Yes | No | No | No | N/A (HTML-specific) | | |
| | **Dependencies** | Zero | Zero | Zero | wcwidth | C ext / Rust ext | | |
| ¹ ZWJ (U+200D) is stripped as a zero-width character, which decomposes ZWJ emoji sequences (e.g. family emoji) into individual emoji. Single emoji are unaffected. Bidi formatting marks (U+061C, U+200E/F, etc.) used in Arabic/Hebrew are also stripped — correct rendering may require re-adding directional marks downstream. | |
| | **Preserves Unicode** | Mostly (CJK and Arabic letters preserved; emoji¹ preserved except ZWJ sequences) | No (destroys all non-ASCII) | Yes | Yes | Yes | | |
| | **Pluggable escaper** | Yes | No | No | No | N/A (HTML-specific) | | |
| | **Dependencies** | Zero | Zero | Zero | wcwidth | C ext / Rust ext | | |
| ¹ ZWJ (U+200D) is stripped as a zero-width character, which decomposes ZWJ emoji sequences (e.g. family emoji) into individual emoji. Single emoji are unaffected. Bidi formatting marks (U+061C, U+200E, U+200F, etc.) used in Arabic/Hebrew are also stripped — correct rendering may require re-adding directional marks downstream. |
| 2. **Legitimate Unicode preserved** --- CJK, Arabic, Hebrew, emoji,¹ and non-confusable text pass through unchanged. A string that passes through unmodified was already clean. | ||
|
|
||
| ¹ ZWJ (U+200D) is stripped as a zero-width character, decomposing ZWJ emoji sequences into individual emoji. Bidi formatting marks (U+061C, U+200E/F, etc.) are also stripped — see [Stripping Arabic Letter Mark](#stripping-arabic-letter-mark-and-mongolian-fvs) below. |
There was a problem hiding this comment.
This section says Arabic/Hebrew “pass through unchanged” but the footnote notes bidi formatting marks (used in Arabic/Hebrew) are stripped. Also, “U+200E/F” is ambiguous. Consider adjusting the main sentence (or adding footnote markers on Arabic/Hebrew too) and spelling out the exact bidi codepoints (e.g., U+200E, U+200F) so the “preserved” claim is accurate.
| 2. **Legitimate Unicode preserved** --- CJK, Arabic, Hebrew, emoji,¹ and non-confusable text pass through unchanged. A string that passes through unmodified was already clean. | |
| ¹ ZWJ (U+200D) is stripped as a zero-width character, decomposing ZWJ emoji sequences into individual emoji. Bidi formatting marks (U+061C, U+200E/F, etc.) are also stripped — see [Stripping Arabic Letter Mark](#stripping-arabic-letter-mark-and-mongolian-fvs) below. | |
| 2. **Legitimate Unicode preserved** --- CJK, Arabic and Hebrew text (except stripped bidi formatting marks), emoji,¹ and non-confusable text pass through unchanged. A string that passes through unmodified was already clean. | |
| ¹ ZWJ (U+200D) is stripped as a zero-width character, decomposing ZWJ emoji sequences into individual emoji. Bidi formatting marks (U+061C, U+200E, U+200F, etc.) are also stripped — see [Stripping Arabic Letter Mark](#stripping-arabic-letter-mark-and-mongolian-fvs) below. |
| **Why they're different:** Transliteration destroys content. Unidecode turns Chinese characters into pinyin, Cyrillic sentences into romanized gibberish, and Arabic into Latin approximations. It's designed for slug generation, not security. | ||
|
|
||
| navi-sanitize normalizes only the 66 highest-risk Latin lookalikes and leaves legitimate Unicode intact. CJK, Arabic, emoji, and non-confusable Cyrillic pass through unchanged. | ||
| navi-sanitize normalizes only the 66 highest-risk Latin lookalikes and leaves legitimate Unicode intact. CJK, Arabic, emoji,¹ and non-confusable Cyrillic pass through unchanged. |
There was a problem hiding this comment.
The sentence claims “emoji … pass through unchanged” but the footnote clarifies ZWJ sequences are decomposed (and the library also strips some bidi/formatting marks). Consider tweaking the wording to avoid “unchanged” for emoji in general, or make the footnote marker placement reflect exactly what’s affected (e.g., “emoji sequences”). This will reduce confusion for readers comparing tools.
| navi-sanitize normalizes only the 66 highest-risk Latin lookalikes and leaves legitimate Unicode intact. CJK, Arabic, emoji,¹ and non-confusable Cyrillic pass through unchanged. | |
| navi-sanitize normalizes only the 66 highest-risk Latin lookalikes and leaves legitimate Unicode intact. CJK, Arabic, single emoji,¹ and non-confusable Cyrillic pass through unchanged. |
| @@ -40,10 +41,23 @@ def _strip_invisible(s: str) -> tuple[str, int]: | |||
| return s, 0 | |||
There was a problem hiding this comment.
🔵 LOW: NFKC normalization now returns count of codepoint changes
Confidence: 98%
def _normalize_nfkc(s: str) -> tuple[str, int]:
# ...
n = sum(1 for i in range(min_len) if s[i] != normalized[i])
n += abs(len(s) - len(normalized))
The function _normalize_nfkc() was updated to return a count of affected codepoints rather than a boolean for change detection. This provides granular visibility into how many characters were actually altered by NFKC normalization, which improves logging and makes warnings traceable to the magnitude of normalization.
Suggestion: No action necessary. This change improves both accuracy and diagnostic capability.
— Granular counts for normalization changes add precision and better logs. ...acceptable.
| @@ -143,7 +157,9 @@ def walk[T](data: T, *, escaper: Escaper | None = None, max_depth: int = 128) -> | |||
|
|
|||
There was a problem hiding this comment.
🔵 LOW: Use of cast(T, ...) eliminates type: ignore
Confidence: 98%
return cast(T, clean(data, escaper=escaper))
Removed type: ignore[return-value] in favor of explicit cast(T, ...) from the typing module in the walk function. This removes type ignores from the codebase and clarifies the intent for the type-checker.
Suggestion: No further action needed. Prefer explicit cast over broad ignores for clarity and mypy cleanliness.
— Explicit cast makes intent clear. The type checker will stop complaining. Progress.
| @@ -216,4 +232,6 @@ def _resolve(v: object, depth: int) -> object: | |||
| for item in orig_l: | |||
There was a problem hiding this comment.
🔵 LOW: Root result of walk uses cast(T, result) for mypy correctness
Confidence: 95%
return cast(T, result)
The final return in walk uses cast(T, result) post-structure-copy, replacing an old type: ignore. This matches the per-item typing discipline of the function.
Suggestion: No follow-up required. The cast gives both safety and silence for type checkers.
— Type: ignore is extinct in this codebase. ...which is how it should be.
Summary
_normalize_nfkc()returns count instead of bool,type: ignorereplaced withcast(T, ...)What changed
Accuracy fixes:
Positioning:
Code:
_normalize_nfkc()→tuple[str, int](count of changed codepoints)type: ignore[return-value]→cast(T, ...)with explanatory commentstype: ignoreremaining in src/Test plan
-W error, warnings-as-errors)