Releases: unixwzrd/UnicodeFix
CodExorcism+ Release v1.1.0
Test harness simplification, plus the beginnings of semantic analysis.
- Rebuilt tests/test_all.sh to derive its file list directly from data/, drive glob/batch runs with a single command, and rely on -t for in-place scenarios.
- STDIN/STDOUT scenario now skips binary fixtures to avoid Python's UTF-8 decoding errors, while every other scenario still exercises them.
- Normalized diffs and wc comparisons are produced per scenario without duplicating helper logic.
- Updated README, docs/cleanup-text.md, and docs/test-suite.md with the new run commands, behavior notes, cleanup instructions, and the preview --metrics documentation.
- Bumped version to 1.1.0 and documented the experimental semantic metrics (--metrics, --metrics-help).
- Installation is now pip-based.
- Added additional test files with human- and AI-generated text.
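The binary-fixture skip for the STDIN/STDOUT scenario can be pictured with a small Python sketch. This is not the logic of tests/test_all.sh (which is a shell script); the function names and the "decodes cleanly as UTF-8" criterion are assumptions made for illustration.

```python
from pathlib import Path


def is_utf8_text(data: bytes) -> bool:
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def stdin_scenario_files(data_dir: str) -> list[Path]:
    """List fixtures under data_dir that are safe to pipe through STDIN.

    Hypothetical helper: binary fixtures are filtered out here only,
    so other scenarios can still iterate over every file.
    """
    return [
        p for p in sorted(Path(data_dir).iterdir())
        if p.is_file() and is_utf8_text(p.read_bytes())
    ]
```

Filtering only in the STDIN/STDOUT path keeps the binary fixtures in play for the file-based scenarios, which matches the behavior described above.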
20250907_01-Release - CodExorcism Edition - Not just for Codex
- Expanded quote normalization: map additional Unicode quote/prime/angle/fullwidth marks to ASCII ' and " for shell-safe output
- Added new options:
  - -Q/--keep-smart-quotes: preserve Unicode curly/smart quotes
  - -D/--keep-dashes: preserve EN/EM dashes
- Normalize ellipses: … (U+2026) and ⋯ (U+22EF) → ...; ‥ (U+2025) → ..
- Normalize Unicode spaces: replace NBSP (U+00A0), NARROW NBSP (U+202F), EN/EM/THIN spaces (U+2000–U+200A), IDEOGRAPHIC SPACE (U+3000), etc., with ASCII space
- Remove bidi/zero-width controls: strip LRM/RLM, embeddings/overrides/isolates, ZWSP/ZWNJ/ZWJ, BOM
- Refined VS Code filter handling: only apply newline compensation in filter mode; never in file-write modes; respect CI/CD env
- Note: These artifacts were observed in content produced by Codex/VS Code extensions
- No breaking changes; behavior unchanged for already-clean inputs
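The quote, ellipsis, space, and invisible-character rules above can be sketched in Python roughly as follows. This is an illustration under assumptions, not UnicodeFix's actual code: the table holds only a few representative entries, the function name and the keep_smart_quotes parameter (mirroring -Q/--keep-smart-quotes) are hypothetical, and dash handling is left out because the notes do not say what dashes are replaced with.

```python
import re

# Representative entries only; the real tool's tables may be broader.
QUOTES = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201C": '"', "\u201D": '"',   # curly double quotes
    "\u2032": "'", "\u2033": '"',   # prime, double prime
    "\uFF07": "'", "\uFF02": '"',   # fullwidth apostrophe / quotation mark
}
ELLIPSES = {"\u2026": "...", "\u22EF": "...", "\u2025": ".."}

# NBSP, NARROW NBSP, U+2000-U+200A, IDEOGRAPHIC SPACE
SPACES = re.compile("[\u00A0\u202F\u2000-\u200A\u3000]")
# ZWSP/ZWNJ/ZWJ, LRM/RLM, embeddings/overrides, isolates, BOM
INVISIBLES = re.compile("[\u200B-\u200F\u202A-\u202E\u2066-\u2069\uFEFF]")


def normalize_text(text: str, keep_smart_quotes: bool = False) -> str:
    """Apply the sketched normalization rules to text."""
    table = dict(ELLIPSES)
    if not keep_smart_quotes:
        table.update(QUOTES)
    text = text.translate(str.maketrans(table))
    text = SPACES.sub(" ", text)      # Unicode spaces become ASCII space
    return INVISIBLES.sub("", text)   # bidi/zero-width controls are removed


print(normalize_text("\u201Chi\u201D\u2026"))  # "hi"...
```

Input that is already plain ASCII passes through unchanged, which is consistent with the "no breaking changes" note.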
20250907_00 CodeExorcism Release
20250907_00-Release - CodExorcism Release - Not just for Codex
- Addresses Unicode characters creeping in from Codex
- Expanded quote normalization: map additional Unicode quote/prime/angle/fullwidth marks to ASCII ' and " for shell-safe output
- Refined VS Code filter handling: only apply newline compensation in filter mode; never in file-write modes; respect CI/CD env
- Normalize Unicode spaces: replace NBSP (U+00A0), NARROW NBSP (U+202F), EN/EM/THIN spaces (U+2000–U+200A), IDEOGRAPHIC SPACE (U+3000), etc., with ASCII space
- Remove bidi/zero-width controls: strip LRM/RLM, embeddings/overrides/isolates, ZWSP/ZWNJ/ZWJ, BOM
- Note: These artifacts were observed in content produced by Codex/VS Code extensions
- No breaking changes; behavior unchanged for already-clean inputs
- Ellipsis handling and normalization
20250812_00 Patch Release
- Expand Unicode quote normalization
- Refine VS Code filter newline handling
- Preserve extended ASCII
Extended ASCII Preservation Fix
2025-07-28
- Switched from Unidecode to ftfy: Replaced aggressive Unicode-to-ASCII conversion with intelligent text fixing
- Preserves Extended ASCII: Now correctly preserves 8-bit extended ASCII characters (128-255) like é, ñ, ü, etc.
- Smarter Unicode Handling: Only converts problematic Unicode characters while preserving intentional extended ASCII usage
- Updated Dependencies: Replaced Unidecode dependency with ftfy in requirements.txt
- Maintains AI Artifact Removal: Still removes smart quotes, EM/EN dashes, and other "AI tells" as designed
- Added a check to see if we are in a VS Code extension and handle the EOF newline properly; it was being stripped by the extension handler.
All the same, VS Code issue fixed
The tool was not handling VS Code's odd behavior inside extensions, where newlines at the end of the file were being stripped inconsistently. It is a VS Code issue, not mine, but I had to handle it.
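One way to picture this kind of end-of-file compensation is the sketch below. It is an assumption about the general shape of such a fix, not the tool's actual code: the function name, the in_vscode_filter flag, and the choice to skip compensation when a CI environment variable is set are all hypothetical.

```python
def finalize_output(cleaned: str, original: str,
                    in_vscode_filter: bool, env: dict) -> str:
    """Restore a trailing newline that was lost while filtering.

    Hypothetical rule set: compensate only when running as a filter
    inside VS Code, and never when a CI variable is present.
    """
    if not in_vscode_filter or env.get("CI"):
        return cleaned
    if original.endswith("\n") and not cleaned.endswith("\n"):
        return cleaned + "\n"
    return cleaned
```

Passing the environment in as a dictionary (for example os.environ) keeps the function easy to exercise in tests without touching the real process environment.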
20250722_01-update
20250722_01 update - minor issues - EOF newline missing.
20250722_00 Enough of Your AI Nonsense Edition
- Major update, new options
- Smarter removal of Unicode and conversion
- More coding artifacts removed - less lint
20250427_01-Updates
Updated to work as a filter and to remove trailing spaces
First really solid release
This is a good working release. The shell script wrapper is heavily dependent on the user's installation. However, the Shortcut and the Python work just fine.
Full Changelog: https://github.com/unixwzrd/UnicodeFix/commits/20250427_00-Rel