Releases: unixwzrd/UnicodeFix

CodExorcism+ Release v1.1.0

19 Sep 02:40

Test harness simplification, plus the beginnings of semantic analysis.

  • Rebuilt tests/test_all.sh to derive its file list directly from data/, drive glob/batch runs with a single command, and rely on -t for in-place scenarios.
  • STDIN/STDOUT scenario now skips binary fixtures to avoid Python's UTF-8 decoding errors, while every other scenario still exercises them.
  • Normalized diffs and wc comparisons are produced per scenario without duplicating helper logic.
  • Updated README, docs/cleanup-text.md, and docs/test-suite.md with the new run commands, behavior notes, cleanup instructions, and the preview --metrics documentation.
  • Bumped version to 1.1.0 and documented the experimental semantic metrics (--metrics, --metrics-help).
  • Installation is now pip-based.
  • Added more test files with human- and AI-generated text.
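
The reason the STDIN/STDOUT scenario skips binary fixtures can be illustrated with a short Python sketch. The real harness is the shell script tests/test_all.sh; the helper names `is_utf8_text` and `stdin_fixtures` below are hypothetical and only show the kind of check that separates text fixtures from binary ones, assuming "binary" means "contains NUL bytes or does not decode as UTF-8".

```python
def is_utf8_text(data: bytes) -> bool:
    """Return True if data has no NUL bytes and decodes as strict UTF-8."""
    if b"\x00" in data:
        return False
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def stdin_fixtures(fixtures: dict) -> list:
    """Given {name: bytes}, return the sorted names that are safe to pipe as text.

    Hypothetical helper: the binary fixtures are left to the other scenarios.
    """
    return sorted(name for name, data in fixtures.items() if is_utf8_text(data))
```

For example, `stdin_fixtures({"a.txt": b"hello", "b.bin": b"\xff\xfe\x00"})` keeps only `a.txt`, because the second entry would raise a decoding error if read as UTF-8 text.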

20250907_01-Release - CodExorcism Edition - Not just for Codex

07 Sep 23:56

  • Expanded quote normalization: map additional Unicode quote/prime/angle/fullwidth marks to ASCII ' and " for shell-safe output
  • Added new options:
    • -Q / --keep-smart-quotes: preserve Unicode curly/smart quotes
    • -D / --keep-dashes: preserve EN/EM dashes
  • Normalize ellipses: … (U+2026) and ⋯ (U+22EF) → ...; ‥ (U+2025) → ..
  • Normalize Unicode spaces: replace NBSP (U+00A0), NARROW NBSP (U+202F), EN/EM/THIN spaces (U+2000–U+200A), IDEOGRAPHIC SPACE (U+3000), etc., with ASCII space
  • Remove bidi/zero-width controls: strip LRM/RLM, embeddings/overrides/isolates, ZWSP/ZWNJ/ZWJ, BOM
  • Refined VS Code filter handling: only apply newline compensation in filter mode; never in file-write modes; respect CI/CD env
  • Note: These artifacts were observed in content produced by Codex/VS Code extensions
  • No breaking changes; behavior unchanged for already-clean inputs
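
The normalizations listed above can be sketched with a single translation table. This is a minimal illustration, not the project's actual code: the table names, the `normalize` function, and its `keep_smart_quotes` / `keep_dashes` parameters (loosely mirroring -Q and -D) are hypothetical, the quote table is only a small subset, and mapping EN/EM dashes to a plain hyphen is an assumption.

```python
# Code points for ellipses, spaces, and invisible controls are taken from the notes above.
ELLIPSES = {0x2026: "...", 0x22EF: "...", 0x2025: ".."}

# NBSP, NARROW NBSP, IDEOGRAPHIC SPACE, and U+2000 through U+200A.
SPACES = {cp: " " for cp in [0x00A0, 0x202F, 0x3000, *range(0x2000, 0x200B)]}

# ZWSP/ZWNJ/ZWJ, LRM/RLM, BOM, embeddings/overrides (U+202A..U+202E),
# and isolates (U+2066..U+2069). Mapping to None deletes the character.
INVISIBLES = {
    cp: None
    for cp in [
        0x200B, 0x200C, 0x200D, 0x200E, 0x200F, 0xFEFF,
        *range(0x202A, 0x202F),
        *range(0x2066, 0x206A),
    ]
}

# Illustrative subset only: curly quotes and primes.
QUOTES = {0x2018: "'", 0x2019: "'", 0x2032: "'",
          0x201C: '"', 0x201D: '"', 0x2033: '"'}

# Assumption: EN and EM dashes become a plain hyphen.
DASHES = {0x2013: "-", 0x2014: "-"}


def normalize(text, keep_smart_quotes=False, keep_dashes=False):
    """Apply the mappings above in one pass with str.translate."""
    table = {**ELLIPSES, **SPACES, **INVISIBLES}
    if not keep_smart_quotes:
        table.update(QUOTES)
    if not keep_dashes:
        table.update(DASHES)
    return text.translate(table)
```

Building the table per call keeps the two "keep" switches independent of each other; already-clean ASCII input passes through unchanged, in line with the "no breaking changes" note.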

20250907_00 CodExorcism Release

07 Sep 23:41

20250907_00-Release - CodExorcism Release - Not just for Codex

  • Addresses Unicode characters creeping in from Codex
  • Expanded quote normalization: map additional Unicode quote/prime/angle/fullwidth marks to ASCII ' and " for shell-safe output
  • Refined VS Code filter handling: only apply newline compensation in filter mode; never in file-write modes; respect CI/CD env
  • Normalize Unicode spaces: replace NBSP (U+00A0), NARROW NBSP (U+202F), EN/EM/THIN spaces (U+2000–U+200A), IDEOGRAPHIC SPACE (U+3000), etc., with ASCII space
  • Remove bidi/zero-width controls: strip LRM/RLM, embeddings/overrides/isolates, ZWSP/ZWNJ/ZWJ, BOM
  • Note: These artifacts were observed in content produced by Codex/VS Code extensions
  • No breaking changes; behavior unchanged for already-clean inputs
  • Ellipsis handling and normalization

20250812_00 Patch Release

13 Aug 08:41

20250812_00 Patch Release

  • Expand Unicode quote normalization
  • Refine VS Code filter newline handling
  • Preserve extended ASCII

Extended ASCII Preservation Fix

28 Jul 19:03

2025-07-28

  • Switched from Unidecode to ftfy: Replaced aggressive Unicode-to-ASCII conversion with intelligent text fixing
  • Preserves Extended ASCII: Now correctly preserves 8-bit extended ASCII characters (128-255) like é, ñ, ü, etc.
  • Smarter Unicode Handling: Only converts problematic Unicode characters while preserving intentional extended ASCII usage
  • Updated Dependencies: Replaced Unidecode dependency with ftfy in requirements.txt
  • Maintains AI Artifact Removal: Still removes smart quotes, EM/EN dashes, and other "AI tells" as designed
  • Added a check for running inside a VS Code extension so the EOF newline is handled properly; it was being stripped by the extension handler.
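
The difference between aggressive ASCII conversion and targeted fixing can be shown with the standard library alone. This sketch uses neither Unidecode nor ftfy: `aggressive_ascii` is only a stand-in for the "convert everything to ASCII" style of approach, and `targeted_fix` with its small `TARGETED` table is a hypothetical illustration of replacing a few "AI tell" characters while leaving characters such as é, ñ, and ü alone. Mapping dashes to a plain hyphen is an assumption.

```python
import unicodedata


def aggressive_ascii(text):
    """Stand-in for aggressive conversion: decompose, then drop all non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


# Hypothetical targeted table: smart quotes and EN/EM dashes only.
TARGETED = {
    0x2018: "'", 0x2019: "'",
    0x201C: '"', 0x201D: '"',
    0x2013: "-", 0x2014: "-",
}


def targeted_fix(text):
    """Replace only the listed characters; everything else is preserved."""
    return text.translate(TARGETED)
```

With the input `“café”`, the aggressive stand-in loses both the accent and the quotes, while the targeted version returns `"café"` with plain ASCII quotes and the é intact.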

All the same, VS Code issue fixed

28 Jul 20:43

The tool was not handling VS Code's odd behavior inside extensions, so newlines at the end of the file were stripped inconsistently. This is a VS Code issue, not mine, but it had to be handled.

20250722_01-update

23 Jul 14:49

20250722_01 update - minor issues - EOF newline missing.

20250722_00 Enough of Your AI Nonsense Edition

  • Major update, new options
  • Smarter removal of Unicode and conversion
  • More coding artifacts removed - less lint

20250427_01-Updates

28 Apr 20:37

Updated to work as a filter and remove trailing spaces.
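
Filter-style operation with trailing-space removal might look roughly like the sketch below. The function names `strip_trailing_spaces` and `filter_stream` are hypothetical and this is not the tool's actual implementation; it assumes "trailing spaces" means spaces and tabs at the end of each line, and it keeps the final newline as it was.

```python
import sys


def strip_trailing_spaces(text):
    """Remove spaces and tabs at the end of every line, keeping line breaks."""
    return "\n".join(line.rstrip(" \t") for line in text.split("\n"))


def filter_stream(src, dst):
    """Read all text from src, clean it, and write the result to dst."""
    dst.write(strip_trailing_spaces(src.read()))
```

Used as a filter, this would be called as `filter_stream(sys.stdin, sys.stdout)`, so the cleaned text can be piped on to another command.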

First really solid release

27 Apr 11:23

This is a good working release. The shell script wrapper is heavily dependent on the user's installation. However, the Shortcut and the Python work just fine.

Full Changelog: https://github.com/unixwzrd/UnicodeFix/commits/20250427_00-Rel