Commit 597c590

20250427_01-update
- Added stripping of trailing whitespace, including whitespace on otherwise blank lines
- Works well as a filter in a pipeline now
- Updated README
1 parent e8df386 commit 597c590

6 files changed, +100 -56 lines changed

.gitignore (+2)
@@ -251,3 +251,5 @@ Sessionx.vim
 tags
 # Persistent undo
 [._]*.un~
+bashexp*.txt
+

CHANGELOG.md (+13)
@@ -0,0 +1,13 @@
+# Changelog for UnicodeFix
+
+
+## 2025-04-27 20250427_01-update
+
+- Update README
+- Update cleanup-text.py to handle trailing whitespace
+- Whitespace on empty lines (newline preserved)
+
+## 2025-04-26 20250427_00-release
+
+- Initial release
+- Added STDIO pipe handling as a filter

README.md (+71 -54)
@@ -1,44 +1,50 @@
 # UnicodeFix
 
-Normalizes Unicode to ASCII equivalents.
+UnicodeFix normalizes problematic Unicode artifacts into clean ASCII equivalents.
 
-**I'm getting this out quickly as people need it. Updates will follow to polish this up more soon.**
+This project was created to address the increasing frequency of invisible and typographic Unicode characters causing issues in code, configuration files, AI detection, and document processing.
+
+**This is an early release. Further polishing and enhancements will follow.**
 
 - [UnicodeFix](#unicodefix)
 - [Installation](#installation)
 - [Usage](#usage)
+- [Pipe / Filter (STDIN to STDOUT)](#pipe--filter-stdin-to-stdout)
+- [Using in vi/vim/macvim](#using-in-vivimmacvim)
 - [Shortcut for macOS](#shortcut-for-macos)
-- [To add the shortcut:](#to-add-the-shortcut)
-- [What's in This Repo:](#whats-in-this-repo)
+- [To add the Shortcut:](#to-add-the-shortcut)
+- [What's in This Repository](#whats-in-this-repository)
 - [Contributing](#contributing)
-- [Support This and Other Projects I Have](#support-this-and-other-projects-i-have)
+- [Support This and Other Projects](#support-this-and-other-projects)
 - [Changelog](#changelog)
 - [2025-04-27](#2025-04-27)
 - [2025-04-26](#2025-04-26)
 - [License](#license)
 
 ## Installation
 
-Clone the repository somewhere on your system. You will need to pop open a terminal window to do this.
-
-Then copy and paste the following commands into the terminal:
+Clone the repository and run the setup script:
 
 ```bash
 git clone https://github.com/unixwzrd/UnicodeFix.git
 cd UnicodeFix
 bash setup.sh
 ```
 
-Setup will create a virtual environment to keep your system Python clean. I also have a whole set of [Virtual Environment Utilities](https://github.com/unixwzrd/venvutil) repo it's likely overkill for most people., but it does contain a lot of useful utilities and tools for managing Python Virtual environments using Pip and Conda, along with many other handy tools for AI and Unix.
+The `setup.sh` script:
 
-It will also add the items needed to start the script into your `.bashrc`.
+- Creates a dedicated Python virtual environment
+- Installs required dependencies
+- Adds startup configuration to your `.bashrc` for easier usage
 
-Look at the [setup.sh](setup.sh) file to see exactly what it does if you like — it's very simple.
+You can review [setup.sh](setup.sh) to see exactly what is modified.
 
-The `.bashrc` items are necessary because I have a Shortcut you may use from the macOS context menu to run the script directly.
+I also maintain a broader toolset for virtual environment management here: [VenvUtil](https://github.com/unixwzrd/venvutil), which may be of interest for more advanced users.
 
 ## Usage
 
+Once installed and activated:
+
 ```bash
 (python-3.10-PA-dev) [unixwzrd@xanax: UnicodeFix]$ python bin/cleanup-text.py --help
 usage: cleanup-text.py [-h] [infile ...]
@@ -50,87 +56,98 @@ positional arguments:
 
 options:
 -h, --help Show this help message and exit
+```
+
+### Pipe / Filter (STDIN to STDOUT)
 
-Example:
-python bin/cleanup-text.py <input_file>
+UnicodeFix can operate as a standard UNIX pipe:
+
+```bash
+cat file.txt | cleanup-text > cleaned.txt
 ```
 
-The output file will be named the same as the input file, but with a `.clean.txt` extension.
+If no input file arguments are given, it automatically reads from standard input and writes to standard output.
 
-You can select multiple files at once.
+### Using in vi/vim/macvim
+
+You can run UnicodeFix as a filter within vi/vim/macvim:
+
+```vim
+:%!cleanup-text
+```
+
+This command rewrites the entire buffer with cleaned text.
+
+**Note**:
+- Ensure your virtual environment is activated before launching your editor, or
+- Use a shell wrapper that sources your `.bashrc` and activates the environment automatically.
+
+Depending on how you manage virtual environments, you may need to adjust your editor’s shell invocation settings.
 
 ## Shortcut for macOS
 
-There is a "Shortcut" file in the `macOS/` directory which may be imported into the Shortcuts app.
-It will allow the script to be run as a **Quick Action** from the Finder "Right Click" menu.
-This allows selecting multiple files and scrubbing the Unicode quirks from them in bulk.
+UnicodeFix includes a macOS Shortcut for direct Finder integration.
 
-### To add the shortcut:
+You can right-click one or more files and select a Quick Action to clean Unicode quirks without opening a terminal.
 
-1. Open the "Shortcuts" app.
+### To add the Shortcut:
 
-2. Go to `File -> Import...`
+1. Open the **Shortcuts** app.
+2. Navigate to `File -> Import`.
 
 ![Shortcuts App Menu](docs/Screenshot%202025-04-25%20at%2005.50.57.png)
 
-3. Navigate to the `macOS` directory in this repository and select the `Strip Unicode.shortcut` file.
+3. Select the Shortcut file located in `macOS/Strip Unicode.shortcut`.
 
 ![Import Shortcut](docs/Screenshot%202025-04-25%20at%2005.51.54.png)
 
-4. You will need to open the shortcut and change the location path of the `cleanup-text.py` script.
+4. Edit the Shortcut to point to your local installation of `cleanup-text.py`.
 
 ![Edit Shortcut Script Path](docs/Screenshot%202025-04-25%20at%2005.07.47.png)
 
-5. You may have to restart Finder (use `Command+Option+Esc`, select Finder, and click "Relaunch").
+5. You may need to relaunch Finder (`Command+Option+Esc` → Select Finder → Relaunch).
 
-6. Once setup, right-click on a file or multiple files in Finder, go to `Quick Actions`, and select `Strip Unicode`.
+6. After setup, right-click selected files, choose `Quick Actions`, and select `Strip Unicode`.
 
 ![Select Shortcut File](docs/Screenshot%202025-04-25%20at%2005.47.51.png)
 
-This will invoke the script on the selected files and create `.clean.txt` versions.
+## What's in This Repository
 
-Strip all the Unicode quirks out of your text files right in the finder using a Quick Action!
-
-If you know a better way for Linux or Windows users, feel free to submit a PR with your improvements.
-
-## What's in This Repo:
-
-- [bin/cleanup-text.py](bin/cleanup-text.py) — The script that cleans up the text.
-- [bin/cleanup-text](bin/cleanup-text) — A symlink without the `.py` extension for prettier usage in scripts.
-- [setup.sh](setup.sh) — A script that sets up the virtual environment.
-- [LICENSE](LICENSE) — The license for the project.
-- [README.md](README.md) — This file.
-- [requirements.txt](requirements.txt) — The dependencies needed to run.
-- [data/](data/) — Sample files full of Unicode issues for testing.
-- [docs/](docs/) — Supporting documentation for the project.
-- [macOS/](macOS/) — The Shortcut file for macOS users.
+- [bin/cleanup-text.py](bin/cleanup-text.py) — Main cleaning script
+- [bin/cleanup-text](bin/cleanup-text) — Symlink for command-line usage
+- [setup.sh](setup.sh) — Virtual environment setup script
+- [requirements.txt](requirements.txt) — Python dependencies
+- [macOS/](macOS/) — macOS Shortcut for Finder integration
+- [data/](data/) — Example test files with Unicode artifacts
+- [docs/](docs/) — Documentation and screenshots
+- [LICENSE](LICENSE) — License information
+- [README.md](README.md) — This file
 
 ## Contributing
 
-If you have suggestions, enhancements, or fixes, feel free to open an issue or pull request!
-Testing and feedback are also very welcome.
+Feedback, testing, bug reports, and pull requests are welcome.
 
-## Support This and Other Projects I Have
+If you find a better integration path for Linux or Windows platforms, feel free to open an issue or contribute a patch.
 
-AI and Unix are my passions — but I need to pay the bills too.
+## Support This and Other Projects
 
-If you find this project useful, please tell others, and consider supporting my work:
+If you find UnicodeFix or my other projects valuable, please consider supporting continued development:
 
 - [Patreon](https://www.patreon.com/unixwzrd)
-- [Buy me a Ko-Fi](https://ko-fi.com/unixwzrd)
-- [Buy me a Coffee](https://www.buymeacoffee.com/unixwzrd)
-
-Thank you!
+- [Ko-Fi](https://ko-fi.com/unixwzrd)
+- [Buy Me a Coffee](https://www.buymeacoffee.com/unixwzrd)
 
+Thank you for your support.
 
 ## Changelog
 
 ### 2025-04-27
-- bug fix for filtering STDIO pipes
-- added a shell script wrapper to source in your .bashrc, presumable with the virtual environment activated.
+- Fixed behavior when processing STDIN pipes
+- Added trailing whitespace and blank line normalization
+- Added shell script wrapper for easier activation from editors
 
 ### 2025-04-26
-- Initial release.
+- Initial release
 
 ## License
 

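The README's filter description ("If no input file arguments are given, it automatically reads from standard input and writes to standard output") is what lets both the `cat file.txt | cleanup-text` pipe and vim's `:%!cleanup-text` work. A minimal sketch of that dispatch, assuming an `argparse` positional declared with `nargs="*"` (on recent Python versions this prints the same `[infile ...]` usage as the `--help` output above), could look like this. It is not the script's actual code: `clean_text` here is a reduced stand-in, and the `.clean.txt` output name for file arguments is a hypothetical choice borrowed from the pre-commit README.

```python
import argparse
import io
import re
import sys
from typing import List, Optional


def clean_text(text: str) -> str:
    # Reduced stand-in for the real clean_text(): zero-width removal plus the
    # trailing-whitespace regex from this commit; the ASCII replacement map is omitted
    text = re.sub(r'[\u200B\u200C\u200D\uFEFF]', '', text)
    return re.sub(r'[ \t]+(\r?\n)', r'\1', text)


def main(argv: Optional[List[str]] = None,
         stdin: Optional[io.TextIOBase] = None,
         stdout: Optional[io.TextIOBase] = None) -> int:
    # Resolve the streams at call time so callers (or tests) can substitute them
    stdin = stdin if stdin is not None else sys.stdin
    stdout = stdout if stdout is not None else sys.stdout

    parser = argparse.ArgumentParser(prog="cleanup-text.py")
    # nargs="*" makes the file list optional: zero arguments is valid
    parser.add_argument("infile", nargs="*")
    args = parser.parse_args(argv)

    if not args.infile:
        # Filter mode: read all of STDIN, write the cleaned text to STDOUT
        stdout.write(clean_text(stdin.read()))
        return 0

    for path in args.infile:
        # Hypothetical output naming, taken from the older README's ".clean.txt" note
        with open(path, encoding="utf-8") as src:
            cleaned = clean_text(src.read())
        with open(path + ".clean.txt", "w", encoding="utf-8") as dst:
            dst.write(cleaned)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Reading STDIN with a single `read()` call keeps the sketch simple; for very large inputs, processing line by line would bound memory use.
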
bin/cleanup-text.py (+8 -1)
@@ -49,7 +49,14 @@ def clean_text(text: str) -> str:
     }
     for orig, repl in replacements.items():
         text = text.replace(orig, repl)
-    return re.sub(r'[\u200B\u200C\u200D\uFEFF]', '', text)
+
+    # Remove zero-width characters
+    text = re.sub(r'[\u200B\u200C\u200D\uFEFF]', '', text)
+
+    # Remove trailing whitespace on every line
+    text = re.sub(r'[ \t]+(\r?\n)', r'\1', text)
+
+    return text
 
 
 def main():
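
The two substitutions added here run in a fixed order. Zero-width characters are deleted first, so a line ending in a space followed by U+200B (as in the updated `data/unicode-tst.bin`) ends in a plain space by the time the trailing-whitespace pattern runs, and that space is then removed. The committed pattern `[ \t]+(\r?\n)` only matches blanks followed by a newline, so blanks at the end of a final line with no terminating newline survive. The sketch below contrasts the committed pattern with a hypothetical variant that uses a lookahead for `\r?\n` or end of string (`\Z`); the constant and function names are illustrative, not taken from the script.

```python
import re

# Zero-width space, non-joiner, joiner, and BOM, as in the commit
ZERO_WIDTH = re.compile(r'[\u200B\u200C\u200D\uFEFF]')
# Committed pattern: spaces/tabs immediately followed by LF or CRLF
TRAILING_BEFORE_NEWLINE = re.compile(r'[ \t]+(\r?\n)')
# Hypothetical variant: also matches a trailing run at the very end of the text
TRAILING_ANYWHERE = re.compile(r'[ \t]+(?=\r?\n|\Z)')


def strip_as_committed(text: str) -> str:
    text = ZERO_WIDTH.sub('', text)  # must run first so exposed spaces get stripped
    return TRAILING_BEFORE_NEWLINE.sub(r'\1', text)  # \1 puts the newline back


def strip_including_last_line(text: str) -> str:
    text = ZERO_WIDTH.sub('', text)
    # The lookahead leaves the line ending itself untouched
    return TRAILING_ANYWHERE.sub('', text)


if __name__ == "__main__":
    sample = "present. \u200b\n\t\nlast line \t"
    print(repr(strip_as_committed(sample)))         # 'present.\n\nlast line \t'
    print(repr(strip_including_last_line(sample)))  # 'present.\n\nlast line'
```

Both versions keep the line ending itself, which matches the CHANGELOG's "newline preserved" note; they differ only on a final line that lacks a newline.
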

data/unicode-tst.bin (+2 -1)
@@ -1,3 +1,4 @@
 Here’s a “smart” example—complete with em dashes, zero-width spaces, and other quirks.​
-Notice the zero-width space between 'and' and 'other'? It’s invisible but present.​
+Notice the zero-width space between 'and' and 'other'? It’s invisible but present. ​
+
 Also, beware of zero-width non-joiners‌ and joiners‍ that sneak into your text.

data/unicode-tst.bin.txt (+4)
@@ -0,0 +1,4 @@
+Here's a "smart" example-complete with em dashes, zero-width spaces, and other quirks.
+Notice the zero-width space between 'and' and 'other'? It's invisible but present.
+
+Also, beware of zero-width non-joiners and joiners that sneak into your text.
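
Comparing `data/unicode-tst.bin` with `data/unicode-tst.bin.txt` shows the substitutions applied to this sample: curly quotes become straight quotes, the em dash becomes a hyphen, and the zero-width characters are dropped. The `replacements` dictionary itself is not visible in this diff, so the table below is a hypothetical reconstruction covering only the characters in the sample. It also swaps in a different technique from the commit's `text.replace` loop: a single `str.translate` pass, in which mapping a character to `None` deletes it.

```python
# Hypothetical table inferred from the sample pair; the real script's
# replacements dict may cover more characters or map them differently
ASCII_TABLE = str.maketrans({
    "\u2018": "'",   # left single quotation mark
    "\u2019": "'",   # right single quotation mark / apostrophe
    "\u201C": '"',   # left double quotation mark
    "\u201D": '"',   # right double quotation mark
    "\u2014": "-",   # em dash
    "\u200B": None,  # zero-width space (deleted)
    "\u200C": None,  # zero-width non-joiner (deleted)
    "\u200D": None,  # zero-width joiner (deleted)
    "\uFEFF": None,  # byte order mark (deleted)
})


def to_ascii(text: str) -> str:
    # One pass over the string instead of one str.replace() call per character
    return text.translate(ASCII_TABLE)


if __name__ == "__main__":
    line = "Here\u2019s a \u201Csmart\u201D example\u2014complete with em dashes."
    print(to_ascii(line))  # Here's a "smart" example-complete with em dashes.
```

`str.translate` works one character at a time, so the trailing-whitespace regex from `clean_text` would still be needed alongside a table like this.
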
