Stop normalizing my UTF-8 #157

Open
liferooter opened this issue Mar 21, 2024 · 1 comment

@liferooter

Tofi normalizes Unicode input. This is a really bad idea: whenever normalization changes anything, the output is no longer the same as the input. That breaks the very idea of a chooser.

Even if Tofi uses Unicode normalization for better search, it should print the original chosen line, not the normalized one.

@alex-huff

alex-huff commented Mar 5, 2025

I just encountered this issue with this script:

cmus-remote --file "$(tofi --prompt="song: " < .music-cache)"

.music-cache is just a file where each line is a path to a song
cmus-remote --file <path> remotely controls cmus (my music player) to play a song

cmus is giving the error: Error: Couldn't get file information for /home/alex/music/tf/Serhat Durmus - La Câlin.mp3

This is because the input LATIN SMALL LETTER A WITH CIRCUMFLEX (U+00E2) is being replaced in the output with the letter a followed by COMBINING CIRCUMFLEX ACCENT (U+0302).

For things like file paths, which aren't Unicode-aware, this causes issues because the output differs at the byte level from the corresponding input item.
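
To make the byte-level difference concrete, here is a minimal C sketch. It is not taken from tofi: the names `precomposed`, `decomposed`, `same_bytes`, and `print_bytes` are made up for illustration, and it assumes the cached path stores the â as U+00E2 while the output uses a + U+0302, as described above.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Assumed input form: "â" as the single code point U+00E2 (UTF-8: C3 A2). */
const char precomposed[] = "La C" "\xC3\xA2" "lin.mp3";

/* Assumed output form: "a" followed by U+0302 (UTF-8: 61 CC 82). */
const char decomposed[] = "La C" "a\xCC\x82" "lin.mp3";

/* Byte-for-byte comparison, with no Unicode awareness. */
bool same_bytes(const char *a, const char *b)
{
	return strcmp(a, b) == 0;
}

/* Print every byte of a NUL-terminated string as two hex digits. */
void print_bytes(const char *s)
{
	for (const unsigned char *p = (const unsigned char *)s; *p != '\0'; p++) {
		printf("%02X ", *p);
	}
	printf("\n");
}
```

Calling `print_bytes` on each string shows `C3 A2` in the first and `61 CC 82` in the second, and `same_bytes(precomposed, decomposed)` returns false. The two names look identical when rendered, but anything that compares paths byte by byte will treat them as different files.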

If anyone wants to completely disable normalization for stdin input, you can do something like this:

diff --git a/src/main.c b/src/main.c
index e91838f..d6a6927 100644
--- a/src/main.c
+++ b/src/main.c
@@ -49,7 +49,7 @@ static uint32_t gettime_ms() {
 
 
 /* Read all of stdin into a buffer. */
-static char *read_stdin(bool normalize) {
+static char *read_stdin() {
 	const size_t block_size = BUFSIZ;
 	size_t num_blocks = 1;
 	size_t buf_size = block_size;
@@ -73,15 +73,6 @@ static char *read_stdin(bool normalize) {
 			break;
 		}
 	}
-	if (normalize) {
-		if (utf8_validate(buf)) {
-			char *tmp = utf8_normalize(buf);
-			free(buf);
-			buf = tmp;
-		} else {
-			log_error("Invalid UTF-8 in stdin.\n");
-		}
-	}
 	return buf;
 }
 
@@ -1495,7 +1486,10 @@ int main(int argc, char *argv[])
 		log_debug("App list generated.\n");
 	} else {
 		log_debug("Reading stdin.\n");
-		char *buf = read_stdin(!tofi.ascii_input);
+		char *buf = read_stdin();
+		if (!tofi.ascii_input && !utf8_validate(buf)) {
+			log_error("Invalid UTF-8 in stdin.\n");
+		}
 		tofi.window.entry.command_buffer = buf;
 		tofi.window.entry.commands = string_ref_vec_from_buffer(buf);
 		if (tofi.use_history) {
