Bug: Arabic search fails to find words without diacritics (Tashkeel) #243122

Open
ahmedelq opened this issue Mar 10, 2025 · 0 comments

Does this issue occur when all extensions are disabled?: Yes

  • VS Code Version: 1.98.0
  • OS Version: Linux 6.12.18-1-lts x86_64 GNU/Linux

Steps to Reproduce:

  1. Open a new or existing file containing Arabic text that includes words with diacritics (Harakat/Tashkeel).
  2. Paste an example word with diacritics into the file, for instance the Arabic word "كِتَابٌ".
  3. In VS Code's search bar (Ctrl+F), type the same word without diacritics: "كتاب".
  4. Observe that VS Code does not find "كِتَابٌ" in the file, even though it is semantically the same word.
  5. Search again, this time including the diacritics: "كِتَابٌ".
  6. Observe that VS Code now finds "كِتَابٌ".
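
To illustrate why the plain query misses the word in the steps above, here is a small Python sketch that lists the code points involved. It assumes the example word is encoded as the seven code points written out below (letters plus kasra, fatha and dammatan); the helper name `describe` is made up for this example and has nothing to do with VS Code's internals.

```python
import unicodedata

# Assumed encoding of the two words from the reproduction steps.
vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"  # letters + kasra, fatha, dammatan
plain = "\u0643\u062A\u0627\u0628"                        # the same letters, no diacritics


def describe(text):
    """Return (code point, Unicode name, general category) for each character."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
        for ch in text
    ]


for row in describe(vocalized):
    print(row)

# The diacritics are separate combining characters (category "Mn") that sit
# between the letters, so the bare letters never appear contiguously.
print(len(vocalized), len(plain))  # 7 4
print(plain in vocalized)          # False
```

Any search that compares code points literally will therefore treat the two spellings as different strings.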

Expected behavior:

VS Code search should find the word "كِتَابٌ" even when searching for "كتاب". Users often omit diacritics when searching, especially if they are not used to typing them or if the keyboard layout makes it cumbersome. The search should be more user-friendly and consider words with and without diacritics as matches, or at least provide an option to enable diacritic-insensitive search.

Almost all well-known apps (e.g. Word, Chrome) offer diacritic-insensitive search; VS Code does not.

Actual behavior:

VS Code search is diacritic-sensitive and fails to find words if diacritics are missing in the search query, even if the root word is the same. This leads to a frustrating search experience for users working with Arabic and potentially other languages that use diacritics.

Suggested Enhancements:

Implement Diacritic-Insensitive Search as a Default or Option:
Make the search diacritic-insensitive by default for languages like Arabic where diacritics are commonly omitted in general writing and search. The search algorithm should treat characters with and without diacritics as the same base character for matching purposes.
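
A minimal sketch of what such matching could look like, written in Python purely for illustration (this is not VS Code's implementation, and the function names `strip_marks` and `find_ignoring_marks` are hypothetical). It drops Unicode nonspacing marks (general category `Mn`, which covers the Arabic harakat) from both the text and the query, searches the stripped strings, and maps the hit back to offsets in the original text so the full word, diacritics included, can be highlighted.

```python
import unicodedata


def strip_marks(text):
    """Drop nonspacing marks; also return, for each kept char, its original index."""
    kept, index_map = [], []
    for i, ch in enumerate(text):
        if unicodedata.category(ch) == "Mn":
            continue  # skip diacritics such as fatha, kasra, dammatan
        kept.append(ch)
        index_map.append(i)
    return "".join(kept), index_map


def find_ignoring_marks(haystack, needle):
    """Return (start, end) offsets in `haystack` of the first match, or None."""
    stripped_hay, index_map = strip_marks(haystack)
    stripped_needle, _ = strip_marks(needle)
    if not stripped_needle:
        return None
    pos = stripped_hay.find(stripped_needle)
    if pos == -1:
        return None
    start = index_map[pos]
    end = index_map[pos + len(stripped_needle) - 1] + 1
    # Extend the match over any marks attached to the last matched letter.
    while end < len(haystack) and unicodedata.category(haystack[end]) == "Mn":
        end += 1
    return start, end


# Assumed encodings of the words from this report.
vocalized = "\u0643\u0650\u062A\u064E\u0627\u0628\u064C"
plain = "\u0643\u062A\u0627\u0628"

print(find_ignoring_marks(vocalized, plain))      # (0, 7)
print(find_ignoring_marks(vocalized, vocalized))  # (0, 7)
```

The sketch deliberately does not apply Unicode normalization first: decomposing with NFD would also fold letters such as alef with hamza onto bare alef, which may or may not be wanted. Exposing this behavior as an option would let users pick the behavior they expect.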
