Unicode has 92,865 CJK ideographic characters, but each language uses only a small subset. Annex A of ISO/IEC 10646 lists character collections relevant to Japanese text. (Note: Annex A also provides collections for other languages.) Each of the listed character collections contains fewer than 10,000 characters.
Assistive technologies (e.g., Japanese TTS) are unlikely to handle all 92,865 CJK ideographic characters. According to a 2015 report from a Japanese ministry, most TTS engines support only the 6,355 ideographic characters in JIS X 0208. I have not heard of significant improvements since then.
Moreover, authors of textbooks or books for children use even smaller subsets for pedagogical reasons. For example, 1006 CJK ideographic characters are taught in Japanese compulsory education.
I thus think that accessibility metadata should be able to indicate (1) which character collection is used as a basis and (2) which characters beyond the specified collection are used as exceptions, which are sometimes necessary. I believe that this would be good for other CJK countries as well. Moreover, since no language and no TTS engine supports all Unicode characters, I guess that this would be good for everybody.
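To illustrate the idea, here is a minimal sketch of how an authoring tool might compute such metadata: given a base collection, it lists the CJK ideographs in a text that fall outside it. Everything named here is an assumption for illustration only: `TOY_COLLECTION` is a tiny made-up set (not a real ISO/IEC 10646 Annex A collection), and the `baseCollection` and `exceptions` keys are hypothetical, not existing accessibility metadata properties.

```python
import unicodedata


def is_cjk_ideograph(ch: str) -> bool:
    """Return True if the character's Unicode name marks it as a CJK ideograph."""
    name = unicodedata.name(ch, "")  # "" for characters unknown to this Python build
    return name.startswith(("CJK UNIFIED IDEOGRAPH", "CJK COMPATIBILITY IDEOGRAPH"))


def ideographs_outside(text: str, base_collection: frozenset) -> list:
    """List each CJK ideograph in text that is not in base_collection, in first-seen order."""
    found = []
    for ch in text:
        if is_cjk_ideograph(ch) and ch not in base_collection and ch not in found:
            found.append(ch)
    return found


def describe(text: str, collection_name: str, base_collection: frozenset) -> dict:
    """Build a hypothetical metadata record: the base collection plus any exceptions."""
    return {
        "baseCollection": collection_name,  # hypothetical key name
        "exceptions": ideographs_outside(text, base_collection),  # hypothetical key name
    }


# Toy stand-in for a real collection; a real tool would load the full character list.
TOY_COLLECTION = frozenset("日本語学校")

record = describe("日本語の学校で鬱を学ぶ", "toy-collection", TOY_COLLECTION)
print(record)
```

In this example only 鬱 is reported as an exception: the hiragana are ignored because they are not ideographs, and the other kanji belong to the toy collection. A reading system or TTS engine could compare the declared collection and exceptions against what it supports before rendering the text.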
xfq added the i18n-tracker label on Dec 12, 2021