Russian language #100

Open
juhnowski opened this issue May 30, 2018 · 3 comments

Comments

@juhnowski

The Russian dictionary is of very low quality: obscene language gets inserted into the recognized text. Look attentively at the committers of the dictionary and consider whether it is worth continuing cooperation with them.

@amitdo

amitdo commented May 30, 2018

Look attentively at the committers of the dictionary and consider whether it is worth continuing cooperation with them.

We'll fire the bots... 😆

@juhnowski
Author

Do you need help in this shooting?

@amitdo

amitdo commented May 31, 2018

From #62 (comment)

theraysmith commented on Aug 3, 2017

FYI: The wordlists are generated files, so it isn't a good idea to modify them, as the modifications will likely get overwritten in a future training. To help prevent the ß/B confusion, the words that you want to lose from the wordlists need to go in langdata/lang/lang.bad_words.

See also page 8 in https://github.com/tesseract-ocr/docs/raw/master/das_tutorial2016/6ModernizationEfforts.pdf.
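
To illustrate the effect described in the quoted comment, here is a minimal Python sketch of filtering a wordlist against a bad-words list. It is not the actual training tooling, whose handling of the file is not shown in this thread. The function names `filter_wordlist` and `merge_bad_words` are hypothetical, the sample words are placeholders, and the path `langdata/rus/rus.bad_words` is only an assumption obtained by substituting the Russian language code into the `langdata/lang/lang.bad_words` pattern.

```python
def filter_wordlist(words, bad_words):
    """Return the words that are not listed in bad_words, keeping their order."""
    blocked = {w.strip() for w in bad_words if w.strip()}
    cleaned = (line.strip() for line in words)
    return [w for w in cleaned if w and w not in blocked]


def merge_bad_words(existing, new_words):
    """Add new entries to an existing bad-words list without duplicates."""
    merged = []
    seen = set()
    for w in list(existing) + list(new_words):
        w = w.strip()
        if w and w not in seen:
            seen.add(w)
            merged.append(w)
    return merged


# Placeholder data; in practice these lines would be read from the
# generated wordlist and from a file such as langdata/rus/rus.bad_words
# (assumed path, following the lang/lang.bad_words pattern).
wordlist = ["дом", "кот", "unwanted1", "лес", "unwanted2"]
bad_words = merge_bad_words(["unwanted1"], ["unwanted2", "unwanted1"])

print(filter_wordlist(wordlist, bad_words))
```

Keeping the unwanted words in a separate list, rather than editing the generated wordlist by hand, means the exclusions can be reapplied each time the wordlist is regenerated.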
