Clarify language support quality status #83

Open
eyalroz opened this issue May 21, 2022 · 6 comments

eyalroz commented May 21, 2022

The README.md says tesseract "supports over 100 languages out of the box". But - which languages? And what quality is the support for different languages known to be, out of the box?

It would be helpful if a separate file (or wiki page) would detail, to the extent possible, this information.

Contributor

stweil commented May 21, 2022

See https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html. All work on Tesseract is currently done by volunteers, so you are invited to find the answers to your questions and document them.

Author

eyalroz commented May 21, 2022

@stweil : Can you linkify the "100 languages" sentence in the README.md to point to that page?

@amitdo amitdo transferred this issue from tesseract-ocr/tesseract Jun 12, 2022
tooomm added a commit to tooomm/tesseract that referenced this issue Mar 5, 2023
Contributor

tooomm commented Mar 5, 2023

@eyalroz I went ahead and proposed the change in the tesseract repo: tesseract-ocr/tesseract#4027

I also think it would be very helpful, even though the list itself has no information on languages in v5 yet.

amitdo pushed a commit to tesseract-ocr/tesseract that referenced this issue Mar 6, 2023
Collaborator

amitdo commented Mar 9, 2023

> Even though the list itself has no information on languages in v5 yet.

There was no update for v5. All the v4 data files should work with Tesseract 5.x.

Contributor

tooomm commented Mar 9, 2023

> There was no update for v5. All the v4 data files should work with Tesseract 5.x.

That's at least not obvious from the table.

The information can be found in other parts of the docs, true. Users can easily miss it though:

> Language model traineddata files same as listed above for version 4.0.0 can be used with Tesseract 5.x.x.
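
For readers who want to check which languages their own installation can actually use, here is a minimal sketch built on the `tesseract --list-langs` command. The helper names `parse_list_langs` and `installed_languages` are hypothetical and made up for illustration. The header format being skipped, and the fallback to stderr, are assumptions about the CLI output that may differ between versions. This only reports which traineddata files are installed; it says nothing about recognition quality for each language.

```python
import subprocess


def parse_list_langs(output: str) -> list[str]:
    """Extract language codes from the text printed by `tesseract --list-langs`.

    Assumes one language code per line, preceded by a header line that
    starts with "List of available languages" (format may vary by version).
    """
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the assumed header line.
        if not line or line.lower().startswith("list of available languages"):
            continue
        langs.append(line)
    return sorted(langs)


def installed_languages(binary: str = "tesseract") -> list[str]:
    """Run the CLI and return the language codes it reports as installed."""
    result = subprocess.run(
        [binary, "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Assumption: fall back to stderr if stdout is empty, in case a build
    # prints the list there instead.
    return parse_list_langs(result.stdout or result.stderr)


# Usage (requires tesseract on PATH):
# print(installed_languages())
```

The parsing is kept separate from the subprocess call so it can be exercised on captured output without Tesseract installed.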


4 participants