[New] Make RPA/recognition/ocr return actual confidence #139
Closed
We used the terms "confidence" and "coincidence", which are ambiguous
and were used inconsistently:
* When using tesseract, only "confidence" is used, and it refers to the
  similarity between the text that it tries to find and the text that
  was returned by OCR.
* When using RapidOCR:
  * "confidence" refers to the confidence score returned by RapidOCR,
    which is the model's estimated probability that the returned text is
    actually what is visible in the image.
  * "coincidence" refers to the same thing that "confidence" refers to
    with tesseract: the similarity between the text that we try to find
    and the text that was returned by OCR.
* In the matches dictionaries returned by find_text, the confidence item
  always refers to the text similarity, even when using RapidOCR, where
  the coincidence argument is used for that.
* For RapidOCR, the "confidence" value is a fraction (between 0 and 1);
  for the other values we use a percentage (between 0 and 100).
We now consistently use:
* "similarity" for text similarity
* "confidence" for the OCR confidence score (only returned by RapidOCR)
* Percentage (0 to 100) for all similarity and confidence values.
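The distinction can be illustrated with a minimal sketch. It is not the code from this PR: the helper names (`similarity_percent`, `confidence_percent`, `build_match`) are hypothetical, and `difflib.SequenceMatcher` is only assumed here as a stand-in for whatever similarity measure the module actually uses.

```python
from difflib import SequenceMatcher


def similarity_percent(expected: str, recognized: str) -> float:
    """Similarity between the searched text and the OCR text, as 0 to 100."""
    # Assumption: a difflib ratio stands in for the module's real metric.
    return SequenceMatcher(None, expected, recognized).ratio() * 100.0


def confidence_percent(score: float) -> float:
    """Convert a 0..1 OCR confidence score (RapidOCR style) to 0 to 100."""
    return score * 100.0


def build_match(expected: str, recognized: str, score: float) -> dict:
    """Hypothetical match dictionary that keeps both values apart."""
    return {
        "text": recognized,
        "similarity": similarity_percent(expected, recognized),
        "confidence": confidence_percent(score),
    }


# The OCR engine may be very sure about a text that only partly
# matches what we searched for, so the two values can differ a lot.
match = build_match("Submit", "Subrnit", 0.5)
```

Keeping both keys on the same 0 to 100 scale means a caller can apply thresholds to either value without knowing which OCR backend produced it.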
The module only returned similarity, but tesseract actually also returns confidence values; we just have to use them.
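As a rough sketch of where those values come from: pytesseract's `image_to_data` with dictionary output reports a `conf` entry per word (0 to 100, with -1 for entries that are not words). The example below works on a hand-written dictionary of that shape so that it runs without tesseract installed. Averaging word confidences per line is an assumption made for illustration, not necessarily what this PR does, and `line_confidences` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean


def line_confidences(data: dict) -> dict:
    """Return {(block, paragraph, line): mean word confidence in percent}."""
    grouped = defaultdict(list)
    for i, text in enumerate(data["text"]):
        # Depending on the pytesseract version, conf may be str, int or float.
        conf = float(data["conf"][i])
        # Skip structural entries (conf of -1) and empty text.
        if conf < 0 or not text.strip():
            continue
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        grouped[key].append(conf)
    return {key: mean(values) for key, values in grouped.items()}


# Hand-written sample in the shape of image_to_data(..., output_type=Output.DICT).
sample = {
    "block_num": [1, 1, 1, 1],
    "par_num": [1, 1, 1, 1],
    "line_num": [0, 1, 1, 1],
    "conf": [-1, 96, 90, "-1"],
    "text": ["", "Hello", "world", ""],
}

per_line = line_confidences(sample)
```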
Merged into #138 because without this change, the other PR wouldn't have 100% coverage.
Description
The yarf/vendor/RPA/recognition/ocr.py module only returned similarity, but tesseract actually also returns confidence values; we just have to use them.
Tests
The module doesn't have any tests as far as I can see and I didn't add any.
Important
This is based on #138; please review that PR first.