[New] Make RPA/recognition/ocr return actual confidence #139

Closed
adombeck wants to merge 2 commits into main from tesseract-confidence

Conversation

@adombeck
Contributor

Description

The yarf/vendor/RPA/recognition/ocr.py module only returned the text similarity, but tesseract also reports confidence values. This change makes the module return them.
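
For illustration only, here is a minimal sketch of how per-word confidence could be read from tesseract's output. It assumes the data has the shape produced by `pytesseract.image_to_data(image, output_type=pytesseract.Output.DICT)`, i.e. parallel lists under `"text"` and `"conf"`. The helper name `words_with_confidence` and the sample data are hypothetical and are not taken from this PR's diff.

```python
from typing import Dict, List


def words_with_confidence(data: Dict[str, list]) -> List[dict]:
    """Pair each recognized word with tesseract's confidence (0 to 100).

    Assumption: `data` has parallel "text" and "conf" lists, as in the
    dictionary output of pytesseract.image_to_data.
    """
    words = []
    for text, conf in zip(data["text"], data["conf"]):
        # Depending on the pytesseract version, conf may be a string or a number.
        conf = float(conf)
        # Tesseract reports -1 for layout entries that carry no word.
        if conf < 0 or not text.strip():
            continue
        words.append({"text": text, "confidence": conf})
    return words


# Hand-written sample in the assumed shape, not real OCR output.
sample = {
    "text": ["", "Hello", "world", " "],
    "conf": ["-1", "96.5", "91", "95"],
}
result = words_with_confidence(sample)
```

Entries with a confidence of -1 or with empty text are skipped, so only actual words carry a confidence value.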

Tests

The module doesn't have any tests as far as I can see and I didn't add any.

Important

This is based on #138, please review that PR first.

We used the terms "confidence" and "coincidence", which are ambiguous
and used inconsistently:

* When using tesseract, only "confidence" is used, and it refers to the
  similarity between the text that it tries to find and the text that
  was returned by OCR.

* When using RapidOCR:
  * "confidence" refers to the confidence score returned by RapidOCR,
    which is the model's estimated probability that the returned text is
    actually what is visible in the image.
  * "coincidence" refers to what "confidence" refers to in tesseract:
    the similarity between the text that we try to find and the text
    that was returned by OCR.

* In the matches dictionaries returned by find_text, the confidence item
  always refers to the text similarity, even when using RapidOCR, where
  the coincidence argument is used for that.

* For RapidOCR, the "confidence" value is a fraction (between 0 and 1);
  for the other values we use percentages (between 0 and 100).

We now consistently use:
* "similarity" for text similarity
* "confidence" for the OCR confidence score (only returned by RapidOCR)
* Percentage (0 to 100) for all similarity and confidence values.
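
The convention listed above can be sketched as follows. This is an illustration under stated assumptions, not the code from this PR: `difflib.SequenceMatcher` stands in for whatever similarity metric the module actually uses, and the names `similarity_percent`, `confidence_percent`, and `make_match` are hypothetical.

```python
from difflib import SequenceMatcher


def similarity_percent(expected: str, found: str) -> float:
    """Similarity between the searched text and the OCR text, 0 to 100.

    Assumption: difflib is a stand-in metric chosen for this sketch.
    """
    return SequenceMatcher(None, expected, found).ratio() * 100


def confidence_percent(fraction: float) -> float:
    """Convert a 0-to-1 OCR confidence score to a percentage."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError(f"expected a fraction between 0 and 1, got {fraction}")
    return fraction * 100


def make_match(expected: str, found: str, confidence_fraction: float) -> dict:
    """Build a match entry that keeps the two meanings under separate keys."""
    return {
        "text": found,
        "similarity": similarity_percent(expected, found),
        "confidence": confidence_percent(confidence_fraction),
    }


match = make_match("Save", "Save", 0.5)
```

Keeping "similarity" and "confidence" as separate keys on the same 0 to 100 scale lets callers apply a threshold to either value without needing to know which OCR backend produced the match.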

The module only returned similarity, but tesseract also reports
confidence values; we just have to use them.
@adombeck
Contributor Author

Merged into #138 because, without this change, the other PR wouldn't have 100% coverage.

@adombeck adombeck closed this Dec 17, 2025
