
How to use this repo to correctly split words in OCR #1

@zydjohnHotmail

Hello:
I am new to the Orca2 language model and do not yet know how to use it. I have a real-world example: I want to use this language model to correctly split the words in OCR text returned by the Florence2 language model.
The attached screenshot shows the OCR test image.
The OCR text returned by the Florence2 language model looks like this:
This is a lot of 12 point text to test theocr code and see if it works on all ctypesof file format.The quick brown dog jumped over thelazy fox. The quick brown dog jumpedover the lazy fox.Thequick brown dogjumped over the lazyfox. Thequickbrown dog jumpedoverthe lazy fox
The OCR results are mostly correct, but some words are run together. I want to use the Orca2 language model to correctly split the following strings:
theocr; jumpedover; dogjumped; lazyfox; Thequickbrown; jumpedoverthe. The original image clearly has spaces at these positions, but the OCR output has no spaces there at all.
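To make the goal concrete, here is a minimal sketch of the kind of splitting I am after. It is a plain dictionary-based segmentation in Python, not Orca2, and it does not call any Orca2 or Florence2 API. The names `VOCAB`, `split_token` and `fix_spacing` are made up for illustration, and the tiny word list is an assumption that only covers the sample text above.

```python
import re

# Hypothetical mini-vocabulary, just enough for the sample text.
# A real run would need a proper word list.
VOCAB = {
    "this", "is", "a", "lot", "of", "point", "text", "to", "test", "the",
    "ocr", "code", "and", "see", "if", "it", "works", "on", "all", "types",
    "file", "format", "quick", "brown", "dog", "jumped", "over", "lazy", "fox",
}


def split_token(token, vocab=VOCAB):
    """Split a run of letters into the fewest vocabulary words.

    Returns the token unchanged (as a one-item list) if no full split exists.
    """
    lower = token.lower()
    n = len(lower)
    # best[i] holds the shortest list of words covering token[:i], or None.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            if best[start] is not None and lower[start:end] in vocab:
                candidate = best[start] + [token[start:end]]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n] if best[n] is not None else [token]


def fix_spacing(text, vocab=VOCAB):
    """Re-insert spaces inside runs of letters; everything else passes through."""
    return re.sub(
        r"[A-Za-z]+",
        lambda m: " ".join(split_token(m.group(0), vocab)),
        text,
    )
```

With this word list, `fix_spacing("jumpedoverthe lazy fox")` gives `jumped over the lazy fox`. Strings with no full match in the word list, such as `ctypesof`, are left unchanged, and a missing space after a period is not handled.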
Please show me some code that uses this language model to split those words correctly.
testocr

Thanks,
