tessdata_ocrb

traineddata for MRZ using OCR-B fonts

This is a proof of concept traineddata in response to this post in tesseract-ocr forum

Feel free to clone the repo and rerun training with your own custom training_text and fonts.

Update: April 15, 2019

Retrained to add missing X using 3 OCRB fonts and a larger training text compared to previous version. Both float/best and integer/fast versions are provided.

Trained by plus finetuning tessdata_best/eng.traineddata

(800 iterations - char train=0.273%, word train=3.47%, word train=0%)

Download best version - 10.8 MB. use with -l ocrb.
Download fast version - 1.38 MB. use with -l ocrb_int.

Evaluation

ocrb_eval folder has synthetic MRZ samples in the same 3 fonts for evaluation. The box/tiff pairs are also saved.

lstmeval of the files with tessdata_best/eng gives Eval Char error rate=44.954738, Word error rate=89.583333.
lstmeval of the files with ocrb and ocrb_int gives Eval Char error rate=0, Word error rate=0.

Test

tesseract ./ocrb_eval/eng.OCR-B_10_BT.exp0.tif - -l ocrb --tessdata-dir ./

Failed to load any lstm-specific dictionaries for lang ocrb!!
Page 1

P<EURJOSE<<POLLY<<<<<<<<<<<<<<<<<<<<<<<<<<<<
IDAUT10000999<6<<<<<<<<<<<<<<<
3178087440GBR6502230M1810301<<<<<<<<<<<<<<02
IDCHE123456789C791217C891201CC
L898902C<3UZB6908061F9406236ZE184226B<<<<<14
D231458907WLF7408122F1204159<<<<<<<6
V<KQTPAPANDROPOULOUS<<STEPHEN<TREVOR<<<<<<<<
P<INDVARMA<<NATHU<RAO<GODSE<SHARMA<<<<<<<<<<
P<GUYGLADYS<ALMETA<<<<<<<<<<<<<<<<<
IDEBD<ALEZYZ<BN<MTEB<<<<<<<<<<<<<<<<<<<<<
6754505660GBR6006163M2012296<<<<<<<<<<<<<<08
0504933705ITL7307144F1601013<<<<<<<<<<<<<<06
AA00000000POL6002084F1412314<<<<<<<<<<<<<<<4
7553279419RUS8712242M2104131<<<<<<<<<<<<<<02
P<ROMJOHNSON<<ANNE<QUEEN<<<<<<<<<<<<<<<<<<<<
P<IRNKHOSRAVANI<<MAHBOD<<<<<<<<<<<<<<<<<<<<<

Other projects

There are other github repos with MRZ solutions (I have not tried them).

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
ocrb_eval		ocrb_eval
README.md		README.md
eng.MRZ.eval.training_text		eng.MRZ.eval.training_text
eng.MRZ.training_text		eng.MRZ.training_text
finetune-ocrb.sh		finetune-ocrb.sh
ocrb.traineddata		ocrb.traineddata
ocrb_int.traineddata		ocrb_int.traineddata

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

tessdata_ocrb

Update: April 15, 2019

Trained by plus finetuning tessdata_best/eng.traineddata

Evaluation

Test

Other projects

About

Releases

Packages

Languages

Shreeshrii/tessdata_ocrb

Folders and files

Latest commit

History

Repository files navigation

tessdata_ocrb

Update: April 15, 2019

Trained by plus finetuning tessdata_best/eng.traineddata

Evaluation

Test

Other projects

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages