Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition


The official code of ABINet (CVPR 2021, Oral).

ABINet recognizes text in the wild using a vision model and an explicit language model, which are trained end-to-end. The language model (BCN) learns a bidirectional language representation by simulating a cloze test, and additionally applies an iterative correction strategy.
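The cloze-test idea can be illustrated with a small attention-mask sketch. This is only an illustration, not the repository's implementation: it assumes the cloze test is realized by stopping each character position from attending to itself, so that a character must be predicted from its left and right context. The helper names `cloze_attention_mask` and `masked_softmax` are hypothetical.

```python
import math


def cloze_attention_mask(length):
    """Additive mask: -inf on the diagonal (a position cannot see itself), 0 elsewhere."""
    return [[-math.inf if i == j else 0.0 for j in range(length)]
            for i in range(length)]


def masked_softmax(scores, mask_row):
    """Softmax over one row of attention scores after adding the mask row.

    Assumes at least one unmasked entry, i.e. a sequence of length >= 2.
    """
    masked = [s + m for s, m in zip(scores, mask_row)]
    peak = max(masked)
    exps = [math.exp(v - peak) for v in masked]
    total = sum(exps)
    return [e / total for e in exps]


# Position 1 of a 4-character word attends to positions 0, 2 and 3 only.
mask = cloze_attention_mask(4)
weights = masked_softmax([0.5, 2.0, 0.1, 0.3], mask[1])
print(weights)
```

The weight at index 1 is exactly zero, so context flows in from both directions while the character being predicted stays hidden.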

[Figure: framework]

Runtime Environment

  • We provide a pre-built Docker image, built from docker/Dockerfile

  • Running in Docker

    $ git clone git@github.com:FangShancheng/ABINet.git
    $ docker run --gpus all --rm -ti --ipc=host -v $(pwd)/ABINet:/app fangshancheng/fastai:torch1.1 /bin/bash
    
  • (Untested) Or install the dependencies directly

    pip install -r requirements.txt
    

Datasets

  • Training datasets

    1. MJSynth (MJ)
    2. SynthText (ST)
    3. WikiText103, which is only used for pre-training language models
  • Evaluation datasets. The LMDB datasets can be downloaded from BaiduNetdisk (passwd: 1dbv) or GoogleDrive.

    1. ICDAR 2013 (IC13)
    2. ICDAR 2015 (IC15)
    3. IIIT5K Words (IIIT)
    4. Street View Text (SVT)
    5. Street View Text-Perspective (SVTP)
    6. CUTE80 (CUTE)
  • The structure of the data directory is:

    data
    ├── charset_36.txt
    ├── evaluation
    │   ├── CUTE80
    │   ├── IC13_857
    │   ├── IC15_1811
    │   ├── IIIT5k_3000
    │   ├── SVT
    │   └── SVTP
    ├── training
    │   ├── MJ
    │   │   ├── MJ_test
    │   │   ├── MJ_train
    │   │   └── MJ_valid
    │   └── ST
    ├── WikiText-103.csv
    └── WikiText-103_eval_d1.csv
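
Before training, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repository: the function name `missing_entries` is hypothetical, and it only checks that each path exists, not that the LMDB files inside are valid.

```python
from pathlib import Path

# Entries taken from the directory tree above, relative to the data root.
EXPECTED_ENTRIES = [
    "charset_36.txt",
    "evaluation/CUTE80",
    "evaluation/IC13_857",
    "evaluation/IC15_1811",
    "evaluation/IIIT5k_3000",
    "evaluation/SVT",
    "evaluation/SVTP",
    "training/MJ/MJ_test",
    "training/MJ/MJ_train",
    "training/MJ/MJ_valid",
    "training/ST",
    "WikiText-103.csv",
    "WikiText-103_eval_d1.csv",
]


def missing_entries(data_root):
    """Return the expected files and folders that do not exist under data_root."""
    root = Path(data_root)
    return [entry for entry in EXPECTED_ENTRIES if not (root / entry).exists()]


if __name__ == "__main__":
    missing = missing_entries("data")
    if missing:
        print("Missing entries:", ", ".join(missing))
    else:
        print("Data directory looks complete.")
```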
    

Pretrained Models

Get the pretrained models from BaiduNetdisk (passwd: kwck) or GoogleDrive. The performance of the pretrained models is summarized as follows:

Model      IC13  SVT   IIIT  IC15  SVTP  CUTE  AVG
ABINet-SV  97.1  92.7  95.2  84.0  86.7  88.5  91.4
ABINet-LV  97.0  93.4  96.4  85.9  89.5  89.2  92.7
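
The AVG column is not the plain mean of the six accuracies; it appears to be an average weighted by the number of test samples per dataset. The sketch below checks this under stated assumptions: the sizes 857, 1811 and 3000 come from the evaluation folder names, while 647 (SVT), 645 (SVTP) and 288 (CUTE) are assumed test-set sizes that this README does not state.

```python
# Sample counts per evaluation set. 857, 1811 and 3000 come from the folder
# names IC13_857, IC15_1811 and IIIT5k_3000; the sizes for SVT, SVTP and CUTE
# are ASSUMPTIONS, not values given in this README.
TEST_SET_SIZES = {
    "IC13": 857, "SVT": 647, "IIIT": 3000,
    "IC15": 1811, "SVTP": 645, "CUTE": 288,
}

# Accuracies (%) copied from the table of pretrained models.
ACCURACY = {
    "ABINet-SV": {"IC13": 97.1, "SVT": 92.7, "IIIT": 95.2,
                  "IC15": 84.0, "SVTP": 86.7, "CUTE": 88.5},
    "ABINet-LV": {"IC13": 97.0, "SVT": 93.4, "IIIT": 96.4,
                  "IC15": 85.9, "SVTP": 89.5, "CUTE": 89.2},
}


def weighted_average(accuracy_by_set, sizes=TEST_SET_SIZES):
    """Average accuracy weighted by the number of samples in each test set."""
    total = sum(sizes[name] for name in accuracy_by_set)
    weighted = sum(accuracy_by_set[name] * sizes[name] for name in accuracy_by_set)
    return weighted / total


for model, scores in ACCURACY.items():
    print(model, round(weighted_average(scores), 1))
```

With these sizes the weighted averages round to 91.4 and 92.7, matching the AVG column.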

Training

  1. Pre-train vision model
    CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_vision_model.yaml
    
  2. Pre-train language model
    CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/pretrain_language_model.yaml
    
  3. Train ABINet
    CUDA_VISIBLE_DEVICES=0,1,2,3 python main.py --config=configs/train_abinet.yaml
    

Note:

  • You can set the checkpoint paths of the vision and language models separately to load specific pretrained models, or set them to None to train from scratch.
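
The three training stages can also be launched from one small driver script. This is a sketch rather than a tool shipped with the repository: `stage_command` and `run_all_stages` are hypothetical names, and it assumes the checkpoint paths in configs/train_abinet.yaml already point at the outputs of the two pre-training stages. It defaults to a dry run that only prints the commands.

```python
import os
import subprocess

# Config files for the three stages, in the order listed above.
STAGE_CONFIGS = [
    "configs/pretrain_vision_model.yaml",
    "configs/pretrain_language_model.yaml",
    "configs/train_abinet.yaml",
]


def stage_command(config):
    """Build the argument list for one training stage."""
    return ["python", "main.py", f"--config={config}"]


def run_all_stages(gpus="0,1,2,3", dry_run=True):
    """Print the stage commands in order, or run them when dry_run is False."""
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=gpus)
    commands = [stage_command(config) for config in STAGE_CONFIGS]
    for command in commands:
        if dry_run:
            print(f"CUDA_VISIBLE_DEVICES={gpus}", " ".join(command))
        else:
            # check=True stops the sequence if a stage exits with an error.
            subprocess.run(command, env=env, check=True)
    return commands


commands = run_all_stages()
```

Call `run_all_stages(dry_run=False)` from the repository root to actually start training.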

Evaluation

CUDA_VISIBLE_DEVICES=0 python main.py --config=configs/train_abinet.yaml --phase test --image_only

Additional flags:

  • --checkpoint /path/to/checkpoint: set the path of the model to evaluate
  • --test_root /path/to/dataset: set the path of the evaluation dataset
  • --model_eval [alignment|vision]: choose which sub-model to evaluate
  • --image_only: disable dumping visualizations of attention masks
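
To evaluate one benchmark at a time, the flags above can be combined programmatically. The sketch below only builds and prints command lines. It assumes that --test_root accepts a single dataset folder under data/evaluation, which this README does not confirm, and the checkpoint path is a placeholder. The name `eval_command` is hypothetical.

```python
import shlex

# Evaluation folders from the data directory layout.
EVAL_DATASETS = ["CUTE80", "IC13_857", "IC15_1811", "IIIT5k_3000", "SVT", "SVTP"]


def eval_command(checkpoint, test_root, model_eval="alignment", image_only=True):
    """Build the evaluation command using only flags documented above."""
    if model_eval not in ("alignment", "vision"):
        raise ValueError("model_eval must be 'alignment' or 'vision'")
    command = [
        "python", "main.py",
        "--config=configs/train_abinet.yaml",
        "--phase", "test",
        "--checkpoint", checkpoint,
        "--test_root", test_root,
        "--model_eval", model_eval,
    ]
    if image_only:
        command.append("--image_only")
    return command


for name in EVAL_DATASETS:
    # "/path/to/checkpoint" is a placeholder; substitute a real checkpoint file.
    print(shlex.join(eval_command("/path/to/checkpoint", f"data/evaluation/{name}")))
```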

Visualization

Successful and failure cases on low-quality images:

[Figure: cases]

Citation

If you find our method useful for your research, please cite:

@inproceedings{fang2021read,
  title={Read Like Humans: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Recognition},
  author={Fang, Shancheng and Xie, Hongtao and Wang, Yuxin and Mao, Zhendong and Zhang, Yongdong},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2021}
}

License

This project is free for academic research purposes only and is licensed under the 2-clause BSD License; see the LICENSE file for details.

Feel free to contact [email protected] if you have any questions.
