
An OCR (tesseract) web interface to upload images. The idea of this project is to study technologies like Python, Django, Continuous Integration, Celery, etc...


fabinhojorge/OCR_web



OCR web

This project is a web interface for uploading images and running OCR (Tesseract) on them. Its purpose is to study technologies such as Python, Django, Tesseract (OCR), continuous integration, and Celery.

How to install and run

After activating your Python virtual environment (venv), run the commands below:

pip install -r requirements.txt

python manage.py migrate

python manage.py runserver

You can then access the application at the local URL: localhost:8000

The requirements.txt file includes a package called pytesseract. It is the wrapper that communicates with the Tesseract library (C/C++ code). So, the next step is to install Tesseract itself.

For this, please follow the instructions below for your OS:

If an additional language is required, download it from here and move it to $TESSERACT_PATH/tessdata/
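
To confirm that a language file landed in the right place, a small check like the one below may help. It is only a sketch, assuming that TESSERACT_PATH is set as an environment variable and that language files use Tesseract's `.traineddata` extension. The helper name `installed_languages` is hypothetical and not part of this project.

```python
import os
from pathlib import Path


def installed_languages(tessdata_dir):
    """Return the sorted language codes found in a tessdata directory.

    Each Tesseract language is stored as <code>.traineddata, so the
    file stem (e.g. "eng", "por") is the code passed to the OCR.
    """
    directory = Path(tessdata_dir)
    if not directory.is_dir():
        # A missing directory simply means no languages were found.
        return []
    return sorted(p.stem for p in directory.glob("*.traineddata"))


if __name__ == "__main__":
    # Assumption: TESSERACT_PATH points at the Tesseract install directory.
    base = os.environ.get("TESSERACT_PATH", "")
    print(installed_languages(Path(base) / "tessdata"))
```

If the language you moved does not show up in the printed list, the file is probably in a different directory from the one Tesseract reads.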

How to use

  1. TBD

Libraries

  • Django
  • Pillow
  • Bootstrap
  • jQuery
  • Tesseract (pytesseract)
  • Celery

To Do

  • Create an initial project
  • Add the continuous integration build and test (CircleCI)
  • Create the upload media system: models, forms, templates, media url, etc...
  • Call the OCR to process the image
  • Add an image link to the home page. Clicking the image opens a modal to view it.
  • Add support for different languages. The OCR has a parameter to select the language of the text; the user provides it when uploading the image.
  • Create a model to store and an interface to return the text generated by the OCR.
  • Pagination on the first page
  • Model to handle a copy of the original image. This copy will be used to run the OCR. The idea is to use this copy in the future to apply image treatments (all triggered from the interface).
  • Basic image treatments such as: cut, rotate, threshold, 'grow'
  • After the core is working, enhance the back end with Celery.
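
The "Call the OCR" and language-support items above could look roughly like the sketch below. It relies on `pytesseract.image_to_string`, which accepts a file path and a `lang` argument. The function name `extract_text` and the injectable `ocr` parameter are hypothetical, added only so the logic can be exercised without Tesseract installed; this is not the project's actual implementation.

```python
def extract_text(image_path, lang="eng", ocr=None):
    """Run OCR on an image file and return the text with outer whitespace removed.

    `ocr` is any callable taking (path, lang=...) and returning a string.
    When omitted, pytesseract.image_to_string is used.
    """
    if ocr is None:
        # Imported lazily so this module loads even without pytesseract.
        import pytesseract
        ocr = pytesseract.image_to_string
    return ocr(str(image_path), lang=lang).strip()
```

A view handling the upload form could then call something like `extract_text(path_to_uploaded_file, lang="por")`, passing the language the user selected, provided that language is installed in tessdata.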

Screenshots

Home Page

Image Zoom
