Qwen2VL-OCR

This web application lets users upload images, extract Hindi and English text with the pre-trained vision-language model Qwen2-VL, and search the extracted text for specific keywords. The user interface is built with Gradio.
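The extraction step can be sketched with the Hugging Face transformers Qwen2-VL API. The checkpoint name, prompt wording, and helper names below are assumptions for illustration, not necessarily what app.py uses:

```python
def build_messages():
    # Chat-template request asking the model to transcribe all visible text.
    # The prompt wording here is an assumption, not the one used in app.py.
    return [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract all Hindi and English text from this image."},
        ],
    }]

def extract_text(image):
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model_id = "Qwen/Qwen2-VL-2B-Instruct"  # assumed checkpoint size
    processor = AutoProcessor.from_pretrained(model_id)
    model = Qwen2VLForConditionalGeneration.from_pretrained(model_id)  # CPU by default

    prompt = processor.apply_chat_template(build_messages(), add_generation_prompt=True)
    inputs = processor(text=[prompt], images=[image], return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=512)
    # Keep only the newly generated tokens, i.e. the transcription itself.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```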

Screenshot

Project Details

Key Features:

  • OCR: Extracts Hindi and English text from images using the Qwen2-VL model.
  • Keyword Search: Search for keywords within the extracted text.
  • Multilingual Support: Works with images containing Hindi and English text.
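The keyword search listed above can be as simple as a case-insensitive scan over the extracted text. The function below is an illustrative sketch, not the code in app.py:

```python
import re

def search_keyword(text: str, keyword: str) -> str:
    """Wrap every case-insensitive match of `keyword` in **...** for highlighting."""
    if not keyword.strip():
        return text
    # re.escape lets the user search for literal strings, including punctuation.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda m: f"**{m.group(0)}**", text)
```

Because the match is on literal Unicode characters, this works for Devanagari text as well (case-insensitivity is simply a no-op for scripts without case).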

Technology Stack:

  • Gradio: Provides the user interface for image uploading and interaction.
  • Transformers: Handles the model loading and text extraction process.
  • Torch: Used for computation on the CPU during inference.

Setup Instructions

Prerequisites

Ensure you have the following installed:

  • Python 3.8 or later
  • Virtualenv or any Python environment management tool
  • pip (Python package manager)

Installation

  1. Clone the repository:

    Open your terminal and run:

    git clone https://github.com/SwekeR-463/Qwen2VL-OCR.git
    cd Qwen2VL-OCR
  2. Create a virtual environment:

    Use virtualenv or venv to create an isolated environment:

    python -m venv venv
  3. Activate the virtual environment:

    On Windows:

    venv\Scripts\activate

    On Mac/Linux:

    source venv/bin/activate
  4. Install the dependencies:

    After activating the virtual environment, install the required Python packages:

    pip install -r requirements.txt
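For reference, a minimal requirements.txt for this stack might look like the following. Only gradio, transformers, and torch are named in this README; pillow is an assumption for image handling, and the repository's actual file may pin versions or list more packages:

```
gradio
transformers
torch
pillow
```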

Running the Web App locally

Once the environment is set up and dependencies are installed, you can run the application locally:

  1. Start the Gradio App:

    In the project directory, run:

    python app.py
  2. Open the Application:

    Once the app starts, it will display a URL (e.g., http://127.0.0.1:7860). Open this URL in your browser to access the web application interface.

  3. Using the App:

    • Upload Image: Click on "Upload an Image" and select an image file (JPEG, JPG, PNG).
    • Keyword Search: Enter the keyword to search for within the extracted text.
    • Results: The app displays the extracted text, highlights matches of the search keyword, and shows a JSON output of the extraction.
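The JSON output mentioned in the last step could be assembled along these lines; the field names here are illustrative assumptions, not the schema app.py actually emits:

```python
import json

def build_result(extracted_text, keyword, matches):
    # Illustrative schema: the keys emitted by the real app may differ.
    result = {
        "extracted_text": extracted_text,
        "keyword": keyword,
        "match_count": len(matches),
        "matches": matches,
    }
    # ensure_ascii=False keeps Devanagari characters readable in the output.
    return json.dumps(result, ensure_ascii=False, indent=2)
```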

Deployment on Hugging Face Spaces

You can deploy this application easily on Hugging Face Spaces, which supports Gradio applications natively. Follow these steps:

  1. Create a Hugging Face Account:

    Sign up or log in to Hugging Face.

  2. Create a New Space:

    From your Hugging Face dashboard, create a new Space and select Gradio as the SDK.

  3. Set Up the Repository:

    • Set up the Space by connecting your GitHub repository or uploading the files manually.
    • Make sure to include the requirements.txt file to install the necessary dependencies (e.g., gradio, transformers, torch).
  4. Add the Required Files: Ensure the following files are in the repository:

    • app.py: The main Python script running the Gradio app.
    • requirements.txt: The list of dependencies.

    Hugging Face will automatically install the dependencies from requirements.txt.

  5. Push Your Changes:

    • Once the repository is set up, commit and push your changes to Hugging Face.
    • Hugging Face Spaces will automatically build and deploy your app.
  6. Access Your Space:

    Once the build completes, the app will be live at your Space's public URL.
