This web application lets users upload images, extract Hindi and English text from them with a pre-trained vision-language model (Qwen2-VL), and search the extracted text for specific keywords. The user interface is built with Gradio.
- OCR: Extracts text in English, Sanskrit, and Hindi from images using the Qwen2-VL model.
- Keyword Search: Search for keywords within the extracted text.
- Multilingual Support: Works for images containing Hindi and English text.
- Gradio: Provides the user interface for image uploading and interaction.
- Transformers: Handles the model loading and text extraction process.
- Torch: Used for computation on the CPU during inference.
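As an illustration of the keyword search feature listed above, here is a minimal sketch in Python. It is not taken from app.py: the function name `find_keyword` and the choice of case-insensitive, literal matching are assumptions made for this example, and the real app may search differently.

```python
import re
from typing import List


def find_keyword(text: str, keyword: str) -> List[int]:
    """Return the start offset of every case-insensitive match of keyword in text.

    Hypothetical helper for illustration; not part of the repository.
    """
    keyword = keyword.strip()
    if not keyword:
        # An empty keyword would match at every position, so treat it as "no search".
        return []
    # re.escape makes the keyword literal, so "." or "?" are not read as regex syntax.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return [match.start() for match in pattern.finditer(text)]


if __name__ == "__main__":
    sample = "नमस्ते दुनिया. Hello world, hello again."
    print(find_keyword(sample, "hello"))   # offsets of both English matches
    print(find_keyword(sample, "दुनिया"))    # offset of the Hindi match
```

Because Python strings are Unicode, the same function handles Devanagari and Latin text without special casing.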
Ensure you have the following installed:
- Python 3.8 or later
- Virtualenv or any Python environment management tool
- pip (Python package manager)
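The prerequisites above can be checked from Python itself. The following is a small sketch, not a script shipped with the repository; the helper names `python_ok` and `pip_available` are made up for this example.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 8)  # minimum version stated in the prerequisites


def python_ok(version=None) -> bool:
    """Return True if the given (or running) interpreter is at least MIN_PYTHON."""
    if version is None:
        version = sys.version_info
    return tuple(version[:2]) >= MIN_PYTHON


def pip_available() -> bool:
    """Return True if the pip module can be found by the running interpreter."""
    return importlib.util.find_spec("pip") is not None


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("pip available:", pip_available())
```

Run it with the same interpreter you plan to use for the virtual environment, since different Python installations on one machine can give different answers.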
- Clone the repository:
  Open your terminal and run:
  git clone https://github.com/SwekeR-463/Qwen2VL-OCR.git
  cd Qwen2VL-OCR
- Create a virtual environment:
  Use virtualenv or venv to create an isolated environment:
  python -m venv venv
- Activate the virtual environment:
  On Windows:
  venv\Scripts\activate
  On Mac/Linux:
  source venv/bin/activate
- Install the Dependencies:
  After activating the virtual environment, install the required Python packages:
  pip install -r requirements.txt
Once the environment is set up and dependencies are installed, you can run the application locally:
- Start the Gradio App:
  In the project directory, run:
  python app.py
- Open the Application:
  Once the app starts, it will display a URL (e.g., http://127.0.0.1:7860). Open this URL in your browser to access the web application interface.
- Using the App:
  - Upload Image: Click on "Upload an Image" and select an image file (JPG/JPEG or PNG).
  - Keyword Search: Enter the keyword to search for within the extracted text.
  - Results: The app will display the extracted text, highlight search results, and show a JSON output of the extraction.
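The Results step above mentions highlighted matches and a JSON output. The sketch below shows one way such a result could be assembled. It is an assumption-laden illustration rather than the app's actual code: the function name `build_result`, the JSON field names, and the use of `**` as a highlight marker are all hypothetical.

```python
import json
import re


def build_result(extracted_text: str, keyword: str) -> dict:
    """Bundle extracted text, a highlighted copy, and a match count into one dict.

    Hypothetical structure for illustration; the real app's JSON schema may differ.
    """
    keyword = keyword.strip()
    highlighted = extracted_text
    count = 0
    if keyword:
        pattern = re.compile(re.escape(keyword), re.IGNORECASE)
        # subn returns the rewritten string and the number of replacements made.
        highlighted, count = pattern.subn(
            lambda m: f"**{m.group(0)}**", extracted_text
        )
    return {
        "extracted_text": extracted_text,
        "keyword": keyword,
        "match_count": count,
        "highlighted_text": highlighted,
    }


if __name__ == "__main__":
    result = build_result("Invoice total: 450 rupees. कुल राशि: 450", "450")
    # ensure_ascii=False keeps Hindi characters readable instead of \u escapes.
    print(json.dumps(result, ensure_ascii=False, indent=2))
```

Passing `ensure_ascii=False` matters for this use case, because otherwise `json.dumps` escapes every Devanagari character and the output becomes hard to read.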
You can deploy this application easily on Hugging Face Spaces, which supports Gradio applications natively. Follow these steps:
- Create a Hugging Face Account:
  Sign up or log in to Hugging Face.
- Create a New Space:
  - Navigate to the [Spaces](https://huggingface.co/spaces) section.
  - Click on "Create a Space."
  - Choose Gradio as the template for your Space.
- Set Up the Repository:
  - Add the project to the Space, either by connecting your GitHub repository or by uploading the files manually.
  - Make sure to include the requirements.txt file so the necessary dependencies (e.g., gradio, transformers, torch) are installed.
- Add the Required Files: Ensure the following files are in the repository:
  - app.py: The main Python script running the Gradio app.
  - requirements.txt: The list of dependencies.
  Hugging Face will automatically install the dependencies from requirements.txt.
- Push Your Changes:
  - Once the repository is set up, commit and push your changes to Hugging Face.
  - Hugging Face Spaces will automatically build and deploy your app.
- Access Your Space:
  - After a successful build, you will get a URL for your Space (e.g., https://huggingface.co/spaces/yourusername/your-space-name).
  - Open this link in your browser to use the deployed application.
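Before pushing to a Space, it can help to confirm that the required files named in the steps above are present. The following is a rough sketch of such a check, not a tool provided by the repository or by Hugging Face. The helper name `check_space_files` is hypothetical, and the expected package list is simply the example given above (gradio, transformers, torch); the project's real requirements.txt may list more.

```python
import re
from pathlib import Path
from typing import List

REQUIRED_FILES = ("app.py", "requirements.txt")
# Assumption: taken from the example packages mentioned in the deployment steps.
EXPECTED_PACKAGES = ("gradio", "transformers", "torch")


def check_space_files(repo_dir: str) -> List[str]:
    """Return a list of problems found in repo_dir; an empty list means all checks passed."""
    root = Path(repo_dir)
    problems = [
        f"missing file: {name}"
        for name in REQUIRED_FILES
        if not (root / name).is_file()
    ]
    requirements = root / "requirements.txt"
    if requirements.is_file():
        declared = set()
        for line in requirements.read_text(encoding="utf-8").splitlines():
            line = line.split("#", 1)[0].strip()  # drop comments and whitespace
            if line:
                # Keep only the package name, dropping version pins and extras.
                name = re.split(r"[<>=!~\[; ]", line, maxsplit=1)[0]
                declared.add(name.lower())
        problems += [
            f"requirements.txt does not list: {package}"
            for package in EXPECTED_PACKAGES
            if package not in declared
        ]
    return problems


if __name__ == "__main__":
    issues = check_space_files(".")
    print("OK" if not issues else "\n".join(issues))
```

The parser is deliberately simple and only understands plain `name` or `name==version` style lines; entries such as `-r other.txt` or direct URLs would need extra handling.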