Qwen2VL-OCR

This web application lets users upload images, extract Hindi and English text with the pre-trained vision-language model Qwen2-VL, and search the extracted text for specific keywords. The user interface is built with Gradio.
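The extraction step can be sketched with the Hugging Face transformers Qwen2-VL API. The checkpoint name, prompt wording, and helper names below are assumptions for illustration, not necessarily what app.py uses:

```python
def build_messages():
    # Chat-template request asking the model to transcribe all visible text.
    # The prompt wording here is an assumption, not the one used in app.py.
    return [{
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Extract all Hindi and English text from this image."},
        ],
    }]

def extract_text(image):
    # Imported lazily so the sketch can be read without transformers installed.
    from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

    model_id = "Qwen/Qwen2-VL-2B-Instruct"  # assumed checkpoint size
    processor = AutoProcessor.from_pretrained(model_id)
    model = Qwen2VLForConditionalGeneration.from_pretrained(model_id)  # CPU by default

    prompt = processor.apply_chat_template(build_messages(), add_generation_prompt=True)
    inputs = processor(text=[prompt], images=[image], return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=512)
    # Keep only the newly generated tokens, i.e. the transcription itself.
    new_tokens = output_ids[:, inputs["input_ids"].shape[1]:]
    return processor.batch_decode(new_tokens, skip_special_tokens=True)[0]
```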

Screenshot

Project Details

Key Features:

  • OCR: Extracts Hindi and English text from images using the Qwen2-VL model.
  • Keyword Search: Search for keywords within the extracted text.
  • Multilingual Support: Works with images containing Hindi and English text.
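The keyword search listed above can be as simple as a case-insensitive scan over the extracted text. The function below is an illustrative sketch, not the code in app.py:

```python
import re

def search_keyword(text: str, keyword: str) -> str:
    """Wrap every case-insensitive match of `keyword` in **...** for highlighting."""
    if not keyword.strip():
        return text
    # re.escape lets the user search for literal strings, including punctuation.
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    return pattern.sub(lambda m: f"**{m.group(0)}**", text)
```

Because the match is on literal Unicode characters, this works for Devanagari text as well (case-insensitivity is simply a no-op for scripts without case).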

Technology Stack:

  • Gradio: Provides the user interface for image uploading and interaction.
  • Transformers: Handles the model loading and text extraction process.
  • Torch: Used for computation on the CPU during inference.

Setup Instructions

Prerequisites

Ensure you have the following installed:

  • Python 3.8 or later
  • Virtualenv or any Python environment management tool
  • pip (Python package manager)

Installation

  1. Clone the repository:

    Open your terminal and run:

    git clone https://github.com/SwekeR-463/Qwen2VL-OCR.git
    cd Qwen2VL-OCR
  2. Create a virtual environment:

    Use virtualenv or venv to create an isolated environment:

    python -m venv venv
  3. Activate the virtual environment:

    On Windows:

    venv\Scripts\activate

    On Mac/Linux:

    source venv/bin/activate
  4. Install the dependencies:

    After activating the virtual environment, install the required Python packages:

    pip install -r requirements.txt
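For reference, a minimal requirements.txt for this stack might look like the following. Only gradio, transformers, and torch are named in this README; pillow is an assumption for image handling, and the repository's actual file may pin versions or list more packages:

```
gradio
transformers
torch
pillow
```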

Running the Web App locally

Once the environment is set up and dependencies are installed, you can run the application locally:

  1. Start the Gradio App:

    In the project directory, run:

    python app.py
  2. Open the Application:

    Once the app starts, it will display a URL (e.g., http://127.0.0.1:7860). Open this URL in your browser to access the web application interface.

  3. Using the App:

    • Upload Image: Click on "Upload an Image" and select an image file (JPEG, JPG, PNG).
    • Keyword Search: Enter the keyword to search for within the extracted text.
    • Results: The app displays the extracted text, highlights matches of the search keyword, and shows a JSON output of the extraction.
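The JSON output mentioned in the last step could be assembled along these lines; the field names here are illustrative assumptions, not the schema app.py actually emits:

```python
import json

def build_result(extracted_text, keyword, matches):
    # Illustrative schema: the keys emitted by the real app may differ.
    result = {
        "extracted_text": extracted_text,
        "keyword": keyword,
        "match_count": len(matches),
        "matches": matches,
    }
    # ensure_ascii=False keeps Devanagari characters readable in the output.
    return json.dumps(result, ensure_ascii=False, indent=2)
```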

Deployment on Hugging Face Spaces

You can deploy this application easily on Hugging Face Spaces, which supports Gradio applications natively. Follow these steps:

  1. Create a Hugging Face Account:

    Sign up or log in to Hugging Face.

  2. Create a New Space:

    From your Hugging Face dashboard, create a new Space and select Gradio as the SDK.

  3. Set Up the Repository:

    • Set up the Space by connecting your GitHub repository or uploading the files manually.
    • Make sure to include the requirements.txt file to install the necessary dependencies (e.g., gradio, transformers, torch).
  4. Add the Required Files: Ensure the following files are in the repository:

    • app.py: The main Python script running the Gradio app.
    • requirements.txt: The list of dependencies.

    Hugging Face will automatically install the dependencies from requirements.txt.

  5. Push Your Changes:

    • Once the repository is set up, commit and push your changes to Hugging Face.
    • Hugging Face Spaces will automatically build and deploy your app.
  6. Access Your Space:

    Once the build completes, the app will be live at your Space's public URL.
