CapGen is a web application that generates captions for uploaded images in real time using the BLIP (Bootstrapping Language-Image Pre-training) model, available through Hugging Face. Built with Streamlit, it offers a clean, interactive interface for AI-powered image captioning.
- Upload images and generate descriptive captions instantly
- Uses BLIP, a pretrained vision-language model for image captioning
- Fully interactive Streamlit web interface
- Clean and polished UI suitable for portfolios or demos
- Runs inference locally, with no external captioning API required
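
The captioning step described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the checkpoint name `Salesforce/blip-image-captioning-base` is an assumption (the README does not say which BLIP checkpoint is used), and `load_blip`, `tidy_caption`, and `generate_caption` are hypothetical helper names.

```python
from functools import lru_cache

# Assumption: the README does not name a checkpoint; this public BLIP
# captioning checkpoint on the Hugging Face Hub is used for illustration.
CHECKPOINT = "Salesforce/blip-image-captioning-base"


@lru_cache(maxsize=1)
def load_blip(checkpoint: str = CHECKPOINT):
    """Load the BLIP processor and model once and reuse them."""
    # Imported lazily so this module can be imported before the heavy
    # dependencies (transformers, torch) are actually needed.
    from transformers import BlipForConditionalGeneration, BlipProcessor

    processor = BlipProcessor.from_pretrained(checkpoint)
    model = BlipForConditionalGeneration.from_pretrained(checkpoint)
    model.eval()  # inference only
    return processor, model


def tidy_caption(raw: str) -> str:
    """Collapse whitespace, capitalize the first letter, end with a period."""
    text = " ".join(raw.split())
    if not text:
        return ""
    text = text[0].upper() + text[1:]
    return text if text[-1] in ".!?" else text + "."


def generate_caption(image_path: str, max_new_tokens: int = 30) -> str:
    """Caption a single image file with BLIP (hypothetical helper)."""
    import torch
    from PIL import Image

    processor, model = load_blip()
    image = Image.open(image_path).convert("RGB")  # BLIP expects RGB input
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():  # no gradients needed for inference
        output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    raw = processor.decode(output_ids[0], skip_special_tokens=True)
    return tidy_caption(raw)
```

Calling `generate_caption("photo.jpg")` (the file name is a placeholder) would return a one-sentence caption. Inference runs on the local machine, though `from_pretrained` typically downloads and caches the model weights the first time it is called.
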
| Tool | Purpose |
|---|---|
| Python | Core programming language |
| Streamlit | Front-end for the app |
| Hugging Face | BLIP model and inference |
| PyTorch (`torch`) | Deep learning backend |
| Pillow (PIL) | Image loading and processing |
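
As a rough illustration of how the Streamlit front end and Pillow from the table above might fit together, the sketch below renders an upload widget and displays a caption. It is an assumption about the wiring, not the contents of the project's script: `run_app` is a hypothetical name, and `captioner` stands in for whatever function turns a PIL image into a caption string.

```python
from typing import Callable


def run_app(captioner: Callable[[object], str]) -> None:
    """Render an upload widget and show a caption for the chosen image."""
    # Lazy import: Streamlit is only needed when the page is rendered.
    import streamlit as st

    st.title("CapGen")
    uploaded = st.file_uploader("Upload an image", type=["jpg", "jpeg", "png"])
    if uploaded is None:
        return  # nothing to caption until the user picks a file

    from PIL import Image

    # The uploaded file is a file-like object, so Pillow can open it directly.
    image = Image.open(uploaded).convert("RGB")
    st.image(image, caption="Uploaded image")
    with st.spinner("Generating caption..."):
        caption = captioner(image)
    st.success(caption)
```

In a script launched with `streamlit run`, calling `run_app(my_captioner)` at module level would render the page; `my_captioner` is a placeholder for a function that returns the caption. Passing the captioner in as an argument keeps the UI code separate from model loading, which makes it easy to try the page with a stub before the model is available.
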

    git clone https://github.com/AaryanAgrawal96/image-caption-generator-blip.git
    cd image-caption-generator-blip
    pip install -r requirements.txt
    streamlit run generate_caption.py

Aaryan Agrawal
BTech in Computer Science and Engineering, IIT Jodhpur