Here's the link to the tweet where I explain how it works
Would appreciate a github star if you're reading this :)
Instructions for anyone wanting to make PRs/additions: Bottom of this readme contains a couple feature suggestions + ways to expand this for anyone coming across this.
For any new features that you want to add, the current application is set up as a command line "game" with various modes. For ex, "option 1" for single sonar, "option 2" for triple check mode, etc.
If you propose any changes, edit the core/app.py file to modify the workflow and route it to your "mode."
Get search grounded, up-to-date, accurate responses for any multiple choice trivia question within seconds (less than 3 on average). It combines the power of multiple AI models to give you the best possible answer under any timed based trivia game.
- Real-time capture and analysis: Point your camera at the question and get instant results
- Triple-check mode: Cross-references answers from three different AI models:
- OpenAI's GPT-4-Turbo
- Perplexity's Sonar Pro
- Perplexity's Sonar
- Continuous capture: Keep your camera running for seamless question-to-question transitions
- Multi-camera support: Select from available webcams on your device
- On-screen results: View answers directly in the camera feed
This application demonstrates several software engineering principles and technologies:
- Clean Architecture: Separation of concerns with distinct layers for UI, business logic, and data
- SOLID Principles: Single responsibility, dependency injection, and interface segregation
- Concurrent Processing: Parallel API calls using ThreadPoolExecutor for optimal performance
- Real-time Computer Vision: OpenCV integration for camera feeds and image processing
- Cloud AI Integration: Multiple AI service APIs orchestrated in a single application
┌─────────────┐ ┌───────────────┐ ┌──────────────┐
│ UI │────▶│ Application │────▶│ AI Services │
│ (OpenCV) │◀────│ Core │◀────│ (API Calls) │
└─────────────┘ └───────────────┘ └──────────────┘
│
▼
┌──────────────┐
│ OCR │
│ Services │
└──────────────┘
- Python 3.8+: Core programming language
- OpenCV: Camera interfacing and image processing
- Google Cloud Vision API: Optical Character Recognition
- API Integrations: OpenAI API, Perplexity API
- Concurrent Processing: Python's ThreadPoolExecutor
- Environment Management: python-dotenv for configuration
robbinhood/
├── main.py # Entry point and application bootstrap
├── config.py # Configuration management
├── camera/ # Camera abstraction layer
│ ├── __init__.py
│ └── camera_manager.py # Camera operations and frame capture
├── ocr/ # Text extraction services
│ ├── __init__.py
│ └── ocr_processor.py # OCR processing with Google Vision
├── ai/ # AI model interfaces
│ ├── __init__.py
│ ├── base_processor.py # Abstract base class for AI models
│ ├── perplexity.py # Perplexity API integration
│ └── gpt4.py # OpenAI GPT-4 integration
├── ui/ # User interface components
│ ├── __init__.py
│ ├── display.py # Display management
│ └── renderer.py # Text and overlay rendering
└── core/ # Core application logic
├── __init__.py
└── app.py # Main application workflows
- Factory Pattern: For creating AI processors
- Strategy Pattern: Different AI models implement the same interface
- Dependency Injection: Components receive their dependencies
- Observer Pattern: UI updated as results become available
- Python 3.8+
- Webcam
- API keys:
- Google Cloud Vision API (for OCR)
- Perplexity API
- OpenAI API
- Clone the repository:
git clone https://github.com/vikvang/robbinghood.git
cd robbinhood- Install the required packages:
pip install -r requirements.txt- Create a
.envfile in the project directory with your API keys:
PERPLEXITY_API_KEY=your_perplexity_api_key
OPENAI_API_KEY=your_openai_api_key
GOOGLE_CREDENTIALS_PATH=path/to/your/google_credentials.json
GEMINI_API_KEY = your_gemini_api_key
- Set up Google Cloud Vision API:
- Create a project in the Google Cloud Console
- Enable the Vision API
- Create a service account and download the JSON credentials file
- Set the path to this file in your
.envfile
Run the program:
python main.py- Parallel Processing: AI model requests run concurrently for maximum speed
- Non-blocking UI: User interface remains responsive during processing
- Optimized OCR: Google Vision API provides high-quality text extraction
- Memory Management: Temporary images are properly cleaned up
The modular architecture makes it easy to:
- Add new AI models by implementing the BaseAIProcessor interface
- Support alternative OCR engines by creating new OCR processor classes
- Create custom UI visualizations by extending the renderer
- Add new processing modes to the application core