
vikvang/robbinghood_live


RobbinHood: a Perplexity-powered trivia assistant.

Here's the link to the tweet where I explain how it works

Would appreciate a GitHub star if you're reading this :)

Instructions for anyone wanting to make PRs/additions: the bottom of this README lists a few feature suggestions and ways to expand the project. The current application is set up as a command-line "game" with several modes, e.g. "option 1" for single Sonar and "option 2" for triple-check mode. To add a feature, edit core/app.py to modify the workflow and route it to your new "mode."
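As a rough sketch of what routing a new mode could look like, assuming core/app.py maps menu options to handler functions (the handler names below are hypothetical placeholders, not the actual functions in the repo), a dictionary dispatch keeps each mode to a single entry:

```python
from typing import Callable, Dict


# Hypothetical mode handlers; the real ones live in core/app.py and are not
# shown in this README, so these names and return values are placeholders.
def run_single_sonar() -> str:
    return "single sonar mode"


def run_triple_check() -> str:
    return "triple check mode"


def run_my_new_mode() -> str:  # your contribution would go here
    return "my new mode"


# Map menu options to handlers: adding a mode means adding one entry.
MODES: Dict[str, Callable[[], str]] = {
    "1": run_single_sonar,
    "2": run_triple_check,
    "3": run_my_new_mode,
}


def route(choice: str) -> str:
    """Run the handler for the selected menu option."""
    handler = MODES.get(choice.strip())
    if handler is None:
        raise ValueError(f"Unknown option: {choice!r}")
    return handler()
```

With this shape, `route("2")` runs the triple-check handler, and an unknown option raises a ValueError instead of silently doing nothing.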

Get search-grounded, up-to-date, accurate responses to any multiple-choice trivia question within seconds (under 3 seconds on average). It combines multiple AI models to give you the best possible answer in any timed trivia game.

Features

  • Real-time capture and analysis: Point your camera at the question and get instant results
  • Triple-check mode: Cross-references answers from three different AI models:
    • OpenAI's GPT-4-Turbo
    • Perplexity's Sonar Pro
    • Perplexity's Sonar
  • Continuous capture: Keep your camera running for seamless question-to-question transitions
  • Multi-camera support: Select from available webcams on your device
  • On-screen results: View answers directly in the camera feed
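One way triple-check mode could cross-reference the three answers is a simple majority vote. This is a minimal sketch, assuming each model returns free text that has to be mapped onto one of the on-screen options; `normalize` and `triple_check` are illustrative names, and the matching heuristic (a bare letter, else the first option mentioned) is an assumption rather than the repo's actual logic:

```python
import re
from collections import Counter
from typing import Dict, Optional, Sequence


def normalize(reply: str, options: Sequence[str]) -> Optional[str]:
    """Map a free-text model reply onto one of the option strings, or None."""
    text = reply.strip().lower()
    # A bare letter such as "B", "b)" or "(b)" is treated as an index into the options.
    match = re.fullmatch(r"\(?([a-z])[\).:]?", text)
    if match:
        index = ord(match.group(1)) - ord("a")
        if 0 <= index < len(options):
            return options[index]
    # Otherwise return the first option (in list order) whose text appears in the reply.
    for option in options:
        if option.lower() in text:
            return option
    return None


def triple_check(replies: Dict[str, str], options: Sequence[str]) -> Optional[str]:
    """Return the option a strict majority of models agree on, else None."""
    votes = Counter(
        choice
        for choice in (normalize(reply, options) for reply in replies.values())
        if choice is not None
    )
    if not votes:
        return None
    choice, count = votes.most_common(1)[0]
    # With three models this means at least 2 of 3 must agree.
    return choice if count > len(replies) / 2 else None
```

Returning None when there is no majority lets the UI flag a low-confidence question instead of showing a coin flip.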

Technical Overview

This application demonstrates several software engineering principles and technologies:

  • Clean Architecture: Separation of concerns with distinct layers for UI, business logic, and data
  • SOLID Principles: Single responsibility, dependency inversion, and interface segregation
  • Concurrent Processing: Parallel API calls using ThreadPoolExecutor for optimal performance
  • Real-time Computer Vision: OpenCV integration for camera feeds and image processing
  • Cloud AI Integration: Multiple AI service APIs orchestrated in a single application
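The parallel API calls can be sketched with `concurrent.futures.ThreadPoolExecutor`. Because each request spends most of its time waiting on the network, threads are enough to overlap them. The `fake_model` stand-ins below are hypothetical and simulate latency with `time.sleep` instead of calling OpenAI or Perplexity:

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict


def fake_model(name: str, delay: float) -> Callable[[str], str]:
    """Hypothetical stand-in for an API client; sleeps to simulate network latency."""
    def ask(question: str) -> str:
        time.sleep(delay)
        return f"{name} answer to: {question}"
    return ask


def ask_all(question: str, models: Dict[str, Callable[[str], str]],
            timeout: float = 10.0) -> Dict[str, str]:
    """Send the same question to every model at once and collect the replies."""
    results: Dict[str, str] = {}
    with ThreadPoolExecutor(max_workers=max(1, len(models))) as pool:
        futures = {pool.submit(fn, question): name for name, fn in models.items()}
        # as_completed yields each future as soon as it finishes; it raises
        # TimeoutError if not all of them are done within `timeout` seconds.
        for future in as_completed(futures, timeout=timeout):
            name = futures[future]
            try:
                results[name] = future.result()
            except Exception as exc:  # one failing API should not sink the others
                results[name] = f"error: {exc}"
    return results
```

With three 0.3 s calls, the total wait is close to 0.3 s rather than 0.9 s, which is where most of the speedup comes from.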

Architecture

┌─────────────┐     ┌───────────────┐     ┌──────────────┐
│     UI      │────▶│  Application  │────▶│ AI Services  │
│  (OpenCV)   │◀────│     Core      │◀────│ (API Calls)  │
└─────────────┘     └───────────────┘     └──────────────┘
                           │
                           ▼
                    ┌──────────────┐
                    │     OCR      │
                    │   Services   │
                    └──────────────┘
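The flow in the diagram can be expressed with constructor-injected dependencies. In this sketch, the `OCRService`/`AIService` protocols, `ApplicationCore`, and the method names are assumptions, not the repo's actual classes; the point is that the core never builds its own services, so tests can pass in stubs:

```python
from dataclasses import dataclass
from typing import Callable, List, Protocol


class OCRService(Protocol):
    def extract_text(self, image: bytes) -> str: ...


class AIService(Protocol):
    name: str

    def answer(self, question: str) -> str: ...


@dataclass
class ApplicationCore:
    """Sits between the UI and the services; every dependency is injected."""
    ocr: OCRService
    ai_services: List[AIService]
    on_result: Callable[[str, str], None]  # UI callback: (model name, answer)

    def handle_frame(self, image: bytes) -> None:
        question = self.ocr.extract_text(image)  # Core -> OCR Services
        # Called one after another here for brevity; the README describes
        # running the AI requests in parallel.
        for service in self.ai_services:  # Core -> AI Services
            self.on_result(service.name, service.answer(question))  # Core -> UI


# Hypothetical stubs used in place of Google Vision and the real model APIs.
class StubOCR:
    def extract_text(self, image: bytes) -> str:
        return image.decode("utf-8")


class EchoAI:
    name = "echo"

    def answer(self, question: str) -> str:
        return question.upper()
```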

Technologies Used

  • Python 3.8+: Core programming language
  • OpenCV: Camera interfacing and image processing
  • Google Cloud Vision API: Optical Character Recognition
  • API Integrations: OpenAI API, Perplexity API
  • Concurrent Processing: Python's ThreadPoolExecutor
  • Environment Management: python-dotenv for configuration

Code Structure

robbinhood/
├── main.py                 # Entry point and application bootstrap
├── config.py               # Configuration management
├── camera/                 # Camera abstraction layer
│   ├── __init__.py
│   └── camera_manager.py   # Camera operations and frame capture
├── ocr/                    # Text extraction services
│   ├── __init__.py
│   └── ocr_processor.py    # OCR processing with Google Vision
├── ai/                     # AI model interfaces
│   ├── __init__.py
│   ├── base_processor.py   # Abstract base class for AI models
│   ├── perplexity.py       # Perplexity API integration
│   └── gpt4.py             # OpenAI GPT-4 integration
├── ui/                     # User interface components
│   ├── __init__.py
│   ├── display.py          # Display management
│   └── renderer.py         # Text and overlay rendering
└── core/                   # Core application logic
    ├── __init__.py
    └── app.py              # Main application workflows
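The README doesn't show `base_processor.py`, but an abstract base class for the AI models could look roughly like this. `BaseAIProcessor` is the name used later in this README; the `name` property, `answer` method, `build_prompt` helper, and the toy `FirstOptionProcessor` subclass are assumptions for illustration:

```python
from abc import ABC, abstractmethod
from typing import Sequence


class BaseAIProcessor(ABC):
    """Common interface for every model wrapper in ai/ (method names are assumed)."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    @property
    @abstractmethod
    def name(self) -> str:
        """Label shown next to this model's answer in the UI."""

    @abstractmethod
    def answer(self, question: str, options: Sequence[str]) -> str:
        """Return the option the model picks for a multiple-choice question."""

    def build_prompt(self, question: str, options: Sequence[str]) -> str:
        # Shared helper so subclasses reuse one prompt format instead of duplicating it.
        lettered = "\n".join(f"{chr(ord('A') + i)}. {opt}" for i, opt in enumerate(options))
        return f"{question}\n{lettered}\nReply with the letter of the correct option only."


class FirstOptionProcessor(BaseAIProcessor):
    """Toy subclass for tests; a real one would send build_prompt() to an API."""

    @property
    def name(self) -> str:
        return "first-option"

    def answer(self, question: str, options: Sequence[str]) -> str:
        return options[0]
```

Because `BaseAIProcessor` is abstract, forgetting to implement `answer` or `name` in a new model fails at construction time rather than mid-game.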

Design Patterns Used

  • Factory Pattern: For creating AI processors
  • Strategy Pattern: Different AI models implement the same interface
  • Dependency Injection: Components receive their dependencies
  • Observer Pattern: UI updated as results become available
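As an illustration of how the factory, strategy, and observer patterns could fit together (all class and function names here are hypothetical, and the `answer` methods return placeholder strings instead of calling real APIs):

```python
from typing import Callable, Dict, List, Type


class Processor:
    """Strategy interface: every model exposes the same answer() method."""
    name = "base"

    def answer(self, question: str) -> str:
        raise NotImplementedError


class SonarProcessor(Processor):
    name = "sonar"

    def answer(self, question: str) -> str:
        return f"sonar: {question}"  # placeholder for a Perplexity call


class GPT4Processor(Processor):
    name = "gpt4"

    def answer(self, question: str) -> str:
        return f"gpt4: {question}"  # placeholder for an OpenAI call


# Factory: callers ask for a model by name instead of importing concrete classes.
_REGISTRY: Dict[str, Type[Processor]] = {cls.name: cls for cls in (SonarProcessor, GPT4Processor)}


def create_processor(name: str) -> Processor:
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"Unknown model {name!r}; choose from {sorted(_REGISTRY)}") from None


class ResultFeed:
    """Observer: the UI subscribes and is notified as each answer arrives."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, str], None]] = []

    def subscribe(self, callback: Callable[[str, str], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, model: str, answer: str) -> None:
        for callback in self._subscribers:
            callback(model, answer)
```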

Requirements

  • Python 3.8+
  • Webcam
  • API keys:
    • Google Cloud Vision API (for OCR)
    • Perplexity API
    • OpenAI API

Installation

  1. Clone the repository:
git clone https://github.com/vikvang/robbinghood.git
cd robbinghood
  2. Install the required packages:
pip install -r requirements.txt
  3. Create a .env file in the project directory with your API keys:
PERPLEXITY_API_KEY=your_perplexity_api_key
OPENAI_API_KEY=your_openai_api_key
GOOGLE_CREDENTIALS_PATH=path/to/your/google_credentials.json
GEMINI_API_KEY=your_gemini_api_key
  4. Set up the Google Cloud Vision API:
    • Create a project in the Google Cloud Console
    • Enable the Vision API
    • Create a service account and download its JSON credentials file
    • Set GOOGLE_CREDENTIALS_PATH in your .env file to the path of this file
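A possible shape for config.py, assuming it reads the .env keys above with python-dotenv and fails fast when a required key is missing. The `Config` dataclass and `load_config` function are hypothetical names; GEMINI_API_KEY is left out because nothing else in this README says where it is used:

```python
import os
from dataclasses import dataclass

try:
    from dotenv import load_dotenv  # python-dotenv, as listed under Technologies Used
except ImportError:  # lets this sketch run even if python-dotenv is not installed
    def load_dotenv(*args, **kwargs) -> bool:
        return False

REQUIRED_KEYS = ("PERPLEXITY_API_KEY", "OPENAI_API_KEY", "GOOGLE_CREDENTIALS_PATH")


@dataclass(frozen=True)
class Config:
    perplexity_api_key: str
    openai_api_key: str
    google_credentials_path: str


def load_config() -> Config:
    # Reads ./.env into os.environ; variables already set in the environment
    # are not overridden (python-dotenv's default).
    load_dotenv(".env")
    missing = [key for key in REQUIRED_KEYS if not os.getenv(key)]
    if missing:
        raise RuntimeError(f"Missing required settings in .env: {', '.join(missing)}")
    return Config(
        perplexity_api_key=os.environ["PERPLEXITY_API_KEY"],
        openai_api_key=os.environ["OPENAI_API_KEY"],
        google_credentials_path=os.environ["GOOGLE_CREDENTIALS_PATH"],
    )
```

Failing at startup with a list of missing keys is easier to debug than an authentication error halfway through a live question.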

Usage

Run the program:

python main.py

Performance Considerations (I tried to implement the following, but there is room for improvement)

  • Parallel Processing: AI model requests run concurrently for maximum speed
  • Non-blocking UI: User interface remains responsive during processing
  • Optimized OCR: Google Vision API provides high-quality text extraction
  • Memory Management: Temporary images are properly cleaned up
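A minimal sketch of the non-blocking idea: the answer is computed on a worker thread while the display loop keeps running and only checks `Future.done()` each frame. Here the loop records overlay strings instead of drawing with OpenCV, and `slow_answer` is a hypothetical stand-in for OCR plus the model calls:

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import List, Optional


def slow_answer(question: str) -> str:
    """Hypothetical stand-in for OCR + model calls."""
    time.sleep(0.3)
    return f"answer to {question}"


def render_loop(question: str, frame_interval: float = 0.03) -> List[str]:
    """Keep 'drawing frames' while the answer is computed on a worker thread."""
    overlays: List[str] = []
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending: Optional[Future] = pool.submit(slow_answer, question)
        while pending is not None:
            if pending.done():  # poll; calling result() early would block the loop
                overlays.append(pending.result())
                pending = None
            else:
                overlays.append("thinking...")  # what the camera overlay would show
            time.sleep(frame_interval)  # in an OpenCV loop, cv2.waitKey plays this role
    return overlays
```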

Extending the Application (feature suggestions; anyone is welcome to build on top of these)

The modular architecture makes it easy to:

  • Add new AI models by implementing the BaseAIProcessor interface
  • Support alternative OCR engines by creating new OCR processor classes
  • Create custom UI visualizations by extending the renderer
  • Add new processing modes to the application core
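To support another OCR engine, the core could depend only on a small interface. This sketch assumes an `extract_text(image: bytes) -> str` method (an assumption, not the repo's confirmed signature) and a screen layout where the first non-empty line is the question and the rest are options. `GoogleVisionOCR` uses the google-cloud-vision client; `PlainTextOCR` is a toy engine for testing, and `read_question` is a hypothetical helper:

```python
from typing import List, Protocol, Tuple


class OCRProcessor(Protocol):
    """Any object with this method can be swapped in as the OCR engine."""
    def extract_text(self, image: bytes) -> str: ...


class GoogleVisionOCR:
    def __init__(self, credentials_path: str) -> None:
        self.credentials_path = credentials_path

    def extract_text(self, image: bytes) -> str:
        from google.cloud import vision  # lazy import: needs google-cloud-vision installed
        client = vision.ImageAnnotatorClient.from_service_account_file(self.credentials_path)
        response = client.text_detection(image=vision.Image(content=image))
        # The first annotation holds the full detected text block.
        return response.text_annotations[0].description if response.text_annotations else ""


class PlainTextOCR:
    """Toy engine for tests: treats the 'image' bytes as UTF-8 text."""
    def extract_text(self, image: bytes) -> str:
        return image.decode("utf-8")


def read_question(ocr: OCRProcessor, image: bytes) -> Tuple[str, List[str]]:
    """Split OCR output into (question, options); works with any engine."""
    lines = [line.strip() for line in ocr.extract_text(image).splitlines() if line.strip()]
    if not lines:
        return "", []
    return lines[0], lines[1:]
```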

About

Perplexity-powered AI assistant for time-based trivia games
