manit2004/build2break

AI vs Real Image Detection System

An image detection system that classifies images as AI-generated or real. It uses traditional machine learning models trained on hand-engineered image features and provides a Streamlit web interface.

🚀 Quick Start

# 1. Clone and setup
git clone <repository-url>
cd build2break

# 2. Setup AI detector system
cd ai_detector
python -m venv ai_detector_env
source ai_detector_env/bin/activate  # On Windows: ai_detector_env\Scripts\activate
pip install -r requirements.txt

# 3. Train models (optional - pre-trained models included)
python train.py

# 4. Run web app
cd ../ai_detector_app
streamlit run app.py

# 5. Open browser to http://localhost:8501

📋 Features

  • 📤 File Upload & Camera: Upload images or use live camera
  • 🎯 Probability Score: 0.0-1.0 likelihood of AI generation
  • 📊 Confidence Level: Low/Medium/High confidence assessment
  • 💬 Human-readable Explanations: Clear reasoning for decisions
  • ⚡ Fast Processing: Sub-60 second processing time
  • 🎨 Professional UI: Clean, responsive Streamlit interface
  • 🔍 Advanced Features: 193+ sophisticated image features
  • 🧪 Testing Tools: Demo scripts and external image testing

🛠️ Complete Setup Instructions

Prerequisites

  • Python 3.7+ (Tested on Python 3.13.7)
  • Git
  • Webcam (optional, for camera feature)

Step 1: Environment Setup

# Create project directory
mkdir ai_detection_project
cd ai_detection_project

# Clone repository
git clone <your-repo-url> .

# Navigate to AI detector folder
cd ai_detector

# Create virtual environment
python -m venv ai_detector_env

# Activate virtual environment
# On Linux/Mac:
source ai_detector_env/bin/activate
# On Windows:
ai_detector_env\Scripts\activate

# Verify Python version
python --version  # Should show Python 3.7+

Step 2: Install Dependencies

# Install all required packages
pip install -r requirements.txt

# Verify critical installations
python -c "import cv2; print(f'OpenCV: {cv2.__version__}')"
python -c "import sklearn; print(f'Scikit-learn: {sklearn.__version__}')" 
python -c "import xgboost; print(f'XGBoost: {xgboost.__version__}')"
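
The three one-line checks above can also be run in a single pass. The following is a minimal sketch, not a script that ships with the repository; `check_modules` is a hypothetical helper name introduced here for illustration.

```python
import importlib


def check_modules(names):
    """Return {module_name: version string, or None if the import fails}."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any failure during import is treated as "not usable".
            report[name] = None
            continue
        # Most packages expose __version__; fall back to "unknown" otherwise.
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in check_modules(["cv2", "sklearn", "xgboost"]).items():
        print(f"{name}: {version if version else 'NOT INSTALLED'}")
```

A `None` entry means the package could not be imported in the active virtual environment, which usually points to a skipped or failed `pip install -r requirements.txt`.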

Step 3: Prepare Data Structure

# Data folders should be created in ai_detector/
# (this is already the working directory if you followed Step 1)

# Create required directories
mkdir -p data/real data/ai_generated
mkdir -p test_images

# Add your images to appropriate folders:
# data/real/ - Put real images here (JPG, PNG, JPEG, BMP)
# data/ai_generated/ - Put AI-generated images here
# test_images/ - Individual test images for evaluation
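
To confirm that the folders are populated as expected before training, a short script can list the images it finds. This is a sketch under stated assumptions: `list_labeled_images` is a hypothetical helper (not the repository's `data_loader.py`), and the label convention 0 = real, 1 = AI-generated is chosen here for illustration only.

```python
from pathlib import Path

# Extensions named in the folder notes above.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp"}


def list_labeled_images(data_dir):
    """Return (path, label) pairs; label 0 = real, 1 = AI-generated (assumed)."""
    data_dir = Path(data_dir)
    samples = []
    for folder, label in (("real", 0), ("ai_generated", 1)):
        class_dir = data_dir / folder
        if not class_dir.is_dir():
            continue  # A missing folder simply contributes no samples.
        for path in sorted(class_dir.iterdir()):
            if path.is_file() and path.suffix.lower() in IMAGE_EXTENSIONS:
                samples.append((path, label))
    return samples


if __name__ == "__main__":
    samples = list_labeled_images("data")
    n_ai = sum(label for _, label in samples)
    print(f"real: {len(samples) - n_ai}, ai_generated: {n_ai}")
```

Run it from the `ai_detector` directory. Strongly unequal counts between the two classes are worth fixing before training.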

Step 4: Train Models

# Train the ML models (in ai_detector directory)
python train.py

# This will:
# - Extract 193+ features from images in data/ folders
# - Train SVM, Random Forest, XGBoost models
# - Save best model to 'models/' directory
# - Show performance metrics and cross-validation results
# - Generate feature importance analysis

# Related scripts (demo and testing, not training):
python demo.py              # Quick demo with sample images
python test.py              # Test trained models
python test_external_images.py  # Test on external image files

Step 5: Run Web Application

# Navigate to app directory
cd ../ai_detector_app

# Run Streamlit app
streamlit run app.py

# App will open at: http://localhost:8501

Step 6: Test the System

  1. File Upload Tab:

    • Upload an image (PNG, JPG, JPEG)
    • View prediction results
  2. Camera Tab:

    • Allow camera permissions
    • Capture photo for real-time detection
    • View instant results

🧠 Available Scripts

The ai_detector folder contains several useful scripts:

Core Scripts

  • train.py - Main training script for ML models
  • detector.py - Core detection logic and inference
  • feature_extractor.py - Advanced feature extraction (193+ features)
  • model_trainer.py - Model training utilities
  • model_evaluator.py - Model evaluation and metrics

Testing & Demo Scripts

  • demo.py - Quick demonstration with sample data
  • test.py - Test trained models on validation data
  • test_external_images.py - Test individual external images
  • data_loader.py - Data loading and preprocessing utilities

Usage Examples

# Test on specific image
python test_external_images.py path/to/your/image.jpg

# Evaluate model performance
python test.py

# Full training pipeline
python train.py

Mid Submission Guidelines (Deadline: 10 PM)

CREATE & PUSH commit titled "MIDSUBMISSION". In the commit message, include a brief summary of the work completed so far and the tech stack currently being used. A fully functional submission is not required, but your current progress should show a consistent direction. Avoid major changes to the tech stack or codebase unless there's a clear and valid reason.

If you make any significant changes after this submission, ensure they are properly explained in future commit messages.

MIDSUBMISSION IS COMPULSORY.

P.S. Post common doubts in the general channel so that we do not spend time answering the same questions repeatedly. For doubts specific to a problem statement, ask in that problem statement's channel.

📊 Technical Architecture

Traditional ML Pipeline

Image Input → Feature Extraction (193+ features) → ML Models → Prediction

Feature Categories:

  • Texture Analysis: Local Binary Patterns (LBP), Gray-Level Co-occurrence Matrix (GLCM)
  • Frequency Domain: Discrete Cosine Transform (DCT), Fast Fourier Transform (FFT)
  • Statistical Features: Histogram analysis, moments, edge detection
  • AI-Specific Features: Skin texture analysis, smoothness detection, compression artifacts
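
To make two of these categories concrete, the sketch below computes a basic Local Binary Pattern histogram (texture) and a high-frequency energy ratio from the FFT (frequency domain) using NumPy only. It is an illustration, not the code in `feature_extractor.py`: the function names, the 3x3 LBP neighbourhood, and the `cutoff=0.25` value are all assumptions made here.

```python
import numpy as np


def lbp_histogram(gray):
    """Basic 8-neighbour LBP codes as a normalised 256-bin histogram."""
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    codes = np.zeros(center.shape, dtype=np.int64)
    # Neighbour offsets, clockwise starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the centre.
        codes |= (neighbour >= center).astype(np.int64) << bit
    hist = np.bincount(codes.ravel(), minlength=256).astype(np.float64)
    return hist / hist.sum()


def high_frequency_ratio(gray, cutoff=0.25):
    """Share of spectral power above `cutoff` cycles per pixel."""
    gray = np.asarray(gray, dtype=np.float64)
    power = np.abs(np.fft.fft2(gray)) ** 2
    fy = np.fft.fftfreq(gray.shape[0])[:, None]
    fx = np.fft.fftfreq(gray.shape[1])[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    total = power.sum()
    return float(power[radius > cutoff].sum() / total) if total > 0 else 0.0


def extract_features(gray):
    """Concatenate both descriptors into one 257-element vector."""
    return np.concatenate([lbp_histogram(gray), [high_frequency_ratio(gray)]])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.integers(0, 256, size=(64, 64))  # stand-in for a grayscale image
    print(extract_features(image).shape)
```

Each descriptor becomes a fixed-length slice of the final feature vector, so images of different sizes yield vectors of the same length for the classifiers.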

Models Available:

  • Linear SVM (Best performer: 67.79% AUC)
  • Random Forest
  • XGBoost

Hyperparameters are tuned with grid search.
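
The grid search mentioned above can be sketched with scikit-learn as follows. This is a hedged example rather than the contents of `model_trainer.py`: the data is synthetic (generated with `make_classification` as a stand-in for the extracted feature matrix), and the parameter grid, the scaling step, and the fold count are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic stand-in for the feature matrix (assumption: 193 columns).
X, y = make_classification(
    n_samples=300, n_features=193, n_informative=20, random_state=0
)

# Scale features, then fit a linear SVM; C is the only tuned parameter here.
pipeline = make_pipeline(StandardScaler(), LinearSVC(max_iter=5000, random_state=0))
param_grid = {"linearsvc__C": [0.01, 0.1, 1.0]}

# Stratified folds keep the real/AI class balance equal across splits.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
search = GridSearchCV(pipeline, param_grid, scoring="roc_auc", cv=cv)
search.fit(X, y)

print("best params:", search.best_params_)
print("mean cross-validated AUC:", round(search.best_score_, 3))
```

Putting the scaler inside the pipeline means it is refit on each training fold, which keeps validation data from leaking into preprocessing.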

🔧 Troubleshooting

Common Issues

OpenCV Import Error:

pip uninstall opencv-python opencv-python-headless
pip install opencv-python==4.8.1.78

Streamlit Camera Not Working:

  • Ensure browser permissions for camera access
  • Try different browsers (Chrome recommended)
  • Check if other apps are using camera

Model Training Memory Error:

  • Use smaller subsets of your data for initial testing
  • Close other applications to free memory
  • Ensure sufficient disk space for model storage

Performance Issues:

  • Close other applications to free memory
  • Use smaller batch sizes if memory errors occur
  • Ensure sufficient disk space for model storage

File Structure

build2break/
├── .venv/                       # Main virtual environment
├── setup_openai_env.sh          # OpenAI setup script
├── ai_detector/                 # Traditional ML system
│   ├── ai_detector_env/         # Virtual environment (ignored)
│   ├── data/                    # Training dataset
│   │   ├── real/                # Real images
│   │   └── ai_generated/        # AI-generated images
│   ├── test_images/             # Test images
│   ├── models/                  # Saved models (ignored)
│   ├── utils/                   # Utility functions (ignored)
│   ├── feature_extractor.py     # 193+ feature extraction
│   ├── model_trainer.py         # ML model training
│   ├── model_evaluator.py       # Model evaluation
│   ├── detector.py              # Inference pipeline
│   ├── train.py                 # Main training script
│   ├── demo.py                  # Demo script
│   ├── test.py                  # Testing script
│   ├── test_external_images.py  # External image testing
│   ├── data_loader.py           # Data loading utilities
│   └── requirements.txt         # Dependencies
├── ai_detector_app/             # Streamlit web interface
│   ├── app.py                   # Main web application
│   ├── detector.py              # Detection logic copy
│   ├── feature_extractor.py     # Feature extraction copy
│   ├── data_loader.py           # Data loading copy
│   ├── models/                  # Model files copy
│   └── requirements.txt         # App dependencies
└── README.md                    # This file
Performance Issues:

  • Use CPU-only PyTorch for compatibility: pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu

File Structure

build2break/
├── ai_detector/                 # Traditional ML system
│   ├── ai_detector_env/         # Virtual environment (ignored)
│   ├── feature_extractor.py     # 193+ feature extraction
│   ├── model_trainer.py         # ML model training
│   ├── detector.py              # Inference pipeline
│   ├── train.py                 # Main training script
│   ├── models/                  # Saved models (ignored)
│   └── requirements.txt         # Dependencies
├── ai_detector_app/             # Streamlit web interface
│   ├── app.py                   # Main web application
│   └── requirements.txt         # App dependencies
├── data/                        # Main dataset
│   ├── real/                    # Real images
│   └── ai_generated/            # AI-generated images
├── data_small/                  # Smaller dataset for testing
├── test_images/                 # Individual test images
├── Untitled1.ipynb              # Neural network training
└── README.md                    # This file

🎯 Current Progress

Completed Features ✅

  • ✅ Traditional ML system with 67.79% AUC
  • ✅ 193+ sophisticated feature extraction
  • ✅ Streamlit web app with file upload
  • ✅ Real-time camera capture functionality
  • ✅ Comprehensive error handling
  • ✅ Professional UI with explanations
  • ✅ Multiple testing and demo scripts
  • ✅ Model evaluation and comparison tools

In Progress 🔄

  • 🔄 Performance optimization for larger datasets
  • 🔄 Model comparison and evaluation
  • 🔄 Advanced feature engineering

🛠️ Tech Stack

  • Backend: Python 3.13.7, OpenCV 4.12.0, scikit-learn 1.7.2
  • Machine Learning: Linear SVM, Random Forest, XGBoost 3.0.5
  • Feature Extraction: LBP, GLCM, DCT, FFT, Statistical Analysis
  • Frontend: Streamlit with camera integration
  • Data Processing: PIL, NumPy, pandas
  • Model Persistence: joblib

📈 Performance Metrics

Traditional ML Results:

  • Best Model: Linear SVM
  • Test AUC: 67.79%
  • Cross-Validation AUC: 68.10% ± 4.39%
  • Processing Speed: ~0.04 seconds per image
  • Dataset Size: 10,000 samples (5,000 real + 5,000 AI-generated)

Detailed Performance:

  • Precision: 63% (Real), 62% (AI-generated)
  • Recall: 61% (Real), 64% (AI-generated)
  • F1-Score: 62% (Real), 63% (AI-generated)
  • Overall Accuracy: 62%
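
The F1 scores follow directly from the precision and recall figures, since F1 is their harmonic mean. The short check below reproduces the reported values from the numbers listed above; `f1_from_pr` is a helper name introduced here for illustration.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported above, expressed as fractions.
reported = {"Real": (0.63, 0.61), "AI-generated": (0.62, 0.64)}

for label, (precision, recall) in reported.items():
    print(f"{label}: F1 = {f1_from_pr(precision, recall):.2f}")
```

Rounded to two decimals this gives 0.62 for real images and 0.63 for AI-generated images, matching the list.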

Note: Model accuracy will be updated once the current larger dataset training completes.

Installation

  1. Install required packages:
pip install streamlit pillow
  2. Make sure the main AI detector is trained and models are available in ../ai_detector/models/

Usage

  1. Start the Streamlit app:
streamlit run app.py
  2. Open your browser to the provided URL (usually http://localhost:8501)
  3. Upload an image and get instant AI detection results!

Technical Details

  • Model: Linear SVM with 193+ sophisticated features
  • Features: Texture (LBP, GLCM), frequency domain (DCT, FFT), statistical, edge, color analysis
  • Face Detection: OpenCV Haar cascades
  • Performance: ~0.04 seconds per image
  • Current Test AUC: 67.79%

Output Format

For each image, the app provides:

  • AI Probability: Score between 0.0 (definitely real) and 1.0 (definitely AI)
  • Classification: "Real" or "AI-Generated" based on probability threshold
  • Confidence Level: High (>90%), Medium (70-90%), Low (<70%)
  • Explanation: Human-readable reasoning for the decision
  • Processing Time: Actual time taken for analysis
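
One way the first three fields could be derived from a single model score is sketched below. The README does not state how the app computes them, so this is an assumption-laden illustration: the 0.5 decision threshold, the use of the predicted class's probability as the confidence measure, and the names `DetectionResult` and `summarize` are all hypothetical. Only the band boundaries (above 90%, 70-90%, below 70%) come from the list above.

```python
from dataclasses import dataclass


@dataclass
class DetectionResult:
    ai_probability: float
    classification: str
    confidence: str


def summarize(ai_probability, threshold=0.5):
    """Map an AI probability to a label and a confidence band."""
    if not 0.0 <= ai_probability <= 1.0:
        raise ValueError("ai_probability must be in [0.0, 1.0]")
    classification = "AI-Generated" if ai_probability >= threshold else "Real"
    # Assumption: confidence is the larger of p and 1 - p, i.e. how far
    # the score leans toward either class.
    certainty = max(ai_probability, 1.0 - ai_probability)
    if certainty > 0.90:
        confidence = "High"
    elif certainty >= 0.70:
        confidence = "Medium"
    else:
        confidence = "Low"
    return DetectionResult(ai_probability, classification, confidence)


if __name__ == "__main__":
    for score in (0.95, 0.20, 0.55):
        print(summarize(score))
```

Under these assumptions a score of 0.20 is classified as "Real" with Medium confidence, because the model assigns 80% to the real class.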

Requirements

  • Python 3.7+
  • Streamlit
  • OpenCV
  • PIL (Pillow)
  • All dependencies from the main AI detector system

Work Completed So Far

✅ Completed Components:

  1. Feature Extraction System: 193+ sophisticated features for image analysis
  2. Machine Learning Pipeline: Multiple ML models with hyperparameter tuning
  3. Data Processing: Automated face detection and preprocessing
  4. Model Training: Complete training pipeline with cross-validation
  5. Web Interface: Streamlit app for user-friendly interaction
  6. Testing Framework: Comprehensive testing and evaluation system

🔄 In Progress:

  1. Large Dataset Training: Training on 66K+ images for improved accuracy
  2. Performance Optimization: Enhancing model accuracy and speed
  3. UI/UX Improvements: Refining the web interface
  4. Audio Converter: We are training a deep learning classification model to distinguish AI-generated audio clips from real ones. Our approach is to first generate an embedding from each audio clip using a model such as wav2vec, then train the classifier on those embeddings. Model training is currently in progress.

📋 Next Steps:

  1. Complete large dataset training
  2. Final model evaluation and metrics reporting
  3. Deploy and test with external images
  4. Documentation and final submission
