An AI detection system that analyzes images to determine whether they are AI-generated or real, using traditional machine learning with advanced feature extraction and a user-friendly web interface.
# 1. Clone and setup
git clone <repository-url>
cd build2break
# 2. Setup AI detector system
cd ai_detector
python -m venv ai_detector_env
source ai_detector_env/bin/activate # On Windows: ai_detector_env\Scripts\activate
pip install -r requirements.txt
# 3. Train models (optional - pre-trained models included)
python train.py
# 4. Run web app
cd ../ai_detector_app
streamlit run app.py
# 5. Open browser to http://localhost:8501

- File Upload & Camera: Upload images or use live camera
- Probability Score: 0.0-1.0 likelihood of AI generation
- Confidence Level: Low/Medium/High confidence assessment
- Human-readable Explanations: Clear reasoning for decisions
- Fast Processing: Sub-60 second processing time
- Professional UI: Clean, responsive Streamlit interface
- Advanced Features: 193+ sophisticated image features
- Testing Tools: Demo scripts and external image testing
- Python 3.7+ (Tested on Python 3.13.7)
- Git
- Webcam (optional, for camera feature)
# Create project directory
mkdir ai_detection_project
cd ai_detection_project
# Clone repository
git clone <your-repo-url> .
# Navigate to AI detector folder
cd ai_detector
# Create virtual environment
python -m venv ai_detector_env
# Activate virtual environment
# On Linux/Mac:
source ai_detector_env/bin/activate
# On Windows:
ai_detector_env\Scripts\activate
# Verify Python version
python --version  # Should show Python 3.7+

# Install all required packages
pip install -r requirements.txt
# Verify critical installations
python -c "import cv2; print(f'OpenCV: {cv2.__version__}')"
python -c "import sklearn; print(f'Scikit-learn: {sklearn.__version__}')"
python -c "import xgboost; print(f'XGBoost: {xgboost.__version__}')"

# Data folders should be created in ai_detector/
cd ai_detector
# Create required directories
mkdir -p data/real data/ai_generated
mkdir -p test_images
# Add your images to appropriate folders:
# data/real/ - Put real images here (JPG, PNG, JPEG, BMP)
# data/ai_generated/ - Put AI-generated images here
# test_images/ - Individual test images for evaluation

# Train the ML models (in ai_detector directory)
python train.py
# This will:
# - Extract 193+ features from images in data/ folders
# - Train SVM, Random Forest, XGBoost models
# - Save best model to 'models/' directory
# - Show performance metrics and cross-validation results
# - Generate feature importance analysis
# Alternative training options:
python demo.py # Quick demo with sample images
python test.py # Test trained models
python test_external_images.py  # Test on external image files

# Navigate to app directory
cd ../ai_detector_app
# Run Streamlit app
streamlit run app.py
# App will open at: http://localhost:8501

- File Upload Tab:
  - Upload an image (PNG, JPG, JPEG)
  - View prediction results
- Camera Tab:
  - Allow camera permissions
  - Capture photo for real-time detection
  - View instant results
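For orientation, here is a minimal sketch of how the two tabs could be wired up in Streamlit. `predict_image` is a hypothetical stand-in for the detector logic in detector.py, not the app's actual function:

```python
# Minimal sketch of the two-tab layout; illustrative only.
# `predict_image` is a hypothetical stand-in for the detector in detector.py.
import streamlit as st
from PIL import Image

def predict_image(image: Image.Image) -> dict:
    # Placeholder: the real app delegates to the trained model saved under models/.
    return {"ai_probability": 0.5, "classification": "Unknown"}

st.title("AI Image Detector")
upload_tab, camera_tab = st.tabs(["File Upload", "Camera"])

with upload_tab:
    uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
    if uploaded is not None:
        image = Image.open(uploaded)
        st.image(image, caption="Uploaded image")
        st.json(predict_image(image))

with camera_tab:
    snapshot = st.camera_input("Capture a photo")
    if snapshot is not None:
        st.json(predict_image(Image.open(snapshot)))
```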
The ai_detector folder contains several useful scripts:
- train.py - Main training script for ML models
- detector.py - Core detection logic and inference
- feature_extractor.py - Advanced feature extraction (193+ features)
- model_trainer.py - Model training utilities
- model_evaluator.py - Model evaluation and metrics
- demo.py - Quick demonstration with sample data
- test.py - Test trained models on validation data
- test_external_images.py - Test individual external images
- data_loader.py - Data loading and preprocessing utilities
# Test on specific image
python test_external_images.py path/to/your/image.jpg
# Evaluate model performance
python test.py
# Full training pipeline
python train.py

CREATE & PUSH a commit titled "MIDSUBMISSION". In the commit message, include a brief summary of the work completed so far and the tech stack currently being used. A fully functional submission is not required, but your current progress should show a consistent direction. Avoid major changes to the tech stack or codebase unless there is a clear and valid reason.
If you make any significant changes after this submission, ensure they are properly explained in future commit messages.
MIDSUBMISSION IS COMPULSORY.
P.S. Ask common questions in the general channel so we avoid answering repeated questions individually. For problem-specific doubts, ask in the respective problem statement channels.
Image Input → Feature Extraction (193+ features) → ML Models → Prediction
Feature Categories:
- Texture Analysis: Local Binary Patterns (LBP), Gray-Level Co-occurrence Matrix (GLCM)
- Frequency Domain: Discrete Cosine Transform (DCT), Fast Fourier Transform (FFT)
- Statistical Features: Histogram analysis, moments, edge detection
- AI-Specific Features: Skin texture analysis, smoothness detection, compression artifacts
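To make these feature families concrete, here is a minimal illustrative sketch of a few of them, assuming OpenCV, NumPy, and scikit-image are available; it is not the project's feature_extractor.py implementation:

```python
# Illustrative sketch of a few feature families (LBP, GLCM, DCT/FFT, statistics/edges).
# Not the project's exact feature_extractor.py code.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern, graycomatrix, graycoprops

def extract_example_features(image_path: str) -> np.ndarray:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.resize(gray, (256, 256))  # fixed, even size keeps cv2.dct happy

    # Texture: uniform LBP histogram
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

    # Texture: GLCM contrast / homogeneity / energy
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    glcm_feats = [graycoprops(glcm, p)[0, 0] for p in ("contrast", "homogeneity", "energy")]

    # Frequency domain: DCT and FFT energy summaries
    dct = cv2.dct(gray.astype(np.float32))
    fft_mag = np.abs(np.fft.fft2(gray))
    freq_feats = [np.mean(np.abs(dct[:8, :8])), np.mean(fft_mag), np.std(fft_mag)]

    # Statistical / edge features
    edges = cv2.Canny(gray, 100, 200)
    stat_feats = [gray.mean(), gray.std(), edges.mean()]

    return np.concatenate([lbp_hist, glcm_feats, freq_feats, stat_feats])
```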
Models Available:
- Linear SVM (Best performer: 67.79% AUC)
- Random Forest
- XGBoost
- Grid search hyperparameter optimization
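A minimal sketch of this model-selection step (grid search over a linear SVM alongside Random Forest and XGBoost baselines), using random placeholder data rather than the project's model_trainer.py code:

```python
# Illustrative model selection with grid search; X and y are random placeholders
# for the extracted feature matrix (n_samples x 193+) and labels (0 = real, 1 = AI).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

X, y = np.random.rand(200, 193), np.random.randint(0, 2, 200)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "linear_svm": GridSearchCV(
        make_pipeline(StandardScaler(), LinearSVC(max_iter=5000)),
        param_grid={"linearsvc__C": [0.01, 0.1, 1, 10]},
        scoring="roc_auc", cv=5,
    ),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "xgboost": XGBClassifier(n_estimators=300, eval_metric="logloss"),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)
    # Use probabilities when available, otherwise the decision function, for AUC.
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]
    else:
        scores = model.decision_function(X_test)
    print(f"{name}: test AUC = {roc_auc_score(y_test, scores):.3f}")
```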
OpenCV Import Error:
pip uninstall opencv-python opencv-python-headless
pip install opencv-python==4.8.1.78

Streamlit Camera Not Working:
- Ensure browser permissions for camera access
- Try different browsers (Chrome recommended)
- Check if other apps are using camera
Model Training Memory Error:
- Use smaller subsets of your data for initial testing
- Close other applications to free memory
- Ensure sufficient disk space for model storage
Performance Issues:
- Close other applications to free memory
- Use smaller batch sizes if memory errors occur
- Ensure sufficient disk space for model storage
build2break/
├── .venv/                       # Main virtual environment
├── setup_openai_env.sh          # OpenAI setup script
├── ai_detector/                 # Traditional ML system
│   ├── ai_detector_env/         # Virtual environment (ignored)
│   ├── data/                    # Training dataset
│   │   ├── real/                # Real images
│   │   └── ai_generated/        # AI-generated images
│   ├── test_images/             # Test images
│   ├── models/                  # Saved models (ignored)
│   ├── utils/                   # Utility functions (ignored)
│   ├── feature_extractor.py     # 193+ feature extraction
│   ├── model_trainer.py         # ML model training
│   ├── model_evaluator.py       # Model evaluation
│   ├── detector.py              # Inference pipeline
│   ├── train.py                 # Main training script
│   ├── demo.py                  # Demo script
│   ├── test.py                  # Testing script
│   ├── test_external_images.py  # External image testing
│   ├── data_loader.py           # Data loading utilities
│   └── requirements.txt         # Dependencies
├── ai_detector_app/             # Streamlit web interface
│   ├── app.py                   # Main web application
│   ├── detector.py              # Detection logic copy
│   ├── feature_extractor.py     # Feature extraction copy
│   ├── data_loader.py           # Data loading copy
│   ├── models/                  # Model files copy
│   └── requirements.txt         # App dependencies
└── README.md                    # This file
Performance Issues:
- Close other applications
- Use CPU-only PyTorch for compatibility:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
build2break/
├── ai_detector/                 # Traditional ML system
│   ├── ai_detector_env/         # Virtual environment (ignored)
│   ├── feature_extractor.py     # 193+ feature extraction
│   ├── model_trainer.py         # ML model training
│   ├── detector.py              # Inference pipeline
│   ├── train.py                 # Main training script
│   ├── models/                  # Saved models (ignored)
│   └── requirements.txt         # Dependencies
├── ai_detector_app/             # Streamlit web interface
│   ├── app.py                   # Main web application
│   └── requirements.txt         # App dependencies
├── data/                        # Main dataset
│   ├── real/                    # Real images
│   └── ai_generated/            # AI-generated images
├── data_small/                  # Smaller dataset for testing
├── test_images/                 # Individual test images
├── Untitled1.ipynb              # Neural network training
└── README.md                    # This file
CREATE & PUSH commit titled "MIDSUBMISSION". In the commit message, include a brief summary of the work completed so far and the tech stack currently being used.
- ✅ Traditional ML system with 67.79% AUC
- ✅ 193+ sophisticated feature extraction
- ✅ Streamlit web app with file upload
- ✅ Real-time camera capture functionality
- ✅ Comprehensive error handling
- ✅ Professional UI with explanations
- ✅ Multiple testing and demo scripts
- ✅ Model evaluation and comparison tools
- 🔄 Performance optimization for larger datasets
- 🔄 Model comparison and evaluation
- 🔄 Advanced feature engineering
- Backend: Python 3.13.7, OpenCV 4.12.0, scikit-learn 1.7.2
- Machine Learning: Linear SVM, Random Forest, XGBoost 3.0.5
- Feature Extraction: LBP, GLCM, DCT, FFT, Statistical Analysis
- Frontend: Streamlit with camera integration
- Data Processing: PIL, NumPy, pandas
- Model Persistence: joblib
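Model persistence with joblib typically looks like the following round-trip; the model and file name here are illustrative stand-ins, and the repository keeps its saved models under models/:

```python
# Illustrative joblib save/load round-trip; the actual model file name may differ.
import os
import joblib
from sklearn.svm import LinearSVC

os.makedirs("models", exist_ok=True)
model = LinearSVC().fit([[0.0], [1.0]], [0, 1])      # stand-in for the trained detector
joblib.dump(model, "models/linear_svm.joblib")       # persist after training
restored = joblib.load("models/linear_svm.joblib")   # reload at inference time
print(restored.predict([[0.2]]))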
- Best Model: Linear SVM
- Test AUC: 67.79%
- Cross-Validation AUC: 68.10% ± 4.39%
- Processing Speed: ~0.04 seconds per image
- Dataset Size: 10,000 samples (5,000 real + 5,000 AI-generated)
- Precision: 63% (Real), 62% (AI-generated)
- Recall: 61% (Real), 64% (AI-generated)
- F1-Score: 62% (Real), 63% (AI-generated)
- Overall Accuracy: 62%
Note: Model accuracy will be updated once the current larger dataset training completes.
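The cross-validation figure above can be reproduced in spirit with scikit-learn's cross_val_score; in this sketch, X and y are random placeholders for the feature matrix and labels produced by the training pipeline:

```python
# Sketch of how a cross-validated AUC (mean ± std) is typically computed.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = rng.random((500, 193)), rng.integers(0, 2, 500)  # placeholder features/labels
model = make_pipeline(StandardScaler(), LinearSVC(max_iter=5000))
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"Cross-Validation AUC: {scores.mean():.2%} ± {scores.std():.2%}")
```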
- Install required packages:
pip install streamlit pillow
- Make sure the main AI detector is trained and models are available in ../ai_detector/models/
- Start the Streamlit app:
streamlit run app.py
- Open your browser to the provided URL (usually http://localhost:8501)
- Upload an image and get instant AI detection results!
- Model: Linear SVM with 180+ sophisticated features
- Features: Texture (LBP, GLCM), frequency domain (DCT, FFT), statistical, edge, color analysis
- Face Detection: OpenCV Haar cascades
- Performance: ~0.04 seconds per image
- Current Accuracy: 67.79% AUC on test dataset
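Face detection with OpenCV Haar cascades, mentioned above, generally follows this pattern; the image path is hypothetical and the app's exact preprocessing may differ:

```python
# Sketch of Haar-cascade face detection used as a preprocessing step (illustrative).
import cv2

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
image = cv2.imread("test_images/example.jpg")   # hypothetical test image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(60, 60))

for (x, y, w, h) in faces:
    face_crop = image[y:y + h, x:x + w]          # region passed on to feature extraction
    print("face found at", (x, y, w, h), "crop shape:", face_crop.shape)
```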
For each image, the app provides:
- AI Probability: Score between 0.0 (definitely real) and 1.0 (definitely AI)
- Classification: "Real" or "AI-Generated" based on probability threshold
- Confidence Level: High (>90%), Medium (70-90%), Low (<70%)
- Explanation: Human-readable reasoning for the decision
- Processing Time: Actual time taken for analysis
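As an illustration of how these fields relate, using the thresholds listed above; the function and field names here are assumptions for illustration, not the app's actual API:

```python
# Illustrative mapping from AI probability to the reported fields.
# Function name and result keys are assumptions, not the app's real interface.
def summarize_prediction(ai_probability: float) -> dict:
    label = "AI-Generated" if ai_probability >= 0.5 else "Real"
    certainty = max(ai_probability, 1.0 - ai_probability)   # distance from the 0.5 threshold
    if certainty > 0.9:
        confidence = "High"
    elif certainty >= 0.7:
        confidence = "Medium"
    else:
        confidence = "Low"
    return {
        "ai_probability": round(ai_probability, 3),
        "classification": label,
        "confidence": confidence,
        "explanation": f"Model assigns {ai_probability:.0%} probability of AI generation "
                       f"({confidence.lower()} confidence).",
    }

print(summarize_prediction(0.93))   # -> AI-Generated, High confidence
print(summarize_prediction(0.42))   # -> Real, Low confidence
```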
- Python 3.7+
- Streamlit
- OpenCV
- PIL (Pillow)
- All dependencies from the main AI detector system
- Feature Extraction System: 180+ sophisticated features for image analysis
- Machine Learning Pipeline: Multiple ML models with hyperparameter tuning
- Data Processing: Automated face detection and preprocessing
- Model Training: Complete training pipeline with cross-validation
- Web Interface: Streamlit app for user-friendly interaction
- Testing Framework: Comprehensive testing and evaluation system
- Large Dataset Training: Training on 66K+ images for improved accuracy
- Performance Optimization: Enhancing model accuracy and speed
- UI/UX Improvements: Refining the web interface
- AUDIO CONVERTER: We are training a deep-learning classification model to distinguish between AI-generated and real audio clips. The approach is to first generate an embedding from each audio clip using a model such as wav2vec, and then train the classifier on that embedding. Model training is currently in progress (a minimal sketch of the approach appears at the end of this README).
- Complete large dataset training
- Final model evaluation and metrics reporting
- Deploy and test with external images
- Documentation and final submission
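A minimal sketch of the embedding-then-classify idea for the audio task, assuming torchaudio's bundled wav2vec 2.0 model and scikit-learn; the clip paths and classifier choice are placeholders, and the actual training code is still in progress:

```python
# Sketch of the wav2vec-embedding + classifier approach (illustrative only).
import torch
import torchaudio
from sklearn.linear_model import LogisticRegression

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

def embed(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)                        # force mono
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.inference_mode():
        features, _ = model.extract_features(waveform)
    return features[-1].mean(dim=1).squeeze(0)                           # mean-pool last layer

# Hypothetical clip paths and labels: 0 = real audio, 1 = AI-generated audio
clips = ["data/audio/real_01.wav", "data/audio/ai_01.wav"]
labels = [0, 1]
X = torch.stack([embed(p) for p in clips]).numpy()
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```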