An AI detection system that analyzes images to determine whether they are AI-generated or real, using traditional machine learning with advanced feature extraction and a user-friendly web interface.
# 1. Clone and setup
git clone <repository-url>
cd build2break
# 2. Setup AI detector system
cd ai_detector
python -m venv ai_detector_env
source ai_detector_env/bin/activate # On Windows: ai_detector_env\Scripts\activate
pip install -r requirements.txt
# 3. Train models (optional - pre-trained models included)
python train.py
# 4. Run web app
cd ../ai_detector_app
streamlit run app.py
# 5. Open browser to http://localhost:8501

- File Upload & Camera: Upload images or use live camera
- Probability Score: 0.0-1.0 likelihood of AI generation
- Confidence Level: Low/Medium/High confidence assessment
- Human-readable Explanations: Clear reasoning for decisions
- Fast Processing: Sub-60 second processing time
- Professional UI: Clean, responsive Streamlit interface
- Advanced Features: 193+ sophisticated image features
- Testing Tools: Demo scripts and external image testing
- Python 3.7+ (Tested on Python 3.13.7)
- Git
- Webcam (optional, for camera feature)
# Create project directory
mkdir ai_detection_project
cd ai_detection_project
# Clone repository
git clone <your-repo-url> .
# Navigate to AI detector folder
cd ai_detector
# Create virtual environment
python -m venv ai_detector_env
# Activate virtual environment
# On Linux/Mac:
source ai_detector_env/bin/activate
# On Windows:
ai_detector_env\Scripts\activate
# Verify Python version
python --version  # Should show Python 3.7+

# Install all required packages
pip install -r requirements.txt
# Verify critical installations
python -c "import cv2; print(f'OpenCV: {cv2.__version__}')"
python -c "import sklearn; print(f'Scikit-learn: {sklearn.__version__}')"
python -c "import xgboost; print(f'XGBoost: {xgboost.__version__}')"

# Data folders should be created in ai_detector/
cd ai_detector
# Create required directories
mkdir -p data/real data/ai_generated
mkdir -p test_images
# Add your images to appropriate folders:
# data/real/ - Put real images here (JPG, PNG, JPEG, BMP)
# data/ai_generated/ - Put AI-generated images here
# test_images/ - Individual test images for evaluation

# Train the ML models (in ai_detector directory)
python train.py
# This will:
# - Extract 193+ features from images in data/ folders
# - Train SVM, Random Forest, XGBoost models
# - Save best model to 'models/' directory
# - Show performance metrics and cross-validation results
# - Generate feature importance analysis
# Alternative training options:
python demo.py # Quick demo with sample images
python test.py # Test trained models
python test_external_images.py  # Test on external image files

# Navigate to app directory
cd ../ai_detector_app
# Run Streamlit app
streamlit run app.py
# App will open at: http://localhost:8501

- File Upload Tab:
  - Upload an image (PNG, JPG, JPEG)
  - View prediction results
- Camera Tab:
  - Allow camera permissions
  - Capture photo for real-time detection
  - View instant results
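For orientation, here is a minimal sketch of how the two tabs could be wired up in Streamlit. `predict_image` is a hypothetical stand-in for the detector logic in detector.py, not the app's actual function:

```python
# Minimal sketch of the two-tab layout; illustrative only.
# `predict_image` is a hypothetical stand-in for the detector in detector.py.
import streamlit as st
from PIL import Image

def predict_image(image: Image.Image) -> dict:
    # Placeholder: the real app delegates to the trained model saved under models/.
    return {"ai_probability": 0.5, "classification": "Unknown"}

st.title("AI Image Detector")
upload_tab, camera_tab = st.tabs(["File Upload", "Camera"])

with upload_tab:
    uploaded = st.file_uploader("Upload an image", type=["png", "jpg", "jpeg"])
    if uploaded is not None:
        image = Image.open(uploaded)
        st.image(image, caption="Uploaded image")
        st.json(predict_image(image))

with camera_tab:
    snapshot = st.camera_input("Capture a photo")
    if snapshot is not None:
        st.json(predict_image(Image.open(snapshot)))
```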
The ai_detector folder contains several useful scripts:
- train.py - Main training script for ML models
- detector.py - Core detection logic and inference
- feature_extractor.py - Advanced feature extraction (193+ features)
- model_trainer.py - Model training utilities
- model_evaluator.py - Model evaluation and metrics
- demo.py - Quick demonstration with sample data
- test.py - Test trained models on validation data
- test_external_images.py - Test individual external images
- data_loader.py - Data loading and preprocessing utilities
# Test on specific image
python test_external_images.py path/to/your/image.jpg
# Evaluate model performance
python test.py
# Full training pipeline
python train.py

CREATE & PUSH a commit titled "MIDSUBMISSION". In the commit message, include a brief summary of the work completed so far and the tech stack currently being used. A fully functional submission is not required, but your current progress should show a consistent direction. Avoid major changes to the tech stack or codebase unless there is a clear and valid reason.
If you make any significant changes after this submission, ensure they are properly explained in future commit messages.
MIDSUBMISSION IS COMPULSORY.
P.S. Ask common questions in the general channel so we avoid answering repeated questions individually. For problem-specific doubts, ask in the respective problem statement channels.
Image Input → Feature Extraction (193+ features) → ML Models → Prediction
Feature Categories:
- Texture Analysis: Local Binary Patterns (LBP), Gray-Level Co-occurrence Matrix (GLCM)
- Frequency Domain: Discrete Cosine Transform (DCT), Fast Fourier Transform (FFT)
- Statistical Features: Histogram analysis, moments, edge detection
- AI-Specific Features: Skin texture analysis, smoothness detection, compression artifacts
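To make these feature families concrete, here is a minimal illustrative sketch of a few of them, assuming OpenCV, NumPy, and scikit-image are available; it is not the project's feature_extractor.py implementation:

```python
# Illustrative sketch of a few feature families (LBP, GLCM, DCT/FFT, statistics/edges).
# Not the project's exact feature_extractor.py code.
import cv2
import numpy as np
from skimage.feature import local_binary_pattern, graycomatrix, graycoprops

def extract_example_features(image_path: str) -> np.ndarray:
    gray = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
    gray = cv2.resize(gray, (256, 256))  # fixed, even size keeps cv2.dct happy

    # Texture: uniform LBP histogram
    lbp = local_binary_pattern(gray, P=8, R=1, method="uniform")
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)

    # Texture: GLCM contrast / homogeneity / energy
    glcm = graycomatrix(gray, distances=[1], angles=[0], levels=256,
                        symmetric=True, normed=True)
    glcm_feats = [graycoprops(glcm, p)[0, 0] for p in ("contrast", "homogeneity", "energy")]

    # Frequency domain: DCT and FFT energy summaries
    dct = cv2.dct(gray.astype(np.float32))
    fft_mag = np.abs(np.fft.fft2(gray))
    freq_feats = [np.mean(np.abs(dct[:8, :8])), np.mean(fft_mag), np.std(fft_mag)]

    # Statistical / edge features
    edges = cv2.Canny(gray, 100, 200)
    stat_feats = [gray.mean(), gray.std(), edges.mean()]

    return np.concatenate([lbp_hist, glcm_feats, freq_feats, stat_feats])
```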
Models Available:
- Linear SVM (Best performer: 67.79% AUC)
- Random Forest
- XGBoost
- Grid search hyperparameter optimization
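A minimal sketch of this model-selection step (grid search over a linear SVM alongside Random Forest and XGBoost baselines), using random placeholder data rather than the project's model_trainer.py code:

```python
# Illustrative model selection with grid search; X and y are random placeholders
# for the extracted feature matrix (n_samples x 193+) and labels (0 = real, 1 = AI).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC
from xgboost import XGBClassifier

X, y = np.random.rand(200, 193), np.random.randint(0, 2, 200)  # placeholder data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

candidates = {
    "linear_svm": GridSearchCV(
        make_pipeline(StandardScaler(), LinearSVC(max_iter=5000)),
        param_grid={"linearsvc__C": [0.01, 0.1, 1, 10]},
        scoring="roc_auc", cv=5,
    ),
    "random_forest": RandomForestClassifier(n_estimators=300, random_state=42),
    "xgboost": XGBClassifier(n_estimators=300, eval_metric="logloss"),
}

for name, model in candidates.items():
    model.fit(X_train, y_train)
    # Use probabilities when available, otherwise the decision function, for AUC.
    if hasattr(model, "predict_proba"):
        scores = model.predict_proba(X_test)[:, 1]
    else:
        scores = model.decision_function(X_test)
    print(f"{name}: test AUC = {roc_auc_score(y_test, scores):.3f}")
```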
OpenCV Import Error:
pip uninstall opencv-python opencv-python-headless
pip install opencv-python==4.8.1.78

Streamlit Camera Not Working:
- Ensure browser permissions for camera access
- Try different browsers (Chrome recommended)
- Check if other apps are using camera
Model Training Memory Error:
- Use smaller subsets of your data for initial testing
- Close other applications to free memory
- Ensure sufficient disk space for model storage
Performance Issues:
- Close other applications to free memory
- Use smaller batch sizes if memory errors occur
- Ensure sufficient disk space for model storage
build2break/
├── .venv/                       # Main virtual environment
├── setup_openai_env.sh          # OpenAI setup script
├── ai_detector/                 # Traditional ML system
│   ├── ai_detector_env/         # Virtual environment (ignored)
│   ├── data/                    # Training dataset
│   │   ├── real/                # Real images
│   │   └── ai_generated/        # AI-generated images
│   ├── test_images/             # Test images
│   ├── models/                  # Saved models (ignored)
│   ├── utils/                   # Utility functions (ignored)
│   ├── feature_extractor.py     # 193+ feature extraction
│   ├── model_trainer.py         # ML model training
│   ├── model_evaluator.py       # Model evaluation
│   ├── detector.py              # Inference pipeline
│   ├── train.py                 # Main training script
│   ├── demo.py                  # Demo script
│   ├── test.py                  # Testing script
│   ├── test_external_images.py  # External image testing
│   ├── data_loader.py           # Data loading utilities
│   └── requirements.txt         # Dependencies
├── ai_detector_app/             # Streamlit web interface
│   ├── app.py                   # Main web application
│   ├── detector.py              # Detection logic copy
│   ├── feature_extractor.py     # Feature extraction copy
│   ├── data_loader.py           # Data loading copy
│   ├── models/                  # Model files copy
│   └── requirements.txt         # App dependencies
└── README.md                    # This file
Performance Issues:
- Close other applications
- Use CPU-only PyTorch for compatibility:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
build2break/
├── ai_detector/                 # Traditional ML system
│   ├── ai_detector_env/         # Virtual environment (ignored)
│   ├── feature_extractor.py     # 193+ feature extraction
│   ├── model_trainer.py         # ML model training
│   ├── detector.py              # Inference pipeline
│   ├── train.py                 # Main training script
│   ├── models/                  # Saved models (ignored)
│   └── requirements.txt         # Dependencies
├── ai_detector_app/             # Streamlit web interface
│   ├── app.py                   # Main web application
│   └── requirements.txt         # App dependencies
├── data/                        # Main dataset
│   ├── real/                    # Real images
│   └── ai_generated/            # AI-generated images
├── data_small/                  # Smaller dataset for testing
├── test_images/                 # Individual test images
├── Untitled1.ipynb              # Neural network training
└── README.md                    # This file
CREATE & PUSH commit titled "MIDSUBMISSION". In the commit message, include a brief summary of the work completed so far and the tech stack currently being used.
- ✅ Traditional ML system with 67.79% AUC
- ✅ 193+ sophisticated feature extraction
- ✅ Streamlit web app with file upload
- ✅ Real-time camera capture functionality
- ✅ Comprehensive error handling
- ✅ Professional UI with explanations
- ✅ Multiple testing and demo scripts
- ✅ Model evaluation and comparison tools
- 🔄 Performance optimization for larger datasets
- 🔄 Model comparison and evaluation
- 🔄 Advanced feature engineering
- Backend: Python 3.13.7, OpenCV 4.12.0, scikit-learn 1.7.2
- Machine Learning: Linear SVM, Random Forest, XGBoost 3.0.5
- Feature Extraction: LBP, GLCM, DCT, FFT, Statistical Analysis
- Frontend: Streamlit with camera integration
- Data Processing: PIL, NumPy, pandas
- Model Persistence: joblib
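Model persistence with joblib typically looks like the following round-trip; the model and file name here are illustrative stand-ins, and the repository keeps its saved models under models/:

```python
# Illustrative joblib save/load round-trip; the actual model file name may differ.
import os
import joblib
from sklearn.svm import LinearSVC

os.makedirs("models", exist_ok=True)
model = LinearSVC().fit([[0.0], [1.0]], [0, 1])      # stand-in for the trained detector
joblib.dump(model, "models/linear_svm.joblib")       # persist after training
restored = joblib.load("models/linear_svm.joblib")   # reload at inference time
print(restored.predict([[0.2]]))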
- Best Model: Linear SVM
- Test AUC: 67.79%
- Cross-Validation AUC: 68.10% ± 4.39%
- Processing Speed: ~0.04 seconds per image
- Dataset Size: 10,000 samples (5,000 real + 5,000 AI-generated)
- Precision: 63% (Real), 62% (AI-generated)
- Recall: 61% (Real), 64% (AI-generated)
- F1-Score: 62% (Real), 63% (AI-generated)
- Overall Accuracy: 62%
Note: Model accuracy will be updated once the current larger dataset training completes.
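The cross-validation figure above can be reproduced in spirit with scikit-learn's cross_val_score; in this sketch, X and y are random placeholders for the feature matrix and labels produced by the training pipeline:

```python
# Sketch of how a cross-validated AUC (mean ± std) is typically computed.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
X, y = rng.random((500, 193)), rng.integers(0, 2, 500)  # placeholder features/labels
model = make_pipeline(StandardScaler(), LinearSVC(max_iter=5000))
scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
print(f"Cross-Validation AUC: {scores.mean():.2%} ± {scores.std():.2%}")
```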
- Install required packages:
pip install streamlit pillow
- Make sure the main AI detector is trained and models are available in ../ai_detector/models/
- Start the Streamlit app:
streamlit run app.py
- Open your browser to the provided URL (usually http://localhost:8501)
- Upload an image and get instant AI detection results!
- Model: Linear SVM with 180+ sophisticated features
- Features: Texture (LBP, GLCM), frequency domain (DCT, FFT), statistical, edge, color analysis
- Face Detection: OpenCV Haar cascades
- Performance: ~0.04 seconds per image
- Current Accuracy: 67.79% AUC on test dataset
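Face detection with OpenCV Haar cascades, mentioned above, generally follows this pattern; the image path is hypothetical and the app's exact preprocessing may differ:

```python
# Sketch of Haar-cascade face detection used as a preprocessing step (illustrative).
import cv2

cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
image = cv2.imread("test_images/example.jpg")   # hypothetical test image path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5, minSize=(60, 60))

for (x, y, w, h) in faces:
    face_crop = image[y:y + h, x:x + w]          # region passed on to feature extraction
    print("face found at", (x, y, w, h), "crop shape:", face_crop.shape)
```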
For each image, the app provides:
- AI Probability: Score between 0.0 (definitely real) and 1.0 (definitely AI)
- Classification: "Real" or "AI-Generated" based on probability threshold
- Confidence Level: High (>90%), Medium (70-90%), Low (<70%)
- Explanation: Human-readable reasoning for the decision
- Processing Time: Actual time taken for analysis
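As an illustration of how these fields relate, using the thresholds listed above; the function and field names here are assumptions for illustration, not the app's actual API:

```python
# Illustrative mapping from AI probability to the reported fields.
# Function name and result keys are assumptions, not the app's real interface.
def summarize_prediction(ai_probability: float) -> dict:
    label = "AI-Generated" if ai_probability >= 0.5 else "Real"
    certainty = max(ai_probability, 1.0 - ai_probability)   # distance from the 0.5 threshold
    if certainty > 0.9:
        confidence = "High"
    elif certainty >= 0.7:
        confidence = "Medium"
    else:
        confidence = "Low"
    return {
        "ai_probability": round(ai_probability, 3),
        "classification": label,
        "confidence": confidence,
        "explanation": f"Model assigns {ai_probability:.0%} probability of AI generation "
                       f"({confidence.lower()} confidence).",
    }

print(summarize_prediction(0.93))   # -> AI-Generated, High confidence
print(summarize_prediction(0.42))   # -> Real, Low confidence
```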
- Python 3.7+
- Streamlit
- OpenCV
- PIL (Pillow)
- All dependencies from the main AI detector system
- Feature Extraction System: 180+ sophisticated features for image analysis
- Machine Learning Pipeline: Multiple ML models with hyperparameter tuning
- Data Processing: Automated face detection and preprocessing
- Model Training: Complete training pipeline with cross-validation
- Web Interface: Streamlit app for user-friendly interaction
- Testing Framework: Comprehensive testing and evaluation system
- Large Dataset Training: Training on 66K+ images for improved accuracy
- Performance Optimization: Enhancing model accuracy and speed
- UI/UX Improvements: Refining the web interface
- AUDIO CONVERTER: We are training a deep-learning classification model to distinguish between AI-generated and real audio clips. The approach is to first generate an embedding from each audio clip using a model such as wav2vec, and then train the classifier on that embedding. Model training is currently in progress (a minimal sketch of the approach appears at the end of this README).
- Complete large dataset training
- Final model evaluation and metrics reporting
- Deploy and test with external images
- Documentation and final submission
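A minimal sketch of the embedding-then-classify idea for the audio task, assuming torchaudio's bundled wav2vec 2.0 model and scikit-learn; the clip paths and classifier choice are placeholders, and the actual training code is still in progress:

```python
# Sketch of the wav2vec-embedding + classifier approach (illustrative only).
import torch
import torchaudio
from sklearn.linear_model import LogisticRegression

bundle = torchaudio.pipelines.WAV2VEC2_BASE
model = bundle.get_model().eval()

def embed(path: str) -> torch.Tensor:
    waveform, sr = torchaudio.load(path)
    waveform = waveform.mean(dim=0, keepdim=True)                        # force mono
    waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)
    with torch.inference_mode():
        features, _ = model.extract_features(waveform)
    return features[-1].mean(dim=1).squeeze(0)                           # mean-pool last layer

# Hypothetical clip paths and labels: 0 = real audio, 1 = AI-generated audio
clips = ["data/audio/real_01.wav", "data/audio/ai_01.wav"]
labels = [0, 1]
X = torch.stack([embed(p) for p in clips]).numpy()
clf = LogisticRegression(max_iter=1000).fit(X, labels)
```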