A comprehensive Python tool for extracting and analyzing image metadata, designed specifically to aid in neural network preprocessing and computer vision tasks.
This tool extracts detailed metadata from images, including both technical specifications and advanced image analysis metrics that are particularly valuable for machine learning preprocessing. It processes both single images and entire directories, providing normalized outputs in both CSV and JSON formats.
- Comprehensive Metadata Extraction:
  - Basic image properties (dimensions, format, color mode)
  - Color statistics and distributions
  - Texture analysis using Local Binary Patterns (LBP)
  - Edge detection metrics
  - Shape analysis using contours
  - EXIF data processing
  - Color space analysis (LAB color space)
- Neural Network Preprocessing Benefits:
  - Normalized numerical outputs for direct ML pipeline integration
  - Feature extraction metrics useful for data preprocessing
  - Batch processing capabilities for large datasets
  - Consistent data formatting across different image types
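To make the feature list above more concrete, here is a minimal sketch of how basic properties, color statistics, EXIF tags, and LAB statistics could be gathered with Pillow, NumPy, and scikit-image. The function name `basic_metadata` and the keys it returns are assumptions made for illustration; the tool's own `extract_metadata` may organize its output differently.

```python
import numpy as np
from PIL import ExifTags, Image
from skimage import color


def basic_metadata(path):
    """Collect basic properties, color statistics, and EXIF tags.

    Hypothetical helper for illustration; not the tool's extract_metadata.
    """
    with Image.open(path) as img:
        width, height = img.size
        info = {"width": width, "height": height,
                "format": img.format, "mode": img.mode}
        # Map numeric EXIF tag ids to readable names where Pillow knows them.
        info["exif"] = {ExifTags.TAGS.get(tag, tag): value
                        for tag, value in img.getexif().items()}
        # Pixel data as floats in [0, 1], always in RGB order.
        rgb = np.asarray(img.convert("RGB"), dtype=np.float64) / 255.0

    # Per-channel mean and standard deviation in RGB.
    info["rgb_mean"] = rgb.mean(axis=(0, 1)).tolist()
    info["rgb_std"] = rgb.std(axis=(0, 1)).tolist()
    # Mean L*, a*, b* after converting to the CIELAB color space.
    info["lab_mean"] = color.rgb2lab(rgb).mean(axis=(0, 1)).tolist()
    return info
```

Calling `basic_metadata("path/to/image.jpg")` returns a plain dictionary, which keeps the result easy to serialize to JSON or to collect into a table.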
- Clone the repository:
git clone <repository-url>
- Install dependencies:
pip install pillow numpy scikit-image pandas
The tool provides an interactive CLI with the following options:
- Process a single image:
python src/main.py
# Select option 1 and enter the image path
- Process all images in a directory:
python src/main.py
# Select option 2 and enter the directory path
from src.image_processor import extract_metadata
from src.data_handler import save_metadata_to_files
# Process single image
metadata = extract_metadata("path/to/image.jpg")
save_metadata_to_files([metadata])
The tool generates two types of output files:
- metadata_normalized.csv: Normalized metadata in CSV format
- metadata_normalized.json: Complete metadata in JSON format
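As a rough illustration of consuming the CSV output in a downstream pipeline, the sketch below loads the file with pandas and keeps only the numeric columns. The helper name `load_numeric_features` is hypothetical, and the actual column names depend on what the tool writes.

```python
import pandas as pd


def load_numeric_features(csv_path="metadata_normalized.csv"):
    """Load the CSV output and keep only its numeric columns.

    Hypothetical helper; the column layout is whatever the tool produced.
    """
    frame = pd.read_csv(csv_path)
    # Drop text columns (for example file names) so that the remaining
    # values can be passed to a model as a feature matrix.
    return frame.select_dtypes(include="number")
```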
src/
├── __init__.py
├── cli.py # Command-line interface
├── data_handler.py # Data processing and saving
├── image_processor.py # Core metadata extraction
├── io_handler.py # File I/O operations
└── main.py # Main application entry point
This tool is particularly valuable for neural network preprocessing because:
- Feature Normalization:
  - Automatically normalizes numerical features
  - Provides consistent scaling across datasets
  - Enables direct integration into ML pipelines
- Quality Assessment:
  - Extracts image quality metrics
  - Identifies potential preprocessing needs
  - Helps in dataset cleaning and filtering
- Dataset Analysis:
  - Provides statistical insights about your dataset
  - Helps identify biases in image collections
  - Enables informed preprocessing decisions
- Preprocessing Optimization:
  - Identifies images requiring specific preprocessing
  - Enables automated preprocessing workflows
  - Supports batch processing for large datasets
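The normalization scheme the tool applies is not spelled out here. As one common possibility, the sketch below shows min-max scaling of every numeric column to the range [0, 1] with pandas. The function name `min_max_normalize` is an assumption for illustration only.

```python
import pandas as pd


def min_max_normalize(frame):
    """Scale every numeric column to [0, 1]; constant columns become 0.

    Illustrative min-max scaling; the tool may normalize differently.
    """
    numeric = frame.select_dtypes(include="number")
    span = numeric.max() - numeric.min()
    # Replace a zero range with 1 to avoid division by zero.
    scaled = (numeric - numeric.min()) / span.replace(0, 1)
    result = frame.copy()
    result[numeric.columns] = scaled
    return result
```

Non-numeric columns such as file names pass through unchanged, so each row can still be traced back to its source image.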
- Color Analysis:
  - RGB/LAB color space statistics
  - Color histogram analysis
  - Dominant color extraction
- Texture Analysis:
  - Local Binary Patterns (LBP)
  - Shannon entropy calculation
  - Statistical texture measures
- Shape Analysis:
  - Contour detection and analysis
  - Edge density calculation
  - Shape complexity metrics
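The texture and edge metrics listed above can be sketched with scikit-image as follows. This is an illustrative approximation, not the tool's implementation: the function name `texture_and_edge_features`, the LBP parameters (8 points, radius 1, uniform method), and the edge threshold of 0.1 are all assumptions.

```python
import numpy as np
from skimage.feature import local_binary_pattern
from skimage.filters import sobel
from skimage.measure import shannon_entropy


def texture_and_edge_features(gray, points=8, radius=1, edge_threshold=0.1):
    """Compute an LBP histogram, Shannon entropy, and edge density.

    `gray` is a 2-D uint8 array. The parameter defaults are illustrative
    assumptions, not the tool's settings.
    """
    # Uniform LBP yields integer codes 0 .. points + 1.
    lbp = local_binary_pattern(gray, points, radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=np.arange(points + 3), density=True)

    # Sobel gradient magnitude on a [0, 1] float image; the edge density is
    # the fraction of pixels whose magnitude exceeds the threshold.
    magnitude = sobel(gray.astype(np.float64) / 255.0)
    edge_density = float((magnitude > edge_threshold).mean())

    return {
        "lbp_histogram": hist.tolist(),
        "entropy": float(shannon_entropy(gray)),
        "edge_density": edge_density,
    }
```

The LBP histogram has `points + 2` bins and sums to 1, so it can be appended to a feature vector without further scaling.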
- Multi-threaded processing for directory operations
- Efficient memory management for large images
- Robust error handling and recovery
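One way to combine threaded directory processing with per-file error recovery is sketched below, using only the standard library. The function name `process_directory`, the `extractor` parameter, and the set of file suffixes are assumptions for illustration; the tool's own implementation may differ.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path

# Assumed set of image suffixes; adjust to the formats you need.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp", ".tiff"}


def process_directory(directory, extractor, max_workers=4):
    """Run `extractor(path)` on every image in `directory` using threads.

    Returns (results, errors); a failure on one file does not stop the rest.
    """
    paths = sorted(p for p in Path(directory).iterdir()
                   if p.suffix.lower() in IMAGE_SUFFIXES)
    results, errors = [], {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(extractor, str(p)): p for p in paths}
        for future in as_completed(futures):
            path = futures[future]
            try:
                results.append(future.result())
            except Exception as exc:  # record the failure and continue
                errors[str(path)] = repr(exc)
    return results, errors
```

With this tool, `extractor` could be `extract_metadata` from `src.image_processor`, and the collected `results` could then be passed to `save_metadata_to_files`. Results arrive in completion order, so sort them afterwards if a stable order matters.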
Contributions are welcome! Please feel free to submit issues and pull requests.
Personal use is permitted. However, no license to redistribute or reuse the software is granted without the author's permission.