The objective of this data collection strategy is to gather image data suitable for object detection tasks within the PikasuBirdAi project, utilizing Hugging Face transformer models and the YOLO object detection framework with COCO dataset names.
- DETR (DEtection TRansformer) for end-to-end object detection
- ViT (Vision Transformer) for image classification and feature extraction
- YOLOS (You Only Look at One Sequence) for object detection
- High-resolution images of birds in various natural habitats
- Images with multiple birds of different species
- Images capturing birds in flight, perched, and in nests
- Varied lighting conditions (daylight, dusk, dawn)
- Different weather conditions (clear, cloudy, rainy)
- Public datasets: COCO dataset, iNaturalist, NABirds
- Flickr API for user-generated bird images
- Web scraping from reputable bird-watching websites and forums
- Image resizing: Standardize to 640x640 pixels for YOLO compatibility
- Normalization: Scale pixel values to [0, 1]
- Data augmentation:
- Random horizontal flips
- Random rotations (±15 degrees)
- Random brightness and contrast adjustments
- Annotation conversion: Convert existing annotations to YOLO format
- Format: JPEG for images, TXT files for annotations
- Structure:
data/ ├── images/ │ ├── train/ │ ├── val/ │ └── test/ └── labels/ ├── train/ ├── val/ └── test/
- Map bird species to COCO class IDs
- Create a custom
coco_bird_names.txt
file with bird species names - Use YOLO's dataset loading utilities to integrate with COCO format
- Implement a custom dataset class that combines COCO annotations with our bird-specific data
- Maintain a README.md file in the data directory
- Document data collection sources, preprocessing steps, and usage instructions
- Keep a version history of the dataset
- Include scripts for data preprocessing and format conversion