Rui Hu1,*, Lianghui Zhu1,*, Yuxuan Zhang1, Tianheng Cheng1,🌟, Lei Liu2, Heng Liu2, Longjin Ran2,
Xiaoxin Chen2, Wenyu Liu1, Xinggang Wang1,📧
1 Huazhong University of Science and Technology, 2 vivo AI Lab
(* equal contribution, 🌟 Project lead, 📧 corresponding author)
**[2025-03-14]** GroundingSuite arXiv paper released. Code and dataset are now available!
- Automated VLM Annotation: A novel VLM-based framework for efficient pixel-level grounding annotation
- Large-scale Dataset: 9.56M training samples with diverse referring expressions
- Comprehensive Benchmark: a 3,800-instance evaluation benchmark for thorough assessment
- Efficient Annotation: 4.5× faster annotation than GLaMM
GroundingSuite is a comprehensive pixel grounding framework that addresses the challenges of complex multi-granular pixel grounding. Our framework introduces:
- An automated VLM-based annotation pipeline that significantly improves annotation efficiency
- A large-scale dataset with 9.56M diverse training samples
- A rigorous evaluation benchmark with 3,800 carefully curated instances
- State-of-the-art performance metrics that demonstrate the effectiveness of our approach
Our dataset consists of:
- Training Set: 9.56M samples with diverse referring expressions
- Evaluation Benchmark: 3,800 carefully curated instances
You can download GSEval from Hugging Face.
```shell
python evaluate_grounding.py --image_dir ./images --gt_file GroundingSuite-Eval.jsonl --pred_file model_predictions.jsonl
```
- `--image_dir`: Directory containing images (default: current directory)
- `--gt_file`: Path to ground truth JSONL file (default: `GroundingSuite-Eval.jsonl`)
- `--pred_file`: Path to model prediction JSONL file (default: `claude_predictions.jsonl`)
- `--output_file`: Path for saving evaluation results (default: `[model_name]_result.json`)
- `--iou_threshold`: IoU threshold for evaluation (default: 0.5)
- `--vis_dir`: Directory for visualization results (default: `visualization`)
- `--visualize`: Enable visualization generation (default: False)
- `--normalize_coords`: Whether prediction coordinates are normalized to [0, 1] (default: False)
- `--mode`: Evaluation mode, `box` or `mask` (default: `box`)
- `--vis_samples`: Number of random samples to visualize (default: 5)
Generate visualizations comparing ground truth and predictions:
```shell
python evaluate_grounding.py --image_dir ./images --gt_file GroundingSuite-Eval.jsonl --pred_file model_predictions.jsonl --visualize --vis_dir ./vis_results
```
Ground truth format (one JSON object per line):
```json
{"idx": 1, "image_path": "images/example.jpg", "box": [10, 20, 100, 200], "class_id": 0, "label": "dog"}
```
Prediction format:
```json
{"idx": 1, "image_path": "images/example.jpg", "box": [15, 25, 105, 205]}
```
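As a minimal sketch, a prediction file in this format could be produced as follows. The field names (`idx`, `image_path`, `box`) come from the sample above; how you obtain the boxes is up to your model, so the dummy records here are placeholders:

```python
import json

def save_predictions(results, out_path="model_predictions.jsonl"):
    """Write one JSON object per line: idx, image_path, and an [x1, y1, x2, y2] box."""
    with open(out_path, "w") as f:
        for r in results:
            record = {"idx": r["idx"], "image_path": r["image_path"], "box": r["box"]}
            f.write(json.dumps(record) + "\n")

# Example: two dummy predictions (boxes in absolute pixel coordinates;
# pass --normalize_coords to the evaluator if yours are in [0, 1] instead).
save_predictions([
    {"idx": 1, "image_path": "images/example.jpg", "box": [15, 25, 105, 205]},
    {"idx": 2, "image_path": "images/example2.jpg", "box": [0, 0, 50, 60]},
])
```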
- Box Mode: Calculates IoU (Intersection over Union) per instance and accuracy (the fraction of instances with IoU above the threshold)
- Mask Mode: Calculates GIoU (mean IoU over instances)
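For box mode, the IoU of two `[x1, y1, x2, y2]` boxes can be sketched as below. This mirrors the metric's standard definition, not necessarily the exact implementation in `evaluate_grounding.py`:

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes in [x1, y1, x2, y2] format."""
    # Intersection rectangle (empty if the boxes do not overlap).
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    # Union = sum of areas minus the intersection counted twice.
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Example with the boxes from the format samples above:
gt = [10, 20, 100, 200]
pred = [15, 25, 105, 205]
print(box_iou(gt, pred) >= 0.5)  # True: counts as correct at the default threshold
```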
If you find GroundingSuite useful in your research or applications, please consider giving us a star ⭐ and citing it using the following BibTeX entry:
@misc{hu2025groundingsuite,
title={GroundingSuite: Measuring Complex Multi-Granular Pixel Grounding},
author={Rui Hu and Lianghui Zhu and Yuxuan Zhang and Tianheng Cheng and Lei Liu and Heng Liu and Longjin Ran and Xiaoxin Chen and Wenyu Liu and Xinggang Wang},
journal={arXiv preprint arXiv:2503.10596},
year={2025}
}