README.md (+5 -39)
@@ -103,50 +103,16 @@ Apart from MMDetection, we also released [MMEngine](https://github.com/open-mmla
 ### Highlight

-**v3.2.0** was released in 12/10/2023:
+**v3.3.0** was released in 5/1/2024:

-**1. Detection Transformer SOTA Model Collection**
-
-(1) Supported four new and stronger SOTA Transformer models: [DDQ](configs/ddq/README.md), [CO-DETR](projects/CO-DETR/README.md), [AlignDETR](projects/AlignDETR/README.md), and [H-DINO](projects/HDINO/README.md).
-
-(2) Based on CO-DETR, MMDet released a model with a COCO performance of 64.1 mAP.
-
-(3) Algorithms such as DINO support `AMP/Checkpoint/FrozenBN`, which can effectively reduce memory usage.
-
-**2. [Comprehensive Performance Comparison between CNN and Transformer](projects/RF100-Benchmark/README.md)**
-
-RF100 is a collection of 100 real-world datasets spanning 7 domains. It can be used to assess the performance differences between Transformer models such as DINO and CNN-based algorithms across different scenarios and data volumes, so users can quickly evaluate the robustness of their algorithms in various scenarios.
-
-**3. Support for [GLIP](configs/glip/README.md) and [Grounding DINO](configs/grounding_dino/README.md) fine-tuning, the only algorithm library that supports Grounding DINO fine-tuning**
-
-The Grounding DINO implementation in MMDet is the only one that supports fine-tuning. Its performance is one point higher than the official version, and GLIP likewise outperforms the official version.
-
-We also provide a detailed guide for training and evaluating Grounding DINO on custom datasets. Everyone is welcome to give it a try.
-
-| Model | Backbone | Style | COCO mAP | Official COCO mAP |
+**[MM-Grounding-DINO: An Open and Comprehensive Pipeline for Unified Object Grounding and Detection](https://arxiv.org/abs/2401.02361)**
+
+Grounding DINO is a grounding pre-training model that unifies 2D open-vocabulary object detection and phrase grounding, with wide applications. However, its training code has not been open-sourced. We therefore propose MM-Grounding-DINO, which not only serves as an open-source reproduction of Grounding DINO, but also achieves significant performance improvements by reconstructing the data pipeline and exploring different dataset combinations and initialization strategies. Moreover, we evaluate it along multiple dimensions, including OOD, REC, Phrase Grounding, OVD, and fine-tuning, to fully explore the strengths and weaknesses of grounding pre-training, hoping to provide inspiration for future work.
 We are excited to announce our latest work on real-time object recognition tasks, **RTMDet**, a family of fully convolutional single-stage detectors. RTMDet not only achieves the best parameter-accuracy trade-off on object detection from tiny to extra-large model sizes but also obtains new state-of-the-art performance on instance segmentation and rotated object detection tasks. Details can be found in the [technical report](https://arxiv.org/abs/2212.07784). Pre-trained models are [here](configs/rtmdet).
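The removed highlight mentions `AMP/Checkpoint/FrozenBN` as memory-saving options for algorithms such as DINO. In MMDetection 3.x, AMP (automatic mixed precision) is typically enabled through MMEngine's optimizer wrapper in the config; a minimal config-fragment sketch (the learning-rate and weight-decay values are illustrative, not taken from any specific DINO config):

```python
# Config fragment: enable automatic mixed precision (AMP) training,
# which can noticeably reduce GPU memory usage during training.
optim_wrapper = dict(
    type='AmpOptimWrapper',  # MMEngine's mixed-precision optimizer wrapper
    optimizer=dict(type='AdamW', lr=1e-4, weight_decay=1e-4),
)
```

Equivalently, `tools/train.py` in MMDetection 3.x accepts an `--amp` flag that switches the optimizer wrapper to AMP without editing the config.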
configs/mm_grounding_dino/README.md (+11 -1)
@@ -1,10 +1,20 @@
 # MM Grounding DINO

+> [An Open and Comprehensive Pipeline for Unified Object Grounding and Detection](https://arxiv.org/abs/2401.02361)
+
 <!-- [ALGORITHM] -->

 ## Abstract

-TODO
+Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks, including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications. However, despite its significance, the original Grounding-DINO model lacks comprehensive public technical details due to the unavailability of its training code. To bridge this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline built with the MMDetection toolbox. It adopts abundant vision datasets for pre-training and various detection and grounding datasets for fine-tuning. We give a comprehensive analysis of each reported result and detailed settings for reproduction. Extensive experiments on the mentioned benchmarks demonstrate that our MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline. We release all our models to the research community.
 Referring expression comprehension means that the model automatically comprehends the referring expressions in a user's language description, without the need for noun-phrase extraction.
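As context for the OVD and phrase-grounding settings above: detectors in the Grounding-DINO family are typically prompted with a caption of `.`-separated category phrases. A minimal sketch of that prompt-to-label mapping (the helper name is illustrative, not part of MMDetection's API):

```python
def parse_categories(caption: str) -> list:
    """Split a Grounding-DINO-style caption such as 'person . bench .'
    into the individual category phrases the model is asked to detect."""
    return [phrase.strip() for phrase in caption.split(".") if phrase.strip()]

print(parse_categories("person . bench ."))  # ['person', 'bench']
```

In REC mode, by contrast, the whole free-form sentence is passed as the prompt and no such phrase splitting is applied.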
docs/en/notes/changelog.md (+29 -1)
@@ -1,6 +1,34 @@
 # Changelog of v3.x

-## v3.1.0 (12/10/2023)
+## v3.3.0 (05/01/2024)
+
+### Highlights
+
+Grounding-DINO is a state-of-the-art open-set detection model that tackles multiple vision tasks, including Open-Vocabulary Detection (OVD), Phrase Grounding (PG), and Referring Expression Comprehension (REC). Its effectiveness has led to its widespread adoption as a mainstream architecture for various downstream applications. However, despite its significance, the original Grounding-DINO model lacks comprehensive public technical details due to the unavailability of its training code. To bridge this gap, we present MM-Grounding-DINO, an open-source, comprehensive, and user-friendly baseline built with the MMDetection toolbox. It adopts abundant vision datasets for pre-training and various detection and grounding datasets for fine-tuning. We give a comprehensive analysis of each reported result and detailed settings for reproduction. Extensive experiments on the mentioned benchmarks demonstrate that our MM-Grounding-DINO-Tiny outperforms the Grounding-DINO-Tiny baseline. We release all our models to the research community.
+
+### New Features
+
+- Add RTMDet Swin / ConvNeXt backbones and results (#11259)
+- Add `odinw` configs and evaluation results for `GLIP` (#11175)
+- Add an optional score threshold option to `coco_error_analysis.py` (#11117)
+- Add new configs for `panoptic_fpn` (#11109)
+- Replace part of the weight download links with OpenXLab links for `Faster-RCNN` (#11173)
+
+### Bug Fixes
+
+- Fix `Grounding DINO` NaN when the number of class tokens exceeds 256 (#11066)
+- Fix errors in the `CO-DETR` config files (#11325)
+- Fix the `CO-DETR` `load_from` URL in the config (#11220)
+- Fix the mask shape after Albu postprocessing (#11280)
+- Fix bugs in `convert_coco_format` and `youtubevis2coco` (#11251, #11086)
+
+### Contributors
+
+A total of 15 developers contributed to this release.
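The optional score threshold added to `coco_error_analysis.py` discards low-confidence detections before the error analysis runs. A minimal sketch of that kind of pre-filtering (the function and field names are illustrative, not the script's actual code):

```python
def filter_detections(detections: list, score_thr: float) -> list:
    """Keep only detections whose confidence meets the threshold,
    mirroring what an optional score-threshold flag would do
    before computing per-category error statistics."""
    return [det for det in detections if det["score"] >= score_thr]

dets = [
    {"bbox": [0, 0, 10, 10], "score": 0.9},
    {"bbox": [5, 5, 20, 20], "score": 0.2},
]
print(len(filter_detections(dets, 0.3)))  # 1
```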