Commit 05f39cd

Code and model release
Co-authored-by: ChunyuanLI <[email protected]>
1 parent 83423b0 commit 05f39cd

File tree

124 files changed, +13562 -3 lines changed

README.md

+36 -3

@@ -2,7 +2,7 @@
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/learning-customized-visual-models-with/semi-supervised-image-classification-on-1)](https://paperswithcode.com/sota/semi-supervised-image-classification-on-1?p=learning-customized-visual-models-with)
 [![PWC](https://img.shields.io/endpoint.svg?url=https://paperswithcode.com/badge/learning-customized-visual-models-with/semi-supervised-image-classification-on-2)](https://paperswithcode.com/sota/semi-supervised-image-classification-on-2?p=learning-customized-visual-models-with)
 
-# REACT: Learning Customized Visual Models with Retrieval-Augmented Knowledge (CVPR 2023)
+## REACT: Learning Customized Visual Models with Retrieval-Augmented Knowledge (CVPR 2023, Highlight 2.5%)
 
 [Haotian Liu](https://hliu.cc), [Kilho Son](#), [Jianwei Yang](https://jwyang.github.io/), [Ce Liu](#), [Jianfeng Gao](https://www.microsoft.com/en-us/research/people/jfgao/), [Yong Jae Lee*](https://pages.cs.wisc.edu/~yongjaelee/), [Chunyuan Li*](https://chunyuan.li/)
 
@@ -13,9 +13,38 @@
 - Introducing a customization stage to the lifecycle of foundation models!
 - REACT customizes foundation models to downstream tasks without the need for any labeled data.
 
-## Code Release
+## :fire: News
 
-Code comming soon. Stay tuned!
+* **[2023.03.29]** Code base and checkpoints are released.
+* **[2023.03.25]** Our research paper is selected as a <b>highlight</b> (2.5% acceptance rate)!
+* **[2023.03.24]** Our new checkpoint based on OpenCLIP-G/14 achieves <b>81.0%</b> zero-shot accuracy on ImageNet, the <b>new SOTA</b> among public checkpoints!
+* **[2023.02.28]** Paper is accepted to CVPR 2023.
+* **[2023.01.17]** REACT paper is released.
+
+## Code
+
+### [:globe_with_meridians: Stage 1: Retrieval](./react_retrieval)
+
+REACT provides a pipeline that supports building an index over a large dataset and efficiently querying and retrieving relevant data for downstream tasks, using information as simple as class names. See [`react_retrieval`](./react_retrieval) for details.
+
+You may skip this step by directly using our retrieved indices if you want to focus on building customized models on standard benchmarks like ImageNet-1K and ELEVATER.
+
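The retrieval step described above can be pictured with a short sketch. This is an editorial illustration, not code from this commit: it assumes a prebuilt FAISS index of CLIP image embeddings plus a metadata file mapping index ids to image URLs (both filenames are hypothetical placeholders for whatever `react_retrieval` produces), and it only uses public `faiss` and `open_clip` APIs.

```python
# Editorial sketch, not part of this commit: query a prebuilt FAISS index of CLIP
# image embeddings with class-name prompts. Index/metadata filenames are hypothetical.
import json

import faiss          # pip install faiss-cpu
import torch
import open_clip

# Text tower used to embed the queries; it must match the model that built the index.
model, _, _ = open_clip.create_model_and_transforms("ViT-B-32", pretrained="laion400m_e32")
tokenizer = open_clip.get_tokenizer("ViT-B-32")
model.eval()

index = faiss.read_index("pool.index")             # hypothetical prebuilt index
id_to_url = json.load(open("pool_urls.json"))      # hypothetical id -> image URL list

class_names = ["tench", "goldfish", "great white shark"]
prompts = [f"a photo of a {name}" for name in class_names]

with torch.no_grad():
    text_features = model.encode_text(tokenizer(prompts))
    text_features /= text_features.norm(dim=-1, keepdim=True)   # cosine-similarity search

# Top-k most relevant pool images for each class name.
scores, ids = index.search(text_features.cpu().numpy().astype("float32"), k=100)
retrieved = {name: [id_to_url[i] for i in row] for name, row in zip(class_names, ids)}
```

The retrieved URL lists can then be downloaded (for example with img2dataset) to form the task-specific training pool used in Stage 2.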
+### [:art: Stage 2: Customization](./react_customization)
+
+REACT proposes the efficient and effective *locked-text gated-image tuning* for tuning customized models on the retrieved dataset, with a performance improvement of up to 5.4% on ImageNet. See [`react_customization`](./react_customization) for details.
+
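To make the idea of *locked-text gated-image tuning* concrete, here is a toy, editorial sketch rather than the repo's implementation: all pretrained weights stay frozen (the text tower entirely, hence "locked-text"), and the only trainable parameters are newly inserted residual blocks in the image tower whose contribution is scaled by a tanh gate initialized at zero, so training starts exactly from the pretrained model. All class and variable names below are illustrative assumptions.

```python
# Editorial sketch, not the repo's code: the general recipe behind locked-text
# gated-image tuning on a toy model. Pretrained blocks are frozen; only the new,
# zero-gated blocks receive gradients.
import torch
import torch.nn as nn

class GatedBlock(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.ln = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
        self.gate = nn.Parameter(torch.zeros(()))   # tanh(0) = 0 -> no effect at initialization

    def forward(self, x):
        return x + torch.tanh(self.gate) * self.mlp(self.ln(x))

class GatedImageTower(nn.Module):
    """Interleaves a new trainable GatedBlock after every frozen pretrained block."""
    def __init__(self, pretrained_blocks: nn.ModuleList, dim: int):
        super().__init__()
        self.frozen = pretrained_blocks
        for p in self.frozen.parameters():
            p.requires_grad = False                 # lock the pretrained image weights
        self.gated = nn.ModuleList(GatedBlock(dim) for _ in pretrained_blocks)

    def forward(self, x):
        for blk, gate in zip(self.frozen, self.gated):
            x = gate(blk(x))
        return x

# Toy usage: 4 "pretrained" blocks of width 512; only the gated blocks are trainable.
dim = 512
pretrained = nn.ModuleList(
    nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, dim), nn.GELU()) for _ in range(4)
)
tower = GatedImageTower(pretrained, dim)
trainable = [p for p in tower.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-4)
print(sum(p.numel() for p in trainable), "trainable parameters")
```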
+## Pretrained Models
+
+### ImageNet-1K (zero-shot accuracy, %)
+
+| Model | Baseline | REACT <br/> (Locked-Text) <br/> LAION-400M | REACT <br/> (Gated-Image) <br/> LAION-400M | REACT <br/> (Gated-Image) <br/> LAION-2B |
+|------------------------|------|------|------|------|
+| CLIP (B32, WIT-400M)   | 63.2 | 66.9 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/clip-vit-base-32-locked-text.pt)) | 68.6 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/clip-vit-base-32-gated-image.pt)) | -- |
+| OpenCLIP (B32, L-400M) | 62.9 | 65.7 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/openclip-vit-base-32-locked-text.pt)) | 66.4 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/openclip-vit-base-32-gated-image.pt)) | -- |
+| OpenCLIP (B32, L-2B)   | 66.6 | 67.5 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/openclip-laion2b-vit-base-32-locked-text.pt)) | 69.5 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/openclip-laion2b-vit-base-32-gated-image.pt)) | -- |
+| CLIP (B16, WIT-400M)   | 68.6 | 71.6 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/clip-vit-base-16-locked-text.pt)) | 73.4 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/clip-vit-base-16-gated-image.pt)) | -- |
+| CLIP (L14, WIT-400M)   | 75.3 | -- | 78.1 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/clip-vit-large-14-gated-image.pt)) | 79.8 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/clip-vit-large-14-gated-image-laion2b.pt)) |
+| OpenCLIP (L14, L-2B)   | 75.3 | -- | 76.4 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/openclip-vit-large-14-gated-image.pt)) | 78.6 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/openclip-vit-large-14-gated-image-laion2b.pt)) |
+| OpenCLIP (G14, L-2B)   | 80.1 | -- | -- | 81.0 ([hf](https://huggingface.co/react-vl/react-in1k/blob/main/openclip-vit-bigG-14-gated-image-laion2b.pt)) |
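As an editorial illustration of how a checkpoint from the table might be fetched and evaluated, here is a sketch. It assumes the locked-text checkpoints are plain `state_dict`s for the unmodified ViT-B/32 tower (the gated-image variants add extra modules and need the modified model from `react_customization`); that layout is an assumption, not something documented in this diff.

```python
# Editorial sketch, not from the README: download a REACT checkpoint listed above and
# load it into a standard ViT-B/32 CLIP model. Assumes the file is a plain state_dict
# (or a dict wrapping one under "state_dict").
import torch
import open_clip
from huggingface_hub import hf_hub_download

ckpt_path = hf_hub_download(
    repo_id="react-vl/react-in1k",
    filename="clip-vit-base-32-locked-text.pt",
)

model, _, preprocess = open_clip.create_model_and_transforms("ViT-B-32", pretrained="openai")
state_dict = torch.load(ckpt_path, map_location="cpu")
if isinstance(state_dict, dict) and "state_dict" in state_dict:
    state_dict = state_dict["state_dict"]

# strict=False because the exact key layout of the released file is not documented here.
missing, unexpected = model.load_state_dict(state_dict, strict=False)
print(f"missing keys: {len(missing)}, unexpected keys: {len(unexpected)}")
model.eval()
```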
 
 ## Citation
 ```
@@ -27,6 +56,10 @@ Code comming soon. Stay tuned!
 }
 ```
 
+## Acknowledgement
+
+We are grateful for the contributions of several open-source projects, including [CLIP](https://github.com/openai/CLIP), [OpenCLIP](https://github.com/mlfoundations/open_clip), [LAION.AI](https://laion.ai/), [FAISS](https://github.com/facebookresearch/faiss), [Autofaiss](https://github.com/criteo/autofaiss), [img2dataset](https://github.com/rom1504/img2dataset), and [ELEVATER](https://github.com/Computer-Vision-in-the-Wild/Elevater_Toolkit_IC).
+
 ## Contributing
 
 This project welcomes contributions and suggestions. Most contributions require you to agree to a

react_customization/.gitignore

+154
@@ -0,0 +1,154 @@
logs/
wandb/
models/
features/
results/
/data/

tests/data/
*.pt

# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
pip-wheel-metadata/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# PEP 582; used by e.g. github.com/David-OConnor/pyflow
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/
sync.sh
gpu1sync.sh
.idea
*.pdf
**/._*
**/*DS_*
**.jsonl
src/sbatch
src/misc
.vscode
src/debug
core.*

# Allow
!src/evaluation/misc/results_dbs/*

react_customization/CITATION.cff

+33
@@ -0,0 +1,33 @@
cff-version: 1.1.0
message: If you use this software, please cite it as below.
authors:
- family-names: Ilharco
  given-names: Gabriel
- family-names: Wortsman
  given-names: Mitchell
- family-names: Wightman
  given-names: Ross
- family-names: Gordon
  given-names: Cade
- family-names: Carlini
  given-names: Nicholas
- family-names: Taori
  given-names: Rohan
- family-names: Dave
  given-names: Achal
- family-names: Shankar
  given-names: Vaishaal
- family-names: Namkoong
  given-names: Hongseok
- family-names: Miller
  given-names: John
- family-names: Hajishirzi
  given-names: Hannaneh
- family-names: Farhadi
  given-names: Ali
- family-names: Schmidt
  given-names: Ludwig
title: OpenCLIP
version: v0.1
doi: 10.5281/zenodo.5143773
date-released: 2021-07-28

react_customization/HISTORY.md

+140
@@ -0,0 +1,140 @@
## 2.14.0

* Move dataset mixtures logic to shard level
* Fix CoCa accum-grad training
* Safer transformers import guard
* get_labels refactoring

## 2.13.0

* Add support for dataset mixtures with different sampling weights
* Make transformers optional again

## 2.12.0

* Updated convnext configs for consistency
* Added input_patchnorm option
* Clean and improve CoCa generation
* Support model distillation
* Add ConvNeXt-Large 320x320 fine-tune weights

## 2.11.1

* Make transformers optional
* Add MSCOCO CoCa finetunes to pretrained models

## 2.11.0

* coca support and weights
* ConvNeXt-Large weights

## 2.10.1

* `hf-hub:org/model_id` support for loading models w/ config and weights in Hugging Face Hub

## 2.10.0

* Added a ViT-bigG-14 model.
* Added an up-to-date example slurm script for large training jobs.
* Added a option to sync logs and checkpoints to S3 during training.
* New options for LR schedulers, constant and constant with cooldown
* Fix wandb autoresuming when resume is not set
* ConvNeXt `base` & `base_w` pretrained models added
* `timm-` model prefix removed from configs
* `timm` augmentation + regularization (dropout / drop-path) supported

## 2.9.3

* Fix wandb collapsing multiple parallel runs into a single one

## 2.9.2

* Fix braceexpand memory explosion for complex webdataset urls

## 2.9.1

* Fix release

## 2.9.0

* Add training feature to auto-resume from the latest checkpoint on restart via `--resume latest`
* Allow webp in webdataset
* Fix logging for number of samples when using gradient accumulation
* Add model configs for convnext xxlarge

## 2.8.2

* wrapped patchdropout in a torch.nn.Module

## 2.8.1

* relax protobuf dependency
* override the default patch dropout value in 'vision_cfg'

## 2.8.0

* better support for HF models
* add support for gradient accumulation
* CI fixes
* add support for patch dropout
* add convnext configs


## 2.7.0

* add multilingual H/14 xlm roberta large

## 2.6.1

* fix setup.py _read_reqs

## 2.6.0

* Make openclip training usable from pypi.
* Add xlm roberta large vit h 14 config.

## 2.5.0

* pretrained B/32 xlm roberta base: first multilingual clip trained on laion5B
* pretrained B/32 roberta base: first clip trained using an HF text encoder

## 2.4.1

* Add missing hf_tokenizer_name in CLIPTextCfg.

## 2.4.0

* Fix #211, missing RN50x64 config. Fix type of dropout param for ResNet models
* Bring back LayerNorm impl that casts to input for non bf16/fp16
* zero_shot.py: set correct tokenizer based on args
* training/params.py: remove hf params and get them from model config

## 2.3.1

* Implement grad checkpointing for hf model.
* custom_text: True if hf_model_name is set
* Disable hf tokenizer parallelism

## 2.3.0

* Generalizable Text Transformer with HuggingFace Models (@iejMac)

## 2.2.0

* Support for custom text tower
* Add checksum verification for pretrained model weights

## 2.1.0

* lot including sota models, bfloat16 option, better loading, better metrics

## 1.2.0

* ViT-B/32 trained on Laion2B-en
* add missing openai RN50x64 model

## 1.1.1

* ViT-B/16+
* Add grad checkpointing support
* more robust data loader
