diff --git a/README.md b/README.md
index 5920aff..00b73cb 100644
--- a/README.md
+++ b/README.md
@@ -193,6 +193,13 @@
 We also release the full training set ([`Chrisyichuan/screenshot-training-natural-filtered-v2`](https://huggingface.co/datasets/Chrisyichuan/screenshot-training-natural-filtered-v2)), so you can adapt other backbones yourself — a larger Qwen, or any other embedding model.
 
+### Data Curation
+
+Visualization of some very early version of the training data:
+[early training data viewer](https://yichuan-w.github.io/share/blog-review-first100-light/)
+
+Reproduce: TBD
+
 ## Acknowledgments
 
 Thanks to [Rulin Shao](https://rulinshao.github.io/) for support.