- [2021 ICASSP] QuerYD: A Video Dataset with High-Quality Text and Audio Narrations, [paper], [bibtex], [homepage].
- [2020 EMNLP] What is More Likely to Happen Next? Video-and-Language Future Event Prediction, [paper], [bibtex], sources: [jayleicn/VideoLanguageFuturePred].
- [2020 CVPR] VIOLIN: A Large-Scale Dataset for Video-and-Language Inference, [paper], [bibtex], sources: [jimmy646/violin].
- [2018 CVPR] Vision-and-Language Navigation: Interpreting Visually-Grounded Navigation Instructions in Real Environments, [paper], [bibtex], sources: [peteanderson80/Matterport3DSimulator].
- [2019 ACL] Are You Looking? Grounding to Multiple Modalities in Vision-and-Language Navigation, [paper], [bibtex], [supplementary].