Our model was trained and evaluated with the following package versions:
- PyTorch 1.9.1
- Python 3.6.12
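Below is a minimal sketch for checking that an environment matches these versions. `check_version` is a hypothetical helper written for this README, not a script shipped with the repository. It assumes that `python` on your `PATH` is the interpreter you will train with, and it ignores local build tags such as `+cu111` that PyTorch wheels append to their version string.

```shell
# Hypothetical helper (not part of this repository): compare an installed
# version string with the version listed in this README.
check_version() {
  name=$1
  found=$2
  expected=$3
  if [ -z "$found" ]; then
    echo "$name: not found"
    return 1
  fi
  # Strip a local build tag such as "+cu111" before comparing.
  if [ "${found%%+*}" = "$expected" ]; then
    echo "$name $found: OK"
  else
    echo "$name $found: expected $expected"
    return 1
  fi
}

# Assumes `python` is the interpreter used for training.
check_version python "$(python -c 'import platform; print(platform.python_version())')" 3.6.12
check_version torch "$(python -c 'import torch; print(torch.__version__)')" 1.9.1
```

Each check prints either an `OK` line or the version it actually found. A mismatch makes the helper return a non-zero status, so it can also be used in a setup script.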

1. Install the Matterport3D Simulator: follow the instructions here.
2. Download the object features here.
3. Download the R2R and R4R datasets here. The download contains a `datasets` folder.
4. (Optional) Download the trained model here.
```bash
cd pretrain_src
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_r4r.sh 8001  # R4R
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_r2r.sh 8001  # R2R
```

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/r4r_b16.sh 8001  # R4R
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/r2r_b16.sh 8001  # R2R
```
For RxR, please see `APAF_RxR/README.md`.
If you find this work useful in your research, please cite the following paper:
# BibTeX
```bibtex
@article{10.1145/3748656,
author = {Huang, Bowen and Zheng, Yanwei and Lan, Chuanlin and Sui, Dongchen and Zhao, Xinpeng and Zhang, Xiao and Xiao, Mengbai and Yu, Dongxiao},
title = {Action-Aware Visual-Textual Alignment for Long-Instruction Vision-and-Language Navigation},
journal = {ACM Transactions on Multimedia Computing, Communications and Applications},
year = {2025},
issue_date = {September 2025},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {21},
number = {9},
issn = {1551-6857},
url = {https://doi.org/10.1145/3748656},
doi = {10.1145/3748656},
month = sep,
articleno = {270},
numpages = {22},
keywords = {Long-Instruction Vision-and-Language Navigation, Action-Perception Alignment Framework, Action-Contextual Encoding Module, Dynamic Instruction Weighting Module}
}
```
# GB/T 7714
Huang B, Zheng Y, Lan C, et al. Action-Aware Visual-Textual Alignment for Long-Instruction Vision-and-Language Navigation[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2025, 21(9): 270.
# MLA
Huang, Bowen, et al. "Action-Aware Visual-Textual Alignment for Long-Instruction Vision-and-Language Navigation." *ACM Transactions on Multimedia Computing, Communications and Applications*, vol. 21, no. 9, 2025, article 270, https://doi.org/10.1145/3748656.
# APA
Huang, B., Zheng, Y., Lan, C., Sui, D., Zhao, X., Zhang, X., Xiao, M., & Yu, D. (2025). Action-aware visual-textual alignment for long-instruction vision-and-language navigation. *ACM Transactions on Multimedia Computing, Communications and Applications*, *21*(9), Article 270. https://doi.org/10.1145/3748656