visee-sdu/APAF

Prerequisites

  1. Our model was trained and evaluated with the following package versions:
  • PyTorch 1.9.1
  • Python 3.6.12

  2. Install the Matterport3D simulator: follow the instructions here.

  3. Download the object features here.

  4. Download the R2R and R4R datasets here. The download contains a datasets folder.

  5. (Optional) Download the trained model here.
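
Before training, it can help to confirm that the active environment matches the versions listed above. The following is a minimal sketch, not part of the repository: the `check_version` helper is a hypothetical name introduced here, and the script assumes a `python` executable is on your `PATH`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of APAF): compare an installed version
# against the version named in this README and report the result.
check_version() {
    local name="$1" expected="$2" actual="$3"
    if [ "$actual" = "$expected" ]; then
        echo "ok: $name $actual"
    else
        echo "mismatch: $name is $actual, README lists $expected"
    fi
}

# Query the active interpreter; report "missing" if Python or PyTorch is unavailable.
py_ver="$(python -c 'import platform; print(platform.python_version())' 2>/dev/null || echo missing)"
# Strip any local build suffix such as "+cu111" from the PyTorch version string.
torch_ver="$(python -c 'import torch; print(torch.__version__.split("+")[0])' 2>/dev/null || echo missing)"

check_version Python 3.6.12 "$py_ver"
check_version PyTorch 1.9.1 "$torch_ver"
```

A mismatch does not necessarily mean the code will fail; it only means the environment differs from the one the model was trained and evaluated in.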

Pre-training

cd pretrain_src
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_r4r.sh 8001  # R4R
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_r2r.sh 8001  # R2R

Fine-tuning

CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/r4r_b16.sh 8001  # R4R
CUDA_VISIBLE_DEVICES=0,1,2,3 bash scripts/r2r_b16.sh 8001  # R2R

RxR

Please see APAF_RxR/README.md.

Citation

If you find this work useful in your research, please cite the following paper:

# BibTeX
@article{10.1145/3748656,
author = {Huang, Bowen and Zheng, Yanwei and Lan, Chuanlin and Sui, Dongchen and Zhao, Xinpeng and Zhang, Xiao and Xiao, Mengbai and Yu, Dongxiao},
title = {Action-Aware Visual-Textual Alignment for Long-Instruction Vision-and-Language Navigation},
journal = {ACM Transactions on Multimedia Computing, Communications and Applications},
year = {2025},
issue_date = {September 2025},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {21},
number = {9},
issn = {1551-6857},
url = {https://doi.org/10.1145/3748656},
doi = {10.1145/3748656},
month = sep,
articleno = {270},
numpages = {22},
keywords = {Long-Instruction Vision-and-Language Navigation, Action-Perception Alignment Framework, Action-Contextual Encoding Module, Dynamic Instruction Weighting Module}
}

# GB/T 7714
[1] Huang B, Zheng Y, Lan C, et al. Action-Aware Visual-Textual Alignment for Long-Instruction Vision-and-Language Navigation[J]. ACM Transactions on Multimedia Computing, Communications and Applications, 2025.

# MLA
[1] Huang, Bowen, et al. "Action-Aware Visual-Textual Alignment for Long-Instruction Vision-and-Language Navigation." ACM Transactions on Multimedia Computing, Communications and Applications (2025).

# APA
[1] Huang, B., Zheng, Y., Lan, C., Sui, D., Zhao, X., Zhang, X., Xiao, M., & Yu, D. (2025). Action-aware visual-textual alignment for long-instruction vision-and-language navigation. ACM Transactions on Multimedia Computing, Communications and Applications.

Acknowledgement

This codebase builds on ScaleVLN, BEVBert, and DUET.
