TIC-VLA: A Think-in-Control Vision-Language-Action Model for Robot Navigation in Dynamic Environments
Zhiyu Huang†, Yun Zhang†, Johnson Liu, Rui Song, Chen Tang, Jiaqi Ma
University of California, Los Angeles (UCLA)
† Equal contribution
Stay tuned for updates!
TIC-VLA introduces a latency-aware Think-in-Control (TIC) architecture for vision-language-action (VLA) models, targeting robot navigation in dynamic, human-centric environments.
- 🧠 Think-in-Control Architecture: Decouples slow vision-language reasoning from fast reactive control through an explicit delayed semantic–control interface (a minimal control-loop sketch follows this list).
- ⏱️ Latency-Aware Action Generation: Conditions control on current observations, cached VLM hidden states, and explicit delay metadata to mitigate stale semantics.
- 🧪 Latency-Consistent Training Pipeline: Combines vision-language reasoning distillation, latency-induced imitation learning, and online reinforcement learning (see the training-step sketch after this list).
- 🚶 Dynamic, Human-Centric Navigation: Evaluated in physics-accurate, photo-realistic environments with human-robot interactions and long-horizon instructions.
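The sketch below illustrates the decoupling described in the first two items: a slow "think" loop refreshes a semantic cache at its own rate, while a fast control loop acts on the current observation, the cached (possibly stale) VLM hidden states, and the measured delay. All class and method names here (`SemanticCache`, `vlm.encode`, `controller.act`, and so on) are illustrative assumptions, not the released TIC-VLA API.

```python
# Minimal sketch of a Think-in-Control loop, assuming stub vlm / camera /
# controller objects. Names are illustrative, not the project's actual API.
import threading
import time

class SemanticCache:
    """Delayed semantic-control interface: holds the latest VLM hidden
    states together with the wall-clock time they were produced."""
    def __init__(self):
        self._lock = threading.Lock()
        self._hidden = None
        self._stamp = 0.0

    def write(self, hidden, stamp):
        with self._lock:
            self._hidden, self._stamp = hidden, stamp

    def read(self, now):
        with self._lock:
            # Delay metadata: how stale the cached semantics are.
            return self._hidden, now - self._stamp

def think_loop(vlm, camera, instruction, cache):
    """Slow path: vision-language reasoning at its own (low) rate."""
    while True:
        obs = camera.latest()
        hidden = vlm.encode(obs, instruction)  # expensive forward pass
        cache.write(hidden, time.monotonic())

def control_loop(controller, camera, cache, hz=30):
    """Fast path: reactive control conditioned on the current observation,
    the cached (possibly stale) semantics, and the explicit delay."""
    period = 1.0 / hz
    while True:
        obs = camera.latest()
        hidden, delay = cache.read(time.monotonic())
        if hidden is not None:
            action = controller.act(obs, hidden, delay)
            controller.execute(action)
        time.sleep(period)
```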
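The next sketch shows one way the latency-induced imitation-learning stage could look: expert actions are imitated while the policy sees semantics that are deliberately stale by a randomly sampled delay, so it learns to compensate at deployment. The episode layout, the `MAX_DELAY_STEPS` bound, and the MSE objective are assumptions for illustration, not details from the paper.

```python
# Hedged sketch of latency-induced imitation learning on a recorded episode.
import random
import torch
import torch.nn.functional as F

MAX_DELAY_STEPS = 10  # assumed upper bound on think-loop latency (in steps)

def latency_induced_il_step(policy, episode, optimizer):
    """One imitation step. Assumed episode fields:
    episode.obs[t]    -- observation at step t
    episode.hidden[t] -- VLM hidden state computed at step t
    episode.action[t] -- expert action at step t
    """
    t = random.randrange(MAX_DELAY_STEPS, len(episode.obs))
    d = random.randrange(0, MAX_DELAY_STEPS + 1)  # simulated latency
    stale_hidden = episode.hidden[t - d]          # semantics from the past
    delay = torch.tensor([float(d)])              # explicit delay metadata
    pred = policy(episode.obs[t], stale_hidden, delay)
    loss = F.mse_loss(pred, episode.action[t])    # imitate the expert action
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```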
We introduce DynaNav, a language-conditioned navigation benchmark designed to test VLA systems in realistic, dynamic scenarios.
- 85 task configurations across Hospital, Office, Warehouse, and Outdoor scenes
- Varying crowd density, navigation distance, and scene layout
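For illustration only, a DynaNav-style task configuration might look like the snippet below; the field names, units, and values are assumptions, not the benchmark's actual schema.

```python
# Purely illustrative task configuration for a DynaNav-style episode.
task_config = {
    "scene": "Hospital",            # Hospital | Office | Warehouse | Outdoor
    "instruction": "Take the elevator to the second-floor lobby.",
    "crowd_density": 0.6,           # pedestrians per square meter (assumed unit)
    "navigation_distance_m": 45.0,  # start-to-goal distance (assumed unit)
    "layout_seed": 7,               # randomizes the scene layout
}
```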
If you find this repository useful for your research, please consider giving us a star 🌟 and citing our paper.
@article{huang2026ticvla,
  title={TIC-VLA: A Think-in-Control Vision-Language-Action Model for Robot Navigation in Dynamic Environments},
  author={Zhiyu Huang and Yun Zhang and Johnson Liu and Rui Song and Chen Tang and Jiaqi Ma},
  year={2026},
  eprint={2602.02459},
  archivePrefix={arXiv},
}
