See the project page for video demos, available on YouTube and Bilibili.
- Chrome and DingTalk: Search.NBA.FMVP.and.send.to.friend.mp4
- Word: Write.an.introduction.of.Alibaba.in.Word.mp4
- Mobile-Agent-v2: Mobile-Agent-v2.mp4
- Mobile-Agent: Mobile-Agent.mp4
- 🔥🔥[2.21.25] We have released an updated version of PC-Agent. See the paper for details; the code will be updated soon.
- 🔥🔥[1.20.25] We propose Mobile-Agent-E, a hierarchical multi-agent framework capable of self-evolution through past experience, achieving stronger performance on complex, multi-app tasks.
- 🔥🔥[9.26] Mobile-Agent-v2 has been accepted by the Thirty-Eighth Annual Conference on Neural Information Processing Systems (NeurIPS 2024).
- 🔥[8.23] We propose PC-Agent, a PC operation assistant supporting both Mac and Windows platforms.
- 🔥[7.29] Mobile-Agent won the Best Demo Award at the 23rd China National Conference on Computational Linguistics (CCL 2024). At CCL 2024 we also demonstrated the upcoming Mobile-Agent-v3, which has lower memory overhead (8 GB), faster inference (10–15 s per operation), and uses only open-source models. For a video demo, see the 📺Demo section.
- [6.27] We released a demo on Hugging Face and ModelScope: upload mobile phone screenshots to experience Mobile-Agent-v2. No model or device configuration is required, so you can try it immediately.
- [6. 4] Modelscope-Agent now supports Mobile-Agent-v2, based on an Android ADB environment; please check it out in the application.
- [6. 4] We propose Mobile-Agent-v2, a mobile device operation assistant with effective navigation via multi-agent collaboration.
- [3.10] Mobile-Agent has been accepted by the ICLR 2024 Workshop on Large Language Model (LLM) Agents.
- PC-Agent - A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC
- Mobile-Agent-E - Stronger performance on complex, long-horizon, reasoning-intensive tasks, with self-evolution capability
- Mobile-Agent-v3
- Mobile-Agent-v2 - Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
- Mobile-Agent - Autonomous Multi-Modal Mobile Device Agent with Visual Perception
If you find Mobile-Agent useful for your research and applications, please cite using this BibTeX:

```bibtex
@article{liu2025pc,
  title={PC-Agent: A Hierarchical Multi-Agent Collaboration Framework for Complex Task Automation on PC},
  author={Liu, Haowei and Zhang, Xi and Xu, Haiyang and Wanyan, Yuyang and Wang, Junyang and Yan, Ming and Zhang, Ji and Yuan, Chunfeng and Xu, Changsheng and Hu, Weiming and Huang, Fei},
  journal={arXiv preprint arXiv:2502.14282},
  year={2025}
}

@article{wang2025mobile,
  title={Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks},
  author={Wang, Zhenhailong and Xu, Haiyang and Wang, Junyang and Zhang, Xi and Yan, Ming and Zhang, Ji and Huang, Fei and Ji, Heng},
  journal={arXiv preprint arXiv:2501.11733},
  year={2025}
}

@article{wang2024mobile2,
  title={Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration},
  author={Wang, Junyang and Xu, Haiyang and Jia, Haitao and Zhang, Xi and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
  journal={arXiv preprint arXiv:2406.01014},
  year={2024}
}

@article{wang2024mobile,
  title={Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception},
  author={Wang, Junyang and Xu, Haiyang and Ye, Jiabo and Yan, Ming and Shen, Weizhou and Zhang, Ji and Huang, Fei and Sang, Jitao},
  journal={arXiv preprint arXiv:2401.16158},
  year={2024}
}
```
- AppAgent: Multimodal Agents as Smartphone Users
- mPLUG-Owl & mPLUG-Owl2: Modularized Multimodal Large Language Model
- Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
- GroundingDINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
- CLIP: Contrastive Language-Image Pretraining