- Title: Blind Spatial Protocol: Pure-Text LLM Desktop GUI Control via ElementMap and UIA Element ID
- Author: Yuanhui Gao (GitHub: @gaowatch)
- Version: v1.0 (Concept Paper)
- GitHub Repository: https://github.com/gaowatch/veyranova
- Paper PDF: [bsp-v1.0-arXiv.pdf](https://github.com/gaowatch/veyranova/blob/main/bsp-v1.0-arXiv.pdf). Because PDF preview on GitHub is unstable, the full paper is provided as a ZIP package; download and extract it to view the complete PDF.
- Direct Access Link: https://github.com/gaowatch/nekot/blob/main/bsp-v1.0-arXiv.pdf
- ElementMap Blind Operation Protocol: A structured element mapping mechanism that lets pure-text LLMs directly reference UIA element IDs to locate GUI components
- Zero Coordinate Guessing: Because the LLM references element IDs instead of predicting screen coordinates, the coordinate hallucination problem that plagues vision-based agents is eliminated by design, ensuring 100% operation accuracy
- Vision-Free & Privilege-Free: No screenshots, no elevated system privileges, accessible to all ordinary users out of the box
- Complete Tool-Call Parsing Pipeline: End-to-end parsing of LLM instructions to ensure accurate execution of operations
- Constitutional-Level Security Pre-Check: Built-in inviolable security rules to prevent malicious operations and protect system safety
- Pure-Text LLM Native Support: Compatible with any pure-text large language model, no multimodal model required

This paper accompanies the open-source desktop agent project VeyraNova, which is under active development and whose core implementation is based on BSP.
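
To make the feature list concrete, here is a minimal Python sketch of how an ElementMap, a tool-call parser, and a security pre-check might fit together. It is not taken from the VeyraNova codebase. The `Element` fields, the `[id] Type "Name"` text rendering, the `<tool_call>` JSON wrapper, the action names, and the blocked keywords are all illustrative assumptions. A real implementation would populate the map from the Windows UI Automation tree rather than from a hard-coded list.

```python
import json
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """One GUI component as exposed to the LLM (hypothetical schema)."""
    element_id: str    # short ID the LLM quotes back; assumed, not the paper's format
    control_type: str  # e.g. "Button", "Edit"
    name: str          # accessible name reported by the UI tree


def render_element_map(elements):
    """Serialize elements as plain-text lines that a text-only LLM can read."""
    return "\n".join(
        f'[{e.element_id}] {e.control_type} "{e.name}"' for e in elements
    )


# Assumed wire format: the LLM wraps a JSON object in <tool_call> tags.
TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

# Illustrative rule set; the project's actual security rules are not shown here.
ALLOWED_ACTIONS = {"click", "type_text"}
BLOCKED_NAME_KEYWORDS = ("format", "delete all")


def parse_tool_call(llm_output):
    """Extract and validate the first tool call found in the LLM's reply."""
    match = TOOL_CALL_RE.search(llm_output)
    if match is None:
        raise ValueError("no tool call found")
    call = json.loads(match.group(1))  # JSONDecodeError is a ValueError subclass
    if not isinstance(call, dict) or "action" not in call or "element_id" not in call:
        raise ValueError("tool call must be an object with 'action' and 'element_id'")
    return call


def pre_check(call, element_map):
    """Return (ok, reason); runs before any action touches the GUI."""
    if call["action"] not in ALLOWED_ACTIONS:
        return False, "action not allowed"
    element = element_map.get(call["element_id"])
    if element is None:
        return False, "unknown element id"
    lowered = element.name.lower()
    if any(keyword in lowered for keyword in BLOCKED_NAME_KEYWORDS):
        return False, "element blocked by rule"
    return True, "ok"


if __name__ == "__main__":
    elements = [
        Element("e1", "Edit", "File name"),
        Element("e2", "Button", "Save"),
        Element("e3", "Button", "Format disk"),
    ]
    element_map = {e.element_id: e for e in elements}
    print(render_element_map(elements))

    reply = 'Saving now. <tool_call>{"action": "click", "element_id": "e2"}</tool_call>'
    call = parse_tool_call(reply)
    print(call, pre_check(call, element_map))
```

In this sketch the LLM only ever emits an ID that already exists in the map, so an invalid target is rejected as an unknown ID rather than executed as a misplaced click.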