Founder, CEO & Chief Scientist at Nexa AI | Stanford PhD
I am an AI researcher and software architect specializing in on-device AI and efficient generative models.
Currently, I am building Nexa AI, where we enable Day-0 support for state-of-the-art generative AI models on edge devices (NPU, GPU, CPU). My work focuses on making AI friction-free, private, and production-ready for mobile, PC, automotive, and IoT platforms.
- 🔭 I’m currently working on: NexaSDK and the NexaML inference engine.
- 🌱 My research interests: Multimodal AI, Model Quantization, Hardware Acceleration (Qualcomm HTP, CUDA), and Agentic Workflows.
- 💼 Previous Experience: Investment Scout at Sequoia Capital; PhD Researcher at Stanford.
- NexaML: A core inference engine enabling multimodal model deployment on Qualcomm NPU/GPU/CPU. Achieved 7.6K+ GitHub stars.
- Octopus Model Series (V1-V4): On-device language models that outperform GPT-4o on function-calling benchmarks with 35x faster inference and 70x better energy efficiency.
- Hyperlink: A fully local, private desktop app for agentic RAG file search.
- Octopus: On-device language model for function calling of software APIs (NAACL-HLT 2025)
- Octo-planner: On-device Language Model for Planner-Action Agents (EMAS 2025)
- DP-FedLoRA: Privacy-Enhanced Federated Fine-Tuning for On-Device LLMs (IEEE ICDM 2025, Best Paper Runner-Up Award)
- AutoNeural: Co-Designing Vision-Language Models for NPU Inference (arXiv:2512.02924)
- OmniVLM: A token-compressed, sub-billion-parameter vision-language model for efficient on-device inference (arXiv:2412.11475)
- Dolphin: Long Context as a New Modality for Energy-Efficient On-Device Language Models (arXiv:2408.15518)
- Octopus v4: Graph of language models (arXiv:2404.19296)
- Octopus v3: Technical Report for On-device Sub-billion Multimodal AI Agent (arXiv:2404.11459)
- Octopus v2: On-device language model for super agent (arXiv:2404.01744)
- On-Device Language Models: A Comprehensive Review (arXiv:2409.00088)



