Control your computer with natural language and voice commands.
Powered by Google Gemini AI, this assistant understands what you want to do and executes it automatically.
setup_venv.bat
Create a .env file:
GEMINI_API_KEY=your_api_key_here
Get your key: https://makersuite.google.com/app/apikey
run.bat
That's it!
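For readers curious how a key stored in .env can reach the program, here is a minimal sketch that uses only the standard library. The function names `load_env_file` and `get_gemini_key` are hypothetical and are not taken from this project, which may well use a dedicated loader instead.

```python
import os
from pathlib import Path


def load_env_file(path):
    """Parse KEY=VALUE lines from a .env-style file into a dict."""
    values = {}
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        # Skip blank lines, comments, and anything without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_key(path=".env"):
    """Prefer the process environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    if not key and Path(path).exists():
        key = load_env_file(path).get("GEMINI_API_KEY")
    # Treat the placeholder from the setup instructions as "not set".
    if not key or key == "your_api_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set; add it to .env")
    return key
```

Keeping the key out of the source code this way means the .env file can stay out of version control.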
Just type naturally:
search for Python tutorials and open first result
write an article about AI and post to X
click the submit button
Press V then speak:
"Search for AI trends"
"Post to Twitter"
- Launcher Guide - All ways to start the app
- Quick Start Guide - Command examples
- Complex Workflows - Multi-step automation
- Direct Search & Input - Search and voice features
- Latest Enhancements - What's new
- Architecture - How it works
- Bug Fixes - Recent fixes
- Implementation Details - Technical specs
- Natural language processing
- Detects simple vs complex tasks
- Automatic workflow generation
- 15+ websites supported (X, Facebook, LinkedIn, Gmail, GitHub, etc.)
- Smart navigation with keyboard shortcuts
- Tab-based navigation fallback
- Press V to speak commands
- Works alongside text input
- Install: pip install SpeechRecognition pyaudio
- Opens Chrome and searches
- Can open first result automatically
- Example: "search for Python and open first result"
- AI-powered article writing
- Social media posts
- Customizable length and style
- API key in .env file (not in code)
- Dry-run mode for testing
- Emergency stop (Ctrl+C)
- Python 3.10+
- Windows (Linux/Mac support coming)
- Gemini API Key (free from Google)
click the OK button
type hello world
press enter
open Chrome
search for AI trends
go to twitter.com
navigate to github.com
search for Python tutorials and open first result
write an article about AI and post to X
research machine learning and create a summary
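The commands above range from single actions to multi-step workflows. As a rough illustration of how the two kinds might be told apart, here is a keyword heuristic. It is only a stand-in: the assistant itself is described as Gemini-powered, and the verb list and the function name `classify_command` are assumptions made for this sketch.

```python
import re

# Hypothetical markers; the real project may leave this decision to Gemini.
WORKFLOW_VERBS = ("write", "research", "create", "post")
CONNECTOR = re.compile(r"\b(and|then)\b", re.IGNORECASE)


def classify_command(command):
    """Return "complex" for multi-step requests, otherwise "simple"."""
    text = command.strip().lower()
    first_word = text.split(" ", 1)[0] if text else ""
    # A connector such as "and" usually joins two separate steps.
    has_connector = bool(CONNECTOR.search(text))
    if has_connector or first_word in WORKFLOW_VERBS:
        return "complex"
    return "simple"
```

A heuristic like this misfires on queries that merely contain "and" (for example, a search for "rock and roll"), which is one reason a language model is a better fit for the real decision.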
help - Show help
voice - Toggle voice input
exit - Quit
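The three built-in commands can be pictured as a small dispatcher that runs before any text is sent to the AI. This is a sketch under assumptions: `handle_builtin` and the `state` dictionary keys are hypothetical names, not the project's own.

```python
def handle_builtin(command, state):
    """Handle help/voice/exit; return True if the command was consumed."""
    name = command.strip().lower()
    if name == "help":
        print("Commands: help, voice, exit, or any natural-language request")
    elif name == "voice":
        # Flip the voice flag, defaulting to off when it was never set.
        state["voice_enabled"] = not state.get("voice_enabled", False)
        print("Voice input:", "on" if state["voice_enabled"] else "off")
    elif name == "exit":
        state["running"] = False
    else:
        # Not a built-in: the caller should treat it as a natural-language request.
        return False
    return True
```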
Edit config.json to customize:
{
  "social_media": {
    "posting_strategy": "tab_navigation",
    "supported_platforms": ["X/Twitter", "Facebook", "LinkedIn", ...]
  }
}
Strategies:
- keyboard_shortcut - Fast (uses N key for Twitter)
- tab_navigation - Reliable (presses Tab to navigate)
- smart - With verification (uses screen capture)
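To show how the posting_strategy setting might be read and validated, here is a short sketch. The function name `read_posting_strategy` is hypothetical, and the fallback to tab_navigation is an assumption based on the sample config, not a documented default.

```python
import json

# The three strategy names listed in this README.
VALID_STRATEGIES = ("keyboard_shortcut", "tab_navigation", "smart")


def read_posting_strategy(config_text, default="tab_navigation"):
    """Return the configured posting strategy after validating its name."""
    config = json.loads(config_text)
    strategy = config.get("social_media", {}).get("posting_strategy", default)
    if strategy not in VALID_STRATEGIES:
        raise ValueError(f"Unknown posting_strategy: {strategy!r}")
    return strategy
```

In practice the text would come from the file itself, for example `read_posting_strategy(open("config.json", encoding="utf-8").read())`. Rejecting unknown names early avoids a typo in config.json failing later, in the middle of a posting workflow.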
ai-automation-assistant/
├── run.py              # Main launcher (use this!)
├── run.bat             # Windows launcher
├── setup_venv.bat      # Setup script
├── .env                # API key (create this)
├── config.json         # Configuration
├── requirements.txt    # Dependencies
│
├── ai_brain/           # AI command processing
├── automation_engine/  # Mouse/keyboard control
├── shared/             # Shared utilities
│
├── docs/               # Documentation
├── scripts/            # Helper scripts & old launchers
├── tests/              # Test files
└── venv/               # Virtual environment
setup_venv.bat
Create .env with:
GEMINI_API_KEY=your_key_here
venv\Scripts\activate.bat
pip install -r requirements.txt
pip install SpeechRecognition pyaudio
> search for Python tutorials
→ Opens Chrome
→ Searches "Python tutorials"
→ Shows results
> search for best restaurants and open first result
→ Opens Chrome
→ Searches "best restaurants"
→ Presses Tab+Tab+Enter
→ Opens first result
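The search-and-open sequence above can be sketched as a plan of steps that is built first and executed afterwards. Several things here are assumptions: Google as the search engine (the README only says it searches), the helper names, and the use of `webbrowser`, which opens the system default browser and not necessarily Chrome.

```python
import webbrowser
from urllib.parse import quote_plus


def build_search_url(query):
    """Build a search URL; Google is an assumption made for this sketch."""
    return "https://www.google.com/search?q=" + quote_plus(query)


def plan_search(query, open_first=False):
    """Return the steps as (action, argument) pairs without executing anything."""
    steps = [("open_url", build_search_url(query))]
    if open_first:
        # Mirrors the Tab+Tab+Enter sequence listed above.
        steps += [("press", "tab"), ("press", "tab"), ("press", "enter")]
    return steps


def execute(steps, press_key):
    """Run planned steps; press_key is a callable such as pyautogui.press."""
    for action, argument in steps:
        if action == "open_url":
            webbrowser.open(argument)
        elif action == "press":
            press_key(argument)
```

Separating planning from execution keeps the plan easy to inspect or log before any key is pressed. With PyAutoGUI installed, `execute(plan_search("best restaurants", open_first=True), pyautogui.press)` would run it, though a real run would also need a pause while the results page loads.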
> write an article about AI and post to X
→ Researches AI topics
→ Generates article
→ Opens Chrome
→ Goes to X.com
→ Posts content
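A multi-step workflow like this one pairs naturally with the dry-run mode and Ctrl+C emergency stop mentioned earlier. The runner below is a hedged sketch of that idea; `run_workflow` and its parameters are hypothetical and do not describe the project's actual engine.

```python
def run_workflow(steps, dry_run=True, log=print):
    """Run (description, action) steps in order; in dry-run mode only log them."""
    completed = []
    try:
        for description, action in steps:
            if dry_run:
                log(f"[dry-run] {description}")
            else:
                log(f"[run] {description}")
                action()
            completed.append(description)
    except KeyboardInterrupt:
        # Ctrl+C acts as the emergency stop: report progress and stop cleanly.
        log(f"Stopped after {len(completed)} step(s)")
    return completed
```

Defaulting `dry_run` to `True` means nothing touches the mouse, keyboard, or a social media account unless the caller opts in explicitly.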
Contributions welcome! The codebase is clean and well-documented.
MIT License - See LICENSE file
- Google Gemini AI - Natural language processing
- PyAutoGUI - Automation
- Rich - Beautiful terminal UI
- Documentation: See docs/ folder
- Issues: Check error messages (they're helpful!)
- Quick Check: Run run.bat and choose option 6
Made with ❤️ for automation enthusiasts
Start now: run.bat