A multi-lingual hands-free speech-to-action AI browser agent. Made to truly expand internet accessbility to all people.
- Python β₯ 3.11 (Python 3.12 recommended)
- Git for cloning the repository
- Microphone for voice input (optional but recommended)
Install uv (recommended):
Windows (PowerShell as Admin):
powershell -ExecutionPolicy ByPass -Command "irm https://astral.sh/uv/install.ps1 | iex"macOS / Linux:
curl -LsSf https://astral.sh/uv/install.sh | shAfter installing, restart your terminal and verify:
uv --version# Clone the repository
git clone https://github.com/taherakolawala/Browser4All.git
cd Browser4All
# Create virtual environment
uv venv --python 3.12
# Activate environment
# Windows:
.\.venv\Scripts\Activate.ps1
# macOS/Linux:
source .venv/bin/activate
# Install all dependencies (includes tkinter for UI)
uv pip install browser-use elevenlabs pygame aiohttp python-dotenv pydantic SpeechRecognition pyaudio
# Download Chromium browser
uvx playwright install chromium --with-deps --no-shellCreate a .env file in the project root:
# OpenAI API Key (required for browser agent)
OPENAI_API_KEY=your_openai_api_key_here
# ElevenLabs API Key (required for speech)
ELEVEN_LABS_API_KEY=your_elevenlabs_api_key_hereGet API Keys:
- OpenAI: https://platform.openai.com/api-keys
- ElevenLabs: https://elevenlabs.io/ (free tier: 10,000 characters/month)
Test your microphone:
python microphone_selector.pyThis will help you:
- Select the correct microphone device
- Test audio recording quality
- Automatically configure speech settings
python launcher.pyThe agent will:
- Launch the hovering UI in the top-right corner of your screen
- Greet you with speech and wait for your voice or text input
- Display all activity in the transparent interface above your browser
If you prefer traditional pip:
# Clone and enter directory
git clone https://github.com/taherakolawala/Browser4All.git
cd Browser4All
# Create virtual environment
python -m venv .venv
# Activate environment
# Windows:
.venv\Scripts\activate
# macOS/Linux:
source .venv/bin/activate
# Install dependencies
pip install -r speech_requirements.txt
pip install browser-use
# Download Chromium
playwright install chromium --with-depsVoice input is enabled by default. The agent will:
- Listen for your voice responses automatically
- Fall back to text input if voice fails
- Allow you to say "use text input" to switch modes
Configure in main.py:
configure_speech(
enabled=True, # Enable text-to-speech
speak_questions=True, # Agent speaks questions aloud
speak_confirmations=True, # Agent speaks confirmations
listen_for_responses=True, # Voice input enabled (NEW!)
voice_input_default=True, # Voice is primary input mode (NEW!)
recognition_timeout=10, # Voice timeout (seconds)
debug_audio=True, # Hear what was recorded (NEW!)
phrase_timeout=6 # Silence duration to end recording (NEW!)
)Default voice is Rachel. To change, edit speech_handler.py:
# Popular voice options:
self.voice_id = "21m00Tcm4TlvDq8ikWAM" # Rachel - Pleasant female (default)
self.voice_id = "ErXwobaYiN019PkySvjV" # Antoni - Calm male
self.voice_id = "EXAVITQu4vr4xnSDxMaL" # Bella - Soft female
self.voice_id = "yoZ06aMxZJJ28mfd3POQ" # Sam - Natural male
self.voice_id = "pNInz6obpgDQGcFmaJgB" # Adam - Friendly maleThe microphone recording duration is controlled by:
phrase_timeout: Seconds of silence to end recording (default: 4 seconds)recognition_timeout: Max seconds to wait for speech to start (default: 10 seconds)
To change recording length:
# In main.py configure_speech()
phrase_timeout=8, # Allow 8 seconds of silence before endingConfigure browser appearance in main.py:
browser = Browser(
window_size={'width': 1920, 'height': 1080}, # Browser window size
window_position={'width': 100, 'height': 50}, # Position from top-left
headless=False, # Show browser window
viewport={'width': 1280, 'height': 720} # Content area size
)Common sizes:
# Full HD
window_size={'width': 1920, 'height': 1080}
# Laptop standard
window_size={'width': 1366, 'height': 768}
# Compact window (current default)
window_size={'width': 960, 'height': 540}browser = Browser(
# Display
headless=False, # Show browser window
window_size={'width': 1200, 'height': 800},
# Downloads
accept_downloads=True, # Auto-accept downloads
downloads_path='./downloads', # Download directory
auto_download_pdfs=True, # Auto-download PDFs
# Recording & Debugging
record_video_dir='./recordings', # Save session videos
record_har_path='./trace.har', # Save network traces
# Device Emulation
user_agent='Mozilla/5.0...', # Custom user agent
)Browser4All features a transparent hovering interface that displays all agent activity in real-time:
- Transparent overlay: Hovers above the browser window with customizable opacity
- Real-time updates: Shows all terminal output, questions, responses, and agent actions
- ** Color-coded messages**: Questions (blue), success (green), warnings (yellow), errors (red)
- Draggable & resizable: Click and drag to reposition, minimize/maximize controls
- Timestamped logs: Each message includes timestamp for tracking
- Always on top: Stays visible above the browser for continuous monitoring
Customize the hovering interface in main.py:
# Initialize with custom settings
initialize_ui(
width=500, # Window width in pixels
height=400, # Window height in pixels
opacity=0.88 # Transparency (0.0-1.0, higher = more opaque)
)The UI automatically:
- Positions in the top-right corner by default
- Categorizes messages by content and emoji
- Auto-scrolls to show latest activity
- Manages memory by limiting stored messages
- Shuts down cleanly when the agent exits
Browser4All uses voice-first interaction with visual feedback:
- Agent greets you: "Hello! What would you like me to help you with?" (appears in hovering UI)
- You speak: "Go to YouTube and search for cat videos" (logged in UI)
- Agent acts: Navigates to YouTube, searches (progress shown in UI)
- Agent asks: "I found cat videos. What would you like me to do next?" (question highlighted in blue)
- You respond: Voice or text - "Play the first video" (response logged in purple)
- Automatic microphone calibration for optimal recording
- Smart fallback: Switches to text if voice fails
- Flexible input: Say "use text input" to switch modes anytime
- Debug mode: Hear your recorded audio played back
- Visual feedback: All voice interactions logged in hovering UI
- Real-time monitoring: See all agent activity as it happens
- Message categorization: Color-coded by type (questions, responses, errors, etc.)
- Persistent display: Stays visible above browser window
- Interactive controls: Drag to move, minimize/close buttons
- Memory efficient: Auto-manages message history
# Open the launcher:
python launcher.pyExample interactions (all displayed in real-time UI):
[UI] π Browser4All Assistant Started
[UI] π Speaking greeting...
You (voice): "reddit"
[UI] π€ Agent asks: "What would you like me to do on Reddit?"
You (voice): "find funny posts"
[UI] π Agent navigating to Reddit...
[UI] β
Found funny posts, asking what's next
You (voice): "amazon"
[UI] π€ Agent asks: "What would you like me to search for on Amazon?"
You (voice): "wireless headphones under 100 dollars"
[UI] π Searching Amazon...
[UI] β
Found results, asking if you want details
uv command not found:
- Restart your terminal after installation
- Windows: Run PowerShell as Administrator
- Verify installation:
uv --version
Permission errors:
- Windows: Use PowerShell as Administrator
- macOS/Linux: Don't use
sudowith uv
Chromium installation fails:
# Force reinstall
uvx playwright install chromium --force
# Linux additional deps
sudo apt-get install libnss3 libatk-bridge2.0-0 libxcomposite1 libdrm2PyAudio installation errors:
# Windows (if pip fails)
pip install pipwin
pipwin install pyaudio
# Or download wheel from: https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudioNo speech output:
- Check volume and speakers/headphones
- Verify
ELEVEN_LABS_API_KEYin.env - Test:
python -c "import pygame; pygame.mixer.init(); print('Audio OK')"
UI not appearing:
- Check if
tkinteris installed:python -c "import tkinter; print('UI OK')" - Windows: tkinter included with Python
- Linux:
sudo apt-get install python3-tk - Verify screen resolution supports UI positioning
No microphone input:
- Check microphone permissions in system settings
- Run
python microphone_selector.pyto test - Ensure microphone isn't being used by other apps
Voice recognition fails:
- Speak clearly with minimal background noise
- Increase
recognition_timeoutto 15+ seconds - Set
debug_audio=Trueto hear recorded audio quality
Invalid OpenAI key:
- Verify key format:
sk-...(starts with sk-) - Check billing status: https://platform.openai.com/usage
ElevenLabs quota exceeded:
- Check usage: https://elevenlabs.io/usage
- Free tier: 10,000 characters/month
- Consider upgrading or disable speech:
enabled=False
Windows (complete setup):
# Run in PowerShell as Admin
git clone https://github.com/taherakolawala/Browser4All.git; cd Browser4All; powershell -ExecutionPolicy ByPass -Command "irm https://astral.sh/uv/install.ps1 | iex"
# Restart terminal, then:
uv venv --python 3.12; .\.venv\Scripts\Activate.ps1; uv pip install browser-use elevenlabs pygame aiohttp python-dotenv pydantic SpeechRecognition pyaudio; uvx playwright install chromium --with-deps --no-shell
# Add API keys to .env file, then: python main.pymacOS/Linux (complete setup):
git clone https://github.com/taherakolawala/Browser4All.git && cd Browser4All && curl -LsSf https://astral.sh/uv/install.sh | sh
# Restart terminal, then:
source ~/.bashrc && uv venv --python 3.12 && source .venv/bin/activate && uv pip install browser-use elevenlabs pygame aiohttp python-dotenv pydantic SpeechRecognition pyaudio && uvx playwright install chromium --with-deps --no-shell
# Add API keys to .env file, then: python main.pyElevenLabs (Speech):
- Free tier: 10,000 characters/month (~200 questions)
- Paid: $5/month for 30,000 characters
- Monitor at: https://elevenlabs.io/usage
OpenAI (AI Brain):
- GPT-4.1-mini recommended: ~$0.01-0.05 per session
- Monitor at: https://platform.openai.com/usage
Total typical cost: $0-10/month depending on usage.
- ** Real-time visual feedback** of all agent activity
- ** Transparent overlay** that hovers above your browser
- ** Color-coded messages** for easy tracking
- ** Interactive controls** - drag, minimize, close
- ** Timestamped logs** for session tracking
All terminal output automatically appears in both:
- ** Terminal** - Traditional command line output
- ** Hovering UI** - Visual overlay with timestamps and colors
Browser4All - Voice-controlled browser automation with visual feedback π€π€πΊ