Browser Automation with Amazon Nova Act

Automate web tasks using natural language with Amazon Nova Act and Bedrock. Transform routine browser interactions into simple conversational commands that free up your time for more meaningful work.

What is Nova Act?

Nova Act is Amazon's specialized AI model designed specifically for reliable web browser automation. Unlike general-purpose language models, Nova Act excels at translating natural language instructions into precise browser actions—clicking, typing, scrolling, and navigating just like a human would.

Key Features

🎯 Natural Language Browser Control

Control any website using simple, conversational commands:

"Search for wireless headphones on Amazon"
"Find the best-rated product under $100"
"Add it to my cart and proceed to checkout"

🧠 Intelligent Agent Layer

Bridges the gap between human intent and browser actions:

Purpose-Driven Navigation: Knows which websites to visit and what elements matter
Contextual Continuity: Maintains context across complex multi-step tasks
Smart Task Breakdown: Converts high-level goals into step-by-step browser actions

🚀 Multi-Session Browsing

Enable multiple sessions (or users) to automate browser tasks simultaneously:

Session-Based Isolation: Each user gets a dedicated browser instance with a unique session ID
Independent Browser Profiles: Separate cookies, authentication, and browsing data per session
Parallel Task Execution: Multiple browser automation tasks run concurrently without interference
Scalable Architecture: Handles dozens of concurrent users with isolated browser contexts
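
The isolation model above can be sketched roughly as follows. This is a minimal illustration, not the repository's implementation: `BrowserSession`, `run_task`, and `run_in_parallel` are hypothetical names, and the browser work is replaced by a placeholder that only writes a file into the session's own profile directory.

```python
import tempfile
import uuid
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class BrowserSession:
    """Hypothetical per-user session record (not the repository's actual class)."""
    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    # Each session gets its own directory, so cookies and logins would not mix.
    profile_dir: Path = field(
        default_factory=lambda: Path(tempfile.mkdtemp(prefix="browser-profile-"))
    )


def run_task(session: BrowserSession, instruction: str) -> str:
    # Placeholder for real browser work: a real implementation would launch a
    # browser that uses session.profile_dir as its user data directory.
    (session.profile_dir / "last_task.txt").write_text(instruction)
    return f"{session.session_id}: {instruction}"


def run_in_parallel(instructions: list[str]) -> list[str]:
    # One isolated session per instruction, executed concurrently.
    sessions = [BrowserSession() for _ in instructions]
    with ThreadPoolExecutor(max_workers=len(sessions) or 1) as pool:
        return list(pool.map(run_task, sessions, instructions))
```

Because every session owns a separate profile directory and ID, tasks submitted to the pool cannot read each other's state, which is the property the feature list describes.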

👥 Human-in-the-Loop

Seamlessly handles scenarios that require human judgment:

Authentication challenges and CAPTCHAs
Ambiguous UI elements
Unexpected interface changes
Intelligent handoff between automated and manual control
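
One simple way to picture the handoff decision is shown below. It is only a sketch under loud assumptions: the trigger phrases and the `Control`, `needs_human`, and `next_control` names are invented for illustration, and a real system would likely rely on model output and page state rather than keyword matching.

```python
from enum import Enum


class Control(Enum):
    AUTOMATED = "automated"
    MANUAL = "manual"


# Hypothetical trigger phrases, chosen only to illustrate the idea.
HANDOFF_TRIGGERS = ("captcha", "verify you are human", "two-factor")


def needs_human(page_text: str) -> bool:
    """Return True when the page text suggests a challenge automation should not solve."""
    text = page_text.lower()
    return any(trigger in text for trigger in HANDOFF_TRIGGERS)


def next_control(page_text: str) -> Control:
    """Decide who drives the browser for the next step."""
    return Control.MANUAL if needs_human(page_text) else Control.AUTOMATED
```

Once the blocking page is gone, `next_control` returns `Control.AUTOMATED` again, so control passes back to the agent without extra bookkeeping.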

🔌 Model Context Protocol (MCP) Integration

Advanced tool integration through standardized protocol:

Standardized Tool Communication: Enables seamless integration of browser automation with external services
Streamable HTTP Transport: Enables real-time bidirectional communication between agents and tools with optimized resource usage
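
For context, MCP messages are JSON-RPC 2.0, and a client invokes a tool with the `tools/call` method. The sketch below only builds such a message; with Streamable HTTP it would be sent as the body of an HTTP POST to the server's MCP endpoint. The tool name `browser_navigate` and the helper `build_tool_call` are assumptions for illustration, not names taken from this repository.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request IDs.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(message)


# "browser_navigate" is a hypothetical tool name used only for illustration.
payload = build_tool_call("browser_navigate", {"url": "https://www.amazon.com"})
```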

Demo

Real-World Use Cases

This system enables automation across various domains:

Fashion Research: Trend analysis and product comparison
Financial Analysis: Market research and data gathering
E-commerce: Shopping, price comparison, and inventory management
News Aggregation: Technology trends and industry insights
Travel Planning: Flight searches, hotel bookings, and itinerary planning

E-commerce Shopping (from Search to Cart)

Go to Amazon and search for 'laptop stand'. Filter by brand 'AmazonBasics', check customer ratings above 4 stars, and add the adjustable one to your cart.

Financial Product (ETF) Comparison

Go to https://investor.vanguard.com/investment-products/index-funds
Filter for Stock-Sector funds only, then identify which sector ETF has the best YTD performance. Also note its expense ratio and 5-year return.

Fashion Trend Analysis

Analyze current fashion trends on Pinterest for "summer 2025 fashion women".

Quick Start

Prerequisites

Operating System: macOS (recommended)
Python: 3.10 or higher
Node.js: 18 or higher
Package Manager: npm or yarn

Installation

# Clone the repository
git clone https://github.com/aws-samples/browser-control-with-nova-act.git
cd browser-control-with-nova-act

# Backend setup
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
cd py-backend
pip install -r requirements.txt

# Frontend setup
cd ..
npm install

Configuration

1. Set up Environment Variables

# Copy the example environment file
cd py-backend
cp .env.example .env

# Edit .env file and add your Nova Act API Key
# NOVA_ACT_API_KEY=your_api_key_here

Alternative: Use system environment variables

export NOVA_ACT_API_KEY="your_api_key_here"
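
If you want to confirm the key is visible to Python before starting the backend, a fail-fast check along these lines can help. It is a sketch that assumes the key is present in the process environment; `get_nova_act_api_key` is a hypothetical helper, not a function from this repository.

```python
import os


def get_nova_act_api_key(env=None) -> str:
    """Return NOVA_ACT_API_KEY, or raise a clear error if it is missing."""
    env = os.environ if env is None else env
    key = env.get("NOVA_ACT_API_KEY", "").strip()
    # Treat the placeholder from .env.example as "not configured".
    if not key or key == "your_api_key_here":
        raise RuntimeError(
            "NOVA_ACT_API_KEY is not set; add it to py-backend/.env or export it."
        )
    return key
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.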

2. Configure Browser Settings (Optional)

All browser settings can be configured in the .env file or by editing py-backend/app/libs/config/config.py:

# Core browser settings
BROWSER_HEADLESS = True  # Set to False for debugging
BROWSER_START_URL = "https://www.google.com"
BROWSER_MAX_STEPS = 2  # Keep small for reliability

# Browser profile (for persistent sessions)
BROWSER_USER_DATA_DIR = '/path/to/chrome/profile'
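
Values in a .env file are strings, so they need converting before they can stand in for the Python defaults above. The following sketch shows one way that could work. It is not the contents of config.py: `env_bool`, `env_int`, and `load_browser_settings` are hypothetical helpers, and the accepted boolean spellings are an assumption.

```python
import os


def env_bool(name: str, default: bool, env=None) -> bool:
    """Read a boolean setting; unset variables fall back to the default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def env_int(name: str, default: int, env=None) -> int:
    """Read an integer setting; unset variables fall back to the default."""
    env = os.environ if env is None else env
    raw = env.get(name)
    return default if raw is None else int(raw)


def load_browser_settings(env=None) -> dict:
    """Collect the browser settings shown above, using the same defaults."""
    env = os.environ if env is None else env
    return {
        "headless": env_bool("BROWSER_HEADLESS", True, env),
        "start_url": env.get("BROWSER_START_URL", "https://www.google.com"),
        "max_steps": env_int("BROWSER_MAX_STEPS", 2, env),
        "user_data_dir": env.get("BROWSER_USER_DATA_DIR"),
    }
```

For example, setting `BROWSER_HEADLESS=false` in the environment would make `load_browser_settings()["headless"]` evaluate to `False`, which is the debugging mode mentioned in the comment above.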

3. AI Model Configuration

# Multimodal models required for screenshot interpretation
DEFAULT_MODEL_ID = "us.amazon.nova-premier-v1:0"
# Tested models: Nova Premier, Claude 3.7 Sonnet, Claude 3.5 Sonnet

Running the Application

npm run dev

Visit http://localhost:3000 to start automating!

Usage Examples

Basic Commands

# Simple navigation
"Go to amazon.com"
"Search for wireless headphones"

# Interactive actions  
"Click the search bar and type 'gaming laptop'"
"Scroll down to see more products"
"Select the third result"

# Complex research tasks
"Find gaming laptops under $1000 and compare their specs"
"Research the latest AI news and summarize key trends"
"Book a flight from Seattle to New York for next Friday"

Architecture Overview

The system uses a three-tier architecture:

Supervisor Layer: Breaks down complex tasks and coordinates workflow
Agent Layer: Executes browser missions and interprets results
Nova Act Layer: Performs direct browser interactions
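
The relationship between the three tiers can be sketched as below. This is an illustrative outline only: the class names mirror the layer names, but the bodies are stand-ins. In particular, the planner naively splits a goal on periods, whereas the real system presumably uses a language model, and `NovaActLayer.act` returns a canned string instead of driving a browser.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    instruction: str
    output: str


class NovaActLayer:
    """Stand-in for direct browser interaction."""

    def act(self, instruction: str) -> str:
        # A real layer would click, type, and scroll in a browser here.
        return f"done: {instruction}"


class AgentLayer:
    """Executes one browser mission and packages the result."""

    def __init__(self, browser: NovaActLayer) -> None:
        self.browser = browser

    def execute(self, instruction: str) -> StepResult:
        return StepResult(instruction, self.browser.act(instruction))


class SupervisorLayer:
    """Splits a high-level goal into steps and coordinates their execution."""

    def __init__(self, agent: AgentLayer) -> None:
        self.agent = agent

    def plan(self, goal: str) -> list[str]:
        # Naive placeholder planner: one step per sentence.
        return [part.strip() for part in goal.split(".") if part.strip()]

    def run(self, goal: str) -> list[StepResult]:
        return [self.agent.execute(step) for step in self.plan(goal)]
```

Each layer only talks to the one directly below it, so the planner can change without touching the browser code, and the browser layer can be swapped for a test double as done here.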

Learn More

A detailed blog post covering technical implementation, architectural decisions, and advanced usage patterns will be published soon. It will include:

Deep dive into the agent architecture
Advanced prompting strategies
Performance optimization techniques
Troubleshooting common issues
Real-world deployment scenarios

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Ready to automate your web workflows? Start with npm run dev and experience the future of browser automation! 🚀
