AI-Powered CAPTCHA Solver

This project is a Python command-line tool that uses large multimodal models (LMMs), such as OpenAI's GPT-4o and Google's Gemini, to solve several types of CAPTCHAs automatically. It uses Selenium to drive a web browser, interact with the page, and solve the CAPTCHA in real time.

A successful solve is recorded as a GIF in the successful_solves directory.

Key Features

Multiple AI Providers: Supports both OpenAI (e.g., GPT-4o) and Google Gemini (e.g., Gemini 2.5 Pro) models.
Multiple CAPTCHA Types: Solves text, distorted text, reCAPTCHA v2, slider puzzle, and audio CAPTCHAs.
Browser Automation: Uses Selenium to simulate human interaction with web pages.
Extensible: The modular design makes it easy to add support for new CAPTCHA types or AI models.
Benchmarking: Includes a script to test the performance and success rate of the solvers.

Supported CAPTCHA Types

The tool can solve the following CAPTCHA types found on the 2captcha.com/demo/ pages:

Text Captcha: Simple text recognition.
Complicated Text Captcha: Text with more distortion and noise.
reCAPTCHA v2: Google's "I'm not a robot" checkbox with image selection challenges.
Puzzle Captcha: Slider puzzles where a piece must be moved to the correct location.
Audio Captcha: Transcribing spoken letters or numbers from an audio file.

Prerequisites

Python 3.7+
Mozilla Firefox

Installation & Configuration

Clone the repository:

git clone https://github.com/aydinnyunus/ai-captcha-bypass
cd ai-captcha-bypass

Install dependencies:
```
pip install -r requirements.txt
```
Set up your API keys: Create a .env file in the root directory by copying the example file:
```
cp .env.example .env
```
Open the .env file and add your API keys for OpenAI and/or Google Gemini:
```
OPENAI_API_KEY="sk-..."
GOOGLE_API_KEY="..."
```
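
As a rough illustration of how the keys in .env could be read at startup, here is a minimal standard-library sketch. The helper names `load_env_file` and `configured_providers` are hypothetical and are not taken from the project, whose actual loading mechanism is not described here; only the variable names OPENAI_API_KEY and GOOGLE_API_KEY come from the example above.

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse simple KEY=VALUE lines from a .env file into a dict.

    Hypothetical helper: skips blank lines and '#' comments and
    strips optional surrounding quotes from values.
    """
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def configured_providers(env):
    """Return the providers whose API key is present and non-empty."""
    key_names = {"openai": "OPENAI_API_KEY", "gemini": "GOOGLE_API_KEY"}
    return [name for name, key in key_names.items() if env.get(key)]


if __name__ == "__main__":
    # Variables already set in the environment take precedence over the file.
    env = {**load_env_file(), **os.environ}
    print("Configured providers:", configured_providers(env))
```

A check like this makes it possible to fail early with a clear message when the key for the selected provider is missing.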

Usage

The primary script for running the solver is main.py. The CAPTCHA type is required; the AI provider and model are optional.

Command-Line Arguments

captcha_type: (Required) The type of CAPTCHA to solve.
- Choices: puzzle, text, complicated_text, recaptcha_v2, audio
--provider: The AI provider to use.
- Choices: openai, gemini (Default: openai)
--model: The specific model to use (e.g., gpt-4o, gemini-2.5-flash).
--file: Path to an audio file for the audio test. (Default: files/audio.mp3)
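
The arguments listed above map naturally onto Python's argparse module. The following is a sketch of what such a parser might look like, not a copy of the code in main.py. The function name `build_parser` is hypothetical, and since no default is documented for --model, the `None` default is an assumption.

```python
import argparse

CAPTCHA_TYPES = ["puzzle", "text", "complicated_text", "recaptcha_v2", "audio"]
PROVIDERS = ["openai", "gemini"]


def build_parser():
    """Build a parser mirroring the documented arguments (illustrative only)."""
    parser = argparse.ArgumentParser(description="AI-powered CAPTCHA solver")
    parser.add_argument("captcha_type", choices=CAPTCHA_TYPES,
                        help="The type of CAPTCHA to solve.")
    parser.add_argument("--provider", choices=PROVIDERS, default="openai",
                        help="The AI provider to use.")
    # No default model is documented, so None is an assumption here.
    parser.add_argument("--model", default=None,
                        help="The specific model to use, e.g. gpt-4o.")
    parser.add_argument("--file", default="files/audio.mp3",
                        help="Path to an audio file for the audio test.")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the demo runs without a command line.
    args = build_parser().parse_args(["text", "--provider", "gemini"])
    print(args.captcha_type, args.provider, args.model, args.file)
```

Using `choices` lets argparse reject an unsupported CAPTCHA type or provider with a usage message before any browser is launched.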

Examples

Solve a simple text CAPTCHA using OpenAI (default):

python main.py text

Solve a complicated text CAPTCHA using Gemini:

python main.py complicated_text --provider gemini

Solve a reCAPTCHA v2 challenge using Gemini:

python main.py recaptcha_v2 --provider gemini

Transcribe an audio CAPTCHA:

python main.py audio --file files/radio.wav --provider openai

Solve a puzzle CAPTCHA using a specific OpenAI model:

python main.py puzzle --provider openai --model gpt-4o

How It Works

Launch Browser: The script starts a Firefox browser instance using Selenium.
Navigate: It goes to the demo page for the specified CAPTCHA type.
Capture: It takes screenshots of the CAPTCHA challenge (image, instructions, or puzzle).
AI Analysis: The captured images or audio files are sent to the selected AI provider (OpenAI or Gemini) with a specific prompt tailored to the CAPTCHA type.
Get Action: The AI returns the solution (text, coordinates, or image selections).
Perform Action: The script uses Selenium to enter the text, move the slider, or click the correct images.
Verify: The script checks for a success message to confirm the CAPTCHA was solved.
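
The control flow of the steps above can be sketched as a small orchestration skeleton. This is a structural illustration only: `SolveSteps` and `run_attempt` are hypothetical names, and the browser and AI provider are replaced by stub callables, so nothing here reflects the project's actual Selenium or API code.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class SolveSteps:
    """Hypothetical bundle of callables, one per step described above."""
    capture: Callable[[], Any]      # returns the challenge (screenshot or audio path)
    analyze: Callable[[Any], Any]   # sends the challenge to the AI provider
    perform: Callable[[Any], None]  # applies the returned action in the browser
    verify: Callable[[], bool]      # checks the page for a success message


def run_attempt(steps: SolveSteps) -> bool:
    """Run capture -> analyze -> perform -> verify once and report success."""
    challenge = steps.capture()
    action = steps.analyze(challenge)
    steps.perform(action)
    return steps.verify()


if __name__ == "__main__":
    # Stub callables stand in for the browser and the AI provider.
    log = []
    steps = SolveSteps(
        capture=lambda: "challenge.png",
        analyze=lambda challenge: "ABC123",
        perform=log.append,
        verify=lambda: log == ["ABC123"],
    )
    print("solved:", run_attempt(steps))
```

Keeping each step behind its own callable is one way to get the modularity described under Key Features, since a new CAPTCHA type or provider would only need to supply different callables.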

Success Examples

Here are some examples of the solver successfully bypassing different CAPTCHA types.

[Demo GIFs not shown: reCAPTCHA v2, Puzzle, and Complicated Text, each with OpenAI (GPT-4o) and Gemini (2.5 Pro).]

Project Structure

main.py: The main entry point to run the CAPTCHA solver tests. Handles command-line arguments and calls the appropriate test functions.
ai_utils.py: Contains all the functions for interacting with the OpenAI and Gemini APIs. This is where prompts are defined and API calls are made.
puzzle_solver.py: Implements the logic specifically for solving the multi-step slider puzzle CAPTCHA.
benchmark.py: A script for running multiple tests to evaluate the performance and success rate of the different solvers.
requirements.txt: A list of all the Python packages required for the project.
screenshots/: Directory where screenshots of CAPTCHAs are temporarily saved.
successful_solves/: Directory where GIFs of successful solutions are saved.
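
To make the "success rate" reported by a benchmarking script concrete, here is a minimal sketch of how per-solver rates could be computed from a list of trial outcomes. The function `success_rates` and the tuple layout are assumptions for illustration, not the contents of benchmark.py, and the sample data is made up.

```python
from collections import defaultdict


def success_rates(results):
    """Compute the success rate per (captcha_type, provider) pair.

    `results` is an iterable of (captcha_type, provider, solved) tuples.
    """
    totals = defaultdict(int)
    wins = defaultdict(int)
    for captcha_type, provider, solved in results:
        key = (captcha_type, provider)
        totals[key] += 1
        wins[key] += 1 if solved else 0
    return {key: wins[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Made-up sample data, not measurements from the project.
    sample = [
        ("text", "openai", True),
        ("text", "openai", False),
        ("puzzle", "gemini", True),
    ]
    for (captcha_type, provider), rate in success_rates(sample).items():
        print(f"{captcha_type:<18}{provider:<8}{rate:.0%}")
```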
