This project is a Python-based command-line tool that uses large multimodal models (LMMs) like OpenAI's GPT-4o and Google's Gemini to automatically solve various types of CAPTCHAs. It leverages Selenium for web browser automation to interact with web pages and solve CAPTCHAs in real-time.
A successful solve is recorded as a GIF in the successful_solves
directory.
- Multiple AI Providers: Supports both OpenAI (e.g., GPT-4o) and Google Gemini (e.g., Gemini 2.5 Pro) models.
- Multiple CAPTCHA Types: Capable of solving a variety of CAPTCHA challenges.
- Browser Automation: Uses Selenium to simulate human interaction with web pages.
- Extensible: The modular design makes it easy to add support for new CAPTCHA types or AI models.
- Benchmarking: Includes a script to test the performance and success rate of the solvers.
The tool can solve the following CAPTCHA types found on the 2captcha.com/demo/
pages:
- Text Captcha: Simple text recognition.
- Complicated Text Captcha: Text with more distortion and noise.
- reCAPTCHA v2: Google's "I'm not a robot" checkbox with image selection challenges.
- Puzzle Captcha: Slider puzzles where a piece must be moved to the correct location.
- Audio Captcha: Transcribing spoken letters or numbers from an audio file.
- Python 3.7+
- Mozilla Firefox
-
Clone the repository:
git clone https://github.com/aydinnyunus/ai-captcha-bypass cd ai-captcha-bypass
-
Install dependencies:
pip install -r requirements.txt
-
Set up your API keys: Create a
.env
file in the root directory by copying the example file:cp .env.example .env
Open the
.env
file and add your API keys for OpenAI and/or Google Gemini:OPENAI_API_KEY="sk-..." GOOGLE_API_KEY="..."
The primary script for running the solver is main.py
. You need to specify the CAPTCHA type to test. You can also specify the AI provider and model.
captcha_type
: (Required) The type of CAPTCHA to solve.- Choices:
puzzle
,text
,complicated_text
,recaptcha_v2
,audio
- Choices:
--provider
: The AI provider to use.- Choices:
openai
,gemini
(Default:openai
)
- Choices:
--model
: The specific model to use (e.g.,gpt-4o
,gemini-2.5-flash
).--file
: Path to an audio file for theaudio
test. (Default:files/audio.mp3
)
Solve a simple text CAPTCHA using OpenAI (default):
python main.py text
Solve a complicated text CAPTCHA using Gemini:
python main.py complicated_text --provider gemini
Solve a reCAPTCHA v2 challenge using Gemini:
python main.py recaptcha_v2 --provider gemini
Transcribe an audio CAPTCHA:
python main.py audio --file files/radio.wav --provider openai
Solve a puzzle CAPTCHA using a specific OpenAI model:
python main.py puzzle --provider openai --model gpt-4o
- Launch Browser: The script starts a Firefox browser instance using Selenium.
- Navigate: It goes to the demo page for the specified CAPTCHA type.
- Capture: It takes screenshots of the CAPTCHA challenge (image, instructions, or puzzle).
- AI Analysis: The captured images or audio files are sent to the selected AI provider (OpenAI or Gemini) with a specific prompt tailored to the CAPTCHA type.
- Get Action: The AI returns the solution (text, coordinates, or image selections).
- Perform Action: The script uses Selenium to enter the text, move the slider, or click the correct images.
- Verify: The script checks for a success message to confirm the CAPTCHA was solved.
Here are some examples of the solver successfully bypassing different CAPTCHA types.
CAPTCHA Type | OpenAI (GPT-4o) | Gemini (2.5 Pro) |
---|---|---|
reCAPTCHA v2 | ![]() |
![]() |
Puzzle | ![]() |
![]() |
Complicated Text | ![]() |
![]() |
main.py
: The main entry point to run the CAPTCHA solver tests. Handles command-line arguments and calls the appropriate test functions.ai_utils.py
: Contains all the functions for interacting with the OpenAI and Gemini APIs. This is where prompts are defined and API calls are made.puzzle_solver.py
: Implements the logic specifically for solving the multi-step slider puzzle CAPTCHA.benchmark.py
: A script for running multiple tests to evaluate the performance and success rate of the different solvers.requirements.txt
: A list of all the Python packages required for the project.screenshots/
: Directory where screenshots of CAPTCHAs are temporarily saved.successful_solves/
: Directory where GIFs of successful solutions are saved.