curllm Documentation v2

Current documentation for curllm, a browser automation tool with multi-provider LLM support.

🚀 Quick Start

# Install
pip install curllm

# Extract data (uses local Ollama by default)
curllm "https://example.com" -d "Extract all links"

# Use cloud provider (auto-detects API key from environment)
CURLLM_LLM_PROVIDER=openai/gpt-4o-mini curllm "https://example.com" -d "Extract products"

🤖 LLM Providers

curllm supports multiple LLM providers via litellm:

| Provider | Format | Environment Variable |
|----------|--------|----------------------|
| Ollama (local) | `ollama/qwen2.5:7b` | - |
| OpenAI | `openai/gpt-4o-mini` | `OPENAI_API_KEY` |
| Anthropic | `anthropic/claude-3-haiku-20240307` | `ANTHROPIC_API_KEY` |
| Gemini | `gemini/gemini-2.0-flash` | `GEMINI_API_KEY` |
| Groq | `groq/llama3-70b-8192` | `GROQ_API_KEY` |
| DeepSeek | `deepseek/deepseek-chat` | `DEEPSEEK_API_KEY` |

from curllm_core import CurllmExecutor, LLMConfig

# Auto-detects API key from OPENAI_API_KEY
executor = CurllmExecutor(LLMConfig(provider="openai/gpt-4o-mini"))

# Or specify explicitly
executor = CurllmExecutor(LLMConfig(
    provider="anthropic/claude-3-haiku-20240307",
    api_token="sk-ant-..."
))
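
Before constructing an executor, it can help to check that the key for the chosen provider is actually set. The sketch below is not part of curllm: `PROVIDER_ENV_VARS`, `split_provider`, and `missing_api_key` are hypothetical helpers that only encode the table above (the `<provider>/<model>` format and the listed environment variables).

```python
import os
from typing import Mapping, Optional, Tuple

# Environment variable per provider prefix, copied from the provider table.
# Ollama runs locally and needs no key. (Hypothetical helper, not curllm API.)
PROVIDER_ENV_VARS = {
    "ollama": None,
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
    "groq": "GROQ_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
}


def split_provider(provider: str) -> Tuple[str, str]:
    """Split 'openai/gpt-4o-mini' into ('openai', 'gpt-4o-mini')."""
    prefix, sep, model = provider.partition("/")
    if not sep or not prefix or not model:
        raise ValueError(f"expected '<provider>/<model>', got {provider!r}")
    return prefix, model


def missing_api_key(
    provider: str, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the name of the unset environment variable, or None if ready."""
    env = os.environ if env is None else env
    prefix, _ = split_provider(provider)
    if prefix not in PROVIDER_ENV_VARS:
        raise ValueError(f"provider {prefix!r} is not in the provider table")
    var = PROVIDER_ENV_VARS[prefix]
    if var is None or env.get(var):
        return None
    return var
```

A caller might run `missing_api_key("openai/gpt-4o-mini")` at startup and print a hint naming the variable to export if the result is not `None`.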

📁 Documentation Structure

docs/v2/
├── architecture/                  # System architecture docs
│   ├── ARCHITECTURE.md            # Core architecture
│   ├── DSL_SYSTEM.md              # 🆕 Strategy-based extraction
│   ├── ATOMIC_QUERY_SYSTEM.md     # DOM Toolkit
│   ├── STREAMWARE.md              # Component system
│   ├── LLM.md                     # LLM integration
│   └── COMPONENTS.md              # Component reference
├── features/                      # Feature documentation
│   ├── FORM_FILLING.md            # Form automation
│   ├── ITERATIVE_EXTRACTOR.md     # Atomic extraction
│   ├── HIERARCHICAL_PLANNER.md    # 3-level LLM optimization
│   └── VISION_FORM_ANALYSIS.md    # Visual form detection
├── guides/                        # User guides
│   ├── Installation.md            # Setup instructions
│   ├── EXAMPLES.md                # Code examples
│   ├── Docker.md                  # Docker deployment
│   └── Troubleshooting.md
└── api/                           # API reference
    ├── API.md                     # REST API
    └── CLI_COMMANDS.md            # CLI reference

🆕 Recent Additions

December 2024

DSL System - Strategy-based extraction with auto-learning
- YAML strategy files for reusable extraction recipes
- SQLite Knowledge Base tracks algorithm success per domain
- Automatic fallback algorithms when primary fails
- 80% reduction in LLM calls through pure JS DOM Toolkit
DOM Toolkit - Pure JavaScript atomic queries
- Zero LLM calls for DOM analysis
- Statistical container detection
- Pattern recognition and selector generation
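
The Knowledge Base and fallback behaviour listed under DSL System can be pictured with a small standalone sketch. This is an illustration under assumptions, not curllm's implementation: the `attempts` table, the function names, and the "rank by observed success rate" rule are all hypothetical. See architecture/DSL_SYSTEM.md for the actual design.

```python
import sqlite3

# Hypothetical schema: one row per extraction attempt.
SCHEMA = """
CREATE TABLE IF NOT EXISTS attempts (
    domain    TEXT NOT NULL,
    algorithm TEXT NOT NULL,
    success   INTEGER NOT NULL  -- 1 = worked, 0 = failed
)
"""


def open_kb(path=":memory:"):
    """Open (or create) the knowledge base and ensure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def record(conn, domain, algorithm, success):
    """Store the outcome of one attempt."""
    conn.execute(
        "INSERT INTO attempts (domain, algorithm, success) VALUES (?, ?, ?)",
        (domain, algorithm, int(success)),
    )
    conn.commit()


def ranked_algorithms(conn, domain, candidates):
    """Order candidates by success rate on this domain, best first.

    Untried algorithms count as rate 0.0 (a simplification); ties keep
    the caller's original order because sorted() is stable.
    """
    rows = conn.execute(
        "SELECT algorithm, AVG(success) FROM attempts "
        "WHERE domain = ? GROUP BY algorithm",
        (domain,),
    ).fetchall()
    rates = dict(rows)
    return sorted(candidates, key=lambda name: -rates.get(name, 0.0))


def extract_with_fallback(conn, domain, algorithms):
    """Try each algorithm in ranked order until one succeeds.

    `algorithms` maps a name to a zero-argument callable that returns a
    result or raises on failure. Every outcome is recorded.
    """
    for name in ranked_algorithms(conn, domain, list(algorithms)):
        try:
            result = algorithms[name]()
        except Exception:
            record(conn, domain, name, False)
            continue
        record(conn, domain, name, True)
        return name, result
    raise RuntimeError(f"all algorithms failed for {domain}")
```

With this shape, an algorithm that failed once on a domain is tried after one that succeeded there, so repeated visits to the same domain reach a working algorithm sooner.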

November 2024

Hierarchical Planner - 3-level LLM optimization
- 87% reduction in token usage
- Interactive detail requesting
- Automatic threshold-based activation
Form Filling Guide - Complete form automation documentation
- Priority-based value handling
- Automatic error detection
- Email validation fallbacks

📂 Code Examples

See the examples/ directory for runnable code:

| Example | Description | Link |
|---------|-------------|------|
| LLM Providers | Use OpenAI, Anthropic, Gemini, Groq | examples/llm-providers/ |
| Product Extraction | Extract product data | examples/extraction/products/ |
| Form Filling | Automate contact forms | examples/forms/contact/ |
| BQL Queries | Browser Query Language | examples/bql/ |
| Streamware | Component pipelines | examples/streamware/ |
| API Clients | Node.js, PHP clients | examples/api-clients/ |

🔗 External Links

📝 Contributing to Documentation

Documentation improvements are welcome! To contribute:

1. Edit the relevant .md file in docs/
2. Ensure navigation links are maintained
3. Test all internal links
4. Submit a pull request

Documentation Standards

Navigation: Every page should have header and footer navigation
Formatting: Use clear headings, code blocks, and examples
Links: Always use relative links for internal documentation
Examples: Include practical, runnable code samples
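
Testing internal links by hand gets tedious as the docs grow. The following is a minimal sketch of an automated check, assuming inline markdown links of the form `[text](target)`. It is not a script shipped with curllm, and `broken_relative_links` is a hypothetical name. It does not skip links inside code blocks or resolve reference-style links.

```python
import re
from pathlib import Path

# Matches inline markdown links: [text](target)
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")
# Matches targets that start with a URL scheme such as https: or mailto:
SCHEME_RE = re.compile(r"^[a-zA-Z][a-zA-Z0-9+.-]*:")


def broken_relative_links(docs_root):
    """Return (file, target) pairs whose relative target does not exist."""
    root = Path(docs_root)
    broken = []
    for md_file in sorted(root.rglob("*.md")):
        text = md_file.read_text(encoding="utf-8")
        for target in LINK_RE.findall(text):
            # Skip external URLs, mail links, and same-page anchors.
            if SCHEME_RE.match(target) or target.startswith("#"):
                continue
            # Drop any #fragment before checking the file on disk.
            path_part = target.split("#", 1)[0]
            if not (md_file.parent / path_part).exists():
                broken.append((md_file.relative_to(root).as_posix(), target))
    return broken
```

Running `broken_relative_links("docs/v2")` before submitting a pull request and requiring an empty list covers the link-testing step for relative links. External URLs would need a separate check.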

💡 Tips

Use your browser's search (Ctrl+F / Cmd+F) to find topics quickly
Check the INDEX for a complete documentation map
Start with Examples if you learn by doing
Refer to Troubleshooting when encountering issues

📚 Documentation Index | ⬆️ Back to Top | Main README
