Firecrawl - Open-WebUI-Pipelines is a collection of Python functions that extend Open WebUI with additional pipelines. These pipelines let users interact with the Firecrawl API and customize the Open WebUI experience.
- Firecrawl Integration: Web crawling, scraping, and URL mapping through the Firecrawl API.
- Flexible Configuration: Use environment variables to adjust pipeline settings dynamically.
To use these pipelines, ensure the following:
- An Active Open WebUI Instance: You must have Open WebUI installed and running.
- Firecrawl API Access: You'll need to create an account and obtain an API key from Firecrawl.
- Admin Access: To install pipelines in Open WebUI, you must have administrator privileges.
To install and configure pipelines in Open WebUI, follow these steps:
- Ensure Admin Access:
  - You must be an admin in Open WebUI to install pipelines.
- Access Admin Settings:
  - Navigate to the Admin Settings section in Open WebUI.
- Go to the Functions Tab:
  - Open the Functions tab in the admin panel.
- Create a New Function:
  - Click Add New Function.
  - Copy the pipeline code from this repository and paste it into the function editor.
- Set Environment Variables (if required):
  - The Firecrawl pipelines require an API key, supplied via environment variables.
  - Set WEBUI_SECRET_KEY for secure encryption of sensitive API keys.
- Save and Activate:
  - Save the function, and it will be available for use within Open WebUI.
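As an illustration of the environment-variable step, here is a minimal sketch of how a pipeline might read its settings at startup. It is not the repository's code: the variable names `FIRECRAWL_API_KEY` and `FIRECRAWL_API_BASE_URL`, the default base URL, and the `FirecrawlSettings` class are all assumptions made for this example, so check the pipeline source for the names it actually reads.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class FirecrawlSettings:
    """Hypothetical settings container; not part of Open WebUI or Firecrawl."""

    api_key: str
    base_url: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> FirecrawlSettings:
    """Read settings from a mapping (os.environ by default) and fail early if the key is missing."""
    env = os.environ if env is None else env

    # Assumed variable name; surrounding whitespace is stripped defensively.
    api_key = env.get("FIRECRAWL_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("FIRECRAWL_API_KEY is not set")

    # Assumed optional override, e.g. for a self-hosted instance.
    base_url = env.get("FIRECRAWL_API_BASE_URL", "https://api.firecrawl.dev").rstrip("/")
    return FirecrawlSettings(api_key=api_key, base_url=base_url)
```

Passing the environment in as a mapping keeps the function easy to test, and raising at load time surfaces a missing key before the first request is attempted.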
Firecrawl is an API service that takes a URL, crawls it, and converts the content into clean markdown. It crawls all accessible subpages and returns clean, LLM-ready data.
The following pipelines integrate Firecrawl with Open WebUI:
- Crawls websites to extract content from multiple pages
- Configurable crawl depth and URL limits
- Support for path inclusion/exclusion patterns
- Sitemap integration for efficient crawling
- Handles both internal and external links
- Extracts content from specific URLs
- Multiple output formats (markdown, HTML, text)
- Main content extraction to filter out navigation, ads, etc.
- Custom tag inclusion/exclusion
- Mobile device emulation
- Ad blocking capabilities
- Discovers all URLs on a website
- Search functionality to find specific URLs
- Sitemap integration
- Subdomain discovery
- URL pattern matching
- Structured data extraction from websites using a specified prompt and schema
- Supports multiple URLs
- Options to enable web search, ignore sitemaps, include subdomains, and show sources
- Customizable scrape options
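The options listed above map naturally onto JSON request bodies. The sketch below shows one way such bodies might be assembled for crawl, scrape, map, and extract requests. The function names are hypothetical, and the JSON field names (`maxDepth`, `includePaths`, `onlyMainContent`, `blockAds`, `enableWebSearch`, and so on) reflect one reading of the Firecrawl v1 API rather than anything stated in this README, so verify them against the Firecrawl API reference before relying on them.

```python
import json
from typing import Any, Dict, List, Optional, Sequence


def _drop_none(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Remove unset options so only explicit choices are sent."""
    return {key: value for key, value in payload.items() if value is not None}


def crawl_payload(url: str, limit: int = 10, max_depth: int = 2,
                  include_paths: Optional[List[str]] = None,
                  exclude_paths: Optional[List[str]] = None,
                  ignore_sitemap: bool = False,
                  allow_external_links: bool = False) -> Dict[str, Any]:
    """Body for a multi-page crawl (field names are assumptions)."""
    return _drop_none({
        "url": url,
        "limit": limit,
        "maxDepth": max_depth,
        "includePaths": include_paths,
        "excludePaths": exclude_paths,
        "ignoreSitemap": ignore_sitemap,
        "allowExternalLinks": allow_external_links,
    })


def scrape_payload(url: str, formats: Sequence[str] = ("markdown",),
                   only_main_content: bool = True,
                   include_tags: Optional[List[str]] = None,
                   exclude_tags: Optional[List[str]] = None,
                   mobile: bool = False, block_ads: bool = True) -> Dict[str, Any]:
    """Body for a single-URL scrape (field names are assumptions)."""
    return _drop_none({
        "url": url,
        "formats": list(formats),
        "onlyMainContent": only_main_content,
        "includeTags": include_tags,
        "excludeTags": exclude_tags,
        "mobile": mobile,
        "blockAds": block_ads,
    })


def map_payload(url: str, search: Optional[str] = None,
                ignore_sitemap: bool = False,
                include_subdomains: bool = False) -> Dict[str, Any]:
    """Body for URL discovery (field names are assumptions)."""
    return _drop_none({
        "url": url,
        "search": search,
        "ignoreSitemap": ignore_sitemap,
        "includeSubdomains": include_subdomains,
    })


def extract_payload(urls: List[str], prompt: str, schema: Dict[str, Any],
                    enable_web_search: bool = False, ignore_sitemap: bool = False,
                    include_subdomains: bool = False, show_sources: bool = False,
                    scrape_options: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Body for structured extraction (field names are assumptions)."""
    return _drop_none({
        "urls": list(urls),
        "prompt": prompt,
        "schema": schema,
        "enableWebSearch": enable_web_search,
        "ignoreSitemap": ignore_sitemap,
        "includeSubdomains": include_subdomains,
        "showSources": show_sources,
        "scrapeOptions": scrape_options,
    })


if __name__ == "__main__":
    # Print one example body; nothing is sent over the network.
    print(json.dumps(crawl_payload("https://example.com", include_paths=["/blog/.*"]), indent=2))
```

Keeping payload construction separate from the HTTP call makes each option easy to inspect and test without an API key.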
To use the Firecrawl pipelines, you need to:
- Create a Firecrawl account at https://www.firecrawl.dev/app/api-keys
- Generate an API key from the dashboard
- Configure the pipeline with your API key
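To make the last step concrete, the sketch below prepares an authenticated request using only the Python standard library. It assumes the API accepts the key as a Bearer token in the `Authorization` header and that the base URL is `https://api.firecrawl.dev/v1`; both are assumptions to confirm against the Firecrawl documentation, and `build_request` is a hypothetical helper, not part of this repository.

```python
import json
import urllib.request
from typing import Any, Dict


def build_request(api_key: str, endpoint: str, payload: Dict[str, Any],
                  base_url: str = "https://api.firecrawl.dev/v1") -> urllib.request.Request:
    """Prepare (but do not send) a JSON POST request carrying the API key."""
    if not api_key:
        raise ValueError("api_key must not be empty")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

The returned object can be passed to `urllib.request.urlopen` to send it. Building the request separately keeps the key handling in one place and lets you inspect the URL and headers before any network traffic occurs.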
For detailed documentation on the Firecrawl API, visit https://docs.firecrawl.dev/api-reference/introduction
For support with Firecrawl, contact [email protected]
🔗 Firecrawl GitHub Repository - Give it a star ⭐️ to support the project!
Contributions are welcome! We appreciate your interest in improving these pipelines. You can contribute in several ways:
- Open an Issue: Have suggestions, found a bug, or want to request a feature? Create an issue to let us know.
- Submit a Pull Request: Have improvements or fixes ready? Submit a PR with your changes.
- Share Feedback: Your insights on how to make these pipelines better are valuable to us.
We review all contributions and will work with you to get them integrated. Thank you for helping make this project better!
This project is licensed under the MIT License - see the LICENSE file for details. 📄
If you have any questions, suggestions, or need assistance, please open an issue or join our Discord community to connect with us! 🤝
For Firecrawl-specific support, contact [email protected].
Follow us on social media to stay updated with the latest news and features:
- 𝕏: @firecrawl_dev
- LinkedIn: Firecrawl