Commit 576fe9f

Merge pull request #5 from ScrapeGraphAI/pre/beta

Searchscraper Tool

2 parents c224f56 + 374df1a

15 files changed: +336 -280 lines

CHANGELOG.md

Lines changed: 15 additions & 0 deletions

@@ -1,3 +1,18 @@
+## [1.3.0-beta.1](https://github.com/ScrapeGraphAI/langchain-scrapegraph/compare/v1.2.1-beta.1...v1.3.0-beta.1) (2025-02-22)
+
+
+### Features
+
+* searchscraper ([6a96801](https://github.com/ScrapeGraphAI/langchain-scrapegraph/commit/6a968015d9c8f4ce798111850b0f000c3317c467))
+* updated tests searchscraper ([a771564](https://github.com/ScrapeGraphAI/langchain-scrapegraph/commit/a771564838b637f6aef0277e5ca3d723208d6701))
+
+## [1.2.1-beta.1](https://github.com/ScrapeGraphAI/langchain-scrapegraph/compare/v1.2.0...v1.2.1-beta.1) (2025-01-02)
+
+
+### Bug Fixes
+
+* updated docs url ([f7b640c](https://github.com/ScrapeGraphAI/langchain-scrapegraph/commit/f7b640c29d9780a30212acb19b09247b765a41ff))
+
 ## [1.2.0](https://github.com/ScrapeGraphAI/langchain-scrapegraph/compare/v1.1.0...v1.2.0) (2024-12-18)


README.md

Lines changed: 45 additions & 67 deletions

@@ -2,7 +2,7 @@
 
 [![License](https://img.shields.io/badge/License-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 [![Python Support](https://img.shields.io/pypi/pyversions/langchain-scrapegraph.svg)](https://pypi.org/project/langchain-scrapegraph/)
-[![Documentation](https://img.shields.io/badge/Documentation-Latest-green)](https://scrapegraphai.com/documentation)
+[![Documentation](https://img.shields.io/badge/Documentation-Latest-green)](https://docs.scrapegraphai.com/integrations/langchain)
 
 Supercharge your LangChain agents with AI-powered web scraping capabilities. LangChain-ScrapeGraph provides a seamless integration between [LangChain](https://github.com/langchain-ai/langchain) and [ScrapeGraph AI](https://scrapegraphai.com), enabling your agents to extract structured data from websites using natural language.
 
@@ -58,98 +58,76 @@ result = tool.invoke({
 print(result)
 ```
 
-<details>
-<summary>🔍 Using Output Schemas with SmartscraperTool</summary>
-
-You can define the structure of the output using Pydantic models:
+### 🌐 SearchscraperTool
+Search and extract structured information from the web using natural language prompts.
 
 ```python
-from typing import List
-from pydantic import BaseModel, Field
-from langchain_scrapegraph.tools import SmartScraperTool
+from langchain_scrapegraph.tools import SearchScraperTool
 
-class WebsiteInfo(BaseModel):
-    title: str = Field(description="The main title of the webpage")
-    description: str = Field(description="The main description or first paragraph")
-    urls: List[str] = Field(description="The URLs inside the webpage")
-
-# Initialize with schema
-tool = SmartScraperTool(llm_output_schema=WebsiteInfo)
+# Initialize the tool (uses SGAI_API_KEY from environment)
+tool = SearchScraperTool()
 
-# The output will conform to the WebsiteInfo schema
+# Search and extract information using natural language
 result = tool.invoke({
-    "website_url": "https://www.example.com",
-    "user_prompt": "Extract the website information"
+    "user_prompt": "What are the key features and pricing of ChatGPT Plus?"
 })
 
 print(result)
 # {
-#     "title": "Example Domain",
-#     "description": "This domain is for use in illustrative examples...",
-#     "urls": ["https://www.iana.org/domains/example"]
+#     "product": {
+#         "name": "ChatGPT Plus",
+#         "description": "Premium version of ChatGPT..."
+#     },
+#     "features": [...],
+#     "pricing": {...},
+#     "reference_urls": [
+#         "https://openai.com/chatgpt",
+#         ...
+#     ]
# }
 ```
-</details>
-
-### 💻 LocalscraperTool
-Extract information from HTML content using AI.
-
-```python
-from langchain_scrapegraph.tools import LocalScraperTool
-
-tool = LocalScraperTool()
-result = tool.invoke({
-    "user_prompt": "Extract all contact information",
-    "website_html": "<html>...</html>"
-})
-
-print(result)
-```
 
 <details>
-<summary>🔍 Using Output Schemas with LocalscraperTool</summary>
+<summary>🔍 Using Output Schemas with SearchscraperTool</summary>
 
 You can define the structure of the output using Pydantic models:
 
 ```python
-from typing import Optional
+from typing import Any, Dict, List
 from pydantic import BaseModel, Field
-from langchain_scrapegraph.tools import LocalScraperTool
+from langchain_scrapegraph.tools import SearchScraperTool
 
-class CompanyInfo(BaseModel):
-    name: str = Field(description="The company name")
-    description: str = Field(description="The company description")
-    email: Optional[str] = Field(description="Contact email if available")
-    phone: Optional[str] = Field(description="Contact phone if available")
+class ProductInfo(BaseModel):
+    name: str = Field(description="Product name")
+    features: List[str] = Field(description="List of product features")
+    pricing: Dict[str, Any] = Field(description="Pricing information")
+    reference_urls: List[str] = Field(description="Source URLs for the information")
 
 # Initialize with schema
-tool = LocalScraperTool(llm_output_schema=CompanyInfo)
-
-html_content = """
-<html>
-    <body>
-        <h1>TechCorp Solutions</h1>
-        <p>We are a leading AI technology company.</p>
-        <div class="contact">
-            <p>Email: [email protected]</p>
-            <p>Phone: (555) 123-4567</p>
-        </div>
-    </body>
-</html>
-"""
-
-# The output will conform to the CompanyInfo schema
+tool = SearchScraperTool(llm_output_schema=ProductInfo)
+
+# The output will conform to the ProductInfo schema
 result = tool.invoke({
-    "website_html": html_content,
-    "user_prompt": "Extract the company information"
+    "user_prompt": "What are the key features and pricing of ChatGPT Plus?"
 })
 
 print(result)
 # {
-#     "name": "TechCorp Solutions",
-#     "description": "We are a leading AI technology company.",
-#     "email": "[email protected]",
-#     "phone": "(555) 123-4567"
+#     "name": "ChatGPT Plus",
+#     "features": [
+#         "GPT-4 access",
+#         "Faster response speed",
+#         ...
+#     ],
+#     "pricing": {
+#         "amount": 20,
+#         "currency": "USD",
+#         "period": "monthly"
+#     },
+#     "reference_urls": [
+#         "https://openai.com/chatgpt",
+#         ...
+#     ]
 # }
 ```
 </details>
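The schema-constrained output shown in the README section above can be sanity-checked without calling the API. Below is a minimal, stdlib-only sketch; the field names and sample values are copied from the README's example output, and `validate_product_info` is a hypothetical helper, not part of the library:

```python
from typing import Any, Dict

# Expected top-level shape of the ProductInfo result from the README example
REQUIRED_FIELDS = {
    "name": str,
    "features": list,
    "pricing": dict,
    "reference_urls": list,
}

def validate_product_info(result: Dict[str, Any]) -> bool:
    """Return True when every ProductInfo field is present with the right type."""
    return all(
        key in result and isinstance(result[key], expected)
        for key, expected in REQUIRED_FIELDS.items()
    )

# Sample mirroring the README's commented output
sample = {
    "name": "ChatGPT Plus",
    "features": ["GPT-4 access", "Faster response speed"],
    "pricing": {"amount": 20, "currency": "USD", "period": "monthly"},
    "reference_urls": ["https://openai.com/chatgpt"],
}
print(validate_product_info(sample))  # → True
```

A check like this is mainly useful in tests, since `llm_output_schema=ProductInfo` is meant to guarantee the shape at the tool level.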

examples/agent_example.py

Lines changed: 2 additions & 2 deletions

@@ -11,7 +11,7 @@
 
 from langchain_scrapegraph.tools import (
     GetCreditsTool,
-    LocalScraperTool,
+    SearchScraperTool,
     SmartScraperTool,
 )
 
@@ -20,8 +20,8 @@
 # Initialize the tools
 tools = [
     SmartScraperTool(),
-    LocalScraperTool(),
     GetCreditsTool(),
+    SearchScraperTool(),
 ]
 
 # Create the prompt template
examples/localscraper_tool.py

Lines changed: 0 additions & 28 deletions
This file was deleted.

examples/localscraper_tool_schema.py

Lines changed: 0 additions & 38 deletions
This file was deleted.

examples/searchscraper_tool.py

Lines changed: 16 additions & 0 deletions

@@ -0,0 +1,16 @@
+from scrapegraph_py.logger import sgai_logger
+
+from langchain_scrapegraph.tools import SearchScraperTool
+
+sgai_logger.set_logging(level="INFO")
+
+# Will automatically get SGAI_API_KEY from environment
+tool = SearchScraperTool()
+
+# Example prompt
+user_prompt = "What are the key features and pricing of ChatGPT Plus?"
+
+# Use the tool
+result = tool.invoke({"user_prompt": user_prompt})
+
+print("\nResult:", result)

examples/searchscraper_tool_schema.py

Lines changed: 41 additions & 0 deletions

@@ -0,0 +1,41 @@
+from typing import Dict, List
+
+from pydantic import BaseModel, Field
+from scrapegraph_py.logger import sgai_logger
+
+from langchain_scrapegraph.tools import SearchScraperTool
+
+
+class Feature(BaseModel):
+    name: str = Field(description="Name of the feature")
+    description: str = Field(description="Description of the feature")
+
+
+class PricingPlan(BaseModel):
+    name: str = Field(description="Name of the pricing plan")
+    price: Dict[str, str] = Field(
+        description="Price details including amount, currency, and period"
+    )
+    features: List[str] = Field(description="List of features included in the plan")
+
+
+class ProductInfo(BaseModel):
+    name: str = Field(description="Name of the product")
+    description: str = Field(description="Description of the product")
+    features: List[Feature] = Field(description="List of product features")
+    pricing: Dict[str, List[PricingPlan]] = Field(description="Pricing information")
+    reference_urls: List[str] = Field(description="Source URLs for the information")
+
+
+sgai_logger.set_logging(level="INFO")
+
+# Initialize with Pydantic model class
+tool = SearchScraperTool(llm_output_schema=ProductInfo)
+
+# Example prompt
+user_prompt = "What are the key features and pricing of ChatGPT Plus?"
+
+# Use the tool - output will conform to ProductInfo schema
+result = tool.invoke({"user_prompt": user_prompt})
+
+print("\nResult:", result)
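For readers who want to see how the nested schema in this example fits together without Pydantic, the same structure can be sketched with standard-library dataclasses (a hypothetical equivalent for illustration only; the tool itself is initialized with a Pydantic model class):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Feature:
    name: str
    description: str

@dataclass
class PricingPlan:
    name: str
    price: Dict[str, str]  # amount, currency, period
    features: List[str] = field(default_factory=list)

@dataclass
class ProductInfo:
    name: str
    description: str
    features: List[Feature] = field(default_factory=list)
    pricing: Dict[str, List[PricingPlan]] = field(default_factory=dict)
    reference_urls: List[str] = field(default_factory=list)

# Build a sample instance to show the nesting
info = ProductInfo(
    name="ChatGPT Plus",
    description="Premium version of ChatGPT",
    features=[Feature("GPT-4 access", "Access to the GPT-4 model")],
    pricing={"plans": [PricingPlan("Plus", {"amount": "20", "currency": "USD", "period": "monthly"})]},
    reference_urls=["https://openai.com/chatgpt"],
)
print(info.features[0].name)  # → GPT-4 access
```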

examples/smartscraper_tool_schema.py

Lines changed: 25 additions & 4 deletions

@@ -17,10 +17,31 @@ class WebsiteInfo(BaseModel):
 # Initialize with Pydantic model class
 tool = SmartScraperTool(llm_output_schema=WebsiteInfo)
 
-# Example website and prompt
+# Example 1: Using website URL
 website_url = "https://www.example.com"
 user_prompt = "Extract info about the website"
 
-# Use the tool - output will conform to WebsiteInfo schema
-result = tool.invoke({"website_url": website_url, "user_prompt": user_prompt})
-print(result)
+# Use the tool with URL
+result_url = tool.invoke({"website_url": website_url, "user_prompt": user_prompt})
+print("\nResult from URL:", result_url)
+
+# Example 2: Using HTML content directly
+html_content = """
+<html>
+    <body>
+        <h1>Example Domain</h1>
+        <p>This domain is for use in illustrative examples.</p>
+        <a href="https://www.iana.org/domains/example">More information...</a>
+    </body>
+</html>
+"""
+
+# Use the tool with HTML content
+result_html = tool.invoke(
+    {
+        "website_url": website_url,  # Still required but will be overridden
+        "website_html": html_content,
+        "user_prompt": user_prompt,
+    }
+)
+print("\nResult from HTML:", result_html)
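The comment `# Still required but will be overridden` in the example above describes a precedence rule: when both inputs are supplied, the HTML content wins. A tiny hypothetical helper (not the library's code) makes the rule explicit:

```python
from typing import Optional, Tuple

def pick_source(website_url: str, website_html: Optional[str] = None) -> Tuple[str, str]:
    """Return ("html", content) when HTML is supplied, else ("url", url).

    Hypothetical helper that only illustrates the precedence described in
    the example's comment; the real tool resolves this internally.
    """
    if website_html:
        return ("html", website_html)
    return ("url", website_url)

print(pick_source("https://www.example.com"))  # → ('url', 'https://www.example.com')
print(pick_source("https://www.example.com", "<p>x</p>")[0])  # → html
```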
langchain_scrapegraph/tools/__init__.py

Lines changed: 2 additions & 2 deletions

@@ -1,6 +1,6 @@
 from .credits import GetCreditsTool
-from .localscraper import LocalScraperTool
 from .markdownify import MarkdownifyTool
+from .searchscraper import SearchScraperTool
 from .smartscraper import SmartScraperTool
 
-__all__ = ["SmartScraperTool", "GetCreditsTool", "MarkdownifyTool", "LocalScraperTool"]
+__all__ = ["SmartScraperTool", "GetCreditsTool", "MarkdownifyTool", "SearchScraperTool"]
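Renames like the one above are easy to get wrong in `__all__`. A standalone sketch (using a stub module rather than the real package) that flags names exported but never defined:

```python
import types

# Stub module standing in for langchain_scrapegraph.tools
mod = types.ModuleType("tools_stub")
mod.SmartScraperTool = object
mod.GetCreditsTool = object
mod.MarkdownifyTool = object
mod.SearchScraperTool = object
# Export list mirroring the updated __init__.py in this commit
mod.__all__ = ["SmartScraperTool", "GetCreditsTool", "MarkdownifyTool", "SearchScraperTool"]

# Any name listed in __all__ but missing from the module is a bug
undefined = [name for name in mod.__all__ if not hasattr(mod, name)]
print(undefined)  # → []
```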
