Data Glimpse is a Streamlit application that automates web scraping using AutoScraper and BeautifulSoup. It simplifies extracting data from websites, including e-commerce platforms such as Amazon and Flipkart.
- Automates the web scraping process.
- Supports scraping data from general websites and e-commerce platforms.
- Uses AutoScraper for general web scraping and BeautifulSoup for scraping product pages.
- Enables users to input website links and specify tag details for scraping.
To run Data Glimpse, ensure you have Python 3.x installed on your system. Then, follow these steps:
- Clone this repository: `git clone https://github.com/dikshant182004/Data-Glimpse.git`
- Navigate to the project directory: `cd Data-Glimpse`
- Install the required dependencies: `pip install -r requirements.txt`
- Run the Streamlit application: `streamlit run main.py`
- Input the website link you want to scrape data from.
- If using for general web scraping:
  - Provide an example of the data you want to scrape.
- If scraping e-commerce websites like Amazon:
  - Specify the tag details required for BeautifulSoup.
  - Ensure the first tag is a `<div>` tag with a class or ID attribute.
  - Other tags can be `<span>` and `<a>` with class and ID attributes.
- After specifying the required details, click on the "Scrape Data" button.
- View the scraped data in the form of a DataFrame.
- Optionally, save the scraped data to your local computer.
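
To make the tag details more concrete, here is a minimal sketch of how a container `<div>` plus `<span>` and `<a>` tags might be turned into rows with BeautifulSoup. It is not the application's actual code: the `scrape_products` function, the sample markup, and the class names (`product-card`, `product-link`, `price`) are all hypothetical and chosen only for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a product listing page.
HTML = """
<div class="product-card">
  <a class="product-link" href="/item/1">Widget</a>
  <span class="price">$9.99</span>
</div>
<div class="product-card">
  <a class="product-link" href="/item/2">Gadget</a>
  <span class="price">$19.50</span>
</div>
"""


def scrape_products(html, container_class, fields):
    """Return one dict per container <div>.

    `fields` maps a column name to a (tag, class) pair, mirroring the
    "first tag is a <div>, other tags are <span> or <a>" rule above.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for container in soup.find_all("div", class_=container_class):
        row = {}
        for column, (tag, css_class) in fields.items():
            match = container.find(tag, class_=css_class)
            # Missing elements become None so every row has the same columns.
            row[column] = match.get_text(strip=True) if match else None
        rows.append(row)
    return rows


rows = scrape_products(
    HTML,
    container_class="product-card",
    fields={"name": ("a", "product-link"), "price": ("span", "price")},
)
print(rows)
```

Each dictionary in `rows` corresponds to one container, so a list like this can be passed directly to `pandas.DataFrame(rows)` for display.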
Suppose you want to scrape product information from Amazon. Here's how you can use Data Glimpse:
- Input the Amazon product page link.
- Provide tag details for BeautifulSoup:
  - Container class (should be a `<div>` tag with the class or id information).
  - Other relevant tags like `<span>` and `<a>` with class and ID attributes.
- Click on "Scrape Data" to fetch and display the product information.
- Optionally, save the scraped data as a CSV file on your local machine.
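
For readers curious what the CSV export step amounts to, the sketch below serializes scraped rows with Python's standard `csv` module. How Data Glimpse itself writes the file is not stated here, so treat this as an illustration only; the `rows_to_csv` helper, the sample rows, and the `products.csv` filename are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, columns):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical rows, shaped like the output of a product scrape.
sample_rows = [
    {"name": "Widget", "price": "$9.99"},
    {"name": "Gadget", "price": "$19.50"},
]

csv_text = rows_to_csv(sample_rows, ["name", "price"])
print(csv_text)

# To save to disk instead (filename is an assumption):
# with open("products.csv", "w", newline="", encoding="utf-8") as f:
#     f.write(csv_text)
```

If the rows are already in a pandas DataFrame, `DataFrame.to_csv("products.csv", index=False)` produces a comparable file.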
For any queries or assistance, feel free to contact [email protected].