A Python-based web scraper that uses Selenium and OpenPyXL to extract data (text, images, and links) from websites listed in an Excel file. The scraper automatically reads the URLs, scrapes the content, and stores it in a new Excel file.
Make sure you have the following installed:
- Python 3.x
- Selenium for browser automation
- WebDriver Manager for managing the Chrome WebDriver
- OpenPyXL for handling Excel files
pip install selenium webdriver-manager openpyxl

Clone the repository or download the script files to your local machine.
Create an Excel file named `LinksSeleniumWEBSC.xlsx` and place it in your working directory, e.g., `C:\Users\ACER\Desktop\Web Scraper selenium\`.
- The file should contain a sheet named "Sheet1".
- List the website URLs in column A, starting from cell A2 (then A3, A4, and so on).
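If you prefer to create the input file programmatically, here is a minimal sketch using OpenPyXL; the file name, sheet name, and cell layout match what is described above, and the URLs are placeholders:

```python
from openpyxl import Workbook

# Build the input workbook expected by the scraper.
wb = Workbook()
ws = wb.active
ws.title = "Sheet1"               # the scraper reads from "Sheet1"
ws["A1"] = "URL"                  # header row
ws["A2"] = "https://example.com"  # URLs start in cell A2
ws["A3"] = "https://another.com"
wb.save("LinksSeleniumWEBSC.xlsx")
```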
Run the Script:
- Open a terminal or command prompt.
- Navigate to the folder containing the script.
- Execute the script using Python:
python main.py
What Happens Next:
- The script will read all the URLs from the input file (`LinksSeleniumWEBSC.xlsx`).
- It will scrape the text, images, and links from each website.
- A new Excel file called `link2.xlsx` will be generated with the scraped data.

Example input (`LinksSeleniumWEBSC.xlsx`):
| URL |
|---|
| https://example.com |
| https://another.com |
Example output (`link2.xlsx`):

| Text | Images | Links |
|---|---|---|
| Example Domain | example image link | example page link |
| Another Example | another image link | another page link |
The data is organized into three columns:
- Text: The extracted text from the webpage.
- Images: Links to all the images found on the page.
- Links: All internal and external links found on the page.
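To inspect the generated file programmatically, a small reader sketch (assuming the three-column layout above, with a header in row 1) could look like this:

```python
from openpyxl import load_workbook

# Print every scraped row from the output workbook.
wb = load_workbook("link2.xlsx")
for text, images, links in wb.active.iter_rows(min_row=2, values_only=True):
    print(text, images, links)
```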
The page-scraping function:
- Loads the provided URL using Selenium.
- Waits for the page to load completely.
- Scrapes the text, images, and links from the page.
- Returns a dictionary with the scraped data.
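A minimal sketch of such a function, using Selenium's explicit-wait API; the name `scrape_page` and the 10-second timeout are illustrative, not necessarily what `main.py` uses:

```python
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def scrape_page(driver, url):
    """Load a URL and collect its text, image sources, and link targets."""
    driver.get(url)
    # Wait (up to 10 s) until the <body> element is present, i.e. the page has loaded.
    WebDriverWait(driver, 10).until(
        EC.presence_of_element_located((By.TAG_NAME, "body"))
    )
    text = driver.find_element(By.TAG_NAME, "body").text
    images = [img.get_attribute("src")
              for img in driver.find_elements(By.TAG_NAME, "img")
              if img.get_attribute("src")]
    links = [a.get_attribute("href")
             for a in driver.find_elements(By.TAG_NAME, "a")
             if a.get_attribute("href")]
    return {"text": text, "images": images, "links": links}
```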
The main function:
- Configures the Selenium WebDriver to run in headless mode (no UI).
- Reads URLs from the input Excel file.
- Scrapes data from each URL and stores the results.
- Saves the data into a new Excel file (`link2.xlsx`).
- The script handles errors caused by missing or unreadable Excel files and prints an error message when one occurs.
- If scraping a particular URL fails, the script will continue with the next URL, ensuring the process doesn't halt.
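Putting these pieces together, a sketch of such a main routine might look like this; it reuses the hypothetical `scrape_page` function shown earlier, and all names and details are illustrative rather than copied from `main.py`:

```python
from openpyxl import Workbook, load_workbook
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

def main():
    # Configure Chrome to run headless (no visible browser window).
    options = Options()
    options.add_argument("--headless=new")
    driver = webdriver.Chrome(
        service=Service(ChromeDriverManager().install()), options=options
    )

    try:
        # Read URLs from column A of "Sheet1", starting at row 2.
        in_wb = load_workbook("LinksSeleniumWEBSC.xlsx")
        urls = [row[0]
                for row in in_wb["Sheet1"].iter_rows(min_row=2, values_only=True)
                if row[0]]
    except Exception as exc:
        print(f"Could not read the input file: {exc}")
        driver.quit()
        return

    out_wb = Workbook()
    out_ws = out_wb.active
    out_ws.append(["Text", "Images", "Links"])
    for url in urls:
        try:
            data = scrape_page(driver, url)  # sketch shown earlier
            out_ws.append([data["text"],
                           "\n".join(data["images"]),
                           "\n".join(data["links"])])
        except Exception as exc:
            # Log the failure and continue with the next URL.
            print(f"Failed to scrape {url}: {exc}")

    driver.quit()
    out_wb.save("link2.xlsx")

if __name__ == "__main__":
    main()
```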
Ensure that the `LinksSeleniumWEBSC.xlsx` file has a structure like this:
| URL |
|---|
| https://example.com |
| https://another.com |
- Dynamic Websites: The scraper handles basic static pages, and because Selenium drives a real browser, it also captures content that a website renders dynamically with JavaScript.
- Customization: Feel free to extend the functionality to scrape additional data such as headings, tables, or specific sections of a webpage (see the sketch after this list).
- Contributing: Feel free to fork this project, submit pull requests, or open issues if you have any suggestions for improvement or bug reports.
- Questions?: If you encounter any problems or have any questions, don't hesitate to reach out!
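As a starting point for the customization note above, a hypothetical extension that collects the headings of the current page might look like:

```python
from selenium.webdriver.common.by import By

def scrape_headings(driver):
    # Hypothetical extension: collect all <h1>-<h3> headings from the current page.
    return [h.text for h in driver.find_elements(By.CSS_SELECTOR, "h1, h2, h3")]
```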