
BASHkrawler

1. Description

BASHkrawler is a Bash web crawler that finds URLs on the homepage of a target website domain by parsing its HTML source code and the JavaScript links found there. Optionally, a pattern word can be passed as an argument to filter the extracted URLs.
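The core idea of HTML-based URL extraction can be sketched in a few lines of Bash. The snippet below is a minimal illustration, not the script's actual implementation; the sample HTML is a hypothetical stand-in for a homepage fetched with curl.

```shell
#!/usr/bin/env bash
# Hypothetical sample standing in for a downloaded homepage.
html='<a href="https://example.com/page1">one</a>
<script src="https://example.com/app.js"></script>
<a href="https://example.com/page2">two</a>'

# Pull every absolute URL out of the markup and deduplicate it.
printf '%s\n' "$html" | grep -Eo 'https?://[^"]+' | sort -u
```

In the real tool the `html` variable would come from something like `curl -s "$domain"`; the grep regex simply stops at the closing quote of each attribute.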


2. Install

➜ git clone https://github.com/torsh4rk/BASHkrawler.git
➜ cd BASHkrawler/ && chmod +x bashkrawler.sh
➜ ./bashkrawler.sh

3. Example Usage

Fig.1 - Displaying banner


3.1. HTML parsing without using a pattern word to match

Fig.2 - Choosing option 1 to find all URLs at the target domain www.nasa.gov via HTML parsing

Fig.3 - Finding all URLs at the target domain www.nasa.gov via HTML parsing


3.2. Finding and parsing all JS links at the target domain without using a pattern word to match

Fig.4 - Choosing option 2 to find all JS links at the target domain www.nasa.gov and extract all URLs from the found JS links
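Option 2 is a two-step process: collect the script sources referenced by the homepage, then fetch each one and extract URLs from its body. A minimal sketch of that idea, using a local sample page rather than a live domain (all names here are illustrative, not the script's code):

```shell
#!/usr/bin/env bash
# Hypothetical sample page with two script references.
page='<script src="https://cdn.example.com/main.js"></script>
<script src="/static/app.js"></script>'

# Step 1: collect the JS sources referenced by the homepage.
js_links=$(printf '%s\n' "$page" | grep -Eo 'src="[^"]+\.js"' | cut -d'"' -f2)
printf '%s\n' "$js_links"

# Step 2 (sketch): each link would then be fetched, e.g. with
#   curl -s "$link" | grep -Eo 'https?://[^"]+'
# to pull any URLs embedded in the script body.
```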


3.3. Full web crawling by running options 1 and 2 without using a pattern word to match

Fig.5 - Choosing option 3 to find all URLs at the target domain www.nasa.gov via options 1 and 2 without using a pattern word to match

Fig.6 - Finishing the full web crawl at the target domain www.nasa.gov
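Conceptually, the full crawl merges the URLs found in the HTML with the URLs found inside the linked JS files and deduplicates the result. A sketch under that assumption, with local stand-ins for the two fetched lists:

```shell
#!/usr/bin/env bash
# Hypothetical URL lists: one from HTML parsing, one from JS parsing.
html_urls='https://example.com/a
https://example.com/b'
js_urls='https://example.com/b
https://example.com/c'

# Merge both result sets and drop duplicates.
printf '%s\n%s\n' "$html_urls" "$js_urls" | sort -u
```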


3.4. Web crawling using a pattern word to match

Fig.7 - Starting a web crawl at a target domain to find all URLs containing the word ".nasa"

Fig.8 - Choosing option 3 to find all URLs containing the word "nasa" at the target domain www.nasa.gov via options 1 and 2

Fig.9 - Finishing the full web crawl at the target domain www.nasa.gov using the word ".nasa"
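The pattern-word filter amounts to keeping only the extracted URLs that contain the caller-supplied string. A sketch of that filtering step, with an illustrative URL list (not output from the real tool):

```shell
#!/usr/bin/env bash
# Hypothetical list of URLs collected by the crawler.
urls='https://www.nasa.gov/news
https://example.com/other
https://images.nasa.gov/gallery'
pattern=".nasa"

# Keep only URLs containing the pattern word; -F treats it as a fixed
# string, so the leading dot is matched literally rather than as regex.
printf '%s\n' "$urls" | grep -F "$pattern"
```

Using `grep -F` here is a deliberate choice: a user-supplied word like ".nasa" should not be interpreted as a regular expression.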


4. References

https://medium.datadriveninvestor.com/what-is-a-web-crawler-and-how-does-it-work-b9e9c2e4c35d