
BASHkrawler

1. Description

BASHkrawler is a Bash web crawler that finds URLs on the homepage of a target website domain by parsing its HTML source code and the JavaScript links found there. Optionally, a pattern word can be passed as an argument to filter the extracted URLs.
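The core idea of HTML-based URL extraction can be sketched in a few lines of Bash. The snippet below is a minimal illustration, not the script's actual implementation; the sample HTML is a hypothetical stand-in for a homepage fetched with curl.

```shell
#!/usr/bin/env bash
# Hypothetical sample standing in for a downloaded homepage.
html='<a href="https://example.com/page1">one</a>
<script src="https://example.com/app.js"></script>
<a href="https://example.com/page2">two</a>'

# Pull every absolute URL out of the markup and deduplicate it.
printf '%s\n' "$html" | grep -Eo 'https?://[^"]+' | sort -u
```

In the real tool the `html` variable would come from something like `curl -s "$domain"`; the grep regex simply stops at the closing quote of each attribute.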


2. Install

➜ git clone https://github.com/torsh4rk/BASHkrawler.git
➜ cd BASHkrawler/ && chmod +x bashkrawler.sh
➜ ./bashkrawler.sh

3. Example Usage

Fig.1 - Displaying banner


3.1. HTML parsing without using a pattern word to match

Fig.2 - Choosing option 1 to find all URLs at the target domain www.nasa.gov via HTML parsing

Fig.3 - Finding all URLs at the target domain www.nasa.gov via HTML parsing


3.2. Finding and parsing all JS links at the target domain without using a pattern word to match

Fig.4 - Choosing option 2 to find all JS links at the target domain www.nasa.gov and extract all URLs from the found JS links
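Option 2 is a two-step process: collect the script sources referenced by the homepage, then fetch each one and extract URLs from its body. A minimal sketch of that idea, using a local sample page rather than a live domain (all names here are illustrative, not the script's code):

```shell
#!/usr/bin/env bash
# Hypothetical sample page with two script references.
page='<script src="https://cdn.example.com/main.js"></script>
<script src="/static/app.js"></script>'

# Step 1: collect the JS sources referenced by the homepage.
js_links=$(printf '%s\n' "$page" | grep -Eo 'src="[^"]+\.js"' | cut -d'"' -f2)
printf '%s\n' "$js_links"

# Step 2 (sketch): each link would then be fetched, e.g. with
#   curl -s "$link" | grep -Eo 'https?://[^"]+'
# to pull any URLs embedded in the script body.
```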


3.3. Full web crawling by running options 1 and 2 without using a pattern word to match

Fig.5 - Choosing option 3 to find all URLs at the target domain www.nasa.gov via options 1 and 2 without using a pattern word to match

Fig.6 - Finishing the full web crawl at the target domain www.nasa.gov
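Conceptually, the full crawl merges the URLs found in the HTML with the URLs found inside the linked JS files and deduplicates the result. A sketch under that assumption, with local stand-ins for the two fetched lists:

```shell
#!/usr/bin/env bash
# Hypothetical URL lists: one from HTML parsing, one from JS parsing.
html_urls='https://example.com/a
https://example.com/b'
js_urls='https://example.com/b
https://example.com/c'

# Merge both result sets and drop duplicates.
printf '%s\n%s\n' "$html_urls" "$js_urls" | sort -u
```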


3.4. Web crawling using a pattern word to match

Fig.7 - Starting a web crawl at a target domain to find all URLs containing the word ".nasa"

Fig.8 - Choosing option 3 to find all URLs containing the word "nasa" at the target domain www.nasa.gov via options 1 and 2

Fig.9 - Finishing the full web crawl at the target domain www.nasa.gov using the word ".nasa"
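The pattern-word filter amounts to keeping only the extracted URLs that contain the caller-supplied string. A sketch of that filtering step, with an illustrative URL list (not output from the real tool):

```shell
#!/usr/bin/env bash
# Hypothetical list of URLs collected by the crawler.
urls='https://www.nasa.gov/news
https://example.com/other
https://images.nasa.gov/gallery'
pattern=".nasa"

# Keep only URLs containing the pattern word; -F treats it as a fixed
# string, so the leading dot is matched literally rather than as regex.
printf '%s\n' "$urls" | grep -F "$pattern"
```

Using `grep -F` here is a deliberate choice: a user-supplied word like ".nasa" should not be interpreted as a regular expression.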


4. References

https://medium.datadriveninvestor.com/what-is-a-web-crawler-and-how-does-it-work-b9e9c2e4c35d