- Python 3.8+
- Requests
- BeautifulSoup
01 -) Clone:
$ git clone https://github.com/thiiagoms/links-extractor
02 -) Go to the links-extractor directory:
$ cd links-extractor
links-extractor $
01 -) In your script (example.py in the run below), import the Extractor main class and call it like this:
from src.services.extractor import Extractor
from src.utils.printer import Printer

urls = ['https://github.com', 'https://google.com']

extractor = Extractor()
links = extractor.extract(urls, timeout=10)

# links maps each requested URL to the links found on that page
for url, extracted_links in links.items():
    Printer.message(f"Url: {url}")
    for link in extracted_links:
        Printer.success(f" {link}")
    Printer.message("###############")
Running it should produce output like this:
$ python example.py
Url: https://github.com
#start-of-content
https://github.com/
/signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home
/features/actions
/features/packages
/features/security
###############
Url: https://google.com
https://www.google.com/imghp?hl=pt-BR&tab=wi
https://maps.google.com.br/maps?hl=pt-BR&tab=wl
https://play.google.com/?hl=pt-BR&tab=w8
###############
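The output above mixes absolute URLs, site-relative paths (such as /features/actions) and in-page anchors (such as #start-of-content). If you need absolute URLs only, one possible post-processing step is sketched below. It assumes the result of extract() is a dict mapping each page URL to a list of href strings, as the loop in the example suggests. The absolutize_links helper is hypothetical and not part of links-extractor; it relies only on urljoin from the standard library.

```python
from urllib.parse import urljoin


def absolutize_links(links):
    """Resolve relative hrefs against the page they were found on.

    `links` is assumed to be a dict of {page_url: [href, ...]}.
    This helper is illustrative and not part of links-extractor.
    """
    resolved = {}
    for page_url, hrefs in links.items():
        seen = []
        for href in hrefs:
            # Skip in-page anchors such as "#start-of-content"
            if href.startswith("#"):
                continue
            absolute = urljoin(page_url, href)
            # Keep only the first occurrence, preserving order
            if absolute not in seen:
                seen.append(absolute)
        resolved[page_url] = seen
    return resolved


# Sample data shaped like the output shown above
sample = {
    "https://github.com": [
        "#start-of-content",
        "https://github.com/",
        "/features/actions",
        "/features/security",
    ]
}
for page_url, hrefs in absolutize_links(sample).items():
    print(page_url, hrefs)
```

Dropping anchors and duplicates is a design choice for this sketch; keep them if your use case needs every href exactly as it appears on the page.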
01 -) Run tests with pytest:
links-extractor $ pytest
02 -) Lint a file with autopep8, for example:
links-extractor $ autopep8 --in-place --aggressive --aggressive src/services/extractor.py