thiiagoms/links-extractor: 🔨 extract all links from a website

Extract links from URLs 🗜️

A Python library that extracts links from web pages.

Dependencies ➕
Install 📦
Run 🏃
Bonus 🏅

Dependencies

Python 3.8+
Requests
BeautifulSoup
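
Before installing, it can help to confirm that the environment meets the requirements listed above. The snippet below is a minimal sketch using only the standard library; the helper names `missing_modules` and `python_is_supported` are hypothetical and not part of links-extractor. It assumes the usual import names: `requests` for Requests and `bs4` for BeautifulSoup.

```python
import sys
from importlib.util import find_spec

# Import names for the dependencies listed above.
# BeautifulSoup is imported as "bs4" (assumed standard install).
REQUIRED_MODULES = ("requests", "bs4")


def missing_modules(modules=REQUIRED_MODULES):
    """Return the import names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


def python_is_supported(minimum=(3, 8)):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python 3.8+:", python_is_supported())
    print("Missing modules:", missing_modules() or "none")
```

If the second line lists a module, install it with your package manager of choice before running the examples.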

Install

01 -) Clone:

$ git clone https://github.com/thiiagoms/links-extractor

02 -) Go to the links-extractor directory:

$ cd links-extractor
links-extractor $

Run

01 -) In your script (for example, example.py), import the Extractor class and call it like this:

from src.services.extractor import Extractor
from src.utils.printer import Printer

# Pages to scan for links
urls = ['https://github.com', 'https://google.com']
extractor = Extractor()

# extract() returns a mapping of each URL to the links found on that page
links = extractor.extract(urls, timeout=10)

for url, extracted_links in links.items():
    Printer.message(f"Url: {url}")
    for link in extracted_links:
        Printer.success(f" {link}")
    Printer.message("###############")

You should see output similar to this:

$ python example.py

Url: https://github.com

  #start-of-content
  https://github.com/
  /signup?ref_cta=Sign+up&ref_loc=header+logged+out&ref_page=%2F&source=header-home
  /features/actions
  /features/packages
  /features/security

###############

Url: https://google.com

  https://www.google.com/imghp?hl=pt-BR&tab=wi
  https://maps.google.com.br/maps?hl=pt-BR&tab=wl
  https://play.google.com/?hl=pt-BR&tab=w8

###############
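
The sample output above mixes absolute URLs, site-relative paths such as `/features/actions`, and in-page anchors such as `#start-of-content`. If you need absolute URLs only, one option is to post-process the results yourself. The following is a minimal sketch using the standard library's `urljoin`; the `to_absolute` helper is hypothetical and not part of links-extractor, and dropping in-page anchors is a design choice made for this example.

```python
from typing import Iterable, List
from urllib.parse import urljoin


def to_absolute(page_url: str, links: Iterable[str]) -> List[str]:
    """Resolve links against page_url, skipping in-page anchors and duplicates."""
    absolute: List[str] = []
    for link in links:
        # Fragment-only links point back into the same page, so skip them.
        if link.startswith("#"):
            continue
        # urljoin leaves absolute URLs untouched and resolves relative ones.
        resolved = urljoin(page_url, link)
        if resolved not in absolute:
            absolute.append(resolved)
    return absolute


if __name__ == "__main__":
    sample = ["#start-of-content", "https://github.com/", "/features/actions"]
    for url in to_absolute("https://github.com", sample):
        print(url)
```

With the dictionary returned in the Run example, you could call `to_absolute(url, extracted_links)` inside the loop before printing.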

Bonus

01 -) Run tests with pytest:

links-extractor $ pytest

02 -) Format a file with autopep8, for example:

links-extractor $ autopep8 --in-place --aggressive --aggressive src/services/extractor.py
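
For the pytest step above, here is a sketch of what a small, network-free unit test for link extraction could look like. It is not one of the repository's own tests: `HrefCollector` and `collect_hrefs` are hypothetical stand-ins built on the standard library's `html.parser`, used here so the example runs without Requests or BeautifulSoup. pytest picks up the functions whose names start with `test_`.

```python
from html.parser import HTMLParser
from typing import List


class HrefCollector(HTMLParser):
    """Hypothetical stand-in: collects href values from <a> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        # Only anchor tags carry the links we care about.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                self.hrefs.append(value)


def collect_hrefs(html: str) -> List[str]:
    """Parse an HTML string and return the href of every <a> tag, in order."""
    parser = HrefCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


def test_collects_href_values():
    html = '<a href="/features/actions">Actions</a><a>no href</a><p>text</p>'
    assert collect_hrefs(html) == ["/features/actions"]


def test_returns_empty_list_without_anchors():
    assert collect_hrefs("<p>no links here</p>") == []
```

Testing against a fixed HTML string keeps the test deterministic, since live pages such as github.com change their links over time.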

