spider-py

The spider web crawler (spider-rs/spider, written in Rust) ported to Python.

Getting Started

  1. Install the package: pip install spider_rs
  2. Crawl a site and print the links it found:
import asyncio

from spider_rs import Website


async def main():
    # Create a crawler for the start URL
    website = Website("https://choosealicense.com")
    # Crawl the site
    website.crawl()
    # Print the links collected during the crawl
    print(website.get_links())


asyncio.run(main())
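A crawl of a large site can return thousands of URLs. The sketch below is a hypothetical helper, not part of spider_rs, that groups crawled URLs by host and first path segment using only the standard library. It assumes get_links() returns an iterable of absolute URL strings; check the examples for the actual return type.

```python
from collections import Counter
from urllib.parse import urlparse


def count_links_by_section(links):
    """Count crawled URLs by (host, first path segment).

    Hypothetical helper: assumes `links` is an iterable of absolute URL
    strings, such as the result of Website.get_links() after a crawl.
    """
    counts = Counter()
    for link in links:
        parsed = urlparse(link)
        # First non-empty path segment, or "/" for the site root
        segments = [part for part in parsed.path.split("/") if part]
        section = "/" + segments[0] if segments else "/"
        counts[(parsed.netloc, section)] += 1
    return counts


if __name__ == "__main__":
    # Sample URLs standing in for real crawl output
    sample = [
        "https://choosealicense.com/",
        "https://choosealicense.com/licenses/mit/",
        "https://choosealicense.com/licenses/apache-2.0/",
        "https://choosealicense.com/about/",
    ]
    for (host, section), n in count_links_by_section(sample).most_common():
        print(host, section, n)
```

After a real crawl, passing website.get_links() to count_links_by_section and calling .most_common(10) on the result would list the ten largest sections of the site.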

View the examples to learn more.

Development

Install Python and maturin (pipx install maturin), then build the extension and install it into the active virtual environment:

  1. maturin develop
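maturin develop installs the freshly built module into the active Python environment, so a virtualenv (or conda environment) must be active when you run it. The hypothetical pre-flight helpers below, which are not part of maturin, approximate that check and confirm afterwards that spider_rs can be imported from the current interpreter. maturin's own environment detection may differ in detail.

```python
import importlib.util
import os
import sys


def in_virtual_env():
    """Approximate check for an active venv/virtualenv or conda environment."""
    # Inside a venv, sys.prefix points at the env while sys.base_prefix points
    # at the base interpreter; activation scripts also set VIRTUAL_ENV, and
    # conda sets CONDA_PREFIX.
    return (
        sys.prefix != sys.base_prefix
        or "VIRTUAL_ENV" in os.environ
        or "CONDA_PREFIX" in os.environ
    )


def module_available(name):
    """Return True if the top-level module `name` can be imported."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    print("virtual environment active:", in_virtual_env())
    print("spider_rs importable:", module_available("spider_rs"))
```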

Benchmarks

See the benchmarks for a breakdown across libraries and platforms.

Test URL: https://espn.com

Library                   Pages     Time
spider (Rust): crawl      150,387   1 min
spider (Node.js): crawl   150,387   153 s
spider (Python): crawl    150,387   186 s
scrapy (Python): crawl    49,598    1 h
crawlee (Node.js): crawl  18,779    30 min

The benchmarks above were run on a Mac M1; spider performs about 2-10x faster on Linux ARM machines.
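Because the libraries crawled different numbers of pages, total time alone is hard to compare. Dividing pages by seconds gives average throughput. The sketch below does that arithmetic on the benchmark table's figures, converting every time to seconds (1 minute = 60 s, 30 minutes = 1,800 s, 1 hour = 3,600 s) and treating the published times as rounded values. By this measure spider(python) averages roughly 809 pages/s against roughly 14 for scrapy(python), about a 59x difference, though the crawls did not cover the same set of pages.

```python
# Figures copied from the benchmark table (crawl of https://espn.com on a Mac M1).
# Each entry is (pages crawled, time in seconds); published times look rounded.
BENCHMARKS = {
    "spider(rust)": (150_387, 60),
    "spider(nodejs)": (150_387, 153),
    "spider(python)": (150_387, 186),
    "scrapy(python)": (49_598, 3_600),
    "crawlee(nodejs)": (18_779, 1_800),
}


def pages_per_second(pages, seconds):
    """Average crawl throughput over the whole run."""
    return pages / seconds


def throughput_table(benchmarks):
    """Map each library to its average pages/second, fastest first."""
    rates = {name: pages_per_second(p, s) for name, (p, s) in benchmarks.items()}
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    rates = throughput_table(BENCHMARKS)
    for name, rate in rates.items():
        print(f"{name:16} {rate:8.1f} pages/s")
    ratio = rates["spider(python)"] / rates["scrapy(python)"]
    print(f"spider(python) vs scrapy(python): {ratio:.0f}x pages/s")
```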

Issues

Please open a GitHub issue for any problems you find.