Scrapy

This repository holds all my Scrapy projects.

Garagem

This project scrapes data about books published by the major publishers and sold on the Saraiva website.

First, you need a running Splash instance. You can start one from the Docker image:

$ docker run -p 8050:8050 scrapinghub/splash

You must also set the Splash server address in ./garagem/garagem/settings.py, using the address of your own Docker host, like this:

SPLASH_URL = 'http://192.168.59.103:8050'
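
One way to check that this address is reachable before crawling is to request a page through Splash's render.html endpoint. The sketch below only builds that URL with the standard library. The helper name splash_render_url and the example target URL are illustrative assumptions, not part of this project.

```python
from urllib.parse import urlencode

# Same value as SPLASH_URL in settings.py; replace it with your own host.
SPLASH_URL = 'http://192.168.59.103:8050'


def splash_render_url(target_url, splash_url=SPLASH_URL, wait=0.5):
    """Build a URL for Splash's render.html endpoint (hypothetical helper)."""
    query = urlencode({'url': target_url, 'wait': wait})
    return '{}/render.html?{}'.format(splash_url.rstrip('/'), query)


if __name__ == '__main__':
    # Opening the printed URL in a browser should show the rendered page
    # if the Splash container is up.
    print(splash_render_url('http://example.com'))
```

If the printed URL does not load, the Docker container is probably not running or the address in settings.py does not match your Docker host.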

Currently, there is no script that runs the spider automatically; you have to start it from a terminal.

To start crawling, change into the garagem folder (the outermost one) and run this command:

$ scrapy crawl saraiva
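
If you would rather start the crawl from a script, a minimal sketch could wrap the same command with the standard library. This is not part of the repository: the file name run_saraiva.py and the helpers build_crawl_command and run_crawl are hypothetical, and the script assumes it is run from the repository root with scrapy on your PATH.

```python
import subprocess
import sys


def build_crawl_command(spider='saraiva', output=None):
    """Return the argument list for `scrapy crawl` (hypothetical helper)."""
    command = ['scrapy', 'crawl', spider]
    if output:
        # -o is Scrapy's feed export option; the file extension picks the format.
        command += ['-o', output]
    return command


def run_crawl(project_dir='garagem', spider='saraiva', output=None):
    """Run the spider from the project folder and return the exit code."""
    return subprocess.call(build_crawl_command(spider, output), cwd=project_dir)


if __name__ == '__main__':
    # Optional first argument: output file, e.g. python run_saraiva.py items.jl
    sys.exit(run_crawl(output=sys.argv[1] if len(sys.argv) > 1 else None))
```

Because the command runs inside the garagem folder, a relative output path is created there as well.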

If you want to dump the items to a JSON Lines file, as I did with saraiva.jl, just add -o FILENAME.jl to the command above.
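
A JSON Lines file holds one JSON object per line, so it can be read back with the standard library alone. The sketch below is an illustration, not code from this project; the helper name read_json_lines is hypothetical, and no assumption is made about which fields the items contain.

```python
import json


def read_json_lines(path):
    """Yield one parsed item per non-empty line of a JSON Lines file."""
    with open(path, encoding='utf-8') as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


if __name__ == '__main__':
    # Assumes saraiva.jl sits in the current folder.
    items = list(read_json_lines('saraiva.jl'))
    print('{} items loaded'.format(len(items)))
```

Reading line by line keeps memory use low, since the whole file never has to be parsed at once.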

See the Scrapy docs for further options.
