This program downloads the individual page images of books from the Apabi Digital Library (阿帕比电子图书).
You need Git and Python 3 with pip and pipenv installed on your computer. Then set up the program as follows:
- Clone this repository:
git clone https://github.com/pcdi/ApabiDownloader.git
- Go into the repository:
cd ApabiDownloader
- Install dependencies:
pipenv sync
- Go into the crawler's directory:
cd ApabiDownloader
You should now be at ApabiDownloader/ApabiDownloader, the folder that contains scrapy.cfg.
- Make sure your IP address is allowed to log into Apabi! Otherwise, the program will not be able to download anything. For example, either be physically at the institution that gives you access to Apabi or be connected to its network via VPN.
- Run the program, supplying the URL of the book you want to download as an argument. Make sure the URL contains book.detail. In the following command, replace the URL with your own:
pipenv run scrapy crawl apabi_downloader -L INFO -a book_detail_url="http://apabi.lib.pku.edu.cn/Usp/pku/?pid=book.detail&metaid=m.20201211-ZGRM-KXSJ-0307"
Here, -L INFO sets Scrapy's log level and -a passes book_detail_url to the crawler as an argument.
- To stop the crawler, either wait for it to finish running or exit with Ctrl-C.
- If the crawler has not downloaded all images for the book, you can restart it by running the same command again. It only downloads pages that have not already been downloaded.
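Before starting a long crawl, it can help to confirm that the URL really points to a book-detail page. The Python sketch below is a hypothetical helper, not part of ApabiDownloader. Based only on the example URL in the run step, it assumes that a valid URL is http(s) and carries a pid=book.detail query parameter plus a non-empty metaid. The crawler's own checks may differ.

```python
from urllib.parse import parse_qs, urlparse


def is_book_detail_url(url: str) -> bool:
    """Return True if url looks like an Apabi book-detail page URL.

    Hypothetical check (not the crawler's own code): requires an http(s)
    URL whose query string has pid=book.detail and a non-empty metaid,
    mirroring the example URL in this README.
    """
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False
    # parse_qs maps each query key to a list of values and drops blank values
    query = parse_qs(parsed.query)
    has_book_detail = "book.detail" in query.get("pid", [])
    has_metaid = any(value.strip() for value in query.get("metaid", []))
    return has_book_detail and has_metaid


if __name__ == "__main__":
    example = (
        "http://apabi.lib.pku.edu.cn/Usp/pku/"
        "?pid=book.detail&metaid=m.20201211-ZGRM-KXSJ-0307"
    )
    print(is_book_detail_url(example))  # True
    print(is_book_detail_url("http://apabi.lib.pku.edu.cn/Usp/pku/?pid=book.search"))  # False
```

Parsing the query string, rather than only searching for the text book.detail, also rejects URLs where that text appears somewhere other than the pid parameter.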
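Restarting the crawler only fetches missing pages, which amounts to skipping pages whose images already exist on disk. The Python sketch below illustrates that idea under a hypothetical layout in which page n is saved as n.jpg in a single output folder. The function name pages_to_download and the file naming are assumptions; the real crawler's file names, output location and skip mechanism may differ.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def pages_to_download(output_dir: Path, page_count: int, suffix: str = ".jpg") -> list[int]:
    """Return the 1-based page numbers whose image is not yet saved.

    Hypothetical layout: page n is stored as output_dir / f"{n}{suffix}".
    """
    return [
        page
        for page in range(1, page_count + 1)
        if not (output_dir / f"{page}{suffix}").exists()
    ]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        out = Path(tmp)
        # Simulate an interrupted run that saved only pages 1 and 3
        (out / "1.jpg").touch()
        (out / "3.jpg").touch()
        print(pages_to_download(out, 4))  # [2, 4]
```

Checking for existing files before requesting a page keeps a restarted run cheap, because only the missing pages generate network traffic.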