Spider

Chapter 2-Using Basic Libraries

2.1-Using urllib

2.2-Using requests

2.3-Regular Expressions

2.4-Using httpx (install with: pip install "httpx[http2]")

2.5-Basic Crawler Case Study

Chapter 3-Parsing and Scraping Web Page Data

3.1-Using XPath

3.2-Beautiful Soup

3.3-Using pyquery

3.4-Using parsel

Chapter 4-Data Storage

4.1-TXT Text File Storage

4.2-JSON File Storage

4.3-CSV File Storage

4.4-MySQL Storage

4.5-MongoDB Document Storage

4.6-Redis Cache Storage

4.7-Elasticsearch Search Engine Storage

4.8-Using RabbitMQ

Chapter 5-Ajax Analysis and Hands-On Scraping

Ajax in Practice

Chapter 6-Asynchronous Crawlers

6.1-Basic Principles of Coroutines

6.2-Using aiohttp

6.3-Asynchronous Scraping with aiohttp in Practice

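Chapter 7-Crawling JavaScript Dynamically Rendered Pages

7.1-Using Selenium

7.2-Using Splash

7.3-Using Pyppeteer

7.4-Using Playwright

7.5-Scraping with Selenium in Practice

7.6-Scraping with Playwright in Practice

7.7-CSS Position-Offset Anti-Scraping: Case Analysis and Hands-On Scraping

7.8-Font-Based Anti-Scraping: Case Analysis and Hands-On Scraping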

Chapter 8-CAPTCHA Recognition

8.1-Recognizing Image CAPTCHAs with OCR

8.2-Detecting the Gap in Slider CAPTCHAs with OpenCV

8.3-Recognizing Image CAPTCHAs with Deep Learning



hueryan/spider
