
Posts outside the configured time range get crawled, could the author please advise? #495

Open
limingyang325 opened this issue Jul 31, 2024 · 3 comments

@limingyang325

If both the start date and the end date are set to 7.17, posts from 7.18 as well as 7.17 get crawled. It seems few people have run into this; may I ask what causes it?

@dataabc
Owner

dataabc commented Jul 31, 2024

It may simply be how the Weibo API returns results.
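If the search endpoint itself returns posts outside the requested timescope, one client-side workaround (a sketch only, not part of weibo-search; the timestamp format is an assumption) is to re-check each post's publish date before saving it:

```python
from datetime import datetime

def within_range(created_at: str, start: str, end: str) -> bool:
    """Return True if a post's publish date falls inside [start, end].

    Assumes `created_at` looks like 'YYYY-MM-DD HH:MM:SS' and
    start/end look like 'YYYY-MM-DD' (hypothetical formats).
    """
    posted = datetime.strptime(created_at, "%Y-%m-%d %H:%M:%S").date()
    lo = datetime.strptime(start, "%Y-%m-%d").date()
    hi = datetime.strptime(end, "%Y-%m-%d").date()
    return lo <= posted <= hi

# A 7.18 post that leaked into a 7.17-only query gets dropped:
print(within_range("2024-07-18 06:30:00", "2024-07-17", "2024-07-17"))  # False
print(within_range("2024-07-17 23:59:59", "2024-07-17", "2024-07-17"))  # True
```

A check like this in the item pipeline would discard the stray out-of-range posts regardless of what the API sends back.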

@limingyang325
Author

Great, thank you! One more question: what causes the error below after the crawler has been running for a while? I couldn't find a similar issue reported.
2024-08-01 07:45:51 [scrapy.core.scraper] ERROR: Spider error processing <GET https://s.weibo.com/weibo?q=%E6%9A%B4%E9%9B%A8&typeall=1&suball=1&timescope=custom:2024-07-18-6:2024-07-18-7&page=1> (referer: https://s.weibo.com/weibo?q=%E6%9A%B4%E9%9B%A8&typeall=1&suball=1&timescope=custom:2024-07-18-0:2024-07-19-0&page=1)
urllib3.exceptions.SSLError: [SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1006)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\adapters.py", line 486, in send
resp = conn.urlopen(
^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\urllib3\connectionpool.py", line 845, in urlopen
retries = retries.increment(
^^^^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\urllib3\util\retry.py", line 515, in increment
raise MaxRetryError(_pool, url, reason) from reason # type: ignore[arg-type]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='weibo.com', port=443): Max retries exceeded with url: /ajax/statuses/show?id=Oo4KYzuaR&locale=zh-CN (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1006)')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\utils\defer.py", line 279, in iter_errback
yield next(it)
^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\utils\python.py", line 350, in __next__
return next(self.data)
^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\utils\python.py", line 350, in __next__
return next(self.data)
^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
for r in iterable:
File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\spidermiddlewares\referer.py", line 352, in <genexpr>
return (self._set_referer(r, response) for r in result or ())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
for r in iterable:
File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 27, in <genexpr>
return (r for r in result or () if self._filter(r, spider))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
for r in iterable:
File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\spidermiddlewares\depth.py", line 31, in <genexpr>
return (r for r in result or () if self._filter(r, response, spider))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\scrapy\core\spidermw.py", line 106, in process_sync
for r in iterable:
File "E:\flood\weibo-search-master-修改\weibo-search-master - 副本\weibo\spiders\search.py", line 197, in parse_by_hour
for weibo in self.parse_weibo(response):
File "E:\flood\weibo-search-master-修改\weibo-search-master - 副本\weibo\spiders\search.py", line 517, in parse_weibo
weibo["ip"] = self.get_ip(bid)
^^^^^^^^^^^^^^^^
File "E:\flood\weibo-search-master-修改\weibo-search-master - 副本\weibo\spiders\search.py", line 271, in get_ip
response = requests.get(url, headers=self.settings.get('DEFAULT_REQUEST_HEADERS'))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\api.py", line 73, in get
return request("get", url, params=params, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\api.py", line 59, in request
return session.request(method=method, url=url, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\sessions.py", line 589, in request
resp = self.send(prep, **send_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\sessions.py", line 703, in send
r = adapter.send(request, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\lmy\anaconda3\Lib\site-packages\requests\adapters.py", line 517, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='weibo.com', port=443): Max retries exceeded with url: /ajax/statuses/show?id=Oo4KYzuaR&locale=zh-CN (Caused by SSLError(SSLEOFError(8, '[SSL: UNEXPECTED_EOF_WHILE_READING] EOF occurred in violation of protocol (_ssl.c:1006)')))

That may also simply be how the Weibo interface behaves.
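The traceback shows `get_ip` calling `requests.get` directly, so a single dropped TLS handshake (the `UNEXPECTED_EOF_WHILE_READING` error, which is often transient rate limiting or a flaky connection) aborts the whole parse. A possible mitigation, sketched here with a hypothetical wrapper name rather than as a fix confirmed by the author, is to retry transient failures with a timeout and backoff:

```python
import time
import requests

def get_with_retries(url, headers=None, attempts=3, backoff=2.0):
    """GET a URL, retrying transient SSL/connection failures.

    Hypothetical wrapper: a `get_ip`-style caller could use this so one
    dropped TLS handshake does not kill the entire parse. Waits
    backoff * 2**i seconds between attempts, then re-raises on the last one.
    """
    for i in range(attempts):
        try:
            return requests.get(url, headers=headers, timeout=10)
        except (requests.exceptions.SSLError,
                requests.exceptions.ConnectionError):
            if i == attempts - 1:
                raise  # give up after the final attempt
            time.sleep(backoff * (2 ** i))
```

Lowering `DOWNLOAD_DELAY` pressure (i.e. slowing the crawl) may also reduce how often the server drops the connection mid-handshake.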
