[dev] Optimize batch fetch method to boost throughput #269

NiuBlibing · 2023-02-08T08:38:28Z

Description

The previous start URL fetching method only works when the spider is idle, so it does not reach full concurrency. This patch optimizes it by using the request_left_downloader signal.

A lock may be needed when calculating need_size.
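For illustration, here is a minimal, framework-free sketch of the idea, not the code in this patch. It assumes a hypothetical `BatchRefiller` helper whose `on_request_left_downloader` method stands in for a handler connected to Scrapy's `request_left_downloader` signal, and a `deque` that stands in for the Redis start URL queue. The names `BatchRefiller`, `source`, `batch_size`, and `in_flight` are assumptions made for this example; only `fill_requests_queue` and `need_size` come from this PR.

```python
import threading
from collections import deque


class BatchRefiller:
    """Hypothetical helper: tries to keep `batch_size` requests in flight."""

    def __init__(self, source, batch_size):
        self.source = deque(source)  # stand-in for the Redis start URL queue
        self.batch_size = batch_size
        self.in_flight = 0
        self._lock = threading.Lock()

    def fill_requests_queue(self):
        """Fetch only as many items as there are free slots."""
        with self._lock:
            # need_size is computed and consumed under the same lock, so two
            # concurrent callers cannot both claim the same free slots.
            need_size = self.batch_size - self.in_flight
            batch = []
            while need_size > 0 and self.source:
                batch.append(self.source.popleft())
                need_size -= 1
            self.in_flight += len(batch)
        return batch

    def on_request_left_downloader(self, request=None, spider=None):
        """Assumed to be connected to the request_left_downloader signal."""
        with self._lock:
            self.in_flight = max(0, self.in_flight - 1)
        # The lock is released before refilling because it is not reentrant.
        return self.fill_requests_queue()


refiller = BatchRefiller(["u1", "u2", "u3", "u4", "u5"], batch_size=2)
first_batch = refiller.fill_requests_queue()
next_batch = refiller.on_request_left_downloader()
```

With this shape, a slot is refilled as soon as one request leaves the downloader instead of waiting for the spider to become idle. Whether the lock is actually required depends on whether the signal handlers can run concurrently in the real spider, which this sketch does not establish.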

Fixes #119

How Has This Been Tested?

tox -e py38 -- tests/

Test Configuration:

OS version: Debian 11 and Windows 10
Necessary Libraries (optional):

Checklist:

My code follows the style guidelines of this project
I have performed a self-review of my code
I have commented my code, particularly in hard-to-understand areas
I have made corresponding changes to the documentation
My changes generate no new warnings
I have added tests that prove my fix is effective or that my feature works
New and existing unit tests pass locally with my changes
Any dependent changes have been merged and published in downstream modules


LuckyPigeon · 2023-06-10T10:34:50Z

@NiuBlibing Please resolve the assertion error and add a unit test for fill_requests_queue, thanks!

LuckyPigeon · 2023-06-21T02:30:45Z

@rmax What do you think about this implementation? It disables the spider_idle usage. I wonder if we need a switch between spider_idle and fill_requests_queue.

rmax · 2023-06-21T21:53:22Z

> @rmax What do you think about this implementation? It disables the spider_idle usage. I wonder if we need a switch between spider_idle and fill_requests_queue.

Interesting use of the other signal. Which Scrapy version is required for the new signal?

What happens to existing users who override the spider_idle method?

Does it make sense to bump the major version? Or, on a related note, shall we migrate to calendar versioning?

NiuBlibing and others added 2 commits February 8, 2023 16:23

optimize batch fetch method to boost throughput

af59239

The previous start URL fetching method only works when the spider is idle, so it does not reach full concurrency. This patch optimizes it by using the request_left_downloader signal. Signed-off-by: Tianyue Ren <[email protected]>

Merge branch 'master' into optimize_next_requests

58ba6da

LuckyPigeon changed the title ~~[patch v1] optimize batch fetch method to boost throughput~~ [dev] Optimize batch fetch method to boost throughput Jun 21, 2023

LuckyPigeon self-assigned this Jun 21, 2023

LuckyPigeon added 3 commits June 21, 2023 09:55

[fix] assertion error

466288c

[fix] assertion error

d1674c6

[fix] assertion error

6300a38

LuckyPigeon enabled auto-merge (squash) June 21, 2023 02:27

LuckyPigeon disabled auto-merge June 21, 2023 02:28

LuckyPigeon enabled auto-merge (squash) June 21, 2023 02:28

LuckyPigeon requested a review from rmax June 21, 2023 02:30

rmax force-pushed the master branch 2 times, most recently from daccc92 to 3245d28 Compare July 6, 2024 21:23
