Description
Situation
- A task was supposed to be run
- It failed and was retried a couple of times (apparently because of a Timeout error)
- The task was scheduled nonetheless
- Eventually, the command to run the task succeeds in getting a response
- The response is that the job is already scheduled
- Because of this, the returned job_id is None (see `BaseScript._schedule_job()` and `Task.run()`)
- This None job_id is then tracked among the running jobs (see `GraphManager.run_pending_jobs()`)
- GraphManager waits for it to finish (see `GraphManager.check_running_jobs()`)
- Because it's not an actual job key, it never finishes, so GraphManager is stuck and doesn't move on to the next task.
Logs
Affected jobs:
2023-10-16 05:51:07.032000 INFO [shub_workflow.graph] Job discovery (265529/71/10349) still running
2023-10-16 05:53:07.597000 INFO [shub_workflow.graph] Job "graph_manager/discovery" (265529/71/10349) finished
2023-10-16 05:53:07.598000 INFO [shub_workflow.graph.task] Will start task "graph_manager/consumers"
2023-10-16 05:54:07.726000 ERROR [shub_workflow.script] Failed to schedule job with arguments {'spider': 'py:hcfmanager.py', 'add_tag': ['FLOW_ID=869a601a-24f7-4f3b-8ce4-2a883c2eaac6', 'PARENT_NAME=graph_manager', 'TASK_ID=consumers'], 'units': None, 'cmd_args': 'imdb.com_fresh imdb.com_fresh slot_ --loop-mode=60 "--frontera-settings-json={\\"BACKEND\\": \\"hcf_backend.HCFBackend\\", \\"HCF_PROJECT_ID\\": \\"265529\\", \\"HCF_AUTH\\": \\"153822e638164f449e637000b46194fb\\", \\"HCF_PRODUCER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_PRODUCER_SLOT_PREFIX\\": \\"slot_\\", \\"HCF_PRODUCER_NUMBER_OF_SLOTS\\": 11, \\"HCF_CONSUMER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_CONSUMER_MAX_BATCHES\\": 150, \\"MAX_NEXT_REQUESTS\\": 15000, \\"HCF_CONSUMER_DELETE_BATCHES_ON_STOP\\": true}" --max-running-jobs=11 "--job-settings={\\"SCHEDULER\\": \\"scrapy_frontera.scheduler.FronteraScheduler\\", \\"SPIDER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerSpiderMiddleware\\": 0}, \\"DOWNLOADER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerDownloaderMiddleware\\": 0}, \\"FRONTERA_SCHEDULER_REQUEST_CALLBACKS_TO_FRONTIER\\": [\\"crawl_title\\", \\"crawl_person\\"], \\"FRONTERA_SCHEDULER_CALLBACK_SLOT_PREFIX_MAP\\": {\\"crawl_title\\": \\"slot_crawl_title_/11\\", \\"crawl_person\\": \\"slot_crawl_person_/11\\"}, \\"FRONTERA_SCHEDULER_SKIP_START_REQUESTS\\": true, \\"URLACCUMULATOR_SCHEDULE_URLS_ENABLED\\": false, \\"SPIDERMON_SPIDER_CLOSE_MONITORS\\": [\\"netflix.monitors.WorkflowConsumersCloseMonitorSuite\\"], \\"SPIDERMON_PERIODIC_MONITORS\\": {\\"netflix.monitors.WorkflowConsumersPeriodicMonitorSuite\\": 10800}}"', 'meta': None}
2023-10-16 05:54:07.727000 ERROR [shub_workflow.utils] Retrying shub_workflow.script.BaseScript._schedule_job in 60.0 seconds as it raised ReadTimeout: HTTPSConnectionPool(host='app.scrapinghub.com', port=443): Read timed out. (read timeout=60).
2023-10-16 05:56:07.918000 ERROR [shub_workflow.script] Failed to schedule job with arguments {'spider': 'py:hcfmanager.py', 'add_tag': ['FLOW_ID=869a601a-24f7-4f3b-8ce4-2a883c2eaac6', 'PARENT_NAME=graph_manager', 'TASK_ID=consumers'], 'units': None, 'cmd_args': 'imdb.com_fresh imdb.com_fresh slot_ --loop-mode=60 "--frontera-settings-json={\\"BACKEND\\": \\"hcf_backend.HCFBackend\\", \\"HCF_PROJECT_ID\\": \\"265529\\", \\"HCF_AUTH\\": \\"153822e638164f449e637000b46194fb\\", \\"HCF_PRODUCER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_PRODUCER_SLOT_PREFIX\\": \\"slot_\\", \\"HCF_PRODUCER_NUMBER_OF_SLOTS\\": 11, \\"HCF_CONSUMER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_CONSUMER_MAX_BATCHES\\": 150, \\"MAX_NEXT_REQUESTS\\": 15000, \\"HCF_CONSUMER_DELETE_BATCHES_ON_STOP\\": true}" --max-running-jobs=11 "--job-settings={\\"SCHEDULER\\": \\"scrapy_frontera.scheduler.FronteraScheduler\\", \\"SPIDER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerSpiderMiddleware\\": 0}, \\"DOWNLOADER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerDownloaderMiddleware\\": 0}, \\"FRONTERA_SCHEDULER_REQUEST_CALLBACKS_TO_FRONTIER\\": [\\"crawl_title\\", \\"crawl_person\\"], \\"FRONTERA_SCHEDULER_CALLBACK_SLOT_PREFIX_MAP\\": {\\"crawl_title\\": \\"slot_crawl_title_/11\\", \\"crawl_person\\": \\"slot_crawl_person_/11\\"}, \\"FRONTERA_SCHEDULER_SKIP_START_REQUESTS\\": true, \\"URLACCUMULATOR_SCHEDULE_URLS_ENABLED\\": false, \\"SPIDERMON_SPIDER_CLOSE_MONITORS\\": [\\"netflix.monitors.WorkflowConsumersCloseMonitorSuite\\"], \\"SPIDERMON_PERIODIC_MONITORS\\": {\\"netflix.monitors.WorkflowConsumersPeriodicMonitorSuite\\": 10800}}"', 'meta': None}
2023-10-16 05:56:07.918000 ERROR [shub_workflow.utils] Retrying shub_workflow.script.BaseScript._schedule_job in 60.0 seconds as it raised ReadTimeout: HTTPSConnectionPool(host='app.scrapinghub.com', port=443): Read timed out. (read timeout=60).
2023-10-16 05:58:08.104000 ERROR [shub_workflow.script] Failed to schedule job with arguments {'spider': 'py:hcfmanager.py', 'add_tag': ['FLOW_ID=869a601a-24f7-4f3b-8ce4-2a883c2eaac6', 'PARENT_NAME=graph_manager', 'TASK_ID=consumers'], 'units': None, 'cmd_args': 'imdb.com_fresh imdb.com_fresh slot_ --loop-mode=60 "--frontera-settings-json={\\"BACKEND\\": \\"hcf_backend.HCFBackend\\", \\"HCF_PROJECT_ID\\": \\"265529\\", \\"HCF_AUTH\\": \\"153822e638164f449e637000b46194fb\\", \\"HCF_PRODUCER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_PRODUCER_SLOT_PREFIX\\": \\"slot_\\", \\"HCF_PRODUCER_NUMBER_OF_SLOTS\\": 11, \\"HCF_CONSUMER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_CONSUMER_MAX_BATCHES\\": 150, \\"MAX_NEXT_REQUESTS\\": 15000, \\"HCF_CONSUMER_DELETE_BATCHES_ON_STOP\\": true}" --max-running-jobs=11 "--job-settings={\\"SCHEDULER\\": \\"scrapy_frontera.scheduler.FronteraScheduler\\", \\"SPIDER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerSpiderMiddleware\\": 0}, \\"DOWNLOADER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerDownloaderMiddleware\\": 0}, \\"FRONTERA_SCHEDULER_REQUEST_CALLBACKS_TO_FRONTIER\\": [\\"crawl_title\\", \\"crawl_person\\"], \\"FRONTERA_SCHEDULER_CALLBACK_SLOT_PREFIX_MAP\\": {\\"crawl_title\\": \\"slot_crawl_title_/11\\", \\"crawl_person\\": \\"slot_crawl_person_/11\\"}, \\"FRONTERA_SCHEDULER_SKIP_START_REQUESTS\\": true, \\"URLACCUMULATOR_SCHEDULE_URLS_ENABLED\\": false, \\"SPIDERMON_SPIDER_CLOSE_MONITORS\\": [\\"netflix.monitors.WorkflowConsumersCloseMonitorSuite\\"], \\"SPIDERMON_PERIODIC_MONITORS\\": {\\"netflix.monitors.WorkflowConsumersPeriodicMonitorSuite\\": 10800}}"', 'meta': None}
2023-10-16 05:58:08.104000 ERROR [shub_workflow.utils] Retrying shub_workflow.script.BaseScript._schedule_job in 60.0 seconds as it raised ReadTimeout: HTTPSConnectionPool(host='app.scrapinghub.com', port=443): Read timed out. (read timeout=60).
2023-10-16 06:00:08.300000 ERROR [shub_workflow.script] Failed to schedule job with arguments {'spider': 'py:hcfmanager.py', 'add_tag': ['FLOW_ID=869a601a-24f7-4f3b-8ce4-2a883c2eaac6', 'PARENT_NAME=graph_manager', 'TASK_ID=consumers'], 'units': None, 'cmd_args': 'imdb.com_fresh imdb.com_fresh slot_ --loop-mode=60 "--frontera-settings-json={\\"BACKEND\\": \\"hcf_backend.HCFBackend\\", \\"HCF_PROJECT_ID\\": \\"265529\\", \\"HCF_AUTH\\": \\"153822e638164f449e637000b46194fb\\", \\"HCF_PRODUCER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_PRODUCER_SLOT_PREFIX\\": \\"slot_\\", \\"HCF_PRODUCER_NUMBER_OF_SLOTS\\": 11, \\"HCF_CONSUMER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_CONSUMER_MAX_BATCHES\\": 150, \\"MAX_NEXT_REQUESTS\\": 15000, \\"HCF_CONSUMER_DELETE_BATCHES_ON_STOP\\": true}" --max-running-jobs=11 "--job-settings={\\"SCHEDULER\\": \\"scrapy_frontera.scheduler.FronteraScheduler\\", \\"SPIDER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerSpiderMiddleware\\": 0}, \\"DOWNLOADER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerDownloaderMiddleware\\": 0}, \\"FRONTERA_SCHEDULER_REQUEST_CALLBACKS_TO_FRONTIER\\": [\\"crawl_title\\", \\"crawl_person\\"], \\"FRONTERA_SCHEDULER_CALLBACK_SLOT_PREFIX_MAP\\": {\\"crawl_title\\": \\"slot_crawl_title_/11\\", \\"crawl_person\\": \\"slot_crawl_person_/11\\"}, \\"FRONTERA_SCHEDULER_SKIP_START_REQUESTS\\": true, \\"URLACCUMULATOR_SCHEDULE_URLS_ENABLED\\": false, \\"SPIDERMON_SPIDER_CLOSE_MONITORS\\": [\\"netflix.monitors.WorkflowConsumersCloseMonitorSuite\\"], \\"SPIDERMON_PERIODIC_MONITORS\\": {\\"netflix.monitors.WorkflowConsumersPeriodicMonitorSuite\\": 10800}}"', 'meta': None}
2023-10-16 06:00:08.301000 ERROR [shub_workflow.utils] Retrying shub_workflow.script.BaseScript._schedule_job in 60.0 seconds as it raised ReadTimeout: HTTPSConnectionPool(host='app.scrapinghub.com', port=443): Read timed out. (read timeout=60).
2023-10-16 06:02:08.490000 ERROR [shub_workflow.script] Failed to schedule job with arguments {'spider': 'py:hcfmanager.py', 'add_tag': ['FLOW_ID=869a601a-24f7-4f3b-8ce4-2a883c2eaac6', 'PARENT_NAME=graph_manager', 'TASK_ID=consumers'], 'units': None, 'cmd_args': 'imdb.com_fresh imdb.com_fresh slot_ --loop-mode=60 "--frontera-settings-json={\\"BACKEND\\": \\"hcf_backend.HCFBackend\\", \\"HCF_PROJECT_ID\\": \\"265529\\", \\"HCF_AUTH\\": \\"153822e638164f449e637000b46194fb\\", \\"HCF_PRODUCER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_PRODUCER_SLOT_PREFIX\\": \\"slot_\\", \\"HCF_PRODUCER_NUMBER_OF_SLOTS\\": 11, \\"HCF_CONSUMER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_CONSUMER_MAX_BATCHES\\": 150, \\"MAX_NEXT_REQUESTS\\": 15000, \\"HCF_CONSUMER_DELETE_BATCHES_ON_STOP\\": true}" --max-running-jobs=11 "--job-settings={\\"SCHEDULER\\": \\"scrapy_frontera.scheduler.FronteraScheduler\\", \\"SPIDER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerSpiderMiddleware\\": 0}, \\"DOWNLOADER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerDownloaderMiddleware\\": 0}, \\"FRONTERA_SCHEDULER_REQUEST_CALLBACKS_TO_FRONTIER\\": [\\"crawl_title\\", \\"crawl_person\\"], \\"FRONTERA_SCHEDULER_CALLBACK_SLOT_PREFIX_MAP\\": {\\"crawl_title\\": \\"slot_crawl_title_/11\\", \\"crawl_person\\": \\"slot_crawl_person_/11\\"}, \\"FRONTERA_SCHEDULER_SKIP_START_REQUESTS\\": true, \\"URLACCUMULATOR_SCHEDULE_URLS_ENABLED\\": false, \\"SPIDERMON_SPIDER_CLOSE_MONITORS\\": [\\"netflix.monitors.WorkflowConsumersCloseMonitorSuite\\"], \\"SPIDERMON_PERIODIC_MONITORS\\": {\\"netflix.monitors.WorkflowConsumersPeriodicMonitorSuite\\": 10800}}"', 'meta': None}
2023-10-16 06:02:08.490000 ERROR [shub_workflow.utils] Retrying shub_workflow.script.BaseScript._schedule_job in 60.0 seconds as it raised ReadTimeout: HTTPSConnectionPool(host='app.scrapinghub.com', port=443): Read timed out. (read timeout=60).
2023-10-16 06:04:08.671000 ERROR [shub_workflow.script] Failed to schedule job with arguments {'spider': 'py:hcfmanager.py', 'add_tag': ['FLOW_ID=869a601a-24f7-4f3b-8ce4-2a883c2eaac6', 'PARENT_NAME=graph_manager', 'TASK_ID=consumers'], 'units': None, 'cmd_args': 'imdb.com_fresh imdb.com_fresh slot_ --loop-mode=60 "--frontera-settings-json={\\"BACKEND\\": \\"hcf_backend.HCFBackend\\", \\"HCF_PROJECT_ID\\": \\"265529\\", \\"HCF_AUTH\\": \\"153822e638164f449e637000b46194fb\\", \\"HCF_PRODUCER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_PRODUCER_SLOT_PREFIX\\": \\"slot_\\", \\"HCF_PRODUCER_NUMBER_OF_SLOTS\\": 11, \\"HCF_CONSUMER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_CONSUMER_MAX_BATCHES\\": 150, \\"MAX_NEXT_REQUESTS\\": 15000, \\"HCF_CONSUMER_DELETE_BATCHES_ON_STOP\\": true}" --max-running-jobs=11 "--job-settings={\\"SCHEDULER\\": \\"scrapy_frontera.scheduler.FronteraScheduler\\", \\"SPIDER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerSpiderMiddleware\\": 0}, \\"DOWNLOADER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerDownloaderMiddleware\\": 0}, \\"FRONTERA_SCHEDULER_REQUEST_CALLBACKS_TO_FRONTIER\\": [\\"crawl_title\\", \\"crawl_person\\"], \\"FRONTERA_SCHEDULER_CALLBACK_SLOT_PREFIX_MAP\\": {\\"crawl_title\\": \\"slot_crawl_title_/11\\", \\"crawl_person\\": \\"slot_crawl_person_/11\\"}, \\"FRONTERA_SCHEDULER_SKIP_START_REQUESTS\\": true, \\"URLACCUMULATOR_SCHEDULE_URLS_ENABLED\\": false, \\"SPIDERMON_SPIDER_CLOSE_MONITORS\\": [\\"netflix.monitors.WorkflowConsumersCloseMonitorSuite\\"], \\"SPIDERMON_PERIODIC_MONITORS\\": {\\"netflix.monitors.WorkflowConsumersPeriodicMonitorSuite\\": 10800}}"', 'meta': None}
2023-10-16 06:04:08.671000 ERROR [shub_workflow.utils] Retrying shub_workflow.script.BaseScript._schedule_job in 60.0 seconds as it raised ReadTimeout: HTTPSConnectionPool(host='app.scrapinghub.com', port=443): Read timed out. (read timeout=60).
2023-10-16 06:06:08.858000 ERROR [shub_workflow.script] Failed to schedule job with arguments {'spider': 'py:hcfmanager.py', 'add_tag': ['FLOW_ID=869a601a-24f7-4f3b-8ce4-2a883c2eaac6', 'PARENT_NAME=graph_manager', 'TASK_ID=consumers'], 'units': None, 'cmd_args': 'imdb.com_fresh imdb.com_fresh slot_ --loop-mode=60 "--frontera-settings-json={\\"BACKEND\\": \\"hcf_backend.HCFBackend\\", \\"HCF_PROJECT_ID\\": \\"265529\\", \\"HCF_AUTH\\": \\"153822e638164f449e637000b46194fb\\", \\"HCF_PRODUCER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_PRODUCER_SLOT_PREFIX\\": \\"slot_\\", \\"HCF_PRODUCER_NUMBER_OF_SLOTS\\": 11, \\"HCF_CONSUMER_FRONTIER\\": \\"imdb.com_fresh\\", \\"HCF_CONSUMER_MAX_BATCHES\\": 150, \\"MAX_NEXT_REQUESTS\\": 15000, \\"HCF_CONSUMER_DELETE_BATCHES_ON_STOP\\": true}" --max-running-jobs=11 "--job-settings={\\"SCHEDULER\\": \\"scrapy_frontera.scheduler.FronteraScheduler\\", \\"SPIDER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerSpiderMiddleware\\": 0}, \\"DOWNLOADER_MIDDLEWARES\\": {\\"netflix.scheduler.FronteraSchedulerDownloaderMiddleware\\": 0}, \\"FRONTERA_SCHEDULER_REQUEST_CALLBACKS_TO_FRONTIER\\": [\\"crawl_title\\", \\"crawl_person\\"], \\"FRONTERA_SCHEDULER_CALLBACK_SLOT_PREFIX_MAP\\": {\\"crawl_title\\": \\"slot_crawl_title_/11\\", \\"crawl_person\\": \\"slot_crawl_person_/11\\"}, \\"FRONTERA_SCHEDULER_SKIP_START_REQUESTS\\": true, \\"URLACCUMULATOR_SCHEDULE_URLS_ENABLED\\": false, \\"SPIDERMON_SPIDER_CLOSE_MONITORS\\": [\\"netflix.monitors.WorkflowConsumersCloseMonitorSuite\\"], \\"SPIDERMON_PERIODIC_MONITORS\\": {\\"netflix.monitors.WorkflowConsumersPeriodicMonitorSuite\\": 10800}}"', 'meta': None}
2023-10-16 06:06:08.858000 ERROR [shub_workflow.utils] Retrying shub_workflow.script.BaseScript._schedule_job in 60.0 seconds as it raised ReadTimeout: HTTPSConnectionPool(host='app.scrapinghub.com', port=443): Read timed out. (read timeout=60).
2023-10-16 06:08:10.015000 ERROR [shub_workflow.script] py:hcfmanager.py already scheduled
2023-10-16 06:10:10.333000 INFO [shub_workflow.graph] Job consumers (None) still running
2023-10-16 06:12:10.659000 INFO [shub_workflow.graph] Job consumers (None) still running
2023-10-16 06:14:10.986000 INFO [shub_workflow.graph] Job consumers (None) still running
Proposal
I'm not sure what the best way to address this is. On one hand, it is a transient issue in ScrapyCloud/JobQ, so it is external to shub_workflow. On the other hand, this API has been unstable for quite a while, so it could be worth handling, at least to keep jobs from getting stuck.
I think that if `self.run_job()` returns None in GraphManager, we should stop the waiting loop, because we won't be tracking that task's associated job ID either way.
I also wonder whether it would be possible to fetch the job ID of the already scheduled job, so that GraphManager can recover from this situation without needing a manual restart.
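Both ideas could be sketched roughly as follows. This is only an illustration, not a patch: the function names (`find_already_scheduled_job`, `schedule_or_recover`) are hypothetical, and it assumes the python-scrapinghub client and that the tags seen in the logs (`FLOW_ID=...`, `TASK_ID=...`) are applied to the scheduled job:

```python
# Hypothetical sketch: guard against a None job_id and try to recover the
# key of the job that was scheduled despite the timeouts.

def find_already_scheduled_job(project, flow_id, task_id):
    """Look up a running job by its workflow tags to recover its key.

    Assumes the scrapinghub client's jobs.iter(state=..., has_tag=...)
    filtering and that FLOW_ID/TASK_ID tags were set at scheduling time.
    """
    for job in project.jobs.iter(
        state="running",
        has_tag=[f"FLOW_ID={flow_id}", f"TASK_ID={task_id}"],
    ):
        return job["key"]
    return None


def schedule_or_recover(manager, task, project, flow_id):
    jobid = manager.run_job(task)
    if jobid is None:
        # "already scheduled": try to recover the real key instead of
        # tracking None.
        jobid = find_already_scheduled_job(project, flow_id, task.task_id)
    if jobid is not None:
        manager.running_jobs.append(jobid)
    # If still None, skip tracking entirely so the waiting loop in
    # check_running_jobs() doesn't hang on a key that can never finish.
```

Even without the recovery lookup, the `if jobid is None: skip tracking` guard alone would prevent the stuck state, at the cost of the task not being waited on.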