fix: suppress CancelledError in _cleanup_producer #669
lithammer wants to merge 1 commit into a2aproject:main from
Code Review
This is a great pull request that correctly identifies and fixes a resource leak caused by an unhandled CancelledError during cleanup. The use of contextlib.suppress is a clean and effective solution. The fix is well-explained in the description, and the addition of a dedicated test case ensures the new behavior is verified. The overall quality is high.
Prevent CancelledError from bubbling up as a 500 when cleanup is cancelled after a request has already completed. This avoids clients retrying on spurious 5xx responses, which can cause duplicate side effects (e.g. duplicate Slack messages).
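The cleanup failure behind this can be reproduced with plain `asyncio`. This is a standalone sketch, not the SDK's code: `FakeQueueManager`, `cleanup_producer_unpatched`, and the plain `running_agents` dict are hypothetical stand-ins modelled on the names used in this PR (`_cleanup_producer`, `queue_manager.close()`, `_running_agents`). It only models the background cleanup task, not the HTTP layer that turns the error into a 500.

```python
import asyncio


class FakeQueueManager:
    """Hypothetical stand-in for the SDK's queue manager; only records close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def cleanup_producer_unpatched(
    producer_task: asyncio.Task,
    queue_manager: FakeQueueManager,
    running_agents: dict,
    task_id: str,
) -> None:
    # Pre-fix shape as described in the PR: wait for the producer, then tear down.
    await producer_task  # re-raises CancelledError if the producer was cancelled
    await queue_manager.close()  # skipped when the line above raises
    running_agents.pop(task_id, None)  # also skipped, leaving a stale entry


async def demo() -> tuple[bool, bool, bool]:
    async def producer() -> None:
        await asyncio.sleep(3600)  # stands in for a long-running stream producer

    producer_task = asyncio.create_task(producer())
    queue_manager = FakeQueueManager()
    running_agents = {"task-1": producer_task}

    await asyncio.sleep(0)  # let the producer start running
    producer_task.cancel()  # simulate the producer being cancelled during teardown

    # Cleanup is scheduled as a separate background task.
    cleanup = asyncio.create_task(
        cleanup_producer_unpatched(producer_task, queue_manager, running_agents, "task-1")
    )
    raised = False
    try:
        await cleanup
    except asyncio.CancelledError:
        raised = True  # the cleanup task itself ended up cancelled
    return raised, queue_manager.closed, "task-1" in running_agents


if __name__ == "__main__":
    raised, closed, stale = asyncio.run(demo())
    print(f"raised={raised} closed={closed} stale_entry={stale}")
```

Running it prints `raised=True closed=False stale_entry=True`: the cleanup task dies on the producer's `CancelledError`, so neither teardown step runs.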
c00382b to 1a75f94
Description
Problem
When a streaming response is interrupted by a client disconnect, the ASGI server cancels the response coroutine. The `on_message_send_stream` handler catches this and schedules `_cleanup_producer` as a background task to clean up resources. However, the `producer_task` itself may also have been cancelled by this point. When `_cleanup_producer` awaits the cancelled producer task, the `CancelledError` propagates out and the cleanup task fails, skipping `queue_manager.close()` and leaving a stale entry in `_running_agents`.

Here's a concrete scenario:

- `on_message_send_stream` catches the `CancelledError`/`GeneratorExit` and schedules `_cleanup_producer` as a background task.
- `_cleanup_producer` does `await producer_task`, but the producer was also cancelled during teardown, so `CancelledError` propagates out.
- The error bubbles up as a 500.

Fix
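A minimal sketch of the patched cleanup, under the same assumptions as a standalone reproduction: `FakeQueueManager`, `cleanup_producer_patched`, and the `running_agents` dict are hypothetical stand-ins, not the SDK's actual signatures. The only change from the pre-fix shape is wrapping `await producer_task` in `contextlib.suppress(asyncio.CancelledError)`.

```python
import asyncio
import contextlib


class FakeQueueManager:
    """Hypothetical stand-in for the SDK's queue manager; only records close()."""

    def __init__(self) -> None:
        self.closed = False

    async def close(self) -> None:
        self.closed = True


async def cleanup_producer_patched(
    producer_task: asyncio.Task,
    queue_manager: FakeQueueManager,
    running_agents: dict,
    task_id: str,
) -> None:
    # Still wait for the producer so teardown stays ordered after it,
    # but treat a cancelled producer as "finished" instead of failing.
    with contextlib.suppress(asyncio.CancelledError):
        await producer_task
    await queue_manager.close()  # now always reached
    running_agents.pop(task_id, None)  # stale entry is removed


async def demo() -> tuple[bool, bool]:
    async def producer() -> None:
        await asyncio.sleep(3600)  # stands in for a long-running stream producer

    producer_task = asyncio.create_task(producer())
    queue_manager = FakeQueueManager()
    running_agents = {"task-1": producer_task}

    await asyncio.sleep(0)  # let the producer start running
    producer_task.cancel()  # simulate the producer being cancelled during teardown

    # The cleanup task now completes normally instead of ending up cancelled.
    await asyncio.create_task(
        cleanup_producer_patched(producer_task, queue_manager, running_agents, "task-1")
    )
    return queue_manager.closed, "task-1" in running_agents


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

`contextlib.suppress` here is equivalent to a `try`/`except asyncio.CancelledError: pass` around the `await`. One side effect worth knowing: it would also absorb a cancellation aimed at the cleanup task itself while it is waiting, so the teardown lines would still run; for a fire-and-forget cleanup task that is arguably the intent.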
Suppress `CancelledError` from `await producer_task` so that the subsequent resource cleanup (`queue_manager.close()` and `_running_agents` removal) always runs.

Why this is safe
await producer_taskin_cleanup_producerserves one purpose: waiting for the producer to finish before tearing down its resources. If the producer was cancelled, there are two cases:CancelledErroris purely an artifact of lifecycle timing. Suppressing it is a no-op in practice.In both cases, the
CancelledErrorcarries no actionable information. The producer's actual result (if any) was already consumed by theResultAggregatororEventConsumerupstream._cleanup_producernever inspects the producer's return value; it only waits for completion so cleanup is ordered correctly. Suppressing the error preserves that ordering guarantee while making sure resources are always freed.
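The ordering claim can be checked with a small sketch in which a hypothetical producer still does some teardown work in a `finally` block after being cancelled. The suppressed `await` only returns once that teardown has finished, so the (simulated) queue close still happens strictly after the producer is done. The event strings and function names are illustrative only.

```python
import asyncio
import contextlib


async def demo_ordering() -> list[str]:
    events: list[str] = []

    async def producer() -> None:
        try:
            await asyncio.sleep(3600)  # stands in for a long-running stream producer
        finally:
            # Simulated producer teardown that takes a moment after cancellation.
            await asyncio.sleep(0.01)
            events.append("producer finished teardown")

    async def cleanup(producer_task: asyncio.Task) -> None:
        # Same pattern as the fix: wait for completion, ignore the cancellation.
        with contextlib.suppress(asyncio.CancelledError):
            await producer_task
        events.append("queue closed")  # stands in for queue_manager.close()

    producer_task = asyncio.create_task(producer())
    await asyncio.sleep(0)  # let the producer start running
    producer_task.cancel()
    await asyncio.create_task(cleanup(producer_task))
    return events


if __name__ == "__main__":
    print(asyncio.run(demo_ordering()))
```

The cleanup never reads the producer's result; it only relies on `await` returning after the producer task is done, whether it finished normally or was cancelled.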