fix: suppress CancelledError in _cleanup_producer #669

Open
lithammer wants to merge 1 commit into a2aproject:main from lithammer:handle-cancellation-during-cleanup

Conversation

@lithammer

Description

Problem

When a streaming response is interrupted by a client disconnect, the ASGI server cancels the response coroutine. The on_message_send_stream handler catches this and schedules _cleanup_producer as a background task to clean up resources. However, the producer_task itself may also have been cancelled by this point. When _cleanup_producer awaits the cancelled producer task, the CancelledError propagates out and the cleanup task fails, skipping queue_manager.close() and leaving a stale entry in _running_agents.

Here's a concrete scenario:

  1. Client sends a request; the agent executes successfully (e.g. posts a Slack message)
  2. The ASGI server tears down the connection scope, maybe because the reverse proxy enforced a response timeout, or the client hung up right as the response was being written
  3. on_message_send_stream catches the CancelledError/GeneratorExit and schedules _cleanup_producer as a background task
  4. _cleanup_producer does await producer_task, but the producer was also cancelled during teardown, so CancelledError propagates out
  5. The queue and running-agents map are never cleaned up (resource leak)
  6. If the framework surfaces the error to the HTTP layer, the client sees a 500
  7. The client retries the request, the agent runs again, and you get a duplicate Slack message
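
Steps 3 to 5 can be reproduced in miniature with plain asyncio. This is a sketch under assumptions, not the SDK's code: `producer`, `cleanup_without_suppress`, and the `state` dict are hypothetical stand-ins for the agent task, `_cleanup_producer`, and the queue and `_running_agents` bookkeeping. It shows that awaiting an already-cancelled task re-raises CancelledError in the awaiter, so the line after the await never runs.

```python
import asyncio


async def producer() -> None:
    # Stand-in for the agent's producer task; it blocks until cancelled.
    await asyncio.sleep(3600)


async def cleanup_without_suppress(producer_task: asyncio.Task, state: dict) -> None:
    # Awaiting a cancelled task raises CancelledError here, in the awaiter.
    await producer_task
    # Stand-in for queue_manager.close() and the _running_agents removal.
    state["cleaned_up"] = True


async def main() -> tuple[dict, bool]:
    state = {"cleaned_up": False}
    producer_task = asyncio.create_task(producer())
    await asyncio.sleep(0)  # give the producer a chance to start
    producer_task.cancel()  # simulates cancellation during connection teardown

    cleanup_task = asyncio.create_task(cleanup_without_suppress(producer_task, state))
    await asyncio.wait([cleanup_task])  # wait without re-raising the task's outcome
    return state, cleanup_task.cancelled()


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The cleanup task itself ends up cancelled and the stand-in cleanup flag stays unset, which corresponds to the leaked queue and stale map entry described in step 5.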

Fix

Suppress CancelledError from await producer_task so that the subsequent resource cleanup (queue_manager.close() and _running_agents removal) always runs.
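
A minimal sketch of the shape of this fix, assuming the cleanup is an ordinary coroutine. The function name, its parameters, and the two dicts are hypothetical stand-ins chosen for illustration; the real handler's attributes and signatures may differ.

```python
import asyncio
import contextlib


async def cleanup_producer(
    producer_task: asyncio.Task,
    task_id: str,
    queues: dict,
    running_agents: dict,
) -> None:
    # Wait for the producer, ignoring the case where it was cancelled.
    with contextlib.suppress(asyncio.CancelledError):
        await producer_task
    # These lines run whether the producer finished or was cancelled.
    queues.pop(task_id, None)          # stand-in for queue_manager.close()
    running_agents.pop(task_id, None)  # stand-in for the _running_agents removal


async def demo() -> tuple[dict, dict]:
    async def producer() -> None:
        await asyncio.sleep(3600)

    task = asyncio.create_task(producer())
    queues = {"task-1": object()}
    running_agents = {"task-1": task}
    await asyncio.sleep(0)  # let the producer start
    task.cancel()           # simulate cancellation during teardown
    await cleanup_producer(task, "task-1", queues, running_agents)
    return queues, running_agents


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

One trade-off worth noting: `contextlib.suppress` around the await also swallows a cancellation aimed at the cleanup coroutine itself while it is waiting, so the cleanup proceeds in that case too.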

Why this is safe

await producer_task in _cleanup_producer serves one purpose: waiting for the producer to finish before tearing down its resources. If the producer was cancelled, there are two cases:

  • The producer already finished: The task completed and was then cancelled during teardown. The CancelledError is purely an artifact of lifecycle timing. Suppressing it is a no-op in practice.
  • The producer was still running: Its work was interrupted and won't complete regardless. Suppressing the error just lets us clean up the queue and tracking state, which we need to do either way.

In both cases, the CancelledError carries no actionable information. The producer's actual result (if any) was already consumed by the ResultAggregator or EventConsumer upstream. _cleanup_producer never inspects the producer's return value; it only waits for completion so cleanup is ordered correctly. Suppressing the error preserves that ordering guarantee while making sure resources are always freed.
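
The two cases can be exercised with plain asyncio tasks. In this sketch, `wait_ignoring_cancel`, `quick`, and `slow` are hypothetical names, not part of the SDK. In plain asyncio, calling `cancel()` on a task that has already finished returns False and leaves its result intact, so in the first case the suppress block has nothing to catch.

```python
import asyncio
import contextlib


async def wait_ignoring_cancel(task: asyncio.Task) -> str:
    # Mirrors the await in the cleanup: the task's return value is never inspected.
    with contextlib.suppress(asyncio.CancelledError):
        await task
    return "cancelled" if task.cancelled() else "finished"


async def main() -> tuple[bool, str, bool, str]:
    async def quick() -> str:
        return "result"

    async def slow() -> None:
        await asyncio.sleep(3600)

    # Case 1: the producer finished before teardown tried to cancel it.
    finished = asyncio.create_task(quick())
    await finished
    cancel_accepted_1 = finished.cancel()  # a done task cannot be cancelled
    outcome_1 = await wait_ignoring_cancel(finished)

    # Case 2: the producer was still running when it was cancelled.
    running = asyncio.create_task(slow())
    await asyncio.sleep(0)  # let it start
    cancel_accepted_2 = running.cancel()
    outcome_2 = await wait_ignoring_cancel(running)

    return cancel_accepted_1, outcome_1, cancel_accepted_2, outcome_2


if __name__ == "__main__":
    print(asyncio.run(main()))
```

In both cases the wait returns normally, so any cleanup placed after it would run.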

@lithammer requested a review from a team as a code owner February 6, 2026 11:06
@gemini-code-assist

Summary of Changes

Hello @lithammer, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request addresses a critical issue where CancelledError propagating from a cancelled producer task could prevent essential resource cleanup, leading to resource leaks and potential 500 errors for clients. The fix ensures robust cleanup by suppressing this error, guaranteeing that queues and running agents are always properly closed and removed, regardless of the producer task's cancellation status.

Highlights

  • Error Handling Improvement: The _cleanup_producer coroutine now gracefully handles asyncio.CancelledError when awaiting producer_task. This prevents the cleanup process from failing if the producer task was cancelled, ensuring that resources are always properly released.
  • Resource Leak Prevention: By suppressing CancelledError, the queue_manager.close() and removal from _running_agents are guaranteed to execute, resolving a potential resource leak scenario where stale entries could persist if a client disconnects during a streaming response.
  • New Test Case: A new asynchronous test, test_cleanup_producer_swallows_cancelled_error, has been added to verify that _cleanup_producer correctly suppresses CancelledError and proceeds with resource cleanup as expected.

Changelog

  • src/a2a/server/request_handlers/default_request_handler.py
    • Imported the contextlib module.
    • Wrapped the await producer_task call within _cleanup_producer with contextlib.suppress(asyncio.CancelledError) to prevent the cleanup process from being interrupted by task cancellation.
  • tests/server/request_handlers/test_default_request_handler.py
    • Added a new asynchronous test case, test_cleanup_producer_swallows_cancelled_error, to validate that _cleanup_producer correctly suppresses CancelledError and ensures subsequent resource cleanup operations are performed.
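
For orientation, a test of this kind might look roughly like the sketch below. It is not the test added in this pull request: `FakeHandler`, its attributes, and the check function are hypothetical, and `unittest.mock.AsyncMock` stands in for the queue manager.

```python
import asyncio
import contextlib
from unittest.mock import AsyncMock


class FakeHandler:
    """Hypothetical stand-in for the request handler; not the SDK class."""

    def __init__(self) -> None:
        self.queue_manager = AsyncMock()
        self.running_agents: dict[str, asyncio.Task] = {}

    async def cleanup_producer(self, producer_task: asyncio.Task, task_id: str) -> None:
        with contextlib.suppress(asyncio.CancelledError):
            await producer_task
        await self.queue_manager.close(task_id)
        self.running_agents.pop(task_id, None)


async def check_cleanup_swallows_cancelled_error() -> FakeHandler:
    handler = FakeHandler()

    async def producer() -> None:
        await asyncio.sleep(3600)

    task = asyncio.create_task(producer())
    handler.running_agents["task-1"] = task
    await asyncio.sleep(0)  # let the producer start
    task.cancel()

    # Must return normally instead of raising CancelledError.
    await handler.cleanup_producer(task, "task-1")

    handler.queue_manager.close.assert_awaited_once_with("task-1")
    assert "task-1" not in handler.running_agents
    return handler


if __name__ == "__main__":
    asyncio.run(check_cleanup_swallows_cancelled_error())
```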

Activity

  • No specific activity (comments, reviews, or progress updates) has been recorded for this pull request yet.

@gemini-code-assist bot left a comment

Code Review

This is a great pull request that correctly identifies and fixes a resource leak caused by an unhandled CancelledError during cleanup. The use of contextlib.suppress is a clean and effective solution. The fix is well-explained in the description, and the addition of a dedicated test case ensures the new behavior is verified. The overall quality is high.

Prevent CancelledError from bubbling up as a 500 when cleanup is
cancelled after a request has already completed. This avoids clients
retrying on spurious 5xx responses, which can cause duplicate side
effects (e.g. duplicate Slack messages).
@lithammer force-pushed the handle-cancellation-during-cleanup branch from c00382b to 1a75f94 February 6, 2026 11:27