Conversation

@Edison-A-N
Contributor

Fix Race Condition in StreamableHTTP Transport (Closes #1363)

Motivation and Context

Starting from v1.12.0, MCP servers in HTTP Streamable mode experience a race condition that causes ClosedResourceError exceptions when requests fail validation early (e.g., due to incorrect Accept headers). This issue affects server reliability and can be reproduced consistently with fast-failing requests.

The race condition occurs because:

  1. Message router enters async for write_stream_reader loop
  2. write_stream_reader calls checkpoint() in receive(), yielding control
  3. Request validation fails early and returns immediately
  4. Transport termination closes all streams including write_stream_reader
  5. Message router resumes and encounters closed stream, raising ClosedResourceError

This fix ensures graceful handling of stream closure scenarios without propagating exceptions that could destabilize the server.

How Has This Been Tested?

Test Suite

Added comprehensive test suite in tests/issues/test_1363_race_condition_streamable_http.py that reproduces the race condition:

  1. Invalid Accept Headers Test:

    • Missing application/json in Accept header
    • Missing text/event-stream in Accept header
    • Completely invalid Accept header
  2. Invalid Content-Type Test:

    • Incorrect Content-Type header
  3. Log Analysis:

    • Captures server logs from separate process
    • Verifies no ClosedResourceError exceptions occur
    • Checks for "Error in message router" messages
    • Validates graceful error handling

Test Execution

  • Tests run in isolated processes to capture real server behavior
  • Server runs in stateless mode to trigger race condition
  • Multiple request scenarios tested to ensure comprehensive coverage
  • Log analysis confirms fix prevents exception propagation

Breaking Changes

None. This is a bug fix that maintains full backward compatibility.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

Implementation Details

The fix adds explicit exception handling for anyio.ClosedResourceError in the message router loop:

except anyio.ClosedResourceError:
    if self._terminated:
        logging.debug("Read stream closed by client")
    else:
        logging.exception("Unexpected closure of read stream in message router")

This approach:

  • Graceful Handling: Prevents exception propagation that could crash the server
  • Smart Logging: Distinguishes between expected termination and unexpected closure
  • Minimal Impact: No performance overhead or behavioral changes
  • Robust: Handles the race condition without complex synchronization

Related Issues

@Edison-A-N Edison-A-N requested review from a team and felixweinberger September 21, 2025 06:12
@maxisbey maxisbey added the bug (Something isn't working) label Sep 22, 2025
@thomasst

This seems to silence the error. Is this the correct approach given that for me (and others in #1219 / #1190) the error happens on every request, so it doesn't appear to just be a race condition?

@Edison-A-N
Contributor Author

Hi,

In anyio's Implementation

1. Conditions for Iteration Termination

Class inheritance:

MemoryObjectReceiveStream -> ObjectReceiveStream -> UnreliableObjectReceiveStream

As we can see in the implementation of UnreliableObjectReceiveStream.__anext__:

async def __anext__(self) -> T_co:
    try:
        return await self.receive()
    except EndOfStream:
        raise StopAsyncIteration from None

That is, the EndOfStream exception will terminate the iteration.
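This can be seen in isolation with a small sketch independent of the SDK: once the sending side closes, the pending EndOfStream is converted into StopAsyncIteration and the loop simply ends:

```python
import anyio

async def drain() -> list[str]:
    send, recv = anyio.create_memory_object_stream(1)
    await send.send("hello")
    await send.aclose()  # close the sending side: the reader will see EndOfStream
    # __anext__ converts EndOfStream into StopAsyncIteration, so the loop just ends
    return [item async for item in recv]

items = anyio.run(drain)
print(items)  # ['hello']
```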

2. When to Raise EndOfStream or ClosedResourceError

MemoryObjectReceiveStream.receive -> receive_nowait:

def receive_nowait(self) -> T_co:
    """
    Receive the next item if it can be done without waiting.

    :return: the received item
    :raises ~anyio.ClosedResourceError: if this send stream has been closed
    :raises ~anyio.EndOfStream: if the buffer is empty and this stream has been
        closed from the sending end
    :raises ~anyio.WouldBlock: if there are no items in the buffer and no tasks
        waiting to send
    """

All ClosedResourceError exceptions are based on this check:

if self._closed:
    raise ClosedResourceError

And self._closed becomes True when close() is called:

class MemoryObjectReceiveStream:
    ...
    ...
    def close(self) -> None:
        """
        Close the stream.

        This works the exact same way as :meth:`aclose`, but is provided as a special
        case for the benefit of synchronous callbacks.

        """
        if not self._closed:
            self._closed = True
            self._state.open_receive_channels -= 1
            if self._state.open_receive_channels == 0:
                send_events = list(self._state.waiting_senders.keys())
                for event in send_events:
                    event.set()

Review of Known Issues

In issue #1219, the debug information clearly shows _closed = True (visible in the second screenshot's debug view).

The traceback in issue #1190 also points to the same root cause: the error occurs when self._closed is True.

In fact, looking at the anyio implementation above, it's very clear that ClosedResourceError is raised because the stream has been closed.

Why This Implementation is Appropriate

This implementation is not "silencing the error". In fact, in scenarios where multiple coroutines operate on the same stream simultaneously, checking whether the stream has been closed is a necessary operation. Since anyio.MemoryObjectReceiveStream chooses to raise ClosedResourceError rather than terminating iteration automatically, we need to handle that closure explicitly during the for loop iteration.

When handling the exception, we also check self._terminated, so callers can tell whether the ClosedResourceError resulted from an intentional shutdown. If it did not, the handler still emits logger.exception to the logs.

@ofek

ofek commented Oct 21, 2025

I describe an easy way to reproduce this issue here #1190 (comment)

Your test case is quite similar to the underlying implementation of the example.

@ofek ofek left a comment

The test file needs a newline at the end.

@Edison-A-N
Contributor Author

Hi, all! I recently revisited how this exception occurs, and here are some additional insights I'd like to share. Any feedback and guidance is welcome!

Core Problem

A synchronous code path runs to completion and closes the stream before the suspended checkpoint() can resume, which leads to ClosedResourceError.

Test Case Analysis

Based on three test scenarios in test_1363_race_condition_streamable_http.py:

  1. Invalid Accept Headers - Missing text/event-stream or application/json
  2. Invalid Content-Type - Not application/json
  3. JSON Response Mode - Specific code path when json_response=True

Execution Flow Analysis

1. System Startup Phase

# streamable_http_manager.py:170-187
async def run_stateless_server():
    async with http_transport.connect() as streams:
        # Start message router task
        tg.start_soon(message_router)
        # Start MCP server
        await self.app.run(streams)

2. Message Router Suspension

# streamable_http.py:831
async def message_router():
    async for session_message in write_stream_reader:  # Key point
        # Process message routing
        # After processing, return to loop start
        # Call checkpoint() again to suspend and wait for next message

Key Mechanism: The async for loop internally calls checkpoint() to yield control and wait for new messages.

3. Request Processing Phase

3.1 Invalid Request Headers Scenario (Fast Failure)

# streamable_http.py:315-323
async def _handle_post_request():
    # Synchronous Accept header validation
    has_json, has_sse = self._check_accept_headers(request)  # Synchronous
    if not (has_json and has_sse):
        response = self._create_error_response(...)  # Synchronous
        await response(scope, receive, send)  # Only yield point
        return  # Immediate return

3.2 JSON Response Mode Scenario (After Processing Response)

# streamable_http.py:397-439
if self.is_json_response_enabled:

    # Wait for response
    async for event_message in request_stream_reader:  # Line 408
        if isinstance(event_message.message.root, JSONRPCResponse | JSONRPCError):
            response_message = event_message.message
            break

    # Send response
    response = self._create_json_response(response_message)
    await response(scope, receive, send)

    # Clean up resources
    await self._clean_up_memory_streams(request_id)  # Synchronous

4. Transport Termination Phase

# streamable_http_manager.py:189-193
# Immediately terminate after processing request
await http_transport.handle_request(scope, receive, send)
await http_transport.terminate()  # Immediately close streams

# streamable_http.py:623-653
async def terminate(self):
    self._terminated = True
    # Close all streams
    if self._write_stream_reader is not None:
        await self._write_stream_reader.aclose()  # Close stream used by message router

Precise Timing of Race Condition

Key Timeline

Timeline:
T1: Message router starts, enters async for loop
T2: write_stream_reader.receive() calls checkpoint() and suspends
T3: Main coroutine processes request, validation fails or completes
T4: Main coroutine sends response (await response) - No cooperation point
T5: Main coroutine executes synchronous cleanup code - No cooperation point
T6: Main coroutine calls terminate() - Synchronous code
T7: terminate() closes write_stream_reader - Synchronous code
T8: Message router resumes, tries to continue iteration
T9: Discovers stream is closed, throws ClosedResourceError

Core Problem

1. When validation fails early (T3), the main coroutine executes synchronous error handling and immediately terminates the transport, causing the same race condition as described in the test cases.

2. After T4, the main coroutine executes all synchronous code with no opportunity to yield control, preventing the message router from completing its current iteration before the stream is closed.

Root Cause

The message router suspends via checkpoint() in the async for loop, while the main coroutine executes synchronous code and quickly closes the stream without giving the message router a chance to complete its current iteration.

Solution

Direct ClosedResourceError Handling

Referencing the handling practice for request_stream, the appropriate solution is to directly catch ClosedResourceError:

# streamable_http.py:862-871 (existing code)
if request_stream_id in self._request_streams:
    try:
        # Send both the message and the event ID
        await self._request_streams[request_stream_id][0].send(EventMessage(message, event_id))
    except (
        anyio.BrokenResourceError,
        anyio.ClosedResourceError,  # Already catching this error
    ):
        # Stream might be closed, remove from registry
        self._request_streams.pop(request_stream_id, None)

Solution: Directly catch ClosedResourceError in the message router's async for loop:

# streamable_http.py:829-887
async def message_router():
    try:
        async for session_message in write_stream_reader:
            # Process message routing logic
            # ...
    except anyio.ClosedResourceError:
        # Stream closed, graceful exit
        if self._terminated:
            logger.debug("Read stream closed by client")
        else:
            logger.debug("Read stream closed unexpectedly")
    except Exception:
        logger.exception("Error in message router")

This way, when the main coroutine closes the stream, the message router will gracefully catch ClosedResourceError and exit, rather than letting the exception propagate.

I would like to emphasize once again that directly catching the error is appropriate and correct. Of course, I also welcome more guidance on this solution, and I'm very happy to learn from everyone. 😊😊

chetan-jarande added a commit to chetan-jarande/openai-batch-tracker that referenced this pull request Oct 30, 2025
Notes:
- Disable returning JSON responses when running in stateless mode. This change prevents the
`anyio.ClosedResourceError` that can occur when the response stream is closed unexpectedly.
- This resolves an observed anyio.ClosedResourceError.
More discussion and context are available in the linked threads.
- An alternative is to catch this error inside message_router, handle it gracefully, and log it
instead of letting it propagate to the top-level; that approach is left for a follow-up.

Github Issue Threads:
- jlowin/fastmcp#2083
- modelcontextprotocol/python-sdk#1384 (comment)
@ofek

ofek commented Dec 2, 2025

Is this blocked on reviews?

Member

@Kludex Kludex left a comment

I think we need to stop using uvicorn in every test in this repository. It kills me (and the test suite...).

We shouldn't rely on network requests in the test suite.


That said, the change seems fine.

@maxisbey Can you please decide what you want to do with the test here?

@maxisbey
Contributor

maxisbey commented Dec 3, 2025

I think we should merge for now since it's a high priority issue and fixes a bug affecting people. Fixing the uvicorn usage in the test suite is something that's been mentioned a few times and we should address in follow-ups.

@maxisbey maxisbey enabled auto-merge (squash) December 3, 2025 19:12
@ofek

ofek commented Dec 3, 2025

It looks like the merge commit that was just introduced impacted coverage and now CI is failing because a handful of lines are no longer covered.

auto-merge was automatically disabled December 4, 2025 09:21

Head branch was pushed to by a user without write access

@Kludex Kludex merged commit 9ed0b93 into modelcontextprotocol:main Dec 4, 2025
18 checks passed
@Edison-A-N Edison-A-N deleted the fix/race-condition-streamable-http-1363 branch December 4, 2025 11:16
Labels

  • bug: Something isn't working
  • needs maintainer action: Potentially serious issue - needs proactive fix and maintainer attention

Development

Successfully merging this pull request may close these issues:

Race Condition in StreamableHTTP Transport Causes ClosedResourceError