fix: run SSE notification keepalive on a dedicated thread (survives GameThread stalls)#491

Open
vladSirin wants to merge 1 commit into ChiR24:dev from vladSirin:fix/keepalive-dedicated-thread

@vladSirin

Problem

The notification-stream SSE keepalive is driven by CleanupStaleRequests, which runs on the subsystem ticker (the GameThread). When the editor GameThread stalls longer than KeepaliveIntervalSeconds — recompile, PIE enter/exit, a modal dialog, a blocking asset import — no keepalive is sent, the client's SSE stream idles out, and the MCP session is re-initialized. It shows up as repeated reconnects with no editor restart.

Fixes #488.

Change

Move the keepalive onto its own thread so it keeps firing through GameThread stalls:

  • RunKeepaliveLoop() is launched in Start() via Async(EAsyncExecution::Thread, …), and the result is stored as TFuture<void> KeepaliveLoopFuture. Each tick it waits on the existing StopEvent with a 5s timeout, then calls SweepNotificationKeepalives().
  • SweepNotificationKeepalives() is the sweep lifted out of step 4 of CleanupStaleRequests(). It keeps the same pattern (snapshot under NotificationStreamsMutex, then write outside the lock) and still calls TouchSession() on success.
  • Shutdown() joins the loop after the accept thread is killed and Stop() has signaled bStopping/StopEvent, but before StopEvent is returned to the pool and notification streams are closed.
  • CleanupStaleRequests() keeps the request-timeout / session-expiry / dead-stream reaping on the GameThread.

No new lock nesting (per-stream WriteMutex is only taken outside NotificationStreamsMutex); the manual-reset StopEvent gives prompt, busy-spin-free shutdown. KeepaliveIntervalSeconds is unchanged.
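
The loop-plus-stop-event pattern described above can be sketched in standard C++. This is only an illustration, not the PR's code: `ManualResetEvent` and `RunKeepaliveLoopSketch` are hypothetical stand-ins for the Unreal event and `RunKeepaliveLoop()`, and the tick length is a parameter rather than the real 5s value.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for a manual-reset event: once triggered, every
// current and future Wait() returns true immediately.
class ManualResetEvent {
public:
    void Trigger() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            signaled_ = true;
        }
        cv_.notify_all();
    }

    // Returns true if the event was signaled, false if the timeout elapsed.
    bool Wait(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mutex_);
        return cv_.wait_for(lock, timeout, [this] { return signaled_; });
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool signaled_ = false;
};

// Runs `sweep` once per tick until the stop event fires. The timed wait is
// also the sleep, so the loop never busy-spins and a stop request wakes it
// without waiting for the rest of the tick.
inline void RunKeepaliveLoopSketch(ManualResetEvent& stopEvent,
                                   std::chrono::milliseconds tick,
                                   const std::function<void()>& sweep) {
    while (!stopEvent.Wait(tick)) {
        sweep();
    }
}
```

Because the event stays signaled after `Trigger()`, a stop request that arrives before the loop's first wait is still observed, which is the property the manual-reset choice is relied on for here.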

Verification

Implemented and verified in a production UE 5.7.1 project: a 90s total GameThread freeze left the notification stream + session intact (no stream closed, no re-initialize), and a post-freeze call worked on the same session. Before the change, the same stall dropped the stream.
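
A rough way to reproduce the idea of this check outside the editor is to block one thread (standing in for the GameThread) and count what a background thread manages to send in the meantime. The sketch below is an assumption-laden toy: `CountKeepalivesDuringStall` is a hypothetical helper, it uses plain sleeps instead of the StopEvent, and incrementing a counter stands in for writing to a stream.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <thread>

// Counts keepalives "sent" by a background thread while the calling thread
// (standing in for the GameThread) is blocked for `stall`.
inline int CountKeepalivesDuringStall(std::chrono::milliseconds interval,
                                      std::chrono::milliseconds stall) {
    std::atomic<bool> stopping{false};
    std::atomic<int> sent{0};

    std::thread keepalive([&] {
        while (!stopping.load()) {
            std::this_thread::sleep_for(interval);
            if (!stopping.load()) {
                ++sent;  // stand-in for writing a keepalive to a stream
            }
        }
    });

    std::this_thread::sleep_for(stall);  // simulated GameThread freeze
    const int sentDuringStall = sent.load();

    stopping.store(true);
    keepalive.join();
    return sentDuringStall;
}
```

With the keepalive driven by a ticker on the stalled thread, the equivalent count would be zero for the whole stall, which is the behaviour reported before the change.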

3 files changed, +85 / −19.

…ameThread stalls)

The notification-stream SSE keepalive was driven by CleanupStaleRequests, which
runs on the subsystem ticker (the GameThread). When the editor GameThread stalls
longer than KeepaliveIntervalSeconds (recompile, PIE enter/exit, modal dialog,
blocking asset import), no keepalive is sent, the client's SSE stream idles out,
and the MCP session is re-initialized -- visible as repeated reconnects with no
editor restart.

Move the keepalive onto its own thread:
- RunKeepaliveLoop() launched in Start() via Async(EAsyncExecution::Thread),
  stored as TFuture<void> KeepaliveLoopFuture. It waits on the existing StopEvent
  (5s tick) and calls SweepNotificationKeepalives().
- SweepNotificationKeepalives() is the keepalive sweep lifted out of step 4 of
  CleanupStaleRequests() (same snapshot-under-lock then write-outside-lock; keeps
  TouchSession() on success).
- Shutdown() joins the loop after the accept thread is killed and Stop() has
  signaled bStopping/StopEvent, but before StopEvent is returned to the pool and
  before notification streams are closed.
- CleanupStaleRequests() keeps the request-timeout / session-expiry / dead-stream
  reaping on the GameThread.

No new lock nesting (per-stream WriteMutex is only taken outside
NotificationStreamsMutex); the manual-reset StopEvent gives prompt, busy-spin-free
shutdown. KeepaliveIntervalSeconds is unchanged.

Verified in a production UE 5.7.1 project: a 90s total GameThread freeze left the
notification stream + session intact (no "stream closed", no re-"initialize"), and
a post-freeze call worked on the same session. Before the change the same stall
dropped the stream.

Closes ChiR24#488

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@github-actions

Contributor

👋 Thanks for your first Pull Request! We love contributions. Please ensure you have signed off your commits and followed the contribution guidelines.

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Contributor

📝 Walkthrough

Summary by CodeRabbit

  • New Features

    • Added method to check whether log event subscribers are active.
  • Improvements

    • Moved the keepalive mechanism to a dedicated background thread, so game-thread stalls no longer delay keepalives.
    • Enhanced shutdown sequence to ensure background tasks complete cleanly before full shutdown.

Walkthrough

The SSE notification-stream keepalive is extracted from the game-thread CleanupStaleRequests() tick into a dedicated background thread. A new RunKeepaliveLoop() method (launched via Async in Start(), stored as TFuture<void>) periodically calls SweepNotificationKeepalives(), which snapshots eligible streams under the map lock, writes keepalives outside it, and marks streams for removal on failure. Shutdown() joins the future before releasing StopEvent or closing streams.
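
The snapshot-then-write sweep summarized above can be sketched in standard C++ as follows. All names here (`StreamSketch`, `TransportSketch`, `SweepKeepalives`) are hypothetical, the write callback stands in for a socket write, and the session touch the PR performs on success is only noted in a comment.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Hypothetical stream record; field names are illustrative only.
struct StreamSketch {
    std::mutex writeMutex;                          // serializes writes to this stream
    std::function<bool(const std::string&)> write;  // returns false if the write failed
    Clock::time_point lastKeepalive{};              // touched only by the sweep thread
    std::atomic<bool> markedForRemoval{false};      // may be read by a separate reaper
};

struct TransportSketch {
    std::mutex streamsMutex;  // guards the map itself, not the stream writes
    std::map<std::string, std::shared_ptr<StreamSketch>> streams;

    // Returns the number of keepalives written successfully.
    int SweepKeepalives(Clock::time_point now, Clock::duration interval) {
        // 1. Snapshot the streams that are due while holding only the map lock.
        std::vector<std::shared_ptr<StreamSketch>> due;
        {
            std::lock_guard<std::mutex> lock(streamsMutex);
            for (auto& entry : streams) {
                const auto& stream = entry.second;
                if (!stream->markedForRemoval.load() &&
                    now - stream->lastKeepalive >= interval) {
                    due.push_back(stream);
                }
            }
        }

        // 2. Write outside the map lock, so a slow socket cannot block other
        //    users of the map and the two locks are never held together.
        int written = 0;
        for (auto& stream : due) {
            std::lock_guard<std::mutex> writeLock(stream->writeMutex);
            if (stream->write(":keepalive\n\n")) {
                stream->lastKeepalive = now;  // the PR also calls TouchSession() here
                ++written;
            } else {
                stream->markedForRemoval.store(true);  // left for the reaper to close
            }
        }
        return written;
    }
};
```

Holding `shared_ptr` copies in the snapshot keeps each stream alive for the duration of the write even if another thread removes it from the map in the meantime; whether the real transport uses shared ownership is an assumption of this sketch.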

Changes

Dedicated Keepalive Thread for SSE Notification Streams

| Layer / File(s) | Summary |
| --- | --- |
| Header: contracts, member declarations, concurrency docs (...McpNativeTransport.h) | Adds Async/Future.h include; declares RunKeepaliveLoop() and SweepNotificationKeepalives() as private methods and HasLogEventSubscribers() as a public method; documents non-atomic write-once semantics for the notification stream timestamp; adds TFuture<void> KeepaliveLoopFuture member. |
| Keepalive loop and sweep implementation (...McpNativeTransportCleanup.cpp) | Removes keepalive writes from CleanupStaleRequests(). Adds RunKeepaliveLoop() (fixed-interval wait on StopEvent, calls sweep until bStopping). Adds SweepNotificationKeepalives() (snapshot under map lock, write outside lock, update LastKeepaliveTime and call TouchSession() on success, mark for removal on write failure). |
| Lifecycle wiring: Start/Shutdown (...McpNativeTransportLifecycle.cpp) | Start() launches RunKeepaliveLoop on a dedicated thread via Async after bind/listen, storing the result in KeepaliveLoopFuture. Shutdown() waits on KeepaliveLoopFuture before releasing StopEvent and closing notification streams. |
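
The shutdown ordering in the last row (signal stop, join the loop, only then release shared resources) can be illustrated with a small standard C++ sketch. `LifecycleSketch` is a hypothetical class, `std::thread` replaces `Async`/`TFuture`, and the log strings are invented purely to make the ordering observable.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

class LifecycleSketch {
public:
    ~LifecycleSketch() {
        if (worker_.joinable()) Shutdown();
    }

    void Start() {
        worker_ = std::thread([this] {
            std::unique_lock<std::mutex> lock(mutex_);
            // Wake once per tick until a stop is requested.
            while (!cv_.wait_for(lock, std::chrono::milliseconds(10),
                                 [this] { return stopping_; })) {
                ++sweeps_;  // stand-in for one keepalive sweep
            }
            log_.push_back("loop exited");
        });
    }

    void Shutdown() {
        // 1. Signal the stop so the loop wakes up promptly.
        {
            std::lock_guard<std::mutex> lock(mutex_);
            stopping_ = true;
            log_.push_back("stop signaled");
        }
        cv_.notify_all();

        // 2. Join before touching anything the loop may still be using.
        if (worker_.joinable()) worker_.join();

        // 3. Only now is it safe to release the event and close streams.
        log_.push_back("streams closed");
    }

    // Safe to read once Shutdown() has returned.
    const std::vector<std::string>& Log() const { return log_; }
    int Sweeps() const { return sweeps_; }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    bool stopping_ = false;
    int sweeps_ = 0;
    std::vector<std::string> log_;
    std::thread worker_;
};
```

If step 3 ran before step 2, the loop could still be waiting on an event or writing to a stream that had already been released, which is the hazard the join placement avoids.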

Sequence Diagram(s)

sequenceDiagram
  actor GameThread
  participant CleanupStaleRequests
  participant KeepaliveThread as RunKeepaliveLoop (background)
  participant SweepNotificationKeepalives
  participant NotificationStream

  rect rgba(200, 100, 100, 0.5)
    Note over GameThread,CleanupStaleRequests: Game thread (may stall)
    GameThread->>CleanupStaleRequests: Tick (reap expired/dead streams only)
  end

  rect rgba(100, 150, 200, 0.5)
    Note over KeepaliveThread,NotificationStream: Dedicated keepalive thread (unaffected by game-thread stalls)
    loop Every ~5s until bStopping
      KeepaliveThread->>SweepNotificationKeepalives: call
      SweepNotificationKeepalives->>NotificationStream: snapshot eligible streams (under map lock)
      SweepNotificationKeepalives->>NotificationStream: write `:keepalive` (outside map lock)
      alt success
        SweepNotificationKeepalives->>SweepNotificationKeepalives: update LastKeepaliveTime, TouchSession()
      else failure
        SweepNotificationKeepalives->>SweepNotificationKeepalives: mark stream for removal
      end
    end
  end
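
For reference, the `:keepalive` write in the diagram is an SSE comment: in the Server-Sent Events format a line that starts with a colon is ignored by the client, and a blank line ends the frame, so bytes flow on the connection without dispatching an event. A minimal sketch of building and checking such a frame, with `MakeSseComment` and `IsSseCommentFrame` as hypothetical helper names:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Builds an SSE comment frame. `text` must not contain newlines.
inline std::string MakeSseComment(const std::string& text) {
    return ":" + text + "\n\n";
}

// True if the frame ends with a blank line and every non-empty line in it
// is a comment line (starts with ':').
inline bool IsSseCommentFrame(const std::string& frame) {
    if (frame.size() < 2 || frame.compare(frame.size() - 2, 2, "\n\n") != 0) {
        return false;
    }
    std::size_t start = 0;
    while (start < frame.size()) {
        std::size_t end = frame.find('\n', start);
        if (end == std::string::npos) end = frame.size();
        if (end > start && frame[start] != ':') return false;
        start = end + 1;
    }
    return true;
}
```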

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Suggested labels

size/m

Poem

🐇 Hop hop, the game thread froze one day,
But streams kept silent — clients ran away!
Now a brave thread wakes on its own each beat,
Sending keepalives through rain, stall, and sleet.
No reconnect, no session lost in the snow —
The rabbit moved the loop so streams always flow! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |
✅ Passed checks (4 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title clearly summarizes the main change: moving SSE notification keepalive to a dedicated thread to survive GameThread stalls. |
| Description check | ✅ Passed | The description comprehensively covers the problem, implementation details, verification results, and includes a reference to issue #488. |
| Linked Issues check | ✅ Passed | The PR fully addresses issue #488 by moving keepalive to a dedicated thread, maintaining proper synchronization discipline, and supporting graceful shutdown. |
| Out of Scope Changes check | ✅ Passed | All changes are scoped to the keepalive mechanism refactoring: header additions, keepalive loop extraction, lifecycle management, and a public HasLogEventSubscribers() method. |

@coderabbitai coderabbitai Bot left a comment

Contributor

🧹 Nitpick comments (1)
plugins/McpAutomationBridge/Source/McpAutomationBridge/Private/MCP/Transport/McpNativeTransport.h (1)

79-81: ⚡ Quick win

Keep the keepalive helpers private.

These helpers are currently public, but SweepNotificationKeepalives() mutates LastKeepaliveTime, whose contract says it is accessed only by the dedicated keepalive thread. Move these declarations below private: so external callers cannot accidentally violate that single-writer assumption.

Proposed visibility fix
-	// Dedicated-thread keepalive (immune to GameThread stalls).
-	void RunKeepaliveLoop();
-	void SweepNotificationKeepalives();
-
 	// FRunnable interface
 	virtual bool Init() override { return true; }
 	virtual uint32 Run() override;
 	virtual void Stop() override;
 
 private:
+	// Dedicated-thread keepalive (immune to GameThread stalls).
+	void RunKeepaliveLoop();
+	void SweepNotificationKeepalives();
+
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@plugins/McpAutomationBridge/Source/McpAutomationBridge/Private/MCP/Transport/McpNativeTransport.h`
around lines 79 - 81, The methods RunKeepaliveLoop() and
SweepNotificationKeepalives() are currently declared in the public section of
McpNativeTransport class, but since SweepNotificationKeepalives() mutates
LastKeepaliveTime which should only be accessed by the dedicated keepalive
thread, these methods must be moved to the private section. Relocate both method
declarations from their current public location to below a private: access
specifier to enforce this single-writer contract and prevent external callers
from accidentally violating it.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 72157f34-7674-4ede-b949-5b1945b60e4b

📥 Commits

Reviewing files that changed from the base of the PR and between 89aedb0 and c766d3d.

📒 Files selected for processing (3)
  • plugins/McpAutomationBridge/Source/McpAutomationBridge/Private/MCP/Transport/McpNativeTransport.h
  • plugins/McpAutomationBridge/Source/McpAutomationBridge/Private/MCP/Transport/McpNativeTransportCleanup.cpp
  • plugins/McpAutomationBridge/Source/McpAutomationBridge/Private/MCP/Transport/McpNativeTransportLifecycle.cpp

Development

Successfully merging this pull request may close these issues.

Native MCP SSE keepalive is game-thread-bound — notification stream drops during editor stalls
