Skip to content

Conversation

@NathanFlurry
Copy link
Member

No description provided.

@vercel
Copy link

vercel bot commented Nov 4, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Ready Ready Preview Comment Nov 4, 2025 7:25pm
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 4, 2025 7:25pm
rivet-inspector Ignored Ignored Preview Nov 4, 2025 7:25pm
rivet-site Ignored Ignored Preview Nov 4, 2025 7:25pm

This was referenced Nov 4, 2025
Copy link
Member Author

NathanFlurry commented Nov 4, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.
Learn more


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Nov 4, 2025

Code Review: Fix WebSocket Connection Cleanup

Summary

This PR fixes a race condition where connections were not being cleaned up properly when actor.createConn threw an error. The fix tracks the created connection in a closure variable and uses .finally() instead of .then() to ensure cleanup always happens.

Positive Changes ✅

  1. Correct Race Condition Fix: The previous implementation would fail to clean up connections if handlersPromise rejected (i.e., if actor.createConn threw an error). By tracking createdConn and using .finally(), cleanup now happens regardless of success or failure.

  2. Simplified Error Handling: Removed the redundant .catch() block in the onClose handler that was logging errors from handlersPromise. Since we now only call __connDisconnected if a connection was actually created, this error handling was unnecessary.

  3. Clearer Intent: The variable name createdConn makes it explicit that we're tracking whether a connection was successfully created, improving code readability.

Potential Issues & Recommendations ⚠️

1. Incomplete Cleanup on Error Path (Medium Priority)

Issue: When actor.createConn throws an error (lines 209-220), the error is caught and the WebSocket is closed with ws.close(1011, code). However, there's no explicit cleanup of any partial state that createConn might have created before throwing.

Location: rivetkit-typescript/packages/rivetkit/src/actor/router-endpoints.ts:209-221

Recommendation: Consider if actor.createConn might leave any partial state (e.g., database records, in-memory references) when it throws. If so, ensure those are cleaned up. You may want to verify that createConn is atomic or has its own cleanup logic.

2. Potential Memory Leak with Closure Variable (Low Priority)

Issue: The createdConn variable is captured in the closure scope of all event handlers. If the WebSocket connection lives for a long time, this could potentially hold references longer than necessary.

Current code:

let createdConn: AnyConn | undefined;

return {
  onOpen: ...,   // captures createdConn
  onMessage: ..., // captures createdConn
  onClose: ...,   // captures createdConn
  onError: ...    // captures createdConn
};

Recommendation: This is likely not a practical issue since WebSocket connections have a defined lifecycle and the handlers are designed to be short-lived. However, if connection lifetimes can be very long, consider explicitly nulling out createdConn after cleanup to allow GC.

3. Consistency with Error Logging (Minor)

Issue: The previous implementation logged errors that occurred during the promise resolution in onClose. While this error path was problematic (since it would catch createConn failures), completely removing error visibility might make debugging harder.

Previous code (removed):

.catch((error) => {
  deconstructError(error, actor.rLog, { wsEvent: "close" }, exposeInternalError);
});

Recommendation: Verify that errors from actor.createConn are already being properly logged in the onOpen handler (lines 212-220). If not, consider adding a debug log to track when onClose runs without a created connection.

4. Missing Test Coverage (Medium Priority)

Question: Are there tests that verify the following scenarios?

  • WebSocket closes after createConn throws an error
  • WebSocket closes during createConn execution (before it completes)
  • __connDisconnected is NOT called when no connection was created
  • __connDisconnected IS called when connection was created successfully

Recommendation: Add integration tests for these edge cases to prevent regression.

Security Considerations 🔒

No security concerns identified. The change doesn't introduce new attack vectors.

Performance Considerations ⚡

  • Positive: Using .finally() instead of .then() eliminates one promise chain, slightly reducing overhead.
  • Negligible Impact: Adding one closure variable has minimal memory impact.

Code Quality 📝

Follows repository conventions:

  • ✅ Structured logging patterns maintained
  • ✅ Error handling follows existing patterns
  • ✅ TypeScript types properly used

Suggestions:

  1. Consider adding a comment explaining why we track createdConn separately from the promise resolution (i.e., to handle the error case).

    // Track connection outside of scope for cleanup
    // We need this because onClose may fire after createConn throws,
    // and we should only cleanup connections that were actually created
    let createdConn: AnyConn | undefined;

Testing Recommendations 🧪

Add test cases covering:

  1. Error before connection creation: Verify no cleanup called when createConn throws before creating a connection
  2. Error during connection creation: Verify cleanup IS called if createConn creates a connection but then throws
  3. Normal close flow: Verify existing behavior unchanged
  4. Abnormal close flow: Verify wasClean is correctly passed through

Summary

This is a solid fix for a real race condition bug. The core logic is correct, but I recommend:

  1. Merge: The fix addresses the stated issue correctly
  2. 🔍 Verify: Confirm createConn doesn't leave partial state on error
  3. 📝 Enhance: Add a clarifying comment about the purpose of createdConn
  4. 🧪 Test: Add test coverage for the error path this fixes

Overall Assessment: Approved with minor suggestions for improvement.

@claude
Copy link

claude bot commented Nov 4, 2025

Code Review

Summary

This PR fixes a resource leak where WebSocket connections weren't being properly cleaned up when actor.createConn throws an error. The fix introduces a variable to track the created connection outside the promise scope so it can be cleaned up regardless of whether the promise resolves or rejects.

Positive Changes ✅

  1. Correct Fix for Resource Leak: The change properly addresses the memory leak by storing the connection reference (createdConn) outside the async scope, ensuring cleanup happens even when actor.createConn errors.

  2. Simplified Error Handling: The new approach using .finally() is cleaner and more idiomatic than the previous .then().catch() chain. It correctly handles both success and error cases.

  3. Consistent Pattern: The fix aligns well with the SSE handler pattern in the same file (lines 352-443), which also uses a similar variable scoping approach for cleanup.

Potential Issues & Suggestions 🔍

1. Race Condition Edge Case (Minor)

There's a theoretical race condition between lines 205 and 208:

// Store connection so we can clean on close
createdConn = conn;

// Unblock other handlers
handlersResolve({ conn, actor, connId: conn.id });

If onClose fires extremely quickly after handlersResolve, it's possible (though unlikely) that createdConn might not be set yet. Consider moving the assignment before line 208:

// Store connection so we can clean on close
createdConn = conn;

// Unblock other handlers  
handlersResolve({ conn, actor, connId: conn.id });

Actually, looking at this more carefully, the assignment is already before the resolve, so this is fine. The .finally() handler won't execute until after the promise settles anyway.

2. Inconsistent Cleanup Logic

In the onClose handler (lines 308-313), you're using handlersPromise.finally() which waits for the promise to settle. However, if actor.createConn errors and rejects the promise, the connection was never successfully created, so createdConn will be undefined and cleanup is correctly skipped.

But: The previous code would have called deconstructError on the handlersPromise.catch() path (lines 311-317 in the old code). Now errors from actor.createConn will only be logged in the onOpen handler's catch block (lines 212-220). This is probably fine since the error is already being handled there, but it's a subtle behavior change worth noting.

3. Consider Adding a Comment

The fix is subtle and non-obvious. Consider adding a comment explaining why createdConn must be outside the async scope:

// Track connection outside of scope for cleanup
// This ensures we can clean up the connection in onClose even if
// actor.createConn fails and the promise is rejected
let createdConn: AnyConn | undefined;

Code Quality Notes 📝

  1. Type Safety: Good use of | undefined to make the uninitialized state explicit
  2. Error Handling: The error handling in the onOpen async IIFE (lines 209-221) properly catches and closes the WebSocket with the error code
  3. Logging: Good logging throughout for debugging

Test Coverage 🧪

Issue: I couldn't find any tests that cover the createConn error path. Consider adding a test that:

  • Mocks actor.createConn to throw an error
  • Verifies the WebSocket is closed with the correct code
  • Crucially: Verifies that __connDisconnected is NOT called when the connection was never created

This would prevent regressions and document the expected behavior.

Security & Performance ✓

  • Security: No concerns - the fix doesn't introduce any security issues
  • Performance: Negligible impact - adds one variable and simplifies the promise chain

Verdict

Approved with suggestions

This is a solid fix for a real resource leak issue. The code change is minimal and correct. My suggestions are mostly about edge cases and testing, not blocking issues.

Recommendation:

  • Consider adding the explanatory comment suggested above
  • Add test coverage for the error path to prevent regressions

@claude
Copy link

claude bot commented Nov 4, 2025

Code Review

Summary

This PR fixes a resource leak where WebSocket connections were not being properly cleaned up when actor.createConn throws an error. The fix introduces a connection tracking variable outside the closure scope to ensure cleanup occurs even in error scenarios.

Positive Changes

Critical Bug Fix: The PR correctly addresses a real resource leak. Previously, if actor.createConn threw an error, the onClose handler would call .finally() on handlersPromise but the promise would have been rejected (via handlersReject(error) in line 210), meaning the original .then() logic would never execute and actor.__connDisconnected would never be called.

Cleaner Error Handling: The new approach using .finally() is more appropriate than the previous .then().catch() pattern since cleanup should happen regardless of success or failure.

Minimal Changes: The fix is surgical and doesn't introduce unnecessary complexity.

Issues & Concerns

1. Potential Race Condition ⚠️

There's a subtle race condition in the current implementation:

// Line 204-205
// Store connection so we can clean on close
createdConn = conn;

// Line 308-313
handlersPromise.finally(() => {
    if (createdConn) {
        const wasClean = event.wasClean || event.code === 1000;
        actor.__connDisconnected(createdConn, wasClean, requestId);
    }
});

If the WebSocket closes before actor.createConn completes but after line 187 starts executing, the onClose handler will fire with createdConn still undefined, meaning cleanup won't happen. The .finally() callback will execute, find createdConn === undefined, and skip the disconnection logic.

Suggested Fix: Store the connection in a different state that tracks the attempt:

let connAttempt: { started: boolean; conn?: AnyConn } = { started: false };

// In onOpen async block
connAttempt.started = true;
conn = await actor.createConn(...);
connAttempt.conn = conn;

// In onClose
handlersPromise.finally(() => {
    if (connAttempt.started && connAttempt.conn) {
        actor.__connDisconnected(connAttempt.conn, wasClean, requestId);
    }
});

However, after reviewing __connDisconnected (instance.ts:881), I see it checks if the connection exists in the connections map. So if createConn hasn't added it yet, calling __connDisconnected would be a no-op anyway. This may be acceptable depending on whether you want explicit tracking.

2. Type Safety ℹ️

createdConn is typed as AnyConn | undefined but there's no explicit null check before dereferencing in the finally block. While JavaScript handles this correctly with the if (createdConn) check, consider whether explicit null checking would improve clarity:

if (createdConn \!== undefined) {
    actor.__connDisconnected(createdConn, wasClean, requestId);
}

This is a minor stylistic point.

3. Error Swallowing in finally() ⚠️

The .finally() block has no error handling. If actor.__connDisconnected throws an exception, it will be swallowed and potentially cause issues. Consider wrapping in try-catch:

handlersPromise.finally(() => {
    if (createdConn) {
        try {
            const wasClean = event.wasClean || event.code === 1000;
            actor.__connDisconnected(createdConn, wasClean, requestId);
        } catch (error) {
            actor.rLog.error({ msg: "error during connection cleanup", error });
        }
    }
});

Testing Recommendations

Since there are no test files for router-endpoints.ts, I recommend adding tests for this scenario:

  1. Test: Connection cleanup on createConn error

    • Mock actor.createConn to throw an error
    • Trigger WebSocket onOpen followed by onClose
    • Verify actor.__connDisconnected is NOT called (since no connection was created)
  2. Test: Connection cleanup on successful creation

    • Mock successful actor.createConn
    • Trigger WebSocket onOpen, wait for completion, then onClose
    • Verify actor.__connDisconnected IS called with correct parameters
  3. Test: Connection cleanup on early close

    • Mock actor.createConn with a delay
    • Trigger onOpen immediately followed by onClose before createConn completes
    • Verify no errors are thrown

Performance Considerations

No performance concerns. The addition of a single variable in the closure is negligible.

Security Considerations

No security issues identified. The change doesn't affect authentication, authorization, or data exposure.

Style & Convention Notes

✅ Code follows existing patterns in the file
✅ Variable naming (createdConn) is clear and follows conventions
✅ Comments are helpful ("Track connection outside of scope for cleanup")

Verdict

Approve with recommendations

The core fix is correct and addresses a real resource leak. The main concern is the race condition scenario and lack of error handling in the .finally() block, but these are edge cases that may be acceptable given the existing safeguards in __connDisconnected.

Recommendation: Merge as-is if the race condition scenario is deemed acceptable (which it likely is), or add the error handling wrapper for robustness. Consider adding tests in a follow-up PR to prevent regressions.

@NathanFlurry NathanFlurry force-pushed the 11-03-fix_rivetkit_fix_conns_not_being_cleaned_up_if_actor.createconn_errors branch from cde5aa5 to ac85131 Compare November 4, 2025 19:24
@claude
Copy link

claude bot commented Nov 4, 2025

Code Review

Summary

This PR fixes a resource leak where connections weren't being cleaned up if actor.createConn threw an error during the WebSocket onOpen handler. The fix tracks the created connection outside the handler scope and ensures cleanup happens regardless of whether createConn succeeded or failed.

Positive Feedback

Core Fix is Correct: The approach of tracking createdConn outside the async IIFE scope and using .finally() is the right pattern for ensuring cleanup happens in all cases.

Simplified Error Handling: Removing the .then().catch() chain in favor of .finally() makes the code cleaner and more maintainable.

Handles Edge Cases: The if (createdConn) check properly handles the case where createConn never completed.

Issues & Concerns

1. Potential Race Condition (Minor)

The assignment createdConn = conn happens after actor.createConn completes, but the onClose handler could theoretically fire immediately if the WebSocket closes during connection creation. In practice, this is unlikely since:

  • The WebSocket is passed to createConn, so it should be open at that point
  • The handlers run asynchronously

However, for complete safety, consider whether createdConn should be set before awaiting other operations within createConn.

Severity: Low (edge case, unlikely in practice)

2. Inconsistency with SSE Handler (Observation)

The SSE handler (handleSseConnect) has similar connection lifecycle management but uses a different pattern:

  • It uses a conn variable in the outer scope (line 353)
  • Has explicit cleanup in both the abort handler and catch block (lines 397, 436)

While the WebSocket pattern is now cleaner, the inconsistency between handlers might make the codebase harder to maintain. Consider whether both should use the same pattern.

Recommendation: Document why different patterns are used, or unify them in a future refactor.

3. Test Coverage (Important)

There don't appear to be unit tests for handleWebSocketConnect, particularly for the error path that this PR fixes.

Recommendation: Add tests to verify:

  • Connection cleanup happens when actor.createConn throws
  • No double-cleanup occurs
  • The connection isn't added to the actor's connection map if createConn fails
  • Memory leaks don't occur under error conditions

4. Documentation (Minor)

The inline comment at line 167 (// Track connection outside of scope for cleanup) is helpful, but could be expanded to explain why this is necessary (i.e., to ensure cleanup happens even if createConn fails).

Questions

  1. Are there integration tests that cover this scenario? If not, how was this bug discovered and verified fixed?

  2. What happens to the WebSocket itself if createConn fails? The code calls ws.close(1011, code) in the catch block (line 220), which is good. But should we verify that calling actor.__connDisconnected on a connection that was never fully created doesn't cause issues?

  3. Is there a way this could cause double-cleanup? If handlersReject is called in the catch block and then onClose fires immediately after, both will call handlersPromise.finally(). The if (createdConn) guard should prevent issues, but worth verifying.

Performance Considerations

✅ No performance concerns - the changes are minimal and don't add any meaningful overhead.

Security Considerations

✅ No security concerns - this is a bug fix that improves resource cleanup, which could prevent DoS via resource exhaustion.

Code Style

The changes follow the existing code style and conventions in the file. No issues here.

Overall Assessment

Approval: ✅ Approved with minor recommendations

This is a solid bug fix that addresses a real resource leak. The implementation is clean and follows good error handling patterns. My main recommendation is to add test coverage for this error path to prevent regressions.

The fix improves the robustness of the WebSocket connection lifecycle and should prevent resource leaks when actor creation fails.

@graphite-app
Copy link
Contributor

graphite-app bot commented Nov 4, 2025

Merge activity

  • Nov 4, 8:10 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 4, 8:11 PM UTC: CI is running for this pull request on a draft pull request (#3349) due to your merge queue CI optimization settings.
  • Nov 4, 8:41 PM UTC: The Graphite merge queue removed this pull request due to removal of a downstack PR #3342.
  • Nov 4, 8:53 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 4, 8:53 PM UTC: CI is running for this pull request on a draft pull request (#3353) due to your merge queue CI optimization settings.
  • Nov 4, 8:56 PM UTC: Merged by the Graphite merge queue via draft PR: #3353.

graphite-app bot pushed a commit that referenced this pull request Nov 4, 2025
@NathanFlurry NathanFlurry force-pushed the 11-04-fix_rivetkit_fix_not_pino_bindings branch from 3cedf91 to c98b4a2 Compare November 4, 2025 20:25
@NathanFlurry NathanFlurry force-pushed the 11-03-fix_rivetkit_fix_conns_not_being_cleaned_up_if_actor.createconn_errors branch from ac85131 to 4de4ae9 Compare November 4, 2025 20:25
@vercel
Copy link

vercel bot commented Nov 4, 2025

Deployment failed with the following error:

Too many requests - try again in 5 minutes (more than 120, code: "api-deployments-flood-pro").

Learn More: https://vercel.com/docs/limits#rate-limits

@claude
Copy link

claude bot commented Nov 4, 2025

PR Review: Connection Cleanup Fix

Summary

This PR fixes a resource leak where WebSocket connections weren't being properly cleaned up when actor.createConn errors. The fix introduces a simpler, more robust approach to connection cleanup.


✅ Positive Changes

1. Fixes Critical Resource Leak
The original code had a significant bug: if actor.createConn threw an error, the onClose handler would try to call __connDisconnected via the handlersPromise, but since the promise was rejected (not resolved), the cleanup in the .then() block would never execute. The .catch() only logged the error but didn't perform cleanup. This meant connections that failed during creation would leak and never be properly removed.

2. Simplifies Control Flow
The new approach is much cleaner:

  • Before: Used promise chain with .then().catch() requiring the promise to resolve to get conn and actor
  • After: Uses .finally() with createdConn variable, ensuring cleanup happens regardless of success or failure

This is a textbook example of when to use finally - cleanup logic that must run regardless of outcome.

3. Reduces Complexity

  • Removes unnecessary error handling in the cleanup path (the .catch() that just logged errors)
  • Eliminates the need to extract both conn and actor from the promise
  • The actor variable is already in scope, so no need to pass it through the promise

🔍 Code Quality Observations

Good:

  • The fix is minimal and focused on the specific issue
  • Variable naming (createdConn) is clear and descriptive
  • Uses undefined initialization for proper TypeScript typing
  • Correctly checks if (createdConn) before calling __connDisconnected

Consider:

  • The comment on line 167 ("Track connection outside of scope for cleanup") could be more specific: "Track connection for cleanup in onClose, even if createConn fails"

🐛 Potential Issues & Edge Cases

1. Race Condition (Minor)
There's a theoretical race condition between onClose being called and createdConn being assigned (line 205). If the WebSocket closes immediately after opening but before actor.createConn completes, createdConn will still be undefined and cleanup won't happen.

However, this is actually the correct behavior because:

  • If createConn hasn't completed, there's no connection in the actor's conns map yet
  • The error path in onOpen (lines 209-221) already closes the WebSocket
  • The handlersReject ensures the error is propagated

So this edge case is already properly handled. ✅

2. Consistency with Other Handlers
The onMessage handler (lines 224-274) still uses the old pattern of waiting on handlersPromise.then().catch(). This is correct because messages need a valid connection, but it's worth noting the pattern difference.

3. Error Handling in onOpen
After createdConn is assigned, if an error occurs (lines 209-221), we:

  1. Reject the handlers promise
  2. Log the error
  3. Close the WebSocket
  4. The onClose handler will then fire and clean up via createdConn

This flow is correct. ✅


🔐 Security Considerations

No security concerns identified. The change:

  • Doesn't expose any additional error information
  • Maintains the same access control patterns
  • Uses the same __connDisconnected method that's already trusted

⚡ Performance Considerations

Positive Impact:

  • Prevents resource leaks, which would accumulate over time and degrade performance
  • Simpler code path is slightly more efficient (no promise chain overhead in cleanup)

Neutral:

  • The finally approach has the same performance characteristics as .then().catch() for the cleanup path

🧪 Test Coverage

Concern: This PR doesn't include tests for the specific bug being fixed.

Recommended test cases:

  1. Test that verifies connections are cleaned up when createConn throws an error
  2. Test that verifies __connDisconnected is called exactly once per connection attempt
  3. Test the race condition where WebSocket closes immediately after opening

The test would look something like:

// Pseudo-code
test('connection cleanup when createConn errors', async () => {
  const actor = await createTestActor();
  const originalConnCount = actor.conns.size;
  
  // Mock createConn to throw
  actor.createConn = jest.fn().mockRejectedValue(new Error('Creation failed'));
  
  const ws = await connectWebSocket(actor);
  
  // Wait for error handling
  await waitForWebSocketClose(ws);
  
  // Verify no leaked connections
  expect(actor.conns.size).toBe(originalConnCount);
  expect(actor.__connDisconnected).toHaveBeenCalledTimes(0); // Because conn was never created
});

📋 Recommendations

  1. Add tests for the connection cleanup error path (as outlined above)
  2. Optional: Update the comment on line 167 to be more specific about why we're tracking the connection
  3. Consider: Adding a structured log when cleanup happens due to createConn failure vs normal close

Final Assessment

Overall: Excellent fix

This PR correctly addresses a real resource leak bug with a clean, simple solution. The use of finally is the right approach here, and the code is more maintainable than the original.

The only significant gap is test coverage for this specific error path. I'd recommend adding tests before merging to prevent regressions.

Approve with suggestion to add tests.


Review generated by Claude Code - rivetkit-typescript/packages/rivetkit/src/actor/router-endpoints.ts:163-314

graphite-app bot pushed a commit that referenced this pull request Nov 4, 2025
@graphite-app graphite-app bot closed this Nov 4, 2025
@graphite-app graphite-app bot deleted the 11-03-fix_rivetkit_fix_conns_not_being_cleaned_up_if_actor.createconn_errors branch November 4, 2025 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants