
@NathanFlurry
Member

No description provided.

@vercel

vercel bot commented Nov 3, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project             | Deployment | Preview | Comments | Updated (UTC)
rivetkit-serverless | Ready      | Preview | Comment  | Nov 4, 2025 7:25pm

3 Skipped Deployments

Project             | Deployment | Preview | Comments | Updated (UTC)
rivet-cloud         | Ignored    |         |          | Nov 4, 2025 7:25pm
rivet-inspector     | Ignored    | Preview |          | Nov 4, 2025 7:25pm
rivet-site          | Ignored    | Preview |          | Nov 4, 2025 7:25pm

Member Author

NathanFlurry commented Nov 3, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Nov 3, 2025

Code Review

Thanks for this PR! The changes improve the sleep timer logic by tracking Hono HTTP requests instead of raw fetch requests, which makes sense. However, I've identified one critical bug that needs to be fixed before merging.


🐛 Critical Bug

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1889

__endHonoHttpRequest() {
    this.#activeHonoHttpRequests = this.#activeHonoHttpRequests;  // ❌ BUG: This doesn't decrement!
    if (this.#activeHonoHttpRequests < 0) {
        // ...
    }
}

Issue: Line 1889 assigns the counter to itself instead of decrementing it. This means:

  • The counter will only increase, never decrease
  • Actors will never be able to sleep after the first request
  • The < 0 check will never trigger

Fix:

__endHonoHttpRequest() {
    this.#activeHonoHttpRequests--;  // ✅ Decrement the counter
    if (this.#activeHonoHttpRequests < 0) {
        // ...
    }
}

Or more defensively:

__endHonoHttpRequest() {
    this.#activeHonoHttpRequests = Math.max(0, this.#activeHonoHttpRequests - 1);
    if (this.#activeHonoHttpRequests < 0) {  // This check becomes redundant with Math.max
        // ...
    }
}
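To compare the two variants in isolation, here is a minimal standalone sketch of the counter. It assumes a hypothetical `HttpRequestCounter` class in place of the actor's private `#activeHonoHttpRequests` field, and an injected `warn` callback in place of `#rLog`. Checking for underflow before decrementing keeps the warning reachable while still clamping at zero:

```typescript
// Hypothetical stand-in for the actor's #activeHonoHttpRequests bookkeeping;
// `warn` replaces the actor's structured logger (#rLog) for illustration.
class HttpRequestCounter {
  #active = 0;
  #warn: (msg: string) => void;

  constructor(warn: (msg: string) => void = console.warn) {
    this.#warn = warn;
  }

  get active(): number {
    return this.#active;
  }

  // Called when a request enters the router.
  begin(): void {
    this.#active++;
  }

  // Called when a request leaves the router (from a finally block).
  end(): void {
    // Check for an unmatched end() *before* decrementing, so the warning stays
    // reachable while the count is still clamped at zero.
    if (this.#active === 0) {
      this.#warn("end() called without a matching begin(); count stays at 0");
      return;
    }
    this.#active--;
  }
}
```

The key design choice is the order of operations. With `Math.max(0, count - 1)` followed by a `< 0` check, the warning can never fire. Testing for zero first keeps both the clamp and the diagnostic.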

✅ Positive Observations

  1. Better abstraction: The rename from sleep to startSleep and shutdown to shutdownRunner makes the API clearer about what these methods actually do.

  2. Improved sleep logic: Tracking Hono HTTP requests specifically (via middleware) is more accurate than tracking raw fetch calls, especially for inspector requests.

  3. Good documentation: The comments explaining the sleep mechanism are helpful.

  4. Simplified async handling: Removing unnecessary async/await from _sleep() (lines 1962, 1979, 1983) is a good simplification since the driver handles the async work.


💡 Minor Suggestions

  1. Redundant check (lines 1890-1896): If you use Math.max(0, this.#activeHonoHttpRequests - 1), the subsequent < 0 check can never trigger and could be removed. If you prefer to keep the defensive check with its warning, use the plain decrement instead.

  2. Error handling removed (lines 1933-1935): The previous version had error handling for the sleep operation. Consider if this could mask issues:

    // Old code had:
    this._sleep().catch((error) => {
        this.#rLog.error({ msg: "error during sleep", error: stringifyError(error) });
    });

    Since startSleep is now synchronous and delegates to the driver, this might be fine. Just verify the driver handles errors appropriately.

  3. Timeout value change in example: The sleepTimeout change from 10s to 5s in examples/kitchen-sink/src/backend/actors/demo.ts seems unrelated to the main fix. Consider if this should be in a separate commit or if there's a reason for this change.


🧪 Testing Recommendations

After fixing the bug, please test:

  1. Multiple sequential requests complete and actor can sleep afterward
  2. Concurrent requests are tracked correctly
  3. Inspector requests properly reset the sleep timer
  4. The negative counter warning works if there's a middleware ordering issue
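The first two cases depend on the try/finally pattern the middleware uses, and that pattern can be exercised without Hono or a running actor. This is a minimal sketch under two assumptions: a hypothetical `trackRequest` helper plays the role of the middleware, and a plain object stands in for the actor's counter. The `deferred` helper lets a test hold several requests open at once:

```typescript
// Hypothetical counter shape; the real counter lives on the actor instance.
interface Counter {
  active: number;
}

// Mirrors the middleware shape: increment, run the handler, always decrement.
async function trackRequest<T>(counter: Counter, handler: () => Promise<T>): Promise<T> {
  counter.active++;
  try {
    return await handler();
  } finally {
    counter.active--;
  }
}

// A promise whose resolution the caller controls, so a test can keep
// several "requests" in flight concurrently.
function deferred(): { promise: Promise<void>; resolve: () => void } {
  let resolve!: () => void;
  const promise = new Promise<void>((r) => {
    resolve = r;
  });
  return { promise, resolve };
}
```

The increment runs before the first `await` inside `trackRequest`, so the count is already 3 synchronously after starting three requests. A handler that rejects still passes through the `finally` block.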

Summary

Must fix before merge: The decrement bug on line 1889.

Once that's fixed, this PR will properly solve the issue of inspector requests not resetting the sleep timer and improve the overall sleep management logic.


Review generated with Claude Code

@NathanFlurry NathanFlurry force-pushed the 11-02-fix_rivetkit_prevent_sleeping_if_there_are_any_action_hono_requests_fixes_inspector_requests_not_resetiting_sleep_timer branch from 31cb2fa to 3842c8f Compare November 4, 2025 00:36
@claude

claude bot commented Nov 4, 2025

PR Review: Fix RivetKit Sleep Timer for Inspector Requests

Summary

This PR fixes an issue where inspector requests (and other Hono-routed HTTP requests) were not properly resetting the actor sleep timer. The solution replaces tracking of raw fetch requests with tracking of Hono HTTP requests via middleware.


Positive Aspects

Correct Problem Identification: The PR correctly identifies that the previous approach of tracking #activeRawFetchCount in the handleRawFetch method was incomplete, as it didn't cover all HTTP requests going through the Hono router (like inspector requests).

Clean Architecture: Using Hono middleware to track requests is the right approach - it's centralized and catches all HTTP requests uniformly.

Defensive Programming: The __endHonoHttpRequest method includes a guard against the counter going negative with appropriate logging.

Proper Synchronization: The change from async _sleep() to synchronous _sleep() with setImmediate is correct, as startSleep is now synchronous.

Better Documentation: Improved comments and JSDoc for startSleep and variable naming.


Issues & Concerns

1. Potential Race Condition in Middleware (likely not a bug)

router.use("*", async (c, next) => {
    const actor = await actorDriver.loadActor(c.env.actorId);
    actor.__beginHonoHttpRequest();
    try {
        await next();
    } finally {
        actor.__endHonoHttpRequest();
    }
});

Concern: Does every __endHonoHttpRequest call have a matching __beginHonoHttpRequest?

On closer inspection, yes. If loadActor throws, execution never reaches __beginHonoHttpRequest or the try block, so the finally block does not run either. If next() throws, the finally block calls __endHonoHttpRequest, which pairs with the __beginHonoHttpRequest that already ran. The counter therefore stays balanced.

Remaining question: verify that loadActor cannot return a cached actor instance in an inconsistent state.

2. WebSocket Requests May Be Miscounted

Looking at router.ts:135-188, WebSocket upgrade requests go through the Hono router initially:

router.get(PATH_CONNECT_WEBSOCKET, async (c) => {
    const upgradeWebSocket = runConfig.getUpgradeWebSocket?.();
    // ...
});

Problem: WebSocket upgrade requests will increment #activeHonoHttpRequests when they start, but may decrement it when the HTTP handshake completes, even though the WebSocket connection remains open. This could allow the actor to sleep while WebSocket connections are active.

Current Mitigation: You have #activeRawWebSockets tracking which prevents sleep, so this may not be a bug in practice.

Recommendation: Add a comment explaining that WebSocket upgrades briefly increment the HTTP counter but are ultimately tracked separately via #activeRawWebSockets.
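Per the reviews, the actor tracks raw WebSockets (#activeRawWebSockets) and connections (#connections) separately from the HTTP counter. The mitigation is easier to see with a simplified predicate in which the HTTP counter is only one of several independent conditions. The field names below are hypothetical, and the sketch assumes these three counts are the only inputs; the real #canSleep may check more:

```typescript
// Hypothetical snapshot of the state a sleep check might consult.
interface SleepState {
  activeHttpRequests: number; // in-flight Hono requests, including WS upgrade handshakes
  activeRawWebSockets: number; // open raw WebSockets
  connectionCount: number; // open actor connections (e.g. via PATH_CONNECT_WEBSOCKET)
}

// Sleep is allowed only when every independent source of activity is idle.
function canSleep(s: SleepState): boolean {
  return s.activeHttpRequests === 0 && s.activeRawWebSockets === 0 && s.connectionCount === 0;
}
```

Consider an upgrade. While the handshake is in flight, activeHttpRequests is 1. After the handshake, the middleware's finally block drops it to 0, but activeRawWebSockets (or connectionCount) is now 1, so canSleep still returns false.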

3. Missing Error Handling in _sleep

The previous version had explicit error handling:

this._sleep().catch((error) => {
    this.#rLog.error({
        msg: "error during sleep",
        error: stringifyError(error),
    });
});

Now it's just:

setImmediate(() => {
    sleep();
});

Problem: If sleep() throws inside the setImmediate callback, nothing catches the error. Because startSleep is now synchronous (void return), there is no promise to attach a .catch() to, so the error would surface as an uncaught exception.

Recommendation: Add try-catch in the setImmediate callback:

setImmediate(() => {
    try {
        sleep();
    } catch (error) {
        this.#rLog.error({
            msg: "error starting sleep",
            error: stringifyError(error),
        });
    }
});
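Here is a self-contained sketch of that recommendation, assuming a hypothetical `scheduleSleep` helper. The scheduler and logger are passed in so that a test can exercise the error path synchronously. In production, one would pass Node's `setImmediate` and the actor's logger. `String(error)` stands in for the codebase's `stringifyError`:

```typescript
// Hypothetical types: a scheduler that defers a callback, and a structured
// error logger standing in for the actor's #rLog.error.
type Scheduler = (fn: () => void) => void;
type ErrorLogger = (entry: { msg: string; error: string }) => void;

// Defers `sleep` via `schedule` and logs (instead of rethrowing) any error it
// throws synchronously, so the failure cannot escape the deferred callback.
function scheduleSleep(sleep: () => void, schedule: Scheduler, logError: ErrorLogger): void {
  schedule(() => {
    try {
      sleep();
    } catch (error) {
      // String(error) stands in for the codebase's stringifyError helper.
      logError({ msg: "error starting sleep", error: String(error) });
    }
  });
}
```

Injecting the scheduler keeps the production behavior unchanged (defer to a later tick). It also lets a test use a run-immediately scheduler and assert on the logged entry directly.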

4. Inconsistent Method Naming Convention

__beginHonoHttpRequest()
__endHonoHttpRequest()

Issue: These use a double underscore prefix. In JavaScript/TypeScript this is purely a naming convention (there is no name mangling, unlike Python) that usually signals "internal, don't touch". However, they're being called from outside the class (from router middleware).

Recommendation:

  • Either make these truly private and access via a public method
  • Or use single underscore (like _sleep, _stop) to indicate "internal but accessible"
  • Or rename to something like notifyHttpRequestStart/End

5. Minor: Removed Error Handling Context

In handleRawFetch, the removed code had:

} finally {
    this.#activeRawFetchCount = Math.max(0, this.#activeRawFetchCount - 1);
    this.#resetSleepTimer();
    this.#savePersistThrottled();
}

Now it's just:

} finally {
    this.#savePersistThrottled();
}

Question: Is it still necessary to call this.#resetSleepTimer() here, or is that now fully handled by the middleware? If handleRawFetch can be called outside of the Hono router, you may need to keep this call.


Testing Recommendations

Test Cases to Add:

  1. Inspector Request Sleep Test: Verify that making an inspector request resets the sleep timer
  2. Concurrent Request Test: Multiple overlapping HTTP requests should keep incrementing/decrementing correctly
  3. WebSocket + HTTP Test: Verify that WebSocket upgrades don't break the HTTP request counting
  4. Error During Request Test: Ensure that if a request handler throws, the counter still decrements
  5. Sleep Timeout Verification: Verify that the sleep timeout value is actually respected (you reduced it from 10s to 5s in the example)

Code Quality

  • ✅ Follows TypeScript conventions
  • ✅ Uses structured logging appropriately
  • ✅ Good use of private fields
  • ⚠️ Could use more inline comments explaining the sleep state machine
  • ⚠️ Method naming could be more consistent

Performance Considerations

  • ✅ No performance regressions expected
  • ✅ Middleware approach is efficient (single actor load per request)
  • ✅ Integer increment/decrement is negligible overhead

Security Considerations

No security issues identified. The sleep mechanism is an internal optimization and doesn't expose attack surface.


Overall Assessment

Verdict: Approve with minor suggestions

The core fix is sound and addresses the stated problem. The main concerns are:

  1. Potential edge case with error handling in _sleep
  2. Method naming convention
  3. Could benefit from additional test coverage

The critical path (Hono middleware tracking) is correct and should fix the inspector request issue.


Recommendation: Consider the error handling suggestion in _sleep before merging. The other issues are minor and could be addressed in follow-up PRs if needed.

@NathanFlurry NathanFlurry force-pushed the 11-02-chore_rivetkit_pass_request_id_from_driver_to_conn_so_we_can_correlate_underlying_requests_with_connections branch from fe12a10 to 6282237 Compare November 4, 2025 18:59
@NathanFlurry NathanFlurry force-pushed the 11-02-fix_rivetkit_prevent_sleeping_if_there_are_any_action_hono_requests_fixes_inspector_requests_not_resetiting_sleep_timer branch from 3842c8f to 5fc0aa1 Compare November 4, 2025 18:59
@claude

claude bot commented Nov 4, 2025

Pull Request Review

Summary

This PR fixes two critical bugs in the actor sleep mechanism:

  1. Inspector requests were not resetting the sleep timer - The old implementation tracked raw fetch requests, but inspector requests through the Hono router weren't being counted
  2. Sleep prevention during active Hono HTTP requests - The middleware now properly tracks all HTTP requests through the router

Code Quality ✅

Strengths:

  • Clear separation of concerns: HTTP request tracking moved to router middleware where it belongs
  • Better naming: #activeRawFetchCount → #activeHonoHttpRequests more accurately describes what's being tracked
  • Improved documentation with comprehensive comments
  • Defensive programming with bounds checking in __endHonoHttpRequest()
  • Clean API refactoring: sleep() → startSleep(), shutdown() → shutdownRunner() improves clarity

Good practices observed:

  • Added structured logging with EXTRA_ERROR_LOG for debugging
  • Added timeout logging in #resetSleepTimer() for better observability
  • Simplified async handling by making _sleep() synchronous since it just delegates to startSleep()

Potential Issues 🔍

1. Race condition in middleware (Minor)
In router.ts:82-90, there's a potential edge case:

const actor = await actorDriver.loadActor(c.env.actorId);
actor.__beginHonoHttpRequest();
try {
    await next();
} finally {
    actor.__endHonoHttpRequest();
}

If next() throws before the request handler starts, the counter has still been incremented, but the finally block decrements it again, so the count stays balanced. This is probably acceptable, but worth noting.

2. Public-like method naming (Stylistic)
__beginHonoHttpRequest() and __endHonoHttpRequest() use double underscore prefix, which in JavaScript/TypeScript typically indicates "really private, don't touch". Since these are called from router middleware, consider:

  • Moving to a friend-class pattern, or
  • Documenting that these are internal APIs not meant for external use

3. Removed error logging in _sleep() (Minor concern)
The old code had:

this._sleep().catch((error) => {
    this.#rLog.error({ msg: "error during sleep", error: stringifyError(error) });
});

This was removed since _sleep() is now synchronous. However, if startSleep() can throw synchronously, errors would now be unhandled. Consider wrapping in try-catch if this is possible.

Performance Considerations ✅

  • Middleware overhead is minimal (just increment/decrement)
  • Removed unnecessary async/await in sleep path
  • No performance concerns

Security Concerns ✅

No security issues identified. The changes are purely internal lifecycle management.

Test Coverage ⚠️

Missing test coverage:

  1. No test for the core bug fix - Should add a test that verifies inspector/action requests prevent sleep
  2. No test for request counting edge cases - What happens if __endHonoHttpRequest() is called without a matching begin?
  3. No test for concurrent requests - Verify counter handles multiple simultaneous requests correctly

Suggested test cases:

// Test that HTTP requests prevent sleep
test('actor should not sleep during active HTTP request', async () => {
    // Start long-running request
    // Verify actor doesn't sleep during request
    // Request completes
    // Verify actor can now sleep
});

// Test that inspector requests reset timer
test('inspector requests should reset sleep timer', async () => {
    // Make inspector request
    // Verify sleep timer was reset
});

// Test concurrent requests
test('multiple concurrent requests should be tracked correctly', async () => {
    // Start 3 concurrent requests
    // Verify counter = 3
    // Complete 1 request
    // Verify counter = 2
    // Verify actor still can't sleep
    // Complete remaining requests
    // Verify actor can sleep
});
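One way to turn these outlines into executable tests without real timers is to inject a fake scheduler. The sketch below is a hypothetical, simplified model: `FakeScheduler` and `SleepTimerModel` are illustrative names, not RivetKit APIs. It only tracks HTTP requests and assumes the clear-then-re-arm pattern the reviews describe for #resetSleepTimer:

```typescript
// Minimal fake timer: armed callbacks run only when the test calls fire().
class FakeScheduler {
  #nextId = 0;
  readonly pending = new Map<number, () => void>();

  set(fn: () => void): number {
    const id = this.#nextId++;
    this.pending.set(id, fn);
    return id;
  }

  clear(id: number): void {
    this.pending.delete(id);
  }

  // Simulates the sleep timeout elapsing for every armed timer.
  fire(): void {
    const due = Array.from(this.pending.values());
    this.pending.clear();
    for (const fn of due) fn();
  }
}

// Hypothetical model of the sleep bookkeeping: HTTP requests only,
// no WebSockets or connections.
class SleepTimerModel {
  activeHttpRequests = 0;
  slept = false;
  #timer: number | undefined = undefined;
  #scheduler: FakeScheduler;

  constructor(scheduler: FakeScheduler) {
    this.#scheduler = scheduler;
  }

  beginRequest(): void {
    this.activeHttpRequests++;
    this.#resetSleepTimer();
  }

  endRequest(): void {
    this.activeHttpRequests = Math.max(0, this.activeHttpRequests - 1);
    this.#resetSleepTimer();
  }

  // Always clear the pending timer, then re-arm it only if sleep is allowed.
  #resetSleepTimer(): void {
    if (this.#timer !== undefined) {
      this.#scheduler.clear(this.#timer);
      this.#timer = undefined;
    }
    if (this.activeHttpRequests === 0) {
      this.#timer = this.#scheduler.set(() => {
        this.#timer = undefined;
        this.slept = true;
      });
    }
  }
}
```

This covers the three outlines: hold three requests, confirm nothing is armed, finish them one by one, then fire the fake timer. Starting a request between "armed" and "fired" models the inspector case, because the new request cancels the pending timer.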

Architecture ✅

The refactoring improves the architecture:

  • Before: Request tracking was split between handleRawFetch() and the router
  • After: Centralized tracking in router middleware that catches ALL HTTP requests
  • Benefit: Inspector requests, health checks, and any other HTTP traffic now correctly prevent sleep

Recommendations

Must address:

  • Add test coverage for the bug fix (high priority)

Should consider:

  • Add error handling around startSleep() call if it can throw
  • Document the __beginHonoHttpRequest / __endHonoHttpRequest API contract

Nice to have:

  • Add integration test for sleep timer behavior with various request types
  • Consider adding metrics/traces for request count tracking

Verdict

Approved with suggestions

This is a solid bug fix that addresses real issues with the sleep mechanism. The code quality is good, and the architectural changes are improvements. The main gap is test coverage - please add tests to verify the fix works as intended and prevent regressions.


Review completed with adherence to Rivet conventions per CLAUDE.md

@claude

claude bot commented Nov 4, 2025

PR Review

Summary

This PR fixes a critical bug where inspector requests (and other Hono action requests) weren't resetting the sleep timer. The fix refactors the sleep prevention mechanism from tracking raw fetch requests to tracking Hono HTTP requests via middleware.


✅ Strengths

  1. Cleaner Architecture: The new middleware approach in router.ts:82-90 is more elegant than the previous scattered tracking logic in handleRawFetch.

  2. Better Naming:

    • #activeRawFetchCount → #activeHonoHttpRequests is much clearer about what's being tracked
    • sleep() → startSleep() better conveys the async/non-blocking nature
    • shutdown() → shutdownRunner() clarifies what's being shut down
  3. Error Handling: The counter underflow protection at instance.ts:1890-1896 is good defensive programming.

  4. Documentation: Added helpful JSDoc comments explaining the new methods.


🐛 Potential Issues

Critical: Race Condition in Request Tracking

// router.ts:82-90
router.use("*", async (c, next) => {
    const actor = await actorDriver.loadActor(c.env.actorId);
    actor.__beginHonoHttpRequest();
    try {
        await next();
    } finally {
        actor.__endHonoHttpRequest();
    }
});

Problem: If loadActor() throws, the exception propagates out of the middleware before __beginHonoHttpRequest() or next() runs, so the counter is untouched. The remaining gap is loadActor() resolving without an actor: actor.__beginHonoHttpRequest() would then fail with a TypeError instead of a clear error, and counter tracking for that request would be skipped.

Recommendation: Add an explicit check on the loaded actor:

router.use("*", async (c, next) => {
    const actor = await actorDriver.loadActor(c.env.actorId);
    if (!actor) {
        throw new Error("Actor not found");
    }
    actor.__beginHonoHttpRequest();
    try {
        await next();
    } finally {
        actor.__endHonoHttpRequest();
    }
});

Minor: Inconsistent Error Handling

In instance.ts:1930-1933, the _sleep() method previously had error handling:

// Old code (removed):
this._sleep().catch((error) => {
    this.#rLog.error({
        msg: "error during sleep",
        error: stringifyError(error),
    });
});

Now _sleep() is synchronous and calls startSleep() which is also synchronous. However, if startSleep() throws synchronously, it won't be caught.

Recommendation: Add try-catch in the setImmediate callback at instance.ts:1979-1983:

setImmediate(() => {
    try {
        sleep();
    } catch (error) {
        this.#rLog.error({
            msg: "error starting sleep",
            error: stringifyError(error),
        });
    }
});

⚡ Performance Considerations

  1. Middleware Overhead: The new middleware runs on every request ("*" path). This adds per-request work:

    • an awaited loadActor() call
    • a synchronous counter increment/decrement

    This is acceptable but worth noting for high-traffic scenarios. The previous approach had similar overhead but was more scattered.

  2. Sleep Timer Churn: Every request now calls #resetSleepTimer() twice (begin + end). The method clears and recreates timeouts frequently. Consider debouncing if this becomes a performance issue with high request rates.


🔒 Security Considerations

No significant security concerns. The changes are internal refactoring without new attack surfaces.


🧪 Test Coverage

Missing: The PR doesn't add tests for the new behavior. Consider adding:

  1. Unit test: Verify that HTTP requests prevent sleep:

    test("actor should not sleep during active HTTP request", async () => {
        // Start request
        // Verify #canSleep() returns false
        // End request  
        // Verify #canSleep() returns true
    });
  2. Integration test: Verify inspector requests reset the sleep timer (the original bug).

  3. Edge case test: Verify counter doesn't go negative if __endHonoHttpRequest is called without matching __beginHonoHttpRequest.


📋 Minor Suggestions

  1. instance.ts:197-201: The comment is excellent, but consider adding that WebSocket connections are tracked separately in #activeRawWebSockets.

  2. driver.ts:43: The comment says "This will call _stop independently" but it's unclear if this is a guarantee or just a note about expected behavior. Consider clarifying.

  3. examples/kitchen-sink/src/backend/actors/demo.ts:160: The sleep timeout was reduced from 10s to 5s. This seems unrelated to the fix - was this intentional or should it be in a separate commit?

  4. Method Naming: The double underscore prefix (__beginHonoHttpRequest) typically indicates "very private" in JS/TS conventions. Consider if a single underscore is sufficient since these are called from the same package.


📝 Summary

Overall Assessment: ✅ Approve with Minor Suggestions

The fix correctly addresses the root cause and improves code clarity. The main concerns are:

  • Potential race condition in actor loading (critical to fix)
  • Missing error handling in _sleep (minor)
  • Lack of test coverage for the new behavior

The refactoring is a net improvement to maintainability. Once the race condition is addressed, this is ready to merge.


Files Reviewed: ✅ All 7 changed files

@NathanFlurry NathanFlurry force-pushed the 11-02-chore_rivetkit_pass_request_id_from_driver_to_conn_so_we_can_correlate_underlying_requests_with_connections branch from 6282237 to d22a4df Compare November 4, 2025 19:24
@NathanFlurry NathanFlurry force-pushed the 11-02-fix_rivetkit_prevent_sleeping_if_there_are_any_action_hono_requests_fixes_inspector_requests_not_resetiting_sleep_timer branch from 5fc0aa1 to 91f44ec Compare November 4, 2025 19:24
@claude

claude bot commented Nov 4, 2025

Code Review

Summary

This PR fixes a critical bug in the actor sleep logic. Previously, the sleep timer was only being reset for raw fetch requests, but not for Hono-routed HTTP requests (including inspector requests). This meant that inspector requests and other action requests weren't preventing actors from sleeping, which could cause actors to sleep while still handling active requests.

Changes Overview

  1. Replaced raw fetch tracking with Hono HTTP request tracking - The old #activeRawFetchCount is replaced with #activeHonoHttpRequests
  2. Added middleware to track all HTTP requests - A new middleware in router.ts wraps all requests to properly track active HTTP requests
  3. Renamed driver methods for clarity - sleep() → startSleep() and shutdown() → shutdownRunner() for better semantics
  4. Made startSleep() synchronous - Changed from async to sync since it just initiates the sleep sequence

Positive Aspects ✅

  1. Root cause fix: This properly addresses the issue where inspector requests weren't resetting the sleep timer by tracking all HTTP requests through the Hono router.

  2. Better naming: The rename from sleep() to startSleep() and shutdown() to shutdownRunner() is much clearer about what these methods do.

  3. Defense in depth: The guard against negative request counts (instance.ts:1890-1896) is good defensive programming.

  4. Proper middleware placement: The middleware is placed after the logger but before all other routes, ensuring all HTTP requests are tracked.

  5. Good documentation: Added clear JSDoc comments explaining the purpose of the new methods.


Potential Issues ⚠️

1. Race condition in middleware (Medium severity)

In router.ts:82-90, the middleware loads the actor and then tracks the request:

const actor = await actorDriver.loadActor(c.env.actorId);
actor.__beginHonoHttpRequest();
try {
    await next();
} finally {
    actor.__endHonoHttpRequest();
}

Issue: If loadActor() throws an error, the request counter won't be incremented but the finally block won't run either (which is correct). However, if next() throws after __beginHonoHttpRequest(), the finally block will properly decrement. This seems correct, but consider if there's any edge case where loadActor succeeds, counter increments, but the actor gets destroyed before next() completes.

Recommendation: This is likely fine as-is, but consider adding a try-catch around loadActor if actor loading failures should be handled differently.

2. WebSocket upgrade requests may not be tracked correctly (High severity)

Looking at the WebSocket handlers in router.ts:136-201 and router.ts:286-315, the WebSocket upgrade happens inside the route handler via upgradeWebSocket().

Issue: When a WebSocket connection is established, the HTTP upgrade request passes through the middleware (incrementing the counter), but once upgraded, the connection becomes a long-lived WebSocket. The middleware's finally block will execute when the upgrade completes, decrementing the counter, even though the WebSocket connection is still active.

However, looking at instance.ts:1956, there's already separate tracking for raw WebSockets via #activeRawWebSockets, so this might be intentional. But the connection-based WebSockets (PATH_CONNECT_WEBSOCKET) use #connections which are checked at instance.ts:1943-1949.

Recommendation: Verify that WebSocket upgrade requests don't cause false negatives in the sleep logic. The current implementation may be correct since WebSockets are tracked separately, but this should be tested.

3. Inspector requests now properly prevent sleep (Confirm intended behavior)

The PR title mentions "fixes inspector requests not resetting sleep timer", which suggests this is the desired behavior. However, I want to confirm: Should inspector requests (like debugging/introspection) prevent an actor from sleeping?

If the inspector is actively being used, yes, this makes sense. But if someone just has an inspector tab open but isn't actively using it, should that prevent sleep? This might be worth considering for future optimization.

4. Raw HTTP endpoint behavior (Low severity)

The /raw/http/* endpoint at router.ts:252-284 calls actor.handleFetch() directly. Looking at the old code, handleFetch used to track #activeRawFetchCount and reset the sleep timer. Now it doesn't do either.

Issue: Raw HTTP requests are now only tracked by the middleware (which is correct), but handleFetch no longer resets the sleep timer itself. This should be fine since the middleware handles it, but the removal of the sleep timer reset in handleFetch should be intentional.

Looking at the diff: The old code in instance.ts:1517-1542 had:

this.#activeRawFetchCount++;
this.#resetSleepTimer();
// ... handler ...
this.#activeRawFetchCount--;
this.#resetSleepTimer();

This is now removed, and all tracking happens in the middleware. This is correct IF all raw HTTP requests go through the router middleware. Verify that handleFetch is only called from the router and not from other code paths.

5. Sleep timer reset frequency in new methods (Low severity)

The new __beginHonoHttpRequest() and __endHonoHttpRequest() both call #resetSleepTimer(), which is good: __beginHonoHttpRequest increments the counter and then resets the timer, and __endHonoHttpRequest decrements it and then resets the timer.

Question: Should the sleep timer be reset when a request ends, or only when the count goes to zero? The current implementation resets on every begin/end, which might be more aggressive than needed, but is safer.

Recommendation: Current behavior is fine and safer.


Performance Considerations

  1. Middleware overhead: The new middleware adds an await actorDriver.loadActor() call to every HTTP request. If actor loading is cached/fast, this should be fine. If not, this could add latency to all requests.

  2. Sleep timer churn: Every request now resets the sleep timer twice (begin and end). For high-traffic actors, this means lots of clearTimeout/setTimeout calls. This is probably fine for most use cases, but could be optimized if needed.


Security Concerns

No security issues identified. The changes don't introduce any new attack vectors.


Test Coverage

I notice that the PR includes a test file change (driver-engine.test.ts) but it only updates the cleanup call from shutdown to shutdownRunner.

Recommendations for additional tests:

  1. Test that HTTP requests prevent sleep
  2. Test that inspector requests prevent sleep
  3. Test that the request counter doesn't go negative under normal circumstances
  4. Test that actors can sleep after all HTTP requests complete
  5. Test WebSocket upgrade behavior with the new sleep logic
  6. Test concurrent requests properly track count

Code Style & Best Practices

  1. ✅ Follows TypeScript conventions
  2. ✅ Good use of private fields with # prefix
  3. ✅ Proper error handling with defensive guard
  4. ✅ Clear naming conventions
  5. ✅ JSDoc comments added

Minor Suggestions

  1. Consider renaming __beginHonoHttpRequest and __endHonoHttpRequest to _beginHttpRequest and _endHttpRequest for consistency with other internal methods (single underscore vs double).

  2. Log message enhancement at instance.ts:1920: The log now includes timeout which is helpful. Consider also logging activeHonoHttpRequests for debugging:

    this.#rLog.debug({
        msg: "resetting sleep timer",
        canSleep,
        existingTimeout: !!this.#sleepTimeout,
        timeout: this.#config.options.sleepTimeout,
        activeHonoHttpRequests: this.#activeHonoHttpRequests, // Add this
    });
  3. Comment update needed at instance.ts:197-199: The comment says "Does not include WebSockets" but the middleware runs for WebSocket upgrade requests too. Consider clarifying:

    /**
     * Track active HTTP requests through Hono router so sleep logic can
     * account for them. Includes WebSocket upgrade requests but not the
     * WebSocket connections themselves (those are tracked separately).
     */

Conclusion

This is a solid bug fix that addresses a real issue with inspector requests not preventing actor sleep. The core logic is sound, but there are a few edge cases to verify:

  1. Approve with minor concerns - The main logic is correct
  2. ⚠️ Verify WebSocket upgrade behavior - Ensure upgrade requests don't cause issues
  3. ⚠️ Verify handleFetch call paths - Ensure all calls go through middleware
  4. 💡 Consider adding tests - Sleep behavior should have comprehensive test coverage

Overall assessment: The changes improve the codebase and fix a legitimate bug. The renaming improves clarity, and the tracking mechanism is more accurate. Recommended to merge after addressing the WebSocket upgrade verification.


Great work on identifying and fixing this issue! 🎯

@claude

claude bot commented Nov 4, 2025

Code Review

Thanks for this PR! This is a solid fix for preventing actors from sleeping during active HTTP requests. Here's my review:

✅ Strengths

  1. Correct Architecture - Moving from tracking raw fetch calls to tracking Hono HTTP requests at the middleware level is the right approach. This ensures all HTTP requests (including inspector requests) properly prevent sleeping.

  2. Clean Implementation - The middleware pattern in router.ts:82-90 is clean and handles the request lifecycle correctly with proper try/finally cleanup.

  3. Good Error Handling - The defensive check in __endHonoHttpRequest() that prevents the counter from going negative and logs a warning is excellent defensive programming.

  4. API Clarity - Renaming sleep() → startSleep() and shutdown() → shutdownRunner() with improved documentation makes the API intent much clearer.

  5. Simplified Sleep Logic - Removing the async/await and error handling from _sleep() makes sense since the driver handles those concerns.
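For reference, here is a minimal sketch of the balanced begin/end counter with the defensive clamp described in item 3. The class name HttpRequestTracker and the warnings array are hypothetical stand-ins (the real code logs a warning through its logger). Only the method and field names mirror the PR.

```typescript
// Hypothetical stand-in for the actor's HTTP request tracking, not RivetKit's actual class.
class HttpRequestTracker {
    // Number of Hono HTTP requests currently in flight.
    #activeHonoHttpRequests = 0;
    // Collected instead of logged so the behavior is easy to inspect.
    readonly warnings: string[] = [];

    __beginHonoHttpRequest(): void {
        this.#activeHonoHttpRequests++;
    }

    __endHonoHttpRequest(): void {
        this.#activeHonoHttpRequests--;
        if (this.#activeHonoHttpRequests < 0) {
            // An end without a matching begin: clamp to zero and warn rather
            // than letting a negative count distort later sleep decisions.
            this.warnings.push("active hono http requests went below 0");
            this.#activeHonoHttpRequests = 0;
        }
    }

    get activeHonoHttpRequests(): number {
        return this.#activeHonoHttpRequests;
    }
}
```

Clamping means one unbalanced call cannot permanently skew the count that the sleep check reads.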

🔍 Observations & Suggestions

1. Potential Race Condition (Minor)

In router.ts:83-89, there's a theoretical race where if loadActor() throws after the request has started processing, the counter won't be incremented but __endHonoHttpRequest() could still be called if the actor was partially loaded. This is probably fine given your defensive counter check, but consider whether you want to move the increment inside a try block:

router.use("*", async (c, next) => {
    const actor = await actorDriver.loadActor(c.env.actorId);
    actor.__beginHonoHttpRequest();
    try {
        await next();
    } finally {
        actor.__endHonoHttpRequest();
    }
});

Current code assumes loadActor always succeeds before calling __beginHonoHttpRequest(). If loadActor can fail, the current code is fine. If it can partially succeed, you might want the increment after the load succeeds.
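The load-then-increment ordering is easier to see in a framework-free version of the same middleware shape. The sketch below swaps Hono's c.env.actorId and next for plain parameters, and the TrackedActor interface is hypothetical. It shows that a rejected loadActor() touches neither side of the counter, while a throwing handler is still balanced by the finally block.

```typescript
// Minimal shape the middleware needs from an actor (hypothetical interface).
interface TrackedActor {
    __beginHonoHttpRequest(): void;
    __endHonoHttpRequest(): void;
}

// Same structure as the router.use("*", ...) middleware: load first, then
// increment, then decrement in finally so the pair stays balanced.
async function trackHonoRequest(
    loadActor: (actorId: string) => Promise<TrackedActor>,
    actorId: string,
    next: () => Promise<void>,
): Promise<void> {
    // If loadActor rejects, neither begin nor end runs, so the counter is untouched.
    const actor = await loadActor(actorId);
    actor.__beginHonoHttpRequest();
    try {
        await next();
    } finally {
        // Runs whether next() resolves or throws.
        actor.__endHonoHttpRequest();
    }
}
```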

2. Method Naming Convention (Nitpick)

The double underscore prefix (__beginHonoHttpRequest, __endHonoHttpRequest) typically indicates "private but accessible" or "internal API" in TypeScript/JavaScript. Since these are intentionally called from outside the class (from router middleware), consider:

  • Single underscore for "internal but not truly private"
  • Or make them public without underscores if they're part of the supported internal API
  • Or add a comment explaining the naming convention

Current naming is fine, just a style consideration.

3. Documentation Enhancement

The comment on #activeHonoHttpRequests is great! Consider adding a brief comment explaining why HTTP requests are tracked separately from WebSockets (because WS hibernation is planned but not yet implemented).

4. Test Coverage

I don't see test changes for the new behavior. Consider adding a test that:

  • Starts an HTTP request
  • Verifies the actor doesn't sleep while the request is active
  • Completes the request
  • Verifies the actor can now sleep (if no other activity)

This would prevent regression of the inspector bug.
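One way to write that regression test without real timers is to drive a simplified actor with a fake clock. Everything here (FakeClock, SleepyActor, runSleepScenario, the 5 s timeout) is hypothetical. It also assumes that both begin and end reset the sleep timer. The real actor instance checks more conditions than in-flight HTTP requests before sleeping.

```typescript
// Deterministic stand-in for setTimeout/clearTimeout (hypothetical test helper).
class FakeClock {
    now = 0;
    #nextId = 1;
    #timers: { id: number; at: number; fn: () => void }[] = [];

    setTimeout(fn: () => void, ms: number): number {
        const id = this.#nextId++;
        this.#timers.push({ id, at: this.now + ms, fn });
        return id;
    }

    clearTimeout(id: number): void {
        this.#timers = this.#timers.filter((t) => t.id !== id);
    }

    // Moves time forward and runs every timer that has come due.
    advance(ms: number): void {
        this.now += ms;
        const due = this.#timers.filter((t) => t.at <= this.now);
        this.#timers = this.#timers.filter((t) => t.at > this.now);
        for (const t of due) t.fn();
    }
}

// Simplified actor: only in-flight HTTP requests block sleep (an assumption).
class SleepyActor {
    asleep = false;
    #activeHonoHttpRequests = 0;
    #sleepTimer: number | undefined;
    readonly #clock: FakeClock;
    readonly #sleepTimeoutMs: number;

    constructor(clock: FakeClock, sleepTimeoutMs: number) {
        this.#clock = clock;
        this.#sleepTimeoutMs = sleepTimeoutMs;
        this.#resetSleepTimer();
    }

    __beginHonoHttpRequest(): void {
        this.#activeHonoHttpRequests++;
        this.#resetSleepTimer();
    }

    __endHonoHttpRequest(): void {
        this.#activeHonoHttpRequests = Math.max(0, this.#activeHonoHttpRequests - 1);
        this.#resetSleepTimer();
    }

    // Cancels any pending sleep and re-arms it only if nothing is in flight.
    #resetSleepTimer(): void {
        if (this.#sleepTimer !== undefined) this.#clock.clearTimeout(this.#sleepTimer);
        this.#sleepTimer = undefined;
        if (this.#activeHonoHttpRequests === 0) {
            this.#sleepTimer = this.#clock.setTimeout(() => {
                this.asleep = true;
            }, this.#sleepTimeoutMs);
        }
    }
}

// The four steps listed above, run against the fake clock.
function runSleepScenario(): { sleptDuringRequest: boolean; sleptAfterRequest: boolean } {
    const clock = new FakeClock();
    const actor = new SleepyActor(clock, 5_000);
    actor.__beginHonoHttpRequest(); // 1. start an HTTP request
    clock.advance(10_000); // 2. wait well past the sleep timeout
    const sleptDuringRequest = actor.asleep;
    actor.__endHonoHttpRequest(); // 3. complete the request
    clock.advance(5_000); // 4. stay idle for one full timeout
    return { sleptDuringRequest, sleptAfterRequest: actor.asleep };
}
```

A real test would call the actor through the router instead of invoking the begin/end methods directly. That way the inspector path is exercised end to end.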

🐛 Potential Issues

None identified - The code looks solid!

🔒 Security

No security concerns identified.

⚡ Performance

The middleware adds minimal overhead (just incrementing/decrementing a counter), which is negligible. The change should actually improve performance by preventing unnecessary wake-from-sleep cycles when inspector requests come in.

📝 Additional Notes

  1. Sleep timeout change - The change from 10s → 5s in demo.ts is unrelated to the main fix. Consider whether this should be in a separate commit or if there's a reason for it.

  2. Removed error handling - You removed the .catch() block from the _sleep() call in #resetSleepTimer(). Make sure errors in the sleep sequence are handled properly by the driver. Based on the code, this looks intentional since startSleep is now synchronous and the driver handles errors.

Summary

Recommendation: ✅ Approve with minor suggestions

The fix correctly addresses the root cause by tracking HTTP requests at the right abstraction level. The code quality is good with proper error handling and clear documentation. My suggestions are mostly nitpicks around defensive programming and testing.

Great work on the improved API naming and simplified async handling!

@graphite-app
Contributor

graphite-app bot commented Nov 4, 2025

Merge activity

  • Nov 4, 8:09 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 4, 8:11 PM UTC: CI is running for this pull request on a draft pull request (#3349) due to your merge queue CI optimization settings.
  • Nov 4, 8:41 PM UTC: The Graphite merge queue removed this pull request due to removal of a downstack PR #3342.
  • Nov 4, 8:49 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 4, 8:50 PM UTC: CI is running for this pull request on a draft pull request (#3352) due to your merge queue CI optimization settings.
  • Nov 4, 8:52 PM UTC: Merged by the Graphite merge queue via draft PR: #3352.

graphite-app bot pushed a commit that referenced this pull request Nov 4, 2025
…, fixes inspector requests not resetiting sleep timer (#3319)
@NathanFlurry NathanFlurry force-pushed the 11-02-chore_rivetkit_pass_request_id_from_driver_to_conn_so_we_can_correlate_underlying_requests_with_connections branch from d22a4df to 82a8b53 November 4, 2025 20:25
@NathanFlurry NathanFlurry force-pushed the 11-02-fix_rivetkit_prevent_sleeping_if_there_are_any_action_hono_requests_fixes_inspector_requests_not_resetiting_sleep_timer branch from 91f44ec to 40b7e52 November 4, 2025 20:25
@vercel

vercel bot commented Nov 4, 2025

Deployment failed with the following error:

Too many requests - try again in 5 minutes (more than 120, code: "api-deployments-flood-pro").

Learn More: https://vercel.com/docs/limits#rate-limits

@claude

claude bot commented Nov 4, 2025

Code Review for PR #3319

Summary

This PR fixes an important bug where inspector requests were not resetting the sleep timer, potentially causing actors to sleep during active inspector sessions. The solution refactors the sleep prevention logic from tracking raw fetch requests to tracking HTTP requests through the Hono router middleware.

Positive Changes

  1. Better Architecture: Moving from tracking #activeRawFetchCount to #activeHonoHttpRequests is a cleaner approach that captures all HTTP traffic through the router, including inspector requests.

  2. Proper Middleware Integration: The middleware at router.ts:82-90 correctly wraps all HTTP requests, ensuring consistent tracking.

  3. Good Error Handling: The check at instance.ts:1890-1896 that warns if the counter goes below zero is a good defensive programming practice.

  4. Improved Documentation: The JSDoc comments clearly explain the purpose of the new methods and renamed driver methods.

  5. Synchronous Sleep Initiation: Changing _sleep() from async to sync (with setImmediate for deferred execution) is more correct since the actual sleep work happens in the driver.
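Here is a minimal sketch of synchronous sleep initiation. _sleep() returns immediately and hands the actual work to the driver on a later tick. SleepDriver, ActorSleepInitiator and the injectable schedule parameter are hypothetical. The PR is described as using setImmediate, while this sketch defaults to setTimeout(fn, 0) so it also runs outside Node.

```typescript
// Hypothetical driver shape: the driver owns the real sleep work and its error handling.
interface SleepDriver {
    startSleep(actorId: string): void;
}

// Schedules a callback on a later turn of the event loop.
type Schedule = (fn: () => void) => void;

const defaultSchedule: Schedule = (fn) => {
    setTimeout(fn, 0);
};

class ActorSleepInitiator {
    readonly #actorId: string;
    readonly #driver: SleepDriver;
    readonly #schedule: Schedule;

    constructor(actorId: string, driver: SleepDriver, schedule: Schedule = defaultSchedule) {
        this.#actorId = actorId;
        this.#driver = driver;
        this.#schedule = schedule;
    }

    // Synchronous: the caller gets control back before the driver is invoked,
    // so there is no promise here to await or attach a .catch() to.
    _sleep(): void {
        this.#schedule(() => this.#driver.startSleep(this.#actorId));
    }
}
```

Injecting the scheduler keeps the deferral testable without waiting on real timers.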

Issues & Concerns

1. Race Condition in Middleware (Medium Priority)

The middleware at router.ts:82-90 has a potential issue:

router.use("*", async (c, next) => {
    const actor = await actorDriver.loadActor(c.env.actorId);
    actor.__beginHonoHttpRequest();
    try {
        await next();
    } finally {
        actor.__endHonoHttpRequest();
    }
});

Issue: If the request handler calls _sleep() synchronously within the request, the finally block will still execute and decrement the counter, potentially allowing the actor to sleep before the HTTP response is fully sent.

Suggestion: Consider if this is the intended behavior. If requests should prevent sleep until fully complete, this is correct. Document this behavior if intentional.

2. WebSocket Upgrade Handling (Low Priority)

WebSocket upgrades start as HTTP requests but transition to WebSocket connections. The middleware will:

  • Call __beginHonoHttpRequest() when the upgrade request starts
  • Call __endHonoHttpRequest() when the upgrade completes
  • Then #activeRawWebSockets tracking takes over

This seems correct, but verify that there's no gap between __endHonoHttpRequest() and the websocket being added to #activeRawWebSockets where the actor could sleep.

Location: instance.ts:1563 where websocket is added to tracking.
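A toy model of the handoff makes that check concrete. It assumes sleep is allowed only when both the HTTP request count and the raw WebSocket set are empty. SleepState and the two upgrade functions are hypothetical. Registering the socket before the middleware's finally block runs leaves no sleepable window, while registering it afterwards opens one.

```typescript
// Hypothetical model of the two trackers mentioned above, not RivetKit's code.
class SleepState {
    activeHonoHttpRequests = 0;
    readonly activeRawWebSockets = new Set<string>();
    // canSleep() after every transition, so a sleepable gap shows up as `true`.
    readonly history: boolean[] = [];

    canSleep(): boolean {
        return this.activeHonoHttpRequests === 0 && this.activeRawWebSockets.size === 0;
    }

    beginHttp(): void { this.activeHonoHttpRequests++; this.#record(); }
    endHttp(): void { this.activeHonoHttpRequests--; this.#record(); }
    addWebSocket(id: string): void { this.activeRawWebSockets.add(id); this.#record(); }

    #record(): void { this.history.push(this.canSleep()); }
}

// Socket registered inside the handler, before the middleware's finally
// ends the HTTP request: canSleep() is never true during the handoff.
function upgradeRegisteringFirst(state: SleepState): void {
    state.beginHttp();
    try {
        state.addWebSocket("ws-1");
    } finally {
        state.endHttp();
    }
}

// Socket registered only after the HTTP request has ended (for example on a
// later "open" callback): canSleep() is briefly true in between.
function upgradeRegisteringLater(state: SleepState): void {
    state.beginHttp();
    state.endHttp();
    state.addWebSocket("ws-1");
}
```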

3. Inconsistent Error Removal (Minor)

At instance.ts:1530-1542, the error handling in handleFetch() now differs from the previous version. Previously, both success and error paths would trigger sleep timer reset via the finally block. Now there's no sleep timer interaction at all.

Question: Is this intentional? The new middleware-based approach should handle it, but worth confirming that raw fetch handlers (if any exist outside the router) are covered.

4. Missing Test Coverage (Medium Priority)

The PR doesn't add tests for:

  • Inspector requests resetting the sleep timer
  • HTTP requests preventing sleep
  • The counter not going below zero

Suggestion: Add integration tests to verify the fix works as intended.

5. Typo in Log Message (Minor)

At instance.ts:1655, there's a copy-paste error in the log message:

this.#rLog.debug({
    msg: "actor instance onSocketMessage",  // Should be "onSocketClosed"
    rivetRequestId,
    // ...
});

Note: This existed before this PR, but worth fixing.

Performance Considerations

  • The middleware adds minimal overhead (one counter increment/decrement per request)
  • No significant performance concerns

Security Considerations

  • No security issues identified
  • The guard that keeps the counter from going below zero (underflow) is good defensive programming

Style & Conventions

✅ Follows Rivet conventions:

  • Uses structured logging with tracing
  • Lowercase log messages
  • Private fields with # prefix
  • Good JSDoc comments

Recommendations

  1. Add tests for the inspector sleep timer bug fix
  2. Verify WebSocket upgrade timing to ensure no sleep gaps
  3. Document the middleware behavior regarding in-flight sleep calls
  4. Consider: Should handleFetch() for raw fetch handlers still interact with sleep timer, or is it fully replaced by middleware?

Verdict

Approve with minor suggestions. The core fix is sound and addresses the reported issue. The recommendations above are mostly about edge cases and test coverage improvements.

🤖 Generated with Claude Code

graphite-app bot pushed a commit that referenced this pull request Nov 4, 2025
…, fixes inspector requests not resetiting sleep timer (#3319)
@graphite-app graphite-app bot closed this Nov 4, 2025
@graphite-app graphite-app bot deleted the 11-02-fix_rivetkit_prevent_sleeping_if_there_are_any_action_hono_requests_fixes_inspector_requests_not_resetiting_sleep_timer branch November 4, 2025 20:52


3 participants