@NathanFlurry
Member

No description provided.

@vercel

vercel bot commented Oct 31, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Ready Ready Preview Comment Nov 4, 2025 8:26pm
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 4, 2025 8:26pm
rivet-inspector Ignored Ignored Preview Nov 4, 2025 8:26pm
rivet-site Ignored Ignored Preview Nov 4, 2025 8:26pm

Member Author

NathanFlurry commented Oct 31, 2025

Warning

This pull request is not mergeable via GitHub because a downstack PR is open. Once all requirements are satisfied, merge this PR as a stack on Graphite.


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude

claude bot commented Oct 31, 2025

PR Review: WebSocket Hibernation Integration

I've reviewed this PR and found the implementation to be well-structured overall. Here are my findings organized by category:

✅ Strengths

  1. Backward Compatibility: The schema migration from v1 to v2 is handled properly with an empty array default for hibernatableWebSocket, ensuring existing persisted actors can be deserialized without issues.

  2. Clean Architecture: The separation of concerns between the driver, instance, and connection layers is maintained well. The PERSIST_SYMBOL approach for accessing internal state is a good pattern.

  3. Logging: Good use of structured logging throughout, especially in the hibernation lifecycle tracking.

🐛 Critical Issues

1. Race Condition in getActorHibernationConfig (actor-driver.ts:259-264)

Severity: HIGH

The function always pushes a new hibernatable WebSocket entry to the array, even when one already exists:

// Save hibernatable WebSocket
handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
    requestId,
    lastSeenTimestamp: BigInt(Date.now()),
    msgIndex: -1n,
});

This happens AFTER checking for existingWs. If a WebSocket reconnects or this function is called multiple times for the same requestId, you'll accumulate duplicate entries.

Recommendation: Only push if !existingWs:

if (!existingWs) {
    handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
        requestId,
        lastSeenTimestamp: BigInt(Date.now()),
        msgIndex: -1n,
    });
}

2. Inconsistent Sleep Logic (instance.ts:1912)

Severity: MEDIUM

There's a contradiction in #canSleep():

// Line 1904: Allow hibernatable connections to sleep
if (conn.status === "connected" && !conn.isHibernatable)
    return false;

// Line 1912: Block sleep for ANY raw websockets
if (this.#activeRawWebSockets.size > 0) return false;

If raw WebSockets can be hibernatable, they should be removed from #activeRawWebSockets when hibernating, or the check at line 1912 should account for hibernatable websockets.

Recommendation: Filter hibernatable WebSockets or remove them from the set:

// Count only non-hibernatable raw websockets
const nonHibernatableWsCount = Array.from(this.#activeRawWebSockets)
    .filter(ws => !(ws as any).isHibernatable)
    .length;
if (nonHibernatableWsCount > 0) return false;

3. Missing Event Listener Cleanup on Error (instance.ts:1651-1657)

Severity: MEDIUM

In onSocketOpened, if an error occurs during event processing, the listeners are added but there's no cleanup path besides the close handler. If the socket fails to open properly, listeners could remain attached.

Recommendation: Wrap the event attachment in try-catch and ensure cleanup on error.
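
One possible shape for that cleanup path, as a minimal sketch: attach listeners through a helper that records what it attached and detaches everything if attachment throws. `ListenerTarget` and `attachAll` are hypothetical names for illustration, not RivetKit APIs.

```typescript
// Hypothetical minimal shape of an event source; not RivetKit's actual type.
interface ListenerTarget {
	addEventListener(type: string, listener: (event: unknown) => void): void;
	removeEventListener(type: string, listener: (event: unknown) => void): void;
}

type Handlers = Record<string, (event: unknown) => void>;

// Attaches every handler and returns a dispose function.
// If any attachment throws, handlers attached so far are removed before rethrowing.
function attachAll(target: ListenerTarget, handlers: Handlers): () => void {
	const attached: Array<[string, (event: unknown) => void]> = [];
	const dispose = (): void => {
		for (const [type, listener] of attached) {
			target.removeEventListener(type, listener);
		}
		attached.length = 0;
	};
	try {
		for (const [type, listener] of Object.entries(handlers)) {
			target.addEventListener(type, listener);
			attached.push([type, listener]);
		}
	} catch (error) {
		dispose();
		throw error;
	}
	return dispose;
}
```

The returned dispose function can also be called from the close handler, so both the normal and the failure path go through the same removal code.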

⚠️ Potential Issues

4. Index Out of Bounds (instance.ts:1605-1630)

Severity: LOW-MEDIUM

When removing hibernatable WebSockets on close:

const wsIndex = this.#persist.hibernatableWebSocket.findIndex((ws) =>
    arrayBuffersEqual(ws.requestId, rivetRequestIdLocal),
);

const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);

If findIndex returns -1 (not found), splice(-1, 1) will remove the last element instead of failing safely.

Recommendation: Check for -1:

if (wsIndex !== -1) {
    const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);
    // ... logging
} else {
    this.#rLog.warn({
        msg: "could not find hibernatable websocket to remove",
        rivetRequestId,
    });
}
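
The splice behaviour is easy to confirm in isolation. The snippet below is a standalone demonstration with made-up string entries, followed by a guarded helper (`removeFirstMatch` is a hypothetical name, not code from the PR).

```typescript
// Standalone demonstration: splice treats a negative start as an offset from the end.
const sockets = ["ws-a", "ws-b", "ws-c"];
const notFound = sockets.findIndex((id) => id === "ws-missing"); // -1
const removedByMistake = sockets.splice(notFound, 1); // removes "ws-c"

// Guarded variant (hypothetical helper): remove only when the predicate matches.
function removeFirstMatch<T>(items: T[], matches: (item: T) => boolean): T | undefined {
	const index = items.findIndex(matches);
	if (index === -1) return undefined;
	return items.splice(index, 1)[0];
}

const guardedResult = removeFirstMatch(sockets, (id) => id === "ws-missing");
console.log(removedByMistake); // ["ws-c"]
console.log(guardedResult); // undefined
console.log(sockets); // ["ws-a", "ws-b"]
```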

5. BigInt Precision Loss (actor-driver.ts:191)

Severity: LOW

Converting msgIndex from BigInt to Number could cause precision loss for very large values:

lastMsgIndex: Number(existingWs.msgIndex),

While unlikely to hit JavaScript's safe integer limit in practice, it's worth documenting or adding a safe conversion check.
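
A safe conversion check could look roughly like this. `bigintToSafeNumber` is a hypothetical helper name, and throwing on overflow is one assumed policy among several (clamping or keeping the value as bigint are alternatives).

```typescript
// Hypothetical helper: converts a bigint to a number only when the value
// is exactly representable as a JavaScript number, and throws otherwise.
function bigintToSafeNumber(value: bigint): number {
	const max = BigInt(Number.MAX_SAFE_INTEGER);
	if (value > max || value < -max) {
		throw new RangeError(`value ${value} is outside the safe integer range`);
	}
	return Number(value);
}

console.log(bigintToSafeNumber(BigInt(-1))); // -1
```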

6. Missing rivetMessageIndex Validation (instance.ts:1586-1587)

Severity: LOW

The code assumes event.rivetMessageIndex exists and is valid:

persistedHibernatableWebSocket.msgIndex = BigInt(event.rivetMessageIndex);

If this value is undefined, BigInt(undefined) throws a TypeError; a non-integer number such as 1.5 throws a RangeError.

Recommendation: Add validation:

if (typeof event.rivetMessageIndex === 'number') {
    persistedHibernatableWebSocket.msgIndex = BigInt(event.rivetMessageIndex);
}
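
A slightly stricter variant of the same check, sketched as a standalone helper. `parseMessageIndex` is a hypothetical name, and rejecting negative values is an assumption about what the engine runner sends.

```typescript
// Hypothetical helper: returns a bigint message index for valid input, or
// undefined when the value is missing or is not a non-negative safe integer.
function parseMessageIndex(value: unknown): bigint | undefined {
	if (typeof value !== "number") return undefined;
	if (!Number.isSafeInteger(value) || value < 0) return undefined;
	return BigInt(value);
}

console.log(parseMessageIndex(5)); // 5n
console.log(parseMessageIndex(undefined)); // undefined
console.log(parseMessageIndex(1.5)); // undefined
```

A caller would then update msgIndex only when the helper returns a value, and could log the rejected input otherwise.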

📝 Code Quality Suggestions

7. Type Safety (conn-drivers.ts:161)

Severity: LOW

The type assertion here is fragile:

const raw = state.websocket.raw as HonoWebSocketAdapter;
if (typeof raw.isHibernatable === "boolean") {
    return raw.isHibernatable;
}

Recommendation: Use a type guard function or check for the property more safely.
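
A type guard along these lines would avoid the cast. This is a sketch only: `MaybeHibernatable`, `hasHibernatableFlag`, and `readIsHibernatable` are hypothetical names, and defaulting to false when the flag is absent is an assumption.

```typescript
// Hypothetical structural type; the adapter in the PR is HonoWebSocketAdapter.
interface MaybeHibernatable {
	isHibernatable: boolean;
}

// Narrows an unknown value to one that carries a boolean isHibernatable flag.
function hasHibernatableFlag(value: unknown): value is MaybeHibernatable {
	return (
		typeof value === "object" &&
		value !== null &&
		"isHibernatable" in value &&
		typeof (value as { isHibernatable: unknown }).isHibernatable === "boolean"
	);
}

// Assumed default: treat sockets without the flag as non-hibernatable.
function readIsHibernatable(raw: unknown): boolean {
	return hasHibernatableFlag(raw) ? raw.isHibernatable : false;
}
```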

8. Typo in Configuration Key (config.ts:73)

Severity: LOW

The configuration key has a typo: canHibernatWebSocket should be canHibernateWebSocket (missing 'e').

Recommendation: Fix the typo for better readability, though this may require migration if already in use.

9. Log Message Typo (instance.ts:1633)

Severity: TRIVIAL

msg: "actor instance onSocketMessage",  // Should be "onSocketClosed"

🧪 Test Coverage Recommendations

This feature would benefit from tests covering:

  1. Hibernation lifecycle: Actor hibernates with active hibernatable WebSocket, then resumes
  2. Message index tracking: Verify msgIndex is correctly incremented and persisted
  3. Duplicate requestId handling: Ensure duplicate WebSocket entries don't accumulate
  4. Race conditions: Concurrent WebSocket open/close operations
  5. Migration: v1 → v2 schema migration with existing actors
  6. Error cases: Invalid requestId, missing rivetMessageIndex, etc.

📚 Documentation Needs

  • Document the @experimental WebSocket hibernation feature
  • Add JSDoc comments explaining the hibernation lifecycle
  • Document the canHibernatWebSocket configuration option with examples

Summary

This is a solid implementation of a complex feature. The critical issues are:

  1. Fix the duplicate entry bug in getActorHibernationConfig
  2. Resolve the sleep logic inconsistency for raw WebSockets
  3. Add bounds checking for array splice operations

The rest are minor improvements that would increase robustness. The schema versioning and backward compatibility are handled well.

Overall Assessment: Approve with requested changes for the critical issues.

@pkg-pr-new

pkg-pr-new bot commented Oct 31, 2025

More templates

@rivetkit/actor

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/actor@3301

@rivetkit/cloudflare-workers

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/cloudflare-workers@3301

@rivetkit/core

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/core@3301

@rivetkit/db

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/db@3301

@rivetkit/framework-base

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/framework-base@3301

@rivetkit/next-js

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/next-js@3301

@rivetkit/react

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/react@3301

rivetkit

pnpm add https://pkg.pr.new/rivet-dev/rivet/rivetkit@3301

@rivetkit/sql-loader

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/sql-loader@3301

@rivetkit/engine-runner

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner@3301

@rivetkit/engine-runner-protocol

pnpm add https://pkg.pr.new/rivet-dev/rivet/@rivetkit/engine-runner-protocol@3301

commit: e0a1d01

@NathanFlurry NathanFlurry changed the base branch from main to graphite-base/3301 October 31, 2025 23:23
@NathanFlurry NathanFlurry force-pushed the 10-31-feat_rivetkit_integrate_websocket_hibernation branch from d6e9414 to 60414c4 Compare October 31, 2025 23:23
@NathanFlurry NathanFlurry changed the base branch from graphite-base/3301 to 10-31-fix_use_correct_vbare_functions_for_runner_tunnel October 31, 2025 23:23
@claude

claude bot commented Oct 31, 2025

PR Review: WebSocket Hibernation Integration

Summary

This PR implements WebSocket hibernation support in RivetKit, allowing WebSocket connections to persist across actor sleep/wake cycles. This is a significant enhancement for long-lived WebSocket connections and reduces resource consumption.

Code Quality & Best Practices

✅ Strengths

  1. Well-structured schema versioning: New v2.bare schema properly extends the persistence model
  2. Proper TypeScript typing: Added appropriate types and interfaces for hibernatable WebSocket tracking
  3. Defensive programming: Good use of optional chaining and validation (e.g., checking if actor/handler exists)
  4. Comprehensive logging: Debug logs added at key points for observability
  5. Clean separation of concerns: Connection driver abstraction properly extended with isHibernatable method

🔍 Areas for Improvement

1. Missing Index Validation (Bug Risk)

// rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1604-1615
const wsIndex = this.#persist.hibernatableWebSocket.findIndex((ws) =>
    arrayBuffersEqual(ws.requestId, rivetRequestIdLocal),
);

const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);

Issue: If findIndex returns -1 (not found), splice(-1, 1) will remove the last element incorrectly.

Recommendation:

if (wsIndex !== -1) {
    const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);
    // ... log removed
} else {
    this.#rLog.warn({ msg: "could not find hibernatable websocket to remove" });
}

2. Race Condition in WebSocket Event Tracking

// rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1551-1578
const onSocketOpened = (event: any) => {
    rivetRequestId = event?.rivetRequestId;
    persistedHibernatableWebSocket = this.#persist.hibernatableWebSocket.find(...)
}

Issue: The WebSocket is added to hibernatableWebSocket in getActorHibernationConfig (in driver) before the onSocketOpened handler fires. However, if the socket opens before the handler is registered, the find operation might not locate the newly-added entry.

Recommendation: Consider synchronization or ensure the hibernatable WS is added before any events can fire.

3. ArrayBuffer Comparison Performance

// rivetkit-typescript/packages/rivetkit/src/utils.ts:252-265
export function arrayBuffersEqual(buf1: ArrayBuffer, buf2: ArrayBuffer): boolean {
    if (buf1.byteLength !== buf2.byteLength) return false;
    const view1 = new Uint8Array(buf1);
    const view2 = new Uint8Array(buf2);
    for (let i = 0; i < view1.length; i++) {
        if (view1[i] !== view2[i]) return false;
    }
    return true;
}

Issue: O(n) comparison called in multiple find operations could be expensive with many hibernatable WebSockets.

Recommendation: Consider using a Map keyed by base64/hex string representation of requestId for O(1) lookup.
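
As a rough sketch of that idea, assuming the entry shape shown elsewhere in this review: `HibernatableEntry`, `toHexKey`, and `indexByRequestId` are hypothetical names, and the Map here is an in-memory index that would need to be kept in sync with the persisted array.

```typescript
// Hypothetical entry shape, modeled on the fields quoted in this review.
interface HibernatableEntry {
	requestId: ArrayBuffer;
	lastSeenTimestamp: bigint;
	msgIndex: bigint;
}

// Encodes the bytes of a buffer as a lowercase hex string for use as a Map key.
function toHexKey(buf: ArrayBuffer): string {
	const bytes = new Uint8Array(buf);
	let key = "";
	for (let i = 0; i < bytes.length; i++) {
		key += bytes[i].toString(16).padStart(2, "0");
	}
	return key;
}

// Builds a lookup index from the persisted array.
function indexByRequestId(entries: HibernatableEntry[]): Map<string, HibernatableEntry> {
	const index = new Map<string, HibernatableEntry>();
	for (const entry of entries) {
		index.set(toHexKey(entry.requestId), entry);
	}
	return index;
}
```

Building the key still costs time proportional to the length of the requestId, so the gain is in avoiding a scan over every entry, not in the per-ID work.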

4. Sleep Logic Inconsistency

// rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1904-1905
if (conn.status === "connected" && !conn.isHibernatable)
    return false;

Issue: Active raw WebSockets prevent sleep (line 1912), even if they're hibernatable. This seems inconsistent.

Question: Should hibernatable raw WebSockets also be exempted from the raw WebSocket check at line 1912?

5. Error Handling in canHibernatWebSocket Callback

// rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:223-238
try {
    const canHibernate = canHibernatWebSocket(truncatedRequest);
    hibernationConfig = { enabled: canHibernate, lastMsgIndex: undefined };
} catch (error) {
    logger().error({ msg: "error calling canHibernatWebSocket", error });
    hibernationConfig = { enabled: false, lastMsgIndex: undefined };
}

Good: Catches errors but could be more informative about which actor/path failed.

Security Concerns

⚠️ Medium Priority

  1. No validation of rivetRequestId uniqueness: The system assumes requestIds are unique but doesn't validate. If the engine runner provides duplicate requestIds, the hibernation system could track the wrong WebSocket.

  2. ArrayBuffer persistence: requestId is stored as raw data in BARE schema. Ensure the engine runner provides cryptographically random requestIds to prevent prediction attacks.

  3. Missing bounds on hibernatableWebSocket array: No limit on how many hibernatable WebSockets can be persisted per actor. Could lead to unbounded memory growth.

Recommendation: Add a configurable limit (e.g., 100 hibernatable WebSockets per actor).

Performance Considerations

  1. Linear search operations: Multiple find/findIndex operations on hibernatableWebSocket array. Consider indexing by requestId.

  2. BigInt operations: Using BigInt for timestamps and message indices is appropriate but ensure serialization/deserialization is efficient.

  3. Frequent state updates: Every WebSocket message updates lastSeenTimestamp and msgIndex, triggering persistence. This could be costly if messages are frequent.

Recommendation: Consider batching state updates or implementing a debounce mechanism.

Test Coverage

❌ Missing Tests

Critical: No test files found for WebSocket hibernation functionality. This is a complex feature that requires comprehensive testing:

  1. Unit tests needed:

    • arrayBuffersEqual function
    • Hibernatable WebSocket lifecycle (add/update/remove)
    • canHibernatWebSocket configuration (boolean and function)
  2. Integration tests needed:

    • WebSocket survives actor sleep/wake cycle
    • Message replay after hibernation
    • Concurrent hibernatable WebSockets
    • Non-hibernatable WebSocket behavior unchanged
  3. Edge case tests:

    • WebSocket closes before hibernation
    • Actor stops with active hibernatable WebSockets
    • Corrupted/missing requestId handling

Recommendation: Add test suite in tests/websocket-hibernation.test.ts before merging.

Migration & Compatibility

✅ Good Version Management

  • New schema version (v2.bare) properly extends v1
  • Backward compatibility appears maintained
  • Version bump in package.json (2.0.22-rc.1)

⚠️ Migration Path Unclear

  • How do existing persisted actors migrate from v1 to v2 schema?
  • Does the system automatically add empty hibernatableWebSocket: [] for v1 actors?

Recommendation: Document migration strategy or add automatic migration logic.
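
For illustration, automatic migration logic could be as small as the sketch below. The types are hypothetical stand-ins for the generated v1/v2 BARE types, and `migrateV1ToV2` is an assumed name.

```typescript
// Hypothetical shapes; the real types would come from the v1 and v2 BARE schemas.
interface PersistedV1 {
	state: unknown;
}

interface PersistedHibernatableWs {
	requestId: ArrayBuffer;
	lastSeenTimestamp: bigint;
	msgIndex: bigint;
}

interface PersistedV2 extends PersistedV1 {
	hibernatableWebSocket: PersistedHibernatableWs[];
}

// Carries every v1 field over unchanged and starts with no hibernatable WebSockets.
function migrateV1ToV2(v1: PersistedV1): PersistedV2 {
	return { ...v1, hibernatableWebSocket: [] };
}
```

Because the new field defaults to an empty array, actors persisted under v1 would deserialize to a valid v2 value without needing any stored data to be rewritten up front.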

Additional Comments

  1. Experimental API: Good use of @experimental JSDoc tags for new APIs
  2. TypeScript typing: Consider making rivetRequestId and rivetMessageIndex strongly typed (e.g., Uint8Array or branded type) instead of any
  3. Documentation: Missing inline documentation for complex logic in getActorHibernationConfig

Recommendations Summary

Must Fix Before Merge:

  1. ✅ Fix index validation bug in WebSocket removal (line 1604)
  2. ✅ Add comprehensive test coverage
  3. ✅ Document migration path for schema v1 → v2

Should Fix:

  1. Add bounds checking for hibernatableWebSocket array size
  2. Optimize arrayBuffersEqual with Map-based lookup
  3. Clarify sleep behavior with hibernatable raw WebSockets
  4. Add migration logic or documentation

Nice to Have:

  1. Debounce state updates for high-frequency messages
  2. Stronger typing for Rivet-specific fields
  3. Inline documentation for complex hibernation logic

Overall Assessment

This is a well-architected feature that extends the actor persistence model cleanly. The main concerns are:

  • Critical bug in WebSocket removal logic
  • Missing test coverage for a complex feature
  • Potential performance issues with linear searches

The code follows the repository's conventions well (as per CLAUDE.md), but needs tests and the bug fix before it's production-ready.


Review generated by Claude Code

@claude

claude bot commented Oct 31, 2025

Pull Request Review: WebSocket Hibernation Integration

I've completed a thorough review of this PR. Overall, the implementation is well-structured and follows the codebase conventions. Below are my findings:


Strengths

  1. Schema Versioning: Proper migration from v1 to v2 with a clean migration function that adds hibernatableWebSocket: [] (actor-driver.ts:1541-1621, versioned.ts:22-29)

  2. Structured Logging: Excellent use of structured logging throughout. Examples:

    • Fields are logged as structured key/value pairs next to the message string (for example rivetRequestId and isHibernatable alongside "actor instance onSocketOpened")
    • All log messages are lowercase per CLAUDE.md conventions
  3. Configuration Design: The canHibernatWebSocket config option supports both boolean and function types, providing flexibility (config.ts:73-80)

  4. Error Handling: Good defensive error handling with fallback to enabled: false when errors occur (actor-driver.ts:231-240)

  5. Clean Separation: The PERSIST_SYMBOL approach provides a nice encapsulation of persistence internals


🔍 Issues & Concerns

1. Critical: Potential Memory Leak in Hibernation Config (HIGH PRIORITY)

Location: actor-driver.ts:259-264

// Save hibernatable WebSocket
handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
    requestId,
    lastSeenTimestamp: BigInt(Date.now()),
    msgIndex: -1n,
});

Problem: This push() happens every time getActorHibernationConfig is called, even for existing WebSockets. This means:

  • If a WebSocket reconnects multiple times, you'll have duplicate entries
  • The array grows unbounded until cleanup happens in onSocketClosed
  • There's a race condition if getActorHibernationConfig is called twice before the socket fully opens

Fix: Only push if !existingWs:

if (!existingWs) {
    handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
        requestId,
        lastSeenTimestamp: BigInt(Date.now()),
        msgIndex: -1n,
    });
}

2. Race Condition: WebSocket Cleanup (MEDIUM PRIORITY)

Location: instance.ts:1598-1618

The onSocketClosed handler tries to remove from hibernatableWebSocket array, but:

  • The rivetRequestId may be undefined if the socket closed before onSocketOpened fired
  • findIndex then returns -1, and splice(-1, 1) removes the last element of the array instead of being a no-op

Suggestion: Add explicit check:

if (rivetRequestId) {
    const wsIndex = this.#persist.hibernatableWebSocket.findIndex((ws) =>
        arrayBuffersEqual(ws.requestId, rivetRequestId)
    );
    if (wsIndex !== -1) {
        this.#persist.hibernatableWebSocket.splice(wsIndex, 1);
        this.#rLog.debug({ msg: "removed hibernatable websocket", ... });
    }
} else {
    this.#rLog.debug({ msg: "socket closed without rivetRequestId" });
}

3. Timing Issue: Event Listener Registration (LOW PRIORITY)

Location: instance.ts:1541-1621

The event listeners are registered after the websocket might already be open. In the current flow:

  1. WebSocket is created and may immediately fire events
  2. Then listeners are registered

Potential issue: If the underlying WebSocket implementation fires open synchronously, the onSocketOpened handler might miss it.

Suggestion: Consider registering listeners before any WebSocket operations, or document this assumption clearly.


4. BigInt Precision Loss (LOW-MEDIUM PRIORITY)

Location: actor-driver.ts:191

lastMsgIndex: Number(existingWs.msgIndex),

Converting bigint to number can lose precision for very large values (> 2^53 - 1). While unlikely in practice for message indices, this could be problematic for long-running connections with billions of messages.

Suggestion: Either:

  • Keep as bigint in the HibernationConfig interface, or
  • Document the assumption that message indices won't exceed Number.MAX_SAFE_INTEGER

5. Inconsistent Null Handling

Location: instance.ts:1580

const rivetRequestId = event?.rivetRequestId;

Uses optional chaining, so rivetRequestId may be undefined, but later code in onSocketMessage and onSocketClosed uses it without an explicit check.

Suggestion: Make the undefined case more explicit earlier:

if (!rivetRequestId) {
    this.#rLog.debug({ msg: "websocket opened without rivetRequestId" });
    return;
}

🎯 Performance Considerations

  1. arrayBuffersEqual Performance: The byte-by-byte comparison in arrayBuffersEqual (utils.ts:252-265) already exits on the first mismatch, so it is only a concern if request IDs are consistently large. In that case, comparing wider words through a DataView would cut the number of loop iterations.

  2. Frequent Array Searches: Multiple .find() and .findIndex() operations on hibernatableWebSocket array. If this grows large, consider using a Map<string, PersistedHibernatableWebSocket> with a hex-encoded requestId as key.


🔒 Security Considerations

No major security concerns identified. The implementation:

  • Doesn't expose sensitive data in logs
  • Properly validates configuration before enabling hibernation
  • Has appropriate error boundaries

📝 Code Style & Conventions

✅ Follows CLAUDE.md conventions:

  • Uses workspace dependencies correctly
  • Structured logging with lowercase messages
  • Hard tabs in formatting
  • Proper error handling with anyhow

⚠️ Minor: Some log messages could be more descriptive:

  • instance.ts:1631 - "actor instance onSocketMessage" could be "websocket message received"

🧪 Test Coverage

Question: Are there tests for:

  • WebSocket reconnection scenarios?
  • Hibernation state persistence across actor restarts?
  • The migration from v1 to v2 schema?
  • Edge cases like duplicate requestIds or concurrent socket operations?

These would be valuable to prevent regressions.


📋 Summary

Must Fix Before Merge:

  1. Memory leak in getActorHibernationConfig (Issue #1)

Should Fix Before Merge:

  2. Race condition in WebSocket cleanup (Issue #2)
  3. BigInt precision loss (Issue #4)

Nice to Have:

  4. Event listener timing (Issue #3)
  5. Inconsistent null handling (Issue #5)
  6. Performance optimizations for array searches

Overall, this is solid work! The main concern is the memory leak in Issue #1, which should be addressed before merging. The rest are lower priority but would improve robustness.

Let me know if you'd like me to clarify any of these points or if you'd like help implementing the fixes!

@claude

claude bot commented Nov 1, 2025

PR Review: WebSocket Hibernation Integration

Summary

This PR implements WebSocket hibernation support for RivetKit actors, allowing WebSocket connections to persist across actor sleep/wake cycles. The implementation adds tracking for hibernatable WebSockets and integrates this with the actor persistence layer.

Code Quality & Architecture

Strengths

  1. Well-structured persistence schema: The new v2 BARE schema cleanly adds hibernatableWebSocket tracking with proper migration from v1
  2. Clean separation of concerns: The hibernation logic is properly distributed across driver, connection, and instance layers
  3. Good use of TypeScript: Proper typing throughout with experimental annotations for new features
  4. Follows repository patterns: Uses structured logging correctly (lowercase messages with structured fields)

Issues & Concerns

🔴 Critical: Missing Index Validation

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1612-1615

const wsIndex = this.#persist.hibernatableWebSocket.findIndex((ws) =>
    arrayBuffersEqual(ws.requestId, rivetRequestIdLocal),
);
const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);

Problem: findIndex returns -1 when not found, but splice(-1, 1) removes the last element instead of being a no-op. This could silently corrupt hibernation state.

Fix: Add validation:

if (wsIndex !== -1) {
    const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);
    // ... logging
} else {
    this.#rLog.warn({
        msg: "could not find hibernatable websocket to remove",
        rivetRequestId,
    });
}

🟡 Potential Bug: Race Condition in Event Handlers

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1551-1578

The onSocketOpened handler searches for an existing hibernatable WebSocket, but this assumes the WebSocket was already added to this.#persist.hibernatableWebSocket before the open event fires. However, in actor-driver.ts:258, the WebSocket is added to the persist array inside getActorHibernationConfig, which is called from the engine runner tunnel.

Concern: Depending on timing, the WebSocket might not be in the array yet when onSocketOpened fires. The code handles this gracefully by checking if (persistedHibernatableWebSocket), but the flow is confusing.

Recommendation: Add a comment explaining the expected timing or consider refactoring to make the flow more explicit.

🟡 Type Safety: Missing Null Check

Location: rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1586-1587

persistedHibernatableWebSocket.msgIndex = BigInt(event.rivetMessageIndex);

Issue: event.rivetMessageIndex could be undefined if the event doesn't come from the engine runner. BigInt(undefined) throws a TypeError, so the handler would fail instead of updating msgIndex.

Fix: Add validation:

if (event.rivetMessageIndex !== undefined) {
    persistedHibernatableWebSocket.msgIndex = BigInt(event.rivetMessageIndex);
}

🟡 Code Organization: Extra Blank Lines

Location: engine/sdks/typescript/runner/src/tunnel.ts:577

adapter._handleOpen(requestId);


// Call websocket handler

Two blank lines are unusual for this codebase. Consider removing one.

🔵 Typo in Config Property Name

Location: rivetkit-typescript/packages/rivetkit/src/actor/config.ts:73

canHibernatWebSocket

Should be canHibernateWebSocket (missing 'e' in 'Hibernate'). However, this is marked as @experimental, so it's acceptable to fix this in a follow-up if you want to avoid breaking changes during the experimental phase.

🔵 Performance: O(n) Array Comparisons

Location: Multiple locations using arrayBuffersEqual in findIndex

The implementation uses linear search through hibernatableWebSocket arrays. For typical use cases this is fine, but if actors have many hibernatable WebSockets, consider using a Map keyed by a string representation of requestId.

Current: O(n*m) where n = number of WebSockets, m = bytes in requestId
With Map: O(m) lookup time

This is likely premature optimization for most use cases, but worth considering if hibernatable WebSocket counts grow large.

Security

ArrayBuffer Equality Check

The arrayBuffersEqual implementation in utils.ts:252-265 is correct, but it returns on the first mismatched byte, so it is not a constant-time comparison. For cryptographic use cases, consider a constant-time comparison; for request IDs this is adequate.
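
For reference, a comparison that does not stop at the first differing byte could be sketched as follows. `constantTimeEqual` is a hypothetical name; JavaScript engines give no hard timing guarantees, so treat this as an illustration of the technique, not a vetted primitive.

```typescript
// Sketch of a comparison that always visits every byte, so its running time does
// not depend on the position of the first difference. The length check still
// returns early, which means the length itself is not hidden.
function constantTimeEqual(a: ArrayBuffer, b: ArrayBuffer): boolean {
	if (a.byteLength !== b.byteLength) return false;
	const viewA = new Uint8Array(a);
	const viewB = new Uint8Array(b);
	let diff = 0;
	for (let i = 0; i < viewA.length; i++) {
		// Accumulate differences with XOR/OR instead of branching per byte.
		diff |= viewA[i] ^ viewB[i];
	}
	return diff === 0;
}
```

On Node.js, `crypto.timingSafeEqual` is the usual choice when a timing-safe comparison is actually required.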

State Persistence

The persistence layer correctly tracks WebSocket state, but there's no validation that rivetRequestId is actually coming from a trusted source (the engine runner). This is acceptable if the runner is always trusted, but worth documenting.

Performance

  1. Throttled persistence: Good use of #savePersistThrottled() to batch state updates
  2. Event handler cleanup: Properly removes event listeners on close
  3. BigInt timestamps: Using BigInt for timestamps is appropriate; note that Date.now() provides millisecond, not microsecond, precision

Test Coverage

🔴 Major Gap: No tests found for the hibernation feature. The grep search found no test files covering:

  • canHibernatWebSocket configuration
  • hibernatableWebSocket array manipulation
  • Recovery after hibernation
  • Edge cases (missing requestId, message index tracking)

Recommendation: Add tests covering:

  1. Basic hibernation flow (sleep → wake → resume)
  2. WebSocket with canHibernatWebSocket: true
  3. WebSocket with canHibernatWebSocket: (req) => boolean function
  4. Multiple concurrent hibernatable WebSockets
  5. Cleanup when WebSocket closes
  6. Migration from v1 to v2 persistence schema

Best Practices

✅ Follows CLAUDE.md Guidelines

  • Uses structured logging correctly
  • Follows error handling patterns
  • Uses workspace dependencies
  • Proper TypeScript typing

✅ Schema Versioning

The migration from v1 to v2 is clean and follows the existing versioned data pattern.

✅ Experimental Features

Good use of @experimental annotations for the new hibernation API.

Recommendations

High Priority

  1. Fix the findIndex bug - this could corrupt state
  2. Add test coverage - especially for hibernation recovery scenarios
  3. Add null checks for rivetMessageIndex

Medium Priority

  1. Document the hibernation flow - especially timing of when WebSockets are added to the persist array
  2. Consider renaming canHibernatWebSocket to canHibernateWebSocket while still experimental

Low Priority

  1. Remove extra blank lines in tunnel.ts
  2. Consider Map-based lookup if hibernatable WebSocket counts become large

Conclusion

This is a solid implementation of an experimental feature. The main concern is the critical bug with findIndex that needs to be fixed before merge, and the lack of test coverage which should be addressed to ensure reliability. The architecture is sound and follows the codebase patterns well.

Overall: Approve with required changes ✅ (after fixing the findIndex bug)

@claude

claude bot commented Nov 1, 2025

PR Review: WebSocket Hibernation Integration

I've reviewed this PR implementing WebSocket hibernation support. Overall, the implementation is solid and follows good practices, but I've identified several issues that should be addressed.


Critical Issues

1. Race Condition in WebSocket Tracking (rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:259-264)

The getActorHibernationConfig function adds a new hibernatable WebSocket to the persisted state unconditionally at lines 259-264:

// Save hibernatable WebSocket
handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
    requestId,
    lastSeenTimestamp: BigInt(Date.now()),
    msgIndex: -1n,
});

Problem: This is called before determining if hibernation is actually enabled. If hibernationConfig.enabled is false, you're still adding the WebSocket to the persistence layer, creating a memory leak.

Impact: Non-hibernatable WebSockets will accumulate in the hibernatableWebSocket array and never be cleaned up (unless explicitly closed), wasting memory and storage.

Fix: Only add to hibernatableWebSocket when hibernationConfig.enabled === true.


2. Duplicate WebSocket Entries on Reconnection (rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:179-192)

When checking for existing WebSockets at lines 179-184:

const existingWs = handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.find((ws) =>
    arrayBuffersEqual(ws.requestId, requestId),
);

If an existingWs is found, the code returns the hibernation config BUT still executes line 260 which adds another entry with the same requestId.

Impact: Each reconnection will create a duplicate entry, leading to unbounded array growth.

Fix: Add early return after setting hibernationConfig for existing WebSockets, or use a flag to skip the push at the end.
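
One way to make the add idempotent is a get-or-create helper, sketched below with standalone types. `HibernatableWsEntry`, `sameBytes`, and `upsertHibernatableWs` are hypothetical names, and refreshing lastSeenTimestamp on an existing entry is an assumption about the desired behaviour.

```typescript
// Hypothetical entry shape, modeled on the fields quoted in this review.
interface HibernatableWsEntry {
	requestId: ArrayBuffer;
	lastSeenTimestamp: bigint;
	msgIndex: bigint;
}

// Byte-wise equality for two buffers.
function sameBytes(a: ArrayBuffer, b: ArrayBuffer): boolean {
	if (a.byteLength !== b.byteLength) return false;
	const viewA = new Uint8Array(a);
	const viewB = new Uint8Array(b);
	for (let i = 0; i < viewA.length; i++) {
		if (viewA[i] !== viewB[i]) return false;
	}
	return true;
}

// Returns the existing entry for requestId, or appends and returns a new one.
function upsertHibernatableWs(
	list: HibernatableWsEntry[],
	requestId: ArrayBuffer,
	nowMs: number,
): HibernatableWsEntry {
	const existing = list.find((ws) => sameBytes(ws.requestId, requestId));
	if (existing) {
		existing.lastSeenTimestamp = BigInt(nowMs); // assumed: refresh on reconnect
		return existing;
	}
	const created: HibernatableWsEntry = {
		requestId,
		lastSeenTimestamp: BigInt(nowMs),
		msgIndex: BigInt(-1),
	};
	list.push(created);
	return created;
}
```

Calling the helper twice with the same requestId leaves a single entry in the list, which is the property the fix is after. Whether to call it at all when hibernation is disabled is a separate check.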


3. msgIndex Not Updated on onSocketOpened (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1551-1578)

In onSocketOpened, when finding an existing hibernatable WebSocket (lines 1557-1568), only lastSeenTimestamp is updated:

if (persistedHibernatableWebSocket) {
    persistedHibernatableWebSocket.lastSeenTimestamp = BigInt(Date.now());
}

Problem: The msgIndex is not reset or validated. If this is a reconnection after hibernation, the actor should know what the last processed message index was, but this isn't being handled.

Impact: Message replay or loss on reconnection scenarios.

Recommendation: Clarify the intended behavior. Should msgIndex be preserved across reconnections, or should there be validation logic?


4. Potential Index Out of Bounds (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1604-1615)

The cleanup logic uses findIndex then splice:

const wsIndex = this.#persist.hibernatableWebSocket.findIndex((ws) =>
    arrayBuffersEqual(ws.requestId, rivetRequestIdLocal),
);

const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);

Problem: If findIndex returns -1 (not found), splice(-1, 1) will remove the last element from the array, which is incorrect behavior.

Impact: Wrong WebSocket entries get removed, corrupting state.

Fix: Check wsIndex !== -1 before calling splice.
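The guard can be sketched as a small generic helper. The name removeFirstMatch is hypothetical; the point is only that the not-found case returns early instead of reaching splice:

```typescript
// Remove the first element matching the predicate; do nothing when nothing matches.
// Returns the removed element, or undefined if there was no match.
function removeFirstMatch<T>(items: T[], matches: (item: T) => boolean): T | undefined {
    const index = items.findIndex(matches);
    // Without this check, splice(-1, 1) would remove the last element.
    if (index === -1) return undefined;
    return items.splice(index, 1)[0];
}
```

The caller can then log a warning when the result is undefined and a debug message otherwise.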


High Priority Issues

5. Sleep Logic Doesn't Consider Raw WebSockets (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1904-1912)

The #canSleep() method checks:

// Check for active conns. This will also cover active actions, since all actions have a connection.
for (const conn of this.#connections.values()) {
    if (conn.status === "connected" && !conn.isHibernatable)
        return false;
}

// Do not sleep if there are raw websockets open
if (this.#activeRawWebSockets.size > 0) return false;

Problem: The second check (this.#activeRawWebSockets.size > 0) prevents sleeping if any raw WebSocket is open, even if it's hibernatable. This defeats the purpose of WebSocket hibernation for raw WebSockets.

Impact: Actors with hibernatable raw WebSockets will never sleep.

Recommendation: Modify the logic to check if raw WebSockets are hibernatable before preventing sleep, similar to how connections are checked.
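One way this could look, as a sketch only. It assumes raw WebSockets expose an isHibernatable flag in the same way connections do, which the quoted code does not confirm, and the two interfaces below are hypothetical reductions of the real types:

```typescript
// Hypothetical minimal shapes; the actor's real connection and socket types differ.
interface SleepCheckConn {
    status: string;
    isHibernatable: boolean;
}
interface SleepCheckRawSocket {
    isHibernatable: boolean;
}

// An actor may sleep only if every open connection and raw socket can hibernate.
function canSleep(conns: SleepCheckConn[], rawSockets: SleepCheckRawSocket[]): boolean {
    for (const conn of conns) {
        if (conn.status === "connected" && !conn.isHibernatable) return false;
    }
    for (const ws of rawSockets) {
        if (!ws.isHibernatable) return false;
    }
    return true;
}
```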


6. Missing rivetMessageIndex Validation (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1580-1598)

In onSocketMessage, the code assumes event.rivetMessageIndex exists:

persistedHibernatableWebSocket.msgIndex = BigInt(event.rivetMessageIndex);

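Problem: There is no check that event.rivetMessageIndex is defined before it is converted.

Impact: BigInt(undefined) and BigInt(null) throw a TypeError, and BigInt of a non-integer number throws a RangeError, so a missing or malformed index raises inside the message handler.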

Fix: Add validation: event.rivetMessageIndex != null before using it.
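A sketch of a stricter conversion, assuming rivetMessageIndex arrives as a JavaScript number. The helper name toMsgIndex is hypothetical:

```typescript
// Convert an untrusted event field to a bigint message index.
// Returns undefined instead of throwing when the value is missing or not a safe integer.
function toMsgIndex(value: unknown): bigint | undefined {
    if (typeof value !== "number") return undefined;
    // BigInt() throws a RangeError for NaN, Infinity, and fractional numbers.
    if (!Number.isSafeInteger(value)) return undefined;
    return BigInt(value);
}
```

The handler can then update msgIndex only when the result is not undefined.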


Medium Priority Issues

7. Log Message Typo (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1633)

Line 1633 has incorrect log message:

this.#rLog.debug({
    msg: "actor instance onSocketMessage",  // Should be "onSocketClosed"
    rivetRequestId,
    isHibernatable: !!persistedHibernatableWebSocket,
    hibernatableWebSocketCount: this.#persist.hibernatableWebSocket.length,
});

Fix: Change to "actor instance onSocketClosed".


8. Type Safety Concern with PERSIST_SYMBOL (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:54-58)

The new PERSIST_SYMBOL getter exposes internal state:

get [PERSIST_SYMBOL](): PersistedActor<S, CP, CS, I> {
    return this.#persist;
}

Concern: This breaks encapsulation by exposing mutable internal state. External code (like in actor-driver.ts:180-183 and router-endpoints.ts:606-608) directly mutates this state.

Recommendation: Consider providing dedicated methods for managing hibernatable WebSockets instead of exposing raw state, or at least document that this is intentionally mutable.


9. Schema Migration Testing

The migration from v1 to v2 (rivetkit-typescript/packages/rivetkit/src/schemas/actor-persist/versioned.ts:23-29) looks correct:

migrations.set(
    1,
    (v1Data: v1.PersistedActor): v2.PersistedActor => ({
        ...v1Data,
        hibernatableWebSocket: [],
    }),
);

Recommendation: Ensure there are tests covering:

  • Migration from v1 to v2 with existing persisted actors
  • Serialization/deserialization of v2 format
  • Handling of hibernatable WebSockets across actor restarts
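A first migration check could be as small as the sketch below. The v1 and v2 interfaces are hypothetical, trimmed-down stand-ins for the generated schema types, and checkMigration stands in for whatever test runner the repository uses:

```typescript
// Hypothetical, trimmed-down versions of the v1 and v2 persisted shapes.
interface V1PersistedActor {
    state: string;
}
interface V2HibernatableWebSocket {
    requestId: ArrayBuffer;
    lastSeenTimestamp: bigint;
    msgIndex: bigint;
}
interface V2PersistedActor extends V1PersistedActor {
    hibernatableWebSocket: V2HibernatableWebSocket[];
}

// Same shape as the migration quoted above: copy v1 fields, start with no tracked sockets.
function migrateV1ToV2(v1Data: V1PersistedActor): V2PersistedActor {
    return { ...v1Data, hibernatableWebSocket: [] };
}

// Throws if the migration drops a v1 field or fails to add an empty array.
function checkMigration(): void {
    const v2 = migrateV1ToV2({ state: "saved" });
    if (v2.state !== "saved") {
        throw new Error("v1 fields must survive the migration");
    }
    if (!Array.isArray(v2.hibernatableWebSocket) || v2.hibernatableWebSocket.length !== 0) {
        throw new Error("hibernatableWebSocket must start as an empty array");
    }
}
```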

Minor Issues

10. Inconsistent Error Handling (rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:231-240)

The try-catch around the canHibernatWebSocket function call logs the error and then falls back to enabled: false:

} catch (error) {
    logger().error({
        msg: "error calling canHibernatWebSocket",
        error,
    });
    hibernationConfig = {
        enabled: false,
        lastMsgIndex: undefined,
    };
}

Recommendation: Consider if this should throw or if there should be metrics/alerts for configuration errors.


11. TypeScript Strict Mode Concerns

Multiple uses of any types:

  • onSocketOpened = (event: any) (instance.ts:1551)
  • onSocketMessage = (event: any) (instance.ts:1580)
  • onOpen: (evt: any, ws: any) (router-endpoints.ts:602)

Recommendation: Define proper types for these events to improve type safety, especially since you're accessing specific properties like rivetRequestId and rivetMessageIndex.
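As a rough illustration of what such types might look like: the interface name and the field types below are assumptions inferred from how the properties are used in the snippets above, not the PR's definitions.

```typescript
// Hypothetical event shape; both fields are optional because not every
// WebSocket event is guaranteed to carry them.
interface RivetWebSocketEvent {
    rivetRequestId?: ArrayBuffer;
    rivetMessageIndex?: number;
}

// Type guard that narrows an unknown event to one carrying a request ID.
function hasRivetRequestId(
    event: unknown,
): event is RivetWebSocketEvent & { rivetRequestId: ArrayBuffer } {
    return (
        typeof event === "object" &&
        event !== null &&
        (event as { rivetRequestId?: unknown }).rivetRequestId instanceof ArrayBuffer
    );
}
```

Handlers typed as (event: unknown) can call the guard first and then read rivetRequestId without a cast.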


Positive Observations

Good migration strategy: The v1→v2 schema migration is clean and backward compatible

Proper logging: Good use of structured logging throughout for debugging

Event-driven design: The event listener approach for tracking WebSocket lifecycle is clean

Configuration flexibility: The canHibernatWebSocket being either boolean or function is nice

Following conventions: Adheres to the codebase's error handling and logging patterns


Testing Recommendations

Consider adding tests for:

  1. WebSocket hibernation and wake-up cycles
  2. Multiple concurrent hibernatable WebSockets
  3. Non-hibernatable WebSockets mixed with hibernatable ones
  4. Edge cases: duplicate requestIds, missing event properties
  5. Actor sleep behavior with hibernatable connections
  6. Schema migration from v1 to v2
  7. Cleanup on WebSocket close/error

Summary

The feature implementation is well-structured, but there are critical bugs that need to be fixed before merging.

Overall direction is good - please address the critical issues and consider the recommendations for a more robust implementation.

@NathanFlurry NathanFlurry force-pushed the 10-31-fix_use_correct_vbare_functions_for_runner_tunnel branch from fc88080 to 479b46d Compare November 2, 2025 16:05
@NathanFlurry NathanFlurry force-pushed the 10-31-feat_rivetkit_integrate_websocket_hibernation branch from 9854433 to 554ce2a Compare November 2, 2025 16:05
@claude

claude bot commented Nov 2, 2025

PR Review: WebSocket Hibernation Integration

I've reviewed PR #3301 which adds WebSocket hibernation support to RivetKit. This is a significant feature that enables stateful WebSocket connections to persist across actor sleep/wake cycles. Here's my detailed feedback:

✅ Strengths

1. Well-Structured Schema Migration

  • Proper versioned schema migration from v1 to v2 in schemas/actor-persist/versioned.ts:23-29
  • Migration correctly initializes hibernatableWebSocket as empty array for backward compatibility
  • Clean separation of schema versions using BARE schema format

2. Comprehensive State Tracking

  • Tracking of WebSocket state including requestId, lastSeenTimestamp, and msgIndex in actor/instance.ts:1545-1598
  • Proper lifecycle management with event listeners for open, message, and close events
  • Integration with existing connection driver architecture

3. Good Logging & Observability

  • Debug logging at critical points for tracking hibernatable WebSocket state
  • Useful context in log messages (request IDs, message indices, hibernation status)

🔍 Issues & Concerns

Critical: Potential Memory Leak in getActorHibernationConfig

Location: drivers/engine/actor-driver.ts:259-264

// Save hibernatable WebSocket
handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
    requestId,
    lastSeenTimestamp: BigInt(Date.now()),
    msgIndex: -1n,
});

Problem: This code unconditionally pushes a new hibernatable WebSocket entry every time getActorHibernationConfig is called, even when handling an existing WebSocket. This happens because:

  1. Lines 179-184 check if the WebSocket already exists
  2. Lines 259-264 always push a new entry, regardless of whether one was found

Impact:

  • Memory leak: Array grows unbounded with duplicate entries
  • Data corruption: Multiple entries with same requestId can cause incorrect behavior
  • The cleanup logic in actor/instance.ts:1600-1633 won't prevent this since entries are added before WebSocket opens

Recommended Fix:

// Only save if this is a new WebSocket
if (!existingWs) {
    handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
        requestId,
        lastSeenTimestamp: BigInt(Date.now()),
        msgIndex: -1n,
    });
}

Medium: Race Condition in WebSocket Cleanup

Location: actor/instance.ts:1600-1633

The onSocketClosed handler uses findIndex to locate and remove the WebSocket from the array. However:

  1. If getActorHibernationConfig adds duplicates (see issue above), only the first match is removed
  2. The warning at line 1622 suggests this has been observed: "could not find hibernatable websocket to remove"

Recommendation: Consider using a Map keyed by requestId instead of an array for O(1) lookups and guaranteed uniqueness.
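A sketch of the Map-based approach. Because a Map compares ArrayBuffer keys by reference, the sketch keys on a hex string derived from the bytes. The class and function names are hypothetical, and the persisted schema stores an array, so a real implementation would still need to convert to and from that form when saving:

```typescript
interface TrackedWebSocket {
    requestId: ArrayBuffer;
    lastSeenTimestamp: bigint;
    msgIndex: bigint;
}

// Hex-encode the request ID so equal byte sequences map to the same key.
function requestIdKey(requestId: ArrayBuffer): string {
    let key = "";
    new Uint8Array(requestId).forEach((b) => {
        key += (b < 16 ? "0" : "") + b.toString(16);
    });
    return key;
}

class HibernatableWebSocketRegistry {
    private readonly entries = new Map<string, TrackedWebSocket>();

    // Adds the socket if it is not tracked yet; returns the tracked entry either way.
    track(requestId: ArrayBuffer, now: bigint): TrackedWebSocket {
        const key = requestIdKey(requestId);
        let entry = this.entries.get(key);
        if (!entry) {
            entry = { requestId, lastSeenTimestamp: now, msgIndex: BigInt(-1) };
            this.entries.set(key, entry);
        }
        return entry;
    }

    // Returns true if an entry was removed, false if the ID was unknown.
    remove(requestId: ArrayBuffer): boolean {
        return this.entries.delete(requestIdKey(requestId));
    }

    size(): number {
        return this.entries.size;
    }
}
```

Duplicates become impossible by construction, and removing an unknown ID is a no-op rather than a splice on the wrong index.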

Medium: Missing Error Handling in msgIndex Update

Location: actor/instance.ts:1586-1588

persistedHibernatableWebSocket.msgIndex = BigInt(
    event.rivetMessageIndex,
);

Problem: No validation that event.rivetMessageIndex is defined or a valid number before converting to BigInt.

Recommendation:

if (typeof event.rivetMessageIndex === 'number') {
    persistedHibernatableWebSocket.msgIndex = BigInt(event.rivetMessageIndex);
}

Low: Type Safety Concern

Location: actor/conn-drivers.ts:160-164

if (state.websocket.raw) {
    const raw = state.websocket.raw as HonoWebSocketAdapter;
    if (typeof raw.isHibernatable === "boolean") {
        return raw.isHibernatable;
    }
}

Issue: Using as HonoWebSocketAdapter bypasses type checking. The subsequent typeof check suggests uncertainty about the actual type.

Recommendation: Use a type guard or runtime check:

if (state.websocket.raw && 'isHibernatable' in state.websocket.raw) {
    return state.websocket.raw.isHibernatable === true;
}

Low: Inconsistent Naming

Location: actor/config.ts:73

The config field is named canHibernatWebSocket (missing the "e" in "Hibernate").

Recommendation: Rename it to canHibernateWebSocket for consistency with the other names (isHibernatable, hibernatableWebSocket).

Low: Missing Utility Function Export

Location: utils.ts:252-265

The new arrayBuffersEqual function is defined but not exported in the main utils export. Verify it's properly exported for use across the codebase.

🎯 Testing Recommendations

  1. Test concurrent WebSocket connections - Verify no duplicate entries are created when multiple WebSockets connect
  2. Test hibernation/wake cycles - Ensure WebSocket state (especially msgIndex) correctly persists and resumes
  3. Test cleanup on abnormal closure - Verify hibernatable WebSocket entries are removed on errors (code 1006)
  4. Test with non-hibernatable WebSockets - Ensure mixing hibernatable and non-hibernatable WebSockets works correctly
  5. Load test - Verify no memory leaks with many WebSocket connections over time

📝 Minor Style Notes

  1. Follow structured logging pattern: actor/instance.ts:1571 uses msg: "actor instance onSocketOpened" - consider lowercase per CLAUDE.md guidelines
  2. Consider extracting magic values: msgIndex: -1n appears multiple times - consider a named constant INITIAL_MSG_INDEX
  3. The experimental annotation is good - consider documenting the hibernation feature's stability/maturity level

Summary

This is a well-architected feature with good schema design and state management. However, there's a critical memory leak in getActorHibernationConfig that must be fixed before merging. The race condition and error handling issues should also be addressed to ensure production stability.

Recommendation: Request changes - Fix the critical issue, consider addressing medium priority issues, then this should be good to merge.

@claude

claude bot commented Nov 2, 2025

PR Review: WebSocket Hibernation Integration

This PR introduces WebSocket hibernation support to RivetKit, allowing WebSocket connections to persist across actor sleep/wake cycles. Overall, the implementation is well-structured with proper schema versioning and migration. Here's my detailed feedback:

✅ Strengths

  1. Proper Schema Versioning: Great job adding a v2 schema with a migration path from v1 (rivetkit-typescript/packages/rivetkit/src/schemas/actor-persist/versioned.ts:23-29). The migration correctly initializes the hibernatableWebSocket field as an empty array for existing data.

  2. Good State Tracking: The implementation properly tracks hibernatable WebSockets through the lifecycle:

    • Added to persist state in getActorHibernationConfig (rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:260-264)
    • Updated on open/message events (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1551-1598)
    • Removed on close (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1600-1630)
  3. Flexible Configuration: The canHibernatWebSocket option supports both boolean and function forms, allowing fine-grained control (rivetkit-typescript/packages/rivetkit/src/actor/config.ts:74-82).

  4. Experimental Annotation: Properly marked experimental features with @experimental JSDoc comments.

  5. Structured Logging: Good use of structured logging throughout with appropriate log levels (debug/warn/error).

🐛 Potential Issues

Critical

  1. Race Condition in getActorHibernationConfig (rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:260-264)

    The function always pushes a new WebSocket to the array, even for existing connections:

    // This always adds, even if existingWs is found!
    handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
        requestId,
        lastSeenTimestamp: BigInt(Date.now()),
        msgIndex: -1n,
    });

    Issue: If getActorHibernationConfig is called multiple times for the same WebSocket (e.g., reconnection scenarios), you'll create duplicates in the array.

    Fix: Only push when !existingWs:

    if (!existingWs) {
        handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
            requestId,
            lastSeenTimestamp: BigInt(Date.now()),
            msgIndex: -1n,
        });
    }
  2. Array Splice with Invalid Index (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1604-1615)

    When the WebSocket is not found, findIndex returns -1, and splice(-1, 1) will remove the last element:

    const wsIndex = this.#persist.hibernatableWebSocket.findIndex((ws) =>
        arrayBuffersEqual(ws.requestId, rivetRequestIdLocal),
    );
    const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);

    Fix: Only splice if index is valid:

    if (wsIndex !== -1) {
        const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);
        this.#rLog.debug({ msg: "removed hibernatable websocket", ... });
    } else {
        this.#rLog.warn({ msg: "could not find hibernatable websocket to remove", ... });
    }

Moderate

  1. Typo in Configuration Name

    canHibernatWebSocket is missing an "e" - should be canHibernateWebSocket. This is used consistently throughout, but it's a typo that will be baked into the public API once released. Consider fixing before merging if this hasn't been released yet.

  2. Missing Error Handling for arrayBuffersEqual

    The code assumes requestId is always a valid ArrayBuffer, but if it's undefined or corrupted, arrayBuffersEqual could throw. Consider defensive checks or try-catch blocks around the equality comparisons.

  3. BigInt to Number Conversion (rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:191)

    lastMsgIndex: Number(existingWs.msgIndex),

    This could lose precision for very large BigInt values. If msgIndex can exceed Number.MAX_SAFE_INTEGER (2^53 - 1), Number() will silently round it to the nearest representable double. Consider adding a range check or using BigInt consistently.

  4. Memory Leak Potential

    If WebSocket close events are missed (e.g., network issues, crashes), the hibernatableWebSocket array could grow unbounded. Consider:

    • Adding a maximum array size check
    • Implementing a cleanup mechanism based on lastSeenTimestamp
    • Adding TTL-based expiration
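The timestamp-based cleanup could be sketched as follows. The function name pruneStale and the idea of passing the TTL in milliseconds are assumptions; the PR does not define a TTL:

```typescript
interface StaleCheckEntry {
    lastSeenTimestamp: bigint;
}

// Returns the entries seen within ttlMs of nowMs and drops the rest.
// All values are millisecond timestamps stored as bigint, matching lastSeenTimestamp.
function pruneStale<T extends StaleCheckEntry>(entries: T[], nowMs: bigint, ttlMs: bigint): T[] {
    return entries.filter((entry) => nowMs - entry.lastSeenTimestamp <= ttlMs);
}
```

Running this when the actor wakes, or before persisting, would bound the array even if a close event is never delivered.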

Minor

  1. Inconsistent Log Message (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1634)

    msg: "actor instance onSocketMessage",  // Should be onSocketClosed

    The message says "onSocketMessage" but it's in the onSocketClosed handler.

  2. Empty Lines (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1577-1578)

    Two consecutive empty lines should be reduced to one for consistency.

🔒 Security Considerations

  • ArrayBuffer Comparison: The arrayBuffersEqual implementation is straightforward and secure - good constant-time comparison.
  • Request Object Construction: Properly sanitizes and reconstructs Request objects before passing to user callbacks.
  • No obvious injection vulnerabilities in the hibernation logic.

⚡ Performance Considerations

  1. Linear Array Search: The code uses find() and findIndex() which are O(n). For actors with many hibernatable WebSockets, this could become slow. Consider using a Map<ArrayBuffer, PersistedHibernatableWebSocket> keyed by request ID (with a custom hash or string representation).

  2. ArrayBuffer Comparison: The byte-by-byte comparison in arrayBuffersEqual is reasonable, but ensure request IDs aren't unnecessarily large.

📝 Test Coverage

I don't see test file changes in this PR. Consider adding tests for:

  • WebSocket hibernation lifecycle (open → message → close)
  • Race conditions with duplicate connections
  • Migration from v1 to v2 schema
  • Edge cases (undefined requestId, invalid indices, etc.)
  • The canHibernatWebSocket callback functionality

📚 Documentation

The feature is marked experimental which is good. Consider adding:

  • A migration guide for users updating to v2
  • Example usage of the canHibernatWebSocket callback
  • Documentation about when WebSockets can/cannot hibernate

✏️ Nits

  • rivetkit-typescript/packages/rivetkit/src/actor/router-endpoints.ts:721: The comment says "Truncase" but should be "Truncate"

Summary

This is a solid implementation of a complex feature. The main concerns are:

  1. Fix the race condition in getActorHibernationConfig (Critical)
  2. Fix the splice bug with invalid index (Critical)
  3. Consider the typo in canHibernatWebSocket before API freeze
  4. Add test coverage for the new functionality

The schema versioning and migration approach is exemplary - great work on that front!

@NathanFlurry NathanFlurry force-pushed the 10-31-fix_use_correct_vbare_functions_for_runner_tunnel branch from 479b46d to d6216a4 Compare November 2, 2025 16:20
@claude

claude bot commented Nov 4, 2025

Pull Request Review: WebSocket Hibernation Integration

Overview

This PR adds experimental WebSocket hibernation support to RivetKit, allowing actors to persist WebSocket connections across sleep/wake cycles. The implementation includes schema changes, connection state tracking, and integration with the engine runner.

Code Quality & Best Practices

✅ Strengths

  1. Well-structured schema migration: The v1 → v2 migration properly initializes hibernatableWebSocket as an empty array, maintaining backward compatibility
  2. Proper TypeScript typing: New interfaces and types are well-defined with clear documentation
  3. Defensive programming: Multiple null checks and error handling throughout the hibernation config logic
  4. Appropriate use of experimental markers: The @experimental JSDoc tags clearly indicate unstable APIs
  5. Consistent logging: Structured logging with tracing pattern (?x notation) - good adherence to CLAUDE.md guidelines

⚠️ Areas for Improvement

1. Inconsistent log messages (src/actor/instance.ts:1634)

this.#rLog.debug({
    msg: "actor instance onSocketMessage", // Should be "onSocketClosed"
    rivetRequestId,
    isHibernatable: !!persistedHibernatableWebSocket,
    hibernatableWebSocketCount:
        this.#persist.hibernatableWebSocket.length,
});

The log message says "onSocketMessage" but this is in the onSocketClosed handler.

2. Missing index validation (src/actor/instance.ts:1604-1615)

const wsIndex = this.#persist.hibernatableWebSocket.findIndex((ws) =>
    arrayBuffersEqual(ws.requestId, rivetRequestIdLocal),
);

const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);

If findIndex returns -1 (not found), splice(-1, 1) will remove the last element, which is incorrect behavior. Should check wsIndex >= 0 before calling splice.

3. Race condition potential (src/drivers/engine/actor-driver.ts:260-264)

The getActorHibernationConfig function always pushes a new hibernatable WebSocket entry, even after checking for an existing one. This could lead to duplicate entries if called multiple times rapidly:

// Check for existing WS
const existingWs = handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.find(...);

// ... logic ...

// Always pushes regardless of existingWs
handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
    requestId,
    lastSeenTimestamp: BigInt(Date.now()),
    msgIndex: -1n,
});

4. TODO comment indicates incomplete feature (src/actor/instance.ts:1904-1906)

// TODO: Enable this when hibernation is implemented. We're waiting on support for Guard to not auto-wake the actor if it sleeps.
// if (conn.status === "connected" && !conn.isHibernatable)
//     return false;

This suggests the sleep logic doesn't actually respect hibernatable connections yet, which means the feature may not be fully functional.

Potential Bugs

🐛 Critical Issues

  1. Array index bug: As mentioned above, wsIndex = -1 will incorrectly remove the last element (src/actor/instance.ts:1612)

  2. Possible duplicate tracking: Race condition in getActorHibernationConfig could create multiple entries for the same WebSocket (src/drivers/engine/actor-driver.ts:260)

  3. Missing msgIndex initialization for existing WS: When finding an existing hibernatable WebSocket, the msgIndex is read but never initialized from the persisted state during the "onOpen" event

⚠️ Minor Issues

  1. Unused import: PersistedHibernatableWebSocket is imported in conn.ts:3 but appears to only be used in a comment

  2. Empty lines: Extra blank lines at src/tunnel.ts:1576-1578 should be removed per code style

Performance Considerations

✅ Good

  1. Efficient ArrayBuffer comparison: The arrayBuffersEqual utility is O(n) but short-circuits on length mismatch
  2. Minimal overhead: Hibernation tracking only activates when feature is enabled
  3. Event listener cleanup: Proper removal of event listeners prevents memory leaks

⚠️ Concerns

  1. Linear search on every message: The find() operation in onSocketOpened performs a linear search through all hibernatable WebSockets. For actors with many concurrent hibernatable WebSockets, this could be inefficient. Consider using a Map keyed by requestId.

  2. BigInt conversions: Frequent BigInt operations (Date.now() → BigInt, msgIndex conversions) may have performance implications. Profile in high-throughput scenarios.

Security Concerns

✅ Good

  1. No obvious injection vulnerabilities: ArrayBuffer comparison is safe
  2. Error handling prevents information leakage: Errors in canHibernatWebSocket are caught and logged without exposing details to clients

⚠️ Considerations

  1. Request ID trust: The implementation trusts rivetRequestId from events without additional validation. Ensure the engine runner properly validates/generates these IDs to prevent spoofing.

  2. WebSocket state persistence: Persisting WebSocket state (msgIndex, lastSeenTimestamp) could potentially be exploited if an attacker can manipulate the persisted data. Ensure the storage layer has appropriate access controls.

Test Coverage

⚠️ Insufficient

  1. No new test files: The PR doesn't add any tests specifically for hibernation functionality
  2. Complex logic untested: The hibernation config determination logic (PATH_CONNECT_WEBSOCKET vs PATH_RAW_WEBSOCKET_PREFIX) should have unit tests
  3. Edge cases: No tests for:
    • Race conditions when multiple WebSockets open simultaneously
    • Behavior when msgIndex wraps or becomes very large
    • Migration from v1 to v2 schema
    • ArrayBuffer comparison edge cases

Recommendation: Add test coverage for at least:

  • arrayBuffersEqual utility
  • Hibernation config logic in getActorHibernationConfig
  • WebSocket lifecycle (open → message → close) with hibernation enabled
  • Schema migration v1 → v2

Recommendations Summary

Must Fix (Blocking)

  1. Fix the wsIndex = -1 bug in onSocketClosed handler
  2. Prevent duplicate WebSocket entries in getActorHibernationConfig
  3. Correct the log message in onSocketClosed

Should Fix (High Priority)

  1. Add test coverage for core hibernation logic
  2. Consider using a Map instead of array for hibernatable WebSocket tracking (performance)
  3. Add validation for wsIndex >= 0 before splice

Nice to Have

  1. Remove unused import in conn.ts
  2. Clean up extra blank lines
  3. Add performance profiling for high-concurrency scenarios
  4. Document the race condition behavior and mitigation strategy

Overall Assessment

The implementation is well-structured and follows most project conventions. However, there are critical bugs that need to be fixed before merging, and the lack of test coverage is concerning for an experimental feature. Once the blocking issues are addressed and tests are added, this will be a solid foundation for WebSocket hibernation.


Review generated with Claude Code

@graphite-app
Contributor

graphite-app bot commented Nov 4, 2025

Merge activity

  • Nov 4, 8:09 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 4, 8:10 PM UTC: CI is running for this pull request on a draft pull request (#3349) due to your merge queue CI optimization settings.
  • Nov 4, 8:41 PM UTC: The Graphite merge queue removed this pull request due to removal of a downstack PR #3342.
  • Nov 4, 8:49 PM UTC: NathanFlurry added this pull request to the Graphite merge queue.
  • Nov 4, 8:50 PM UTC: CI is running for this pull request on a draft pull request (#3352) due to your merge queue CI optimization settings.
  • Nov 4, 8:52 PM UTC: Merged by the Graphite merge queue via draft PR: #3352.

@NathanFlurry NathanFlurry force-pushed the 10-31-feat_rivetkit_integrate_websocket_hibernation branch from dcaa1af to e0a1d01 Compare November 4, 2025 20:25
@NathanFlurry NathanFlurry force-pushed the 10-31-fix_use_correct_vbare_functions_for_runner_tunnel branch from 7327333 to c0672bb Compare November 4, 2025 20:25
@claude

claude bot commented Nov 4, 2025

PR Review: WebSocket Hibernation Integration

This PR implements WebSocket hibernation functionality for RivetKit, allowing WebSocket connections to be persisted and resumed across actor sleep/wake cycles. Overall, the implementation is well-structured, but there are several issues that should be addressed.


Critical Issues

1. Race Condition in Hibernation Config (rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:259-266)

The getActorHibernationConfig function unconditionally pushes a new hibernatable WebSocket entry to the array, even when existingWs is found or hibernation is disabled:

// Save hibernatable WebSocket
handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
    requestId,
    lastSeenTimestamp: BigInt(Date.now()),
    msgIndex: -1n,
});

Problems:

  • This happens before the WebSocket is actually opened
  • Creates duplicate entries if the same WebSocket reconnects
  • Adds entries even when hibernationConfig.enabled = false
  • Could lead to memory leaks with abandoned WebSocket entries

Recommendation: Only add to the array when hibernationConfig.enabled === true and no existing entry is found. Consider moving this logic to when the WebSocket actually opens.


2. Missing Index Bounds Check (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1602-1630)

In onSocketClosed, the code calls splice with a potentially invalid index:

const wsIndex = this.#persist.hibernatableWebSocket.findIndex((ws) =>
    arrayBuffersEqual(ws.requestId, rivetRequestIdLocal),
);

const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);

Problem: If findIndex returns -1 (not found), splice(-1, 1) will remove the last element from the array, which is incorrect behavior.

Fix:

if (wsIndex !== -1) {
    const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);
    // ... logging
}

3. Incorrect Log Message (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1633)

this.#rLog.debug({
    msg: "actor instance onSocketMessage",  // ← Wrong message
    rivetRequestId,
    isHibernatable: !!persistedHibernatableWebSocket,
    hibernatableWebSocketCount: this.#persist.hibernatableWebSocket.length,
});

This should be "actor instance onSocketClosed" not "onSocketMessage".


High Priority Issues

4. Disabled Hibernation Sleep Check (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1904-1908)

The commented-out code suggests that only non-hibernatable WebSockets should prevent actor sleep, but that check is currently disabled:

// TODO: Enable this when hibernation is implemented. We're waiting on support for Guard to not auto-wake the actor if it sleeps.
// if (conn.status === "connected" && !conn.isHibernatable)
//     return false;

if (conn.status === "connected") return false;

Concern: This means actors cannot sleep even when all WebSockets are hibernatable, defeating the purpose of hibernation. The TODO should be tracked with a GitHub issue and timeline.


5. Event Listener Cleanup Best Practices (rivetkit-typescript/packages/rivetkit/src/actor/instance.ts:1640-1644)

The cleanup code swallows all errors:

try {
    websocket.removeEventListener("open", onSocketOpened);
    websocket.removeEventListener("message", onSocketMessage);
    websocket.removeEventListener("close", onSocketClosed);
    websocket.removeEventListener("error", onSocketClosed);
} catch {}

Recommendation: Log errors during cleanup for debugging purposes rather than silently swallowing them.
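A sketch of cleanup that reports failures instead of hiding them. The interfaces and the helper name removeListenersSafely are hypothetical; in the actor the target would be the WebSocket and the log callback would wrap the existing structured logger:

```typescript
// Hypothetical minimal interfaces; a real WebSocket and logger would be passed in.
type Listener = (event: unknown) => void;
interface ListenerTarget {
    removeEventListener(type: string, listener: Listener): void;
}
interface CleanupLogRecord {
    msg: string;
    eventType: string;
    error: unknown;
}

// Removes each listener independently so one failure does not skip the rest.
// Returns the number of removals that threw.
function removeListenersSafely(
    target: ListenerTarget,
    listeners: Array<[string, Listener]>,
    log: (record: CleanupLogRecord) => void,
): number {
    let failures = 0;
    for (const [eventType, listener] of listeners) {
        try {
            target.removeEventListener(eventType, listener);
        } catch (error) {
            failures++;
            log({ msg: "failed to remove websocket listener", eventType, error });
        }
    }
    return failures;
}
```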


Medium Priority Issues

6. Missing Migration Path

The new BARE schema version v2.bare adds the hibernatableWebSocket field, but there's no clear migration strategy for existing persisted actors using v1.

Questions:

  • How are existing v1 persisted actors upgraded to v2?
  • Is the versioning system backward-compatible?
  • Should there be a migration test?

7. Type Safety: BigInt Conversion (rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:191)

lastMsgIndex: Number(existingWs.msgIndex),

Converting bigint to number could lose precision for very large message indices. Consider whether the HibernationConfig type should accept bigint instead, or if there's a maximum message index constraint.
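If the config keeps a number, a checked conversion is one option. This is a sketch with a hypothetical helper name, and whether to throw or clamp on overflow is a design choice the PR would need to make:

```typescript
// Convert a bigint to a number, refusing values that a double cannot represent exactly.
function bigintToSafeNumber(value: bigint): number {
    const max = BigInt(Number.MAX_SAFE_INTEGER);
    if (value > max || value < -max) {
        throw new RangeError("value " + value.toString() + " is outside the safe integer range");
    }
    return Number(value);
}
```

The initial msgIndex of -1n passes through unchanged, so the existing "no message seen yet" value keeps working.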


8. Experimental API Surface

The PR adds several @experimental annotations:

  • canHibernatWebSocket config option
  • rivetRequestId in events
  • rivetMessageIndex in events

Recommendation: Add clear documentation about:

  • What "experimental" means for these APIs
  • Expected timeline to stabilization
  • Breaking change policy

Code Quality Issues

9. Inconsistent Null Handling

The code uses optional chaining (event?.rivetRequestId) in some places but assumes values exist in others. Consider making the null-handling strategy consistent.


10. Performance: Linear Search on Every Message

In onSocketOpened and onSocketMessage, the code performs a linear search through hibernatableWebSocket:

persistedHibernatableWebSocket = this.#persist.hibernatableWebSocket.find((ws) =>
    arrayBuffersEqual(ws.requestId, rivetRequestIdLocal),
);

For actors with many hibernatable WebSockets, this could impact performance. Consider using a Map<string, PersistedHibernatableWebSocket> indexed by a string representation of requestId.
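A rough sketch of such an index, assuming requestId is an ArrayBuffer as the arrayBuffersEqual call suggests. The record shape mirrors the fields named in this review, and `requestIdKey` and `indexHibernatableWebSockets` are hypothetical names:

```typescript
// Assumed record shape, based on the fields mentioned in this review.
interface PersistedHibernatableWebSocket {
  requestId: ArrayBuffer;
  lastSeenTimestamp: bigint;
  msgIndex: bigint;
}

// Encodes the request ID bytes as a hex string so that equal byte sequences
// produce equal Map keys (ArrayBuffer keys would compare by reference).
function requestIdKey(requestId: ArrayBuffer): string {
  return Array.from(new Uint8Array(requestId), (byte) =>
    byte.toString(16).padStart(2, "0"),
  ).join("");
}

// Builds an in-memory lookup over the persisted array.
function indexHibernatableWebSockets(
  list: PersistedHibernatableWebSocket[],
): Map<string, PersistedHibernatableWebSocket> {
  const index = new Map<string, PersistedHibernatableWebSocket>();
  for (const ws of list) {
    index.set(requestIdKey(ws.requestId), ws);
  }
  return index;
}
```

The persisted array would remain the source of truth, so the index would need to be updated wherever entries are pushed or spliced.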


Positive Observations

Good: The BARE schema is well-documented with clear field descriptions
Good: Proper use of structured logging throughout
Good: Error handling in canHibernatWebSocket callback (actor-driver.ts:231-240)
Good: The isHibernatable property is properly encapsulated as a getter on the connection
Good: Comprehensive debugging logs to track WebSocket lifecycle


Testing Recommendations

  1. Add tests for race conditions:

    • Multiple WebSocket connections with the same requestId
    • WebSocket closing before opening completes
    • Rapid connect/disconnect cycles
  2. Add tests for edge cases:

    • findIndex returning -1
    • Very large msgIndex values (BigInt boundaries)
    • Empty hibernatableWebSocket arrays
  3. Add integration tests:

    • Actor hibernation and wake with persisted WebSocket state
    • Message replay after hibernation
    • WebSocket state consistency across sleep/wake cycles

Security Considerations

No critical security issues identified, but consider:

  • Request ID validation: Ensure rivetRequestId cannot be spoofed by malicious clients
  • Memory bounds: Add limits to hibernatableWebSocket array size to prevent unbounded growth
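As an illustration of the memory-bounds point, a capped append could look like the sketch below. The constant name and the value 1024 are placeholders; this review does not propose a specific limit:

```typescript
// Hypothetical cap; the value is a placeholder, not a recommendation.
const MAX_HIBERNATABLE_WEBSOCKETS = 1024;

// Appends an entry only while the list is under the cap and reports whether it was added.
function pushBounded<T>(
  list: T[],
  entry: T,
  limit: number = MAX_HIBERNATABLE_WEBSOCKETS,
): boolean {
  if (list.length >= limit) {
    return false;
  }
  list.push(entry);
  return true;
}
```

A caller that receives `false` could respond by disabling hibernation for that connection rather than growing the persisted state.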

Summary

The PR implements a valuable feature but has several bugs that should be fixed before merging:

Must Fix:

Should Fix:

Nice to Have:

Overall assessment: Needs revision before merge.

@claude

claude bot commented Nov 4, 2025

WebSocket Hibernation Feature Review

This PR introduces WebSocket hibernation support, allowing actors to persist WebSocket connections across sleep/wake cycles. Overall, the implementation is well-structured and follows the codebase conventions. Here are my findings:

Strengths ✅

  1. Good Architecture: The feature is cleanly integrated with the existing actor persistence system through a new schema version (v2.bare).
  2. Proper Type Safety: New TypeScript interfaces and types are well-defined with clear documentation.
  3. Logging: Comprehensive debug logging throughout the hibernation lifecycle.
  4. Backward Compatibility: Schema versioning approach maintains compatibility.

Critical Issues 🔴

1. Potential Memory Leak in getActorHibernationConfig (actor-driver.ts:260-264)

The function always pushes a new hibernatable WebSocket to the array, even when checking if hibernation should be enabled. This means:

  • Every call adds an entry, regardless of whether hibernation is actually enabled
  • If a connection fails or is rejected, the entry is never cleaned up
  • Multiple calls for the same requestId could add duplicates

Recommendation: Only add to hibernatableWebSocket array if hibernation is actually enabled:

if (hibernationConfig.enabled) {
  handler.actor[PERSIST_SYMBOL].hibernatableWebSocket.push({
    requestId,
    lastSeenTimestamp: BigInt(Date.now()),
    msgIndex: -1n,
  });
}
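The snippet above does not address the duplicate-entry case. A sketch of an idempotent registration follows, with `HibernatableEntry`, `sameBytes` (a stand-in for the existing arrayBuffersEqual utility), and `registerHibernatableWebSocket` as hypothetical names:

```typescript
// Assumed entry shape, based on the fields in the snippet above.
interface HibernatableEntry {
  requestId: ArrayBuffer;
  lastSeenTimestamp: bigint;
  msgIndex: bigint;
}

// Stand-in for the arrayBuffersEqual utility mentioned in this review.
function sameBytes(a: ArrayBuffer, b: ArrayBuffer): boolean {
  if (a.byteLength !== b.byteLength) return false;
  const viewA = new Uint8Array(a);
  const viewB = new Uint8Array(b);
  for (let i = 0; i < viewA.length; i++) {
    if (viewA[i] !== viewB[i]) return false;
  }
  return true;
}

// Returns the existing entry for this request ID, or creates one if none exists,
// so repeated calls for the same request never add a second entry.
function registerHibernatableWebSocket(
  list: HibernatableEntry[],
  requestId: ArrayBuffer,
  now: bigint,
): HibernatableEntry {
  const existing = list.find((ws) => sameBytes(ws.requestId, requestId));
  if (existing) {
    return existing;
  }
  const created: HibernatableEntry = {
    requestId,
    lastSeenTimestamp: now,
    msgIndex: BigInt(-1),
  };
  list.push(created);
  return created;
}
```

The call would still sit inside the `if (hibernationConfig.enabled)` check, so entries are created only for connections that actually hibernate.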

2. Race Condition in WebSocket Event Handlers (instance.ts:1551-1598)

The event handlers for open, message, and close are added after the WebSocket may already be open (see tunnel.ts:554-574 where the request is created before the open confirmation is sent). This could cause:

  • Missed events if the WebSocket fires events before handlers are attached
  • onSocketOpened might not capture the rivetRequestId if the event already fired

Recommendation: Restructure to ensure handlers are attached before any WebSocket events can fire, or add defensive checks.
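One possible shape for the defensive check, assuming the socket exposes a standard readyState (1 is WebSocket.OPEN). `SocketLike` and `attachOpenHandler` are hypothetical names for illustration:

```typescript
// Hypothetical minimal socket shape; readyState 1 corresponds to WebSocket.OPEN.
interface SocketLike {
  readyState: number;
  addEventListener(type: string, listener: (event: unknown) => void): void;
}

const SOCKET_OPEN = 1;

// Attaches the open handler, then runs it directly if the socket was already
// open when the listener was attached. The handler runs at most once.
function attachOpenHandler(
  socket: SocketLike,
  onOpen: (event: unknown) => void,
): void {
  let handled = false;
  const once = (event: unknown): void => {
    if (handled) return;
    handled = true;
    onOpen(event);
  };
  socket.addEventListener("open", once);
  if (socket.readyState === SOCKET_OPEN) {
    // The "open" event fired before the listener existed, so no event object is available.
    once(undefined);
  }
}
```

In the late-attach path there is no event to read rivetRequestId from, so the request ID would have to come from another source. Attaching handlers before the open confirmation is sent avoids that problem entirely.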

3. Index Out of Bounds Bug (instance.ts:1612-1615)

When removing a hibernatable WebSocket, if wsIndex is -1 (not found), calling splice(-1, 1) will remove the last element instead of doing nothing:

const wsIndex = this.#persist.hibernatableWebSocket.findIndex(...);
const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);  // Bug: -1 removes last element!

Recommendation:

if (wsIndex !== -1) {
  const removed = this.#persist.hibernatableWebSocket.splice(wsIndex, 1);
  // ... rest of logic
} else {
  this.#rLog.warn({
    msg: "could not find hibernatable websocket to remove",
    rivetRequestId,
  });
}
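The splice behavior is easy to reproduce in isolation. The sketch below shows the unguarded pattern on a plain array, then a small guarded helper; `removeFirst` is a hypothetical name:

```typescript
// Unguarded pattern: indexOf returns -1 for a missing item,
// and splice(-1, 1) then removes the last element.
const unguarded = ["a", "b", "c"];
unguarded.splice(unguarded.indexOf("z"), 1); // "c" is removed

// Removes the first element matching the predicate and returns it, or returns
// undefined and leaves the array untouched when nothing matches.
function removeFirst<T>(
  list: T[],
  predicate: (item: T) => boolean,
): T | undefined {
  const index = list.findIndex(predicate);
  if (index === -1) {
    return undefined;
  }
  return list.splice(index, 1)[0];
}
```

Returning undefined lets the caller decide whether a missing entry deserves a warning, as in the recommendation above.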

High Priority Issues 🟡

4. BigInt Type Conversion Could Lose Precision (actor-driver.ts:191)

Converting bigint to number for msgIndex silently loses precision for very large values:

lastMsgIndex: Number(existingWs.msgIndex),  // Potential precision loss

If msgIndex exceeds Number.MAX_SAFE_INTEGER (2^53-1), Number() rounds to the nearest representable double instead of throwing, so the index can come back wrong without any error.

Recommendation: Either document the limitation or use a different approach. Consider if the engine runner can handle BigInt directly.

5. Missing Null/Undefined Check (instance.ts:1586-1587)

event.rivetMessageIndex is used without validation:

persistedHibernatableWebSocket.msgIndex = BigInt(event.rivetMessageIndex);

If rivetMessageIndex is undefined/null, BigInt() throws a TypeError inside the message handler instead of producing a value.

Recommendation: Add validation:

if (event.rivetMessageIndex != null) {
  persistedHibernatableWebSocket.msgIndex = BigInt(event.rivetMessageIndex);
}

6. Inconsistent Error Handling

The getActorHibernationConfig function returns early with { enabled: false } when actors aren't found, which silently disables hibernation. Consider whether this should throw an error or log at a more prominent level.

Code Quality Issues 🟠

7. Duplicate Log Message (instance.ts:1633)

Line 1633 has the message "actor instance onSocketMessage" but should be "actor instance onSocketClosed":

this.#rLog.debug({
  msg: "actor instance onSocketMessage",  // Should be onSocketClosed
  // ...
});

8. Commented TODO Should Be Addressed (instance.ts:1904-1905)

There's a commented-out hibernation check with a TODO about Guard support:

// TODO: Enable this when hibernation is implemented...
// if (conn.status === "connected" && !conn.isHibernatable)
//   return false;

This suggests the feature isn't fully integrated with the sleep logic yet. Should this be tracked in an issue?

9. Empty Catch Blocks (instance.ts:1642-1647, 1652-1657)

Multiple try/catch {} blocks silently swallow errors:

try {
  websocket.removeEventListener("close", onSocketClosed);
  // ...
} catch {}

Consider at minimum logging these errors at debug level.

10. Magic Number -1 for msgIndex

The initial msgIndex: -1n in actor-driver.ts:263 should be documented or use a named constant to explain that -1 means "no messages processed yet".
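For example, a named sentinel could look like this. `NO_MESSAGES_PROCESSED` and `hasProcessedMessages` are hypothetical names:

```typescript
// Hypothetical constant: -1 marks a socket that has not processed any message yet.
const NO_MESSAGES_PROCESSED = BigInt(-1);

// Reports whether at least one message index has been recorded.
function hasProcessedMessages(msgIndex: bigint): boolean {
  return msgIndex > NO_MESSAGES_PROCESSED;
}
```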

Testing Concerns 🧪

  • No test coverage found for the new hibernation functionality
  • The PR doesn't include tests for:
    • WebSocket hibernation lifecycle (sleep → wake → message replay)
    • Edge cases (duplicate requestIds, failed hibernation, cleanup)
    • The canHibernatWebSocket configuration option

Recommendation: Add comprehensive tests covering the hibernation flow, especially around the critical bug scenarios identified above.

Security Considerations 🔒

  • The rivetRequestId is treated as an opaque identifier, which is good
  • No obvious injection or security issues detected
  • The arrayBuffersEqual utility correctly compares buffers byte-by-byte

Performance Considerations ⚡

  • The .find() calls on hibernatableWebSocket arrays are O(n), which could be slow with many concurrent hibernated connections
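  • Consider using a Map<string, PersistedHibernatableWebSocket> keyed by a string encoding of requestId for O(1) lookups if this becomes a bottleneck. A Map keyed directly by ArrayBuffer compares keys by reference, so a lookup with a different buffer holding the same bytes would miss.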

Documentation 📝

  • The @experimental tags are appropriate given the TODO about Guard support
  • Schema v2.bare is well-documented with comments
  • Consider adding JSDoc comments to the new isHibernatable getter and canHibernatWebSocket config option explaining the use cases

Style Compliance ✨

The code generally follows the project conventions from CLAUDE.md:

  • ✅ Uses structured logging with tracing
  • ✅ Lowercase log messages
  • ✅ Proper error handling with anyhow pattern
  • ✅ Hard tabs (would be enforced by formatter)
  • ✅ Timestamp naming (lastSeenTimestamp)

Summary

This is a solid foundation for WebSocket hibernation, but the critical bugs (#1, #2, #3) must be fixed before merging as they could cause memory leaks, data corruption, or missed events in production. The high-priority type safety issues (#4, #5) should also be addressed. Testing is essential for a feature of this complexity.

Overall assessment: Request Changes - Fix critical bugs, add tests, then re-review.
