Skip to content

Conversation

@NathanFlurry
Copy link
Member

No description provided.

@vercel
Copy link

vercel bot commented Nov 5, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Preview Comments Updated (UTC)
rivetkit-serverless Ready Ready Preview Comment Nov 5, 2025 5:14am
3 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
rivet-cloud Ignored Ignored Nov 5, 2025 5:14am
rivet-inspector Ignored Ignored Preview Nov 5, 2025 5:14am
rivet-site Ignored Ignored Preview Nov 5, 2025 5:14am

Copy link
Member Author

NathanFlurry commented Nov 5, 2025


How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has enabled the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.

@claude
Copy link

claude bot commented Nov 5, 2025

PR Review: chore(pegboard): allow configuring reschedule_backoff_max_exponent

Summary

This PR adds configuration support for the maximum backoff exponent when rescheduling actors. The change is straightforward and follows existing patterns in the codebase.

Code Quality ✅

Positive aspects:

  • Follows existing code patterns consistently (similar to base_retry_timeout and retry_reset_duration)
  • Documentation is clear and follows the established format
  • Appropriate use of Option<usize> with sensible default (8)
  • Correctly marked as Experimental matching other pegboard configs
  • Function signature changes are minimal and backward-compatible

Conventions adherence:

  • ✅ Uses workspace dependencies correctly
  • ✅ Follows Rust naming conventions (snake_case)
  • ✅ Documentation style matches existing code
  • ✅ Import organization is consistent

Potential Issues 🔍

1. Comment accuracy in TypeScript (minor)

// Set max exponent to 1 to have a maximum of base_retry_timeout
RIVET__PEGBOARD__RESCHEDULE_BACKOFF_MAX_EXPONENT: "1",

The comment says "maximum of base_retry_timeout" but this isn't quite accurate. Based on the backoff implementation:

pub fn current_duration(&self) -> usize {
    self.wait * 2usize.pow(self.i.min(self.max_exponent) as u32)
}

With max_exponent=1 and base_retry_timeout=100:

  • retry 0: 100 * 2^0 = 100ms
  • retry 1: 100 * 2^1 = 200ms
  • retry 2+: 100 * 2^1 = 200ms (capped)

Suggested comment improvement:

// Set max exponent to 1 to cap backoff at 2x base_retry_timeout
RIVET__PEGBOARD__RESCHEDULE_BACKOFF_MAX_EXPONENT: "1",

2. Missing validation (minor consideration)
The config doesn't validate that max_exponent is reasonable. While usize prevents negatives, extremely large values (e.g., 63) could cause 2^63 overflow in the backoff calculation. However, this is a configuration issue that would be caught in testing, and the default of 8 is safe.

Performance ✅

  • Negligible performance impact
  • Config lookups are O(1)
  • Backoff calculation remains efficient

Security ✅

  • No security concerns identified
  • Configuration is internal and doesn't expose attack surface

Test Coverage ⚠️

Missing:

  • No tests explicitly verify the new configuration parameter
  • No tests validate that different max_exponent values produce expected backoff durations

Recommendation:
Consider adding a unit test in engine/packages/pegboard/src/workflows/actor/runtime.rs or the backoff module to verify behavior with different exponent values:

#[cfg(test)]
mod tests {
    use super::*;
    
    #[test]
    fn test_reschedule_backoff_respects_max_exponent() {
        // Test that max_exponent=1 caps backoff correctly
        let backoff = reschedule_backoff(5, 100, 1);
        assert_eq!(backoff.current_duration(), 200); // 100 * 2^1
        
        // Test default max_exponent=8 allows exponential growth
        let backoff = reschedule_backoff(5, 100, 8);
        assert_eq!(backoff.current_duration(), 3200); // 100 * 2^5
    }
}

Architecture Considerations ✅

The change fits well with the existing architecture:

  • Configuration layer remains clean and consistent
  • Workflow layer properly propagates the config
  • Development environment gets appropriate lower limits for fast iteration

Recommendations

  1. Fix the comment in engine-process/mod.ts to accurately describe the math
  2. Consider adding tests to validate the backoff behavior with different exponents
  3. Optional: Add a sanity check in the config accessor to cap max_exponent at something reasonable (e.g., 16) to prevent accidental overflow

Verdict: ✅ Approve with minor suggestions

This is a clean, well-implemented change that improves configurability for actor rescheduling. The issues identified are minor and don't block merging. The pattern follows established conventions and the code is production-ready.

Estimated risk: Low
Breaking changes: None
Migration required: None

@graphite-app graphite-app bot force-pushed the 11-04-fix_next-js_auto_shut_down_runners_when_source_code_updates branch from 943ae66 to d42c437 Compare November 5, 2025 05:12
@graphite-app graphite-app bot force-pushed the 11-04-fix_next-js_auto_shut_down_runners_when_source_code_updates branch from d42c437 to ddef08f Compare November 5, 2025 05:13
@graphite-app graphite-app bot force-pushed the 11-05-chore_pegboard_allow_configuring_reschedule_backoff_max_exponent branch from b824c00 to b89f7b9 Compare November 5, 2025 05:13
Base automatically changed from 11-04-fix_next-js_auto_shut_down_runners_when_source_code_updates to main November 5, 2025 05:14
@NathanFlurry NathanFlurry merged commit f5628c8 into main Nov 5, 2025
9 of 11 checks passed
@NathanFlurry NathanFlurry deleted the 11-05-chore_pegboard_allow_configuring_reschedule_backoff_max_exponent branch November 5, 2025 05:15
@claude
Copy link

claude bot commented Nov 5, 2025

PR Review: chore(pegboard): allow configuring reschedule_backoff_max_exponent

Summary

This PR adds configuration support for the reschedule backoff maximum exponent in Pegboard and includes significant enhancements to the Next.js integration for hot reload handling in development. Overall, the changes are well-structured and follow the codebase conventions.

Code Quality & Best Practices

Rust Changes ✅

The Rust changes are clean and follow established patterns:

  • engine/packages/config/src/config/pegboard.rs: Properly adds the new config field with appropriate documentation and default value (8)
  • engine/packages/pegboard/src/workflows/actor/runtime.rs: Correctly threads the new parameter through to the reschedule_backoff function

The implementation is consistent with existing config patterns in the codebase.

TypeScript Changes - Mixed

rivetkit-typescript/packages/next-js/src/mod.ts (168 additions):

Concerns:

  1. Scope Creep: This file contains substantial new functionality (file watching, stream wrapping) that appears unrelated to the PR title about configuring backoff exponent. This should ideally be in a separate PR.

  2. Platform-specific code: The file watching implementation checks for .next/server/app/api/rivet/[...all]/route.js which is specific to Next.js's build structure. This could break if Next.js changes its output structure.

  3. Polling vs. Native Watchers: The comment mentions avoiding file watchers due to cross-platform issues, but polling every 1 second is resource-intensive. Consider:

    • Document the performance implications
    • Make the interval configurable
    • Consider using fs.watch with proper error handling as a fallback
  4. Error Handling: In checkFile() at rivetkit-typescript/packages/next-js/src/mod.ts:110, errors are logged but don't stop the interval. This is good, but the error message says "failed to check for route file change" - consider if there are error conditions that should stop polling (e.g., permission errors).

  5. Memory Leaks: Each request creates a new interval (line 116). While intervals are cleared on stream finish/abort, ensure they're also cleared if the function exits early or throws.

Potential Bugs

Critical Issue in actor-driver.ts ⚠️

rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:653

this.shutdownRunner(false);  // Changed from true to false

The comment states: "We cannot assume that the request will always be closed gracefully by Rivet. We always proceed with a graceful shutdown in case the request was terminated for any other reason."

However, the comment appears incomplete: "If we did not use a graceful shutdown, the runner would" - this sentence is cut off.

Questions:

  • What happens if we use a forceful shutdown when the request is terminated by other means?
  • Is this change intentional? It seems like it could cause runners to not shut down properly in some cases.
  • This change should be explained more thoroughly in the PR description or commit message.

Minor Issues

  1. rivetkit-typescript/packages/next-js/src/mod.ts:94: The route path is hardcoded. What happens if the user has a different route configuration?

  2. rivetkit-typescript/packages/next-js/src/mod.ts:102: lastMtime !== null && mtime !== lastMtime - Consider using strict inequality to avoid type coercion issues.

Performance Considerations

  1. Polling overhead: Every request in development now polls the file system every second. With multiple concurrent requests, this could create significant I/O overhead.

  2. Stream wrapping overhead: Every response body is wrapped in a custom stream transformer (lines 125-154). While necessary for the cleanup logic, this adds a small performance penalty.

  3. Backoff configuration: The new configuration allows setting max_exponent to 1 in development (rivetkit-typescript/packages/rivetkit/src/engine-process/mod.ts:100), which dramatically reduces retry delays. Good for development, but ensure this is well-documented.

Security Concerns

  1. File system access: The code directly accesses the file system without validation. While it's reading a known path, ensure there's no way for user input to influence the routePath.

  2. Abort signal handling: The code merges abort signals from multiple sources. Ensure the abort controller cannot be triggered maliciously to DoS the service.

Test Coverage ⚠️

No tests were added for:

  • The new reschedule_backoff_max_exponent configuration
  • The file watching functionality
  • The stream wrapping logic
  • The abort signal merging behavior

Recommendations:

  1. Add unit tests for the backoff calculation with different max_exponent values
  2. Add integration tests for the file watching behavior
  3. Test edge cases: missing files, permission errors, rapid file changes

Additional Recommendations

  1. Split this PR: Consider splitting the Next.js development improvements into a separate PR. This PR should focus solely on the backoff configuration.

  2. Documentation: Add documentation explaining:

    • When to adjust reschedule_backoff_max_exponent
    • The implications of different values
    • Why development uses a max_exponent of 1
  3. Type Safety: In the TypeScript changes, ensure all error paths are handled properly and consider adding stricter TypeScript checks.

  4. Logging: The new file watching code adds several debug logs. Ensure these don't spam logs in production (they shouldn't since it's dev-only, but verify).

Compliance with CLAUDE.md ✅

  • ✅ Uses workspace dependencies correctly
  • ✅ Follows Rust formatting conventions (hard tabs)
  • ✅ Error handling follows anyhow patterns
  • ✅ Structured logging used appropriately
  • ✅ Configuration follows existing patterns

Verdict

Recommendation: Request Changes

The core feature (configurable backoff exponent) is solid, but:

  1. The incomplete comment in actor-driver.ts needs clarification
  2. The substantial Next.js changes should be in a separate PR
  3. Test coverage is needed
  4. The shutdownRunner(false) change needs justification

Please address these concerns before merging.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants