chore(pegboard): allow configuring reschedule_backoff_max_exponent #3370

NathanFlurry · 2025-11-05T04:55:44Z

No description provided.

vercel · 2025-11-05T04:55:49Z

The latest updates on your projects. Learn more about Vercel for GitHub.

Project	Deployment	Preview	Comments	Updated (UTC)
rivetkit-serverless	Ready	Preview	Comment	Nov 5, 2025 5:14am

3 Skipped Deployments

Project	Deployment	Preview	Updated (UTC)
rivet-cloud	Ignored		Nov 5, 2025 5:14am
rivet-inspector	Ignored	Preview	Nov 5, 2025 5:14am
rivet-site	Ignored	Preview	Nov 5, 2025 5:14am

NathanFlurry · 2025-11-05T04:56:25Z

How to use the Graphite Merge Queue

Add the label merge-queue to this PR to add it to the merge queue.

You must have a Graphite account in order to use the merge queue. Sign up using this link.

_{An organization admin has enabled the Graphite Merge Queue in this repository.} _{Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.}

This stack of pull requests is managed by Graphite. Learn more about stacking.

claude · 2025-11-05T04:57:34Z

PR Review: chore(pegboard): allow configuring reschedule_backoff_max_exponent

Summary

This PR adds configuration support for the maximum backoff exponent when rescheduling actors. The change is straightforward and follows existing patterns in the codebase.

Code Quality ✅

Positive aspects:

Follows existing code patterns consistently (similar to base_retry_timeout and retry_reset_duration)
Documentation is clear and follows the established format
Appropriate use of Option<usize> with sensible default (8)
Correctly marked as Experimental matching other pegboard configs
Function signature changes are minimal and backward-compatible

Conventions adherence:

✅ Uses workspace dependencies correctly
✅ Follows Rust naming conventions (snake_case)
✅ Documentation style matches existing code
✅ Import organization is consistent

Potential Issues 🔍

1. Comment accuracy in TypeScript (minor)

// Set max exponent to 1 to have a maximum of base_retry_timeout
RIVET__PEGBOARD__RESCHEDULE_BACKOFF_MAX_EXPONENT: "1",

The comment says "maximum of base_retry_timeout" but this isn't quite accurate. Based on the backoff implementation:

pub fn current_duration(&self) -> usize {
    self.wait * 2usize.pow(self.i.min(self.max_exponent) as u32)
}

With max_exponent=1 and base_retry_timeout=100:

retry 0: 100 * 2^0 = 100ms
retry 1: 100 * 2^1 = 200ms
retry 2+: 100 * 2^1 = 200ms (capped)

Suggested comment improvement:

// Set max exponent to 1 to cap backoff at 2x base_retry_timeout
RIVET__PEGBOARD__RESCHEDULE_BACKOFF_MAX_EXPONENT: "1",

2. Missing validation (minor consideration)
The config doesn't validate that max_exponent is reasonable. While usize prevents negatives, extremely large values (e.g., 63) could cause 2^63 overflow in the backoff calculation. However, this is a configuration issue that would be caught in testing, and the default of 8 is safe.

Performance ✅

Negligible performance impact
Config lookups are O(1)
Backoff calculation remains efficient

Security ✅

No security concerns identified
Configuration is internal and doesn't expose attack surface

Test Coverage ⚠️

Missing:

No tests explicitly verify the new configuration parameter
No tests validate that different max_exponent values produce expected backoff durations

Recommendation:
Consider adding a unit test in engine/packages/pegboard/src/workflows/actor/runtime.rs or the backoff module to verify behavior with different exponent values:

#[cfg(test)]
mod tests {
    use super::*;
    
    #[test]
    fn test_reschedule_backoff_respects_max_exponent() {
        // Test that max_exponent=1 caps backoff correctly
        let backoff = reschedule_backoff(5, 100, 1);
        assert_eq!(backoff.current_duration(), 200); // 100 * 2^1
        
        // Test default max_exponent=8 allows exponential growth
        let backoff = reschedule_backoff(5, 100, 8);
        assert_eq!(backoff.current_duration(), 3200); // 100 * 2^5
    }
}

Architecture Considerations ✅

The change fits well with the existing architecture:

Configuration layer remains clean and consistent
Workflow layer properly propagates the config
Development environment gets appropriate lower limits for fast iteration

Recommendations

Fix the comment in engine-process/mod.ts to accurately describe the math
Consider adding tests to validate the backoff behavior with different exponents
Optional: Add a sanity check in the config accessor to cap max_exponent at something reasonable (e.g., 16) to prevent accidental overflow

Verdict: ✅ Approve with minor suggestions

This is a clean, well-implemented change that improves configurability for actor rescheduling. The issues identified are minor and don't block merging. The pattern follows established conventions and the code is production-ready.

Estimated risk: Low
Breaking changes: None
Migration required: None

claude · 2025-11-05T05:16:38Z

PR Review: chore(pegboard): allow configuring reschedule_backoff_max_exponent

Summary

This PR adds configuration support for the reschedule backoff maximum exponent in Pegboard and includes significant enhancements to the Next.js integration for hot reload handling in development. Overall, the changes are well-structured and follow the codebase conventions.

Code Quality & Best Practices

Rust Changes ✅

The Rust changes are clean and follow established patterns:

engine/packages/config/src/config/pegboard.rs: Properly adds the new config field with appropriate documentation and default value (8)
engine/packages/pegboard/src/workflows/actor/runtime.rs: Correctly threads the new parameter through to the reschedule_backoff function

The implementation is consistent with existing config patterns in the codebase.

TypeScript Changes - Mixed

rivetkit-typescript/packages/next-js/src/mod.ts (168 additions):

Concerns:

Scope Creep: This file contains substantial new functionality (file watching, stream wrapping) that appears unrelated to the PR title about configuring backoff exponent. This should ideally be in a separate PR.
Platform-specific code: The file watching implementation checks for .next/server/app/api/rivet/[...all]/route.js which is specific to Next.js's build structure. This could break if Next.js changes its output structure.
Polling vs. Native Watchers: The comment mentions avoiding file watchers due to cross-platform issues, but polling every 1 second is resource-intensive. Consider:
- Document the performance implications
- Make the interval configurable
- Consider using fs.watch with proper error handling as a fallback
Error Handling: In checkFile() at rivetkit-typescript/packages/next-js/src/mod.ts:110, errors are logged but don't stop the interval. This is good, but the error message says "failed to check for route file change" - consider if there are error conditions that should stop polling (e.g., permission errors).
Memory Leaks: Each request creates a new interval (line 116). While intervals are cleared on stream finish/abort, ensure they're also cleared if the function exits early or throws.

Potential Bugs

Critical Issue in actor-driver.ts ⚠️

rivetkit-typescript/packages/rivetkit/src/drivers/engine/actor-driver.ts:653

this.shutdownRunner(false);  // Changed from true to false

The comment states: "We cannot assume that the request will always be closed gracefully by Rivet. We always proceed with a graceful shutdown in case the request was terminated for any other reason."

However, the comment appears incomplete: "If we did not use a graceful shutdown, the runner would" - this sentence is cut off.

Questions:

What happens if we use a forceful shutdown when the request is terminated by other means?
Is this change intentional? It seems like it could cause runners to not shut down properly in some cases.
This change should be explained more thoroughly in the PR description or commit message.

Minor Issues

rivetkit-typescript/packages/next-js/src/mod.ts:94: The route path is hardcoded. What happens if the user has a different route configuration?
rivetkit-typescript/packages/next-js/src/mod.ts:102: lastMtime !== null && mtime !== lastMtime - Consider using strict inequality to avoid type coercion issues.

Performance Considerations

Polling overhead: Every request in development now polls the file system every second. With multiple concurrent requests, this could create significant I/O overhead.
Stream wrapping overhead: Every response body is wrapped in a custom stream transformer (lines 125-154). While necessary for the cleanup logic, this adds a small performance penalty.
Backoff configuration: The new configuration allows setting max_exponent to 1 in development (rivetkit-typescript/packages/rivetkit/src/engine-process/mod.ts:100), which dramatically reduces retry delays. Good for development, but ensure this is well-documented.

Security Concerns

File system access: The code directly accesses the file system without validation. While it's reading a known path, ensure there's no way for user input to influence the routePath.
Abort signal handling: The code merges abort signals from multiple sources. Ensure the abort controller cannot be triggered maliciously to DoS the service.

Test Coverage ⚠️

No tests were added for:

The new reschedule_backoff_max_exponent configuration
The file watching functionality
The stream wrapping logic
The abort signal merging behavior

Recommendations:

Add unit tests for the backoff calculation with different max_exponent values
Add integration tests for the file watching behavior
Test edge cases: missing files, permission errors, rapid file changes

Additional Recommendations

Split this PR: Consider splitting the Next.js development improvements into a separate PR. This PR should focus solely on the backoff configuration.
Documentation: Add documentation explaining:
- When to adjust reschedule_backoff_max_exponent
- The implications of different values
- Why development uses a max_exponent of 1
Type Safety: In the TypeScript changes, ensure all error paths are handled properly and consider adding stricter TypeScript checks.
Logging: The new file watching code adds several debug logs. Ensure these don't spam logs in production (they shouldn't since it's dev-only, but verify).

Compliance with CLAUDE.md ✅

✅ Uses workspace dependencies correctly
✅ Follows Rust formatting conventions (hard tabs)
✅ Error handling follows anyhow patterns
✅ Structured logging used appropriately
✅ Configuration follows existing patterns

Verdict

Recommendation: Request Changes

The core feature (configurable backoff exponent) is solid, but:

The incomplete comment in actor-driver.ts needs clarification
The substantial Next.js changes should be in a separate PR
Test coverage is needed
The shutdownRunner(false) change needs justification

Please address these concerns before merging.

NathanFlurry requested a review from MasterPtato November 5, 2025 04:55

NathanFlurry mentioned this pull request Nov 5, 2025

fix(next-js): auto shut down runners when source code updates #3369

Merged

vercel bot deployed to Preview – rivetkit-serverless November 5, 2025 04:57 View deployment

graphite-app bot force-pushed the 11-04-fix_next-js_auto_shut_down_runners_when_source_code_updates branch from 943ae66 to d42c437 Compare November 5, 2025 05:12

fix(next-js): auto shut down runners when source code updates

ddef08f

graphite-app bot force-pushed the 11-04-fix_next-js_auto_shut_down_runners_when_source_code_updates branch from d42c437 to ddef08f Compare November 5, 2025 05:13

chore(pegboard): allow configuring reschedule_backoff_max_exponent

b89f7b9

graphite-app bot force-pushed the 11-05-chore_pegboard_allow_configuring_reschedule_backoff_max_exponent branch from b824c00 to b89f7b9 Compare November 5, 2025 05:13

Base automatically changed from 11-04-fix_next-js_auto_shut_down_runners_when_source_code_updates to main November 5, 2025 05:14

vercel bot deployed to Preview – rivetkit-serverless November 5, 2025 05:14 View deployment

NathanFlurry merged commit f5628c8 into main Nov 5, 2025
9 of 11 checks passed

NathanFlurry deleted the 11-05-chore_pegboard_allow_configuring_reschedule_backoff_max_exponent branch November 5, 2025 05:15

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

chore(pegboard): allow configuring reschedule_backoff_max_exponent #3370

chore(pegboard): allow configuring reschedule_backoff_max_exponent #3370

Uh oh!

NathanFlurry commented Nov 5, 2025

Uh oh!

vercel bot commented Nov 5, 2025 •

edited

Loading

Uh oh!

NathanFlurry commented Nov 5, 2025 •

edited

Loading

Uh oh!

claude bot commented Nov 5, 2025

Uh oh!

Uh oh!

claude bot commented Nov 5, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

chore(pegboard): allow configuring reschedule_backoff_max_exponent #3370

chore(pegboard): allow configuring reschedule_backoff_max_exponent #3370

Uh oh!

Conversation

NathanFlurry commented Nov 5, 2025

Uh oh!

vercel bot commented Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NathanFlurry commented Nov 5, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

How to use the Graphite Merge Queue

Uh oh!

claude bot commented Nov 5, 2025

PR Review: chore(pegboard): allow configuring reschedule_backoff_max_exponent

Summary

Code Quality ✅

Potential Issues 🔍

Performance ✅

Security ✅

Test Coverage ⚠️

Architecture Considerations ✅

Recommendations

Verdict: ✅ Approve with minor suggestions

Uh oh!

Uh oh!

claude bot commented Nov 5, 2025

PR Review: chore(pegboard): allow configuring reschedule_backoff_max_exponent

Summary

Code Quality & Best Practices

Rust Changes ✅

TypeScript Changes - Mixed

Potential Bugs

Critical Issue in actor-driver.ts ⚠️

Minor Issues

Performance Considerations

Security Concerns

Test Coverage ⚠️

Additional Recommendations

Compliance with CLAUDE.md ✅

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

vercel bot commented Nov 5, 2025 •

edited

Loading

NathanFlurry commented Nov 5, 2025 •

edited

Loading