
Conversation

@alexeyr-ci2
Collaborator

@alexeyr-ci2 alexeyr-ci2 commented Oct 20, 2025

Summary

Add performance tests. Fixes #1777.

Pull Request checklist

  • [ ] Add/update test to cover these changes
  • [ ] Update documentation
  • [ ] Update CHANGELOG file

This change is Reviewable

@coderabbitai
Contributor

coderabbitai bot commented Oct 20, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



@alexeyr-ci2 alexeyr-ci2 changed the title Alexeyr/performance tests Add performance tests Oct 20, 2025
@alexeyr-ci alexeyr-ci force-pushed the alexeyr/performance-tests branch from 980a424 to 4a01be4 Compare October 20, 2025 19:01
@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 5 times, most recently from ec26e73 to a616cd1 Compare November 5, 2025 11:54
@claude

claude bot commented Nov 5, 2025

Code Review - PR #1868: Add Performance Tests

Thanks for this contribution! This is a solid implementation of performance testing infrastructure.


✅ Strengths

  1. Comprehensive Tool Coverage: Using three industry-standard tools (Fortio, Vegeta, k6) provides good cross-validation
  2. Well-Structured Workflow: Clear separation of concerns with logical step grouping
  3. Good Error Handling: Proper use of set -euo pipefail and validation steps
  4. Caching Strategy: Smart use of GitHub Actions cache for benchmark tool binaries
  5. Security Conscious: SSH access properly gated with warnings and actor-only access
  6. Flexible Configuration: Parameterized inputs allow customization

🐛 Critical Issues

  1. Node.js Version Mismatch (benchmark.yml:155)

  2. Missing Trailing Newlines (CRITICAL per CLAUDE.md)

    • Files spec/dummy/bin/prod and spec/dummy/bin/prod-assets missing final newlines
    • This will cause CI failures - Run: rake autofix to fix
  3. Race Condition in Server Startup (benchmark.yml:220)

    • Background server might not be fully initialized when curl check succeeds
    • Recommend adding health check endpoint or verifying multiple successful responses
  4. Incomplete PR Checklist

    • Documentation and CHANGELOG boxes unchecked
    • Per CLAUDE.md: Should update CHANGELOG for user-visible features
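The health-check suggestion in item 3 (require several successful responses, not one lucky curl) could be sketched as follows. This is a hypothetical illustration: `check` stands in for the real curl probe against the dummy server, and here it is stubbed to start succeeding on its second call.

```shell
# Sketch: require N consecutive successful probes before declaring the
# server ready; a single success immediately after boot can be a fluke.
# `check` is a stand-in for `curl -fsS http://localhost:3000/...`.
CALLS=0
check() { CALLS=$((CALLS + 1)); [ "$CALLS" -ge 2 ]; }

NEEDED=3   # consecutive successes required
streak=0
for ((i = 0; i < 10; i++)); do
  if check; then
    streak=$((streak + 1))
    [ "$streak" -ge "$NEEDED" ] && break
  else
    streak=0   # any failure resets the run
  fi
done
echo "streak=$streak"   # prints streak=3
```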

⚠️ Performance & Quality Issues

  1. bench.sh:66 - Unrealistic iteration count assumes 10 iterations/sec per VU
  2. Fortio Optimization Missing - Referenced guide not implemented
  3. No Resource Monitoring - Missing CPU/memory metrics
  4. Missing Test Validation - No thresholds to detect performance regressions
  5. Single Route Testing - Only tests server_side_hello_world_hooks
  6. Complex Validation Logic - Lines 17-27 hard to read
  7. Inconsistent Quoting - Variables not consistently quoted
  8. Hard-coded Timeouts - 30s and 60s timeouts not configurable

📚 Documentation Issues

  1. Missing Documentation - No README for running benchmarks locally
  2. Workflow Triggers - Runs on ALL PRs (expensive, should use labels/manual only)
  3. No Historical Tracking - Results not compared over time

🎯 Before Merging

  1. Fix Node.js version (22 → 20)
  2. Add trailing newlines
  3. Run bundle exec rubocop
  4. Update CHANGELOG.md
  5. Add performance thresholds
  6. Update workflow triggers

📊 Summary

Assessment: Good foundation but needs critical fixes before merge

Effort: Critical issues ~30 mins, High priority ~1-2 hours

Main concerns: Node.js version mismatch and missing trailing newlines will cause CI failures.

Let me know if you need help with any of these!

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 5 times, most recently from 040c0a5 to ad56e14 Compare November 5, 2025 15:49
@claude

claude bot commented Nov 5, 2025

PR Review: Add Performance Tests (#1868)

Summary

This PR adds comprehensive performance testing infrastructure using three industry-standard benchmarking tools (Fortio, Vegeta, and k6). Well-structured implementation addressing issue #1777.

Critical Issues

1. Missing Trailing Newlines

CRITICAL per CLAUDE.md: Files MUST end with newline character or CI will fail:

  • spec/dummy/bin/prod (line 4)
  • spec/dummy/bin/prod-assets (line 9)
  • spec/performance/bench.sh (line 202)

2. RuboCop Required

CRITICAL per CLAUDE.md: Must run bundle exec rubocop and fix ALL violations before commit.

Potential Bugs

K6 JSON Parsing (bench.sh:187-192)

The jq expression uses add, which returns null on empty arrays and breaks the downstream arithmetic. Use add // 0 for safety.
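The suggested fix can be seen in isolation (the input here is a made-up empty array, not the actual k6 summary schema):

```shell
# jq's `add` on an empty array yields null, which breaks later arithmetic;
# the `// 0` alternative operator substitutes 0 instead.
UNSAFE=$(printf '[]' | jq 'add')
SAFE=$(printf '[]' | jq 'add // 0')
echo "unsafe=$UNSAFE safe=$SAFE"   # unsafe=null safe=0
```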

Server Startup Race (benchmark.yml:254)

Uses exit 0 inside loop which exits the entire step. Should use break instead.
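The distinction can be sketched as below; `probe` is a stand-in for the real curl health check, stubbed to succeed on its second call.

```shell
# `exit 0` inside the loop would terminate the whole step, skipping
# everything after the loop; `break` only leaves the loop.
TRIES=0
probe() { TRIES=$((TRIES + 1)); [ "$TRIES" -ge 2 ]; }

for ((i = 0; i < 5; i++)); do
  if probe; then
    break   # with `exit 0` here, the echo below would never run
  fi
done
echo "server up after $TRIES tries; continuing with the step"
```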

npm vs yarn (benchmark.yml:200)

Uses npm install --global yalc but CLAUDE.md requires yarn-only commands.

Code Quality Strengths

  1. Excellent tool selection - three benchmarking tools for cross-validation
  2. Well-structured workflow with clear sections
  3. Good error handling with set -euo pipefail
  4. Comprehensive parameterization via workflow inputs
  5. Smart binary caching for faster CI
  6. Security-conscious SSH access gating

Suggestions

  1. Shell portability: Replace bc with awk for float comparisons
  2. Hardcoded route: Make server_side_hello_world_hooks configurable
  3. Duration: 30s may be too short, consider 60s for stable measurements
  4. Add documentation for local benchmark execution
  5. Consider performance regression detection vs master
  6. Add spec/performance/README.md with interpretation guide
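For suggestion 1, a minimal sketch of an awk-based float comparison (the helper name `float_ge` is made up, not from the PR):

```shell
# awk ships on any POSIX system, so float comparisons need not depend on
# bc being installed in the runner image.
float_ge() {   # float_ge A B -> success (exit 0) if A >= B
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a >= b) }'
}

float_ge "99.5" "99.0" && echo "99.5 >= 99.0"
float_ge "0.1" "0.25" || echo "0.1 < 0.25"
```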

Test Coverage

Good: Infrastructure in place, runs on PR/push, results preserved as artifacts
Missing: No bash script tests, no performance thresholds, no local testing docs

Verdict

Solid PR with high code quality. Performance testing infrastructure will be valuable.

Required Before Merge:

  1. Fix trailing newlines (will cause CI failure)
  2. Run bundle exec rubocop and fix violations
  3. Fix K6 empty array bug

Recommended:

  1. Add basic documentation
  2. Verify workflow passes on CI
  3. Consider CHANGELOG entry

Great work overall!

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 2 times, most recently from e350a34 to 7b9d621 Compare November 6, 2025 10:25
@claude

claude bot commented Nov 6, 2025

PR Review: Add Performance Tests

Thank you for this comprehensive performance testing implementation! This is a well-structured addition to the repository. Here's my detailed feedback:


Strengths

  1. Comprehensive Tool Coverage: Great choice using three different load testing tools (Fortio, Vegeta, k6) to get diverse perspectives on performance
  2. Excellent Error Handling: The bash script has robust parameter validation and error handling throughout
  3. Smart Caching: GitHub Actions workflow efficiently caches binaries to speed up future runs
  4. Good Documentation: Clear comments, parameter descriptions, and helpful echo statements
  5. Configurable Parameters: Extensive workflow_dispatch inputs allow flexible testing scenarios
  6. Production-Ready Setup: Proper Puma configuration with workers and preloading for production benchmarks

🐛 Potential Issues

1. Script Portability - bc Dependency (bench.sh:21,33)

The script uses bc for floating-point comparisons but bc is not installed in the workflow.

Fix: Add bc to the apt package installation step in the workflow, or use bash-native arithmetic for integer comparisons.

2. Race Condition in Server Startup (benchmark.yml:248)

The server process is backgrounded but there is no guarantee it has started before the polling loop begins. If the server fails to start immediately, the loop might miss error messages.

Suggestion: Add a brief sleep 2 before the polling loop, or capture the PID and check if the process is still running during the timeout loop.
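The PID variant of that suggestion could look like the sketch below, with `sleep 60` standing in for the real bin/prod server command:

```shell
# Keep the background PID and use `kill -0` (signal 0 = liveness probe)
# to detect a crashed server instead of polling curl against nothing.
sleep 60 & SERVER_PID=$!

server_alive() { kill -0 "$SERVER_PID" 2>/dev/null; }

server_alive && echo "server process running"   # would guard each curl poll
kill "$SERVER_PID" 2>/dev/null
wait "$SERVER_PID" 2>/dev/null || true
server_alive || echo "server process gone"
```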

3. k6 Status Reporting Edge Case (bench.sh:199-204)

If .root_group.checks is empty, add returns null, causing K6_REQS_OTHER calculation to fail.

Fix: Add null handling by changing the jq expression to use | add // 0

4. Missing Newline at End of Files

Per CLAUDE.md requirements: ALWAYS ensure files end with a newline character

Files needing newlines:

  • spec/dummy/bin/prod (line 4)
  • spec/dummy/bin/prod-assets (line 9)
  • spec/performance/bench.sh (line 214)

Fix: Run rake autofix or add trailing newlines manually.

5. Benchmark Workflow Runs on Every PR (benchmark.yml:57-58)

Running 30-second benchmarks with 3 tools on every PR will consume significant CI minutes and slow down PR feedback.

Suggestion:

  • Remove pull_request trigger and only keep workflow_dispatch + push to master
  • Or add a condition to only run on PRs with a specific label (e.g., performance)
  • Or significantly reduce default duration (e.g., 10s) for PR runs

🔍 Code Quality & Best Practices

Security

  • ✅ SSH access is properly gated behind workflow_dispatch input with clear warnings
  • ✅ limit-access-to-actor: true prevents unauthorized access
  • ⚠️ Consider adding a comment warning maintainers never to merge code while SSH session is active

Performance

  • ✅ Binary caching will significantly speed up repeated runs
  • ✅ Server warm-up phase (10 requests) is good practice
  • ⚠️ REQUEST_TIMEOUT=60s is very generous - consider 30s default to catch timeout issues faster

Maintainability

  • ✅ Clear section comments in YAML
  • ✅ Parameterized configuration
  • ⚠️ Consider extracting the jq parsing logic (lines 167-209) into a separate function for readability

📊 Test Coverage

Missing:

  • ❌ No automated tests for the bench.sh script itself
  • ❌ No validation that the summary.txt format is correct
  • ❌ No performance regression detection (just data collection)

Suggestions:

  1. Add a unit test for bench.sh parameter validation
  2. Consider storing benchmark results over time and comparing against baselines
  3. Add a step to post benchmark results as a PR comment for visibility

🔧 Configuration Issues

Puma Configuration (spec/dummy/config/puma.rb:38)

Workflow sets default to 4 workers (WEB_CONCURRENCY: 4), but Puma defaults to 2 if not set. The workflow default should match.

Consistency check needed: Verify that WEB_CONCURRENCY=4 is intentional for CI runners with 2 CPU cores.


📝 Documentation & Changelog

Per CLAUDE.md: Update CHANGELOG.md for user-visible changes only (features, bug fixes, breaking changes, deprecations, performance improvements)

This PR qualifies as it adds a performance testing feature. Needs:

  • ✅ Changelog entry format: PR 1868 by alexeyr-ci2: Add performance benchmarking workflow with Fortio, Vegeta, and k6.

Missing:

  • ❌ No README/docs update explaining how to run benchmarks locally
  • ❌ No documentation on interpreting benchmark results

🎯 Recommendations

Before Merge (Critical):

  1. ✅ Add trailing newlines to all new files
  2. ✅ Run bundle exec rubocop and fix any violations
  3. ✅ Add CHANGELOG entry
  4. ✅ Fix k6 null handling edge case
  5. ✅ Reconsider pull_request trigger or reduce duration

Nice to Have:

  6. Add bc to dependencies or remove floating-point support
  7. Add brief README section on running benchmarks
  8. Handle server startup race condition
  9. Consider performance regression detection


🎓 Overall Assessment

This is a solid, production-ready implementation of performance testing. The code quality is high, with excellent error handling and configurability. The main concerns are:

  1. CI cost - running on every PR may be expensive
  2. Missing documentation - users won't know how to use this
  3. File formatting - needs trailing newlines per repo standards

Recommendation: Request changes for the critical items (newlines, changelog, PR trigger), then approve.


Score: 8/10 - Great implementation, minor issues to address.

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch from 7b9d621 to f2b9ef8 Compare November 7, 2025 13:13
@claude

claude bot commented Nov 7, 2025

Pull Request Review: Add Performance Tests

Thank you for this comprehensive performance testing implementation! This is a valuable addition to the project. Below is my detailed review:


Strengths

  1. Multi-tool approach: Using Fortio, Vegeta, and K6 provides cross-validation of results and flexibility for different testing scenarios.

  2. Well-structured workflow: The GitHub Actions workflow is logically organized with clear steps and helpful comments.

  3. Configurable parameters: Extensive configurability via workflow inputs allows fine-tuning for different performance scenarios.

  4. Good error handling: The bench.sh script includes comprehensive input validation and error messages.

  5. Caching: Binary caching for tools reduces workflow execution time.

  6. Production mode testing: Tests against production-like configuration (Puma workers, precompiled assets) for realistic results.


🔍 Issues and Concerns

Critical Issues

  1. Missing file newlines (spec/dummy/bin/prod, spec/dummy/bin/prod-assets)

    • Location: spec/dummy/bin/prod:4, spec/dummy/bin/prod-assets:9
    • Impact: CI will fail per CLAUDE.md requirements
    • Fix: Ensure both files end with a newline character
    • Reference: CLAUDE.md states "ALWAYS ensure files end with a newline character"
  2. RuboCop not run

    • Impact: MANDATORY requirement before commits
    • Required action: Run bundle exec rubocop and fix ALL violations
    • Reference: CLAUDE.md: "BEFORE EVERY COMMIT/PUSH: ALWAYS run bundle exec rubocop and fix ALL violations"

High Priority Issues

  1. Workflow runs on every PR/push to master (benchmark.yml:55-58)

    • Impact: Performance tests are resource-intensive and will slow down CI significantly
    • Recommendation: Only run on manual workflow_dispatch, specific labels (e.g., run-benchmarks), or scheduled runs (e.g., nightly)
  2. Server process not properly cleaned up (benchmark.yml:248)

    • Issue: Server started in background with bin/prod & but never explicitly killed
    • Impact: May leave orphaned processes, especially if workflow fails
    • Fix: Add cleanup step or use a trap/signal handler
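The trap approach could be sketched as follows (again with `sleep 60` standing in for bin/prod; the `cleanup` function name is illustrative):

```shell
# An EXIT trap kills the backgrounded server whether the step succeeds
# or fails, so no orphaned process survives a failed workflow run.
sleep 60 & SERVER_PID=$!
cleanup() {
  kill "$SERVER_PID" 2>/dev/null || true
  wait "$SERVER_PID" 2>/dev/null || true
}
trap cleanup EXIT

echo "running benchmarks against PID $SERVER_PID"
# cleanup fires automatically on exit, even if a benchmark command failed
```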

Medium Priority Issues

  1. Hardcoded route in bench.sh (line 6)

    • Issue: the default route, ${ROUTE:-server_side_hello_world_hooks}, may not exist or may not be representative
    • Recommendation: Verify this route exists in the dummy app, consider testing multiple routes, and document what this route tests
  2. No baseline comparison

    • Issue: Results are collected but not compared against previous runs
    • Enhancement: Consider using benchmark-action to track performance over time
  3. Missing documentation

    • Issue: No README or documentation for how to run benchmarks locally, interpret results, or understand what "good" performance looks like
    • Recommendation: Add spec/performance/README.md

🔒 Security Considerations

  1. SSH access via tmate (benchmark.yml:108-113)
    • Status: Properly gated behind workflow input (default: false)
    • Good: Includes security warnings and limit-access-to-actor

🧪 Test Coverage

Missing:

  • No unit tests for the bash script logic
  • No tests verifying the benchmark workflow succeeds
  • No validation that summary output format is correct

Recommendations:

  • Add a dry-run mode to bench.sh for testing
  • Consider shellcheck for bash script validation

📝 Suggested Changes

Must Fix (Blocks Merge)

  • Add trailing newlines to spec/dummy/bin/prod and spec/dummy/bin/prod-assets
  • Run bundle exec rubocop and fix all violations
  • Change workflow triggers to avoid running on every PR/push

Should Fix (High Priority)

  • Add server cleanup/shutdown in workflow
  • Verify the benchmark route exists and is representative
  • Add basic documentation (spec/performance/README.md)

Nice to Have

  • Add baseline comparison/tracking
  • Add shellcheck to CI for bash scripts
  • Consider shorter default duration for PR checks
  • Extract K6 test to separate file

Changelog

Per CLAUDE.md, this should be added to CHANGELOG.md as it's a new feature.


🎯 Overall Assessment

This is a solid foundation for performance testing with good multi-tool coverage and configurability. The main blockers are CLAUDE.md compliance issues (trailing newlines, RuboCop). Once those are fixed and the workflow triggers are adjusted to avoid running on every PR, this will be ready to merge.

Estimated effort to address critical issues: ~30 minutes

Great work overall! 🚀

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch from c58f480 to 43f8858 Compare November 7, 2025 13:36
@claude

claude bot commented Nov 7, 2025

Code Review: Add Performance Tests

Thank you for this comprehensive performance testing addition! This is a well-structured PR that addresses issue #1777.


✅ Strengths

1. Excellent Tool Coverage

  • Multiple benchmarking tools (Fortio, Vegeta, k6) provide good cross-validation
  • Smart caching strategy for tool binaries reduces CI time
  • Flexible configuration via environment variables

2. Robust Error Handling

  • Good use of set -euo pipefail in bash scripts
  • Comprehensive input validation with clear error messages
  • Server health checks with timeout protection

3. Production-Ready Configuration

  • Proper Puma clustering setup for production benchmarks
  • Preloading and worker configuration aligned with best practices
  • Appropriate separation of prod/dev configurations

🔒 Security Concerns - CRITICAL

SSH Access Feature (lines 96-119 in .github/workflows/benchmark.yml)

This poses significant security risks:

Issues:

  1. Detached mode risk: Using detached: true leaves the SSH session open indefinitely
  2. Repository exposure: The entire repository is accessible via SSH
  3. No audit trail: Limited visibility into commands executed during SSH sessions
  4. Secrets exposure risk: If secrets are added later, they could be exposed

Recommendations:

  • Consider removing this feature entirely for production workflows
  • If kept: Remove detached: true, add timeout-minutes: 15, and restrict to non-master branches
  • Document that NO secrets should ever be added to this workflow

🐛 Potential Bugs

1. Division by Zero Risk (spec/performance/bench.sh:195-197)

If K6_REQS_KNOWN_STATUS is null/empty, this could fail. Add validation.
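One way to add that validation, sketched with made-up sample values (the variable names mirror this comment, not necessarily the script):

```shell
# Default a null/empty parsed count to 0 and skip the division unless the
# denominator is a positive integer, avoiding a divide-by-zero failure.
K6_REQS_OK=0
K6_REQS_KNOWN_STATUS=""   # e.g. jq returned an empty result

TOTAL="${K6_REQS_KNOWN_STATUS:-0}"
if [ "$TOTAL" -gt 0 ] 2>/dev/null; then
  PCT=$(awk -v ok="$K6_REQS_OK" -v t="$TOTAL" 'BEGIN { printf "%.1f", 100 * ok / t }')
else
  PCT="n/a"
fi
echo "success rate: $PCT"   # success rate: n/a
```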

2. Race Condition in Server Startup (spec/dummy/bin/prod)

The rails s command starts but there is no guarantee assets are fully loaded. Consider adding a readiness check that validates asset serving.

3. Missing Error Context

When benchmarks fail, the error message does not indicate which tool failed. Consider wrapping each tool execution with proper error context.

4. Hardcoded Paths (spec/performance/bench.sh:87)

If run from wrong directory, this could create directories in unexpected locations. Consider using SCRIPT_DIR to make paths relative to the script location.
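The SCRIPT_DIR pattern mentioned above is a one-liner; `results` here is a hypothetical output directory name, not the one the PR uses:

```shell
# Resolve output paths relative to the script's own location so the
# benchmark behaves the same from any working directory.
SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]:-$0}")" && pwd)"
RESULTS_DIR="$SCRIPT_DIR/results"
mkdir -p "$RESULTS_DIR"
echo "results will be written under $RESULTS_DIR"
```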


⚡ Performance Considerations

  1. Workflow runs on every push/PR: For a 20-minute benchmark suite, consider only running on workflow_dispatch and push to master, or add path filters to skip documentation changes.

  2. Sequential warmup (spec/performance/bench.sh:73-78): Consider parallel warmup for efficiency.

  3. Triple tool execution: Running all 3 tools provides validation but triples execution time. Consider defaulting to k6 only.


🧪 Test Coverage - Missing

  1. Unit tests for benchmark script: Input validation and JSON parsing logic should be tested
  2. Workflow validation: Consider adding actionlint to CI
  3. Integration test: Should verify benchmark artifacts are generated correctly

📝 Documentation - REQUIRED

Missing per CLAUDE.md requirements:

  1. CHANGELOG.md update - This is a user-visible feature that developers will use. Action needed: Update CHANGELOG.md

  2. Inline documentation: spec/performance/bench.sh needs header comment explaining:

    • Purpose
    • Usage examples
    • Required dependencies
    • Expected output format
  3. Workflow documentation: Add comment block in .github/workflows/benchmark.yml explaining when to run benchmarks and how to interpret results


🎨 Code Quality Suggestions

Good Practices:

  • Consistent error handling
  • Input validation with clear messages
  • Proper use of environment variables with defaults
  • JSON output for machine-readable results

Suggestions:

  1. YAML Anchors: Lines 7-14 define anchors but only use them once - consider simplifying
  2. DRY Principle: Lines 165-208 have very similar blocks for each tool - consider refactoring
  3. Magic Numbers: Several hardcoded values should be configurable (TIMEOUT_SEC=60, warmup count, sleep durations)
  4. Puma Configuration: Default of 2 workers may be too low - document why this was chosen

📋 Action Items Summary

🔴 High Priority (Before Merge):

  1. Address SSH security concerns - Remove or significantly restrict
  2. Add CHANGELOG.md entry (required per project guidelines)
  3. Fix potential null reference in K6 status parsing
  4. Add workflow path filters to prevent unnecessary runs

🟡 Medium Priority (Before Merge):

  1. Add header documentation to bench.sh
  2. Add error context to benchmark failures
  3. Fix hardcoded paths in bash script

🟢 Low Priority (Future Enhancement):

  1. Add unit tests for benchmark script
  2. Refactor duplicate code in parsing logic
  3. Make magic numbers configurable

✨ Overall Assessment

This is a solid foundation for performance testing! The multi-tool approach and comprehensive workflow setup show good engineering practices.

Main concerns:

  1. Security: SSH feature needs significant changes or removal
  2. Documentation: Missing CHANGELOG and inline docs (required per CLAUDE.md)
  3. Robustness: A few edge cases in error handling

Recommendation: Request changes for security and documentation issues, then approve after those are addressed.

Great work on this comprehensive testing infrastructure!

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 7 times, most recently from 49b2575 to be4af9e Compare November 7, 2025 20:16
@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 5 times, most recently from 0acf3d2 to 92fcf7d Compare November 8, 2025 10:22
@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch from 92fcf7d to 0181753 Compare November 8, 2025 10:42