
Conversation

@alexeyr-ci2
Collaborator

@alexeyr-ci2 alexeyr-ci2 commented Oct 20, 2025

Summary

Add performance tests. Fixes #1777.

Pull Request checklist

  • [ ] Add/update test to cover these changes
  • [ ] Update documentation
  • [ ] Update CHANGELOG file

This change is Reviewable

@coderabbitai
Contributor

coderabbitai bot commented Oct 20, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.



@alexeyr-ci2 alexeyr-ci2 changed the title Alexeyr/performance tests Add performance tests Oct 20, 2025
@alexeyr-ci alexeyr-ci force-pushed the alexeyr/performance-tests branch from 980a424 to 4a01be4 Compare October 20, 2025 19:01
@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 5 times, most recently from ec26e73 to a616cd1 Compare November 5, 2025 11:54
@claude

claude bot commented Nov 5, 2025

Code Review - PR #1868: Add Performance Tests

Thanks for this contribution! This is a solid implementation of performance testing infrastructure.


✅ Strengths

  1. Comprehensive Tool Coverage: Using three industry-standard tools (Fortio, Vegeta, k6) provides good cross-validation
  2. Well-Structured Workflow: Clear separation of concerns with logical step grouping
  3. Good Error Handling: Proper use of set -euo pipefail and validation steps
  4. Caching Strategy: Smart use of GitHub Actions cache for benchmark tool binaries
  5. Security Conscious: SSH access properly gated with warnings and actor-only access
  6. Flexible Configuration: Parameterized inputs allow customization

🐛 Critical Issues

  1. Node.js Version Mismatch (benchmark.yml:155)

  2. Missing Trailing Newlines (CRITICAL per CLAUDE.md)

    • Files spec/dummy/bin/prod and spec/dummy/bin/prod-assets missing final newlines
    • This will cause CI failures - Run: rake autofix to fix
  3. Race Condition in Server Startup (benchmark.yml:220)

    • Background server might not be fully initialized when curl check succeeds
    • Recommend adding health check endpoint or verifying multiple successful responses
  4. Incomplete PR Checklist

    • Documentation and CHANGELOG boxes unchecked
    • Per CLAUDE.md: Should update CHANGELOG for user-visible features
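The health-check suggestion in item 3 (require several successful responses, not one lucky curl) could be sketched as follows. This is a hypothetical illustration: `check` stands in for the real curl probe against the dummy server, and here it is stubbed to start succeeding on its second call.

```shell
# Sketch: require N consecutive successful probes before declaring the
# server ready; a single success immediately after boot can be a fluke.
# `check` is a stand-in for `curl -fsS http://localhost:3000/...`.
CALLS=0
check() { CALLS=$((CALLS + 1)); [ "$CALLS" -ge 2 ]; }

NEEDED=3   # consecutive successes required
streak=0
for ((i = 0; i < 10; i++)); do
  if check; then
    streak=$((streak + 1))
    [ "$streak" -ge "$NEEDED" ] && break
  else
    streak=0   # any failure resets the run
  fi
done
echo "streak=$streak"   # prints streak=3
```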

⚠️ Performance & Quality Issues

  1. bench.sh:66 - Unrealistic iteration count assumes 10 iterations/sec per VU
  2. Fortio Optimization Missing - Referenced guide not implemented
  3. No Resource Monitoring - Missing CPU/memory metrics
  4. Missing Test Validation - No thresholds to detect performance regressions
  5. Single Route Testing - Only tests server_side_hello_world_hooks
  6. Complex Validation Logic - Lines 17-27 hard to read
  7. Inconsistent Quoting - Variables not consistently quoted
  8. Hard-coded Timeouts - 30s and 60s timeouts not configurable

📚 Documentation Issues

  1. Missing Documentation - No README for running benchmarks locally
  2. Workflow Triggers - Runs on ALL PRs (expensive, should use labels/manual only)
  3. No Historical Tracking - Results not compared over time

🎯 Before Merging

  1. Fix Node.js version (22 → 20)
  2. Add trailing newlines
  3. Run bundle exec rubocop
  4. Update CHANGELOG.md
  5. Add performance thresholds
  6. Update workflow triggers

📊 Summary

Assessment: Good foundation but needs critical fixes before merge

Effort: Critical issues ~30 mins, High priority ~1-2 hours

Main concerns: Node.js version mismatch and missing trailing newlines will cause CI failures.

Let me know if you need help with any of these!

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 5 times, most recently from 040c0a5 to ad56e14 Compare November 5, 2025 15:49
@claude

claude bot commented Nov 5, 2025

PR Review: Add Performance Tests (#1868)

Summary

This PR adds comprehensive performance testing infrastructure using three industry-standard benchmarking tools (Fortio, Vegeta, and k6). Well-structured implementation addressing issue #1777.

Critical Issues

1. Missing Trailing Newlines

CRITICAL per CLAUDE.md: Files MUST end with newline character or CI will fail:

  • spec/dummy/bin/prod (line 4)
  • spec/dummy/bin/prod-assets (line 9)
  • spec/performance/bench.sh (line 202)

2. RuboCop Required

CRITICAL per CLAUDE.md: Must run bundle exec rubocop and fix ALL violations before commit.

Potential Bugs

K6 JSON Parsing (bench.sh:187-192)

The jq expression uses add, which returns null on empty arrays and breaks the downstream arithmetic. Use add // 0 for safety.
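The suggested fix can be seen in isolation (the input here is a made-up empty array, not the actual k6 summary schema):

```shell
# jq's `add` on an empty array yields null, which breaks later arithmetic;
# the `// 0` alternative operator substitutes 0 instead.
UNSAFE=$(printf '[]' | jq 'add')
SAFE=$(printf '[]' | jq 'add // 0')
echo "unsafe=$UNSAFE safe=$SAFE"   # unsafe=null safe=0
```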

Server Startup Race (benchmark.yml:254)

Uses exit 0 inside loop which exits the entire step. Should use break instead.
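The distinction can be sketched as below; `probe` is a stand-in for the real curl health check, stubbed to succeed on its second call.

```shell
# `exit 0` inside the loop would terminate the whole step, skipping
# everything after the loop; `break` only leaves the loop.
TRIES=0
probe() { TRIES=$((TRIES + 1)); [ "$TRIES" -ge 2 ]; }

for ((i = 0; i < 5; i++)); do
  if probe; then
    break   # with `exit 0` here, the echo below would never run
  fi
done
echo "server up after $TRIES tries; continuing with the step"
```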

npm vs yarn (benchmark.yml:200)

Uses npm install --global yalc but CLAUDE.md requires yarn-only commands.

Code Quality Strengths

  1. Excellent tool selection - three benchmarking tools for cross-validation
  2. Well-structured workflow with clear sections
  3. Good error handling with set -euo pipefail
  4. Comprehensive parameterization via workflow inputs
  5. Smart binary caching for faster CI
  6. Security-conscious SSH access gating

Suggestions

  1. Shell portability: Replace bc with awk for float comparisons
  2. Hardcoded route: Make server_side_hello_world_hooks configurable
  3. Duration: 30s may be too short, consider 60s for stable measurements
  4. Add documentation for local benchmark execution
  5. Consider performance regression detection vs master
  6. Add spec/performance/README.md with interpretation guide
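For suggestion 1, a minimal sketch of an awk-based float comparison (the helper name `float_ge` is made up, not from the PR):

```shell
# awk ships on any POSIX system, so float comparisons need not depend on
# bc being installed in the runner image.
float_ge() {   # float_ge A B -> success (exit 0) if A >= B
  awk -v a="$1" -v b="$2" 'BEGIN { exit !(a >= b) }'
}

float_ge "99.5" "99.0" && echo "99.5 >= 99.0"
float_ge "0.1" "0.25" || echo "0.1 < 0.25"
```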

Test Coverage

Good: Infrastructure in place, runs on PR/push, results preserved as artifacts
Missing: No bash script tests, no performance thresholds, no local testing docs

Verdict

Solid PR with high code quality. Performance testing infrastructure will be valuable.

Required Before Merge:

  1. Fix trailing newlines (will cause CI failure)
  2. Run bundle exec rubocop and fix violations
  3. Fix K6 empty array bug

Recommended:

  1. Add basic documentation
  2. Verify workflow passes on CI
  3. Consider CHANGELOG entry

Great work overall!

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 2 times, most recently from e350a34 to 7b9d621 Compare November 6, 2025 10:25
@claude

claude bot commented Nov 6, 2025

PR Review: Add Performance Tests

Thank you for this comprehensive performance testing implementation! This is a well-structured addition to the repository. Here's my detailed feedback:


Strengths

  1. Comprehensive Tool Coverage: Great choice using three different load testing tools (Fortio, Vegeta, k6) to get diverse perspectives on performance
  2. Excellent Error Handling: The bash script has robust parameter validation and error handling throughout
  3. Smart Caching: GitHub Actions workflow efficiently caches binaries to speed up future runs
  4. Good Documentation: Clear comments, parameter descriptions, and helpful echo statements
  5. Configurable Parameters: Extensive workflow_dispatch inputs allow flexible testing scenarios
  6. Production-Ready Setup: Proper Puma configuration with workers and preloading for production benchmarks

🐛 Potential Issues

1. Script Portability - bc Dependency (bench.sh:21,33)

The script uses bc for floating-point comparisons but bc is not installed in the workflow.

Fix: Add bc to the apt package installation step in the workflow, or use bash-native arithmetic for integer comparisons.

2. Race Condition in Server Startup (benchmark.yml:248)

The server process is backgrounded but there is no guarantee it has started before the polling loop begins. If the server fails to start immediately, the loop might miss error messages.

Suggestion: Add a brief sleep 2 before the polling loop, or capture the PID and check if the process is still running during the timeout loop.
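The PID variant of that suggestion could look like the sketch below, with `sleep 60` standing in for the real bin/prod server command:

```shell
# Keep the background PID and use `kill -0` (signal 0 = liveness probe)
# to detect a crashed server instead of polling curl against nothing.
sleep 60 & SERVER_PID=$!

server_alive() { kill -0 "$SERVER_PID" 2>/dev/null; }

server_alive && echo "server process running"   # would guard each curl poll
kill "$SERVER_PID" 2>/dev/null
wait "$SERVER_PID" 2>/dev/null || true
server_alive || echo "server process gone"
```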

3. k6 Status Reporting Edge Case (bench.sh:199-204)

If .root_group.checks is empty, add returns null, causing K6_REQS_OTHER calculation to fail.

Fix: Add null handling by changing the jq expression to use | add // 0

4. Missing Newline at End of Files

Per CLAUDE.md requirements: ALWAYS ensure files end with a newline character

Files needing newlines:

  • spec/dummy/bin/prod (line 4)
  • spec/dummy/bin/prod-assets (line 9)
  • spec/performance/bench.sh (line 214)

Fix: Run rake autofix or add trailing newlines manually.

5. Benchmark Workflow Runs on Every PR (benchmark.yml:57-58)

Running 30-second benchmarks with 3 tools on every PR will consume significant CI minutes and slow down PR feedback.

Suggestion:

  • Remove pull_request trigger and only keep workflow_dispatch + push to master
  • Or add a condition to only run on PRs with a specific label (e.g., performance)
  • Or significantly reduce default duration (e.g., 10s) for PR runs

🔍 Code Quality & Best Practices

Security

  • ✅ SSH access is properly gated behind workflow_dispatch input with clear warnings
  • ✅ limit-access-to-actor: true prevents unauthorized access
  • ⚠️ Consider adding a comment warning maintainers never to merge code while SSH session is active

Performance

  • ✅ Binary caching will significantly speed up repeated runs
  • ✅ Server warm-up phase (10 requests) is good practice
  • ⚠️ REQUEST_TIMEOUT=60s is very generous - consider 30s default to catch timeout issues faster

Maintainability

  • ✅ Clear section comments in YAML
  • ✅ Parameterized configuration
  • ⚠️ Consider extracting the jq parsing logic (lines 167-209) into a separate function for readability

📊 Test Coverage

Missing:

  • ❌ No automated tests for the bench.sh script itself
  • ❌ No validation that the summary.txt format is correct
  • ❌ No performance regression detection (just data collection)

Suggestions:

  1. Add a unit test for bench.sh parameter validation
  2. Consider storing benchmark results over time and comparing against baselines
  3. Add a step to post benchmark results as a PR comment for visibility

🔧 Configuration Issues

Puma Configuration (spec/dummy/config/puma.rb:38)

Workflow sets default to 4 workers (WEB_CONCURRENCY: 4), but Puma defaults to 2 if not set. The workflow default should match.

Consistency check needed: Verify that WEB_CONCURRENCY=4 is intentional for CI runners with 2 CPU cores.


📝 Documentation & Changelog

Per CLAUDE.md: Update CHANGELOG.md for user-visible changes only (features, bug fixes, breaking changes, deprecations, performance improvements)

This PR qualifies as it adds a performance testing feature. Needs:

  • ✅ Changelog entry format: PR 1868 by alexeyr-ci2: Add performance benchmarking workflow with Fortio, Vegeta, and k6.

Missing:

  • ❌ No README/docs update explaining how to run benchmarks locally
  • ❌ No documentation on interpreting benchmark results

🎯 Recommendations

Before Merge (Critical):

  1. ✅ Add trailing newlines to all new files
  2. ✅ Run bundle exec rubocop and fix any violations
  3. ✅ Add CHANGELOG entry
  4. ✅ Fix k6 null handling edge case
  5. ✅ Reconsider pull_request trigger or reduce duration

Nice to Have:

  6. Add bc to dependencies or remove floating-point support
  7. Add brief README section on running benchmarks
  8. Handle server startup race condition
  9. Consider performance regression detection


🎓 Overall Assessment

This is a solid, production-ready implementation of performance testing. The code quality is high, with excellent error handling and configurability. The main concerns are:

  1. CI cost - running on every PR may be expensive
  2. Missing documentation - users won't know how to use this
  3. File formatting - needs trailing newlines per repo standards

Recommendation: Request changes for the critical items (newlines, changelog, PR trigger), then approve.


Score: 8/10 - Great implementation, minor issues to address.

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch from 7b9d621 to f2b9ef8 Compare November 7, 2025 13:13
@claude

claude bot commented Nov 7, 2025

Pull Request Review: Add Performance Tests

Thank you for this comprehensive performance testing implementation! This is a valuable addition to the project. Below is my detailed review:


Strengths

  1. Multi-tool approach: Using Fortio, Vegeta, and K6 provides cross-validation of results and flexibility for different testing scenarios.

  2. Well-structured workflow: The GitHub Actions workflow is logically organized with clear steps and helpful comments.

  3. Configurable parameters: Extensive configurability via workflow inputs allows fine-tuning for different performance scenarios.

  4. Good error handling: The bench.sh script includes comprehensive input validation and error messages.

  5. Caching: Binary caching for tools reduces workflow execution time.

  6. Production mode testing: Tests against production-like configuration (Puma workers, precompiled assets) for realistic results.


🔍 Issues and Concerns

Critical Issues

  1. Missing file newlines (spec/dummy/bin/prod, spec/dummy/bin/prod-assets)

    • Location: spec/dummy/bin/prod:4, spec/dummy/bin/prod-assets:9
    • Impact: CI will fail per CLAUDE.md requirements
    • Fix: Ensure both files end with a newline character
    • Reference: CLAUDE.md states "ALWAYS ensure files end with a newline character"
  2. RuboCop not run

    • Impact: MANDATORY requirement before commits
    • Required action: Run bundle exec rubocop and fix ALL violations
    • Reference: CLAUDE.md: "BEFORE EVERY COMMIT/PUSH: ALWAYS run bundle exec rubocop and fix ALL violations"

High Priority Issues

  1. Workflow runs on every PR/push to master (benchmark.yml:55-58)

    • Impact: Performance tests are resource-intensive and will slow down CI significantly
    • Recommendation: Only run on manual workflow_dispatch, specific labels (e.g., run-benchmarks), or scheduled runs (e.g., nightly)
  2. Server process not properly cleaned up (benchmark.yml:248)

    • Issue: Server started in background with bin/prod & but never explicitly killed
    • Impact: May leave orphaned processes, especially if workflow fails
    • Fix: Add cleanup step or use a trap/signal handler
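The trap approach could be sketched as follows (again with `sleep 60` standing in for bin/prod; the `cleanup` function name is illustrative):

```shell
# An EXIT trap kills the backgrounded server whether the step succeeds
# or fails, so no orphaned process survives a failed workflow run.
sleep 60 & SERVER_PID=$!
cleanup() {
  kill "$SERVER_PID" 2>/dev/null || true
  wait "$SERVER_PID" 2>/dev/null || true
}
trap cleanup EXIT

echo "running benchmarks against PID $SERVER_PID"
# cleanup fires automatically on exit, even if a benchmark command failed
```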

Medium Priority Issues

  1. Hardcoded route in bench.sh (line 6)

    • Issue: the default route, ${ROUTE:-server_side_hello_world_hooks}, may not exist or may not be representative
    • Recommendation: Verify this route exists in the dummy app, consider testing multiple routes, and document what this route tests
  2. No baseline comparison

    • Issue: Results are collected but not compared against previous runs
    • Enhancement: Consider using benchmark-action to track performance over time
  3. Missing documentation

    • Issue: No README or documentation for how to run benchmarks locally, interpret results, or understand what "good" performance looks like
    • Recommendation: Add spec/performance/README.md

🔒 Security Considerations

  1. SSH access via tmate (benchmark.yml:108-113)
    • Status: Properly gated behind workflow input (default: false)
    • Good: Includes security warnings and limit-access-to-actor

🧪 Test Coverage

Missing:

  • No unit tests for the bash script logic
  • No tests verifying the benchmark workflow succeeds
  • No validation that summary output format is correct

Recommendations:

  • Add a dry-run mode to bench.sh for testing
  • Consider shellcheck for bash script validation

📝 Suggested Changes

Must Fix (Blocks Merge)

  • Add trailing newlines to spec/dummy/bin/prod and spec/dummy/bin/prod-assets
  • Run bundle exec rubocop and fix all violations
  • Change workflow triggers to avoid running on every PR/push

Should Fix (High Priority)

  • Add server cleanup/shutdown in workflow
  • Verify the benchmark route exists and is representative
  • Add basic documentation (spec/performance/README.md)

Nice to Have

  • Add baseline comparison/tracking
  • Add shellcheck to CI for bash scripts
  • Consider shorter default duration for PR checks
  • Extract K6 test to separate file

Changelog

Per CLAUDE.md, this should be added to CHANGELOG.md as it's a new feature.


🎯 Overall Assessment

This is a solid foundation for performance testing with good multi-tool coverage and configurability. The main blockers are CLAUDE.md compliance issues (trailing newlines, RuboCop). Once those are fixed and the workflow triggers are adjusted to avoid running on every PR, this will be ready to merge.

Estimated effort to address critical issues: ~30 minutes

Great work overall! 🚀

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch from c58f480 to 43f8858 Compare November 7, 2025 13:36
@claude

claude bot commented Nov 7, 2025

Code Review: Add Performance Tests

Thank you for this comprehensive performance testing addition! This is a well-structured PR that addresses issue #1777.


✅ Strengths

1. Excellent Tool Coverage

  • Multiple benchmarking tools (Fortio, Vegeta, k6) provide good cross-validation
  • Smart caching strategy for tool binaries reduces CI time
  • Flexible configuration via environment variables

2. Robust Error Handling

  • Good use of set -euo pipefail in bash scripts
  • Comprehensive input validation with clear error messages
  • Server health checks with timeout protection

3. Production-Ready Configuration

  • Proper Puma clustering setup for production benchmarks
  • Preloading and worker configuration aligned with best practices
  • Appropriate separation of prod/dev configurations

🔒 Security Concerns - CRITICAL

SSH Access Feature (lines 96-119 in .github/workflows/benchmark.yml)

This poses significant security risks:

Issues:

  1. Detached mode risk: Using detached: true leaves the SSH session open indefinitely
  2. Repository exposure: The entire repository is accessible via SSH
  3. No audit trail: Limited visibility into commands executed during SSH sessions
  4. Secrets exposure risk: If secrets are added later, they could be exposed

Recommendations:

  • Consider removing this feature entirely for production workflows
  • If kept: Remove detached: true, add timeout-minutes: 15, and restrict to non-master branches
  • Document that NO secrets should ever be added to this workflow

🐛 Potential Bugs

1. Division by Zero Risk (spec/performance/bench.sh:195-197)

If K6_REQS_KNOWN_STATUS is null/empty, this could fail. Add validation.
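One way to add that validation, sketched with made-up sample values (the variable names mirror this comment, not necessarily the script):

```shell
# Default a null/empty parsed count to 0 and skip the division unless the
# denominator is a positive integer, avoiding a divide-by-zero failure.
K6_REQS_OK=0
K6_REQS_KNOWN_STATUS=""   # e.g. jq returned an empty result

TOTAL="${K6_REQS_KNOWN_STATUS:-0}"
if [ "$TOTAL" -gt 0 ] 2>/dev/null; then
  PCT=$(awk -v ok="$K6_REQS_OK" -v t="$TOTAL" 'BEGIN { printf "%.1f", 100 * ok / t }')
else
  PCT="n/a"
fi
echo "success rate: $PCT"   # success rate: n/a
```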

2. Race Condition in Server Startup (spec/dummy/bin/prod)

The rails s command starts but there is no guarantee assets are fully loaded. Consider adding a readiness check that validates asset serving.

3. Missing Error Context

When benchmarks fail, the error message does not indicate which tool failed. Consider wrapping each tool execution with proper error context.

4. Hardcoded Paths (spec/performance/bench.sh:87)

If run from wrong directory, this could create directories in unexpected locations. Consider using SCRIPT_DIR to make paths relative to the script location.
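The SCRIPT_DIR pattern mentioned above is a one-liner; `results` here is a hypothetical output directory name, not the one the PR uses:

```shell
# Resolve output paths relative to the script's own location so the
# benchmark behaves the same from any working directory.
SCRIPT_DIR="$(cd -- "$(dirname -- "${BASH_SOURCE[0]:-$0}")" && pwd)"
RESULTS_DIR="$SCRIPT_DIR/results"
mkdir -p "$RESULTS_DIR"
echo "results will be written under $RESULTS_DIR"
```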


⚡ Performance Considerations

  1. Workflow runs on every push/PR: For a 20-minute benchmark suite, consider only running on workflow_dispatch and push to master, or add path filters to skip documentation changes.

  2. Sequential warmup (spec/performance/bench.sh:73-78): Consider parallel warmup for efficiency.

  3. Triple tool execution: Running all 3 tools provides validation but triples execution time. Consider defaulting to k6 only.


🧪 Test Coverage - Missing

  1. Unit tests for benchmark script: Input validation and JSON parsing logic should be tested
  2. Workflow validation: Consider adding actionlint to CI
  3. Integration test: Should verify benchmark artifacts are generated correctly

📝 Documentation - REQUIRED

Missing per CLAUDE.md requirements:

  1. CHANGELOG.md update - This is a user-visible feature that developers will use. Action needed: Update CHANGELOG.md

  2. Inline documentation: spec/performance/bench.sh needs header comment explaining:

    • Purpose
    • Usage examples
    • Required dependencies
    • Expected output format
  3. Workflow documentation: Add comment block in .github/workflows/benchmark.yml explaining when to run benchmarks and how to interpret results


🎨 Code Quality Suggestions

Good Practices:

  • Consistent error handling
  • Input validation with clear messages
  • Proper use of environment variables with defaults
  • JSON output for machine-readable results

Suggestions:

  1. YAML Anchors: Lines 7-14 define anchors but only use them once - consider simplifying
  2. DRY Principle: Lines 165-208 have very similar blocks for each tool - consider refactoring
  3. Magic Numbers: Several hardcoded values should be configurable (TIMEOUT_SEC=60, warmup count, sleep durations)
  4. Puma Configuration: Default of 2 workers may be too low - document why this was chosen

📋 Action Items Summary

🔴 High Priority (Before Merge):

  1. Address SSH security concerns - Remove or significantly restrict
  2. Add CHANGELOG.md entry (required per project guidelines)
  3. Fix potential null reference in K6 status parsing
  4. Add workflow path filters to prevent unnecessary runs

🟡 Medium Priority (Before Merge):

  1. Add header documentation to bench.sh
  2. Add error context to benchmark failures
  3. Fix hardcoded paths in bash script

🟢 Low Priority (Future Enhancement):

  1. Add unit tests for benchmark script
  2. Refactor duplicate code in parsing logic
  3. Make magic numbers configurable

✨ Overall Assessment

This is a solid foundation for performance testing! The multi-tool approach and comprehensive workflow setup show good engineering practices.

Main concerns:

  1. Security: SSH feature needs significant changes or removal
  2. Documentation: Missing CHANGELOG and inline docs (required per CLAUDE.md)
  3. Robustness: A few edge cases in error handling

Recommendation: Request changes for security and documentation issues, then approve after those are addressed.

Great work on this comprehensive testing infrastructure!

@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 7 times, most recently from 49b2575 to be4af9e Compare November 7, 2025 20:16
@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch 5 times, most recently from 0acf3d2 to 92fcf7d Compare November 8, 2025 10:22
@alexeyr-ci2 alexeyr-ci2 force-pushed the alexeyr/performance-tests branch from 92fcf7d to 0181753 Compare November 8, 2025 10:42