# Go Benchmark

This section describes how to execute Go benchmarks with the `go test` command and how to analyze the results with `benchstat`.

## 1. Running Benchmarks: Core Flags

To run benchmarks, use the `go test` command with specific flags. The most essential flag is `-bench`, which accepts a regular expression selecting which benchmark functions to execute.

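Benchmark functions live in `*_test.go` files, are named `BenchmarkXxx`, and take a `*testing.B`. A minimal sketch of one (the package name, benchmark name, and payload are illustrative, not taken from this repository):

```go
package mypkg_test

import (
	"encoding/json"
	"testing"
)

// BenchmarkJSONEncode measures the cost of marshaling a small struct.
// The loop body runs b.N times; the framework chooses b.N so the
// benchmark runs for roughly the -benchtime duration.
func BenchmarkJSONEncode(b *testing.B) {
	payload := struct {
		Name string
		Age  int
	}{Name: "gopher", Age: 13}

	for i := 0; i < b.N; i++ {
		if _, err := json.Marshal(payload); err != nil {
			b.Fatal(err)
		}
	}
}
```

With that in place, `go test -bench=JSONEncode -run=^$ .` runs only this benchmark.
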
### Execution Control Flags

| Flag | Description | Example |
| :--- | :--- | :--- |
| `-bench=<regexp>` | Run benchmarks matching the regex. Use `.` to run all. | `-bench=.` |
| `-run=<regexp>` | Run unit tests matching the regex. Use `^$` to skip all unit tests and only run benchmarks. | `-run=^$` |
| `-benchtime=<t>` | Duration to run each benchmark (default `1s`). Can also specify a fixed iteration count (suffix `x`). | `-benchtime=5s` or `-benchtime=1000x` |
| `-count=<n>` | Run each benchmark `n` times. Essential for statistical analysis with `benchstat`. | `-count=10` |
| `-timeout=<t>` | Override the default 10m test timeout. Necessary for long-running benchmark suites. | `-timeout=30m` |
| `-failfast` | Stop execution immediately after the first failure. | `-failfast` |

### Resource & Profiling Flags

These flags are critical for performance tuning and memory allocation analysis.

| Flag | Description | Example |
| :--- | :--- | :--- |
| `-benchmem` | Print memory allocation statistics (allocations per op, bytes per op). | `-benchmem` |
| `-cpu=<n,m...>` | Run benchmarks with specific `GOMAXPROCS` values. | `-cpu=1,2,4,8` |
| `-cpuprofile=<file>` | Write a CPU profile to the specified file. | `-cpuprofile=cpu.out` |
| `-memprofile=<file>` | Write a memory profile to the specified file. | `-memprofile=mem.out` |
| `-blockprofile=<file>` | Write a goroutine blocking profile (contention analysis). | `-blockprofile=block.out` |
| `-mutexprofile=<file>` | Write a mutex contention profile. | `-mutexprofile=mutex.out` |
| `-trace=<file>` | Write an execution trace for the `go tool trace` viewer. | `-trace=trace.out` |

> **Note:** When using profiling flags like `-cpuprofile`, the test binary is preserved in the current directory so the profile can be analyzed with `go tool pprof`.

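When chasing allocations, keep fixture setup out of the measured region. A minimal sketch (benchmark name and workload are illustrative) using `b.ReportAllocs` and `b.ResetTimer` from the standard `testing` package:

```go
package mypkg_test

import (
	"strings"
	"testing"
)

// BenchmarkJoin reports allocation statistics even when -benchmem is
// not passed, and excludes fixture setup from the measured time.
func BenchmarkJoin(b *testing.B) {
	words := make([]string, 1000)
	for i := range words {
		words[i] = "word" // fixture setup, not part of the measurement
	}

	b.ReportAllocs() // emit allocs/op and B/op for this benchmark
	b.ResetTimer()   // discard the time spent building the fixture

	for i := 0; i < b.N; i++ {
		_ = strings.Join(words, ",")
	}
}
```
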
## 2. The Analysis Workflow (A/B Testing)

Reliable performance optimization requires measuring the "before" and "after" states. The standard workflow is to save benchmark output to files and compare them.

### Step 1: Install Benchstat

`benchstat` is the standard tool for computing statistical summaries of Go benchmark results and comparing them.

```bash
go install golang.org/x/perf/cmd/benchstat@latest
```

or, via the project's Makefile target:

```bash
make install-tools
```

### Step 2: Capture the Baseline (Old)

Run the benchmarks against the current code 10 times to gather enough samples for a statistical comparison.

```bash
# Save current performance to old.txt
go test -bench=. -benchmem -count=10 -run=^$ . > old.txt
```

### Step 3: Capture the Experiment (New)

Apply your code changes (for example, an allocation optimization) and run the same benchmark command.

```bash
# Save optimized performance to new.txt
go test -bench=. -benchmem -count=10 -run=^$ . > new.txt
```

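As an illustration of the kind of change being measured, a hypothetical optimization might preallocate a slice instead of growing it (both functions below are made up for the example):

```go
package mypkg

// squaresGrow appends into a nil slice, so the backing array is
// reallocated several times as the slice grows.
func squaresGrow(n int) []int {
	var out []int
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}

// squaresPrealloc sizes the slice up front, so the loop performs a
// single allocation.
func squaresPrealloc(n int) []int {
	out := make([]int, 0, n)
	for i := 0; i < n; i++ {
		out = append(out, i*i)
	}
	return out
}
```
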
## 3. Analyzing Results with Benchstat

Run `benchstat` against the two captured files to see the delta.

```bash
benchstat old.txt new.txt
```

### Interpreting the Output

The output shows the mean value for each metric and the percentage change. (The exact layout depends on the installed `benchstat` version; older releases print the `time/op` tables shown below, while newer releases print `sec/op` columns with a geomean row.)

```text
name           old time/op    new time/op    delta
JSONEncode-8     1.50µs ± 2%    1.20µs ± 1%  -20.00%  (p=0.000 n=10+10)

name           old alloc/op   new alloc/op   delta
JSONEncode-8       896B ± 0%      420B ± 0%  -53.12%  (p=0.000 n=10+10)
```

* **delta:** The percentage change. Negative values indicate improvement (reduced time or allocations).
* **p-value:** The probability of seeing a difference this large if there were no real change (i.e., if it were random noise). A value `< 0.05` is generally considered statistically significant.
* **n=10+10:** Indicates 10 valid samples were used from each of the old and new files.

### Advanced Grouping

If your benchmarks use configuration naming conventions (e.g., `Benchmark/Enc=json` vs `Benchmark/Enc=gob`), you can tell `benchstat` to use a configuration key as the comparison column, as in the sketch and command below.

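Sub-benchmark names like `Enc=json` are produced with `b.Run`. A minimal sketch (the encoder set and payload are illustrative):

```go
package mypkg_test

import (
	"encoding/gob"
	"encoding/json"
	"io"
	"testing"
)

// BenchmarkEncode yields results named BenchmarkEncode/Enc=json and
// BenchmarkEncode/Enc=gob, which benchstat can group with -col /Enc.
func BenchmarkEncode(b *testing.B) {
	payload := map[string]int{"a": 1, "b": 2}

	b.Run("Enc=json", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if err := json.NewEncoder(io.Discard).Encode(payload); err != nil {
				b.Fatal(err)
			}
		}
	})

	b.Run("Enc=gob", func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			if err := gob.NewEncoder(io.Discard).Encode(payload); err != nil {
				b.Fatal(err)
			}
		}
	})
}
```
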
```bash
# Compare results, grouping by the Enc configuration key
benchstat -col /Enc old.txt
```

## 4. Best Practices for Accurate Results

* **Isolation:** Close high-CPU applications (browsers, IDE indexing) before running benchmarks to reduce noise.
* **Count > 1:** Always use `-count` (ideally 5-10) to detect variance. Single runs are unreliable for optimization decisions.
* **Run Filter:** Always use `-run=^$` to prevent unit tests from interfering with benchmark timing or output parsing.
* **Stable Machine:** For mission-critical measurements, consider a dedicated bare-metal machine or a cloud instance with pinned CPUs to avoid "noisy neighbor" effects.