feat(serverv2): add benchmarks of (old) cacheKV vs branch #22497

Open · wants to merge 3 commits into main
Conversation

testinginprod (Contributor) commented Nov 8, 2024

Description

Summary of Benchmark Performance Comparison


Bench Test Type: _CacheStack_Set

This sets the same key in state. Note: Stack size should have no impact here since the top branch is the one where the cache is being set.

  • StackSize1-14

    • Previous Performance (storev1.txt): 48.97 ns/op
    • Current Performance (old.txt): 32.93 ns/op
    • Improvement: Current code is 1.49× faster
  • StackSize10-14

    • Previous: 55.87 ns/op
    • Current: 46.38 ns/op
    • Improvement: Current code is 1.20× faster
  • StackSize100-14

    • Previous: 56.58 ns/op
    • Current: 46.38 ns/op
    • Improvement: Current code is 1.22× faster

Bench Test Type: _GetCached

Gets the same key.

NOTE: CacheKV caches the key on the first get, so fetching the same key is expensive only the first time and inexpensive on subsequent gets. This works well only when the same key is fetched repeatedly; GetSparse shows the opposite view.

  • StackSize1-14

    • Previous: 12.56 ns/op
    • Current: 25.90 ns/op
    • Performance Decline: Current code is 2.06× slower
  • StackSize10-14

    • Previous: 17.28 ns/op
    • Current: 331.1 ns/op
    • Performance Decline: Current code is 19.16× slower
  • StackSize100-14

    • Previous: 16.85 ns/op
    • Current: 5,398 ns/op
    • Performance Decline: Current code is 320× slower

Bench Test Type: _GetSparse

Always fetches a previously unseen key from storage.

  • StackSize1-14

    • Previous: 261.8 ns/op
    • Current: 22.25 ns/op
    • Improvement: Current code is 11.77× faster
  • StackSize10-14

    • Previous: 1,795 ns/op
    • Current: 328.1 ns/op
    • Improvement: Current code is 5.47× faster
  • StackSize100-14

    • Previous: 18,770 ns/op
    • Current: 5,292 ns/op
    • Improvement: Current code is 3.54× faster

Bench Test Type: _Iterate

  • StackSize1-14

    • Previous: 457.3 ns/op
    • Current: 290.4 ns/op
    • Improvement: Current code is 1.58× faster
  • StackSize10-14

    • Previous: 34,090 ns/op
    • Current: 13,670 ns/op
    • Improvement: Current code is 2.49× faster
  • StackSize100-14

    • Previous: 47,370,000 ns/op
    • Current: 1,441,000 ns/op
    • Improvement: Current code is 32.86× faster

Overall Performance

  • Geometric Mean Time
    • Previous: 626.9 ns/op
    • Current: 547.4 ns/op
    • Overall Improvement: Current code is approximately 1.15× faster

Memory Usage

Bytes per Operation (B/op):

  • _CacheStack_Set

    • Previous: 34 B/op
    • Current: 2 B/op
    • Improvement: Current code uses 17× less memory
  • _GetCached

    • Both Versions: 1 B/op
    • Note: No significant change
  • _GetSparse

    • Previous: Ranged from 115 B/op to 13.23 KiB/op
    • Current: 0 B/op
    • Improvement: Current code eliminates memory allocations in this test
  • _Iterate

    • Previous vs. Current: Slight reductions in memory usage with the current code

Allocations per Operation (allocs/op):

  • _CacheStack_Set

    • Previous: 3 allocs/op
    • Current: 2 allocs/op
    • Improvement: Fewer allocations with current code
  • _GetSparse

    • Previous: Up to 103 allocs/op
    • Current: 0 allocs/op
    • Improvement: Current code eliminates allocations
  • _Iterate

    • Slight Reduction: Current code has fewer allocations per operation

Conclusion:

  • Performance Gains: The current code shows significant performance improvements in the _CacheStack_Set, _GetSparse, and _Iterate benchmarks, especially with larger stack sizes.
  • Performance Losses: There is a notable performance decline in the _GetCached benchmarks, where the current code is slower than the previous version.
  • Memory Efficiency: The current code generally uses less memory and has fewer allocations per operation, which may contribute to performance improvements in certain benchmarks.

This summary provides a detailed comparison of the benchmark tests, highlighting where the current code has improved or declined in performance relative to the previous version.


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title; examples of the prefixes can be found below
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

Please see Pull Request Reviewer section in the contributing guide for more information on how to review a pull request.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

Summary by CodeRabbit

  • New Features

    • Introduced a suite of benchmark tests for the cachekv package, measuring performance for setting and retrieving values, as well as iteration over key-value pairs.
    • Added a new function to convert integers to byte slices for enhanced key generation.
  • Bug Fixes

    • Improved clarity in comments for the Get method within the Store struct to better reflect the logic flow.

These enhancements aim to improve performance evaluation and code clarity for users interacting with the caching system.

coderabbitai bot (Contributor) commented Nov 8, 2024

📝 Walkthrough

Walkthrough

The changes introduce enhancements to benchmarking tests in the cachekv package and modifications to the Get method in the Store struct. New benchmark functions are added to evaluate performance, ensuring results are not optimized away by the compiler. Additionally, a new utility function for converting integers to byte slices is introduced, alongside improvements in key generation logic. Comments in the Get method are clarified without altering its functionality.

Changes

File: server/v2/stf/branch/bench_test.go
  • Added sink variable in benchmark functions to prevent compiler optimizations.
  • Modified Benchmark_CacheStack_Set, Benchmark_Get, and added Benchmark_GetSparse.
  • Introduced numToBytes function for converting integers to byte slices.
  • Updated key generation logic in makeBranchStack.
File: server/v2/stf/branch/store.go
  • Updated comments in Get method for clarity; no changes to logic or functionality.
File: store/cachekv/branch_bench_test.go
  • New file with benchmark tests: Benchmark_CacheStack_Set, Benchmark_Get, Benchmark_GetSparse, Benchmark_Iterate.
  • Added helper functions: makeBranchStack and numToBytes.

Sequence Diagram(s)

sequenceDiagram
    participant B as Benchmark
    participant S as Store
    participant C as CacheStack

    B->>C: Benchmark_CacheStack_Set()
    C->>S: Set(key, value)
    S-->>C: Result (stored in sink)
    B->>C: Benchmark_Get()
    C->>S: Get(key)
    S-->>C: Result (stored in sink)
    B->>C: Benchmark_GetSparse()
    C->>S: Get(unique_key)
    S-->>C: Result (stored in sink)
    B->>C: Benchmark_Iterate()
    C->>S: Iterate()
    S-->>C: Key-Value pairs
Loading



github-actions bot (Contributor) commented Nov 8, 2024

@testinginprod your pull request is missing a changelog!

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (8)
server/v2/stf/branch/bench_test.go (3)

35-35: Consider adding more detailed performance metrics.

Given the significant performance regression (320x slower) mentioned in the PR objectives, consider adding:

  1. Memory statistics using b.ReportAllocs()
  2. Sub-benchmarks for different key patterns
  3. CPU profiling for detailed analysis

Example enhancement:

 func Benchmark_Get(b *testing.B) {
 	var sink any
 	for _, stackSize := range stackSizes {
 		b.Run(fmt.Sprintf("StackSize%d", stackSize), func(b *testing.B) {
 			bs := makeBranchStack(b, stackSize)
+			// Add CPU profiling
+			if stackSize == 100 { // Profile only larger sizes
+				f, _ := os.Create("cpu.prof")
+				pprof.StartCPUProfile(f)
+				defer pprof.StopCPUProfile()
+			}
 			b.ResetTimer()
 			b.ReportAllocs()
 			for i := 0; i < b.N; i++ {
 				sink, _ = bs.Get([]byte{0})
 			}
 		})
 	}
 	if sink == nil {
 		b.Fatal("prevent compiler optimization")
 	}
 }

Also applies to: 42-48


51-72: LGTM! Consider adding parallel benchmark variant.

The benchmark is well-structured with proper key generation outside the timed section. Given the significant performance improvement (11.77x faster), it would be valuable to also test parallel access patterns.

Add a parallel variant:

+func Benchmark_GetSparse_Parallel(b *testing.B) {
+	for _, stackSize := range stackSizes {
+		b.Run(fmt.Sprintf("StackSize%d", stackSize), func(b *testing.B) {
+			bs := makeBranchStack(b, stackSize)
+			keys := make([][]byte, b.N)
+			for i := 0; i < b.N; i++ {
+				keys[i] = numToBytes(i)
+			}
+			b.ResetTimer()
+			b.ReportAllocs()
+			b.RunParallel(func(pb *testing.PB) {
+				var sink any
+				i := 0
+				for pb.Next() {
+					sink, _ = bs.Get(keys[i%len(keys)])
+					i++
+				}
+				if sink == nil {
+					b.Fatal("prevent compiler optimization")
+				}
+			})
+		})
+	}
+}

118-120: LGTM! Consider adding documentation.

The numToBytes function is well-implemented using generics and efficient binary encoding.

Add documentation to clarify the encoding format:

+// numToBytes converts an integer to its big-endian binary representation.
+// The function always returns an 8-byte slice regardless of the input value.
 func numToBytes[T ~int](n T) []byte {
 	return binary.BigEndian.AppendUint64(nil, uint64(n))
 }
store/cachekv/branch_bench_test.go (5)

13-17: Consider using constants and adding documentation.

The global variables would benefit from being declared as constants since they're not modified. Additionally, adding documentation would improve clarity about their purpose in the benchmarks.

-var (
+const (
 	stackSizes   = []int{1, 10, 100}
 	elemsInStack = 10
 )
+
+// stackSizes defines the different stack depths to benchmark
+// elemsInStack defines the number of elements to populate in each stack layer

18-29: Consider enhancing benchmark descriptions.

While the benchmark implementation is solid, the sub-benchmark descriptions could be more descriptive to better indicate what's being tested.

-b.Run(fmt.Sprintf("StackSize%d", stackSize), func(b *testing.B) {
+b.Run(fmt.Sprintf("StackSize=%d/SingleKeySet", stackSize), func(b *testing.B) {

74-96: Add documentation for sink variables.

The iterator benchmark correctly uses sink variables to prevent compiler optimizations, but their purpose should be documented.

-var keySink, valueSink any
+// keySink and valueSink prevent the compiler from optimizing away the iterator operations
+var keySink, valueSink any

98-112: Remove unused parameter in makeBranchStack.

The testing.B parameter is not used in the function.

-func makeBranchStack(_ *testing.B, stackSize int) *cachekv.Store {
+func makeBranchStack(stackSize int) *cachekv.Store {

114-116: Add documentation for numToBytes function.

The generic function should have documentation explaining its purpose and encoding details.

+// numToBytes converts a number to its big-endian byte representation.
+// It uses a generic constraint to accept any integer type.
 func numToBytes[T ~int](n T) []byte {
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 78b5cd4 and 71245a1.

📒 Files selected for processing (3)
  • server/v2/stf/branch/bench_test.go (4 hunks)
  • server/v2/stf/branch/store.go (1 hunks)
  • store/cachekv/branch_bench_test.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
server/v2/stf/branch/bench_test.go (2)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.


Pattern **/*_test.go: "Assess the unit test code assessing sufficient code coverage for the changes associated in the pull request"

server/v2/stf/branch/store.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

store/cachekv/branch_bench_test.go (2)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.


Pattern **/*_test.go: "Assess the unit test code assessing sufficient code coverage for the changes associated in the pull request"

🔇 Additional comments (6)
server/v2/stf/branch/bench_test.go (2)

107-107: LGTM! Key generation ensures uniqueness across branches.

The key generation logic correctly combines branch and element indices using the efficient numToBytes function.


18-18: LGTM! Compiler optimization prevention is properly implemented.

The sink variable pattern is correctly used to prevent the compiler from optimizing away the benchmark operations.

Let's verify that the benchmark results are stable and not affected by compiler optimizations:

Also applies to: 25-25, 29-31

✅ Verification successful

Sink variable usage follows established patterns across the codebase

The sink variable declaration and usage in server/v2/stf/branch/bench_test.go follows the consistent pattern found across the codebase's benchmark tests. This is a well-established practice in the project, with similar implementations in other benchmark files like store/cachekv/store_bench_test.go and math/dec_test.go.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Run the benchmark multiple times to check result stability
go test -bench=Benchmark_CacheStack_Set -count=5 ./server/v2/stf/branch/... | tee results.txt
# Check if results are consistent (should have similar numbers)
awk '/Benchmark_CacheStack_Set/ { print $3 }' results.txt | sort -n | uniq -c



Script:

#!/bin/bash
# Let's verify the sink variable usage pattern in the benchmark tests
rg -U "var\s+sink\s+.*?\n.*?=.*?sink" --type go

# Check for other benchmark functions and their sink usage
ast-grep --pattern 'func Benchmark$_($_) {
  $$$
}'

# Look for any other sink variable declarations
rg "var\s+sink\s+" --type go


server/v2/stf/branch/store.go (2)

Line range hint 32-36: LGTM: Comment changes improve clarity

The updated comment accurately describes the control flow of the Get method.


Line range hint 32-36: Consider optimizing Get method performance

According to the PR objectives, there's a significant performance regression (320× slower) in _GetCached operations. Consider implementing the following optimizations:

  1. Add read-through caching to reduce parent store lookups
  2. Implement bulk operations for batch retrievals
  3. Add prefetching mechanisms for frequently accessed keys

Let's analyze the usage patterns to identify optimization opportunities:

store/cachekv/branch_bench_test.go (2)

1-11: LGTM! Package structure and imports are well-organized.

The package follows Go conventions with proper import grouping.


31-47: Verify benchmark coverage matches PR objectives.

The Get and GetSparse benchmarks align with the PR objectives showing the 320× slowdown and 11.77× speedup respectively. However, consider adding more granular test cases to investigate the significant performance regression in Get operations.

Consider adding benchmarks with different key sizes and value sizes to better understand the performance characteristics:

+var (
+	benchmarkKeySizes = []int{8, 64, 256}  // in bytes
+	benchmarkValSizes = []int{8, 1024, 4096}  // in bytes
+)

Also applies to: 49-72

odeke-em (Collaborator) left a comment
Just a drive-by code review: for proper benchmarks, you need to make the sink a global and also reset it to nil after every benchmark.

}
})
}
if sink != nil {
You want the opposite: if sink == nil.

})
}
if sink == nil {
b.Fatal("prevent compiler optimization")
Please use instead b.Fatal("Benchmark did not run")

}

func Benchmark_GetSparse(b *testing.B) {
var sink any
You need this sink to be a global to ensure write barriers, and please reset it to nil after usage.


// Gets the same key from the branch store.
func Benchmark_Get(b *testing.B) {
var sink any
Needs to be global and to be reset after use.


// Gets always different keys.
func Benchmark_GetSparse(b *testing.B) {
var sink any
Needs to be global and reset to nil after usage.

Labels
C:server/v2 stf · C:server/v2 · C:Store
4 participants