feat(serverv2): add benchmarks of (old) cacheKV vs branch #22497

Open · wants to merge 3 commits into main
Conversation

testinginprod (Contributor) commented Nov 8, 2024

Description

Summary of Benchmark Performance Comparison


Bench Test Type: _CacheStack_Set

This sets the same key in state. Note: Stack size should have no impact here since the top branch is the one where the cache is being set.

  • StackSize1-14

    • Previous Performance (storev1.txt): 48.97 ns/op
    • Current Performance (old.txt): 32.93 ns/op
    • Improvement: Current code is 1.49× faster
  • StackSize10-14

    • Previous: 55.87 ns/op
    • Current: 46.38 ns/op
    • Improvement: Current code is 1.20× faster
  • StackSize100-14

    • Previous: 56.58 ns/op
    • Current: 46.38 ns/op
    • Improvement: Current code is 1.22× faster

Bench Test Type: _GetCached

Gets the same key.

NOTE: CacheKV caches the key on the first get, so fetching the same key is expensive only the first time and inexpensive on subsequent gets. This works well only when the same key is fetched repeatedly; GetSparse shows the opposite view.

  • StackSize1-14

    • Previous: 12.56 ns/op
    • Current: 25.90 ns/op
    • Performance Decline: Current code is 2.06× slower
  • StackSize10-14

    • Previous: 17.28 ns/op
    • Current: 331.1 ns/op
    • Performance Decline: Current code is 19.16× slower
  • StackSize100-14

    • Previous: 16.85 ns/op
    • Current: 5,398 ns/op
    • Performance Decline: Current code is 320× slower

Bench Test Type: _GetSparse

Always fetches a previously unseen key from storage.

  • StackSize1-14

    • Previous: 261.8 ns/op
    • Current: 22.25 ns/op
    • Improvement: Current code is 11.77× faster
  • StackSize10-14

    • Previous: 1,795 ns/op
    • Current: 328.1 ns/op
    • Improvement: Current code is 5.47× faster
  • StackSize100-14

    • Previous: 18,770 ns/op
    • Current: 5,292 ns/op
    • Improvement: Current code is 3.54× faster

Bench Test Type: _Iterate

  • StackSize1-14

    • Previous: 457.3 ns/op
    • Current: 290.4 ns/op
    • Improvement: Current code is 1.58× faster
  • StackSize10-14

    • Previous: 34,090 ns/op
    • Current: 13,670 ns/op
    • Improvement: Current code is 2.49× faster
  • StackSize100-14

    • Previous: 47,370,000 ns/op
    • Current: 1,441,000 ns/op
    • Improvement: Current code is 32.86× faster

Overall Performance

  • Geometric Mean Time
    • Previous: 626.9 ns/op
    • Current: 547.4 ns/op
    • Overall Improvement: Current code is approximately 1.15× faster

Memory Usage

Bytes per Operation (B/op):

  • _CacheStack_Set

    • Previous: 34 B/op
    • Current: 2 B/op
    • Improvement: Current code uses 17× less memory
  • _GetCached

    • Both Versions: 1 B/op
    • Note: No significant change
  • _GetSparse

    • Previous: Ranged from 115 B/op to 13.23 KiB/op
    • Current: 0 B/op
    • Improvement: Current code eliminates memory allocations in this test
  • _Iterate

    • Previous vs. Current: Slight reductions in memory usage with the current code

Allocations per Operation (allocs/op):

  • _CacheStack_Set

    • Previous: 3 allocs/op
    • Current: 2 allocs/op
    • Improvement: Fewer allocations with current code
  • _GetSparse

    • Previous: Up to 103 allocs/op
    • Current: 0 allocs/op
    • Improvement: Current code eliminates allocations
  • _Iterate

    • Slight Reduction: Current code has fewer allocations per operation

Conclusion:

  • Performance Gains: The current code shows significant performance improvements in the _CacheStack_Set, _GetSparse, and _Iterate benchmarks, especially with larger stack sizes.
  • Performance Losses: There is a notable performance decline in the _GetCached benchmarks, where the current code is slower than the previous version.
  • Memory Efficiency: The current code generally uses less memory and has fewer allocations per operation, which may contribute to performance improvements in certain benchmarks.

This summary provides a detailed comparison of the benchmark tests, highlighting where the current code has improved or declined in performance relative to the previous version.


Author Checklist

All items are required. Please add a note to the item if the item is not applicable and
please add links to any relevant follow up issues.

I have...

  • included the correct type prefix in the PR title; examples of the prefixes can be found below
  • confirmed ! in the type prefix if API or client breaking change
  • targeted the correct branch (see PR Targeting)
  • provided a link to the relevant issue or specification
  • reviewed "Files changed" and left comments if necessary
  • included the necessary unit and integration tests
  • added a changelog entry to CHANGELOG.md
  • updated the relevant documentation or specification, including comments for documenting Go code
  • confirmed all CI checks have passed

Reviewers Checklist

All items are required. Please add a note if the item is not applicable and please add
your handle next to the items reviewed if you only reviewed selected items.

Please see Pull Request Reviewer section in the contributing guide for more information on how to review a pull request.

I have...

  • confirmed the correct type prefix in the PR title
  • confirmed all author checklist items have been addressed
  • reviewed state machine logic, API design and naming, documentation is accurate, tests and test coverage

Summary by CodeRabbit

  • New Features

    • Introduced a suite of benchmark tests for the cachekv package, measuring performance for setting and retrieving values, as well as iteration over key-value pairs.
    • Added a new function to convert integers to byte slices for enhanced key generation.
  • Bug Fixes

    • Improved clarity in comments for the Get method within the Store struct to better reflect the logic flow.

These enhancements aim to improve performance evaluation and code clarity for users interacting with the caching system.

coderabbitai bot (Contributor) commented Nov 8, 2024

📝 Walkthrough

Walkthrough

The changes introduce enhancements to benchmarking tests in the cachekv package and modifications to the Get method in the Store struct. New benchmark functions are added to evaluate performance, ensuring results are not optimized away by the compiler. Additionally, a new utility function for converting integers to byte slices is introduced, alongside improvements in key generation logic. Comments in the Get method are clarified without altering its functionality.

Changes

File: server/v2/stf/branch/bench_test.go
  • Added sink variable in benchmark functions to prevent compiler optimizations.
  • Modified Benchmark_CacheStack_Set, Benchmark_Get, and added Benchmark_GetSparse.
  • Introduced numToBytes function for converting integers to byte slices.
  • Updated key generation logic in makeBranchStack.
File: server/v2/stf/branch/store.go
  • Updated comments in Get method for clarity; no changes to logic or functionality.
File: store/cachekv/branch_bench_test.go
  • New file with benchmark tests: Benchmark_CacheStack_Set, Benchmark_Get, Benchmark_GetSparse, Benchmark_Iterate.
  • Added helper functions: makeBranchStack and numToBytes.

Sequence Diagram(s)

sequenceDiagram
    participant B as Benchmark
    participant S as Store
    participant C as CacheStack

    B->>C: Benchmark_CacheStack_Set()
    C->>S: Set(key, value)
    S-->>C: Result (stored in sink)
    B->>C: Benchmark_Get()
    C->>S: Get(key)
    S-->>C: Result (stored in sink)
    B->>C: Benchmark_GetSparse()
    C->>S: Get(unique_key)
    S-->>C: Result (stored in sink)
    B->>C: Benchmark_Iterate()
    C->>S: Iterate()
    S-->>C: Key-Value pairs
Loading



github-actions bot (Contributor) commented Nov 8, 2024

@testinginprod your pull request is missing a changelog!

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

🧹 Outside diff range and nitpick comments (8)
server/v2/stf/branch/bench_test.go (3)

35-35: Consider adding more detailed performance metrics.

Given the significant performance regression (320x slower) mentioned in the PR objectives, consider adding:

  1. Memory statistics using b.ReportAllocs()
  2. Sub-benchmarks for different key patterns
  3. CPU profiling for detailed analysis

Example enhancement:

 func Benchmark_Get(b *testing.B) {
 	var sink any
 	for _, stackSize := range stackSizes {
 		b.Run(fmt.Sprintf("StackSize%d", stackSize), func(b *testing.B) {
 			bs := makeBranchStack(b, stackSize)
+			// Add CPU profiling
+			if stackSize == 100 { // Profile only larger sizes
+				f, _ := os.Create("cpu.prof")
+				pprof.StartCPUProfile(f)
+				defer pprof.StopCPUProfile()
+			}
 			b.ResetTimer()
 			b.ReportAllocs()
 			for i := 0; i < b.N; i++ {
 				sink, _ = bs.Get([]byte{0})
 			}
 		})
 	}
 	if sink == nil {
 		b.Fatal("prevent compiler optimization")
 	}
 }

Also applies to: 42-48


51-72: LGTM! Consider adding parallel benchmark variant.

The benchmark is well-structured with proper key generation outside the timed section. Given the significant performance improvement (11.77x faster), it would be valuable to also test parallel access patterns.

Add a parallel variant:

+func Benchmark_GetSparse_Parallel(b *testing.B) {
+	for _, stackSize := range stackSizes {
+		b.Run(fmt.Sprintf("StackSize%d", stackSize), func(b *testing.B) {
+			bs := makeBranchStack(b, stackSize)
+			keys := make([][]byte, b.N)
+			for i := 0; i < b.N; i++ {
+				keys[i] = numToBytes(i)
+			}
+			b.ResetTimer()
+			b.ReportAllocs()
+			b.RunParallel(func(pb *testing.PB) {
+				var sink any
+				i := 0
+				for pb.Next() {
+					sink, _ = bs.Get(keys[i%len(keys)])
+					i++
+				}
+				if sink == nil {
+					b.Fatal("prevent compiler optimization")
+				}
+			})
+		})
+	}
+}

118-120: LGTM! Consider adding documentation.

The numToBytes function is well-implemented using generics and efficient binary encoding.

Add documentation to clarify the encoding format:

+// numToBytes converts an integer to its big-endian binary representation.
+// The function always returns an 8-byte slice regardless of the input value.
 func numToBytes[T ~int](n T) []byte {
 	return binary.BigEndian.AppendUint64(nil, uint64(n))
 }
store/cachekv/branch_bench_test.go (5)

13-17: Consider using constants and adding documentation.

The global variables would benefit from being declared as constants since they're not modified. Additionally, adding documentation would improve clarity about their purpose in the benchmarks.

-var (
+const (
 	stackSizes   = []int{1, 10, 100}
 	elemsInStack = 10
 )
+
+// stackSizes defines the different stack depths to benchmark
+// elemsInStack defines the number of elements to populate in each stack layer

18-29: Consider enhancing benchmark descriptions.

While the benchmark implementation is solid, the sub-benchmark descriptions could be more descriptive to better indicate what's being tested.

-b.Run(fmt.Sprintf("StackSize%d", stackSize), func(b *testing.B) {
+b.Run(fmt.Sprintf("StackSize=%d/SingleKeySet", stackSize), func(b *testing.B) {

74-96: Add documentation for sink variables.

The iterator benchmark correctly uses sink variables to prevent compiler optimizations, but their purpose should be documented.

-var keySink, valueSink any
+// keySink and valueSink prevent the compiler from optimizing away the iterator operations
+var keySink, valueSink any

98-112: Remove unused parameter in makeBranchStack.

The testing.B parameter is not used in the function.

-func makeBranchStack(_ *testing.B, stackSize int) *cachekv.Store {
+func makeBranchStack(stackSize int) *cachekv.Store {

114-116: Add documentation for numToBytes function.

The generic function should have documentation explaining its purpose and encoding details.

+// numToBytes converts a number to its big-endian byte representation.
+// It uses a generic constraint to accept any integer type.
 func numToBytes[T ~int](n T) []byte {
📜 Review details

Configuration used: .coderabbit.yml
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 78b5cd4 and 71245a1.

📒 Files selected for processing (3)
  • server/v2/stf/branch/bench_test.go (4 hunks)
  • server/v2/stf/branch/store.go (1 hunks)
  • store/cachekv/branch_bench_test.go (1 hunks)
🧰 Additional context used
📓 Path-based instructions (3)
server/v2/stf/branch/bench_test.go (2)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.


Pattern **/*_test.go: "Assess the unit test code assessing sufficient code coverage for the changes associated in the pull request"

server/v2/stf/branch/store.go (1)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.

store/cachekv/branch_bench_test.go (2)

Pattern **/*.go: Review the Golang code for conformity with the Uber Golang style guide, highlighting any deviations.


Pattern **/*_test.go: "Assess the unit test code assessing sufficient code coverage for the changes associated in the pull request"

🔇 Additional comments (6)
server/v2/stf/branch/bench_test.go (2)

107-107: LGTM! Key generation ensures uniqueness across branches.

The key generation logic correctly combines branch and element indices using the efficient numToBytes function.


18-18: LGTM! Compiler optimization prevention is properly implemented.

The sink variable pattern is correctly used to prevent the compiler from optimizing away the benchmark operations.

Let's verify that the benchmark results are stable and not affected by compiler optimizations:

Also applies to: 25-25, 29-31

✅ Verification successful

Sink variable usage follows established patterns across the codebase

The sink variable declaration and usage in server/v2/stf/branch/bench_test.go follows the consistent pattern found across the codebase's benchmark tests. This is a well-established practice in the project, with similar implementations in other benchmark files like store/cachekv/store_bench_test.go and math/dec_test.go.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Run the benchmark multiple times to check result stability
go test -bench=Benchmark_CacheStack_Set -count=5 ./server/v2/stf/branch/... | tee results.txt
# Check if results are consistent (should have similar numbers)
awk '/Benchmark_CacheStack_Set/ { print $3 }' results.txt | sort -n | uniq -c



Script:

#!/bin/bash
# Let's verify the sink variable usage pattern in the benchmark tests
rg -U "var\s+sink\s+.*?\n.*?=.*?sink" --type go

# Check for other benchmark functions and their sink usage
ast-grep --pattern 'func Benchmark$_($_) {
  $$$
}'

# Look for any other sink variable declarations
rg "var\s+sink\s+" --type go


server/v2/stf/branch/store.go (2)

Line range hint 32-36: LGTM: Comment changes improve clarity

The updated comment accurately describes the control flow of the Get method.


Line range hint 32-36: Consider optimizing Get method performance

According to the PR objectives, there's a significant performance regression (320× slower) in _GetCached operations. Consider implementing the following optimizations:

  1. Add read-through caching to reduce parent store lookups
  2. Implement bulk operations for batch retrievals
  3. Add prefetching mechanisms for frequently accessed keys

Let's analyze the usage patterns to identify optimization opportunities:

store/cachekv/branch_bench_test.go (2)

1-11: LGTM! Package structure and imports are well-organized.

The package follows Go conventions with proper import grouping.


31-47: Verify benchmark coverage matches PR objectives.

The Get and GetSparse benchmarks align with the PR objectives showing the 320× slowdown and 11.77× speedup respectively. However, consider adding more granular test cases to investigate the significant performance regression in Get operations.

Consider adding benchmarks with different key sizes and value sizes to better understand the performance characteristics:

+var (
+	benchmarkKeySizes = []int{8, 64, 256}  // in bytes
+	benchmarkValSizes = []int{8, 1024, 4096}  // in bytes
+)

Also applies to: 49-72

odeke-em (Collaborator) left a comment
Just a drive-by code review: for proper benchmarks, you need to make the sink a global and also reset it to nil after every benchmark.

}
})
}
if sink != nil {
You want the opposite: if sink == nil.

})
}
if sink == nil {
b.Fatal("prevent compiler optimization")
Please use instead b.Fatal("Benchmark did not run")

}

func Benchmark_GetSparse(b *testing.B) {
var sink any
You need this sink to be a global to ensure write barriers, and please reset it to nil after usage.


// Gets the same key from the branch store.
func Benchmark_Get(b *testing.B) {
var sink any
Needs to be global and to be reset after use.


// Gets always different keys.
func Benchmark_GetSparse(b *testing.B) {
var sink any
Needs to be global and reset to nil after usage.

Labels
C:server/v2 stf · C:server/v2 · C:Store
4 participants