-
Couldn't load subscription status.
- Fork 4.6k
transport: Remove buffer copies while writing HTTP/2 Data frames #8667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## master #8667 +/- ##
==========================================
+ Coverage 81.97% 82.07% +0.09%
==========================================
Files 417 417
Lines 40788 40851 +63
==========================================
+ Hits 33435 33527 +92
+ Misses 5991 5945 -46
- Partials 1362 1379 +17
🚀 New features to boost your workflow:
|
| } | ||
|
|
||
| // Reader exposes a BufferSlice's data as an io.Reader, allowing it to interface | ||
| // with other parts systems. It also provides an additional convenience method |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While you are here, maybe you can remove mentions of this one additional convenience method. Looks like there are going to be multiple convenience methods going forward.
| // Next appends results to the provided res slice and returns the updated | ||
| // slice. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: This line about Next seems out of place. Maybe delete?
| // Peek returns up to the next n bytes from the reader's current position as | ||
| // a slice of byte slices. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to return the actual number of bytes returned in res? Most APIs to read data do that.
| // slice. | ||
| // The returned subslices are views into the underlying buffers and are only | ||
| // valid until the reader is advanced past the corresponding buffer. | ||
| Peek(n int, res [][]byte) [][]byte |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not clear to me as to why the result is both a parameter and a return value in this method.
| } | ||
| res := c.Peek(1, nil) | ||
| if len(res) != 0 { | ||
| t.Errorf("Peek() got %v, want empty", res) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Peek() got %v slices, want empty?
| tests := []struct { | ||
| name string | ||
| buffers [][]byte | ||
| operations func(t *testing.T, c mem.Reader) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does feel like many of checks in many of the operations should result in calls to t.Fatal instead of t.Error since it doesn't make sense to carry on (for example, when you start with an empty buffer and Remaining doesn't return 0).
And if we start calling t.Fatal at different places, we also probably need to defer the call to the Close method of the reader in the main test logic.
| }{ | ||
| { | ||
| name: "empty", | ||
| operations: func(t *testing.T, c mem.Reader) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Curious: why is the reader called c here and in the main test logic?
| fr *http2.Framer | ||
| writer *bufWriter | ||
| fr *http2.Framer | ||
| headerBuf []byte // cached slice for framer headers to reduce heap allocs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can this be an array instead? [9]byte?
| f.headerBuf = append(f.headerBuf[:0], | ||
| byte(length>>16), | ||
| byte(length>>8), | ||
| byte(length), | ||
| byte(http2.FrameData), | ||
| byte(flags), | ||
| byte(streamID>>24), | ||
| byte(streamID>>16), | ||
| byte(streamID>>8), | ||
| byte(streamID)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And if we make the headerBuf an array, does it make sense to directly set entries in the array (using indices) instead of using append?
| _, err := f.writer.Write(d) | ||
| if err != nil { | ||
| return err | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: assignment and conditional on the same line?
This PR removes 2 buffer copies while writing data frames to the underlying net.Conn: one within gRPC and the other in the framer. Care is taken to avoid any extra heap allocations which can affect performance for smaller payloads.
A CL is out for review which allows using the framer to write frame headers. This PR duplicates the header writing code as a temporary workaround. This PR will be merged only after the CL is merged.
Results
Small payloads
Performance for small payloads increases slightly due to the reduction of a
deferredstatement.Large payloads
Local benchmarks show a ~5-10% regression with 1 MB payloads on my dev machine. The profiles show increased time spent in the copy operation inside the buffered writer. Counterintuitively, copying the grpc header and message data into a larger buffer increased the performance by 4% (compared to master).
To validate this behaviour (extra copy increasing performance) I ran the k8s benchmark for 1MB payloads and 100 concurrent streams which showed ~5% increase in QPS without the copies across multiple runs. Adding a copy reduced the performance.
Load test config file: loadtest.yaml
RELEASE NOTES: