Adding ByteBuffer class for reading/writing elements more memory efficiently. #221
avalerio-tkd merged 11 commits into main
avalerio-tkd
commented
Mar 1, 2026
- Adding ByteBuffer class for reading and writing fixed-size and variable-size elements.
- Small updates to bytes_utils.h for efficiency.
@argmarco-tkd this is the new way to capture and write elements (fixed and variable length) instead of creating N vectors. It gives random-access reads and random writes. Conveniently, it can be reused for both the Parquet format and our internal format: apart from the few header bytes in the internal format, the values themselves are identical to the Parquet format. The next PR will switch all uses of vectors during Encrypt/Decrypt to this library.
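To make the layout idea concrete, here is a minimal sketch of variable-size elements stored back to back in one byte vector instead of N separate vectors. It assumes each element is written as a 4-byte little-endian length followed by its payload, which is how Parquet PLAIN encoding lays out BYTE_ARRAY values. The class name `SketchVarBuffer` and its `Append`/`GetElement` methods are hypothetical and are not the ByteBuffer API from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical illustration only; not the ByteBuffer class from this PR.
class SketchVarBuffer {
 public:
  // Appends one element as [uint32 little-endian length][payload bytes].
  void Append(std::string_view value) {
    const uint32_t len = static_cast<uint32_t>(value.size());
    offsets_.push_back(bytes_.size());  // Remember where this element starts.
    for (int shift = 0; shift < 32; shift += 8) {
      bytes_.push_back(static_cast<uint8_t>((len >> shift) & 0xFF));
    }
    bytes_.insert(bytes_.end(), value.begin(), value.end());
  }

  // Random access: returns a view over the payload of element `index`.
  // The view is invalidated by later calls to Append.
  std::string_view GetElement(size_t index) const {
    assert(index < offsets_.size());
    const size_t pos = offsets_[index];
    const uint32_t len = ReadLength(pos);
    return std::string_view(
        reinterpret_cast<const char*>(bytes_.data()) + pos + 4, len);
  }

  size_t size() const { return offsets_.size(); }
  const std::vector<uint8_t>& bytes() const { return bytes_; }

 private:
  // Decodes the 4-byte little-endian length stored at `pos`.
  uint32_t ReadLength(size_t pos) const {
    uint32_t len = 0;
    for (int i = 0; i < 4; ++i) {
      len |= static_cast<uint32_t>(bytes_[pos + i]) << (8 * i);
    }
    return len;
  }

  std::vector<uint8_t> bytes_;   // All elements, back to back.
  std::vector<size_t> offsets_;  // Start position of each element.
};
```

With this layout, three elements cost one allocation for the bytes plus one for the offsets, rather than one vector per element.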
argmarco-tkd
left a comment
Thanks for this! Overall it looks good. Left a bunch of comments. Most are stylistic, but there is an important one about the behavior when overwriting a position with variable-sized elements.
avalerio-tkd
left a comment
Thanks for the comments! Could you PTAL?
I'll give it another pass to add some optimizations for when elements are read/written in sequence, since that's the actual use case.
- Small clarifications and improvements from code review.
- Added sequential write checking for variable-size elements.
- Added heuristics for reserve count in variable-size parsing for read-only buffers.
- Added unittests.
- Moving to use flag for initialization of write buffer for efficiency.
argmarco-tkd
left a comment
Thanks for this. The changes look good, and I love the optimizations in place.
We should have a quick chat about testing (re: exposing some of the internal implementation details in the unit test suite).
avalerio-tkd
left a comment
Thanks for all the comments! Some of the "Done" comments aren't pushed yet while I wait for a final update on the PR.
…ads. - Small changes for code review.
@argmarco-tkd added an iterator over the element_span that should save the overhead of calculating offsets_ when the read is strictly sequential (which is the most common pattern when reading in a single thread). If the elements are accessed randomly with GetElement, the offsets_ vector is initialized lazily at that time.
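A rough sketch of the two read paths described here, assuming a read-only buffer of 4-byte little-endian length-prefixed elements. Only `offsets_` and `GetElement` are names taken from the comment above; `SketchReader`, `Next`, and `offsets_built` are hypothetical and the real class may be shaped quite differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string_view>
#include <vector>

// Hypothetical illustration only; not the reader from this PR.
class SketchReader {
 public:
  explicit SketchReader(std::string_view data) : data_(data) {}

  // Sequential path: advances a cursor and never touches offsets_.
  // Returns std::nullopt at the end of the buffer or on a truncated element.
  std::optional<std::string_view> Next() {
    if (cursor_ + 4 > data_.size()) return std::nullopt;
    const uint32_t len = ReadLength(cursor_);
    if (cursor_ + 4 + len > data_.size()) return std::nullopt;
    const std::string_view out = data_.substr(cursor_ + 4, len);
    cursor_ += 4 + len;
    return out;
  }

  // Random path: pays for the full offsets scan once, on first use.
  std::string_view GetElement(size_t index) {
    if (!offsets_built_) BuildOffsets();
    assert(index < offsets_.size());
    const size_t pos = offsets_[index];
    return data_.substr(pos + 4, ReadLength(pos));
  }

  bool offsets_built() const { return offsets_built_; }

 private:
  // Decodes the 4-byte little-endian length stored at `pos`.
  uint32_t ReadLength(size_t pos) const {
    uint32_t len = 0;
    for (int i = 0; i < 4; ++i) {
      len |= static_cast<uint32_t>(static_cast<uint8_t>(data_[pos + i]))
             << (8 * i);
    }
    return len;
  }

  // Walks the whole buffer once, recording where each element starts.
  void BuildOffsets() {
    size_t pos = 0;
    while (pos + 4 <= data_.size()) {
      const uint32_t len = ReadLength(pos);
      if (pos + 4 + len > data_.size()) break;  // Truncated trailing element.
      offsets_.push_back(pos);
      pos += 4 + len;
    }
    offsets_built_ = true;
  }

  std::string_view data_;
  size_t cursor_ = 0;
  std::vector<size_t> offsets_;
  bool offsets_built_ = false;
};
```

A caller that only ever calls `Next()` never allocates the offsets vector; the first `GetElement` call triggers the one-time scan.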
…ther buffer and verify the round trip.
argmarco-tkd
left a comment
Overall LGTM, and thanks for all the back-and-forth! Approving, but we need to have a conversation about the existence of the iterator functionality.
- Updated unittests.
Thanks for the careful review. We discussed the iterator offline. The iterator optimization is not limited to one-by-one reads: a bulk encryptor can use the iterator to read successive blocks of N elements, in order, from the buffer.
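As an illustration of block-wise sequential reads, the sketch below pulls up to N elements at a time from a length-prefixed buffer using only a cursor. It assumes 4-byte little-endian length prefixes; the free function `NextBlock` and its signature are hypothetical and are not part of this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical helper: reads up to `n` elements starting at `cursor` and
// advances `cursor` past them. Stops early at the end of the buffer or on a
// truncated element. No offsets table is needed for this access pattern.
std::vector<std::string_view> NextBlock(std::string_view data, size_t& cursor,
                                        size_t n) {
  assert(cursor <= data.size());
  std::vector<std::string_view> block;
  block.reserve(n);
  while (block.size() < n && cursor + 4 <= data.size()) {
    // Decode the 4-byte little-endian length prefix.
    uint32_t len = 0;
    for (int i = 0; i < 4; ++i) {
      len |= static_cast<uint32_t>(static_cast<uint8_t>(data[cursor + i]))
             << (8 * i);
    }
    if (cursor + 4 + len > data.size()) break;
    block.push_back(data.substr(cursor + 4, len));
    cursor += 4 + len;
  }
  return block;
}
```

A bulk consumer would call `NextBlock` in a loop until it returns an empty block, processing each batch before requesting the next.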
Added a relatively small change to account for buffers with prefixes; it shouldn't be controversial.