Skip to content

Adding ByteBuffer class for reading/writing elements more memory efficiently.#221

Merged
avalerio-tkd merged 11 commits intomainfrom
av_buffers_typedlist_041
Mar 2, 2026
Merged

Adding ByteBuffer class for reading/writing elements more memory efficiently.#221
avalerio-tkd merged 11 commits intomainfrom
av_buffers_typedlist_041

Conversation

@avalerio-tkd
Copy link
Collaborator

  • Adding ByteBuffer class for reading and writing fixed-size and variable-size elements.
  • Small updates to bytes_utils.h for efficiency.

…able-size elements.

- Small updates to bytes_utils.h for efficiency.
@avalerio-tkd
Copy link
Collaborator Author

@argmarco-tkd this is basically the new way to capture and write elements (fixed and variable length) instead of creating N vectors. Gives random access and random writes. Conveniently it can be reused for the Parquet format and for our internal format, because besides the few header bytes on the internal format, the values themselves are identical to the Parquet format.

The next PR will switch all uses of vectors during Encrypt/Decrypt to use this library.

Copy link
Collaborator

@argmarco-tkd argmarco-tkd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this! Overall it looks good. Left a bunch of comments - mostly are stylistic, but there is an important one around the behavior for overwriting a position with variable-sized elements.

Copy link
Collaborator Author

@avalerio-tkd avalerio-tkd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the comments! Could you PTAL?

I'll give it another pass to add some optimizations for then elements are read/written in sequence, since that's the actual use case.

- Small clarifications and improvements from code review.
- Added sequential write checking for variable-size elements.
- Added heuristics for reserve count in variable-size parsing for read-only buffers.
- Added unittests.
- Moving to use flag for initialization of write buffer for efficiency.
Copy link
Collaborator

@argmarco-tkd argmarco-tkd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for this. Changes look good - love the optimizations in place.

We should have a quick chat around testing (re: exposing some of the internal impl details in the unit test suite)

Copy link
Collaborator Author

@avalerio-tkd avalerio-tkd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the comments! Some of the "Done" comments aren't pushed yet while I wait for a final update on the PR.

@avalerio-tkd
Copy link
Collaborator Author

avalerio-tkd commented Mar 2, 2026

@argmarco-tkd added an iterator over the element_span that should save the overhead of calculating offsets_ when the read is strictly sequencial (which is the most common way if doing it in single thread). If the elements are accessed randomly with GetElement, the offsets_ vector is initalized at the time in a lazy manner.

Copy link
Collaborator

@argmarco-tkd argmarco-tkd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM - thanks for all the back-and-forth process! - approving, but we need to have a conversation around the existence of the iterator functionality.

@avalerio-tkd
Copy link
Collaborator Author

Overall LGTM - thanks for all the back-and-forth process! - approving, but we need to have a conversation around the existence of the iterator functionality.

Thanks for the careful review. Discussed offline for the iterator. The iterator optimization is not limited to one-by-one reads. A bulk encryptor can read in succession blocks of N elements in order from the buffer using the iterator.

@avalerio-tkd
Copy link
Collaborator Author

avalerio-tkd commented Mar 2, 2026

Added a relatively small change to account for buffers with prefixes that shouldn't be controversial.
Going to merge, we can discuss further on the last change and update in following PRs if needed.

@avalerio-tkd avalerio-tkd merged commit 089913d into main Mar 2, 2026
2 checks passed
@avalerio-tkd avalerio-tkd deleted the av_buffers_typedlist_041 branch March 2, 2026 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants