Add `is_valid` and `truncate` methods to `NullBufferBuilder` #7013
Commits: f652665, 0ddb588, 1d65de7, f5b3110, 3c963a5, 847e5dc, cf5ec9a, 5246cfc, 2439103, 290e8ef, e98d831, 8ed7851, 174f82c, d0d3c62
I think we need to reference `self.capacity` here instead of 0?
Actually, I'm not sure whether, when `bitmap_builder` hasn't been initialized, we should return its actual capacity (which is 0 because no builder exists yet) or `self.capacity`.
In my opinion it would feel weird if I initialize a `NullBufferBuilder` with a capacity, but then checking the capacity right after results in it saying 0.
I agree it is a bit strange, but I think the PR as written makes the most sense.
Specifically, downstream in DataFusion (and other places) we use the `capacity` as a way to calculate how much memory has been allocated: if there is no `bitmap_builder`, no memory has been allocated.
I think we can make this less confusing with some comments. I left some suggestions.
Another thing that might help would be to rename it to `fn allocated_capacity()` to make the difference more explicit 🤔
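To make the distinction between a capacity hint and allocated memory concrete, here is a minimal sketch. `LazyNullMask`, its fields, and `allocated_slots` are hypothetical names for a toy model that uses a `Vec<bool>` in place of a bitmap; it is not the arrow-rs implementation, only an illustration of why a lazily materialized builder would report 0 before the first null is appended.

```rust
/// Toy stand-in for a lazily materialized null mask (hypothetical; this is
/// not the arrow-rs `NullBufferBuilder`).
pub struct LazyNullMask {
    /// Capacity hint given at construction; nothing is allocated for it yet.
    initial_capacity: usize,
    /// Number of valid entries appended while `bits` is still `None`.
    len: usize,
    /// Materialized only when the first null is appended.
    bits: Option<Vec<bool>>,
}

impl LazyNullMask {
    pub fn with_capacity(initial_capacity: usize) -> Self {
        Self { initial_capacity, len: 0, bits: None }
    }

    pub fn append_non_null(&mut self) {
        match self.bits.as_mut() {
            Some(bits) => bits.push(true),
            // Not materialized: just count the entry, allocate nothing.
            None => self.len += 1,
        }
    }

    pub fn append_null(&mut self) {
        self.materialize();
        self.bits.as_mut().unwrap().push(false);
    }

    /// Allocate the mask and back-fill the valid entries counted so far.
    fn materialize(&mut self) {
        if self.bits.is_none() {
            let mut bits = Vec::with_capacity(self.len.max(self.initial_capacity));
            bits.resize(self.len, true);
            self.bits = Some(bits);
        }
    }

    /// Slots actually allocated: 0 until the mask is materialized.
    pub fn allocated_slots(&self) -> usize {
        self.bits.as_ref().map_or(0, |b| b.capacity())
    }
}
```

In this sketch, a mask created with `with_capacity(16)` reports 0 allocated slots until `append_null` is called, which matches the memory-accounting use described above but may surprise a caller who expects the hint to be echoed back.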
`fn allocated_capacity()` would make sense, but it seems we already have that: arrow-rs/arrow-buffer/src/builder/null.rs, lines 188 to 194 in 0c07ec7
Or rename the `capacity` field to `initial_capacity` or `initial_bits` to clearly indicate this is for initialization only?
I think we don't need to add a `capacity()` method, given `allocated_size()` seems to already do what is required.
Removed capacity method in 8ed7851
Once we materialize, is `self.len` accurate anymore?
I am not sure what you mean. In this case the user requested to truncate to a value larger than the current length, which has no effect.
This behavior seems to be consistent with `Vec::truncate`, so it makes sense to me: https://doc.rust-lang.org/std/vec/struct.Vec.html#method.truncate
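For reference, the `Vec::truncate` behavior mentioned here can be checked with a small standard-library sketch; the helper name `truncated` is only for illustration.

```rust
/// Returns `values` after calling `Vec::truncate(len)` on it.
pub fn truncated(mut values: Vec<i32>, len: usize) -> Vec<i32> {
    // If `len` is greater than or equal to the current length,
    // `Vec::truncate` has no effect.
    values.truncate(len);
    values
}
```

Calling `truncated(vec![1, 2, 3], 8)` leaves all three elements in place, while `truncated(vec![1, 2, 3], 2)` drops the last one.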
Consider a test case that appends two nulls and then truncates to 1.
It would fail at the last assertion because the builder was materialized after appending two nulls, but then truncating down to 1 is a no-op, since the internal `self.len` stays 0 (it is not updated after materialization).
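The scenario can be sketched on a toy model. `ToyNullBuilder` below is hypothetical and uses a `Vec<bool>` rather than a bitmap; it is not the code from this PR, only an illustration of a `truncate` that consults the materialized mask when it exists instead of relying on the pre-materialization counter.

```rust
/// Toy model of a lazily materialized null builder (hypothetical names).
pub struct ToyNullBuilder {
    /// Count of valid entries; only meaningful while `bits` is `None`.
    len: usize,
    /// Validity flags, present once at least one null has been appended.
    bits: Option<Vec<bool>>,
}

impl ToyNullBuilder {
    pub fn new() -> Self {
        Self { len: 0, bits: None }
    }

    pub fn append_non_null(&mut self) {
        match self.bits.as_mut() {
            Some(bits) => bits.push(true),
            None => self.len += 1,
        }
    }

    pub fn append_null(&mut self) {
        // Materialize on first null, carrying over the valid entries so far.
        let len = self.len;
        self.bits.get_or_insert_with(|| vec![true; len]).push(false);
    }

    /// Logical length: taken from the mask once it is materialized.
    pub fn len(&self) -> usize {
        self.bits.as_ref().map_or(self.len, |b| b.len())
    }

    /// Shortens the builder to `len` entries; a no-op if `len` is not
    /// smaller than the current length, like `Vec::truncate`.
    pub fn truncate(&mut self, len: usize) {
        match self.bits.as_mut() {
            // Materialized: the stale counter must not be consulted here.
            Some(bits) => bits.truncate(len),
            None => self.len = self.len.min(len),
        }
    }
}
```

With this sketch, appending two nulls and then calling `truncate(1)` leaves a length of 1. A version that compared the requested length only against the stale counter would skip the truncation, which is the failure described above.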
You are (of course) correct. This is a great test and find.
I took the liberty of pushing a commit to this branch that includes this test case (it will fail CI until it is fixed, so it blocks this PR from merging).
Fixed in d0d3c62 and added some more documentation on the relationship between `len` and `builder`.