GH-87: [Vector] Add ExtensionWriter#697
Conversation
Based on changes from apache/arrow#41731. Added writer ExtensionWriter with 3 methods: - write method for writing values from Extension holders; - writeExtensionType method for writing values (arguments is Object because we don't know exact type); - addExtensionTypeFactory method - because exact vector and value type are unknown, user should create their own extension type vector, writer for it and ExtensionTypeFactory where it should map vector and writer.
This comment has been minimized.
This comment has been minimized.
|
@lidavidm, please take a look if this approach can be used |
lidavidm
left a comment
There was a problem hiding this comment.
Can we preserve the original committer's information if possible so they can get the credit for the original work? (A Co-authored-by tag in the PR should suffice I think)
Is there some way we could have a type-safe design instead of just Object everywhere?
|
|
||
| import org.apache.arrow.vector.ExtensionTypeVector; | ||
|
|
||
| public interface ExtensionTypeWriterFactory<T extends AbstractFieldWriter> { |
There was a problem hiding this comment.
Does the type bound need to be AbstractFieldWriter or can it just be FieldWriter (the interface)?
There was a problem hiding this comment.
I chose AbstractFieldWriter just to make sure that the user will use some specific implementation of writer, but in general yes - it could be FieldWriter
There was a problem hiding this comment.
IMO abstract implementation classes should not go in generic bounds - it should be the interface type
vector/src/main/java/org/apache/arrow/vector/holders/ExtensionHolder.java
Show resolved
Hide resolved
| this.vector = vector; | ||
| } | ||
|
|
||
| @Override |
There was a problem hiding this comment.
Is there a way we can provide default implementations for these functions to reduce the boilerplate?
There was a problem hiding this comment.
We may want to just pull the UuidType into a toplevel class now that it's being used by multiple tests.
Because the type is unknown, we could allow only Holder impl for writing. Is this acceptable? |
|
|
||
| public class TestUuidVector { | ||
|
|
||
| public static class UuidType extends ExtensionType { |
There was a problem hiding this comment.
What I meant was, can we just make this a toplevel class? Not a nested class?
| } | ||
| } | ||
|
|
||
| public static class UuidVector extends ExtensionTypeVector<FixedSizeBinaryVector> |
| import org.apache.arrow.vector.types.pojo.ArrowType.ExtensionType; | ||
| import org.apache.arrow.vector.util.TransferPair; | ||
|
|
||
| public class TestUuidVector { |
There was a problem hiding this comment.
In fact why is this class here at all? All it does is wrap the two inner classes
| void writeNull(); | ||
| <T extends ExtensionHolder> void write(T var1); | ||
| void writeExtensionType(Object var1); | ||
| <T extends ExtensionTypeWriterFactory> void addExtensionTypeFactory(T var1); |
There was a problem hiding this comment.
While the existing code is short on docstrings, can we add docstrings for new code going forward? In particular it's not clear what addExtensionTypeFactory is or how to use it
There was a problem hiding this comment.
Are the generic bounds actually useful here? Why can't it just be ExtensionTypeWriterFactory var1?
There was a problem hiding this comment.
Can we replace var1 with meaningful parameter names?
|
|
||
| public interface ExtensionWriter extends BaseWriter { | ||
| void writeNull(); | ||
| <T extends ExtensionHolder> void write(T var1); |
There was a problem hiding this comment.
Same here - why not just void write(ExtensionHolder value)?
|
|
||
| import org.apache.arrow.vector.ExtensionTypeVector; | ||
|
|
||
| public interface ExtensionTypeWriterFactory<T extends AbstractFieldWriter> { |
There was a problem hiding this comment.
IMO abstract implementation classes should not go in generic bounds - it should be the interface type
vector/src/main/java/org/apache/arrow/vector/complex/impl/ExtensionTypeWriterFactory.java
Show resolved
Hide resolved
There was a problem hiding this comment.
I don't see any test of using ExtensionHolder?
lidavidm
left a comment
There was a problem hiding this comment.
Looking good, just some final naming questions
| void writeNull(); | ||
|
|
||
| /** | ||
| * Writes vlaue from the given extension holder. |
There was a problem hiding this comment.
| * Writes vlaue from the given extension holder. | |
| * Writes value from the given extension holder. |
|
|
||
| @Override | ||
| public void write(ExtensionHolder holder) { | ||
| if (holder instanceof UuidHolder) { |
There was a problem hiding this comment.
Either delete the check to be consistent with writeExtensionType or explicitly error if the holder is the wrong type
| import org.apache.arrow.vector.holder.UuidHolder; | ||
| import org.apache.arrow.vector.holders.ExtensionHolder; | ||
|
|
||
| public class UuidWriterImpl extends AbstractExtensionTypeWriter { |
There was a problem hiding this comment.
| public class UuidWriterImpl extends AbstractExtensionTypeWriter { | |
| public class UuidWriterImpl extends AbstractExtensionTypeWriter<UuidVector> { |
| ByteBuffer bb = ByteBuffer.allocate(16); | ||
| bb.putLong(uuid.getMostSignificantBits()); | ||
| bb.putLong(uuid.getLeastSignificantBits()); | ||
| ((UuidVector) this.vector).setSafe(this.idx(), bb.array()); |
There was a problem hiding this comment.
| ((UuidVector) this.vector).setSafe(this.idx(), bb.array()); | |
| vector.setSafe(idx(), bb.array()); |
| ((UuidVector) this.vector).setSafe(this.idx(), uuidHolder.value); | ||
| this.vector.setValueCount(this.idx() + 1); |
There was a problem hiding this comment.
| ((UuidVector) this.vector).setSafe(this.idx(), uuidHolder.value); | |
| this.vector.setValueCount(this.idx() + 1); | |
| vector.setSafe(idx(), uuidHolder.value); | |
| vector.setValueCount(idx() + 1); |
| * | ||
| * @param value the extension type value to write | ||
| */ | ||
| void writeExtensionType(Object value); |
There was a problem hiding this comment.
This is a nit but now I'm thinking it should be just writeExtension to parallel writeVarChar
| * | ||
| * @param factory the extension type factory to add | ||
| */ | ||
| void addExtensionTypeFactory(ExtensionTypeWriterFactory factory); |
There was a problem hiding this comment.
| void addExtensionTypeFactory(ExtensionTypeWriterFactory factory); | |
| void addExtensionTypeWriterFactory(ExtensionTypeWriterFactory factory); |
for consistency as well
|
|
||
| @Override | ||
| protected int idx() { | ||
| return super.idx(); |
There was a problem hiding this comment.
Hmm, is the only reason we override here to make it protected instead of package-private? (Should we just use getPosition() instead as it is already public and does the same thing?)
There was a problem hiding this comment.
yep, makes sense - logic with idx() method added for all other writers.
lidavidm
left a comment
There was a problem hiding this comment.
LGTM, but please see the pre-commit failures
Based on changes from apache/arrow#41731. ## What's Changed Added writer ExtensionWriter with 3 methods: - write method for writing values from Extension holders; - writeExtensionType method for writing values (arguments is Object because we don't know exact type); - addExtensionTypeFactory method - because the exact vector and value type are unknown, the user should create their own extension type vector, write for it, and ExtensionTypeFactory, which should map the vector and writer. Closes apache#87. Co-authored-by: Finn Völkel <finn.volkel@gmail.com>
apacheGH-87: [Vector] Add ExtensionWriter (apache#697)
* apacheGH-87: [Vector] Add ExtensionWriter (apache#697) missed file
* apacheGH-87: [Vector] Add ExtensionWriter (apache#697) missed file * apacheGH-87: [Vector] Add ExtensionWriter (apache#697) missed file * apacheGH-87: [Vector] Add ExtensionWriter (apache#697) updated UnionListWriter.
Based on changes from apache/arrow#41731. ## What's Changed Added writer ExtensionWriter with 3 methods: - write method for writing values from Extension holders; - writeExtensionType method for writing values (arguments is Object because we don't know exact type); - addExtensionTypeFactory method - because the exact vector and value type are unknown, the user should create their own extension type vector, write for it, and ExtensionTypeFactory, which should map the vector and writer. Closes apache#87. Co-authored-by: Finn Völkel <finn.volkel@gmail.com>
* apacheGH-87: [Vector] Add ExtensionWriter (apache#697) missed file
* apacheGH-87: [Vector] Add ExtensionWriter (apache#697) missed file * apacheGH-87: [Vector] Add ExtensionWriter (apache#697) missed file * apacheGH-87: [Vector] Add ExtensionWriter (apache#697) updated UnionListWriter.
Based on changes from apache/arrow#41731.
What's Changed
Added writer ExtensionWriter with 3 methods:
Closes #87.
Co-authored-by: Finn Völkel finn.volkel@gmail.com