GH-35747: Add FixedSizeListArray.to_numpy_ndarray and FixedSizeListArray.from_numpy_ndarray #35864
spenczar wants to merge 1 commit into apache:main
Thank you @spenczar for the contribution! Implementing this in Python rather than C++ makes sense. It might also be worth documenting this case (1-dimensional vs. 2-dimensional NumPy arrays, and how to use 2-dimensional arrays in pyarrow without hitting an error) instead of adding two new functions to the code. I think that could be enough for users who need to work with 2-dimensional NumPy arrays.
Hey @spenczar, would you consider:
This could work either as a workaround for now or as machinery for this PR.
Cleaning up old unmerged PRs in my queue. I still think this was a good idea, but it appears to have stalled; I understand it is blocked by #40354.
I hope we can cycle back to this once #40354 is merged. Sorry for the wait.
What changes are included in this PR?
#35747 asks for additions to all three list array types (ListArray, FixedSizeListArray, and LargeListArray). This PR only adds them for FixedSizeListArray, because it is both the easiest to add and the most obviously useful case.
This implementation is just in Python and follows the general pattern of FixedShapeTensorArrays.
An alternative implementation would be to do this in python/pyarrow/src/arrow/python/numpy_convert.cc. We could detect 2D NumPy arrays and convert them to FixedShapeTensorArrays. That would be a much more significant change; I think it would change the behavior of other calls like pyarrow.array(). It would also allow for much more careful management of copies and memory; it should be possible to implement both to_ and from_ with zero copy for primitive numeric types. However, it would be a lot more complex, and this is my first Arrow contribution, so I figured I'd start a bit smaller.
Are these changes tested?
Yes.
Are there any user-facing changes?
Yes, and I think I added sufficient documentation.