GH-32007: [Python] Support arithmetic on arrays and scalars#48085
rmnskb wants to merge 20 commits into apache:main
@pitrou @AlenkaF @rok
Sorry for the late reply @rmnskb. I think all dunders that semantically map to our existing kernels are fair game! I'm not sure what these would be, but the set you have now looks decently sized.
We might be able to add docstrings via type annotations. Or perhaps it's possible to copy them at runtime from the respective kernels? (I'm unsure whether that is possible.) In any case, we probably don't want to duplicate docstrings in general.
Yes, that would be good to have!
Ok, I was also thinking about covering as many dunder methods as possible, but did not want to go out of scope for this issue.
Copying the docstrings sounds like a good idea, I will look further into that.
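For illustration, the "copy the docstring at runtime" idea could look roughly like the sketch below. It is plain Python and does not use pyarrow: `add`, `copy_doc_from`, and `ToyArray` are hypothetical stand-ins for a compute kernel, a helper decorator, and an array class, and the PR may well solve this differently.

```python
def add(x, y):
    """Add the arguments element-wise (stand-in for a compute kernel)."""
    return [a + b for a, b in zip(x, y)]


def copy_doc_from(kernel):
    """Return a decorator that reuses ``kernel.__doc__`` for a method."""
    def decorator(method):
        # Point the method's docstring at the kernel's, so the text
        # lives in one place only.
        method.__doc__ = kernel.__doc__
        return method
    return decorator


class ToyArray:
    """Hypothetical array type whose operators delegate to kernels."""

    def __init__(self, values):
        self.values = list(values)

    @copy_doc_from(add)
    def __add__(self, other):
        return ToyArray(add(self.values, other.values))


total = ToyArray([1, 2]) + ToyArray([3, 4])
```

With this approach `help(ToyArray.__add__)` shows the kernel's text, so nothing is duplicated in the source.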
Thank you for working on this @rmnskb and sorry for replying late! I think the current scope is solid. I am thinking of basic comparisons and the discussion in the issue around
Yes, it would be important to update the documentation (the User Guide, i.e. the part that is not the API reference).
One comment after looking at the code: it would be good to add tests that cover various type combinations (array, scalar, Python types, unsupported types, ...).
I think we already have
That's a good point,
def __eq__(self, other):
    try:
        return self.equals(other)
    except TypeError:
        # This also handles comparing with None
        # as Array.equals(None) raises a TypeError.
        return NotImplemented
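To make the consequence of that `__eq__` concrete, here is a small pure-Python sketch of the same pattern. `ToyArray` is a hypothetical stand-in, not pyarrow's `Array`; it only mirrors the try/except structure quoted above.

```python
class ToyArray:
    """Hypothetical stand-in mirroring the quoted __eq__ pattern."""

    def __init__(self, values):
        self.values = list(values)

    def equals(self, other):
        # Whole-object comparison that rejects foreign types.
        if not isinstance(other, ToyArray):
            raise TypeError(f"expected ToyArray, got {type(other).__name__}")
        return self.values == other.values

    def __eq__(self, other):
        try:
            return self.equals(other)
        except TypeError:
            # Let Python try the other operand, then fall back to identity.
            return NotImplemented


same = ToyArray([1, 2, 3]) == ToyArray([1, 2, 3])
with_none = ToyArray([1]) == None  # noqa: E711 (deliberate == None)
```

Here `==` produces a single boolean for the whole array rather than an element-wise result, and comparing with `None` quietly evaluates to `False` instead of raising.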
Yes, true. I was under the impression that we might want to change the use of
Oh, I managed to forget this and thought It seems like a good opportunity to change this! :)
I've looked more into this, and based on these two discussions, we'd have to copy them directly from the underlying functions at runtime via importing the
I'll be happy to work on it, once I'm done with this implementation :)
Are you talking about this file? Or is there somewhere else, where I can put this information?
Yes, I agree. Are there any particular unsupported types I should include that you have in mind?
Thank you!
I was thinking more about the compute page in our User Guide: https://github.com/apache/arrow/blob/main/docs/source/python/compute.rst
No, not really. What comes to mind are strings and/or nested types for some arithmetic functions.
I mentioned the newly implemented operators in the documentation. Please let me know if it makes sense. I'm not sure whether we should list all the implemented methods; on the other hand, I don't want to leave users guessing what exactly they can use.
An integer-based extension type maybe.
Hey, @AlenkaF, good points! Thanks for the review. I've added another test and updated the docs as per your suggestion.
Two general quick comments:
That is true, I also thought about it, but it seems to me that being consistent with the other dunders also makes sense. Should we document this, so that users know what to expect?
Fair point. I think documenting the expected behavior might be a good idea, given that Python is quite forgiving when it comes to data types. What do you think? The other option is to explicitly convert the output of these dunders to integers to stay consistent with the builtin docs.
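A short sketch of why this is a documentation question at all, assuming the concern is that the rounding dunders return a wrapped value rather than a Python `int`. `ToyScalar` is hypothetical and unrelated to pyarrow's scalar classes; the point is only that `math.floor` and `round` hand back whatever the dunder returns.

```python
import math


class ToyScalar:
    """Hypothetical scalar wrapper whose rounding dunders keep the wrapper."""

    def __init__(self, value):
        self.value = value

    def __floor__(self):
        # The builtin docs say this should return an integer, but Python
        # does not enforce it.
        return ToyScalar(math.floor(self.value))

    def __round__(self, ndigits=None):
        return ToyScalar(round(self.value, ndigits))


floored = math.floor(ToyScalar(2.7))
rounded = round(ToyScalar(2.7))
builtin_floored = math.floor(2.7)
```

In this sketch `floored` and `rounded` are `ToyScalar` objects while `builtin_floored` is a plain `int`, which is the inconsistency with the builtin docs that the two options above try to address.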
Do we want to gold-plate this? We can keep the bitwise operators, remove the round/floor/etc. dunders, and perhaps improve things later if people request it.
We can split this PR in two and move round/floor/etc. to a follow-up one if there are requests for them.
Thank you for the update @rmnskb! I think the latest change addresses the last comments. @pitrou or @raulcd, would you mind taking another look at the changes?
# GH-32007
arr1, arr2 = float_arrays
assert (arr1 + arr2).equals(pc.add_checked(arr1, arr2))
If we really want to assert the "checked" aspect, then we should also include a case where overflow occurs.
The overall intention of this was to test that the dunder methods produce the same output as the kernels given the same input. That said, your suggestion is valid, so I added extra tests to cover whether the dunder methods do indeed overflow.
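The difference between a wrapping and a "checked" addition, and why only an overflowing input can tell them apart, can be sketched in plain Python for 8-bit integers. `add_wrapping` and `add_checked` below are hypothetical stand-ins, not the pyarrow kernels, and the real kernel's exception type may differ from `OverflowError`.

```python
INT8_MIN, INT8_MAX = -128, 127


def add_wrapping(a, b):
    """Add two int8 values, wrapping around on overflow."""
    return (a + b - INT8_MIN) % 256 + INT8_MIN


def add_checked(a, b):
    """Add two int8 values, raising instead of wrapping on overflow."""
    result = a + b
    if not INT8_MIN <= result <= INT8_MAX:
        raise OverflowError("int8 overflow")
    return result


# In-range inputs cannot distinguish the two variants.
in_range_equal = add_wrapping(1, 2) == add_checked(1, 2)

# Only an overflowing input exercises the "checked" behaviour.
wrapped = add_wrapping(127, 1)
try:
    add_checked(127, 1)
    overflow_raised = False
except OverflowError:
    overflow_raised = True
```

A test that only compares results on in-range data passes for both variants, which is why the review asks for an explicit overflow case.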
]
>>> val = pa.scalar(42)
>>> val - arr1
Can I simply call arr1 - 42 or would that not work?
It would work, yes. I wanted to show in the docstrings that the users can also use explicit scalars, that's why I went with this.
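As a rough sketch of how both spellings can be supported, the toy class below accepts plain Python numbers in `__sub__` and handles the reversed operand order in `__rsub__`. `ToyArray` is hypothetical; whether the PR implements the reflected methods this way is an assumption, not something stated in this thread.

```python
class ToyArray:
    """Hypothetical array type supporting subtraction with Python numbers."""

    def __init__(self, values):
        self.values = list(values)

    def __sub__(self, other):
        # array - number, or array - array
        if isinstance(other, (int, float)):
            return ToyArray(v - other for v in self.values)
        if isinstance(other, ToyArray):
            return ToyArray(a - b for a, b in zip(self.values, other.values))
        return NotImplemented

    def __rsub__(self, other):
        # number - array: Python calls this after int.__sub__ gives up.
        if isinstance(other, (int, float)):
            return ToyArray(other - v for v in self.values)
        return NotImplemented


arr = ToyArray([1, 2, 3])
array_minus_number = (arr - 42).values
number_minus_array = (42 - arr).values
```

Returning `NotImplemented` for unsupported operands lets Python raise its usual `TypeError`, which fits the tests for unsupported types requested earlier in the conversation.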

Rationale for this change
Please see #32007. Currently, neither arrays nor scalars support Python-native arithmetic operations such as array + array; these have to be done via the pyarrow.compute API. This PR strives to fix this with custom dunder methods.
What changes are included in this PR?
Implemented dunder methods
Are these changes tested?
Yes
Are there any user-facing changes?
Possibility to use Python operators directly instead of calling the pyarrow.compute API.