[SPARK-52249][PS] Enable divide-by-zero for truediv with ANSI enabled #50972
Should be merged after #51035
2c60487 to 443e641
F.lit(right != 0) | F.lit(right).isNull(),
left.__div__(right),
).otherwise(F.lit(np.inf).__div__(left))
if not get_option("compute.ansi_mode_support"):
can we optimize out this get_option, which needs a separate Config RPC?
I guess we can just use the new branch
Sorry, would you mind clarifying what you meant?
@@ -111,7 +111,6 @@ def test_binary_operator_sub(self):
psdf = ps.DataFrame({"a": ["x"], "b": ["y"]})
self.assertRaisesRegex(TypeError, ks_err_msg, lambda: psdf["a"] - psdf["b"])

@unittest.skipIf(is_ansi_mode_test, ansi_mode_not_supported_message)
Could you also try to find where we can remove skipping caused by the division error?
Can I follow up with https://issues.apache.org/jira/browse/SPARK-52349 on that if you don't mind, in order to unblock the other PR?
Sure, that's fine. 👍
python/pyspark/pandas/utils.py
Outdated
@@ -1070,6 +1070,14 @@ def xor(df1: PySparkDataFrame, df2: PySparkDataFrame) -> PySparkDataFrame:
)


def is_ansi_mode_enabled() -> bool:
Shall we explicitly pass spark? SparkSession.getActiveSession() is not light.
Adjusted.
What changes were proposed in this pull request?
Enable divide-by-zero for truediv with ANSI enabled
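As a rough scalar model of the existing non-ANSI branch reviewed above (the when/otherwise expression), the sketch below shows which result each input pair maps to. It assumes that Spark with ANSI off returns NULL for a zero divisor or a NULL operand; None stands for SQL NULL, and both function names are hypothetical.

```python
import math
from typing import Optional

Number = Optional[float]


def spark_divide_non_ansi(a: Number, b: Number) -> Number:
    """Model Spark SQL division with ANSI off: a NULL operand or a zero divisor gives NULL."""
    if a is None or b is None or b == 0:
        return None
    return a / b


def truediv_legacy_branch(left: Number, right: Number) -> Number:
    """Mirror the when/otherwise expression for a single pair of scalars."""
    # Condition: (right != 0) OR right IS NULL -> plain division.
    if right is None or right != 0:
        return spark_divide_non_ansi(left, right)
    # right == 0: divide infinity by left so the sign of the result follows left.
    return spark_divide_non_ansi(math.inf, left)
```

In this model a positive numerator over zero gives inf, a negative one gives -inf, and 0 / 0 gives NULL. The last case depends on the inner division returning NULL rather than raising, which is presumably why the expression cannot be reused unchanged when ANSI mode is on.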
Why are the changes needed?
Part of https://issues.apache.org/jira/browse/SPARK-52169
Does this PR introduce any user-facing change?
Yes, divide-by-zero for truediv is enabled with ANSI enabled
FROM
TO
How was this patch tested?
Unit tests
Was this patch authored or co-authored using generative AI tooling?
No