BUG: .describe() doesn't work for EAs #61707 #61760

Open
wants to merge 1 commit into
base: main

kernelism

This PR fixes a bug where Series.describe() fails on certain ExtensionArray (EA) dtypes, such as pint[kg], because it tries to cast the result to Float64Dtype. Some of the computed statistics cannot be cast to float, so the cast raises errors such as DimensionalityError.

We now avoid forcing a Float64Dtype result when the EA's scalar values cannot be safely cast to float. Instead, if the EA produces statistics with mixed dtypes, the result is returned with dtype=None.
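The failure mode and the fallback described above can be sketched without pandas or pint. Everything named here is a hypothetical stand-in, not code from this PR: `Quantity` and `DimensionalityError` imitate a unit library, `force_float` imitates the forced cast, and the string `"Float64"` merely represents `Float64Dtype()`. The type-set check is an assumption about how "mixed dtypes" might be detected.

```python
from typing import Any, Optional


class DimensionalityError(TypeError):
    """Hypothetical stand-in for the error a unit library raises."""


class Quantity:
    """Toy unit-carrying scalar; not the real pint class."""

    def __init__(self, magnitude: float, unit: str) -> None:
        self.magnitude = magnitude
        self.unit = unit

    def __float__(self) -> float:
        # A value with units cannot be silently reduced to a bare float.
        raise DimensionalityError(f"cannot convert {self.unit!r} to float")


def force_float(stats: list[Any]) -> list[float]:
    """Imitates the old behavior: cast every statistic to float."""
    return [float(value) for value in stats]


def pick_result_dtype(stats: list[Any]) -> Optional[str]:
    """Return None (infer the dtype) when the statistics mix Python types."""
    if len({type(value) for value in stats}) > 1:
        return None
    return "Float64"


# count is a plain int, while mean/min/max keep their unit
ea_stats = [3, Quantity(2.0, "kg"), Quantity(1.0, "kg"), Quantity(3.0, "kg")]
uniform_stats = [3.0, 2.0, 1.0, 3.0]

try:
    force_float(ea_stats)
except DimensionalityError as exc:
    print("forced cast failed:", exc)

print(pick_result_dtype(ea_stats))       # None
print(pick_result_dtype(uniform_stats))  # Float64
```

Returning `None` here mirrors the idea of leaving dtype inference to the container rather than forcing a float dtype.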

def test_describe_multiple_dtypes(self):
    """
    GH61707: describe() doesn't work on EAs which generate
    statistics with multiple dtypes.
Member

Nitpick: can this be a comment instead of a docstring?

@@ -215,6 +216,14 @@ def reorder_columns(ldesc: Sequence[Series]) -> list[Hashable]:
    return names


def has_multiple_internal_dtypes(d: list[Any]) -> bool:
Member

I think this can be inlined, since it is only used once.

@@ -251,6 +260,10 @@ def describe_numeric_1d(series: Series, percentiles: Sequence[float]) -> Series:
import pyarrow as pa

dtype = ArrowDtype(pa.float64())
elif has_multiple_internal_dtypes(d):
# GH61707: describe() doesn't work on EAs
# with multiple internal dtypes, so return object dtype
Member

Is the relevant characteristic "multiple internal dtypes" or "entries that can't be cast to Float64"?
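The two characteristics in this question are not equivalent, which a small sketch can make concrete. The helpers below are hypothetical (the body of has_multiple_internal_dtypes is not shown in this diff), and `Unitful` is a toy class assumed only for illustration. They show one input where the values mix types yet all cast to float, and one where the values share a type yet none can be cast.

```python
from typing import Any


class Unitful:
    """Hypothetical unit-carrying scalar that refuses conversion to float."""

    def __init__(self, magnitude: float) -> None:
        self.magnitude = magnitude

    def __float__(self) -> float:
        raise TypeError("unitful value cannot be cast to float")


def has_multiple_types(values: list[Any]) -> bool:
    """True if the values are not all of one Python type."""
    return len({type(v) for v in values}) > 1


def all_castable_to_float(values: list[Any]) -> bool:
    """True if float() succeeds on every value."""
    try:
        for v in values:
            float(v)
    except (TypeError, ValueError):
        return False
    return True


int_and_float = [3, 2.5]                      # mixed types, but castable
only_unitful = [Unitful(1.0), Unitful(2.0)]   # one type, not castable

print(has_multiple_types(int_and_float), all_castable_to_float(int_and_float))
# True True
print(has_multiple_types(only_unitful), all_castable_to_float(only_unitful))
# False False
```

Under these assumptions, a type-based check would give up Float64 for the first input even though the cast would succeed, and would not flag the second input even though the cast would fail.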

Successfully merging this pull request may close these issues.

BUG: .describe() doesn't work for EAs