
BUG: DataFrame.rank does not return EA types when original type was an EADtype #52829

@tinadu0806

Description


Pandas version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> import pyarrow as pa
>>> s = pd.Series([1, 2], dtype=pd.ArrowDtype(pa.int32()))
>>> r1 = s.rank(method="min")
>>> df = s.to_frame(name="a")
>>> r2 = df.rank(method="min")
>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r1
0    1
1    2
dtype: uint64[pyarrow]
>>> r2
     a
0  1.0
1  2.0
>>> r2.dtypes
a    float64
dtype: object

Issue Description

When a DataFrame is backed by pyarrow-typed data, calling df.rank(method="min") returns a result that is no longer pyarrow-backed. Series.rank() does not have this problem: its result is still a pyarrow-backed Series.
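One way to make the symptom checkable for any frame is to compare each column's dtype before and after the operation. The sketch below assumes unique column labels; `lost_extension_dtypes` is a hypothetical helper written for illustration, not a pandas API. It relies only on the public `pandas.api.extensions.ExtensionDtype` base class, from which both `pd.ArrowDtype` and the nullable `"Int64"` dtype derive.

```python
import pandas as pd
from pandas.api.extensions import ExtensionDtype


def lost_extension_dtypes(before: pd.DataFrame, after: pd.DataFrame) -> list:
    """Hypothetical helper: labels of columns whose dtype was an
    ExtensionDtype in `before` but is a plain NumPy dtype in `after`.

    Assumes both frames share the same, unique column labels.
    """
    lost = []
    for col, old_dtype in before.dtypes.items():
        new_dtype = after.dtypes[col]
        if isinstance(old_dtype, ExtensionDtype) and not isinstance(new_dtype, ExtensionDtype):
            lost.append(col)
    return lost


# Synthetic frames standing in for "before rank" and "after rank".
before = pd.DataFrame({"a": pd.array([1, 2], dtype="Int64"), "b": [1.0, 2.0]})
after = pd.DataFrame({"a": [1.0, 2.0], "b": [1.0, 2.0]})
print(lost_extension_dtypes(before, after))  # ['a']
```

Applied to the reproducible example, `lost_extension_dtypes(df, df.rank(method="min"))` would be expected to return `['a']` on pandas 2.0.0, given the `r2.dtypes` output reported above.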

Incorrect:

>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r2 = df.rank(method="min")
>>> r2.dtypes
a    float64
dtype: object

Correct:

>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> r1 = s.rank(method="min")
>>> r1.dtype
uint64[pyarrow]

Expected Behavior

DataFrame.rank should return a pyarrow-backed DataFrame when the original DataFrame is backed by pyarrow dtypes, matching the behavior of Series.rank.
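Until DataFrame.rank handles this itself, a possible workaround is to rank column by column, since Series.rank already keeps the extension-array path (as the `r1` output shows). A minimal sketch, assuming unique column labels and ranking along the default axis (axis=0); `rank_preserving_dtypes` is a hypothetical name, not part of pandas. Each column simply gets whatever dtype `Series.rank` produces for it on the installed version.

```python
import pandas as pd


def rank_preserving_dtypes(df: pd.DataFrame, **rank_kwargs) -> pd.DataFrame:
    """Hypothetical workaround: rank each column through Series.rank.

    Series.rank works on the column's own array, so an extension dtype
    can survive, whereas DataFrame.rank (pandas 2.0.0) casts to ndarray.
    Assumes unique column labels and ranking along axis=0 only.
    """
    ranked = {col: df[col].rank(**rank_kwargs) for col in df.columns}
    # Building from a dict of Series keeps each Series' dtype as-is.
    return pd.DataFrame(ranked, columns=df.columns)


df = pd.DataFrame({"a": pd.array([3, 1, 2], dtype="Int64"), "b": [1.5, 0.5, 0.5]})
out = rank_preserving_dtypes(df, method="min")
print(out.dtypes)
```

With the pyarrow frame from the reproducible example, the same call would be expected to give column "a" the `uint64[pyarrow]` dtype reported for `r1`, because each column goes through Series.rank.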

Installed Versions

>>> pd.__version__
'2.0.0'

Activity

Added label Needs Triage (Issue that has not been reviewed by a pandas team member) on Apr 21, 2023
mroeschke (Member) commented on Apr 21, 2023

This appears to be a general issue with ExtensionArrays:

In [23]: pd.DataFrame([1], dtype="Int64").rank().dtypes
Out[23]: 
0    float64
dtype: object
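To check how widely this applies on a particular installation, one can compare the result dtypes of Series.rank and DataFrame.rank for a few dtypes side by side. A sketch; `rank_dtype_report` is a hypothetical helper, and the dtypes listed are only examples: the nullable "Int64" from the comment above, its float counterpart "Float64", and plain NumPy "int64" as a baseline. The exact output depends on the pandas version, so none is shown here.

```python
import pandas as pd


def rank_dtype_report(dtypes, method="average"):
    """Hypothetical helper: map each dtype name to a pair
    (dtype returned by Series.rank, dtype returned by DataFrame.rank)."""
    report = {}
    for dtype in dtypes:
        s = pd.Series([3, 1, 2], dtype=dtype)
        series_dtype = s.rank(method=method).dtype
        frame_dtype = s.to_frame(name="x").rank(method=method)["x"].dtype
        report[str(dtype)] = (series_dtype, frame_dtype)
    return report


for name, (s_dt, f_dt) in rank_dtype_report(["Int64", "Float64", "int64"]).items():
    print(f"{name:>8}: Series.rank -> {s_dt}, DataFrame.rank -> {f_dt}")
```

Rows where the two columns differ point to the 2-D path discarding the extension type.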
Title changed from "BUG: DataFrame.rank does not return pyarrow backed dataframe when original dataframe filled with pyarrow." to "BUG: DataFrame.rank does not return EA types when original type was an EADtype" on Apr 21, 2023
Added labels ExtensionArray (Extending pandas with custom dtypes or arrays.) and Dtype Conversions (Unexpected or buggy dtype conversions), and removed label Needs Triage (Issue that has not been reviewed by a pandas team member), on Apr 21, 2023
mroeschke (Member) commented on Apr 21, 2023

Looks like this condition needs to account for EAs when ndim == 2

def ranker(data):
    if data.ndim == 2:
        # i.e. DataFrame, we cast to ndarray
        values = data.values
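The excerpt shows why the 2-D case loses the extension type: `DataFrame.values` returns a plain NumPy `ndarray`, so by the time ranking happens the column's ExtensionArray is already gone. A Series, by contrast, still exposes its ExtensionArray, publicly via `Series.array`. A small check of that difference, using the nullable "Int64" dtype from the earlier comment; it illustrates the input to each path and does not reproduce the internal `ranker` itself.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

df = pd.DataFrame({"a": pd.array([1, 2], dtype="Int64")})

# 2-D path: DataFrame.values densifies to a NumPy array, so the
# Int64 ExtensionArray (and its dtype) is discarded before ranking.
frame_values = df.values

# 1-D path: the Series still holds the original ExtensionArray.
series_values = df["a"].array

print(type(frame_values).__name__)               # ndarray
print(isinstance(series_values, ExtensionArray))  # True
```

This is consistent with the suggestion that the `ndim == 2` branch needs to account for EAs rather than always going through `data.values`.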
oscar-garzon commented on Apr 24, 2023

take

Julian048 (Contributor) commented on Aug 7, 2023

@oscar-garzon Are you still working on this?


Metadata

Assignees: No one assigned
Labels: Bug; Dtype Conversions (Unexpected or buggy dtype conversions); ExtensionArray (Extending pandas with custom dtypes or arrays.); Reduction Operations (sum, mean, min, max, etc.); pyarrow dtype retention (op with pyarrow dtype -> expect pyarrow result)
Type: No type
Projects: No projects
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests
Participants: @jbrockmendel, @mroeschke, @oscar-garzon, @tinadu0806, @Julian048
