
BUG: DataFrame.rank does not return EA types when original type was an EADtype #52829

@tinadu0806


Pandas version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

import pandas as pd
import pyarrow as pa
s = pd.Series([1, 2], dtype=pd.ArrowDtype(pa.int32()))
r1 = s.rank(method="min")
df = s.to_frame(name="a")
r2 = df.rank(method="min")
>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r1
0    1
1    2
dtype: uint64[pyarrow]
>>> r2
     a
0  1.0
1  2.0
>>> r2.dtypes
a    float64
dtype: object

Issue Description

When a DataFrame is backed by pyarrow dtypes, df.rank(method="min") returns a result that is no longer Arrow-backed. Series.rank() does not have this problem: its result is still an Arrow-backed Series.

Incorrect:

>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r2 = df.rank(method="min")
>>> r2.dtypes
a    float64
dtype: object

Correct:

>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> r1 = s.rank(method="min")
>>> r1.dtype
uint64[pyarrow]

Expected Behavior

DataFrame.rank should return a pyarrow-backed DataFrame when the original DataFrame is pyarrow-backed.
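
One way to state this expectation as a check is to compare, column by column, the dtype produced by DataFrame.rank with the dtype produced by Series.rank. The sketch below is only an illustration: rank_dtype_mismatches is a hypothetical helper name, not part of pandas, and it assumes unique column labels and the default axis=0.

```python
import pandas as pd


def rank_dtype_mismatches(df: pd.DataFrame, **rank_kwargs) -> dict:
    """Hypothetical helper: map each column to (frame dtype, series dtype)
    when DataFrame.rank and Series.rank disagree on the result dtype.

    Assumes unique column labels and ranking along axis=0.
    """
    frame_ranks = df.rank(**rank_kwargs)
    mismatches = {}
    for col in df.columns:
        series_dtype = df[col].rank(**rank_kwargs).dtype
        frame_dtype = frame_ranks[col].dtype
        if frame_dtype != series_dtype:
            mismatches[col] = (frame_dtype, series_dtype)
    return mismatches


# Plain NumPy-backed data: both paths give float64, so nothing is reported.
plain = pd.DataFrame({"a": [3.0, 1.0, 2.0]})
print(rank_dtype_mismatches(plain, method="min"))
```

Given the output in the reproducible example, calling this on the issue's df with method="min" under pandas 2.0.0 would be expected to report column "a" (float64 from DataFrame.rank versus uint64[pyarrow] from Series.rank).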

Installed Versions

pd.__version__
'2.0.0'

Activity

added the Needs Triage label (Issue that has not been reviewed by a pandas team member) on Apr 21, 2023

mroeschke (Member) commented on Apr 21, 2023

This appears to be a general issue with ExtensionArrays:

In [23]: pd.DataFrame([1], dtype="Int64").rank().dtypes
Out[23]: 
0    float64
dtype: object

changed the title from "BUG: DataFrame.rank does not return pyarrow backed dataframe when original dataframe filled with pyarrow." to "BUG: DataFrame.rank does not return EA types when original type was an EADtype" on Apr 21, 2023

added the ExtensionArray (Extending pandas with custom dtypes or arrays.) and Dtype Conversions (Unexpected or buggy dtype conversions) labels and removed the Needs Triage label (Issue that has not been reviewed by a pandas team member) on Apr 21, 2023

mroeschke (Member) commented on Apr 21, 2023

Looks like this condition needs to account for EAs when ndim == 2

        def ranker(data):
            if data.ndim == 2:
                # i.e. DataFrame, we cast to ndarray
                values = data.values
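
To illustrate why the two-dimensional branch loses the extension dtype: DataFrame.values always hands back a NumPy ndarray, while the array backing an individual column is still an ExtensionArray. The snippet below is a small sketch using the Int64 dtype from the earlier comment; Series.array is used here only as a public way to inspect a column's backing array, and is not taken from the quoted pandas source.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray

df = pd.DataFrame({"a": pd.array([1, 2], dtype="Int64")})

# 2D access: the whole frame is converted to a NumPy ndarray,
# so the extension dtype is no longer available to the ranking code.
frame_values = df.values

# 1D access: the column's backing array is still an ExtensionArray.
column_values = df["a"].array

is_ndarray = isinstance(frame_values, np.ndarray)
is_extension_array = isinstance(column_values, ExtensionArray)
print(type(frame_values).__name__, type(column_values).__name__)
```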

oscar-garzon commented on Apr 24, 2023

take

Julian048 (Contributor) commented on Aug 7, 2023

@oscar-garzon Are you still working on this?

jbrockmendel (Member) commented on Aug 15, 2025

This looks pretty easy: NDFrame.rank should go through self._mgr.apply. That'll also avoid a copy in data.values.
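
Until such a change lands, the same idea (rank each column's own array rather than one combined ndarray) can be approximated from user code by ranking column by column. This is only a sketch of a possible workaround, not the internal self._mgr.apply change described above: rank_columnwise is a hypothetical helper name, and it assumes unique column labels and ranking along axis=0.

```python
import pandas as pd


def rank_columnwise(df: pd.DataFrame, **rank_kwargs) -> pd.DataFrame:
    """Hypothetical helper: rank every column with Series.rank and
    reassemble the results, so each column keeps whatever dtype
    Series.rank produces for it.

    Assumes unique column labels and ranking along axis=0.
    """
    ranked = {col: df[col].rank(**rank_kwargs) for col in df.columns}
    return pd.DataFrame(ranked, index=df.index)


df = pd.DataFrame({"a": pd.array([2, 1], dtype="Int64"), "b": [0.5, 0.1]})
out = rank_columnwise(df, method="min")
print(out.dtypes)
```

The result dtypes depend on what Series.rank returns for each dtype in the installed pandas version, so none are asserted here.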

sharkipelago (Contributor) commented on Aug 23, 2025

take

Dibyo10 commented on Aug 23, 2025

take

sharkipelago (Contributor) commented on Aug 23, 2025

Hi @Dibyo10, sorry, I'm already working on this issue!
