Description
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd
import pyarrow as pa
s = pd.Series([1, 2], dtype=pd.ArrowDtype(pa.int32()))
r1 = s.rank(method="min")
df = s.to_frame(name="a")
r2 = df.rank(method="min")
>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r1
0    1
1    2
dtype: uint64[pyarrow]
>>> r2
     a
0  1.0
1  2.0
>>> r2.dtypes
a    float64
dtype: object
Issue Description
When a DataFrame is backed by pyarrow dtypes, calling df.rank(method="min") returns a result that is no longer Arrow-backed: the column comes back as float64. Series.rank() does not behave this way; its result is still an Arrow-backed Series (uint64[pyarrow]).
Incorrect:
>>> df.dtypes
a    int32[pyarrow]
dtype: object
>>> r2 = df.rank(method="min")
>>> r2.dtypes
a    float64
dtype: object
Correct:
>>> s
0    1
1    2
dtype: int32[pyarrow]
>>> r1 = s.rank(method="min")
>>> r1.dtype
uint64[pyarrow]
Expected Behavior
DataFrame.rank should return a pyarrow-backed DataFrame when the original DataFrame holds pyarrow-backed data.
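Since Series.rank keeps the extension dtype in the example above, one possible interim workaround is to rank each column as a Series and reassemble the frame. This is only a minimal sketch, not a fix for DataFrame.rank itself: `rank_columnwise` is a hypothetical helper name (not part of pandas), and it assumes the column labels are unique.

```python
import pandas as pd


def rank_columnwise(df: pd.DataFrame, **rank_kwargs) -> pd.DataFrame:
    """Rank every column through Series.rank and rebuild the DataFrame.

    Hypothetical helper, not a pandas API. Each output column carries
    whatever dtype Series.rank returns for that column, so extension
    dtypes are not funneled through the 2-D DataFrame.rank path.
    Assumes unique column labels, because the columns are collected
    in a dict keyed by label.
    """
    ranked = {
        label: column.rank(**rank_kwargs)  # per-column ranking via Series.rank
        for label, column in df.items()
    }
    return pd.DataFrame(ranked, index=df.index)
```

With the `df` from the reproducible example, `rank_columnwise(df, method="min").dtypes` should report the same dtype for column `a` as `s.rank(method="min").dtype`.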
Installed Versions
pd.__version__
'2.0.0'
Activity
mroeschke commented on Apr 21, 2023
This appears to be a general issue with ExtensionArrays.
Title changed from "BUG: DataFrame.rank does not return pyarrow backed dataframe when original dataframe filled with pyarrow." to "BUG: DataFrame.rank does not return EA types when original type was an EADtype"
mroeschke commented on Apr 21, 2023
Looks like this condition needs to account for EAs when ndim == 2
oscar-garzon commented on Apr 24, 2023
take
Julian048 commented on Aug 7, 2023
@oscar-garzon Are you still working on this?