BUG: isna on pyarrow backed Series is returning Series with bool dtype instead of bool[pyarrow] #59431

@thesword53

Pandas version checks

  • I have checked that this issue has not already been reported.
  • I have confirmed this bug exists on the latest version of pandas.
  • I have confirmed this bug exists on the main branch of pandas.

Reproducible Example

>>> import pandas as pd
>>> s = pd.Series([0, None, 4, 5], dtype="u1[pyarrow]")
>>> s
0       0
1    <NA>
2       4
3       5
dtype: uint8[pyarrow]

>>> s.isna()
0    False
1     True
2    False
3    False
dtype: bool

Issue Description

s.isna().dtype is the NumPy bool dtype (bool) instead of ArrowDtype(pa.bool_()) (bool[pyarrow]).

Expected Behavior

>>> s.isna()
0    False
1     True
2    False
3    False
dtype: bool[pyarrow]

Installed Versions

INSTALLED VERSIONS

commit : d9cdd2e
python : 3.11.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.cp1252

pandas : 2.2.2
numpy : 2.0.1
pytz : 2024.1
dateutil : 2.9.0
setuptools : 70.2.0
pip : None
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.2.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.22.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.9.0
numba : None
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
pyarrow : 17.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None

Activity

added Needs Triage (Issue that has not been reviewed by a pandas team member) on Aug 6, 2024

loicdiridollou commented on Aug 7, 2024

@loicdiridollou
Member

Hey @thesword53,
I realized this issue also affects something I am working on, so I went down the rabbit hole. It seems the Series gets cast to an np.ndarray, the isna operation is applied to it, and when the Series object is rebuilt the original (pyarrow) type is lost. The rebuild makes no assumption about the type: since we pass an np.ndarray of bool, the dtype of the Series is simply set to bool and not bool[pyarrow].

# box
result = obj._constructor(result, index=obj.index, name=obj.name, copy=False)
return result

This also affects DataFrames: calling isna on a column whose dtype was originally uint8[pyarrow] gives bool and not bool[pyarrow].

KevsterAmp commented on Aug 7, 2024

@KevsterAmp
Contributor

I'd like to work on this

KevsterAmp commented on Aug 7, 2024

@KevsterAmp
Contributor

take

KevsterAmp commented on Aug 7, 2024

@KevsterAmp
Contributor

Can't seem to assign the issue to myself, but I'll be opening a PR for this in a bit. Thanks @loicdiridollou for further investigating

added Needs Discussion (Requires discussion from core team before further action), Missing-data (np.nan, pd.NaT, pd.NA, dropna, isnull, interpolate), pyarrow dtype retention (op with pyarrow dtype -> expect pyarrow result) and removed Needs Triage (Issue that has not been reviewed by a pandas team member) on Aug 7, 2024

rhshadrach commented on Aug 7, 2024

@rhshadrach
Member

WillAyd commented on Aug 8, 2024

@WillAyd
Member

This is another good issue to track for PDEP-13 #58455

removed their assignment on Sep 25, 2024

Metadata

Assignees: No one assigned
Labels: Bug, Missing-data, Needs Discussion, pyarrow dtype retention
Participants: @WillAyd, @rhshadrach, @loicdiridollou, @thesword53, @KevsterAmp