Description
Pandas version checks
I have checked that this issue has not already been reported.
I have confirmed this bug exists on the latest version of pandas.
I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
>>> import pandas as pd
>>> s = pd.Series([0, None, 4, 5], dtype="u1[pyarrow]")
>>> s
0       0
1    <NA>
2       4
3       5
dtype: uint8[pyarrow]
>>> s.isna()
0    False
1     True
2    False
3    False
dtype: bool
Issue Description
s.isna().dtype is BoolDType (bool) instead of ArrowDtype(pa.bool_()) (bool[pyarrow]).
Expected Behavior
>>> s.isna()
0    False
1     True
2    False
3    False
dtype: bool[pyarrow]
Installed Versions
INSTALLED VERSIONS
commit : d9cdd2e
python : 3.11.9.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.19045
machine : AMD64
processor : AMD64 Family 23 Model 113 Stepping 0, AuthenticAMD
byteorder : little
LC_ALL : None
LANG : fr_FR.UTF-8
LOCALE : fr_FR.cp1252
pandas : 2.2.2
numpy : 2.0.1
pytz : 2024.1
dateutil : 2.9.0
setuptools : 70.2.0
pip : None
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : 5.2.2
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 3.1.4
IPython : 8.22.2
pandas_datareader : None
adbc-driver-postgresql: None
adbc-driver-sqlite : None
bs4 : None
bottleneck : None
dataframe-api-compat : None
fastparquet : None
fsspec : None
gcsfs : None
matplotlib : 3.9.0
numba : None
numexpr : 2.10.1
odfpy : None
openpyxl : 3.1.5
pandas_gbq : None
pyarrow : 17.0.0
pyreadstat : None
python-calamine : None
pyxlsb : None
s3fs : None
scipy : 1.14.0
sqlalchemy : None
tables : None
tabulate : None
xarray : None
xlrd : None
zstandard : None
tzdata : 2024.1
qtpy : None
pyqt5 : None
Activity
loicdiridollou commented on Aug 7, 2024
Hey @thesword53,
I realized that this issue also affects something I am working on, so I went down the rabbit hole. It seems that the Series gets cast to an np.ndarray, then the isna operation gets applied, and when the Series object is rebuilt we lose the original type (pyarrow). It seems to rebuild without any assumption about the type: since we pass an np.ndarray of bool, it just sets the type of the Series to bool and not bool[pyarrow].
pandas/pandas/core/dtypes/missing.py
Lines 208 to 210 in aa134bb
This also happens if you create a DataFrame where the type of the column was originally uint8[pyarrow]: it gets cast into bool and not bool[pyarrow].
KevsterAmp commented on Aug 7, 2024
I'd like to work on this
KevsterAmp commented on Aug 7, 2024
take
KevsterAmp commented on Aug 7, 2024
Can't seem to assign the issue to myself, but I'll be opening a PR for this in a bit. Thanks @loicdiridollou for further investigating
bool[pyarrow] when calling pyarrow backed Series.isna() #59436
KevsterAmp commented on Aug 7, 2024
Take
rhshadrach commented on Aug 7, 2024
Ref: #59436 (review)
WillAyd commented on Aug 8, 2024
This is another good issue to track for PDEP-13 #58455