Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
#!/usr/bin/env python3
import pandas as pd

# tested with 1.2.5 and 1.3.2
print(pd.__version__)

df = pd.DataFrame(data={'VAR': ['A', 'B'], 'X': range(2)})
df.VAR = df.VAR.astype('category')
df.X = df.X.astype('int8')

# Name: VAR, dtype: category
# Categories (2, object): ['A', 'B']
print(df.VAR)

# VAR    category
# X          int8
# dtype: object
print(df.dtypes)


def foo(row):
    # doing nothing
    return row


for arg in [None, 'reduce', 'expand', 'broadcast']:
    print(f'\n{arg}')
    # result_type is fixed to 'reduce' here; arg is only printed as a label
    print(df.apply(foo, axis=1, result_type='reduce').dtypes)
Issue Description
When I "iterate" at row level via DataFrame.apply(foo, axis=1), the dtypes of the columns change. In the MWE, a category column becomes object and an int8 column becomes int64.
The argument result_type= seems to have no effect. The MWE is meant to check each possible option for it.
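One possible explanation, offered as an assumption rather than something confirmed in this report: with axis=1 every row is handed to the function as a Series, and a Series carries a single dtype. A minimal sketch on the same frame as the MWE shows what a single row looks like before foo ever sees it:

```python
import pandas as pd

# Same frame as in the MWE.
df = pd.DataFrame({'VAR': ['A', 'B'], 'X': range(2)})
df['VAR'] = df['VAR'].astype('category')
df['X'] = df['X'].astype('int8')

# A row holding one category value and one int8 value is materialised
# as a single Series, so both values have to share one dtype.
row = df.iloc[0]
row_dtype = row.dtype
print(row_dtype)
```

If the row dtype is already object at this point, the per-column dtypes would be lost before the function returns, whatever result_type is set to.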
This is the output of the MWE from the Reproducible Example section:
1.3.2
0    A
1    B
Name: VAR, dtype: category
Categories (2, object): ['A', 'B']
VAR    category
X          int8
dtype: object

None
VAR    object
X       int64
dtype: object

reduce
VAR    object
X       int64
dtype: object

expand
VAR    object
X       int64
dtype: object

broadcast
VAR    object
X       int64
dtype: object
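Note that the loop in the Reproducible Example passes result_type='reduce' on every iteration and uses arg only as a printed label. Below is a sketch of a variant that forwards arg, so that each option is actually exercised. The dictionary name dtypes_by_option is introduced here for illustration; the output of this variant is not part of the report and may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame({'VAR': ['A', 'B'], 'X': range(2)})
df['VAR'] = df['VAR'].astype('category')
df['X'] = df['X'].astype('int8')


def foo(row):
    # doing nothing
    return row


# Collect the resulting dtypes per option in addition to printing them.
dtypes_by_option = {}
for arg in [None, 'reduce', 'expand', 'broadcast']:
    result = df.apply(foo, axis=1, result_type=arg)
    dtypes_by_option[arg] = result.dtypes
    print(f'\n{arg}')
    print(result.dtypes)
```

Keeping the dtypes in a dictionary makes it easy to compare the options against df.dtypes programmatically.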
Expected Behavior
Just keep the dtypes. Do not convert them without the user's permission, or at least emit a warning if there is a good reason for the conversion.
Possibly related: #42001
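As a stopgap, one workaround (a sketch, not an official recommendation) is to cast the result back to the original dtypes with DataFrame.astype. This assumes the applied function returns rows with the same columns as the input and values that still fit the original dtypes.

```python
import pandas as pd

df = pd.DataFrame({'VAR': ['A', 'B'], 'X': range(2)})
df['VAR'] = df['VAR'].astype('category')
df['X'] = df['X'].astype('int8')


def foo(row):
    # doing nothing
    return row


result = df.apply(foo, axis=1)

# Cast every column back to the dtype it had before apply.
restored = result.astype(df.dtypes.to_dict())
print(restored.dtypes)
```

This only restores the dtypes after the fact; inside foo the row values are still seen with whatever dtype the row Series has.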
Installed Versions
INSTALLED VERSIONS
commit : 5f648bf
python : 3.9.4.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18362
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 12, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : de_DE.cp1252
pandas : 1.3.2
numpy : 1.21.1
pytz : 2021.1
dateutil : 2.8.1
pip : 21.2.4
setuptools : 57.0.0
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : 1.1
pymysql : None
psycopg2 : None
jinja2 : 2.11.3
IPython : 7.23.1
pandas_datareader: None
bs4 : 4.9.3
bottleneck : None
fsspec : None
fastparquet : None
gcsfs : None
matplotlib : 3.4.2
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : 4.0.1
pyxlsb : None
s3fs : None
scipy : 1.7.0
sqlalchemy : 1.4.13
tables : None
tabulate : 0.8.9
xarray : None
xlrd : None
xlwt : None
numba : None