Description
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the master branch of pandas.
Reproducible Example
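import pandas

# Case 1: DataFrame groupby/apply. `col` is the sub-DataFrame of each group.
# list.append returns None, so the lambda stores the index and returns its id.
df = pandas.DataFrame(index=list(range(10)), columns=list("ABC"), data=1)
l = []
print(df.groupby(df.index < 4).apply(lambda col: l.append(col.index) or id(col.index)))
print(l)

# Case 2: the same test on a Series.
sr = pandas.Series(index=list(range(10)), data=1)
l = []
print(sr.groupby(sr.index < 4).apply(lambda col: l.append(col.index) or id(col.index)))
print(l[0])
print(l[1])

# Case 3: DataFrame again, storing a new Series derived from each group.
df = pandas.DataFrame(index=list(range(10)), columns=list("ABC"), data=1)
l = []
print(df.groupby(df.index < 4).apply(lambda col: l.append(col.eval("index*2")) or id(col.index)))
print(l[0])
print(l[1])
# Compare the length of the first stored Series with its shape.
print(len(l[0]),l[0].shape)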
Issue Description
The result for the DataFrame is puzzling: the index of the object sent to the lambda (named col in the example) appears to be reused and mutated across the successive calls made by groupby/apply, which does not happen with the Series. Both calls report the same id, and both stored indexes end up as [0, 1, 2, 3]. I think the values object is also shared.
Even worse, when the object is used to generate a new Series, the shared index leaves us with a Series (l[0] in the example) whose shape is not consistent with its length: len(l[0]) is 4 while l[0].shape is (6,).
False 2116454933648
True 2116454933648
dtype: int64
[Int64Index([0, 1, 2, 3], dtype='int64'), Int64Index([0, 1, 2, 3], dtype='int64')]
False 2116454934224
True 2116454934464
dtype: int64
Int64Index([4, 5, 6, 7, 8, 9], dtype='int64')
Int64Index([0, 1, 2, 3], dtype='int64')
False 2116454933456
True 2116454933456
dtype: int64
0 8
1 10
2 12
3 14
16
18
dtype: int64
0 0
1 2
2 4
3 6
dtype: int64
4 (6,)
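The mismatch in the last output line (length 4, shape (6,)) can be detected with a small check. This is only a sketch: the helper name index_matches_values is made up for illustration and is not part of pandas.

```python
import pandas as pd


def index_matches_values(series: pd.Series) -> bool:
    """Return True when the index has as many labels as there are values.

    Hypothetical helper for illustration; not a pandas API.
    """
    return len(series.index) == len(series.to_numpy())


# A Series built the normal way is expected to be consistent.
healthy = pd.Series([0, 2, 4, 6])
print(index_matches_values(healthy))
```

Applied to l[0] from the example above, a check like this should return False on an affected pandas version, since the index holds 4 labels for 6 values.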
Expected Behavior
I would expect the same behavior as for Series, i.e. a different index/values object on each call of the lambda function for the DataFrame.
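Until that is the case, a possible workaround is to deep-copy whatever is kept from a group inside the callback, so that the stored object owns its own data. The sketch below assumes that a deep copy is enough to decouple the stored index from the reused one; the names record_index and collected are illustrative only.

```python
import pandas as pd

df = pd.DataFrame(index=list(range(10)), columns=list("ABC"), data=1)

collected = []


def record_index(group):
    # Index.copy(deep=True) copies the underlying data, so the stored
    # Index should not change if pandas reuses the group's index later.
    collected.append(group.index.copy(deep=True))
    return len(group)


result = df.groupby(df.index < 4).apply(record_index)

# Each snapshot keeps the labels of the group it was taken from.
snapshots = [list(idx) for idx in collected]
print(snapshots)
print(result)
```

A shallow copy is deliberately avoided here, because it may still share the underlying array with the index that gets reused.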
Installed Versions
INSTALLED VERSIONS
commit : 5f648bf
python : 3.9.5.final.0
python-bits : 64
OS : Windows
OS-release : 10
Version : 10.0.18363
machine : AMD64
processor : Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
byteorder : little
LC_ALL : None
LANG : None
LOCALE : French_Belgium.1252
pandas : 1.3.2
numpy : 1.20.3
pytz : 2021.1
dateutil : 2.8.2
pip : 21.2.4
setuptools : 58.0.0
Cython : None
pytest : 6.2.5
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 3.0.1
IPython : None
pandas_datareader: None
bs4 : 4.10.0
bottleneck : None
fsspec : 2021.08.1
fastparquet : 0.7.1
gcsfs : None
matplotlib : None
numexpr : None
odfpy : None
openpyxl : 3.0.7
pandas_gbq : None
pyarrow : None
pyxlsb : None
s3fs : None
scipy : 1.6.3
sqlalchemy : None
tables : None
tabulate : None
xarray : 0.17.0
xlrd : None
xlwt : None
numba : None