Description
Feature Type
- Adding new functionality to pandas
- Changing existing functionality in pandas
- Removing existing functionality in pandas
Problem Description
If a DataFrame is grouped by a single column given as a one-element list, as in groupby(['a']), the group name is sometimes a numeric scalar and sometimes a single-element tuple, depending on how it is accessed. This should be more consistent.
The examples below use the following DataFrame:
>>> df
   a  b  c
0  1  2  3
1  1  5  6
2  7  8  9
Cases where name is a scalar
- DataFrameGroupBy.groups
  >>> print(df.groupby(['a']).groups)
  {1: [0, 1], 7: [2]}
- DataFrameGroupBy.apply
  >>> df.groupby(['a']).apply(lambda x: print(x.name), include_groups=False)
  1
  7
  Empty DataFrame
  Columns: []
  Index: []
Cases where name is a one-element tuple
- DataFrameGroupBy.__iter__
  >>> for name, _ in df.groupby(['a']): print(name)
  ...
  (1,)
  (7,)
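For anyone who wants to check the cases above on their own installation, here is a minimal sketch that rebuilds the example DataFrame and collects the keys from both access paths. The variable names are illustrative only, and the key types printed will likely depend on the pandas version in use, so no expected output is shown.

```python
import pandas as pd

# The example DataFrame from the problem description.
df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})
grouped = df.groupby(["a"])

# Keys as exposed by .groups and by iteration.
# Whether each key is a scalar or a one-element tuple may differ by pandas version.
keys_from_groups = list(grouped.groups.keys())
keys_from_iter = [name for name, _ in grouped]

print("pandas version:", pd.__version__)
print("groups:  ", keys_from_groups)
print("__iter__:", keys_from_iter)
```

Comparing the two printed lists side by side makes the inconsistency, if present in the installed version, easy to see.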
Documentation
It should perhaps be said that DataFrameGroupBy.name is poorly documented. It is not a private property, however, and it seems like the most natural thing to query if you need the information from the columns you have grouped by.
This is especially important because pandas pushes users toward include_groups=False in apply with a FutureWarning/DeprecationWarning, so DataFrameGroupBy.name seems like the most natural way to recover this information now.
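As a rough illustration of recovering the group key through name inside apply, the sketch below selects the non-grouping columns explicitly (so it does not rely on the include_groups argument) and reads name on each group. The helper unwrap and the function record are hypothetical names introduced here, not pandas API; unwrap is only a defensive workaround for the scalar-versus-tuple ambiguity this issue describes.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})


def unwrap(name):
    """Hypothetical helper: return the sole element of a one-element tuple,
    otherwise return the name unchanged."""
    if isinstance(name, tuple) and len(name) == 1:
        return name[0]
    return name


seen = []


def record(group):
    # The group passed to apply carries the group key in its `name` attribute.
    seen.append(unwrap(group.name))
    return len(group)


# Select the non-grouping columns so the grouping column is not part of each group.
sizes = df.groupby(["a"])[["b", "c"]].apply(record)
print(seen)
print(sizes)
```

Normalizing through a helper like this keeps user code working whichever form name takes, but it is exactly the kind of boilerplate a consistent behavior would make unnecessary.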
Feature Description
Consistency in either direction:
-
- PRO SCALAR: It appears that in the majority of cases the name is a scalar, although to be fair I have not checked many cases. This is probably also more intuitive to people who do not think about multiple-column groupings.
- PRO TUPLE: The single-element tuple makes this more consistent with the case where multiple columns are selected.
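To make the PRO TUPLE argument concrete, this small sketch groups the same example DataFrame by two columns and collects the names yielded by iteration. In that case each name is a tuple with one entry per grouping column, which is the behavior a one-element tuple would line up with.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

# Grouping by two columns: iteration yields a tuple with one entry per grouping column.
multi_names = [name for name, _ in df.groupby(["a", "b"])]
print(multi_names)
```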
Additional Context
No response