ENH: Consistent name property for the iterates in DataFrameGroupBy #62141

@FelixBenning

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

If a DataFrame is grouped by a single column passed as a one-element list, e.g. df.groupby(['a']), the group name is sometimes a scalar and sometimes a one-element tuple, depending on how it is accessed. This should be consistent.

The examples below use the following DataFrame:

>>> df
   a  b  c
0  1  2  3
1  1  5  6
2  7  8  9
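
For reproducibility, the frame shown above can be built as follows. This is a minimal sketch; the constructor call is inferred from the printed output and is not part of the original report.

```python
import pandas as pd

# Rebuild the example frame from its printed representation:
# columns a, b, c with the default integer index 0..2.
df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})
print(df)
```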

Cases where name is a scalar

  • DataFrameGroupBy.groups

    >>> print(df.groupby(['a']).groups)
    {1: [0, 1], 7: [2]}
  • DataFrameGroupBy.apply

    >>> df.groupby(['a']).apply(lambda x: print(x.name), include_groups=False)
    1
    7
    Empty DataFrame
    Columns: []
    Index: []

Cases where name is a one element tuple

  • DataFrameGroupBy.__iter__
    >>> for name, _ in df.groupby(['a']): print(name)
    ... 
    (1,)
    (7,)
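
The three access paths above can be compared side by side with a short script. This is a sketch, not part of the original report: the helper record_name and the list names are made up for illustration, and the grouping column is excluded by selecting b and c explicitly rather than passing include_groups. The printed types depend on the installed pandas version, so no output is shown here.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})
grouped = df.groupby(["a"])

# Keys as exposed by the .groups mapping.
keys_from_groups = list(grouped.groups)

# Keys as yielded when iterating over the GroupBy object.
keys_from_iter = [name for name, _ in grouped]

# Keys as seen inside apply, read from the `name` attribute of each group.
keys_from_apply = []


def record_name(group):
    # Hypothetical helper: remember the group's name, return a dummy scalar.
    keys_from_apply.append(group.name)
    return len(group)


# Selecting the non-grouping columns keeps the grouping column out of each group.
grouped[["b", "c"]].apply(record_name)

for label, keys in [
    ("groups", keys_from_groups),
    ("iter", keys_from_iter),
    ("apply", keys_from_apply),
]:
    print(label, [type(k).__name__ for k in keys])
```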

Documentation

It should be said that DataFrameGroupBy.name is poorly documented, but it is not a private property. It seems like the most natural thing to query when you need the values of the columns you grouped by.

This matters all the more because pandas pushes users toward include_groups=False in apply through a FutureWarning/DeprecationWarning, so DataFrameGroupBy.name seems like the most natural way to recover this information now.

Feature Description

Consistency in either direction:

  • PRO SCALAR: The name appears to be a scalar in the majority of cases, although to be fair I have not checked many. A scalar is probably also more intuitive to people who do not think about multi-column groupings.

  • PRO TUPLE: A one-element tuple is consistent with the case where multiple columns are selected.
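
Until the behaviour is unified, calling code can normalise the name itself. The sketch below follows the tuple convention; as_key_tuple is a hypothetical helper, not a pandas API, and it assumes that the grouping values themselves are never tuples.

```python
import pandas as pd


def as_key_tuple(name):
    """Return the group key as a tuple, wrapping a scalar in a one-element tuple."""
    # Assumption: a tuple always means "already a multi-key name".
    return name if isinstance(name, tuple) else (name,)


df = pd.DataFrame({"a": [1, 1, 7], "b": [2, 5, 8], "c": [3, 6, 9]})

# The same normalisation works whether the name arrives as a scalar or a tuple.
keys = [as_key_tuple(name) for name, _ in df.groupby(["a"])]
print(keys)
```

Normalising in the other direction, by unwrapping one-element tuples into scalars, would match the scalar convention instead. Which one is preferable is exactly the open question of this issue.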

Additional Context

No response
