samukweku
Collaborator

@samukweku samukweku commented Mar 30, 2025

PR Description

Please describe the changes proposed in the pull request:

  • Fix error when MultiIndex is used with summarise
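import numpy as np
import pandas as pd
import janitor  # noqa: F401  (importing registers DataFrame.summarise)


def dfmi():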
    """Create a MultiIndex DataFrame"""

    # https://pandas.pydata.org/docs/user_guide/advanced.html#using-slicers
    def mklbl(prefix, n):
        return ["%s%s" % (prefix, i) for i in range(n)]

    miindex = pd.MultiIndex.from_product(
        [mklbl("A", 4), mklbl("B", 2), mklbl("C", 4), mklbl("D", 2)]
    )

    micolumns = pd.MultiIndex.from_tuples(
        [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
        names=["lvl0", "lvl1"],
    )
    dfmi = (
        pd.DataFrame(
            np.arange(len(miindex) * len(micolumns)).reshape(
                (len(miindex), len(micolumns))
            ),
            index=miindex,
            columns=micolumns,
        )
        .sort_index()
        .sort_index(axis=1)
    )
    dfmi.index.names = list("ABCD")
    return dfmi


df = dfmi()
df.summarise(("a", ["min", "max"]), by=("a", "bar"))

lvl0        a               
lvl1      bar       foo     
          min  max  min  max
(a, bar)                    
1           1    1    0    0
5           5    5    4    4
9           9    9    8    8
13         13   13   12   12
17         17   17   16   16
...       ...  ...  ...  ...
237       237  237  236  236
241       241  241  240  240
245       245  245  244  244
249       249  249  248  248
253       253  253  252  252

[64 rows x 4 columns]
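
For comparison, the same numbers can be reproduced with plain pandas. This is only a sketch of the expected result, not how `summarise` is implemented; the names `frame`, `key` and `expected` are illustrative, and the group key is renamed to `a_bar` to keep the index name simple. Unlike the output above, the top-level `a` label is dropped from the columns.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the PR description in compact form.
index = pd.MultiIndex.from_product(
    [
        [f"A{i}" for i in range(4)],
        [f"B{i}" for i in range(2)],
        [f"C{i}" for i in range(4)],
        [f"D{i}" for i in range(2)],
    ],
    names=list("ABCD"),
)
columns = pd.MultiIndex.from_tuples(
    [("a", "foo"), ("a", "bar"), ("b", "foo"), ("b", "bah")],
    names=["lvl0", "lvl1"],
)
frame = pd.DataFrame(
    np.arange(len(index) * len(columns)).reshape(len(index), len(columns)),
    index=index,
    columns=columns,
).sort_index(axis=1)

# Group by the values of column ("a", "bar") and aggregate every
# column under the top-level label "a" with min and max.
key = frame[("a", "bar")].rename("a_bar")
expected = frame["a"].groupby(key).agg(["min", "max"])
```

Every value in `("a", "bar")` is unique, so each group holds a single row and `min` equals `max`, which matches the 64-row output shown above.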

This PR resolves #1459.

PR Checklist

Please ensure that you have done the following:

  1. PR in from a fork off your branch. Do not PR from <your_username>:dev, but rather from <your_username>:<feature-branch_name>.
  2. If you're not on the contributors list, add yourself to AUTHORS.md.
  3. Add a line to CHANGELOG.md under the latest version header (i.e. the one that is "on deck") describing the contribution.
    • Do use some discretion here; if there are multiple PRs that are related, keep them in a single line.

Automatic checks

There will be automatic checks run on the PR. These include:

  • Building a preview of the docs on Netlify
  • Automatically linting the code
  • Making sure the code is documented
  • Making sure that all tests are passed
  • Making sure that code coverage doesn't go down.

Relevant Reviewers

Please tag maintainers to review.

@samukweku samukweku self-assigned this Mar 30, 2025
@ericmjl
Member

ericmjl commented Mar 30, 2025


codecov bot commented Mar 30, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 92.43%. Comparing base (e1b64c1) to head (0c50f47).
Report is 11 commits behind head on dev.

Additional details and impacted files
@@            Coverage Diff             @@
##              dev    #1460      +/-   ##
==========================================
+ Coverage   83.49%   92.43%   +8.94%     
==========================================
  Files          88       90       +2     
  Lines        6469     6585     +116     
==========================================
+ Hits         5401     6087     +686     
+ Misses       1068      498     -570     

Member

@ericmjl ericmjl left a comment

Pre-approving!

@@ -502,7 +502,7 @@ def _compute_cartesian_product(inputs: tuple, sort: bool) -> dict:
  return contents

  lengths = (len(key) for key in contents if isinstance(key, tuple))
- lengths = max(lengths)
+ lengths = max(lengths, default=0)
Member

For this change, I think we may need a test. What do you think, @samukweku?

Collaborator Author

@samukweku samukweku Apr 7, 2025

@ericmjl thanks for the review. The default parameter is not necessary, since line 504 will only kick in if the condition on line 501 is not true.
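
For context on the `default=0` question: Python's built-in `max` raises `ValueError` on an empty iterable unless `default` is supplied. The sketch below shows both behaviours, using a hypothetical helper, `longest_tuple_key`, that only mirrors the two lines in the diff and is not part of pyjanitor. Whether the empty case can actually be reached depends on the earlier check in `_compute_cartesian_product`, which is not shown in this thread.

```python
def longest_tuple_key(contents: dict) -> int:
    """Return the length of the longest tuple key, or 0 if there is none.

    Hypothetical helper for illustration only.
    """
    lengths = (len(key) for key in contents if isinstance(key, tuple))
    # With default=0, an empty generator yields 0 instead of raising.
    return max(lengths, default=0)


def max_without_default_raises(contents: dict) -> bool:
    """Report whether max() without a default fails for these keys."""
    lengths = (len(key) for key in contents if isinstance(key, tuple))
    try:
        max(lengths)
    except ValueError:
        # Raised only when no key in contents is a tuple.
        return True
    return False
```

If an earlier branch already returns when there are no tuple keys, the generator is never empty at this point and the plain `max(lengths)` call is safe.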

@samukweku samukweku requested a review from ericmjl April 7, 2025 13:14
@ericmjl
Member

ericmjl commented Apr 10, 2025

@samukweku let's merge and release!

@samukweku samukweku merged commit 09df7f6 into dev Apr 10, 2025
1 check passed
samukweku added a commit that referenced this pull request Apr 21, 2025
* updates to jn.summarise

* cleanup

* cleanup

* singledispatch for tuple

* minor fix for mutate

* remove default parameter in max

* remove default parameter in max

* remove default parameter in max

---------

Co-authored-by: samuel.oranyeli <[email protected]>
Co-authored-by: Eric Ma <[email protected]>
@samukweku samukweku deleted the samukweku/agg_summarise_tuple_rename branch April 30, 2025 11:47
Development

Successfully merging this pull request may close these issues.

summarise fails for MultiIndex
2 participants