PERF: why nlargest is so slower? #61076

Open
3 tasks done
ZGarry opened this issue Mar 7, 2025 · 4 comments
Labels
Needs Info (Clarification about behavior needed to assess issue), Performance (Memory or execution speed performance)

Comments

@ZGarry

ZGarry commented Mar 7, 2025

Pandas version checks

  • I have checked that this issue has not already been reported.

  • I have confirmed this issue exists on the latest version of pandas.

  • I have confirmed this issue exists on the main branch of pandas.

Reproducible Example

nlargest is very slow. I think this is a problem, and maybe we should do something to improve it.

Here is my code; you can see that nlargest takes many times longer than it should.

Image

Installed Versions

Replace this line with the output of pd.show_versions()

Prior Performance

No response

ZGarry added the Needs Triage and Performance labels on Mar 7, 2025
@Liam3851
Contributor

Liam3851 commented Mar 7, 2025

You haven't provided your data or a benchmark. Does your index have duplicates? I wonder if this is related to #55767.
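
For anyone trying to answer the duplicate-index question, a minimal timing sketch might look like the following. It assumes the report is about `Series.nlargest` (the original code is only available as an image), and the helper names `make_series` and `time_nlargest`, the row count, and the choice of 10 repeated labels are all illustrative assumptions rather than anything taken from the report.

```python
import timeit

import numpy as np
import pandas as pd


def make_series(n_rows: int, duplicate_index: bool, seed: int = 0) -> pd.Series:
    """Build a random float Series with a unique or heavily duplicated index."""
    rng = np.random.default_rng(seed)
    values = rng.random(n_rows)
    if duplicate_index:
        # Assumption: only 10 distinct labels, so every label repeats many times.
        index = np.arange(n_rows) % 10
    else:
        index = np.arange(n_rows)
    return pd.Series(values, index=index)


def time_nlargest(ser: pd.Series, n: int = 5, repeat: int = 3) -> float:
    """Return the best wall-clock time in seconds of one ser.nlargest(n) call."""
    return min(timeit.repeat(lambda: ser.nlargest(n), number=1, repeat=repeat))


if __name__ == "__main__":
    for dup in (False, True):
        ser = make_series(10_000, duplicate_index=dup)
        print(f"duplicate_index={dup}: {time_nlargest(ser):.4f} s")
```

Whether the two timings differ noticeably will depend on the pandas version in use, so the output of `pd.show_versions()` is worth posting alongside the numbers.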

mroeschke added the Needs Info label and removed the Needs Triage label on Mar 7, 2025
@Jeffrharr

Jeffrharr commented Mar 7, 2025

Until this gets more information, I'd like to give #55767 a shot. It looks simple enough.

Are there any performance benchmarks anywhere in case of regressions?

@rhshadrach
Member

@Jeffrharr - it looks like there are none; one should likely be added to our ASVs.
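
As a rough idea of what such an ASV benchmark could look like, here is a sketch that follows the usual asv conventions (`params`, `param_names`, a `setup` method, and `time_`-prefixed methods). The class name `NLargestDuplicateIndex`, the parameter grid, and the data sizes are hypothetical and not taken from the pandas benchmark suite.

```python
import numpy as np
import pandas as pd


class NLargestDuplicateIndex:
    # Hypothetical benchmark class; the name, sizes, and parameters are
    # illustrative assumptions, not part of the existing pandas ASVs.
    params = [[False, True], [5, 100]]
    param_names = ["duplicate_index", "n"]
    n_rows = 100_000

    def setup(self, duplicate_index, n):
        rng = np.random.default_rng(42)
        if duplicate_index:
            # Only 10 distinct labels, so the index is heavily duplicated.
            index = np.arange(self.n_rows) % 10
        else:
            index = np.arange(self.n_rows)
        self.ser = pd.Series(rng.random(self.n_rows), index=index)

    def time_nlargest(self, duplicate_index, n):
        # asv times the body of this method for each parameter combination.
        self.ser.nlargest(n)
```

Parametrizing over `duplicate_index` keeps the unique-index case as a baseline, so a regression in either path would show up in the same benchmark.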

@Jeffrharr

@rhshadrach for now, I do have a working solution to #55767 that is unlikely to cause any regressions, and I'll make a PR early next week. I only fixed the bug where duplicate indices cause performance issues; there's still some room for improvement in general.
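
One way to guard against behavioral regressions while changing the duplicate-index path is to compare `nlargest` against a plain sort on the same data. The sketch below assumes `Series.nlargest` and distinct values (so tie-breaking does not matter); the helper name `check_nlargest_matches_sort` is hypothetical.

```python
import numpy as np
import pandas as pd


def check_nlargest_matches_sort(ser: pd.Series, n: int) -> None:
    """Raise AssertionError if nlargest(n) differs from sorting and taking head(n)."""
    expected = ser.sort_values(ascending=False).head(n)
    result = ser.nlargest(n)
    pd.testing.assert_series_equal(result, expected)


rng = np.random.default_rng(0)
values = rng.random(1_000)  # random floats, assumed distinct
# Heavily duplicated index: only 10 distinct labels.
ser = pd.Series(values, index=np.arange(1_000) % 10)
check_nlargest_matches_sort(ser, 5)
```

This only checks that the result is unchanged; it says nothing about speed, which still needs a separate timing benchmark.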


5 participants