DOC: Pivot() example call incorrectly used and would give "error: duplicate index" #61058

Open
1 task done
mheskett opened this issue Mar 5, 2025 · 3 comments
Labels
Error Reporting Incorrect or improved errors from pandas Needs Info Clarification about behavior needed to assess issue Reshaping Concat, Merge/Join, Stack/Unstack, Explode

Comments

@mheskett

mheskett commented Mar 5, 2025

Pandas version checks

  • I have checked that the issue still exists on the latest versions of the docs on main here

Location of the documentation

doc/source/user_guide/reshaping.rst

Documentation problem

The table given as an example for pivot() is wrong and can't be used. It would return "error: duplicate index" because there are duplicate values in the column given for the "index" parameter.


Suggested fix for documentation

The "foo" column must contain unique values

@mheskett mheskett added Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 5, 2025
@goutam-kul

@mheskett It will not raise ValueError: Index contains duplicate entries, cannot reshape, because every combination of the index (foo) and columns (bar) values is unique:

import pandas as pd

data = {"foo": ['one', 'one', 'one', 'two', 'two', 'two'],
    "bar": ['A', 'B', 'C', 'A', 'B', 'C'],
    "baz": [1, 2, 3, 4, 5, 6],
    "zoo": ['x', 'y', 'z', 'q', 'w', 't']
}

df = pd.DataFrame(data=data)
# print(df)
out = df.pivot(index='foo', columns='bar', values='baz')
print(out)

Output:

bar  A  B  C
foo         
one  1  2  3
two  4  5  6
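
The distinction can be checked directly. As a minimal sketch on the same data (the variable names here are only illustrative), the foo column on its own contains repeats, yet no (foo, bar) pair occurs twice, and the pair is what pivot() needs to be unique:

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "B", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],
})

# "foo" alone is not unique...
foo_is_unique = df["foo"].is_unique
# ...but every (foo, bar) combination appears exactly once.
has_duplicate_pairs = bool(df.duplicated(subset=["foo", "bar"]).any())

print(foo_is_unique)        # False
print(has_duplicate_pairs)  # False
```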

What happens if I introduce a non-unique combination? Then it does raise the duplicate index error. E.g.:

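data = {"foo": ['one', 'one', 'one', 'two', 'two', 'two'],
    "bar": ['A', 'A', 'C', 'A', 'B', 'C'],  # ('one', 'A') now occurs twice
    "baz": [1, 2, 3, 4, 5, 6],
    "zoo": ['x', 'y', 'z', 'q', 'w', 't']
}

df = pd.DataFrame(data=data)
out = df.pivot(index='foo', columns='bar', values='baz')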

Output:

ValueError: Index contains duplicate entries, cannot reshape
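
To find out which rows collide before calling pivot(), one option is DataFrame.duplicated with keep=False on the two key columns. This is only a sketch on the data above, not something pivot() does for you:

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "A", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],
})

# keep=False marks every row that belongs to a repeated (foo, bar) pair.
dupes = df[df.duplicated(subset=["foo", "bar"], keep=False)]
print(dupes)

# pivot() itself refuses to reshape this frame.
message = None
try:
    df.pivot(index="foo", columns="bar", values="baz")
except ValueError as err:
    message = str(err)
    print(message)
```

Here dupes holds the two rows with foo == 'one' and bar == 'A'.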


However, you can use the pivot_table method when you have duplicate combinations of index and column values:

out = df.pivot_table(index='foo', columns='bar', values='baz')
print(out)

Output:

bar    A    B    C
foo               
one  1.5  NaN  3.0
two  4.0  5.0  6.0
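
pivot_table does not raise here because it aggregates the duplicates: the 1.5 above is the mean of 1 and 2, mean being the default aggfunc. A different aggregation can be passed explicitly. A short sketch, with sum chosen purely for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "foo": ["one", "one", "one", "two", "two", "two"],
    "bar": ["A", "A", "C", "A", "B", "C"],
    "baz": [1, 2, 3, 4, 5, 6],
})

# Default aggregation is the mean of the colliding values.
out_mean = df.pivot_table(index="foo", columns="bar", values="baz")

# Explicit aggregation: add the colliding values instead.
out_sum = df.pivot_table(index="foo", columns="bar", values="baz", aggfunc="sum")
print(out_sum)
```

With sum, the ('one', 'A') cell becomes 3 instead of 1.5, and ('one', 'B') is still missing because no row has that combination.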

Hope this helps!

@mheskett
Author

mheskett commented Mar 5, 2025

Thank you. So in that case, the ValueError message is misleading. I can raise a separate issue about that. It should read "must contain unique combinations of index and column".

@rhshadrach
Member

> I can raise a separate issue about that.

We can rework this issue instead. Why do you feel the ValueError is misleading?

@rhshadrach rhshadrach added Reshaping Concat, Merge/Join, Stack/Unstack, Explode Error Reporting Incorrect or improved errors from pandas Needs Info Clarification about behavior needed to assess issue and removed Docs Needs Triage Issue that has not been reviewed by a pandas team member labels Mar 5, 2025