GH-48978: [Python] test failures on pandas 3.0 for fastparquet and for zoneinfo w/o pytz#48979

Merged
rok merged 12 commits into apache:main from tadeja:48978-test-failures-on-pandas-3.0-currently-CI-on-2.3.3 on Feb 20, 2026

@tadeja
Contributor

@tadeja tadeja commented Jan 25, 2026

Rationale for this change

Closes #48978

What changes are included in this PR?

Update test_fastparquet_cross_compatibility in parquet/test_basic.py to account for fastparquet string and categorical dtype differences, which caused the failure Attribute "dtype" are different.
Update test_timestamp_as_object_non_nanosecond in test_pandas.py to fix the failure ValueError: fromutc: dt.tzinfo is not self.
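
For context on the second failure message: the standard library's fromutc implementations require that the datetime passed in carries the very same tzinfo object as the timezone the method is called on. The following is a minimal stdlib-only sketch of that identity check. It uses fixed-offset timezone objects as hypothetical stand-ins and does not reproduce the pyarrow or pandas code path exercised by the test.

```python
from datetime import datetime, timedelta, timezone

# Two equal-but-distinct tzinfo objects (hypothetical stand-ins, not the
# objects used in the pyarrow test).
tz_a = timezone(timedelta(hours=1))
tz_b = timezone(timedelta(hours=1))

# The datetime carries tz_b, which compares equal to tz_a but is a
# different object.
dt = datetime(2020, 1, 1, 12, 0, tzinfo=tz_b)

try:
    tz_a.fromutc(dt)
    error_message = None
except ValueError as exc:
    # fromutc checks identity ("is"), not equality.
    error_message = str(exc)

# Attaching the same object that fromutc is called on succeeds.
converted = tz_a.fromutc(dt.replace(tzinfo=tz_a))

print(error_message)
print(converted.isoformat())
```

The point of the sketch is only that equality of timezones is not enough for fromutc; the tzinfo attached to the datetime has to be the same object.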

Are these changes tested?

Yes. Initially tested locally with pandas upgraded to 3.0 as CI was still running with pandas 2.3.3 cached.

Are there any user-facing changes?

No.

@github-actions

⚠️ GitHub issue #48978 has been automatically assigned in GitHub to PR creator.

@github-actions github-actions bot added the awaiting review Awaiting review label Jan 25, 2026
@tadeja
Contributor Author

tadeja commented Jan 25, 2026

Failures replicated here on CI job AMD64 Conda Python 3.13 Pandas latest
with temporary CI modification as follows:
pip install --upgrade pandas fastparquet
pip uninstall -q -y pytz
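
To confirm that an environment matches the setup described above (pandas 3.x, fastparquet installed, pytz absent), a small check along these lines may help. This is a sketch only: the helper names describe_environment and matches_failing_setup are hypothetical and are not part of the Arrow CI scripts.

```python
from importlib import metadata


def describe_environment(packages=("pandas", "fastparquet", "pytz")):
    """Return {package: installed version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def matches_failing_setup(versions):
    """True if pandas major version >= 3, fastparquet is present, pytz is absent."""
    pandas_version = versions.get("pandas")
    if pandas_version is None:
        return False
    try:
        major = int(pandas_version.split(".")[0])
    except ValueError:
        # Unparseable version string: treat as not matching.
        return False
    return (
        major >= 3
        and versions.get("fastparquet") is not None
        and versions.get("pytz") is None
    )


env = describe_environment()
print(env)
print(matches_failing_setup(env))
```

Running it before and after the two pip commands shows whether the upgrade and the pytz removal actually took effect in the job's environment.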

@tadeja
Contributor Author

tadeja commented Jan 25, 2026

With the proposed fixes, the previously failing tests now pass. The following run used pandas 3.0 with fastparquet installed and without pytz:
AMD64 Conda Python 3.13 Pandas latest succeeded

@h-vetinari
Contributor

I backported this to the conda-forge feedstock in conda-forge/pyarrow-feedstock#169, and can confirm that it works! Thanks!

@rok rok removed request for assignUser, jonkeane and kou February 8, 2026 11:43
Member

@AlenkaF AlenkaF left a comment

Thank you for the fix @tadeja! I am adding my comments below.

Member

Would it make sense to use integers for categories until fastparquet fully supports pandas 3.0? That way we can also check the dtype roundtrip here.

Contributor Author

Ok, @AlenkaF here's the new approach with
"f": pd.Categorical([5, 6, 7]),

This fails the test at

tm.assert_frame_equal(table_fp.to_pandas(), df_for_fp, check_dtype=False)
  pyarrow/tests/parquet/test_basic.py:769:
...
...
E           AssertionError: Categorical Expected type <class 'pandas.Categorical'>, found <class 'numpy.ndarray'> instead

so we have to add another check_categorical=False on line 769, as was already done on line 756 for the same reason.

Comment thread python/pyarrow/tests/test_pandas.py Outdated
@github-actions github-actions bot added awaiting committer review Awaiting committer review and removed awaiting review Awaiting review labels Feb 11, 2026
@tadeja tadeja changed the title GH-48978: [Python] test failures on pandas 3.0 (currently CI on 2.3.3) GH-48978: [Python] test failures on pandas 3.0 for fastparquet and zoneinfo w/o pytz Feb 13, 2026
@tadeja tadeja changed the title GH-48978: [Python] test failures on pandas 3.0 for fastparquet and zoneinfo w/o pytz GH-48978: [Python] test failures on pandas 3.0 for fastparquet and for zoneinfo w/o pytz Feb 13, 2026
@tadeja
Contributor Author

tadeja commented Feb 13, 2026

@tadeja tadeja requested a review from AlenkaF February 13, 2026 12:53
Member

@rok rok left a comment

Just a comment suggestion.

Comment thread python/pyarrow/tests/parquet/test_basic.py Outdated
@github-actions github-actions bot added awaiting review Awaiting review awaiting merge Awaiting merge and removed awaiting committer review Awaiting committer review awaiting review Awaiting review awaiting merge Awaiting merge labels Feb 13, 2026
@tadeja tadeja force-pushed the 48978-test-failures-on-pandas-3.0-currently-CI-on-2.3.3 branch from bdf182c to f6c9301 Compare February 13, 2026 13:13
Member

@rok rok left a comment

Timezone part looks good. Suggesting a more readable fastparquet workaround. Proposed changes are not tested.

Comment thread python/pyarrow/tests/parquet/test_basic.py Outdated
Comment thread python/pyarrow/tests/parquet/test_basic.py Outdated
Comment thread python/pyarrow/tests/parquet/test_basic.py Outdated
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Feb 19, 2026
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Feb 19, 2026
Comment thread python/pyarrow/tests/parquet/test_basic.py Outdated
Co-authored-by: Rok Mihevc <rok@mihevc.org>
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting change review Awaiting change review labels Feb 19, 2026
@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting changes Awaiting changes labels Feb 20, 2026
@rok rok merged commit ba61297 into apache:main Feb 20, 2026
23 of 25 checks passed
@rok rok removed the awaiting merge Awaiting merge label Feb 20, 2026
@tadeja
Contributor Author

tadeja commented Feb 20, 2026

> I backported this to the conda-forge feedstock in conda-forge/pyarrow-feedstock#169, and can confirm that it works! Thanks!

@h-vetinari, just a ping to let you know that slightly different fixes for pyarrow/tests/parquet/test_basic.py and test_pandas.py have just been merged.

@conbench-apache-arrow

After merging your PR, Conbench analyzed the 3 benchmarking runs that have been run so far on merge-commit ba61297.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 9 possible false positives for unstable benchmarks that are known to sometimes produce them.

@h-vetinari
Contributor

Thanks for the heads-up @tadeja; I was following along here, and will pick up the changes on the feedstock when I get the chance (but since it was really just about making tests pass with pandas v3, the details don't matter all that much).

thisisnic pushed a commit to thisisnic/arrow that referenced this pull request Apr 6, 2026
…and for zoneinfo w/o pytz (apache#48979)

### Rationale for this change
Closes apache#48978 

### What changes are included in this PR?
Update to `parquet/test_basic.py test_fastparquet_cross_compatibility` for fastparquet string and categorical dtype differences causing failure `Attribute "dtype" are different`
Update to `test_pandas.py test_timestamp_as_object_non_nanosecond` for failure `ValueError: fromutc: dt.tzinfo is not self`.

### Are these changes tested?
Yes. Initially tested locally with pandas upgraded to 3.0 as CI was still running with pandas 2.3.3 cached.

### Are there any user-facing changes?
No.
* GitHub Issue: apache#48978

Lead-authored-by: Tadeja Kadunc <tadeja.kadunc@gmail.com>
Co-authored-by: tadeja <tadeja@users.noreply.github.com>
Co-authored-by: Alenka Frim <AlenkaF@users.noreply.github.com>
Co-authored-by: Rok Mihevc <rok@mihevc.org>
Signed-off-by: Rok Mihevc <rok@mihevc.org>
@tadeja tadeja deleted the 48978-test-failures-on-pandas-3.0-currently-CI-on-2.3.3 branch April 9, 2026 12:11
Development

Successfully merging this pull request may close these issues.

[Python] test failures on pandas 3.0 for fastparquet and for zoneinfo w/o pytz

4 participants