bugfix/1835: Keep nulls in polars when dropping invalid rows and nullable=True #1890

Merged
merged 2 commits into from
Jan 3, 2025

Conversation

baldwinj30
Contributor

With nullable=True in polars, the check_outputs values coming from the error handler appear to stay null, rather than True, wherever the data itself is null. This change is one possible way to fix that and match the pandas behavior described in #1835. However, I am not sure whether this is the best place to handle it, or whether matching pandas behavior makes sense here at all.

There also seemed to be a bug in the original test for polars dropping null values, which I attempted to fix here.
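
To make the symptom concrete, here is a minimal plain-Python sketch. It is not pandera or polars code: it only models a per-row check output as True, False, or None (standing in for null). The names `check_positive` and `drop_invalid_rows` are hypothetical, and the sketch assumes, as described above, that a null value produces a null check output and that only rows whose output is exactly True survive the drop.

```python
from typing import List, Optional


def check_positive(value: Optional[int]) -> Optional[bool]:
    """Hypothetical element-wise check; a null input yields a null output."""
    if value is None:
        return None
    return value > 0


def drop_invalid_rows(
    values: List[Optional[int]], outputs: List[Optional[bool]]
) -> List[Optional[int]]:
    """Keep only the rows whose check output is exactly True."""
    return [v for v, ok in zip(values, outputs) if ok is True]


values = [1, None, -3]
outputs = [check_positive(v) for v in values]

# A null output is not True, so the null row is dropped along with the
# genuinely invalid row.
dropped_with_nulls = drop_invalid_rows(values, outputs)

# Treating null outputs as passing keeps the null row, which is the
# behavior this PR aims for when nullable=True.
filled = [True if ok is None else ok for ok in outputs]
kept_nulls = drop_invalid_rows(values, filled)
```

In this model the only difference between the two results is the fill step applied to the check output before filtering.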

codecov bot commented Dec 31, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.32%. Comparing base (812b2a8) to head (17a1967).
Report is 186 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1890      +/-   ##
==========================================
- Coverage   94.28%   93.32%   -0.96%     
==========================================
  Files          91      121      +30     
  Lines        7013     9301    +2288     
==========================================
+ Hits         6612     8680    +2068     
- Misses        401      621     +220     


@baldwinj30 baldwinj30 changed the title from "Keep nulls in polars when dropping invalid rows and nullable=True" to "bugfix/1835: Keep nulls in polars when dropping invalid rows and nullable=True" Dec 31, 2024
@cosmicBboy
Collaborator

thanks for the contribution @baldwinj30! I do think it makes sense to match pandas behavior here, but the kwarg of interest is Check(ignore_na=True). This is handled in the check backend layer in the postprocess method, see the pandas implementation:

https://github.com/unionai-oss/pandera/blob/main/pandera/backends/pandas/checks.py#L273

The equivalent spot in the polars check backend is here: https://github.com/unionai-oss/pandera/blob/main/pandera/backends/polars/checks.py#L90, let's make the change over there
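
As a rough illustration of the kind of post-processing step being discussed, the sketch below uses plain Python lists rather than the actual pandera backend code. The function name `postprocess_sketch` and its signature are hypothetical, and it assumes that `ignore_na=True` means rows with a null value count as passing regardless of the raw check output.

```python
from typing import List, Optional


def postprocess_sketch(
    values: List[Optional[int]],
    check_output: List[Optional[bool]],
    ignore_na: bool = True,
) -> List[bool]:
    """Hypothetical model of a check post-processing step.

    With ignore_na=True, rows whose value is null are marked as passing.
    Otherwise, any output that is not exactly True counts as a failure.
    """
    if not ignore_na:
        return [ok is True for ok in check_output]
    return [
        True if value is None else ok is True
        for value, ok in zip(values, check_output)
    ]


values = [2, None, -1]
raw_output = [True, None, False]  # a null value produced a null output

ignoring_nulls = postprocess_sketch(values, raw_output, ignore_na=True)
strict = postprocess_sketch(values, raw_output, ignore_na=False)
```

Handling nulls at this stage means every consumer of the check output, including the logic that drops invalid rows, sees the same result.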

@baldwinj30
Contributor Author

Thanks @cosmicBboy! I moved the fix into the check backend. I am not totally sure about the failing checks, but I don't think they are related to my changes.

Collaborator

@cosmicBboy cosmicBboy left a comment

thanks @baldwinj30 ! 🚀

@cosmicBboy cosmicBboy merged commit 813420d into unionai-oss:main Jan 3, 2025
145 of 146 checks passed
max-raphael pushed a commit to max-raphael/pandera that referenced this pull request Jan 24, 2025
…able=True (unionai-oss#1890)

* don't drop null values when dropping invalid rows in polars and nullable=True

Signed-off-by: Jacob Baldwin <[email protected]>

* move the fix for keeping nulls in polars when nullable=True to the check backend

Signed-off-by: Jacob Baldwin <[email protected]>

---------

Signed-off-by: Jacob Baldwin <[email protected]>