
Conversation

loicdiridollou
Member

Technically we should also fix pd.DataFrame.all, but that would mean returning a pd.Series[np.bool], and np.bool is not one of the types accepted by S1. I will leave it as is for now; there are some FIXME comments there so we remember.

Collaborator

@Dr-Irv Dr-Irv left a comment


I think you should remove the FIXME comments. Otherwise OK

@@ -1660,7 +1660,8 @@ class DataFrame(NDFrame, OpsMixin, _GetItemHack):
 bool_only: _bool | None = ...,
 skipna: _bool = ...,
 **kwargs: Any,
-) -> _bool: ...
+) -> np.bool: ...
+# FIXME the type below is not correct, should be pd.Series[np.bool]
Collaborator


The way I look at it, we use Series[bool] to correspond to whatever bool is stored inside, whether that is a Python one, a numpy one, or even BooleanDtype, so I don't think this comment is necessary.

Member Author


Actually the type checker will not accept that, because pd.Series can only be subscripted with a type covered by S1, which contains bool (the built-in Python boolean) but not np.bool.
Yet at runtime pd.DataFrame.any returns a pd.Series[np.bool] (which is not accepted, since np.bool is not covered by S1).
Let me know if this is not clear.
image
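
The objection above can be illustrated with a small, self-contained sketch. It does not use pandas or numpy: `Series`, `S1`, and `NumpyBool` below are hypothetical toy stand-ins, and the assumption is only that S1 restricts which element types may appear in the subscript (the real definition in pandas-stubs lists many more types and may be declared differently).

```python
from typing import Generic, List, TypeVar, Union


class NumpyBool:
    """Hypothetical stand-in for np.bool, so the sketch needs no numpy."""


# Hypothetical, heavily shortened stand-in for the S1 type variable.
S1 = TypeVar("S1", bound=Union[bool, int, float, str])


class Series(Generic[S1]):
    """Toy generic container, not the real pandas Series."""

    def __init__(self, data: List[S1]) -> None:
        self.data = data


# Accepted by a type checker: bool is covered by S1.
accepted: Series[bool] = Series([True, False])

# A type checker would reject the annotation below, because NumpyBool
# is outside the bound of S1. Nothing fails at runtime, since
# typing.Generic does not validate subscripts against the bound:
#     rejected: Series[NumpyBool]
```

The restriction is purely static, which is why the stubs can only name types that S1 covers, whatever scalar type ends up inside the Series at runtime.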

Collaborator


The type returned from DataFrame.any() and DataFrame.all() should be Series[_bool]. Then the tests should use that.

Member Author


My issue is that _bool does not include np.bool, which is the type we get at runtime. Happy to keep the stubs as is, but that would mean the runtime type does not align with the static type.
Or we can add np.bool to _bool; open to suggestions.

Collaborator


I don't think it matters. We have checks like this:

    check(assert_type(df.any(), "pd.Series[bool]"), pd.Series, np.bool_)

So even though np.bool_ is in the Series, we call the type Series[bool].

It's similar to this:

    check(assert_type(df.value_counts(), "pd.Series[int]"), pd.Series, np.integer)

What's inside the series are numpy integers, but we call that Series[int]
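
The convention described here can be seen at runtime with a short sketch. `check_sketch` is a hypothetical, simplified stand-in for the `check` helper used in the tests (the real helper may do more); it only verifies the container class and the class of the first element. The sample DataFrame is made up for illustration.

```python
from typing import Any, Optional

import numpy as np
import pandas as pd


def check_sketch(actual: Any, klass: type, dtype: Optional[type] = None) -> Any:
    """Hypothetical, simplified version of the tests' `check` helper."""
    if not isinstance(actual, klass):
        raise RuntimeError(f"Expected {klass}, got {type(actual)}")
    if dtype is not None:
        first = actual.iloc[0]  # inspect the first stored element
        if not isinstance(first, dtype):
            raise RuntimeError(f"Expected element {dtype}, got {type(first)}")
    return actual


df = pd.DataFrame({"a": [True, False], "b": [False, False]})

# Statically this would be annotated Series[bool]; the stored scalars
# are numpy booleans, which are not instances of the built-in bool.
any_result = check_sketch(df.any(), pd.Series, np.bool_)

# Statically Series[int]; the stored scalars are numpy integers.
counts = check_sketch(df.value_counts(), pd.Series, np.integer)
```

In both calls the static name (`bool`, `int`) labels the kind of value, while the third argument records the numpy scalar class actually found inside the Series.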

Member Author


Okay, I see your point. Let me fix the comments; thanks for the more detailed explanation of this issue!

@loicdiridollou loicdiridollou requested a review from Dr-Irv April 17, 2025 20:54
Collaborator

@Dr-Irv Dr-Irv left a comment


@Dr-Irv Dr-Irv merged commit 1793f88 into pandas-dev:main Apr 17, 2025
13 checks passed
Successfully merging this pull request may close these issues.

Wrong type hint for Series.all() and Series.any()
2 participants