Skip to content

GH1173 Experiment with fuller typing #1193

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
32 changes: 30 additions & 2 deletions pandas-stubs/core/frame.pyi
Original file line number Diff line number Diff line change
Expand Up @@ -698,16 +698,44 @@ class DataFrame(NDFrame, OpsMixin, _GetItemHack):
self,
expr: _str,
*,
parser: Literal["pandas", "python"] = ...,
engine: Literal["python", "numexpr"] | None = ...,
local_dict: dict[_str, Any] | None = ...,
global_dict: dict[_str, Any] | None = ...,
resolvers: list[Mapping] | None = ...,
level: int = ...,
target: object | None = ...,
inplace: Literal[True],
**kwargs: Any, # TODO: make more precise https://github.com/pandas-dev/pandas-stubs/issues/1173
) -> None: ...
@overload
def query(
self,
expr: _str,
*,
inplace: Literal[True],
**kwargs: Any,
) -> None: ...
Comment on lines +710 to 717
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not clear why you need this overload when the previous overload covers it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test would raise a mypy/pyright error without the other overload.

kwargs = {"parser": "pandas", "engine": "numexpr"}
check(
assert_type(df.query("col1 > col2", inplace=False, **kwargs), pd.DataFrame),
pd.DataFrame,
)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My take on this is that we shouldn't support the kwargs being passed in a dict, because they can't be type-checked. The docs say what kwargs are acceptable, via the eval() link, so let's not include this test (and the overloads supporting kwargs

@overload
def query(
self,
expr: _str,
*,
inplace: Literal[False] = ...,
**kwargs: Any, # TODO: make more precise https://github.com/pandas-dev/pandas-stubs/issues/1173
parser: Literal["pandas", "python"] = ...,
engine: Literal["python", "numexpr"] | None = ...,
local_dict: dict[_str, Any] | None = ...,
global_dict: dict[_str, Any] | None = ...,
resolvers: list[Mapping] | None = ...,
level: int = ...,
target: object | None = ...,
) -> Self: ...
@overload
def query(
self,
expr: _str,
*,
inplace: Literal[False] = ...,
**kwargs: Any,
) -> Self: ...
Comment on lines +733 to 739
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same comment - not sure why we need this overload when the previous one covers it.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The goal was to still allow for someone passing kwargs as a dictionary instead of the individual arguments since it is the way documented in the docs like df.query("col1 > col2", **kwargs) that you would pass from another function.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The idea is that someone may put a wrapper like:

def my_own_query(df, expr, **kwargs):
    return df.query(expr, **kwargs)

If you drop the second overload this will raise an error.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the goal here is to restrict the possible keyword arguments, so we should remove the overload.

@overload
def eval(self, expr: _str, *, inplace: Literal[True], **kwargs: Any) -> None: ...
Expand Down
27 changes: 27 additions & 0 deletions tests/test_frame.py
Original file line number Diff line number Diff line change
Expand Up @@ -514,6 +514,33 @@ def test_types_query() -> None:
check(assert_type(df.query("col1 % col2 == 0", inplace=True), None), type(None))


def test_types_query_kwargs() -> None:
df = pd.DataFrame(data={"col1": [1, 2, 3, 4], "col2": [3, 0, 1, 7]})
check(
assert_type(
df.query("col1 > col2", parser="pandas", engine="numexpr"), pd.DataFrame
),
pd.DataFrame,
)
check(
assert_type(
df.query("col1 > col2", parser="pandas", engine="numexpr", inplace=True),
None,
),
type(None),
)
kwargs = {"parser": "pandas", "engine": "numexpr"}
check(
assert_type(df.query("col1 > col2", inplace=False, **kwargs), pd.DataFrame),
pd.DataFrame,
)

check(
assert_type(df.query("col1 % col2 == 0", inplace=True, **kwargs), None),
type(None),
)


def test_types_eval() -> None:
df = pd.DataFrame(data={"col1": [1, 2, 3, 4], "col2": [3, 0, 1, 7]})
check(assert_type(df.eval("E = col1 > col2", inplace=True), None), type(None))
Expand Down