
GH1173 Experiment with fuller typing #1193


Open, wants to merge 5 commits into `main`


loicdiridollou (Member)

@MarcoGorelli I tried your idea of type-hinting the full set of arguments in `pd.DataFrame.query`, but I'm stumbling on an issue when the user passes a dictionary of keyword arguments (which is still an allowed usage).
I'm wondering if there is something I am missing here; please let me know!

Dr-Irv (Collaborator) commented Apr 21, 2025

> @MarcoGorelli I tried your idea of type-hinting the full set of arguments in `pd.DataFrame.query`, but I'm stumbling on an issue when the user passes a dictionary of keyword arguments (which is still an allowed usage).
> I'm wondering if there is something I am missing here; please let me know!

You'd still need to have the `**kwargs` as an overload. If you want to allow the other arguments, you can have that (although I don't think we should do that because it's not documented), but you'd still need the `**kwargs` overload.
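A minimal sketch of the overload layout described here, for illustration only. `Frame` is a hypothetical stand-in for `DataFrame` so the snippet runs without pandas. Only `parser` and `engine` are spelled out (a real stub would list the rest of `eval`'s keywords), and `Self` is replaced by `Frame` to avoid a `typing_extensions` dependency. The explicit overloads come first, so fully spelled-out calls are checked precisely. The `**kwargs: Any` overloads only catch calls those reject, such as an unpacked untyped dict.

```python
from __future__ import annotations

from typing import Any, Literal, Optional, overload


class Frame:
    """Hypothetical stand-in for DataFrame; only query() is modelled."""

    # Explicit overloads: precise checking when eval() keywords are spelled out.
    @overload
    def query(
        self,
        expr: str,
        *,
        inplace: Literal[True],
        parser: Literal["pandas", "python"] = ...,
        engine: Optional[Literal["python", "numexpr"]] = ...,
    ) -> None: ...
    @overload
    def query(
        self,
        expr: str,
        *,
        inplace: Literal[False] = ...,
        parser: Literal["pandas", "python"] = ...,
        engine: Optional[Literal["python", "numexpr"]] = ...,
    ) -> Frame: ...
    # Fallback overloads: catch **kwargs unpacked from an untyped dict.
    @overload
    def query(self, expr: str, *, inplace: Literal[True], **kwargs: Any) -> None: ...
    @overload
    def query(
        self, expr: str, *, inplace: Literal[False] = ..., **kwargs: Any
    ) -> Frame: ...

    def query(
        self, expr: str, *, inplace: bool = False, **kwargs: Any
    ) -> Optional[Frame]:
        # Toy runtime behaviour: a new object unless inplace=True.
        return None if inplace else Frame()
```

The runtime body is deliberately trivial; only the overload signatures matter to a type checker.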

loicdiridollou (Member Author)

I was able to make some progress here. Let me know if this is what you envisioned, and I will clean it up.

MarcoGorelli (Member)

Hey, yeah, this is what I was thinking. I don't really understand why `**kwargs` would be needed at all, given that the arguments accepted by `eval` are limited (https://pandas.pydata.org/docs/reference/api/pandas.eval.html#pandas.eval), although based on

> If you want to allow the other arguments, you can have that (although I don't think we should do that because it's not documented)

it looks like Irv disagrees, and what I've suggested may go against the pandas-stubs philosophy, which is fair enough.
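For reference, here is a sketch of how the keyword arguments that `DataFrame.query` forwards to `eval` could be collected into a `TypedDict`. The keys follow the parameter list on the `pandas.eval` page linked above. `expr` and `inplace` are left out because `query` takes those itself. The name `EvalKwargs` and the exact value types are assumptions, not what pandas-stubs defines.

```python
from __future__ import annotations

from collections.abc import Mapping, Sequence
from typing import Any, Literal, Optional, TypedDict


class EvalKwargs(TypedDict, total=False):
    """Hypothetical TypedDict of the keywords DataFrame.query forwards to eval().

    Keys follow the pandas.eval parameter list; the value types are assumptions.
    """

    parser: Literal["pandas", "python"]
    engine: Optional[Literal["python", "numexpr"]]
    local_dict: Optional[dict[str, Any]]
    global_dict: Optional[dict[str, Any]]
    resolvers: Sequence[Mapping[str, Any]]
    level: int
    target: object


# total=False: every key is optional, since eval() has a default for each.
example: EvalKwargs = {"parser": "python", "engine": "python"}
```

With PEP 692, `query` could then be declared with `**kwargs: Unpack[EvalKwargs]`, importing `Unpack` from `typing` (Python 3.11+) or `typing_extensions`.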

Dr-Irv (Collaborator) commented May 5, 2025

> Hey, yeah, this is what I was thinking. I don't really understand why `**kwargs` would be needed at all, given that the arguments accepted by `eval` are limited (https://pandas.pydata.org/docs/reference/api/pandas.eval.html#pandas.eval), although based on
>
> > If you want to allow the other arguments, you can have that (although I don't think we should do that because it's not documented)
>
> it looks like Irv disagrees, and what I've suggested may go against the pandas-stubs philosophy, which is fair enough.

So I looked more carefully at the pandas docs, which do say that the accepted `**kwargs` are those of `eval()`. The docs aren't great on this front and could be improved. Maybe one of you could create an issue there?

So I'll look more carefully at this PR now with that in mind.

Comment on lines +710 to 717
```python
@overload
def query(
    self,
    expr: _str,
    *,
    inplace: Literal[True],
    **kwargs: Any,
) -> None: ...
```
Collaborator


It's not clear to me why you need this overload when the previous overload already covers it.

Member Author


This test would raise a mypy/pyright error without the other overload.

```python
kwargs = {"parser": "pandas", "engine": "numexpr"}
check(
    assert_type(df.query("col1 > col2", inplace=False, **kwargs), pd.DataFrame),
    pd.DataFrame,
)
```
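A plausible reading of why this test needs the fallback overload is that `kwargs` is inferred as `dict[str, str]`. A checker cannot prove that those `str` values fit `Literal`-typed keywords such as `parser`, so no explicit overload matches. The sketch below shows one call-site alternative, using a hypothetical `query_like` function and `ParserEngine` TypedDict. Annotating the dict as a `TypedDict` should let checkers such as mypy and pyright match each key against its keyword. Treat that checker behavior as an assumption to verify against the versions used in CI.

```python
from __future__ import annotations

from typing import Literal, Optional, TypedDict


class ParserEngine(TypedDict, total=False):
    """Hypothetical TypedDict; keys mirror two of eval()'s keyword arguments."""

    parser: Literal["pandas", "python"]
    engine: Optional[Literal["python", "numexpr"]]


def query_like(
    expr: str,
    *,
    inplace: bool = False,
    parser: Literal["pandas", "python"] = "pandas",
    engine: Optional[Literal["python", "numexpr"]] = None,
) -> str:
    # Toy stand-in for DataFrame.query: just reports what it received.
    return f"{expr} | parser={parser} engine={engine} inplace={inplace}"


# Inferred as dict[str, str]: passing it as **untyped to query_like would be
# rejected by a checker, because str does not fit the Literal-typed keywords.
untyped = {"parser": "pandas", "engine": "numexpr"}

# Annotated as a TypedDict: each key can be checked against its keyword.
typed: ParserEngine = {"parser": "pandas", "engine": "numexpr"}

result = query_like("col1 > col2", inplace=False, **typed)
```

At runtime both dicts behave identically; the difference only shows up under a type checker.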

Comment on lines +733 to 739
```python
def query(
    self,
    expr: _str,
    *,
    inplace: Literal[False] = ...,
    **kwargs: Any,
) -> Self: ...
```
Collaborator


Same comment: not sure why we need this overload when the previous one covers it.

Member Author


The goal was to still allow someone to pass the kwargs as a dictionary instead of as individual arguments, since that is the usage shown in the docs, e.g. `df.query("col1 > col2", **kwargs)`, where `kwargs` is passed in from another function.

Member Author


The idea is that someone may write a wrapper like:

```python
def my_own_query(df, expr, **kwargs):
    # Forward whatever keyword arguments the caller supplied straight to query().
    return df.query(expr, **kwargs)
```

If you drop the second overload, this will raise an error.
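A variation on this wrapper, sketched under the assumption that the checker supports PEP 692 (`**kwargs: Unpack[...]`). `QueryKwargs` and `toy_query` are hypothetical: `toy_query` stands in for `DataFrame.query`, and the `df` argument is dropped to keep the snippet self-contained. With the forwarded keywords typed, the wrapper's own callers are checked too, rather than the call relying on an untyped `**kwargs` overload.

```python
from __future__ import annotations

from typing import TYPE_CHECKING, Literal, Optional, TypedDict

if TYPE_CHECKING:
    # Needed only by the type checker; also in typing itself on Python 3.11+.
    from typing_extensions import Unpack


class QueryKwargs(TypedDict, total=False):
    """Hypothetical subset of the keywords DataFrame.query forwards to eval()."""

    parser: Literal["pandas", "python"]
    engine: Optional[Literal["python", "numexpr"]]


def toy_query(
    expr: str, *, inplace: bool = False, **kwargs: Unpack[QueryKwargs]
) -> dict[str, object]:
    # Stand-in for DataFrame.query: echo what would be forwarded to eval().
    return {"expr": expr, "inplace": inplace, **kwargs}


def my_own_query(expr: str, **kwargs: Unpack[QueryKwargs]) -> dict[str, object]:
    # Each forwarded key has a declared type, so a checker can validate
    # callers of the wrapper as well as the inner call.
    return toy_query(expr, **kwargs)
```

Because `Unpack` is imported only under `TYPE_CHECKING` and annotations are postponed, the snippet runs without `typing_extensions` installed.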

Comment on lines 519 to 530
```python
check(
    assert_type(
        df.query("col1 > col2", parser="pandas", engine="numexpr"), pd.DataFrame
    ),
    pd.DataFrame,
)
check(
    assert_type(
        df.query("col1 > col2", parser="pandas", engine="numexpr"), pd.DataFrame
    ),
    pd.DataFrame,
)
```
Collaborator


duplicate tests

Member Author


Fixed

loicdiridollou (Member Author)

I will raise an issue on the pandas side to ask whether it would be simpler to list all the arguments directly in the function signature. It's a good point, since that should not be too hard to maintain.

Successfully merging this pull request may close this issue: type kwargs in DataFrame.query according to DataFrame.eval
3 participants