-
-
Notifications
You must be signed in to change notification settings - Fork 147
fix(series): arithmetics for Series[Any] #1343
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
fix(series): arithmetics for Series[Any] #1343
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only issues I see are with respect to the code inside of if TYPE_CHECKING_INVALID_USAGE
that you added.
The concern here is that some of the lines will execute fine if the Series
comes from a DataFrame
and has the correct type inside, but we see that it is a static Series[Any]
. And vice versa.
I think I prefer where if you are doing an operation like subtraction where it will sometimes work and sometimes not work, and the inferred type of one of the operands is Series[Any]
, then we detect that as a typing problem. But we need to be selective.
For example,
df = pd.DataFrame({"a": [1,2,3], "b": pd.to_datetime(["1/1/2025", "2/1/2025", "3/1/2025"])})
sa = df["a"]
sb = df["b"]
sa - pd.Timestamp("1/1/2024") # fails at runtime
sb - pd.Timestamp("1/1/2024") # works at runtime
Here sa
and sb
are Series[Any]
(mypy) or Series[Unknown]
(pyright). So the typing either has to accept both cases or reject both cases.
I think we have to be selective here, and probably disallow subtraction with untyped Series
when the other argument is known to be time related (Timestamp
, Timedelta
and associated Series
) or is a string or Series[str]
. I think the current stubs are more permissive, but now I'm not sure that's the right thing to do.
if TYPE_CHECKING_INVALID_USAGE: | ||
_0 = left_td - s | ||
check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta) | ||
if TYPE_CHECKING_INVALID_USAGE: | ||
_1 = left_td - a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you have TYPE_CHECKING_INVALID_USAGE
, that means we should have # type: ignore
and # pyright: ignore
statements that demonstrate the type checker can catch those errors.
tests/series/arithmetic/test_sub.py
Outdated
check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta) | ||
if TYPE_CHECKING_INVALID_USAGE: | ||
_1 = left_ts - a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is an example of valid code that should be accepted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, it was a typo. 1fb597b
However, I did not easily understand your comment here.
This is an example of valid code that should be accepted.
left_ts
and left_td
are Series[Any]
at type checking. But you wrote
I think we have to be selective here, and probably disallow subtraction with untyped Series when the other argument is known to be time related (Timestamp, Timedelta and associated Series)
It seems to me that in your plan, both left_td - a
and left_ts - a
should give an error / a Never
at type checking. Am I right?
My proposed plan is more permissive and will not detect the problem of left_ts - a
, because at runtime, Series[Any] - TimedeltaSeries
can either be TimestampSeries
or TimedeltaSeries
or give an error. In my proposed plan, at type checking, it would give Series[Any]
.
Hi @Dr-Irv , thank you for drafting the plan. Current plan
I would like to summarise this typing plan as following:
|
The challenge here is the issue of wide vs narrow types. See https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#narrow-vs-wide-arguments for some writeup I did about that. Let's consider this example from your list:
In what is in si = pd.DataFrame({"a": pd.Series([1,2,3])})["a"]
st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
result = si - st I think we do a better service to users if we actually catch this via typing, i.e., for This makes the user then Let's also consider this example: st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
sd = pd.DataFrame({"a": [pd.Timedelta("1 day"), pd.Timedelta("2 days"), pd.Timedelta("3 days")]})["a"]
result = st - sd In this case, if we adopt my proposal, the type checker would say that result = st - cast("pd.Series[Timedelta]", sd) which is telling the type checker "I know this is a series of timedeltas" I'm choosing what I consider to be a happy medium here between your proposal, and something that would be too narrow (e.g., disallowing I should say that the current behavior in the stubs is from 3 years ago when we first inherited the project from something MIcrosoft had started, and now that I have more experience with typing, as well as using the stubs in my own code, I've come around to trying to find more things with static type checking if we can find them, then not. So the summary of my proposal is (with respect to
Let me know your thoughts on that. |
This PR implements the ideas from #1274 (comment) and #1274 (comment).
assert_type()
to assert the type of any return value