fix(series): arithmetics for Series[Any] #1343

cmp0xff · 2025-08-22T12:34:30Z

This PR implements the ideas from #1274 (comment) and #1274 (comment).

Tests added: Please use assert_type() to assert the type of any return value

Dr-Irv

Only issues I see are with respect to the code inside of if TYPE_CHECKING_INVALID_USAGE that you added.

The concern here is that some of these lines will execute fine if the Series comes from a DataFrame and actually holds the right type at runtime, even though statically we only see Series[Any]. And vice versa.

I think I prefer where if you are doing an operation like subtraction where it will sometimes work and sometimes not work, and the inferred type of one of the operands is Series[Any], then we detect that as a typing problem. But we need to be selective.

For example,

df = pd.DataFrame({"a": [1,2,3], "b": pd.to_datetime(["1/1/2025", "2/1/2025", "3/1/2025"])})
sa = df["a"]
sb = df["b"]
sa - pd.Timestamp("1/1/2024")  # fails at runtime
sb - pd.Timestamp("1/1/2024") # works at runtime

Here sa and sb are Series[Any] (mypy) or Series[Unknown] (pyright). So the typing either has to accept both cases or reject both cases.

I think we have to be selective here, and probably disallow subtraction with untyped Series when the other argument is known to be time related (Timestamp, Timedelta and associated Series) or is a string or Series[str]. I think the current stubs are more permissive, but now I'm not sure that's the right thing to do.

Dr-Irv · 2025-08-22T17:04:23Z

tests/series/arithmetic/test_sub.py

+    if TYPE_CHECKING_INVALID_USAGE:
+        _0 = left_td - s
    check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta)
+    if TYPE_CHECKING_INVALID_USAGE:
+        _1 = left_td - a


When you have TYPE_CHECKING_INVALID_USAGE, that means we should have # type: ignore and # pyright: ignore statements that demonstrate the type checker can catch those errors.
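
As a rough illustration of the pattern described above, here is a minimal, pandas-free sketch. The flag is redefined locally as an assumption about how the test suite sets it up, and the operand `left` is a hypothetical stand-in, not one of the fixtures in test_sub.py. The mypy and pyright error codes shown are the ones those checkers use for an unsupported operator on built-in types; the codes needed for the real Series cases may differ.

```python
from typing import TYPE_CHECKING

# Assumption: local stand-in for the flag used in the test suite.
# TYPE_CHECKING is False at runtime, so the guarded block never executes,
# but type checkers still analyse it.
TYPE_CHECKING_INVALID_USAGE = TYPE_CHECKING


def test_invalid_sub() -> None:
    left = "2025-01-01"  # hypothetical operand: str does not support "-"
    if TYPE_CHECKING_INVALID_USAGE:
        # Both ignore comments document that mypy and pyright each report
        # an error on this line.
        _0 = left - 1  # type: ignore[operator] # pyright: ignore[reportOperatorIssue]
```

If a checker is configured to warn about unused ignore comments, a line like this also fails the check as soon as the stubs stop rejecting the operation, which is what makes the ignores a useful regression test.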

Dr-Irv · 2025-08-22T17:05:45Z

tests/series/arithmetic/test_sub.py

    check(assert_type(left_ts - a, "TimedeltaSeries"), pd.Series, pd.Timedelta)
+    if TYPE_CHECKING_INVALID_USAGE:
+        _1 = left_ts - a


This is an example of valid code that should be accepted.

cmp0xff · 2025-08-23

Sorry, it was a typo. 1fb597b

However, I did not easily understand your comment here.

This is an example of valid code that should be accepted.

left_ts and left_td are Series[Any] at type checking. But you wrote

I think we have to be selective here, and probably disallow subtraction with untyped Series when the other argument is known to be time related (Timestamp, Timedelta and associated Series)

It seems to me that in your plan, both left_td - a and left_ts - a should give an error / a Never at type checking. Am I right?

My proposed plan is more permissive and will not detect the problem of left_ts - a, because at runtime, Series[Any] - TimedeltaSeries can either be TimestampSeries or TimedeltaSeries or give an error. In my proposed plan, at type checking, it would give Series[Any].

cmp0xff · 2025-08-23T08:39:43Z

Hi @Dr-Irv , thank you for drafting the plan.

Current plan

I think I prefer where if you are doing an operation like subtraction where it will sometimes work and sometimes not work, and the inferred type of one of the operands is Series[Any], then we detect that as a typing problem. But we need to be selective.

I think we have to be selective here, and probably disallow subtraction with untyped Series when the other argument is known to be time related (Timestamp, Timedelta and associated Series) or is a string or Series[str].

I would like to summarise this typing plan as follows:

When the calculation can give a runtime error, typing shows an error or Never
Certain cases are exceptions

`Timestamp` and `Timedelta`: permissive or forbidding

With this typing plan, I have the following examples in my mind:

Series[Any] (int) - TimestampSeries -> error at type checking, error at runtime
Series[Any] (Timestamp) - TimestampSeries -> error at type checking, TimedeltaSeries at runtime

As a user I probably do not want the static type checker to aggressively point out a potential problem. When the stub is less permissive and more forbidding, the static type checker becomes more aggressive. It seems better to me to allow both cases at the stage of static type checking, otherwise the user may need to manually ignore the type checker in many cases.

`int`: exceptions to the plan

"We need to be selective" is important in the plan, because we also have

Series[Any] (int) + Series[int] -> Series[Any] at type checking, Series[int] at runtime
Series[Any] (str) + Series[int] -> Series[Any] at type checking, error at runtime

Currently we are happy with the stub giving us Series[Any] for adding Series[Any] to Series[int]. This is an exception, which may potentially confuse the user.

Proposing a consistent plan

I would like to propose a new typing plan as follows:

When the calculation gives several typing results or a runtime error, typing shows Series[Any]
When the calculation gives one typing result, say Series[R], or a runtime error, typing shows Series[R]
When the calculation always gives a runtime error, typing shows an error or Never

With this typing plan, the previous examples give different results:

Series[Any] (int) - TimestampSeries -> TimedeltaSeries at type checking, error at runtime (TimedeltaSeries is the only possible result that is valid, so unfortunately the type checker does not catch the potential problem here)
Series[Any] (Timestamp) - TimestampSeries -> TimedeltaSeries at type checking, TimedeltaSeries at runtime
Series[Any] (int) + Series[int] -> Series[Any] at type checking, Series[int] at runtime (no exceptional rule in the plan)
Series[Any] (str) + Series[int] -> Series[Any] at type checking, error at runtime (Series[float], Series[int] etc. are possible valid results, so unfortunately the type checker does not catch the potential problem here)

Further examples:

Series[Any] (int) + Series[str] -> Series[str] at type checking, error at runtime (Series[str] is the only possible result that is valid, so unfortunately the type checker does not catch the potential problem here)
Series[Any] (str) + Series[str] -> Series[str] at type checking, Series[str] at runtime
Series[Any] * TimestampSeries -> error / Never at type checking, error at runtime (Timestamp is consistently not multiplicative)
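
The three rules of the proposed plan can be sketched as a small decision function. This is only an illustration of the reasoning, not how the stubs would implement it: the function name `static_result` and the idea of passing in the set of valid runtime result types as strings are assumptions made for the example.

```python
from __future__ import annotations


def static_result(valid_runtime_results: set[str]) -> str:
    """Return the type the checker should show under the proposed plan.

    `valid_runtime_results` holds every result type the operation can
    produce when it does not raise; runtime errors are left out.
    """
    if not valid_runtime_results:
        # The operation always fails at runtime.
        return "Never"
    if len(valid_runtime_results) == 1:
        # Exactly one valid outcome, say Series[R]: show Series[R].
        return next(iter(valid_runtime_results))
    # Several valid outcomes: fall back to Series[Any].
    return "Series[Any]"


# Series[Any] - TimestampSeries: only TimedeltaSeries is ever valid.
sub_ts = static_result({"TimedeltaSeries"})
# Series[Any] + Series[int]: more than one valid result type.
add_int = static_result({"Series[int]", "Series[float]"})
# Series[Any] * TimestampSeries: never valid.
mul_ts = static_result(set())
```

Here `sub_ts` is "TimedeltaSeries", `add_int` is "Series[Any]" and `mul_ts` is "Never", matching the examples listed above.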

Thank you for reading the lengthy explanation. What do you think?

Dr-Irv · 2025-08-23T18:39:54Z

Thank you for reading the lengthy explanation. What do you think?

The challenge here is the issue of wide vs narrow types. See https://github.com/pandas-dev/pandas-stubs/blob/main/docs/philosophy.md#narrow-vs-wide-arguments for some writeup I did about that.

Let's consider this example from your list:

Series[Any] (int) - TimestampSeries -> TimedeltaSeries at type checking, error at runtime

In what is in main today, the following code works as you describe there, i.e., the type checker infers that result is TimedeltaSeries, but it fails at runtime.

si = pd.DataFrame({"a": pd.Series([1,2,3])})["a"]
st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
result = si - st

I think we do a better service to users if we actually catch this via typing, i.e., for Timedelta, TimedeltaSeries, Timestamp, TimestampSeries, str and Series[str], if they are in a binary operation with a Series[Any] (either before the operator or after the operator), the type checker reports an error. That's telling the user "We don't know how to handle a generic series with another operand that has a specified type", but we are limiting the types we do that with to just the ones I mentioned.

This makes the user then cast the variable si above to Series[int] (in which case we catch the failure), and know it will possibly fail at runtime.

Let's also consider this example:

st = pd.Series(pd.date_range("1/1/2005", "1/3/2005"))
sd = pd.DataFrame({"a": [pd.Timedelta("1 day"), pd.Timedelta("2 days"), pd.Timedelta("3 days")]})["a"]
result = st - sd

In this case, if we adopt my proposal, the type checker would say that st - sd is invalid. But the type of sd is partially unknown, so we are then suggesting that the user do:

result = st - cast("pd.Series[Timedelta]", sd)

which is telling the type checker "I know this is a series of timedeltas"
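
It may help to spell out what cast does and does not do here. The sketch below uses plain lists as a stand-in for an untyped column (an assumption made to keep it self-contained); the point is that `typing.cast` returns its argument unchanged and performs no runtime check, so a wrong cast silences the checker without making the operation safe.

```python
from typing import Any, cast

# Hypothetical stand-in for a column whose element type the checker cannot see.
untyped: Any = [1, 2, 3]

# The checker now treats the value as list[int]; nothing happens at runtime.
typed = cast("list[int]", untyped)
same_object = typed is untyped

# A wrong cast is accepted just as silently: no conversion, no validation.
wrong = cast("list[str]", untyped)
still_ints = all(isinstance(v, int) for v in wrong)
```

So the cast records the user's claim about the element type; whether the subtraction succeeds at runtime still depends on what the column actually contains.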

I'm choosing what I consider to be a happy medium here between your proposal, and something that would be too narrow (e.g., disallowing Series[Any].__sub__(Series[Any])), by suggesting that if we know the types of ONE of the operands, but not the other, we try to catch the error via static typing.

I should say that the current behavior in the stubs is from 3 years ago, when we first inherited the project from something Microsoft had started. Now that I have more experience with typing, as well as with using the stubs in my own code, I've come around to trying to catch more things with static type checking when we can, rather than fewer.

So the summary of my proposal is (with respect to Series) for binary operators a X b, where X is the operator:

If a and b are fully typed, we figure out the result, and if it is an invalid calculation, we catch it.
If a is Series[Any] and b is fully typed, we say that is an error.
If a is fully typed, and b is Series[Any], we say that is an error.
If a and b are not fully typed (i.e., one is Series[Any] and the other is Any or Series[Any]), we accept the calculation in typing and don't report an error.
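
The four cases above reduce to a small decision table. The sketch below only restates that summary in code; the names `Verdict` and `verdict` are hypothetical, and it ignores the narrower variant discussed earlier in which the error is limited to time-related and string operands.

```python
from enum import Enum


class Verdict(Enum):
    INFER = "infer the result type, or report an invalid calculation"
    ERROR = "report a typing error"
    ACCEPT = "accept without reporting an error"


def verdict(a_fully_typed: bool, b_fully_typed: bool) -> Verdict:
    """Classify a binary operation `a X b` on Series operands."""
    if a_fully_typed and b_fully_typed:
        # Both element types known: the stubs can work out the result.
        return Verdict.INFER
    if a_fully_typed != b_fully_typed:
        # Exactly one operand is Series[Any]: flag it.
        return Verdict.ERROR
    # Neither operand is fully typed: nothing useful to check.
    return Verdict.ACCEPT
```

For instance, `verdict(False, True)` (Series[Any] on the left, a typed Series on the right) and `verdict(True, False)` both give `Verdict.ERROR`, while `verdict(False, False)` gives `Verdict.ACCEPT`.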

Let me know your thoughts on that.

fix(series): arithmetics for Series[Any]

b6cdcf1

cmp0xff mentioned this pull request Aug 22, 2025

refactor: #718 only drop TimestampSeries #1274

Draft

2 tasks

Dr-Irv requested changes Aug 22, 2025

fix(comment): pandas-dev#1343 (comment)

1fb597b
