Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(typing): add np.ndarray aliases #1977

Merged
merged 15 commits into from
Feb 9, 2025
Merged

chore(typing): add np.ndarray aliases #1977

merged 15 commits into from
Feb 9, 2025

Conversation

dangotbanned
Copy link
Member

@dangotbanned dangotbanned commented Feb 9, 2025

Closes #1972

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Resolves warnings that look like this:

narwhals/_pandas_like/dataframe.py:858: error: Missing type parameters for generic type "ndarray"  [type-arg]

On main, this accounts for 25 warnings:

image

Noticed a few cases that use the original guard and combined with an `ndim == ...`
Now we get that in one call with type narrowing
Tried to resolve a variance issue in `PandasLikeDataFrame.to_numpy`, but abandoned for now
- Sub-package has a huge amount of unrelated warnings
- Tried to fix *only* those that are related to `np.ndarray`
import pandas as pd
import polars as pl
from pyarrow._stubs_typing import Indices # type: ignore # noqa: PGH003
Copy link
Member Author

@dangotbanned dangotbanned Feb 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Import is refering to https://github.com/zen-xu/pyarrow-stubs/blob/d97063876720e6a5edda7eb15f4efe07c31b8296/pyarrow-stubs/_stubs_typing.pyi#L27

That type is causing a lot of the headaches with pyarrow:

Indices: TypeAlias = list[int] | NDArray[np.integer] | IntegerArray

Where we are using Sequence[int] it causes a conflict, but AFAICT list[int] is overly narrow on their end

@@ -207,28 +217,30 @@ def __getitem__(
isinstance(item, tuple)
and len(item) == 2
and is_sequence_but_not_str(item[1])
and not isinstance(item[0], str)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I struggled to follow the logic here, but this seems like it should've been there?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should already have been validated at the narwhals/dataframe.py level - but sure, doesn't hurt to be more explicit about what we're dealing with when we get here πŸ‘

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@MarcoGorelli I really think all these CompliantDataFrame.__getitem__ methods would benefit from some reusable code (per #1942)

Comment on lines +245 to +252
def is_numpy_array_1d(arr: Any) -> TypeIs[_1DArray]:
"""Check whether `arr` is a 1D NumPy Array without importing NumPy."""
return is_numpy_array(arr) and arr.ndim == 1


def is_numpy_array_2d(arr: Any) -> TypeIs[_2DArray]:
"""Check whether `arr` is a 2D NumPy Array without importing NumPy."""
return is_numpy_array(arr) and arr.ndim == 2
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suffixing the current guard name seemed reasonable, but it is a bit odd having:
array_1d -> _1DArray

@dangotbanned

This comment was marked as resolved.

@dangotbanned dangotbanned marked this pull request as ready for review February 9, 2025 14:38
Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @dangotbanned ! this 1DArray / 2DArray is pretty nice, I think it would be useful in pandas-stubs too, i think i'll suggest it there

@@ -207,28 +217,30 @@ def __getitem__(
isinstance(item, tuple)
and len(item) == 2
and is_sequence_but_not_str(item[1])
and not isinstance(item[0], str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should already have been validated at the narwhals/dataframe.py level - but sure, doesn't hurt to be more explicit about what we're dealing with when we get here πŸ‘

@MarcoGorelli MarcoGorelli merged commit 7b2db00 into main Feb 9, 2025
28 checks passed
@MarcoGorelli MarcoGorelli deleted the typing-ndarray branch February 9, 2025 18:00
@@ -819,7 +820,7 @@ def calculate_timestamp_date(s: pd.Series, time_unit: str) -> pd.Series:

def select_columns_by_name(
df: T,
column_names: Sequence[str],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❀️ yes I needed this too

@dangotbanned
Copy link
Member Author

thanks @dangotbanned ! this 1DArray / 2DArray is pretty nice, I think it would be useful in pandas-stubs too, i think i'll suggest it there

Thanks @MarcoGorelli!

I can't take all the credit there, I'm only simplifying parts of numpy(s) stubs:

AFAICT, we only care about .ndim == 1 and .ndim == 2, so we can forgo a lot of the complexities pandas-stubs may have to reckon with

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

add np.ndarray alias(es)
3 participants