Commit 510d4b3

Merge remote-tracking branch 'upstream/main' into ref/merge_blocks
2 parents e21b8d5 + ead37b2 commit 510d4b3

17 files changed

+361
-114
lines changed

.pre-commit-config.yaml

Lines changed: 5 additions & 5 deletions
@@ -19,7 +19,7 @@ ci:
     skip: [pyright, mypy]
 repos:
 - repo: https://github.com/astral-sh/ruff-pre-commit
-  rev: v0.13.3
+  rev: v0.14.3
   hooks:
   - id: ruff
     args: [--exit-non-zero-on-fix]
@@ -46,7 +46,7 @@ repos:
   - id: codespell
     types_or: [python, rst, markdown, cython, c]
 - repo: https://github.com/MarcoGorelli/cython-lint
-  rev: v0.17.0
+  rev: v0.18.1
   hooks:
   - id: cython-lint
   - id: double-quote-cython-strings
@@ -67,11 +67,11 @@ repos:
   - id: trailing-whitespace
     args: [--markdown-linebreak-ext=md]
 - repo: https://github.com/PyCQA/isort
-  rev: 6.1.0
+  rev: 7.0.0
   hooks:
   - id: isort
 - repo: https://github.com/asottile/pyupgrade
-  rev: v3.20.0
+  rev: v3.21.0
   hooks:
   - id: pyupgrade
     args: [--py311-plus]
@@ -87,7 +87,7 @@ repos:
     types: [text] # overwrite types: [rst]
     types_or: [python, rst]
 - repo: https://github.com/sphinx-contrib/sphinx-lint
-  rev: v1.0.0
+  rev: v1.0.1
   hooks:
   - id: sphinx-lint
     args: ["--enable", "all", "--disable", "line-too-long"]

doc/source/user_guide/groupby.rst

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ We could naturally group by either the ``A`` or ``B`` columns, or both:
 
 ``df.groupby('A')`` is just syntactic sugar for ``df.groupby(df['A'])``.
 
-The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do
+DataFrame groupby always operates along axis 0 (rows). To split by columns, first do
 a transpose:
 
 .. ipython::

doc/source/whatsnew/v3.0.0.rst

Lines changed: 3 additions & 0 deletions
@@ -739,6 +739,7 @@ Other Deprecations
 - Deprecated option "future.no_silent_downcasting", as it is no longer used. In a future version accessing this option will raise (:issue:`59502`)
 - Deprecated silent casting of non-datetime 'other' to datetime in :meth:`Series.combine_first` (:issue:`62931`)
 - Deprecated slicing on a :class:`Series` or :class:`DataFrame` with a :class:`DatetimeIndex` using a ``datetime.date`` object, explicitly cast to :class:`Timestamp` instead (:issue:`35830`)
+- Deprecated support for the Dataframe Interchange Protocol (:issue:`56732`)
 - Deprecated the 'inplace' keyword from :meth:`Resampler.interpolate`, as passing ``True`` raises ``AttributeError`` (:issue:`58690`)
 
 .. ---------------------------------------------------------------------------
@@ -961,6 +962,7 @@ Categorical
 ^^^^^^^^^^^
 - Bug in :class:`Categorical` where constructing from a pandas :class:`Series` or :class:`Index` with ``dtype='object'`` did not preserve the categories' dtype as ``object``; now the ``categories.dtype`` is preserved as ``object`` for these cases, while numpy arrays and Python sequences with ``dtype='object'`` continue to infer the most specific dtype (for example, ``str`` if all elements are strings) (:issue:`61778`)
 - Bug in :func:`Series.apply` where ``nan`` was ignored for :class:`CategoricalDtype` (:issue:`59938`)
+- Bug in :func:`bdate_range` raising ``ValueError`` with frequency ``freq="cbh"`` (:issue:`62849`)
 - Bug in :func:`testing.assert_index_equal` raising ``TypeError`` instead of ``AssertionError`` for incomparable ``CategoricalIndex`` when ``check_categorical=True`` and ``exact=False`` (:issue:`61935`)
 - Bug in :meth:`Categorical.astype` where ``copy=False`` would still trigger a copy of the codes (:issue:`62000`)
 - Bug in :meth:`DataFrame.pivot` and :meth:`DataFrame.set_index` raising an ``ArrowNotImplementedError`` for columns with pyarrow dictionary dtype (:issue:`53051`)
@@ -1180,6 +1182,7 @@ Groupby/resample/rolling
 - Bug in :meth:`Rolling.apply` for ``method="table"`` where column order was not being respected due to the columns getting sorted by default. (:issue:`59666`)
 - Bug in :meth:`Rolling.apply` where the applied function could be called on fewer than ``min_period`` periods if ``method="table"``. (:issue:`58868`)
 - Bug in :meth:`Series.resample` could raise when the date range ended shortly before a non-existent time. (:issue:`58380`)
+- Bug in :meth:`Series.resample` raising error when resampling non-nanosecond resolutions out of bounds for nanosecond precision (:issue:`57427`)
 
 Reshaping
 ^^^^^^^^^

pandas/_libs/tslibs/offsets.pyx

Lines changed: 13 additions & 4 deletions
@@ -5688,18 +5688,27 @@ def shift_month(stamp: datetime, months: int, day_opt: object = None) -> datetim
     cdef:
         int year, month, day
         int days_in_month, dy
+        npy_datetimestruct dts
+
+    if isinstance(stamp, _Timestamp):
+        creso = (<_Timestamp>stamp)._creso
+        val = (<_Timestamp>stamp)._value
+        pandas_datetime_to_datetimestruct(val, creso, &dts)
+    else:
+        # Plain datetime/date
+        pydate_to_dtstruct(stamp, &dts)
 
-    dy = (stamp.month + months) // 12
-    month = (stamp.month + months) % 12
+    dy = (dts.month + months) // 12
+    month = (dts.month + months) % 12
 
     if month == 0:
         month = 12
         dy -= 1
-    year = stamp.year + dy
+    year = dts.year + dy
 
     if day_opt is None:
         days_in_month = get_days_in_month(year, month)
-        day = min(stamp.day, days_in_month)
+        day = min(dts.day, days_in_month)
     elif day_opt == "start":
         day = 1
     elif day_opt == "end":
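
Review note: the month arithmetic in this hunk can be checked in isolation. The following is a minimal pure-Python sketch, not the Cython implementation. The name `shift_month_sketch` is hypothetical, `calendar.monthrange` stands in for `get_days_in_month`, and the body of the `"end"` branch is an assumption because the hunk is cut off at that line.

```python
import calendar
from datetime import datetime
from typing import Optional


def shift_month_sketch(
    stamp: datetime, months: int, day_opt: Optional[str] = None
) -> datetime:
    """Pure-Python approximation of the month arithmetic shown in the hunk."""
    # Floor division and modulo keep the result consistent for negative shifts.
    dy = (stamp.month + months) // 12
    month = (stamp.month + months) % 12

    # A remainder of 0 means December, which belongs to the previous year bucket.
    if month == 0:
        month = 12
        dy -= 1
    year = stamp.year + dy

    # calendar.monthrange stands in for get_days_in_month here.
    days_in_month = calendar.monthrange(year, month)[1]
    if day_opt is None:
        # Clamp the day so that e.g. Jan 31 + 1 month lands on the last day of Feb.
        day = min(stamp.day, days_in_month)
    elif day_opt == "start":
        day = 1
    elif day_opt == "end":
        # Assumption: the truncated branch selects the last day of the month.
        day = days_in_month
    else:
        raise ValueError(f"unsupported day_opt: {day_opt!r}")
    return stamp.replace(year=year, month=month, day=day)
```

The `month == 0` adjustment is the part that is easy to get wrong: shifting November by one month yields a remainder of 0, which has to map to December of the same year rather than to month 0 of the next.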

pandas/conftest.py

Lines changed: 2 additions & 0 deletions
@@ -135,12 +135,14 @@ def pytest_collection_modifyitems(items, config) -> None:
     # Warnings from doctests that can be ignored; place reason in comment above.
     # Each entry specifies (path, message) - see the ignore_doctest_warning function
     ignored_doctest_warnings = [
+        ("api.interchange.from_dataframe", ".*Interchange Protocol is deprecated"),
         ("is_int64_dtype", "is_int64_dtype is deprecated"),
         ("is_interval_dtype", "is_interval_dtype is deprecated"),
         ("is_period_dtype", "is_period_dtype is deprecated"),
         ("is_datetime64tz_dtype", "is_datetime64tz_dtype is deprecated"),
         ("is_categorical_dtype", "is_categorical_dtype is deprecated"),
         ("is_sparse", "is_sparse is deprecated"),
+        ("DataFrame.__dataframe__", "Interchange Protocol is deprecated"),
         ("DataFrameGroupBy.fillna", "DataFrameGroupBy.fillna is deprecated"),
         ("DataFrameGroupBy.corrwith", "DataFrameGroupBy.corrwith is deprecated"),
         ("NDFrame.replace", "Series.replace without 'value'"),

pandas/core/frame.py

Lines changed: 17 additions & 2 deletions
@@ -916,6 +916,14 @@ def __dataframe__(
         """
         Return the dataframe interchange object implementing the interchange protocol.
 
+        .. deprecated:: 3.0.0
+
+            The Dataframe Interchange Protocol is deprecated.
+            For dataframe-agnostic code, you may want to look into:
+
+            - `Arrow PyCapsule Interface <https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html>`_
+            - `Narwhals <https://github.com/narwhals-dev/narwhals>`_
+
         .. note::
 
            For new development, we highly recommend using the Arrow C Data Interface
@@ -970,7 +978,14 @@ def __dataframe__(
         These methods (``column_names``, ``select_columns_by_name``) should work
         for any dataframe library which implements the interchange protocol.
         """
-
+        warnings.warn(
+            "The Dataframe Interchange Protocol is deprecated.\n"
+            "For dataframe-agnostic code, you may want to look into:\n"
+            "- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n"
+            "- Narwhals: https://github.com/narwhals-dev/narwhals\n",
+            Pandas4Warning,
+            stacklevel=find_stack_level(),
+        )
         from pandas.core.interchange.dataframe import PandasDataFrameXchg
 
         return PandasDataFrameXchg(self, allow_copy=allow_copy)
@@ -9430,7 +9445,7 @@ def groupby(
             index. If a dict or Series is passed, the Series or dict VALUES
             will be used to determine the groups (the Series' values are first
             aligned; see ``.align()`` method). If a list or ndarray of length
-            equal to the selected axis is passed (see the `groupby user guide
+            equal to the number of rows is passed (see the `groupby user guide
             <https://pandas.pydata.org/pandas-docs/stable/user_guide/groupby.html#splitting-an-object-into-groups>`_),
             the values are used as-is to determine the groups. A label or list
             of labels may be passed to group by the columns in ``self``.

pandas/core/indexes/datetimes.py

Lines changed: 4 additions & 2 deletions
@@ -1133,12 +1133,14 @@ def bdate_range(
         msg = "freq must be specified for bdate_range; use date_range instead"
         raise TypeError(msg)
 
-    if isinstance(freq, str) and freq.startswith("C"):
+    if isinstance(freq, str) and freq.upper().startswith("C"):
+        msg = f"invalid custom frequency string: {freq}"
+        if freq == "CBH":
+            raise ValueError(f"{msg}, did you mean cbh?")
         try:
             weekmask = weekmask or "Mon Tue Wed Thu Fri"
             freq = prefix_mapping[freq](holidays=holidays, weekmask=weekmask)
         except (KeyError, TypeError) as err:
-            msg = f"invalid custom frequency string: {freq}"
             raise ValueError(msg) from err
     elif holidays or weekmask:
         msg = (
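
Review note: the control flow of the new check can be exercised without pandas. The sketch below is a simplified stand-in, not the library code. `PREFIX_MAPPING_SKETCH` and `resolve_custom_freq_sketch` are hypothetical names, and the mapping holds plain strings rather than offset classes.

```python
# Hypothetical stand-in for pandas' prefix_mapping; keys are case-sensitive aliases.
PREFIX_MAPPING_SKETCH = {
    "C": "CustomBusinessDay",
    "cbh": "CustomBusinessHour",
}


def resolve_custom_freq_sketch(freq: str, mapping=PREFIX_MAPPING_SKETCH) -> str:
    """Mirror the branch order of the hunk: prefix test, CBH hint, then lookup."""
    # The case-insensitive prefix test is what lets lowercase "cbh" reach the lookup.
    if not freq.upper().startswith("C"):
        raise ValueError(f"not a custom frequency string: {freq}")

    # The message is built once, up front, so both error paths can share it.
    msg = f"invalid custom frequency string: {freq}"
    if freq == "CBH":
        # The uppercase spelling gets a targeted hint instead of a bare lookup failure.
        raise ValueError(f"{msg}, did you mean cbh?")
    try:
        return mapping[freq]
    except KeyError as err:
        raise ValueError(msg) from err
```

Under these assumptions, `"cbh"` resolves, `"CBH"` raises with the hint, and any other unknown alias starting with `c` or `C` raises the generic message.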

pandas/core/interchange/from_dataframe.py

Lines changed: 23 additions & 1 deletion
@@ -6,13 +6,16 @@
     Any,
     overload,
 )
+import warnings
 
 import numpy as np
 
 from pandas._config import using_string_dtype
 
 from pandas.compat._optional import import_optional_dependency
+from pandas.errors import Pandas4Warning
 from pandas.util._decorators import set_module
+from pandas.util._exceptions import find_stack_level
 
 import pandas as pd
 from pandas.core.interchange.dataframe_protocol import (
@@ -47,6 +50,9 @@ def from_dataframe(df, allow_copy: bool = True) -> pd.DataFrame:
     From pandas 3.0 onwards, `from_dataframe` uses the PyCapsule Interface,
     only falling back to the interchange protocol if that fails.
 
+    From pandas 4.0 onwards, that fallback will no longer be available and only
+    the PyCapsule Interface will be used.
+
     .. warning::
 
         Due to severe implementation issues, we recommend only considering using the
@@ -99,7 +105,14 @@ def from_dataframe(df, allow_copy: bool = True) -> pd.DataFrame:
         pa = import_optional_dependency("pyarrow", min_version="14.0.0")
     except ImportError:
         # fallback to _from_dataframe
-        pass
+        warnings.warn(
+            "Conversion using Arrow PyCapsule Interface failed due to "
+            "missing PyArrow>=14 dependency, falling back to (deprecated) "
+            "interchange protocol. We recommend that you install "
+            "PyArrow>=14.0.0.",
+            UserWarning,
+            stacklevel=find_stack_level(),
+        )
     else:
         try:
             return pa.table(df).to_pandas(zero_copy_only=not allow_copy)
@@ -109,6 +122,15 @@ def from_dataframe(df, allow_copy: bool = True) -> pd.DataFrame:
     if not hasattr(df, "__dataframe__"):
         raise ValueError("`df` does not support __dataframe__")
 
+    warnings.warn(
+        "The Dataframe Interchange Protocol is deprecated.\n"
+        "For dataframe-agnostic code, you may want to look into:\n"
+        "- Arrow PyCapsule Interface: https://arrow.apache.org/docs/format/CDataInterface/PyCapsuleInterface.html\n"
+        "- Narwhals: https://github.com/narwhals-dev/narwhals\n",
+        Pandas4Warning,
+        stacklevel=find_stack_level(),
+    )
+
     return _from_dataframe(
         df.__dataframe__(allow_copy=allow_copy), allow_copy=allow_copy
     )
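
Review note: after this change a caller without the optional dependency sees two warnings in sequence, one for the missing dependency and one for the deprecated path. The standard-library sketch below illustrates that pattern only. `FutureRemovalWarning` and `convert_sketch` are hypothetical names, a fixed `stacklevel=2` replaces `find_stack_level()`, and the conversion-failure fallthrough inside the `else` branch is not modelled.

```python
import importlib
import warnings


class FutureRemovalWarning(DeprecationWarning):
    """Hypothetical stand-in for a library-specific deprecation category."""


def convert_sketch(obj, preferred_module: str = "pyarrow"):
    """Try the preferred path first, then warn and use the deprecated one."""
    try:
        importlib.import_module(preferred_module)
    except ImportError:
        # Previously a silent `pass`; the fallback is now made visible to the caller.
        warnings.warn(
            f"{preferred_module} is not installed, falling back to the "
            "deprecated path.",
            UserWarning,
            stacklevel=2,
        )
    else:
        return ("preferred", obj)

    # Reaching this point means the deprecated path is about to be used.
    warnings.warn(
        "The legacy path is deprecated.",
        FutureRemovalWarning,
        stacklevel=2,
    )
    return ("legacy", obj)
```

The fallback warning is a `UserWarning`, which Python shows by default, while the deprecation uses a dedicated category that downstream code can filter or escalate on its own.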

pandas/tests/frame/methods/test_join.py

Lines changed: 24 additions & 0 deletions
@@ -575,3 +575,27 @@ def test_frame_join_tzaware(self):
 
         tm.assert_index_equal(result.index, expected)
         assert result.index.tz.key == "US/Central"
+
+    def test_frame_join_categorical_index(self):
+        # GH 61675
+        cat_data = pd.Categorical(
+            [3, 4],
+            categories=pd.Series([2, 3, 4, 5], dtype="Int64"),
+            ordered=True,
+        )
+        values1 = "a b".split()
+        values2 = "foo bar".split()
+        df1 = DataFrame({"hr": cat_data, "values1": values1}).set_index("hr")
+        df2 = DataFrame({"hr": cat_data, "values2": values2}).set_index("hr")
+        df1.columns = pd.CategoricalIndex([4], dtype=cat_data.dtype, name="other_hr")
+        df2.columns = pd.CategoricalIndex([3], dtype=cat_data.dtype, name="other_hr")
+
+        df_joined = df1.join(df2)
+        expected = DataFrame(
+            {"hr": cat_data, "values1": values1, "values2": values2}
+        ).set_index("hr")
+        expected.columns = pd.CategoricalIndex(
+            [4, 3], dtype=cat_data.dtype, name="other_hr"
+        )
+
+        tm.assert_frame_equal(df_joined, expected)

pandas/tests/indexes/datetimes/test_date_range.py

Lines changed: 34 additions & 1 deletion
@@ -1216,7 +1216,7 @@ def test_cdaterange_holidays_weekmask_requires_freqstr(self):
             )
 
     @pytest.mark.parametrize(
-        "freq", [freq for freq in prefix_mapping if freq.startswith("C")]
+        "freq", [freq for freq in prefix_mapping if freq.upper().startswith("C")]
     )
     def test_all_custom_freq(self, freq):
         # should not raise
@@ -1280,6 +1280,39 @@ def test_data_range_custombusinessday_partial_time(self, unit):
         )
         tm.assert_index_equal(result, expected)
 
+    def test_cdaterange_cbh(self):
+        # GH#62849
+        result = bdate_range(
+            "2009-03-13",
+            "2009-03-15",
+            freq="cbh",
+            weekmask="Mon Wed Fri",
+            holidays=["2009-03-14"],
+        )
+        expected = DatetimeIndex(
+            [
+                "2009-03-13 09:00:00",
+                "2009-03-13 10:00:00",
+                "2009-03-13 11:00:00",
+                "2009-03-13 12:00:00",
+                "2009-03-13 13:00:00",
+                "2009-03-13 14:00:00",
+                "2009-03-13 15:00:00",
+                "2009-03-13 16:00:00",
+            ],
+            dtype="datetime64[ns]",
+            freq="cbh",
+        )
+        tm.assert_index_equal(result, expected)
+
+    def test_cdaterange_deprecated_error_CBH(self):
+        # GH#62849
+        msg = "invalid custom frequency string: CBH, did you mean cbh?"
+        with pytest.raises(ValueError, match=msg):
+            bdate_range(
+                START, END, freq="CBH", weekmask="Mon Wed Fri", holidays=["2009-03-14"]
+            )
+
 
 class TestDateRangeNonNano:
     def test_date_range_reso_validation(self):
