Update dependency pandas to v1.5.3 - autoclosed #143
Conversation
[puLL-Merge] - pandas-dev/[email protected]

Diff

diff --git .devcontainer.json .devcontainer.json
index 8bea96aea29c1..7c5d009260c64 100644
--- .devcontainer.json
+++ .devcontainer.json
@@ -9,8 +9,7 @@
// You can edit these settings after create using File > Preferences > Settings > Remote.
"settings": {
"terminal.integrated.shell.linux": "/bin/bash",
- "python.condaPath": "/opt/conda/bin/conda",
- "python.pythonPath": "/opt/conda/bin/python",
+ "python.pythonPath": "/usr/local/bin/python",
"python.formatting.provider": "black",
"python.linting.enabled": true,
"python.linting.flake8Enabled": true,
diff --git .github/actions/setup-conda/action.yml .github/actions/setup-conda/action.yml
index 002d0020c2df1..7d1e54052f938 100644
--- .github/actions/setup-conda/action.yml
+++ .github/actions/setup-conda/action.yml
@@ -18,7 +18,7 @@ runs:
- name: Set Arrow version in ${{ inputs.environment-file }} to ${{ inputs.pyarrow-version }}
run: |
grep -q ' - pyarrow' ${{ inputs.environment-file }}
- sed -i"" -e "s/ - pyarrow/ - pyarrow=${{ inputs.pyarrow-version }}/" ${{ inputs.environment-file }}
+ sed -i"" -e "s/ - pyarrow<10/ - pyarrow=${{ inputs.pyarrow-version }}/" ${{ inputs.environment-file }}
cat ${{ inputs.environment-file }}
shell: bash
if: ${{ inputs.pyarrow-version }}
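Note: the sed source pattern changes in lockstep with the ``pyarrow<10`` pin added to ``environment.yml`` and the ``ci/deps`` files further down; if it still matched the bare ``- pyarrow`` prefix it would rewrite the pinned line to ``- pyarrow=X<10``. A rough Python equivalent of the substitution, where ``pyarrow_version`` stands in for the ``inputs.pyarrow-version`` action input:

    import re

    line = " - pyarrow<10"
    pyarrow_version = "9"  # stand-in for inputs.pyarrow-version
    # Replace the upper-bound pin with an exact version, as the sed call does
    print(re.sub(r"- pyarrow<10", f"- pyarrow={pyarrow_version}", line))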
diff --git .github/workflows/32-bit-linux.yml .github/workflows/32-bit-linux.yml
index 8c9f0b594f321..49df3a077cfe7 100644
--- .github/workflows/32-bit-linux.yml
+++ .github/workflows/32-bit-linux.yml
@@ -5,12 +5,10 @@ on:
branches:
- main
- 1.5.x
- - 1.4.x
pull_request:
branches:
- main
- 1.5.x
- - 1.4.x
paths-ignore:
- "doc/**"
@@ -19,7 +17,7 @@ permissions:
jobs:
pytest:
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
steps:
- name: Checkout
uses: actions/checkout@v3
diff --git .github/workflows/assign.yml .github/workflows/assign.yml
index b7bb8db549f86..b3331060823a9 100644
--- .github/workflows/assign.yml
+++ .github/workflows/assign.yml
@@ -11,7 +11,7 @@ jobs:
permissions:
issues: write
pull-requests: write
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
steps:
- if: github.event.comment.body == 'take'
run: |
diff --git .github/workflows/asv-bot.yml .github/workflows/asv-bot.yml
index abb19a95315b6..d264698e60485 100644
--- .github/workflows/asv-bot.yml
+++ .github/workflows/asv-bot.yml
@@ -21,7 +21,7 @@ jobs:
name: "Run benchmarks"
# TODO: Support more benchmarking options later, against different branches, against self, etc
if: startsWith(github.event.comment.body, '@github-actions benchmark')
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
defaults:
run:
shell: bash -el {0}
diff --git .github/workflows/autoupdate-pre-commit-config.yml .github/workflows/autoupdate-pre-commit-config.yml
index 9a41871c26062..376aa8343c571 100644
--- .github/workflows/autoupdate-pre-commit-config.yml
+++ .github/workflows/autoupdate-pre-commit-config.yml
@@ -15,10 +15,10 @@ jobs:
pull-requests: write # for technote-space/create-pr-action to create a PR
if: github.repository_owner == 'pandas-dev'
name: Autoupdate pre-commit config
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
steps:
- name: Set up Python
- uses: actions/setup-python@v3
+ uses: actions/setup-python@v4
- name: Cache multiple paths
uses: actions/cache@v3
with:
diff --git .github/workflows/code-checks.yml .github/workflows/code-checks.yml
index 6aff77c708378..cb95b224ba677 100644
--- .github/workflows/code-checks.yml
+++ .github/workflows/code-checks.yml
@@ -5,12 +5,10 @@ on:
branches:
- main
- 1.5.x
- - 1.4.x
pull_request:
branches:
- main
- 1.5.x
- - 1.4.x
env:
ENV_FILE: environment.yml
@@ -22,7 +20,7 @@ permissions:
jobs:
pre_commit:
name: pre-commit
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
concurrency:
# https://github.community/t/concurrecy-not-work-for-push/183068/7
group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-pre-commit
@@ -32,16 +30,16 @@ jobs:
uses: actions/checkout@v3
- name: Install Python
- uses: actions/setup-python@v3
+ uses: actions/setup-python@v4
with:
- python-version: '3.9.7'
+ python-version: '3.9'
- name: Run pre-commit
uses: pre-commit/[email protected]
typing_and_docstring_validation:
name: Docstring and typing validation
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
defaults:
run:
shell: bash -el {0}
@@ -100,7 +98,7 @@ jobs:
asv-benchmarks:
name: ASV Benchmarks
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
defaults:
run:
shell: bash -el {0}
@@ -131,7 +129,7 @@ jobs:
build_docker_dev_environment:
name: Build Docker Dev Environment
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
defaults:
run:
shell: bash -el {0}
@@ -154,11 +152,11 @@ jobs:
run: docker build --pull --no-cache --tag pandas-dev-env .
- name: Show environment
- run: docker run -w /home/pandas pandas-dev-env mamba run -n pandas-dev python -c "import pandas as pd; print(pd.show_versions())"
+ run: docker run --rm pandas-dev-env python -c "import pandas as pd; print(pd.show_versions())"
requirements-dev-text-installable:
name: Test install requirements-dev.txt
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
concurrency:
# https://github.community/t/concurrecy-not-work-for-push/183068/7
@@ -173,7 +171,7 @@ jobs:
- name: Setup Python
id: setup_python
- uses: actions/setup-python@v3
+ uses: actions/setup-python@v4
with:
python-version: '3.8'
cache: 'pip'
diff --git .github/workflows/codeql.yml .github/workflows/codeql.yml
index 457aa69fb924f..05a5d003c1dd1 100644
--- .github/workflows/codeql.yml
+++ .github/workflows/codeql.yml
@@ -10,7 +10,7 @@ concurrency:
jobs:
analyze:
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
permissions:
actions: read
contents: read
diff --git .github/workflows/docbuild-and-upload.yml .github/workflows/docbuild-and-upload.yml
index cfb4966847721..13da56806de6e 100644
--- .github/workflows/docbuild-and-upload.yml
+++ .github/workflows/docbuild-and-upload.yml
@@ -5,14 +5,12 @@ on:
branches:
- main
- 1.5.x
- - 1.4.x
tags:
- '*'
pull_request:
branches:
- main
- 1.5.x
- - 1.4.x
env:
ENV_FILE: environment.yml
@@ -24,7 +22,7 @@ permissions:
jobs:
web_and_docs:
name: Doc Build and Upload
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
concurrency:
# https://github.community/t/concurrecy-not-work-for-push/183068/7
@@ -66,22 +64,22 @@ jobs:
mkdir -m 700 -p ~/.ssh
echo "${{ secrets.server_ssh_key }}" > ~/.ssh/id_rsa
chmod 600 ~/.ssh/id_rsa
- echo "${{ secrets.server_ip }} ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBE1Kkopomm7FHG5enATf7SgnpICZ4W2bw+Ho+afqin+w7sMcrsa0je7sbztFAV8YchDkiBKnWTG4cRT+KZgZCaY=" > ~/.ssh/known_hosts
+ echo "${{ secrets.server_ip }} ecdsa-sha2-nistp256 AAAAE2VjZHNhLXNoYTItbmlzdHAyNTYAAAAIbmlzdHAyNTYAAABBBFjYkJBk7sos+r7yATODogQc3jUdW1aascGpyOD4bohj8dWjzwLJv/OJ/fyOQ5lmj81WKDk67tGtqNJYGL9acII=" > ~/.ssh/known_hosts
if: github.event_name == 'push' && (github.ref == 'refs/heads/main' || startsWith(github.ref, 'refs/tags/'))
- name: Copy cheatsheets into site directory
run: cp doc/cheatsheet/Pandas_Cheat_Sheet* web/build/
- name: Upload web
- run: rsync -az --delete --exclude='pandas-docs' --exclude='docs' web/build/ docs@${{ secrets.server_ip }}:/usr/share/nginx/pandas
+ run: rsync -az --delete --exclude='pandas-docs' --exclude='docs' web/build/ web@${{ secrets.server_ip }}:/var/www/html
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
- name: Upload dev docs
- run: rsync -az --delete doc/build/html/ docs@${{ secrets.server_ip }}:/usr/share/nginx/pandas/pandas-docs/dev
+ run: rsync -az --delete doc/build/html/ web@${{ secrets.server_ip }}:/var/www/html/pandas-docs/dev
if: github.event_name == 'push' && github.ref == 'refs/heads/main'
- name: Upload prod docs
- run: rsync -az --delete doc/build/html/ docs@${{ secrets.server_ip }}:/usr/share/nginx/pandas/pandas-docs/version/${GITHUB_REF_NAME:1}
+ run: rsync -az --delete doc/build/html/ web@${{ secrets.server_ip }}:/var/www/html/pandas-docs/version/${GITHUB_REF_NAME:1}
if: github.event_name == 'push' && startsWith(github.ref, 'refs/tags/')
- name: Move docs into site directory
diff --git .github/workflows/macos-windows.yml .github/workflows/macos-windows.yml
index 8b3d69943bd9d..5da2d0d281edd 100644
--- .github/workflows/macos-windows.yml
+++ .github/workflows/macos-windows.yml
@@ -5,12 +5,10 @@ on:
branches:
- main
- 1.5.x
- - 1.4.x
pull_request:
branches:
- main
- 1.5.x
- - 1.4.x
paths-ignore:
- "doc/**"
diff --git .github/workflows/python-dev.yml .github/workflows/python-dev.yml
index 683e694069582..0d265182b3924 100644
--- .github/workflows/python-dev.yml
+++ .github/workflows/python-dev.yml
@@ -25,12 +25,10 @@ on:
branches:
- main
- 1.5.x
- - 1.4.x
pull_request:
branches:
- main
- 1.5.x
- - 1.4.x
paths-ignore:
- "doc/**"
@@ -51,7 +49,7 @@ jobs:
strategy:
fail-fast: false
matrix:
- os: [ubuntu-latest, macOS-latest, windows-latest]
+ os: [ubuntu-22.04, macOS-latest, windows-latest]
name: actions-311-dev
timeout-minutes: 120
@@ -75,7 +73,7 @@ jobs:
run: |
python --version
python -m pip install --upgrade pip setuptools wheel
- python -m pip install -i https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
+ python -m pip install --pre --extra-index-url https://pypi.anaconda.org/scipy-wheels-nightly/simple numpy
python -m pip install git+https://github.com/nedbat/coveragepy.git
python -m pip install python-dateutil pytz cython hypothesis==6.52.1 pytest>=6.2.5 pytest-xdist pytest-cov pytest-asyncio>=0.17
python -m pip list
diff --git .github/workflows/sdist.yml .github/workflows/sdist.yml
index 14cede7bc1a39..46b453532ad0b 100644
--- .github/workflows/sdist.yml
+++ .github/workflows/sdist.yml
@@ -5,12 +5,10 @@ on:
branches:
- main
- 1.5.x
- - 1.4.x
pull_request:
branches:
- main
- 1.5.x
- - 1.4.x
types: [labeled, opened, synchronize, reopened]
paths-ignore:
- "doc/**"
@@ -21,7 +19,7 @@ permissions:
jobs:
build:
if: ${{ github.event.label.name == 'Build' || contains(github.event.pull_request.labels.*.name, 'Build') || github.event_name == 'push'}}
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
timeout-minutes: 60
defaults:
run:
@@ -30,7 +28,7 @@ jobs:
strategy:
fail-fast: false
matrix:
- python-version: ["3.8", "3.9", "3.10"]
+ python-version: ["3.8", "3.9", "3.10", "3.11"]
concurrency:
# https://github.community/t/concurrecy-not-work-for-push/183068/7
group: ${{ github.event_name == 'push' && github.run_number || github.ref }}-${{matrix.python-version}}-sdist
@@ -42,7 +40,7 @@ jobs:
fetch-depth: 0
- name: Set up Python
- uses: actions/setup-python@v3
+ uses: actions/setup-python@v4
with:
python-version: ${{ matrix.python-version }}
@@ -86,6 +84,8 @@ jobs:
pip install numpy==1.20.3 ;;
3.10)
pip install numpy==1.21.2 ;;
+ 3.11)
+ pip install numpy==1.23.2 ;;
esac
- name: Import pandas
diff --git .github/workflows/stale-pr.yml .github/workflows/stale-pr.yml
index 69656be18a8b1..c47745e097d17 100644
--- .github/workflows/stale-pr.yml
+++ .github/workflows/stale-pr.yml
@@ -11,7 +11,7 @@ jobs:
stale:
permissions:
pull-requests: write
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
steps:
- uses: actions/stale@v4
with:
diff --git .github/workflows/ubuntu.yml .github/workflows/ubuntu.yml
index b7cddc6bb3d05..4602d12d8505e 100644
--- .github/workflows/ubuntu.yml
+++ .github/workflows/ubuntu.yml
@@ -5,12 +5,10 @@ on:
branches:
- main
- 1.5.x
- - 1.4.x
pull_request:
branches:
- main
- 1.5.x
- - 1.4.x
paths-ignore:
- "doc/**"
@@ -22,7 +20,7 @@ permissions:
jobs:
pytest:
- runs-on: ubuntu-latest
+ runs-on: ubuntu-22.04
defaults:
run:
shell: bash -el {0}
diff --git Dockerfile Dockerfile
index 9de8695b24274..7230dcab20f6e 100644
--- Dockerfile
+++ Dockerfile
@@ -1,42 +1,13 @@
-FROM quay.io/condaforge/mambaforge
+FROM python:3.10.8
+WORKDIR /home/pandas
-# if you forked pandas, you can pass in your own GitHub username to use your fork
-# i.e. gh_username=myname
-ARG gh_username=pandas-dev
-ARG pandas_home="/home/pandas"
+RUN apt-get update && apt-get -y upgrade
+RUN apt-get install -y build-essential
-# Avoid warnings by switching to noninteractive
-ENV DEBIAN_FRONTEND=noninteractive
+# hdf5 needed for pytables installation
+RUN apt-get install -y libhdf5-dev
-# Configure apt and install packages
-RUN apt-get update \
- && apt-get -y install --no-install-recommends apt-utils git tzdata dialog 2>&1 \
- #
- # Configure timezone (fix for tests which try to read from "/etc/localtime")
- && ln -fs /usr/share/zoneinfo/Etc/UTC /etc/localtime \
- && dpkg-reconfigure -f noninteractive tzdata \
- #
- # cleanup
- && apt-get autoremove -y \
- && apt-get clean -y \
- && rm -rf /var/lib/apt/lists/*
-
-# Switch back to dialog for any ad-hoc use of apt-get
-ENV DEBIAN_FRONTEND=dialog
-
-# Clone pandas repo
-RUN mkdir "$pandas_home" \
- && git clone "https://github.com/$gh_username/pandas.git" "$pandas_home" \
- && cd "$pandas_home" \
- && git remote add upstream "https://github.com/pandas-dev/pandas.git" \
- && git pull upstream main
-
-# Set up environment
-RUN mamba env create -f "$pandas_home/environment.yml"
-
-# Build C extensions and pandas
-SHELL ["mamba", "run", "--no-capture-output", "-n", "pandas-dev", "/bin/bash", "-c"]
-RUN cd "$pandas_home" \
- && export \
- && python setup.py build_ext -j 4 \
- && python -m pip install --no-build-isolation -e .
+RUN python -m pip install --upgrade pip
+RUN python -m pip install \
+ -r https://raw.githubusercontent.com/pandas-dev/pandas/main/requirements-dev.txt
+CMD ["/bin/bash"]
diff --git asv_bench/benchmarks/indexing.py asv_bench/benchmarks/indexing.py
index 69e3d166943a8..54da7c109e02a 100644
--- asv_bench/benchmarks/indexing.py
+++ asv_bench/benchmarks/indexing.py
@@ -143,6 +143,12 @@ def setup(self):
def time_loc(self):
self.df.loc[self.idx_scalar, self.col_scalar]
+ def time_at(self):
+ self.df.at[self.idx_scalar, self.col_scalar]
+
+ def time_at_setitem(self):
+ self.df.at[self.idx_scalar, self.col_scalar] = 0.0
+
def time_getitem_scalar(self):
self.df[self.col_scalar][self.idx_scalar]
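The new ``time_at``/``time_at_setitem`` benchmarks cover scalar label-based access through ``.at``, matching the performance regression in ``.at`` setitem fixed in 1.5.3 (GH#49771, see the v1.5.3 notes below). Minimal usage of the indexer being timed:

    import pandas as pd

    df = pd.DataFrame({"x": range(5)}, index=list("abcde"))
    value = df.at["c", "x"]  # scalar lookup by row/column label
    df.at["c", "x"] = 0.0    # scalar assignment by row/column label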
diff --git asv_bench/benchmarks/sparse.py asv_bench/benchmarks/sparse.py
index d871f907232f5..10390cb4493cd 100644
--- asv_bench/benchmarks/sparse.py
+++ asv_bench/benchmarks/sparse.py
@@ -219,12 +219,12 @@ def setup(self, fill_value):
d = 1e-5
arr = make_array(N, d, np.nan, np.float64)
self.sp_arr = SparseArray(arr)
- b_arr = np.full(shape=N, fill_value=fill_value, dtype=np.bool8)
+ b_arr = np.full(shape=N, fill_value=fill_value, dtype=np.bool_)
fv_inds = np.unique(
np.random.randint(low=0, high=N - 1, size=int(N * d), dtype=np.int32)
)
b_arr[fv_inds] = True if pd.isna(fill_value) else not fill_value
- self.sp_b_arr = SparseArray(b_arr, dtype=np.bool8, fill_value=fill_value)
+ self.sp_b_arr = SparseArray(b_arr, dtype=np.bool_, fill_value=fill_value)
def time_mask(self, fill_value):
self.sp_arr[self.sp_b_arr]
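``np.bool8`` is a deprecated alias that NumPy 1.24 warns about; ``np.bool_`` is the canonical boolean scalar type, so the benchmark is updated to stay warning-free:

    import numpy as np

    # Equivalent construction with the non-deprecated dtype alias
    b_arr = np.full(shape=10, fill_value=False, dtype=np.bool_)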
diff --git ci/deps/actions-310.yaml ci/deps/actions-310.yaml
index da3578e7191eb..deb23d435bddf 100644
--- ci/deps/actions-310.yaml
+++ ci/deps/actions-310.yaml
@@ -30,7 +30,7 @@ dependencies:
- gcsfs
- jinja2
- lxml
- - matplotlib
+ - matplotlib>=3.6.1
- numba
- numexpr
- openpyxl
@@ -39,13 +39,13 @@ dependencies:
- psycopg2
- pymysql
- pytables
- - pyarrow
+ - pyarrow<10
- pyreadstat
- python-snappy
- pyxlsb
- s3fs>=2021.08.0
- scipy
- - sqlalchemy
+ - sqlalchemy<1.4.46
- tabulate
- tzdata>=2022a
- xarray
diff --git ci/deps/actions-38-downstream_compat.yaml ci/deps/actions-38-downstream_compat.yaml
index 29ad2669afbd2..06ffafeb70570 100644
--- ci/deps/actions-38-downstream_compat.yaml
+++ ci/deps/actions-38-downstream_compat.yaml
@@ -31,14 +31,14 @@ dependencies:
- gcsfs
- jinja2
- lxml
- - matplotlib
+ - matplotlib>=3.6.1
- numba
- numexpr
- openpyxl
- odfpy
- pandas-gbq
- psycopg2
- - pyarrow
+ - pyarrow<10
- pymysql
- pyreadstat
- pytables
@@ -46,7 +46,7 @@ dependencies:
- pyxlsb
- s3fs>=2021.08.0
- scipy
- - sqlalchemy
+ - sqlalchemy<1.4.46
- tabulate
- xarray
- xlrd
diff --git ci/deps/actions-38.yaml ci/deps/actions-38.yaml
index b478b7c900425..222da40ea9eea 100644
--- ci/deps/actions-38.yaml
+++ ci/deps/actions-38.yaml
@@ -30,14 +30,14 @@ dependencies:
- gcsfs
- jinja2
- lxml
- - matplotlib
+ - matplotlib>=3.6.1
- numba
- numexpr
- openpyxl
- odfpy
- pandas-gbq
- psycopg2
- - pyarrow
+ - pyarrow<10
- pymysql
- pyreadstat
- pytables
@@ -45,7 +45,7 @@ dependencies:
- pyxlsb
- s3fs>=2021.08.0
- scipy
- - sqlalchemy
+ - sqlalchemy<1.4.46
- tabulate
- xarray
- xlrd
diff --git ci/deps/actions-39.yaml ci/deps/actions-39.yaml
index a12f36ba84cca..1c60e8ad6d78a 100644
--- ci/deps/actions-39.yaml
+++ ci/deps/actions-39.yaml
@@ -30,7 +30,7 @@ dependencies:
- gcsfs
- jinja2
- lxml
- - matplotlib
+ - matplotlib>=3.6.1
- numba
- numexpr
- openpyxl
@@ -38,14 +38,14 @@ dependencies:
- pandas-gbq
- psycopg2
- pymysql
- - pyarrow
+ - pyarrow<10
- pyreadstat
- pytables
- python-snappy
- pyxlsb
- s3fs>=2021.08.0
- scipy
- - sqlalchemy
+ - sqlalchemy<1.4.46
- tabulate
- tzdata>=2022a
- xarray
diff --git ci/deps/circle-38-arm64.yaml ci/deps/circle-38-arm64.yaml
index 2b65ece881df7..263521fb74879 100644
--- ci/deps/circle-38-arm64.yaml
+++ ci/deps/circle-38-arm64.yaml
@@ -30,14 +30,14 @@ dependencies:
- gcsfs
- jinja2
- lxml
- - matplotlib
+ - matplotlib>=3.6.1
- numba
- numexpr
- openpyxl
- odfpy
- pandas-gbq
- psycopg2
- - pyarrow
+ - pyarrow<10
- pymysql
# Not provided on ARM
#- pyreadstat
@@ -46,7 +46,7 @@ dependencies:
- pyxlsb
- s3fs>=2021.08.0
- scipy
- - sqlalchemy
+ - sqlalchemy<1.4.46
- tabulate
- xarray
- xlrd
diff --git doc/source/_static/css/pandas.css doc/source/_static/css/pandas.css
index 25153b6a8ad5d..a08be3301edda 100644
--- doc/source/_static/css/pandas.css
+++ doc/source/_static/css/pandas.css
@@ -5,6 +5,10 @@
--pst-color-info: 23, 162, 184;
}
+table {
+ width: auto; /* Override fit-content which breaks Styler user guide ipynb */
+}
+
/* Main index page overview cards */
.intro-card {
diff --git doc/source/development/community.rst doc/source/development/community.rst
index 8046090c36e6f..59689a2cf51d1 100644
--- doc/source/development/community.rst
+++ doc/source/development/community.rst
@@ -100,6 +100,8 @@ The pandas mailing list `[email protected] <mailto://[email protected]>`_ is used for long form
conversations and to engages people in the wider community who might not
be active on the issue tracker but we would like to include in discussions.
+.. _community.slack:
+
Community slack
---------------
diff --git doc/source/development/contributing.rst doc/source/development/contributing.rst
index e76197e302ca4..faa3d29a628f9 100644
--- doc/source/development/contributing.rst
+++ doc/source/development/contributing.rst
@@ -45,8 +45,13 @@ assigned issues, since people may not be working in them anymore. If you want to
that is assigned, feel free to kindly ask the current assignee if you can take it
(please allow at least a week of inactivity before considering work in the issue discontinued).
-Feel free to ask questions on the `mailing list
-<https://groups.google.com/forum/?fromgroups#!forum/pydata>`_ or on `Gitter`_.
+We have several :ref:`contributor community <community>` communication channels, which you are
+welcome to join, and ask questions as you figure things out. Among them are regular meetings for
+new contributors, dev meetings, a dev mailing list, and a slack for the contributor community.
+All pandas contributors are welcome to these spaces, where they can connect with each other. Even
+maintainers who have been with us for a long time felt just like you when they started out, and
+are happy to welcome you and support you as you get to know how we work, and where things are.
+Take a look at the next sections to learn more.
.. _contributing.bug_reports:
@@ -354,8 +359,6 @@ The branch will still exist on GitHub, so to delete it there do::
git push origin --delete shiny-new-feature
-.. _Gitter: https://gitter.im/pydata/pandas
-
Tips for a successful pull request
==================================
diff --git doc/source/development/contributing_codebase.rst doc/source/development/contributing_codebase.rst
index 95803b5be76c3..26692057f3e23 100644
--- doc/source/development/contributing_codebase.rst
+++ doc/source/development/contributing_codebase.rst
@@ -75,6 +75,11 @@ If you want to run checks on all recently committed files on upstream/main you c
without needing to have done ``pre-commit install`` beforehand.
+.. note::
+
+ You may want to periodically run ``pre-commit gc``, to clean up repos
+ which are no longer used.
+
.. note::
If you have conflicting installations of ``virtualenv``, then you may get an
diff --git doc/source/development/contributing_environment.rst doc/source/development/contributing_environment.rst
index 9e6da887671bd..942edd863a19a 100644
--- doc/source/development/contributing_environment.rst
+++ doc/source/development/contributing_environment.rst
@@ -10,123 +10,46 @@ To test out code changes, you'll need to build pandas from source, which
requires a C/C++ compiler and Python environment. If you're making documentation
changes, you can skip to :ref:`contributing to the documentation <contributing_documentation>` but if you skip
creating the development environment you won't be able to build the documentation
-locally before pushing your changes.
+locally before pushing your changes. It's recommended to also install the :ref:`pre-commit hooks <contributing.pre-commit>`.
.. contents:: Table of contents:
:local:
+Step 1: install a C compiler
+----------------------------
-Creating an environment using Docker
---------------------------------------
-
-Instead of manually setting up a development environment, you can use `Docker
-<https://docs.docker.com/get-docker/>`_ to automatically create the environment with just several
-commands. pandas provides a ``DockerFile`` in the root directory to build a Docker image
-with a full pandas development environment.
-
-**Docker Commands**
-
-Build the Docker image::
-
- # Build the image pandas-yourname-env
- docker build --tag pandas-yourname-env .
- # Or build the image by passing your GitHub username to use your own fork
- docker build --build-arg gh_username=yourname --tag pandas-yourname-env .
-
-Run Container::
-
- # Run a container and bind your local repo to the container
- docker run -it -w /home/pandas --rm -v path-to-local-pandas-repo:/home/pandas pandas-yourname-env
-
-Then a ``pandas-dev`` virtual environment will be available with all the development dependencies.
-
-.. code-block:: shell
-
- root@... :/home/pandas# conda env list
- # conda environments:
- #
- base * /opt/conda
- pandas-dev /opt/conda/envs/pandas-dev
-
-.. note::
- If you bind your local repo for the first time, you have to build the C extensions afterwards.
- Run the following command inside the container::
-
- python setup.py build_ext -j 4
-
- You need to rebuild the C extensions anytime the Cython code in ``pandas/_libs`` changes.
- This most frequently occurs when changing or merging branches.
-
-*Even easier, you can integrate Docker with the following IDEs:*
-
-**Visual Studio Code**
-
-You can use the DockerFile to launch a remote session with Visual Studio Code,
-a popular free IDE, using the ``.devcontainer.json`` file.
-See https://code.visualstudio.com/docs/remote/containers for details.
-
-**PyCharm (Professional)**
-
-Enable Docker support and use the Services tool window to build and manage images as well as
-run and interact with containers.
-See https://www.jetbrains.com/help/pycharm/docker.html for details.
-
-Creating an environment without Docker
----------------------------------------
-
-Installing a C compiler
-~~~~~~~~~~~~~~~~~~~~~~~
-
-pandas uses C extensions (mostly written using Cython) to speed up certain
-operations. To install pandas from source, you need to compile these C
-extensions, which means you need a C compiler. This process depends on which
-platform you're using.
-
-If you have setup your environment using ``conda``, the packages ``c-compiler``
-and ``cxx-compiler`` will install a fitting compiler for your platform that is
-compatible with the remaining conda packages. On Windows and macOS, you will
-also need to install the SDKs as they have to be distributed separately.
-These packages will automatically be installed by using the ``pandas``
-``environment.yml`` file.
+How to do this will depend on your platform. If you choose to user ``Docker``
+in the next step, then you can skip this step.
**Windows**
-You will need `Build Tools for Visual Studio 2019
-<https://visualstudio.microsoft.com/downloads/>`_.
-
-.. warning::
- You DO NOT need to install Visual Studio 2019.
- You only need "Build Tools for Visual Studio 2019" found by
- scrolling down to "All downloads" -> "Tools for Visual Studio 2019".
- In the installer, select the "C++ build tools" workload.
+You will need `Build Tools for Visual Studio 2022
+<https://visualstudio.microsoft.com/downloads/#build-tools-for-visual-studio-2022>`_.
-You can install the necessary components on the commandline using
-`vs_buildtools.exe <https://download.visualstudio.microsoft.com/download/pr/9a26f37e-6001-429b-a5db-c5455b93953c/460d80ab276046de2455a4115cc4e2f1e6529c9e6cb99501844ecafd16c619c4/vs_BuildTools.exe>`_:
-
-.. code::
+.. note::
+ You DO NOT need to install Visual Studio 2022.
+ You only need "Build Tools for Visual Studio 2022" found by
+ scrolling down to "All downloads" -> "Tools for Visual Studio".
+ In the installer, select the "Desktop development with C++" Workloads.
- vs_buildtools.exe --quiet --wait --norestart --nocache ^
- --installPath C:\BuildTools ^
- --add "Microsoft.VisualStudio.Workload.VCTools;includeRecommended" ^
- --add Microsoft.VisualStudio.Component.VC.v141 ^
- --add Microsoft.VisualStudio.Component.VC.v141.x86.x64 ^
- --add Microsoft.VisualStudio.Component.Windows10SDK.17763
+Alternatively, you can install the necessary components on the commandline using
+`vs_BuildTools.exe <https://learn.microsoft.com/en-us/visualstudio/install/use-command-line-parameters-to-install-visual-studio?source=recommendations&view=vs-2022>`_
-To setup the right paths on the commandline, call
-``"C:\BuildTools\VC\Auxiliary\Build\vcvars64.bat" -vcvars_ver=14.16 10.0.17763.0``.
+Alternatively, you could use the `WSL <https://learn.microsoft.com/en-us/windows/wsl/install>`_
+and consult the ``Linux`` instructions below.
**macOS**
-To use the ``conda``-based compilers, you will need to install the
+To use the :ref:`mamba <contributing.mamba>`-based compilers, you will need to install the
Developer Tools using ``xcode-select --install``. Otherwise
information about compiler installation can be found here:
https://devguide.python.org/setup/#macos
**Linux**
-For Linux-based ``conda`` installations, you won't have to install any
-additional components outside of the conda environment. The instructions
-below are only needed if your setup isn't based on conda environments.
+For Linux-based :ref:`mamba <contributing.mamba>` installations, you won't have to install any
+additional components outside of the mamba environment. The instructions
+below are only needed if your setup isn't based on mamba environments.
Some Linux distributions will come with a pre-installed C compiler. To find out
which compilers (and versions) are installed on your system::
@@ -138,75 +61,40 @@ which compilers (and versions) are installed on your system::
`GCC (GNU Compiler Collection) <https://gcc.gnu.org/>`_, is a widely used
compiler, which supports C and a number of other languages. If GCC is listed
-as an installed compiler nothing more is required. If no C compiler is
-installed (or you wish to install a newer version) you can install a compiler
-(GCC in the example code below) with::
+as an installed compiler nothing more is required.
- # for recent Debian/Ubuntu:
- sudo apt install build-essential
- # for Red Had/RHEL/CentOS/Fedora
- yum groupinstall "Development Tools"
+If no C compiler is installed, or you wish to upgrade, or you're using a different
+Linux distribution, consult your favorite search engine for compiler installation/update
+instructions.
-For other Linux distributions, consult your favorite search engine for
-compiler installation instructions.
+Let us know if you have any difficulties by opening an issue or reaching out on our contributor
+community :ref:`Slack <community.slack>`.
-Let us know if you have any difficulties by opening an issue or reaching out on `Gitter <https://gitter.im/pydata/pandas/>`_.
+Step 2: create an isolated environment
+----------------------------------------
-Creating a Python environment
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Before we begin, please:
-Now create an isolated pandas development environment:
-
-* Install either `Anaconda <https://www.anaconda.com/products/individual>`_, `miniconda
- <https://docs.conda.io/en/latest/miniconda.html>`_, or `miniforge <https://github.com/conda-forge/miniforge>`_
-* Make sure your conda is up to date (``conda update conda``)
* Make sure that you have :any:`cloned the repository <contributing.forking>`
* ``cd`` to the pandas source directory
-We'll now kick off a three-step process:
+.. _contributing.mamba:
+
+Option 1: using mamba (recommended)
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-1. Install the build dependencies
-2. Build and install pandas
-3. Install the optional dependencies
+* Install `mamba <https://mamba.readthedocs.io/en/latest/installation.html>`_
+* Make sure your mamba is up to date (``mamba update mamba``)
.. code-block:: none
# Create and activate the build environment
- conda env create -f environment.yml
- conda activate pandas-dev
-
- # or with older versions of Anaconda:
- source activate pandas-dev
-
- # Build and install pandas
- python setup.py build_ext -j 4
- python -m pip install -e . --no-build-isolation --no-use-pep517
-
-At this point you should be able to import pandas from your locally built version::
-
- $ python
- >>> import pandas
- >>> print(pandas.__version__)
- 0.22.0.dev0+29.g4ad6d4d74
-
-This will create the new environment, and not touch any of your existing environments,
-nor any existing Python installation.
+ mamba env create --file environment.yml
+ mamba activate pandas-dev
-To view your environments::
+Option 2: using pip
+~~~~~~~~~~~~~~~~~~~
- conda info -e
-
-To return to your root environment::
-
- conda deactivate
-
-See the full conda docs `here <https://conda.io/projects/conda/en/latest/>`__.
-
-
-Creating a Python environment (pip)
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-
-If you aren't using conda for your development environment, follow these instructions.
You'll need to have at least the :ref:`minimum Python version <install.version>` that pandas supports.
You also need to have ``setuptools`` 51.0.0 or later to build pandas.
@@ -225,10 +113,6 @@ You also need to have ``setuptools`` 51.0.0 or later to build pandas.
# Install the build dependencies
python -m pip install -r requirements-dev.txt
- # Build and install pandas
- python setup.py build_ext -j 4
- python -m pip install -e . --no-build-isolation --no-use-pep517
-
**Unix**/**macOS with pyenv**
Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.
@@ -237,7 +121,6 @@ Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.
# Create a virtual environment
# Use an ENV_DIR of your choice. We'll use ~/Users/<yourname>/.pyenv/versions/pandas-dev
-
pyenv virtualenv <version> <name-to-give-it>
# For instance:
@@ -249,19 +132,15 @@ Consult the docs for setting up pyenv `here <https://github.com/pyenv/pyenv>`__.
# Now install the build dependencies in the cloned pandas repo
python -m pip install -r requirements-dev.txt
- # Build and install pandas
- python setup.py build_ext -j 4
- python -m pip install -e . --no-build-isolation --no-use-pep517
-
**Windows**
Below is a brief overview on how to set-up a virtual environment with Powershell
under Windows. For details please refer to the
-`official virtualenv user guide <https://virtualenv.pypa.io/en/latest/user_guide.html#activators>`__
+`official virtualenv user guide <https://virtualenv.pypa.io/en/latest/user_guide.html#activators>`__.
-Use an ENV_DIR of your choice. We'll use ~\\virtualenvs\\pandas-dev where
-'~' is the folder pointed to by either $env:USERPROFILE (Powershell) or
-%USERPROFILE% (cmd.exe) environment variable. Any parent directories
+Use an ENV_DIR of your choice. We'll use ``~\\virtualenvs\\pandas-dev`` where
+``~`` is the folder pointed to by either ``$env:USERPROFILE`` (Powershell) or
+``%USERPROFILE%`` (cmd.exe) environment variable. Any parent directories
should already exist.
.. code-block:: powershell
@@ -275,6 +154,59 @@ should already exist.
# Install the build dependencies
python -m pip install -r requirements-dev.txt
+Option 3: using Docker
+~~~~~~~~~~~~~~~~~~~~~~
+
+pandas provides a ``DockerFile`` in the root directory to build a Docker image
+with a full pandas development environment.
+
+**Docker Commands**
+
+Build the Docker image::
+
+ # Build the image
+ docker build -t pandas-dev .
+
+Run Container::
+
+ # Run a container and bind your local repo to the container
+ # This command assumes you are running from your local repo
+ # but if not alter ${PWD} to match your local repo path
+ docker run -it --rm -v ${PWD}:/home/pandas pandas-dev
+
+*Even easier, you can integrate Docker with the following IDEs:*
+
+**Visual Studio Code**
+
+You can use the DockerFile to launch a remote session with Visual Studio Code,
+a popular free IDE, using the ``.devcontainer.json`` file.
+See https://code.visualstudio.com/docs/remote/containers for details.
+
+**PyCharm (Professional)**
+
+Enable Docker support and use the Services tool window to build and manage images as well as
+run and interact with containers.
+See https://www.jetbrains.com/help/pycharm/docker.html for details.
+
+Step 3: build and install pandas
+--------------------------------
+
+You can now run::
+
# Build and install pandas
python setup.py build_ext -j 4
python -m pip install -e . --no-build-isolation --no-use-pep517
+
+At this point you should be able to import pandas from your locally built version::
+
+ $ python
+ >>> import pandas
+ >>> print(pandas.__version__) # note: the exact output may differ
+ 2.0.0.dev0+880.g2b9e661fbb.dirty
+
+This will create the new environment, and not touch any of your existing environments,
+nor any existing Python installation.
+
+.. note::
+ You will need to repeat this step each time the C extensions change, for example
+ if you modified any file in ``pandas/_libs`` or if you did a fetch and merge from ``upstream/main``.
diff --git doc/source/getting_started/comparison/comparison_with_sas.rst doc/source/getting_started/comparison/comparison_with_sas.rst
index 5a624c9c55782..595f3c85a9dc2 100644
--- doc/source/getting_started/comparison/comparison_with_sas.rst
+++ doc/source/getting_started/comparison/comparison_with_sas.rst
@@ -112,7 +112,7 @@ The pandas method is :func:`read_csv`, which works similarly.
.. ipython:: python
url = (
- "https://raw.github.com/pandas-dev/"
+ "https://raw.githubusercontent.com/pandas-dev/"
"pandas/main/pandas/tests/io/data/csv/tips.csv"
)
tips = pd.read_csv(url)
diff --git doc/source/getting_started/comparison/comparison_with_spreadsheets.rst doc/source/getting_started/comparison/comparison_with_spreadsheets.rst
index a7148405ba8a0..d55b669d94a87 100644
--- doc/source/getting_started/comparison/comparison_with_spreadsheets.rst
+++ doc/source/getting_started/comparison/comparison_with_spreadsheets.rst
@@ -100,7 +100,7 @@ In pandas, you pass the URL or local path of the CSV file to :func:`~pandas.read
.. ipython:: python
url = (
- "https://raw.github.com/pandas-dev"
+ "https://raw.githubusercontent.com/pandas-dev"
"/pandas/main/pandas/tests/io/data/csv/tips.csv"
)
tips = pd.read_csv(url)
diff --git doc/source/getting_started/comparison/comparison_with_sql.rst doc/source/getting_started/comparison/comparison_with_sql.rst
index 0a891a4c6d2d7..a6d9d65e85645 100644
--- doc/source/getting_started/comparison/comparison_with_sql.rst
+++ doc/source/getting_started/comparison/comparison_with_sql.rst
@@ -17,7 +17,7 @@ structure.
.. ipython:: python
url = (
- "https://raw.github.com/pandas-dev"
+ "https://raw.githubusercontent.com/pandas-dev"
"/pandas/main/pandas/tests/io/data/csv/tips.csv"
)
tips = pd.read_csv(url)
diff --git doc/source/getting_started/comparison/comparison_with_stata.rst doc/source/getting_started/comparison/comparison_with_stata.rst
index 636778a2ca32e..b4b0c42d1db1d 100644
--- doc/source/getting_started/comparison/comparison_with_stata.rst
+++ doc/source/getting_started/comparison/comparison_with_stata.rst
@@ -108,7 +108,7 @@ the data set if presented with a url.
.. ipython:: python
url = (
- "https://raw.github.com/pandas-dev"
+ "https://raw.githubusercontent.com/pandas-dev"
"/pandas/main/pandas/tests/io/data/csv/tips.csv"
)
tips = pd.read_csv(url)
diff --git doc/source/getting_started/install.rst doc/source/getting_started/install.rst
index 00251854e3ffa..31eaa2367b683 100644
--- doc/source/getting_started/install.rst
+++ doc/source/getting_started/install.rst
@@ -20,7 +20,7 @@ Instructions for installing from source,
Python version support
----------------------
-Officially Python 3.8, 3.9 and 3.10.
+Officially Python 3.8, 3.9, 3.10 and 3.11.
Installing pandas
-----------------
diff --git doc/source/user_guide/style.ipynb doc/source/user_guide/style.ipynb
index 43021fcbc13fb..620e3806a33b5 100644
--- doc/source/user_guide/style.ipynb
+++ doc/source/user_guide/style.ipynb
@@ -1594,8 +1594,9 @@
"\n",
"\n",
"- Only CSS2 named colors and hex colors of the form `#rgb` or `#rrggbb` are currently supported.\n",
- "- The following pseudo CSS properties are also available to set excel specific style properties:\n",
+ "- The following pseudo CSS properties are also available to set Excel specific style properties:\n",
" - `number-format`\n",
+ " - `border-style` (for Excel-specific styles: \"hair\", \"mediumDashDot\", \"dashDotDot\", \"mediumDashDotDot\", \"dashDot\", \"slantDashDot\", or \"mediumDashed\")\n",
"\n",
"Table level styles, and data cell CSS-classes are not included in the export to Excel: individual cells must have their properties mapped by the `Styler.apply` and/or `Styler.applymap` methods."
]
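The ``border-style`` pseudo property documented here pairs with the 1.5.3 fix for unrecognized border styles reaching the Excel writers (GH#48649, see the v1.5.3 notes below). A small sketch of how such a style flows into ``to_excel`` (assuming ``openpyxl`` is installed; the filename is arbitrary):

    import pandas as pd

    df = pd.DataFrame({"A": [1, 2]})
    # "hair" is one of the Excel-specific border styles listed above
    styled = df.style.applymap(lambda _: "border-style: hair")
    styled.to_excel("styled.xlsx", engine="openpyxl")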
diff --git doc/source/whatsnew/index.rst doc/source/whatsnew/index.rst
index f9f96f02e7224..e2f3b45d47bef 100644
--- doc/source/whatsnew/index.rst
+++ doc/source/whatsnew/index.rst
@@ -16,6 +16,8 @@ Version 1.5
.. toctree::
:maxdepth: 2
+ v1.5.3
+ v1.5.2
v1.5.1
v1.5.0
diff --git doc/source/whatsnew/v0.13.0.rst doc/source/whatsnew/v0.13.0.rst
index 8265ad58f7ea3..44223bc694360 100644
--- doc/source/whatsnew/v0.13.0.rst
+++ doc/source/whatsnew/v0.13.0.rst
@@ -733,7 +733,7 @@ Enhancements
.. _scipy: http://www.scipy.org
.. _documentation: http://docs.scipy.org/doc/scipy/reference/interpolate.html#univariate-interpolation
-.. _guide: http://docs.scipy.org/doc/scipy/reference/tutorial/interpolate.html
+.. _guide: https://docs.scipy.org/doc/scipy/tutorial/interpolate.html
- ``to_csv`` now takes a ``date_format`` keyword argument that specifies how
output datetime objects should be formatted. Datetimes encountered in the
diff --git doc/source/whatsnew/v1.5.0.rst doc/source/whatsnew/v1.5.0.rst
index e6ef3c45c14bb..ecd38555be040 100644
--- doc/source/whatsnew/v1.5.0.rst
+++ doc/source/whatsnew/v1.5.0.rst
@@ -290,6 +290,52 @@ and attributes without holding entire tree in memory (:issue:`45442`).
.. _`lxml's iterparse`: https://lxml.de/3.2/parsing.html#iterparse-and-iterwalk
.. _`etree's iterparse`: https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.iterparse
+.. _whatsnew_150.enhancements.copy_on_write:
+
+Copy on Write
+^^^^^^^^^^^^^
+
+A new feature ``copy_on_write`` was added (:issue:`46958`). Copy on write ensures that
+any DataFrame or Series derived from another in any way always behaves as a copy.
+Copy on write disallows updating any other object than the object the method
+was applied to.
+
+Copy on write can be enabled through:
+
+.. code-block:: python
+
+ pd.set_option("mode.copy_on_write", True)
+ pd.options.mode.copy_on_write = True
+
+Alternatively, copy on write can be enabled locally through:
+
+.. code-block:: python
+
+ with pd.option_context("mode.copy_on_write", True):
+ ...
+
+Without copy on write, the parent :class:`DataFrame` is updated when updating a child
+:class:`DataFrame` that was derived from this :class:`DataFrame`.
+
+.. ipython:: python
+
+ df = pd.DataFrame({"foo": [1, 2, 3], "bar": 1})
+ view = df["foo"]
+ view.iloc[0] = 100
+ df
+
+With copy on write enabled, df won't be updated anymore:
+
+.. ipython:: python
+
+ with pd.option_context("mode.copy_on_write", True):
+ df = pd.DataFrame({"foo": [1, 2, 3], "bar": 1})
+ view = df["foo"]
+ view.iloc[0] = 100
+ df
+
+A more detailed explanation can be found `here <https://phofl.github.io/cow-introduction.html>`_.
+
.. _whatsnew_150.enhancements.other:
Other enhancements
@@ -1155,7 +1201,6 @@ Plotting
- Bug in :meth:`DataFrame.boxplot` that prevented passing in ``xlabel`` and ``ylabel`` (:issue:`45463`)
- Bug in :meth:`DataFrame.boxplot` that prevented specifying ``vert=False`` (:issue:`36918`)
- Bug in :meth:`DataFrame.plot.scatter` that prevented specifying ``norm`` (:issue:`45809`)
-- The function :meth:`DataFrame.plot.scatter` now accepts ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` for consistency to other plotting functions (:issue:`44670`)
- Fix showing "None" as ylabel in :meth:`Series.plot` when not setting ylabel (:issue:`46129`)
- Bug in :meth:`DataFrame.plot` that led to xticks and vertical grids being improperly placed when plotting a quarterly series (:issue:`47602`)
- Bug in :meth:`DataFrame.plot` that prevented setting y-axis label, limits and ticks for a secondary y-axis (:issue:`47753`)
@@ -1246,4 +1291,4 @@ Other
Contributors
~~~~~~~~~~~~
-.. contributors:: v1.4.4..v1.5.0|HEAD
+.. contributors:: v1.4.4..v1.5.0
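The Copy-on-Write section added to the 1.5.0 notes above can be exercised end to end; a minimal sketch, assuming pandas 1.5+ where the ``mode.copy_on_write`` option exists:

    import pandas as pd

    # Default behavior: the child Series shares data with the parent
    df = pd.DataFrame({"foo": [1, 2, 3], "bar": 1})
    view = df["foo"]
    view.iloc[0] = 100
    print(df["foo"][0])  # 100: the write propagated to df

    # With copy-on-write enabled, the parent is left untouched
    with pd.option_context("mode.copy_on_write", True):
        df = pd.DataFrame({"foo": [1, 2, 3], "bar": 1})
        view = df["foo"]
        view.iloc[0] = 100
        print(df["foo"][0])  # 1: df behaves as an independent copy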
diff --git doc/source/whatsnew/v1.5.1.rst doc/source/whatsnew/v1.5.1.rst
index b9477908c6ad9..bcd8ddb9cbc0b 100644
--- doc/source/whatsnew/v1.5.1.rst
+++ doc/source/whatsnew/v1.5.1.rst
@@ -1,6 +1,6 @@
.. _whatsnew_151:
-What's new in 1.5.1 (October ??, 2022)
+What's new in 1.5.1 (October 19, 2022)
--------------------------------------
These are the changes in pandas 1.5.1. See :ref:`release` for a full changelog
@@ -76,14 +76,14 @@ Fixed regressions
- Fixed regression in :meth:`DataFrame.loc` raising ``FutureWarning`` when setting an empty :class:`DataFrame` (:issue:`48480`)
- Fixed regression in :meth:`DataFrame.describe` raising ``TypeError`` when result contains ``NA`` (:issue:`48778`)
- Fixed regression in :meth:`DataFrame.plot` ignoring invalid ``colormap`` for ``kind="scatter"`` (:issue:`48726`)
-- Fixed regression in :meth:`MultiIndex.values`` resetting ``freq`` attribute of underlying :class:`Index` object (:issue:`49054`)
+- Fixed regression in :meth:`MultiIndex.values` resetting ``freq`` attribute of underlying :class:`Index` object (:issue:`49054`)
- Fixed performance regression in :func:`factorize` when ``na_sentinel`` is not ``None`` and ``sort=False`` (:issue:`48620`)
- Fixed regression causing an ``AttributeError`` during warning emitted if the provided table name in :meth:`DataFrame.to_sql` and the table name actually used in the database do not match (:issue:`48733`)
- Fixed regression in :func:`to_datetime` when ``arg`` was a date string with nanosecond and ``format`` contained ``%f`` would raise a ``ValueError`` (:issue:`48767`)
-- Fixed regression in :func:`assert_frame_equal` raising for :class:`MultiIndex` with :class:`Categorical` and ``check_like=True`` (:issue:`48975`)
+- Fixed regression in :func:`testing.assert_frame_equal` raising for :class:`MultiIndex` with :class:`Categorical` and ``check_like=True`` (:issue:`48975`)
- Fixed regression in :meth:`DataFrame.fillna` replacing wrong values for ``datetime64[ns]`` dtype and ``inplace=True`` (:issue:`48863`)
- Fixed :meth:`.DataFrameGroupBy.size` not returning a Series when ``axis=1`` (:issue:`48738`)
-- Fixed Regression in :meth:`DataFrameGroupBy.apply` when user defined function is called on an empty dataframe (:issue:`47985`)
+- Fixed Regression in :meth:`.DataFrameGroupBy.apply` when user defined function is called on an empty dataframe (:issue:`47985`)
- Fixed regression in :meth:`DataFrame.apply` when passing non-zero ``axis`` via keyword argument (:issue:`48656`)
- Fixed regression in :meth:`Series.groupby` and :meth:`DataFrame.groupby` when the grouper is a nullable data type (e.g. :class:`Int64`) or a PyArrow-backed string array, contains null values, and ``dropna=False`` (:issue:`48794`)
- Fixed performance regression in :meth:`Series.isin` with mismatching dtypes (:issue:`49162`)
@@ -99,7 +99,7 @@ Bug fixes
~~~~~~~~~
- Bug in :meth:`Series.__getitem__` not falling back to positional for integer keys and boolean :class:`Index` (:issue:`48653`)
- Bug in :meth:`DataFrame.to_hdf` raising ``AssertionError`` with boolean index (:issue:`48667`)
-- Bug in :func:`assert_index_equal` for extension arrays with non matching ``NA`` raising ``ValueError`` (:issue:`48608`)
+- Bug in :func:`testing.assert_index_equal` for extension arrays with non matching ``NA`` raising ``ValueError`` (:issue:`48608`)
- Bug in :meth:`DataFrame.pivot_table` raising unexpected ``FutureWarning`` when setting datetime column as index (:issue:`48683`)
- Bug in :meth:`DataFrame.sort_values` emitting unnecessary ``FutureWarning`` when called on :class:`DataFrame` with boolean sparse columns (:issue:`48784`)
- Bug in :class:`.arrays.ArrowExtensionArray` with a comparison operator to an invalid object would not raise a ``NotImplementedError`` (:issue:`48833`)
@@ -111,8 +111,6 @@ Bug fixes
Other
~~~~~
- Avoid showing deprecated signatures when introspecting functions with warnings about arguments becoming keyword-only (:issue:`48692`)
--
--
.. ---------------------------------------------------------------------------
@@ -120,3 +118,5 @@ Other
Contributors
~~~~~~~~~~~~
+
+.. contributors:: v1.5.0..v1.5.1
diff --git a/doc/source/whatsnew/v1.5.2.rst b/doc/source/whatsnew/v1.5.2.rst
new file mode 100644
index 0000000000000..6397016d827f2
--- /dev/null
+++ doc/source/whatsnew/v1.5.2.rst
@@ -0,0 +1,46 @@
+.. _whatsnew_152:
+
+What's new in 1.5.2 (November 21, 2022)
+---------------------------------------
+
+These are the changes in pandas 1.5.2. See :ref:`release` for a full changelog
+including other versions of pandas.
+
+{{ header }}
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_152.regressions:
+
+Fixed regressions
+~~~~~~~~~~~~~~~~~
+- Fixed regression in :meth:`MultiIndex.join` for extension array dtypes (:issue:`49277`)
+- Fixed regression in :meth:`Series.replace` raising ``RecursionError`` with numeric dtype and when specifying ``value=None`` (:issue:`45725`)
+- Fixed regression in arithmetic operations for :class:`DataFrame` with :class:`MultiIndex` columns with different dtypes (:issue:`49769`)
+- Fixed regression in :meth:`DataFrame.plot` preventing :class:`~matplotlib.colors.Colormap` instance
+ from being passed using the ``colormap`` argument if Matplotlib 3.6+ is used (:issue:`49374`)
+- Fixed regression in :func:`date_range` returning an invalid set of periods for ``CustomBusinessDay`` frequency and ``start`` date with timezone (:issue:`49441`)
+- Fixed performance regression in groupby operations (:issue:`49676`)
+- Fixed regression in :class:`Timedelta` constructor returning object of wrong type when subclassing ``Timedelta`` (:issue:`49579`)
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_152.bug_fixes:
+
+Bug fixes
+~~~~~~~~~
+- Bug in the Copy-on-Write implementation losing track of views in certain chained indexing cases (:issue:`48996`)
+- Fixed memory leak in :meth:`.Styler.to_excel` (:issue:`49751`)
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_152.other:
+
+Other
+~~~~~
+- Reverted ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` in function :meth:`DataFrame.plot.scatter` (:issue:`49732`)
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_152.contributors:
+
+Contributors
+~~~~~~~~~~~~
+
+.. contributors:: v1.5.1..v1.5.2|HEAD
diff --git a/doc/source/whatsnew/v1.5.3.rst b/doc/source/whatsnew/v1.5.3.rst
new file mode 100644
index 0000000000000..97c4c73f08c37
--- /dev/null
+++ doc/source/whatsnew/v1.5.3.rst
@@ -0,0 +1,59 @@
+.. _whatsnew_153:
+
+What's new in 1.5.3 (January 18, 2023)
+--------------------------------------
+
+These are the changes in pandas 1.5.3. See :ref:`release` for a full changelog
+including other versions of pandas.
+
+{{ header }}
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_153.regressions:
+
+Fixed regressions
+~~~~~~~~~~~~~~~~~
+- Fixed performance regression in :meth:`Series.isin` when ``values`` is empty (:issue:`49839`)
+- Fixed regression in :meth:`DataFrame.memory_usage` showing unnecessary ``FutureWarning`` when :class:`DataFrame` is empty (:issue:`50066`)
+- Fixed regression in :meth:`.DataFrameGroupBy.transform` when used with ``as_index=False`` (:issue:`49834`)
+- Enforced reversion of ``color`` as an alias for ``c`` and ``size`` as an alias for ``s`` in function :meth:`DataFrame.plot.scatter` (:issue:`49732`)
+- Fixed regression in :meth:`.SeriesGroupBy.apply` setting a ``name`` attribute on the result if the result was a :class:`DataFrame` (:issue:`49907`)
+- Fixed performance regression in setting with the :meth:`~DataFrame.at` indexer (:issue:`49771`)
+- Fixed regression in the methods ``apply``, ``agg``, and ``transform`` when used with NumPy functions that informed users to supply ``numeric_only=True`` if the operation failed on non-numeric dtypes; such columns must be dropped prior to using these methods (:issue:`50538`)
+- Fixed regression in :func:`to_datetime` raising ``ValueError`` when parsing array of ``float`` containing ``np.nan`` (:issue:`50237`)
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_153.bug_fixes:
+
+Bug fixes
+~~~~~~~~~
+- Bug in the Copy-on-Write implementation losing track of views when indexing a :class:`DataFrame` with another :class:`DataFrame` (:issue:`50630`)
+- Bug in :meth:`.Styler.to_excel` leading to error when unrecognized ``border-style`` (e.g. ``"hair"``) provided to Excel writers (:issue:`48649`)
+- Bug in :meth:`Series.quantile` emitting warning from NumPy when :class:`Series` has only ``NA`` values (:issue:`50681`)
+- Bug when chaining several :meth:`.Styler.concat` calls, only the last styler was concatenated (:issue:`49207`)
+- Fixed bug when instantiating a :class:`DataFrame` subclass inheriting from ``typing.Generic`` that triggered a ``UserWarning`` on python 3.11 (:issue:`49649`)
+- Bug in :func:`pivot_table` with NumPy 1.24 or greater when the :class:`DataFrame` columns has nested elements (:issue:`50342`)
+- Bug in :func:`pandas.testing.assert_series_equal` (and equivalent ``assert_`` functions) when having nested data and using numpy >= 1.25 (:issue:`50360`)
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_153.other:
+
+Other
+~~~~~
+
+.. note::
+
+ If you are using :meth:`DataFrame.to_sql`, :func:`read_sql`, :func:`read_sql_table`, or :func:`read_sql_query` with SQLAlchemy 1.4.46 or greater,
+ you may see a ``sqlalchemy.exc.RemovedIn20Warning``. These warnings can be safely ignored for the SQLAlchemy 1.4.x releases
+ as pandas works toward compatibility with SQLAlchemy 2.0.
+
+- Reverted deprecation (:issue:`45324`) of behavior of :meth:`Series.__getitem__` and :meth:`Series.__setitem__` slicing with an integer :class:`Index`; this will remain positional (:issue:`49612`)
+- A ``FutureWarning`` raised when attempting to set values inplace with :meth:`DataFrame.loc` or :meth:`DataFrame.iloc` has been changed to a ``DeprecationWarning`` (:issue:`48673`)
+
+.. ---------------------------------------------------------------------------
+.. _whatsnew_153.contributors:
+
+Contributors
+~~~~~~~~~~~~
+
+.. contributors:: v1.5.2..v1.5.3|HEAD
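Regarding the SQLAlchemy note in the 1.5.3 changelog above: the warning is benign on the 1.4.x line, and if it clutters test output it can be filtered in the usual way (a sketch, assuming SQLAlchemy 1.4.x is installed):

    import warnings

    from sqlalchemy.exc import RemovedIn20Warning

    # Safe to ignore on SQLAlchemy 1.4.x per the pandas release notes
    warnings.simplefilter("ignore", RemovedIn20Warning)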
diff --git environment.yml environment.yml
index 6316e2a489908..20f839db9ad60 100644
--- environment.yml
+++ environment.yml
@@ -31,14 +31,14 @@ dependencies:
- gcsfs
- jinja2
- lxml
- - matplotlib
+ - matplotlib>=3.6.1
- numba>=0.53.1
- numexpr>=2.8.0 # pin for "Run checks on imported code" job
- openpyxl
- odfpy
- pandas-gbq
- psycopg2
- - pyarrow
+ - pyarrow<10
- pymysql
- pyreadstat
- pytables
@@ -46,7 +46,7 @@ dependencies:
- pyxlsb
- s3fs>=2021.08.0
- scipy
- - sqlalchemy
+ - sqlalchemy<1.4.46
- tabulate
- tzdata>=2022a
- xarray
diff --git pandas/_libs/internals.pyx pandas/_libs/internals.pyx
index 94ae4a021da4d..ded161c70f121 100644
--- pandas/_libs/internals.pyx
+++ pandas/_libs/internals.pyx
@@ -676,8 +676,9 @@ cdef class BlockManager:
public bint _known_consolidated, _is_consolidated
public ndarray _blknos, _blklocs
public list refs
+ public object parent
- def __cinit__(self, blocks=None, axes=None, refs=None, verify_integrity=True):
+ def __cinit__(self, blocks=None, axes=None, refs=None, parent=None, verify_integrity=True):
# None as defaults for unpickling GH#42345
if blocks is None:
# This adds 1-2 microseconds to DataFrame(np.array([]))
@@ -690,6 +691,7 @@ cdef class BlockManager:
self.blocks = blocks
self.axes = axes.copy() # copy to make sure we are not remotely-mutable
self.refs = refs
+ self.parent = parent
# Populate known_consolidate, blknos, and blklocs lazily
self._known_consolidated = False
@@ -805,7 +807,9 @@ cdef class BlockManager:
nrefs.append(weakref.ref(blk))
new_axes = [self.axes[0], self.axes[1]._getitem_slice(slobj)]
- mgr = type(self)(tuple(nbs), new_axes, nrefs, verify_integrity=False)
+ mgr = type(self)(
+ tuple(nbs), new_axes, nrefs, parent=self, verify_integrity=False
+ )
# We can avoid having to rebuild blklocs/blknos
blklocs = self._blklocs
@@ -827,4 +831,6 @@ cdef class BlockManager:
new_axes = list(self.axes)
new_axes[axis] = new_axes[axis]._getitem_slice(slobj)
- return type(self)(tuple(new_blocks), new_axes, new_refs, verify_integrity=False)
+ return type(self)(
+ tuple(new_blocks), new_axes, new_refs, parent=self, verify_integrity=False
+ )
diff --git pandas/_libs/lib.pyx pandas/_libs/lib.pyx
index 8c77dc5fe1d1b..d2c2697c05812 100644
--- pandas/_libs/lib.pyx
+++ pandas/_libs/lib.pyx
@@ -1,7 +1,10 @@
from collections import abc
from decimal import Decimal
from enum import Enum
-from typing import Literal
+from typing import (
+ Literal,
+ _GenericAlias,
+)
import warnings
cimport cython
@@ -1136,7 +1139,8 @@ cdef inline bint c_is_list_like(object obj, bint allow_sets) except -1:
# equiv: `isinstance(obj, abc.Iterable)`
getattr(obj, "__iter__", None) is not None and not isinstance(obj, type)
# we do not count strings/unicode/bytes as list-like
- and not isinstance(obj, (str, bytes))
+ # exclude Generic types that have __iter__
+ and not isinstance(obj, (str, bytes, _GenericAlias))
# exclude zero-dimensional duck-arrays, effectively scalars
and not (hasattr(obj, "ndim") and obj.ndim == 0)
# exclude sets if allow_sets is False
@@ -1480,6 +1484,8 @@ def infer_dtype(value: object, skipna: bool = True) -> str:
else:
if not isinstance(value, list):
value = list(value)
+ if not value:
+ return "empty"
from pandas.core.dtypes.cast import construct_1d_object_array_from_listlike
values = construct_1d_object_array_from_listlike(value)
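Both ``lib.pyx`` changes surface through the public API: ``is_list_like`` stops treating ``typing`` generic aliases as list-like (they expose ``__iter__`` but are type descriptions, which is what triggered the ``DataFrame``-subclass warning in GH#49649), and ``infer_dtype`` now short-circuits when the materialized input is empty. Roughly, with the return values as implied by the patch rather than verified here:

    from typing import List

    import pandas as pd

    pd.api.types.is_list_like(List[int])  # False with this change
    pd.api.types.infer_dtype(iter([]))    # "empty" with this change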
diff --git pandas/_libs/tslibs/offsets.pyx pandas/_libs/tslibs/offsets.pyx
index 678ecf103b3d6..242eeffd1ee79 100644
--- pandas/_libs/tslibs/offsets.pyx
+++ pandas/_libs/tslibs/offsets.pyx
@@ -258,7 +258,9 @@ cdef _to_dt64D(dt):
if getattr(dt, 'tzinfo', None) is not None:
# Get the nanosecond timestamp,
# equiv `Timestamp(dt).value` or `dt.timestamp() * 10**9`
- naive = dt.astimezone(None)
+ # The `naive` must be the `dt` naive wall time
+ # instead of the naive absolute time (GH#49441)
+ naive = dt.replace(tzinfo=None)
dt = np.datetime64(naive, "D")
else:
dt = np.datetime64(dt)
@@ -4132,7 +4134,9 @@ def shift_months(
cnp.broadcast mi = cnp.PyArray_MultiIterNew2(out, dtindex)
- if day_opt not in [None, "start", "end", "business_start", "business_end"]:
+ if day_opt is not None and day_opt not in {
+ "start", "end", "business_start", "business_end"
+ }:
raise ValueError("day must be None, 'start', 'end', "
"'business_start', or 'business_end'")
diff --git pandas/_libs/tslibs/parsing.pyx pandas/_libs/tslibs/parsing.pyx
index 074cc5504d6da..35f97f1978b69 100644
--- pandas/_libs/tslibs/parsing.pyx
+++ pandas/_libs/tslibs/parsing.pyx
@@ -12,7 +12,6 @@ from cpython.datetime cimport (
datetime,
datetime_new,
import_datetime,
- tzinfo,
)
from cpython.object cimport PyObject_Str
from cython cimport Py_ssize_t
@@ -44,7 +43,6 @@ from dateutil.relativedelta import relativedelta
from dateutil.tz import (
tzlocal as _dateutil_tzlocal,
tzoffset,
- tzstr as _dateutil_tzstr,
tzutc as _dateutil_tzutc,
)
@@ -441,7 +439,7 @@ cdef parse_datetime_string_with_reso(
try:
parsed, reso = dateutil_parse(date_string, _DEFAULT_DATETIME,
dayfirst=dayfirst, yearfirst=yearfirst,
- ignoretz=False, tzinfos=None)
+ ignoretz=False)
except (ValueError, OverflowError) as err:
# TODO: allow raise of errors within instead
raise DateParseError(err)
@@ -633,7 +631,6 @@ cdef dateutil_parse(
str timestr,
object default,
bint ignoretz=False,
- object tzinfos=None,
bint dayfirst=False,
bint yearfirst=False,
):
@@ -642,7 +639,7 @@ cdef dateutil_parse(
cdef:
str attr
datetime ret
- object res, tzdata
+ object res
object reso = None
dict repl = {}
@@ -671,24 +668,7 @@ cdef dateutil_parse(
if res.weekday is not None and not res.day:
ret = ret + relativedelta.relativedelta(weekday=res.weekday)
if not ignoretz:
- if callable(tzinfos) or tzinfos and res.tzname in tzinfos:
- # Note: as of 1.0 this is not reached because
- # we never pass tzinfos, see GH#22234
- if callable(tzinfos):
- tzdata = tzinfos(res.tzname, res.tzoffset)
- else:
- tzdata = tzinfos.get(res.tzname)
- if isinstance(tzdata, tzinfo):
- new_tzinfo = tzdata
- elif isinstance(tzdata, str):
- new_tzinfo = _dateutil_tzstr(tzdata)
- elif isinstance(tzdata, int):
- new_tzinfo = tzoffset(res.tzname, tzdata)
- else:
- raise ValueError("offset must be tzinfo subclass, "
- "tz string, or int offset")
- ret = ret.replace(tzinfo=new_tzinfo)
- elif res.tzname and res.tzname in time.tzname:
+ if res.tzname and res.tzname in time.tzname:
ret = ret.replace(tzinfo=_dateutil_tzlocal())
elif res.tzoffset == 0:
ret = ret.replace(tzinfo=_dateutil_tzutc())
diff --git pandas/_libs/tslibs/timedeltas.pyx pandas/_libs/tslibs/timedeltas.pyx
index b0200990809cc..0b33af1159fe5 100644
--- pandas/_libs/tslibs/timedeltas.pyx
+++ pandas/_libs/tslibs/timedeltas.pyx
@@ -188,7 +188,7 @@ def ints_to_pytimedelta(ndarray m8values, box=False):
res_val = <object>NaT
else:
if box:
- res_val = _timedelta_from_value_and_reso(value, reso=reso)
+ res_val = _timedelta_from_value_and_reso(Timedelta, value, reso=reso)
elif reso == NPY_DATETIMEUNIT.NPY_FR_ns:
res_val = timedelta(microseconds=int(value) / 1000)
elif reso == NPY_DATETIMEUNIT.NPY_FR_us:
@@ -737,7 +737,7 @@ cdef bint _validate_ops_compat(other):
def _op_unary_method(func, name):
def f(self):
new_value = func(self.value)
- return _timedelta_from_value_and_reso(new_value, self._reso)
+ return _timedelta_from_value_and_reso(Timedelta, new_value, self._reso)
f.__name__ = name
return f
@@ -807,7 +807,7 @@ def _binary_op_method_timedeltalike(op, name):
# TODO: more generally could do an overflowcheck in op?
return NaT
- return _timedelta_from_value_and_reso(res, reso=self._reso)
+ return _timedelta_from_value_and_reso(Timedelta, res, reso=self._reso)
f.__name__ = name
return f
@@ -938,10 +938,10 @@ cdef _to_py_int_float(v):
def _timedelta_unpickle(value, reso):
- return _timedelta_from_value_and_reso(value, reso)
+ return _timedelta_from_value_and_reso(Timedelta, value, reso)
-cdef _timedelta_from_value_and_reso(int64_t value, NPY_DATETIMEUNIT reso):
+cdef _timedelta_from_value_and_reso(cls, int64_t value, NPY_DATETIMEUNIT reso):
# Could make this a classmethod if/when cython supports cdef classmethods
cdef:
_Timedelta td_base
@@ -951,13 +951,13 @@ cdef _timedelta_from_value_and_reso(int64_t value, NPY_DATETIMEUNIT reso):
# We pass 0 instead, and override seconds, microseconds, days.
# In principle we could pass 0 for ns and us too.
if reso == NPY_FR_ns:
- td_base = _Timedelta.__new__(Timedelta, microseconds=int(value) // 1000)
+ td_base = _Timedelta.__new__(cls, microseconds=int(value) // 1000)
elif reso == NPY_DATETIMEUNIT.NPY_FR_us:
- td_base = _Timedelta.__new__(Timedelta, microseconds=int(value))
+ td_base = _Timedelta.__new__(cls, microseconds=int(value))
elif reso == NPY_DATETIMEUNIT.NPY_FR_ms:
- td_base = _Timedelta.__new__(Timedelta, milliseconds=0)
+ td_base = _Timedelta.__new__(cls, milliseconds=0)
elif reso == NPY_DATETIMEUNIT.NPY_FR_s:
- td_base = _Timedelta.__new__(Timedelta, seconds=0)
+ td_base = _Timedelta.__new__(cls, seconds=0)
# Other resolutions are disabled but could potentially be implemented here:
# elif reso == NPY_DATETIMEUNIT.NPY_FR_m:
# td_base = _Timedelta.__new__(Timedelta, minutes=int(value))
@@ -1532,7 +1532,7 @@ cdef class _Timedelta(timedelta):
@classmethod
def _from_value_and_reso(cls, int64_t value, NPY_DATETIMEUNIT reso):
# exposing as classmethod for testing
- return _timedelta_from_value_and_reso(value, reso)
+ return _timedelta_from_value_and_reso(cls, value, reso)
def _as_unit(self, str unit, bint round_ok=True):
dtype = np.dtype(f"m8[{unit}]")
@@ -1708,7 +1708,7 @@ class Timedelta(_Timedelta):
if value == NPY_NAT:
return NaT
- return _timedelta_from_value_and_reso(value, NPY_FR_ns)
+ return _timedelta_from_value_and_reso(cls, value, NPY_FR_ns)
def __setstate__(self, state):
if len(state) == 1:
@@ -1800,6 +1800,7 @@ class Timedelta(_Timedelta):
return NaT
return _timedelta_from_value_and_reso(
+ Timedelta,
<int64_t>(other * self.value),
reso=self._reso,
)
diff --git pandas/compat/numpy/__init__.py pandas/compat/numpy/__init__.py
index 60ec74553a207..6f31358dabe86 100644
--- pandas/compat/numpy/__init__.py
+++ pandas/compat/numpy/__init__.py
@@ -9,6 +9,7 @@
np_version_under1p21 = _nlv < Version("1.21")
np_version_under1p22 = _nlv < Version("1.22")
np_version_gte1p22 = _nlv >= Version("1.22")
+np_version_gte1p24 = _nlv >= Version("1.24")
is_numpy_dev = _nlv.dev is not None
_min_numpy_ver = "1.20.3"
diff --git pandas/core/algorithms.py pandas/core/algorithms.py
index 172feb8884b30..9f67b8d4e20cd 100644
--- pandas/core/algorithms.py
+++ pandas/core/algorithms.py
@@ -465,7 +465,11 @@ def isin(comps: AnyArrayLike, values: AnyArrayLike) -> npt.NDArray[np.bool_]:
orig_values = values
values = _ensure_arraylike(list(values))
- if is_numeric_dtype(values) and not is_signed_integer_dtype(comps):
+ if (
+ len(values) > 0
+ and is_numeric_dtype(values)
+ and not is_signed_integer_dtype(comps)
+ ):
# GH#46485 Use object to avoid upcast to float64 later
# TODO: Share with _find_common_type_compat
values = construct_1d_object_array_from_listlike(list(orig_values))
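Minimal sketch of the empty-values guard above (not part of the diff):

import pandas as pd

# With an empty `values`, isin no longer detours numeric data through an
# object-dtype copy; the result is simply all-False.
ser = pd.Series([1, 2, 3], dtype="int64")
print(ser.isin([]))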
diff --git pandas/core/apply.py pandas/core/apply.py
index 4c535abe585d4..4987a18ae2027 100644
--- pandas/core/apply.py
+++ pandas/core/apply.py
@@ -39,7 +39,10 @@
SpecificationError,
)
from pandas.util._decorators import cache_readonly
-from pandas.util._exceptions import find_stack_level
+from pandas.util._exceptions import (
+ find_stack_level,
+ rewrite_warning,
+)
from pandas.core.dtypes.cast import is_nested_object
from pandas.core.dtypes.common import (
@@ -174,7 +177,15 @@ def agg(self) -> DataFrame | Series | None:
if callable(arg):
f = com.get_cython_func(arg)
if f and not args and not kwargs:
- return getattr(obj, f)()
+ # GH#50538
+ old_msg = "The default value of numeric_only"
+ new_msg = (
+ f"The operation {arg} failed on a column. If any error is "
+ f"raised, this will raise an exception in a future version "
+ f"of pandas. Drop these columns to avoid this warning."
+ )
+ with rewrite_warning(old_msg, FutureWarning, new_msg):
+ return getattr(obj, f)()
# caller can react
return None
@@ -309,7 +320,14 @@ def transform_str_or_callable(self, func) -> DataFrame | Series:
if not args and not kwargs:
f = com.get_cython_func(func)
if f:
- return getattr(obj, f)()
+ old_msg = "The default value of numeric_only"
+ new_msg = (
+ f"The operation {func} failed on a column. If any error is "
+ f"raised, this will raise an exception in a future version "
+ f"of pandas. Drop these columns to avoid this warning."
+ )
+ with rewrite_warning(old_msg, FutureWarning, new_msg):
+ return getattr(obj, f)()
# Two possible ways to use a UDF - apply or call directly
try:
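The rewrite_warning wrapper changes only the message text; a sketch of what a caller sees on 1.5.x, assuming the FutureWarning fires for this mixed frame:

import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})
with warnings.catch_warnings(record=True) as rec:
    warnings.simplefilter("always")
    df.agg(np.mean)  # np.mean fails on the string column
# The message now names the failing operation instead of the
# numeric_only default.
print(str(rec[-1].message))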
diff --git pandas/core/array_algos/quantile.py pandas/core/array_algos/quantile.py
index 217fbafce719c..d3d9cb1b29b9a 100644
--- pandas/core/array_algos/quantile.py
+++ pandas/core/array_algos/quantile.py
@@ -204,8 +204,10 @@ def _nanpercentile(
result = np.array(result, copy=False).T
if (
result.dtype != values.dtype
+ and not mask.all()
and (result == result.astype(values.dtype, copy=False)).all()
):
+ # when mask.all(), the result is all-NA and never gets cast back to int
# e.g. values is integer dtype and result is floating dtype,
# only cast back to integer dtype if result values are all-integer.
result = result.astype(values.dtype, copy=False)
diff --git pandas/core/arrays/interval.py pandas/core/arrays/interval.py
index 8f01dfaf867e7..ea5c6d52f29ba 100644
--- pandas/core/arrays/interval.py
+++ pandas/core/arrays/interval.py
@@ -383,7 +383,8 @@ def _from_factorized(
Left and right bounds for each interval.
closed : {'left', 'right', 'both', 'neither'}, default 'right'
Whether the intervals are closed on the left-side, right-side, both
- or neither.
+ or neither.\
+ %(name)s
copy : bool, default False
Copy the data.
dtype : dtype or None, default None
@@ -408,6 +409,7 @@ def _from_factorized(
_interval_shared_docs["from_breaks"]
% {
"klass": "IntervalArray",
+ "name": "",
"examples": textwrap.dedent(
"""\
Examples
@@ -443,7 +445,8 @@ def from_breaks(
Right bounds for each interval.
closed : {'left', 'right', 'both', 'neither'}, default 'right'
Whether the intervals are closed on the left-side, right-side, both
- or neither.
+ or neither.\
+ %(name)s
copy : bool, default False
Copy the data.
dtype : dtype, optional
@@ -485,6 +488,7 @@ def from_breaks(
_interval_shared_docs["from_arrays"]
% {
"klass": "IntervalArray",
+ "name": "",
"examples": textwrap.dedent(
"""\
>>> pd.arrays.IntervalArray.from_arrays([0, 1, 2], [1, 2, 3])
@@ -520,7 +524,8 @@ def from_arrays(
Array of tuples.
closed : {'left', 'right', 'both', 'neither'}, default 'right'
Whether the intervals are closed on the left-side, right-side, both
- or neither.
+ or neither.\
+ %(name)s
copy : bool, default False
By-default copy the data, this is compat only and ignored.
dtype : dtype or None, default None
@@ -547,6 +552,7 @@ def from_arrays(
_interval_shared_docs["from_tuples"]
% {
"klass": "IntervalArray",
+ "name": "",
"examples": textwrap.dedent(
"""\
Examples
diff --git pandas/core/arrays/masked.py pandas/core/arrays/masked.py
index 7fd4f55940f23..5cdd632d39b3c 100644
--- pandas/core/arrays/masked.py
+++ pandas/core/arrays/masked.py
@@ -1026,6 +1026,11 @@ def _quantile(
raise NotImplementedError
elif self.isna().all():
out_mask = np.ones(res.shape, dtype=bool)
+
+ if is_integer_dtype(self.dtype):
+ # We try to maintain int dtype if possible for not all-na case
+ # as well
+ res = np.zeros(res.shape, dtype=self.dtype.numpy_dtype)
else:
out_mask = np.zeros(res.shape, dtype=bool)
else:
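Sketch of the masked-quantile change above; dtype preservation is the point, and the expected output is hedged:

import pandas as pd

# All-NA nullable integers: the quantile result should keep Int64 via the
# new zeros-with-full-mask path instead of upcasting.
ser = pd.Series([pd.NA, pd.NA], dtype="Int64")
print(ser.quantile([0.5]).dtype)  # expected: Int64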
diff --git pandas/core/arrays/sparse/array.py pandas/core/arrays/sparse/array.py
index a389f1f282ee0..62ae6163a073e 100644
--- pandas/core/arrays/sparse/array.py
+++ pandas/core/arrays/sparse/array.py
@@ -728,7 +728,7 @@ def isna(self):
dtype = SparseDtype(bool, self._null_fill_value)
if self._null_fill_value:
return type(self)._simple_new(isna(self.sp_values), self.sp_index, dtype)
- mask = np.full(len(self), False, dtype=np.bool8)
+ mask = np.full(len(self), False, dtype=np.bool_)
mask[self.sp_index.indices] = isna(self.sp_values)
return type(self)(mask, fill_value=False, dtype=dtype)
@@ -1043,7 +1043,7 @@ def __getitem__(
if not key.fill_value:
return self.take(key.sp_index.indices)
n = len(self)
- mask = np.full(n, True, dtype=np.bool8)
+ mask = np.full(n, True, dtype=np.bool_)
mask[key.sp_index.indices] = False
return self.take(np.arange(n)[mask])
else:
diff --git pandas/core/common.py pandas/core/common.py
index 980e7a79414ba..641ddba0222e9 100644
--- pandas/core/common.py
+++ pandas/core/common.py
@@ -242,7 +242,17 @@ def asarray_tuplesafe(values: Iterable, dtype: NpDtype | None = None) -> ArrayLi
if isinstance(values, list) and dtype in [np.object_, object]:
return construct_1d_object_array_from_listlike(values)
- result = np.asarray(values, dtype=dtype)
+ try:
+ with warnings.catch_warnings():
+ # Can remove warning filter once NumPy 1.24 is min version
+ warnings.simplefilter("ignore", np.VisibleDeprecationWarning)
+ result = np.asarray(values, dtype=dtype)
+ except ValueError:
+ # Using try/except since it's more performant than checking is_list_like
+ # over each element
+ # error: Argument 1 to "construct_1d_object_array_from_listlike"
+ # has incompatible type "Iterable[Any]"; expected "Sized"
+ return construct_1d_object_array_from_listlike(values) # type: ignore[arg-type]
if issubclass(result.dtype.type, str):
result = np.asarray(values, dtype=object)
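Direct sketch of the fallback; asarray_tuplesafe is an internal helper, used here only to exercise the new except branch:

import numpy as np
from pandas.core.common import asarray_tuplesafe

# Ragged input raises ValueError on NumPy >= 1.24 instead of warning;
# the helper now falls back to an explicit object array either way.
arr = asarray_tuplesafe([np.array([1, 2]), np.array([1, 2, 3])])
print(arr.dtype)  # object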
diff --git pandas/core/dtypes/missing.py pandas/core/dtypes/missing.py
index 189ffc3d485bf..e7f57ae0e6658 100644
--- pandas/core/dtypes/missing.py
+++ pandas/core/dtypes/missing.py
@@ -594,6 +594,10 @@ def _array_equivalent_object(left: np.ndarray, right: np.ndarray, strict_nan: bo
if "boolean value of NA is ambiguous" in str(err):
return False
raise
+ except ValueError:
+ # numpy can raise a ValueError if left and right cannot be
+ # compared (e.g. nested arrays)
+ return False
return True
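Sketch of the new except branch; array_equivalent is internal, and nested arrays of unequal length cannot be broadcast for comparison:

import numpy as np
from pandas.core.dtypes.missing import array_equivalent

left = np.array([np.array([1, 2]), np.array([3])], dtype=object)
right = np.array([np.array([1, 2, 3]), np.array([3])], dtype=object)
# NumPy raises ValueError when comparing the unequal-length elements;
# this is now treated as "not equivalent" rather than propagating.
print(array_equivalent(left, right, strict_nan=True))  # False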
diff --git pandas/core/frame.py pandas/core/frame.py
index b525bf6f57e88..9e0f1363e073c 100644
--- pandas/core/frame.py
+++ pandas/core/frame.py
@@ -86,6 +86,7 @@
function as nv,
np_percentile_argname,
)
+from pandas.errors import InvalidIndexError
from pandas.util._decorators import (
Appender,
Substitution,
@@ -103,6 +104,7 @@
)
from pandas.core.dtypes.cast import (
+ LossySetitemError,
can_hold_element,
construct_1d_arraylike_from_scalar,
construct_2d_arraylike_from_scalar,
@@ -3551,6 +3553,7 @@ def memory_usage(self, index: bool = True, deep: bool = False) -> Series:
result = self._constructor_sliced(
[c.memory_usage(index=False, deep=deep) for col, c in self.items()],
index=self.columns,
+ dtype=np.intp,
)
if index:
index_memory_usage = self._constructor_sliced(
@@ -4206,13 +4209,14 @@ def _set_value(
else:
icol = self.columns.get_loc(col)
iindex = self.index.get_loc(index)
- self._mgr.column_setitem(icol, iindex, value)
+ self._mgr.column_setitem(icol, iindex, value, inplace=True)
self._clear_item_cache()
- except (KeyError, TypeError, ValueError):
+ except (KeyError, TypeError, ValueError, LossySetitemError):
# get_loc might raise a KeyError for missing labels (falling back
# to (i)loc will do expansion of the index)
- # column_setitem will do validation that may raise TypeError or ValueError
+ # column_setitem will do validation that may raise TypeError,
+ # ValueError, or LossySetitemError
# set using a non-recursive method & reset the cache
if takeable:
self.iloc[index, col] = value
@@ -4220,6 +4224,13 @@ def _set_value(
self.loc[index, col] = value
self._item_cache.pop(col, None)
+ except InvalidIndexError as ii_err:
+ # GH48729: Seems like you are trying to assign a value to a
+ # row when only scalar options are permitted
+ raise InvalidIndexError(
+ f"You can only assign a scalar value not a {type(value)}"
+ ) from ii_err
+
def _ensure_valid_index(self, value) -> None:
"""
Ensure that if we don't have an index, that we can create one from the
@@ -8247,7 +8258,9 @@ def update(
if mask.all():
continue
- self.loc[:, col] = expressions.where(mask, this, that)
+ with warnings.catch_warnings():
+ warnings.filterwarnings("ignore", "In a future version, `df.iloc")
+ self.loc[:, col] = expressions.where(mask, this, that)
# ----------------------------------------------------------------------
# Data reshaping
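Among the frame.py changes, the memory_usage dtype fix is the easiest to show (sketch; expected dtype hedged):

import pandas as pd

# With no columns, the constructor used to fall back to object dtype;
# dtype=np.intp pins the result to an integer Series.
print(pd.DataFrame(index=range(3)).memory_usage(index=False).dtype)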
diff --git pandas/core/groupby/generic.py pandas/core/groupby/generic.py
index 97d332394e045..7e6e138fa8fe6 100644
--- pandas/core/groupby/generic.py
+++ pandas/core/groupby/generic.py
@@ -411,7 +411,8 @@ def _wrap_applied_output(
not_indexed_same=not_indexed_same,
override_group_keys=override_group_keys,
)
- result.name = self.obj.name
+ if isinstance(result, Series):
+ result.name = self.obj.name
return result
else:
# GH #6265 #24880
diff --git pandas/core/groupby/groupby.py pandas/core/groupby/groupby.py
index 66c459b90a999..1c3a95b305087 100644
--- pandas/core/groupby/groupby.py
+++ pandas/core/groupby/groupby.py
@@ -8,7 +8,10 @@ class providing the base-class of operations.
"""
from __future__ import annotations
-from contextlib import contextmanager
+from contextlib import (
+ contextmanager,
+ nullcontext,
+)
import datetime
from functools import (
partial,
@@ -64,7 +67,10 @@ class providing the base-class of operations.
cache_readonly,
doc,
)
-from pandas.util._exceptions import find_stack_level
+from pandas.util._exceptions import (
+ find_stack_level,
+ rewrite_warning,
+)
from pandas.core.dtypes.cast import ensure_dtype_can_hold_na
from pandas.core.dtypes.common import (
@@ -982,15 +988,6 @@ def __getattr__(self, attr: str):
f"'{type(self).__name__}' object has no attribute '{attr}'"
)
- def __getattribute__(self, attr: str):
- # Intercept nth to allow both call and index
- if attr == "nth":
- return GroupByNthSelector(self)
- elif attr == "nth_actual":
- return super().__getattribute__("nth")
- else:
- return super().__getattribute__(attr)
-
@final
def _make_wrapper(self, name: str) -> Callable:
assert name in self._apply_allowlist
@@ -1517,7 +1514,9 @@ def _aggregate_with_numba(self, data, func, *args, engine_kwargs=None, **kwargs)
)
)
def apply(self, func, *args, **kwargs) -> NDFrameT:
-
+ # GH#50538
+ is_np_func = func in com._cython_table and func not in com._builtin_table
+ orig_func = func
func = com.is_builtin_func(func)
if isinstance(func, str):
@@ -1555,7 +1554,17 @@ def f(g):
# ignore SettingWithCopy here in case the user mutates
with option_context("mode.chained_assignment", None):
try:
- result = self._python_apply_general(f, self._selected_obj)
+ # GH#50538
+ old_msg = "The default value of numeric_only"
+ new_msg = (
+ f"The operation {orig_func} failed on a column. If any error is "
+ f"raised, this will raise an exception in a future version "
+ f"of pandas. Drop these columns to avoid this warning."
+ )
+ with rewrite_warning(
+ old_msg, FutureWarning, new_msg
+ ) if is_np_func else nullcontext():
+ result = self._python_apply_general(f, self._selected_obj)
except TypeError:
# gh-20949
# try again, with .apply acting as a filtering
@@ -1566,7 +1575,17 @@ def f(g):
# on a string grouper column
with self._group_selection_context():
- return self._python_apply_general(f, self._selected_obj)
+ # GH#50538
+ old_msg = "The default value of numeric_only"
+ new_msg = (
+ f"The operation {orig_func} failed on a column. If any error "
+ f"is raised, this will raise an exception in a future version "
+ f"of pandas. Drop these columns to avoid this warning."
+ )
+ with rewrite_warning(
+ old_msg, FutureWarning, new_msg
+ ) if is_np_func else nullcontext():
+ return self._python_apply_general(f, self._selected_obj)
return result
@@ -1847,7 +1866,10 @@ def _transform(self, func, *args, engine=None, engine_kwargs=None, **kwargs):
# and deal with possible broadcasting below.
# Temporarily set observed for dealing with categoricals.
with com.temp_setattr(self, "observed", True):
- result = getattr(self, func)(*args, **kwargs)
+ with com.temp_setattr(self, "as_index", True):
+ # GH#49834 - result needs groups in the index for
+ # _wrap_transform_fast_result
+ result = getattr(self, func)(*args, **kwargs)
return self._wrap_transform_fast_result(result)
@@ -3015,14 +3037,13 @@ def backfill(self, limit=None):
)
return self.bfill(limit=limit)
- @final
+ # https://github.com/python/mypy/issues/1362
+ # Mypy does not support decorated properties
+ @final # type: ignore[misc]
+ @property
@Substitution(name="groupby")
@Substitution(see_also=_common_see_also)
- def nth(
- self,
- n: PositionalIndexer | tuple,
- dropna: Literal["any", "all", None] = None,
- ) -> NDFrameT:
+ def nth(self) -> GroupByNthSelector:
"""
Take the nth row from each group if n is an int, otherwise a subset of rows.
@@ -3125,6 +3146,13 @@ def nth(
1 1 2.0
4 2 5.0
"""
+ return GroupByNthSelector(self)
+
+ def _nth(
+ self,
+ n: PositionalIndexer | tuple,
+ dropna: Literal["any", "all", None] = None,
+ ) -> NDFrameT:
if not dropna:
with self._group_selection_context():
mask = self._make_mask_from_positional_indexer(n)
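With nth now a property returning a selector, both spellings work without the removed __getattribute__ interception (sketch):

import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": [3, 4, 5]})
gb = df.groupby("a")
print(gb.nth(0))   # call form, dispatches to _nth
print(gb.nth[0])   # subscript form via GroupByNthSelector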
diff --git pandas/core/groupby/indexing.py pandas/core/groupby/indexing.py
index be7b7b3369e89..750097b403f26 100644
--- pandas/core/groupby/indexing.py
+++ pandas/core/groupby/indexing.py
@@ -297,7 +297,7 @@ def __call__(
n: PositionalIndexer | tuple,
dropna: Literal["any", "all", None] = None,
) -> DataFrame | Series:
- return self.groupby_object.nth_actual(n, dropna)
+ return self.groupby_object._nth(n, dropna)
def __getitem__(self, n: PositionalIndexer | tuple) -> DataFrame | Series:
- return self.groupby_object.nth_actual(n)
+ return self.groupby_object._nth(n)
diff --git pandas/core/indexes/base.py pandas/core/indexes/base.py
index 5abd04c29e5d4..447ba8d70c73e 100644
--- pandas/core/indexes/base.py
+++ pandas/core/indexes/base.py
@@ -123,7 +123,6 @@
ABCDatetimeIndex,
ABCMultiIndex,
ABCPeriodIndex,
- ABCRangeIndex,
ABCSeries,
ABCTimedeltaIndex,
)
@@ -4213,7 +4212,7 @@ def _validate_positional_slice(self, key: slice) -> None:
self._validate_indexer("positional", key.stop, "iloc")
self._validate_indexer("positional", key.step, "iloc")
- def _convert_slice_indexer(self, key: slice, kind: str_t, is_frame: bool = False):
+ def _convert_slice_indexer(self, key: slice, kind: str_t):
"""
Convert a slice indexer.
@@ -4224,9 +4223,6 @@ def _convert_slice_indexer(self, key: slice, kind: str_t, is_frame: bool = False
----------
key : label of the slice bound
kind : {'loc', 'getitem'}
- is_frame : bool, default False
- Whether this is a slice called on DataFrame.__getitem__
- as opposed to Series.__getitem__
"""
assert kind in ["loc", "getitem"], kind
@@ -4248,46 +4244,7 @@ def is_int(v):
is_positional = is_index_slice and ints_are_positional
if kind == "getitem":
- """
- called from the getitem slicers, validate that we are in fact
- integers
- """
- if self.is_integer():
- if is_frame:
- # unambiguously positional, no deprecation
- pass
- elif start is None and stop is None:
- # label-based vs positional is irrelevant
- pass
- elif isinstance(self, ABCRangeIndex) and self._range == range(
- len(self)
- ):
- # In this case there is no difference between label-based
- # and positional, so nothing will change.
- pass
- elif (
- self.dtype.kind in ["i", "u"]
- and self._is_strictly_monotonic_increasing
- and len(self) > 0
- and self[0] == 0
- and self[-1] == len(self) - 1
- ):
- # We are range-like, e.g. created with Index(np.arange(N))
- pass
- elif not is_index_slice:
- # we're going to raise, so don't bother warning, e.g.
- # test_integer_positional_indexing
- pass
- else:
- warnings.warn(
- "The behavior of `series[i:j]` with an integer-dtype index "
- "is deprecated. In a future version, this will be treated "
- "as *label-based* indexing, consistent with e.g. `series[i]` "
- "lookups. To retain the old behavior, use `series.iloc[i:j]`. "
- "To get the future behavior, use `series.loc[i:j]`.",
- FutureWarning,
- stacklevel=find_stack_level(),
- )
+ # called from the getitem slicers, validate that we are in fact integers
if self.is_integer() or is_index_slice:
# Note: these checks are redundant if we know is_index_slice
self._validate_indexer("slice", key.start, "getitem")
@@ -4701,8 +4658,10 @@ def join(
return self._join_non_unique(other, how=how)
elif not self.is_unique or not other.is_unique:
if self.is_monotonic_increasing and other.is_monotonic_increasing:
- if self._can_use_libjoin:
+ if not is_interval_dtype(self.dtype):
# otherwise we will fall through to _join_via_get_indexer
+ # GH#39133
+ # go through object dtype for ea till engine is supported properly
return self._join_monotonic(other, how=how)
else:
return self._join_non_unique(other, how=how)
@@ -5079,7 +5038,7 @@ def _wrap_joined_index(self: _IndexT, joined: ArrayLike, other: _IndexT) -> _Ind
return self._constructor(joined, name=name) # type: ignore[return-value]
else:
name = get_op_result_name(self, other)
- return self._constructor._with_infer(joined, name=name)
+ return self._constructor._with_infer(joined, name=name, dtype=self.dtype)
@cache_readonly
def _can_use_libjoin(self) -> bool:
diff --git pandas/core/indexes/interval.py pandas/core/indexes/interval.py
index 92331c9777abb..b993af5621923 100644
--- pandas/core/indexes/interval.py
+++ pandas/core/indexes/interval.py
@@ -236,6 +236,11 @@ def __new__(
_interval_shared_docs["from_breaks"]
% {
"klass": "IntervalIndex",
+ "name": textwrap.dedent(
+ """
+ name : str, optional
+ Name of the resulting IntervalIndex."""
+ ),
"examples": textwrap.dedent(
"""\
Examples
@@ -266,6 +271,11 @@ def from_breaks(
_interval_shared_docs["from_arrays"]
% {
"klass": "IntervalIndex",
+ "name": textwrap.dedent(
+ """
+ name : str, optional
+ Name of the resulting IntervalIndex."""
+ ),
"examples": textwrap.dedent(
"""\
Examples
@@ -297,6 +307,11 @@ def from_arrays(
_interval_shared_docs["from_tuples"]
% {
"klass": "IntervalIndex",
+ "name": textwrap.dedent(
+ """
+ name : str, optional
+ Name of the resulting IntervalIndex."""
+ ),
"examples": textwrap.dedent(
"""\
Examples
@@ -764,7 +779,7 @@ def _index_as_unique(self) -> bool:
"cannot handle overlapping indices; use IntervalIndex.get_indexer_non_unique"
)
- def _convert_slice_indexer(self, key: slice, kind: str, is_frame: bool = False):
+ def _convert_slice_indexer(self, key: slice, kind: str):
if not (key.step is None or key.step == 1):
# GH#31658 if label-based, we require step == 1,
# if positional, we disallow float start/stop
@@ -776,7 +791,7 @@ def _convert_slice_indexer(self, key: slice, kind: str, is_frame: bool = False):
# i.e. this cannot be interpreted as a positional slice
raise ValueError(msg)
- return super()._convert_slice_indexer(key, kind, is_frame=is_frame)
+ return super()._convert_slice_indexer(key, kind)
@cache_readonly
def _should_fallback_to_positional(self) -> bool:
diff --git pandas/core/indexes/numeric.py pandas/core/indexes/numeric.py
index a597bea0eb724..fe11a02eccb3c 100644
--- pandas/core/indexes/numeric.py
+++ pandas/core/indexes/numeric.py
@@ -219,7 +219,7 @@ def _should_fallback_to_positional(self) -> bool:
return False
@doc(Index._convert_slice_indexer)
- def _convert_slice_indexer(self, key: slice, kind: str, is_frame: bool = False):
+ def _convert_slice_indexer(self, key: slice, kind: str):
# TODO(2.0): once #45324 deprecation is enforced we should be able
# to simplify this.
if is_float_dtype(self.dtype):
@@ -231,7 +231,7 @@ def _convert_slice_indexer(self, key: slice, kind: str, is_frame: bool = False):
# translate to locations
return self.slice_indexer(key.start, key.stop, key.step)
- return super()._convert_slice_indexer(key, kind=kind, is_frame=is_frame)
+ return super()._convert_slice_indexer(key, kind=kind)
@doc(Index._maybe_cast_slice_bound)
def _maybe_cast_slice_bound(self, label, side: str, kind=lib.no_default):
diff --git pandas/core/indexing.py pandas/core/indexing.py
index 913aa2e5b0e18..dd06d9bee4428 100644
--- pandas/core/indexing.py
+++ pandas/core/indexing.py
@@ -2026,7 +2026,7 @@ def _setitem_single_column(self, loc: int, value, plane_indexer):
"array. To retain the old behavior, use either "
"`df[df.columns[i]] = newvals` or, if columns are non-unique, "
"`df.isetitem(i, newvals)`",
- FutureWarning,
+ DeprecationWarning,
stacklevel=find_stack_level(),
)
# TODO: how to get future behavior?
@@ -2491,7 +2491,7 @@ def convert_to_index_sliceable(obj: DataFrame, key):
"""
idx = obj.index
if isinstance(key, slice):
- return idx._convert_slice_indexer(key, kind="getitem", is_frame=True)
+ return idx._convert_slice_indexer(key, kind="getitem")
elif isinstance(key, str):
diff --git pandas/core/interchange/column.py pandas/core/interchange/column.py
index dc24c928d1f39..359e2fa0b7ab2 100644
--- pandas/core/interchange/column.py
+++ pandas/core/interchange/column.py
@@ -315,7 +315,7 @@ def _get_validity_buffer(self) -> tuple[PandasBuffer, Any]:
valid = invalid == 0
invalid = not valid
- mask = np.zeros(shape=(len(buf),), dtype=np.bool8)
+ mask = np.zeros(shape=(len(buf),), dtype=np.bool_)
for i, obj in enumerate(buf):
mask[i] = valid if isinstance(obj, str) else invalid
diff --git pandas/core/interchange/dataframe_protocol.py pandas/core/interchange/dataframe_protocol.py
index 3ab87d9a60399..2cfdee5517ece 100644
--- pandas/core/interchange/dataframe_protocol.py
+++ pandas/core/interchange/dataframe_protocol.py
@@ -213,7 +213,6 @@ class Column(ABC):
doesn't need its own version or ``__column__`` protocol.
"""
- @property
@abstractmethod
def size(self) -> int:
"""
diff --git pandas/core/interchange/from_dataframe.py pandas/core/interchange/from_dataframe.py
index 4602819b4834a..bec66e414875c 100644
--- pandas/core/interchange/from_dataframe.py
+++ pandas/core/interchange/from_dataframe.py
@@ -155,7 +155,7 @@ def primitive_column_to_ndarray(col: Column) -> tuple[np.ndarray, Any]:
buffers = col.get_buffers()
data_buff, data_dtype = buffers["data"]
- data = buffer_to_ndarray(data_buff, data_dtype, col.offset, col.size)
+ data = buffer_to_ndarray(data_buff, data_dtype, col.offset, col.size())
data = set_nulls(data, col, buffers["validity"])
return data, buffers
@@ -187,7 +187,7 @@ def categorical_column_to_series(col: Column) -> tuple[pd.Series, Any]:
buffers = col.get_buffers()
codes_buff, codes_dtype = buffers["data"]
- codes = buffer_to_ndarray(codes_buff, codes_dtype, col.offset, col.size)
+ codes = buffer_to_ndarray(codes_buff, codes_dtype, col.offset, col.size())
# Doing modulo in order to not get ``IndexError`` for
# out-of-bounds sentinel values in `codes`
@@ -244,29 +244,29 @@ def string_column_to_ndarray(col: Column) -> tuple[np.ndarray, Any]:
Endianness.NATIVE,
)
# Specify zero offset as we don't want to chunk the string data
- data = buffer_to_ndarray(data_buff, data_dtype, offset=0, length=col.size)
+ data = buffer_to_ndarray(data_buff, data_dtype, offset=0, length=col.size())
# Retrieve the offsets buffer containing the index offsets demarcating
# the beginning and the ending of each string
offset_buff, offset_dtype = buffers["offsets"]
# Offsets buffer contains start-stop positions of strings in the data buffer,
- # meaning that it has more elements than in the data buffer, do `col.size + 1` here
- # to pass a proper offsets buffer size
+ # meaning that it has more elements than in the data buffer, do `col.size() + 1`
+ # here to pass a proper offsets buffer size
offsets = buffer_to_ndarray(
- offset_buff, offset_dtype, col.offset, length=col.size + 1
+ offset_buff, offset_dtype, col.offset, length=col.size() + 1
)
null_pos = None
if null_kind in (ColumnNullType.USE_BITMASK, ColumnNullType.USE_BYTEMASK):
assert buffers["validity"], "Validity buffers cannot be empty for masks"
valid_buff, valid_dtype = buffers["validity"]
- null_pos = buffer_to_ndarray(valid_buff, valid_dtype, col.offset, col.size)
+ null_pos = buffer_to_ndarray(valid_buff, valid_dtype, col.offset, col.size())
if sentinel_val == 0:
null_pos = ~null_pos
# Assemble the strings from the code units
- str_list: list[None | float | str] = [None] * col.size
- for i in range(col.size):
+ str_list: list[None | float | str] = [None] * col.size()
+ for i in range(col.size()):
# Check for missing values
if null_pos is not None and null_pos[i]:
str_list[i] = np.nan
@@ -349,7 +349,7 @@ def datetime_column_to_ndarray(col: Column) -> tuple[np.ndarray, Any]:
Endianness.NATIVE,
),
col.offset,
- col.size,
+ col.size(),
)
data = parse_datetime_format_str(format_str, data)
@@ -501,7 +501,7 @@ def set_nulls(
elif null_kind in (ColumnNullType.USE_BITMASK, ColumnNullType.USE_BYTEMASK):
assert validity, "Expected to have a validity buffer for the mask"
valid_buff, valid_dtype = validity
- null_pos = buffer_to_ndarray(valid_buff, valid_dtype, col.offset, col.size)
+ null_pos = buffer_to_ndarray(valid_buff, valid_dtype, col.offset, col.size())
if sentinel_val == 0:
null_pos = ~null_pos
elif null_kind in (ColumnNullType.NON_NULLABLE, ColumnNullType.USE_NAN):
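Consumer-side sketch of the size() change; the interchange protocol defines size as a method, and the pandas consumer now calls it:

import pandas as pd

df = pd.DataFrame({"x": ["a", "b", None]})
# Round-tripping through the interchange protocol exercises the col.size()
# calls in the string and validity buffer handling.
print(pd.api.interchange.from_dataframe(df.__dataframe__()))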
diff --git pandas/core/internals/array_manager.py pandas/core/internals/array_manager.py
index fd156ccfc8b31..262ed06fdba02 100644
--- pandas/core/internals/array_manager.py
+++ pandas/core/internals/array_manager.py
@@ -874,7 +874,9 @@ def iset(
self.arrays[mgr_idx] = value_arr
return
- def column_setitem(self, loc: int, idx: int | slice | np.ndarray, value) -> None:
+ def column_setitem(
+ self, loc: int, idx: int | slice | np.ndarray, value, inplace: bool = False
+ ) -> None:
"""
Set values ("setitem") into a single column (not setting the full column).
@@ -885,9 +887,12 @@ def column_setitem(self, loc: int, idx: int | slice | np.ndarray, value) -> None
raise TypeError("The column index should be an integer")
arr = self.arrays[loc]
mgr = SingleArrayManager([arr], [self._axes[0]])
- new_mgr = mgr.setitem((idx,), value)
- # update existing ArrayManager in-place
- self.arrays[loc] = new_mgr.arrays[0]
+ if inplace:
+ mgr.setitem_inplace(idx, value)
+ else:
+ new_mgr = mgr.setitem((idx,), value)
+ # update existing ArrayManager in-place
+ self.arrays[loc] = new_mgr.arrays[0]
def insert(self, loc: int, item: Hashable, value: ArrayLike) -> None:
"""
diff --git pandas/core/internals/blocks.py pandas/core/internals/blocks.py
index 9c6b3e506b1d4..5e95f83ddfd08 100644
--- pandas/core/internals/blocks.py
+++ pandas/core/internals/blocks.py
@@ -569,7 +569,6 @@ def replace(
# Note: the checks we do in NDFrame.replace ensure we never get
# here with listlike to_replace or value, as those cases
# go through replace_list
-
values = self.values
if isinstance(values, Categorical):
@@ -608,7 +607,10 @@ def replace(
return blocks
elif self.ndim == 1 or self.shape[0] == 1:
- blk = self.coerce_to_target_dtype(value)
+ if value is None:
+ blk = self.astype(np.dtype(object))
+ else:
+ blk = self.coerce_to_target_dtype(value)
return blk.replace(
to_replace=to_replace,
value=value,
diff --git pandas/core/internals/managers.py pandas/core/internals/managers.py
index 881cea45bdb34..5ab4aef7462ee 100644
--- pandas/core/internals/managers.py
+++ pandas/core/internals/managers.py
@@ -15,7 +15,7 @@
import numpy as np
-from pandas._config import get_option
+from pandas._config.config import _global_config
from pandas._libs import (
algos as libalgos,
@@ -55,6 +55,7 @@
import pandas.core.algorithms as algos
from pandas.core.arrays._mixins import NDArrayBackedExtensionArray
from pandas.core.arrays.sparse import SparseDtype
+import pandas.core.common as com
from pandas.core.construction import (
ensure_wrapped_if_datetimelike,
extract_array,
@@ -146,6 +147,7 @@ class BaseBlockManager(DataManager):
blocks: tuple[Block, ...]
axes: list[Index]
refs: list[weakref.ref | None] | None
+ parent: object
@property
def ndim(self) -> int:
@@ -163,6 +165,7 @@ def from_blocks(
blocks: list[Block],
axes: list[Index],
refs: list[weakref.ref | None] | None = None,
+ parent: object = None,
) -> T:
raise NotImplementedError
@@ -262,6 +265,8 @@ def _clear_reference_block(self, blkno: int) -> None:
"""
if self.refs is not None:
self.refs[blkno] = None
+ if com.all_none(*self.refs):
+ self.parent = None
def get_dtypes(self):
dtypes = np.array([blk.dtype for blk in self.blocks])
@@ -388,10 +393,8 @@ def setitem(self: T, indexer, value) -> T:
return self.apply("setitem", indexer=indexer, value=value)
def putmask(self, mask, new, align: bool = True):
- if (
- _using_copy_on_write()
- and self.refs is not None
- and not all(ref is None for ref in self.refs)
+ if _using_copy_on_write() and any(
+ not self._has_no_reference_block(i) for i in range(len(self.blocks))
):
# some reference -> copy full dataframe
# TODO(CoW) this could be optimized to only copy the blocks that would
@@ -602,7 +605,9 @@ def _combine(
axes[-1] = index
axes[0] = self.items.take(indexer)
- return type(self).from_blocks(new_blocks, axes, new_refs)
+ return type(self).from_blocks(
+ new_blocks, axes, new_refs, parent=None if copy else self
+ )
@property
def nblocks(self) -> int:
@@ -645,11 +650,14 @@ def copy_func(ax):
new_refs: list[weakref.ref | None] | None
if deep:
new_refs = None
+ parent = None
else:
new_refs = [weakref.ref(blk) for blk in self.blocks]
+ parent = self
res.axes = new_axes
res.refs = new_refs
+ res.parent = parent
if self.ndim > 1:
# Avoid needing to re-compute these
@@ -738,6 +746,7 @@ def reindex_indexer(
only_slice=only_slice,
use_na_proxy=use_na_proxy,
)
+ parent = None if com.all_none(*new_refs) else self
else:
new_blocks = [
blk.take_nd(
@@ -750,11 +759,12 @@ def reindex_indexer(
for blk in self.blocks
]
new_refs = None
+ parent = None
new_axes = list(self.axes)
new_axes[axis] = new_axis
- new_mgr = type(self).from_blocks(new_blocks, new_axes, new_refs)
+ new_mgr = type(self).from_blocks(new_blocks, new_axes, new_refs, parent=parent)
if axis == 1:
# We can avoid the need to rebuild these
new_mgr._blknos = self.blknos.copy()
@@ -989,6 +999,7 @@ def __init__(
blocks: Sequence[Block],
axes: Sequence[Index],
refs: list[weakref.ref | None] | None = None,
+ parent: object = None,
verify_integrity: bool = True,
) -> None:
@@ -1053,11 +1064,13 @@ def from_blocks(
blocks: list[Block],
axes: list[Index],
refs: list[weakref.ref | None] | None = None,
+ parent: object = None,
) -> BlockManager:
"""
Constructor for BlockManager and SingleBlockManager with same signature.
"""
- return cls(blocks, axes, refs, verify_integrity=False)
+ parent = parent if _using_copy_on_write() else None
+ return cls(blocks, axes, refs, parent, verify_integrity=False)
# ----------------------------------------------------------------
# Indexing
@@ -1079,7 +1092,7 @@ def fast_xs(self, loc: int) -> SingleBlockManager:
block = new_block(result, placement=slice(0, len(result)), ndim=1)
# in the case of a single block, the new block is a view
ref = weakref.ref(self.blocks[0])
- return SingleBlockManager(block, self.axes[0], [ref])
+ return SingleBlockManager(block, self.axes[0], [ref], parent=self)
dtype = interleaved_dtype([blk.dtype for blk in self.blocks])
@@ -1113,7 +1126,7 @@ def fast_xs(self, loc: int) -> SingleBlockManager:
block = new_block(result, placement=slice(0, len(result)), ndim=1)
return SingleBlockManager(block, self.axes[0])
- def iget(self, i: int) -> SingleBlockManager:
+ def iget(self, i: int, track_ref: bool = True) -> SingleBlockManager:
"""
Return the data as a SingleBlockManager.
"""
@@ -1123,7 +1136,9 @@ def iget(self, i: int) -> SingleBlockManager:
# shortcut for select a single-dim from a 2-dim BM
bp = BlockPlacement(slice(0, len(values)))
nb = type(block)(values, placement=bp, ndim=1)
- return SingleBlockManager(nb, self.axes[1], [weakref.ref(block)])
+ ref = weakref.ref(block) if track_ref else None
+ parent = self if track_ref else None
+ return SingleBlockManager(nb, self.axes[1], [ref], parent)
def iget_values(self, i: int) -> ArrayLike:
"""
@@ -1350,7 +1365,9 @@ def _iset_single(
self._clear_reference_block(blkno)
return
- def column_setitem(self, loc: int, idx: int | slice | np.ndarray, value) -> None:
+ def column_setitem(
+ self, loc: int, idx: int | slice | np.ndarray, value, inplace: bool = False
+ ) -> None:
"""
Set values ("setitem") into a single column (not setting the full column).
@@ -1365,9 +1382,14 @@ def column_setitem(self, loc: int, idx: int | slice | np.ndarray, value) -> None
self.blocks = tuple(blocks)
self._clear_reference_block(blkno)
- col_mgr = self.iget(loc)
- new_mgr = col_mgr.setitem((idx,), value)
- self.iset(loc, new_mgr._block.values, inplace=True)
+ # this manager is only created temporarily to mutate the values in place
+ # so don't track references, otherwise the `setitem` would perform CoW again
+ col_mgr = self.iget(loc, track_ref=False)
+ if inplace:
+ col_mgr.setitem_inplace(idx, value)
+ else:
+ new_mgr = col_mgr.setitem((idx,), value)
+ self.iset(loc, new_mgr._block.values, inplace=True)
def insert(self, loc: int, item: Hashable, value: ArrayLike) -> None:
"""
@@ -1463,7 +1485,9 @@ def idelete(self, indexer) -> BlockManager:
nbs, new_refs = self._slice_take_blocks_ax0(taker, only_slice=True)
new_columns = self.items[~is_deleted]
axes = [new_columns, self.axes[1]]
- return type(self)(tuple(nbs), axes, new_refs, verify_integrity=False)
+ # TODO this might not be needed (can a delete ever be done in chained manner?)
+ parent = None if com.all_none(*new_refs) else self
+ return type(self)(tuple(nbs), axes, new_refs, parent, verify_integrity=False)
# ----------------------------------------------------------------
# Block-wise Operation
@@ -1869,6 +1893,7 @@ def __init__(
block: Block,
axis: Index,
refs: list[weakref.ref | None] | None = None,
+ parent: object = None,
verify_integrity: bool = False,
fastpath=lib.no_default,
) -> None:
@@ -1887,6 +1912,7 @@ def __init__(
self.axes = [axis]
self.blocks = (block,)
self.refs = refs
+ self.parent = parent if _using_copy_on_write() else None
@classmethod
def from_blocks(
@@ -1894,6 +1920,7 @@ def from_blocks(
blocks: list[Block],
axes: list[Index],
refs: list[weakref.ref | None] | None = None,
+ parent: object = None,
) -> SingleBlockManager:
"""
Constructor for BlockManager and SingleBlockManager with same signature.
@@ -1902,7 +1929,7 @@ def from_blocks(
assert len(axes) == 1
if refs is not None:
assert len(refs) == 1
- return cls(blocks[0], axes[0], refs, verify_integrity=False)
+ return cls(blocks[0], axes[0], refs, parent, verify_integrity=False)
@classmethod
def from_array(cls, array: ArrayLike, index: Index) -> SingleBlockManager:
@@ -1922,7 +1949,10 @@ def to_2d_mgr(self, columns: Index) -> BlockManager:
new_blk = type(blk)(arr, placement=bp, ndim=2)
axes = [columns, self.axes[0]]
refs: list[weakref.ref | None] = [weakref.ref(blk)]
- return BlockManager([new_blk], axes=axes, refs=refs, verify_integrity=False)
+ parent = self if _using_copy_on_write() else None
+ return BlockManager(
+ [new_blk], axes=axes, refs=refs, parent=parent, verify_integrity=False
+ )
def _has_no_reference(self, i: int = 0) -> bool:
"""
@@ -2004,7 +2034,7 @@ def getitem_mgr(self, indexer: slice | npt.NDArray[np.bool_]) -> SingleBlockMana
new_idx = self.index[indexer]
# TODO(CoW) in theory only need to track reference if new_array is a view
ref = weakref.ref(blk)
- return type(self)(block, new_idx, [ref])
+ return type(self)(block, new_idx, [ref], parent=self)
def get_slice(self, slobj: slice, axis: int = 0) -> SingleBlockManager:
# Assertion disabled for performance
@@ -2017,7 +2047,9 @@ def get_slice(self, slobj: slice, axis: int = 0) -> SingleBlockManager:
bp = BlockPlacement(slice(0, len(array)))
block = type(blk)(array, placement=bp, ndim=1)
new_index = self.index._getitem_slice(slobj)
- return type(self)(block, new_index, [weakref.ref(blk)])
+ # TODO this method is only used in groupby SeriesSplitter at the moment,
+ # so passing refs / parent is not yet covered by the tests
+ return type(self)(block, new_index, [weakref.ref(blk)], parent=self)
@property
def index(self) -> Index:
@@ -2064,6 +2096,7 @@ def setitem_inplace(self, indexer, value) -> None:
if _using_copy_on_write() and not self._has_no_reference(0):
self.blocks = (self._block.copy(),)
self.refs = None
+ self.parent = None
self._cache.clear()
super().setitem_inplace(indexer, value)
@@ -2080,6 +2113,7 @@ def idelete(self, indexer) -> SingleBlockManager:
self._cache.clear()
# clear reference since delete always results in a new array
self.refs = None
+ self.parent = None
return self
def fast_xs(self, loc):
@@ -2393,5 +2427,8 @@ def _preprocess_slice_or_indexer(
return "fancy", indexer, len(indexer)
+_mode_options = _global_config["mode"]
+
+
def _using_copy_on_write():
- return get_option("mode.copy_on_write")
+ return _mode_options["copy_on_write"]
diff --git pandas/core/ops/__init__.py pandas/core/ops/__init__.py
index e9fefd9268870..cd470b8feff14 100644
--- pandas/core/ops/__init__.py
+++ pandas/core/ops/__init__.py
@@ -334,7 +334,9 @@ def should_reindex_frame_op(
left_uniques = left.columns.unique()
right_uniques = right.columns.unique()
cols = left_uniques.intersection(right_uniques)
- if len(cols) and not (cols.equals(left_uniques) and cols.equals(right_uniques)):
+ if len(cols) and not (
+ len(cols) == len(left_uniques) and len(cols) == len(right_uniques)
+ ):
# TODO: is there a shortcut available when len(cols) == 0?
return True
diff --git pandas/core/reshape/pivot.py pandas/core/reshape/pivot.py
index 66a3425d0d398..7ef58c7836c81 100644
--- pandas/core/reshape/pivot.py
+++ pandas/core/reshape/pivot.py
@@ -21,6 +21,7 @@
Substitution,
deprecate_nonkeyword_arguments,
)
+from pandas.util._exceptions import rewrite_warning
from pandas.core.dtypes.cast import maybe_downcast_to_dtype
from pandas.core.dtypes.common import (
@@ -163,7 +164,18 @@ def __internal_pivot_table(
values = list(values)
grouped = data.groupby(keys, observed=observed, sort=sort)
- agged = grouped.agg(aggfunc)
+ msg = (
+ "pivot_table dropped a column because it failed to aggregate. This behavior "
+ "is deprecated and will raise in a future version of pandas. Select only the "
+ "columns that can be aggregated."
+ )
+ with rewrite_warning(
+ target_message="The default value of numeric_only",
+ target_category=FutureWarning,
+ new_message=msg,
+ ):
+ agged = grouped.agg(aggfunc)
+
if dropna and isinstance(agged, ABCDataFrame) and len(agged.columns):
agged = agged.dropna(how="all")
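Sketch of the rewritten pivot_table warning, assuming the numeric_only FutureWarning fires for the string column:

import warnings

import numpy as np
import pandas as pd

df = pd.DataFrame({"k": ["x", "x", "y"], "v": [1, 2, 3], "s": ["a", "b", "c"]})
with warnings.catch_warnings(record=True) as rec:
    warnings.simplefilter("always")
    pd.pivot_table(df, index="k", aggfunc=np.mean)
# The message now says a column was dropped rather than pointing at the
# numeric_only default.
print(str(rec[-1].message))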
diff --git pandas/core/window/rolling.py pandas/core/window/rolling.py
index c92c448304de2..1a71b41b0e317 100644
--- pandas/core/window/rolling.py
+++ pandas/core/window/rolling.py
@@ -994,7 +994,7 @@ class Window(BaseWindow):
step : int, default None
- ..versionadded:: 1.5.0
+ .. versionadded:: 1.5.0
Evaluate the window at every ``step`` result, equivalent to slicing as
``[::step]``. ``window`` must be an integer. Using a step argument other
diff --git pandas/io/formats/css.py pandas/io/formats/css.py
index cfc95bc9d9569..34626a0bdfdb7 100644
--- pandas/io/formats/css.py
+++ pandas/io/formats/css.py
@@ -105,9 +105,9 @@ def expand(self, prop, value: str) -> Generator[tuple[str, str], None, None]:
f"border{side}-width": "medium",
}
for token in tokens:
- if token in self.BORDER_STYLES:
+ if token.lower() in self.BORDER_STYLES:
border_declarations[f"border{side}-style"] = token
- elif any([ratio in token for ratio in self.BORDER_WIDTH_RATIOS]):
+ elif any(ratio in token.lower() for ratio in self.BORDER_WIDTH_RATIOS):
border_declarations[f"border{side}-width"] = token
else:
border_declarations[f"border{side}-color"] = token
@@ -181,6 +181,13 @@ class CSSResolver:
"ridge",
"inset",
"outset",
+ "mediumdashdot",
+ "dashdotdot",
+ "hair",
+ "mediumdashdotdot",
+ "dashdot",
+ "slantdashdot",
+ "mediumdashed",
]
SIDE_SHORTHANDS = {
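End-to-end sketch of the case-insensitive border matching (needs jinja2 and openpyxl; the style string and filename are illustrative):

import pandas as pd

df = pd.DataFrame({"a": [1]})
# "DASHED" is now matched case-insensitively, and Excel-specific names
# such as "hair" or "dashDot" are recognized by the converter.
styled = df.style.applymap(lambda _: "border: 1px DASHED black")
styled.to_excel("styled.xlsx", engine="openpyxl")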
diff --git pandas/io/formats/excel.py pandas/io/formats/excel.py
index ce7f663dd5703..b6e0f271f417b 100644
--- pandas/io/formats/excel.py
+++ pandas/io/formats/excel.py
@@ -159,6 +159,22 @@ class CSSToExcelConverter:
"fantasy": 5, # decorative
}
+ BORDER_STYLE_MAP = {
+ style.lower(): style
+ for style in [
+ "dashed",
+ "mediumDashDot",
+ "dashDotDot",
+ "hair",
+ "dotted",
+ "mediumDashDotDot",
+ "double",
+ "dashDot",
+ "slantDashDot",
+ "mediumDashed",
+ ]
+ }
+
# NB: Most of the methods here could be classmethods, as only __init__
# and __call__ make use of instance attributes. We leave them as
# instancemethods so that users can easily experiment with extensions
@@ -170,10 +186,13 @@ def __init__(self, inherited: str | None = None) -> None:
self.inherited = self.compute_css(inherited)
else:
self.inherited = None
+ # We should avoid lru_cache on the __call__ method.
+ # Otherwise once the method __call__ has been called
+ # garbage collection no longer deletes the instance.
+ self._call_cached = lru_cache(maxsize=None)(self._call_uncached)
compute_css = CSSResolver()
- @lru_cache(maxsize=None)
def __call__(
self, declarations: str | frozenset[tuple[str, str]]
) -> dict[str, dict[str, str]]:
@@ -193,6 +212,11 @@ def __call__(
A style as interpreted by ExcelWriter when found in
ExcelCell.style.
"""
+ return self._call_cached(declarations)
+
+ def _call_uncached(
+ self, declarations: str | frozenset[tuple[str, str]]
+ ) -> dict[str, dict[str, str]]:
properties = self.compute_css(declarations, self.inherited)
return self.build_xlstyle(properties)
@@ -298,6 +322,16 @@ def _border_style(self, style: str | None, width: str | None, color: str | None)
if width_name in ("hair", "thin"):
return "dashed"
return "mediumDashed"
+ elif style in self.BORDER_STYLE_MAP:
+ # Excel-specific styles
+ return self.BORDER_STYLE_MAP[style]
+ else:
+ warnings.warn(
+ f"Unhandled border style format: {repr(style)}",
+ CSSWarning,
+ stacklevel=find_stack_level(),
+ )
+ return "none"
def _get_width_name(self, width_input: str | None) -> str | None:
width = self._width_to_float(width_input)
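The lru_cache move is a general pattern; a standalone sketch of the design choice (class and method names are illustrative):

from functools import lru_cache

class Converter:
    def __init__(self) -> None:
        # Bind the cache to the instance: when the instance is collected,
        # its cache goes with it. A class-level lru_cache on __call__
        # would keep every instance alive through the cached self args.
        self._call_cached = lru_cache(maxsize=None)(self._call_uncached)

    def __call__(self, key: str) -> str:
        return self._call_cached(key)

    def _call_uncached(self, key: str) -> str:
        return key.upper()  # stand-in for the expensive computation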
diff --git pandas/io/formats/printing.py pandas/io/formats/printing.py
index 77431533e703a..abb341f6385c1 100644
--- pandas/io/formats/printing.py
+++ pandas/io/formats/printing.py
@@ -259,10 +259,14 @@ def enable_data_resource_formatter(enable: bool) -> None:
if mimetype not in formatters:
# define tableschema formatter
from IPython.core.formatters import BaseFormatter
+ from traitlets import ObjectName
class TableSchemaFormatter(BaseFormatter):
- print_method = "_repr_data_resource_"
- _return_type = (dict,)
+ print_method = ObjectName("_repr_data_resource_")
+ # Incompatible types in assignment (expression has type
+ # "Tuple[Type[Dict[Any, Any]]]", base class "BaseFormatter"
+ # defined the type as "Type[str]")
+ _return_type = (dict,) # type: ignore[assignment]
# register it:
formatters[mimetype] = TableSchemaFormatter()
diff --git pandas/io/formats/style.py pandas/io/formats/style.py
index 9c3e4f0bb02fb..59c586e0305c6 100644
--- pandas/io/formats/style.py
+++ pandas/io/formats/style.py
@@ -316,6 +316,12 @@ def concat(self, other: Styler) -> Styler:
inherited from the original Styler and not ``other``.
- hidden columns and hidden index levels will be inherited from the
original Styler
+ - ``css`` will be inherited from the original Styler, and the value of
+ keys ``data``, ``row_heading`` and ``row`` will be prepended with
+ ``foot0_``. If more concats are chained, their styles will be prepended
+ with ``foot1_``, ``foot2_``, etc., and if a concatenated style has
+ another concatenated style, the second style will be prepended with
+ ``foot{parent}_foot{child}_``.
A common use case is to concatenate user defined functions with
``DataFrame.agg`` or with described statistics via ``DataFrame.describe``.
@@ -367,7 +373,7 @@ def concat(self, other: Styler) -> Styler:
"number of index levels must be same in `other` "
"as in `Styler`. See documentation for suggestions."
)
- self.concatenated = other
+ self.concatenated.append(other)
return self
def _repr_html_(self) -> str | None:
@@ -3927,7 +3933,15 @@ def _background_gradient(
rng = smax - smin
# extend lower / upper bounds, compresses color range
norm = mpl.colors.Normalize(smin - (rng * low), smax + (rng * high))
- rgbas = plt.cm.get_cmap(cmap)(norm(gmap))
+ from pandas.plotting._matplotlib.compat import mpl_ge_3_6_0
+
+ if mpl_ge_3_6_0():
+ if cmap is None:
+ rgbas = mpl.colormaps[mpl.rcParams["image.cmap"]](norm(gmap))
+ else:
+ rgbas = mpl.colormaps.get_cmap(cmap)(norm(gmap))
+ else:
+ rgbas = plt.cm.get_cmap(cmap)(norm(gmap))
def relative_luminance(rgba) -> float:
"""
diff --git pandas/io/formats/style_render.py pandas/io/formats/style_render.py
index 7631ae2405585..cc3c77fcec692 100644
--- pandas/io/formats/style_render.py
+++ pandas/io/formats/style_render.py
@@ -119,7 +119,7 @@ def __init__(
"blank": "blank",
"foot": "foot",
}
- self.concatenated: StylerRenderer | None = None
+ self.concatenated: list[StylerRenderer] = []
# add rendering variables
self.hide_index_names: bool = False
self.hide_column_names: bool = False
@@ -161,27 +161,34 @@ def _render(
stylers for use within `_translate_latex`
"""
self._compute()
- dx = None
- if self.concatenated is not None:
- self.concatenated.hide_index_ = self.hide_index_
- self.concatenated.hidden_columns = self.hidden_columns
- self.concatenated.css = {
+ dxs = []
+ ctx_len = len(self.index)
+ for i, concatenated in enumerate(self.concatenated):
+ concatenated.hide_index_ = self.hide_index_
+ concatenated.hidden_columns = self.hidden_columns
+ foot = f"{self.css['foot']}{i}"
+ concatenated.css = {
**self.css,
- "data": f"{self.css['foot']}_{self.css['data']}",
- "row_heading": f"{self.css['foot']}_{self.css['row_heading']}",
- "row": f"{self.css['foot']}_{self.css['row']}",
- "foot": self.css["foot"],
+ "data": f"{foot}_data",
+ "row_heading": f"{foot}_row_heading",
+ "row": f"{foot}_row",
+ "foot": f"{foot}_foot",
}
- dx = self.concatenated._render(
+ dx = concatenated._render(
sparse_index, sparse_columns, max_rows, max_cols, blank
)
+ dxs.append(dx)
- for (r, c), v in self.concatenated.ctx.items():
- self.ctx[(r + len(self.index), c)] = v
- for (r, c), v in self.concatenated.ctx_index.items():
- self.ctx_index[(r + len(self.index), c)] = v
+ for (r, c), v in concatenated.ctx.items():
+ self.ctx[(r + ctx_len, c)] = v
+ for (r, c), v in concatenated.ctx_index.items():
+ self.ctx_index[(r + ctx_len, c)] = v
- d = self._translate(sparse_index, sparse_columns, max_rows, max_cols, blank, dx)
+ ctx_len += len(concatenated.index)
+
+ d = self._translate(
+ sparse_index, sparse_columns, max_rows, max_cols, blank, dxs
+ )
return d
def _render_html(
@@ -258,7 +265,7 @@ def _translate(
max_rows: int | None = None,
max_cols: int | None = None,
blank: str = " ",
- dx: dict | None = None,
+ dxs: list[dict] | None = None,
):
"""
Process Styler data and settings into a dict for template rendering.
@@ -278,8 +285,8 @@ def _translate(
Specific max rows and cols. max_elements always take precedence in render.
blank : str
Entry to top-left blank cells.
- dx : dict
- The render dict of the concatenated Styler.
+ dxs : list[dict]
+ The render dicts of the concatenated Stylers.
Returns
-------
@@ -287,6 +294,8 @@ def _translate(
The following structure: {uuid, table_styles, caption, head, body,
cellstyle, table_attributes}
"""
+ if dxs is None:
+ dxs = []
self.css["blank_value"] = blank
# construct render dict
@@ -340,10 +349,12 @@ def _translate(
]
d.update({k: map})
- if dx is not None: # self.concatenated is not None
+ for dx in dxs: # self.concatenated is not empty
d["body"].extend(dx["body"]) # type: ignore[union-attr]
d["cellstyle"].extend(dx["cellstyle"]) # type: ignore[union-attr]
- d["cellstyle_index"].extend(dx["cellstyle"]) # type: ignore[union-attr]
+ d["cellstyle_index"].extend( # type: ignore[union-attr]
+ dx["cellstyle_index"]
+ )
table_attr = self.table_attributes
if not get_option("styler.html.mathjax"):
@@ -847,23 +858,27 @@ def _translate_latex(self, d: dict, clines: str | None) -> None:
for r, row in enumerate(d["head"])
]
- def concatenated_visible_rows(obj, n, row_indices):
+ def _concatenated_visible_rows(obj, n, row_indices):
"""
Extract all visible row indices recursively from concatenated stylers.
"""
row_indices.extend(
[r + n for r in range(len(obj.index)) if r not in obj.hidden_rows]
)
- return (
- row_indices
- if obj.concatenated is None
- else concatenated_visible_rows(
- obj.concatenated, n + len(obj.index), row_indices
- )
- )
+ n += len(obj.index)
+ for concatenated in obj.concatenated:
+ n = _concatenated_visible_rows(concatenated, n, row_indices)
+ return n
+
+ def concatenated_visible_rows(obj):
+ row_indices: list[int] = []
+ _concatenated_visible_rows(obj, 0, row_indices)
+ # TODO try to consolidate the concat visible rows
+ # methods to a single function / recursion for simplicity
+ return row_indices
body = []
- for r, row in zip(concatenated_visible_rows(self, 0, []), d["body"]):
+ for r, row in zip(concatenated_visible_rows(self), d["body"]):
# note: cannot enumerate d["body"] because rows were dropped if hidden
# during _translate_body so must zip to acquire the true r-index associated
# with the ctx obj which contains the cell styles.
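With `concatenated` now a list, footers can be chained; a short sketch (needs jinja2):

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
totals = df.agg(["sum"]).style
means = df.agg(["mean"]).style
# Each appended block gets its own foot0_, foot1_, ... CSS prefix.
html = df.style.concat(totals).concat(means).to_html()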
diff --git pandas/plotting/_core.py pandas/plotting/_core.py
index 9c418ea7cf30b..be77de2c5a430 100644
--- pandas/plotting/_core.py
+++ pandas/plotting/_core.py
@@ -1643,11 +1643,6 @@ def scatter(self, x, y, s=None, c=None, **kwargs) -> PlotAccessor:
.. versionchanged:: 1.1.0
- size : str, scalar or array-like, optional
- Alias for s.
-
- .. versionadded:: 1.5.0
-
c : str, int or array-like, optional
The color of each point. Possible values are:
@@ -1661,10 +1656,6 @@ def scatter(self, x, y, s=None, c=None, **kwargs) -> PlotAccessor:
- A column name or position whose values will be used to color the
marker points according to a colormap.
- color : str, int or array-like, optional
- Alias for c.
-
- .. versionadded:: 1.5.0
**kwargs
Keyword arguments to pass on to :meth:`DataFrame.plot`.
@@ -1703,19 +1694,7 @@ def scatter(self, x, y, s=None, c=None, **kwargs) -> PlotAccessor:
... c='species',
... colormap='viridis')
"""
- size = kwargs.pop("size", None)
- if s is not None and size is not None:
- raise TypeError("Specify exactly one of `s` and `size`")
- elif s is not None or size is not None:
- kwargs["s"] = s if s is not None else size
-
- color = kwargs.pop("color", None)
- if c is not None and color is not None:
- raise TypeError("Specify exactly one of `c` and `color`")
- elif c is not None or color is not None:
- kwargs["c"] = c if c is not None else color
-
- return self(kind="scatter", x=x, y=y, **kwargs)
+ return self(kind="scatter", x=x, y=y, s=s, c=c, **kwargs)
def hexbin(
self, x, y, C=None, reduce_C_function=None, gridsize=None, **kwargs
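The scatter revert above removes the short-lived `size`/`color` aliases, so `s` and `c` are forwarded directly again. A quick sketch of the surviving API (requires matplotlib to be installed):

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [3, 1, 2], "w": [10, 40, 90]})
# s and c still accept scalars, arrays, or column names; the keyword
# aliases size= and color= are gone after this revert
ax = df.plot.scatter(x="x", y="y", s="w", c="y", colormap="viridis")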
diff --git pandas/plotting/_matplotlib/core.py pandas/plotting/_matplotlib/core.py
index 1302413916d58..af91a8ab83e12 100644
--- pandas/plotting/_matplotlib/core.py
+++ pandas/plotting/_matplotlib/core.py
@@ -1222,7 +1222,7 @@ def _make_plot(self):
if self.colormap is not None:
if mpl_ge_3_6_0():
- cmap = mpl.colormaps[self.colormap]
+ cmap = mpl.colormaps.get_cmap(self.colormap)
else:
cmap = self.plt.cm.get_cmap(self.colormap)
else:
@@ -1302,7 +1302,7 @@ def _make_plot(self):
# pandas uses colormap, matplotlib uses cmap.
cmap = self.colormap or "BuGn"
if mpl_ge_3_6_0():
- cmap = mpl.colormaps[cmap]
+ cmap = mpl.colormaps.get_cmap(cmap)
else:
cmap = self.plt.cm.get_cmap(cmap)
cb = self.kwds.pop("colorbar", True)
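The colormap fix above matters because `mpl.colormaps[...]` only looks up registered names, while pandas may already be holding a `Colormap` instance; the registry's `get_cmap` handles both. Sketch (assumes matplotlib >= 3.6):

import matplotlib as mpl

by_name = mpl.colormaps.get_cmap("BuGn")    # str -> Colormap
passthru = mpl.colormaps.get_cmap(by_name)  # Colormap -> same Colormap
# mpl.colormaps[by_name] would raise instead, which is what the
# change above works around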
diff --git pandas/plotting/_misc.py pandas/plotting/_misc.py
index 71209e1598d9a..b7e6fca88b637 100644
--- pandas/plotting/_misc.py
+++ pandas/plotting/_misc.py
@@ -307,7 +307,7 @@ def andrews_curves(
:context: close-figs
>>> df = pd.read_csv(
- ... 'https://raw.github.com/pandas-dev/'
+ ... 'https://raw.githubusercontent.com/pandas-dev/'
... 'pandas/main/pandas/tests/io/data/csv/iris.csv'
... )
>>> pd.plotting.andrews_curves(df, 'Name')
@@ -439,7 +439,7 @@ def parallel_coordinates(
:context: close-figs
>>> df = pd.read_csv(
- ... 'https://raw.github.com/pandas-dev/'
+ ... 'https://raw.githubusercontent.com/pandas-dev/'
... 'pandas/main/pandas/tests/io/data/csv/iris.csv'
... )
>>> pd.plotting.parallel_coordinates(
diff --git pandas/tests/apply/test_frame_apply.py pandas/tests/apply/test_frame_apply.py
index 3bcb7d964fad1..faa89e556a01e 100644
--- pandas/tests/apply/test_frame_apply.py
+++ pandas/tests/apply/test_frame_apply.py
@@ -1287,6 +1287,27 @@ def test_nuiscance_columns():
tm.assert_frame_equal(result, expected)
+@pytest.mark.parametrize("method", ["agg", "apply", "transform"])
+def test_numeric_only_warning_numpy(method):
+ # GH#50538
+ df = DataFrame({"a": [1, 1, 2], "b": list("xyz")})
+ if method == "agg":
+ msg = "The operation <function mean.*failed"
+ with tm.assert_produces_warning(FutureWarning, match=msg):
+ getattr(df, method)(np.mean)
+ # Ensure users can't pass numeric_only
+ with pytest.raises(TypeError, match="got an unexpected keyword argument"):
+ getattr(df, method)(np.mean, numeric_only=True)
+ elif method == "apply":
+ with pytest.raises(TypeError, match="Could not convert"):
+ getattr(df, method)(np.mean)
+ else:
+ with pytest.raises(ValueError, match="Function did not transform"):
+ msg = "The operation <function mean.*failed"
+ with tm.assert_produces_warning(FutureWarning, match=msg):
+ getattr(df, method)(np.mean)
+
+
@pytest.mark.parametrize("how", ["agg", "apply"])
def test_non_callable_aggregates(how):
diff --git pandas/tests/arrays/sparse/test_indexing.py pandas/tests/arrays/sparse/test_indexing.py
index 7ea36ed041f44..311a8a04e5b91 100644
--- pandas/tests/arrays/sparse/test_indexing.py
+++ pandas/tests/arrays/sparse/test_indexing.py
@@ -85,7 +85,7 @@ def test_boolean_slice_empty(self):
def test_getitem_bool_sparse_array(self):
# GH 23122
- spar_bool = SparseArray([False, True] * 5, dtype=np.bool8, fill_value=True)
+ spar_bool = SparseArray([False, True] * 5, dtype=np.bool_, fill_value=True)
exp = SparseArray([np.nan, 2, np.nan, 5, 6])
tm.assert_sp_array_equal(arr[spar_bool], exp)
@@ -95,7 +95,7 @@ def test_getitem_bool_sparse_array(self):
tm.assert_sp_array_equal(res, exp)
spar_bool = SparseArray(
- [False, True, np.nan] * 3, dtype=np.bool8, fill_value=np.nan
+ [False, True, np.nan] * 3, dtype=np.bool_, fill_value=np.nan
)
res = arr[spar_bool]
exp = SparseArray([np.nan, 3, 5])
diff --git pandas/tests/arrays/sparse/test_reductions.py pandas/tests/arrays/sparse/test_reductions.py
index 2dd80c52f1419..5d6d65dde69ad 100644
--- pandas/tests/arrays/sparse/test_reductions.py
+++ pandas/tests/arrays/sparse/test_reductions.py
@@ -142,7 +142,7 @@ def test_sum_min_count(self, arr, fill_value, min_count, expected):
assert result == expected
def test_bool_sum_min_count(self):
- spar_bool = SparseArray([False, True] * 5, dtype=np.bool8, fill_value=True)
+ spar_bool = SparseArray([False, True] * 5, dtype=np.bool_, fill_value=True)
res = spar_bool.sum(min_count=1)
assert res == 5
res = spar_bool.sum(min_count=11)
diff --git pandas/tests/arrays/sparse/test_unary.py pandas/tests/arrays/sparse/test_unary.py
index a34c3b0787753..605023a407a06 100644
--- pandas/tests/arrays/sparse/test_unary.py
+++ pandas/tests/arrays/sparse/test_unary.py
@@ -59,9 +59,9 @@ def test_abs_operator(self):
tm.assert_sp_array_equal(exp, res)
def test_invert_operator(self):
- arr = SparseArray([False, True, False, True], fill_value=False, dtype=np.bool8)
+ arr = SparseArray([False, True, False, True], fill_value=False, dtype=np.bool_)
exp = SparseArray(
- np.invert([False, True, False, True]), fill_value=True, dtype=np.bool8
+ np.invert([False, True, False, True]), fill_value=True, dtype=np.bool_
)
res = ~arr
tm.assert_sp_array_equal(exp, res)
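The `np.bool8` -> `np.bool_` swaps above track NumPy, where `np.bool8` is a deprecated alias slated for removal; only the canonical name is future-proof:

import numpy as np

# identical dtype, but only np.bool_ avoids a DeprecationWarning on
# recent NumPy releases
arr = np.array([True, False, True], dtype=np.bool_)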
diff --git pandas/tests/copy_view/test_indexing.py pandas/tests/copy_view/test_indexing.py
index d917a3c79aa97..444c6ff204b88 100644
--- pandas/tests/copy_view/test_indexing.py
+++ pandas/tests/copy_view/test_indexing.py
@@ -462,6 +462,158 @@ def test_subset_set_with_column_indexer(
tm.assert_frame_equal(df, df_orig)
+@pytest.mark.parametrize(
+ "method",
+ [
+ lambda df: df[["a", "b"]][0:2],
+ lambda df: df[0:2][["a", "b"]],
+ lambda df: df[["a", "b"]].iloc[0:2],
+ lambda df: df[["a", "b"]].loc[0:1],
+ lambda df: df[0:2].iloc[:, 0:2],
+ lambda df: df[0:2].loc[:, "a":"b"], # type: ignore[misc]
+ ],
+ ids=[
+ "row-getitem-slice",
+ "column-getitem",
+ "row-iloc-slice",
+ "row-loc-slice",
+ "column-iloc-slice",
+ "column-loc-slice",
+ ],
+)
+@pytest.mark.parametrize(
+ "dtype", ["int64", "float64"], ids=["single-block", "mixed-block"]
+)
+def test_subset_chained_getitem(
+ request, method, dtype, using_copy_on_write, using_array_manager
+):
+ # Case: creating a subset using multiple, chained getitem calls using views
+ # still needs to guarantee proper CoW behaviour
+ df = DataFrame(
+ {"a": [1, 2, 3], "b": [4, 5, 6], "c": np.array([7, 8, 9], dtype=dtype)}
+ )
+ df_orig = df.copy()
+
+ # when not using CoW, it depends on whether we have a single block or not
+ # and whether we are slicing the columns -> in that case we have a view
+ subset_is_view = request.node.callspec.id in (
+ "single-block-column-iloc-slice",
+ "single-block-column-loc-slice",
+ ) or (
+ request.node.callspec.id
+ in ("mixed-block-column-iloc-slice", "mixed-block-column-loc-slice")
+ and using_array_manager
+ )
+
+ # modify subset -> don't modify parent
+ subset = method(df)
+ subset.iloc[0, 0] = 0
+ if using_copy_on_write or (not subset_is_view):
+ tm.assert_frame_equal(df, df_orig)
+ else:
+ assert df.iloc[0, 0] == 0
+
+ # modify parent -> don't modify subset
+ subset = method(df)
+ df.iloc[0, 0] = 0
+ expected = DataFrame({"a": [1, 2], "b": [4, 5]})
+ if using_copy_on_write or not subset_is_view:
+ tm.assert_frame_equal(subset, expected)
+ else:
+ assert subset.iloc[0, 0] == 0
+
+
+@pytest.mark.parametrize(
+ "dtype", ["int64", "float64"], ids=["single-block", "mixed-block"]
+)
+def test_subset_chained_getitem_column(dtype, using_copy_on_write):
+ # Case: creating a subset using multiple, chained getitem calls using views
+ # still needs to guarantee proper CoW behaviour
+ df = DataFrame(
+ {"a": [1, 2, 3], "b": [4, 5, 6], "c": np.array([7, 8, 9], dtype=dtype)}
+ )
+ df_orig = df.copy()
+
+ # modify subset -> don't modify parent
+ subset = df[:]["a"][0:2]
+ df._clear_item_cache()
+ subset.iloc[0] = 0
+ if using_copy_on_write:
+ tm.assert_frame_equal(df, df_orig)
+ else:
+ assert df.iloc[0, 0] == 0
+
+ # modify parent -> don't modify subset
+ subset = df[:]["a"][0:2]
+ df._clear_item_cache()
+ df.iloc[0, 0] = 0
+ expected = Series([1, 2], name="a")
+ if using_copy_on_write:
+ tm.assert_series_equal(subset, expected)
+ else:
+ assert subset.iloc[0] == 0
+
+
+@pytest.mark.parametrize(
+ "method",
+ [
+ lambda s: s["a":"c"]["a":"b"], # type: ignore[misc]
+ lambda s: s.iloc[0:3].iloc[0:2],
+ lambda s: s.loc["a":"c"].loc["a":"b"], # type: ignore[misc]
+ lambda s: s.loc["a":"c"] # type: ignore[misc]
+ .iloc[0:3]
+ .iloc[0:2]
+ .loc["a":"b"] # type: ignore[misc]
+ .iloc[0:1],
+ ],
+ ids=["getitem", "iloc", "loc", "long-chain"],
+)
+def test_subset_chained_getitem_series(method, using_copy_on_write):
+ # Case: creating a subset using multiple, chained getitem calls using views
+ # still needs to guarantee proper CoW behaviour
+ s = Series([1, 2, 3], index=["a", "b", "c"])
+ s_orig = s.copy()
+
+ # modify subset -> don't modify parent
+ subset = method(s)
+ subset.iloc[0] = 0
+ if using_copy_on_write:
+ tm.assert_series_equal(s, s_orig)
+ else:
+ assert s.iloc[0] == 0
+
+ # modify parent -> don't modify subset
+ subset = s.iloc[0:3].iloc[0:2]
+ s.iloc[0] = 0
+ expected = Series([1, 2], index=["a", "b"])
+ if using_copy_on_write:
+ tm.assert_series_equal(subset, expected)
+ else:
+ assert subset.iloc[0] == 0
+
+
+def test_subset_chained_single_block_row(using_copy_on_write, using_array_manager):
+ df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]})
+ df_orig = df.copy()
+
+ # modify subset -> don't modify parent
+ subset = df[:].iloc[0].iloc[0:2]
+ subset.iloc[0] = 0
+ if using_copy_on_write or using_array_manager:
+ tm.assert_frame_equal(df, df_orig)
+ else:
+ assert df.iloc[0, 0] == 0
+
+ # modify parent -> don't modify subset
+ subset = df[:].iloc[0].iloc[0:2]
+ df.iloc[0, 0] = 0
+ expected = Series([1, 4], index=["a", "b"], name=0)
+ if using_copy_on_write or using_array_manager:
+ tm.assert_series_equal(subset, expected)
+ else:
+ assert subset.iloc[0] == 0
+
+
# TODO add more tests modifying the parent
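The new chained-getitem tests pin down the copy-on-write guarantee: however a subset is built, mutating it must not leak into the parent. A condensed sketch (mode.copy_on_write is experimental in 1.5.x):

import pandas as pd

pd.set_option("mode.copy_on_write", True)
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
subset = df[["a", "b"]][0:2]  # two chained getitem calls
subset.iloc[0, 0] = 0
assert df.iloc[0, 0] == 1     # parent untouched under CoW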
diff --git pandas/tests/copy_view/test_internals.py pandas/tests/copy_view/test_internals.py
index 2191fc1b33218..1938a1c58fe3d 100644
--- pandas/tests/copy_view/test_internals.py
+++ pandas/tests/copy_view/test_internals.py
@@ -1,7 +1,9 @@
import numpy as np
+import pytest
import pandas.util._test_decorators as td
+import pandas as pd
from pandas import DataFrame
from pandas.tests.copy_view.util import get_array
@@ -43,3 +45,51 @@ def test_consolidate(using_copy_on_write):
subset.iloc[0, 1] = 0.0
assert df._mgr._has_no_reference(1)
assert df.loc[0, "b"] == 0.1
+
+
+@td.skip_array_manager_invalid_test
+def test_clear_parent(using_copy_on_write):
+ # ensure to clear parent reference if we are no longer viewing data from parent
+ if not using_copy_on_write:
+ pytest.skip("test only relevant when using copy-on-write")
+
+ df = DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
+ subset = df[:]
+ assert subset._mgr.parent is not None
+
+ # replacing existing columns loses the references to the parent df
+ subset["a"] = 0
+ assert subset._mgr.parent is not None
+ # when losing the last reference, also the parent should be reset
+ subset["b"] = 0
+ assert subset._mgr.parent is None
+
+
+@pytest.mark.single_cpu
+@td.skip_array_manager_invalid_test
+def test_switch_options():
+ # ensure we can switch the value of the option within one session
+ # (assuming data is constructed after switching)
+
+ # using the option_context to ensure we set back to global option value
+ # after running the test
+ with pd.option_context("mode.copy_on_write", False):
+ df = DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
+ subset = df[:]
+ subset.iloc[0, 0] = 0
+ # df updated with CoW disabled
+ assert df.iloc[0, 0] == 0
+
+ pd.options.mode.copy_on_write = True
+ df = DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
+ subset = df[:]
+ subset.iloc[0, 0] = 0
+ # df not updated with CoW enabled
+ assert df.iloc[0, 0] == 1
+
+ pd.options.mode.copy_on_write = False
+ df = DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
+ subset = df[:]
+ subset.iloc[0, 0] = 0
+ # df updated with CoW disabled
+ assert df.iloc[0, 0] == 0
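test_switch_options asserts the option can be flipped mid-session as long as the data is created after the switch; `option_context` is the scoped variant of the same idea:

import pandas as pd

with pd.option_context("mode.copy_on_write", True):
    df = pd.DataFrame({"a": [1, 2, 3]})
    view = df[:]
    view.iloc[0, 0] = 0
    assert df.iloc[0, 0] == 1  # CoW active: parent unchanged
# outside the block the prior behaviour (default False in 1.5.x) applies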
diff --git pandas/tests/copy_view/test_methods.py pandas/tests/copy_view/test_methods.py
index df723808ce06b..0b366f378fa13 100644
--- pandas/tests/copy_view/test_methods.py
+++ pandas/tests/copy_view/test_methods.py
@@ -1,4 +1,5 @@
import numpy as np
+import pytest
from pandas import (
DataFrame,
@@ -156,7 +157,7 @@ def test_to_frame(using_copy_on_write):
ser = Series([1, 2, 3])
ser_orig = ser.copy()
- df = ser.to_frame()
+ df = ser[:].to_frame()
# currently this always returns a "view"
assert np.shares_memory(ser.values, get_array(df, 0))
@@ -169,5 +170,62 @@ def test_to_frame(using_copy_on_write):
tm.assert_series_equal(ser, ser_orig)
else:
# but currently select_dtypes() actually returns a view -> mutates parent
- ser_orig.iloc[0] = 0
- tm.assert_series_equal(ser, ser_orig)
+ expected = ser_orig.copy()
+ expected.iloc[0] = 0
+ tm.assert_series_equal(ser, expected)
+
+ # modify original series -> don't modify dataframe
+ df = ser[:].to_frame()
+ ser.iloc[0] = 0
+
+ if using_copy_on_write:
+ tm.assert_frame_equal(df, ser_orig.to_frame())
+ else:
+ expected = ser_orig.copy().to_frame()
+ expected.iloc[0, 0] = 0
+ tm.assert_frame_equal(df, expected)
+
+
+@pytest.mark.parametrize(
+ "method, idx",
+ [
+ (lambda df: df.copy(deep=False).copy(deep=False), 0),
+ (lambda df: df.reset_index().reset_index(), 2),
+ (lambda df: df.rename(columns=str.upper).rename(columns=str.lower), 0),
+ (lambda df: df.copy(deep=False).select_dtypes(include="number"), 0),
+ ],
+ ids=["shallow-copy", "reset_index", "rename", "select_dtypes"],
+)
+def test_chained_methods(request, method, idx, using_copy_on_write):
+ df = DataFrame({"a": [1, 2, 3], "b": [4, 5, 6], "c": [0.1, 0.2, 0.3]})
+ df_orig = df.copy()
+
+ # when not using CoW, only the copy() variant actually gives a view
+ df2_is_view = not using_copy_on_write and request.node.callspec.id == "shallow-copy"
+
+ # modify df2 -> don't modify df
+ df2 = method(df)
+ df2.iloc[0, idx] = 0
+ if not df2_is_view:
+ tm.assert_frame_equal(df, df_orig)
+
+ # modify df -> don't modify df2
+ df2 = method(df)
+ df.iloc[0, 0] = 0
+ if not df2_is_view:
+ tm.assert_frame_equal(df2.iloc[:, idx:], df_orig)
+
+
+def test_putmask(using_copy_on_write):
+ df = DataFrame({"a": [1, 2], "b": 1, "c": 2})
+ view = df[:]
+ df_orig = df.copy()
+ df[df == df] = 5
+
+ if using_copy_on_write:
+ assert not np.shares_memory(get_array(view, "a"), get_array(df, "a"))
+ tm.assert_frame_equal(view, df_orig)
+ else:
+ # Without CoW the original will be modified
+ assert np.shares_memory(get_array(view, "a"), get_array(df, "a"))
+ assert view.iloc[0, 0] == 5
diff --git pandas/tests/dtypes/test_inference.py pandas/tests/dtypes/test_inference.py
index f08d6b8c9feb8..948d14c1bd1f9 100644
--- pandas/tests/dtypes/test_inference.py
+++ pandas/tests/dtypes/test_inference.py
@@ -18,6 +18,10 @@
from numbers import Number
import re
import sys
+from typing import (
+ Generic,
+ TypeVar,
+)
import numpy as np
import pytest
@@ -228,6 +232,22 @@ def __getitem__(self, item):
assert not inference.is_list_like(NotListLike())
+def test_is_list_like_generic():
+ # GH 49649
+ # is_list_like was yielding false positives for Generic classes in python 3.11
+ T = TypeVar("T")
+
+ class MyDataFrame(DataFrame, Generic[T]):
+ ...
+
+ tstc = MyDataFrame[int]
+ tst = MyDataFrame[int]({"x": [1, 2, 3]})
+
+ assert not inference.is_list_like(tstc)
+ assert isinstance(tst, DataFrame)
+ assert inference.is_list_like(tst)
+
+
def test_is_sequence():
is_seq = inference.is_sequence
assert is_seq((1, 2))
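The `is_list_like` test above targets a Python 3.11 regression where subscripted Generic classes looked list-like. Roughly, the fixed behaviour is:

from typing import Generic, TypeVar

import pandas as pd
from pandas.api.types import is_list_like

T = TypeVar("T")

class MyFrame(pd.DataFrame, Generic[T]):
    ...

assert not is_list_like(MyFrame[int])          # the parametrized class
assert is_list_like(MyFrame[int]({"x": [1]}))  # a real instance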
diff --git pandas/tests/dtypes/test_missing.py pandas/tests/dtypes/test_missing.py
index 9c8aeb050ec27..9a242e9491537 100644
--- pandas/tests/dtypes/test_missing.py
+++ pandas/tests/dtypes/test_missing.py
@@ -9,6 +9,7 @@
from pandas._libs import missing as libmissing
from pandas._libs.tslibs import iNaT
+from pandas.compat import is_numpy_dev
from pandas.core.dtypes.common import (
is_float,
@@ -461,7 +462,7 @@ def test_array_equivalent_series(val):
cm = (
# stacklevel is chosen to make sense when called from .equals
tm.assert_produces_warning(FutureWarning, match=msg, check_stacklevel=False)
- if isinstance(val, str)
+ if isinstance(val, str) and not is_numpy_dev
else nullcontext()
)
with cm:
@@ -524,18 +525,120 @@ def test_array_equivalent_str(dtype):
)
-def test_array_equivalent_nested():
+@pytest.mark.parametrize(
+ "strict_nan", [pytest.param(True, marks=pytest.mark.xfail), False]
+)
+def test_array_equivalent_nested(strict_nan):
# reached in groupby aggregations, make sure we use np.any when checking
# if the comparison is truthy
- left = np.array([np.array([50, 70, 90]), np.array([20, 30, 40])], dtype=object)
- right = np.array([np.array([50, 70, 90]), np.array([20, 30, 40])], dtype=object)
+ left = np.array([np.array([50, 70, 90]), np.array([20, 30])], dtype=object)
+ right = np.array([np.array([50, 70, 90]), np.array([20, 30])], dtype=object)
- assert array_equivalent(left, right, strict_nan=True)
- assert not array_equivalent(left, right[::-1], strict_nan=True)
+ assert array_equivalent(left, right, strict_nan=strict_nan)
+ assert not array_equivalent(left, right[::-1], strict_nan=strict_nan)
- left = np.array([np.array([50, 50, 50]), np.array([40, 40, 40])], dtype=object)
+ left = np.empty(2, dtype=object)
+ left[:] = [np.array([50, 70, 90]), np.array([20, 30, 40])]
+ right = np.empty(2, dtype=object)
+ right[:] = [np.array([50, 70, 90]), np.array([20, 30, 40])]
+ assert array_equivalent(left, right, strict_nan=strict_nan)
+ assert not array_equivalent(left, right[::-1], strict_nan=strict_nan)
+
+ left = np.array([np.array([50, 50, 50]), np.array([40, 40])], dtype=object)
right = np.array([50, 40])
- assert not array_equivalent(left, right, strict_nan=True)
+ assert not array_equivalent(left, right, strict_nan=strict_nan)
+
+
+@pytest.mark.parametrize(
+ "strict_nan", [pytest.param(True, marks=pytest.mark.xfail), False]
+)
+def test_array_equivalent_nested2(strict_nan):
+ # more than one level of nesting
+ left = np.array(
+ [
+ np.array([np.array([50, 70]), np.array([90])], dtype=object),
+ np.array([np.array([20, 30])], dtype=object),
+ ],
+ dtype=object,
+ )
+ right = np.array(
+ [
+ np.array([np.array([50, 70]), np.array([90])], dtype=object),
+ np.array([np.array([20, 30])], dtype=object),
+ ],
+ dtype=object,
+ )
+ assert array_equivalent(left, right, strict_nan=strict_nan)
+ assert not array_equivalent(left, right[::-1], strict_nan=strict_nan)
+
+ left = np.array([np.array([np.array([50, 50, 50])], dtype=object)], dtype=object)
+ right = np.array([50])
+ assert not array_equivalent(left, right, strict_nan=strict_nan)
+
+
+@pytest.mark.parametrize(
+ "strict_nan", [pytest.param(True, marks=pytest.mark.xfail), False]
+)
+def test_array_equivalent_nested_list(strict_nan):
+ left = np.array([[50, 70, 90], [20, 30]], dtype=object)
+ right = np.array([[50, 70, 90], [20, 30]], dtype=object)
+
+ assert array_equivalent(left, right, strict_nan=strict_nan)
+ assert not array_equivalent(left, right[::-1], strict_nan=strict_nan)
+
+ left = np.array([[50, 50, 50], [40, 40]], dtype=object)
+ right = np.array([50, 40])
+ assert not array_equivalent(left, right, strict_nan=strict_nan)
+
+
+@pytest.mark.xfail(reason="failing")
+@pytest.mark.parametrize("strict_nan", [True, False])
+def test_array_equivalent_nested_mixed_list(strict_nan):
+ # mixed arrays / lists in left and right
+ # https://github.com/pandas-dev/pandas/issues/50360
+ left = np.array([np.array([1, 2, 3]), np.array([4, 5])], dtype=object)
+ right = np.array([[1, 2, 3], [4, 5]], dtype=object)
+
+ assert array_equivalent(left, right, strict_nan=strict_nan)
+ assert not array_equivalent(left, right[::-1], strict_nan=strict_nan)
+
+ # multiple levels of nesting
+ left = np.array(
+ [
+ np.array([np.array([1, 2, 3]), np.array([4, 5])], dtype=object),
+ np.array([np.array([6]), np.array([7, 8]), np.array([9])], dtype=object),
+ ],
+ dtype=object,
+ )
+ right = np.array([[[1, 2, 3], [4, 5]], [[6], [7, 8], [9]]], dtype=object)
+ assert array_equivalent(left, right, strict_nan=strict_nan)
+ assert not array_equivalent(left, right[::-1], strict_nan=strict_nan)
+
+ # same-length lists
+ subarr = np.empty(2, dtype=object)
+ subarr[:] = [
+ np.array([None, "b"], dtype=object),
+ np.array(["c", "d"], dtype=object),
+ ]
+ left = np.array([subarr, None], dtype=object)
+ right = np.array([list([[None, "b"], ["c", "d"]]), None], dtype=object)
+ assert array_equivalent(left, right, strict_nan=strict_nan)
+ assert not array_equivalent(left, right[::-1], strict_nan=strict_nan)
+
+
+@pytest.mark.xfail(reason="failing")
+@pytest.mark.parametrize("strict_nan", [True, False])
+def test_array_equivalent_nested_dicts(strict_nan):
+ left = np.array([{"f1": 1, "f2": np.array(["a", "b"], dtype=object)}], dtype=object)
+ right = np.array(
+ [{"f1": 1, "f2": np.array(["a", "b"], dtype=object)}], dtype=object
+ )
+ assert array_equivalent(left, right, strict_nan=strict_nan)
+ assert not array_equivalent(left, right[::-1], strict_nan=strict_nan)
+
+ right2 = np.array([{"f1": 1, "f2": ["a", "b"]}], dtype=object)
+ assert array_equivalent(left, right2, strict_nan=strict_nan)
+ assert not array_equivalent(left, right2[::-1], strict_nan=strict_nan)
@pytest.mark.parametrize(
diff --git pandas/tests/extension/base/getitem.py pandas/tests/extension/base/getitem.py
index e966d4602a02c..cf51d9d693155 100644
--- pandas/tests/extension/base/getitem.py
+++ pandas/tests/extension/base/getitem.py
@@ -313,8 +313,7 @@ def test_get(self, data):
expected = s.iloc[[2, 3]]
self.assert_series_equal(result, expected)
- with tm.assert_produces_warning(FutureWarning, match="label-based"):
- result = s.get(slice(2))
+ result = s.get(slice(2))
expected = s.iloc[[0, 1]]
self.assert_series_equal(result, expected)
diff --git pandas/tests/extension/base/setitem.py pandas/tests/extension/base/setitem.py
index 8dbf7d47374a6..83b1679b0da7e 100644
--- pandas/tests/extension/base/setitem.py
+++ pandas/tests/extension/base/setitem.py
@@ -400,7 +400,7 @@ def test_setitem_frame_2d_values(self, data):
warn = None
if has_can_hold_element and not isinstance(data.dtype, PandasDtype):
# PandasDtype excluded because it isn't *really* supported.
- warn = FutureWarning
+ warn = DeprecationWarning
with tm.assert_produces_warning(warn, match=msg):
df.iloc[:] = df
diff --git pandas/tests/frame/indexing/test_indexing.py pandas/tests/frame/indexing/test_indexing.py
index acd742c54b908..e2a99348f45aa 100644
--- pandas/tests/frame/indexing/test_indexing.py
+++ pandas/tests/frame/indexing/test_indexing.py
@@ -785,7 +785,7 @@ def test_getitem_setitem_float_labels(self, using_array_manager):
assert len(result) == 5
cp = df.copy()
- warn = FutureWarning if using_array_manager else None
+ warn = DeprecationWarning if using_array_manager else None
msg = "will attempt to set the values inplace"
with tm.assert_produces_warning(warn, match=msg):
cp.loc[1.0:5.0] = 0
diff --git pandas/tests/frame/indexing/test_setitem.py pandas/tests/frame/indexing/test_setitem.py
index cf0ff4e3603f3..e33c6d6a805cf 100644
--- pandas/tests/frame/indexing/test_setitem.py
+++ pandas/tests/frame/indexing/test_setitem.py
@@ -408,7 +408,7 @@ def test_setitem_frame_length_0_str_key(self, indexer):
def test_setitem_frame_duplicate_columns(self, using_array_manager):
# GH#15695
- warn = FutureWarning if using_array_manager else None
+ warn = DeprecationWarning if using_array_manager else None
msg = "will attempt to set the values inplace"
cols = ["A", "B", "C"] * 2
diff --git pandas/tests/frame/indexing/test_where.py pandas/tests/frame/indexing/test_where.py
index fba8978d2128c..c7e0a10c0d7d0 100644
--- pandas/tests/frame/indexing/test_where.py
+++ pandas/tests/frame/indexing/test_where.py
@@ -384,7 +384,7 @@ def test_where_datetime(self, using_array_manager):
expected = df.copy()
expected.loc[[0, 1], "A"] = np.nan
- warn = FutureWarning if using_array_manager else None
+ warn = DeprecationWarning if using_array_manager else None
msg = "will attempt to set the values inplace"
with tm.assert_produces_warning(warn, match=msg):
expected.loc[:, "C"] = np.nan
@@ -571,7 +571,7 @@ def test_where_axis_multiple_dtypes(self, using_array_manager):
d2 = df.copy().drop(1, axis=1)
expected = df.copy()
- warn = FutureWarning if using_array_manager else None
+ warn = DeprecationWarning if using_array_manager else None
msg = "will attempt to set the values inplace"
with tm.assert_produces_warning(warn, match=msg):
expected.loc[:, 1] = np.nan
diff --git pandas/tests/frame/methods/test_dropna.py pandas/tests/frame/methods/test_dropna.py
index 62351aa89c914..53d9f75494d92 100644
--- pandas/tests/frame/methods/test_dropna.py
+++ pandas/tests/frame/methods/test_dropna.py
@@ -221,7 +221,7 @@ def test_dropna_with_duplicate_columns(self):
df.iloc[0, 0] = np.nan
df.iloc[1, 1] = np.nan
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.iloc[:, 3] = np.nan
expected = df.dropna(subset=["A", "B", "C"], how="all")
expected.columns = ["A", "A", "B", "C"]
diff --git pandas/tests/frame/methods/test_quantile.py pandas/tests/frame/methods/test_quantile.py
index 14b416011b956..139360d332916 100644
--- pandas/tests/frame/methods/test_quantile.py
+++ pandas/tests/frame/methods/test_quantile.py
@@ -929,15 +929,8 @@ def test_quantile_ea_all_na(self, request, obj, index):
qs = [0.5, 0, 1]
result = self.compute_quantile(obj, qs)
- if np_version_under1p21 and index.dtype == "timedelta64[ns]":
- msg = "failed on Numpy 1.20.3; TypeError: data type 'Int64' not understood"
- mark = pytest.mark.xfail(reason=msg, raises=TypeError)
- request.node.add_marker(mark)
-
expected = index.take([-1, -1, -1], allow_fill=True, fill_value=index._na_value)
expected = Series(expected, index=qs, name="A")
- if expected.dtype == "Int64":
- expected = expected.astype("Float64")
expected = type(obj)(expected)
tm.assert_equal(result, expected)
diff --git pandas/tests/frame/methods/test_rename.py pandas/tests/frame/methods/test_rename.py
index f4443953a0d52..405518c372b2c 100644
--- pandas/tests/frame/methods/test_rename.py
+++ pandas/tests/frame/methods/test_rename.py
@@ -178,7 +178,9 @@ def test_rename_nocopy(self, float_frame, using_copy_on_write):
# TODO(CoW) this also shouldn't warn in case of CoW, but the heuristic
# checking if the array shares memory doesn't work if CoW happened
- with tm.assert_produces_warning(FutureWarning if using_copy_on_write else None):
+ with tm.assert_produces_warning(
+ DeprecationWarning if using_copy_on_write else None
+ ):
# This loc setitem already happens inplace, so no warning
# that this will change in the future
renamed.loc[:, "foo"] = 1.0
diff --git pandas/tests/frame/methods/test_replace.py pandas/tests/frame/methods/test_replace.py
index 177f3ec1b4504..f4de685688b00 100644
--- pandas/tests/frame/methods/test_replace.py
+++ pandas/tests/frame/methods/test_replace.py
@@ -1496,6 +1496,18 @@ def test_replace_list_with_mixed_type(
result = obj.replace(box(to_replace), value)
tm.assert_equal(result, expected)
+ @pytest.mark.parametrize("val", [2, np.nan, 2.0])
+ def test_replace_value_none_dtype_numeric(self, val):
+ # GH#48231
+ df = DataFrame({"a": [1, val]})
+ result = df.replace(val, None)
+ expected = DataFrame({"a": [1, None]}, dtype=object)
+ tm.assert_frame_equal(result, expected)
+
+ df = DataFrame({"a": [1, val]})
+ result = df.replace({val: None})
+ tm.assert_frame_equal(result, expected)
+
class TestDataFrameReplaceRegex:
@pytest.mark.parametrize(
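The new replace tests encode GH#48231: replacing a numeric value with `None` now upcasts to object instead of reintroducing NaN. Sketch:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, np.nan]})
result = df.replace(np.nan, None)
assert result["a"].dtype == object   # holds a genuine None
assert result["a"].iloc[1] is None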
diff --git pandas/tests/frame/methods/test_shift.py pandas/tests/frame/methods/test_shift.py
index bfc3c8e0a25eb..9b4dcf58590e3 100644
--- pandas/tests/frame/methods/test_shift.py
+++ pandas/tests/frame/methods/test_shift.py
@@ -372,7 +372,7 @@ def test_shift_duplicate_columns(self, using_array_manager):
warn = None
if using_array_manager:
- warn = FutureWarning
+ warn = DeprecationWarning
shifted = []
for columns in column_lists:
diff --git pandas/tests/frame/methods/test_update.py pandas/tests/frame/methods/test_update.py
index a35530100a425..2903436337820 100644
--- pandas/tests/frame/methods/test_update.py
+++ pandas/tests/frame/methods/test_update.py
@@ -136,7 +136,8 @@ def test_update_from_non_df(self):
def test_update_datetime_tz(self):
# GH 25807
result = DataFrame([pd.Timestamp("2019", tz="UTC")])
- result.update(result)
+ with tm.assert_produces_warning(None):
+ result.update(result)
expected = DataFrame([pd.Timestamp("2019", tz="UTC")])
tm.assert_frame_equal(result, expected)
diff --git pandas/tests/frame/test_arithmetic.py pandas/tests/frame/test_arithmetic.py
index 25257a2c102fd..93c4b44da985a 100644
--- pandas/tests/frame/test_arithmetic.py
+++ pandas/tests/frame/test_arithmetic.py
@@ -1135,6 +1135,26 @@ def test_binop_other(self, op, value, dtype, switch_numexpr_min_elements):
expected = op(df, value).dtypes
tm.assert_series_equal(result, expected)
+ def test_arithmetic_midx_cols_different_dtypes(self):
+ # GH#49769
+ midx = MultiIndex.from_arrays([Series([1, 2]), Series([3, 4])])
+ midx2 = MultiIndex.from_arrays([Series([1, 2], dtype="Int8"), Series([3, 4])])
+ left = DataFrame([[1, 2], [3, 4]], columns=midx)
+ right = DataFrame([[1, 2], [3, 4]], columns=midx2)
+ result = left - right
+ expected = DataFrame([[0, 0], [0, 0]], columns=midx)
+ tm.assert_frame_equal(result, expected)
+
+ def test_arithmetic_midx_cols_different_dtypes_different_order(self):
+ # GH#49769
+ midx = MultiIndex.from_arrays([Series([1, 2]), Series([3, 4])])
+ midx2 = MultiIndex.from_arrays([Series([2, 1], dtype="Int8"), Series([4, 3])])
+ left = DataFrame([[1, 2], [3, 4]], columns=midx)
+ right = DataFrame([[1, 2], [3, 4]], columns=midx2)
+ result = left - right
+ expected = DataFrame([[-1, 1], [-1, 1]], columns=midx)
+ tm.assert_frame_equal(result, expected)
+
def test_frame_with_zero_len_series_corner_cases():
# GH#28600
diff --git pandas/tests/frame/test_constructors.py pandas/tests/frame/test_constructors.py
index b4f027f3a832a..16021facb3986 100644
--- pandas/tests/frame/test_constructors.py
+++ pandas/tests/frame/test_constructors.py
@@ -2604,7 +2604,9 @@ def check_views(c_only: bool = False):
# FIXME(GH#35417): until GH#35417, iloc.setitem into EA values does not preserve
# view, so we have to check in the other direction
- with tm.assert_produces_warning(FutureWarning, match="will attempt to set"):
+ with tm.assert_produces_warning(
+ DeprecationWarning, match="will attempt to set"
+ ):
df.iloc[:, 2] = pd.array([45, 46], dtype=c.dtype)
assert df.dtypes.iloc[2] == c.dtype
if not copy and not using_copy_on_write:
diff --git pandas/tests/frame/test_nonunique_indexes.py pandas/tests/frame/test_nonunique_indexes.py
index 2c28800fb181f..38861a2b04409 100644
--- pandas/tests/frame/test_nonunique_indexes.py
+++ pandas/tests/frame/test_nonunique_indexes.py
@@ -323,7 +323,9 @@ def test_dup_columns_across_dtype(self):
def test_set_value_by_index(self, using_array_manager):
# See gh-12344
warn = (
- FutureWarning if using_array_manager and not is_platform_windows() else None
+ DeprecationWarning
+ if using_array_manager and not is_platform_windows()
+ else None
)
msg = "will attempt to set the values inplace"
diff --git pandas/tests/frame/test_stack_unstack.py pandas/tests/frame/test_stack_unstack.py
index 69e5d5e3d5447..e22559802cbec 100644
--- pandas/tests/frame/test_stack_unstack.py
+++ pandas/tests/frame/test_stack_unstack.py
@@ -23,7 +23,7 @@
class TestDataFrameReshape:
def test_stack_unstack(self, float_frame, using_array_manager):
- warn = FutureWarning if using_array_manager else None
+ warn = DeprecationWarning if using_array_manager else None
msg = "will attempt to set the values inplace"
df = float_frame.copy()
diff --git pandas/tests/frame/test_unary.py pandas/tests/frame/test_unary.py
index a69ca0fef7f8b..9caadd0998a26 100644
--- pandas/tests/frame/test_unary.py
+++ pandas/tests/frame/test_unary.py
@@ -3,6 +3,8 @@
import numpy as np
import pytest
+from pandas.compat import is_numpy_dev
+
import pandas as pd
import pandas._testing as tm
@@ -98,11 +100,6 @@ def test_pos_numeric(self, df):
@pytest.mark.parametrize(
"df",
[
- # numpy changing behavior in the future
- pytest.param(
- pd.DataFrame({"a": ["a", "b"]}),
- marks=[pytest.mark.filterwarnings("ignore")],
- ),
pd.DataFrame({"a": np.array([-1, 2], dtype=object)}),
pd.DataFrame({"a": [Decimal("-1.0"), Decimal("2.0")]}),
],
@@ -112,6 +109,25 @@ def test_pos_object(self, df):
tm.assert_frame_equal(+df, df)
tm.assert_series_equal(+df["a"], df["a"])
+ @pytest.mark.parametrize(
+ "df",
+ [
+ pytest.param(
+ pd.DataFrame({"a": ["a", "b"]}),
+ marks=[pytest.mark.filterwarnings("ignore")],
+ ),
+ ],
+ )
+ def test_pos_object_raises(self, df):
+ # GH#21380
+ if is_numpy_dev:
+ with pytest.raises(
+ TypeError, match=r"^bad operand type for unary \+: \'str\'$"
+ ):
+ tm.assert_frame_equal(+df, df)
+ else:
+ tm.assert_series_equal(+df["a"], df["a"])
+
@pytest.mark.parametrize(
"df", [pd.DataFrame({"a": pd.to_datetime(["2017-01-22", "1970-01-01"])})]
)
diff --git pandas/tests/groupby/aggregate/test_aggregate.py pandas/tests/groupby/aggregate/test_aggregate.py
index bda4d0da9f6ce..4e8cc2cb3869d 100644
--- pandas/tests/groupby/aggregate/test_aggregate.py
+++ pandas/tests/groupby/aggregate/test_aggregate.py
@@ -1454,3 +1454,15 @@ def test_agg_of_mode_list(test, constant):
expected = expected.set_index(0)
tm.assert_frame_equal(result, expected)
+
+
+def test_numeric_only_warning_numpy():
+ # GH#50538
+ df = DataFrame({"a": [1, 1, 2], "b": list("xyz"), "c": [3, 4, 5]})
+ gb = df.groupby("a")
+ msg = "The operation <function mean.*failed"
+ with tm.assert_produces_warning(FutureWarning, match=msg):
+ gb.agg(np.mean)
+ # Ensure users can't pass numeric_only
+ with pytest.raises(TypeError, match="got an unexpected keyword argument"):
+ gb.agg(np.mean, numeric_only=True)
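The new agg test pins GH#50538: passing a raw NumPy reduction now warns when non-numeric columns fail, and `numeric_only` cannot be forwarded to it. A sketch of both assertions:

import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 1, 2], "b": list("xyz"), "c": [3, 4, 5]})
gb = df.groupby("a")
res = gb.agg(np.mean)  # FutureWarning: the operation on "b" failed
# gb.agg(np.mean, numeric_only=True) raises TypeError because the
# keyword is handed to np.mean itself, which does not accept it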
diff --git pandas/tests/groupby/test_apply.py pandas/tests/groupby/test_apply.py
index 47ea6a99ffea9..b6c16c0dee827 100644
--- pandas/tests/groupby/test_apply.py
+++ pandas/tests/groupby/test_apply.py
@@ -335,6 +335,7 @@ def f(piece):
result = grouped.apply(f)
assert isinstance(result, DataFrame)
+ assert not hasattr(result, "name") # GH49907
tm.assert_index_equal(result.index, ts.index)
@@ -1356,3 +1357,16 @@ def test_empty_df(method, op):
)
tm.assert_series_equal(result, expected)
+
+
+def test_numeric_only_warning_numpy():
+ # GH#50538
+ df = DataFrame({"a": [1, 1, 2], "b": list("xyz"), "c": [3, 4, 5]})
+ gb = df.groupby("a")
+ msg = "The operation <function mean.*failed"
+ # Warning is raised from within NumPy
+ with tm.assert_produces_warning(FutureWarning, match=msg, check_stacklevel=False):
+ gb.apply(np.mean)
+ # Ensure users can't pass numeric_only
+ with pytest.raises(TypeError, match="got an unexpected keyword argument"):
+ gb.apply(np.mean, numeric_only=True)
diff --git pandas/tests/groupby/test_groupby.py pandas/tests/groupby/test_groupby.py
index ba39f76203623..f6bbb9eda133d 100644
--- pandas/tests/groupby/test_groupby.py
+++ pandas/tests/groupby/test_groupby.py
@@ -486,9 +486,14 @@ def test_frame_set_name_single(df):
result = df.groupby("A", as_index=False).mean()
assert result.index.name != "A"
+ # GH#50538
+ msg = "The operation <function mean.*failed"
with tm.assert_produces_warning(FutureWarning, match=msg):
result = grouped.agg(np.mean)
assert result.index.name == "A"
+ # Ensure users can't pass numeric_only
+ with pytest.raises(TypeError, match="got an unexpected keyword argument"):
+ grouped.agg(np.mean, numeric_only=True)
result = grouped.agg({"C": np.mean, "D": np.std})
assert result.index.name == "A"
@@ -766,12 +771,16 @@ def test_as_index_series_return_frame(df):
grouped = df.groupby("A", as_index=False)
grouped2 = df.groupby(["A", "B"], as_index=False)
- msg = "The default value of numeric_only"
+ # GH#50538
+ msg = "The operation <function sum.*failed"
with tm.assert_produces_warning(FutureWarning, match=msg):
result = grouped["C"].agg(np.sum)
expected = grouped.agg(np.sum).loc[:, ["A", "C"]]
assert isinstance(result, DataFrame)
tm.assert_frame_equal(result, expected)
+ # Ensure users can't pass numeric_only
+ with pytest.raises(TypeError, match="got an unexpected keyword argument"):
+ grouped.agg(np.mean, numeric_only=True)
result2 = grouped2["C"].agg(np.sum)
expected2 = grouped2.agg(np.sum).loc[:, ["A", "B", "C"]]
@@ -779,6 +788,7 @@ def test_as_index_series_return_frame(df):
tm.assert_frame_equal(result2, expected2)
result = grouped["C"].sum()
+ msg = "The default value of numeric_only"
with tm.assert_produces_warning(FutureWarning, match=msg):
expected = grouped.sum().loc[:, ["A", "C"]]
assert isinstance(result, DataFrame)
@@ -1021,10 +1031,14 @@ def test_wrap_aggregated_output_multindex(mframe):
df["baz", "two"] = "peekaboo"
keys = [np.array([0, 0, 1]), np.array([0, 0, 1])]
- msg = "The default value of numeric_only"
+ # GH#50538
+ msg = "The operation <function mean.*failed"
with tm.assert_produces_warning(FutureWarning, match=msg):
agged = df.groupby(keys).agg(np.mean)
assert isinstance(agged.columns, MultiIndex)
+ # Ensure users can't pass numeric_only
+ with pytest.raises(TypeError, match="got an unexpected keyword argument"):
+ df.groupby(keys).agg(np.mean, numeric_only=True)
def aggfun(ser):
if ser.name == ("foo", "one"):
diff --git pandas/tests/groupby/transform/test_transform.py pandas/tests/groupby/transform/test_transform.py
index 8a2bd64a3deb0..5b5b28be4a501 100644
--- pandas/tests/groupby/transform/test_transform.py
+++ pandas/tests/groupby/transform/test_transform.py
@@ -1549,3 +1549,32 @@ def test_transform_aligns_depr(func, series, expected_values, keys, keys_in_inde
if series:
expected = expected["b"]
tm.assert_equal(result, expected)
+
+
+@pytest.mark.parametrize("keys", ["A", ["A", "B"]])
+def test_as_index_no_change(keys, df, groupby_func):
+ # GH#49834 - as_index should have no impact on DataFrameGroupBy.transform
+ if keys == "A":
+ # Column B is string dtype; will fail on some ops
+ df = df.drop(columns="B")
+ args = get_groupby_method_args(groupby_func, df)
+ gb_as_index_true = df.groupby(keys, as_index=True)
+ gb_as_index_false = df.groupby(keys, as_index=False)
+ result = gb_as_index_true.transform(groupby_func, *args)
+ expected = gb_as_index_false.transform(groupby_func, *args)
+ tm.assert_equal(result, expected)
+
+
+@pytest.mark.parametrize("func", [np.mean, np.cumprod])
+def test_numeric_only_warning_numpy(func):
+ # GH#50538
+ df = DataFrame({"a": [1, 1, 2], "b": list("xyz"), "c": [3, 4, 5]})
+ gb = df.groupby("a")
+ msg = "The default value of numeric_only"
+ with tm.assert_produces_warning(FutureWarning, match=msg):
+ gb.transform(func)
+ # Ensure users can pass numeric_only
+ result = gb.transform(func, numeric_only=True)
+ values = [3.5, 3.5, 5.0] if func == np.mean else [3, 12, 5]
+ expected = DataFrame({"c": values})
+ tm.assert_frame_equal(result, expected)
diff --git pandas/tests/indexes/datetimes/test_date_range.py pandas/tests/indexes/datetimes/test_date_range.py
index 377974a918ad9..07f57d3f9c3f4 100644
--- pandas/tests/indexes/datetimes/test_date_range.py
+++ pandas/tests/indexes/datetimes/test_date_range.py
@@ -1126,6 +1126,24 @@ def test_range_with_millisecond_resolution(self, start_end):
expected = DatetimeIndex([start])
tm.assert_index_equal(result, expected)
+ @pytest.mark.parametrize(
+ "start,period,expected",
+ [
+ ("2022-07-23 00:00:00+02:00", 1, ["2022-07-25 00:00:00+02:00"]),
+ ("2022-07-22 00:00:00+02:00", 1, ["2022-07-22 00:00:00+02:00"]),
+ (
+ "2022-07-22 00:00:00+02:00",
+ 2,
+ ["2022-07-22 00:00:00+02:00", "2022-07-25 00:00:00+02:00"],
+ ),
+ ],
+ )
+ def test_range_with_timezone_and_custombusinessday(self, start, period, expected):
+ # GH49441
+ result = date_range(start=start, periods=period, freq="C")
+ expected = DatetimeIndex(expected)
+ tm.assert_index_equal(result, expected)
+
def test_date_range_with_custom_holidays():
# GH 30593
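The new date_range cases cover GH49441, tz-aware starts combined with the CustomBusinessDay alias "C"; a weekend start rolls forward:

import pandas as pd

idx = pd.date_range(start="2022-07-23 00:00:00+02:00", periods=1, freq="C")
# 2022-07-23 is a Saturday, so the first business day is Monday the 25th
assert str(idx[0]) == "2022-07-25 00:00:00+02:00"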
diff --git pandas/tests/indexes/multi/test_join.py pandas/tests/indexes/multi/test_join.py
index e6bec97aedb38..7000724a6b271 100644
--- pandas/tests/indexes/multi/test_join.py
+++ pandas/tests/indexes/multi/test_join.py
@@ -5,6 +5,8 @@
Index,
Interval,
MultiIndex,
+ Series,
+ StringDtype,
)
import pandas._testing as tm
@@ -158,3 +160,49 @@ def test_join_overlapping_interval_level():
result = idx_1.join(idx_2, how="outer")
tm.assert_index_equal(result, expected)
+
+
+def test_join_midx_ea():
+ # GH#49277
+ midx = MultiIndex.from_arrays(
+ [Series([1, 1, 3], dtype="Int64"), Series([1, 2, 3], dtype="Int64")],
+ names=["a", "b"],
+ )
+ midx2 = MultiIndex.from_arrays(
+ [Series([1], dtype="Int64"), Series([3], dtype="Int64")], names=["a", "c"]
+ )
+ result = midx.join(midx2, how="inner")
+ expected = MultiIndex.from_arrays(
+ [
+ Series([1, 1], dtype="Int64"),
+ Series([1, 2], dtype="Int64"),
+ Series([3, 3], dtype="Int64"),
+ ],
+ names=["a", "b", "c"],
+ )
+ tm.assert_index_equal(result, expected)
+
+
+def test_join_midx_string():
+ # GH#49277
+ midx = MultiIndex.from_arrays(
+ [
+ Series(["a", "a", "c"], dtype=StringDtype()),
+ Series(["a", "b", "c"], dtype=StringDtype()),
+ ],
+ names=["a", "b"],
+ )
+ midx2 = MultiIndex.from_arrays(
+ [Series(["a"], dtype=StringDtype()), Series(["c"], dtype=StringDtype())],
+ names=["a", "c"],
+ )
+ result = midx.join(midx2, how="inner")
+ expected = MultiIndex.from_arrays(
+ [
+ Series(["a", "a"], dtype=StringDtype()),
+ Series(["a", "b"], dtype=StringDtype()),
+ Series(["c", "c"], dtype=StringDtype()),
+ ],
+ names=["a", "b", "c"],
+ )
+ tm.assert_index_equal(result, expected)
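The MultiIndex join tests assert GH#49277: extension dtypes (Int64, string) survive an inner join instead of degrading to NumPy dtypes. Condensed from the test above:

import pandas as pd

midx = pd.MultiIndex.from_arrays(
    [pd.Series([1, 1, 3], dtype="Int64"), pd.Series([1, 2, 3], dtype="Int64")],
    names=["a", "b"],
)
midx2 = pd.MultiIndex.from_arrays(
    [pd.Series([1], dtype="Int64"), pd.Series([3], dtype="Int64")],
    names=["a", "c"],
)
joined = midx.join(midx2, how="inner")  # all three levels stay Int64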
diff --git pandas/tests/indexes/object/test_indexing.py pandas/tests/indexes/object/test_indexing.py
index 38bd96921b991..924a33169c132 100644
--- pandas/tests/indexes/object/test_indexing.py
+++ pandas/tests/indexes/object/test_indexing.py
@@ -128,6 +128,13 @@ def test_get_indexer_non_unique_np_nats(self, np_nat_fixture, np_nat_fixture2):
tm.assert_numpy_array_equal(missing, expected_missing)
# dt64nat vs td64nat
else:
+ try:
+ np_nat_fixture == np_nat_fixture2
+ except (TypeError, OverflowError):
+ # Numpy will raise on uncomparable types, like
+ # np.datetime64('NaT', 'Y') and np.datetime64('NaT', 'ps')
+ # https://github.com/numpy/numpy/issues/22762
+ return
index = Index(
np.array(
[
diff --git pandas/tests/indexing/multiindex/test_loc.py pandas/tests/indexing/multiindex/test_loc.py
index d4354766a203b..5cf044280b391 100644
--- pandas/tests/indexing/multiindex/test_loc.py
+++ pandas/tests/indexing/multiindex/test_loc.py
@@ -541,9 +541,9 @@ def test_loc_setitem_single_column_slice():
)
expected = df.copy()
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[:, "B"] = np.arange(4)
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
expected.iloc[:, 2] = np.arange(4)
tm.assert_frame_equal(df, expected)
diff --git pandas/tests/indexing/test_at.py pandas/tests/indexing/test_at.py
index 1e502ca70189a..adbc0e2f8127a 100644
--- pandas/tests/indexing/test_at.py
+++ pandas/tests/indexing/test_at.py
@@ -197,8 +197,12 @@ def test_at_frame_raises_key_error2(self, indexer_al):
def test_at_frame_multiple_columns(self):
# GH#48296 - at shouldn't modify multiple columns
df = DataFrame({"a": [1, 2], "b": [3, 4]})
- with pytest.raises(InvalidIndexError, match=r"slice\(None, None, None\)"):
- df.at[5] = [6, 7]
+ new_row = [6, 7]
+ with pytest.raises(
+ InvalidIndexError,
+ match=f"You can only assign a scalar value not a \\{type(new_row)}",
+ ):
+ df.at[5] = new_row
def test_at_getitem_mixed_index_no_fallback(self):
# GH#19860
@@ -220,3 +224,13 @@ def test_at_categorical_integers(self):
for key in [0, 1]:
with pytest.raises(KeyError, match=str(key)):
df.at[key, key]
+
+ def test_at_applied_for_rows(self):
+ # GH#48729 .at should raise InvalidIndexError when assigning rows
+ df = DataFrame(index=["a"], columns=["col1", "col2"])
+ new_row = [123, 15]
+ with pytest.raises(
+ InvalidIndexError,
+ match=f"You can only assign a scalar value not a \\{type(new_row)}",
+ ):
+ df.at["a"] = new_row
diff --git pandas/tests/indexing/test_floats.py pandas/tests/indexing/test_floats.py
index 186cba62c138f..afc2def7ba8a1 100644
--- pandas/tests/indexing/test_floats.py
+++ pandas/tests/indexing/test_floats.py
@@ -340,8 +340,7 @@ def test_integer_positional_indexing(self, idx):
"""
s = Series(range(2, 6), index=range(2, 6))
- with tm.assert_produces_warning(FutureWarning, match="label-based"):
- result = s[2:4]
+ result = s[2:4]
expected = s.iloc[2:4]
tm.assert_series_equal(result, expected)
diff --git pandas/tests/indexing/test_iloc.py pandas/tests/indexing/test_iloc.py
index 8cc6b6e73aaea..dcc95d9e41a5a 100644
--- pandas/tests/indexing/test_iloc.py
+++ pandas/tests/indexing/test_iloc.py
@@ -84,7 +84,7 @@ def test_iloc_setitem_fullcol_categorical(self, indexer, key, using_array_manage
overwrite = isinstance(key, slice) and key == slice(None)
warn = None
if overwrite:
- warn = FutureWarning
+ warn = DeprecationWarning
msg = "will attempt to set the values inplace instead"
with tm.assert_produces_warning(warn, match=msg):
indexer(df)[key, 0] = cat
@@ -108,7 +108,7 @@ def test_iloc_setitem_fullcol_categorical(self, indexer, key, using_array_manage
frame = DataFrame({0: np.array([0, 1, 2], dtype=object), 1: range(3)})
df = frame.copy()
orig_vals = df.values
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
indexer(df)[key, 0] = cat
expected = DataFrame({0: cat, 1: range(3)})
tm.assert_frame_equal(df, expected)
@@ -904,7 +904,7 @@ def test_iloc_setitem_categorical_updates_inplace(self, using_copy_on_write):
# This should modify our original values in-place
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.iloc[:, 0] = cat[::-1]
if not using_copy_on_write:
@@ -1314,7 +1314,7 @@ def test_iloc_setitem_dtypes_duplicate_columns(
# GH#22035
df = DataFrame([[init_value, "str", "str2"]], columns=["a", "b", "b"])
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.iloc[:, 0] = df.iloc[:, 0].astype(dtypes)
expected_df = DataFrame(
diff --git pandas/tests/indexing/test_indexing.py pandas/tests/indexing/test_indexing.py
index 069e5a62895af..210c75b075011 100644
--- pandas/tests/indexing/test_indexing.py
+++ pandas/tests/indexing/test_indexing.py
@@ -550,7 +550,7 @@ def test_astype_assignment(self):
df = df_orig.copy()
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.iloc[:, 0:2] = df.iloc[:, 0:2].astype(np.int64)
expected = DataFrame(
[[1, 2, "3", ".4", 5, 6.0, "foo"]], columns=list("ABCDEFG")
@@ -558,7 +558,7 @@ def test_astype_assignment(self):
tm.assert_frame_equal(df, expected)
df = df_orig.copy()
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.iloc[:, 0:2] = df.iloc[:, 0:2]._convert(datetime=True, numeric=True)
expected = DataFrame(
[[1, 2, "3", ".4", 5, 6.0, "foo"]], columns=list("ABCDEFG")
@@ -567,7 +567,7 @@ def test_astype_assignment(self):
# GH5702 (loc)
df = df_orig.copy()
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[:, "A"] = df.loc[:, "A"].astype(np.int64)
expected = DataFrame(
[[1, "2", "3", ".4", 5, 6.0, "foo"]], columns=list("ABCDEFG")
@@ -575,7 +575,7 @@ def test_astype_assignment(self):
tm.assert_frame_equal(df, expected)
df = df_orig.copy()
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[:, ["B", "C"]] = df.loc[:, ["B", "C"]].astype(np.int64)
expected = DataFrame(
[["1", 2, 3, ".4", 5, 6.0, "foo"]], columns=list("ABCDEFG")
@@ -586,13 +586,13 @@ def test_astype_assignment_full_replacements(self):
# full replacements / no nans
df = DataFrame({"A": [1.0, 2.0, 3.0, 4.0]})
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.iloc[:, 0] = df["A"].astype(np.int64)
expected = DataFrame({"A": [1, 2, 3, 4]})
tm.assert_frame_equal(df, expected)
df = DataFrame({"A": [1.0, 2.0, 3.0, 4.0]})
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[:, "A"] = df["A"].astype(np.int64)
expected = DataFrame({"A": [1, 2, 3, 4]})
tm.assert_frame_equal(df, expected)
diff --git pandas/tests/indexing/test_loc.py pandas/tests/indexing/test_loc.py
index e62fb98b0782d..235ad3d213a62 100644
--- pandas/tests/indexing/test_loc.py
+++ pandas/tests/indexing/test_loc.py
@@ -368,7 +368,7 @@ def test_loc_setitem_dtype(self):
df = DataFrame({"id": ["A"], "a": [1.2], "b": [0.0], "c": [-2.5]})
cols = ["a", "b", "c"]
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[:, cols] = df.loc[:, cols].astype("float32")
expected = DataFrame(
@@ -633,11 +633,11 @@ def test_loc_setitem_consistency_slice_column_len(self):
df = DataFrame(values, index=mi, columns=cols)
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[:, ("Respondent", "StartDate")] = to_datetime(
df.loc[:, ("Respondent", "StartDate")]
)
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[:, ("Respondent", "EndDate")] = to_datetime(
df.loc[:, ("Respondent", "EndDate")]
)
@@ -720,7 +720,7 @@ def test_loc_setitem_frame_with_reindex_mixed(self):
df = DataFrame(index=[3, 5, 4], columns=["A", "B"], dtype=float)
df["B"] = "string"
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[[4, 3, 5], "A"] = np.array([1, 2, 3], dtype="int64")
ser = Series([2, 3, 1], index=[3, 5, 4], dtype="int64")
expected = DataFrame({"A": ser})
@@ -732,7 +732,7 @@ def test_loc_setitem_frame_with_inverted_slice(self):
df = DataFrame(index=[1, 2, 3], columns=["A", "B"], dtype=float)
df["B"] = "string"
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[slice(3, 0, -1), "A"] = np.array([1, 2, 3], dtype="int64")
expected = DataFrame({"A": [3, 2, 1], "B": "string"}, index=[1, 2, 3])
tm.assert_frame_equal(df, expected)
@@ -909,7 +909,7 @@ def test_loc_setitem_missing_columns(self, index, box, expected):
warn = None
if isinstance(index[0], slice) and index[0] == slice(None):
- warn = FutureWarning
+ warn = DeprecationWarning
msg = "will attempt to set the values inplace instead"
with tm.assert_produces_warning(warn, match=msg):
@@ -1425,7 +1425,7 @@ def test_loc_setitem_single_row_categorical(self):
categories = Categorical(df["Alpha"], categories=["a", "b", "c"])
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[:, "Alpha"] = categories
result = df["Alpha"]
@@ -3211,3 +3211,11 @@ def test_getitem_loc_str_periodindex(self):
index = pd.period_range(start="2000", periods=20, freq="B")
series = Series(range(20), index=index)
assert series.loc["2000-01-14"] == 9
+
+ def test_deprecation_warnings_raised_loc(self):
+ # GH#48673
+ with tm.assert_produces_warning(DeprecationWarning):
+ values = np.arange(4).reshape(2, 2)
+ df = DataFrame(values, columns=["a", "b"])
+ new = np.array([10, 11]).astype(np.int16)
+ df.loc[:, "a"] = new
diff --git pandas/tests/indexing/test_partial.py pandas/tests/indexing/test_partial.py
index 938056902e745..f973bdf7ea6f6 100644
--- pandas/tests/indexing/test_partial.py
+++ pandas/tests/indexing/test_partial.py
@@ -312,7 +312,7 @@ def test_partial_setting_frame(self, using_array_manager):
df = df_orig.copy()
df["B"] = df["B"].astype(np.float64)
msg = "will attempt to set the values inplace instead"
- with tm.assert_produces_warning(FutureWarning, match=msg):
+ with tm.assert_produces_warning(DeprecationWarning, match=msg):
df.loc[:, "B"] = df.loc[:, "A"]
tm.assert_frame_equal(df, expected)
diff --git pandas/tests/io/excel/test_style.py pandas/tests/io/excel/test_style.py
index 00f6ccb96a905..f26df440d263b 100644
--- pandas/tests/io/excel/test_style.py
+++ pandas/tests/io/excel/test_style.py
@@ -115,6 +115,12 @@ def test_styler_to_excel_unstyled(engine):
["border", "left", "color", "rgb"],
{"xlsxwriter": "FF111222", "openpyxl": "00111222"},
),
+ # Border styles
+ (
+ "border-left-style: hair; border-left-color: black",
+ ["border", "left", "style"],
+ "hair",
+ ),
]
@@ -196,6 +202,62 @@ def test_styler_to_excel_basic_indexes(engine, css, attrs, expected):
assert sc_cell == expected
+# From https://openpyxl.readthedocs.io/en/stable/api/openpyxl.styles.borders.html
+# Note: Leaving behavior of "width"-type styles undefined; user should use border-width
+# instead
+excel_border_styles = [
+ # "thin",
+ "dashed",
+ "mediumDashDot",
+ "dashDotDot",
+ "hair",
+ "dotted",
+ "mediumDashDotDot",
+ # "medium",
+ "double",
+ "dashDot",
+ "slantDashDot",
+ # "thick",
+ "mediumDashed",
+]
+
+
+@pytest.mark.parametrize(
+ "engine",
+ ["xlsxwriter", "openpyxl"],
+)
+@pytest.mark.parametrize("border_style", excel_border_styles)
+def test_styler_to_excel_border_style(engine, border_style):
+ css = f"border-left: {border_style} black thin"
+ attrs = ["border", "left", "style"]
+ expected = border_style
+
+ pytest.importorskip(engine)
+ df = DataFrame(np.random.randn(1, 1))
+ styler = df.style.applymap(lambda x: css)
+
+ with tm.ensure_clean(".xlsx") as path:
+ with ExcelWriter(path, engine=engine) as writer:
+ df.to_excel(writer, sheet_name="dataframe")
+ styler.to_excel(writer, sheet_name="styled")
+
+ openpyxl = pytest.importorskip("openpyxl") # test loading only with openpyxl
+ with contextlib.closing(openpyxl.load_workbook(path)) as wb:
+
+ # test unstyled data cell does not have expected styles
+ # test styled cell has expected styles
+ u_cell, s_cell = wb["dataframe"].cell(2, 2), wb["styled"].cell(2, 2)
+ for attr in attrs:
+ u_cell, s_cell = getattr(u_cell, attr, None), getattr(s_cell, attr)
+
+ if isinstance(expected, dict):
+ assert u_cell is None or u_cell != expected[engine]
+ assert s_cell == expected[engine]
+ else:
+ assert u_cell is None or u_cell != expected
+ assert s_cell == expected
+
+
def test_styler_custom_converter():
openpyxl = pytest.importorskip("openpyxl")
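The new border-style parametrization maps CSS `border-*-style` values onto openpyxl's named styles; a minimal export sketch (writes a local file, needs openpyxl or xlsxwriter installed):

import pandas as pd

df = pd.DataFrame([[1.0]])
styler = df.style.applymap(lambda v: "border-left: hair black")
# "hair" is one of the openpyxl border styles enumerated above;
# "thin"/"medium"/"thick" are left to border-width handling instead
styler.to_excel("styled.xlsx", sheet_name="styled")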
diff --git pandas/tests/io/excel/test_writers.py pandas/tests/io/excel/test_writers.py
index d4b74ddbd66e0..f6a77d3acb933 100644
--- pandas/tests/io/excel/test_writers.py
+++ pandas/tests/io/excel/test_writers.py
@@ -496,15 +496,14 @@ def test_float_types(self, np_type, path):
tm.assert_frame_equal(df, recons)
- @pytest.mark.parametrize("np_type", [np.bool8, np.bool_])
- def test_bool_types(self, np_type, path):
- # Test np.bool8 and np.bool_ values read come back as float.
- df = DataFrame([1, 0, True, False], dtype=np_type)
+ def test_bool_types(self, path):
+ # Test np.bool_ values read come back as float.
+ df = DataFrame([1, 0, True, False], dtype=np.bool_)
df.to_excel(path, "test1")
with ExcelFile(path) as reader:
recons = pd.read_excel(reader, sheet_name="test1", index_col=0).astype(
- np_type
+ np.bool_
)
tm.assert_frame_equal(df, recons)
diff --git pandas/tests/io/formats/style/test_html.py pandas/tests/io/formats/style/test_html.py
index 46891863975ea..4ae95645a075d 100644
--- pandas/tests/io/formats/style/test_html.py
+++ pandas/tests/io/formats/style/test_html.py
@@ -1,4 +1,7 @@
-from textwrap import dedent
+from textwrap import (
+ dedent,
+ indent,
+)
import numpy as np
import pytest
@@ -822,18 +825,153 @@ def test_concat(styler):
other = styler.data.agg(["mean"]).style
styler.concat(other).set_uuid("X")
result = styler.to_html()
+ fp = "foot0_"
expected = dedent(
- """\
+ f"""\
<tr>
<th id="T_X_level0_row1" class="row_heading level0 row1" >b</th>
<td id="T_X_row1_col0" class="data row1 col0" >2.690000</td>
</tr>
<tr>
- <th id="T_X_level0_foot_row0" class="foot_row_heading level0 foot_row0" >mean</th>
- <td id="T_X_foot_row0_col0" class="foot_data foot_row0 col0" >2.650000</td>
+ <th id="T_X_level0_{fp}row0" class="{fp}row_heading level0 {fp}row0" >mean</th>
+ <td id="T_X_{fp}row0_col0" class="{fp}data {fp}row0 col0" >2.650000</td>
</tr>
</tbody>
</table>
"""
)
assert expected in result
+
+
+def test_concat_recursion(styler):
+ df = styler.data
+ styler1 = styler
+ styler2 = Styler(df.agg(["mean"]), precision=3)
+ styler3 = Styler(df.agg(["mean"]), precision=4)
+ styler1.concat(styler2.concat(styler3)).set_uuid("X")
+ result = styler.to_html()
+ # notice that the second concat (last <tr> of the output html),
+ # there are two `foot_` in the id and class
+ fp1 = "foot0_"
+ fp2 = "foot0_foot0_"
+ expected = dedent(
+ f"""\
+ <tr>
+ <th id="T_X_level0_row1" class="row_heading level0 row1" >b</th>
+ <td id="T_X_row1_col0" class="data row1 col0" >2.690000</td>
+ </tr>
+ <tr>
+ <th id="T_X_level0_{fp1}row0" class="{fp1}row_heading level0 {fp1}row0" >mean</th>
+ <td id="T_X_{fp1}row0_col0" class="{fp1}data {fp1}row0 col0" >2.650</td>
+ </tr>
+ <tr>
+ <th id="T_X_level0_{fp2}row0" class="{fp2}row_heading level0 {fp2}row0" >mean</th>
+ <td id="T_X_{fp2}row0_col0" class="{fp2}data {fp2}row0 col0" >2.6500</td>
+ </tr>
+ </tbody>
+</table>
+ """
+ )
+ assert expected in result
+
+
+def test_concat_chain(styler):
+ df = styler.data
+ styler1 = styler
+ styler2 = Styler(df.agg(["mean"]), precision=3)
+ styler3 = Styler(df.agg(["mean"]), precision=4)
+ styler1.concat(styler2).concat(styler3).set_uuid("X")
+ result = styler.to_html()
+ fp1 = "foot0_"
+ fp2 = "foot1_"
+ expected = dedent(
+ f"""\
+ <tr>
+ <th id="T_X_level0_row1" class="row_heading level0 row1" >b</th>
+ <td id="T_X_row1_col0" class="data row1 col0" >2.690000</td>
+ </tr>
+ <tr>
+    <th id="T_X_level0_{fp1}row0" class="{fp1}row_heading level0 {fp1}row0" >mean</th>
+ <td id="T_X_{fp1}row0_col0" class="{fp1}data {fp1}row0 col0" >2.650</td>
+ </tr>
+ <tr>
+ <th id="T_X_level0_{fp2}row0" class="{fp2}row_heading level0 {fp2}row0" >mean</th>
+ <td id="T_X_{fp2}row0_col0" class="{fp2}data {fp2}row0 col0" >2.6500</td>
+ </tr>
+ </tbody>
+</table>
+ """
+ )
+ assert expected in result
+
+
+def test_concat_combined():
+ def html_lines(foot_prefix: str):
+ assert foot_prefix.endswith("_") or foot_prefix == ""
+ fp = foot_prefix
+ return indent(
+ dedent(
+ f"""\
+ <tr>
+ <th id="T_X_level0_{fp}row0" class="{fp}row_heading level0 {fp}row0" >a</th>
+ <td id="T_X_{fp}row0_col0" class="{fp}data {fp}row0 col0" >2.610000</td>
+ </tr>
+ <tr>
+ <th id="T_X_level0_{fp}row1" class="{fp}row_heading level0 {fp}row1" >b</th>
+ <td id="T_X_{fp}row1_col0" class="{fp}data {fp}row1 col0" >2.690000</td>
+ </tr>
+ """
+ ),
+ prefix=" " * 4,
+ )
+
+ df = DataFrame([[2.61], [2.69]], index=["a", "b"], columns=["A"])
+ s1 = df.style.highlight_max(color="red")
+ s2 = df.style.highlight_max(color="green")
+ s3 = df.style.highlight_max(color="blue")
+ s4 = df.style.highlight_max(color="yellow")
+
+ result = s1.concat(s2).concat(s3.concat(s4)).set_uuid("X").to_html()
+ expected_css = dedent(
+ """\
+ <style type="text/css">
+ #T_X_row1_col0 {
+ background-color: red;
+ }
+ #T_X_foot0_row1_col0 {
+ background-color: green;
+ }
+ #T_X_foot1_row1_col0 {
+ background-color: blue;
+ }
+ #T_X_foot1_foot0_row1_col0 {
+ background-color: yellow;
+ }
+ </style>
+ """
+ )
+ expected_table = (
+ dedent(
+ """\
+ <table id="T_X">
+ <thead>
+ <tr>
+ <th class="blank level0" > </th>
+ <th id="T_X_level0_col0" class="col_heading level0 col0" >A</th>
+ </tr>
+ </thead>
+ <tbody>
+ """
+ )
+ + html_lines("")
+ + html_lines("foot0_")
+ + html_lines("foot1_")
+ + html_lines("foot1_foot0_")
+ + dedent(
+ """\
+ </tbody>
+ </table>
+ """
+ )
+ )
+ assert expected_css + expected_table == result
diff --git pandas/tests/io/formats/style/test_matplotlib.py pandas/tests/io/formats/style/test_matplotlib.py
index 8d9f075d8674d..52fd5355e3302 100644
--- pandas/tests/io/formats/style/test_matplotlib.py
+++ pandas/tests/io/formats/style/test_matplotlib.py
@@ -284,3 +284,17 @@ def test_bar_color_raises(df):
msg = "`color` and `cmap` cannot both be given"
with pytest.raises(ValueError, match=msg):
df.style.bar(color="something", cmap="something else").to_html()
+
+
+@pytest.mark.parametrize(
+ "plot_method",
+ ["scatter", "hexbin"],
+)
+def test_pass_colormap_instance(df, plot_method):
+ # https://github.com/pandas-dev/pandas/issues/49374
+ cmap = mpl.colors.ListedColormap([[1, 1, 1], [0, 0, 0]])
+ df["c"] = df.A + df.B
+ kwargs = dict(x="A", y="B", c="c", colormap=cmap)
+ if plot_method == "hexbin":
+ kwargs["C"] = kwargs.pop("c")
+ getattr(df.plot, plot_method)(**kwargs)
diff --git pandas/tests/io/formats/style/test_to_latex.py pandas/tests/io/formats/style/test_to_latex.py
index b295c955a8967..1c67d125664f8 100644
--- pandas/tests/io/formats/style/test_to_latex.py
+++ pandas/tests/io/formats/style/test_to_latex.py
@@ -1034,6 +1034,26 @@ def test_concat_recursion():
assert result == expected
+def test_concat_chain():
+ # tests hidden row recursion and applied styles
+ styler1 = DataFrame([[1], [9]]).style.hide([1]).highlight_min(color="red")
+ styler2 = DataFrame([[9], [2]]).style.hide([0]).highlight_min(color="green")
+    styler3 = DataFrame([[3], [9]]).style.hide([1]).highlight_min(color="blue")
+
+ result = styler1.concat(styler2).concat(styler3).to_latex(convert_css=True)
+ expected = dedent(
+ """\
+ \\begin{tabular}{lr}
+ & 0 \\\\
+ 0 & {\\cellcolor{red}} 1 \\\\
+ 1 & {\\cellcolor{green}} 2 \\\\
+ 0 & {\\cellcolor{blue}} 3 \\\\
+ \\end{tabular}
+ """
+ )
+ assert result == expected
+
+
@pytest.mark.parametrize(
"df, expected",
[
diff --git pandas/tests/io/formats/style/test_to_string.py pandas/tests/io/formats/style/test_to_string.py
index fcac304b8c3bb..913857396446c 100644
--- pandas/tests/io/formats/style/test_to_string.py
+++ pandas/tests/io/formats/style/test_to_string.py
@@ -53,3 +53,39 @@ def test_concat(styler):
"""
)
assert result == expected
+
+
+def test_concat_recursion(styler):
+ df = styler.data
+ styler1 = styler
+ styler2 = Styler(df.agg(["sum"]), uuid_len=0, precision=3)
+ styler3 = Styler(df.agg(["sum"]), uuid_len=0, precision=4)
+ result = styler1.concat(styler2.concat(styler3)).to_string()
+ expected = dedent(
+ """\
+ A B C
+ 0 0 -0.61 ab
+ 1 1 -1.22 cd
+ sum 1 -1.830 abcd
+ sum 1 -1.8300 abcd
+ """
+ )
+ assert result == expected
+
+
+def test_concat_chain(styler):
+ df = styler.data
+ styler1 = styler
+ styler2 = Styler(df.agg(["sum"]), uuid_len=0, precision=3)
+ styler3 = Styler(df.agg(["sum"]), uuid_len=0, precision=4)
+ result = styler1.concat(styler2).concat(styler3).to_string()
+ expected = dedent(
+ """\
+ A B C
+ 0 0 -0.61 ab
+ 1 1 -1.22 cd
+ sum 1 -1.830 abcd
+ sum 1 -1.8300 abcd
+ """
+ )
+ assert result == expected
diff --git pandas/tests/io/formats/test_info.py pandas/tests/io/formats/test_info.py
index 54b5e699cd034..1657327dd7344 100644
--- pandas/tests/io/formats/test_info.py
+++ pandas/tests/io/formats/test_info.py
@@ -20,6 +20,7 @@
date_range,
option_context,
)
+import pandas._testing as tm
@pytest.fixture
@@ -491,3 +492,12 @@ def test_info_int_columns():
"""
)
assert result == expected
+
+
+def test_memory_usage_empty_no_warning():
+ # GH#50066
+ df = DataFrame(index=["a", "b"])
+ with tm.assert_produces_warning(None):
+ result = df.memory_usage()
+ expected = Series(16 if IS64 else 8, index=["Index"])
+ tm.assert_series_equal(result, expected)
diff --git pandas/tests/io/formats/test_to_excel.py pandas/tests/io/formats/test_to_excel.py
index 7481baaee94f6..2a0f9f59972ef 100644
--- pandas/tests/io/formats/test_to_excel.py
+++ pandas/tests/io/formats/test_to_excel.py
@@ -357,7 +357,7 @@ def test_css_excel_cell_precedence(styles, expected):
"""It applies favors latter declarations over former declarations"""
# See GH 47371
converter = CSSToExcelConverter()
- converter.__call__.cache_clear()
+ converter._call_cached.cache_clear()
css_styles = {(0, 0): styles}
cell = CssExcelCell(
row=0,
@@ -369,7 +369,7 @@ def test_css_excel_cell_precedence(styles, expected):
css_col=0,
css_converter=converter,
)
- converter.__call__.cache_clear()
+ converter._call_cached.cache_clear()
assert cell.style == converter(expected)
@@ -410,7 +410,7 @@ def test_css_excel_cell_cache(styles, cache_hits, cache_misses):
"""It caches unique cell styles"""
# See GH 47371
converter = CSSToExcelConverter()
- converter.__call__.cache_clear()
+ converter._call_cached.cache_clear()
css_styles = {(0, i): _style for i, _style in enumerate(styles)}
for css_row, css_col in css_styles:
@@ -424,8 +424,8 @@ def test_css_excel_cell_cache(styles, cache_hits, cache_misses):
css_col=css_col,
css_converter=converter,
)
- cache_info = converter.__call__.cache_info()
- converter.__call__.cache_clear()
+ cache_info = converter._call_cached.cache_info()
+ converter._call_cached.cache_clear()
assert cache_info.hits == cache_hits
assert cache_info.misses == cache_misses
diff --git pandas/tests/io/parser/common/test_file_buffer_url.py pandas/tests/io/parser/common/test_file_buffer_url.py
index fce1d1260b3fe..e4cd0f1d19a79 100644
--- pandas/tests/io/parser/common/test_file_buffer_url.py
+++ pandas/tests/io/parser/common/test_file_buffer_url.py
@@ -30,7 +30,7 @@
@pytest.mark.network
@tm.network(
url=(
- "https://raw.github.com/pandas-dev/pandas/main/"
+ "https://raw.githubusercontent.com/pandas-dev/pandas/main/"
"pandas/tests/io/parser/data/salaries.csv"
),
check_before_test=True,
@@ -40,7 +40,7 @@ def test_url(all_parsers, csv_dir_path):
kwargs = {"sep": "\t"}
url = (
- "https://raw.github.com/pandas-dev/pandas/main/"
+ "https://raw.githubusercontent.com/pandas-dev/pandas/main/"
"pandas/tests/io/parser/data/salaries.csv"
)
url_result = parser.read_csv(url, **kwargs)
diff --git pandas/tests/io/xml/test_to_xml.py pandas/tests/io/xml/test_to_xml.py
index d3247eb9dd47e..0f42c7e070c4a 100644
--- pandas/tests/io/xml/test_to_xml.py
+++ pandas/tests/io/xml/test_to_xml.py
@@ -1037,9 +1037,16 @@ def test_stylesheet_wrong_path():
def test_empty_string_stylesheet(val):
from lxml.etree import XMLSyntaxError
- with pytest.raises(
- XMLSyntaxError, match=("Document is empty|Start tag expected, '<' not found")
- ):
+ msg = "|".join(
+ [
+ "Document is empty",
+ "Start tag expected, '<' not found",
+ # Seen on Mac with lxml 4.9.1
+ r"None \(line 0\)",
+ ]
+ )
+
+ with pytest.raises(XMLSyntaxError, match=msg):
geom_df.to_xml(stylesheet=val)
diff --git pandas/tests/io/xml/test_xml.py pandas/tests/io/xml/test_xml.py
index fd4ba87bd302c..33a98f57310c2 100644
--- pandas/tests/io/xml/test_xml.py
+++ pandas/tests/io/xml/test_xml.py
@@ -471,7 +471,14 @@ def test_file_handle_close(datapath, parser):
def test_empty_string_lxml(val):
from lxml.etree import XMLSyntaxError
- with pytest.raises(XMLSyntaxError, match="Document is empty"):
+ msg = "|".join(
+ [
+ "Document is empty",
+            # Seen on Mac with lxml 4.9.1
+ r"None \(line 0\)",
+ ]
+ )
+ with pytest.raises(XMLSyntaxError, match=msg):
read_xml(val, parser="lxml")
diff --git pandas/tests/plotting/frame/test_frame.py pandas/tests/plotting/frame/test_frame.py
index dc70ef7e3520a..6221673d12375 100644
--- pandas/tests/plotting/frame/test_frame.py
+++ pandas/tests/plotting/frame/test_frame.py
@@ -1,3 +1,4 @@
+""" Test cases for DataFrame.plot """
from datetime import (
date,
datetime,
@@ -658,11 +659,6 @@ def test_plot_scatter(self):
with pytest.raises(TypeError, match=msg):
df.plot.scatter(y="y")
- with pytest.raises(TypeError, match="Specify exactly one of `s` and `size`"):
- df.plot.scatter(x="x", y="y", s=2, size=2)
- with pytest.raises(TypeError, match="Specify exactly one of `c` and `color`"):
- df.plot.scatter(x="a", y="b", c="red", color="green")
-
# GH 6951
axes = df.plot(x="x", y="y", kind="scatter", subplots=True)
self._check_axes_shape(axes, axes_num=1, layout=(1, 1))
diff --git pandas/tests/plotting/frame/test_frame_color.py pandas/tests/plotting/frame/test_frame_color.py
index e384861d8a57c..e6653c38df23f 100644
--- pandas/tests/plotting/frame/test_frame_color.py
+++ pandas/tests/plotting/frame/test_frame_color.py
@@ -196,15 +196,14 @@ def test_if_scatterplot_colorbars_are_next_to_parent_axes(self):
assert np.isclose(parent_distance, colorbar_distance, atol=1e-7).all()
@pytest.mark.parametrize("cmap", [None, "Greys"])
- @pytest.mark.parametrize("kw", ["c", "color"])
- def test_scatter_with_c_column_name_with_colors(self, cmap, kw):
+ def test_scatter_with_c_column_name_with_colors(self, cmap):
# https://github.com/pandas-dev/pandas/issues/34316
df = DataFrame(
[[5.1, 3.5], [4.9, 3.0], [7.0, 3.2], [6.4, 3.2], [5.9, 3.0]],
columns=["length", "width"],
)
df["species"] = ["r", "r", "g", "g", "b"]
- ax = df.plot.scatter(x=0, y=1, cmap=cmap, **{kw: "species"})
+ ax = df.plot.scatter(x=0, y=1, cmap=cmap, c="species")
assert ax.collections[0].colorbar is None
def test_scatter_colors(self):
diff --git pandas/tests/resample/test_resample_api.py pandas/tests/resample/test_resample_api.py
index c5cd777962df3..f68a5ed434a36 100644
--- pandas/tests/resample/test_resample_api.py
+++ pandas/tests/resample/test_resample_api.py
@@ -938,3 +938,24 @@ def test_series_downsample_method(method, numeric_only, expected_data):
result = func(numeric_only=numeric_only)
expected = Series(expected_data, index=expected_index)
tm.assert_series_equal(result, expected)
+
+
+@pytest.mark.parametrize("method", ["agg", "apply", "transform"])
+def test_numeric_only_warning_numpy(method):
+ # GH#50538
+ resampled = _test_frame.assign(D="x").resample("H")
+ if method == "transform":
+ msg = "The default value of numeric_only"
+ with tm.assert_produces_warning(FutureWarning, match=msg):
+ getattr(resampled, method)(np.mean)
+ # Ensure users can pass numeric_only
+ result = getattr(resampled, method)(np.mean, numeric_only=True)
+ expected = resampled.transform("mean", numeric_only=True)
+ tm.assert_frame_equal(result, expected)
+ else:
+ msg = "The operation <function mean.*failed"
+ with tm.assert_produces_warning(FutureWarning, match=msg):
+ getattr(resampled, method)(np.mean)
+ # Ensure users can't pass numeric_only
+ with pytest.raises(TypeError, match="got an unexpected keyword argument"):
+ getattr(resampled, method)(np.mean, numeric_only=True)
diff --git pandas/tests/reshape/test_pivot.py pandas/tests/reshape/test_pivot.py
index 77a43954cf699..416ff104b4cae 100644
--- pandas/tests/reshape/test_pivot.py
+++ pandas/tests/reshape/test_pivot.py
@@ -146,7 +146,8 @@ def test_pivot_table_nocols(self):
df = DataFrame(
{"rows": ["a", "b", "c"], "cols": ["x", "y", "z"], "values": [1, 2, 3]}
)
- msg = "The default value of numeric_only"
+ # GH#50538
+ msg = "The operation <function sum.*failed"
with tm.assert_produces_warning(FutureWarning, match=msg):
rs = df.pivot_table(columns="cols", aggfunc=np.sum)
xp = df.pivot_table(index="cols", aggfunc=np.sum).T
@@ -907,7 +908,8 @@ def test_no_col(self):
# to help with a buglet
self.data.columns = [k * 2 for k in self.data.columns]
- msg = "The default value of numeric_only"
+ # GH#50538
+ msg = "The operation <function mean.*failed"
with tm.assert_produces_warning(FutureWarning, match=msg):
table = self.data.pivot_table(
index=["AA", "BB"], margins=True, aggfunc=np.mean
@@ -916,6 +918,7 @@ def test_no_col(self):
totals = table.loc[("All", ""), value_col]
assert totals == self.data[value_col].mean()
+ msg = "pivot_table dropped a column because it failed to aggregate"
with tm.assert_produces_warning(FutureWarning, match=msg):
table = self.data.pivot_table(
index=["AA", "BB"], margins=True, aggfunc="mean"
@@ -975,7 +978,11 @@ def test_margin_with_only_columns_defined(
}
)
- msg = "The default value of numeric_only"
+ if aggfunc == "sum":
+ msg = "pivot_table dropped a column because it failed to aggregate"
+ else:
+ # GH#50538
+ msg = "The operation <function mean.*failed"
with tm.assert_produces_warning(FutureWarning, match=msg):
result = df.pivot_table(columns=columns, margins=True, aggfunc=aggfunc)
expected = DataFrame(values, index=Index(["D", "E"]), columns=expected_columns)
@@ -2004,7 +2011,7 @@ def test_pivot_string_func_vs_func(self, f, f_numpy):
# GH #18713
# for consistency purposes
- msg = "The default value of numeric_only"
+ msg = "pivot_table dropped a column because it failed to aggregate"
with tm.assert_produces_warning(FutureWarning, match=msg):
result = pivot_table(self.data, index="A", columns="B", aggfunc=f)
expected = pivot_table(self.data, index="A", columns="B", aggfunc=f_numpy)
@@ -2269,6 +2276,75 @@ def test_pivot_table_datetime_warning(self):
)
tm.assert_frame_equal(result, expected)
+ def test_pivot_table_with_mixed_nested_tuples(self, using_array_manager):
+ # GH 50342
+ df = DataFrame(
+ {
+ "A": ["foo", "foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar"],
+ "B": ["one", "one", "one", "two", "two", "one", "one", "two", "two"],
+ "C": [
+ "small",
+ "large",
+ "large",
+ "small",
+ "small",
+ "large",
+ "small",
+ "small",
+ "large",
+ ],
+ "D": [1, 2, 2, 3, 3, 4, 5, 6, 7],
+ "E": [2, 4, 5, 5, 6, 6, 8, 9, 9],
+ ("col5",): [
+ "foo",
+ "foo",
+ "foo",
+ "foo",
+ "foo",
+ "bar",
+ "bar",
+ "bar",
+ "bar",
+ ],
+ ("col6", 6): [
+ "one",
+ "one",
+ "one",
+ "two",
+ "two",
+ "one",
+ "one",
+ "two",
+ "two",
+ ],
+ (7, "seven"): [
+ "small",
+ "large",
+ "large",
+ "small",
+ "small",
+ "large",
+ "small",
+ "small",
+ "large",
+ ],
+ }
+ )
+ result = pivot_table(
+ df, values="D", index=["A", "B"], columns=[(7, "seven")], aggfunc=np.sum
+ )
+ expected = DataFrame(
+ [[4.0, 5.0], [7.0, 6.0], [4.0, 1.0], [np.nan, 6.0]],
+ columns=Index(["large", "small"], name=(7, "seven")),
+ index=MultiIndex.from_arrays(
+ [["bar", "bar", "foo", "foo"], ["one", "two"] * 2], names=["A", "B"]
+ ),
+ )
+ if using_array_manager:
+ # INFO(ArrayManager) column without NaNs can preserve int dtype
+ expected["small"] = expected["small"].astype("int64")
+ tm.assert_frame_equal(result, expected)
+
class TestPivot:
def test_pivot(self):
diff --git pandas/tests/scalar/timedelta/test_constructors.py pandas/tests/scalar/timedelta/test_constructors.py
index 5b2438ec30f3a..90e148f51dd87 100644
--- pandas/tests/scalar/timedelta/test_constructors.py
+++ pandas/tests/scalar/timedelta/test_constructors.py
@@ -402,3 +402,12 @@ def test_string_without_numbers(value):
)
with pytest.raises(ValueError, match=msg):
Timedelta(value)
+
+
+def test_subclass_respected():
+ # GH#49579
+ class MyCustomTimedelta(Timedelta):
+ pass
+
+ td = MyCustomTimedelta("1 minute")
+ assert isinstance(td, MyCustomTimedelta)
diff --git pandas/tests/series/indexing/test_get.py pandas/tests/series/indexing/test_get.py
index 1a54796dbeec3..e8034bd4f7160 100644
--- pandas/tests/series/indexing/test_get.py
+++ pandas/tests/series/indexing/test_get.py
@@ -167,8 +167,7 @@ def test_get_with_ea(arr):
expected = ser.iloc[[2, 3]]
tm.assert_series_equal(result, expected)
- with tm.assert_produces_warning(FutureWarning, match="label-based"):
- result = ser.get(slice(2))
+ result = ser.get(slice(2))
expected = ser.iloc[[0, 1]]
tm.assert_series_equal(result, expected)
diff --git pandas/tests/series/indexing/test_getitem.py pandas/tests/series/indexing/test_getitem.py
index cc67dd9caeea9..acdcc03cee92c 100644
--- pandas/tests/series/indexing/test_getitem.py
+++ pandas/tests/series/indexing/test_getitem.py
@@ -338,8 +338,7 @@ def test_getitem_slice_bug(self):
def test_getitem_slice_integers(self):
ser = Series(np.random.randn(8), index=[2, 4, 6, 8, 10, 12, 14, 16])
- with tm.assert_produces_warning(FutureWarning, match="label-based"):
- result = ser[:4]
+ result = ser[:4]
expected = Series(ser.values[:4], index=[2, 4, 6, 8])
tm.assert_series_equal(result, expected)
diff --git pandas/tests/series/indexing/test_setitem.py pandas/tests/series/indexing/test_setitem.py
index 9ab3b6ead017f..500ff2e90adb1 100644
--- pandas/tests/series/indexing/test_setitem.py
+++ pandas/tests/series/indexing/test_setitem.py
@@ -220,15 +220,9 @@ def test_setitem_slice(self):
def test_setitem_slice_integers(self):
ser = Series(np.random.randn(8), index=[2, 4, 6, 8, 10, 12, 14, 16])
- msg = r"In a future version, this will be treated as \*label-based\* indexing"
- with tm.assert_produces_warning(FutureWarning, match=msg):
- ser[:4] = 0
- with tm.assert_produces_warning(
- FutureWarning, match=msg, check_stacklevel=False
- ):
- assert (ser[:4] == 0).all()
- with tm.assert_produces_warning(FutureWarning, match=msg):
- assert not (ser[4:] == 0).any()
+ ser[:4] = 0
+ assert (ser[:4] == 0).all()
+ assert not (ser[4:] == 0).any()
def test_setitem_slicestep(self):
# caught this bug when writing tests
diff --git pandas/tests/series/methods/test_describe.py pandas/tests/series/methods/test_describe.py
index a7cedd580b2d0..56bd9341d2efe 100644
--- pandas/tests/series/methods/test_describe.py
+++ pandas/tests/series/methods/test_describe.py
@@ -1,4 +1,7 @@
import numpy as np
+import pytest
+
+from pandas.compat import is_numpy_dev
from pandas.core.dtypes.common import (
is_complex_dtype,
@@ -163,6 +166,12 @@ def test_numeric_result_dtype(self, any_numeric_dtype):
dtype = "complex128" if is_complex_dtype(any_numeric_dtype) else None
ser = Series([0, 1], dtype=any_numeric_dtype)
+ if dtype == "complex128" and is_numpy_dev:
+ with pytest.raises(
+ TypeError, match=r"^a must be an array of real numbers$"
+ ):
+ ser.describe()
+ return
result = ser.describe()
expected = Series(
[
diff --git pandas/tests/series/methods/test_equals.py pandas/tests/series/methods/test_equals.py
index 22e27c271df88..9278d1b51e1aa 100644
--- pandas/tests/series/methods/test_equals.py
+++ pandas/tests/series/methods/test_equals.py
@@ -5,6 +5,7 @@
import pytest
from pandas._libs.missing import is_matching_na
+from pandas.compat import is_numpy_dev
from pandas.core.dtypes.common import is_float
@@ -50,7 +51,7 @@ def test_equals_list_array(val):
cm = (
tm.assert_produces_warning(FutureWarning, check_stacklevel=False)
- if isinstance(val, str)
+ if isinstance(val, str) and not is_numpy_dev
else nullcontext()
)
with cm:
diff --git pandas/tests/series/methods/test_quantile.py pandas/tests/series/methods/test_quantile.py
index aeff5b3adfe56..e740f569506ff 100644
--- pandas/tests/series/methods/test_quantile.py
+++ pandas/tests/series/methods/test_quantile.py
@@ -225,3 +225,18 @@ def test_quantile_dtypes(self, dtype):
if dtype == "Int64":
expected = expected.astype("Float64")
tm.assert_series_equal(result, expected)
+
+ def test_quantile_all_na(self, any_int_ea_dtype):
+ # GH#50681
+ ser = Series([pd.NA, pd.NA], dtype=any_int_ea_dtype)
+ with tm.assert_produces_warning(None):
+ result = ser.quantile([0.1, 0.5])
+ expected = Series([pd.NA, pd.NA], dtype=any_int_ea_dtype, index=[0.1, 0.5])
+ tm.assert_series_equal(result, expected)
+
+ def test_quantile_dtype_size(self, any_int_ea_dtype):
+ # GH#50681
+ ser = Series([pd.NA, pd.NA, 1], dtype=any_int_ea_dtype)
+ result = ser.quantile([0.1, 0.5])
+ expected = Series([1, 1], dtype=any_int_ea_dtype, index=[0.1, 0.5])
+ tm.assert_series_equal(result, expected)
diff --git pandas/tests/series/methods/test_replace.py pandas/tests/series/methods/test_replace.py
index 77c9cf4013bd7..126a89503d636 100644
--- pandas/tests/series/methods/test_replace.py
+++ pandas/tests/series/methods/test_replace.py
@@ -667,3 +667,11 @@ def test_replace_different_int_types(self, any_int_numpy_dtype):
result = labs.replace(map_dict)
expected = labs.replace({0: 0, 2: 1, 1: 2})
tm.assert_series_equal(result, expected)
+
+ @pytest.mark.parametrize("val", [2, np.nan, 2.0])
+ def test_replace_value_none_dtype_numeric(self, val):
+ # GH#48231
+ ser = pd.Series([1, val])
+ result = ser.replace(val, None)
+ expected = pd.Series([1, None], dtype=object)
+ tm.assert_series_equal(result, expected)
diff --git pandas/tests/series/test_constructors.py pandas/tests/series/test_constructors.py
index 53b2cbedc0ae3..bc26158f40416 100644
--- pandas/tests/series/test_constructors.py
+++ pandas/tests/series/test_constructors.py
@@ -14,6 +14,7 @@
lib,
)
from pandas.compat import is_numpy_dev
+from pandas.compat.numpy import np_version_gte1p24
import pandas.util._test_decorators as td
from pandas.core.dtypes.common import (
@@ -735,18 +736,18 @@ def test_constructor_cast(self):
def test_constructor_signed_int_overflow_deprecation(self):
# GH#41734 disallow silent overflow
msg = "Values are too large to be losslessly cast"
- numpy_warning = DeprecationWarning if is_numpy_dev else None
- with tm.assert_produces_warning(
- (FutureWarning, numpy_warning), match=msg, check_stacklevel=False
- ):
+ warns = (
+ (FutureWarning, DeprecationWarning)
+ if is_numpy_dev or np_version_gte1p24
+ else FutureWarning
+ )
+ with tm.assert_produces_warning(warns, match=msg, check_stacklevel=False):
ser = Series([1, 200, 923442], dtype="int8")
expected = Series([1, -56, 50], dtype="int8")
tm.assert_series_equal(ser, expected)
- with tm.assert_produces_warning(
- (FutureWarning, numpy_warning), match=msg, check_stacklevel=False
- ):
+ with tm.assert_produces_warning(warns, match=msg, check_stacklevel=False):
ser = Series([1, 200, 923442], dtype="uint8")
expected = Series([1, 200, 50], dtype="uint8")
diff --git pandas/tests/util/test_assert_almost_equal.py pandas/tests/util/test_assert_almost_equal.py
index ab53707771be6..4e6c420341bc2 100644
--- pandas/tests/util/test_assert_almost_equal.py
+++ pandas/tests/util/test_assert_almost_equal.py
@@ -458,3 +458,87 @@ def test_assert_almost_equal_iterable_values_mismatch():
with pytest.raises(AssertionError, match=msg):
tm.assert_almost_equal([1, 2], [1, 3])
+
+
+subarr = np.empty(2, dtype=object)
+subarr[:] = [np.array([None, "b"], dtype=object), np.array(["c", "d"], dtype=object)]
+
+NESTED_CASES = [
+ # nested array
+ (
+ np.array([np.array([50, 70, 90]), np.array([20, 30])], dtype=object),
+ np.array([np.array([50, 70, 90]), np.array([20, 30])], dtype=object),
+ ),
+ # >1 level of nesting
+ (
+ np.array(
+ [
+ np.array([np.array([50, 70]), np.array([90])], dtype=object),
+ np.array([np.array([20, 30])], dtype=object),
+ ],
+ dtype=object,
+ ),
+ np.array(
+ [
+ np.array([np.array([50, 70]), np.array([90])], dtype=object),
+ np.array([np.array([20, 30])], dtype=object),
+ ],
+ dtype=object,
+ ),
+ ),
+ # lists
+ (
+ np.array([[50, 70, 90], [20, 30]], dtype=object),
+ np.array([[50, 70, 90], [20, 30]], dtype=object),
+ ),
+ # mixed array/list
+ (
+ np.array([np.array([1, 2, 3]), np.array([4, 5])], dtype=object),
+ np.array([[1, 2, 3], [4, 5]], dtype=object),
+ ),
+ (
+ np.array(
+ [
+ np.array([np.array([1, 2, 3]), np.array([4, 5])], dtype=object),
+ np.array(
+ [np.array([6]), np.array([7, 8]), np.array([9])], dtype=object
+ ),
+ ],
+ dtype=object,
+ ),
+ np.array([[[1, 2, 3], [4, 5]], [[6], [7, 8], [9]]], dtype=object),
+ ),
+ # same-length lists
+ (
+ np.array([subarr, None], dtype=object),
+ np.array([list([[None, "b"], ["c", "d"]]), None], dtype=object),
+ ),
+ # dicts
+ (
+ np.array([{"f1": 1, "f2": np.array(["a", "b"], dtype=object)}], dtype=object),
+ np.array([{"f1": 1, "f2": np.array(["a", "b"], dtype=object)}], dtype=object),
+ ),
+ (
+ np.array([{"f1": 1, "f2": np.array(["a", "b"], dtype=object)}], dtype=object),
+ np.array([{"f1": 1, "f2": ["a", "b"]}], dtype=object),
+ ),
+ # array/list of dicts
+ (
+ np.array(
+ [
+ np.array(
+ [{"f1": 1, "f2": np.array(["a", "b"], dtype=object)}], dtype=object
+ ),
+ np.array([], dtype=object),
+ ],
+ dtype=object,
+ ),
+ np.array([[{"f1": 1, "f2": ["a", "b"]}], []], dtype=object),
+ ),
+]
+
+
+@pytest.mark.filterwarnings("ignore:elementwise comparison failed:DeprecationWarning")
+@pytest.mark.parametrize("a,b", NESTED_CASES)
+def test_assert_almost_equal_array_nested(a, b):
+ _assert_almost_equal_both(a, b)
diff --git a/pandas/tests/util/test_rewrite_warning.py b/pandas/tests/util/test_rewrite_warning.py
new file mode 100644
index 0000000000000..f847a06d8ea8d
--- /dev/null
+++ pandas/tests/util/test_rewrite_warning.py
@@ -0,0 +1,39 @@
+import warnings
+
+import pytest
+
+from pandas.util._exceptions import rewrite_warning
+
+import pandas._testing as tm
+
+
+@pytest.mark.parametrize(
+ "target_category, target_message, hit",
+ [
+ (FutureWarning, "Target message", True),
+ (FutureWarning, "Target", True),
+ (FutureWarning, "get mess", True),
+ (FutureWarning, "Missed message", False),
+ (DeprecationWarning, "Target message", False),
+ ],
+)
+@pytest.mark.parametrize(
+ "new_category",
+ [
+ None,
+ DeprecationWarning,
+ ],
+)
+def test_rewrite_warning(target_category, target_message, hit, new_category):
+ new_message = "Rewritten message"
+ if hit:
+ expected_category = new_category if new_category else target_category
+ expected_message = new_message
+ else:
+ expected_category = FutureWarning
+ expected_message = "Target message"
+ with tm.assert_produces_warning(expected_category, match=expected_message):
+ with rewrite_warning(
+ target_message, target_category, new_message, new_category
+ ):
+ warnings.warn(message="Target message", category=FutureWarning)
diff --git pandas/util/_exceptions.py pandas/util/_exceptions.py
index c718451fbf621..f300f2c52f175 100644
--- pandas/util/_exceptions.py
+++ pandas/util/_exceptions.py
@@ -3,7 +3,9 @@
import contextlib
import inspect
import os
+import re
from typing import Iterator
+import warnings
@contextlib.contextmanager
@@ -47,3 +49,46 @@ def find_stack_level() -> int:
else:
break
return n
+
+
+@contextlib.contextmanager
+def rewrite_warning(
+ target_message: str,
+ target_category: type[Warning],
+ new_message: str,
+ new_category: type[Warning] | None = None,
+) -> Iterator[None]:
+ """
+ Rewrite the message of a warning.
+
+ Parameters
+ ----------
+ target_message : str
+ Warning message to match.
+ target_category : Warning
+ Warning type to match.
+ new_message : str
+ New warning message to emit.
+ new_category : Warning or None, default None
+ New warning type to emit. When None, will be the same as target_category.
+ """
+ if new_category is None:
+ new_category = target_category
+ with warnings.catch_warnings(record=True) as record:
+ yield
+ if len(record) > 0:
+ match = re.compile(target_message)
+ for warning in record:
+ if warning.category is target_category and re.search(
+ match, str(warning.message)
+ ):
+ category = new_category
+ message: Warning | str = new_message
+ else:
+ category, message = warning.category, warning.message
+ warnings.warn_explicit(
+ message=message,
+ category=category,
+ filename=warning.filename,
+ lineno=warning.lineno,
+ )
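
For orientation, a minimal sketch of how this new context manager is used, mirroring the new test above (`rewrite_warning` is internal pandas API, so the import path is not guaranteed stable):

    import warnings

    from pandas.util._exceptions import rewrite_warning

    # Any FutureWarning whose message matches the target pattern is
    # re-emitted with the new message and (optionally) a new category;
    # non-matching warnings pass through unchanged.
    with rewrite_warning(
        target_message="Target message",
        target_category=FutureWarning,
        new_message="Rewritten message",
        new_category=DeprecationWarning,
    ):
        warnings.warn(message="Target message", category=FutureWarning)
    # -> a DeprecationWarning("Rewritten message") is emitted instead
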
diff --git pandas/util/_test_decorators.py pandas/util/_test_decorators.py
index 4a4f27f6c7906..dc49dc6adf378 100644
--- pandas/util/_test_decorators.py
+++ pandas/util/_test_decorators.py
@@ -94,6 +94,13 @@ def safe_import(mod_name: str, min_version: str | None = None):
mod = __import__(mod_name)
except ImportError:
return False
+ except SystemError:
+ # TODO: numba is incompatible with numpy 1.24+.
+ # Once that's fixed, this block should be removed.
+ if mod_name == "numba":
+ return False
+ else:
+ raise
if not min_version:
return mod
diff --git pyproject.toml pyproject.toml
index 67c56123a847c..54edbfb8ea938 100644
--- pyproject.toml
+++ pyproject.toml
@@ -5,7 +5,7 @@ requires = [
"setuptools>=51.0.0",
"wheel",
"Cython>=0.29.32,<3", # Note: sync with setup.py, environment.yml and asv.conf.json
- "oldest-supported-numpy>=0.10"
+ "oldest-supported-numpy>=2022.8.16"
]
# uncomment to enable pep517 after versioneer problem is fixed.
# https://github.com/python-versioneer/python-versioneer/issues/193
diff --git requirements-dev.txt requirements-dev.txt
index 3e5e1b37a3e72..95291e4ab5452 100644
--- requirements-dev.txt
+++ requirements-dev.txt
@@ -22,14 +22,14 @@ hypothesis
gcsfs
jinja2
lxml
-matplotlib
+matplotlib>=3.6.1
numba>=0.53.1
numexpr>=2.8.0
openpyxl
odfpy
pandas-gbq
psycopg2
-pyarrow
+pyarrow<10
pymysql
pyreadstat
tables
@@ -37,7 +37,7 @@ python-snappy
pyxlsb
s3fs>=2021.08.0
scipy
-sqlalchemy
+sqlalchemy<1.4.46
tabulate
tzdata>=2022.1
xarray
diff --git setup.cfg setup.cfg
index f2314316f7732..dbf05957896f8 100644
--- setup.cfg
+++ setup.cfg
@@ -22,10 +22,11 @@ classifiers =
Programming Language :: Python :: 3.8
Programming Language :: Python :: 3.9
Programming Language :: Python :: 3.10
+ Programming Language :: Python :: 3.11
Topic :: Scientific/Engineering
project_urls =
Bug Tracker = https://github.com/pandas-dev/pandas/issues
- Documentation = https://pandas.pydata.org/pandas-docs/stable
+ Documentation = https://pandas.pydata.org/docs/
Source Code = https://github.com/pandas-dev/pandas
[options]
@@ -33,6 +34,7 @@ packages = find:
install_requires =
numpy>=1.20.3; python_version<'3.10'
numpy>=1.21.0; python_version>='3.10'
+ numpy>=1.23.2; python_version>='3.11'
python-dateutil>=2.8.1
pytz>=2020.1
python_requires = >=3.8
diff --git web/pandas/about/team.md web/pandas/about/team.md
index 261d577b2abc1..bdd5d5d2b2468 100644
--- web/pandas/about/team.md
+++ web/pandas/about/team.md
@@ -9,7 +9,8 @@ If you want to support pandas development, you can find information in the [dona
## Active maintainers
<div class="card-group maintainers">
- {% for person in maintainers.active_with_github_info %}
+ {% for username in maintainers.active %}
+ {% set person = maintainers.github_info.get(username) %}
<div class="card">
<img class="card-img-top" alt="" src="{{ person.avatar_url }}"/>
<div class="card-body">
@@ -63,7 +64,8 @@ The project governance is available in the [project governance page](governance.
## Inactive maintainers
<ul>
- {% for person in maintainers.inactive_with_github_info %}
+ {% for username in maintainers.inactive %}
+ {% set person = maintainers.github_info.get(username) %}
<li>
<a href="{{ person.blog or person.html_url }}">
{{ person.name or person.login }}
diff --git web/pandas/config.yml web/pandas/config.yml
index 16e1357d405a0..85dee1d114800 100644
--- web/pandas/config.yml
+++ web/pandas/config.yml
@@ -1,10 +1,10 @@
main:
templates_path: _templates
base_template: "layout.html"
+ production_url: "https://pandas.pydata.org/"
ignore:
- _templates/layout.html
- config.yml
- - try.md # the binder page will be added later
github_repo_url: pandas-dev/pandas
context_preprocessors:
- pandas_web.Preprocessors.current_year
diff --git web/pandas_web.py web/pandas_web.py
index 62539574543a9..1a2bc45bd87e0 100755
--- web/pandas_web.py
+++ web/pandas_web.py
@@ -27,6 +27,7 @@
import collections
import datetime
import importlib
+import json
import operator
import os
import pathlib
@@ -163,14 +164,34 @@ def maintainers_add_info(context):
if repeated:
raise ValueError(f"Maintainers {repeated} are both active and inactive")
- for kind in ("active", "inactive"):
- context["maintainers"][f"{kind}_with_github_info"] = []
- for user in context["maintainers"][kind]:
- resp = requests.get(f"https://api.github.com/users/{user}")
- if context["ignore_io_errors"] and resp.status_code == 403:
- return context
- resp.raise_for_status()
- context["maintainers"][f"{kind}_with_github_info"].append(resp.json())
+ maintainers_info = {}
+ for user in (
+ context["maintainers"]["active"] + context["maintainers"]["inactive"]
+ ):
+ resp = requests.get(f"https://api.github.com/users/{user}")
+ if resp.status_code == 403:
+ sys.stderr.write(
+ "WARN: GitHub API quota exceeded when fetching maintainers\n"
+ )
+ # if we exceed github api quota, we use the github info
+ # of maintainers saved with the website
+ resp_bkp = requests.get(
+ context["main"]["production_url"] + "maintainers.json"
+ )
+ resp_bkp.raise_for_status()
+ maintainers_info = resp_bkp.json()
+ break
+
+ resp.raise_for_status()
+ maintainers_info[user] = resp.json()
+
+ context["maintainers"]["github_info"] = maintainers_info
+
+ # save the data fetched from github to use it in case we exceed
+ # git github api quota in the future
+ with open(pathlib.Path(context["target_path"]) / "maintainers.json", "w") as f:
+ json.dump(maintainers_info, f)
+
return context
@staticmethod
@@ -179,11 +200,19 @@ def home_add_releases(context):
github_repo_url = context["main"]["github_repo_url"]
resp = requests.get(f"https://api.github.com/repos/{github_repo_url}/releases")
- if context["ignore_io_errors"] and resp.status_code == 403:
- return context
- resp.raise_for_status()
+ if resp.status_code == 403:
+ sys.stderr.write("WARN: GitHub API quota exceeded when fetching releases\n")
+ resp_bkp = requests.get(context["main"]["production_url"] + "releases.json")
+ resp_bkp.raise_for_status()
+ releases = resp_bkp.json()
+ else:
+ resp.raise_for_status()
+ releases = resp.json()
- for release in resp.json():
+ with open(pathlib.Path(context["target_path"]) / "releases.json", "w") as f:
+ json.dump(releases, f, default=datetime.datetime.isoformat)
+
+ for release in releases:
if release["prerelease"]:
continue
published = datetime.datetime.strptime(
@@ -201,6 +230,7 @@ def home_add_releases(context):
),
}
)
+
return context
@staticmethod
@@ -247,12 +277,20 @@ def roadmap_pdeps(context):
"https://api.github.com/search/issues?"
f"q=is:pr is:open label:PDEP repo:{github_repo_url}"
)
- if context["ignore_io_errors"] and resp.status_code == 403:
- return context
- resp.raise_for_status()
+ if resp.status_code == 403:
+ sys.stderr.write("WARN: GitHub API quota exceeded when fetching pdeps\n")
+ resp_bkp = requests.get(context["main"]["production_url"] + "pdeps.json")
+ resp_bkp.raise_for_status()
+ pdeps = resp_bkp.json()
+ else:
+ resp.raise_for_status()
+ pdeps = resp.json()
+
+ with open(pathlib.Path(context["target_path"]) / "pdeps.json", "w") as f:
+ json.dump(pdeps, f)
- for pdep in resp.json()["items"]:
- context["pdeps"]["under_discussion"].append(
+ for pdep in pdeps["items"]:
+ context["pdeps"]["Under discussion"].append(
{"title": pdep["title"], "url": pdep["url"]}
)
@@ -285,7 +323,7 @@ def get_callable(obj_as_str: str) -> object:
return obj
-def get_context(config_fname: str, ignore_io_errors: bool, **kwargs):
+def get_context(config_fname: str, **kwargs):
"""
Load the config yaml as the base context, and enrich it with the
information added by the context preprocessors defined in the file.
@@ -294,7 +332,6 @@ def get_context(config_fname: str, ignore_io_errors: bool, **kwargs):
context = yaml.safe_load(f)
context["source_path"] = os.path.dirname(config_fname)
- context["ignore_io_errors"] = ignore_io_errors
context.update(kwargs)
preprocessors = (
@@ -332,7 +369,9 @@ def extend_base_template(content: str, base_template: str) -> str:
def main(
- source_path: str, target_path: str, base_url: str, ignore_io_errors: bool
+ source_path: str,
+ target_path: str,
+ base_url: str,
) -> int:
"""
Copy every file in the source directory to the target directory.
@@ -346,7 +385,7 @@ def main(
os.makedirs(target_path, exist_ok=True)
sys.stderr.write("Generating context...\n")
- context = get_context(config_fname, ignore_io_errors, base_url=base_url)
+ context = get_context(config_fname, base_url=base_url, target_path=target_path)
sys.stderr.write("Context generated\n")
templates_path = os.path.join(source_path, context["main"]["templates_path"])
@@ -390,15 +429,5 @@ def main(
parser.add_argument(
"--base-url", default="", help="base url where the website is served from"
)
- parser.add_argument(
- "--ignore-io-errors",
- action="store_true",
- help="do not fail if errors happen when fetching "
- "data from http sources, and those fail "
- "(mostly useful to allow github quota errors "
- "when running the script locally)",
- )
args = parser.parse_args()
- sys.exit(
- main(args.source_path, args.target_path, args.base_url, args.ignore_io_errors)
- )
+ sys.exit(main(args.source_path, args.target_path, args.base_url))
Description

This PR contains a variety of updates and maintenance changes to the pandas codebase. The primary focus is dropping support for the pandas 1.4.x branch in CI, moving the GitHub Actions workflows to ubuntu-22.04 runners, updating dependency pins, and fixing various bugs and regressions. The PR also improves the Copy-on-Write (CoW) handling and makes multiple improvements to the CI infrastructure.

Changes
sequenceDiagram
participant User
participant CI
participant PandasCore
participant Web
participant DocSys
    User->>PandasCore: PR removing pandas 1.4.x branch support
CI->>CI: Update GitHub Actions to run on ubuntu-22.04
CI->>CI: Update dependency versions (PyArrow<10)
CI->>CI: Update Dockerfile for development
PandasCore->>PandasCore: Fix bugs in Copy-on-Write implementation
PandasCore->>PandasCore: Fix regression in MultiIndex.join
PandasCore->>PandasCore: Fix regression in Series.replace
PandasCore->>PandasCore: Add rewrite_warning utility
PandasCore->>PandasCore: Improve nested array handling
PandasCore->>PandasCore: Add better support for Generic types in Python 3.11
DocSys->>DocSys: Add whatsnew docs for 1.5.2 and 1.5.3
DocSys->>DocSys: Update contributor documentation
DocSys->>DocSys: Fix broken links and references
DocSys->>DocSys: Enhance Styler documentation
Web->>Web: Update website generation process
Web->>Web: Add failsafe for GitHub API quota limits
Web->>Web: Improve maintainers page
CI->>User: All CI checks pass
This PR contains the following updates:

| Package | Change |
|---|---|
| pandas | `==1.5.1` -> `==1.5.3` |
Release Notes
pandas-dev/pandas (pandas)
v1.5.3: Pandas 1.5.3 (Compare Source)
This is a patch release in the 1.5.x series and includes some regression and bug fixes. We recommend that all users upgrade to this version.
See the full whatsnew for a list of all the changes.
The release will be available on the defaults and conda-forge channels:
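A minimal sketch, assuming the standard channel setup:

    conda install pandas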
Or via PyPI:
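Again a sketch, assuming the standard upgrade command:

    python3 -m pip install --upgrade pandas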
Please report any issues with the release on the pandas issue tracker.
Thanks to all the contributors who made this release possible.
v1.5.2: Pandas 1.5.2 (Compare Source)
This is a patch release in the 1.5.x series and includes some regression and bug fixes. We recommend that all users upgrade to this version.
See the full whatsnew for a list of all the changes.
The release will be available on the defaults and conda-forge channels:
Or via PyPI:
Please report any issues with the release on the pandas issue tracker.
Thanks to all the contributors who made this release possible.
Configuration
📅 Schedule: Branch creation - "* * * * 2-4" (UTC), Automerge - At any time (no schedule defined).
🚦 Automerge: Disabled by config. Please merge this manually once you are satisfied.
♻ Rebasing: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.
🔕 Ignore: Close this PR and you won't be reminded about this update again.
This PR was generated by Mend Renovate. View the repository job log.