Use rapids-pip-retry in CI jobs that might need retries (#125)
Uses a retry wrapper for `pip` commands to reduce CI failures caused by
hash mismatches that result from network hiccups.

xref rapidsai/build-planning#148

This will retry failures that show up in CI like:

```
   Collecting nvidia-cublas-cu12 (from libraft-cu12==25.2.*,>=0.0.0a0)
    Downloading https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl (604.9 MB)
       ━━━━━━━━━━━━━━━━━━━━━                 350.2/604.9 MB 229.2 MB/s eta 0:00:02
  ERROR: THESE PACKAGES DO NOT MATCH THE HASHES FROM THE REQUIREMENTS FILE. If you have updated the package versions, please update the hashes. Otherwise, examine the package contents carefully; someone may have tampered with them.
      nvidia-cublas-cu12 from https://pypi.nvidia.com/nvidia-cublas-cu12/nvidia_cublas_cu12-12.8.3.14-py3-none-manylinux_2_27_aarch64.whl#sha256=93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3 (from libraft-cu12==25.2.*,>=0.0.0a0):
          Expected sha256 93a4e0e386cc7f6e56c822531396de8170ed17068a1e18f987574895044cd8c3
               Got        849c88d155cb4b4a3fdfebff9270fb367c58370b4243a2bdbcb1b9e7e940b7be
```
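
As a rough illustration of the idea (a minimal sketch, not the actual `rapids-pip-retry` implementation, whose internals are not part of this commit), a wrapper could rerun a command only when its output contains the hash-mismatch message shown above. The function name `retry_on_hash_mismatch` and the attempt limit are hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical sketch only: this is NOT the source of rapids-pip-retry.
# Usage: retry_on_hash_mismatch MAX_ATTEMPTS CMD [ARGS...]
retry_on_hash_mismatch() {
  local max_attempts="$1"
  shift
  local attempt status log
  log="$(mktemp)"
  for (( attempt = 1; attempt <= max_attempts; attempt++ )); do
    status=0
    # Capture stdout and stderr so the output can be inspected afterwards.
    "$@" >"${log}" 2>&1 || status=$?
    cat "${log}"
    # Success: nothing to retry.
    if (( status == 0 )); then
      break
    fi
    # Any failure other than a hash mismatch is returned immediately.
    if ! grep -q "DO NOT MATCH THE HASHES" "${log}"; then
      break
    fi
    if (( attempt < max_attempts )); then
      echo "attempt ${attempt}/${max_attempts} hit a hash mismatch; retrying" >&2
    fi
  done
  rm -f "${log}"
  return "${status}"
}

# Example call (assumed, not run here):
#   retry_on_hash_mismatch 3 python -m pip install ./dist/some_wheel.whl
```

Matching on the error text keeps genuine failures, such as a missing package or a build error, from being retried and hidden.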

This PR also updates the build images used for `ci-wheel` to use CUDA
12.8.0 (we no longer maintain CUDA 12.0.1 build images) and uses NVKS L4
GPUs for amd64 testing, to help move us off of the old CI cluster.

---------

Co-authored-by: Bradley Dice <[email protected]>
gforsyth and bdice authored Feb 25, 2025
1 parent 68e63ae commit 5e37811
Showing 3 changed files with 11 additions and 11 deletions.
18 changes: 9 additions & 9 deletions .github/actions/compute-matrix/action.yaml
@@ -16,18 +16,18 @@ runs:
 set -eo pipefail
 export BUILD_MATRIX="
-- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.10', LINUX_VER: 'rockylinux8' }
-- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.11', LINUX_VER: 'rockylinux8' }
-- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.12', LINUX_VER: 'rockylinux8' }
-- { CUDA_VER: '12.0.1', ARCH: 'arm64', PY_VER: '3.10', LINUX_VER: 'rockylinux8' }
-- { CUDA_VER: '12.0.1', ARCH: 'arm64', PY_VER: '3.11', LINUX_VER: 'rockylinux8' }
-- { CUDA_VER: '12.0.1', ARCH: 'arm64', PY_VER: '3.12', LINUX_VER: 'rockylinux8' }
+- { CUDA_VER: '12.8.0', ARCH: 'amd64', PY_VER: '3.10', LINUX_VER: 'rockylinux8' }
+- { CUDA_VER: '12.8.0', ARCH: 'amd64', PY_VER: '3.11', LINUX_VER: 'rockylinux8' }
+- { CUDA_VER: '12.8.0', ARCH: 'amd64', PY_VER: '3.12', LINUX_VER: 'rockylinux8' }
+- { CUDA_VER: '12.8.0', ARCH: 'arm64', PY_VER: '3.10', LINUX_VER: 'rockylinux8' }
+- { CUDA_VER: '12.8.0', ARCH: 'arm64', PY_VER: '3.11', LINUX_VER: 'rockylinux8' }
+- { CUDA_VER: '12.8.0', ARCH: 'arm64', PY_VER: '3.12', LINUX_VER: 'rockylinux8' }
 "
 export TEST_MATRIX="
-- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.10', LINUX_VER: 'ubuntu20.04', gpu: 'v100', driver: 'latest' }
-- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.11', LINUX_VER: 'ubuntu20.04', gpu: 'v100', driver: 'latest' }
-- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.12', LINUX_VER: 'ubuntu20.04', gpu: 'v100', driver: 'latest' }
+- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.10', LINUX_VER: 'ubuntu20.04', gpu: 'l4', driver: 'latest' }
+- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.11', LINUX_VER: 'ubuntu20.04', gpu: 'l4', driver: 'latest' }
+- { CUDA_VER: '12.0.1', ARCH: 'amd64', PY_VER: '3.12', LINUX_VER: 'ubuntu20.04', gpu: 'l4', driver: 'latest' }
 - { CUDA_VER: '12.0.1', ARCH: 'arm64', PY_VER: '3.10', LINUX_VER: 'ubuntu20.04', gpu: 'a100', driver: 'latest' }
 - { CUDA_VER: '12.0.1', ARCH: 'arm64', PY_VER: '3.11', LINUX_VER: 'ubuntu20.04', gpu: 'a100', driver: 'latest' }
 - { CUDA_VER: '12.0.1', ARCH: 'arm64', PY_VER: '3.12', LINUX_VER: 'ubuntu20.04', gpu: 'a100', driver: 'latest' }
2 changes: 1 addition & 1 deletion ci/build_wheel.sh
@@ -14,7 +14,7 @@ sccache --zero-stats

 rapids-logger "Build wheel"
 mkdir -p ./dist
-python -m pip wheel . --wheel-dir=./dist -v --disable-pip-version-check --no-deps
+rapids-pip-retry wheel . --wheel-dir=./dist -v --disable-pip-version-check --no-deps

 sccache --show-adv-stats

2 changes: 1 addition & 1 deletion ci/test_wheel.sh
@@ -11,7 +11,7 @@ RAPIDS_TESTS_DIR=${RAPIDS_TESTS_DIR:-"${PWD}/test-results"}/
 mkdir -p "${RAPIDS_TESTS_DIR}"

 rapids-logger "Install wheel"
-python -m pip install "$(echo ./dist/pynvjitlink_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]"
+rapids-pip-retry install "$(echo ./dist/pynvjitlink_"${RAPIDS_PY_CUDA_SUFFIX}"*.whl)[test]"

 rapids-logger "Build Tests"
 pushd test_binary_generation
