
feat: add stable install tests #837

Open
gforsyth wants to merge 21 commits into rapidsai:main from gforsyth:stable_install_testing

Conversation

Contributor

@gforsyth gforsyth commented Apr 7, 2026

This is a first pass at rapidsai/build-planning#227.

It adds nightly tests that try to install the latest stable version of RAPIDS on all supported Python and CUDA versions, on amd64 and arm64, with both pip and conda.

The tests are currently limited to:

  • does it install successfully (and, for pip, can we install the available packages from upstream PyPI)
  • can we import the installed libraries without any symbol lookup errors
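The import check can be this simple because unresolved symbols surface at import time. A minimal sketch of the loop (using stand-in standard-library module names, since the real scripts iterate over the installed RAPIDS libraries):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Stand-in module names; the actual scripts loop over the RAPIDS libraries
# (cudf, cuml, cugraph, ...). A bare import in a fresh interpreter is enough
# to surface "undefined symbol" / missing-shared-library errors.
for lib in json ssl sqlite3; do
    python3 -c "import ${lib}"
    echo "${lib}: import OK"
done
```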

This should wait until after the 26.04 release before merging.

@gforsyth gforsyth requested a review from a team as a code owner April 7, 2026 20:40
@gforsyth gforsyth requested a review from bdice April 7, 2026 20:40
Member

@jameslamb jameslamb left a comment


Thanks for doing this!

I nosed in and left a couple comments for your consideration... I've spent the last couple weeks deep in RAPIDS' wheel-testing setup (e.g. for rapidsai/build-planning#256), and that informed some of what I was looking for here.

Overall I do really like the structure! Especially running the jobs in parallel and using the nvidia/cuda:*-base-* image for wheels.

And I agree that "install everything + import the libraries" is a good target for this first pass.


set -euo pipefail

STABLE_RAPIDS_VERSION="26.4.*"

Contributor Author


Done!


STABLE_RAPIDS_VERSION="26.4.*"
SUPPORTED_PYTHON_VERSIONS=(3.11 3.12 3.13 3.14)
SUPPORTED_CUDA_VERSIONS=("cu12" "cu13")
Member


Would you consider also having this test the bounds (12.2, 12.9, 13.0, 13.1)?

That could be done by adding a requirement like this to the pip install calls:

cuda-toolkit[all]==${cuda_major_minor}.*

That'd be a really helpful extension of rapidsai/build-planning#256, and I think it'd help us catch conflicts that aren't easily caught in individual projects' CI.

"cuvs-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}"
--extra-index-url=https://pypi.nvidia.com
)

Member


Similar to my comments on conda, I think this would be a more powerful test if it combined all the imports into one.

I'd structure it roughly like this:

# get all the pypi.nvidia.com stuff
wheels_dir=$(mktemp -d)
pip download \
  --isolated \
  --index-url https://pypi.nvidia.com \
  --prefer-binary \
  --no-deps \
  -d "${wheels_dir}" \
  "cucim-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
  "cugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
  "cuvs-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
  "cuxfilter-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
  "libcugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
  "libcuvs-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
  "pylibcugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}"

pip install \
  --isolated \
  --index-url https://pypi.org/simple \
  --prefer-binary \
  "${PIP_INSTALL_PYPI[@]}" \
  "${wheels_dir}"/*.whl

python -c "import cudf; import dask_cudf; ...; import cuvs"

Would you consider something like that?

container_image: "nvidia/cuda:12.9.1-base-ubuntu-24.04"
script: |
  ./ci/stable_install/install_and_test_pip.sh --cuda cu12
test-stable-install-pip-cuda-13-amd64:
Member


We could cut the number of jobs here in half by having matrix elements inside of each of these for amd64 and arm64. Like this: https://github.com/rapidsai/cuvs/blob/cbb9db5697eeebcb03a6ed198b7d4386ce14a301/.github/workflows/pr.yaml#L365-L374

Would you consider that?

Member


That language was a little imprecise... cut the number of configurations in half. The number of jobs would be unchanged.

gforsyth and others added 2 commits April 8, 2026 09:39
Comment on lines +6 to +21
function testImports {
    unset imports
    while [[ $# -gt 0 ]]; do
        # run standalone import test
        rapids-logger "Standalone import test for $1"
        python -c "import $1" || rapids-logger "Test failed for: $1"
        rapids-logger "Passed"
        # add import to array for combined import test before shifting
        imports+=("$1")
        shift
    done
    import_cmd=$(printf "import %s; " "${imports[@]}")
    rapids-logger "Combined import test for: ${imports[*]}"
    python -c "${import_cmd}" || rapids-logger "Test failed for: ${imports[*]}"
    rapids-logger "Passed"
}
Contributor Author


This does enough now that I broke it out into a separate script so it can be sourced by both test scripts.

Imports each library individually in separate Python sessions, then imports all of them sequentially in the same Python session.

Member


I like it a lot!!!

Comment on lines +84 to +98
WHEELS_DIR=$(mktemp -d)
pip download \
    --isolated \
    --index-url https://pypi.nvidia.com \
    --prefer-binary \
    --no-deps \
    -d "${WHEELS_DIR}" \
    "cucim-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "cugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "cuvs-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "cuxfilter-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "libcugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "libcuvs-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "nx-cugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "pylibcugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}"
Contributor Author


This does an unnecessary double-download for each CUDA major version, but avoiding it would require annoying and error-prone state tracking.

Member


Thanks for calling it out. I think that's totally fine.

Member

@jameslamb jameslamb left a comment


Thanks for the import changes, I think we'll be really happy to have those!

I left a couple more suggestions, and one that I think is worth blocking the PR over.

I'd love to also see actual runs of these jobs. Since this was opened from your fork, we won't be able to trigger test.yaml manually with workflow dispatch... could you temporarily add these jobs to pr.yaml so we can see them in PR CI? That could be reverted before this is merged.

set -euo pipefail

function testImports {
unset imports
Member


Suggested change
-    unset imports
+    local -a imports=()

Would something like this be slightly safer than unset? I think (untested) this would create a function-scoped imports on each call.
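The difference is easy to check in isolation. A quick standalone sketch (not part of the PR's scripts) showing that `local -a` gives each call its own array, while the outer variable of the same name survives untouched:

```shell
#!/usr/bin/env bash
set -euo pipefail

imports=("leftover" "state")   # caller-level array of the same name

with_local() {
    local -a imports=()        # function-scoped: shadows the outer array
    imports+=("cudf")
    echo "inside:  ${imports[*]}"
}

with_local                      # prints "inside:  cudf"
echo "outside: ${imports[*]}"   # prints "outside: leftover state"
```

An `unset imports` in the function body would instead destroy the caller-level array, which is the subtle difference the suggestion avoids.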


while [[ $# -gt 0 ]]; do
    # run standalone import test
    rapids-logger "Standalone import test for $1"
    python -c "import $1" || rapids-logger "Test failed for: $1"
Member


I don't think this will fail the CI jobs when the imports fail, and I think it should, right? I think we want a big ❌ and a job failure, not for us to need to remember to go read the logs.

This || is going to swallow the exit code of python -c "import $1": the whole command list exits 0 because the logging statement succeeds.

And I don't see any other logic that's doing something like "grep for Test failed and exit 1 if you find any" in the calling scripts.

I recommend hard-coding in like python -c "import some_nonsense" here and checking that the CI job fails when we want.
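The swallowed exit code is easy to reproduce outside CI. A standalone sketch (not the PR's script) contrasting the || pattern with one way of propagating the failure:

```shell
#!/usr/bin/env bash
set -euo pipefail

# The reviewed pattern: the OR-list succeeds because the logger succeeds,
# so `set -e` (and the CI job) never sees the failing command.
false || echo "Test failed (logged)"
echo "after OR-list, exit status is $?"      # prints 0

# One way to keep the log line AND surface the failure:
run_import() {
    if ! false; then
        echo "Test failed (logged)" >&2
        return 1
    fi
}
if run_import; then
    echo "unexpected success"
else
    echo "failure propagated with status $?"  # prints 1
fi
```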

Contributor Author


Yep, you're right. For now I'm going to remove the || and let the errors percolate up. I can put up a follow-up at some point that sticks all the output in a named pipe or something and then greps over all of it for errors.


@gforsyth gforsyth force-pushed the stable_install_testing branch from 237840e to 1597c63 Compare April 13, 2026 16:00
@gforsyth
Contributor Author

Ahh, ok, so shared-workflows definitely assumes we're running a RAPIDS image: rapids-github-info requires that git is installed, and the sccache setup step assumes that curl and jq are installed.

Even the beefy 5 GB+ cuda-devel images don't have those dependencies installed.

I think I'm going to open a PR to add a "bootstrap script" option to custom_job, that can execute quick install commands before the rest of the shared-workflow steps get kicked off.

@gforsyth
Contributor Author

Even with the bootstrap script this doesn't work.
actions/checkout runs before any script is executed, and because the nvidia/cuda images don't have git installed, checkout falls back to downloading a tarball snapshot, so we don't end up with a proper git clone.

I genuinely didn't think that creating our own version of nvidia/cuda that also includes curl, git, and jq would be the best path forward, but I'm starting to lean in that direction.

@gforsyth gforsyth force-pushed the stable_install_testing branch from c3455ee to 0e78bc0 Compare April 13, 2026 19:20
@gforsyth gforsyth force-pushed the stable_install_testing branch from b3a477a to 0d88656 Compare April 14, 2026 16:14
@gforsyth
Contributor Author

Met with the cucim team and we noted that cucim links against the CUDA driver (will investigate why). Because of this, we see this import error for cucim:

>>> import cucim
dlopen error libcuda.so.1: cannot open shared object file: No such file or directory
 missing cuda symbols while dynamic loading
 cuFile initialization failed

This only occurs on CPU runners because libcuda.so.1 isn't available.
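A CPU-runner guard along these lines could sidestep this. A hypothetical sketch, not part of the PR: it checks the dynamic-linker cache for the driver library before running driver-linked imports.

```shell
#!/usr/bin/env bash
set -eu

# Hypothetical guard: libcuda.so.1 is shipped by the CUDA *driver*, not the
# toolkit, so it is absent on CPU-only runners. Check the linker cache and
# skip driver-linked imports (e.g. cucim) when it is missing.
if ldconfig -p 2>/dev/null | grep -q 'libcuda\.so\.1'; then
    echo "libcuda.so.1 found: running driver-linked import tests"
else
    echo "libcuda.so.1 not found: skipping driver-linked imports (e.g. cucim)"
fi
```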
