Conversation
jameslamb
left a comment
Thanks for doing this!
I nosed in and left a couple comments for your consideration... I've spent the last couple weeks deep in RAPIDS' wheel-testing setup (e.g. for rapidsai/build-planning#256), and that informed some of what I was looking for here.
Overall I do really like the structure! Especially running the jobs in parallel and using the nvidia/cuda:*-base-* image for wheels.
And I agree that "install everything + import the libraries" is a good target for this first pass.
```shell
set -euo pipefail

STABLE_RAPIDS_VERSION="26.4.*"
```
https://github.com/rapidsai/integration/blob/main/ci/release/update-version.sh should be updated to modify this
```shell
STABLE_RAPIDS_VERSION="26.4.*"
SUPPORTED_PYTHON_VERSIONS=(3.11 3.12 3.13 3.14)
SUPPORTED_CUDA_VERSIONS=("cu12" "cu13")
```
Would you consider also having this test the bounds (12.2, 12.9, 13.0, 13.1)?
That could be done by adding a requirement like this to the `pip install` calls:

```shell
cuda-toolkit[all]==${cuda_major_minor}.*
```
That'd be a really helpful extension of rapidsai/build-planning#256, and I think it'd help us catch conflicts that aren't easily caught in individual projects' CI.
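For illustration, a rough sketch of how that pin could be folded into the existing version loops. The bound values come from the comment above; the `CUDA_BOUNDS` mapping, the `cudf` package choice, and the loop structure are assumptions for the sketch, not the actual script (it only echoes the commands it would run):

```shell
#!/usr/bin/env bash
# Sketch: exercise the min/max CUDA minor versions for each CUDA major.
set -euo pipefail

STABLE_RAPIDS_VERSION="26.4.*"

# hypothetical mapping from wheel suffix to the CUDA bounds worth testing
declare -A CUDA_BOUNDS=(
  [cu12]="12.2 12.9"
  [cu13]="13.0 13.1"
)

for CUDA_SUFFIX in cu12 cu13; do
  for cuda_major_minor in ${CUDA_BOUNDS[$CUDA_SUFFIX]}; do
    # echo instead of running pip, since this is only a sketch
    echo "pip install \"cudf-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}\" \"cuda-toolkit[all]==${cuda_major_minor}.*\""
  done
done
```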
```shell
  "cuvs-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}"
  --extra-index-url=https://pypi.nvidia.com
)
```
Similar to my comments on conda, I think this would be a more powerful test if it combined all the imports into one.
I'd structure it roughly like this:
```shell
# get all the pypi.nvidia.com stuff
wheels_dir=$(mktemp -d)
pip download \
    --isolated \
    --index-url https://pypi.nvidia.com \
    --prefer-binary \
    --no-deps \
    -d "${wheels_dir}" \
    "cucim-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "cugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "cuvs-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "cuxfilter-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "libcugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "libcuvs-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "pylibcugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}"

pip install \
    --isolated \
    --index-url https://pypi.org/simple \
    --prefer-binary \
    "${PIP_INSTALL_PYPI[@]}" \
    "${wheels_dir}"/*.whl

python -c "import cudf; import dask_cudf; ...; import cuvs"
```

Would you consider something like that?
.github/workflows/test.yaml (outdated diff):

```yaml
container_image: "nvidia/cuda:12.9.1-base-ubuntu-24.04"
script: |
  ./ci/stable_install/install_and_test_pip.sh --cuda cu12

test-stable-install-pip-cuda-13-amd64:
```
We could cut the number of jobs here in half by having matrix elements inside of each of these for amd64 and arm64. Like this: https://github.com/rapidsai/cuvs/blob/cbb9db5697eeebcb03a6ed198b7d4386ce14a301/.github/workflows/pr.yaml#L365-L374
Will you consider that?
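For reference, a rough sketch of what that matrix shape could look like here. The job name, runner labels, and step layout are assumptions patterned after the linked cuvs workflow, not the actual workflow file:

```yaml
# Sketch: one job definition covering both architectures via a matrix
test-stable-install-pip-cuda-12:
  strategy:
    matrix:
      arch: [amd64, arm64]
  runs-on: "linux-${{ matrix.arch }}-cpu4"   # hypothetical runner label
  container:
    image: "nvidia/cuda:12.9.1-base-ubuntu-24.04"
  steps:
    - uses: actions/checkout@v4
    - run: ./ci/stable_install/install_and_test_pip.sh --cuda cu12
```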
That language was a little imprecise... cut the number of configurations in half. The number of jobs would be unchanged.
Co-authored-by: James Lamb <jaylamb20@gmail.com>
```shell
function testImports {
    unset imports
    while [[ $# -gt 0 ]]; do
        # run standalone import test
        rapids-logger "Standalone import test for $1"
        python -c "import $1" || rapids-logger "Test failed for: $1"
        rapids-logger "Passed"
        # add import to array for combined import test before shifting
        imports+=("$1")
        shift
    done
    import_cmd=$(printf "import %s; " "${imports[@]}")
    rapids-logger "Combined import test for: ${imports[*]}"
    python -c "${import_cmd}" || rapids-logger "Test failed for: ${imports[*]}"
    rapids-logger "Passed"
}
```
This does enough now that I broke it out into a separate script so it can be sourced by both test scripts.
Imports each library individually in separate Python sessions, then imports all of them sequentially in the same Python session.
```shell
WHEELS_DIR=$(mktemp -d)
pip download \
    --isolated \
    --index-url https://pypi.nvidia.com \
    --prefer-binary \
    --no-deps \
    -d "${WHEELS_DIR}" \
    "cucim-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "cugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "cuvs-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "cuxfilter-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "libcugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "libcuvs-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "nx-cugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}" \
    "pylibcugraph-${CUDA_SUFFIX}==${STABLE_RAPIDS_VERSION}"
```
This does an unnecessary double-download for each CUDA major version, but avoiding it requires annoying and error-prone state-tracking.
Thanks for calling it out. I think that's totally fine.
jameslamb
left a comment
Thanks for the import changes, I think we'll be really happy to have those!
I left a couple more suggestions, and one that I think is worth blocking the PR over.
I'd love to also see actual runs of these jobs. Since this was opened from your fork, we won't be able to trigger test.yaml manually with workflow dispatch... could you temporarily add these jobs to pr.yaml so we can see them in PR CI? That could be reverted before this is merged.
ci/stable_install/test_imports.sh (outdated diff):

```shell
set -euo pipefail

function testImports {
    unset imports
```
Suggested change:

```diff
-    unset imports
+    local -a imports=()
```

Would something like this be slightly safer than `unset`? I think (untested) this would create a function-scoped `imports` on each call.
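To make the difference concrete, a standalone demo of the `local -a` pattern (the `collect` function is a made-up stand-in for `testImports`, not the actual script): with `local -a`, each call gets its own fresh array, so nothing leaks between calls or into the caller's scope.

```shell
#!/usr/bin/env bash
set -euo pipefail

function collect {
    local -a imports=()        # function-scoped; reset on every call
    while [[ $# -gt 0 ]]; do
        imports+=("$1")
        shift
    done
    printf "import %s; " "${imports[@]}"
}

collect cudf dask_cudf   # prints: import cudf; import dask_cudf;
echo
collect cuvs             # prints: import cuvs;  (nothing left over from the first call)
echo
```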
ci/stable_install/test_imports.sh (outdated diff):

```shell
    while [[ $# -gt 0 ]]; do
        # run standalone import test
        rapids-logger "Standalone import test for $1"
        python -c "import $1" || rapids-logger "Test failed for: $1"
```
I don't think this will fail the CI jobs when the imports fail, and I think it should, right? I think we want a big ❌ and a job failure, not for us to need to remember to go read the logs.

This `||` is going to swallow the exit code of `python -c "import $1"`: the compound command will exit 0 because that logging statement will succeed.

And I don't see any other logic that's doing something like "grep for `Test failed` and exit 1 if you find any" in the calling scripts.

I recommend hard-coding in something like `python -c "import some_nonsense"` here and checking that the CI job fails when we want.
Yep, you're right. For now I'm going to remove the || and let the errors percolate up. I can put up a follow-up at some point that sticks all the output in a named pipe or something and then greps over all the output for errors
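For illustration, a standalone sketch of the swallowed-exit-code behavior being discussed (`log` is a made-up stand-in for `rapids-logger`, and `false` stands in for a failing `python -c "import $1"`; none of this is the actual CI script):

```shell
#!/usr/bin/env bash
set -euo pipefail

log() { echo "LOG: $*"; }   # stand-in for rapids-logger

# Under `set -e`, the `||` branch makes the whole list exit 0,
# so execution continues instead of failing the job.
false || log "Test failed"
echo "script kept going"

# Without the `||`, `set -e` would abort right here and the job would go red:
# false
```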
Force-pushed from 237840e to 1597c63.
Ahh, ok, so even the beefy 5 GB+ `cuda-devel` images don't have those dependencies installed. I think I'm going to open a PR to add a "bootstrap script" option to …
Force-pushed from 8e1b963 to a4c30cb.
Even with the bootstrap script this doesn't work. I genuinely didn't think that creating our own version of …
Force-pushed from c3455ee to 0e78bc0.
Force-pushed from b3a477a to 0d88656.
… wrong nvjitlink version
Met with the following:

```
>>> import cucim
dlopen error libcuda.so.1: cannot open shared object file: No such file or directory
missing cuda symbols while dynamic loading
cuFile initialization failed
```

This only occurs on CPU runners because …
This is a first pass at rapidsai/build-planning#227.
It adds nightly tests that try to install the latest stable version of RAPIDS on all supported Python and CUDA versions, on amd64 and arm64, with both `pip` and `conda`. The tests are currently limited to:

This should wait until after the 26.04 release before merging.