Install upstream dev dependencies in nightly CI #17999

TomAugspurger · 2025-02-13T14:06:30Z

This installs dev versions (from source) of dask and distributed in the nightly CI of dask-cudf.

When paired with rapidsai/rapids-dask-dependency#85, we'll have CI for pull requests and merge commits against released versions of Dask, but keep nightly CI runs against upstream dev versions. This ensures we'll still have advance notice of upstream changes that cause CI failures, without disrupting day-to-day activity.

Description

I think we want two things out of our CI, with respect to our dependencies:

CI should be relatively stable: we don't want a change on an upstream dev branch breaking CI for reasons unrelated to the pull request. Call this "insulation from upstream dev changes".
We want to catch upstream behavior changes quickly; certainly before those changes are released. Call this "exposure to upstream dev changes".

The "insulation from" / "exposure to" framing makes clear that these two goals are in tension.

Currently, rapids-dask-dependency specifies a dependency on dask main. When downstream RAPIDS projects (like cuDF) solve their environment, they depend on rapids-dask-dependency and get the transitive dependency on dask main.

This setup gives us complete exposure to upstream dev changes. CI immediately fails, and we scramble to fix things, either upstream in Dask or downstream in RAPIDS libraries. IMO (and it's just that: my opinion), that doesn't strike a good balance between the two goals.

I propose an alternative setup that gives us some of both.

CI runs on pull requests and merges will run against released versions of Dask (and any other dev dependencies we want to test)
Nightly CI will run against dev versions of Dask (and any other dev dependencies we want to test)

Relative to today, we'll have less testing against dask main. But by monitoring nightly builds we still get advance notice of a change that has broken our builds, which can be addressed before a release is made.

Checklist

I am familiar with the Contributing Guidelines.
New or existing tests cover these changes.
The documentation is up to date with these changes.

copy-pr-bot · 2025-02-13T14:06:34Z

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

TomAugspurger · 2025-02-13T14:08:37Z

ci/test_wheel_dask_cudf.sh

@@ -16,6 +16,12 @@ rapids-logger "Install dask_cudf, cudf, pylibcudf, and test requirements"
 # generate constraints (possibly pinning to oldest support versions of dependencies)
 rapids-generate-pip-constraints py_test_dask_cudf ./constraints.txt

+# latest-nightly builds are run against upstream *dev* versions of various packages:
+if [[ "${RAPIDS_DEPENDENCIES}" == "latest" ]] && [[ "${BUILD_TYPE}" == "nightly" ]]; then


RAPIDS_DEPENDENCIES is set here and
BUILD_TYPE is set here

With RAPIDS_DEPENDENCIES="latest", rapids-generate-pip-constraints currently outputs an empty text file: https://github.com/rapidsai/gha-tools/blob/4e9add6b630887217825bbbcd2009742dab271ec/tools/rapids-generate-pip-constraints#L32-L34.
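To make the intent of the conditional concrete, here is a minimal sketch of how the latest-nightly case could select upstream dev requirements. It is not the body of the change in this PR: the helper name `upstream_dev_requirements`, the git URLs, the `main` ref, and the non-nightly `BUILD_TYPE` values used in the loop are all assumptions for illustration.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of this PR): print the extra pip requirements
# for a given CI configuration, one per line.
# The git URLs and the "main" ref are assumptions for illustration.
upstream_dev_requirements() {
    local dependencies="$1"
    local build_type="$2"

    # Only latest-nightly builds opt in to upstream dev versions.
    if [[ "${dependencies}" == "latest" ]] && [[ "${build_type}" == "nightly" ]]; then
        echo "dask @ git+https://github.com/dask/dask.git@main"
        echo "distributed @ git+https://github.com/dask/distributed.git@main"
    fi
}

# Example: show what each (assumed) build type would add on top of the
# released versions. Only the nightly entry prints any requirements.
for build_type in pull-request branch nightly; do
    echo "latest/${build_type}:"
    upstream_dev_requirements "latest" "${build_type}" | sed 's/^/  /'
done
```

The printed lines use pip's `name @ url` direct-reference form, so they could be written to a requirements file and installed after the released packages. Pull request and merge builds print nothing and therefore stay on released versions.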

rjzamora · 2025-02-13T14:51:46Z

This setup gives us complete exposure to upstream dev changes. CI immediately fails, and we scramble to fix things, either upstream in Dask or downstream in RAPIDS libraries. IMO (and it's just that: my opinion), that doesn't strike a good balance between the two goals.

I share your opinion.

Historical Context:

In the past, we had GPU-CI running on all dask and distributed PRs (for the rest of this blurb, I'll use dask to refer to both dask and distributed). These tests usually caught breaking changes before they were merged into dask:main. Therefore, exposing all of RAPIDS CI to dask:main was probably the best way to strike the balance we needed.

Today, we no longer have GPU-CI running in dask and distributed, because RAPIDS no longer supports the underlying Jenkins infrastructure (and the GHA replacement option was not deemed acceptable).

In the past, we also had two clear sources of motivation to stay tightly aligned with dask:main:

We were relying heavily on new upstream features in dask-cudf/dask-cuda (that we were contributing), and the only way for RAPIDS users to benefit was to support the latest version of Dask.
We observed a clear correlation between "divergence from dask:main" and "RAPIDS + Dask maintenance time". In other words, every day we spent pinned to an "old" release of Dask seemed to result in a full developer day (or more) of additional maintenance time.

It is no longer our strategy to rely on upstream contributions for RAPIDS roadmap items. However, it is still true that divergence from dask:main can result in significant maintenance time.

My personal take-away:

Given the historical context, I completely agree that it no longer makes sense to expose all of RAPIDS CI to the latest change in dask:main. However, we must still make a significant effort to avoid any long-term divergence from dask:main (as long as it's possible to do so).

TomAugspurger · 2025-02-13T15:17:49Z

@pentschev points out that when, say, dask-cudf needs to adapt to an upstream change then CI won't automatically run against upstream dev like it did previously, and so the fix won't actually be tested in CI until after it's merged. Which isn't great.

xarray has a test-upstream job: https://github.com/pydata/xarray/blob/3dafcf9a29b82930828ca0bd8ff0a038b4affcd3/.github/workflows/upstream-dev-ci.yaml#L37. By including test-upstream in a commit message (or some other trigger), the upstream-dev job is run on that PR.

I'll see if we can do something similar here.
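As a rough illustration of the commit-message trigger idea, the sketch below checks a message for a keyword and decides whether to run the upstream-dev job. It is not how xarray or RAPIDS implement this: the function name `wants_upstream_tests` and the job labels are hypothetical, and the keyword defaults to the `test-upstream` string mentioned above.

```shell
#!/usr/bin/env bash

# Hypothetical check: succeed when the commit message contains the trigger
# keyword (default "test-upstream"), fail otherwise.
wants_upstream_tests() {
    local commit_message="$1"
    local keyword="${2:-test-upstream}"

    # The quoted keyword is matched literally, anywhere in the message.
    [[ "${commit_message}" == *"${keyword}"* ]]
}

# Example with a hard-coded message; in a real job the message could come
# from something like `git log -1 --pretty=%B`.
if wants_upstream_tests "Adapt to upstream change [test-upstream]"; then
    echo "run upstream-dev job"
else
    echo "skip upstream-dev job"
fi
```

With a trigger like this, a PR that adapts to an upstream change could opt in to testing against upstream dev before it is merged, while other PRs keep running against released versions.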

TomAugspurger · 2025-02-20T20:27:15Z

Closing this in favor of running some cudf tests in dask-upstream-testing. See rapidsai/rapids-dask-dependency#85 (comment) for a summary.

github-actions bot assigned TomAugspurger Feb 13, 2025

TomAugspurger mentioned this pull request Feb 13, 2025

Depend on released version of Dask rapidsai/rapids-dask-dependency#85

Merged

TomAugspurger closed this Feb 20, 2025

TomAugspurger deleted the tom/nightly-upstream branch February 20, 2025 20:27
