Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install upstream dev dependencies in nightly CI #17999

Conversation

TomAugspurger
Copy link
Contributor

This installs dev versions (from source) of dask and distributed in the nightly CI of dask-cudf.

When paired with rapidsai/rapids-dask-dependency#85, we'll have CI for pull requests and merge commits against released versions of Dask, but keep nightly CI runs against upstream dev versions. This ensures we'll still have advanced notice of upstream changes that cause CI failures, without disrupting day-to-day activity.

Description

I think we we want two things out of our CI, with respect to our dependencies:

  1. CI should be relatively stable: we don't want a change on an upstream dev branch breaking CI for reasons unrelated to the pull request. Call this "insulation from upstream dev changes".
  2. We want to catch upstream behavior changes quickly; cerrtainly before those changes are released. Call this "exposure to upstream dev changes".

The "insulation from" / "exposure to" framing makes clear that these two goals are in tension.

Currently, rapids-dask-dependency specifies a dependency on dask main. When downstream RAPIDS projects (like cuDF) solve their environment, they depend on rapids-dask-dependency and get the transitive dependency on dask main.

This setup gives us complete exposure to upstream dev changes. CI immediately fails, and we scramble to fix things, either upstream in Dask or downstream in RAPIDS libraries. IMO (and it's just that: my opinion), that doesn't strike a good balance between the two goals.

I propose an alternative setup that gives us some of both.

  1. CI runs on pull requests and merges will run against released versions of Dask (and any other dev dependencies we want to test)
  2. Nightly CI will run against dev versions of Dask (and any other dev dependencies we want to test)

Relative to today, we'll have less testing against dask main. But by monitoring nightly builds we still get advanced notice of a change that has broken our builds, which can be addressed before a release is made.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

This installs dev versions (from source) of dask and distributed in the
nightly CI of dask-cudf.

When paired with rapidsai/rapids-dask-dependency#85, we'll
have CI for pull requests and merge commits against released versions of Dask,
but keep nightly CI runs against upstream dev versions. This ensures we'll
still have advanced notice of upstream changes that cause CI failures, without
disrupting day-to-day activity.
Copy link

copy-pr-bot bot commented Feb 13, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@@ -16,6 +16,12 @@ rapids-logger "Install dask_cudf, cudf, pylibcudf, and test requirements"
# generate constraints (possibly pinning to oldest support versions of dependencies)
rapids-generate-pip-constraints py_test_dask_cudf ./constraints.txt

# latest-nightly builds are run against upstream *dev* versions of various packages:
if [[ "${RAPIDS_DEPENDENCIES}" == "latest" ]] && [[ "${BUILD_TYPE}" == "nightly" ]]; then
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RAPIDS_DEPENDENCIES is set here and
BUILD_TYPE is set here

With RAPIDS_DEPENDENCIES="latest", rapids-generate-pip-constraints currently outputs an empty text file: https://github.com/rapidsai/gha-tools/blob/4e9add6b630887217825bbbcd2009742dab271ec/tools/rapids-generate-pip-constraints#L32-L34.

@rjzamora
Copy link
Member

This setup gives us complete exposure to upstream dev changes. CI immediately fails, and we scramble to fix things, either upstream in Dask or downstream in RAPIDS libraries. IMO (and it's just that: my opinion), that doesn't strike a good balance between the two goals.

I share your opinion.

Historical Context:

In the past, we had GPU-CI running on all dask and distributed PRs (for the rest of this blurb, I'll use dask to refer to both dask and distributed). These tests usually caught breaking changes before they were merged into dask:main. Therefore, exposing all of RAPIDS CI to dask:main was probably the best way to strike the balance we needed.

We no longer have GPU-CI running in dask and distributed in todays world, because RAPIDS no longer supports the underlying Jenkins infrastructure (and the GHA replacement option was not deemed acceptable).

In the past, we also had two clear sources of motivation to stay tightly aligned with dask:main:

  1. We were relying heavily on new upstream features in dask-cudf/dask-cuda (that we were contributing), and the only way for RAPIDS users to benefit was to support the latest version of Dask.
  2. We observed a clear correlation between "divergence from dask:main" and "RAPIDS + Dask maintenance time". In other words, every day we spent pinned to an "old" release of Dask seemed to result in a full developer day (or more) of additional maintenance time).

It is no longer our strategy to rely on upstream contributions for RAPIDS roadmap items. However, it is still true that divergence from dask:main can result in significant maintenance time.

My personal take-away:

Given the historical context, I completely agree that it no longer makes sense to expose all of RAPIDS CI to the latest change in dask:main, but that we must make a significant effort to avoid any long-term divergence from dask:main (as long as it's possible to do so).

@TomAugspurger
Copy link
Contributor Author

@pentschev points out that when, say, dask-cudf needs to adapt to an upstream change then CI won't automatically run against upstream dev like it did previously, and so the fix won't actually be tested in CI until after it's merged. Which isn't great.

xarary has a test-upstream job: https://github.com/pydata/xarray/blob/3dafcf9a29b82930828ca0bd8ff0a038b4affcd3/.github/workflows/upstream-dev-ci.yaml#L37. By including test-upstream in a commit message (or some other trigger), the upstream-dev job is run on that PR.

I'll see if we can do something similar here.

@TomAugspurger
Copy link
Contributor Author

Closing this in favor of running some cudf tests in dask-upstream-testing. See rapidsai/rapids-dask-dependency#85 (comment) for a summary.

@TomAugspurger TomAugspurger deleted the tom/nightly-upstream branch February 20, 2025 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants