Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add downstream tests #6

Closed

Conversation

TomAugspurger
Copy link
Contributor

This adds some very rudimentary downstream tests to dask-upstream-testing.

With rapidsai/rapids-dask-dependency#85, we won't have immediate feedback on changes to dask main that affect downstream libraries. By including some downstream tests here, we'll have an environment that uses nightly downstream with dask main.

At the moment, it just attempts to import some RAPIDS libraries that use dask. We can expand that as much as we want, but we should be mindful of CI time.

rapidsai/rapids-dask-dependency#85 changes the
dask dependency to use released versions of Dask. To catch prominant
breakage in downstream libraries (like dask-cudf) with dask main, we'll
add some novel tests here that run downstream (cudf) nightly against
upstream (dask) `main`.


def test_import_dask_cudf():
import dask_cudf # noqa: F401

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we only want to test import rapids_package? If we are planning to move away from testing @main in branches of all repos like in rapidsai/rapids-dask-dependency#85 I feel we should be running all dask related pytests of cudf, cuml & dask_cuda in this repo to be able to catch real dask upstream related failures rather than just ImportError's.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had a chat with @rjzamora on this topic, but I think that's the big outstanding question here. I think we want two things:

  1. Regular day-to-day workflow shouldn't be vulnerable to upstream changes on main breaking CI
  2. Regressions should be caught early by running tests against main, otherwise CI (and potentially our users' code) will break when the upstream package is released

These two goals are somewhat in tension.

Other projects I've worked on have managed this by having multiple environments, one of which used dev versions of relevant upstream packages. The upstream environment was allowed to fail but would be monitored for breakage. I'm not sure if that's a realistic option for libraries like cudf.

Assuming that's not an option, we can try to get some exposure to dev upstream versions here in dask-upstream-testing. We can do that either by

  1. writing our own tests here (which I've started). But that's code to maintain, and wouldn't get as thorough test coverage.
  2. trying to run the downstream tests in an environment with upstream dev. That gets a little complicated (we're installing downstream libraries from wheels rather than source so we don't have tests unless they're shipped with the built distribution) but it's possible with a bit of work.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2. trying to run the downstream tests in an environment with upstream dev. That gets a little complicated (we're installing downstream libraries from wheels rather than source so we don't have tests unless they're shipped with the built distribution) but it's possible with a bit of work.

I'm +1 to this approach. We can git clone the repositories and run the pytests with dask@main & latest nightly version of rapids_library. Until that happens I don't think we should be merging rapidsai/rapids-dask-dependency#85

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if we could add an upstream_dev branch to rapids-dask-dependency, and GPU-CI could add a new (optional) matrix item to build/tests against this in the necessary downstream repos (dask-cudf/dask-cuda/cuml/raft)? Otherwise, we will indeed need to find a way to include down-stream tests within dask-upstream-testing.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have test.yaml that runs nightly: https://github.com/rapidsai/cudf/actions/workflows/test.yaml

We could just modify each of the repo's jobs to run nightly tests with main dask and achieve what we want + not increase GPU-CI usage by any bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One note: https://github.com/rapidsai/dask-upstream-testing/actions/runs/13292220048 did pick up the failure that spurred all this because dask/dask does have some tests that exercise dask_cudf.


IMO, if the downstream libraries already have nightly runs then it'd be simpler to ensure that nightly environment tests against dask main. Then we won't have to mess with cloning source repositories and syncing commits up with the installed version, installing test dependencies, and picking out the right subset of tests to run.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we made changes to nightlies to pick main and PRs to run with a stable version, that will help drop this repository too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we'll still need this to run the GPU tests defined in the dask and distributed repo (which aren't in the rapidsai org and so I think can't use our runners).

But we wouldn't need these import tests.

@TomAugspurger TomAugspurger marked this pull request as draft February 13, 2025 14:15
@rjzamora
Copy link
Member

@TomAugspurger - I believe we can close this now that the downstream testing strategy is slightly different.

@TomAugspurger
Copy link
Contributor Author

Yep, thanks. And I pulled the linting / CI changes out to another PR so we won't lose those.

@TomAugspurger TomAugspurger deleted the tom/fix/dask-install branch March 3, 2025 13:19
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants