Add downstream tests #6
Conversation
rapidsai/rapids-dask-dependency#85 changes the dask dependency to use released versions of Dask. To catch prominent breakage in downstream libraries (like dask-cudf) with dask main, we'll add some tests here that run the downstream (cudf) nightlies against upstream (dask) `main`.
def test_import_dask_cudf():
    import dask_cudf  # noqa: F401
Do we only want to test `import rapids_package`? If we are planning to move away from testing `@main` in branches of all repos, as in rapidsai/rapids-dask-dependency#85, I feel we should be running all dask-related pytests of cudf, cuml & dask_cuda in this repo, to be able to catch real dask upstream failures rather than just `ImportError`s.
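For discussion's sake, a rough sketch of what running those suites from this repo could look like (the package names are illustrative assumptions, and `--pyargs` only finds tests that are shipped inside the built wheels):

```python
# Rough sketch (not the actual setup): run the dask-facing test suites of the
# installed downstream packages. `--pyargs` collects tests from site-packages,
# which only works if the wheels ship their test files.
import subprocess
import sys

DOWNSTREAM_SUITES = ["dask_cudf", "dask_cuda"]  # illustrative; cuml's dask tests would need their own selection

failures = []
for suite in DOWNSTREAM_SUITES:
    result = subprocess.run([sys.executable, "-m", "pytest", "--pyargs", suite])
    if result.returncode != 0:
        failures.append(suite)

if failures:
    sys.exit(f"dask-upstream failures in: {', '.join(failures)}")
```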
Just had a chat with @rjzamora on this topic, but I think that's the big outstanding question here. I think we want two things:
- Regular day-to-day workflow shouldn't be vulnerable to upstream changes on `main` breaking CI
- Regressions should be caught early by running tests against `main`; otherwise CI (and potentially our users' code) will break when the upstream package is released
These two goals are somewhat in tension.
Other projects I've worked on have managed this by having multiple environments, one of which used dev versions of relevant upstream packages. The upstream environment was allowed to fail but would be monitored for breakage. I'm not sure if that's a realistic option for libraries like cudf.
Assuming that's not an option, we can try to get some exposure to dev upstream versions here in dask-upstream-testing. We can do that either by
- writing our own tests here (which I've started). But that's code to maintain, and wouldn't get as thorough test coverage.
- trying to run the downstream tests in an environment with upstream dev. That gets a little complicated (we're installing downstream libraries from wheels rather than source so we don't have tests unless they're shipped with the built distribution) but it's possible with a bit of work.
> 2. trying to run the downstream tests in an environment with upstream dev. That gets a little complicated (we're installing downstream libraries from wheels rather than source so we don't have tests unless they're shipped with the built distribution) but it's possible with a bit of work.
I'm +1 to this approach. We can git clone the repositories and run the pytests with dask@main & the latest nightly version of rapids_library. Until that happens, I don't think we should be merging rapidsai/rapids-dask-dependency#85.
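As a rough sketch of that flow (the repo URLs, branches, and test paths below are assumptions for illustration, not a worked-out design):

```python
# Hypothetical sketch: clone each downstream repo for its test files, then run
# those dask-related tests in an environment that already has dask@main and the
# latest nightly wheels installed. Repo URLs and test paths are assumptions.
import subprocess
import sys
import tempfile

REPOS = {
    # installed nightly package -> (source repo, path to its dask-related tests)
    "dask_cudf": ("https://github.com/rapidsai/cudf", "python/dask_cudf/dask_cudf/tests"),
    "dask_cuda": ("https://github.com/rapidsai/dask-cuda", "dask_cuda/tests"),
}

for package, (url, test_path) in REPOS.items():
    checkout = tempfile.mkdtemp(prefix=f"{package}-src-")
    subprocess.check_call(["git", "clone", "--depth=1", url, checkout])
    # Run the cloned test files; the installed nightly package plus dask@main
    # supply the code under test (keeping the checkout off sys.path is omitted here).
    subprocess.check_call([sys.executable, "-m", "pytest", f"{checkout}/{test_path}"])
```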
I wonder if we could add an upstream_dev branch to rapids-dask-dependency, and GPU-CI could add a new (optional) matrix item to build/test against this in the necessary downstream repos (dask-cudf/dask-cuda/cuml/raft)? Otherwise, we will indeed need to find a way to include downstream tests within dask-upstream-testing.
We have a test.yaml that runs nightly: https://github.com/rapidsai/cudf/actions/workflows/test.yaml
We could just modify each repo's nightly test jobs to run with dask main, and achieve what we want without increasing GPU-CI usage at all.
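For illustration (this would really live in each repo's CI scripts, and the exact commands are an assumption), the nightly job would only need to swap dask/distributed for their `main` branches before running the existing tests:

```python
# Hypothetical sketch: overwrite the pinned dask/distributed in the nightly
# environment with their `main` branches before the usual test suite runs.
# In practice this would be a couple of lines of shell in the CI scripts.
import subprocess
import sys

DEV_PACKAGES = [
    "git+https://github.com/dask/dask@main",
    "git+https://github.com/dask/distributed@main",
]

subprocess.check_call(
    [sys.executable, "-m", "pip", "install", "--upgrade", "--no-deps", *DEV_PACKAGES]
)
```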
One note: https://github.com/rapidsai/dask-upstream-testing/actions/runs/13292220048 did pick up the failure that spurred all this, because dask/dask does have some tests that exercise dask_cudf.
IMO, if the downstream libraries already have nightly runs then it'd be simpler to ensure that the nightly environment tests against dask main. Then we won't have to mess with cloning source repositories, syncing commits up with the installed version, installing test dependencies, and picking out the right subset of tests to run.
If we made changes so that nightlies pick up dask main and PRs run with a stable version, that would help us drop this repository too.
I think we'll still need this to run the GPU tests defined in the dask and distributed repos (which aren't in the rapidsai org, and so I think can't use our runners).
But we wouldn't need these import tests.
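(For context, roughly what running those upstream GPU tests here could look like, assuming dask and distributed keep marking them with a `gpu` pytest marker and shipping test files in their packages; both assumptions, not confirmed in this thread:)

```python
# Hypothetical sketch: run only the GPU-marked tests that dask and distributed
# ship inside their packages. Assumes a `gpu` pytest marker and that the
# installed packages include their test files.
import subprocess
import sys

for package in ("dask", "distributed"):
    subprocess.check_call(
        [sys.executable, "-m", "pytest", "--pyargs", package, "-m", "gpu"]
    )
```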
@TomAugspurger - I believe we can close this now that the downstream testing strategy is slightly different.
Yep, thanks. And I pulled the linting / CI changes out to another PR so we won't lose those.
This adds some very rudimentary downstream tests to dask-upstream-testing.
With rapidsai/rapids-dask-dependency#85, we won't have immediate feedback on changes to dask main that affect downstream libraries. By including some downstream tests here, we'll have an environment that uses the nightly downstream packages with dask main.
At the moment, it just attempts to import some RAPIDS libraries that use dask. We can expand that as much as we want, but we should be mindful of CI time.
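A minimal sketch of what those import smoke tests might look like (the package list is illustrative, not the exact set in this PR):

```python
# Hypothetical sketch of the import smoke tests: importing each downstream
# RAPIDS package that depends on dask surfaces import-time breakage from
# dask `main` immediately.
import importlib

import pytest

PACKAGES = ["dask_cudf", "dask_cuda"]  # illustrative; extend as needed


@pytest.mark.parametrize("package", PACKAGES)
def test_import(package):
    importlib.import_module(package)
```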