
[serve][ci] Run all serve tests with HAProxy, drop the test whitelist#64210

Draft
eicherseiji wants to merge 4 commits into ray-project:master from eicherseiji:seiji/remove-haproxy-test-whitelist


@eicherseiji

Contributor

Why

The HAProxy CI step (serve: HAProxy tests) ran only the curated list of targets in ci/ray_ci/serve_hap_test_names.txt. HAProxy is now the default Serve ingress, so the whole serve test suite should run against it rather than a hand-maintained subset.

What

  • Replace $(cat ci/ray_ci/serve_hap_test_names.txt) with //python/ray/serve/tests/... in the HAProxy step.
  • Delete ci/ray_ci/serve_hap_test_names.txt.
  • Keep --except-tags post_wheel_build,gpu,ha_integration,serve_tracing,direct_ingress so tests that have their own dedicated steps or build images are not pulled in here. The haproxy-tagged tests still run only in this step.
  • Bump parallelism from 3 to 6 to absorb the larger target set.
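A minimal sketch of the --except-tags filtering described above, assuming a target is dropped when it carries any of the excluded tags. Only the five tag names come from this PR; the select_targets helper and the example labels and tags are hypothetical illustrations, not ray_ci's implementation.

```python
from typing import Dict, FrozenSet, List

# Tags excluded by the HAProxy step, as listed in the PR description.
HAPROXY_STEP_EXCEPT_TAGS = frozenset(
    {"post_wheel_build", "gpu", "ha_integration", "serve_tracing", "direct_ingress"}
)


def select_targets(
    targets: Dict[str, FrozenSet[str]], except_tags: FrozenSet[str]
) -> List[str]:
    """Hypothetical model: keep targets that carry none of the excluded tags."""
    return sorted(label for label, tags in targets.items() if not (tags & except_tags))


# Hypothetical target labels and tags, for illustration only.
example_targets = {
    "//python/ray/serve/tests:test_api": frozenset({"team:serve"}),
    "//python/ray/serve/tests:test_gpu_thing": frozenset({"team:serve", "gpu"}),
    "//python/ray/serve/tests:test_tracing": frozenset({"serve_tracing"}),
    "//python/ray/serve/tests:test_haproxy": frozenset({"haproxy"}),
}

print(select_targets(example_targets, HAPROXY_STEP_EXCEPT_TAGS))
```

A haproxy-tagged target survives this filter because "haproxy" is not in the step's exclusion list, which is how such tests end up running in this step.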

Status

Draft. The point of this PR is to surface, via premerge CI, which serve tests do not yet pass with RAY_SERVE_ENABLE_HA_PROXY=1. Failures here are the work items before the whitelist can be removed for real.

The HAProxy CI step previously ran only the curated list of targets in
ci/ray_ci/serve_hap_test_names.txt. HAProxy is now the default ingress and
stable enough that the whole serve test suite should run against it.

Replace the whitelist with //python/ray/serve/tests/... and delete the file.
The step still excludes tags that have their own steps or build images
(ha_integration, serve_tracing, direct_ingress) plus gpu and post_wheel_build.
Bump parallelism from 3 to 6 to absorb the larger target set.
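A rough sketch of why doubling parallelism absorbs the larger target set, assuming a simple round-robin split across Buildkite parallel jobs. BUILDKITE_PARALLEL_JOB and BUILDKITE_PARALLEL_JOB_COUNT are the variables Buildkite sets for steps with parallelism; shard_targets, current_shard, and the target names are hypothetical, and ray_ci's real sharding may balance targets differently.

```python
import os
from typing import List


def shard_targets(targets: List[str], job_index: int, job_count: int) -> List[str]:
    """Hypothetical round-robin: job i takes every job_count-th target from i."""
    if not 0 <= job_index < job_count:
        raise ValueError("job_index must be in [0, job_count)")
    return sorted(targets)[job_index::job_count]


def current_shard(targets: List[str]) -> List[str]:
    # Buildkite sets these for parallel steps; default to a single job locally.
    index = int(os.environ.get("BUILDKITE_PARALLEL_JOB", "0"))
    count = int(os.environ.get("BUILDKITE_PARALLEL_JOB_COUNT", "1"))
    return shard_targets(targets, index, count)


# Hypothetical target labels, for illustration only.
targets = [f"//python/ray/serve/tests:test_{i:02d}" for i in range(12)]
for count in (3, 6):
    print(count, [len(shard_targets(targets, i, count)) for i in range(count)])
```

With 12 hypothetical targets, each of 3 jobs runs 4 while each of 6 jobs runs 2, so per-job work roughly halves when targets take similar time.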

This is a draft to surface, via premerge CI, which serve tests do not yet
pass with HAProxy enabled.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
@eicherseiji added the "go" label (add ONLY when ready to merge, run all tests) on Jun 18, 2026
@eicherseiji self-assigned this on Jun 18, 2026

@gemini-code-assist (bot) left a comment

Contributor

Code Review

This pull request updates the HAProxy test configuration in Buildkite to run the full serve test suite dynamically instead of using a curated whitelist file, which has been deleted. It also increases the step's parallelism from 3 to 6. The reviewer suggests expanding the test targets to include '//python/ray/serve/...' and '//python/ray/tests/...' to ensure complete test coverage and maintain consistency with standard serve tests.

Comment thread on .buildkite/serve.rayci.yml (outdated)
# whitelist. Excludes tags handled by their own steps/images.
# Exit 42 skips whole-step retry (per @kouroshHakha); per-test retry is enough.
-      - bazel run //ci/ray_ci:test_in_docker -- $(cat ci/ray_ci/serve_hap_test_names.txt) serve
+      - bazel run //ci/ray_ci:test_in_docker -- //python/ray/serve/tests/... serve

Priority: medium

To ensure complete test coverage and consistency with the standard serve tests step (defined on line 44), we should run the same set of targets: //python/ray/serve/... and //python/ray/tests/.... Using only //python/ray/serve/tests/... might miss serve-related tests located in python/ray/tests/ or other subdirectories under python/ray/serve/.

      - bazel run //ci/ray_ci:test_in_docker -- //python/ray/serve/... //python/ray/tests/... serve
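To make the coverage gap concrete: a Bazel pattern ending in /... matches the named package and every package beneath it, so //python/ray/serve/tests/... never reaches python/ray/tests. The pattern_covers and covered helpers are a hypothetical model of that rule for recursive patterns only, and the package paths are illustrative.

```python
from typing import List


def pattern_covers(pattern: str, package: str) -> bool:
    """Model of a recursive Bazel pattern ("//pkg/...") matching a package.

    `package` is a workspace-relative path without the leading "//",
    e.g. "python/ray/tests". Only the recursive "/..." form is handled.
    """
    if not (pattern.startswith("//") and pattern.endswith("/...")):
        raise ValueError(f"not a recursive pattern: {pattern!r}")
    root = pattern[2:-4]  # strip the leading "//" and the trailing "/..."
    if not root:  # "//..." covers the whole workspace
        return True
    return package == root or package.startswith(root + "/")


def covered(patterns: List[str], package: str) -> bool:
    return any(pattern_covers(p, package) for p in patterns)


narrow = ["//python/ray/serve/tests/..."]
suggested = ["//python/ray/serve/...", "//python/ray/tests/..."]

for package in ["python/ray/serve/tests/unit", "python/ray/serve", "python/ray/tests"]:
    print(package, covered(narrow, package), covered(suggested, package))
```

The prefix check appends "/" so that a sibling such as python/ray/serve_extra is not mistaken for a subpackage of python/ray/serve.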

Match the target set used by the standard serve test steps
(//python/ray/serve/... //python/ray/tests/...) instead of the narrower
//python/ray/serve/tests/..., so this step stays uniform with the rest of
the file and picks up any future team:serve test added under python/ray/tests.
Tighten the step comment to one line.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
The premerge run of the whitelist removal showed that a set of serve tests
cannot pass with the HAProxy ingress: gRPC tests (HAProxy does not proxy gRPC
yet; see haproxy.py), tests that probe the native Serve HTTP proxy directly,
and a few lifecycle/shutdown tests whose timing differs under HAProxy.

Add a skip_if_haproxy(reason) helper (pytest.mark.skipif keyed on
RAY_SERVE_ENABLE_HA_PROXY) and apply it to those tests. This replaces the
external serve_hap_test_names.txt allowlist with inline, self-documenting
skips colocated with each test. The skips only trigger in the HAProxy step;
the tests still run normally in every other serve step, so no coverage is
lost, and the full suite now runs under HAProxy minus the documented skips.

Signed-off-by: Seiji Eicher <seiji@anyscale.com>
A second premerge run (#68597) surfaced stragglers that the first, partial pass
missed: the native-proxy metrics tests in test_metrics.py (proxy/serve metrics
that HAProxy emits differently; these are covered by test_metrics_haproxy.py)
and test_deployment_scheduler_downscale's downscale_fallback_node. Mark them
with skip_if_haproxy. test_regression.test_replica_memory_growth failed once but
passed on retry, so it is flaky rather than incompatible and is left running.
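The flaky-versus-incompatible distinction above can be written as a small triage rule: a test that passes on any attempt is flaky, while one that fails on every attempt is a candidate for skip_if_haproxy. Verdict and classify are hypothetical names for illustration, not part of Ray's CI tooling.

```python
from enum import Enum
from typing import Sequence


class Verdict(Enum):
    PASSING = "passing"            # every attempt passed
    FLAKY = "flaky"                # at least one failure and at least one pass
    INCOMPATIBLE = "incompatible"  # every attempt failed: candidate for a skip


def classify(attempts: Sequence[bool]) -> Verdict:
    """Classify a test from its attempt results (True = that attempt passed)."""
    if not attempts:
        raise ValueError("need at least one attempt")
    if all(attempts):
        return Verdict.PASSING
    if any(attempts):
        return Verdict.FLAKY
    return Verdict.INCOMPATIBLE


# Failed once, then passed on retry, as described for test_replica_memory_growth.
print(classify([False, True]))
```

Failing every attempt does not by itself prove incompatibility, so the cause still needs a look before a skip is added.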

Signed-off-by: Seiji Eicher <seiji@anyscale.com>

Labels

go add ONLY when ready to merge, run all tests
