Add UI end-to-end tests for the FL web portal #21
Implements a Playwright-based E2E test suite that exercises the full
federated learning lifecycle through the Controller web portal using
three independent site stacks (1 coordinator + 2 participants).
New files:
- workbench/docker-compose.e2e.yml — three Controller stacks (web +
Celery beat/run/processor workers each) sharing one Router, Postgres,
and Redis; sites isolated by separate Redis DB numbers (1/2/3)
- e2e/test_fl_workflow.py — single sequential test covering all 8
workflow steps: site registration, project creation, project joining,
run start, dataset upload, wait for Success, log inspection, and
artifact download
- e2e/conftest.py — pytest fixtures (base URLs, fixtures dir)
- e2e/fixtures/site_{a,b,c}.csv — 50-row synthetic binary classification
datasets (2 features, no header) for LogisticRegression
- e2e/requirements.txt — pytest, pytest-playwright, playwright
- .github/workflows/e2e-tests.yml — CI job that builds images, starts
the E2E stack, waits for all services, runs tests, and tears down
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
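For reference, fixture files of the shape listed above (50 rows, 2 features, no header) could be produced with a sketch like the one below. The column layout (two features followed by a 0/1 label), the class means, and the seed are assumptions made for illustration; they are not necessarily how the committed fixtures were generated.

```python
import csv
import io
import random


def make_rows(n_rows=50, seed=0):
    """Return n_rows of [x1, x2, label] for a toy binary classification task."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        label = i % 2  # alternate classes so both are equally represented
        centre = 2.0 if label else -2.0  # assumed class means
        rows.append([
            round(rng.gauss(centre, 1.0), 4),
            round(rng.gauss(centre, 1.0), 4),
            label,
        ])
    return rows


def to_csv_text(rows):
    """Serialise rows as CSV text without a header line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Writing `to_csv_text(make_rows(seed=k))` to a file with a different `k` per site would give each site its own dataset.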
Embedded Python code inside a YAML folded scalar caused the Docker Compose parser to fail with 'could not find expected :' on the bare import statement (line 83). Replace the inline Python shell snippet with createsuperuser --noinput, which reads the password from the DJANGO_SUPERUSER_PASSWORD env var and is a single-line shell command that requires no embedded Python code. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
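As a rough illustration of why the replacement needs no embedded code: in `--noinput` mode, Django's `createsuperuser` takes the password from `DJANGO_SUPERUSER_PASSWORD` and the other required fields from matching `DJANGO_SUPERUSER_<FIELD>` variables. The sketch below builds that call from Python; the `manage.py` location and the default username and email are hypothetical.

```python
import os


def superuser_command(password, username="admin", email="admin@example.com"):
    """Build argv and environment for a non-interactive createsuperuser call."""
    env = dict(os.environ)
    env["DJANGO_SUPERUSER_PASSWORD"] = password
    env["DJANGO_SUPERUSER_USERNAME"] = username  # hypothetical default
    env["DJANGO_SUPERUSER_EMAIL"] = email        # hypothetical default
    # Assumes manage.py is in the current working directory.
    argv = ["python3", "manage.py", "createsuperuser", "--noinput"]
    return argv, env
```

The result could be passed to `subprocess.run(argv, env=env, check=True)` inside the container.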
1. pytest --timeout flag: add pytest-timeout to e2e/requirements.txt. Without it, pytest rejected --timeout=300 with exit code 4 before running any tests.
2. Controller workers starting before router is ready: add a TCP healthcheck to the router service and change all 12 controller service depends_on entries from service_started to service_healthy. The healthcheck probes port 8000 with python3's socket module, which is available in the router image without extra packages.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
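A TCP probe of the kind described in item 2 can be sketched with the standard library alone. The function name and the default timeout are assumptions; the actual healthcheck command in the compose file may be written differently.

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A healthcheck command built on this would exit with status 0 when `port_is_open("localhost", 8000)` is true and non-zero otherwise, which is what `service_healthy` waits for.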
The create-project JS does window.location.href = "/" but Django
redirects / to /controller/. wait_for_url("http://localhost:8001/")
timed out because the browser landed on /controller/ instead.
Replace with wait_for_load_state("domcontentloaded") which is
redirect-agnostic.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d state
The create-project form submits via $.post(); on success the callback calls
window.location.href = "/" which Django redirects to /controller/.
wait_for_load_state("domcontentloaded") returned immediately (DOM already
loaded) before the AJAX callback fired, so the assertion saw the Create
New Project page. Switch to wait_for_url("/controller/") which blocks
until the navigation actually completes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
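The wait described in this commit might look roughly like the helper below. It assumes `page` is a Playwright sync-API `Page`; the helper name, the glob pattern, and the timeout are illustrative and not taken from the test file.

```python
def wait_for_project_list(page, timeout_ms=15_000):
    """Block until the browser has navigated to a URL ending in /controller/.

    The glob pattern matches any scheme, host, and port, so the same helper
    works for every site regardless of which port it is served on.
    """
    page.wait_for_url("**/controller/", timeout=timeout_ms)
```

Unlike `wait_for_load_state`, this does not return until the URL actually changes, so it covers the gap between the `$.post()` call and its success callback.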
The navbar in base.html has absolute hrefs like /controller/projects/new
which the previous selector (a[href*="/controller/projects/"]) matched
first. The actual project detail links in index.html use relative hrefs
like projects/{id}/{site_id} (no leading slash), so they never matched.
Switch to iterating all a[href*="projects/"] links and returning the
first one whose href contains two consecutive integers.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
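The matching rule can be sketched as a small pure function, independent of the browser. The regular expression and the function name are assumptions about how "two consecutive integers" is checked.

```python
import re

# Two consecutive integer path segments after "projects/", e.g. projects/3/7
PROJECT_LINK = re.compile(r"projects/\d+/\d+")


def first_project_href(hrefs):
    """Return the first href that looks like a project detail link, or None."""
    for href in hrefs:
        if href and PROJECT_LINK.search(href):
            return href
    return None
```

In the test, `hrefs` could be collected from the `a[href*="projects/"]` elements; the navbar entry `/controller/projects/new` is skipped because it has no integer segments.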
The dataset upload view immediately sets the run status to PREPARING in
the router DB, bypassing Celery. If the coordinator uploads first and all
three sites are already in PREPARING, the coordinator's preparing() fires
and transitions everyone to RUNNING before participants' fetch_run beats
(every 5 s) have dispatched their own process_task('preparing'). Those
tasks never run, so prepare_data() is never called, self.logisticRegr
stays None, and training() raises AttributeError.
Fix: upload participants (b, c) first, sleep 15 s (≥ 3 beat cycles) to
let their process_task('preparing') fire and complete, then upload the
coordinator (a) last so it triggers the PREPARING→RUNNING transition only
after participants have initialised their ML model in ml_models.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
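The ordering in this fix can be sketched as follows. `upload` is a hypothetical callable standing in for whatever uploads one site's dataset through the portal; the `sleep` parameter exists only so the wait can be replaced in a unit test.

```python
import time


def upload_in_safe_order(upload, coordinator, participants,
                         settle_seconds=15, sleep=time.sleep):
    """Upload participant datasets, wait, then upload the coordinator's.

    With a 5 s beat interval, the default 15 s wait spans three beat cycles,
    giving each participant time to run its 'preparing' task.
    """
    for site in participants:
        upload(site)
    sleep(settle_seconds)
    upload(coordinator)
```

For the stack in this PR the call would be along the lines of `upload_in_safe_order(upload, "a", ["b", "c"])`. A fixed sleep is the simplest option; polling for the participants' state would be an alternative if such a signal is exposed.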
This e2e test currently only adds one success path for
x-controller-build: &controller-build
  build:
    context: ../../controller
Since the file is in workbench, I think going up one level (..) gets you to the root of starfish-fl, where the controller directory is located. Using ../../controller would go up two levels and look for controller in the parent of the repository root, which is wrong.
Good catch. This appears to be a bug, and the test did not catch it because the image had already been built before this step ran, so I guess the build was skipped during automation. I need to verify this and fix it.
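The path arithmetic behind this thread can be checked with a short sketch. It assumes the layout the reviewer describes (the compose file at starfish-fl/workbench/docker-compose.e2e.yml, with controller at the repository root) and relies on Compose resolving a relative build context against the compose file's directory.

```python
import posixpath


def resolve_context(compose_file, context):
    """Resolve a build context relative to the compose file's directory."""
    base = posixpath.dirname(compose_file)
    return posixpath.normpath(posixpath.join(base, context))
```

With `compose_file = "starfish-fl/workbench/docker-compose.e2e.yml"`, the context `../controller` resolves to `starfish-fl/controller`, while `../../controller` resolves to `controller`, a sibling of the repository rather than a directory inside it.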
router:
  build:
    context: ../../router
Same issue as controller on line 11.
bbd72e9 to
8bac44b
Summary
- workbench/docker-compose.e2e.yml with three independent Controller stacks (site-a on port 8001, site-b on 8002, site-c on 8003), each with its own Celery beat/run/processor workers, sharing one Router and Redis. Sites are isolated using separate Redis database numbers (1/2/3).
- e2e/test_fl_workflow.py: a single sequential Playwright test that drives all 8 workflow steps through the web portal: site registration, project creation, project joining, run start, dataset upload, wait for Success, log inspection, and artifact download.
- e2e/fixtures/site_{a,b,c}.csv: small synthetic binary classification datasets (50 rows, 2 features, no header) for LogisticRegression.
- .github/workflows/e2e-tests.yml: CI job that builds images, starts the E2E stack, waits for all services to respond, runs the tests with a 5-minute timeout, collects logs on failure, and tears down.

Test plan
- docker compose -f workbench/docker-compose.e2e.yml up -d starts without errors
- cd e2e && pip install -r requirements.txt && playwright install chromium && pytest test_fl_workflow.py -v completes all 8 steps

Closes #20
🤖 Generated with Claude Code