Add UI end-to-end tests for the FL web portal#21

Merged
stedrew merged 7 commits into main from feature/e2e-ui-tests on Mar 4, 2026

Conversation

Contributor

@stedrew stedrew commented Mar 2, 2026

Summary

  • Adds workbench/docker-compose.e2e.yml with three independent Controller stacks (site-a on port 8001, site-b on 8002, site-c on 8003), each with its own Celery beat/run/processor workers, sharing one Router and Redis. Sites are isolated using separate Redis database numbers (1/2/3).
  • Adds e2e/test_fl_workflow.py — a single sequential Playwright test that drives all 8 workflow steps through the web portal: site registration, project creation, project joining, run start, dataset upload, wait for Success, log inspection, and artifact download.
  • Adds e2e/fixtures/site_{a,b,c}.csv — small synthetic binary classification datasets (50 rows, 2 features, no header) for LogisticRegression.
  • Adds .github/workflows/e2e-tests.yml — CI job that builds images, starts the E2E stack, waits for all services to respond, runs the tests with a 5-minute timeout, collects logs on failure, and tears down.
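The "waits for all services to respond" step could be sketched in Python roughly as follows. This is a minimal illustration, not the contents of the actual workflow file: the function name, the timeouts, and the choice to treat any HTTP status as "responding" are assumptions, and only the ports 8001 to 8003 come from the summary above.

```python
import time
import urllib.error
import urllib.request

# Bypass any configured HTTP proxy: the probes target local services only.
_opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_services(urls, timeout_s=120.0, interval_s=2.0):
    """Poll each URL until it answers over HTTP or the deadline passes.

    Any HTTP response (including 4xx/5xx) counts as "responding"; only
    connection-level failures keep a URL on the pending list.
    Returns the list of URLs that were still down at the deadline.
    """
    deadline = time.monotonic() + timeout_s
    pending = list(urls)
    while pending:
        still_down = []
        for url in pending:
            try:
                _opener.open(url, timeout=5).close()
            except urllib.error.HTTPError:
                pass  # the server answered with an error status, so it is up
            except OSError:  # URLError, refused connections, timeouts
                still_down.append(url)
        pending = still_down
        if not pending or time.monotonic() >= deadline:
            break
        time.sleep(interval_s)
    return pending
```

A CI step could call `wait_for_services([f"http://localhost:{p}/" for p in (8001, 8002, 8003)])` and fail the job if the returned list is non-empty. The `/` path is an assumption.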

Test plan

  • docker compose -f workbench/docker-compose.e2e.yml up -d starts without errors
  • cd e2e && pip install -r requirements.txt && playwright install chromium && pytest test_fl_workflow.py -v completes all 8 steps
  • CI E2E job passes on GitHub Actions

Closes #20

🤖 Generated with Claude Code

Steve Drew and others added 7 commits March 1, 2026 22:20
Implements a Playwright-based E2E test suite that exercises the full
federated learning lifecycle through the Controller web portal using
three independent site stacks (1 coordinator + 2 participants).

New files:
- workbench/docker-compose.e2e.yml — three Controller stacks (web +
  Celery beat/run/processor workers each) sharing one Router, Postgres,
  and Redis; sites isolated by separate Redis DB numbers (1/2/3)
- e2e/test_fl_workflow.py — single sequential test covering all 8
  workflow steps: site registration, project creation, project joining,
  run start, dataset upload, wait for Success, log inspection, and
  artifact download
- e2e/conftest.py — pytest fixtures (base URLs, fixtures dir)
- e2e/fixtures/site_{a,b,c}.csv — 50-row synthetic binary classification
  datasets (2 features, no header) for LogisticRegression
- e2e/requirements.txt — pytest, pytest-playwright, playwright
- .github/workflows/e2e-tests.yml — CI job that builds images, starts
  the E2E stack, waits for all services, runs tests, and tears down

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
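For readers who want to regenerate fixtures like e2e/fixtures/site_{a,b,c}.csv, a minimal sketch of a 50-row, 2-feature, headerless binary classification dataset is shown here. The column order (two features, then the label), the class centres, and the function names are assumptions made for illustration; the PR does not state how the real fixtures were produced.

```python
import csv
import io
import random


def make_rows(n_rows=50, seed=0):
    """Return n_rows of [x1, x2, label] for a toy two-class problem."""
    rng = random.Random(seed)
    rows = []
    for i in range(n_rows):
        label = i % 2  # alternate labels so both classes are present
        centre = 2.0 if label else -2.0  # hypothetical class centres
        rows.append([
            round(rng.gauss(centre, 1.0), 4),
            round(rng.gauss(centre, 1.0), 4),
            label,
        ])
    return rows


def to_csv_text(rows):
    """Serialise rows as CSV text with no header line."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()
```

Using a different `seed` per site would give three distinct but similarly distributed files.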
Embedded Python code inside a YAML folded scalar caused the Docker
Compose parser to fail with 'could not find expected :' on the bare
import statement (line 83).

Replace the inline Python shell snippet with createsuperuser --noinput,
which reads the password from the DJANGO_SUPERUSER_PASSWORD env var and
is a single-line shell command that requires no embedded Python code.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
1. pytest --timeout flag: add pytest-timeout to e2e/requirements.txt.
   Without it, pytest rejected --timeout=300 with exit code 4 before
   running any tests.

2. Controller workers starting before router is ready: add a TCP
   healthcheck to the router service and change all 12 controller
   service depends_on entries from service_started to service_healthy.
   The healthcheck uses python3 socket to probe port 8000, which is
   available in the router image without extra packages.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
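The TCP probe described in item 2 can be written with the standard library alone. The sketch below is an assumption about its shape, not the exact command in docker-compose.e2e.yml: the function names and the 2-second timeout are hypothetical, and only port 8000 comes from the commit message.

```python
import socket


def port_is_open(host, port, timeout_s=2.0):
    """Return True if a TCP connection to (host, port) can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def healthcheck_exit_code(host="localhost", port=8000):
    """Map the probe result to a process exit code: 0 healthy, 1 unhealthy."""
    return 0 if port_is_open(host, port) else 1
```

A container healthcheck would pass the result of `healthcheck_exit_code()` to `sys.exit`, since Compose marks the service healthy only when the check exits with 0.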
The create-project JS does window.location.href = "/" but Django
redirects / to /controller/. wait_for_url("http://localhost:8001/")
timed out because the browser landed on /controller/ instead.

Replace with wait_for_load_state("domcontentloaded") which is
redirect-agnostic.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…d state

The create-project form submits via $.post(); on success the callback calls
window.location.href = "/" which Django redirects to /controller/.
wait_for_load_state("domcontentloaded") returned immediately (DOM already
loaded) before the AJAX callback fired, so the assertion saw the Create
New Project page.  Switch to wait_for_url("/controller/") which blocks
until the navigation actually completes.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The navbar in base.html has absolute hrefs like /controller/projects/new
which the previous selector (a[href*="/controller/projects/"]) matched
first.  The actual project detail links in index.html use relative hrefs
like projects/{id}/{site_id} (no leading slash), so they never matched.

Switch to iterating all a[href*="projects/"] links and returning the
first one whose href contains two consecutive integers.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
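The selection rule in this commit can be illustrated on plain href strings. This is a sketch of the matching logic only, assuming the hrefs have already been read from the page; the function name and the exact regular expression are hypothetical rather than copied from test_fl_workflow.py.

```python
import re

# Matches "projects/<int>/<int>", i.e. a project id followed by a site id.
_DETAIL_HREF = re.compile(r"projects/\d+/\d+")


def first_project_detail_href(hrefs):
    """Return the first href that looks like projects/{id}/{site_id}, else None."""
    for href in hrefs:
        if _DETAIL_HREF.search(href):
            return href
    return None
```

With this rule the navbar link `/controller/projects/new` is skipped because `new` is not an integer, while a relative link such as `projects/3/7` is accepted.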
The dataset upload view immediately sets the run status to PREPARING in
the router DB, bypassing Celery. If the coordinator uploads first and all
three sites are already in PREPARING, the coordinator's preparing() fires
and transitions everyone to RUNNING before participants' fetch_run beats
(every 5 s) have dispatched their own process_task('preparing'). Those
tasks never run, so prepare_data() is never called, self.logisticRegr
stays None, and training() raises AttributeError.

Fix: upload participants (b, c) first, sleep 15 s (≥ 3 beat cycles) to
let their process_task('preparing') fire and complete, then upload the
coordinator (a) last so it triggers the PREPARING→RUNNING transition only
after participants have initialised their ML model in ml_models.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
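The ordering fix amounts to two small decisions: who uploads last, and how long to wait before that upload. A minimal sketch follows, with hypothetical helper names; the 5-second beat interval and the 3 cycles come from the commit message above.

```python
def upload_order(sites, coordinator):
    """Return sites with participants first (original order) and coordinator last."""
    if coordinator not in sites:
        raise ValueError(f"coordinator {coordinator!r} is not in sites")
    return [s for s in sites if s != coordinator] + [coordinator]


def settle_seconds(beat_interval_s=5.0, cycles=3):
    """Seconds to wait so that at least `cycles` beat ticks have elapsed."""
    return beat_interval_s * cycles
```

In the test this would mean uploading for every site in `upload_order(["a", "b", "c"], "a")` except the last, sleeping for `settle_seconds()`, and only then uploading for the coordinator. A fixed sleep remains timing-dependent; it narrows the race rather than removing it.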
@stedrew stedrew requested a review from Farhan-Abbas March 2, 2026 06:41
Contributor Author

stedrew commented Mar 2, 2026

This E2E test currently covers only one success path, for LogisticRegression. It generates three mock CSV files, one per site. There is a race condition caused by Celery's 5-second polling: if the sites upload data too quickly, participants can miss the preparing step and the run fails. This is unlikely to happen when the portal is operated by humans.

@stedrew stedrew requested a review from Zainab-Saad March 4, 2026 02:55

x-controller-build: &controller-build
  build:
    context: ../../controller
Collaborator

I think that, since the file is in workbench, going up one level (..) gets you to the root of starfish-fl, where the controller directory is located. Using ../../controller goes up two levels and looks for controller in the parent of the repository root, which is wrong.
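The reviewer's point can be checked by resolving both contexts against the compose file's directory, which is how Compose normally interprets a relative build context. This is a sketch only: the `/src` checkout prefix and the function name are hypothetical, and a project directory overridden on the command line would change the base.

```python
import posixpath


def resolve_context(compose_file, context):
    """Resolve a relative build context against the compose file's directory."""
    base = posixpath.dirname(compose_file)
    return posixpath.normpath(posixpath.join(base, context))


# Hypothetical checkout location of the starfish-fl repository.
COMPOSE_FILE = "/src/starfish-fl/workbench/docker-compose.e2e.yml"
one_up = resolve_context(COMPOSE_FILE, "../controller")
two_up = resolve_context(COMPOSE_FILE, "../../controller")
```

Here `one_up` stays inside the repository (`/src/starfish-fl/controller`), while `two_up` escapes it (`/src/controller`).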

Contributor Author

Good catch. This appears to be a bug, and the test did not catch it because the image had already been built before this step ran, so I guess the build was skipped during automation. I need to verify this and fix it.


router:
  build:
    context: ../../router
Collaborator

Same issue as controller on line 11.

@stedrew stedrew force-pushed the feature/e2e-ui-tests branch from bbd72e9 to 8bac44b Compare March 4, 2026 16:57
@stedrew stedrew merged commit c74805e into main Mar 4, 2026
6 checks passed
@stedrew stedrew deleted the feature/e2e-ui-tests branch March 4, 2026 17:04

Development

Successfully merging this pull request may close these issues.

Add UI end-to-end tests for the federated learning web portal

2 participants