tests/
├── conftest.py — Shared fixtures (Flask app, mock LLM, Playwright)
├── mock_llm_server.py — Standalone mock LLM API server
├── run_all.py — Legacy test runner (prefer pytest/make)
│
├── test_smoke.py — Import validation, syntax checks, blueprint registration
├── test_backend_unit.py — Core backend unit tests (build_body, tool parsing, etc.)
├── test_swarm_unit.py — Multi-agent swarm system (protocol, registry, orchestrator)
├── test_package_facades.py — Package façade import validation (search, browser, pdf, skills)
├── test_project_tools.py — Project-mode helpers (output cleaning, write targets, safety)
├── test_cross_platform.py — Cross-platform compat layer tests
├── test_cc_alignment.py — Claude Code alignment feature tests
├── test_new_features.py — New feature integration tests
├── test_compaction_improvements.py — Context compaction pipeline tests
├── test_streaming_and_prefetch.py — Streaming & URL prefetch tests
│
├── test_api_integration.py — API integration tests (Flask test client + mock LLM)
├── test_visual_e2e.py — Playwright visual E2E tests
└── visual_check.py — VLM screenshot analysis helper
| Tier | Marker | Command | Needs Server? | Needs Browser? |
|---|---|---|---|---|
| Unit | @pytest.mark.unit |
make test-unit |
No | No |
| API | @pytest.mark.api |
make test-api |
Test server | No |
| Visual | @pytest.mark.visual |
make test-visual |
Live server | Chromium |
| Slow | @pytest.mark.slow |
(opt-in) | Varies | Varies |
# Quick: unit tests only (fast, no dependencies)
make test-unit
# Full CI pipeline (lint + unit + api + healthcheck)
make ci
# With coverage
make test-coverage
# Individual markers
python -m pytest -m unit -v
python -m pytest -m api --tb=long
python -m pytest -m visual --tb=short
# Filter by name
python -m pytest -k "test_build_body" -v
# Just smoke tests
make smoke| Directory | Purpose | CI? | Style |
|---|---|---|---|
tests/ |
Structured, self-contained pytest tests with mocks | ✅ Yes | pytest classes + markers |
debug/ |
Manual exploration scripts, benchmarks, API-dependent tests | ❌ No | Standalone scripts, python debug/script.py |
Tests in debug/ require real API keys, live servers, or specific hardware
(GPUs, etc.) and are not run in CI. When a debug/ test proves valuable
for regression prevention, it gets migrated to tests/ with proper mocking.
- Create
tests/test_*.pywith test functions/classes - Add
@pytest.mark.unit(orapi/visual/slow) to every test - Use fixtures from
conftest.pyfor Flask app/client/mock LLM - Verify:
make test-unitpasses - Ensure
make cipasses before submitting a PR
Tests without markers will fail (strict-markers is enabled in pyproject.toml).