#744 Add lightweight action trace artifacts for simulator/WebKit failures
Why OpenSafari should reflect this
Playwright and OpenTelemetry show that reliable automation needs action-level evidence: step name, timing, context, retry/timeout state, and links to screenshots/logs. OpenSafari currently has metrics and several reports, but failed iOS Safari/native/Flutter live validations still require reconstructing what happened from scattered tool output.
OpenSafari should adopt this because its differentiator is direct iOS Safari + simulator control. When private APIs, WebKit sockets, AX trees, or Flutter VM Service calls fail, merge reviewers need a compact artifact that explains the failure without enabling heavy tracing by default.
Scope / how to implement
- Add a lightweight OpenSafari trace artifact contract, preferably JSON-first and dependency-free.
- Capture at minimum: run id, device id, tool/action name, start/end timestamps, duration, status, timeout, retry count if known, context source (webkit/native/flutter), and optional artifact paths (screenshot, console log, HAR, crash log).
- Keep collection opt-in for live validation or failure-only paths; do not add always-on heavy tracing.
- Reuse existing metrics/reporting surfaces where possible (src/metrics/*, src/qa/*, src/orchestration/*).
- Add unit tests for serialization/redaction/bounded size.
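A minimal sketch of the JSON-first, dependency-free contract, assuming TypeScript. It covers the fields listed above plus a bounded recorder. Every name here (`TraceEvent`, `TraceFile`, `TraceRecorder`, `schemaVersion`, `droppedEvents`, the 200-event default cap) is hypothetical and meant only to make the schema discussion concrete.

```typescript
// Hypothetical trace schema sketch: names are illustrative, not an existing OpenSafari API.
type TraceContext = "webkit" | "native" | "flutter";
type TraceStatus = "ok" | "error" | "timeout";

interface TraceArtifacts {
  screenshot?: string;
  consoleLog?: string;
  har?: string;
  crashLog?: string;
}

interface TraceEvent {
  action: string;        // MCP tool or action name
  context: TraceContext; // which control path handled the action
  startedAt: string;     // ISO-8601 timestamp
  endedAt: string;       // ISO-8601 timestamp
  durationMs: number;
  status: TraceStatus;
  timeoutMs?: number;
  retryCount?: number;   // set only when the caller knows it
  artifacts?: TraceArtifacts;
}

interface TraceFile {
  schemaVersion: 1;
  runId: string;
  deviceId: string;
  droppedEvents: number; // older events discarded to stay within maxEvents
  events: TraceEvent[];
}

class TraceRecorder {
  private readonly runId: string;
  private readonly deviceId: string;
  private readonly maxEvents: number;
  private events: TraceEvent[] = [];
  private dropped = 0;

  constructor(runId: string, deviceId: string, maxEvents = 200) {
    this.runId = runId;
    this.deviceId = deviceId;
    this.maxEvents = maxEvents;
  }

  record(event: TraceEvent): void {
    this.events.push(event);
    // Evict the oldest event so the most recent actions (usually the failing one) survive.
    if (this.events.length > this.maxEvents) {
      this.events.shift();
      this.dropped += 1;
    }
  }

  snapshot(): TraceFile {
    return {
      schemaVersion: 1,
      runId: this.runId,
      deviceId: this.deviceId,
      droppedEvents: this.dropped,
      events: this.events.slice(),
    };
  }

  serialize(): string {
    return JSON.stringify(this.snapshot(), null, 2);
  }
}
```

Keeping the newest events rather than the oldest matches the failure-only use case, because the action that failed is normally the last one recorded. `shift()` is O(n), which is negligible at a few hundred events. A ring buffer would be the natural upgrade if the cap grows.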
Decisions needed before implementation
- Artifact location: test-output/opensafari-traces/, tests/*/output/, or caller-provided path?
- Default policy: opt-in env var, failure-only, or scenario-runner-only first?
- Redaction policy: which labels/URLs/headers are safe to include by default?
- Whether PR 1 should be schema+docs only, or include scenario-runner integration.
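These decisions are still open. The sketch below shows one way the opt-in and failure-only options could be encoded together with a caller-provided path override. `OPENSAFARI_TRACE`, `OPENSAFARI_TRACE_DIR`, `resolveTracePolicy`, and `shouldWriteTrace` are hypothetical names, and `test-output/opensafari-traces/` is used as a default only because it is one of the candidate locations listed above. The environment is passed in as a plain record so the policy can be unit-tested without touching the process environment.

```typescript
// Hypothetical policy resolution; env var names and defaults are assumptions, not decisions.
type TraceMode = "off" | "failure-only" | "always";

interface TracePolicy {
  mode: TraceMode;
  outputDir: string;
}

const DEFAULT_TRACE_DIR = "test-output/opensafari-traces/";

function resolveTracePolicy(
  env: Record<string, string | undefined>,
  callerDir?: string,
): TracePolicy {
  const raw = (env.OPENSAFARI_TRACE ?? "").trim().toLowerCase();
  // Tracing stays disabled unless explicitly requested.
  let mode: TraceMode = "off";
  if (raw === "failure" || raw === "failure-only") mode = "failure-only";
  else if (raw === "1" || raw === "all" || raw === "always") mode = "always";
  // Precedence: caller-provided path, then env var (empty counts as unset), then default.
  const outputDir = callerDir ?? (env.OPENSAFARI_TRACE_DIR || DEFAULT_TRACE_DIR);
  return { mode, outputDir };
}

// Decide at the end of a run whether an artifact should be written at all.
function shouldWriteTrace(policy: TracePolicy, runFailed: boolean): boolean {
  if (policy.mode === "always") return true;
  if (policy.mode === "failure-only") return runFailed;
  return false;
}
```

A scenario runner would call `resolveTracePolicy` once per run and consult `shouldWriteTrace` only after the outcome is known. This keeps the disabled path free of any file I/O.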
Success criteria
- A documented trace schema exists and is stable enough for CI artifacts.
- A targeted test proves trace events are bounded, redactable, and preserve timing/status.
- Existing MCP tool behavior is unchanged when tracing is disabled.
- No new production dependency is introduced.
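The "bounded, redactable, and preserve timing/status" criterion suggests a single sanitizing pass that unit tests can target directly. This is a sketch under assumptions: the sensitive-header list, the rule of dropping query strings and fragments, and the default 500-character cap on error text are placeholders until the redaction-policy decision is made. `RawTraceEvent` and `sanitizeEvent` are hypothetical names.

```typescript
// Sketch of a redaction and bounding pass. The sensitive-header list, the URL rule,
// and the default 500-character cap are assumptions pending the redaction-policy decision.
interface RawTraceEvent {
  action: string;
  status: "ok" | "error" | "timeout";
  startedAt: string;
  durationMs: number;
  url?: string;
  headers?: Record<string, string>;
  error?: string;
}

const REDACTED = "<redacted>";
const SENSITIVE_HEADERS = new Set(["authorization", "cookie", "set-cookie", "proxy-authorization"]);

// Keep scheme, host, and path; drop the query string or fragment, which often carry tokens.
function redactUrl(raw: string): string {
  const cut = raw.search(/[?#]/);
  return cut === -1 ? raw : raw.slice(0, cut + 1) + REDACTED;
}

// Cap free-form text and record how much was cut.
function truncate(text: string, maxChars: number): string {
  if (text.length <= maxChars) return text;
  return `${text.slice(0, maxChars)}...[+${text.length - maxChars} chars]`;
}

function sanitizeEvent(event: RawTraceEvent, maxErrorChars = 500): RawTraceEvent {
  // Shallow copy: action, status, startedAt and durationMs pass through unchanged.
  const out: RawTraceEvent = { ...event };
  if (event.url !== undefined) out.url = redactUrl(event.url);
  const src = event.headers;
  if (src !== undefined) {
    const headers: Record<string, string> = {};
    for (const name of Object.keys(src)) {
      // Header names are case-insensitive, so compare in lower case.
      headers[name] = SENSITIVE_HEADERS.has(name.toLowerCase()) ? REDACTED : src[name];
    }
    out.headers = headers;
  }
  if (event.error !== undefined) out.error = truncate(event.error, maxErrorChars);
  return out;
}
```

For example, `redactUrl("https://example.com/login?token=abc")` yields `"https://example.com/login?<redacted>"`. The host and path stay visible to reviewers, and the token does not.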
Post-merge OpenSafari live validation
- Run a failing/timeout scenario against a booted simulator and confirm the trace identifies the failing action, device, timeout, and last artifact path.
- Run a passing scenario with tracing disabled and confirm no trace artifact is emitted.
- Attach the trace artifact to a PR/check run or to a local test-output directory so maintainers can inspect it after failure.
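The first check above (the trace identifies the failing action, device, timeout, and last artifact path) could be automated with a small summarizer over a parsed trace file. The sketch assumes a trace shape that mirrors the scope fields. It also assumes two choices that are not decided: the summary describes the last non-`ok` event, and an event's artifacts are preferred in the order crash log, screenshot, console log, HAR. `summarizeFailure` and its types are hypothetical.

```typescript
// Hypothetical summarizer; the trace shape mirrors the scope fields but is not an existing format.
type Status = "ok" | "error" | "timeout";

interface SummaryEvent {
  action: string;
  status: Status;
  timeoutMs?: number;
  artifacts?: { screenshot?: string; consoleLog?: string; har?: string; crashLog?: string };
}

interface SummaryTrace {
  runId: string;
  deviceId: string;
  events: SummaryEvent[];
}

interface FailureSummary {
  runId: string;
  deviceId: string;
  action: string;
  status: Exclude<Status, "ok">;
  timeoutMs?: number;
  lastArtifactPath?: string;
}

function summarizeFailure(trace: SummaryTrace): FailureSummary | undefined {
  let lastArtifactPath: string | undefined;
  let summary: FailureSummary | undefined;
  for (const event of trace.events) {
    const a = event.artifacts;
    // Assumed preference order when one event carries several artifacts.
    const path = a?.crashLog ?? a?.screenshot ?? a?.consoleLog ?? a?.har;
    if (path !== undefined) lastArtifactPath = path;
    if (event.status !== "ok") {
      // Overwrite on each failure so the summary describes the last failing action,
      // paired with the newest artifact recorded at or before it.
      summary = {
        runId: trace.runId,
        deviceId: trace.deviceId,
        action: event.action,
        status: event.status,
        timeoutMs: event.timeoutMs,
        lastArtifactPath,
      };
    }
  }
  return summary; // undefined for a fully passing run
}
```

Using the last failure rather than the first avoids blaming an early error that a retry later recovered from. The trade-off is that a cascade of failures is reported only at its end, so the full event list should still be attached.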
Ambiguity review
This issue intentionally excludes adopting Playwright's trace format or OpenTelemetry SDK exporters. The first mergeable unit is OpenSafari-native trace evidence only.
Direction and necessity review (2026 OSS comparison)
- Aligned: yes — trace artifacts support OpenSafari's simulator/WebKit reliability without re-platforming to Playwright/Appium.
- Necessary: yes — current logs/metrics are useful but not sufficient as a single failure artifact for merge/post-merge live validation.
- Minimal first PR: schema + dependency-free writer + scenario-runner integration only; no OpenTelemetry SDK/exporter.