Add lightweight action trace artifacts for simulator/WebKit failures #744

## Why OpenSafari should reflect this

Playwright and OpenTelemetry show that reliable automation needs action-level evidence: step name, timing, context, retry/timeout state, and links to screenshots/logs. OpenSafari currently has metrics and several reports, but failed iOS Safari/native/Flutter live validations still require reconstructing what happened from scattered tool output.

This matters because OpenSafari's differentiator is direct control of iOS Safari and the simulator. When private APIs, WebKit sockets, AX trees, or Flutter VM Service calls fail, merge reviewers need a compact artifact that explains the failure without enabling heavy tracing by default.

## Scope / how to implement

- Add a lightweight OpenSafari trace artifact contract, preferably JSON-first and dependency-free.
- Capture at minimum: run id, device id, tool/action name, start/end timestamps, duration, status, timeout, retry count if known, context source (webkit/native/flutter), and optional artifact paths (screenshot, console log, HAR, crash log).
- Keep collection opt-in for live validation or failure-only paths; do not add always-on heavy tracing.
- Reuse existing metrics/reporting surfaces where possible (`src/metrics/*`, `src/qa/*`, `src/orchestration/*`).
- Add unit tests for serialization/redaction/bounded size.
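
The capture list above could map onto a small JSON contract. The following TypeScript is a minimal sketch, not an existing OpenSafari API: the `TraceEvent` field names, the `serializeTrace` helper, the size limits, and the URL redaction rule are all assumptions chosen for illustration, and the real values depend on the decisions listed in the next section.

```typescript
// Hypothetical trace contract. Field names are illustrative assumptions.
type TraceStatus = "passed" | "failed" | "timeout";
type ContextSource = "webkit" | "native" | "flutter";

interface TraceEvent {
  runId: string;
  deviceId: string;
  action: string; // tool/action name
  startedAt: string; // ISO 8601 timestamp
  endedAt: string; // ISO 8601 timestamp
  durationMs: number;
  status: TraceStatus;
  timeoutMs?: number;
  retryCount?: number; // omitted when unknown
  context: ContextSource;
  artifacts?: {
    screenshot?: string;
    consoleLog?: string;
    har?: string;
    crashLog?: string;
  };
  detail?: string; // free-form failure message
}

// Assumed bounds, measured in events and characters rather than bytes.
const MAX_EVENTS = 500;
const MAX_DETAIL_CHARS = 2000;

// Assumed redaction rule: drop query strings and fragments from URLs.
function redactUrls(text: string): string {
  return text.replace(/(https?:\/\/[^\s?#]+)[?#][^\s]*/g, "$1?[redacted]");
}

// Redact and truncate the free-form detail of a single event.
function boundEvent(event: TraceEvent): TraceEvent {
  if (event.detail === undefined) return event;
  const redacted = redactUrls(event.detail);
  const detail =
    redacted.length > MAX_DETAIL_CHARS
      ? redacted.slice(0, MAX_DETAIL_CHARS) + "...[truncated]"
      : redacted;
  return { ...event, detail };
}

// Serialize a run's events using only built-in JSON support.
// Keeps the most recent events, since a failure usually ends the run.
function serializeTrace(events: TraceEvent[]): string {
  const kept = events.slice(-MAX_EVENTS).map(boundEvent);
  return JSON.stringify(
    { schemaVersion: 1, droppedEvents: events.length - kept.length, events: kept },
    null,
    2,
  );
}
```

Recording `droppedEvents` keeps truncation visible to the reader of the artifact, so a bounded trace is never mistaken for a complete one.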

## Decisions needed before implementation

1. Artifact location: `test-output/opensafari-traces/`, `tests/*/output/`, or caller-provided path?
2. Default policy: opt-in env var, failure-only, or scenario-runner-only first?
3. Redaction policy: which labels/URLs/headers are safe to include by default?
4. First-PR scope: schema + docs only, or also scenario-runner integration?
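
To make decision 2 concrete, here is one way the default policy could be resolved. This is a sketch of a single option, not a recommendation: the `OPENSAFARI_TRACE` variable name, its accepted values, and both function names are hypothetical. The environment is passed in as a plain object so the logic stays testable without touching the real process environment.

```typescript
// Hypothetical policy values. "off" is the default, so tracing stays opt-in.
type TracePolicy = "off" | "on-failure" | "always";

// Resolve the policy from an environment-like map.
// OPENSAFARI_TRACE is an assumed variable name, not an existing setting.
function resolveTracePolicy(env: Record<string, string | undefined>): TracePolicy {
  const raw = (env["OPENSAFARI_TRACE"] ?? "").trim().toLowerCase();
  if (raw === "always" || raw === "1") return "always";
  if (raw === "on-failure") return "on-failure";
  return "off"; // unset or unrecognized values disable tracing
}

// Decide whether a finished run should emit a trace artifact.
function shouldWriteTrace(policy: TracePolicy, runFailed: boolean): boolean {
  if (policy === "always") return true;
  if (policy === "on-failure") return runFailed;
  return false;
}
```

Treating unrecognized values as `off` keeps existing tool behavior unchanged unless tracing is requested explicitly.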

## Success criteria

- A documented trace schema exists and is stable enough for CI artifacts.
- A targeted test proves trace events are bounded, redactable, and preserve timing/status.
- Existing MCP tool behavior is unchanged when tracing is disabled.
- No new production dependency is introduced.

## Post-merge OpenSafari live validation

- Run a failing/timeout scenario against a booted simulator and confirm the trace identifies the failing action, device, timeout, and last artifact path.
- Run a passing scenario with tracing disabled and confirm no trace artifact is emitted.
- Attach the trace artifact to a PR/check run or local `test-output` directory so maintainers can inspect it after failure.
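
The first validation step could be checked mechanically with a small reader like the one below. It is a sketch under assumptions: the trace shape (`events`, `status`, `deviceId`, `timeoutMs`, `artifacts`) and the `summarizeFailure` helper are hypothetical and would need to match whatever schema is finally agreed.

```typescript
// Assumed minimal shape of one event in a trace artifact.
interface TraceEventLike {
  deviceId: string;
  action: string;
  status: string; // "passed", "failed", or "timeout" in this sketch
  timeoutMs?: number;
  artifacts?: Record<string, string>;
}

interface FailureSummary {
  action: string;
  deviceId: string;
  timeoutMs: number | null;
  lastArtifact: string | null;
}

// Return the first non-passing action and the last artifact path recorded
// up to and including that action, or null when every action passed.
function summarizeFailure(traceJson: string): FailureSummary | null {
  const parsed = JSON.parse(traceJson) as { events?: TraceEventLike[] };
  const events = parsed.events ?? [];
  const failing = events.find((e) => e.status !== "passed");
  if (!failing) return null;

  let lastArtifact: string | null = null;
  for (const e of events) {
    const artifacts = e.artifacts ?? {};
    for (const key of Object.keys(artifacts)) {
      const path = artifacts[key];
      if (path !== undefined) lastArtifact = path;
    }
    if (e === failing) break; // ignore artifacts recorded after the failure
  }

  return {
    action: failing.action,
    deviceId: failing.deviceId,
    timeoutMs: failing.timeoutMs ?? null,
    lastArtifact,
  };
}
```

A validation script could assert that the summary is non-null for the failing scenario and that its fields match the action and simulator that were deliberately broken.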

## Ambiguity review

This issue intentionally excludes adopting Playwright's trace format or OpenTelemetry SDK exporters. The first mergeable unit is OpenSafari-native trace evidence only.


## Direction and necessity review (2026 OSS comparison)

- Aligned: yes — trace artifacts support OpenSafari's simulator/WebKit reliability without re-platforming to Playwright/Appium.
- Necessary: yes — current logs/metrics are useful but not sufficient as a single failure artifact for merge/post-merge live validation.
- Minimal first PR: schema + dependency-free writer + scenario-runner integration only; no OpenTelemetry SDK/exporter.

