Merged
113 changes: 113 additions & 0 deletions docs/roadmap/2026-oss-library-comparison.md
@@ -0,0 +1,113 @@
# 2026 OSS library comparison for OpenSafari stability, memory, Flutter QA, search, and login

_Last reviewed: 2026-05-14 KST. Source set: official project docs/repos where available._

## Scope and current OpenSafari baseline

OpenSafari is a macOS/iOS-Simulator focused MCP server for iOS Safari, WebKit Remote Debugging Protocol, native Accessibility/SimulatorKit input, and Flutter VM Service inspection. The repository already contains several directionally important surfaces:

- Browser/native automation: `src/webkit/*`, `src/native/*`, `src/tools/app-*`, `src/tools/wait-for.ts`, `src/tools/app-wait-for.ts`.
- Reliability: `src/reliability/*`, `src/watchdog/*`, private API sentinels, headless smoke workflow, simulator/proxy readiness work.
- Memory: `src/metrics/memory-tracker.ts`, `src/metrics/heap-snapshot-diff.ts`, `tests/soak/*`, `src/tools/flutter-memory-profile.ts`.
- Flutter QA: `src/tools/qa-flutter-*`, `src/flutter/vm-service-client.ts`, `src/tools/flutter-*`.
- Login/auth persistence: `src/auth/manager.ts`, `src/tools/auth.ts`, existing issue #699.
- Search: no product-facing web-search engine. OpenSafari searches/queries pages and native trees; general web search is outside the core automation runtime.

The safe strategy is therefore not to import large competitors wholesale. OpenSafari should copy proven patterns that reduce flakiness, increase diagnostic evidence, and expose memory/Flutter validation without changing default runtime semantics.

## Comparative analysis

### Browser and mobile automation frameworks

| Library | Strengths OpenSafari can learn from | Weaknesses / mismatch vs OpenSafari | Safe OpenSafari application |
|---|---|---|---|
| Appium 2 | Mature mobile-web/native/hybrid abstraction, capability-driven sessions, explicit command timeout (`newCommandTimeout`), broad ecosystem for iOS Safari and Flutter drivers. | Heavy server/driver stack; WebDriver indirection can add latency and hides private Simulator/WebKit details that OpenSafari intentionally controls directly. | Adopt capability/contract style for OpenSafari live validation issues and session health docs; avoid runtime dependency. |
| WebdriverIO | Auto-waiting around interactable elements, timeout taxonomy, Appium integration, protocol abstraction across WebDriver/BiDi/mobile. | Its model assumes WebDriver sessions; OpenSafari already has direct MCP tools and native bridges. | Improve native `app_wait_for`/action diagnostics with stability windows and timeout metadata; no dependency. |
| Playwright | Auto-wait, trace viewer, retry-on-failure trace capture, action-level snapshots, console/network correlation. | Playwright's desktop WebKit build is not iOS Safari in the Simulator. Playwright trace format is large and runner-specific. | Add OpenSafari-native lightweight action trace artifacts for failed/long live validations. |
| Puppeteer | Direct protocol control, useful CDP tracing/perf patterns, low-level browser primitives. | Chrome/CDP-centric; not applicable to iOS Safari/WebKit Remote Debugging Protocol without translation. | Keep direct protocol philosophy; do not adopt as dependency. |
| Selenium / WebDriver BiDi | Standardization trajectory for bidirectional browser automation and logs/events. | Safari/iOS support still mediated by drivers; less direct than OpenSafari's target. | Track BiDi vocabulary for future event naming, but do not re-platform. |

**Conclusion:** The mandatory improvement is Playwright/WebdriverIO-inspired diagnostics and auto-wait metadata, not Appium/Selenium/Puppeteer adoption.
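One way to read "stability windows and timeout metadata" is sketched below. The `WaitObservation` and `WaitDiagnostics` shapes, the `evaluateWait` name, and the reason codes are all hypothetical and are not the existing `app_wait_for` response; the sketch only shows how a list of polls could be turned into a diagnosis.

```typescript
// Hypothetical shapes for illustration; not the existing `app_wait_for` API.
interface WaitObservation {
  atMs: number; // time of the poll, relative to the start of the wait
  found: boolean; // a candidate element matched the query
  visible: boolean;
  enabled: boolean;
}

interface WaitDiagnostics {
  satisfied: boolean;
  polls: number;
  stableForMs: number; // length of the final unbroken run of passing polls
  lastObserved?: WaitObservation;
  reason?: 'no_polls' | 'not_found' | 'not_visible' | 'not_enabled' | 'unstable';
}

function passes(o: WaitObservation): boolean {
  return o.found && o.visible && o.enabled;
}

function evaluateWait(observations: WaitObservation[], stabilityMs: number): WaitDiagnostics {
  const last = observations[observations.length - 1];
  if (!last) return { satisfied: false, polls: 0, stableForMs: 0, reason: 'no_polls' };

  // Walk backwards to find where the final run of passing polls began.
  let runStart: number | undefined;
  for (let i = observations.length - 1; i >= 0; i--) {
    const o = observations[i];
    if (!o || !passes(o)) break;
    runStart = o.atMs;
  }
  const stableForMs = runStart === undefined ? 0 : last.atMs - runStart;
  const base = { polls: observations.length, stableForMs, lastObserved: last };

  if (runStart !== undefined && stableForMs >= stabilityMs) {
    return { satisfied: true, ...base };
  }
  // Explain the failure from the last poll; 'unstable' means the condition
  // held at the end but not for the whole stability window.
  const reason: WaitDiagnostics['reason'] = !last.found
    ? 'not_found'
    : !last.visible
      ? 'not_visible'
      : !last.enabled
        ? 'not_enabled'
        : 'unstable';
  return { satisfied: false, ...base, reason };
}
```

Keeping the evaluation separate from the polling loop means the same function can explain both a success and a timeout, and the extra fields can be added to a JSON response without changing existing ones.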

### Observability and memory tooling

| Library | Strengths OpenSafari can learn from | Weaknesses / mismatch | Safe OpenSafari application |
|---|---|---|---|
| OpenTelemetry JS | Standard traces/metrics/logs vocabulary; spans map well to MCP tool calls, simulator boot, proxy readiness, WebKit commands. | SDK/exporter dependencies can be non-trivial and introduce startup/config complexity. | Define an OpenTelemetry-compatible trace schema in docs and JSON artifacts first; optional exporter later. |
| Sentry | Error grouping, performance traces, crash reporting across Node and Flutter. | External SaaS/self-hosted dependency, privacy concerns, credentials/config burden. | Keep Sentry as optional downstream consumer of structured logs; do not embed. |
| memlab | Three-snapshot leak detection, class-level heap reasoning, Node/browser snapshot assertions. | Puppeteer/Chromium orientation for browser scenarios; full leak graph analysis can be heavy. | Extend existing heap snapshot diff/memory soak docs with OpenSafari scenario budgets and class-delta thresholds. |
| Clinic.js | Fast local Node profiling for event loop/flame/heap. | Dev-time tool, not runtime feature. | Document as optional triage command for memory/latency regressions. |
| Node heap snapshots | Built-in and dependency-free; good for CI artifacts. | Snapshot creation can pause process and double memory temporarily. | Keep behind explicit soak/live validation only; never default-on hot path. |
| autocannon | Simple HTTP benchmark for transports. | Only applies to HTTP/SSE transport, not stdio/local MCP or simulator latency. | Optional benchmark recipe for HTTP MCP transport; not mandatory now. |

**Conclusion:** Mandatory improvement is a dependency-free OpenSafari memory/trace validation contract that uses existing metrics/heap-snapshot surfaces and avoids default runtime overhead.
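As an illustration of a class-delta threshold, the sketch below compares per-class instance counts from three snapshots (baseline, after the scenario, after returning to the baseline state). It is a count-based approximation of the three-snapshot idea, not memlab's object-graph analysis, and the `ClassCounts` input shape and `findRetainedGrowth` name are assumptions rather than the output of the existing heap-snapshot diff.

```typescript
// Hypothetical per-class summary; the real heap-snapshot-diff output may differ.
type ClassCounts = Record<string, number>; // class name -> live instance count

interface ClassLeakFinding {
  className: string;
  grewBy: number; // instances added between baseline and target
  retained: number; // instances above baseline that remain in the final snapshot
}

function findRetainedGrowth(
  baseline: ClassCounts,
  target: ClassCounts,
  final: ClassCounts,
  maxRetainedPerClass: number,
): ClassLeakFinding[] {
  const findings: ClassLeakFinding[] = [];
  for (const className of Object.keys(target)) {
    const base = baseline[className] ?? 0;
    const grewBy = (target[className] ?? 0) - base;
    if (grewBy <= 0) continue; // the scenario did not grow this class

    // Growth that survives the return to the baseline state, capped at what
    // the scenario added. Counts only: object identity is not tracked.
    const retained = Math.min(grewBy, Math.max(0, (final[className] ?? 0) - base));
    if (retained > maxRetainedPerClass) {
      findings.push({ className, grewBy, retained });
    }
  }
  // Largest retained growth first, so reports lead with the likeliest leak.
  return findings.sort((a, b) => b.retained - a.retained);
}
```

Because the check only needs three small count maps, it can run in soak or live validation without adding work to the default runtime path.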

### Flutter stability tooling

| Library | Strengths OpenSafari can learn from | Weaknesses / mismatch | Safe OpenSafari application |
|---|---|---|---|
| Flutter DevTools Memory | Allocation timeseries, diff snapshots, GC-aware leak workflows. | GUI/manual; release builds lack VM Service. | Make OpenSafari's `flutter_allocation_profile` leak workflow explicit and thresholded. |
| leak_tracker | Test-time leak assertions around object lifecycle. | Dart package inside target app; OpenSafari cannot require apps to include it. | Provide external VM Service budget checks; recommend leak_tracker only as app-side complement. |
| Patrol | Flutter-first E2E plus native automation; good at native permission/dialog flows. | Requires app/test harness; not a generic MCP runtime dependency. | Mirror the pattern: combine Flutter VM Service + native AX assertions in recipes. |
| Maestro | Semantics-tree, black-box flows, simple YAML, device-level interactions. | Separate runner and DSL; would duplicate OpenSafari orchestration. | Strengthen semantics-first QA and live validation scripts; avoid separate DSL dependency. |
| Appium Flutter Driver | Flutter widget selectors via Appium ecosystem. | Heavy WebDriver/Appium stack; requires app instrumentation. | Keep Flutter VM Service APIs; do not route through Appium. |

**Conclusion:** Mandatory improvement is a Flutter memory budget/live validation recipe and small helper semantics, not importing Patrol/Maestro/Appium.
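A memory budget contract could be as small as the following sketch. It assumes the caller has already collected one post-GC heap size per iteration of a repeated flow (for example from `flutter_allocation_profile` on a debug or profile build); the `MemoryBudget` fields and `checkMemoryBudget` name are hypothetical.

```typescript
// Hypothetical budget contract; field names are illustrative only.
interface MemoryBudget {
  maxGrowthBytes: number; // allowed growth from first to last sample
  maxGrowthPerIterationBytes: number; // allowed average growth per iteration
}

interface BudgetResult {
  passed: boolean;
  growthBytes: number;
  growthPerIterationBytes: number;
  violations: string[];
}

function checkMemoryBudget(postGcHeapBytes: number[], budget: MemoryBudget): BudgetResult {
  if (postGcHeapBytes.length < 2) {
    throw new Error('need at least two samples to measure growth');
  }
  const first = postGcHeapBytes[0] ?? 0;
  const last = postGcHeapBytes[postGcHeapBytes.length - 1] ?? 0;
  const growthBytes = last - first;
  const growthPerIterationBytes = growthBytes / (postGcHeapBytes.length - 1);

  const violations: string[] = [];
  if (growthBytes > budget.maxGrowthBytes) {
    violations.push(`total growth ${growthBytes} B exceeds ${budget.maxGrowthBytes} B`);
  }
  if (growthPerIterationBytes > budget.maxGrowthPerIterationBytes) {
    violations.push(
      `growth per iteration ${growthPerIterationBytes} B exceeds ${budget.maxGrowthPerIterationBytes} B`,
    );
  }
  return { passed: violations.length === 0, growthBytes, growthPerIterationBytes, violations };
}
```

Checking both the total and the per-iteration average catches a slow steady leak that stays under the total limit during a short run.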

### Fast web search engines

| Library | Strengths | Weaknesses / mismatch | OpenSafari action |
|---|---|---|---|
| Typesense / Meilisearch | Fast typo-tolerant indexing, search-as-you-type. | Product search engine, not browser automation core. Adds service dependency. | Out of scope for runtime. Could inspire local artifact search later, but not mandatory. |
| SearXNG | Privacy-preserving metasearch. | Running external metasearch is unrelated to iOS Safari automation. | Do not adopt. |
| Tantivy / Quickwit | Fast indexing/log search. | Rust/service integration heavy. | Only consider if log volume outgrows simple JSON artifacts. Not mandatory. |

**Conclusion:** Fast web search is directionally misaligned for OpenSafari core. The aligned substitute is searchable local trace/report artifacts, not a search engine dependency.
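"Searchable local artifacts" can start as a plain filter over trace JSON, with no index or service. In the sketch below the `TraceEvent` fields are a subset of the `ActionTraceEvent` type added in this change, while the `TraceQuery` shape and `searchTrace` function are hypothetical.

```typescript
// Subset of the event fields from `src/observability/action-trace.ts`.
interface TraceEvent {
  action: string;
  status: 'passed' | 'failed' | 'timeout' | 'skipped';
  deviceId?: string;
  durationMs: number;
  error?: string;
}

// Hypothetical query shape; every field is optional and all given fields must match.
interface TraceQuery {
  status?: TraceEvent['status'][];
  minDurationMs?: number;
  text?: string; // case-insensitive match against action and error
}

function searchTrace(events: TraceEvent[], query: TraceQuery): TraceEvent[] {
  const needle = query.text ? query.text.toLowerCase() : undefined;
  return events.filter((e) => {
    if (query.status && query.status.indexOf(e.status) === -1) return false;
    if (query.minDurationMs !== undefined && e.durationMs < query.minDurationMs) return false;
    if (needle !== undefined) {
      const haystack = `${e.action} ${e.error ?? ''}`.toLowerCase();
      if (haystack.indexOf(needle) === -1) return false;
    }
    return true;
  });
}
```

A linear scan like this is enough while traces stay small; an index would only be worth considering if artifact volume outgrows it.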

### Fast login / authentication libraries

| Library | Strengths | Weaknesses / mismatch | Safe OpenSafari application |
|---|---|---|---|
| SimpleWebAuthn | Clear passkey/WebAuthn ceremony model. | OpenSafari is not an RP/auth server; simulator passkey UX may require system prompts/keychain state. | Add passkey/login validation guidance and prompt-handling recipes; no dependency. |
| Auth.js / Better Auth | Developer-friendly OAuth/session patterns. | Web app auth frameworks, not OpenSafari runtime concerns. | Use only as examples in docs for test-app login flows. |
| Keycloak / ZITADEL / Logto / Ory / SuperTokens | Mature IAM/SSO options. | Heavy infra; adopting them would be out of scope and brittle for a browser automation MCP server. | Do not adopt. Existing #699 covers auth profile persistence; avoid duplicate work. |

**Conclusion:** The mandatory login work is already represented by #699. New work should add non-duplicative validation guidance for login/passkey prompt automation only if it directly supports OpenSafari verification.

## Mandatory improvement candidates

After comparing the libraries to repository direction and existing issues, only the following are mandatory now:

1. **Lightweight OpenSafari action trace artifacts** inspired by Playwright traces and OpenTelemetry spans.
   - Why: failures in simulator/WebKit/native flows need correlated command timing, timeout, context, screenshots/log references, and recovery hints.
   - Risk control: JSON artifact only; no default behavior change; no dependency.

2. **Stable native wait diagnostics** inspired by WebdriverIO/Playwright auto-wait.
   - Why: `app_wait_for` currently reports timeout/query/polls but not last observed candidates, stability windows, or why a visible/enabled condition failed.
   - Risk control: backward-compatible optional parameters and richer JSON response.

3. **Flutter memory budget live validation recipe/tooling** inspired by Flutter DevTools Memory, leak_tracker, and memlab.
   - Why: OpenSafari already exposes allocation profiles and heap snapshots, but merge/post-merge validation needs a repeatable budget contract.
   - Risk control: VM Service only, debug/profile builds only, optional scripts/docs; no app package dependency.

4. **Passkey/login prompt live-validation guidance** inspired by SimpleWebAuthn/Auth.js, but scoped to OpenSafari automation.
   - Why: fast login is central to app QA, but OpenSafari should not become an auth framework. The required work is a validation recipe using existing auth profile, native alert handling, app/webview tools, and explicit decisions.
   - Risk control: documentation/issue contract unless a gap is found; do not duplicate #699.
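For the first candidate, correlating timing with timeouts implies telling a timeout apart from a generic failure. The `ActionTraceStatus` type in this change already includes `timeout`; the sketch below shows one possible way to derive it from a step outcome. The `StepOutcome` shape and `classifyStatus` helper are hypothetical, and matching on the error text is a heuristic.

```typescript
type ActionTraceStatus = 'passed' | 'failed' | 'timeout' | 'skipped';

// Hypothetical step outcome; not a type from the repository.
interface StepOutcome {
  passed: boolean;
  startedAtMs: number;
  endedAtMs: number;
  timeoutMs?: number;
  error?: string;
}

function classifyStatus(outcome: StepOutcome): ActionTraceStatus {
  if (outcome.passed) return 'passed';

  const durationMs = Math.max(0, outcome.endedAtMs - outcome.startedAtMs);
  // A failed step that ran for at least its deadline is treated as a timeout.
  const hitDeadline =
    outcome.timeoutMs !== undefined && outcome.timeoutMs > 0 && durationMs >= outcome.timeoutMs;
  // Heuristic fallback for steps that report a timeout only in the message.
  const saysTimeout = /time(d)?\s?out/i.test(outcome.error ?? '');

  return hitDeadline || saysTimeout ? 'timeout' : 'failed';
}
```

The result can be passed as the `status` of a recorded event, which keeps slow-path failures distinguishable from assertion failures when reading a trace.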

Not mandatory now: Appium/WebDriver re-platforming, Sentry/OpenTelemetry SDK embedding, Typesense/Meilisearch/SearXNG/Tantivy/Quickwit services, Auth.js/Keycloak/ZITADEL/Ory/SuperTokens runtime integrations.

## Sources

- Appium introduction/platform support/capabilities: https://appium.io/docs/en/latest/ and https://appium.github.io/appium.io/docs/en/about-appium/intro/
- WebdriverIO auto-wait/timeouts/protocol docs: https://webdriver.io/docs/autowait/ and https://webdriver.io/docs/timeouts
- Playwright trace viewer and browser docs: https://playwright.dev/docs/trace-viewer-intro and https://playwright.dev/docs/browsers
- Puppeteer docs: https://developer.chrome.com/docs/puppeteer
- Selenium WebDriver BiDi docs: https://www.selenium.dev/documentation/webdriver/bidi/
- OpenTelemetry JS docs: https://opentelemetry.io/docs/languages/js/
- memlab docs: https://facebook.github.io/memlab/docs/intro
- Flutter DevTools Memory docs: https://docs.flutter.dev/tools/devtools/memory
- Maestro Flutter/how-it-works docs: https://docs.maestro.dev/get-started/supported-platform/flutter and https://docs.maestro.dev/get-started/how-maestro-works
- Typesense docs: https://typesense.org/docs/
- Meilisearch docs: https://www.meilisearch.com/docs/
- SimpleWebAuthn docs: https://simplewebauthn.dev/docs/
- Auth.js docs: https://authjs.dev/
135 changes: 135 additions & 0 deletions src/observability/action-trace.ts
@@ -0,0 +1,135 @@
import { promises as fs } from 'fs';
import * as path from 'path';

export type ActionTraceStatus = 'passed' | 'failed' | 'timeout' | 'skipped';
export type ActionTraceContext = 'webkit' | 'native' | 'flutter' | 'simulator' | 'orchestration' | 'unknown';

export interface ActionTraceArtifact {
  kind: 'screenshot' | 'console' | 'network' | 'crash' | 'log' | 'other';
  path: string;
}

export interface ActionTraceEventInput {
  action: string;
  status: ActionTraceStatus;
  context?: ActionTraceContext;
  deviceId?: string;
  startedAtMs: number;
  endedAtMs: number;
  timeoutMs?: number;
  retryCount?: number;
  error?: string;
  metadata?: Record<string, unknown>;
  artifacts?: ActionTraceArtifact[];
}

export interface ActionTraceEvent extends ActionTraceEventInput {
  durationMs: number;
}

export interface ActionTraceDocument {
  version: 1;
  runId: string;
  createdAt: string;
  events: ActionTraceEvent[];
}

const MAX_STRING_LENGTH = 500;
const MAX_METADATA_KEYS = 30;
const SECRET_KEY_PATTERN = /(authorization|cookie|password|secret|token|credential|api[-_]?key)/i;

export class ActionTraceRecorder {
  private readonly events: ActionTraceEvent[] = [];
  private readonly createdAt = new Date().toISOString();

  constructor(private readonly runId: string) {}

  record(input: ActionTraceEventInput): void {
    this.events.push(normalizeEvent(input));
  }

  toJSON(): ActionTraceDocument {
    return {
      version: 1,
      runId: this.runId,
      createdAt: this.createdAt,
      events: [...this.events],
    };
  }

  async write(filePath: string): Promise<void> {
    await writeActionTrace(filePath, this.toJSON());
  }
}

export async function writeActionTrace(
  filePath: string,
  document: ActionTraceDocument,
): Promise<void> {
  await fs.mkdir(path.dirname(filePath), { recursive: true });
  await fs.writeFile(filePath, JSON.stringify(document, null, 2) + '\n', 'utf8');
}

export function normalizeEvent(input: ActionTraceEventInput): ActionTraceEvent {
  const startedAtMs = finiteOrZero(input.startedAtMs);
  const endedAtMs = Math.max(startedAtMs, finiteOrZero(input.endedAtMs));
  return {
    action: truncate(input.action || 'unknown'),
    status: input.status,
    context: input.context ?? 'unknown',
    ...(input.deviceId ? { deviceId: truncate(input.deviceId) } : {}),
    startedAtMs,
    endedAtMs,
    durationMs: endedAtMs - startedAtMs,
    ...(typeof input.timeoutMs === 'number' ? { timeoutMs: Math.max(0, input.timeoutMs) } : {}),
    ...(typeof input.retryCount === 'number' ? { retryCount: Math.max(0, Math.floor(input.retryCount)) } : {}),
    ...(input.error ? { error: truncate(input.error) } : {}),
    ...(input.metadata ? { metadata: sanitizeMetadata(input.metadata) } : {}),
    ...(input.artifacts ? { artifacts: input.artifacts.slice(0, 20).map(sanitizeArtifact) } : {}),
  };
}

export function sanitizeMetadata(metadata: Record<string, unknown>): Record<string, unknown> {
  const out: Record<string, unknown> = {};
  for (const key of Object.keys(metadata).slice(0, MAX_METADATA_KEYS)) {
    if (SECRET_KEY_PATTERN.test(key)) {
      out[key] = '[REDACTED]';
      continue;
    }
    out[key] = sanitizeValue(metadata[key]);
  }
  return out;
}

function sanitizeValue(value: unknown): unknown {
  if (typeof value === 'string') return truncate(value);
  if (typeof value === 'number' || typeof value === 'boolean' || value === null) return value;
  if (Array.isArray(value)) return value.slice(0, 20).map(sanitizeValue);
  if (typeof value === 'object' && value !== null) {
    const out: Record<string, unknown> = {};
    for (const key of Object.keys(value as Record<string, unknown>).slice(0, MAX_METADATA_KEYS)) {
      out[key] = SECRET_KEY_PATTERN.test(key)
        ? '[REDACTED]'
        : sanitizeValue((value as Record<string, unknown>)[key]);
    }
    return out;
  }
  return String(value);
}

function sanitizeArtifact(artifact: ActionTraceArtifact): ActionTraceArtifact {
  return {
    kind: artifact.kind,
    path: truncate(artifact.path),
  };
}

function finiteOrZero(value: number): number {
  return Number.isFinite(value) ? value : 0;
}

function truncate(value: string): string {
  return value.length > MAX_STRING_LENGTH
    ? `${value.slice(0, MAX_STRING_LENGTH)}…`
    : value;
}
21 changes: 21 additions & 0 deletions src/orchestration/scenario-runner.ts
@@ -1,8 +1,11 @@
import { SimulatorPool, PooledSimulator } from '../simulator/pool';
import { ActionTraceRecorder } from '../observability/action-trace';

export interface TestScenario {
  name: string;
  steps: TestStep[];
  /** Optional JSON trace artifact path for action-level live-validation evidence. */
  tracePath?: string;
}

export interface TestStep {
@@ -44,18 +47,36 @@ export class ScenarioRunner {
  async run(scenario: TestScenario): Promise<ScenarioResult> {
    const startTime = Date.now();
    const stepResults: StepResult[] = [];
    const trace = scenario.tracePath ? new ActionTraceRecorder(scenario.name) : null;
    let allPassed = true;

    for (let i = 0; i < scenario.steps.length; i++) {
      const step = scenario.steps[i];
      const result = await this.executeStep(i, step);
      for (const device of result.devices) {
        trace?.record({
          action: `${step.action}:${i}`,
          status: device.passed ? 'passed' : 'failed',
          context: step.action === 'navigate' || step.action === 'assert' ? 'webkit' : 'orchestration',
          deviceId: device.deviceId,
          startedAtMs: startTime + Math.max(0, Date.now() - startTime - device.timing),
          endedAtMs: startTime + Math.max(0, Date.now() - startTime),
          timeoutMs: step.timeout,
          error: device.error,
          metadata: { device: device.device, result: device.result },
        });
      }
      stepResults.push(result);
      if (!result.passed) {
        allPassed = false;
        // Continue executing remaining steps even on failure
      }
    }

    if (trace && scenario.tracePath) {
      await trace.write(scenario.tracePath);
    }

    const duration = Date.now() - startTime;
    const passedSteps = stepResults.filter(s => s.passed).length;
    const totalDevices = this.pool.getAll().length;
50 changes: 50 additions & 0 deletions tests/unit/action-trace.test.ts
@@ -0,0 +1,50 @@
import { mkdtempSync, readFileSync } from 'fs';
import { tmpdir } from 'os';
import * as path from 'path';
import { ActionTraceRecorder, normalizeEvent, sanitizeMetadata } from '../../src/observability/action-trace';

describe('action trace artifacts', () => {
  it('normalizes timing and duration', () => {
    const event = normalizeEvent({
      action: 'tap',
      status: 'passed',
      startedAtMs: 100,
      endedAtMs: 350,
    });
    expect(event.durationMs).toBe(250);
    expect(event.context).toBe('unknown');
  });

  it('redacts secret-like metadata keys recursively', () => {
    expect(sanitizeMetadata({
      authorization: 'Bearer secret',
      nested: { password: 'pw', ok: 'value' },
    })).toEqual({
      authorization: '[REDACTED]',
      nested: { password: '[REDACTED]', ok: 'value' },
    });
  });

  it('writes bounded JSON trace documents', async () => {
    const dir = mkdtempSync(path.join(tmpdir(), 'opensafari-trace-'));
    const tracePath = path.join(dir, 'trace.json');
    const recorder = new ActionTraceRecorder('run-1');
    recorder.record({
      action: 'navigate',
      status: 'timeout',
      context: 'webkit',
      deviceId: 'device-1',
      startedAtMs: 0,
      endedAtMs: 10,
      timeoutMs: 10,
      error: 'timed out',
      metadata: { token: 'secret', url: 'https://example.com' },
    });

    await recorder.write(tracePath);
    const parsed = JSON.parse(readFileSync(tracePath, 'utf8'));
    expect(parsed.version).toBe(1);
    expect(parsed.events).toHaveLength(1);
    expect(parsed.events[0].metadata.token).toBe('[REDACTED]');
  });
});