diff --git a/docs/roadmap/2026-oss-library-comparison.md b/docs/roadmap/2026-oss-library-comparison.md
new file mode 100644
index 00000000..82355ebd
--- /dev/null
+++ b/docs/roadmap/2026-oss-library-comparison.md
@@ -0,0 +1,113 @@
+# 2026 OSS library comparison for OpenSafari stability, memory, Flutter QA, search, and login
+
+_Last reviewed: 2026-05-14 KST. Source set: official project docs/repos where available._
+
+## Scope and current OpenSafari baseline
+
+OpenSafari is a macOS/iOS-Simulator focused MCP server for iOS Safari, WebKit Remote Debugging Protocol, native Accessibility/SimulatorKit input, and Flutter VM Service inspection. The repository already contains several directionally important surfaces:
+
+- Browser/native automation: `src/webkit/*`, `src/native/*`, `src/tools/app-*`, `src/tools/wait-for.ts`, `src/tools/app-wait-for.ts`.
+- Reliability: `src/reliability/*`, `src/watchdog/*`, private API sentinels, headless smoke workflow, simulator/proxy readiness work.
+- Memory: `src/metrics/memory-tracker.ts`, `src/metrics/heap-snapshot-diff.ts`, `tests/soak/*`, `src/tools/flutter-memory-profile.ts`.
+- Flutter QA: `src/tools/qa-flutter-*`, `src/flutter/vm-service-client.ts`, `src/tools/flutter-*`.
+- Login/auth persistence: `src/auth/manager.ts`, `src/tools/auth.ts`, existing issue #699.
+- Search: no product-facing web-search engine. OpenSafari searches/queries pages and native trees; general web search is outside the core automation runtime.
+
+The safe strategy is therefore not to import large competitors wholesale. OpenSafari should copy proven patterns that reduce flakiness, increase diagnostic evidence, and expose memory/Flutter validation without changing default runtime semantics.
+
+## Comparative analysis
+
+### Browser and mobile automation frameworks
+
+| Library | Strengths OpenSafari can learn from | Weaknesses / mismatch vs OpenSafari | Safe OpenSafari application |
+|---|---|---|---|
+| Appium 2 | Mature mobile-web/native/hybrid abstraction, capability-driven sessions, explicit command timeout (`newCommandTimeout`), broad ecosystem for iOS Safari and Flutter drivers. | Heavy server/driver stack; WebDriver indirection can add latency and hides private Simulator/WebKit details that OpenSafari intentionally controls directly. | Adopt capability/contract style for OpenSafari live validation issues and session health docs; avoid runtime dependency. |
+| WebdriverIO | Auto-waiting around interactable elements, timeout taxonomy, Appium integration, protocol abstraction across WebDriver/BiDi/mobile. | Its model assumes WebDriver sessions; OpenSafari already has direct MCP tools and native bridges. | Improve native `app_wait_for`/action diagnostics with stability windows and timeout metadata; no dependency. |
+| Playwright | Auto-wait, trace viewer, retry-on-failure trace capture, action-level snapshots, console/network correlation. | Desktop WebKit != iOS Safari simulator. Playwright trace format is large and runner-specific. | Add OpenSafari-native lightweight action trace artifacts for failed/long live validations. |
+| Puppeteer | Direct protocol control, useful CDP tracing/perf patterns, low-level browser primitives. | Chrome/CDP-centric; not applicable to iOS Safari/WebKit Remote Debugging Protocol without translation. | Keep direct protocol philosophy; do not adopt as dependency. |
+| Selenium / WebDriver BiDi | Standardization trajectory for bidirectional browser automation and logs/events. | Safari/iOS support still mediated by drivers; less direct than OpenSafari's target. | Track BiDi vocabulary for future event naming, but do not re-platform. |
+
+**Conclusion:** The mandatory improvement is Playwright/WebdriverIO-inspired diagnostics and auto-wait metadata, not Appium/Selenium/Puppeteer adoption.
+
+### Observability and memory tooling
+
+| Library | Strengths OpenSafari can learn from | Weaknesses / mismatch | Safe OpenSafari application |
+|---|---|---|---|
+| OpenTelemetry JS | Standard traces/metrics/logs vocabulary; spans map well to MCP tool calls, simulator boot, proxy readiness, WebKit commands. | SDK/exporter dependencies can be non-trivial and introduce startup/config complexity. | Define an OpenTelemetry-compatible trace schema in docs and JSON artifacts first; optional exporter later. |
+| Sentry | Error grouping, performance traces, crash reporting across Node and Flutter. | External SaaS/self-hosted dependency, privacy concerns, credentials/config burden. | Keep Sentry as optional downstream consumer of structured logs; do not embed. |
+| memlab | Three-snapshot leak detection, class-level heap reasoning, Node/browser snapshot assertions. | Puppeteer/Chromium orientation for browser scenarios; full leak graph analysis can be heavy. | Extend existing heap snapshot diff/memory soak docs with OpenSafari scenario budgets and class-delta thresholds. |
+| Clinic.js | Fast local Node profiling for event loop/flame/heap. | Dev-time tool, not runtime feature. | Document as optional triage command for memory/latency regressions. |
+| Node heap snapshots | Built-in and dependency-free; good for CI artifacts. | Snapshot creation can pause process and double memory temporarily. | Keep behind explicit soak/live validation only; never default-on hot path. |
+| autocannon | Simple HTTP benchmark for transports. | Only applies to HTTP/SSE transport, not stdio/local MCP or simulator latency. | Optional benchmark recipe for HTTP MCP transport; not mandatory now. |
+
+**Conclusion:** Mandatory improvement is a dependency-free OpenSafari memory/trace validation contract that uses existing metrics/heap-snapshot surfaces and avoids default runtime overhead.
+
+### Flutter stability tooling
+
+| Library | Strengths OpenSafari can learn from | Weaknesses / mismatch | Safe OpenSafari application |
+|---|---|---|---|
+| Flutter DevTools Memory | Allocation timeseries, diff snapshots, GC-aware leak workflows. | GUI/manual; release builds lack VM Service. | Make OpenSafari's `flutter_allocation_profile` leak workflow explicit and thresholded. |
+| leak_tracker | Test-time leak assertions around object lifecycle. | Dart package inside target app; OpenSafari cannot require apps to include it. | Provide external VM Service budget checks; recommend leak_tracker only as app-side complement. |
+| Patrol | Flutter-first E2E plus native automation; good at native permission/dialog flows. | Requires app/test harness; not a generic MCP runtime dependency. | Mirror the pattern: combine Flutter VM Service + native AX assertions in recipes. |
+| Maestro | Semantics-tree, black-box flows, simple YAML, device-level interactions. | Separate runner and DSL; would duplicate OpenSafari orchestration. | Strengthen semantics-first QA and live validation scripts; avoid separate DSL dependency. |
+| Appium Flutter Driver | Flutter widget selectors via Appium ecosystem. | Heavy WebDriver/Appium stack; requires app instrumentation. | Keep Flutter VM Service APIs; do not route through Appium. |
+
+**Conclusion:** Mandatory improvement is a Flutter memory budget/live validation recipe and small helper semantics, not importing Patrol/Maestro/Appium.
+
+### Fast web search engines
+
+| Library | Strengths | Weaknesses / mismatch | OpenSafari action |
+|---|---|---|---|
+| Typesense / Meilisearch | Fast typo-tolerant indexing, search-as-you-type. | Product search engine, not browser automation core. Adds service dependency. | Out of scope for runtime. Could inspire local artifact search later, but not mandatory. |
+| SearXNG | Privacy-preserving metasearch. | Running external metasearch is unrelated to iOS Safari automation. | Do not adopt. |
+| Tantivy / Quickwit | Fast indexing/log search. | Rust/service integration heavy. | Only consider if log volume outgrows simple JSON artifacts. Not mandatory. |
+
+**Conclusion:** Fast web search is directionally misaligned for OpenSafari core. The aligned substitute is searchable local trace/report artifacts, not a search engine dependency.
+
+### Fast login / authentication libraries
+
+| Library | Strengths | Weaknesses / mismatch | Safe OpenSafari application |
+|---|---|---|---|
+| SimpleWebAuthn | Clear passkey/WebAuthn ceremony model. | OpenSafari is not an RP/auth server; simulator passkey UX may require system prompts/keychain state. | Add passkey/login validation guidance and prompt-handling recipes; no dependency. |
+| Auth.js / Better Auth | Developer-friendly OAuth/session patterns. | Web app auth frameworks, not OpenSafari runtime concerns. | Use only as examples in docs for test-app login flows. |
+| Keycloak / ZITADEL / Logto / Ory / SuperTokens | Mature IAM/SSO options. | Heavy infra; adopting them would be out of scope and brittle for a browser automation MCP server. | Do not adopt. Existing #699 covers auth profile persistence; avoid duplicate work. |
+
+**Conclusion:** The mandatory login work is already represented by #699. New work should add non-duplicative validation guidance for login/passkey prompt automation only if it directly supports OpenSafari verification.
+
+## Mandatory improvement candidates
+
+After comparing the libraries to repository direction and existing issues, only the following are mandatory now:
+
+1. **Lightweight OpenSafari action trace artifacts** inspired by Playwright traces and OpenTelemetry spans.
+   - Why: failures in simulator/WebKit/native flows need correlated command timing, timeout, context, screenshots/log references, and recovery hints.
+   - Risk control: JSON artifact only; no default behavior change; no dependency.
+
+2. **Stable native wait diagnostics** inspired by WebdriverIO/Playwright auto-wait.
+   - Why: `app_wait_for` currently reports timeout/query/polls but not last observed candidates, stability windows, or why a visible/enabled condition failed.
+   - Risk control: backward-compatible optional parameters and richer JSON response.
+
+3. **Flutter memory budget live validation recipe/tooling** inspired by Flutter DevTools Memory, leak_tracker, and memlab.
+   - Why: OpenSafari already exposes allocation profiles and heap snapshots, but merge/post-merge validation needs a repeatable budget contract.
+   - Risk control: VM Service only, debug/profile builds only, optional scripts/docs; no app package dependency.
+
+4. **Passkey/login prompt live-validation guidance** inspired by SimpleWebAuthn/Auth.js, but scoped to OpenSafari automation.
+   - Why: fast login is central to app QA, but OpenSafari should not become an auth framework. The required work is a validation recipe using existing auth profile, native alert handling, app/webview tools, and explicit decisions.
+   - Risk control: documentation/issue contract unless a gap is found; do not duplicate #699.
+
+Not mandatory now: Appium/WebDriver re-platforming, Sentry/OpenTelemetry SDK embedding, Typesense/Meilisearch/SearXNG/Tantivy/Quickwit services, Auth.js/Keycloak/ZITADEL/Ory/SuperTokens runtime integrations.
+
+## Sources
+
+- Appium introduction/platform support/capabilities: https://appium.io/docs/en/latest/ and https://appium.github.io/appium.io/docs/en/about-appium/intro/
+- WebdriverIO auto-wait/timeouts/protocol docs: https://webdriver.io/docs/autowait/ and https://webdriver.io/docs/timeouts
+- Playwright trace viewer and browser docs: https://playwright.dev/docs/trace-viewer-intro and https://playwright.dev/docs/browsers
+- Puppeteer docs: https://developer.chrome.com/docs/puppeteer
+- Selenium WebDriver BiDi docs: https://www.selenium.dev/documentation/webdriver/bidi/
+- OpenTelemetry JS docs: https://opentelemetry.io/docs/languages/js/
+- memlab docs: https://facebook.github.io/memlab/docs/intro
+- Flutter DevTools Memory docs: https://docs.flutter.dev/tools/devtools/memory
+- Maestro Flutter/how-it-works docs: https://docs.maestro.dev/get-started/supported-platform/flutter and https://docs.maestro.dev/get-started/how-maestro-works
+- Typesense docs: https://typesense.org/docs/
+- Meilisearch docs: https://www.meilisearch.com/docs/
+- SimpleWebAuthn docs: https://simplewebauthn.dev/docs/
+- Auth.js docs: https://authjs.dev/
diff --git a/src/observability/action-trace.ts b/src/observability/action-trace.ts
new file mode 100644
index 00000000..0bd9b741
--- /dev/null
+++ b/src/observability/action-trace.ts
@@ -0,0 +1,135 @@
+import { promises as fs } from 'fs';
+import * as path from 'path';
+
+export type ActionTraceStatus = 'passed' | 'failed' | 'timeout' | 'skipped';
+export type ActionTraceContext = 'webkit' | 'native' | 'flutter' | 'simulator' | 'orchestration' | 'unknown';
+
+export interface ActionTraceArtifact {
+  kind: 'screenshot' | 'console' | 'network' | 'crash' | 'log' | 'other';
+  path: string;
+}
+
+export interface ActionTraceEventInput {
+  action: string;
+  status: ActionTraceStatus;
+  context?: ActionTraceContext;
+  deviceId?: string;
+  startedAtMs: number;
+  endedAtMs: number;
+  timeoutMs?: number;
+  retryCount?: number;
+  error?: string;
+  metadata?: Record<string, unknown>;
+  artifacts?: ActionTraceArtifact[];
+}
+
+export interface ActionTraceEvent extends ActionTraceEventInput {
+  durationMs: number;
+}
+
+export interface ActionTraceDocument {
+  version: 1;
+  runId: string;
+  createdAt: string;
+  events: ActionTraceEvent[];
+}
+
+const MAX_STRING_LENGTH = 500;
+const MAX_METADATA_KEYS = 30;
+const SECRET_KEY_PATTERN = /(authorization|cookie|password|secret|token|credential|api[-_]?key)/i;
+
+export class ActionTraceRecorder {
+  private readonly events: ActionTraceEvent[] = [];
+  private readonly createdAt = new Date().toISOString();
+
+  constructor(private readonly runId: string) {}
+
+  record(input: ActionTraceEventInput): void {
+    this.events.push(normalizeEvent(input));
+  }
+
+  toJSON(): ActionTraceDocument {
+    return {
+      version: 1,
+      runId: this.runId,
+      createdAt: this.createdAt,
+      events: [...this.events],
+    };
+  }
+
+  async write(filePath: string): Promise<void> {
+    await writeActionTrace(filePath, this.toJSON());
+  }
+}
+
+export async function writeActionTrace(
+  filePath: string,
+  document: ActionTraceDocument,
+): Promise<void> {
+  await fs.mkdir(path.dirname(filePath), { recursive: true });
+  await fs.writeFile(filePath, JSON.stringify(document, null, 2) + '\n', 'utf8');
+}
+
+export function normalizeEvent(input: ActionTraceEventInput): ActionTraceEvent {
+  const startedAtMs = finiteOrZero(input.startedAtMs);
+  const endedAtMs = Math.max(startedAtMs, finiteOrZero(input.endedAtMs));
+  return {
+    action: truncate(input.action || 'unknown'),
+    status: input.status,
+    context: input.context ?? 'unknown',
+    ...(input.deviceId ? { deviceId: truncate(input.deviceId) } : {}),
+    startedAtMs,
+    endedAtMs,
+    durationMs: endedAtMs - startedAtMs,
+    ...(typeof input.timeoutMs === 'number' ? { timeoutMs: Math.max(0, input.timeoutMs) } : {}),
+    ...(typeof input.retryCount === 'number' ? { retryCount: Math.max(0, Math.floor(input.retryCount)) } : {}),
+    ...(input.error ? { error: truncate(input.error) } : {}),
+    ...(input.metadata ? { metadata: sanitizeMetadata(input.metadata) } : {}),
+    ...(input.artifacts ? { artifacts: input.artifacts.slice(0, 20).map(sanitizeArtifact) } : {}),
+  };
+}
+
+export function sanitizeMetadata(metadata: Record<string, unknown>): Record<string, unknown> {
+  const out: Record<string, unknown> = {};
+  for (const key of Object.keys(metadata).slice(0, MAX_METADATA_KEYS)) {
+    if (SECRET_KEY_PATTERN.test(key)) {
+      out[key] = '[REDACTED]';
+      continue;
+    }
+    out[key] = sanitizeValue(metadata[key]);
+  }
+  return out;
+}
+
+function sanitizeValue(value: unknown): unknown {
+  if (typeof value === 'string') return truncate(value);
+  if (typeof value === 'number' || typeof value === 'boolean' || value === null) return value;
+  if (Array.isArray(value)) return value.slice(0, 20).map(sanitizeValue);
+  if (typeof value === 'object' && value !== null) {
+    const out: Record<string, unknown> = {};
+    for (const key of Object.keys(value as Record<string, unknown>).slice(0, MAX_METADATA_KEYS)) {
+      out[key] = SECRET_KEY_PATTERN.test(key)
+        ? '[REDACTED]'
+        : sanitizeValue((value as Record<string, unknown>)[key]);
+    }
+    return out;
+  }
+  return String(value);
+}
+
+function sanitizeArtifact(artifact: ActionTraceArtifact): ActionTraceArtifact {
+  return {
+    kind: artifact.kind,
+    path: truncate(artifact.path),
+  };
+}
+
+function finiteOrZero(value: number): number {
+  return Number.isFinite(value) ? value : 0;
+}
+
+function truncate(value: string): string {
+  return value.length > MAX_STRING_LENGTH
+    ? `${value.slice(0, MAX_STRING_LENGTH)}…`
+    : value;
+}
diff --git a/src/orchestration/scenario-runner.ts b/src/orchestration/scenario-runner.ts
index 6db3b76d..d116376f 100644
--- a/src/orchestration/scenario-runner.ts
+++ b/src/orchestration/scenario-runner.ts
@@ -1,8 +1,11 @@
 import { SimulatorPool, PooledSimulator } from '../simulator/pool';
+import { ActionTraceRecorder } from '../observability/action-trace';
 
 export interface TestScenario {
   name: string;
   steps: TestStep[];
+  /** Optional JSON trace artifact path for action-level live-validation evidence. */
+  tracePath?: string;
 }
 
 export interface TestStep {
@@ -44,11 +47,25 @@ export class ScenarioRunner {
   async run(scenario: TestScenario): Promise {
     const startTime = Date.now();
     const stepResults: StepResult[] = [];
+    const trace = scenario.tracePath ? new ActionTraceRecorder(scenario.name) : null;
     let allPassed = true;
 
     for (let i = 0; i < scenario.steps.length; i++) {
       const step = scenario.steps[i];
       const result = await this.executeStep(i, step);
+      for (const device of result.devices) {
+        trace?.record({
+          action: `${step.action}:${i}`,
+          status: device.passed ? 'passed' : 'failed',
+          context: step.action === 'navigate' || step.action === 'assert' ? 'webkit' : 'orchestration',
+          deviceId: device.deviceId,
+          startedAtMs: startTime + Math.max(0, Date.now() - startTime - device.timing),
+          endedAtMs: startTime + Math.max(0, Date.now() - startTime),
+          timeoutMs: step.timeout,
+          error: device.error,
+          metadata: { device: device.device, result: device.result },
+        });
+      }
       stepResults.push(result);
       if (!result.passed) {
         allPassed = false;
@@ -56,6 +73,10 @@ export class ScenarioRunner {
       }
     }
+
+    if (trace && scenario.tracePath) {
+      await trace.write(scenario.tracePath);
+    }
+
     const duration = Date.now() - startTime;
     const passedSteps = stepResults.filter(s => s.passed).length;
     const totalDevices = this.pool.getAll().length;
diff --git a/tests/unit/action-trace.test.ts b/tests/unit/action-trace.test.ts
new file mode 100644
index 00000000..64a7291f
--- /dev/null
+++ b/tests/unit/action-trace.test.ts
@@ -0,0 +1,50 @@
+import { mkdtempSync, readFileSync } from 'fs';
+import { tmpdir } from 'os';
+import * as path from 'path';
+import { ActionTraceRecorder, normalizeEvent, sanitizeMetadata } from '../../src/observability/action-trace';
+
+describe('action trace artifacts', () => {
+  it('normalizes timing and duration', () => {
+    const event = normalizeEvent({
+      action: 'tap',
+      status: 'passed',
+      startedAtMs: 100,
+      endedAtMs: 350,
+    });
+    expect(event.durationMs).toBe(250);
+    expect(event.context).toBe('unknown');
+  });
+
+  it('redacts secret-like metadata keys recursively', () => {
+    expect(sanitizeMetadata({
+      authorization: 'Bearer secret',
+      nested: { password: 'pw', ok: 'value' },
+    })).toEqual({
+      authorization: '[REDACTED]',
+      nested: { password: '[REDACTED]', ok: 'value' },
+    });
+  });
+
+  it('writes bounded JSON trace documents', async () => {
+    const dir = mkdtempSync(path.join(tmpdir(), 'opensafari-trace-'));
+    const tracePath = path.join(dir, 'trace.json');
+    const recorder = new ActionTraceRecorder('run-1');
+    recorder.record({
+      action: 'navigate',
+      status: 'timeout',
+      context: 'webkit',
+      deviceId: 'device-1',
+      startedAtMs: 0,
+      endedAtMs: 10,
+      timeoutMs: 10,
+      error: 'timed out',
+      metadata: { token: 'secret', url: 'https://example.com' },
+    });
+
+    await recorder.write(tracePath);
+    const parsed = JSON.parse(readFileSync(tracePath, 'utf8'));
+    expect(parsed.version).toBe(1);
+    expect(parsed.events).toHaveLength(1);
+    expect(parsed.events[0].metadata.token).toBe('[REDACTED]');
+  });
+});
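
The roadmap document in this change names "searchable local trace/report artifacts" as the aligned substitute for a search engine. As a rough illustration of what a consumer of the JSON artifact could look like, the sketch below summarizes failing and slow events. It is not part of this change: `summarizeTrace`, `TraceSummary`, and the trimmed `TraceEvent`/`TraceDocument` types are hypothetical names that mirror only the fields read here, and the sample data is invented for the example.

```typescript
// Hypothetical consumer of the trace artifact (not part of this diff).
// These types mirror only the ActionTraceDocument fields the sketch reads.
interface TraceEvent {
  action: string;
  status: 'passed' | 'failed' | 'timeout' | 'skipped';
  durationMs: number;
  deviceId?: string;
  error?: string;
}

interface TraceDocument {
  version: 1;
  runId: string;
  events: TraceEvent[];
}

interface TraceSummary {
  runId: string;
  total: number;
  failing: TraceEvent[];
  slowest: TraceEvent | null;
}

// Collect failed/timed-out events and the single slowest event of a run.
function summarizeTrace(doc: TraceDocument): TraceSummary {
  const failing = doc.events.filter(
    (e) => e.status === 'failed' || e.status === 'timeout',
  );
  const slowest = doc.events.reduce<TraceEvent | null>(
    (best, e) => (best === null || e.durationMs > best.durationMs ? e : best),
    null,
  );
  return { runId: doc.runId, total: doc.events.length, failing, slowest };
}

// Invented sample data, shaped like the `${step.action}:${i}` action names
// that the scenario runner records.
const sample: TraceDocument = {
  version: 1,
  runId: 'login-smoke',
  events: [
    { action: 'tap:0', status: 'passed', durationMs: 120, deviceId: 'device-1' },
    { action: 'navigate:1', status: 'timeout', durationMs: 5000, deviceId: 'device-1', error: 'timed out' },
    { action: 'assert:2', status: 'skipped', durationMs: 0, deviceId: 'device-1' },
  ],
};

const summary = summarizeTrace(sample);
console.log(summary.failing.map((e) => `${e.action} (${e.status})`));
console.log(summary.slowest?.action);
```

In practice the document would come from `JSON.parse` of the file written via `tracePath`; a real consumer would probably also validate `version` before reading events.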