108 changes: 108 additions & 0 deletions CLAUDE.md
@@ -0,0 +1,108 @@
# CLAUDE.md

This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

## Project Overview

Oracle (`@steipete/oracle`) is a CLI tool that wraps OpenAI's Responses API to query multiple AI models (GPT-5.x, Gemini 3.x, Claude 4.x) with file context. It supports API mode, browser automation (ChatGPT/Gemini via Chrome DevTools Protocol), MCP server integration, and remote bridge execution.

## Commands

```bash
# Package manager: pnpm (10.23.0)
pnpm install # Install dependencies

# Build
pnpm run build # TypeScript compile + copy vendor files

# Lint & Format (oxlint + oxfmt, NOT ESLint/Prettier)
pnpm run check # format:check + lint (runs in CI)
pnpm run lint # typecheck + oxlint
pnpm run lint:fix # oxlint --fix + oxfmt
pnpm run format # oxfmt --write
pnpm run typecheck # tsc --noEmit

# Tests (Vitest)
pnpm test # Run all unit tests
pnpm vitest run tests/oracle/run.test.ts # Single test file
pnpm vitest run -t "test name pattern" # Single test by name
pnpm test:coverage # Unit tests with v8 coverage
pnpm test:mcp # Build + MCP unit + mcporter integration
pnpm test:browser # Browser automation smokes (needs Chrome on port 45871)
ORACLE_LIVE_TEST=1 pnpm test:live # Live API tests (costs real tokens)
ORACLE_LIVE_TEST=1 pnpm test:pro # Pro model tests (10+ min)
```

## Architecture

```
bin/
  oracle-cli.ts                      # CLI entry point (commander-based, 1700+ lines)
  oracle-mcp.ts                      # MCP server entry point

src/
  oracle/                            # Core engine
    run.ts                           # Main orchestrator: assembles prompt, calls API, streams response
    client.ts                        # API client factory (OpenAI, Azure, Gemini, custom endpoints)
    modelResolver.ts                 # Model name → provider routing logic
    files.ts                         # File globbing + token estimation
    multiModelRunner.ts              # Parallel multi-model execution
    gemini.ts / claude.ts            # Provider-specific adapters

  browser/                           # Chrome DevTools Protocol automation
    index.ts                         # Core browser orchestrator (largest file)
    chromeLifecycle.ts               # Chrome launch/teardown via chrome-launcher
    cookies.ts                       # Cookie sync (sweet-cookie for macOS Keychain)
    reattach.ts                      # Session recovery on navigation/crash
    actions/                         # DOM interaction modules
      assistantResponse.ts           # Capture AI response from page
      attachments.ts                 # File/image upload automation
      promptComposer.ts              # Type prompt into chat input
      modelSelection.ts              # Pick model from ChatGPT dropdown
      navigation.ts                  # URL/iframe handling
    providers/                       # DOM selector definitions per site
      chatgptDomProvider.ts
      geminiDeepThinkDomProvider.ts

  cli/                               # CLI layer
    options.ts                       # Commander option definitions
    sessionRunner.ts                 # Executes a single oracle run
    sessionDisplay.ts                # Terminal output rendering
    browserConfig.ts                 # Browser flag aggregation
    tui/                             # Interactive terminal UI (excluded from coverage)

  gemini-web/                        # Browser-based Gemini client (no API key needed)
  remote/                            # Remote Chrome bridge (server + client)
  bridge/                            # MCP/Codex bridge connection
  mcp/                               # Model Context Protocol server + tools
  sessionManager.ts                  # Session CRUD (stored in ~/.oracle/sessions/)
  config.ts                          # Global config (~/.oracle/config.json, JSON5)
```

### Key Patterns

- **Engine selection**: API (default when `OPENAI_API_KEY` set) vs Browser (Chrome automation). Controlled by `--engine api|browser` or `ORACLE_ENGINE` env var.
- **Model routing**: `modelResolver.ts` maps model strings to providers. Supports OpenAI, Azure OpenAI, Gemini (API + web), Claude, OpenRouter, Grok, and custom endpoints.
- **Session persistence**: Every run creates a session under `~/.oracle/sessions/<id>/` with metadata, prompt, and response. Sessions can be listed (`oracle status`), replayed (`oracle session <id>`), or restarted (`oracle restart <id>`).
- **Path aliases**: `@src/*` → `src/*`, `@tests/*` → `tests/*` (configured in tsconfig.json and vitest.config.ts).
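
A minimal sketch of the engine-selection rule described above. The function and type names are hypothetical and not taken from the repository. Two details are assumptions rather than documented behavior: that `--engine` takes precedence over `ORACLE_ENGINE`, and that the browser engine is the fallback when no `OPENAI_API_KEY` is present.

```typescript
type Engine = "api" | "browser";

// Hypothetical input shape: the raw --engine value plus an env snapshot.
interface EngineInputs {
  cliEngine?: string;
  env: Record<string, string | undefined>;
}

function isEngine(value: string): value is Engine {
  return value === "api" || value === "browser";
}

function resolveEngine({ cliEngine, env }: EngineInputs): Engine {
  // Assumption: an explicit CLI flag wins over the environment variable.
  const explicit = cliEngine ?? env.ORACLE_ENGINE;
  if (explicit !== undefined) {
    if (!isEngine(explicit)) {
      throw new Error(`Unknown engine "${explicit}" (expected "api" or "browser").`);
    }
    return explicit;
  }
  // Documented default: API when OPENAI_API_KEY is set.
  // Assumption: browser otherwise.
  return env.OPENAI_API_KEY ? "api" : "browser";
}
```

Passing the environment in as a plain object, instead of reading `process.env` directly, keeps the rule easy to unit-test.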

## Code Style

- **Formatter**: oxfmt — 2 spaces, 100 char width, double quotes, trailing commas, semicolons.
- **Linter**: oxlint with plugins: unicorn, typescript, oxc. Categories correctness/perf/suspicious = error.
- **TypeScript**: Strict mode, ES2022 target, ESNext modules, bundler resolution.
- **Module system**: ESM (`"type": "module"` in package.json). Use `.ts` extensions in imports.

## Testing Notes

- Test setup (`tests/setup-env.ts`) injects fake API keys and isolates session storage to `/tmp/oracle-tests-{pid}`. Non-live tests never hit real APIs.
- Live tests are opt-in via `ORACLE_LIVE_TEST=1` env var and require real API keys.
- Browser smoke tests expect Chrome on DevTools port 45871.
- MCP tests require building first (`pnpm run build`).

## AGENTS.md Highlights

- CLI banner uses the oracle emoji: `🧿 oracle (<version>) ...` — only on initial headline and TUI exit.
- Browser Pro runs: never click "Answer now" — wait for the real response (up to 10 min).
- Before release, check `docs/manual-tests.md` for relevant smoke tests.
- After finishing a feature, update CHANGELOG if it affects end users (read top ~100 lines first, group related edits).
19 changes: 19 additions & 0 deletions bin/oracle-cli.ts
@@ -142,6 +142,7 @@ interface CliOptions extends OptionValues {
  browserManualLogin?: boolean;
  browserManualLoginProfileDir?: string;
  browserThinkingTime?: "light" | "standard" | "extended" | "heavy";
  deepResearch?: boolean;
  browserAllowCookieErrors?: boolean;
  browserAttachments?: string;
  browserInlineFiles?: boolean;
@@ -592,6 +593,13 @@ program
      .choices(["light", "standard", "extended", "heavy"])
      .hideHelp(),
  )
  .option(
    "--deep-research",
    "Use ChatGPT Deep Research mode (browser engine only). " +
      "Activates autonomous web research that takes 5-30 minutes. " +
      "Requires ChatGPT Plus or Pro subscription.",
    false,
  )
  .addOption(
    new Option(
      "--browser-allow-cookie-errors",
@@ -1328,6 +1336,17 @@ async function runRootCommand(options: CliOptions): Promise<void> {
    options.baseUrl = userConfig.apiBaseUrl;
  }

  // --deep-research implies browser engine and validates constraints
  if (options.deepResearch) {
    if (engine !== "browser" && preferredEngine === "api") {
      throw new Error("--deep-research requires --engine browser.");
    }
    engine = "browser";
    if (options.models && options.models.length > 0) {
      throw new Error("--deep-research cannot be combined with --models (multi-model runs).");
    }
  }

  if (remoteHost && engine !== "browser") {
    throw new Error("--remote-host requires --engine browser.");
  }
117 changes: 117 additions & 0 deletions docs/deep-research-plan/00-overview.md
@@ -0,0 +1,117 @@
# Deep Research Browser Automation — Implementation Plan

## Goal

Add ChatGPT Deep Research support to Oracle's browser automation engine, enabling users to trigger Deep Research from the CLI and receive structured research reports — all using their existing ChatGPT subscription (no API cost).

## Motivation

- ChatGPT Deep Research is a powerful autonomous research agent that browses the web for 5-30 minutes and produces comprehensive cited reports
- OpenAI offers a Deep Research API (`o3-deep-research`, `o4-mini-deep-research`), but it is billed per token, at roughly $10 per million input tokens and $40 per million output tokens
- Users with ChatGPT Plus/Pro subscriptions already have Deep Research included — browser automation lets them use it programmatically at no extra cost
- Oracle already has mature ChatGPT browser automation; extending it for Deep Research is a natural fit

## Usage

```bash
# Basic Deep Research
oracle --deep-research -p "Research the latest trends in AI agent frameworks in 2026"

# With file context
oracle --deep-research -p "Analyze this codebase architecture" --file "src/**/*.ts"

# With custom timeout (default 40 minutes)
oracle --deep-research --timeout 60m -p "Comprehensive market analysis of EV industry"
```
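
As an illustration of how a value such as `60m` could be turned into milliseconds with the 40-minute default, here is a hedged sketch. `parseTimeoutMs` and the accepted unit suffixes are hypothetical. Oracle's actual `--timeout` parser may accept different formats.

```typescript
// Default taken from the plan: 40 minutes for Deep Research runs.
const DEFAULT_DEEP_RESEARCH_TIMEOUT_MS = 40 * 60 * 1000;

// Assumption: only these suffixes are supported in this sketch.
const UNIT_MS: Record<string, number> = { ms: 1, s: 1_000, m: 60_000, h: 3_600_000 };

function parseTimeoutMs(raw?: string): number {
  // No flag given: fall back to the Deep Research default.
  if (raw === undefined) return DEFAULT_DEEP_RESEARCH_TIMEOUT_MS;
  const match = /^(\d+(?:\.\d+)?)(ms|s|m|h)$/.exec(raw.trim());
  if (!match) {
    throw new Error(`Invalid timeout "${raw}" (expected e.g. "90s", "60m", "1h").`);
  }
  // match[1] is the numeric amount, match[2] the unit suffix.
  return Math.round(Number(match[1]) * UNIT_MS[match[2]]);
}
```

Rejecting bare numbers avoids guessing whether `60` means seconds or minutes.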

## Architecture Decision: Iframe Handling

The research plan confirmation UI renders in a **cross-origin iframe** (640x400px), making direct DOM manipulation from the main page impossible. Three options were evaluated:

| Option | Approach | Complexity | Robustness |
|--------|----------|------------|------------|
| **A. Wait for auto-confirm** | Start button has ~60s countdown that auto-confirms | Low | High |
| B. CDP iframe targeting | Use `Target.getTargets()` to find iframe execution context | High | Medium |
| C. Coordinate-based clicking | Use `Input.dispatchMouseEvent` at computed coordinates | Medium | Low |

**Decision: Option A.** The auto-confirm countdown removes the need to interact with the iframe at all. Once the iframe is detected, wait about 70 seconds (the ~60-second countdown plus a margin) for auto-confirmation. This is the most robust approach and matches what happens when a user leaves the plan untouched.
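
The wait in Option A can be sketched as a pure elapsed-time check plus a small polling loop. All names here are hypothetical. The 70-second constant mirrors the figure in the decision above, and the sketch only presumes confirmation from elapsed time, since it never reads the cross-origin iframe.

```typescript
// Assumption: the ~70 s wait (60 s countdown plus margin) from the decision above.
const AUTO_CONFIRM_WAIT_MS = 70_000;

type ConfirmWaitState = "waiting" | "elapsed";

/** Pure check: has the grace period passed since the plan iframe was first seen? */
function confirmWaitState(
  iframeSeenAtMs: number,
  nowMs: number,
  waitMs: number = AUTO_CONFIRM_WAIT_MS,
): ConfirmWaitState {
  return nowMs - iframeSeenAtMs >= waitMs ? "elapsed" : "waiting";
}

/** Polls the clock until the grace period has elapsed. */
async function waitForAutoConfirm(
  iframeSeenAtMs: number,
  pollMs: number = 1_000,
  now: () => number = Date.now,
): Promise<void> {
  while (confirmWaitState(iframeSeenAtMs, now()) === "waiting") {
    await new Promise<void>((resolve) => setTimeout(resolve, pollMs));
  }
}
```

Keeping the time check separate from the loop lets the threshold be tested without real timers.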

## Implementation Phases

| Phase | Scope | Doc |
|-------|-------|-----|
| 1 | Types, Config, CLI Flag | [01-types-and-config.md](01-types-and-config.md) |
| 2 | Core Action Module (`deepResearch.ts`) | [02-core-actions.md](02-core-actions.md) |
| 3 | Main Flow Integration (`index.ts`) | [03-flow-integration.md](03-flow-integration.md) |
| 4 | Reattach & Session Support | [04-reattach-and-sessions.md](04-reattach-and-sessions.md) |
| 5 | Testing Strategy | [05-testing.md](05-testing.md) |

## UI Flow (Discovered via Live Exploration)

```
┌─────────────────────────────────────────────────────────────┐
│ Phase 1: Activate Deep Research Mode │
│ │
│ [+] button → radix dropdown → "Deep research" item │
│ Result: "Thinking" pill → "Deep research" pill │
│ + "Apps" and "Sites" buttons appear │
└────────────────────────┬────────────────────────────────────┘
┌────────────────────────▼────────────────────────────────────┐
│ Phase 2: Submit Prompt │
│ │
│ Type prompt in textbox → click [send-button] │
│ URL changes to /c/{conversation-id} │
└────────────────────────┬────────────────────────────────────┘
┌────────────────────────▼────────────────────────────────────┐
│ Phase 3: Research Plan (CROSS-ORIGIN IFRAME) │
│ │
│ ┌──────────────────────────────────────┐ │
│ │ "AI agent frameworks trends" │ │
│ │ ○ Survey academic papers... │ │
│ │ ○ Review documentation... │ │
│ │ ○ Analyze blog posts... │ │
│ │ │ │
│ │ [Edit] [Cancel] [Start (53)] │ │
│ └──────────────────────────────────────┘ │
│ Auto-confirms after ~60 second countdown │
└────────────────────────┬────────────────────────────────────┘
┌────────────────────────▼────────────────────────────────────┐
│ Phase 4: Research Execution (5-30 minutes) │
│ │
│ Status updates in iframe: "Researching..." │
│ "Considering methods for framework comparison..." │
│ [Update] button visible in iframe │
└────────────────────────┬────────────────────────────────────┘
┌────────────────────────▼────────────────────────────────────┐
│ Phase 5: Report Complete │
│ │
│ Iframe disappears, full markdown report in conversation │
│ Copy/Rate buttons appear (FINISHED_ACTIONS_SELECTOR) │
│ Extract text via existing assistantResponse.ts │
└─────────────────────────────────────────────────────────────┘
```
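
One way to reason about the five phases is to classify the page from signals that are observable on the main document. The sketch below is illustrative only: the signal and phase names are hypothetical, and phases 3 and 4 are merged because both happen inside the cross-origin iframe and cannot be told apart from outside it.

```typescript
// Hypothetical snapshot of what the main page exposes at a given moment.
interface PageSignals {
  url: string;
  deepResearchPillVisible: boolean;
  iframePresent: boolean;
  finishedActionsVisible: boolean; // copy/rate buttons rendered
}

type FlowPhase =
  | "idle"
  | "mode-active" // Phase 1: Deep research pill shown
  | "submitted" // Phase 2: URL is /c/{conversation-id}
  | "plan-or-research" // Phases 3-4: iframe present
  | "report-complete"; // Phase 5: iframe gone, finished actions visible

function classifyPhase(s: PageSignals): FlowPhase {
  // Check the latest phases first so earlier signals cannot mask them.
  if (s.finishedActionsVisible && !s.iframePresent) return "report-complete";
  if (s.iframePresent) return "plan-or-research";
  if (/\/c\/[^\/?#]+/.test(s.url)) return "submitted";
  if (s.deepResearchPillVisible) return "mode-active";
  return "idle";
}
```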

## Key DOM Selectors

| Element | Selector | Notes |
|---------|----------|-------|
| "+" button | `[data-testid="composer-plus-btn"]` | Opens radix dropdown |
| Deep Research menu item | `[data-radix-collection-item]` text="Deep research" | No `data-testid` |
| Deep Research pill | `.__composer-pill-composite` with aria "Deep research" | Replaces Thinking pill |
| Send button | `[data-testid="send-button"]` | Same as normal chat |
| Research plan iframe | `iframe.h-full.w-full` inside assistant turn | Cross-origin |
| Completion indicator | `FINISHED_ACTIONS_SELECTOR` (copy/rate buttons) | Existing constant |
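
Because the Deep Research menu item has no `data-testid`, it has to be located by its text. The sketch below builds a page-side expression that does this. It assumes the expression is sent to the page as a string (for example through CDP's `Runtime.evaluate`), and `buildMenuClickExpression` is a hypothetical helper, not the module's real API.

```typescript
// Selector taken from the table above.
const MENU_ITEM_SELECTOR = "[data-radix-collection-item]";

/**
 * Returns a JavaScript expression (as a string) that clicks the first menu
 * item whose trimmed, case-insensitive text equals `label`. The expression
 * evaluates to true when an item was clicked and false otherwise.
 */
function buildMenuClickExpression(label: string): string {
  // JSON.stringify safely quotes the values embedded in the expression.
  const wanted = JSON.stringify(label.trim().toLowerCase());
  const selector = JSON.stringify(MENU_ITEM_SELECTOR);
  return `(() => {
    const items = Array.from(document.querySelectorAll(${selector}));
    const match = items.find(
      (el) => (el.textContent || "").trim().toLowerCase() === ${wanted},
    );
    if (!match) return false;
    match.click();
    return true;
  })()`;
}
```

Returning a boolean lets the caller distinguish "item not found" (for example, Deep Research unavailable for the account tier) from a successful click.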

## Risks and Mitigations

| Risk | Mitigation |
|------|-----------|
| ChatGPT changes Deep Research UI selectors | Use text-match "Deep research" as primary; multiple fallback selectors |
| Auto-confirm timer changes | Detect confirmation via iframe state change, not fixed timer |
| Research exceeds timeout | Default 40min timeout; `--timeout` override; reattach mechanism for interrupted runs |
| "+" button `data-testid` changes | Fallback: `button[aria-label*="Add files"]`, positional matching |
| Deep Research unavailable for account tier | Clear error message with subscription requirement info |