diff --git a/skills/wolverin0/eye2byte/SKILL.md b/skills/wolverin0/eye2byte/SKILL.md
new file mode 100644
index 00000000000..498be6b82e4
--- /dev/null
+++ b/skills/wolverin0/eye2byte/SKILL.md
@@ -0,0 +1,92 @@
+---
+name: eye2byte
+description: Give your agent eyes — capture screenshots, voice, and annotations from any screen, monitor, or device via MCP.
+version: 0.3.1
+metadata:
+  openclaw:
+    requires:
+      bins:
+        - python
+      anyBins:
+        - ffmpeg
+    emoji: "\U0001F441"
+    homepage: https://github.com/wolverin0/Eye2byte
+    install:
+      - kind: uv
+        package: eye2byte
+        bins: [eye2byte]
+---
+
+# Eye2byte — Screen Context for Your Agent
+
+You have access to Eye2byte, which lets you **see the user's screen**. Use these MCP tools when the user asks you to look at something, debug a visual issue, capture their screen, or when visual context would help you assist them better.
+
+## Available MCP Tools
+
+### `capture_and_summarize`
+Screenshot the user's screen and get a structured analysis.
+
+Parameters:
+- `mode` — `"full"` (default), `"window"`, or `"region"`
+- `monitor` — `0` = active monitor (default), `1`/`2`/`3` = specific monitor, `-1` = ALL monitors at once
+- `delay` — seconds to wait before capturing (useful for menus/tooltips)
+- `window_name` — capture a specific app window by name (e.g., `"chrome"`, `"code"`)
+
+Use this when the user says things like "look at my screen", "what do you see", "debug this", or "what's wrong here".
+
+### `capture_with_voice`
+Screenshot + voice recording + transcription. Returns both visual analysis and what the user said.
+
+Use when the user wants to describe something verbally while showing their screen.
+
+### `record_clip_and_summarize`
+Record a short screen clip, extract keyframes, and analyze the sequence.
+
+Use when the user wants to show you something that changes over time (animations, workflows, step sequences).
+
+### `summarize_screenshot`
+Analyze an existing image file. Pass a file path to get a structured analysis.
+
+### `transcribe_audio`
+Local Whisper transcription of any audio file.
+
+### `get_recent_context`
+Retrieve recent Context Pack summaries from previous captures.
+
+Use this to recall what you've seen recently without re-capturing.
+
+## What You Get Back
+
+Every capture returns a structured **Context Pack**:
+
+```
+Goal — what the user appears to be doing
+Environment — OS, editor, repo, branch, language
+Screen State — visible panels, files, terminal output
+Signals — verbatim errors, stack traces, warnings
+Likely Situation — what's probably happening
+Suggested Next Info — what you should ask or do next
+```
+
+## When to Use Eye2byte
+
+- User mentions something visual ("this button is broken", "the layout is wrong")
+- User asks you to "look at" or "check" something on their screen
+- You need to see error dialogs, UI bugs, or terminal output the user can't easily copy
+- User is debugging and visual context would help your diagnosis
+- User asks you to monitor or watch something
+- You want to verify your changes had the intended visual effect
+
+## Multi-Monitor Tips
+
+- `monitor=-1` captures ALL monitors stitched together — useful for seeing the full workspace
+- `monitor=1`, `2`, `3` for targeting specific displays
+- Default (`monitor=0`) captures whichever monitor has the active window
+
+## Setup
+
+Eye2byte must be running on the machine whose screen you want to capture:
+
+**Local (same machine):** Already configured if this skill loaded.
+
+**Remote (different machine):** The user runs `eye2byte-mcp --sse --token ` on their local machine, and configures the MCP connection URL in openclaw.json.
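
For illustration only, and not part of the added file: the `capture_and_summarize` parameters documented in the patch could be packaged into an MCP `tools/call` request roughly as sketched below. The envelope follows the generic JSON-RPC 2.0 shape used by MCP. The helper name `build_capture_request` and its validation rules are hypothetical, and the sketch assumes the server accepts the argument names exactly as the skill text lists them. Because the patch does not state a default for `delay`, the sketch omits the field unless a non-zero value is given.

```python
import json

# Modes listed in the skill text for capture_and_summarize.
VALID_MODES = {"full", "window", "region"}


def build_capture_request(request_id, mode="full", monitor=0, delay=0, window_name=None):
    """Build a JSON-RPC 2.0 `tools/call` request for capture_and_summarize.

    Hypothetical helper: the name and the validation rules are illustrative
    assumptions, not part of Eye2byte itself.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    if monitor < -1:
        raise ValueError("monitor must be -1 (all), 0 (active), or a positive index")
    if delay < 0:
        raise ValueError("delay must be non-negative")

    arguments = {"mode": mode, "monitor": monitor}
    if delay:
        # Only sent when non-zero, so the server's own default applies otherwise.
        arguments["delay"] = delay
    if window_name is not None:
        arguments["window_name"] = window_name

    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": "capture_and_summarize", "arguments": arguments},
    }


if __name__ == "__main__":
    # Capture all monitors after a 2-second delay (e.g. to open a menu first).
    print(json.dumps(build_capture_request(1, monitor=-1, delay=2), indent=2))
```

Validating on the client side is a design choice for the sketch; the server may enforce different or additional limits, such as the number of available monitors.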