openclaw · wolverin0 · Feb 27, 2026
diff --git a/skills/wolverin0/eye2byte/SKILL.md b/skills/wolverin0/eye2byte/SKILL.md
@@ -0,0 +1,92 @@
+---
+name: eye2byte
+description: Give your agent eyes — capture screenshots, voice, and annotations from any screen, monitor, or device via MCP.
+version: 0.3.1
+metadata:
+  openclaw:
+    requires:
+      bins:
+        - python
+      anyBins:
+        - ffmpeg
+    emoji: "\U0001F441"
+    homepage: https://github.com/wolverin0/Eye2byte
+    install:
+      - kind: uv
+        package: eye2byte
+        bins: [eye2byte]
+---
+
+# Eye2byte — Screen Context for Your Agent
+
+You have access to Eye2byte, which lets you **see the user's screen**. Use these MCP tools when the user asks you to look at something, debug a visual issue, capture their screen, or when visual context would help you assist them better.
+
+## Available MCP Tools
+
+### `capture_and_summarize`
+Screenshot the user's screen and get a structured analysis.
+
+Parameters:
+- `mode` — `"full"` (default), `"window"`, or `"region"`
+- `monitor` — `0` = active monitor (default), `1`/`2`/`3` = specific monitor, `-1` = ALL monitors at once
+- `delay` — seconds to wait before capturing (useful for menus/tooltips)
+- `window_name` — capture a specific app window by name (e.g., `"chrome"`, `"code"`)
+
+Use this when the user says things like "look at my screen", "what do you see", "debug this", or "what's wrong here".
+
+### `capture_with_voice`
+Screenshot + voice recording + transcription. Returns both visual analysis and what the user said.
+
+Use this when the user wants to describe something verbally while showing their screen.
+
+### `record_clip_and_summarize`
+Record a short screen clip, extract keyframes, and analyze the sequence.
+
+Use this when the user wants to show you something that changes over time (animations, workflows, step sequences).
+
+### `summarize_screenshot`
+Analyze an existing image file: pass its file path and get back a structured analysis.
+
+### `transcribe_audio`
+Transcribe any audio file locally using Whisper.
+
+### `get_recent_context`
+Retrieve recent Context Pack summaries from previous captures.
+
+Use this to recall what you've seen recently without re-capturing.
+
+## What You Get Back
+
+Every capture returns a structured **Context Pack**:
+
+```
+Goal           — what the user appears to be doing
+Environment    — OS, editor, repo, branch, language
+Screen State   — visible panels, files, terminal output
+Signals        — verbatim errors, stack traces, warnings
+Likely Situation — what's probably happening
+Suggested Next Info — what you should ask or do next
+```
+
+## When to Use Eye2byte
+
+- User mentions something visual ("this button is broken", "the layout is wrong")
+- User asks you to "look at" or "check" something on their screen
+- You need to see error dialogs, UI bugs, or terminal output the user can't easily copy
+- User is debugging and visual context would help your diagnosis
+- User asks you to monitor or watch something
+- You want to verify your changes had the intended visual effect
+
+## Multi-Monitor Tips
+
+- `monitor=-1` captures ALL monitors stitched together — useful for seeing the full workspace
+- `monitor=1`, `monitor=2`, or `monitor=3` targets a specific display
+- Default (`monitor=0`) captures whichever monitor has the active window
+
+## Setup
+
+Eye2byte must be running on the machine whose screen you want to capture:
+
+**Local (same machine):** Already configured if this skill loaded.
+
+**Remote (different machine):** The user runs `eye2byte-mcp --sse --token <secret>` on the machine whose screen should be captured, then configures the MCP connection URL in `openclaw.json`.
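
The `capture_and_summarize` parameters documented in the skill above could be assembled and sanity-checked before a tool call roughly as follows. This is only a sketch: `build_capture_args` is a hypothetical helper, not part of Eye2byte, and the validation rules (non-negative `delay`, `monitor` no lower than `-1`, optional fields omitted when unset) are assumptions inferred from the parameter descriptions rather than documented behavior.

```python
from typing import Any, Dict, Optional

# Mode values as listed in the skill's parameter description.
VALID_MODES = ("full", "window", "region")


def build_capture_args(
    mode: str = "full",
    monitor: int = 0,
    delay: Optional[float] = None,
    window_name: Optional[str] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build an arguments dict for capture_and_summarize.

    Assumptions (not confirmed by the skill file):
    - monitor values below -1 are invalid (-1 = all, 0 = active, 1+ = specific)
    - delay must be non-negative
    - optional parameters can be left out of the arguments entirely
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    if monitor < -1:
        raise ValueError(f"monitor must be -1 or greater, got {monitor}")
    if delay is not None and delay < 0:
        raise ValueError(f"delay must be non-negative, got {delay}")

    args: Dict[str, Any] = {"mode": mode, "monitor": monitor}
    if delay is not None:
        args["delay"] = delay
    if window_name is not None:
        args["window_name"] = window_name
    return args


if __name__ == "__main__":
    # Capture the Chrome window after a 2-second delay (e.g. to open a menu).
    print(build_capture_args(mode="window", window_name="chrome", delay=2))
    # Capture every monitor at once.
    print(build_capture_args(monitor=-1))
```

The resulting dict is meant to be passed as the tool-call arguments by whatever MCP client is in use; how that call is issued is left out here because it depends on the client.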