Closed
92 changes: 92 additions & 0 deletions skills/wolverin0/eye2byte/SKILL.md
@@ -0,0 +1,92 @@
---
name: eye2byte
description: Give your agent eyes — capture screenshots, voice, and annotations from any screen, monitor, or device via MCP.
version: 0.3.1
metadata:
  openclaw:
    requires:
      bins:
        - python
      anyBins:
        - ffmpeg
    emoji: "\U0001F441"
    homepage: https://github.com/wolverin0/Eye2byte
    install:
      - kind: uv
        package: eye2byte
        bins: [eye2byte]
---

# Eye2byte — Screen Context for Your Agent

You have access to Eye2byte, which lets you **see the user's screen**. Use these MCP tools when the user asks you to look at something, debug a visual issue, capture their screen, or when visual context would help you assist them better.

## Available MCP Tools

### `capture_and_summarize`
Screenshot the user's screen and get a structured analysis.

Parameters:
- `mode` — `"full"` (default), `"window"`, or `"region"`
- `monitor` — `0` = active monitor (default), `1`/`2`/`3` = specific monitor, `-1` = ALL monitors at once
- `delay` — seconds to wait before capturing (useful for menus/tooltips)
- `window_name` — capture a specific app window by name (e.g., `"chrome"`, `"code"`)

Use this when the user says things like "look at my screen", "what do you see", "debug this", or "what's wrong here".
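
As a rough illustration of how the parameters above combine, the sketch below assembles and sanity-checks an arguments dictionary before a tool call. `build_capture_args` is a hypothetical helper, not part of Eye2byte. It assumes that `delay` accepts any non-negative number and that any positive `monitor` index is allowed; the server may enforce different rules.

```python
from typing import Any, Optional

# Modes listed in the parameter description above.
VALID_MODES = {"full", "window", "region"}


def build_capture_args(
    mode: str = "full",
    monitor: int = 0,
    delay: float = 0,
    window_name: Optional[str] = None,
) -> dict[str, Any]:
    """Hypothetical helper: build arguments for capture_and_summarize."""
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {sorted(VALID_MODES)}, got {mode!r}")
    # Assumption: -1 = all monitors, 0 = active monitor, positive = specific monitor.
    if monitor < -1:
        raise ValueError("monitor must be -1, 0, or a positive index")
    # Assumption: the delay is a non-negative number of seconds.
    if delay < 0:
        raise ValueError("delay must be non-negative")

    args: dict[str, Any] = {"mode": mode, "monitor": monitor, "delay": delay}
    # Only send window_name when the caller actually supplied one.
    if window_name is not None:
        args["window_name"] = window_name
    return args
```

For example, `build_capture_args(mode="window", window_name="chrome", delay=2)` would describe a capture of the Chrome window after a two-second pause.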

### `capture_with_voice`
Screenshot + voice recording + transcription. Returns both visual analysis and what the user said.

Use when the user wants to describe something verbally while showing their screen.

### `record_clip_and_summarize`
Record a short screen clip, extract keyframes, and analyze the sequence.

Use when the user wants to show you something that changes over time (animations, workflows, step sequences).

### `summarize_screenshot`
Analyze an existing image file. Pass a file path to get a structured analysis.

### `transcribe_audio`
Local Whisper transcription of any audio file.

### `get_recent_context`
Retrieve recent Context Pack summaries from previous captures.

Use this to recall what you've seen recently without re-capturing.

## What You Get Back

Every capture returns a structured **Context Pack**:

```
Goal — what the user appears to be doing
Environment — OS, editor, repo, branch, language
Screen State — visible panels, files, terminal output
Signals — verbatim errors, stack traces, warnings
Likely Situation — what's probably happening
Suggested Next Info — what you should ask or do next
```
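
If you need the fields individually, something like the following sketch could split a Context Pack into named attributes. The wire format is not specified here, so this assumes one `Label: value` pair per line; `ContextPack` and `parse_context_pack` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, fields


@dataclass
class ContextPack:
    """The six Context Pack fields listed above, as plain strings."""
    goal: str = ""
    environment: str = ""
    screen_state: str = ""
    signals: str = ""
    likely_situation: str = ""
    suggested_next_info: str = ""


def parse_context_pack(text: str) -> ContextPack:
    """Parse 'Label: value' lines (an assumed format) into a ContextPack."""
    known = {f.name for f in fields(ContextPack)}
    values = {}
    for line in text.splitlines():
        # Split on the first colon only, so colons inside error text survive.
        label, sep, value = line.partition(":")
        key = label.strip().lower().replace(" ", "_")
        if sep and key in known:
            values[key] = value.strip()
    return ContextPack(**values)
```

Unknown labels are skipped and missing fields stay empty, so a partial summary still parses.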

## When to Use Eye2byte

- User mentions something visual ("this button is broken", "the layout is wrong")
- User asks you to "look at" or "check" something on their screen
- You need to see error dialogs, UI bugs, or terminal output the user can't easily copy
- User is debugging and visual context would help your diagnosis
- User asks you to monitor or watch something
- You want to verify your changes had the intended visual effect

## Multi-Monitor Tips

- `monitor=-1` captures ALL monitors stitched together — useful for seeing the full workspace
- `monitor=1`, `2`, `3` for targeting specific displays
- Default (`monitor=0`) captures whichever monitor has the active window
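
The selection rules above can be summarized in a small sketch. `describe_monitor_target` is a hypothetical helper for illustration only, and it assumes that any positive index names a specific display.

```python
def describe_monitor_target(monitor: int = 0) -> str:
    """Hypothetical helper: explain what a given monitor value captures."""
    if monitor == -1:
        return "all monitors, stitched together"
    if monitor == 0:
        return "the monitor with the active window"
    if monitor > 0:
        # Assumption: positive values index specific displays (1, 2, 3, ...).
        return f"monitor {monitor}"
    raise ValueError("monitor must be -1, 0, or a positive index")
```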

## Setup

Eye2byte must be running on the machine whose screen you want to capture:

**Local (same machine):** Already configured if this skill loaded.

**Remote (different machine):** The user runs `eye2byte-mcp --sse --token <secret>` on the machine whose screen should be captured, then sets the MCP connection URL in `openclaw.json`.