feat: add inbound image vision for Telegram photos #140

Open
billyshipp wants to merge 1 commit into nanocoai:main from billyshipp:feat/image-vision-inbound

@billyshipp

What

Adds image vision for Telegram: photos sent to the bot are downloaded, resized, and passed to Claude as multimodal content blocks so the agent can see and understand image content.

Why

PR #9 covers this but includes outbound sendImage with path traversal vulnerabilities (absolute paths from container agents passed verbatim to host sendImage, enabling host filesystem exfiltration). This PR delivers inbound vision only, with security fixes applied.

How it works

  1. message:photo handler downloads the largest available photo from Telegram's file API
  2. sharp resizes to max 1024px, converts to JPEG at quality 85, saves to groups/{folder}/attachments/
  3. Message content becomes [Image: attachments/img-{ts}-{rand}.jpg]
  4. parseImageReferences() extracts refs from messages before each agent run
  5. Agent-runner reads the files and pushes them as multimodal content blocks to Claude
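
Steps 3 to 5 can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: only the name `parseImageReferences` and the `[Image: attachments/img-{ts}-{rand}.jpg]` marker come from the description above. The regex details (digits for `{ts}`, alphanumerics for `{rand}`), the `ImageBlock` type, and the `toImageBlock` helper are assumptions made for the example. The block shape follows the base64 image content block used by the Claude Messages API.

```typescript
// Sketch only: types, helper names, and regex details are assumptions.

type ImageMediaType = "image/jpeg" | "image/png" | "image/gif" | "image/webp";

interface ImageBlock {
  type: "image";
  source: { type: "base64"; media_type: ImageMediaType; data: string };
}

// Matches "[Image: attachments/img-{ts}-{rand}.jpg]".
// Assumes {ts} is all digits and {rand} is alphanumeric.
const IMAGE_REF = /\[Image: (attachments\/img-\d+-[A-Za-z0-9]+\.jpg)\]/;

// Collects the relative attachment paths referenced in a batch of messages.
function parseImageReferences(messages: string[]): string[] {
  const paths: string[] = [];
  for (const message of messages) {
    // Fresh global regex per message so lastIndex never leaks between messages.
    const pattern = new RegExp(IMAGE_REF.source, "g");
    let match: RegExpExecArray | null;
    while ((match = pattern.exec(message)) !== null) {
      paths.push(match[1]);
    }
  }
  return paths;
}

// Wraps raw image bytes in a base64 image content block.
// Hypothetical helper: the caller is expected to have read and validated the file.
function toImageBlock(
  bytes: Uint8Array,
  mediaType: ImageMediaType = "image/jpeg",
): ImageBlock {
  return {
    type: "image",
    source: {
      type: "base64",
      media_type: mediaType,
      data: Buffer.from(bytes).toString("base64"),
    },
  };
}
```

Keeping the reference as plain text in the message means the stored conversation stays text-only, and the image bytes are only loaded at the point where the agent run is assembled.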

Security vs PR #9

  • Path confinement: path.resolve() + prefix check ensures image paths stay within /workspace/group/attachments/ before readFileSync (fixes audit Finding 3)
  • Media type allowlist: only image/jpeg, image/png, image/gif, image/webp accepted before passing to Claude API (fixes Finding 6)
  • file_path validation: Telegram's file_path validated to not contain :// or .. before URL construction (fixes Finding 5)
  • No sendImage / no IPC send_image: outbound image sending excluded entirely, eliminating the Critical path traversal findings (1, 2, 4) from the audit
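
For illustration, the three checks above might look something like the sketch below. It is written under stated assumptions rather than copied from the diff: the function names (`resolveAttachmentPath`, `isAllowedMediaType`, `isSafeTelegramFilePath`) and the `GROUP_ROOT` constant are hypothetical, and `path.posix` is used so the behaviour is the same on any host. Only the `/workspace/group/attachments/` root, the four media types, and the `://` and `..` rules come from the list above.

```typescript
import * as path from "node:path";

// Assumed layout: references are relative to the group root inside the container.
const GROUP_ROOT = "/workspace/group";
const ATTACHMENTS_ROOT = "/workspace/group/attachments";

const ALLOWED_MEDIA_TYPES = new Set([
  "image/jpeg",
  "image/png",
  "image/gif",
  "image/webp",
]);

// Path confinement: resolve the reference, then require the attachments prefix.
// Returns null when the resolved path escapes the attachments directory.
function resolveAttachmentPath(relativeRef: string): string | null {
  const resolved = path.posix.resolve(GROUP_ROOT, relativeRef);
  // The trailing "/" stops sibling names such as "attachments-evil" from passing.
  return resolved.startsWith(ATTACHMENTS_ROOT + "/") ? resolved : null;
}

// Media type allowlist: anything outside the four listed types is rejected.
function isAllowedMediaType(mediaType: string): boolean {
  return ALLOWED_MEDIA_TYPES.has(mediaType);
}

// Telegram file_path validation: reject URL schemes and parent-directory segments
// before the value is interpolated into a download URL.
function isSafeTelegramFilePath(filePath: string): boolean {
  return !filePath.includes("://") && !filePath.includes("..");
}
```

A prefix check on a resolved path does not follow symlinks, so this sketch assumes nothing inside the attachments directory links elsewhere. Whether the PR handles that case is not stated in the description.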

How it was tested

Tested on a live NanoClaw + Telegram instance. Sent a photo in a registered Telegram group — agent correctly described the image content.

Type of change

  • Source code change

Downloads photos sent via Telegram, resizes to 1024px max with sharp,
saves as JPEG attachments, and passes as multimodal content blocks so
Claude can see and understand image content.

Security: path confinement added to container agent-runner (all image
paths validated to stay within /workspace/group/attachments/) and
media type allowlisted before passing to Claude API. Outbound sendImage
intentionally excluded — inbound vision only.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>