feat(skills): add /add-voice-transcription-free-whisper skill by ira-at-work · Pull Request #2317 · nanocoai/nanoclaw

ira-at-work · 2026-05-07T09:42:59Z

Summary

- New /add-voice-transcription-free-whisper skill that wires local (free) voice transcription to NanoClaw channels
- Supports two backends: openai-whisper (Python, GPU-accelerated when available) and whisper.cpp (CPU-only, no Python required)
- Pre-flight detects which backend binary is present before modifying any files
- Patches Signal, Telegram, and WhatsApp channel adapters to auto-transcribe voice messages and audio attachments to text before the agent processes them
- Idempotent: safe to re-run if interrupted
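The pre-flight step in the summary could be sketched roughly as follows. This is an illustration only, not the skill's actual code: the binary names (`whisper` for openai-whisper, `whisper-cli` for whisper.cpp), the preference order, and the function names `detectBackend` and `preflight` are assumptions. The `exists` predicate stands in for whatever PATH probe the skill really performs.

```typescript
type Backend = "openai-whisper" | "whisper.cpp";

// Assumed CLI names: openai-whisper installs `whisper`, whisper.cpp builds `whisper-cli`.
// The order encodes a hypothetical preference, not one stated in the PR.
const BACKEND_BINARIES: ReadonlyArray<readonly [Backend, string]> = [
  ["openai-whisper", "whisper"],
  ["whisper.cpp", "whisper-cli"],
];

// Returns the first backend whose binary is reported present, or null if none is.
function detectBackend(exists: (bin: string) => boolean): Backend | null {
  for (const [backend, bin] of BACKEND_BINARIES) {
    if (exists(bin)) return backend;
  }
  return null;
}

// Pre-flight: throw with a clear message before any file would be modified.
function preflight(exists: (bin: string) => boolean): Backend {
  const backend = detectBackend(exists);
  if (backend === null) {
    const names = BACKEND_BINARIES.map(([, bin]) => bin).join(" or ");
    throw new Error(`No transcription backend found: install ${names} first.`);
  }
  return backend;
}
```

Passing the probe in as a function keeps the selection logic testable without touching the real filesystem.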

Test plan

- Run skill on a fresh install with openai-whisper installed: verify src/transcription.ts is written and adapter patched
- Run skill on a fresh install with whisper.cpp installed: verify correct backend selected
- Send a voice message via Signal/Telegram: verify it appears as transcribed text to the agent
- Run skill twice: verify idempotency (no duplicate patches)
- Run skill with neither backend installed: verify pre-flight blocks install with a clear error
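One common way to get the "no duplicate patches" property checked in the test plan is a marker comment that the patcher looks for before writing. The sketch below assumes that approach; the marker text, the anchor-based insertion, and the name `applyPatchOnce` are hypothetical and may not match how the skill actually patches adapters.

```typescript
// Hypothetical marker written alongside the patch so a re-run can detect it.
const PATCH_MARKER = "// voice-transcription: patched";

// Inserts `snippet` right after `anchor`, unless the marker shows the file is already patched.
function applyPatchOnce(source: string, anchor: string, snippet: string): string {
  if (source.includes(PATCH_MARKER)) return source; // already patched: leave untouched
  const at = source.indexOf(anchor);
  if (at === -1) throw new Error(`Anchor not found: ${anchor}`);
  const end = at + anchor.length;
  return `${source.slice(0, end)}\n${PATCH_MARKER}\n${snippet}${source.slice(end)}`;
}

// Example: patching the same (made-up) adapter source twice yields the same text.
const adapter = "function onMessage(msg) {\n  dispatch(msg);\n}\n";
const once = applyPatchOnce(adapter, "function onMessage(msg) {", "  msg = withTranscript(msg);");
const twice = applyPatchOnce(once, "function onMessage(msg) {", "  msg = withTranscript(msg);");
```

Here `once` and `twice` are identical strings, which is the behavior the "run skill twice" check is looking for.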

🤖 Generated with Claude Code

Adds a skill that wires local (free) voice transcription to any NanoClaw channel. Voice messages and audio attachments are automatically transcribed to text before the agent sees them.

Supports two backends, each detected and installed via a preflight check:

- openai-whisper (Python, GPU-accelerated when available)
- whisper.cpp (CPU-only, no Python required)

The skill drops a transcription module into src/transcription.ts and patches the relevant channel adapters (Signal, Telegram, WhatsApp) to invoke it. Pre-flight verifies the chosen backend binary is present before modifying any files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

ira-at-work · 2026-05-07T09:46:41Z

Closing — incorrect base, will re-open with clean branch.

@ira-at-work

Adds a feature skill that wires opt-in voice transcription into Discord and any other Chat SDK-bridged channel (Slack, Teams, Webex, Google Chat, etc.) via local whisper.cpp on the host.

Pairs with @ira-at-work's nanocoai#2317 (add-voice-transcription-free-whisper), which covers Signal/Telegram/WhatsApp via per-adapter patches. Together they cover every channel from a single skill apiece.

Addresses the Discord side of the gap reported in nanocoai#2426 (LLM cant see the image in discord; same shape applies to voice today).

Per CONTRIBUTING.md (single feature-skill PR against main): includes SKILL.md trio plus the code that will live on skill/discord-voice-transcription after maintainer extraction.

Files:

- .claude/skills/add-discord-voice-transcription/SKILL.md: install instructions following the @ddaniels Signal v2 template (nanocoai#1953). Pre-flight, prerequisites, idempotent merge from upstream/skill/discord-voice-transcription, env vars, restart, troubleshooting.
- .claude/skills/add-discord-voice-transcription/REMOVE.md: unset WHISPER_BIN (lightweight) or revert chat-sdk-bridge to upstream/main and delete src/transcription.ts (full).
- .claude/skills/add-discord-voice-transcription/VERIFY.md: send a voice memo, check inline transcript.
- src/transcription.ts: transcribeAudioBuffer(Buffer) + isAudioAttachment(att). Channel-agnostic. Shells out to ffmpeg (input -> 16 kHz mono WAV) and whisper-cli. Returns null on any failure or empty output.
- src/transcription.test.ts: 8 tests covering the isAudioAttachment truth table, env-gate, trim, empty-output, and execFile-failure paths.
- src/channels/chat-sdk-bridge.ts: +15 lines, gated on process.env.WHISPER_BIN. Skipped silently when unset, so behavior is unchanged for installs that don't opt in.

Test plan:

- [x] pnpm run build clean on main
- [x] npx vitest run src/transcription.test.ts: 8/8 passing
- [x] npx vitest run src/channels/chat-sdk-bridge.test.ts: 12/12 passing (no regression in existing bridge tests)
- [ ] Live test: send a Discord voice memo with WHISPER_BIN set
- [ ] Live test: same with WHISPER_BIN unset, confirm no behavior change

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
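For readers without the diff open, the two pure pieces exercised by the unit tests (the isAudioAttachment truth table and the trim / empty-output handling) might look something like the sketch below. The attachment shape, the extension list, and the helper name `normalizeTranscript` are assumptions; only `isAudioAttachment` and the "null on empty output" behavior come from the commit message.

```typescript
// Hypothetical attachment shape; the real Chat SDK bridge type may differ.
interface AttachmentLike {
  contentType?: string;
  name?: string;
}

// Assumed fallback list for attachments that arrive without a MIME type.
const AUDIO_EXTENSIONS = [".ogg", ".oga", ".opus", ".mp3", ".m4a", ".wav", ".flac"];

// True for an audio/* MIME type, or (as a fallback) a known audio file extension.
function isAudioAttachment(att: AttachmentLike): boolean {
  const type = att.contentType?.toLowerCase() ?? "";
  if (type.startsWith("audio/")) return true;
  const name = att.name?.toLowerCase() ?? "";
  return AUDIO_EXTENSIONS.some((ext) => name.endsWith(ext));
}

// Trims transcriber output and maps empty output to null,
// mirroring the "returns null on empty output" contract described above.
function normalizeTranscript(stdout: string): string | null {
  const text = stdout.trim();
  return text.length > 0 ? text : null;
}
```

Keeping these two helpers free of I/O is what makes a truth-table style test practical without ffmpeg or whisper-cli installed.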

@ira-at-work

Opt-in voice transcription for Discord and any other Chat SDK-bridged channel (Slack, Teams, Webex, Google Chat) via local whisper.cpp on the host. No cloud API, no OPENAI_API_KEY: transcription is fully on-device.

Pairs with @ira-at-work's nanocoai#2317 (add-voice-transcription-free-whisper), which patches Signal/Telegram/WhatsApp adapters directly. This skill is the bridge-side complement: one shared hook in chat-sdk-bridge.ts covers every Chat SDK-bridged channel. Together the two skills close the voice gap on every channel NanoClaw supports.

Addresses the Discord side of @b1ek's nanocoai#2426 (LLM cant see the image in discord; same shape applies to voice today).

Files:

- .claude/skills/add-discord-voice-transcription/{SKILL.md, REMOVE.md, VERIFY.md}: follows @ddaniels' merged Signal v2 template (nanocoai#1953): pre-flight, prerequisites, git fetch upstream skill/discord-voice-transcription, env vars, restart, troubleshooting.
- src/transcription.ts: transcribeAudioBuffer(Buffer) and isAudioAttachment(att). Channel-agnostic. Shells out to ffmpeg for input normalization (any container → 16 kHz mono WAV) and whisper-cli for the transcript. Returns null on any failure or empty output.
- src/transcription.test.ts: 8 unit tests covering the truth table for isAudioAttachment plus env-gate, trim, empty-output, and execFile-failure paths.
- src/channels/chat-sdk-bridge.ts: +15 lines inside messageToInbound(). Gated on process.env.WHISPER_BIN and isAudioAttachment(entry). When WHISPER_BIN is unset, the path is a no-op so behavior is unchanged for installs that don't opt in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
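To make the env gate and the two shell-outs concrete, here is a rough sketch of how the commands could be planned. It is not the code from the commit: `planCommands`, the `WHISPER_MODEL` variable, and the default model path are all assumptions. The ffmpeg flags (`-ar`, `-ac`, `-c:a`) and the whisper-cli flags (`-m`, `-f`, `-nt`) are standard options of those tools.

```typescript
type Env = Record<string, string | undefined>;

// ffmpeg: read any input container, resample to 16 kHz, downmix to mono, write 16-bit PCM WAV.
function ffmpegArgs(inputPath: string, wavPath: string): string[] {
  return ["-y", "-i", inputPath, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wavPath];
}

// whisper-cli: -m selects the model file, -f the input WAV, -nt omits timestamps.
function whisperArgs(modelPath: string, wavPath: string): string[] {
  return ["-m", modelPath, "-f", wavPath, "-nt"];
}

// Returns [binary, args] pairs to run in order, or null when WHISPER_BIN is unset (no-op path).
function planCommands(env: Env, inputPath: string, wavPath: string): Array<[string, string[]]> | null {
  const bin = env["WHISPER_BIN"]?.trim();
  if (!bin) return null;
  // WHISPER_MODEL and this default path are hypothetical, not named in the commit message.
  const model = env["WHISPER_MODEL"] ?? "models/ggml-base.en.bin";
  return [
    ["ffmpeg", ffmpegArgs(inputPath, wavPath)],
    [bin, whisperArgs(model, wavPath)],
  ];
}
```

Each pair could then be handed to Node's `execFile`, with any thrown error mapped to null so a failed transcription never blocks the inbound message.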

ira-at-work closed this May 7, 2026

ira-at-work reopened this May 7, 2026

This was referenced May 13, 2026

feat(channels): voice transcription hook in Chat SDK bridge #2458

Closed

feat(skill): add /add-discord-voice-transcription #2459

Open

ira-at-work added 2 commits May 18, 2026 14:15

Merge branch 'main' into feat/add-voice-transcription-free-whisper

cdf3df6

Merge branch 'main' into feat/add-voice-transcription-free-whisper

2aba8bf

ira-at-work wants to merge 3 commits into nanocoai:main from ira-at-work:feat/add-voice-transcription-free-whisper
