feat(skills): add /add-voice-transcription-free-whisper skill#2317
Open
ira-at-work wants to merge 3 commits into
Adds a skill that wires local (free) voice transcription to any NanoClaw channel: voice messages and audio attachments are automatically transcribed to text before the agent sees them.

Supports two backends, each detected and installed via a preflight check:
- openai-whisper (Python, GPU-accelerated when available)
- whisper.cpp (CPU-only, no Python required)

The skill drops a transcription module into src/transcription.ts and patches the relevant channel adapters (Signal, Telegram, WhatsApp) to invoke it. The preflight check verifies that the chosen backend binary is present before modifying any files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
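The preflight step described above could look roughly like the sketch below. It is not the skill's actual code: the binary names (`whisper` for openai-whisper, `whisper-cli` for whisper.cpp), the preference order, and the helper names `findOnPath` and `detectBackend` are assumptions made for illustration. On Windows the executables would also carry an `.exe` suffix, which this sketch ignores.

```typescript
import { existsSync } from "node:fs";
import { delimiter, join } from "node:path";

type Backend = "openai-whisper" | "whisper.cpp";

// Assumed executable names; the PR text does not say which binaries the preflight probes.
const CANDIDATES: Array<{ backend: Backend; binary: string }> = [
  { backend: "openai-whisper", binary: "whisper" },
  { backend: "whisper.cpp", binary: "whisper-cli" },
];

// Hypothetical helper: return the first PATH entry that contains `binary`, or null.
function findOnPath(binary: string, pathVar: string = process.env.PATH ?? ""): string | null {
  for (const dir of pathVar.split(delimiter)) {
    if (!dir) continue;
    const candidate = join(dir, binary);
    if (existsSync(candidate)) return candidate;
  }
  return null;
}

// Hypothetical helper: pick the first available backend, or null if none is installed.
// The lookup is injectable so the selection logic can be exercised without touching the disk.
function detectBackend(
  lookup: (binary: string) => string | null = findOnPath,
): { backend: Backend; path: string } | null {
  for (const { backend, binary } of CANDIDATES) {
    const path = lookup(binary);
    if (path) return { backend, path };
  }
  return null;
}
```

A caller would stop before writing src/transcription.ts or patching any adapter when `detectBackend()` returns null, which matches the "verify before modifying any files" behavior the description states.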
Author:
Closing: incorrect base, will re-open with clean branch.
This was referenced May 13, 2026
mtichikawa added a commit to mtichikawa/nanoclaw that referenced this pull request May 13, 2026
Adds a feature skill that wires opt-in voice transcription into Discord and any other Chat SDK-bridged channel (Slack, Teams, Webex, Google Chat, etc.) via local whisper.cpp on the host.

Pairs with @ira-at-work's nanocoai#2317 (add-voice-transcription-free-whisper), which covers Signal/Telegram/WhatsApp via per-adapter patches; together they cover every channel from a single skill apiece.

Addresses the Discord side of the gap reported in nanocoai#2426 ("LLM cant see the image in discord"; the same shape applies to voice today).

Per CONTRIBUTING.md (single feature-skill PR against main): includes SKILL.md trio plus the code that will live on skill/discord-voice-transcription after maintainer extraction.

Files:
- .claude/skills/add-discord-voice-transcription/SKILL.md: install instructions following the @ddaniels Signal v2 template (nanocoai#1953). Pre-flight, prerequisites, idempotent merge from upstream/skill/discord-voice-transcription, env vars, restart, troubleshooting.
- .claude/skills/add-discord-voice-transcription/REMOVE.md: unset WHISPER_BIN (lightweight) or revert chat-sdk-bridge to upstream/main and delete src/transcription.ts (full).
- .claude/skills/add-discord-voice-transcription/VERIFY.md: send a voice memo, check inline transcript.
- src/transcription.ts: transcribeAudioBuffer(Buffer) + isAudioAttachment(att). Channel-agnostic. Shells out to ffmpeg (input -> 16 kHz mono WAV) and whisper-cli. Returns null on any failure or empty output.
- src/transcription.test.ts: 8 tests covering the isAudioAttachment truth table, env-gate, trim, empty-output, and execFile-failure paths.
- src/channels/chat-sdk-bridge.ts: +15 lines, gated on process.env.WHISPER_BIN. Skipped silently when unset, so behavior is unchanged for installs that don't opt in.

Test plan:
- [x] pnpm run build clean on main
- [x] npx vitest run src/transcription.test.ts: 8/8 passing
- [x] npx vitest run src/channels/chat-sdk-bridge.test.ts: 12/12 passing (no regression in existing bridge tests)
- [ ] Live test: send a Discord voice memo with WHISPER_BIN set
- [ ] Live test: same with WHISPER_BIN unset, confirm no behavior change

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
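For readers who want a concrete picture of the src/transcription.ts behavior described in the commit message, here is a minimal sketch. It is not the committed file. The attachment shape (`AttachmentLike`), the extension list, and the `WHISPER_MODEL` variable are assumptions; the commit message names only `WHISPER_BIN`. The ffmpeg options (`-ar 16000 -ac 1`) and the whisper.cpp options (`-m`, `-f`, `-nt`, `-np`) are standard flags of those tools.

```typescript
import { execFile } from "node:child_process";
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const run = promisify(execFile);

// Hypothetical attachment shape; the bridge's real type is not shown in the commit message.
interface AttachmentLike {
  contentType?: string;
  filename?: string;
}

// Assumed list of audio extensions, used only when no MIME type is available.
const AUDIO_EXT = /\.(ogg|oga|opus|mp3|m4a|wav|flac|webm)$/i;

function isAudioAttachment(att: AttachmentLike): boolean {
  if (att.contentType?.toLowerCase().startsWith("audio/")) return true;
  return AUDIO_EXT.test(att.filename ?? "");
}

// Returns the trimmed transcript, or null when transcription is disabled,
// a subprocess fails, or the output is empty.
async function transcribeAudioBuffer(audio: Buffer): Promise<string | null> {
  const whisperBin = process.env.WHISPER_BIN;
  const model = process.env.WHISPER_MODEL; // assumed variable name for the model path
  if (!whisperBin || !model) return null; // env gate: opt-in only

  const dir = await mkdtemp(join(tmpdir(), "transcribe-"));
  try {
    const input = join(dir, "input");
    const wav = join(dir, "audio.wav");
    await writeFile(input, audio);
    // Normalize whatever container arrived to 16 kHz mono WAV.
    await run("ffmpeg", ["-y", "-i", input, "-ar", "16000", "-ac", "1", wav]);
    // -nt: no timestamps, -np: suppress everything except the transcript.
    const { stdout } = await run(whisperBin, ["-m", model, "-f", wav, "-nt", "-np"]);
    const text = stdout.trim();
    return text.length > 0 ? text : null;
  } catch {
    return null; // any failure degrades to "no transcript"
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Returning null rather than throwing keeps the caller simple: a message with a failed transcription is delivered exactly as it would have been without the skill.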
mtichikawa added a commit to mtichikawa/nanoclaw that referenced this pull request May 13, 2026
Opt-in voice transcription for Discord and any other Chat SDK-bridged channel (Slack, Teams, Webex, Google Chat) via local whisper.cpp on the host. No cloud API, no OPENAI_API_KEY: transcription is fully on-device.

Pairs with @ira-at-work's nanocoai#2317 (add-voice-transcription-free-whisper), which patches Signal/Telegram/WhatsApp adapters directly. This skill is the bridge-side complement: one shared hook in chat-sdk-bridge.ts covers every Chat SDK-bridged channel. Together the two skills close the voice gap on every channel NanoClaw supports.

Addresses the Discord side of @b1ek's nanocoai#2426 ("LLM cant see the image in discord"; the same shape applies to voice today).

Files:
- .claude/skills/add-discord-voice-transcription/{SKILL.md, REMOVE.md, VERIFY.md}: follows @ddaniels' merged Signal v2 template (nanocoai#1953): pre-flight, prerequisites, git fetch upstream skill/discord-voice-transcription, env vars, restart, troubleshooting.
- src/transcription.ts: transcribeAudioBuffer(Buffer) and isAudioAttachment(att). Channel-agnostic. Shells out to ffmpeg for input normalization (any container → 16 kHz mono WAV) and whisper-cli for the transcript. Returns null on any failure or empty output.
- src/transcription.test.ts: 8 unit tests covering the truth table for isAudioAttachment plus env-gate, trim, empty-output, and execFile-failure paths.
- src/channels/chat-sdk-bridge.ts: +15 lines inside messageToInbound(). Gated on process.env.WHISPER_BIN and isAudioAttachment(entry). When WHISPER_BIN is unset, the path is a no-op so behavior is unchanged for installs that don't opt in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
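The gating described for chat-sdk-bridge.ts can be illustrated with the sketch below. It does not reproduce the actual +15 lines inside messageToInbound(). The types `AttachmentLike` and `InboundLike`, the helper name `appendTranscripts`, and the `[voice transcript]` prefix are all hypothetical, and the audio check is reduced to a MIME-prefix test standing in for isAudioAttachment.

```typescript
// Hypothetical shapes; the bridge's real message and attachment types are not shown in the commit.
interface AttachmentLike {
  contentType?: string;
  data: Buffer;
}

interface InboundLike {
  text: string;
}

type Transcriber = (audio: Buffer) => Promise<string | null>;

// Appends a transcript line for each audio attachment, but only when WHISPER_BIN is set.
// The transcriber and environment are injected so the gate can be tested in isolation.
async function appendTranscripts(
  inbound: InboundLike,
  attachments: AttachmentLike[],
  transcribe: Transcriber,
  env: Record<string, string | undefined> = process.env,
): Promise<InboundLike> {
  if (!env.WHISPER_BIN) return inbound; // not opted in: behavior unchanged

  const parts: string[] = [inbound.text];
  for (const att of attachments) {
    // Stand-in for isAudioAttachment(entry).
    if (!att.contentType?.startsWith("audio/")) continue;
    const transcript = await transcribe(att.data);
    // A null transcript (failure or empty output) is skipped silently.
    if (transcript) parts.push(`[voice transcript] ${transcript}`);
  }
  return { ...inbound, text: parts.filter((p) => p.length > 0).join("\n") };
}
```

Because the unset case returns the inbound message object untouched, installs that never set WHISPER_BIN do not pay for any subprocess call or extra allocation.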
This was referenced May 30, 2026
Summary
- /add-voice-transcription-free-whisper skill that wires local (free) voice transcription to NanoClaw channels

Test plan
- openai-whisper installed: verify src/transcription.ts is written and adapter patched
- whisper.cpp installed: verify correct backend selected

🤖 Generated with Claude Code