feat(skill): add /add-discord-voice-transcription by mtichikawa · Pull Request #2459 · nanocoai/nanoclaw

mtichikawa · 2026-05-13T15:35:54Z

Opt-in voice transcription for Discord and any other Chat SDK-bridged channel (Slack, Teams, Webex, Google Chat, etc.) via local whisper.cpp on the host. No cloud API, no OPENAI_API_KEY — fully on-device.

Pairs with @ira-at-work's #2317 (add-voice-transcription-free-whisper) which patches Signal/Telegram/WhatsApp adapters directly. This skill is the bridge-side complement — one shared hook in chat-sdk-bridge.ts covers every Chat SDK-bridged channel. Together the two skills close the voice gap on every channel NanoClaw supports.

Addresses the Discord side of @b1ek's #2426 (LLM cant see the image in discord — same shape applies to voice today).

Type of Change

Feature skill - adds a channel or integration (source code changes + SKILL.md)
Utility skill - adds a standalone tool (code files in .claude/skills/<name>/, no source changes)
Operational/container skill - adds a workflow or agent skill (SKILL.md only, no source changes)
Fix - bug fix or security fix to source code
Simplification - reduces or simplifies source code
Documentation - docs, README, or CONTRIBUTING changes only

Description

The Chat SDK bridge is shared by every chat-sdk channel (Discord, Slack, Teams, Webex, Google Chat). Hooking transcription here means all bridge-based channels gain voice support from a single integration point, gated on `process.env.WHISPER_BIN` so it's a no-op for installs that don't opt in.
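
A minimal sketch of how such a gate could look on the bridge side. It is not the code in this PR: `attachTranscripts`, the two interfaces, the injected `transcribe`/`isAudio` callbacks, and the newline separator before `[Voice: …]` are all assumptions made for illustration. The environment is passed in explicitly so the opt-out path is easy to test; a real caller would pass `process.env`.

```typescript
// Hypothetical shapes; the real types in chat-sdk-bridge.ts may differ.
interface AttachmentEntry {
  name: string;
  contentType?: string;
  data: Uint8Array;
  transcript?: string; // set only when transcription succeeds
}

interface SerializedMessage {
  content: string;
  attachments: AttachmentEntry[];
}

type Transcriber = (audio: Uint8Array) => Promise<string | null>;
type AudioCheck = (att: AttachmentEntry) => boolean;

// Assumed hook: runs once per inbound message, after attachments are fetched.
async function attachTranscripts(
  serialized: SerializedMessage,
  isAudio: AudioCheck,
  transcribe: Transcriber,
  env: Record<string, string | undefined>,
): Promise<SerializedMessage> {
  // Opt-in gate: with WHISPER_BIN unset the message passes through untouched.
  if (!env.WHISPER_BIN) return serialized;

  for (const entry of serialized.attachments) {
    if (!isAudio(entry)) continue;
    const transcript = await transcribe(entry.data);
    if (!transcript) continue; // null or empty string: inject nothing
    entry.transcript = transcript; // kept on the entry for downstream consumers
    serialized.content += `\n[Voice: ${transcript}]`; // separator is an assumption
  }
  return serialized;
}
```

Because the gate is the first statement, installs without `WHISPER_BIN` never reach the audio check or the transcriber, which is what keeps the change a no-op for them.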

Files:

- `.claude/skills/add-discord-voice-transcription/SKILL.md`: install instructions following @ddaniels' Signal v2 template (#1953, feat(skill): Add Signal channel adapter (V2)). Covers pre-flight, prerequisites, `git fetch upstream skill/discord-voice-transcription && git merge`, env vars, restart, and troubleshooting.
- `.claude/skills/add-discord-voice-transcription/REMOVE.md`: lightweight unset-env path plus full code revert.
- `.claude/skills/add-discord-voice-transcription/VERIFY.md`: send a voice memo, check the inline transcript.
- `src/transcription.ts`: `transcribeAudioBuffer(Buffer): Promise<string | null>` plus the `isAudioAttachment(att)` helper. Channel-agnostic. Shells out to `ffmpeg` for normalization (any container → 16 kHz mono WAV) and to `whisper-cli` for the transcript. Returns null on any failure or empty output; the bridge only injects `[Voice: …]` when the transcript is non-empty.
- `src/channels/chat-sdk-bridge.ts`: +15 lines inside `messageToInbound()`. The transcript is stored on the attachment entry (for any downstream consumer) and appended to `serialized.content` so the agent sees it inline alongside the audio file placeholder.
- `src/transcription.test.ts`: 8 unit tests covering `isAudioAttachment`, the env-gate, transcript trim, empty-output handling, and execFile-failure paths.
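
For readers who want a feel for the shape of `src/transcription.ts`, here is a rough sketch under stated assumptions. Only the two function names, the `WHISPER_BIN` gate, the 16 kHz mono WAV target, and the null-on-failure contract come from the description above. The `WHISPER_MODEL` variable, the extension list, the temp-file layout, and the exact `ffmpeg` / `whisper-cli` arguments are illustrative guesses and may not match the PR.

```typescript
import { execFile } from "node:child_process";
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const run = promisify(execFile);

// Hypothetical attachment shape; the bridge's real type may carry other fields.
interface AttachmentLike {
  name?: string;
  contentType?: string;
}

// Assumed heuristic: trust an audio/* MIME type, else fall back to the extension.
const AUDIO_EXT = /\.(ogg|oga|opus|mp3|m4a|wav|flac)$/i;
function isAudioAttachment(att: AttachmentLike): boolean {
  if (att.contentType?.toLowerCase().startsWith("audio/")) return true;
  return att.name !== undefined && AUDIO_EXT.test(att.name);
}

// ffmpeg arguments: any input container -> 16 kHz mono 16-bit PCM WAV.
function ffmpegArgs(input: string, output: string): string[] {
  return ["-y", "-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", output];
}

// whisper-cli arguments: -m model, -f input, -nt drops timestamps, -np silences logs.
function whisperArgs(model: string, wav: string): string[] {
  return ["-m", model, "-f", wav, "-nt", "-np"];
}

// Collapse whitespace and treat blank output as "no transcript".
function cleanTranscript(stdout: string): string | null {
  const text = stdout.replace(/\s+/g, " ").trim();
  return text.length > 0 ? text : null;
}

async function transcribeAudioBuffer(audio: Buffer): Promise<string | null> {
  const bin = process.env.WHISPER_BIN;
  const model = process.env.WHISPER_MODEL; // hypothetical variable name
  if (!bin || !model) return null; // not opted in

  const dir = await mkdtemp(join(tmpdir(), "voice-"));
  try {
    const input = join(dir, "input");
    const wav = join(dir, "audio.wav");
    await writeFile(input, audio);
    await run("ffmpeg", ffmpegArgs(input, wav));
    const { stdout } = await run(bin, whisperArgs(model, wav));
    return cleanTranscript(stdout);
  } catch {
    return null; // missing binary, undecodable audio, non-zero exit: all map to null
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Keeping the argument builders and `cleanTranscript` as pure functions is one way to make the env-gate, trim, and empty-output cases testable without `ffmpeg` or a Whisper model on the test machine.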

For Skills

SKILL.md contains instructions, not inline code (code goes in separate files)
SKILL.md is under 500 lines (~160 lines)
I tested this skill on a fresh clone — pending live end-to-end test on Discord; build + unit tests green (see test plan)

Test plan

`pnpm run build` clean on main
`npx vitest run src/transcription.test.ts` — 8/8 passing
`npx vitest run src/channels/chat-sdk-bridge.test.ts` — 12/12 passing (no regression)
Live: send a Discord voice memo with `WHISPER_BIN=whisper-cli` set, confirm `[Voice: …]` reaches the agent
Live: same with `WHISPER_BIN` unset, confirm no behavior change

🤖 Generated with Claude Code

@ira-at-work

Opt-in voice transcription for Discord and any other Chat SDK-bridged channel (Slack, Teams, Webex, Google Chat) via local whisper.cpp on the host. No cloud API, no OPENAI_API_KEY: transcription is fully on-device.

Pairs with @ira-at-work's nanocoai#2317 (add-voice-transcription-free-whisper), which patches Signal/Telegram/WhatsApp adapters directly. This skill is the bridge-side complement: one shared hook in chat-sdk-bridge.ts covers every Chat SDK-bridged channel. Together the two skills close the voice gap on every channel NanoClaw supports.

Addresses the Discord side of @b1ek's nanocoai#2426 (LLM cant see the image in discord; the same shape applies to voice today).

Files:
- .claude/skills/add-discord-voice-transcription/{SKILL.md, REMOVE.md, VERIFY.md}: follows @ddaniels' merged Signal v2 template (nanocoai#1953): pre-flight, prerequisites, git fetch upstream skill/discord-voice-transcription, env vars, restart, troubleshooting.
- src/transcription.ts: transcribeAudioBuffer(Buffer) and isAudioAttachment(att). Channel-agnostic. Shells out to ffmpeg for input normalization (any container → 16 kHz mono WAV) and whisper-cli for the transcript. Returns null on any failure or empty output.
- src/transcription.test.ts: 8 unit tests covering the truth table for isAudioAttachment plus env-gate, trim, empty-output, and execFile-failure paths.
- src/channels/chat-sdk-bridge.ts: +15 lines inside messageToInbound(). Gated on process.env.WHISPER_BIN and isAudioAttachment(entry). When WHISPER_BIN is unset, the path is a no-op, so behavior is unchanged for installs that don't opt in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

mtichikawa requested review from gabi-simons and gavrielc as code owners May 13, 2026 15:44

mtichikawa mentioned this pull request May 13, 2026

feat(channels): voice transcription hook in Chat SDK bridge #2458

Closed

5 tasks

github-actions bot added the PR: Feature (New feature or enhancement) and PR: Skill (Skill package or skill-related changes) labels May 13, 2026

mtichikawa force-pushed the feat/add-discord-voice-transcription branch from 62716e3 to 32adefc Compare May 13, 2026 15:46

github-actions bot added the follows-guidelines label (PR was created using the current contributing template) May 13, 2026

This was referenced May 14, 2026

🦞 OpenClaw Ecosystem Daily 2026-05-14 gsscsd/big_model_radar#342

Open

🦞 OpenClaw Ecosystem Daily 2026-05-14 ivanweng2077/big_model_radar#41

Open

mtichikawa wants to merge 1 commit into nanocoai:main from mtichikawa:feat/add-discord-voice-transcription
