
feat(channels): voice transcription hook in Chat SDK bridge#2458

Closed
mtichikawa wants to merge 2 commits into nanocoai:channels from mtichikawa:feat/voice-transcription-bridge

Conversation

@mtichikawa

Summary

Adds an opt-in voice transcription pass to the shared Chat SDK bridge. When WHISPER_BIN is set and an inbound attachment looks like audio, the bridge runs whisper.cpp on the buffer after fetchData() and appends the transcript to the message content as [Voice: <transcript>].

When WHISPER_BIN is unset (the default), the new code path is a no-op — zero behavior change for existing installs.
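The gating described above can be sketched as follows. This is an illustrative sketch, not the PR's actual code: the `Attachment` shape and the `shouldTranscribe` helper name are assumptions; `isAudioAttachment` mirrors the check described in the Files section (mimeType `audio/*` or coarse type `'audio'`/`'voice'`).

```typescript
// Illustrative sketch only; attachment shape and helper names are assumptions.
interface Attachment {
  mimeType?: string;
  type?: string;
}

// Matches the described check: mimeType audio/* or coarse type 'audio'/'voice'.
function isAudioAttachment(att: Attachment): boolean {
  if (att.mimeType?.startsWith("audio/")) return true;
  return att.type === "audio" || att.type === "voice";
}

function shouldTranscribe(att: Attachment): boolean {
  // No-op unless WHISPER_BIN is set: zero behavior change by default.
  return Boolean(process.env.WHISPER_BIN) && isAudioAttachment(att);
}
```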

Why one bridge hook instead of per-adapter

The Chat SDK bridge is shared by Discord, Slack, Teams, Webex, Google Chat, and every future Chat SDK-supported platform. Hooking transcription here means all bridge-based channels gain voice support from a single integration point. The bridge already handles attachment download centrally (messageToInbound → att.fetchData()), so the transcription pass slots in naturally right after the buffer is in hand.

Sibling effort: @ira-at-work's #2317 (/add-voice-transcription-free-whisper) patches Signal/Telegram/WhatsApp adapters directly (they don't go through the Chat SDK bridge). That PR + this one would together cover every channel.

Demand signal: #2426 (just opened by @b1ek) — "LLM cant see the image in discord" — voice transcription is the same kind of gap.

Files

  • src/transcription.ts (new, ~95 lines): transcribeAudioBuffer(Buffer): Promise<string | null> plus isAudioAttachment(att) helper. Channel-agnostic. Shells out to ffmpeg for input normalization (any container → 16 kHz mono WAV) and whisper-cli for the transcript. Returns null on any failure or empty output; the bridge only injects [Voice: ...] when the transcript is non-empty.
  • src/channels/chat-sdk-bridge.ts (+15 lines): gated transcription pass inside messageToInbound(). Runs only when process.env.WHISPER_BIN is set and the attachment passes isAudioAttachment (mimeType: audio/* or coarse type: 'audio'/'voice').
  • src/transcription.test.ts (new, 8 tests): isAudioAttachment truth table; transcribeAudioBuffer env-gate, trim, empty-output, and execFile-failure paths.

Total: 3 files, +236 lines.
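The ffmpeg → whisper-cli flow in src/transcription.ts might look roughly like this. A sketch under stated assumptions: the whisper-cli flags shown (`-f`, `--no-timestamps`) and the temp-file handling are illustrative, not the PR's actual invocation; only the contract matches the description (null on any failure or empty output, env-gated on WHISPER_BIN).

```typescript
// Illustrative sketch of the described pipeline; flags and temp handling are assumptions.
import { execFile } from "node:child_process";
import { promisify } from "node:util";
import { mkdtemp, writeFile, rm } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";

const execFileAsync = promisify(execFile);

async function transcribeAudioBuffer(buf: Buffer): Promise<string | null> {
  const whisperBin = process.env.WHISPER_BIN;
  if (!whisperBin) return null; // env gate: no-op by default
  const dir = await mkdtemp(join(tmpdir(), "transcribe-"));
  try {
    const input = join(dir, "input");
    const wav = join(dir, "audio.wav");
    await writeFile(input, buf);
    // Normalize any container to 16 kHz mono WAV, the format whisper.cpp expects.
    await execFileAsync("ffmpeg", ["-i", input, "-ar", "16000", "-ac", "1", wav]);
    // whisper-cli flags here are assumed; adjust for your build.
    const { stdout } = await execFileAsync(whisperBin, ["-f", wav, "--no-timestamps"]);
    const text = stdout.trim();
    return text.length > 0 ? text : null; // empty output maps to null
  } catch {
    return null; // any ffmpeg/whisper failure yields null, never throws
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

The null-on-failure contract keeps the bridge side trivial: it only injects `[Voice: ...]` on a non-null result, so a missing binary, a corrupt buffer, or silent audio all degrade to the existing plain-attachment behavior.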

Pairs with

A sibling PR against main adds the user-facing skill metadata: .claude/skills/add-discord-voice-transcription/{SKILL.md, REMOVE.md, VERIFY.md} following the @ddaniels Signal template (#1953). It instructs users to fetch the modified chat-sdk-bridge.ts and the new transcription.ts from origin/channels after this PR lands.

Test plan

  • pnpm run build — clean apart from 3 pre-existing errors in deltachat/slack/telegram, unrelated to this change
  • npx vitest run src/transcription.test.ts — 8/8 passing
  • npx vitest run src/channels/chat-sdk-bridge.test.ts — 7/7 passing (unchanged)
  • Live test: send a Discord voice memo with WHISPER_BIN=whisper-cli set, confirm the agent receives [Voice: <transcript>] inline. Will run this once the channels-branch checkout is up.
  • Live test: same with WHISPER_BIN unset, confirm voice attachments arrive as plain audio placeholders (no regression).
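The injection step the live tests exercise can be sketched as a pure helper. The helper name and the newline placement are assumptions for illustration; the `[Voice: <transcript>]` format is from the PR description.

```typescript
// Hypothetical helper showing how a non-empty transcript is appended
// to the message content as [Voice: <transcript>]; names are illustrative.
function appendVoiceTranscript(content: string, transcript: string | null): string {
  const text = transcript?.trim();
  if (!text) return content; // null or empty transcript: content unchanged
  const tag = `[Voice: ${text}]`;
  return content.length > 0 ? `${content}\n${tag}` : tag;
}
```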

🤖 Generated with Claude Code

Add an opt-in voice transcription pass to the shared Chat SDK bridge. When
WHISPER_BIN is set and an inbound attachment looks like audio (mimeType
audio/* or coarse type audio/voice), the bridge runs whisper.cpp on the
buffer after fetchData() and appends the transcript to the message content
as [Voice: <transcript>]. Skipped silently when WHISPER_BIN is unset, so
no behavior changes for existing installs.

The hook lives in chat-sdk-bridge.ts (shared by Discord, Slack, Teams,
Webex, etc.) rather than per-adapter, mirroring how the bridge already
handles attachment download centrally. Pairs with the SKILL.md trio added
in a sibling PR on main.

- src/transcription.ts: transcribeAudioBuffer(Buffer) + isAudioAttachment.
  Channel-agnostic. Uses node:child_process to shell out to ffmpeg (input
  normalization → 16 kHz mono WAV) and whisper-cli. Returns null on any
  error or empty output; the bridge only injects [Voice: ...] when the
  transcript is non-empty.
- src/channels/chat-sdk-bridge.ts: gated transcription pass after
  fetchData(). Stores transcript on the attachment entry as well so any
  downstream consumer (e.g., formatter, agent-runner) can use it.
- src/transcription.test.ts: 8 tests covering isAudioAttachment, env-gate,
  trim, empty output, and execFile failure paths.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Collapses multi-line execFileAsync calls to match the project's formatter.
No behavior change.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mtichikawa
Author

Closing — wrong target. After re-reading CONTRIBUTING.md, feature-skill PRs should be a single PR against main containing both the SKILL.md trio and the source code, with maintainers extracting the code to a skill/<name> branch on merge. Also src/channels/chat-sdk-bridge.ts exists on main (not just channels), so the modification belongs there.

Folded both changes into #2459 against main — that's now a single self-contained feature-skill PR following the template you used for @ddaniels' merged Signal PR (#1953).

