feat(skill): add /add-discord-voice-transcription #2459

Open
mtichikawa wants to merge 1 commit into nanocoai:main from mtichikawa:feat/add-discord-voice-transcription

@mtichikawa mtichikawa commented May 13, 2026

Opt-in voice transcription for Discord and any other Chat SDK-bridged channel (Slack, Teams, Webex, Google Chat, etc.) via local whisper.cpp on the host. No cloud API, no OPENAI_API_KEY — fully on-device.

Pairs with @ira-at-work's #2317 (add-voice-transcription-free-whisper) which patches Signal/Telegram/WhatsApp adapters directly. This skill is the bridge-side complement — one shared hook in chat-sdk-bridge.ts covers every Chat SDK-bridged channel. Together the two skills close the voice gap on every channel NanoClaw supports.

Addresses the Discord side of @b1ek's #2426 ("LLM cant see the image in discord"; the same shape applies to voice today).

Type of Change

  • Feature skill - adds a channel or integration (source code changes + SKILL.md)
  • Utility skill - adds a standalone tool (code files in .claude/skills/<name>/, no source changes)
  • Operational/container skill - adds a workflow or agent skill (SKILL.md only, no source changes)
  • Fix - bug fix or security fix to source code
  • Simplification - reduces or simplifies source code
  • Documentation - docs, README, or CONTRIBUTING changes only

Description

The Chat SDK bridge is shared by every chat-sdk channel (Discord, Slack, Teams, Webex, Google Chat). Hooking transcription here means all bridge-based channels gain voice support from a single integration point, gated on `process.env.WHISPER_BIN` so it's a no-op for installs that don't opt in.
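A minimal sketch of how such a gated hook could look, for readers who want the shape without opening the diff. The type names, the `addVoiceTranscripts` function, the injected `transcribe` callback, and the inline `audio/*` check are all hypothetical stand-ins; the actual code lives inside `messageToInbound()` and may differ.

```typescript
// Hypothetical shapes for illustration; the real bridge types may differ.
interface AttachmentEntry {
  contentType?: string;
  data?: Buffer;
  transcript?: string;
}

interface SerializedMessage {
  content: string;
  attachments: AttachmentEntry[];
}

type Transcriber = (audio: Buffer) => Promise<string | null>;

async function addVoiceTranscripts(
  serialized: SerializedMessage,
  transcribe: Transcriber,
  env: Record<string, string | undefined> = process.env,
): Promise<SerializedMessage> {
  // Opt-in gate: without WHISPER_BIN the message passes through untouched.
  if (!env.WHISPER_BIN) return serialized;

  for (const entry of serialized.attachments) {
    // Stand-in for isAudioAttachment(entry); assumed rule, not the PR's.
    const isAudio = entry.contentType?.toLowerCase().startsWith("audio/") ?? false;
    if (!isAudio || !entry.data) continue;

    const transcript = await transcribe(entry.data);
    if (!transcript) continue; // null or empty: inject nothing

    // Keep the transcript on the entry for downstream consumers and
    // append it to the content so the agent sees it inline.
    entry.transcript = transcript;
    serialized.content += `\n[Voice: ${transcript}]`;
  }
  return serialized;
}
```

Passing the environment and the transcriber as parameters is a choice made here for testability; it lets the no-op path be checked without touching `process.env` or spawning a process.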

Files:

  • `.claude/skills/add-discord-voice-transcription/SKILL.md` — install instructions following the @ddaniels Signal v2 template (feat(skill): Add Signal channel adapter (V2) #1953). Pre-flight, prerequisites, `git fetch upstream skill/discord-voice-transcription && git merge`, env vars, restart, troubleshooting.
  • `.claude/skills/add-discord-voice-transcription/REMOVE.md` — lightweight unset-env path + full code revert.
  • `.claude/skills/add-discord-voice-transcription/VERIFY.md` — send a voice memo, check inline transcript.
  • `src/transcription.ts` — `transcribeAudioBuffer(Buffer): Promise<string | null>` + `isAudioAttachment(att)` helper. Channel-agnostic. Shells out to `ffmpeg` for normalization (any container → 16 kHz mono WAV) and `whisper-cli` for the transcript. Returns null on any failure or empty output; the bridge only injects `[Voice: …]` when the transcript is non-empty.
  • `src/channels/chat-sdk-bridge.ts` — +15 lines inside `messageToInbound()`. Transcript is stored on the attachment entry (for any downstream consumer) and appended to `serialized.content` so the agent sees it inline alongside the audio file placeholder.
  • `src/transcription.test.ts` — 8 unit tests covering `isAudioAttachment`, env-gate, transcript trim, empty-output handling, and execFile-failure paths.
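The two-step pipeline described for `src/transcription.ts` can be sketched roughly as follows. This is an illustration under stated assumptions, not the file from the PR: the `AudioCandidate` shape, the extension list, the temp-file layout, and the `WHISPER_MODEL` variable name are hypothetical. The `ffmpeg` flags (`-ar`, `-ac`, `-c:a`) and the whisper.cpp CLI flags (`-m`, `-f`, `-nt`, `-np`) are standard options, but the exact invocation in the PR may differ.

```typescript
import { execFile } from "node:child_process";
import { mkdtemp, rm, writeFile } from "node:fs/promises";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { promisify } from "node:util";

const run = promisify(execFile);

// Hypothetical attachment shape for illustration.
interface AudioCandidate {
  contentType?: string;
  name?: string;
}

// Assumed rule: an audio/* MIME type or a common audio file extension.
function isAudioAttachment(att: AudioCandidate): boolean {
  if (att.contentType?.toLowerCase().startsWith("audio/")) return true;
  return /\.(ogg|oga|opus|mp3|m4a|wav|flac)$/i.test(att.name ?? "");
}

async function transcribeAudioBuffer(
  audio: Buffer,
  env: Record<string, string | undefined> = process.env,
): Promise<string | null> {
  const whisperBin = env.WHISPER_BIN;
  const model = env.WHISPER_MODEL; // hypothetical variable name
  if (!whisperBin || !model) return null; // not opted in

  const dir = await mkdtemp(join(tmpdir(), "voice-"));
  try {
    const input = join(dir, "input");
    const wav = join(dir, "audio.wav");
    await writeFile(input, audio);

    // Normalize any container to 16 kHz mono 16-bit PCM WAV.
    await run("ffmpeg", ["-y", "-i", input, "-ar", "16000", "-ac", "1", "-c:a", "pcm_s16le", wav]);

    // -nt drops timestamps, -np suppresses progress output.
    const { stdout } = await run(whisperBin, ["-m", model, "-f", wav, "-nt", "-np"]);
    const text = stdout.trim();
    return text.length > 0 ? text : null;
  } catch {
    return null; // any spawn or decode failure is treated as "no transcript"
  } finally {
    await rm(dir, { recursive: true, force: true });
  }
}
```

Using `execFile` rather than a shell string keeps attachment-derived paths out of shell parsing, and collapsing every failure to `null` means the caller only has to check for a non-empty transcript.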

For Skills

  • SKILL.md contains instructions, not inline code (code goes in separate files)
  • SKILL.md is under 500 lines (~160 lines)
  • I tested this skill on a fresh clone — pending live end-to-end test on Discord; build + unit tests green (see test plan)

Test plan

  • `pnpm run build` clean on main
  • `npx vitest run src/transcription.test.ts` — 8/8 passing
  • `npx vitest run src/channels/chat-sdk-bridge.test.ts` — 12/12 passing (no regression)
  • Live: send a Discord voice memo with `WHISPER_BIN=whisper-cli` set, confirm `[Voice: ]` reaches the agent
  • Live: same with `WHISPER_BIN` unset, confirm no behavior change

🤖 Generated with Claude Code

@github-actions github-actions bot added the "PR: Feature" (New feature or enhancement) and "PR: Skill" (Skill package or skill-related changes) labels May 13, 2026

Opt-in voice transcription for Discord and any other Chat SDK-bridged
channel (Slack, Teams, Webex, Google Chat) via local whisper.cpp on the
host. No cloud API, no OPENAI_API_KEY — transcription is fully on-device.

Pairs with @ira-at-work's nanocoai#2317 (add-voice-transcription-free-whisper)
which patches Signal/Telegram/WhatsApp adapters directly. This skill is
the bridge-side complement — one shared hook in chat-sdk-bridge.ts covers
every Chat SDK-bridged channel. Together the two skills close the voice
gap on every channel NanoClaw supports.

Addresses the Discord side of @b1ek's nanocoai#2426 (LLM cant see the image in
discord — same shape applies to voice today).

Files:
- .claude/skills/add-discord-voice-transcription/{SKILL.md, REMOVE.md,
  VERIFY.md} — follows @ddaniels' merged Signal v2 template (nanocoai#1953):
  pre-flight, prerequisites, git fetch upstream
  skill/discord-voice-transcription, env vars, restart, troubleshooting.
- src/transcription.ts — transcribeAudioBuffer(Buffer) and
  isAudioAttachment(att). Channel-agnostic. Shells out to ffmpeg for
  input normalization (any container → 16 kHz mono WAV) and whisper-cli
  for the transcript. Returns null on any failure or empty output.
- src/transcription.test.ts — 8 unit tests covering the truth table for
  isAudioAttachment plus env-gate, trim, empty-output, and
  execFile-failure paths.
- src/channels/chat-sdk-bridge.ts — +15 lines inside messageToInbound().
  Gated on process.env.WHISPER_BIN and isAudioAttachment(entry). When
  WHISPER_BIN is unset, the path is a no-op so behavior is unchanged for
  installs that don't opt in.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@mtichikawa mtichikawa force-pushed the feat/add-discord-voice-transcription branch from 62716e3 to 32adefc Compare May 13, 2026 15:46
@github-actions github-actions bot added the "follows-guidelines" (PR was created using the current contributing template) label May 13, 2026