feat(skills): add /add-voice-transcription-free-whisper skill #2317

Open
ira-at-work wants to merge 3 commits into nanocoai:main from ira-at-work:feat/add-voice-transcription-free-whisper

Conversation

@ira-at-work

Summary

  • New /add-voice-transcription-free-whisper skill that wires local (free) voice transcription to NanoClaw channels
  • Supports two backends: openai-whisper (Python, GPU-accelerated when available) and whisper.cpp (CPU-only, no Python required)
  • Pre-flight detects which backend binary is present before modifying any files
  • Patches Signal, Telegram, and WhatsApp channel adapters to auto-transcribe voice messages and audio attachments to text before the agent processes them
  • Idempotent — safe to re-run if interrupted
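
The pre-flight step described above could look roughly like the sketch below. It is not taken from the PR: the binary names (`whisper` for openai-whisper, `whisper-cli` for whisper.cpp), the preference order, and the `isOnPath` callback are assumptions made for illustration.

```typescript
type Backend = "openai-whisper" | "whisper.cpp";

// Assumed binary names: `whisper` is the CLI installed by the openai-whisper
// Python package; `whisper-cli` is the CLI built by recent whisper.cpp versions.
// The order (Python backend first) is also an assumption.
const CANDIDATES: ReadonlyArray<{ backend: Backend; binary: string }> = [
  { backend: "openai-whisper", binary: "whisper" },
  { backend: "whisper.cpp", binary: "whisper-cli" },
];

// `isOnPath` is injected so the check can run without touching the host.
function detectBackend(isOnPath: (binary: string) => boolean): Backend | null {
  for (const { backend, binary } of CANDIDATES) {
    if (isOnPath(binary)) return backend;
  }
  return null;
}

// Throws before any file is modified when no backend is available.
function preflight(isOnPath: (binary: string) => boolean): Backend {
  const backend = detectBackend(isOnPath);
  if (backend === null) {
    throw new Error(
      "No transcription backend found: install openai-whisper or whisper.cpp, then re-run the skill.",
    );
  }
  return backend;
}
```

Running the check before any write means a failed pre-flight leaves the install untouched.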

Test plan

  • Run skill on a fresh install with openai-whisper installed — verify src/transcription.ts is written and the channel adapters are patched
  • Run skill on a fresh install with whisper.cpp installed — verify correct backend selected
  • Send a voice message via Signal/Telegram — verify it appears as transcribed text to the agent
  • Run skill twice — verify idempotency (no duplicate patches)
  • Run skill with neither backend installed — verify pre-flight blocks install with a clear error
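
One way the "run twice, no duplicate patches" property in the test plan could be achieved is a marker comment that is checked before inserting. This is a sketch under assumptions, not the skill's actual patching logic: the marker text, the `applyPatchOnce` helper, and the anchor-based insertion are all hypothetical.

```typescript
// Hypothetical marker; any unique comment string would work.
const MARKER = "// nanoclaw:voice-transcription";

// Inserts `snippet` right after `anchor`, unless the marker is already present.
function applyPatchOnce(source: string, anchor: string, snippet: string): string {
  if (source.includes(MARKER)) return source; // already patched: no-op
  const idx = source.indexOf(anchor);
  if (idx === -1) throw new Error(`anchor not found: ${anchor}`);
  const insertAt = idx + anchor.length;
  return source.slice(0, insertAt) + "\n" + MARKER + "\n" + snippet + source.slice(insertAt);
}
```

Applying the function to its own output returns the input unchanged, which is what the idempotency check in the test plan would observe.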

🤖 Generated with Claude Code

Adds a skill that wires local (free) voice transcription to any NanoClaw
channel — voice messages and audio attachments are automatically transcribed
to text before the agent sees them.

Supports two backends; a pre-flight check detects which one is installed:
- openai-whisper (Python, GPU-accelerated when available)
- whisper.cpp (CPU-only, no Python required)

The skill drops a transcription module into src/transcription.ts and patches
the relevant channel adapters (Signal, Telegram, WhatsApp) to invoke it.
Pre-flight verifies the chosen backend binary is present before modifying any
files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@ira-at-work
Author

Closing — incorrect base, will re-open with clean branch.

ira-at-work closed this on May 7, 2026
ira-at-work reopened this on May 7, 2026
mtichikawa added a commit to mtichikawa/nanoclaw that referenced this pull request May 13, 2026
Adds a feature skill that wires opt-in voice transcription into Discord
and any other Chat SDK-bridged channel (Slack, Teams, Webex, Google Chat,
etc.) via local whisper.cpp on the host. Pairs with @ira-at-work's nanocoai#2317
(add-voice-transcription-free-whisper), which covers Signal, Telegram, and
WhatsApp via per-adapter patches. Together the two skills cover every
channel.

Addresses the Discord side of the gap reported in nanocoai#2426 (LLM cant see
the image in discord — same shape applies to voice today).

Per CONTRIBUTING.md (single feature-skill PR against main): includes the
skill docs trio (SKILL.md, REMOVE.md, VERIFY.md) plus the code that will live
on skill/discord-voice-transcription after maintainer extraction.

Files:
- .claude/skills/add-discord-voice-transcription/SKILL.md: install
  instructions following the @ddaniels Signal v2 template (nanocoai#1953).
  Pre-flight, prerequisites, idempotent merge from
  upstream/skill/discord-voice-transcription, env vars, restart,
  troubleshooting.
- .claude/skills/add-discord-voice-transcription/REMOVE.md: unset
  WHISPER_BIN (lightweight) or revert chat-sdk-bridge to upstream/main
  and delete src/transcription.ts (full).
- .claude/skills/add-discord-voice-transcription/VERIFY.md: send a
  voice memo, check inline transcript.
- src/transcription.ts: transcribeAudioBuffer(Buffer) +
  isAudioAttachment(att). Channel-agnostic. Shells out to ffmpeg
  (input -> 16 kHz mono WAV) and whisper-cli. Returns null on any
  failure or empty output.
- src/transcription.test.ts: 8 tests — isAudioAttachment truth table,
  env-gate, trim, empty-output, execFile-failure paths.
- src/channels/chat-sdk-bridge.ts: +15 lines, gated on
  process.env.WHISPER_BIN. Skipped silently when unset, so behavior
  is unchanged for installs that don't opt in.
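
For readers unfamiliar with the two helpers named in the file list, the following is a rough sketch of what an audio check and the ffmpeg normalization step might look like. It is not the code from the commit: the `AttachmentLike` shape, the extension list, and the `ffmpegNormalizeArgs` helper are assumptions. The ffmpeg options themselves (`-ar` for sample rate, `-ac` for channel count, `-f` for output format, `-y` to overwrite) are standard.

```typescript
// Hypothetical attachment shape; the real Chat SDK type is not shown in the commit.
interface AttachmentLike {
  mimeType?: string;
  filename?: string;
}

// Assumed extension list, for illustration only.
const AUDIO_EXTENSIONS = [".ogg", ".oga", ".opus", ".mp3", ".m4a", ".wav", ".flac"];

// True when the MIME type is audio/* or the filename has a known audio extension.
function isAudioAttachment(att: AttachmentLike): boolean {
  if (att.mimeType?.toLowerCase().startsWith("audio/")) return true;
  const name = att.filename?.toLowerCase() ?? "";
  return AUDIO_EXTENSIONS.some((ext) => name.endsWith(ext));
}

// Builds ffmpeg arguments that convert any input file to 16 kHz mono WAV.
function ffmpegNormalizeArgs(inputPath: string, outputPath: string): string[] {
  return ["-y", "-i", inputPath, "-ar", "16000", "-ac", "1", "-f", "wav", outputPath];
}
```

The argument array is meant to be passed to something like Node's `execFile("ffmpeg", args)`, which avoids shell quoting issues with user-supplied filenames.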

Test plan:
- [x] pnpm run build clean on main
- [x] npx vitest run src/transcription.test.ts — 8/8 passing
- [x] npx vitest run src/channels/chat-sdk-bridge.test.ts — 12/12
  passing (no regression in existing bridge tests)
- [ ] Live test: send a Discord voice memo with WHISPER_BIN set
- [ ] Live test: same with WHISPER_BIN unset, confirm no behavior
  change

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
mtichikawa added a commit to mtichikawa/nanoclaw that referenced this pull request May 13, 2026
Opt-in voice transcription for Discord and any other Chat SDK-bridged
channel (Slack, Teams, Webex, Google Chat) via local whisper.cpp on the
host. No cloud API, no OPENAI_API_KEY — transcription is fully on-device.

Pairs with @ira-at-work's nanocoai#2317 (add-voice-transcription-free-whisper)
which patches Signal/Telegram/WhatsApp adapters directly. This skill is
the bridge-side complement — one shared hook in chat-sdk-bridge.ts covers
every Chat SDK-bridged channel. Together the two skills close the voice
gap on every channel NanoClaw supports.

Addresses the Discord side of @b1ek's nanocoai#2426 (LLM cant see the image in
discord — same shape applies to voice today).

Files:
- .claude/skills/add-discord-voice-transcription/{SKILL.md, REMOVE.md,
  VERIFY.md} — follows @ddaniels' merged Signal v2 template (nanocoai#1953):
  pre-flight, prerequisites, git fetch upstream skill/discord-voice-
  transcription, env vars, restart, troubleshooting.
- src/transcription.ts — transcribeAudioBuffer(Buffer) and
  isAudioAttachment(att). Channel-agnostic. Shells out to ffmpeg for
  input normalization (any container → 16 kHz mono WAV) and whisper-cli
  for the transcript. Returns null on any failure or empty output.
- src/transcription.test.ts — 8 unit tests covering the truth table for
  isAudioAttachment plus env-gate, trim, empty-output, and execFile-
  failure paths.
- src/channels/chat-sdk-bridge.ts — +15 lines inside messageToInbound().
  Gated on process.env.WHISPER_BIN and isAudioAttachment(entry). When
  WHISPER_BIN is unset, the path is a no-op so behavior is unchanged for
  installs that don't opt in.
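
The opt-in gate and the "null on any failure" contract described above can be sketched as follows. This is an illustration under assumptions rather than the bridge code itself: `textForAgent`, the injected `Transcriber` type, and the `[voice transcript]` label are hypothetical, and `Uint8Array` stands in for Node's `Buffer` (which is a subclass of it).

```typescript
// Resolves to the transcript, or null on any failure or empty output.
type Transcriber = (audio: Uint8Array) => Promise<string | null>;

// Returns the text the agent should see for a message with optional audio.
async function textForAgent(
  originalText: string,
  audio: Uint8Array | null,
  whisperBin: string | undefined, // e.g. process.env.WHISPER_BIN
  transcribe: Transcriber,
): Promise<string> {
  // Gate: without WHISPER_BIN (or without audio) the message passes through unchanged.
  if (!whisperBin || audio === null) return originalText;

  let transcript: string | null = null;
  try {
    transcript = await transcribe(audio);
  } catch {
    transcript = null; // treat a thrown error like any other failure
  }

  const trimmed = transcript?.trim() ?? "";
  if (trimmed === "") return originalText;

  // Hypothetical label; the real inline format is not shown in the commit.
  const line = `[voice transcript] ${trimmed}`;
  return originalText ? `${originalText}\n${line}` : line;
}
```

Because every failure path falls back to the original text, a broken or missing whisper binary degrades to today's behavior instead of dropping the message.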

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>