Commit f43a35a
Author: Alex J Lennon (committed)
Realtime viseme MQTT harness: normalised payloads, running peak, phoneme test, docs
Realtime viseme MQTT harness: normalised payloads, running peak, phoneme test, docs

- tools/realtime_viseme_mqtt.py: mic -> ONNX -> MQTT; visemes and visemes_peak normalised to sum=1 (incl. silence); running peak over ~1s; phoneme-test mode with play-back clips and peak-based pass; --speak plays pre-generated phoneme files from data/phoneme_prompts/
- tools/generate_phoneme_prompts.py: pre-generate segment_01.wav (silence) and segment_02..15.mp3 (TTS phonemes, 'thin' for TH, repeated 5x) for phoneme test
- tools/test_viseme_inference.py: automated ONNX viseme inference test
- QUICKSTART: MQTT publication usage (command, topic, payload shape, subscribe example), phoneme test, normalisation and visemes_peak docs
- TODO.md: phoneme test audio improvement (human-recorded clips)
- pyproject.toml: realtime extra adds edge-tts
- .gitignore: data/phoneme_prompts/ (generated audio)

Made-with: Cursor
1 parent deba953 commit f43a35a

8 files changed

Lines changed: 1323 additions & 2 deletions

.gitignore

Lines changed: 4 additions & 1 deletion
```diff
@@ -226,4 +226,7 @@ bin/
 VisemesAtHome/
 OpenLipSync.Inference.Test/
-OpenLipSync.Inference.Standalone/
+OpenLipSync.Inference.Standalone/
+
+# Generated phoneme test audio (run tools/generate_phoneme_prompts.py to create)
+data/phoneme_prompts/
```

QUICKSTART.md

Lines changed: 41 additions & 0 deletions
@@ -175,6 +175,47 @@ The repo does **not** ship a golden WAV with committed “expected” visemes (t

If the app says **"Model not found"**, ensure there is an ONNX export under `export/` (e.g. `export/quick_laptop_uk_15ep_*/model.onnx` and `config.json`). The app uses the **newest** `model.onnx` under `export/`.

## 8. Real-time mic → visemes → MQTT (test harness)

A Python test harness streams live microphone audio through the ONNX model and publishes viseme activations as JSON to an MQTT broker.

**Install extra deps (from project root):**

```bash
uv sync --extra realtime
```

**Run (default: newest model under `export/`, broker `mqtt.dynamicdevices.co.uk:1883`, topic `openlipsync/visemes`):**

```bash
uv run python tools/realtime_viseme_mqtt.py
```

**MQTT usage:** Messages are published to `openlipsync/visemes/<client_id>` (the client ID is stable per device; override with `--client-id`). Each message is JSON with `t` (Unix time), `frame`, `client_id`, `visemes` (per-frame activations, normalised to sum to 1), and `visemes_peak` (running max over ~1 s, also normalised to sum to 1). Subscribe to all clients with `mosquitto_sub -h mqtt.dynamicdevices.co.uk -t 'openlipsync/visemes/#'`, or to one client with `openlipsync/visemes/<client_id>`.
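Beyond `mosquitto_sub`, the payload can be consumed from Python with paho-mqtt (installed by the realtime extra). A minimal subscriber sketch, assuming paho-mqtt 2.x; the `summarise()` helper is illustrative, not part of the harness:

```python
# Subscriber sketch for the harness payload (assumes paho-mqtt 2.x).
import json


def summarise(payload: bytes) -> tuple[int, str, str]:
    """Parse one harness message and return (frame, client_id, strongest viseme)."""
    msg = json.loads(payload)
    top = max(msg["visemes"], key=msg["visemes"].get)
    return msg["frame"], msg["client_id"], top


def run(broker: str = "mqtt.dynamicdevices.co.uk",
        topic: str = "openlipsync/visemes/#") -> None:
    """Connect, subscribe, and print a one-line summary per message."""
    import paho.mqtt.client as mqtt  # installed by `uv sync --extra realtime`

    def on_message(client, userdata, m):
        frame, cid, top = summarise(m.payload)
        print(f"frame={frame} client={cid} top={top}")

    client = mqtt.Client(mqtt.CallbackAPIVersion.VERSION2)
    client.on_message = on_message
    client.connect(broker, 1883)
    client.subscribe(topic)  # '#' matches all client IDs; narrow to one ID if preferred
    client.loop_forever()  # Ctrl+C to stop
```

Call `run()` to stream; `summarise()` can be exercised offline against a captured payload.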
**Options:**

- `--broker HOST` — MQTT broker host (default: mqtt.dynamicdevices.co.uk)
- `--port PORT` — MQTT port (default: 1883)
- `--topic TOPIC` — MQTT topic prefix; the connection’s client ID is appended so each client has a unique topic (default: openlipsync/visemes → openlipsync/visemes/<client_id>)
- `--client-id ID` — MQTT client ID (default: MAC-derived, e.g. olips-a1b2c3d4e5f6)
- `--warmup SECS` — Seconds of audio used to compute mel normalisation stats (default: 1.0)
- `--publish-every N` — Publish every N frames (1 = every ~10 ms; 5 = every ~50 ms)
- `--model-dir PATH` — Use a specific export dir instead of the newest under `export/`
- `--device NAME` — Sounddevice input device (list devices with `python -c "import sounddevice; print(sounddevice.query_devices())"`)

**Phoneme check:** Run with `--phoneme-test`. Segment 1 is silence; segments 2–15 each play a test phoneme from pre-generated files. The mic captures that audio, and the harness reports **peak** viseme activations and ✓/✗ (pass = expected viseme in the top 2). Add `--speak` to play the clips; generate them once with `uv run --extra realtime python tools/generate_phoneme_prompts.py` (writes `data/phoneme_prompts/`: segment_01.wav = silence, segment_02..15.mp3 = TTS of e, ah, eh, oh, oo, p, f, thin, t, k, sh, s, n, r, each repeated 5×). Playback runs in the background and requires **ffplay**, **afplay**, or **mpv**.
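The top-2 pass rule above can be sketched in a few lines (the function name is illustrative, not the harness's actual API):

```python
def phoneme_pass(peaks: dict[str, float], expected: str, top_n: int = 2) -> bool:
    """True if the expected viseme is among the top_n peak activations."""
    ranked = sorted(peaks, key=peaks.get, reverse=True)
    return expected in ranked[:top_n]


# Example: 'aa' peaked second-highest, so a segment expecting 'aa' passes
# even though silence dominated.
print(phoneme_pass({"silence": 0.5, "aa": 0.3, "PP": 0.1, "SS": 0.1}, "aa"))  # True
```

This is why a segment can pass even when silence or SS peaks highest: only the top two ranks matter.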
**JSON payload shape:** Each message carries per-frame viseme activations **normalised so each set sums to 1** (including silence). Fields: `t`, `frame`, `client_id`, `visemes` (name → 0–1, sum = 1), and `visemes_peak` (running max over the last ~1 s, then normalised to sum = 1). Example:

```json
{"t": 1739123456.78, "frame": 42, "client_id": "openlipsync-a1b2c3d4", "visemes": {"silence": 0.9, "PP": 0.01, "aa": 0.02, ...}, "visemes_peak": {"silence": 0.95, "PP": 0.02, "aa": 0.45, ...}}
```

Separately, mel-input normalisation uses a short warmup (default 1 s) to estimate mean/std over the mic signal; those stats are then fixed for the rest of the session. Stop with Ctrl+C.
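The sum-to-1 normalisation and the ~1 s running peak can be sketched as follows (assumes NumPy; names and the frame-rate constant are illustrative, not the harness's actual code):

```python
# Sketch of the payload maths: sum-to-1 normalisation plus a running per-viseme
# peak over roughly the last second of ~10 ms frames.
from collections import deque

import numpy as np

FRAMES_PER_SEC = 100  # ~10 ms frames -> ~1 s of history


def normalise(acts: np.ndarray) -> np.ndarray:
    """Scale activations so they sum to 1 (silence included)."""
    total = acts.sum()
    return acts / total if total > 0 else acts


class RunningPeak:
    """Per-viseme max over the last ~1 s of frames, renormalised to sum to 1."""

    def __init__(self, history: int = FRAMES_PER_SEC):
        self.frames = deque(maxlen=history)  # old frames fall off automatically

    def update(self, acts: np.ndarray) -> np.ndarray:
        self.frames.append(acts)
        return normalise(np.max(np.stack(self.frames), axis=0))
```

Note that the peak vector is taken element-wise across frames first and only then renormalised, so its entries can come from different frames.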
**Improving the phoneme test:** The default clips are TTS (Edge TTS). In practice, silence and SS often peak high (an artifact of playback plus mic setup), so the pass rate can be low even when the correct viseme fires. Using **human-recorded** phoneme clips (same filenames in `data/phoneme_prompts/`: segment_01.wav … segment_15.mp3) would improve realism and the pass rate. See TODO.md (phoneme test audio).

## Troubleshooting

| Issue | What to do |
TODO.md

Lines changed: 9 additions & 0 deletions
```markdown
# TODO

## Phoneme test audio

**Improve phoneme test pass rate and realism** by replacing TTS-generated clips with human-recorded phoneme clips.

- **Current:** `tools/generate_phoneme_prompts.py` creates segment_01.wav (silence) and segment_02..15.mp3 via Edge TTS. Playback + mic often causes silence/SS to dominate, so the pass rate is low.
- **Proposed:** Have someone record the 15 segments (silence + e, ah, eh, oh, oo, p, f, thin, t, k, sh, s, n, r, each repeated several times). Save under the same filenames in `data/phoneme_prompts/` (segment_01.wav, segment_02.mp3 … segment_15.mp3). No harness changes needed.
- **Refs:** QUICKSTART “Improving the phoneme test”; `tools/generate_phoneme_prompts.py` docstring TODO.
```

pyproject.toml

Lines changed: 6 additions & 0 deletions
```diff
@@ -25,3 +25,9 @@ gui = [
 export = [
     "onnx",
 ]
+realtime = [
+    "onnxruntime",
+    "sounddevice",
+    "paho-mqtt",
+    "edge-tts",
+]
```

tools/generate_phoneme_prompts.py

Lines changed: 82 additions & 0 deletions
```python
#!/usr/bin/env python3
"""
Pre-generate phoneme test audio for the realtime harness.

Creates one file per segment in data/phoneme_prompts/:
- segment_01: 1 s silence (WAV)
- segment_02..15: TTS of the target sound repeated REPEAT_COUNT times
  (e.g. "e e e e e", "thin thin ..." for /θ/) as MP3.

At run time, --speak plays the clip for each segment in the background while the mic
captures; the harness checks that the expected viseme is detected (peak activation, top-2).

TODO: Replace TTS clips with human-recorded phoneme clips (same filenames) for a better
pass rate and a more natural test; see TODO.md (phoneme test audio).

Usage (from project root):
    uv run --extra realtime python tools/generate_phoneme_prompts.py
"""

from __future__ import annotations

import asyncio
import sys
import wave
from pathlib import Path

PROJECT_ROOT = Path(__file__).resolve().parents[1]
OUT_DIR = PROJECT_ROOT / "data" / "phoneme_prompts"

# Segment 1 = silence (we write a WAV). Segments 2-15 = TTS text for the sound only.
# Use "thin" for TH so TTS produces the /θ/ sound, not "tee aitch".
PHONEME_TEXTS = [
    None,  # 1: silence, no TTS
    "e", "ah", "eh", "oh", "oo", "p", "f", "thin", "t", "k", "sh", "s", "n", "r",
]
REPEAT_COUNT = 5  # Say each sound this many times per segment for a stronger test
SAMPLE_RATE = 16000

VOICE = "en-GB-SoniaNeural"


def write_silence_wav(path: Path, duration_sec: float = 1.0) -> None:
    with wave.open(str(path), "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(SAMPLE_RATE)
        n = int(SAMPLE_RATE * duration_sec)
        w.writeframes(b"\x00\x00" * n)


async def main() -> None:
    try:
        import edge_tts
    except ImportError:
        print(
            "edge-tts is not installed. From the project root, run:\n"
            "  uv run --extra realtime python tools/generate_phoneme_prompts.py\n"
            "(--extra realtime installs edge-tts into the environment, then runs this script.)",
            file=sys.stderr,
        )
        sys.exit(1)

    OUT_DIR.mkdir(parents=True, exist_ok=True)
    print(f"Writing {len(PHONEME_TEXTS)} files to {OUT_DIR}")

    # Segment 1: silence
    seg1 = OUT_DIR / "segment_01.wav"
    write_silence_wav(seg1)
    print(f"  {seg1.name} (silence)")

    for i, text in enumerate(PHONEME_TEXTS[1:], start=2):
        path = OUT_DIR / f"segment_{i:02d}.mp3"
        # Repeat the sound REPEAT_COUNT times so each segment is a solid test
        repeated = " ".join([text] * REPEAT_COUNT)
        communicate = edge_tts.Communicate(repeated, VOICE)
        await communicate.save(str(path))
        print(f"  {path.name} ({text} x{REPEAT_COUNT})")

    print("Done. Run phoneme-test with --speak: each segment plays this sound while the mic captures and checks the viseme.")


if __name__ == "__main__":
    asyncio.run(main())
```
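The generated silence clip can be sanity-checked offline by reading the WAV header back with the standard library. A small sketch (the helper name is illustrative, not part of the tools):

```python
# Read back a WAV's duration and sample rate, e.g. to confirm that
# segment_01.wav is ~1 s of 16 kHz audio after running the generator.
import wave
from pathlib import Path


def wav_duration_and_rate(path: Path) -> tuple[float, int]:
    """Return (duration in seconds, sample rate) of a WAV file."""
    with wave.open(str(path), "rb") as w:
        return w.getnframes() / w.getframerate(), w.getframerate()


# Example (after running the generator from the project root):
#   dur, rate = wav_duration_and_rate(Path("data/phoneme_prompts/segment_01.wav"))
#   assert rate == 16000 and abs(dur - 1.0) < 0.01
```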
