## 7. Run C# inference with training audio
**Just checking visemes (e.g. with your own WAVs)?** You do **not** need dev-clean, Python, MFA, or any training setup. You only need:
1. **Clone the repo** (and `git checkout develop` if that’s the branch with the export).
2. **.NET 8 SDK** – install from [dotnet.microsoft.com/download](https://dotnet.microsoft.com/download) or `sudo apt install dotnet-sdk-8` / `brew install dotnet@8`. Check with `dotnet --version`.
3. Run the command from the **project root** with a path to your WAV (see “Checking visemes” below). The repo already includes an ONNX model under `export/`, so no LFS or training is required.
Dev-clean / training data is only needed if you want to run training yourself or use a dev-clean WAV as a sanity check (see “Example WAV” later in this section).
---
After exporting a model to ONNX (e.g. from the training UI or export script), you can run the C# test app from the **project root** so it finds the `export/` directory and optional training WAVs.

The app loads the newest ONNX model under `export/` (and its `config.json`).

### Checking visemes (quick run)

**1. Run with your test WAV files**
From the **project root** (so `export/` and the ONNX model are found):
```bash
dotnet run --project inference/OpenLipSync.Inference/OpenLipSync.Inference.Test/OpenLipSync.Inference.Test.csproj -c Release -- /path/to/your-audio.wav
```

Example with a file in Downloads:

```bash
dotnet run --project inference/OpenLipSync.Inference/OpenLipSync.Inference.Test/OpenLipSync.Inference.Test.csproj -c Release -- ~/Downloads/julian-01.wav
```
**What you get**
Outputs are written **next to the WAV** (same folder): `visemes_<basename>.csv` and `visemes_<basename>.html`.
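
The naming rule can be mimicked in the shell; `viseme_outputs` below is a tiny helper made up for this guide (not part of the app):

```shell
# Derive the output file names the app writes for a given WAV,
# per the rule above. Illustrative helper only - not part of the test app.
viseme_outputs() {
  dir=$(dirname "$1")
  base=$(basename "$1" .wav)
  echo "$dir/visemes_$base.csv" "$dir/visemes_$base.html"
}
```

For example, `viseme_outputs ~/Downloads/julian-01.wav` prints the CSV and HTML paths next to the WAV.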
**2. Evaluating the CSV output**
- **Location:** Same directory as the WAV, e.g. `visemes_julian-01.csv`.
- **Format:** Header row: `time_ms,sil,PP,FF,TH,DD,kk,CH,SS,nn,RR,aa,E,ih,oh,ou`. Each following row is one time step (frame); values are 0–1 activations.
- **What to check:** Open in Excel or a text editor. During speech you should see non-zero values in several viseme columns (e.g. `aa`, `E`, `ou`); near silence, `sil` should be high (e.g. > 0.9) and the others low. Time should advance by ~21 ms per row (the model frame rate). If every row is `sil=1` and the rest 0, something is wrong (e.g. model not loaded or normalization missing).
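
The silence check can also be scripted instead of eyeballing the file; `check_visemes_csv` is a hypothetical helper that assumes the header layout described under **Format** (with `sil` in column 2):

```shell
# Rough sanity check of a viseme CSV: how many frames are silence-dominated?
# Assumes the header described above, with `sil` as column 2.
# `check_visemes_csv` is a name made up for this guide, not part of the app.
check_visemes_csv() {
  awk -F, 'NR > 1 {
    total++
    if ($2 > 0.99) silent++
  }
  END {
    if (total > 0 && silent == total)
      print "WARN: every frame is silence"
    else
      printf "%d of %d frames silence-dominated\n", silent, total
  }' "$1"
}
```

If it prints the warning for a file you know has speech, suspect the model or normalization, as described above.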
**3. Evaluating the HTML output**
- **Location:** Same directory as the WAV, e.g. `visemes_julian-01.html`.
- **How to use:** Open the file in a browser (double-click or File → Open). You get a line chart: x-axis = time (seconds), y-axis = activation (0–1), one series per viseme.
- **What to check:** During speech, multiple lines should move (e.g. `sil` dips, `aa`/`E`/`ou` rise). Use **All on** / **All off** to show or hide all series; click a viseme name in the legend to toggle that line. If the chart is flat (only silence) for a file you know has speech, the pipeline or model is wrong.
**4. Console output**
The app also prints per-utterance normalization (mean, std) and five sample lines at 0%, 25%, 50%, 75%, and 100% of the clip (e.g. `sil=0.14 aa=0.43 E=0.26`). Use these for a quick sanity check without opening CSV/HTML.
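
For reference, those five sample positions can be computed like this; `sample_indices` is an illustrative sketch using integer arithmetic, and the app’s exact rounding may differ:

```shell
# Frame indices at 0%, 25%, 50%, 75%, and 100% of an n-frame clip
# (one plausible indexing; the app's exact rounding may differ).
sample_indices() {
  n=$1
  for p in 0 25 50 75 100; do
    echo $(( (n - 1) * p / 100 ))
  done
}
```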
**Example WAV with known-ish outputs (sanity check)**
The repo does **not** ship a golden WAV with committed “expected” visemes (training data is gitignored). You can still sanity-check the pipeline:
- **If you have prepared training data:** After running section 4 (training) once, you’ll have `training/data/prepared/dev-clean/` with WAVs and MFA alignment. Run the C# app on one of those WAVs, e.g.:

  ```bash
  dotnet run --project inference/OpenLipSync.Inference/OpenLipSync.Inference.Test/OpenLipSync.Inference.Test.csproj -c Release -- training/data/prepared/dev-clean/1272-128104-0000.wav
  ```
  The model was trained on this data, so you should see varied visemes during speech (not all silence). The MFA alignment in the matching `.json` is the “ground truth” used for training; you can compare the CSV roughly to that (same time ranges should show the corresponding visemes).
- **Without training data:** Use any of your own WAVs and inspect the CSV/HTML as above; there’s no committed reference output to diff against.
If the app says **"Model not found"**, ensure there is an ONNX export under `export/` (e.g. `export/quick_laptop_uk_15ep_*/model.onnx` and `config.json`). The app uses the **newest** `model.onnx` under `export/`.
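
You can mimic that selection rule from the shell to see which export the app should pick before running it (assuming “newest” means most recently modified; `find_newest_model` is a helper made up for this guide):

```shell
# Print the model.onnx the app should select: the newest by modification
# time under the given export directory. Illustrative sketch only.
find_newest_model() {
  ls -t "$1"/*/model.onnx 2>/dev/null | head -n 1
}
```

For example, `find_newest_model export` prints something like `export/quick_laptop_uk_15ep_.../model.onnx`, or nothing if no export exists.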