Draft PR: changes from all 26 commits
99ef50a
Added audio requests to vLLM models
karpnv Nov 13, 2025
8297aed
Introduced vLLM_multimodal model to save multimodal outputs
vmendelev Dec 18, 2025
2575267
generation.py to respect separate server type for the client
vmendelev Dec 18, 2025
b8d95f0
Unified server to work with NeMo models not supported by vLLM
vmendelev Dec 20, 2025
66667b0
Magpie TTS backend
vmendelev Dec 26, 2025
d916e10
nv_tts eval scripts
vmendelev Dec 27, 2025
14523f7
Checkpoint + hparams input instead of nemo
vmendelev Dec 28, 2025
4aa3a2d
Per benchmark scoring jobs
vmendelev Dec 28, 2025
5372f7e
nv_tts benchmarks and scripts to run them
vmendelev Dec 28, 2025
db37cff
EOS fix: 8 chunks per node
vmendelev Dec 28, 2025
a11456e
Documentation and comparison script
vmendelev Dec 28, 2025
e80448e
eos config example
vmendelev Dec 28, 2025
547c912
EAR TTS backend
vmendelev Jan 9, 2026
d318068
EAR TTS config
vmendelev Jan 9, 2026
18981f9
Fix MagpieTTS backend when no context audio is provided
vmendelev Jan 30, 2026
8b9c22f
Reset MagpieTTS decoder cache per request batch
vmendelev Jan 30, 2026
dfb522f
Cache HF resolve URL loads in MagpieTTS backend
vmendelev Jan 30, 2026
80a2d7c
Disable MagpieTTS KV cache to avoid shape mismatches
vmendelev Jan 30, 2026
9fe2703
Fix HF resolve URL caching in MagpieTTS backend
vmendelev Jan 30, 2026
136af12
Avoid killing multi-instance tasks via srun --wait
vmendelev Jan 30, 2026
8f6d68f
Override srun wait for multi-instance jobs
vmendelev Jan 30, 2026
c23805d
Reduce MagpieTTS inference batch size
vmendelev Jan 31, 2026
c482412
Set multi-instance srun wait to 1 hour
vmendelev Jan 31, 2026
5d104d3
Add emergent_tts dataset + eval scripts
vmendelev Jan 31, 2026
52b6599
Fix Emergent scoring deps and paths
vmendelev Feb 3, 2026
88bd09c
Add emergent_tts README
vmendelev Feb 6, 2026
8 changes: 8 additions & 0 deletions .gitignore
@@ -31,6 +31,14 @@ build
.venv
*.lock

# Local caches / secrets (never ship to remote via rsync)
.ssh/
.hf_cache/
.nemo_run/

# Emergent dataset artifacts (large; stored in shared data_dir instead)
nemo_skills/dataset/emergent_tts/data/

__pycache__
.ipynb_checkpoints

48 changes: 48 additions & 0 deletions cluster_configs/eos_example.yaml
@@ -0,0 +1,48 @@
executor: slurm

ssh_tunnel:
  host: login-eos.nvidia.com
  # ------------------------------- Fill this up! -------------------------------
  user: your_username
  job_dir: /lustre/fsw/llmservice_nemo_speechlm/users/your_username/code/nemo-run
  identity: ""
  # -----------------------------------------------------------------------------

# if you're running directly from the cluster, you only need to define job_dir and shouldn't use ssh_tunnel
# job_dir: <some location on slurm cluster to keep job metadata, uploaded code and generated sbatch files>

account: llmservice_nemo_speechlm
partition: batch
job_name_prefix: ""

disable_gpus_per_node: True

containers:
  trtllm: /lustre/share/llmservice_nemo_reasoning/images/nemo-skills-trtllm-latest.sqsh
  vllm: /lustre/share/llmservice_nemo_reasoning/images/nemo-skills-vllm-latest.sqsh
  sglang: /lustre/share/llmservice_nemo_reasoning/images/nemo-skills-sglang-latest.sqsh
  nemo-rl: /lustre/share/llmservice_nemo_reasoning/images/nemo-skills-nemo-rl-latest.sqsh
  megatron: /lustre/share/llmservice_nemo_reasoning/images/nemo-skills-megatron-latest.sqsh
  sandbox: /lustre/share/llmservice_nemo_reasoning/images/nemo-skills-sandbox-latest.sqsh
  nemo-skills: /lustre/share/llmservice_nemo_reasoning/images/nemo-skills-latest.sqsh
  verl: /lustre/share/llmservice_nemo_reasoning/images/nemo-skills-verl-latest.sqsh

mounts:
  # - /lustre/fsw/llmservice_nemo_reasoning/hf_models:/hf_models
  # - /lustre/fsw/llmservice_nemo_reasoning/images/swe-bench:/swe-bench-images
  - /lustre/fsw/llmservice_nemo_speechlm:/lustre/fsw/llmservice_nemo_speechlm

# you also need to mount your own workspace folder (or any other folder you need)
# - /lustre/fsw/llmservice_nemo_reasoning/users/igitman/:/workspace

env_vars:
  # ------------------------------- Fill this up! -------------------------------
  - HF_HOME=/lustre/fsw/llmservice_nemo_speechlm/users/your_username/hfcache
  # -----------------------------------------------------------------------------

timeouts:
  batch: 04:00:00
  interactive: 02:00:00

mail_type: FAIL
mail_user: # <your email goes here>
124 changes: 124 additions & 0 deletions nemo_skills/dataset/emergent_tts/README.md
@@ -0,0 +1,124 @@
## EmergentTTS-Eval dataset (`emergent_tts`)

This dataset integration lets you:

- **Prepare** the EmergentTTS-Eval test set under a shared `data_dir` (download baseline audios + metadata + MOS model).
- **Generate** TTS outputs with NeMo-Skills (`ns eval` via `run_tts_eval.py`).
- **Score** the generated outputs with EmergentTTS-Eval (WER/MOS/win-rate, depending on config).

### 1) Prepare the test set (requires `HF_TOKEN`)

`prepare.py` downloads the dataset and writes all required artifacts into:

- `<DATA_DIR>/emergent_tts/emergent/test.jsonl`
- `<DATA_DIR>/emergent_tts/data/emergent_tts_eval_data.jsonl`
- `<DATA_DIR>/emergent_tts/data/baseline_audios/*.wav`
- `<DATA_DIR>/emergent_tts/data/wv_mos.ckpt`

Run it from your dev machine (or any environment with network access):

```bash
cd /home/vmendelev/workspace/expressiveness/src/nemo-skills-tts-eval
. ./.venv/bin/activate

export HF_TOKEN="<your_hf_token>"

python nemo_skills/dataset/emergent_tts/prepare.py \
    --output_dir "<DATA_DIR>/emergent_tts"
```

Optional flags (combined in the example below):

- `--num_samples 10`: write only the first 10 samples (smoke test).
- `--overwrite`: re-download / regenerate outputs.
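
For example, a quick smoke-test preparation run combining both flags (same paths as above):

```bash
# Prepare only the first 10 samples, overwriting any previous outputs
python nemo_skills/dataset/emergent_tts/prepare.py \
    --output_dir "<DATA_DIR>/emergent_tts" \
    --num_samples 10 \
    --overwrite
```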

### 2) Configure evaluation

Use the example configs in `nemo_skills/dataset/emergent_tts/scripts/config/`.

In `scripts/config/default.yaml`, set the following keys (a sketch follows the list):

- `generation.data_dir: <DATA_DIR>`
- `scoring.emergent_data_dir: <DATA_DIR>/emergent_tts/data`
- `scoring.scoring_code_path: <PATH_TO>/EmergentTTS-Eval-public` (on the cluster)
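
For orientation, a minimal sketch of the relevant part of `default.yaml`; only the three keys above come from this README, the surrounding structure is an assumption:

```yaml
generation:
  data_dir: <DATA_DIR>

scoring:
  emergent_data_dir: <DATA_DIR>/emergent_tts/data
  scoring_code_path: <PATH_TO>/EmergentTTS-Eval-public
```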

### 3) Clone + patch EmergentTTS-Eval-public for NVIDIA Inference API judging

On EOS (or wherever you run scoring), clone EmergentTTS-Eval:

```bash
cd /lustre/fsw/llmservice_nemo_speechlm/users/vmendelev/code
git clone <repo_url> EmergentTTS-Eval-public
```

Then update Emergent’s judge client selection so that **Gemini models are called via NVIDIA’s OpenAI-compatible Inference API**.

Target behavior:

- **Model name** stays as: `gcp/google/gemini-2.5-pro` (or similar).
- **Base URL** is NVIDIA Inference API: `https://inference-api.nvidia.com/v1`
- **API key** comes from: `JUDGER_API_KEY` (or `NVIDIA_API_KEY`)

Minimal patch checklist inside `EmergentTTS-Eval-public` (a hedged sketch follows the list):

- In `api_clients.py` (or wherever the client is chosen), ensure `gcp/google/*` uses an **OpenAI-compatible** client (not the Google SDK client), e.g.:
  - `OpenAI(base_url=<judger_base_url>, api_key=os.getenv("JUDGER_API_KEY"))`
- Thread `judger_base_url` through so calls use `https://inference-api.nvidia.com/v1` (not the full `/v1/chat/completions` endpoint).
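
As a rough illustration, a minimal sketch of that patch. The function name `get_judge_client` and the routing shape are assumptions about Emergent's internals; only the OpenAI-compatible client call and the env vars come from the checklist above:

```python
import os

from openai import OpenAI


def get_judge_client(model_name: str, judger_base_url: str) -> OpenAI:
    """Hypothetical judge-client selection for gcp/google/* models."""
    if model_name.startswith("gcp/google/"):
        # Route Gemini judging through NVIDIA's OpenAI-compatible endpoint
        # instead of the Google SDK client; the key comes from the environment.
        return OpenAI(
            base_url=judger_base_url,  # e.g. https://inference-api.nvidia.com/v1
            api_key=os.getenv("JUDGER_API_KEY") or os.getenv("NVIDIA_API_KEY"),
        )
    raise ValueError(f"No client wired up for judge model: {model_name}")
```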

After patching, set these in `scripts/config/default.yaml` (sketched below):

- `scoring.judge_model: gcp/google/gemini-2.5-pro`
- `scoring.judger_base_url: https://inference-api.nvidia.com/v1`
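
Sketched as YAML (keys exactly as named above):

```yaml
scoring:
  judge_model: gcp/google/gemini-2.5-pro
  judger_base_url: https://inference-api.nvidia.com/v1
```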

### 4) Run evaluation (generation + scoring)

From your dev machine, submit jobs to EOS:

```bash
cd /home/vmendelev/workspace/expressiveness/src/nemo-skills-tts-eval
. ./.venv/bin/activate
mkdir -p .nemo_run

export NEMORUN_HOME="$PWD/.nemo_run"
export NEMO_SKILLS_CONFIG_DIR=/home/vmendelev/workspace/expressiveness/src/ns_eval/cluster_configs
export NEMO_SKILLS_DISABLE_UNCOMMITTED_CHANGES_CHECK=1

# Required for win-rate judging (NVIDIA Inference API key)
export JUDGER_API_KEY="<your_nvidia_api_key>"

python -m nemo_skills.dataset.emergent_tts.scripts.run_tts_eval \
    --config nemo_skills/dataset/emergent_tts/scripts/config/default.yaml \
    --stage all \
    --expname emergent_eval
```

### 5) Smoke test (10 samples, interactive)

```bash
cd /home/vmendelev/workspace/expressiveness/src/nemo-skills-tts-eval
. ./.venv/bin/activate
mkdir -p .nemo_run

export NEMORUN_HOME="$PWD/.nemo_run"
export NEMO_SKILLS_CONFIG_DIR=/home/vmendelev/workspace/expressiveness/src/ns_eval/cluster_configs
export NEMO_SKILLS_DISABLE_UNCOMMITTED_CHANGES_CHECK=1

python -m nemo_skills.dataset.emergent_tts.scripts.run_tts_eval \
    --config nemo_skills/dataset/emergent_tts/scripts/config/interactive_10.yaml \
    --stage generation \
    --expname emergent_smoke10
```

### Outputs

NeMo-Skills generation writes:

- `<output_dir>/eval-results/emergent_tts.emergent/output.jsonl`
- `<output_dir>/eval-results/emergent_tts.emergent/audio/*.wav` (or equivalent)

Emergent scoring writes the following into the same benchmark folder (a quick-read sketch follows the list):

- `emergent-tts-eval_*_evaluation-predictions.jsonl`
- `emergent-tts-eval_*_evaluation-metrics.json`
- `metrics.json` (a NeMo-Skills-friendly copy of Emergent metrics)
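
As a quick sanity check, a minimal sketch for reading the NeMo-Skills-friendly copy; the metrics schema is not documented here, so this just pretty-prints whatever is inside:

```python
import json
from pathlib import Path

# <output_dir> is whatever you configured for generation
results_dir = Path("<output_dir>/eval-results/emergent_tts.emergent")
metrics = json.loads((results_dir / "metrics.json").read_text())
print(json.dumps(metrics, indent=2))
```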

6 changes: 6 additions & 0 deletions nemo_skills/dataset/emergent_tts/__init__.py
@@ -0,0 +1,6 @@
"""EmergentTTS-Eval dataset integration for NeMo-Skills.

This package contains tooling to prepare the EmergentTTS-Eval benchmark for
NeMo-Skills evaluation runs.
"""

3 changes: 3 additions & 0 deletions nemo_skills/dataset/emergent_tts/emergent/__init__.py
@@ -0,0 +1,3 @@
# EmergentTTS-Eval benchmark (NeMo-Skills)

# Default generation arguments for this benchmark: use the OpenAI-style prompt format.
GENERATION_ARGS = "++prompt_format=openai"