low speaker similarity in zero-shot tts #910

LoganLiu66 · 2025-03-03T07:41:22Z (edited by fumiama)

Thank you for this great work. When I use zero-shot TTS, I find that the speaker similarity between the prompt audio (spk_smp) and the generated audio is low. My prompt audio, prompt_text, and generated audio are in audios.zip. What might be causing this, and do you have any advice for improving it? Thanks.

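    import ChatTTS
    import torch
    import torchaudio
    # load_audio is the helper from tools/audio in the ChatTTS repository
    from tools.audio import load_audio

    chat = ChatTTS.Chat()
    chat.load(compile=False)

    audio_file = 'sample.wav'
    # Double quotes are required here: the transcript contains "don't",
    # whose apostrophe would end a single-quoted string early.
    prompt_text = "I chance to leave him alone, but[uv_break] no[uv_break]. She just wanted to see him again[uv_break]. Anna[uv_break], you don't know how it feels to lose a sister[uv_break]."
    # Encode the prompt audio (resampled to 24 kHz) as the zero-shot speaker prompt
    spk_smp = chat.sample_audio_speaker(load_audio(audio_file, 24000))

    params_infer_code = ChatTTS.Chat.InferCodeParams(
        spk_smp=spk_smp,
        txt_smp=prompt_text,
        temperature=0.3,
        top_P=0.7,
        top_K=20
    )
    params_refine_text = ChatTTS.Chat.RefineTextParams(
        prompt='[oral_5]'
    )

    text = "I do love books, but I think I like writing about them more than selling them."
    wav = chat.infer(
        text,
        params_infer_code=params_infer_code,
        split_text=False,
        params_refine_text=params_refine_text
    )
    # Save the first (and only) generated waveform as a mono 24 kHz file
    torchaudio.save("sample_generated.wav", torch.from_numpy(wav[0]).unsqueeze(0), 24000)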

fumiama · 2025-03-12T13:59:14Z

Zero-shot works best with audio that was generated by ChatTTS itself. If you want to use external audio, make sure it is of good quality and that the transcript, txt_smp, matches the audio exactly, including [lbreak] marks and similar control tokens.
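One way to follow this advice is to let ChatTTS produce the prompt clip itself, so that the transcript is known token for token. This is a minimal sketch, not the project's documented recipe. It assumes `chat` is an already loaded `ChatTTS.Chat` instance and that `infer()` accepts `refine_text_only`, `skip_refine_text` and `split_text`, alongside `sample_random_speaker()` and `sample_audio_speaker()`, as in the repository README. `make_self_prompt` is a hypothetical helper name.

```python
def make_self_prompt(chat, seed_text):
    """Hypothetical helper: return (spk_smp, txt_smp) built from ChatTTS's own output.

    Assumes `chat` is a loaded ChatTTS.Chat instance. InferCodeParams is looked
    up through the instance, which resolves to ChatTTS.Chat.InferCodeParams.
    """
    # Step 1: run only the text-refinement stage, so we know the exact text
    # (including control tokens such as [uv_break]) that will be spoken.
    refined = chat.infer(seed_text, split_text=False, refine_text_only=True)[0]

    # Step 2: synthesize that refined text with a randomly sampled speaker,
    # skipping refinement so the audio corresponds to `refined` as-is.
    params = chat.InferCodeParams(spk_emb=chat.sample_random_speaker())
    wavs = chat.infer(
        refined,
        split_text=False,
        skip_refine_text=True,
        params_infer_code=params,
    )

    # Step 3: encode the generated clip as the zero-shot speaker prompt and
    # pair it with the identical transcript.
    spk_smp = chat.sample_audio_speaker(wavs[0])
    return spk_smp, refined
```

The returned pair would then be passed as spk_smp and txt_smp to InferCodeParams, as in the original snippet. Because the clip comes from a random speaker, this only demonstrates the matching-transcript mechanism. For an external recording the same rule applies: txt_smp has to reproduce what is actually spoken, control tokens included.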

fumiama added the documentation and help wanted labels on Mar 12, 2025
