.Net: tts - audio generation: what model to use to generate >4min (longer audio)? or audio at all (tts-hd to deprecate on Sat, Mar 1, 2025) #10655

sophialagerkranspandey · 2025-02-24T16:38:04Z

Discussed in #10645

Originally posted by joslat February 23, 2025
Hi,

I've managed to generate text-to-speech audio by following this sample:
https://github.com/microsoft/semantic-kernel/blob/main/dotnet/samples/Concepts/TextToAudio/OpenAI_TextToAudio.cs

But the only models I can use are tts and tts-hd, and both have a cap of 4,096 characters.
This allows a maximum of 4 to 8 minutes of audio, not more.
On top of that, tts-hd is to be deprecated on Sat, Mar 1, 2025.

I am building a language teacher and would like to generate audio sessions of 20 minutes or more.

Is there any way to overcome this "hard cap"? Or what should I use instead? TTS seems to have only this model.
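For reference, the obvious workaround is to split the script into pieces under the cap, request audio for each piece, and join the results. Below is a minimal sketch of the splitting step only, written in Python for brevity even though the thread concerns the .NET sample. `split_for_tts` and `MAX_CHARS` are hypothetical names introduced for illustration, the sentence-boundary rule is an assumption, and no Semantic Kernel or OpenAI call is shown.

```python
import re

MAX_CHARS = 4096  # input cap quoted in this thread (assumed to apply per request)


def split_for_tts(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Split text into chunks of at most max_chars, preferring sentence ends."""
    # Split after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-split any single sentence that is itself longer than the cap.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        if not sentence:
            continue
        # Try to append the sentence to the chunk being built.
        candidate = f"{current} {sentence}" if current else sentence
        if len(candidate) <= max_chars:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk would then go to the text-to-audio service as its own request. How the resulting audio segments are joined depends on the output format and is not covered by this sketch.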

What would you suggest using?

Best,
José

sophialagerkranspandey assigned RogerBarreto Feb 24, 2025

sophialagerkranspandey added the .NET label (Issue or Pull requests regarding .NET code) Feb 24, 2025

sophialagerkranspandey added the needs_port_to_python label (Indicate this item needs to also be done for Python) Feb 24, 2025
